Sunday, October 6, 2013

Script: Boxplot.gs; Make Box Plots from user-input data.

I wrote this script after finishing statpack.gs, since most of the heavy lifting was already done.  Boxplot takes data from user-input arrays, each given as a string with elements separated by spaces (same as scatter.gs), and draws a box plot from that data.  No outside file is required to use this script.  It is written so that any number of box plots can be drawn on the same plot.  In most cases, this type of analysis is best left to other plotting software, but this script may be of some use in certain cases.

Example usage: Hail Reports between 1995 and 2006.

    array1='64 88 127 49 175 115 128 31 190 82 229'
    array2='712 595 663 777 795 1400 1375 1029 1256 1083 1015 '

    'boxplot -d 'array1' -d 'array2' -range 0 1500 -enclose -cap -title -ytitle -xlab custom'

Example of Boxplot.gs output: Hail Reports.

This example shows the number of small hail reports in New York and Oklahoma.

As you can gather from the above example, this script takes a whole bunch of arguments.  All of them are listed in a help section and can be set at the top of boxplot.gs.  I had other examples to present involving climatology data, but the government shutdown has made accessing this data a bit frustrating, so those have been tabled for now.  For more information on how to populate an array from data contained in a GrADS .ctl file, check out the example on scatter.gs.

Notes:
  • This script can process as many arrays as you specify, as long as each array is preceded by a "-d"
  • Unlike scatter.gs, I did not spend a lot of time on "anti-clipping", so if you specify your data-axis range (rather than leaving it on auto), you may find your data exceeding your plot boundaries.
    • Similarly, the larger you set your boxsize, the more likely your data will exceed your plot boundaries (a boxsize <0.1 is recommended).
  • Each box represents the interquartile range and the median, and the lines extend to the 10th and 90th percentiles [calculated using the formula used by MS Excel: q=p*(n-1)+1].
  • If you turn on title plotting (-title) in your arguments, you will be prompted to enter your title, so be sure to keep an eye on your console for instructions.
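
To make the percentile formula in the notes concrete, here is a minimal Python sketch (not the GrADS code from boxplot.gs).  It assumes q is the 1-based position in the sorted data, p is the percentile as a fraction, n is the number of elements, and that non-integer positions are linearly interpolated the way Excel does it; whether boxplot.gs interpolates in exactly this way is an assumption.  The function name and the use of array1 from the example above are for illustration only.

```python
import math

def percentile(arr, p):
    """Excel-style percentile of a space-separated string of numbers.

    Assumes linear interpolation between neighbouring sorted values;
    this is an illustration, not the code used in boxplot.gs.
    """
    values = sorted(float(v) for v in arr.split())
    n = len(values)
    q = p * (n - 1) + 1        # 1-based (possibly fractional) position
    lower = math.floor(q)      # position of the value at or below q
    frac = q - lower           # how far q lies toward the next value
    if lower >= n:             # p = 1 lands exactly on the last value
        return values[-1]
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])

array1 = '64 88 127 49 175 115 128 31 190 82 229'

# The five values a box plot like the one described above would need
for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(p, percentile(array1, p))
```

With 11 values, q = 10p + 1, so the 25th percentile falls at position 3.5, halfway between the 3rd and 4th sorted values (64 and 82), which gives 73.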

Hopefully you find this script useful.  This is v1.0, so be on the lookout for bugs.  I have tested most of the options and can't see any bugs as of now, but if you find some, please report them here!

Download Boxplot.gs

Download Example Script


Saturday, October 5, 2013

Script: Statpack; Perform basic statistical analysis on user-input arrays.





This is another one of those scripts that extends the flexibility of GrADS beyond its intended purpose.  As usual, these analyses might better be performed using higher-level data analysis software (e.g., Python or MATLAB).  However, this script might be useful for generating quick statistics on data sets that are in GrADS format without the intermediate step of saving data to a file to read into another program.


This is not a script that you call, but a library of functions that you include in your own script.  Note: you must include all of the functions, as some of them rely on each other to operate.

In order to use this script, your data arrays must be in the format of strings with your array elements separated by spaces, e.g., array='1 2 3 4 5 6 7 8 9 10'.

Each function requires different inputs.  A list of the included functions is below, as well as in a brief help page at the top of the script.

Function list
  • mean(arr): Returns mean of input array "arr"
  • stdev(arr): Returns Standard Deviation of input array "arr"
  • sort(arr): Returns sorted array (lowest to highest) of array "arr"
  • rank(arr): Returns an array the same size as array "arr", with values ranging from 1 to size(arr)
  • percentile(arr,p): Returns the data percentile (p) of array "arr"
  • size(arr): Returns the number of elements in array "arr"
  • max(arr): Returns the maximum value in array "arr"
  • min(arr): Returns the minimum value in array "arr"
  • correlation(arr1,arr2): Returns the correlation coefficient "r" between "arr1" and "arr2"
  • regression(arr1,arr2): Returns the regression coefficient between "arr1" and "arr2"
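
For readers who want to cross-check results outside GrADS, here is a rough Python sketch of what a few of the listed functions compute on the same space-separated string format.  This is not the statpack.gs code: the helper name to_list is hypothetical, correlation is assumed to be the Pearson coefficient, and "regression coefficient" is assumed to mean the least-squares slope of arr2 against arr1, which may differ from what statpack.gs actually returns.

```python
import math

def to_list(arr):
    """Turn a space-separated string such as '1 2 3' into a list of floats."""
    return [float(v) for v in arr.split()]

def mean(arr):
    values = to_list(arr)
    return sum(values) / len(values)

def correlation(arr1, arr2):
    """Pearson correlation coefficient r (assumed definition)."""
    x, y = to_list(arr1), to_list(arr2)
    mx, my = mean(arr1), mean(arr2)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def regression(arr1, arr2):
    """Least-squares slope of arr2 against arr1 (assumed definition)."""
    x, y = to_list(arr1), to_list(arr2)
    mx, my = mean(arr1), mean(arr2)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = '1 2 3 4 5'
y = '2 4 6 8 10'
print(mean(x), correlation(x, y), regression(x, y))
```

Because y is exactly twice x in this made-up data, the correlation comes out as 1 and the slope as 2, which makes it an easy sanity check.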


Example Use: 
  • Calculate the median of an array:
           data='9 7 8 6 5 4 3 2 1'
           median=percentile(data,0.5)
           say median
 
           Outputs "5" to screen

  • Sort an array:
          data='9 7 8 6 5 4 3 2 1'
          sorted=sort(data)
          say sorted
  
          Outputs "1 2 3 4 5 6 7 8 9" to screen


Download statpack.gs