Wednesday, December 30, 2015

Tutorial: The difference between self-describing and non-self-describing NETCDF files, and how to write them


In writing posts for this blog, I try my best to avoid what could be considered "mission creep," or writing coding tutorials that are at best only tangentially related to GrADS.  This tutorial verges on mission creep and has as much to do with teaching "good data" practices as it does with teaching GrADS skills.  Furthermore, there are no GrADS scripting lessons to be learned here, and the only code I provide is written for Python. 

All that said, in this tutorial I will show you how to write NETCDF files that can be read into GrADS using both the self-describing 'sdfopen' and non-self-describing 'xdfopen' commands.  I do this because you may one day find yourself with a gridded data set that you want to plot using GrADS, but the data may be in a text file format, or perhaps you need to perform higher-level calculations on the data that cannot be done in GrADS before you display it.  While you could write the data out as a binary data file with a GrADS control file, maybe you are more comfortable with the NETCDF format (I know I certainly am).  Plus, there are other benefits to using NETCDF instead of binary; one example is cross-compatibility: NETCDF files can be read by pretty much any data analysis software, so not only can you read your file in GrADS, you can also read it with Matlab, IDL, R, Python, etc.

What makes a NETCDF file self describing?

It's all about the metadata!  Metadata is often referred to as "data about the data."  For example, metadata includes things like:
  • The date and time a file was created
  • What data is in the file
  • Units of variables within the file
  • Information about the coordinates (e.g., lat, lon vs. x,y)
  • If the data file is model output, it might include information such as:
    • Resolution, parameterizations, time step
Metadata is super important, and learning how to include it in any data set you are producing/sharing is a critical skill in a scientific career.  We are lucky in meteorology because there have been solid efforts to standardize metadata practices over the past couple of decades.  NETCDF makes it very easy to provide good metadata, so there is no excuse not to do it.  If nothing else, I hope to get that point across to you with this tutorial.  To really drive this point home, here is a meme.


To make a file self-describing (that is, such that GrADS can read the file with NO accompanying control file using the 'sdfopen' command), the metadata included in the NETCDF file must adhere to the COARDS data standards: More information on COARDS here.  Personally, I didn't find this website all that useful when figuring out how to write self-describing NETCDF files, but it does provide some nice background.

How to write a self-describing NETCDF file (SDFOPEN)

Here I will provide the key metadata required to write a self-describing NETCDF file.  The example I provide subsets the first 24 hours of GFS temperature data from the 00z December 30th, 2015 run.  If you choose to reproduce these results, you will need to update the date.  I will use Python to read the data from the GrADS Data Server (GDS) and to write the local NETCDF file.  You could use any program of your choice to do this, so long as it can read and write NETCDF files.  The Python script is included in this tutorial.

The most important thing to understand when writing your data file is that the coordinate dimensions and accompanying dimension variables (e.g., lat, lon) must adhere to the COARDS guidelines; otherwise GrADS will not open your file.  For example, you might get an error like:

     gadsdf: SDF file has no discernable Y coordinate

The metadata for the non-dimension variables (e.g., temperature, u,v) is more up to the person writing the file.

Assuming you know how to read in the NETCDF file and data from the GDS, in python:

   import netCDF4

   datafile='http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20151230/gfs_0p25_00z'
   cdata=netCDF4.Dataset(datafile,'r')
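The later snippets also assume you have pulled the coordinate arrays and the temperature fields out of cdata and chosen the subset indices lmin, lmax, lomin, and lomax.  Here is one way you might do that; note that the pressure-level variable name 'tmpprs' and the 10-60N / 230-300E box are my assumptions for this sketch, so adjust them to your own subset:

   import numpy as np

   lat = cdata.variables['lat'][:]   # latitudes (degrees_north)
   lon = cdata.variables['lon'][:]   # longitudes (degrees_east)
   lev = cdata.variables['lev'][:]   # pressure levels

   # Indices that bound the spatial subset (roughly 10-60N, 230-300E here)
   lmin  = int(np.argmin(np.abs(lat - 10.0)))
   lmax  = int(np.argmin(np.abs(lat - 60.0)))
   lomin = int(np.argmin(np.abs(lon - 230.0)))
   lomax = int(np.argmin(np.abs(lon - 300.0)))

   # Temperature fields; these are sliced when the data are written out below
   var1 = cdata.variables['tmp2m']    # 2 m temperature (time, lat, lon)
   var2 = cdata.variables['tmpprs']   # temperature on pressure levels (time, lev, lat, lon)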

 
To start writing the file, we will first look at the dimensions and dimension variables.
You must include the following dimensions and variables in your NETCDF file:
  • lat
  • lon
  • time
  • lev
 In Python, you create the dimensions like this:

   newdata=netCDF4.Dataset(newfile,'w')   # newfile is the path of the NETCDF file you are creating
   newdata.createDimension('lat', len(lat[lmin:lmax]))
   newdata.createDimension('lon', len(lon[lomin:lomax]))
   newdata.createDimension('lev', len(lev[:]))
   newdata.createDimension('time', 8)


Where lmin, lmax, lomin, and lomax are indices that mark your subset boundaries.
Then, in addition to creating the dimensions, you must create the variables as well (for example, latitude):

   latvar=newdata.createVariable('lat','f8',('lat'))

Now you need to populate latvar with metadata.  To determine what metadata is needed, I simply printed out the 'lat' variable from the GDS datafile, since that file is self-describing and adheres to the COARDS standards (it is the same kind of file I open with 'sdfopen' in my other tutorials).  Doing that, I get the following (again, this will look different if you aren't using Python):
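In Python, that print is just a single line:

   print(cdata.variables['lat'])   # netCDF4 prints the variable's type, dimensions, attributes, and shape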

<type 'netCDF4.Variable'>
float64 lat(lat)
    grads_dim: y
    grads_mapping: linear
    grads_size: 721
    units: degrees_north
    long_name: latitude
    minimum: -90.0
    maximum: 90.0
    resolution: 0.25
unlimited dimensions:
current shape = (721,)


This is the metadata that you need to include for the latitude variable in your NETCDF file.  The size and limits will change because you are subsetting the data, but things like 'grads_dim' and 'grads_mapping' need to stay the same.  In Python, adding metadata to a NETCDF variable is easy:

   latvar.grads_dim='y'
   latvar.grads_mapping='linear'
   latvar.grads_size=str(len(lat[lmin:lmax]))
   latvar.units='degrees_north'
   latvar.long_name='latitude'
   latvar.minimum=str(lat[lmin])
   latvar.maximum=str(lat[lmax])
   latvar.resolution='0.25'
   latvar[:]=lat[lmin:lmax]


Again, remember that because of the subsetting, the lmin and lmax indices must match the slice of data written to the output file.
Now, if I print out MY latitude variable from the output NETCDF file, it looks similar to the GDS file, with only minor differences:

<type 'netCDF4.Variable'>
float64 lat(lat)
    grads_dim: y
    grads_mapping: linear
    grads_size: 200
    units: degrees_north
    long_name: latitude
    minimum: 10.0
    maximum: 60.0
    resolution: 0.25
unlimited dimensions:
current shape = (200,)


You will need to do the same thing for the other dimension variables (lon, lev, time), with a few minor differences.  I don't explicitly show those steps here, to save space and avoid redundancy; you can find them written out in the accompanying .py file.
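As an example, here is a sketch of the longitude variable, following exactly the same pattern as latitude (check the GDS file's own 'lon' metadata for the values to copy; 'degrees_east' is the COARDS unit for longitude):

   lonvar=newdata.createVariable('lon','f8',('lon'))
   lonvar.grads_dim='x'              # x dimension instead of y
   lonvar.grads_mapping='linear'
   lonvar.grads_size=str(len(lon[lomin:lomax]))
   lonvar.units='degrees_east'
   lonvar.long_name='longitude'
   lonvar.minimum=str(lon[lomin])
   lonvar.maximum=str(lon[lomax])
   lonvar.resolution='0.25'
   lonvar[:]=lon[lomin:lomax]

The lev and time variables follow the same pattern (with 'grads_dim' set to 'z' and 't', respectively).  Time is the pickiest of the four: COARDS expects its units in a form like 'days since ...' or 'hours since ...', so the simplest approach is to copy the time metadata straight from the GDS file.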

Once you have your dimension variables, you need to add the temperature variables.  In the example, I have you add both the 2 meter temperature and the temperature as a function of pressure level.  This is to show you how to add variables with differing dimensions.

As I stated earlier, there are fewer restrictions on these variables than on the dimension variables, but it is still good practice to add metadata.  Printing the GDS 'tmp2m' variable gives this:

<type 'netCDF4.Variable'>
float32 tmp2m(time, lat, lon)
    _FillValue: 9.999e+20
    missing_value: 9.999e+20
    long_name: ** 2 m above ground temperature [k]
unlimited dimensions:
current shape = (81, 721, 1440)


So for our variable, we will try to match this as closely as possible:
   gtmpvar=newdata.createVariable('GTMP','f4',('time','lat','lon'),fill_value=9.999E20)
   gtmpvar.long_name='2 Meter Temperature'
   gtmpvar.units='K'
   gtmpvar[:]=var1[:8,lmin:lmax,lomin:lomax] ## Add the actual data from the GDS file (first 8 time steps = 24 hours)


And that's it for the 2 meter temperature.  Now, you can do the exact same thing for the vertically varying temperature with only minor changes:
   tmpvar=newdata.createVariable('TEMP','f4',('time','lev','lat','lon'),fill_value=9.999E20)
   tmpvar.long_name='Pressure Interpolated Temperature'
   tmpvar.units='K'
   tmpvar[:]=var2[:8,:,lmin:lmax,lomin:lomax]


The big difference here is that the dimensions tuple passed to the createVariable function now includes 'lev'.
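When you are done adding variables, one small step that is easy to forget: close the Dataset so everything is flushed to disk before you try to open the file in GrADS.  In the Python script that is just:

   newdata.close()   # write everything out and close the new NETCDF file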

You can then open the file in GrADS using the 'sdfopen' command, and if you display the temperature data, you should get something that looks like this:


How to read and write a non-self-describing NETCDF file (XDFOPEN)

If you want to read a file that is not self-describing into GrADS, you need to include a control file.
Essentially, you need to treat the NETCDF file as if it were a binary data file.

Now for this tutorial, we will run through the exact same process, highlighting the differences in how we write metadata to the NETCDF file.

Starting with the dimension variables, we will no longer include the metadata that we did for the self-describing file:

    newdata=netCDF4.Dataset(xdffile,'w')
    newdata.createDimension('lat', len(lat[lmin:lmax]))
    latvar=newdata.createVariable('lat','f8',('lat'))
    latvar[:]=lat[lmin:lmax]


Similarly, I'll skip the rest of the dimension variables; as you can see, all I did was define the variable and add the data to it.  For simplicity (and since it doesn't matter here), I'll keep the temperature variables as they were in the self-describing example.
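For completeness, here is a sketch of how the remaining bare-bones dimensions and variables might look (the variable names here are just for this sketch, and the time values are placeholders, since GrADS will take the actual coordinates from the control file anyway):

    newdata.createDimension('lon', len(lon[lomin:lomax]))
    lonvar=newdata.createVariable('lon','f8',('lon'))
    lonvar[:]=lon[lomin:lomax]

    newdata.createDimension('lev', len(lev[:]))
    levvar=newdata.createVariable('lev','f8',('lev'))
    levvar[:]=lev[:]

    newdata.createDimension('time', 8)
    timevar=newdata.createVariable('time','f8',('time'))
    timevar[:]=range(8)   # placeholder values; the real times come from the TDEF line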

Now the NETCDF file is created.  If, however, you try to open this file using the 'sdfopen' command, you will get an error, so we need to create a control file.

The control file can be easily written using python (as in this example) or whatever program you are using.  I'm going to copy the code to write the control file below, and explain it after.

    ### Now write the control file ###
    xdfctl='C:/OpenGrADS/xdffile.ctl'
    xdflocal='xdffile.nc'
    xdf = open(xdfctl, 'w')
    xdf.write('DSET ^' + xdflocal +'\n')
    xdf.write('TITLE XDF example File 00z30dec2015 \n')
    xdf.write('UNDEF 99999.0\n')
    xdf.write('XDEF lon '+str(len(lon[lomin:lomax]))+' LINEAR ' + str(lon[lomin]) +' 0.25\n')
    xdf.write('YDEF lat '+str(len(lat[lmin:lmax]))+' LINEAR ' + str(lat[lmin]) +' 0.25\n')
    xdf.write('ZDEF lev '+str(len(lev[:]))+' LEVELS ')
    for i in lev[:]:
        xdf.write(str(int(i))+' ')
    xdf.write('\n')
    xdf.write('TDEF time 8 LINEAR 00z30dec2015 3hr\n')
    xdf.write('VARS 2\n')
    xdf.write('GTMP=>GTMP 0 99 2-meter Temperature [k]\n')
    xdf.write('TEMP=>TEMP '+str(len(lev[:]))+' 99 Temperature [K]\n')
    xdf.write('ENDVARS')
    xdf.close()


The most important takeaway about writing a control file for a NETCDF file is that the variable/dimension names in the control file must match the variable/dimension names in the NETCDF file (case sensitive).

For example: 'GTMP=>GTMP' will only work if there is a variable in the NETCDF file called 'GTMP'.  Similarly, following the XDEF definition, you need to point to the longitude variable (lon) in the NETCDF file.  This is different from control files for binary datasets, which don't put a variable name after XDEF.

Other than that, you will notice a couple of other things: the levels are written out with the LEVELS keyword since the vertical coordinate is not incremented linearly, and the lat/lon sizes are not hard coded, which lets you make bigger or smaller spatial subsets without a lot of extra work.

Open the control file in GrADS using the 'xdfopen' command:

'xdfopen xdffile.ctl'

A few notes on using the control file:


Note that the coordinate data is not actually read from the NETCDF file; it is inferred from the control file.  So it is important that the resolution and the starting points are right, otherwise you will get an incorrect result even if the longitude data in the NETCDF file is correct.  For example, comparing the correct resolution (0.25) to an incorrect resolution (0.45) gives the two different maps shown on the right.  So get the resolution right!

One final note: this kind of control file only seems to work with nice rectangular datasets, that is, lat/lon coordinates that vary independently of one another.  If your lat/lon grid does not vary independently (i.e., longitude is a function of y and vice versa), then this method will warp your map projection.  In those situations, I recommend writing your data out to a binary file with the PDEF option; I have yet to get the PDEF option to work with a NETCDF file.






So that is it for the tutorial; hopefully you have a better understanding of the difference between self-describing and non-self-describing NETCDF files, and how to write and open them in GrADS.  There are no GrADS scripts to download here, but a full Python script (version 2.7) is included that will reproduce the files and plots generated in this tutorial.  You will need the netCDF4 module installed for this script to work (and numpy, if you don't already have it); sys is standard.  Also, you may need to update the date.

Download Python Script here


Script: Colormaps.gs version 2.0; create elegant color scales for your plots using GrADS

What's New?

  • New maps added!
  • A bug involving rounding issues has been fixed
  • An option for custom user levels has been added

This script was created to make it easy to generate very pretty color scales for plotting.  The maps are generally based on preexisting color scales from Python's matplotlib, NCL, and other common sources, and this script attempts to reproduce them for GrADS.  For more information on this script, see the original blog post here.


The following two commands may prove useful to you:


  'colormaps -help'
  'colormaps -demo'

I include a couple of images showing the updates in action.

1. The following image is made using the new '-custom' option and the new s3pcpn color scale.

  'sdfopen http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20151230/gfs_0p25_00z' 
  'set display color white'
  'set gxout shaded'
  'colormaps -map s3pcpn -custom 0 1 2 3 4 5 7.5 10 12.5 15 20 25 30 35 40 45 50 60 70 80 90 100' 
  'd sum(apcpsfc,t=1,t=48,2)'
  'xcbar -fs 2'
GFS precipitation using custom user levels in colormaps.gs


2. The following map is the GFS Topography using the ncl_topo option.  

Since this scale was added specifically for topographical maps, there is a special feature that sets the lowest level to blue, and all subsequent levels to follow the linear topography scale.

  'sdfopen http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20151230/gfs_0p25_00z' 
  'set display color white'
  'set gxout shaded'
  'colormaps -map ncl_topo -levels 0.01 3.75 0.25'
  'd hgtsfc/1000'  
  'xcbar -fs 8'
GFS Topography with special color map

 


Aside from that, the rest should be self-explanatory if you read the help page and check out the original blog post.  As always, if you find bugs or have new suggestions, let me know!

 Colormaps_v2.gs

Thursday, July 23, 2015

HCL Color Scales for GrADS

It's been a while since I've posted anything here.  I have been extraordinarily busy with research and have not had the time to keep up seriously with this blog.  However, I've recently been involved in several discussions about how to communicate scientific ideas with good figures.  One example of good figure practice is making sure your figures are readable by the color blind.  It's amazing that even today, many popular color scales do not meet this basic objective.

Anyway, in these discussions I came across an open-access BAMS paper discussing a shift from traditional RGB color scales to Hue-Chroma-Luminance (HCL) color scales.  So here is a short blog post on how to use this technique with GrADS.

Link to paper:  BAMS HCL paper

The paper includes a link to an online tool for generating HCL color scales, with an option to output in GrADS RGB format.  Below are a couple of examples of GrADS figures using HCL color scales.

Link to the HCL Wizard: http://hclwizard.org/

To access the HCL creator, just click on the link to the Online HCL Creator.  From there it should be pretty easy to follow: simply pick your color scale and set the export option to GrADS.  It takes a little time to figure out how to use the sliders, but you'll get the hang of it.

A couple of examples from the GFS:





Other color science resources:
Some good information on how to use color in your figures: http://betterfigures.org/
Color Brewer, another website for color scale creation and modification: http://colorbrewer2.org/

Thursday, February 19, 2015

Tutorial: Use basemap.gs to make a beautiful map of "SST", Sea Ice and Snow in GrADS

This tutorial will draw upon many of the skills discussed in other tutorials on this site, e.g., how to handle multiple files at once or how to use basemap.gs.  There isn't much "new" information here about the different features and functions of GrADS; rather, this tutorial will show you how to make a very pretty map of sea-surface temperature, sea ice, and snow.  The final outcome is shown below.

For this tutorial you will need the following GrADS scripts (all used below): basemap.gs, colormaps.gs, color.gs, and xcbar.gs.

Snow Cover and Sea Ice on the Robinson Map Projection



So before we start, I want to give a full disclaimer: the plot does not actually show any observed SST.  In this tutorial I'm going to use the 0.25-degree GFS surface temperature, which in practice should be roughly equal to the SST over the oceans.

The first thing we need to do is open both the GFS data and the sea ice data.  The data come from NOMADS and are opened using the 'sdfopen' command.  Note that the files below are from February 2015, so keep in mind that you will need to change the dates if you wish to copy and paste the example code below.


   'reinit'

   gfsfile='http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20150218/gfs_0p25_18z'
   icefile='http://nomads.ncep.noaa.gov:9090/dods/ice/ice20150218/ice.00z'

  'sdfopen 'gfsfile
  'sdfopen 'icefile


Now that the files are open, we will set up the map.  Since we are using the Robinson projection, we need to set the longitude range to -180 to 180 and the latitude range to -90 to 90.

   'set gxout shaded'
   'set mpdset hires'
   'set lon -180 180'
   'set lat -90 90'
   'set mproj robinson'
   'colormaps -l 272 307 0.5 -map jet' ;*Note the use of colormaps.gs



Then we display the variable:

   'd tmpsfc'
   'xcbar -fs 4'

  
After a moment, the surface temperature will be displayed following the "jet" color map.
Now, we are going to mask out the land using the land basemap from basemap.gs.

Note, in my version of basemap.gs, I did not want to include the frame around the map projection.  So I opened up basemap.gs and commented out the lines at the bottom that draw the rectangle:

   * Draw a new square frame around the plot
   * If you have 'set frame off' or 'set frame circle' before running basemap,
   * you may want to comment out the next 2 lines
   *'set line 1 1 6'
   *'draw rec 'x1' 'y1' 'x2' 'y2


We are going to use the "medium (M)" resolution option for the coastlines, and we are going to define a custom dark gray color for the land.

   'set rgb 73 80 80 80'
   'basemap L 73 0 M'         ;*L = Land, 73 = Fill Color, 0 = outline Color, M=Medium resolution


This will fill in the land areas with gray, and you will have an image like this:

SST only!

Now that we are done with the "SST" and the land mask, we are going to switch to the sea-ice file and plot the sea ice.  It is important that you use the maskout function when plotting the sea ice; otherwise you will overwrite all of the SST data with zeros.

  'set gxout shaded'
  'set dfile 2'                   ;*Sets Default file to file 2 (Sea ice)
  'set z 1'                        ;*Set z to 1
  'set t 1'                        ;*Set t to 1
  'set map 0 1 6'   
  'color 0 0.6 0.1 -kind dimgray->seashell->white'           ;*Note use of color.gs
  'd maskout(icecmsl,icecmsl-0.1)'                                     ;*Need to mask out ice.



After a few moments, the sea ice will plot and you will have an image like this:

Sea Ice and SST only!


Now this is all well and good, but our knowledge is limited to the ice over the oceans; we don't get any sense of how ice and snow cover the land masses.  So the final touch we'll add to the map is the GFS snow water equivalent (SWE), plotted using a color scale similar to the one used for the sea ice.  To do that, we need to switch back to our first file and then reset our time and vertical level.  Lastly, we plot SWE (again using maskout to avoid plotting over everything).

  'set dfile 1'
  'color 10 100 5 -kind dimgray->seashell->white'
  'set z 1'
  'set t 1'
  'set map 0 1 6'
  'd maskout(weasdsfc,weasdsfc-10)'


And after a few moments, you have a map like the one shown at the beginning of the tutorial.
You can play around with your minimum values for ice and snow; I used 0.1 for sea ice and 10 mm for SWE, which seemed to produce a good-looking map.  Experiment as you like!  This plot is really basic, with no special formatting outside of the actual plot (titles, etc.).  Hopefully you enjoyed this tutorial; I know it was oddly specific and didn't really present anything new, but hopefully you learned something anyway!

Download Example Script