More and more data is available each day; the trick is to extract insights from it. Data warehouse tools like dashDB are designed to help you perform meaningful analysis. In this example, we’ll take some raw climate data, convert it, and analyse it to deliver meaningful trends and numbers to prospective vacationers choosing a holiday destination.

Environmental, climate, and satellite observations are commonly stored in binary netCDF files, which contain spatial raster data in a 3D matrix. The three axes are latitude, longitude, and time. You can imagine the data as slices of global maps stacked on top of each other, one per month. The value of each cell in the matrix is the temperature for a certain time and location: temperature[time,latitude,longitude].
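As a toy illustration of that layout, here is a small NumPy array with the same three axes (the shape and values are hypothetical stand-ins for a real file):

```python
import numpy as np

# toy 3D raster: 12 monthly maps on a 36 x 72 grid of 5-degree cells
temperature = np.zeros((12, 36, 72))
temperature[0, 10, 20] = 21.5  # January's value for one grid cell

# temperature[time, latitude, longitude] addresses a single cell;
# a full slice like temperature[0, :, :] is the global map for January
january = temperature[0, :, :]
```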

This blog explains how to

  • convert netCDF files to csv format using Python
  • import csv data into dashDB
  • analyse and visualise the data using R

Example of 3D raster data

Convert netCDF into csv with Python

  1. Download this netCDF file containing the average observed monthly temperature.

  2. Install Python from Anaconda (a free distribution that includes the most common packages).

  3. Load the data.
    • If you’re using Windows and are new to Python, watch this video on how to run the following Python code.
    • For Linux or Mac: open a Terminal, type python at the command prompt, and run the following code:
    from netCDF4 import Dataset
    import numpy as np
    
    cfile = 'absolute.nc'
    id_in = Dataset(cfile)
    longitude = id_in.variables['lon'][:]
    latitude = id_in.variables['lat'][:]
    time = id_in.variables['time'][:]
    temperature = id_in.variables['tem'][:,:,:]
    
    print(np.shape(temperature))
  4. View the data.

    Let’s take a quick look at what we’ve got. The 3D temperature matrix contains values for 12 months, 36 latitudes and 72 longitudes. To see what this data looks like, enter the following code to create a quick global map with the average temperature for January. The slice of data for January is temperature[0,:,:] (in Python the index starts with 0).

    from mpl_toolkits.basemap import Basemap, addcyclic, shiftgrid, maskoceans
    import scipy
    import matplotlib
    from pylab import *
    
    # define the area to plot and projection to use
    m = Basemap(llcrnrlon=-180, llcrnrlat=-60, urcrnrlon=180, urcrnrlat=80, projection='mill')
    
    # convert the latitude and longitude to raster coordinates to be plotted
    t1 = temperature[0,:,:]
    t1, lon = addcyclic(t1, longitude)
    january, lons = shiftgrid(180., t1, lon, start=False)
    x,y  = np.meshgrid(lons,latitude)
    px,py = m(x,y)
    
    # create the global map and save it as a png file
    rcParams['font.size'] = 12
    rcParams['figure.figsize'] = [8.0, 6.0]
    palette = cm.RdYlBu_r
    figure()
    m.drawcoastlines(linewidth=0.5)
    m.drawmapboundary(fill_color=(1.0,1.0,1.0))
    cf=m.pcolormesh(px, py, january, cmap = palette)
    cbar = colorbar(cf,orientation='horizontal', shrink=0.95)
    cbar.set_label('Mean Temperature in January')
    
    tight_layout()
    
    savefig('temperature_january.png')

    Figure: mean temperature in January

  5. Format the data for dashDB.

    dashDB has a relational database structure, so we need to convert the temperature data into a csv file with one row per month, latitude, and longitude, holding the temperature plus the grid cell boundaries (latmin, latmax, lonmin, lonmax). We do so by looping over the 3 dimensions of the matrix:

    f = open('temperature.csv', "w")
    f.write('month,temperature,latitude,longitude,latmin,latmax,lonmin,lonmax' + '\n')
    for tim in range(len(time)):
        for lat in range(len(latitude)):
            for lon in range(len(longitude)):
                dataline1 = '{0:.5f},{1:.5f},{2:.5f},{3:.5f},{4:.5f},{5:.5f},{6:.5f},{7:.5f}'.format(
                    time[tim], temperature[tim,lat,lon],
                    latitude[lat], longitude[lon],
                    latitude[lat]-2.5, latitude[lat]+2.5,
                    longitude[lon]-2.5, longitude[lon]+2.5)
                f.write(dataline1 + '\n')
    
    f.close()

Load data into dashDB

  1. If you don’t already have an account, sign up for Bluemix, IBM’s cloud platform.

  2. Add the dashDB service.

    From your Bluemix dashboard, click Work with data. Click New service then choose dashDB and create your instance (more on getting started with dashDB).

  3. Load the csv file into dashDB.

    1. Launch dashDB.
    2. On the left, select Load > Load from Desktop.
    3. Click Browse files and choose the temperature.csv file. You don’t need to change any options; the default settings are fine.

      What about dates? There are no dates or times in this file. Each row in the Month column gives the average temperature for that month over multiple years, not for a specific date.

    4. Click Preview and check the column names and values in the table. Then click Next.

    5. Select Create a new table and load and click Next.

    6. Click Finish.

Your data is now in dashDB!

Visualise data for any location

Within dashDB, we can use R to figure out the seasonal temperature for any location. R is a free software environment for statistical computing and graphics, and it comes built into dashDB. You can run scripts or use RStudio inside your dashDB service, with features that help you load and combine different data tables.

  1. From the menu on the left, select Analytics > R scripts.
  2. Click the + to create a new R script and select the Temperature table.
  3. Click Apply and a few lines of R code appear in the script that load the data into a data frame, from which you can start your analysis.
  4. Enter the following code and click Submit:
    library(ibmdbR)
    mycon <- idaConnect("BLUDB", "", "")
    idaInit(mycon)
    
    df1 <- as.data.frame(ida.data.frame('"DASH107239"."TEMP"')[ ,c('LATITUDE', 'LONGITUDE', 'MONTH', 'TEMPERATURE')])

    If you prefer to use RStudio, you can. To get your dashDB username and password, go to the menu on the left and click Connections.

  5. Click Save. You can return to this script any time. Just launch dashDB and from the menu choose Analytics > R scripts. You’ll see a My Projects tab with all your saved scripts.

Now we can analyse and visualise the temperature data by adding to the code we just ran.

Say we want to compute the average seasonal temperature in London and Sydney.

Tip: you can pick any location. You just need the latitude and longitude of your Point of Interest (POI).

The following algorithm finds the shortest distance between the longitude/latitude of the POI and the centres of the raster cells. The nearest cell’s coordinates are then used to extract the temperatures for the 12 months at that location.

# Latitude and Longitude for Points of Interest
POI_NAME1 <- 'London'
POI_LAT1 <- 51.5
POI_LON1 <- -0.1275

POI_NAME2 <- 'Sydney'
POI_LAT2 <- -33.8
POI_LON2 <- 151.21

# Find the minimum distance between POI to the centre of the nearest grid cell and use this distance to select the temperature values for this location
LONM <- min(abs(as.numeric(df1$LONGITUDE)-POI_LON1))
LATM <- min(abs(as.numeric(df1$LATITUDE)-POI_LAT1))
POI1 <- subset(df1, abs(as.numeric(LONGITUDE)-POI_LON1) == LONM & abs(as.numeric(LATITUDE)-POI_LAT1) == LATM)

LONM <- min(abs(as.numeric(df1$LONGITUDE)-POI_LON2))
LATM <- min(abs(as.numeric(df1$LATITUDE)-POI_LAT2))
POI2 <- subset(df1, abs(as.numeric(LONGITUDE)-POI_LON2) == LONM & abs(as.numeric(LATITUDE)-POI_LAT2) == LATM)
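If you’d rather do this lookup outside dashDB, the same nearest-cell idea is easy to sketch in Python with NumPy (the grid arrays below assume the 5-degree grid from the conversion step):

```python
import numpy as np

# cell centres of the 5-degree grid used earlier
latitude = np.arange(-87.5, 90, 5.0)
longitude = np.arange(-177.5, 180, 5.0)

def nearest_cell(poi_lat, poi_lon):
    """Return the grid cell centre closest to a point of interest."""
    i = int(np.argmin(np.abs(latitude - poi_lat)))
    j = int(np.argmin(np.abs(longitude - poi_lon)))
    return latitude[i], longitude[j]

# London (51.5 N, -0.1275 E) falls in the cell centred at (52.5, -2.5)
lat_c, lon_c = nearest_cell(51.5, -0.1275)
```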

POI1 and POI2 each contain a subset of the original data frame df1, with one temperature value per month for the specified location. The following code uses them to plot the monthly temperatures, saved as both a jpeg and a pdf file.

# Make a figure of the seasonal temperature for the POIs
jpeg("temperature.jpg",width=600,height=400)
sink('/dev/null')
plot(1, type = 'n', xlim = c(1, 12), ylim = c(0, 25), xaxt ='n',ylab = "Temperature")
points(POI1$MONTH,POI1$TEMPERATURE,col="blue",pch=15)
points(POI2$MONTH,POI2$TEMPERATURE,col="red",pch=15)
axis(1, at=c(1,2,3,4,5,6,7,8,9,10,11,12), labels=c("January","February","March","April","May","June","July","August","September","October","November","December"), las=2)
legend("top", c(POI_NAME1,POI_NAME2),fill=c("blue","red"))
dev.off()
sink()

pdf("temperature.pdf")
sink('/dev/null')
plot(1, type = 'n', xlim = c(1, 12), ylim = c(0, 25), xaxt ='n',ylab = "Temperature")
points(POI1$MONTH,POI1$TEMPERATURE,col="blue",pch=15)
points(POI2$MONTH,POI2$TEMPERATURE,col="red",pch=15)
axis(1, at=c(1,2,3,4,5,6,7,8,9,10,11,12), labels=c("January","February","March","April","May","June","July","August","September","October","November","December"), las=2)
legend("top", c(POI_NAME1,POI_NAME2),fill=c("blue","red"))
dev.off()
sink()

The resulting chart clearly shows the seasonal difference between the Southern and Northern Hemispheres.

Monthly temperatures

Next steps

In this tutorial, we converted raster data into a csv table, imported the table into dashDB, and ran some initial analyses. There is much more climate raster data available here, offered under the Open Database License. (These files are already in a table format, so the conversion is much easier: just open them with Excel and save them as csv files.)

To find out more about the average climate of your holiday location, you could add more weather variables to the analysis we just ran, such as precipitation or sunshine hours. But be warned: these are all average values. So, the actual weather may not be exactly what you expect. ;-)

Looking for more R examples? See these videos: Analyze data using RStudio.
