20 Using NetCDF files
20.1 Learning Objectives
In this lesson, you will:
- Learn to read data from a NetCDF file
- Wrangle the example data into a data frame
- Make some plots
20.2 Introduction
NetCDF files are hierarchical data files that contain embedded metadata and allow for efficient extraction of data. They are particularly useful for storing large data, such as raster data and model outputs.
This lesson draws from a previous lesson written by Leah Wasser, available here.
This R blog post also contains some good introduction material.
20.3 Reading in data
First let’s load the ncdf4
package
library(ncdf4)
library(ggplot2)
library(dplyr)
library(tidyr)
Let’s grab an example file. Download the .nc file from Fiamma Straneo. 2019. Temperature and salinity profiles adjacent to a tidewater glacier in Sarqardleq Fjord, West Greenland, collected during July 2013. Arctic Data Center. doi:10.18739/A2B853H78. http://doi.org/10.18739/A2B853H78
First we open a connection to our NetCDF file using nc_open
.
nc <- nc_open("data/WG2013CTD.nc")
You can print information about what is contained in the file using the print
function on the nc
object.
print(nc)
The netcdf file has a lot of information in the top level. You can navigate through the nc
connection using the list selector operator. For example:
nc$filename
You can return the names of the variables by using the attributes
function on the var
element within the nc
object.
vars <- attributes(nc$var)$names
vars
Note that we haven’t read in any data yet - we have only read in all of the attributes, which are all of the different fields used to store metadata.
You can retrieve individual variables by calling ncvar_get
, and the variable by name.
sal <- ncvar_get(nc, "sal")
time_mat <- ncvar_get(nc, "time")
Note that if the file also has dimension variables, you can retrieve these values the same way as if they were variables.
#examine dimension variable names
names(nc$dim)
Read in the depth dimension variable.
depth <- ncvar_get(nc, "z")
20.4 Reshaping the data into a data.frame
Depending on what your analysis goals are, you may want to convert your data into a data.frame
structure. These data would work well in one since it is not a big dataset, and it is not gridded. Other dataset types, like gridded raster data, should be dealt with differently (such as using the raster
package).
First, we might want to convert the MATLAB date-time number to a POSIXct number.
time <- as.POSIXct((time_mat + 719529)*86400, origin = "1970-01-01", tz = "UTC")
Next we coerce the salinity matrix, which is represented with rows according to time and columns according to depth, into a data frame,
# coerce matrix to data frame
salinity_data <- as.data.frame(sal)
We then assign column names to the character value of our depth vector.
# name columns according to depth dimension
names(salinity_data) <- as.character(depth)
And finally, we add the time column to our matrix, gather over the depth columns, and turn the depth column back to a numeric value,
salinity_data <- salinity_data %>%
mutate(time = time) %>%
gather(key = "depth", value = "salinity", -time) %>%
mutate(depth = as.numeric(depth))
20.5 Plotting
First let’s try to make a raster
plot using geom_raster
.
ggplot(salinity_data, aes(x = time, y = depth, fill = salinity)) +
geom_raster() +
theme_bw() +
ylab("Depth (m)") +
xlab("") +
scale_fill_continuous(low = "gray", high = "red", name = "Salinity (psu)")
Turs out the data are fairly discontinuous, so we might want something like this instead, overlaying the profile data together.
ggplot(salinity_data, aes(x = salinity,
y = depth,
group = time,
color = time)) +
geom_line(size = .1) +
scale_y_reverse() +
theme_bw() +
ylab("Depth (m)") +
xlab("Salinity (psu)") +
theme(legend.title = element_blank())