Preface. The code in this blog post works only with the latest development branch of the oce source.

Introduction

The section dataset from oce provides a good example of a dataset containing flags.

Methods

library(oce)
## Loading required package: methods
## Loading required package: gsw
## Loading required package: testthat
data(section)
Sflag <- section[['salinityFlag']]

A good first step is to see which flags are being used:

table(Sflag)
## Sflag
##    2    3    4 
## 2298  440  103

This dataset uses the WHP convention for flags (see ?section), in which a flag value of 2 indicates data considered to be acceptable. Thus, the table indicates that only about 81% of the salinity measurements (2298 of 2841) are considered to be acceptable. This makes it a good dataset for illustrating the handling of flags.
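The proportions behind that statement can be checked with a few lines of base R. This is a minimal sketch: `flagPercent` is a hypothetical helper (not part of oce), and `flagDemo` is a stand-in vector rebuilt from the counts in the table above, so that the example runs without loading the dataset.

```r
# Hypothetical helper (not part of oce): percentage of data carrying each flag value.
flagPercent <- function(flag) {
    counts <- table(flag)
    round(100 * counts / sum(counts), 1)
}

# Stand-in flag vector, rebuilt from the counts tabulated above
flagDemo <- rep(c(2, 3, 4), times = c(2298, 440, 103))
flagPercent(flagDemo)
```

With the real data, the same helper could be applied to section[['salinityFlag']] directly.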

First, extract some relevant data.

S <- section[['salinity']]
T <- section[['temperature']]
theta <- section[['theta']]
p <- section[['pressure']]
Sflag <- section[['salinityFlag']]

Next, plot salinity flag vs salinity:

plot(S, Sflag, pch=Sflag-1)


This suggests that, apart from one distinct outlier at a salinity of 26, the salinities of bad data are generally in the range of the salinities of good data. Next, examine temperature and salinity together.

plotTS(as.ctd(S, T, p), pch=Sflag-1)


The last two plots suggest that one of the points marked as being bad (flag=4) is distinctly anomalous compared with all the other data. A detailed analysis could be made of that point (e.g. first isolate the station, then plot it in detail) but time may be better spent simply focussing on data that have been assessed as being reasonable during the data-archiving process.

ok <- Sflag == 2
plotTS(as.ctd(S[ok], T[ok], p[ok]))


Another approach is to use handleFlags to select the good data:

section2 <- handleFlags(section)
plotTS(section2)

Here, we have used the fact that plotTS can recognize section objects. The use of handleFlags is also recommended because it carries over to other types of plots, e.g. a salinity section. For example, a salinity section of all the data is produced with

plot(section, which="salinity")
## Warning in sectionGrid(x, debug = debug - 1): Data flags are omitted from
## the gridded section object. Use handleFlags() first to remove bad data.

while a section of just the acceptable data is produced with

plot(section2, which="salinity")
## Warning in sectionGrid(x, debug = debug - 1): Data flags are omitted from
## the gridded section object. Use handleFlags() first to remove bad data.
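The idea behind flag-based selection can also be sketched in base R, without oce. The helper `maskByFlag` and the sample values below are hypothetical, made up for illustration. This is not how handleFlags is implemented; it only shows the general approach of replacing values with NA unless their flag is in an accepted set, so that later plots and statistics skip them.

```r
# Hypothetical helper (not part of oce): keep values whose flag is acceptable,
# and replace all others with NA.
maskByFlag <- function(x, flag, acceptable = 2) {
    stopifnot(length(x) == length(flag))
    x[!(flag %in% acceptable)] <- NA
    x
}

# Made-up salinities and WHP-style flags, for illustration only
Sdemo <- c(35.1, 34.9, 26.0, 35.0)
flagDemo <- c(2, 3, 4, 2)
maskByFlag(Sdemo, flagDemo)                        # only flag 2 is kept
maskByFlag(Sdemo, flagDemo, acceptable = c(2, 3))  # flags 2 and 3 are kept
```

Using NA rather than dropping elements keeps the vectors aligned with pressure and temperature, which matters when the values are later combined in a plot.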


Exercises

  1. Find which station has the very low salinity, and examine that station in more detail.
  2. Repeat the analysis above, but discard only data with salinityFlag==4, which are known to be bad (i.e. retain both acceptable and questionable data).
  3. Continue exercise 2 with other types of analysis (e.g. examine spatial dependence).
  4. Look online for the source of the section dataset, to find out more about how the data-quality flags were assigned.
