The section dataset from oce provides a good example of a dataset containing flags. I’ll show how to handle such flags here, step by step.

library(oce)
data(section)
Sflag <- section[["salinityFlag"]]

A good first step is to see what flags are being used. The table() function call

table(Sflag)

yields

## Sflag
##    2    3    4 
## 2298  440  103

What do the numbers mean? Well, this particular dataset uses the World Hydrographic Program (WHP) convention for flags (see ?section), in which a flag value of 2 is used to indicate data considered to be acceptable. Thus, the table indicates that only 3/4 of the salinity measurements are considered to be acceptable. This makes this a good dataset to illustrate the handling of flags.

First, extract some relevant data.

S <- section[["salinity"]]
T <- section[["temperature"]]
theta <- section[["theta"]]
p <- section[["pressure"]]
Sflag <- section[["salinityFlag"]]
lon <- section[["longitude"]]
lat <- section[["latitude"]]

Next, plot salinity flag vs salinity

plot(S, Sflag, pch = Sflag - 1)

This suggests that, apart from one distinct outlier at a salinity of 26, the salinities of bad data are generally in the range of the salinities of good data. Next, examine temperature and salinity together.

plotTS(as.ctd(S, T, p, longitude = lon, latitude = lat), pch = Sflag - 1)

The last two plots suggest that one of the points marked as being bad (flag=4) is distinctly anomalous compared with all the other data. A detailed analysis could be made of that point (e.g. first isolate the station, then plot it in detail) but time may be better spent simply focussing on data that have been assessed as being reasonable during the data-archiving process.

ok <- Sflag == 2
plotTS(as.ctd(S[ok], T[ok], p[ok], longitude = lon[ok], latitude = lat[ok]))

Another approach is to use handleFlags() to select the good data

section2 <- handleFlags(section)
plotTS(section2)

Here, we have used the fact that plotTS can recognize section objects. The use of handleFlags() is also recommended because it carries over to other types of plots, e.g. a salinity section. For example, a salinity section of all the data is produced as follows. The top row shows the original section, and the bottom one shows the result after using handleFlags(). Note that the top row has quite a few isolated points that do not fit the pattern, as revealed by small closed contours on those plots and blue blobs on the images. Also note that the overall pattern is changed in these plots, because the bad data are controlling the contour/image scale ranges.

par(mfrow = c(2, 2))
plot(section, which = "salinity")
plot(section, which = "salinity", ztype = "image", zcol = oceColorsJet)
plot(section2, which = "salinity")
plot(section2, which = "salinity", ztype = "image", zcol = oceColorsJet)

Exercises

  1. Find which station has the very low salinity, and examine that station in more detail.
  2. Try as above, but only discarding data with salinityFlag==4, which are known to be bad (i.e. retain both acceptable and questionable data).
  3. Continue step 2, with other types of analysis (e.g. examine spatial dependence).
  4. Look online for the source of the section dataset, to find out more about how the data-quality flags were assigned.