Smoothing is a common step in the analysis of oceanographic data. If we can think of errors (or uncertainties – you pick the word you like) as occurring just in the “y” variable, then two approaches come to mind: (1) fit a smoothing spline and (2) interpolate the data to a uniform grid and then apply a smoothing digital filter. The first of these is familiar to many R users, but the second is more common in the wider community.

Let me illustrate with a graph I was making today, as shown below. This shows the co-dependence of two seawater properties: density (on the x axis) and something called “spiciness” (on the y axis). My interest is in smoothing the curve, and finding spots on it that have zero slope, i.e. densities at which spiciness is a local maximum or minimum.
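The zero-slope idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions: it uses a synthetic noisy sine curve rather than the Argo data, the value df = 10 is arbitrary, and the variable names (xg, slope, x0, type) are invented for the example. It relies on the deriv argument of predict() for smooth.spline fits, and it looks for sign changes in the first derivative.

```r
# Synthetic stand-in for a density-spiciness curve (NOT the Argo data).
set.seed(1)
x <- seq(0, 2 * pi, length.out = 200)
y <- sin(x) + rnorm(length(x), sd = 0.05)

# Fit a smoothing spline; df = 10 is an arbitrary choice for this sketch.
s <- smooth.spline(x, y, df = 10)

# Evaluate the first derivative of the fitted curve on a fine grid.
xg <- seq(min(x), max(x), length.out = 1000)
slope <- predict(s, xg, deriv = 1)$y

# A sign change between neighbouring grid points brackets a zero of the slope.
i <- which(diff(sign(slope)) != 0)

# Refine each bracket by linear interpolation of the slope.
x0 <- xg[i] - slope[i] * (xg[i + 1] - xg[i]) / (slope[i + 1] - slope[i])

# Slope changing from positive to negative marks a maximum, and vice versa.
type <- ifelse(slope[i] > 0, "maximum", "minimum")
print(data.frame(x = x0, type = type))
```

With real data, the number and location of the zero crossings will depend on how heavily the curve is smoothed, so it is worth repeating the calculation for a few values of df before trusting any particular extremum.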

I will let the graph and the code speak for themselves. The code downloads a sample file, and I encourage readers to run the code (after removing the png() call for interactive work), altering things slightly to see the effect on the curves. For the splines, try altering the df definition and, in the third smooth.spline() call, the factor by which df is multiplied. Start with small changes, so you can gain intuition on such things as (a) whether increasing df increases or decreases the smoothness of the resultant lines on the diagram and (b) how sensitive the smoothing is to variations of df. And, for the lowpass case, alter deltaSigma and n individually and then together, to see how results change. In half an hour, you will likely have enough intuition to try this sort of thing for yourself.
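If you would like a number to go along with the visual impression, one possibility is to compute a roughness measure for several values of df. The sketch below is an assumption-laden illustration, not part of the original analysis: it uses synthetic data, and the roughness() helper (mean squared second difference of the fitted values) is an ad hoc choice made for this example.

```r
# Synthetic noisy data (NOT the Argo profile).
set.seed(2)
x <- seq(0, 1, length.out = 300)
y <- sin(2 * pi * x) + rnorm(length(x), sd = 0.2)

# Hypothetical helper: mean squared second difference of the fitted curve.
# Larger values mean a wigglier (less smooth) curve.
roughness <- function(df) {
    s <- smooth.spline(x, y, df = df)
    mean(diff(predict(s, x)$y, differences = 2)^2)
}

# Try a range of df values and tabulate the result.
dfs <- c(5, 10, 20, 40, 80)
r <- sapply(dfs, roughness)
print(data.frame(df = dfs, roughness = r))
```

Plotting r against dfs on logarithmic axes is a quick way to judge how sensitive the smoothing is to changes in df.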

Because there are no agreed-upon methods for selecting and adjusting smoothing methods, I advise doing tests like this on data of your own, to find values that suit your goals, and to see if these or related methods will be of use to you.

Graph under discussion

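[Figure: spiciness plotted against density, with one lowpass-filtered curve and three smoothing-spline curves overlaid]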

Code

The first time you run this code, it will download an Argo file. After that, it will use that file without downloading it again. If run non-interactively, this code produces a PNG file.

library(oce)
# Download a test (Argo) file and cache it locally.
url <- "https://github.com/dankelley/dek_blog/raw/main/docs/assets/data/D4901076_061.nc"
file <- gsub(".*/", "", url)
if (!file.exists(file)) {
    download.file(url, file)
}
# If run non-interactively, create a PNG file
if (!interactive()) {
    png("2024-02-16-spline.png", height = 7, width = 7, units = "in", res = 200)
}
par(mfrow = c(1, 1), mar = c(3.5, 3.5, 1, 1), mgp = c(2, 0.7, 0), lwd = 1.4)
# Read the file, extract some data, and start plotting
a <- read.oce(file)
sig0 <- a[["sigma0"]]
pi0 <- a[["spiciness0"]]
plot(sig0, pi0,
    xlab = resizableLabel("sigma0"),
    ylab = resizableLabel("spiciness0"),
    ylim = rev(range(pi0)),
    cex = 0.8
)
# Smooth in 4 different ways
deltaSigma <- 0.01
n <- 9 # must be an odd number
df <- length(sig0) / 3
sig0out <- seq(min(sig0), max(sig0), deltaSigma)
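# Interpolate to a uniform density grid (named g, to avoid reusing `a`)
g <- approx(sig0, pi0, sig0out)
# 1. Lowpass filter applied to the gridded data
lines(g$x, lowpass(g$y, n = n), col = 1, lwd = 2)
# 2. Smoothing spline fitted to the raw data
s1 <- smooth.spline(pi0 ~ sig0, df = df)
ps1 <- predict(s1, sig0out)
lines(ps1$x, ps1$y, col = 2)
# 3. Smoothing spline fitted to the gridded data
s2 <- smooth.spline(g$y ~ g$x, df = df)
ps2 <- predict(s2, sig0out)
lines(ps2$x, ps2$y, col = 3)
# 4. As 3, but with df doubled
s3 <- smooth.spline(g$y ~ g$x, df = df * 2)
ps3 <- predict(s3, sig0out)
lines(ps3$x, ps3$y, col = 4)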

grid()
legend("bottomleft",
    lwd = 1, col = 1:4, bg = "white",
    legend = c(
        "lowpass",
        "smoothing spline on raw data",
        "smoothing spline on gridded data",
        "smoothing spline on gridded data (df doubled)"
    )
)
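One reason to alter deltaSigma and n together is that the filter window spans n grid points, so its width in density units is (n - 1) * deltaSigma. The sketch below illustrates this with a base-R stand-in for the lowpass step. Please treat it as an assumption-heavy illustration: hammingSmooth() is a hypothetical helper written for this example, it assumes a Hamming window (which I understand to be the default in oce's lowpass(), although the two may differ in detail, especially near the ends of the series), and the data are synthetic.

```r
# Hypothetical helper: Hamming-weighted running mean (a stand-in, not oce's code).
hammingSmooth <- function(y, n) {
    stopifnot(n %% 2 == 1, n >= 3) # n must be odd, as in the code above
    k <- seq(0, n - 1)
    w <- 0.54 - 0.46 * cos(2 * pi * k / (n - 1))
    w <- w / sum(w) # weights sum to 1, so a constant signal is unchanged
    # Centred filter; yields NA within (n - 1) / 2 points of each end.
    as.numeric(stats::filter(y, w, sides = 2))
}

deltaSigma <- 0.01
n <- 9
# Width of the filter window, in density units.
span <- (n - 1) * deltaSigma

# Noisy synthetic signal on a uniform grid (NOT the Argo data).
set.seed(3)
sigma <- seq(25, 27, by = deltaSigma)
truth <- sin(2 * pi * (sigma - 25))
y <- truth + rnorm(length(sigma), sd = 0.1)
ys <- hammingSmooth(y, n)

# Compare the misfit to the known signal before and after filtering.
ok <- !is.na(ys)
cat("window span:", span, "\n")
cat("RMS error, raw:     ", sqrt(mean((y[ok] - truth[ok])^2)), "\n")
cat("RMS error, filtered:", sqrt(mean((ys[ok] - truth[ok])^2)), "\n")
```

Halving deltaSigma while doubling n (to the nearest odd number) keeps the window width roughly fixed, which is a handy way to separate the effect of grid resolution from the effect of filter width.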