The method identifies spikes with respect to a "reference" time-series, and
replaces these spikes with the reference value, or with NA, according to the value of replace; see “Details”.
Arguments
- x
a vector of (time-series) values, a list of vectors, a data frame, or an oce object.
- reference
the type of reference time series to be used in the detection of spikes, one of "median", "smooth" or "trim"; see “Details”.
- n
an indication of the limit to differences between
x
and the reference time series, used forreference="median"
orreference="smooth"
; see “Details.”- k
length of running median used with
reference="median"
, and ignored for other values ofreference
.- min
minimum non-spike value of
x
, used withreference="trim"
.- max
maximum non-spike value of
x
, used withreference="trim"
.- replace
an indication of what to do with spike values, with
"reference"
indicating to replace them with the reference time series, and"NA"
indicating to replace them withNA
.- skip
optional vector naming columns to be skipped. This is ignored if
x
is a simple vector. Any items named inskip
will be passed through to the return value without modification. In some cases,despike
will set up reasonable defaults forskip
, e.g. for actd
object,skip
will be set toc("time", "scan", "pressure")
if it is not supplied as an argument.
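The pass-through behaviour of skip can be pictured with a small base-R sketch. This is not the oce implementation: the helper despike_columns_sketch() and the toy function clip_far_values() are hypothetical names introduced only for illustration, and the sketch assumes x is a data frame.

```r
# Hypothetical helper (not part of oce): apply a despiking function to every
# column of a data frame except those named in skip.
despike_columns_sketch <- function(df, fun, skip = character(0)) {
    for (name in setdiff(names(df), skip)) {
        df[[name]] <- fun(df[[name]])
    }
    df
}

# Toy stand-in for a despiking function (an assumption, used only here):
# values more than 5 units from the column median are set to NA.
clip_far_values <- function(v) {
    v[abs(v - median(v, na.rm = TRUE)) > 5] <- NA
    v
}

d <- data.frame(time = 1:5, temperature = c(10, 10.1, 25, 9.9, 10))
res <- despike_columns_sketch(d, clip_far_values, skip = "time")
```

Here the time column is returned untouched, while the outlying temperature value is handled by the supplied function.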
Details
Three modes of operation are permitted, depending on the value of reference.
For reference="median", the first step is to linearly interpolate across any gaps (spots where x is NA), using approx() with rule=2. The second step is to pass this through runmed() to get a running median spanning k elements. The result of these two steps is the "reference" time-series. Then, the standard deviation of the difference between x and the reference is calculated. Any x values that differ from the reference by more than n times this standard deviation are considered to be spikes. If replace="reference", the spike values are replaced with the reference, and the resultant time series is returned. If replace="NA", the spikes are replaced with NA, and that result is returned.
For reference="smooth", the processing is the same as for "median", except that smooth() is used to calculate the reference time series.
For reference="trim", the reference time series is constructed by linear interpolation across any regions in which x<min or x>max. (Again, this is done with approx() with rule=2.) In this case, the value of n is ignored, and the return value is the same as x, except that spikes are replaced with the reference series (if replace="reference") or with NA (if replace="NA").
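The steps described above for the "median" and "trim" modes can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the oce source: the function names despike_median_sketch() and despike_trim_sketch() are hypothetical, the default values of n and k are chosen only for the illustration, and the treatment of NA values in the trim sketch (left as they are) is an assumption.

```r
# Hypothetical sketch (not the oce code) of reference = "median".
despike_median_sketch <- function(x, n = 4, k = 7,
                                  replace = c("reference", "NA")) {
    replace <- match.arg(replace)
    i <- seq_along(x)
    ok <- !is.na(x)
    # Step 1: linear interpolation across NA gaps; rule = 2 extends end values.
    filled <- approx(i[ok], x[ok], xout = i, rule = 2)$y
    # Step 2: running median spanning k elements (k should be odd).
    ref <- runmed(filled, k = k)
    # Step 3: flag values differing from the reference by more than n
    # standard deviations of the difference between x and the reference.
    d <- x - ref
    spike <- !is.na(d) & abs(d) > n * sd(d, na.rm = TRUE)
    x[spike] <- if (replace == "reference") ref[spike] else NA
    x
}

# Hypothetical sketch (not the oce code) of reference = "trim".
despike_trim_sketch <- function(x, min, max,
                                replace = c("reference", "NA")) {
    replace <- match.arg(replace)
    i <- seq_along(x)
    spike <- !is.na(x) & (x < min | x > max)
    keep <- !is.na(x) & !spike
    # Reference: interpolate linearly across the out-of-range regions.
    ref <- approx(i[keep], x[keep], xout = i, rule = 2)$y
    x[spike] <- if (replace == "reference") ref[spike] else NA
    x
}

# Deterministic test series with one large spike at index 25.
y <- 0.1 * sin(1:50)
y[25] <- 10
ym <- despike_median_sketch(y)
yn <- despike_median_sketch(y, replace = "NA")
yt <- despike_trim_sketch(y, min = -3, max = 3)
```

Replacing runmed(filled, k = k) with smooth(filled) would give a corresponding sketch of the "smooth" mode.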
Examples
library(oce)
n <- 50
x <- 1:n
y <- rnorm(n = n)
y[n / 2] <- 10 # 10 standard deviations
plot(x, y, type = "l")
lines(x, despike(y), col = "red")
lines(x, despike(y, reference = "smooth"), col = "darkgreen")
lines(x, despike(y, reference = "trim", min = -3, max = 3), col = "blue")
legend("topright",
    lwd = 1, col = c("black", "red", "darkgreen", "blue"),
    legend = c("raw", "median", "smooth", "trim")
)
# add a spike to a CTD object
data(ctd)
plot(ctd)
T <- ctd[["temperature"]]
T[10] <- T[10] + 10
ctd[["temperature"]] <- T
CTD <- despike(ctd)
plot(CTD)