Introduction to STICr

The goal of STICr (pronounced “sticker”) is to provide a standardized set of functions for working with data from Stream Temperature, Intermittency, and Conductivity (STIC) loggers, first described in Chapin et al. (2014). STICs and other intermittency sensors are becoming more popular, but their raw data output is not in a form that allows for convenient analysis. In this vignette, we will walk through some typical processing steps, starting from the raw STIC model output and ending with classified and visualized STIC data.

Let’s get going - load STICr!

# install.packages("STICr")  # if needed: install package from CRAN
# devtools::install_github("HEAL-KGS/STICr") # if needed: install dev version from GitHub
library(STICr)

Tidying raw STIC data

STICs are created from modified HOBO Pendant loggers (see instructions for how to do this here), and as a result the raw data downloaded from the STIC logger comes in a proprietary .hobo format. Before using STICr, you must use the HOBOware software to export the STIC output data as a CSV file. STICr can then use tidy_hobo_data() to get this into an R-friendly tidy format. We will also use the convert_utc argument to ensure that the data are converted from local time to UTC.

# use tidy_hobo_data to load and tidy your raw HOBO data
df_tidy <-
  tidy_hobo_data(
    infile = "https://samzipper.com/data/raw_hobo_data.csv",
    outfile = FALSE, convert_utc = TRUE
  )

head(df_tidy)
#>              datetime condUncal  tempC
#> 1 2021-07-16 22:00:00   88178.4 27.764
#> 2 2021-07-16 22:15:00   77156.1 28.655
#> 3 2021-07-16 22:30:00   74400.5 28.060
#> 4 2021-07-16 22:45:00   74400.5 27.764
#> 5 2021-07-16 23:00:00   74400.5 27.862
#> 6 2021-07-16 23:15:00   71644.9 27.370

The output of tidy_hobo_data() is a data frame with 3 columns:

datetime = the date and time of the STIC measurement (in this case, converted to UTC).
condUncal = the STIC conductivity reading, which is an uncalibrated, sensor-specific relative conductivity measurement.
tempC = the STIC temperature reading in degrees celsius.

Calibrating STIC data

Since the raw STIC data is in a sensor-specific relative conductivity value, it can often be useful to calibrate STIC loggers. This involves taking STIC readings in the lab with the sensor immersed in standards with known specific conductivity - detailed instructions for how to do this can be found here.. STICr can then use the get_calibration() and apply_calibration() functions to take these lab standard measurements and apply them to the STIC sensor output data.

# inspect the example calibration standard data provided with the package
data(calibration_standard_data)
head(calibration_standard_data)
#> # A tibble: 4 × 3
#>     sensor standard condUncal
#>      <dbl>    <dbl>     <dbl>
#> 1 20946471      100    12400.
#> 2 20946471      250    23422.
#> 3 20946471      500    46845.
#> 4 20946471     1000   104712.

The calibration standard data has three columns:

sensor = an identifier or serial number for the STIC sensor, since the calibration differs for each sensor.
standard = the specific conductivity for the lab standard data used for calibration.
condUncal = the uncalibrated conductivity data reading by the STIC sensor when immersed in the standard.

Using this calibration standard data, we can then create a calibration curve using the get_calibration() function and apply it to our data using the apply_calibration() function. Currently, get_calibration() only allows for a linear calibration curve.

# get calibration
lm_calibration <- get_calibration(calibration_standard_data)
summary(lm_calibration)
#> 
#> Call:
#> lm(formula = standard ~ condUncal, data = calibration_data)
#> 
#> Residuals:
#>      1      2      3      4 
#> -33.43  11.27  37.50 -15.34 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)   
#> (Intercept) 1.496e+01  3.136e+01   0.477   0.6803   
#> condUncal   9.554e-03  5.328e-04  17.932   0.0031 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 37.99 on 2 degrees of freedom
#> Multiple R-squared:  0.9938, Adjusted R-squared:  0.9907 
#> F-statistic: 321.5 on 1 and 2 DF,  p-value: 0.003096

This sensor has a very strong calibration, with an R^2 > 0.99. We can now apply this to the tidied STIC data we loaded earlier. We will use the outside_range_flag argument to flag any data points that are outside the range of the standards used to develop the calibration curve, as these should be treated with caution.

# apply calibration
df_calibrated <- apply_calibration(
  stic_data = df_tidy,
  calibration = lm_calibration,
  outside_std_range_flag = T
)

head(df_calibrated)
#>              datetime condUncal  tempC      SpC outside_std_range
#> 1 2021-07-16 22:00:00   88178.4 27.764 857.3845                  
#> 2 2021-07-16 22:15:00   77156.1 28.655 752.0820                  
#> 3 2021-07-16 22:30:00   74400.5 28.060 725.7561                  
#> 4 2021-07-16 22:45:00   74400.5 27.764 725.7561                  
#> 5 2021-07-16 23:00:00   74400.5 27.862 725.7561                  
#> 6 2021-07-16 23:15:00   71644.9 27.370 699.4302

We now have two additional columns in our STIC data frame:

SpC = the calibrated specific conductivity value at each timestep.
outside_range = a flag indicating whether the STIC data was within the range of the standards used for calibration (column is empty) or outside the range of the standards (column has an O).

Classify the data

Many people use STIC sensors to monitor when a stream is wet or dry, based on the principle that the conductivity of water is much higher than that of air. Therefore, high values of condUncal and/or SpC can be classified as wet conditions, and low values can be classified as dry conditions. In STICr, this can be done using the classify_wetdry() function, but this requires determining a suitable threshold to use for differentiating wet and dry conditions from the STIC data. It is typically useful to plot the data to determine this threshold.

# plot SpC as a timeseries and histogram
plot(df_calibrated$datetime, df_calibrated$SpC,
  xlab = "Datetime", ylab = "SpC",
  main = "Specific Conductivity Timeseries"
)


hist(df_calibrated$SpC,
  xlab = "Specific Conductivity", breaks = seq(0, 1025, 25),
  main = "Specific Conductivity Distribution"
)

It can be unclear when exactly the sensor is wet or dry, particularly if the sensor has been buried by deposited sediments (which have an intermediate conductivity between water and air). In this case, there is a clear abundance of points with SpC < 100, which we will use for our classification threshold.

# classify data
df_classified <- classify_wetdry(
  stic_data = df_calibrated,
  classify_var = "SpC",
  threshold = 100,
  method = "absolute"
)
head(df_classified)
#>              datetime condUncal  tempC      SpC outside_std_range wetdry
#> 1 2021-07-16 22:00:00   88178.4 27.764 857.3845                      wet
#> 2 2021-07-16 22:15:00   77156.1 28.655 752.0820                      wet
#> 3 2021-07-16 22:30:00   74400.5 28.060 725.7561                      wet
#> 4 2021-07-16 22:45:00   74400.5 27.764 725.7561                      wet
#> 5 2021-07-16 23:00:00   74400.5 27.862 725.7561                      wet
#> 6 2021-07-16 23:15:00   71644.9 27.370 699.4302                      wet

We now have a new column, wetdry, which reads “wet” when SpC exceeds the threshold and “dry” when SpC is less than the threshold. We can plot and visualize the classified data.

# plot SpC through time, colored by wetdry
plot(df_classified$datetime, df_classified$SpC,
  col = as.factor(df_classified$wetdry),
  pch = 16,
  lty = 2,
  xlab = "Datetime",
  ylab = "Specific conductivity"
)
legend("topright", c("dry", "wet"),
  fill = c("black", "red"), cex = 0.75
)

QAQC data

STICr has built-in a built-in QAQC function, qaqc_stic_data(), to deal with some common data issues we have experienced. This function requires classified STIC data, such as that output by classify_stic_data(), and produces a QAQC column or columns which could include the following data flags:

C = calibrated SpC value was negative and corrected to 0, flagged if spc_neg_correction = T.
D = point was identified as a deviation or anomaly based on a moving window, flagged if inspect_deviation = T. If this flag is calculated, an anomaly_size and window_size need to be set.
‘O’ = calibrated SpC was outside standard range, determined in apply_calibration().

If concatenate_flags = T, these flags will be combined into a single column named QAQC; if not, they will be separate columns. A blank value in this column means no QAQC flags were identified at that timestep.

# apply qaqc function
df_qaqc <-
  qaqc_stic_data(
    stic_data = df_classified,
    spc_neg_correction = T,
    inspect_deviation = T,
    deviation_size = 4,
    window_size = 96
  )
head(df_qaqc)
#>              datetime condUncal  tempC      SpC wetdry QAQC
#> 1 2021-07-16 22:00:00   88178.4 27.764 857.3845    wet     
#> 2 2021-07-16 22:15:00   77156.1 28.655 752.0820    wet     
#> 3 2021-07-16 22:30:00   74400.5 28.060 725.7561    wet     
#> 4 2021-07-16 22:45:00   74400.5 27.764 725.7561    wet     
#> 5 2021-07-16 23:00:00   74400.5 27.862 725.7561    wet     
#> 6 2021-07-16 23:15:00   71644.9 27.370 699.4302    wet
table(df_qaqc$QAQC)
#> 
#>      DO   O 
#> 916   1  83

We now have a single column, QAQC with all QAQC flags. We can see that there was 1 “D” flag for a potential deviation anomaly (a dry reading surrounding by wet readings) and 83 “O” points indicating the calibrated SpC was outside the range of the standards used for calibration. Using table, we can inspect the total number of data flags in our classified dataset.

Validate via comparison to field observations

Comparing STIC data to field observations is useful to help determine the appropriate threshold that should be used for wet/dry classification and to assess confidence in the STIC data. The more field observations the better, and it is best if they span a range of wet and dry conditions at each site.

# load and inspect sample field observation data
head(field_obs)
#>              datetime wetdry      SpC
#> 1 2021-07-16 18:03:00    wet 612.1672
#> 2 2021-07-19 15:01:00    wet 589.4157
#> 3 2021-07-21 02:44:00    dry 599.6622
#> 4 2021-07-23 13:55:00    wet 916.8215
#> 5 2021-07-25 16:27:00    wet 631.9857

# use validate_stic_data to compile closest-in-time STIC reading for each field observation
stic_validation <-
  validate_stic_data(
    stic_data = classified_df,
    field_observations = field_obs,
    max_time_diff = 30,
    join_cols = NULL,
    get_SpC = TRUE,
    get_QAQC = FALSE
  )

# we can now compare the field observations and classified STIC data in the table
head(stic_validation)
#>              datetime wetdry_obs  SpC_obs condUncal_STIC wetdry_STIC SpC_STIC
#> 1 2021-07-16 18:03:00        wet 612.1672        74400.5         wet 725.7561
#> 2 2021-07-19 15:01:00        wet 589.4157        88178.4         wet 857.3845
#> 3 2021-07-21 02:44:00        dry 599.6622        85422.8         wet 831.0587
#> 4 2021-07-23 13:55:00        wet 916.8215        88178.4         wet 857.3845
#> 5 2021-07-25 16:27:00        wet 631.9857        77156.1         wet 752.0820
#>   timediff_min
#> 1            3
#> 2            1
#> 3           -1
#> 4           -5
#> 5           -3

# calculate percent classification accuracy
sum(stic_validation$wetdry_obs == stic_validation$wetdry_STIC) / length(stic_validation$wetdry_STIC)
#> [1] 0.8

# compare SpC
plot(stic_validation$SpC_obs, stic_validation$SpC_STIC,
  xlab = "Observed SpC", ylab = "STIC SpC"
)

In this example, we are seeing pretty good classification accuracy, but fairly poor SpC agreement.