The underlying code repository is available at: https://github.com/FEEprojects/Paper_calibration_nocs_data_analysis

This folder contains the scripts used to conduct the analysis of the data collected at NOCS Southampton, with a Fidas 200S and 40 PM sensors between July 2020 and July 2021 presented in the paper XXXXXXXX.

How to use this folder

The data needs to be downloaded from https://doi.org/10.5281/zenodo.7198378 and placed in the data folder in .rds format. Then the script R/utilities/data_preparation.r must be ran to tidy and collate the different datasets used in the study, see Data preparation for more details.

Finally, this document presents an explanation of the different analysis conducted.

Data preparation

The datasets used during this analysis are:

  • data from the Fidas 200S: PM mass concentrations, PM size distribution and weather data.

  • data from the low-cost sensors: PM mass concentrations, PM size distribution, weather data.

Fidas 200S data

The data from the Fidas 200S was extracted from the .promo files using PDAnalyze (proprietary software from Palas Gmbh) at 1min resolution. The text files extracted from the .promo files were parsed and stored at:

  • data\df_pm_2min.rds - PM concentrations

  • data\df_weather_2min.rds - weather data

Low-cost sensors data

The data from the sensors was sent to an influxDB, and was saved in data/202007_to_202107_nocs.rds.

The data is then processed by utilities/data_preparation.r and saved in DF_JOINED (see utilities/variables.r for the exact location).

Sensors “PMS-60N1” and “PMS-56N2” have been removed because of the high peaks they report now and again. “SPS-40N2” has been removed because it reports a constant value of zero.

Two time periods have been removed:

  • 15th October between 10:00 and 11:00 for a preliminary incense experiment (not presented in the paper).

  • 3rd November all day for the incense experiments described in the paper.

The sensors were spread across 4 air quality monitors described in (Johnston et al. 2019) called Nocs-1, Nocs-2, Nocs-3, and Nocs-4.

The air quality monitor Nocs-2 stopped working on the 23rd December 2020. This means 5 PMS and 6 SPS to study.

Data analysis conducted for the study

The following points have been investigated:

Each of these points is detailled further in the next sections.

Initial data exploration

The main results are the delay at 10 second resolution especially with the experiments conducted with the incense peaks and that a temporal resolution of 2 min is preferred given the low concentrations encountered during the study.

Summary of the data from the Fidas 200S

df_fidas <- readRDS("C:/Data/Fidas/promo/df_pm_2min.rds")
df_fidas <-
  df_fidas[df_fidas$date >= as.POSIXct("2020-07-01 00:00:00", tz = "UTC"), ]
df_weather <- readRDS("C:/Data/Fidas/promo/df_weather_2min.rds")
df_weather <-
  df_weather[df_weather$date >= as.POSIXct("2020-07-01 00:00:00", tz = "UTC"), ]
df_distrib <- readRDS("C:/Data/Fidas/promo/df_distrib_2min.rds")
df_distrib <-
  df_distrib[df_distrib$date >= as.POSIXct("2020-07-01 00:00:00", tz = "UTC"), ]



p <- df_fidas %>%
  ggplot(aes(x = date, y = PM2.5)) +
  geom_line()

ggplotly(p, dynamicTicks = T)
p <- df_weather %>%
  ggplot(aes(x = date, y = rh)) +
  geom_line()
ggplotly(p, dynamicTicks = T)

PM\(_{2.5}\) concentrations

Statistics on the PM\(_{2.5}\) concentrations reported by the Fidas 200S, per month.

df_fidas$month <- cut(df_fidas$date, breaks = "1 month")
df_fidas %>%
  group_by(month) %>%
  summarise(
    min = min(PM2.5, na.rm = T),
    Quart1 = quantile(PM2.5, probs = c(0.25), na.rm = T),
    Median = median(PM2.5, na.rm = T),
    Quart3 = quantile(PM2.5, probs = c(0.75), na.rm = T),
    max = max(PM2.5, na.rm = T)
  )
p <- df_fidas %>%
  ggplot() +
  geom_boxplot(aes(x = month, y = PM2.5), outlier.shape = NA) +
  scale_y_continuous(limits = quantile(df_fidas$PM2.5, c(0.1, 0.9)))
p

#ggplotly(p, dynamicTicks = T)

The concentration levels was higher between February 2021 until end of April 2021 and then comes back to lower levels in May/June 2021. November 2020 and June 2021 registered the highest peaks. August, September, October registered concentrations > 100 \(\mu g.m^{-3}\) .

Relative humidity

Statistics on the relative humidity reported by the Fidas 200S, per month.

df_weather$month <- cut(df_weather$date, breaks = "1 month")
df_weather %>%
  group_by(month) %>%
  summarise(
    min = min(rh, na.rm = T),
    Quart1 = quantile(rh, probs = c(0.25), na.rm = T),
    Median = median(rh, na.rm = T),
    Quart3 = quantile(rh, probs = c(0.75), na.rm = T),
    max = max(rh, na.rm = T)
  )
p <- df_weather %>%
  ggplot() +
  geom_boxplot(aes(x = month, y = rh), outlier.shape = NA) +
  scale_y_continuous(limits = quantile(df_weather$rh, c(0.1, 0.9)))
p

Relative humidity is as expected with a yearly pattern peaking in December during winter.

Temperature

Statistics on the temperature reported by the Fidas 200S, per month.

df_weather$month <- cut(df_weather$date, breaks = "1 month")
df_weather %>%
  group_by(month) %>%
  summarise(
    min = min(temperature, na.rm = T),
    Quart1 = quantile(temperature, probs = c(0.25), na.rm = T),
    Median = median(temperature, na.rm = T),
    Quart3 = quantile(temperature, probs = c(0.75), na.rm = T),
    max = max(temperature, na.rm = T)
  )
p <- df_weather %>%
  ggplot() +
  geom_boxplot(aes(x = month, y = temperature), outlier.shape = NA) +
  scale_y_continuous(limits = quantile(df_weather$temperature, c(0.1, 0.9)))
p

Idem for temperature with a low in December.

Visualisation of the data from the sensors

The goal is to detect anomalies in the data and faulty sensors.

The visualisation is available at Time series sensors.

Impact of the time averaging on the certification

This is presented in Demonstration of equivalence for daily, hourly and 2min averages

The demonstration of equivalence is calculated for each combination of 40 consecutive days in the data for each sensors.

Preliminary calibration method selection

The calibration is first tested by using the first two weeks of the data and using the next 40 days for verification as described in the Supplementary information of the paper.

The results are presented in Test calibration

Variation of the calibration through the year

The sensors are calibrated using a range of methods using the first 2 weeks of the data at a 2min resolution.

The performances of the calibration is then assessed throughout the year, per week and per month. Plotted by calibration_2weeks_restoftheyear.rmd, with the calculations being done by calibration_2weeks_restoftheyear/calibration_2weeks_restoftheyear.r and aggregated by calibration_2weeks_restoftheyear/calibration_2weeks_restoftheyear_agg.r.

The results will be available in Calibration on 2 first weeks, performances on the year

Robust method selection

In this part, the calibration methods are evaluated on each subsets of 2 weeks + 40 consecutive days possible during the study. The results are presented in Calibration 2 weeks 40 days.

Calibration scenarios comparison

The different scenario (lenght and frequency of the calibration) are evaluated on all the possible subsets of these scenarios in the dataset. The results are presented in Calibration scenarios comparison.

The calibration scenarios are then used to compare to the criteria of the demonstration of equivalence, presented in Calibration scenarios comparison equivalence.

References

Johnston, Steven J., Philip J. Basford, Florentin M. J. Bulot, Mihaela Apetroaie-Cristea, Natasha H. C. Easton, Charlie Davenport, Gavin L. Foster, Matthew Loxham, Andrew K. R. Morris, and Simon J. Cox. 2019. “City Scale Particulate Matter Monitoring Using LoRaWAN Based Air Quality IoT Devices.” Sensors 19 (1): 209. https://doi.org/10.3390/s19010209.