2 Introduction

dfW <- openair::importAURN(
  site = "SA33",
  year = 2019,
  pollutant = "all",
  hc = FALSE,
  meta = TRUE,
  to_narrow = FALSE, # FALSE keeps wide form: one column per measure
  verbose = TRUE # for now
)

# fails. it worked before
# dfL <- openair::importAURN(
#   site = "SA33",
#   year = 2019,
#   pollutant = "all",
#   hc = FALSE,
#   meta = TRUE,
#   to_narrow = TRUE, # produces long form data yay!
#   verbose = TRUE
# )

dtW <- data.table::as.data.table(dfW) # we like data.tables

Data downloaded from http://uk-air.defra.gov.uk/openair/R_data/ using openair::importAURN().

Southampton City Council collects various forms of air quality data at the sites shown in Table 2.1. The World Health Organization (WHO) publishes information on the health consequences of, and “acceptable” exposure levels for, each of these pollutants.

lDT <- data.table::melt(dtW, id.vars = c("site", "date", "code", "latitude", "longitude", 
    "site_type"), measure.vars = c("no", "no2", "nox", "pm10", "nv10", "v10", "ws", 
    "wd"), value.name = "value")  # varies

# remove NA
lDT <- lDT[!is.na(value)]

t <- lDT[, .(from = min(date), to = max(date), nObs = .N), keyby = .(site, variable)]

kableExtra::kable(t, caption = "Dates data available by site and measure", digits = 2) %>% 
    kable_styling()
Table 2.1: Dates data available by site and measure
site             variable  from        to                   nObs
Southampton A33  no        2019-01-01  2019-12-20 23:00:00  8211
Southampton A33  no2       2019-01-01  2019-12-20 23:00:00  8211
Southampton A33  nox       2019-01-01  2019-12-20 23:00:00  8211
Southampton A33  pm10      2019-01-01  2019-12-20 22:00:00  7724
Southampton A33  nv10      2019-01-01  2019-12-20 22:00:00  7724
Southampton A33  v10       2019-01-01  2019-12-20 22:00:00  7724
Southampton A33  ws        2019-01-01  2019-12-20 23:00:00  8064
Southampton A33  wd        2019-01-01  2019-12-20 23:00:00  8064
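
As a rough illustration of the reshaping and availability summary behind Table 2.1, the sketch below applies the same steps to a small synthetic table. The objects toyW, toyL and avail and all of the values are made up for illustration. Only the melt() call, the NA filter and the grouped summary mirror the code above.

```r
library(data.table)

# Synthetic wide-form data: one row per hour, one column per measure (made-up values)
toyW <- data.table(
  site = "Toy site",
  date = as.POSIXct("2019-01-01 00:00:00", tz = "UTC") + 3600 * 0:3,
  no2  = c(10, NA, 30, 40),
  pm10 = c(5, 6, NA, NA)
)

# Wide to long: one row per site, date and measure
toyL <- melt(toyW,
             id.vars = c("site", "date"),
             measure.vars = c("no2", "pm10"),
             variable.name = "variable",
             value.name = "value")

# Drop missing observations, then summarise availability per measure
toyL <- toyL[!is.na(value)]
avail <- toyL[, .(from = min(date), to = max(date), nObs = .N),
              keyby = .(site, variable)]
print(avail)
```

Because missing values are dropped before summarising, nObs counts only the hours with a valid reading, which is why it differs between measures.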

3 Summarise data

This section summarises the previously downloaded and processed data. Note that the data may not be completely up to date.

Table 3.1: Data summary
Name dfW
Number of rows 8760
Number of columns 14
Column type frequency:
character 3
numeric 10
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
code 0 1 4 4 0 1 0
site 0 1 15 15 0 1 0
site_type 0 1 13 13 0 1 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
no 549 0.94 27.95 38.75 -0.22 4.12 14.20 35.60 444.91 ▇▁▁▁▁
no2 549 0.94 32.74 22.47 0.00 15.11 28.03 45.55 158.31 ▇▅▁▁▁
nox 549 0.94 75.57 76.58 0.84 22.77 52.03 100.15 771.31 ▇▁▁▁▁
pm10 1036 0.88 16.74 10.98 -1.40 9.60 13.70 20.50 95.00 ▇▃▁▁▁
nv10 1036 0.88 13.66 9.48 -9.60 7.40 11.30 17.42 99.70 ▇▆▁▁▁
v10 1036 0.88 3.07 3.09 -14.00 1.30 2.70 4.30 22.80 ▁▂▇▁▁
ws 696 0.92 3.76 2.03 0.00 2.30 3.30 4.90 13.00 ▆▇▃▁▁
wd 696 0.92 201.12 104.40 0.00 116.77 229.75 284.10 360.00 ▅▃▃▇▆
latitude 0 1.00 50.92 0.00 50.92 50.92 50.92 50.92 50.92 ▁▁▇▁▁
longitude 0 1.00 -1.46 0.00 -1.46 -1.46 -1.46 -1.46 -1.46 ▁▁▇▁▁

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2019-01-01 2019-12-31 23:00:00 2019-07-02 11:30:00 8760

Table 3.1 gives an indication of the availability of the different measures.
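
To make the link between n_missing and complete_rate explicit, the short check below recomputes the completeness figures for three of the measures from the counts in Table 3.1. The variable names n_rows, n_missing, complete_rate and n_obs are illustrative only. The formula used (one minus the share of missing rows) is our reading of the table, and it reproduces the reported values.

```r
# Counts taken from Table 3.1
n_rows    <- 8760                                # hourly rows in 2019 (365 * 24)
n_missing <- c(no = 549, pm10 = 1036, ws = 696)  # missing values per measure

# Share of rows with a valid value, rounded as in the table
complete_rate <- round(1 - n_missing / n_rows, 2)
print(complete_rate)

# Valid observations per measure; these match nObs in Table 2.1
n_obs <- n_rows - n_missing
print(n_obs)
```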

4 Analysis

In this section we present graphical analysis of the previously downloaded data. Note that this is just a snapshot of the data available.

4.1 Nitrogen Dioxide

yLab <- "Nitrogen Dioxide (ug/m3)"

t <- lDT[variable == "no2", .(mean = mean(value, na.rm = TRUE), sd = sd(value, na.rm = TRUE), 
    min = min(value, na.rm = TRUE), max = max(value, na.rm = TRUE)), keyby = .(site)]
kableExtra::kable(t, caption = "Summary of no2 data") %>% kable_styling()
Table 4.1: Summary of no2 data
site mean sd min max
Southampton A33 32.73677 22.47269 0 158.3072

Table 4.1 shows a minimum value of 0, so the number of negative values in this data is 0. Any negative values would be summarised in Table 4.2, while Figure 4.1 shows the availability and levels of the pollutant data over time.

t <- head(lDT[variable == "no2" & value < 0], 10)
kableExtra::kable(t, caption = "Negative no2 values (up to first 10)") %>% kable_styling()
Table 4.2: Negative no2 values (up to first 10)
site date code latitude longitude site_type variable value
t <- table(lDT[variable == "no2" & value < 0, .(site)])
kableExtra::kable(t, caption = "Negative no2 values (count by site)") %>% kable_styling()
Table 4.2: Negative no2 values (count by site)
# dt,xvar, yvar,fillVar, yLab
p <- makeTilePlot(lDT[variable == "no2"], xVar = "date", yVar = "site", fillVar = "value", 
    yLab = yLab)

Figure 4.1: Nitrogen Dioxide data availability and levels over time

# p <- ggplot2::ggplot(dt, aes(x = obsDateTime, y = nox2, colour = site, alpha =
# 0.1)) + geom_point(shape=4, size = 1)

# use the same threshold variable as the caption rather than a hard-coded 200
t <- lDT[variable == "no2" & value > hourlyno2Threshold_WHO][order(-value)]

kableExtra::kable(head(t, 10), caption = paste0("Values greater than WHO threshold (NO2 > ", 
    hourlyno2Threshold_WHO, ")")) %>% kable_styling()
Table 4.3: Values greater than WHO threshold (NO2 > 200)
site date code latitude longitude site_type variable value
p <- makeDotPlot(lDT[variable == "no2"], xVar = "date", yVar = "value", byVar = "site", 
    yLab = yLab)

p <- p + geom_hline(yintercept = hourlyno2Threshold_WHO) + labs(caption = "Reference line = WHO hourly guideline threshold")

if (doPlotly) {
    plotly::ggplotly(p + xlim(xlimMinDateTime, xlimMaxDateTime))  # interactive, xlimited 
} else {
    p  # static plot (assumed: the original else branch is not shown)
}

Figure 4.2: Nitrogen Dioxide levels, Southampton (hourly)

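Figure 4.2 shows hourly values for all sites. In the study period there were 0 hours when the hourly nitrogen dioxide level breached the WHO guideline: the maximum hourly value in Table 4.1 is 158.3, below the threshold of 200. Table 4.3, which would list the 10 highest exceedances, is therefore empty.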
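
The exceedance count can be sketched on synthetic data as follows. The threshold of 200 is the value shown in the caption of Table 4.3. The objects toy, threshold and exceed and all readings are invented for illustration and are not part of the report's own code.

```r
library(data.table)

threshold <- 200  # hourly NO2 guideline value as quoted in Table 4.3

# Made-up hourly readings for a single site
toy <- data.table(
  site     = "Toy site",
  variable = "no2",
  value    = c(50, 210, 199, 250)
)

# Keep only readings strictly above the threshold, worst first
exceed <- toy[variable == "no2" & value > threshold][order(-value)]

n_exceed <- nrow(exceed)  # number of hours above the guideline
print(exceed)
print(n_exceed)
```

Note that the comparison is strict, so a reading exactly equal to the threshold would not be counted as a breach.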

lDT[, obsDate := lubridate::date(date)]

plotDT <- lDT[variable == "no2", .(mean = mean(value, na.rm = TRUE)),
             keyby = .(obsDate, site)]

p <- makeDotPlot(plotDT, 
                 xVar = "obsDate", 
                 yVar = "mean", 
                 byVar = "site", 
                 yLab = yLab)

p <- p +
  geom_smooth() + # add smoothed line; ggplot2 picks the method from the group size
  labs(caption = "Trend line = ggplot2 default smoother (loess below 1,000 observations per group, otherwise gam)")

if (doPlotly) {
  plotly::ggplotly(p + xlim(xlimMinDate, xlimMaxDate)) # interactive, xlimited
} else {
  p # static plot (assumed: the original else branch is not shown)
}

Figure 4.3: Nitrogen Dioxide levels, Southampton (daily mean)

Figure 4.3 shows daily mean values for all sites over time and includes smoothed trend lines for each site.

As expected, the daily means show less variance (and fewer extremes) than the hourly data. The smoothed line also suggests a decreasing trend over time.
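
The effect of daily averaging can be illustrated with a small synthetic series. This is a sketch under stated assumptions: the data are an invented 48-hour sine-shaped cycle with a step between the two days, the objects toy and daily are hypothetical, and base as.Date() is used in place of lubridate::date() to keep the example dependency-light.

```r
library(data.table)

# Two days of made-up hourly values: a daily cycle plus a 5-unit step on day 2
toy <- data.table(
  site = "Toy site",
  date = as.POSIXct("2019-01-01 00:00:00", tz = "UTC") + 3600 * 0:47
)
toy[, value := 20 + 10 * sin(2 * pi * (0:47) / 24) + rep(c(0, 5), each = 24)]

# Calendar day of each observation, then the mean per day and site
toy[, obsDate := as.Date(date, tz = "UTC")]
daily <- toy[, .(mean = mean(value)), keyby = .(obsDate, site)]
print(daily)

# Averaging removes the within-day cycle, so the daily means vary less
print(c(hourly_sd = sd(toy$value), daily_sd = sd(daily$mean)))
```

In this toy series the within-day cycle averages out completely, leaving only the difference between the two days.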

4.2 openair tests

Wind rose

Figure 4.4: Wind rose for Southampton A33, 2019

Pollution rose

openair::pollutionRose(dfW, pollutant = "no2")

Figure 4.5: Pollution rose for Southampton A33, 2019, hourly data

Figure 4.5 suggests a slightly higher percentage of high no2 measurements when the wind is from the south-east, although this has not been tested formally.
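
One way to check an impression like this numerically is to bin wind direction into compass sectors and compute the share of high readings in each. The sketch below does this in base R on invented data. The eight 45-degree sectors, the cut-off of 80 and the objects toy, sectors, high and share_high are all illustrative assumptions and do not reproduce how openair::pollutionRose() bins its data.

```r
# Made-up wind directions (degrees) and no2 readings
toy <- data.frame(
  wd  = c(10, 100, 130, 140, 150, 200, 280, 300),
  no2 = c(20,  30,  90,  85,  40,  25,  35,  30)
)

# Eight 45-degree sectors centred on N, NE, E, ... (N spans 337.5 to 22.5 degrees)
sectors <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
idx <- (floor((toy$wd + 22.5) / 45) %% 8) + 1
toy$sector <- factor(sectors[idx], levels = sectors)

# Flag "high" readings using an arbitrary illustrative cut-off
high <- toy$no2 > 80

# Share of high readings per sector (NA where a sector has no observations)
share_high <- tapply(high, toy$sector, mean)
print(round(share_high, 2))
```

With real data, the per-sector counts should be inspected alongside the shares, since sectors with few observations can give misleading percentages.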

5 Runtime

Report generated using knitr in RStudio with R version 3.6.2 (2019-12-12) running on x86_64-apple-darwin15.6.0 (Darwin Kernel Version 17.7.0: Sun Dec 1 19:19:56 PST 2019; root:xnu-4570.71.63~1/RELEASE_X86_64).

t <- proc.time() - startTime

elapsed <- t[[3]]

Analysis completed in 7.88 seconds (0.13 minutes).
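
For completeness, the timing pattern can be sketched as below. It assumes that startTime was recorded with proc.time() at the top of the report, which is not shown in this section. The Sys.sleep() call is only a stand-in for the analysis, and msg is a hypothetical name.

```r
startTime <- proc.time()  # assumed to be set once, before the analysis starts

Sys.sleep(0.05)           # stand-in for the actual analysis

t <- proc.time() - startTime
elapsed <- t[[3]]         # third element is elapsed (wall-clock) time in seconds

# Report seconds and the equivalent in minutes
msg <- paste0("Analysis completed in ", round(elapsed, 2), " seconds (",
              round(elapsed / 60, 2), " minutes).")
print(msg)
```

The minutes figure is simply the elapsed seconds divided by 60, so the 7.88 seconds reported above correspond to 0.13 minutes.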

R packages used:

  • data.table - (Dowle et al. 2015)
  • ggplot2 - (Wickham 2009)
  • here - (Müller 2017)
  • kableExtra - (Zhu 2018)
  • lubridate - (Grolemund and Wickham 2011)
  • openair - (Carslaw and Ropkins 2012)
  • plotly - (Sievert et al. 2016)
  • skimr - (Arino de la Rubia et al. 2017)


