1 About

1.1 Contributions

Please note that authorship is alphabetical. Contributions are listed below; see GitHub for details and who to blame for what :-).

1.3 Citation

If you wish to refer to any of the material from this report please cite as:

  • Anderson, B. (2019) Air Quality in Southampton (UK): Exploring the data (using AURN & openair), University of Southampton: Southampton, UK.

Report circulation:

  • Public

Report purpose:

This work is (c) 2019 the University of Southampton.

2 Introduction

dfW <- openair::importAURN(
  site = "SA33",
  year = 2019,
  pollutant = "all",
  hc = FALSE,
  meta = TRUE,
  to_narrow = FALSE, # keep wide form data (TRUE would give long form)
  verbose = TRUE # for now
)

# fails. it worked before
# dfL <- openair::importAURN(
#   site = "SA33",
#   year = 2019,
#   pollutant = "all",
#   hc = FALSE,
#   meta = TRUE,
#   to_narrow = TRUE, # produces long form data yay!
#   verbose = TRUE
# )

dtW <- data.table::as.data.table(dfW) # we like data.tables

Data downloaded from http://uk-air.defra.gov.uk/openair/R_data/ using openair::importAURN().

Southampton City Council collects various forms of air quality data at the sites shown in Table 2.1. The World Health Organization (WHO) publishes information on the health consequences and “acceptable” exposure levels for each of these pollutants.

lDT <- data.table::melt(dtW, id.vars = c("site", "date", "code", "latitude", "longitude", 
    "site_type"), measure.vars = c("no", "no2", "nox", "pm10", "nv10", "v10", "ws", 
    "wd"), value.name = "value"  # varies 
)

# remove NA
lDT <- lDT[!is.na(value)]

t <- lDT[, .(from = min(date), to = max(date), nObs = .N), keyby = .(site, variable)]

kableExtra::kable(t, caption = "Dates data available by site and measure", digits = 2) %>% 
    kable_styling()
Table 2.1: Dates data available by site and measure
site             variable  from        to                   nObs
Southampton A33  no        2019-01-01  2019-12-20 23:00:00  8211
Southampton A33  no2       2019-01-01  2019-12-20 23:00:00  8211
Southampton A33  nox       2019-01-01  2019-12-20 23:00:00  8211
Southampton A33  pm10      2019-01-01  2019-12-20 22:00:00  7724
Southampton A33  nv10      2019-01-01  2019-12-20 22:00:00  7724
Southampton A33  v10       2019-01-01  2019-12-20 22:00:00  7724
Southampton A33  ws        2019-01-01  2019-12-20 23:00:00  8064
Southampton A33  wd        2019-01-01  2019-12-20 23:00:00  8064

3 Summarise data

This section summarises the previously downloaded and processed data. Note that the data may not be completely up to date.

skimr::skim(dfW)
Table 3.1: Data summary
Name dfW
Number of rows 8760
Number of columns 14
_______________________
Column type frequency:
character 3
numeric 10
POSIXct 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
code 0 1 4 4 0 1 0
site 0 1 15 15 0 1 0
site_type 0 1 13 13 0 1 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
no 549 0.94 27.95 38.75 -0.22 4.12 14.20 35.60 444.91 ▇▁▁▁▁
no2 549 0.94 32.74 22.47 0.00 15.11 28.03 45.55 158.31 ▇▅▁▁▁
nox 549 0.94 75.57 76.58 0.84 22.77 52.03 100.15 771.31 ▇▁▁▁▁
pm10 1036 0.88 16.74 10.98 -1.40 9.60 13.70 20.50 95.00 ▇▃▁▁▁
nv10 1036 0.88 13.66 9.48 -9.60 7.40 11.30 17.42 99.70 ▇▆▁▁▁
v10 1036 0.88 3.07 3.09 -14.00 1.30 2.70 4.30 22.80 ▁▂▇▁▁
ws 696 0.92 3.76 2.03 0.00 2.30 3.30 4.90 13.00 ▆▇▃▁▁
wd 696 0.92 201.12 104.40 0.00 116.77 229.75 284.10 360.00 ▅▃▃▇▆
latitude 0 1.00 50.92 0.00 50.92 50.92 50.92 50.92 50.92 ▁▁▇▁▁
longitude 0 1.00 -1.46 0.00 -1.46 -1.46 -1.46 -1.46 -1.46 ▁▁▇▁▁

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2019-01-01 2019-12-31 23:00:00 2019-07-02 11:30:00 8760

Table 3.1 gives an indication of the availability of the different measures.
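The complete_rate column in Table 3.1 appears to be the share of non-missing rows for each variable: for no2, 1 - 549/8760 is approximately 0.94. The following is a minimal base R sketch of that calculation, using a small synthetic data frame (the `toy` values are invented for illustration and are not AURN data) rather than skimr itself.

```r
# Synthetic stand-in for dfW: eight hourly rows with some gaps (not real AURN data)
toy <- data.frame(
  no2  = c(12.1, NA, 30.4, 28.0, NA, 15.5, 22.3, 19.8),
  pm10 = c(NA, NA, 14.2, 16.0, NA, 11.1, NA, 13.5)
)

# Count missing values per column
nMissing <- colSums(is.na(toy))

# Share of rows with a recorded value, per column
completeRate <- 1 - nMissing / nrow(toy)

availability <- data.frame(
  variable = names(toy),
  n_missing = as.integer(nMissing),
  complete_rate = round(as.numeric(completeRate), 2)
)

print(availability)
```

Applying the same two lines to the numeric columns of dfW should reproduce the n_missing and complete_rate figures reported by skimr, assuming skimr defines them this way.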

4 Analysis

In this section we present graphical analysis of the previously downloaded data. Note this is just a snapshot of the data available.

4.1 Nitrogen Dioxide

yLab <- "Nitrogen Dioxide (ug/m3)"

t <- lDT[variable == "no2", .(mean = mean(value, na.rm = TRUE), sd = sd(value, na.rm = TRUE), 
    min = min(value, na.rm = TRUE), max = max(value, na.rm = TRUE)), keyby = .(site)]
kableExtra::kable(t, caption = "Summary of no2 data") %>% kable_styling()
Table 4.1: Summary of no2 data
site             mean      sd        min  max
Southampton A33  32.73677  22.47269  0    158.3072

Table 4.1 shows a minimum value of 0, so this data contains no (0) negative values. Any negative values would be summarised in Table 4.2, while Figure 4.1 shows the availability and levels of the pollutant data over time.

t <- head(lDT[variable == "no2" & value < 0], 10)
kableExtra::kable(t, caption = "Negative no2 values (up to first 10)") %>% kable_styling()
Table 4.2: Negative no2 values (up to first 10)
site date code latitude longitude site_type variable value
t <- table(lDT[variable == "no2" & value < 0, .(site)])
kableExtra::kable(t, caption = "Negative no2 values (count by site)") %>% kable_styling()
Table 4.2: Negative no2 values (count by site)
Freq
# dt,xvar, yvar,fillVar, yLab
p <- makeTilePlot(lDT[variable == "no2"], xVar = "date", yVar = "site", fillVar = "value", 
    yLab = yLab)

p

Figure 4.1: Nitrogen Dioxide data availability and levels over time

# p <- ggplot2::ggplot(dt, aes(x = obsDateTime, y = nox2, colour = site, alpha =
# 0.1)) + geom_point(shape=4, size = 1)

# use the same threshold variable as the caption rather than a hard-coded 200
t <- lDT[variable == "no2" & value > hourlyno2Threshold_WHO][order(-value)]

kableExtra::kable(head(t, 10), caption = paste0("Values greater than WHO threshold (NO2 > ", 
    hourlyno2Threshold_WHO, ")")) %>% kable_styling()
Table 4.3: Values greater than WHO threshold (NO2 > 200)
site date code latitude longitude site_type variable value
p <- makeDotPlot(lDT[variable == "no2"], xVar = "date", yVar = "value", byVar = "site", 
    yLab = yLab)

p <- p + geom_hline(yintercept = hourlyno2Threshold_WHO) + labs(caption = "Reference line = WHO hourly guideline threshold")


if (doPlotly) {
    # interactive version, restricted to the chosen date-time range
    plotly::ggplotly(p + xlim(xlimMinDateTime, xlimMaxDateTime))
} else {
    p
}

Figure 4.2: Nitrogen Dioxide levels, Southampton (hourly)

Figure 4.2 shows hourly values for all sites. In the study period there were 0 hours when the hourly Nitrogen Dioxide level breached the WHO hourly guideline threshold. Table 4.3, which would list the 10 highest such values, is therefore empty.
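Counting breaches of an hourly threshold can be sketched in base R as follows. This is an illustration only: the `toy` values are invented, `hourlyThreshold` is a hypothetical name, and its value of 200 is taken from the caption of Table 4.3. Missing hours are excluded explicitly so that they are not counted as breaches.

```r
# Hypothetical threshold in ug/m3, mirroring the 200 shown in the Table 4.3 caption
hourlyThreshold <- 200

# Synthetic hourly observations (not real AURN data), including one missing hour
toy <- data.frame(
  date  = as.POSIXct("2019-01-01 00:00:00", tz = "UTC") + 3600 * 0:5,
  value = c(45.2, 158.3, 201.7, NA, 230.0, 99.9)
)

# Keep only hours with a recorded value above the threshold
exceed <- toy[!is.na(toy$value) & toy$value > hourlyThreshold, ]

# Order from highest to lowest so head() gives the worst cases
exceed <- exceed[order(-exceed$value), ]

nExceed <- nrow(exceed)
print(nExceed)
print(head(exceed, 10))
```

With the real data, the maximum no2 value in Table 4.1 (158.3072) is below 200, which is consistent with the 0 breaches reported.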

lDT[, obsDate := lubridate::date(date)]

plotDT <- lDT[variable == "no2", .(mean = mean(value, na.rm = TRUE)),
             keyby = .(obsDate, site)]

p <- makeDotPlot(plotDT, 
                 xVar = "obsDate", 
                 yVar = "mean", 
                 byVar = "site", 
                 yLab = yLab)

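# request the gam smoother explicitly so that the fit matches the caption:
# the geom_smooth() default uses loess for groups of fewer than 1,000 points
p <- p +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs")) + # add smoothed line
  labs(caption = "Trend line = Generalized additive model (gam) with integrated smoothness estimation")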

if (doPlotly) {
  # interactive version, restricted to the chosen date range
  plotly::ggplotly(p + xlim(xlimMinDate, xlimMaxDate))
} else {
  p
}

Figure 4.3: Nitrogen Dioxide levels, Southampton (daily mean)

Figure 4.3 shows daily mean values for all sites over time and includes a smoothed trend line for each site.

As expected from averaging, the daily mean values show less variance (and fewer extremes) than the hourly data. There has also been a decreasing trend over time.
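The effect of daily averaging on spread can be illustrated with a short base R sketch. It uses randomly generated hourly values (the gamma distribution and its parameters are arbitrary assumptions, not a model of the AURN data) and compares the standard deviation and maximum of the hourly series with those of its daily means.

```r
set.seed(42)  # make the synthetic data reproducible

# 30 days of synthetic hourly values (not real AURN data)
hours <- seq(as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 24 * 30)
hourly <- data.frame(
  date  = hours,
  value = rgamma(length(hours), shape = 2, scale = 15)
)

# Derive the calendar date and average within each day
hourly$obsDate <- as.Date(hourly$date, tz = "UTC")
daily <- aggregate(value ~ obsDate, data = hourly, FUN = mean)

# Compare spread and extremes of the two series
comparison <- data.frame(
  series = c("hourly", "daily mean"),
  sd     = c(sd(hourly$value), sd(daily$value)),
  max    = c(max(hourly$value), max(daily$value))
)
print(comparison)
```

A daily mean can never exceed the largest hourly value within that day, so the maximum of the daily series is always at most the hourly maximum. Real hourly values are correlated within a day, so the reduction in spread will usually be smaller than for the independent values simulated here.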

4.2 openair tests

Wind rose

openair::windRose(dfW)

Figure 4.4: Wind rose for Southampton A33, 2019

Pollution rose

openair::pollutionRose(dfW, pollutant = "no2")

Figure 4.5: Pollution rose for Southampton A33, 2019, hourly data

Figure 4.5 may indicate a slightly higher percentage of high no2 values when the wind is from the south-east, although this remains a tentative reading.
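One way to check such a reading numerically is to bin wind direction into compass sectors and compute the share of high values in each. The base R sketch below does this on invented data. The 8 sectors of 45 degrees, the `highCut` value of 60 and the `toy` observations are all assumptions for illustration. It is not a reimplementation of openair::pollutionRose(), which summarises the data differently.

```r
# Synthetic wind direction (degrees) and no2 values (not real AURN data)
toy <- data.frame(
  wd  = c(10, 100, 130, 140, 200, 225, 300, 350),
  no2 = c(20, 35, 80, 95, 30, 25, 40, 15)
)

# Eight 45-degree sectors centred on N, NE, E, ...; the modulo wraps
# directions just below 360 degrees back into the N sector
sectors <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
toy$sector <- sectors[(floor((toy$wd + 22.5) / 45) %% 8) + 1]

# Hypothetical cut-off for a "high" value, chosen only for this example
highCut <- 60

# Share of observations above the cut-off within each sector
shareHigh <- tapply(toy$no2 > highCut, toy$sector, mean)
print(shareHigh)
```

Sectors with no observations do not appear in the result, so the number of observations per sector should be checked before comparing shares.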

5 Runtime

Report generated using knitr in RStudio with R version 3.6.2 (2019-12-12) running on x86_64-apple-darwin15.6.0 (Darwin Kernel Version 17.7.0: Sun Dec 1 19:19:56 PST 2019; root:xnu-4570.71.63~1/RELEASE_X86_64).

t <- proc.time() - startTime

elapsed <- t[[3]]

Analysis completed in 7.88 seconds (0.13 minutes).

R packages used:

  • data.table - (Dowle et al. 2015)
  • ggplot2 - (Wickham 2009)
  • here - (Müller 2017)
  • kableExtra - (Zhu 2018)
  • lubridate - (Grolemund and Wickham 2011)
  • openair - (Carslaw and Ropkins 2012)
  • plotly - (Sievert et al. 2016)
  • skimr - (Arino de la Rubia et al. 2017)

References

Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and Michael Quinn. 2017. Skimr: Skimr. https://github.com/ropenscilabs/skimr.

Carslaw, David C., and Karl Ropkins. 2012. “Openair — an R Package for Air Quality Data Analysis.” Environmental Modelling & Software 27–28 (0): 52–61. https://doi.org/10.1016/j.envsoft.2011.09.008.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.