1 About

1.1 Purpose

Extract air quality data for the Southampton A33 monitoring site for use as input to modelling.

2 Data

Data for Southampton downloaded from :

Southampton City Council collects various forms of air quality data at the sites shown in Table 3.1. Some of these sites feed data to the Automatic Urban and Rural Network (AURN).

The AURN data then undergoes a manual check and ratification process. Data that is less than 6 months old has not undergone this process.
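The six-month rule above can be turned into a simple flag on each observation. The following is a minimal sketch only: `isProvisional()` and its `asOf` argument are hypothetical names, and the cut-off is taken as exactly six calendar months before the reference date, which is an approximation of the actual ratification schedule.

```r
# Hypothetical helper: TRUE where an observation is newer than the
# six-month ratification window described above.
isProvisional <- function(obsDate, asOf = Sys.Date()) {
  # Date six calendar months before the reference date
  cutoff <- seq(asOf, by = "-6 months", length.out = 2)[2]
  obsDate > cutoff
}

# Example with a fixed reference date so the result is reproducible
flags <- isProvisional(as.Date(c("2019-12-01", "2020-03-01")),
                       asOf = as.Date("2020-06-30"))
flags
```

Applied to the extract, the flag could be stored as an extra column so that unratified values can be excluded or treated separately in the modelling.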

3 Extract specification

Pollutants: CO, NO2, NOX, PM10, PM2.5 (ideally, or any subset; PM10 is less useful because it carries a weaker road signal)

Dates:

  • 11 Feb – 5 May 2019: training dataset
  • 11 Feb – 23 Mar 2020: model testing
  • 24 Mar 2020 – now: reduction prediction
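The three date ranges above can be attached to each observation as a period label. This is a sketch under stated assumptions: the `periods` table and `labelPeriod()` are hypothetical names, the boundaries are copied from the list, the final period is taken to start in 2020, and it is treated as open-ended.

```r
# Period boundaries taken from the list above; the prediction period
# runs to "now", so its end date is left as NA (open-ended).
periods <- data.frame(
  period = c("training", "testing", "prediction"),
  start  = as.Date(c("2019-02-11", "2020-02-11", "2020-03-24")),
  end    = as.Date(c("2019-05-05", "2020-03-23", NA)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: period label for each date, or NA if the date
# falls outside all three periods.
labelPeriod <- function(d) {
  out <- rep(NA_character_, length(d))
  for (i in seq_len(nrow(periods))) {
    inPeriod <- d >= periods$start[i] &
      (is.na(periods$end[i]) | d <= periods$end[i])
    out[which(inPeriod)] <- periods$period[i]
  }
  out
}

exampleDates <- as.Date(c("2019-03-01", "2019-06-01", "2020-03-23", "2020-04-01"))
labels <- labelPeriod(exampleDates)
labels
```

Dates between the training and testing windows are deliberately left as NA so they can be dropped before modelling.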

Model runs on 10-minute averages or hourly averages – would be useful to have both.

Ultimately need to create a dataset with: Year, month, day, hour, minute (0,10,20,30,40,50), weekday/weekend, wind speed, wind direction, upwind/downwind, average pollutant concentration
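One way the target dataset could be assembled is sketched below. Everything in it is illustrative: `rawDT` holds made-up 5-minute readings, `roadBearing` is an assumed compass bearing from the monitor towards the road, wind direction is averaged with a circular mean rather than an arithmetic one, and "downwind" is taken to mean the wind blows from within 90 degrees of that bearing. None of these choices come from the original analysis.

```r
library(data.table)

# Made-up 5-minute readings starting Monday 11 Feb 2019 (illustrative only)
rawDT <- data.table(
  dateTimeUTC = as.POSIXct("2019-02-11 00:00:00", tz = "UTC") +
    seq(0, by = 300, length.out = 6),
  no2 = c(10, 20, 30, 40, 50, 60),
  windSpeed = c(2, 2, 3, 3, 4, 4),
  windDirection = c(90, 90, 100, 100, 270, 270)
)

# Assumption: compass bearing (degrees) from the monitor towards the road
roadBearing <- 90

# Circular mean of directions in degrees, so that 350 and 10 average to 0
circMean <- function(deg) {
  rad <- deg * pi / 180
  (atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi) %% 360
}

# Floor each timestamp to the start of its 10-minute bin
rawDT[, bin := as.POSIXct(floor(as.numeric(dateTimeUTC) / 600) * 600,
                          origin = "1970-01-01", tz = "UTC")]

# Average within each bin
modelDT <- rawDT[, .(
  no2 = mean(no2, na.rm = TRUE),
  windSpeed = mean(windSpeed, na.rm = TRUE),
  windDirection = circMean(windDirection)
), by = bin]

# Calendar fields; data.table::wday() returns 1 for Sunday and 7 for Saturday
modelDT[, year := year(bin)]
modelDT[, month := month(bin)]
modelDT[, day := mday(bin)]
modelDT[, hour := hour(bin)]
modelDT[, minute := minute(bin)]
modelDT[, dayType := ifelse(wday(bin) %in% c(1, 7), "weekend", "weekday")]

# Smallest angle between the wind direction and the road bearing
modelDT[, angDiff := abs((windDirection - roadBearing + 180) %% 360 - 180)]
modelDT[, windSide := ifelse(angDiff < 90, "downwind", "upwind")]

modelDT
```

Hourly averages follow from the same pattern with a bin width of 3600 seconds instead of 600.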

Wind speed etc. is available in the AURN data (not in the raw Southampton data).

For much more detailed analysis see a longer and very messy data report.

aurnDT <- aurnDT[obsDate > as.Date("2018-01-01")]  # for speed

# Give the wind variables descriptive names (updates matching rows in place)
aurnDT[pollutant == "wd", pollutant := "windDirection"]
aurnDT[pollutant == "ws", pollutant := "windSpeed"]

t <- table(aurnDT$pollutant, aurnDT$site)

kableExtra::kable(t, caption = "Number of hourly records by site and measure (including missing values)", digits = 2) %>% kable_styling()
Table 3.1: Number of hourly records by site and measure (including missing values)
Measure        Southampton A33  Southampton Centre
no                       26280               26280
no2                      26280               26280
nox                      26280               26280
nv10                     26280               17496
nv2.5                        0               17496
o3                           0               26280
pm10                     26280               26280
pm2.5                        0               26280
so2                          0               26280
v10                      26280               17496
v2.5                         0               17496
windDirection            26280               26280
windSpeed                26280               26280
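Because `table()` counts rows whether or not `value` is missing, it can help to report observed values alongside row counts. The sketch below uses a small made-up table, `demoDT`, in the same long form as `aurnDT`; the column names `nRows` and `nObserved` are illustrative.

```r
library(data.table)

# Made-up long-form records in the same shape as aurnDT (site, pollutant, value)
demoDT <- data.table(
  site = c("Southampton A33", "Southampton A33",
           "Southampton Centre", "Southampton Centre"),
  pollutant = c("no2", "no2", "no2", "no2"),
  value = c(41.2, NA, 30.5, 28.1)
)

# nRows matches what table() reports; nObserved excludes missing values
counts <- demoDT[, .(nRows = .N, nObserved = sum(!is.na(value))),
                 by = .(site, pollutant)]
counts
```

Run against the full extract, the same grouping would show how much of each series is actually populated.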

Site locations:

4 Nitrogen Dioxide (no2)

yLab <- "Nitrogen Dioxide (ug/m3)"
no2dt <- aurnDT[pollutant == "no2"]

Figure 4.1 shows the availability of this data.

# dt,xvar, yvar,fillVar, yLab
p <- makeTilePlot(no2dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)

p

Figure 4.1: Nitrogen Dioxide data availability and levels over time
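The definition of `makeTilePlot()` is not shown in this report. A function with the same arguments might look like the sketch below, which uses `ggplot2::geom_tile()`. The name `makeTilePlotSketch`, the use of `yLab` as the legend title, the viridis fill scale and the grey colour for missing values are all assumptions, not the original implementation.

```r
library(ggplot2)

# Hypothetical stand-in for makeTilePlot(): one tile per time step and site,
# coloured by the measured value, with missing values shown in grey.
makeTilePlotSketch <- function(dt, xVar, yVar, fillVar, yLab) {
  ggplot(dt, aes(x = .data[[xVar]], y = .data[[yVar]], fill = .data[[fillVar]])) +
    geom_tile() +
    scale_fill_viridis_c(name = yLab, na.value = "grey90") +
    labs(x = "Time", y = "")
}

# Made-up hourly values for two sites (illustrative only)
demoDF <- data.frame(
  dateTimeUTC = as.POSIXct("2019-02-11 00:00:00", tz = "UTC") + 3600 * rep(0:2, 2),
  site = rep(c("Southampton A33", "Southampton Centre"), each = 3),
  value = c(30, NA, 25, 20, 22, NA)
)

p <- makeTilePlotSketch(demoDF, xVar = "dateTimeUTC", yVar = "site",
                        fillVar = "value", yLab = "Nitrogen Dioxide (ug/m3)")
```

Plotting availability and level in one figure this way makes gaps in a series visible as grey runs.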

5 Oxides of Nitrogen (nox)

yLab <- "Oxides of Nitrogen (ug/m3)"
noxdt <- aurnDT[pollutant == "nox"]

Figure 5.1 shows the availability of this data over time.

# dt,xvar, yvar,fillVar, yLab
p <- makeTilePlot(noxdt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)

p

Figure 5.1: Oxides of nitrogen data availability and levels over time

6 Sulphur Dioxide

yLab <- "Sulphur Dioxide (ug/m3)"
so2dt <- aurnDT[pollutant == "so2"]

Figure 6.1 shows the availability of this data over time.

# dt,xvar, yvar,fillVar, yLab
p <- makeTilePlot(so2dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)

p

Figure 6.1: Sulphur Dioxide data availability and levels over time

7 Ozone

yLab <- "Ozone (ug/m3)"
o3dt <- aurnDT[pollutant == "o3"]

Figure 7.1 shows the availability of this data over time.

p <- makeTilePlot(o3dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)

p

Figure 7.1: Availability and level of o3 data over time

8 PM 10

yLab <- "PM 10 (ug/m3)"
pm10dt <- aurnDT[pollutant == "pm10"]

Figure 8.1 shows the availability of data over time.

p <- makeTilePlot(pm10dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)

p

Figure 8.1: Availability and level of PM 10 data over time

9 PM 2.5

yLab <- "PM 2.5 (ug/m3)"
pm25dt <- aurnDT[pollutant == "pm2.5"]

Figure 9.1 shows the availability of data over time.

p <- makeTilePlot(pm25dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)

p

Figure 9.1: Availability and level of PM 2.5 data over time

9.1 Wind speed

yLab <- "Wind speed (m/s)"
wsdt <- aurnDT[pollutant == "windSpeed"]

Figure 9.2 shows the availability of data over time.

p <- makeTilePlot(wsdt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)

p

Figure 9.2: Availability and level of wind speed data over time

9.2 Wind direction

yLab <- "Wind direction (deg)"
wddt <- aurnDT[pollutant == "windDirection"]

Figure 9.3 shows the availability of data over time.

p <- makeTilePlot(wddt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)

p

Figure 9.3: Availability and level of wind direction data over time

10 Save data

Save long form data to data folder.

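# Add an ordered day-of-week factor (Mon, Tue, ...) before saving
aurnDT[, weekDay := lubridate::wday(dateTimeUTC, label = TRUE, abbr = TRUE)]
# Build the output path relative to the project root
f <- here::here("data", "sotonExtract2018_2020.csv")
data.table::fwrite(aurnDT, f)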
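Since the saved file is in long form (one row per site, time and measure), a modelling step will probably need it reshaped to one row per site and time. The following is a minimal sketch using `data.table::dcast()` on a made-up table, `longDT`; the values are illustrative and the column names mirror those in `aurnDT`.

```r
library(data.table)

# Made-up long-form records: two hours, three measures, one site
longDT <- data.table(
  dateTimeUTC = rep(as.POSIXct("2019-02-11 00:00:00", tz = "UTC") + c(0, 3600),
                    each = 3),
  site = "Southampton A33",
  pollutant = rep(c("no2", "windSpeed", "windDirection"), times = 2),
  value = c(40, 3.1, 220, 35, 2.8, 230)
)

# One row per site and timestamp, one column per measure
wideDT <- dcast(longDT, site + dateTimeUTC ~ pollutant, value.var = "value")
wideDT
```

Keeping the saved file in long form and reshaping on load avoids a very wide CSV with many empty cells for measures that a site does not record.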

Saved data description:

skimr::skim(aurnDT)
Table 10.1: Data summary
Name               aurnDT
Number of rows     516744
Number of columns  9

Column type frequency:
  character  5
  Date       1
  factor     1
  numeric    1
  POSIXct    1

Group variables    None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
date                   0              1   20   20      0     26280           0
code                   0              1    4    4      0         2           0
site                   0              1   15   18      0         2           0
pollutant              0              1    2   13      0        13           0
source                 0              1    4    4      0         1           0

Variable type: Date

skim_variable  n_missing  complete_rate  min         max         median      n_unique
obsDate                0              1  2018-01-02  2020-12-31  2019-05-29      1095

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
weekDay                0              1  TRUE            7  Tue: 74136, Wed: 74040, Thu: 74040, Sun: 73632

Variable type: numeric

skim_variable  n_missing  complete_rate   mean     sd    p0  p25    p50    p75    p100  hist
value             147233           0.72  38.83  70.44  -9.4    4  11.68  34.52  855.62  ▇▁▁▁▁

Variable type: POSIXct

skim_variable  n_missing  complete_rate  min         max                  median               n_unique
dateTimeUTC            0              1  2018-01-02  2020-12-31 23:00:00  2019-05-29 15:00:00     26280

11 Annex

12 Runtime

Report generated using knitr in RStudio with R version 3.6.3 (2020-02-29) running on x86_64-apple-darwin15.6.0 (Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64).

t <- proc.time() - myParams$startTime

elapsed <- t[[3]]

Analysis completed in 65.358 seconds (1.09 minutes).

R packages used:

  • data.table - (Dowle et al. 2015)
  • ggplot2 - (Wickham 2009)
  • here - (Müller 2017)
  • kableExtra - (Zhu 2018)
  • lubridate - (Grolemund and Wickham 2011)
  • skimr - (Arino de la Rubia et al. 2017)
  • viridis - (Garnier 2018)

References

Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and Michael Quinn. 2017. Skimr: Skimr. https://github.com/ropenscilabs/skimr.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Garnier, Simon. 2018. Viridis: Default Color Maps from ’Matplotlib’. https://CRAN.R-project.org/package=viridis.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.