@dataknut
Extracting data for the Southampton A33 location for input to modelling.
Data for Southampton were downloaded from:
Southampton City Council collects various forms of air quality data at the sites shown in 3.1. Some of these sites feed data to the Automatic Urban and Rural Network (AURN).
The AURN data then undergoes a manual checking and ratification process. Data less than 6 months old has not yet undergone this process.
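Because recent data may not yet be ratified, it can help to flag it explicitly before modelling. A minimal sketch, assuming a data.table with a POSIXct `dateTimeUTC` column as used later in this report; the helper `flagProvisional()` and the `provisional` column are hypothetical names, and only the 6-month window comes from the note above.

```r
library(data.table)
library(lubridate)

# Hypothetical helper (not part of the original analysis): mark readings newer
# than `nMonths` months before `refDate` as provisional, i.e. possibly not yet
# ratified. Modifies `dt` by reference and returns it invisibly.
flagProvisional <- function(dt, refDate = Sys.time(), nMonths = 6) {
  # %m-% rolls back to the last valid day, e.g. 31 Dec minus 6 months = 30 Jun
  cutoff <- refDate %m-% period(nMonths, units = "month")
  dt[, provisional := dateTimeUTC > cutoff]
  invisible(dt)
}
```

For example, `flagProvisional(aurnDT)` followed by `aurnDT[provisional == FALSE]` would keep only data old enough to have been ratified, assuming ratification keeps to the 6-month lag.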
Pollutants: CO, NO2, NOx, PM10, PM2.5 (ideally all of these, but any subset would do; PM10 is less useful because it does not carry as strong a road signal)
Dates:
The model runs on either 10-minute or hourly averages; it would be useful to have both.
Ultimately we need to create a dataset with: year, month, day, hour, minute (0, 10, 20, 30, 40, 50), weekday/weekend, wind speed, wind direction, upwind/downwind, and average pollutant concentration.
Wind speed, wind direction, etc. are available in the AURN data, but not in the raw Southampton data.
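The requirements above could be met with a few small helpers. This is a sketch, not the author's code: `toHourly()`, `addTimeColumns()`, `classifyWind()`, `dayType` and `roadBearing` are hypothetical names. It assumes long-form columns `site`, `pollutant`, `dateTimeUTC` (POSIXct, UTC) and `value` as in `aurnDT`. For upwind/downwind it further assumes that wind direction is the bearing the wind blows from and that the bearing from the monitor to the road is known; neither is stated in the source.

```r
library(data.table)
library(lubridate)

# Hypothetical helper: collapse 10-minute readings to hourly means per site
# and measure.
toHourly <- function(dt) {
  dt[, .(value = mean(value, na.rm = TRUE),  # NaN if every reading in the hour is NA
         nObs = sum(!is.na(value))),         # number of non-missing readings used
     by = .(site, pollutant,
            dateTimeUTC = lubridate::floor_date(dateTimeUTC, "hour"))]
}

# Hypothetical helper: add the calendar columns listed above (by reference).
addTimeColumns <- function(dt) {
  dt[, `:=`(year = lubridate::year(dateTimeUTC),
            month = lubridate::month(dateTimeUTC),
            day = lubridate::mday(dateTimeUTC),
            hour = lubridate::hour(dateTimeUTC),
            minute = lubridate::minute(dateTimeUTC))]
  # week_start = 1 numbers Monday as 1 through Sunday as 7
  dt[, dayType := ifelse(lubridate::wday(dateTimeUTC, week_start = 1) >= 6,
                         "weekend", "weekday")]
  invisible(dt)
}

# Hypothetical helper: label each reading "downwind" or "upwind" of the road.
# Assumes windDirection is the bearing the wind blows FROM (degrees) and
# roadBearing is the bearing from the monitor to the road (not in the source).
classifyWind <- function(windDirection, roadBearing) {
  # smallest angle between the two bearings, 0 to 180 degrees
  angle <- abs((windDirection - roadBearing + 180) %% 360 - 180)
  ifelse(is.na(angle), NA_character_,
         ifelse(angle < 90, "downwind", "upwind"))
}
```

Calm hours (wind speed near zero) have no meaningful direction, so they may need separate handling before applying `classifyWind()`.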
For a much more detailed analysis, see the longer (and very messy) data report.
library(data.table)
aurnDT <- aurnDT[obsDate > as.Date("2018-01-01")] # keeps 2018-01-02 onwards (strictly after 1 Jan); subset for speed
# rename the AURN wind codes to descriptive labels
aurnDT[pollutant == "wd", pollutant := "windDirection"]
aurnDT[pollutant == "ws", pollutant := "windSpeed"]
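library(kableExtra) # provides kable_styling() and re-exports %>%
t <- table(aurnDT$pollutant, aurnDT$site) # counts rows per measure and site, including rows where value is NA
kableExtra::kable(t, caption = "Number of hourly records (including NA values) by site and measure", digits = 2) %>% kable_styling()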
Measure | Southampton A33 | Southampton Centre |
---|---|---|
no | 26280 | 26280 |
no2 | 26280 | 26280 |
nox | 26280 | 26280 |
nv10 | 26280 | 17496 |
nv2.5 | 0 | 17496 |
o3 | 0 | 26280 |
pm10 | 26280 | 26280 |
pm2.5 | 0 | 26280 |
so2 | 0 | 26280 |
v10 | 26280 | 17496 |
v2.5 | 0 | 17496 |
windDirection | 26280 | 26280 |
windSpeed | 26280 | 26280 |
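Because `table()` counts rows, every measure present at a site shows 26280 rows (one per hour over the 1095 days from 2018-01-02 to 2020-12-31) even where `value` is NA. A sketch that counts only non-missing values instead, using data.table's `.N` and `dcast()`; `countNonMissing()` is a hypothetical helper name.

```r
library(data.table)

# Hypothetical helper: count non-missing values per measure and site, then
# spread sites into columns to match the layout of the table above.
countNonMissing <- function(dt) {
  counts <- dt[!is.na(value), .N, keyby = .(pollutant, site)]
  # measure/site combinations with no non-missing values become 0
  dcast(counts, pollutant ~ site, value.var = "N", fill = 0L)
}
```

Applied as `countNonMissing(aurnDT)`, this would show how much usable data each site actually holds, rather than how many hourly slots exist.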
Site locations:
yLab <- "Nitrogen Dioxide (ug/m3)"
no2dt <- aurnDT[pollutant == "no2"]
Figure 4.1 shows the availability of this data over time.
# arguments: dt, xVar, yVar, fillVar, yLab
p <- makeTilePlot(no2dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)
p
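`makeTilePlot()` is defined elsewhere in the author's code and is not shown here. A hypothetical stand-in with the same arguments, using ggplot2's `geom_tile()` and the viridis fill scale, might look like the sketch below; using `yLab` as the fill legend title is an assumption, since the labels passed in describe the measured value rather than the site axis.

```r
library(ggplot2)

# Hypothetical stand-in for makeTilePlot(); the real function may differ.
# Draws one tile per (xVar, yVar) cell coloured by fillVar. Hours with no row
# leave a gap; rows whose value is NA are drawn in the scale's na.value colour.
makeTilePlotSketch <- function(dt, xVar, yVar, fillVar, yLab) {
  ggplot(dt, aes(x = .data[[xVar]], y = .data[[yVar]],
                 fill = .data[[fillVar]])) +
    geom_tile() +
    scale_fill_viridis_c(name = yLab) +
    labs(x = "Time (UTC)", y = "Site")
}
```

Called as `makeTilePlotSketch(no2dt, "dateTimeUTC", "site", "value", yLab)`, it should give a plot broadly similar to Figure 4.1.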
yLab <- "Oxides of Nitrogen (ug/m3)"
noxdt <- aurnDT[pollutant == "nox"]
Figure 5.1 shows the availability of this data over time.
# arguments: dt, xVar, yVar, fillVar, yLab
p <- makeTilePlot(noxdt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)
p
yLab <- "Sulphur Dioxide (ug/m3)"
so2dt <- aurnDT[pollutant == "so2"]
Figure 6.1 shows the availability of this data over time.
# arguments: dt, xVar, yVar, fillVar, yLab
p <- makeTilePlot(so2dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)
p
yLab <- "Ozone (ug/m3)"
o3dt <- aurnDT[pollutant == "o3"]
Figure 7.1 shows the availability of this data over time.
p <- makeTilePlot(o3dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)
p
yLab <- "PM 10 (ug/m3)"
pm10dt <- aurnDT[pollutant == "pm10"]
Figure 8.1 shows the availability of data over time.
p <- makeTilePlot(pm10dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)
p
yLab <- "PM 2.5 (ug/m3)"
pm25dt <- aurnDT[pollutant == "pm2.5"]
Figure 9.1 shows the availability of data over time.
p <- makeTilePlot(pm25dt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)
p
yLab <- "Wind speed (m/s)"
wsdt <- aurnDT[pollutant == "windSpeed"]
Figure 9.2 shows the availability of data over time.
p <- makeTilePlot(wsdt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)
p
yLab <- "Wind direction (deg)"
wddt <- aurnDT[pollutant == "windDirection"]
Figure 9.3 shows the availability of data over time.
p <- makeTilePlot(wddt, xVar = "dateTimeUTC", yVar = "site", fillVar = "value", yLab = yLab)
p
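The eight per-measure blocks above repeat the same subset-and-plot steps. A sketch of the same loop written once, assuming only the argument names already passed to `makeTilePlot()`; `plotAllMeasures()` and `measureLabels` are hypothetical names, and the plotting function is passed in so the sketch does not depend on the author's helper.

```r
library(data.table)

# Axis/legend labels copied from the per-measure blocks above, keyed by the
# `pollutant` values used in aurnDT.
measureLabels <- c(no2 = "Nitrogen Dioxide (ug/m3)",
                   nox = "Oxides of Nitrogen (ug/m3)",
                   so2 = "Sulphur Dioxide (ug/m3)",
                   o3 = "Ozone (ug/m3)",
                   pm10 = "PM 10 (ug/m3)",
                   pm2.5 = "PM 2.5 (ug/m3)",
                   windSpeed = "Wind speed (m/s)",
                   windDirection = "Wind direction (deg)")

# Hypothetical helper: subset `dt` to each measure and call `plotFun` with the
# same arguments the report passes to makeTilePlot(). Returns a named list.
plotAllMeasures <- function(dt, labels, plotFun) {
  plots <- lapply(names(labels), function(m) {
    plotFun(dt[pollutant == m], xVar = "dateTimeUTC", yVar = "site",
            fillVar = "value", yLab = labels[[m]])
  })
  setNames(plots, names(labels))
}
```

With the report's own function this would be `plots <- plotAllMeasures(aurnDT, measureLabels, makeTilePlot)`, after which `plots$no2` should correspond to Figure 4.1.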
Save the long-form data to the data folder.
# add an ordered weekday factor (abbreviated labels) derived from the UTC timestamp
aurnDT[, weekDay := lubridate::wday(dateTimeUTC, label = TRUE, abbr = TRUE)]
f <- here::here("data", "sotonExtract2018_2020.csv")
data.table::fwrite(aurnDT, f)
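The saved file is in long form (one row per site, hour and measure). The modelling dataset described at the start needs one row per site and time, with a column per measure. A sketch of that reshape with data.table's `dcast()`, where `toWide()` is a hypothetical helper and the column names follow `aurnDT`:

```r
library(data.table)

# Hypothetical helper: one row per site and timestamp, one column per measure.
# Combinations with no row in the long data are filled with NA.
toWide <- function(dt) {
  dcast(dt, site + dateTimeUTC ~ pollutant, value.var = "value")
}
```

For example, `sotonWide <- toWide(aurnDT)` would give columns such as `no2`, `nox`, `windSpeed` and `windDirection` side by side. Other columns (e.g. `weekDay`) would need adding to the left-hand side of the formula to be kept.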
Saved data description:
skimr::skim(aurnDT)
Name | aurnDT |
---|---|
Number of rows | 516744 |
Number of columns | 9 |
Column type frequency: | |
character | 5 |
Date | 1 |
factor | 1 |
numeric | 1 |
POSIXct | 1 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
date | 0 | 1 | 20 | 20 | 0 | 26280 | 0 |
code | 0 | 1 | 4 | 4 | 0 | 2 | 0 |
site | 0 | 1 | 15 | 18 | 0 | 2 | 0 |
pollutant | 0 | 1 | 2 | 13 | 0 | 13 | 0 |
source | 0 | 1 | 4 | 4 | 0 | 1 | 0 |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
obsDate | 0 | 1 | 2018-01-02 | 2020-12-31 | 2019-05-29 | 1095 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
weekDay | 0 | 1 | TRUE | 7 | Tue: 74136, Wed: 74040, Thu: 74040, Sun: 73632 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
value | 147233 | 0.72 | 38.83 | 70.44 | -9.4 | 4 | 11.68 | 34.52 | 855.62 | ▇▁▁▁▁ |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
dateTimeUTC | 0 | 1 | 2018-01-02 | 2020-12-31 23:00:00 | 2019-05-29 15:00:00 | 26280 |
Report generated using knitr in RStudio with R version 3.6.3 (2020-02-29) running on x86_64-apple-darwin15.6.0 (Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64).
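t <- proc.time() - myParams$startTime # time since myParams$startTime, set earlier in the report
elapsed <- t[[3]] # third element of proc.time() is elapsed (wall-clock) seconds
Analysis completed in 65.358 seconds (1.09 minutes).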
R packages used:
Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and Michael Quinn. 2017. Skimr: Skimr. https://github.com/ropenscilabs/skimr.
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Garnier, Simon. 2018. Viridis: Default Color Maps from ‘Matplotlib’. https://CRAN.R-project.org/package=viridis.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Zhu, Hao. 2018. KableExtra: Construct Complex Table with ‘Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.