1 About

1.1 Contributions

Please note that authorship is alphabetical. Contributions are listed below; see GitHub for details and who to blame for what :-).

Ben Anderson (@dataknut)

1.3 Citation

If you wish to refer to any of the material from this report please cite as:

  • Anderson, B. (2019) Air Quality in New Zealand: Exploring “official” data, University of Southampton: Southampton, UK.

Report circulation:

  • Public.

Report purpose:

  • to explore official New Zealand Air Quality data

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 700386 (SPATIALEC).

This work is (c) 2019 the University of Southampton.

2 Introduction

LAWA (Land, Air, Water Aotearoa) appears to hold sub-daily data, since it uses it to produce near real-time plots and reports; see https://www.lawa.org.nz/explore-data/otago-region/air-quality/alexandra/alexandra-at-5-ventry-street/

It is unclear how this data can be accessed, so for now we have used the Ministry for the Environment (MfE) data.

3 PM 10 data

The PM10 data has more sensors and wider coverage than the PM2.5 data (93 sites across 15 councils, against 35 sites across 9).

Data source: https://data.mfe.govt.nz/data/category/air/

Data file: mfe-pm10-concentrations-200617-CSV/pm10-concentrations-200617.csv

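# build the full path to the PM10 file (dPath and pm10File are defined elsewhere)
df <- paste0(dPath, pm10File)

pm10dt <- data.table::fread(df)
# parse the date string into a Date column
pm10dt[, ba_date := lubridate::as_date(date)]
# the data is daily but there may be gaps?
# site id combining council and site
pm10dt[, council.site := paste0(council, ".", site)]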

Overall there are:

  • 93 sites spread over
  • 15 councils
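
The site and council counts above can be reproduced by counting distinct values. This is a minimal sketch on toy data, assuming the data.table package is installed; `toyDT` and its values are hypothetical stand-ins for `pm10dt`, not the MfE data.

```r
library(data.table)

# Toy stand-in for pm10dt (hypothetical rows, for illustration only)
toyDT <- data.table(
  council = c("ORC", "ORC", "ORC", "ECAN", "ECAN"),
  site    = c("Alexandra", "Alexandra", "Dunedin", "Timaru", "Timaru"),
  pm10    = c(12.1, 30.4, 8.7, 22.0, 19.5)
)
toyDT[, council.site := paste0(council, ".", site)]

# uniqueN() counts the distinct values in a vector
nSites    <- uniqueN(toyDT$council.site)
nCouncils <- uniqueN(toyDT$council)

print(c(sites = nSites, councils = nCouncils))
```

Applied to the real table, the same two `uniqueN()` calls on `pm10dt` should give the figures quoted in the list.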
# looks like daily data with gaps
p <- makeTilePlot(pm10dt[council.site %like% "ORC"], yvar = "pm10", byvar = "council.site")
p + labs(y = "pm10") + guides(fill = guide_legend(title = "pm10"))

Figure 3.1: Test data values by date and site (Otago RC)
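
The gaps visible in the tile plot can also be counted directly. The sketch below compares the number of days each site's record spans with the number of days actually observed. It uses hypothetical toy data and assumes the data.table package; `toyDT` and `gapsDT` are illustrative names, not objects from the report.

```r
library(data.table)

# Toy stand-in: site ORC.A is missing 3 and 4 January (hypothetical dates)
toyDT <- data.table(
  council.site = c("ORC.A", "ORC.A", "ORC.A", "ORC.B", "ORC.B"),
  ba_date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-05",
                      "2016-01-01", "2016-01-02"))
)

# Per site: days between first and last observation (inclusive)
# and the number of distinct days observed
gapsDT <- toyDT[, .(
  nDaysSpanned  = as.integer(max(ba_date) - min(ba_date)) + 1L,
  nDaysObserved = uniqueN(ba_date)
), keyby = council.site]

# Missing days are the difference between the two
gapsDT[, nDaysMissing := nDaysSpanned - nDaysObserved]

print(gapsDT)
```

This only counts gaps between a site's first and last observation; it says nothing about periods before a sensor was installed or after it was removed.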

st <- pm10dt[, .(mean_PM10 = mean(pm10), min_PM10 = min(pm10), max_PM10 = max(pm10), 
    nObs = .N, startDate = min(ba_date), endDate = max(ba_date)), keyby = .(council)]

kableExtra::kable(st, digits = 2, caption = "Summary statistics for PM10 by Council") %>% 
    kable_styling()
Table 3.1: Summary statistics for PM10 by Council
council mean_PM10 min_PM10 max_PM10 nObs startDate endDate
AC 14.63 1.28 275.95 41471 2006-01-02 2016-12-31
BOPRC 17.16 -0.82 433.75 15931 2006-01-02 2016-12-31
ECAN 20.70 -0.55 191.53 36549 2006-01-02 2017-09-03
GDC 13.11 -10.00 75.30 1671 2006-04-18 2017-11-20
GWRC 12.04 0.07 93.75 14944 2006-01-02 2016-12-31
HBRC 16.04 -0.60 86.22 6346 2006-01-02 2017-11-01
HRC 14.35 2.92 105.48 6703 2006-07-05 2017-12-31
MDC 16.17 -8.00 82.70 3833 2006-06-24 2017-11-25
NCC 18.39 -1.00 116.00 10359 2006-01-02 2017-12-31
NRC 13.84 -0.20 94.91 5692 2006-05-04 2016-12-31
ORC 23.39 0.00 203.21 21712 2006-01-02 2017-08-31
SRC 19.88 0.54 141.68 7920 2006-05-07 2016-12-31
TDC 19.99 3.00 133.00 3920 2006-01-02 2016-12-31
WCRC 20.08 0.00 129.06 3721 2006-05-03 2017-11-19
WRC 14.02 -7.43 135.47 29192 2006-01-02 2016-12-31

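Table 3.1 shows negative minimum values for eight councils (BOPRC, ECAN, GDC, HBRC, MDC, NCC, NRC and WRC), so at least some daily values are below zero. Why this happens is not yet clear.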
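
One way to gauge how widespread the negative values are is to count them per council. This is a minimal sketch on hypothetical toy data, assuming the data.table package; `toyDT` and `negDT` are illustrative names only.

```r
library(data.table)

# Toy stand-in for pm10dt (hypothetical values, for illustration only)
toyDT <- data.table(
  council = c("GDC", "GDC", "ORC", "ORC", "WRC"),
  pm10    = c(-10, 13.1, 0, 23.4, -7.43)
)

# Per council: number of observations, number below zero, and their share
negDT <- toyDT[, .(
  nObs       = .N,
  nNegative  = sum(pm10 < 0),
  pcNegative = 100 * mean(pm10 < 0)
), keyby = council]

print(negDT)
```

A count like this would show whether the negative values are isolated readings or a systematic feature of particular councils.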

Figure 3.2 shows daily values for all sites and indicates those that cross the WHO and NZ daily PM10 thresholds (shown in red).

# daily values for all sites, one panel per council
p <- makeLinePlot(pm10dt, yvar = "pm10", byvar = "council.site")

# add the WHO and NZ daily thresholds as red horizontal lines
p <- p + labs(y = "pm10", caption = "NZ/WHO threshold shown in red") + 
    geom_hline(yintercept = dailyPm10Threshold_WHO, colour = "red") + 
    geom_hline(yintercept = dailyPm10Threshold_NZ, colour = "red") + 
    guides(colour = guide_legend(title = "pm10")) + 
    theme(legend.position = "bottom") + 
    facet_grid(council ~ .)

p

Figure 3.2: Test data values by date and site

plotly::ggplotly(p)
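
The plot shows exceedances visually; they can also be tabulated per site. The sketch below counts the days on which a site exceeds a threshold. It assumes the data.table package and uses hypothetical toy data. `toyThreshold` is a placeholder value chosen for illustration: the report's own `dailyPm10Threshold_NZ` and `dailyPm10Threshold_WHO` are defined elsewhere and are not reproduced here.

```r
library(data.table)

# Placeholder threshold for illustration only (an assumption, not the
# value of dailyPm10Threshold_NZ or dailyPm10Threshold_WHO)
toyThreshold <- 50

# Toy stand-in for pm10dt (hypothetical values)
toyDT <- data.table(
  council.site = c("ORC.A", "ORC.A", "ORC.A", "ECAN.B", "ECAN.B"),
  pm10         = c(20, 55, 80, 49.9, 50)
)

# Per site: number of days observed and number strictly above the threshold
exceedDT <- toyDT[, .(
  nDays   = .N,
  nExceed = sum(pm10 > toyThreshold)
), keyby = council.site]

print(exceedDT)
```

Whether a value exactly equal to the threshold counts as an exceedance is a design choice; this sketch uses a strict comparison.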


4 PM 2.5 data

The PM2.5 data has fewer sensors and less coverage than the PM10 data.

Data source: https://data.mfe.govt.nz/data/category/air/

Data file: mfe-pm25-concentrations-200817-CSV/pm25-concentrations-200817.csv

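# build the full path to the PM2.5 file (dPath and pm2.5File are defined elsewhere)
df <- paste0(dPath, pm2.5File)

pm2.5dt <- data.table::fread(df)
# parse the date string into a Date column
pm2.5dt[, ba_date := lubridate::as_date(date)]
# the data is daily but there may be gaps?
# site id combining council and site
pm2.5dt[, council.site := paste0(council, ".", site)]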

Overall there are:

  • 35 sites spread over
  • 9 councils
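
To see how the two networks overlap, the site identifiers from each dataset can be compared with set operations. This is a minimal base R sketch. The two vectors are hypothetical stand-ins for `unique(pm10dt$council.site)` and `unique(pm2.5dt$council.site)`, and the variable names are illustrative.

```r
# Toy stand-ins for the distinct council.site values in each dataset
toyPm10Sites <- c("AC.Penrose", "AC.Takapuna", "ORC.Dunedin", "TDC.Richmond")
toyPm25Sites <- c("AC.Penrose", "AC.POAL", "TDC.Richmond")

# Sites that report both pollutants
both <- intersect(toyPm10Sites, toyPm25Sites)

# Sites that appear only in the PM2.5 data
pm25Only <- setdiff(toyPm25Sites, toyPm10Sites)

print(both)
print(pm25Only)
```

Matching is by exact string, so any spelling difference between the two files (for example Putamahoe in the PM10 site table and Patumahoe in the PM2.5 one) would show up as a non-match.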
# looks like daily data with gaps
p <- makeTilePlot(pm2.5dt, yvar = "pm2_5", byvar = "council.site")
p + labs(y = "pm2_5") + guides(fill = guide_legend(title = "pm2_5"))

Figure 4.1: Test data values by date and site

st <- pm2.5dt[, .(mean_PM2_5 = mean(pm2_5), min_PM2_5 = min(pm2_5), max_PM2_5 = max(pm2_5), 
    nObs = .N, startDate = min(ba_date), endDate = max(ba_date)), keyby = .(council)]

kableExtra::kable(st, caption = "Summary statistics for PM2.5 by Council") %>% kable_styling()
Table 4.1: Summary statistics for PM2.5 by Council
council mean_PM2_5 min_PM2_5 max_PM2_5 nObs startDate endDate
AC 5.718759 0.32478 71.92000 15340 2008-01-01 2016-12-31
ECAN 10.782545 0.00000 119.20000 10580 2010-12-14 2017-09-03
GWRC 9.086482 0.09300 94.45000 4782 2011-01-28 2016-12-31
HBRC 8.018358 0.12000 68.35000 895 2016-06-30 2017-11-01
MDC 14.834513 1.56000 64.18000 339 2016-12-21 2017-11-25
NCC 12.395870 0.14000 180.00000 523 2008-07-02 2017-11-22
NRC 6.335671 2.22091 20.24834 155 2016-07-30 2016-12-31
TDC 10.171622 0.20000 46.00000 370 2015-10-01 2016-12-29
WRC 9.919644 -0.43056 65.48847 766 2013-05-17 2016-09-07

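The PM2.5 summary table also shows a negative minimum value, in this case for WRC only (-0.43).

Figure 4.2 (below) shows daily values for the ECAN and ORC sites and indicates those that cross the WHO daily PM2.5 threshold (shown in red). Note that the summary table above lists no ORC sites in the PM2.5 data, so in practice only ECAN sites are plotted.

NZ has yet to set a PM2.5 exposure threshold.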

# daily values for ECAN and ORC sites only, one panel per site
p <- makeLinePlot(pm2.5dt[council.site %like% "ORC" | council.site %like% "ECAN"], 
    yvar = "pm2_5", byvar = "council.site")

# add the WHO daily threshold as a red horizontal line
p <- p + labs(y = "pm2_5", caption = "WHO threshold shown in red") + 
    geom_hline(yintercept = dailyPm2.5Threshold_WHO, colour = "red") + 
    guides(colour = guide_legend(title = "pm2_5")) + 
    theme(legend.position = "bottom") + 
    facet_grid(council.site ~ .)

p

Figure 4.2: Test data values by date and site for ECAN & ORC

plotly::ggplotly(p)


5 Statistical Annex

5.1 PM10

skimr::skim(pm10dt)
Table 5.1: Data summary
Name pm10dt
Number of rows 209964
Number of columns 10
_______________________
Column type frequency:
character 5
Date 1
logical 3
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
site 0 1 4 28 0 93 0
date 0 1 19 19 0 4382 0
council 0 1 2 5 0 15 0
method 0 1 3 4 0 2 0
council.site 0 1 8 34 0 93 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
ba_date 0 1 2006-01-02 2017-12-31 2011-12-27 4382

Variable type: logical

skim_variable n_missing complete_rate mean count
complete_for_trend 1152 0.99 0.64 TRU: 134660, FAL: 74152
complete_for_mean 1152 0.99 0.74 TRU: 154552, FAL: 54260
complete_year 74218 0.65 1.00 TRU: 135746

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
pm10 0 1 17.12 12.83 -10 9.9 14 20.22 433.75 ▇▁▁▁▁
# number of observations per site
t <- pm10dt[, .(nObs = .N), keyby = .(council, site)]

kableExtra::kable(t, caption = "N obs at sites (PM10)") %>% kable_styling()
Table 5.2: N obs at sites (PM10)
council site nObs
AC BeachlandsMobile 340
AC Botany 1229
AC BotanyDowns 1813
AC GlenEden 3975
AC Helensville 342
AC Henderson 3951
AC KhyberPass 1172
AC KhyberPassRoada 1701
AC Kingsland 602
AC Kumeu 2188
AC MobilePOAL 1327
AC MountEdenIIb 18
AC Orewa 2439
AC Pakuranga 3979
AC Penrose 2114
AC PenroseIIb 1797
AC PenroseIVd 15
AC Pukekohe 802
AC Putamahoe 3944
AC Takapuna 3954
AC Waiheke 663
AC Waiuku 297
AC Warkworth 570
AC Whangaparaoa 2239
BOPRC Rotorua_at_Edmund_Road 3410
BOPRC Rotorua_at_Fenton_St 328
BOPRC Rotorua_at_Ngapuna 2279
BOPRC Rotorua_at_Pererika_Street 1947
BOPRC Tauranga_at_Morland_Fox_Park 2367
BOPRC Tauranga_at_Otumoetai 3291
BOPRC Whakatane_at_Kopeopeo 2309
ECAN Ashburton 4213
ECAN ChRiccRd 606
ECAN ChStA 4208
ECAN ChWoolston 4201
ECAN Geraldine 3761
ECAN Kaiapoi 4120
ECAN Rangiora 4138
ECAN Timaru 4201
ECAN Waimate 3738
ECAN Washdyke 3363
GDC GisborneBoysHigh 1671
GWRC LowerHutt 2180
GWRC MastertonEast 1596
GWRC MastertonWest 3425
GWRC UpperHutt 3966
GWRC Wainuiomata 3777
HBRC Awatoto 2043
HBRC Marewa_Park 4303
HRC Taihape 4039
HRC Taumarunui 2664
MDC Blenheim 3833
NCC Airshed_A 4344
NCC Airshed_B1 3974
NCC Airshed_B2 972
NCC Airshed_C 935
NCC Airshed_C2 134
NRC Kaitaia 349
NRC Kerikeri 323
NRC RobertSt 3571
NRC Ruakaka 1449
ORC Alexandra 3744
ORC Arrowtown 2772
ORC Balclutha 1459
ORC Clyde 1893
ORC Cromwell 1931
ORC Dunedin 3429
ORC Lawrence 536
ORC Milton 1765
ORC Mosgiel 2971
ORC Naseby 97
ORC Oamaru 474
ORC Palmerston 318
ORC Ranfurly 231
ORC Roxburgh 92
SRC Gore 3850
SRC Invercargill 2806
SRC Winton 1264
TDC Richmond 3920
WCRC Reefton 3721
WRC Cambridge 1129
WRC Hamilton_Claudelands 975
WRC Hamilton_Ohaupo_Rd 1723
WRC Hamilton_Peachgrove_Rd 2833
WRC Matamata 2626
WRC Morrinsville 526
WRC Putaruru 3731
WRC Taupo 3542
WRC Te_Awamutu 1199
WRC Te_Kuiti 3976
WRC Thames 263
WRC Tokoroa 3977
WRC Turangi 2692

5.2 PM2.5

skimr::skim(pm2.5dt)
Table 5.3: Data summary
Name pm2.5dt
Number of rows 33750
Number of columns 10
_______________________
Column type frequency:
character 5
Date 1
logical 3
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
site 0 1 4 22 0 35 0
date 0 1 10 10 0 3617 0
council 0 1 2 4 0 9 0
method 0 1 3 3 0 1 0
council.site 0 1 7 26 0 35 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
ba_date 0 1 2008-01-01 2017-11-25 2014-03-05 3617

Variable type: logical

skim_variable n_missing complete_rate mean count
complete_for_trend 0 1.00 0.32 FAL: 22856, TRU: 10894
complete_for_mean 0 1.00 0.56 TRU: 18763, FAL: 14987
complete_year 14594 0.57 1.00 TRU: 19156

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
pm2_5 0 1 8.19 7.89 -0.43 4.1 5.79 8.69 180 ▇▁▁▁▁
# number of observations per site
t <- pm2.5dt[, .(nObs = .N), keyby = .(council, site)]

kableExtra::kable(t, caption = "N obs at sites (PM2.5)") %>% kable_styling()
Table 5.4: N obs at sites (PM2.5)
council site nObs
AC BeachlandsMobile 219
AC Helensville 341
AC MobilePOAL 1024
AC POAL 303
AC Patumahoe 3109
AC Penrose 2129
AC PenroseIIb 1090
AC Pukekohe 352
AC Takapuna 3244
AC Waiheke 670
AC Waiuku 296
AC Warkworth 316
AC Whangaparaoa 2247
ECAN Ashburton 703
ECAN ChRiccRd 186
ECAN ChStA 2412
ECAN ChWoolston 2116
ECAN Geraldine 561
ECAN Kaiapoi 361
ECAN Rangiora 742
ECAN Timaru 2003
ECAN Waimate 705
ECAN Washdyke 791
GWRC MastertonEast 1032
GWRC MastertonWest 2066
GWRC Wainuiomata 1684
HBRC Awatoto 409
HBRC St_John_s 486
MDC Blenheim 339
NCC Airshed_A 523
NRC RobertSt 155
TDC Richmond 370
WRC Hamilton_Claudelands 336
WRC Hamilton_Peachgrove_Rd 43
WRC Tokoroa 387

6 Runtime

Report generated using knitr in RStudio with R version 3.6.2 (2019-12-12) running on x86_64-apple-darwin15.6.0 (Darwin Kernel Version 17.7.0: Sun Dec 1 19:19:56 PST 2019; root:xnu-4570.71.63~1/RELEASE_X86_64).

t <- proc.time() - startTime

elapsed <- t[[3]]

Analysis completed in 40.306 seconds (0.67 minutes).
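
The timing line above can be produced from the elapsed value along the following lines. This is a self-contained base R sketch: the short computation stands in for the analysis, and `msg` and the `sprintf()` format are assumptions about how the sentence is built, not the report's actual code.

```r
# Record the start time (the report sets startTime before the analysis runs)
startTime <- proc.time()

# Stand-in workload for the analysis itself
invisible(sum(sqrt(seq_len(1e5))))

# The third element of a proc.time() difference is elapsed wall-clock seconds
t <- proc.time() - startTime
elapsed <- t[[3]]

# Report seconds and the equivalent in minutes
msg <- sprintf("Analysis completed in %.3f seconds (%.2f minutes).",
               elapsed, elapsed / 60)
print(msg)
```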

R packages used:

  • data.table - (Dowle et al. 2015)
  • ggplot2 - (Wickham 2009)
  • here - (Müller 2017)
  • kableExtra - (Zhu 2018)
  • lubridate - (Grolemund and Wickham 2011)
  • plotly - (Sievert et al. 2016)
  • skimr - (Arino de la Rubia et al. 2017)

References

Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and Michael Quinn. 2017. Skimr: Skimr. https://github.com/ropenscilabs/skimr.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.