@dataknut
If you wish to use any of the material from this report please cite as:
This work is (c) 2018 the University of Southampton.
This report uses the 'safe' version of the Grid Spy 1 minute data, which has been processed using the code in https://github.com/CfSOtago/GREENGridData/tree/master/dataProcessing/gridSpy. It also assumes you have already run the example circuit extraction script using circuit = Heat Pump.
This work was supported by:
We do not ‘support’ the code but if you notice a problem please check the issues on our repo and if it doesn’t already exist, please open a new one.
Report purpose:
The data used to generate this report is:
First we load the household data. `readr` will give some feedback on the columns.
## Parsed with column specification:
## cols(
## .default = col_integer(),
## linkID = col_character(),
## hasApplianceSummary = col_character(),
## Oven = col_character(),
## `Fridge / Freezer 1` = col_character(),
## `Fridge / Freezer 2` = col_character(),
## `Fridge / Freezer 3` = col_character(),
## Dishwasher = col_character(),
## Microwave = col_character(),
## `Washing Machine` = col_character(),
## `Clothes Dryer` = col_character(),
## `Hot water cylinder` = col_character(),
## `Other Appliance` = col_character(),
## `Electric heater` = col_character(),
## `Heated towel rails` = col_character(),
## `PV Inverter` = col_character(),
## `Energy Storage` = col_character(),
## `Other Generation Device` = col_character(),
## hasLongSurvey = col_character(),
## StartDate = col_character(),
## Q14_1 = col_double()
## # ... with 12 more columns
## )
## See spec(...) for full column specifications.
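The message above is `readr` reporting the column types it guessed. If you would rather declare the types up front (which also silences the message), a minimal sketch follows. The file contents and values are invented for illustration; only the column names `linkID`, `StartDate`, `Q14_1` and `Q57` come from this report.

```r
library(readr)

# Write a tiny stand-in for the household file (all values are invented)
tmpFile <- tempfile(fileext = ".csv")
writeLines(c("linkID,StartDate,Q14_1,Q57",
             "rf_08,2015-03-01,1.5,3",
             "rf_09,2015-03-02,2.0,4"),
           tmpFile)

# Declare the types explicitly instead of relying on readr's guesses
hhSpec <- cols(
  .default = col_integer(),
  linkID = col_character(),
  StartDate = col_character(),
  Q14_1 = col_double()
)
hhToy <- read_csv(tmpFile, col_types = hhSpec)
str(hhToy)
```

Pinning the types this way means a change in the input file shows up as a parsing problem rather than as a silently different column class.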
Next we load the Grid Spy extract for Heat Pump. This uses a `GREENGridData` package function intended to load the cleaned individual household data, which warns that two of the column names are not found. These columns were dropped during the extraction process, so we can safely ignore these warnings.
## Warning: The following named parsers don't match the column names:
## dateTime_orig, TZ_orig
hhID | linkID | r_dateTime | circuit | powerW |
---|---|---|---|---|
rf_08 | rf_08 | 2015-04-01 00:00:00 | Heat Pump$2092 | 0 |
rf_08 | rf_08 | 2015-04-01 00:01:00 | Heat Pump$2092 | 0 |
rf_08 | rf_08 | 2015-04-01 00:02:00 | Heat Pump$2092 | 0 |
rf_08 | rf_08 | 2015-04-01 00:03:00 | Heat Pump$2092 | 0 |
rf_08 | rf_08 | 2015-04-01 00:04:00 | Heat Pump$2092 | 0 |
rf_08 | rf_08 | 2015-04-01 00:05:00 | Heat Pump$2092 | 0 |
Table 3.1 shows the first few rows of the Grid Spy 1 minute power data.
hhID | linkID | r_dateTime | circuit | powerW |
---|---|---|---|---|
Length:14250284 | Length:14250284 | Min. :2015-04-01 00:00:00 | Length:14250284 | Min. : -655.00 |
Class :character | Class :character | 1st Qu.:2015-06-22 12:39:00 | Class :character | 1st Qu.: 0.00 |
Mode :character | Mode :character | Median :2015-09-16 13:12:00 | Mode :character | Median : 0.00 |
NA | NA | Mean :2015-09-21 08:00:39 | NA | Mean : 147.92 |
NA | NA | 3rd Qu.:2015-12-17 17:52:00 | NA | 3rd Qu.: 61.29 |
NA | NA | Max. :2016-03-31 23:59:00 | NA | Max. :27759.00 |
Table 3.2 shows a summary of the Grid Spy 1 minute power data.
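Note that some power values are negative ('negawatts'). Which households have them? The first table below gives the number of negative (NegaW) and non-negative (PosW) observations per household and the second gives the same counts as row percentages.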
##
## NegaW PosW
## rf_08 0 521724
## rf_09 88 152576
## rf_10 0 526678
## rf_11 0 519127
## rf_13 181 1053639
## rf_17a 0 519910
## rf_19 34 1052942
## rf_20 28 102140
## rf_21 0 505028
## rf_25 7 443881
## rf_27 0 497686
## rf_28 0 79033
## rf_29 0 526778
## rf_31 0 526802
## rf_32 0 526665
## rf_33 0 526863
## rf_34 0 526557
## rf_35 0 327860
## rf_36 0 516127
## rf_37 0 526651
## rf_38 54 373608
## rf_40 0 338280
## rf_41 0 223790
## rf_42 0 518064
## rf_43 0 288814
## rf_44 0 526737
## rf_45 0 525994
## rf_46 397951 552976
## rf_47 0 525011
##
## NegaW PosW
## rf_08 0.0 100.0
## rf_09 0.1 99.9
## rf_10 0.0 100.0
## rf_11 0.0 100.0
## rf_13 0.0 100.0
## rf_17a 0.0 100.0
## rf_19 0.0 100.0
## rf_20 0.0 100.0
## rf_21 0.0 100.0
## rf_25 0.0 100.0
## rf_27 0.0 100.0
## rf_28 0.0 100.0
## rf_29 0.0 100.0
## rf_31 0.0 100.0
## rf_32 0.0 100.0
## rf_33 0.0 100.0
## rf_34 0.0 100.0
## rf_35 0.0 100.0
## rf_36 0.0 100.0
## rf_37 0.0 100.0
## rf_38 0.0 100.0
## rf_40 0.0 100.0
## rf_41 0.0 100.0
## rf_42 0.0 100.0
## rf_43 0.0 100.0
## rf_44 0.0 100.0
## rf_45 0.0 100.0
## rf_46 41.8 58.2
## rf_47 0.0 100.0
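The code that produced the two tables above is not shown, so the following is only a sketch of one way to build them. The toy values, the column name `powerSign` and the choice to count zero as `PosW` are assumptions made for illustration.

```r
library(data.table)

# Toy stand-in for the Grid Spy data (values invented for illustration)
toyDT <- data.table(
  linkID = c("hh_a", "hh_a", "hh_a", "hh_b", "hh_b", "hh_b", "hh_b"),
  powerW = c(0, 120, 85, -30, -12, 0, 400)
)

# Flag each observation; zero is treated as non-negative here (an assumption)
toyDT[, powerSign := ifelse(powerW < 0, "NegaW", "PosW")]

# Counts per household, then the same counts as row percentages
counts <- table(toyDT$linkID, toyDT$powerSign)
pct <- round(100 * prop.table(counts, margin = 1), 1)
counts
pct
```

Looking at row percentages as well as counts helps separate households with a handful of stray negative readings from households where a large share of the record is negative.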
hhID | linkID | r_dateTime | circuit | powerW | obsTime | |
---|---|---|---|---|---|---|
Length:13851941 | Length:13851941 | Min. :2015-04-01 00:00:00 | Length:13851941 | Min. : 0.00 | Length:13851941 | Length:13851941 |
Class :character | Class :character | 1st Qu.:2015-06-21 15:43:00 | Class :character | 1st Qu.: 0.00 | Class1:hms | Class :character |
Mode :character | Mode :character | Median :2015-09-16 00:11:00 | Mode :character | Median : 0.00 | Class2:difftime | Mode :character |
NA | NA | Mean :2015-09-20 18:47:53 | NA | Mean : 154.28 | Mode :numeric | NA |
NA | NA | 3rd Qu.:2015-12-17 10:44:00 | NA | 3rd Qu.: 70.19 | NA | NA |
NA | NA | Max. :2016-03-31 23:59:00 | NA | Max. :27759.00 | NA | NA |
Table 3.3 shows a summary of the Grid Spy 1 minute power data after the removal of any negaWatts.
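The removal step itself is not shown. The row counts in Tables 3.2 and 3.3 differ by 14,250,284 - 13,851,941 = 398,343, which equals the sum of the NegaW counts listed above, so the step appears to keep every row with `powerW >= 0`. A minimal sketch under that assumption, using invented toy values:

```r
library(data.table)

# Toy stand-in for the Grid Spy data (values invented for illustration)
toyDT <- data.table(
  linkID = c("hh_a", "hh_a", "hh_b", "hh_b", "hh_b"),
  powerW = c(0, 120, -30, 0, 400)
)

# Keep zero and positive readings, drop negative ones (assumed filter)
nBefore <- nrow(toyDT)
cleanDT <- toyDT[powerW >= 0]
nDropped <- nBefore - nrow(cleanDT)
message("Dropped ", nDropped, " of ", nBefore, " rows")

# The same check applied to the counts reported in this section
negaW <- c(88, 181, 34, 28, 7, 54, 397951)
sum(negaW) == 14250284 - 13851941  # TRUE
```

Reporting the number of dropped rows alongside the filter makes it easy to confirm that nothing other than the negative readings was removed.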
Note that:
First we create a Southern Hemisphere season variable. Luckily we have a function to do this in the `GREENGridData` package. We print a check table to ensure we are all happy with the coding of `season`.
gsDT <- GREENGridData::addNZSeason(gsDT)
table(lubridate::month(gsDT$r_dateTime, label = TRUE), gsDT$season, useNA = "always")
##
## Spring Summer Autumn Winter <NA>
## Jan 0 1018616 0 0 0
## Feb 0 959530 0 0 0
## Mar 0 0 1005004 0 0
## Apr 0 0 1266948 0 0
## May 0 0 1316893 0 0
## Jun 0 0 0 1270440 0
## Jul 0 0 0 1264173 0
## Aug 0 0 0 1209094 0
## Sep 1187018 0 0 0 0
## Oct 1209409 0 0 0 0
## Nov 1104322 0 0 0 0
## Dec 0 1040494 0 0 0
## <NA> 0 0 0 0 0
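For readers without the package to hand, the mapping implied by the check table can be sketched in base R. `nzSeasonFromMonth` is a hypothetical helper written for this illustration; it is not the `GREENGridData::addNZSeason` implementation.

```r
# Hypothetical helper: map a month number (1-12) to a Southern Hemisphere season.
# The month-to-season mapping follows the check table above.
nzSeasonFromMonth <- function(month) {
  season <- rep(NA_character_, length(month))
  season[month %in% c(12, 1, 2)] <- "Summer"
  season[month %in% 3:5] <- "Autumn"
  season[month %in% 6:8] <- "Winter"
  season[month %in% 9:11] <- "Spring"
  # Same level order as the columns of the check table
  factor(season, levels = c("Spring", "Summer", "Autumn", "Winter"))
}

# Mid-month dates avoid any ambiguity at month boundaries
dates <- as.Date(c("2015-01-15", "2015-04-15", "2015-07-15", "2015-10-15"))
months <- as.integer(format(dates, "%m"))
table(month.abb[months], nzSeasonFromMonth(months), useNA = "always")
```

Months that cannot be parsed stay `NA`, which is what the `<NA>` row and column of the check table are there to catch.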
For simplicity we will focus only on Summer and Winter.
This section plots overall mean power per 15 minutes by season.
gsDT <- gsDT[, r_dateTimeQHour := lubridate::floor_date(r_dateTime, unit = "15 mins")]
# create mean power across 15 minute periods to use as base dataset (comparable to SAVE)
qHourDT <- gsDT[, .(meanW = mean(powerW)), keyby = .(r_dateTimeQHour, linkID, season)]
qHourDT <- qHourDT[, obsQHour := hms::as.hms(r_dateTimeQHour)]
# average across households and days to get one profile per season
plotDT <- qHourDT[, .(meanW = mean(meanW)), keyby = .(season, obsQHour)]
# set attributes for plot
vLineAlpha <- 0.4
vLineCol <- "#0072B2" # http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#a-colorblind-friendly-palette
timeBreaks <- c(hms::as.hms("04:00:00"),
hms::as.hms("08:00:00"),
hms::as.hms("12:00:00"),
hms::as.hms("16:00:00"),
hms::as.hms("20:00:00"),
hms::as.hms("24:00:00")
)
# create default caption
myCaption <- paste0("GREENGrid Grid Spy household electricity demand data (https://dx.doi.org/10.5255/UKDA-SN-853334)",
"\n", min(lubridate::date(gsDT$r_dateTime)),
" to ", max(lubridate::date(gsDT$r_dateTime)),
"\nTime = Pacific/Auckland",
"\n (c) ", lubridate::year(now())," University of Otago")
myPlot <- ggplot2::ggplot(plotDT[!is.na(season)], # make sure no un-set seasons/non-parsed dates
aes(x = obsQHour, y = meanW/1000)) +
geom_line() +
facet_grid(season ~ .) +
scale_colour_manual(values=ggParams$cbPalette) + # use colour-blind friendly palette
theme(strip.text.y = element_text(angle = 0, vjust = 0.5, hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) +
guides(colour = guide_legend(title = "Season: ")) +
theme(legend.position = "bottom") +
labs(title = paste0(params$circuit, ": seasonal mean power demand profiles"),
y = "Mean kW per 15 minutes",
x = "Time of day",
caption = myCaption
)
myPlot +
scale_x_time(breaks = timeBreaks) +
geom_vline(xintercept = timeBreaks, alpha = vLineAlpha, colour = vLineCol)
#ggplot2::ggsave(paste0(ggParams$repoLoc,"/examples/outputs/", params$circuit, "_meankWperminBySeason.png"))
Figure 4.1 shows the overall mean kW per 15 minutes in each season for this circuit (Heat Pump).
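The two-stage averaging used for this plot (mean power per household per 15 minutes, then the mean of those values by time of day) can be followed on a toy example. The data below are invented, and the 15-minute flooring is done with plain arithmetic on seconds instead of `lubridate::floor_date` so that only `data.table` is needed.

```r
library(data.table)

# Two invented households, 30 one-minute readings each
start <- as.POSIXct("2015-07-01 18:00:00", tz = "UTC")
toyDT <- data.table(
  linkID = rep(c("hh_a", "hh_b"), each = 30),
  r_dateTime = start + 60 * rep(0:29, times = 2),
  powerW = c(rep(1000, 30), rep(0, 15), rep(2000, 15))
)

# Floor each timestamp to the start of its 15-minute (900 second) period
toyDT[, r_dateTimeQHour := as.POSIXct(floor(as.numeric(r_dateTime) / 900) * 900,
                                      origin = "1970-01-01", tz = "UTC")]

# Stage 1: mean power per household per 15-minute period
qHourToy <- toyDT[, .(meanW = mean(powerW)), keyby = .(r_dateTimeQHour, linkID)]

# Stage 2: mean across households by time of day
qHourToy[, obsQHour := format(r_dateTimeQHour, "%H:%M")]
profileToy <- qHourToy[, .(meanW = mean(meanW)), keyby = .(obsQHour)]
profileToy
```

In the second stage every household-period row counts equally, so households with longer records contribute more rows to the seasonal profile than households with short records.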
Table 4.1 shows the number of households who have different numbers of people (children and adults). This table includes households where we do not know the number of people (NA) but we do have electricity demand data.
Q57 | Freq |
---|---|
NA | 2 |
1 | 4 |
2 | 11 |
3 | 10 |
4 | 10 |
5 | 5 |
6 | 2 |
Clearly this is too fine-grained (too many categories). We therefore collapse it to form the coding shown in Table 4.2.
nPeople | Freq |
---|---|
NA | 2 |
1 | 4 |
2 | 11 |
3 | 10 |
4+ | 17 |
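The recoding code is not shown, but the collapse from Table 4.1 to Table 4.2 can be sketched in base R. The input vector is rebuilt from the frequencies in Table 4.1; the use of `ifelse` is an assumption about how the step might be written.

```r
# Household sizes rebuilt from the frequencies in Table 4.1
Q57 <- rep(c(NA, 1, 2, 3, 4, 5, 6), times = c(2, 4, 11, 10, 10, 5, 2))

# Collapse 4, 5 and 6 into a single "4+" category; NA stays NA
nPeople <- ifelse(Q57 >= 4, "4+", as.character(Q57))
table(nPeople, useNA = "always")
```

The resulting counts match Table 4.2: the 10 + 5 + 2 households with four or more people become the 17 households coded `4+`.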
Now we link (join) the Grid Spy and household data.tables and aggregate (summarise) by season and number of people. You can do this using `data.table`'s on-the-fly join, but we have found that pre-joining only the columns you want is much faster. We're not sure why, as it shouldn't be. You can probably also do this in `dplyr`, but we haven't tried.
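A minimal sketch of the join-then-aggregate step on invented toy data is given below. The table and column names mirror those used in this report, but the values, the household `hh_c` and the explicit `on = "linkID"` argument are assumptions for illustration.

```r
library(data.table)

# Toy 15-minute data for two monitored households (values invented)
qHourToy <- data.table(
  linkID = c("hh_a", "hh_a", "hh_b", "hh_b"),
  season = c("Summer", "Winter", "Summer", "Winter"),
  meanW = c(100, 900, 50, 450)
)

# Toy household table: hh_c has survey data but no power data
hhToy <- data.table(
  linkID = c("hh_a", "hh_b", "hh_c"),
  nPeople = c("2", "4+", "4+")
)

# Select only the columns to carry across, then join on an explicit column
keepCols <- c("linkID", "nPeople")
mergedToy <- qHourToy[hhToy[, ..keepCols], on = "linkID"]

# hh_c has no power rows, so its season and meanW are NA after the join
mergedToy[is.na(meanW)]

# Aggregate rows that have power data and count the households behind each group
aggToy <- mergedToy[!is.na(meanW),
                    .(meanW = mean(meanW), nHHs = uniqueN(linkID)),
                    keyby = .(season, nPeople)]
aggToy
```

Households that appear in the survey table but have no data for the chosen circuit drop out at the aggregation step, which is why the number of households behind a plot can be smaller than the number in the household table.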
Figure 4.2 shows the mean kW per minute per season by number of people for this circuit (Heat Pump). Can you see anything interesting or unusual, and might this be due to the numbers of households in each group?
nPeople | nHHs |
---|---|
1 | 2 |
2 | 4 |
3 | 8 |
4+ | 12 |
This section plots overall mean power per 15 minutes by season and number of children aged 0-12 as an illustration of how to link the Grid Spy and household data. We will go through the steps with commentary, showing the code as we go.
Table 4.4 shows the number of households who have different numbers of children aged 0-12 so we know how many households make up each line on the plot. This table includes households where we do not know the number of children (NA) but we do have electricity demand data.
nChildren0_12 | Freq |
---|---|
NA | 2 |
0 | 17 |
1 | 11 |
2 | 10 |
3 | 4 |
presenceChildren | Freq |
---|---|
0 children | 19 |
1+ child | 25 |
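The code for this recoding is not shown. The counts above are consistent with the two households of unknown size (NA) ending up in the `0 children` group, since 17 + 2 = 19. The sketch below rebuilds the input from the frequencies in Table 4.4 and contrasts a coding that would give that result with one that keeps unknowns as NA; both codings are assumptions for illustration.

```r
# Numbers of children rebuilt from the frequencies in Table 4.4
nChildren0_12 <- rep(c(NA, 0, 1, 2, 3), times = c(2, 17, 11, 10, 4))

# Coding A: default to "0 children", then overwrite where children are reported.
# Households with NA silently stay in "0 children".
presenceA <- rep("0 children", length(nChildren0_12))
presenceA[!is.na(nChildren0_12) & nChildren0_12 > 0] <- "1+ child"
table(presenceA, useNA = "always")

# Coding B: unknowns stay NA and can be dropped or reported separately
presenceB <- ifelse(nChildren0_12 > 0, "1+ child", "0 children")
table(presenceB, useNA = "always")
```

Which coding is preferable depends on the analysis, but printing the table with `useNA = "always"` makes the treatment of unknowns visible either way.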
Now use the aggregated `data.table` to make the plot. Note that the code below excludes households where nChildren0_12 is NA (see Table 4.4); without the `!is.na(nChildren0_12)` filter the plot would gain a line for them.
keepCols <- c("linkID", "nChildren0_12")
mergedDT <- qHourDT[hhDT[, ..keepCols], on = "linkID"] # join explicitly on linkID rather than relying on keys
plotDT <- mergedDT[!is.na(nChildren0_12), .(meanW = mean(meanW),
sdW = sd(meanW),
nObs = .N), keyby = .(season, obsQHour, nChildren0_12)]
plotDT <- plotDT[, ci_upper := meanW + qnorm(0.975)*(sdW/sqrt(nObs))]
plotDT <- plotDT[, ci_lower := meanW - qnorm(0.975)*(sdW/sqrt(nObs))]
basePlot <- ggplot2::ggplot(plotDT[!is.na(season)], # make sure no un-set seasons/non-parsed dates
aes(x = obsQHour, y = meanW/1000,
colour = as.factor(nChildren0_12))) +
geom_line() +
scale_colour_manual(values=ggParams$cbPalette) + # use colour-blind friendly palette
facet_grid(season ~ .) +
theme(strip.text.y = element_text(angle = 0, vjust = 0.5, hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) +
guides(colour = guide_legend(title = "Number of children aged 0 - 12: ")) +
theme(legend.position = "bottom") +
labs(title = paste0(params$circuit, ": seasonal mean power demand profiles by n children aged 0-12"),
y = "Mean kW per 15 minutes",
x = "Time of day",
caption = myCaption
)
basePlot <- basePlot +
scale_x_time(breaks = timeBreaks) +
geom_vline(xintercept = timeBreaks, alpha = vLineAlpha, colour = vLineCol)
basePlot
# add 95% CI
ciPlot <- basePlot + geom_errorbar(aes(ymin = ci_lower/1000, ymax = ci_upper/1000))
ciPlot
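# repeat using the collapsed presenceChildren coding
keepCols <- c("linkID", "presenceChildren")
mergedDT <- qHourDT[hhDT[, ..keepCols], on = "linkID"] # join explicitly on linkID rather than relying on keys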
plotDT <- mergedDT[!is.na(presenceChildren), .(meanW = mean(meanW),
sdW = sd(meanW),
nObs = .N), keyby = .(season, obsQHour, presenceChildren)]
plotDT <- plotDT[, ci_upper := meanW + qnorm(0.975)*(sdW/sqrt(nObs))]
plotDT <- plotDT[, ci_lower := meanW - qnorm(0.975)*(sdW/sqrt(nObs))]
basePlot <- ggplot2::ggplot(plotDT[!is.na(season)], # make sure no un-set seasons/non-parsed dates
aes(x = obsQHour, y = meanW/1000,
colour = as.factor(presenceChildren))) +
geom_line() +
scale_colour_manual(values=ggParams$cbPalette) + # use colour-blind friendly palette
facet_grid(season ~ .) +
theme(strip.text.y = element_text(angle = 0, vjust = 0.5, hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) +
guides(colour = guide_legend(title = "Children aged 0 - 12: ")) +
theme(legend.position = "bottom") +
labs(title = paste0(params$circuit, ": seasonal mean power demand profiles by presence of children aged 0-12"),
y = "Mean kW per 15 minutes",
x = "Time of day",
caption = myCaption
)
basePlot <- basePlot +
scale_x_time(breaks = timeBreaks) +
geom_vline(xintercept = timeBreaks, alpha = vLineAlpha, colour = vLineCol)
basePlot
# add 95% CI
ciPlot <- basePlot + geom_errorbar(aes(ymin = ci_lower/1000, ymax = ci_upper/1000))
ciPlot
# actual number of households used in the plot (avoid naming the table t, which masks base::t)
nHHsDT <- mergedDT[!is.na(presenceChildren) & !is.na(season) , .(nHHs = uniqueN(linkID)), keyby = .(presenceChildren)]
knitr::kable(nHHsDT, caption = "Actual n households used in plot")
presenceChildren | nHHs |
---|---|
0 children | 8 |
1+ child | 20 |
Figure 4.3 shows the mean kW per 15 minutes per season by presence of young children for this circuit (Heat Pump). Can you see anything interesting or unusual, and might this be due to the numbers of households in each group?
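The error bars in the code above are built as the mean plus or minus `qnorm(0.975)` standard errors, where the standard error is `sdW / sqrt(nObs)`. The arithmetic can be checked on a small invented vector, here compared with the t-based interval that is more usual for small samples.

```r
# Invented 15-minute mean power values (W) for one plotted point
meanW <- c(100, 200, 300, 400, 500)
nObs <- length(meanW)

# Standard error of the mean
se <- sd(meanW) / sqrt(nObs)

# Normal-approximation 95% interval, as used for the error bars
ciNormal <- mean(meanW) + c(-1, 1) * qnorm(0.975) * se

# t-based 95% interval for comparison; wider when nObs is small
ciT <- mean(meanW) + c(-1, 1) * qt(0.975, df = nObs - 1) * se

round(rbind(normal = ciNormal, t = ciT), 1)
```

With the very large `nObs` in this report the two intervals are practically identical. Note, though, that `nObs` counts household-period rows, so repeated observations from the same household are treated as independent and the intervals are likely to be narrower than the number of households alone would justify.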
Analysis completed in 62.5 seconds ( 1.04 minutes) using knitr in RStudio with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kableExtra_0.9.0 GREENGridData_1.0 bookdown_0.7 rmarkdown_1.10
## [5] readr_1.1.1 ggplot2_3.1.0 lubridate_1.7.4 data.table_1.11.8
## [9] GREENGrid_0.1.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.19 highr_0.7 cellranger_1.1.0
## [4] compiler_3.5.1 pillar_1.3.0 plyr_1.8.4
## [7] bindr_0.1.1 prettyunits_1.0.2 progress_1.2.0
## [10] tools_3.5.1 digest_0.6.18 viridisLite_0.3.0
## [13] evaluate_0.12 tibble_1.4.2 gtable_0.2.0
## [16] pkgconfig_2.0.2 rlang_0.3.0.1 rstudioapi_0.8
## [19] yaml_2.2.0 xfun_0.4 bindrcpp_0.2.2
## [22] xml2_1.2.0 httr_1.3.1 withr_2.1.2
## [25] stringr_1.3.1 dplyr_0.7.7 knitr_1.20
## [28] hms_0.4.2 rprojroot_1.3-2 grid_3.5.1
## [31] tidyselect_0.2.5 glue_1.3.0 R6_2.3.0
## [34] readxl_1.1.0 reshape2_1.4.3 purrr_0.2.5
## [37] magrittr_1.5 backports_1.1.2 scales_1.0.0
## [40] htmltools_0.3.6 rvest_0.3.2 assertthat_0.2.0
## [43] colorspace_1.3-2 labeling_0.3 stringi_1.2.4
## [46] lazyeval_0.2.1 munsell_0.5.0 crayon_1.3.4
Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, and Winston Chang. 2018. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.
Anderson, Ben, and David Eyers. 2018. GREENGridData: Processing Nz Green Grid Project Data to Create a ’Safe’ Version for Data Archiving and Re-Use. https://github.com/CfSOtago/GREENGridData.
Csárdi, Gábor, and Rich FitzJohn. 2016. Progress: Terminal Progress Bars. https://CRAN.R-project.org/package=progress.
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.
Wickham, Hadley, and Jennifer Bryan. 2017. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Wickham, Hadley, and Romain Francois. 2016. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.
Xie, Yihui. 2016a. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman; Hall/CRC. https://github.com/rstudio/bookdown.
———. 2016b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.
Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.