1 About

1.1 Report circulation:

  • Public - analysis for use in ICEERB 2018 presentation.

1.2 License

1.3 Citation

If you wish to use any of the material from this report please cite as:

  • Anderson, B. (2018) GREENGrid Household Electricity Demand Data Circuit Extract Test: Heat Pump. Centre for Sustainability, University of Otago: Dunedin, New Zealand.

This work is (c) 2018 the University of Southampton.

1.4 History

1.5 Requirements:

This report uses the 'safe' version of the Grid Spy 1 minute data, which has been processed using the code in https://github.com/CfSOtago/GREENGridData/tree/master/dataProcessing/gridSpy. It also assumes you have already run the example circuit extraction script with circuit = Heat Pump.

1.6 Support

This work was supported by:

We do not ‘support’ the code, but if you notice a problem please check the issues on our repo and, if it has not already been reported, open a new one.

2 Introduction

Report purpose:

3 Load data

The data used to generate this report is:

  • /Users/ben/Dropbox/Work/Otago_CfS_Ben/data/nzGREENGrid/dataExtracts/Heat Pump_2015-04-01_2016-03-31_observations.csv.gz
  • /Users/ben/Dropbox/Work/Otago_CfS_Ben/data/nzGREENGrid/ggHouseholdAttributesSafe.csv

First we load the household data. readr will give some feedback on the columns.

## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   linkID = col_character(),
##   hasApplianceSummary = col_character(),
##   Oven = col_character(),
##   `Fridge / Freezer 1` = col_character(),
##   `Fridge / Freezer 2` = col_character(),
##   `Fridge / Freezer 3` = col_character(),
##   Dishwasher = col_character(),
##   Microwave = col_character(),
##   `Washing Machine` = col_character(),
##   `Clothes Dryer` = col_character(),
##   `Hot water cylinder` = col_character(),
##   `Other Appliance` = col_character(),
##   `Electric heater` = col_character(),
##   `Heated towel rails` = col_character(),
##   `PV Inverter` = col_character(),
##   `Energy Storage` = col_character(),
##   `Other Generation Device` = col_character(),
##   hasLongSurvey = col_character(),
##   StartDate = col_character(),
##   Q14_1 = col_double()
##   # ... with 12 more columns
## )
## See spec(...) for full column specifications.
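As a minimal sketch of this loading step in base R (the report itself uses readr), the snippet below writes a tiny gzipped CSV to a temporary file and reads it back while forcing linkID to be character. The file contents and the nPeople column are made up for illustration and are not the real household attributes.

```r
# Toy stand-in for the household attributes file (illustrative values only)
toy <- data.frame(linkID = c("rf_08", "rf_09"), nPeople = c(2L, 4L))

# Write a gzip-compressed CSV to a temporary location
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads gzip-compressed files directly; colClasses fixes the column types
hhDF <- read.csv(tmp, colClasses = c(linkID = "character", nPeople = "integer"))
str(hhDF)
```

Setting the column types explicitly avoids relying on type guessing, which matters for identifier columns such as linkID.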

Next we load the Grid Spy extract for Heat Pump. This uses a GREENGridData package function intended to load the cleaned individual household data, so it warns that two of the column names are not found. These columns were dropped during the extraction process, so we can safely ignore the warning.

## Warning: The following named parsers don't match the column names:
## dateTime_orig, TZ_orig
Table 3.1: First few rows of grid spy data

hhID   linkID  r_dateTime           circuit         powerW
rf_08  rf_08   2015-04-01 00:00:00  Heat Pump$2092  0
rf_08  rf_08   2015-04-01 00:01:00  Heat Pump$2092  0
rf_08  rf_08   2015-04-01 00:02:00  Heat Pump$2092  0
rf_08  rf_08   2015-04-01 00:03:00  Heat Pump$2092  0
rf_08  rf_08   2015-04-01 00:04:00  Heat Pump$2092  0
rf_08  rf_08   2015-04-01 00:05:00  Heat Pump$2092  0

Table 3.1 shows the first few rows of the Grid Spy 1 minute power data.

Table 3.2: Summary of grid spy data

hhID              linkID            r_dateTime                   circuit           powerW
Length:14250284   Length:14250284   Min.   :2015-04-01 00:00:00  Length:14250284   Min.   : -655.00
Class :character  Class :character  1st Qu.:2015-06-22 12:39:00  Class :character  1st Qu.:    0.00
Mode  :character  Mode  :character  Median :2015-09-16 13:12:00  Mode  :character  Median :    0.00
NA                NA                Mean   :2015-09-21 08:00:39  NA                Mean   :  147.92
NA                NA                3rd Qu.:2015-12-17 17:52:00  NA                3rd Qu.:   61.29
NA                NA                Max.   :2016-03-31 23:59:00  NA                Max.   :27759.00

Table 3.2 shows a summary of the Grid Spy 1 minute power data.

Note that we have some negative power values ('negaWatts'). Which households have them?

##         
##            NegaW    PosW
##   rf_08        0  521724
##   rf_09       88  152576
##   rf_10        0  526678
##   rf_11        0  519127
##   rf_13      181 1053639
##   rf_17a       0  519910
##   rf_19       34 1052942
##   rf_20       28  102140
##   rf_21        0  505028
##   rf_25        7  443881
##   rf_27        0  497686
##   rf_28        0   79033
##   rf_29        0  526778
##   rf_31        0  526802
##   rf_32        0  526665
##   rf_33        0  526863
##   rf_34        0  526557
##   rf_35        0  327860
##   rf_36        0  516127
##   rf_37        0  526651
##   rf_38       54  373608
##   rf_40        0  338280
##   rf_41        0  223790
##   rf_42        0  518064
##   rf_43        0  288814
##   rf_44        0  526737
##   rf_45        0  525994
##   rf_46   397951  552976
##   rf_47        0  525011
##         
##          NegaW  PosW
##   rf_08    0.0 100.0
##   rf_09    0.1  99.9
##   rf_10    0.0 100.0
##   rf_11    0.0 100.0
##   rf_13    0.0 100.0
##   rf_17a   0.0 100.0
##   rf_19    0.0 100.0
##   rf_20    0.0 100.0
##   rf_21    0.0 100.0
##   rf_25    0.0 100.0
##   rf_27    0.0 100.0
##   rf_28    0.0 100.0
##   rf_29    0.0 100.0
##   rf_31    0.0 100.0
##   rf_32    0.0 100.0
##   rf_33    0.0 100.0
##   rf_34    0.0 100.0
##   rf_35    0.0 100.0
##   rf_36    0.0 100.0
##   rf_37    0.0 100.0
##   rf_38    0.0 100.0
##   rf_40    0.0 100.0
##   rf_41    0.0 100.0
##   rf_42    0.0 100.0
##   rf_43    0.0 100.0
##   rf_44    0.0 100.0
##   rf_45    0.0 100.0
##   rf_46   41.8  58.2
##   rf_47    0.0 100.0
Table 3.3: Summary of cleaned grid spy data (check for NAs)

hhID              linkID            r_dateTime                   circuit           powerW             obsTime           negW
Length:13851941   Length:13851941   Min.   :2015-04-01 00:00:00  Length:13851941   Min.   :    0.00   Length:13851941   Length:13851941
Class :character  Class :character  1st Qu.:2015-06-21 15:43:00  Class :character  1st Qu.:    0.00   Class1:hms        Class :character
Mode  :character  Mode  :character  Median :2015-09-16 00:11:00  Mode  :character  Median :    0.00   Class2:difftime   Mode  :character
NA                NA                Mean   :2015-09-20 18:47:53  NA                Mean   :  154.28   Mode  :numeric    NA
NA                NA                3rd Qu.:2015-12-17 10:44:00  NA                3rd Qu.:   70.19   NA                NA
NA                NA                Max.   :2016-03-31 23:59:00  NA                Max.   :27759.00   NA                NA

Table 3.3 shows a summary of the Grid Spy 1 minute power data after the removal of any negaWatts.
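The code for the negaWatt check and removal is not shown in this report. The two check tables above are consistent with a count of negative and non-negative readings per household followed by row percentages (for rf_46, 397951 of 950927 readings is 41.8%). The number of rows dropped between Table 3.2 and Table 3.3 (14250284 - 13851941 = 398343) equals the sum of the NegaW column. A minimal base R sketch of that logic, using toy data and assuming that zero readings count as PosW, might look like this:

```r
# Toy 1 minute observations (illustrative values, not the GREENGrid data)
toyDF <- data.frame(
  hhID   = c("hh_a", "hh_a", "hh_a", "hh_b", "hh_b", "hh_b", "hh_b"),
  powerW = c(0, 120, 35, -5, 0, 60, -12)
)

# Flag negative readings; zero is counted as non-negative (an assumption)
toyDF$negW <- ifelse(toyDF$powerW < 0, "NegaW", "PosW")

# Counts per household, then row percentages (each household sums to 100)
counts <- table(toyDF$hhID, toyDF$negW)
pct <- round(100 * prop.table(counts, margin = 1), 1)
print(counts)
print(pct)

# Keep only the non-negative readings
cleanDF <- toyDF[toyDF$negW == "PosW", ]
nrow(toyDF) - nrow(cleanDF)  # number of rows dropped
min(cleanDF$powerW)          # no negative values remain
```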

Note that:

  • r_dateTime is the correct date-time of each observation. It is stored in UTC and will have been loaded in your local time zone. If you are conducting this analysis outside NZ you will get strange results until you use lubridate to tell R to use tz = "Pacific/Auckland" for this variable;
  • there can be 0 W observations.
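To illustrate the time zone point, here is a small base R sketch using made-up timestamps. It shows the same UTC instants displayed in New Zealand local time. We believe lubridate::with_tz() is the equivalent lubridate call, but treat that as an assumption about what the note above intends.

```r
# Two instants stored in UTC (illustrative values)
utc <- as.POSIXct(c("2015-03-31 11:00:00", "2015-07-01 12:00:00"), tz = "UTC")

# Display the same instants in New Zealand local time.
# Changing the "tzone" attribute alters only how the times print, not the instants.
nz <- utc
attr(nz, "tzone") <- "Pacific/Auckland"

format(nz, "%Y-%m-%d %H:%M %Z")
# lubridate::with_tz(utc, tzone = "Pacific/Auckland") should give the same result
```

Because time-of-day profiles are built from the local clock time, getting this wrong shifts every profile by the offset between your time zone and New Zealand's.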

4 Plot seasonal mean power profiles

First we create a Southern Hemisphere season variable. Luckily we have a function to do this in the GREENGridData package. We print a check table to ensure we are all happy with the coding of season.

gsDT <- GREENGridData::addNZSeason(gsDT)
table(lubridate::month(gsDT$r_dateTime, label = TRUE), gsDT$season, useNA = "always")
##       
##         Spring  Summer  Autumn  Winter    <NA>
##   Jan        0 1018616       0       0       0
##   Feb        0  959530       0       0       0
##   Mar        0       0 1005004       0       0
##   Apr        0       0 1266948       0       0
##   May        0       0 1316893       0       0
##   Jun        0       0       0 1270440       0
##   Jul        0       0       0 1264173       0
##   Aug        0       0       0 1209094       0
##   Sep  1187018       0       0       0       0
##   Oct  1209409       0       0       0       0
##   Nov  1104322       0       0       0       0
##   Dec        0 1040494       0       0       0
##   <NA>       0       0       0       0       0

For simplicity we will focus only on Summer and Winter.
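The season coding in the check table (December to February = Summer, March to May = Autumn, June to August = Winter, September to November = Spring) can be sketched in base R as follows. nzSeason() is a hypothetical helper written for this illustration. It is not the GREENGridData::addNZSeason() implementation.

```r
# Hypothetical helper (not the GREENGridData implementation): date -> NZ season
nzSeason <- function(dateTime) {
  m <- as.integer(format(dateTime, "%m"))
  season <- ifelse(m %in% c(12, 1, 2), "Summer",
            ifelse(m %in% 3:5, "Autumn",
            ifelse(m %in% 6:8, "Winter", "Spring")))
  season[is.na(m)] <- NA  # un-parsed dates stay NA rather than falling into Spring
  # Order the levels as in the check table
  factor(season, levels = c("Spring", "Summer", "Autumn", "Winter"))
}

dates <- as.Date(c("2015-01-15", "2015-04-15", "2015-07-15", "2015-10-15", "2015-12-15"))
table(format(dates, "%b"), nzSeason(dates), useNA = "always")
```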

4.1 Overall profiles

This section plots overall mean power per 15 minutes by season.

gsDT <- gsDT[, r_dateTimeQHour := lubridate::floor_date(r_dateTime, unit = "15 mins")]

# create mean power across 15 minute periods to use as base dataset (comparable to SAVE)
qHourDT <- gsDT[, .(meanW = mean(powerW)), keyby = .(r_dateTimeQHour,linkID, season)
             ]
qHourDT <- qHourDT[, obsQHour := hms::as.hms(r_dateTimeQHour)]

plotDT <- qHourDT[, .(meanW = mean(meanW)), keyby = .(season, obsQHour)
             ]

# set attributes for plot
vLineAlpha <- 0.4
vLineCol <- "#0072B2" # http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#a-colorblind-friendly-palette
timeBreaks <- c(hms::as.hms("04:00:00"), 
                hms::as.hms("08:00:00"),
                hms::as.hms("12:00:00"),
                hms::as.hms("16:00:00"),
                hms::as.hms("20:00:00"),
                hms::as.hms("24:00:00")
)

# create default caption
myCaption <- paste0("GREENGrid Grid Spy household electricity demand data (https://dx.doi.org/10.5255/UKDA-SN-853334)",
                        "\n", min(lubridate::date(gsDT$r_dateTime)), 
                        " to ", max(lubridate::date(gsDT$r_dateTime)),
                        "\nTime = Pacific/Auckland",
                        "\n (c) ", lubridate::year(now())," University of Otago")

myPlot <- ggplot2::ggplot(plotDT[!is.na(season)], # make sure no un-set seasons/non-parsed dates
                          aes(x = obsQHour, y = meanW/1000)) +
  geom_line() + 
  facet_grid(season ~ .) +
  scale_colour_manual(values=ggParams$cbPalette) + # use colour-blind friendly palette
  theme(strip.text.y = element_text(angle = 0, vjust = 0.5, hjust = 0.5)) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) + 
  guides(colour = guide_legend(title = "Season: ")) +
  theme(legend.position = "bottom")  + 
  labs(title = paste0(params$circuit, ": seasonal mean power demand profiles"),
       y = "Mean kW per 15 minutes", 
       x = "Time of day",
       caption = myCaption
       )

myPlot + 
  scale_x_time(breaks = timeBreaks) +
  geom_vline(xintercept = timeBreaks, alpha = vLineAlpha, colour = vLineCol)

Figure 4.1: Demand profile plot

#ggplot2::ggsave(paste0(ggParams$repoLoc,"/examples/outputs/", params$circuit, "_meankWperminBySeason.png"))

Figure 4.1 shows the overall mean kW in each 15 minute period of the day, by season, for this circuit (Heat Pump).
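The profile is built in two stages: first the mean power per household per 15 minute period, then the mean of those values for each time of day. The base R sketch below reproduces that logic on toy data (the report's own code uses data.table and lubridate). All values and household IDs are invented.

```r
# Toy 1 minute data for two households over 30 minutes (illustrative values)
times <- as.POSIXct("2015-07-01 18:00:00", tz = "Pacific/Auckland") + 60 * (0:29)
toyDF <- rbind(
  data.frame(linkID = "hh_a", r_dateTime = times, powerW = rep(c(1000, 2000), each = 15)),
  data.frame(linkID = "hh_b", r_dateTime = times, powerW = rep(c(0, 400), each = 15))
)

# Floor each timestamp to the start of its 15 minute period (900 seconds)
toyDF$qHour <- as.POSIXct(floor(as.numeric(toyDF$r_dateTime) / 900) * 900,
                          origin = "1970-01-01", tz = "Pacific/Auckland")

# Stage 1: mean power per household per 15 minute period
qHourDF <- aggregate(powerW ~ linkID + qHour, data = toyDF, FUN = mean)

# Stage 2: mean across households for each time of day
qHourDF$timeOfDay <- format(qHourDF$qHour, "%H:%M")
profileDF <- aggregate(powerW ~ timeOfDay, data = qHourDF, FUN = mean)
profileDF
```

Averaging in two stages means each household-period contributes equally to the profile, regardless of how many 1 minute readings it contains.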

4.2 Profiles by linked household attributes

4.2.1 Number of people

Table 4.1 shows the number of households who have different numbers of people (children and adults). This table includes households where we do not know the number of people (NA) but we do have electricity demand data.

Table 4.1: Number of households with number of persons (full data)
Q57 Freq
NA 2
1 4
2 11
3 10
4 10
5 5
6 2

Clearly this is too fine-grained (too many categories). We therefore collapse the categories to form the coding shown in Table 4.2.

Table 4.2: Number of households with number of persons (recoded)
nPeople Freq
NA 2
1 4
2 11
3 10
4+ 17
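The recode itself is not shown in this report. A minimal base R sketch that rebuilds the household sizes from the frequencies in Table 4.1 and collapses 4, 5 and 6 into "4+" is given below. The variable names are ours.

```r
# Household sizes rebuilt from the frequencies in Table 4.1
nPeopleRaw <- rep(c(NA, 1, 2, 3, 4, 5, 6), times = c(2, 4, 11, 10, 10, 5, 2))

# Collapse 4, 5 and 6 into "4+"; NA stays NA
nPeople <- ifelse(nPeopleRaw >= 4, "4+", as.character(nPeopleRaw))

table(nPeople, useNA = "always")
```

The resulting counts match Table 4.2 (10 + 5 + 2 = 17 households in the "4+" group).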

Now we link (join) the Grid Spy and household data.tables and aggregate (summarise) by season and number of people. You can do this using data.table’s on-the-fly join, but we have found that joining only the columns you need beforehand is much faster. We are not sure why, as it should not be. You can probably also do this in dplyr, but we have not tried.
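As a rough illustration of the join-then-aggregate step, the sketch below uses base R merge() and aggregate() on toy data rather than data.table. The household IDs, the region column and all values are invented.

```r
# Toy 15 minute means per household (illustrative values)
qHourDF <- data.frame(
  linkID = c("hh_a", "hh_a", "hh_b", "hh_b", "hh_c", "hh_c"),
  season = rep(c("Summer", "Winter"), times = 3),
  meanW  = c(100, 900, 50, 400, 150, 1100)
)

# Toy household attributes; keep only the columns needed before joining
hhDF <- data.frame(
  linkID  = c("hh_a", "hh_b", "hh_c"),
  nPeople = c("4+", "2", "4+"),
  region  = c("x", "y", "x")   # stand-in for columns we do not need
)
keepCols <- c("linkID", "nPeople")

# Join on linkID, then average by season and household size
mergedDF <- merge(qHourDF, hhDF[, keepCols], by = "linkID")
plotDF <- aggregate(meanW ~ season + nPeople, data = mergedDF, FUN = mean)
plotDF
```

Selecting keepCols before the join keeps the merged table narrow, which is the same idea as the pre-joining described above.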

Figure 4.2 shows the mean kW per season by number of people for this circuit (Heat Pump). Can you see anything interesting or unusual, and might this be due to the number of households in each group?

Figure 4.2: Demand profile plot - n people

Table 4.3: Actual n households used in plot
nPeople nHHs
1 2
2 4
3 8
4+ 12

4.2.2 Number of children

This section plots overall mean power per 15 minutes by season and number of children aged 0-12 as an illustration of how to link the Grid Spy and household data. We go through the steps with commentary, showing the code as we go.

Table 4.4 shows the number of households who have different numbers of children aged 0-12 so we know how many households make up each line on the plot. This table includes households where we do not know the number of children (NA) but we do have electricity demand data.

Table 4.4: Number of households with children aged 0-12
nChildren0_12 Freq
NA 2
0 17
1 11
2 10
3 4

Table 4.4: Number of households with 1+ child aged 0-12
presenceChildren Freq
0 children 19
1+ child 25
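The recode to presenceChildren is not shown in this report. The counts in the second table (19 and 25) equal 17 + 2 and 11 + 10 + 4, which suggests, although the report does not say so, that the two NA households were counted as "0 children". The base R sketch below rebuilds the child counts from the first table and contrasts a recode that keeps NA with one that folds NA into "0 children". Both recodes are our assumptions.

```r
# Child counts rebuilt from the frequencies in the first Table 4.4
nChildren <- rep(c(NA, 0, 1, 2, 3), times = c(2, 17, 11, 10, 4))

# Recode A: NA is kept as NA
presenceA <- ifelse(nChildren > 0, "1+ child", "0 children")
table(presenceA, useNA = "always")

# Recode B: anything not known to be 1+ (including NA) becomes "0 children"
presenceB <- ifelse(!is.na(nChildren) & nChildren > 0, "1+ child", "0 children")
table(presenceB, useNA = "always")
```

If the unknown households should be excluded from the plots, recode A is the safer choice because the later !is.na(presenceChildren) filter can then drop them.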

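Now use the aggregated data.table to make the plot. Note that the !is.na(nChildren0_12) filter in the code below drops the household(s) where nChildren0_12 is NA (the NA row in Table 4.4); without that filter the plot would include an extra line for them.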

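keepCols <- c("linkID", "nChildren0_12")
# join the household attributes to the 15 minute data on linkID
# (stated explicitly so the join does not depend on how the tables are keyed)
mergedDT <- qHourDT[hhDT[, ..keepCols], on = "linkID"]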
plotDT <- mergedDT[!is.na(nChildren0_12), .(meanW = mean(meanW),
                                      sdW = sd(meanW),
                                      nObs = .N), keyby = .(season, obsQHour, nChildren0_12)]

plotDT <- plotDT[, ci_upper := meanW + qnorm(0.975)*(sdW/sqrt(nObs))]
plotDT <- plotDT[, ci_lower := meanW - qnorm(0.975)*(sdW/sqrt(nObs))]

basePlot <- ggplot2::ggplot(plotDT[!is.na(season)], # make sure no un-set seasons/non-parsed dates
                          aes(x = obsQHour, y = meanW/1000, 
                              colour = as.factor(nChildren0_12))) +
  geom_line() + 
  scale_colour_manual(values=ggParams$cbPalette) + # use colour-blind friendly palette
  facet_grid(season  ~ .) + 
  theme(strip.text.y = element_text(angle = 0, vjust = 0.5, hjust = 0.5)) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) + 
  guides(colour = guide_legend(title = "Number of children aged 0 - 12: ")) +
  theme(legend.position = "bottom")  + 
  labs(title = paste0(params$circuit, ": seasonal mean power demand profiles by n children aged 0-12"),
       y = "Mean kW per 15 minutes",
       x = "Time of day",
       caption = myCaption
       )

basePlot <- basePlot + 
  scale_x_time(breaks = timeBreaks) +
  geom_vline(xintercept = timeBreaks, alpha = vLineAlpha, colour = vLineCol)

basePlot

Figure 4.3: Demand profile plot - n kids

# add 95% CI

ciPlot <- basePlot + geom_errorbar(aes(ymin = ci_lower/1000, ymax = ci_upper/1000))
ciPlot

Figure 4.3: Demand profile plot - n kids
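The error bars use a normal-approximation 95% interval: the group mean plus or minus qnorm(0.975) times the standard error (sd divided by the square root of the number of observations). A stand-alone base R sketch with invented values:

```r
# Toy 15 minute means (W) for one season, time of day and group (illustrative values)
meanW <- c(120, 80, 150, 100, 130, 90, 110, 140, 95, 85)

m    <- mean(meanW)
sdW  <- sd(meanW)
nObs <- length(meanW)

# Normal-approximation 95% interval: mean +/- z * standard error
z <- qnorm(0.975)                 # approximately 1.96
ci_lower <- m - z * sdW / sqrt(nObs)
ci_upper <- m + z * sdW / sqrt(nObs)

round(c(lower = ci_lower, mean = m, upper = ci_upper), 1)
```

Note that in the report's code nObs counts rows of the merged table (one per household per 15 minute period), not households. Repeated observations from the same household are unlikely to be independent, so the intervals may be narrower than they should be.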

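# repeat using the reduced (two category) presenceChildren variable
keepCols <- c("linkID", "presenceChildren")
mergedDT <- qHourDT[hhDT[, ..keepCols], on = "linkID"] # join on linkID explicitly rather than relying on keys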
plotDT <- mergedDT[!is.na(presenceChildren), .(meanW = mean(meanW),
                                      sdW = sd(meanW),
                                      nObs = .N), keyby = .(season, obsQHour, presenceChildren)]

plotDT <- plotDT[, ci_upper := meanW + qnorm(0.975)*(sdW/sqrt(nObs))]
plotDT <- plotDT[, ci_lower := meanW - qnorm(0.975)*(sdW/sqrt(nObs))]

basePlot <- ggplot2::ggplot(plotDT[!is.na(season)], # make sure no un-set seasons/non-parsed dates
                          aes(x = obsQHour, y = meanW/1000, 
                              colour = as.factor(presenceChildren))) +
  geom_line() + 
  scale_colour_manual(values=ggParams$cbPalette) + # use colour-blind friendly palette
  facet_grid(season  ~ .) + 
  theme(strip.text.y = element_text(angle = 0, vjust = 0.5, hjust = 0.5)) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) + 
  guides(colour = guide_legend(title = "Presence of children aged 0 - 12: ")) +
  theme(legend.position = "bottom")  + 
  labs(title = paste0(params$circuit, ": seasonal mean power demand profiles by presence of children aged 0-12"),
       y = "Mean kW per 15 minutes",
       x = "Time of day",
       caption = myCaption
       )

basePlot <- basePlot + 
  scale_x_time(breaks = timeBreaks) +
  geom_vline(xintercept = timeBreaks, alpha = vLineAlpha, colour = vLineCol)

basePlot

Figure 4.3: Demand profile plot - n kids

# add 95% CI

ciPlot <- basePlot + geom_errorbar(aes(ymin = ci_lower/1000, ymax = ci_upper/1000))
ciPlot

Figure 4.3: Demand profile plot - n kids

# actual number of households used in the plot
# (named nHHsDT rather than t to avoid masking base::t())
nHHsDT <- mergedDT[!is.na(presenceChildren) & !is.na(season), .(nHHs = uniqueN(linkID)), keyby = .(presenceChildren)]

knitr::kable(nHHsDT, caption = "Actual n households used in plot")
Table 4.5: Actual n households used in plot
presenceChildren nHHs
0 children 8
1+ child 20

Figure 4.3 shows the mean kW per 15 minutes per season by presence of young children for this circuit (Heat Pump). Can you see anything interesting or unusual, and might this be due to the number of households in each group?
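Counting the households behind each line means counting distinct linkIDs, not rows, after dropping rows with an unknown group or season. A base R sketch of that count on toy data (IDs and values invented) is:

```r
# Toy merged data: several rows per household (illustrative values)
mergedDF <- data.frame(
  linkID           = c("hh_a", "hh_a", "hh_b", "hh_c", "hh_c", "hh_d"),
  presenceChildren = c("0 children", "0 children", "1+ child", "1+ child", "1+ child", NA),
  season           = c("Summer", "Winter", "Summer", "Summer", "Winter", "Winter")
)

# Drop rows with unknown group or season, then count distinct households per group
used <- mergedDF[!is.na(mergedDF$presenceChildren) & !is.na(mergedDF$season), ]
nHHs <- tapply(used$linkID, used$presenceChildren, function(x) length(unique(x)))
nHHs
```

This is the base R counterpart of data.table's uniqueN(). Table 4.5 reports fewer households than Table 4.4, presumably because only households with data for this circuit contribute rows to the merged table.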

5 Runtime

Analysis completed in 62.5 seconds (1.04 minutes) using knitr in RStudio with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.

6 R environment

6.1 R packages used

  • base R (R Core Team 2016)
  • bookdown (Xie 2016a)
  • GREENGridData (Anderson and Eyers 2018) which depends on:
    • data.table (Dowle et al. 2015)
    • dplyr (Wickham and Francois 2016)
    • hms (???)
    • lubridate (Grolemund and Wickham 2011)
    • progress (Csárdi and FitzJohn 2016)
    • readr (Wickham, Hester, and Francois 2016)
    • readxl (Wickham and Bryan 2017)
    • reshape2 (Wickham 2007)
  • kableExtra (Zhu 2018)
  • knitr (Xie 2016b)
  • rmarkdown (Allaire et al. 2018)

6.2 Session info

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] kableExtra_0.9.0  GREENGridData_1.0 bookdown_0.7      rmarkdown_1.10   
## [5] readr_1.1.1       ggplot2_3.1.0     lubridate_1.7.4   data.table_1.11.8
## [9] GREENGrid_0.1.0  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.19      highr_0.7         cellranger_1.1.0 
##  [4] compiler_3.5.1    pillar_1.3.0      plyr_1.8.4       
##  [7] bindr_0.1.1       prettyunits_1.0.2 progress_1.2.0   
## [10] tools_3.5.1       digest_0.6.18     viridisLite_0.3.0
## [13] evaluate_0.12     tibble_1.4.2      gtable_0.2.0     
## [16] pkgconfig_2.0.2   rlang_0.3.0.1     rstudioapi_0.8   
## [19] yaml_2.2.0        xfun_0.4          bindrcpp_0.2.2   
## [22] xml2_1.2.0        httr_1.3.1        withr_2.1.2      
## [25] stringr_1.3.1     dplyr_0.7.7       knitr_1.20       
## [28] hms_0.4.2         rprojroot_1.3-2   grid_3.5.1       
## [31] tidyselect_0.2.5  glue_1.3.0        R6_2.3.0         
## [34] readxl_1.1.0      reshape2_1.4.3    purrr_0.2.5      
## [37] magrittr_1.5      backports_1.1.2   scales_1.0.0     
## [40] htmltools_0.3.6   rvest_0.3.2       assertthat_0.2.0 
## [43] colorspace_1.3-2  labeling_0.3      stringi_1.2.4    
## [46] lazyeval_0.2.1    munsell_0.5.0     crayon_1.3.4

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, and Winston Chang. 2018. Rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.

Anderson, Ben, and David Eyers. 2018. GREENGridData: Processing NZ Green Grid Project Data to Create a ’Safe’ Version for Data Archiving and Re-Use. https://github.com/CfSOtago/GREENGridData.

Csárdi, Gábor, and Rich FitzJohn. 2016. Progress: Terminal Progress Bars. https://CRAN.R-project.org/package=progress.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.

Wickham, Hadley, and Jennifer Bryan. 2017. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.

Wickham, Hadley, and Romain Francois. 2016. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016a. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman; Hall/CRC. https://github.com/rstudio/bookdown.

———. 2016b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.

Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.