A data-driven predictive model of city-scale energy use in buildings

Constantine E. Kontokosta, Christopher Tull

Research output: Contribution to journal, Article, peer-review


Many cities across the United States have turned to building energy disclosure (or benchmarking) laws to encourage transparency in energy efficiency markets and to support sustainability and carbon reduction plans. In addition to direct peer-to-peer comparisons, the benchmarking data published under these laws have been used as a tool by researchers and policy-makers to study the distribution and determinants of energy use in large buildings. However, these policies only cover a small subset of the building stock in a given city, and thus capture only a fraction of energy use at the urban scale. To overcome this limitation, we develop a predictive model of energy use at the building, district, and city scales using training data from energy disclosure policies and predictors from widely available property and zoning information. We use statistical models to predict the energy use of 1.1 million buildings in New York City using the physical, spatial, and energy use attributes of a subset derived from 23,000 buildings required to report energy use data each year. Linear regression (OLS), random forest, and support vector regression (SVM) algorithms are fit to the city's energy benchmarking data and then used to predict electricity and natural gas use for every property in the city. Model accuracy is assessed and validated at the building level and zip code level using actual consumption data from calendar year 2014. We find the OLS model performs best when generalizing to the city as a whole, and SVM results in the lowest mean absolute error for predicting energy use within the LL84 (Local Law 84) sample. Our median predicted electric energy use intensity for office buildings is 71.2 kBtu/sf and for residential buildings is 31.2 kBtu/sf, with a mean absolute log accuracy ratio of 0.17. Building age is found to be a significant predictor of energy use, with newer buildings (particularly those built since 1991) found to have higher consumption levels than those constructed before 1930.
We also find higher electric consumption in office and retail buildings, although the sign is reversed for natural gas. In general, larger buildings use less energy per square foot, while taller buildings with more stories, controlling for floor area, use more energy per square foot. Attached buildings – those with adjacent buildings and a shared party wall – are found to have lower natural gas use intensity. The results demonstrate that electricity consumption can be reliably predicted using actual data from a relatively small subset of buildings, while natural gas use presents a more complicated problem given the bimodal distribution of consumption and infrastructure availability.
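The accuracy figure quoted in the abstract, a mean absolute log accuracy ratio, can be illustrated with a minimal sketch. It assumes the metric is the mean of |ln(predicted / actual)| using the natural logarithm; the abstract does not state the base or the exact formula, so this is an assumption and not the authors' implementation. The function names and all numeric values below are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def energy_use_intensity(energy_kbtu, floor_area_sf):
    """Energy use intensity (EUI) in kBtu per square foot."""
    if floor_area_sf <= 0:
        raise ValueError("floor area must be positive")
    return energy_kbtu / floor_area_sf


def mean_abs_log_accuracy_ratio(predicted, actual):
    """Mean of |ln(predicted / actual)| over paired, strictly positive values.

    Assumption: natural log; the source abstract does not specify the base.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(math.log(p / a)) for p, a in zip(predicted, actual))


# Hypothetical buildings: (annual electricity in kBtu, gross floor area in sf).
buildings = [(700_000.0, 10_000.0), (150_000.0, 5_000.0), (90_000.0, 2_000.0)]
actual_eui = [energy_use_intensity(e, a) for e, a in buildings]

# Hypothetical model predictions for the same three buildings, in kBtu/sf.
predicted_eui = [77.0, 27.0, 45.0]

score = mean_abs_log_accuracy_ratio(predicted_eui, actual_eui)
print(f"mean absolute log accuracy ratio: {score:.3f}")
```

Because the ratio is taken on a log scale, over-prediction and under-prediction by the same factor are penalized equally, which suits skewed quantities such as building energy use.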

Original language: English (US)
Pages (from-to): 303-317
Number of pages: 15
Journal: Applied Energy
State: Published - 2017

Keywords

  • Building energy
  • Energy efficiency
  • Energy prediction
  • Machine learning
  • Urban dynamics

ASJC Scopus subject areas

  • Building and Construction
  • Mechanical Engineering
  • General Energy
  • Management, Monitoring, Policy and Law

