Datasets

Table of Contents

  1. Overview

  2. Univariate time series

  3. Multivariate time series

Overview

The Time Series Forecasting Benchmark (TFB) aims to provide a wide variety of time series datasets that cover diverse characteristics, rich domains, multiple tasks, and varying lengths and dimensions.
Diverse characteristics:
TFB covers diverse time series data characteristics, including seasonality, trend, stationarity, transition, shifting, and correlation.
Rich domains:
TFB datasets come from ten different domains: traffic, electricity, energy, the environment, nature, economics, stock markets, banking, health, and the web.
Multiple tasks:
We cover two fundamental time series forecasting tasks: univariate forecasting and multivariate forecasting.

Univariate time series

The univariate dataset includes 8,068 time series, carefully curated from 16 open-source datasets across multiple domains. Table 1 groups the series by sampling frequency and, for each frequency, reports the number of series exhibiting each characteristic (seasonality, trend, shifting, transition, and stationarity) along with the average series length.

Frequency  #Series  Seasonality  Trend  Shifting  Transition  Stationarity  Avg. length
Yearly       1,500          611  1,086       978         633           354           32
Quarterly    1,514          486    933       889         894           471           97
Monthly      1,674          883    884       778       1,212           667          259
Weekly         805          253    330       445         407           372          536
Daily        1,484          374    502       487       1,176           714        4,951
Hourly         706          435    276       284         680           472        5,109
Other          385           75    248       236         195           124        1,678
Total        8,068        3,117  4,259     4,097       5,197         3,174        1,569
Table 1: Statistics of univariate datasets.
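
The Total row of Table 1 can be sanity-checked from the per-frequency rows. The short Python sketch below copies the series counts and average lengths from the table and shows that the total average length is consistent with a mean weighted by series count rather than a simple mean of the seven rows. The variable names are illustrative only, and because the per-row averages are already rounded, the result is approximate.

```python
# Per-frequency (number of series, average length), copied from Table 1.
table1 = {
    "Yearly": (1500, 32),
    "Quarterly": (1514, 97),
    "Monthly": (1674, 259),
    "Weekly": (805, 536),
    "Daily": (1484, 4951),
    "Hourly": (706, 5109),
    "Other": (385, 1678),
}

# Total number of univariate series across all frequencies.
total_series = sum(count for count, _ in table1.values())

# Mean length weighted by how many series each frequency contributes.
weighted_avg = sum(count * avg for count, avg in table1.values()) / total_series

# Unweighted mean of the seven per-frequency averages, for comparison.
simple_avg = sum(avg for _, avg in table1.values()) / len(table1)

print(total_series)         # 8068
print(round(weighted_avg))  # 1569
print(round(simple_avg))    # 1809
```

The weighted figure matches the Total row, while the simple mean is noticeably larger because the long Daily and Hourly series make up only about a quarter of the collection.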

Multivariate time series

The multivariate datasets comprise 25 multivariate time series from 10 domains. Sampling frequencies range from 5 minutes to 1 month, the number of dimensions (variables) ranges from 5 to 2,000, and series lengths range from 728 to 57,600 time steps. This diversity enables comprehensive benchmarking of different forecasting methods. To ensure fair comparisons, each dataset is split chronologically into training, validation, and test sets using a fixed ratio (7:1:2 or 6:2:2, as listed in Table 2).

Dataset      Domain       Frequency  Length    Dim  Split  Description
METR-LA      Traffic      5 mins     34,272    207  7:1:2  Traffic speed dataset collected from loop detectors in the LA County road network
PEMS-BAY     Traffic      5 mins     52,116    325  7:1:2  Traffic speed dataset collected from the CalTrans PeMS
PEMS04       Traffic      5 mins     16,992    307  6:2:2  Traffic flow time series collected from the CalTrans PeMS
PEMS08       Traffic      5 mins     17,856    170  6:2:2  Traffic flow time series collected from the CalTrans PeMS
Traffic      Traffic      1 hour     17,544    862  7:1:2  Road occupancy rates measured by 862 sensors on San Francisco Bay area freeways
ETTh1        Electricity  1 hour     14,400      7  6:2:2  Power transformer 1, comprising seven indicators such as oil temperature and useful load
ETTh2        Electricity  1 hour     14,400      7  6:2:2  Power transformer 2, comprising seven indicators such as oil temperature and useful load
ETTm1        Electricity  15 mins    57,600      7  6:2:2  Power transformer 1, comprising seven indicators such as oil temperature and useful load
ETTm2        Electricity  15 mins    57,600      7  6:2:2  Power transformer 2, comprising seven indicators such as oil temperature and useful load
Electricity  Electricity  1 hour     26,304    321  7:1:2  Hourly electricity consumption in kWh from 2012 to 2014
Solar        Energy       10 mins    52,560    137  6:2:2  Solar production records collected from 137 PV plants in Alabama
Wind         Energy       15 mins    48,673      7  7:1:2  Wind power records from 2020-2021 at 15-minute intervals
Weather      Environment  10 mins    52,696     21  7:1:2  Recorded every 10 minutes for the whole year 2020; contains 21 meteorological indicators
AQShunyi     Environment  1 hour     35,064     11  6:2:2  Air quality data from a measurement station over a period of 4 years
AQWan        Environment  1 hour     35,064     11  6:2:2  Air quality data from a measurement station over a period of 4 years
ZafNoo       Nature       30 mins    19,225     11  7:1:2  From the Sapflux data project; includes sap flow measurements and environmental variables
CzeLan       Nature       30 mins    19,934     11  7:1:2  From the Sapflux data project; includes sap flow measurements and environmental variables
FRED-MD      Economic     1 month       728    107  7:1:2  Set of macroeconomic indicators from the Federal Reserve Bank
Exchange     Economic     1 day       7,588      8  7:1:2  Daily exchange rates of eight countries
NASDAQ       Stock        1 day       1,244      5  7:1:2  Records opening price, closing price, trading volume, lowest price, and highest price
NYSE         Stock        1 day       1,243      5  7:1:2  Records opening price, closing price, trading volume, lowest price, and highest price
NN5          Banking      1 day         791    111  7:1:2  Daily cash withdrawals from ATMs in the UK
ILI          Health       1 week        966      7  7:1:2  Recorded indicators of patient data from the Centers for Disease Control and Prevention
Covid-19     Health       1 day       1,392    948  7:1:2  Provides opportunities for researchers to investigate the dynamics of COVID-19
Wike2000     Web          1 day         792  2,000  7:1:2  Daily page views of 2,000 Wikipedia pages
Table 2: Statistics of multivariate datasets.
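
To make the Split column concrete, here is a minimal Python sketch of a chronological split by ratio. The function name `chronological_split` and the floor-based rounding of the boundaries are assumptions made for illustration; TFB's own loaders may place the boundaries slightly differently. The point is only that earlier time steps go to training, later ones to validation, and the last ones to testing, with no shuffling.

```python
def chronological_split(n_steps, ratio=(7, 1, 2)):
    """Return (train, val, test) index ranges for a series of n_steps points.

    Hypothetical helper: boundaries are floored to whole time steps, and any
    remainder from rounding ends up in the test range.
    """
    total = sum(ratio)
    train_end = n_steps * ratio[0] // total
    val_end = n_steps * (ratio[0] + ratio[1]) // total
    return range(0, train_end), range(train_end, val_end), range(val_end, n_steps)


# ETTh1 from Table 2: 14,400 time steps with a 6:2:2 split.
train, val, test = chronological_split(14_400, ratio=(6, 2, 2))
print(len(train), len(val), len(test))  # 8640 2880 2880

# FRED-MD from Table 2: 728 time steps with a 7:1:2 split.
train, val, test = chronological_split(728, ratio=(7, 1, 2))
print(len(train), len(val), len(test))  # 509 73 146
```

Because the ranges are contiguous and ordered in time, every validation and test point lies strictly after all training points, which is what keeps the comparison free of look-ahead leakage.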