OpenTS-BM Datasets

Contents

  1. Overview

  2. Univariate time series

  3. Multivariate time series

Overview

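OpenTS-BM aims to provide a wide variety of time series datasets, covering diverse characteristics, rich domains, multiple tasks, and a range of lengths and dimensions.
Diverse characteristics:
OpenTS-BM covers a range of time series characteristics, including seasonality, trend, stationarity, shifting, transition, and correlation.
Rich domains:
The OpenTS-BM datasets come from ten different domains: traffic, electricity, energy, environment, nature, economics, stock markets, banking, health, and web.
Multiple tasks:
We cover two basic time series forecasting tasks: univariate forecasting and multivariate forecasting.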

Univariate time series

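The univariate collection contains 8,068 time series, carefully selected from 16 open-source datasets spanning multiple domains. Table 1 groups the univariate series by sampling frequency; for each frequency category, it reports the number of series exhibiting each characteristic (seasonality, trend, shifting, transition, and stationarity), along with the average series length.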

Frequency #Series Seasonality Trend Shifting Transition Stationarity Average Length
Yearly 1,500 611 1,086 978 633 354 32
Quarterly 1,514 486 933 889 894 471 97
Monthly 1,674 883 884 778 1,212 667 259
Weekly 805 253 330 445 407 372 536
Daily 1,484 374 502 487 1,176 714 4,951
Hourly 706 435 276 284 680 472 5,109
Other 385 75 248 236 195 124 1,678
Total 8,068 3,117 4,259 4,097 5,197 3,174 1,569
Table 1: Statistics of the univariate datasets.
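
As a quick consistency check on Table 1, the sketch below recomputes the Total row from the per-frequency rows. It assumes that the overall average length is the mean of the per-frequency averages weighted by series count; the source does not state this. Because the per-frequency averages are already rounded, the result is only approximate. The dictionary name `rows` is illustrative.

```python
# (number of series, average length) per frequency, copied from Table 1.
rows = {
    "Yearly": (1500, 32),
    "Quarterly": (1514, 97),
    "Monthly": (1674, 259),
    "Weekly": (805, 536),
    "Daily": (1484, 4951),
    "Hourly": (706, 5109),
    "Other": (385, 1678),
}

# Total number of series across all frequency categories.
total_series = sum(count for count, _ in rows.values())

# Assumption: the overall average is weighted by the number of series per row.
weighted_mean = sum(count * avg for count, avg in rows.values()) / total_series

print(total_series, round(weighted_mean))  # prints: 8068 1569
```

Under this assumption, both values agree with the Total row of Table 1.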

Multivariate time series

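The multivariate collection contains 25 multivariate time series from 10 domains. Sampling intervals range from 5 minutes to 1 month, the number of dimensions ranges from 5 to 2,000, and series lengths range from 728 to 57,600. This diversity enables comprehensive benchmarking of different forecasting methods. To ensure a fair comparison, each dataset is split chronologically into training, validation, and test sets using a fixed ratio, such as 7:1:2 or 6:2:2.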

Dataset Domain Frequency Length Dim Split Description
METR-LA Traffic 5 mins 34,272 207 7:1:2 Traffic speed dataset collected from loop detectors in the LA County road network
PEMS-BAY Traffic 5 mins 52,116 325 7:1:2 Traffic speed dataset collected from the CalTrans PeMS
PEMS04 Traffic 5 mins 16,992 307 6:2:2 Traffic flow time series collected from the CalTrans PeMS
PEMS08 Traffic 5 mins 17,856 170 6:2:2 Traffic flow time series collected from the CalTrans PeMS
Traffic Traffic 1 hour 17,544 862 7:1:2 Road occupancy rates measured by 862 sensors on San Francisco Bay area freeways
ETTh1 Electricity 1 hour 14,400 7 6:2:2 Power transformer 1, comprising seven indicators such as oil temperature and useful load
ETTh2 Electricity 1 hour 14,400 7 6:2:2 Power transformer 2, comprising seven indicators such as oil temperature and useful load
ETTm1 Electricity 15 mins 57,600 7 6:2:2 Power transformer 1, comprising seven indicators such as oil temperature and useful load
ETTm2 Electricity 15 mins 57,600 7 6:2:2 Power transformer 2, comprising seven indicators such as oil temperature and useful load
Electricity Electricity 1 hour 26,304 321 7:1:2 Records electricity consumption in kWh every hour from 2012 to 2014
Solar Energy 10 mins 52,560 137 6:2:2 Solar production records collected from 137 PV plants in Alabama
Wind Energy 15 mins 48,673 7 7:1:2 Wind power records from 2020-2021 at 15-minute intervals
Weather Environment 10 mins 52,696 21 7:1:2 Recorded every 10 minutes for the whole year 2020; contains 21 meteorological indicators
AQShunyi Environment 1 hour 35,064 11 6:2:2 Air quality datasets from a measurement station, over a period of 4 years
AQWan Environment 1 hour 35,064 11 6:2:2 Air quality datasets from a measurement station, over a period of 4 years
ZafNoo Nature 30 mins 19,225 11 7:1:2 From the Sapflux data project; includes sap flow measurements and environmental variables
CzeLan Nature 30 mins 19,934 11 7:1:2 From the Sapflux data project; includes sap flow measurements and environmental variables
FRED-MD Economic 1 month 728 107 7:1:2 Time series showing a set of macroeconomic indicators from the Federal Reserve Bank
Exchange Economic 1 day 7,588 8 7:1:2 ExchangeRate collects the daily exchange rates of eight countries
NASDAQ Stock 1 day 1,244 5 7:1:2 Records opening price, closing price, trading volume, lowest price, and highest price
NYSE Stock 1 day 1,243 5 7:1:2 Records opening price, closing price, trading volume, lowest price, and highest price
NN5 Banking 1 day 791 111 7:1:2 Banking dataset recording daily cash withdrawals from ATMs in the UK
ILI Health 1 week 966 7 7:1:2 Recorded indicators of patient data from the Centers for Disease Control and Prevention
Covid-19 Health 1 day 1,392 948 7:1:2 Provides opportunities for researchers to investigate the dynamics of COVID-19
Wike2000 Web 1 day 792 2,000 7:1:2 Daily page views of 2,000 Wikipedia pages
Table 2: Statistics of the multivariate datasets.
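
The fixed chronological split described in this section can be sketched as follows. This is a minimal illustration, not the benchmark's own loader. The function name `chronological_split` is hypothetical. Rounding the boundaries down with integer division is an assumption, since the source does not say how non-integer boundaries are handled.

```python
def chronological_split(series, ratio=(7, 1, 2)):
    """Split a time-ordered sequence into train/validation/test parts.

    The data is never shuffled: the earliest points go to training and
    the latest to testing. Boundaries are rounded down (an assumption).
    """
    total = sum(ratio)
    n = len(series)
    train_end = n * ratio[0] // total
    val_end = n * (ratio[0] + ratio[1]) // total
    return series[:train_end], series[train_end:val_end], series[val_end:]


# Example using the ETTh1 length from Table 2 (14,400 points, 6:2:2 split).
# The values are stand-in time indices, not real measurements.
etth1_like = list(range(14400))
train, val, test = chronological_split(etth1_like, ratio=(6, 2, 2))
print(len(train), len(val), len(test))  # prints: 8640 2880 2880
```

Because the split is positional, every validation point is later in time than every training point, and every test point is later than every validation point. This prevents future values from leaking into training.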