A new study challenges recent claims that simple linear models are superior to Transformers for time series forecasting. The research confirms that when models are compared fairly, Transformer-based architectures still achieve better performance.
The debate was sparked by the AAAI 2023 paper "Are Transformers Effective for Time Series Forecasting?", which introduced DLinear, a model that applies plain linear layers after a series decomposition step. DLinear was reported to outperform far more complex Transformer models. A re-evaluation, however, shows that Transformers hold their ground.
In comparative tests on three key datasets—Traffic, Exchange-Rate, and Electricity—the Transformer-based Autoformer model achieved lower Mean Absolute Scaled Error (MASE) scores than DLinear. For the Traffic dataset, Autoformer scored 0.910 vs. DLinear's 0.965; on Exchange-Rate, 1.087 vs. 1.690; and on Electricity, 0.751 vs. 0.831.
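For context, MASE scales a forecast's mean absolute error by the in-sample error of a seasonal naive forecast, so scores below 1 beat that baseline. A minimal NumPy sketch (function and argument names are illustrative, not from the study):

```python
import numpy as np

def mase(y_true, y_pred, y_train, seasonality=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of a seasonal naive forecast (predicting y[t] with y[t - m])."""
    mae_forecast = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    y_train = np.asarray(y_train)
    # In-sample error of the seasonal naive baseline on the training series
    mae_naive = np.mean(np.abs(y_train[seasonality:] - y_train[:-seasonality]))
    return mae_forecast / mae_naive
```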
The key insight is that while simple linear models are fast and easy to interpret, they offer no natural way to incorporate covariates such as calendar features or related series, a crucial capability for complex real-world data. Transformers, which can model long-range dependencies and ingest exogenous variables, therefore remain more effective for multivariate forecasting.
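To make the difference concrete, here is a schematic PyTorch sketch (all names hypothetical, not Autoformer's actual code) of how covariates can enter a Transformer-style forecaster, an entry point that a pure linear map from past targets to future targets simply does not have:

```python
import torch
import torch.nn as nn

class ForecasterWithCovariates(nn.Module):
    """Schematic only: the target history is projected together with
    exogenous covariates (e.g., day-of-week or holiday flags) before
    the sequence encoder."""

    def __init__(self, d_covariates: int, d_model: int = 64):
        super().__init__()
        # 1 target channel + covariate channels, projected to model width
        self.input_proj = nn.Linear(1 + d_covariates, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(d_model, 1)

    def forward(self, past_target, past_covariates):
        # past_target: (batch, length, 1)
        # past_covariates: (batch, length, d_covariates)
        x = torch.cat([past_target, past_covariates], dim=-1)
        return self.head(self.encoder(self.input_proj(x)))
```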
Autoformer: A Closer Look
Autoformer advances time series modeling through two innovations: a Decomposition Layer and an Autocorrelation Mechanism that replaces standard self-attention.
The decomposition layer separates a time series into trend-cycle and seasonal components, embedding a classic statistical technique directly into the neural network. This inner decomposition has since been adopted by models such as FEDformer and DLinear, highlighting its practical impact.
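In Autoformer the split is computed with a simple moving average: the smoothed series is the trend-cycle part and the residual is the seasonal part. A minimal PyTorch sketch of such a block under that description (a paraphrase, not the official implementation):

```python
import torch
import torch.nn as nn

class SeriesDecomposition(nn.Module):
    """Split a series into a smooth trend-cycle component (moving average)
    and a seasonal component (the residual)."""

    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size, stride=1, padding=0)

    def forward(self, x):
        # x: (batch, length, channels); replicate the edges so the moving
        # average output keeps the original sequence length
        front = x[:, :1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        back = x[:, -1:, :].repeat(1, self.kernel_size // 2, 1)
        padded = torch.cat([front, x, back], dim=1)
        trend = self.avg(padded.transpose(1, 2)).transpose(1, 2)
        seasonal = x - trend
        return seasonal, trend
```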
The autocorrelation mechanism exploits period-based dependencies, letting the model attend to recurring patterns across time at O(L log L) cost rather than the quadratic cost of full self-attention. This makes Autoformer both efficient and accurate.
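The delay scores come from the series' autocorrelation, which the Wiener-Khinchin theorem lets one compute in O(L log L) via the FFT. A simplified sketch of just the delay-scoring step (the full mechanism also rolls and aggregates values by the selected delays):

```python
import torch

def score_delays(q, k, num_delays=4):
    """Score time delays by autocorrelation computed via FFT and return
    the strongest ones. q, k: (batch, length, channels)."""
    q_fft = torch.fft.rfft(q, dim=1)
    k_fft = torch.fft.rfft(k, dim=1)
    # Multiplying by the conjugate in the frequency domain corresponds to
    # correlation in the time domain (Wiener-Khinchin theorem)
    corr = torch.fft.irfft(q_fft * torch.conj(k_fft), n=q.size(1), dim=1)
    scores = corr.mean(dim=(0, 2))  # average over batch and channels
    weights, delays = torch.topk(scores, num_delays)
    return delays, torch.softmax(weights, dim=-1)
```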
Setting the Record Straight
The findings underscore the importance of fair benchmarking. When Transformer models are properly tuned and evaluated under the same conditions as linear baselines, their greater modeling capacity shows. Simplicity is valuable, but not at the cost of accuracy in data-rich settings.
For practitioners, the takeaway is clear: Transformer-based models remain a powerful tool for time series forecasting, especially when dealing with high-dimensional data that includes external variables.