Time series foundation models (TSFMs) build on advances in foundation models from natural language processing and vision. Using transformer-based architectures and large-scale training data, they achieve zero-shot performance and adapt across sectors such as finance, retail, energy, and healthcare.
Discover the architecture, use cases, industry adoption, benefits, and challenges of time series foundation models, and how they compare with existing models.
What are Time Series Foundation Models?
Time series foundation models (TSFMs) are large-scale pre-trained models designed to handle time series data across diverse domains and applications.
Inspired by the success of foundation models in natural language processing (NLP) and computer vision, TSFMs extend the foundation model pretraining paradigm to forecasting and sequential analysis.
TimesFM
TimesFM is a decoder-only foundation model for time series forecasting. The pretrained model has ~200M parameters and was trained on a corpus of roughly 100B real-world time series data points, delivering strong zero-shot forecasting on unseen datasets across domains and granularities. Compared with large language models (LLMs), the emphasis is on compact size, fast deployment, and broad generalization.
Architecture and training
TimesFM borrows the decoder-only transformer architecture from language models: stacked causal self-attention and feedforward layers generate the next output conditioned only on past context.
Unlike text, the model represents a sequence as patches of contiguous time points; each patch is embedded (via an MLP residual block plus positional encodings) and treated as a token. A key design choice is to predict a longer output patch length than the input patch, which reduces iterative steps at inference and limits error accumulation on long horizons.
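To make the patching idea concrete, here is a minimal NumPy sketch that splits a context window into input patches (one token each) and counts how many decoding steps a longer output patch saves. The patch lengths and horizon below are illustrative values, not TimesFM's actual configuration.

```python
import numpy as np

def make_patches(series: np.ndarray, input_patch_len: int = 32) -> np.ndarray:
    """Split a 1-D series into non-overlapping input patches (one 'token' each)."""
    n_patches = len(series) // input_patch_len
    trimmed = series[: n_patches * input_patch_len]
    return trimmed.reshape(n_patches, input_patch_len)

# Illustrative values: a longer output patch means fewer autoregressive steps.
context = np.sin(np.linspace(0, 20 * np.pi, 512))   # toy input series
input_patch_len, output_patch_len, horizon = 32, 128, 256

tokens = make_patches(context, input_patch_len)
steps_needed = int(np.ceil(horizon / output_patch_len))
print(tokens.shape)   # (16, 32): 16 patch tokens fed to the decoder
print(steps_needed)   # 2 decoding steps instead of 256 point-by-point predictions
```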
For model training, Google mixes synthetic data (to teach basic temporal “grammar”) with a large, diverse dataset of real series (e.g., Google Trends and Wikipedia Pageviews) to improve transfer. The total pretraining scale is on the order of 100B time points.

Figure 1: Graph showing TimesFM’s architecture.1
Evaluation and results
Google evaluates TimesFM in pure zero-shot mode across public benchmarks. On the Monash Forecasting Archive, TimesFM outperforms most statistical models (e.g., ARIMA, ETS) and matches or exceeds several deep learning baselines trained on the target series.
On long-horizon tasks (e.g., ETT datasets), TimesFM’s zero-shot accuracy rivals supervised baselines (e.g., PatchTST trained per dataset) and beats prompt-based LLM forecasters (e.g., llmtime with GPT-3.5). Metrics include scaled MAE and geometric-mean summaries across datasets.
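As a rough illustration of how such summaries can be computed, the sketch below implements a MASE-style scaled MAE and a geometric-mean aggregation across datasets; the exact scaling and aggregation used in Google's evaluation may differ.

```python
import numpy as np

def scaled_mae(y_true, y_pred, y_train, season: int = 1) -> float:
    """MAE scaled by the in-sample error of a (seasonal) naive forecast (MASE-style)."""
    mae = np.mean(np.abs(y_true - y_pred))
    naive_err = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return float(mae / naive_err)

def geometric_mean(scores) -> float:
    """Aggregate per-dataset scores into one number, as in cross-benchmark summaries."""
    scores = np.asarray(scores, dtype=float)
    return float(np.exp(np.mean(np.log(scores))))

# Toy usage: one dataset's scaled MAE, then a summary over three datasets.
y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
y_true, y_pred = np.array([13.0, 15.0]), np.array([12.5, 14.0])
print(scaled_mae(y_true, y_pred, y_train))
print(geometric_mean([0.8, 1.1, 0.95]))
```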
Key characteristics and architecture of TSFMs
TSFMs’ transformer architecture uses self-attention, residual connections, and linear layers to model long-range dependencies and seasonality patterns. Input patches are transformed via a multilayer perceptron into embeddings, while positional encodings preserve temporal order.
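The following toy sketch shows the general pattern of embedding patches with a small MLP and adding sinusoidal positional encodings; the layer sizes, random weights, and encoding scheme are illustrative stand-ins rather than TSFM-specific details.

```python
import numpy as np

def mlp_embed(patches: np.ndarray, d_model: int, rng) -> np.ndarray:
    """Toy MLP: one hidden ReLU layer mapping each patch to a d_model-dim embedding."""
    n, p = patches.shape
    w1 = rng.normal(size=(p, 2 * d_model))
    w2 = rng.normal(size=(2 * d_model, d_model))
    hidden = np.maximum(patches @ w1, 0.0)
    return hidden @ w2

def positional_encoding(n_positions: int, d_model: int) -> np.ndarray:
    """Sinusoidal encoding so the model knows each patch's position in time."""
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 32))   # 16 patches of 32 time points each
tokens = mlp_embed(patches, d_model=64, rng=rng) + positional_encoding(16, 64)
print(tokens.shape)   # (16, 64): one position-aware embedding per patch
```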
Compared to other foundation models, these architectures are adapted for forecasting tasks, rather than text or image processing.

Figure 2: Diagram showing different adaptation techniques.2
What are the primary use cases?
Forecasting
Forecasting involves predicting future points in a time series given historical patterns. TSFMs approach this by generating point forecasts or probabilistic forecasts, depending on the requirement.
Unlike univariate time series forecasting models or statistical models, they integrate multiple signals, including exogenous variables like weather or promotions. This flexibility makes them suitable for retail demand planning, energy load forecasting, and financial market analysis.
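One common way to produce probabilistic forecasts is to train against a quantile (pinball) loss. The sketch below illustrates that loss and how a point forecast differs from an upper-quantile forecast; it is a generic example, not the output head of any particular TSFM.

```python
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, q: float) -> float:
    """Quantile (pinball) loss: penalizes under- and over-prediction asymmetrically."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

y_true = np.array([100.0, 120.0, 90.0])            # e.g., observed daily demand
point_forecast = np.array([105.0, 115.0, 95.0])    # single best guess (median-like)
p90_forecast = np.array([130.0, 140.0, 110.0])     # upper bound, e.g., for safety stock

print(pinball_loss(y_true, point_forecast, q=0.5))  # evaluates the point forecast
print(pinball_loss(y_true, p90_forecast, q=0.9))    # evaluates the 90th-percentile forecast
```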
Classification
In classification, the objective is to label or categorize patterns within a time series. TSFMs use transformer-based models to recognize characteristic structures such as arrhythmias in medical data or unusual demand peaks in retail.
Imputation
Imputation fills gaps left by missing values in a sequence. TSFMs reconstruct missing intervals by leveraging patterns learned from diverse datasets during pretraining.
Unlike simple interpolation, they retain consistency with seasonality and trends. Applications include filling gaps in energy usage logs or medical monitoring data, where missing information can affect downstream forecasting tasks.
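The toy example below contrasts plain linear interpolation with a seasonality-aware fill on hourly data with a daily cycle. A TSFM would reconstruct the gap from learned representations instead, but the sketch shows why respecting the seasonal period matters.

```python
import numpy as np

def linear_fill(series: np.ndarray) -> np.ndarray:
    """Straight-line interpolation across NaN gaps (ignores seasonality)."""
    s = series.copy()
    idx = np.arange(len(s))
    mask = np.isnan(s)
    s[mask] = np.interp(idx[mask], idx[~mask], s[~mask])
    return s

def seasonal_fill(series: np.ndarray, period: int) -> np.ndarray:
    """Fill each missing point with the observed value one seasonal period earlier."""
    s = series.copy()
    for i in np.where(np.isnan(s))[0]:
        if i >= period and not np.isnan(s[i - period]):
            s[i] = s[i - period]
    return s

# Hourly energy usage with a daily cycle (period=24) and a missing afternoon block.
t = np.arange(96, dtype=float)
usage = 10 + 5 * np.sin(2 * np.pi * t / 24)
usage[60:66] = np.nan

print(np.round(linear_fill(usage)[60:66], 2))        # near-straight line through the gap
print(np.round(seasonal_fill(usage, 24)[60:66], 2))  # follows the daily shape
```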
Anomaly detection
TSFMs identify deviations from expected patterns by comparing current signals with their learned representation of normal behavior.
Their ability to generalize across domains improves zero-shot performance, even in cases where anomalies are rare. This is relevant in fraud detection, predictive maintenance, and cybersecurity monitoring. Compared to prior work in anomaly detection, TSFMs integrate time series forecasting with classification, providing context-aware detection.
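A simple way to picture forecast-based anomaly detection is to compare observations against a model's prediction and flag large residuals. In the sketch below, the `forecast` array stands in for a TSFM's output, and the z-score threshold is an arbitrary illustrative choice.

```python
import numpy as np

def flag_anomalies(actual: np.ndarray, forecast: np.ndarray, z_thresh: float = 3.0):
    """Flag points whose forecast residual exceeds z_thresh standard deviations."""
    residuals = actual - forecast
    z = (residuals - residuals.mean()) / (residuals.std() + 1e-9)
    return np.where(np.abs(z) > z_thresh)[0]

# Stand-in data: forecast is the expected pattern, actual has one injected spike.
t = np.arange(200)
forecast = 50 + 10 * np.sin(2 * np.pi * t / 24)
actual = forecast + np.random.default_rng(1).normal(0, 1, size=t.size)
actual[120] += 25   # anomalous spike

print(flag_anomalies(actual, forecast))   # likely flags index 120
```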
Industries adopting TSFMs
Retail
Retailers rely heavily on forecasting models for inventory management and sales planning.
Traditional statistical models often fail to capture external influences such as holidays, promotions, or economic shifts. TSFMs incorporate exogenous variables and adapt through few-shot adjustments.
For example, a global retailer can apply one model trained on a diverse dataset and achieve reliable predictions across multiple regions.
Finance
Financial systems require both multi-horizon forecasts and anomaly detection. Regression models or deep learning models tuned for specific markets often struggle with structural changes.
TSFMs provide zero-shot forecasting for new instruments and adapt to volatility through transfer learning. Use cases include stock price forecasting, portfolio risk modeling, and fraud detection.
Healthcare
Healthcare produces continuous time series data from monitoring devices. Traditional approaches to anomaly detection in vitals rely on fixed thresholds. TSFMs instead learn from both clinical and synthetic data, enabling early warning systems that adapt to patient-specific baselines. Beyond monitoring, they support knowledge discovery in drug trials by identifying subtle temporal patterns across large datasets.
Energy
Energy systems generate time series from sensors and meters. Unlike traditional methods that assume fixed seasonal patterns, TSFMs handle variable conditions such as renewable generation.
They combine consumption histories with exogenous variables such as temperature and wind speed, producing probabilistic forecasts for grid balancing. Computational efficiency is relevant here, as compact models such as Tiny Time Mixers provide localized predictions at lower cost. Explore sustainability AI applications for more information.
Transportation
Transportation networks depend on forecasting for traffic flow and logistics. Earlier machine learning models required separate model training for each city or route. TSFMs trained on diverse datasets can transfer across regions with minimal fine-tuning.
Real-world examples include congestion forecasting in urban areas and optimizing delivery routes in logistics.
Manufacturing
In manufacturing, predictive maintenance is a core use case. Traditional regression models trained on single-machine data often lack transferability. TSFMs handle long-range dependencies across sensors and production cycles, improving early fault detection.
When fine-tuned with facility-specific data, they further reduce downtime and support quality control.
Weather and climate
Weather and climate modeling requires managing multiple forecast horizons, from hours to years. Statistical models and traditional methods often fail to capture multi-scale variability.
TSFMs, through their transformer architecture and self-attention mechanisms, can model both local and global dependencies. Examples include short-term precipitation forecasting and climate cycle predictions. Probabilistic time series forecasting helps quantify uncertainty in these outputs.
Urban computing
Smart cities rely on time series data from transportation, utilities, and infrastructure. Existing models are currently siloed by task. TSFMs unify them under one model that can be deployed across domains, adapting with minimal additional training data.
Examples include optimizing energy use in buildings, predicting traffic congestion, and managing water supply systems.
Benefits of time series foundation models
Key advantages of TSFMs compared to existing models include:
- Zero-shot performance: Delivering strong results on unseen datasets without fine-tuning.
- Reduced training costs: Reuse of one model across domains instead of training separate models.
- Domain generalization: One model adapts to varied contexts through transfer learning and few-shot learning.
- Computational efficiency: Smaller than large NLP foundation models while still delivering competitive performance.
- Versatility: Handling diverse forecast horizons, granularities, and output patch lengths.
Challenges
Technical challenges
- Training data scarcity: Unlike text for language models, there is no vast, unified corpus for time series.
- Lack of universal structure: No equivalent of vocabulary or grammar.
- Complex temporal dynamics: Diverse seasonality patterns and histories.
- Domain specificity: Different sampling rates and behaviors across industries.
Practical challenges
- Privacy concerns in collecting diverse datasets.
- High computational requirements for model training.
- Distribution shift in evolving environments.
- Interpretability and transparency in real-world applications.
- Integration into legacy systems and existing data pipelines.
Time series foundation models: Development and design factors
| Approach | Setup effort | Training data needs | Architecture |
| --- | --- | --- | --- |
| TimesFM (decoder-only foundation model) | Minimal; strong zero-shot forecasting across domains without retraining | Pretrained on ~100B time points from diverse datasets (real + synthetic data) | Decoder-only transformer with patch tokenization, self-attention, and a longer output patch length |
| Traditional methods (ARIMA, ETS, etc.) | Moderate; requires manual model selection, tuning, and stationarity assumptions | Uses only the target time series; no pretraining | Statistical models with linear and seasonal assumptions |
| Supervised deep learning models (PatchTST, Informer, etc.) | High; per-dataset model training and hyperparameter tuning | Requires large labeled training sets for each domain | Transformer-based architectures or CNN/RNN hybrids |
| LLM-based prompting (GPT, etc.) | Low at inference; requires careful text formatting of numeric sequences | Trained on text corpora, not time series data | Large language models; tokenization designed for text, not numeric dependencies |
Time series foundation models: Outcomes and operational factors
| Approach | Performance on benchmarks | Adaptability | Efficiency |
| --- | --- | --- | --- |
| TimesFM (decoder-only foundation model) | Matches or surpasses supervised baselines in zero-shot mode; better than statistical models and LLM prompting | Generalizes across finance, energy, retail, and healthcare; supports transfer learning and few-shot adjustments | Compact (~200M parameters); lower compute requirements than large foundation models |
| Traditional methods (ARIMA, ETS, etc.) | Strong on stable, short-horizon series; weak on irregular or multivariate data | Little to no transfer across domains; must refit per dataset | Lightweight and fast; runs on limited hardware |
| Supervised deep learning models (PatchTST, Informer, etc.) | Often highest accuracy when trained on sufficient data; strong on domain-specific tasks | Poor generalization; needs retraining per dataset | Resource-intensive per-dataset training; slower to train and deploy |
| LLM-based prompting (GPT, etc.) | Weaker than TimesFM; struggles with long forecast horizons and numeric accuracy | Adaptable in principle, but heavily reliant on prompt engineering | Very costly at inference; inefficient due to scale and token length |
Differences from other foundation models
TSFMs diverge from language models and vision foundation models in several ways:
- Data modality: Sequential numeric data rather than text or images.
- Architecture: Adapted transformer-based architectures with patching and normalization (e.g., reversible instance normalization; see the sketch after this list).
- Training approach: Incorporating both synthetic data and real-world corpora (e.g., Google Trends and Wikipedia pageviews).
- Scale: Smaller in size than large foundation models, yet delivering high-quality point forecasts.
- Evaluation: Benchmarked on forecasting tasks, anomaly detection, and imputation instead of text understanding.
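As an illustration of the normalization point above, here is a minimal sketch of reversible instance normalization: each series is standardized with its own statistics before entering the model, and those statistics are re-applied to the output. The class below is a generic approximation of the idea, not TimesFM's exact implementation.

```python
import numpy as np

class ReversibleInstanceNorm:
    """Sketch of reversible instance normalization: normalize each series with its own
    statistics before the model, then restore those statistics on the output."""

    def __init__(self, eps: float = 1e-5):
        self.eps = eps

    def normalize(self, x: np.ndarray) -> np.ndarray:
        # Per-series (instance) statistics, stored so the transform can be undone.
        self.mean = x.mean(axis=-1, keepdims=True)
        self.std = x.std(axis=-1, keepdims=True) + self.eps
        return (x - self.mean) / self.std

    def denormalize(self, y: np.ndarray) -> np.ndarray:
        # Re-apply the stored statistics to the model's output (e.g., a forecast).
        return y * self.std + self.mean

# Usage: two series on very different scales can share one model after normalization.
rng = np.random.default_rng(0)
batch = np.stack([rng.normal(1000, 50, 128), rng.normal(3, 0.5, 128)])
revin = ReversibleInstanceNorm()
normed = revin.normalize(batch)       # model sees roughly zero-mean, unit-variance input
restored = revin.denormalize(normed)  # outputs mapped back to each series' original scale
print(np.allclose(restored, batch))   # True
```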
Conclusion
Time series foundation models represent a shift from domain-specific statistical models, regression models, and supervised deep learning toward a unified model for time series. By applying transformer-based architectures and leveraging pre-trained models, they offer scalable solutions for forecasting tasks, anomaly detection, and other applications across industries.
While challenges remain in training data availability, interpretability, and integration into existing workflows, the advantages in zero-shot forecasting, transfer learning, and cross-domain adaptability position TSFMs as a key step toward general-purpose forecasting. As research progresses and open source foundation models expand, adoption is likely to grow across both academic and real-world settings.