Time Series Analysis in Python: Complete Toolkit with Prophet, sktime, ARIMA & Darts

Master Python time series analysis with Prophet, sktime, statsmodels ARIMA, and Darts. Compare tools, build forecasting pipelines, and avoid common pitfalls.

  • MIT
  • Updated 2026-05-18

{</* resource-info */>}

Time series analysis represents one of the most practically consequential domains in data science. Unlike cross-sectional problems where observations are exchangeable, time series data carries temporal structure — trends, seasonality, autocorrelation, and regime changes — that demands specialized tools and careful methodology. A demand forecast that is 10% too high leads to excess inventory costs; a financial risk model that misses volatility clustering leads to catastrophic losses. The Python ecosystem offers a rich toolkit for time series work, but the diversity of libraries creates genuine confusion about which tool fits which problem.

This guide examines four essential libraries: Prophet for business forecasting, sktime for unified time series machine learning, statsmodels for classical statistical foundations, and Darts for deep learning approaches. We compare their capabilities, demonstrate usage patterns, and provide a practical framework for building end-to-end forecasting pipelines.

The Time Series Analysis Landscape in 2026 #

Time series methodologies have evolved along three parallel tracks. Classical statistical methods — ARIMA, exponential smoothing, structural models — dominated academic forecasting for decades and remain relevant for smaller datasets requiring interpretability. Machine learning approaches treat time series as supervised learning problems through feature engineering (lag features, rolling statistics) and algorithms like gradient boosting and random forests. Deep learning methods — including recurrent networks, transformers, and specialized architectures like N-BEATS and TFT — capture complex nonlinear patterns at the cost of increased data requirements and computational demands.

Key tasks in the time series domain include forecasting (predicting future values), classification (assigning category labels to series), anomaly detection (identifying unusual patterns), and clustering (grouping similar series). Library selection depends heavily on which tasks you face and your data characteristics: series length, frequency, seasonality patterns, and whether you work with univariate or multivariate data.

Prophet: Facebook’s Forecasting Tool #

Prophet, released by Facebook’s data science team in 2017 and now maintained at version 1.1.x, takes an intentionally opinionated approach to forecasting. Rather than requiring users to specify ARIMA orders or perform differencing, Prophet fits an additive regression model with three components: a piecewise linear trend with automatic changepoint detection, yearly and weekly seasonality modeled via Fourier series, and a holiday effects layer. This decomposition makes Prophet remarkably robust to missing data, outliers, and shifts in trend — exactly the messiness found in real business time series.

The API is deliberately minimal. You create a DataFrame with ds (datetime) and y (value) columns, call Prophet().fit(df), and request future predictions with predict(future_df). Prophet handles the rest: trend fitting, seasonality extraction, uncertainty intervals, and component decomposition. This simplicity explains Prophet’s adoption among business analysts who need reliable forecasts without mastering time series theory.

Prophet Advanced Configuration #

While Prophet works well out of the box, production forecasting requires tuning. Custom seasonality lets you add business-specific cycles — quarterly patterns for retail, daily patterns for food service. The add_seasonality method accepts arbitrary periods and Fourier orders to control smoothness. Holiday effects use a custom DataFrame specifying holiday dates and optional lower_window / upper_window parameters to model anticipation and lingering effects.

Changepoint detection controls trend flexibility. By default, Prophet places 25 potential changepoints uniformly across the first 80% of the time series, then uses a sparse prior to select only the most significant. The changepoint_prior_scale parameter controls this sparsity — higher values allow more flexible trends but risk overfitting. Cross-validation through cross_validation and performance metrics through performance_metrics provide systematic evaluation across multiple forecast horizons.

Multiplicative mode (seasonality_mode='multiplicative') replaces additive seasonality when seasonal fluctuations grow with the trend level — common in retail and economic data where peak seasons represent percentage increases rather than absolute ones. Prophet also handles sub-daily data, multiple seasonality patterns (daily + weekly + yearly), and regressors for external variables like marketing spend or weather.

sktime: The Unified Time Series Framework #

sktime addresses a critical gap in the Python ecosystem: the absence of a unified, scikit-learn-compatible framework for time series tasks. While scikit-learn excels at cross-sectional machine learning, its API assumes independent observations — an assumption violated by autocorrelated time series data. sktime extends scikit-learn’s interface (fit, predict, transform) to forecasting, classification, regression, and clustering while maintaining full composability with existing scikit-learn pipelines and grid search.

The library’s design philosophy centers on composability and reduction. For forecasting, sktime implements reduction strategies that transform forecasting problems into supervised regression problems that any scikit-learn estimator can solve. The recursive strategy trains a regressor to predict one step ahead, then feeds predictions back as inputs for multi-step forecasting. The direct strategy trains separate models for each forecast horizon. These reductions let you use Random Forests, Gradient Boosting, or Support Vector Machines for forecasting without manual feature engineering.

sktime Pipelines and Model Composition #

sktime pipelines compose transformations and estimators in sequence, just like scikit-learn’s Pipeline. The TransformedTargetForecaster applies transformations (differencing, Box-Cox, detrending) to the target series before fitting and inversely transforms predictions afterward. This pattern handles non-stationary series gracefully without manual preprocessing.

Ensembling in sktime combines multiple forecasters through strategies like simple averaging, weighted combinations, or stacking. The EnsembleForecaster trains diverse models (ARIMA, Exponential Smoothing, Reduced Regression) and combines their predictions, often outperforming individual components. Grid search with ForecastingGridSearchCV tunes hyperparameters using time-series-aware cross-validation that respects temporal ordering — preventing data leakage that would inflate performance estimates unrealistically.

sktime includes a substantial model zoo covering statistical methods (ARIMA, ETS, Theta), machine learning reductions, and interfaces to deep learning libraries. Benchmark datasets (load_airline, load_longley) provide standardized evaluation targets. For researchers and practitioners who value scikit-learn’s design patterns, sktime offers the most coherent time series API available.

statsmodels: Classical Statistical Foundation #

statsmodels provides the statistical rigor that underpins modern time series analysis. While newer libraries offer convenience and deep learning power, statsmodels delivers maximum interpretability and theoretical grounding. The library implements ARIMA and SARIMA (seasonal ARIMA) models with full diagnostic output: parameter estimates with standard errors, Ljung-Box tests for residual autocorrelation, Jarque-Bera tests for normality, and information criteria (AIC, BIC) for model selection.

The ARIMA model in statsmodels follows the classical Box-Jenkins methodology: identify model order through ACF/PACF analysis, estimate parameters via maximum likelihood, and validate through residual diagnostics. While this process requires more analyst judgment than Prophet’s automated approach, it produces models where every parameter has a clear statistical interpretation. This interpretability matters enormously in regulated industries where “the neural network said so” is not an acceptable explanation.

Beyond ARIMA, statsmodels offers Vector Autoregression (VAR) for multivariate systems where multiple series influence each other, exponential smoothing methods (Holt-Winters) for trend and seasonal data, and seasonal decomposition (additive and multiplicative) for isolating trend, seasonal, and residual components. Stationarity testing through Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests provides the statistical foundation for determining differencing requirements.

Darts: Modern Deep Learning for Time Series #

Darts (Darts: Unified Time Series) represents the cutting edge of Python time series libraries. Developed by Unit8 and actively maintained, Darts provides a unified interface to over 40 forecasting models ranging from classical exponential smoothing to state-of-the-art deep learning architectures. The library’s standout feature is its deep learning model collection: N-BEATS and N-HiTS for univariate forecasting, Temporal Fusion Transformers (TFT) for multi-horizon forecasting with interpretable attention, DeepAR for probabilistic forecasting, and N-HiTS for hierarchical time series.

Probabilistic forecasting distinguishes Darts from Prophet and sktime. Rather than predicting point estimates, Darts’ deep learning models output full predictive distributions — you get not just an expected value but quantile estimates at any confidence level. This capability is essential for risk management, inventory optimization, and any decision problem where uncertainty matters as much as the point forecast.

Darts’ TimeSeries class provides a consistent data structure across all models, with built-in support for train/test splitting, scaling, and backtesting. The backtesting framework simulates historical forecast scenarios — training on data up to time T, forecasting T+1 through T+H, then moving the training window forward — producing realistic performance estimates that account for model updating over time. GPU acceleration through PyTorch enables training on large datasets in minutes rather than hours.

Feature Engineering for Time Series #

Regardless of which library you choose, feature engineering dramatically impacts forecasting accuracy. The most impactful features for traditional ML approaches include:

  • Lag features: values at previous time steps (y(t-1), y(t-7), y(t-365))
  • Rolling statistics: moving averages, standard deviations, min/max over sliding windows
  • Datetime features: hour, day of week, month, quarter, year, is_holiday flags
  • Fourier terms: sine and cosine transforms of time indices to capture seasonality

The tsfresh library integrates with sktime and scikit-learn to automatically extract 794 features from time series, including trend, seasonality, complexity, and entropy measures. Feature selection then identifies the subset most predictive for your specific problem.

Time-based train-test splitting is critical and frequently done incorrectly. Standard random splitting destroys temporal structure and produces inflated performance estimates. Instead, use contiguous splits: train on the first 80% chronologically, test on the final 20%. For rolling evaluation, use expanding or sliding windows that respect the temporal ordering. All major libraries — Prophet’s cross-validation, sktime’s ForecastingGridSearchCV, Darts’ backtesting — implement correct temporal splitting, but custom evaluation code must handle this carefully.

Complete Tool Comparison Matrix #

DimensionProphetsktimestatsmodelsDarts
Model TypesAdditive regressionStatistical + ML reductionStatistical (ARIMA, VAR, ETS)Deep learning + classical
Ease of UseExcellent (minimal API)Good (scikit-learn pattern)Moderate (statistical knowledge)Moderate (deep learning concepts)
ScalabilitySingle machine, ~1M rowsSingle machineSingle machineGPU accelerated, distributed
Probabilistic OutputYes (uncertainty intervals)LimitedYes (distribution parameters)Yes (full distributions)
MultivariateLimited (regressors only)Yes (via reduction)Yes (VAR)Yes (TFT, DeepAR, N-BEATS)
GPU SupportNoNoNoYes (PyTorch backend)
InterpretabilityGood (component plots)GoodExcellentModerate (attention for TFT)
Deep LearningNoNo (interfaces available)NoYes (extensive)
Best ForBusiness forecasting, baselinesML pipelines, researchStatistical rigor, interpretabilityComplex patterns, large scale
Production ReadyYes (battle-tested)YesYesYes (with PyTorch ops)

The matrix reveals complementary strengths rather than direct competition. Prophet and statsmodels serve analysts needing interpretable, statistically grounded models. sktime bridges statistical and machine learning approaches through scikit-learn compatibility. Darts pushes the frontier with deep learning for complex, large-scale problems where interpretability trades off against accuracy.

Building an End-to-End Forecasting Pipeline #

A complete forecasting workflow proceeds through defined stages:

  1. Data exploration: Plot the series, check for trends, seasonality, and anomalies. Use statsmodels’ seasonal decomposition to isolate components.

  2. Preprocessing: Handle missing values through interpolation, detect and treat outliers, ensure regular frequency (resample if needed).

  3. Feature engineering: Create lag features, rolling statistics, and datetime features. Use tsfresh for automated feature extraction if using ML approaches.

  4. Model selection: Start with Prophet for a strong baseline (1-2 hours). Benchmark against sktime’s reduced regression models and statsmodels’ ARIMA.

  5. Training: Fit selected models on training data. For deep learning, use Darts with GPU acceleration and early stopping.

  6. Evaluation: Compare models using time-series-aware cross-validation. Report MAE, RMSE, MAPE, and MASE across multiple horizons.

  7. Backtesting: Simulate historical forecasts with Darts’ backtesting or Prophet’s cross-validation to assess real-world performance.

  8. Deployment: Export Prophet as JSON, Darts models as PyTorch checkpoints, or serialize sktime pipelines with joblib.

  9. Monitoring: Track forecast accuracy versus actuals. Detect drift through prediction error trends and retrain when accuracy degrades.

Common Pitfalls and Best Practices #

Data leakage is the most destructive and common error. Including future information in training features — directly or indirectly — produces misleadingly accurate validation metrics and catastrophic production performance. Verify that every feature used at time T is computable using only information available at or before T.

Improper cross-validation runs a close second. Random k-fold splitting on time series data produces training sets containing future observations relative to validation folds. Always use temporal cross-validation: either expanding windows (train on 1..T, validate on T+1..T+H) or sliding windows (train on T-W..T, validate on T+1..T+H).

Irregular timestamps plague real-world datasets. Sensor data with missing readings, financial data with market holidays, and event logs with bursty arrivals all violate the regular sampling assumption underlying many models. Resample to regular frequency and handle gaps explicitly through interpolation or missing value modeling.

Multiple seasonality appears in data with overlapping cycles — daily + weekly + yearly patterns in electricity demand, for instance. Prophet handles this naturally. For other libraries, engineer Fourier terms at each seasonal period or use Darts’ N-BEATS/N-HiTS models that learn seasonality implicitly through deep learning.

Forecast horizon selection should match the decision timeline. A supply chain team needing monthly replenishment orders should evaluate models on 30-day ahead forecasts, not one-step-ahead accuracy. Multi-horizon evaluation through Darts or Prophet’s cross-validation with custom horizons provides the relevant performance metrics.

Frequently Asked Questions #

Should I use Prophet or ARIMA for forecasting? #

Prophet generally produces better results with less effort for business time series with strong seasonality, trend changes, and holiday effects. ARIMA (via statsmodels) excels when you need statistical rigor, have shorter series (under 100 observations), or require interpretable parameters for regulatory documentation. A practical approach: fit Prophet first for your baseline, then benchmark against ARIMA/SARIMA. If ARIMA comes within 5% of Prophet’s accuracy and interpretability matters, choose ARIMA. If Prophet significantly outperforms or you need holiday modeling, stick with Prophet.

Can sktime handle multivariate time series? #

Yes, through several mechanisms. The VAR (Vector Autoregression) implementation handles multivariate statistical forecasting directly. Reduction strategies can use any scikit-learn regressor that supports multi-output prediction. For more complex multivariate relationships, sktime interfaces with deep learning libraries, though Darts provides a more native multivariate deep learning experience. sktime’s strength is unifying these approaches under one API rather than excelling at any single multivariate method.

Is Darts better than Prophet for deep learning? #

Darts and Prophet solve different problems. Prophet is not a deep learning library — it uses Bayesian curve fitting. Darts includes deep learning models (N-BEATS, TFT, DeepAR) that capture complex nonlinear patterns Prophet cannot. On datasets with strong trends and seasonality, Prophet often matches or beats deep learning models with far less computation. On datasets with irregular patterns, multiple interacting series, or long-range dependencies, Darts’ deep learning models typically win. Start with Prophet for baselines; upgrade to Darts when complexity demands it.

How do I prevent data leakage in time series? #

Follow three rules strictly. First, never use future information to compute features or make predictions at time T — this includes target values, but also features derived from future periods. Second, always split temporally: training data must precede validation data chronologically. Third, when performing feature selection or hyperparameter tuning, embed the selection process inside your cross-validation loop rather than selecting features on the full dataset beforehand. sktime’s ForecastingGridSearchCV and Darts’ backtesting implement these protections automatically; custom code must enforce them manually.

Which library is best for real-time forecasting? #

Real-time forecasting (sub-second predictions on streaming data) requires specific architectural choices. Prophet is unsuitable — it refits models from scratch and has no incremental update mechanism. statsmodels ARIMA supports apply for updating fitted models with new observations, making it viable for moderate-frequency updates. Darts’ deep learning models can score single observations quickly on GPU once trained, but model retraining remains batch-oriented. For true real-time systems, consider dedicated streaming libraries like River (formerly creme) for online learning, or deploy Darts/Prophet models with scheduled batch retraining rather than continuous updates.


To run any of the tools above reliably 24/7, infrastructure matters:

  • DigitalOcean — $200 free credit, 14+ global regions, one-click droplets for AI/dev workloads.
  • HTStack — Hong Kong VPS with low latency for mainland China access. This is the same IDC hosting dibi8.com — production-proven.

Affiliate links — no extra cost to you, helps keep dibi8.com running.

💬 Discussion