
2 posts tagged with "concept-drift"


Adapting Stock Forecasts with AI

· 7 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Financial markets are dynamic: price trends, volatility, and patterns constantly change. These shifts in data distribution, commonly called concept drift, pose a serious challenge for AI models trained on historical data. When the market regime changes—such as transitioning from a calm to a volatile environment—a “stale” model can drastically lose predictive power.

DDG-DA (Data Distribution Generation for Predictable Concept Drift Adaptation) addresses this by forecasting how the data distribution might evolve in the future, instead of only reacting to the most recent data. The approach is rooted in meta-learning (via Qlib’s Meta Controller framework) and helps trading or investment models stay ahead of new trends.

By the end of this article, you will understand:

  1. Why concept drift complicates forecasting in stocks and other financial time series
  2. How DDG-DA uses a future distribution predictor to resample training data
  3. How to incorporate this into Qlib-based workflows to improve stock return and risk-adjusted performance

Concept Drift in Stock Markets

Concept drift refers to changes in the underlying distribution of stock market data. These changes can manifest in multiple ways:

  • Trends: Bull or bear markets can shift faster or slower than expected
  • Volatility: Sudden spikes can invalidate models calibrated during calmer periods
  • Patterns: Market microstructure changes or new correlations can emerge, causing old patterns to wane

Traditional methods often react after drift appears (by retraining on recent data). However, if the drift is somewhat predictable, we can model its trajectory—and proactively train models on future conditions before they fully materialize.

Diagram: Concept Drift Overview

Here, a continuous market data stream (A) encounters distribution shifts (B). These can appear as new trends (C), volatility regimes (D), or changed patterns (E). As a result, a previously trained model (F) gradually loses accuracy (G) if not adapted.


DDG-DA: High-Level Approach

The core principle behind DDG-DA is to forecast the distribution shift itself. Specifically:

  1. Predict Future Distributions

    • A meta-model observes historical tasks (for example, monthly or daily tasks in which you train a new stock-prediction model).
    • This meta-model estimates how the data distribution might move in the next period, such as anticipating an uptick in volatility or a shift in factor exposures.
  2. Generate Synthetic Training Samples

    • Using the distribution forecast, DDG-DA resamples historical data to emulate the expected future conditions.
    • It might assign higher weights to periods with similar volatility or market conditions so the final training set reflects what the market might soon become.
  3. Train or Retrain the Forecasting Model

    • Your usual forecasting model (for example, LightGBM or LSTM) is then retrained on these forward-looking samples, aligning better with the next period’s actual data distribution.
    • As a result, the model remains more accurate when concept drift occurs.

Diagram: DDG-DA Core Steps

This process repeats periodically (for example, each month) to keep your forecasting models aligned with upcoming market conditions.
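To make step 2 above concrete, here is a minimal, library-agnostic sketch of the reweighting idea: each historical segment is weighted by how closely it matches the predicted future state. The similarity function and the volatility numbers are illustrative assumptions, not the actual DDG-DA meta-model.

import numpy as np
import pandas as pd

def future_aware_weights(segment_stats: pd.Series, predicted_stat: float, tau: float = 0.05) -> pd.Series:
    """Weight each historical segment by its closeness to the predicted future state."""
    distance = (segment_stats - predicted_stat).abs()
    weights = np.exp(-distance / tau)   # segments resembling the forecast get larger weights
    return weights / weights.sum()      # normalize into a sampling distribution

# Realized volatility of past months, plus a (hypothetical) forecast for next month
monthly_vol = pd.Series({"2023-01": 0.12, "2023-02": 0.15, "2023-03": 0.30, "2023-04": 0.28})
predicted_next_month_vol = 0.27         # in DDG-DA this would come from the learned meta-model

sample_weights = future_aware_weights(monthly_vol, predicted_next_month_vol)
print(sample_weights.round(3))          # pass these as per-sample weights when retraining

The resulting weights would then be fed to the forecasting model (for example, as LightGBM sample weights) so that the training set resembles the expected upcoming regime.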


How It Integrates with Qlib

Qlib provides an AI-oriented Quantitative Investment Platform that handles:

  • Data: Collecting and structuring historical pricing data, factors, and fundamentals
  • Modeling: Building daily or intraday forecasts using built-in ML or custom models
  • Meta Controller: A specialized component for tasks like DDG-DA, which revolve around higher-level meta-learning and distribution adaptation

Diagram: Qlib plus DDG-DA Integration

  1. Qlib Data Layer (A): Feeds into a standard ML pipeline for daily or intraday forecasting (B).
  2. DDG-DA sits in the Meta Controller (C), analyzing tasks, predicting distribution changes, and adjusting the pipeline.
  3. Results circle back into Qlib for backtesting and analysis (D).

Example: Monthly Stock Trend Forecasting

  1. Setting the Tasks

    • Suppose you update your stock-ranking model every month, using the last 2 years of data.
    • Each month is a “task” in Qlib. Over multiple months, you get a series of tasks for training and validation.
  2. Train the Meta-Model

    • DDG-DA learns a function that maps old data distribution patterns to new sample weights.
    • This ensures the next month’s training data distribution is closer to the actual conditions that month.
  3. Evaluate

    • Compare the results to standard approaches:
      • Rolling Retrain: Only uses the most recent data, ignoring the predictable drift pattern
      • Gradual Forgetting: Weighted by how recent data is, but no direct distribution forecast
      • DDG-DA: Weighs data by predicted future distribution, leading to stronger alignment when drift is not purely random

Diagram: Monthly Task Workflow
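The difference between the three baselines above comes down to how each one weights historical months. The sketch below contrasts the weight profiles; the exact formulas are simplified assumptions for intuition, not the implementations used in the Qlib benchmarks.

import numpy as np
import pandas as pd

months = pd.period_range("2023-01", "2023-12", freq="M")
n = len(months)

# Rolling Retrain: only the most recent k months receive (equal) weight
k = 3
rolling = np.array([1.0 if i >= n - k else 0.0 for i in range(n)])

# Gradual Forgetting: weights decay exponentially with age, no distribution forecast
half_life = 4
age = (n - 1) - np.arange(n)             # 0 for the newest month
forgetting = 0.5 ** (age / half_life)

# DDG-DA-style: weights follow predicted similarity to next month's distribution
# (random numbers here stand in for the meta-model's output)
rng = np.random.default_rng(0)
ddg_da = rng.uniform(0.1, 1.0, size=n)

weights = pd.DataFrame(
    {"rolling_retrain": rolling, "gradual_forgetting": forgetting, "ddg_da_style": ddg_da},
    index=months,
)
print(weights.div(weights.sum()).round(3))   # each column normalized to sum to 1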


Performance and Findings

Research in the associated DDG-DA paper and Qlib examples shows:

  • Better Signal Quality: Higher Information Coefficient (IC) for stock selection
  • Enhanced Portfolio Returns: Larger annual returns, improved Sharpe Ratio, and lower drawdowns in backtests
  • Versatility: Works with a wide range of ML models (Linear, LightGBM, neural networks)
  • Limitations: If concept drift is completely random or abrupt (no pattern), DDG-DA’s advantages diminish. Predictability is key

Diagram: Performance Improvement


Practical Steps

  1. Install Qlib and ensure you have the dataset (for example, Alpha158) set up
  2. Clone the DDG-DA Example from the Qlib GitHub:
    git clone https://github.com/microsoft/qlib.git
    cd qlib/examples/benchmarks_dynamic/DDG-DA
  3. Install Requirements:
    pip install -r requirements.txt
  4. Run the Workflow:
    python workflow.py run
    • By default, it uses a simple linear forecasting model
    • To use LightGBM or another model, specify the --conf_path argument, for example:
      python workflow.py --conf_path=../workflow_config_lightgbm_Alpha158.yaml run
  5. Analyze Results:
    • Qlib’s recorder logs signal metrics (IC, ICIR) and backtest performance (annual return, Sharpe)
    • Compare with baseline methods (Rolling Retrain, Exponential Forgetting, etc.)
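Once a run finishes, its metrics and artifacts can be pulled back out of the recorder. A hedged sketch follows; the experiment and recorder names are assumptions, so substitute whatever names the workflow actually logged.

import qlib
from qlib.workflow import R

qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")

# Look up the finished run (names below are placeholders, not the workflow's defaults)
recorder = R.get_recorder(experiment_name="workflow", recorder_name="ddgda_run")

print(recorder.list_metrics())            # signal metrics such as IC/ICIR, if logged
pred = recorder.load_object("pred.pkl")   # predictions saved as an artifact
print(pred.head())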

Diagram: Running DDG-DA Workflow


Conclusion

DDG-DA shows how AI can proactively tackle concept drift in stock forecasting. Instead of merely reacting to new data, it anticipates potential distribution changes, producing a more robust, forward-looking training set. When integrated into Qlib’s Meta Controller, it seamlessly fits your existing pipelines, from data ingestion to backtesting.

For practical use:

  • Ensure your market conditions exhibit some predictability. Random, sudden changes are harder to model
  • Combine with conventional best practices (risk management, hyperparameter tuning) for a holistic pipeline
  • Monitor performance: If drift patterns shift, you may need to retrain or retune the DDG-DA meta-model

By forecasting future market states and adapting ahead of time, DDG-DA helps your quantitative strategies remain agile and profitable in evolving financial environments.


Further Reading and References

Happy (adaptive) trading!

Leveraging Qlib and MLflow for Unified Experiment Tracking

· 5 min read
Vadim Nicolai
Senior Software Engineer at Vitrifi

Introduction

Financial markets present a dynamic environment where active research and experimentation are critical. Qlib offers a complete “AI-oriented” solution for quantitative investment—covering data loaders, feature engineering, model training, and evaluation. Meanwhile, MLflow provides robust functionality for experiment tracking, handling metrics, artifacts, and hyperparameters across multiple runs.

This article shows how to integrate Qlib and MLflow to manage your entire workflow—from data ingestion and factor engineering to model storage and versioning—under a single, unified experiment system. Along the way, note and warning callouts flag the main pitfalls of the setup.

By the end of this article, you will learn:

  1. How Qlib manages data and modeling in a typical quant workflow
  2. How MLflow tracks experiment artifacts, logs metrics, and organizes multiple runs
  3. How to integrate Qlib’s “Recorder” concept with MLflow’s tracking

1. Qlib Overview

Qlib is a powerful open-source toolkit designed for AI-based quantitative investment. It streamlines common challenges in this domain:

  • Data Layer: Standardizes daily or intraday bars, fundamental factors, and alpha signals
  • Feature Engineering: Offers an expression engine (alpha modeling) plus factor definitions
  • Modeling: Easily pluggable ML models (LightGBM, Linear, RNN, etc.) with out-of-the-box training logic
  • Evaluation and Backtest: Includes modules for analyzing signals, computing IC/RankIC, and running trading strategies in a backtest simulator

Diagram: Qlib Architecture

Below is a high-level view of Qlib’s architecture—how data flows from raw sources into Qlib’s data handlers, transforms into features, and ultimately fuels model training.

note

Some Qlib features—like intraday data handling or advanced factor expressions—may require additional configuration. Double-check your data paths and environment setup to ensure all pieces are properly configured.
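As a quick orientation, the sketch below initializes Qlib against the public daily CN dataset and pulls a couple of fields through the expression engine. It assumes the dataset has already been downloaded to the default location.

import qlib
from qlib.data import D

# Point Qlib at the prepared daily data (the path below is an assumption)
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")

# Fetch raw close prices and a 1-day return computed by the expression engine
df = D.features(
    instruments=["SH600000", "SZ000001"],
    fields=["$close", "$close/Ref($close, 1) - 1"],
    start_time="2020-01-01",
    end_time="2020-03-31",
)
print(df.head())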


2. MLflow Overview

MLflow is an experiment-tracking tool that organizes runs and artifacts:

  • Tracking: Logs params, metrics, tags, and artifacts (model checkpoints, charts)
  • UI: A local or remote interface (mlflow ui) for comparing runs side by side
  • Model Registry: Version controls deployed models, enabling easy rollback or re-deployment

Diagram: MLflow Overview

warning

When configuring MLflow on remote servers, remember to secure the tracking server appropriately. Unsecured endpoints may expose logs and artifacts to unintended parties.
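Before wiring MLflow into Qlib, it helps to see the core tracking calls on their own. A minimal standalone sketch, where the tracking URI, run name, and logged values are placeholders:

import mlflow

mlflow.set_tracking_uri("file:./mlruns")       # local file store; could be a remote server
mlflow.set_experiment("QlibExperiment")

with mlflow.start_run(run_name="baseline_lgbm"):
    mlflow.log_param("learning_rate", 0.05)    # hyperparameters
    mlflow.log_metric("IC", 0.03)              # evaluation metrics
    mlflow.log_metric("Sharpe", 1.2)
    mlflow.log_artifact("pred.pkl")            # any file on disk (must already exist)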


3. Combining Qlib and MLflow

In typical usage, Qlib handles data ingestion, feature transformations, and model training. MLflow complements it by capturing:

  1. Run Metadata: Each Qlib “Recorder” maps to an MLflow run
  2. Metrics & Params: Qlib logs metrics like Sharpe Ratio or Information Coefficient (IC); MLflow’s UI centralizes them
  3. Artifacts: Saved model files, prediction results, or charts are stored in MLflow’s artifact repository

Diagram: Qlib + MLflow Integration

Below is a top-down diagram showing how user code interacts with Qlib, which in turn leverages MLflow for run logging.


4. Minimal Example

Here’s a simplified script showing how the two tools work together:

import qlib
from qlib.workflow import R
from qlib.utils import init_instance_by_config

# 1) Init Qlib with the MLflow-backed experiment manager
qlib.init(
    exp_manager={
        "class": "MLflowExpManager",
        "module_path": "qlib.workflow.expm",
        "kwargs": {
            "uri": "file:/path/to/mlruns",
            "default_exp_name": "QlibExperiment",
        },
    }
)

# 2) Start experiment and train
with R.start(experiment_name="QlibExperiment", recorder_name="run1"):
    # Basic config (dataset config omitted for brevity)
    model_config = {"class": "LightGBMModel", "kwargs": {"learning_rate": 0.05}}
    dataset_config = {...}

    model = init_instance_by_config(model_config)
    dataset = init_instance_by_config(dataset_config)
    model.fit(dataset)

    # Evaluate
    predictions = model.predict(dataset)

    # Log metrics to the MLflow run behind this recorder
    R.log_metrics(Sharpe=1.2, IC=0.03)

    # Save artifacts (predictions and the trained model)
    R.save_objects(**{"pred.pkl": predictions, "trained_model.pkl": model})
info

The snippet above logs metrics like Sharpe or IC, making them easily comparable across multiple runs. You can further log hyperparameters via R.log_params(...) for more granular comparisons.

Results:

  • A new MLflow run named “run1” under “QlibExperiment”
  • MLflow logs parameters/metrics (learning_rate, Sharpe, IC)
  • Artifacts “pred.pkl” and “trained_model.pkl” appear in MLflow’s artifact UI

5. Best Practices

  1. Organize Qlib tasks: Use Qlib’s SignalRecord or PortAnaRecord classes to store signals/backtest results, ensuring logs are automatically tied to MLflow runs
  2. Parameter Logging: Send hyperparameters or relevant config to R.log_params(...) for easy comparison in MLflow
  3. Artifact Naming: Keep artifact names consistent (e.g., "pred.pkl") across multiple runs
  4. Model Registry: Consider pushing your best runs to MLflow’s Model Registry for versioned deployment
danger

A mismatch between your local Qlib environment and remote MLflow server can cause logging errors. Ensure both environments are in sync (same Python versions, same library versions).
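Expanding on best practice 1, the record classes attach to the same run started in the minimal example above. A hedged sketch, reusing the model and dataset from Section 4 and leaving the portfolio-analysis settings elided:

# These lines would go inside the `with R.start(...)` block from Section 4,
# after model.fit(...), so signals and backtest reports land on the same MLflow run.
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord

recorder = R.get_recorder()              # the recorder behind the active run

# Standardized signal artifact (predictions + metadata) on this run
sr = SignalRecord(model, dataset, recorder)
sr.generate()

# Portfolio analysis (backtest) reports logged to the same run;
# port_analysis_config holds strategy/backtest settings (see Qlib's examples)
port_analysis_config = {...}
par = PortAnaRecord(recorder, port_analysis_config, "day")
par.generate()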


6. Conclusion

By connecting Qlib’s experiment pipeline to MLflow’s tracking features, you get the best of both worlds:

  • Qlib: AI-centric quant platform automating data handling, factor engineering, and modeling
  • MLflow: A robust interface for comparing runs, storing artifacts, and version-controlling the entire process

This synergy simplifies large-scale experimentation—especially when you frequently iterate over factor definitions, hyperparameters, or new trading strategies.


Further Reading and References

Happy experimenting!