Repository: https://github.com/chirindaopensource/continuous_time_rl_for_alm
Owner: Craig Chirinda (Open Source Projects), 2025
This repository contains an independent, professional-grade Python implementation of the research methodology from the 2025 paper entitled "Continuous-Time Reinforcement Learning for Asset-Liability Management" by:
- Yilie Huang
The project provides a complete, end-to-end computational framework for replicating the paper's novel continuous-time reinforcement learning approach to ALM. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from rigorous, reproducible experimental setup and parallelized simulation to comprehensive statistical analysis and the generation of all publication-quality figures and tables.
- Introduction
- Theoretical Background
- Features
- Methodology Implemented
- Core Components (Notebook Structure)
- Key Callables
- Prerequisites
- Installation
- Input Data Structure
- Usage
- Output Structure
- Project Structure
- Customization
- Contributing
- Recommended Extensions
- License
- Citation
- Acknowledgments
This project provides a Python implementation of the methodologies presented in the 2025 paper "Continuous-Time Reinforcement Learning for Asset-Liability Management." The core of this repository is the Jupyter notebook continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb, which contains a comprehensive suite of functions to replicate the paper's findings, from initial data validation to the final generation of all analytical tables and figures.
The paper introduces a novel model-free, continuous-time reinforcement learning (RL) algorithm for the Asset-Liability Management (ALM) problem. It frames the problem as a Linear-Quadratic (LQ) control task and develops a soft actor-critic method with adaptive exploration to dynamically manage the surplus deviation between assets and liabilities. This codebase operationalizes this framework, allowing users to:
- Rigorously validate and manage the entire experimental configuration.
- Systematically generate reproducible, randomized market scenarios based on a stochastic differential equation (SDE) model.
- Execute large-scale, parallelized simulations comparing the proposed ALM-RL agent against six distinct baselines.
- Perform comprehensive statistical analysis using non-parametric tests to validate performance claims.
- Conduct a full suite of robustness analyses, including hyperparameter sensitivity, market parameter stress tests, and discretization analysis.
The implemented methods are grounded in stochastic optimal control, reinforcement learning, and numerical methods for SDEs.
1. ALM as a Linear-Quadratic (LQ) Control Problem:
The core of the problem is to control the surplus deviation, x(t), from a target. Its dynamics are modeled by the SDE:
$$
dx(t) = (A x(t) + B u(t))dt + (C x(t) + D u(t))dW(t)
$$
where u(t) is the control action. The objective is to maximize the expected value of a quadratic functional that penalizes deviations over a finite horizon [0, T]:
$$
\max_{u} \mathbb{E}\left[ \int_{0}^{T} -\frac{1}{2}Qx(t)^2 dt - \frac{1}{2}Hx(T)^2 \right]
$$
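To make the dynamics concrete, here is a minimal Euler–Maruyama discretization of the surplus SDE and the quadratic objective above. This is a sketch, not the notebook's implementation: the function name, the parameter values, and the linear feedback rule used as a stand-in policy are all illustrative.

```python
import numpy as np

def simulate_surplus_path(A, B, C, D, Q, H, policy, x0=1.0, T=1.0,
                          n_steps=100, rng=None):
    """Simulate one path of dx = (Ax + Bu)dt + (Cx + Du)dW under `policy`.

    Returns the state path and the realized quadratic objective
    integral of -0.5*Q*x^2 dt minus 0.5*H*x(T)^2 (no entropy term).
    """
    rng = rng or np.random.default_rng()
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    objective = 0.0
    for k in range(n_steps):
        u = policy(k * dt, x[k])                       # control action u(t)
        drift = A * x[k] + B * u
        diffusion = C * x[k] + D * u
        dW = rng.normal(0.0, np.sqrt(dt))              # Brownian increment
        x[k + 1] = x[k] + drift * dt + diffusion * dW  # Euler-Maruyama step
        objective += -0.5 * Q * x[k] ** 2 * dt         # running penalty
    objective += -0.5 * H * x[-1] ** 2                 # terminal penalty
    return x, objective

# Illustrative values only; u = -0.8*x is a stand-in linear feedback policy.
path, value = simulate_surplus_path(A=0.05, B=0.5, C=0.1, D=0.3,
                                    Q=1.0, H=1.0,
                                    policy=lambda t, x: -0.8 * x)
```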
2. Continuous-Time Soft Actor-Critic:
Since the market parameters A, B, C, D are unknown, a model-free RL approach is used. The paper develops a continuous-time soft actor-critic algorithm based on an entropy-regularized objective:
$$
J(t, x; \pi) = \mathbb{E}\left[ \int_{t}^{T} \left(-\frac{1}{2}Qx(s)^2 + \gamma p(s)\right) ds - \frac{1}{2}Hx(T)^2 \Big| x(t)=x \right]
$$
where p(s) is the entropy of the stochastic policy π.
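For the Gaussian policies used in this framework (see the parametric forms below), the entropy term has the standard closed form for a normal distribution with variance σ²(s):
$$
p(s) = \frac{1}{2}\ln\left(2\pi e\,\sigma^2(s)\right)
$$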
3. Key Algorithmic Features:
- Parametric Forms: Based on LQ theory, the value function `J` is parameterized as a quadratic function of `x`, and the policy `π` is a Gaussian distribution whose mean is linear in `x` (see the sketch below).
- Adaptive Exploration: The policy's variance (actor exploration) is learned via policy gradient.
- Scheduled Exploration: The entropy temperature `γ` (critic exploration) follows a deterministic, decaying schedule.
- Update Rules: The agent learns via discretized versions of continuous-time temporal difference and policy gradient updates (Eqs. 16, 17, and 18 in the paper).
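A minimal sketch of these parametric forms follows. The names (`psi`, `phi2`, `phi0`) and the exponential shape of the temperature schedule are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

class GaussianLinearPolicy:
    """Gaussian policy whose mean is linear in the state x, per LQ theory."""

    def __init__(self, psi=0.0, log_sigma=0.0):
        self.psi = psi              # feedback gain: E[u | x] = psi * x
        self.log_sigma = log_sigma  # log exploration std-dev (learned)

    def sample(self, x, rng):
        """Draw a control action u ~ N(psi * x, sigma^2)."""
        return self.psi * x + np.exp(self.log_sigma) * rng.normal()

    def entropy(self):
        """Differential entropy of N(mu, sigma^2): 0.5 * ln(2*pi*e*sigma^2)."""
        return 0.5 * np.log(2.0 * np.pi * np.e) + self.log_sigma

def quadratic_value(x, phi2, phi0):
    """Quadratic value-function ansatz J(x) = phi2 * x^2 + phi0.

    (Time dependence of the coefficients is omitted here for brevity.)
    """
    return phi2 * x ** 2 + phi0

def temperature(episode, gamma0=1.0, decay_rate=5e-4):
    """Deterministic, decaying entropy-temperature schedule (illustrative form)."""
    return gamma0 * np.exp(-decay_rate * episode)
```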
The provided Jupyter notebook (continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb) implements the full research pipeline, including:
- Modular, Multi-Task Architecture: The entire pipeline is broken down into 13 distinct, modular tasks, each with its own orchestrator function, covering validation, setup, simulation, analysis, and reporting.
- Configuration-Driven Design: All experimental parameters are managed in an external `config.yaml` file, allowing for easy customization and replication without code changes.
- Multi-Algorithm Support: Complete, from-scratch implementations of the proposed ALM-RL agent and six baselines: DCPPI, ACS, MBP, SAC, PPO, and DDPG.
- Rigorous Reproducibility: A multi-level seeding protocol ensures bitwise reproducibility of market scenarios and isolates stochastic streams for fair agent comparison (see the seeding sketch after this list).
- Parallelized Execution: The main experimental pipeline is designed for parallel execution across multiple CPU cores, dramatically reducing the time required for the 200 independent runs.
- Comprehensive Analysis Suite: Implements the full statistical analysis from the paper, including moving average smoothing, terminal performance extraction, and one-sided Wilcoxon signed-rank tests.
- Robustness Analysis Module: Includes a full suite of post-hoc analyses to test hyperparameter sensitivity, robustness to extreme market conditions, and sensitivity to SDE discretization.
- Automated Reporting: Programmatic generation of all key tables and figures from the paper.
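As a sketch of how such a multi-level seeding hierarchy can be built with NumPy's `SeedSequence` (the two-level layout and names here are illustrative; the notebook's actual protocol may differ):

```python
import numpy as np

def build_seed_hierarchy(master_seed, n_scenarios, n_agents):
    """Spawn independent, reproducible random streams per (scenario, agent).

    SeedSequence.spawn produces statistically independent child streams,
    so every agent can see the same market scenario while keeping its own
    exploration noise isolated from the other agents.
    """
    root = np.random.SeedSequence(master_seed)
    scenario_root, agent_root = root.spawn(2)
    market_streams = [np.random.default_rng(s)
                      for s in scenario_root.spawn(n_scenarios)]
    agent_streams = [[np.random.default_rng(s) for s in run.spawn(n_agents)]
                     for run in agent_root.spawn(n_scenarios)]
    return market_streams, agent_streams

# 200 market scenarios, 7 agents, as in the main experiment.
market_rngs, agent_rngs = build_seed_hierarchy(master_seed=42,
                                               n_scenarios=200, n_agents=7)
```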
The core analytical steps directly implement the methodology from the paper:
- Validation (Task 1): Ingests and rigorously validates the `config.yaml` for structural, mathematical, and logical consistency.
- Setup (Task 2): Establishes the deterministic seeding hierarchy for the entire experiment.
- Initialization (Task 3): Generates the 200 randomized market scenarios and the corresponding initial parameters for all agents.
- Agent & Environment Implementation (Tasks 4-7): Provides complete, professional-grade implementations of all agents and the SDE environment.
- Execution (Task 8): Runs the main simulation pipeline in parallel, executing 20,000 episodes for each of the 7 agents across all 200 market scenarios.
- Metrics & Analysis (Tasks 9-10): Processes the raw simulation data to compute smoothed learning curves, terminal performance, and the final p-value matrix of one-sided Wilcoxon signed-rank tests (see the sketch after this list).
- Visualization (Task 11): Generates the final, publication-quality plots and summary tables.
- Orchestration & Robustness (Tasks 12-13): Provides top-level orchestrators to run the main pipeline and the additional robustness analyses.
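A sketch of the one-sided Wilcoxon signed-rank comparison on paired, per-scenario terminal performance. The agent column names and the synthetic data are hypothetical; only `scipy.stats.wilcoxon` itself is the real API:

```python
import numpy as np
import pandas as pd
from scipy.stats import wilcoxon

def one_sided_wilcoxon_pvalues(terminal: pd.DataFrame, proposed: str = "ALM-RL"):
    """P-values for H1: the proposed agent outperforms each baseline.

    `terminal` holds one row per market scenario and one column per agent,
    so each test is paired on the same scenario.
    """
    pvals = {}
    for baseline in terminal.columns.drop(proposed):
        # alternative='greater' tests whether terminal[proposed] -
        # terminal[baseline] is shifted above zero.
        stat, p = wilcoxon(terminal[proposed], terminal[baseline],
                           alternative="greater")
        pvals[baseline] = p
    return pd.Series(pvals, name=f"p({proposed} > baseline)")

# Illustrative usage with synthetic data for 200 scenarios:
rng = np.random.default_rng(0)
df = pd.DataFrame({"ALM-RL": rng.normal(1.0, 0.2, 200),
                   "SAC": rng.normal(0.8, 0.2, 200),
                   "PPO": rng.normal(0.7, 0.2, 200)})
print(one_sided_wilcoxon_pvalues(df))
```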
The continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb notebook is structured as a logical pipeline with modular orchestrator functions for each of the major tasks. All functions are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.
The project is designed around a single, top-level user-facing interface function:
main: This master orchestrator function, located in the final section of the notebook, runs the entire automated research pipeline end-to-end. It can be configured to run the main reproduction experiment, the robustness analyses, or both. A single call to this function reproduces the entire computational portion of the project.
- Python 3.9+
- Core dependencies: `numpy`, `pandas`, `scipy`, `pyyaml`, `torch`, `gymnasium`, `matplotlib`, `seaborn`, `tqdm`
1. Clone the repository:
   `git clone https://github.com/chirindaopensource/continuous_time_rl_for_alm.git`
   `cd continuous_time_rl_for_alm`
2. Create and activate a virtual environment (recommended):
   `python -m venv venv`
   `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
3. Install Python dependencies:
   `pip install numpy pandas scipy pyyaml torch gymnasium matplotlib seaborn tqdm`
   (or `pip install -r requirements.txt`)
The pipeline is driven by a single config.yaml file. No external datasets are required, as the market scenarios are procedurally generated based on the parameters within this file.
The continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb notebook provides a complete, step-by-step guide. The primary workflow is to execute the final cell of the notebook, which calls the top-level main orchestrator:
# Final cell of the notebook or in a main.py script
# Load the configuration from the YAML file.
STUDY_INPUTS = load_config('config.yaml')
# Run the entire study (reproduction and robustness analysis).
final_artifacts = main(
study_params=STUDY_INPUTS,
run_reproduction=True,
run_robustness=True,
num_workers=8 # Adjust based on available CPU cores
)
# The `final_artifacts` dictionary will contain the key results DataFrames.

The `main` function creates one or two output directories (`alm_rl_reproduction_output/` and `alm_rl_robustness_output/`) with the following structure:
output_directory/
│
├── data/
│ ├── seed_table.csv
│ ├── market_params_table.csv
│ ├── alm_rl_initial_table.csv
│ ├── baselines_initial_table.csv
│ ├── raw_results.npy
│ ├── learning_curves.csv
│ ├── terminal_performance.csv
│ └── p_value_matrix.csv
│
├── figures/
│ ├── figure1_learning_curves.png
│ └── figure2_p_value_heatmap.png
│
└── tables/
└── table1_summary_statistics.html
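For post-hoc inspection, the CSV and NumPy artifacts can be loaded directly; this is a sketch based on the file layout above, and the exact column layout of each file depends on the pipeline's output:

```python
import numpy as np
import pandas as pd

out = "alm_rl_reproduction_output"

# Smoothed learning curves and the Wilcoxon p-value matrix as DataFrames.
curves = pd.read_csv(f"{out}/data/learning_curves.csv")
pvals = pd.read_csv(f"{out}/data/p_value_matrix.csv", index_col=0)

# Raw episode-level results stored as a NumPy array.
raw = np.load(f"{out}/data/raw_results.npy")
print(curves.head(), pvals, raw.shape, sep="\n")
```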
continuous_time_rl_for_alm/
│
├── continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb # Main implementation notebook
├── config.yaml # Master configuration file
├── requirements.txt # Python package dependencies
├── LICENSE # MIT license file
└── README.md # This documentation file
The pipeline is highly customizable via the config.yaml file. Users can easily modify all experimental parameters, including the number of runs/episodes, SDE parameter distributions, agent hyperparameters, and evaluation settings, without altering the core Python code.
Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.
Future extensions could include:
- Alternative SDE Models: Integrating more complex market models, such as those with stochastic volatility (e.g., Heston model) or jumps.
- Multi-Asset Formulations: Extending the state and action spaces to handle a portfolio of multiple assets.
- Automated Hyperparameter Tuning: Wrapping the pipeline with a hyperparameter optimization library (e.g., Optuna) to automatically find the best settings for the ALM-RL agent.
- Real-World Data Application: Adapting the framework to use historical financial data by first estimating the SDE parameters from time series data.
This project is licensed under the MIT License.
If you use this code or the methodology in your research, please cite the original paper:
@inproceedings{huang2025continuous,
author = {Huang, Yilie},
title = {Continuous-Time Reinforcement Learning for Asset-Liability Management},
booktitle = {Proceedings of the 6th ACM International Conference on AI in Finance},
series = {ICAIF '25},
year = {2025},
publisher = {ACM},
note = {arXiv:2509.23280}
}

For the implementation itself, you may cite this repository:
Chirinda, C. (2025). A Professional-Grade Implementation of the "Continuous-Time RL for ALM" Framework.
GitHub repository: https://github.com/chirindaopensource/continuous_time_rl_for_alm
- Credit to Yilie Huang for the foundational research that forms the entire basis for this computational replication.
- This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including NumPy, Pandas, SciPy, PyTorch, Gymnasium, Matplotlib, and Jupyter, whose work makes complex computational analysis accessible and robust.
---
This README was generated based on the structure and content of continuous_time_reinforcement_learning_asset_liability_management_draft.ipynb and follows best practices for research software documentation.