How FinRL's Pipeline Enhances Trading Performance in Real-time Markets

8 Jun 2024


(1) Xiao-Yang Liu, Hongyang Yang, Columbia University;

(2) Jiechao Gao, University of Virginia;

(3) Christina Dan Wang (Corresponding Author), New York University Shanghai.

Abstract and 1 Introduction

2 Related Works and 2.1 Deep Reinforcement Learning Algorithms

2.2 Deep Reinforcement Learning Libraries and 2.3 Deep Reinforcement Learning in Finance

3 The Proposed FinRL Framework and 3.1 Overview of FinRL Framework

3.2 Application Layer

3.3 Agent Layer

3.4 Environment Layer

3.5 Training-Testing-Trading Pipeline

4 Hands-on Tutorials and Benchmark Performance and 4.1 Backtesting Module

4.2 Baseline Strategies and Trading Metrics

4.3 Hands-on Tutorials

4.4 Use Case I: Stock Trading

4.5 Use Case II: Portfolio Allocation and 4.6 Use Case III: Cryptocurrencies Trading

5 Ecosystem of FinRL and Conclusions, and References

3.5 Training-Testing-Trading Pipeline

The “training-testing” workflow used by conventional machine learning methods falls short for financial tasks. It splits the data into a training set and a testing set. On the training data, users select features and tune parameters; they then evaluate on the testing data. However, financial tasks experience a simulation-to-reality gap between testing performance and live market performance, because the testing here is offline backtesting, while the users’ goal is to place orders in a real-world market.

FinRL employs a “training-testing-trading” pipeline to reduce the simulation-to-reality gap. We use historical data (time series) for the “training-testing” part, as in conventional machine learning tasks; the testing period here serves as backtesting. For the “trading” part, we use live trading APIs, such as CCXT, Alpaca, or Interactive Brokers, allowing users to carry out trades directly in a trading system. FinRL therefore connects directly with live trading APIs: 1) it downloads live data, 2) it feeds the data to the trained DRL model and obtains the trading positions, and 3) it allows users to place trades.
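The three live-trading steps can be sketched as a single function. This is a minimal illustration, not FinRL's actual API: the `MarketFeed`, `Agent`, and `Broker` interfaces and the `trade_once` function are hypothetical names introduced here, and the sketch assumes the agent outputs a target position per symbol.

```python
from typing import List, Protocol, Sequence


class MarketFeed(Protocol):
    """Hypothetical wrapper around a live data source."""
    def latest_observation(self) -> Sequence[float]: ...


class Agent(Protocol):
    """Hypothetical trained DRL agent returning target positions."""
    def predict(self, observation: Sequence[float]) -> Sequence[float]: ...


class Broker(Protocol):
    """Hypothetical wrapper around a live trading API."""
    def submit_order(self, symbol: str, quantity: float) -> None: ...


def trade_once(
    feed: MarketFeed,
    agent: Agent,
    broker: Broker,
    symbols: Sequence[str],
    holdings: Sequence[float],
) -> List[float]:
    """Run one pass of the live loop and return the new target positions."""
    # 1) Download live data.
    observation = feed.latest_observation()
    # 2) Feed the data to the trained model and obtain target positions.
    targets = agent.predict(observation)
    # 3) Place the trades needed to move from current holdings to the targets.
    for symbol, target, held in zip(symbols, targets, holdings):
        delta = target - held  # positive means buy, negative means sell
        if delta != 0:
            broker.submit_order(symbol, delta)
    return list(targets)
```

In practice each interface would be backed by a concrete client for the chosen trading API, and the function would be called once per decision interval.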

Fig. 4 illustrates the “training-testing-trading” pipeline:

Step 1). A training window to retrain an agent.

Step 2). A testing window to evaluate the trained agent, while hyperparameters can be tuned iteratively.

Step 3). Use the trained agent to trade in a trading window.

A rolling window is used in the training-testing-trading pipeline because investors and portfolio managers need to retrain the model periodically as time passes. FinRL provides flexible choices of rolling windows, such as monthly, quarterly, or yearly windows, or windows specified by the user.
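One way to picture the rolling scheme is a generator that yields consecutive training, testing, and trading windows. The sketch below is illustrative only and is not FinRL's own code: the `Window` class and the `add_months` and `rolling_windows` functions are hypothetical. It assumes fixed-length windows measured in whole months, boundaries on the first day of a month, and a step equal to the trading window length.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterator


@dataclass(frozen=True)
class Window:
    train_start: date
    train_end: date  # exclusive; also the start of the testing window
    test_end: date   # exclusive; also the start of the trading window
    trade_end: date  # exclusive


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def rolling_windows(
    start: date,
    end: date,
    train_months: int,
    test_months: int,
    trade_months: int,
) -> Iterator[Window]:
    """Yield windows that fit inside [start, end), rolling forward each time."""
    train_start = start
    while True:
        train_end = add_months(train_start, train_months)
        test_end = add_months(train_end, test_months)
        trade_end = add_months(test_end, trade_months)
        if trade_end > end:
            break  # the next trading window would run past the available data
        yield Window(train_start, train_end, test_end, trade_end)
        # Roll forward so that successive trading windows do not overlap.
        train_start = add_months(train_start, trade_months)


# Example: six months of training, one of testing, one of trading.
windows = list(rolling_windows(date(2020, 1, 1), date(2021, 1, 1), 6, 1, 1))
```

Each yielded window corresponds to one pass through Steps 1) to 3): retrain on the training window, evaluate on the testing window, then trade in the trading window. Setting `trade_months` to 3 or 12 gives quarterly or yearly rolling.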

This paper is available on arXiv under the CC BY 4.0 DEED license.