StockTradingEnv_FinRLMeta
Action Space
The action space is a Box(-1,1,(stock_dim,),float32).
Each element a is in the range [-1,1], representing the normalized number
of shares to buy or sell for the corresponding stock.
a > 0: buy a × hmax shares
a = 0: hold
a < 0: sell |a| × hmax shares
State Space
The state space consists of the following components:
Account balance (1 element): current account balance
Stock prices (stock_dim elements): current price for each stock
Holding positions (stock_dim elements): number of shares held for each stock
Feature vector (I × stock_dim elements): technical indicators and turbulence index, I elements for each stock
Thus, the total state dimension is 1 + (I + 2) × stock_dim.
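A minimal NumPy sketch of how such a flat state vector could be assembled and how its length works out. The helper name `build_state`, the float32 dtype, and the feature layout (all I features of stock 0, then stock 1, and so on) are assumptions for illustration, not the environment's actual code.

```python
import numpy as np


def build_state(cash, prices, holdings, features):
    """Hypothetical helper: flatten [cash, prices, holdings, features] into one vector.

    prices, holdings: shape (stock_dim,)
    features: shape (stock_dim, I), i.e. I feature values per stock (assumed layout)
    """
    prices = np.asarray(prices, dtype=np.float32)
    holdings = np.asarray(holdings, dtype=np.float32)
    features = np.asarray(features, dtype=np.float32)
    stock_dim, num_features = features.shape
    if prices.shape != (stock_dim,) or holdings.shape != (stock_dim,):
        raise ValueError("prices and holdings must have shape (stock_dim,)")
    state = np.concatenate([
        np.array([cash], dtype=np.float32),  # 1 element: account balance
        prices,                              # stock_dim elements
        holdings,                            # stock_dim elements
        features.ravel(),                    # I * stock_dim elements, stock by stock
    ])
    # Length matches 1 + (I + 2) * stock_dim
    assert state.shape == (1 + (num_features + 2) * stock_dim,)
    return state


# 3 stocks with 4 features each -> 1 + (4 + 2) * 3 = 19 elements
state = build_state(1_000_000.0, [10.0, 20.0, 30.0], [0, 0, 0], np.zeros((3, 4)))
print(state.shape)  # (19,)
```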
Transition Dynamics
At each time step, the environment performs the following updates:
Scale and Discretize Actions
The agent outputs a continuous action vector a_t ∈ [-1, 1]^{stock_dim}. Actions are scaled to integer share counts (astype(int) truncates toward zero):
    actions = (actions * self.hmax).astype(int)
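The scaling step can be reproduced in isolation to see the effect of truncation. The value hmax = 100 is an assumed example, not necessarily the environment's default.

```python
import numpy as np

hmax = 100  # assumed example value for the maximum shares per trade
raw_actions = np.array([0.555, -0.004, -1.0, 1.0], dtype=np.float32)

# Scale to share counts; astype(int) truncates toward zero, not rounds
shares = (raw_actions * hmax).astype(int)
print(shares.tolist())  # [55, 0, -100, 100]
```

Because of truncation, any action with |a| < 1/hmax (here -0.004) collapses to a hold.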
Risk Gate (Turbulence Check)
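If turbulence_threshold is set and the current turbulence reaches or exceeds it, all actions are overridden with the maximum sell order (-hmax for every stock) to force liquidation:
    if self.turbulence_threshold is not None and self.turbulence >= self.turbulence_threshold:
        actions = np.array([-self.hmax] * self.stock_dim)
Because sell quantities are capped by current holdings, this sells min(hmax, holding) shares of each stock, which fully liquidates every position of at most hmax shares.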
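The gate can be written as a standalone function for testing. The name `apply_risk_gate` and its signature are hypothetical; the condition itself mirrors the check quoted above.

```python
import numpy as np


def apply_risk_gate(actions, turbulence, turbulence_threshold, hmax):
    """Hypothetical helper: replace every action with a max-size sell when turbulence is high."""
    if turbulence_threshold is not None and turbulence >= turbulence_threshold:
        # Same effect as np.array([-hmax] * stock_dim) for a 1-D action vector
        return np.full_like(actions, -hmax)
    return actions


actions = np.array([30, -10, 0])
print(apply_risk_gate(actions, 150.0, 140.0, 100).tolist())  # [-100, -100, -100]
print(apply_risk_gate(actions, 90.0, 140.0, 100).tolist())   # [30, -10, 0]
```

With turbulence_threshold = None the gate never fires, matching the `is not None` guard.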
Execute Trades
Trades are executed in two passes, sells before buys, so that cash raised by sales is available to fund purchases in the same step:
Sells first (most negative actions first):
For each asset i with a^{(i)} < 0: if the asset is tradable and holding^{(i)} > 0,
\[
\begin{aligned}
\text{sell\_qty} &= \min\bigl(|a^{(i)}|,\ \text{holding}^{(i)}\bigr)\\
\text{cash} &\leftarrow \text{cash} + \text{price}^{(i)} \cdot \text{sell\_qty} \cdot \bigl(1 - c^{\text{sell}}_i\bigr)\\
\text{holding}^{(i)} &\leftarrow \text{holding}^{(i)} - \text{sell\_qty}
\end{aligned}
\]
where c^{sell}_i is the sell transaction-cost rate for asset i. Otherwise, sell_qty = 0.
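A NumPy sketch of the sell pass under stated assumptions: the function name `sell_pass`, the optional boolean `tradable` mask (every asset tradable by default), and the per-asset `sell_cost` array are hypothetical names chosen for illustration.

```python
import numpy as np


def sell_pass(actions, prices, holdings, cash, sell_cost, tradable=None):
    """Hypothetical sketch: execute sells, most negative action first.

    Returns the updated (cash, holdings); the input holdings array is not modified.
    """
    holdings = holdings.copy()
    if tradable is None:
        tradable = np.ones(len(actions), dtype=bool)
    for i in np.argsort(actions):  # ascending order -> most negative action first
        if actions[i] >= 0:
            break  # all remaining actions are holds or buys
        if not tradable[i] or holdings[i] <= 0:
            continue  # sell_qty = 0
        sell_qty = min(-actions[i], holdings[i])  # cannot sell more than is held
        cash += prices[i] * sell_qty * (1 - sell_cost[i])
        holdings[i] -= sell_qty
    return cash, holdings


cash, holdings = sell_pass(
    np.array([-100, 20, -5]), np.array([50.0, 10.0, 20.0]),
    np.array([30, 0, 10]), 0.0, np.full(3, 0.001),
)
print(round(float(cash), 2), holdings.tolist())  # 1598.4 [0, 0, 5]
```

The first sale is capped at the 30 shares held even though the action asks for 100.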
Buys second (largest positive actions first, skipped entirely when turbulence reaches or exceeds the threshold):
For each asset i with a^{(i)} > 0: if turbulence < turbulence_threshold (or no threshold is set),
\[
\begin{aligned}
\text{buy\_qty}_{\max} &= \left\lfloor \frac{\text{cash}}{\text{price}^{(i)} \bigl(1 + c^{\text{buy}}_i\bigr)} \right\rfloor\\
\text{buy\_qty} &= \min\bigl(\text{buy\_qty}_{\max},\ a^{(i)}\bigr)\\
\text{cash} &\leftarrow \text{cash} - \text{price}^{(i)} \cdot \text{buy\_qty} \cdot \bigl(1 + c^{\text{buy}}_i\bigr)\\
\text{holding}^{(i)} &\leftarrow \text{holding}^{(i)} + \text{buy\_qty}
\end{aligned}
\]
where c^{buy}_i is the buy transaction-cost rate for asset i. Otherwise, buy_qty = 0.
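The buy pass can be sketched the same way. The function name `buy_pass`, its keyword arguments, and the per-asset `buy_cost` array are hypothetical; the turbulence check follows the rule above.

```python
import numpy as np


def buy_pass(actions, prices, holdings, cash, buy_cost,
             turbulence=0.0, turbulence_threshold=None):
    """Hypothetical sketch: execute buys, largest positive action first.

    Returns the updated (cash, holdings); the input holdings array is not modified.
    """
    holdings = holdings.copy()
    if turbulence_threshold is not None and turbulence >= turbulence_threshold:
        return cash, holdings  # all buys skipped under high turbulence
    for i in np.argsort(actions)[::-1]:  # descending order -> largest buy first
        if actions[i] <= 0:
            break  # all remaining actions are holds or sells
        unit_cost = prices[i] * (1 + buy_cost[i])
        max_qty = int(cash // unit_cost)  # floor: shares affordable with current cash
        buy_qty = min(max_qty, int(actions[i]))
        cash -= unit_cost * buy_qty
        holdings[i] += buy_qty
    return cash, holdings


cash, holdings = buy_pass(
    np.array([20, 30, -3]), np.array([10.0, 50.0, 5.0]),
    np.array([0, 0, 0]), 1000.0, np.full(3, 0.001),
)
print(round(float(cash), 2), holdings.tolist())  # 9.01 [4, 19, 0]
```

Because the largest action is served first, stock 1 takes 19 shares (capped by cash, not by its action of 30) and stock 0 only gets the 4 shares the leftover cash can afford.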
Advance Time and Update State
The environment advances to the next trading day, loads the new market data (prices, indicators), recomputes the turbulence index, and reconstructs the state vector with the updated cash, holdings, and features.
Constraints
We set the transaction cost rate to 0.1% (0.001) for both buying and selling.
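As a worked illustration of the 0.1% rate, assume an arbitrary example of 100 shares bought and later sold at an unchanged price of 20. The round trip then costs 0.2% of the traded value:

\[
\begin{aligned}
\text{buy outlay} &= 20 \times 100 \times (1 + 0.001) = 2002\\
\text{sell proceeds} &= 20 \times 100 \times (1 - 0.001) = 1998\\
\text{round-trip cost} &= 2002 - 1998 = 4 = 0.2\% \times 2000
\end{aligned}
\]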
Reward Design
The goal is to maximize the gain in total asset value (cash plus the market value of all holdings), so the reward is defined as the change in total asset value over one step, multiplied by a scaling factor.
It is calculated as (end_total_asset - begin_total_asset) * reward_scaling, where reward_scaling is set to 0.0001.
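A small sketch of the reward computation. It assumes total asset = cash + Σ price^{(i)} · holding^{(i)}; the helper names `total_asset` and `step_reward` are hypothetical.

```python
import numpy as np

REWARD_SCALING = 1e-4  # reward_scaling = 0.0001, as stated above


def total_asset(cash, prices, holdings):
    # Assumed definition: cash plus the market value of all holdings
    return cash + float(np.dot(prices, holdings))


def step_reward(begin_total_asset, end_total_asset, reward_scaling=REWARD_SCALING):
    return (end_total_asset - begin_total_asset) * reward_scaling


# Example: 10,000 of cash spent on 1,000 shares at 10.0, whose price then rises to 10.5
begin = total_asset(1_000_000.0, np.array([10.0, 20.0]), np.array([0, 0]))
end = total_asset(990_000.0, np.array([10.5, 20.0]), np.array([1000, 0]))
print(step_reward(begin, end))  # ~0.05 (a gain of 500 scaled by 1e-4)
```

Scaling keeps per-step rewards small relative to the million-unit account balance.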
Initial State
At the start of each episode:
Account balance is initialized to 1,000,000.
Holding positions are initialized to 0 for all assets.
Stock prices are set to the prices from the first trading day.
Feature vector is initialized using the feature values for the first day.
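A sketch of the reset logic described above. It assumes prices are stored as a (num_days, stock_dim) array and features as a (num_days, stock_dim, I) array; the function name `reset_state` and this data layout are hypothetical.

```python
import numpy as np

INITIAL_CASH = 1_000_000.0  # initial account balance stated above


def reset_state(prices_by_day, features_by_day):
    """Hypothetical reset: day 0, full cash, no holdings, first-day prices and features."""
    day = 0
    cash = INITIAL_CASH
    holdings = np.zeros(prices_by_day.shape[1], dtype=np.int64)
    state = np.concatenate([
        [cash],                          # account balance
        prices_by_day[day],              # first-day prices
        holdings,                        # all zeros
        features_by_day[day].ravel(),    # first-day features
    ])
    return day, cash, holdings, state


# 5 days, 2 stocks, 3 features per stock -> state length 1 + (3 + 2) * 2 = 11
day, cash, holdings, state = reset_state(
    np.arange(10, dtype=float).reshape(5, 2), np.zeros((5, 2, 3))
)
print(state.shape)  # (11,)
```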
Ending Condition
The episode terminates when advancing to the next trading day would exceed the available data range.
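One way to express this condition, assuming trading days are indexed 0 to num_days - 1; the names `is_terminal` and `visited_days` are hypothetical, and the exact off-by-one convention of the real environment may differ.

```python
def is_terminal(day, num_days):
    # Assumed convention: done once there is no next trading day to advance to
    return day + 1 >= num_days


def visited_days(num_days):
    """Walk an episode from day 0 and record every day the agent sees."""
    day = 0
    days = [day]
    while not is_terminal(day, num_days):
        day += 1
        days.append(day)
    return days


print(visited_days(5))  # [0, 1, 2, 3, 4]
```

Under this convention each day of data is visited exactly once.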