Tabular Q Learning for Automated Stock Trading

T.E. Truong
CS 221 Winter 2021: Artificial Intelligence Principles and Techniques

About

The project modeled the trading environment as a Markov Decision Process and used Q-learning to find the optimal trading policy. The state representation of the stock market combined discretized indicators such as MACD, ADX, and RSI with normalized price values of the current close price and of the owned stock's price. The agent's actions were to buy, sell, or hold a single stock.
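The setup above can be sketched as a standard tabular Q-learning loop. The binning thresholds, normalization, and hyperparameters below are illustrative assumptions, not the project's actual values:

```python
import random
from collections import defaultdict

ACTIONS = ["buy", "sell", "hold"]  # single-stock actions, as in the project

def discretize_state(macd, adx, rsi, norm_close, norm_owned):
    # Hypothetical binning; the project's actual discretization is not specified.
    return (
        int(macd > 0),           # MACD sign: bullish vs. bearish crossover
        int(adx > 25),           # ADX: trending vs. ranging market
        min(int(rsi // 30), 2),  # RSI in rough oversold/neutral/overbought bands
        round(norm_close, 1),    # normalized close price, coarsened to one decimal
        round(norm_owned, 1),    # normalized price of the currently owned stock
    )

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Standard tabular Q-learning update toward the bootstrapped target.
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, epsilon=0.1):
    # Explore with probability epsilon, otherwise act greedily.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

# Q-table defaults to zero for unseen (state, action) pairs.
Q = defaultdict(float)
```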

I tested two reward functions: profit and return on investment (ROI). Using ROI led to the desired behavior of selling and holding on the training set and performed considerably better on the test set than a trader that buys, sells, or holds at random.
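The two rewards might look like the following. The exact formulas the project used are not given, so this is a sketch of the usual definitions (ROI normalizes profit by the entry price, which makes rewards comparable across price levels):

```python
def profit_reward(sell_price, buy_price):
    # Reward 1: absolute profit on the round trip.
    return sell_price - buy_price

def roi_reward(sell_price, buy_price):
    # Reward 2: return on investment, profit scaled by the entry price.
    return (sell_price - buy_price) / buy_price
```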

I will likely extend this to a deep Q-learning algorithm so that the state representation can be more expressive.

[Figure] Optimal policy shown on the training set (before the black line) and on the test set (after the black line): Q-learning with reward 1 (top) and Q-learning with reward 2 (bottom).
