Tabular Q Learning for Automated Stock Trading
T.E. Truong
CS 221 Winter 2021: Artificial Intelligence Principles and Techniques
About
The project modeled the trading environment as a Markov Decision Process and used Q-learning to find the optimal trading policy. The state representation of the stock market combined discretized indicators such as MACD, ADX, and RSI with normalized values of the current close price and of the owned stock's price. The agent's actions were to buy, sell, or hold a single stock.
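The state and action setup above could be sketched roughly as follows. This is an illustrative assumption, not the project's exact configuration: the bin edges, helper names, and epsilon value are all mine.

```python
import numpy as np
from collections import defaultdict

ACTIONS = ["buy", "sell", "hold"]

# Illustrative bin edges for discretizing each indicator
# (assumed thresholds, not the project's actual ones).
MACD_BINS = [-0.5, 0.0, 0.5]
ADX_BINS = [20, 40]
RSI_BINS = [30, 70]
PRICE_BINS = [0.25, 0.5, 0.75]  # for normalized prices in [0, 1]

def discretize(value, bins):
    """Map a continuous indicator value to a discrete bin index."""
    return int(np.digitize(value, bins))

def make_state(macd, adx, rsi, norm_close, norm_owned):
    """Build a hashable state tuple from discretized indicators plus
    the normalized close price and owned-stock price (also binned)."""
    return (
        discretize(macd, MACD_BINS),
        discretize(adx, ADX_BINS),
        discretize(rsi, RSI_BINS),
        discretize(norm_close, PRICE_BINS),
        discretize(norm_owned, PRICE_BINS),
    )

# Tabular Q-function: state tuple -> array of action values.
Q = defaultdict(lambda: np.zeros(len(ACTIONS)))

def epsilon_greedy(state, epsilon=0.1, rng=np.random.default_rng(0)):
    """Pick buy/sell/hold, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[state]))
```

Because every state is a small tuple of bin indices, the table stays compact enough for plain tabular Q-learning.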
I tested two reward functions: profit and return on investment (ROI). Using ROI led to the desired behavior of selling and holding within the training set, and the resulting policy performed considerably better on the test set than a trader that buys, sells, or holds at random.
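The two reward signals and the tabular update could be computed along these lines. This is a hedged sketch: the function names and the exact form of each reward (e.g. per-trade vs. per-step) are my assumptions.

```python
from collections import defaultdict

ACTIONS = ("buy", "sell", "hold")

def profit_reward(sell_price, buy_price):
    """Reward 1: raw profit on the trade."""
    return sell_price - buy_price

def roi_reward(sell_price, buy_price):
    """Reward 2: return on investment -- profit normalized by the
    price paid, making rewards comparable across price levels."""
    return (sell_price - buy_price) / buy_price

# Tabular Q-values: (state, action) -> value.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

Normalizing by the purchase price is one plausible reason ROI generalized better: the same percentage gain yields the same reward whether the stock trades at $10 or $1000.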
I will likely update this project to use a deep Q-learning algorithm so that the state representation can be more expressive.
Optimal policy shown on the training set (before the black line) and on the test set (after the black line): Q-learning with reward 1, profit (top), and reward 2, ROI (bottom).