Machine Learning in Stock Market
Quick Overview
A publicly-traded company’s market value often reflects its financial performance, as captured in fundamental data. This data is comprised of reported financial information such as income, revenue, assets, dividends, and debt. It is presumed that a company’s stock price should ultimately move towards the company’s intrinsic value. One investing strategy called value investing is based on the idea that long-run prices reflect this intrinsic value and that the best features for predicting long-term intrinsic value are a company’s fundamental data.
In this challenge, we will try a new method —in which we identify value stocks depending on how well-priced the stock currently is with respect to its future fundamentals. Our task is to create a ML model that predicts a company’s Operating Income and Revenue, based on the previous 10 years of quarterly fundamental data. You will be given data on a selection of stocks and you have to create two models that can predict any company’s Operating Income and Revenue 12 months into the future.
- Since we have 50 csv file, each file contains 10 years of fundamental data on 50 publicly traded companies. The dataset also contains the target variables, labelled as: Revenue(Y) and Income(Y).
- The dataset that we have been given is a sample from the problem data. The data has been normalized with the companies market capitalization to bring data across companies to a similar scale.
- Apply K-Means Clustering, so that similar company grouped in a same cluster.
- Merge the dataset of the company which falls under same cluster because we expect companies which are in same cluster will behave similarly and also we will be having more data to train.
- Separate the class label from the dataset ie. Revenue, income class label. And fill the ‘inf’ value with the ‘0’ because dropping them was not a good idea since we already have very limited data and so model will not be able to train so well.
- Since we have more than 300 features so after applying PCA, we reduced the dimension and using covariance graph we figured out that only 5–12 features are important for each clusters.
- Created Two function to train a model for revenue and income prediction.