Machine Learning and how it helps in Stock Prediction

by Samay
15 minutes

Human beings adapt to a changing environment through learning. Learning can be defined simply as the acquisition of knowledge or skills through experience, study, or being taught. For a machine, however, acquiring new knowledge and skills from a given dataset is much harder.

Here, we look at the technology and see how stock prediction has evolved with it.

Machine Learning

Machine learning is a branch of artificial intelligence (AI) that gives computer systems the ability to learn and improve from experience without being explicitly programmed to do so. The process begins with observations or data, such as examples or instructions, in which the system looks for patterns. The primary goal is to let computers learn automatically, without human intervention or assistance, and adjust their behaviour accordingly.

Different algorithms are chosen depending on where the technology is to be used. Machine learning is mostly applied where conventional algorithms are difficult or infeasible to develop, such as driving cars autonomously, finding terrorist suspects, or filtering spam emails.

A Brief History of Machine Learning

The history of machine learning dates back to the 1950s, with Alan Turing's question "Can machines think?", posed in his seminal 1950 paper "Computing Machinery and Intelligence" on the topic of artificial intelligence.

In 1952, Arthur Samuel, a pioneer of machine learning, created a program that let an IBM computer play checkers, and seven years later, in 1959, he coined the term "Machine Learning".

Algorithms used in Machine Learning

There are four types of machine learning algorithms: supervised, semisupervised, unsupervised and reinforcement.

Supervised Learning

In supervised learning, a machine is taught through examples. The operator provides the algorithm with a known dataset containing inputs and their desired outputs, and the algorithm must find a method for getting from those inputs to those outputs. While the operator knows the correct answers, the algorithm finds patterns in the data, learns from observation and makes predictions. The operator corrects these predictions, and the process continues until the algorithm reaches a high level of accuracy.

Classification, regression and forecasting all fall under supervised learning.
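As a minimal sketch of the idea, assuming a toy dataset that is not part of this article, the snippet below fits a straight line to labelled examples (known inputs paired with known outputs) and then predicts the output for an unseen input. The function name `fit_line` and the numbers are purely illustrative.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to labelled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labelled training data: inputs with their known correct outputs (y = 2x + 1).
inputs = [1.0, 2.0, 3.0, 4.0]
outputs = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(inputs, outputs)
prediction = slope * 5.0 + intercept  # predict the output for an unseen input
```

This is a regression task: the "answer key" is the list of known outputs, and the fitted line is what the algorithm has learned from it.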

Semisupervised Learning

Semisupervised learning uses both labelled and unlabelled data. Labelled data carries meaningful tags that tell the algorithm what each example represents, while unlabelled data lacks those tags. By combining the two, the algorithm can learn to label the unlabelled data.

Unsupervised Learning

An unsupervised algorithm studies data by identifying patterns without any answer key or human operator to instruct it. Instead, the machine determines correlations and relationships by analysing the available data. The algorithm is left to interpret large data sets and tries to organize the data in a way that describes its structure.

This might mean grouping the data into clusters or arranging it into a more organized form.

As it assesses more data, its decision making improves and becomes more refined.

Clustering and dimension reduction fall under this method.
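To make clustering concrete, here is a small sketch of k-means on one-dimensional data. The function `kmeans_1d`, the starting centres and the sample values are all assumptions chosen for illustration. No labels are supplied; the grouping emerges from the data alone.

```python
def kmeans_1d(values, centres, iterations=10):
    """Cluster 1-D values around the given starting centres (k-means)."""
    centres = list(centres)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            # Assign each value to its nearest centre.
            nearest = min(range(len(centres)), key=lambda k: abs(v - centres[k]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[k]
                   for k, c in enumerate(clusters)]
    return centres, clusters


prices = [100, 102, 98, 250, 255, 245]
centres, clusters = kmeans_1d(prices, centres=[90, 300])
```

With these values the algorithm separates the low prices from the high ones and settles on one centre per group.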

Reinforcement Learning

Reinforcement learning focuses on regimented learning processes, in which a machine is provided with a set of actions, parameters and end values. It explores different options and possibilities, monitoring and evaluating each result to determine which one is optimal. In effect, it teaches the machine by trial and error: the machine learns from past experience and adapts its approach to the situation to achieve the best possible result.
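The trial-and-error loop can be sketched as follows. This is a deliberately simplified illustration, not a full reinforcement learning algorithm: the environment is a hypothetical list of actions with fixed rewards, and `learn_best_action` simply tries every action once before favouring the one with the best average reward. Real problems have noisy rewards and need continued exploration.

```python
def learn_best_action(rewards, rounds=20):
    """Try each action, track the average reward seen, then favour the best one."""
    totals = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for step in range(rounds):
        if step < len(rewards):
            action = step  # exploration: try every action once
        else:
            averages = [totals[a] / counts[a] for a in range(len(rewards))]
            action = averages.index(max(averages))  # exploitation: pick the best so far
        reward = rewards[action]()  # the environment returns a reward
        totals[action] += reward
        counts[action] += 1
    return counts


# Hypothetical environment: three actions with fixed payoffs.
environment = [lambda: 1.0, lambda: 3.0, lambda: 2.0]
counts = learn_best_action(environment)  # how often each action was chosen
```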

Stocks and Shares

Companies need a large amount of capital to invest in their growth and development. They raise this capital by selling parts of the business as shares, also known as equities. A person who buys a share owns a small part of the company and becomes a shareholder. Stock is, loosely, a collection of shares.

Why is Stock Prediction Needed?

People invest in shares to make a profit by selling them at a higher price than they paid. However, prices rise and fall every trading day (note: share markets are not open all day, every day) because of imbalances between supply and demand. As a result, there is also a chance of making a loss.

For example, buying a gold bar at ₹100 and selling it when the price rises to ₹110 generates a profit of ₹10; similarly, buying 10 gold bars yields a profit of ₹100. In an unfavourable situation, however, someone who buys 10 gold bars at ₹100 each and sells after the price falls to ₹90 loses ₹10 per bar, a total loss of ₹100.

Nobody wants to make a loss. Shareholders therefore try to predict future price rises by carefully examining a company's past records, and use those predictions to decide when to buy and when to sell. This is where machine learning comes into play.
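The arithmetic of a trade can be checked with a short sketch. The function name `profit` is illustrative only; a positive result is a profit and a negative result is a loss.

```python
def profit(buy_price, sell_price, quantity):
    """Total profit (positive) or loss (negative) on a trade, in rupees."""
    return (sell_price - buy_price) * quantity


gain = profit(buy_price=100, sell_price=110, quantity=10)  # 100
loss = profit(buy_price=100, sell_price=90, quantity=10)   # -100
```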

Prediction of Stocks using Machine Learning

Machine learning can help us predict changes in the market by examining a company's past records. The Recurrent Neural Network, or RNN, has proven to be one of the most powerful models for processing sequential data, and Long Short-Term Memory, or LSTM, is one of the most successful RNN architectures. Hence we shall use an LSTM-based RNN to predict stock market indices.

Recurrent Neural Network or RNN

RNNs are used in deep learning and in the development of models that simulate the activity of neurons in the human brain. They are especially powerful when context is critical to predicting an outcome. They differ from other types of artificial neural networks in that they use feedback loops to process a sequence of data that informs the final output, which can itself be a sequence. The feedback loop allows information to persist, an effect known as memory.

A disadvantage is that, during backpropagation, the network may suffer from the vanishing gradient problem: gradients shrink as they are propagated back through many time steps, so the network struggles to learn long-range dependencies.
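A feedback loop of this kind can be sketched with a single scalar hidden state. The weights `w_x`, `w_h` and `b` below are arbitrary illustrative values, not trained ones, and the helper names are assumptions.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new hidden state mixes the input with the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0  # the "memory" starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # feedback loop: h is fed back in
        states.append(h)
    return states


# Only the first input is non-zero, yet later states still carry a trace of it.
states = run_rnn([1.0, 0.0, 0.0])
```

The hidden state stays positive on the second and third steps even though those inputs are zero, which is the persistence the text calls memory.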
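The effect can be illustrated numerically. For a scalar tanh RNN, the chain rule makes the gradient of the final state with respect to the initial state a product of one factor per time step, and each factor here is smaller than 1. The weights and input below are arbitrary assumptions chosen for the sketch.

```python
import math


def gradient_through_time(steps, w_h=0.8, w_x=0.5, x=0.5):
    """d h_T / d h_0 for a scalar tanh RNN, accumulated by the chain rule."""
    h = 0.0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w_x * x + w_h * h)
        grad *= w_h * (1.0 - h * h)  # derivative of this step w.r.t. the previous state
    return grad


grad_short = gradient_through_time(5)
grad_long = gradient_through_time(60)
```

Over 60 steps the gradient is many orders of magnitude smaller than over 5 steps, so early inputs barely influence the weight updates.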

Long Short Term Memory or LSTM

Introduced in 1997 by two German researchers, Hochreiter and Schmidhuber, LSTM is a type of RNN capable of learning long-term dependencies, which is useful for predictions that require the network to retain information over a longer period of time.

LSTM has an internal mechanism called "gates" that regulates the flow of information and learns which data in a sequence is important to keep and which to throw away.

That's a lot of talk, so let's move quickly to the coding part!
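The gating mechanism can be sketched for a single scalar cell, using the commonly taught formulation with forget, input and output gates. The parameter dictionary and its values are assumptions picked to show one behaviour: the forget gate is held open and the input gate closed, so the cell state passes through almost unchanged despite a large new input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with scalar state; p maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate values to write
    c = f * c_prev + i * g            # updated long-term memory
    h = o * math.tanh(c)              # updated hidden state / output
    return h, c


# Illustrative parameters: keep the old memory, ignore the new input.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=0.7, p=params)
```

In a trained network the gate weights are learned, so the cell decides from the data when to keep and when to overwrite its memory.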

Code and Methodology

1. Loading Dataset

To start, we are going to load our dataset. We are going to be using NIFTY stock’s historical data.

import pandas as pd

dataset = pd.read_csv('NIFTY_Train.csv',index_col="Date",parse_dates=True)

2. Data Preprocessing

In this stage, we will be normalizing our data. We will also be cleaning our data. After the dataset is transformed, we will divide it into a training set and a testing set.

import numpy as np

# Data cleaning: check each column for missing values

dataset.isna().any()

# Select the price column to model (assumption: the CSV has an 'Open' column)

training_set = dataset[['Open']].values

# Feature Scaling Normalization

from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range = (0, 1))

training_set_scaled = sc.fit_transform(training_set)

# Creating a data structure with 60 timesteps and 1 output

X_train = []

y_train = []

for i in range(60, len(training_set_scaled)):

    X_train.append(training_set_scaled[i-60:i, 0])

    y_train.append(training_set_scaled[i, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

# Reshaping

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
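To see what the scaling and windowing steps produce, here is a pure-Python sketch on a five-value toy series with 3 timesteps instead of 60. The helper names `min_max_scale` and `make_windows` and the sample prices are illustrative assumptions; they mirror the logic above rather than replace it.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(series, timesteps):
    """Pair each run of `timesteps` values with the value that follows it."""
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append(series[i - timesteps:i])
        y.append(series[i])
    return X, y


prices = [10.0, 20.0, 30.0, 40.0, 50.0]
scaled = min_max_scale(prices)            # [0.0, 0.25, 0.5, 0.75, 1.0]
X, y = make_windows(scaled, timesteps=3)
# X == [[0.0, 0.25, 0.5], [0.25, 0.5, 0.75]]
# y == [0.75, 1.0]
```

Each row of `X` is one training sample and the matching entry of `y` is the next value the network should learn to predict.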

3. Feature Extraction

In this stage, we will feed different features to the neural network.

# Importing the Keras libraries and packages

from keras.models import Sequential

from keras.layers import Dense

from keras.layers import LSTM

from keras.layers import Dropout

4. Training the Neural Network

Now we will feed our training data into the neural network. Our model is a Sequential stack of four LSTM layers with dropout regularisation, followed by a dense output layer.

# Initialising the RNN

regressor = Sequential()

# Adding the first LSTM layer and some Dropout regularisation

regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))

regressor.add(Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularisation

regressor.add(LSTM(units = 50, return_sequences = True))

regressor.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation

regressor.add(LSTM(units = 50, return_sequences = True))

regressor.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation

regressor.add(LSTM(units = 50))

regressor.add(Dropout(0.2))

# Adding the output layer

regressor.add(Dense(units = 1))

# Compiling the RNN

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fitting the RNN to the Training set

regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)


Different types of optimizers can greatly affect the network’s success. For this network, we will use the Adam optimizer, which combines ideas from two other optimizers: AdaGrad and RMSprop.

AdaGrad uses a different learning rate for every parameter at every time step. This is because some parameters, especially infrequently updated ones, require larger learning rates while others require smaller ones. RMSprop fixes AdaGrad's steadily diminishing learning rate by using an exponentially decaying average of past squared gradients instead of accumulating all of them.

Adam stands for Adaptive Moment Estimation: it keeps running estimates of the first and second moments of the gradient and uses them to scale each parameter's update.
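The difference between the two accumulators can be seen in a small sketch that tracks the effective step size for one parameter receiving a constant gradient. The function names, learning rate and decay value are illustrative assumptions, and the placement of the small constant `eps` varies between implementations.

```python
import math


def adagrad_step_sizes(grads, lr=0.1, eps=1e-8):
    """Effective step size per update when all squared gradients are summed."""
    total, sizes = 0.0, []
    for g in grads:
        total += g * g
        sizes.append(lr / (math.sqrt(total) + eps))
    return sizes


def rmsprop_step_sizes(grads, lr=0.1, decay=0.9, eps=1e-8):
    """Effective step size when a decaying average of squared gradients is used."""
    avg, sizes = 0.0, []
    for g in grads:
        avg = decay * avg + (1.0 - decay) * g * g
        sizes.append(lr / (math.sqrt(avg) + eps))
    return sizes


grads = [1.0] * 200
ada = adagrad_step_sizes(grads)  # keeps shrinking as the sum grows
rms = rmsprop_step_sizes(grads)  # settles near the base learning rate
```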
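For reference, the standard Adam update as it is usually written is given below. The symbols are not defined elsewhere in this article, so treat them as notation introduced here: g_t is the gradient at step t, m_t and v_t are the running first and second moment estimates, β1 and β2 are their decay rates, η is the learning rate, ε is a small constant for numerical stability and θ is the parameter being trained.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{aligned}
```

The first line is a momentum-style average of gradients, the second is the RMSprop-style average of squared gradients, and the third corrects both for their bias towards zero in early steps.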


We have to make sure that our weights don’t get too large and that the model doesn’t fixate on individual data points, a problem known as overfitting. To stop this from happening, we can include a penalty for large weights. For our purposes, we will use Tikhonov (L2) regularization.
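As a sketch of what such a penalty looks like, the function below adds the sum of squared weights, scaled by a coefficient, to the mean squared error. The name `l2_penalised_loss` and the coefficient `lam` are assumptions made for illustration.

```python
def l2_penalised_loss(errors, weights, lam=0.01):
    """Mean squared error plus an L2 (Tikhonov) penalty on the weights."""
    mse = sum(e * e for e in errors) / len(errors)
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


# Same prediction errors, but the second model uses much larger weights.
small = l2_penalised_loss(errors=[0.1, -0.1], weights=[0.5, -0.5])
large = l2_penalised_loss(errors=[0.1, -0.1], weights=[5.0, -5.0])
```

Both models fit the data equally well, yet the one with larger weights gets the higher loss, so training is nudged towards smaller weights. In Keras, a penalty of this kind can be attached to a layer through its `kernel_regularizer` argument.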


Another way of preventing overfitting is dropout, which considers what happens when some neurons suddenly stop working. During training, a random fraction of neurons is switched off at each step, so the model cannot become overdependent on any particular neuron. Dropout makes the network more robust and helps it generalize.
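The mechanism can be sketched in a few lines. This is an illustration of "inverted" dropout, in which the surviving activations are scaled up so their expected sum stays the same; the function name and the seeded random generator are assumptions for the example.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded so the sketch is repeatable
activations = [1.0] * 10
dropped = dropout(activations, rate=0.2, rng=rng)
# Every entry is now either 0.0 (dropped) or 1.25 (kept and rescaled).
```

The `Dropout(0.2)` layers in the model apply the same idea with a rate of 0.2, and only during training.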

Output Generation

In this layer, the neural network’s output is compared to the target value. The error is minimized through backpropagation.
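The comparison and the error-reducing update can be sketched for the simplest possible model, a single weight `w` with prediction `w * x`. The helper names and data are illustrative assumptions; a real network repeats this update for every weight, with backpropagation supplying the gradients.

```python
def mean_squared_error(targets, predictions):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def gradient_step(w, xs, targets, lr=0.1):
    """One gradient-descent update for the model prediction = w * x."""
    n = len(xs)
    grad = sum(2 * (w * x - t) * x for x, t in zip(xs, targets)) / n
    return w - lr * grad


xs, targets = [1.0, 2.0], [2.0, 4.0]
w = 0.0
loss_before = mean_squared_error(targets, [w * x for x in xs])
w = gradient_step(w, xs, targets)
loss_after = mean_squared_error(targets, [w * x for x in xs])
```

One update moves the weight towards the value that fits the data and the loss drops accordingly.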

5. Visualization

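# Visualising the results (real_stock_price and predicted_stock_price hold the actual and predicted test prices)

import matplotlib.pyplot as plt

plt.plot(real_stock_price, color = 'orange', label = 'Real NIFTY Stock Price')

plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted NIFTY Stock Price')

plt.title('NIFTY Stock Price Prediction')

plt.xlabel('Time')

plt.ylabel('NIFTY Stock Price')

plt.legend()

plt.show()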


Predicted Graph:

Please note that only the stock prices of 2017 and 2018 have been used to train the neural network. To test, only the stock data of January 2019 has been considered. More recent stock prices have not been used because prices have been quite unpredictable due to the COVID-19 situation.
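The article does not show how the test inputs were built, so the following is only a sketch of one common approach, under the assumption that the same 60-timestep structure is used: each test day gets a window of the 60 preceding values, which means the earliest windows reach back into the end of the training series. The function name and the stand-in data are hypothetical.

```python
def build_test_windows(train_prices, test_prices, timesteps=60):
    """One input window per test day, reaching back into the training data."""
    history = train_prices[-timesteps:] + test_prices
    return [history[i - timesteps:i] for i in range(timesteps, len(history))]


train_prices = list(range(100))      # stand-in for the 2017-2018 series
test_prices = list(range(100, 120))  # stand-in for January 2019
windows = build_test_windows(train_prices, test_prices)
# len(windows) == 20, one window of 60 values per test day
```

In practice the values would first be scaled with the scaler fitted on the training data, and the model's predictions would be mapped back to prices with the scaler's inverse transform.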


To estimate the probability of a profit in the stock market, predicting prices is necessary, and to predict a price, historical data is analysed. Hence, machine learning can be used to predict stock prices. Our model, which uses an LSTM-based RNN, can help predict stock prices and so reduce the probability of making a loss.
