Pattern Recognition in Stock Price Volatility and Market Performance

Jan 11, 2024 | machine-learning scikit-learn data-science web-scrape finance time-series-analysis


An in depth exploration of stock market data through feature engineering and pattern recognition. This study analyzes historical price changes and volatility to identify trends across diverse companies using custom statistical metrics and datetime features.

Abstract

This project explores the identification of patterns in stock market volatility through feature engineering and data analysis. Using historical financial data from the Nasdaq 100 index, the study examines the benefits of using percentage change in price over absolute price values to ensure fair comparison and better model scaling. The research introduces a window based pattern recognition approach to identify recurring price movements before significant returns or losses. By extracting datetime features and analyzing the relationship between volatility and specific time intervals, the project provides a foundation for developing automated trading strategies based on historical price behavior.

Overview

This project uses financial data obtained from Yahoo Finance using an open source library. The data will then be explored and pre-processed to be used as training and testing datasets for the implementation of machine learning models. The implementation of the machine learning models will be done in the second part of the coursework. The project plan is to prepare a dataset of the three most volatile companies from the Nasdaq 100 index technology sector to be used for the machine learning models. Then we compare the results of these trained machine learning models. We could then use the best performing model or use them all together as an ensemble of models along with some ruleset to trade the markets.

The following coursework will cover data selection, preparing the data, preprocessing the data, and some analysis for it.

Installing yFinance

# Installing yfinance
%pip install yfinance
Note: you may need to restart the kernel to use updated packages.Requirement already satisfied: yfinance in c:\users\edwin teoh\anaconda3\lib\site-packages (0.2.33)

Requirement already satisfied: beautifulsoup4>=4.11.1 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (4.11.1)
Requirement already satisfied: pandas>=1.3.0 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (1.4.2)
Requirement already satisfied: peewee>=3.16.2 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (3.17.0)
Requirement already satisfied: appdirs>=1.4.4 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (1.4.4)
Requirement already satisfied: pytz>=2022.5 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (2023.3.post1)
Requirement already satisfied: requests>=2.31 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (2.31.0)
Requirement already satisfied: frozendict>=2.3.4 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (2.4.0)
Requirement already satisfied: multitasking>=0.0.7 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (0.0.11)
Requirement already satisfied: html5lib>=1.1 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (1.1)
Requirement already satisfied: numpy>=1.16.5 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (1.26.2)
Requirement already satisfied: lxml>=4.9.1 in c:\users\edwin teoh\anaconda3\lib\site-packages (from yfinance) (5.0.0)
Requirement already satisfied: soupsieve>1.2 in c:\users\edwin teoh\anaconda3\lib\site-packages (from beautifulsoup4>=4.11.1->yfinance) (2.3.1)
Requirement already satisfied: six>=1.9 in c:\users\edwin teoh\anaconda3\lib\site-packages (from html5lib>=1.1->yfinance) (1.16.0)
Requirement already satisfied: webencodings in c:\users\edwin teoh\anaconda3\lib\site-packages (from html5lib>=1.1->yfinance) (0.5.1)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\edwin teoh\anaconda3\lib\site-packages (from pandas>=1.3.0->yfinance) (2.8.2)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\edwin teoh\anaconda3\lib\site-packages (from requests>=2.31->yfinance) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\edwin teoh\anaconda3\lib\site-packages (from requests>=2.31->yfinance) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in c:\users\edwin teoh\anaconda3\lib\site-packages (from requests>=2.31->yfinance) (3.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\edwin teoh\anaconda3\lib\site-packages (from requests>=2.31->yfinance) (1.26.9)

Importing Libraries

import numpy as np
from bs4 import BeautifulSoup
import requests
from matplotlib import pyplot as plt
import seaborn  as sns
import pandas as pd
import yfinance as yf

Note:

That the visualization may change due to the data being updated.

The changes may not be drastic as there are only a few values updated daily.

Choosing Data

The data that we will be using in this project are going to be financial stock data as the project involve using machine learning models to predict stock prices.

The data chosen are from Yahoo Finance, specifically from their library that can be directly installed in Python for ease of use. There are several methods that can also be used to obtain the data which will also be demonstrated later in the project. The data will also be pre-processed before it is used for the machine learning model.

The companies that are chosen for the project will be based on our analysis later to see which stocks from the Nasdaq 100 index, specifically the technology stocks, have the higher standard deviation. Stocks with higher standard deviation have more opportunities for shorter term trades that could be profitable. We will then use the top three stocks selected to be trained individually and another model with the Nasdaq 100 index to see which will perform better.

Getting the list of 30 Companies in Nasdaq 100 index

The website here was chosen as it lists the companies that are in the Nasdaq 100 index and specifically the technology sector. Given any large changes or news that affect any of the 30 companies listed here it should stay the same. But do take note that the list might change.

headers = {
    "user-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
}

tech_companies_site = 'https://finance.yahoo.com/quote/%5ENDXT/components?p=%5ENDXT'
tech_companies_request = requests.get(tech_companies_site,headers=headers)
# Checking if the request was successful
tech_companies_request.status_code
200
tech_companies_df = pd.read_html(tech_companies_request.content)
tech_companies_df[0]
   Symbol                                Company Name  Last Price  Change  \
0    ASML                           ASML Holding N.V.      717.75   -0.04
1     ADI                        Analog Devices, Inc.      188.44    0.07
2     MDB                               MongoDB, Inc.      393.49    0.35
3    META                        Meta Platforms, Inc.      370.12   -0.35
4    GOOG                               Alphabet Inc.      143.60   -0.20
5    ANSS                                 ANSYS, Inc.      355.54    0.68
6    NXPI                     NXP Semiconductors N.V.      210.87   -0.42
7    INTU                                 Intuit Inc.      607.46   -1.24
8   GOOGL                               Alphabet Inc.      141.97   -0.31
9     CDW                             CDW Corporation      220.99    0.71
10   QCOM                       QUALCOMM Incorporated      138.84   -0.47
11   MSFT                       Microsoft Corporation      384.45    1.68
12   AAPL                                  Apple Inc.      185.35   -0.85
13    AMD                Advanced Micro Devices, Inc.      147.71   -0.83
14   NVDA                          NVIDIA Corporation      546.79    3.29
15   DDOG                               Datadog, Inc.      120.92    0.73
16     MU                     Micron Technology, Inc.       83.09    0.71
17   CRWD                  CrowdStrike Holdings, Inc.      284.50    2.46
18   ADSK                              Autodesk, Inc.      240.93    2.11
19   KLAC                             KLA Corporation      561.04    4.95
20   AMAT                     Applied Materials, Inc.      151.24    1.43
21    GFS                        GLOBALFOUNDRIES Inc.       56.63   -0.55
22    TXN              Texas Instruments Incorporated      165.45   -1.80
23   LRCX                    Lam Research Corporation      758.89    9.20
24   DASH                              DoorDash, Inc.      104.10   -1.49
25    PDD                           PDD Holdings Inc.      151.64    2.45
26   AVGO                               Broadcom Inc.     1098.37   17.80
27     ON                ON Semiconductor Corporation       74.28   -1.33
28   CTSH  Cognizant Technology Solutions Corporation       74.71    1.36
29   PANW                    Palo Alto Networks, Inc.      322.81    6.72

   % Change    Volume
0    -0.01%    420852
1    +0.04%   1645541
2    +0.09%   1016993
3    -0.09%  14623797
4    -0.14%  13891962
5    +0.19%    612520
6    -0.20%    943793
7    -0.20%    858896
8    -0.22%  20136213
9    +0.32%    318119
10   -0.34%   4701678
11   +0.44%  21135744
12   -0.45%  36857156
13   -0.56%  57576536
14   +0.61%  55306006
15   +0.61%   2384169
16   +0.86%   7473666
17   +0.87%   2990299
18   +0.88%    648072
19   +0.89%    351522
20   +0.95%   3834207
21   -0.96%    433762
22   -1.08%   3070799
23   +1.23%    459184
24   -1.41%   1982696
25   +1.64%   4449454
26   +1.65%   1533869
27   -1.76%   7649211
28   +1.85%   1750842
29   +2.13%   4176703

Column Description
Symbol Short abbreviation of the company name.
Company Name Self Explanatory
Last Price The latest price. at the given time
Change The change in price based on the last price, at the given time
% Change The change in price based on the last price, in percentage, at the given time
Volume How much available share there is in the market at the given time
# Check if the columns are as shown, without spacing/blanks in the column name
tech_companies_df[0].columns
Index(['Symbol', 'Company Name', 'Last Price', 'Change', '% Change', 'Volume'], dtype='object')

We are only interested in getting the list of symbols

tech_companies_symbols_list = tech_companies_df[0]['Symbol'].sort_values()
tech_companies_symbols_list
12     AAPL
1       ADI
18     ADSK
20     AMAT
13      AMD
5      ANSS
0      ASML
26     AVGO
9       CDW
17     CRWD
28     CTSH
24     DASH
15     DDOG
21      GFS
4      GOOG
8     GOOGL
7      INTU
19     KLAC
23     LRCX
2       MDB
3      META
11     MSFT
16       MU
14     NVDA
6      NXPI
27       ON
29     PANW
25      PDD
10     QCOM
22      TXN
Name: Symbol, dtype: object

Using Yahoo Finance via Web Scraping

Given this link we can make some changes to it and then get the stock price data we need. But using the URL this way we would need to understand the time period that is used in the URL. We will need to convert our desired timeline into Unix timestamp. Furthermore, by using this method we are only limited to dataset intervals of one day, one week, or one month. Datasets with intervals of lower than one day cannot be accessed easily or at least based on the limited time we had to explore ways to scrape this. Yahoo Finance have also properly formatted the data so that we can easily use pd.read_html to extract the table into a DataFrame.

site = 'https://sg.finance.yahoo.com/quote/AAPL/history?period1=345427200&period2=1704153600&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true'
dataset_request = requests.get(site,headers=headers)
dataset_request.status_code
200
pd.read_html(dataset_request.content)[0]
                                                  Date  \
0                                          29 Dec 2023
1                                          28 Dec 2023
2                                          27 Dec 2023
3                                          26 Dec 2023
4                                          22 Dec 2023
..                                                 ...
96                                         15 Aug 2023
97                                         14 Aug 2023
98                                         11 Aug 2023
99                                         11 Aug 2023
100  *Close price adjusted for splits.**Close price...

                                                  Open  \
0                                               193.90
1                                               194.14
2                                               192.49
3                                               193.61
4                                               195.18
..                                                 ...
96                                              178.88
97                                              177.97
98                                              177.32
99                                       0.24 Dividend
100  *Close price adjusted for splits.**Close price...

                                                  High  \
0                                               194.40
1                                               194.66
2                                               193.50
3                                               193.89
4                                               195.41
..                                                 ...
96                                              179.48
97                                              179.69
98                                              178.62
99                                       0.24 Dividend
100  *Close price adjusted for splits.**Close price...

                                                   Low  \
0                                               191.73
1                                               193.17
2                                               191.09
3                                               192.83
4                                               192.97
..                                                 ...
96                                              177.05
97                                              177.31
98                                              176.55
99                                       0.24 Dividend
100  *Close price adjusted for splits.**Close price...

                                                Close*  \
0                                               192.53
1                                               193.58
2                                               193.15
3                                               193.05
4                                               193.60
..                                                 ...
96                                              177.45
97                                              179.46
98                                              177.79
99                                       0.24 Dividend
100  *Close price adjusted for splits.**Close price...

                                          Adj. close**  \
0                                               192.53
1                                               193.58
2                                               193.15
3                                               193.05
4                                               193.60
..                                                 ...
96                                              177.22
97                                              179.22
98                                              177.56
99                                       0.24 Dividend
100  *Close price adjusted for splits.**Close price...

                                                Volume
0                                             42628800
1                                             34049900
2                                             48087700
3                                             28919300
4                                             37122800
..                                                 ...
96                                            43622600
97                                            43675600
98                                            51988100
99                                       0.24 Dividend
100  *Close price adjusted for splits.**Close price...

[101 rows x 7 columns]

Now we try by using the period of 10 Aug 2023, until today, in unix timestamp, then setting the interval to 1h and changing 'AAPL' to 'C' (the ticker for Citibank), and changing both the interval and frequency to 1 hour.

site = 'https://sg.finance.yahoo.com/quote/AAPL/history?period1=1696934989&period2=1704883789&interval=1h&filter=history&frequency=1h&includeAdjustedClose=true'
dataset_request = requests.get(site,headers=headers)
dataset_request.status_code
200
pd.read_html(dataset_request.content)[0]
                                                Date  \
0                                        10 Nov 2023
1  *Close price adjusted for splits.**Close price...

                                                Open  \
0                                      0.24 Dividend
1  *Close price adjusted for splits.**Close price...

                                                High  \
0                                      0.24 Dividend
1  *Close price adjusted for splits.**Close price...

                                                 Low  \
0                                      0.24 Dividend
1  *Close price adjusted for splits.**Close price...

                                              Close*  \
0                                      0.24 Dividend
1  *Close price adjusted for splits.**Close price...

                                        Adj. close**  \
0                                      0.24 Dividend
1  *Close price adjusted for splits.**Close price...

                                              Volume
0                                      0.24 Dividend
1  *Close price adjusted for splits.**Close price...

But now, we try once more, using the same time period, but changing the interval and frequency in the URL, back tto '1d' from '1h'

site = 'https://sg.finance.yahoo.com/quote/AAPL/history?period1=1696934989&period2=1704883789&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true'
dataset_request = requests.get(site,headers=headers)
dataset_request.status_code
200
pd.read_html(dataset_request.content)[0]
                                                 Date  \
0                                         09 Jan 2024
1                                         08 Jan 2024
2                                         05 Jan 2024
3                                         04 Jan 2024
4                                         03 Jan 2024
..                                                ...
60                                        13 Oct 2023
61                                        12 Oct 2023
62                                        11 Oct 2023
63                                        10 Oct 2023
64  *Close price adjusted for splits.**Close price...

                                                 Open  \
0                                              183.92
1                                              182.09
2                                              181.99
3                                              182.15
4                                              184.22
..                                                ...
60                                             181.42
61                                             180.07
62                                             178.20
63                                             178.10
64  *Close price adjusted for splits.**Close price...

                                                 High  \
0                                              185.15
1                                              185.60
2                                              182.76
3                                              183.09
4                                              185.88
..                                                ...
60                                             181.93
61                                             182.34
62                                             179.85
63                                             179.72
64  *Close price adjusted for splits.**Close price...

                                                  Low  \
0                                              182.73
1                                              181.50
2                                              180.17
3                                              180.88
4                                              183.43
..                                                ...
60                                             178.14
61                                             179.04
62                                             177.60
63                                             177.95
64  *Close price adjusted for splits.**Close price...

                                               Close*  \
0                                              185.14
1                                              185.56
2                                              181.18
3                                              181.91
4                                              184.25
..                                                ...
60                                             178.85
61                                             180.71
62                                             179.80
63                                             178.39
64  *Close price adjusted for splits.**Close price...

                                         Adj. close**  \
0                                              185.14
1                                              185.56
2                                              181.18
3                                              181.91
4                                              184.25
..                                                ...
60                                             178.61
61                                             180.47
62                                             179.56
63                                             178.16
64  *Close price adjusted for splits.**Close price...

                                               Volume
0                                            42841800
1                                            59144500
2                                            62303300
3                                            71983600
4                                            58414500
..                                                ...
60                                           51427100
61                                           56743100
62                                           47551100
63                                           43698000
64  *Close price adjusted for splits.**Close price...

[65 rows x 7 columns]

As shown, the granularity required from the dataset, cannot achieve by using this method.

Using yfinance to extract data

Using this method is easier and faster. But there are also some limitations faced. Given the interval that is chosen to be downloaded, the timeframe or the duration of the data gets shorter. For example, a daily interval dataset can be downloaded with the full timeline of a stock. If we want to have a dataset that has an interval of five minutes we can only get a maximum of 60 days. It is also understandable that the dataset size also increases significantly the lower the interval we go. Additionally, the more granular the dataset is, the more likely it is that we will have to either make a purchase or subscribe to such access of information.

aapl = yf.Ticker("AAPL")
aapl.history(period='70d',interval='5m')
AAPL: 5m data not available for startTime=1698958004 and endTime=1705006004. The requested range must be within the last 60 days.
Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []
aapl.history(period='5y',interval='1h')
AAPL: 1h data not available for startTime=1547326005 and endTime=1705006005. The requested range must be within the last 730 days.
Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []
aapl.history(period='7d',interval='1m')
                                 Open        High         Low       Close  \
Datetime
2024-01-03 09:30:00-05:00  184.220001  184.740005  183.860001  184.690002
2024-01-03 09:31:00-05:00  184.679993  185.220001  184.470001  185.100098
2024-01-03 09:32:00-05:00  185.100006  185.330002  184.929993  185.259995
2024-01-03 09:33:00-05:00  185.270004  185.380005  185.080002  185.360001
2024-01-03 09:34:00-05:00  185.365005  185.688705  185.270004  185.630005
...                               ...         ...         ...         ...
2024-01-11 15:42:00-05:00  185.429993  185.464996  185.375000  185.399994
2024-01-11 15:43:00-05:00  185.395004  185.419998  185.350006  185.409195
2024-01-11 15:44:00-05:00  185.399994  185.500000  185.399994  185.414993
2024-01-11 15:45:00-05:00  185.419998  185.470001  185.360001  185.384995
2024-01-11 15:46:00-05:00  185.370102  185.370102  185.370102  185.370102

                            Volume  Dividends  Stock Splits
Datetime
2024-01-03 09:30:00-05:00  2565996        0.0           0.0
2024-01-03 09:31:00-05:00   492296        0.0           0.0
2024-01-03 09:32:00-05:00   353071        0.0           0.0
2024-01-03 09:33:00-05:00   295721        0.0           0.0
2024-01-03 09:34:00-05:00   492106        0.0           0.0
...                            ...        ...           ...
2024-01-11 15:42:00-05:00    80598        0.0           0.0
2024-01-11 15:43:00-05:00    74422        0.0           0.0
2024-01-11 15:44:00-05:00    78927        0.0           0.0
2024-01-11 15:45:00-05:00    73016        0.0           0.0
2024-01-11 15:46:00-05:00        0        0.0           0.0

[2711 rows x 7 columns]
aapl.history(period='8d',interval='1m')
AAPL: 1m data not available for startTime=1704314805 and endTime=1705006005. Only 7 days worth of 1m granularity data are allowed to be fetched per request.
Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []
aapl = yf.Ticker("AAPL")
aapl.history(period='730d',interval='1h')
                                 Open        High         Low       Close  \
Datetime
2021-02-18 09:30:00-05:00  129.199997  129.679993  127.410004  128.119904
2021-02-18 10:30:00-05:00  128.100006  129.104996  127.830002  128.675705
2021-02-18 11:30:00-05:00  128.669998  128.710007  127.900002  128.620102
2021-02-18 12:30:00-05:00  128.625000  129.070007  128.449997  128.994995
2021-02-18 13:30:00-05:00  129.000000  129.050003  128.449997  128.710007
...                               ...         ...         ...         ...
2024-01-11 11:30:00-05:00  184.184998  184.449997  183.835007  184.050003
2024-01-11 12:30:00-05:00  184.074997  184.777206  184.074997  184.654999
2024-01-11 13:30:00-05:00  184.669998  185.539993  184.559998  185.514999
2024-01-11 14:30:00-05:00  185.520004  185.699997  184.755005  185.369995
2024-01-11 15:30:00-05:00  185.369995  185.559998  185.350006  185.370102

                             Volume  Dividends  Stock Splits
Datetime
2021-02-18 09:30:00-05:00  28938326        0.0           0.0
2021-02-18 10:30:00-05:00  13477687        0.0           0.0
2021-02-18 11:30:00-05:00  12770713        0.0           0.0
2021-02-18 12:30:00-05:00   8465690        0.0           0.0
2021-02-18 13:30:00-05:00   8516504        0.0           0.0
...                             ...        ...           ...
2024-01-11 11:30:00-05:00   3849444        0.0           0.0
2024-01-11 12:30:00-05:00   4000614        0.0           0.0
2024-01-11 13:30:00-05:00   3924789        0.0           0.0
2024-01-11 14:30:00-05:00   4502053        0.0           0.0
2024-01-11 15:30:00-05:00   1503923        0.0           0.0

[5094 rows x 7 columns]

The decision to use 1 hour interval, and about 2 years to 3 years of data, will be used in the project. This is because the timeframe is fast enough where we can try to implement a system or program that can support our decisions for short term trading, and that there will be enough data to experiment. If the timeframe was too long such as 1 day for the span of 10 year, then the program will at most try to make daily or weekly decisions. Going with an hourly interval dataset, allow us to explore an hourly model or system. If we would like to take it slow and and make decisions based on a longer timeframe, we still have to option to do so.

tech_companies_history_list = []

for tech_company in tech_companies_symbols_list:
    temp = yf.Ticker(tech_company)
    tech_companies_history_list.append((tech_company, temp.history(period='730d',interval='1h')))

tech_companies_history_list[0]
GFS: 1h data not available for startTime=1635427800 and endTime=1705006008. The requested range must be within the last 730 days.
('AAPL',
                                  Open        High         Low       Close  \
 Datetime
 2021-02-18 09:30:00-05:00  129.199997  129.679993  127.410004  128.119904
 2021-02-18 10:30:00-05:00  128.100006  129.104996  127.830002  128.675705
 2021-02-18 11:30:00-05:00  128.669998  128.710007  127.900002  128.620102
 2021-02-18 12:30:00-05:00  128.625000  129.070007  128.449997  128.994995
 2021-02-18 13:30:00-05:00  129.000000  129.050003  128.449997  128.710007
 ...                               ...         ...         ...         ...
 2024-01-11 11:30:00-05:00  184.184998  184.449997  183.835007  184.050003
 2024-01-11 12:30:00-05:00  184.074997  184.777206  184.074997  184.654999
 2024-01-11 13:30:00-05:00  184.669998  185.539993  184.559998  185.514999
 2024-01-11 14:30:00-05:00  185.520004  185.699997  184.755005  185.369995
 2024-01-11 15:30:00-05:00  185.369995  185.559998  185.350006  185.370102

                              Volume  Dividends  Stock Splits
 Datetime
 2021-02-18 09:30:00-05:00  28938326        0.0           0.0
 2021-02-18 10:30:00-05:00  13477687        0.0           0.0
 2021-02-18 11:30:00-05:00  12770713        0.0           0.0
 2021-02-18 12:30:00-05:00   8465690        0.0           0.0
 2021-02-18 13:30:00-05:00   8516504        0.0           0.0
 ...                             ...        ...           ...
 2024-01-11 11:30:00-05:00   3849444        0.0           0.0
 2024-01-11 12:30:00-05:00   4000614        0.0           0.0
 2024-01-11 13:30:00-05:00   3924789        0.0           0.0
 2024-01-11 14:30:00-05:00   4502053        0.0           0.0
 2024-01-11 15:30:00-05:00   1503923        0.0           0.0

 [5094 rows x 7 columns])
Column Description
Open The price at the start of the interval
High The highest price achieve within the given interval
Low The lowest price achieve within the given interval
Close The price at the end of the interval
Volume The number of share available in the market
Dvidends How much returns are given back to investor, per share.
Stock Spits How many shares gained per share

The interval mentioned would be 1 hour, as that is the interval chosen for this coursework. But the interval could be 4 hours, 1 day, 1 week, or as low as 1 minute, 1 second or even down to per tick. So the open, high, low, close prices are highly likely to be different the larger the interval selected.

## Checking the number of rows for each company
[len(tech_company[1]) for tech_company in tech_companies_history_list]
[5094,
 5094,
 5094,
 5080,
 5094,
 5094,
 5094,
 5094,
 5094,
 5094,
 5094,
 5094,
 5094,
 0,
 5094,
 5094,
 5066,
 5094,
 5094,
 5094,
 5094,
 5094,
 5085,
 5094,
 5094,
 5092,
 5094,
 5094,
 5094,
 5094]
max_num_row = pd.DataFrame([len(tech_company[1]) for tech_company in tech_companies_history_list]).max()
max_num_row[0]
5094
tech_companies = list(
    filter(
        lambda x: len(x[1]) == max_num_row[0],
        tech_companies_history_list
    )
)

print("The number of companies left: ", len(tech_companies), end='\n\n')
print("The following are the remaining companies: ")
print(*[x[0] for x in tech_companies],sep=', ')
The number of companies left:  25

The following are the remaining companies:
AAPL, ADI, ADSK, AMD, ANSS, ASML, AVGO, CDW, CRWD, CTSH, DASH, DDOG, GOOG, GOOGL, KLAC, LRCX, MDB, META, MSFT, NVDA, NXPI, PANW, PDD, QCOM, TXN

Using change in price

There are many tutorials and courses out there that use stock prices as input, labels and predictors for all kinds of machine learning models. But there are some problems with using prices.

Prices are absolute

Because using the price of the stock at the given time, is not a fair comparison against other stocks. Stocks can have stock splits, which divides the share price by the number of shares, a single share is split into. Some companies do not soar to higher number than others. For example, Berkshire Hathaway Inc Class A shares (Ticker Symbol: BRK.A), as of 6 Jan 2024, they are currently priced at $554,300 per share, while Google Shares are priced at $135.

Scaling

It is also common practice to use some form of scaling to bring value of features closer to one another so that the machine learning models can proper consider all features equally. If we were to use min max scaling, the problem is that any timeframe, there might be a new lowest low and a new highest high. There is no limit to price movement. The model will have difficulty in making prediction with unseen data that are outside the training dataset. If we would like to compare other stock data as well, ones 0.5 might represent a different price from another. Standardization, or the Standard Scaler, has also got similar issue. Most stock data have an upward trend, because in the long term, stock prices do go up. Therefore, stock data based on price, will have increasing mean/average, which indicate that overtime, the model's understanding of value will not represent the same price as it was trained with as well.

Change in Value matters

A trader or an investor still make the same amount of money, given that a stock that cost $100 increases by $10 compared to a stock that cost $10 and increase by $1. Furthermore, with popularity of many financial services and brokers providing the ability to purchase fractional shares (the ability to purchase parts of a stock/share), the absolute price of a stock is not as relevant. A trader or an investor can still reap the benefit of holding Berkshire Hathaway Inc Class A shares with $10. Traders and investors alike can focus on their analysis and finding companies and opportunities without worrying the magnitude of the share price.

Therefore, we will be using percentage change in price as an input.

# Use a single index to access the company ticker and dataset
tech_companies[24]
('TXN',
                                  Open        High         Low       Close  \
 Datetime
 2021-02-18 09:30:00-05:00  177.429993  177.520004  174.250000  174.570007
 2021-02-18 10:30:00-05:00  174.589996  175.550003  174.220001  174.970001
 2021-02-18 11:30:00-05:00  175.020004  175.259995  174.199997  175.149994
 2021-02-18 12:30:00-05:00  175.156906  176.149994  174.500000  176.080002
 2021-02-18 13:30:00-05:00  176.100006  176.580002  175.639999  176.020004
 ...                               ...         ...         ...         ...
 2024-01-11 11:30:00-05:00  165.839996  165.880005  165.029999  165.339996
 2024-01-11 12:30:00-05:00  165.339996  165.860001  165.289993  165.619995
 2024-01-11 13:30:00-05:00  165.600006  166.270004  165.479996  166.119995
 2024-01-11 14:30:00-05:00  166.115005  166.440002  165.440002  165.520004
 2024-01-11 15:30:00-05:00  165.522797  165.766006  165.410004  165.470001

                            Volume  Dividends  Stock Splits
 Datetime
 2021-02-18 09:30:00-05:00  607178        0.0           0.0
 2021-02-18 10:30:00-05:00  443595        0.0           0.0
 2021-02-18 11:30:00-05:00  313774        0.0           0.0
 2021-02-18 12:30:00-05:00  513549        0.0           0.0
 2021-02-18 13:30:00-05:00  410585        0.0           0.0
 ...                           ...        ...           ...
 2024-01-11 11:30:00-05:00  558498        0.0           0.0
 2024-01-11 12:30:00-05:00  503848        0.0           0.0
 2024-01-11 13:30:00-05:00  347788        0.0           0.0
 2024-01-11 14:30:00-05:00  490683        0.0           0.0
 2024-01-11 15:30:00-05:00  183174        0.0           0.0

 [5094 rows x 7 columns])
# Use two index to access the company dataset
# Checking the columns
tech_companies[0][1].columns
Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Dividends', 'Stock Splits'], dtype='object')

Plotting two companies' absolute share price

# Datetime as x
# Close price of 2 stock for y
datetime = tech_companies[0][1].index
close_price1 = tech_companies[0][1]['Close']
close_price2 = tech_companies[4][1]['Close']

# Plotting the 2 stock prices
plt.figure(figsize=(20, 8))
plt.plot(datetime, close_price1,label=tech_companies[0][0])
plt.plot(datetime, close_price2,label=tech_companies[4][0])
plt.legend()
Visualization 1

Adding % Change in Close Price columns

df1 = tech_companies[0][1]

df1['Close']
Datetime
2021-02-18 09:30:00-05:00    128.119904
2021-02-18 10:30:00-05:00    128.675705
2021-02-18 11:30:00-05:00    128.620102
2021-02-18 12:30:00-05:00    128.994995
2021-02-18 13:30:00-05:00    128.710007
                                ...
2024-01-11 11:30:00-05:00    184.050003
2024-01-11 12:30:00-05:00    184.654999
2024-01-11 13:30:00-05:00    185.514999
2024-01-11 14:30:00-05:00    185.369995
2024-01-11 15:30:00-05:00    185.370102
Name: Close, Length: 5094, dtype: float64
df1['Close'].shift(1)
Datetime
2021-02-18 09:30:00-05:00           NaN
2021-02-18 10:30:00-05:00    128.119904
2021-02-18 11:30:00-05:00    128.675705
2021-02-18 12:30:00-05:00    128.620102
2021-02-18 13:30:00-05:00    128.994995
                                ...
2024-01-11 11:30:00-05:00    184.190002
2024-01-11 12:30:00-05:00    184.050003
2024-01-11 13:30:00-05:00    184.654999
2024-01-11 14:30:00-05:00    185.514999
2024-01-11 15:30:00-05:00    185.369995
Name: Close, Length: 5094, dtype: float64
temp_change_of_price = ((df1['Close'] - df1['Close'].shift(1)) / df1['Close'].shift(1))
temp_change_of_price
Datetime
2021-02-18 09:30:00-05:00             NaN
2021-02-18 10:30:00-05:00    4.338135e-03
2021-02-18 11:30:00-05:00   -4.321175e-04
2021-02-18 12:30:00-05:00    2.914732e-03
2021-02-18 13:30:00-05:00   -2.209298e-03
                                 ...
2024-01-11 11:30:00-05:00   -7.600814e-04
2024-01-11 12:30:00-05:00    3.287127e-03
2024-01-11 13:30:00-05:00    4.657337e-03
2024-01-11 14:30:00-05:00   -7.816310e-04
2024-01-11 15:30:00-05:00    5.762072e-07
Name: Close, Length: 5094, dtype: float64
df1['% Change in Close Price'] = temp_change_of_price
df1['% Change in Close Price']
Datetime
2021-02-18 09:30:00-05:00             NaN
2021-02-18 10:30:00-05:00    4.338135e-03
2021-02-18 11:30:00-05:00   -4.321175e-04
2021-02-18 12:30:00-05:00    2.914732e-03
2021-02-18 13:30:00-05:00   -2.209298e-03
                                 ...
2024-01-11 11:30:00-05:00   -7.600814e-04
2024-01-11 12:30:00-05:00    3.287127e-03
2024-01-11 13:30:00-05:00    4.657337e-03
2024-01-11 14:30:00-05:00   -7.816310e-04
2024-01-11 15:30:00-05:00    5.762072e-07
Name: % Change in Close Price, Length: 5094, dtype: float64
# Initializing the dataframe of 2 stocks
df1 = tech_companies[0][1]
df2 = tech_companies[4][1]

# Create the % Change in Close price columns
df1['% Change in Close Price'] = None
df2['% Change in Close Price'] = None

# Using the change in price (current) and price (previous interval) and divided by the price (previous)
df1['% Change in Close Price'] = (df1['Close'] - df1['Close'].shift(1))/ df1['Close'].shift(1)
df2['% Change in Close Price'] = (df2['Close'] - df2['Close'].shift(1))/ df2['Close'].shift(1)

display(df1[['Close', '% Change in Close Price']])
display(df2[['Close', '% Change in Close Price']])
# Removing top row as it is not needed and used.
df1 = df1[1:]
df2 = df2[1:]

display(df1[['Close', '% Change in Close Price']])
display(df2[['Close', '% Change in Close Price']])

Plotting both the % Change in value of 2 stocks

plt.figure(figsize=(20, 8))
plt.plot(df1.index, df1['% Change in Close Price'],label=tech_companies[0][0])
plt.plot(df1.index, df2['% Change in Close Price'],label=tech_companies[4][0])
plt.legend()
Visualization 2

As compared to using only price, it can now be fairly compared. The spikes can be seen as large movement and these are the spikes that we would wish to identify. We can try to identify characteristics of such spikes, or even just above average returns. It also now resemble a lot like noise or random data with a very small hints of trends. Once the data is in this form, there are many ways we could further process this as well.

Frequency Distribution

When absolute price is plotted in a frequency distribution, it resemble a crown, because there are average prices that the stock will range between until some large price movement, shifting the average to another price level. Hence, it is like seeing multiple different normal distribution throughout.

plt.figure(figsize=(20, 7))

plt.hist(df1['Close'],bins=100,alpha=0.7,label=tech_companies[0][0])
plt.hist(df2['Close'],bins=100,alpha=0.7,label=tech_companies[4][0])
plt.legend()
plt.show()
Visualization 3
plt.figure(figsize=(20, 7))

plt.hist(df1['% Change in Close Price'],bins=100,label=tech_companies[0][0])
plt.hist(df2['% Change in Close Price'],bins=100,alpha=0.7,label=tech_companies[4][0])
plt.axvline(x = 0, linestyle = '-', color ='darkblue')
plt.legend()
plt.show()
Visualization 4
plt.figure(figsize=(20, 7))

plt.hist(df1['% Change in Close Price'],bins=500,label=tech_companies[0][0])
plt.hist(df2['% Change in Close Price'],bins=500,alpha=0.7,label=tech_companies[4][0])
plt.axvline(x = 0, linestyle = '-', color ='darkblue')
plt.legend()
plt.show()
Visualization 5

By using percentage change in price and plotting it in a frequency distribution, we can see that it resembles very much like a normal distribution. In a single glance we can see which stock has more price movement over the other. We can see if it is more skewed to downward price movements or positive movements. Exploration on using methods and models that can take advantage of the normal distribution would be advantageous but it will not be what we will do for this project.

Looking for patterns in % Change in Price, with % Change in Price

Next we will explore some patterns of the percentage change in price to see when it is going to have a positive or negative change. This is achieved by having a window size and desired return. The window size is how far back we want to look to identify a pattern and the desired return will be used as a threshold to filter on the minimum percentage change we would like to see. For example if we have a window size of three and desired return of five percent we will be looping through the percentage change. For each instance of five percent or more we will look back at the last 10 hours of percentage change.

Using the window size of 3 and desired return of 5%, we will use the example list below.

{1%,2%,6%,−3%,2%,4%,7%,1%,−2%,2%,3%,9%} \{1\%, 2\%, 6\%, -3\%, 2\%, 4\%, 7\%, 1\%, -2\%, 2\%, 3\%, 9\% \}
  1. It will loop through each value. The first value in the list that satisfies the desired return of five percent is the third element. Since we have a window size of three we cannot store the first instance. We continue looping until the next value which is the seventh element.
{1%,2%‾,6%,−3%,2%,4%,7%,1%,−2%,2%,3%,9%}{1%,2%,6%,−3%,2%,4%‾,7%,1%,−2%,2%,3%,9%} \{\underline{1\%, 2\%}, {\LARGE 6\%}, -3\%, 2\%, 4\%, 7\%, 1\%, -2\%, 2\%, 3\%, 9\% \} \\ \{1\%, 2\%, 6\%, \underline{-3\%, 2\%, 4\%}, {\LARGE 7\%}, 1\%, -2\%, 2\%, 3\%, 9\% \}
  1. Store the last 3 values before 7%. {−3%,2%,4%}\{-3\%, 2\%, 4\% \}

  2. Then we continue the loop for the next value that satisfies the condition. The next one is the last value. We will then store the last three values.

{1%,2%,6%,−3%,2%,4%,7%,1%,−2%,2%,3%‾,9%} \{1\%, 2\%, 6\%, -3\%, 2\%, 4\%, 7\%, 1\%, \underline{-2\%, 2\%, 3\%}, {\LARGE 9\%} \}
  1. Which will give us the final list of
{{−3%,2%,4%},{−2%,2%,3%}} \{ \{-3\%, 2\%, 4\% \}, \\ \{-2\%, 2\%, 3\% \} \}
  1. Analyze the list. Given this toy example, we can see that a negative change, then a 2 percent change with a 3-4 percent change may lead to a high percent change.

We will explore using a 48 hour window with a desired return of three percent. This is a good number if we would like to have a higher chance of making a return. As shown in the frequency distribution above the higher the return the less likely it will occur. Furthermore if we were to use such a high return there will be too little data to create an average pattern.

# List to store the patterns found
patterns_of_higher_return = []
changes_in_price = df1['% Change in Close Price'].to_list()

# Setting window size and desired return
window_size = 48
desired_returns = 0.03

# Loop the column to find the those that have equal or more than desired returns
for row_no, returns in enumerate(changes_in_price):
    if row_no <= window_size:
        continue
    else:
        if returns >= desired_returns:
            patterns_of_higher_return.append(changes_in_price[row_no-window_size-1:row_no])

# List to store average pattern
average_pattern_of_higher_return = []

# Loop through a list from 0 to window size, indicating the value on the x axis
for no in range(window_size):
    temp = 0

    # Loop through the list of patterns and store all the value in the given x axis
    for pattern in patterns_of_higher_return:
        temp += pattern[no]

    # Get the average of the values collected in the given x axis
    average_pattern_of_higher_return.append(temp/len(patterns_of_higher_return))

# Set chart size
plt.figure(figsize=(15, 9))

# Plot the average
for i in patterns_of_higher_return:
    plt.plot(i,'C0',alpha=0.1)

# Plot the patterns found
plt.plot(average_pattern_of_higher_return,'darkblue',alpha=0.3,label='Average')
plt.scatter([x for x in range(window_size)],average_pattern_of_higher_return,color='darkblue',alpha=0.3,label='Average')
plt.xticks(np.arange(0, window_size, 2.0))
plt.legend()
plt.show()
Visualization 6

Using the same thing as above, but we look into patterns of losses, are there also patterns that can indicate a negative percentage change?

# List to store the patterns found
patterns_of_higher_losses = []
prices = df1['% Change in Close Price'].to_list()

# Setting window size and desired return
window_size = 48
desired_returns = -0.03

# Loop the column to find the those that have equal or more than desired returns
for row_no, returns in enumerate(prices):
    if row_no <= window_size:
        continue
    else:
        if returns <= desired_returns:
            patterns_of_higher_losses.append(prices[row_no-window_size-1:row_no])

# List to store average pattern
average_pattern_of_higher_losses = []

# Loop through a list from 0 to window size, indicating the value on the x axis
for no in range(window_size):
    temp = 0

    # Loop through the list of patterns and store all the value in the given x axis
    for pattern in patterns_of_higher_losses:
        temp += pattern[no]

    # Get the average of the values collected in the given x axis
    average_pattern_of_higher_losses.append(temp/len(patterns_of_higher_losses))

# Set chart size
plt.figure(figsize=(15,9))

# Plot the average
for i in patterns_of_higher_losses:
    plt.plot(i,'C0',alpha=0.1)

# Plot the patterns found
plt.plot(average_pattern_of_higher_losses,'darkblue',alpha=0.3,label='Average')
plt.scatter([x for x in range(window_size)],average_pattern_of_higher_losses,color='darkblue',alpha=0.3,label='Average')
plt.xticks(np.arange(0, window_size, 2.0))
plt.legend()
plt.show()
Visualization 7

It seems like they do have different areas of peaks and trough. This means that these differences might be able to be used as early indicators for positive or negative change. We can have a clearer view by stacking both of them together.

# Set the chart size
plt.figure(figsize=(15,9))

# Plot both the averages
plt.plot(average_pattern_of_higher_losses)
plt.plot(average_pattern_of_higher_return)
plt.scatter([x for x in range(window_size)],average_pattern_of_higher_losses,color='darkblue',alpha=0.5,label='Average')
plt.scatter([x for x in range(window_size)],average_pattern_of_higher_return,color='darkorange',alpha=0.5,label='Average')
plt.xticks(np.arange(0, window_size, 2.0))
plt.show()
Visualization 8

From here we might even be able to use some form of image recognition models or even use dynamic time warping to match the average recurring pattern found here to create a trading strategy. The difference between the pattern identifying a positive percentage change and a negative change seem to have some distinction. It can be seen that there are several areas of opposite values which can be used as an indicator that the next predicted value would be a positive or negative one.

Using Scaled Price

Using the same codes as above, we implement and explore what if we were to use price instead. We will need to scaled the pattern to values between 0 - 1, so that we can compare the different price movement occurring before the desired return. A 5% return but the pattern before it could [1,2,3] and [100,110,113]. If we do not scale the lists individually, we cannot see any meaningful difference among them.

patterns_of_higher_return = []
prices = df1['Close']
return_percentage = df1['% Change in Close Price'].to_list()

window_size = 48
desired_returns = 0.03

for row_no, returns in enumerate(return_percentage):
    if row_no <= window_size:
        continue
    else:
        if returns >= desired_returns:
            temp = prices[row_no-window_size:row_no]
            maxtemp = temp.max()
            mintemp = temp.min()

            patterns_of_higher_return.append(
                [((x-mintemp)/(maxtemp - mintemp))*1 for x in temp]
            )

average_pattern_of_higher_return = []

for no in range(window_size):
    temp = 0
    for pattern in patterns_of_higher_losses:
        temp += pattern[no]
    average_pattern_of_higher_losses.append(temp/len(patterns_of_higher_losses))

plt.figure(figsize=(20, 7))
x_values = [x for x in range(window_size)]

for i in patterns_of_higher_return:
    plt.plot(x_values,i,'C1',alpha=0.2)
plt.show()
Visualization 9

By using normal price, even when it is scaled, it is hard to further identify some form of repeatable pattern.

We will explore this again in the later part of the project, after extracting datetime features.

Adding % Change in Close Price for all stocks

change_in_close_price_str = '% Change in Close Price'
reference_column = 'Close'

print('Before adding the column:')
print('Number of Column: ', [len(tech_company[1].columns) for tech_company in tech_companies])

for tech_company in tech_companies:

    temp = tech_company[1]
    temp[change_in_close_price_str] = (temp[reference_column] - temp[reference_column].shift(1))/temp[reference_column].shift(1)
    tech_company[1][change_in_close_price_str] = temp[change_in_close_price_str]

print()

print('After adding the column:')
print('Number of Column: ', [len(tech_company[1].columns) for tech_company in tech_companies])
Before adding the column:
Number of Column:  [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]

After adding the column:
Number of Column:  [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
Open Price
change_in_close_price_str = '% Change in Open Price'
reference_column = 'Open'

print('Before adding the column:')
print('Number of Column: ', [len(tech_company[1].columns) for tech_company in tech_companies])

for tech_company in tech_companies:

    temp = tech_company[1]
    temp[change_in_close_price_str] = (temp[reference_column] - temp[reference_column].shift(1))/temp[reference_column].shift(1)
    tech_company[1][change_in_close_price_str] = temp[change_in_close_price_str]

print()

print('After adding the column:')
print('Number of Column: ', [len(tech_company[1].columns) for tech_company in tech_companies])
Before adding the column:
Number of Column:  [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]

After adding the column:
Number of Column:  [9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
Low Price
change_in_close_price_str = '% Change in Low Price'
reference_column = 'Low'

print('Before adding the column:')
print('Number of Column: ', [len(tech_company[1].columns) for tech_company in tech_companies])

for tech_company in tech_companies:

    temp = tech_company[1]
    temp[change_in_close_price_str] = (temp[reference_column] - temp[reference_column].shift(1))/temp[reference_column].shift(1)
    tech_company[1][change_in_close_price_str] = temp[change_in_close_price_str]

print()

print('After adding the column:')
print('Number of Column: ', [len(tech_company[1].columns) for tech_company in tech_companies])
Before adding the column:
Number of Column:  [9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

After adding the column:
Number of Column:  [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
High Price
change_in_close_price_str = '% Change in High Price'
reference_column = 'High'

print('Before adding the column:')
print('Number of Column: ', [len(tech_company[1].columns) for tech_company in tech_companies])

for tech_company in tech_companies:

    temp = tech_company[1]
    temp[change_in_close_price_str] = (temp[reference_column] - temp[reference_column].shift(1))/temp[reference_column].shift(1)
    tech_company[1][change_in_close_price_str] = temp[change_in_close_price_str]

print()

print('After adding the column:')
print('Number of Column: ', [len(tech_company[1].columns) for tech_company in tech_companies])
Before adding the column:
Number of Column:  [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

After adding the column:
Number of Column:  [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11]
tech_companies[0][1]
                                 Open        High         Low       Close  \
Datetime
2021-02-18 09:30:00-05:00  129.199997  129.679993  127.410004  128.119904
2021-02-18 10:30:00-05:00  128.100006  129.104996  127.830002  128.675705
2021-02-18 11:30:00-05:00  128.669998  128.710007  127.900002  128.620102
2021-02-18 12:30:00-05:00  128.625000  129.070007  128.449997  128.994995
2021-02-18 13:30:00-05:00  129.000000  129.050003  128.449997  128.710007
...                               ...         ...         ...         ...
2024-01-11 11:30:00-05:00  184.184998  184.449997  183.835007  184.050003
2024-01-11 12:30:00-05:00  184.074997  184.777206  184.074997  184.654999
2024-01-11 13:30:00-05:00  184.669998  185.539993  184.559998  185.514999
2024-01-11 14:30:00-05:00  185.520004  185.699997  184.755005  185.369995
2024-01-11 15:30:00-05:00  185.369995  185.559998  185.350006  185.370102

                             Volume  Dividends  Stock Splits  \
Datetime
2021-02-18 09:30:00-05:00  28938326        0.0           0.0
2021-02-18 10:30:00-05:00  13477687        0.0           0.0
2021-02-18 11:30:00-05:00  12770713        0.0           0.0
2021-02-18 12:30:00-05:00   8465690        0.0           0.0
2021-02-18 13:30:00-05:00   8516504        0.0           0.0
...                             ...        ...           ...
2024-01-11 11:30:00-05:00   3849444        0.0           0.0
2024-01-11 12:30:00-05:00   4000614        0.0           0.0
2024-01-11 13:30:00-05:00   3924789        0.0           0.0
2024-01-11 14:30:00-05:00   4502053        0.0           0.0
2024-01-11 15:30:00-05:00   1503923        0.0           0.0

                           % Change in Close Price  % Change in Open Price  \
Datetime
2021-02-18 09:30:00-05:00                      NaN                     NaN
2021-02-18 10:30:00-05:00             4.338135e-03               -0.008514
2021-02-18 11:30:00-05:00            -4.321175e-04                0.004450
2021-02-18 12:30:00-05:00             2.914732e-03               -0.000350
2021-02-18 13:30:00-05:00            -2.209298e-03                0.002915
...                                            ...                     ...
2024-01-11 11:30:00-05:00            -7.600814e-04               -0.000976
2024-01-11 12:30:00-05:00             3.287127e-03               -0.000597
2024-01-11 13:30:00-05:00             4.657337e-03                0.003232
2024-01-11 14:30:00-05:00            -7.816310e-04                0.004603
2024-01-11 15:30:00-05:00             5.762072e-07               -0.000809

                           % Change in Low Price  % Change in High Price
Datetime
2021-02-18 09:30:00-05:00                    NaN                     NaN
2021-02-18 10:30:00-05:00               0.003296               -0.004434
2021-02-18 11:30:00-05:00               0.000548               -0.003059
2021-02-18 12:30:00-05:00               0.004300                0.002797
2021-02-18 13:30:00-05:00               0.000000               -0.000155
...                                          ...                     ...
2024-01-11 11:30:00-05:00               0.000000               -0.002326
2024-01-11 12:30:00-05:00               0.001305                0.001774
2024-01-11 13:30:00-05:00               0.002635                0.004128
2024-01-11 14:30:00-05:00               0.001057                0.000862
2024-01-11 15:30:00-05:00               0.003220               -0.000754

[5094 rows x 11 columns]

Choosing Companies

Due to time and scope constraints we will be selecting the top three companies to further explore and perform implementation in the later parts of the coursework. The condition for picking out three companies is that these companies will need to provide the most opportunity to make a return. Therefore we look for companies not only with large price movements but also with high frequency of those movements. We need to establish that we will only be looking into buying shares or entering a long position. It is understood that money could also be made when the market moves downward. However the scope will be finding the highest absolute sum of percentage change in price as any direction is an opportunity to generate returns. We will limit our analysis to making returns when the market is going up.

Because our data is already processed to represent price movement there are several ways we can approach this.

  • The sum of percentage change in price to see which company would have the highest net positive movement.
  • The number of positive percentage changes in price to see which company would have the most frequent number of positive movements.
  • The sum of positive percentage change in price to see which company would have the highest positive movement.
  • The sum of the differences between high and low price.
  • The sum of the differences between high and close price.
  • The sum of the differences between low and close price.

We will be settling on using total positive percentage change.

We will also be comparing this to the approach of using standard deviation of the stock price. Standard deviation usually represents the volatility of a financial instrument and also risk. Therefore a stock with a higher standard deviation is what we would like to pick.

tech_companies[0][1]['% Change in Close Price']
Datetime
2021-02-18 09:30:00-05:00             NaN
2021-02-18 10:30:00-05:00    4.338135e-03
2021-02-18 11:30:00-05:00   -4.321175e-04
2021-02-18 12:30:00-05:00    2.914732e-03
2021-02-18 13:30:00-05:00   -2.209298e-03
                                 ...
2024-01-11 11:30:00-05:00   -7.600814e-04
2024-01-11 12:30:00-05:00    3.287127e-03
2024-01-11 13:30:00-05:00    4.657337e-03
2024-01-11 14:30:00-05:00   -7.816310e-04
2024-01-11 15:30:00-05:00    5.762072e-07
Name: % Change in Close Price, Length: 5094, dtype: float64
tech_companies_ticker = [tech_company[0] for tech_company in tech_companies]
companies_df = pd.DataFrame(index = tech_companies_ticker)
companies_df
Empty DataFrame
Columns: []
Index: [AAPL, ADI, ADSK, AMD, ANSS, ASML, AVGO, CDW, CRWD, CTSH, DASH, DDOG, GOOG, GOOGL, KLAC, LRCX, MDB, META, MSFT, NVDA, NXPI, PANW, PDD, QCOM, TXN]

Sum of % Positive Close Price Changes

sum_of_positive_changes_str = 'Sum of Positive Close Price Changes'
companies_df[sum_of_positive_changes_str] = None

for tech_company in tech_companies:
        companies_df.loc[tech_company[0], sum_of_positive_changes_str] = sum(
            list(filter(lambda x: x > 0, tech_company[1]['% Change in Close Price'][1:]))
        )

companies_df
      Sum of Positive Close Price Changes
AAPL                            10.784127
ADI                             12.092185
ADSK                            13.781926
AMD                             19.563814
ANSS                            12.933404
ASML                            14.493156
AVGO                            12.797733
CDW                             10.444991
CRWD                            20.581841
CTSH                             9.911096
DASH                            24.763219
DDOG                            23.552644
GOOG                            11.539979
GOOGL                           11.683878
KLAC                            15.823952
LRCX                             16.54264
MDB                             25.052964
META                             15.01839
MSFT                            10.795445
NVDA                            20.052809
NXPI                             15.32139
PANW                             15.20349
PDD                             24.920476
QCOM                            13.922156
TXN                             10.838171

Sum of % Change in Price

sum_of_changes_str = 'Sum of Close Price Changes'
companies_df[sum_of_changes_str] = None

for tech_company in tech_companies:
        companies_df.loc[tech_company[0], sum_of_changes_str] = sum(tech_company[1]['% Change in Close Price'][1:])

companies_df
      Sum of Positive Close Price Changes Sum of Close Price Changes
AAPL                            10.784127                   0.475806
ADI                             12.092185                   0.316209
ADSK                            13.781926                   0.002792
AMD                             19.563814                   0.867206
ANSS                            12.933404                   0.087076
ASML                            14.493156                   0.437682
AVGO                            12.797733                   0.975791
CDW                             10.444991                    0.43849
CRWD                            20.581841                   0.647721
CTSH                             9.911096                   0.105226
DASH                            24.763219                   0.055267
DDOG                            23.552644                   0.787556
GOOG                            11.539979                   0.442579
GOOGL                           11.683878                   0.439984
KLAC                            15.823952                   0.775045
LRCX                             16.54264                   0.544324
MDB                             25.052964                   0.714716
META                             15.01839                   0.625347
MSFT                            10.795445                    0.57566
NVDA                            20.052809                   1.695652
NXPI                             15.32139                   0.346168
PANW                             15.20349                   1.153684
PDD                             24.920476                   0.707896
QCOM                            13.922156                     0.1617
TXN                             10.838171                   0.051368

Average Gap between Low and High

average_gap_str = 'Average Gap (High - Low)'
companies_df[average_gap_str] = None

for tech_company in tech_companies:
        companies_df.loc[tech_company[0], average_gap_str] = sum((tech_company[1]['High'][1:] - tech_company[1]['Low'][1:]) / tech_company[1]['Low'][1:] ) / len(tech_company[1][1:])

companies_df
      Sum of Positive Close Price Changes Sum of Close Price Changes  \
AAPL                            10.784127                   0.475806
ADI                             12.092185                   0.316209
ADSK                            13.781926                   0.002792
AMD                             19.563814                   0.867206
ANSS                            12.933404                   0.087076
ASML                            14.493156                   0.437682
AVGO                            12.797733                   0.975791
CDW                             10.444991                    0.43849
CRWD                            20.581841                   0.647721
CTSH                             9.911096                   0.105226
DASH                            24.763219                   0.055267
DDOG                            23.552644                   0.787556
GOOG                            11.539979                   0.442579
GOOGL                           11.683878                   0.439984
KLAC                            15.823952                   0.775045
LRCX                             16.54264                   0.544324
MDB                             25.052964                   0.714716
META                             15.01839                   0.625347
MSFT                            10.795445                    0.57566
NVDA                            20.052809                   1.695652
NXPI                             15.32139                   0.346168
PANW                             15.20349                   1.153684
PDD                             24.920476                   0.707896
QCOM                            13.922156                     0.1617
TXN                             10.838171                   0.051368

      Average Gap (High - Low)
AAPL                  0.007524
ADI                   0.008492
ADSK                  0.009741
AMD                   0.013881
ANSS                  0.008775
ASML                  0.008735
AVGO                  0.008587
CDW                   0.007192
CRWD                   0.01487
CTSH                  0.006904
DASH                  0.018389
DDOG                  0.016879
GOOG                  0.007889
GOOGL                 0.008093
KLAC                  0.010719
LRCX                  0.011346
MDB                   0.017585
META                  0.010453
MSFT                  0.007331
NVDA                  0.013883
NXPI                  0.010598
PANW                  0.010641
PDD                   0.016798
QCOM                  0.009806
TXN                   0.007737

Standard Deviation

columns = tech_companies[0][1].columns
index = [tech_company[0] for tech_company in tech_companies]
companies_standard_deviations = pd.DataFrame(columns=columns, index=index)

for tech_company in tech_companies:
    companies_standard_deviations.loc[tech_company[0]] = tech_company[1].std()

companies_standard_deviations = companies_standard_deviations.sort_values(['% Change in Close Price'], ascending=False).drop(columns=['Volume', 'Dividends', 'Stock Splits'])
companies_standard_deviations
             Open        High         Low       Close % Change in Close Price  \
PDD     32.080006    32.23114   31.915326   32.064646                0.019483
MDB     98.760671   99.268854   98.228868   98.727557                 0.01708
DASH    50.425317   50.804168   50.037054   50.409624                 0.01612
DDOG    30.319219   30.608159    30.01792    30.30877                0.015823
CRWD    47.528904   47.749702   47.289229   47.527375                0.013231
NVDA   114.239468  114.743463   113.70487   114.22924                0.012289
AMD     22.272153   22.431132   22.100393   22.267683                0.011628
META    80.131509   80.260925   79.970756   80.126257                 0.01079
LRCX    96.648191   96.664046   96.555595   96.629964                0.010065
PANW    45.515648   45.717731   45.370409   45.561569                0.009816
KLAC    70.415159   70.490377   70.373619   70.452785                0.009329
ASML   101.669561  101.570227  101.722034  101.663423                0.009292
NXPI     21.07974   21.067232   21.073547   21.079791                0.009153
ADSK    42.976498   43.041587   42.898398   42.968848                0.008853
QCOM    19.359434   19.527088   19.197413     19.3657                0.008699
ANSS    48.050243   48.119773   47.967689   48.050613                0.008109
AVGO     165.6667  166.459976  165.076903   165.81627                0.007442
GOOGL   16.905788   16.896378   16.890284   16.892939                0.007365
GOOG    17.025381   17.013128   17.015443   17.014153                0.007266
ADI     13.984689   13.938348    14.03361    13.99286                0.007145
MSFT    39.978472    40.00587   39.961993   39.978433                0.006599
CTSH     8.970093    8.996426    8.937564    8.970886                0.006481
AAPL    19.769706   19.766775   19.782729   19.771827                0.006468
TXN      12.61352    12.58614   12.621553   12.616761                0.006416
CDW     16.664114   16.634817   16.694416   16.665404                0.006257

      % Change in Open Price % Change in Low Price % Change in High Price
PDD                 0.018998              0.019522               0.020066
MDB                 0.017038              0.016511               0.017406
DASH                0.016489              0.016242               0.016296
DDOG                0.015958              0.015543               0.015966
CRWD                0.013339              0.013352               0.013045
NVDA                 0.01249              0.011933               0.011902
AMD                 0.011748              0.011282                0.01135
META                0.010954              0.010853               0.010767
LRCX                0.009959              0.009788               0.009623
PANW                0.009765              0.009607               0.009957
KLAC                0.009507              0.009162               0.009164
ASML                0.009398              0.009049               0.008881
NXPI                0.009172              0.008935               0.008639
ADSK                0.008846               0.00886               0.008613
QCOM                0.008886              0.008561               0.008529
ANSS                0.008587              0.008109               0.008235
AVGO                0.007365              0.007084               0.007253
GOOGL               0.007362              0.007163               0.007374
GOOG                0.007241              0.007052               0.006969
ADI                 0.007332              0.007182                0.00674
MSFT                0.006522              0.006275               0.006274
CTSH                0.006628               0.00643               0.006285
AAPL                0.006413              0.006331               0.006053
TXN                 0.006685              0.006363               0.005972
CDW                 0.006454              0.006477               0.005878

columns = companies_standard_deviations.columns[-4:]
companies_df[columns] = companies_standard_deviations[columns]
companies_df
      Sum of Positive Close Price Changes Sum of Close Price Changes  \
AAPL                            10.784127                   0.475806
ADI                             12.092185                   0.316209
ADSK                            13.781926                   0.002792
AMD                             19.563814                   0.867206
ANSS                            12.933404                   0.087076
ASML                            14.493156                   0.437682
AVGO                            12.797733                   0.975791
CDW                             10.444991                    0.43849
CRWD                            20.581841                   0.647721
CTSH                             9.911096                   0.105226
DASH                            24.763219                   0.055267
DDOG                            23.552644                   0.787556
GOOG                            11.539979                   0.442579
GOOGL                           11.683878                   0.439984
KLAC                            15.823952                   0.775045
LRCX                             16.54264                   0.544324
MDB                             25.052964                   0.714716
META                             15.01839                   0.625347
MSFT                            10.795445                    0.57566
NVDA                            20.052809                   1.695652
NXPI                             15.32139                   0.346168
PANW                             15.20349                   1.153684
PDD                             24.920476                   0.707896
QCOM                            13.922156                     0.1617
TXN                             10.838171                   0.051368

      Average Gap (High - Low) % Change in Close Price % Change in Open Price  \
AAPL                  0.007524                0.006468               0.006413
ADI                   0.008492                0.007145               0.007332
ADSK                  0.009741                0.008853               0.008846
AMD                   0.013881                0.011628               0.011748
ANSS                  0.008775                0.008109               0.008587
ASML                  0.008735                0.009292               0.009398
AVGO                  0.008587                0.007442               0.007365
CDW                   0.007192                0.006257               0.006454
CRWD                   0.01487                0.013231               0.013339
CTSH                  0.006904                0.006481               0.006628
DASH                  0.018389                 0.01612               0.016489
DDOG                  0.016879                0.015823               0.015958
GOOG                  0.007889                0.007266               0.007241
GOOGL                 0.008093                0.007365               0.007362
KLAC                  0.010719                0.009329               0.009507
LRCX                  0.011346                0.010065               0.009959
MDB                   0.017585                 0.01708               0.017038
META                  0.010453                 0.01079               0.010954
MSFT                  0.007331                0.006599               0.006522
NVDA                  0.013883                0.012289                0.01249
NXPI                  0.010598                0.009153               0.009172
PANW                  0.010641                0.009816               0.009765
PDD                   0.016798                0.019483               0.018998
QCOM                  0.009806                0.008699               0.008886
TXN                   0.007737                0.006416               0.006685

      % Change in Low Price % Change in High Price
AAPL               0.006331               0.006053
ADI                0.007182                0.00674
ADSK                0.00886               0.008613
AMD                0.011282                0.01135
ANSS               0.008109               0.008235
ASML               0.009049               0.008881
AVGO               0.007084               0.007253
CDW                0.006477               0.005878
CRWD               0.013352               0.013045
CTSH                0.00643               0.006285
DASH               0.016242               0.016296
DDOG               0.015543               0.015966
GOOG               0.007052               0.006969
GOOGL              0.007163               0.007374
KLAC               0.009162               0.009164
LRCX               0.009788               0.009623
MDB                0.016511               0.017406
META               0.010853               0.010767
MSFT               0.006275               0.006274
NVDA               0.011933               0.011902
NXPI               0.008935               0.008639
PANW               0.009607               0.009957
PDD                0.019522               0.020066
QCOM               0.008561               0.008529
TXN                0.006363               0.005972

Any of the columns available in the DataFrame are all justifiable to be used to decide. But we will go through each of them to see which are the companies that have more price movement that others.

Comparing the columns

We will create a column to see how many points can the companies collect. The top 3 highest scorer will be chosen to be used, moving forward

companies_df['Score'] = 0
companies_df
      Sum of Positive Close Price Changes Sum of Close Price Changes  \
AAPL                            10.784127                   0.475806
ADI                             12.092185                   0.316209
ADSK                            13.781926                   0.002792
AMD                             19.563814                   0.867206
ANSS                            12.933404                   0.087076
ASML                            14.493156                   0.437682
AVGO                            12.797733                   0.975791
CDW                             10.444991                    0.43849
CRWD                            20.581841                   0.647721
CTSH                             9.911096                   0.105226
DASH                            24.763219                   0.055267
DDOG                            23.552644                   0.787556
GOOG                            11.539979                   0.442579
GOOGL                           11.683878                   0.439984
KLAC                            15.823952                   0.775045
LRCX                             16.54264                   0.544324
MDB                             25.052964                   0.714716
META                             15.01839                   0.625347
MSFT                            10.795445                    0.57566
NVDA                            20.052809                   1.695652
NXPI                             15.32139                   0.346168
PANW                             15.20349                   1.153684
PDD                             24.920476                   0.707896
QCOM                            13.922156                     0.1617
TXN                             10.838171                   0.051368

      Average Gap (High - Low) % Change in Close Price % Change in Open Price  \
AAPL                  0.007524                0.006468               0.006413
ADI                   0.008492                0.007145               0.007332
ADSK                  0.009741                0.008853               0.008846
AMD                   0.013881                0.011628               0.011748
ANSS                  0.008775                0.008109               0.008587
ASML                  0.008735                0.009292               0.009398
AVGO                  0.008587                0.007442               0.007365
CDW                   0.007192                0.006257               0.006454
CRWD                   0.01487                0.013231               0.013339
CTSH                  0.006904                0.006481               0.006628
DASH                  0.018389                 0.01612               0.016489
DDOG                  0.016879                0.015823               0.015958
GOOG                  0.007889                0.007266               0.007241
GOOGL                 0.008093                0.007365               0.007362
KLAC                  0.010719                0.009329               0.009507
LRCX                  0.011346                0.010065               0.009959
MDB                   0.017585                 0.01708               0.017038
META                  0.010453                 0.01079               0.010954
MSFT                  0.007331                0.006599               0.006522
NVDA                  0.013883                0.012289                0.01249
NXPI                  0.010598                0.009153               0.009172
PANW                  0.010641                0.009816               0.009765
PDD                   0.016798                0.019483               0.018998
QCOM                  0.009806                0.008699               0.008886
TXN                   0.007737                0.006416               0.006685

      % Change in Low Price % Change in High Price  Score
AAPL               0.006331               0.006053      0
ADI                0.007182                0.00674      0
ADSK                0.00886               0.008613      0
AMD                0.011282                0.01135      0
ANSS               0.008109               0.008235      0
ASML               0.009049               0.008881      0
AVGO               0.007084               0.007253      0
CDW                0.006477               0.005878      0
CRWD               0.013352               0.013045      0
CTSH                0.00643               0.006285      0
DASH               0.016242               0.016296      0
DDOG               0.015543               0.015966      0
GOOG               0.007052               0.006969      0
GOOGL              0.007163               0.007374      0
KLAC               0.009162               0.009164      0
LRCX               0.009788               0.009623      0
MDB                0.016511               0.017406      0
META               0.010853               0.010767      0
MSFT               0.006275               0.006274      0
NVDA               0.011933               0.011902      0
NXPI               0.008935               0.008639      0
PANW               0.009607               0.009957      0
PDD                0.019522               0.020066      0
QCOM               0.008561               0.008529      0
TXN                0.006363               0.005972      0

companies_df.columns[-1]
'Score'
# Looping through all the columns in the DataFrame
for column in companies_df.columns:

    # Sorting them in descending order
    companies_df = companies_df.sort_values(column,ascending=False)

    # Giving a point to the top 3 companies
    companies_df.iloc[:3, -1] += 1

companies_df
      Sum of Positive Close Price Changes Sum of Close Price Changes  \
DASH                            24.763219                   0.055267
MDB                             25.052964                   0.714716
PDD                             24.920476                   0.707896
DDOG                            23.552644                   0.787556
NVDA                            20.052809                   1.695652
PANW                             15.20349                   1.153684
AVGO                            12.797733                   0.975791
ANSS                            12.933404                   0.087076
TXN                             10.838171                   0.051368
AAPL                            10.784127                   0.475806
MSFT                            10.795445                    0.57566
CTSH                             9.911096                   0.105226
ADI                             12.092185                   0.316209
GOOG                            11.539979                   0.442579
GOOGL                           11.683878                   0.439984
NXPI                             15.32139                   0.346168
QCOM                            13.922156                     0.1617
ADSK                            13.781926                   0.002792
ASML                            14.493156                   0.437682
KLAC                            15.823952                   0.775045
LRCX                             16.54264                   0.544324
META                             15.01839                   0.625347
AMD                             19.563814                   0.867206
CRWD                            20.581841                   0.647721
CDW                             10.444991                    0.43849

      Average Gap (High - Low) % Change in Close Price % Change in Open Price  \
DASH                  0.018389                 0.01612               0.016489
MDB                   0.017585                 0.01708               0.017038
PDD                   0.016798                0.019483               0.018998
DDOG                  0.016879                0.015823               0.015958
NVDA                  0.013883                0.012289                0.01249
PANW                  0.010641                0.009816               0.009765
AVGO                  0.008587                0.007442               0.007365
ANSS                  0.008775                0.008109               0.008587
TXN                   0.007737                0.006416               0.006685
AAPL                  0.007524                0.006468               0.006413
MSFT                  0.007331                0.006599               0.006522
CTSH                  0.006904                0.006481               0.006628
ADI                   0.008492                0.007145               0.007332
GOOG                  0.007889                0.007266               0.007241
GOOGL                 0.008093                0.007365               0.007362
NXPI                  0.010598                0.009153               0.009172
QCOM                  0.009806                0.008699               0.008886
ADSK                  0.009741                0.008853               0.008846
ASML                  0.008735                0.009292               0.009398
KLAC                  0.010719                0.009329               0.009507
LRCX                  0.011346                0.010065               0.009959
META                  0.010453                 0.01079               0.010954
AMD                   0.013881                0.011628               0.011748
CRWD                   0.01487                0.013231               0.013339
CDW                   0.007192                0.006257               0.006454

      % Change in Low Price % Change in High Price  Score
DASH               0.016242               0.016296      7
MDB                0.016511               0.017406      7
PDD                0.019522               0.020066      6
DDOG               0.015543               0.015966      1
NVDA               0.011933               0.011902      1
PANW               0.009607               0.009957      1
AVGO               0.007084               0.007253      1
ANSS               0.008109               0.008235      0
TXN                0.006363               0.005972      0
AAPL               0.006331               0.006053      0
MSFT               0.006275               0.006274      0
CTSH                0.00643               0.006285      0
ADI                0.007182                0.00674      0
GOOG               0.007052               0.006969      0
GOOGL              0.007163               0.007374      0
NXPI               0.008935               0.008639      0
QCOM               0.008561               0.008529      0
ADSK                0.00886               0.008613      0
ASML               0.009049               0.008881      0
KLAC               0.009162               0.009164      0
LRCX               0.009788               0.009623      0
META               0.010853               0.010767      0
AMD                0.011282                0.01135      0
CRWD               0.013352               0.013045      0
CDW                0.006477               0.005878      0

Based on the results, we can see that ticker 'MDB', 'PDD' and 'DASH' are the winners as they scored the most.

winning_companies = ['MDB', 'PDD', 'DASH']
companies_dict = {}

for tech_company in tech_companies:
    if tech_company[0] in winning_companies:
        companies_dict[tech_company[0]] = tech_company[1]

companies_dict
{'DASH':                                  Open        High         Low       Close  \
 Datetime
 2021-02-18 09:30:00-05:00  194.300003  196.610001  188.990005  189.800003
 2021-02-18 10:30:00-05:00  189.576096  194.800003  188.910004  193.691101
 2021-02-18 11:30:00-05:00  193.770004  196.979996  191.339996  191.554993
 2021-02-18 12:30:00-05:00  191.339996  194.490005  191.169998  193.899994
 2021-02-18 13:30:00-05:00  194.149994  196.440002  193.809998  196.220001
 ...                               ...         ...         ...         ...
 2024-01-11 11:30:00-05:00  102.565002  103.139999  102.400002  102.925003
 2024-01-11 12:30:00-05:00  102.959999  103.959999  102.889999  103.800003
 2024-01-11 13:30:00-05:00  103.779999  104.650002  103.675003  104.625000
 2024-01-11 14:30:00-05:00  104.644997  104.695000  104.065002  104.339996
 2024-01-11 15:30:00-05:00  104.320000  104.330002  104.029999  104.100098

                            Volume  Dividends  Stock Splits  \
 Datetime
 2021-02-18 09:30:00-05:00  410371        0.0           0.0
 2021-02-18 10:30:00-05:00  270042        0.0           0.0
 2021-02-18 11:30:00-05:00  270798        0.0           0.0
 2021-02-18 12:30:00-05:00  185655        0.0           0.0
 2021-02-18 13:30:00-05:00  108898        0.0           0.0
 ...                           ...        ...           ...
 2024-01-11 11:30:00-05:00  226748        0.0           0.0
 2024-01-11 12:30:00-05:00  230721        0.0           0.0
 2024-01-11 13:30:00-05:00  270964        0.0           0.0
 2024-01-11 14:30:00-05:00  247590        0.0           0.0
 2024-01-11 15:30:00-05:00   87951        0.0           0.0

                            % Change in Close Price  % Change in Open Price  \
 Datetime
 2021-02-18 09:30:00-05:00                      NaN                     NaN
 2021-02-18 10:30:00-05:00                 0.020501               -0.024312
 2021-02-18 11:30:00-05:00                -0.011028                0.022123
 2021-02-18 12:30:00-05:00                 0.012242               -0.012541
 2021-02-18 13:30:00-05:00                 0.011965                0.014686
 ...                                            ...                     ...
 2024-01-11 11:30:00-05:00                 0.003559               -0.012088
 2024-01-11 12:30:00-05:00                 0.008501                0.003851
 2024-01-11 13:30:00-05:00                 0.007948                0.007964
 2024-01-11 14:30:00-05:00                -0.002724                0.008335
 2024-01-11 15:30:00-05:00                -0.002299               -0.003106

                            % Change in Low Price  % Change in High Price
 Datetime
 2021-02-18 09:30:00-05:00                    NaN                     NaN
 2021-02-18 10:30:00-05:00              -0.000423               -0.009206
 2021-02-18 11:30:00-05:00               0.012863                0.011191
 2021-02-18 12:30:00-05:00              -0.000888               -0.012641
 2021-02-18 13:30:00-05:00               0.013810                0.010026
 ...                                          ...                     ...
 2024-01-11 11:30:00-05:00              -0.001073               -0.011596
 2024-01-11 12:30:00-05:00               0.004785                0.007950
 2024-01-11 13:30:00-05:00               0.007630                0.006637
 2024-01-11 14:30:00-05:00               0.003762                0.000430
 2024-01-11 15:30:00-05:00              -0.000336               -0.003486

 [5094 rows x 11 columns],
 'MDB':                                  Open        High         Low       Close  \
 Datetime
 2021-02-18 09:30:00-05:00  402.079987  409.820007  397.397491  398.625000
 2021-02-18 10:30:00-05:00  399.190002  407.035004  397.420013  404.570007
 2021-02-18 11:30:00-05:00  405.480011  405.480011  401.940002  403.084991
 2021-02-18 12:30:00-05:00  403.019989  404.109985  398.540009  403.070007
 2021-02-18 13:30:00-05:00  402.799988  403.959991  402.029999  402.489990
 ...                               ...         ...         ...         ...
 2024-01-11 11:30:00-05:00  385.250000  387.010010  383.200012  383.899994
 2024-01-11 12:30:00-05:00  384.160004  390.000000  384.010010  389.619995
 2024-01-11 13:30:00-05:00  389.559998  393.489990  389.006989  392.549988
 2024-01-11 14:30:00-05:00  392.679993  393.129913  391.830109  392.720001
 2024-01-11 15:30:00-05:00  392.829987  393.519989  392.470001  393.494995

                            Volume  Dividends  Stock Splits  \
 Datetime
 2021-02-18 09:30:00-05:00  108932        0.0           0.0
 2021-02-18 10:30:00-05:00   59993        0.0           0.0
 2021-02-18 11:30:00-05:00   41952        0.0           0.0
 2021-02-18 12:30:00-05:00   79363        0.0           0.0
 2021-02-18 13:30:00-05:00   59128        0.0           0.0
 ...                           ...        ...           ...
 2024-01-11 11:30:00-05:00  133670        0.0           0.0
 2024-01-11 12:30:00-05:00   90468        0.0           0.0
 2024-01-11 13:30:00-05:00  112940        0.0           0.0
 2024-01-11 14:30:00-05:00  102012        0.0           0.0
 2024-01-11 15:30:00-05:00   45298        0.0           0.0

                            % Change in Close Price  % Change in Open Price  \
 Datetime
 2021-02-18 09:30:00-05:00                      NaN                     NaN
 2021-02-18 10:30:00-05:00                 0.014914               -0.007188
 2021-02-18 11:30:00-05:00                -0.003671                0.015757
 2021-02-18 12:30:00-05:00                -0.000037               -0.006067
 2021-02-18 13:30:00-05:00                -0.001439               -0.000546
 ...                                            ...                     ...
 2024-01-11 11:30:00-05:00                -0.003375               -0.009691
 2024-01-11 12:30:00-05:00                 0.014900               -0.002829
 2024-01-11 13:30:00-05:00                 0.007520                0.014057
 2024-01-11 14:30:00-05:00                 0.000433                0.008009
 2024-01-11 15:30:00-05:00                 0.001973                0.000382

                            % Change in Low Price  % Change in High Price
 Datetime
 2021-02-18 09:30:00-05:00                    NaN                     NaN
 2021-02-18 10:30:00-05:00               0.000057               -0.006796
 2021-02-18 11:30:00-05:00               0.011373               -0.003820
 2021-02-18 12:30:00-05:00              -0.008459               -0.003379
 2021-02-18 13:30:00-05:00               0.008757               -0.000371
 ...                                          ...                     ...
 2024-01-11 11:30:00-05:00              -0.005192               -0.009457
 2024-01-11 12:30:00-05:00               0.002114                0.007726
 2024-01-11 13:30:00-05:00               0.013013                0.008949
 2024-01-11 14:30:00-05:00               0.007257               -0.000915
 2024-01-11 15:30:00-05:00               0.001633                0.000992

 [5094 rows x 11 columns],
 'PDD':                                  Open        High         Low       Close  \
 Datetime
 2021-02-18 09:30:00-05:00  198.000000  199.309998  192.899994  194.009995
 2021-02-18 10:30:00-05:00  194.000000  196.000000  193.875000  195.100006
 2021-02-18 11:30:00-05:00  195.000000  195.578003  193.100006  193.845001
 2021-02-18 12:30:00-05:00  193.820007  195.544998  192.029999  195.220001
 2021-02-18 13:30:00-05:00  195.179993  195.699997  194.660004  195.070007
 ...                               ...         ...         ...         ...
 2024-01-11 11:30:00-05:00  150.289993  151.050003  149.949997  150.550003
 2024-01-11 12:30:00-05:00  150.554993  151.339996  150.475006  151.070007
 2024-01-11 13:30:00-05:00  151.085007  151.630005  151.050003  151.539993
 2024-01-11 14:30:00-05:00  151.559998  151.774994  151.430099  151.580994
 2024-01-11 15:30:00-05:00  151.589996  151.720001  151.483994  151.649994

                             Volume  Dividends  Stock Splits  \
 Datetime
 2021-02-18 09:30:00-05:00  1851955        0.0           0.0
 2021-02-18 10:30:00-05:00   633548        0.0           0.0
 2021-02-18 11:30:00-05:00   305243        0.0           0.0
 2021-02-18 12:30:00-05:00   521230        0.0           0.0
 2021-02-18 13:30:00-05:00   185892        0.0           0.0
 ...                            ...        ...           ...
 2024-01-11 11:30:00-05:00   437558        0.0           0.0
 2024-01-11 12:30:00-05:00   321022        0.0           0.0
 2024-01-11 13:30:00-05:00   361665        0.0           0.0
 2024-01-11 14:30:00-05:00   449250        0.0           0.0
 2024-01-11 15:30:00-05:00   155503        0.0           0.0

                            % Change in Close Price  % Change in Open Price  \
 Datetime
 2021-02-18 09:30:00-05:00                      NaN                     NaN
 2021-02-18 10:30:00-05:00                 0.005618               -0.020202
 2021-02-18 11:30:00-05:00                -0.006433                0.005155
 2021-02-18 12:30:00-05:00                 0.007093               -0.006051
 2021-02-18 13:30:00-05:00                -0.000768                0.007017
 ...                                            ...                     ...
 2024-01-11 11:30:00-05:00                 0.001830               -0.006249
 2024-01-11 12:30:00-05:00                 0.003454                0.001763
 2024-01-11 13:30:00-05:00                 0.003111                0.003520
 2024-01-11 14:30:00-05:00                 0.000271                0.003144
 2024-01-11 15:30:00-05:00                 0.000455                0.000198

                            % Change in Low Price  % Change in High Price
 Datetime
 2021-02-18 09:30:00-05:00                    NaN                     NaN
 2021-02-18 10:30:00-05:00               0.005054               -0.016607
 2021-02-18 11:30:00-05:00              -0.003997               -0.002153
 2021-02-18 12:30:00-05:00              -0.005541               -0.000169
 2021-02-18 13:30:00-05:00               0.013696                0.000793
 ...                                          ...                     ...
 2024-01-11 11:30:00-05:00               0.000801               -0.003529
 2024-01-11 12:30:00-05:00               0.003501                0.001920
 2024-01-11 13:30:00-05:00               0.003821                0.001916
 2024-01-11 14:30:00-05:00               0.002516                0.000956
 2024-01-11 15:30:00-05:00               0.000356               -0.000362

 [5094 rows x 11 columns]}

Including Gap

As done above, we explored that the average gap between the high and low prices can also be an indication of price movement. Because our final dataset will be using change in price, the information about the difference between the high price and low price will be lost. Therefore, we should add a column that mentions the gap. Similarly, because the gap is based on the price, it shares the same issue as every price related column. But percentage change of the gap also does not convey the gap in price. Therefore, we will be using gap (difference between high and low) divided by the low price. Therefore, the gap is shown in percentage form.

companies_dict['MDB'].columns
Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Dividends', 'Stock Splits',
       '% Change in Close Price', '% Change in Open Price',
       '% Change in Low Price', '% Change in High Price'],
      dtype='object')
for company in winning_companies:
    companies_dict[company]['Gap'] = (companies_dict[company]['High'] - companies_dict[company]['Low']) /  companies_dict[company]['Low']
    print(companies_dict[company]['Gap'])
Datetime
2021-02-18 09:30:00-05:00    0.031260
2021-02-18 10:30:00-05:00    0.024194
2021-02-18 11:30:00-05:00    0.008807
2021-02-18 12:30:00-05:00    0.013976
2021-02-18 13:30:00-05:00    0.004801
                               ...
2024-01-11 11:30:00-05:00    0.009943
2024-01-11 12:30:00-05:00    0.015599
2024-01-11 13:30:00-05:00    0.011524
2024-01-11 14:30:00-05:00    0.003317
2024-01-11 15:30:00-05:00    0.002675
Name: Gap, Length: 5094, dtype: float64
Datetime
2021-02-18 09:30:00-05:00    0.033230
2021-02-18 10:30:00-05:00    0.010961
2021-02-18 11:30:00-05:00    0.012833
2021-02-18 12:30:00-05:00    0.018304
2021-02-18 13:30:00-05:00    0.005343
                               ...
2024-01-11 11:30:00-05:00    0.007336
2024-01-11 12:30:00-05:00    0.005748
2024-01-11 13:30:00-05:00    0.003840
2024-01-11 14:30:00-05:00    0.002278
2024-01-11 15:30:00-05:00    0.001558
Name: Gap, Length: 5094, dtype: float64
Datetime
2021-02-18 09:30:00-05:00    0.040320
2021-02-18 10:30:00-05:00    0.031179
2021-02-18 11:30:00-05:00    0.029476
2021-02-18 12:30:00-05:00    0.017367
2021-02-18 13:30:00-05:00    0.013570
                               ...
2024-01-11 11:30:00-05:00    0.007227
2024-01-11 12:30:00-05:00    0.010399
2024-01-11 13:30:00-05:00    0.009404
2024-01-11 14:30:00-05:00    0.006054
2024-01-11 15:30:00-05:00    0.002884
Name: Gap, Length: 5094, dtype: float64

Datetime Features

We will now be extracting the datetime features. This is because they might be repeating patterns within the datetime that are might not be obvious to us. For example, it may be most volatile during Tuesday, 2pm to 4pm, when there change in close of at least 3%. Therefore, extracting these features will allow the machine learning model to identify patterns as mentioned.

companies_dict['DASH']
                                 Open        High         Low       Close  \
Datetime
2021-02-18 09:30:00-05:00  194.300003  196.610001  188.990005  189.800003
2021-02-18 10:30:00-05:00  189.576096  194.800003  188.910004  193.691101
2021-02-18 11:30:00-05:00  193.770004  196.979996  191.339996  191.554993
2021-02-18 12:30:00-05:00  191.339996  194.490005  191.169998  193.899994
2021-02-18 13:30:00-05:00  194.149994  196.440002  193.809998  196.220001
...                               ...         ...         ...         ...
2024-01-11 11:30:00-05:00  102.565002  103.139999  102.400002  102.925003
2024-01-11 12:30:00-05:00  102.959999  103.959999  102.889999  103.800003
2024-01-11 13:30:00-05:00  103.779999  104.650002  103.675003  104.625000
2024-01-11 14:30:00-05:00  104.644997  104.695000  104.065002  104.339996
2024-01-11 15:30:00-05:00  104.320000  104.330002  104.029999  104.100098

                           Volume  Dividends  Stock Splits  \
Datetime
2021-02-18 09:30:00-05:00  410371        0.0           0.0
2021-02-18 10:30:00-05:00  270042        0.0           0.0
2021-02-18 11:30:00-05:00  270798        0.0           0.0
2021-02-18 12:30:00-05:00  185655        0.0           0.0
2021-02-18 13:30:00-05:00  108898        0.0           0.0
...                           ...        ...           ...
2024-01-11 11:30:00-05:00  226748        0.0           0.0
2024-01-11 12:30:00-05:00  230721        0.0           0.0
2024-01-11 13:30:00-05:00  270964        0.0           0.0
2024-01-11 14:30:00-05:00  247590        0.0           0.0
2024-01-11 15:30:00-05:00   87951        0.0           0.0

                           % Change in Close Price  % Change in Open Price  \
Datetime
2021-02-18 09:30:00-05:00                      NaN                     NaN
2021-02-18 10:30:00-05:00                 0.020501               -0.024312
2021-02-18 11:30:00-05:00                -0.011028                0.022123
2021-02-18 12:30:00-05:00                 0.012242               -0.012541
2021-02-18 13:30:00-05:00                 0.011965                0.014686
...                                            ...                     ...
2024-01-11 11:30:00-05:00                 0.003559               -0.012088
2024-01-11 12:30:00-05:00                 0.008501                0.003851
2024-01-11 13:30:00-05:00                 0.007948                0.007964
2024-01-11 14:30:00-05:00                -0.002724                0.008335
2024-01-11 15:30:00-05:00                -0.002299               -0.003106

                           % Change in Low Price  % Change in High Price  \
Datetime
2021-02-18 09:30:00-05:00                    NaN                     NaN
2021-02-18 10:30:00-05:00              -0.000423               -0.009206
2021-02-18 11:30:00-05:00               0.012863                0.011191
2021-02-18 12:30:00-05:00              -0.000888               -0.012641
2021-02-18 13:30:00-05:00               0.013810                0.010026
...                                          ...                     ...
2024-01-11 11:30:00-05:00              -0.001073               -0.011596
2024-01-11 12:30:00-05:00               0.004785                0.007950
2024-01-11 13:30:00-05:00               0.007630                0.006637
2024-01-11 14:30:00-05:00               0.003762                0.000430
2024-01-11 15:30:00-05:00              -0.000336               -0.003486

                                Gap
Datetime
2021-02-18 09:30:00-05:00  0.040320
2021-02-18 10:30:00-05:00  0.031179
2021-02-18 11:30:00-05:00  0.029476
2021-02-18 12:30:00-05:00  0.017367
2021-02-18 13:30:00-05:00  0.013570
...                             ...
2024-01-11 11:30:00-05:00  0.007227
2024-01-11 12:30:00-05:00  0.010399
2024-01-11 13:30:00-05:00  0.009404
2024-01-11 14:30:00-05:00  0.006054
2024-01-11 15:30:00-05:00  0.002884

[5094 rows x 12 columns]

for company in winning_companies:

    # Add the DataTime information from index
    companies_dict[company]['DateTime'] = companies_dict[company].index

    # Extracting the Date Time Features
    companies_dict[company]['Hour'] = companies_dict[company]['DateTime'].dt.hour
    companies_dict[company]['Day'] = companies_dict[company]['DateTime'].dt.day
    companies_dict[company]['Day_of_Week'] = companies_dict[company]['DateTime'].dt.day_of_week
    companies_dict[company]['Month'] = companies_dict[company]['DateTime'].dt.month
    companies_dict[company]['Quarter'] = companies_dict[company]['DateTime'].dt.quarter

    # Drop the 'DateTime' column that was created
    companies_dict[company] = companies_dict[company].drop('DateTime', axis = 1)

    # Drop the top row - because when changing to percentage change in price, the top row has null values
    companies_dict[company] = companies_dict[company][1:]

    # Drop the columns that are not used
    companies_dict[company] = companies_dict[company][companies_dict[company].columns[7:]]
companies_dict['MDB']
                           % Change in Close Price  % Change in Open Price  \
Datetime
2021-02-18 10:30:00-05:00                 0.014914               -0.007188
2021-02-18 11:30:00-05:00                -0.003671                0.015757
2021-02-18 12:30:00-05:00                -0.000037               -0.006067
2021-02-18 13:30:00-05:00                -0.001439               -0.000546
2021-02-18 14:30:00-05:00                -0.000447               -0.001912
...                                            ...                     ...
2024-01-11 11:30:00-05:00                -0.003375               -0.009691
2024-01-11 12:30:00-05:00                 0.014900               -0.002829
2024-01-11 13:30:00-05:00                 0.007520                0.014057
2024-01-11 14:30:00-05:00                 0.000433                0.008009
2024-01-11 15:30:00-05:00                 0.001973                0.000382

                           % Change in Low Price  % Change in High Price  \
Datetime
2021-02-18 10:30:00-05:00               0.000057               -0.006796
2021-02-18 11:30:00-05:00               0.011373               -0.003820
2021-02-18 12:30:00-05:00              -0.008459               -0.003379
2021-02-18 13:30:00-05:00               0.008757               -0.000371
2021-02-18 14:30:00-05:00              -0.002375               -0.001634
...                                          ...                     ...
2024-01-11 11:30:00-05:00              -0.005192               -0.009457
2024-01-11 12:30:00-05:00               0.002114                0.007726
2024-01-11 13:30:00-05:00               0.013013                0.008949
2024-01-11 14:30:00-05:00               0.007257               -0.000915
2024-01-11 15:30:00-05:00               0.001633                0.000992

                                Gap  Hour  Day  Day_of_Week  Month  Quarter
Datetime
2021-02-18 10:30:00-05:00  0.024194    10   18            3      2        1
2021-02-18 11:30:00-05:00  0.008807    11   18            3      2        1
2021-02-18 12:30:00-05:00  0.013976    12   18            3      2        1
2021-02-18 13:30:00-05:00  0.004801    13   18            3      2        1
2021-02-18 14:30:00-05:00  0.005548    14   18            3      2        1
...                             ...   ...  ...          ...    ...      ...
2024-01-11 11:30:00-05:00  0.009943    11   11            3      1        1
2024-01-11 12:30:00-05:00  0.015599    12   11            3      1        1
2024-01-11 13:30:00-05:00  0.011524    13   11            3      1        1
2024-01-11 14:30:00-05:00  0.003317    14   11            3      1        1
2024-01-11 15:30:00-05:00  0.002675    15   11            3      1        1

[5093 rows x 10 columns]
companies_dict['PDD']
                           % Change in Close Price  % Change in Open Price  \
Datetime
2021-02-18 10:30:00-05:00                 0.005618               -0.020202
2021-02-18 11:30:00-05:00                -0.006433                0.005155
2021-02-18 12:30:00-05:00                 0.007093               -0.006051
2021-02-18 13:30:00-05:00                -0.000768                0.007017
2021-02-18 14:30:00-05:00                 0.003025               -0.000461
...                                            ...                     ...
2024-01-11 11:30:00-05:00                 0.001830               -0.006249
2024-01-11 12:30:00-05:00                 0.003454                0.001763
2024-01-11 13:30:00-05:00                 0.003111                0.003520
2024-01-11 14:30:00-05:00                 0.000271                0.003144
2024-01-11 15:30:00-05:00                 0.000455                0.000198

                           % Change in Low Price  % Change in High Price  \
Datetime
2021-02-18 10:30:00-05:00               0.005054               -0.016607
2021-02-18 11:30:00-05:00              -0.003997               -0.002153
2021-02-18 12:30:00-05:00              -0.005541               -0.000169
2021-02-18 13:30:00-05:00               0.013696                0.000793
2021-02-18 14:30:00-05:00               0.002158                0.002810
...                                          ...                     ...
2024-01-11 11:30:00-05:00               0.000801               -0.003529
2024-01-11 12:30:00-05:00               0.003501                0.001920
2024-01-11 13:30:00-05:00               0.003821                0.001916
2024-01-11 14:30:00-05:00               0.002516                0.000956
2024-01-11 15:30:00-05:00               0.000356               -0.000362

                                Gap  Hour  Day  Day_of_Week  Month  Quarter
Datetime
2021-02-18 10:30:00-05:00  0.010961    10   18            3      2        1
2021-02-18 11:30:00-05:00  0.012833    11   18            3      2        1
2021-02-18 12:30:00-05:00  0.018304    12   18            3      2        1
2021-02-18 13:30:00-05:00  0.005343    13   18            3      2        1
2021-02-18 14:30:00-05:00  0.005998    14   18            3      2        1
...                             ...   ...  ...          ...    ...      ...
2024-01-11 11:30:00-05:00  0.007336    11   11            3      1        1
2024-01-11 12:30:00-05:00  0.005748    12   11            3      1        1
2024-01-11 13:30:00-05:00  0.003840    13   11            3      1        1
2024-01-11 14:30:00-05:00  0.002278    14   11            3      1        1
2024-01-11 15:30:00-05:00  0.001558    15   11            3      1        1

[5093 rows x 10 columns]
companies_dict['MDB']
                           % Change in Close Price  % Change in Open Price  \
Datetime
2021-02-18 10:30:00-05:00                 0.014914               -0.007188
2021-02-18 11:30:00-05:00                -0.003671                0.015757
2021-02-18 12:30:00-05:00                -0.000037               -0.006067
2021-02-18 13:30:00-05:00                -0.001439               -0.000546
2021-02-18 14:30:00-05:00                -0.000447               -0.001912
...                                            ...                     ...
2024-01-11 11:30:00-05:00                -0.003375               -0.009691
2024-01-11 12:30:00-05:00                 0.014900               -0.002829
2024-01-11 13:30:00-05:00                 0.007520                0.014057
2024-01-11 14:30:00-05:00                 0.000433                0.008009
2024-01-11 15:30:00-05:00                 0.001973                0.000382

                           % Change in Low Price  % Change in High Price  \
Datetime
2021-02-18 10:30:00-05:00               0.000057               -0.006796
2021-02-18 11:30:00-05:00               0.011373               -0.003820
2021-02-18 12:30:00-05:00              -0.008459               -0.003379
2021-02-18 13:30:00-05:00               0.008757               -0.000371
2021-02-18 14:30:00-05:00              -0.002375               -0.001634
...                                          ...                     ...
2024-01-11 11:30:00-05:00              -0.005192               -0.009457
2024-01-11 12:30:00-05:00               0.002114                0.007726
2024-01-11 13:30:00-05:00               0.013013                0.008949
2024-01-11 14:30:00-05:00               0.007257               -0.000915
2024-01-11 15:30:00-05:00               0.001633                0.000992

                                Gap  Hour  Day  Day_of_Week  Month  Quarter
Datetime
2021-02-18 10:30:00-05:00  0.024194    10   18            3      2        1
2021-02-18 11:30:00-05:00  0.008807    11   18            3      2        1
2021-02-18 12:30:00-05:00  0.013976    12   18            3      2        1
2021-02-18 13:30:00-05:00  0.004801    13   18            3      2        1
2021-02-18 14:30:00-05:00  0.005548    14   18            3      2        1
...                             ...   ...  ...          ...    ...      ...
2024-01-11 11:30:00-05:00  0.009943    11   11            3      1        1
2024-01-11 12:30:00-05:00  0.015599    12   11            3      1        1
2024-01-11 13:30:00-05:00  0.011524    13   11            3      1        1
2024-01-11 14:30:00-05:00  0.003317    14   11            3      1        1
2024-01-11 15:30:00-05:00  0.002675    15   11            3      1        1

[5093 rows x 10 columns]

Finding Patterns using Datetime Features

We will go through some examples of date time features extracted can be used for.

companies_dict['MDB'].columns
Index(['% Change in Close Price', '% Change in Open Price',
       '% Change in Low Price', '% Change in High Price', 'Gap', 'Hour', 'Day',
       'Day_of_Week', 'Month', 'Quarter'],
      dtype='object')

We will able to access data from specific days, for example % change in close price on Wednesday.

df = companies_dict['MDB']
df.loc[(df['Day_of_Week'] == 2), '% Change in Close Price']
Datetime
2021-02-24 09:30:00-05:00   -0.005283
2021-02-24 10:30:00-05:00    0.033772
2021-02-24 11:30:00-05:00    0.016628
2021-02-24 12:30:00-05:00    0.003533
2021-02-24 13:30:00-05:00   -0.000200
                               ...
2024-01-10 11:30:00-05:00    0.008430
2024-01-10 12:30:00-05:00    0.005113
2024-01-10 13:30:00-05:00    0.008779
2024-01-10 14:30:00-05:00   -0.003342
2024-01-10 15:30:00-05:00    0.002218
Name: % Change in Close Price, Length: 1057, dtype: float64

We are now able to see which the average % Change in Close Price on each day.

for day_of_the_week in range(5):
    print(
        'The average return on day ' + str(day_of_the_week) + '\t' +
        "{:.3f}".format(
            df.loc[
                (df['Day_of_Week'] == day_of_the_week),
                '% Change in Close Price'
            ].mean()*100
        ) + '%'
    )
The average return on day 0	-0.101%
The average return on day 1	0.045%
The average return on day 2	0.043%
The average return on day 3	-0.011%
The average return on day 4	0.082%

Or the average % change in price based on the hour

for hour in range(9,16):
    print(
        'The average return on the hour ' + str(hour) + '\t' +
        "{:.3f}".format(
            df.loc[
                (df['Hour'] == hour),
                '% Change in Close Price'
            ].mean()*100
        ) + '%'
    )
The average return on the hour 9	0.119%
The average return on the hour 10	-0.033%
The average return on the hour 11	0.010%
The average return on the hour 12	-0.033%
The average return on the hour 13	-0.011%
The average return on the hour 14	0.058%
The average return on the hour 15	-0.011%

We can even look into which hour will be giving us the best returns on average, based on the given day.

max = 0
best_day, best_hour = [0,0]

for day_of_the_week in range(5):
    print('The average return on day ' + str(day_of_the_week) + ': ')

    for hour in range(9,16):

        max_return = df.loc[
                    (
                        (df['Hour'] == hour)
                        & (df['Day_of_Week'] == day_of_the_week)
                    ),
                    '% Change in Close Price'
                ].mean()*100

        print('on Hour ' + str(hour) + ': '+'\t' + "{:.4f}".format(max_return) + '%')

        if max_return > max:
            max = max_return
            best_day = day_of_the_week
            best_hour = hour

print()
print('On day %s, hour %s has the best average return' % (best_day, best_hour))
The average return on day 0:
on Hour 9: 	-0.6281%
on Hour 10: 	-0.0957%
on Hour 11: 	-0.0631%
on Hour 12: 	0.0505%
on Hour 13: 	-0.0095%
on Hour 14: 	-0.0099%
on Hour 15: 	0.0556%
The average return on day 1:
on Hour 9: 	0.3725%
on Hour 10: 	0.0257%
on Hour 11: 	-0.0462%
on Hour 12: 	-0.0682%
on Hour 13: 	-0.0134%
on Hour 14: 	0.0879%
on Hour 15: 	-0.0430%
The average return on day 2:
on Hour 9: 	0.3432%
on Hour 10: 	-0.0367%
on Hour 11: 	-0.0706%
on Hour 12: 	0.0013%
on Hour 13: 	0.1244%
on Hour 14: 	0.0633%
on Hour 15: 	-0.1235%
The average return on day 3:
on Hour 9: 	-0.0872%
on Hour 10: 	0.1065%
on Hour 11: 	0.1235%
on Hour 12: 	-0.1306%
on Hour 13: 	-0.0691%
on Hour 14: 	-0.0193%
on Hour 15: 	-0.0004%
The average return on day 4:
on Hour 9: 	0.5145%
on Hour 10: 	-0.1750%
on Hour 11: 	0.1036%
on Hour 12: 	-0.0094%
on Hour 13: 	-0.0935%
on Hour 14: 	0.1620%
on Hour 15: 	0.0664%

On day 4, hour 9 has the best average return

Given that my desired return is 0.03, which day do we participate to reap the benefit.

desired_returns = 0.03

for day_of_the_week in range(5):
    print( 'Day ' + str(day_of_the_week) + ": " + str(
        df.loc[
            ((df['Day_of_Week'] == day_of_the_week) &
            (df['% Change in Close Price'] >= desired_returns)),
            '% Change in Close Price'
        ].count()
        )
    )
Day 0: 17
Day 1: 21
Day 2: 28
Day 3: 32
Day 4: 29

Before ending the coursework, we will scale the data to values between 0 - 1. For columns that are not datetime feature, they are already in percentage so their value are inherently will be between 0 to 1, it is possible to have a value more than 1, but that is not likely to happen

for company in winning_companies:

    for column in companies_dict[company].columns[5:]:
        print(column)
        min = companies_dict[company][column].min()
        max = companies_dict[company][column].max()

        print(companies_dict[company][column].unique())

        companies_dict[company][column] = (companies_dict[company][column] - min) / (max - min)

        print(companies_dict[company][column].unique())
Hour
[10 11 12 13 14 15  9]
[0.16666667 0.33333333 0.5        0.66666667 0.83333333 1.
 0.        ]
Day
[18 19 22 23 24 25 26  1  2  3  4  5  8  9 10 11 12 15 16 17 29 30 31  6
  7 13 14 20 21 27 28]
[0.56666667 0.6        0.7        0.73333333 0.76666667 0.8
 0.83333333 0.         0.03333333 0.06666667 0.1        0.13333333
 0.23333333 0.26666667 0.3        0.33333333 0.36666667 0.46666667
 0.5        0.53333333 0.93333333 0.96666667 1.         0.16666667
 0.2        0.4        0.43333333 0.63333333 0.66666667 0.86666667
 0.9       ]
Day_of_Week
[3 4 0 1 2]
[0.75 1.   0.   0.25 0.5 ]
Month
[ 2  3  4  5  6  7  8  9 10 11 12  1]
[0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
 0.63636364 0.72727273 0.81818182 0.90909091 1.         0.        ]
Quarter
[1 2 3 4]
[0.         0.33333333 0.66666667 1.        ]
Hour
[10 11 12 13 14 15  9]
[0.16666667 0.33333333 0.5        0.66666667 0.83333333 1.
 0.        ]
Day
[18 19 22 23 24 25 26  1  2  3  4  5  8  9 10 11 12 15 16 17 29 30 31  6
  7 13 14 20 21 27 28]
[0.56666667 0.6        0.7        0.73333333 0.76666667 0.8
 0.83333333 0.         0.03333333 0.06666667 0.1        0.13333333
 0.23333333 0.26666667 0.3        0.33333333 0.36666667 0.46666667
 0.5        0.53333333 0.93333333 0.96666667 1.         0.16666667
 0.2        0.4        0.43333333 0.63333333 0.66666667 0.86666667
 0.9       ]
Day_of_Week
[3 4 0 1 2]
[0.75 1.   0.   0.25 0.5 ]
Month
[ 2  3  4  5  6  7  8  9 10 11 12  1]
[0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
 0.63636364 0.72727273 0.81818182 0.90909091 1.         0.        ]
Quarter
[1 2 3 4]
[0.         0.33333333 0.66666667 1.        ]
Hour
[10 11 12 13 14 15  9]
[0.16666667 0.33333333 0.5        0.66666667 0.83333333 1.
 0.        ]
Day
[18 19 22 23 24 25 26  1  2  3  4  5  8  9 10 11 12 15 16 17 29 30 31  6
  7 13 14 20 21 27 28]
[0.56666667 0.6        0.7        0.73333333 0.76666667 0.8
 0.83333333 0.         0.03333333 0.06666667 0.1        0.13333333
 0.23333333 0.26666667 0.3        0.33333333 0.36666667 0.46666667
 0.5        0.53333333 0.93333333 0.96666667 1.         0.16666667
 0.2        0.4        0.43333333 0.63333333 0.66666667 0.86666667
 0.9       ]
Day_of_Week
[3 4 0 1 2]
[0.75 1.   0.   0.25 0.5 ]
Month
[ 2  3  4  5  6  7  8  9 10 11 12  1]
[0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
 0.63636364 0.72727273 0.81818182 0.90909091 1.         0.        ]
Quarter
[1 2 3 4]
[0.         0.33333333 0.66666667 1.        ]

Project Background

Interest and Relevance to the field

I have always been interested in the markets as my academic career has always consisted of both finance and IT. My contribution to the field may not be significant but I would like to try and experiment on my own. There are countless studies testing out state of the art machine learning models or other mathematical techniques to find an edge in the market.

Therefore I would like to explore several machine learning models using a small difference of percentage change in price rather than the raw stock price. Given enough time this will be implemented into coursework 2. There are many other studies and research projects that perform other forms of transformation to the raw stock price but I would like to proceed with the simpler implementation first.

Limitations and Further/Future Study

Stuck on one hour rate

Using change in price is not flexible and it will require quite a bit of work to convert it back to meaningful data. Because the change in price is based on the previous instance which is the hour before, there is information to be had knowing the total change in a day or in several hours. Therefore it will also limit the trading strategy to only look at the market at an hourly rate. There could be more patterns that we are missing out on by using hourly change in percentage.

Not Full Automation

Using data like this with machine learning models ultimately requires human intervention and will not be fully autonomous. It is limited to understanding and identifying what we feed and train for. It will not be able to make proper decisions on portfolio management or risk assessment. These areas require a whole new area of learning and research that would need a multi model implementation.

Studying only one stock at a time

At the moment of this project we only explore using individual stocks and the benchmark to identify patterns of decent return. An idea that can be looked into is extracting these patterns from many other stocks together. We might be able to create a model that identifies high return moments. Additionally we are also limiting ourselves to individual stock data in general. There are other financial instruments that can also be used and studied.

Limited Features

As mentioned we are only using stock data for the machine learning model. There are studies and research projects that use not only stock data but also NLP techniques or Generative Models to read and collate different sources of news. This helps predict what the sentiments are. Sentiment analysis can also be used together with the stock data for the machine learning models to identify more patterns.

Ethics

yfinance Library

This library is under the APACHE LICENSE, VERSION 2.0, which allows us to be using the code any way we want, even commercially, as long we attribute to author and the license. Therefore, as long we give credit and ensure that we are not claiming the code to be ours and original, we can use it any way.

Yahoo Finance Data

They have their terms of use stated here. Things to take note of is that Yahoo has full discretion over the performance of the API, including rate limits. It is mentioned this is because they want to keep the API and underlying services available to the public. There are paragraphs that states instructions to follow if our product or services uses the Yahoo API. There are mentions that an individual shall not use the Yahoo API in a manner that exceeds reasonable request volume, but the volume is not stated. It states that they will go by case by case basis if any offenses are committed.

References

  1. UOL Materials
  2. https://pypi.org/project/yfinance/
  3. https://legal.yahoo.com/us/en/yahoo/terms/otos/index.html