Investment Strategy Comparison - Buy and Hold versus Dollar Cost Averaging

Mar 18, 2024 | python data-analysis finance investment-strategies


An exploration into traditional investment strategies, comparing the performance of 'Buy and Hold' and 'Dollar Cost Averaging' across volatile stock datasets.

Abstract

In this project, we were able to explore the two most common investment strategy, 'Buy and Hold' and 'Dollar Cost Averaging' and our simple proposed models. The result is that if an investor is willing to take some risk and explore options outside of index funds, 'Dollar Cost Averaging' could be beneficial, especially if the stock is very volatile. Our proposed methods while there were instances where they managed outperformed both these strategies, it might not be worth the risk and effort required.

We have also looked into ways how we could prepare the data in various configurations for different uses such as having it used for data analysis or future studies where we would like to explore machine learning solutions for pattern recognition.

Note that all the values might change as they use the values of the last 730 days of data.

Libraries Required

The following are the required libraries. It is highly likely that required to download and install the 'yfinance' library.

Installing yfinance

%pip install yfinance
Requirement already satisfied: yfinance in ...
...
Note: you may need to restart the kernel to use updated packages.

Importing Libraries

import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
import yfinance as yf

Aims and Objective

Aim

The aim of the project is to explore if investing strategies supported with the use of machine learning techniques, can outperform the simple investing strategies, such as 'buy and hold', 'cost dollar averaging' and 'index fund investing', and simple trading strategies derived form data analysis.

Objectives

The objective of the project is to:

  • To create simple investment strategies using traditional methods of data analysis
  • To create investment strategies supported by machine learning techniques
  • To compare the all the different proposed strategies to see which can perform best
  • To explore which investment strategy can outperform the simple baseline strategies
  • Explore if combining these strategies would make a better investing strategy, like an ensemble of decision makers.

The aim and objectives set at the start were definitely ambitious. Machine learning supported strategies were not explored in this coursework.

Investment Strategies

Firstly, we will need to understand some context to the strategies that are being used as the base comparisons. The following are brief description of what each strategies do. These strategies acts as a baseline that our proposed strategies will need to beat to be a successful strategy. We will then proceed to demonstrate and calculate the performance of these strategies.

Buy and Hold

The easiest strategy that everyone can do without any experience or expertise, is to "buy and hold". This means that the investors or market participants, purchase a financial instrument, for the case of our project, it would be stock, and not sell it. Selling it after the price has appreciated or increased, will result in a profit while selling it if the price is lower than it is initially bought, would result in a loss.

Dollar Cost Averaging

This strategy is very similar to "buy and hold", but rather than immediately purchasing the stock from day 1, dollar cost averaging employs an interval based purchase over a period of time. This is a popular strategy because most people do not have the time and proper strategy in place to take advantage of the daily price movement of the market. Hence, by buying stock at a set interval, for example, every week. This will allow the investor to potentially buy into the market, at a better time. The downside of it is that, investor might also be buying a stock at a bad time.

Index Fund Investing

This is a strategy that is endorsed and supported by the famous investor, Warren Buffet. This strategy involves buying the most diversified stock, the index fund. Index funds are a financial instrument that functions a lot like stock, but rather than only exposing to a single company or business, investors will be expose to the average of many businesses. There are many types of index funds and they are usually country specific. The S&P500, which allows investors to get exposure of the top 500 companies in the US, is one the most well known index fund. This index fund is also widely used as a benchmark in gauging investment performance. The 'Buy and Hold' and 'Dollar Cost Average' strategy will also be applied.

Investment Strategies Baseline Performance and Illustrations

Now, we will look into all of the performances based on the strategies above. We will be using data that was mentioned in the previous coursework, when we have selected the top 3 highest price moving company, among the Nasdaq100 in the technology sector using evaluation criteria that was also presented in the previous coursework. The companies were Pinduoduo Inc (PDD), Mongodb Inc (MDB), and DoorDash Inc (DASH). The bracketed values are their stock ticker, which is a short formed text used to identify the stock on various of platforms.

Therefore, we will be using these three stocks for the baseline strategies. Additionally, we will also be using the the SPDR S&P 500 ETF Trust (SPY), which is a stock that tracks the S&P 500 performance and at a lower price.

Limitations

Baseline Model Complexity

For the sake of simplicity for the comparison performed in this coursework, we will limit the scope by assuming that the investor will only be purchasing one type of stock for every given strategy. If we had given the option to hold more than 1, the complexity of the model increases. This is because not only will we need to optimize the purchasing and selling of the stock, we will then need to have some way to optimize the weightage and the choice of stock the investor would like to hold.

Inflationary factors, Tax and Fees

The following model comparisons will not take inflation into account, and also any other regulatory tax. Because trading fees or management fee of any kind is highly depending on what investment services an individual uses, we will not be including this into our model as well.

Buying only

Because investment and trading strategies can quickly scale in difficulty and the dimensions of data that can be used, etc, we will limit all the strategies to only buying a stock and selling the stock that are being held. While the market does allow short-selling, which is selling of a stock that one does not own, this would also further add another layer of complexity and dimension that we would need to consider.

Stars Aligned

As much as we are using data to trying substantiate our decisions and creating an investment strategy, there also could be the aspect of luck, where it just so happens that our analysis and strategy coincide with days of high performance, which might skew the data for having higher returns, or the opposite. Furthermore, because they are so many factors that affect the stock market, and we are only exploring one aspect of it, the analysis performed might show perfect or obvious expandability, but once again, it could just be because of coincidence and our analysis may not actually reflect what is happening in the stock market. While we are in the search for an edge in the market, it needs to be accepted that it will not be an easy tasks and simple analysis such as the following can truly give an edge. But we can explore and see if such analysis performed, are able to at least outperform is the simple strategies.

Getting the Data

Before we can test the performance of these baseline strategies, we will need to download and extract the data. Similarly, shown in coursework one, the fastest and easiest way to do so is by using the 'yfinance' library and extract it. But the drawback for this is that we do not have the freedom to get the specific characteristic data. For example, by have a 1hr interval dataset, where we can see hourly changes to the prices, we only can go as far back as 2 years. Only if we were to use daily prices, where the change in price only happens every day, we then can extract a longer timeframe.

The following code is used to get the data from Yahoo Finance using 'yfinance'. We have saved the data so that the rest of the coursework can be replicable. Hence, the following code is commented.

## This was initially used when we do not have the data
## Using yfinance get the a 730 day of 1 hour interval financial data

#stocks = {}
#stocks_list = ["PDD", "MDB", "DASH", "SPY"]

#for stock in stocks_list:

#    # Extracting the data for the chosen ticker, having about 2 years of data, in 1 hour interval
#    stocks[stock] = yf.Ticker(stock).history(period='730d', interval='1h')

#    # Saving these dataset so that it can results can be replicable
#    stocks[stock].to_csv("stock_data" + "\\" + stock)

The following code is to read the data that was downloaded using the code above. We will store the extracted data into a dictionary, with the index of the stock's ticker.

stocks = {}
stocks_list = ["PDD", "MDB", "DASH", "SPY"]

for stock in stocks_list:

    # Extract the downloaded data
    stocks[stock] = pd.read_csv("stock_data" + "\\" + stock)

stocks
{'PDD':                        Datetime        Open        High         Low  \
 0     2021-04-15 09:30:00-04:00  134.850006  134.919998  131.889999
 1     2021-04-15 10:30:00-04:00  132.759995  132.860001  131.009995
 2     2021-04-15 11:30:00-04:00  131.149994  132.000000  131.070007
 3     2021-04-15 12:30:00-04:00  131.324997  132.000000  131.125000
 4     2021-04-15 13:30:00-04:00  131.649994  132.600006  131.649994
 ...                         ...         ...         ...         ...
 5089  2024-03-08 11:30:00-05:00  110.870003  111.160004  108.870003
 5090  2024-03-08 12:30:00-05:00  110.190002  111.139999  109.540001
 5091  2024-03-08 13:30:00-05:00  111.099998  111.160004  110.510002
 5092  2024-03-08 14:30:00-05:00  110.930000  111.209999  109.870003
 5093  2024-03-08 15:30:00-05:00  110.230003  110.459999  109.919998

            Close   Volume  Dividends  Stock Splits
 0     132.725006   646190        0.0           0.0
 1     131.229904   481802        0.0           0.0
 2     131.320007   246441        0.0           0.0
 3     131.639999   252910        0.0           0.0
 4     131.929993   354850        0.0           0.0
 ...          ...      ...        ...           ...
 5089  110.160004  2595785        0.0           0.0
 5090  111.099998  1317783        0.0           0.0
 5091  110.910004   770590        0.0           0.0
 5092  110.235001  1065732        0.0           0.0
 5093  110.339996  1566768        0.0           0.0

 [5094 rows x 8 columns],
 'MDB':                        Datetime        Open        High         Low  \
 0     2021-04-15 09:30:00-04:00  313.010010  317.940002  308.170013
 1     2021-04-15 10:30:00-04:00  310.989990  314.670013  309.480011
 2     2021-04-15 11:30:00-04:00  313.092987  318.929993  312.869995
 3     2021-04-15 12:30:00-04:00  317.970001  317.970001  315.299988
 4     2021-04-15 13:30:00-04:00  317.635010  321.470001  317.635010
 ...                         ...         ...         ...         ...
 5089  2024-03-08 11:30:00-05:00  381.329987  387.989899  381.000000
 5090  2024-03-08 12:30:00-05:00  386.500000  388.000000  381.079987
 5091  2024-03-08 13:30:00-05:00  384.119995  390.000000  381.559998
 5092  2024-03-08 14:30:00-05:00  384.415009  389.464996  384.364990
 5093  2024-03-08 15:30:00-05:00  385.475006  387.450012  383.184998

            Close  Volume  Dividends  Stock Splits
 0     311.209991   92921        0.0           0.0
 1     312.670013   55547        0.0           0.0
 2     318.079987   81290        0.0           0.0
 3     317.605011   56501        0.0           0.0
 4     319.959991   88311        0.0           0.0
 ...          ...     ...        ...           ...
 5089  386.524994  455519        0.0           0.0
 5090  384.000000  483851        0.0           0.0
 5091  384.359985  522412        0.0           0.0
 5092  385.424988  480558        0.0           0.0
 5093  383.420013  600317        0.0           0.0

 [5094 rows x 8 columns],
 'DASH':                        Datetime        Open        High         Low  \
 0     2021-04-15 09:30:00-04:00  145.000000  152.449997  144.869995
 1     2021-04-15 10:30:00-04:00  151.899994  154.880005  151.350006
 2     2021-04-15 11:30:00-04:00  152.660004  153.270004  151.240005
 3     2021-04-15 12:30:00-04:00  153.070007  154.169998  152.039993
 4     2021-04-15 13:30:00-04:00  153.985001  154.500000  151.699997
 ...                         ...         ...         ...         ...
 5089  2024-03-08 11:30:00-05:00  131.850006  132.559998  131.429993
 5090  2024-03-08 12:30:00-05:00  132.220001  132.220001  129.580002
 5091  2024-03-08 13:30:00-05:00  129.649994  130.770004  129.309998
 5092  2024-03-08 14:30:00-05:00  130.649994  131.779999  130.589996
 5093  2024-03-08 15:30:00-05:00  131.259995  131.970001  131.130005

            Close  Volume  Dividends  Stock Splits
 0     151.964996  693400        0.0           0.0
 1     152.500000  802458        0.0           0.0
 2     153.139999  287421        0.0           0.0
 3     153.884995  360681        0.0           0.0
 4     152.520004  285774        0.0           0.0
 ...          ...     ...        ...           ...
 5089  132.149994  408198        0.0           0.0
 5090  129.649994  630248        0.0           0.0
 5091  130.639999  395151        0.0           0.0
 5092  131.300003  467565        0.0           0.0
 5093  131.800003  575430        0.0           0.0

 [5094 rows x 8 columns],
 'SPY':                        Datetime        Open        High         Low  \
 0     2021-04-15 09:30:00-04:00  413.970001  414.559998  413.690002
 1     2021-04-15 10:30:00-04:00  414.559998  415.459991  414.554993
 2     2021-04-15 11:30:00-04:00  414.648987  415.230011  414.519989
 3     2021-04-15 12:30:00-04:00  415.179993  415.700012  415.089996
 4     2021-04-15 13:30:00-04:00  415.299011  415.670105  415.239990
 ...                         ...         ...         ...         ...
 5087  2024-03-08 11:30:00-05:00  514.750000  515.280029  513.570007
 5088  2024-03-08 12:30:00-05:00  514.640015  514.840027  512.059998
 5089  2024-03-08 13:30:00-05:00  512.440002  513.679993  511.130005
 5090  2024-03-08 14:30:00-05:00  513.460022  514.460022  511.779999
 5091  2024-03-08 15:30:00-05:00  511.929993  512.820007  511.720001

            Close    Volume  Dividends  Stock Splits  Capital Gains
 0     414.559998  11736294        0.0           0.0            0.0
 1     414.645691   8902400        0.0           0.0            0.0
 2     415.170013   5516651        0.0           0.0            0.0
 3     415.290009   5138590        0.0           0.0            0.0
 4     415.609985   5492613        0.0           0.0            0.0
 ...          ...       ...        ...           ...            ...
 5087  514.630005  11839572        0.0           0.0            0.0
 5088  512.330017  10341053        0.0           0.0            0.0
 5089  513.460022  10139821        0.0           0.0            0.0
 5090  511.929993   9164240        0.0           0.0            0.0
 5091  511.720001   9928936        0.0           0.0            0.0

[5092 rows x 9 columns]}

Define a function to plot the stock

Define a function so that it will be easier for us to repeatedly plot when needed.

def plot_stock(stock):
    plt.figure(figsize=(10,5))

    df = stocks[stock].copy()
    df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce', utc=True)
    df = df.set_index('Datetime')

    plt.plot(df['Close'])
    plt.gcf().autofmt_xdate()
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock)
    plt.show()
for stock in stocks_list:
    plot_stock(stock)
Visualization 1
Visualization 2
Visualization 3
Visualization 4

Creating DataFrame to store all performances

performances = pd.DataFrame(columns = ['Stock Ticker', 'Buy and Hold'])
performances['Stock Ticker'] = stocks_list
performances = performances.set_index('Stock Ticker')

performances
             Buy and Hold
Stock Ticker
PDD                   NaN
MDB                   NaN
DASH                  NaN
SPY                   NaN

Buy and Hold

There will be 4 outputs for this. The 3 stocks and the index fund.

For this strategy, it is a simple calculation will suffice. We will only need to take the percentage difference between the first row and the last row.

buy_and_hold_performances = []

for stock in stocks_list:

    print(stock)

    last_price = stocks[stock]['Close'].iloc[-1]
    cost_price = stocks[stock]['Open'].iloc[0]
    print("Last Price: ", last_price)
    print("Cost Price: ", cost_price)

    percent_return = (last_price - cost_price) / (cost_price)
    print("Return (%): ", percent_return * 100)

    buy_and_hold_performances.append(percent_return)

    print()

    plt.figure(figsize=(10,5))

    df = stocks[stock].copy()
    df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce', utc=True)
    df = df.set_index('Datetime')

    plt.axhline(df.loc[df.index[0],'Close'],linestyle='--',color='g',label='Entry')
    plt.axhline(df.loc[df.index[-1],'Close'],linestyle='--',color='r',label='Final Price')
    plt.plot(df['Close'])

    plt.gcf().autofmt_xdate()
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.ylim(0)
    plt.title(stock)
    plt.legend()
    plt.show()

performances['Buy and Hold'] = buy_and_hold_performances
performances
PDD
Last Price:  110.33999633789062
Cost Price:  134.85000610351562
Return (%):  -18.175757253440725
Visualization 5
MDB
Last Price:  383.4200134277344
Cost Price:  313.010009765625
Return (%):  22.494489462120022
Visualization 6
DASH
Last Price:  131.8000030517578
Cost Price:  145.0
Return (%):  -9.103446171201508
Visualization 7
SPY
Last Price:  511.7200012207031
Cost Price:  413.9700012207031
Return (%):  23.612822115553676
Visualization 8
              Buy and Hold
Stock Ticker
PDD              -0.181758
MDB               0.224945
DASH             -0.091034
SPY               0.236128

As shown in the results, there are stocks, such as PDD and DASH that have dropped in price from where it has started. This is indicated by the green line being above the red line. If the green line is above the red line, that means we have bought a stock above the final price, which will result in a loss. It is fine though as we still use it as a baseline, as we would like to find out if any other strategies can perform better than the loss that is produced by this baseline strategy. Will the other strategy produce a lower loss or will there be able to produce a profit.

Additionally, as shown, that given if an investor had invested during the start of the dataset, 15 April 2021, and held each of the stock until 8th March 2024, they would have ended up with a return/profit of the shown percentage in the performance DataFrame.

Dollar Cost Averaging

As for dollar cost averaging, the strategy is to go into the market at consistent and spread out pace, rather than entering all in one go. Spreading the starting fund throughout. Deciding on the frequency is also something that we can get into and properly optimize, as we can choose which day or time to buy, how frequent to buy like everyday or weekly or monthly, etc. Therefore, we will explore if we were to buy it daily, weekly and monthly. For weekly and monthly, we will go into the market on Friday, and for all of the frequency, we will enter at end of the day, so when the market is about to close.

for stock in stocks_list:

    df = stocks[stock].copy()
    df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce', utc=True)
    last_price = stocks[stock]['Close'].iloc[-1]

    daily_dca = df[df['Datetime'].dt.hour == 15].copy()
    daily_dca['Return'] = daily_dca['Close'].map(lambda x: ((last_price - x) / x))
    performances.loc[stock, 'Dollar Cost Averaging (Daily)'] = daily_dca['Return'].sum() / len(daily_dca)

    plt.figure(figsize=(10,7))
    df = df.set_index('Datetime')
    plt.plot(df['Close'], alpha = 0.5)
    for entry in daily_dca['Datetime']:
        plt.scatter(entry, df.loc[entry, 'Close'],
                    color = 'g' if df.loc[entry, 'Close'] < df.loc[df.index[-1], 'Close'] else 'r',
                    s = 10
        )
    plt.axhline(df.loc[df.index[-1], 'Close'], color = 'grey', linestyle = '--', label= 'Final Price')
    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock)
    plt.legend()
    plt.show()

performances
Visualization 9
Visualization 10
Visualization 11
Visualization 12
              Buy and Hold  Dollar Cost Averaging (Daily)
Stock Ticker
PDD              -0.181758                       0.489227
MDB               0.224945                       0.254254
DASH             -0.091034                       0.552831
SPY               0.236128                       0.200805

Because a daily DCA means that an investor would be buying stock everyday. Hence, we see many dots appearing all over the chart. The grey line represent the final price at that we are using. Entries where it is below the line, are making money, and those that are above are losing money. Hence, it is why it is highlighted in green or red. Green for those that are below the line, which are making a profit, and those that are in red are losing money. the further away the dots are from the grey line, the more the entry is making or losing money.

Already, by using a daily dollar cost averaging, we see more returns for PDD and DASH. This is also because the price movement for PDD and DASH, is the exact scenario why an investor should DCA. When, the prices fall, you would want to pick up their share bit by bit, assuming it is a share an investor believes the prices will go up eventually and as shown for the SPY stock, where growth over time was steady, it was better to had held it from the start than to buy into it overtime. SPY is a good example of the downside of DCA.

for stock in stocks_list:

    df = stocks[stock].copy()
    df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce', utc=True)
    last_price = stocks[stock]['Close'].iloc[-1]

    weekly_dca = df[(df['Datetime'].dt.hour == 15) & (df['Datetime'].dt.day_of_week == 4)].copy()
    weekly_dca['Return'] = weekly_dca['Close'].map(lambda x: ((last_price - x) / x))
    performances.loc[stock, 'Dollar Cost Averaging (Weekly)'] = weekly_dca['Return'].sum() / len(weekly_dca)

    plt.figure(figsize=(10,7))
    df = df.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)
    for entry in weekly_dca['Datetime']:

        plt.scatter(entry, df.loc[entry, 'Close'],
                    color = 'g' if df.loc[entry, 'Close'] < df.loc[df.index[-1], 'Close'] else 'r',
                    s = 20
        )

    plt.axhline(df.loc[df.index[-1], 'Close'], color = 'grey', linestyle = '--', label= 'Final Price')
    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Weekly')
    plt.legend()
    plt.show()

    monthly_dca = weekly_dca.copy()
    monthly_dca['Month'] = monthly_dca['Datetime'].dt.month
    monthly_dca['Last Friday'] = monthly_dca['Month'].shift(-1) - monthly_dca['Month']
    monthly_dca = monthly_dca[monthly_dca['Last Friday'] == 1]
    monthly_dca['Return'] = monthly_dca['Close'].map(lambda x: ((last_price - x) / x))
    performances.loc[stock, 'Dollar Cost Averaging (Monthly)'] = monthly_dca['Return'].sum() / len(monthly_dca)

    plt.figure(figsize=(10,7))
    plt.plot(df['Close'], alpha = 0.5)
    for entry in monthly_dca['Datetime']:

        plt.scatter(entry, df.loc[entry, 'Close'],
                    color = 'g' if df.loc[entry, 'Close'] < df.loc[df.index[-1], 'Close'] else 'r',
                    s = 30
        )

    plt.axhline(df.loc[df.index[-1], 'Close'], color = 'grey', linestyle = '--', label= 'Final Price')
    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Monthly')
    plt.legend()
    plt.show()

performances
Visualization 13
Visualization 14
Visualization 15
Visualization 16
Visualization 17
Visualization 18
Visualization 19
Visualization 20
              Buy and Hold  Dollar Cost Averaging (Daily)  \
Stock Ticker
PDD              -0.181758                       0.489227
MDB               0.224945                       0.254254
DASH             -0.091034                       0.552831
SPY               0.236128                       0.200805

              Dollar Cost Averaging (Weekly)  Dollar Cost Averaging (Monthly)
Stock Ticker
PDD                                 0.472529                         0.503826
MDB                                 0.253272                         0.242322
DASH                                0.556192                         0.554912
SPY                                 0.200736                         0.200701
for stock in stocks_list:

    df = stocks[stock].copy()
    df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce', utc=True)
    last_price = stocks[stock]['Close'].iloc[-1]

    hourly_dca = df.copy()
    hourly_dca['Return'] = hourly_dca['Close'].map(lambda x: ((last_price - x)/ x))
    performances.loc[stock, 'Dollar Cost Averaging (Hourly)'] = hourly_dca['Return'].sum() / len(hourly_dca)

performances
              Buy and Hold  Dollar Cost Averaging (Daily)  \
Stock Ticker
PDD              -0.181758                       0.489227
MDB               0.224945                       0.254254
DASH             -0.091034                       0.552831
SPY               0.236128                       0.200805

              Dollar Cost Averaging (Weekly)  Dollar Cost Averaging (Monthly)  \
Stock Ticker
PDD                                 0.472529                         0.503826
MDB                                 0.253272                         0.242322
DASH                                0.556192                         0.554912
SPY                                 0.200736                         0.200701

              Dollar Cost Averaging (Hourly)
Stock Ticker
PDD                                 0.490052
MDB                                 0.254119
DASH                                0.552687
SPY                                 0.200839

Similarly, the same visualization technique is used to demonstrate the 'buy and hold' strategy, was not created for the hourly one because it would be mess as there are going to be so many entries.

Based on the performance collected, in most cases, it outperformed the 'buy and hold' strategy, but it is not as effective for the SPY. Because of the steadier uptrend, DCA is not as effective. As for MDB, we can see that it is quite a volatile stock and due to this volatility, we can see that there are about equal amounts of green there is as to red. In a situation like this, neither baseline strategies are getting an edge.

Using Traditional Data Analysis

In this part, we will be using the some simple data analysis such as finding the day that has the most positive price movement, is there any particular hour that results in the most price movement, etc. By exploring these trend, we will try to see if we can create any strategies that could beat the baseline strategies above.

One thing to note is that, for the strategies above, we did not really need to bother about the exit of any of the purchase of stock. Exit, as in selling a purchased stock, or closing the position. Only by doing so, we are able to have realized profits or losses. On the other hand, the strategies below, will need to take that into account.

Limitation

Length of Time for Investing

Using this method, similar to how we train a machine learning model, we will need have a training set and a test set. Because, we need some data to validate if the strategy works. This might benefit the strategy because it is exposed to the market for lesser duration, but it could work against it where the strategy does not have enough time to accumulate and realize any positive price movement. Therefore, given that there are 730 days of 1 hour data, which is around 5094 rows of data and we will use 475 days as training data, which will be around 3311 rows of data, where we will use it to identify trends and patterns, and then test it out with the remaining days to see if we are able beat the strategies above.

The use of percentage change - 1 hour rate

Because by using the percentage change, it gives more meaningful data to the actual price movement, we can have a better comparison with across the different stocks. However, the downside of preprocessing price this way is that we are locked into the the 1 hour rate and the strategy might need to revolve around by the fact. For example, if we were to find which hour will give us the best average return, we have to get into the market at that hour, and then exit it immediately

The limitation stated in the above, section also applies to this as well.

Findings from Coursework 1

In coursework 1, used percentage change between the hours to find out the average pattern that could lead to a high percentage return in the next hour. We also explored that by extracting the date time information, we can find simple trends such as the average return of a given day at which hour.

Therefore, we will be re-doing the findings from coursework 1, but with only 475 days of data, then, we will put it to the test and see if we are able to use the decisions made using our data analysis to make investment decisions.

Pre-process Data

We will also need to create a column for the percentage return. We will use the next row's close price, to determine what is the percentage change of the current row. So that we know that if we were to invest at this hour, what is the next amount of percentage change can we expect.

Then, we will then need to also extract the date time features, so that we can use it to find trends and pattern. and finally split it into training data and testing data. Most of these steps are already shown in coursework 1.

training_stock_data = {}
testing_stock_data = {}

for stock in stocks_list:

    # Referencing the value the stock DataFrame
    df = stocks[stock].copy()

    # By using shift(-1), we can access the close price of the next row
    df['% Change in Close Price'] = (df['Close'].shift(-1) - df['Close']) / df['Close']

    # Extracting the datetime feature
    df['Datetime'] = pd.to_datetime(df['Datetime'],errors='coerce',utc=True)
    df['Datetime'] = df['Datetime'].dt.tz_localize(None)
    df['Hour'] = df['Datetime'].dt.hour
    df['Day'] = df['Datetime'].dt.day
    df['Day_of_Week'] = df['Datetime'].dt.day_of_week
    df['Month'] = df['Datetime'].dt.month

    min_hour = df['Hour'].min()
    df['Hour'] = df['Hour'] - min_hour

    # Splitting it into training and splitting. 65% training and 35% testing
    training_stock_data[stock] = df[:3312]
    testing_stock_data[stock] = df[3312:-1]

training_stock_data['DASH'].head()
             Datetime        Open        High         Low       Close  Volume  \
0 2021-04-15 13:30:00  145.000000  152.449997  144.869995  151.964996  693400
1 2021-04-15 14:30:00  151.899994  154.880005  151.350006  152.500000  802458
2 2021-04-15 15:30:00  152.660004  153.270004  151.240005  153.139999  287421
3 2021-04-15 16:30:00  153.070007  154.169998  152.039993  153.884995  360681
4 2021-04-15 17:30:00  153.985001  154.500000  151.699997  152.520004  285774

   Dividends  Stock Splits  % Change in Close Price  Hour  Day  Day_of_Week  \
0        0.0           0.0                 0.003521     0   15            3
1        0.0           0.0                 0.004197     1   15            3
2        0.0           0.0                 0.004865     2   15            3
3        0.0           0.0                -0.008870     3   15            3
4        0.0           0.0                -0.010556     4   15            3

   Month
0      4
1      4
2      4
3      4
4      4
for stock in stocks_list:
    plt.figure(figsize=(9,5))

    df = training_stock_data[stock].copy()
    df2 = testing_stock_data[stock].copy()
    df = df.set_index('Datetime')
    df2 = df2.set_index('Datetime')

    plt.plot(df['Close'],label='Training Data')
    plt.plot(df2['Close'], label='Testing Data')
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock)
    plt.legend()
    plt.ylim(0)

plt.show()
Visualization 21
Visualization 22
Visualization 23
Visualization 24

We will prepare another one where we consolidated all the stocks selected into one. By analyzing all of them together, we might find a better pattern because we have the more data to validate the pattern or trend. On the other hand, it might give us worst performance because of how general the patterns that are identified and it may not be specifically applicable to the individual stock.

consolidated_training_stock_data = pd.DataFrame()

for stock in stocks_list:
    consolidated_training_stock_data = pd.concat([training_stock_data[stock], consolidated_training_stock_data],
                                                 ignore_index= True, sort = False)

Data Analysis

We will be using the training stock data and the consolidated stock data, to find some trends and patterns, and then test it to see its performance. Unlike 'buy and hold' and 'dollar cost averaging', for the return calculation for the following strategies purpose, we will not only be entering the market according to our analysis, but also exiting the market (selling our position, or selling the position we have) because of how we have changed processed the absolute price to percentage change, by the hour.

Best Hour Every Day

We will start of by finding what is the best hour to have a one hour trade. Then, we will test out what kind of performance are can we expect from using this strategy.

Individual Stock
average_return_hourly_individual = pd.DataFrame(columns=['Stock', 'Hour', '% Change in Close Price'])

for stock in stocks_list:

    print(stock)
    df = training_stock_data[stock].copy()

    min_hour = df['Hour'].min()
    max_hour = df['Hour'].max()

    for hour in range(min_hour,max_hour+1):

        temp = df.loc[(df['Hour'] == hour),'% Change in Close Price'].mean()
        average_return_hourly_individual.loc[len(average_return_hourly_individual.index)] = \
            [stock, hour-min_hour, temp]

        print(
            'The average return on the hour ' + str(hour) + '\t' +
            "{:.3f}".format(temp * 100) + '%'
        )

    print()

average_return_hourly_individual_top3 = average_return_hourly_individual.sort_values(['Stock','% Change in Close Price'],ascending=False)\
    .groupby('Stock').head(3)

average_return_hourly_individual_top3
PDD
The average return on the hour 0	-0.140%
The average return on the hour 1	0.010%
The average return on the hour 2	-0.073%
The average return on the hour 3	0.056%
The average return on the hour 4	0.040%
The average return on the hour 5	0.010%
The average return on the hour 6	0.166%
The average return on the hour 7	-0.063%

MDB
The average return on the hour 0	-0.102%
The average return on the hour 1	-0.064%
The average return on the hour 2	-0.037%
The average return on the hour 3	0.005%
The average return on the hour 4	0.100%
The average return on the hour 5	-0.011%
The average return on the hour 6	0.093%
The average return on the hour 7	0.064%

DASH
The average return on the hour 0	0.028%
The average return on the hour 1	-0.133%
The average return on the hour 2	-0.078%
The average return on the hour 3	-0.028%
The average return on the hour 4	0.072%
The average return on the hour 5	0.009%
The average return on the hour 6	0.039%
The average return on the hour 7	0.023%

SPY
The average return on the hour 0	0.004%
The average return on the hour 1	-0.015%
The average return on the hour 2	-0.004%
The average return on the hour 3	0.003%
The average return on the hour 4	0.028%
The average return on the hour 5	-0.001%
The average return on the hour 6	-0.003%
The average return on the hour 7	-0.027%
   Stock  Hour  % Change in Close Price
28   SPY     4                 0.000279
24   SPY     0                 0.000039
27   SPY     3                 0.000031
6    PDD     6                 0.001662
3    PDD     3                 0.000559
4    PDD     4                 0.000396
12   MDB     4                 0.001002
14   MDB     6                 0.000925
15   MDB     7                 0.000643
20  DASH     4                 0.000725
22  DASH     6                 0.000394
16  DASH     0                 0.000285
Consolidated
average_return_hourly_consolidated = pd.DataFrame(columns=['Hour', '% Change in Close Price'])

df = consolidated_training_stock_data.copy()

min_hour = df['Hour'].min()
max_hour = df['Hour'].max()

for hour in range(min_hour,max_hour+1):

    temp = df.loc[(df['Hour'] == hour),'% Change in Close Price'].mean()
    average_return_hourly_consolidated.loc[len(average_return_hourly_consolidated.index)] = \
        [hour-min_hour, temp]

    print(
        'The average return on the hour ' + str(hour) + '\t' +
        "{:.3f}".format(temp*100) + '%'
    )

print()

average_return_hourly_consolidated.sort_values(['% Change in Close Price'],ascending=False)
The average return on the hour 0	-0.053%
The average return on the hour 1	-0.050%
The average return on the hour 2	-0.048%
The average return on the hour 3	0.009%
The average return on the hour 4	0.060%
The average return on the hour 5	0.002%
The average return on the hour 6	0.074%
The average return on the hour 7	-0.001%
   Hour  % Change in Close Price
6   6.0                 0.000738
4   4.0                 0.000600
3   3.0                 0.000088
5   5.0                 0.000017
7   7.0                -0.000009
2   2.0                -0.000481
1   1.0                -0.000504
0   0.0                -0.000525
plt.figure(figsize=(13,7))
for stock in stocks_list:

    temp = average_return_hourly_individual.loc[(average_return_hourly_individual['Stock'] == stock)].copy()
    plt.plot(temp['Hour'],temp['% Change in Close Price'],label=stock,alpha=0.6)
    plt.scatter(temp['Hour'],temp['% Change in Close Price'],alpha=0.6)

plt.plot(average_return_hourly_consolidated['Hour'],
         average_return_hourly_consolidated['% Change in Close Price'],
         label = 'Consolidated', linestyle = '--', linewidth = 5)

plt.legend()
plt.title("Average Hourly Return")
plt.xlabel('Hour')
plt.ylabel('% Change in Close Price')
plt.show()
Visualization 25

Based on our findings and results above, it seems that at hour 4 and 6, is where generally all of the stock from our small sample list, will have the biggest positive price movement. We will now proceed to use the testing set of data, to see what are the performances we are able to achieve.

The return column created here is to track, how much have we made or loss. Therefore, the column acts as a balance indicator, like in a bank account, and reflect the current return of a given row. Therefore, the final value in the column, should indicate the value we would want to see. Because the column is already in decimal, we can now just add 1 to the current percentage change in close price, and multiple by the previous result, which gives us the new current return at the given row.

General Best 3 Hours

Using the best three performing hours, derived from the consolidated list, this will be the strategy. We will purchase the stock according to the general best performing hours found with the consolidated list. Therefore, that would at hour 0, hour 4 and hour 6. The strategy is to buy in at the hour, and then sell the stock an hour later.

print("Using Training Data")
print("_____________________")

for stock in stocks_list:
    print(stock)
    df = training_stock_data[stock].copy()
    df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce', utc=True)
    temp = df[(df['Hour'] == 4 ) | (df['Hour'] == 6) | (df['Hour'] == 3)].copy()

    temp['Returns'] = 0
    return_col = temp.columns.get_loc('Returns')
    change_in_close_price_col = temp.columns.get_loc('% Change in Close Price')

    for i in range(0, len(temp)):

        if i == 0:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * 1
        else:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * temp.iloc[i-1,return_col]

    performances.loc[stock, 'General Best 3 Hours - Training'] = temp.iloc[-1,return_col] - 1
    print(temp.iloc[-1,return_col])

    plt.figure(figsize=(15,9))

    df = df.set_index('Datetime')
    temp = temp.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)

    for entry in temp.index:

        plt.scatter(entry, df.loc[entry, 'Close'],
            color = 'g' if temp.loc[entry, '% Change in Close Price'] >= 0 else 'r',
            s = 10
        )

    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Training')
    plt.show()

print("\n")
print("Using Testing Data")
print("_____________________")

for stock in stocks_list:
    print(stock)

    df = testing_stock_data[stock].copy()
    temp = df[(df['Hour'] == 4 ) | (df['Hour'] == 6) | (df['Hour'] == 3)].copy()

    temp['Returns'] = 0
    return_col = temp.columns.get_loc('Returns')
    change_in_close_price_col = temp.columns.get_loc('% Change in Close Price')

    for i in range(0, len(temp)):

        if i == 0:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * 1
        else:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * temp.iloc[i-1,return_col]

    performances.loc[stock, 'General Best 3 Hours - Testing'] = temp.iloc[-1,return_col] - 1
    print(temp.iloc[-1,return_col])

    plt.figure(figsize=(15,9))

    df = df.set_index('Datetime')
    temp = temp.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)

    for entry in temp.index:

        plt.scatter(entry, df.loc[entry, 'Close'],
            color = 'g' if temp.loc[entry, '% Change in Close Price'] >= 0 else 'r',
            s = 10
        )

    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Testing')
    plt.show()

performances
Using Training Data
_____________________
PDD
2.1043466734327567
Visualization 26
MDB
1.9001580234012414
Visualization 27
DASH
1.1354003896621916
Visualization 28
SPY
1.1216981139228874
Visualization 29


Using Testing Data
_____________________
PDD
0.8996901198051401
Visualization 30
MDB
1.3587372122741483
Visualization 31
DASH
1.4780411991174967
Visualization 32
SPY
1.1229863011156784
Visualization 33
              Buy and Hold  Dollar Cost Averaging (Daily)  \
Stock Ticker
PDD              -0.181758                       0.489227
MDB               0.224945                       0.254254
DASH             -0.091034                       0.552831
SPY               0.236128                       0.200805

              Dollar Cost Averaging (Weekly)  Dollar Cost Averaging (Monthly)  \
Stock Ticker
PDD                                 0.472529                         0.503826
MDB                                 0.253272                         0.242322
DASH                                0.556192                         0.554912
SPY                                 0.200736                         0.200701

              Dollar Cost Averaging (Hourly)  General Best 3 Hours - Training  \
Stock Ticker
PDD                                 0.490052                         1.104347
MDB                                 0.254119                         0.900158
DASH                                0.552687                         0.135400
SPY                                 0.200839                         0.121698

              General Best 3 Hours - Testing
Stock Ticker
PDD                                -0.100310
MDB                                 0.358737
DASH                                0.478041
SPY                                 0.122986
General Best Hour

Using the best performing hour, derived from the consolidated list, this will be the strategy. We will purchase the stock according to the general best performing hour found with the consolidated list. Therefore, that would at hour 0, hour 4 and hour 6. The strategy is to buy in at the hour, and then sell the stock an hour later.

print("Using Training Data")
print("_____________________")

for stock in stocks_list:
    print(stock)

    df = training_stock_data[stock].copy()
    temp = df[(df['Hour'] == 6)].copy()

    temp['Returns'] = 0
    return_col = temp.columns.get_loc('Returns')
    change_in_close_price_col = temp.columns.get_loc('% Change in Close Price')

    for i in range(0, len(temp)):

        if i == 0:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * 1
        else:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * temp.iloc[i-1,return_col]

    performances.loc[stock, 'General Best Hour - Training'] = temp.iloc[-1,return_col] - 1
    print(temp.iloc[-1,return_col])


    plt.figure(figsize=(15,9))

    df = df.set_index('Datetime')
    temp = temp.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)

    for entry in temp.index:

        plt.scatter(entry, df.loc[entry, 'Close'],
            color = 'g' if temp.loc[entry, '% Change in Close Price'] >= 0 else 'r',
            s = 10
        )

    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Training')
    plt.show()


print("\n")
print("Using Testing Data")
print("_____________________")

for stock in stocks_list:
    print(stock)

    df = testing_stock_data[stock].copy()
    temp = df[(df['Hour'] == 6)].copy()

    temp['Returns'] = 0
    return_col = temp.columns.get_loc('Returns')
    change_in_close_price_col = temp.columns.get_loc('% Change in Close Price')

    for i in range(0, len(temp)):

        if i == 0:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * 1
        else:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * temp.iloc[i-1,return_col]

    performances.loc[stock, 'General Best Hour - Testing'] = temp.iloc[-1,return_col] - 1
    print(temp.iloc[-1,return_col])

    plt.figure(figsize=(15,9))

    df = df.set_index('Datetime')
    temp = temp.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)

    for entry in temp.index:

        plt.scatter(entry, df.loc[entry, 'Close'],
            color = 'g' if temp.loc[entry, '% Change in Close Price'] >= 0 else 'r',
            s = 10
        )

    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Testing')
    plt.show()

performances
Using Training Data
_____________________
PDD
1.3897785667065108
Visualization 34
MDB
1.2144736634274178
Visualization 35
DASH
0.9756114959627441
Visualization 36
SPY
0.9742121489329175
Visualization 37


Using Testing Data
_____________________
PDD
0.962917971772168
Visualization 38
MDB
1.5430558320874295
Visualization 39
DASH
1.4687324315599015
Visualization 40
SPY
1.102683690578897
Visualization 41
              Buy and Hold  Dollar Cost Averaging (Daily)  \
Stock Ticker
PDD              -0.181758                       0.489227
MDB               0.224945                       0.254254
DASH             -0.091034                       0.552831
SPY               0.236128                       0.200805

              Dollar Cost Averaging (Weekly)  Dollar Cost Averaging (Monthly)  \
Stock Ticker
PDD                                 0.472529                         0.503826
MDB                                 0.253272                         0.242322
DASH                                0.556192                         0.554912
SPY                                 0.200736                         0.200701

              Dollar Cost Averaging (Hourly)  General Best 3 Hours - Training  \
Stock Ticker
PDD                                 0.490052                         1.104347
MDB                                 0.254119                         0.900158
DASH                                0.552687                         0.135400
SPY                                 0.200839                         0.121698

              General Best 3 Hours - Testing  General Best Hour - Training  \
Stock Ticker
PDD                                -0.100310                      0.389779
MDB                                 0.358737                      0.214474
DASH                                0.478041                     -0.024389
SPY                                 0.122986                     -0.025788

              General Best Hour - Testing
Stock Ticker
PDD                             -0.037082
MDB                              0.543056
DASH                             0.468732
SPY                              0.102684
Individual Best 3 Hours

Because we have also analyzed them individually, we will try to use the individual best hours and also see the results.

individual_best_hour = {
    'SPY': [4,0,3],
    'PDD': [6,3,4],
    'MDB': [4,6,7],
    'DASH': [4,6,0]
}

print("Using Training Data")
print("_____________________")

for stock in stocks_list:
    print(stock)

    df = training_stock_data[stock].copy()
    temp = pd.DataFrame(columns = df.columns)

    for hour_no, best_hour in enumerate(individual_best_hour[stock]):

        if hour_no == 0:
            temp = df[(df['Hour'] == best_hour)]
        else:
            temp = pd.concat([temp, df[df['Hour'] == best_hour]])

    temp['Returns'] = 0
    return_col = temp.columns.get_loc('Returns')
    change_in_close_price_col = temp.columns.get_loc('% Change in Close Price')

    for i in range(0, len(temp)):
        if i == 0:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * 1
        else:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * temp.iloc[i-1,return_col]

    performances.loc[stock, 'Individual Best 3 Hours - Training'] = temp.iloc[-1,return_col] - 1
    print(temp.iloc[-1,return_col])

    plt.figure(figsize=(15,9))

    df = df.set_index('Datetime')
    temp = temp.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)

    for entry in temp.index:

        plt.scatter(entry, df.loc[entry, 'Close'],
            color = 'g' if temp.loc[entry, '% Change in Close Price'] >= 0 else 'r',
            s = 10
        )

    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Training')
    plt.show()

print("\n")
print("Using Testing Data")
print("_____________________")

for stock in stocks_list:
    print(stock)

    df = testing_stock_data[stock].copy()
    temp = pd.DataFrame(columns = df.columns)

    for hour_no, best_hour in enumerate(individual_best_hour[stock]):

        if hour_no == 0:
            temp = df[(df['Hour'] == best_hour)]
        else:
            temp = pd.concat([temp, df[df['Hour'] == best_hour]])

    temp['Returns'] = 0
    return_col = temp.columns.get_loc('Returns')
    change_in_close_price_col = temp.columns.get_loc('% Change in Close Price')

    for i in range(0, len(temp)):
        if i == 0:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * 1
        else:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * temp.iloc[i-1,return_col]

    performances.loc[stock, 'Individual Best 3 Hours - Testing'] = temp.iloc[-1,return_col] - 1
    print(temp.iloc[-1,return_col])

    plt.figure(figsize=(15,9))

    df = df.set_index('Datetime')
    temp = temp.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)

    for entry in temp.index:

        plt.scatter(entry, df.loc[entry, 'Close'],
            color = 'g' if temp.loc[entry, '% Change in Close Price'] >= 0 else 'r',
            s = 10
        )

    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Testing')
    plt.show()

performances.T
Using Training Data
_____________________
PDD
2.104346673432768
Visualization 42
MDB
1.8456429990147925
Visualization 43
DASH
1.3813326734304454
Visualization 44
SPY
1.162690677935936
Visualization 45


Using Testing Data
_____________________
PDD
0.899690119805139
Visualization 46
MDB
1.5007797791650734
Visualization 47
DASH
1.6466367874031884
Visualization 48
SPY
1.050682044985406
Visualization 49
Stock Ticker                             PDD       MDB      DASH       SPY
Buy and Hold                       -0.181758  0.224945 -0.091034  0.236128
Dollar Cost Averaging (Daily)       0.489227  0.254254  0.552831  0.200805
Dollar Cost Averaging (Weekly)      0.472529  0.253272  0.556192  0.200736
Dollar Cost Averaging (Monthly)     0.503826  0.242322  0.554912  0.200701
Dollar Cost Averaging (Hourly)      0.490052  0.254119  0.552687  0.200839
General Best 3 Hours - Training     1.104347  0.900158  0.135400  0.121698
General Best 3 Hours - Testing     -0.100310  0.358737  0.478041  0.122986
General Best Hour - Training        0.389779  0.214474 -0.024389 -0.025788
General Best Hour - Testing        -0.037082  0.543056  0.468732  0.102684
Individual Best 3 Hours - Training  1.104347  0.845643  0.381333  0.162691
Individual Best 3 Hours - Testing  -0.100310  0.500780  0.646637  0.050682
Individual Best Hour

Now for the individual's best hours. The best hours are already arranged in the individual best hour. The first value in the list is the best hour.

print("Using Training Data")
print("_____________________")

for stock in stocks_list:
    print(stock)

    df = training_stock_data[stock].copy()
    temp = df[df['Hour']==individual_best_hour[stock][0]].copy()

    temp['Returns'] = 0
    return_col = temp.columns.get_loc('Returns')
    change_in_close_price_col = temp.columns.get_loc('% Change in Close Price')

    for i in range(0, len(temp)):
        if i == 0:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * 1
        else:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * temp.iloc[i-1,return_col]

    performances.loc[stock, 'Individual Best Hour - Training'] = temp.iloc[-1,return_col] - 1
    print(temp.iloc[-1,return_col])

    plt.figure(figsize=(15,9))

    df = df.set_index('Datetime')
    temp = temp.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)

    for entry in temp.index:

        plt.scatter(entry, df.loc[entry, 'Close'],
            color = 'g' if temp.loc[entry, '% Change in Close Price'] >= 0 else 'r',
            s = 10
        )

    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Training')
    plt.show()

print("\n")
print("Using Testing Data")
print("_____________________")

for stock in stocks_list:
    print(stock)

    df = testing_stock_data[stock].copy()
    temp = df[df['Hour']==individual_best_hour[stock][0]].copy()

    temp['Returns'] = 0
    return_col = temp.columns.get_loc('Returns')
    change_in_close_price_col = temp.columns.get_loc('% Change in Close Price')

    for i in range(0, len(temp)):
        if i == 0:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * 1
        else:
            temp.iloc[i, return_col] = (1 + temp.iloc[i,change_in_close_price_col]) * temp.iloc[i-1,return_col]

    performances.loc[stock, 'Individual Best Hour - Testing'] = temp.iloc[-1,return_col] - 1
    print(temp.iloc[-1,return_col])

    plt.figure(figsize=(15,9))

    df = df.set_index('Datetime')
    temp = temp.set_index('Datetime')

    plt.plot(df['Close'], alpha = 0.5)

    for entry in temp.index:

        plt.scatter(entry, df.loc[entry, 'Close'],
            color = 'g' if temp.loc[entry, '% Change in Close Price'] >= 0 else 'r',
            s = 10
        )

    plt.gcf().autofmt_xdate()
    plt.ylim(0)
    plt.xlabel('Datetime')
    plt.ylabel('Close Price')
    plt.title(stock + ' - Testing')
    plt.show()

performances.T
Using Training Data
_____________________
PDD
1.3897785667065108
Visualization 50
MDB
1.5631031977073437
Visualization 51
DASH
1.3655498460295028
Visualization 52
SPY
1.1370392017985178
Visualization 53


Using Testing Data
_____________________
PDD
0.962917971772168
Visualization 54
MDB
1.0033284217585219
Visualization 55
DASH
1.0128765761639726
Visualization 56
SPY
1.035842211343177
Visualization 57
Stock Ticker                             PDD       MDB      DASH       SPY
Buy and Hold                       -0.181758  0.224945 -0.091034  0.236128
Dollar Cost Averaging (Daily)       0.489227  0.254254  0.552831  0.200805
Dollar Cost Averaging (Weekly)      0.472529  0.253272  0.556192  0.200736
Dollar Cost Averaging (Monthly)     0.503826  0.242322  0.554912  0.200701
Dollar Cost Averaging (Hourly)      0.490052  0.254119  0.552687  0.200839
General Best 3 Hours - Training     1.104347  0.900158  0.135400  0.121698
General Best 3 Hours - Testing     -0.100310  0.358737  0.478041  0.122986
General Best Hour - Training        0.389779  0.214474 -0.024389 -0.025788
General Best Hour - Testing        -0.037082  0.543056  0.468732  0.102684
Individual Best 3 Hours - Training  1.104347  0.845643  0.381333  0.162691
Individual Best 3 Hours - Testing  -0.100310  0.500780  0.646637  0.050682
Individual Best Hour - Training     0.389779  0.563103  0.365550  0.137039
Individual Best Hour - Testing     -0.037082  0.003328  0.012877  0.035842
Visualizing the Results
plt.subplots(4,1, figsize=(10,30))

for stock_no, stock in enumerate(stocks_list):

    plt.subplot(4,1,stock_no+1)

    bar_color = []
    alpha_list = []
    for i in range(5): bar_color.append('tab:blue')

    count = 0
    color_range = 1
    for i in range(5,len(performances.columns)):
        bar_color.append('C' + str(color_range))

        count += 1
        if count == 2:
            count = 0
            color_range += 1

    plt.bar(performances.columns, performances.loc[stock,:], color = bar_color)

    plt.title('Strategy Performances ' + stock)
    plt.xticks(rotation=45, ha='right')

plt.tight_layout()
plt.show()
Visualization 58
Conclusion

We collected the results and we have noted the following:

PDD

While it does seem that all the proposed strategy was not able beat dollar cost averaging, but it managed to outperform buy and hold. We may need more data to further verify this, but based on what is given, these strategies proposed, has able to beat the 'buy and hold' strategy. DCA for this dataset is so dominant because of the price movement throughout our dataset. Based on this graph, DCA will definitely be in an advantageous position as we are entering the market bit by bit, and this dataset is also the DCA strategy's best case scenario. Because DCA can perform so much better than the proposed strategies, DCA is the way to go for this stock.

MDB

All of the strategy proposed for this stock has outperformed the baseline strategies, except for the individual best hour. Ironic that the best performing hour for the individual stock in the training set, is the worst performing one in the testing set. It could be because it so happens to be on a day or hour which had a big drop in price movement. The added effort in analyzing and finding the pattern might be worth it for this stock.

DASH

We can see that DCA strategy once again is performing really well as the DASH chart is also an ideal situation for the DCA strategy. Only the individual best 3 hours outperform all the baseline strategies, with the General Best 3 Hour, and Individual best hour being close to DCA. This makes the extra effort not as worth it, when DCA can perform just as well.

SPY

Because of the steadier nature of this stock, it is harder for DCA to have an advantage over 'buy and hold'. It is worth even less for us to be putting the extra effort in spotting any patterns to take advantage of, according to the strategies that has been propose, because all of them did not perform as close as the baseline strategies proposed.

At the moment, the baseline strategies still seems attractive due to how simple and easy it is to execute, and the proposed strategy, while in some cases outperformed the baseline strategies, but not reliably. With how simple either baseline strategies are, based on what we have observed and tested, if investors would be willing to take the risks by trying to select other stocks, with a little bit of effort and some number of stock pick, just by using DCA, it is likely for them to be able to make more returns than 'buy and hold'.

Optimal Time to Hold

The above strategies, we have based our decision on a pre-determine choice that we will be only entering the market for only 1 hour, and have managed to have some interesting result. Now we will increase our data into a larger size dataset by creating a very large 'what if' table. Assuming we want to hold a trade for the maximum of 1 year, as that would be our the limit of our risk appetite, we will then create rows of percentage change in price. We will try to just use basic analytics techniques to see if we can identify any trends and pattern that we can take advantage of

The following code ran for 4 hour. The initial idea for the follow code is find out is there an optimal time to hold each of the stock. Given a limit of one year, we tried to create a dataset which consisted of the current datetime, at the current hour, match with all the subsequent datetime until the next year. And we do it for all of them.

## This was used to generate the dataset.


#for stock in stocks_list:
#    print(stock)
#
#    df = pd.concat([training_stock_data[stock], testing_stock_data[stock]])
#
#    df = df.drop(['Open', 'High', 'Low', 'Volume', 'Dividends', 'Stock Splits', '% Change in Close Price'], axis = 1)
#
#    buy_columns = 'Buy - ' + df.columns
#    sell_columns = 'Sell - ' + df.columns
#    combined_columns = buy_columns.append(sell_columns)
#
#    temp_list = []
#
#    for i in range(0,len(df)):
#
#        buy_list = list(df.iloc[i,:])
#        #print(buy_list)
#        for j in range(0, len(df)):
#
#            sell_list = list(df.iloc[j,:])
#            temp = buy_list + sell_list
#
#            time_diff = (df.iloc[j, df.columns.get_loc('Datetime')] - df.iloc[i, df.columns.get_loc('Datetime')]).days
#
#            if time_diff <= 365: temp_list.append(temp)
#
#    new_df = pd.DataFrame(temp_list, columns=combined_columns)
#    new_df.to_csv("stock_data" + "\\" + stock + " - Possible Investment Durations")

These generated data could be accessed via this link. https://www.dropbox.com/scl/fo/may2j9la0d17wtjifpdi2/h?rlkey=2luvw3aqf6t6cdc8bau7pm3vw&dl=0

new_stocks_data = {}
stock_ticker = "DASH"
new_stocks_data[stock_ticker] = pd.read_csv("stock_data" + "\\" + stock_ticker + " - Possible Investment Durations")
new_stocks_data[stock_ticker]
          Unnamed: 0       Buy - Datetime  Buy - Close  Buy - Hour  Buy - Day  \
0                  0  2021-04-15 13:30:00   151.964996           0         15
1                  1  2021-04-15 13:30:00   151.964996           0         15
2                  2  2021-04-15 13:30:00   151.964996           0         15
3                  3  2021-04-15 13:30:00   151.964996           0         15
4                  4  2021-04-15 13:30:00   151.964996           0         15
...              ...                  ...          ...         ...        ...
20379689    20379689  2024-03-08 19:30:00   131.300003           6          8
20379690    20379690  2024-03-08 19:30:00   131.300003           6          8
20379691    20379691  2024-03-08 19:30:00   131.300003           6          8
20379692    20379692  2024-03-08 19:30:00   131.300003           6          8
20379693    20379693  2024-03-08 19:30:00   131.300003           6          8

          Buy - Day_of_Week  Buy - Month      Sell - Datetime  Sell - Close  \
0                         3            4  2021-04-15 13:30:00    151.964996
1                         3            4  2021-04-15 14:30:00    152.500000
2                         3            4  2021-04-15 15:30:00    153.139999
3                         3            4  2021-04-15 16:30:00    153.884995
4                         3            4  2021-04-15 17:30:00    152.520004
...                     ...          ...                  ...           ...
20379689                  4            3  2024-03-08 15:30:00    131.804993
20379690                  4            3  2024-03-08 16:30:00    132.149994
20379691                  4            3  2024-03-08 17:30:00    129.649994
20379692                  4            3  2024-03-08 18:30:00    130.639999
20379693                  4            3  2024-03-08 19:30:00    131.300003

          Sell - Hour  Sell - Day  Sell - Day_of_Week  Sell - Month
0                   0          15                   3             4
1                   1          15                   3             4
2                   2          15                   3             4
3                   3          15                   3             4
4                   4          15                   3             4
...               ...         ...                 ...           ...
20379689            2           8                   4             3
20379690            3           8                   4             3
20379691            4           8                   4             3
20379692            5           8                   4             3
20379693            6           8                   4             3

[20379694 rows x 13 columns]
stock_ticker = "PDD"
new_stocks_data[stock_ticker] = pd.read_csv("stock_data" + "\\" + stock_ticker + " - Possible Investment Durations")
new_stocks_data[stock_ticker]
          Unnamed: 0       Buy - Datetime  Buy - Close  Buy - Hour  Buy - Day  \
0                  0  2021-04-15 13:30:00   132.725006           0         15
1                  1  2021-04-15 13:30:00   132.725006           0         15
2                  2  2021-04-15 13:30:00   132.725006           0         15
3                  3  2021-04-15 13:30:00   132.725006           0         15
4                  4  2021-04-15 13:30:00   132.725006           0         15
...              ...                  ...          ...         ...        ...
20379689    20379689  2024-03-08 19:30:00   110.235001           6          8
20379690    20379690  2024-03-08 19:30:00   110.235001           6          8
20379691    20379691  2024-03-08 19:30:00   110.235001           6          8
20379692    20379692  2024-03-08 19:30:00   110.235001           6          8
20379693    20379693  2024-03-08 19:30:00   110.235001           6          8

          Buy - Day_of_Week  Buy - Month      Sell - Datetime  Sell - Close  \
0                         3            4  2021-04-15 13:30:00    132.725006
1                         3            4  2021-04-15 14:30:00    131.229904
2                         3            4  2021-04-15 15:30:00    131.320007
3                         3            4  2021-04-15 16:30:00    131.639999
4                         3            4  2021-04-15 17:30:00    131.929993
...                     ...          ...                  ...           ...
20379689                  4            3  2024-03-08 15:30:00    110.879997
20379690                  4            3  2024-03-08 16:30:00    110.160004
20379691                  4            3  2024-03-08 17:30:00    111.099998
20379692                  4            3  2024-03-08 18:30:00    110.910004
20379693                  4            3  2024-03-08 19:30:00    110.235001

          Sell - Hour  Sell - Day  Sell - Day_of_Week  Sell - Month
0                   0          15                   3             4
1                   1          15                   3             4
2                   2          15                   3             4
3                   3          15                   3             4
4                   4          15                   3             4
...               ...         ...                 ...           ...
20379689            2           8                   4             3
20379690            3           8                   4             3
20379691            4           8                   4             3
20379692            5           8                   4             3
20379693            6           8                   4             3

[20379694 rows x 13 columns]
stock_ticker = "SPY"
new_stocks_data[stock_ticker] = pd.read_csv("stock_data" + "\\" + stock_ticker + " - Possible Investment Durations")
new_stocks_data[stock_ticker]
          Unnamed: 0       Buy - Datetime  Buy - Close  Buy - Capital Gains  \
0                  0  2021-04-15 13:30:00   414.559998                  0.0
1                  1  2021-04-15 13:30:00   414.559998                  0.0
2                  2  2021-04-15 13:30:00   414.559998                  0.0
3                  3  2021-04-15 13:30:00   414.559998                  0.0
4                  4  2021-04-15 13:30:00   414.559998                  0.0
...              ...                  ...          ...                  ...
20362483    20362483  2024-03-08 19:30:00   511.929993                  0.0
20362484    20362484  2024-03-08 19:30:00   511.929993                  0.0
20362485    20362485  2024-03-08 19:30:00   511.929993                  0.0
20362486    20362486  2024-03-08 19:30:00   511.929993                  0.0
20362487    20362487  2024-03-08 19:30:00   511.929993                  0.0

          Buy - Hour  Buy - Day  Buy - Day_of_Week  Buy - Month  \
0                  0         15                  3            4
1                  0         15                  3            4
2                  0         15                  3            4
3                  0         15                  3            4
4                  0         15                  3            4
...              ...        ...                ...          ...
20362483           6          8                  4            3
20362484           6          8                  4            3
20362485           6          8                  4            3
20362486           6          8                  4            3
20362487           6          8                  4            3

              Sell - Datetime  Sell - Close  Sell - Capital Gains  \
0         2021-04-15 13:30:00    414.559998                   0.0
1         2021-04-15 14:30:00    414.645691                   0.0
2         2021-04-15 15:30:00    415.170013                   0.0
3         2021-04-15 16:30:00    415.290009                   0.0
4         2021-04-15 17:30:00    415.609985                   0.0
...                       ...           ...                   ...
20362483  2024-03-08 15:30:00    514.760010                   0.0
20362484  2024-03-08 16:30:00    514.630005                   0.0
20362485  2024-03-08 17:30:00    512.330017                   0.0
20362486  2024-03-08 18:30:00    513.460022                   0.0
20362487  2024-03-08 19:30:00    511.929993                   0.0

          Sell - Hour  Sell - Day  Sell - Day_of_Week  Sell - Month
0                   0          15                   3             4
1                   1          15                   3             4
2                   2          15                   3             4
3                   3          15                   3             4
4                   4          15                   3             4
...               ...         ...                 ...           ...
20362483            2           8                   4             3
20362484            3           8                   4             3
20362485            4           8                   4             3
20362486            5           8                   4             3
20362487            6           8                   4             3

[20362488 rows x 15 columns]

stock_ticker = "MDB"
new_stocks_data[stock_ticker] = pd.read_csv("stock_data" + "\\" + stock_ticker + " - Possible Investment Durations")
new_stocks_data[stock_ticker]
          Unnamed: 0       Buy - Datetime  Buy - Close  Buy - Hour  Buy - Day  \
0                  0  2021-04-15 13:30:00   311.209991           0         15
1                  1  2021-04-15 13:30:00   311.209991           0         15
2                  2  2021-04-15 13:30:00   311.209991           0         15
3                  3  2021-04-15 13:30:00   311.209991           0         15
4                  4  2021-04-15 13:30:00   311.209991           0         15
...              ...                  ...          ...         ...        ...
20379689    20379689  2024-03-08 19:30:00   385.424988           6          8
20379690    20379690  2024-03-08 19:30:00   385.424988           6          8
20379691    20379691  2024-03-08 19:30:00   385.424988           6          8
20379692    20379692  2024-03-08 19:30:00   385.424988           6          8
20379693    20379693  2024-03-08 19:30:00   385.424988           6          8

          Buy - Day_of_Week  Buy - Month      Sell - Datetime  Sell - Close  \
0                         3            4  2021-04-15 13:30:00    311.209991
1                         3            4  2021-04-15 14:30:00    312.670013
2                         3            4  2021-04-15 15:30:00    318.079987
3                         3            4  2021-04-15 16:30:00    317.605011
4                         3            4  2021-04-15 17:30:00    319.959991
...                     ...          ...                  ...           ...
20379689                  4            3  2024-03-08 15:30:00    381.812592
20379690                  4            3  2024-03-08 16:30:00    386.524994
20379691                  4            3  2024-03-08 17:30:00    384.000000
20379692                  4            3  2024-03-08 18:30:00    384.359985
20379693                  4            3  2024-03-08 19:30:00    385.424988

          Sell - Hour  Sell - Day  Sell - Day_of_Week  Sell - Month
0                   0          15                   3             4
1                   1          15                   3             4
2                   2          15                   3             4
3                   3          15                   3             4
4                   4          15                   3             4
...               ...         ...                 ...           ...
20379689            2           8                   4             3
20379690            3           8                   4             3
20379691            4           8                   4             3
20379692            5           8                   4             3
20379693            6           8                   4             3

[20379694 rows x 13 columns]

After creating the large dataset, we will be removing some of the data that we will not be using neither do they make any sense. We will be pre-processing this data.

We will start to create a column that indicates the duration of holding the stock.
Then we will filter those that are equal and below 0, as that indicates purchasing the same datetime itself.
Then, we will need a column to indicate the percentage change in price.

Once we have this, we can explore to see if there are trends that we can capitalize on. Similarly, we will need to split it into training and testing once again and then put it to the test again. Because the baseline for the 'buy and hold', had a performance of +50% PDD, +25% MDB, +55% DASH, and +23% SPY. Therefore, to beat the baseline, we will need to aim to do better.

In the following code, we will prepare a list of expected return, from 3% to 75% return. Then we try to store the average duration for a entry to make that amount of profit, and also further store the most occurring buying hour and day, and also the most occurring selling hour and day. While the time is limited, we did not check and test these listed properties to see if it can beat the baseline. But the general trend noted from tht below findings is that the longer one holds the entry or investment, the more likely it is to have a higher profit.

training_period = training_stock_data['PDD']['Datetime']

return_list = [0.03, 0.05, 0.1, 0.3, 0.5, 0.65, 0.75]
optimal_strategy = pd.DataFrame(columns = ['Stock', 'Expected Return', 'Average Duration',
                                           'Buy - Hour','Buy - Day', 'Buy - Day of Week',
                                           'Sell - Hour','Sell - Day', 'Sell - Day of Week'])

for stock in stocks_list:
    print(stock)

    df = new_stocks_data[stock].copy()

    # Calculating the Duration
    # Convert the datetime from string to datetime
    df['Sell - Datetime'] = pd.to_datetime(df['Sell - Datetime'], errors='coerce', utc=True)
    df['Buy - Datetime'] = pd.to_datetime(df['Buy - Datetime'], errors='coerce', utc=True)
    df['Sell - Datetime'] = df['Sell - Datetime'].dt.tz_localize(None)
    df['Buy - Datetime'] = df['Buy - Datetime'].dt.tz_localize(None)

    # Get the duration of holding the stock
    df['Duration (Hours)'] = (df['Sell - Datetime'] - df['Buy - Datetime']) / pd.Timedelta(hours = 1)

    # Removing those that have duration of equal or less than 0
    df = df[(df['Duration (Hours)'] > 0)]

    # Calculating the percentage change in price
    df['% Change in Price'] = (df['Sell - Close'] - df['Buy - Close']) / df['Buy - Close']

    # Filter those that are in the training set
    df_train = df[(df['Sell - Datetime'] <= training_period.tail(1).item())].copy()

    # Loop expected return
    for expected_return in return_list:

        # Store the most occurring buying day/hour and selling day/hour.
        temp_df = df_train[df_train['% Change in Price'] >= expected_return].copy()
        average_duration = temp_df['Duration (Hours)'].mean()
        mode_buy_hour = temp_df['Buy - Hour'].mode()
        mode_buy_day = temp_df['Buy - Day'].mode()
        mode_buy_day_of_week = temp_df['Buy - Day_of_Week'].mode()
        mode_sell_hour = temp_df['Sell - Hour'].mode()
        mode_sell_day = temp_df['Sell - Day'].mode()
        mode_sell_day_of_week = temp_df['Sell - Day_of_Week'].mode()

        # Store them into the table
        optimal_strategy.loc[len(optimal_strategy.index)] = [stock, expected_return, average_duration,
                                          mode_buy_hour, mode_buy_day, mode_buy_day_of_week,
                                          mode_sell_hour, mode_sell_day, mode_sell_day_of_week]

optimal_strategy
PDD
MDB
DASH
SPY
   Stock  Expected Return  Average Duration  \
0    PDD             0.03       3757.000516
1    PDD             0.05       3826.572919
2    PDD             0.10       3978.323944
3    PDD             0.30       4417.653129
4    PDD             0.50       4904.750649
5    PDD             0.65       5254.390759
6    PDD             0.75       5557.023504
7    MDB             0.03       3033.286414
8    MDB             0.05       3093.602709
9    MDB             0.10       3235.473729
10   MDB             0.30       3649.104038
11   MDB             0.50       4229.355688
12   MDB             0.65       4407.726249
13   MDB             0.75       4388.833449
14  DASH             0.03       2010.721397
15  DASH             0.05       2063.361792
16  DASH             0.10       2197.194164
17  DASH             0.30       2954.526927
18  DASH             0.50       3440.912500
19  DASH             0.65       3779.710158
20  DASH             0.75       3865.734914
21   SPY             0.03       3321.667364
22   SPY             0.05       3589.225845
23   SPY             0.10       4283.918912
24   SPY             0.30               NaN
25   SPY             0.50               NaN
26   SPY             0.65               NaN
27   SPY             0.75               NaN

                                    Buy - Hour  \
0        0    3
Name: Buy - Hour, dtype: int64
1        0    3
Name: Buy - Hour, dtype: int64
2        0    3
Name: Buy - Hour, dtype: int64
3        0    3
Name: Buy - Hour, dtype: int64
4        0    4
Name: Buy - Hour, dtype: int64
5        0    4
Name: Buy - Hour, dtype: int64
6        0    6
Name: Buy - Hour, dtype: int64
7        0    3
Name: Buy - Hour, dtype: int64
8        0    3
Name: Buy - Hour, dtype: int64
9        0    3
Name: Buy - Hour, dtype: int64
10       0    1
Name: Buy - Hour, dtype: int64
11       0    4
Name: Buy - Hour, dtype: int64
12       0    4
Name: Buy - Hour, dtype: int64
13       0    1
Name: Buy - Hour, dtype: int64
14       0    3
Name: Buy - Hour, dtype: int64
15       0    5
Name: Buy - Hour, dtype: int64
16       0    3
Name: Buy - Hour, dtype: int64
17       0    5
Name: Buy - Hour, dtype: int64
18       0    4
Name: Buy - Hour, dtype: int64
19       0    4
Name: Buy - Hour, dtype: int64
20       0    4
Name: Buy - Hour, dtype: int64
21       0    3
Name: Buy - Hour, dtype: int64
22       0    3
Name: Buy - Hour, dtype: int64
23       0    0
Name: Buy - Hour, dtype: int64
24  Series([], Name: Buy - Hour, dtype: int64)
25  Series([], Name: Buy - Hour, dtype: int64)
26  Series([], Name: Buy - Hour, dtype: int64)
27  Series([], Name: Buy - Hour, dtype: int64)

                                    Buy - Day  \
0       0    28
Name: Buy - Day, dtype: int64
1       0    28
Name: Buy - Day, dtype: int64
2       0    28
Name: Buy - Day, dtype: int64
3       0    11
Name: Buy - Day, dtype: int64
4       0    11
Name: Buy - Day, dtype: int64
5       0    11
Name: Buy - Day, dtype: int64
6       0    11
Name: Buy - Day, dtype: int64
7       0    28
Name: Buy - Day, dtype: int64
8       0    28
Name: Buy - Day, dtype: int64
9       0    28
Name: Buy - Day, dtype: int64
10      0    19
Name: Buy - Day, dtype: int64
11      0    19
Name: Buy - Day, dtype: int64
12       0    3
Name: Buy - Day, dtype: int64
13       0    3
Name: Buy - Day, dtype: int64
14       0    7
Name: Buy - Day, dtype: int64
15       0    3
Name: Buy - Day, dtype: int64
16       0    3
Name: Buy - Day, dtype: int64
17       0    7
Name: Buy - Day, dtype: int64
18      0    13
Name: Buy - Day, dtype: int64
19      0    13
Name: Buy - Day, dtype: int64
20      0    13
Name: Buy - Day, dtype: int64
21      0    21
Name: Buy - Day, dtype: int64
22      0    21
Name: Buy - Day, dtype: int64
23      0    19
Name: Buy - Day, dtype: int64
24  Series([], Name: Buy - Day, dtype: int64)
25  Series([], Name: Buy - Day, dtype: int64)
26  Series([], Name: Buy - Day, dtype: int64)
27  Series([], Name: Buy - Day, dtype: int64)

                                    Buy - Day of Week  \
0        0    3
Name: Buy - Day_of_Week, dtype: int64
1        0    3
Name: Buy - Day_of_Week, dtype: int64
2        0    3
Name: Buy - Day_of_Week, dtype: int64
3        0    3
Name: Buy - Day_of_Week, dtype: int64
4        0    3
Name: Buy - Day_of_Week, dtype: int64
5        0    3
Name: Buy - Day_of_Week, dtype: int64
6        0    1
Name: Buy - Day_of_Week, dtype: int64
7        0    3
Name: Buy - Day_of_Week, dtype: int64
8        0    3
Name: Buy - Day_of_Week, dtype: int64
9        0    3
Name: Buy - Day_of_Week, dtype: int64
10       0    1
Name: Buy - Day_of_Week, dtype: int64
11       0    3
Name: Buy - Day_of_Week, dtype: int64
12       0    3
Name: Buy - Day_of_Week, dtype: int64
13       0    3
Name: Buy - Day_of_Week, dtype: int64
14       0    3
Name: Buy - Day_of_Week, dtype: int64
15       0    3
Name: Buy - Day_of_Week, dtype: int64
16       0    3
Name: Buy - Day_of_Week, dtype: int64
17       0    3
Name: Buy - Day_of_Week, dtype: int64
18       0    3
Name: Buy - Day_of_Week, dtype: int64
19       0    3
Name: Buy - Day_of_Week, dtype: int64
20       0    3
Name: Buy - Day_of_Week, dtype: int64
21       0    3
Name: Buy - Day_of_Week, dtype: int64
22       0    3
Name: Buy - Day_of_Week, dtype: int64
23       0    3
Name: Buy - Day_of_Week, dtype: int64
24  Series([], Name: Buy - Day_of_Week, dtype: int64)
25  Series([], Name: Buy - Day_of_Week, dtype: int64)
26  Series([], Name: Buy - Day_of_Week, dtype: int64)
27  Series([], Name: Buy - Day_of_Week, dtype: int64)

                                      Sell - Hour  \
0          0    2
Name: Sell - Hour, dtype: int64
1          0    2
Name: Sell - Hour, dtype: int64
2          0    2
Name: Sell - Hour, dtype: int64
3          0    1
Name: Sell - Hour, dtype: int64
4          0    1
Name: Sell - Hour, dtype: int64
5          0    1
Name: Sell - Hour, dtype: int64
6          0    1
Name: Sell - Hour, dtype: int64
7          0    1
Name: Sell - Hour, dtype: int64
8          0    1
Name: Sell - Hour, dtype: int64
9          0    1
Name: Sell - Hour, dtype: int64
10         0    1
Name: Sell - Hour, dtype: int64
11         0    1
Name: Sell - Hour, dtype: int64
12         0    1
Name: Sell - Hour, dtype: int64
13         0    1
Name: Sell - Hour, dtype: int64
14         0    1
Name: Sell - Hour, dtype: int64
15         0    1
Name: Sell - Hour, dtype: int64
16         0    1
Name: Sell - Hour, dtype: int64
17         0    2
Name: Sell - Hour, dtype: int64
18  0    1
1    2
Name: Sell - Hour, dtype: int64
19         0    1
Name: Sell - Hour, dtype: int64
20         0    2
Name: Sell - Hour, dtype: int64
21         0    1
Name: Sell - Hour, dtype: int64
22         0    2
Name: Sell - Hour, dtype: int64
23         0    6
Name: Sell - Hour, dtype: int64
24    Series([], Name: Sell - Hour, dtype: int64)
25    Series([], Name: Sell - Hour, dtype: int64)
26    Series([], Name: Sell - Hour, dtype: int64)
27    Series([], Name: Sell - Hour, dtype: int64)

                                    Sell - Day  \
0        0    1
Name: Sell - Day, dtype: int64
1        0    1
Name: Sell - Day, dtype: int64
2        0    1
Name: Sell - Day, dtype: int64
3        0    6
Name: Sell - Day, dtype: int64
4        0    6
Name: Sell - Day, dtype: int64
5        0    9
Name: Sell - Day, dtype: int64
6       0    13
Name: Sell - Day, dtype: int64
7       0    22
Name: Sell - Day, dtype: int64
8       0    22
Name: Sell - Day, dtype: int64
9        0    8
Name: Sell - Day, dtype: int64
10      0    22
Name: Sell - Day, dtype: int64
11      0    15
Name: Sell - Day, dtype: int64
12      0    22
Name: Sell - Day, dtype: int64
13      0    22
Name: Sell - Day, dtype: int64
14      0    15
Name: Sell - Day, dtype: int64
15      0    15
Name: Sell - Day, dtype: int64
16      0    15
Name: Sell - Day, dtype: int64
17      0    15
Name: Sell - Day, dtype: int64
18      0    15
Name: Sell - Day, dtype: int64
19      0    15
Name: Sell - Day, dtype: int64
20      0    15
Name: Sell - Day, dtype: int64
21       0    1
Name: Sell - Day, dtype: int64
22      0    10
Name: Sell - Day, dtype: int64
23      0    10
Name: Sell - Day, dtype: int64
24  Series([], Name: Sell - Day, dtype: int64)
25  Series([], Name: Sell - Day, dtype: int64)
26  Series([], Name: Sell - Day, dtype: int64)
27  Series([], Name: Sell - Day, dtype: int64)

                                   Sell - Day of Week
0       0    1
Name: Sell - Day_of_Week, dtype: int64
1       0    1
Name: Sell - Day_of_Week, dtype: int64
2       0    1
Name: Sell - Day_of_Week, dtype: int64
3       0    1
Name: Sell - Day_of_Week, dtype: int64
4       0    3
Name: Sell - Day_of_Week, dtype: int64
5       0    3
Name: Sell - Day_of_Week, dtype: int64
6       0    3
Name: Sell - Day_of_Week, dtype: int64
7       0    2
Name: Sell - Day_of_Week, dtype: int64
8       0    2
Name: Sell - Day_of_Week, dtype: int64
9       0    2
Name: Sell - Day_of_Week, dtype: int64
10      0    2
Name: Sell - Day_of_Week, dtype: int64
11      0    1
Name: Sell - Day_of_Week, dtype: int64
12      0    1
Name: Sell - Day_of_Week, dtype: int64
13      0    3
Name: Sell - Day_of_Week, dtype: int64
14      0    2
Name: Sell - Day_of_Week, dtype: int64
15      0    2
Name: Sell - Day_of_Week, dtype: int64
16      0    3
Name: Sell - Day_of_Week, dtype: int64
17      0    3
Name: Sell - Day_of_Week, dtype: int64
18      0    4
Name: Sell - Day_of_Week, dtype: int64
19      0    4
Name: Sell - Day_of_Week, dtype: int64
20      0    4
Name: Sell - Day_of_Week, dtype: int64
21      0    2
Name: Sell - Day_of_Week, dtype: int64
22      0    2
Name: Sell - Day_of_Week, dtype: int64
23      0    2
Name: Sell - Day_of_Week, dtype: int64
24  Series([], Name: Sell - Day_of_Week, dtype: in...
25  Series([], Name: Sell - Day_of_Week, dtype: in...
26  Series([], Name: Sell - Day_of_Week, dtype: in...
27  Series([], Name: Sell - Day_of_Week, dtype: in...

By making some small adjustment to the return expectation, in the code below, it consists of 3% to 5%, we can now have a different sets of outcome we can test.

training_period = training_stock_data['PDD']['Datetime']

optimal_strategy = pd.DataFrame(columns = ['Stock', 'Expected Return', 'Average Duration',
                                           'Buy - Hour','Buy - Day', 'Buy - Day of Week',
                                           'Sell - Hour','Sell - Day', 'Sell - Day of Week'])

for stock in stocks_list:
    print(stock)

    df = new_stocks_data[stock].copy()

    # Calculating the Duration
    # Convert the datetime from string to datetime
    df['Sell - Datetime'] = pd.to_datetime(df['Sell - Datetime'], errors='coerce', utc=True)
    df['Buy - Datetime'] = pd.to_datetime(df['Buy - Datetime'], errors='coerce', utc=True)
    df['Sell - Datetime'] = df['Sell - Datetime'].dt.tz_localize(None)
    df['Buy - Datetime'] = df['Buy - Datetime'].dt.tz_localize(None)

    # Get the duration of holding the stock
    df['Duration (Hours)'] = (df['Sell - Datetime'] - df['Buy - Datetime']) / pd.Timedelta(hours = 1)

    # Removing those that have duration of equal or less than 0
    df = df[(df['Duration (Hours)'] > 0)]

    # Calculating the percentage change in price
    df['% Change in Price'] = (df['Sell - Close'] - df['Buy - Close']) / df['Buy - Close']

    # Filter those that are in the training set
    df_train = df[(df['Sell - Datetime'] <= training_period.tail(1).item())].copy()

    # Store the most occurring buying day/hour and selling day/hour.
    temp_df = df_train[(df_train['% Change in Price'] >= 0.03) & (df_train['% Change in Price'] <= 0.05)].copy()
    average_duration = temp_df['Duration (Hours)'].mean()
    mode_buy_hour = temp_df['Buy - Hour'].mode()
    mode_buy_day = temp_df['Buy - Day'].mode()
    mode_buy_day_of_week = temp_df['Buy - Day_of_Week'].mode()
    mode_sell_hour = temp_df['Sell - Hour'].mode()
    mode_sell_day = temp_df['Sell - Day'].mode()
    mode_sell_day_of_week = temp_df['Sell - Day_of_Week'].mode()

    # Store them into the table
    optimal_strategy.loc[len(optimal_strategy.index)] = [stock, "3% - 5%", average_duration,
                                        mode_buy_hour, mode_buy_day, mode_buy_day_of_week,
                                        mode_sell_hour, mode_sell_day, mode_sell_day_of_week]

optimal_strategy
PDD
MDB
DASH
SPY
  Stock Expected Return  Average Duration  \
0   PDD         3% - 5%       2252.110473
1   MDB         3% - 5%       2077.244112
2  DASH         3% - 5%       1454.757564
3   SPY         3% - 5%       2779.621244

                              Buy - Hour  \
0  0    2
Name: Buy - Hour, dtype: int64
1  0    5
Name: Buy - Hour, dtype: int64
2  0    4
Name: Buy - Hour, dtype: int64
3  0    4
Name: Buy - Hour, dtype: int64

                                       Buy - Day  \
0  0    10
1    14
Name: Buy - Day, dtype: int64
1          0    21
Name: Buy - Day, dtype: int64
2          0    28
Name: Buy - Day, dtype: int64
3          0    21
Name: Buy - Day, dtype: int64

                              Buy - Day of Week  \
0  0    2
Name: Buy - Day_of_Week, dtype: int64
1  0    1
Name: Buy - Day_of_Week, dtype: int64
2  0    2
Name: Buy - Day_of_Week, dtype: int64
3  0    2
Name: Buy - Day_of_Week, dtype: int64

                              Sell - Hour  \
0  0    1
Name: Sell - Hour, dtype: int64
1  0    5
Name: Sell - Hour, dtype: int64
2  0    1
Name: Sell - Hour, dtype: int64
3  0    6
Name: Sell - Hour, dtype: int64

                               Sell - Day  \
0   0    7
Name: Sell - Day, dtype: int64
1  0    28
Name: Sell - Day, dtype: int64
2   0    9
Name: Sell - Day, dtype: int64
3   0    1
Name: Sell - Day, dtype: int64

                              Sell - Day of Week
0  0    1
Name: Sell - Day_of_Week, dtype: int64
1  0    4
Name: Sell - Day_of_Week, dtype: int64
2  0    1
Name: Sell - Day_of_Week, dtype: int64
3  0    2
Name: Sell - Day_of_Week, dtype: int64

But because we can see that the average duration is long, we can further tweak and add another conditions, such that we might want to only hold at the maximum of 2 weeks, which is 10 days for 8 hours, so 80 hours. Having these conditons to filter, we can then further explore with different methods to see if we can find recurring patterns or an edge in the market, but according to the investors' appetite.

training_period = training_stock_data['PDD']['Datetime']

optimal_strategy = pd.DataFrame(columns = ['Stock', 'Expected Return', 'Average Duration',
                                           'Buy - Hour','Buy - Day', 'Buy - Day of Week',
                                           'Sell - Hour','Sell - Day', 'Sell - Day of Week'])

for stock in stocks_list:
    print(stock)

    df = new_stocks_data[stock].copy()

    # Calculating the Duration
    # Convert the datetime from string to datetime
    df['Sell - Datetime'] = pd.to_datetime(df['Sell - Datetime'], errors='coerce', utc=True)
    df['Buy - Datetime'] = pd.to_datetime(df['Buy - Datetime'], errors='coerce', utc=True)
    df['Sell - Datetime'] = df['Sell - Datetime'].dt.tz_localize(None)
    df['Buy - Datetime'] = df['Buy - Datetime'].dt.tz_localize(None)

    # Get the duration of holding the stock
    df['Duration (Hours)'] = (df['Sell - Datetime'] - df['Buy - Datetime']) / pd.Timedelta(hours = 1)

    # Removing those that have duration of equal or less than 0
    df = df[(df['Duration (Hours)'] > 0)]

    # Calculating the percentage change in price
    df['% Change in Price'] = (df['Sell - Close'] - df['Buy - Close']) / df['Buy - Close']

    # Filter those that are in the training set
    df_train = df[(df['Sell - Datetime'] <= training_period.tail(1).item())].copy()

    # Store the most occurring buying day/hour and selling day/hour.
    temp_df = df_train[(df_train['% Change in Price'] >= 0.03) & (df_train['% Change in Price'] <= 0.05)].copy()
    temp_df = df_train[(df_train['Duration (Hours)'] <= 80)]
    average_duration = temp_df['Duration (Hours)'].mean()
    mode_buy_hour = temp_df['Buy - Hour'].mode()
    mode_buy_day = temp_df['Buy - Day'].mode()
    mode_buy_day_of_week = temp_df['Buy - Day_of_Week'].mode()
    mode_sell_hour = temp_df['Sell - Hour'].mode()
    mode_sell_day = temp_df['Sell - Day'].mode()
    mode_sell_day_of_week = temp_df['Sell - Day_of_Week'].mode()

    # Store them into the table
    optimal_strategy.loc[len(optimal_strategy.index)] = [stock, "3% - 5%", average_duration,
                                        mode_buy_hour, mode_buy_day, mode_buy_day_of_week,
                                        mode_sell_hour, mode_sell_day, mode_sell_day_of_week]

optimal_strategy
PDD
MDB
DASH
SPY
  Stock Expected Return  Average Duration  \
0   PDD         3% - 5%         37.257470
1   MDB         3% - 5%         37.257470
2  DASH         3% - 5%         37.257470
3   SPY         3% - 5%         37.247091

                              Buy - Hour  \
0  0    1
Name: Buy - Hour, dtype: int64
1  0    1
Name: Buy - Hour, dtype: int64
2  0    1
Name: Buy - Hour, dtype: int64
3  0    1
Name: Buy - Hour, dtype: int64

                               Buy - Day  \
0  0    28
Name: Buy - Day, dtype: int64
1  0    28
Name: Buy - Day, dtype: int64
2  0    28
Name: Buy - Day, dtype: int64
3  0    28
Name: Buy - Day, dtype: int64

                              Buy - Day of Week  \
0  0    1
Name: Buy - Day_of_Week, dtype: int64
1  0    1
Name: Buy - Day_of_Week, dtype: int64
2  0    1
Name: Buy - Day_of_Week, dtype: int64
3  0    1
Name: Buy - Day_of_Week, dtype: int64

                              Sell - Hour  \
0  0    6
Name: Sell - Hour, dtype: int64
1  0    6
Name: Sell - Hour, dtype: int64
2  0    6
Name: Sell - Hour, dtype: int64
3  0    6
Name: Sell - Hour, dtype: int64

                               Sell - Day  \
0  0    23
Name: Sell - Day, dtype: int64
1  0    23
Name: Sell - Day, dtype: int64
2  0    23
Name: Sell - Day, dtype: int64
3  0    23
Name: Sell - Day, dtype: int64

                              Sell - Day of Week
0  0    4
Name: Sell - Day_of_Week, dtype: int64
1  0    4
Name: Sell - Day_of_Week, dtype: int64
2  0    4
Name: Sell - Day_of_Week, dtype: int64
3  0    4
Name: Sell - Day_of_Week, dtype: int64

In conclusion, while we might not have tested the findings we have here due to time constraint, it is definitely an area that could be further looked into, and might need to adopt some form of pattern recognizer to automatically seek out formulas or trends might not be obvious to us, such as using Genetic Algorithm, Path Finding/Search Algorithms or Clustering Algorithms.

This new set of data that was created seems like a good way to also further preprocess into many other forms of data. For example, we could remove the datetime and leave the rest of the data. Then make the duration a target variable for a machine learning model to train and it might be able to guess how long we will need to hold the stock for a certain percentage of return, given the current set of datetime information. Similar to what is shown in the coursework 1 section, where we tried to look for recurring patterns but looking into past a fixed number of periods, what kind of property of the price movement can lead that might lead to the returns that an investor can expect.