EXPLORATORY DATA ANALYSIS (EDA) OF HISTORICAL BITCOIN DATA
OVERVIEW
This project involved collecting a year's worth of Bitcoin prices and performing an extensive EDA. I began with a data overview to understand the DataFrame's structure, followed by data cleaning to ensure accuracy. The analysis included a statistical summary to establish the data's properties and a detailed time series analysis of the price metrics. I also examined trading volumes, used histograms to analyse the distribution of each metric, conducted a correlation analysis to explore relationships between metrics, and looked for anomalies such as unusual patterns or outliers. Together, these steps provide a broad picture of Bitcoin's market behaviour over the period.
INTRODUCTION
Bitcoin is a digital currency, but to understand it fully, we must first understand what money is and the role banks play. Money, in its traditional form, is a medium of exchange, a unit of account, and a store of value. Banks play a crucial role in managing and regulating this money, acting as intermediaries for financial transactions, maintaining transaction records, and controlling the money supply through various policies.
Bitcoin emerged in 2009, created by an individual or group under the pseudonym Satoshi Nakamoto. It was born out of the 2008 financial crisis, aiming to create a decentralised financial system that operates independently of central banks and government control. Unlike traditional currencies, Bitcoin is entirely digital and uses a technology called blockchain. This technology records all transactions across a network of computers, making Bitcoin transparent yet secure and resistant to fraud.
Today, Bitcoin has gained substantial popularity and acceptance. A growing number of people use it for various purposes, including investment, online transactions, and as a hedge against inflation and currency devaluation. However, opinions on Bitcoin are polarised. Proponents praise its potential to democratise finance and provide financial services to the unbanked, while critics raise concerns over its volatility, environmental impact due to energy-intensive mining processes, and potential for use in illegal activities.
Despite these debates, Bitcoin continues to play a significant role in the evolving landscape of digital currencies and blockchain technology.
BITCOIN PRICE FLUCTUATION
Bitcoin's price fluctuations reflect its nature as a decentralised digital currency. Unlike traditional currencies, Bitcoin is not issued or managed by a central authority such as a government or central bank. With no central bank adjusting its supply or intervening to stabilise its value, its price is driven largely by market supply and demand.
Several key factors contribute to Bitcoin's price volatility:
1. Market Sentiment: News that portrays Bitcoin in a negative light often scares investors and pushes the price down, while positive news can attract new investors and drive the price up. Market sentiment is strongly influenced by media coverage, regulatory news, and endorsements or criticisms from influential figures.
2. Speculation: Many investors see Bitcoin as an investment opportunity rather than a currency. This speculation can lead to rapid buying and selling, causing significant price swings.
3. Adoption and Utility: As more companies and platforms accept Bitcoin as a payment method, its utility increases, positively impacting its value. Conversely, regulatory crackdowns or technological issues can negatively affect its adoption.
4. Supply and Demand: Bitcoin’s total supply is capped at 21 million coins. This limited supply, combined with increasing demand, especially from institutional investors, can drive prices up.
5. Market Manipulation: Due to the relatively small market size compared to traditional currencies, Bitcoin is susceptible to price manipulation by large holders, known as 'whales'.
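To see where the 21 million figure in point 4 comes from, the sketch below sums Bitcoin's block subsidies. It assumes the issuance schedule of the reference client, Bitcoin Core: a subsidy of 50 BTC per block at launch, counted in integer satoshis (1 BTC = 100,000,000 satoshis) and halved every 210,000 blocks. These parameters come from the protocol rather than from this analysis, and the sketch ignores coins that are lost or never claimed.

```python
SATOSHIS_PER_BTC = 100_000_000
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC  # block subsidy at launch, in satoshis
HALVING_INTERVAL = 210_000  # blocks between halvings


def total_supply_satoshis() -> int:
    """Sum the subsidy of every block until integer halving reduces it to zero."""
    total = 0
    subsidy = INITIAL_SUBSIDY
    while subsidy > 0:
        total += subsidy * HALVING_INTERVAL  # every block in this era pays `subsidy`
        subsidy //= 2  # floor division drops fractional satoshis
    return total


supply = total_supply_satoshis()
print(f"{supply:,} satoshis = {supply / SATOSHIS_PER_BTC:,.4f} BTC")
```

Under these assumptions the total comes to 2,099,999,997,690,000 satoshis, or about 20,999,999.9769 BTC. Because each halving truncates fractions of a satoshi, the cap falls just short of a round 21 million.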
Overall, Bitcoin’s price fluctuation is a complex interplay of these factors, making it a highly volatile asset. Investors often approach it with caution, aware of the potential for rapid changes in its value.
EXPLORATORY DATA ANALYSIS (EDA) OF HISTORICAL BITCOIN DATA GATHERED OVER A ONE-YEAR PERIOD
Exploratory Data Analysis (EDA) is a process of analysing datasets to summarise their main characteristics, often using visual methods. It helps uncover patterns, spot anomalies, and test hypotheses, providing a deeper understanding of data trends and relationships.
In the context of Bitcoin, EDA involves examining historical price trends, volume, and market behaviours to understand Bitcoin's past performance and market dynamics.
​
To explore the historical Bitcoin data, I first reviewed the DataFrame's structure and then cleaned it by checking for missing values and inconsistencies. A statistical summary provided insights into the data's properties. I then conducted a time series analysis of the Open, High, Low, Close, and Adjusted Close prices and examined trading volumes. Histograms helped analyse the distribution of these metrics, while scatter plots and correlation coefficients revealed the relationships between them. Finally, I looked for unusual patterns or outliers to detect anomalies.
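The review and cleaning steps can be sketched in pandas as follows. This is a minimal sketch, not the exact code used in this project: it assumes a Yahoo Finance-style table with `Date`, `Open`, `High`, `Low`, `Close`, `Adj Close`, and `Volume` columns, and the helper name `clean_prices` and the sample rows are hypothetical.

```python
import pandas as pd

# Assumed Yahoo Finance-style column names; adjust to match the actual file.
PRICE_COLUMNS = ["Open", "High", "Low", "Close", "Adj Close"]


def clean_prices(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a date-indexed copy without duplicate dates, gaps, or impossible bars."""
    df = raw.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    # Keep the first row for each date, then order chronologically.
    df = df.drop_duplicates(subset="Date").sort_values("Date").set_index("Date")
    # Drop rows missing any price or volume value.
    df = df.dropna(subset=PRICE_COLUMNS + ["Volume"])
    # A valid daily bar has High at or above Open/Close and Low at or below them.
    valid = (df["High"] >= df[["Open", "Close"]].max(axis=1)) & (
        df["Low"] <= df[["Open", "Close"]].min(axis=1)
    )
    return df[valid]


# Hypothetical sample rows; in practice the data would come from pd.read_csv(...).
raw = pd.DataFrame(
    {
        "Date": ["2023-01-02", "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
        "Open": [100.0, 98.0, 100.0, 104.0, 105.0],
        "High": [105.0, 101.0, 105.0, 106.0, 104.0],  # last row: High below Open
        "Low": [99.0, 97.0, 99.0, 103.0, 103.0],
        "Close": [104.0, 100.0, 104.0, 105.0, 103.5],
        "Adj Close": [104.0, 100.0, 104.0, 105.0, 103.5],
        "Volume": [1000.0, 1200.0, 1000.0, None, 900.0],  # 2023-01-03 is missing
    }
)
raw.info()  # structure overview: column dtypes and non-null counts
clean = clean_prices(raw)
print(clean)
```

The final filter is one example of an inconsistency check: a row whose High is below its Open or Close cannot describe a valid trading day.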
Statistical Summary
This involves examining the data's basic statistical measures, such as mean, median, and standard deviation, to gain an initial understanding of its characteristics.
The above data frame presents a statistical summary of the Bitcoin dataset, with descriptive statistics for the Open, High, Low, Close, Adjusted Close, and Volume columns over 366 recorded days. This summary provides insights into market behaviour, volatility, and trading activity over the period.
Key observations include:
- The mean opening price is approximately 25,590 units, and the closing prices track it closely, suggesting stable trading sessions.
- The standard deviation of the High prices is the largest among the price columns, suggesting greater variability in daily price peaks.
- The minimum values of Open, High, Low, and Close are very similar, hinting at a significant drop or a price floor that was tested on one or more days.
- The maximum values show a notable peak, especially in the High column, indicating one or more days with exceptionally high prices.
- Volume, a measure of how many units were traded, has a mean of roughly 182 million with substantial variability (a standard deviation of approximately 86 million), pointing to widely varying levels of trading activity.
- The median (50%) prices are close to the means, suggesting a relatively symmetric distribution of daily prices around the central tendency.
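A summary like the one above is what pandas' `DataFrame.describe()` produces. The sketch below, which uses a hypothetical helper `summarise` and made-up sample values, adds one extra row, the gap between mean and median, as a quick numeric check of the symmetry judged by eye in the last observation.

```python
import pandas as pd


def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """describe() output plus a 'mean - median' row for each numeric column."""
    summary = df.describe()  # count, mean, std, min, 25%, 50%, 75%, max
    # A gap that is large relative to std hints at skew; near zero fits symmetry.
    summary.loc["mean - median"] = summary.loc["mean"] - summary.loc["50%"]
    return summary


# Hypothetical values for illustration only (not the project's data).
demo = pd.DataFrame(
    {
        "Open": [100.0, 102.0, 101.0, 105.0, 140.0],
        "Close": [102.0, 101.0, 105.0, 104.0, 138.0],
        "Volume": [1.5e8, 1.8e8, 1.7e8, 3.9e8, 2.1e8],
    }
)
stats = summarise(demo)
print(stats.round(1))
```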
Time Series Analysis
Here, I plotted key metrics (Open, High, Low, Close, and Adjusted Close) over time to analyse trends and patterns.
The above graph is a multi-line time series chart that tracks the price of Bitcoin, with lines for the Open, High, Low, Close, and Adjusted Close values. This type of visualisation helps investors analyse market behaviour over time.
The lines follow similar trends, with the High values sitting slightly above the Open and Close, reflecting the price range within each day. The Close and Adjusted Close lines overlap almost exactly. This is expected for a cryptocurrency: adjusted prices account for corporate actions such as dividends and stock splits, which do not apply to Bitcoin. The graph shows a significant upward trend towards the end of the period, hinting at a bullish market phase.
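The chart's two main readings, the trend and the daily spread between High and Low, can also be quantified. This hypothetical sketch assumes Yahoo Finance-style column names (`High`, `Low`, `Close`, `Adj Close`) and an arbitrary rolling window; `trend_features` and `adj_close_gap` are illustrative helpers, not functions from the original analysis.

```python
import pandas as pd


def trend_features(df: pd.DataFrame, window: int = 30) -> pd.DataFrame:
    """Add a rolling-mean trend line and the intraday High-Low range."""
    out = df.copy()
    # The rolling mean smooths day-to-day noise so the longer-term trend stands out;
    # the first window-1 rows are NaN because a full window is required.
    out["Close MA"] = out["Close"].rolling(window=window, min_periods=window).mean()
    # How far the price moved within each day.
    out["Range"] = out["High"] - out["Low"]
    return out


def adj_close_gap(df: pd.DataFrame) -> float:
    """Largest absolute difference between Close and Adj Close (0 means identical)."""
    return float((df["Close"] - df["Adj Close"]).abs().max())


# Hypothetical five-day sample; a 3-day window keeps the example short.
dates = pd.date_range("2023-01-01", periods=5, freq="D")
demo = pd.DataFrame(
    {
        "High": [11.0, 12.0, 13.0, 14.0, 16.0],
        "Low": [9.0, 10.0, 11.0, 12.0, 13.0],
        "Close": [10.0, 11.0, 12.0, 13.0, 15.0],
        "Adj Close": [10.0, 11.0, 12.0, 13.0, 15.0],
    },
    index=dates,
)
features = trend_features(demo, window=3)
print(features)
print("Max Close/Adj Close gap:", adj_close_gap(demo))
```

Passing the price columns to `DataFrame.plot()` (which requires matplotlib) draws a multi-line chart of the kind described above.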
Volume Analysis
This task focuses on analysing the trading volume's variation over time, providing insights into market activity and investor behaviour.
The trading volume shows significant variability, with pronounced spikes that suggest periods of high trading activity. These peaks could be associated with specific market events, such as news releases, regulatory announcements, or changes in investor sentiment. The volume does not exhibit a uniform trend, but rather intermittent bursts of activity.
Between these spikes, there are troughs indicating lower trading volumes. This pattern of volatility in trading volume is not unusual for cryptocurrency markets, which are known for their rapid price movements and the speculative nature of trading.
The absence of a clear upward or downward trend in volume suggests that trading interest neither rose nor fell consistently over the period. The peaks may correspond to sudden surges in either buying or selling pressure; volume alone does not distinguish between the two.
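One way to move from eyeballing spikes to flagging them is to compare each day's volume with a trailing median. The sketch below is an illustration under stated assumptions, not the method used here: the helper `volume_spikes`, the 30-day window, and the 2x threshold are all hypothetical choices. A median baseline is used rather than a mean so that earlier spikes do not inflate the baseline.

```python
import pandas as pd


def volume_spikes(volume: pd.Series, window: int = 30, factor: float = 2.0) -> pd.Series:
    """Flag days whose volume exceeds `factor` times the trailing median volume.

    `window` and `factor` are illustrative defaults, not values from the analysis.
    """
    # shift(1) makes the baseline use only the preceding days, never the day itself.
    baseline = volume.rolling(window=window, min_periods=window).median().shift(1)
    # Days without a full baseline compare against NaN and are therefore not flagged.
    return volume > factor * baseline


# Hypothetical daily volumes with one obvious burst on the fifth day.
dates = pd.date_range("2023-01-01", periods=7, freq="D")
volume = pd.Series([100.0, 110.0, 90.0, 105.0, 400.0, 95.0, 100.0], index=dates)
flags = volume_spikes(volume, window=3)
print(volume[flags])
```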
Distribution Analysis
Utilising histograms, this analysis helps in understanding how various metrics are distributed across the dataset, revealing their underlying structure.
Histograms are graphical representations of the distribution of data. The above histograms show the following patterns:
Skewed Distributions: Each histogram is right-skewed, with most data points towards the left of the graph (lower values) and a long tail to the right (higher values). In other words, each metric has many low values and only a few high ones.
Mode and Outliers: The mode of each distribution (the peak) lies on the left side of the histogram, so the most frequent values are low. The long tail to the right could represent outliers or a small number of very high values.
Variability Between Histograms: The histograms differ in peak height and spread, reflecting how concentrated the values of each metric are. Volume in particular is measured on a different scale from the price metrics.
Volume of Transactions: For trading volume, the right skew means that most days saw moderate activity while a few days had exceptionally high volume, consistent with the spikes identified in the volume analysis.
Overall, the histograms show a common distribution pattern across the metrics, characterised by a prevalence of lower values and a rarity of higher values.
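The shape described above can be checked numerically. This sketch, with a hypothetical helper `distribution_summary` and made-up sample values, reports pandas' sample skewness (positive for a right-skewed column) and the left edge of the tallest histogram bin, computed with `numpy.histogram`.

```python
import numpy as np
import pandas as pd


def distribution_summary(df: pd.DataFrame, bins: int = 10) -> pd.DataFrame:
    """Per column: sample skewness and the left edge of the tallest histogram bin."""
    rows = {}
    for col in df.columns:
        values = df[col].dropna().to_numpy()
        counts, edges = np.histogram(values, bins=bins)
        peak = int(np.argmax(counts))  # first bin with the highest count (the mode)
        rows[col] = {
            "skew": float(df[col].skew()),  # > 0: longer tail to the right
            "peak_bin_left_edge": float(edges[peak]),
        }
    return pd.DataFrame(rows).T


# Hypothetical right-skewed sample: mostly low values, one large one.
demo = pd.DataFrame({"Close": [1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 10.0]})
summary = distribution_summary(demo, bins=3)
print(summary)
```

`DataFrame.hist()` (which requires matplotlib) draws the corresponding histograms, one per column.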
Correlation Analysis
This involves using scatter plots and correlation coefficients to explore and quantify the relationships between different financial metrics in the data.
In finance, a correlation matrix shows the degree to which pairs of variables, such as two securities or two price series, move in relation to each other. Correlation values range from -1 to 1, where:
A value of 1 implies that the two variables are perfectly positively correlated, moving in the same direction.
A value of -1 implies a perfect negative correlation, meaning they move in opposite directions.
A value of 0 implies no linear correlation.
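For reference, the standard Pearson coefficient (the default method of pandas' `DataFrame.corr()`, for example) is defined for two series x and y of n daily values as follows, where the bars denote sample means:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two series move together around their means, and the denominator rescales it by their individual spreads, which is what bounds the coefficient between -1 and 1.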
The matrix shows very high positive correlations between all pairs of variables, with values exceeding 0.99 in all cases. This indicates that the Open, High, Low, Close, and Adjusted Close prices of Bitcoin move almost identically. Such a strong correlation is expected because these metrics are all directly derived from the daily price movement of Bitcoin.
The Adjusted Close price is typically used for stocks to reflect the stock's value after corporate actions such as dividends and stock splits. Bitcoin has no such corporate actions, so for Bitcoin and many other cryptocurrencies the Adjusted Close is usually identical to the Close price, hence the perfect correlation of 1.
The high degree of correlation suggests that any of these metrics can serve as a reliable representation of Bitcoin's price movement over the period. It also indicates that the daily gaps between the price measures were small relative to the price changes over the year, so the measures did not diverge significantly from one another.
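The matrix above can be reproduced with pandas. In this sketch the helper `price_correlations` and the sample prices are hypothetical, and Yahoo Finance-style column names are assumed. Because the sample Adj Close column is a copy of Close, their coefficient comes out as 1 (up to floating-point rounding), mirroring the observation above.

```python
import pandas as pd

# Assumed Yahoo Finance-style column names.
PRICE_COLUMNS = ["Open", "High", "Low", "Close", "Adj Close"]


def price_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation matrix of the price columns."""
    return df[PRICE_COLUMNS].corr(method="pearson")


# Hypothetical prices; Adj Close is a copy of Close, as is typical for Bitcoin.
demo = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0, 14.0, 13.0],
        "High": [11.0, 12.0, 13.0, 15.0, 14.5],
        "Low": [9.5, 10.5, 11.5, 13.0, 12.5],
        "Close": [11.0, 12.0, 13.5, 13.0, 14.0],
    }
)
demo["Adj Close"] = demo["Close"]
corr = price_correlations(demo)
print(corr.round(3))
```

`pandas.plotting.scatter_matrix` (which requires matplotlib) draws the matching pairwise scatter plots.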
Detecting Anomalies
In this step, I identified and examined any unusual or unexpected patterns in the data, which could indicate errors or significant events.
Central Tendency: The band inside each box represents the median price. It appears fairly consistent across the plots, suggesting that the central tendency of Bitcoin prices did not change dramatically over the different time periods represented.
Spread: The height of the boxes, which indicates the interquartile range (IQR), appears similar across all plots, suggesting a consistent spread of prices around the median.
Range: The lines (or "whiskers") extending from the boxes indicate the range of the data excluding outliers. These seem to vary a bit across the plots, which might indicate changes in the volatility of Bitcoin prices.
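Outliers: Data points beyond the whiskers would be considered outliers; under the common convention, the whiskers extend to the most extreme points within 1.5 times the IQR of the box edges. No individual points are plotted outside the whiskers, so no outliers were detected.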
Symmetry: The boxes seem symmetric around the median, indicating that the distribution of prices is roughly equal above and below the median.
In summary, the box plots suggest a stable median price of Bitcoin with a consistent spread throughout the year.
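The whiskers in a box plot correspond to a simple rule that can also be applied directly to the data. This is a minimal sketch using Tukey's fences with the conventional factor of 1.5, which is also matplotlib's default whisker length; the helper name `iqr_outliers` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 matches the default whisker length (whis=1.5) of matplotlib box plots.
    """
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Hypothetical prices: the last value sits far above the rest.
prices = pd.Series([10, 11, 12, 13, 14, 15, 40])
print(iqr_outliers(prices))  # only the 40 is flagged
print(iqr_outliers(prices.iloc[:-1]))  # empty: nothing beyond the whiskers
```

An empty result for a price column corresponds to a box plot with no points outside the whiskers.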
CONCLUSION
The comprehensive Exploratory Data Analysis (EDA) of historical Bitcoin data over the past year provides valuable insights into the market behaviour of this pioneering cryptocurrency. The statistical summary underscored the inherent volatility of Bitcoin, with notable fluctuations in daily high and low values, yet a close alignment between opening and closing prices indicated relatively stable trading sessions. This stability, amidst the volatility, highlights Bitcoin's resilience in the face of market uncertainties and the evolving landscape of digital currencies.
The time series analysis revealed clear trends in Bitcoin's price movement, illustrating its susceptibility to market sentiments and external factors. The volume analysis further emphasised the cryptocurrency's dynamic nature, with trading volumes showing substantial variability. This underscores the impact of external events and investor sentiment on trading activities.
Distribution analysis using histograms revealed a right-skewed pattern, with a concentration of lower values and occasional high-value outliers. This skew indicates that, for each metric, high values were confined to a relatively small number of days.
The correlation analysis provided a clear picture of the tight relationship between various price metrics, affirming the consistency and reliability of the data. Lastly, the anomaly detection through box plots indicated a stable median price, suggesting a level of predictability amidst the asset's known volatility.
In conclusion, this EDA offers a nuanced understanding of Bitcoin's behaviour over the past year, highlighting its volatile yet resilient nature. These insights are crucial for investors, enthusiasts, and researchers, providing a factual basis for understanding the complexities and dynamics of Bitcoin as a digital asset in the ever-evolving world of cryptocurrencies.
REFERENCES
1. Nakamoto, S. (2008). "Bitcoin: A Peer-to-Peer Electronic Cash System." [Online]. Available: https://bitcoin.org/bitcoin.pdf. This white paper by Satoshi Nakamoto introduces the concept of Bitcoin and lays the foundational principles of its functioning and the technology behind it, namely blockchain.
2. Ferguson, N. (2008). "The Ascent of Money: A Financial History of the World." Penguin Books. This book provides a comprehensive historical account of the development of money, from ancient times to the modern era. It offers valuable insights into how money has evolved and shaped human civilization.
3. Narayanan, A., Bonneau, J., Felten, E., Miller, A., & Goldfeder, S. (2016). "Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction." Princeton University Press. This text offers an in-depth technical and historical analysis of Bitcoin and cryptocurrencies, making it a crucial resource for understanding the technical aspects of how Bitcoin operates.
4. Mihm, S. (2007). "A Nation of Counterfeiters: Capitalists, Con Men, and the Making of the United States." Harvard University Press. Mihm's book provides an interesting perspective on the history of money in the United States, focusing on the era of unregulated currency and the evolution of the financial system, offering context to the development of a centralized monetary system.
5. Lietaer, B., & Dunne, J. (2013). "Rethinking Money: How New Currencies Turn Scarcity into Prosperity." Berrett-Koehler Publishers. This book explores the concept of money beyond traditional paradigms, examining alternative currency systems and their potential impact on economic well-being, which provides context for understanding the emergence of cryptocurrencies like Bitcoin.
6. ChatGPT-4 by OpenAI. (2023). Conversational AI model used for providing information on various topics, including the technicalities of Bitcoin, blockchain technology, and the historical evolution of money. [Conversational interactions].
These resources collectively provide a thorough understanding of both the technical aspects of Bitcoin and the broader historical and social context of the evolution of money.