Machine Learning based Cryptocurrency Price Prediction using Historical Data and Social Media Sentiment


Saachin Bhatt, Mustansar Ghazanfar, and Mohammad Hossein Amirhosseini, University of East London, United Kingdom


The purpose of this research is to investigate the impact of social media sentiments on predicting the Bitcoin price using machine learning models, with a focus on integrating on-chain data and employing a Multi Modal Fusion Model. For conducting the experiments, the crypto market data, on-chain data, and corresponding social media data (Twitter) has been collected from 2014 to 2022 containing over 2000 samples. We trained various models over historical data including K-Nearest Neighbors, Logistic Regression, Gaussian Naive Bayes, Support Vector Machine, Extreme Gradient Boosting and a Multi Modal Fusion. Next, we added Twitter sentiment data to the models, using the Twitter-roBERTa and VADAR models to analyse the sentiments expressed in social media about Bitcoin. We then compared the performance of these models with and without the Twitter sentiment data and found that the inclusion of sentiment feature resulted in consistently better performance, with Twitter-RoBERTa-based sentiment giving an average F1 scores of 0.79. The best performing model was an optimised Multi Modal Fusion classifier using Twitter-RoBERTa based sentiment, producing an F1 score of 0.85. This study represents a significant contribution to the field of financial forecasting by demonstrating the potential of social media sentiment analysis, on-chain data integration, and the application of a Multi Modal Fusion model to improve the accuracy and robustness of machine learning models for predicting market trends, providing a valuable tool for investors, brokers, and traders seeking to make informed decisions.


Cryptocurrency, Bitcoin Price, Social Media, Sentiment Analysis, Machine Learning, K-Nearest Neighbors, Logistic regression, Gaussian Naive Bayes, Support Vector Machine, Extreme Gradient Boosting, Multi Modal Fusion