This thesis investigates the potential of leveraging Large Language Models (LLMs) to support Bitcoin traders. Specifically, we analyze the correlation between Bitcoin price movements and sentiment expressed in news headlines, posts, and comments on social media.
We build a novel, large-scale dataset that aggregates various features related to Bitcoin and its price over time, spanning from 2016 to 2024, and includes data from news outlets, social media posts, and comments.
Using this dataset, we evaluate the effectiveness of LLMs and deep learning models on real data through standard classification tasks, as well as through backtesting and demo trading accounts with different investment strategies.
We build interactive interfaces to annotate real-time data via LLMs, perform custom backtesting, and visualize demo trading account performances.
Our approach leverages the extended context capabilities of recent LLMs through simple prompting to generate outputs such as textual reasoning, sentiment, recommended trading actions, and confidence scores. Our findings reveal that LLMs represent a powerful tool for assisting trading decisions, opening up promising avenues for future research.
root
+- backtest
+- data_annotation
+- data_exploratory_analysis
+- data_mining
+- data_predictions
+- demo
+- hf_data
+- models
+- secrets
+- shared
+- utils
+- config.py
+- requirements.py
Where:
backtest: Contains the scripts needed to backtest the strategies.
data_annotation: Contains the procedures to annotate the data using the LLMs.
data_exploratory_analysis: Contains the scripts that allow visualizing the collected data.
data_mining: Contains the procedures to retrieve all the data needed for the project.
data_predictions: Contains the scripts that allow deep learning models to make predictions on the data.
demo: Contains the files needed to view real-time data annotation and the demo trading account performances.
hf_data: Contains all datasets collected during the retrieval, annotation, and merging phases.
models: Contains the definition of the deep learning models used during the experiments.
secrets: Contains the secrets used within the project, such as API keys and credentials.
shared: Contains variables and constants shared by most of the files in the project.
utils: Contains methods that are shared by most of the files in the project.
config.py: Contains the configuration variables shared by most of the files in the project.
requirements.py: Contains the requirements to be installed before running the project.

We use Python 3.12.4, the latest Python version supported by PyTorch.
python3 -m venv .venv
.venv\Scripts\activate (Windows) or source .venv/bin/activate (macOS/Linux)
pip install -r requirements.py
You can download the needed data from this Hugging Face Repository.
Put the downloaded folders into the hf_data directory.
The annotated folder contains the original dataset with the annotations of the respective LLMs.
The merged folder contains the raw dataset without LLM annotations (price data, blockchain data, and sentiment indices).
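If you prefer to fetch the data from a script, here is a minimal sketch using the huggingface_hub package; the repository id below is a placeholder for the repository linked above.

# Minimal sketch: download the dataset repository into hf_data.
# The repo_id is a placeholder; use the Hugging Face repository linked above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<user>/<dataset-name>",  # placeholder
    repo_type="dataset",
    local_dir="hf_data",
)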
Download and install Ollama
Set up the following LLMs:
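Models are pulled with "ollama pull <model_name>". As an illustration of the prompting approach described above, a minimal sketch using the ollama Python client follows; the model name, prompt wording, and output fields are assumptions, not the exact ones used in the project.

# Minimal sketch: query a local Ollama model for a structured trading annotation.
# Model name, prompt wording, and output fields are illustrative assumptions.
import json
import ollama

PROMPT = (
    "You assist a Bitcoin trader. Given the news headlines below, answer with a JSON "
    "object with the fields: reasoning (string), sentiment (bullish/bearish/neutral), "
    "action (buy/sell/hold), confidence (0-1).\n\n{headlines}"
)

def annotate(headlines, model="llama3"):  # placeholder model name
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(headlines="\n".join(headlines))}],
    )
    # The model is asked to reply with plain JSON; real code should handle malformed output.
    return json.loads(response["message"]["content"])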
Create Gemini API Key
Create a gemini.json file in the secrets directory and add:
{
"GOOGLE_API_KEY_1": "<api_key>",
}
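A minimal sketch of loading this key and issuing a test request with the google-generativeai package; the model name is a placeholder and the project's actual Gemini usage may differ.

# Minimal sketch: load the key from secrets/gemini.json and send a test request.
import json
import google.generativeai as genai

with open("secrets/gemini.json") as f:
    genai.configure(api_key=json.load(f)["GOOGLE_API_KEY_1"])

model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name
print(model.generate_content("Reply with OK if the key works.").text)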
Create Reddit API Key
Create a reddit.json file in the secrets directory and add:
{
"client_id": "<client_id>",
"client_secret": "<client_secret>",
"user_agent": "<user_agent>"
}
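A minimal sketch of using these credentials with the praw package; the subreddit is only an example and the project's actual data-mining code may differ.

# Minimal sketch: create a read-only Reddit client from secrets/reddit.json.
import json
import praw

with open("secrets/reddit.json") as f:
    reddit = praw.Reddit(**json.load(f))  # client_id, client_secret, user_agent

# Example query against a public subreddit.
for submission in reddit.subreddit("Bitcoin").hot(limit=5):
    print(submission.title)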
Download and install HTTP Toolkit
Configure it on your PC as a custom proxy:
IP: 127.0.0.1
Port: 8080
Execute
python -m demo.demo
Execute
python -m backtest.backtest