This repository contains the code and resources for a machine learning project focused on language models. The project explores the capabilities of GPT-2 and Llama models through pre-training, fine-tuning, and prompting techniques (please refer to report.pdf for more details on the project).
Pre-training GPT-2 on Shakespearean Text:
In this section, we pre-train the GPT-2 model on a Shakespearean text corpus to generate text in a similar style. The code for this part is not included in the repository, but report.pdf covers the training and validation loss as well as the generated text samples.
The training and validation loss curves show the model's learning progress over 5000 training steps.
The generated text samples demonstrate the model's ability to mimic Shakespearean language,
although some grammatical inconsistencies are present due to character-level training.
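Since the pre-training code itself is not included, the sketch below shows what a minimal character-level pre-training loop could look like in PyTorch. It is a hedged illustration, not the project's implementation: the corpus file name (`shakespeare.txt`), the hyperparameters, and the tiny two-layer causal transformer are placeholder assumptions standing in for the full GPT-2 architecture described in report.pdf.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumption: the corpus lives in a local file named shakespeare.txt.
text = open("shakespeare.txt").read()
chars = sorted(set(text))                       # character-level vocabulary
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

block_size, batch_size, d_model = 128, 32, 128  # placeholder hyperparameters

def get_batch():
    """Sample random windows; targets are inputs shifted by one character."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + 1 + block_size] for i in ix])
    return x, y

class TinyCharLM(nn.Module):
    """A two-layer causal transformer standing in for GPT-2."""
    def __init__(self, vocab_size):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        t = idx.shape[1]
        h = self.tok_emb(idx) + self.pos_emb(torch.arange(t))
        mask = nn.Transformer.generate_square_subsequent_mask(t)
        h = self.blocks(h, mask=mask)           # causal self-attention
        return self.head(h)

model = TinyCharLM(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
for step in range(5000):                        # 5000 steps, matching the report
    x, y = get_batch()
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: train loss {loss.item():.3f}")
```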
Instruction Fine-Tuning:
We fine-tune the pre-trained GPT-2 model on the Alpaca-GPT4 dataset to enhance its instruction-following capabilities.
Alpaca-GPT4 Dataset:
The dataset contains 52k instruction-following examples (about 1.5M tokens), used to train the model to follow human instructions.
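As a quick illustration, the data can be loaded with the Hugging Face `datasets` library. The hub id `vicgalle/alpaca-gpt4` and the field names below are assumptions about which public copy is used; substitute the project's actual source if it differs.

```python
from datasets import load_dataset

# Assumed hub id for a public copy of the Alpaca-GPT4 data.
ds = load_dataset("vicgalle/alpaca-gpt4", split="train")
print(len(ds))  # ~52k instruction-following examples
example = ds[0]
print(example["instruction"], example["input"], example["output"], sep="\n---\n")
```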
Instruction Tuning Pipeline:
The fine-tuning pipeline uses the following template for tokenizing instruction-following data:
### Instruction: <instruction text here>
### Input: <input text here>
### Response: <response text here>
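A hedged sketch of how this template might be applied at tokenization time is shown below, assuming the standard Hugging Face GPT-2 tokenizer. Masking the prompt tokens out of the loss with `-100` is a common instruction-tuning convention; the project's actual pipeline may handle this differently.

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def render(ex):
    """Fill the instruction template; returns (prompt, prompt + response)."""
    prompt = (f"### Instruction: {ex['instruction']}\n"
              f"### Input: {ex['input']}\n"
              f"### Response: ")
    return prompt, prompt + ex["output"] + tokenizer.eos_token

def tokenize(ex, max_length=512):
    prompt, full = render(ex)
    ids = tokenizer(full, truncation=True, max_length=max_length)["input_ids"]
    n_prompt = min(len(tokenizer(prompt)["input_ids"]), len(ids))
    labels = [-100] * n_prompt + ids[n_prompt:]  # loss only on the response
    return {"input_ids": ids, "labels": labels}
```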
Memory-Efficient Optimization:
We experimented with different optimization techniques to reduce memory consumption and computational cost; a representative sketch follows.
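The exact techniques used are detailed in report.pdf; as one representative illustration, gradient accumulation simulates a large batch with small per-step memory, and mixed-precision autocasting roughly halves activation memory. The helper below assumes a Hugging Face-style model that returns a `.loss` when given `labels`.

```python
import torch

ACCUM_STEPS = 8  # effective batch size = 8 x micro-batch size

scaler = torch.cuda.amp.GradScaler()  # disabled automatically on CPU-only machines

def train_step(model, optimizer, micro_batches):
    """One optimizer update accumulated over several micro-batches."""
    optimizer.zero_grad()
    for x, y in micro_batches:
        with torch.cuda.amp.autocast():
            # Assumes an HF-style causal LM interface (loss computed from labels).
            loss = model(input_ids=x, labels=y).loss / ACCUM_STEPS
        scaler.scale(loss).backward()  # gradients accumulate across micro-batches
    scaler.step(optimizer)
    scaler.update()
```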
The fine-tuned models show significant improvements in instruction-following capabilities,
with higher relevance and fluency in generated responses.
CoT Prompting with Llama:
We evaluate the effectiveness of chain-of-thought (CoT) prompts on mathematical benchmarks using the Llama model; the evaluation datasets are described in report.pdf.
CoT Prompting Strategy and Results:
The CoT prompts are designed to improve the model's reasoning capabilities by providing detailed step-by-step instructions.
We evaluate the model's performance with different numbers of CoT examples (0-shot, 2-shot, and 4-shot).
The use of CoT prompts generally improves the model's performance, with 4-shot prompts yielding the best results for most datasets.
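The sketch below shows how a k-shot CoT prompt could be assembled; the two worked examples are illustrative placeholders (GSM8K-style word problems), not the exemplars actually used in the project.

```python
# Placeholder worked examples in (question, step-by-step answer) form.
COT_EXAMPLES = [
    ("Natalia sold clips to 48 of her friends in April, and then half as many "
     "in May. How many clips did she sell altogether?",
     "In May she sold 48 / 2 = 24 clips. Altogether 48 + 24 = 72. The answer is 72."),
    ("A robe takes 2 bolts of blue fiber and half that much white fiber. "
     "How many bolts does it take in total?",
     "White fiber is 2 / 2 = 1 bolt. In total 2 + 1 = 3. The answer is 3."),
]

def build_cot_prompt(question: str, k: int = 2) -> str:
    """Prepend k worked examples (0-, 2-, or 4-shot) before the question."""
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in COT_EXAMPLES[:k])
    return f"{shots}Q: {question}\nA: Let's think step by step."

print(build_cot_prompt("A farmer has 3 fields with 12 rows of 8 plants each. "
                       "How many plants are there in total?"))
```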