This project, written in Spanish, presents an educational journey comprising three Colab notebooks that guide participants in the use of Large Language Models (LLMs). We cover everything from installing and configuring essential tools to pre-training and fine-tuning models. Through guided practice and participation in active communities in the field, we aim to lower the technical barriers faced by beginners and make these technologies easier to access. We believe that with a clear structure and the support of a community, users will be able not only to understand LLMs but also to apply and customize them to solve specific problems. Participants will learn how to install and use tools such as Hugging Face and Google Colab, prepare datasets, train models, and apply fine-tuning techniques, with the goal of empowering more people to harness the potential of LLMs in their projects.
Github: https://github.com/Itnas157/llm_for_poets
Colab 1: https://colab.research.google.com/drive/1g21Ib4liopkIyQcU2qdgZpMFizXcW5aw?usp=sharing
Colab 2: https://colab.research.google.com/drive/14WAuWa-fBBM2tT93QtuwSAJZ2PgdU-8o?usp=sharing
Colab 3: https://colab.research.google.com/drive/1mZKYgpJ9tM-1_8FTLKwkFjpuv9hNGlnS?usp=sharing
Our central hypothesis is that, through a guided and structured approach, it is possible to simplify the process of adopting and customizing Large Language Models (LLMs) for users with basic programming knowledge.
Specifically, we believe that:
Accessible installation and configuration is key to democratizing the use of LLMs. By providing easy-to-implement tools and resources, we can reduce the technical barriers that many face when beginning to explore the world of language models.
Model pre-training and tuning, when taught in clear and practical steps, can be understood and applied even by those without deep experience using these models. Access to data, examples, and platforms such as Google Colab or Hugging Face will allow participants to reproduce and adapt these processes for specific use cases.
Community and ongoing support play a fundamental role for effective learning. Participation in active communities, knowledge sharing, and collaboration will accelerate the learning process and help overcome technical obstacles.
In summary, we believe that by facilitating access, practice and community support, participants will be able to develop fundamental skills to work with LLMs and apply these technologies to their own projects, successfully customizing the models to their needs.
Based on each of our hypotheses, we have set the corresponding objectives:
Getting familiar with LLMs: Provide a basic understanding of what LLMs are and how to use them. Understand the LLM work environment setup. Connect with communities developing and collaborating on LLM projects.
Training and fine-tuning: Teach how to pre-train and fine-tune models for specific tasks. Understand how to train an LLM and prepare data sets for training. Apply fine-tuning techniques to adapt an LLM to specific problems.
Building community: Foster collaborative work and problem solving around LLMs.
The introduction presents all the information needed to become familiar with LLMs, fulfilling the first objective. Although we explain more technical concepts such as transformers, encoders, and decoders, we try not to overload the user with information: we help them reason about how these components work conceptually, without delving into neural networks or other more complex topics.
At the same time, this colab teaches how to create accounts on the Hugging Face and Weights & Biases platforms and how to generate the access tokens that link those accounts for use in the later colabs. The aim is to get the reader started in the community; the corresponding objective is not completed in any single colab, but rather the benefit of the community is present throughout all three, whenever models and datasets are downloaded from or uploaded to Hugging Face.
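The token setup from this colab can be sketched as follows. `HF_TOKEN` and `WANDB_API_KEY` are assumed environment-variable names (a common convention, not something the colab mandates); the real tokens are generated in each platform's account settings.

```python
# Hedged sketch: linking a notebook session to Hugging Face and Weights & Biases.
# HF_TOKEN and WANDB_API_KEY are assumed environment-variable names; the actual
# tokens come from each platform's settings page.
import os

def get_token(env_var):
    """Read an access token from an environment variable, or None if unset."""
    return os.environ.get(env_var)

hf_token = get_token("HF_TOKEN")
wandb_key = get_token("WANDB_API_KEY")

if hf_token:
    from huggingface_hub import login
    login(token=hf_token)       # links this session to your Hugging Face account
if wandb_key:
    import wandb
    wandb.login(key=wandb_key)  # links this session to your W&B account
```

Reading the tokens from environment variables (or Colab's secrets panel) avoids pasting them directly into a notebook that might later be shared.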
Although we initially set out to train a model from scratch, we found that the time required was too high for the course reader. It would also demand a deeper understanding of the process on the reader's part, increasing the difficulty of the course and limiting the pool of potentially interested users.
For this reason, we decided to teach the user how to interact with pre-trained models instead of training one from scratch: as we stated in our first hypothesis, we want the installation and configuration process to be accessible. So while Colab 1 helps the user understand what LLMs are, Colab 2 lets them experiment and play with them (particularly with GPT-2 and BERT).
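The kind of experimentation Colab 2 enables can be sketched with the `transformers` pipeline API. The model ids `gpt2` and `bert-base-uncased` are the standard Hugging Face names for these models; the first run downloads their weights, so an internet connection is needed.

```python
# Sketch: "playing" with pre-trained models via the transformers pipeline API.
# "gpt2" and "bert-base-uncased" are the standard Hugging Face model ids.
from transformers import pipeline

# GPT-2: continue a prompt with freshly generated text.
generator = pipeline("text-generation", model="gpt2")
completions = generator("The moon is", max_new_tokens=10)
print(completions[0]["generated_text"])

# BERT: predict the masked word in a sentence.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("Paris is the [MASK] of France.")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))
```

The two calls illustrate the two model families: GPT-2 generates text left to right, while BERT fills in a masked position using context from both sides.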
At the same time, the importance of the community is still shown to the reader implicitly, since the models and datasets used come from the community (Hugging Face).
Even though building community and demonstrating its usefulness is a central axis of this project, the user is also given the knowledge to download and upload models locally, keeping the LLM as a file on their own system. Although we want the reader to engage with the community, we do not want them to depend on it: using the community is not mandatory, and they can always work locally on their models.
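A minimal sketch of this local workflow, assuming GPT-2 as in the colabs (the directory name is arbitrary): the model is downloaded once from the Hub, saved as plain files, and from then on can be reloaded entirely offline.

```python
# Sketch: downloading a model once, then keeping and reloading it as local files.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"                                     # model used in Colab 2
tokenizer = AutoTokenizer.from_pretrained(model_id)   # downloads on first run
model = AutoModelForCausalLM.from_pretrained(model_id)

local_dir = "./my_local_gpt2"                         # arbitrary directory name
tokenizer.save_pretrained(local_dir)                  # writes plain files to disk
model.save_pretrained(local_dir)

# From here on, loading works entirely from the local files, no Hub needed:
local_tokenizer = AutoTokenizer.from_pretrained(local_dir)
local_model = AutoModelForCausalLM.from_pretrained(local_dir)
```

The same `from_pretrained` call accepts either a Hub id or a local path, which is what makes switching between community and local workflows seamless.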
The third colab was the most difficult to bring to the reader. Since this course aims to be accessible, we had to strike a fine balance between how much information to include and how to simplify it, so as not to overwhelm readers and drive them away from the course. We therefore adopted two strategies to make the information as digestible as possible:
Memes: Yes, memes! We know the learning curve can be steep, so we lighten it with humor. Although we did not want to saturate the colab with memes, we know that one way to hold the reader's attention is to offer a small dose of humor here and there. Moreover, by tying the humor to the specific topics of the colab, the memes not only raise a smile but reinforce the concepts in a memorable way. The approach worked well enough that memes were included in the first and second colabs too. In the end, if something makes you laugh, it is much harder to forget.
Keeping technical information to a minimum: for example, not going into too much detail about training hyperparameters. We want the reader to be able to perform fine-tuning by understanding two or three key parameters, rather than be overwhelmed into leaving the course by the sheer number of options. We also limit this colab to only what is necessary, which is why the split of the dataset into training and evaluation data was done earlier. However, we cannot leave everything unexplained either, so we give a brief description of the remaining hyperparameters in case the user wants to learn more on their own, and we show visually how to recognize whether a dataset is suitable for fine-tuning and how to apply a small quick filter to it.
All of this with the sole purpose of making the fine-tuning process as easy to understand as possible.
Colab 1 fulfills our first objective (getting familiar with LLMs) on a conceptual level, while Colab 2 does so on a more practical level, allowing the user to replicate it with their own code. Colab 3 (and, to a lesser extent, Colab 2, by splitting the dataset) fulfills the second objective (training and fine-tuning). The last objective (building community) is addressed across all three colabs in different ways.
As for the hypotheses raised, these served as a compass for the design and structure of the project. The idea that accessible installation and configuration is key to democratizing the use of LLMs was directly integrated into Colab 1, where we prioritized step-by-step guides to configure tools such as Hugging Face and Weights & Biases, ensuring that even users with basic knowledge could follow the process without technical frustrations. This also motivated the inclusion of downloadable practical examples and a more visual than textual approach.
The hypothesis that model pre-training and fine-tuning could be understood with clear and practical steps was the central axis of the second and third colab. We reduced the number of advanced concepts to keep the reader's attention, limiting technical explanations to what was strictly necessary, such as the essential parameters of fine-tuning. At the same time, we opted for a more friendly narrative, supported by specific examples and memes, to reinforce retention and make learning more bearable.
Finally, the hypothesis about the importance of the community was reflected in how we structured the interaction with platforms such as Hugging Face, not only to download and use models, but also to show the value of collaboration and knowledge sharing. Although we cannot yet directly measure the impact on users, the integration of these ideas in each colab sought to maximize the learning and application potential of the participants.
Based on feedback from other university groups, we implemented several key improvements to our project. One of the main ones was a step-by-step fine-tuning of a model for educational purposes, available in Colab 3, designed to make the fine-tuning process easier to learn in practice. In addition, we included three quizzes, one at the end of each colab, so that users can self-assess their understanding of and progress on the topics covered.
However, some suggestions, such as clearly defining the purpose of the fine-tuned model and addressing more realistic applications of the models, were not fully implemented due to logistical challenges. Originally, we planned to use the Twitter Genderbias dataset to fine-tune GPT-2 and address gender issues in automated responses. This would have fulfilled both goals: a practical approach and a concrete application.
Unfortunately, the unexpected departure of a member and lack of time led us to prioritize a more general approach. We opted to show the fine-tuning process as a replicable example, leaving open the possibility for users to apply the concepts learned to their own use cases. While this decision limited the depth of specific application, it was in line with the core purpose of the project: to facilitate hands-on learning and enable users to understand and adapt the models to their needs.