
As a student at BUT FIT, I had the incredible opportunity to access and use the MetaCentrum HPC cluster. However, setting up the environment and running even a small PyTorch demo can be quite challenging, especially for beginners. Since many students face these difficulties, I created and published a repository containing a simple demo that shows how to run text generation using Hugging Face Transformers and experiment with pre-trained models. I originally shared this code with my fellow students, and it would be a shame if it weren't available to everyone else using this service.
Okay, let's do it.
The published GitHub repository contains all the source code needed to run the main Python script. First, you need to log in to one of the login nodes (also called frontend nodes). For example, to log in to the tarkil.grid.cesnet.cz frontend node:
ssh your_username@tarkil.grid.cesnet.cz
Then, clone the repository (or your fork), preferably via SSH:
(BOOKWORM)your_username@tarkil:~$ git clone git@github.com:davidchocholaty/hugging-face-transformers.git
GitHub uses your SSH key to authenticate when you clone, pull, and push repositories via SSH. If you do not have one yet, generate a new key pair:
ssh-keygen -t ed25519 -C "your_email@example.com"
Add the public key to GitHub: go to GitHub → Settings → SSH and GPG keys → New SSH key and add your public key as an Authentication Key. You can verify the connection with ssh -T git@github.com.
In your .ssh folder, you should have both the private and public keys:
id_ed25519 # your private key
id_ed25519.pub # your public key
After cloning, the repository has the following structure:
.
├── configs/               # Configuration files for experiments and models
├── metacentrum_scripts/   # Scripts for submitting jobs to the MetaCentrum HPC cluster
├── demo.py                # Main demo script for text generation using a specified model
├── results/               # Directory for storing experiment logs and outputs
└── README.md              # Repository documentation
The configs/ folder contains YAML configuration files defining experiment setups. Each config typically includes an experiment description, model settings, dataset definitions, and training hyperparameters.
Example YAML (configs/demo_mistral_Mistral-7B-Instruct-v0.3.yaml):
desc: "Baseline experiment. Learning rate scheduler is linear, training 5 epochs."
model:
name: "Mistral-7B-Instruct-v0.3"
desc: "Mistral AI model."
path: "mistralai/Mistral-7B-Instruct-v0.3"
# Not used yet. Kept for demonstration purposes only.
datasets:
cnec2:
name: "CNEC 2.0 CoNLL"
desc: "Czech Named Entity Corpus 2.0 CoNNL dataset. General-language Czech NER dataset."
url_path: "https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-3493/cnec2.0_extended.zip"
medival:
name: "Medieval text"
desc: "A Human-Annotated Dataset for Language Modeling and Named Entity Recognition in Medieval Documents"
url_path: "https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-5024/named-entity-recognition-annotations-large.zip?sequence=2&isAllowed=y"
wikiann:
name: "Wikiann"
desc: "WikiANN (sometimes called PAN-X) is a multilingual named entity recognition dataset consisting of Wikipedia articles annotated"
slavic:
name: "Slavic"
desc: "Slavic documents"
url_train: "https://bsnlp.cs.helsinki.fi/bsnlp-2021/data/bsnlp2021_train_r1.zip"
url_test: "https://bsnlp.cs.helsinki.fi/bsnlp-2021/data/bsnlp2021_test_v5.zip"
# Not used yet. Kept for demonstration purposes only.
training:
num_train_epochs: 5
batch_size: 32
optimizer:
learning_rate: 5e-5
weight_decay: 0.01
beta1: 0.9
beta2: 0.999
eps: 1e-8
lr_scheduler:
name: "linear"
num_warmup_steps: 0
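To make the structure concrete, here is a minimal Python sketch of how such a config could be consumed, assuming PyYAML, PyTorch, and Transformers are installed. The file path is the example above; the optimizer and scheduler part is an illustration only, since the repository marks the training section as not used yet, and the placeholder parameters and step count are hypothetical:
import yaml  # PyYAML; assumed to be installed
import torch
from transformers import get_scheduler

# Load the example experiment configuration shown above.
with open("configs/demo_mistral_Mistral-7B-Instruct-v0.3.yaml") as f:
    config = yaml.safe_load(f)

print(config["model"]["path"])  # mistralai/Mistral-7B-Instruct-v0.3

# Illustration only: how the (currently unused) training block could map to
# an AdamW optimizer and a linear scheduler. Note that PyYAML parses "5e-5"
# and "1e-8" as strings, hence the explicit float() conversions.
opt_cfg = config["training"]["optimizer"]
optimizer = torch.optim.AdamW(
    [torch.nn.Parameter(torch.zeros(1))],  # placeholder parameters
    lr=float(opt_cfg["learning_rate"]),
    weight_decay=opt_cfg["weight_decay"],
    betas=(opt_cfg["beta1"], opt_cfg["beta2"]),
    eps=float(opt_cfg["eps"]),
)
scheduler = get_scheduler(
    config["training"]["lr_scheduler"]["name"],  # "linear"
    optimizer=optimizer,
    num_warmup_steps=config["training"]["lr_scheduler"]["num_warmup_steps"],
    num_training_steps=1000,  # hypothetical; would come from the dataset size
)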
Scripts in this folder help submit experiments to the MetaCentrum HPC cluster. Typical tasks include preparing the environment on the computing node (prepare_node.sh) and submitting a job for a chosen branch, configuration, and timeout (run_job.sh).
The following command shows how to run the code on a computing node. All the logic is contained in the run_job.sh and prepare_node.sh scripts:
./run_job.sh <branch_name> <config_file_name> <timeout>
For example, to run the main branch with the Mistral demo config and a one-hour timeout:
./run_job.sh main demo_mistral_Mistral-7B-Instruct-v0.3 01:00:00
The demo.py file is a standalone Python script for running a text-generation demo using Hugging Face Transformers.
Features: the script loads the experiment setup from a YAML config, downloads the specified pre-trained model with Hugging Face Transformers, runs text generation, and stores the output in results/experiment_results.txt. You can also run it manually:
python demo.py --config configs/my_experiment.yaml
Workflow: the script reads the configuration, loads the model, generates text for the demo prompt, and prints and stores the result. Example output:
Generated Output: [{'generated_text': "I am an AI assistant designed to help with various tasks."}]
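To show the core idea without opening the repository, here is a minimal sketch of such a text-generation demo built on the Transformers pipeline API. This is not the repository's exact code: the prompt and the max_new_tokens value are illustrative, the results path mirrors the layout described above, and downloading the Mistral model requires accepting its license on the Hugging Face Hub and a machine with enough memory:
from pathlib import Path
from transformers import pipeline

# Build a text-generation pipeline for the model from the example config.
generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")

# Illustrative prompt; demo.py takes its settings from the YAML config instead.
output = generator("Who are you?", max_new_tokens=50)
print("Generated Output:", output)

# Append the result to the results file, mirroring the repository layout.
results_file = Path("results/experiment_results.txt")
results_file.parent.mkdir(exist_ok=True)
with results_file.open("a") as f:
    f.write(str(output) + "\n")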
Awesome! You've just completed the most challenging part. Congratulations!
I would like to sincerely thank Kidist Demessie for contacting me and inviting me to this platform. I really appreciate her support in making this article possible.