
The Module 2 Capstone Project is your final major milestone toward certification.
By completing it, you’ll demonstrate that you can take a fine-tuned model from experimentation to production — deploying it on a cloud platform using tools like vLLM, Modal, or the Hugging Face Inference API.
You’ll also practice performance testing, cost analysis, and monitoring, applying observability tools such as Weights & Biases or CloudWatch to evaluate your system’s behavior.
In this lesson, you’ll review the objectives, deliverables, and evaluation criteria for your final project — the last step before earning your LLM Engineering & Deployment Certification.
In this project, you’ll deploy your fine-tuned model on a cloud platform such as Modal, AWS SageMaker, Amazon Bedrock, or the Hugging Face Inference API.
You’ll run performance tests to evaluate latency, cost, and reliability, while integrating a basic monitoring tool to track requests, usage, and errors.
Your goal is to demonstrate that you can:
- Deploy a fine-tuned model on a cloud inference platform
- Evaluate its latency, cost, and reliability through a small set of performance tests
- Monitor requests, usage, and errors with a basic observability tool
Your project should include the following core components:
Deploy your fine-tuned model using a cloud inference platform such as:
- Modal
- AWS SageMaker
- Amazon Bedrock
- Hugging Face Inference API
Demonstrate successful inference requests using a demo notebook, API call, or web interface.
Your deployment should be testable — meaning reviewers should be able to interact with your model.
Document key configuration details such as model type, hardware used, and endpoint parameters.
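For example, if your endpoint exposes an OpenAI-compatible API (as vLLM’s server and many Modal or Hugging Face setups do), a test request can be a short script like the sketch below. The endpoint URL, model name, and API key environment variable are placeholders, not values from this course; substitute your own deployment details.

```python
# Minimal sketch of a test inference request, assuming an OpenAI-compatible
# chat completions endpoint. ENDPOINT_URL, MODEL_NAME, and the API_KEY env
# var are placeholders for your own deployment.
import os
import requests

ENDPOINT_URL = "https://your-deployment.example.com/v1/chat/completions"  # placeholder
MODEL_NAME = "your-org/your-fine-tuned-model"                             # placeholder

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": "Give a one-sentence summary of what you were fine-tuned to do."}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

A notebook cell or screenshot showing a request like this, along with the model’s response, is usually enough to show reviewers that your endpoint works.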
Run a short set of tests (5–10 requests) to evaluate your deployed model’s performance.
| Metric | Cloud Deployment |
|---|---|
| Average Latency (ms) | – |
| Cost per 1K Tokens (USD) | – |
| Response Reliability (% successful requests) | – |
Summarize your findings clearly — a brief comparison of latency, cost, and reliability is enough.
You don’t need a full benchmarking suite; a small, reproducible test setup is enough.
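As a starting point, your performance test can be a short script along the lines of the sketch below. It assumes the same OpenAI-compatible endpoint as the earlier example, and the per-token price is a placeholder you would replace with your provider’s actual rate.

```python
# Minimal sketch of a small performance test: send ~10 requests, then report
# average latency, reliability, and an estimated cost per 1K tokens.
# ENDPOINT_URL, MODEL_NAME, and PRICE_PER_1K_TOKENS are placeholders.
import os
import time
import statistics
import requests

ENDPOINT_URL = "https://your-deployment.example.com/v1/chat/completions"  # placeholder
MODEL_NAME = "your-org/your-fine-tuned-model"                             # placeholder
PRICE_PER_1K_TOKENS = 0.002  # USD; replace with your provider's real pricing

prompts = ["Explain retrieval-augmented generation in one sentence."] * 10
latencies_ms, total_tokens, failures = [], 0, 0

for prompt in prompts:
    start = time.perf_counter()
    try:
        r = requests.post(
            ENDPOINT_URL,
            headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
            json={
                "model": MODEL_NAME,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 128,
            },
            timeout=60,
        )
        r.raise_for_status()
        total_tokens += r.json()["usage"]["total_tokens"]
        latencies_ms.append((time.perf_counter() - start) * 1000)
    except requests.RequestException:
        failures += 1

if latencies_ms:
    print(f"Average latency: {statistics.mean(latencies_ms):.0f} ms")
print(f"Reliability: {len(latencies_ms)}/{len(prompts)} requests succeeded")
print(f"Estimated cost: ${total_tokens / 1000 * PRICE_PER_1K_TOKENS:.4f} for {total_tokens} tokens")
```

Numbers from a run like this are enough to fill in the table above and support a brief written summary.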
Integrate one simple monitoring or tracing tool to observe your model’s performance and behavior in real time.
Recommended options include:
- Weights & Biases (W&B)
- AWS CloudWatch
Your monitoring setup should:
- Track the requests sent to your deployed model
- Record basic usage (for example, token counts)
- Capture errors or failed responses
You don’t need complex automation or alerting — the goal is to demonstrate awareness of real-world monitoring and basic observability practices.
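If you choose Weights & Biases, for example, a minimal setup can look like the sketch below: a small wrapper that logs latency, token usage, and errors for each request. The project name and the send_request function are placeholders for your own code, not part of any required setup.

```python
# Minimal sketch of per-request monitoring with Weights & Biases.
# "capstone-llm-monitoring" and send_request() are placeholders.
import time
import wandb

wandb.init(project="capstone-llm-monitoring")  # placeholder project name

def monitored_call(send_request, prompt):
    """Call the deployed model and log latency, tokens, and errors to W&B."""
    start = time.perf_counter()
    try:
        result = send_request(prompt)  # your own function that hits the endpoint
        wandb.log({
            "latency_ms": (time.perf_counter() - start) * 1000,
            "total_tokens": result.get("usage", {}).get("total_tokens", 0),
            "error": 0,
        })
        return result
    except Exception:
        wandb.log({"latency_ms": (time.perf_counter() - start) * 1000, "error": 1})
        raise
```

The resulting W&B dashboard, or the equivalent CloudWatch metrics if you go that route, is what you would point reviewers to as evidence of basic observability.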
Create a concise setup and usage guide that includes:
Module 2 projects follow the same monthly review schedule as Module 1.
To be included in a given month’s review cycle, make sure to submit your project by one of the following dates:
If you miss a listed date, your project will simply roll over to the next month’s review.
Reviews typically take about two weeks, during which you’ll receive feedback and, if needed, an opportunity to make improvements before final evaluation.
Plan ahead so you can complete your submission comfortably within your preferred review window.
Create a short publication on Ready Tensor that:
📄 Publication Evaluation Rubric
Submit a repo that:
📄 Repository Evaluation Rubric
Successfully completing this project earns you the LLM Deployment Engineer credential — recognizing your ability to deploy, monitor, and evaluate large language models in real-world environments.
If you’ve also earned the LLM Fine-Tuning Specialist credential from Module 1, you’ll be awarded the LLM Engineering & Deployment Certification, representing full completion of the program.
Once you’ve submitted your project, it will be reviewed by the evaluation team.
If it meets the certification standards, you’ll receive your credential — and if you’ve completed both modules, your full program certificate as well.
This marks your official recognition as a certified LLM Engineer, capable of taking models from fine-tuning to production.