Building My Own Jarvis: An AI Adventure
Introduction
In this project, I set out to create my own AI assistant, inspired by Tony Stark’s Jarvis. The goal was to integrate OpenAI’s tools into a NestJS backend to develop an intelligent, interactive assistant capable of responding to user queries through both text and speech.
Inspiration
The idea stemmed from two main motivations:
- Personal Growth: A desire to explore AI integration within my IT projects.
- Real-World Use Cases: Seeing a friend use ChatGPT to prep for interviews sparked the realization that AI could enhance speech practice, debate preparation, and even casual conversations.
Technical Overview
Core Technologies
- NestJS: The backbone of the project, providing a structured and modular framework.
- OpenAI API: Generates the AI responses and the spoken (text-to-speech) output.
- Prompt Engineering: Crafting the prompts that shape the assistant's behavior, including the system prompt that defines the security rules.
System Architecture
The project is built with multiple NestJS services working together:
- ChatOpenaiConnectorService: Connects to OpenAI's API to retrieve AI responses.
- ChatHistoryManager & ChatSession: Maintains conversation history across sessions.
- ChatGptApiService: Manages chat flow, including session creation and AI response retrieval.
- TextToSpeechService & SpeechToTextService: Convert text to speech and speech to text, respectively (SpeechToTextService is planned for a future update).
- ChatSecurityService: Filters AI responses to ensure they adhere to security guidelines.
- SpeakerService: The main orchestrator that integrates all the above services into a unified AI assistant.
Key Functionalities
Chat Sessions
Users can initiate a chat session with different AI personalities. Each session has a unique UUID, allowing for personalized and continuous interactions.
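Below is a minimal sketch of how session creation could work. It assumes sessions live in memory and that each personality is expressed as a system prompt. The class `InMemoryChatSessions`, the `Personality` values, and the prompt texts are hypothetical stand-ins for the project's ChatHistoryManager and ChatSession, not their actual code.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical personality presets (the project's real personalities may differ).
type Personality = "jarvis" | "interview-coach" | "debate-partner";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatSession {
  id: string; // UUID that identifies the session across requests
  personality: Personality;
  history: ChatMessage[];
}

// Hypothetical system prompts, one per personality.
const PERSONALITY_PROMPTS: Record<Personality, string> = {
  "jarvis": "You are Jarvis, a concise and polite AI assistant.",
  "interview-coach": "You are an interview coach. Ask one question at a time.",
  "debate-partner": "You are a debate partner. Argue the opposing side.",
};

// In-memory store: every session is lost when the process restarts.
class InMemoryChatSessions {
  private readonly sessions = new Map<string, ChatSession>();

  createSession(personality: Personality): ChatSession {
    const session: ChatSession = {
      id: randomUUID(),
      personality,
      // Seed the history with the personality's system prompt.
      history: [{ role: "system", content: PERSONALITY_PROMPTS[personality] }],
    };
    this.sessions.set(session.id, session);
    return session;
  }

  getSession(id: string): ChatSession {
    const session = this.sessions.get(id);
    if (!session) throw new Error(`Unknown session: ${id}`);
    return session;
  }

  addMessage(id: string, message: ChatMessage): void {
    this.getSession(id).history.push(message);
  }
}
```

Keeping the system prompt as the first history entry means every later request to the model carries the chosen personality automatically. For sessions to survive a restart, a persistent store such as a database or cache would be needed.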
OpenAI API Integration
NestJS communicates with OpenAI through the ChatOpenAI model, and the responses are stored in a structured format.
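The project uses the ChatOpenAI model. The sketch below swaps in a plain HTTP call to OpenAI's Chat Completions endpoint so that it needs no extra packages. The helper names and the default model name are illustrative assumptions, not taken from the project. The fetch function is injected so the request building and response parsing can be checked without network access.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Shape of the fetch function we depend on; Node 18+'s global fetch should satisfy it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const CHAT_COMPLETIONS_URL = "https://api.openai.com/v1/chat/completions";

// Build a Chat Completions request; the default model name is an assumption.
function buildChatRequest(apiKey: string, messages: ChatMessage[], model = "gpt-4o-mini"): HttpRequest {
  return {
    url: CHAT_COMPLETIONS_URL,
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages }),
  };
}

// Extract the assistant text from choices[0].message.content.
function parseChatResponse(json: unknown): string {
  const data = json as { choices?: { message?: { content?: string | null } }[] };
  const content = data.choices?.[0]?.message?.content;
  if (typeof content !== "string") throw new Error("OpenAI response contained no message content");
  return content;
}

// Send the full conversation history and return the assistant's reply.
async function getChatCompletion(fetchFn: FetchLike, apiKey: string, messages: ChatMessage[]): Promise<string> {
  const req = buildChatRequest(apiKey, messages);
  const res = await fetchFn(req.url, { method: req.method, headers: req.headers, body: req.body });
  if (!res.ok) throw new Error(`OpenAI request failed with HTTP ${res.status}`);
  return parseChatResponse(await res.json());
}
```

In production, the global `fetch` can be passed as `fetchFn`. In tests, a stub returning a canned JSON body is enough.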
Text-to-Speech
Users can receive AI-generated responses in audio format. This is handled via an API call to OpenAI’s Text-to-Speech (TTS) service.
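A minimal sketch of such a call, assuming OpenAI's `/v1/audio/speech` endpoint with the `tts-1` model and the `alloy` voice. The model, voice, MP3 output format, and helper names are illustrative choices, and the project's TextToSpeechService may be structured differently. As with the chat call, fetch is injected.

```typescript
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Shape of the fetch function we depend on; Node 18+'s global fetch should satisfy it.
type BinaryFetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

const SPEECH_URL = "https://api.openai.com/v1/audio/speech";
// The endpoint documents a 4096-character input limit; string length approximates it.
const MAX_TTS_INPUT_CHARS = 4096;

// Build a speech request; the model and voice defaults are assumptions.
function buildSpeechRequest(apiKey: string, text: string, voice = "alloy", model = "tts-1"): HttpRequest {
  if (text.trim().length === 0) throw new Error("Nothing to synthesize");
  if (text.length > MAX_TTS_INPUT_CHARS) {
    throw new Error(`Text is ${text.length} characters; the limit is ${MAX_TTS_INPUT_CHARS}`);
  }
  return {
    url: SPEECH_URL,
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: text, voice, response_format: "mp3" }),
  };
}

// Send the request and return the raw MP3 bytes.
async function synthesizeSpeech(fetchFn: BinaryFetchLike, apiKey: string, text: string): Promise<Uint8Array> {
  const req = buildSpeechRequest(apiKey, text);
  const res = await fetchFn(req.url, { method: req.method, headers: req.headers, body: req.body });
  if (!res.ok) throw new Error(`OpenAI TTS request failed with HTTP ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}
```

The returned bytes can be sent back to the client with an `audio/mpeg` content type or written to a file.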
Chat Security
AI responses are screened for inappropriate content using ChatSecurityService, which relies on OpenAI for content moderation. The security rules are defined in a system prompt loaded at runtime.
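One way to implement a prompt-based check is to have the model act as a judge. The rules file becomes the system prompt, and the candidate response is submitted for a verdict. The class name, the ALLOWED/BLOCKED reply format, and the file path passed to `fromFile` are all hypothetical. The completion function is injected, so any chat connector can back it.

```typescript
import { readFileSync } from "node:fs";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Any function that sends messages to a chat model and resolves to its text reply.
type CompletionFn = (messages: ChatMessage[]) => Promise<string>;

interface SecurityVerdict {
  allowed: boolean;
  reason: string;
}

// Hypothetical output contract appended to the loaded rules.
const VERDICT_INSTRUCTIONS =
  "Reply with ALLOWED or BLOCKED on the first line, followed by a one-line reason.";

class ResponseSecurityChecker {
  private readonly rules: string;
  private readonly complete: CompletionFn;

  constructor(rules: string, complete: CompletionFn) {
    this.rules = rules;
    this.complete = complete;
  }

  // Read the security rules from a text file at runtime (path supplied by the caller).
  static fromFile(path: string, complete: CompletionFn): ResponseSecurityChecker {
    return new ResponseSecurityChecker(readFileSync(path, "utf8"), complete);
  }

  // The rules become the system prompt; the candidate AI response is what gets judged.
  buildMessages(candidate: string): ChatMessage[] {
    return [
      { role: "system", content: `${this.rules}\n\n${VERDICT_INSTRUCTIONS}` },
      { role: "user", content: candidate },
    ];
  }

  // Fail closed: only an explicit ALLOWED on the first line lets a response through.
  static parseVerdict(reply: string): SecurityVerdict {
    const lines = reply.trim().split("\n");
    const label = (lines[0] ?? "").trim().toUpperCase();
    const reason = lines.slice(1).join("\n").trim();
    if (label === "ALLOWED") return { allowed: true, reason };
    if (label === "BLOCKED") return { allowed: false, reason };
    return { allowed: false, reason: `Unrecognized verdict: ${lines[0] ?? ""}` };
  }

  async check(candidate: string): Promise<SecurityVerdict> {
    const reply = await this.complete(this.buildMessages(candidate));
    return ResponseSecurityChecker.parseVerdict(reply);
  }
}
```

Failing closed means a malformed or unexpected verdict blocks the reply instead of letting it through unchecked.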
SpeakerService
The SpeakerService ties everything together. It allows users to select an AI personality, interact via text or speech, and ensures responses pass through security checks.
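The orchestration could be wired roughly as follows, assuming each collaborator is reduced to a small interface. `SessionStore`, `ChatModel`, `SecurityFilter`, `SpeechSynthesizer`, `SpeakerOrchestrator`, and the refusal text are hypothetical. Personality selection is assumed to happen when the session is created. In a NestJS app, these collaborators would typically be supplied through dependency injection.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Minimal collaborator contracts (hypothetical; the project's real service APIs may differ).
interface SessionStore {
  getHistory(sessionId: string): ChatMessage[];
  append(sessionId: string, message: ChatMessage): void;
}
interface ChatModel {
  complete(messages: ChatMessage[]): Promise<string>;
}
interface SecurityFilter {
  isAllowed(text: string): Promise<boolean>;
}
interface SpeechSynthesizer {
  synthesize(text: string): Promise<Uint8Array>;
}

// Hypothetical fallback reply used when a draft fails the security check.
const REFUSAL = "Sorry, I can't help with that.";

class SpeakerOrchestrator {
  private readonly sessions: SessionStore;
  private readonly model: ChatModel;
  private readonly security: SecurityFilter;
  private readonly tts: SpeechSynthesizer;

  constructor(sessions: SessionStore, model: ChatModel, security: SecurityFilter, tts: SpeechSynthesizer) {
    this.sessions = sessions;
    this.model = model;
    this.security = security;
    this.tts = tts;
  }

  // Text flow: record the user message, ask the model, screen the draft, record the reply.
  async replyText(sessionId: string, userText: string): Promise<string> {
    this.sessions.append(sessionId, { role: "user", content: userText });
    const draft = await this.model.complete(this.sessions.getHistory(sessionId));
    const reply = (await this.security.isAllowed(draft)) ? draft : REFUSAL;
    this.sessions.append(sessionId, { role: "assistant", content: reply });
    return reply;
  }

  // Verbal flow: run the text flow, then convert the screened reply to audio.
  async replyVerbal(sessionId: string, userText: string): Promise<{ text: string; audio: Uint8Array }> {
    const text = await this.replyText(sessionId, userText);
    return { text, audio: await this.tts.synthesize(text) };
  }
}
```

Running the security check before text-to-speech ensures that only screened text is ever spoken.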
Demo
- Creating a session: Users initiate a session with a personality choice.
- Interacting via text: Sending text messages to receive AI responses.
- Interacting via speech: Sending messages through the verbal endpoint to receive spoken responses.
- Security validation: Ensuring AI responses comply with predefined rules.
Medium article: https://medium.com/@jeremy.brunel.fullstack/building-my-own-jarvis-an-ai-adventure-7c9bcfb74dc0
GitHub project: https://github.com/jbfullstack/home-made-jarvis-backend