My project is an end-to-end solution that transforms a simple prompt into a complete multimedia bedtime story experience. From nothing more than a title, the system autonomously writes a unique story, generates a matching illustration, converts the text to speech, and combines these elements into a narrated video that children can enjoy.
What makes this project truly agentic is its ability to function as an autonomous assistant that handles the entire creative process with minimal human input. The system:
The system demonstrates agency by bridging multiple specialized AI capabilities (text generation, image creation, voice synthesis) into a cohesive workflow that produces an output greater than the sum of its parts.
The application follows a modular architecture with specialized components handling different aspects of the content creation pipeline:
```
modules/
├── generate_story.py - Story text generation using GPT-3.5
├── generate_image.py - Cover image creation using DALL-E 3
├── generate_audio.py - Speech synthesis using OpenAI TTS
├── combine_audio_image.py - Video assembly using MoviePy
├── database_initializer.py - PostgreSQL connection management
└── user.py - User account and story collection management
```
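To make the data flow between these modules concrete, here is a minimal sketch of how an orchestrator might chain them. The function names, signatures, and the `StoryBundle` container are hypothetical (the real modules' interfaces are not shown here), and each stage is passed in as a callable so that either the OpenAI-backed implementations or simple test fakes can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoryBundle:
    """Everything produced for one bedtime story (hypothetical container)."""

    title: str
    story_text: str
    image: bytes
    audio: bytes
    video: bytes


def run_pipeline(
    title: str,
    generate_story: Callable[[str], str],
    generate_image: Callable[[str, str], bytes],
    generate_audio: Callable[[str], bytes],
    combine: Callable[[bytes, bytes], bytes],
) -> StoryBundle:
    """Chain the stages: title -> story -> cover image + narration -> video."""
    # 1. Text: the title is the only human input
    story_text = generate_story(title)
    # 2. Image: prompted from the title and the story (an assumption)
    image = generate_image(title, story_text)
    # 3. Audio: narrate the generated story text
    audio = generate_audio(story_text)
    # 4. Video: same argument order as combine(audio_content, image_data)
    video = combine(audio, image)
    return StoryBundle(title, story_text, image, audio, video)


if __name__ == "__main__":
    # Fake stages so the pipeline can run without any API calls
    bundle = run_pipeline(
        "The Sleepy Dragon",
        generate_story=lambda t: f"Once upon a time, {t} curled up to sleep.",
        generate_image=lambda t, s: b"<png bytes>",
        generate_audio=lambda s: b"<mp3 bytes>",
        combine=lambda a, i: a + i,
    )
    print(bundle.story_text)
```

Injecting the stages keeps the orchestration testable without network calls; in the actual application the story, image, and audio steps would call GPT-3.5, DALL-E 3, and OpenAI TTS respectively.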
The frontend uses a modern stack with Tailwind CSS for styling and vanilla JavaScript for interactivity.
One of the most challenging aspects was handling the lengthy AI generation processes without timing out the user's connection. I implemented an asynchronous processing system with client-side polling that:
```javascript
// Client-side polling implementation
async function pollForResult(taskId, url, loadingStateFunction) {
  const POLL_INTERVAL = 1000 // Polling interval in milliseconds
  let result = null

  while (true) {
    try {
      const response = await fetch(url + "/" + taskId)
      const responseData = await response.json()
      if (response.ok && responseData.taskCompleted) {
        result = responseData.result
        break // Exit the loop if task is completed
      }
    } catch (error) {
      console.error("Error:", error)
    }
    // Wait for the next polling interval
    await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL))
  }
  return result
}
```
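The polling loop needs a server-side counterpart that hands back a task ID right away and reports completion later. The sketch below is one way to do that in Python, assuming a thread pool is acceptable for the workload. The `TaskRegistry` class and its methods are hypothetical; only the response shape (`taskCompleted`, `result`) is taken from the client code.

```python
import threading
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class TaskRegistry:
    """Hypothetical registry of background jobs, keyed by task ID."""

    def __init__(self, max_workers: int = 4) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def submit(self, fn: Callable[..., Any], *args: Any) -> str:
        """Start fn(*args) in the background and return its ID immediately."""
        task_id = uuid.uuid4().hex
        future = self._executor.submit(fn, *args)
        with self._lock:
            self._tasks[task_id] = future
        return task_id

    def status(self, task_id: str) -> Dict[str, Any]:
        """Report progress in the shape the polling client reads."""
        with self._lock:
            future = self._tasks.get(task_id)
        if future is None:
            raise KeyError(task_id)
        if not future.done():
            return {"taskCompleted": False, "result": None}
        # result() re-raises any exception the background job threw
        return {"taskCompleted": True, "result": future.result()}
```

A web route such as `GET <url>/<taskId>` would then return `status(task_id)` as JSON. Threads suit the I/O-bound API calls; the CPU-heavy video encoding might be better served by a process pool or a dedicated job queue.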
To enhance the user experience, I implemented a dual storage approach:
This approach allows videos to be stored and replayed without requiring re-generation, significantly improving performance:
```javascript
async function storeDataInIndexedDB(title, storyText, videoBase64) {
  localStorage.setItem("currentTitle", JSON.stringify(title))

  // Open IndexedDB database (openDatabase is the app's own helper)
  const db = await openDatabase()
  const tx = db.transaction(["stories"], "readwrite")
  const store = tx.objectStore("stories")

  // Clear old entries and store the new one in the same transaction, so the
  // clear is guaranteed to run before the put (a separate, un-awaited
  // clearObjectStore() call could race with it and wipe the new data)
  store.clear()
  const data = { title, storyText, videoBase64 }
  const request = store.put(data)
  // Handle callbacks...
}
```
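The same store-once, replay-many idea can also be applied on the server, where the project already keeps story collections in PostgreSQL. The sketch below uses Python's built-in `sqlite3` purely as a self-contained stand-in for that database; the `VideoCache` class, the `videos` table, its schema, and keying the cache by title are all assumptions for illustration.

```python
import sqlite3
from typing import Callable


class VideoCache:
    """Generate each video once, then replay the stored bytes (hypothetical)."""

    def __init__(self, path: str = ":memory:") -> None:
        # sqlite3 stands in for the project's PostgreSQL database
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS videos (title TEXT PRIMARY KEY, video BLOB)"
        )

    def get_or_generate(self, title: str, generate: Callable[[str], bytes]) -> bytes:
        row = self._conn.execute(
            "SELECT video FROM videos WHERE title = ?", (title,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no regeneration needed
        video = generate(title)  # cache miss: run the expensive pipeline
        with self._conn:  # commits the insert on success
            self._conn.execute(
                "INSERT INTO videos (title, video) VALUES (?, ?)", (title, video)
            )
        return video
```

Because story generation is not deterministic, keying by title means a repeated title replays the first story rather than producing a new one; a per-story ID would avoid that if both behaviors are wanted.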
The core of the application lies in its ability to seamlessly integrate multiple AI services:
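```python
# From combine_audio_image.py (MoviePy 1.x API)
import os
import shutil
import tempfile

from moviepy.editor import AudioFileClip, ImageClip, VideoClip


def combine(audio_content, image_data, output_file="bedtime_story.mp4"):
    # Create a temporary directory that holds every intermediate file
    temp_dir = tempfile.mkdtemp()
    temp_video_file = os.path.join(temp_dir, output_file)
    try:
        # Write the TTS audio stream to disk so MoviePy can read it
        with tempfile.NamedTemporaryFile(suffix=".mp3", dir=temp_dir, delete=False) as temp_audio_file:
            temp_audio_file.write(audio_content.read())
            temp_audio_file.flush()
            audio = AudioFileClip(temp_audio_file.name, fps=44100)
            # image_data must be an image file path or an RGB NumPy array
            image = ImageClip(image_data)
            # Hold the still image for the full length of the narration
            video = VideoClip(make_frame=lambda t: image.get_frame(t), duration=audio.duration)
            video = video.set_audio(audio)
            video.write_videofile(temp_video_file, fps=24, codec="libx264")
            audio.close()

        # Read and return the completed video
        with open(temp_video_file, "rb") as f:
            video_bytes = f.read()
        return video_bytes
    # Exception handling elided here
    finally:
        # Illustrative cleanup: remove the temporary audio and video files
        shutil.rmtree(temp_dir, ignore_errors=True)
```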
The base64-encoded video files were often several megabytes in size, which caused performance issues in the browser.
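Solution: Implemented a hybrid storage strategy: lightweight values such as the current title go in localStorage, while large media assets are cached client-side in IndexedDB, which is not bound by localStorage's few-megabyte quota. This lets the application keep generated videos locally and retrieve them without repeated API calls.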
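To put the size problem in numbers: base64 encodes every 3 bytes of input as 4 ASCII characters (with padding), so an encoded video is roughly a third larger than the raw MP4. The helper below computes the exact padded length; the 3 MB figure is a hypothetical example, not a measurement from the project.

```python
import base64


def base64_size(num_bytes: int) -> int:
    """Length of the padded base64 encoding of num_bytes raw bytes."""
    # Every started group of 3 input bytes becomes 4 output characters
    return 4 * ((num_bytes + 2) // 3)


raw = bytes(3_000_000)  # a hypothetical 3 MB video
encoded = base64.b64encode(raw)
print(len(encoded), base64_size(len(raw)))  # 4000000 4000000
```

IndexedDB can also store `Blob` objects directly, which would avoid this overhead if the video were fetched as binary rather than as a base64 string.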
The generation of stories, images, and especially videos is computationally intensive and could lead to timeout issues.
Solution: Developed an asynchronous task processing system with client-side polling. This approach prevents server timeouts by immediately returning task IDs while processing continues in the background.
```javascript
// Setting a reasonable timeout for long-running processes
const TIMEOUT_MS = 180000 // 3 minutes
let timeoutId
const timeoutPromise = new Promise((_, reject) => {
  timeoutId = setTimeout(() => {
    reject(new Error("Timeout exceeded"))
  }, TIMEOUT_MS)
})

// Race between data completion and timeout; clear the timer either way
let data
try {
  data = await Promise.race([dataPromise, timeoutPromise])
} finally {
  clearTimeout(timeoutId)
}
```
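The client combines an open-ended polling loop with a separate three-minute `Promise.race` timeout. `Promise.race` does not cancel the losing promise, so if `dataPromise` is the promise returned by `pollForResult`, its `while (true)` loop keeps polling after the timeout fires. A minimal Python sketch of the alternative, where the deadline is checked inside the loop itself, follows. `fetch_status` is a hypothetical callable standing in for the HTTP request, and `sleep`/`clock` are injectable so the logic can be tested without waiting.

```python
import logging
import time
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def poll_for_result(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    timeout: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll until the task completes, raising TimeoutError at the deadline."""
    deadline = clock() + timeout
    while True:
        try:
            status = fetch_status()
            if status.get("taskCompleted"):
                return status.get("result")
        except (OSError, ValueError) as exc:
            # Connection errors are OSError subclasses; bad JSON is a ValueError
            logger.warning("Polling error: %s", exc)
        # Stop polling entirely once the deadline has passed
        if clock() >= deadline:
            raise TimeoutError("Timeout exceeded")
        sleep(interval)
```

Keeping the deadline in the loop means a timed-out task stops generating requests, which the race-based version cannot guarantee on its own.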
Coordinating between multiple OpenAI services (GPT, DALL-E, and TTS) required careful error handling and retry logic.
Solution: Implemented robust exception handling in each module, with detailed logging to quickly identify and address issues:
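```python
# Example from generate_image.py
import json

import openai
import requests

try:
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        quality="standard",
        n=1,  # DALL-E 3 only supports one image per request
    )
    # Process response...
except openai.APIError as e:
    # The v1 openai SDK raises its own exceptions (it uses httpx, not requests)
    ImageGenerator.logger.error(f"OpenAI API error: {e}")
    raise
except requests.exceptions.RequestException as e:
    # Only raised by code that calls requests directly (for example, if the
    # elided processing step downloads the generated image URL)
    ImageGenerator.logger.error(f"Error making API request: {e}")
    raise
except json.JSONDecodeError as e:
    ImageGenerator.logger.error(f"Error parsing API response: {e}")
    raise
except Exception as e:
    ImageGenerator.logger.error(f"Unexpected error: {e}")
    raise
```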
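The coordination challenge mentions retry logic, but the example only logs and re-raises. One common approach is to retry transient failures with exponential backoff and jitter, sketched here as a hypothetical `with_retries` decorator. In real use, `retry_on` would be narrowed to transient errors such as `openai.RateLimitError` or `openai.APIConnectionError`, so that invalid requests fail immediately instead of being retried.

```python
import functools
import logging
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def with_retries(
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry a call on the given exceptions with exponential backoff.

    Before retry n (n = 0, 1, ...) it waits base_delay * 2**n scaled by a
    random factor in [0.5, 1.0]; the last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except retry_on as exc:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the original error
                    delay = base_delay * 2 ** attempt * random.uniform(0.5, 1.0)
                    logger.warning("Attempt %d failed (%s); retrying in %.1fs",
                                   attempt + 1, exc, delay)
                    sleep(delay)
            raise AssertionError("unreachable")  # loop always returns or raises

        return wrapper

    return decorator
```

The v1 `openai` client already retries some failures itself (configurable via `OpenAI(max_retries=...)`), so an outer layer like this should be sized with that in mind to avoid multiplying the total number of attempts.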
The current system demonstrates the power of agentic AI in creative content generation, but several enhancements could further extend its capabilities:
The ChatGPT-Kids-bedtime-Story-Generator demonstrates how agentic AI can transform creative processes by orchestrating multiple specialized AI services into a cohesive workflow. By handling the entire pipeline from ideation to delivery, the system showcases how AI agents can augment human creativity and produce content that would otherwise require multiple skilled professionals.
What began as an exploration of API integration has evolved into a practical tool that demonstrates the potential of agentic AI to create meaningful experiences. I believe this project aligns perfectly with the spirit of the Agentic AI Innovation Challenge, as it demonstrates how autonomous systems can coordinate complex tasks to deliver tangible value.
Thank you for considering my submission. I would be happy to provide any additional information or clarification about the technical implementation.
Best regards,
Stephen Oketch.