Shakespeare Text Autocompletion Using LSTM Neural Networks
Abstract
This project implements a text autocompletion system that generates Shakespearean-style text using Long Short-Term Memory (LSTM) neural networks. The system is deployed as a web application using Streamlit, allowing users to input seed phrases and receive text completions in Shakespeare's distinctive style.
Introduction
Natural language generation has become increasingly sophisticated with the advent of deep learning. This project demonstrates the application of LSTM networks to generate context-aware text completions in the specific literary style of William Shakespeare, offering both educational and creative value.
Technical Architecture
Technology Stack
- Frontend: Streamlit
- Backend: Python
- Deep Learning Framework: TensorFlow
- Model Architecture: LSTM (Long Short-Term Memory)
- Text Processing: Keras Tokenizer (tensorflow.keras.preprocessing.text)
- Model Persistence: pickle, HDF5
Core Components
- LSTM Model
  - Trained on Shakespeare's complete works
  - Optimized for next-word prediction
  - Saved in HDF5 format
- Tokenizer
  - Vocabulary based on Shakespeare's works
  - Handles word-to-index mapping (see the sketch below)
  - Serialized using pickle
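The code that produces the tokenizer artifact is not shown in this write-up; the snippet below is a minimal, illustrative sketch of fitting a Keras Tokenizer and serializing it with pickle. The two-line corpus is a placeholder for Shakespeare's works, and the filename matches the one the app loads later.

```python
import pickle

from tensorflow.keras.preprocessing.text import Tokenizer

corpus = ["to be or not to be", "all the world's a stage"]  # placeholder lines
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)           # builds the word-to-index vocabulary
print(tokenizer.word_index)              # e.g. {'to': 1, 'be': 2, ...}

with open('tokenizer.pkl', 'wb') as f:   # same filename the app loads later
    pickle.dump(tokenizer, f)
```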
Implementation Details
Model and Tokenizer Loading
```python
import pickle

import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load pre-trained components
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)
model = load_model('sentence_completion.h5')
```
Text Autocompletion Function
```python
def autoCompletions(text, model, tokenizer, max_sequence_len):
    # Convert input text to a sequence of word indices
    sequence = tokenizer.texts_to_sequences([text])
    # Pad the sequence to the length the model expects
    sequence = pad_sequences(sequence, maxlen=max_sequence_len - 1, padding='pre')
    # Predict the index of the most likely next word
    predicted_word_index = np.argmax(model.predict(sequence, verbose=0))
    # Convert the index back to its word
    predicted_word = ''
    for word, index in tokenizer.word_index.items():
        if index == predicted_word_index:
            predicted_word = word
            break
    return text + ' ' + predicted_word
```
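As a quick sanity check, the function can be called directly. The seed phrase and `max_sequence_len` value below are illustrative assumptions; the sequence length should match whatever was used during training, and the output depends entirely on the trained weights.

```python
# Illustrative call; adjust max_sequence_len to the training sequence length
completed = autoCompletions('to be or not to', model, tokenizer, max_sequence_len=20)
print(completed)  # the seed phrase followed by the predicted next word
```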
Streamlit Interface Implementation
```python
import streamlit as st

st.title('Shakespeare Text Autocompletion')

# User input controls
input_text = st.text_input('Enter a starting phrase:')
num_words = st.slider('Number of Words to generate:', 1, 20, 3)  # min, max, default (upper bound adjustable)

if st.button('Complete Text'):
    generated_text = input_text
    for _ in range(num_words):
        generated_words = generated_text.split()
        candidate = autoCompletions(
            generated_text, model, tokenizer,
            max_sequence_len=len(generated_words) + 1
        )
        # Avoid word repetition: skip a prediction that just echoes the last word
        new_word = candidate.split()[-1]
        if generated_words and new_word == generated_words[-1]:
            continue
        generated_text = candidate
    st.success(generated_text)
```
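Assuming the three code sections above are combined into a single script (for example `app.py`, a hypothetical filename), the interface can be launched locally with `streamlit run app.py`.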
Features
- Real-time Text Generation
  - Immediate response to user input
  - Configurable number of words to generate
  - Prevents repetitive word generation
- User Interface
  - Clean, intuitive design
  - Text input field for seed phrases
  - Slider for controlling generation length
  - Information sidebar explaining the system
- Text Processing
  - Tokenization of input text
  - Sequence padding for uniform length (illustrated after this list)
  - Word index mapping for predictions
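To make the tokenization and padding steps concrete, the intermediate values can be inspected with the tokenizer loaded earlier. The indices and the `maxlen` of 10 below are illustrative assumptions; actual values depend on the fitted vocabulary and the training sequence length.

```python
# Inspect how a seed phrase is encoded before prediction
seed = 'to be or not to'
sequence = tokenizer.texts_to_sequences([seed])        # list of word indices, e.g. [[4, 30, 42, 16, 4]]
padded = pad_sequences(sequence, maxlen=10, padding='pre')
print(padded)                                          # zeros prepended until the row has 10 entries
```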
Model Training Details
The LSTM model was trained on the complete works of William Shakespeare, with the text preprocessed to preserve its linguistic patterns and the training setup tuned to capture Shakespearean style.
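The training code itself is not included in this write-up. The sketch below shows one common way such a next-word model is assembled with Keras, building n-gram prefixes of each line as training pairs; the corpus, layer sizes, and epoch count are placeholder assumptions, and the final lines save the two artifacts under the filenames the app expects.

```python
import pickle

from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

corpus = ["to be or not to be that is the question"]   # placeholder for Shakespeare's works

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1

# Build n-gram sequences: every prefix of a line predicts its next word
sequences = []
for line in corpus:
    ids = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(ids)):
        sequences.append(ids[: i + 1])

max_sequence_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_sequence_len, padding='pre')
X, y = sequences[:, :-1], sequences[:, -1]

model = Sequential([
    Embedding(vocab_size, 64),                 # embedding size is an assumption
    LSTM(128),                                 # hidden size is an assumption
    Dense(vocab_size, activation='softmax'),   # next-word probabilities
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=10, verbose=0)          # epoch count is an assumption

# Persist the artifacts that the app loads at startup
with open('tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f)
model.save('sentence_completion.h5')           # HDF5 format
```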
Performance Considerations
- Prediction Speed
  - Model optimized for real-time responses
  - Efficient sequence padding
  - Minimal preprocessing overhead
- Memory Usage
  - Tokenizer vocabulary optimization
  - Model weight compression
  - Efficient loading of pre-trained components (see the caching sketch below)
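One straightforward way to keep startup and reruns cheap in Streamlit is to cache the loaded artifacts so they are read from disk only once per server process. This is a suggested pattern rather than part of the deployed code; `st.cache_resource` is Streamlit's built-in decorator for caching global resources such as models.

```python
import pickle

import streamlit as st
from tensorflow.keras.models import load_model

@st.cache_resource
def load_artifacts():
    # Runs once per server process; subsequent reruns reuse the cached objects
    with open('tokenizer.pkl', 'rb') as f:
        tokenizer = pickle.load(f)
    model = load_model('sentence_completion.h5')
    return model, tokenizer

model, tokenizer = load_artifacts()
```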
Future Improvements
- Temperature-based sampling for varied outputs (see the sketch after this list)
- Support for longer text generation
- Fine-tuning options for different Shakespeare works
- Multi-language support
- Enhanced repetition avoidance algorithms
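Temperature-based sampling would replace the greedy `np.argmax` in `autoCompletions` with a draw from a rescaled probability distribution: low temperatures stay close to greedy decoding, while higher ones produce more varied text. A minimal sketch of the idea (the helper name and default temperature are illustrative):

```python
import numpy as np

def sample_with_temperature(probs, temperature=0.8):
    # Rescale the softmax output and sample an index instead of taking argmax
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-9) / temperature
    rescaled = np.exp(logits) / np.sum(np.exp(logits))
    return int(np.random.choice(len(rescaled), p=rescaled))

# Inside autoCompletions, this could replace the np.argmax call:
# predicted_word_index = sample_with_temperature(model.predict(sequence, verbose=0)[0])
```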
Conclusion
This project demonstrates the successful application of LSTM networks for style-specific text generation, providing an interactive way to explore and generate Shakespearean-style text. The implementation balances accuracy, performance, and user experience.
App Link: https://huggingface.co/spaces/AbhayBhaskar/Shakespeare-Text-Autocompletion