Shakespeare Text Autocompletion Using LSTM Neural Networks

Abstract

This project implements a text autocompletion system that generates Shakespearean-style text using Long Short-Term Memory (LSTM) neural networks. The system is deployed as a web application using Streamlit, allowing users to input seed phrases and receive text completions in Shakespeare's distinctive style.

Introduction

Natural language generation has become increasingly sophisticated with the advent of deep learning. This project demonstrates the application of LSTM networks to generate context-aware text completions in the specific literary style of William Shakespeare, offering both educational and creative value.

Technical Architecture

Technology Stack

  • Frontend: Streamlit
  • Backend: Python
  • Deep Learning Framework: TensorFlow
  • Model Architecture: LSTM (Long Short-Term Memory)
  • Text Processing: TensorFlow's Tokenizer
  • Model Persistence: pickle, HDF5

Core Components

  1. LSTM Model

    • Trained on Shakespeare's complete works
    • Optimized for next-word prediction
    • Saved in HDF5 format
  2. Tokenizer

    • Vocabulary based on Shakespeare's works
    • Handles word-to-index mapping
    • Serialized using pickle
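The persistence scheme for the tokenizer can be illustrated with a quick round-trip (a minimal sketch using a plain dict in place of the fitted `Tokenizer` object; the real project pickles the full object and stores model weights separately in HDF5):

```python
import pickle

# Hypothetical stand-in for the fitted Tokenizer's word-to-index vocabulary
word_index = {'the': 1, 'king': 2, 'hath': 3, 'spoken': 4}

# Round-trip through pickle, as the project does for tokenizer.pkl
blob = pickle.dumps(word_index)
restored = pickle.loads(blob)
```

The mapping survives serialization byte-for-byte, which is what lets the web app reload the exact vocabulary the model was trained against.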

Implementation Details

Model and Tokenizer Loading

```python
import pickle

from tensorflow.keras.models import load_model

# Load pre-trained components
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)
model = load_model('sentence_completion.h5')
```

Text Autocompletion Function

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def autoCompletions(text, model, tokenizer, max_sequence_len):
    # Convert input text to a sequence of token indices
    sequence = tokenizer.texts_to_sequences([text])
    # Pad the sequence to the model's expected input length
    sequence = pad_sequences(sequence, maxlen=max_sequence_len - 1, padding='pre')
    # Predict the most likely next word
    predicted_word_index = np.argmax(model.predict(sequence, verbose=0))
    # Convert the index back to a word
    predicted_word = ''
    for word, index in tokenizer.word_index.items():
        if index == predicted_word_index:
            predicted_word = word
            break
    return text + ' ' + predicted_word
```
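The linear scan over `tokenizer.word_index` is O(V) per prediction. A common alternative (an assumption, not shown in the project code; newer Keras versions also expose `tokenizer.index_word` directly) is to invert the mapping once at startup and do O(1) lookups:

```python
# Hypothetical vocabulary, standing in for tokenizer.word_index
word_index = {'the': 1, 'king': 2, 'hath': 3}

# Invert once instead of scanning on every prediction
index_word = {index: word for word, index in word_index.items()}

predicted_word_index = 2
predicted_word = index_word.get(predicted_word_index, '')
```

For a Shakespeare-sized vocabulary the difference is small, but it removes an avoidable per-request cost.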

Streamlit Interface Implementation

```python
import streamlit as st

st.title('Shakespeare Text Autocompletion')

# User input controls (label, min, max, default)
input_text = st.text_input('Enter a starting phrase:')
num_words = st.slider('Number of words to generate:', 1, 20, 3)

if st.button('Complete Text'):
    generated_text = input_text
    for _ in range(num_words):
        words = generated_text.split()
        candidate = autoCompletions(
            generated_text, model, tokenizer,
            max_sequence_len=len(words) + 1
        )
        new_word = candidate.split()[-1]
        # Avoid word repetition: skip a prediction identical to the previous word
        if not words or new_word != words[-1]:
            generated_text = candidate
    st.success(generated_text)
```

Note: the original snippet passed `(1, 2, 3)` to `st.slider`, i.e. a default above the maximum, and appended the full completion back onto the text in the repetition branch; both are corrected above.

Features

  1. Real-time Text Generation

    • Immediate response to user input
    • Configurable number of words to generate
    • Prevents repetitive word generation
  2. User Interface

    • Clean, intuitive design
    • Text input field for seed phrases
    • Slider for controlling generation length
    • Information sidebar explaining the system
  3. Text Processing

    • Tokenization of input text
    • Sequence padding for uniform length
    • Word index mapping for predictions
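The sequence-padding step above can be sketched in plain Python (mimicking what Keras' `pad_sequences(..., padding='pre')` produces, assuming zero is the padding index and the default front-truncation):

```python
def pad_pre(sequence, maxlen, value=0):
    """Left-pad (or left-truncate) a token sequence to exactly maxlen."""
    if len(sequence) >= maxlen:
        return sequence[-maxlen:]  # keep only the most recent tokens
    return [value] * (maxlen - len(sequence)) + sequence

print(pad_pre([4, 7, 2], 5))           # [0, 0, 4, 7, 2]
print(pad_pre([4, 7, 2, 9, 1, 6], 5))  # [7, 2, 9, 1, 6]
```

Pre-padding keeps the most recent words adjacent to the prediction position, which matters for an LSTM reading left to right.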

Model Training Details

The LSTM model was trained on:

  • Complete works of William Shakespeare
  • Preprocessed to maintain linguistic patterns
  • Optimized for capturing Shakespearean style
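A typical preprocessing step for next-word training (a hedged sketch; the exact pipeline is not shown in this write-up) expands each tokenized line into cumulative prefix pairs, so every prefix predicts its next token:

```python
def make_ngram_pairs(token_ids):
    """Expand one tokenized line into (prefix, next-token) training pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        pairs.append((token_ids[:i], token_ids[i]))
    return pairs

# e.g. a tokenized line "to be or not" -> ids [5, 9, 3, 12]
pairs = make_ngram_pairs([5, 9, 3, 12])
# [([5], 9), ([5, 9], 3), ([5, 9, 3], 12)]
```

Each prefix is then pre-padded to a common length before being fed to the LSTM, which is why inference pads the same way.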

Performance Considerations

  1. Prediction Speed

    • Model optimized for real-time responses
    • Efficient sequence padding
    • Minimal preprocessing overhead
  2. Memory Usage

    • Tokenizer vocabulary optimization
    • Model weight compression
    • Efficient loading of pre-trained components

Future Improvements

  1. Temperature-based sampling for varied outputs
  2. Support for longer text generation
  3. Fine-tuning options for different Shakespeare works
  4. Multi-language support
  5. Enhanced repetition avoidance algorithms
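Temperature-based sampling (item 1) could look like the following pure-Python sketch: logits are divided by a temperature before the softmax, so low temperatures sharpen the distribution toward the current argmax behavior while high temperatures flatten it for more varied output.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from raw scores after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return i
    return len(probs) - 1
```

As the temperature approaches zero this reduces to the current `np.argmax` behavior; values above 1.0 trade coherence for variety.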

Conclusion

This project demonstrates the successful application of LSTM networks for style-specific text generation, providing an interactive way to explore and generate Shakespearean-style text. The implementation balances accuracy, performance, and user experience.

App Link : https://huggingface.co/spaces/AbhayBhaskar/Shakespeare-Text-Autocompletion