Shakespeare Text Autocompletion Using LSTM Neural Networks

Abstract

This project implements a text autocompletion system that generates Shakespearean-style text using Long Short-Term Memory (LSTM) neural networks. The system is deployed as a web application using Streamlit, allowing users to input seed phrases and receive text completions in Shakespeare's distinctive style.

Introduction

Natural language generation has become increasingly sophisticated with the advent of deep learning. This project demonstrates the application of LSTM networks to generate context-aware text completions in the specific literary style of William Shakespeare, offering both educational and creative value.

Technical Architecture

Technology Stack

  • Frontend: Streamlit
  • Backend: Python
  • Deep Learning Framework: TensorFlow
  • Model Architecture: LSTM (Long Short-Term Memory)
  • Text Processing: TensorFlow's Tokenizer
  • Model Persistence: pickle, HDF5

Core Components

  1. LSTM Model

    • Trained on Shakespeare's complete works
    • Optimized for next-word prediction
    • Saved in HDF5 format
  2. Tokenizer

    • Vocabulary based on Shakespeare's works
    • Handles word-to-index mapping
    • Serialized using pickle
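The persistence scheme for the tokenizer can be illustrated with a quick round-trip (a minimal sketch using a plain dict in place of the fitted `Tokenizer` object; the real project pickles the full object and stores model weights separately in HDF5):

```python
import pickle

# Hypothetical stand-in for the fitted Tokenizer's word-to-index vocabulary
word_index = {'the': 1, 'king': 2, 'hath': 3, 'spoken': 4}

# Round-trip through pickle, as the project does for tokenizer.pkl
blob = pickle.dumps(word_index)
restored = pickle.loads(blob)
```

The mapping survives serialization byte-for-byte, which is what lets the web app reload the exact vocabulary the model was trained against.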

Implementation Details

Model and Tokenizer Loading

```python
import pickle

from tensorflow.keras.models import load_model

# Load pre-trained components
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)
model = load_model('sentence_completion.h5')
```

Text Autocompletion Function

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def autoCompletions(text, model, tokenizer, max_sequence_len):
    # Convert input text to a sequence of token indices
    sequence = tokenizer.texts_to_sequences([text])
    # Pad the sequence to the model's expected input length
    sequence = pad_sequences(sequence, maxlen=max_sequence_len - 1, padding='pre')
    # Predict the most likely next word
    predicted_word_index = np.argmax(model.predict(sequence, verbose=0))
    # Convert the index back to a word
    predicted_word = ''
    for word, index in tokenizer.word_index.items():
        if index == predicted_word_index:
            predicted_word = word
            break
    return text + ' ' + predicted_word
```
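The linear scan over `tokenizer.word_index` is O(V) per prediction. A common alternative (an assumption, not shown in the project code; newer Keras versions also expose `tokenizer.index_word` directly) is to invert the mapping once at startup and do O(1) lookups:

```python
# Hypothetical vocabulary, standing in for tokenizer.word_index
word_index = {'the': 1, 'king': 2, 'hath': 3}

# Invert once instead of scanning on every prediction
index_word = {index: word for word, index in word_index.items()}

predicted_word_index = 2
predicted_word = index_word.get(predicted_word_index, '')
```

For a Shakespeare-sized vocabulary the difference is small, but it removes an avoidable per-request cost.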

Streamlit Interface Implementation

```python
import streamlit as st

st.title('Shakespeare Text Autocompletion')

# User input controls (label, min, max, default)
input_text = st.text_input('Enter a starting phrase:')
num_words = st.slider('Number of words to generate:', 1, 20, 3)

if st.button('Complete Text'):
    generated_text = input_text
    for _ in range(num_words):
        words = generated_text.split()
        candidate = autoCompletions(
            generated_text, model, tokenizer,
            max_sequence_len=len(words) + 1
        )
        new_word = candidate.split()[-1]
        # Avoid word repetition: skip a prediction identical to the previous word
        if not words or new_word != words[-1]:
            generated_text = candidate
    st.success(generated_text)
```

Note: the original snippet passed `(1, 2, 3)` to `st.slider`, i.e. a default above the maximum, and appended the full completion back onto the text in the repetition branch; both are corrected above.

Features

  1. Real-time Text Generation

    • Immediate response to user input
    • Configurable number of words to generate
    • Prevents repetitive word generation
  2. User Interface

    • Clean, intuitive design
    • Text input field for seed phrases
    • Slider for controlling generation length
    • Information sidebar explaining the system
  3. Text Processing

    • Tokenization of input text
    • Sequence padding for uniform length
    • Word index mapping for predictions
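The sequence-padding step above can be sketched in plain Python (mimicking what Keras' `pad_sequences(..., padding='pre')` produces, assuming zero is the padding index and the default front-truncation):

```python
def pad_pre(sequence, maxlen, value=0):
    """Left-pad (or left-truncate) a token sequence to exactly maxlen."""
    if len(sequence) >= maxlen:
        return sequence[-maxlen:]  # keep only the most recent tokens
    return [value] * (maxlen - len(sequence)) + sequence

print(pad_pre([4, 7, 2], 5))           # [0, 0, 4, 7, 2]
print(pad_pre([4, 7, 2, 9, 1, 6], 5))  # [7, 2, 9, 1, 6]
```

Pre-padding keeps the most recent words adjacent to the prediction position, which matters for an LSTM reading left to right.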

Model Training Details

The LSTM model was trained on:

  • Complete works of William Shakespeare
  • Preprocessed to maintain linguistic patterns
  • Optimized for capturing Shakespearean style
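A typical preprocessing step for next-word training (a hedged sketch; the exact pipeline is not shown in this write-up) expands each tokenized line into cumulative prefix pairs, so every prefix predicts its next token:

```python
def make_ngram_pairs(token_ids):
    """Expand one tokenized line into (prefix, next-token) training pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        pairs.append((token_ids[:i], token_ids[i]))
    return pairs

# e.g. a tokenized line "to be or not" -> ids [5, 9, 3, 12]
pairs = make_ngram_pairs([5, 9, 3, 12])
# [([5], 9), ([5, 9], 3), ([5, 9, 3], 12)]
```

Each prefix is then pre-padded to a common length before being fed to the LSTM, which is why inference pads the same way.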

Performance Considerations

  1. Prediction Speed

    • Model optimized for real-time responses
    • Efficient sequence padding
    • Minimal preprocessing overhead
  2. Memory Usage

    • Tokenizer vocabulary optimization
    • Model weight compression
    • Efficient loading of pre-trained components

Future Improvements

  1. Temperature-based sampling for varied outputs
  2. Support for longer text generation
  3. Fine-tuning options for different Shakespeare works
  4. Multi-language support
  5. Enhanced repetition avoidance algorithms
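Temperature-based sampling (item 1) could look like the following pure-Python sketch: logits are divided by a temperature before the softmax, so low temperatures sharpen the distribution toward the current argmax behavior while high temperatures flatten it for more varied output.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from raw scores after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return i
    return len(probs) - 1
```

As the temperature approaches zero this reduces to the current `np.argmax` behavior; values above 1.0 trade coherence for variety.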

Conclusion

This project demonstrates the successful application of LSTM networks for style-specific text generation, providing an interactive way to explore and generate Shakespearean-style text. The implementation balances accuracy, performance, and user experience.

App Link : https://huggingface.co/spaces/AbhayBhaskar/Shakespeare-Text-Autocompletion