Siddharth Chandel
Lovely Professional University, Phagwara (PB)
Email: siddharthchandel2004@gmail.com
LinkedIn: siddharth-chandel-001097245
GitHub repo: Conversational-AI
The development of conversational agents, commonly referred to as chatbots, has made significant strides in recent years, largely due to advancements in deep learning technologies. Among these, Recurrent Neural Networks (RNNs) and their variant, Long Short-Term Memory (LSTM) networks, have emerged as powerful tools for various natural language processing (NLP) tasks.
Traditional LSTM models have proven effective in processing sequential data, but they struggle with long-term dependencies, leading to less coherent responses. The Attention mechanism enhances LSTM models by dynamically focusing on important parts of the input.
This paper explores the impact of the Attention mechanism on LSTM networks in conversational AI applications.
Several studies have explored Recurrent Neural Networks (RNNs) and LSTMs for dialogue generation. The Seq2Seq model [1] laid the foundation for chatbots, while Bahdanau et al. [2] introduced the Attention mechanism, improving performance in text generation tasks.
Further advancements include Luong et al. [3], who refined the mechanism with global and local attention variants, and Vaswani et al. [4], who introduced the Transformer architecture, removing the need for recurrence altogether.
A custom question-answer dataset was used, formatted as conversational exchanges, and preprocessed before training (see the sketch below).
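As an illustration, the following is a minimal sketch of a typical preprocessing pipeline for such a question-answer corpus using Keras utilities; the vocabulary size, maximum sequence length, and start/end tokens are illustrative assumptions rather than the exact settings used in the experiments.

```python
# Minimal sketch of a typical preprocessing pipeline for a question-answer
# corpus (illustrative; vocabulary size and max length are assumptions).
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

questions = ["Hi, how are you doing?", "What is your name?"]
answers = ["I'm fine, how about yourself?", "I'm a chatbot."]

# Wrap answers with start/end tokens so the decoder knows where to begin and stop.
answers = ["<start> " + a + " <end>" for a in answers]

VOCAB_SIZE = 8000   # assumed vocabulary cap
MAX_LEN = 20        # assumed maximum sequence length

# Keep "<" and ">" out of the filter list so the start/end tokens survive.
tokenizer = Tokenizer(num_words=VOCAB_SIZE,
                      filters='!"#$%&()*+,-./:;=?@[\\]^_`{|}~\t\n',
                      lower=True)
tokenizer.fit_on_texts(questions + answers)

encoder_input = pad_sequences(tokenizer.texts_to_sequences(questions),
                              maxlen=MAX_LEN, padding="post")
decoder_input = pad_sequences(tokenizer.texts_to_sequences(answers),
                              maxlen=MAX_LEN, padding="post")
```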
Two models were compared: Baseline LSTM and LSTM with Attention.
The baseline model is a standard encoder-decoder LSTM: an encoder LSTM reads the tokenized question, and its final hidden and cell states initialize a decoder LSTM that generates the answer token by token. A sketch of this architecture is given below; its hyperparameters are shared with the Attention model for comparability.
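The sketch below shows one way to build such a baseline in Keras. The embedding dimension, hidden units, and vocabulary size are assumed values for illustration, not the exact hyperparameters from the experiments.

```python
# Hedged sketch of a baseline encoder-decoder LSTM in Keras.
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

VOCAB_SIZE, EMB_DIM, UNITS, MAX_LEN = 8000, 256, 512, 20  # assumed values

# Encoder: reads the question and returns its final hidden/cell states.
enc_inputs = Input(shape=(MAX_LEN,))
enc_emb = Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(enc_inputs)
_, state_h, state_c = LSTM(UNITS, return_state=True)(enc_emb)

# Decoder: generates the answer token by token, initialized with encoder states.
dec_inputs = Input(shape=(MAX_LEN,))
dec_emb = Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(dec_inputs)
dec_outputs, _, _ = LSTM(UNITS, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
outputs = Dense(VOCAB_SIZE, activation="softmax")(dec_outputs)

# Targets are the answer sequences shifted one step to the left.
model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```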
The second model adds an Attention mechanism to the same encoder-decoder backbone: at each decoding step the decoder scores all encoder hidden states, and the resulting context vector is combined with the decoder output before predicting the next token. All hyperparameters match the baseline model for a fair comparison; a sketch follows.
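The sketch below adds dot-product (Luong-style) attention via Keras's built-in Attention layer; the paper does not specify the exact attention variant, so this choice, along with the layer sizes, is an assumption.

```python
# Hedged sketch of the same encoder-decoder LSTM with dot-product attention.
from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     Attention, Concatenate)
from tensorflow.keras.models import Model

VOCAB_SIZE, EMB_DIM, UNITS, MAX_LEN = 8000, 256, 512, 20  # assumed values

# Encoder now returns the full sequence of hidden states for attention.
enc_inputs = Input(shape=(MAX_LEN,))
enc_emb = Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(enc_inputs)
enc_seq, state_h, state_c = LSTM(UNITS, return_sequences=True,
                                 return_state=True)(enc_emb)

dec_inputs = Input(shape=(MAX_LEN,))
dec_emb = Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(dec_inputs)
dec_seq, _, _ = LSTM(UNITS, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])

# Attention: decoder states query the encoder states; the context vectors are
# concatenated with the decoder outputs before the softmax projection.
context = Attention()([dec_seq, enc_seq])
combined = Concatenate()([dec_seq, context])
outputs = Dense(VOCAB_SIZE, activation="softmax")(combined)

model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```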
Training & Validation Loss:
| Model | Training Loss | Validation Loss |
|---|---|---|
| Baseline LSTM | 1.785 | 5.42 |
| LSTM with Attention | 0.507 | 6.94 |
Findings: the LSTM with Attention fit the training data far better (training loss 0.507 vs. 1.785) but showed a higher validation loss (6.94 vs. 5.42), suggesting some overfitting to the custom dataset; qualitatively, however, its generated responses were noticeably more coherent, as the examples below illustrate.
Example responses for the input "Hi, how are you doing?":

- Baseline LSTM: "I'm doing about about 90 of"
- LSTM with Attention: "I'm fine, how about yourself?"
The LSTM with Attention produced more coherent responses.
The attention distribution for the query "How are you doing today?" (generated response: "I'm doing great, what about you?") concentrates on the words "how" and "doing", improving response relevance. Overall, the LSTM with Attention outperformed the baseline LSTM, generating contextually accurate responses.
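As an illustration of how such an attention distribution can be inspected, the sketch below renders a weight matrix as a heatmap; the weights here are randomly generated placeholders, and the tokens, array shapes, and plotting choices are assumptions rather than the exact procedure used in this work.

```python
# Hedged sketch: visualizing attention weights as a heatmap.
# `weights` stands in for a (target_len, source_len) array of attention scores
# extracted from the model; random values are used only to keep this runnable.
import numpy as np
import matplotlib.pyplot as plt

source_tokens = ["how", "are", "you", "doing", "today", "?"]
target_tokens = ["i'm", "doing", "great", ",", "what", "about", "you", "?"]

rng = np.random.default_rng(0)
weights = rng.random((len(target_tokens), len(source_tokens)))
weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1

fig, ax = plt.subplots()
ax.imshow(weights, cmap="viridis")
ax.set_xticks(range(len(source_tokens)))
ax.set_xticklabels(source_tokens, rotation=45)
ax.set_yticks(range(len(target_tokens)))
ax.set_yticklabels(target_tokens)
ax.set_xlabel("Input tokens")
ax.set_ylabel("Generated tokens")
ax.set_title("Attention distribution")
plt.tight_layout()
plt.show()
```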
Future work: