AskMyArticle is a fully integrated, retrieval-augmented question-answering system that lets users extract insights from any online article using natural-language queries. Built on modern AI/NLP tools such as LangChain, the OpenAI API, FAISS, and Streamlit, the application is designed for precise, context-grounded Q&A over unstructured HTML content.
This documentation provides a comprehensive, technically detailed breakdown of the entire project pipeline, aligned directly with the source code and implementation strategy.
The system is architected in two main phases: an indexing phase, in which article content is loaded, split into chunks, embedded, and stored in a FAISS index; and a query phase, in which a user's question is answered by retrieving the most relevant chunks and generating a grounded response.
The application begins by loading API credentials and configuring environment variables using python-dotenv:
```python
from dotenv import load_dotenv

load_dotenv()
```
This practice ensures sensitive credentials (e.g., OpenAI API keys) remain external to the codebase, enhancing both security and maintainability.
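A quick sanity check can confirm that the key was actually picked up. The snippet below is a minimal sketch, assuming the key is stored under the conventional name `OPENAI_API_KEY`; it fails fast if the variable is missing rather than erroring deep inside an API call:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads key/value pairs from a local .env file into the environment

# Fail fast if the key is absent
if os.getenv("OPENAI_API_KEY") is None:
    raise RuntimeError("OPENAI_API_KEY not found; add it to your .env file")
```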
Streamlit is used to construct an interactive, user-centric interface:
```python
import streamlit as st

st.set_page_config(page_title="AskMyArticle", layout="wide")
st.markdown(
    "<h1 style='text-align: center; color: #2c3e50;'>"
    "AskMyArticle: Smart Q&A from Online Articles</h1>",
    unsafe_allow_html=True,
)
```
The sidebar allows users to input up to three article URLs. This design supports comparative or aggregated Q&A over multiple sources.
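The exact widget code is not shown above; a minimal sketch of how three URL fields might be collected from the sidebar (widget labels are illustrative) looks like this:

```python
import streamlit as st

# Collect up to three article URLs from the sidebar, skipping empty fields
urls = []
for i in range(3):
    url = st.sidebar.text_input(f"Article URL {i + 1}")
    if url.strip():
        urls.append(url.strip())
```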
Article content is fetched using LangChain's UnstructuredURLLoader:
```python
# Import path shown for recent LangChain versions (older releases
# expose the loader under langchain.document_loaders)
from langchain_community.document_loaders import UnstructuredURLLoader

loader = UnstructuredURLLoader(urls=urls)
docs = loader.load()
```
This loader extracts the visible text from the webpage, which forms the base corpus for semantic indexing.
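Each returned Document pairs the extracted text with its originating URL, which is useful for verifying what was actually scraped:

```python
# Inspect what the loader extracted from each URL
for doc in docs:
    print(doc.metadata["source"])       # originating URL
    print(doc.page_content[:200])       # first 200 characters of visible text
```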
To optimize embedding and retrieval granularity, long documents are split into semantically coherent chunks:
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    separators=['\n\n', '\n', '.', ','],
    chunk_size=1000,
)
split_docs = text_splitter.split_documents(docs)
```
The recursive strategy prefers natural boundaries, splitting on paragraphs first, then sentences and clauses, so that each chunk remains logically coherent.
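A quick check of the resulting granularity can confirm the splitter behaved as expected (a chunk can exceed `chunk_size` only when a single separator-free span is longer than the limit):

```python
# Sanity-check chunk sizes after splitting
lengths = [len(chunk.page_content) for chunk in split_docs]
print(f"{len(split_docs)} chunks; longest chunk = {max(lengths)} characters")
```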
Each chunk is embedded into a dense vector space using OpenAI's embedding model via LangChain:
```python
# Import path for recent versions; older releases use langchain.embeddings
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
```
These embeddings capture the semantic meaning of each chunk, enabling accurate contextual retrieval.
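The effect is easy to illustrate: semantically related sentences produce nearby vectors. The example below is a small sketch, not part of the application; the sentences are arbitrary and the cosine similarity is computed by hand with NumPy:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = np.array(embeddings.embed_query("The economy grew last quarter."))
v2 = np.array(embeddings.embed_query("GDP rose over the past three months."))
v3 = np.array(embeddings.embed_query("My cat likes to sleep on the sofa."))

print(cosine(v1, v2))  # paraphrases score relatively high
print(cosine(v1, v3))  # unrelated text scores noticeably lower
```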
FAISS (Facebook AI Similarity Search) is used to build an in-memory index over the vector embeddings:
```python
from langchain_community.vectorstores import FAISS

index = FAISS.from_documents(split_docs, embeddings)
index.save_local("INDEX")  # persist the index to the local "INDEX" directory
```
The index is persisted locally to allow for fast retrieval without recomputing embeddings.
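Before wiring up the full QA chain, the index can be exercised directly with a raw similarity search (the query string here is just an example):

```python
# Retrieve the 4 chunks most similar to a query
hits = index.similarity_search("What is the article's main argument?", k=4)
for hit in hits:
    print(hit.metadata["source"], "->", hit.page_content[:100])
```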
A carefully designed prompt is used to steer the language model's behavior during inference:
```python
from langchain.prompts import PromptTemplate

custom_prompt = PromptTemplate(
    template="""You are an AI assistant helping a user understand articles.
Use ONLY the context below to answer the question.
If the answer cannot be found in the context, say "I don't know" and do not make up an answer.

Context: {context}

Question: {question}

Answer:""",
    input_variables=["context", "question"],
)
```
This prompt grounds generation strictly in the retrieved content and instructs the model to say "I don't know" rather than fabricate an answer.
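Rendering the template makes it easy to see exactly what the model receives; the context and question below are placeholders:

```python
# Fill the template with example values to preview the final prompt text
print(custom_prompt.format(
    context="Example passage retrieved from an article.",
    question="What does the article claim?",
))
```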
Users enter their questions via a main input box:
```python
query = st.text_input("Enter your question here:")
```
This input triggers semantic retrieval and answer generation over the indexed articles.
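Because Streamlit re-runs the script on every interaction, the retrieval step is typically guarded on a non-empty query; a minimal sketch:

```python
if query:
    st.info(f"Searching the indexed articles for: {query}")
    # ...load the index, run the chain, and render the answer (shown below)
```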
At inference time, the vector store is queried, and a QA chain is invoked:
```python
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# The source does not show how llm is constructed; any chat model works here
llm = ChatOpenAI(temperature=0)

vectorstore = FAISS.load_local(
    "INDEX", OpenAIEmbeddings(), allow_dangerous_deserialization=True
)
chain = RetrievalQAWithSourcesChain.from_llm(
    llm=llm, retriever=vectorstore.as_retriever()
)
result = chain.invoke({"question": query}, return_only_outputs=True)
```
The retriever fetches the most relevant chunks from the vector store, and the language model generates an answer constrained by the prompt. Note that for the custom prompt to take effect it must be supplied when the chain is constructed; `from_llm` as shown above falls back to LangChain's default prompts.
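The chain returns a dictionary containing both the grounded answer and the sources it drew on, which can be unpacked directly:

```python
# RetrievalQAWithSourcesChain returns "answer" and "sources" keys
answer = result["answer"]
sources = result.get("sources", "")  # comma-separated source URLs
print(answer)
print("Sources:", sources)
```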
The final answer is rendered elegantly using HTML styling within Streamlit:
st.markdown(f"<div style='background-color:#f0f8ff;padding:15px;border-radius:10px;'>{result['answer']}</div>", unsafe_allow_html=True)
AskMyArticle demonstrates a modular, end-to-end retrieval-QA pipeline that integrates modern NLP infrastructure with a user-friendly interface. From real-time embedding of web content to safe, prompt-grounded answer generation, each component is built for reliability, transparency, and extensibility.
This architecture can be scaled or extended for domain-specific knowledge extraction, enterprise document analysis, or academic research assistants. Prompt design plays a pivotal role in governing LLM behavior, and this project emphasizes that through a carefully curated and enforced prompt template.