This project presents BrieflyAI, a comprehensive news intelligence system that aggregates and analyzes information from multiple sources, including Google News and Reddit discussions. The system employs natural language processing with Groq's LLaMA models to generate structured summaries and insights from real-time data. BrieflyAI addresses the challenge of information overload by providing users with concise, well-structured analyses of topics across different media platforms. The system integrates web scraping capabilities, rate-limited API calls, and a user-friendly Streamlit interface to deliver comprehensive news intelligence. Experimental results demonstrate the system's effectiveness in processing multiple topics concurrently while maintaining data quality and producing actionable insights.
In today's information-rich environment, staying updated with current events across multiple platforms presents significant challenges. Traditional news consumption methods often lead to information silos, where users miss crucial discussions happening on social media platforms like Reddit. BrieflyAI was developed to bridge this gap by creating an integrated analysis system that combines traditional news sources with community-driven discussions.
The primary challenges addressed by this system include:
BrieflyAI implements a microservices architecture that combines:
```python
# Core system architecture components
class NewsScraper:
    _rate_limiter = AsyncLimiter(5, 1)  # 5 requests/second

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10)
    )
    async def scrape_news(self, topics: List[str]) -> Dict[str, str]:
        """Scrape and analyze news articles with structured summaries"""
        results = {}
        raw_headlines = {}  # Store raw headlines for debugging

        for topic in topics:
            async with self._rate_limiter:
                try:
                    # Generate news URLs for the topic
                    urls = generate_news_urls_to_scrape([topic])

                    # Scrape the news page
                    search_html = scrape_with_brightdata(urls[topic])
                    clean_text = clean_html_to_text(search_html)
                    headlines = extract_headlines(clean_text)

                    # Store raw headlines for potential debugging
                    raw_headlines[topic] = headlines

                    # Generate structured summary using the updated function
                    if headlines.strip():
                        summary = summarize_with_groq_structured(
                            api_key=GROQ_API_KEY,
                            headlines=headlines
                        )
                        results[topic] = summary
                    else:
                        results[topic] = f"No headlines found for topic: {topic}"
                except Exception as e:
                    print(f"Error scraping news for topic '{topic}': {str(e)}")
                    results[topic] = f"Error analyzing {topic}: {str(e)}"

            # Rate limiting to be respectful to news sites
            await asyncio.sleep(1)

        return {
            "news_analysis": results,
            "raw_headlines": raw_headlines,  # Include raw data for debugging
            "metadata": {
                "total_topics": len(topics),
                "successful_scrapes": len(
                    [r for r in results.values() if not r.startswith("Error")]
                ),
                "scraping_method": "brightdata"
            }
        }

    @retry(
        stop=stop_after_attempt(2),
        wait=wait_exponential(multiplier=1, min=1, max=5)
    )
    async def scrape_single_topic(self, topic: str) -> Dict[str, str]:
        """Scrape a single topic for more focused analysis"""
        try:
            async with self._rate_limiter:
                urls = generate_news_urls_to_scrape([topic])
                search_html = scrape_with_brightdata(urls[topic])
                clean_text = clean_html_to_text(search_html)
                headlines = extract_headlines(clean_text)

                if headlines.strip():
                    summary = summarize_with_groq_structured(
                        api_key=GROQ_API_KEY,
                        headlines=headlines
                    )
                    return {
                        "topic": topic,
                        "summary": summary,
                        "raw_headlines": headlines,
                        "status": "success"
                    }
                else:
                    return {
                        "topic": topic,
                        "summary": f"No current news found for: {topic}",
                        "raw_headlines": "",
                        "status": "no_data"
                    }
        except Exception as e:
            print(f"Error in single topic scrape for '{topic}': {str(e)}")
            return {
                "topic": topic,
                "summary": f"Error: {str(e)}",
                "raw_headlines": "",
                "status": "error"
            }

    async def get_news_health_check(self) -> Dict[str, str]:
        """Test if news scraping is working properly"""
        test_topic = "technology"
        try:
            result = await self.scrape_single_topic(test_topic)
            return {
                "status": "healthy" if result["status"] == "success" else "degraded",
                "test_topic": test_topic,
                "details": result["status"]
            }
        except Exception as e:
            return {
                "status": "unhealthy",
                "test_topic": test_topic,
                "details": str(e)
            }
```
The system follows a modular architecture with distinct components for different functionalities:
The news scraping module utilizes BrightData's web unlocker service to access Google News:
```python
def generate_valid_news_url(keyword: str) -> str:
    """
    Generate a Google News search URL for a keyword, sorted by latest.

    Args:
        keyword: Search term to use in the news search

    Returns:
        str: Constructed Google News search URL
    """
    q = quote_plus(keyword)
    return f"https://news.google.com/search?q={q}&tbs=sbd:1"


def scrape_with_brightdata(url: str) -> str:
    """Scrape a URL using BrightData with improved error handling"""
    # Validate environment variables
    api_key = BRIGHTDATA_API_KEY
    zone = WEB_UNLOCKER_ZONE

    if not api_key:
        raise ValueError("BRIGHTDATA_API_KEY environment variable is not set")
    if not zone:
        raise ValueError("BRIGHTDATA_WEB_UNLOCKER_ZONE environment variable is not set")

    # Validate URL
    if not url or not url.startswith(('http://', 'https://')):
        raise ValueError("Invalid URL provided")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    # Payload based on the BrightData API documentation
    payload = {
        "zone": zone,
        "url": url,
        "format": "raw",
        "country": "US",  # Add country if required
        "render": False   # Set to True if you need JavaScript rendering
    }

    try:
        print(f"Attempting to scrape: {url}")
        print(f"Using zone: {zone}")

        response = requests.post(
            "https://api.brightdata.com/request",
            json=payload,
            headers=headers,
            timeout=30  # Add timeout
        )

        # Log response details for debugging
        print(f"Response status: {response.status_code}")
        print(f"Response headers: {dict(response.headers)}")

        if response.status_code == 400:
            # Try to get more specific error details
            try:
                error_detail = response.json()
                print(f"BrightData error details: {error_detail}")
                raise HTTPException(
                    status_code=500,
                    detail=f"BrightData API error: {error_detail.get('message', 'Bad Request')}"
                )
            except ValueError:  # body was not valid JSON
                print(f"Response text: {response.text}")
                raise HTTPException(
                    status_code=500,
                    detail=f"BrightData API error: 400 Bad Request - {response.text}"
                )

        response.raise_for_status()
        return response.text

    except requests.exceptions.RequestException as e:
        print(f"Request exception: {str(e)}")
        raise HTTPException(status_code=500, detail=f"BrightData error: {str(e)}")
```
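To illustrate what `generate_valid_news_url` produces, the following self-contained sketch repeats its core logic using only the standard library (`quote_plus` comes from `urllib.parse`; the function body mirrors the one above):

```python
from urllib.parse import quote_plus

def generate_valid_news_url(keyword: str) -> str:
    """Build a Google News search URL, sorted by latest (tbs=sbd:1)."""
    q = quote_plus(keyword)  # percent-encode spaces and special characters
    return f"https://news.google.com/search?q={q}&tbs=sbd:1"

print(generate_valid_news_url("climate change"))
# → https://news.google.com/search?q=climate+change&tbs=sbd:1
```

Spaces become `+` via `quote_plus`, so multi-word topics remain valid query strings.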
For Reddit data collection, the system implements MCP (Model Context Protocol) integration:
```python
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=15, max=60),
    retry=retry_if_exception_type(MCPOverloadedError),
    reraise=True
)
async def process_topic(agent, topic: str):
    async with mcp_limiter:
        messages = [
            {
                "role": "system",
                "content": f"""You are a Reddit analysis expert. Use available tools to:
                1. Find top 2 posts about the given topic BUT only after {two_weeks_ago_str}, NOTHING before this date strictly!
                2. Analyze their content and sentiment
                3. Create a summary of discussions and overall sentiment"""
            },
            {
                "role": "user",
                "content": f"""Analyze Reddit posts about '{topic}'. Provide a comprehensive summary including:
                - Main discussion points
                - Key opinions expressed
                - Any notable trends or patterns
                - Summarize the overall narrative, discussion points and also quote interesting comments without mentioning names
                - Overall sentiment (positive/neutral/negative)"""
            }
        ]

        try:
            response = await agent.ainvoke({"messages": messages})
            return response["messages"][-1].content
        except Exception as e:
            if "Overloaded" in str(e):
                raise MCPOverloadedError("Service overloaded")
            else:
                raise
```
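The retry policy above (three attempts, exponential wait between 15 and 60 seconds, retrying only on `MCPOverloadedError`, re-raising once attempts are exhausted) comes from the `tenacity` library. The same idea can be sketched with the standard library alone; the decorator and the flaky analysis function below are illustrative stand-ins, with wait times scaled down for demonstration:

```python
import asyncio

class MCPOverloadedError(Exception):
    """Raised when the MCP service reports it is overloaded."""

def retry_on_overload(attempts=3, base_wait=0.01, max_wait=0.04):
    """Retry an async function on MCPOverloadedError with exponential backoff."""
    def decorator(fn):
        async def wrapper(*args, **kwargs):
            wait = base_wait
            for attempt in range(1, attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except MCPOverloadedError:
                    if attempt == attempts:
                        raise  # out of attempts: propagate, like reraise=True
                    await asyncio.sleep(wait)
                    wait = min(wait * 2, max_wait)  # exponential backoff, capped
        return wrapper
    return decorator

calls = {"n": 0}

@retry_on_overload()
async def flaky_topic_analysis(topic: str) -> str:
    """Fails twice with an overload error, then succeeds on the third attempt."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise MCPOverloadedError("Service overloaded")
    return f"summary of {topic}"

print(asyncio.run(flaky_topic_analysis("technology")))
```

Retrying only on the overload exception matters: genuine errors (bad credentials, malformed requests) should surface immediately rather than burn three slow attempts.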
The system employs Groq's LLaMA models for content analysis and summarization:
```python
def summarize_with_groq_structured(api_key: str, headlines: str) -> str:
    """
    Summarize headlines into a structured format for UI display using Groq.
    """
    system_prompt = """
    You are a professional news analyst creating structured summaries for web display.
    Transform the provided headlines into a well-organized, comprehensive summary with:

    1. **Executive Summary**: Brief overview of the main themes
    2. **Key Stories**: Major headlines with detailed explanations
    3. **Analysis**: What these stories mean and their significance
    4. **Trends**: Patterns or connections between different stories

    Format guidelines:
    - Use clear markdown headings (##, ###)
    - Present key points as bullet points
    - Include specific details and context
    - Maintain professional, informative tone
    - Focus on clarity and readability
    - Make it comprehensive but digestible

    Create a structured report that would be suitable for display on a news
    dashboard or summary page.
    """

    try:
        llm = ChatGroq(
            model=LLAMA_70b_model,
            api_key=api_key,
            temperature=TEMPERATURE,
            max_tokens=MAX_TOKEN_1
        )

        response = llm.invoke([
            SystemMessage(content=system_prompt),
            HumanMessage(content=f"Headlines to analyze:\n\n{headlines}")
        ])
        return response.content
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"GROQ error: {str(e)}")
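Because the summarizer is instructed to emit markdown with `##` headings, downstream UI code can split the model's response into named sections before rendering. A minimal sketch of such a parser (a hypothetical helper, not part of the BrieflyAI codebase) might look like:

```python
def split_markdown_sections(summary: str) -> dict:
    """Split a markdown summary on '## ' headings into {heading: body} pairs."""
    sections, current, lines = {}, None, []
    for line in summary.splitlines():
        if line.startswith("## "):
            if current is not None:
                sections[current] = "\n".join(lines).strip()
            current, lines = line[3:].strip(), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        sections[current] = "\n".join(lines).strip()
    return sections

demo = "## Executive Summary\nAI dominates the cycle.\n## Trends\n- Chips\n- Agents"
print(split_markdown_sections(demo))
```

Each section can then be placed in its own dashboard panel instead of rendering one long block of text.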
The experimental evaluation focused on three key areas:
Topics Tested: Technology, Climate Change, Cryptocurrency, Political Events, Healthcare
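The abstract notes that BrieflyAI processes multiple topics concurrently. Fanning the five test topics out with `asyncio` can be sketched as follows; the `analyze` stub is a stand-in for the real scrape-and-summarize pipeline, which calls BrightData and Groq:

```python
import asyncio

TOPICS = ["Technology", "Climate Change", "Cryptocurrency",
          "Political Events", "Healthcare"]

async def analyze(topic: str) -> tuple[str, str]:
    """Stand-in for scrape + summarize; simulates I/O latency with a sleep."""
    await asyncio.sleep(0.01)
    return topic, f"structured summary for {topic}"

async def analyze_all(topics: list[str]) -> dict[str, str]:
    # gather runs every per-topic analysis concurrently rather than one by one
    pairs = await asyncio.gather(*(analyze(t) for t in topics))
    return dict(pairs)

results = asyncio.run(analyze_all(TOPICS))
print(results["Technology"])  # → structured summary for Technology
```

With I/O-bound work like web scraping and API calls, total latency approaches that of the slowest topic rather than the sum of all five.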
The BrieflyAI system features a modern, intuitive interface designed for optimal user experience:

The system demonstrated robust data collection capabilities:
Generated summaries showed high quality across multiple dimensions:
BrieflyAI represents a significant advancement in automated news intelligence systems. The project successfully demonstrates the feasibility of creating a comprehensive multi-source analysis platform that combines traditional news sources with community-driven discussions. The system's modular architecture, robust error handling, and user-friendly interface make it a valuable tool for information consumers seeking efficient access to diverse perspectives on current events.
BrieflyAI has potential applications in:
The successful implementation of BrieflyAI demonstrates the viability of AI-powered news intelligence systems and provides a foundation for future developments in automated information processing and analysis.
Repository and Resources
GitHub Repository: BrieflyAI Project
Technology Stack: Python, FastAPI, Streamlit, Groq LLM, BrightData