
A cross-lingual information retrieval system that enables real-time semantic search over datasets in four languages (English, German, French, Spanish) without intermediate machine translation. Built on Sentence-Transformers and FAISS, the engine embeds multilingual customer feedback into a shared 384-dimensional vector space, so Product Managers can query global data in English and retrieve semantically relevant feedback regardless of the language it was written in.
In modern global commerce, user feedback is generated in a multitude of languages. For Product Managers and Data Analysts, this creates a "Language Silo": valuable insights regarding product defects, feature requests, or sentiment are locked away in languages the core team does not speak.
Traditional approaches to solving this have significant limitations:
Keyword Search (BM25): Matches on shared tokens, so it fails across languages (e.g., a search for "Screen" will not find the Spanish "Pantalla").
Translation Pipelines: Translating millions of reviews via APIs is costly and slow, and it introduces "translation noise" before analysis even begins.
This project implements a Neural Information Retrieval (NIR) system that bypasses translation entirely. Instead of translating text, we map sentences from all languages into a shared Semantic Vector Space.
By using a pre-trained multilingual Transformer model (paraphrase-multilingual-MiniLM-L12-v2), the system encodes English queries and foreign-language documents such that semantically similar concepts (e.g., "Bad battery" and "Batterie défaillante") are projected to proximal points in the vector space.
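The idea of "proximal points" can be made concrete with a small, dependency-free sketch of cosine-similarity ranking. The 3-dimensional vectors below are hypothetical stand-ins for real 384-dimensional embeddings, and `top_k` is an illustrative helper rather than the project's actual retrieval code. The `encode_texts` function shows how real embeddings could be obtained from the model named above; it assumes the `sentence-transformers` package is installed and is not called here because it downloads model weights.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:k]]


def encode_texts(texts):
    """Encode texts with the multilingual model (assumes sentence-transformers)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    return model.encode(texts, normalize_embeddings=True).tolist()


# Hypothetical toy embeddings: each axis loosely stands for one topic.
toy_docs = {
    "Batterie défaillante": [0.9, 0.1, 0.0],
    "Pantalla rota": [0.1, 0.9, 0.1],
    "Schnelle Lieferung": [0.0, 0.1, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # stand-in for the English query "Bad battery"

texts = list(toy_docs)
hits = top_k(toy_query, list(toy_docs.values()), k=2)
for idx, score in hits:
    print(f"{score:.3f}  {texts[idx]}")
```

With real embeddings, the same ranking step is what an exact inner-product search over normalized vectors performs; the toy version only shows why a query and a foreign-language document can match without sharing any words.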

3.1 Embedding Layer:
3.2 Retrieval Engine:

3.4 Hybrid Filtering:
Latency: Sub-millisecond retrieval times on CPU (FAISS), enabling real-time dashboards.
Accuracy: Qualitative analysis shows that the system identifies specific product defects (e.g., "battery drainage," "broken screens") across 4 languages using only English queries.
Scalability: The vector-based approach scales linearly with data volume, avoiding the $X/token cost of translation APIs.
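To give the linear-scaling claim a rough shape, the sketch below estimates storage and per-query work as the corpus grows. It assumes vectors are stored uncompressed as 32-bit floats and searched exhaustively; the source does not say which FAISS index type is used, so a compressed or approximate index would behave differently. The function names are illustrative only.

```python
def flat_index_bytes(n_vectors, dim=384, bytes_per_value=4):
    """Bytes needed to store n_vectors uncompressed (float32 by default)."""
    if n_vectors < 0:
        raise ValueError("n_vectors must be non-negative")
    return n_vectors * dim * bytes_per_value


def exhaustive_multiply_adds(n_vectors, dim=384):
    """Multiply-add operations for one query compared against every vector."""
    return n_vectors * dim


for n in (10_000, 1_000_000, 100_000_000):
    gigabytes = flat_index_bytes(n) / 1e9
    ops = exhaustive_multiply_adds(n)
    print(f"{n:>11,} vectors: {gigabytes:8.3f} GB, {ops:,} multiply-adds per query")
```

Under these assumptions both quantities grow in direct proportion to the number of stored reviews, which is the sense in which the approach scales linearly.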