This publication was originally written in Spanish. You're viewing an automated translation into English.
Web application that processes files (PDF, DOCX, PPTX) or YouTube transcripts to generate automatic summaries and multiple-choice quizzes.
Why this project?
This generator automates the creation of summaries and quizzes from study materials, helping students and teachers reduce study time and provide quick assessments based on existing content. It's ideal for situations where large amounts of text or audiovisual content need to be processed efficiently.
Who is it for?
The system is designed to be used by:
Optimal threshold: 0.85 (configurable in code)
The histogram shows how similarity analysis is performed to determine the relationship between text sections. A higher similarity threshold can improve the accuracy of summaries.
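As a rough illustration of how a threshold like this might be applied, the sketch below scores two text sections with cosine similarity over bag-of-words counts and compares the result to 0.85. The bag-of-words representation and the names `cosine_similarity` and `is_related` are assumptions made for this example; the project itself may compute similarity differently (for instance with spaCy vectors).

```python
import math
import re
from collections import Counter

# Value quoted above; assumed here to be a plain module-level constant.
SIMILARITY_THRESHOLD = 0.85


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # Empty texts have no direction, so treat them as unrelated.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_related(a: str, b: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True when the two sections are at least `threshold` similar."""
    return cosine_similarity(a, b) >= threshold
```

Keeping the threshold as a default argument makes it easy to try other values without touching the scoring logic.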
Selecting between local files or YouTube URLs
The interface is easy to use and allows users to upload local files or paste YouTube links directly to start generating summaries and quizzes.
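To tell the two kinds of input apart, an app like this needs some way to recognise a YouTube link and pull out its video ID. The sketch below shows one possible approach using only the standard library; the function name `extract_youtube_id` and the set of accepted hosts are assumptions for illustration, not the project's actual code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts accepted in this sketch (an assumption; other URL shapes exist).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_youtube_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if it is not one."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```

Anything for which this returns `None` can then be treated as something other than a video link, such as an uploaded file.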
Reduction from 728 words → 120 words (83% more concise)
The summary is generated using NLP techniques such as TextRank, allowing a considerable reduction in text length without losing the essence of the content.
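To make the idea behind TextRank concrete, here is a minimal pure-Python sketch of extractive summarization: sentences are nodes, word overlap gives edge weights, and a PageRank-style iteration scores each sentence. This is not the project's pipeline (which relies on spaCy); the function names, the regex sentence splitter, and the parameter defaults are all assumptions chosen to keep the example self-contained.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap(a, b):
    """Shared words, normalised by the log of each sentence's vocabulary size."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) + log(1) = 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    """Return the n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= n_sentences:
        return sentences
    weights = [
        [overlap(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Weighted PageRank update; `damping` is the usual PageRank factor.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Sentences that share vocabulary with many others accumulate score, while isolated sentences keep only the baseline score and are dropped first.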
The quiz consists of 5 multiple-choice questions with explanations. This number can be changed in the code, but the more questions requested, the higher the token cost of each query, which in turn reduces the number of requests that can be made to the API. For that reason, 5 questions were used in this case.
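One way the question count could be exposed as a setting is sketched below: a helper builds the instruction sent to the model, and a second helper checks the JSON that comes back. The prompt wording, the JSON keys (`question`, `options`, `answer`, `explanation`), and both function names are hypothetical; the project's actual prompt and schema may differ.

```python
import json

# Number of questions used in this project; raising it increases the
# token cost of each request.
N_QUESTIONS = 5

REQUIRED_KEYS = {"question", "options", "answer", "explanation"}


def build_quiz_prompt(text: str, n_questions: int = N_QUESTIONS) -> str:
    """Build the instruction for the model (hypothetical wording)."""
    return (
        f"Generate {n_questions} multiple-choice questions about the text below. "
        'Reply only with a JSON list of objects with the keys "question", '
        '"options", "answer" and "explanation".\n\n'
        f"Text:\n{text}"
    )


def parse_quiz(raw: str, n_questions: int = N_QUESTIONS) -> list:
    """Parse the model reply and check that it has the expected shape."""
    quiz = json.loads(raw)
    if not isinstance(quiz, list) or len(quiz) != n_questions:
        raise ValueError(f"expected a list of {n_questions} questions")
    for item in quiz:
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"question is missing keys: {sorted(missing)}")
        if item["answer"] not in item["options"]:
            raise ValueError("answer must be one of the options")
    return quiz
```

Validating the reply before showing it guards against the model returning fewer questions than requested or malformed JSON.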
The generated quiz is interactive and allows users to assess their understanding of the material. It also offers detailed explanations for each answer.
```mermaid
flowchart TD
    A[Input] -->|PDF/DOCX/PPTX file| B(Text Extraction)
    A -->|YouTube URL| C(Transcript API)
    B --> D[Processed Text]
    C --> D
    D --> E{Selected Mode}
    E -->|Generate Summary| F[spaCy + TextRank]
    E -->|Generate Quiz| G[Llama3 70B\nvia NVIDIA API]
    F --> H[Automatic Summary\n80% word reduction]
    G --> I[JSON Quiz\n5 questions with options]
    H --> J[(Output:\nMarkdown/Interface)]
    I --> J
    K[Streamlit] -->|Web Interface| L[End User]

    %% Styles
    classDef tech fill:#4CAF50,color:white,stroke:#388E3C;
    classDef data fill:#2196F3,color:white,stroke:#1976D2;
    classDef output fill:#FF9800,color:white,stroke:#F57C00;
    classDef tool fill:#9C27B0,color:white,stroke:#7B1FA2;
    class B,C,F,G,K tech;
    class D,A data;
    class H,I,J output;
    class L tool;
```
```mermaid
classDiagram
    class Streamlit {
        +file_uploader()
        +text_input()
        +button()
    }
    class spaCy {
        +load(model_name)
        +add_pipe(algorithm)
    }
    class NVIDIA_API {
        +base_url: string
        +model: string
    }
    Streamlit --> spaCy : Uses
    Streamlit --> NVIDIA_API : Queries
```
Upload a file (PDF/DOCX/PPTX) or paste a YouTube URL.
Choose the mode: Generate Summary or Generate Quiz.
Explore the results.
Text is extracted with `PyPDF2` for PDFs and `python-docx` for Word documents. It is then processed to generate the summary or quiz.

Name | Use | License |
---|---|---|
es_core_news_md | Text processing in Spanish | MIT |
Llama 3 70B | Question generation | Proprietary (NVIDIA) |
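
The extraction step mentioned above (`PyPDF2` for PDFs, `python-docx` for Word documents) could be organised as a small dispatcher keyed on file extension. The sketch below is an assumption about structure, not the project's code: the function names and the `EXTRACTORS` mapping are hypothetical, it assumes a PyPDF2 release that provides `PdfReader`, and PPTX is left out because the text does not name a library for it.

```python
from pathlib import Path


def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page in a PDF."""
    from PyPDF2 import PdfReader  # lazy import: only needed for PDFs

    reader = PdfReader(path)
    # extract_text() may yield nothing for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_docx_text(path: str) -> str:
    """Concatenate the text of every paragraph in a Word document."""
    from docx import Document  # provided by the python-docx package

    return "\n".join(p.text for p in Document(path).paragraphs)


# Hypothetical mapping from file extension to extractor.
EXTRACTORS = {".pdf": extract_pdf_text, ".docx": extract_docx_text}


def extract_text(path: str) -> str:
    """Pick an extractor from the file extension and run it."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"unsupported file type: {suffix or path}")
    return EXTRACTORS[suffix](path)
```

Importing each library inside its own function means a missing dependency only matters when that file type is actually used.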
To view the diagrams, install the Markdown Preview Mermaid Support extension for VS Code, which renders Mermaid blocks correctly in the Markdown preview.
⚠️ Limitations:
🛠️ Code available at: GitHub/repo