Together-Agents is a set of Web AI Agents, currently it only consist of the webcrawler
data analysis AI Agent and PPT Generator
. It leverages Puppeteer for web scraping and together-ai free AI API's for scraped data analysis. The project is designed to support multiple routes providing a window to the web AI Agents.
Webcrawler can crawl to any url, extract data, and perform analysis on the collected information. The web crawler is mildly optimized, using puppeteer-extra plugins, random-useragents, simulating human behaviour and real browser and using open source available proxies to vist any site.
The web crawler leverages Puppeteer, a powerful headless browser automation tool, to navigate and extract data from web pages. The process involves several key steps and optimizations to ensure efficient and effective data extraction. Below is a detailed explanation of how the web crawler works, including the use of Puppeteer optimizations and the analysis of HTML content in chunks.
%%{init: {'theme':'base', 'themeVariables': {'background': '#ffffff', 'primaryColor': '#6baed6', 'edgeLabelBackground':'#ffffff', 'nodeBorder':'#005f87', 'clusterBkg':'#eaf2f8', 'secondaryColor':'#9ecae1'}}}%% graph TD; A[Initialization] -->|Setup Puppeteer| B[Crawling a Link] B -->|Navigate to Page| C[Extracting HTML Content] C -->|Split into Chunks| D[Analyzing Chunks] D -->|Send to AI Model| E[Merging Results] E -->|Combine JSON Responses| F[Returning Final Data] subgraph Puppeteer Optimizations G1[Stealth Plugin] -->|Bypass Bot Detection| C G2[Random User-Agent] -->|Prevent Detection| C G3[Proxy Rotation] -->|Avoid IP Blocking| B G4[Request Interception] -->|Skip Unnecessary Resources| C G5[Human-Like Behavior] -->|Simulate Mouse & Scroll| B G6[Browser Verification Bypass] -->|Modify Navigator Properties| B end C -->|Chunk Processing| H[Processing Chunks Sequentially] H -->|Send to AI Model| D D -->|AI JSON Response| I[Sanitizing & Parsing JSON] I -->|Clean Data| E style A fill:#b3e5fc,stroke:#005f87,stroke-width:2px,color:#005f87; style B fill:#81d4fa,stroke:#005f87,stroke-width:2px,color:#005f87; style C fill:#4fc3f7,stroke:#005f87,stroke-width:2px,color:#005f87; style D fill:#29b6f6,stroke:#005f87,stroke-width:2px,color:#005f87; style E fill:#03a9f4,stroke:#005f87,stroke-width:2px,color:#ffffff; style F fill:#0288d1,stroke:#005f87,stroke-width:2px,color:#ffffff; style G1,G2,G3,G4,G5,G6 fill:#e1f5fe,stroke:#005f87,stroke-width:1px,color:#005f87; style H fill:#80deea,stroke:#005f87,stroke-width:2px,color:#005f87; style I fill:#26c6da,stroke:#005f87,stroke-width:2px,color:#ffffff;
Stealth Plugin:
puppeteer-extra-plugin-stealth
is used to make Puppeteer less detectable by websites. This plugin helps in bypassing common anti-bot mechanisms.Random User-Agent:
random-useragent
library is used to generate random user-agent strings, making it harder for websites to detect automated requests.Proxy Rotation:
fetchProxyList
function and the launchBrowserWithProxy
method.Request Interception:
Human-Like Behavior:
emulateHumanBehavior
method introduces random mouse movements, scrolls, and clicks to mimic human behavior, further reducing the likelihood of being detected as a bot.Browser Verification Bypass:
bypassVerification
method overrides various navigator properties to mimic a real browser, including webdriver
, chrome
, plugins
, languages
, platform
, hardwareConcurrency
, deviceMemory
, userAgent
, vendor
, maxTouchPoints
, connection
, productSub
, appVersion
, product
, appCodeName
, and appName
.The analysis.js
file contains functions to process and analyze the HTML content extracted by Puppeteer. The key steps involved are:
Chunking HTML Content:
chunkContent
function splits the HTML content into manageable chunks. This is necessary because the AI model used for analysis has a token limit, and splitting the content into smaller parts ensures that each part can be processed individually.Processing Chunks Sequentially:
analyze
function processes each chunk sequentially. It sends each chunk to the AI model for analysis and collects the JSON responses.Merging JSON Responses:
mergeJsonObjects
function merges the JSON responses from each chunk into a single JSON object. This ensures that the final output is a comprehensive representation of the extracted data.Sanitizing and Parsing JSON:
sanitizeAndParseJson
function sanitizes the JSON responses from the AI model and parses them into JavaScript objects. This ensures that the data is in a usable format.Initialization:
PuppeteerService
class is initialized with various settings, including proxy rotation, request interception, and human-like behavior emulation.Crawling a Link:
crawl
method navigates to the specified link using Puppeteer. It handles retries with different proxies if the initial attempt fails.Extracting HTML Content:
crawl
method extracts the HTML content of the page and splits it into chunks using the chunkContent
function from analysis.js
.Analyzing Chunks:
analyze
function from analysis.js
. The AI model extracts relevant information from each chunk and returns a JSON response.Merging Results:
mergeJsonObjects
function from analysis.js
.Returning Results:
The web crawler leverages Puppeteer to navigate web pages and extract HTML content. It uses various optimizations to avoid detection and improve performance. The extracted HTML content is split into chunks and analyzed using an AI model to extract relevant data. The results from each chunk are merged into a single JSON object, providing a comprehensive representation of the extracted data.
This approach ensures that the web crawler can efficiently and effectively extract data from web pages, even those with complex structures and anti-bot mechanisms.
Start the application:
npm start
The application will start at port 8080. You can do a post request to the /webcrawler
route with the following payload.
{ "link":"https://example.com", "source" : 0, // 0 for Meta Llama 3.3 Turbo Instruct , 1 for DeepSeek R1 "prompt": "Please extract the following data:" }
The ppt_generator
module is a powerful tool for generating PowerPoint presentations using AI models. It leverages the together-ai
library for AI-driven content generation and the PptxTemplateEngine
for creating presentation slides.
The ppt_generator
module uses AI models to generate the structure and content of PowerPoint presentations. The process involves several key steps:
graph TD; A[Start] --> B[Initialize PPTGenerator]; B --> C[Generate Schema]; C -->|AI Model| D[Schema JSON Output]; D --> E[Generate Slides]; E -->|AI Model| F[Slide JSON Output]; F --> G{Does Slide Need Image?}; G -- Yes --> H[Generate Image]; H -->|AI Model| I[Image Path Output]; G -- No --> J[Skip Image Generation]; I --> K[Assemble Presentation]; J --> K; K -->|Using PptxTemplateEngine| L[Final PPTX Output]; L --> M[Save PPT to Public Directory]; M --> N[End]; subgraph AI Process C E H end
Initialization:
PPTGenerator
class is initialized with various settings, including the AI client, models, and system prompts.Generating Schema:
generateSchema
method sends a prompt to the AI model to generate the overall structure of the presentation, including the number of slides and their types.Generating Slides:
generateSlide
method generates the content for each slide based on the schema. It sends the slide title and description to the AI model, which returns a JSON object representing the slide's layout and content.Generating Images:
generateImage
method generates images based on text prompts using the AI model. It returns the path to the generated image.Assembling the Presentation:
generatePresentation
method assembles the slides into a final PowerPoint presentation using the PptxTemplateEngine
. It handles the layout and formatting of each slide based on the JSON content generated by the AI model.Initialization:
PPTGenerator
class is initialized with the AI client, models, and system prompts. It also sets up the template engine with the specified color scheme.Generating Schema:
generateSchema
method sends a prompt to the AI model to generate the presentation schema. It returns a JSON object representing the structure of the presentation.Generating Slides:
generateSlide
method generates the content for each slide based on the schema. It sends the slide title and description to the AI model, which returns a JSON object representing the slide's layout and content.Generating Images:
generateImage
method generates images based on text prompts using the AI model. It returns the path to the generated image.Assembling the Presentation:
generatePresentation
method assembles the slides into a final PowerPoint presentation using the PptxTemplateEngine
. It handles the layout and formatting of each slide based on the JSON content generated by the AI model.The AI process involves several steps to generate the content for each slide:
Schema Generation:
Slide Generation:
Image Generation:
Presentation Assembly:
PptxTemplateEngine
assembles the slides into a final PowerPoint presentation. It handles the layout and formatting of each slide based on the JSON content generated by the AI model.The ppt_generator
module leverages AI models to generate PowerPoint presentations. It uses the together-ai
library for AI-driven content generation and the PptxTemplateEngine
for creating presentation slides. The module handles the initialization, schema generation, slide generation, image generation, and assembly of the final presentation.
This approach ensures that the ppt_generator
module can efficiently and effectively generate PowerPoint presentations with AI-driven content.
Start the application:
npm start
The application will start at port 8080. You can do a post request to the /generate/ppt
route with the following payload.
{ "prompt": "Your presentation prompt" }
public
directory.Clone the repository:
git clone https://github.com/yourusername/together-agents.git cd together-agents
Install the dependencies:
npm install
Set up environment variables:
Create a .env
file in the root directory and add the necessary environment variables. For example:
API_KEY=TOGETHER_AI_API_KEY
The project uses the following dependencies:
dotenv
: ^16.4.7express
: ^4.21.2puppeteer
: ^24.1.1puppeteer-extra
: ^3.3.6puppeteer-extra-plugin-stealth
: ^2.11.2random-useragent
: ^0.5.0sqlite3
: ^5.1.7 -> For future use cases 😀
together-ai
: ^0.13.0pptxgenjs
: ^3.1.0Contributions are welcome! Please follow these steps:
There are no datasets linked
There are no datasets linked