
MakeItReal is an experimental multi-agent AI system designed to help aspiring developers turn vague software ideas into a concrete, actionable project plan. The tool acts as a virtual project manager and technical advisor, guiding users (especially those without formal software engineering experience) from an initial concept to a structured Minimum Viable Product (MVP) specification. By breaking down a high-level idea into well-defined features, suggesting an appropriate technology stack, and outlining a set of development tasks, MakeItReal bridges the gap between ideation and implementation. This publication describes the purpose and design of MakeItReal, how it works, and how to use it in practice.
Many hobbyist programmers and newcomers to software development struggle with the initial planning phase of a project. They might start with a great idea, but due to limited experience in requirements gathering and project scoping, they find it difficult to refine that idea into a clear set of features and tasks. This often leads to false starts, scope creep, or projects that stall out. In fact, studies have shown that nearly half of software project failures can be traced back to unclear or incomplete requirements. Clearly, a well-defined project specification and roadmap are critical for success.
MakeItReal addresses this problem by acting as an AI co-pilot for the early project planning stages. The system’s goal is to ensure that a user’s “fuzzy” idea is systematically developed into:

- a concise list of essential MVP features,
- a suitable technology stack for implementing those features, and
- an ordered list of development tasks for building the MVP.
By providing this structured output, MakeItReal empowers users without technical project management skills to kick-start their software projects on solid footing. The tool guides the user step-by-step, helping them avoid common pitfalls such as missing critical requirements or choosing an inappropriate tech stack that could lead to technical debt down the line. Ultimately, MakeItReal makes the development process more approachable and efficient for novices, increasing the likelihood that their project will be completed successfully.
MakeItReal is structured as a multi-agent workflow organized into sequential stages. Each stage tackles a distinct aspect of the project definition process, and together they form a pipeline from idea to execution plan. The stages are: Requirement Analysis, Tech Stack Discovery, and Task Creation. These correspond to deriving the core product features, determining a suitable technology stack, and generating a list of development tasks, respectively. Under the hood, the orchestration is implemented using the LangGraph framework, which allows us to define the workflow as a directed graph of nodes (steps) with conditional transitions. This design enables complex interactions like iterative loops and conditional branching while maintaining a clear overall flow.
At a high level, the workflow can be visualized as a flowchart diagram, shown below in Mermaid syntax:
```mermaid
---
config:
  flowchart:
    curve: linear
---
graph TD;
    __start__([<p>__start__</p>]):::first
    requirement_analysis(requirement_analysis)
    techstack_discovery(techstack_discovery)
    task_creation(task_creation)
    log_tasks(log_tasks)
    __end__([<p>__end__</p>]):::last
    __start__ --> requirement_analysis;
    requirement_analysis --> techstack_discovery;
    techstack_discovery --> task_creation;
    task_creation --> log_tasks;
    log_tasks --> __end__;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc
```
In this main graph, the nodes requirement_analysis, techstack_discovery, and task_creation each represent a sub-process handled by a pair of agents (explained below). The final node log_tasks simply collates and outputs the results. The directed arrows show the progression: first the idea goes through requirements analysis, then the outcome feeds into tech stack discovery, then into task planning, and finally the compiled plan is output at the end. The linear progression ensures that each subsequent stage has the context it needs (for example, the Task Creation stage uses the features and tech stack decided earlier).
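For readers who want a code-level picture, the following is a minimal sketch (not the repository’s actual code) of how such a top-level pipeline can be assembled with LangGraph’s StateGraph. The node functions here are stand-in stubs; in MakeItReal each of the first three nodes is a compiled subgraph, and the shared state schema is richer than the placeholder shown.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class WorkflowState(TypedDict, total=False):
    # Placeholder schema; the real state stores the idea as a message object
    # and Proposal models for features, tech_stack, and tasks (see below).
    idea: str
    features: list[str]
    tech_stack: list[str]
    tasks: list[str]


# Stand-in stubs for the three stage subgraphs and the final logging node.
def requirement_analysis(state: WorkflowState) -> WorkflowState:
    return {"features": ["example feature"]}


def techstack_discovery(state: WorkflowState) -> WorkflowState:
    return {"tech_stack": ["example framework"]}


def task_creation(state: WorkflowState) -> WorkflowState:
    return {"tasks": ["example task"]}


def log_tasks(state: WorkflowState) -> WorkflowState:
    print("\n".join(f"* {task}" for task in state.get("tasks", [])))
    return {}


builder = StateGraph(WorkflowState)
builder.add_node("requirement_analysis", requirement_analysis)
builder.add_node("techstack_discovery", techstack_discovery)
builder.add_node("task_creation", task_creation)
builder.add_node("log_tasks", log_tasks)

builder.add_edge(START, "requirement_analysis")
builder.add_edge("requirement_analysis", "techstack_discovery")
builder.add_edge("techstack_discovery", "task_creation")
builder.add_edge("task_creation", "log_tasks")
builder.add_edge("log_tasks", END)

graph = builder.compile()
graph.invoke({"idea": "task management app for developers"})
```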
Critically, each of the first three stages is not a single step but an internal loop of proposal and review. This loop is a Generator–Evaluator pattern (also known as a generator–reviewer or propose–critique loop) that helps refine the output at each stage. We have implemented this by nesting a subgraph for each stage. In each subgraph, one agent generates a proposal (a list of items for that stage), and another agent evaluates that proposal, possibly requesting changes. This cycle may repeat until the proposal is satisfactory. The diagram below illustrates the internal workflow for each stage (requirement analysis, tech stack, and task creation all follow this same pattern):
```mermaid
---
config:
  flowchart:
    curve: linear
---
graph TD;
    __start__([<p>__start__</p>]):::first
    requirements_agent(requirements_agent)
    review_agent(review_agent)
    human_review(human_review)
    __end__([<p>__end__</p>]):::last
    __start__ --> requirements_agent;
    requirements_agent --> review_agent;
    review_agent -. approved .-> human_review;
    review_agent -. rejected .-> requirements_agent;
    human_review -. approved .-> __end__;
    human_review -. rejected .-> requirements_agent;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc
```
As shown above, each stage’s subgraph involves three roles: a generator agent (requirements_agent in the diagram), a review agent (review_agent), and a final human review node. The workflow within a stage is as follows:
1. The generator agent produces a proposal: a structured list of items for that stage (features, technologies, or tasks), taking into account the idea, any earlier-stage results, and any pending change requests.
2. The review agent evaluates the proposal. If it finds problems, it rejects the proposal and specifies the required changes, and the flow loops back to the generator; if the proposal looks good, it approves it and the flow proceeds to the human_review node.
3. At the human_review node the system pauses and asks the human user (via the CLI interface) to either approve the proposed list or request changes. This Human-in-the-Loop step ensures the user has final control to accept the results or provide their own input if something is missing or undesirable.

This iterative Generator–Reviewer loop (with potential Human-in-the-Loop feedback) continues until each stage yields an outcome that is approved by both the AI reviewer and the user. The use of conditional graph transitions (the dotted lines labeled "approved" and "rejected" in the diagram) is a key aspect of LangGraph’s orchestration. We leverage LangGraph’s ability to branch the workflow based on the state: the system checks a boolean flag (approved or not) attached to the proposal and routes the flow accordingly. This dynamic control flow is what allows the multi-agent team to refine their outputs collaboratively.
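The following is a minimal, self-contained sketch of one such stage loop, assuming a simplified stage-level state with approval flags. The agent nodes here are stubs; the real ones call LLMs, and the real human_review node uses LangGraph’s interrupt mechanism (discussed below).

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class StageState(TypedDict, total=False):
    items: list[str]          # the current proposal for this stage
    change_request: str       # feedback to incorporate on the next pass
    agent_approved: bool
    human_approved: bool


def requirements_agent(state: StageState) -> StageState:
    # Stub for the LLM generator; it would use state["change_request"] if set.
    return {"items": ["example feature"]}


def review_agent(state: StageState) -> StageState:
    # Stub for the LLM reviewer; it would set change_request when rejecting.
    return {"agent_approved": True}


def human_review(state: StageState) -> StageState:
    # Stub; the real node pauses for user input (see the HITL section below).
    return {"human_approved": True}


def after_review(state: StageState) -> str:
    return "approved" if state.get("agent_approved") else "rejected"


def after_human(state: StageState) -> str:
    return "approved" if state.get("human_approved") else "rejected"


builder = StateGraph(StageState)
builder.add_node("requirements_agent", requirements_agent)
builder.add_node("review_agent", review_agent)
builder.add_node("human_review", human_review)

builder.add_edge(START, "requirements_agent")
builder.add_edge("requirements_agent", "review_agent")
builder.add_conditional_edges(
    "review_agent", after_review,
    {"approved": "human_review", "rejected": "requirements_agent"},
)
builder.add_conditional_edges(
    "human_review", after_human,
    {"approved": END, "rejected": "requirements_agent"},
)

requirement_analysis_subgraph = builder.compile()
```

The conditional edges map the router’s return value ("approved" or "rejected") to the next node, which is exactly the dotted-line branching shown in the diagram above.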
MakeItReal employs a team of six AI agents, grouped into three pairs, each pair focusing on one of the stages.

Requirements Generator Agent: A specialized agent that takes the user’s raw idea and suggests a list of essential product features or use-cases that the MVP should include. This agent acts like a seasoned product requirements engineer, trying to derive the minimal, most crucial features from the idea.
Requirements Review Agent: This agent evaluates the feature list proposed by the generator. Its role is to ensure that the list of features is complete (no obvious necessary feature is missing) and focused (no extraneous "nice-to-have" features that don't belong in an MVP). If it finds issues, it will provide feedback specifying what to add or remove, thereby prompting a revision. For example, if the idea is a "task management app", the review agent might notice if a feature like "user authentication" is missing (perhaps critical for a multi-user app) and request it to be added, or it might flag that a proposed feature is out of scope for a first version and should be dropped. Only when the feature set looks reasonable and lean does this agent approve it.
Tech Stack Generator Agent: Once the features are finalized, this agent proposes a suitable technology stack to implement the project. It considers the nature of the application and features, and suggests specific frameworks, languages, and tools. For instance, it might recommend building a web application with a Python backend using FastAPI, a React frontend, and a PostgreSQL database, if those fit the project. This agent is designed to be intelligent in its choices: it prioritizes modern, well-documented technologies that align with the project requirements.
Tech Stack Review Agent: The tech stack suggestions are then scrutinized by a review agent. It checks for feasibility and appropriateness: Are the recommended technologies capable of delivering the identified features? Are they overkill or underpowered for the scope? Are they reasonably easy to work with for a first version? The Tech Stack Review Agent can reject the proposal if, for example, it finds a missing layer (perhaps no database was suggested for an app that clearly needs data persistence) or if an unnecessary tool was included. It will then specify changes (e.g., "add a database", "remove X library as it's not needed for MVP") and send it back for refinement.
Task List Generator Agent: In the final creative stage, this agent generates a detailed list of development tasks required to build the MVP, given the confirmed features and chosen tech stack. It breaks the project into manageable tasks or milestones. The tasks are typically phrased as actionable steps – for example, "Set up project repository and environment", "Implement user login and authentication (using framework X)", "Develop task creation API endpoint", "Build frontend page for task management", etc. The agent ensures that every feature is covered by one or more tasks and that the tasks take into account the specifics of the tech stack (so it will incorporate tasks relevant to the technologies chosen, like setting up a database schema if a database is in the stack).
Task List Review Agent: The task list is finally reviewed by an agent that acts like a software project manager. It checks if the tasks are comprehensive (cover all features), well-scoped, and ordered logically. If tasks are missing or some are too ambiguous, it will ask for modifications. For example, if the list skipped setting up testing or deployment, the reviewer might flag that, or if a task is too large and not specific, it might suggest breaking it down. Once it is satisfied that the task list is thorough and implementable, it approves the plan.
Each agent is implemented using an LLM (OpenAI’s GPT-4 model, in this case) with a carefully crafted prompt and instructions tailored to its role. The generator agents are prompted with the context of the idea (and previous stage results when applicable) and asked to output a structured list of items. The review agents are prompted with both the idea and the current proposal and tasked with providing a critical evaluation and either an approval or a set of required changes. This division of roles ensures a form of “peer review” where one AI checks the work of another, enhancing the quality and reliability of the outcome before involving the human user.
A notable aspect of our system design is the enforcement of structured outputs for each agent using Pydantic schemas. Rather than relying on free-form text from the LLMs and then trying to parse it, we define explicit data models for what each agent should return. For example, the Requirements Generator returns a model ProposalResult which contains a list of feature items, and the Requirements Review returns a model ReviewResult containing a boolean approved flag and a changes string describing requested modifications. We utilize the OpenAI function-calling and JSON output mechanisms via the LangChain/OpenAI integration to have the LLM respond in a way that conforms to these Pydantic models. This approach serves as an input validation and reliability measure: if the LLM output does not fit the schema (for instance, missing a field or not providing a boolean where expected), the system will detect it as an error. By structuring the LLM-agent interaction, we reduce ambiguity and ensure that each agent’s output can be programmatically interpreted by the next step in the workflow without brittle text parsing. The structured schemas act as a contract, making the multi-agent chain more deterministic and robust.
Using Pydantic for schema enforcement also helps to keep the agents honest about the format and content of their answers. For instance, the Task List Generator must return its tasks as a list of strings in a specified field. If it were to go off on a tangent or produce a long narrative, the schema validation would fail, prompting a retry or an error. In practice, this encourages well-formatted outputs and allows us to trust that, when a proposal passes from a generator to a reviewer agent, it’s in a known-good structure. It’s an effective way to implement validation within an AI-driven pipeline, analogous to how API endpoints validate inputs before processing.
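The sketch below illustrates this structured-output pattern using LangChain’s with_structured_output. The field names mirror the ProposalResult and ReviewResult models described above, but the exact schemas and the model name ("gpt-4o" here) are illustrative rather than copied from the repository.

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class ProposalResult(BaseModel):
    """Structured output of a generator agent."""
    items: list[str] = Field(description="Proposed features, tech stack entries, or tasks")


class ReviewResult(BaseModel):
    """Structured output of a review agent."""
    approved: bool = Field(description="Whether the proposal passes review")
    changes: str = Field(default="", description="Requested modifications if not approved")


llm = ChatOpenAI(model="gpt-4o", temperature=0)  # illustrative model name

# with_structured_output uses OpenAI function calling / JSON mode, so the reply
# is parsed and validated against the Pydantic schema (or fails loudly on mismatch).
generator = llm.with_structured_output(ProposalResult)
reviewer = llm.with_structured_output(ReviewResult)

proposal = generator.invoke(
    "Suggest the essential MVP features for: a task management app for developers."
)
verdict = reviewer.invoke(
    f"Review these MVP features and either approve them or request changes: {proposal.items}"
)
print(verdict.approved, verdict.changes)
```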
While the AI agents work together to refine the idea, MakeItReal keeps the human user firmly in control of the final decisions through a human-in-the-loop (HITL) mechanism. After the AI review agent in each stage approves a proposal, the system pauses and asks the user to review the proposed list of features, tech stack items, or tasks. In the CLI, this is presented clearly by listing the items (numbered) and then prompting the user: “Do you approve [the current list]? [Y/n]”. The user can simply press Enter or type “Y” to accept, or “n” to indicate they want changes. If they request changes, they are then prompted: “What do you want to change?”. The user can then type a brief instruction, for example: “Add a feature for data export” or “Remove the use of technology X, I prefer not to use it”. This input is taken as the human’s change request and is fed back into the workflow (specifically, stored in the Proposal.change_request field for that stage). The system then resumes the loop, with the generator agent receiving not only the idea and current list, but also this additional human feedback on what to modify. The generator will incorporate the feedback in its next proposal. This HITL loop can repeat as needed until the user is satisfied and approves the list.
This design ensures that the user’s vision and preferences are respected. The AI provides structure and suggestions, but the user can iteratively steer the outcome. For non-technical users, this is a gentle introduction to the idea of refining requirements— they don’t have to come up with everything from scratch, but they can adjust the plan to match their intent. From a technical standpoint, implementing the human feedback loop was made easy by LangGraph’s interrupt feature: we use a special interrupt node (human_review) that effectively checkpoints the state and yields control back to the CLI, which then handles user input. Once the input is gathered, the workflow is resumed seamlessly. The internal state carries over the user’s remarks to the next iteration. We also utilize LangGraph’s MemorySaver (in-memory checkpointing) so that the state of the conversation (the idea, current proposals, etc.) is preserved across these interruptions and could even be saved and reloaded if needed.
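As a concrete illustration, here is a minimal, self-contained sketch of an interrupt-based human_review node and the CLI-side resume, using LangGraph’s interrupt, Command, and MemorySaver APIs. The prompts, state fields, and single-question flow are simplified relative to the real implementation (which asks “What do you want to change?” as a separate prompt).

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt


class StageState(TypedDict, total=False):
    items: list[str]
    change_request: str
    human_approved: bool


def human_review(state: StageState) -> StageState:
    # Pause the graph; the payload passed to interrupt() is surfaced to the CLI,
    # and the value supplied on resume becomes interrupt()'s return value.
    answer = interrupt({
        "items": state.get("items", []),
        "question": "Do you approve the current list? [Y/n]",
    })
    if str(answer).strip().lower() in ("", "y", "yes"):
        return {"human_approved": True}
    # Simplified: treat any other answer as the change request.
    return {"human_approved": False, "change_request": str(answer)}


builder = StateGraph(StageState)
builder.add_node("human_review", human_review)
builder.add_edge(START, "human_review")
builder.add_edge("human_review", END)

# MemorySaver checkpoints the state so the run can pause and resume.
graph = builder.compile(checkpointer=MemorySaver())
config = {"configurable": {"thread_id": "demo-session"}}

graph.invoke({"items": ["User can create tasks"]}, config)   # stops at the interrupt
user_input = input("Do you approve the current list? [Y/n] ")
final_state = graph.invoke(Command(resume=user_input), config)
print(final_state)
```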
Beyond basic LLM capabilities, MakeItReal integrates external tools to extend what the agents can do – a key requirement for the project was to use at least three distinct tools. Currently, two custom tools have been implemented and a third is in development (to enable session persistence):
Web Search Tool: The Tech Stack Generator agent is equipped with a web search capability (search_suitable_techstack) that allows it to perform an online search for relevant technologies. When formulating a tech stack, if the agent is unsure or needs to discover what libraries or frameworks might be suitable, it can issue a search query (for example, “best web framework for a task management app backend”). Under the hood, this tool uses a DuckDuckGo Search API to fetch results and even attempts to retrieve and summarize content from the top result. The agent can analyze this information to inform its recommendations – for instance, confirming the popularity or viability of a suggested technology or finding alternatives. (A minimal sketch of such a tool is shown after this list.)
Documentation Lookup Tool: In addition to general web search, the Tech Stack Generator has access to a library documentation search tool (search_library_docs). This tool connects to an external service (the Context7 MCP server) that provides up-to-date documentation excerpts for a given library or topic. The agent uses this when it needs detailed, specific information about a technology. For example, if the agent is considering recommending a particular database or an AI service, it can retrieve the latest official docs or usage examples for that library. This ensures that its suggestions are grounded in current, accurate technical information and not just based on the model’s static training data. The inclusion of this tool demonstrates the system’s use of the Model Context Protocol (MCP) to augment the LLM: the model can effectively ask an external knowledge base for help before finalizing its answer.
State Persistence Tool (Planned): In the current implementation, the final state (including the idea and all finalized proposals) is automatically saved to a JSON file on disk at the end of a run. The forthcoming improvement leverages this by enabling a user to reload that state in a future session. This could be useful if the user wants to pause and continue later, or if they want to iterate on the plan over multiple sessions. The MemorySaver and state serialization in the design lay the groundwork for this capability.
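To make the tool integration concrete, here is a hedged sketch of what a web-search tool like search_suitable_techstack could look like, assuming the duckduckgo_search package and LangChain’s @tool decorator. The real tool additionally fetches and summarizes the content of the top result, which is omitted here.

```python
from duckduckgo_search import DDGS
from langchain_core.tools import tool


@tool
def search_suitable_techstack(query: str) -> str:
    """Search the web for technologies suited to the given requirement."""
    results = DDGS().text(query, max_results=5)
    return "\n".join(f"{r['title']}: {r['body']} ({r['href']})" for r in results)


# The tool is then bound to the Tech Stack Generator's LLM so the model can
# decide when to call it during tech stack discovery, e.g.:
# tech_llm = ChatOpenAI(model="gpt-4o").bind_tools([search_suitable_techstack])
```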
By integrating these tools, MakeItReal goes beyond a vanilla LLM chatbot – it becomes a research assistant as well as a planner. The web search and documentation lookup ensure that technology recommendations are not solely based on potentially outdated training data; instead, the AI can fetch current information on demand. This is crucial in the tech domain, where best practices and popular frameworks can evolve rapidly. It also shows the extensibility of our multi-agent framework: new tools can be added and bound to agents as needed to improve their performance on specific tasks.
The overall orchestration uses the LangGraph StateGraph to tie everything together. Each node in the workflow updates a shared state (a Python dictionary following the WorkflowState schema). Key parts of this state include the original idea (stored as a message object), and three Proposal objects named features, tech_stack, and tasks. The Proposal is a custom Pydantic model we created to hold the current list of proposed items for that stage, a change_request (if any), and flags indicating whether the agent and human have approved it. This structured state design is pivotal. When an agent in a stage runs, it reads the relevant part of the state (e.g., the features proposal) and writes its output back into that state. The conditional transitions (approved/rejected loops) simply check those flags in the state to decide the next node.
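A hedged sketch of this state design follows, with names taken from the description above; the exact definitions in the repository may differ.

```python
from typing import TypedDict

from langchain_core.messages import HumanMessage
from pydantic import BaseModel, Field


class Proposal(BaseModel):
    """Current proposal for one stage, plus approval bookkeeping."""
    items: list[str] = Field(default_factory=list)
    change_request: str = ""          # feedback from the reviewer or the user
    agent_approved: bool = False      # set by the AI review agent
    human_approved: bool = False      # set by the human_review node


class WorkflowState(TypedDict, total=False):
    idea: HumanMessage                # the original idea, stored as a message object
    features: Proposal
    tech_stack: Proposal
    tasks: Proposal
```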
One benefit of using LangGraph with a defined state is that it inherently supports memory and context passing. Earlier stages’ outcomes are available to later stages. For instance, by the time the Task Generator agent runs, the state already contains the final approved features list and tech stack, so it can incorporate those into its prompt (which our implementation does – it explicitly feeds the features and tech stack into the task-generation prompt). This ensures coherence across stages: the tasks are directly aligned with the chosen features and technologies. Similarly, if the user provided any custom change requests in earlier stages, those are part of the state and can influence subsequent suggestions (for example, if the user insisted on using a certain technology, the Task agent will naturally generate tasks related to that tech).
Additionally, the orchestration framework and MemorySaver allow the system to be reproducible and debuggable. At any point, we can dump the graph or examine the state transitions, which is helpful for development and also serves as documentation. In fact, a Makefile command make dump-graph is provided to output the Mermaid diagrams of the workflow – the same diagrams we included above – directly from the code, ensuring the design documentation stays in sync with the implementation.
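LangGraph can emit these Mermaid diagrams directly from a compiled graph, so the core of such a dump-graph command can be very small (the repository’s actual Makefile target may differ). Reusing the compiled graph from the main-graph sketch above:

```python
# xray=True also expands nested subgraphs into the rendered diagram.
print(graph.get_graph(xray=True).draw_mermaid())
```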
MakeItReal is implemented in Python and makes use of several modern libraries and tools to achieve its multi-agent functionality:
- LangChain and OpenAI GPT-4: each agent is powered by GPT-4 through LangChain’s ChatOpenAI interface with function calling to produce JSON-formatted outputs that Pydantic can validate. This provides the natural language understanding and generation capabilities for each agent.
- LangGraph: the orchestration layer that defines the workflow as a state graph with conditional transitions, nested subgraphs, interrupts for human input, and MemorySaver checkpointing.
- Pydantic: used for application settings (loaded from a .env file) and for the structured output models exchanged between agents. This use of data models ensures type-safe and predictable interactions.
- Environment configuration: API keys and service endpoints are supplied via dotenv files (.env and .mcp.env), and an example env file is provided to guide this setup.
- Standard Python tooling: the project follows a conventional layout (pyproject.toml for dependencies and packaging, and source code under a package directory). We’ve included a Makefile to streamline common actions like running the app or dumping the workflow graph. Continuous integration tools (like linting with Ruff, formatting, and tests via Pytest) are configured to ensure code quality.

Despite being a prototype, MakeItReal was built with an eye on maintainability and clarity, so that other developers can understand the system design from both the code and the documentation. The use of modern frameworks (LangChain, LangGraph) and clear separation of concerns for each agent make it relatively straightforward to modify or improve parts of the system (for example, adding a new agent stage, or plugging in a different LLM).
Using MakeItReal is straightforward, thanks to the provided CLI and Docker setup. Below is a typical process to run the application and generate a project plan from an idea:
Setup: Ensure you have Docker installed, and obtain the necessary API keys. You will need an OpenAI API key for the LLM agents, and (optionally) a GitHub personal access token if you want to enable the documentation lookup tool fully. Place these in a .env file (for OpenAI) and a .mcp.env file (for the Context7 service) as described in the repository README. For example, your .env should contain OPENAI_API_KEY=<your key> and you can leave the model and base URL as provided defaults.
Launch the System: Build and start the Docker containers using the Makefile. You can do this in one step by running the provided make run command. For instance, to run the system with your custom idea, execute:
```bash
make run IDEA="task management app for developers"
```
This will build the Docker image if not already built, spin up the necessary containers (the MakeItReal app and the Context7 tool server), and then run the CLI inside the container with the given idea. You can replace the IDEA text with any project idea you have.
Interactive Session: Once running, the CLI will greet you and start processing the idea. You will see messages indicating the workflow stage (e.g., "Analyzing your product idea...") and then it will present the generated list of features. It will ask for your approval. If you just press Enter (which counts as yes), it will proceed; if you type "n" and press Enter, you can then provide a change request. You will go through this approval step for features, tech stack, and tasks in sequence.
Output: At the end of the process, after the task list is approved, the system will output the final list of tasks in the console (each task prefixed by an asterisk bullet for readability). It will also save the entire plan (your idea, the features, tech stack, and tasks) to a timestamped JSON file in a .state directory. This file can be referred back to as documentation of what was generated, or used in the future to reload the plan (once the resume functionality is fully implemented); a short snippet for inspecting it programmatically is shown after the example below.
Next Steps: With the plan in hand, you can proceed to implement the project. The features outline what to build, the tech stack tells you how to build it (with which tools), and the task list tells you where to start. Even as you begin coding (possibly with the help of AI pair-programming tools), you have a solid map to follow, which was the goal of MakeItReal – to set you off on the right foot.
Example: Suppose you run the tool with the idea “a personal finance tracker app”. The system might guide you through something like this (illustrative example): It proposes features such as “User can input transactions, Categorize expenses, Set monthly budget, View summary reports”. After a couple of revisions (maybe you ask it to add a feature for multi-currency support), you approve the feature set. Next, it suggests a tech stack, perhaps “Flutter for a cross-platform mobile app, Firebase as a backend service for authentication and database, Plaid API for bank integration”. You discuss and approve those choices. Finally, it outputs a task list including items like “Initialize Flutter project”, “Implement login with Firebase Auth”, “Design expense input UI and storage”, “Integrate Plaid API for account linking”, “Test and polish UI”, and so on. These tasks are saved to a JSON file and also printed out for you. With this, you now have a concrete action plan to start building your personal finance tracker.
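If you want to inspect the saved plan programmatically, a small sketch like the following works, assuming the default .state directory and JSON output described above; the exact field names depend on the saved WorkflowState.

```python
import json
from pathlib import Path

# Pick the most recently written plan from the .state directory and print it.
latest = max(Path(".state").glob("*.json"), key=lambda p: p.stat().st_mtime)
plan = json.loads(latest.read_text())
print(f"Loaded plan from {latest}")
print(json.dumps(plan, indent=2))
```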
Interactive Workflow: After running the command, the system will begin processing the idea. You will see a message that it’s analyzing your product idea, and then the agents go to work:
First, the Requirements agents propose an initial feature list for the example idea, for instance:

1. User can create an account and log in
2. User can create a new task with title and description
3. User can mark tasks as completed

The CLI then asks for your approval. Type Y (or just press Enter for yes) if the feature list looks good. If you type n for no, the CLI will then ask: “What do you want to change?”. You can enter something like “Add a feature for password recovery” or “Remove the user profile feature for now”, depending on your judgment. The system will take that input as a change request and automatically loop back to regenerate a new feature list that accounts for your feedback. This loop continues until you approve the feature list.

Next, the Tech Stack agents suggest a stack, for example:

1. **Backend:** Django (Python) – for rapid development of web backend and REST APIs
2. **Frontend:** React – for building a dynamic user interface
3. **Database:** PostgreSQL – for reliable data storage

After you approve the tech stack, the Task agents produce a development task list such as:

1. Set up a new Django project and initialize a Git repository
2. Create Django app for tasks and implement models
3. Implement user authentication (login/logout/signup)
4. Create API endpoints for task CRUD operations
5. Develop React frontend app and connect to backend API
6. Test end-to-end functionality

Output: At the end of the interaction, you will have an approved feature list, a chosen tech stack, and an ordered set of development tasks.
All of this information is printed to the console in a structured way and, as noted above, saved to a timestamped JSON file in the .state directory so you can refer back to it later. With this output, you have a ready-made project plan. You can proceed to implement each task, and even use AI coding assistants or your own coding skills to develop the MVP with confidence that you’re building the right things in the right order.
MakeItReal demonstrates how a carefully orchestrated multi-agent AI system can assist in early-phase software project planning. By decomposing the problem into specialized roles (requirements, tech stack, tasks) and using an iterative generator-reviewer approach, the system produces results that are both creative and vetted for feasibility. Importantly, it keeps the user in the loop, allowing non-experts to inject their domain knowledge or preferences without requiring them to have technical expertise in project scoping.
In its current state, MakeItReal is a powerful proof-of-concept. Users can take a nebulous idea and, within minutes, obtain a structured specification to guide development. This can shorten the time between ideation and prototyping and reduce false starts. In the future, the system could be extended with more agents (for example, an agent to perform market analysis or risk assessment was conceptualized in our design) or integrated into a user-friendly web interface for broader accessibility. There is also potential to incorporate formal evaluation metrics for the AI outputs or to benchmark different LLMs within the agent roles to continuously improve quality. Nonetheless, even as an experimental tool, MakeItReal provides genuine utility to its target audience – helping them make their ideas real by combining the strengths of AI planning with human creativity and judgment.
This project is licensed under the Apache 2.0 License.
Contributions are welcome! Please feel free to open issues or pull requests within the GitHub repository.
For questions or support, please open an issue on GitHub.