AI CASTAWAY - Can an LLM survive on a remote island?
Abstract
Can an AI survive on its own? AI Castaway explores this question by placing a Large Language Model (LLM) agent in a survival game, where it must gather resources, craft tools, and make strategic decisions without human intervention.
This publication examines how different AI agents—powered by GPT-4, LLaMA3, Mixtral, Gemma, and Claude—adapt to a constantly changing environment. The AI navigates challenges using two approaches: Zero-Shot, making fast one-shot decisions, and Agentic, where it selectively retrieves and processes information for smarter choices.
By blending AI-driven reasoning, decision-making, and survival mechanics, AI Castaway pushes the boundaries of autonomous AI agents in interactive environments.
Introduction
Video games have evolved from simple pixel-based mechanics to complex, immersive worlds. One of the most exciting advancements is artificial intelligence in gaming, where AI-driven characters can think, adapt, and react dynamically.
AI Castaway explores this frontier by placing a Large Language Model (LLM) in a survival scenario on a remote island. Unlike traditional game AI, which follows scripted behaviors, this system uses LLMs to make real-time decisions, gather resources, and manage survival needs.
This publication examines how LLMs can simulate reasoning, problem-solving, and adaptation in a dynamic environment. It compares two AI approaches—Zero-Shot and Agentic—to determine which is more effective for autonomous survival. By integrating cutting-edge AI with game mechanics, AI Castaway pushes the boundaries of interactive entertainment and AI-driven decision-making.
Methodology
To explore whether an AI can survive in a dynamic environment, AI Castaway implements an autonomous agent within a survival game. The AI must independently gather resources, craft tools, and manage survival factors such as hunger, thirst, and stress. To achieve this, I developed a system that integrates Large Language Models (LLMs) with decision-making frameworks, creating an adaptive AI-driven experience.
System Architecture
The system consists of three core components:
- Game Engine (Unity) – Simulates the environment, tracks resources, and executes AI decisions.
- AI Decision System (Python, FastAPI) – Processes game data, determines the next action, and sends responses to the game engine (a minimal endpoint sketch follows this list).
- LLM-based AI Agent – Uses GPT-4, LLaMA3, Mixtral, Claude, and Gemma to generate intelligent decisions based on game state, past experiences, and predefined survival strategies.
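To make the loop between Unity and the Python service concrete, here is a minimal FastAPI sketch. The field and function names are illustrative assumptions, not the project's actual code: the game engine POSTs its current state and receives the next action as JSON.

```python
# Minimal sketch of the AI Decision System endpoint (illustrative names only).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GameState(BaseModel):
    health: int
    hunger: int
    thirst: int
    stress: int
    inventory: dict[str, int]
    available_actions: list[str]

class Decision(BaseModel):
    action: str
    reasoning: str

def choose_action(state: GameState) -> tuple[str, str]:
    """Placeholder for the LLM call (Zero-Shot or Agentic); returns (action, reasoning)."""
    return "collect_wood", "Stub decision; replace with an LLM-backed policy."

@app.post("/decide", response_model=Decision)
def decide(state: GameState) -> Decision:
    # Unity POSTs the current game state here and executes the returned action.
    action, reasoning = choose_action(state)
    return Decision(action=action, reasoning=reasoning)
```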
The AI follows two distinct decision-making approaches:
1. Zero-Shot Decision-Making
In this approach, the AI makes decisions based on a single, complete snapshot of its environment. It receives all relevant game data—inventory, health, surroundings, and available actions—at once and generates an immediate response. This method is efficient but has limitations, such as the inability to retrieve past experiences beyond the prompt's token limit.
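As a rough illustration (assuming the OpenAI Python client; any chat-completion API would follow the same pattern, and the prompt wording is an assumption), a Zero-Shot decision packs the whole snapshot into one prompt and requests a single action:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def zero_shot_decision(state: dict) -> str:
    """Serialize the full game snapshot into one prompt and ask for a single action."""
    prompt = (
        "You are a castaway on a remote island. Survive as long as possible.\n"
        f"Current state:\n{json.dumps(state, indent=2)}\n"
        "Reply with exactly one action name from 'available_actions'."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```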
2. Agentic Decision-Making
Unlike the Zero-Shot approach, the Agentic method allows the AI to retrieve specific information as needed, much like a human would recall relevant details before making a decision. Using frameworks like LangChain, the AI selectively queries its memory, past actions, and environmental changes before choosing the best course of action. This results in more context-aware and strategic decision-making, improving long-term survival.
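The sketch below captures the idea in plain Python rather than LangChain itself (the retrieval logic and prompt wording are assumptions): the model first names the memories it needs, retrieves only those, and then decides with that focused context.

```python
def query_memory(memory_log: list[str], topic: str, k: int = 3) -> list[str]:
    """Naive keyword retrieval; a real agent could use a vector store or a LangChain retriever."""
    hits = [entry for entry in memory_log if topic.lower() in entry.lower()]
    return hits[:k]

def agentic_decision(state: dict, memory_log: list[str], llm_call) -> str:
    """llm_call is any callable mapping a prompt string to a model reply."""
    # Step 1: ask the model which past information matters right now.
    topic = llm_call(
        f"Given this state, name ONE topic worth recalling (e.g. 'food', 'tools'): {state}"
    )
    # Step 2: selectively retrieve only the matching memories, keeping the prompt small.
    recalled = query_memory(memory_log, topic)
    # Step 3: decide with the focused context instead of the full game history.
    return llm_call(
        f"State: {state}\nRelevant memories: {recalled}\n"
        "Choose the single best next action from 'available_actions'."
    )
```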
Handling Resource Constraints
LLMs have strict token limits, which affect how much information they can process at once. To overcome this, the system implements the following (a brief code sketch follows the list):
- XP-based Unlocking – The AI starts with basic survival actions (e.g., collecting wood, hunting) and gradually gains access to more complex tasks (e.g., crafting tools, building shelters) as it earns experience.
- Memory Optimization – Logs of past actions are pruned and prioritized to keep relevant data accessible while discarding unnecessary details.
- Selective Queries – In the Agentic approach, the AI dynamically fetches only the most critical information instead of processing the entire game history.
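A short sketch of the first two mechanisms, with hypothetical XP thresholds and log fields:

```python
BASIC_ACTIONS = ["collect_wood", "hunt", "drink_water"]
UNLOCKS = {50: ["craft_axe", "craft_fishing_rod"], 150: ["build_shelter"]}  # illustrative XP thresholds

def available_actions(xp: int) -> list[str]:
    """XP-based unlocking: more complex actions become available as experience grows."""
    actions = list(BASIC_ACTIONS)
    for threshold, unlocked in sorted(UNLOCKS.items()):
        if xp >= threshold:
            actions.extend(unlocked)
    return actions

def prune_memory(log: list[dict], max_recent: int = 20) -> list[dict]:
    """Memory optimization: keep high-importance entries plus the most recent ones."""
    important = [e for e in log if e.get("importance", 0) >= 3]
    recent = log[-max_recent:]
    seen, pruned = set(), []
    for entry in important + recent:
        if entry["text"] not in seen:  # drop duplicates so the prompt stays under the token limit
            seen.add(entry["text"])
            pruned.append(entry)
    return pruned
```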
Evaluation Metrics
To assess AI performance, I track the following metrics (a small computation sketch follows this list):
- Survival Time – How long the AI lasts before failing due to hunger, injury, or environmental hazards.
- Decision Efficiency – The effectiveness of chosen actions in resource gathering, crafting, and self-preservation.
- Adaptability – How well the AI adjusts to changing conditions (e.g., bad weather, scarce resources).
- Comparative Analysis – Performance differences between Zero-Shot and Agentic approaches across multiple AI models.
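For concreteness, a minimal sketch of how two of these metrics could be computed from a per-tick run log (the field names are assumptions):

```python
def survival_time(run_log: list[dict]) -> int:
    """Survival Time: number of game ticks before the run ended."""
    return len(run_log)

def decision_efficiency(run_log: list[dict]) -> float:
    """Decision Efficiency: share of actions flagged as useful (gained a resource or reduced a need)."""
    if not run_log:
        return 0.0
    useful = sum(1 for step in run_log if step.get("useful", False))
    return useful / len(run_log)
```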
Summary
By combining real-time survival mechanics with AI-driven reasoning, AI Castaway explores the potential of adaptive AI agents in dynamic environments. The system's ability to make strategic, evolving decisions represents a significant step toward autonomous decision-making AI in gaming and beyond.
Experiments
To evaluate the AI’s ability to survive autonomously, I conducted multiple experiments in AI Castaway, testing different decision-making models and AI agent architectures. The focus was on survival efficiency, resource management, and strategic planning.
Experimental Setup
AI Models Tested
I evaluated multiple LLMs to compare their effectiveness in survival scenarios:
- GPT-4o (OpenAI)
- LLaMA3-8B and LLaMA3-70B (Meta)
- Mixtral-8x7B-32768 (Mistral)
- Gemma-7B-IT and Gemma2-9B-IT (Google DeepMind)
Each model was tested under identical survival conditions, measuring how well it could make context-aware decisions.
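A hedged sketch of the comparison harness: every model and approach runs under the same conditions and seed. Here run_survival_trial is a hypothetical wrapper around one full game, not the project's actual code.

```python
MODELS = [
    "gpt-4o",
    "llama3-8b", "llama3-70b",
    "mixtral-8x7b-32768",
    "gemma-7b-it", "gemma2-9b-it",
]

def run_survival_trial(model_name: str, approach: str, seed: int = 42) -> dict:
    """Hypothetical wrapper: plays one full game with this model/approach and returns its metrics."""
    # In the real system this would drive the Unity game through the FastAPI decision service.
    return {"survival_time": 0, "action_efficiency": 0.0}

results = {
    (model, approach): run_survival_trial(model, approach)
    for model in MODELS
    for approach in ("zero-shot", "agentic")
}
```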
Decision-Making Approaches
I compared two primary AI strategies:
- Zero-Shot Decision-Making
  - The AI receives all relevant game data in a single input and generates the next action immediately.
  - Strength: Faster processing.
  - Weakness: Can struggle with long-term strategy and memory constraints.
- Agentic Decision-Making
  - The AI retrieves specific information as needed before making a decision.
  - Strength: More strategic and adaptive.
  - Weakness: Higher computational overhead.
Evaluation Metrics
To assess performance, I tracked:
- Survival Duration – How long the AI sustained itself before failing due to starvation or resource depletion.
- Action Efficiency – The percentage of useful vs. redundant actions taken.
- Resource Management – How effectively the AI collected, stored, and used materials.
- Decision Consistency – Whether the AI made rational choices based on past experiences.
Experimental Execution
Each AI agent started with no knowledge of the environment and had to explore, plan, and survive based on real-time feedback. The AI could interact with objects, prioritize crafting essential tools, and manage its hunger, thirst, and stress levels.
I conducted multiple test runs with different models, observing how well they adapted to new survival challenges and whether the Agentic approach led to superior long-term performance compared to the Zero-Shot method.
Results & Observations
Findings revealed significant differences between LLM-based decision-making and traditional game AI approaches. I analyzed how models:
- Prioritized basic survival tasks (food, water, shelter).
- Adapted to inventory limitations and crafting complexity.
- Managed multi-step survival strategies (e.g., crafting a fishing rod before searching for food).
Results
The experiments conducted in AI Castaway provided key insights into how different AI models handle autonomous survival tasks. By comparing various LLMs and decision-making approaches, I evaluated their effectiveness in managing resources, adapting to challenges, and making efficient survival choices.
1. Comparison of Decision-Making Approaches
Zero-Shot vs. Agentic Performance
- Zero-Shot Approach
  - Produced faster decisions but struggled with long-term planning.
  - Often failed to anticipate multi-step survival tasks (e.g., crafting an axe before attempting to cut wood).
  - Performance declined as the game progressed due to token limits affecting memory recall.
- Agentic Approach
  - Demonstrated better decision consistency by selectively retrieving relevant past experiences.
  - Managed complex survival strategies more effectively, such as prioritizing essential resources before engaging in riskier tasks.
  - Showed higher adaptability in situations requiring strategic planning (e.g., preparing tools in advance for resource gathering).
  - However, it required more computational resources and took longer to generate decisions compared to the Zero-Shot method.
Key Observations on AI Behavior
- Survival Efficiency
  - AI agents using Agentic decision-making consistently survived longer than those using Zero-Shot methods.
  - The AI struggled when forced to make trade-offs, such as deciding between searching for food vs. crafting tools.
  - Survival failures often resulted from inefficient inventory management, where the AI failed to prioritize critical resources.
- Action Optimization
  - LLMs frequently selected suboptimal actions when token limits restricted access to past data.
  - AI models with stronger memory integration (Agentic) performed significantly better in multi-step tasks.
2. AI Model Performance Comparison
I tested different LLMs to see how well they adapted to survival conditions.
- GPT-4o
  - Strong performance in reasoning and long-term decision-making.
  - Handled multi-step planning well, leading to higher survival rates.
  - Sometimes produced overly verbose responses, increasing computational load.
- LLaMA3 (8B & 70B)
  - Showed balanced performance between decision speed and complexity.
  - The 70B version was more effective in survival planning but required longer processing times.
- Mixtral-8x7B-32768
  - Efficient at basic survival tasks but struggled with multi-step logic.
  - Faster decision-making but sometimes lacked strategic depth.
- Gemma-7B-IT & Gemma2-9B-IT
  - Performed well in resource prioritization but were less effective in long-term survival.
  - Occasionally made redundant or inefficient decisions when recalling past events.
3. Key Takeaways
- Agentic models consistently outperformed Zero-Shot models in survival duration and decision accuracy.
- LLM token constraints impact survival efficiency, requiring memory optimization techniques.
- More powerful AI models (GPT-4o, LLaMA3-70B) showed better survival performance, especially in multi-step task planning.
- Faster models (Mixtral, Gemma) were more responsive but often failed to execute long-term survival strategies.
4. Future Considerations
- Refining memory handling to improve Zero-Shot efficiency without increasing token usage.
- Optimizing Agentic decision-making to reduce computational overhead while maintaining contextual accuracy.
- Investigating hybrid models that blend Zero-Shot speed with Agentic depth to enhance AI-driven survival simulations.
Conclusion
The AI Castaway experiments demonstrate that LLM-powered AI agents can adapt, strategize, and survive in dynamic environments, but they face key challenges. While Zero-Shot decision-making offers speed, it struggles with long-term planning due to memory limitations. The Agentic approach, on the other hand, excels in context-aware decision-making, leading to better survival outcomes despite higher computational demands.
My findings highlight the importance of memory management, strategic planning, and resource prioritization in AI-driven survival simulations. More advanced models like GPT-4o and LLaMA3-70B performed better in multi-step survival tasks, while faster models like Mixtral and Gemma were more responsive but lacked strategic depth.
These results open up exciting possibilities for AI-driven agents in gaming and beyond. Future work will focus on optimizing decision-making efficiency, hybrid AI models, and refining memory strategies to further enhance autonomous AI survival capabilities.
This research suggests that an AI can survive on its own—but just like humans, it must learn, adapt, and evolve.