Baseball isn’t just a game of numbers; it’s a game of dreams. From the sandlots to the big leagues, every player’s journey carries the hope of greatness. Yet, scouting often relies on intuition and fragmented data, leaving potential untapped. With 3P, we’ve set out to change that narrative. Our platform uses data-driven insights to help MLB teams uncover the stars of tomorrow, ensuring that no talent goes unnoticed.
What it Does
3P is more than an analytics tool—it’s a crystal ball for baseball prospects. By leveraging cutting-edge machine learning and historical data, 3P provides:
Accurate player trajectory predictions.
Detailed scouting reports highlighting strengths and areas for growth.
Comparisons with historical players that help identify hidden gems.
Whether you’re scouting for a championship team or looking to invest in the future, 3P delivers the precision you need to make winning decisions.
How we built it
3P combines technological sophistication with user-centric design:
Data Collection: Aggregated from top-tier MLB databases, ensuring high-quality inputs.
Machine Learning: Models trained with Vertex AI AutoML to predict player outcomes such as WAR projections and career trajectories.
Interactive Frontend: Engaging visualizations, including real-time performance graphs, comparisons to legendary players, and predictions generated by Gemini AI.
Cloud-Based Backend: Scales on demand to handle large datasets.
Challenges We Overcame
Data Quality: Cleaning decades of data to ensure accuracy.
Model Training: Achieving high prediction accuracy without compromising interpretability.
User Experience: Crafting a platform that speaks both to scouts and analysts.
Accomplishments that we're proud of
Achieved a 95%+ accuracy rate in predicting player performance—giving scouts unparalleled confidence.
Built a platform praised for its simplicity and depth by early testers.
Pioneered a tool that democratizes access to advanced scouting analytics, leveling the playing field.
Integrated seamlessly with AutoML for rapid model training and deployment.
What we learned
We learned that the magic lies in the details:
Data integrity is the backbone of any analytics platform.
Bridging the gap between complex algorithms and usability creates a tool that users love.
Collaboration between sports and tech is a goldmine for innovation.
AutoML transformed our development process, making sophisticated AI accessible and scalable.
What’s Next for 3P
Deeper Metrics: Injury risk analysis and team fit assessments.
Global Integration: Scouting international leagues and incorporating ongoing stats from the minor leagues.
Fan Interaction: Introducing features for fans to explore predictions and compare players.
Strategic Partnerships: Collaborating with MLB teams for real-world validation.
Broadening Horizons: Expanding 3P to other sports, revolutionizing scouting across disciplines.
How We Trained the Model
Training an accurate and reliable predictive model for 3P required aggregating and processing vast amounts of baseball data. Our approach included multiple data sources and machine learning techniques to create a powerful scouting tool.
Step 1: Data Collection
We pulled data from various sources to build a comprehensive dataset:
MLB Draft Data (25 Years)
We collected 25 years of draft prospect data using the MLB Stats API: https://statsapi.mlb.com/api/v1/draft/{year}
This provided essential scouting details on drafted players, including team selection, round, position, and physical attributes.
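As a rough illustration of the draft collection step, the sketch below builds one request URL per draft year from the endpoint template above. The helper names (`draft_urls`, `fetch_json`) and the choice of 2024 as the most recent year are assumptions made for this example, not details of our actual pipeline.

```python
import json
from urllib.request import urlopen

# Endpoint template quoted in the text above.
DRAFT_URL = "https://statsapi.mlb.com/api/v1/draft/{year}"


def draft_urls(last_year, n_years=25):
    """Return one draft URL per year for the n_years ending at last_year."""
    first_year = last_year - n_years + 1
    return [DRAFT_URL.format(year=y) for y in range(first_year, last_year + 1)]


def fetch_json(url, timeout=30):
    """Download a URL and parse the body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Assumed year range: the 25 drafts ending in 2024.
urls = draft_urls(2024)
print(urls[0])
print(urls[-1])
```

Each URL can then be passed to `fetch_json` and the raw response stored for later parsing; the response layout is not shown here because it should be checked against the live API.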
Minor League Performance Data
For each drafted player, we retrieved their Minor League stats using: https://statsapi.mlb.com/api/v1/people/{person_id}/stats?stats=yearByYear,career,yearByYearAdvanced,careerAdvanced&leagueListId=milb_all
This helped track player development, identifying key performance indicators over time.
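The per-player request can be assembled the same way. This is a minimal sketch that only constructs the URL shown above for a given player ID; the function name `milb_stats_url` and the sample ID are hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://statsapi.mlb.com/api/v1"

# Stat groups requested in the URL quoted above.
STAT_TYPES = ["yearByYear", "career", "yearByYearAdvanced", "careerAdvanced"]


def milb_stats_url(person_id):
    """Build the Minor League stats URL for one drafted player."""
    query = urlencode(
        {"stats": ",".join(STAT_TYPES), "leagueListId": "milb_all"},
        safe=",",  # keep the commas between stat groups unescaped
    )
    return f"{BASE}/people/{person_id}/stats?{query}"


# 12345 is a placeholder ID, not a real player reference.
print(milb_stats_url(12345))
```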
Career Performance from Baseball-Reference
We scraped historical performance metrics from Baseball-Reference, focusing on:
Career WAR (Wins Above Replacement)
Career Hits
Home Runs
Other key statistics from each player's MLB career.
Step 2: Data Preprocessing & Feature Engineering
Once the raw data was collected, we prepared it for model training:
Labeling & Categorization
We created categorical mappings for player positions and teams, allowing the model to distinguish between roles.
Labels were generated to classify players into potential career trajectories (e.g., All-Star, Starter, Bench, Bust).
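The sketch below shows one way the mappings and labels described above could be produced. The four label names come from the text; the career WAR cutoffs and the helper names are illustrative assumptions, not the thresholds we actually used.

```python
def build_mapping(values):
    """Map each distinct category (position, team) to a stable integer code."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


# Hypothetical career WAR floors for each label, checked from highest to lowest.
WAR_THRESHOLDS = [(20.0, "All-Star"), (5.0, "Starter"), (0.5, "Bench")]


def trajectory_label(career_war):
    """Assign a career trajectory label from career WAR.

    Players with no MLB record (career_war is None) are treated as "Bust",
    which is itself an assumption of this sketch.
    """
    if career_war is None:
        return "Bust"
    for floor, label in WAR_THRESHOLDS:
        if career_war >= floor:
            return label
    return "Bust"


position_codes = build_mapping(["SS", "P", "SS", "C"])
labels = [trajectory_label(w) for w in (35.2, 7.0, 1.0, -0.3, None)]
print(position_codes)
print(labels)
```

Keeping the cutoffs in a single table makes it easy to rerun the labeling if the class boundaries change.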
Feature Selection & Cleaning
We refined the dataset by removing inconsistencies and ensuring accurate data alignment.
Advanced metrics were calculated to enhance predictive power, including performance trends and player growth rates.
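To make "performance trends" and "growth rates" concrete, here is a small sketch that computes both from a season-by-season series of a single stat. The function names and the sample OPS values are made up for illustration; the features we actually engineered may differ.

```python
def yoy_growth(values):
    """Year-over-year relative change; None where the previous value is zero."""
    rates = []
    for prev, curr in zip(values, values[1:]):
        rates.append(None if prev == 0 else (curr - prev) / abs(prev))
    return rates


def trend_slope(values):
    """Least-squares slope of a stat against season index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Illustrative OPS values over three Minor League seasons (not real data).
ops_by_season = [0.700, 0.750, 0.800]
print(trend_slope(ops_by_season))
print(yoy_growth(ops_by_season))
```

A positive slope indicates a stat improving across seasons, and the growth list shows how much of that improvement came in each year.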
Step 3: Model Training with Vertex AI AutoML
With our cleaned and structured dataset, we used Google Vertex AI AutoML to train machine learning models. This allowed us to:
Automate Model Training: Vertex AI efficiently identified the best-performing models.
Optimize Predictions: We fine-tuned the model to predict a player’s long-term success with high accuracy.
Generate Actionable Targets: The model focused on key outputs such as WAR projections, probability of MLB success, and potential career paths.
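A training run along these lines could be started with the `google-cloud-aiplatform` Python SDK. The sketch below assumes a tabular dataset exported as a CSV in Cloud Storage and a classification target; the project, location, bucket path, column name, and training budget are all placeholders, and the two function names are hypothetical.

```python
def automl_job_config(target_column, prediction_type, budget_node_hours=1):
    """Collect the settings for one AutoML tabular training job."""
    if prediction_type not in ("classification", "regression"):
        raise ValueError(f"unsupported prediction type: {prediction_type}")
    return {
        "display_name": f"3p-{target_column}",
        "optimization_prediction_type": prediction_type,
        "target_column": target_column,
        "budget_milli_node_hours": int(budget_node_hours * 1000),
    }


def train(project, location, gcs_csv_uri, config):
    """Create a tabular dataset and run an AutoML training job on it.

    Requires google-cloud-aiplatform and valid Google Cloud credentials;
    the import is local so the config helper works without the SDK.
    """
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    dataset = aiplatform.TabularDataset.create(
        display_name=config["display_name"] + "-data",
        gcs_source=gcs_csv_uri,
    )
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name=config["display_name"],
        optimization_prediction_type=config["optimization_prediction_type"],
    )
    return job.run(
        dataset=dataset,
        target_column=config["target_column"],
        budget_milli_node_hours=config["budget_milli_node_hours"],
        model_display_name=config["display_name"] + "-model",
    )


# "trajectory" is an assumed name for the label column.
config = automl_job_config("trajectory", "classification")
print(config)
```

A WAR projection would use the same flow with `prediction_type="regression"` and a numeric target column.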
Step 4: Deployment & Integration
Once trained, we deployed the models using Google Cloud Endpoints, enabling real-time scouting insights.
The predictions were enhanced with Gemini AI, which analyzed model results and generated human-readable scouting reports.
The final output combined statistical analysis and AI-generated insights, making 3P an intelligent and data-backed scouting tool.
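One way to hand model results to a language model is to format them into a prompt. The sketch below only builds that prompt text; the wording, the function name `build_report_prompt`, and the sample values are assumptions, and the call to Gemini itself is left out because it depends on the client library and credentials in use.

```python
def build_report_prompt(player_name, position, class_probs, projected_war):
    """Format model outputs into a prompt asking for a short scouting report."""
    # List the predicted trajectories from most to least likely.
    ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)
    prob_lines = [f"- {label}: {prob:.0%}" for label, prob in ranked]
    return "\n".join(
        [
            "Write a concise scouting report based only on the data below.",
            f"Player: {player_name} ({position})",
            f"Projected career WAR: {projected_war:.1f}",
            "Career trajectory probabilities:",
            *prob_lines,
            "Highlight strengths, areas for growth, and overall outlook.",
        ]
    )


# Illustrative values only, not real model output.
prompt = build_report_prompt(
    "Sample Player",
    "SS",
    {"Starter": 0.55, "All-Star": 0.15, "Bench": 0.30},
    8.4,
)
print(prompt)
```

Restricting the prompt to the model's own numbers helps keep the generated report tied to the statistical output rather than to the language model's general knowledge.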
With 3P, we’re not just predicting careers; we’re shaping the future of baseball.