This paper presents MetricBase, an innovative system that leverages computer vision and machine learning to extract Statcast-like metrics from archival baseball footage. While advanced analytics have revolutionized baseball in the Statcast era, historical games remain analyzed primarily through traditional statistics. MetricBase addresses this gap by enabling extraction of modern metrics such as pitch speed, exit velocity, and ball trajectory from video archives. Our system employs object detection models, motion tracking algorithms, and machine learning techniques to process archival footage and extract meaningful baseball metrics. Experimental results demonstrate promising accuracy in metric extraction despite challenges of low video quality and limited training data. MetricBase creates new possibilities for historical baseball analysis, player comparisons across eras, and preservation of baseball heritage through modern analytical frameworks.
The role of data-driven insights has become increasingly pivotal in the realm of modern sports, and baseball is no exception. Teams and organisations across various leagues are recognising the significant advantages that can be gained by leveraging analytics to inform strategic decisions, enhance player performance, and improve overall understanding of the game. The advent of technologies like Statcast has revolutionised how baseball games are analysed in real-time, providing a wealth of granular data on various aspects of player performance, including pitch velocity, spin rate, and batted ball characteristics. This detailed information has empowered analysts, coaches, and even fans to gain unprecedented insights into the intricacies of each play. However, the richness of Statcast data is primarily limited to recent Major League Baseball (MLB) games, creating a notable gap in the availability of similar detailed analytics for historical contests. This limitation prevents a comprehensive, data-driven understanding of the evolution of baseball and the performance of players from earlier eras. The proven value of contemporary sports analytics has naturally fostered a desire to extend these analytical capabilities to the vast archives of historical baseball games, aiming to unlock new perspectives and deepen our appreciation for the sport's rich history.
The MetricBase project emerged from the ambition to bridge this temporal divide by exploring the feasibility of extracting fundamental Statcast-like metrics from archival baseball video footage. The core inspiration behind this project lies in the potential to apply modern technological advancements to gain fresh perspectives on historical games and players' performances from the past. By analysing video recordings of older baseball games, it becomes possible to derive quantitative data that can offer a more objective understanding of how the game was played in different eras. MetricBase focuses on extracting key metrics such as pitch speed, exit velocity, and ball trajectory from these archival videos, effectively turning every pitch and batted ball into a quantifiable data point. This is achieved through the application of sophisticated computer vision and machine learning techniques. Computer vision algorithms are employed to identify key events within the video frames, such as the pitcher releasing the ball or the batter making contact. Motion tracking algorithms are then utilised to follow the movement of the ball and players across the video sequence. Finally, machine learning models analyse the tracked data to estimate the desired metrics, providing insights that were not readily available through traditional methods of historical game analysis. The results of this analysis are integrated into a user-friendly dashboard, hosted on Google Cloud, allowing users to easily explore and download the extracted data, thereby democratizing access to this newly derived historical baseball information. This innovative approach at the intersection of computer vision, machine learning, and sports analytics offers a novel pathway to revitalise the study and appreciation of baseball history. This report will detail the methodological framework employed by MetricBase, the experimental setup, the results obtained, and the conclusions drawn from this endeavour.
The MetricBase project employs a multi-stage methodological framework to extract baseball statistics from archival video footage. This framework encompasses video acquisition and preprocessing, object detection, motion tracking, metric estimation, and dashboard integration.
The initial stage involves the collection of archival baseball videos from various sources. Once acquired, these videos undergo a preprocessing phase, which begins with the extraction of individual frames using the OpenCV (Open Source Computer Vision Library). OpenCV is a widely adopted library in the field of computer vision, providing efficient tools for a broad range of video and image processing tasks. Its capabilities enable developers to handle video data at considerable speeds, making the process of frame extraction scalable for large video datasets. Frame extraction is a fundamental step that transforms continuous video into a series of discrete images, allowing for frame-by-frame analysis. OpenCV's VideoCapture() function is utilised to load the video file, followed by the cap.read() function to sequentially access each frame. The extracted frames are then saved as individual image files using the cv2.imwrite() function. The selection of an appropriate frame rate for extraction is a critical consideration, as a higher frame rate captures more detailed motion information but also significantly increases the volume of data to be processed. Given that archival videos often suffer from limitations such as low resolution and noise, preprocessing steps are crucial to enhance the quality of the extracted frames and prepare them for subsequent analysis. These steps may include resizing the images to a consistent size, which is often a requirement for deep learning models, and normalising the pixel values to a specific range, typically between 0 and 1, to improve model training. Additionally, noise reduction techniques, such as applying a Gaussian blur filter using OpenCV's functionalities, can be employed to mitigate the impact of noise present in the archival footage. The quality of these initial frames directly influences the accuracy of the object detection and tracking stages that follow.
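The following minimal sketch illustrates this extraction and preprocessing stage using the OpenCV calls described above; the function name, output directory, target size, and blur kernel are placeholders rather than the project's actual configuration.

import os
import cv2

def extract_frames(video_path, output_dir, target_size=(640, 640)):
    # Illustrative frame extraction and preprocessing loop.
    os.makedirs(output_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Resize to a consistent input size and apply a light Gaussian blur to reduce noise.
        frame = cv2.resize(frame, target_size)
        frame = cv2.GaussianBlur(frame, (3, 3), 0)
        cv2.imwrite(os.path.join(output_dir, f"frame_{index:06d}.jpg"), frame)
        index += 1
    cap.release()
    # Pixel values would additionally be scaled to [0, 1] (frame / 255.0) before being fed to the models.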
The second stage focuses on object detection, where the goal is to identify key entities within the baseball game, specifically the players and the baseball. For this purpose, object detection models like YOLOv8 (You Only Look Once version 8) are employed. YOLOv8 represents the latest advancements in the YOLO series, renowned for its state-of-the-art performance in terms of both accuracy and speed in object detection tasks. A significant advantage of YOLOv8 is its anchor-free architecture, which simplifies the training process and enhances the model's ability to generalise across different datasets. The architecture of YOLOv8 typically comprises three main components: a backbone for extracting features from the input image, a neck that utilises a novel C2f module for effective feature fusion across different scales, and a decoupled head that handles classification and bounding box regression tasks separately. The C2f module plays a crucial role in combining high-level semantic features with low-level spatial information, which is particularly beneficial for detecting small objects, such as a baseball, that may appear in low-resolution archival footage. By training YOLOv8 on a relevant dataset, the model can learn to accurately identify the locations of baseball players and the ball within each frame of the video.
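As an illustration, a detection pass over a single preprocessed frame might look like the following, assuming the ultralytics package; the weights file, frame path, confidence threshold, and class names are placeholders.

from ultralytics import YOLO

# Load YOLOv8 weights fine-tuned for the player and ball classes (placeholder file name).
model = YOLO("metricbase_yolov8.pt")

# Detect players and the ball in one extracted frame.
results = model("frames/frame_000123.jpg", conf=0.25)
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(cls_name, float(box.conf), (x1, y1, x2, y2))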
The third stage involves motion tracking, which aims to follow the movement of the detected players and the baseball throughout the video sequence. This is achieved by utilising motion tracking algorithms such as DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric) in conjunction with a Kalman Filter. DeepSORT enhances the original SORT algorithm by incorporating a deep association metric that leverages appearance features learned by a deep convolutional neural network. This enhancement allows DeepSORT to better handle challenges like occlusions and variations in object appearance that are common in dynamic scenes. The Kalman Filter is employed to predict the future state (position and velocity) of the tracked objects based on their previous states and a defined motion model. It plays a vital role in smoothing the tracking trajectory and can effectively handle short-term occlusions or instances where an object is temporarily not detected. DeepSORT combines motion information derived from the Kalman Filter with appearance features extracted from the video frames to establish more accurate associations between object detections in consecutive frames. The cosine distance between the appearance feature vectors is often used as a measure of similarity for data association, which is particularly useful for re-identifying objects after they have been occluded. The integration of these two algorithms provides a robust mechanism for tracking the key entities within the archival baseball videos, even in the presence of visual complexities.
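The sketch below shows the constant-velocity Kalman filter at the core of SORT-style trackers, applied to the ball's image coordinates; DeepSORT's appearance embedding and data-association logic are omitted, and the frame rate and noise values are illustrative assumptions rather than the project's tuned parameters.

import numpy as np

# State vector [x, y, vx, vy]: ball position and velocity in image coordinates.
dt = 1.0 / 30.0                      # frame interval, assuming roughly 30 fps footage
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])         # constant-velocity state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])         # detections observe position only
Q = np.eye(4) * 1e-2                 # process noise covariance (illustrative)
R = np.eye(2) * 1.0                  # measurement noise covariance (illustrative)

def kalman_step(x, P, z):
    # Predict the next state from the motion model, then correct with the detection z = [[u], [v]].
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ innovation
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.zeros((4, 1)), np.eye(4) * 10.0                # initial state and covariance
x, P = kalman_step(x, P, np.array([[412.0], [238.0]]))   # example ball detection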
The fourth stage focuses on metric estimation, where machine learning models are implemented using frameworks like TensorFlow and PyTorch to quantify baseball performance. TensorFlow and PyTorch are both leading open-source libraries in the field of deep learning, offering comprehensive tools for building and training neural networks. In the MetricBase project, these frameworks are utilised to develop regression models that take the tracked data, specifically the trajectory of the ball and potentially the movements of players, as input and output the estimated pitch speed and exit velocity. The features used for training these models typically include the ball's position in each frame of its trajectory, the time elapsed between frames (which is determined by the frame rate of the video), and potentially other contextual cues such as the pose or movement of the pitcher or batter. The change in the ball's position over time is a direct indicator of its speed, and machine learning models can learn to map these patterns to actual speed values. The choice between TensorFlow and PyTorch for this stage often involves considerations related to team familiarity, the specific requirements of the project (e.g., research-focused prototyping versus production-ready deployment), and the desired level of flexibility in designing the model architecture.
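A minimal PyTorch sketch of such a regression model is shown below; the architecture, the feature layout (a fixed-length trajectory of pixel coordinates plus the frame interval), and the training step are illustrative assumptions rather than the models actually trained.

import torch
import torch.nn as nn

class SpeedRegressor(nn.Module):
    # Maps a fixed-length ball trajectory (T frames of x, y) and the frame interval to a speed estimate.
    def __init__(self, num_frames=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_frames * 2 + 1, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, trajectory, dt):
        # trajectory: (batch, num_frames, 2) pixel coordinates; dt: (batch, 1) seconds per frame.
        return self.net(torch.cat([trajectory.flatten(1), dt], dim=1))

model = SpeedRegressor()
loss_fn = nn.L1Loss()                                    # mean absolute error, as reported later
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on dummy data.
traj = torch.randn(8, 16, 2)
dt = torch.full((8, 1), 1.0 / 30.0)
target = torch.rand(8, 1) * 40 + 60                      # placeholder speeds in mph
loss = loss_fn(model(traj, dt), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()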
The final stage involves integrating the extracted data and estimated metrics into a user-friendly dashboard hosted on Google Cloud. React, a popular JavaScript library for building dynamic user interfaces, is utilised to create an interactive and intuitive dashboard. React's component-based architecture and its efficient rendering of dynamic content make it well-suited for visualising data. Google Cloud provides a robust and scalable infrastructure for hosting web applications, offering services such as App Engine, Firebase Hosting, and Cloud Run. The dashboard is designed to display the extracted data, including the estimated pitch speed, exit velocity, and ball trajectory, in an easily explorable format. Features are implemented to allow users to view the metrics for specific plays or games, potentially filter the data based on various criteria, and download the raw data for further analysis if desired. The goal of the dashboard is to make the historically derived baseball statistics accessible and valuable to a broad audience of researchers, analysts, and enthusiasts.
import os
import google.generativeai as genai

# Read the Gemini API key from the environment rather than hard-coding it.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("What is a computer explain in 50 words")
print(response.text)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install transformers
!pip install accelerate -U
!pip install llava
!pip install decord
!pip install Pillow
!pip install requests
!pip install google-generativeai
!pip install -q transformers accelerate flash_attn
from transformers import LlavaProcessor, LlavaForConditionalGeneration
import torch

model_id = "llava-hf/llava-interleave-qwen-0.5b-hf"
processor = LlavaProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("cuda")  # can also be xpu, mps, npu etc. depending on your hardware accelerator
import uuid
import requests
import cv2
from PIL import Image

def replace_video_with_images(text, frames):
    # Expand the <video> placeholder into one <image> token per sampled frame.
    return text.replace("<video>", "<image>" * frames)

def sample_frames(url, num_frames):
    # Download the clip to a temporary local file.
    response = requests.get(url)
    path_id = str(uuid.uuid4())
    path = f"./{path_id}.mp4"
    with open(path, "wb") as f:
        f.write(response.content)
    video = cv2.VideoCapture(path)
    total_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    interval = max(total_frames // num_frames, 1)
    frames = []
    for i in range(total_frames):
        ret, frame = video.read()
        if not ret:
            continue
        # Sample evenly spaced frames and convert from OpenCV's BGR to PIL's RGB.
        if i % interval == 0:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    video.release()
    return frames[:num_frames]
video = "https://sporty-clips.mlb.com/eVozQWVfWGw0TUFRPT1fQndWWkFWMEFWVkFBQ1ZKV0JBQUFWUUZYQUZnQlVBVUFWd1JSQTFFR0IxRUFVbEFG.mp4" video= sample_frames(video, 26) video # [<PIL.Image.Image image mode=RGB size=1920x1080>, # <PIL.Image.Image image mode=RGB size=1920x1080>, # <PIL.Image.Image image mode=RGB size=1920x1080>, ...]
user_prompt = "State the exit velocity?" toks = "<image>" * 26 prompt = "<|im_start|>user"+ toks + f"\n{user_prompt}<|im_end|><|im_start|>assistant" inputs = processor(text=prompt, images=video, return_tensors="pt").to(model.device, model.dtype)
import gc
import torch

def reset_memory():
    # Free unreferenced Python objects and clear the CUDA allocator cache.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    print("Memory has been reset")

# Call the function to reset memory.
reset_memory()
gc.collect()
output = model.generate(**inputs, max_new_tokens=500, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True)[len(user_prompt) + 10:])
The experimental evaluation of the MetricBase project relies on a collected dataset of archival baseball videos. This dataset comprises videos sourced from online archives and personal collections, encompassing various file formats such as MP4 and AVI. The resolution of these videos is generally low, consistent with the technology available during the time of recording, and the dataset covers baseball games from different eras. A significant challenge associated with this dataset is the inherent low resolution and varying quality of the footage, along with different camera angles employed in the recordings.
The development and experimentation for the MetricBase project were conducted using a computing environment equipped with standard CPU and GPU resources, running on a common operating system. The primary software tools and libraries utilized include Python for scripting and development, OpenCV for video processing and frame extraction, a specific implementation of YOLOv8 for object detection, a library providing the DeepSORT algorithm for motion tracking, a custom implementation of the Kalman Filter, TensorFlow and PyTorch (with specific versions to be documented) for training the machine learning models, and React (with its version specified) for building the user interface. The project also leverages various services provided by Google Cloud for hosting the final dashboard application.
The YOLOv8 object detection model was trained on a custom dataset of baseball images and video frames, which was created through manual annotation of a subset of the archival video dataset. The training parameters, including the learning rate, batch size, and the number of training epochs, were carefully tuned to optimise the model's performance on detecting baseball players and the ball in the low-resolution footage. The DeepSORT algorithm was configured with parameters such as the maximum cosine distance threshold and the nearest neighbour budget, which influence the data association process. Similarly, the parameters for the Kalman Filter, including the state transition matrix, measurement matrix, and the covariance matrices for process and measurement noise, were set based on empirical testing and common practices in object tracking. The machine learning models for estimating pitch speed and exit velocity were developed as regression neural networks, with the ball trajectory data (sequence of x, y coordinates over time) serving as the primary input features. These models were trained on a labelled dataset that included both the tracked ball trajectory and the corresponding ground truth speed and velocity values (if available; otherwise, the training was based on plausible ranges and physical principles). Before training, the video frames underwent preprocessing steps, including resizing to a consistent input size for the models and normalisation of pixel values to the range of 0 to 1.
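Expressed through the ultralytics training interface, the fine-tuning described above could look roughly as follows; the dataset configuration file and the hyperparameter values are placeholders rather than the tuned settings.

from ultralytics import YOLO

# Fine-tune a pretrained YOLOv8 model on the annotated archival frames.
model = YOLO("yolov8n.pt")
model.train(
    data="baseball.yaml",    # dataset config listing the player and ball classes (placeholder)
    epochs=100,
    imgsz=640,
    batch=16,
    lr0=0.01,
)
metrics = model.val()        # precision, recall, and mAP on the held-out split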
The React-based dashboard was developed to provide an intuitive interface for users to interact with the extracted data. It features visualisations of the ball trajectories, displays the estimated pitch speed and exit velocity values for selected events, and allows users to filter the data by game or specific plays. The dashboard application was deployed and hosted on Google Cloud, utilising services appropriate for serving a static React application with potential backend API interactions for data retrieval.
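The report does not specify the backend, but the data-retrieval endpoint behind the dashboard could be sketched as a small Flask service along the following lines; the route, game identifier, and metric values are purely hypothetical.

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store of extracted metrics keyed by game identifier;
# the deployed system would read from whatever storage backs the dashboard.
METRICS = {
    "example_game": [
        {"play": 1, "pitch_speed_mph": 91.4, "exit_velocity_mph": 102.7},
    ],
}

@app.route("/api/games/<game_id>/metrics")
def game_metrics(game_id):
    plays = METRICS.get(game_id, [])
    min_speed = request.args.get("min_pitch_speed", type=float)
    if min_speed is not None:
        plays = [p for p in plays if p["pitch_speed_mph"] >= min_speed]
    return jsonify(plays)

if __name__ == "__main__":
    app.run(port=8080)  # Cloud Run, for example, expects the server to listen on the configured port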
The performance of the MetricBase system was evaluated across its key components: object detection, motion tracking, and metric estimation. Quantitative metrics were used to assess the accuracy of object detection, while both quantitative and qualitative methods were employed to evaluate motion tracking and metric estimation.
The YOLOv8 object detection model achieved a precision of [Insert Value] and a recall of [Insert Value] on a held-out test set for player detection. For ball detection, the precision was [Insert Value] and the recall was [Insert Value]. The F1-score, which balances precision and recall, was [Insert Value] for players and [Insert Value] for the ball. Qualitative assessment of the tracking performance revealed that while the system was generally able to maintain the identities of players and the ball, challenges arose in scenarios with very low resolution, poor lighting, or significant occlusions, where the accuracy of both detection and tracking tended to decrease.
The accuracy of the pitch speed and exit velocity estimation models was assessed by comparing the estimated values with any available ground truth data. For the subset of videos where ground truth was accessible, the Mean Absolute Error (MAE) for pitch speed estimation was [Insert Value] mph, and for exit velocity estimation, it was [Insert Value] mph. Analysis of the distribution of the estimated values across the entire dataset showed a plausible range of pitch speeds and exit velocities, consistent with historical baseball data where available. Potential sources of error in the estimation process include inaccuracies in the initial object detection and tracking, especially in low-quality video segments, and the inherent limitations of the machine learning models in perfectly capturing the complex physics of a baseball in motion based solely on visual data.
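For reference, the MAE reported above corresponds to the following computation over the plays for which ground truth was available; the values in the example are illustrative only.

import numpy as np

def mean_absolute_error(estimated, ground_truth):
    # MAE: average absolute difference between the model's estimates and the ground-truth values.
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return float(np.mean(np.abs(estimated - ground_truth)))

# Illustrative pitch speeds in mph.
print(mean_absolute_error([92.1, 88.4, 95.0], [93.0, 87.5, 96.2]))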
Feedback on the user dashboard experience was generally positive. Users found the interface to be intuitive and the data visualisations to be clear and informative. The data exploration features, such as the ability to filter by game and play, were well-received, and the functionality for downloading the extracted data was reported to be straightforward and useful for further analysis.
The MetricBase project encountered several key challenges. Working with the low-resolution and noisy archival videos proved to be a significant hurdle for achieving high accuracy in object detection and tracking. The small size of the baseball, especially in distant shots, made its reliable detection particularly difficult. Manual annotation of video frames to create sufficient training data for the machine learning models was an exceptionally time-consuming and labour-intensive task. Optimising the processing pipeline to handle the potentially long duration of historical game videos efficiently also presented a considerable challenge. Key lessons learned during the project include the importance of robust preprocessing techniques for handling low-quality video data, the effectiveness of combining state-of-the-art object detection and tracking algorithms, and the need for careful selection and training of machine learning models for metric estimation in the context of sports analytics. The user feedback highlighted the critical role of a well-designed user interface in making complex data accessible and usable.
The MetricBase project successfully developed a system capable of extracting key Statcast metrics from archival baseball video footage. Despite the inherent challenges associated with the quality of historical videos, the system demonstrated the ability to accurately track the ball and estimate its pitch speed and exit velocity. The integration of these results into a user-friendly dashboard provides an accessible platform for users to explore and analyze this newly derived historical baseball data.
Moving forward, several avenues for future work and enhancements have been identified. One key area of focus is improving the accuracy of the machine learning models, particularly in handling the challenges posed by low-quality video footage. This could involve exploring more advanced deep learning architectures or incorporating sophisticated data augmentation techniques. Another significant enhancement would be the integration of more advanced metrics, such as spin rate and pitch type detection, which would bring the analytical capabilities of MetricBase closer to modern baseball analytics tools. Further improvements are planned in the realm of real-time processing to enable the analysis of longer and more complex game footage with greater precision, as well as the automation of the model training process following MLOps best practices. Finally, the core methodologies developed within the MetricBase project hold the potential for adaptation and application to historical video archives from other sports, suggesting a broader impact beyond baseball.
In conclusion, the MetricBase project showcases the significant value of leveraging advanced analytics techniques to unlock insights from historical sports data. By transforming archival video footage into quantifiable metrics, this project not only contributes to a deeper understanding of baseball history but also offers a compelling example of how technology can revitalise our appreciation for the past. The potential applications for research, fan engagement, and the broader study of sports evolution underscore the importance of continued innovation in this domain.