AI Ball Tracking for Sport Analysis

Accurate ball tracking has long been a holy grail in sports technology, promising to unlock new insights and improvements in athlete performance. It can be used for various tasks, from gathering valuable data to helping referees make decisions and even predictive analysis. Developing reliable ball tracking systems, however, has proven complex and challenging. Recent breakthroughs in AI and computer vision have introduced new models and methods that greatly improve ball tracking.

‍

In this article, we'll explore the latest advancements in AI-powered ball tracking, and how they're overcoming these challenges to revolutionize the sports industry.

What is Ball Tracking?

Ball tracking in sports means following the ball as it moves around the field. Alongside the ball's position, systems also measure things like contact with the bat or racket, ball speed, and player interaction with the ball. These are valuable data points for sports analysis because in most games the ball itself is at the core. It might be kicked, thrown into a basket, or hit with a bat or racquet; one way or another, a ball is involved in most popular games.

‍

Ball tracking has become an almost necessary part of most of these games. It lets players, regulators, and viewers extract much more information from a match, which enriches the experience and improves the game overall.

‍

Let’s talk about some of the advantages of using Ball tracking in sports.

‍

Ball tracking for Making Better Decisions

‍

Making judgment calls is one of the most important parts of any sport, with or without a ball. Most calls are straightforward, like fours and sixes in cricket, but some are very difficult, like LBW (Leg Before Wicket) and no-ball decisions. These edge cases are now handled by “3rd umpires”, who sit in a room with video feeds coming in from all over the field.

‍

For the 3rd umpires, ball tracking data is essential. When the on-field umpires cannot make a decision, they refer it to the 3rd umpire, who then uses ball tracking and other sports analysis techniques to make the call. This requires a lot of work and a lot of data processing in real time.

Ball Tracking for Improving Player Performance

Players have used ball tracking for a long time to improve their own performance and to understand the flaws in other players' styles. Bowlers use it to understand how they are handling the ball and where they can improve to take more wickets. Golfers use ball tracking with pose estimation to understand how, and at what angle, they should hit the ball. All of this gives players extremely granular data and useful feedback, which makes it much easier to catch mistakes and improve. At this point, it is something nearly every professional player uses.

Ball tracking for Assisting Coaches

Along with improving individual performances, coaches have started building better teams based on metrics gathered through sports analytics systems, such as ball speed, spin rate, distance covered while running after hitting the shot, and number of steps taken during the batting stance preparation phase. This gives managers and coaches detailed insight into the strengths and weaknesses of squad members, allowing informed tactical adjustments throughout the season.

‍

Coaches also analyze opponents' games thoroughly beforehand so they can devise strategies against particular playing styles. Footage from cameras capturing the entire match, including replays and slow-motion clips highlighting key moments, helps them understand the tactics employed by both sides. This helps coaches prepare a better team strategy against any opponent.

Ball Tracking for Fan Engagement

Ball tracking for fan engagement is a rather new phenomenon in which providers have been introducing things like the “Ball cam”, a special drone or camera that follows the ball specifically. It is new, but viewers love engaging with it. Providers also often show the “ball path” and other visualizations that keep viewers engaged and informed.

How does Ball tracking work?

Ball tracking has been implemented in many ways in real games. One notable approach uses TrackNet and YOLO networks, which are often paired and still work well today. But we want to introduce newer models that can track the ball and other objects even better.

‍

Let’s learn how to build ball tracking pipelines.

Segment Anything Model

The Segment Anything Model, or SAM, by Meta is a fairly recent and conceptually straightforward model. It takes in an image along with various types of prompts such as masks, boxes, and points (free-form text prompts were explored in the paper, though the released model does not expose them). Both the image and the prompt are encoded into compact embeddings, which are then passed into a decoder that outputs a final mask representing the segmented parts of the image. As you can see in the diagram below:

‍

This means that you can pass a normal image like this:

‍

‍

And segment all the players and extract information from it like this:

‍

‍

As you can see, the model was able to extract all the details from the image: the players, the pitch, the umpire, the hats, helmets, and so on. This is very granular data that can be used to analyze many aspects of the game and to track specific players, balls, and other objects.

Track Anything Model

The Track Anything Model is an extension of the Segment Anything Model that integrates the X-Mem architecture so that it can operate over videos rather than single images. SAM is first used to create a segmentation mask for the object to be tracked; the mask is then provided to the X-Mem model, which is very good at tracking objects over long videos.

‍

X-Mem is inspired by the Atkinson-Shiffrin memory model, which describes how human beings process and store information in sensory, short-term, and long-term memory. The same architecture is applied over consecutive frames to track an object through the video. X-Mem has since evolved into better architectures, X-Mem++ being the latest. All these techniques can be used to track players, balls, and other elements in a game.

‍

Here you can see Stephen Curry being tracked across shot changes over a 2 minute video.

‍

This same pipeline can be used for any game, like soccer, baseball, basketball, cricket, or golf.

‍

Latest Updates - SAM2 and EdgeTAM

While SAM made great progress in image segmentation, it wasn't designed to handle the fast, unpredictable world of video. And even though SAM is great for segmenting images, existing video segmentation models and datasets still fell short when it comes to “segmenting anything in videos”.

That’s where SAM 2 comes in. It builds on SAM by creating a unified model for both image and video segmentation, introducing a streaming architecture with memory attention. This new memory system keeps track of earlier frames and uses that context to improve predictions over time. As a result, SAM 2 can track and segment objects more accurately across movement, occlusion, and lighting changes, even with fewer user inputs.

When it comes to performance, SAM 2 is a huge upgrade. It delivers better segmentation with 3× fewer interactions, outperforms previous models on standard video benchmarks, and even beats SAM on image tasks while running 6× faster.

SAM 2 is powerful, but it's too heavy to run efficiently on mobile devices. The main bottleneck is the memory attention blocks added for video processing. To solve this, a lightweight variant of SAM 2 called EdgeTAM was introduced.

EdgeTAM replaces the heavy memory system with a new component called the 2D Spatial Perceiver. This module reads stored video frame data more efficiently using a lightweight transformer. Instead of scanning everything in detail, it uses a fixed set of smart "queries" that focus only on what matters. This keeps it fast without losing accuracy.

‍

Since video segmentation requires pixel-level precision, EdgeTAM keeps the spatial layout of the memory intact. It organizes the queries into two groups - global-level queries that look at the full scene, and patch-level queries that focus on small, local areas. This balance helps the model capture both the overall context and fine details.

‍

As a result, EdgeTAM achieves 87.7, 70.0, 72.3, and 71.7 J&F on DAVIS 2017, MOSE, SA-V val, and SA-V test respectively, while running at 16 FPS on an iPhone 15 Pro Max.

‍

State-of-the-art Ball Tracking Models (May 2025)

TrackNetV3

Ball tracking has moved from slow, error-prone manual tagging to real-time AI systems that handle speed, clutter, and occlusion with ease. TrackNet was an early deep learning model that used CNNs to detect balls in motion, even in visually noisy sports footage.

‍

TrackNetV2 improved on this by using a U-Net architecture. It processed multiple frames and predicted heatmaps, which helped it deal better with motion blur, occlusion, and lighting changes. It achieved an IoU (overlap between original and predicted mask) of 0.82 but still struggled when the ball disappeared mid-play.
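
To make the IoU figure concrete, here is a minimal sketch of how the overlap between a predicted mask and a ground-truth mask can be computed. The function name `mask_iou` and the tiny hand-written masks are illustrative assumptions, not code from the TrackNet papers.

```python
def mask_iou(pred, truth):
    """Intersection over Union of two same-sized binary masks (lists of rows of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; this also avoids dividing by zero.
    return intersection / union if union else 1.0


# Hypothetical 3x4 masks: the prediction is shifted one pixel to the right.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
iou = mask_iou(pred, truth)  # 2 shared pixels out of 6 covered pixels
```

A one-pixel shift on a tiny object already costs most of the overlap, which is part of why high IoU on small balls is hard to reach.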

‍

TrackNetV3 solves this issue with two modules: trajectory prediction and rectification. The prediction module looks at a sequence of frames plus a background image, helping the model ignore static distractions and focus on the moving object. If the ball (a shuttlecock, in the badminton footage TrackNetV3 targets) gets occluded or missed, the rectification module kicks in. It studies the trajectory, estimates where the ball likely was, and “repairs” the gap using inpainting (filling in missing positions).
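
The idea of repairing a gap in a trajectory can be illustrated with a much simpler stand-in. TrackNetV3 uses a learned inpainting module; the sketch below swaps in plain linear interpolation between the nearest known positions, purely to show what "filling in missing positions" means. The function name `fill_gaps` and the sample coordinates are assumptions for illustration.

```python
def fill_gaps(track):
    """Fill missing (None) ball positions by linear interpolation.

    `track` holds one (x, y) tuple or None per frame. Gaps at the very
    start or end have a neighbour on one side only, so they stay None.
    """
    filled = list(track)
    known = [i for i, p in enumerate(track) if p is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span  # fraction of the way through the gap
            x = track[left][0] + t * (track[right][0] - track[left][0])
            y = track[left][1] + t * (track[right][1] - track[left][1])
            filled[i] = (x, y)
    return filled


# Hypothetical detections: the ball is missed in frames 1 and 2.
observed = [(100, 200), None, None, (130, 170), (140, 165)]
repaired = fill_gaps(observed)
```

A straight line is a poor model for a shuttlecock or a spinning ball, which is why a learned rectification module does better than this baseline.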

TrackNetV3 also uses mixup augmentation, which blends different training examples to help the model handle unusual lighting and unpredictable ball movements. As a result, TrackNetV3 reaches 97.51% tracking accuracy, better than TrackNetV2 (94.98%) and much higher than general-purpose models like YOLOv7 (53.47%). Its IoU score also improves to 0.91.
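
Mixup itself is simple to sketch: two training samples and their labels are blended with the same random weight. The version below is a generic illustration on nested lists, not TrackNetV3's training code; the function name, the `alpha` default, and the toy inputs are assumptions.

```python
import random


def mixup(image_a, image_b, label_a, label_b, alpha=0.4, rng=random):
    """Blend two samples with a weight drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed_image = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    # Labels (class vectors here; heatmaps work the same way) use the same weight.
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label, lam


# Toy 1x2 grayscale "images" with one-hot labels.
mixed_image, mixed_label, lam = mixup(
    [[0.0, 1.0]], [[1.0, 0.0]], [1.0, 0.0], [0.0, 1.0]
)
```

With a small `alpha`, most blends stay close to one of the two originals, so the model mostly sees realistic frames with a faint overlay of another.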

‍

TrackNetV3 doesn't just detect where the ball is; it can also estimate where it went, even if it disappears for a moment. That is crucial for live replays, analytics, and broadcast graphics, especially in fast sports like badminton.

YOLOv11 + SAHI + ByteTrack

Tracking small objects in high-resolution sports video is challenging due to their tiny size and fast movement. In 4K footage, objects like shuttlecocks or cricket balls typically appear just 5 to 15 pixels wide. To track them accurately, we need a system built specifically for small, fast-moving objects. A combination of YOLOv11, SAHI, and ByteTrack addresses this challenge.

YOLOv11 + SAHI

YOLOv11 is designed to detect very small and low-resolution objects in busy scenes. It uses a technique called dynamic attention, which helps the model focus on the important parts of the image, especially where the small objects are. The model also incorporates special blocks called C3k2, which help combine features from different image sizes. This makes it easier to detect objects that are blurry or only partially visible.

During training, mixup techniques blend different images together to simulate conditions like poor lighting or fast motion. This improves the model’s performance in real sports videos, where conditions are not always ideal.

Even specialized detectors struggle when small objects are just a few pixels in a large 4K frame. SAHI (Slicing Aided Hyper Inference) solves this by splitting the image into smaller, overlapping sections. In each section, the small object occupies a larger portion of the frame, making it easier to detect.

‍



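import cv2
import numpy as np
import supervision as sv
from inference import get_model

# Load the detection model (model ID kept from the original example)
model = get_model(model_id="yolov8x-640")

# Read the frame to analyze; the path is a placeholder, replace it with your own image
image = cv2.imread("path/to/frame.jpg")

# Slice callback: run the detector on one tile and convert the result
def callback(image_slice: np.ndarray) -> sv.Detections:
    results = model.infer(image_slice)[0]
    return sv.Detections.from_inference(results)

# Run sliced inference over the full frame and merge the per-tile detections
slicer = sv.InferenceSlicer(callback=callback)
detections = slicer(image)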
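
To show what slicing means in practice, here is a small standalone sketch that computes overlapping tile coordinates for a frame. It is an illustration of the tiling idea only, not the internals of SAHI or of `InferenceSlicer`; the function name `slice_boxes` and the 640-pixel tile with 128-pixel overlap are assumed values.

```python
def slice_boxes(image_w, image_h, slice_size=640, overlap_px=128):
    """Return (x1, y1, x2, y2) tiles covering the image with overlapping edges."""
    step = slice_size - overlap_px
    if step <= 0:
        raise ValueError("overlap_px must be smaller than slice_size")
    boxes = []
    y = 0
    while True:
        y2 = min(y + slice_size, image_h)
        y1 = max(0, y2 - slice_size)  # shift the last row up so tiles keep full size
        x = 0
        while True:
            x2 = min(x + slice_size, image_w)
            x1 = max(0, x2 - slice_size)  # same shift for the last column
            boxes.append((x1, y1, x2, y2))
            if x2 >= image_w:
                break
            x += step
        if y2 >= image_h:
            break
        y += step
    return boxes


tiles = slice_boxes(3840, 2160)  # a 4K frame becomes 8 columns x 4 rows of tiles
```

A 10-pixel ball spans about 0.26% of a 3840-pixel-wide frame but about 1.6% of a 640-pixel tile, which is the effect slicing relies on. The cost is that the detector now runs once per tile instead of once per frame.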

‍

ByteTrack

Once an object is detected, the next step is to track it across multiple frames. In fast sports, objects can quickly disappear behind players or move too fast to track consistently. ByteTrack handles this challenge by keeping even low-confidence detections and trying to match them to previously identified tracks.

It does this using dual-thresholding, which evaluates both high-confidence and low-confidence predictions. Kalman filtering is used to predict where the object will move when it's temporarily out of sight. The matching algorithm connects detections based on the object’s motion and the overlap between frames.

This approach improves the continuity of tracking, meaning objects are less likely to be misidentified or lost during occlusions.
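
The two-threshold idea can be sketched in a few lines. This is a simplified illustration, not the ByteTrack implementation: it matches greedily by IoU instead of using the Hungarian algorithm, compares against the last known boxes rather than Kalman-predicted ones, and does not create new tracks. All names and threshold values are assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes, detections, min_iou):
    """Pair tracks with detections by descending IoU; return (matches, unmatched tracks)."""
    pairs = sorted(
        ((iou(box, det["box"]), tid, di)
         for tid, box in track_boxes.items()
         for di, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, di in pairs:
        if score < min_iou:
            break
        if tid in used_tracks or di in used_dets:
            continue
        matches[tid] = detections[di]
        used_tracks.add(tid)
        used_dets.add(di)
    unmatched = {t: b for t, b in track_boxes.items() if t not in used_tracks}
    return matches, unmatched


def associate(track_boxes, detections, high=0.5, low=0.1, min_iou=0.3):
    """Stage 1 uses confident detections; stage 2 offers weak ones to leftover tracks."""
    strong = [d for d in detections if d["score"] >= high]
    weak = [d for d in detections if low <= d["score"] < high]
    first, remaining = greedy_match(track_boxes, strong, min_iou)
    second, lost = greedy_match(remaining, weak, min_iou)
    return {**first, **second}, lost


tracks = {"ball": (100, 100, 110, 110), "player": (300, 200, 360, 320)}
detections = [
    {"box": (301, 202, 361, 322), "score": 0.92},  # clear player
    {"box": (102, 101, 112, 111), "score": 0.25},  # motion-blurred ball
    {"box": (500, 500, 510, 510), "score": 0.05},  # noise, below the low threshold
]
matched, lost = associate(tracks, detections)
```

In this toy frame, a tracker with a single 0.5 confidence cutoff would drop the blurred ball, while the second stage keeps its track alive.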

SAMURAI

‍

Meta’s “Segment Anything” Model (SAM) excels at image segmentation, and its successor SAM 2 extends it to video, but tracking still struggles in crowded scenes, with fast-moving targets, or when objects briefly disappear. The issue lies in SAM 2’s memory mechanism, which uses a fixed window to store only the most recent frames without assessing their quality. This leads to error accumulation over time, affecting its tracking accuracy in dynamic video scenarios.

‍

To overcome these challenges, researchers at the University of Washington built a new model called SAMURAI, a motion-aware adaptation of SAM 2 for visual object tracking. It improves on SAM 2 by using motion cues and smarter memory selection. It doesn’t need retraining and works well across a range of tracking tasks.

‍

‍

Key Innovations of SAMURAI:

‍

Motion Modeling System

SAMURAI uses motion cues to predict where objects will move in complex, dynamic scenes. This helps it select the right mask and avoid confusion when objects look similar or overlap, ensuring accurate tracking even in challenging situations.

‍

Motion-Aware Memory Selection

SAMURAI improves SAM 2’s memory by replacing its fixed-window system with a hybrid scoring approach. It evaluates frames based on three factors:

Mask similarity
Object appearance
Motion patterns

Only the most relevant frames are kept in memory, minimizing errors and improving tracking accuracy.

‍

SAMURAI works because of how it combines motion and memory. It uses a Kalman filter to predict object positions and sizes, helping it pick the right mask from multiple candidates. It only stores frames that meet certain quality thresholds for mask similarity, ensuring it focuses on the most relevant data. This balance improves tracking accuracy and reliability.
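
The difference between a fixed window and score-based memory selection can be sketched as follows. This is a simplified illustration of the idea, not SAMURAI's code: the field names, the shared 0.5 threshold, and the memory size of three frames are all assumptions.

```python
def select_memory(frames, max_frames=3, threshold=0.5):
    """Walk back from the newest frame and keep frames whose three scores all pass."""
    selected = []
    for frame in reversed(frames):
        passes = (
            frame["mask_score"] > threshold
            and frame["object_score"] > threshold
            and frame["motion_score"] > threshold
        )
        if passes:
            selected.append(frame["index"])
            if len(selected) == max_frames:
                break
    return sorted(selected)


# Hypothetical per-frame scores; the ball is occluded in frames 3 and 4.
frames = [
    {"index": 0, "mask_score": 0.90, "object_score": 0.8, "motion_score": 0.7},
    {"index": 1, "mask_score": 0.85, "object_score": 0.9, "motion_score": 0.8},
    {"index": 2, "mask_score": 0.80, "object_score": 0.7, "motion_score": 0.6},
    {"index": 3, "mask_score": 0.20, "object_score": 0.1, "motion_score": 0.3},
    {"index": 4, "mask_score": 0.30, "object_score": 0.2, "motion_score": 0.1},
    {"index": 5, "mask_score": 0.90, "object_score": 0.8, "motion_score": 0.9},
]

fixed_window = [f["index"] for f in frames[-3:]]  # most recent frames, good or bad
motion_aware = select_memory(frames)              # skips the occluded frames
```

Here the fixed window keeps frames 3, 4, and 5, so two of its three memories come from the occlusion, while the selective version reaches back to frames 1 and 2 instead.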

Performance Comparison

TrackNetV3 is efficient on GPU memory (2.5 GB) but lags behind in speed. It manages around 25 FPS at 1080p but drops to 12 FPS in multi-camera setups, making it better suited to post-event analysis than real-time use.

YOLO + SAHI + ByteTrack sits in the middle. Without slicing, it can hit 45 FPS, but once SAHI tiling is used (for better small object detection), speed drops to 15 FPS. It's CPU-intensive due to multi-threaded slicing, and latency jumps to 66ms with full slicing, which can impact live responsiveness.

SAMURAI clearly stands out with the highest FPS across both 1080p and 4K setups, achieving 60 FPS on single-camera 1080p and 22 FPS across 4K multi-camera feeds, thanks to optimized memory streaming and frame handling. Its latency is also the lowest at just 16ms, making it suitable for live applications.
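
The latency figures quoted in this comparison line up with simple frame-time arithmetic, assuming latency here means per-frame processing time with no batching or pipelining (an assumption, since the measurement setup isn't described). The helper name `frame_time_ms` is illustrative.

```python
def frame_time_ms(fps):
    """Time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Frame rates quoted in the comparison above.
quoted_fps = {
    "TrackNetV3 (1080p)": 25,
    "YOLO without slicing": 45,
    "YOLO + SAHI slicing": 15,
    "SAMURAI (1080p)": 60,
}
frame_times = {name: frame_time_ms(fps) for name, fps in quoted_fps.items()}
```

At 15 FPS each frame takes roughly 66.7 ms, and at 60 FPS roughly 16.7 ms, which is where figures like 66ms and 16ms come from.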

‍

How to use Ball Tracking Data

Once you have ball tracking data along with player data, you can generate a wide range of analytics: player interaction with the ball, team interaction, which player performs best in which area, where the ball goes most often, and so on. These essential metrics become easy to extract and track once the data is ready. Let's see how we can work with it.

‍

Team Analysis

‍

Once the ball tracking data is in, we can measure how teams, or specific team members, interact with the ball at different moments. This is especially important for games like basketball and soccer, as different players perform differently depending on the phase of the game and the area of the field. Metrics like the Stretch Index and Team Synchrony can be extremely useful in these scenarios.

‍

These metrics help us understand how much area a team is using and how well. For example, the stretch index of three players might be very low, meaning they stay tightly bunched, which could explain why they have trouble moving the ball over large areas. And if the other team's covered area overlaps with our team's, we can study how the two will interact and which techniques help secure the ball in those scenarios. All this is very valuable to a manager or a coach. You can read more about these metrics here.
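
As a rough sketch, the stretch index is commonly defined as the mean distance of the players from their team centroid, and that is the definition assumed here. The function name and the pitch coordinates (in metres) are made up for illustration.

```python
import math


def stretch_index(positions):
    """Mean distance of players from their team centroid."""
    cx = sum(x for x, _ in positions) / len(positions)
    cy = sum(y for _, y in positions) / len(positions)
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / len(positions)


# Hypothetical positions for two groups of three players.
tight = [(50, 30), (52, 30), (51, 32)]    # bunched around the ball
spread = [(20, 10), (80, 10), (50, 60)]   # covering a large part of the pitch

tight_index = stretch_index(tight)
spread_index = stretch_index(spread)
```

Computing this per frame and plotting it over time shows when a team contracts (typically in defense) and when it expands (typically in attack).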

‍

Individual Analysis

Individual player analysis is as important as team analysis. The path a player takes, their movement speed, and their distance from the ball as the game moves on are all important metrics for specific players. We can also see how a player performs in different areas of the field. For example, if a player is not moving much in the defensive area, that may indicate they are underperforming there. This helps the coach decide how to improve the player in specific areas, or which areas to deploy them in.
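
Given a player's tracked positions, distance covered and speed fall out directly. The sketch below assumes positions have already been converted from pixels to metres on the pitch and that the frame rate is known; the function name and sample track are illustrative.

```python
import math


def distance_and_speeds(track, fps):
    """Total path length (m) and per-step speeds (m/s) for per-frame (x, y) positions."""
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    speeds = [d * fps for d in steps]  # metres per frame times frames per second
    return sum(steps), speeds


# Hypothetical track sampled at 2 frames per second.
track = [(0, 0), (3, 4), (6, 8)]
total_distance, speeds = distance_and_speeds(track, fps=2)
```

In practice the raw track is usually smoothed first, since frame-to-frame jitter in the detections inflates both distance and speed.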

‍

This data can also be used to pair the right players together. Three players who have a high stretch index together can be paired to cover a large area of the field. Players who run faster can be placed closer to the opponent's side so that they can quickly move back and forth between defense and offense.

Pose Analysis

Ball tracking data can be further paired with pose estimation to refine a player's technique in games like cricket and golf. Games where you have to hit the ball with great accuracy and precision can benefit greatly from pose analysis. You can see how the position of the body changes while hitting the ball and whether certain positions result in better performance than others. Pose estimation can also help flag potential injury risks early, before they become serious problems down the road.

‍

‍

As you can see in the image, pose estimation in golf can be helpful: it relates where the ball goes to how it was hit and the pose at impact. Poses can also be compared with those of better players to see what the player is doing wrong and where improvement is needed. The same technique can be used for other things, like exercise and posture analysis.
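
A typical building block for this kind of comparison is the angle at a joint, computed from three keypoints that a pose estimator returns. The sketch below assumes 2D keypoints in image coordinates; the function name and the sample shoulder, elbow, and wrist points are illustrative.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# Hypothetical keypoints (pixels) at the moment of impact.
shoulder, elbow, wrist = (200, 100), (240, 160), (300, 170)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

Tracking an angle like this across the frames of a swing, and lining it up with where the ball went, is one simple way to compare a player against a reference technique.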

Traditional methods vs Track Anything Model

As mentioned before, networks like TrackNet have traditionally been used for this application. But even TrackNet has its issues.

‍

Performance Issues

TrackNet, being a single architecture, can make mistakes. It was mainly developed for tracking tennis balls and badminton shuttlecocks, so performance in other domains can degrade badly unless it is fine-tuned properly. Small, fast-moving objects, like a cricket ball, can be problematic for TrackNet to track. However, pipeline-style trackers such as MOTRv2 seem to show much better performance. These are not single-network architectures but pipelines that pair a tracking network with a separate pretrained detector.

‍

SAM, on the other hand, performs strongly on most out-of-domain tasks, and when combined with architectures like X-Mem and X-Mem++, its tracking capabilities are state of the art. Like TrackNet, pipelines built with SAM might also require some fine-tuning, but the performance gains are much greater.

‍

Computation Cost and Inference Time

Another big issue with TrackNet and the pipelines that depend on it is that it is computationally heavy, as it is mostly a single convolutional network. TrackNetV2 runs at around 31.8 FPS, whereas the Track Anything Model paired with X-Mem++ can reach 39 FPS, and more if a smaller version of the model is fine-tuned for a specific use case.

‍

In production, it is often the case that not all frames are processed: the detector runs on a subset of frames, and a Kalman filter is used alongside it to carry the ball's track through the frames in between.
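
Here is a minimal sketch of that pattern, assuming a constant-velocity motion model along one axis and a detector that runs only on every second frame. The class name, noise values, and synthetic ball motion are all assumptions; a real tracker would use a 2D state and tuned noise parameters.

```python
class ConstantVelocityKalman1D:
    """Kalman filter over [position, velocity] with a time step of one frame."""

    def __init__(self, position, process_var=1e-2, measurement_var=1.0):
        self.p, self.v = float(position), 0.0
        self.P = [[1.0, 0.0], [0.0, 100.0]]  # velocity is unknown at the start
        self.q, self.r = process_var, measurement_var

    def predict(self):
        """Advance one frame: p += v, and P = F P F^T + Q with F = [[1, 1], [0, 1]]."""
        (a, b), (c, d) = self.P
        self.p += self.v
        self.P = [[a + b + c + d + self.q, b + d], [c + d, d + self.q]]
        return self.p

    def update(self, z):
        """Correct the state with a measured position z."""
        (a, b), (c, d) = self.P
        s = a + self.r                # innovation variance
        k0, k1 = a / s, c / s         # Kalman gain for position and velocity
        residual = z - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return self.p


# Synthetic ball moving 5 px per frame; detections arrive on even frames only.
kf = ConstantVelocityKalman1D(position=0.0)
estimates = []
for frame in range(1, 21):
    estimate = kf.predict()
    if frame % 2 == 0:
        estimate = kf.update(5.0 * frame)
    estimates.append(estimate)
```

On the odd frames the estimate comes purely from the motion model, which is what lets a pipeline skip detection on those frames and still report a position.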

‍

Want to Build Ball Tracking Pipelines for Sports?

If you are looking to build ball-tracking pipelines and sports analysis applications, please reach out to us. We have worked with many computer vision pipelines and have integrated them into already existing systems. Reach out to us to build such pipelines or just to chat. Would love to chat!