About Racing Quant

Who We Are

We are a group of university students with backgrounds in mathematics and data analysis, brought together by a shared interest in applying quantitative methods to real-world problems.

Inspired by pioneers such as Bill Benter, we explore the use of statistical modelling and data-driven approaches to analyse horse racing, with a particular focus on the Hong Kong racing scene.

This website is a platform for us to share our analytical outputs, observations, and predictions. It also serves as a space to exchange ideas with others who have a similar interest in quantitative modelling, racing analytics, and probability-based thinking.

Over time, we may introduce additional features, such as optional notifications highlighting model outputs that meet certain statistical thresholds. These updates are intended for informational and engagement purposes only. You can register for our newsletter to stay informed about new features, insights, and updates.

Our goal is to learn, experiment, and refine our models over time through open discussion and feedback from the community.

How We Operate

We run two prediction models on every race day.

The Ranker Model ranks horses using XGBoost, a gradient-boosted decision tree algorithm. You can think of it like this: instead of relying on one big rule, it builds many small if-then rules called decision trees. No single tree is accurate on its own, but XGBoost trains them one after another, and each new tree focuses on correcting the mistakes of the trees before it. After enough rounds, the combined result becomes a strong predictor.
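To make the idea of "each new tree corrects the previous trees' mistakes" concrete, here is a toy Python sketch of gradient boosting with one-split trees (stumps) on made-up data. It is not XGBoost and not our production code: XGBoost adds second-order gradient information, regularisation and deeper trees, but the loop below shows the core mechanism.

```python
# Toy gradient boosting with one-split trees ("stumps") and squared-error loss.
# Illustrative only: XGBoost adds second-order gradients, regularisation,
# deeper trees and many engineering optimisations on top of this idea.


def fit_stump(xs, residuals):
    """Find the single if-then split on x that best fits the current residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = sum((r - left_value) ** 2 for r in left) + sum(
            (r - right_value) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Train stumps one after another, each fitting the errors left by the previous ones."""
    base = sum(ys) / len(ys)  # start from the overall average
    trees = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]  # current mistakes
        tree = fit_stump(xs, residuals)  # the new tree targets those mistakes
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for x, p in zip(xs, predictions)]
    return lambda x: base + sum(learning_rate * t(x) for t in trees)


# Hypothetical data: x is a single feature, y = 1 if the horse finished Top 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
first_round = boost(xs, ys, n_rounds=1)
```

After one round the prediction at x = 6 is only 0.65; after 50 rounds it is essentially 1.0, because on this data each new stump removes 30% of the remaining error.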

For each horse in a race, we feed the model a set of inputs, or features, drawn from the racecard and our database. Most of the signal comes from the standard racing fundamentals the model was trained on. These include the current race context, such as class, distance, draw, and weight; recent-form features from the horse's prior runs, such as finishing position, margin, days since last run, and various position-change and pace-shape summaries; and stable and jockey strength proxies, such as trainer and jockey win and place rates over the past 365 days and their number of starts. Gear-change indicators are also included as binary flags that sometimes add a small improvement at the margin.
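As an illustration of how a few of these features might be assembled for one runner, here is a minimal Python sketch. The record layout, field names and 365-day window logic are hypothetical, and "place" is treated as a Top-3 finish, which is an assumption for this example. Only results strictly before race day are counted, so the features never peek at the race being predicted.

```python
from datetime import date, timedelta


def rates_365(results, today):
    """Win rate, place rate (assumed: Top 3) and starts over the 365 days before `today`.

    `results` is a hypothetical list of (race_date, finish_position) tuples.
    """
    recent = [
        pos for d, pos in results
        if timedelta(days=0) < today - d <= timedelta(days=365)
    ]
    starts = len(recent)
    if starts == 0:
        return 0.0, 0.0, 0
    wins = sum(1 for pos in recent if pos == 1)
    places = sum(1 for pos in recent if pos <= 3)
    return wins / starts, places / starts, starts


def build_features(race, runner, prior_runs, jockey_results, trainer_results):
    """Assemble one runner's feature dict. All field names are hypothetical."""
    today = race["date"]
    last = max(prior_runs, key=lambda r: r["date"]) if prior_runs else None
    j_win, j_place, j_starts = rates_365(jockey_results, today)
    t_win, t_place, t_starts = rates_365(trainer_results, today)
    return {
        # current race context
        "race_class": race["class"],
        "distance_m": race["distance_m"],
        "draw": runner["draw"],
        "weight_lb": runner["weight_lb"],
        # recent form from the most recent prior run
        "last_finish": last["finish"] if last else None,
        "last_margin_lengths": last["margin"] if last else None,
        "days_since_last_run": (today - last["date"]).days if last else None,
        # jockey and trainer strength proxies
        "jockey_win_365": j_win,
        "jockey_place_365": j_place,
        "jockey_starts_365": j_starts,
        "trainer_win_365": t_win,
        "trainer_place_365": t_place,
        "trainer_starts_365": t_starts,
        # binary gear-change hint: 1 if gear differs from the last run
        "gear_change": int(last is not None and runner["gear"] != last["gear"]),
    }


# Hypothetical example runner
race = {"date": date(2024, 3, 10), "class": 4, "distance_m": 1200}
runner = {"draw": 5, "weight_lb": 126, "gear": "B/TT"}
prior_runs = [
    {"date": date(2024, 2, 18), "finish": 4, "margin": 2.5, "gear": "B"},
    {"date": date(2024, 1, 20), "finish": 2, "margin": 0.75, "gear": "B"},
]
jockey_results = [(date(2024, 3, 1), 1), (date(2024, 2, 1), 3),
                  (date(2023, 12, 1), 5), (date(2022, 1, 1), 1)]
trainer_results = [(date(2024, 2, 25), 2), (date(2023, 9, 1), 1)]
features = build_features(race, runner, prior_runs, jockey_results, trainer_results)
```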

The model outputs a number representing the horse's probability of finishing in the Top 3, which we call p_top3. Once every runner in the race has a p_top3, the picks are simply the runners ranked from highest to lowest p_top3. We typically share the Top 4 on our site.
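The ranking step itself is short. A minimal sketch, assuming the model's outputs for one race are already available as a dictionary mapping horse number to p_top3 (the numbers below are made up):

```python
def top_picks(p_top3, k=4):
    """Rank runners by predicted p_top3, highest first, and keep the top k."""
    return sorted(p_top3, key=p_top3.get, reverse=True)[:k]


# Hypothetical model outputs for one six-runner race: horse number -> p_top3.
p_top3 = {1: 0.58, 2: 0.33, 3: 0.78, 4: 0.45, 5: 0.20, 6: 0.66}
picks = top_picks(p_top3)  # horses 3, 6, 1, 4
```

A useful sanity check: because exactly three runners fill the Top 3, a well-calibrated set of p_top3 values for one race should sum to roughly 3, as the example values do.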

Then there is the Positive EV Win/Q Model, our production "Golden W/Q" setup, where W stands for WIN and Q for Quinella. It is designed for one job: not just finding the most likely winner, but finding races where the odds are wrong enough that a bet may have positive expected value (EV).

It starts from the market, because the market already contains a lot of information. From the live WIN odds, we compute a market win probability for each horse; roughly speaking, lower odds imply a higher chance. If we stopped there, we would simply be copying the crowd, which is usually unprofitable once the margin built into the odds is taken into account. So the production W model adds a second opinion: a LightGBM model trained to estimate how much the market is over- or underestimating each horse, based on the race and runner features we build from racing history.
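A minimal Python sketch of these two steps, under stated assumptions: market probabilities are taken as 1/odds and normalised to remove the margin, and the model's second opinion is represented as a hypothetical per-horse log-odds correction (`logit_delta`) standing in for the LightGBM output. This is one plausible way to combine a market prior with a learned correction, not our exact production formula.

```python
import math


def market_win_probs(win_odds):
    """Turn decimal WIN odds into win probabilities that sum to 1.

    Summed over the field, 1/odds exceeds 1 because of the pool's built-in
    margin (the overround); dividing by that total is one simple way to remove it.
    """
    raw = {horse: 1.0 / odds for horse, odds in win_odds.items()}
    overround = sum(raw.values())
    return {horse: r / overround for horse, r in raw.items()}, overround


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def adjust_with_model(p_market, logit_delta):
    """Apply a model-estimated correction in log-odds space, then renormalise.

    `logit_delta` (hypothetical) holds the model's view of how far the market
    underestimates (positive) or overestimates (negative) each horse.
    """
    shifted = {h: sigmoid(logit(p) + logit_delta.get(h, 0.0)) for h, p in p_market.items()}
    total = sum(shifted.values())
    return {h: s / total for h, s in shifted.items()}


win_odds = {1: 2.0, 2: 3.5, 3: 5.0, 4: 8.0}  # hypothetical live WIN odds
p_market, overround = market_win_probs(win_odds)
p_model = adjust_with_model(p_market, {3: 0.4, 1: -0.2})
```

Working in log-odds space keeps every adjusted probability between 0 and 1, and the final renormalisation keeps the race's win probabilities summing to 1.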

That is where overlay comes in. Overlay is essentially our value meter: it measures the gap between the chance the market is pricing in and the chance our model thinks the horse truly has. When overlay is positive, the horse looks better than its odds imply. When overlay is near zero or negative, the odds already reflect the horse's chance, or the market may even be overrating it.
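One hypothetical way to express overlay in code is as the relative gap between the model and market probabilities. The exact production definition is not spelled out on this page, so treat this as an assumption; the sketch also computes expected value per unit staked for comparison.

```python
def overlay(p_model, p_market):
    """Relative gap between the model's and the market's win probability.

    Positive: the horse looks underpriced. Zero or negative: no edge.
    """
    return p_model / p_market - 1.0


def expected_value(p_model, decimal_odds):
    """Expected profit per 1 unit staked on WIN at the given decimal odds."""
    return p_model * decimal_odds - 1.0


# Hypothetical horse: market says 20%, model says 25%, odds 5.0 (so 1/odds = 0.20).
ov = overlay(0.25, 0.20)
ev = expected_value(0.25, 5.0)
```

If p_market is taken as the raw 1/odds, the two measures coincide, since p_model / (1/odds) - 1 = p_model * odds - 1. When p_market has had the margin removed, it is smaller than 1/odds, so overlay comes out larger than EV: a positive overlay against margin-free probabilities does not by itself guarantee positive EV at the actual odds.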

For each race, our algorithm scores every horse and picks a top1: the best horse according to its win-scoring function. The betting decision is then deliberately simple and strict: we only call a WIN bet on that race if the top1 overlay is greater than a threshold x. If the top1 overlay does not exceed x, we treat the race as having no edge and skip it, even if the horse looks strong, because strength alone is not enough; we need mispricing.
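The decision rule can be sketched in a few lines of Python. The runner dictionary, its "score" and "overlay" keys, and the threshold value below are hypothetical placeholders; the production win-scoring function and the actual value of x are not shown here.

```python
def golden_w_decision(runners, x):
    """Pick top1 by win score; call a WIN bet only if its overlay is strictly greater than x.

    `runners` maps horse number -> {"score": ..., "overlay": ...}; the keys and the
    scores are hypothetical stand-ins for the production win-scoring function.
    Returns the horse to back, or None to skip the race.
    """
    top1 = max(runners, key=lambda horse: runners[horse]["score"])
    if runners[top1]["overlay"] > x:
        return top1
    return None  # strength without mispricing: no bet


X = 0.10  # hypothetical threshold for illustration only
race_a = {1: {"score": 0.9, "overlay": 0.02}, 2: {"score": 0.6, "overlay": 0.30}}
race_b = {1: {"score": 0.5, "overlay": 0.05}, 7: {"score": 0.8, "overlay": 0.15}}
```

In `race_a`, horse 2 has the larger overlay but is not top1, so the race is skipped: the rule only ever looks at the top-scored horse. An overlay exactly equal to x is also skipped, matching "greater than x".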

Our Q pick builds on top of that. A quinella bet requires picking the two horses that finish first and second, in either order, so instead of trying to solve the whole race from scratch, the algorithm uses a practical anchor approach. The Q model is a LightGBM ranking model trained with the LambdaRank objective, which ranks horses for pairing. We take the Golden W top1 as the anchor, since it is the horse we already believe has the best value-adjusted win profile, and then use the Q ranker to choose the two best partners among the remaining runners. The output is two quinella combinations: anchor plus partner2 and anchor plus partner3.
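The anchor-and-partners step is sketched below under simple assumptions: the Q ranker's output is a hypothetical dictionary of horse number to ranking score. In practice, scores like these could come from something like LightGBM's `LGBMRanker` with the `lambdarank` objective, trained with one query group per race; here they are made-up numbers. Ranking scores of this kind are only meaningful relative to each other within a race, which is all this step needs.

```python
def quinella_picks(anchor, q_scores):
    """Pair the Golden W top1 (anchor) with the two best-ranked partners.

    `q_scores` maps horse number -> Q ranker score (hypothetical values here).
    """
    others = [horse for horse in q_scores if horse != anchor]
    partners = sorted(others, key=q_scores.get, reverse=True)[:2]
    # Quinella order does not matter, so store each pair in a canonical sorted order.
    return [tuple(sorted((anchor, partner))) for partner in partners]


q_scores = {7: 2.1, 3: 1.8, 5: 1.2, 9: 0.4, 1: 1.5}  # hypothetical ranker output
picks = quinella_picks(anchor=7, q_scores=q_scores)  # [(3, 7), (1, 7)]
```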

So the production system is value-first. The key concept is overlay: it is the reason we bet at all, and a race only becomes a bet for us when the overlay on the W top pick clears the bar, meaning it is greater than x.

Contact Us