Bias in English Horse Racing (2023)

Adam Davis, Ashley Wu, Matthew Benstock, Jay Hou, Alcindor Leadon

Introduction & Data Cleaning

We are interested in studying the outcomes of horse races at four different English race courses: Beverley, Chester, Newmarket, and Pontefract. We seek to determine whether there is any evidence of track bias that influences the results of the race.

Track bias would likely favor horses starting in stall positions closer to the inside of a curved track. Since the distance needed to circle the inside of a curved track is shorter than that needed to circle the outside, horses racing on the inside may have an advantage. In our data set the potentially favored horses may be in either low or high numbered stalls, depending on the course.

Potential bias, however, may be mitigated by a number of factors. For example, when the first curve comes later in the race, horses have more time to jockey for the inside position, allowing the fastest horses to take the inside lane regardless of starting position. Additionally, courses with gentler curves may show only a negligible amount of bias, since the fastest horses are still able to win despite a poor starting stall.

There are times when horses drop out of races and do not finish, and thus there may be inconsistencies in data where the starting position is a number larger than the total number of horses that finish a race. As a result, the starting positions have been reranked to only incorporate horses that finished.
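The reranking step can be sketched in base R as follows. This is a minimal illustration on made-up data, not the authors' code: the column names race_id and stall are assumptions, while start and Freq follow the names used later in the analysis.

```r
# Hypothetical toy data: one race in which the horse from stall 2 did not finish.
# The column names race_id and stall are assumptions for illustration.
races <- data.frame(
  race_id = c(1, 1, 1, 1),
  stall   = c(1, 3, 4, 5),   # stall 2 is absent because that horse dropped out
  finish  = c(2, 1, 4, 3)
)

# Rerank stalls within each race so that start runs from 1 to the number of finishers.
races$start <- ave(races$stall, races$race_id, FUN = rank)

# Freq is the number of finishers in each race.
races$Freq <- ave(races$stall, races$race_id, FUN = length)

print(races)
```

After this step the largest value of start in a race can never exceed Freq, which removes the inconsistency described above.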

We eliminated races in which the data set recorded a tie or multiple horses starting in the same stall. We removed 1406 entries in total, which is about 3 percent of the original race data.

We are interested in testing whether finishing positions are unbiased with respect to starting stall position. Before testing, we also cleaned the data to address the following:

  • Races in which more horses ran than the track has stalls
  • Races with too few horses for reasonable results (fewer than 5)
  • Races with two horses in the same stall
  • Beverley races that needed re-ordering (all are pre-2011)
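Some of the filters above can be sketched in base R. The example below is only an illustration on made-up data, with assumed column names (race_id, stall, finish). It covers shared stalls, tied finishes, and small fields; it does not cover the stall-capacity check or the Beverley re-ordering, which need information not shown here.

```r
# Hypothetical toy data with three races; column names are assumptions.
races <- data.frame(
  race_id = c(rep(1, 5), rep(2, 5), rep(3, 3)),
  stall   = c(1, 2, 3, 4, 5,   1, 2, 2, 4, 5,   1, 2, 3),
  finish  = c(3, 1, 2, 5, 4,   1, 2, 3, 4, 5,   1, 2, 3)
)

# TRUE for every row of a race in which any value of v repeats.
has_dup <- function(v, g) {
  ave(v, g, FUN = function(z) as.numeric(any(duplicated(z)))) == 1
}

shared_stall <- has_dup(races$stall, races$race_id)   # two horses in one stall
tied_finish  <- has_dup(races$finish, races$race_id)  # tie in finishing position
field_size   <- ave(races$stall, races$race_id, FUN = length)

# Keep races with no shared stall, no tie, and at least 5 horses.
clean <- races[!shared_stall & !tied_finish & field_size >= 5, ]
print(unique(clean$race_id))
```

In this toy data race 2 is dropped for a shared stall and race 3 for having fewer than 5 horses.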

Methodology & Analysis

We assume that if a course is unbiased in an unseeded race, where horses are randomly placed into starting stalls, then every horse has an equal chance of finishing in first place or in any other position. For example, a horse starting in stall one is as likely to finish first as it is to finish third, fifth, or last. Our null hypothesis is therefore that finishing positions are uniformly distributed for each starting position.

Since races do not all have the same number of horses, we define two new variables, finish_p and start_p, which give the finish and start percentile for each horse. This normalizes the finish and start ranks and allows us to compare races of different sizes. Note that we subtracted .5 from each position so that the average percentile in any given race equals .5.

We define start_p and finish_p below:

x <- within(x, {
  start_p <- (start - .5)/Freq
  finish_p <- (finish - .5)/Freq
})
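As a quick check of the definition above, the following sketch computes the percentiles for a few field sizes. The field sizes are arbitrary choices for illustration.

```r
# For a race with n finishers, positions 1..n map to (i - .5) / n.
for (n in c(5, 8, 12)) {
  p <- (seq_len(n) - .5) / n
  cat(n, "runners: min", min(p), "max", max(p), "mean", mean(p), "\n")
}
```

The positions sum to n(n + 1)/2, so subtracting .5 from each and dividing by n gives percentiles whose mean is .5 for any field size, and every percentile lies strictly between 0 and 1.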

Below we plot finish_p against start_p at each of the courses:

[Figure: finish_p plotted against start_p at each course]

We expect both start_p and finish_p to be uniformly distributed between 0 and 1. A different distribution indicates bias. For example, if finish_p is skewed towards 0 for lower starting positions, then certain finish positions are more likely given the starting position, and the finish position depends on the starting position rather than being independent of it.

To get a slightly better view of the distribution of start_p and finish_p, we have produced quartile heatmaps for each of the courses. We can see that Chester and Pontefract have a high frequency of horses that both started and finished in the first (top) quartile, which is indicative of bias. For Beverley and Newmarket, there is a higher relative frequency of bottom-quartile starters finishing in the top quartile, as well as of top-quartile starters finishing in other quartiles.
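The quartile counts behind such a heatmap can be sketched as a cross-tabulation. The data below are simulated under no bias purely for illustration. The real analysis would use each course's own start_p and finish_p, and the authors' exact binning is not shown in the source.

```r
set.seed(1)
# Simulated percentiles for an unbiased course (an assumption, not the real data).
x <- data.frame(start_p = runif(2000), finish_p = runif(2000))

breaks <- c(0, .25, .5, .75, 1)
labels <- c("Q1", "Q2", "Q3", "Q4")
x$start_q  <- cut(x$start_p,  breaks = breaks, labels = labels, include.lowest = TRUE)
x$finish_q <- cut(x$finish_p, breaks = breaks, labels = labels, include.lowest = TRUE)

# Share of each starting quartile that lands in each finishing quartile (rows sum to 1).
counts <- table(start = x$start_q, finish = x$finish_q)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Under the null hypothesis every cell should be close to .25. A course with a noticeably larger share in the Q1 start and Q1 finish cell would match the pattern described for Chester and Pontefract.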

[Figure: quartile heatmaps of start_p and finish_p for each course]

Our null hypothesis, \(H_0\), thus states that for a given track, finish_p is uniformly distributed between 0 and 1 whatever the value of start_p, meaning that finish percentile is independent of starting percentile. If instead finishing percentile depends on starting percentile, we reject the null hypothesis and conclude that there is evidence of bias. However, we are primarily interested in track bias that favors horses starting in a stall closer to the inside of the course. That is, we want to know whether a better starting position leads to a better finishing position, rather than whether starting in the first stall leads to a significantly worse finish.

Since the positions have been normalized to percentiles, we treat them as continuous and use regression to estimate the mean finishing percentile for the first starting position and to test whether it deviates significantly from the expected mean of 0.5. The intercept is this mean, and when it is less than 0.5, it suggests that starting first results in a better than average finish. The regression also reports the significance of the starting percentile variable in influencing finishing percentile, which suggests whether or not this bias extends to starting percentiles beyond the first position.
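One detail worth illustrating is that the default output of summary() tests each coefficient against 0, whereas the comparison described here is against 0.5. The sketch below, on simulated unbiased data, shows one way to compute that comparison by hand. It is an illustration under stated assumptions, not the authors' procedure.

```r
set.seed(42)
# Simulated unbiased course: finish_p is independent of start_p (an assumption).
n <- 5000
sim <- data.frame(start_p = runif(n), finish_p = runif(n))

fit <- lm(finish_p ~ start_p, data = sim)
est <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# Compare the intercept with 0.5 rather than with 0.
t_int <- (est["(Intercept)", "Estimate"] - 0.5) / est["(Intercept)", "Std. Error"]
p_int <- 2 * pt(-abs(t_int), df = df.residual(fit))

print(est)
cat("Intercept vs 0.5: t =", t_int, ", p =", p_int, "\n")
```

For an unbiased course the intercept should sit near 0.5 and the start_p slope near 0. An intercept clearly below 0.5 together with a positive slope would point towards an advantage for low starting percentiles.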

Additionally, we are able to take into consideration other variables and how they affect the finishing percentile. For example, we are able to incorporate length into the regression model. We generated such a linear model for each of the courses:

# Model for Beverley
model_b <- lm(finish_p ~ start_p + len, data = Bev)
# Model for Chester
model_c <- lm(finish_p ~ start_p + len, data = Ches)
# Model for Newmarket
model_n <- lm(finish_p ~ start_p + len, data = Newm)
# Model for Pontefract
model_p <- lm(finish_p ~ start_p + len, data = Pont)

Below we present simplified results for each of the models:


[Figure: simplified results for each of the four models]

Based on the mean predicted values from each of the models, we identified evidence of bias at Chester and Pontefract but not at Newmarket or Beverley. We identified this possible bias because the average predicted finishing percentile for the potentially favored starting positions was less than .5, indicating that such horses are more likely to finish better than the average horse in the race.
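A mean predicted value of this kind could be computed roughly as follows. The data are simulated with a built-in advantage for low starting percentiles, the race lengths are invented, and treating the lowest quartile of start_p as the "favored" positions is an assumption made only for this sketch.

```r
set.seed(7)
n <- 3000
# Simulated course where low starting percentiles are favored; all numbers are made up.
course <- data.frame(
  start_p = runif(n),
  len     = sample(c(5, 7, 10, 12), n, replace = TRUE)  # hypothetical race lengths
)
# Simplification: normal noise, so simulated values are not confined to (0, 1).
course$finish_p <- 0.4 + 0.2 * course$start_p + rnorm(n, sd = 0.25)

model <- lm(finish_p ~ start_p + len, data = course)

# Average predicted finishing percentile for the assumed favored starting quartile.
favored <- course[course$start_p <= 0.25, ]
mean_favored <- mean(predict(model, newdata = favored))
print(mean_favored)
```

A value below .5 for the favored group is the pattern reported for Chester and Pontefract, while a value near .5 would match the finding for Newmarket and Beverley.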

Additional considerations

In this model, we were able to take into account the length of a course by including it as a variable, although length as it stands has shortcomings. Length is difficult to use because we would expect it to be categorical, fixed by the length of the track, yet there is variation in length that we are unable to account for. Additionally, we are unable to consider the shape of the course and its effect on track bias. This is one of several factors missing from the original data set that might be of interest to our analysis. Were data available, we would also like to study whether weather conditions are more favorable at one course than another, the type of track material, and whether a horse is a “hometown favorite”, meaning that it has extensive experience on a particular track.
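The difference between treating length as numeric and as categorical can be sketched as below. This is a hypothetical comparison on simulated data with invented length values. The source models use len as a numeric term, and the factor version is shown only as an alternative.

```r
set.seed(3)
n <- 2000
course <- data.frame(
  start_p = runif(n),
  len     = sample(c(5, 7, 10, 12), n, replace = TRUE)  # hypothetical lengths
)
course$finish_p <- runif(n)  # simulated with no bias and no length effect

numeric_fit <- lm(finish_p ~ start_p + len, data = course)          # one slope for len
factor_fit  <- lm(finish_p ~ start_p + factor(len), data = course)  # one level per length

print(length(coef(numeric_fit)))
print(length(coef(factor_fit)))

# F test of whether the categorical coding explains more than the single slope.
print(anova(numeric_fit, factor_fit))
```

The numeric coding assumes that each extra unit of length shifts finish_p by the same amount, while the factor coding estimates a separate shift for each distinct length at the cost of more coefficients.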
