Analyzing Tournament Chess Performance Factors
As part of my studies in Economics, my econometrics class required us to write a paper for our final project. This is a summary of the paper I wrote on cheating during chess.com Titled Tuesday events. If you would like a full copy, please feel free to reach out and I can send you a PDF.
Abstract
In recent years, the popularity of chess has exploded. In addition to in-person games, online play has become extremely popular. However, as games moved online, the rate of cheating increased. With no way to prove whether a player is cheating, some players have resorted to comparing players' over-the-board (OTB) ratings. The belief is that when a player with a lower OTB rating outperforms a higher-rated opponent, the gap could indicate cheating. Others have claimed that this is not a valid indicator, as other factors, such as experience playing online, could explain the unexpected win. This paper analyzes these factors and their effect on online performance. After tracking players across several online tournaments, we conclude that neither OTB ratings nor experience are meaningful factors in determining tournament performance. We observe that a player's online rating is the most significant factor in determining tournament rank.
Introduction
In recent years, the popularity of chess has increased significantly. Due to COVID lockdowns, players have migrated more towards playing online. Online chess content, such as streamed games and the Netflix series The Queen's Gambit, fueled its popularity.
The most popular platform for playing online is the website chess.com (Chesscom). By the end of 2022, their site had over 100 million users (Team (CHESScom), 2022). As of April 2023, they had 11 million active users daily (Ritchel, 2023).
As games moved online, the rate of cheating increased. Chesscom has methods in place to detect cheating and remove offending players from their platform. While Chesscom indicates that they close over 1,000 accounts each week, they do not make their methods public (“What Do I Need to Know about Fair Play on Chess.Com?”, 2024).
With the lack of transparency in methodology and an inability to prove definitively whether a player is cheating, some have resorted to comparing over-the-board (OTB) ratings. The International Chess Federation (FIDE) assigns these ratings using an Elo rating system, calculated from the results of FIDE-arbitrated games.
The belief is that OTB ratings from FIDE are more accurate than those assigned by Chesscom. When a player with a lower FIDE rating outperforms a higher-rated opponent, the result is used as evidence of cheating.
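To give a sense of how "unexpected" such a result is, the standard Elo expected-score formula can be sketched as follows. The ratings below are made up for illustration and are not data from the paper, and FIDE's actual regulations add details (such as K-factors) that are left out of this sketch.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, for illustration only
lower_rated, higher_rated = 2300, 2500
print(round(elo_expected_score(lower_rated, higher_rated), 3))
```

Under this formula, a 200-point gap leaves the lower-rated player with an expected score of about 0.24 per game, so an individual upset is far from impossible.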
Other prominent players have claimed that this is not a valid indicator. They have claimed that other factors, such as experience playing online, could account for the unexpected win.
By comparing players’ results in a controlled setting, we can use the number of games a player has played online as a proxy for experience. We can then compare the effects of rating and experience on overall performance within a tournament. This will allow us to determine whether concerns of cheating with rating disparity are valid.
Data
Chesscom hosts weekly tournaments with cash prizes. One of the most popular, Titled Tuesday, is a tournament specifically for players who have obtained a FIDE title. Using Chesscom, I was able to obtain information on player statistics and tournament results. Using players' names and countries, I was able to look up their OTB FIDE ratings on the official FIDE website.
The following table represents panel data of results from a select number of players. These players participated in six specific tournaments between February 26, 2024 and March 27, 2024.
The dependent variable (rank) is the player’s final rank after tournament completion. Two of the independent variables, the player’s Chesscom blitz rating (blitz) and the total number of blitz games the player has recorded on the Chesscom platform (totgames), are captured immediately after the tournament completes. The players’ official FIDE ratings (ofide) are posted monthly and were captured at the end of February 2024.
For the player rank, going forward I will use the natural log of the rank, ln(rank), which I will refer to as lrank. A percentage change in rank better represents the value of a change in tournament ranking. For example, moving from second place to first is worth more than moving from 200th to 199th.
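The effect of the log transformation can be checked with a quick calculation. This is only an illustrative sketch; the helper name `log_rank_gain` is introduced here and is not from the paper.

```python
import math


def log_rank_gain(old_rank: int, new_rank: int) -> float:
    """Improvement in ln(rank) when moving from old_rank to new_rank."""
    return math.log(old_rank) - math.log(new_rank)


top = log_rank_gain(2, 1)      # second place -> first place
mid = log_rank_gain(200, 199)  # 200th place -> 199th place
print(f"{top:.3f} vs {mid:.3f}")
```

Both moves are one place in raw rank, but in log terms the move into first place is more than a hundred times larger (about 0.693 versus 0.005).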
The following graphs provide an initial summary of the relationships between our dependent and independent variables.
With tournament rankings, lower numbers are better: first place is a better result than second. Since higher values of our independent variables indicate higher skill, we anticipate negative relationships between our dependent and independent variables. At first glance, blitz rating is the only variable that clearly displays this negative relationship.
Methodology
My first model was a linear regression of the dependent variable, lrank, on the independent variables blitz, totgames, and ofide.
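Written out, a specification consistent with this description would look like the following. The coefficient symbols and the error term u are notation assumed here; the variable names are those defined in the Data section.

```latex
\text{lrank} = \alpha_0 + \alpha_1\,\text{blitz} + \alpha_2\,\text{totgames} + \alpha_3\,\text{ofide} + u
```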
The initial model showed a statistically significant coefficient for blitz at the 1% level. Total games (totgames) showed statistical significance at the 10% level.
Based on the strength of the relationship between blitz and the dependent variable, I included the polynomial terms blitz2 and blitz3, which are the blitz term squared and cubed respectively. All three of these variables were statistically significant, with a chi-squared value of 10.24, Prob>chi2 = .0059.
I also included the original totgames and ofide variables. It is important to note that ofide is susceptible to measurement error. As players play less frequently in person, their ratings might not be accurate reflections of their true chess ability. This attenuation bias would lower the measured effect of ofide on lrank. However, our analysis is to determine the validity of comparing these ratings as they are reported. Therefore, we can include the ofide variable, and accept any attenuation bias.
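The direction of this bias can be illustrated with a small simulation. This is a sketch on synthetic data under classical measurement-error assumptions (noise independent of the true value); all numbers are invented and unrelated to the paper's data set.

```python
import random


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit: cov(x, y) / var(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


random.seed(0)
true_beta = -1.0  # hypothetical true effect of ability on the outcome
n = 20_000
skill = [random.gauss(0, 1) for _ in range(n)]                 # true, unobserved ability
outcome = [true_beta * s + random.gauss(0, 1) for s in skill]  # outcome driven by true ability
reported = [s + random.gauss(0, 1) for s in skill]             # noisy rating, error variance 1

slope_true = ols_slope(skill, outcome)
slope_noisy = ols_slope(reported, outcome)
# Classical result: the noisy slope shrinks toward zero by
# var(skill) / (var(skill) + var(noise)), which is 0.5 in this setup.
print(round(slope_true, 2), round(slope_noisy, 2))
```

In this setup the slope estimated from the noisy rating comes out at roughly half the true effect, which is the sense in which measurement error in ofide would understate its relationship with lrank.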
Results
The interpretation of the inclusion of these terms can be described by taking the first derivative of our linear model with respect to blitz. Our result would be the following.
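In general form, using coefficient symbols assumed here rather than the estimated values, the polynomial specification and its derivative with respect to blitz can be written as:

```latex
\begin{aligned}
\text{lrank} &= \beta_0 + \beta_1\,\text{blitz} + \beta_2\,\text{blitz}^2 + \beta_3\,\text{blitz}^3 + \beta_4\,\text{totgames} + \beta_5\,\text{ofide} + u \\
\frac{\partial\,\text{lrank}}{\partial\,\text{blitz}} &= \beta_1 + 2\beta_2\,\text{blitz} + 3\beta_3\,\text{blitz}^2
\end{aligned}
```

Because the derivative depends on blitz, any single number for the effect corresponds to the particular rating at which it is evaluated.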
This would indicate that a 1-unit increase in blitz rating changes lrank by -0.12399370705, which corresponds to a roughly 12% decrease (improvement) in rank.
As in the basic linear regression model, totgames is statistically significant. However, the relationship is weak and positive. Increasing the number of games played by 1 increases lrank by .000013, or about 0.0013% in rank.
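Since the dependent variable is in logs, these coefficients are read as approximate percentage changes in rank. A minimal sketch of the exact conversion, using the two values reported above (the helper name is introduced here for illustration):

```python
import math


def pct_change_from_log(delta_log: float) -> float:
    """Exact percentage change in rank implied by a change in ln(rank)."""
    return (math.exp(delta_log) - 1.0) * 100.0


print(round(pct_change_from_log(-0.12399370705), 2))  # blitz effect on rank, in percent
print(round(pct_change_from_log(0.000013), 4))        # totgames effect on rank, in percent
```

For small changes the exact figure and the coefficient nearly coincide; for the blitz effect the exact conversion gives about an 11.7% decrease, close to the 12% approximation.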
Players’ OTB rating, ofide, was again statistically insignificant and only weakly related to lrank in this model.
It is worth noting I tested a third model which included an interaction term between totgames and blitz. This was an attempt to assess whether or not the effect of totgames on lrank changed depending on the rating of a player. However, not only was the interaction term not statistically significant, a Ramsey RESET test provided enough evidence to reject the null that the model was correctly specified.
Conclusion
While our second model provided a better fit of the data, the interpretation of the results is similar to our first. We can conclude that OTB ratings and experience playing online have little effect on tournament performance. It is also important to note that in both models, the OTB rating of a player does not have a statistically significant relationship with their ranking in a tournament. From this, we conclude that differences in players’ OTB ratings, or in their experience playing online, cannot be used as indicators of whether or not a player cheated during a given tournament.
Even though the model presented shows a statistically significant relationship between a player’s rating and their tournament ranking, the low R² indicates that there is still unobserved variation being captured by the error term. Further research could include data on individual games within the tournaments, such as winning streaks or the size of the difference between players’ ratings. This data could help us estimate more accurate coefficients.