
All Activity


  1. Past hour
  2. I made a MAD about figure skaters, idk if I can post it here if it's not 100% Yuzu-related
  3. As we've got two recycled programs already I'd really like something new... Unless it's Masquerade, but I don't think it will be.
  4. And Worlds, don't forget Worlds. Right now they only list La Presse (which is a newspaper) as a media sponsor, but surely they'll have someone broadcasting (right, ISU? Right?). Possibly CTV/TSN, who I think have the rights for SCI too. Or maybe the French-language sports station; I can't remember the name at the moment. CBC seems like it's not going to fund figure skating coverage this year, but we'll see.
  5. Selling one Smart pack, section 113, row 6, original price 519€. Contact me if interested!!
  6. Yeah, Skate Canada is around the corner, EX gala included. I'm really curious what program Yuzu has picked for this season... (as much as I love Haru Yo, Koi, I really hope for something new...) Many people say that Crystal Memories is the most likely choice. Would you agree? I created this poll + prediction thread so that the General Chat doesn't get too flooded (I hope that's okay?)
  7. Well, in less than two weeks Yuzu will probably reveal his brand-new EX program at Skate Canada. What do you think it will be?
  8. I emailed Prospera Place and got a response regarding food and water -"Outside food and beverage are not permitted in our building. You may bring in a small bag with you to have at your seat as long as it doesn’t obstruct other patrons. Please keep in mind that bags may be subject to search on your way in the venue." I specifically asked about water bottles and, as far as I'm concerned, the fact that they didn't say I couldn't bring in a bottle translates to being able to, at least, bring in an empty bottle and fill it up once inside. And if they want to confiscate a few protein bars, I'll be okay.
  9. Today
  10. Hey guys, we made a new thread for the ISU Announcement and all the posts have been moved here. Later, before each event, we'll get more info about which countries will be geoblocked from streaming etc., so we can keep the discussion in one place!
  11. Problem solved. ISU will be streaming everything (GPs, Euros, 4CC, Worlds) on YouTube, geoblocking on what I guess is a per-event basis, based on broadcasting rights. Since Europe (except possibly those couple of Nordic countries) apparently doesn't have any broadcasting rights... we should be good to watch on YouTube without any VPNing and stuff. In theory, anyway. edit: it's bad form not to give a source. https://isu.org/isu-news/news/145-news/12726-isu-events-live-on-youtube?templateParam=15
  12. Well, I only looked at one kind of judging bias, so I can only confidently say whether a judge is "good" or "bad" in that particular area. It's possible that a judge is biased in one way but not in another (for instance, in one of the competition threads, someone mentioned that Adrienn Schadenbauer seemed to stick more closely to the guidelines than the typical judge, though we can see that they are a pretty biased judge in terms of national bias). Although I do think being biased in one way should raise questions about a judge's commitment to objectivity in general. Unfortunately, it's a lot harder to come up with ways to test other kinds of bias--reputational bias, for instance, seems like it would require someone to make judgments about how a skater ought to be scored and compare the judge's actual score to that. I don't necessarily think there's anything wrong with that, but it will naturally be easier to attack than a subjective-judgment-free method like the one I used here.
  13. This is mind-blowing. Thank you so much, @shanshani for your amazing work! I agree that there are countless different types of "bias" (personal, national, cultural, reputational, corruptional, ...) and some of them are really tricky to verify. I ask myself: if the majority of judges used 'unofficial guidelines' (starting order, world ranking position or reputation, number of senior seasons/titles/records, flag, etc.) and there was one judge who 100% stuck to the official written guidelines... could that be pointed out as well? Sure, extremely biased judges must be banned, but at the same time they must be replaced by proper alternatives as well. Maybe it would help if we pointed out the work of the least biased judges as positive examples that others should follow. What do you think?
  14. Hi!!!! According to the prohibited items list on their website (http://prosperaplace.com/a-to-z-guide/), it looks like we can't bring in even a bottle of water. Do they check everyone's little backpacks? I intend to bring in a water bottle and snacks... but I am going to email them about the water thing. They say no bottles of any kind, which is crazy. I, for one, like to stay hydrated. And how would they deal with someone who is hypoglycemic or diabetic (neither of which I am...)? So I will ask them. I'm thinking that most of their restrictions are there for hockey games. We behave in a far more civilized manner.
  15. The H&L rice field survived the typhoon. May the people who were affected stay safe and have a fast recovery.
  16. I'd find it absolutely incredible if the UK is geoblocked, as finding figure skating to watch here is like looking for a needle in a haystack. Literally!!!!
  17. Delicious-looking Yuzu-inspired dishes and drinks... ETA: Why am I not surprised that the specials available at ToshI's exhibition are all Yuzu-related?
  18. It's probably a sign of mild insanity that I did all of this, but I'M FINALLY (sort of) DONE (with part one). HERE IS THE JUDGE CALL OUT POST YOU'VE ALL BEEN WAITING FOR. It has statistics. A lot of statistics. And names. Anyway, I hope you guys read it and let me know if any part is unclear or requires further explanation, so I can clean up the writing/formatting/whatever before I try to get people outside this website to read it too. I guess it's only mildly Yuzu-related, though--forgive me for hijacking this thread. edit: as someone recommended to me, I should probably host it as an article on a separate website as well. Does anyone have any recommendations?
  19. Ehh, screw it, let's publish: comprehensive review of judging bias here. Would welcome comments before I try to disseminate this more widely.
  20. Part 3: Judges against Rival Skaters. Forthcoming--check back after the Grand Prix.
  21. Limitations/Other considerations

Despite the fact that so many judges show evidence of nationalistic bias by the methods used here, I actually think these methods are somewhat limited in their ability to catch nationalistic bias. (Which ought to indicate how bad the problem is.)

First, a judge can quite easily avoid detection by only being biased "when it counts", i.e. in only a selected number of competitions, when there are medals or spots at stake and a tweak in judging can make the difference. Because this bias will be washed out in the average with all of the other competitions where the judge was not being biased, this type of bias, if detectable at all using these methods, will only be so after a judge has built up an extremely substantial judging record.

Second, a related type of nationalistic bias this metric is not good at catching is when judges selectively underscore only direct competitors but score everyone else normally. Because all non-home-country skaters are averaged together, this type of bias has little impact on the overall ZDifference, and consequently it is very difficult to detect using the method here. I hope to address this in a future segment, which will examine whether judges from federations with top competitors underscore the direct competitors of their skaters. Stay tuned for that.

Third, if there is bloc judging going on, or any other score-trading or collusion scheme to increase a skater's score, it will weaken the evidence for biased judging by reducing a biased judge's apparent difference with the other judges on the panel when scoring their own skaters. On the flip side, if there is a score-trading or collusion scheme to lower a competitor's score, that may wrongly introduce apparent bias on the part of the judge from that skater's home country. (Nonetheless, I don't think this creates a major concern about wrongly flagging judges, because unless there is some grand conspiracy against all of the skaters from a certain federation, one instance of apparently biased judging will be washed out when averaged with the rest of the scores that judge has given.) However, we can use this same dataset to take at least a partial look at bloc judging, and we will do so in a forthcoming post, so stay tuned for that too.

Finally, it is also possible for judges to "game the system" by overscoring non-direct-competitor skaters, thereby inflating the Z-other portion of the calculation. I don't think this is a major concern now, but if this were somehow adopted as the primary means of tracking judges' bias, it would become a concern in the future. (Although judges would chiefly do this by overscoring lower-ranking competitors, which might actually be a good thing, since it would combat reputation bias.)

Conclusions

There is, as always, more to say than I have said, but as I don't want to spend literally forever on this or produce something so long no one wants to read it, I will end here and leave it to others to raise questions if there is a gap you would like me to fill. The overall conclusion is pretty clear: figure skating judging has a massive problem with nationalistic bias, and many judges are extremely blatant in their favoritism for their own countries' skaters.

This also raises questions about judges' commitment to objectivity in general. Though we have only looked at a very specific type of bias, and arguably not even the most significant one (just the easiest one to tackle using statistical methods), one might suspect that a lack of objectivity in one respect bleeds into a lack of objectivity in others. What about other forms of bias that are also often discussed, like reputation bias and big-fed bias? If judges are so demonstrably biased in one way, it seems reasonable to suspect that they are also biased in others.
  22. Discussion

First, the most obvious and basic question: what does bias mean? Here, I've been using bias to mean a demonstrable, mathematical difference between how a judge scores their own skaters and other skaters. By claiming a judge is "biased," I don't mean to impute anything in particular about their psychology, nor am I making any claim about the origin of the bias. It may be conscious, or it may be unconscious. It may be a deliberate attempt to manipulate the results, or simply the same kind of lack of objectivity fans often display concerning the skating of their favorite skaters. Personally, I am not overly concerned about the causes of the bias, only that it exists--after all, let us remember that these judges, through their scoring, determine young athletes' futures. I don't know about you, but I don't want the future of these young, incredibly hard-working people to be determined by a group of people who are unable to be objective, whether that lack of objectivity is the result of corrupt intent or simply clouded judgment. However, let me address some specific explanations for bias which I do not believe are true or which at least have problems, as well as other attempts to defend the judges from criticism.

1. The bias is just due to cultural preferences. People tend to look more favorably upon programs they are culturally familiar with and score them higher, and obviously a judge and a skater from the same country are more able to culturally understand each other.

First, skaters from the same country often have very different skating styles and skate very different types of programs, so it strains credulity to believe that there is something quintessentially "Russian" or "Canadian" or whatever about all skaters who skate under the same flag. Sasha Trusova's programs look completely different from Alina Zagitova's, which are completely different from Alena Kostornaia's, and they're even coached and choreographed by the same people! Second, if this were true, we would expect to see judges from culturally similar countries scoring each other's skaters higher. For example, Canada and the US are two extremely culturally similar countries, so Canadian judges should overscore US skaters and vice versa. Fortunately, the sheet is built such that it's easy to test this proposition (just change the country code inside each judge's individual page to see how they score specific other countries' skaters), and in fact we do not see this. Canadian judges score US skaters like a completely unbiased judge would, and so, too, the other way around [FORTHCOMING]. There are a few exceptions to this (former USSR countries' judges tend to score Russian skaters higher, although the level of bias is not quite as severe as it is for their own skaters. Also, I believe South Korean judges score North Korean skaters higher, but whether those two countries have similar cultures seems quite debatable. We will look at this in more detail in a future post), but I believe there is a better explanation for the exceptions. In general, culturally similar countries do *not* score each other's skaters higher.

2. It's just human nature to be biased. We should realize that judges are humans too and not robots.

Judges are not all the same. They do not all show evidence of bias. For instance, Glenn Fortin (CAN), Katharina Heusinger (GER), Andreas Waldeck (GER), Ayumi Kozuka (JPN), Shizuko Ugaki (JPN), and Linda Leaver (USA) all have reasonably substantial judging records that do not evince any substantial evidence of bias. This clearly indicates that it is possible for judges to be unbiased, at least when it comes to nationality-related bias. Evidently, not all humans have this particular human-nature-related flaw. Even among the judges who are biased, there are considerable variations in the degree of bias shown. The worst offenders, for example Salome Chigogidze (GEO), Nicholas Russell (GBR), and Elena Fomina (RUS), have ZDifferences in the range of 1.5-2, whereas the lowest statistically significant differences are in the range of 0.5. This shows that it is certainly possible to reduce the overall level of bias by getting rid of the worst offenders and replacing them with less biased judges, even if some low level of bias (say, less than 0.5) is difficult to get rid of and may not be practically significant enough to be worth dealing with.

3. Your metric looks at judges' bias by comparing them to the mean of the judging panel. Doesn't that assume that the mean of the judging panel is right? But sometimes it's the outlier judge that is right, and the other judges that are wrong.

It may be true that the outlier judge is indeed "right"--I avoided making any assessment of what a skater "should objectively" have scored, because those types of assessments lead to unproductive fan wars and involve a level of personal judgment that I did not want to introduce to this study. However, let me note that outlier scores only "count against" judges if they align with expected patterns of nationalistic bias. If a Japanese judge scores a Filipino skater way above average because only that judge was being objective and the other judges all underscored him due to some other form of bias (reputation, small country, etc.), then that will actually count very mildly in favor of the Japanese judge. Only if a Japanese judge scores a Japanese skater way above average does it count "against" that judge. But in that case, there are still at least 3 other data points to consider (I only start calculating p if a judge has scored her own skaters at least 4 times), and if the judge only shows a pattern of "correcting" the scores for her own skaters, one has to begin wondering whether they are truly being objective. Again, a judge is not labeled "biased" for having scores that deviate from the mean; a judge is labeled "biased" if there is a difference between how they score their own skaters and other skaters. A judge who scores both groups 2 standard deviations above the mean would not trigger the flagging formula (sketched below), despite having scores that are way out of whack with the other judges. It is only the difference between the judge's scores for her own skaters and her scores for other skaters that matters. The only situation in which I think this may be a substantial concern is for judges from a small federation who have only judged one or two unique home-country skaters. In that case, it's possible that a personal (rather than nationalistic) preference, or a genuine belief that a particular skater is underscored (but less plausibly a large group of skaters, as that would affect Z-other as well and thereby decrease ZDifference), just so happens to coincide with a national flag and gets "wrongly" flagged as nationalistic bias. This being the case, we may wish to be a little more lenient on small-fed judges. However, this defense hardly applies to the judges of large and powerful federations like Russia and the United States, who will judge many different skaters from their own country over their judging career, and whose skaters cannot credibly claim to be underscored because of their nationality.
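To make the flagging rule concrete, here is a minimal Python sketch of it as stated in these posts (at least 4 home-country scores before p is calculated, and p < 0.05 to flag). The function and parameter names are mine for illustration; the actual calculation lives in the spreadsheet formulas.

```python
# Illustrative sketch only -- the real logic lives in spreadsheet formulas.
def is_flagged(home_z_scores, p_value, min_home_scores=4, alpha=0.05):
    """Flag a judge's record as showing evidence of national bias."""
    # Require at least 4 home-country scores before even considering p,
    # and use the conventional p < 0.05 significance threshold.
    return len(home_z_scores) >= min_home_scores and p_value < alpha
```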
  23. Methodology

The basic idea behind determining judge bias was to take each judge, compare their scores for each skater to the scores the rest of the panel gave, quantify how far off the panel judge was, and then put all the numbers together in one place so I could compare how a given judge scored their own skaters relative to the rest of the panel, versus how they scored other skaters. I then ran a standard statistical test on the difference in order to figure out the strength of the evidence of bias that the judging record provides.

In order to do all this, I started by recording all the scores given by all the judges in every senior-level international competition (excluding competitions like Japan Open and Shanghai Trophy) since the scoring system changed in 2018. This gave me a bunch of spreadsheets, which you can find here. (Fortunately, I had some help at this stage--thanks, people at planethanyu!) If you open these spreadsheets, you'll notice they look something like this (skater: Yuzuru Hanyu, JPN):

Judge              Country   Score
Odhran Allen       IRL       164.01
Doug Williams      USA       157.34
Maria Fortescue    ISL       172.44
Veronique Verrue   FRA       161.55
Andreas Waldeck    GER       166.70
Lorna Schroder     CAN       167.40
Miwako Ando        JPN       170.74
Panel mean                   165.74

(This example from the 2018 ACI men's free skate is completely random, obviously.) As you can see, it lists judges, judge nationality codes, skaters, the skater's nationality code, and the scores given by each judge, as well as the average of all the judges' scores.

From this data, I first determined how much higher or lower each judge scored each skater compared to the rest of the panel, by subtracting the mean of all the scores from the individual judge's score. This produced what I called the score deviation. Using our friend Yuzu as the example here, this produces:

Judge              Country   Deviation
Odhran Allen       IRL       -1.73
Doug Williams      USA       -8.40
Maria Fortescue    ISL        6.70
Veronique Verrue   FRA       -4.19
Andreas Waldeck    GER        0.96
Lorna Schroder     CAN        1.66
Miwako Ando        JPN        5.00

So Odhran Allen scored Yuzu 1.73 points below the panel average, Doug Williams 8.4 points below, etc. This is what's shown in the second block on the competition sheets.

Now, unfortunately, we can't just leave it at that, because we want to compare how judges score skaters across competitions, segments, and disciplines, in order to get the largest, most robust data sets. A deviation of -8.4, while already a lot even in the men's free, would be absolutely massive in the short, and also bigger in ladies or pairs than in men's. Therefore, in order to make these score deviations more comparable, I had to standardize them into z-scores. Note that this is an extremely common method of standardizing data. Here's how it's calculated: first, I have to determine the standard deviation of the judges' scores. Standard deviation is another one of those common statistical measures, and it quantifies how spread out a set of numbers is from their average. So if judges are all over the place on someone's scores, the standard deviation will be relatively high, whereas if judges are all more or less in agreement, the standard deviation will be low. In this case, the standard deviation of Yuzu's scores was a fairly typical (in men's) 4.85. To calculate the z-scores, we just divide each of our score deviations by the standard deviation of Yuzu's scores. One way to think about the z-score, then, is that it tells you how many standard deviations a judge scored a skater above or below the panel average.
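For those who prefer code to prose, here is a minimal Python sketch of the deviation-to-z-score conversion, using the example panel above. One assumption to flag: the quoted 4.85 matches the population standard deviation (ddof=0), so that is what the sketch uses.

```python
import numpy as np

# Scores the seven judges gave Yuzu in the example above
scores = np.array([164.01, 157.34, 172.44, 161.55, 166.70, 167.40, 170.74])

mean = scores.mean()            # 165.74 -- the panel average
deviations = scores - mean      # score deviations: -1.73, -8.40, 6.70, ...
sd = scores.std(ddof=0)         # ~4.85 -- population SD (assumed; matches the quoted value)
z_scores = deviations / sd      # z-scores: -0.36, -1.73, 1.38, ...

print(np.round(deviations, 2))  # [-1.73 -8.4   6.7  -4.19  0.96  1.66  5.  ]
print(np.round(z_scores, 2))    # [-0.36 -1.73  1.38 -0.86  0.2   0.34  1.03]
```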
Here is what happens to Yuzu's score deviations once they're converted into z-scores:

Judge              Country   Z-score
Odhran Allen       IRL       -0.36
Doug Williams      USA       -1.73
Maria Fortescue    ISL        1.38
Veronique Verrue   FRA       -0.86
Andreas Waldeck    GER        0.20
Lorna Schroder     CAN        0.34
Miwako Ando        JPN        1.03

Z-scores typically range from -2 to 2, though occasionally you'll see numbers outside that range if a judge *really* disagrees with the other judges (this occurs at roughly a 5% frequency). Underscoring (i.e. scoring below the other judges) turns into a negative z-score, while overscoring (i.e. scoring above the other judges) turns into a positive z-score. Using z-scores actually builds in a measure of leniency for the judges. If there's a lot of disagreement within the panel about a skate, the z-score will be less extreme than the raw score difference, so a big difference from the average will "count" less. On the other hand, it does mean that if there's a lot of agreement among panelists, a lone outlier may have a more extreme z-score than the raw score difference would suggest. But overall, the z-score makes it a bit easier for biased judges to hide. Oh well, it's not like they hide very well in the first place.

Once these z-scores are computed for each judge, competition, and segment, all of the z-scores associated with a given judge are collected together into one sheet, the individual judge's sheet. You can find these in the big judges database. If you click on any judge's name in the sheet labeled "Judges", you'll be taken to the individual sheet for that judge, where you can see the collected z-scores for all the competitions they've judged. On the left you'll see a bunch of summary statistics; I'll explain those in a second. On the right you'll see z-scores. As you can see, they are labeled by skater and nationality, and at the top there's a code which tells you which competition and segment is being shown in a given set of columns. This is composed of [Year][Competition Code] [Segment Code].

Using a formula, all of this data is split into two groups: z-scores for home-country skaters and z-scores for other skaters. The z-scores for home-country skaters are averaged, producing Z-home on the left. The same is done for the z-scores for other skaters, producing Z-other. The difference between the two, which is what we're ultimately interested in, is then calculated as ZDifference. (You'll also see these metrics in the overall judge summary.) You can think of the ZDifference as representing the degree of bias that a judge has shown toward home-country skaters. As a rule of thumb in terms of raw score, a ZDifference of 1 represents about 7-8 points in men's, 6ish in ladies and pairs, and 6-7 in ice dance over the course of a competition. In other words, a judge who has a ZDifference of 1 will give, on average, 7-8 bonus points in men's, etc., versus what they would typically give a skater who is not from their home country.

Of course, the degree of bias shown is not the only thing that matters when assessing a judge's level of bias. If a judge shows a ZDifference of 1 but has only judged their home-country skaters a couple of times, it's possible that that ZDifference is simply due to random chance or other factors. On the other hand, if the ZDifference persists across many competitions, we can be much more confident that the judge is biased. This is where the metric p comes in.
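And here is a small sketch of how Z-home, Z-other, and ZDifference could be computed from one judge's collected z-scores. The sheet does this with formulas; the function and data layout here are mine for illustration.

```python
def z_difference(judge_country, scored):
    """scored: list of (skater_country, z_score) pairs for one judge."""
    home  = [z for country, z in scored if country == judge_country]
    other = [z for country, z in scored if country != judge_country]
    if not home or not other:
        return None                    # not enough data to compare the groups
    z_home  = sum(home) / len(home)    # average z for home-country skaters
    z_other = sum(other) / len(other)  # average z for everyone else
    return z_home - z_other            # ZDifference: the bias measure

# e.g. a hypothetical Japanese judge:
# z_difference("JPN", [("JPN", 1.03), ("JPN", 0.80), ("USA", -0.10), ("RUS", 0.05)])
```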
p is another standard statistical measure, and in the context of our data it represents the chance, out of 1, that an unbiased judge--i.e. one that scores home-country skaters no differently than other skaters--could arrive at a record showing equal or greater bias than the actual judge's record purely by accident. In other words, the lower p is, the stronger the evidence of some kind of systematic difference between how a judge scores home-country skaters and other countries' skaters. By convention, a p value below 0.05 is considered statistically significant, and that is the standard I will be using to flag judges, though in many cases we'll see that p falls far below that threshold. For instance, in the case of Russian judge Olga Kozhemyakina, p = 0.000000000000003 (note that that's a 0.0000000000003% chance that an unbiased judge would produce a record equal to or worse than hers). It's better not to see statistical significance as a binary thing, however. Instead, you should become more and more suspicious of a judge as p drops. Notice that many judges whose records were not flagged nonetheless have fairly low p values--I suspect that many of these judges' records will start getting flagged as more scores come in. By considering ZDifference and p jointly, we can make a full assessment of a judge's record: ZDifference tells you the severity of the historical bias, whereas p tells you the probability it came about by chance.

To calculate p, I used a standard statistical test for the difference between two means, Welch's t-test. I used the one-tailed version of the test, because we're only looking for bias in one direction. Welch's rather than Student's was used because I noticed that judges with extensive judging records tended to have different variances for home-country skater scores versus other scores. (If you didn't understand this paragraph, that's okay--unfortunately, it requires a lot more effort to explain in depth how calculating p works, so I will have to pass on doing that. If you would like to learn, I would recommend taking an introductory statistics class.)
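For the statistically inclined, here is roughly what that p calculation looks like in Python. scipy's ttest_ind supports Welch's test (equal_var=False) and a one-tailed alternative (scipy 1.6+). The actual calculation was done in the spreadsheets, so treat this as an equivalent sketch rather than the exact implementation.

```python
from scipy.stats import ttest_ind

def bias_p_value(home_z_scores, other_z_scores):
    # Welch's t-test (unequal variances), one-tailed: tests whether the
    # judge's home-country z-scores are systematically HIGHER than the
    # z-scores they give everyone else.
    result = ttest_ind(home_z_scores, other_z_scores,
                       equal_var=False, alternative="greater")
    return result.pvalue
```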