Beating Bulgaria

“As Darren Bent sent a left-foot sitter sailing high over the bar and into the stands, the sighs of despair could be heard all round Wembley.”

“You wouldn’t put your house on this lot getting through.”

“England stayed on the field to wave to the fans but most had already left. There was nothing to celebrate.”

This was The Sun's take on England's 2-2 draw with Switzerland during the Euro 2012 qualifiers. As it happens, Capello's 'lot' did get through, and so if you'd staked your house on them you would have been celebrating all the way to Poland and Ukraine.

Compare these comments to those after the victory against Bulgaria nine months earlier:

“Good news for Capello who, pre-match, had observed how he was a God before the World Cup and a monster after it. He may never get back to being a God. But there’s no doubt this was a monster result.”

or to the comments following the away win three months later, again against the hapless Bulgarians:

“It was a cracking display from England against a dreadfully poor home team.”

The England players were roughly the same, the time period not particularly long, the manager the same, and yet you might think you were reading about different teams. What explanations could there be for this disparity in perceived performance?

A matter of perception

One clue may lie in the observation that the Bulgarians were 'dreadfully poor'. In fact they are currently ranked 50th in the world, while the Swiss are ranked 12th (England are currently 6th).1

When judging performance, we are more likely to favour performance against an easier challenge than against a more testing one. At least this was the conclusion that Good and Cresswell came to when they studied how examiners rated the work of students facing different levels of difficulty in question papers.2

As part of research undertaken when the GCSE was being designed, they studied what happened when candidates were given papers of different levels of difficulty. They assigned candidates to take easy and hard question papers, and then asked examiners to judge the standard of work they observed. Their conclusion was the following:

“The awarders tended to consider fewer candidates to be worthy of any given grade on harder papers or, alternatively, that more candidates reached the required standards on easier papers.” (Good & Cresswell, 1988, p. vii)

This finding has come to be known as the Good and Cresswell effect. If judgement alone were used to monitor the standards of examinations, then whenever a harder question paper was set, performance would appear to have declined, and whenever an easier paper was set, performance would appear to have improved. As papers varied in difficulty over time, perceptions of standards would see-saw up and down.

Footballers: Mice or men?

So does the Good and Cresswell effect hold for England's performance at football? Or are they genuinely mice one day and lions the next? The Daily Telegraph's Jeremy Wilson rates the England players after every match they play. If you average the player ratings you can derive an estimate of the overall team's rating according to his professional judgement. You can then perform a test of association to see if his rating of the team is related to the standard of the opposition as defined by its Fifa World Ranking, a statistically derived measure of ability.

The findings for the Euro 2012 qualifying campaign are summarised in the table below. The Pearson correlation of 0.87 suggests a strong relationship between the rating given by Jeremy Wilson to the England players and the quality of the opposition they faced (note that a higher Fifa ranking number denotes a weaker team, so better ratings coincide with weaker opponents). The tougher the challenge, the harsher Mr Wilson's verdict. Clearly we cannot generalise too far from this limited amount of data, but it is a promising start.

Telegraph ratings and Fifa World Rankings, Euro 2012

Opposition    Date         Telegraph rating   Ranking (as of 6 Nov 2012)
Switzerland   07/09/2010   5.50               15
Switzerland   04/06/2011   5.50               15
Montenegro    12/10/2010   6.10               44
Montenegro    07/10/2011   6.10               44
Wales         06/09/2011   6.40               57
Wales         26/03/2011   6.50               57
Bulgaria      03/09/2010   6.91               55
Bulgaria      02/09/2011   7.20               55
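For readers who want to check the figure for themselves, the Pearson correlation quoted above can be reproduced from the table with a few lines of Python. This is just a sketch of the standard product-moment calculation, not part of the original analysis:

```python
import math

# (Telegraph rating, Fifa ranking) pairs taken from the table above
matches = [
    (5.50, 15), (5.50, 15),  # Switzerland
    (6.10, 44), (6.10, 44),  # Montenegro
    (6.40, 57), (6.50, 57),  # Wales
    (6.91, 55), (7.20, 55),  # Bulgaria
]

def pearson(pairs):
    """Pearson product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n  # mean rating
    my = sum(y for _, y in pairs) / n  # mean ranking
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)

print(round(pearson(matches), 2))  # 0.87
```

The positive sign of the correlation reflects the direction of the Fifa scale: a larger ranking number means a weaker team, so higher player ratings go hand in hand with weaker opposition.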

So what?

What can we learn from the Good & Cresswell effect? Its most immediate application, in the late 1980s when it was identified, was the introduction of greater statistical support for the judgemental decisions made in maintaining examination standards. Currently, there is a trend towards asking more demanding questions of candidates in national examinations. The hope is that this will lead to higher standards; the Good & Cresswell effect, however, if left unchecked, may instead create a perception that candidate performance has fallen. This potential for biased perception would need to be accounted for in the standard-setting process if candidates are not to be failed unfairly.

Finally, since I have only examined a partial data set here, I will end with a word of caution. Next time you read a review of an England performance, check the Fifa world ranking of the opposition. Beating Bulgaria doesn't necessarily mean we'll win the next World Cup.

Chris Wheadon


  1. FIFA/Coca-Cola World Ranking, accessed 6 November 2012 and 10 January 2013.
  2. Good, F.J., and Cresswell, M.J. (1988). Grading the GCSE. London: Secondary Examinations Council.
