Using Data from Leagues Around Europe to Improve xG for the SPFL

Expected goals are having a moment right now. The stat most associated with advanced stats in football has recently gotten think-pieces from outlets like The Guardian, The Telegraph, and Fan-Sided. Football stats writer Mike Goodman said today on Twitter, “I’m so happy that xG is getting increased exposure that I’m gonna grit my teeth and grin through all the re-litigation of its underpinnings.” More people than ever before are now at least aware of xG and have an idea of what the stat measures. With each new “mainstream media” (shudders) piece on expected goals, we start to have the same arguments and discussions about the metric that were had years ago.

While the fight over xG’s importance may be starting all over again in places, if you dive deeper into so-called “analytics twitter” you will find great pieces discussing efforts to improve the metric. The articles by Marek and Nils I linked there are the type of work that made me interested in stats in football in the first place. Furthermore, those pieces are what drove me to want to track these stats for Scottish football when I could not find any publicly available.

With that being said, Christian Wulff and I have been working on improving expected goals in Scotland. The wonderful people at Stratagem have given us more data than we could have imagined to help accomplish this. Before Stratagem, I was reliant on pulling shot “location” information from the BBC live-tracker of SPFL matches, simply because there was no other public data available. The “Beeb” xG model served its purpose well, giving us a surprisingly decent look at expected goals in Scotland when none existed before. However, we can now do better.

Moussa Dembele; xG Monster

Seeing honest-to-goodness x and y coordinates for shots (AND passing locations?!) in the data Stratagem sent Christian and me was a come-to-God moment. No longer would we be reliant on terms like “center of the box” from the BBC. In addition to actual x and y coordinates and shot types, we also now had information such as how many defenders were between the shooter and the goal and how much defensive pressure the shooter was under when shooting.

With this additional info, Christian and I set out to improve our xG model for the SPFL. A common criticism of xG is that it does not take defenders and defensive pressure into account, so this new Stratagem data would allow us to address this. Good in theory, right? Well, believe it or not, you run into a sample size issue when you become more granular and only have a season’s worth of data. Trying to get enough shots to come up with a decent xG model for Scotland, for cases such as 2 defenders between the shooter and the goal on a shot from x = 37, y = 44 on the pitch, proved to be a challenge. Luckily, the SPFL is not the only league Stratagem has data for.


While it seems to be a recent trend to try and assess how your Gran would do in the SPFL, most would agree that the level of play in Scotland is below leagues such as the EPL, Bundesliga, and La Liga. No shame in that, it is just reality. However, there are plenty of leagues in Europe and around the world that most would agree are at a similar level to Scotland, such as Eliteserien in Norway, the Swiss Super League, and others. In total, we had 11 leagues’ worth of data (Turkish, Swiss, Swedish, Greek, Bundesliga 2, Dutch, Austrian, Australian, the English Championship, Norwegian, and Scottish league data, to be specific) that gave us over 400,000 shots. With data from what I have dubbed the “League of Average Leagues”, we can now use these defensive metrics from Strata and create small zones on the pitch where shots take place to calculate xG values (though thanks to Nils and Marek’s articles I mentioned above, we are now thinking about how to implement their ideas for our model!).
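To make the zone idea concrete, here is a rough sketch of how shots could be binned into small pitch zones, split by the number of defenders between the shooter and the goal, and turned into a goals-per-shot rate. It is only an illustration under stated assumptions: the function name `zone_xg`, the 5-unit zone size, and the sample shots are all hypothetical and are not taken from the Stratagem data or from our actual model.

```python
from collections import defaultdict


def zone_xg(shots, zone_size=5.0):
    """Return {(zone_x, zone_y, defenders): goals / shots} for binned shots.

    `shots` is an iterable of (x, y, defenders, scored) tuples.
    The zone size is an assumption chosen only for this example.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [goals, shots]
    for x, y, defenders, scored in shots:
        # Shots in the same zone with the same defender count share a key.
        key = (int(x // zone_size), int(y // zone_size), defenders)
        totals[key][0] += int(scored)
        totals[key][1] += 1
    return {key: goals / count for key, (goals, count) in totals.items()}


# Hypothetical shots: (x, y, defenders between shooter and goal, scored?)
shots = [
    (37, 44, 2, False),
    (36, 43, 2, True),
    (38, 41, 2, False),
    (12, 70, 0, False),
]
table = zone_xg(shots)
for key, value in sorted(table.items()):
    print(key, round(value, 3))
```

With only a handful of shots per key, the rates are obviously noisy, which is exactly the sample size problem that pooling several leagues is meant to ease.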

Heat Map

With these figures calculated based on location and defensive pressure, we can develop a heat map, like the one above, of the probability of scoring a goal. Going from light green, as the least likely to score, to dark red, as the most likely to score, we can see where a team should be trying to take shots from in order to score. The “danger zone” concept is shown well here, with red filling the 8 yard box and the darkest red between the posts. Compare that to the green on the sides of the pitch or outside the box. Clearly you are more likely to score in the red areas than the green (I’m looking at you, Fassai El Bahktouri).


With this new model, Christian (on xCynic, his new stats and tactics vertical from the 90 Minute Cynic) and I are also planning on doing some new things and changing some graphs and maps we have done previously. We have made some improvements to our xG game maps and our xG and xA player maps, and added some new team graphs that we hope will help further the understanding of both advanced stats in football and Scottish football as a whole.

This article was written with the aid of StrataData, which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform, in addition to StrataBet Premium Recommendations.

B.U.R.L.E.Y. Makes His SPFL Premiership Season Predictions

We have arrived upon the dawn of a new Scottish football season. While Scottish football is often the subject of derision by others due to its perceived lack of competition for the title (because in leagues like Germany and Spain, it’s REAAALLLY anyone’s ballgame on who will win the league, right?), the opening day of fixtures sees an endless array of possibilities for the season ahead. Before kick off on match day one, it’s a twelve-way tie in the table, with everyone even on goal differential.

With the beginning of the SPFL Premiership season days away, it is time to un-cage the SPFL probability-predicting, insult-spewing, favorite-team-hating bot that we all know and hate called B.U.R.L.E.Y. If you are new to the high-speed world of advanced stats in Scottish football, I introduced B.U.R.L.E.Y. in the middle of last season (right after Robbie Nielson’s last match with Hearts that saw them beat Rangers 2-0, my how times have changed, eh?).

To quickly describe what B.U.R.L.E.Y. does: using the expected goal data that I collect throughout the season, we can come up with the win, lose, and draw probabilities for every match (my methods to do this are described in the link above, though I will be tweaking these methods a bit this season, which I will describe later). With these probabilities we can get a projected point total for each club in the SPFL up to the split in the league. Since we do not know who each team will be playing after the split, we can then take B.U.R.L.E.Y.’s projected points per game for the season and multiply that by 5 for the 5 games after the split. Adding that to the pre-split total gives us B.U.R.L.E.Y.’s projected point total for each club in the Premiership.
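For readers who like to see the arithmetic, here is a minimal sketch of that pipeline. It is not B.U.R.L.E.Y.’s actual method, which is described in the linked post: this version simply assumes each side’s goals follow an independent Poisson distribution with its xG as the mean. The function names and the 10-goal cap are hypothetical, while the 33 pre-split games reflect the Premiership format of 12 clubs playing each other three times before the split.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def match_probabilities(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) under independent Poisson scoring."""
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    total = win + draw + loss  # renormalise after capping at max_goals
    return win / total, draw / total, loss / total


def expected_points(p_win, p_draw):
    """Three points for a win, one for a draw."""
    return 3 * p_win + p_draw


def projected_season_points(pre_split_points, pre_split_games=33,
                            post_split_games=5):
    """Extend the pre-split total by points per game over the post-split games."""
    points_per_game = pre_split_points / pre_split_games
    return pre_split_points + points_per_game * post_split_games


# Hypothetical xG values for a single match.
p_win, p_draw, p_loss = match_probabilities(1.8, 1.1)
print(round(expected_points(p_win, p_draw), 2))
print(projected_season_points(66.0))
```

Summing `expected_points` over a club’s 33 pre-split fixtures gives the pre-split total that `projected_season_points` then extends.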

With the new season approaching, I wanted to make some improvements to B.U.R.L.E.Y. Last season, B.U.R.L.E.Y. used a club’s average xG per match, average xG conceded per match, the league average xG, and a club’s average xG home and away. The results were OK. No prizes for predicting Celtic to take the title, but B.U.R.L.E.Y. thought it would finish Rangers-Hearts for second and third, obviously off from what occurred. B.U.R.L.E.Y. also missed on relegation, picking Kilmarnock to go down. Though B.U.R.L.E.Y. did think that Inverness CT were relegation playoff bound, while the Jags earned automatic relegation when the season shook out.

With these in mind, I set out to improve B.U.R.L.E.Y. The fine people at Stratagem, who will be providing me with SPFL data that will allow me to do some cool new stuff this season, reached out to me and suggested I take a look at the effect playing at home has on a match. Looking at my data, I noticed that over the past two seasons, SPFL home clubs averaged 10% higher xG than away clubs. With that in mind, I decided to give all home clubs that 10% bump in xG in B.U.R.L.E.Y.’s calculations.
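As a quick sketch of what that bump might look like in code, assuming it is applied as a simple multiplier on the home side’s xG before any probabilities are calculated (the exact point at which the adjustment enters the calculations is not spelled out here, and the names below are hypothetical):

```python
HOME_XG_BOOST = 1.10  # the 10% home edge observed over the past two seasons


def adjust_for_home(home_xg, away_xg, boost=HOME_XG_BOOST):
    """Scale the home side's xG by the boost and leave the away side unchanged."""
    return home_xg * boost, away_xg


# Hypothetical baseline xG figures for one fixture.
home_xg, away_xg = adjust_for_home(1.50, 1.20)
print(round(home_xg, 2), away_xg)
```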

SPFL xG: Home vs. Away

Another quandary with B.U.R.L.E.Y. was how to come up with the numbers used to calculate B.U.R.L.E.Y.’s projection for Hibernian’s return to the SPFL Premiership. In most leagues across Europe, recently promoted teams often struggle upon their promotion to the top flight. In B.U.R.L.E.Y.’s Norwegian cousin, R.O.N.N.Y., without data for the previous season for newly promoted Eliteserien clubs Sandefjord and Kristiansund, I took the average for the league last year and went a standard deviation down to come up with their beginning of the season R.O.N.N.Y. figures. These numbers made sense due to the lack of data and tendency for clubs promoted to the Norwegian top flight to struggle.
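The R.O.N.N.Y. approach for promoted clubs boils down to “league average, one standard deviation worse.” A minimal sketch follows, with the caveat that the function name, the use of the population standard deviation, and the sample numbers are assumptions for illustration. For a figure like xG conceded, where a higher number is worse, “down” presumably means adding the deviation instead, which is what the `higher_is_better` flag is for.

```python
from statistics import mean, pstdev


def promoted_club_estimate(league_values, higher_is_better=True):
    """Return the league mean shifted one standard deviation toward 'worse'."""
    mu = mean(league_values)
    sigma = pstdev(league_values)
    return mu - sigma if higher_is_better else mu + sigma


# Hypothetical per-match xG averages for last season's clubs.
league_xg_for = [1.0, 1.0, 2.0, 2.0]
league_xg_against = [1.0, 1.0, 2.0, 2.0]

print(promoted_club_estimate(league_xg_for))
print(promoted_club_estimate(league_xg_against, higher_is_better=False))
```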

However, we do have data for what Hibs did last year. Yet, I am not sure Hibs being statistically dominant over the likes of Dumbarton, Ayr, Raith, etc. in the Championship will tell us much about how they will do in the Premiership. While in most leagues it is typically smaller yo-yo clubs that come up and struggle, we have seen “big clubs” promoted from the Championship the past two seasons in Hearts and Rangers. We can use these two clubs as guides to what Hibs might do in the top flight.

With this in mind, I took the average figures used for B.U.R.L.E.Y. for Hearts and Rangers in their first year back in the SPFL Premiership. I then took those averages and added the averages for places 3rd-7th for the last 3 seasons (the range of predicted finishes for Hibs I have seen by various experts in the media). Averaging all of these figures, I came up with the underlying estimates B.U.R.L.E.Y. will be using in his pre-season predictions for Hibs. Fully admitting there’s a bit of guess work in this, I think this is a decent method to try and quantify what the Easter Road club will do this season.
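One way to read that blend, sketched very loosely: treat the Hearts figure, the Rangers figure, and each season’s 3rd-7th average as equally weighted reference points and take their mean. The equal weighting is an assumption, since the exact weighting is not specified above, and the function name and numbers below are made up for the example.

```python
from statistics import mean


def blended_estimate(promoted_big_clubs, mid_table_by_season):
    """Average the promoted clubs' figures with each season's 3rd-7th mean."""
    reference_values = list(promoted_big_clubs)
    reference_values += [mean(season) for season in mid_table_by_season]
    return mean(reference_values)


# Hypothetical xG-per-match figures, not real data.
hearts_and_rangers = [1.5, 1.9]
third_to_seventh = [
    [1.4, 1.3, 1.3, 1.2, 1.1],  # season 1
    [1.5, 1.4, 1.2, 1.2, 1.1],  # season 2
    [1.6, 1.3, 1.3, 1.2, 1.0],  # season 3
]
print(round(blended_estimate(hearts_and_rangers, third_to_seventh), 3))
```

The same function would be run once for each underlying figure the model needs, such as xG for and xG conceded.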

While coming up with figures for B.U.R.L.E.Y. to predict Hibs’ campaign took some thinking, their Edinburgh neighbors Hearts present a problem for the model as well. The irony is not lost that B.U.R.L.E.Y.’s first appearance came at perhaps the height of Hearts’ season last year, while now B.U.R.L.E.Y. is back during some very frustrating lows for the Jambos. Last December, it seemed very possible for Hearts to finish second. Fast forward to today, where they have sacked manager Ian Cathro days before the beginning of the season.

xG Cathro

There was clearly a difference in Hearts’ performance between the first half of the season and the second. Under Nielson, Hearts were averaging an xG of just under 2 per match. Under Cathro, that average dropped to 1.67, and even that figure is inflated by xG outputs of over 4 twice against Kilmarnock. If you take those Killie outliers out, Hearts averaged an xG per match of 1.34 during their time with Ian Cathro as their manager, similar numbers to the likes of Ross County and Inverness CT last season.

Therefore, when calculating B.U.R.L.E.Y.’s predictions for Hearts’ season, I decided to completely exclude the data from Robbie Nielson’s reign last year, only using the data from Ian Cathro’s time in charge. Of course, the day after these calculations were done, Cathro was sacked. Yet, with Hearts having to replace their manager mere days from the start of the season and the difficult start to the season they face, I decided to keep the figures for Cathro’s time only in the calculations.

Burley Table

After all those adjustments were made, I had B.U.R.L.E.Y. spit out the average number of points he thought each team would have at the split after 10,000 simulations of this season in the SPFL (and boy are Leigh Griffiths’ legs tired; yeah, I made that joke two years in a row, so what?). Perhaps unsurprisingly, B.U.R.L.E.Y. is putting Celtic on top with 88 points at season’s end, 20 points ahead of rival Rangers. Aberdeen move to third according to B.U.R.L.E.Y., while Hibs and Hearts will be neck and neck, with B.U.R.L.E.Y. thinking Hibs will be slightly better, averaging only 0.35 points more over these 10,000 simulations.
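For anyone curious what “10,000 simulations” means in practice, here is a bare-bones sketch of the idea, not B.U.R.L.E.Y.’s actual code. It assumes each fixture already has a home win and draw probability, draws a random result for every fixture in every simulated season, and averages each club’s points. The function name, the fixture format, and the probabilities are hypothetical.

```python
import random


def simulate_average_points(fixtures, n_sims=10_000, seed=0):
    """Average points per club over n_sims simulated runs of the fixture list.

    Each fixture is (home, away, p_home_win, p_draw); the away win
    probability is whatever remains.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    totals = {}
    for home, away, _, _ in fixtures:
        totals.setdefault(home, 0)
        totals.setdefault(away, 0)
    for _ in range(n_sims):
        for home, away, p_home, p_draw in fixtures:
            r = rng.random()
            if r < p_home:
                totals[home] += 3
            elif r < p_home + p_draw:
                totals[home] += 1
                totals[away] += 1
            else:
                totals[away] += 3
    return {club: points / n_sims for club, points in totals.items()}


# Hypothetical two-match fixture list with made-up probabilities.
fixtures = [
    ("Celtic", "Hearts", 0.70, 0.18),
    ("Hearts", "Celtic", 0.20, 0.25),
]
print(simulate_average_points(fixtures))
```

With a full pre-split fixture list, the averages that come out are the kind of per-club point totals shown in the table.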

B.U.R.L.E.Y. thinks Motherwell will jump into the Top 6 with their potent attack, though their fate will largely hinge on whether or not the Steelmen can hold onto Louis Moult. In fact, comparing Motherwell’s points per game last season to what B.U.R.L.E.Y. is projecting, the Well have the largest jump in expected points per game from last season to this year. Clearly, B.U.R.L.E.Y. is impressed by Motherwell’s attack and is confident enough that their back-line and keepers will improve this season.

On the other half of the table, B.U.R.L.E.Y. predicts this is the season St. Johnstone do not make the top 6 after an unprecedented run of success for the Saints, though they are only projected to finish a point behind Motherwell. He also still sees a gap between the Saints and the rest of the bottom six, with only 5 points separating 8th and 12th place. Killie will be fighting for their Premiership lives in the relegation playoff come season’s end according to B.U.R.L.E.Y., while Dundee fans will have to enjoy winding up their Tangerine neighbors about sending them down at Dens while they can, because B.U.R.L.E.Y. sees them heading “doon” this season.

It is at this point that I would like to clarify that these predictions come straight from the model. I made my “sight unseen” predicted table with just my stats from last season as a reference. While my “human” predictions were similar to B.U.R.L.E.Y.’s, having the same top 5, there were some differences. I think St. Johnstone will regress this year but still finish top 6, and Hamilton will be the ones heading down this season. It will be interesting to compare my thoughts to B.U.R.L.E.Y.’s over the season.

Similar to last season, I will be tweeting out individual match B.U.R.L.E.Y. probabilities each week. I will also be publishing polls to see what my twitter followers think will be the result of matches. We will see come year’s end whether B.U.R.L.E.Y. or my twitter followers were better prognosticators of the SPFL.