While recovering from a dodgy curry in London, I decided I wanted to look deeper into food standards across the city. Considering all the information was available online, I thought it would be pretty straightforward, but little did I know how many rabbit holes I would find.
The British government publishes all food hygiene ratings online. However, there is no option to download the database in one go, so I wrote a scraper to pull the data from the website. The scraper ran for a couple of hours on a cheap Google Cloud instance and collected data on a total of 515,748 locations covering every available category. These ratings cover everything from pubs, bars, and restaurants to supermarkets and nurseries. For each location, I collected the name of the business, the longitude/latitude, the postcode, the rating, and a couple of other fields.
As you might expect of data scraped from a government website, the data was incredibly dirty. Amongst the problems was someone in Northumberland clearly not doing their job properly, as almost all locations missing a rating were from there. 60,000 locations had missing longitude/latitude coordinates, which meant they were unusable in the final map.
The most difficult part of cleaning up the data was dealing with the names. These locations were obviously entered manually and independently by multiple people, who seemingly all had their own opinion on how to spell common establishment names. For example, there are more than 100 different ways that McDonald's has been spelled, each with its own variation on capital letters, punctuation, and name. Some of the more common spellings are listed below. I can only assume all of these refer to the same chain.
McDonalds | McDonalds Restaurants Ltd | McDonalds Restaurant | McDonald's |
Mcdonalds | Mcdonalds Restaurants Ltd | McDonald's Restaurants Ltd | McDonalds Restaurants |
McDonald's Restaurant | McDonalds Restaurants Limited | McDonalds Restaurant Ltd | McDonald's Restaurants |
MCDONALDS | McDonald's Restaurants Ltd. | McDonald's Restaurants Limited | Mcdonalds Restaurant |
MCDONALDS RESTAURANTS LTD | MCDONALDS RESTAURANT | McDonald's Restaurant Ltd | MCDONALD'S |
Mc Donald's | Mc Donalds | Mc Donalds Restaurant | Mc Donalds Ltd |
But even McDonald's was a fairly benign example when compared to The Co-op. After spending hours trying to clean up the dataset in OpenRefine (formerly Google Refine), I was still finding new names that referred to The Co-op and its subsidiaries. Eventually I gave up, but I believe I have captured almost all spellings for the biggest brands under one name.
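For anyone who would rather script this step, here is a rough sketch of the same idea in Python. It is not the OpenRefine workflow described above: the function name `normalise_name` and the `SUFFIXES` set are made up for illustration, and it only handles the easy cases (case, punctuation, spacing, and company suffixes). Something like The Co-op and its subsidiaries would still need a hand-built lookup.

```python
import re

# Hypothetical set of filler words to drop from a name; an assumption for
# this sketch, not the list used in the original clean-up.
SUFFIXES = {"restaurant", "restaurants", "ltd", "limited"}

def normalise_name(name: str) -> str:
    """Collapse case, punctuation, spacing and company suffixes into one key."""
    key = name.lower()
    key = re.sub(r"[^a-z0-9 ]", "", key)  # drop apostrophes, full stops, etc.
    words = [w for w in key.split() if w not in SUFFIXES]
    return "".join(words)                 # "mc donalds" becomes "mcdonalds"

variants = ["McDonalds", "Mc Donald's", "MCDONALDS RESTAURANTS LTD",
            "McDonald's Restaurants Ltd."]
keys = {normalise_name(v) for v in variants}
```

All four variants end up under the single key `mcdonalds`, so grouping by the key rather than the raw name treats them as one brand.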
After processing and cleaning up the data, I was left with 431,758 locations with a numeric rating (not all ratings are on a 1 to 5 scale). The good news is that the vast majority of establishments across the UK had a rating of 5.
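Picking out the numeric ratings is a simple filter. The sketch below shows one way to do it, assuming the scraped ratings arrive as strings; the sample values and the helper name `numeric_rating` are invented for the example and are not taken from the real dataset.

```python
def numeric_rating(value):
    """Return the rating as an int, or None if it is not a plain number."""
    text = str(value).strip()
    return int(text) if text.isdecimal() else None

# Made-up sample of scraped rating strings, including non-numeric ones.
raw = ["5", "4", "Exempt", "", "Pass", "3"]
ratings = [r for r in map(numeric_rating, raw) if r is not None]
average = sum(ratings) / len(ratings)
```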
Most of the big chains had great ratings. I was surprised to see that McDonald's, Burger King, and Subway all had perfect 5 out of 5 ratings.
The notable exception among the big fast food chains was KFC, which fell quite a bit below the national average. In fact, KFC ranked among the lowest even of its cheaper fried chicken competitors.
I wasn't expecting KFC to be rated so much lower than what are typically seen as its lower standard competitors, but I think that everyone agrees that KFC is going downhill.
The next stat really surprised me. I would have thought that the capital would be held to a higher standard than the rest of the country, but the opposite is true.
I'm not sure why Glasgow is doing so well, but it is far ahead of any other city for its average food standard rating. When I first saw this, I thought something had gone wrong, but after double checking the data, it looks correct. London definitely needs to pick up its food standards after falling so low on this graph.
Now we get to the interesting bit: a map of food standards across London.
I decided to average food standards by postcode, so the first step was to get a map of London's postcodes. I found a shapefile of all UK postcodes here. This shapefile used the same coordinate reference system (CRS) as the collected data, so everything lined up nicely. After merging some postcodes (e.g. SW1X and SW1Y into SW1) and loading the data into geopandas, I was able to calculate the average food rating by postcode.
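The merging and averaging can be sketched in plain Python. This is only an illustration of the idea, not the geopandas code used for the map: the function names are made up, the sample rows are invented, and it assumes each postcode is written with a space between its two halves.

```python
import re
from collections import defaultdict

def merge_district(postcode: str) -> str:
    """Return the district with any trailing letter dropped: 'SW1X 7XL' -> 'SW1'."""
    outward = postcode.strip().upper().split()[0]
    match = re.match(r"^([A-Z]{1,2}\d{1,2})[A-Z]?$", outward)
    return match.group(1) if match else outward

def mean_rating_by_district(rows):
    """Average (postcode, rating) pairs per merged district."""
    totals = defaultdict(lambda: [0, 0])  # district -> [sum, count]
    for postcode, rating in rows:
        bucket = totals[merge_district(postcode)]
        bucket[0] += rating
        bucket[1] += 1
    return {district: s / n for district, (s, n) in totals.items()}

# Invented sample rows, purely for illustration.
rows = [("SW1X 7XL", 5), ("SW1Y 4AR", 3), ("SE1 9SG", 4)]
averages = mean_rating_by_district(rows)
```

Here SW1X and SW1Y both fall under SW1, so the two ratings are averaged together. The resulting dictionary could then be joined onto the postcode shapes by district name.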
However, it turns out that some postcodes span the river Thames. This makes for a very ugly map when plotted. The solution is to merge this map with another, geographically accurate map. That shapefile comes courtesy of the British Government here. Unfortunately, this shapefile uses a different CRS from the rest of our data. Shapefiles provided by the GIS arm of the British government seem to use the British National Grid (BNG) coordinate system. BNG is a projected grid measured in metres and based on a different model of the earth's shape from the latitude/longitude of the standard GPS system (WGS84), so the two can't be overlaid directly. Thankfully, it's quite easy to change one CRS into another using geopandas. Once I had converted the geographically accurate map to one compatible with the postcode map, I could subtract one from the other and end up with a map of postcodes that followed the river Thames.
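In geopandas, the conversion and the clean-up look roughly like the sketch below. The two rectangles are toy stand-ins for the real shapefiles, and the column names are invented. It also assumes the accurate map is a polygon of land, so the postcodes are kept only where they intersect it; if the second layer were the river itself, `how="difference"` would be the operation to reach for instead.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy "postcode" polygon in WGS84 (EPSG:4326), longitude/latitude in degrees.
postcodes = gpd.GeoDataFrame(
    {"district": ["SE1"]},
    geometry=[box(-0.12, 51.49, -0.07, 51.51)],
    crs="EPSG:4326",
)

# Toy "land" polygon in British National Grid (EPSG:27700), in metres.
land_bng = gpd.GeoDataFrame(
    {"name": ["south bank"]},
    geometry=[box(530000, 179000, 534000, 180000)],
    crs="EPSG:27700",
)

# Reproject the land layer so both layers share the postcode CRS.
land = land_bng.to_crs(postcodes.crs)

# Keep only the part of each postcode that lies on land.
clipped = gpd.overlay(postcodes, land, how="intersection")
```

With real data, both layers would be loaded with `gpd.read_file` instead of being built by hand.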
The end result is what follows.
I'm afraid I don't know what happened to Wapping in this map, but otherwise it seems fairly consistent with my own experience. Next to the river, you're pretty safe. But enter the band to the north, or that localised pocket to the south, and you're straying into unknown territory.
All graphics used in this post are SVG vectors, so you should be able to zoom in on them as much as you like. If you would like a copy of the cleaned dataset, please get in touch!