Monday 6 February 2017

Bristol Post: cut and paste journalism? Share the data

There is one news outlet in the city whose coverage is insightful, which cuts to the core of the city's problems, and whose every article is worth a read.

Yes, we refer to The Bristolian. Being ad-free, it has no need to push central-HQ agendas and no need to generate click-bait content at the lowest cost per article; instead it can write independent content.

There is also another news outlet in the city, The Bristol Evening Post, which is part of The Trinity Group, as is the

We've been avoiding covering the Bristol Evening Post since its "witty" Bikes and Lorries article of April 1 2015. Every link we make to a low-value web site devalues our own rating in Google's PageRank algorithm, and since most of their coverage is bollocks there's no real point.

However, today it's time to link to an article, albeit through a nofollow marker: Revealed: The number of cyclists involved in crashes while undertaking other vehicles, covering the 5-6 cyclists hit a year by going to the left of cars in those little painted bits of bollocks on the road.

This turns out to be a seminal piece of work:

  1. Because it appears in [Cox17], Tara Cox, Revealed: Hotspots in Cambridge for accidents where cyclists undertake other vehicles, Trinity Group Cambridge News, 2017, where 4-5 cyclists are injured/year.
  2. And in [Grant17], Rob Grant, Dozens of cyclists have been involved in collisions while undertaking, new figures show, Manchester Evening News, 2017, where the collision rate is 11/year, no variance/stddev supplied.
  3. And in [Grant17a], Rob Grant, How many Birmingham cyclists are involved in accidents while undertaking, Birmingham Mail, 2017. Here the collision rate is "an average of 8/year", again without any variance.

As a news outlet that believes in weakly-defensible data to back up all our ill-informed opinions, we are always pleased to see our press outlets following our strategy of "have an opinion, grab some meaningless statistic and then turn it into an article defending our prejudices". Which, as our detractors will point out, we do all too often.

But we do like to see that weakly-defensible data. Indeed, we're happy to critique the DfT's data gathering processes as a relic of the twentieth century, and suggest modern, big data alternatives.

Which is why, given the broad coverage of this seminal piece of work, we'd really like to see the data.

Preferably:

  1. The cleaned up DfT data, either in the painfully generic CSV format, or something more efficient and with tighter typing, like Apache Avro.
  2. The data science notebook used to take the data and produce the numbers which got published. A Jupyter Notebook pushed to GitHub would be fine.

Reproducible analysis of the results of an experiment is something which is becoming a big issue in science: given the same data, can different scientists come up with the same answers? Publishing the data and the analysis code is the foundation of this.
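
The notebook needn't be much. Here is a minimal sketch in Python of the kind of thing we mean, and it comes with assumptions: a hypothetical cleaned-up CSV with one row per collision and a column called `year`. The real DfT column names will differ, and we have no idea what filtering the published articles applied. It counts collisions per year and reports the average along with the standard deviation that the articles left out.

```python
import csv
import io
import statistics
from collections import Counter


def yearly_summary(csv_text, year_column="year"):
    """Count rows per year and return (per_year, mean, sample_stddev).

    Assumes one row per collision and an integer year in `year_column`.
    Both the layout and the column name are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(int(row[year_column]) for row in reader)
    if not counts:
        raise ValueError("no rows found")

    # Years with no collisions never show up as rows, so fill the gaps
    # with zeros; otherwise the average would be biased upwards.
    years = range(min(counts), max(counts) + 1)
    per_year = {year: counts.get(year, 0) for year in years}

    values = list(per_year.values())
    mean = statistics.mean(values)
    # The sample standard deviation needs at least two years of data.
    stddev = statistics.stdev(values) if len(values) > 1 else 0.0
    return per_year, mean, stddev
```

Feed it the text of the CSV and it hands back the per-year counts, the mean and the spread. With only a handful of events a year, the spread is likely to be a large fraction of the mean, which is exactly why we would like to see it.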

At least this dataset is going to be small. It's not like the datasets lurking in CERN CASTOR, or worse, the feed expected to come off the Square Kilometre Array, a feed that has everyone fucking scared right now.

So to the Evening Post, as one data science organisation to another: if you are going to write articles on traffic issues in the city, even if they are copied and pasted from the same piece of tier-2 prose seen in Manchester, Birmingham and Cambridge, show us the data, or STFU.