"Journalists need to be data-savvy. These are the people whose jobs are to interpret what government is doing to the people. So it used to be that you would get stories by chatting to people in bars, and it still might be that you'll do it that way some times. But now it's also going to be about poring over data and equipping yourself with the tools to analyse it and picking out what's interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what's going on in the country."
It had a nice ring to it, and an almost utopian quality, evoking a vision of lots of highly skilled, tech-savvy investigative reporters wielding powerful database query techniques on behalf of the public, an intrepid army of modern-day I.F. Stones defending the public's right to know.
What Berners-Lee did not address were the economic imperatives and the business models of a newspaper industry that is yielding ground to the internet he helped popularize. For experienced, knowledgeable journalists to do what he is proposing is an expensive, labor-intensive undertaking, and that raises the question of who is going to pay for it. Our big newspaper organizations are doing what they can (think of the months that the NYT's team of journalists spent analyzing the "cablegate" documents from Wikileaks).
Outside the print media, you don't find a lot of this on the internet, where more and more of the public gets its news, and where the business model is very different. Publishers are usually looking for "content" that is free or as cheap as possible. It's important to keep down costs, because the web advertising on a screen doesn't bring in nearly what a page of print advertising does, so the content that accompanies it tends to be priced in proportion.
And that's why Berners-Lee's phrase, "data-driven journalism," is a two-edged sword. Yes, intelligent investigation of data can result in powerful journalism. But "data-driven journalism" can also result in a race to the bottom, one that may increasingly dispense with journalists altogether.
One current example was mentioned in the NYT's Digital Domain blog the other day.
This month, StatSheet unveiled StatSheet Network, made up of separate Web sites for each of the 345 N.C.A.A. Division I men’s basketball teams. Beyond statistics galore, each site has what the company calls “automated content,” stories written entirely by software, including write-ups of the team’s games, past and future. With a joking wink, StatSheet’s founder, Robbie Allen, refers to these sites as the “Robot Army.”
Here's an example, the site for the Wisconsin Badgers. It's probably not going to sweep anybody off their feet, because the Badgers already get plenty of press coverage. But it's not meant to. These websites are really designed to draw traffic from fans of smaller schools that don't get much press.
Each team’s StatSheet Web site is located at a freestanding Web address, conveying the sense that it is wholly invested in the interests of that school’s fans. (To find a domain name, a fan first visits http://statsheet.com/#websites.)
The software is imbued with the smarts to flatter each particular team. The same statistics, documenting the same game, produce an entirely different write-up and headline at the opposing team’s page.
It all kind of makes sense, in its upside-down way. Great sports writers are artists. Not-so-great sports writers take game statistics and weave them together with formulaic clichés to create their stories. Why pay someone good money to write clichés when you can just give a robot a dictionary and a database? And when you think about it, there are lots of other areas where this approach would probably work.
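To make the "dictionary and a database" idea concrete, here is a minimal Python sketch of how the same box score could yield two differently slanted headlines. This is not StatSheet's actual software, which isn't public as far as I know. The `GameResult` record, the `headline_for` function, the stock phrases, the team names and the score are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameResult:
    """One row from a hypothetical box-score database."""
    home: str
    away: str
    home_points: int
    away_points: int


def headline_for(game: GameResult, team: str) -> str:
    """Return a headline slanted toward `team`, built from stock phrases."""
    if team == game.home:
        opponent, ours, theirs = game.away, game.home_points, game.away_points
    elif team == game.away:
        opponent, ours, theirs = game.home, game.away_points, game.home_points
    else:
        raise ValueError(f"{team} did not play in this game")

    margin = ours - theirs
    if margin == 0:
        raise ValueError("basketball games do not end in ties")
    # The "dictionary": one cliche per situation, chosen by the margin alone.
    if margin >= 10:
        return f"{team} rolls past {opponent}, {ours}-{theirs}"
    if margin > 0:
        return f"{team} holds off {opponent}, {ours}-{theirs}"
    if margin > -10:
        return f"{team} falls just short against {opponent}, {theirs}-{ours}"
    return f"{team} looks to regroup after {theirs}-{ours} loss to {opponent}"


# Invented sample game: same numbers, two write-ups.
game = GameResult(home="Home U", away="Away State", home_points=71, away_points=64)
print(headline_for(game, "Home U"))      # Home U holds off Away State, 71-64
print(headline_for(game, "Away State"))  # Away State falls just short against Home U, 71-64
```

A real system would presumably draw on far more statistics and a much larger stock of phrases, but the principle is the same: the numbers pick the cliché, and the team you ask about picks the slant.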
What's it going to be? Will data-driven journalism take us to new heights, as Tim Berners-Lee hopes it will? Or will it be part of a new race to the bottom (line), with robots creating "good enough" content that drives out good journalism, the way bad money drives out good? Stay tuned.