Monday, November 29, 2010

Will "data driven journalism" empower journalists or replace them with content-generating robots?

Sir Tim Berners-Lee, inventor of the World Wide Web, was recently quoted as saying, "Data-driven journalism is the future." He was speaking in the wake of an unprecedented release of huge amounts of data about government spending in the UK, when he was asked who would possess the skill sets needed to analyze complex government databases.
"Journalists need to be data-savvy. These are the people whose jobs are to interpret what government is doing to the people. So it used to be that you would get stories by chatting to people in bars, and it still might be that you'll do it that way some times. But now it's also going to be about poring over data and equipping yourself with the tools to analyse it and picking out what's interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what's going on in the country."
It had a nice ring to it, and an almost utopian quality, evoking a vision of lots of highly skilled, tech-savvy investigative reporters wielding powerful database query techniques on behalf of the public, an intrepid army of modern-day I.F. Stones defending the public's right to know.

What Berners-Lee did not address were the economic imperatives and the business models of a newspaper industry that is yielding ground to the internet he helped popularize. For experienced, knowledgeable journalists to do what he is proposing is an expensive, labor-intensive undertaking, and that raises the question of who is going to pay for it. Our big newspaper organizations are doing what they can (think of the months that the NYT's team of journalists spent analyzing the "cablegate" documents from WikiLeaks).

Outside the print media, you don't find a lot of this on the internet, where more and more of the public gets its news, and where the business model is very different. Publishers are usually looking for "content" that is free or as cheap as possible. It's important to keep costs down, because a screen of web advertising doesn't bring in nearly what a page of print advertising does, so the content that accompanies it tends to be priced in proportion.

And that's why Berners-Lee's phrase, "data-driven journalism," is a two-edged sword. Yes, intelligent investigation of data can result in powerful journalism. But "data-driven journalism" can also result in a race to the bottom, one that may increasingly dispense with journalists altogether.

One current example was mentioned in the NYT's Digital Domain blog the other day.
This month, StatSheet unveiled StatSheet Network, made up of separate Web sites for each of the 345 N.C.A.A. Division I men’s basketball teams. Beyond statistics galore, each site has what the company calls “automated content,” stories written entirely by software, including write-ups of the team’s games, past and future. With a joking wink, StatSheet’s founder, Robbie Allen, refers to these sites as the “Robot Army.”

Each team’s StatSheet Web site is located at a freestanding Web address, conveying the sense that it is wholly invested in the interests of that school’s fans. (To find a domain name, a fan first visits

The software is imbued with the smarts to flatter each particular team. The same statistics, documenting the same game, produce an entirely different write-up and headline at the opposing team’s page.
Here's an example, the site for the Wisconsin Badgers. It's probably not going to sweep anybody off their feet, because the Badgers already get plenty of press coverage. But it's not meant to. These websites are really designed to draw traffic from smaller schools that don't get much press.

It all kind of makes sense, in its upside-down way. Great sports writers are artists. Not-so-great sports writers take game statistics and weave them together with formulaic clichés to create their stories. Why pay someone good money to write clichés when you can just give a robot a dictionary and a database? And when you think about it, there are lots of other areas where this approach would probably work.
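To make the "dictionary and a database" idea concrete, here is a toy sketch in Python of how one box score could yield a different headline for each team's page. It is purely illustrative and is not StatSheet's actual software: the Game record, the PHRASES table, the headline function, the 10-point blowout cutoff, the opponent, and the score are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """One row of the hypothetical 'database': a final score."""
    home: str
    away: str
    home_score: int
    away_score: int


# The hypothetical 'dictionary': stock phrases keyed by outcome and margin.
PHRASES = {
    ("win", "close"): "{team} hold off {opponent} {own}-{other}",
    ("win", "blowout"): "{team} roll past {opponent} {own}-{other}",
    ("loss", "close"): "{team} fall just short against {opponent}, {own}-{other}",
    ("loss", "blowout"): "{team} look to regroup after {own}-{other} loss to {opponent}",
}


def headline(game: Game, team: str, blowout_margin: int = 10) -> str:
    """Write a headline for the same game from one team's point of view."""
    if team == game.home:
        opponent, own, other = game.away, game.home_score, game.away_score
    elif team == game.away:
        opponent, own, other = game.home, game.away_score, game.home_score
    else:
        raise ValueError(f"{team} did not play in this game")
    if own == other:
        raise ValueError("tied scores are not handled in this sketch")

    outcome = "win" if own > other else "loss"
    margin = "blowout" if abs(own - other) >= blowout_margin else "close"
    return PHRASES[(outcome, margin)].format(
        team=team, opponent=opponent, own=own, other=other
    )


# Invented score, for illustration only.
game = Game(home="Badgers", away="Wildcats", home_score=72, away_score=68)
print(headline(game, "Badgers"))   # Badgers hold off Wildcats 72-68
print(headline(game, "Wildcats"))  # Wildcats fall just short against Badgers, 68-72
```

The statistics never change; only the lookup key and the point of view do. A real system would presumably draw on many more statistics and a far larger phrase table, but the formula is the same kind of thing.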

What's it going to be? Will data-driven journalism take us to new heights, as Tim Berners-Lee hopes it will? Or will it be part of a new race to the bottom (line), with robots creating "good enough" content that drives out good journalism, the way bad money drives out good? Stay tuned.


Cybergabi said...

Thanks, Peter, for this knowledgeable and interesting read. I am stuck between marveling at the technological possibilities of today and dire images of a future robot army of journalists, of machines taking over, spreading and manipulating information bits according to the will of those who pay the most to put them in place. It's a scary thought, particularly for those of us who have a scientific education and know how any set of data can be interpreted in many different ways.
I'll stay tuned.

George H. said...

This is more than a passing worry to many of us in this business. There is yet another path that has already been chosen by some, and that results in doubt about the credibility of data. Look how data has been manipulated and re-manipulated in the train issue. Not that many people had to be fooled, even when journalists attempted to present accurate data. Too polite to ask the tough question: where did this information come from and why is it accurate?
You write so well on this issue. If the answer is found in journalism "institutes," doing the heavy work, then a way has to be found so that these workers are paid. This is a two-pitcher discussion, because you also pinpoint one of my favorite topics, the formula taking over from art. I worry that a contributing reason may be the illiteracy of the nation.