On Algorithmic Newspapers and Publishing


Yesterday saw the launch of The Long Good Read (TLGR), a newspaper publishing experiment between Newspaper Club and The Guardian. Newspaper Club's blogpost covers some of the whys behind TLGR, such as what happens when you play at the edges and seams of a 24-hour digital news organisation and printing presses that can print a run of just a single broadsheet or tabloid paper.

The Long Good Read

It's an experiment in which I find myself playing the role of newspaper editor, which I guess is something to put on the CV :) It's also an experiment in which I put together a weekly newspaper in about an hour, "one person, one hour" as Tom says. The 1st issue took slightly longer than that while I got used to the tools, but the 2nd issue will probably take less time than it takes me to write this. Although that probably says more about my writing than about the tools.

Speaking of which, how's this whole thing put together?

Well, from an editor's point of view, it's very much standing on the shoulders of giants. We like the phrase "Editors, readers and robots" because it's pretty much true. This is how an issue is put together.

1) Newspaper Editors and Writers.

The Guardian is of course made of people, and those people together give the paper its voice by deciding which articles to commission, which writers to hire and what topics, news and subjects they want them to write about. In the last 7 days the Guardian has published just over 3,100 articles, videos and podcasts.

2) Readers

Now it's not going to read or watch itself, so here come the readers. As it turns out, some of those articles don't get read very widely, while others unexpectedly do. Just as the Guardian chooses what it wants to publish, the readers choose what they want to read. Sometimes what the Guardian decides to put on its front page or home page matches what the readers are reading; other times everyone seems to be focusing on something else, with Twitter, Facebook and other sites acting as a back channel.

Editors are our first filter; the readers are our second.

3) Robots

The Guardian, like all big organisations, has a number of analytics tools, including an in-house tool called Ophan, which tracks what readers are looking at, where they are coming from and various other factors, all of which it rolls up into a handy dashboard.

They also have an API which exposes things like most viewed articles, word counts, sections and the like.
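For the curious, here's a rough sketch of what pulling a week of article metadata from the Guardian's Content API looks like. It assumes you've registered for an API key (stored here in a GUARDIAN_API_KEY environment variable); the most-viewed numbers come from Ophan rather than anything shown here, so this only covers the public article feed.

```python
# A rough sketch of fetching a week's worth of article metadata from the
# Guardian Content API. GUARDIAN_API_KEY is assumed to be set in your
# environment; the dates below are just an example week.
import os
import requests

API_URL = "https://content.guardianapis.com/search"

def fetch_week(page=1):
    params = {
        "api-key": os.environ["GUARDIAN_API_KEY"],
        "from-date": "2013-12-02",
        "to-date": "2013-12-08",
        "show-fields": "headline,wordcount",   # extra fields per article
        "show-tags": "keyword",                # keyword tags per article
        "page-size": 50,
        "page": page,
        "order-by": "newest",
    }
    response = requests.get(API_URL, params=params)
    response.raise_for_status()
    return response.json()["response"]

first = fetch_week()
print(first["total"], "pieces published this week")
for item in first["results"][:5]:
    print(item["webTitle"], "-", item["fields"].get("wordcount", "?"), "words")
```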

We plundered those tools for data and wrote our own little "robot" (a bunch of algorithms) to surface what we hoped would be good, interesting, sometimes funny, sometimes long articles. Just before I throw together a new issue of this paper I can head off to our dashboard, which presents me with about 30 "top" articles, roughly 1% of everything the Guardian originally published.
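I won't pretend the snippet below is our actual robot; it's just a toy illustration of the general idea, combining a popularity signal (assumed to come from an analytics tool like Ophan) with a nudge towards longer reads, then keeping the top 30.

```python
# A toy stand-in for the "robot": not the real scoring, just the shape
# of it. 'views' is assumed to come from analytics; 'wordcount' from the
# Content API fields requested in the earlier sketch.
def score(article):
    words = int(article.get("wordcount", 0))
    views = article.get("views", 0)
    long_read_bonus = 1.5 if words > 1500 else 1.0  # favour proper longreads
    return views * long_read_bonus

def top_stories(articles, n=30):
    return sorted(articles, key=score, reverse=True)[:n]

sample = [
    {"headline": "Short but popular", "wordcount": "400", "views": 90000},
    {"headline": "A proper longread", "wordcount": "3200", "views": 40000},
]
for article in top_stories(sample):
    print(article["headline"])
```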

A Penultimate Step

From there I cast my critical editorial eye across the selection, as fast-moving news stories will often be out of date by the time we go to press. Last week there were several articles about the latest NSA spying and bugging, but the chances are those stories would have evolved by the time they hit our press.

Once I've made the selection, I entrust the task of putting together the paper to Newspaper Club's robot, ARTHR.

The Long Good Read

ARTHR

ARTHR has been written by Newspaper Club to handle the job of laying out newspapers. This can range from a creator very carefully deciding where everything goes on each page, to the tool being given a simple URL and deciding for itself where headlines, images and body text should go.

It's this last option I quite enjoy: as I understand it, the tool tries out various layouts before deciding on the one it thinks is best. Sometimes it produces some crazy-looking pages, but even then it's only taking a couple of seconds, rather than me figuring it all out by hand.
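ARTHR's internals aren't mine to show, so the sketch below is only an illustration of that generate-and-score idea as I understand it: churn out a handful of candidate layouts, score each against some simple heuristics, and keep the winner. Every knob and heuristic here is invented for the example.

```python
# Not ARTHR itself, just a toy generate-and-score loop.
import random

def candidate_layout(rng):
    # Hypothetical knobs: column count, image placement, headline size.
    return {
        "columns": rng.choice([2, 3, 4]),
        "image": rng.choice(["top", "inline", "none"]),
        "headline_pt": rng.choice([36, 48, 60]),
    }

def layout_score(layout, article_words):
    # Toy heuristics: more columns suit longer pieces, huge headlines
    # feel wrong on very long reads, a lead image is usually welcome.
    score = layout["columns"] * (article_words / 1000)
    if article_words > 2000:
        score -= layout["headline_pt"] / 10
    if layout["image"] == "top":
        score += 1
    return score

def best_layout(article_words, tries=20, seed=None):
    rng = random.Random(seed)
    candidates = [candidate_layout(rng) for _ in range(tries)]
    return max(candidates, key=lambda layout: layout_score(layout, article_words))

print(best_layout(2400, seed=1))
```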

My role as newspaper baron is to feed in several URLs selected from our "Top Stories", leave ARTHR to do its own thing, and then shuffle the results around a bit. It rather feels like cheating.

A Data Paper, the DNA of news

As we're very much relying on algorithms to create this paper, to pick the stories and then again to lay them out, we've started to think of this as a newspaper built from data, and as an evolving experiment.

We're starting to ask ourselves what happens if we not only use data to pick the news but also start to expose some of that underlying (or meta) data. The people we're expecting to find this paper interesting, i.e. you, are not just interested in the news, but in the news about the news. What made a story a "Top Story"? Can the metadata around a week's worth of news tell us something about the news itself?

We tried this in issue #1 with a graph of the main tags used across all the news stories our robots picked over; there's probably still a copy around if you want to grab one.
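Counting those tags is straightforward if you've fetched articles with show-tags=keyword, as in the earlier sketch; something along these lines would do it (tag_counts and the variable names are mine, not part of any API).

```python
# Tally keyword tags across a set of Content API results.
from collections import Counter

def tag_counts(articles):
    counts = Counter()
    for article in articles:
        for tag in article.get("tags", []):
            counts[tag["webTitle"]] += 1
    return counts

# counts = tag_counts(first["results"])
# print(counts.most_common(10))
```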

This week I had a quick play with looking at the "DNA" of when the Guardian publishes news. The front cover shows a week's worth of articles, 7 columns in all, Monday over on the left, the weekend on the right. Midnight is at the top of each column, midday halfway down and the evening towards the bottom.
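If you fancy making your own version, the webPublicationDate that comes back with every Content API result is all you need; here's a minimal matplotlib sketch of that day-column, hour-down arrangement (the real cover's styling is rather nicer).

```python
# Plot publication times for a week of articles: one column per day,
# midnight at the top, evening towards the bottom.
from datetime import datetime
import matplotlib.pyplot as plt

def plot_week(articles):
    days, hours = [], []
    for article in articles:
        published = datetime.strptime(
            article["webPublicationDate"], "%Y-%m-%dT%H:%M:%SZ")
        days.append(published.weekday())              # Monday = 0 ... Sunday = 6
        hours.append(published.hour + published.minute / 60)
    plt.scatter(days, hours, s=8)
    plt.gca().invert_yaxis()                          # midnight at the top
    plt.xticks(range(7), ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
    plt.ylabel("Hour of day")
    plt.savefig("dna-cover.png")

# plot_week(first["results"])
```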

All of these things are created with a few lines of code and the press of a button. Hopefully, by the end of this experiment, once we've gotten used to the tools, producing a paper will be as simple as pressing a few buttons, sending the results over to Newspaper Club and pressing a few more.

The Long Good Read

And if we can do it, it's not a very far stretch to allowing anyone to do it. Once the system has learnt what you're interested in, the next step is to just let it carry on for you, with the results arriving on your doorstep to read each week.

But for the moment I'd better get back to pressing a few buttons for this next issue.

Newspaper photos Creative Commons Licensed by Newspaper Club