data journalism – Nieman Lab

In Spain, a new data-powered news outlet aims to increase accountability reporting

Hanaa' Tameez — Tue, 09 May 2023 18:32:26 +0000

In March, Spain passed a gender quotas law aimed at raising the number of women in leadership roles across the country. Among other requirements, the law calls requires political parties to put forward equal numbers of male and female candidates in municipal and national elections.

After months of extracting and analyzing information from parliamentary websites, documents, and other public records, Demócrata — a recently launched news outlet focused on Spanish government and public policy — published a series finding that in general Parliamentary sessions, the ones that get the most attention, men gave nearly two-thirds of the speeches. Women were underrepresented on congressional committees related to “state matters” like defense, economic affairs, and budgeting, but make up the majority of members on committees focused on equality, gender violence, and children’s rights.

Stories like these are what Demócrata aims to provide news consumers in Spain: Data-based journalism that helps to holds politicians accountable. That series, for example, included a methodology of how the journalists obtained the data, organized it, and decided what to include. (For instance: “Participations of less than one minute duration have also been left out. They mostly deal with oaths to take possession of seats, questions of order, requests to speak…They accounted for less than 1% of the total interventions collected.”)

“It brings a lot of transparency to the legislative process,” said Pilar Velasco, a veteran investigative journalist and Demócrata’s editorial director. “When the noise of politics occupies the entire news cycle, it generates a space for opacity that isn’t reported on.”

The site fills a gap in Spain, which will hold its general election in December. “It’s a good year to launch a news outlet with a focus on politics and policies,” said Eduardo Suárez, the head of editorial for the Reuters Institute for the Study of Journalism. “[Demócrata’s] value proposition is to report on public policies and Parliamentary debates in much more detail than mainstream publications. Newspapers in Spain are much more focused on politics than on public policies, and this might provide an opening for a publication like Demócrata, whose goal is to cover those policy debates in a more nuanced and granular way.”

Demócrata is the country’s only news outlet that specifically covers Parliament and public policy from an accountability lens daily, according to the Iberian Digital Media Map by Iberifier, a European Commission–funded initiative. (Another initiative in Spain, Civio, was founded in 2012 and focuses on data-powered watchdog reporting on the environment, healthcare, and the justice system.)

Demócrata has a team of seven. It’s funded by an initial investment from its board of directors and from advertising, though Velasco wants to expand into sponsorships, paid events, and subscriptions. The site has multiple sections: Agenda (an archive of the weekly newsletter that summarizes what’s happening in Parliament in the coming week), Actualidad (updates and play-by-play of laws and amendments), Políticas (news on proposed and ongoing policies), Quieren Influir (economy stories), and an analysis and opinion section. The site’s initial target audience is political insiders and politics junkies, but Velasco said the stories are written so that general audiences will be able to understand them as well. The Agenda newsletter has around 2,000 subscribers.

Demócrata’s goal is to use its data expertise to tell stories that other outlets can’t. Leading up to the outlet’s launch, the data team spent months building the software it uses to scrape and analyze data that, while technically public, is disorganized and difficult to parse. When the country’s far-right party, Vox, called for a vote of no confidence against the current ruling socialist party this past March, Demócrata published an analysis of Vox’s legislative footprint in the current parliamentary session, finding that the party has so far failed to pass any laws.

Velasco, who was an investigative reporter for Spain’s largest radio network Cadena SER, where she investigated political corruption cases, experienced first-hand the challenges of telling data stories for radio, where it can be difficult to delve into numbers. As a 2018 Yale World Fellow and one of the co-founders of Spain’s Investigative Journalists Association, she also saw American sites like Politico cultivated audiences for in-depth political reporting. When Demócrata founder David Córdova (who is also the director of a public affairs consulting firm, Vinces) approached her for the project, she saw it as a chance to experiment and try something new. (Demócrata is editorially independent from Vinces.)

“The mission is permanent scrutiny of institutions,” Velasco said. “Through continuous supervision of the work of politicians and legislators, information transparency, we believe, can strengthen institutional credibility. [The news] that comes to us from Parliament is often the political discussion, statements, politicians fighting with each other, and press conferences. But the legislative branch is a pillar of the State where many things happen that regulate life in society. It is what orders us and regulates us. And all of that wasn’t being covered in Spain with the specialization it deserves.”

One of Velasco’s goals in the next few months is to continue the work on a platform, already in progress, that will monitor updates to every piece of legislation in Parliament in real time. Down the line, she hopes to launch a chatbot that can answer reader questions. Demócrata has also partnered with Political Watch (a group of academics who monitor Parliament), design studio Flat26, and the think tank Ethosfera, which is helping Demócrata with its own ethics and transparency policies.

“We sort of feel like a hub for people who already had innovative ideas about parliamentary information,” Velasco said. “We get a lot of pitches for [collaborations]. When that you’re a small outlet, to grow you have to put springboards in places to get to the next level, and you can’t get there on your own.”

Image generated using Midjourney.

How UC Berkeley computer science students helped build a database of police misconduct in California

Hanaa' Tameez — Wed, 02 Feb 2022 14:16:02 +0000

In 2018, California passed the “Right to Know Act,” unsealing three types of internal law enforcement documents: use of force records, sexual assault records, and official dishonesty records.

Before the passage of SB1421, California had some of the strictest laws in the United States to shield police officers’ privacy, according to Capital Public Radio, and police misconduct records were deemed “off-limits”.

Six news outlets — Bay Area News Group, Capital Public Radio, the Investigative Reporting Program at the University of California, Berkeley, KPCC/LAist, KQED, and the Los Angeles Times — got together to request those documents, forming the California Reporting Project. Now, 40 news outlets are part of the initiative.

They sent public records requests to more than 700 agencies across the state, from police departments and sheriffs’ offices to prisons, schools, and welfare agencies that have police presence on site. if you’ve ever submitted a records request to a government agency, you know it’s not easy or straightforward to extract information from documents, if you can even get them at all.

But to sort through the more than 100,000 records they’ve gotten back since 2018, Lisa Pickoff-White, KQED’s only data reporter and the data lead on the California Reporting Project, enlisted the help of data science students from UC Berkeley to help organize the data.

The Data Science Discovery Program was founded in 2015 and is part of Berkeley’s Division of Computing, Data Science, and Society. Every semester, the program pairs around 200 students with companies and organizations that have data science–related projects they need help completing. Students spend six to 12 hours a week working on their assignments, for which they receive course credit.

The students have worked with media companies on editorial and operational projects, including the San Francisco Chronicle’s air quality map and the Wall Street Journal’s effort to analyze its source and topic diversity using natural processing language. When newsrooms, especially local ones, are strapped for engineering resources, the Berkeley students fill a gap to help journalists complete more ambitious projects.

“It’s a really natural fit. [We want] students to get a deep understanding of the context of the data analysis that they’re doing, and to consider human context and the implications of the insights and conclusions they’re making,” Data Science Discovery program manager Arlo Malmberg said. “All the things we emphasize in the data science program are at the core of what journalists do as well, in bringing forward the context of a problem in a story for readers, and in providing analysis of the causes of those issues.”

Pickoff-White co-selected four students to work with the California Reporting Project to build a police misconduct database from the records received. They all had particular interests in policing because of various connections in their personal lives. Usually in their data science courses, she said, they work individually on assignments and applications, but they were excited to work as a team on something tangible.

“The purpose of the project really resonated with me,” Pruthvi Innamuri, a sophomore computer science major who worked on the project, said. “During 2020, with a lot of police misconduct happening, I noticed a lot of communities feeling severely hurt and oppressed. I wanted to be able to use my computer science background to work on a project that’s able to better inform people in some way regarding this issue.”

Innamuri and his classmates built programs to recognize basic information from the police records, like names, locations, and case numbers. That made it easier to group files together and organize data for the journalists to analyze.

Some of the stories that have come out of the data from the records include a Mercury News story about how Richmond has more police dog bites than other cities and how Bakersfield police officers broke 45 bones in 31 people in the span of four years. The database isn’t complete yet and the students’ work helps make future data collection easier.

“I don’t know if we’d be able to do this without them,” Pickoff-White said. “None of these newsrooms would be able to automate this work on their own.”

Photo by Lagos Techie.

No explaining allowed! A new journal promises just-the-facts description, not theory or causality

Joshua Benton — Mon, 26 Apr 2021 16:45:49 +0000

A major trend in digital journalism over the past decade has been the rise of the explainer: the let’s-step-back article or infographic-packed video that takes a big issue in the headlines and, well, tries to explain it. Vox built an entire editorial model around it.

But on the flip side, a very common complaint about the media (particularly from those on the political right) is that reporters spend too much time decoding intentions, describing trends, and deriving meaning — and not enough on reporting. just. the. facts.

There’s a similar debate in academia. How much should researchers invest in answering what versus why and how? Will your work be better if it investigates a hypothesis that might explain a phenomenon? Or would it be more useful to make your goal simply to describe that phenomenon?

In the field of media research, those on Team Describe got a valuable new ally today: a new publication called the Journal of Quantitative Description: Digital Media. Its co-founders are Princeton’s Andy Guess, the University of Zurich’s Eszter Hargittai, and Penn State’s Kevin Munger, all of whom work on issues in and around journalism. Here’s their explanation of their no-explaining model:

We would not be undertaking this endeavor if we thought our journal would simply add to the accumulating stock of existing scholarly venues, mirroring its structure and pathologies through some inescapable process of institutional isomorphism. On the contrary, our hope is that this intervention into the social science journal publishing space pushes the boundaries of the feasible along multiple dimensions methodological, disciplinary, and financial.

We are here to address some of the failures in the existing structure of publishing outlets, particularly those that cater to quantitative social science researchers. Such failures are many:

1. Trending away from “mere” description. There are macro trends in social science that affect all journals. Many of these trends are good; we applaud the growing attention to causality, for example, and to concerns about generalizability that drive attention to sample composition. But as we describe below, these trends come at a cost to quantitative work that can provide a descriptive foundation for research agendas.

2. Lack of clear standards for substantive importance. The topics that are deemed important too often reflect path dependence, the biases of established scholars and institutions, approved theoretical frameworks from the dominant canon, and the focus of media interest. The whiplash of the past few years of digital media research, the attention paid first to “echo chambers,” then to “fake news,” now to “radicalization,’ is inimical to the accumulation of knowledge. All of these topics are worth studying, but we need a more stable metric for “topical importance” than media attention.

3. Adherence to disciplinary and geographic boundaries. Most peer journals are explicitly connected to a single discipline, and all of them are overly concerned with the United States and Western Europe. The topic of digital media is of obvious importance to the entire world.

4. Artificial constraints. Most journals have strict requirements for the length and format of what they publish, making it difficult to find outlets for important contributions of modest scope or idiosyncratic topic. (How many of us have written 8,000-word papers around one interesting finding, or have shelved neat findings because we did not feel like writing an 8,000-word paper around them?)

5. Inefficiencies of peer review. Most will agree that the current mode of journal reviewing is suboptimal. Too many authors wait months only to be told that their submission has been desk rejected; at the same time, too many scholars receive an endless stream of reviewing requests.

Their new journal is meant to address these issues. It has no preset limits on length; it sets its field as “digital media, broadly construed” rather than one of the many disciplinary niches within it; its acquisition process aims to reduce the number of papers that go out for peer review (and increase the share of them that get published). And it’s only interested in “quantitative description…a mode of social-scientific inquiry [that] can be applied to any substantive domain.”

Also, it’s all open access, and it doesn’t require a publishing fee (at least not now).

JQD:DM is interesting both as a concept and as a container for interesting work. The first issue, out today, is probably packed with more papers I’d be interested in reading than an academic journal has had for a long time. A few of the highlights:

Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data, by Andy Guess, Kevin Aslett, Joshua Tucker, Richard Bonneau, and Jonathan Nagler

In this study, we analyze for the first time newly available engagement data covering millions of web links shared on Facebook to describe how and by which categories of U.S. users different types of news are seen and shared on the platform. We focus on articles from low-credibility news publishers, credible news sources, purveyors of clickbait, and news specifically about politics, which we identify through a combination of curated lists and supervised classifiers.

Our results support recent findings that more fake news is shared by older users and conservatives and that both viewing and sharing patterns suggest a preference for ideologically congenial misinformation. We also find that fake news articles related to politics are more popular among older Americans than other types, while the youngest users share relatively more articles with clickbait headlines.

Across the platform, however, articles from credible news sources are shared over 5 times more often and viewed over 7 times more often than articles from low-credibility sources. These findings offer important context for researchers studying the spread and consumption of information — including misinformation — on social media.

Value for Correction: Documenting Perceptions about Peer Correction of Misinformation on Social Media in the Context of COVID-19, by Leticia Bode and Emily K. Vraga

Although correction is often suggested as a tool against misinformation, and empirical research suggests it can be an effective one, we know little about how people perceive the act of correcting people on social media.

This study measures such perceptions in the context of the onset of the COVID-19 pandemic in 2020, introducing the concept of value for correction. We find that value for correction on social media is relatively strong and widespread, with no differences by partisanship or gender. Neither those who engage in correction themselves nor those witnessing the correction of others have higher value for correction.

Witnessing correction, on the other hand, is associated with lower concerns about negative consequences of correction, whereas engaging in correction is not.

An Analysis of the Partnership between Retailers and Low-credibility News Publishers, by Lia Bozarth and Ceren Budak

In this paper, we provide a large-scale analysis of the display ad ecosystem that supports low-credibility and traditional news sites, with a particular focus on the relationship between retailers and news producers. We study this relationship from both the retailer and news producer perspectives.

First, focusing on the retailers, our work reveals high-profile retailers that are frequently advertised on low-credibility news sites, including those that are more likely to be advertised on low-credibility news sites than traditional news sites. Additionally, despite high-profile retailers having more resources and incentive to dissociate with low-credibility news publishers, we surprisingly do not observe a strong relationship between retailer popularity and advertising intensity on low-credibility news sites. We also do not observe a significant difference across different market sectors.

Second, turning to the publishers, we characterize how different retailers are contributing to the ad revenue stream of low-credibility news sites. We observe that retailers who are among the top-10K websites on the Internet account for a quarter of all ad traffic on low-credibility news sites.

Nevertheless, we show that low-credibility news sites are already becoming less reliant on popular retailers over time, highlighting the dynamic nature of the low-credibility news ad ecosystem.

Generous Attitudes and Online Participation, by Floor Fiers, Aaron Shaw, and Eszter Hargittai

Some of the most popular websites depend on user-generated content produced and aggregated by unpaid volunteers. Contributing in such ways constitutes a type of generous behavior, as it costs time and energy while benefiting others.

This study examines the relationship between contributions to a variety of online information resources and an experimental measure of generosity, the dictator game. Results suggest that contributors to any type of online content tend to donate more in the dictator game than those who do not contribute at all.

When disaggregating by type of contribution, we find that those who write reviews, upload public videos, write or answer questions, and contribute to encyclopedic collections online are more generous in the dictator game than their non-contributing counterparts. These findings suggest that generous attitudes help to explain variation in contributions to review, question-and-answer, video, and encyclopedic websites.

Characterizing Online Media on COVID-19 during the Early Months of the Pandemic, by Henry Dambanemuya, Haomin Lin, and Ágnes Horvát

The 2019 coronavirus disease had wide-ranging effects on public health throughout the world. Vital in managing its spread was effective communication about public health guidelines such as social distancing and sheltering in place. Our study provides a descriptive analysis of online information sharing about coronavirus-related topics in 5.2 million English-language news articles, blog posts, and discussion forum entries shared in 197 countries during the early months of the pandemic.

We illustrate potential approaches to analyze the data while emphasizing how often-overlooked dimensions of the online media environment play a crucial role in the observed information-sharing patterns. In particular, we show how the following three dimensions matter: (1) online media posts’ geographic location in relation to local exposure to the virus; (2) the platforms and types of media chosen for discussing various topics; and (3) temporal variations in information-sharing patterns.

Our descriptive analyses of the multimedia data suggest that studies that overlook these crucial aspects of online media may arrive at misleading conclusions about the observed information-sharing patterns. This could impact the success of potential communication strategies devised based on data from online media. Our work has broad implications for the study and design of computational approaches for characterizing large-scale information dissemination during pandemics and beyond.

Information Seeking Patterns and COVID-19 in the United States by Bianca Reisdorf, Grant Blank, Johannes Bauer, Shelia Cotten, Craig Robertson, and Megan Knittel

In this paper, we describe how socioeconomic background and political leaning are related to how U.S. residents look for information on COVID-19.

Using representative survey data from 2,280 U.S. internet users, collected in fall 2020, we examine how factors, such as age, gender, race, income, education, political leaning, and internet skills are related to how many different types of sources and what types of sources respondents use to find information on COVID-19. Moreover, we describe how many checking actions individuals use to verify information, and how all of these factors are related to knowledge about COVID-19.

Results show that men, those with higher education, higher incomes, and higher self-perceived internet ability, and those who are younger used more types of information sources. Similar patterns emerged for checking actions.

When we examined different types of sources (mainstream media, conservative sources, medical sources, and TV sources), three patterns emerged: 1) respondents who have more resources used more types of sources; 2) demographic factors made less difference for conservative media consumers; and 3) conservative media were the only type of source used less by younger age groups than older age groups.

Finally, availability of resources and types of information sources were related to differences in factual knowledge. Respondents who had fewer resources, those who used conservative news media, and those who engaged in more checking actions got fewer answers right. This difference could lead to information divides and associated knowledge gaps in the United States regarding the coronavirus pandemic.

All interesting stuff, and there’s more of it.

But it’s also a fun thought experiment to consider what the approach of JQD:DM would look like if it was being used in the world of journalism rather than academia. I’ve always believed that people who want “just the facts” from news outlets wouldn’t actually like it if media companies moved in that direction. (Wanting “just the facts” is often just a cultural signal for conservatism. Trump supporters were far more likely to say they want “just the facts” than Clinton supporters in 2016; a strict allegiance to fact-based reality was not a hallmark of the Trump administration.)

On the other hand, I know there are a thousand things I’d love to write about — that I think would be interesting information that would make the world an ever-so-slightly better place — but which don’t have a particular analytical hook attached to them. “This is some really interesting data I discovered/gathered/generated” is more likely to lead to posting a dataset on GitHub than writing a story for a news site.

I don’t think abandoning analysis and explanation makes any sense for news organizations — especially at a time when subscriber-based business models make the delivery of benefits/service more key to revenue streams than it was decades ago. But I do wish there were more spaces for “quantitative description” in journalism.

I think of Jeremy Singer-Vine’s email newsletter Data is Plural, which highlights “useful/curious” datasets. I think of The Pudding, which “explains ideas debated in culture with visual essays,” but does just as much to be a platform for compelling quantitative data. And I think of some of The New York Times’ best interactives, like its ridiculously popular 2013 dialect map, which are more like UIs for datasets than “stories” or “explainers.” People like this stuff! Let’s do more of it.

Photo by Mika Baumeister.

How Eviction Lab is helping journalists cover a spiraling housing crisis

Sarah Scire — Thu, 10 Sep 2020 16:40:09 +0000

No single, authoritative source exists to track evictions in the United States. It can be confusing, even during the best of times, to know how many evictions are happening at any given moment.

It is not the best of times. The pandemic and a patchwork of emergency orders have made the numbers and laws around eviction even harder to keep straight. That’s where Eviction Lab, part of Princeton University, hopes to help.

The Eviction Lab maintains a national eviction database and makes the 83 million eviction records they’ve collected available for analysis or merging with other data sources. Their maps and reports are free for publications to customize and embed. As Covid-19 has worsened the housing crisis, researchers have also started live tracking evictions in 17 cities and explaining what various moratoriums, guidelines, and orders actually mean for the nation’s renters.

The imperative to find ways to communicate with the public and share resources with journalists comes from the very top.

Eviction Lab’s founder and director, Matt Desmond, is a contributing writer for The New York Times Magazine and his book Evicted: Poverty and Profit in the American City appeared on bestseller lists before winning the 2017 Pulitzer Prize for nonfiction — unusual for a fieldwork-based book written by an academic. Desmond, a sociologist by training, captured mainstream attention with vivid writing and a richly detailed chronicle of how evictions often function as a cause, not just a result, of poverty.

As Desmond told NPR this July, the housing crisis is not new:

Every year in America, 3.7 million evictions are filed. That’s about seven evictions filed every minute. That number far exceeds the number of foreclosure starts at the height of the foreclosure crisis. So before the pandemic, the majority of renters below the poverty line were already spending half of their income on housing costs or more. And 1 in 4 of those families were spending over 70% of their income just on rent and utilities.

When you’re spending 70, 80% of your income on rent and the lights, you don’t need to have a big emergency wash over your life to get evicted. Something very small can do it.

Or something very large — like a pandemic.

Since Covid-19 took hold in the U.S., millions of Americans have lost their jobs and many millions more find themselves grappling with insufficient child care and unexpected health care costs. Although Eviction Lab won’t make predictions about the exact number of Americans facing eviction during the spiraling crisis, Desmond has written it’s in the millions. (Emily Benfer, a law professor and Eviction Lab collaborator, estimated 28 million in July and a more recent CNN report put the number around 40 million.)

One of the most important pieces of journalism all year — @KyungLahCNN with the stories of those being evicted and losing their livelihood every day due to the economic stresses of Covid-19.

America is in crisis. Don’t look away. pic.twitter.com/IuBsW5sMm1

— Josh Campbell (@joshscampbell) September 3, 2020

Faculty assistant Anne Kat Alexander said Eviction Lab, in partnership with Benfer, built the Covid Housing Police Scorecard to help reporters parse eviction moratoriums and compare the state-level protections in place for renters.

“The scorecard was filling a pretty media-centric need that we’d seen as various eviction moratoriums were coming out across the country. We sit and look at eviction policy all day — and have for years — so we have an extra level of expertise in interpreting them,” Alexander said.

The scorecard unpacks the policies in each state and ranks them. (If you’re wondering, Massachusetts tops the list with 4.15 stars. Eight states, including Texas and Tennessee, share the bottom spot with exactly zero stars between them.)

Alieza Durana, a policy journalist working at the lab as a “narrative change liaison” through a Chan Zuckerberg Initiative grant, said she splits her time between walking reporters and researchers through resources, making connections to sources or community organizations, and commissioning journalistic work.

“Given that housing insecurity — and particularly the perspective of someone experiencing housing insecurity — has been so underreported in media, part of our grant money from the Chan Zuckerberg Initiative goes toward commissioning those works,” Durana said. The works, so far, include “docu-poetry” and reported pieces, and there’s a documentary on the history of eviction on the way.

Eviction Lab acknowledges its database of evictions, though the most comprehensible available, remains incomplete. The federal government and most state governments don’t track evictions, leaving records to be gathered county court by county court. A number of states, because of inconsistent digitization practices and varying privacy and public records laws, almost certainly underreport evictions. And Eviction Lab is only attempting to count the legal evictions. When a landlord changes the locks, turns off utilities, harasses a tenant, or uses other illegal methods to force tenants out, the eviction goes unrecorded.

The Eviction lab media guide says there are “countless untold stories in these data.” Where could you start?

The live tracker is useful for reporters covering one of the 17 cities that it tracks. There’s also data on evictions initiated during the pandemic for relatively small amounts of money — leaving families homeless over a few hundred dollars. Journalists can get the big picture using the database, which includes eviction records from 2000 to 2016, or start collecting their own records for a more updated look.

Durana recommended looking for recurring names on recently filed evictions. “We know there’s often one landlord or a handful of landlords driving a lot of the eviction filings in a place,” she said. “Whether you’re in Milwaukee or Houston or Richmond, starting with PACER or court records is a good way to zero in on what that story looks like.”

Durana noted that some populations are disproportionately affected by evictions due to discrimination in the housing market and policies. Eviction Lab’s maps can be used to investigate where evictions have clustered.

“We know that Black communities, communities of color, families with children, and folks experiencing domestic violence all face high rates of eviction,” Durana said. “It’s something to keep in mind as you report these kinds of stories out.”

For additional information, Eviction Lab recommends connecting with legal aid societies (as New Hampshire Public Radio did for this piece) and organizations (some are listed on Eviction Lab’s sister site, Just Shelter) in your community.

“Big tech is watching you. Who’s watching big tech?” The Markup is finally ready for liftoff

Sarah Scire — Tue, 25 Feb 2020 19:22:07 +0000

Five weeks. That’s how long it took The Markup — the new digitally savvy investigative publication, focused on tech accountability, that launched today — to find someone who could send out its email newsletters without violating its privacy standards.

The Markup tested and rejected eight different email providers — including the industry’s 800-pound gorilla, Mailchimp — before finally turning to Revue, a small Dutch company that agreed to custom-build a newsletter with no user-tracking features. (No one had requested the option before, apparently.)

The process ended up being longer — and more expensive — than the outlet’s founders anticipated. But then again, not much about the road to The Markup’s long-awaited launch this week has been easy.

Originally slated to launch in early 2019, The Markup was dreamed up by a journalist-programmer pair — Julia Angwin and Jeff Larson — who brought on a third cofounder, Sue Gardner, who had previously led the Wikimedia Foundation. Anticipation grew for the nonprofit news organization that promised to build on the prize-winning investigations that Angwin and Larson had worked on together at ProPublica.

Just months from launch, however, chaos: Gardner and Larson forced out Angwin, the project’s most prominent public face, a move that prompted The Markup’s editorial team to resign en masse in protest. Craig Newmark, the Craigslist founder who had contributed most of the $23 million raised for The Markup’s launch, promised to look into the firing. What followed was an array of accusations and counteraccusations about management styles, techlash, and spreadsheets.

The editorial team of @team_markup has signed a statement of unequivocal support for our Editor in Chief, @JuliaAngwin: pic.twitter.com/aTRsmM6oeo

— The Real Team Markup (@MarkupReal) April 23, 2019

A few months later, the machine got rebooted, this time without Gardner and Larson. Angwin and her editorial team were reinstated and Nabiha Syed, former general counsel at BuzzFeed, was announced as president. (Angwin and Syed will both report to an independent board of directors.)

We are happy to announce that we are back at our desks and hard at work. Big thanks to everyone for your support. Please follow us all over at @team_markup to see what’s next: https://t.co/5Vv0OaMZce

— The Real Team Markup (@MarkupReal) August 6, 2019

The leadership dustup may have been a unique gift for a team-based newsroom that will pair data scientists and journalists together on collaborative investigations. The editorial team rallied behind the publication’s mission of impartial, investigative journalism — not anti-tech advocacy, as some accused Gardner of pushing for — and ended up spending a lot of time working out of Angwin’s living room as governance issues were sorted out.

“I never would have wished for those events to happen, but it was actually an excellent team-building exercise,” Angwin said. “It wasn’t about me and my vision, it was about us and our vision. We really pulled together and I think, even today, that we all feel a sense of ownership in it.”

That support made all the difference during these stressful past few months when the team was working (ok, and doing a lot of eating) at my kitchen table. We are back at the office now and so thrilled to be back to work. THANK YOU pic.twitter.com/EPd6i5pXxG

— Julia Angwin (@JuliaAngwin) August 6, 2019

“Like I said, I would not have chosen it,” Angwin added, laughing. “But I bet you a lot of companies would pay a lot of money for that amount of solidarity to be created amongst their crew.”

On opening day, The Markup crew led with an investigative feature on a previously undisclosed algorithm used by the insurance giant Allstate, two “Ask The Markup” features about DNA testing kits and online shopping, and letters from Angwin and Syed. Each article provides embed code to republish it via Creative Commons license.

That Allstate story is copublished with Consumer Reports. It’s in some ways the Platonic form of a Markup story: A large corporation (Allstate) makes important decisions (how much you pay for car insurance) using an algorithm that’s seemingly inscrutable to the people affected by it. The Markup — using its “unparalleled roster of quantitative journalists…committed to finding the true meaning in large amounts of data” — figures out that the algorithm isn’t making decisions the way it’s supposed to. That finding gets rendered confidently in prose:

…we found that, despite the purported complexity of Allstate’s price-adjustment algorithm, it was actually simple: It resulted in a suckers list of Maryland customers who were big spenders and would squeeze more money out of them than others.

…as well as in a deep technical explanation that goes far beyond what most data journalism offers: more than 5,000 words, 21 footnotes, three scatterplots, one decision tree, 10 other charts, and a “Confusion Matrix.”

The Allstate story also features an unusually detailed set of credits at the bottom of the story — an indication of the disparate sets of skills needed to do the sort of algorithm inquisition that The Markup aims for.

The site’s digital Page 1 also prominently promotes a tips page that encourages readers to share information, with a list of options also shows The Markup’s range: regular old email, encrypted messaging apps Signal and WhatsApp, snail mail (“the Post Office cannot open your mail without a search warrant”), or SecureDrop over Tor. Oh, and a signup form for that custom-built, privacy-first email newsletter.

A newsroom apart

Most of The Markup’s 19-person newsroom is based in New York. There are two staffers working from California, but they’re based in Los Angeles, not San Francisco or Silicon Valley. The distance is intentional, and it isn’t measured only in miles: While other tech publications sometimes face criticism for getting too close to the world they’re covering, The Markup is doing its own thing.

“We deliberately decided not to have a Silicon Valley bureau,” said Angwin. “We’re an investigative outlet, doing deep investigations and explanatory work that asks, ‘What does this all mean?’ And honestly, I feel like you get a better perspective for that type of work when you’re a little more removed from the industry.”

There are plenty of journalists in Silicon Valley reporting on “open secrets at industry parties” etc. We are focused on the impact of technology on society .. which means we are more likely to be reporting in Detroit or Houston or partnering with journalists in Myanmar.

— Julia Angwin (@JuliaAngwin) August 26, 2019

The site also plans to distinguish itself from tech-focused peers through what Angwin and Syed have dubbed “The Markup Method,” à la the scientific method.

It’s a three-step process:

Build. We ask questions and collect or build the datasets we need to test our hypotheses.

Bulletproof. We bulletproof our stories through a rigorous review process, inviting external experts and even the subjects of investigations to challenge our findings.

Show our work. We share our research methods by publishing our datasets and our code. And we explain our approach in detailed methodological write-ups.

Angwin, a Palo Alto native who studied math at the University of Chicago and pursued computer programming before she found journalism, said The Markup is inspired by that scientific approach of amassing evidence, sharing methods, and providing data for other researchers to challenge or build on their work. Building data sets themselves allows The Markup to do more than just “opportunistic” data journalism that makes “a pretty visual” out of existing data sets, Angwin said.

Using public records requests, web scraping, and other resource-intensive tools, The Markup hopes to tackle projects that other outlets don’t have the time or energy to complete. Don’t expect to read about product launches or earnings calls or today’s cybersecurity vulnerability — but the site does want to confront many-tentacled questions like “Is tech too big?” and “What impact are algorithms having on society?”

“We have some of the best data journalists and the most investigative firepower out there,” Angwin said. “We want to make sure we’re using it on the right problems.”

As part of “bulletproofing” their work, The Markup will present the targets of their investigations — typically technology companies and government agencies — with the data they’ve collected and the code they’ve used to analyze it before publication. “We offer them an opportunity to challenge our findings,” Angwin said. “Because the truth is that they have the most incentive to show us where we are wrong, and we want to find the flaws in our work.” (For today’s story, “Allstate declined to answer any of our detailed questions and did not raise any specific issues with our statistical analysis, which we provided to the company in November, including the code used to calculate our findings.”)

That “show your work” ethos will allow readers — whether academics, policymakers, or the general public — to see the data, code used to analyze it, methodological information, and explanatory videos to help them dig into things themselves, even if they don’t have experience in programming or data science. Angwin said she hopes other journalists, in particular, will be able to use the work to further the story and write about how the technology is affecting their own communities.

It’s not always advisable (or legal) to share original data and documents, however anonymized, and not every story will be accompanied by original data and documents. (Angwin mentioned Reality Winner, the NSA contractor currently serving a five-year sentence after the F.B.I. used clues on a printout shared by The Intercept to identify her as a reporter’s source.) But The Markup hopes to publish as much as it can; here’s the Allstate story’s data and code on GitHub.

Doing things differently

The Markup’s privacy policy is 2,771 words long, and each one of them is the result of considered thought. It promises readers they won’t be exposed to third-party tracking and that the site will not exploit or sell any of their information. Keeping this promise proved more challenging than the founder originally anticipated (as in its newsletter quest), but The Markup hopes that readers will appreciate the effort and support the nonprofit.

“If you feel grateful for the work and that we’re honoring your privacy, we hope that will encourage you to donate,” Angwin said. (Those millions from Newmark and other funders should help, too.)

Syed’s letter acknowledges the tradeoff. The Markup will have to do reader engagement and measure impact within the self-imposed limitations, which also preclude advertising, since digital ads almost always require reader-tracking elements.

Because we don’t track you, we won’t know if you like our work. We don’t know if you open our newsletter or if your cursor lingered over a particular story. We don’t have the metrics that let us approximate whether a story changed your worldview or, better yet, gave you the tools to change your world. All we have is…you. And so we will have to do things the analog way: We want to build a connection with you directly.

(As its donate page puts it: “We prefer doing things the hard way.”)

That kind of engagement approach means going back to the future. She says part of her efforts include reaching out to marketing professors and asking what audience engagement looked like in, say, 1985.

“I’ve heard that we are tying our hands behind our backs, but there must be a way to engage an audience without subjecting readers to a surveillance ecosystem,” Syed said. “The privacy policy creates tension with another trend in journalism, which is audience engagement, but I think it’s a fascinating opportunity to put our money where our mouth is and build the world we want to see.” Instead of user-tracking features, The Markup will rely on social media, direct feedback on the tools they build, event attendance, and participation in upcoming educational seminars.

Angwin said The Markup is also working on building privacy-protecting analytics tools, browser extensions, and custom forms that can structure the information readers and tipsters want to send along. (Structured data is a lot easier to analyze than responses to a callout for tips on Twitter.)

Its marketing is also distinctly analog. A striking street-art-style poster — by radical feminist film and digital studio Mala Forever — reading “Big Tech Is Watching You. Who’s Watching Big Tech? The Markup” went up last week in San Francisco. On Tuesday, New Yorkers will start seeing The Markup advertised on the subway. The publication also bought a billboard along a highway into Silicon Valley and will play videos on buildings in San Francisco this week.

“We want people to think about tech in the physical world, as they walk and live and breathe,” Syed said. The poster, she hears, is quite close to a bus stop for Google employees.

The Sigma Awards are a new successor to the Data Journalism Awards

Hanaa' Tameez — Tue, 07 Jan 2020 13:12:01 +0000

The Sigma Awards is a new international competition marking the most outstanding examples of data journalism. It will help fill the void left behind by the Data Journalism Awards, which since 2012 had been run by the now-defunct Global Editors Network.

“A bunch of us were very sad the Data Journalism Awards died, so we decided to do something about it — and try to do something a bit different,” Aron Pilhofer, the James B. Steele Chair in Journalism Innovation at Temple University and one of the co-creators of the Sigma Awards, tweeted on January 2.

The first round of the Sigma Awards will have six categories and award nine winners with an all-expenses-paid trip to the International Journalism Festival in Perugia, Italy.

The international data journalism awards are BACK as the @sigmaawards with simpler categories. Applications open for the next month at https://t.co/AB1lQb7s30 . Its winners convene at the @journalismfest in Perugia! Thanks to @reginaldchua and @pilhofer for continuing it.

— Sarah Cohen (@sarahcnyt) January 2, 2020

The DJAs were the first international ceremony to celebrate and support data-driven storytelling. In December, we wrote about the options that Pilhofer and Reginald Chua of Reuters, who were both jury members for the DJAs, were exploring to revive the awards. In 2020, the Google News Initiative will sponsor the Sigma Awards and the European Journalism Centre will host it on Datajournalism.com.

Last November, GEN announced it would fold after nine years due to a lack of sustainable funding. It had established the DJAs in 2012 and received over 600 award submissions from more than 80 countries in 2019.

Read more about the Sigma Awards here; applications for the first round of awards are due February 3.

Looking for the future of data journalism awards? Here are a few communities coming together after GEN’s closure

Christine Schmidt — Wed, 04 Dec 2019 20:15:31 +0000

If you’re seeking a community around data journalism, fear not: Several are bubbling up in the wake of the Global Editors Network’s closure, which was announced a month ago. GEN had maintained the Data Journalism Awards ceremony and Slack for the past several years.

This year, the DJAs brought in 607 project contenders (a third of them from Asia) from 62 countries, highlighting work like the investigation of the year, Hurricane Maria’s Dead from the AP, Center for Investigative Journalism and Quartz; the best data journalism team portfolio for a large organization, Argentina’s La Nacion; the best portfolio for a small newsroom, India’s Factchecker.in; and more. Unfortunately, the Data Journalism Awards, along with other assets of GEN, are currently in the liquidation process, with folks expected to bid on taking control of some; DJA project manager Marianne Bouchart is working on HEI-DA, a nonprofit promoting data journalism innovation.

In the meantime, jury members Reginald Chua of Reuters (most recently jury chair) and Aron Pilhofer of Temple University — both leaders in data journalism known for past work at The Wall Street Journal and The New York Times/The Guardian, respectively — are working with a few others to develop a new group and recognition process independent of the original awards. The Google News Initiative will be the first sponsor of the new initiative, according to Google data editor Simon Rogers, and Pilhofer and Chua plan to keep the project housed in a nonprofit.

“Our mission is to identify and honor the best data journalism around the globe…to use the awards as a center of gravity for the data journalism community around the globe to build connections, and to elevate and empower others who could learn from the kind of work being done,” Pilhofer said.

“The awards have surfaced really small newsrooms around the world working under incredibly difficult circumstances — it’s not just for the state-of-the-art data ninjas but really how people in small newsrooms with limited resources can do something else,” Chua said. And expect the categories of the as-yet-unnamed project to be in flux: “People are inventing new things every year, coming out with new methods, new presentations, new ways of telling stories,” he added. “If we don’t keep current you’re rewarding best horse and carriage in the motor show.”

I reached out to Bertrand Pecquerie, GEN’s founder and former CEO, for comment; it seems like they’re all on the same page that this new venture will not be part of GEN’s legacy but instead will have no relationship with the organization. “When a programme or a product is successful, platforms want to control it or to manage it and they just have to find third parties playing their game. As the news industry depends more and more on platforms’ money, it is not difficult to find such allies,” he said over an email. “Be sure that all my energy will be dedicated to save the DJA from unfriendly and toxic third parties or platforms. It will be the task of the liquidator of GEN to chose the best organization for managing the 2020 DJA competition.”

The new version will open award submissions later this month — “we want to make sure the world doesn’t skip a year doing this,” Chua said — but until then, he and Pilhofer would appreciate any suggestions for a name.

Every crime map needs context. This USC data journalism project aims to scale it

Christine Schmidt — Mon, 26 Aug 2019 13:56:23 +0000

A bunch of money — check. A bunch of local data — check. Brains galore — check. Future of local news? TBD.

When University of Southern California professor and Wall Street Journal alum Gabriel Kahn was given part of a grant made to USC’s journalism school, he sat down with a professor in the computer science department. He wanted to know what that department was seeing and learning and how it could be retooled with journalism — specifically, local news.

The CS folks had been contracted with LA’s traffic department for five years to analyze its data, and a giant load of data (for instance, all the signals from road directors that can be used to calculate traffic density and speed) had been stockpiled. The data just kept on coming, but could also be broken down into smaller sets for different areas and — the magic word! — localized. “Could we make these large datasets into hundreds of locally relevant datasets and then turn that into something like a local news feed?” Kahn wondered.

It was a big ask, but Kahn — along with computer science department chair Cyrus Shahabi — had big resources to work on it: The grant provides around $120,000 per year for five years, and the team now includes part-time data scientists to crunch the numbers, designers to visualize the trends, and journalists to put it all into context. This is Crosstown LA, a nonprofit news project based at USC, but it has the potential to expand to other regions.

Crosstown uses publicly available data on LA’s traffic, air quality, and crime to analyze trends and report local — really local — stories from that. Kahn compared it to the police commissioner sharing updates on the city’s yearly crime statistics. How are the numbers impacting each neighborhood? That’s the wedge Crosstown is trying to get at, though it’s still in the developmental phase (with a plan for future monetization). Some recent stories:

So far, LA crime in 2019 is dipping: “Sharp increases in housing prices in neighborhoods like West Adams have led to changes in both demographics and public safety. The neighborhood saw a 13 percent increase in crime during the first half of 2019. However, it is unclear whether this increase was due to actual higher crime or more vigilant crime monitoring by residents.”
Recent quakes trigger alarm for retrofitting: “Almost four years after the ordinance passed…only 18 percent of the buildings are earthquake ready. Nearly 9,700 building owners have filed at least the initial permits to begin work. About 2,200 have hired contractors and have projects underway. And 269 owners, according to Building and Safety, haven’t done anything at all.”
Suspect wore a hoodie: “In the City of Los Angeles, the hoodie appears in Los Angeles Police Department crime data to describe what suspect or suspects were wearing at the time of a reported crime: ‘Suspect wore hood/hoodie.’ The department has been cataloging this clothing descriptor for suspects since at least 2010, when it started making its data publicly available. But in the year following Martin’s death, the number of crimes reported with ‘Suspect wore hood/hoodie’ skyrocketed. In 2013, there were 1,243 reports, a 92 percent increase from 2012.”

Kahn hopes that as the team develops its system for analyzing and formulating the data (it’s automated at a minor level at this point, with, for instance, a slackbot that tracks reports of hate crimes), it can be scaled to more datasets and more locations. “When we do a story looking at 18 different commuting routes, we’ve created 18 different stories, in a way — we’ve created a story with 18 distinct audiences,” he said.

One wrinkle: The data isn’t always completely comparable; reporting differs across police departments, and certain types of theft might be broken down differently in different areas, for instance. And officers don’t always record the time of a crime, in which case it’s automatically filled in as midnight. How do you scale nuance? “We learned a lot of the idiosyncrasies of police data. You need to get close to the data and understand its flaws and limitations,” said Kahn. With practice, “we feel much more confident about what we can report on.”

So far, Crosstown has focused primarily on crime because data on it is widely available. But “we don’t want to be in the crime [news] business, we want to be in the data business,” Kahn said. (See Ring and its system for invoking fear into potential customers through journalism.) Stories on traffic and air quality are waiting in the wings.

In the case of the health and air quality stories, Crosstown consults with public health experts on how to responsibly interpret and share the data. Readers email Kahn for advice about the environmental safety of certain areas, he says, and the team wants to make sure people can draw the right conclusions from the work. It’s one reason they include a “How we did it” section at the end of each article explaining the data sources and interpretations:

KPCC and LAist have worked with Crosstown to publish reporting on traffic delays, and a Facebook Community Network grant of $25,000 will help Crosstown envision more ways to share its data and collaborate with more local news outlets in southern California. The team has also compiled its findings in one-pagers and brought them to neighborhood council meetings. In the future, Khan is thinking about developing hyperlocal newsletters and creating a membership program.

“If we can take regularly occurring data about crime, real estate, and public education and package it into a weekly newsletter,” he said, “that would be 110 different newsletters” across the city.

Screenshot of Crosstown’s safety map.

Is the business model for American national news “Trump plus rolling scandals”? And is that sustainable?

Lívia Vieira — Mon, 18 Mar 2019 09:30:05 +0000

Anderson: I think you have a real gap between the elite news organizations and everyone else. I have a Ph.D. student who worked at News Corp for a while in Australia. And to hear her tell it, they were utterly governed by clicks, completely governed by news metrics.

If you talk to people at The New York Times or The Guardian, they will tell you, ‘No, we aren’t like that at all — we use metrics as one of many other things and we certainly aren’t living in this culture of the click.” But many of the more local and more commercial news organizations are absolutely still more governed by reader metrics. I think of the average British tabloid: I would be very surprised if they did not still have largely clickbait view. That’s different than at elite news organizations — which tend to be the kind of news organizations that academics study.

The other thing I will say is: Journalists at these elite news organizations are trained to say the right things about metrics if you ask them. They’re trained to say very smart things. So you can interview someone and they can explain to you at length about, “Oh, well, we’re not governed by these things.” But I don’t think we have had enough ethnography of how this all works — apart from your own work, Caitlin Petre’s work, a few other people who have really done ethnographies of this stuff. And I think when you actually watch what journalists do, it may be different than what they say.

Vieira: You describe yourself as an ethnographer who studies the news. What does an ethnographer bring to studying journalism?

Anderson: I think the goal of ethnography is to understand how journalists understand their lives and their jobs, understand what is happening to them. So I’d make a difference between understanding what is happening to journalists and understanding what journalists think is happening to journalists.

For an ethnographer, the key thing is to always get what the person thinks is happening to them — what they think the Internet is doing, what they think technology is doing, what they think metrics are doing. It is in some ways, for an ethnographer, as important if not more important than what those things are actually doing in reality.

So to me, ethnography is to some degree always going to be concerned with what we might call the hermeneutical aspects of research — which is understanding how people make sense of the world. It’s not necessarily understanding the world, but it is understanding how people make sense of the world. Because ethnography allows you to spend a lot of time with people and allows you to watch what they do, not simply listen to what they say, it provides a unique access to the culture of a particular place. That’s the main value-added that ethnography brings.

Vieira: In Remaking the News: Essays on the Future of Journalism Scholarship in the Digital Age, you propose the idea of a genealogical ethnography.

Anderson: Yes, the idea was that — as somebody who studies things, spaces, places, and professions that have changed very quickly — I thought of my looking at my work 10 years later. Was it still relevant, was it still valid? So I thought: What if we combined ethnography and history — not history as in 100 years ago, but history as in 10 years ago? Because the way newsrooms were in 2009 is very different than the way they are in 2019.

The idea was: What if we combined a historical perspective with an ethnographic perspective? So we can watch how the path of the newsroom or the path of the journalism profession changes as it passes through time.

Vieira: Adding in that history is a huge challenge. How do you do it? Is it something you do during the interviews, the observations, or do you research in advance?

Anderson: I think it’s all those things. So one of the things that I did in the dissertation, in Rebuilding the News, I spent a lot of time on archive.org, the collection of past versions of websites. I could see how they looked in 2005 and go back. I spent a lot of time investigating it before I got there, so I could at least start to get a sense of how have these things changed over time.

The thing about digital technology is that what happened five years ago is already history. We need to be very conscious of that and always remember that our present orientation needs to be leavened to some degree by looking at the historical time period.

Vieira: Is that what you do in your new book, Apostles of Certainty: Data Journalism and the Politics of Doubt?

Anderson: That’s exactly what I try to do. In the new book, I really wanted to go all out on history. “Data journalism” is a thing that everyone is talking about, a really hot topic. The book ends with an ethnographic chapter, but everything before that is historical, in the sense that I was trying to understand: Did something like data journalism exist a hundred years ago? And if so, what was it like? How was it different or similar to now? How has the idea of what data journalism is or the culture of data journalism changed over time?

Vieira: And what did you discover?

Anderson: I discovered that data journalism now, in 2019, is more like it was in 1899 than it was in 1970.

So in some ways, certain aspects of data journalism are more like they were a hundred years ago that might they were 50 years ago. Because our understanding of data has changed and our understanding of what we mean by data has changed. The idea of “big data” has led to a lot of changes. So that was one of the main things I learned, which is that in some ways we’re kind of going back to the past to understand the present.

[From the book’s description: “In this book, C.W. Anderson traces the genealogy of data journalism and its material and technological underpinnings, arguing that the use of data in news reporting is inevitably intertwined with national politics, the evolution of computable databases, and the history of professional scientific fields. It is impossible to understand journalistic uses of data, Anderson argues, without understanding the oft-contentious relationship between social science and journalism. It is also impossible to disentangle empirical forms of public truth-telling without first understanding the remarkably persistent Progressive belief that the publication of empirically verifiable information will lead to a more just and prosperous world. Anderson considers various types of evidence (documents, interviews, informational graphics, surveys, databases, variables, and algorithms) and the ways these objects have been used through four different eras in American journalism (the Progressive Era, the interpretive journalism movement of the 1930s, the invention of so-called ‘precision journalism,’ and today’s computational journalistic moment) to pinpoint what counts as empirical knowledge in news reporting. Ultimately the book shows how the changes in these specifically journalistic understandings of evidence can help us think through the current ‘digital data moment’ in ways that go beyond simply journalism.” —Ed.]

Vieira: In 2012, you wrote, with Emily Bell and Clay Shirky, the report Post-Industrial Journalism: Adapting to the Present. Do you still agree with this concept of post-industrial journalism?

[The term “post-industrial journalism” was coined by Doc Searls to describe journalism that is “no longer organized around the norms of proximity to the machinery of production.” —Ed.]

Anderson: The short answer is yes, I do. The idea when we wrote the report was that post-industrial journalism is a very unsettled, chaotic state of affairs — unlike industrial journalism which was relatively stable, the ways to do it were relatively set. And I do think that post-industrial journalism will eventually be just like old journalism, which is that it will stabilize. We won’t be in a state of chaos forever.

Eventually new structures, new routines, new professional codes, new organizational practices will solidify. I don’t think there’s anything inherent to the Internet that means that we’re going to be living in a state of chaos forever.

That said, though, if you ask me a year ago what I thought the new model would be, I probably would have told you BuzzFeed or Vice. And they just have had tremendous difficulties. So maybe it will be chaos for longer than I thought, because it did seem to me two or three years ago that we were starting to see some stability.

Vieira: What about the membership model, like The Correspondent?

Anderson: Jay Rosen certainly done a tremendous job in bringing that project along and turning it into a really viable way to do journalism. It’s interesting because they haven’t actually produced any journalism yet. And Jay said this — he said “we are the most successful membership-based journalism website that has never actually produced an article.” They’ve had a lot of success so far, but I do think that eventually, they have to do the journalism — and, you know, doing journalism is tricky. To some degree, it’s easier to want to support something when you don’t know what it’s going to do. The tricky thing for The Correspondent will be their ability to keep those subscribers. Hopefully they will — we’ll see. So as to whether they are a model, I think it’s too soon to tell.

Vieira: America is a different market than the Netherlands [where The Correspondent started].

Anderson: Yes, absolutely — it’s a bigger market. The trouble in America, though, is that journalism traditionally in the U.S. has been very local, because America is so big and because of the federal nature of America as well. So journalism has been local, and there’s no business model for local in the U.S. — I mean there just doesn’t seem to be one. So the question in the U.S. is what’s going to happen to that local journalism? Is national journalism just Trump all the time, plus sort of the latest political scandal that blows up and becomes news for 48 hours?

So right now, the business model of news in the U.S. seems to be Trump plus rolling political scandals. Is that sustainable or will everyone just lose their minds? A lot of the content has similar rhythms, which is “one stupid thing that Trump said today,” “what is Bob Mueller doing.” That doesn’t mean it’s not important — but you do have to wonder how long we can keep up before people have a nervous breakdown.

Vieira: In Brazil, we’re facing the same thing now with our new president. Bolsonaro also posts on Twitter all the time and he doesn’t like to talk to the press. What can Brazil learn from American journalism regarding to this issue?

Anderson: I think it’s very hard for the press in the U.S. to know what to do when it becomes the target of a particular type of political attack. Trump has made the American press his enemy. I suspect the new Brazilian president will do the same, or has already done the same. The question is: How do you respond?

This is something that Jay Rosen has also talked about a lot. Do you respond by saying: “No, we’re not the enemy, we are just objective journalists doing our job” — which I think is the wrong choice? Or do you say: “To the degree that you, as the president, are against basic liberal democratic ideals, we are your enemy”? That’s a different way of saying we’re going to take sides. That’s different from saying that the press is going to support Democrats, or support liberals, or support the Worker’s Party.

What you can say, as the press, is: “We’re in favor of truth. We’re in favor of kindness. We’re in favor of reasonable conversation, the ability to disagree. We are against racism. We’re against dictatorships.” To me, that’s different than saying that “we’re in favor of this particular political party.” That’s saying that we’re on the side of certain ideals, and to the degree that we have a leadership that violates those ideals, then we are its enemy. I think that is something that the Brazilian press can learn from the U.S. press.

Vieira: I’ve seen you and other researchers on Twitter discussing academic careers, about the number of papers in journals that universities require and how is it possible to build a healthy and at the same time a productive career. Do you think we should publish less?

Anderson: That’s a great question. What I would say to an early career researcher is this.

In the end, the most important thing is that you have a big question. A big question that is going to take you a few decades to answer. If you have a big question, then it becomes less important whether you write a lot or a little — or if sometimes you blog, you tweet or you write books or academic papers — because it’s all geared towards answering the big question.

A lot of the time scholars don’t have a big question. They’re not trained to have a big question. They’re trained to have smaller questions. When you have smaller questions, then all you can do is publish. The way that you show that you’re valuable and that you’re worthy is by publishing a lot. If you have a big question, you will publish the right amount to be successful, no matter how much you publish. If you have a big question and you’re always trying to answer it in different ways, different formats, and different methods, you will always publish exactly the right amount. You won’t have to worry about meeting a quota. In the end, it’s all about the question, a big one that will take you a long time to figure out. I’ve had a big question since I started.

Vieira: And what is your big question?

Anderson: My big question is: How do we know what we know in order to operate as democratic citizens? And what do different types of professions tell us what we know and how do they tell us what we know in different ways? So that’s my big question. How do we know what we know — which is not to say “is there a reality” or “does reality exist,” but to say: How do different types of institutions and different people who need to operate in a liberal democratic way, how do they interact? And journalism is one of those institutions. But so is academia, and so are your neighbors, and your social networks. So I think that journalism is really important — but one thing about my research is that journalism has never been the only focus. And I think that’s a problem for journalism researchers: I think that sometimes journalism researchers care too much about journalism.

Image of Donald Trump talking to media in Mesa, Arizona, in December 2015 based on a photo by Gage Skidmore used under a Creative Commons license.

This Spanish data-driven news site thinks its work goes past publishing stories — to lobbying the government and writing laws

Christine Schmidt — Thu, 08 Nov 2018 16:21:21 +0000

If you spend dozens of hours learning about a subject — say, government procurements — for an article, you might want to find some way to use that knowledge beyond just hitting Publish on a story.

As a journalist, obviously, there are some hesitations; the debate on where the line falls between journalism and activism continues to simmer. But the activist-for-truth role of a journalist is part of the core of Spanish nonprofit news organization Civio, which believes that when it uncovers problems in government, part of its job is to lobby for specific solutions.

Eva Belmonte, Civio’s managing editor, has shown up at Spanish legislators’ offices with 100-page proposed amendments in tow. Some of her legislative language has made it into Spanish law. After reporting extensively on the procurement process, Belmonte became Civio’s main outreach to government officials.

CIvio doesn’t lobby on every issue it reports on. But if its reporting shows that the government is standing in the way of transparency or accountability, it’s not afraid to take a stand.

“You know so much of the problems you have implementing the law, what kind of information you need to try to avoid corruption or similar,” she said. “You feel all this knowledge would be useful for something, for trying to change something.”

Belmonte, an eight-veteran of the newspaper El Mundo, joined the then-two-year-old Civio in 2013. The group, founded by software developer David Cabo and entrepreneur Jacobo Elosua, modeled itself after the Sunlight Foundation and pushed for open data practices and tools to help citizens see the intricacies of public institutions. Civio pivoted to add journalism as a core component (yes, a pivot to journalism!) to help tell stories from the data a few years later, taking inspiration from ProPublica this time. Now it has a staff of four journalists and two or three (if you count Cabo) tech folks in its Madrid office.

“The whole narrative was naive — a tech utopia. We realized we needed to have lobbying to make things work and the journalism was very important to us,” Cabo, Civio’s executive director, said. “Now, we are a nonprofit investigative newsroom with a strong technical angle, because we still have tech skills, but are now focused very much in data journalism.”

Many of its lobbying efforts have been prompted by that need for public data to fuel its journalism. “Very quickly we realized we didn’t have an access-to-information law like FOIA,” Cabo said. “Many of the investigations we couldn’t do because there was no public data available. We realized we had to push for that.”

In its seven years, Civio has reported on the details of daily government bulletins, closely tracked 10,000 pardons from the past 20 years (building its own Pardonmeter with scraped results), and investigated pharmaceutical pricing differences and access worldwide. It’s very close to convincing public officials to make their daily meeting schedules public for transparency, Cabo and Belmonte told me.

Civio is upfront with readers about their lobbying every step of the way, with their goals, recommendations, and even a running list dating back to 2015 of which officials they met with when and what their intentions were. An example:

(In English, this entry talks about a video conference Civio held with the deputy spokesperson of the Finance and Public Administration Commission in order to present proposals for public-sector contract reform, identifying the members of the meeting from Civio and the government.)

“Every meeting we have — we talk about it, publish the documents we use in those meetings,” Cabo said. “We thought it was the right way to show it can be done.”

But as a nonprofit newsroom, the team is limited by the lack of a major philanthropy-in-news culture like the one the U.S. has spent the past decade been cultivating.

Only 2 percent of Spaniards currently donate to news organizations, but 28 percent told researchers with the Reuters Institute’s Digital News Report that they could see themselves donating in the future. (To be fair, the U.S. was only at 3 percent/26 percent on those same questions.) Independent Spanish media outlets have been experimenting with donation structures in recent years — digital news organization El Español raised €3.6 million in a 2015 crowdfunding campaign.

Still, Cabo has had to become creative with Civio’s funding sources. It made money off of helping Barcelona and Madrid create visualizations of their budgets using a Civio open-source tool.

In 2017 alone, Open Society Foundation (€68,180), the European Union (€61,926), and the European Journalism Center (€16,302) have given grants for Civio to work on projects or for general operations. On the individual level, Civio has more than 400 donors — no anonymous contributions allowed — with many chipping in around 5 to 10 euros a month. Most of Civio’s readers are between ages 35 and 50, with a lot of middle-manager public officials, journalists who share Civio’s mission, and tech workers who appreciate open data efforts, Cabo said.

Civio is not the only news organization taking a more pointed approach, though again it lobbies only for issues of government transparency and lowering barriers for journalism in Spain. Schibsted created a director of public policy role earlier this year, and of course industry organizations lobby Facebook for more pieces of the pie and governments about market regulations, for example. But it’s one thing to have your work stop when readers start clicking on the story — and another to advocate fixes for the problems you’ve found.

“If we are eight people and manage to do this, I don’t want to know what bigger companies are doing,” Cabo said.

Illustration by Toma Silinaite used under a Creative Commons license.

Watch out, algorithms: Julia Angwin and Jeff Larson unveil The Markup, their plan for investigating tech’s societal impacts

Christine Schmidt — Mon, 24 Sep 2018 18:55:05 +0000

Observation: Julia Angwin and Jeff Larson left their jobs at ProPublica to “start a crazy adventure.”

Hypothesis: Their recently unveiled organization, The Markup, is setting up a new model for newsrooms to report on the societal effects of technology, using the scientific method (as seen, well, here in this lede).

Data/evaluation/findings: TK.

Angwin and Larsen, a journalist-programmer team at ProPublica, had uncovered how Facebook let users target ads at “Jew haters” and enabled advertisers to exclude certain races and ages from housing and job ads, among other investigations into how algorithms are biased. The work even earned Angwin a public shoutout from Facebook:

Thanks @JuliaAngwin. You've done a lot to uncover issues in our ads systems, which we've worked hard to fix. This new API is an important step towards greater transparency + other changes we're making like view ads and the archive going back 7 years for all political & issue ads. https://t.co/z7wagLZRyV

— Facebook (@facebook) May 11, 2018

They split from ProPublica in April — bringing a couple staffers with them on their way out, and roping in cofounder and executive director Sue Gardner, formerly of the Wikimedia Foundation and the Canadian Broadcasting Corporation — and now are ramping up said crazy adventure:

The Markup is a nonpartisan, nonprofit newsroom. We produce meaningful data-centered journalism that reveals the societal harms of technology. We hold the powerful to account, raise the cost of bad behavior and spur reforms. Our journalism is guided by the scientific method; we develop hypotheses and collect data to test those hypotheses.

Funded by $20 million from Craig Newmark Philanthropies, $2 million from the Knight Foundation (which, disclosure, has supported Nieman Lab in the past), and $1 million from a collection of other journalism philanthropy organizations, The Markup kicks off in early 2019. I spoke with Angwin (with a short interlude on the financial details from Gardner) about the organization’s plans, its distinction from ProPublica, and how others can get involved going forward. Our chat has been edited and condensed for clarity.

Christine Schmidt: How far along was The Markup when you left ProPublica?

Julia Angwin: I left ProPublica with the hope and dream of doing something like The Markup. It wasn’t appropriate for me to try to raise money while I was there because they’re also a nonprofit. I had to leave without any money. I didn’t know if I’d be able to raise money. It was terrifying. Jeff and I just figured we’ll try this and hopefully someone will hire us if we fail. We had been talking about it for a while because we wanted to expand our work. We had a team of him, me, a halftime programmer, and a sometimes researcher. We loved our work but we wanted to do more of it with more people. So we decided to jump off the cliff and hope it worked out.

Schmidt: Can you say more about why you decided to build this organization separate from ProPublica and the work you’d already been doing there?

Angwin: ProPublica was great. Jeff and I had a great run there — we had so much fun and were able to establish and pioneer this type of reporting that we did together with a journalist and programmer working together from the beginning. Our dreams, though, were pretty ambitious. We wanted a newsroom, and we will have a newsroom of 20 people. That would be a significant commitment for ProPublica. We discussed with them about doing it internally and we all agreed in the end that it would be better to go off and do it on my own. It was something we were all feeling sad about, but it wasn’t angry.

ProPublica is literally the best job in journalism. It was and remains the best job in journalism. We took a crazy leap. It was crazy to walk away from those jobs, and I’m so happy it turned out to be a good thing for us. But it was a huge, huge risk. Nothing about what we did reflects poorly on them. It’s more about: We have an idea about how journalism should be. It’s much more tech-focused than any newsroom, even though ProPublica is the most tech-infused newsroom out there. We want to take it to another level.

Schmidt: What is that next level? What are the nuts and bolts of how this organization will operate differently?

Angwin: We describe ourselves as doing journalism that is based on the scientific method. The idea is that objectivity has been a false god for journalism for a long time. It started out as a decent idea, but it’s led to a lot of what people call false equivalents. I think journalism needs a new guiding light, a new philosophical approach, and I think that approach should be the scientific method. What that really means is we develop a hypothesis. Maybe the hypothesis is: ‘Brett Kavanaugh. Did he actually harass a woman or not?’ Then you collect evidence — how much evidence is there for and against this. Then you describe the limitations of your evidence: ‘So far the evidence is only one/two people.’

It doesn’t have to be ‘he said, she said.’ It’s more about: this is the amount of evidence to support this hypothesis, and then here are the limitations of this. There are always limitations to our findings. Even though climate change is well accepted scientifically, there are limitations for those findings as well. That’s our goal, to try to frame our journalism around that.

What that means in practice is having people with technical and statistical skills involved in an investigation from the outset. So much of what happens in traditional newsrooms, in every newsroom I’ve ever worked in, is that there’s a data desk. A reporter goes over to the desk and basically orders data like it’s a hamburger. Usually by then, the reporter has already done the reporting and has a hypothesis based on the anecdotes. Then, if the data doesn’t support it, there’s a fight between them and the data desk. Or, more often, there’s not even data available.

There isn’t data about most of the important questions we need answered as a society. The reason there’s no data about them is that there’s no political will for it. The reason we don’t know what happened on Facebook during the elections, for instance, is because Facebook would have to tell us — and why would they want to? It’s important to start the investigation earlier, with ‘What is the question we want to ask?’ and ‘What is the best way to get that data?’

Of course, traditional reporting is one of those ways. It’s not a good idea to just wade into a topic you don’t know anything about. You have to talk to people and understand what you’re talking about. But at the same time, I think it’s really important for journalists to provide data because data is how we as a society make decisions. We have chosen to be, pretty much, a scientifically driven society, and we appreciate data. Mostly, we still agree that facts matter. For journalists, the more data we can bring to the table, the more we can say ‘Hey, this is what we have found.’ It’s not just three anecdotes — it could be 10 or 10,000. The fact is, we need to bring a bigger sample size to the table.

Schmidt: So what are you looking for in different journalists and programmers? Who will make up that team?

Angwin: We’re going to put up really specific job descriptions, but I can talk about it on a high level.

The thing that we’re looking for is not necessarily heavy programming skills. We will need some of that, but there’s a really interesting dynamic I’ve noticed doing this type of work. It’s more about being open to the scientific method — being open to the idea that we’re going to let the data guide us and we’re going to go find the data for important questions as a way of doing journalism. It takes a certain kind of mindset to be open to that. There are lots of other ways to do journalism; I’m not saying this is the only way to do it. This is one way that I want to do it. I have noticed that people who have nontraditional backgrounds can be really good at this kind of thing. I have a feeling that we will have a wider variety of people with maybe not as often a traditional journalism background.

Two of the people I took with me from ProPublica, Surya Mattu and Madeleine Varner, are both programmers, but they’re self-taught and they both studied art. They’re both artists primarily, but they have that investigative mindset and curiosity. It’s a pretty nonraditional hire but those are the kinds of people who have worked out really great.

Schmidt: The reception I’ve seen on Twitter is a lot of excitement and a lot of interest. What are other ways that people could get involved or partner with The Markup?

Angwin: We’re going to have the ProPublica model with Creative Commons licensing of our stories, so they will be widely republishable. Also, just like ProPublica, we’re going to have partnerships with big media outlets for our big investigations. The likelihood us attracting gigantic traffic to TheMarkup.org in our first five years is probably low, so we’re going to want to extend our reach through partnerships.

In general, in our investigations, we often reach out to academic and subject matter experts for advice. We’re not actually statisticians. We know we are just amateurs. We always reach out to experts on a case by case basis for advice on investigations. We may formulate that into an advisory group, but we haven’t decided yet. Of course people can donate! I know we did receive an enormous gift and we are so grateful, but of course if we want to run this — that’s about four years of funding, so they can contribute to the fifth year!

[Here’s that interlude with Sue Gardner, The Markup’s cofounder and executive director, with more on the finances.]

Gardner: The story of the societal effects on technology is remarkably undercovered. We have a lot of tech coverage — I’ve been living in San Francisco through a lot of it. A lot of it has been gossipy stuff or the exciting rollouts of new products and services, or it’s been the business coverage and effect with stock prices. There has been remarkably little coverage of the societal effects of new technologies, and it is the story of our time. We felt that there wasn’t yet — until now — a journalistic organization that focused solely on the societal effects of technology. It was a big screaming gap in the media landscape.

Schmidt: What are the priorities of the $23 million The Markup has raised going forward in these first few years? How do you see the financial model building out around that?

Gardner: Right, the first couple of years we’re aiming to do two things: We need to build the news organization. Jeff and Julia at ProPublica pioneered the practices of bringing data science to journalism so we’re going to try to scale up the model from ProPublica and institutionalize it.

The other thing we’re going to be working on is trying to find a sustainable model for that kind of journalism. Investigative journalism is the most expensive kind of journalism. Data-centered investigative journalism is even more expensive. It’s a niche product. It’s not broadly appealing to large numbers of people. Journalism is best when it’s paid for by the users. Then all the incentives are virtuously aligned.

What we plan to do is exactly what I did at the Wikimedia Foundation. When I went there, we did not have a business model. We were a nonprofit, but we weren’t bringing in a lot of money. My first job was to figure out sustainability for Wikipedia. When you look at it now, it seems really obvious for what the model for Wikipedia should be, but it was not obvious in 2007. We deliberately set out to spend two years experimenting with different revenue model. We solicited major gifts, we spoke with foundations and got grants for the organization, we did what we called the ‘many small donors’ model, and we experimented with various kinds of earned income. I had always hoped when we started that the ‘many small donors’ model would be successful, and it did turn out to be successful.

I want to do the same thing with The Markup. We’re going to experiment for probably around two years and we’re going to double down on what seems to be working. That is what Craig Newmark’s money and the major grantmakers’ money has bought us — that runway, so we have a couple years to experiment and we have some time to figure out what will work in the longterm.

Schmidt: What are some of the experiments you’re eyeing?

Gardner: We’re going to need to experiment with appealing to people beyond those who read and consume the news products. Investigative journalism, in particular, is very niche and the audiences for it are very small. If you approach it as a purely consumer product, you limit how much money you can raise. I think it’s a mistake to think of journalism as purely a consumer product — it is a consumer product, but there’s also an argument to be made that journalism is also a public good. I benefit, as a person in the world, from the work that the International Consortium of Investigative Journalists or the Organized Crime and Corruption Reporting Project is doing. I benefit even if I don’t consume the stories directly, because journalism has a role in holding power to account, which is separate from its role in creating an informed society. One of the messages we’re going to be experimenting with is an argument that it’s a public good and that the public wants the tech industry and institutional users of technology to be held to account independent of whether they read our work or not. It’s a public good and it should be supported as much as being a consumer product.

[Okay, back to Julia.]

Angwin: We are not going to experiment with advertising. We’ve ruled that out. We are not going to be taking corporate money. We don’t take government money. I hope philanthropy can support investigative journalism for years to come, but it would be wise to look at other options as well.

I really, really am excited to try to build a model for a new way for doing tech-driven journalism. There was a time when journalists really didn’t know anything about business, and then there was the effort to educate journalists about finance and I was part of that. I got the Knight-Bagehot fellowship at Columbia and I ended up getting my MBA because I was a business journalist and I wanted to have that expertise in the area I covered. I feel like we need that in technology.

Technology is invading every part of our lives, and it is used as a cover for political decisions. Journalists in every field need to have more skills to investigate those types of decision-making that is embedded in technology. I’m hoping we build a model that is replicated.

I see us as a FiveThirtyEight — when FiveThirtyEight started, they were the first ones doing major meta statistical analysis of polls and using that to inform political coverage. Then everybody copied it, like with The Upshot. Part of their success was extended through invitation. I hope that happens to us. I still want us to exist, I don’t want to be copied out of existence, but if we can build a model about how you can do this kind of work that’s journalists paired with technologists and expertise — I would be thrilled if that spread to other newsrooms and became a practice and a field.

Image of code markup from Markus Spiske used under a Creative Commons license.

We’re getting closer to the day when news apps and interactives can be easily preserved in perpetuity

Shan Wang — Wed, 22 Aug 2018 13:36:50 +0000

What if all the interactives a news organization ever made could be stored somewhere, accessible in the same form forever, even as the technologies people might use to access them change?

That’s the dream, and that’s what a small team led by Katherine Boss, the librarian for journalism, media, culture and communication at New York University, and Meredith Broussard, assistant professor at the Arthur L. Carter Journalism Institute at NYU, are trying to get the news industry closer to.

It’s a question that many people in the libraries world and a smaller set of people in the news industry have been worrying about and working on for some time. Boss and Broussard’s team will be building software that can zip up the entirety of a news app, using ProPublica’s Dollars for Docs database (which tracks payments pharmaceutical and medical device companies made to doctors) as a test case. For the prototype, they’re adapting an open source tool called ReproZip, created initially for replicating scientific experiments without also having to replicate everything else, such as installing additional software, or the operating system on which the original experiment ran. (The tests are currently funded through an Institute of Museum and Library Services grant. The team is now Boss, Broussard, and a reproducibility librarian Victoria Steeves, and it’ll add a third programmer to the team of Fernando Chirigati and Rémi Rampin.)

“The software tool we’re trying to build is an emulation-based web archiving tool to archive the…well, internet. In particular, dynamic projects like news apps that can’t currently be fully archived by anyone,” Boss said. “This tool is the first step in that process. If we don’t have a way to capture and compress these things through emulation, we can’t begin to think about any other aspects of the process.”

As Boss and Broussard had explained to Nieman Lab when they first embarked on this project:

[T]here’s “migration,” and then there’s “emulation.” “Migration” is the traditional stuff we might associate with libraries: digitizing print materials, digitizing microfilms, moving VHS to DVDs and then DVDs to Blu-ray and then Blu-ray to streaming media. That process doesn’t make sense for digital “objects” like news apps that are dependent on many different types of software, and therefore have too many moving parts to migrate. What if, a hundred years out, we’re not even browsing the internet on computers, or at least not the computers we’re familiar with now?

As part of a Reynolds Journalism Institute fellowship this year, Broussard is also working on the online holding place that will allow people to access, through a web browser, these archived news apps just as they were first presented, without broken links or wonky graphics or dead-end interactions. Kind of like the way, say, someone might be able to go to a library and look at digitized versions of a notable writer’s collection of letters. Or the way the Internet Archive lets you play a 1986 Sega game online.

“Once we’ve packaged these apps, we need a place to put them. You can’t just package them up and put it anywhere on the internet, because as we know, stuff on the internet sometimes just disappears,” Broussard said. “A physical library has shelves you can put books on. A digital repository needs to have the digital equivalent.” Lots of similar repositories exist that hold other types of content; none yet specifically hold news apps. In their test case, the ReproZip-based software successfully preserved the backend of Dollars for Docs, but not the frontend, so the tool needed to be further adapted to account for that.

They’re aiming to launch the repository in the fall, starting with packaging and storing some of Broussard’s own recent — but already broken — news apps, like a 2016 campaign finance data project, or a 2014 database on textbooks in Philadelphia schools. Several other news organizations are now interested in archiving their news apps with them, Boss and Broussard said.

Few news organizations are consciously considering the problem of archiving news apps, let alone are putting in place real archival strategies. The Tow Center is also researching this question. There’s the Save My News plugin, which lets users easily save their articles to a place like the Internet Archive. The New York Times put a serious team behind archiving all its story pages the way they originally looked when they were published. For the most part, news staff can’t spare the time or resources to archive projects systematically, and for posterity.

Before they started building the archiving software, Boss and Broussard’s surveyed developers and journalists on the tech used to make and store their news apps, to get a sense of what programming languages, for instance, to prioritize in building out their archiving software.

“We discovered, for example, that nobody is using Haskell or Julia. But people are using Python and R. They’re using JavaScript, and frameworks like Django and Flask,” Broussard said. “We don’t want to build a tool that works for a language that nobody’s using.”

They also asked people their organizations’ archiving practices, or lack thereof. According to data that will be published in a forthcoming special issue of Digital Journalism (look for Volume 6, Issue 9, edited by Henrik Bodker):

19 of the 76 news applications represented in the survey weren’t being maintained, according to respondents. (For seven news apps, respondents didn’t even know whether or not the apps were being maintained or updated.)
93 percent of these apps had been published in the past five years. Of the 41 different news outlets represented in their survey, only two — 5 percent — said their organization had a system in place for archiving news apps.

“The issue is again that no organization has a way to compress these objects through emulation and send them to libraries. Libraries actually have the support and mandate to save dynamic digital objects like this, for 50, 100 years — we’re thinking way far into the future,” Boss said. “Our tool would make possible for newsrooms to package and send their stuff to libraries. That pipeline doesn’t exist right now, but it’s important to establish that.”

Shelves by Simon Gray, used under a Creative Commons license.

Here’s a new online community that looks to be a one-stop shop for global data journalism resources

Shan Wang — Thu, 22 Mar 2018 15:13:33 +0000

Data, data, everywhere, and quite a lot to drink in.

The Global Editors Network, has launched the Data Journalism Den, a new online community focused on spotlighting good data-driven journalism and datasets and connecting data journalists for news stories, advice, and even jobs. The Den is open in beta now, and its community is free to join.

Building on its active Slack community of data journalists, where it hosts monthly speaker series for anyone in the Slack, the Den is trying to facilitate more active collaborations and more frequent discussions between journalists. Its “Matchmaking” section will allow Den members to request feedback on projects, submit calls for partners, or even ask for funding.

“When people think of collaborative journalism these days, they think of these big projects like the Panama Papers or the Paradise Papers, where you have hundreds of journalists working together across different countries. That’s a big project to manage,” Teemu Henriksson of GEN, who was brought on to oversee the project, told me when we spoke ahead of the launch. “We’d like to be able to facilitate many smaller collaborations. Data journalists can go on our site for a project they’d like to create, but maybe don’t have the resources available for. They can specify what kind of needs they have, from manpower to skills to assistance to funding.”

The Hub will also offer a regular email newsletter that will pull together examples of good data journalism being done around the world, as well as surface new tools, data-related services, and insightful discussions around data reporting happening elsewhere online.

There are plans for a “data store,” modeled in part on ProPublica’s data shop, where it sells valuable datasets. A lot of work often goes into putting together and cleaning a dataset, which may combine several different sources, and the dataset could still be useful to a wider community of data journalists after the original set of news stories based off the data have been published.

A jobs section is also forthcoming.

The project is supported in part by Google’s Digital News Initiative. You can join and read more about the Den here.

Holding algorithms (and the people behind them) accountable is still tricky, but doable

Christine Schmidt — Wed, 21 Mar 2018 15:38:29 +0000

The black box of algorithms in public and private life can be impenetrable for journalists, constrained by trade secret exemptions and situational awareness — despite the fact that they can have a huge influence on both public and private life, from what you pay for airfare to whether an app declares you likely to commit a crime. But it doesn’t mean journalists stop trying.

We’ve written about algorithmic accountability before, but the importance of parsing it still remains, especially when the blackest of boxes of algorithms is playing a pretty influential role in society. Reporting on algorithms by reverse-engineering them can work — if you can get the inputs and outputs right. But sometimes just reporting the fact that an algorithm exists within a government can be a revelation. So how do you dig in?

Last month, Ali Winston reported for The Verge in partnership with the Investigative Fund at the Nation Institute that the city of New Orleans was partnering with Palantir, the secretive data analysis company cofounded by Peter Thiel. The city had been using a predictive policing algorithm, unbeknownst to many local elected officials or the public, Winston found.

There's been some hue & cry in the local press about Palantir's role in New Orleans being overblown. I'll just leave you with just one reason why you should take these claims w/a grain of salt" when NOLA claims they didn't analyze social media – what does this slide say? pic.twitter.com/dXEsVSTJpJ

— Ali Winston (@awinston) March 2, 2018

Source document – which I obtained from NOPD, for what it’s worth https://t.co/WdLdGfS3cu

— Ali Winston (@awinston) March 2, 2018

The story landed squarely in public debate:

Two weeks ago, The Verge reported the existence of a six-year predictive policing collaboration between the New Orleans Police Department and Palantir Technologies, a data mining giant co-founded by Peter Thiel. The nature of the partnership, which used Palantir’s network-analysis software to identify potential aggressors and victims of violence, was unknown to the public and key members of the city council prior to publication of The Verge’s findings.

Yesterday, outgoing New Orleans Mayor Mitch Landrieu’s press office told the Times-Picayune that his office would not renew its pro bono contract with Palantir, which has been extended three times since 2012. The remarks were the first from Landrieu’s office concerning Palantir’s work with the NOPD. The mayor did not respond to repeated requests for comment from The Verge for the February 28th article, done in partnership with Investigative Fund, or from local media since news of the partnership broke.

There is also potential legal fallout from the revelation of New Orleans’ partnership with Palantir. Several defense attorneys interviewed by The Verge, including lawyers who represented people accused of membership in gangs that, according to documents and interviews, were identified at least in part through the use of Palantir software, said they had never heard of the partnership nor seen any discovery evidence referencing Palantir’s use by the NOPD.

Winston had reported extensively on the existence and implications of the Palantir–New Orleans algorithm, including how Palantir used the pro bono partnership’s efforts in a sales pitch to Chicago’s police department. That sale didn’t end up going through, but Chicago’s own predictive policing algorithm has also been subject to journalistic scrutiny and even reverse-engineering. Rob Arthur used the public records from other news organizations’ FOIA requests to obtain the inputs and outputs of the algorithm, based on additional information from the police department.

“We had 400,000 data points with their arrest information and their scores and what we didn’t know is the middle part, the algorithm that informed this,” Arthur explained at a panel at NICAR (his slides are here). “What we did was very simple: We ran a statistical model using the input predictors that we had — arrest info and so on — and tried to predict their strategic subject list score [the predictive policing results] as a function of those predictors…We knew we had successfully reverse engineered the model because [based] on our sample data, we were able to predict the strategic subject list score extremely accurately.” They had an R-squared value of .98, meaning they “pretty much nailed what their algorithm was with just the information that they gave us,” Arthur said.

The algorithm had apparently been developed at the Illinois Institute of Technology, a private university, so it wasn’t necessarily subject to FOIA requests, Arthur said — but journalists in Chicago still sued to get access.

“It’s very important we actually see the algorithm for itself, but even without getting that request successfully filled, we were able to demystify this black box, this algorithm that had very scary connotations, and break it down into what ended up being a very simple linear model,” Arthur said. In fulfilling other FOIA requests about the inputs and outputs, the city had said that not all the variables of the algorithm were provided, but Arthur believes they did. Still, “we don’t need to perfectly reverse engineer an algorithm to be able to say something interesting about it.”

To be fair, though, journalists should be wary of reverse-engineering an algorithm — and then getting it wrong. At NICAR, Nick Diakopoulos pointed out that having such a high R-squared value was a confidence booster in publishing. (His slides are here.)

Diakopoulos has been tracking algorithm accountability for years, most recently as the director of Northwestern’s Computational Journalism Lab and Tow Center Fellow, and also as an occasional contributor to Nieman Lab. He also helps maintains the site algorithmtips.org as a resource for journalists parsing potentially newsworthy algorithms in the U.S. He advised interested journalists to be aware of missing information when governments are reluctant to share, to have an expectation of what an algorithm is supposed to do, and to know that it’s never one-and-done since algorithms can always be tweaked. And remember, it’s usually humans that are doing the tweaking.

“As reporters, we really need to push to hold people accountable. Don’t let corporations say ‘it was the algorithm,'” Diakopoulos said. “You need to push on that more and find out where the people are in this system.”

Others on the panel pointed out other systems that could be audited and held accountable as well: targeted job listings, Airbnb results, landlord postings, hotel rankings, and more. Robert Brauneis of George Washington University conducted a study with Rutgers’ Ellen Goodman to test the limits of transparency around governmental big data analytics. They filed 42 different open record requests with public agencies in 23 different states about six predictive algorithm programs. They received no response to 6 requests; 7 responded initially and then didn’t follow through; two were caught up in the courts, and three “requested large sums of money we were not able to provide,” Brauneis said. Another 12 said they did not have materials related to algorithms, 5 sent non-disclosure agreements they had with vendors, and 6 provided some materials ranging from training sets for the algorithms to email correspondence about algorithms.

In our previous coverage of NICARian discussions on algorithmic accountability, Diakopoulos offered some advice for journalists on newsworthiness and thinking critically about the machines we rely on four years ago: “Does the algorithm break the expectations we have?” he asked. “Does it break some social norm which makes us uncomfortable?”

Now, the social norm might be becoming uncomfortable.

Visualization of a Toledo 65 algorithm by Juan Manuel de J. used under a Creative Commons license.

This Indian startup wants to free — and find stories in — public data that’s messy and inaccessible

Gangadhar Patil — Tue, 20 Mar 2018 13:37:43 +0000

Do private hospitals in India perform an unnecessary number of C-section operations in order to make more money? It’s a common worry among Indian families, but until recently there was no official data to back up their concerns.

Then data journalists working at How India Lives, a three-year-old startup whose mission is to make public data more easily accessible, stumbled across a database India’s central health ministry had been maintaining.

The health ministry wasn’t looking at C-sections specifically; it was tracking pregnant women and newborns for a study on how to reduce infant and maternal mortality rates. But when How India Lives journalists dug into the dataset, they found that the numbers supported what many Indians had considered common knowledge: the number of C-sections conducted at private hospitals was almost three times as high as the number conducted at government-run, public facilities. (Private hospitals were also conducting C-sections at three times the country-level percentages recommended by the World Health Organization.)

How India Lives wants to be the go-to search portal for publicly available data in India. It also operates as a data consultancy and agency for data-driven news stories that attempt to answer questions in the public interest by transforming difficult-to-obtain and analyze data into something more accessible. In its first year, the company worked with multiple editorial partners to publish its data stories; it’s since signed an exclusive publishing agreement with India’s second-largest financial news daily Mint, which has commissioned and published more than 150 data stories from How India Lives to date.

“We want to be enablers for journalists to use public data for storytelling,” John Samuel Raja, How India Lives’ cofounder, said. Raja has worked across several of India’s major financial dailies, including India’s largest business newspaper The Economic Times, for more than 15 years. He worked on the idea for the company at the Tow-Knight Center for Entrepreneurial Journalism, where it won a $16,000 grant to kickstart the venture. (That’s been the team’s only grant funding so far.)

How India Lives, founded by five journalists with a range of experience at mainstream Indian news organizations, now has a mix of 11 total full-time and part-time staff — six reporters, two dedicated coders, and three data analysts. On top of data stories for Mint, it’s done research and consulting for 28 clients, including organizations like IDFC Foundation and Ashoka University. The company has been profitable since its first year, and hopes to clear $230,000 in revenue this year. Two-thirds of its revenue now comes from its consulting work.

Indian journalists who want to work on data stories face several major hurdles, including the availability of data in a clean, analyzable format and the skills required to build clear and useful visualizations. Other organizations might have questions that can be answered via publicly available data — what’s the philanthropy situation in India? — but don’t have the capacity to go searching for and processing the data. How India Lives responds to these obstacles.

Raja says that the Indian government actually makes available a good deal of public data, but relatively few people make use of it. How India Lives has been able to, for instance, analyze public data collected by the government’s education department to show a strong correlation between having a functional toilet in schools and school dropout rates among girls. It analyzed roughly 39,000 government job postings for government officials to show just how frequently some government workers transferred jobs.

“Public data is quite hard to come by in India. Even if it is accessible, it is structured in such a manner that it almost becomes impossible to use it effectively,” Saikat Datta, South Asia editor of Asia Times Online and an Indian investigative journalist, told me. “The time and effort needed to structure and analyze the data leads to very poor returns, in terms of readership and insights.”

A lot of other public data is outdated, or can be faulty because of collection errors, Samar Halarnkar, editor of another major data-driven news outlet IndiaSpend.com (and a former Nieman Visiting Fellow), added. And many Indian journalists remain uncomfortable with data journalism, he said: “They do not know how to use data to lend strength to a narrative, or vice versa.”

While India’s made progress in making public data available through portals like data.gov.in and a data-sharing policy, the quality and comprehensiveness of what’s available continues to hamper data journalism. Often government departments upload scanned copies of the data as JPEGs instead of making the spreadsheet available online.

Census data, which used to cost a fee to download, is now free, and How India Lives has incorporated the information into the simple search portals on its site. But other sources like Survey of India — which has a monopoly over mapping data in India — and the Indian Meteorological Department are still paid. Indian authorities also used to put out detailed export and import data on daily basis, but stopped abruptly without notice in December 2016: Companies, many in the manufacturing sector, objected to the sharing of these figures, argued that it revealed competitive information.

Besides How India Lives, several other established organizations such as IndiaStat, Social Cops and Gramener also work in the data analysis and visualization space. How India Lives pitches itself as the only one working at the intersection of three circles — journalism, technology, and public data.

How India Lives is hoping to roll out more advanced data products in the coming years, with a continued focus on creating customizable services to surface new datasets to present to paying clients in a searchable, comparable, and visualizable format, How India Lives cofounder Avinash Singh told me. Its consulting business, Raja said, has helped his team understand what types of questions people want to answer with data, and in what specific formats they want to consume that information.

How India Lives currently offers a beta search engine for public data covering only census information, with 2,300 categories of data on 715,00 geographical locations across India. The company is now building out a paid-for search product, which will offer 18,000 new data categories for all these locations, with new datasets, such as the Socio-Economic Caste census, added. Access to some of this will remain free; the team is hoping to make the expanded search features available around the end of this month. It also wants to make the process of adding new datasets easier, and build out avenues for other people to submit datasets themselves, according to Singh.

“As new technology solutions come, journalists use them,” Singh said. “But our solution can be used not only by journalists but also by organizations for their own decision-making.”

Photo of Shimla by Masrur Ashraf used under a Creative Commons license.

By mass-texting local residents, Outlier Media connects low-income news consumers to useful, personalized data

Christine Schmidt — Thu, 01 Mar 2018 15:28:51 +0000

If you received an unsolicited text message about a free service offering to check the public record of your house or landlord, would you respond?

What if you were a renter without much money and debating whether you should withhold next month’s rent because needed repairs aren’t being done? Or if the house next door is unmaintained and affecting your own living situation?

For many Detroit residents, replying to that sort of out-of-the-blue text might be worth a shot. When I first texted the service of Outlier Media, within a minute I was informed about what it does, how I could use it, and (after a prompt to send an address) that the Outlier database didn’t have records on the building I was interested in (the Motown Museum). Seconds later, Outlier had asked me if I needed more information about housing, inspections, or utility shutoffs. And though I didn’t ask for a follow-up, 51 minutes later Sarah Alvarez, the founder and lead reporter on Outlier, had answered my query manually. (The museum’s address has $1,249.24 in taxes due from 2016 but is not on the tax auction list, I’m told.)

By drawing on a hefty database of information compiled from city and county public sources and automating initial responses, Alvarez has built the one-woman-show of Outlier Media into a resource for low-income news consumers in Detroit in search of tangible, individualized information. In 13 months, Alvarez has sent messages to about 40,000 Detroit cell phone numbers in her quest to reach “as many Detroiters as possible”; between 1,200 and 1,600 Detroiters have used Outlier to search for information on an address. (Opting out from Outlier’s messages is always an option as well.) She developed the system as a JSK Fellow after reporting for Michigan Public Radio.

“Even though the journalism was very good, I was not satisfied with covering low-income communities for a higher-income audience. I wanted to cover issues for and with low-income news consumers,” said Alvarez, who came to journalism after working as a civil rights lawyer. “I covered issues that were important to low-income families, but I was not a housing reporter. Using Outlier’s method and delivery system, it’s such efficient beat development. I learned so much about housing so quickly. You can talk to hundreds of people in a week instead of just talking to a few.”

But she also focused on what reporting she could rapidly (and realistically) provide to people. Alvarez pulled local data from United Way’s 211 line, a hotline set up for people seeking resources on topics such as domestic violence, veteran support, addition rehabilitation, and housing and utilities assistance. “I knew if I could find out what people were complaining about, I would know a good starting point for my reporting,” she said. After comparing it to data from the Consumer Financial Protection Bureau, “housing was far and away the biggest thing.”

Alvarez dug into the data she could find to identify what information in the realm of Detroit housing is most needed but also most actionable on an individual level. Thanks to FOIA, public data, and data scrapers, she has access to the number of blight tickets per address, names of a property’s registered owners, whether or not it is at risk of tax auction, and how much taxes are owed.

After she buys lists of phone numbers from a marketing company with an A+ rating from the Better Business Bureau, Alvarez sends out her introductory cold-text to 5,000 numbers at a time using GroundSource as the texting platform. “The information is the first thing they get,” she said in regards to building trust in unsolicited text messages. “It’s scary enough to put in your address to someone on the other end that you don’t know, but I think this information is so valuable and it is not easy to get.” (Very few people text her initially like I did, she said, and she doesn’t do any marketing or outreach for Outlier. But some people save her number and reach out later when they have an issue with their housing.)

The first few interactions are automated, from her introductory message to the prompt for entering an address to follow-up messages — about additional searching, the need for an Outlier journalist to follow up, or if the texter thought the service was helpful (very, kind of, or no). But some messages prompt Alvarez to respond directly. One Detroit resident recently sent to Alvarez:

Texter: Im having a problem with an empty lot next to my house. It has an owner but he is refusing to care for it. The field is causing rodent issues, killed one of my dogs (because of flies eating my dog), ive broken a few lawn mowers trying to keep the lot maintained for my childrens safety….what should i do? I have contacted the owner and he is refusinf to deal with the situation Also, the land is caving in aome places, the trees are digging holes in my garage roof, and the trash has caused my garage to start sinking and having foundation problems….please please help

Outlier response (after the texter sent their address): This is Sarah Alvarez from Outlier following up with you. The address next to you is [address]? I see that there is only 1 blight ticket so I think what you should be able to do is to get more blight tickets for the property. I think the Department of Neighborhoods should be your best bet. In District 2 you can call Sean Davis at [phone number; he’s the deputy district manager]. If they give you the runaround please let me know. It’s their job to help you through this and it’s my job as a journalist to hold them accountable.

While the city of Detroit has been redeveloping, its journalism has also seen some reinvigoration and investment, with the Detroit Journalism Cooperative and the Detroit Journalism Engagement Fund, and experiments such as City Bureau’s Documenters pilot program. For the past two years, Outlier Media has been funded by a grant from the W. K. Kellogg Foundation, though that ends next month. Alvarez is exploring the idea of asking texters to pay for lookups beyond the initial outreach — say, 50 cents per additional search — though the limits of sending payment via text could restrain that idea. She’s also keeping an eye out for publishers who might be interested in including Outlier as part of their news organization. She has lined up funding from the Detroit Journalism Engagement Fund to bring on a second reporter for at least one year to focus on utility issues, which she said was the second-most sought-after information in her research. (Kenichi Serino is joining the Outlier team today as a senior reporter on the utilities beat.)

“We’re going to really look into [utility shutoff policies] to see as a consumer what can be done in regard to shutoffs and deposits,” Alvarez said. “That one’s the harder one than housing, but again, the data is harder to get, and you want to be able to deliver high-value information.”

Aside from the direct data lookup service, Alvarez also leverages Outlier to pitch stories to Detroit newsrooms or work on her own enterprise reporting, such as these. “When there is a bigger piece or a bigger investigation, my news consumers have been very eager to help. It’s a very reciprocal relationship,” she said.

Most people think about the job of journalists as reporting individual stories, but Alvarez firmly believes Outlier’s service falls within the traditional definition of journalism: “My job is to get vetted information to the people who need it most,” she said. “The reason I do it is to create more accountability, which is at the heart of what I think journalists should do.”

Image of Detroit’s Motown Museum by Ted Eytan used under a Creative Commons license.

The Guardian’s new podcast player for the web tries to make listening a little more interactive (but not interruptive)

Shan Wang — Thu, 15 Feb 2018 15:23:38 +0000

Sometimes, the podcast isn’t enough. Or to put it differently, it’s so good you want to find out even more.

While binging on the S-Town podcast last year, the first thing I did after one episode was search the Internet for more about the central character John McLemore, in hopes of finding a picture of the hedge maze via the GPS coordinates McLemore started giving out. (If you haven’t listened to S-Town and plan to, don’t do this.)

So did the Guardian Mobile Innovation Lab team — an impulse that they guessed many podcast listeners shared. The Lab’s latest mobile experiment is an attempt to address some of the small inconveniences and limitations of the podcast listening experience as it stands today. It’s a podcast player designed for the mobile web, which is being tested first with a new Guardian podcast called Strange Bird, hosted by data editor Mona Chalabi. (The Mobile Innovation Lab and Nieman Lab both receive Knight Foundation funding. You can check out their product process on that Lab’s Medium page, or find Nieman Lab’s coverage of their past experiments here.

Any listener can access the player through any browser on Apple or Android, though Android users get the added option of turning on push alerts timed to different parts of the audio as the podcast plays on. It’s like podcast show notes, but in real time, and with phone notifications that point you to links and graphics at relevant points in the story. Chrome users can also sign up to get alerts when a new episode is available, without subscribing within a dedicated listening app.

“We had a lot of debates about the purity of audio, and the power a good audio story has. We’re trying to play around with the idea: If it weren’t just audio, what could it be? How can we seamlessly deliver additional media assets?” Sarah Schmalbach, the senior product manager at the Lab, said. “We’re not building an entire experience around them, but adding them where they’re relevant, and taking into account that some people also like to keep the visual or other elements a mystery. We wanted to figure out how to blend a couple of different formats together, with audio as the anchor.”

“We’re seeing so many organizations take interest or invest money in podcasts as a source of gaining new audiences, but those aren’t necessarily integrated into the rest of their output,” Sasha Koren, the Mobile Lab’s editor, said. “Maybe they’re available through a basic web player, but you can’t follow or interact with them in any other way, or you can only access them on a podcast app which takes you off platform.”

Have you ever listened to a podcast episode centered around a visual, heard the host mention the website where some of the images in question live, and then never actually followed up with those extra steps to see the extra stuff? For the wonkier among you, have you ever listened to charts getting cited and papers getting quoted, and wanted to check out the data for yourself? The Mobile Lab’s web player experiment addresses some of these use cases for more regular podcast users. It also attempts to both simplify and enhance the podcast listening process for beginners — especially Android users — who are faced with a range of paid and free listening apps, of varying quality.

Strange Bird’s premise is that it will use data to open up difficult issues that are actually common, but insufficiently discussed, making it a useful test case for the Mobile Lab’s web player. The Mobile Lab team worked off of the show script to look for possibilities where added material made sense — where there were characters whose photos listeners might be interested in seeing or data graphics Chalabi created that might be incorporated, Koren said. Its pilot episode is on miscarriage, and will feature chats from Mona herself, links to other stories, and illustrations. The podcast will also be available all the other places where podcasts are found, but the web player audio will include a little introduction about the added features.

“What we’ve created is a sort of augmented traditional podcast,” Alastair Coote, the Mobile Lab’s developer, said. “You can listen to it the way you would a normal podcast, you can engage with the extra bit if you want to.” The web podcast player is a proof-of-concept for the Lab, which is winding down its two-year grant period, but other publishers with resources may want to build on ideas around better podcast listening experiences without a dedicated app (choose your-own-adventure interactive podcasts, perhaps, Coote and Schmalbach suggest).

Some technical roadblocks remain — for instance, if you get an alert that a new podcast episode is posted, and want to tap to download that episode in the background over wifi for future listening offline, you can’t. (Coote outlined a few of these challenges in his own writeup of the podcast player concept from last summer.)

Both platforms and news organizations have experimented with various efforts to give listeners interactivity on top of audio. The investigative reporting show Reveal has offered users the option of texting with the show at given points when it has document excerpts, images, or data to show listeners. The New York Times now offers its morning show The Daily within its app, and listeners can tap around to other stories as the audio continues to play. Last month Spotify announced it would roll out a feature in the app called Spotlight that would show listeners “contextual visual elements, such as photos, video and text, that appear as users move through each episode,” first with partners like BuzzFeed News and Gimlet Media.

Verrry interested to see this new podcast + visuals thing Spotify announced, particularly @BuzzFeedNews and @cheddar use. Despite launch press I can't find any sign of it in the iOS app. https://t.co/OC0lnGX64O

— Sasha Koren (@SashaK) January 23, 2018

The Mobile Lab’s idea is, as always, to get others in the industry thinking about prototypes they’ve built: “We’d love to see more people experimenting with this stuff,” Schmalbach said.

Here are the digital media features to watch during the 2018 Pyeongchang Olympics

Christine Schmidt — Mon, 12 Feb 2018 14:46:02 +0000

Each edition of the Olympics offers a shining host city, compelling tales of athletic triumph, and an opportunity for news organizations to test out new storytelling technology with a meticulously scheduled global event.

The 2018 Winter Olympics are no different, with Pyeongchang, South Korea partnering with its feisty neighbors to the north, the image of an Olympian redefined in the U.S. after gymnasts testified against their doctor convicted of sexual assault, and news organizations exploring all realms of media to cover the Games. Frankly, there’s a lot going on.

Here are some of the Olympic digital news coverage experiments to keep an eye on during the Winter Games, running until February 25. See others? Speak up!

For the latter, NBC is broadcasting much of the Games live in what it’s calling the “most live Winter Olympics ever,” including a portion on Snapchat. It will introduce the Snapchat Live tool designed for TV networks, according to Digiday’s Sahil Patel, to cover key moments of the games. They’ll also utilize cards built into Snapchat’s Our Stories to show the Games’ schedule, medal counts, etc. and launch a handful of new shows on Snapchat Discover. The shows clearly fit Snapchat’s quick-paced, flashy style, which NBC News has already been practicing with its twice-daily show. The shows feature the trials Olympians face to compete and the stories of how they made it to the Olympics in the first place.

BuzzFeed (in which NBCUniversal also has a hefty investment) is working with NBC to craft content for Snapchat, similar to their arrangement in 2016. And The Hollywood Reporter’s Natalie Jarvey also notes that special car coverage from NBC will be shown to Uber riders (not necessarily in South Korea) during the Games via the Uber app, showing “exclusive ‘in-car’ interviews as [athletes and announcers] travel to and from the various Pyeongchang venues.”

Viewers can also satiate their thirst for the stories coming out of the Olympics with NBC’s podcast partnership with Vox called The Podium. (They’re promising “K-pop, of course. Lots of K-pop.”) The Podium was introduced in December, with early episodes focusing on the global political and cultural context of these Games, but it also includes an Intel-sponsored episode on how technology is changing the Olympics. (You’ll never guess: NBCUniversal is also an investor in Vox Media.)

The New York Times

As my colleague Ricardo Bilton reported, The New York Times has brought back its personal messaging feature connecting readers to an on-the-ground reporter. Instead of using SMS, as in the 2016 Olympics, the team has revamped it to run through their mobile app (much cheaper than mass texting, they learned!) and to personalize content sent to users based on specific sports interest.

“One of the big benefits here is that we do control the whole space,” Troy Griggs, graphics editor at the Times, told Ricardo. “So much more is on the table now. Any interactive experience we build now we can tie together in a way that we wouldn’t be able to elsewhere, even on Instagram or Snapchat. We can really integrate our content and experience in a way that is new.”

On the heels of its augmented reality announcement — “Something profound has happened to your camera” — the Times has also introduced Olympics coverage in AR. Its first feature explores the multidimensional dynamics of Olympic bodies and Graham Roberts, the director of immersive platforms storytelling, described the project’s development (in the humbly-titled “How We Achieved an Olympic Feat of Immersive Journalism”):

Bringing the four Olympians into augmented reality required finding a technique to capture them not just photographically, but also three-dimensionally, creating a photo-real scan that can then be viewed from any angle.

We asked each athlete to demonstrate his or her form at specific moments. Nathan Chen held a pose showing exactly how he positions his arms tightly to his body during his quads to allow his incredible speed of rotation. Alex Rigsby showed us how she arranges her pads to best guard the net from a puck traveling at 70 miles per hour.

For the AR experience, we placed these scans into context — for example, placing Nathan Chen at the 20-inch height off the ground he would be midquad, based off photo reference and sometimes motion capture. In your space, this will truly be a distance of 20 inches because this is all true to scale.

The full AR experience is available in the Times’ iOS app, with some nifty-but-sub-AR visuals also available on the website. The Times also translated its AR feature into four pages of print.

POWER OF PRINT: Eye-catching @Olympics special section in NYT. 4-page foldout to capture @nathanwchen @chloekimsnow @MikaelaShiffrin #AnnaGasser #olympics pic.twitter.com/KbikJhMHWV

— sree sreenivasan 谢瑞睿 (@sree) February 9, 2018

The Washington Post

In 2016, the Post used a bot to write certain Olympics results stories (there are a lot of events!). This year, the Post’s Olympic Twitter bot is generating “short multi-sentence updates” about medals won in all events with a medal tally twice a day and reminders before events from 6 a.m. to 12 a.m. EST, though the Messenger bot does not seem to be running this cycle.

The Post also introduced an AR quiz-based game in its classic app for users to play with the speeds of competitors in nine Winter Olympic sports. (I correctly guessed the four-man bobsled over downhill skilling.) I’m not sure what more the AR component added beyond an in-the-room experience as the mini Olympians raced over my desktop keyboard versus just keeping the game within the app, but this game could ride high on the group sofa competition HQ Trivia has thrust upon us.

The paper’s Olympics coverage also includes a daily newsletter and, in a nod to the Post’s recent lean toward demystifying the jobs of journalists, first-person accounts of covering the Games from rookie Olympics reporter Chelsea Janes.

Some news: I’ll be writing a diary-like blog for the next few weeks in which I’ll chronicle my experiences as a first-timer at the Olympics. We all will probably regret this somehow, so I figured I might as well start things in the toilet. Literally. https://t.co/kFKSrD02or

— Chelsea Janes (@chelsea_janes) February 6, 2018

Other ways news orgs are Olympic-izing

USA Today is partnering with Google Assistant to provide daily news and highlights to all devices with the assistant after the prompt “OK Google, play the latest news from USA Today Olympics.” It’s updated every day at 3 p.m. EST and it’s an exclusive partnership.
Amazon devices have some updates, too: “Alexa, who won gold today?”, “Alexa, what happened in the Olympics today,” and “Alexa, how many medals does Shaun White have?” are all suggested phrasings.
If you’re interested in digging into the data of the Olympics, the Global Editors Network compiled tips on how to report on the numbers behind the Games, or sports in general.
The Canadian Broadcasting Corporation has embraced the time trans-Pacific time difference in a social media campaign encouraging Canadians to “flip the clock” and stay up late to watch the games.

Image from the 2018 Pyeongchang Opening Ceremonies courtesy of the Republic of Korea used under a Creative Commons license.

A network of news outlets and data agencies wants to unlock untold data stories across Europe

Shan Wang — Mon, 22 Jan 2018 15:28:06 +0000

Collaboration and data journalism suit each other. All that’s needed to make things work is time — a lot of it.

Stretching data stories across borders and languages is a feat of processes, and the European Data Journalism Network is hoping it’s ironed out the right ones. On board are 15 official partners, with news outlets like Germany’s Spiegel Online, Netherlands’ NRC Handelsblad, and Spain’s El Confidencial, as well as European data journalism and visualization agencies such as Journalism++ and LocalFocus. The think tank Osservatorio Balcani Caucaso Transeuropa (specializing in the policy issues of Southeast Europe, Turkey, and the Caucasus region) and the multilingual nonprofit news site VoxEurop are organizing the initiative and shepherding data stories, in what the coordinators call a mostly “bottom-up” approach.

“We really do believe that being part of the EU gives our societies the opportunity to compare with one another on how effective a given policy is, how specific social phenomena are spreading. And then through comparison, improve the debate, and maybe even the policies,” said Chiara Sighele, the project’s director at OBC Transeuropa, who helps run the network. The organization OBC Transeuropa includes a media arm that publishes relevant data stories to the EDJNet platform (for instance: “The number of asylum requests from Turkey has tripled over the last two years”). “We saw the project as an opportunity to improve our capacity to cover journalism on European affairs in a meaningful way, with meaningful partners. It’s also a way to take advantage of all this open data, to make the best out of these comparisons, and maybe contribute to the development of a culture of public debate based on facts and data.”

It’s a pretty grand hope. EDJNet is so far only in its fourth official month of a three-year grant period. It’s funded through the European Commission, with €975,000 (USD $1.2 million) to distribute to the two main coordinators OBC Transeuropa and VoxEurop, and the other partners in the network. Those two collaborators get the largest share of the funding, followed by Alternatives Economiques, which is producing more stories than other partners.

(The full European Commission grant is for €1.95 million, shared equally between the EDJNet initiative and another collaborative news hub spearheaded by news agencies Agence France-Presse, Deutsche Presse-Agentur, and Agenzia Nazionale Stampa Associata, called the European Data News Hub. That hub has been publishing since the summer and also makes news spots and multimedia pieces in multiple languages available for all news organization to use, though its materials aren’t all data-driven stories.)

Participating news organizations can seek out datasets for themselves, run their own analyses, do their own reporting, and publish the resulting stories on their own platforms in the original language of their outlets. VoxEurop then handles making sure stories are translated, along with accompanying charts and graphs and other visualizations, into English, French, German, Italian, Polish, and Spanish. Editorial operations are conducted over an EDJNet Slack. All published stories are then freely available to partner or non-partner organizations.

Stories appearing as part of the network, whether short hits or more in depth features, must have solid data sets as their backbone, and use a Europe-wide lens on political and socioeconomic changes in the region: a comparison of xenophobia in European cities (“despite the growing influence of racist movements, xenophobia is not generally on the increase”), for instance, or the political decline of social democrats across Europe (“its vote share has fallen in 15 of the 17 countries we examined”). Its graphics are visually plain, intentionally so, since they all need to be individually translated, and need to work on mobile and potentially live on many different organizations’ sites.

As it builds up a story base and improves tools, the EDJNet team is open to any new interested partners. One priority is to add a news partner from every EU country. The work of collaborating is itself also a useful exercise, Gian-Paolo Accardo, VoxEurop’s editor-in-chief who also coordinates EDJNet, told me.

“We strongly believe collaboration is the future of journalism,” Accardo said. “There are examples around us of more and more collaborative networks that are working just fine — ICIJ being one of the best. As resources get thinner and thinner over time for newsrooms, it would be better for us to collaborate, especially if you’re not operating in the same markets.”

The team is planning to roll out data journalism tools for the wider community, including a quote finder that will conduct sentiment analysis and a virtual help desk run by OBC Transeuropa to field questions from journalists around software or statistical methodology, something along the lines of the NICAR email list or the Data Journalism Awards Slack run through the Global Editors Network (GEN has also teased the launch of its Data Journalism Den next month, intended to be a global community for people working in data journalism and related fields). It’s also preparing for a couple of longer-term, collaborative investigations, as well as more shareable video offerings.

A feature called Stats Monitor, built by the Swedish Journalism++ and Netherlands startup LocalFocus, is rolling out soon on the EDJNet platform. The tool hooks into the API of European Commission’s database of stats about Europe — packed with datasets but not wholly user friendly — and alerts the network Slack group about relevant trends or changes to data in case a partner news outlet would like to write a story. Roughly 15 to 20 percent of EDJNet’s stories so far are based on original datasets compiled by partners, Accardo estimated; data mined from the Eurostat portal has been the primary resource.

Organizers are simultaneously tackling that perennial question of a feasible business model. How can the network pay for the time-consuming work of coordinating a dozen-plus news outlets, accurately translating everything into multiple languages, preparing data sets and stories, and maintaining tools like the Stats Monitor and the help desk, beyond the initial three-year period supported by the grant? Paid subscriptions for access to various features are likely. Accardo said he couldn’t reveal details yet, but that the team had been drawing up ideas for financial sustainability from the start.

“There’s the sustainability question. I would also say a medium-term goal is skill building: journalists who are now working on data who were not data journalists in the first place,” Accardo said. “Another is network building: we have brought newsrooms and journalists who are not used to working together, across languages and borders, to work together in a collaborative, not competitive, way.”

“We’re still in the first year, and there will be two intense more years ahead of us,” Sighele said.

Image of MEPs voting in Brussels, copyright European Union 2011 PE-EP/Pietro Naj-Oleari, used under a Creative Commons license.

Volt Data Lab grew from a personal blog for coding experiments to a full-fledged data storytelling agency

Natasha Madov — Mon, 22 Jan 2018 14:38:11 +0000

It’s been a tumultuous few years of Brazilian news. A year after the World Cup frenzy and the presidential election that ended in an impeachment a few months later, newsrooms turned inward: Which would be the next to downsize? As company after company laid off employees, some journalists in São Paulo began to wonder just how many reporters and editors had become unemployed in the shrinking of the news industry in Brazil in the past couple of years.

Sérgio Spagnuolo was among those wondering. Spagnuolo, at the time a freelance business reporter who had also worked at the UN, Reuters, and Yahoo, recently started dabbling in data journalism, and decided the question of layoffs in newsrooms nationwide was the perfect space to explore new data skills.

Arriving at an actual number turned out to be an arduous task. There was no semblance of a centralized database that tallied job losses. “Brazil’s Labor Department counts all accredited journalists, no matter if they work in newsrooms, PR agencies, or corporate communications, but only if they are full-time employees,” Spagnuolo said. “Unions and associations also didn’t count them. Many companies hire journalists as contracted labor, which is against the law, so they were not eager to help, either.” So he sourced the numbers through news reports and tips from friends, and after a full month, published his self-funded project to Medium, under the Volt Data Lab banner. He titled it “A Conta dos Passaralhos” (passaralho is Brazilian newsroom slang for layoffs that combines the Portuguese words for bird and the male genitals).

Spagnuolo found that more than a thousand journalists had been laid off by 50 newsrooms around the country since 2012. The project made a splash in the Brazilian news industry and made a name of Volt Data Lab, then still a side project.

Volt has evolved from a blog hosting Spagnuolo’s own data-driven explorations to an agency specializing in data journalism. It now provides data-based stories and reports to Brazilian legacy newsrooms, digital news startups, and nonprofits, as well as to PR and advertising agencies. The outfit is currently made up of Spagnuolo and one other full time reporter, with freelancers on contract for specific assignments.

Depois de um 2016 com menos demissões em redações, vimos uma disparada de ocorrências de "passaralhos" neste ano, segundo balanço mais recente da Conta dos Passaralhos, projeto do @voltdatalab https://t.co/H3nRu8BT9q pic.twitter.com/2dKz2nWYa6

— Sérgio Spagnuolo (@sergiospagnuolo) December 19, 2017

Volt started as not much more than a personal workspace for Spagnuolo to experiment with data and coding. On his own time, Spagnuolo canvassed publicly available datasets and published stories with his findings (this one, for instance, about the assassinations of Brazilian environmentalists).

“Nobody cared and nobody read it,” Spagnuolo told me, laughing. But following the success of “A Conta dos Passaralhos,” he started receiving freelance assignments for data-driven stories, which he published under the Volt byline.

This evolution coincided with a growing interest among Brazilian newsrooms, slim as they were, in beautifully told data stories online, spurred by the passing of a local version of the Freedom of Information Act in 2012.

“It unearthed a trove of government information that was either difficult or unavailable before,” said Tai Nalon, cofounder of Aos Fatos, a political fact-checking startup that partners regularly with Volt on news stories and other reporting projects. (Their latest effort together is a fact-checking bot, funded by Facebook.)

At least four large legacy newsrooms have created dedicated news desks since 2012, such as O Estado de S.Paulo’s Estadão Dados. A piece Estadão Dados produced on federal university loans became the first work of data journalism to win a Prêmio Esso, Brazil’s top journalism award. On the digital side, a prominent example is Nexo Jornal, an explainer-focused digital news startup that’s mastering charts, graphs, and interactives.

“We realized that we had a demand for better online experiences among our audience, coming from our millennial readers in particular. They naturally consume digital news and are not willing to subscribe to print,” said Leandro Demori, who was the online editor for Piauí, a longform journalism magazine, when I first spoke to him (he’s since joined The Intercept Brazil as its executive editor). Demori first contacted Spagnuolo when he was working on Medium’s launch in Brazil and looking for good content in Portuguese to feature in the platform. He found Volt’s layoffs report. The two then worked on several data-based stories for Piauí’s site.

The new availability of datasets and openness to data-based reporting in newsrooms were met by Brazilian reporters eager to hone their own data skills. Not all Brazilian journalism schools offered comprehensive data journalism classes, but initiatives like the Knight Foundation’s MOOCs on data and programming in Portuguese attract thousands of students. Online courses from the Brazilian association for investigative journalism Abraji on data journalism and the freedom of information law are consistently in high demand.

Spagnuolo experienced the boom in interest first-hand. Volt opened its own journalism fellowship last year and received an overwhelming 360 applications in a few weeks with little promotion other than word of mouth (“I thought I would get 20, 30 applications at the most,” Spagnuolo told me). The three-month paid fellowship went to Renata Hirota, a former reporter turned statistics major, who became Volt’s second full-time employee.

Its projects were time-consuming, and Volt was still in search of a sustainable business model. As part of the 2016 Tow-Knight Center for Entrepreneurial Journalism cohort, Spagnuolo started thinking seriously about how to turn Volt into a real business (disclosure: I attended the same program in 2014). He devised a scalable B2B model that would offer data visualization packages for small newsrooms. Back in Brazil, he found there wasn’t enough demand to make that model work, and began teaching courses on data journalism, while continuing to work on one-off projects under the Volt byline.

But he had thus far been relying only on people in his network to refer him for projects, not seeking out clients himself.

“Like many journalists, I wasn’t comfortable with this role. I wanted someone else to do it for me,” he said. But after encouragement from a mentor, he changed his perception. “I realized that nobody would sell Volt better than myself. It was a game change.” A week after that talk, he closed contracts with new clients: PR and advertising agencies. He had found a new revenue stream in whitlabel data-based reports for other companies.

Spagnuolo wouldn’t disclose yearly revenue, but such large-scale projects and white-label reports now account for 84 percent of Volt’s revenue. The rest is divided between news stories, consulting, and data training courses. He keeps a lean full-time operation, but in 2017 was able to work with 18 freelancers, among them journalists, designers, and developers.

Volt Data Lab as an agency has had no shortage of business. One such large-scale project, Atlas da Notícia (“News Atlas”), a partnership with the nonprofit Projor that was released in November, mapped the scarcity of local print and digital outlets in Brazil. It found that only 1,125 out of 4,500 Brazilian cities had at least one news outlet, meaning that 35 percent of the country’s population has no substantial source of local news.

Atlas da Notícia (News Atlas) now has an English landing page! In this project you can check out a database of more than 5,000 media companies in Brazil, as well as aggregate analysis #ddj #localjournalism #datajournalism https://t.co/psq8RCA45G

— Volt Data Lab (@voltdatalab) December 11, 2017

2018 will be another momentous year, marking the first presidential elections since Dilma Rousseff’s impeachment at the end of 2015 and the FIFA World Cup Russia. Volt’s forthcoming work includes data projects around these two events, as well as continued work on the News Atlas, and other projects with mainstream newsrooms. It’s also working on a podcast about the impact of open data in everyday life.

At the close of 2017, Spagnuolo again updated Conta dos Passaralhos, the layoffs project that had started it all. 2017 proved to be the second-worst year in employment numbers in the news industry since 2012. Volt Data Lab has its work cut out for it.

Wall in Brazil by Gustavo Minas, used under a Creative Commons license.

Cold, hard numbers will drive the stories on this Internet-crawling company’s new media arm

Christine Schmidt — Wed, 17 Jan 2018 14:53:52 +0000

Google Trends, but for more than just searches and not freely public. The Bloomberg Terminal, but for data trails over time. Alternative data, but for journalists. (The Wall Street kind, not the Kellyanne Conway kind.) The price of a Chipotle burrito bowl, but comparing the price differences across zip codes.

Thinknum Media launched this morning under the leadership of longtime tech journalist Joshua Fruhlinger (formerly of Aol Tech, Engadget, and The Wall Street Journal) and fintech company Thinknum, which crawls the Internet to provide data on other businesses to subscribers. Fruhlinger leads the team of writers charged with taking the data and building “facts-only” stories around it, which other news organizations could aggregate from or could prompt business-minded folks to buy into Thinknum’s database to monitor in the future.

“As a journalist who has been everywhere from Engadget to The Wall Street Journal to TMZ and seen all different sides of the reporting world, one thing I’ve learned is that facts and actual trends are some of the most fertile grounds for being able to tell a story,” Fruhlinger told me. Or, as he put it in his intro post, “There is a universe of numbers out there, and I want to tell their stories.”

Thinknum’s cofounders approached Fruhlinger about the media opportunity for their database, which is built by frequent crawls across the Internet in search of quantifiable data, such as the aforementioned Chipotle prices. The project has been in the works for over a year. Fruhlinger says he was drawn by the cold, hard numbers Thinknum has been stockpiling since its founding in 2013: “…pulling that data every six hours from all of Chipotle’s locations, parsing it into a database, and normalizing it into an interface that an investor has a subscription to,” he explained for the burrito bowl example. “What’s proprietary here is that we’re collecting it on a regular basis and normalizing it.”

Nasdaq quoted Thinknum cofounder Justin Zhen in 2016:

Take a company like Home Depot for example. With Thinknum I can quickly pull up Home Depot store locations on a map and then overlay it with locations of a competitor such as Lowes. Then I can connect this information to demographic data such as median household income by county.

With this I can determine where income is going up and which of these two companies will benefit from that because they have more stores in or near that area.

We can look at unemployment rates and see which restaurant chains are going to lose business. We can look at weather data and if it rained a lot this past quarter which companies will benefit and which will be hurt by that.

Stories produced by Fruhlinger, Jeremy Bloom, and three freelance writers are based on trends ranging from Amazon reviews to the northernmost place on Earth to buy an iPhone (Barrow, Alaska) to how a dating website is stealthily faking its user numbers to BuzzFeed’s hiring halt:

The Wall Street Journal reported that BuzzFeed was “on track to miss its revenue target for this year by a significant amount.”

The report noted that BuzzFeed was “targeting revenue of around $350 million in 2017 but is expected to fall short of that figure by about 15% to 20%.”

This news made their investors sad kitties. Which in turn panicked management. At Thinknum, one of the things we track is job listings on LinkedIn, and you can see it in the numbers: Buzzfeed’s job listings started to contract.

Think of it as content marketing: You can’t access the full data on your own unless you buy into Thinknum’s database. (Or otherwise manually enter in zip codes to Chipotle’s website to find burrito bowl prices, or count BuzzFeed’s job listings on LinkedIn every day.) So the journalism serves as an ad for the paid product. But it’s still an interesting way to find legitimate data to write about, especially without PR people for the companies themselves pitching numbers to journalists.

Beyond the Thinknum Media website, you can get their stories on social media and on a forthcoming weekly newsletter. You can also tune into Ann Arbor’s top country radio station every Wednesday morning to hear Fruhlinger present fun facts from Thinknum Media stories on the “Breakfast with Bubba” show.

Photo by Fabio used under a Creative Commons license.

How this local news co-op gets its members interested: Getting them involved in the production of news

Liam Corcoran — Fri, 03 Nov 2017 12:25:10 +0000

“You buy in, so we can’t sell out.”

That’s one of the taglines the Bristol, England-based local news organization Bristol Cable has adopted recently.

“Basically everywhere is the sentiment that the mainstream media, particularly the tabloid media, is not really serving the needs of the public as a whole,” said Adam Cantwell-Corn, who cofounded the Cable with Alec Saelens and Alon Aviram. “What we need to do is transfer that commonly held opinion into positively framed: Okay, here’s something we can do about it.”

The Cable is a quarterly print magazine with a circulation of around 30,000 and a website that publishes around five pieces a week, both of which are free for anyone to read. Run as a co-operative, the publication takes its direction almost entirely from its 1,800 members, who pay an average of £3 ($4) a month for access to Bristol Cable events and a vote in how the publication operates. Apart from membership fees, the Cable is supported through grant funding, print advertising, and workshop commissions. It’s aiming to hit 3,400 paying members within the next year.

The Cable, with its full-time editorial team of five, isn’t interested in maintaining a breaking-news desk. As the team sees it, its journalistic strength lies in going beyond breaking-news headlines to report on stories from Bristol that aren’t already part of the mainstream news cycle. A crucial part of that strategy is developing the diverse network of bought-in members who can pitch story ideas, bring their personal or professional experience to assist with investigations, and to help deliver the magazine to more readers in their own communities.

Much of the inspiration for the co-op model stemmed from frustration with the decline of local journalism in the region. Local media in the U.K. has been hit hard through consolidations, cutbacks, and closures in recent years, leaving gaps in vital ground-level reporting. 18 local newspapers around the U.K. closed over summer 2017 alone.

The Cable has bet on hyperlocal stories with the belief that Bristolians can be persuaded to pay for quality local journalism.

“A local media desert has started to appear, with failing business models behind it. As a result, we’ve identified a loss of quality in the existing media,” Cantwell-Corn said. “The idea was to step into that gap, create a niche, and look at stories that didn’t have much traction, even though they dealt with big issues that were in people’s daily lives.”

At its monthly events, the Cable staff brings members behind the scenes on major stories that the team has been working on, relating the reporting and the expertise of industry professionals to members’ day-to-day lives. Past events have included Q&As between journalists and local health care providers or talks with an investigative journalist digging into the local housing crisis, and were open to the public for a small fee or free for Bristol Cable members, “refugees and asylum seekers, and people in financial hardship.”

“People are curious to know what the story is behind the headlines. How did we break a story, what impact did it have, what were the implications in terms of resources that were invested in it,” Saelens said. “That was something that we could offer the members as a privileged audience, so we decided to tap into it.”

The events for members are as much an effort to try to restore people’s trust in local media as they are a means to grow the Cable’s own presence in Bristol, Saelens said.

“It’s about re-establishing the bonds that have been so severely lost between local media and the community that it supposedly serves,” he told me. “I think that’s been lost within much of the corporate media.”

Cable stories often try to dig into national issues on a local level, such as this analysis of fire safety measures in Bristol high-rises following the Grenfell Tower disaster in London, or this investigation into the relationship between an atomic weapons institution and a local university. Another strand of coverage comes from collaborating with other local media outlets, and country-wide organizations like the Bureau for Investigative Journalism, which helped support a Cable investigation into racial profiling at U.K. immigration checkpoints.

Not all Cable members choose to have regular involvement with the running of the Cable, instead staying involved through voting: at co-op meetings, members rank issues such as housing, education, and immigration that they think should be covered more, and vote on how the Cable uses its finances and other resources.

The site regularly publishes its spending reports, membership numbers, and details about its ongoing projects. Their open reporting is fastidious; one report shows the vote breakdown on whether or not the Cable should apply for a Google Digital News Initiative grant (68.2 percent were in favor).

A smaller, core group of its members, along with the Cable’s full-time staff, are responsible for a range of operations, from distributing copies of the magazine at local markets to investigating and writing stories themselves.

“At the widest level, most people are involved with big questions about principles, priorities, and key ethical questions,” Cantwell-Corn said of the co-op’s decision-making process. “As it filters down to the practicalities of implementation, it gets down to the sub teams, the editors and the individuals.”

Diversity in its membership base has also improved the Cable’s ability to cover Bristol issues more comprehensively. Last year, Cable staff ran a free mentoring course through its Media Lab — the organization’s training, education, and innovation arm — which provided journalism training to non-journalist locals interested in writing for the magazine. The Cable has been able to bring on “several dozen” freelancers this way.

Recent statistics from Bristol City Council show that 187 nationalities and over 90 different languages are represented in the city. The Cable has been trying to cover issues relevant to these immigrant communities, as well as translate stories so that they reach wider audiences. Features of interest to particular communities have been translated into relevant languages like Spanish or Somali. The Cable also hosts roundtable discussions with community leaders to get their perspective on relevant issues, which are turned into podcasts.

Changing perceptions about autism: a Somali parent is helping to educate others in their community https://t.co/4GTTm6OLYB

— The Bristol Cable (@TheBristolCable) November 1, 2017

Cantwell-Corn is optimistic about the Cable’s co-op model and is convinced that it could be replicated by other local journalism outfits. But the training sessions, educational events, and efforts to improve ties with its community aren’t the only things that the team is juggling. The team understands the Cable must continue to grow its readership and paying membership in order to pay contributors and become more financially self-sufficient. (As the group is registered as a co-op, they’re legally obliged to reinvest all profits back into the Cable.)

Saelens is realistic about the financial challenges of such a business model. “We need to be hard-nosed about that. We may be a co-operative, but that doesn’t mean we shy away [from finances],” he said. “On the contrary, we need to be even more strict and disciplined about the way that we do things in order to obtain the objectives that we have.”

Sometimes, making money at all can feel like an uphill battle, the co-founders will readily admit. Cantwell-Corn said that access to grant funding has become more competitive as other cash-strapped U.K. newsrooms look for other ways to finance their reporting. The Cable also faces the same challenges as other print media in sustaining their own print advertising. And while it publishes tens of thousands of print editions each year, it’s still in the relatively early stages of developing a robust online presence.

Cantwell-Corn is clear on what his team needs to do to almost double their paying members by late 2018.

“We basically have to continue to make the argument, that hasn’t been made for some time, or hasn’t been made well enough, that media is a public good, and therefore it needs to be paid for. We want to keep it free at the point of access, but that means we need to get people to stump up at a certain level in order for that to keep happening.”

Bristol Cable issues image used with permission.

“Instagram for data”: Grafiti wants to make it easier to create and share data visualizations on smartphones

Ricardo Bilton — Thu, 02 Nov 2017 13:05:24 +0000

One of the features that made Instagram such a runaway success was its simple-yet-robust set of photo editing tools, which made it easy for users to tweak photos before sharing them with friends.

The developers of Grafiti want to bring a similar kind of democratization to the real-time production of charts and data. The app, one of 11 ventures to emerge out of startup accelerator Matter’s latest class, is a suite of tools designed to make it easier for smartphone users to explore verified datasets, create and tweak charts, and share their findings with others via texting or any of the big social networks — all while on-the-go.

Farhan Mustafa, Grafiti’s CEO, said the app was a product of his experiences in the field reporting for Al Jazeera. While other journalists were able to use their smartphones to tell stories in the moment via text, photos, and video, it was nearly impossible to do the same with charts and data, which required more sophisticated software on his laptop. “I had some stories I wanted to get out there, but I just couldn’t do it,” he said. “All I had was my phone.”

Grafiti aims to streamline the process of introducing data and charts to storytelling. When users open the app, they can search Grafiti’s archive of verified datasets and then visualize elements of them using the app’s charting tools, letting them restructure and restyle their charts before sharing them with others. There’s also a conversational component. During his demonstration at Matter’s demo day in October, Mustafa used a hypothetical feud between two friends over the YouTube popularity of the songs “Despacito” and “Gangnam Style” to show how Grafiti’s charts can be the basis for conversations.

“Our core question is: How can we introduce facts into storytelling on a more immediate basis?’ Mustafa said. “We want to make it easier for people to respond to facts in the moment, especially in social conversations. These days, we have all these conversations about policy and immigration where data comes up, and it’s amazing how much plain text is still used to explain these ideas.”

Grafiti is also trying to make its way into newsrooms. Here, the pitch is centered on a central issue bogging down news organizations, particularly small ones, as they try produce more data-driven stories. In a report published in September, Google News Lab found that over 74 percent of data-driven stories take more than a day to create. Grafiti is designed to speed up that process, largely by collecting the data and cleaning up, which together take up “80 percent of the work,” said Mustafa. Simplifying the process will also help make it easier for reporters who lack intensive data skills to incorporate data in their reporting, which is particularly vital for smaller newsrooms “We want to help make it so that this a normal thing that anyone can use, not limited to teams at places like The Upshot that spend lots of money hiring a great team with interactive developers,” said Mustafa.”We want to figure out how to make all of this more accessible to local newsrooms, which can just plug and play the visualizations.”

Grafiti is in different stages of pilot projects or has had conversations with news organizations including The Economist and Thomson Reuters, which are looking for ways to integrate the app in their news production and presentation. UNICEF, too, is using the tool. Francesco Marconi, strategy manager at the AP, said that the organization has experimented with Grafiti, which “showed that, if fully integrated, it could democratize how data stories are told, particularly in situations where visualizations can provide context to breaking news.” He added that the AP has encouraged Grafiti to continue to build out its pool of datasets, which needs to be more expansive if Grafiti wants reporters to keep coming back to use the tool.

More broadly, Grafiti wants to do its part to help expand the public’s overall comfort with using and understanding data. But Mustafa says that, thanks to the proliferation of fitness trackers, that’s an increasingly easier lift. “Everyone is checking their steps today, so this idea of checking data and understanding it isn’t as crazy of a thing anymore. People are looking at data all the time and don’t even realize it.”

Not a revolution (yet): Data journalism hasn’t changed that much in 4 years, a new paper finds

Laura Hazard Owen — Mon, 16 Oct 2017 17:51:46 +0000

When you hear the words “data journalism,” you also often hear words like “revolution” and “future.” But — according to a new paper that looks at a couple hundred international data journalism projects nominated for awards over four years — most of the journalism itself hasn’t changed as much as you’d think: It still mostly covers politics, it’s still labor-intensive and requires big teams, it’s still mostly done by newspapers, and it still primarily uses “pre-processed public data.”

“Our findings challenge the widespread notion that [data-driven journalism] ‘revolutionizes’ journalism in general by replacing traditional ways of discovering and reporting news,” write Wiebke Loosen, Julius Reimer, and Fenja de Silva-Schmidt, in a paper published online last week in the journal Journalism. (It’s paywalled.)

Loosen and Reimer (both from the Hans-Bredow-Institut for Media Research in Hamburg, Germany) and De Silva-Schmidt (University of Hamburg) analyzed 225 projects that were nominated finalists (not just submitted) for the Data Journalism Awards between 2013 and 2016, logging data sources and types, visualizations, interactive features, topics, and producers, to see how projects changed over time, how award winners differed from projects that were only nominated, and where there might be room for innovation and improvement. Why look at these projects? They’re “what the field itself considers to be significant examples of data-driven reporting,” the authors write, and the winners are likely to shape future development of the field.

Here are some of the trends seen across the 225 projects:

— Data journalism is still very labor-intensive. Of the 192 projects in the sample that had a byline, they named on average “just over five individuals as authors or contributors.” About a third (32.7 percent) of projects were done in collaboration with “external partners who either contributed to the analysis or designed visualizations.”

— Newspapers are still doing the most data journalism and winning the most awards for it. A total of 43.1 percent of the nominees, and 37.8 percent of the award-winners, were submitted by newspapers. After that:

Another important group comprises organizations involved in investigative journalism such as ProPublica and The International Consortium of Investigative Journalists (ICIJ), which were awarded significantly more often than not (total: 18.2%, DJA-awarded: 32.4; only nominated: 15.4). Print magazines and native online media (8.4% each), public and private broadcasters (5.8 and 5.3%), news agencies (4.4%), non-journalistic organizations (4.0%), university media (3.1%) and other types of authors (2.7%) are represented to much lesser extents. Interestingly, stories by print magazines, news agencies and non-journalistic organizations have not been awarded at all.

— It’s a lot of politics. Almost half the analyzed pieces (48.2 percent) covered a political topic, followed by “societal issues” like census results and crime reports (36.6 percent), business/the economy (28.1 percent), and health and science (21.4 percent). “Culture, sports, and education attract little coverage (2.7% to 5.4%).” Most of the projects also dealt with only one topic category, rather than “spread[ing] into two or more different topical areas (e.g. political decisions and their societal impact by investigating how weapon laws influence the number of mass shootings).” The authors wonder if this is a function of industry awards being biased toward more serious topics.

— Data journalism is becoming more critical. Fifty-two percent of the pieces analyzed included “elements of criticism (e.g. on the police’s wrongful confiscation methods) or even calls for public intervention (e.g. with respect to carbon emissions)…This share grew consistently over the four years (2013: 46.4% versus 2016: 63.0%) and was considerably higher among the award-winners (62.2% vs. 50.0%”).

— Most projects still rely on official (rather than originally collected) data.

Probably not surprisingly, award-winning stories were more likely to contain “data obtained through requests, own collection or leaks.” The authors were surprised, however, that “despite data journalism’s often-cited association with openness and transparency, in over two-fifths of pieces, journalists did not indicate at all how they accessed the data they used.”

— Visualizations haven’t gotten much more sophisticated. Static images and charts were still found the most frequently; “typical combinations of visualizing elements include images with simple static charts (40.0% of all cases) or with maps (32.4%) as well as maps coupled with simple static charts (31.1%).” Award-winning pieces were more likely to be visually rich.

— Interactivity is “a quality criterion,” but sophisticated interactivity is really rare. Zoomable maps and filter functions are most common, perhaps because they’re often already included with free software tools that data journalists are likely to use. “Our results are in line with others’ observations of a ‘lack of sophistication’ in data-related interactivity…they often include only ‘limited possibilities for the audience to make choices’ or ‘minimum formal interactivity’ simply ‘for interactivity’s sake.'” It’s also unclear how much audiences actually want interactive visualizations in news stories.

Overall, the authors find, data journalism is still labor-intensive, slow to respond to breaking news, and reliant on the domains that already regularly produce data, such as elections. “Lacking those important characteristics of journalism — currentness and thematical universality — data journalism is more likely to complement traditional reporting,” the authors write, “than to replace it on a broad scale.”

Visualization of Wikipedia edits by Fernanda Viégas used under a Creative Commons license.

尚未成为一场革命：调查发现数据新闻在四年间并未改变很多

Laura Hazard Owen — Mon, 16 Oct 2017 17:50:19 +0000

当你听到“数据新闻”这个词汇，你常常会接着听到“革命”或者“未来”。然而——根据一篇新的调查了过去四年数百个被提名获奖的国际数据新闻项目的论文——大部分新闻并未像你想象那般改变很多：它仍旧报道政治、仍旧是劳动密集型、仍旧需要大团队、仍旧大多依靠报纸打造，并且仍旧主要使用所谓“预处理公共数据”。

“我们的发现与广为流传的（数据驱动的新闻）将会因为改变传统的发现和报道新闻的方式从而‘革命化’新闻行业的预言相抵触”，Wiebke Loosen, Julius Reimer和Fenja de Silva-Schmidt三位作者在上周发表于新闻杂志的一篇论文中写道。

Loosen和Reimer，都来自位于德国汉堡的Hans-Bredow媒体研究所，加之汉堡大学的De Silva-Schmid三位学者共分析研究了从2013-2016年被数据新闻奖列为最终提名的225个项目（不仅仅是提交而已），登录数据与类型、可视化、互动特质、话题与制作，来分析项目是如何依时间而改变的，获奖者与仅获提名者区别何在，在哪些地方还有创新和提升的空间。为什么研究这些项目呢？它们是这个领域本身认为是数据驱动式报道的显著案例，”这些作者写道，并且这些获胜者很有可能将会塑造这一领域的未来发展。

这里是纵观225个项目后得出的一些趋势：

— 数据新闻仍旧是劳动密集型的。 在192个有署名的项目中，平均都有“超过五个人作为作者或者贡献者”。大约1/3（32.7%）的项目是与“帮助分析或者视觉设计的外界人士搭档合作的”。

— 报纸仍旧进行着大部分的数据新闻工作，并且赢得了大多数相关奖项。 占总数43.1%的提名和37.8%的获奖者是由报纸提交的，并且：

其他的重要群组由涉及调查新闻报道的ProPublica和国际调查记者联盟(ICIJ)等组成，他们更经常获奖而非落选（总数：18.2%，获奖：32.%，只提名：15.4%）。纸面杂志和全国在线媒体（各为8.4%），公共和私人广播电视（5.8和5.3%），新闻通讯社（4.4%），非新闻机构（4.0%），高校媒体（3.1%）以及其他类型的作者（2.7%）相对就要少得多。有趣的是，纸质杂志、新闻通讯社以及非新闻机构的报道竟然完全没有获奖。

— 多数关于政治。 几乎半数的作品（48.2%）报道政治话题，接下来是“社会议题”比如民调结果以及犯罪报道（36.6%），财经新闻（28.1%）和健康与科学报道（21.4%）。“文化、体育和教育只吸引到很少量的报道（2.7%和5.4%）。”大多数项目只关注一个分类，而非“涉足两个或更多的话题领域（比如通过调查关于枪支武器的立法对于大型枪击事件的影响了解政治决策和其社会影响）。”作者发问，是否这类奖项的一大功能就是偏袒严肃议题。

— 数据新闻正在变得更具批判性。 52%的作品分析包含了“批判元素”（比如关于警察不当收缴充公的报道）或者呼吁公共干预（比如关于碳排放）……这个组成在过去四年不断提高（2013年占比46.4%，到2016年占比63%），而在获奖者中占比更是相当高（62.2% vs. 50.0%)”。

— 大多数项目仍旧依赖官方数据（而非一手原始）数据。 或许并不出人意料，获奖报道更多都包含“通过质询、自己采集或者泄露后得到的数据”。这些作者对于“尽管数据新闻往往被与公开透明相提并论，但在2/5的项目中，记者都没有指明他们获取所使用数据的明确方式。”

—可视化变得更加复杂。 静态图片和图表依然最为频繁地被使用；“通常是两者的结合 (占比全部案例的40.0%) 或者是使用地图（32.4%）以及配合简单静态图标使用的地图（31.1%）。” 获奖作品通常视觉上更加丰富。

— 互动性成为了“质量标准”，但复杂精妙的互动又很稀缺。 可伸缩地图和过滤功能最为常见，可能因为它们常常本来就已经包含了数据新闻最常使用的免费的软件工具。“我们的结果与其他观测一致，在数据相关的互动性方面缺乏复杂性…它们常常只包含“使得受众能够做选择的有限可能性”或者只包含“为了互动而互动的最小的互动性”。同样，在新闻中受众究竟有多需要可视化互动性也并不清晰。

总而言之，论文作者发现，数据新闻仍是劳动密集型的，在回应突发新闻方面比较缓慢，更依赖于那些本就周期性产生数据的领域，比如选举。“缺乏那些新闻行业重要的特质——当下性以及主题普适性——数据新闻更像是传统报道的补充，”作者写道，“而非是其更广泛意义上的取代者。”

The Chinese version of this story was originally published at IJNet.

Visualization of Wikipedia edits by Fernanda Viégas used under a Creative Commons license.

Não é uma revolução (ainda): jornalismo de dados não mudou tanto

Laura Hazard Owen — Mon, 16 Oct 2017 17:50:11 +0000

Quando você ouve as palavras “jornalismo de dados”, muitas vezes também escuta “revolução” e “futuro”. Mas de acordo com um novo estudo que examina centenas de projetos internacionais de jornalismo de dados nomeados para prêmios nos últimos quatro anos, a maioria dos projetos de jornalismo não mudou tanto quanto se esperava: ainda cobre principalmente a política, ainda é muito intensivo em mão-de-obra e requer grandes equipes, ainda é feito principalmente por jornais e ainda usa principalmente “dados públicos pré-processados”.

“Nossas descobertas desafiam a noção generalizada de que [o jornalismo baseado em dados] ‘revoluciona’ o jornalismo em geral, substituindo formas tradicionais de descobrir e apurar notícias”, escreveram Wiebke Loosen, Julius Reimer e Fenja de Silva-Schmidt em um artigo publicado online na revista Journalism.

Loosen e Reimer, ambos do Instituto Hans-Bredow para Pesquisa de Mídia em Hamburgo, Alemanha, e De Silva-Schmidt, da Universidade de Hamburgo, analisaram 225 projetos que foram finalistas (não apenas inscritos) de Prêmios de Jornalismo de Dados (DJA, em inglês) entre 2013 e 2016, registrando fontes e tipos de dados, visualizações, recursos interativos, tópicos e produtores, para ver como os projetos mudaram ao longo do tempo, como os vencedores dos prêmios diferiram dos projetos que foram apenas nomeados e onde pode haver espaço para inovação e melhoria. Por que olhar para esses projetos? Eles são “o que o próprio campo considera como exemplos significativos de reportagem baseada em dados”, escreveram os autores, explicando que os vencedores provavelmente conduzirão o desenvolvimento futuro do campo.

Aqui estão algumas das tendências observadas nos 225 projetos:

— O jornalismo de dados ainda requer muita mão-de-obra intensiva. Dos 192 projetos na amostra com reportagens assinadas, em média “pouco mais de cinco pessoas receberam crédito como autores ou contribuidores”. Cerca de um terço (32,7 por cento) dos projetos foram realizados em colaboração com “parceiros externos que contribuíram para a análise ou visualizações projetadas.”

— Jornais ainda lideram na produção do jornalismo de dados e ganham mais prêmios. Um total de 43,1 por cento dos candidatos e 37,8 por cento dos premiados foram enviados por jornais. Depois disso:

Outro grupo importante é composto por organizações envolvidas no jornalismo investigativo, como a ProPublica e o Consórcio Internacional de Jornalistas de Investigação (ICIJ, em inglês), que foram premiados significativamente com maior frequência (total: 18,2 por cento, concedido pelo DJA: 32,4, apenas indicado: 15,4). Revistas impressas e mídia online nativa (8,4 por cento cada), emissoras públicas e privadas (5,8 e 5,3 por cento), agências de notícias (4,4 por cento), organizações não jornalísticas (4,0 por cento), mídia universitária (3,1 por cento) e outros tipos de autores (2,7 por cento) são bem menos representados. Curiosamente, matérias impressas de revistas, agências de notícias e organizações não jornalísticas não foram premiadas.

— É muita política. Quase metade das matérias analisadas (48,2 por cento) abrangeu um tema político, seguido de “questões sociais”, como resultado do censo e crimes (36,6 por cento), negócios e economia (28,1 por cento) e saúde e ciência (21,4 por cento). “Cultura, esportes e educação atraem pouca cobertura (2,7 por cento para 5,4 por cento)”. A maioria dos projetos também tratou apenas de uma categoria tópica, em vez de cobrir duas ou mais áreas tópicas diferentes (por exemplo, decisões políticas e seu impacto social, investigando como as leis de armas influenciam o número de massacres)”. Os autores se perguntam se é porque os prêmios da indústria preferem tópicos mais sérios.

— O jornalismo de dados está se tornando mais crítico. Cinquenta e dois por cento das matérias analisadas incluíram “elementos de crítica” (por exemplo, nos métodos de confisco ilícito da polícia) ou mesmo chamadas para intervenção pública (por exemplo, em relação às emissões de carbono)… Esta parcela cresceu consistentemente ao longo dos quatro anos (2013: 46,4 por cento vs. 2016: 63,0 por cento) e foi consideravelmente maior entre os vencedores dos prêmios (62,2 por cento vs. 50,0 por cento).

— A maioria dos projetos ainda depende de dados oficiais (em vez de coletados pelo próprio projeto). Provavelmente não é supresa que as matérias premiadas foram mais propensas a conter “dados obtidos através de pedidos, coleta própria ou vazamentos”. Os autores ficaram surpresos, no entanto, que “apesar de associarmos mais frequentemente o jornalismo de dados com abertura e transparência, em mais de dois-quintos das matérias, os jornalistas não indicaram como acessaram os dados que usaram.”

— As visualizações não são muito mais sofisticadas. As imagens estáticas e os gráficos ainda foram encontrados com mais frequência. “As combinações típicas de elementos de visualização incluem imagens com gráficos estáticos simples (40,0 por cento de todos os casos) ou com mapas (32,4 por cento), além de mapas acoplados a gráficos estáticos simples (31,1 por cento)”. As matérias premiadas tiveram mais probabilidade de serem visualmente ricas.

— A interatividade é “um critério de qualidade”, mas a interatividade sofisticada é realmente rara. Mapas de zoom e funções de filtro são mais comuns, talvez porque muitas vezes já estão incluídos em ferramentas de software gratuitas que os jornalistas de dados provavelmente usarão. “Nossos resultados coincidem com as observações de outros de que falta sofisticação na interatividade relacionada a dados… geralmente incluem apenas possibilidades limitadas para o público fazer escolhas ou interatividade formal mínima só para adicionar algum tipo de interatividade.” Também não está claro o quanto o público realmente quer visualizações interativas em notícias.

No geral, os autores acham que o jornalismo de dados ainda é intensivo em mão-de-obra, lento para responder às últimas notícias e dependente das áreas que já produzem dados regularmente, como eleições. “Pela falta dessas características importantes do jornalismo (atualidade e universalidade de temas), é mais provável que o jornalismo de dados complemente as reportagens tradicionais”, segundo os autores, “do que substituí-las em larga escala.”

The Portuguese version of this story was originally published at IJNet.

Visualization of Wikipedia edits by Fernanda Viégas used under a Creative Commons license.

Lejos de la revolución: el periodismo de datos no ha cambiado tanto en cuatro años

Laura Hazard Owen — Mon, 16 Oct 2017 17:50:09 +0000

Cuando escuchas la expresión “periodismo de datos”, también escuchas palabras como “revolución” y “futuro”. Pero, según un nuevo documento que analiza un par de cientos de proyectos internacionales de periodismo de datos nominados para premios durante cuatro años, la mayor parte de estos proyectos no ha evolucionado tanto como se podría pensar: todavía cubre sobre todo política, sigue necesitando grandes equipos de trabajo, todavía está a cargo principalmente de los periódicos y aún utiliza “datos públicos preprocesados”.

“Nuestros hallazgos cuestionan la idea generalizada de que el periodismo basado en datos “revoluciona” al periodismo en general reemplazando las formas tradicionales de descubrir y reportear noticias”, escriben Wiebke Loosen, Julius Reimer y Fenja de Silva-Schmidt en un artículo publicado recientemente en la revista Journalism.

Loosen y Reimer, ambos del Hans-Bredow-Institut for Media Research de Hamburgo, Alemania, y De Silva-Schmidt de la Universidad de Hamburgo analizaron 225 proyectos finalistas (no solo presentados) a los Data Journalism Awards entre 2013 y 2016, registrando fuentes y tipos de datos visualizaciones, características interactivas, temas y productores, para ver de qué manera los proyectos cambiaron con el tiempo, en qué se diferenciaban los proyectos finalistas de los proyectos ganadores, y si hay espacio para la innovación y la mejora. ¿Por qué explorar estos proyectos? Porque son “lo que la espeialización en sí misma considera como ejemplos significativos de periodismo basados en datos”, escriben los autores, y es probable que los ganadores estén presentes en el futuro desarrollo del campo.

Algunas de las tendencias de los 225 proyectos:

— El periodismo de datos todavía requiere mucha mano de obra. De los 192 proyectos en la muestra que tenían firma, “un poco más de cinco personas figuraban como autores o contribuyentes”. Alrededor de un tercio (32.7 por ciento) de los proyectos se realizaron en colaboración con “socios externos que contribuyeron al análisis o diseñaron visualizaciones”.

— Los periódicos siguen a la cabeza de la producción del periodismo de datos y ganando la mayor cantidad de premios. 43.1 por ciento de los proyectos nominados y 37.8 por ciento de los premiados fueron enviados por periódicos. Después:

Otro grupo importante comprende organizaciones de periodismo de investigación como ProPublica y el Consorcio Internacional de Periodistas de Investigación (ICIJ), que fueron galardonados con mayor frecuencia (total: 18.2%, otorgado por DJA: 32.4%, solo nominado: 15.4%). Revistas impresas y medios nativos en línea (8,4% cada uno), emisoras públicas y privadas (5,8 y 5,3%), agencias de noticias (4,4%), organizaciones no periodísticas (4,0%), medios universitarios (3,1%) y otros tipos de autores (2.7%) están mucho menos representados. Curiosamente, las revistas impresas, las agencias de noticias y las organizaciones no periodísticas nunca han sido premiadas.

— Sobre todo es político. Casi la mitad de las piezas analizadas (48.2 por ciento) cubrían un tema político, seguido de “asuntos sociales” como resultados de un censo e informes delictivos (36.6 por ciento), negocios y economía (28.1 por ciento) y salud y ciencia (21.4 por ciento). “La cultura, los deportes y la educación atraen poca cobertura (2.7 por ciento a 5.4 por ciento)”. La mayoría de los proyectos también trataron con una sola categoría temática en lugar de “diseminarse en dos o más áreas temáticas distintas (por ejemplo, decisiones políticas y su impacto social, investigando de qué manera las leyes que regulan el uso de las armas influyen en la cantidad de tiroteos masivos)”. Los autores se preguntan si es una función de los premios de periodismo tener en cuenta sobre todo a los asuntos más serios.

— El periodismo de datos es cada vez más crítico. El cincuenta y dos por ciento de las piezas analizadas incluyeron “elementos críticos” (por ejemplo, sobre los métodos de confiscación ilegal de la policía) o incluso llamados a la intervención pública (por ejemplo, con respecto a las emisiones de carbono). Esta proporción creció consistentemente durante los cuatro años (2013: 46.4 por ciento vs. 2016: 63.0 por ciento), y fue considerablemente más alta entre los premiados (62.2 por ciento contra 50.0 por ciento).

— La mayoría de los proyectos aún se basan en datos oficiales (en lugar de ser recolectados por el propio proyecto).

Probablemente no sea una sorpresa que las historias con mayores probabilidades de ser premiadas contuvieran “datos obtenidos a través de solicitudes, colecciones propias o filtraciones”. Los autores se sorprendieron, sin embargo, de que “a pesar de la asociación del periodismo de datos con la apertura y la transparencia, en más de dos quintos de piezas, los periodistas no indicaron en absoluto cómo accedieron a los datos utilizados”.

— Las visualizaciones no se han vuelto mucho más sofisticadas. Las imágenes y cuadros estáticos fueron las visualizaciones más frecuentes. “Las combinaciones típicas de elementos de visualización incluyen imágenes con gráficos estáticos simples (40.0 por ciento de todos los casos) o con mapas (32.4 por ciento), además de mapas combinados con gráficos estáticos simples (31.1 por ciento). “Las piezas premiadas tenían más probabilidades de ser visualmente ricas”.

— La interactividad es “un criterio de calidad”, pero la interactividad sofisticada es muy rara. Los mapas ampliables y las funciones de filtro son muy comunes, tal vez porque suelen estar incluidas en herramientas de software gratuitas que usan los periodistas de datos. “Nuestros resultados coinciden con las observaciones de otros respecto de la ‘falta de sofisticación’ en lo que respecta a la interactividad relacionada con datos. A menudo incluyen solo ‘posibilidades limitadas para que el público tome decisiones’ o una ‘mínima interactividad formal’ que está allí solo para que haya algo de interactividad.” Tampoco está claro realmente si el público quiere visualizaciones interactivas en las noticias.

En términos generales, los autores descubrieron que el periodismo de datos sigue requiriendo mucha mano de obra, es lento para dar respuesta a las noticias de última hora, y depende de situaciones en las que se ya producen datos regularmente, como las elecciones. “Al carecer de esas importantes características del periodismo –actualidad y universalidad tematica–, es más probable que el periodismo de datos solo sea un complemento de la información tradicional” escriben los autores, “y no su reemplazo”.

The Spanish version of this story was translated by IJNet.

Visualization of Wikipedia edits by Fernanda Viégas used under a Creative Commons license.

A new program wants to help more people in news orgs — beyond journalists — get literate with data

Ricardo Bilton — Wed, 26 Jul 2017 13:30:53 +0000

News organizations these days are spending a lot of time and energy making sure that data plays a central role in their newsrooms. But fewer are putting as much effort into building that kind of data culture into their organizations overall.

That argument is core to the Data Culture Project, a new effort designed to help nonprofit organizations expand data literacy to more of their staff and leadership. While the program isn’t designed exclusively for journalism nonprofits, its creators are interested in seeing how, for example, data-literate reporters and editors can extend those skills to less technical people in other departments. The program’s activities, which focus on topics such as data storytelling, spreadsheet analysis, and test mining, are designed to be accessible, and assume no prior knowledge of how to work with data.

“When we’re talking data culture in a journalism organization, we’re not just talking about data journalism,” said Catherine D’Ignazio, assistant professor of civic media and data visualization at Emerson College. “We’re talking about shifting the culture of the whole operation, including the people at the top, to understand how data can provide insight into all parts of an organization.” D’Ignazio is developing the project alongside Rahul Bhargava, a research scientist at MIT Center for Civic Media.

To help develop the program further, the team is looking for roughly a dozen organizations to volunteer to participate in a three-month pilot of the program, which will involve three brown-bag lunch sessions in which organizations will work through some key ideas and activities. (Organizations interested in participating in a pilot program can apply here.)

Organizations that have a “data culture,” as D’Ignazio and Bhargava define it, are typically those whose leadership understands the importance of data and prioritizes its collection and management. More, these leaders also prioritize the idea that people in all parts of an organization — not just a few techies — are data literate. These staffers are trained and encourage to find uses for data, particularly when it comes to solving big problems within their organizations. (D’Ignazio and Bhargava also developed Databasic.io, a suite of tools designed to introduce users to basic data concepts.)

The Data Culture Project was developed with the understanding that organizations face plenty of barriers to making data literacy more core to their organizations. One of the most common of these barriers is that, for the leadership at many organizations, data is seen as something solely in the purview of IT departments. And in other organizations, data is controlled by evaluation departments, which must collect data because of funding stipulations.

The problem, argues D’Ignazio, is that when data literacy is limited to small numbers of people within an organization, the rest of the organization misses out. “There are a lot of folks in nonprofit organizations who are evaluation specialists and end up being the data people — but then the rest of the people in the organization don’t necessarily know what that group is doing,” D’Ignazio said. “We think there’s a lot more useful organizational knowledge and theres a lot of under-leveraged creativity with people who are not super skilled with data, but have deep domain knowledge with the areas they’re working in.”

Another problem occurs when organizations collect lots of data, but can’t figure out what to do with it. There’s the “boringness” barrier, which can be significant. These problems are real ones, but also ones that can be fixed as people outside technical circles learn to better use data within their own domains. While these people have a stake in how data is used at their organizations, they rarely have a seat at the table. “There’s often this bridging factor that is lacking,” said D’Ignazio. “If we want to make the case that data is good for democracy, it can’t only be good for democracy for IT people and tech people. It’s something all of us should be able to use and have access to as a common resource or common language.”

Photo of My Other Car Is Data Journalism sticker on a laptop by former Knight-Mozilla OpenNews fellow Mike Tigas used under a Creative Commons license.

The AP Stylebook now includes new guidelines on data (requesting it, scraping it, reporting on it, and publishing it)

Laura Hazard Owen — Wed, 31 May 2017 13:00:57 +0000

It’s fitting that, in a year when the Panama Papers investigation won the 2017 Pulitzer Prize for explanatory reporting (the entire leaked data set for that investigation totaled 11.5 million documents adding up to 2.6 terabytes), the Associated Press is releasing its updated 2017 Stylebook with a new chapter on data journalism.

“Government agencies, businesses and other organizations alike all communicate in the language of data and statistics,” the AP said. “To cover them, journalists must become conversant in that language as well.”

Here are a few of the AP’s data journalism recommendations:

— Get the data in searchable form, if you can. “In a records request for data, be sure to ask for data in an ‘electronic, machine-readable’ format that can be interpreted by standard spreadsheet or database software. The alternative, which is the default for many agencies, is to provide records in paper form or as scans of paper pages, which present an obstacle to analysis.”

— Scraping data should be a “last resort.”

Some website operators sanction this practice, and others oppose it. A website with policies limiting or prohibiting scraping often will include them in its terms of service or in a “robots.txt” file, and reporters should take these into account when considering whether to scrape.

Scraping a website can cause its servers to work unusually hard, and in extreme cases, scraping can cause a website to stop working altogether and treat the attempt as a hostile attack. Therefore, follow these precautions:
— Scraping should be seen as a last resort. First try to acquire the desired data by requesting it directly.
— Limit the rate at which the scraper software requests pages in order to avoid causing undue strain on the website’s servers.
— Wherever feasible, identify yourself to the site’s maintainers by adding your contact information to the scraper’s requests via the HTTP headers.

— Make sure someone else can reproduce your findings. “If at all possible, an editor or another reporter should attempt to reproduce the results of the analysis and confirm all findings before publication.”

— Let your readers see the source data, too.

Where possible, provide the source data for download along with the story or visualization. When distributing data consider the following guidelines:
— The data should be distributed in a machine-readable, widely useable format, such as a spreadsheet.
— The data should be accompanied by thorough documentation that explains data provenance, transformations and alterations, any caveats with the data analysis and a data dictionary.

The updated stylebook also includes entries on fake news, among other things; if you have questions for its editor, you can ask them in a 2:30 p.m. ET Twitter chat with the hashtag #APStyleChat.

Investigative outlet Correctiv crowdsourced data collection with the help of a local newsroom

Shan Wang — Thu, 11 May 2017 14:56:02 +0000

When the data you’re looking for to do your reporting doesn’t actually exist, consider collecting it yourselves — or, cast a wider net, and ask for help from those who live in the community whose issues you’re investigating.

German investigative nonprofit CORRECT!V (henceforth, Correctiv) recently wrapped up an investigation meant to address such an absence of reliable, centralized government data that could be made available through a FOI request around class cancellations in schools in the Germany city of Dortmund (connected to a shortage of teachers). The crowdsourced investigation asked Dortmund residents — from parents to teachers to students themselves — to help enter information into its Crowd Newsroom platform on cancellation of classes in Dortmund schools over the course of March. More than 520 participated, resulting in 3,552 class cancellations logged on the platform. Registered users remained anonymous on the platform, but other users could see the reported hours.

Correctiv has a team based in and focused on issues in North Rhine-Westphalia (the German state where Dortmund is located) and had also been looking to partner with a local newsroom on a deeper investigation. For this Crowd Newsroom effort, it was able to work with Ruhr Nachrichten, the daily newspaper in Dortmund, which aided in the reporting and, critically, helped get the word out to its readers, including sending signup information via printed mailings with an overview of the project and the Crowd Newsroom platform URL to the paper’s readers. (It also worked with two student newspapers that could use the data to write their own stories.)

A cancellation percentage already exists, based on spot checks by the Ministry of Education in North Rhine-Westphalia. But the number seemed too low to parents and students and others directly involved in the education system, though they were largely in the dark about how they compared to other schools in the state; moreover, the schools might have known that they’d be visited, which then might have led to fewer class cancellations on that day.

“The way we’d ask for numbers normally would be via FOIA requests, but we could not really ask the federal state for numbers, because there were no good numbers,” Jonathan Sachse, a Correctiv reporter and the staffer heading up the organization’s broader community engagement efforts. “So using Crowd Newsroom for this was a perfect match. It’s also a big topic in Germany when it comes to family policy — a lot of my colleagues have school-age children as well. Everybody was interested in getting closer to the actual numbers.”

An analysis of the data submitted through Crowd Newsroom painted a different picture from early government statistics. It seemed that the percentage of classes canceled in Dortmund schools was closer to double the earlier figure provided by the ministry of education (41 percent versus 20 percent). Here, the reporters emphasize these crowdsourced numbers are starting points, not perfect scientific inquiry.

As data collection progressed, Correctiv and Ruhr Nachrichten also began to get tips “from people inside the schools” who passed along letters indicating that the federal state had become aware of the reporting and began to collect from schools’ official data in response, according to Sachse. Midway through the crowdsourced process, “we were able to use FOIA to access from the ministry of education what didn’t exist before,” Sachse said. “I’m sure we can now start working with real official numbers.”

There was resistance to the overall approach, which Susanne Riese, who led the reporting on the Ruhr Nachrichten side, had to address concerns in a piece for Ruhr Nachrichten. Some felt that the project was essentially asking parents to tattletale on an overworked school staff. Others were concerned about methodological issues, such as individuals sabotaging data entry.

Correctiv addressed the second issue with two moderating layers. First, it sent data collected to individual schools directly each week and asked for confirmation and comment. Second, contributors were allowed to correct other users’ entries on Correctiv’s platform, which allowed the platform to operate “a little more like Wikipedia than classical fact-checking operations,” according to Sachse. (Practically speaking, it wasn’t likely that project participants, the majority of whom were parents, were out to mess with the platform.)

Schools was Correctiv’s second crowdsourced project via Crowd Newsroom. Its first involved helping community members reporting on their local banks. For that investigation, Correctiv staff checked all the data entered themselves. From the first investigation, Sachse said, Correctiv learned to limit the data collection period to about a month, treating it like a crowdfunding campaign, and to choose a very specific question (“How many classes had to be canceled?”) to keep the process simple for contributors.

Getting readers to participate in the reporting process, particularly in the collection of documents and data, isn’t new. Sachse gave a nod to ProPublica’s work, from its early Free the Files project examining political ad spending to its partnership with The Virginian-Pilot investigating the lasting impacts of Agent Orange used during the Vietnam War. ICIJ created a database from the Panama Papers leaks (and Offshore Leaks and the Bahamas Leaks), encouraging interested users to flag notable names they come across when searching. Argentinian daily La Nación’s VozData platform also facilitates user checking of documents, with a twist — it includes a competitive ranking of users’ activities on its dashboard.

Correctiv, however, is now flush with Google Digital News Initiative money (a grant of €500,000 over three years), and hopes to build out its platform to serve a much wider community of both journalists and community members.

“We want to make this an open platform, though it’s a long way to go before it can be open sourced,” Sachse said. A small step toward that, though, he said, is to see if the crowdsourcing process used this time around for recording class cancellations could be done with other local newspapers in other regions. “That would be the middle term, and then in the long-term, make it more open. An eventually, but not this year, maybe start international investigations — that’s really long-term.”