On Dec. 6, 1941, the Foreign Broadcast Information Service (FBIS), a radio monitoring operation set up by the U.S. intelligence community and one of the earliest experiments in what is now called open-source intelligence, delivered its very first report, an analysis of Japanese media sentiment. The report noted that Japanese radio stations had sharply increased their level of criticism of the United States and dropped their calls for peace. The next day, Pearl Harbor was attacked.
Obviously, no amount of media monitoring would have revealed when and where the attack would take place (that’s what spies are for), but it’s certainly possible that with a better sense of the likelihood of an attack, U.S. forces might not have been caught quite so unawares. Some 70 years later, one computer scientist believes that a somewhat more ambitious version of the same type of news monitoring may soon be able to predict social upheavals and conflicts — such as the recent revolutions in the Arab world — with a remarkable degree of accuracy.
Kalev Leetaru, the assistant director for text and digital media analytics at the University of Illinois's Institute for Computing in the Humanities, Arts, and Social Science, is one of the leading researchers in the emerging field of conflict early warning. In a paper published this month in the peer-reviewed online technology journal First Monday, Leetaru argues that "computational analysis of large text archives can yield novel insights to the functioning of society."
Leetaru’s study builds on recent economics research looking at how the tone of news and social media coverage can predict economic events. One recent paper, for instance, found that the general mood state on Twitter can anticipate the movements of the Dow Jones Industrial Average. Leetaru was curious whether the same type of analysis could predict social events.
Leetaru employed several massive databases of news articles over the last 30 years, including the “Summary of World Broadcasts” — English translations of foreign broadcasts done by the British equivalent of the FBIS — the complete digital archives of the New York Times, and a web crawl of online news sites, to create a dataset of around 100 million news articles dating back to 1979. He then fed this raw material into one of the world’s most powerful supercomputers, the University of Tennessee’s “Nautilus,” and began to look for patterns.
In recent years, companies have increasingly deployed “sentiment mining” software to gauge the tone of news coverage. Think of it as a hypersophisticated Google Alerts: These programs scan news articles for positive and negative words and can also distinguish the severity of feeling, knowing the difference between “loathe” and “dislike” for instance. The software misses a lot of nuance and can be fooled by sarcasm, but at the scale of data Leetaru was working with, it gives a pretty good indication of global media sentiment on a given topic.
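The core idea behind this kind of sentiment mining can be sketched in a few lines. The word list and weights below are illustrative assumptions, not the lexicon any actual vendor or Leetaru's software uses; real systems draw on lexicons of tens of thousands of entries.

```python
# Minimal sketch of severity-weighted, lexicon-based sentiment scoring.
# The lexicon is a made-up illustration: note that "loathe" is scored
# more negatively than "dislike", mirroring the distinction described above.
LEXICON = {
    "loathe": -3.0,
    "hate": -2.5,
    "dislike": -1.0,
    "criticism": -1.0,
    "peace": 1.5,
    "praise": 2.0,
    "admire": 2.5,
}

def tone(text: str) -> float:
    """Average sentiment of the lexicon words found in the text (0 if none)."""
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(tone("critics loathe the regime"))    # -3.0, strongly negative
print(tone("some dislike the new policy"))  # -1.0, mildly negative
```

Averaged over a handful of sentences, a scorer this crude is easily fooled; averaged over millions of articles, the noise largely cancels out, which is why the approach works at Leetaru's scale.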
Take Egypt. Looking at the tone of media coverage of the country over the past three decades, Leetaru noted that at the beginning of this year, it was at its most negative point in 20 years.
In particular, the tone "drops off a cliff into negativity" during the first week of 2011, following the attack on a Coptic church in Alexandria. And this was several weeks before the start of the protests that ultimately toppled Mubarak on Feb. 11.
“Physical unrest is caused by pent-up emotion,” Leetaru says. “We’re not at a point where we can say, ‘In two weeks there will be a riot on such-and-such street.’ The more possible goal is to measure the background.”
Leetaru notes that during this period, Egypt’s GDP, a more traditional measure of societal stability, was on its way up. But, he argues, President Hosni Mubarak had lost the Egyptian people’s faith in his “ability to provide security” and that the shift in mood indicated a loss of “global legitimacy.”
What’s the practical implication of this? Leetaru notes U.S. President Barack Obama’s cautious public statements during the early days of the protests. “Whatever the high-level assessments were, there seemed to be at least a strong suspicion that [Mubarak] would stay in power.” But what if the White House had access to Leetaru’s Nautilus super news-cruncher? Perhaps Obama would have seen Mubarak’s writing on the wall sooner.
That level of real-time prediction is still a long way off, of course. Looking at an event that has already happened and finding advance indicators is very different from detecting those indicators — with any real certainty — before the event happens. It should also be noted that the method is still far from batting 1.000 on past events: Given the relative lack of news coverage, the indicators leading up to the revolutions in Tunisia and Libya weren’t as obvious. But Leetaru’s current goal is to make the results available in real time. Even with Nautilus’s 1,024 processor cores (a top-of-the-line Mac has 12), delivering and interpreting results as they come in is a tall order. But when that day comes, the technology could be transformational.
“Let’s say the mayor of London, a week before the riots, could see that his city was descending toward unrest and his social policies were approaching a breaking point,” Leetaru suggests. “If he had that information, would he have instituted emergency policies to alleviate those social issues, or moved police in quicker before violence spread?”
And Leetaru sees applications for this type of analysis not just in time — but in space. His analysis of the coverage of Osama bin Laden from the late 1990s until his killing this May found that nearly 49 percent of articles mentioning the al Qaeda chief included a reference to a city in Pakistan. Leetaru concludes that global news content “would have suggested Northern Pakistan in a 200 km. radius around Islamabad and Peshawar as his most likely location.” (Google and the U.S. Centers for Disease Control and Prevention have developed a somewhat similar project using news results to track flu outbreaks.)
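The geographic side of the analysis boils down to tallying which places co-occur with a subject across a corpus. The sketch below illustrates that tally with a tiny made-up article list and city gazetteer; these stand-ins are not Leetaru's data or his actual geocoding pipeline, which resolves place names against a full gazetteer with disambiguation.

```python
from collections import Counter

# Hypothetical gazetteer of place names (real systems use databases of
# millions of toponyms with coordinates and disambiguation rules).
GAZETTEER = {"islamabad", "peshawar", "abbottabad", "kabul", "london"}

# Made-up stand-in articles for illustration only.
articles = [
    "bin laden reported near peshawar according to officials",
    "sources place bin laden close to islamabad",
    "a courier was traced to abbottabad",
    "talks continue in london over the conflict",
]

def city_mentions(subject: str, corpus: list[str]) -> Counter:
    """Tally gazetteer cities appearing in articles that mention the subject."""
    counts = Counter()
    for article in corpus:
        if subject in article:
            counts.update(w for w in article.split() if w in GAZETTEER)
    return counts

print(city_mentions("bin laden", articles).most_common())
```

Aggregated over 100 million real articles instead of four toy ones, the same tally is what let Leetaru draw his 200-km circle around Islamabad and Peshawar.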
This section of Leetaru’s paper has gotten a fair amount of media attention this week, but is arguably the least convincing. If it was already conventional wisdom that bin Laden was hiding in Pakistan, is it really that useful to know that 100 million news articles agree? (Another highly touted effort by a UCLA professor to track bin Laden using biogeographic data also essentially replicated the assumption most people had anyway.) And don’t get your hopes up about tracking down Libya’s fugitive leader. “We’re not yet at a point right now where I can just sit down at a computer and type in ‘Qaddafi’ and find him,” Leetaru says.
But despite these limitations, Leetaru’s work does suggest a time in the near future when this type of data mining will be part of the everyday toolbox of political scientists, just as mathematical modeling and prediction markets already are.
The U.S. intelligence community seems to be betting on it, based on a recent invitation to academics put out by the Director of National Intelligence’s skunk works, the Intelligence Advanced Research Projects Activity, calling for methods of “detecting unexpected events by fusing publicly available data of multiple types from multiple sources.”
“People talk about oceans of information,” Leetaru says. “We’ve spent the last few decades looking at the waves. If you look below the surface, there’s a whole world of latent information that we’re just beginning to try to understand.”