Last week, The Independent reported that a US spy agency had patented a system for eavesdropping on phone calls. Now it is lab-testing software that can sift through calls and e-mails in search of key phrases.
By Suelette Dreyfus
22 November 1999
THE US Department of Defense is lab-testing technology that could make it easier to sift automatically through a vast pool of private communications, including international telephone calls, in much the same way as an Internet search engine sifts through web pages.
The technology, called “Semantic Forests”, is a software program that analyses voice transcripts and other documents in order to allow intelligent searching for specific topics. The software could be used to analyse computer-transcribed telephone conversations. It is named for its use of an electronic dictionary to make a weighted “tree” of meanings for each word in a target document.
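The "weighted tree of meanings" idea can be illustrated with a small sketch. This is not the Defense Department's code: the papers' actual dictionary, weights and tree depth are not given here, so the toy dictionary, the `decay` factor and every function name below are assumptions made purely for illustration. Each word in a document is expanded into the words that define it, with less weight given to each further step away from the original word, and the weights are then added up across the document.

```python
from collections import defaultdict

# Hypothetical toy dictionary: each headword maps to words from its definition.
# A real system would use a full electronic dictionary.
TOY_DICTIONARY = {
    "sanctions": ["trade", "penalty"],
    "oil": ["fuel", "petroleum"],
    "penalty": ["punishment"],
}


def meaning_tree(word, dictionary, depth=2, weight=1.0, decay=0.5):
    """Build a nested (word, weight, children) tuple rooted at `word`.

    Each level of definition words gets the parent's weight times `decay`
    (an assumed weighting scheme, not one taken from the papers).
    """
    children = []
    if depth > 0:
        for related in dictionary.get(word, []):
            children.append(
                meaning_tree(related, dictionary, depth - 1, weight * decay, decay)
            )
    return (word, weight, children)


def add_tree_weights(tree, scores):
    """Add the weight of every node in `tree` to the running `scores`."""
    word, weight, children = tree
    scores[word] += weight
    for child in children:
        add_tree_weights(child, scores)


def document_scores(text, dictionary, depth=2, decay=0.5):
    """Combine the meaning trees of all words in `text` into topic scores."""
    scores = defaultdict(float)
    for token in text.lower().split():
        add_tree_weights(meaning_tree(token, dictionary, depth, 1.0, decay), scores)
    return dict(scores)
```

Calling `document_scores("oil sanctions", TOY_DICTIONARY)` gives the words actually present a weight of 1.0 and related words such as "petroleum" or "penalty" smaller weights, so a search on a related term could still match the document.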
Two US Department of Defense academic papers, published as part of the Text Retrieval Conference (TREC) in 1997 and 1998, provide the first evidence that the US government has actually built a working prototype of this technology and is testing it. The papers reveal that the US military had been honing Semantic Forests over at least two years, from 1996 to 1998, to make it more effective at siphoning off useful information.
According to the 1998 paper, the software was originally developed to “work with imperfect speech recogniser transcripts”. The US Department of Defense declined to comment on the matter.
In a series of lab tests, the software sifted through large pools of documents, including transcripts of speech and data from Internet discussion groups. In one set of tests, scientists increased the average precision rate for finding relevant documents per query from 19 per cent to 27 per cent in just one year, from 1997 to 1998.
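For readers unfamiliar with the measure, the sketch below shows one common way of calculating an average precision figure per query. The article does not say exactly which formula the scientists used, so this assumes the standard non-interpolated average precision from information-retrieval evaluation; the function names and document identifiers are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    `ranked_ids` is the system's result list, best first (assumed to contain
    no duplicates); `relevant_ids` are the documents judged relevant.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where this relevant document was found.
            precision_sum += hits / rank
    # Relevant documents never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

On this measure, a score of 27 per cent rather than 19 per cent means that relevant documents are, on average, appearing nearer the top of the results returned for each query.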
It appears that Semantic Forests is intelligent enough to handle questions given in plain English. One of the sample questions used to test the software was, “What have the effects of the UN sanctions against Iraq been on the Iraqi people, the Iraqi economy, or world oil prices?”
The US National Security Agency is also closely associated with Semantic Forests. One of the authors of the Semantic Forests papers, Patrick Schone, was also one of the inventors of an NSA-patented system for eavesdropping on international phone calls, which is similar to Semantic Forests.
The NSA applied for the patent, No 5,937,422, seven months before the first Semantic Forests paper was delivered at TREC. However, the patent only became public after winning US Patent Office approval in August this year.
The NSA is believed to conduct large-scale, automatic eavesdropping on some types of written international communications such as e-mail, according to a May 1999 interim report commissioned by the European Parliament’s Scientific and Technical Options Assessment (STOA) panel.
Glyn Ford MEP, who instigated the STOA’s investigation, said he was concerned that the US was testing technology that might be used to eavesdrop on international telephone calls. “It appears the NSA has abilities over and above what has been indicated to us to date,” he said.
There was “strong circumstantial evidence” that the NSA had been engaged in economic espionage on occasion, passing intercepted information on to American companies to give them a competitive advantage, he said. While he was happy for intelligence agencies to spy on terrorists, he said that the NSA’s “blanket approach” to monitoring telephone calls and e-mails was “a serious breach of privacy rights”.
Cryptographer Julian Assange, who moderates the online Australian discussion forum AUCRYPTO, discovered the Department of Defense papers while investigating NSA capabilities. “This is not some theoretical exercise. The US has actually built and lab tested this technology, which is clearly aimed at telephone calls. You don’t make a wheel like this unless you have something to put it on,” he said.
US Congressman Bob Barr, who previously served with the CIA, said: “This report underscores the need to update oversight procedures and legal standards designed in the 1970s and not updated since, in light of the revolutionary technological changes of the past two decades. A perfected system to intercept voice communications and allow government agencies to precisely pinpoint conversational topics of interest would create a truly awesome potential for privacy-invading abuses.”
The outspoken Georgia Republican has been a driving force behind proposed legislation to force the NSA and CIA to report the legal standards that they use while conducting signals intelligence activities, including electronic surveillance. The legislation has passed both houses of Congress and is awaiting signature by President Clinton.
Dr Brian Gladman, the former director of Strategic Electronic Communications at the Ministry of Defence, said the NSA would always like to find better ways to filter “voice traffic” – international phone conversations – automatically for information. “The NSA’s problem is finding needles in haystacks, and any technology that can chuck out hay without chucking out needles is of value to them,” he said.
“Automation is essential. It is likely the success rate will be low, but this may not be an issue. It is better to deploy something that will allow 10 per cent of the interesting traffic to be found, than doing nothing and finding nothing.”
Dr Gladman speculated that the NSA was not using the new technology on international telephone calls at the moment, but was doing trials on it “to see if it is worth deploying”.
The two Semantic Forests academic papers came from the speech research branch of the US Department of Defense at Fort Meade, Maryland – the location of the headquarters of the NSA. When the 1998 paper was downloaded from the TREC conference Internet site, the name of the file was listed as “nsa-rev.pdf”.
Bruce Schneier, the author of Applied Cryptography, claims that, paired with other types of spying technology, this software could have a significant impact on people’s privacy. “This technology can be combined with voice-recognition technology to automatically find certain conversations by a particular person or ethnic group,” he said.