Monthly Archives: February 2017

One of its servers are compromised

Anonymity networks protect people living under repressive regimes from surveillance of their Internet use. But the recent discovery of vulnerabilities in the most popular of these networks — Tor — has prompted computer scientists to try to come up with more secure anonymity schemes.

At the Privacy Enhancing Technologies Symposium in July, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory and the École Polytechnique Fédérale de Lausanne will present a new anonymity scheme that provides strong security guarantees but uses bandwidth much more efficiently than its predecessors. In experiments, the researchers’ system required only one-tenth as much time as similarly secure experimental systems to transfer a large file between anonymous users.

“The initial use case that we thought of was to do anonymous file-sharing, where the receiving end and sending end don’t know each other,” says Albert Kwon, a graduate student in electrical engineering and computer science and first author on the new paper. “The reason is that things like honeypotting” — in which spies offer services through an anonymity network in order to entrap its users — “are a real issue. But we also studied applications in microblogging, something like Twitter, where you want to anonymously broadcast your messages to everyone.”

The system devised by Kwon and his coauthors — his advisor, Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering and Computer Science at MIT; David Lazar, also a graduate student in electrical engineering and computer science; and Bryan Ford SM ’02 PhD ’08, an associate professor of computer and communication sciences at the École Polytechnique Fédérale de Lausanne — employs several existing cryptographic techniques but combines them in a novel manner.

Shell game

The heart of the system is a series of servers called a mixnet. Each server permutes the order in which it receives messages before passing them on to the next. If, for instance, messages from senders Alice, Bob, and Carol reach the first server in the order A, B, C, that server would send them to the second server in a different order — say, C, B, A. The second server would permute them before sending them to the third, and so on.

An adversary that had tracked the messages’ points of origin would have no idea which was which by the time they exited the last server. It’s this reshuffling of the messages that gives the new system its name: Riffle.

Best algorithms for network communication.

Ants, it turns out, are extremely good at estimating the concentration of other ants in their vicinity. This ability appears to play a role in several communal activities, particularly in the voting procedure whereby an ant colony selects a new nest.

Biologists have long suspected that ants base their population-density estimates on the frequency with which they — literally — bump into other ants while randomly exploring their environments.

That theory gets new support from a theoretical paper that researchers from MIT’s Computer Science and Artificial Intelligence Laboratory will present at the Association for Computing Machinery’s Symposium on Principles of Distributed Computing conference later this month. The paper shows that observations from random exploration of the environment converge very quickly on an accurate estimate of population density. Indeed, they converge about as quickly as is theoretically possible.

Beyond offering support for biologists’ suppositions, this theoretical framework also applies to the analysis of social networks, of collective decision making among robot swarms, and of communication in ad hoc networks, such as networks of low-cost sensors scattered in forbidding environments.

“It’s intuitive that if a bunch of people are randomly walking around an area, the number of times they bump into each other will be a surrogate of the population density,” says Cameron Musco, an MIT graduate student in electrical engineering and computer science and a co-author on the new paper. “What we’re doing is giving a rigorous analysis behind that intuition, and also saying that the estimate is a very good estimate, rather than some coarse estimate. As a function of time, it gets more and more accurate, and it goes nearly as fast as you would expect you could ever do.”

Random walks

Musco and his coauthors — his advisor, NEC Professor of Software Science and Engineering Nancy Lynch, and Hsin-Hao Su, a postdoc in Lynch’s group — characterize an ant’s environment as a grid, with some number of other ants scattered randomly across it. The ant of interest — call it the explorer — starts at some cell of the grid and, with equal probability, moves to one of the adjacent cells. Then, with equal probability, it moves to one of the cells adjacent to that one, and so on. In statistics, this is referred to as a “random walk.” The explorer counts the number of other ants inhabiting every cell it visits.

In their paper, the researchers compare the random walk to random sampling, in which cells are selected from the grid at random and the number of ants counted. The accuracy of both approaches improves with each additional sample, but remarkably, the random walk converges on the true population density virtually as quickly as random sampling does.

That’s important because in many practical cases, random sampling isn’t an option. Suppose, for instance, that you want to write an algorithm to analyze an online social network — say, to estimate what fraction of the network self-describes as Republican. There’s no publicly available list of the network’s members; the only way to explore it is to pick an individual member and start tracing connections.

Practical applications for non-native English speakers

After thousands of hours of work, MIT researchers have released the first major database of fully annotated English sentences written by non-native speakers.

The researchers who led the project had already shown that the grammatical quirks of non-native speakers writing in English could be a source of linguistic insight. But they hope that their dataset could also lead to applications that would improve computers’ handling of spoken or written language of non-native English speakers.

“English is the most used language on the Internet, with over 1 billion speakers,” says Yevgeni Berzak, a graduate student in electrical engineering and computer science, who led the new project. “Most of the people who speak English in the world or produce English text are non-native speakers. This characteristic is often overlooked when we study English scientifically or when we do natural-language processing for English.”

Most natural-language-processing systems, which enable smartphone and other computer applications to process requests phrased in ordinary language, are based on machine learning, in which computer systems look for patterns in huge sets of training data. “If you want to handle noncanonical learner language, in terms of the training material that’s available to you, you can only train on standard English,” Berzak explains.

Systems trained on nonstandard English, on the other hand, could be better able to handle the idiosyncrasies of non-native English speakers, such as tendencies to drop or add prepositions, to substitute particular tenses for others, or to misuse particular auxiliary verbs. Indeed, the researchers hope that their work could lead to grammar-correction software targeted to native speakers of other languages.

Diagramming sentences

The researchers’ dataset consists of 5,124 sentences culled from exam essays written by students of English as a second language (ESL). The sentences were drawn, in approximately equal distribution, from native speakers of 10 languages that are the primary tongues of roughly 40 percent of the world’s population.

Every sentence in the dataset includes at least one grammatical error. The original source of the sentences was a collection made public by Cambridge University, which included annotation of the errors, but no other grammatical or syntactic information.

To provide that additional information, Berzak recruited a group of MIT undergraduate and graduate students from the departments of Electrical Engineering and Computer Science (EECS), Linguistics, and Mechanical Engineering, led by Carolyn Spadine, a graduate student in linguistics.

After eight weeks of training in how to annotate both grammatically correct and error-ridden sentences, the students began working directly on the data. There are three levels of annotation. The first involves basic parts of speech — whether a word is a noun, a verb, a preposition, and so on. The next is a more detailed description of parts of speech — plural versus singular nouns, verb tenses, comparative and superlative adjectives, and the like.