Preserving data sets' fundamental mathematical relationships

One way to handle big data is to shrink it. If you can identify a small subset of your data set that preserves its salient mathematical relationships, you may be able to perform useful analyses on it that would be prohibitively time consuming on the full set.

The methods for creating such “coresets” vary according to application, however. Last week, at the Annual Conference on Neural Information Processing Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and the University of Haifa in Israel presented a new coreset-generation technique that’s tailored to a whole family of data analysis tools with applications in natural-language processing, computer vision, signal processing, recommendation systems, weather prediction, finance, and neuroscience, among many others.

“These are all very general algorithms that are used in so many applications,” says Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT and senior author on the new paper. “They’re fundamental to so many problems. By figuring out the coreset for a huge matrix for one of these tools, you can enable computations that at the moment are simply not possible.”

As an example, in their paper the researchers apply their technique to a matrix — that is, a table — that maps every article on the English version of Wikipedia against every word that appears on the site. That’s 1.4 million articles, or matrix rows, and 4.4 million words, or matrix columns.

That matrix would be much too large to analyze using low-rank approximation, an algorithm that can deduce the topics of free-form texts. But with their coreset, the researchers were able to use low-rank approximation to extract clusters of words that denote the 100 most common topics on Wikipedia. The cluster that contains “dress,” “brides,” “bridesmaids,” and “wedding,” for instance, appears to denote the topic of weddings; the cluster that contains “gun,” “fired,” “jammed,” “pistol,” and “shootings” appears to designate the topic of shootings.

Joining Rus on the paper are Mikhail Volkov, an MIT postdoc in electrical engineering and computer science, and Dan Feldman, director of the University of Haifa’s Robotics and Big Data Lab and a former postdoc in Rus’s group.

The researchers’ new coreset technique is useful for a range of tools with names like singular-value decomposition, principal-component analysis, and latent semantic analysis. What they all have in common is dimension reduction: they take data sets with large numbers of variables and find approximations of them with far fewer variables.

In this, these tools are similar to coresets. But coresets are application-specific, while dimension-reduction tools are general-purpose. That generality makes them much more computationally intensive than coreset generation — too computationally intensive for practical application to large data sets.

The researchers believe that their technique could be used to winnow a data set with, say, millions of variables — such as descriptions of Wikipedia pages in terms of the words they use — to merely thousands. At that point, a widely used technique like principal-component analysis could reduce the number of variables to mere hundreds, or even lower.
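As a rough illustration of that second step, the following sketch applies off-the-shelf principal-component analysis (via the scikit-learn library; the matrix sizes are invented) to a data set that has already been winnowed down to coreset size:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical sizes: a coreset of 500 rows that still has 2,000 variables.
coreset = np.random.randn(500, 2000)

# Off-the-shelf PCA then cuts the 2,000 variables down to 100 components.
reduced = PCA(n_components=100).fit_transform(coreset)
print(reduced.shape)  # (500, 100)
```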

The researchers’ technique works with what is called sparse data. Consider, for instance, the Wikipedia matrix, with its 4.4 million columns, each representing a different word. Any given article on Wikipedia will use only a few thousand distinct words. So in any given row — representing one article — only a few thousand matrix slots out of 4.4 million will have any values in them. In a sparse matrix, most of the values are zero.

Crucially, the new technique preserves that sparsity, which makes its coresets much easier to deal with computationally. Calculations become a lot easier if they involve a lot of multiplication by and addition of zero.
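To make the point concrete, here is a minimal sketch of a sparse document-by-word matrix using the SciPy library; the toy counts are invented, and a real Wikipedia-scale matrix would simply have far more rows and columns:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A tiny, made-up article-by-word count matrix; most entries are zero.
dense = np.array([
    [3, 0, 0, 1, 0, 0],   # article 1 uses words 0 and 3
    [0, 0, 2, 0, 0, 0],   # article 2 uses only word 2
    [0, 1, 0, 0, 0, 4],   # article 3 uses words 1 and 5
])

sparse = csr_matrix(dense)

# Only the nonzero entries are stored, so memory use and arithmetic scale
# with the number of nonzeros rather than with rows times columns.
print(sparse.nnz, "nonzeros out of", dense.size, "entries")
print(sparse.dot(np.ones(6)))  # row sums computed without touching the zeros
```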

The new coreset technique uses what’s called a merge-and-reduce procedure. It starts by taking, say, 20 data points in the data set and selecting 10 of them as most representative of the full 20. Then it performs the same procedure with another 20 data points, giving it two reduced sets of 10, which it merges to form a new set of 20. Then it does another reduction, from 20 down to 10, and repeats the process until the whole data set has been winnowed down.
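The sketch below illustrates the shape of that merge-and-reduce computation. The reduction step shown here (keeping the points with the largest norms) is only a placeholder; the researchers' actual coreset construction chooses representatives far more carefully.

```python
import numpy as np

def reduce_step(points, k):
    # Placeholder reduction: keep the k points with the largest norms.
    # This stands in for the paper's coreset construction and is not it.
    idx = np.argsort(np.linalg.norm(points, axis=1))[-k:]
    return points[idx]

def merge_and_reduce(data, block=20, k=10):
    """Stream through the data, reducing each block of 20 points to 10,
    merging the result with the running summary, and reducing again, so
    only a small buffer is ever held in memory."""
    summary = None
    for start in range(0, len(data), block):
        reduced = reduce_step(data[start:start + block], k)
        if summary is None:
            summary = reduced
        else:
            merged = np.vstack([summary, reduced])  # two sets of 10 -> 20
            summary = reduce_step(merged, k)        # reduce 20 back to 10
    return summary

coreset = merge_and_reduce(np.random.randn(200, 5))
print(coreset.shape)  # (10, 5)
```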

Protecting communication networks from malicious hackers

Distributed planning, communication, and control algorithms for autonomous robots make up a major area of research in computer science. But in the literature on multirobot systems, security has gotten relatively short shrift.

In the latest issue of the journal Autonomous Robots, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and their colleagues present a new technique for preventing malicious hackers from commandeering robot teams’ communication networks. The technique could provide an added layer of security in systems that encrypt communications, or an alternative in circumstances in which encryption is impractical.

“The robotics community has focused on making multirobot systems autonomous and increasingly more capable by developing the science of autonomy. In some sense we have not done enough about systems-level issues like cybersecurity and privacy,” says Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT and senior author on the new paper.

“But when we deploy multirobot systems in real applications, we expose them to all the issues that current computer systems are exposed to,” she adds. “If you take over a computer system, you can make it release private data — and you can do a lot of other bad things. A cybersecurity attack on a robot has all the perils of attacks on computer systems, plus the robot could be controlled to take potentially damaging action in the physical world. So in some sense there is even more urgency that we think about this problem.”

Identity theft

Most planning algorithms in multirobot systems rely on some kind of voting procedure to determine a course of action. Each robot makes a recommendation based on its own limited, local observations, and the recommendations are aggregated to yield a final decision.

A natural way for a hacker to infiltrate a multirobot system would be to impersonate a large number of robots on the network and cast enough spurious votes to tip the collective decision, a technique called “spoofing.” The researchers’ new system analyzes the distinctive ways in which robots’ wireless transmissions interact with the environment, to assign each of them its own radio “fingerprint.” If the system identifies multiple votes as coming from the same transmitter, it can discount them as probably fraudulent.
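The toy sketch below captures the flavor of that defense, though not the paper's actual method: votes that share a radio fingerprint are down-weighted so that a single spoofed transmitter cannot swing the outcome. The fingerprints and votes are invented for illustration.

```python
from collections import defaultdict

def aggregate_votes(votes):
    """votes: list of (fingerprint, vote) pairs, where the fingerprint is a
    label derived from a transmitter's radio characteristics (hypothetical
    representation). Votes sharing a fingerprint split a single unit of weight."""
    by_fingerprint = defaultdict(list)
    for fingerprint, vote in votes:
        by_fingerprint[fingerprint].append(vote)

    tally = defaultdict(float)
    for fingerprint, vs in by_fingerprint.items():
        weight = 1.0 / len(vs)   # many votes from one radio -> tiny weight each
        for v in vs:
            tally[v] += weight
    return max(tally, key=tally.get)

votes = [("fpA", "go_left"), ("fpB", "go_left"), ("fpC", "go_right")]
votes += [("fpC", "go_right")] * 5     # a spoofed pile-on from one transmitter
print(aggregate_votes(votes))          # "go_left": the spurious votes collapse to one
```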

“There are two ways to think of it,” says Stephanie Gil, a research scientist in Rus’ Distributed Robotics Lab and a co-author on the new paper. “In some cases cryptography is too difficult to implement in a decentralized form. Perhaps you just don’t have that central key authority that you can secure, and you have agents continually entering or exiting the network, so that a key-passing scheme becomes much more challenging to implement. In that case, we can still provide protection.”

Simple method for making smaller microchip patterns

For the last few decades, microchip manufacturers have been on a quest to find ways to make the patterns of wires and components in their microchips ever smaller, in order to fit more of them onto a single chip and thus continue the relentless progress toward faster and more powerful computers. That progress has become more difficult recently, as manufacturing processes bump up against fundamental limits involving, for example, the wavelengths of the light used to create the patterns.

Now, a team of researchers at MIT and in Chicago has found an approach that could break through some of those limits and make it possible to produce some of the narrowest wires yet, using a process with the potential to be economically viable for mass manufacturing with standard types of equipment.

The new findings are reported this week in the journal Nature Nanotechnology, in a paper by postdoc Do Han Kim, graduate student Priya Moni, and Professor Karen Gleason, all at MIT, and by postdoc Hyo Seon Suh, Professor Paul Nealey, and three others at the University of Chicago and Argonne National Laboratory. While there are other methods that can achieve such fine lines, the team says, none of them are cost-effective for large-scale manufacturing.

The new approach includes a technique in which polymer thin films are formed on a surface, first by heating precursors so they vaporize, and then by allowing them to condense and polymerize on a cooler surface, much as water condenses on the outside of a cold drinking glass on a hot day.

“People always want smaller and smaller patterns, but achieving that has been getting more and more expensive,” says Gleason, who is MIT’s associate provost as well as the Alexander and I. Michael Kasser (1960) Professor of Chemical Engineering. Today’s methods for producing features smaller than about 22 nanometers (billionths of a meter) across generally require either extreme ultraviolet light with very expensive optics or building up an image line by line, by scanning a beam of electrons or ions across the chip surface — a very slow process and therefore expensive to implement at large scale.

The new process uses a novel integration of three existing methods. First, a pattern of lines is produced on the chip surface using well-established lithographic techniques, in which an electron beam is used to “write” the pattern on the chip.

Cutting detection time from minutes to microseconds

Terahertz spectroscopy, which uses the band of electromagnetic radiation between microwaves and infrared light, is a promising security technology because it can extract the spectroscopic “fingerprints” of a wide range of materials, including chemicals used in explosives.

But traditional terahertz spectroscopy requires a radiation source that’s heavy and about the size of a large suitcase, and it takes 15 to 30 minutes to analyze a single sample, rendering it impractical for most applications.

In the latest issue of the journal Optica, researchers from MIT’s Research Laboratory of Electronics and their colleagues present a new terahertz spectroscopy system that uses a quantum cascade laser, a source of terahertz radiation that’s the size of a computer chip. The system can extract a material’s spectroscopic signature in just 100 microseconds.

The device is so efficient because it emits terahertz radiation in what’s known as a “frequency comb,” meaning a range of frequencies that are perfectly evenly spaced.

“With this work, we answer the question, ‘What is the real application of quantum-cascade laser frequency combs?’” says Yang Yang, a graduate student in electrical engineering and computer science and first author on the new paper. “Terahertz is such a unique region that spectroscopy is probably the best application. And QCL-based frequency combs are a great candidate for spectroscopy.”

Different materials absorb different frequencies of terahertz radiation to different degrees, giving each of them a unique terahertz-absorption profile. Traditionally, however, terahertz spectroscopy has required measuring a material’s response to each frequency separately, a process that involves mechanically readjusting the spectroscopic apparatus. That’s why the method has been so time consuming.

Because the frequencies in a frequency comb are evenly spaced, however, it’s possible to mathematically reconstruct a material’s absorption fingerprint from just a few measurements, without any mechanical adjustments.

Getting even

The trick is evening out the spacing in the comb. Quantum cascade lasers, like all electrically powered lasers, bounce electromagnetic radiation back and forth through a “gain medium” until the radiation has enough energy to escape. They emit radiation at multiple frequencies that are determined by the length of the gain medium.

But those frequencies are also dependent on the medium’s refractive index, which describes the speed at which electromagnetic radiation passes through it. And the refractive index varies for different frequencies, so the gaps between frequencies in the comb vary, too.
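The standard cavity-resonance relation (general laser physics, not taken from the paper) makes the problem explicit: if the refractive index were constant, the comb lines would be perfectly evenly spaced, but a frequency-dependent index lets the spacing drift.

```latex
% m-th longitudinal mode of a cavity of length L with refractive index n(f):
\[
  f_m = \frac{m\,c}{2\,n(f_m)\,L},
  \qquad
  \Delta f = f_{m+1} - f_m \approx \frac{c}{2\,n_g L},
\]
% where n_g is the group index. Because n (and hence n_g) varies with
% frequency, \Delta f is not constant across the comb unless the
% dispersion is compensated.
```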

Search engine enables English monolingual analysts to search foreign-language documents

“About 6,000 languages are currently spoken in the world today,” says Elizabeth Salesky of MIT Lincoln Laboratory’s Human Language Technology (HLT) Group. “Within the law enforcement community, there are not enough multilingual analysts who possess the necessary level of proficiency to understand and analyze content across these languages,” she continues.

This problem of too many languages and too few specialized analysts is one Salesky and her colleagues are now working to solve for law enforcement agencies, but their work has potential application for the Department of Defense and Intelligence Community. The research team is taking advantage of major advances in language recognition, speaker recognition, speech recognition, machine translation, and information retrieval to automate language processing tasks so that the limited number of linguists available for analyzing text and spoken foreign languages can be used more efficiently. “With HLT, an equivalent of 20 times more foreign language analysts are at your disposal,” says Salesky.

One area in which Lincoln Laboratory researchers are focusing their efforts is cross-language information retrieval (CLIR). The Cross-LAnguage Search Engine, or CLASE, is a CLIR tool developed by the HLT Group for the Federal Bureau of Investigation (FBI). CLASE is a fusion of laboratory research in language identification, machine translation, information retrieval, and query-biased summarization. CLASE enables English monolingual analysts to help search for and filter foreign language documents — tasks that have traditionally been restricted to foreign language analysts.

Laboratory researchers considered three algorithmic approaches to CLIR that have emerged in the HLT research community: query translation, document translation, and probabilistic CLIR. In query translation, an English-speaking analyst queries foreign language documents for an English phrase; that query is translated into a foreign language via machine translation. The most relevant foreign language documents containing the translated query are then translated into English and returned to the analyst. In document translation, foreign language documents are translated into English; an analyst then queries the translated documents for an English phrase, and the most relevant documents are returned to the analyst. Probabilistic CLIR, the approach that researchers within the HLT Group are taking, is based on machine translation lattices (graphs in which edges connect related translations).
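The toy sketch below contrasts the first two strategies. The word-for-word "dictionary" translation and the word-overlap relevance score are invented stand-ins for real machine-translation and retrieval models; nothing here is part of CLASE.

```python
TOY_DICT = {"informe": "report", "secreto": "secret", "banco": "bank"}

def translate(text, to_english=True):
    # Hypothetical word-for-word translation via a toy dictionary.
    table = TOY_DICT if to_english else {v: k for k, v in TOY_DICT.items()}
    return " ".join(table.get(word, word) for word in text.split())

def relevance(query, document):
    # Crude stand-in for a retrieval model: count shared words.
    return len(set(query.split()) & set(document.split()))

def query_translation_search(english_query, foreign_docs, top_k=2):
    # Strategy 1: translate the query, search in the foreign language,
    # then translate only the top hits back into English.
    fq = translate(english_query, to_english=False)
    hits = sorted(foreign_docs, key=lambda d: relevance(fq, d), reverse=True)[:top_k]
    return [translate(d) for d in hits]

def document_translation_search(english_query, foreign_docs, top_k=2):
    # Strategy 2: translate every document up front, then search in English.
    english_docs = [translate(d) for d in foreign_docs]
    return sorted(english_docs, key=lambda d: relevance(english_query, d),
                  reverse=True)[:top_k]

docs = ["informe secreto", "banco informe", "banco"]
print(query_translation_search("secret report", docs))   # top hits, in English
```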

Making symbolic execution practical for programs that import huge swaths of code

Symbolic execution is a powerful software-analysis tool that can be used to automatically locate and even repair programming bugs. Essentially, it traces out every path that a program’s execution might take.
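A minimal illustration of the idea, using the Z3 solver's Python bindings (the two-branch example program and its path conditions are invented; real symbolic-execution engines automate this exploration over whole programs):

```python
from z3 import Int, Solver, sat

x = Int("x")   # a symbolic input instead of a concrete value

# The program under analysis, conceptually:
#   def f(x):
#       if x > 10:
#           return x - 10   # path A
#       else:
#           return x + 1    # path B

for name, path_condition in [("path A", x > 10), ("path B", x <= 10)]:
    solver = Solver()
    solver.add(path_condition)
    if solver.check() == sat:
        # The solver produces a concrete input that drives execution down
        # this path, which is how a symbolic executor enumerates and tests
        # every feasible path.
        print(name, "is feasible, e.g. x =", solver.model()[x])
```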

But it tends not to work well with applications written using today’s programming frameworks. An application might consist of only 1,000 lines of new code, but it will generally import functions — such as those that handle virtual buttons — from a programming framework, which includes huge libraries of frequently reused code. The additional burden of evaluating the imported code makes symbolic execution prohibitively time consuming.

Computer scientists address this problem by creating simple models of the imported libraries, which describe their interactions with new programs but don’t require line-by-line evaluation of their code. Building the models, however, is labor-intensive and error-prone, and the models require regular updates, as programming frameworks are constantly evolving.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory, working with colleagues at the University of Maryland, have taken an important step toward enabling symbolic execution of applications written using programming frameworks, with a system that automatically constructs models of framework libraries.

The researchers compared a model generated by their system with a widely used model of Java’s standard library of graphical-user-interface components, which had been laboriously constructed over a period of years. They found that their new model plugged several holes in the hand-coded one.

They described their results in a paper they presented last week at the International Conference on Software Engineering. Their work was funded by the National Science Foundation’s Expeditions Program.

“Forty years ago, if you wanted to write a program, you went in, you wrote the code, and basically all the code you wrote was the code that executed,” says Armando Solar-Lezama, an associate professor of electrical engineering and computer science at MIT, whose group led the new work. “But today, if you want to write a program, you go and bring in these huge frameworks and these huge pieces of functionality that you then glue together, and you write a little code to get them to interact with each other. If you don’t understand what that big framework is doing, you’re not even going to know where your program is going to start executing.”

Staying anonymous even if all but one of its servers are compromised

Anonymity networks protect people living under repressive regimes from surveillance of their Internet use. But the recent discovery of vulnerabilities in the most popular of these networks — Tor — has prompted computer scientists to try to come up with more secure anonymity schemes.

At the Privacy Enhancing Technologies Symposium in July, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory and the École Polytechnique Fédérale de Lausanne will present a new anonymity scheme that provides strong security guarantees but uses bandwidth much more efficiently than its predecessors. In experiments, the researchers’ system required only one-tenth as much time as similarly secure experimental systems to transfer a large file between anonymous users.

“The initial use case that we thought of was to do anonymous file-sharing, where the receiving end and sending end don’t know each other,” says Albert Kwon, a graduate student in electrical engineering and computer science and first author on the new paper. “The reason is that things like honeypotting” — in which spies offer services through an anonymity network in order to entrap its users — “are a real issue. But we also studied applications in microblogging, something like Twitter, where you want to anonymously broadcast your messages to everyone.”

The system devised by Kwon and his coauthors — his advisor, Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering and Computer Science at MIT; David Lazar, also a graduate student in electrical engineering and computer science; and Bryan Ford SM ’02 PhD ’08, an associate professor of computer and communication sciences at the École Polytechnique Fédérale de Lausanne — employs several existing cryptographic techniques but combines them in a novel manner.

Shell game

The heart of the system is a series of servers called a mixnet. Each server permutes the order in which it receives messages before passing them on to the next. If, for instance, messages from senders Alice, Bob, and Carol reach the first server in the order A, B, C, that server would send them to the second server in a different order — say, C, B, A. The second server would permute them before sending them to the third, and so on.

An adversary that had tracked the messages’ points of origin would have no idea which was which by the time they exited the last server. It’s this reshuffling of the messages that gives the new system its name: Riffle.
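A toy version of the shuffling (with no encryption, which Riffle layers on top) looks like this; the messages and the number of servers are arbitrary:

```python
import random

def mix_server(batch, rng):
    # Each server permutes the batch it receives before forwarding it.
    batch = list(batch)
    rng.shuffle(batch)
    return batch

messages = ["from_alice", "from_bob", "from_carol"]
rng = random.Random(7)

for server in range(3):   # three servers in the chain
    messages = mix_server(messages, rng)
    print(f"after server {server + 1}: {messages}")
```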

Best algorithms for network communication

Ants, it turns out, are extremely good at estimating the concentration of other ants in their vicinity. This ability appears to play a role in several communal activities, particularly in the voting procedure whereby an ant colony selects a new nest.

Biologists have long suspected that ants base their population-density estimates on the frequency with which they — literally — bump into other ants while randomly exploring their environments.

That theory gets new support from a theoretical paper that researchers from MIT’s Computer Science and Artificial Intelligence Laboratory will present at the Association for Computing Machinery’s Symposium on Principles of Distributed Computing later this month. The paper shows that observations from random exploration of the environment converge very quickly on an accurate estimate of population density. Indeed, they converge about as quickly as is theoretically possible.

Beyond offering support for biologists’ suppositions, this theoretical framework also applies to the analysis of social networks, of collective decision making among robot swarms, and of communication in ad hoc networks, such as networks of low-cost sensors scattered in forbidding environments.

“It’s intuitive that if a bunch of people are randomly walking around an area, the number of times they bump into each other will be a surrogate of the population density,” says Cameron Musco, an MIT graduate student in electrical engineering and computer science and a co-author on the new paper. “What we’re doing is giving a rigorous analysis behind that intuition, and also saying that the estimate is a very good estimate, rather than some coarse estimate. As a function of time, it gets more and more accurate, and it goes nearly as fast as you would expect you could ever do.”

Random walks

Musco and his coauthors — his advisor, NEC Professor of Software Science and Engineering Nancy Lynch, and Hsin-Hao Su, a postdoc in Lynch’s group — characterize an ant’s environment as a grid, with some number of other ants scattered randomly across it. The ant of interest — call it the explorer — starts at some cell of the grid and, with equal probability, moves to one of the adjacent cells. Then, with equal probability, it moves to one of the cells adjacent to that one, and so on. In statistics, this is referred to as a “random walk.” The explorer counts the number of other ants inhabiting every cell it visits.

In their paper, the researchers compare the random walk to random sampling, in which cells are selected from the grid at random and the number of ants counted. The accuracy of both approaches improves with each additional sample, but remarkably, the random walk converges on the true population density virtually as quickly as random sampling does.
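A toy simulation conveys the comparison; the grid size, density, and sample counts below are arbitrary choices, not the paper's parameters.

```python
import random

random.seed(1)
N = 100                       # the grid is N x N and wraps around at the edges
true_density = 0.2
grid = [[1 if random.random() < true_density else 0 for _ in range(N)]
        for _ in range(N)]

def random_sampling(samples):
    # Pick cells uniformly at random and average their occupancy.
    return sum(grid[random.randrange(N)][random.randrange(N)]
               for _ in range(samples)) / samples

def random_walk(samples):
    # Wander to a random neighboring cell at each step, counting as we go.
    r, c = random.randrange(N), random.randrange(N)
    total = 0
    for _ in range(samples):
        total += grid[r][c]
        dr, dc = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        r, c = (r + dr) % N, (c + dc) % N
    return total / samples

print("true density :", true_density)
print("random sample:", random_sampling(2000))
print("random walk  :", random_walk(2000))
```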

That’s important because in many practical cases, random sampling isn’t an option. Suppose, for instance, that you want to write an algorithm to analyze an online social network — say, to estimate what fraction of the network self-describes as Republican. There’s no publicly available list of the network’s members; the only way to explore it is to pick an individual member and start tracing connections.

Practical applications for non-native English speakers

After thousands of hours of work, MIT researchers have released the first major database of fully annotated English sentences written by non-native speakers.

The researchers who led the project had already shown that the grammatical quirks of non-native speakers writing in English could be a source of linguistic insight. But they hope that their dataset could also lead to applications that would improve computers’ handling of spoken or written language of non-native English speakers.

“English is the most used language on the Internet, with over 1 billion speakers,” says Yevgeni Berzak, a graduate student in electrical engineering and computer science, who led the new project. “Most of the people who speak English in the world or produce English text are non-native speakers. This characteristic is often overlooked when we study English scientifically or when we do natural-language processing for English.”

Most natural-language-processing systems, which enable smartphone and other computer applications to process requests phrased in ordinary language, are based on machine learning, in which computer systems look for patterns in huge sets of training data. “If you want to handle noncanonical learner language, in terms of the training material that’s available to you, you can only train on standard English,” Berzak explains.

Systems trained on nonstandard English, on the other hand, could be better able to handle the idiosyncrasies of non-native English speakers, such as tendencies to drop or add prepositions, to substitute particular tenses for others, or to misuse particular auxiliary verbs. Indeed, the researchers hope that their work could lead to grammar-correction software targeted to native speakers of other languages.

Diagramming sentences

The researchers’ dataset consists of 5,124 sentences culled from exam essays written by students of English as a second language (ESL). The sentences were drawn, in approximately equal distribution, from native speakers of 10 languages that are the primary tongues of roughly 40 percent of the world’s population.

Every sentence in the dataset includes at least one grammatical error. The original source of the sentences was a collection made public by Cambridge University, which included annotation of the errors, but no other grammatical or syntactic information.

To provide that additional information, Berzak recruited a group of MIT undergraduate and graduate students from the departments of Electrical Engineering and Computer Science (EECS), Linguistics, and Mechanical Engineering, led by Carolyn Spadine, a graduate student in linguistics.

After eight weeks of training in how to annotate both grammatically correct and error-ridden sentences, the students began working directly on the data. There are three levels of annotation. The first involves basic parts of speech — whether a word is a noun, a verb, a preposition, and so on. The next is a more detailed description of parts of speech — plural versus singular nouns, verb tenses, comparative and superlative adjectives, and the like.
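An invented example conveys what the layered annotation looks like; the sentence, tags, and error labels below are illustrative only and are not drawn from the actual dataset.

```python
# One learner sentence, "She have two cat", annotated token by token with a
# coarse part of speech, a finer-grained description, and the error
# corrections of the kind supplied with the original Cambridge collection.
annotated = [
    {"token": "She",  "pos": "PRON", "fine": "personal pronoun"},
    {"token": "have", "pos": "VERB", "fine": "present tense, non-3rd person",
     "error": "verb agreement", "correction": "has"},
    {"token": "two",  "pos": "NUM",  "fine": "cardinal number"},
    {"token": "cat",  "pos": "NOUN", "fine": "singular noun",
     "error": "noun number", "correction": "cats"},
]

for tok in annotated:
    flag = f"  [{tok['error']} -> {tok['correction']}]" if "error" in tok else ""
    print(f"{tok['token']:>5}  {tok['pos']:<5} {tok['fine']}{flag}")
```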

Shedding light on the purpose of inhibitory neurons

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory have developed a new computational model of a neural circuit in the brain, which could shed light on the biological role of inhibitory neurons — neurons that keep other neurons from firing.

The model describes a neural circuit consisting of an array of input neurons and an equivalent number of output neurons. The circuit performs what neuroscientists call a “winner-take-all” operation, in which signals from multiple input neurons induce a signal in just one output neuron.

Using the tools of theoretical computer science, the researchers prove that, within the context of their model, a certain configuration of inhibitory neurons provides the most efficient means of enacting a winner-take-all operation. Because the model makes empirical predictions about the behavior of inhibitory neurons in the brain, it offers a good example of the way in which computational analysis could aid neuroscience.
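A deliberately crude sketch of a winner-take-all circuit follows; it is not the researchers' spiking model. Here the output units simply relay their inputs, and a single inhibitory neuron keeps raising a shared suppression level until only one output is left firing. The input values are made up.

```python
import numpy as np

inputs = np.array([0.2, 0.7, 0.4, 0.9, 0.1])   # activity of the input neurons

inhibition = 0.0
firing = inputs > inhibition
while firing.sum() > 1:
    # The inhibitory neuron senses that more than one output is active
    # and strengthens its suppression of every output unit.
    inhibition += 0.01
    firing = inputs > inhibition

print("winner:", int(np.argmax(firing)))       # index 3, the largest input
```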

The researchers will present their results this week at the conference on Innovations in Theoretical Computer Science. Nancy Lynch, the NEC Professor of Software Science and Engineering at MIT, is the senior author on the paper. She’s joined by Merav Parter, a postdoc in her group, and Cameron Musco, an MIT graduate student in electrical engineering and computer science.

For years, Lynch’s group has studied communication and resource allocation in ad hoc networks — networks whose members are continually leaving and rejoining. But recently, the team has begun using the tools of network analysis to investigate biological phenomena.

“There’s a close correspondence between the behavior of networks of computers or other devices like mobile phones and that of biological systems,” Lynch says. “We’re trying to find problems that can benefit from this distributed-computing perspective, focusing on algorithms for which we can prove mathematical properties.”

Artificial neurology

In recent years, artificial neural networks — computer models roughly based on the structure of the brain — have been responsible for some of the most rapid improvement in artificial-intelligence systems, from speech transcription to face recognition software.

An artificial neural network consists of “nodes” that, like individual neurons, have limited information-processing power but are densely interconnected. Data are fed into the first layer of nodes. If the data received by a given node meet some threshold criterion — for instance, if they exceed a particular value — the node “fires,” or sends signals along all of its outgoing connections.

Each of those outgoing connections, however, has an associated “weight,” which can augment or diminish a signal. Each node in the next layer of the network receives weighted signals from multiple nodes in the first layer; it adds them together, and again, if their sum exceeds some threshold, it fires. Its outgoing signals pass to the next layer, and so on.

In artificial-intelligence applications, a neural network is “trained” on sample data, constantly adjusting its weights and firing thresholds until the output of its final layer consistently represents the solution to some computational problem.

Senior Garrett Parrish combines art and technology

Garrett Parrish grew up singing and dancing as a theater kid, influenced by his older siblings, one of whom is an actor and the other a stage manager. But by the time he reached high school, Parrish had branched out significantly, drumming in his school’s jazz ensemble and helping to build a state-championship-winning robot.

MIT was the first place Parrish felt he was able to work meaningfully at the nexus of art and technology. “Being a part of the MIT culture, and having the resources that are available here, are really what opened my mind to that intersection,” the MIT senior says. “That’s always been my goal from the beginning: to be as emotionally educated as I am technically educated.”

Parrish, who is majoring in mechanical engineering, has worked on a dizzying array of projects, ranging from app-building, to assistant directing, to collaborating on a robotic opera. Driving his work is an interest in shaping technology to serve others.

“The whole goal of my life is to fix all the people problems. I sincerely think that the biggest problems we have are how we deal with each other, and how we treat each other. [We need to be] promoting empathy and understanding, and technology is an enormous power to influence that in a good way,” he says.

Technology for doing good

Parrish began his academic career at Harvard University and transferred to MIT after his first year. Frustrated at how little power individuals often have in society, Parrish joined DoneGood co-founders Scott Jacobsen and Cullen Schwartz, and became the startup’s chief technology officer his sophomore year. “We kind of distilled our frustrations about the way things are into, ‘How do you actionably use people’s existing power to create real change?’” Parrish says.

The DoneGood app and Chrome extension help consumers find businesses that share their priorities and values, such as paying a living wage, or using organic ingredients. The extension monitors a user’s online shopping and recommends alternatives. The mobile app offers a directory of local options and national brands that users can filter according to their values. “The two things that everyday people have at their disposal to create change is how they spend their time and how they spend their money. We direct money away from brands that aren’t sustainable, therefore creating an actionable incentive for them to become more sustainable,” Parrish says.

DoneGood has raised its first round of funding, and became a finalist in the MIT $100K Entrepreneurship Competition last May. The company now has five full-time employees, and Parrish continues to work as CTO part-time. “It’s been a really amazing experience to be in such an important leadership role. And to take something from the ground up, and really figure out what is the best way to actually create the change you want,” Parrish says. “Where technology meets cultural influence is very interesting, and it’s a space that requires a lot of responsibility and perspective.”

Artificial intelligence technique

In the past 10 years, the best-performing artificial-intelligence systems — such as the speech recognizers on smartphones or Google’s latest automatic translator — have resulted from a technique called “deep learning.”

Deep learning is in fact a new name for an approach to artificial intelligence called neural networks, which have been going in and out of fashion for more than 70 years. Neural networks were first proposed in 1943 by Warren McCulloch and Walter Pitts, two University of Chicago researchers who moved to MIT in 1952 as founding members of what’s sometimes called the first cognitive science department.

Neural nets were a major area of research in both neuroscience and computer science until 1969, when, according to computer science lore, they were killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who a year later would become co-directors of the new MIT Artificial Intelligence Laboratory.

The technique then enjoyed a resurgence in the 1980s, fell into eclipse again in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.

“There’s this idea that ideas in science are a bit like epidemics of viruses,” says Tomaso Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT, an investigator at MIT’s McGovern Institute for Brain Research, and director of MIT’s Center for Brains, Minds, and Machines. “There are apparently five or six basic strains of flu viruses, and apparently each one comes back with a period of around 25 years. People get infected, and they develop an immune response, and so they don’t get infected for the next 25 years. And then there is a new generation that is ready to be infected by the same strain of virus. In science, people fall in love with an idea, get excited about it, hammer it to death, and then get immunized — they get tired of it. So ideas should have the same kind of periodicity!”

Weighty matters

Neural nets are a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples. Usually, the examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it would find visual patterns in the images that consistently correlate with particular labels.

Modeled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected. Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.

To each of its incoming connections, a node will assign a number known as a “weight.” When the network is active, the node receives a different data item — a different number — over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which in today’s neural nets generally means sending the number — the sum of the weighted inputs — along all its outgoing connections.
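To make that arithmetic concrete, here is a minimal sketch of the per-node computation; the weights, thresholds, and two-layer shape are arbitrary choices for illustration, and no training is shown.

```python
import numpy as np

def layer(inputs, weights, threshold):
    # Each row of `weights` holds one node's incoming-connection weights.
    sums = weights @ inputs                        # weighted sum per node
    return np.where(sums > threshold, sums, 0.0)   # fire only above threshold

x = np.array([0.5, 0.9, 0.1])                      # data fed to the first layer
w1 = np.array([[0.2, 0.8, -0.5],
               [1.0, 0.1,  0.3]])
w2 = np.array([[0.6, 0.6]])

hidden = layer(x, w1, threshold=0.4)   # [0.77, 0.62]: both nodes fire
output = layer(hidden, w2, threshold=0.4)
print(hidden, output)
```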