Data scientists could accomplish in days

Last year, MIT researchers presented a system that automated a crucial step in big-data analysis: the selection of a “feature set,” or aspects of the data that are useful for making predictions. The researchers entered the system in several data science contests, where it outperformed most of the human competitors and took only hours instead of months to perform its analyses.

This week, in a pair of papers at the IEEE International Conference on Data Science and Advanced Analytics, the team described an approach to automating most of the rest of the process of big-data analysis — the preparation of the data for analysis and even the specification of problems that the analysis might be able to solve.

The researchers believe that, again, their systems could perform in days tasks that used to take data scientists months.

“The goal of all this is to present the interesting stuff to the data scientists so that they can more quickly address all these new data sets that are coming in,” says Max Kanter MEng ’15, who is first author on last year’s paper and one of this year’s papers. “[Data scientists want to know], ‘Why don’t you show me the top 10 things that I can do the best, and then I’ll dig down into those?’ So [these methods are] shrinking the time between getting a data set and actually producing value out of it.”

Both papers focus on time-varying data, which reflects observations made over time, and they assume that the goal of analysis is to produce a probabilistic model that will predict future events on the basis of current observations.

Real-world problems

The first paper describes a general framework for analyzing time-varying data. It splits the analytic process into three stages: labeling the data, or categorizing salient data points so they can be fed to a machine-learning system; segmenting the data, or determining which time sequences of data points are relevant to which problems; and “featurizing” the data, the step performed by the system the researchers presented last year.

The second paper describes a new language for describing data-analysis problems and a set of algorithms that automatically recombine data in different ways, to determine what types of prediction problems the data might be useful for solving.

According to Kalyan Veeramachaneni, a principal research scientist at MIT’s Laboratory for Information and Decision Systems and senior author on all three papers, the work grew out of his team’s experience with real data-analysis problems brought to it by industry researchers.

“Our experience was, when we got the data, the domain experts and data scientists sat around the table for a couple months to define a prediction problem,” he says. “The reason I think that people did that is they knew that the label-segment-featurize process takes six to eight months. So we better define a good prediction problem to even start that process.”

Featured discussions on cybersecurity

MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) hosted a summit that brought together cybersecurity experts from business, government, and academia to talk about better ways to combat cyber-threats directed at companies and countries.

Co-organized by the Aspen Institute and CNBC, the “Cambridge Cyber Summit” featured discussions with leaders that include Admiral Michael Rogers, director of the National Security Agency (NSA); and Andrew McCabe, deputy director of the Federal Bureau of Investigation (FBI).

Taking place in MIT’s Kresge Auditorium, the event included a mix of interviews and demos from top government officials, technologists, and “white hat” security hackers, as well as live coverage throughout the day on CNBC.

The summit focused on critical issues in privacy and security. Rogers spoke of the growing threats to public and private companies, including trends on how hacking has changed over time and the importance of law enforcement being able to access criminals’ information.

“The challenge for us is how to access content in a way that will protect [people’s] rights,” but still allow us to generate the answers to protect our citizens,” Rogers said.

Throughout the day there were constant reminders of the summit’s timeliness. During an interview with senior national security official John Carlin, CNBC anchor Andrew Ross Sorkin broke the news of an NSA contractor who had just been arrested for allegedly stealing top-secret information — and asked Carlin to chime in with his thoughts.

“There’s been a shift in approach post-9/11,” said Carlin. “Success is not prosecution after the fact. It’s preventing an attack from occurring in the first place. We need to learn and adjust our defenses to ensure that we can prevent the next one, before it happens.”

During a live demonstration of the dark web and ransomware, CSAIL’s Srini Devadas spoke about the complex nature of anonymity.

“The good side of it is that it protects the everyday web user,” said Devadas, a professor of computer science at MIT. “The other part, in places like the dark web, shows the bad side — when bad people target innocent users.”

White-hat security hacker David Kennedy, who had previously penetrated the website in just four minutes, demonstrated how easy it is to uncover personal information. He took a volunteer from the audience and, using just his full name and hometown, was able to uncover his social security number and address, as well as send him a text message that seemed to come from his wife’s phone.

CSAIL’s Daniel Weitzner, founding director of the MIT Internet Policy Research Initiative, spoke passionately about maintaining user privacy in the face of legitimate national security issues.

“Over the last decade, there have been efforts to introduce technology to make surveillance easier, which has put users at risk,” he said. “There will never be perfectly secure systems, but we will always try to close the gaps.”

The summit comes on the heels of several recent cybersecurity efforts at MIT, including last year’s launch of three new initiatives that span multiple labs and departments. The three efforts — Cybersecurity@CSAIL, the Internet Policy Initiative (CPI), and MIT Sloan’s Interdisciplinary Consortium for Improving Critical Infrastructure Cybersecurity (IC)3 — aim to provide a cross-disciplinary strategy for tackling the complex problem of cybersecurity.

Programs that run on multiprocessor chips

Dynamic programming is a technique that can yield relatively efficient solutions to computational problems in economics, genomic analysis, and other fields. But adapting it to computer chips with multiple “cores,” or processing units, requires a level of programming expertise that few economists and biologists have.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stony Brook University aim to change that, with a new system that allows users to describe what they want their programs to do in very general terms. It then automatically produces versions of those programs that are optimized to run on multicore chips. It also guarantees that the new versions will yield exactly the same results that the single-core versions would, albeit much faster.

In experiments, the researchers used the system to “parallelize” several algorithms that used dynamic programming, splitting them up so that they would run on multicore chips. The resulting programs were between three and 11 times as fast as those produced by earlier techniques for automatic parallelization, and they were generally as efficient as those that were hand-parallelized by computer scientists.

The researchers presented their new system last week at the Association for Computing Machinery’s conference on Systems, Programming, Languages and Applications: Software for Humanity.

Dynamic programming offers exponential speedups on a certain class of problems because it stores and reuses the results of computations, rather than recomputing them every time they’re required.

“But you need more memory, because you store the results of intermediate computations,” says Shachar Itzhaky, first author on the new paper and a postdoc in the group of Armando Solar-Lezama, an associate professor of electrical engineering and computer science at MIT. “When you come to implement it, you realize that you don’t get as much speedup as you thought you would, because the memory is slow. When you store and fetch, of course, it’s still faster than redoing the computation, but it’s not as fast as it could have been.”

Outsourcing complexity

Computer scientists avoid this problem by reordering computations so that those requiring a particular stored value are executed in sequence, minimizing the number of times that the value has to be recalled from memory. That’s relatively easy to do with a single-core computer, but with multicore computers, when multiple cores are sharing data stored at multiple locations, memory management become much more complex. A hand-optimized, parallel version of a dynamic-programming algorithm is typically 10 times as long as the single-core version, and the individual lines of code are more complex, to boot.

The CSAIL researchers’ new system — dubbed Bellmania, after Richard Bellman, the applied mathematician who pioneered dynamic programming — adopts a parallelization strategy called recursive divide-and-conquer. Suppose that the task of a parallel algorithm is to perform a sequence of computations on a grid of numbers, known as a matrix. Its first task might be to divide the grid into four parts, each to be processed separately.

The secrets of cryptography

“Split up into groups of three,” directed Sophia Yakoubov, associate staff in the Secure Resilient Systems and Technology Group at MIT Lincoln Laboratory and instructor of the LLCipher cryptography workshop. “Within each group, the person sitting on the left is Alice, the person on the right is Bob, and the person in the middle is Eve. Alice must write a secret message in a notebook and pass it to Bob. Eve must figure out Alice’s message and intercept everything that Alice and Bob pass to each other. Alice and Bob each have a lock and matching key, however, they cannot exchange their keys. How can Alice pass her secret message to Bob so that Eve is unable to unlock and view the secret, and only Bob can read it?”

The 13 high school students participating in the workshop glanced at one another until one brave student addressed the entire class, starting a flurry of conversation: “Any ideas?”

Thus began one of the many hands-on challenges that students tackled at the LLCipher workshop held in August at the MIT campus in Cambridge, Massachusetts, and MIT Lincoln Laboratory in Lexington, Massachusetts. LLCipher is a one-week program that introduces students to modern cryptography, a theoretical approach to securing data such as Alice’s secret message. The program begins with lessons in abstract algebra and number theory that students use to understand theoretical cryptography during lessons later in the workshop.

“I decided that LLCipher was for me when I researched the course topics,” says student Evan Hughes. “As I made my way down the topic list, I didn’t understand many of the concepts, so I immediately applied to the program.”

Because of student feedback from LLCipher’s inaugural year in 2015, Yakoubov extended each lesson from two to six hours. “Many students said they wanted more time on learning,” says Yakoubov. “Specifically, they wanted to learn more than one cryptography technique and apply those techniques to ‘real-world’ scenarios, rather than just learn theory.” This year, in addition to the El Gamal public key cryptosystem, students learned the RSA public key cryptosystem. RSA is one of the most common methods to secure data and uses slightly different math from El Gamal. Both RSA and El Gamal use modular arithmetic, a type of math in which integers “wrap around” upon reaching a certain value, i.e., the modulus, similar to a clock that uses 12 numbers to represent 24 hours in one day. El Gamal uses a very large prime number as a modulus; RSA uses a very large composite number, i.e., a whole number that can be divided evenly by numbers other than 1 or itself, with a secret factorization.

To reinforce the techniques and allow students to apply the theory, Yakoubov, along with the help of Uri Blumenthal and Jeff Diewald of the Secure Resilient Systems and Technology Group, created an online platform that includes El Gamal- and RSA-based challenges. “With these exercises, we are able to show students examples of flawed cryptography so that they can see how easily it can be broken,” says Yakoubov. “Students can visualize huge numbers and see why concepts like randomization are so important to effective encryption.” The platform is used throughout the course and includes six challenges that bolster teamwork and creativity.

“Learning about public key encryption is fun because it is so complicated and secretive,” says student Garrett Mallinson. “I like creating codes that no one else can break or unlock — this is like what characters do on television shows in just 45 minutes.”

During the final day of the course, students toured several Lincoln Laboratory facilities, such as the anechoic chambers and the Flight Test Facility. “I enjoyed the tour around Lincoln Laboratory,” says Hughes. “We always hear about theoretical concepts at school, so it is inspiring to see people applying and making the things we hear about.”

Solve complex urban problems

MIT has signed an agreement to engage in research collaborations with the Amsterdam Institute for Advanced Metropolitan Solutions (AMS) in the Netherlands. The collaboration’s flagship project, led by researchers from multiple departments at MIT, will be to develop a fleet of autonomous boats for the city’s canals.

Based in Amsterdam, the AMS Institute brings together a consortium of public and private partners to tackle complex urban challenges such as water, energy, waste, food, data, and mobility. MIT joins with two research institutions in the Netherlands — the Delft University of Technology and Wageningen University and Research Center — as the core academic partners who will use the city as a living laboratory and test bed.

An interdisciplinary team from MIT has assembled to lead the collaboration’s first project: ROBOAT, an effort to develop a fleet of autonomous boats, or “roboats,” to investigate how urban waterways can be used to improve the city’s function and quality of life.

The ROBOAT project will develop a logistics platform for people and goods, superimposing a dynamic infrastructure over one the world’s most famous water cities. “This project imagines a fleet of autonomous boats for the transportation of goods and people that can also cooperate to produce temporary floating infrastructure, such as on-demand bridges or stages that can be assembled or disassembled in a matter of hours,” says Carlo Ratti, professor of the practice of urban technologies in the MIT Department of Urban Studies and Planning (DUSP).

In addition to infrastructure and transport, ROBOAT will also deploy environmental sensing to monitor water quality and offer data for assessing and predicting issues related to public health, pollution, and the environment. “Water is the bearer of life. By focusing on the water system of the city, ROBOAT can create opportunities for new environmental sensing methods and climate adaptation. This will help secure the city’s quality of life and lasting functionality,” says Arjan van Timmeren, professor and scientific director at AMS, who also envisions a multitude of possibilities for a network of roboats, from real-time sensing similar to the MIT Underworlds project to retrieving the 12,000 bicycles or cleaning up the floating waste that ends up in the Dutch city’s canals each year.

Joining Ratti from MIT as co-principal investigators are Daniela Rus, professor of electrical engineering and computer science and director of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL); Andrew Whittle, the Edmund K. Turner Professor in Civil Engineering in the Department of Civil and Environmental Engineering; and Dennis Frenchman, the Class of 1922 Professor of Urban Design and Planning and director of the DesignX program in the MIT School of Architecture and Planning.

At AMS, Van Timmeren and Stephan van Dijk, research program manager, will coordinate the involvement of another 12 groups of researchers from TU Delft and Wageningen UR. Along with the City of Amsterdam, Waternet, the public water utility of Amsterdam and surrounding areas, will participate in the research.

The first prototypes of autonomous boats, or “roboats,” are expected to be tested in Amsterdam in 2017. The project’s initial phase will last for five years.

With nearly one-quarter of the city covered by water, Amsterdam is an ideal place for developing ROBOAT, according to the researchers. The canal system was once the key functional urban infrastructure of the city and today still plays a major role in recreation and tourism. Amsterdam’s waters, including bridges, canals, and the IJ river and its docks, offer plenty of opportunity to help solve current issues with transportation, mobility, and water quality.

Measuring your heartbeat and breath

As many a relationship book can tell you, understanding someone else’s emotions can be a difficult task. Facial expressions aren’t always reliable: A smile can conceal frustration, while a poker face might mask a winning hand.

But what if technology could tell us how someone is really feeling?

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed “EQ-Radio,” a device that can detect a person’s emotions using wireless signals.

By measuring subtle changes in breathing and heart rhythms, EQ-Radio is 87 percent accurate at detecting if a person is excited, happy, angry or sad — and can do so without on-body sensors.

MIT professor and project lead Dina Katabi envisions the system being used in entertainment, consumer behavior, and health care. Film studios and ad agencies could test viewers’ reactions in real-time, while smart homes could use information about your mood to adjust the heating or suggest that you get some fresh air.

“Our work shows that wireless signals can capture information about human behavior that is not always visible to the naked eye,” says Katabi, who co-wrote a paper on the topic with PhD students Mingmin Zhao and Fadel Adib. “We believe that our results could pave the way for future technologies that could help monitor and diagnose conditions like depression and anxiety.”

EQ-Radio builds on Katabi’s continued efforts to use wireless technology for measuring human behaviors such as breathing and falling. She says that she will incorporate emotion-detection into her spinoff company Emerald, which makes a device that is aimed at detecting and predicting falls among the elderly.

Using wireless signals reflected off people’s bodies, the device measures heartbeats as accurately as an ECG monitor, with a margin of error of approximately 0.3 percent. It then studies the waveforms within each heartbeat to match a person’s behavior to how they previously acted in one of the four emotion-states.

The team will present the work next month at the Association of Computing Machinery’s International Conference on Mobile Computing and Networking (MobiCom).

Memory management scheme

A year ago, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory unveiled a fundamentally new way of managing memory on computer chips, one that would use circuit space much more efficiently as chips continue to comprise more and more cores, or processing units. In chips with hundreds of cores, the researchers’ scheme could free up somewhere between 15 and 25 percent of on-chip memory, enabling much more efficient computation.

Their scheme, however, assumed a certain type of computational behavior that most modern chips do not, in fact, enforce. Last week, at the International Conference on Parallel Architectures and Compilation Techniques — the same conference where they first reported their scheme — the researchers presented an updated version that’s more consistent with existing chip designs and has a few additional improvements.

The essential challenge posed by multicore chips is that they execute instructions in parallel, while in a traditional computer program, instructions are written in sequence. Computer scientists are constantly working on ways to make parallelization easier for computer programmers.

The initial version of the MIT researchers’ scheme, called Tardis, enforced a standard called sequential consistency. Suppose that different parts of a program contain the sequences of instructions ABC and XYZ. When the program is parallelized, A, B, and C get assigned to core 1; X, Y, and Z to core 2.

Sequential consistency doesn’t enforce any relationship between the relative execution times of instructions assigned to different cores. It doesn’t guarantee that core 2 will complete its first instruction — X — before core 1 moves onto its second — B. It doesn’t even guarantee that core 2 will begin executing its first instruction — X — before core 1 completes its last one — C. All it guarantees is that, on core 1, A will execute before B and B before C; and on core 2, X will execute before Y and Y before Z.

The first author on the new paper is Xiangyao Yu, a graduate student in electrical engineering and computer science. He is joined by his thesis advisor and co-author on the earlier paper, Srini Devadas, the Edwin Sibley Webster Professor in MIT’s Department of Electrical Engineering and Computer Science, and by Hongzhe Liu of Algonquin Regional High School and Ethan Zou of Lexington High School, who joined the project through MIT’s Program for Research in Mathematics, Engineering and Science (PRIMES) program.

Improve systems like Gmail and Dropbox

Git is an open-source system with a polarizing reputation among programmers. It’s a powerful tool to help developers track changes to code, but many view it as prohibitively difficult to use.

To make it more user-friendly, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed “Gitless,” an interface that fixes many of the system’s core problems without fundamentally changing what it does.

“With Gitless we’ve developed a tool that we think is easier to learn and use, but that still keeps the core elements that make Git popular,” says graduate student Santiago Perez De Rosso, who co-wrote a related paper with MIT Professor Daniel Jackson. “What’s particularly encouraging about this work is that it suggests that the same approach might be used to improve the usability of other software systems, such as Dropbox and Google Inbox.”

Gitless was developed, in part, by looking at nearly 2,400 Git-related questions from the popular programming site StackOverflow. The team then outlined some of Git’s biggest issues, including its concepts of “staging” and “stashing,” and proposed changes aimed at minimizing those problems.

Because Gitless is implemented on top of Git, users can easily switch between the two without having migrate code from one to the other. Plus, their collaborators don’t even have to know that they aren’t big fans of Git.

Perez De Rosso will present the paper at next month’s ACM SIGPLAN conference on “Systems, Programming, Languages and Applications: Software for Humanity” in Amsterdam.

How it works

Git is what’s called a “version control system.” It allows multiple programmers to track changes to code, including making “branches” of a file that can be worked on individually.

Users make changes and then save (or “commit”) them so that everyone knows who did what. If you and a colleague are on version 10 of a file, and you want to try something new, you can create a separate “branch” while your friend works on the “master.”

Makes sense, right? But things get confusing quickly. One feature of Gitless is that it eliminates “staging,” which lets you save just certain parts of a file. For example, let’s say you have a file with both finished and unfinished changes, and you’d like to commit the finished changes. “Staging” lets you commit those changes while keeping the others as a work-in-progress.

However, having a file with both a staged and working version creates tricky situations. If you stage a file and make more changes that you then commit, the version that’s committed is the one you staged before, not the one you’re working on now.

Gitless essentially hides the staging area altogether, which makes the process much clearer and less complex for the user. Instead, there’s a much more flexible “commit” command that still allows you to do things like selecting segments of code to commit.

Another concept that Gitless removes is “stashing.” Imagine that you’re in the middle of a project and have to switch to a different branch of it, but don’t yet want to commit your half-done work. Stashing takes the changes you’ve made and saves them on a stack of unfinished changes that you can restore later. (The key difference between stashing and staging is that, with stashing, changes disappear from the working directory.)

“The problem is that, when switching branches, it can be hard to remember which stash goes where,” says Perez De Rosso. “On top of that, stashing doesn’t help if you are in the middle of an action like a merge that involves conflicting files.”

Gitless solves this issue by making branches completely independent from each other. This makes it much easier and less confusing for developers who have to constantly switch between tasks.

Computer system that could help identify subtle speech

For children with speech and language disorders, early-childhood intervention can make a great difference in their later academic and social success. But many such children — one study estimates 60 percent — go undiagnosed until kindergarten or even later.

Researchers at the Computer Science and Artificial Intelligence Laboratory at MIT and Massachusetts General Hospital’s Institute of Health Professions hope to change that, with a computer system that can automatically screen young children for speech and language disorders and, potentially, even provide specific diagnoses.

This week, at the Interspeech conference on speech processing, the researchers reported on an initial set of experiments with their system, which yielded promising results. “We’re nowhere near finished with this work,” says John Guttag, the Dugald C. Jackson Professor in Electrical Engineering and senior author on the new paper. “This is sort of a preliminary study. But I think it’s a pretty convincing feasibility study.”

The system analyzes audio recordings of children’s performances on a standardized storytelling test, in which they are presented with a series of images and an accompanying narrative, and then asked to retell the story in their own words.

“The really exciting idea here is to be able to do screening in a fully automated way using very simplistic tools,” Guttag says. “You could imagine the storytelling task being totally done with a tablet or a phone. I think this opens up the possibility of low-cost screening for large numbers of children, and I think that if we could do that, it would be a great boon to society.”

Subtle signals

The researchers evaluated the system’s performance using a standard measure called area under the curve, which describes the tradeoff between exhaustively identifying members of a population who have a particular disorder, and limiting false positives. (Modifying the system to limit false positives generally results in limiting true positives, too.) In the medical literature, a diagnostic test with an area under the curve of about 0.7 is generally considered accurate enough to be useful; on three distinct clinically useful tasks, the researchers’ system ranged between 0.74 and 0.86.

To build the new system, Guttag and Jen Gong, a graduate student in electrical engineering and computer science and first author on the new paper, used machine learning, in which a computer searches large sets of training data for patterns that correspond to particular classifications — in this case, diagnoses of speech and language disorders.

Efficient even for data sets

Data analysis — and particularly big-data analysis — is often a matter of fitting data to some sort of mathematical model. The most familiar example of this might be linear regression, which finds a line that approximates a distribution of data points. But fitting data to probability distributions, such as the familiar bell curve, is just as common.

If, however, a data set has just a few corrupted entries — say, outlandishly improbable measurements — standard data-fitting techniques can break down. This problem becomes much more acute with high-dimensional data, or data with many variables, which is ubiquitous in the digital age.

Since the early 1960s, it’s been known that there are algorithms for weeding corruptions out of high-dimensional data, but none of the algorithms proposed in the past 50 years are practical when the variable count gets above, say, 12.

That’s about to change. Earlier this month, at the IEEE Symposium on Foundations of Computer Science, a team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory, the University of Southern California, and the University of California at San Diego presented a new set of algorithms that can efficiently fit probability distributions to high-dimensional data.

Remarkably, at the same conference, researchers from Georgia Tech presented a very similar algorithm.

The pioneering work on “robust statistics,” or statistical methods that can tolerate corrupted data, was done by statisticians, but both new papers come from groups of computer scientists. That probably reflects a shift of attention within the field, toward the computational efficiency of model-fitting techniques.

“From the vantage point of theoretical computer science, it’s much more apparent how rare it is for a problem to be efficiently solvable,” says Ankur Moitra, the Rockwell International Career Development Assistant Professor of Mathematics at MIT and one of the leaders of the MIT-USC-UCSD project. “If you start off with some hypothetical thing — ‘Man, I wish I could do this. If I could, it would be robust’ — you’re going to have a bad time, because it will be inefficient. You should start off with the things that you know that you can efficiently do, and figure out how to piece them together to get robustness.”

Resisting corruption

To understand the principle behind robust statistics, Moitra explains, consider the normal distribution — the bell curve, or in mathematical parlance, the one-dimensional Gaussian distribution. The one-dimensional Gaussian is completely described by two parameters: the mean, or average, value of the data, and the variance, which is a measure of how quickly the data spreads out around the mean.

If the data in a data set — say, people’s heights in a given population — is well-described by a Gaussian distribution, then the mean is just the arithmetic average. But suppose you have a data set consisting of height measurements of 100 women, and while most of them cluster around 64 inches — some a little higher, some a little lower — one of them, for some reason, is 1,000 inches. Taking the arithmetic average will peg a woman’s mean height at 6 feet 4 inches, not 5 feet 4 inches.

The basis for machine learning systems

In recent years, the best-performing systems in artificial-intelligence research have come courtesy of neural networks, which look for patterns in training data that yield useful predictions or classifications. A neural net might, for instance, be trained to recognize certain objects in digital images or to infer the topics of texts.

But neural nets are black boxes. After training, a network may be very good at classifying data, but even its creators will have no idea why. With visual data, it’s sometimes possible to automate experiments that determine which visual features a neural net is responding to. But text-processing systems tend to be more opaque.

At the Association for Computational Linguistics’ Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new way to train neural networks so that they provide not only predictions and classifications but rationales for their decisions.

“In real-world applications, sometimes people really want to know why the model makes the predictions it does,” says Tao Lei, an MIT graduate student in electrical engineering and computer science and first author on the new paper. “One major reason that doctors don’t trust machine-learning methods is that there’s no evidence.”

“It’s not only the medical domain,” adds Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and Lei’s thesis advisor. “It’s in any domain where the cost of making the wrong prediction is very high. You need to justify why you did it.”

“There’s a broader aspect to this work, as well,” says Tommi Jaakkola, an MIT professor of electrical engineering and computer science and the third coauthor on the paper. “You may not want to just verify that the model is making the prediction in the right way; you might also want to exert some influence in terms of the types of predictions that it should make. How does a layperson communicate with a complex model that’s trained with algorithms that they know nothing about? They might be able to tell you about the rationale for a particular prediction. In that sense it opens up a different way of communicating with the model.”