
Wrangling Big Data

December 9, 2013
by Anne Pinckard

A Berkeley lab is hard at work making sense of the information age.

Imagine a website that could offer you personalized medical advice. You could log on and input your symptoms and medical history. The program would then compare your situation to that of other people with a similar condition, perhaps analyze your genotype, consult with a few hundred doctors as necessary, and then provide you with a diagnosis and treatment recommendation.

It’s a vision that may one day become a reality, thanks to Berkeley’s AMPLab. Started in 2011 and funded for six years, the lab aims to tackle Big Data, the massive amount of information generated and aggregated in the world today. Every day, millions of Internet users log on to shop, browse, or check their Facebook page, using computers, tablets, and smartphones—creating more data. Such information represents a gold mine of consumer research for companies—if only they could wrangle it all into something sensible.

Similarly, scientific research increasingly requires collecting and aggregating data, whether it's analyzing genetic code from thousands of samples to identify cancer genes, scanning radio signals from space for signs of intelligent life, or compiling data from multiple sources to model climate change. And as the need for more efficient ways to handle such volumes of data grows, so, too, do the opportunities for new technologies.

Michael Franklin, AMPLab's director, says he and his colleagues in Berkeley's computer science department saw early on how important Big Data was going to be. In 2009, they began putting together a plan for the AMPLab. The lab's founders, many of whom had worked in the private sector before becoming professors, recruited a multidisciplinary team of faculty, and they also knew that industry partners would provide crucial input.

Academics often favor complex, intricate problems, Franklin says, whereas people in industry prefer simple solutions. Academics also often fear they can't compete. "They don't have the resources that a company like Google or Facebook can bring to the problem," Franklin says. But researchers in academia have an advantage, too: They can try something new without worrying about upsetting existing customers.

Today, the lab has 22 private partners, including Google and Amazon Web Services (a division of the online retail giant that offers computing services in the cloud). Twice a year, AMPLab holds conferences at which lab members present their research and receive feedback from company engineers. The engineers, in turn, present key industry trends and the problems they are facing. The best part is that the lab can choose to tackle only the problems that interest its researchers.

So, what exactly does the lab do? The answer lies in its name: AMP. First, the lab develops algorithms (A) for making predictions, as in a website that suggests movies based on films you've liked in the past. Its open-source Berkeley Data Analytics Stack (BDAS, pronounced, you guessed it, "bad ass") handles the machines (M) required to do the processing by organizing multiple computers so that they work in concert and can be controlled by a few users. The lab's other programs deal with data storage and organization.

The P is for people, and not just the researchers. There are some problems, Franklin explains, that people solve better than computers do, such as identifying images or understanding words in context. AMPLab designs ways to outsource these tasks to humans by soliciting answers from "the crowd."

An example of how all this might work together is Carat, a recently launched smartphone app developed by a postdoctoral researcher at AMPLab. Carat monitors battery use on your phone, noting which apps you use and how much power they require. It sends that information anonymously to the lab's program, which combines your data with data collected from thousands of other phones to determine how apps, phone models, and operating systems might affect battery usage. The app then delivers personalized recommendations on how to reduce power drain. Although Carat is designed to save your phone's battery life, not yours, the same idea lies behind the vision of automated, personalized medical diagnoses.

Franklin enthuses that the lab has been successful beyond his expectations. In 2012, AMPLab's work earned recognition from the White House, along with a National Science Foundation grant, as part of the nation's Big Data initiative. Papers by the lab's students and faculty have appeared in the field's top journals and garnered "best paper" awards, and its programs have been readily adopted in industry.

And there’s a more personal benefit for Franklin. “I love what I do,” he says, “but I haven’t had this much fun in my whole career.”

Anne Pinckard ’02 is Senior Editor of California.