Medical research is in the midst of the "big data" revolution. What is it? Until the past 20 years, most laboratory research involved simple ideas that could be tested by simple experiments. For example, 70 years ago the biochemist Linus Pauling discovered the cause of sickle cell disease. In this disease, misshapen red blood cells clog up small blood vessels, leading children and young adults to have recurrent bouts of excruciating pain and even strokes and heart attacks.
Pauling bet that the children had inherited a defect in a specific protein, globin, that caused the protein, and then the red blood cell, to become misshapen. The simple idea led to a simple experiment. He put globin from people with and without the disease into a gel. Then, he turned on an electric current that caused the globin to move through the gel. Diseased globin moved differently from healthy globin. Anyone could see the difference, just looking at the gel. No fancy analysis was necessary.
For a few years it seemed possible that most diseases might be caused by a defect in a single protein or other type of molecule. If so, perhaps simple cures would follow. Unfortunately, by the 1970s it had become clear that inside every cell are thousands of molecules of different types, including proteins and nucleic acids, and that the cause of most diseases involved the interaction of many of these different molecules. That meant that, with most diseases, one would have to analyze thousands of molecules to figure out what was wrong. However, in the 1970s, such analyses were not possible. Scientists could not identify which molecules were the abnormal ones: they could only guess, as Linus Pauling did. And the guesses were often wrong: clever, but wrong.
Fortunately, the effort to identify every one of our roughly 20,000 genes not only accomplished that remarkable feat but also led to the development of technologies that simultaneously analyze thousands of molecules of different types. Even 30 years ago, few scientists imagined that this could be possible in their lifetimes.
However, unlike Pauling's experiment, these experiments involving thousands of molecules require very fancy analysis — new mathematical techniques to make sense out of the thousands of numbers (the "big data") that are generated with each experiment. A whole new field, computational medicine, emerged. At Harvard Medical School we have created a large department dedicated to developing such computational medicine techniques and to training young scientists to use and teach them.
The new world of "big data" in biomedical research has brought about another change, as well. Today, we don't always need a bright idea to advance science. We now have the tools to compare tissue from people with and without a disease and ask what molecules are different. Guessing what might be different is no longer essential. Now, we can just let nature speak for itself.