HapMap Charts Widespread DNA Variations
Scientists have a powerful new shortcut in their search for the genetic twists of fate that predispose some people to obesity, protect others from arthritis, and exert numerous other effects on human health.
For the first time, a relatively small and affordable sampling of the human genome can provide the means to track down the many slight differences in genes that conspire to cause type 2 diabetes or explain why an antihypertensive drug lowers blood pressure in one person but causes a devastating side effect in another. The same resource can reveal genes that have been important in recent human evolutionary history.
At the core of all these predicted medical advances is the first rough guide to common genetic variation in people. Known as the HapMap, it catalogs the pastiche of genes and other DNA in chromosome neighborhoods that are inherited in long connected chunks known as haplotypes.
In the Oct. 27 Nature, HMS researchers and their colleagues across Boston and in the International HapMap Consortium report their analysis of the genomes of 269 individuals from four distinct ethnic groups around the world. The paper validates haplotypes as an important unit of inheritance over the last 10,000 years of human history and confirms preliminary findings that most haplotypes come in only four or five patterns with distinctive genetic signposts (see Focus, June 7, 2002).
More importantly for biomedical researchers, the HapMap provides a validated set of genetic tags to sample whole genomes in large studies for the common variations that influence disease.
“This is not about personalized medicine, although that will likely happen in some cases; this is about finally finding the underlying root causes of common diseases,” said David Altshuler, HMS associate professor of genetics at Massachusetts General Hospital and a corresponding author on the paper. “It’s about one of the most fundamental characteristics of biology and medicine. On the order of half of the risk of all individual diseases is due to inherited differences.”
Three related papers in Nature Genetics, PLoS Biology, and Genome Research published at the same time by Altshuler; Mark Daly, assistant professor in the Center for Human Genetic Research at MGH; and their colleagues at MGH and the Broad Institute of Harvard and MIT describe a systematic way to test people who have a clinical endpoint and compare them with others. Researchers can use another method to identify gene variations that may have arisen to protect people at certain times in human history.
The HapMap tells researchers where to look for important variations and has given them the tools to do it at a significant cost reduction, Altshuler said. The Human Genome Project detailed the 99.9 percent of DNA that is identical in all people. This next stage of genome analysis focuses on the 0.1 percent of the 3 billion base pairs that vary among people.
For the Nature paper, HapMap teams in Britain, Canada, China, Japan, and the United States analyzed about 1 million common single nucleotide polymorphisms (SNPs). Most SNPs are correlated to their neighbors, the team found, creating many redundant markers for distinguishing the common varieties of haplotypes.
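To make the idea of correlated neighboring SNPs concrete, here is a minimal sketch of one widely used correlation measure, r², computed between two SNPs. The haplotype strings below are invented toy data, not HapMap data, and the function name is hypothetical.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between the alleles at SNP i and SNP j.

    Each haplotype is a string of '0'/'1' alleles, one character per SNP.
    """
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n    # frequency of allele 1 at SNP i
    p_j = sum(h[j] == "1" for h in haplotypes) / n    # frequency of allele 1 at SNP j
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j                              # departure from independence
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return 0.0 if denom == 0 else d * d / denom


# Toy data (assumption, for illustration only): 8 chromosomes, 4 SNPs.
toy_haplotypes = ["0000", "0000", "0001", "1110", "1110", "1111", "0000", "1111"]

redundant = r_squared(toy_haplotypes, 0, 1)    # SNPs 0 and 1 always travel together
independent = r_squared(toy_haplotypes, 0, 3)  # SNP 3 varies largely on its own
```

In this toy set, the first two SNPs carry the same information, so typing either one is enough to know the other. That is the sense in which correlated markers are redundant.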
Phase 2 of the HapMap project will supplement these genetic signposts with about 2.8 million more SNPs, Altshuler said. All 3.8 million of these SNPs are publicly available at www.hapmap.org in advance of the comprehensive analysis, expected to be published next spring.
Here is the good part. Researchers only need a fraction of those SNPs to conduct genomewide gene association studies. “It appears that if you have 300,000 well-selected tag SNPs chosen from the HapMap across the genome, you have 95 percent as much power to detect a true association as if you had directly tested all 5 million common variants,” said Paul de Bakker, HMS research fellow in genetics at MGH and co–first author of the Nature Genetics paper with Health Sciences and Technology student Roman Yelensky.
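One simple way to picture tag SNP selection is a greedy procedure: repeatedly pick the SNP that stands in for the most SNPs not yet covered. The sketch below is an illustration only, not the selection method used in the Nature Genetics paper. The 0.8 cutoff and the small correlation table are assumptions chosen for the example.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with at least one tag.

    r2 is a square list of lists of pairwise r^2 values, with r2[i][i] == 1.0.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs
        # (ties go to the lowest-numbered SNP).
        best = max(
            sorted(untagged),
            key=lambda s: sum(r2[s][t] >= threshold for t in untagged),
        )
        tags.append(best)
        untagged -= {t for t in untagged if r2[best][t] >= threshold}
    return tags


# Hypothetical pairwise r^2 values: SNPs 0-2 form one correlated block, SNPs 3-4 another.
toy_r2 = [
    [1.00, 0.95, 0.90, 0.10, 0.10],
    [0.95, 1.00, 0.92, 0.10, 0.10],
    [0.90, 0.92, 1.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.85],
    [0.10, 0.10, 0.10, 0.85, 1.00],
]

tags = greedy_tag_snps(toy_r2)
```

Here two tags are enough to represent five SNPs. The same logic, applied genomewide, is why a few hundred thousand well-chosen SNPs can stand in for millions.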
Snipping Away Costs
At the beginning of the HapMap project, these numbers would have elicited groans of despair from researchers, who knew the time and money it took to genotype only 1,000 SNPs. But a byproduct of the three-year HapMap project has been enormous technical advances by major manufacturers of genotyping products. “Now we have [prototype] chips that can probe 500,000 SNPs,” de Bakker said. “The advances have driven the costs down exponentially.”
The approach is already bearing fruit. Most recently, a trio of retrospective studies published in the April 15 Science found a new direct genetic risk factor for age-related macular degeneration. Using an early version of the publicly available HapMap data to probe a chromosome region implicated in traditional family linkage studies, one research team identified a common causal variant, found in 30 percent of the cases, in a gene for a protein involved in chronic inflammation. People had three times the risk of disease if they carried one copy of the variant and six times the risk if they carried two copies.
“That paper knocked a lot of people’s socks off,” Altshuler said. “It was a real shot in the arm for this approach.”
For all their enthusiasm, the HapMap researchers urge care and caution in studies based on the HapMap resource. “Rigorous standards of statistical significance will be needed to avoid a flood of false-positive results,” write the authors of the Nature paper. “Multiple replications in large samples provide the most straightforward path to identifying robust and broadly relevant associations. We urge conservatism and restraint in the public dissemination and interpretation of such studies.”
For example, the Broad team will be genotyping 1,500 people with type 2 diabetes and 1,500 controls, but they already plan to retest their strongest findings in another 10,000 DNA samples from people with the disease.
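A back-of-the-envelope calculation shows why the significance bar has to be so high. The sketch below uses the Bonferroni correction, a standard and conservative adjustment that the article itself does not name. The 300,000 figure is the tag SNP count quoted earlier, and the 0.05 significance level is a conventional assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests


def expected_false_positives(p_cutoff, n_tests):
    """Expected number of chance hits if no SNP is truly associated."""
    return p_cutoff * n_tests


n_tag_snps = 300_000   # tag SNP count quoted in the article
alpha = 0.05           # conventional significance level (assumption)

chance_hits = expected_false_positives(alpha, n_tag_snps)   # about 15,000
strict_cutoff = bonferroni_threshold(alpha, n_tag_snps)     # about 1.7e-7
```

Testing every tag SNP at the usual 0.05 level would yield roughly 15,000 hits by chance alone. Because neighboring SNPs are correlated, the Bonferroni cutoff is stricter than strictly necessary, and replication in independent samples remains the more direct safeguard.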
The Need for New Models
Part of the design, grant applications, and analyses of such studies is modeling the expected frequency of gene variants in people with and without the disease to calculate, for example, how many people need to be genotyped and phenotyped. Until now, models have used well-defined mathematical procedures developed by population geneticists and based on the estimated population of people 10,000 years ago randomly mating with each other and distributing their genes among their offspring for many generations.
When compared with real HapMap data about human genetic variation, these models fall short of reproducing real-world patterns, according to Broad computational biologist Stephen Schaffner, first author of the paper in the November Genome Research. From now on, researchers will need more complex computer models, akin to or better than the one he developed, that more closely match HapMap data, he said.
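For readers unfamiliar with the older style of model, the sketch below shows the textbook random-mating idea in its simplest form, often called the Wright-Fisher model: each generation draws its gene copies at random from the previous one, with no selection. It is not Schaffner's model, and the population size, starting frequency, and function name are assumptions for illustration.

```python
import random


def wright_fisher(pop_size, start_freq, generations, seed=0):
    """Track one variant's frequency under random mating with no selection.

    Each generation, 2 * pop_size gene copies are drawn at random from the
    previous generation, so the frequency drifts by chance alone.
    """
    rng = random.Random(seed)       # seeded so the run is repeatable
    copies = 2 * pop_size           # two gene copies per person
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        count = sum(rng.random() < freq for _ in range(copies))
        freq = count / copies
        trajectory.append(freq)
    return trajectory


# Hypothetical run: 100 people, variant starting at 50 percent, 50 generations.
trajectory = wright_fisher(pop_size=100, start_freq=0.5, generations=50, seed=1)
```

A model this simple ignores population growth, migration, and uneven recombination, which is one reason more detailed simulations may be needed to match real data.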
Although it was designed as a resource to sample the genetic underpinnings of disease, the HapMap also has provided some insights into human evolution. Now researchers can sample actual genomes to look for patterns of helpful genetic variations that have spread among people over the centuries and deleterious mutations that have disappeared. Using a method developed three years ago by postdoctoral fellow Pardis Sabeti, an HMS medical student now in her final clinical rotations, the Nature paper confirms some previous findings, such as the lactase gene in the European group that allowed them to thrive on dairy farming, and several genes in the African group that protect against malaria. The analysis revealed dozens of other genes of unknown function that similarly stand out.
Surprisingly, the HapMap resource cannot verify one of the most high-profile examples of evolutionary selection, a variant of the CCR5 gene that protects about 10 percent of people with European ancestry against HIV infection, Sabeti and her colleagues report in a separate paper in the November PLoS Biology. Several years ago, co-author Steve O’Brien at the National Institutes of Health, working with David Reich, HMS assistant professor of genetics, calculated that it arose about 700 years ago, perhaps because it protected people from the bubonic plague. Compared to the genomewide distribution of genetic variation, the CCR5 variant does not stand out and may not have been subject to selection after all. But Altshuler cautions that the HapMap data is not a perfect experiment to look for evolutionary selection, so time will be needed for the implications of such analyses to be fully understood.
Every week, 60 to 80 people from a dozen labs around Harvard, affiliated hospitals, and MIT meet at the Broad Institute Program in Medical Genetics to discuss studies investigating the genetic underpinnings of type 2 diabetes, bipolar disorder, rheumatoid arthritis, congenital heart disease, autism, and multiple sclerosis. Researchers in Britain, Canada, and Japan are also using the approach to search for variants contributing to Parkinson’s disease, Alzheimer’s disease, colon cancer, and tuberculosis.
Altshuler estimates that researchers will learn most of what can be known through the HapMap approach to human genetics in the next five to 10 years. That gives the scientific community some time to prepare for its ultimate goal: having a complete genome sequence of every individual for more finely tuned knowledge about how individual variation contributes to health and disease.