The Encyclopedia of DNA Elements (ENCODE) began its work in 2003 as the successor to the Human Genome Project. The goal of the NIH-funded ENCODE Consortium is to produce a complete registry of all the regulatory elements encoded in human and mouse genomes, and to make that research open access.
Phase 3 of the project has been completed (Phase 4 is continuing), and a massive amount of data has been collected. “However, most investigators don’t really know how to access or use the large amount of data that has been collected over the past several years,” says Peggy Farnham, PhD, Chair of the Department of Biochemistry and Molecular Medicine and the Vice Dean for Health and Biomedical Science Education at the Keck School of Medicine.
She and other researchers have published the first official report of all the data through Phase 3. A total of 5,992 datasets were included in the paper, which was published Wednesday in the journal Nature. Importantly, Farnham says, the paper “describes the development of a new web-based server that provides a streamlined and flexible means by which all researchers can access the ENCODE datasets.”
“The datasets produced by ENCODE are invaluable to the scientific community,” she added. “In addition to the many papers published by members of the Consortium, there are almost 2,000 papers already published from researchers outside the Consortium using the subsets of the ENCODE data that are most important for addressing their own specific biomedical research question. This is the ultimate goal of the ENCODE Consortium: to advance biomedical research by providing large and difficult-to-obtain genomic and epigenomic datasets to the scientific community, which will enable researchers to decode the molecular mechanisms that underpin the genetic bases of human traits and diseases.”