“Standards are a major issue to be tackled in genomics right now,” says Patrick Chain of Los Alamos National Laboratory (LANL), New Mexico, USA, a joint first author. “These proposals are guideposts meant to inform users and generators.”
A range of next-generation sequencing technologies, increasingly deployed in research, generate massive amounts of data in any one of several formats. One example is the Wellcome Trust Sanger Institute, where sequence output has risen over the past two years from around 100 million bases per day to around 60 billion bases per day, a roughly 600-fold increase.
Perhaps more important, many of these data are short sequence stretches intended for comparative genomics or other studies of related sequences, not data designed to produce draft or finished genome assemblies.
“There is a widening gap between the output data, draft genomes and finished genomes,” explains Chris Detter, Director of the LANL Joint Genome Institute and senior author on the report, “and a developing confusion over which data sets are of a high quality.”