A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species
Why this mattered
Before Elshire et al. (2011), genome-wide genotyping in crops and other high-diversity organisms was often constrained by the need for prior marker development, SNP arrays, sequence capture designs, or high-quality reference genomes. This paper helped shift genotyping from a marker-design problem to a sequencing-sampling problem: digest the genome with restriction enzymes, barcode many samples, sequence a reproducible reduced representation, and call variants from the resulting tags. Its importance was especially clear for large, repetitive plant genomes such as maize and barley, where whole-genome resequencing was still expensive and read alignment was computationally difficult. By using methylation-sensitive restriction enzymes to avoid many repetitive regions, the method made dense, genome-wide marker discovery and genotyping practical in species where conventional approaches were slow or inaccessible.
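The digest, barcode, sequence, and tag steps described above can be sketched as a small read-processing routine. This is an illustrative sketch only, not the paper's pipeline. It assumes each read starts with an inline sample barcode followed by the remnant of a GCWGC-type cut site (the text above does not name the enzyme). The barcode sequences, sample names, and the short tag length are all hypothetical values chosen for readability.

```python
import re
from collections import Counter, defaultdict

# Hypothetical barcode-to-sample map (illustrative sequences, not from the paper).
BARCODES = {"ACGT": "sample_A", "TTGCA": "sample_B"}
# Assumed cut-site remnant at the start of each insert: C, then A or T, then GC.
REMNANT = re.compile(r"C[AT]GC")
# Toy tag length so the example stays readable; real pipelines use longer tags.
TAG_LENGTH = 12


def assign_read(read):
    """Return (sample, tag) for a read, or None if it cannot be assigned."""
    for barcode, sample in BARCODES.items():
        if read.startswith(barcode):
            insert = read[len(barcode):]
            # Keep only reads whose insert starts at a cut site and is long enough.
            if REMNANT.match(insert) and len(insert) >= TAG_LENGTH:
                return sample, insert[:TAG_LENGTH]
    return None


def count_tags(reads):
    """Count how often each tag occurs in each sample."""
    counts = defaultdict(Counter)
    for read in reads:
        hit = assign_read(read)
        if hit is not None:
            sample, tag = hit
            counts[sample][tag] += 1
    return counts


reads = [
    "ACGTCAGCAAATTTGGGCC",   # sample_A, valid remnant CAGC
    "ACGTCAGCAAATTTGGGCC",   # duplicate read of the same tag
    "TTGCACTGCAAATTTGGGCC",  # sample_B, valid remnant CTGC
    "ACGTGGGCAAATTTGGGCC",   # barcode matches but no cut-site remnant
    "GGGGCAGCAAATTTGGGCC",   # unknown barcode
]
tag_counts = count_tags(reads)
```

Because every retained tag starts at a restriction site, the same genomic positions tend to be sampled across individuals. That repeatability is what makes per-sample tag counts comparable.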
The paradigm shift was not that GBS produced complete genomes, but that it made incomplete, low-cost, repeatable genome sampling useful at population scale. The same experiment could both discover markers and genotype individuals, including in species without a complete reference genome, where consensus sequence tags or dominant marker presence/absence could still support mapping, kinship, and population-structure analyses. That lowered the entry cost for genomic selection, association mapping, diversity surveys, and conservation genetics, especially in crops, wild relatives, and non-model organisms.
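To make the dominant-marker idea concrete, the sketch below scores each tag as present or absent per sample and compares samples by the share of tags they have in common. The Jaccard index is used here only as a simple stand-in for a relatedness measure, not as the estimator used in the paper, and the sample names, tag names, and counts are hypothetical.

```python
def presence_absence(tag_counts, min_count=1):
    """Convert per-sample tag counts into sets of tags scored as present."""
    return {
        sample: {tag for tag, n in counts.items() if n >= min_count}
        for sample, counts in tag_counts.items()
    }


def jaccard(tags_a, tags_b):
    """Shared tags divided by all tags seen in either sample."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


# Hypothetical counts for three samples and four tags.
tag_counts = {
    "s1": {"tagA": 5, "tagB": 2, "tagC": 1},
    "s2": {"tagA": 3, "tagB": 4, "tagD": 2},
    "s3": {"tagD": 6},
}
present = presence_absence(tag_counts)
similarity_s1_s2 = jaccard(present["s1"], present["s2"])  # 2 shared / 4 total = 0.5
similarity_s1_s3 = jaccard(present["s1"], present["s3"])  # no shared tags = 0.0
```

No reference genome is needed for this comparison. One caveat: at low sequencing depth an absent tag may reflect undersampling rather than a true absence, which is why the `min_count` threshold is exposed as a parameter.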
Subsequent plant and ecological genomics built heavily on this logic: many thousands of individuals could be genotyped cheaply enough for breeding programs, diversity panels, recombinant inbred populations, and global germplasm surveys. GBS became one of the practical bridges between early next-generation sequencing and today’s routine genomic prediction, high-density genetic maps, imputation-based breeding pipelines, and reduced-representation population genomics. Its lasting contribution was methodological democratization: it made genome-scale variation data available before full genome assemblies, polished references, or species-specific marker platforms were in place.
Abstract
Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.
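The statement that the consensus of the read clusters becomes the reference can be illustrated with a per-position majority vote. This is a minimal sketch under simplifying assumptions: the tags in a cluster are already grouped and share one length, and a simple majority decides each base. The abstract does not specify how the consensus is computed, so this is a stand-in and not the authors' procedure.

```python
from collections import Counter


def consensus(tags):
    """Majority-vote consensus of equal-length tags.

    Ties are resolved in favour of the base seen first in that column.
    """
    if not tags:
        raise ValueError("cluster is empty")
    length = len(tags[0])
    if any(len(tag) != length for tag in tags):
        raise ValueError("all tags in a cluster must share one length")
    # zip(*tags) yields one column of bases per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*tags)
    )


# Hypothetical cluster: three reads of one site, one with a mismatch at the end.
cluster = ["CAGCAAATTTGG", "CAGCAAATTTGG", "CAGCAAATTTGA"]
reference_tag = consensus(cluster)  # "CAGCAAATTTGG"
```

Individual reads can then be compared against `reference_tag` to score variants at that site, so the consensus plays the role a reference genome would otherwise fill.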
Related
- cite → Fast and accurate short read alignment with Burrows–Wheeler transform — The GBS pipeline uses BWA's Burrows-Wheeler short-read alignment method to map sequencing tags to reference genomes.