The Shendure Lab is part of the Department of Genome Sciences at the University of Washington (Seattle, WA). The mission of the lab is to develop and apply new technologies in genomics and molecular biology. Most projects in the lab exploit new DNA sequencing technologies (Shendure et al., Nature Reviews Genetics 2004; Shendure & Ji, Nature Biotechnology 2008; Shendure & Lieberman Aiden, Nature Biotechnology 2012), and generally fall into one of six areas: 1) next-generation human genetics; 2) genome contiguity & completeness; 3) massively parallel functional analysis; 4) molecular tagging; 5) synthetic biology; 6) translational genomics. Our interests in each of these areas are outlined briefly below, and a full list of publications is available via PubMed.

Next-generation human genetics – Next-generation or massively parallel DNA sequencing technologies have the potential to markedly accelerate genetics research (Shendure et al., Science 2005; Shendure, Genome Biology 2011). We have developed methods for targeted sequence enrichment that enable the cost-effective discovery of genetic variation within regions of interest, e.g. specific genes or the “exome”. These include protocols relying on molecular inversion probes (Porreca et al., Nature Methods 2007; Turner et al., Nature Methods 2009; O’Roak et al. Science 2012) or, alternatively, hybrid capture (Ng et al., Nature 2009). We have also proposed and implemented novel analytical strategies to identify the genetic basis of Mendelian disorders by exome sequencing of a few affected individuals (Ng et al., Nature 2009), including autosomal recessive disorders such as Miller syndrome (Ng et al., Nature Genetics 2009) and autosomal dominant disorders such as Kabuki syndrome (Ng et al. Nature Genetics 2010). In collaboration with other investigators at the University of Washington and elsewhere, we are now applying exome and genome sequencing to additional Mendelian disorders (as one of three NHGRI-funded Centers for Mendelian Genomics), complex heart, lung, and blood related traits (as part of the NHLBI Exome Sequencing Project), and neuropsychiatric disorders such as autism (O’Roak et al. Nature Genetics 2011; O’Roak et al. Nature 2012; O’Roak et al. Science 2012). We are also pursuing the application of exome and genome sequencing to explore the genetic basis for initiation, progression, metastasis and drug resistance in prostate cancer (Kumar et al. PNAS 2011). For all of these projects, our aim is to engage in discovery while continuing to innovate analytical and experimental methods.

Genome contiguity & completeness – Massively parallel technologies have reduced the per-base cost of genome sequencing by several orders of magnitude. However, the “personal human genomes” sequenced today are strikingly incomplete, both in that a substantial fraction of genetic variation is missed, and also in that the product is blind with respect to haplotype. Furthermore, short read lengths and a lack of methods to establish contiguity over even modest distances have prevented these technologies from delivering high-quality, low-cost de novo assemblies of large genomes. To address these limitations, we are developing novel technologies for “massively parallel contiguity mapping”. These include, for example, clone-based strategies to enable haplotype-resolved human genome sequencing (Kitzman et al., Nature Biotechnology 2011), as well as in situ methods that exploit in vitro transposition to physically shatter genomic DNA (Adey et al. Genome Biology 2011) while retaining contiguity information, i.e. optical sequencing (Schwartz et al. PNAS 2012). Our goals for this work (now funded by the NHGRI’s Advanced Sequencing Technology program) are to facilitate the low-cost, high-quality de novo assembly of complex genomes, as well as the low-cost, haplotype-resolved, near-complete sequencing of human genomes and epigenomes.

Massively parallel functional analysis – A fundamental goal of modern biology is to understand the human genome at single nucleotide resolution. Single nucleotide differences between genomes are causative for, or impact susceptibility to, a host of diseases, while single nucleotide mutations are a primary source of raw material for evolution. Furthermore, mechanistic insights often depend explicitly on the experimental perturbation of single nucleotides or amino acids. A major challenge in genomics is the disconnect between the relatively coarse, descriptive nature of contemporary tools for high-throughput functional annotation and the goal of achieving a high-resolution understanding of all cis-regulatory elements and trans-acting factors encoded by the genome. To address this gap, we are exploiting massively parallel technologies for nucleic acid synthesis and sequencing to develop a new experimental paradigm for dissecting function with single nucleotide resolution (Patwardhan et al. Nature Biotechnology 2009; Patwardhan et al. Nature Biotechnology 2012). In our approach, mutagenized libraries of cis-regulatory elements or trans-acting factors are generated cost-effectively, using microarray-derived or doped oligonucleotides for parallel synthesis or parallel mutagenesis. Complex libraries of mutants are subjected en masse to multiplex in vitro or in vivo assays that measure a defined aspect of function. Finally, the relative impact of individual mutations is deconvolved by massively parallel sequencing. Ongoing efforts are directed at adapting this paradigm to diverse classes of functional sequence. We anticipate that this approach will be broadly useful for elucidating the biological design principles for DNA regulatory elements, and may also facilitate the clinical interpretation of individual human genomes.
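
As a rough illustration of the final deconvolution step described above, the sketch below scores each variant by how its read count changes between the input library and the post-assay pool, relative to an unmutated reference. This is a generic enrichment calculation written for illustration, not the analysis used in the cited papers; the function name `variant_effects`, the `"WT"` reference label, and the pseudocount value are all assumptions.

```python
import math


def variant_effects(input_counts, output_counts, reference="WT", pseudocount=0.5):
    """Return the log2 enrichment of each variant relative to the reference.

    input_counts / output_counts map a variant label to its read count
    before and after the functional assay (hypothetical data layout).
    """
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())

    def log_ratio(variant):
        # A pseudocount keeps variants with zero reads from producing log(0).
        in_freq = (input_counts.get(variant, 0) + pseudocount) / in_total
        out_freq = (output_counts.get(variant, 0) + pseudocount) / out_total
        return math.log2(out_freq / in_freq)

    ref_ratio = log_ratio(reference)
    # Subtracting the reference ratio puts the unmutated sequence at 0.
    return {variant: log_ratio(variant) - ref_ratio for variant in input_counts}


if __name__ == "__main__":
    before = {"WT": 100, "A5G": 100, "C9T": 100}
    after = {"WT": 100, "A5G": 50, "C9T": 200}
    for variant, effect in variant_effects(before, after).items():
        print(variant, round(effect, 3))
```

Under these assumptions, a negative score suggests the mutation reduced the measured activity and a positive score suggests it increased it. Real data would also need replicate handling and read-depth filtering.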

Molecular tagging – The “molecular tagging” of entities of interest, e.g. single DNA molecules, single cells, cell lineages, sequencing libraries, etc., can be highly useful. For example, we recently exploited molecular tagging in “subassembly”, a method which extends the utility of short-read, inaccurate sequencing platforms to applications requiring long, accurate reads (Hiatt et al., Nature Methods 2010). In this method, a long DNA fragment library is converted to a population of nested sub-libraries, and a tag sequence directs grouping of short reads derived from the same long fragment, thereby enabling localized assembly and error-correction of long fragment sequences. Related molecular tagging schemes that accurately quantify rare mutations are in development, whereas other projects in the lab are aimed at developing tagging strategies for nucleic acids derived from single cells or cell lineages. We are also interested in developing protocols for rapid, ultra-low-input shotgun library construction that facilitate high levels of sample indexing. For example, we recently reported the extensive characterization of an efficient method for constructing shotgun fragment libraries in which transposase catalyzes in vitro DNA fragmentation and adaptor incorporation simultaneously. We have extended this method’s capabilities by developing protocols for sub-nanogram library construction, exome capture from 50 nanograms of input DNA, PCR-free and colony PCR library construction, and 96-plex sample indexing (Adey et al. Genome Biology 2011), as well as for ultra-low-input whole genome bisulfite sequencing (Adey et al. Genome Research 2012).
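
The tag-directed grouping idea behind subassembly can be sketched in a few lines. The example below is a simplified illustration, not the published method. It assumes each short read arrives as a hypothetical `(tag, offset, sequence)` tuple in which the offset within the long fragment is already known, whereas the actual method performs localized assembly. Reads sharing a tag are grouped, and a per-position majority vote stands in for error correction.

```python
from collections import Counter, defaultdict


def group_by_tag(reads):
    """Group (tag, offset, sequence) reads by tag sequence."""
    groups = defaultdict(list)
    for tag, offset, seq in reads:
        groups[tag].append((offset, seq))
    return dict(groups)


def consensus(placed_reads):
    """Majority-vote consensus over reads placed at known offsets.

    Positions with no coverage are skipped, so this sketch assumes the
    reads tile the fragment without gaps.
    """
    columns = defaultdict(Counter)
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))


if __name__ == "__main__":
    reads = [
        ("AAGT", 0, "ACGTAC"),
        ("AAGT", 2, "GTACGG"),
        ("AAGT", 2, "GTTCGG"),  # carries one sequencing error at position 4
        ("AAGT", 4, "ACGGTT"),
        ("CCTA", 0, "TTTT"),
    ]
    for tag, placed in group_by_tag(reads).items():
        print(tag, consensus(placed))
```

In this toy input, the single erroneous base in the third read is outvoted by the other reads covering the same position.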

Synthetic biology – Synthetic biology is a nascent field whose progress will be dependent in part on the ability to cost-effectively ‘read’ and ‘write’ DNA. Although next-generation sequencing can be used to efficiently ‘read’ DNA, there is a strong need for ‘write’ methods that enable the ultra-cheap and highly accurate synthesis of kilobase-scale or megabase-scale DNA constructs of arbitrary sequence. We are therefore developing generalized protocols for multiplex, error-free gene synthesis that rely on oligonucleotides from programmable microarrays as raw material (as the per-base cost of these oligonucleotides is <1/20th that of conventionally synthesized oligonucleotides, with the potential to be much cheaper). One such method that we recently developed is "dialout PCR", a compelling alternative to in vivo cloning and Sanger sequencing for accurate gene synthesis (Schwartz et al. Nature Methods 2012). We are concurrently developing cost-effective, high-throughput methods for programmable mutagenesis, e.g. to facilitate the efficient construction of complex allelic series. We are applying these methods to reconstruct ancestral or unrecoverable proteins, regulatory elements, and variants thereof for functional characterization.

Translational genomics – Our lab has a strong and ongoing interest in the clinical translation of genomic technologies. Specific areas include: (1) Reproductive genetics - DNA sequencing technologies have the potential to enable the non-invasive prenatal diagnosis of genetic conditions including thousands of Mendelian disorders. We recently exploited haplotype-resolving technologies developed in our lab (Kitzman et al., Nature Biotechnology 2011) to perform the first whole genome sequencing of a fetus, using samples obtained non-invasively from the parents in the second trimester of pregnancy (Kitzman et al., Science Translational Medicine 2012). Ongoing work is aimed at improving the technical performance and scalability of this approach. (2) Clinical sequencing - There is tremendous potential for the sequencing of germline and somatic human genomes in contexts where they will impact diagnosis, prognosis, and/or therapeutic decisions. We are particularly interested in developing reagents and methods for the highly cost-effective targeted sequencing of clinically relevant gene panels for both cancer genetics and medical genetics. (3) Pathogen sequencing - The costs of DNA sequencing are dropping so rapidly that complete genome sequencing of clinical bacterial & viral isolates may soon be possible to implement on a widespread basis, thereby enabling the comprehensive epidemiological analysis of clinically manifest infections. To this end, we have implemented a platform for high-throughput genome sequencing of clinical bacterial isolates, and to date have processed >1,000 bacterial genomes (clinical isolates or strains resulting from experimental evolution).