The Smith group is an interdisciplinary group of researchers developing novel methods and approaches for the analysis and manipulation of biomolecules. Areas of interest include biological mass spectrometry, proteomics, bioinformatics, nucleic acid arrays, nucleic acid hybridization, aptamers, gene assembly, chromatin structure, mass spectrometric instrumentation development, and surface chemistry.
The ability to determine the extent and nature of protein variation is a critical missing piece in proteomics today. A surprise revealed by the success of the Human Genome Project was the much lower than anticipated number of genes in the human genome: roughly 20,000 rather than the predicted ~100,000. This finding has led to the general recognition that much of the complexity and sophistication of our biological machinery arises at the level of protein variation rather than from a large number of distinct genes. These protein variations occur on at least three levels: alternative splicing of the RNA transcript, codon substitutions, and a wide variety of post-translational modifications (PTMs). The different proteoforms containing these variations play central roles in a wide variety of biological processes, from cell signaling and signal transduction to gene regulation, and thus determining their identities and abundances is vital to understanding normal and disease biology. Remarkably, no technology presently exists to capture this critical information, and developing one is accordingly a central focus of our laboratory.
Happily, this is a special moment in time. Biology is in the midst of a technological revolution of unprecedented scope. Next-generation sequencing platforms allow rapid, inexpensive, and comprehensive transcriptomic analyses, and new mass spectrometers of ever-increasing sensitivity can rapidly determine the accurate mass of intact proteins. This technological convergence opens the possibility of dramatically changing the fundamental paradigm of proteomic analyses. We are integrating these state-of-the-art genomics and proteomics capabilities into a new two-pronged strategy, in which custom sample-specific proteoform databases are created and then used to identify proteoforms from mass spectrometric data. Employing such an informed database avoids the combinatorial explosion of possible proteoforms that has severely hindered identification strategies for intact proteins (e.g., top-down proteomics).
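To make the idea concrete, the following is a minimal sketch (not our actual pipeline) of why a restricted, sample-specific database helps. It assumes each database entry is simply a name paired with a theoretical intact mass, and that an observed mass is assigned to any entry within a parts-per-million tolerance. The function names, the 10 ppm default, and the example entries are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def naive_candidate_count(n_sites):
    """Number of forms if each of n_sites can independently be modified or not.

    This is the combinatorial growth that an informed database avoids.
    """
    return 2 ** n_sites


def match_intact_masses(observed, database, tol_ppm=10.0):
    """Map each observed intact mass to database proteoforms within tol_ppm.

    observed : list of observed intact masses (Da)
    database : list of (name, theoretical_mass) tuples (hypothetical format)
    tol_ppm  : mass tolerance in parts per million (assumed value)
    """
    entries = sorted(database, key=lambda e: e[1])
    masses = [m for _, m in entries]
    hits = {}
    for obs in observed:
        delta = obs * tol_ppm / 1e6  # tolerance window in Da
        lo = bisect_left(masses, obs - delta)
        hi = bisect_right(masses, obs + delta)
        hits[obs] = [entries[i][0] for i in range(lo, hi)]
    return hits


# Hypothetical sample-specific database: three forms of one made-up protein.
db = [
    ("protA_unmodified", 15000.00),
    ("protA_1xacetyl", 15042.01),
    ("protA_1xphospho", 15079.97),
]
hits = match_intact_masses([15079.98, 15500.0], db)
```

With only the forms supported by the sample's own sequencing data in the database, each observed mass is compared against a handful of candidates rather than against every combination of possible modifications, which `naive_candidate_count` shows growing as a power of two.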
Hybridization Capture of Chromatin-Associated Proteins for Proteomics (HyCCAPP)
Our laboratory co-directs the Wisconsin Center of Excellence in Genomic Sciences (CEGS). This Center is supported by the National Human Genome Research Institute (NHGRI) at NIH, and is a close collaboration with the laboratory of Professor Michael Olivier at the Texas Biomedical Research Institute. Over the past five years we have developed a powerful new tool, HyCCAPP, for dissecting the complexities of gene regulatory processes. HyCCAPP combines i) sequence-specific hybridization capture of DNA fragments of interest directly from a cleared yeast lysate, ii) state-of-the-art mass spectrometric analysis, and iii) a bioinformatics analysis pipeline that statistically distinguishes real signal from background. We validated HyCCAPP by capturing and interrogating four genomic regions in Saccharomyces cerevisiae: two regions within the rDNA locus, 25S and 5S (~150-200 copies/cell); the X-element adjacent to the telomeres (~35 copies/cell); and the GAL1-10 promoter (single copy/cell) (Fig. 2). The different loci have different functions, which are reflected in the proteins that interact with them. These differences were evident in the results of the HyCCAPP analysis, which produced a distinct set of proteins for each of the four loci, validating the specificity of the technology. These locus-specific protein lists include many, although not all, previously identified protein interactors, as well as numerous previously unknown interactors that expand our understanding of the biology at these loci. We are currently extending the strategy to the analysis of mammalian genomes, which are ~300-fold more complex than the yeast genome, and implementing a multiplex strategy that permits many loci to be captured in parallel, thereby increasing throughput while reducing cost and labor.
As these and other improvements to the technology are made, the capabilities of HyCCAPP will continue to develop, providing an increasingly powerful new tool for the study of genomic processes.
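As an illustration of what distinguishing real signal from background can look like, here is a small sketch. It is not the statistical method used in the HyCCAPP pipeline, which is not described here. It assumes, purely for illustration, that each protein has a spectral count in a locus-specific capture and in a control capture, and it applies a one-sided Fisher's exact test to ask whether the protein is over-represented in the locus capture. The function name and inputs are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k_capture, n_capture, k_control, n_control):
    """One-sided Fisher's exact p-value for enrichment in the capture sample.

    k_capture, k_control : spectral counts for one protein in each sample
    n_capture, n_control : total spectral counts in each sample
    Returns P(X >= k_capture) under the hypergeometric null, i.e. the
    probability of seeing at least this many of the protein's counts in the
    capture sample if counts were distributed without regard to sample.
    """
    total_protein = k_capture + k_control
    total = n_capture + n_control
    denom = comb(total, n_capture)
    tail = 0
    for x in range(k_capture, min(total_protein, n_capture) + 1):
        tail += comb(total_protein, x) * comb(total - total_protein, n_capture - x)
    return tail / denom


# Made-up counts: 5 of 10 spectra in the capture, 0 of 10 in the control.
p_enriched = fisher_enrichment_p(5, 10, 0, 10)
# The reverse situation gives no evidence of enrichment.
p_background = fisher_enrichment_p(0, 10, 5, 10)
```

A small p-value flags a protein as a candidate locus-specific interactor, while a protein that appears equally often in the control is treated as background. In practice such p-values would also need correction for testing many proteins at once.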
The advent of high-density DNA arrays in the early 1990s demonstrated the power of the array concept for genome-wide analyses of biological systems. We use a Maskless Array Synthesizer to photolithographically fabricate high-density DNA arrays with up to 786,000 individual DNA features. In contrast to standard array fabrication technologies, which employ glass substrates and silane-based biomolecular attachment, we use carbon substrates and have developed novel carbon attachment chemistries that permit biomolecular arrays of unprecedented chemical stability to be made. These and other developments have enabled us to pioneer a new approach to RNA-mediated gene assembly from DNA arrays and to develop the first high-density RNA arrays, which we fabricate enzymatically on the array surface.