Research in the Smith group is directed at the development of powerful new technologies to drive biological research. Our current projects include:
Tools for understanding gene regulation
The successful completion of sequencing the human genome and other genomes ushered in a new era in biological research. A strong focus now is identifying regulatory mechanisms that turn genes on or off, and then understanding how such gene regulation is altered by critical biological processes, diseases, or environmental factors such as drugs. We are developing novel technologies to identify the proteins that bind to particular DNA regions because such regulatory proteins determine, to a large extent, which genes are expressed. Briefly, the technology involves (1) chemical cross-linking of proteins to DNA, (2) fragmentation of the chromatin (long strands of DNA wrapped around proteins), (3) capture of these fragments onto surfaces in a DNA-sequence-specific manner, and (4) mass spectrometry to identify and quantify the proteins.
The advent of high-density DNA arrays in the early 1990s demonstrated the power of the array concept for genome-wide analyses of biological systems. Our group has a Maskless Array Synthesizer (MAS), which allows any high-density DNA array of interest (up to 786,000 individual DNA features) to be designed and fabricated overnight. Two exciting projects in which we are presently engaged are (1) hybridization-driven self-assembly of genes and genomes from RNAs that are enzymatically copied from the array elements, and (2) fabrication of high-density RNA arrays for many applications in the “RNA world.”
Proteoform determination by mass spectrometry
The dominant paradigm of modern proteomics is the "bottom-up" strategy, in which a mixture of proteins of interest is cleaved into peptides and analyzed by liquid chromatography/mass spectrometry (LC-MS). While the bottom-up strategy is powerful and widely practiced, digesting the proteins into peptides discards information about the protein context in which each peptide was found. Proteins produced from the same gene can vary substantially in their molecular structure: genetic variations, splice variants, RNA editing, and post-translational modifications (PTMs) all give rise to different forms of the protein, which are referred to as "proteoforms." Knowledge of the proteoforms present in a system under study is essential to understanding that system, as different proteoforms often have dramatically different functional behavior, and regulation of their production is a central aspect of pathway control.
We are developing a new strategy for proteoform analysis, in which determining just two pieces of information for each proteoform, namely its accurate mass and the number of lysine residues it contains, suffices to identify it. The accurate mass is determined by standard LC-MS analysis of the undigested protein mixture in an Orbitrap mass spectrometer, and the lysine count is determined using a recently developed isotopic tagging method. A key enabling concept is a search strategy that reveals post-translationally modified protein variants. The strategy is demonstrated by elucidating hundreds of proteoform families present in yeast cell lysate. This simple and readily implemented new proteomic strategy provides an unprecedented view of the proteoforms present in biological systems, and will thereby make possible critical new insights into the functioning of biological systems and pathways.
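To illustrate why a lysine count adds identifying power beyond accurate mass alone, the matching step can be sketched as below. This is a minimal sketch under stated assumptions, not the group's actual search software: the names `Candidate`, `candidate_from_sequence`, and `match`, the 10 ppm tolerance, and the toy sequences are all hypothetical. The only fixed inputs are standard monoisotopic residue masses and the mass shift of acetylation. The toy candidates differ only by a lysine-to-glutamine substitution (about 0.036 Da), which a mass-only search at this tolerance cannot separate.

```python
from dataclasses import dataclass
from typing import List, Optional

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one water per intact chain (free N- and C-termini)
ACETYL = 42.010565  # mass shift of one acetylation


@dataclass(frozen=True)
class Candidate:
    """Hypothetical database entry: a proteoform's mass and lysine count."""
    name: str
    mass: float
    lysines: int


def candidate_from_sequence(name: str, seq: str, mod_mass: float = 0.0) -> Candidate:
    """Build a candidate from a sequence plus an optional PTM mass shift."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER + mod_mass
    return Candidate(name, mass, seq.count("K"))


def match(observed_mass: float,
          observed_lysines: Optional[int],
          candidates: List[Candidate],
          ppm_tol: float = 10.0) -> List[Candidate]:
    """Return candidates within ppm_tol of the observed mass.

    If observed_lysines is given, candidates with a different lysine
    count are rejected; if it is None, only the mass is used.
    """
    hits = []
    for c in candidates:
        if observed_lysines is not None and c.lysines != observed_lysines:
            continue
        ppm_error = abs(observed_mass - c.mass) / c.mass * 1e6
        if ppm_error <= ppm_tol:
            hits.append(c)
    return hits


# Toy sequences (not real proteins): identical except for the last residue.
core = "ACDEFGHILMNPQRSTVWY" * 5
database = [
    candidate_from_sequence("toy_K", core + "K"),
    candidate_from_sequence("toy_Q", core + "Q"),
    candidate_from_sequence("toy_K_acetyl", core + "K", mod_mass=ACETYL),
]

observed = database[0].mass  # pretend this mass was measured for toy_K
mass_only = match(observed, None, database)
mass_and_lysines = match(observed, 1, database)
print([c.name for c in mass_only])
print([c.name for c in mass_and_lysines])
```

In this toy case the mass-only search returns both the K and Q variants, while adding the lysine count leaves a single candidate. The acetylated entry shows how a modified variant can be represented as the same sequence with a known mass shift.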