2.14 Does the Nature of the Amino Acids Influence α-Helix and β-Pleated Sheet Formation?

As you examine the structure of HIV Protease, you will notice that not all of the protein has a well-defined secondary structure. Why, for example, doesn't the entire protein coil into an α-helix or loop back on itself to form a large sheeted structure? The influences on protein structure are diverse. We have already seen that salt bridges, hydrophobicity/hydrophilicity, and hydrogen bonding all act upon the structure of a protein. Because different types of amino acids have side groups with different hydrophobicities and hydrophilicities, different charges, and different hydrogen bonding capabilities, we expect that the structure of local regions of a protein will vary in response to the amino acid composition. It is no surprise that just twenty common amino acids can give rise to so many proteins with such varied properties: there are virtually unlimited ways of combining different amino acids. However, in terms of understanding why a protein adopts a particular structure, this diversity of influences makes the problem nearly intractable. Nonetheless, we can gain insight by examining one of the special amino acids, proline, in more detail.
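To get a feel for what "virtually unlimited" means, the short sketch below counts the possible sequences for a chain of a given length. It assumes that each position can hold any one of 20 common amino acids and that every combination is allowed; the function name and the chain lengths are illustrative choices, with 99 residues chosen to match the length of one HIV protease strand.

```python
def count_sequences(chain_length, alphabet_size=20):
    """Number of distinct linear sequences of the given length.

    Assumes every position may hold any of `alphabet_size` residues.
    """
    return alphabet_size ** chain_length


if __name__ == "__main__":
    for n in (2, 10, 99):
        total = count_sequences(n)
        # The digit count minus one gives the order of magnitude.
        print(f"{n:>3} residues: on the order of 10^{len(str(total)) - 1} sequences")
```

Even a modest chain has far more possible sequences than could ever be made and tested, which is one reason predicting structure from sequence alone is so hard.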

The cyclic structure of proline strongly influences the secondary structures of proteins. Because the amino group of proline is a secondary amine, formation of a peptide bond yields an amide in which there is no N-H hydrogen bond donor function. Furthermore, the geometric constraints imposed by the five-membered ring prevent a protein strand with a proline in the middle from adopting a fully extended, linear structure (Figure 2.31). The beginning of a kink in the protein strand results from these constraints.

Figure 2.31 A strand of protein in the a) linear, fully extended conformation, and b) the same strand with a proline in the middle.

Nature takes advantage of the special geometric features of the amino acid proline. Recall how the formation of β-pleated sheets requires that a strand of protein loop back upon itself; such regions are called β-turns. Now look at the kink induced by proline in the strand of protein depicted above. The constrained geometry of the proline amino acid initiates turning of a protein strand. Not surprisingly, analysis of the structures of many proteins reveals that the turn regions of β-pleated sheets frequently contain proline residues.

Turn regions of β-sheets often lie on the exterior of proteins, where they are exposed to water. Not too surprisingly, β-turn regions are often found to contain polar amino acid residues.

The "kinking" of a protein strand induced by proline can destabilize α-helix formation. One can imagine the formation of an α-helix by taking a fully extended chain of amino acids sketched out on a piece of paper or an overhead transparency and then rolling the paper into a tube so that the strand forms a helix. If the protein strand is kinked by the presence of a proline residue in the middle, then the amino acids will not be able to form the arrangement of hydrogen bonds needed to stabilize the α-helix. Thus, proline is rarely found beyond the first three residues of an α-helix.

2.15 HIV Protease Comprises Two Strands of Amino Acids

HIV Protease has two strands of protein with no covalent connections. The two strands are identical in their amino acid content and sequence. The superstructure of proteins that results from more than one peptide chain coming together is called quaternary structure. The individual strands are called subunits (or segments) of the protein. For example, the oxygen transport protein hemoglobin has four subunits. The subunits of HIV Protease are held together by the cumulative effects of the same weak intermolecular interactions that cause proteins to fold, twist, and turn: hydrogen bonds, salt bridges, and hydrophobic/hydrophilic effects. In HIV Protease the main interactions holding the strands together seem to be hydrophobic effects. By coming together in the form of a dimer, the strands appear to minimize the exposure of hydrophobic regions to water. Hence, the regions in which the two strands come closest together are rich in non-polar residues.

2.16 How Do We Know the Structures of Proteins?

So far we have assumed the structure of HIV Protease and used that structure to reveal some of the structural features of carboxylic acids, amides, and esters. But how do we know that this structure is a good representation of the protein? You may be aware that a common technique for elucidating the three-dimensional structures of molecules is X-ray crystallography. In this method the diffraction of X-rays by planes of atoms that are arrayed in an orderly fashion in single crystals can be analyzed to determine the positions of the atoms. This is a popular and rapid technique for small molecules but much more difficult for proteins because of their size and the difficulty of crystallizing them. Furthermore, one may be concerned that in packing the protein molecules to form a crystal, the molecules have deformed from their solution structures. Recent technological developments have provided another method for characterizing the three-dimensional structure of proteins: NMR spectroscopy.

Let us begin our discussion of structure determination by NMR by distinguishing three-dimensional structures from topological structures. The most common application of NMR spectroscopy is to determine the topological structure, that is, the nature of the functional groups and how they are connected together. Consider the ¹H NMR spectrum of N-methyl formamide.

Figure 2.32. Proton NMR Spectrum of N-methyl formamide

The appearance of a doublet at high field, a broadened quartet of doublets at lower field, and a doublet at very low field that integrate to a 3:1:1 ratio of areas is consistent with N-methyl formamide. That is, the combination of coupling, intensities, and chemical shift indicates that a methyl group is coupled to the N-H proton, and that the N-H proton is coupled to both a methyl group and a proton attached to a carbonyl. What this NMR spectrum does not tell us is the three-dimensional structure. Is the methyl group syn or anti to the carbonyl? Is the amide group planar?

A variant of normal NMR spectroscopy, called Nuclear Overhauser Effect SpectroscopY (NOESY), provides distance-related information that permits us to answer such questions. The NOESY spectrum is depicted below; note how the spectrum looks very different from the NMR spectra that you have seen thus far.

Figure 2.33 2D-NOESY Spectrum of N-methyl formamide

The NOESY data shown above are referred to as a 2D NMR spectrum. This contrasts with normal NMR spectra, which plot intensity vs. chemical shift. For 1D spectroscopy the indicated dimension is chemical shift; the intensity dimension is assumed. 2D NOESY results contain three dimensions of information: intensity in one dimension and chemical shift in the remaining two. The depiction of three-dimensional information in two dimensions (such as a computer monitor or piece of paper) requires the use of a projection technique. Two common techniques, relief maps and contour maps, are illustrated below.

Figure 2.34 Relief and Contour Maps of Mountain Peaks.

The principal advantage of the contour map is that none of the surface features are hidden. In the 2D NOESY spectra, the contour lines represent intensity information projected onto a plane containing chemical shift information.

The off-diagonal peaks in 2D NMR spectra reveal information about the distances between protons. Notice how the 2D NMR spectrum of N-methyl formamide shows substantial symmetry. The set of peaks that runs along the diagonal from the lower left corner to the upper right corner is essentially the normal 1D spectrum. In Figure 2.33 these peaks have been projected along both the vertical and horizontal axes; the projections carry the same information as the 1D spectrum. The off-diagonal peaks are arrayed symmetrically about the diagonal. The magnitude of an off-diagonal peak depends on the distance between the two protons involved: the closer the protons are in space, the larger the peak. Let's see how this works by analyzing the 2D NOESY spectrum of N-methyl formamide in detail.

First examine the off-diagonal peak that connects the methyl group to the amide N-H proton. These H's are close together in space, so we expect and observe a large off-diagonal peak (indicated by several contour lines). We expect smaller off-diagonal peaks interconnecting the formyl (H-C=O) proton with the amide N-H and methyl groups because the distances are greater. However, notice how the off-diagonal peak connecting the formyl proton to the N-H proton is much larger than that connecting it to the methyl group. The formyl proton must therefore lie closer to the N-H proton than to the methyl group, which suggests that the amide exists predominantly with the methyl group syn to the carbonyl oxygen. Quantitative analysis of the NOE intensities confirms this assignment.
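A quantitative analysis of this kind is commonly approximated by treating each pair of protons in isolation, so that the cross-peak intensity falls off as the inverse sixth power of the interproton distance. The sketch below applies that approximation: it calibrates against one proton pair of known distance and estimates the distance for another pair. The function name and all intensity and distance values are hypothetical, chosen only for illustration; they are not measurements taken from Figure 2.33.

```python
def estimate_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance (angstroms) from a NOESY cross-peak.

    Assumes the isolated spin-pair approximation, intensity ~ r**-6,
    so r = r_ref * (I_ref / I) ** (1/6).
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical calibration pair: intensity 100 (arbitrary units) at 2.5 angstroms.
    ref_intensity, ref_distance = 100.0, 2.5
    for label, intensity in (("strong peak", 100.0), ("weak peak", 100.0 / 64.0)):
        r = estimate_distance(intensity, ref_intensity, ref_distance)
        print(f"{label}: about {r:.2f} angstroms")
```

Under this approximation a 64-fold drop in intensity corresponds to only a doubling of the distance, which is why cross-peaks fade so quickly as protons move apart.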

Why use 2D NMR spectra? Imagine that you have obtained a ¹H NMR spectrum of HIV Protease. Except for the N-terminal residue, each amino acid will exhibit an amide N-H resonance, leading to 98 peaks in the amide region. The result is a very crowded, highly overlapped spectrum. The advantage of 2D NMR spectroscopy is that the information is spread across two dimensions rather than one. Whereas it is impossible to separate out the contributions made by individual resonances in the 1D spectra of proteins, individual assignments often can be made in 2D spectra.

Now let's look at the 2D NOESY data of a small dipeptide, N-acetyl-(L)-Pro-(D)-Ala-NHMe (Figure 2.35). Essentially we have taken the dipeptide of L-Proline and D-Alanine (the unnatural enantiomer), capped off the N-terminus with an acetyl (Me-CO-) group, and turned the C-terminus into an amide of methylamine. Here there are enough peaks that the advantages of the 2D spectrum are clearer: off-diagonal peaks are clearly resolved and easily quantified. Does the NOESY data support a conformation of the small peptide that is fully extended or folded? Structural depictions of fully extended and folded conformations are shown below. Note how the folded conformation is enforced by formation of a hydrogen bond from the N-H of the C-terminal amide to the carbonyl O of the acetyl group that caps the N-terminus. This folding pattern is essentially that of a β-turn in β-sheeted structures of larger proteins. We can detect the presence of a folded conformation for this special dipeptide from the off-diagonal peak that connects the NHMe proton to the α-CH of the Proline residue. As a general rule of thumb, the appearance of a NOESY cross-peak indicates that the protons are closer together than 5 Å. The only conformation that brings the NHMe proton close to the Proline ring protons is that in which a hydrogen bond causes the structure to fold.
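The rule of thumb above can be turned into a simple check on a proposed model: compute a proton-proton distance from the model's coordinates and ask whether it falls under the cutoff. The sketch below does this assuming a 5 Å cutoff. The coordinates, proton labels, and function names are made up for illustration and do not come from a real structure of the dipeptide.

```python
import math

NOE_CUTOFF = 5.0  # angstroms; rule-of-thumb limit assumed for this sketch


def distance(p, q):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def expect_cross_peak(p, q, cutoff=NOE_CUTOFF):
    """True if two protons are close enough that a NOESY cross-peak is plausible."""
    return distance(p, q) < cutoff


if __name__ == "__main__":
    # Hypothetical proton coordinates (angstroms) for two candidate conformations.
    models = {
        "folded": {"NHMe": (0.0, 0.0, 0.0), "Pro_alpha_CH": (3.0, 2.0, 1.0)},
        "extended": {"NHMe": (0.0, 0.0, 0.0), "Pro_alpha_CH": (6.0, 4.0, 2.0)},
    }
    for name, protons in models.items():
        a, b = protons["NHMe"], protons["Pro_alpha_CH"]
        print(f"{name}: {distance(a, b):.2f} angstroms, "
              f"cross-peak expected: {expect_cross_peak(a, b)}")
```

A model that predicts a short distance where no cross-peak is observed, or a long distance where a strong cross-peak appears, is inconsistent with the data and can be set aside.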

Figure 2.35. 2D-NOESY spectrum of N-acetyl-(L)-Pro-(D)-Ala-NHMe.

2.17. The Structure of HIV Protease and the Design of Inhibitors

The structure of HIV Protease to which we have been referring was determined by X-ray crystallography. In overview, this protein exhibits four levels of structure: primary structure (the sequence of amino acids), secondary structure (β-pleated sheets and α-helices), tertiary structure (the overall folding enforced by salt bridges, hydrogen bonds, and hydrophobic/hydrophilic effects), and quaternary structure (two strands held together by non-covalent forces). Knowledge of this structure can be exploited in the design of new drugs that will inhibit the enzyme by binding more tightly to the enzyme than its normal substrate. A good example of the thinking that goes into inhibitor design comes from the recent chemical literature.

       "The prevalent strategy for peptidomimetic design
        involves the enforced emulation of specific motifs
        of peptide secondary structure.  This approach
        reflects the accepted view that biologically active
        peptides adopt discrete conformations upon binding,
        even though many energetically similar conformations
        are accessible.  Peptidomimetics based upon the
        β-strand motif, which previously attracted relatively
        little attention, have gained considerable importance
        with the discovery that peptidal substrates and inhibitors
        bind to diverse proteolytic enzymes as β-strands.
        Notably, X-ray studies revealed β-strand bound
        conformations for inhibitors of the HIV protease, an
        aspartic acid protease required for processing of the
        viral polyprotein precursor."
Smith, A. B. III; Hirschmann, R.; Pasternak, A.; Guzman, M. C.; Yokoyama, A.; Sprengeler, P. A.; Darke, P. L.; Emini, E. A.; Schleif, W. A. Journal of the American Chemical Society 1995, 117, 11113-11123.

Let's see if we can decipher this quote. A peptidomimetic is a non-peptide that mimics the shape and functionality of a small protein (a peptide). A "β-strand motif" of peptide secondary structure is the extended conformation adopted by a single strand of a β-pleated sheet. We can paraphrase this section as follows. Inhibitors bound to HIV protease adopt the conformation of a β-strand. A small molecule that adopts the β-strand conformation is therefore more likely to be a good inhibitor of HIV protease than one that does not.

Knowledge of an enzyme structure allows one to create models for how a substrate binds and reacts in an enzyme-catalyzed process. In the case of HIV Protease, the natural substrate (the Gag protein) is itself a protein. HIV protease cleaves this protein at a very specific place in the Gag sequence. Thus a molecule that mimics the functionality and shape of the Gag protein cleavage region, but that itself cannot be cleaved by HIV protease, is an excellent candidate for an inhibitor of the enzyme. In order to better understand how HIV protease binds and cleaves proteins, we will explore the fundamental chemical reactions of amide linkages in the next section.