Although an understanding of the HIV lifecycle suggests that inhibition of HIV Protease could lead to a treatment of HIV infections, the creation of HIV Protease Inhibitors requires a detailed understanding of the molecules involved and their interactions. Thus, the development of HIV Protease Inhibitors is taking place at the interface of chemistry, biochemistry, and virology. Overall the idea is to create a drug, commonly a small molecule, that prevents an enzyme, HIV Protease, from performing its biological function. For us to understand the process of developing these new drugs we will need to understand the chemical makeup, structures, and reactions of proteins and their inhibitors.
Examine the pictorial representation of HIV Protease shown in Figure 2.1 below. Although this picture does not look like the representations that we normally associate with molecular structures, HIV Protease is indeed a molecule; more precisely, it is two identical molecules (protein chains) that are loosely associated. In order to design a drug that will inhibit HIV Protease, we will need to decipher the meaning of the structure depicted in the figure. To understand why HIV Protease displays its distinctive properties, we will need to learn more about the makeup of proteins, in particular the functional groups present and their properties.
Figure 2.1. a) Ribbon and b) ball-and-stick representations of the HIV Protease dimer.
The first pictorial representation in Figure 2.1 uses the loops and whirls of the ribbons to show the three-dimensional structure of HIV Protease. Although the ball-and-stick model of HIV Protease (Figure 2.1b) looks nearly incomprehensible, we can make part of HIV Protease look more like the structures we're used to by transforming a piece of the ribbon diagram into the stick representation, as shown in Figure 2.2. The same piece of HIV Protease is also drawn with dashes, wedges, and more traditional symbols (Figure 2.2c). This piece still looks complicated, but it can be simplified by designating each carbon chain that branches off the molecular backbone as R1, R2, etc. (Figure 2.2d). With the side chains shown as R groups, it is clear that the backbone consists of repeating units of NH-CHR-C=O. As we'll see in the next paragraphs, these units are amino acid residues. Amino acids are the fundamental building blocks not just of HIV Protease but of all proteins.
Figure 2.2. a) Ribbon structure of HIV-1 Protease with residues 53-57 shown in stick form. b) An enlargement of the stick representation of residues 53-57. c) A chemical structure representation of the same residues. d) A generalized chemical structure representation of these residues.
An amino acid consists of an amino group, a carboxyl group, a hydrogen atom, and an R group bonded to a carbon atom (Figure 2.2). The R group, also called the side chain, is different in each of the twenty different amino acids commonly found in proteins. In glycine, the simplest amino acid, the R group is a hydrogen atom. Other amino acids have R groups containing carboxylic acids, amides, aromatic rings, alkyl groups, hydroxyl groups, or other more complicated functional groups (Table 2.1). The side chains vary in size, shape, charge, hydrogen-bonding capacity, and chemical reactivity. This wide variety in the side chains is critical because it enables proteins to create intricate three-dimensional structures and to participate in many essential biological processes.
Of the twenty naturally occurring amino acids, alanine, valine, leucine, and isoleucine have no reactive functionality on their side chains, which are saturated hydrocarbons built only from methyl (-CH3), methylene (-CH2-), and methine (-CH<) groups. Their main contribution to protein structure and reactivity is that they are hydrophobic: they do not interact favorably with water. Instead, they interact favorably with each other and with other nonpolar atoms.
Phenylalanine, tyrosine, and tryptophan all have aromatic side chains, which give these residues characteristic ultraviolet absorbance and fluorescence. Because these spectral properties are sensitive to the environment around the side chains, these residues are useful probes of protein structure.
Two amino acids, serine and threonine, have hydroxyl groups on their side chains. Serine and threonine can react the same way that ethanol reacts. Two other amino acids, methionine and cysteine, have sulfur in their side chains. Sulfur can be oxidized. In cysteine, oxidation of the thiol results in a disulfide bond. Disulfide bonds crosslink proteins and stabilize the protein structure. Since proteins are mostly held together by many weaker noncovalent interactions, disulfide bonds can have a very large effect on the stability of the protein structure.
Aspartic acid and glutamic acid both have carboxylic acids in their side chains. The side-chain pKa of aspartic acid is 3.9 and that of glutamic acid is 4.3, so these residues are ionized and very polar under physiological conditions. We will see later that these amino acids participate in salt bridges that stabilize protein structures, and that two aspartic acid residues are found in the active site of HIV Protease.
Asparagine and glutamine are amides rather than carboxylic acids. These side chains do not ionize and are not very reactive. However, they can function as both hydrogen bond donors and acceptors. We will see many more properties of amides in the following sections of this unit.
The two basic residues are lysine and arginine. The side chain of lysine has four methylene groups capped by an amino group with pKa = 11.1. This means that lysine is usually ionized under physiological conditions (pH~7.0). Any unprotonated lysines are active nucleophiles and participate in a variety of reactions. In arginine, there is a strongly basic guanidine (three nitrogens all connected to a central carbon). The pKa of arginine is about 12. The planarity of amide bonds will be discussed in detail later; note that the guanidine group is also planar.
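The claim that these acidic and basic side chains are ionized at physiological pH can be checked with the Henderson-Hasselbalch relationship. The short Python sketch below assumes that each side chain titrates simply and independently, and it uses the pKa values quoted above; the function `fraction_charged` and the `SIDE_CHAINS` table are illustrative names, not part of any library.

```python
def fraction_charged(pka: float, ph: float, is_acid: bool) -> float:
    """Fraction of an ionizable side chain in its charged form (Henderson-Hasselbalch).

    Acids (R-COOH) carry a charge when deprotonated; bases (amine, guanidine)
    carry a charge when protonated."""
    if is_acid:
        return 1.0 / (1.0 + 10 ** (pka - ph))  # [A-] / ([HA] + [A-])
    return 1.0 / (1.0 + 10 ** (ph - pka))      # [BH+] / ([B] + [BH+])


# side-chain pKa values quoted in the text; True marks an acidic group
SIDE_CHAINS = {
    "Asp": (3.9, True),
    "Glu": (4.3, True),
    "Lys": (11.1, False),
    "Arg": (12.0, False),
}

for name, (pka, is_acid) in SIDE_CHAINS.items():
    pct = 100 * fraction_charged(pka, 7.0, is_acid)
    print(f"{name}: {pct:.2f}% charged at pH 7.0")
```

At pH 7.0 each of these groups comes out more than 99% charged, and at a pH equal to the pKa the function returns exactly one half, matching the meaning of pKa used later for acetic acid.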
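Histidine has an imidazole ring in its side chain, which makes it nucleophilic and basic like other amines. The pKa of the protonated ring is about 7 for free imidazole and about 6 for the histidine side chain, so at physiological pH histidine is mostly neutral, although a significant fraction (roughly 10%) is protonated. In the neutral form of imidazole, the nitrogen bearing the hydrogen atom is a hydrogen bond donor, while the nitrogen bearing the lone pair is a hydrogen bond acceptor. Because it can act as an acid, a base, a nucleophile, a hydrogen bond donor, and a hydrogen bond acceptor near neutral pH, histidine is very versatile.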
The twentieth amino acid, proline, has a side chain of methylene groups that is covalently bonded to the nitrogen atom of the backbone, forming a five-membered pyrrolidine ring. The ring restricts the amino acid to a narrow range of backbone conformations and can cause kinks in a peptide chain. Proline is thought to play a role in initiating some of the structural motifs regularly found in proteins, such as the α-helix. Also, once proline's secondary amine has formed a peptide bond, its nitrogen cannot act as a hydrogen bond donor because no amide hydrogen is present.
Figure 2.3. The general form of amino acids (a) written as a neutral L-amino acid (S configuration) and (b) written in the zwitterionic form.
Although amino acids differ in the makeup of their side chains, the sense of chirality is the same in virtually all naturally occurring amino acids. The four substituents on the tetrahedral carbon (amine, carboxylic acid, hydrogen, and R) are usually in the S configuration and only rarely in the R configuration (e.g., bacteria and some worms have small amounts of the R-isomers, or D-amino acids). In protein chemistry, the S-isomer is usually called the L-amino acid, while the R-isomer is called the D-amino acid. (Cysteine is the one common exception to this correspondence: because its sulfur-containing side chain outranks the carboxyl group in the Cahn-Ingold-Prelog priority rules, L-cysteine has the R configuration. Glycine, whose R group is a hydrogen atom, is not chiral.) Also, amino acids in solution at neutral pH exist mainly as zwitterions rather than as un-ionized molecules. This means that the amino group is protonated (NH3+) and the carboxyl group is deprotonated (CO2-). All of these concepts are illustrated in Figure 2.3.
Table 2.1. The Primary Naturally Occurring Amino Acids
| Name | Three letter code | One letter code | Structure ![]() |
|---|---|---|---|
| R = H | |||
| Glycine | Gly | G | ![]() |
| R = Alkyl | |||
| Alanine | Ala | A | ![]() |
| Valine | Val | V | ![]() |
| Leucine | Leu | L | ![]() |
| Isoleucine | Ile | I | ![]() |
| R = Aromatic | |||
| Phenylalanine | Phe | F | ![]() |
| Tyrosine | Tyr | Y | ![]() |
| Tryptophan | Trp | W | ![]() |
| R = Alcohol | |||
| Serine | Ser | S | ![]() |
| Threonine | Thr | T | ![]() |
| R = Thioether, Thiol | |||
| Methionine | Met | M | ![]() |
| Cysteine | Cys | C | ![]() |
| R = Carboxylic Acid | |||
| Aspartic Acid | Asp | D | ![]() |
| Glutamic Acid | Glu | E | ![]() |
| R = Amide | |||
| Asparagine | Asn | N | ![]() |
| Glutamine | Gln | Q | ![]() |
| R = Base | |||
| Lysine | Lys | K | ![]() |
| Arginine | Arg | R | ![]() |
| R = Special | |||
| Histidine | His | H | ![]() |
| Proline | Pro | P | ![]() |
The amino acids in HIV Protease (and all other proteins) are linked together in an end-to-end arrangement, and the bond linking two amino acids is called a peptide bond. The peptide bond (also called an amide bond) joins the carboxylic acid of one amino acid to the amino group of a second amino acid. When a small number of amino acids are joined together in this way, a peptide is formed. When many amino acids (often 100 or more) are joined together in this way, a polypeptide or protein such as HIV Protease is formed. Each amino acid unit in the protein is called a residue. Each of the two chains of HIV Protease is composed of 99 amino acid residues, all linked together by peptide bonds between the carboxylic acid of one residue and the amino group of the next. In a protein or peptide, the order, or sequence, of amino acids is called the primary structure. The primary structure tells you about the connectivity of the amino acids; we'll see later that secondary, tertiary, and quaternary structure describe the three-dimensional structures of proteins. The piece of HIV Protease shown in Figure 2.2b-d represents residues 53-57, and a list of all 99 amino acid residues is shown in Figure 2.4. Note that amino acids in proteins are customarily numbered starting at the amino terminus (N-terminus) and working toward the carboxy terminus (C-terminus). Further, amino acids can be named using their full name, a three-letter abbreviation, or a one-letter abbreviation, as shown in Table 2.1.
Figure 2.4. The 99 amino acid residues of HIV Protease.
(N-terminus) Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile (15)
Gly Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp (30)
Thr Val Leu Glu Glu Met Ser Leu Pro Gly Arg Trp Lys Pro Lys (45)
Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp (60)
Gln Ile Leu Ile Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val (75)
Leu Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn Leu Leu (90)
Thr Gln Ile Gly Cys Thr Leu Asn Phe (99) (C-terminus)
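Converting Figure 2.4 from three-letter to one-letter codes (Table 2.1) is a quick way to check a transcribed sequence and to pick out residues by number. This Python sketch hard-codes the 99-residue chain listed in Figure 2.4 and numbers residues from 1 at the N-terminus, as described above; `THREE_TO_ONE`, `to_one_letter`, and `residues` are hypothetical helper names, and identifying residue 25 as an active-site aspartic acid is stated here as an assumption about which of the two aspartates the text refers to.

```python
# One-letter codes for the twenty amino acids of Table 2.1
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# One HIV Protease chain as listed in Figure 2.4, N-terminus to C-terminus
FIG_2_4 = """
Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile
Gly Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp
Thr Val Leu Glu Glu Met Ser Leu Pro Gly Arg Trp Lys Pro Lys
Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp
Gln Ile Leu Ile Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val
Leu Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn Leu Leu
Thr Gln Ile Gly Cys Thr Leu Asn Phe
"""


def to_one_letter(three_letter_seq: str) -> str:
    """Join one-letter codes; raises KeyError on an unknown code (e.g. a typo)."""
    return "".join(THREE_TO_ONE[code] for code in three_letter_seq.split())


def residues(seq: str, first: int, last: int) -> str:
    """Residues first..last (inclusive), numbered from 1 at the N-terminus."""
    return seq[first - 1:last]


seq = to_one_letter(FIG_2_4)
print(len(seq))               # 99
print(residues(seq, 53, 57))  # FIKVR, the segment drawn in Figure 2.2
print(residues(seq, 25, 25))  # D, an active-site aspartic acid
```

Because `to_one_letter` fails loudly on any code it does not recognize, a misspelled residue name in a transcription is caught immediately, and the length check confirms that no residue has been dropped.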
2.2. The Carbonyl Group
There are many different functional groups present in proteins that you have already seen: aromatic groups, alkyl groups, alcohols, amines, and thiols. However, to understand proteins we will need to introduce a new fragment, the carbonyl group. We'll examine the carbonyl group in its various forms with an emphasis on amides and carboxylic acids.
Why the emphasis on carboxylic acids and amides? As mentioned above, condensing a carboxylic acid with an amine (with loss of water) generates an amide (Figure 2.5). It is the amide bond that links amino acids together to form the backbone of the protein. Therefore, to understand the structures and the physical and chemical properties of proteins such as HIV Protease, we must develop an understanding of carboxylic acids and amides.
Figure 2.5. Pictures of an amine, a carboxylic acid, and an amide.

The carbonyl group itself (pronounced car-bo-neel) is a fragment that contains a carbon atom doubly bonded to an oxygen atom (C=O). The carbonyl group is an essential component of many biologically important molecules and pharmaceutical drugs: all proteins, all fatty acids, and many pharmaceuticals contain carbonyl groups. A wide variety of types of carbonyl groups exist. These types are differentiated by the groups attached to the C of the C=O. A list of common types of carbonyl groups is given in Table 2.2. All types of carbonyl groups contain an acyl fragment (often denoted R, as in the amide example below), which may be alkyl, aryl, alkenyl, etc.
Table 2.2. Types of Carbonyl Groups. The functional groups are shown in blue; the acyl fragment is shown in black.
| Name | Formula |
|---|---|
| Aldehyde | ![]() benzaldehyde |
| Ketone | ![]() 2-butanone |
| Carboxylic acid | ![]() 2-methylpropionic acid |
| Ester | ![]() t-butyl cyclopentanecarboxylate |
| Amide | ![]() N,N-dimethylamide |
| Acid Halide | ![]() |
| Acid Anhydride | ![]() acetic anhydride |
Consider the two polymers depicted in Figure 2.6 below. Assign hybridizations to each of the backbone atoms and fill in all of the missing H's using wedged and dashed bonds. Also, draw a curved arrow on all the bonds about which rotation can occur (one is done as an example). How does the polymer that has an all carbon framework differ from the protein polymer?
Figure 2.6. a) A polymer with an all-carbon backbone and b) a polypeptide, the backbone of proteins.
Proteins are flatter than you might predict. Let's use hybridization concepts to examine proteins in more detail. As with an alkene carbon, the valence bond picture of a carbonyl carbon consists of three sigma bonds and one π bond. Because the C atom of the carbonyl is sp2 hybridized, all carbonyls are planar, with bond angles of about 120°. As shown in Figures 2.7 and 2.8, the bond angles about carbonyl carbons are close to 120°, and the C atom and the three atoms attached to it all lie in a plane. On the basis of the double bond character of the carbonyl bond, we expect a shorter and stronger C-O connection than would be seen for a simple C-O single bond. These expectations are borne out by experimental data: typical C=O bonds are considerably shorter (1.22 Å) and stronger (732 kJ/mol) than C-O bonds (1.43 Å, 385 kJ/mol).
Figure 2.7. Geometric data for compounds containing C-O and C=O bonds.





Figure 2.8. Bond lengths and angles for a piece of HIV Protease.

Another contributing factor to the short C=O bond lengths of carbonyls is the polar character of carbonyl groups. This polarity arises from the higher electronegativity of oxygen (3.5) relative to carbon (2.5). Another way of stating this using Lewis structures is to say that the ionic Lewis form makes a substantial contribution to the description of a carbonyl (eq 2.1). Recall that depicting the C=O group with two Lewis structures in resonance does not mean that the carbonyl is sometimes double bonded and sometimes dipolar. The carbonyl is always best described as a single resonance hybrid of a covalent, doubly bonded C=O and a dipolar, singly bonded C-O.
$$\mathrm{R_2C{=}O} \;\longleftrightarrow\; \mathrm{R_2\overset{+}{C}{-}\overset{-}{O}} \qquad (2.1)$$
A striking geometric property of the piece of HIV Protease that we've been discussing is that the planar geometry about the carbonyl C extends to include the amide N. Examine the three-dimensional structure of a segment of HIV Protease shown in Figure 2.9. As the shaded planes in the figure emphasize, the six atoms of each amide linkage (C-CO-NH-C) all lie in the same plane. Now compare the experimental geometry with the structures that you drew for Figure 2.6 and make any necessary corrections. Simple hybridization arguments do not lead to the correct structure, because they assign an sp3 N and therefore predict a pyramidal geometry about the N (as in amines).
Figure 2.9. The planarity of the amide.

In order to understand the extended planarity of the amide linkages in proteins, we turn once again to the concept of resonance. Using the octet rule only as a guide in drawing Lewis structures, three acceptable arrangements are possible for amides (Figure 2.10). As you have seen previously for aromatic compounds, unusual stabilities and shortened bond lengths commonly are associated with molecules having several Lewis structures in resonance. Unlike the resonance of equivalent cyclohexatriene structures used to describe benzene, the resonance structures for amides are not equivalent. Hence, we do not expect equal contributions from each resonance structure in describing the amide.
Figure 2.10. Resonance Structures for an amide.

Let's focus on the third resonance structure in which there is a formal C=N double bond and formal charges of +1 and -1 at the N and carbonyl O, respectively. If this resonance structure is a significant contributor, we expect the C-N bond length to be shorter than a typical C-N bond. Indeed, amide C-N bonds are shorter (C-N ~ 1.36 Å) than those of simple C-N single bonds (C-N ~ 1.47 Å). Also, participation of the C=N resonance structure should induce coplanarity of the (C-CO-NH-C) fragment, just as the six atoms of ethene (H2C=CH2) are coplanar. As mentioned previously, experimental structure determinations indicate that amide linkages are planar.
The experimental observation of a barrier to C-N bond rotation of about 17 kcal/mol (72 kJ/mol) for formamide provides strong evidence for a significant contribution of the Lewis structure in which the N is doubly bonded to the carbonyl carbon. In contrast to the lone pair of an amine, the N lone pair of an amide is partially delocalized into a partial C=N double bond. In later sections of this unit, we will see how this partial C=N double bond affects not only structure but also reactivity; amines and amides have very different patterns of reactivity.
The best picture of the amide is one that emphasizes the first resonance structure as the primary contributor and the third resonance structure as a minor contributor. Although the third structure is only a minor contributor, it gives the amide enough C=N character to flatten out the amide group. Why isn't this resonance structure the primary contributor? Formal charges help to rationalize its minor contribution to the net structure of amides. Drawing a C=N double bond places a formal charge of +1 at the amide N and a -1 charge at the carbonyl O, and separating charge in this way costs energy, so this structure is less stable than the neutral first structure.
For amides formed from primary amines, such as N-methylformamide (HC(=O)NHCH3), there appears to be a slight preference (ca. 5 kJ/mol) for the conformer in which the CH3 group is syn with respect to C=O. The terms syn and anti are used for carbonyl-containing compounds such as amides to denote the location of the substituent on the nitrogen: in the syn isomer the substituent is on the same side as the carbonyl O, and in the anti isomer it is on the opposite side. You can compare this to Z and E alkenes, where Z corresponds to syn and E corresponds to anti. An energy diagram describing the interconversion of the syn and anti conformers of N-methylformamide is shown in Figure 2.11. Although the barrier to rotation about an amide bond (72 kJ/mol) is less than that for rotation about a C=C double bond (>100 kJ/mol), this hindered rotation has significant effects on the structure of proteins.
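To see what a 5 kJ/mol preference means in practice, one can assume that the syn and anti conformers of N-methylformamide are at equilibrium and populated according to a two-state Boltzmann distribution. The Python sketch below makes that assumption, treats the 5 kJ/mol as the free-energy difference between the conformers, and uses a temperature of 298.15 K; `syn_fraction` is a hypothetical helper name.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def syn_fraction(delta_g_kj: float, temp_k: float = 298.15) -> float:
    """Equilibrium fraction of the syn conformer when syn lies delta_g_kj
    below anti, assuming a simple two-state Boltzmann distribution."""
    ratio = math.exp(delta_g_kj / (R_KJ * temp_k))  # [syn]/[anti]
    return ratio / (1.0 + ratio)


# N-methylformamide: syn favored by ca. 5 kJ/mol (value from the text)
print(f"syn fraction at 298 K: {syn_fraction(5.0):.2f}")
```

Under these assumptions roughly 88% of the molecules are syn at room temperature, so a "slight" preference still makes one conformer dominant. The same function can be applied to the other syn/anti energy gaps discussed in this section.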
Figure 2.11. Energy diagram and syn and anti conformers of N-methylformamide.

2.4. Resonance Considerations Apply to Carboxylic Acids and Esters
The ideas of resonance can be extended to understand the structures of the related carbonyl-containing groups, carboxylic acids and esters. The appropriate resonance structures for esters and carboxylic acids are shown in Figure 2.12. Note that the structures having C=O⁺R or C=O⁺H double bond character place formal positive charges on O atoms. Recall that oxygen is the second most electronegative element in the periodic table. Hence we expect resonance structures with a formal positive charge on O to be less important than those with a formal positive charge on N.
Figure 2.12. Resonance structures for esters and carboxylic acids (carboxylic acid, R2 = H; ester, R2 = alkyl, aryl, etc).

As in the case of the resonance structure with the C=N⁺HR linkage for amides, the resonance structure with the C=O⁺R linkage (R = H in carboxylic acids and R = alkyl, aryl, etc. in esters) is expected to be a minor contributor in esters and carboxylic acids because of the formal positive charge on the highly electronegative O atom. In carboxylic acids, the O-H bond is found to lie in the plane of the groups attached to the carbonyl carbon. If partial double bond character accounts for this geometric arrangement, then we expect a barrier to rotation about the C-OH bond. We also expect the possibility of two distinct isomers, the syn and anti configurations, as was seen for amides. Both experimental and computational results confirm these expectations. In the simplest carboxylic acid, formic acid, all atoms lie in the same plane. In general, the syn conformer is more stable than the anti by about 20 kJ/mol. Barriers to rotation about the C-OH bond are estimated to be 40 kJ/mol, significantly less than the 72 kJ/mol for rotation about the partial C=N double bond of an amide, but significant nonetheless.
Figure 2.13. Energy profile and structures of syn and anti conformers of carboxylic acids

When the hydroxyl group of a carboxylic acid is replaced with an alkoxy group, an ester is generated. Like the carboxylic acid and amide groups, the ester group is planar, with bond angles about the carbonyl C close to 120°. The O-C bond (blue in Figure 2.??) of esters also lies in the plane of the carbonyl group. As with carboxylic acids and amides, both syn and anti conformations are possible, with the syn conformation preferred by around 15-20 kJ/mol. The barrier to rotation about the ester C-OR bond is approximately 45 kJ/mol (Figure 2.14). Again, such a barrier suggests some double bond character in the C-O bond, which can be rationalized by the dipolar resonance structure shown with the usual Lewis structure for esters in Figure 2.12.
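One way to get a feel for these barriers is to convert them into rotation rates. The Python sketch below assumes the Eyring equation of transition-state theory, treats each quoted barrier as a free energy of activation, and sets the transmission coefficient to 1; these are simplifying assumptions, not a description of how the quoted values were obtained. `eyring_rate` is a hypothetical helper name.

```python
import math

KB = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_rate(barrier_kj: float, temp_k: float = 298.15) -> float:
    """First-order rate constant (1/s) from the Eyring equation,
    k = (kB*T/h) * exp(-dG/(R*T)), with the transmission coefficient set to 1."""
    return (KB * temp_k / H) * math.exp(-barrier_kj * 1000.0 / (R * temp_k))


# rotational barriers quoted in this section, kJ/mol
BARRIERS = {"carboxylic acid C-OH": 40.0, "ester C-OR": 45.0, "amide C-N": 72.0}

for bond, barrier in BARRIERS.items():
    k = eyring_rate(barrier)
    print(f"{bond}: k ~ {k:.1e} per second, half-life ~ {math.log(2) / k:.1e} s")
```

Under these assumptions the amide C-N bond rotates on the order of once per second at room temperature, while the acid and ester C-O bonds rotate roughly 10^4 to 10^6 times faster, which is one way to see why amide rotation is described as hindered.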
Figure 2.14. Energy profile and structures of syn and anti conformers of esters.

2.5. Carbonyls Have Distinctive Spectroscopic Properties
The effect of resonance on the structure and the planarity of the amide bond have been described in the above sections. These explanations seem reasonable, believable, valid. But what if you don't believe what you've been told? What if you didn't even have access to the above information and wanted to learn for yourself about the properties of carbonyl-containing compounds? What would you do?
What researchers working with HIV Protease (and, in fact, on a wide variety of other chemistry projects) do is study compounds using spectroscopic methods of instrumental analysis. For example, suppose you are given a sample of HIV Protease or some other protein or peptide and asked to determine its structure. The first things you would likely do are obtain an infrared (IR) spectrum and nuclear magnetic resonance (NMR) spectra of your compound. These spectra are shown in Figure 2.15.
Figure 2.15. a) IR, b) 13C NMR, and c) 1H NMR spectra of a peptide.
Although at first inspection these spectra look impossibly complicated, they are in fact quite useful. Focus first on the different areas of absorption in the IR spectrum. Spectroscopic features of carbonyl groups are a distinctive characteristic of compounds containing this functionality: almost all carbonyl groups exhibit very strong IR absorptions in the range of 1600-1800 cm⁻¹. These absorptions are very useful in identifying carbonyls because this region of the IR spectrum contains very few other bands. Note from the example shown in Figure 2.?? that the C=O molar absorptivities are much stronger than those of simple C=C bonds. This is not unexpected: stretching and compressing the highly polarized C=O bond produces a large oscillating dipole moment, hence the strong absorption of IR light. Table 2.4 compares the stretching frequencies of compounds containing C=C with those containing a carbonyl group. Using the information in Table 2.4, what can you say about the types of carbonyl compounds (if any) that are present in the IR spectrum given in Figure 2.15 above?
Figure 2.16. IR spectrum of 3-penten-2-one.
Table 2.4. Comparison of IR stretching frequencies of compounds containing C=O and C=C functional groups. The given frequency is for the C=O stretch (column one) or the equivalent C=C stretch (column two).
| Carbonyl compound | Alkene |
|---|---|
| ![]() | ![]() |
| ![]() | ![]() |
| ![]() | ![]() |
Carbonyl carbons give distinctive 13C NMR signals far downfield, between about 165 and 220 ppm (Table 2.5), and 1H NMR spectra of carbonyl-containing compounds are often distinctive as well. The acidic proton of carboxylic acids is found between 10 and 13 ppm, and the aldehyde CHO proton is usually found between 9 and 10.5 ppm. These and other 1H NMR chemical shifts are shown in Table 2.6. Use the 13C and 1H NMR information given in the tables to interpret the spectra in Figure 2.15. Are carbonyl groups present? If they are, can you identify any specific types of carbonyl groups?
Table 2.5. 13C NMR chemical shifts for carbonyl compounds. The given δ value is for the carbonyl carbon.
| Type of carbonyl compound | Range (ppm) |
|---|---|
| Aldehyde | 190-205 |
| Ketone | 195-220 |
| Carboxylic acid | 170-185 |
| Ester | 165-180 (-5 to -10 ppm shift relative to the carboxylic acid) |
| Amide | 165-180 |
Table 2.6. 1H NMR chemical shifts for carbonyl compounds. The given δ value is for the H shown in blue.
| Type of carbonyl compound | Range (ppm) |
|---|---|
| Aldehyde | 9-10.5 |
| Ketone | 2-3.6 |
| Carboxylic acid | 10-13 |
| Ester | 3.5-4 |
| Amide | 5-10 (often broad) |
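The ranges in Tables 2.5 and 2.6 can be turned into a simple lookup that lists which carbonyl types are consistent with an observed chemical shift. This Python sketch transcribes the table ranges as given (for ketones and esters the 1H range refers to the highlighted proton near the carbonyl, not to the carbonyl itself); the helper `candidates` and the dictionary names are hypothetical, and because the ranges overlap, the result narrows the possibilities rather than identifying a single group.

```python
# Ranges transcribed from Table 2.5 (13C, carbonyl carbon) and Table 2.6 (1H), in ppm
C13_RANGES = {
    "aldehyde": (190.0, 205.0),
    "ketone": (195.0, 220.0),
    "carboxylic acid": (170.0, 185.0),
    "ester": (165.0, 180.0),
    "amide": (165.0, 180.0),
}
H1_RANGES = {
    "aldehyde": (9.0, 10.5),
    "ketone": (2.0, 3.6),
    "carboxylic acid": (10.0, 13.0),
    "ester": (3.5, 4.0),
    "amide": (5.0, 10.0),
}


def candidates(shift_ppm: float, table: dict[str, tuple[float, float]]) -> list[str]:
    """Carbonyl types whose tabulated range contains shift_ppm (ends inclusive)."""
    return [name for name, (low, high) in table.items() if low <= shift_ppm <= high]


print(candidates(172.0, C13_RANGES))  # ['carboxylic acid', 'ester', 'amide']
print(candidates(9.7, H1_RANGES))     # ['aldehyde', 'amide']
```

A 13C signal at 172 ppm, for example, is consistent with a carboxylic acid, an ester, or an amide, so other evidence, such as an O-H or N-H signal in the 1H spectrum or the IR bands discussed above, is needed to tell them apart.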
2.6. Carboxylic Acids, Esters, and Amides Are Found in Many Common Materials
So far we have considered the occurrence of carboxylic acids, esters, and amides only in the context of protein structures. Consider the common products shown in Figure 2.17. Aspirin and ibuprofen both contain aromatic groups and carboxylic acids (aspirin also has an ester group), the non-nutritive sweetener aspartame contains both amide and ester groups, the anti-Parkinsonian drug L-DOPA is a carboxylic acid, the ester isopentyl acetate is largely responsible for the odor of bananas, and the chief organic component of vinegar is the carboxylic acid acetic acid.
Figure 2.17. Common compounds which contain various types of carbonyl groups.


2.7. Ionization of Carboxylic Acids Contributes to the Folding of Proteins.
Let's return to the ribbon structure of HIV Protease (Figure 2.1). You will notice that the molecule is folded around on itself to form a well-defined shape. What makes HIV Protease fold in this way? To address this critical issue we need to take a closer look at the protein and its functional groups. As Figure 2.18 depicts, many carboxylic acid side groups from amino acids such as aspartic acid lie near the amine side groups of amino acids such as lysine. Closer inspection reveals that the carboxylic acid groups have transferred a proton to an amine group, in the process forming a negatively charged carboxylate group (R-COO-) and a positively charged ammonium group (R-NH3+). Not surprisingly, the carboxylic acid group acts as a Brønsted acid and the amine group acts as a Brønsted base. We might expect the oppositely charged ammonium and carboxylate groups to attract each other substantially. We will soon see that the formation of these "salt bridges" plays a critical role in determining the protein's shape. But first we must ask why carboxylic acids are acidic whereas alcohols are not.
Figure 2.18 Salt Bridges in HIV Protease with a close-up of a salt bridge

2.8. Acidity is Promoted by Resonance Stabilization
Place acetic acid (CH3C(=O)OH) in water (i.e., make distilled white vinegar) and measure the solution pH with a meter or litmus paper. You will find that the solution is acidic by virtue of the following equilibrium:

$$\mathrm{CH_3COOH + H_2O \rightleftharpoons CH_3COO^- + H_3O^+}$$
The equilibrium constant for this reaction, called Ka, has the following value at 25 °C:

$$K_a = \frac{[\mathrm{CH_3COO^-}][\mathrm{H_3O^+}]}{[\mathrm{CH_3COOH}]} = 1.8 \times 10^{-5}$$
corresponding to a pKa of 4.74. This means that at pH 4.74, half of the acetic acid is in the protonated (free acid) form and half is in the deprotonated (acetate ion) form. Because the pKa of acetic acid is well below physiological pH (around pH 7.0), acetic acid is essentially entirely deprotonated under physiological conditions: the ratio of acetate to acetic acid is 10^(7.0 - 4.74), or about 180 to 1.

The pKa of acetic acid is not remarkable in and of itself; at a concentration of 0.1 M in water, acetic acid is only 1.3% dissociated. Certainly other oxyacids such as H2SO4 (pKa = -3), H3PO4 (pKa = 2.14), and HClO4 (pKa = -3) are far stronger acids. The unusual acidity of carboxylic acids is revealed only in comparison with a closely related functional group, the alcohol. Alcohols are very weak acids; the pKa of ethanol is about 16. In fact, ethanol is a weaker acid than water. The pKa values of ethanol and acetic acid differ by about 11 units, which corresponds to acid dissociation constants that differ by eleven orders of magnitude.
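The 1.3% figure for 0.1 M acetic acid follows from the Ka expression above. The Python sketch below solves that equilibrium exactly (a quadratic in x = [H3O+] = [CH3COO-]), assuming ideal behavior and neglecting the autoionization of water; `weak_acid_dissociation` is a hypothetical helper name. It also converts the pKa difference between ethanol (taken as 16, as in the text) and acetic acid into a ratio of Ka values.

```python
import math


def weak_acid_dissociation(pka: float, conc_m: float) -> tuple[float, float]:
    """Percent dissociation and pH of a weak acid HA at formal concentration conc_m.

    Solves Ka = x**2 / (conc_m - x) for x = [H3O+] = [A-], ignoring water's
    own autoionization and treating activities as concentrations."""
    ka = 10 ** (-pka)
    # positive root of x**2 + Ka*x - Ka*C = 0
    x = (-ka + math.sqrt(ka * ka + 4 * ka * conc_m)) / 2
    return 100 * x / conc_m, -math.log10(x)


percent, ph = weak_acid_dissociation(4.74, 0.1)
print(f"0.1 M acetic acid: {percent:.1f}% dissociated, pH {ph:.2f}")

# Ka ratio implied by pKa(ethanol) ~ 16 and pKa(acetic acid) = 4.74
ratio = 10 ** (16 - 4.74)
print(f"Ka(acetic acid) / Ka(ethanol) ~ {ratio:.1e}")
```

The quadratic is used instead of the common shortcut x ≈ sqrt(Ka·C); for acetic acid at 0.1 M the two agree closely because so little of the acid dissociates.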
Why is acetic acid so much more acidic than ethanol? To rationalize this effect we must consider both the reactants and the products of the acid dissociation reaction. Let us begin with the products and consider the relative stabilities of the two anions formed by loss of a proton, the ethoxide and acetate anions. In the acetate anion, the negative charge is distributed over both oxygen atoms of the carboxylate group, whereas the charge of ethoxide is mostly localized on its single oxygen. The two oxygens of carboxylate groups are identical in all respects: for example, the two C-O bond lengths determined by X-ray crystallography are the same (1.26 Å). This contrasts with the carboxylic acid, whose two C-O bond lengths are significantly different (1.21 Å for C=O and 1.36 Å for C-O in acetic acid).
A single Lewis structure does not adequately describe the acetate ion. To rationalize the observed structure of the acetate ion, we must consider two Lewis structures in resonance with one another (Figure 2.19). Alternatively, we may represent the net effect of the two resonance structures as a single averaged structure (Figure 2.20).
Figure 2.19. Lewis structures describing the acetate anion.

Figure 2.20. Single structure describing the acetate anion.

The effect of spreading the charge of the acetate anion over two oxygen atoms is to stabilize the anion relative to the ethoxide anion. This energetic effect, which is similar to the stabilization of benzene relative to non-conjugated alkenes, is referred to as resonance stabilization. We can view the energy lowering as resulting from the spreading of electron density over more atoms.
The above explanation focused on the relative stabilities of the products of proton loss. We must be careful: the energetics of a chemical reaction depend on the differences between reactant and product energies. Therefore, it is not sufficient to examine the effects of resonance on the products alone. Let us now consider the reactants, ethanol and carboxylic acids. We can propose three resonance structures for acetic acid, as shown in Figure 2.21. Note that the acid may be stabilized by the resonance structure with a formal positive charge on the O bonded to H; by electronegativity considerations we expect this structure to be a minor contributor. Nonetheless, we have already seen evidence for the participation of this third resonance structure (recall the restricted rotation about the C-OH bond).
Figure 2.21. Three resonance structures for acetic acid.

Even a small contribution of a resonance structure with a formal positive charge on the O of the O-H group will increase the acidity: by electrostatic reasoning, diminished negative charge on the O of the OH group increases the tendency of the O-H bond to release H+. An alcohol has no equivalent possibilities for resonance-enhanced acidity. In conclusion, resonance provides a rationale for the far higher acidities of carboxylic acids relative to alcohols. We can view the influence of resonance either as Coulombic destabilization of the O-H bond in the acid or as stabilization of the acetate anion via charge delocalization.
2.9 Salt Bridges Stabilize Protein Folding by Electrostatic Attraction
Returning to the structure of HIV Protease, we find that many of the carboxylate side groups (RCO2-) are closely paired with ammonium side groups (RNH3+), as shown in Figure 2.??. We expect two such oppositely charged groups to have a strong electrostatic attraction. These energies follow from Coulomb's law, E = q1q2/(4πε0r), where q1 and q2 are the charges, r is the distance between them, and ε0 is the permittivity of free space. At a distance of 2.0 Å and in the absence of any intervening atoms, the attraction of a +1 charge to a -1 charge gives an energy of about -695 kJ/mol (about -166 kcal/mol); the negative sign indicates stabilization.
Essentially, Coulomb's law states that the electrostatic energy of two charges is proportional to the product of the charges and inversely proportional to the distance between them. In other words, nearly 700 kJ/mol would be required to separate these opposite charges to an infinite distance. This energy is roughly twice the strength of a C-C covalent bond (about 350 kJ/mol). We must be cautious in these estimates, however, because the charges in a carboxylate anion and an ammonium cation are spread out over several atoms. Such delocalization decreases the magnitude of the electrostatic attraction by increasing the average separation of the charges, and intervening solvent and protein atoms reduce it further. Nevertheless, it is clear that electrostatic attractions, as well as repulsions, can supply substantial forces that affect protein structures.
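A minimal Python sketch of the point-charge form of Coulomb's law makes this arithmetic explicit. It assumes ideal point charges and CODATA values for the constants; the function name and the optional relative dielectric constant `eps_r` are hypothetical illustrations, not part of the original text (eps_r = 1 corresponds to vacuum, that is, no intervening atoms).

```python
import math

# CODATA constants in SI units
ELEMENTARY_CHARGE = 1.602176634e-19    # C
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
AVOGADRO = 6.02214076e23               # 1/mol

def coulomb_energy_kj_per_mol(q1, q2, r_angstrom, eps_r=1.0):
    """Electrostatic energy of two point charges (in units of e) a distance
    r_angstrom apart, per mole of pairs. eps_r is a hypothetical effective
    relative dielectric constant; eps_r = 1.0 means vacuum."""
    r_m = r_angstrom * 1e-10
    energy_j = (q1 * q2 * ELEMENTARY_CHARGE ** 2) / (
        4 * math.pi * VACUUM_PERMITTIVITY * eps_r * r_m)
    return energy_j * AVOGADRO / 1000.0  # J per pair -> kJ/mol

e_vac = coulomb_energy_kj_per_mol(+1, -1, 2.0)
print(f"{e_vac:.0f} kJ/mol, {e_vac / 4.184:.0f} kcal/mol")
```

For +1 and -1 charges 2.0 Å apart in vacuum this prints about -695 kJ/mol (-166 kcal/mol). Because the energy scales as 1/eps_r and 1/r, any intervening medium with eps_r > 1, or any increase in the average charge separation, weakens the attraction proportionally.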
The presence of charged groups has another, less direct influence on the folding of HIV Protease. Charged groups interact strongly with polar molecules such as water; we can characterize this as a charge-dipole interaction. Consider an ammonium group in aqueous solution. Maximum stabilization of the ammonium group occurs when it is surrounded by water molecules oriented with the negative (oxygen) ends of their dipoles pointing toward the positively charged ammonium group, as shown in Figure 2.22.
Figure 2.22. Water stabilized ammonium ion

If the protein cannot fold so as to place an ammonium group near a carboxylate, then it will be most favorable for the charged groups to be located on the water-accessible surfaces of the protein. Thus, we expect proteins to fold in such a way that charged and highly polar functional groups lie on the water-accessible exterior surfaces. Such groups are called hydrophilic (Greek for "water loving"). Similarly, nonpolar groups (the hydrophobic, or "water hating," groups with high hydrocarbon content) will tend to aggregate in the oily interior of the protein. Refer back to Table 2.1 and note that the amino acids with R = alkyl and aryl are hydrophobic, while amino acids with basic, acidic, alcohol, and thiol functional groups are hydrophilic. All of these features are revealed in the structures of HIV Protease shown below.
Figure 2.23. Color-coded structures of HIV Protease showing aggregation of polar side groups (shown in green) on the protein exterior.

The overall folding of a protein is referred to as its tertiary structure. Because ionized carboxylate and ammonium groups are critical elements in stabilizing the tertiary structure of proteins, and because these groups form through the loss or gain of a proton, we might expect the structure of a protein to depend on the pH of the solution. Indeed, this is observed: proteins commonly have a well-defined structure near physiological pH but unfold to random structures as the pH is either increased or decreased significantly. For example, as the pH is decreased, carboxylate groups are protonated to form free acids, which are uncharged and cannot engage in salt bridge formation. Conversely, as the pH is increased, ammonium groups lose a proton to form neutral amines. Once the structural driving force of the salt bridges is removed, many proteins simply unfold to a disordered state. This helps us understand why the ability of organisms such as fish to survive is so strongly affected by the pH of their environment.
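The pH dependence described above can be sketched with the Henderson-Hasselbalch relation. The pKa values below (about 4.0 for a side-chain carboxylic acid and about 10.5 for a side-chain ammonium group) are assumed typical values, not taken from the text, and real pKa values inside a folded protein can shift considerably. Treating the two ionizations as independent is also a simplifying assumption.

```python
def fraction_deprotonated(ph, pka):
    """Fraction of an acid HA present in its conjugate-base form A- at a given pH."""
    return 1.0 / (1.0 + 10 ** (pka - ph))

# Assumed typical side-chain pKa values (illustrative, not from the text)
CARBOXYL_PKA = 4.0   # RCO2H <=> RCO2- + H+
AMMONIUM_PKA = 10.5  # RNH3+ <=> RNH2 + H+

for ph in (2.0, 7.0, 12.0):
    carboxylate = fraction_deprotonated(ph, CARBOXYL_PKA)     # charged RCO2- form
    ammonium = 1.0 - fraction_deprotonated(ph, AMMONIUM_PKA)  # charged RNH3+ form
    # Crude indicator: both partners must be charged to form a salt bridge
    both_charged = carboxylate * ammonium
    print(f"pH {ph:4.1f}: RCO2- {carboxylate:.3f}  RNH3+ {ammonium:.3f}  both {both_charged:.3f}")
```

Near neutral pH both groups are almost fully charged, whereas at pH 2 only about 1% of the carboxyl groups are ionized, and at pH 12 only about 3% of the ammonium groups remain protonated.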
2.10 Carboxylic Acids are Critical for the Activity of HIV Protease
The acidity of the O-H bond in carboxylic acids plays a crucial role in the mechanism of HIV Protease. The two amino acids in the active site of HIV Protease that are responsible for cleaving the amide bond in the substrate are both aspartic acid residues; aspartic acid is a carboxylic acid (Table 2.1). In the active form of HIV Protease, one of the carboxylic acids is protonated and the other is present as the carboxylate anion. As the reaction of HIV Protease with its substrate progresses, the neutral carboxylic acid becomes deprotonated and the carboxylate anion gains a proton to become neutral. The mechanism of HIV Protease will be presented later; for now, it is sufficient to understand that the carboxylic acid functional group is ideally suited to the function of HIV Protease because the carboxylic acid is acidic and the carboxylate anion is stable.
Figure 2.24. HIV Protease with active site aspartic acid groups highlighted

In HIV Protease, the two aspartic acid residues in the active site are hydrogen bonded to a water molecule. The hydrogen bonding ability of the aspartic acids is the main force behind the reaction by which HIV Protease cleaves its substrate. In fact, when these acids are converted to esters, which lack the O-H hydrogen bond donor, HIV Protease is rendered inactive.
Figure 2.??. Close-up showing hydrogen bonding of the aspartates to water
2.11 Carboxylic Acids Form Strong Intermolecular Bonds
A distinctive feature of many carbonyl-containing compounds is that their boiling points are significantly higher than those of hydrocarbons of similar mass. The table below shows that the presence of a carbonyl group markedly affects boiling points. A closer look reveals that amides and carboxylic acids have particularly high boiling points. Although boiling points may seem to have little to do with protein structure, the physical interactions underlying these properties have much in common with those that shape proteins. We have already seen that interactions between functional groups on proteins strongly affect their structure. Boiling points are, in part, a measure of the strength of intermolecular interactions. For example, the boiling points of carboxylic acids relative to hydrocarbons reveal something about the strength of the intermolecular forces between carboxylic acid molecules.
Table 2.6. Boiling point comparisons for molecules of similar mass.
[Table 2.6 columns: Carbonyl Compound | Alkene (structure images not available)]
Simple carbonyl-containing compounds are polar by virtue of the polarity of the carbonyl group. Intermolecular dipole-dipole interactions lead to stabilizing forces, and hence higher boiling points. As illustrated below, molecules tend to align so that maximum attraction between dipoles is achieved. This occurs when neighboring molecules have their dipoles anti-parallel, with the positive end of one dipole close to the negative end of another. However, notice that acetic acid and acetamide have substantially higher boiling points than the related ketone, 2-butanone, and ester, methyl acetate. Clearly, something more than simple dipole-dipole intermolecular forces is operating in acetic acid and acetamide.
Figure 2.25. Dipole-Dipole Interactions between Acetone Molecules

Close inspection of the structure of a carboxylic acid reveals a distinctive capability for dimerization via complementary hydrogen bonding. The hydrogen bond donor is typically an H attached to O (e.g., water, alcohols, and Brønsted acids), N (amines, amides, and ammonium groups), or F (hydrofluoric acid). The hydrogen bond acceptors are lone pairs on O (e.g., water, alcohols, ethers, and all carbonyls), N (amines, C=N groups), and F (generally HF or F-). Because a carboxylic acid contains both a hydrogen bond donor (the O-H) and a hydrogen bond acceptor (the carbonyl O), two hydrogen bonds may be formed by dimerization to give a noncovalently bound eight-membered ring. Indeed, in the solid, in the liquid, and in the gas phase at moderate pressures, most carboxylic acids are found to undergo mutual hydrogen bonding (Figure 2.26).
Figure 2.26. Dimerization of acetic acid via Hydrogen Bonding.

In the solid state, a more complex mode of hydrogen bonding can also occur, leading to extended hydrogen-bonded networks as shown below. Similar arrays are commonly seen for amides as well.
Figure 2.27. Extended arrays of H-bonds in solid carboxylic acids and amides.
2.12 How Do We Know that Hydrogen Bonds Form?
Hydrogen bonds are generally identified using two sets of criteria: geometric and energetic. For carbonyl compounds, the C=O---H angle is ideally about 120°, and hydrogen bonds lying in the plane of the carbonyl group have the preferred geometry. Energetically, a hydrogen bond is favored if the equilibrium between the bonded and non-bonded systems lies toward the hydrogen-bonded system, that is, if the hydrogen-bonded system is lower in free energy. Methods of detecting hydrogen bonds usually focus on one of these two criteria.
The most common method of detecting hydrogen bonds is X-ray crystallography. As you may recall from a previous chemistry course, X-ray diffraction is a method for locating the positions of atoms in a crystalline substance. Once the positions of the atoms in the structure have been located, the A-H---B arrangement is examined, where B is a potential hydrogen bond acceptor and A-H is the donor. If the distance between H and B is significantly smaller than the sum of the van der Waals radii of the two atoms, then a hydrogen bond is likely present. For example, the van der Waals radius of O is about 1.4 Å and that of H is about 1.2 Å, giving a sum of 2.6 Å. In hydrogen-bonded molecules, noncovalent O---H separations in the range of 1.6 Å to 1.8 Å are common.
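The distance criterion described above can be expressed as a short Python check on atomic coordinates. The 2.6 Å van der Waals sum comes from the text; the extra requirement that the A-H---B angle be at least 120° is an assumed, commonly used cutoff added for illustration, and the coordinates in the example are hypothetical rather than taken from the HIV Protease structure.

```python
import math  # math.dist and multi-argument math.hypot need Python 3.8+

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error

def is_hydrogen_bond(donor, hydrogen, acceptor, vdw_sum=2.6, min_angle=120.0):
    """Geometric test for A-H---B: the H---B distance is shorter than the van der
    Waals sum (H 1.2 A + O 1.4 A by default) and the A-H---B angle is near linear
    (the angle cutoff is an assumption)."""
    h_to_b = math.dist(hydrogen, acceptor)
    return h_to_b < vdw_sum and angle_deg(donor, hydrogen, acceptor) >= min_angle

# Hypothetical coordinates in angstroms: O-H donor along x
donor_o = (0.0, 0.0, 0.0)
h = (0.97, 0.0, 0.0)
print(is_hydrogen_bond(donor_o, h, (2.72, 0.0, 0.0)))  # H---O 1.75 A, linear
print(is_hydrogen_bond(donor_o, h, (0.97, 3.0, 0.0)))  # H---O 3.0 A, too far
```

The first call prints True and the second False. An acceptor placed close to the H but off to the side (for example at (0.0, 1.5, 0.0)) passes the distance test yet fails the angle test, which is why geometric analyses usually check both.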
Vibrational spectroscopy (infrared or Raman) is a direct method for detecting hydrogen bonds. In infrared spectroscopy, the A-H (OH or NH) stretch undergoes several key changes upon hydrogen bond formation. First, the position of the maximum absorbance shifts to lower wavenumbers: the non-hydrogen-bonded OH or NH peak is at about 3460 cm-1, while the hydrogen-bonded OH or NH peak is shifted to about 3320 cm-1. Second, the absorbance band is broader for the hydrogen-bonded compound than for non-hydrogen-bonded material. Lastly, the intensity of the absorbance increases. This is because the intensity is proportional to the square of the change in dipole moment upon stretching. Because the hydrogen-bonded H is more polarized, there is a greater change in the dipole moment upon stretching, and the peak is larger. Figure 2.?? shows the NH/OH region of the IR spectrum of the peptide presented in Figure 2.??. Assign the peaks for hydrogen-bonded and non-hydrogen-bonded NH's in the peptide and show possible hydrogen bonding interactions.
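One way to read the shift from about 3460 to about 3320 cm-1 is through a harmonic-oscillator model of the A-H stretch, in which the wavenumber is proportional to the square root of the bond force constant divided by the reduced mass. The sketch below assumes an isolated, harmonic O-H oscillator (real O-H stretches are anharmonic and coupled to their surroundings), so its numbers are only a rough indication that hydrogen bonding weakens the O-H bond.

```python
import math

C_CM_PER_S = 2.99792458e10   # speed of light in cm/s
AMU_KG = 1.66053906660e-27   # atomic mass unit in kg

def force_constant(wavenumber_cm, m1_amu, m2_amu):
    """Harmonic force constant (N/m) from wavenumber = sqrt(k/mu) / (2*pi*c)."""
    mu = m1_amu * m2_amu / (m1_amu + m2_amu) * AMU_KG   # reduced mass in kg
    omega = 2 * math.pi * C_CM_PER_S * wavenumber_cm    # angular frequency, rad/s
    return mu * omega ** 2

k_free = force_constant(3460, 15.999, 1.008)    # non-hydrogen-bonded O-H
k_bonded = force_constant(3320, 15.999, 1.008)  # hydrogen-bonded O-H
print(f"k free: {k_free:.0f} N/m, k H-bonded: {k_bonded:.0f} N/m, "
      f"ratio {k_bonded / k_free:.3f}")
```

Because the reduced mass cancels, the ratio is simply (3320/3460)^2, about 0.92; in this model the hydrogen-bonded O-H bond has a force constant roughly 8% smaller than the free one.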
Nuclear magnetic resonance (NMR) spectroscopy and calorimetry have also been used to detect hydrogen bonds. Changes in chemical shift in the 1H NMR spectra of some systems can be attributed to hydrogen bonding. The evolution of heat (measured by calorimetry) when hydrogen bonding partners are mixed has also been correlated with hydrogen bond formation.
2.13 Helices and Sheets in HIV Protease
Let's return to our ribbon structures of HIV Protease. You may notice that some of the ribbon sections are coiled like springs. Less obvious, but present nonetheless, are regions in which the ribbons run parallel to one another. These structural motifs, which are called the [[alpha]]-helix and the [[beta]]-pleated sheet, are highlighted in the structures shown below. The twisting and aligning of a strand of amino acids to form [[alpha]]-helices and [[beta]]-pleated sheets is called the secondary structure of a protein.
Figure 2.28. Structures of HIV Protease with the regions of [[alpha]]-helix colored blue and the [[beta]]-pleated sheet regions colored red.

Such well-defined structures are not likely to occur randomly; what forces form the helical and sheet regions of proteins? Linus Pauling first pointed out in the early 1950s that such regions could result from intramolecular hydrogen bonding in proteins. Proteins are polyamides (also called polypeptides). Because amides have both a hydrogen bond donor (the N-H bond) and an acceptor (the carbonyl O), everything needed to form many hydrogen bonds is present in a single peptide strand. A closer look at the helical and sheet regions of HIV Protease reveals the networks of hydrogen bonds that hold these structures together. Although each hydrogen bond yields just 12-20 kJ/mol of stabilization, the presence of many of these interactions provides a strong driving force for these strands to form local regions of helical and sheet structure.
Coiling into an [[alpha]]-helix is driven by the formation of hydrogen bonds between amides on neighboring turns of the coil. For most helical proteins, hydrogen bond stabilization is maximized and repulsive steric interactions are minimized when the carbonyl of the nth amino acid is the acceptor for the N-H bond of the (n+4)th amino acid. This hydrogen bonding pattern leads to helices with about 3.6 amino acids per turn, in which successive turns are separated by about 5.4 Å.
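The n to n+4 hydrogen bonding pattern can be enumerated directly. This sketch lists the hydrogen bonds in an idealized [[alpha]]-helix and multiplies their number by the 12-20 kJ/mol per hydrogen bond quoted in this section; it ignores entropy, solvent, and end effects. The function name and its optional `prolines` argument are hypothetical: `prolines` drops any hydrogen bond whose N-H donor would be a proline, since proline's amide nitrogen carries no H.

```python
def helix_hbonds(n_residues, prolines=()):
    """(acceptor, donor) residue pairs, numbered from 1, for an ideal alpha-helix
    in which the C=O of residue i accepts a hydrogen bond from the N-H of
    residue i+4. Pairs whose donor is a proline (no N-H) are omitted."""
    return [(i, i + 4) for i in range(1, n_residues - 3) if (i + 4) not in prolines]

bonds = helix_hbonds(12)
low, high = 12 * len(bonds), 20 * len(bonds)  # kJ/mol, at 12-20 kJ/mol per H-bond
print(f"{len(bonds)} hydrogen bonds, roughly {low}-{high} kJ/mol of stabilization")
print(helix_hbonds(12, prolines={6}))  # a proline at position 6 removes the (2, 6) bond
```

For a 12-residue helix this gives 8 hydrogen bonds, or roughly 96-160 kJ/mol, illustrating how many individually weak interactions add up.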
The [[alpha]]-helix is like a spiral staircase without a center pole. Each amino acid is a step, and hydrogen bonds connect it weakly to the amino acids four steps above and four steps below. Although each hydrogen bond is weak and the formation of such an ordered structure is entropically disfavored, there are enough hydrogen bonds to make the coiled structure of the [[alpha]]-helix a very common structural motif.
Figure 2.29. Close up of [[alpha]]-helix in HIV Protease showing hydrogen bonding.

The [[beta]]-pleated sheet is also stabilized by hydrogen bonding. In the [[beta]]-pleated sheet, hydrogen bonds form between sections of protein strand that lie side by side, running either in the same direction (parallel) or in opposite directions (antiparallel). The hydrogen bonds are arranged like the ties of a railroad track: the rails correspond to the adjacent sections of the protein strands, and the ties are the hydrogen bonds made by the amide functional groups.
Figure 2.30. Close up of [[beta]]-pleated sheet in HIV Protease showing hydrogen bonding.

2.14 Does the Nature of the Amino Acids Influence [[alpha]]-Helix and [[beta]]-Pleated Sheet Formation?
As you examine the structure of HIV Protease, you will notice that not all of the protein has a well-defined secondary structure. Why, for example, doesn't the entire protein coil into an [[alpha]]-helix or loop back on itself to form a large sheet structure? The influences on protein structure are diverse. We have already seen that salt bridges, hydrophobicity/hydrophilicity, and hydrogen bonding all act upon the structure of a protein. Because different amino acids have side groups with different hydrophobicities and hydrophilicities, different charges, and different hydrogen bonding capabilities, we expect the structure of local regions of a protein to vary with the amino acid composition. It is no surprise that just twenty common amino acids can give rise to so many proteins with such varied properties: there are virtually unlimited ways of combining different amino acids. However, in terms of understanding why a protein adopts a particular structure, this diversity of influences makes the problem nearly intractable. Nonetheless, we can gain insight by examining one of the special amino acids, proline, in more detail.
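A quick calculation shows just how large the space of possible sequences is. This sketch assumes the twenty standard amino acids and a strand length of 99 residues (the length of each HIV Protease strand); it only counts possible sequences and says nothing about which of them would fold.

```python
import math

N_AMINO_ACIDS = 20  # standard amino acids (assumed)
LENGTH = 99         # residues per HIV Protease strand

n_sequences = N_AMINO_ACIDS ** LENGTH  # exact integer count of possible sequences
print(f"about 10^{math.log10(n_sequences):.1f} possible sequences")
```

This prints about 10^128.8, a number far larger than the count of proteins any organism could ever make, which is one way to see why sequence alone offers such enormous structural variety.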
The cyclic structure of proline strongly influences the secondary structures of proteins. Because the amino group of proline is a secondary amine, formation of a peptide bond yields an amide that has no N-H hydrogen bond donor. Furthermore, the geometric constraints imposed by the five-membered ring prevent a protein strand with a proline in the middle from adopting a fully extended, linear structure (Figure 2.31). The kink in the protein strand shown in b) results from these constraints.
Figure 2.31. A strand of protein in the a) linear, fully extended conformation, and b) the same strand with a proline in the middle.
a)

b)

Nature takes advantage of the special geometric features of the amino acid proline. Recall that forming a [[beta]]-pleated sheet from a single strand requires the strand to loop back upon itself; such regions are called [[beta]]-turns. Now look at the kink induced by proline in the strand of protein depicted above. The constrained geometry of the proline residue initiates turning of a protein strand. Not surprisingly, analysis of the structures of many proteins reveals that the turn regions of [[beta]]-pleated sheets frequently contain proline residues.
Turn regions of [[beta]]-sheets often lie on the exterior of proteins, where they are exposed to water. It is therefore not surprising that [[beta]]-turn regions often contain polar amino acid residues.
The "kinking" of a protein strand induced by proline can destabilize [[alpha]]-helix formation. One can imagine forming an [[alpha]]-helix by sketching a fully extended chain of amino acids on a piece of paper or an overhead transparency and then rolling the paper into a tube so that the strand forms a helix. If the protein strand is kinked by the presence of a proline residue in the middle, then the amino acids will not be able to form the arrangement of hydrogen bonds needed to stabilize the [[alpha]]-helix; moreover, proline itself lacks the N-H needed to donate one of those hydrogen bonds. Thus, proline is rarely found beyond the first three residues of an [[alpha]]-helix.
2.15 HIV Protease Comprises Two Strands of Amino Acids
HIV Protease consists of two protein strands with no covalent connections between them. The two strands are identical in amino acid composition and sequence. The superstructure that results when more than one peptide chain comes together is called quaternary structure, and the individual strands are called subunits (or segments) of the protein. For example, the oxygen transport protein hemoglobin has four subunits. The subunits of HIV Protease are held together by the cumulative effects of the same weak intermolecular interactions that cause proteins to fold, twist, and turn: hydrogen bonds, salt bridges, and hydrophobic/hydrophilic effects. In HIV Protease, the main interactions holding the strands together appear to be hydrophobic effects. By coming together in the form of a dimer, the strands appear to minimize the exposure of hydrophobic regions to water. Hence, the regions in which the two strands come closest together are rich in nonpolar residues.
2.16 How Do We Know the Structures of Proteins?
So far we have assumed the structure of HIV Protease and used that structure to reveal some of the structural features of carboxylic acids, amides, and esters. But how do we know that this structure is a good representation of the protein? You may be aware that a common technique for elucidating the three-dimensional structures of molecules is X-ray crystallography. In this method, the diffraction of X-rays by planes of atoms arrayed in an orderly fashion in single crystals is analyzed to determine the positions of the atoms. This is a popular and rapid technique for small molecules but is much more difficult for proteins because of their size and the difficulty of crystallizing them. Furthermore, one may be concerned that, in packing the protein molecules to form a crystal, the molecules have deformed from their solution structures. Recent technological developments have provided another method for characterizing the three-dimensional structure of proteins: NMR spectroscopy.
Let us begin our discussion of structure determination by NMR by distinguishing three-dimensional structures from topological structures. The most common application of NMR spectroscopy is to determine the topological structure, i.e., the nature of the functional groups and how they are connected together. Consider the 1H NMR spectrum of N-methyl formamide.
Figure 2.32. Proton NMR Spectrum of N-methyl formamide

The appearance of a doublet at high field, a broadened quartet of doublets at lower field, and a doublet at very low field, integrating to a 3:1:1 ratio of areas, is consistent with N-methyl formamide. That is, the combination of coupling, intensities, and chemical shifts indicates that a methyl group is coupled to the N-H proton, and that the N-H proton is coupled to both the methyl group and a proton attached to the carbonyl carbon. What this NMR spectrum does not tell us is the three-dimensional structure. Is the methyl group syn or anti to the carbonyl? Is the amide group planar?
A variant of normal NMR spectroscopy, called Nuclear Overhauser Effect SpectroscopY (NOESY), provides distance-related information that permits us to answer such questions. The NOESY spectrum is depicted below; note how different it looks from the NMR spectra that you have seen thus far.
Figure 2.33 2D-NOESY Spectrum of N-methyl formamide

The NOESY data shown above constitute a 2D NMR spectrum. This contrasts with normal NMR spectra, which plot intensity vs. chemical shift. In 1D spectroscopy, the indicated dimension is chemical shift; the intensity dimension is implied. 2D NOESY results contain three dimensions of information: intensity in one dimension and chemical shift in the remaining two. Depicting three-dimensional information in two dimensions (such as on a computer monitor or a piece of paper) requires a projection technique. Two common techniques, relief maps and contour maps, are illustrated below.
Figure 2.34. Relief and Contour Maps of Mountain Peaks.


The principal advantage of the contour map is that none of the surface features are hidden. In the 2D NOESY spectra, the contour lines represent intensity information projected onto a plane containing chemical shift information.
The off-diagonal peaks in 2D NOESY spectra reveal information about the distances between protons. Notice how the 2D NMR spectrum of N-methyl formamide shows substantial symmetry. The set of peaks running along the diagonal from the lower left corner to the upper right corner is essentially the normal 1D spectrum. In Figure 2.33, these peaks have also been projected along both the vertical and horizontal axes; these projections contain the same information as the 1D spectrum. The off-diagonal peaks are arrayed symmetrically about the diagonal. The intensity of an off-diagonal peak grows rapidly as the distance between the protons involved shrinks; to a first approximation, it is inversely proportional to the sixth power of that distance. Let's see how this works by analyzing the 2D NOESY spectrum of N-methyl formamide in detail.
First, examine the off-diagonal peak that connects the methyl group to the amide N-H proton. These H's are close together in space, so we expect and observe a large off-diagonal peak (indicated by several contour lines). We expect smaller off-diagonal peaks connecting the formyl (H-C=O) proton with the amide N-H and methyl groups because these distances are greater. However, notice that the off-diagonal peak between the formyl proton and the N-H proton is much larger than that between the formyl proton and the methyl group. This suggests that the amide exists predominantly with the methyl group syn to the carbonyl oxygen. Quantitative analysis of the NOE intensities confirms this assignment.
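The distance dependence of NOE cross-peaks can be turned into rough distance estimates. Under the isolated spin-pair approximation, the cross-peak intensity I is proportional to r^-6, so an unknown distance can be estimated from a reference pair of known separation: r = r_ref (I_ref / I)^(1/6). The function name, the reference distance, and the intensities below are hypothetical values chosen for illustration, not measurements from Figure 2.33.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance (angstroms) from an NOE cross-peak
    intensity, assuming intensity is proportional to r**-6."""
    return ref_distance * (ref_intensity / intensity) ** (1 / 6)

# Hypothetical calibration: a proton pair 2.5 A apart gives relative intensity 1.0
for rel in (1.0, 0.25, 1 / 64):
    print(f"relative intensity {rel:.4f} -> about {noe_distance(rel, 1.0, 2.5):.2f} A")
```

Because of the sixth-power dependence, a cross-peak 64 times weaker than the reference corresponds to only a doubling of the distance (here to 5 Å). This steep falloff is consistent with the rule of thumb that NOESY cross-peaks are seldom observed for protons more than about 5 Å apart.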
Why use 2D NMR spectra? Imagine that you have obtained a 1H NMR spectrum of HIV Protease. Except for the N-terminal residue and any proline residues (which lack an amide N-H), each amino acid will exhibit an amide N-H resonance, leading to over 90 peaks in the amide region. The result is a very crowded, highly overlapped spectrum. The advantage of 2D NMR spectroscopy is that the information is spread across two dimensions rather than one. Whereas it is often impossible to separate out the contributions made by individual resonances in the 1D spectra of proteins, individual assignments can often be made in 2D spectra.
Now let's look at the 2D NOESY data of a small dipeptide, N-acetyl-(L)-Pro-(D)-Ala-NHMe (Figure 2.35). Essentially, we have taken the dipeptide of L-proline and D-alanine (the unnatural enantiomer), capped the N-terminus with an acetyl (Me-CO-) group, and converted the C-terminus into an amide of methylamine. Here there are enough peaks that the advantages of the 2D spectrum are clearer: off-diagonal peaks are clearly resolved and easily quantified. Does the NOESY data support a fully extended or a folded conformation of the small peptide? Structural depictions of fully extended and folded conformations are shown below. Note how the folded conformation is enforced by formation of a hydrogen bond from the N-H of the C-terminal amide to the carbonyl O of the acetyl group that caps the N-terminus. This folding pattern is essentially that of a [[beta]]-turn in the [[beta]]-sheet structures of larger proteins. We can detect the presence of a folded conformation for this special dipeptide from the off-diagonal peak that connects the NHMe proton to the [[alpha]]-CH of the proline residue. As a general rule of thumb, the appearance of a NOESY cross-peak indicates that the protons are closer together than about 5 Å. The only conformation that brings the NHMe proton close to the proline ring protons is the one in which a hydrogen bond causes the structure to fold.
Figure 2.35. 2D-NOESY spectrum of N-acetyl-(L)-Pro-(D)-Ala-NHMe.
2.17. The Structure of HIV Protease and the Design of Inhibitors
The structure of HIV Protease to which we have been referring was determined by X-ray crystallography. Overall, this structure exhibits four levels of structure: primary structure (the sequence of amino acids), secondary structure ([[beta]]-pleated sheets and [[alpha]]-helices), tertiary structure (the overall folding enforced by salt bridges, hydrogen bonds, and hydrophobic/hydrophilic effects), and quaternary structure (two strands held together by non-covalent forces).
Like other carbonyl compounds, carboxylic acids may function not only as acids but also as bases, via attachment of a proton to one of the lone pairs of the carbonyl oxygen. In contrast to the lone pairs of amines, the lone pairs of carbonyl groups are very weak bases. In other words, protonated carbonyl groups are strong acids, with pKa values below -5 for most carbonyls. Compare such values with the pKa of protonated water (H3O+): pKa = -1.7. Nonetheless, protonation of the carbonyl group is an important step in many reactions of carbonyl compounds.
In principle, protonation of a carboxylic acid functional group may take place at either the C=O or the C-O oxygens. We can understand which site of protonation is likely to be favored by considering the important resonance structures for the carboxylic acid. These structures are shown below.
Figure 2.??. Three resonance structures of carboxylic acids (Streitwieser, p. 521)
Note that these resonance structures place partial negative charge on the C=O oxygen and partial positive charge on the -OH group. Simple electrostatic arguments favor placement of a proton on the C=O (negative) oxygen rather than the -OH (partially positive) oxygen atom. Hence, we expect that protonation will occur at a lone pair of the C=O.
2.3. Esters: Simple Derivatives of Carboxylic Acids
The presence of substantial negative charge on the oxygen atom of the ester carbonyl suggests that esters, like carboxylic acids, can function as bases. Esters can be protonated and the pKa's of the protonated esters are similar to those of protonated carboxylic acids (pKa's ca. -6).
One way to imagine synthesizing an ester is to simply react a carboxylic acid with an alcohol. The equilibrium between the carboxylic acid and ester functions is depicted below. As drawn, the left-to-right progression of the reaction is referred to as an esterification reaction; the reverse process is called a hydrolysis reaction. The equilibrium constant for this process has a value not far from unity (for the esterification of acetic acid by ethanol, an equimolar mixture of acetic acid and ethanol reaches equilibrium at 65% ethyl acetate and 35% acetic acid). Therefore, in principle one could synthesize ethyl acetate from acetic acid simply by placing the acetic acid in a large excess of ethanol. However, for reasons that will be discussed more fully in a later section, the equilibrium is found to be established too sluggishly for practical synthesis via this route. In a practical esterification reaction, a strong acid such as HCl is often added to speed the reaction through the catalytic effect of strong acids. Recall that carbonyls are weak bases that can be protonated by strong acids such as sulfuric acid. Carboxylic acids themselves are not acidic enough to protonate the weakly basic carbonyls to a significant extent, and even strong acids such as sulfuric acid generate only low concentrations of protonated carbonyl groups. Although these concentrations are low, the protonated intermediates are very reactive. Thus, the addition of catalytic amounts of strong acids tremendously accelerates the esterification reaction. We will need to return to acid catalysis of esterification/hydrolysis reactions when we discuss the mechanism of HIV Protease catalyzed hydrolysis of proteins.
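The claim that a large excess of ethanol drives esterification toward the ester can be checked with a short equilibrium calculation. This sketch assumes ideal behavior (the equilibrium expressed in mole amounts, which works here because the total number of moles does not change), reads the 65%/35% figure as 65% conversion of acetic acid starting from a 1:1 mixture, and uses hypothetical helper names.

```python
import math

def k_from_equimolar_yield(x):
    """K = [ester][water]/([acid][alcohol]) for a 1:1 start that reaches
    fraction x of the acid converted to ester."""
    return x ** 2 / (1 - x) ** 2

def ester_yield(K, alcohol_equiv):
    """Fraction of 1 mol acid converted to ester with `alcohol_equiv` mol alcohol
    (no ester or water initially). Solves x**2 = K*(1 - x)*(n - x) for 0 < x < 1."""
    n = alcohol_equiv
    a, b, c = 1 - K, K * (1 + n), -K * n
    if abs(a) < 1e-12:                # K = 1: the equation becomes linear
        return -c / b
    root_disc = math.sqrt(b * b - 4 * a * c)
    roots = ((-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a))
    return next(x for x in roots if 0 < x < 1)  # exactly one root lies in (0, 1)

K = k_from_equimolar_yield(0.65)  # about 3.4
for n in (1, 10):
    print(f"{n:>2} equiv ethanol: {ester_yield(K, n):.0%} ethyl acetate")
```

With a 1:1 mixture the calculation recovers the 65% figure, and with a tenfold excess of ethanol the equilibrium conversion rises to about 97%. The calculation says nothing about how quickly equilibrium is reached, which is where acid catalysis comes in.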
2.4. Amides: Derivatives of Carboxylic Acids and the Backbone of Proteins
More support for the importance of partial double bond character in the amide C-N linkage comes from an examination of the acid-base character of amides. Ammonia is a very weak acid; the pKa is about 34 for the loss of a proton from ammonia to form NH2-. This very weak acidity is approximately equal to that of the methylene group of diphenylmethane, Ph2CH2. Now contrast the pKa of NH3 with that of acetamide. The dissociation of H+ from acetamide, CH3CONH2, has a pKa of about 25, an increase in acidity of nine orders of magnitude relative to ammonia. As with carboxylic acids, resonance stabilization of the negative charge of CH3CONH- rationalizes the higher acidity of the amide.
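The phrase "nine orders of magnitude" follows directly from the pKa values above, and the same difference can be expressed as a free-energy gap using ΔG° = 2.303RT ΔpKa. This sketch uses the approximate pKa values quoted in the text (34 for ammonia, 25 for acetamide) and assumes a temperature of 298 K; the function name is a hypothetical helper.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)

def acidity_gap(pka_weaker, pka_stronger, temperature=298.15):
    """Return (ratio of acid dissociation constants, free-energy difference in kJ/mol)."""
    delta_pka = pka_weaker - pka_stronger
    ka_ratio = 10 ** delta_pka
    delta_g = math.log(10) * R_KJ * temperature * delta_pka  # 2.303 RT per pKa unit
    return ka_ratio, delta_g

ratio, dg = acidity_gap(34, 25)  # ammonia vs. acetamide
print(f"Ka ratio {ratio:.0e}, about {dg:.0f} kJ/mol")
```

This prints a Ka ratio of 1e+09, corresponding to roughly 51 kJ/mol; each pKa unit is worth about 5.7 kJ/mol at room temperature.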
Figure 2.??. Resonance Structures of Amidate Anion

We have also seen that the basicity of carbonyl-containing groups, that is, their tendency to be protonated, is influenced by resonance interactions. First, let's compare the basicity of the amide N with that of ammonia. As shown in the resonance structures for amides, the partial double bond character of the C-N bond imbues the N with a partial positive charge. Accordingly, we might expect electrostatic repulsion to disfavor addition of H+ to the N. Furthermore, protonation of the amide N would convert the N to sp3 hybridization, and in the process the p-orbital needed for resonance stabilization of the [[pi]]-system would be lost. Both electrostatic and resonance arguments lead one to conclude that acetamide is a weaker base than ammonia. Experiment confirms this: the basicity of the amide is nine orders of magnitude less than that of ammonia.
Given the destabilizing influence of protonation at the N of amides, we might wonder whether the N is the actual site of protonation. Because the amide is unsymmetrical, protonation may occur at either the N or the O. Examine the resonance structures of the amide again. Resonance places a partial negative charge on the amide O. Furthermore, protonation of one of the O lone pairs does not force a rehybridization of the amide that would lead to a loss of resonance. Indeed, amides are protonated at the O rather than at the N. Let us emphasize that although carbonyl groups may be protonated, they are not strong bases; generally, carbonyl groups are weaker bases than water itself. Nonetheless, as we saw for the acid-catalyzed esterification of carboxylic acids, protonation at the carbonyl group may dramatically lower activation barriers for reactions at carbonyls.
2.5. What are proteins?
Re-examine the structural depiction of HIV Protease. The ribbon structure of HIV Protease implies certain structural features of proteins. The use of ribbons suggests that the molecule consists of strings of atoms. Also notice that the ribbon appears to have local regions of order, such as helical coils. Furthermore, there are two ribbons that make up the HIV Protease molecule. Naturally, this type of structural depiction raises certain questions. What are the chains of atoms, and how are they connected? What makes HIV Protease adopt a structure so different from those of other proteins of similar size? What forces give rise to local structural elements such as helical coiling? How is the overall folding of the molecule determined, and why do two strands come together? In short, can a biological structure as complex as HIV Protease be understood as the result of the atomic-scale interactions that constitute the chemist's normal perspective? If we accept that the HIV Protease structure is typical rather than exceptional, the answers to these questions reveal many general features of protein structure. Let us begin with a discussion of proteins and their roles in living systems.
Proteins are macromolecules that play critical roles in almost all biological functions. For example, when we cut ourselves, a group of proteins called coagulation factors triggers the clotting of the wound. Hemophilia, one of the most familiar bleeding disorders, results when one of the coagulation factors is missing or has significantly reduced activity. Hemophiliacs can bleed excessively from minor injuries because their blood clots form slowly. Recently, hemophiliacs have benefited from recombinant DNA technology, which allows the needed coagulation factor to be produced in high purity. Before then, blood transfusions supplied the patient with the needed coagulation factor. Such frequent transfusions carried the risk of infection by HIV or hepatitis viruses, although the strict blood screening begun in 1985 has made the blood supply far safer.<p>
Blood clotting is only one example of the far-reaching roles that proteins play in biological systems. Some other examples are given in Table 3.1 below:
<b>Table 3.1.</b> Protein classes<p>
<center>
<table border=5 cellpadding=5 cellspacing=5>
<tr><th>Class of Protein</th><th>Function</th><th>Examples</th></tr>
<tr><td>Antibodies</td><td>Immune system</td><td>Antibodies for viral and bacterial infections</td></tr>
<tr><td>Contractile</td><td>Motion</td><td>Actin, myosin</td></tr>
<tr><td>Enzymes</td><td>Catalysts</td><td>HIV-1 protease</td></tr>
<tr><td>Hormones</td><td>Biological regulation</td><td>Insulin</td></tr>
<tr><td>Storage</td><td>Storage of amino acids</td><td>Casein</td></tr>
<tr><td>Structural</td><td>Support (cells, tissues, etc.)</td><td>Collagen, elastin, keratin</td></tr>
<tr><td>Transport</td><td>Carriers of other molecules</td><td>Hemoglobin, myoglobin</td></tr>
</table>
</center><p>
<p>
2.6. Proteins Are Polymers of Amino Acids that Are Joined together by Peptide Bonds
A closer look at the composition of proteins reveals repeating fragments.
The bond that links these fragments together is the C-N bond of the amide functional group.
Hydrolysis (the addition of water across the C-N amide linkages of proteins) yields the constituent amino acids.
So, in order to understand how proteins are built from amino acids, or the reverse process of cutting large proteins into smaller ones that is the primary function of the enzyme HIV Protease, we must first explore the nature of carboxylic acids and amides more thoroughly.
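The bookkeeping behind hydrolysis can be made concrete with a short calculation. The Python sketch below is a minimal illustration, not a standard tool. It treats a peptide as a string of one-letter amino acid codes: a chain of n residues holds n - 1 amide bonds, so complete hydrolysis consumes n - 1 molecules of water. The function names are hypothetical. The mass table covers only four amino acids, using standard monoisotopic residue masses in daltons. The final check confirms that the peptide plus the water it consumes weighs the same as the free amino acids released.

```python
# Monoisotopic residue masses in daltons: the mass of one -NH-CHR-C(=O)-
# repeating unit for each amino acid (one-letter codes). Partial table only.
RESIDUE_MASS = {
    "G": 57.02146,   # glycine
    "A": 71.03711,   # alanine
    "S": 87.03203,   # serine
    "L": 113.08406,  # leucine
}
WATER = 18.01056  # monoisotopic mass of H2O


def peptide_bonds(sequence: str) -> int:
    """A linear chain of n residues contains n - 1 amide (peptide) bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence: str) -> float:
    """Sum of the repeating units plus one H2O for the free N- and C-termini."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def hydrolyze(sequence: str):
    """Complete hydrolysis: one water is added across each amide C-N bond,
    releasing each residue as a free amino acid (residue mass + H2O)."""
    waters_consumed = peptide_bonds(sequence)
    amino_acid_masses = [RESIDUE_MASS[aa] + WATER for aa in sequence]
    return waters_consumed, amino_acid_masses


waters, masses = hydrolyze("GAS")
print(peptide_bonds("GAS"), waters)  # 2 2
# Mass balance: peptide + consumed water equals the sum of free amino acids
print(peptide_mass("GAS") + waters * WATER, sum(masses))
```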
<p>
A protein of special interest to AIDS researchers is HIV-1 protease. Proteases are enzymes that cleave other proteins at specific peptide bonds, often activating some biological process. HIV-1 protease cleaves a polyprotein precursor into the structural proteins that HIV uses during its life cycle, and hence it has become a major target of AIDS research. Figure 3.2 below details the life cycle of HIV and the role of HIV-1 protease. <p>
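The idea of cutting a polyprotein at specific peptide bonds can be sketched in a few lines of Python. This is a simplification under stated assumptions. The sequence and the cleavage sites are hypothetical, and each site is treated as an exact string written "LEFT/RIGHT", with the scissile bond at the slash. Real proteases such as HIV-1 protease recognize their substrates through the shape and chemistry of several residues on either side of the bond, not through an exact text match.

```python
def cleave(sequence: str, sites: list[str]) -> list[str]:
    """Cut a protein sequence at every occurrence of each site.

    Each site is written "LEFT/RIGHT"; the scissile amide bond lies between
    the last residue of LEFT and the first residue of RIGHT.
    """
    cuts = set()
    for site in sites:
        left, right = site.split("/")
        motif = left + right
        start = sequence.find(motif)
        while start != -1:
            cuts.add(start + len(left))  # index of the first residue after the cut
            start = sequence.find(motif, start + 1)

    fragments, previous = [], 0
    for cut in sorted(cuts):
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# Hypothetical "polyprotein" and hypothetical cleavage sites, for illustration only.
print(cleave("AAAANYPIVVVVNFPQLLLL", ["NY/PI", "NF/PQ"]))
# ['AAAANY', 'PIVVVVNF', 'PQLLLL']
```

For the hypothetical sequence above, the two sites produce three fragments, much as one polyprotein yields several separate proteins.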
<p>
<b>Figure 3.2</b> HIV life cycle<p>
<p>
Researchers hope that an inhibitor of HIV protease can be developed that may lead to a cure for AIDS. Understanding the structure of a protein is necessary for studying its mechanism. Fortunately, major advances in spectroscopic and diffraction techniques accompanied the development of biochemistry, and protein structures are now determined frequently.<p>
<p>
<b>How do we know the structure of proteins?</b><p>
<p>
Nuclear magnetic resonance (NMR) techniques and x-ray crystallography techniques are most commonly used to determine protein structures. Below are structures of proteins which have been determined by NMR methods and by x-ray crystallography.<p>
<p>
<b>Figure 3.3</b> Structures of proteins determined by a) NMR of something b) x-ray diffraction of myoglobin<p>
<p>
When the first protein structure, that of myoglobin, was determined by x-ray crystallography, the scientific community was surprised to find that the protein lacked symmetry and instead appeared to be a random globule. In hindsight, we may say that the amazing diversity of protein functions demands a more complex structure.<p>
<p>
<b>What are the structural components of proteins?</b><p>
<p>
A single protein may be composed of more than one distinct chain, or subunit. For such proteins, the overall structure is described by the quaternary structure: the arrangement these subunits assume as a whole. HIV-1 protease is a dimer of two identical subunits, as shown below in Figure 3.4.<p>
<p>
<b>Figure 3.4</b> Protein structure with subunits differentiated.<p>
<p>
The folding of the individual protein subunits forms the tertiary structure of proteins. A closer look at tertiary structures shows that there are recurring structural themes within the folded shapes of the subunits. Alpha helices and beta pleated sheets are two common protein structural motifs. Figure 3.5 below shows the alpha helix and beta pleated sheet sections of HIV protease.<p>
<p>
<b>Figure 3.5</b> Diagram of alpha helices and beta pleated sheet portions of HIV protease.<p>
<p>
These structural motifs make up a protein's secondary structure. They describe how neighboring segments of the backbone are arranged locally, but they do not describe how larger portions of the protein fold. {more information about alpha helices and beta pleated sheets} <p>
<p>
Proteins are chains of linked amino acids. In this respect, we can think of proteins as polymers, with the amino acids as the monomer units. The order of the amino acids is the protein's primary structure. Optimistically, we might hope that the primary structure alone is enough information to predict the overall protein structure. Unfortunately, no universal set of rules governing protein folding has been found. Predicting protein structure is much like predicting the weather: both systems are too complex for our current models to give much insight.<p>
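One way to keep the levels of structure straight is to notice what information each level requires. The Python sketch below is a hypothetical data model, not a standard file format.

- A chain stores its primary structure as a sequence string.
- It stores its secondary structure as one label per residue: H for helix, E for a strand in a sheet, C for anything else. This labeling scheme is loosely modeled on common secondary-structure notation.
- A protein stores its quaternary structure as a list of chains.

Tertiary structure would require three-dimensional coordinates for every atom. That is exactly the information the primary structure does not easily predict, so the sketch leaves it out.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    """One polypeptide chain (subunit)."""
    name: str
    sequence: str   # primary structure: residues in order, one-letter codes
    secondary: str  # secondary structure, one label per residue:
                    # "H" = helix, "E" = strand in a sheet, "C" = other
    # Tertiary structure would need 3D coordinates, omitted in this sketch.

    def __post_init__(self):
        if len(self.sequence) != len(self.secondary):
            raise ValueError("need exactly one secondary-structure label per residue")

    def segments(self):
        """Group consecutive residues sharing a label into (label, first, last) runs."""
        runs, start = [], 0
        for i in range(1, len(self.secondary) + 1):
            if i == len(self.secondary) or self.secondary[i] != self.secondary[start]:
                runs.append((self.secondary[start], start, i - 1))
                start = i
        return runs


@dataclass
class Protein:
    """Quaternary structure: the set of chains that assemble into one protein."""
    name: str
    subunits: list = field(default_factory=list)

    def is_homo_oligomer(self) -> bool:
        """True when there are several subunits and all share one primary structure."""
        return len(self.subunits) > 1 and len({c.sequence for c in self.subunits}) == 1


# A hypothetical toy dimer of two identical 7-residue chains.
toy = Protein("toy dimer", [Chain("A", "MKVLAEG", "CEEHHHC"),
                            Chain("B", "MKVLAEG", "CEEHHHC")])
print(toy.subunits[0].segments())  # [('C', 0, 0), ('E', 1, 2), ('H', 3, 5), ('C', 6, 6)]
print(toy.is_homo_oligomer())      # True
```

A homodimer such as HIV-1 protease would be represented by two chains with identical sequences. The toy dimer above is a made-up stand-in.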
<p>
<b>How are protein structures determined?</b><p>
<p>
Determining the structures of proteins is far more difficult than determining those of simple organic molecules. Chemists routinely use nuclear magnetic resonance (NMR) techniques to characterize molecules with dozens of NMR-active nuclei, but proteins may have hundreds, if not thousands, of nuclei detectable by NMR. Simply put, routine protein NMR spectra are often too crowded to yield useful information. Figure 3.6 below shows a 1H spectrum of X with numerous overlapping resonances, while Figure 3.7 shows the easily interpretable spectrum of ethanol.<p>
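A rough calculation shows why crowding becomes a problem. Suppose n resonances were spread evenly over a proton spectral window of about 10 ppm. Their average separation would be 10/n ppm, and on a spectrometer operating at f MHz, 1 ppm corresponds to f Hz.

The Python sketch below uses illustrative numbers: a 500 MHz instrument and made-up proton counts, not data from Figure 3.6. It compares the average gap with a typical three-bond H-H coupling of about 7 Hz, as seen in ethanol. The "crowded" criterion (a gap smaller than three coupling constants, roughly the width of a quartet) is an arbitrary assumption. Real resonances are not evenly spaced, so actual overlap is worse than this average suggests.

```python
def mean_spacing_hz(n_resonances: int, window_ppm: float, spectrometer_mhz: float) -> float:
    """Average gap between resonances if they were spread evenly over the window.

    On a spectrometer operating at `spectrometer_mhz` MHz, 1 ppm corresponds to
    `spectrometer_mhz` Hz (500 Hz per ppm at 500 MHz).
    """
    return window_ppm * spectrometer_mhz / n_resonances


TYPICAL_VICINAL_J_HZ = 7.0  # typical 3-bond H-H coupling, e.g. in ethanol (approximate)

for n in (3, 50, 500):  # illustrative resonance counts (3 is roughly ethanol)
    gap = mean_spacing_hz(n, window_ppm=10.0, spectrometer_mhz=500.0)
    label = "crowded" if gap < 3 * TYPICAL_VICINAL_J_HZ else "resolved"
    print(f"{n:4d} resonances: ~{gap:7.1f} Hz apart ({label})")
```

With 500 resonances the average gap is only 10 Hz, comparable to the splittings themselves, so multiplets inevitably overlap.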
<p>
<b>Figure 3.6</b> 1H spectrum of X<p>
<p>
<b>Figure 3.7</b> 1H spectrum of ethanol<p>
<p>
The overlapping resonances in Figure 3.6 make structure determination from routine 1H spectra exceptionally difficult, if not impossible. More complicated "two-dimensional" (2D) NMR experiments are available and are often more useful for analyzing larger molecules. The simplest of the 2D experiments is called COSY, which stands for "COrrelation SpectroscopY." We call this a 2D experiment because the data are plotted over two axes. In the COSY experiment, both axes correspond to the chemical shift (ppm) range of the nuclei. Figure 3.8 shows a 1H COSY of X displayed as a stack plot.<p>
<p>
<b>Figure 3.8</b> 1H COSY of X, stack plot<p>
<p>
The stack plot presentation emphasizes that 2D experiments produce signal volumes, rather than the peak areas of 1D experiments. More often, 2D experiments are shown as contour plots, which are similar to topographic maps that show elevations across terrain. Figure 3.9 below shows a relief-style map next to a contour map of the same landscape. Figure 3.10 shows a COSY of X drawn as a contour plot.<p>
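To see why 2D peaks are measured as volumes, it helps to model a single crosspeak. The Python sketch below assumes an idealized Gaussian peak shape. Real NMR lines are closer to Lorentzian, so this is a simplification. The sketch estimates the volume under the peak by summing heights over a grid and compares it with the closed-form result V = 2π·h·σ1·σ2. The peak positions and widths are hypothetical.

Horizontal slices through such a peak at fixed heights are exactly what a contour plot draws. For a Gaussian, each contour is an ellipse.

```python
import math


def gaussian_peak(x, y, height, x0, y0, sx, sy):
    """Height of an idealized 2D Gaussian peak centered at (x0, y0) ppm."""
    return height * math.exp(-((x - x0) ** 2) / (2 * sx ** 2)
                             - ((y - y0) ** 2) / (2 * sy ** 2))


def numerical_volume(height, x0, y0, sx, sy, half_width=6.0, steps=300):
    """Midpoint-rule estimate of the volume under the peak, integrating over a
    box that extends `half_width` standard deviations in each direction."""
    dx = 2 * half_width * sx / steps
    dy = 2 * half_width * sy / steps
    total = 0.0
    for i in range(steps):
        x = x0 - half_width * sx + (i + 0.5) * dx
        for j in range(steps):
            y = y0 - half_width * sy + (j + 0.5) * dy
            total += gaussian_peak(x, y, height, x0, y0, sx, sy)
    return total * dx * dy


def analytic_volume(height, sx, sy):
    """Closed form for a 2D Gaussian: V = 2 * pi * height * sx * sy."""
    return 2 * math.pi * height * sx * sy


# Two hypothetical crosspeaks with the same height but different widths.
narrow = numerical_volume(1.0, 3.7, 1.2, 0.01, 0.01)
broad = numerical_volume(1.0, 3.7, 1.2, 0.02, 0.01)
print(narrow, analytic_volume(1.0, 0.01, 0.01))
print(broad / narrow)  # about 2: same height, twice the volume
```

Two peaks of equal height can differ in volume. Doubling the width in one dimension doubles the volume, which is why peak height alone can be misleading.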
<p>
<b>Figure 3.9</b> a) relief map b) contour map<p>


<p>
<b>Figure 3.10</b> COSY of X, contour plot<p>
<p>
The diagonal of a COSY closely resembles a regular 1D spectrum. Notice, however, that there are numerous signals off the diagonal, each of which lines up with two diagonal resonances. These signals are called crosspeaks, and they are central to interpreting 2D experiments. Notice that crosspeaks occur both above and below the diagonal. Indeed, crosspeaks come in pairs placed symmetrically about the diagonal. The two members of a pair are usually equal in volume and therefore contain redundant information. In the COSY experiment, a crosspeak implies J-coupling between the two resonances it connects. Recall that in a 1D experiment, J-coupling is found by matching splittings between resonances (see Figure 3.7). Finding complementary splittings in 1D experiments can be difficult when the spectrum is crowded and the splittings are obscured. 2D experiments like COSY spread the information out, making more complex systems easier to manage. Figure 3.11 below shows how the COSY of X can be used to prove Y.
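The logic of reading a COSY can be sketched as a small graph problem. The Python example below first builds an idealized peak list from a set of J-couplings, placing each crosspeak and its mirror image across the diagonal. It then reverses the process, linking protons that share crosspeaks into spin systems.

Ethanol serves as the example. The chemical shifts are approximate, and the OH proton is assumed to show no resolved coupling, as is common when it exchanges rapidly. The function names and the matching tolerance are hypothetical choices, not part of any NMR software package.

```python
from collections import defaultdict


def cosy_peaks(shifts, couplings):
    """Idealized COSY peak list.

    shifts: dict mapping a proton label to its chemical shift (ppm)
    couplings: pairs of labels that are J-coupled
    Returns (diagonal peaks, crosspeaks) as lists of (shift, shift) tuples.
    """
    diagonal = [(s, s) for s in shifts.values()]
    cross = []
    for a, b in couplings:
        cross.append((shifts[a], shifts[b]))  # one side of the diagonal
        cross.append((shifts[b], shifts[a]))  # its mirror image
    return diagonal, cross


def spin_systems(shifts, cross, tol=0.01):
    """Group protons that are linked, directly or indirectly, by crosspeaks."""
    def label_at(delta):
        # Assign a crosspeak coordinate to the diagonal resonance it lines up with
        return next(lab for lab, s in shifts.items() if abs(s - delta) <= tol)

    neighbors = defaultdict(set)
    for d1, d2 in cross:
        a, b = label_at(d1), label_at(d2)
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, groups = set(), []
    for start in shifts:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first search through the coupling network
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Ethanol, approximate shifts; OH assumed to show no resolved coupling.
shifts = {"CH3": 1.2, "CH2": 3.7, "OH": 2.6}
diagonal, cross = cosy_peaks(shifts, [("CH3", "CH2")])
print(cross)                        # [(1.2, 3.7), (3.7, 1.2)]
print(spin_systems(shifts, cross))  # [['CH2', 'CH3'], ['OH']]
```

Because every crosspeak has a mirror partner, only one side of the diagonal is needed to rebuild the coupling network. The second copy serves as a check.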
<p>
<b>Figure 3.11</b> COSY of X, contour, with marked crosspeaks showing given connectivities.
<p>
Although COSY experiments can often reveal atomic connectivities, they reveal nothing about the spatial arrangement of the atoms, and hence the conformation of the molecule. Another 2D experiment, called NOESY (Nuclear Overhauser Effect SpectroscopY), is often used to answer questions about the distances between nuclei. A NOESY of X is given below in Figure 3.12.<p>
<p>
<b>Figure 3.12</b> NOESY of X<p>
<p>
In a NOESY, crosspeaks indicate close proximity through space between the given protons. We might say that crosspeaks in a 1H COSY qualitatively imply closeness, because protons that are coupled must be near each other. Consider, however, what happens when a protein folds in on itself. Protons that are far apart in the sequence are brought near each other through space. These protons will show NOESY crosspeaks, although a COSY spectrum would not. Furthermore, the NOESY experiment provides quantitative information: larger NOESY crosspeaks indicate closer nuclei, because the NOE falls off approximately as the inverse sixth power of the internuclear distance. In principle, if we can calibrate the magnitude of NOESY crosspeaks to distances, we can determine the structure of a protein by careful consideration of all the distance information.<p>
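The calibration idea can be written down directly. Under the isolated spin-pair approximation, NOE crosspeak volume I scales as r^-6. A reference pair with known distance r_ref and volume I_ref therefore gives r = r_ref (I_ref / I)^(1/6).

The Python sketch below assumes a reference distance of about 1.78 angstroms, a value often used for geminal CH2 protons, and uses hypothetical crosspeak volumes. It also sorts each NOE by sequence separation using a common convention: sequential for neighboring residues, medium-range for separations of 2 to 4, and long-range for 5 or more. Long-range NOEs are the signature of a folded chain.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Isolated spin-pair approximation: NOE intensity I is proportional to r**-6,
    so r = r_ref * (I_ref / I) ** (1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1 / 6)


def sequence_range(i: int, j: int) -> str:
    """Classify an NOE between residues i and j by their separation in the sequence."""
    gap = abs(i - j)
    if gap == 0:
        return "intraresidue"
    if gap == 1:
        return "sequential"
    if gap <= 4:
        return "medium-range"
    return "long-range"  # protons far apart in sequence, brought together by folding


REF_DISTANCE = 1.78   # angstroms; assumed reference, e.g. geminal CH2 protons
REF_INTENSITY = 64.0  # hypothetical crosspeak volume for the reference pair

# Hypothetical NOESY crosspeaks: (residue i, residue j, crosspeak volume)
for i, j, vol in [(10, 11, 8.0), (10, 42, 1.0)]:
    r = noe_distance(vol, REF_INTENSITY, REF_DISTANCE)
    print(f"residues {i}-{j}: ~{r:.2f} angstrom, {sequence_range(i, j)}")
```

Because of the sixth-root dependence, a crosspeak 64 times weaker than the reference corresponds to only twice the reference distance. This steep falloff is why NOEs become hard to detect much beyond roughly 5 angstroms.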
<p>
[followed by example of how structural elements of X were determined through NOESY data. Also need to talk about x-ray structures as well.]<p>