Although an understanding of the HIV lifecycle suggests that inhibition of HIV Protease could lead to a treatment of HIV infections, the creation of HIV Protease Inhibitors requires a detailed understanding of the molecules involved and their interactions. Thus, the development of HIV Protease Inhibitors is taking place at the interface of chemistry, biochemistry, and virology. Overall, the idea is to create a drug, commonly a small molecule, that prevents an enzyme, HIV Protease, from performing its biological function. To follow the process of developing these new drugs, we will need to understand the chemical makeup, structures, and reactions of proteins and their inhibitors.
Examine the pictorial representation of HIV Protease shown in Figure 2.1 below. Although this picture does not look like the representations that we normally associate with molecular structures, HIV Protease is indeed a molecule; in fact, it is two molecules that are loosely associated. In order to design a drug that will inhibit HIV Protease, we will need to decipher the meaning of the HIV Protease structure depicted in the figure. To understand why HIV Protease displays its distinctive properties, we will need to learn more about the makeup of proteins. In particular, we will need to learn about the functional groups present and their properties.
Figure 2.1. a) Ribbon and b) ball and stick representations of the HIV Protease dimer.
a)

b)

The first pictorial representation in Figure 2.1 uses the loops and whirls of the ribbons to show the three-dimensional structure of HIV Protease. Although the ball and stick model of HIV Protease (Figure 2.1) looks nearly incomprehensible, we can make part of HIV Protease look more like the structures we're used to by transforming a piece of the ribbon diagram into the stick representation as shown in Figure 2.2. The piece of HIV Protease shown in stick form is also shown using dashes and wedges and more traditional symbols (Figure 2.2c). This piece of HIV Protease still looks complicated, but it can be simplified if all carbon chains that lie off the molecular backbone are designated as R1, R2, etc. (Figure 2.2d). With the carbon side chains shown as R groups, the backbone clearly consists of repeating units of NH-CHR-C=O. As we'll see in the next paragraphs, these units are amino acid residues. Amino acids serve as the fundamental building blocks of not just HIV Protease but of all proteins.
Figure 2.2. a) The structure of HIV-1 Protease shown in ribbon form, with residues 53-57 shown in stick form. b) An enlargement of the stick representation of residues 53 through 57. c) A chemical structure representation of these same residues. d) A generalized chemical structure representation of these residues.
a) b)

c)

d)

An amino acid consists of an amino group, a carboxyl group, a hydrogen atom, and an R group bonded to a carbon atom (Figure 2.2). The R group, also called the side chain, is different in each of the twenty different amino acids commonly found in proteins. In glycine, the simplest amino acid, the R group is a hydrogen atom. Other amino acids have R groups containing carboxylic acids, amides, aromatic rings, alkyl groups, hydroxyl groups, or other more complicated functional groups (Table 2.1). The side chains vary in size, shape, charge, hydrogen-bonding capacity, and chemical reactivity. This wide variety in the side chains is critical because it enables proteins to create intricate three-dimensional structures and to participate in many essential biological processes.
Of the twenty naturally occurring amino acids, alanine, valine, leucine, and isoleucine have no reactive functionality on their side chains. They have only methylene (-CH2-) and methyl (-CH3) groups. Their main contribution to protein structure and reactivity is that they are hydrophobic; they do not interact favorably with water. Instead, they interact favorably with each other and with other nonpolar atoms.
Phenylalanine, tyrosine, and tryptophan all have aromatic side chains. The aromatic side chains cause ultraviolet absorbance and fluorescence properties. Because the spectral properties are sensitive to the environment around the side chains, these residues are useful probes of protein structure.
Two amino acids, serine and threonine, have hydroxyl groups on their side chains. Serine and threonine can react the same way that ethanol reacts. Two other amino acids, methionine and cysteine, have sulfur in their side chains. Sulfur can be oxidized. In cysteine, oxidation of the thiol results in a disulfide bond. Disulfide bonds crosslink proteins and stabilize the protein structure. Since proteins are mostly held together by many weaker noncovalent interactions, disulfide bonds can have a very large effect on the stability of the protein structure.
Aspartic acid and glutamic acid both have carboxylic acids in their side chains. The pKa of aspartic acid is 3.9 and the pKa of glutamic acid is 4.3; these residues are ionized and very polar under physiological conditions. We will see later that these amino acids participate in salt bridges to stabilize protein structures and also that two aspartic acid residues are found in the active site of HIV Protease. Asparagine and glutamine are amides rather than carboxylic acids. These side chains do not ionize and are not very reactive. However, they can function as both hydrogen bond donors and acceptors. We will see many more properties of amides in the following sections of this unit.
The two basic residues are lysine and arginine. The side chain of lysine has four methylene groups capped by an amino group with pKa = 11.1. This means that lysine is usually ionized under physiological conditions (pH~7.0). Any unprotonated lysines are active nucleophiles and participate in a variety of reactions. In arginine, there is a strongly basic guanidine (three nitrogens all connected to a central carbon). The pKa of arginine is about 12. The planarity of amide bonds will be discussed in detail later; note that the guanidine group is also planar.
Histidine has an imidazole ring in the side chain. This makes it nucleophilic and basic like other amines. Because the pKa of imidazole is about 7, a substantial fraction of histidine side chains are neutral at physiological pH while the rest are protonated. In the nonprotonated form of imidazole, the nitrogen with the hydrogen atom is a hydrogen bond donor while the nitrogen with the lone pair is a hydrogen bond acceptor. Because histidine contains an imidazole ring, it is very versatile.
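To make the connection between pKa and ionization concrete, the short Python sketch below applies the Henderson-Hasselbalch relationship to the side-chain pKa values quoted above. It is only an illustration: the function and variable names are our own, the pH of 7.0 is the value used in this section, and the pKa of a side chain buried in a folded protein can differ noticeably from these numbers.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a titratable group that carries its acidic proton.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Side-chain pKa values as quoted in the text (assumed, not measured here).
SIDE_CHAIN_PKA = {
    "Asp": 3.9,   # carboxylic acid
    "Glu": 4.3,   # carboxylic acid
    "His": 7.0,   # imidazole
    "Lys": 11.1,  # amine
    "Arg": 12.0,  # guanidine ("about 12" in the text)
}

PH = 7.0  # physiological pH as used in this section


def protonation_table(ph: float = PH) -> dict:
    """Map each residue name to its protonated fraction at the given pH."""
    return {name: fraction_protonated(pka, ph)
            for name, pka in SIDE_CHAIN_PKA.items()}


if __name__ == "__main__":
    for name, fraction in protonation_table().items():
        print(f"{name}: {100 * fraction:.2f}% protonated at pH {PH}")
```

With these inputs, the two acidic side chains come out almost entirely deprotonated (negatively charged), lysine and arginine come out almost entirely protonated (positively charged), and a group whose pKa equals the pH is exactly half protonated.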
The twentieth amino acid, proline, has a side chain of methylenes that are covalently bonded to the nitrogen atom of the backbone to form a pyrrolidine ring. The five-membered ring forces the amino acid rigidly into one conformation and can cause kinks in a peptide chain. Proline is thought to play a role in initiating some of the structural motifs regularly found in proteins, such as the α-helix. Also, once a peptide bond has been formed using the secondary amine in proline's backbone, that nitrogen can no longer donate a hydrogen bond because no amide hydrogen is present.
Figure 2.3. The general form of amino acids (a) written as a neutral L-amino acid (S configuration), (b) written in the zwitterionic form.
a) b)

Although amino acids differ in the makeup of their side chains, the sense of chirality is the same in virtually all naturally occurring amino acids. The four substituents on the tetrahedral carbon (amine, carboxylic acid, hydrogen, and R) are usually in the S orientation and only rarely in the R orientation (e.g., bacteria and some worms have small amounts of the R-isomer, or D-amino acids). In protein chemistry, the S-isomer is usually called the L-amino acid while the R-isomer is called the D-amino acid. Also, amino acids in solution at neutral pH exist mainly as zwitterions rather than as unionized molecules. This means that the amino group is protonated (NH3+) and the carboxyl group is deprotonated (CO2-). All of these concepts are illustrated in Figure 2.3.
Table 2.1. The Primary Naturally Occurring Amino Acids
| Name | Three letter code | One letter code | Structure ![]() |
|---|---|---|---|
| R = H | |||
| Glycine | Gly | G | ![]() |
| R = Alkyl | |||
| Alanine | Ala | A | ![]() |
| Valine | Val | V | ![]() |
| Leucine | Leu | L | ![]() |
| Isoleucine | Ile | I | ![]() |
| R = Aromatic | |||
| Phenylalanine | Phe | F | ![]() |
| Tyrosine | Tyr | Y | ![]() |
| Tryptophan | Trp | W | ![]() |
| R = Alcohol | |||
| Serine | Ser | S | ![]() |
| Threonine | Thr | T | ![]() |
| R = Sulfur-Containing (Thioether, Thiol) | |||
| Methionine | Met | M | ![]() |
| Cysteine | Cys | C | ![]() |
| R = Carboxylic Acid | |||
| Aspartic Acid | Asp | D | ![]() |
| Glutamic Acid | Glu | E | ![]() |
| R = Amide | |||
| Asparagine | Asn | N | ![]() |
| Glutamine | Gln | Q | ![]() |
| R = Base | |||
| Lysine | Lys | K | ![]() |
| Arginine | Arg | R | ![]() |
| R = Special | |||
| Histidine | His | H | ![]() |
| Proline | Pro | P | ![]() |
The amino acids in HIV Protease (and all other proteins) are linked together in an end-to-end arrangement, and the linking bond between two amino acids is called a peptide bond. The peptide bond (also called an amide bond) joins the carboxylic acid of one amino acid to the amino group of a second amino acid. When a small number of amino acids are joined together in this way, a peptide is formed. When many amino acids (often 100 or more) are joined together in this way, a polypeptide or protein such as HIV Protease is formed. Each amino acid unit in the protein is called a residue. HIV Protease is composed of 99 amino acid residues, all linked together by peptide bonds between the carboxylic acid of one residue and the amino group of the next. In a protein or peptide, the ordering or sequence of amino acids is called the primary structure. The primary structure tells you about the connectivity of the amino acids. We'll see later that secondary, tertiary, and quaternary structure describe the three-dimensional structures of proteins. The piece of HIV Protease shown in Figure 2.2b-d represents residues 53-57, and a list of all 99 amino acid residues is shown in Figure 2.4. Note that the customary way of numbering amino acids in proteins is to begin at the amino terminus and work toward the carboxy terminus. Further, amino acids can be named using their full name, a three-letter abbreviation, or a one-letter abbreviation as shown in Table 2.1.
Figure 2.4. The 99 amino acid residues of HIV Protease.
(N-terminus) Pro Gln Ile Thr Leu Trp Gln Arg Pro Leu Val Thr Ile Lys Ile (15)
Gly Gly Gln Leu Lys Glu Ala Leu Leu Asp Thr Gly Ala Asp Asp (30)
Thr Val Leu Glu Glu Met Ser Leu Pro Gly Arg Trp Lys Pro Lys (45)
Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp (60)
Gln Ile Leu Ile Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val (75)
Leu Val Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn Leu Leu (90)
Thr Gln Ile Gly Cys Thr Leu Asn Phe (99) (C-terminus)
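Because the same sequence can be written with three-letter or one-letter codes, and because residues are numbered from the amino terminus, it can help to see the bookkeeping spelled out. The sketch below is one possible way to do this in Python; the dictionary is copied from Table 2.1, the helper names are our own, and only the row of Figure 2.4 covering residues 46-60 is used as input.

```python
# Three-letter to one-letter codes, copied from Table 2.1.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",
    "Phe": "F", "Tyr": "Y", "Trp": "W", "Ser": "S", "Thr": "T",
    "Met": "M", "Cys": "C", "Asp": "D", "Glu": "E", "Asn": "N",
    "Gln": "Q", "Lys": "K", "Arg": "R", "His": "H", "Pro": "P",
}


def to_one_letter(three_letter_sequence: str) -> str:
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[code] for code in three_letter_sequence.split())


def residues(sequence: str, first: int, last: int, start_number: int = 1) -> str:
    """Return residues first..last (inclusive).

    start_number is the residue number of the first character in `sequence`,
    so that numbering still counts from the amino terminus of the protein.
    """
    return sequence[first - start_number: last - start_number + 1]


# Residues 46-60 exactly as listed in Figure 2.4.
ROW_46_TO_60 = "Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp"

if __name__ == "__main__":
    one_letter = to_one_letter(ROW_46_TO_60)
    print(one_letter)                                     # residues 46-60
    print(residues(one_letter, 53, 57, start_number=46))  # piece in Figure 2.2
```

The second line printed is the five-residue piece (Phe Ile Lys Val Arg) that Figure 2.2 shows in stick form.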
There are many different functional groups present in proteins that you have already seen: aromatic groups, alkyl groups, alcohols, amines, and thiols. However, to understand proteins we will need to introduce a new fragment, the carbonyl group. We'll examine the carbonyl group in its various forms with an emphasis on amides and carboxylic acids.
Why the emphasis on carboxylic acids and amides? As mentioned above, condensing a carboxylic acid with an amine generates an amide (Figure 2.5). It is the amide bond that links amino acids together to form the backbone of the protein. Therefore, to understand the structures and the physical and chemical properties of proteins such as HIV Protease, we must develop an understanding of carboxylic acids and amides.
Figure 2.5. Pictures of an amine, a carboxylic acid, and an amide.

The carbonyl group itself (pronounced car-bo-neel) is a fragment that contains a carbon atom doubly bonded to an oxygen atom (C=O). The carbonyl group is an essential component of many biologically important molecules and pharmaceutical drugs. All proteins, fatty acids, and many pharmaceuticals contain carbonyl groups. A wide variety of types of carbonyl groups exist. These types are differentiated by the groups attached to the C of the C=O. A list of common types of carbonyl groups is given in Table 2.2. All types of carbonyl groups contain an acyl fragment (often denoted 'R', as in the amide example below), which may be alkyl, aryl, alkenyl, etc.
Table 2.2. Types of Carbonyl Groups. The functional groups are shown in blue; the acyl fragment is shown in black.
| Name | Formula |
|---|---|
| Aldehyde | ![]() benzaldehyde |
| Ketone | 2-butanone |
| Carboxylic acid | ![]() 2-methylpropionic acid |
| Ester | ![]() t-butylcyclopentylcarboxylate |
| Amide | ![]() N,N-dimethylamide |
| Acid Halide | acetyl chloride (X = Cl) |
| Acid Anhydride | acetic anhydride |
Consider the two polymers depicted in Figure 2.6 below. Assign hybridizations to each of the backbone atoms and fill in all of the missing H's using wedged and dashed bonds. Also, draw a curved arrow on all the bonds about which rotation can occur (one is done as an example). How does the polymer that has an all carbon framework differ from the protein polymer?
Figure 2.6. a) A polymer with an all-carbon backbone and b) a polypeptide, the backbone of proteins.
a)

b)

Proteins are flatter than you might predict. Let's use hybridization concepts to examine proteins in more detail. Like the alkene, the valence bond picture of any carbonyl group consists of three sigma bonds and one π bond. Because the C atom of the carbonyl is sp2 hybridized, all carbonyls are planar with bond angles of about 120°. As shown in Figures 2.7 and 2.8, the bond angles about carbonyl carbons are close to 120°, and all three atoms attached to the C as well as the C atom itself lie in a plane. On the basis of the double bond character of the carbonyl bond, we expect a shorter and stronger C-O connection than would be seen for a simple single C-O bond. These expectations are borne out by experimental data: typical C=O bonds are considerably shorter (1.22 Å) and stronger (732 kJ/mol) than C-O bonds (1.43 Å, 385 kJ/mol).
Figure 2.7. Geometric data for compounds containing C-O and C=O bonds.





Figure 2.8. Bond lengths and angles for a piece of HIV Protease.

Another contributing factor to the short C=O bond lengths of carbonyls is the polar character of carbonyl groups. This polarity arises from the higher electronegativity of oxygen (3.5) relative to carbon (2.5). Another way of stating this using Lewis structures is to say that the ionic Lewis form makes a substantial contribution to the description of a carbonyl (eq 2.1). Recall that the depiction of the C=O group with two Lewis structures that are in resonance does not mean that the carbonyl sometimes is double bonded and sometimes dipolar. The carbonyl is always best described as a mixture of a covalent, doubly bonded C=O and a dipolar, singly bonded C-O.
(2.1)
A striking geometric property of the piece of HIV Protease that we've been discussing is the extension of the planar geometry about the carbonyl C to include planarity about the amide N. Examine the three dimensional structure of a segment of HIV Protease shown in Figure 2.9. As the shaded planes emphasize in the figure, the six atoms of the amide linkages (C-CO-NH-C) all lie in the same plane. Now compare the experimental geometry with the structures that you drew at the top of this section and make any necessary corrections. Simple bonding arguments do not lead to the correct structure because these notions predict pyramidal geometries about the N (as in amines).
Figure 2.9. The planarity of the amide.

In order to understand the extended planarity of the amide linkages in proteins, we turn once again to the concept of resonance. Using the octet rule only as a guide in drawing Lewis structures, three acceptable arrangements are possible for amides (Figure 2.10). As you have seen previously for aromatic compounds, unusual stabilities and shortened bond lengths commonly are associated with molecules having several Lewis structures in resonance. Unlike the resonance of equivalent cyclohexatriene structures used to describe benzene, the resonance structures for amides are not equivalent. Hence, we do not expect equal contributions from each resonance structure in describing the amide.
Figure 2.10. Resonance Structures for an amide.

Let's focus on the third resonance structure in which there is a formal C=N double bond and formal charges of +1 and -1 at the N and carbonyl O, respectively. If this resonance structure is a significant contributor, we expect the C-N bond length to be shorter than a typical C-N bond. Indeed, amide C-N bonds are shorter (C-N ≈ 1.36 Å) than simple C-N single bonds (C-N ≈ 1.47 Å). Also, participation of the C=N resonance structure should induce coplanarity of the (C-CO-NH-C) fragment, just as the six atoms of ethene (H2C=CH2) are coplanar. As mentioned previously, experimental structure determinations indicate that amide linkages are planar.
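One way to check planarity numerically is to compute the dihedral (torsion) angle defined by the four backbone atoms C-C(=O)-N-C; a value near 0° or 180° means the four atoms lie in one plane. The Python sketch below shows the calculation. The coordinates are made-up, idealized values chosen only to illustrate a flat and a twisted arrangement (they are not taken from the HIV Protease structure), and the function names are our own.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def is_planar(angle_degrees, tolerance=10.0):
    """True if the angle is within `tolerance` degrees of 0 or 180."""
    a = abs(angle_degrees)
    return a <= tolerance or a >= 180.0 - tolerance


# Hypothetical coordinates in angstroms: C(alpha), C(=O), N, next C(alpha).
# The central C-N distance uses the 1.36 A amide value quoted in the text.
FLAT = [(-1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.36, 0.0, 0.0), (2.36, -1.0, 0.0)]
TWISTED = [(-1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.36, 0.0, 0.0), (2.36, 0.0, 1.0)]

if __name__ == "__main__":
    for label, atoms in (("flat", FLAT), ("twisted", TWISTED)):
        angle = dihedral_degrees(*atoms)
        print(f"{label}: {angle:.1f} degrees, planar = {is_planar(angle)}")
```

Applied to real coordinates, the same function could be used to test whether the amide linkages in a protein structure are as flat as the resonance argument predicts.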
The experimental observation of a barrier to C-N bond rotation of about 17 kcal/mol (72 kJ/mol) for formamide provides strong evidence for a significant contribution of the Lewis structure in which the N is doubly bonded to carbonyl carbon. In contrast to amines, the N lone pair of an amide is partially occupied in the formation of a partial C=N double bond. In later sections of this unit, we will see how the formation of a partial C=N double bond affects not only structure but also reactivity; amines and amides have very different patterns of reactivity.
The best picture of the amide is one that emphasizes the first resonance structure as the primary contributor and the third resonance structure as a minor contributor. Although it is a minor contributor, the third structure supplies enough C=N character to flatten out the amide group. Why isn't this resonance structure the primary contributor? Formal charges help to rationalize the minor contribution of the C=N resonance structure to the net structures of amides. Drawing a C=N double bond places a formal charge of +1 at the amide N and a formal charge of -1 at the carbonyl O, and this separation of charge makes the structure less favorable than the uncharged first structure.
For amides formed from primary amines such as N-methylformamide (HC(=O)NHCH3), there appears to be a slight preference (ca. 5 kJ/mol) for the conformer in which the CH3 group is syn with respect to C=O. The terms syn and anti are used for carbonyl-containing compounds such as amides to denote the location of the substituent on the nitrogen. In the syn isomer the substituent is on the same side as the carbonyl oxygen; in the anti isomer the substituent is on the side opposite the carbonyl oxygen. You can compare this to E and Z alkenes, where Z (higher-priority groups on the same side) corresponds to syn and E corresponds to anti. An energy diagram describing the interconversion of syn and anti conformers of N-methylformamide is shown in Figure 2.11. Although the barrier to rotation about an amide bond (72 kJ/mol) is less than that for rotation about a C=C double bond (>100 kJ/mol), this hindered rotation has significant effects on the structure of proteins.
Figure 2.11. Energy diagram and syn and anti conformers of N-methylformamide.
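To get a feel for what an energy difference of about 5 kJ/mol and a barrier of about 72 kJ/mol mean in practice, the Python sketch below converts them into a population ratio and a rough rotation rate. Several assumptions are ours rather than the text's: both numbers are treated as free energies, the temperature is taken as 298.15 K, the rate comes from the Eyring equation with a transmission coefficient of 1, and the function names are invented for this example.

```python
import math

R = 8.314462618     # gas constant, J/(mol K)
K_B = 1.380649e-23  # Boltzmann constant, J/K
H = 6.62607015e-34  # Planck constant, J s


def population_ratio(energy_gap_kj_per_mol, temperature=298.15):
    """Boltzmann ratio of the more stable to the less stable conformer."""
    return math.exp(energy_gap_kj_per_mol * 1000.0 / (R * temperature))


def eyring_rate(barrier_kj_per_mol, temperature=298.15):
    """First-order rate constant (1/s) for crossing a free-energy barrier."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kj_per_mol * 1000.0 / (R * temperature))


if __name__ == "__main__":
    ratio = population_ratio(5.0)      # syn : anti for N-methylformamide
    syn_fraction = ratio / (1.0 + ratio)
    print(f"syn:anti ratio     = {ratio:.1f} : 1")
    print(f"syn fraction       = {100 * syn_fraction:.0f}%")
    print(f"amide rotation     = {eyring_rate(72.0):.2g} per second")
    print(f"100 kJ/mol barrier = {eyring_rate(100.0):.2g} per second")
```

Under these assumptions, a 5 kJ/mol preference corresponds to a syn:anti ratio of several to one, and a 72 kJ/mol barrier corresponds to rotation on the order of once per second at room temperature. That is slow compared with rotation about an ordinary single bond, which is why the amide behaves as a rigid, planar unit.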
