The previous lecture described the 20 amino acids that make up all proteins, and went on to discuss how the primary and higher-order structures of protein molecules can be characterised. In this lecture, attention turns to the interactions that govern protein folding (Slide 1).

Slide 1 The folding of proteins into precise 3-dimensional shapes is governed by the interactions between constituents of the amino-acid chains

4.1 Introduction to protein folding

The way a protein functions depends not only on its chemical composition but also on its shape. To function correctly, a protein molecule must spontaneously fold into a complex yet precisely determined configuration.

The intricacy of the folding process can be illustrated using animations. The following are just three examples of the many animations freely available from sites such as YouTube. These vary in the level of detail and accuracy they show, and an internet search will yield many others.

Folding pathways

Protein G folding transition

GCSF Protein Folding

Given the vast number of possible configurations of a large molecule, it might seem unlikely that proteins should fold correctly, but correct folding is in fact common. The interactions between various parts of a protein molecule can go some way to explaining why proteins do adopt particular configurations, and that is the subject of this lecture.

4.2 Forces involved in protein folding

Biochemistry relies as much on weak, non-covalent molecular interactions as it does on covalent bonds. Table 1 (also shown on Slide 2) is a rough guide to the relative strengths of the relevant interactions.

Reversible non-covalent interactions stabilise the three-dimensional structures of macromolecules, including proteins; they are responsible for molecular recognition and binding, including the recognition of substrates by enzymes and the interaction of signalling molecules.

The folded structure of proteins is only marginally stable. The free energy required to denature a protein in aqueous solution is around 0.4 kJ mol⁻¹ per amino acid, so about 40 kJ mol⁻¹ for a 100-residue protein. Note that this is the difference between the interaction energies within the folded protein and the interactions between the unfolded protein and water in the surrounding solution. For comparison, the free energy required to break a typical hydrogen bond is ~20 kJ mol⁻¹. This does not mean that the forces holding the protein together are weak: the enthalpy of hydrogen bonding, van der Waals and hydrophobic interactions is of the order of thousands of kJ mol⁻¹.
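
As a rough illustration of what "marginally stable" means, the sketch below turns the per-residue figure quoted above into a total unfolding free energy and then into a folded population. It assumes a simple two-state (folded/unfolded) equilibrium at 298 K, which is a simplification introduced here and not part of the lecture; the function names are hypothetical.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def unfolding_free_energy(n_residues, per_residue=0.4):
    """Unfolding free energy in kJ/mol from the ~0.4 kJ/mol per residue rule of thumb."""
    return per_residue * n_residues


def fraction_folded(delta_g_unfold, temperature=298.0):
    """Folded fraction for an assumed two-state equilibrium.

    K = [folded]/[unfolded] = exp(delta_g_unfold / RT).
    """
    k = math.exp(delta_g_unfold / (R * temperature))
    return k / (1.0 + k)


dg = unfolding_free_energy(100)        # about 40 kJ/mol for 100 residues
hydrogen_bond_equivalents = dg / 20.0  # about 2 typical hydrogen bonds
print(dg, hydrogen_bond_equivalents, fraction_folded(dg))
```

Under these assumptions the whole protein is held folded by a net free energy worth only about two hydrogen bonds, yet that is still many times the thermal energy, so the folded state dominates the population.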

The free energy change on protein folding is so small because the enthalpy (ΔH ~ −10³ kJ mol⁻¹) is almost exactly balanced by the entropic cost of choosing one well-defined structure over the many possible disordered configurations (−TΔS ~ +10³ kJ mol⁻¹, with kBT = 2.5 kJ mol⁻¹). This is not an accident, but almost certainly a property selected by evolution. Hyperthermophiles, which live in hot springs and hydrothermal vents at temperatures up to and above 100 °C, have proteins that are necessarily significantly more stable than those of mesophiles that grow at normal temperatures (the difference in thermal stability is of the order of 100 kJ mol⁻¹), but at the temperatures at which these organisms grow, their proteins are similarly marginally stable.
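
The near-cancellation described above can be made concrete with a small numerical sketch. The values are illustrative only: the enthalpy is taken as exactly −1000 kJ mol⁻¹ and the entropic term is chosen (an assumption made here) so that the net folding free energy matches the 40 kJ mol⁻¹ quoted for a 100-residue protein.

```python
def folding_free_energy(delta_h, minus_t_delta_s):
    """Delta G = Delta H - T*Delta S, with the entropic term passed in as -T*Delta S."""
    return delta_h + minus_t_delta_s


delta_h = -1000.0        # kJ/mol, order-of-magnitude value from the text
minus_t_delta_s = 960.0  # kJ/mol, hypothetical, chosen to leave a net -40 kJ/mol

dg_fold = folding_free_energy(delta_h, minus_t_delta_s)

# Weakening the enthalpic term by only 5% is enough to change the sign of Delta G.
dg_perturbed = folding_free_energy(0.95 * delta_h, minus_t_delta_s)

print(dg_fold, dg_perturbed)
```

With these numbers a 5% change in either large term is enough to make folding unfavourable, which is one way to see why the stability is described as marginal.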

Most proteins fold in an aqueous environment – their stability is strongly affected by the properties of the solvent.

4.2.1 Generic forces

Generic forces are those that are common to all parts of any protein, regardless of its sequence.

Proteins fold in such a way that hydrophobic side chains are largely buried in the interior, out of contact with the surrounding water (Slide 19). This is analogous to the way in which amphiphilic molecules such as detergents form micelles, in which polar head groups are in contact with the solvent and hydrophobic tails pack together out of the way.

The hydrophobic effect is a relatively weak effect that is important because it adds up: the free energy change on removing a –CH₂– group from water is about −3 kJ mol⁻¹, but a typical protein folds to bury many such groups. Because of this, hydrophobic interactions are the main reason that folded proteins are stable.

The free energy change on removing a hydrophobic group from aqueous solution is not due to a change in enthalpy but to entropy. Water is highly cohesive - the molecules of liquid water form rapidly fluctuating hydrogen bonded networks that, on a short enough length scale, resemble those in ice. A hydrophobic molecule or surface disrupts these networks. Neighbouring water molecules still manage to form hydrogen-bonded networks, because it would be too enthalpically costly not to – but this imposes a constraint on their configurations – a few layers of molecules next to the non-polar surface are more ordered than in bulk water. Slide 20 shows how water might order to form a hydrogen-bonded cage – you can appreciate that there are rather few ways to do this that satisfy all possible hydrogen bonds. This represents a reduction in entropy compared to bulk water, represented in the middle image, where there are many possible networks of hydrogen bonding.

The van der Waals interaction (Slide 21) is due to dipole-dipole attraction, where one dipole is the result of a spontaneous fluctuation and the other is the polarisation induced in the charge distribution of a neighbouring atom or molecule. It is very weak, and it falls off as r⁻⁶. (The field of the spontaneous dipole, and therefore the magnitude of the induced dipole, falls off as r⁻³; the energy −p·E ∝ r⁻⁶, where p is the dipole moment and E the electric field.) There is an even shorter-range repulsion between atoms that becomes significant when filled electronic shells overlap. (The sum of these two gives the familiar Lennard-Jones potential.) There is therefore a potential minimum which defines the equilibrium spacing between contacting atoms. It is often a good approximation to consider atoms as hard spheres, with a van der Waals radius defined such that the spacing between two atoms at the energy minimum is the sum of their radii (Slide 5). Slide 6 shows various representations of a water molecule; note the relative sizes of the covalent bond length and the van der Waals radii.

The van der Waals bonding energy is of order 2 to 4 kJ mol⁻¹ per pair of atoms in close contact. Nevertheless, folded proteins contain many atoms in close contact and van der Waals interactions are important in determining their conformations. They are also important when proteins dock with other proteins, or bind small molecules.
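
The shape of the Lennard-Jones potential mentioned above can be sketched as follows. The lecture does not give a functional form, so the common 12-6 form is assumed here; the well depth (3 kJ mol⁻¹, within the range quoted above) and the length parameter sigma (0.34 nm) are hypothetical values chosen for illustration.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def minimum_separation(sigma):
    """Separation at the energy minimum of the 12-6 form: 2**(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma


epsilon = 3.0  # kJ/mol, hypothetical well depth
sigma = 0.34   # nm, hypothetical separation at which the energy crosses zero

r_min = minimum_separation(sigma)
print(r_min, lennard_jones(r_min, epsilon, sigma))

# Doubling the separation from the minimum leaves only a few percent of the well depth.
print(lennard_jones(2.0 * r_min, epsilon, sigma) / lennard_jones(r_min, epsilon, sigma))
```

In this picture the separation at the minimum plays the role of the sum of the two van der Waals radii, and the rapid decay with distance is why only atoms in close contact contribute appreciably.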

van der Waals interactions are not affected by the presence of water as a solvent – because they are negligible at separations large enough to allow solvent molecules between the interacting atoms.

4.2.2 Sequence-specific forces

Slide 7 shows an electrostatic interaction, the ionic interaction. The dielectric constant of water is about 80. This is an average property appropriate for length scales much bigger than the size of a water molecule. For two ions in contact, the dielectric constant is 1. The effective dielectric constant increases as the separation between the charges increases, over a length scale of the order of 1 to 2 nm.

For the interior of a folded protein the dielectric constant is usually taken to be in the range 3 to 5, by analogy with molecular liquids of similar polarity. The electrostatic energy of a pair of ions (say the carboxyl group of glutamate and the ammonium group of lysine) embedded within a protein of dielectric constant 4 and separated by 0.4 nm is −86 kJ mol⁻¹. However, there is an entropic penalty for constraining these side chains to bring the charges close to each other, and if those ions were at the surface of the protein their charges would be very effectively shielded by ordered shells of water. There is therefore little free energy difference between an unsolvated ion pair (or salt bridge) buried within a protein and a solvated pair of ions at the protein surface. So, although ionic interactions are strong, they are not usually important in stabilising protein structure.
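
The ion-pair energy quoted above follows from Coulomb's law in a uniform dielectric. The sketch below is a minimal check, assuming point charges of ±1 elementary charge and using the conversion factor e²N_A/(4πε₀) ≈ 138.935 kJ mol⁻¹ nm; the function name is hypothetical.

```python
COULOMB_FACTOR = 138.935  # e^2 * N_A / (4*pi*eps0), in kJ mol^-1 nm per squared elementary charge


def ion_pair_energy(q1, q2, separation_nm, dielectric):
    """Coulomb energy in kJ/mol of two point charges (in units of e) in a uniform dielectric."""
    return COULOMB_FACTOR * q1 * q2 / (dielectric * separation_nm)


# Carboxylate/ammonium pair buried in a protein interior (dielectric constant 4, 0.4 nm apart).
buried = ion_pair_energy(-1, +1, 0.4, 4.0)   # approximately -86.8 kJ/mol

# Same separation with the bulk dielectric constant of water, for comparison only:
# the text notes that the bulk value is not appropriate at such short separations.
bulk_water = ion_pair_energy(-1, +1, 0.4, 80.0)

print(buried, bulk_water)
```

The buried-pair result is about −87 kJ mol⁻¹, close to the figure quoted in the text; the small difference presumably reflects rounding of the constants.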

Slide 8 shows another electrostatic interaction: the dipolar interaction. Compared with the fields of ions, dipolar fields are weaker and more rapidly attenuated (1/r³ rather than 1/r²). For two carbonyl groups positioned head to tail 0.5 nm apart the electrostatic energy is −9.3 kJ mol⁻¹. Electrostatic interactions between permanent dipoles are significant, however, for two reasons: firstly, there are lots of them, including a carbonyl and an amide group for each backbone unit; secondly, if permanent dipoles are aligned parallel to each other, as in an alpha helix, then they can add to produce a large dipole.

Hydrogen bonds (Slide 9) are largely electrostatic interactions between a weakly acidic donor and an acceptor carrying a lone pair of electrons. They can be considered as a hydrogen atom shared between two electronegative atoms – the donor is the atom with which the hydrogen is most closely associated. The donor-hydrogen bond is polarised, and the fractional positive charge on the hydrogen attracts the acceptor. The bond has some covalent character, due to interaction between the hydrogen and the lone pair on the acceptor – it is therefore weakly directional, favouring a linear arrangement of the donor, hydrogen and acceptor atoms. For the same reason, it is shorter than the sum of the van der Waals radii of the constituent atoms.

In biological systems the donor and acceptor are usually electronegative nitrogen and oxygen atoms, and sometimes sulfur. The diagram on Slide 9 shows hydrogen bonding in water, in which the donor and acceptor are both oxygens. A C-H group can also be a weak hydrogen bond donor, and the pi electrons of an aromatic group (for example tryptophan) can act as a weak acceptor. The donor-acceptor bond distance is usually in the range 0.27 to 0.31 nm, and hydrogen bond formation energies are in the range −12 to −40 kJ mol⁻¹ (−8 to −16 kJ mol⁻¹ for a weak donor or acceptor). Hydrogen bonds, for example between O and NH, are common in cells (Slide 10).

The contribution of a hydrogen bond to protein stability (Slide 11) is much less than that, in the range −2 to −8 kJ mol⁻¹, because in the unfolded state water can replace both donor and acceptor.

Although hydrogen bonds make only small contributions to the stability of a folded protein compared to the same polypeptide unfolded, they can help to distinguish between correctly folded proteins (where there are many internal H-bonds) and incorrect folds.

Some protein structures are stabilised by covalent crosslinks, called disulfide bonds, formed between cysteine residues (Slide 12). These are stable in the extracellular environment, but unstable in the more reducing environment within a cell. Disulfide bonds are relatively weak: although the enthalpy associated with breaking a covalent S-S bond is very large, that is not the relevant reaction. The bond is broken by chemical reduction (to give, for example, two S-H bonds), and the free energy change associated with this reaction is of the order of a few kJ mol⁻¹, positive or negative, depending on the environment. Disulfide bonds normally occur in extracellular proteins.

4.3 Concluding remarks

Slide 2 provides a summary of interactions. Proteins generally fold to minimise their free energy. Remember that all these contributions to the enthalpy of the folded protein are almost completely offset by the loss of configurational entropy on folding – folded proteins are only marginally stable. Proteins can usually fold by themselves, though they are sometimes assisted by other protein molecules called chaperones.

In the late 1960s, American molecular biologist Cyrus Levinthal pointed out that correct protein folding by a simple random search is vanishingly improbable, and yet correctly folded proteins are common. This is known as Levinthal’s paradox (Slide 13).

Levinthal noted that an n-residue protein has 2n backbone torsional angles (φ, ψ). If we assume 3 stable conformations for each (cf. Ramachandran plot, Slide 11), and that a protein can explore a new conformation in the characteristic time for reorientation of a single bond (~10⁻¹³ s), then the time required to explore all 3²ⁿ ≈ 10ⁿ possible configurations for n = 100 residues is about 10⁸⁷ s. For comparison, the age of the universe is about 13.7 × 10⁹ years, i.e. about 4 × 10¹⁷ s.
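
Levinthal's estimate can be reproduced in a few lines. The sketch below works in powers of ten to avoid very large numbers and uses the assumptions stated above (3 conformations per angle, 2n angles, 10⁻¹³ s per conformation); the function name is hypothetical. The figure of 10⁸⁷ s corresponds to rounding 3²ⁿ up to 10ⁿ; counting 3²ⁿ without rounding gives a somewhat smaller, but still astronomically large, search time.

```python
import math


def log10_search_time(n_residues, states_per_angle=3, step_time=1e-13):
    """log10 of the time in seconds to visit every conformation once by exhaustive search."""
    n_angles = 2 * n_residues  # one phi and one psi per residue
    return n_angles * math.log10(states_per_angle) + math.log10(step_time)


n = 100
unrounded = log10_search_time(n)   # roughly 82.4, i.e. about 10^82 s
rounded = n + math.log10(1e-13)    # using 3^(2n) ~ 10^n: about 10^87 s

seconds_per_year = 365.25 * 24 * 3600
log10_age_universe = math.log10(13.7e9 * seconds_per_year)  # roughly 17.6

print(unrounded, rounded, log10_age_universe)
```

Either way the search time exceeds the age of the universe by more than sixty orders of magnitude, so the conclusion does not depend on the rounding.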

The paradox can be resolved by noting that there is more to folding than a random search: protein sequences contain information on folding paths as well as on the stability of the folded structure. Proteins must fold by some sort of ordered pathway or set of pathways in which the approach to the native state is accompanied by sharply increasing conformational stability, so that folding is progressive and hierarchical.

Finally, the videos below show examples of the wide range of remarkably mechanistic functions that proteins can perform - they do more than just fold. This lecture has described how unlikely it is that proteins fold at all. But that is only a small part of the story of life. Proteins not only fold, but they act and interact in a vast number of highly improbable ways. That all this should exist requires the power of evolution, which is the subject of Lecture 5.