1.1 Cell function depends on the regulation of genes and proteins

A "simple" bacterial cell contains of the order of a million protein molecules, encoded by several thousand genes. A typical human cell contains of the order of a billion protein molecules, encoded by about 25 000 different genes. If we mixed a billion protein molecules, chosen at random from 25 000 different types, in a very small test tube, we would get a gooey mixture with no useful function. Yet a cell is not a gooey mixture but a well-organised, functional living entity. This is a result of the regulation of the production and activity of proteins inside the cell.

Slide 2 shows three types of mammalian cell: a nerve cell, muscle cells and blood cells. These cell types look very different and behave in completely different ways, yet all of them can be found in a single organism, all containing the same genetic information and all arising from the same fertilised egg cell. What’s different about these cell types is that they make different protein molecules from their genes. Some genes that produce protein in a nerve cell don’t produce protein in a muscle cell, and vice versa. At some stage in the development of an embryo, each cell receives signals that cause it to turn off some genes and to turn on others, and this dictates its future behaviour and function.

Slide 3 shows a photograph of a population of bacterial cells growing on a flat surface of agar gel. Each of the dots is a colony of about 10⁶ cells that have grown from a single "mother" cell, so they all contain the same genetic information. However, some colonies are blue and others are white. The blue cells are making the enzyme beta-galactosidase and the white cells aren’t. In these cells, a genetic switch controls whether the gene encoding beta-galactosidase is switched on or off. This switch flips randomly, so some colonies make the enzyme and some don’t.

1.2 Regulation uses molecular binding and recognition

The key element in cell regulation is molecular binding (i.e. the attachment of specific molecules to one another). Unlike Lego blocks, which bind together in a non-specific way (any Lego block will stack up with any other), proteins and DNA molecules are very specific in their interactions. The three-dimensional (3D) structure of a particular protein molecule means that it will interact with some small molecules, proteins or DNA sequences but not with others. When a protein molecule meets its interaction partner, its behaviour may change dramatically – and in the case of DNA, binding can have important effects on the behaviour of a gene.

Slide 4 shows the 3D structure of a protein called the lac repressor, which is found in the bacterium Escherichia coli (E. coli). This is a DNA-binding protein that turns off the expression of genes coding for proteins that are involved in the metabolism of the sugar lactose. When lactose is absent, the lac repressor prevents the bacterium from producing the agents necessary for lactose metabolism. When lactose is present, it inhibits the lac repressor’s DNA binding ability and thereby allows for the uptake and metabolism of lactose.

The lac repressor protein has a specific binding site for lactose. When lactose binds to the lac repressor protein in a specific position called the binding pocket, it changes the lac repressor’s 3D conformation and thereby alters the way in which it behaves. When lactose is absent, the protein first binds to an identical copy of itself to make a dimer, and then it binds to a specific DNA sequence (as shown in Slide 4). However, if lactose is bound to the binding pocket, the lac repressor protein cannot bind to the DNA sequence.

Why is it important that the lac repressor protein binds to DNA only when lactose is not bound to it? This is important because protein binding to DNA in this case controls whether genes are turned on or off. In this example, the lac repressor protein binds to the DNA close to the promoter sequence (also referred to as the promoter site, or just promoter) where RNA polymerase needs to bind in order to transcribe a series of genes called the lac operon. When the lac repressor is bound to the DNA, it prevents the RNA polymerase complex from accessing the promoter site. So when the repressor protein is bound, the genes controlled by this promoter are "turned off". In this way, two types of molecular binding events – sugar to protein (lactose binding to lac repressor), and protein to DNA (lac repressor to promoter) – control whether or not these genes are turned on.

It is important to note that protein–DNA binding does not always turn genes off. Some proteins (activators) can bind to DNA and turn genes on. For example, the catabolite activator protein (CAP) regulates the same genes (the lac operon) as the lac repressor. CAP has a "sticky patch". When CAP is bound to its DNA binding sequence close to the promoter region, this sticky patch can interact with RNA polymerase bound to the promoter and thereby increase the chance that the genes will be transcribed.

1.3 Statistical mechanics of molecular binding

(This part closely follows Section 6.1.1 of Phillips et al., 2009.)

Let us suppose we have one protein molecule, P, with a single binding site for a small molecule ligand, S. If the concentration of S is c, what is the probability, pbound, that the protein molecule will be bound by a molecule of S? We denote the energy of an unbound ligand molecule as εsol ("solvation") and the energy of a bound ligand molecule as εbound.

Slide 5 shows a representation in which each ligand molecule occupies one site of a lattice. The total number of lattice sites is Ω and the number of ligand molecules is L, so

$$ \frac{c}{c_0} = \frac{L}{\Omega} $$
Equation 1


where c0 is a constant that defines our units of concentration.

To work out the probability of the protein being bound, we need to work out the total statistical weight of configurations of type B in Slide 5 (bound) versus those of type A (unbound). To do this, we compute the energy E of a configuration of each type and the corresponding Boltzmann factor e^(−βE) (β is defined below), and then multiply this factor by the total number of such configurations, to get the total statistical weight of the bound versus unbound states, as shown in Slide 5.

The probability that P is bound is then given by the fraction


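$$ p_{\text{bound}} = \frac{W_{\text{bound}}}{W_{\text{bound}} + W_{\text{unbound}}} $$
Equation 2

Here W_bound and W_unbound denote the total statistical weights of the bound (type B) and unbound (type A) configurations.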


Writing this ratio down and carrying out several algebraic manipulations, as shown in Slide 6, we arrive at the important result that


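$$ p_{\text{bound}} = \frac{(c/c_0)\, e^{-\beta \Delta\varepsilon}}{1 + (c/c_0)\, e^{-\beta \Delta\varepsilon}} $$
Equation 3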


where β = 1/(kBT)
Equation 4

where T is the temperature and kB is the Boltzmann constant, and

Δε = εbound − εsol
Equation 5
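
The algebra summarised in Slide 6 can be sketched as follows. This is a reconstruction along the lines of the standard lattice model in Phillips et al., not a copy of the slide. The symbols W_unbound and W_bound (the total statistical weights of the unbound and bound configurations) are introduced here for convenience. The sketch assumes that lattice sites greatly outnumber ligand molecules (Ω ≫ L) and takes the lattice occupancy L/Ω to equal the dimensionless concentration c/c0.

```latex
\begin{align}
W_{\text{unbound}} &= \frac{\Omega!}{L!\,(\Omega-L)!}\;
    e^{-\beta L \varepsilon_{\text{sol}}}, \\
W_{\text{bound}} &= \frac{\Omega!}{(L-1)!\,(\Omega-L+1)!}\;
    e^{-\beta\left[(L-1)\varepsilon_{\text{sol}} + \varepsilon_{\text{bound}}\right]}, \\
\frac{W_{\text{bound}}}{W_{\text{unbound}}}
    &= \frac{L}{\Omega-L+1}\, e^{-\beta\Delta\varepsilon}
    \approx \frac{L}{\Omega}\, e^{-\beta\Delta\varepsilon}
    = \frac{c}{c_0}\, e^{-\beta\Delta\varepsilon}, \\
p_{\text{bound}} &= \frac{W_{\text{bound}}}{W_{\text{bound}} + W_{\text{unbound}}}
    \approx \frac{(c/c_0)\, e^{-\beta\Delta\varepsilon}}
                 {1 + (c/c_0)\, e^{-\beta\Delta\varepsilon}}.
\end{align}
```

The first line counts the ways of placing all L ligand molecules on the Ω lattice sites. The second counts the ways of placing the remaining L − 1 molecules once one of them occupies the binding site. The factor e^(−βLεsol) common to both weights cancels in the ratio, which is why only the energy difference Δε appears in the final result.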

Equation 3, plotted in Slide 7, tells us that the probability that the protein is in the bound form depends nonlinearly on the concentration of the ligand.

When there is little ligand available, the binding probability increases approximately linearly with ligand concentration. However, when the ligand concentration is high, the binding site becomes saturated and adding more ligand makes little difference to the probability of binding.
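
The two regimes described above can be checked numerically. The following Python sketch assumes that Equation 3 takes the standard single-site form p_bound = x/(1 + x) with x = (c/c0) exp(−βΔε), as in the lattice model of Phillips et al. The function names and the value βΔε = −10 (a binding energy of 10 kBT in favour of the bound state) are illustrative assumptions, not values taken from the slides. The second function counts lattice configurations exactly, so the closed form can be compared with the counting argument it approximates.

```python
import math


def p_bound_closed(c_over_c0: float, beta_delta_eps: float) -> float:
    """Binding probability from the assumed closed form x / (1 + x)."""
    x = c_over_c0 * math.exp(-beta_delta_eps)
    return x / (1.0 + x)


def p_bound_lattice(n_sites: int, n_ligands: int, beta_delta_eps: float) -> float:
    """Binding probability from exact counting of lattice configurations.

    Unbound: all n_ligands molecules sit on the n_sites lattice sites.
    Bound:   one molecule is on the protein, n_ligands - 1 are on the lattice.
    The common factor exp(-beta * n_ligands * eps_sol) cancels in the ratio.
    """
    if n_ligands < 1:
        return 0.0
    # Ratio of the numbers of configurations (exact integer arithmetic,
    # converted to a float only by the final division).
    count_ratio = math.comb(n_sites, n_ligands - 1) / math.comb(n_sites, n_ligands)
    x = count_ratio * math.exp(-beta_delta_eps)
    return x / (1.0 + x)


if __name__ == "__main__":
    BETA_DELTA_EPS = -10.0  # hypothetical value, in units of kB*T

    # Closed form versus exact counting for a dilute solution (L << Omega).
    print("closed :", p_bound_closed(10 / 10**6, BETA_DELTA_EPS))
    print("lattice:", p_bound_lattice(10**6, 10, BETA_DELTA_EPS))

    # Sweep the concentration over several orders of magnitude.
    for exponent in range(-8, 0):
        c = 10.0**exponent
        print(f"c/c0 = 1e{exponent:+d}  p_bound = {p_bound_closed(c, BETA_DELTA_EPS):.6f}")
```

At the low end of the sweep, a tenfold increase in concentration gives roughly a tenfold increase in p_bound (the linear regime). At the high end, p_bound approaches 1 and barely changes (saturation). In this form, the binding site is half occupied when c/c0 = exp(βΔε).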

We can also use Equation 3 to describe a repressor or activator protein molecule binding to DNA. The repressor or activator protein then plays the role of the ligand S in the above calculation, and the DNA binding site plays the role of the protein P, the macromolecule that carries the binding site.

1.4 Regulatory interactions combine to make networks

So far we have learned that the behaviour of proteins can change when small molecules or other protein molecules bind to them, and that gene expression can be turned on or off by proteins that bind to DNA. How do these molecular binding interactions determine, for example, whether a cell develops into a nerve or muscle cell? The answer is that regulatory networks combine the results of many molecular binding events (which are triggered by the state of the cell and its environment) to determine how the cell should behave.

1.4.1 The lac regulatory system

One of the best-studied cases of a regulatory system is illustrated in Slide 8. This is the regulatory network that controls the production of the "machinery" for metabolising lactose in E. coli. We have already discussed its component parts: the lac repressor protein and the CAP activator protein. E. coli lives in the human gut, where different sugars become available at various times, and it needs different molecular machinery to cope with different sugars. Making this machinery costs energy, so to avoid waste the bacterium should make only the machinery for the sugar that is available at a particular time. In particular, if glucose is available, it is more beneficial for E. coli to metabolise this rather than lactose. However, in the absence of glucose and the presence of lactose, the bacterium needs the machinery required to metabolise lactose instead.

Slide 8 shows the regulatory network that controls how E. coli metabolises different sugars. The genes that are transcribed from the promoter encode the machinery to deal with lactose. The inputs to the network are the concentrations of lactose and glucose. The output is the transcription of the genes for lactose metabolism.

As discussed earlier, when it is not bound to lactose, the lac repressor binds to the DNA and turns off transcription, so the machinery for metabolising lactose isn’t made. The CAP activator protein binds to the DNA and turns on transcription. However, in this simplified picture, the CAP activator protein has a binding site for glucose, and when glucose is bound to CAP, CAP cannot bind to the DNA. (In the real cell the effect of glucose is indirect: CAP binds to the DNA only when it carries the signalling molecule cyclic AMP, whose concentration falls when glucose is plentiful.) So the lac genes are transcribed only when the concentration of glucose is low (so that CAP binds to the DNA) and the concentration of lactose is high (so that the lac repressor does not bind to the DNA). In this way, E. coli determines whether or not it is worthwhile making the machinery to metabolise lactose.
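
The logic of this network can be written as a truth table: the lac genes are on only when lactose is present and glucose is absent. The Python sketch below encodes the simplified all-or-nothing picture described above. The function name lac_genes_transcribed is hypothetical, and treating each sugar as simply "present" or "absent" ignores the graded binding probabilities of Section 1.3.

```python
from itertools import product


def lac_genes_transcribed(lactose_present: bool, glucose_present: bool) -> bool:
    """Return True if the lac genes are transcribed in the simplified picture."""
    # The repressor sits on the DNA only when no lactose is bound to it.
    repressor_on_dna = not lactose_present
    # In the text's simplified picture, glucose keeps CAP off the DNA.
    cap_on_dna = not glucose_present
    # Transcription needs the activator on the DNA and the repressor off it.
    return cap_on_dna and not repressor_on_dna


if __name__ == "__main__":
    print("lactose  glucose  lac genes")
    for lactose, glucose in product([False, True], repeat=2):
        state = "ON" if lac_genes_transcribed(lactose, glucose) else "off"
        print(f"{str(lactose):7}  {str(glucose):7}  {state}")
```

Only one of the four input combinations switches the genes on, which is how the network avoids making lactose machinery when it would be wasted.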

Slide 9 shows a convenient way to illustrate the molecular interactions that make up the lac regulatory network. Here, positive molecular interactions (activation) are shown by arrows and negative molecular interactions (repression) are shown by “blocker” bars. This type of diagram is often used to represent regulatory networks and is convenient when the networks are complicated, involving a lot of interactions.

1.4.2 The phage lambda switch

Slide 10 shows another regulatory network, this time in a virus called bacteriophage lambda, which infects E. coli. Once a bacterial cell is infected, the virus has two options: it can either hijack the cell machinery to replicate itself and then burst the cell (known as lysis), releasing the newly made virus particles, or it can add its DNA to the DNA sequence of the bacterium and lie dormant inside the host cell (known as lysogeny) until conditions are more favourable for lysis. Which of these developmental pathways is adopted is determined by (a more complex version of) the regulatory network shown in Slide 10. This network contains two genes, cI and cro. When the cro gene is activated, cell lysis results; when the cI gene is activated, lysogeny follows. What prevents both pathways from being activated simultaneously?

As shown in Slide 10, the cI gene encodes a protein, CI, which acts as a repressor of the cro gene and an activator of its own gene. Thus, when cI is active, cro is repressed and remains inactive, while cI remains active. Likewise, the cro gene encodes a protein, Cro, which acts as a repressor of the cI gene. Thus, when the cro pathway to lysis has been adopted, the cI pathway to lysogeny is automatically shut down. In this way, the virus ensures that a binary all-or-nothing "decision" is made between lysis and lysogeny. This is an example of a bistable switch: a regulatory network with two distinct stable states. Bistable switches are important not just for bacteriophage lambda but also in developmental and cell-fate decisions in many other cells, including human ones. (Bistable switches are also used in electronic control networks, where they maintain a circuit in one of two stable states until some external trigger is applied – very similar to their biological analogue.)
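
Bistability of this kind can be illustrated with a toy model of two mutually repressing genes. The Python sketch below is not the phage lambda network of Slide 10: it omits CI's activation of its own gene, works in arbitrary dimensionless units, and uses a generic mutual-repression model in which each protein is made at a rate alpha / (1 + other^hill) and removed at a rate proportional to its own level. The function name simulate_toggle and the parameter values (alpha = 4, hill = 2) are assumptions chosen only so that two stable states exist.

```python
def simulate_toggle(ci0, cro0, alpha=4.0, hill=2, dt=0.01, steps=5000):
    """Integrate a toy mutual-repression model with the forward Euler method.

    d(ci)/dt  = alpha / (1 + cro**hill) - ci
    d(cro)/dt = alpha / (1 + ci**hill)  - cro
    Returns the (ci, cro) levels after steps * dt time units.
    """
    ci, cro = ci0, cro0
    for _ in range(steps):
        d_ci = alpha / (1.0 + cro**hill) - ci    # production repressed by Cro
        d_cro = alpha / (1.0 + ci**hill) - cro   # production repressed by CI
        ci += dt * d_ci
        cro += dt * d_cro
    return ci, cro


if __name__ == "__main__":
    # Two starting points that differ only in which protein has a small lead.
    for start in [(1.5, 1.0), (1.0, 1.5)]:
        ci_end, cro_end = simulate_toggle(*start)
        print(f"start CI={start[0]}, Cro={start[1]}  ->  "
              f"end CI={ci_end:.3f}, Cro={cro_end:.3f}")
```

With these assumed parameters, a small initial lead for one protein is amplified until that protein is high and the other is low. The system settles into one of two stable states rather than an intermediate mixture, which is the all-or-nothing behaviour described above.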