Cells with identical genes and environmental factors can differ chemically (Slide 1). In this lecture, we see how this can come about, and we use ideas about probability to model the processes mathematically.


3.1 Chemical reactions are stochastic

In Lecture 2 (“The physics of biological regulation”), we discussed a chemical reaction in which molecules of types A and B combine to make a complex C. We represent this reaction as

$$\mathrm{A} + \mathrm{B} \xrightarrow{\;q\;} \mathrm{C}$$

Equation 1


where q is the rate constant for the reaction. Slide 2 shows how the number of C molecules increases in time, if we start with a 50:50 mixture of A and B. These results were obtained via computer simulations.


A series of simulations was carried out, starting with 1000 molecules each of A and B, then with 20, 10 and 5 molecules each, with the rate constant, q, set numerically equal to 1 (to keep things simple). In each case, the simulation was repeated five times. The results of each run are plotted in different colours. When the total number of molecules is large, the number of C molecules rises smoothly and the repeat runs all give the same results. In this case, we can model the system with deterministic ordinary differential equations, as discussed in Lecture 2.

However, if the total number of molecules is small, the system becomes very “noisy”: the number of C molecules does not rise smoothly and repeat simulation runs give different results.
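The behaviour described above can be reproduced with a short stochastic simulation. The sketch below uses the Gillespie algorithm, which is an assumption on our part: the notes do not say which method produced Slide 2. It also assumes that the probability per unit time of a reaction is q times the number of A molecules times the number of B molecules. The function names are illustrative only.

```python
import random


def simulate_a_plus_b(n_start, q=1.0, seed=None):
    """Simulate A + B -> C, starting from n_start molecules each of A and B.

    Returns a list of (time, number_of_C) pairs, one per reaction event.
    Assumes the reaction propensity is q * n_a * n_b.
    """
    rng = random.Random(seed)
    n_a = n_b = n_start
    n_c = 0
    t = 0.0
    trajectory = [(t, n_c)]
    while n_a > 0 and n_b > 0:
        propensity = q * n_a * n_b        # probability per unit time of a reaction
        t += rng.expovariate(propensity)  # random waiting time to the next reaction
        n_a -= 1
        n_b -= 1
        n_c += 1
        trajectory.append((t, n_c))
    return trajectory


def c_at_time(trajectory, t_query):
    """Number of C molecules present at time t_query."""
    n_c = 0
    for t, n in trajectory:
        if t > t_query:
            break
        n_c = n
    return n_c


# Five repeat runs for a large and a small system. Each run is sampled at the
# time when the deterministic model (dA/dt = -q A^2) predicts half conversion.
for n_start in (1000, 5):
    t_half = 1.0 / n_start  # valid for q = 1
    fractions = [
        c_at_time(simulate_a_plus_b(n_start), t_half) / n_start for _ in range(5)
    ]
    print(n_start, fractions)
```

With 1000 molecules each, the five fractions should all lie close to 0.5. With 5 molecules each, they typically scatter widely from run to run.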

Using standard methods from statistics, we can quantify what we mean by the number of molecules, N, being "small". It is convenient to define s as the ratio of the standard deviation of the number of molecules to its mean,

$$s = \frac{\sigma_N}{\langle N \rangle} \approx \frac{1}{\sqrt{N}}$$

Equation 2a


When N is large, s is small (we get essentially the same result each time), but for small N we have

$$s = \frac{1}{\sqrt{N}} \sim 1$$

Equation 2b


or equivalently

$$N \sim \sqrt{N}$$

Equation 2c

So (as we shall see later) it turns out that these “small molecule number” effects become important when the number of molecules becomes small enough that it is similar to its own square root.

Putting in the starting numbers of molecules for the simulations in Slide 2, when N = 2000, s ≈ 0.022, but when N = 5, s ≈ 0.45. Although Slide 2 shows computer simulation results, the same effect would happen in an experiment, if we could build an experimental system so small that it contained only a few molecules each of types A and B.
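As a quick check of this arithmetic, the following sketch evaluates s = 1/√N for a range of molecule numbers. The function name is illustrative, and the values of N other than 2000 and 5 are chosen here only for comparison.

```python
import math


def relative_noise(n):
    """Return s = 1 / sqrt(N), the ratio of standard deviation to mean."""
    return 1.0 / math.sqrt(n)


for n in (2000, 40, 20, 10, 5):
    print(f"N = {n:5d}   s = {relative_noise(n):.3f}")
# N =  2000   s = 0.022
# N =    40   s = 0.158
# N =    20   s = 0.224
# N =    10   s = 0.316
# N =     5   s = 0.447
```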

What is going on here? Why is our chemical reaction “noisy” when the number of molecules is small? The reason is that chemical reactions are stochastic, or random. That is, the outcome is governed by probabilities, and there are sufficiently few molecules that there is no single overwhelmingly favoured outcome.

In our box of A and B molecules, we do not know the exact positions and velocities of all of the molecules and so we do not know the exact time when a pair of A and B molecules will meet and react. The exact times when reactions happen and the exact sequence of reactions that happen can be different in repeat runs of the same experiment.

This may all be very interesting but shouldn’t it be irrelevant? Even in something as small as a bacterial cell, there are many billions of molecules, so why would these stochastic effects be important? In fact, stochastic effects can be very important in cells, because even though the total number of molecules in a cell is large, the number of molecules involved in a particular biochemical reaction network can be very small. For example, in slow-growing cells, there is only one copy of the DNA (so the number of molecules of a particular gene may actually be only one). The number of messenger RNA molecules in the cell corresponding to a particular gene can also be very small for weakly expressed genes, and some proteins are only present in small numbers. Biochemical reaction networks involving genes, mRNA or proteins that are present in small numbers per cell are likely to be dramatically affected by small-molecule number fluctuations. We call these stochastic fluctuations "biochemical noise".

3.2 Individual cells are not identical

The fact that biochemical noise really is significant for biological cells was illustrated in an important experiment by Michael Elowitz et al. in 2002. They engineered Escherichia coli bacteria carrying two different-coloured fluorescent reporter genes. These genes encode proteins that do not interfere with any cellular functions but when excited by UV light of the right wavelength they fluoresce (i.e. they emit light of a longer wavelength). This can be detected in an epifluorescence microscope. Elowitz et al. were therefore able to measure the relative amounts of the two fluorescent proteins in individual bacterial cells. The question that they wanted to answer was: if two cells are genetically identical and experience the same environmental conditions, will they produce the same amount of the two fluorescent proteins?

Slide 3 shows the results of one of their experiments. This is an overlay of micrographs of a group of E. coli cells growing on a semi-solid gel under the microscope. These cells all grew from a single “ancestor” at the start of the experiment so they are genetically identical. The colours show the relative amounts of the two fluorescent proteins present in each cell: green represents protein 1 and red represents protein 2. Cells that are coloured yellow contain approximately equal amounts of proteins 1 and 2. It is clear from this image that these “identical” cells are different colours, showing that they are very far from identical in their levels of production of the fluorescent proteins. Elowitz et al. also showed that cells that produce the reporter proteins at low levels (small number of molecules) have much more “noisy” levels of expression than cells that produce the proteins at high levels (a large number of molecules). This is what we would expect if differences between cells are caused by small molecule number noise since s = 1/√N is larger for small N.


Are the differences between cells shown in the image of Slide 3 really caused by small molecule number noise in the chemical reactions involved in protein production (transcription and translation)? Or are the different colours caused by differences between the cells? For example, we can see in Slide 3 that some cells are short because they have just been generated, while others are much longer and are about to divide. Perhaps this affects the level of protein expression? Cells could also contain different concentrations of RNA polymerase or ribosomes, which would cause them to produce more or less fluorescent protein.

3.2.1 Intrinsic and extrinsic noise

To explore the origins of the different amounts of the proteins, Elowitz et al. used two fluorescent proteins (in different colours) instead of just one. Within each cell, the genes encoding the two proteins should experience the same cell volume, RNA polymerase, ribosome concentration, etc. So if the differences in protein expression are caused by differences between cells, the levels of the two colours should be correlated – cells with a lot of protein 1 should also have a lot of protein 2. However, if chemical reaction stochasticity is responsible for the differences in protein expression, we would not expect the levels of protein 1 and protein 2 in individual cells to be correlated. This is illustrated in Slide 4.


In fact, by measuring the amount of correlation between the levels of proteins 1 and 2 in individual cells in their experiments, Elowitz et al. could measure how much of the cell-to-cell variation is caused by differences between cells (which they called extrinsic noise) and how much is caused by chemical reaction stochasticity (which they called intrinsic noise). In their experiments, both sources of noise played a significant role.
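To make the idea concrete, here is a minimal sketch of how such a decomposition could be computed from paired measurements of the two reporters in each cell. The notes do not give the formulas, so the estimator below is an assumption: it is the form commonly quoted for two-reporter experiments, with the squared intrinsic noise taken from the uncorrelated difference between the reporters and the squared extrinsic noise from their covariance. The data are synthetic, not experimental results.

```python
import random


def mean(values):
    return sum(values) / len(values)


def noise_decomposition(r1, r2):
    """Return (intrinsic noise squared, extrinsic noise squared).

    r1 and r2 are lists of reporter levels, one pair per cell.
    Assumed estimator (not given in the notes):
      intrinsic^2 = <(r1 - r2)^2> / (2 <r1><r2>)
      extrinsic^2 = (<r1 r2> - <r1><r2>) / (<r1><r2>)
    """
    norm = mean(r1) * mean(r2)
    intrinsic_sq = mean([(a - b) ** 2 for a, b in zip(r1, r2)]) / (2 * norm)
    extrinsic_sq = (mean([a * b for a, b in zip(r1, r2)]) - norm) / norm
    return intrinsic_sq, extrinsic_sq


# Synthetic population: each cell has a shared factor that scales both
# reporters (extrinsic), plus independent scatter per reporter (intrinsic).
rng = random.Random(0)
reporter1, reporter2 = [], []
for _ in range(5000):
    shared = rng.uniform(0.5, 1.5)
    reporter1.append(100 * shared * rng.gauss(1.0, 0.1))
    reporter2.append(100 * shared * rng.gauss(1.0, 0.1))

intrinsic_sq, extrinsic_sq = noise_decomposition(reporter1, reporter2)
print("intrinsic noise squared:", intrinsic_sq)
print("extrinsic noise squared:", extrinsic_sq)
```

If the two reporters were perfectly correlated, the intrinsic term would vanish. If they were completely uncorrelated, the extrinsic term would be close to zero.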

Why does it matter that genetically identical cells can have different levels of protein expression? One reason is that biochemical noise limits how precisely cells can control their own behaviour. If a cell needs to control precisely the concentration of a particular protein, either it must produce a large number of molecules (which is expensive) or it must use a biochemical control circuit (such as a negative feedback loop) to reduce the noise.

On a more positive note, biochemical noise may actually be useful for cells in some cases. For example, bacterial populations are often exposed to environmental stress (attack by antibiotics, changes in food availability, etc). If all of the cells in the population are identical in their protein composition, the stress may wipe them all out; but if there is large variability in protein composition among cells, it is possible that a few cells will happen to have the right protein levels to survive the stress. The population can then regrow from these cells once the stress is over.

3.3 Theory of noise

For stochastic chemical reactions, we cannot predict exactly which reaction will happen when, or which cell in a population will contain which exact numbers of molecules of proteins, mRNA, etc. However, we can make predictions about probability distributions. For example, we might predict the probability that a randomly selected cell in a population will have 100 molecules of a particular protein, even though we cannot predict which cell this will be. The quantity we are interested in is therefore p(N, t): the probability that our system contains N molecules of protein P at time t.

3.3.1 “Birth–death” model for gene expression

We can write down an equation for p(N, t) for the simple "one-step model" of gene expression that we discussed in Lecture 2, in which we include chemical reactions for protein production and degradation:

$$\text{Source} \xrightarrow{\;k\;} \mathrm{P}$$

Equation 3


$$\mathrm{P} \xrightarrow{\;\mu\;} \text{Sink}$$

Equation 4


We assume that these reactions are "Poisson processes". This means that if we observe the system for a very short time interval from time t to time t + dt, the probability that Reaction 3 happens will be

Prob(Reaction 3) = kdt,

Equation 5


while the probability that Reaction 4 happens in this same time interval will be

Prob(Reaction 4) = μNdt,

Equation 6

where N is the number of molecules of protein P, since the more P molecules there are, the more likely it is that this reaction will happen somewhere in the system during the time interval t to t + dt.

How does the probability, p(N), of having N molecules change during the time interval t to t + dt? To determine this, we need to think about how the system can enter and leave the state of “having N molecules”. To get N molecules, the system could have (a) previously had (N – 1) molecules and gained one more in a production reaction or (b) previously had (N + 1) and lost one in a degradation reaction. These are the only ways in which the system can enter the “state of having N molecules”. However, it can also leave this state if it already has N molecules and either (a) another one is produced (then it will have N + 1) or (b) one is degraded (then it will have N – 1).

This is illustrated in Slide 5. Here, the vertical bars represent the probability of having a particular number of molecules and the arrows represent how the number of molecules is changed by the protein production and degradation reactions. In our very small time interval, t to t + dt, the probability, p(N, t), increases due to the possibility of reactions happening from states (N – 1) or (N + 1) to N, and it decreases due to the possibility of reactions from state N to (N – 1) or (N + 1).


We can write all of this down using Equations 5 and 6 and summing all of the probabilities to generate an equation (Slide 6) called the chemical master equation (Equation 7).

$$\frac{\mathrm{d}p(N,t)}{\mathrm{d}t} = k\,p(N-1,t) + \mu\,(N+1)\,p(N+1,t) - k\,p(N,t) - \mu N\,p(N,t)$$

Equation 7



Let us suppose that we are only interested in the probability distribution, p(N), after a long time, once the system has reached its steady state. In that case, we have

$$\frac{\mathrm{d}p(N,t)}{\mathrm{d}t} = 0$$

Equation 8


Applying this condition to Equation 7, we can solve it to give

$$p(N) = \frac{(k/\mu)^{N}\, e^{-k/\mu}}{N!}$$

Equation 9


This is the famous Poisson distribution. We can see easily that it satisfies Equation 7 at steady state by noticing that for Equation 9,

$$p(N-1) = \frac{\mu N}{k}\, p(N)$$

Equation 10


and

$$p(N+1) = \frac{k}{\mu\,(N+1)}\, p(N)$$

Equation 11
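The same check can be done numerically. The sketch below evaluates the right-hand side of the master equation, written as gains from states (N – 1) and (N + 1) minus losses from state N, with the Poisson distribution of mean k/μ substituted for p(N). The function names and the parameter values k = 10 and μ = 2 are chosen here purely for illustration.

```python
import math


def poisson(n, mean):
    """Poisson probability of n, computed via logarithms to avoid overflow."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def master_equation_rhs(p, n, k, mu):
    """Rate of change of p(n): gains into state n minus losses out of it."""
    gain = mu * (n + 1) * p(n + 1)  # degradation from state n + 1
    if n > 0:
        gain += k * p(n - 1)        # production from state n - 1
    loss = (k + mu * n) * p(n)      # production or degradation from state n
    return gain - loss


k, mu = 10.0, 2.0  # illustrative values only


def steady_state(n):
    return poisson(n, k / mu)


residual = max(abs(master_equation_rhs(steady_state, n, k, mu)) for n in range(50))
print("largest |dp/dt| over N = 0..49:", residual)
```

The printed residual should be zero to within floating-point rounding, which confirms that the Poisson distribution is a steady-state solution.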


Slide 6 shows the probability distribution, p(N), plotted for different values of (k/µ). We can see that as (k/µ) increases, the average number of molecules increases. The mean and standard deviation σN of the distribution p(N) are given by

$$\langle N \rangle = \frac{k}{\mu}$$

Equation 12


$$\sigma_N = \sqrt{\frac{k}{\mu}}$$

Equation 13


(The derivation can be set as an exercise – see the tutorial questions linked to this lecture.) We can estimate the importance of stochastic effects by the ratio of the standard deviation to the mean,

$$\frac{\sigma_N}{\langle N \rangle} = \frac{1}{\sqrt{k/\mu}} = \frac{1}{\sqrt{\langle N \rangle}}$$

Equation 14
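These results can be tested against a direct stochastic simulation of the one-step model. The sketch below uses the Gillespie algorithm (our choice of method, not something specified in these notes) with production probability k dt and degradation probability μN dt. It then compares the time-averaged mean and standard deviation of N with k/μ and √(k/μ). The parameter values, function name and burn-in time are illustrative assumptions.

```python
import math
import random


def simulate_birth_death(k, mu, t_end, t_burn=0.0, seed=None):
    """Simulate production (rate k) and degradation (rate mu * N) of protein P.

    Returns the time-weighted mean and standard deviation of N, ignoring
    waiting periods that start before t_burn.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    total_time = sum_n = sum_n2 = 0.0
    while True:
        rate_total = k + mu * n
        dwell = rng.expovariate(rate_total)  # waiting time in the current state
        finished = t + dwell >= t_end
        if finished:
            dwell = t_end - t
        if t >= t_burn:
            total_time += dwell
            sum_n += n * dwell
            sum_n2 += n * n * dwell
        if finished:
            break
        t += dwell
        # Choose which reaction happened, in proportion to its rate.
        if rng.random() * rate_total < k:
            n += 1
        else:
            n -= 1
    mean_n = sum_n / total_time
    std_n = math.sqrt(sum_n2 / total_time - mean_n ** 2)
    return mean_n, std_n


k, mu = 20.0, 1.0  # illustrative values only
mean_n, std_n = simulate_birth_death(k, mu, t_end=2000.0, t_burn=50.0, seed=1)
print("simulated mean:", mean_n, " predicted:", k / mu)
print("simulated std: ", std_n, " predicted:", math.sqrt(k / mu))
print("std / mean:    ", std_n / mean_n, " predicted:", 1 / math.sqrt(k / mu))
```

The simulated values should approach the predictions as t_end is increased, because the averages are taken over a longer stretch of the steady state.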


This explains why at the beginning of this lecture we stated that small molecule number noise becomes important when the inverse square root of the number of molecules is close to one.

3.3.2 A more realistic model for gene expression

The model that we have just been considering may be too simple. In reality, the production of protein from a gene does not happen in a single step. We can make our model slightly more realistic by making a two-step model that includes both transcription and translation. The reaction scheme for this model would be

Source → M

Equation 15


M → Sink

Equation 16


M → M + P

Equation 17


P → Sink

Equation 18


Here, M represents mRNA and P represents protein. It is also possible to write down a chemical master equation for this model, and to solve it for the steady state probability distribution. In this case, there is a probability distribution for the number of messenger RNA molecules as well as for the number of protein molecules. For mRNA, we only need to consider Equations 15 and 16, which are identical to Equations 3 and 4 from our simpler model. (Equations 17 and 18 do not change the number of mRNA molecules.) So we expect the probability distribution for the number of mRNA molecules to be a Poisson distribution. However, Equations 17 and 18, which control the production and degradation of protein, are now different from our simple model. This means that the probability distribution of protein may be different from a Poisson distribution in this model.

Slide 7 shows the protein number probability distribution for this model. We set the parameters (translation rate/mRNA decay rate) so that five proteins are made on average per mRNA molecule (although some mRNA molecules will produce more and some fewer). We can compare this with the previous one-step model by fixing the transcription rate so that the average protein number is the same in both models. The comparison shows immediately that the distribution is broader in the two-step model: it predicts noisier protein expression than the one-step model. The reason for this is that the extra chemical reaction step amplifies the noise: the number of mRNA molecules is itself noisy, and then on top of this each mRNA molecule can produce a variable number of proteins.
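The broadening can be seen in a simple stochastic simulation of the two-step model. In the sketch below, the rate-constant names (k_m, mu_m, k_p, mu_p) and their values are assumptions made for illustration, since the notes do not assign symbols to these reactions. The values are chosen so that the ratio of translation rate to mRNA decay rate is 5, matching the five proteins per mRNA mentioned above. The simulation uses the Gillespie algorithm and starts from zero molecules, which slightly biases the averages for short runs.

```python
import math
import random


def simulate_two_step(k_m, mu_m, k_p, mu_p, t_end, seed=None):
    """Simulate transcription, mRNA decay, translation and protein decay.

    Returns the time-weighted mean and variance of the protein number.
    """
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    total_time = sum_p = sum_p2 = 0.0
    while True:
        rates = (k_m, mu_m * m, k_p * m, mu_p * p)
        rate_total = sum(rates)
        dwell = rng.expovariate(rate_total)  # waiting time in the current state
        finished = t + dwell >= t_end
        if finished:
            dwell = t_end - t
        total_time += dwell
        sum_p += p * dwell
        sum_p2 += p * p * dwell
        if finished:
            break
        t += dwell
        # Choose which reaction happened, in proportion to its rate.
        r = rng.random() * rate_total
        if r < rates[0]:
            m += 1  # Source -> M
        elif r < rates[0] + rates[1]:
            m -= 1  # M -> Sink
        elif r < rates[0] + rates[1] + rates[2]:
            p += 1  # M -> M + P
        else:
            p -= 1  # P -> Sink
    mean_p = sum_p / total_time
    var_p = sum_p2 / total_time - mean_p ** 2
    return mean_p, var_p


# Illustrative values only: k_p / mu_m = 5 proteins per mRNA on average.
mean_p, var_p = simulate_two_step(
    k_m=4.0, mu_m=10.0, k_p=50.0, mu_p=1.0, t_end=2000.0, seed=1
)
print("mean protein number:      ", mean_p)
print("two-step std:             ", math.sqrt(var_p))
print("one-step std at same mean:", math.sqrt(mean_p))
```

For a Poisson distribution the variance equals the mean, so a two-step standard deviation well above the one-step value at the same mean indicates the extra noise contributed by the mRNA step.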


3.4 Visualising noise in gene expression

How can we test whether these are good models for noisy gene expression in real cells? One way to do this is actually to carry out single molecule experiments, in other words to watch, under the microscope, the production of single protein molecules in individual cells. Since protein molecules are very small, this is a very challenging task. However, in 2006, Yu et al. managed to design an appropriate experiment (Slide 8). They made a strain of E. coli that produced a yellow fluorescent protein attached to a polypeptide (a chain of amino acid molecules), which could anchor this complex in the cell’s lipid membrane. When the fluorescent protein is anchored in the membrane, it diffuses around much less, making it easier to see single molecules under the microscope. In this system, using advanced fluorescence microscopy, it is possible to see individual fluorescent protein molecules as dots within the cell membrane. Yu et al. could then grow cells under the microscope and track the moments when individual dots appeared in the membrane. In this way, they could see the production of individual protein molecules in real time. To keep the protein numbers low, the researchers included a binding site for the Lac repressor protein (see Lecture 1: Introduction to regulatory networks). When this repressor protein is bound to the operator site in front of the gene that encodes the fluorescent protein, no protein will be produced.


Slide 9 shows some of Yu et al.’s results. The bacterial cells in the series of images grow from a single cell during the experiment. The yellow dots show individual protein molecules bound to the cell membrane. By tracking the appearance of these dots, Yu et al. were able to monitor the moments when protein molecules appeared in the membrane. This was done for different cell lineages, as shown in the plot, which indicates the number of protein molecules that were produced in a 3 min interval. The dotted vertical lines show the moments when the cell divided into two daughter cells.


What's really striking about Yu et al.’s results is that for most of the time, no protein molecules are being produced. Protein production occurs in short bursts, with long intervals where nothing happens. This is probably because most of the time the Lac repressor protein is bound to the DNA, thereby preventing protein expression. The bursts of expression take place during the rare moments when a stochastic fluctuation causes the repressor to fall off its DNA binding site. Yu et al.’s setup therefore allows us to see stochastic chemical reactions happening inside biological cells, in real time and at single-molecule resolution.

3.5 Noise in other cell functions

This lecture has focused on noise in gene expression, but the stochasticity of chemical reactions is also important in many other cell functions. Single-molecule experiments have revealed the effects of biochemical noise in the molecular machines that drive the flagellar motor that allows cells to swim and in the bacterial membrane receptors that sense environmental gradients. Other experiments have found important effects of biochemical noise in the development of fruit fly embryos and the mechanisms that control whether or not cells proliferate. It seems that noise is everywhere.