This blog has been created partly as a companion to Chemistry for the Biosciences, the textbook that I co-author with Tony Bradshaw, and to act as an archive of posts I write for other sites (particularly the OUPblog). Like the book itself, it explores how life on the scale of atoms and molecules has an impact on biology at the scale of cells, tissues, and organisms, and seeks to demystify a range of biological and chemical concepts.

The blog's name takes as its inspiration the cover of the first edition of Chemistry for the Biosciences, which depicts a gecko seemingly clinging to its surface. To find out what links geckos to chemistry, read this.

Sunday, 1 May 2011

How is the information in a gene used by a cell?

In my last two posts I’ve introduced the notion that DNA acts as a store of biological information; this information is stored in a series of chromosomes, each of which is divided into a number of genes. Each gene in turn contains one ‘snippet’ of biological information. But how are these genes actually used? How is the information stored in them extracted to do something useful (if ‘useful’ isn’t too flippant a term for something upon which the very continuation of life depends)?

Many (but not all) genes act as recipes for a family of biological molecules called proteins: they literally tell the cell what the ingredients for a particular protein are, and how they should be combined to create the protein itself. Proteins have a range of essential roles in the human body. Some act as building materials for different components of the body, such as the keratin we find in our hair and nails. Others act as molecular transporters: haemoglobin, which is found in our red blood cells, carries oxygen from our lungs to other parts of the body. Arguably the most important, however, are a family of proteins called enzymes. Enzymes cajole different chemicals in our body into reacting with one another. Without enzymes, our bodies would be unable to generate energy from the food we eat (and you’d not be reading this blog post).

So, somehow, the information stored in a DNA molecule is deciphered by the cell and used as the recipe for a protein. But how?

To answer this question, let’s take a journey inside the cell. We can imagine a cell to be like a factory, but one that has been divided into a series of physically separated compartments. Unlike a factory filled with air, a cell is filled with a jelly-like fluid called the cytoplasm, which surrounds the various compartments enclosed within it. In an earlier post I likened a genome to a biological library. And, inside the cell, this library is stored within a particular compartment called the nucleus. 

I mentioned earlier that genes often act as recipes for proteins. But here comes a bit of a quandary: chromosomes – and the genes they contain – are locked away inside the cell’s nucleus. By contrast, proteins are manufactured by the cell in the cytoplasm, outside of the nucleus. So, for the genetic information to be used, it has to get out of the nucleus and into the cytoplasm. How does this happen? Well, if we’re in a library with a book that contains information we really need, but we’re unable to take the book out of the library, we might make a photocopy of the page that holds the information we’re after. To get the information it needs out of the nucleus and into the cytoplasm, the cell does something remarkably similar. The chromosome containing the gene of interest has to stay inside the nucleus, so the cell makes a copy of the gene – and that copy is then transported to where it is to be used: out of the nucleus and into the cytoplasm.

The copy of the gene generated during this cellular photocopying is made not of DNA but of a close cousin called RNA. RNA is made of three of the same building blocks as DNA – A, C and G. Instead of the T found in DNA, however, RNA uses a different block represented by the letter U (for ‘uracil’). Despite this difference in building material, RNA stores biological information in the same way as DNA – by joining the building blocks together in a long chain (whereby the information is ‘coded’ in the ordering of the building blocks along the chain).
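For readers who like to see a process spelled out step by step, here is a minimal Python sketch of the letter swap just described. It is a simplification: it assumes we start from the DNA strand whose sequence the RNA copy matches (biologists call this the coding strand), whereas in the cell the RNA is actually assembled by pairing against the opposite strand. The function name `transcribe` is my own, chosen for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Make the RNA 'photocopy' of a gene, given its DNA coding strand.

    Simplification: the RNA matches the coding strand letter for letter,
    except that every T (thymine) becomes a U (uracil).
    """
    dna = coding_strand.upper()
    # Reject anything that is not one of DNA's four letters.
    if not set(dna) <= set("ACGT"):
        raise ValueError("DNA may only contain the letters A, C, G and T")
    return dna.replace("T", "U")


print(transcribe("GCCTCATGC"))  # GCCUCAUGC
```

Notice that the ordering of the letters, and so the information they carry, survives the copying untouched; only the T-for-U substitution changes.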

Let’s return to our cell, where a cellular photocopy has been made of one portion of a chromosome, the portion containing the gene we’re interested in, and this copy has been transported to the cytoplasm. We then face our next quandary: how is the information contained in our photocopy deciphered and used as a recipe for a protein?

To answer this question we need to know what a protein is made of – whereupon we find a similarity with DNA and RNA that is far from coincidental. Proteins are also made of a series of building blocks joined together to form a long chain. However, the building blocks themselves are quite different. Unlike the four ‘letters’ of DNA and RNA – A, T, C and G for DNA, and A, U, C and G for RNA – proteins are made from twenty different building blocks, called amino acids. The role of the information stored in our gene (which has now been transferred to our cellular photocopy, RNA) is to determine the identity of each amino acid along the protein chain.

So, how does this happen? Enter a special biological machine called a ‘ribosome’. The ribosome is a protein assembly line: it constructs a protein by attaching one amino acid to another to form a chain-like structure. But it doesn’t simply pick one of the twenty amino acids available to it at random and bolt it to the end of the chain it’s currently building. Instead, it ‘reads’ the RNA molecule – the ‘photocopy’ of our gene – and uses the information it contains to work out which amino acid should be added next. (I like to think of the ribosome as a modern-day Pac-Man: just as the Pac-Man of the 1980s computer game chomped its way along a string of blobs on the screen, the ribosome physically chomps its way along the RNA molecule, ‘reading’ the letters in sequence as it goes.)

But it’s not as straightforward as the ribosome reading the RNA molecule one letter at a time, with each letter referring neatly to one particular amino acid (after all, there are only four ‘letters’ in the RNA recipe, but twenty amino acids; even pairs of letters would give only 4 × 4 = 16 combinations, too few to cover all twenty). Instead, the ribosome scans the sequence of letters in the RNA molecule in groups of three; this three-letter barcode tells the ribosome which amino acid it needs to add next. So, a ribosome encountering the sequence GCCUCAUGC would read it three letters at a time and add the following three amino acids in sequence:
Alanine [GCC] – Serine [UCA] – Cysteine [UGC]
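The ribosome’s three-letters-at-a-time reading can be sketched in Python too. This is a minimal illustration, not a model of the real molecular machinery. It assumes the standard genetic code (a few organisms, and our own mitochondria, use slightly different versions), it starts reading at the very first letter as in the example above (real ribosomes begin at a particular ‘start’ group, AUG), and the names `GENETIC_CODE`, `NAMES` and `translate` are my own inventions. With four letters in each of three positions there are 4 × 4 × 4 = 64 possible groups, more than enough for twenty amino acids; three of the 64 act instead as ‘stop’ signals that end the chain.

```python
from itertools import product

BASES = "UCAG"

# One-letter amino acid codes for all 4 ** 3 = 64 three-letter groups, listed
# with the first, second and third letters each running through U, C, A, G.
# '*' marks a 'stop' group, which tells the ribosome the chain is finished.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # groups starting with U
    "LLLLPPPPHHQQRRRR"  # groups starting with C
    "IIIMTTTTNNKKSSRR"  # groups starting with A
    "VVVVAAAADDEEGGGG"  # groups starting with G
)

# The 'cipher': maps each three-letter RNA group to an amino acid code.
GENETIC_CODE = {
    "".join(letters): amino_acid
    for letters, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Full names for the twenty one-letter amino acid codes.
NAMES = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "Q": "Glutamine", "E": "Glutamic acid", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
}


def translate(rna: str) -> list[str]:
    """Read an RNA sequence three letters at a time, like the ribosome,
    and return the names of the amino acids in the order they are added."""
    rna = rna.upper()
    protein = []
    # Step along the RNA in non-overlapping groups of three letters;
    # any one or two letters left over at the end are ignored.
    for start in range(0, len(rna) - 2, 3):
        amino_acid = GENETIC_CODE[rna[start:start + 3]]
        if amino_acid == "*":  # stop group: the protein chain is complete
            break
        protein.append(NAMES[amino_acid])
    return protein


print(" - ".join(translate("GCCUCAUGC")))  # Alanine - Serine - Cysteine
```

Because several groups share the same amino acid (GCU, GCC, GCA and GCG all mean alanine, for instance), this lookup only works in one direction: you can go uniquely from RNA to protein, but not back again.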

This journey – from gene to protein – has brought us face-to-face with what some might consider the Holy Grail of biology. The ‘cipher’ used by the ribosome to decode a three-letter RNA sequence and translate it into the identity of an amino acid is called the ‘genetic code’ – a code shared, with only a few minor variations, by virtually every living organism on the planet, from the daffodils flowering in the garden as I write, to the cow whose milk is in the tea I’m currently drinking. It is a code that takes the simplicity of just four building blocks and opens it up into the complexity of life that each one of us represents.