We’ve all heard of DNA, and probably know that it’s ‘something to do with our genes’. But what actually is DNA, and what does it do? At the level of chemistry, DNA – or deoxyribonucleic acid, to give it its full name – is a collection of carbon, hydrogen, oxygen, nitrogen and phosphorus atoms, joined together to form a large molecule. There is nothing special about the atoms found in a molecule of DNA: they are no different from the atoms found in the thousands of other molecules from which the human body is made. What makes DNA special is its biological role: DNA stores information – specifically, the information a living organism needs to direct its correct growth and function.
But how does DNA, which is simply a collection of a few different types of atom, actually store information? To answer this question, we need to consider the structure of DNA in a little more detail. DNA is like a long, thin chain – a chain that is constructed from a series of building blocks joined end-to-end. (In fact, a molecule of DNA features two chains, which line up side-by-side. But we only need to focus on one of these chains to understand how DNA stores its information.)
There are only four different building blocks; these are represented by the letters A, C, G and T. (Each building block has three component parts; one of these parts is one of four molecules: adenine, cytosine, guanine or thymine. It is these names that give rise to the letters used to represent the four complete building blocks themselves.) A single DNA molecule is composed of a mixture of these four building blocks, joined together one by one to form a long chain – and it is the order in which the four building blocks are joined together along the DNA chain that lies at the heart of DNA’s information-storing capability.
The order in which the four building blocks appear along a DNA molecule determines what we call its ‘sequence’; this sequence is written using the single-letter shorthand mentioned above. Imagine a very small DNA molecule composed of just eight building blocks, joined together in the order cytosine-adenine-cytosine-guanine-guanine-thymine-adenine-cytosine. The sequence of this DNA molecule would be CACGGTAC.
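For readers who like to see an idea in code, here is a minimal Python sketch of the shorthand described above. The dictionary LETTER_FOR and the function to_sequence are hypothetical names made up for this illustration; they simply swap each building block’s name for its letter, keeping the order of the chain intact.

```python
# Hypothetical lookup table: name of each building block's distinguishing part -> one-letter code
LETTER_FOR = {
    "adenine": "A",
    "cytosine": "C",
    "guanine": "G",
    "thymine": "T",
}

def to_sequence(blocks):
    """Turn an ordered list of building-block names into a one-letter sequence string."""
    return "".join(LETTER_FOR[name] for name in blocks)

# The eight-block molecule from the example, in chain order
chain = ["cytosine", "adenine", "cytosine", "guanine",
         "guanine", "thymine", "adenine", "cytosine"]
print(to_sequence(chain))  # CACGGTAC
```

Because the letters are joined in the same order as the names, reordering the chain would produce a different sequence string.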
The biological information stored in a DNA molecule depends upon the order of its building blocks – that is, its sequence. If a DNA sequence changes, so too does the information it contains. On reflection, this concept – that the order in which a selection of items appears in a linear sequence affects the information stored in that sequence – may not be as alien to us as it might first seem. Indeed, it is the concept on which written communication is based: each sentence in this blog post is composed of a selection of items – the letters of the alphabet – appearing in different sequences. These different sequences of letters spell out different words, which convey different information to the reader. And so it is with the sequence of DNA: as the sequence of the four building blocks of DNA varies, so too does the information being conveyed. (You may well be asking how the information stored in DNA is actually interpreted – how it actually determines how an organism develops and functions – but that’s a topic for a different blog post.)
You may be wondering how on earth just four different building blocks can come together with such variety to capture all the information needed to direct the growth of a living organism. Well, let’s pause for a moment to look back at our eight-letter DNA sequence: CACGGTAC. Notice that there isn’t an equal mix of the four different building blocks: cytosine appears three times, adenine and guanine twice, and thymine just once. There isn’t a rule that says a C must always be followed by an A, or that a T must always be followed by a G. Instead, any of the four building blocks can appear as the next link in a DNA chain (or, put another way, each of the four building blocks of DNA can appear at each position along a DNA chain).
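The counting in the paragraph above can be checked with a short Python sketch that uses the standard library’s collections.Counter. The variable names are illustrative only; the last two lines also list which blocks happen to follow a C in this particular sequence, showing that the same block can be followed by different neighbours.

```python
from collections import Counter

sequence = "CACGGTAC"

# How many times each building block appears in the sequence
counts = Counter(sequence)
for block in "ACGT":
    print(block, counts[block])  # prints A 2, C 3, G 2, T 1 (one per line)

# Pair each block with the one after it, and keep the blocks that follow a C
followers_of_c = [nxt for cur, nxt in zip(sequence, sequence[1:]) if cur == "C"]
print(followers_of_c)  # ['A', 'G']
```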
Imagine a DNA molecule just two blocks long. Even with this tiny molecule, being able to choose from four different building blocks at each of its two positions gives 4 × 4 = 16 possible molecules:
AA AC AT AG
CA CC CT CG
TA TC TT TG
GA GC GT GG
If we scaled this up to our eight-block molecule, with four choices at each of its eight positions (4 multiplied by itself eight times), we’d find that a whopping 65,536 different combinations would be possible. (I won’t write them all out.)
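A short Python sketch, relying only on the standard library’s itertools.product, can generate every possible sequence of a given length and confirm both counts. BLOCKS and all_sequences are hypothetical names chosen for this illustration, and the sketch lists the sequences in A, C, G, T order rather than the order used in the grid above.

```python
from itertools import product

BLOCKS = "ACGT"  # the four building blocks, by their one-letter codes

def all_sequences(length):
    """Return every possible sequence of the given length, as a list of strings."""
    # product(..., repeat=length) picks one block for each position, in every combination
    return ["".join(combo) for combo in product(BLOCKS, repeat=length)]

two_block = all_sequences(2)
print(len(two_block))   # 16
print(two_block[:4])    # ['AA', 'AC', 'AG', 'AT']

eight_block = all_sequences(8)
print(len(eight_block), 4 ** 8)  # 65536 65536
```

Generating the list is fine for eight blocks, but the count grows fourfold with every extra position, so for long sequences it is far more practical to compute 4 raised to the length than to build the list.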
When we consider that a single copy of the human DNA sequence, read along one chain, is around 3 billion building blocks long (that’s 3,000,000,000), and that most human cells carry two such copies, we can begin to imagine just how much variety is actually possible – and all from just four starting ingredients. This is a pattern we see throughout the natural world: relative simplicity giving rise to remarkable complexity, the kind of complexity embodied by living organisms like ourselves.
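Writing out 4 multiplied by itself 3 billion times is hopeless, but we can at least estimate how big that number is. The Python sketch below assumes, as with the eight-block example, that every position can independently hold any of the four blocks, and uses the rule that a positive whole number x has floor(log10(x)) + 1 decimal digits. The function name digits_in_count is hypothetical.

```python
import math

def digits_in_count(length, alphabet_size=4):
    """Number of decimal digits in alphabet_size ** length, estimated via logarithms.

    Relies on log10(4 ** n) == n * log10(4), so the huge number itself is never built.
    """
    return math.floor(length * math.log10(alphabet_size)) + 1

print(digits_in_count(8))              # 5, since 4 ** 8 = 65536
print(digits_in_count(3_000_000_000))  # 1806179974
```

So the number of possible sequences 3 billion blocks long would itself be roughly 1.8 billion digits long. Floating-point logarithms are accurate enough for these inputs, but for arbitrary lengths an exact integer calculation would be the safer choice.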
This post first appeared on the OUPblog