Happy birthday, DNA

Of pub lions and trilingual books

Corrado Nai 11 December 2013

Twisted: the road to proteins is convoluted

In 1953, biologists suffered from a peculiar form of dyslexia: they could recognize the alphabet, but could not work out how words were built nor how they convey meaning

Editor's note: We are pleased to present the first of a three-part essay in celebration of 60 years since the discovery of the double helix.

Big shots in molecular biology, it seems, find inspiration over a pint as much as in the lab. Or while “dancing naked” on some harder stuff.

It all started 60 years ago, on February 28th, 1953, when two rookie scientists showed up at The Eagle pub in Cambridge to announce that they had just uncovered the “secret of life” [1]. 24-year-old Dr James Watson and 36-year-old postgraduate student Francis Crick had a lot to celebrate: they had just taken a shortcut to the structure of DNA [2] and built a giant model that signalled the beginning of a new scientific era. Their revelation down at The Eagle was the spark that, a couple of months later, on April 25th, ignited the big bang of molecular biology, when three seminal papers on the structure of DNA were published side by side in the journal Nature [3-5].

Everyone has heard that DNA is the universal language in which life is encoded. But why is that so, and why does its structure matter so much? And how are today’s scientists exploiting DNA to go beyond biology?

Let’s assume that organisms are libraries, and living cells, books. If, as Galileo said, the Universe is a book written in mathematical language using symbols like triangles, circles and other geometrical figures, then the book of life is written in a much simpler, if repetitive and seemingly monotonous, language. But let’s start from the beginning.

Books differ not only in size and cover but, of course, in their text. In 1953, biologists already thought that the text – the DNA (deoxyribonucleic acid) – was the core and most crucial part of those books, but they also suffered from a peculiar form of dyslexia: they could recognize the alphabet, but could not work out how words were built or how they conveyed meaning.

At the time much was already known about DNA, even though its significance had long been overlooked. If its 60th birthday is celebrated this year, its gestation was much longer. Before Watson & Crick’s bold announcement, scientists were aware that DNA is the hereditary material carrying genetic information [6,7]; they also knew that it is a chain made of four chemical "letters", A, T, G and C (abbreviations for its crucial components, the nucleotide bases adenine, thymine, guanine and cytosine), held together by a backbone of sugar and phosphate groups [8,9]; and researchers had recognized that the letter A occurs with the same frequency as T, and G with the same frequency as C, hinting at a specific pairing mechanism [10].
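Chargaff's equalities are exactly what strict pairing predicts: if every A on one strand faces a T on the other (and every G faces a C), a double-stranded molecule must contain as many As as Ts and as many Gs as Cs, whatever its sequence. Below is a minimal Python sketch, assuming an idealised molecule with perfect pairing; the helper names and the toy sequence are made up for illustration.

```python
from collections import Counter

# Idealised pairing rules: each base on one strand faces its partner on the other.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand: str) -> str:
    """Return the partner strand, base by base (orientation is ignored here)."""
    return "".join(PAIR[base] for base in strand)


def duplex_counts(strand: str) -> Counter:
    """Count every base in a double-stranded molecule built from `strand`."""
    return Counter(strand) + Counter(partner_strand(strand))


# A hypothetical single strand with a deliberately lopsided composition.
counts = duplex_counts("AAAAGGCT")
print(counts["A"], counts["T"], counts["G"], counts["C"])  # 5 5 3 3
```

The single strand alone is far from balanced (four As, one T), yet the duplex obeys the equalities. Historically the reasoning ran the other way: Chargaff measured the ratios first, and base pairing was the explanation that came later.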

Mendel, Miescher, Levene, Chargaff and Avery, the scientists pioneering such work, were molecular biology’s bookbinders, preparing the ground for understanding books. But there was still a substantial gap: books were black boxes written in an unknown language in which you could pick out text and identify letters without grasping much more.

But then came Watson and Crick with their 3D model of DNA, showing how it is formed by two strands twisted into a double helix [1, 3]. Because the bases pair up (A with T, G with C), the two strands are complementary, and since they run in opposite directions they are antiparallel.
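In text terms, the partner strand is obtained by swapping each letter for its pairing partner (complementarity) and then reading the result backwards (antiparallel orientation). A minimal Python sketch, assuming both strands are written in the conventional 5'-to-3' reading direction; `reverse_complement` and the toy sequence are illustrative names, not taken from the essay's sources.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction.

    Complementary: each base is swapped for its pairing partner.
    Antiparallel: the partner runs the opposite way, so the result is reversed.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


top = "GATTACA"  # hypothetical strand, written 5'->3'
bottom = reverse_complement(top)
print(bottom)                      # TGTAATC
print(reverse_complement(bottom))  # GATTACA
```

Applying the function twice returns the original strand, which is another way of saying that each strand fully specifies the other.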

With their model they put together – literally! – the pieces of the big puzzle, giving birth to molecular biology, something that had not been possible from DNA chemistry alone. The structure of DNA was a prerequisite for understanding fundamental biological processes, for example how DNA replicates and is passed on during cell division, a mechanism cleverly hinted at in their paper [3] and discussed thoroughly in a follow-up paper [11]. Since the two strands are complementary, each one carries all the information needed to rebuild the other, so both act as templates when they are pulled apart during DNA replication, resulting in two identical daughter DNA molecules. Also immediately apparent was that DNA is, like all biological molecules, not static but “open” [3], in other words able to interact with other molecules via its grooves.
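The template-copying scheme can be sketched in the same textual terms: separate the two strands, build a new partner against each, and compare the daughters with the parent. This is a toy Python model, assuming error-free pairing and ignoring all of the enzymatic machinery; the function names and the sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_partner(template: str) -> str:
    """Build a new strand against `template` (both written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Pull the two strands apart and use each one as a template.

    Each daughter keeps one old strand and gains one newly built strand.
    """
    old_top, old_bottom = duplex
    return [
        (old_top, synthesize_partner(old_top)),
        (synthesize_partner(old_bottom), old_bottom),
    ]


top = "GATTACA"  # hypothetical parent strand, 5'->3'
parent = (top, synthesize_partner(top))
daughters = replicate(parent)
print(all(d == parent for d in daughters))  # True
```

Each daughter keeps one old strand and one new one, which is why this mode of copying is called semiconservative.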

Watson and Crick, along with Wilkins, with whom they shared the Nobel Prize in 1962, were skilled linguists who uncovered how words were built, bridging the chemical structure of DNA with its biological function. Another outstanding linguist, though, Rosalind Franklin [1], was more than a “possible collaborator” [2]; with her X-ray crystallography data she unwittingly dictated to Watson & Crick how to build their model [12,13], making you wonder whether the key linguists were in fact skilled amanuenses.

Yet whilst their contribution was crucial for studying the “meaning of life”, much of each book still looked as if it were filled with placeholder text.

In the following years, scientists around the world, in particular a group calling itself the “RNA Tie Club” (including Watson and Crick), set about deciphering the genetic code and investigating how cells use the information stored in DNA to produce proteins, work that ultimately led to the formulation of the “central dogma of molecular biology”. This “dogma” postulates that information flows from DNA to protein (a speculation already put forward some years earlier by cat-loving physicist Erwin Schrödinger in his book What is Life? [14]); proteins, in turn, make up the actual machinery in living cells.

Proteins have structural roles, carry out enzymatic reactions or in turn synthesize other major components of cells. One major problem in cracking the genetic code, though, was that whilst DNA is written in a four-letter alphabet (A, T, G and C), proteins, also strings of molecules, are built from an alphabet of 20 different amino acids.

So, how is the information stored on DNA expressed into proteins?

The “central dogma of molecular biology” can be illustrated with a bilingual book, like those with the original version on the left page and a translation on the right. On the left you have the DNA text, and on the right the protein text. The translation is not 1-to-1: instead, 3 letters of DNA (a “codon”) give 1 letter of protein (an amino acid). Now some math (it was actually a physicist in the “RNA Tie Club” who used mathematics to figure this out [15]): a three-letter codon drawn from four different letters (A, T, G, C) gives rise to 64 different combinations (4 x 4 x 4), whereas two-letter words would give only 16 (4 x 4), too few for 20 amino acids. Since 64 codons map onto just 20 amino acids, several codons must be synonyms, i.e. translated into the same amino acid. In other words, the genetic code is redundant, or degenerate.
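The codon arithmetic can be checked by brute force: enumerate every word of length 1, 2 and 3 over the DNA alphabet and compare the count with the 20 amino acids. A small Python sketch; `all_words` is an illustrative helper name, not anything from the essay's sources.

```python
from itertools import product

DNA_LETTERS = "ATGC"
N_AMINO_ACIDS = 20


def all_words(length: int) -> list[str]:
    """Every possible word of `length` letters over the four-letter DNA alphabet."""
    return ["".join(letters) for letters in product(DNA_LETTERS, repeat=length)]


for length in (1, 2, 3):
    n = len(all_words(length))
    verdict = "enough" if n >= N_AMINO_ACIDS else "too few"
    print(f"{length}-letter words: {n} ({verdict} for {N_AMINO_ACIDS} amino acids)")

# 1-letter words: 4 (too few for 20 amino acids)
# 2-letter words: 16 (too few for 20 amino acids)
# 3-letter words: 64 (enough for 20 amino acids)
```

The 44 spare triplets are where the redundancy comes from; in the standard genetic code, three of the 64 codons act as "stop" signals and the remaining 61 share out the 20 amino acids.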

But things aren’t as simple as that. There is an intermediate between DNA and proteins: RNA. Imagine a trilingual book where the central page is written in RNA text. DNA and RNA are very similar, and the former is transcribed almost 1-to-1 into the latter, but (i) instead of the base T there is U (uracil, which differs from T only by lacking a methyl group), (ii) RNA is single-stranded, and (iii) the backbone holding the nucleotide bases together is, as in DNA, made of sugar and phosphate, but the sugar (ribose rather than deoxyribose) carries an extra hydroxyl group (-OH). In RNA, the codons (the same as in DNA, but written in RNA text) are the actual templates read by the machinery devoted to protein synthesis, the ribosomes. The information on DNA is passed to RNA in a process called transcription, then from RNA to protein during translation.
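The two steps of the trilingual book can be mimicked in a few lines: transcription swaps T for U, and translation reads the RNA three letters at a time through a codon table. A minimal Python sketch, assuming we start from the DNA strand whose sequence the RNA copies (the so-called coding strand) and using only a handful of entries from the standard genetic code; the function names and the sequence are hypothetical.

```python
# Only a few of the 64 codons of the standard genetic code are listed here.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: same letters as the coding strand, with U replacing T."""
    return coding_strand.replace("T", "U")


def translate(rna: str) -> list[str]:
    """RNA -> protein: read codon by codon until a stop codon (or the end)."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]  # KeyError if codon not in this partial table
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGTTTGGCGGATGGTAA"  # hypothetical coding strand
rna = transcribe(dna)
print(rna)             # AUGUUUGGCGGAUGGUAA
print(translate(rna))  # ['Met', 'Phe', 'Gly', 'Gly', 'Trp']
```

GGC and GGA both give glycine (Gly), a small instance of the synonymous codons mentioned earlier; any codon missing from this abbreviated table would raise a `KeyError`.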

The “RNA Tie Club” and other scientists were the translators of molecular biology. Some got a ticket to Stockholm.

To be continued...


[2] (minutes 5:40, 6:50)