martyn amos: My Edinburgh talk

I had a wonderful time at the Edinburgh Book Festival over the weekend; a full venue and books to sign afterwards makes for a happy author! Here is a lightly edited version of what I had to say.

In 1959, a great personal hero of mine, the Nobel Prize-winning physicist Richard Feynman gave a visionary talk entitled “There's Plenty of Room at the Bottom”. In his speech, Feynman outlined the possibility of individual molecules, even individual atoms making up the component parts of computers in the future. Remember, this was back when computers filled entire rooms, and were tended by teams of lab-coated technicians, so the idea that you could compute with individual molecules was pretty outlandish. I was struck by a quotation in Oliver's book, attributed to the microbiologist A. J. Kluyver, who said, over fifty years ago, that “The most fundamental character of the living state is the occurrence in parts of the cell of a continuous and directed movement of electrons.” At their most basic, level computers work in exactly the same way; by funnelling electrons around silicon circuits, so I think this hints at the linkages between biology and computers that are only now coming to fruition.

Indeed, it wasn't until 1994 that someone demonstrated, for the first time, the feasibility of building computers from molecular-scale bits. Feynman's vision had waited, not only for the technology to catch up, but for a person with the required breadth of understanding and the will to try something slightly bizarre. That person was Len Adleman, who won the computer science equivalent of the Nobel Prize for his role in the development of the encryption scheme that protects our financial details whenever we buy something on the Internet. Len has always had an interest in biology; when one of his students showed him a program that could take over other programs and force them to replicate it, Len said “Hmmm.... that looks very much like how a virus behaves.” The student was Fred Cohen, author of the first ever computer virus, and Len's term stuck. (Update, 2/9/07: Cohen made the first reference to a "computer virus" in an academic article, but did not write the first virus).

One night in the early 90's, Len was lying in bed reading a classic molecular biology textbook. He came across the section describing a particular enzyme inside the cell that reads and copies DNA, and he was struck by its similarity with an abstract device in computer science known as the Turing Machine. By bringing together two seemingly disparate concepts, Adleman knew at once that, in his own words, “Geez, these things could compute.”

He found a lab at the University of Southern California, where he is a professor, and got down to building a molecular computer. He knew that DNA, the molecule of life that contains the instructions needed to build every organism on the planet, from a slug to.... John Redwood can be thought of as a series of characters from the set A, G, C and T, each character being the first letter of the name of a particular chemical. The title of the film Gattaca, which considers a dystopian future in which genetic discrimination defines a society, is simply a string of characters from the alphabet A, G, C and T.

As Oliver highlights in his own book, molecular biology has always been about the transformation of information, usually inside the living cell. This information is coded in the AGCT sequences of genes and in the proteins that these genes represent. Adleman immediately saw how this mechanism could be harnessed, not to represent proteins, but to store digital data, just like a computer encodes a file as a long sequence of zeroes and ones.

Adleman decided to use this fact to solve a small computational problem. Some of you might have heard of the Travelling Salesman Problem, and Adleman's was a variant of that; given a set of cities connected by flights, does there exist a sequence of flights that starts and ends at particular cities, and which visits every other city only once? This problem is easy to describe, but fiendishly difficult to solve for an even relatively small number of cities. This inherent difficulty is what made the problem interesting in Adleman's eyes, “interesting” being, to a mathematician, a synonym for “hard”.

Len decided to build his computer using the simplest possible algorithm; generate all possible answers (right or wrong), and then throw away the wrong ones. He would build a molecular haystack of answers, and then throw away huge swathes of hay encoding bad answers until he was left with the needle encoding the correct solution (of which there may be just a single copy). For Adleman, the key to his approach was that you can make DNA in the laboratory. A machine the size of a microwave oven will sit in a lab connected to four pots, each containing either A, G, C or T. Type in the sequence you require, and the machine gets to work, threading the letters together like molecular beads on a necklace, making trillions of copies of your desired sequence.

Adleman ordered DNA strands representing each city and each flight for his particular problem. Because DNA sticks together to form the double helix in a very well-defined way, he chose his sequences carefully, such that city and flight strands would glue together like Lego blocks to form long chains, each chain encoding a sequence of flights. Because of the sheer numbers involved, he was pretty sure that a chain encoding the single correct answer would self-assemble. The problem then was to get it out. In a way, Len had built a molecular memory, containing a huge file of lines of text. What he then had to do was sort the file, removing lines that were too long or too short, that started or ended with the wrong words, or which contained duplication. He used various standard lab techniques to achieve this, and, after about a week of molecular cutting and sorting, he was left with the correct solution to his problem.

The example that he solved could be figured out in a minute by a bright 10-year-old using a pen and paper. But that wasn't the point. Adleman had realised, for the first time, Feynman's vision of computing using molecules. After he published his paper, there was a flood of interest in the new field of DNA computing, a tide on which I was personally carried. The potential benefits were huge, since we can fit a vast amount of data into a very small volume of DNA. If you consider that every cell with a nucleus in your body contains a copy of your genome - 3 gigabytes of data, corresponding to 200 copies of the Manhattan phone book – you begin to understand just how advanced nature is in terms of information compression. Suddenly my 4 gig iPod nano doesn't look quite so impressive.

After a few years, though, people began to wonder if molecular computing would ever be used for anything important. They were looking for the “killer application”, the thing that people are willing to pay serious money for, like the spreadsheet, that persuaded small businesses to buy their first ever computer. The fundamental issue with Adleman's approach is tied to the difficulty of the problem; as the number of cities grows only slightly, the amount of DNA required to store all possible sequences of flights grows much more quickly; a small increase in the number of cities quickly leads to a requirement for bathtubs full of DNA, which is enough to induce hysterical laughter in even the sanest biologist. Indeed, it was estimated that if Len's algorithm were to be applied to a map with 200 cities in it, the DNA memory required to store all possible routes would weigh more than the Earth.

It would appear that DNA computing has reached the end of the line, if we are to insist on applying it to computational problems in a head-to-head battle against traditional silicon-based computers. Let's be straight, you're never going to be able to go into PC World and buy a DNA-based computer any time soon. When DNA computing first emerged as a discipline, I was dismayed to see a rash of papers making claims that within a few years we'd be cracking military codes using DNA computers and building artificial molecular memories vastly larger than the human brain. I was dismayed because I knew what had happened 30 years previously to the embryonic field of artificial intelligence. Again, hubristic claims were made for their discipline by the young Turks, ranging from personal robot butlers to automated international diplomacy. When the promised benefits failed to materialise, AI suffered a savage backlash in terms of credibility and funding, from which it is only just beginning to recover. I was very keen to avoid the same thing happening to molecular computing, but I, like many others, knew that we needed to look beyond simply using DNA as a tiny memory storage device.

The next key breakthrough was in realising that, far from being simply a very small storage medium that can be manipulated in a test tube, within its natural environment – the cell – DNA carries meaning. As the novelist Richard Powers observes in The Gold Bug Variations, “The punched tape running along the inner seam of the double helix is much more than a repository of enzyme stencils. It packs itself with regulators, suppressors, promoters, case-statements, if-thens.” Computational structures, that is. DNA encodes a program that controls its own execution. DNA, and the cellular machinery that operates on it, pre-dates electronic computers by billions of years. By re-programming the code of life, we may finally be able to take full advantage of the wonderful opportunities offered by biological wetware.

As Oliver observes in his book, “The world is not just a set of places. It is also a set of processes.” This nicely illustrates the shift in thinking that has occurred in the last few years since the human genome has been sequenced. The notion of a human “blueprint” is outdated a useless. A blueprint encodes specific locational information for the various components of whatever it's intended to represent, whether it be a car or a skyscraper. Nowhere in the human genome will you find a section that reads “place two ears, on on either side of head” or “note to self: must fix design for appendix.” Instead, genes talk to one another, turning each other (and often themselves) on and off in a complex molecular dance. The genome is an electrician's worst nightmare, a tangle of wiring and switches, where turning down a dimmer switch in Hull can switch off the Manhattan underground system.

The human genome project (and the many other projects that are sequencing other organisms, from the orang-utan to the onion) is effectively generating a biological “parts catalogue”; a list of well-understood genes, whose behaviour we can predict in particular circumstances. This is the reductionist way of doing science; break things down, in a top-down fashion, into smaller and smaller parts, through a series of levels of description (for example, organism, molecule, atom). The epitome of this approach is the very well-funded physicists smashing together bits of nature in their accelerators in an attempt to discover what some call the God Particle.

Of course, smashing together two cats and seeing what flies off is only going to give you a limited understanding of how cats work, and it'll probably annoy the cats, so the reductionist approach is of limited use to biologists. Systems biology has emerged in recent years to address this, by integrating information from many different levels of complexity. By studying how different biological components interact, rather then just looking at their structure, as before, systems biologists try to understand biological systems from the bottom up.

An even more recent extension of systems biology is synthetic biology. When a chemist discovers a new compound, the first thing they do is break it down into bits, and the next thing they do it try to synthesise it. As Richard Feynman said just before his death, “What I cannot build I cannot understand.” Synthetic biologists play, not with chemicals, but with the genetic components being placed daily in the catalogue. It's where top down meets bottom up – break things down into their genetic parts, and then put them back together in new and interesting ways. By stripping down and rebuilding microbial machines, synthetic biologists hope to better understand their basic biology, as well as getting them to do weird and wonderful things. It's the ultimate scrapheap challenge.

If we told someone in the field of nanotechnology that we had a man-made device that doesn't need batteries, can move around, talk to its friends and even make copies of itself – and all this in a case the size of a bacterium – they would sell their grandmother for a glimpse. Of course, we already have such devices available to us, but we know them better as microbes. Biology is the nanotechnology that works. By modelling and building new genetic circuits, synthetic biologists are ushering in a new era of biological engineering, where microbial devices are built to solve very pressing problems.

As Oliver notes towards the end of his book, the planet is facing a very real energy crisis. One team is therefore trying to build a microbe to produce hydrogen. Another massive problem facing the developing world is that of arsenic contamination in drinking water. A team here in Edinburgh, made up mainly of undergraduates, has built a bacterial sensor that can quickly and easily monitor arsenic concentrations from a well sample, to within safe tolerances. Jay Keasling, a colleague in California has recently been awarded 43 million dollars by the Bill and Melinda Gates Foundation to persuade E. coli to make substances that are alien to them, but which provide the raw ingredients for antimalarial drugs. The drug is found naturally in the wormwood plant, but it's not cheap – providing it to 70 per cent of the malaria victims in Africa would cost $1 billion, and they can be repeatedly infected. It's been estimated that drug companies would need to cover the entire state of Rhode Island in order to grow enough wormwood, so Keasling wants to produce it in vats, eventually at half the cost.

There are, of course, safety issues with synthetic biology, as well as legal and ethical considerations. I worry that people have this idea that the bugs we use are snarling microbes that have to be physically restrained for fear of them erupting from a Petri dish into the face of an unfortunate researcher, like something from the Alien movies. In reality, the bacteria used in synthetic biology experiments are docile creatures, pathetic even, the crack addicts of the microbial world. They have to be nurtured and cossetted, fed a very specific nutrient brew. Like some academics, they wouldn't last two minutes in the real world. Of course, nature has a habit of weeding out the weak and encouraging the fit, so we still have to be very careful and build in as many safeguards as are practical. The potential for using synthetic biology for weaponry is, to my mind, overstated. As one of the leading researchers said to me, “If I were a terrorist looking to commit a bio-based atrocity, there are much cheaper and easier ways to do it than engineering a specific microbe – anthrax, say.” Synthetic biology will not, in the foreseeable future, return many “bangs per buck”.

Many of the legal concerns centre on the patenting of gene sequences. This was going on well before synthetic biology, but it recently hit the headlines when Craig Venter, head of the private corporation that tied with the Human Genome Project, announced that they intended to patent a synthetic organism.

We must remember that Venter is, first and foremost, a businessman, and it is very much in his interests to keep his company in the public eye. The scientific rationale for some of these patents is not immediately clear. But we should also remember that, for every Craig Venter, there are probably ten or more Jay Keaslings, placing their research in the public domain and working in an open and transparent fashion for the greater good.

On that positive note, I'd like to thank you for listening, and I'll stop there.

martyn amos

Friday, August 31, 2007

My Edinburgh talk

No comments: