martyn amos: August 2007

Friday, August 31, 2007

My Edinburgh talk

I had a wonderful time at the Edinburgh Book Festival over the weekend; a full venue and books to sign afterwards makes for a happy author! Here is a lightly edited version of what I had to say.

In 1959, a great personal hero of mine, the Nobel Prize-winning physicist Richard Feynman, gave a visionary talk entitled “There's Plenty of Room at the Bottom”. In his speech, Feynman outlined the possibility of individual molecules, even individual atoms, making up the component parts of computers in the future. Remember, this was back when computers filled entire rooms, and were tended by teams of lab-coated technicians, so the idea that you could compute with individual molecules was pretty outlandish. I was struck by a quotation in Oliver's book, attributed to the microbiologist A. J. Kluyver, who said, over fifty years ago, that “The most fundamental character of the living state is the occurrence in parts of the cell of a continuous and directed movement of electrons.” At their most basic level, computers work in exactly the same way, by funnelling electrons around silicon circuits, so I think this hints at the linkages between biology and computers that are only now coming to fruition.

Indeed, it wasn't until 1994 that someone demonstrated, for the first time, the feasibility of building computers from molecular-scale bits. Feynman's vision had waited, not only for the technology to catch up, but for a person with the required breadth of understanding and the will to try something slightly bizarre. That person was Len Adleman, who won the computer science equivalent of the Nobel Prize for his role in the development of the encryption scheme that protects our financial details whenever we buy something on the Internet. Len has always had an interest in biology; when one of his students showed him a program that could take over other programs and force them to replicate it, Len said “Hmmm.... that looks very much like how a virus behaves.” The student was Fred Cohen, author of the first ever computer virus, and Len's term stuck. (Update, 2/9/07: Cohen made the first reference to a "computer virus" in an academic article, but did not write the first virus).

One night in the early 90's, Len was lying in bed reading a classic molecular biology textbook. He came across the section describing a particular enzyme inside the cell that reads and copies DNA, and he was struck by its similarity with an abstract device in computer science known as the Turing Machine. By bringing together two seemingly disparate concepts, Adleman knew at once that, in his own words, “Geez, these things could compute.”

He found a lab at the University of Southern California, where he is a professor, and got down to building a molecular computer. He knew that DNA, the molecule of life that contains the instructions needed to build every organism on the planet, from a slug to.... John Redwood, can be thought of as a series of characters from the set A, G, C and T, each character being the first letter of the name of a particular chemical. The title of the film Gattaca, which considers a dystopian future in which genetic discrimination defines a society, is simply a string of characters from the alphabet A, G, C and T.

As Oliver highlights in his own book, molecular biology has always been about the transformation of information, usually inside the living cell. This information is coded in the AGCT sequences of genes and in the proteins that these genes represent. Adleman immediately saw how this mechanism could be harnessed, not to represent proteins, but to store digital data, just like a computer encodes a file as a long sequence of zeroes and ones.

Adleman decided to use this fact to solve a small computational problem. Some of you might have heard of the Travelling Salesman Problem, and Adleman's was a variant of that; given a set of cities connected by flights, does there exist a sequence of flights that starts and ends at particular cities, and which visits every other city only once? This problem is easy to describe, but fiendishly difficult to solve for an even relatively small number of cities. This inherent difficulty is what made the problem interesting in Adleman's eyes, “interesting” being, to a mathematician, a synonym for “hard”.

Len decided to build his computer using the simplest possible algorithm; generate all possible answers (right or wrong), and then throw away the wrong ones. He would build a molecular haystack of answers, and then throw away huge swathes of hay encoding bad answers until he was left with the needle encoding the correct solution (of which there may be just a single copy). For Adleman, the key to his approach was that you can make DNA in the laboratory. A machine the size of a microwave oven will sit in a lab connected to four pots, each containing either A, G, C or T. Type in the sequence you require, and the machine gets to work, threading the letters together like molecular beads on a necklace, making trillions of copies of your desired sequence.

Adleman ordered DNA strands representing each city and each flight for his particular problem. Because DNA sticks together to form the double helix in a very well-defined way, he chose his sequences carefully, such that city and flight strands would glue together like Lego blocks to form long chains, each chain encoding a sequence of flights. Because of the sheer numbers involved, he was pretty sure that a chain encoding the single correct answer would self-assemble. The problem then was to get it out. In a way, Len had built a molecular memory, containing a huge file of lines of text. What he then had to do was sort the file, removing lines that were too long or too short, that started or ended with the wrong words, or which contained duplication. He used various standard lab techniques to achieve this, and, after about a week of molecular cutting and sorting, he was left with the correct solution to his problem.
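For readers who think more easily in code, the generate-then-filter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-city map and the function names `candidate_chains` and `filter_chains` are invented for the example, and the exhaustive loop stands in for what the self-assembling DNA does in parallel in the test tube. It is not a description of the actual lab protocol.

```python
from itertools import product

# Hypothetical map (not Adleman's): each pair is a one-way flight.
CITIES = ["A", "B", "C", "D"]
FLIGHTS = {
    ("A", "B"), ("B", "C"), ("C", "D"),
    ("A", "C"), ("B", "D"), ("D", "A"),
}


def candidate_chains(cities, flights, max_len):
    """Build the 'haystack': every chain of cities linked by valid flights.

    In the test tube this happens by self-assembly; here we simply
    enumerate chains of 2..max_len cities and keep the ones in which
    each consecutive pair of cities is joined by a flight.
    """
    chains = []
    for length in range(2, max_len + 1):
        for chain in product(cities, repeat=length):
            if all(leg in flights for leg in zip(chain, chain[1:])):
                chains.append(chain)
    return chains


def filter_chains(chains, cities, start, end):
    """Throw away the hay: apply the three filters described in the text."""
    keep = [c for c in chains if c[0] == start and c[-1] == end]  # right ends
    keep = [c for c in keep if len(c) == len(cities)]             # right length
    keep = [c for c in keep if len(set(c)) == len(c)]             # no repeats
    return keep


haystack = candidate_chains(CITIES, FLIGHTS, max_len=len(CITIES) + 1)
answers = filter_chains(haystack, CITIES, start="A", end="D")
print(len(haystack), "candidate chains;", "surviving:", answers)
```

The haystack deliberately contains chains that are too short, too long, or that revisit a city, so that each filter has something to remove.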

The example that he solved could be figured out in a minute by a bright 10-year-old using a pen and paper. But that wasn't the point. Adleman had realised, for the first time, Feynman's vision of computing using molecules. After he published his paper, there was a flood of interest in the new field of DNA computing, a tide on which I was personally carried. The potential benefits were huge, since we can fit a vast amount of data into a very small volume of DNA. If you consider that every cell with a nucleus in your body contains a copy of your genome – 3 gigabytes of data, corresponding to 200 copies of the Manhattan phone book – you begin to understand just how advanced nature is in terms of information compression. Suddenly my 4 gig iPod nano doesn't look quite so impressive.

After a few years, though, people began to wonder if molecular computing would ever be used for anything important. They were looking for the “killer application”, the thing that people are willing to pay serious money for, like the spreadsheet, that persuaded small businesses to buy their first ever computer. The fundamental issue with Adleman's approach is tied to the difficulty of the problem; as the number of cities grows only slightly, the amount of DNA required to store all possible sequences of flights grows much more quickly; a small increase in the number of cities quickly leads to a requirement for bathtubs full of DNA, which is enough to induce hysterical laughter in even the sanest biologist. Indeed, it was estimated that if Len's algorithm were to be applied to a map with 200 cities in it, the DNA memory required to store all possible routes would weigh more than the Earth.
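To get a feel for how quickly the numbers run away, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every city is linked to every other and that the start and end cities are fixed, so the candidate routes are simply the orderings of the cities in between. The function name `orderings` is my own, and the count says nothing about the mass of DNA involved; it only shows the shape of the growth.

```python
from math import factorial


def orderings(n_cities):
    """Orderings of the intermediate cities when start and end are fixed."""
    return factorial(n_cities - 2)


# Each extra handful of cities multiplies the number of candidate routes
# enormously: this is the growth that defeats the brute-force approach.
for n in (5, 10, 20, 50):
    print(f"{n:>3} cities: {orderings(n):.3e} candidate orderings")
```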

It would appear that DNA computing has reached the end of the line, if we are to insist on applying it to computational problems in a head-to-head battle against traditional silicon-based computers. Let's be straight, you're never going to be able to go into PC World and buy a DNA-based computer any time soon. When DNA computing first emerged as a discipline, I was dismayed to see a rash of papers making claims that within a few years we'd be cracking military codes using DNA computers and building artificial molecular memories vastly larger than the human brain. I was dismayed because I knew what had happened 30 years previously to the embryonic field of artificial intelligence. Again, hubristic claims were made for their discipline by the young Turks, ranging from personal robot butlers to automated international diplomacy. When the promised benefits failed to materialise, AI suffered a savage backlash in terms of credibility and funding, from which it is only just beginning to recover. I was very keen to avoid the same thing happening to molecular computing, but I, like many others, knew that we needed to look beyond simply using DNA as a tiny memory storage device.

The next key breakthrough was in realising that, far from being simply a very small storage medium that can be manipulated in a test tube, within its natural environment – the cell – DNA carries meaning. As the novelist Richard Powers observes in The Gold Bug Variations, “The punched tape running along the inner seam of the double helix is much more than a repository of enzyme stencils. It packs itself with regulators, suppressors, promoters, case-statements, if-thens.” Computational structures, that is. DNA encodes a program that controls its own execution. DNA, and the cellular machinery that operates on it, pre-dates electronic computers by billions of years. By re-programming the code of life, we may finally be able to take full advantage of the wonderful opportunities offered by biological wetware.

As Oliver observes in his book, “The world is not just a set of places. It is also a set of processes.” This nicely illustrates the shift in thinking that has occurred in the last few years since the human genome has been sequenced. The notion of a human “blueprint” is outdated and useless. A blueprint encodes specific locational information for the various components of whatever it's intended to represent, whether it be a car or a skyscraper. Nowhere in the human genome will you find a section that reads “place two ears, one on either side of head” or “note to self: must fix design for appendix.” Instead, genes talk to one another, turning each other (and often themselves) on and off in a complex molecular dance. The genome is an electrician's worst nightmare, a tangle of wiring and switches, where turning down a dimmer switch in Hull can switch off the Manhattan underground system.

The human genome project (and the many other projects that are sequencing other organisms, from the orang-utan to the onion) is effectively generating a biological “parts catalogue”; a list of well-understood genes, whose behaviour we can predict in particular circumstances. This is the reductionist way of doing science; break things down, in a top-down fashion, into smaller and smaller parts, through a series of levels of description (for example, organism, molecule, atom). The epitome of this approach is the very well-funded physicists smashing together bits of nature in their accelerators in an attempt to discover what some call the God Particle.

Of course, smashing together two cats and seeing what flies off is only going to give you a limited understanding of how cats work, and it'll probably annoy the cats, so the reductionist approach is of limited use to biologists. Systems biology has emerged in recent years to address this, by integrating information from many different levels of complexity. By studying how different biological components interact, rather than just looking at their structure, as before, systems biologists try to understand biological systems from the bottom up.

An even more recent extension of systems biology is synthetic biology. When a chemist discovers a new compound, the first thing they do is break it down into bits, and the next thing they do is try to synthesise it. As Richard Feynman said just before his death, “What I cannot build I cannot understand.” Synthetic biologists play, not with chemicals, but with the genetic components being placed daily in the catalogue. It's where top down meets bottom up – break things down into their genetic parts, and then put them back together in new and interesting ways. By stripping down and rebuilding microbial machines, synthetic biologists hope to better understand their basic biology, as well as getting them to do weird and wonderful things. It's the ultimate scrapheap challenge.

If we told someone in the field of nanotechnology that we had a man-made device that doesn't need batteries, can move around, talk to its friends and even make copies of itself – and all this in a case the size of a bacterium – they would sell their grandmother for a glimpse. Of course, we already have such devices available to us, but we know them better as microbes. Biology is the nanotechnology that works. By modelling and building new genetic circuits, synthetic biologists are ushering in a new era of biological engineering, where microbial devices are built to solve very pressing problems.

As Oliver notes towards the end of his book, the planet is facing a very real energy crisis. One team is therefore trying to build a microbe to produce hydrogen. Another massive problem facing the developing world is that of arsenic contamination in drinking water. A team here in Edinburgh, made up mainly of undergraduates, has built a bacterial sensor that can quickly and easily monitor arsenic concentrations from a well sample, to within safe tolerances. Jay Keasling, a colleague in California, has recently been awarded 43 million dollars by the Bill and Melinda Gates Foundation to persuade E. coli to make substances that are alien to them, but which provide the raw ingredients for antimalarial drugs. The drug is found naturally in the wormwood plant, but it's not cheap – providing it to 70 per cent of the malaria victims in Africa would cost $1 billion, and they can be repeatedly infected. It's been estimated that drug companies would need to cover the entire state of Rhode Island in order to grow enough wormwood, so Keasling wants to produce it in vats, eventually at half the cost.

There are, of course, safety issues with synthetic biology, as well as legal and ethical considerations. I worry that people have this idea that the bugs we use are snarling microbes that have to be physically restrained for fear of them erupting from a Petri dish into the face of an unfortunate researcher, like something from the Alien movies. In reality, the bacteria used in synthetic biology experiments are docile creatures, pathetic even, the crack addicts of the microbial world. They have to be nurtured and cossetted, fed a very specific nutrient brew. Like some academics, they wouldn't last two minutes in the real world. Of course, nature has a habit of weeding out the weak and encouraging the fit, so we still have to be very careful and build in as many safeguards as are practical. The potential for using synthetic biology for weaponry is, to my mind, overstated. As one of the leading researchers said to me, “If I were a terrorist looking to commit a bio-based atrocity, there are much cheaper and easier ways to do it than engineering a specific microbe – anthrax, say.” Synthetic biology will not, in the foreseeable future, return many “bangs per buck”.

Many of the legal concerns centre on the patenting of gene sequences. This was going on well before synthetic biology, but it recently hit the headlines when Craig Venter, head of the private corporation that tied with the Human Genome Project, announced that they intended to patent a synthetic organism.

We must remember that Venter is, first and foremost, a businessman, and it is very much in his interests to keep his company in the public eye. The scientific rationale for some of these patents is not immediately clear. But we should also remember that, for every Craig Venter, there are probably ten or more Jay Keaslings, placing their research in the public domain and working in an open and transparent fashion for the greater good.

On that positive note, I'd like to thank you for listening, and I'll stop there.

Friday, August 24, 2007

My contribution to the synthetic biology debate

You may recall that the Royal Society is soliciting opinions on various aspects of the field of synthetic biology. What follows is a lightly edited version of my own submission, which I sent off today.

In what follows, I highlight some concerns and dangers, speaking as someone who has a definite interest in the field flourishing (and would therefore wish to see these concerns addressed).

1. Terminology

The first concern is over the term “synthetic biology” itself. The two main issues are “what does it mean?” and “what does it cover?” As pointed out at the BBSRC workshop, clinicians have used the term for a while to refer to prosthetic devices. In attempting to offer a fixed definition of the term, the community runs the risk of becoming overly exclusive at a premature stage. However, there is also a risk that “synthetic biology” will become a “catch-all” term that is too loosely applied. The emphasis on the term “biology” may also serve to alienate mathematicians, physicists, computer scientists and others, who may (wrongly) feel that they have no expertise to offer a “biological” discipline. As a counter-example, witness the success of the field of bioinformatics, which would appear to fairly represent the disciplinary expertise in the field (in terms of the general composition of the term, rather than the relative lengths of its components). As a very crude experiment, I searched in Google for both “computational biology” and “bioinformatics”; the first term returned around 1,530,000 hits, the second around 14,000,000.

This leads on to the issue of “language barriers”. This is always an issue in any new field that involves the collision of two or more (often very dissimilar) disciplines. Being seen to publicly ask “stupid questions” is a daunting prospect to most young scientists, and yet many of the major breakthroughs have occurred through just that. This opens up the wider debate on inter-disciplinarity in 21st century science, and how we might best prepare its practitioners. Do we give students a broad, shallow curriculum to allow them to make connections, without necessarily having the background to “drill deeper” if required, or do we stick to the “old model” of “first degree” and subsequent training? My own intuition is that it is far better to intensively train in a single field at the outset, and then offer the opportunity to “cherry pick” topics from a different discipline at a later stage. This educational debate is, however, not one that should be the sole preserve of synthetic biology!

2. Expectation Management

Even when biologists and (say) computer scientists can agree a suitable shared terminology, there is still the risk of a mismatch occurring in terms of expectations of what might be achieved. For example, the notion of “scalability” might mean very different things to a computer scientist and a microbiologist. To the former, it means being able to increase by several orders of magnitude the number of data items processed by an algorithm, or double the (already vast) number of transistors we may place on the surface of a computer chip. To a biologist, the idea of scalability might currently be very different:

“What's needed to make synthetic biology successful, Rabaey said, are the same three elements that made microelectronics successful. These are a scalable, reliable manufacturing process; a scalable design methodology; and a clear understanding of a computational model. "This is not biology, this is not physics, this is hard core engineering," Rabaey said.

In electronics, photolithography provides a scalable, reliable manufacturing process for designs involving millions of elements. Biology has a long way to go. What's needed, Rabaey said, is a way to generate thousands of genes reliably in a very short time period with very few errors. The difference between what's available and what's needed is about a trillion to one.”

3. Conceptual Issues

As the leading nanotechnologist (and FRS) Richard Jones has pointed out, his field was dominated from an early stage by often inappropriate analogies with mechanical engineering (e.g., cogs). It may well be the case that we are in danger of the same thing happening with synthetic biology, where computer scientists impose rigid circuit/software design principles on "softer", more “fuzzy” substrates. Jones quotes, on his blog, an article in the New York Times:

“Most people in synthetic biology are engineers who have invaded genetics. They have brought with them a vocabulary derived from circuit design and software development that they seek to impose on the softer substance of biology. They talk of modules — meaning networks of genes assembled to perform some standard function — and of “booting up” a cell with new DNA-based instructions, much the way someone gets a computer going.”

4. Complexity

The issue of "grey goo" has persistently dogged the field of nanotechnology, and it would be tempting to dismiss similar criticisms of synthetic biology as well-intentioned but ultimately uninformed. However, if synthetic biologists are to avoid the mistake that researchers in GM research made (that is, to appear arrogant and dismissive, leading to mass public protest and restrictive legislation), then we should acknowledge and address the very real possibility of the biological systems under study behaving in very unpredictable ways. Anyone who has any degree of contact with studying biosystems will understand the notion of complexity; components that are connected in an unknown fashion behave in unpredictable ways, which may include evasion of any control mechanisms that have been put in place. As Douglas Kell and his colleagues have observed, it is perfectly possible to alter parameters of a system on an individual basis, and see no effect, only to observe wild variations in behaviour when exactly the same tweak is applied to two or more parameters at the same time. Working in an interdisciplinary fashion may address this issue, at least in part, if modellers work closely with bench scientists in a cycle of cooperation. Once again invoking the issue of scalability, studying the behaviour of complex biosystems through modelling alone will quickly become infeasible, due to the combinatorial explosion in the size of the search space (of parameter values). By actually making or modifying the systems under study in the lab, the problem may be reduced to manageable proportions.
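Both points in this section can be illustrated with a deliberately artificial sketch in Python. The response function below is a toy of my own invention, not a model of any real biosystem and not taken from Kell's work: it is constructed so that tweaking either parameter alone leaves the output unchanged, while applying the same tweak to both at once shifts it substantially. The second function, `grid_size`, simply counts how many combinations a model-only parameter sweep would have to visit.

```python
def toy_response(a, b):
    """Hypothetical system output: baseline 1.0, sensitive only to joint changes."""
    return 1.0 + 50.0 * (a - 1.0) * (b - 1.0)


baseline = toy_response(1.0, 1.0)
only_a = toy_response(1.2, 1.0)   # tweak one parameter: no visible effect
only_b = toy_response(1.0, 1.2)   # tweak the other: still no visible effect
both = toy_response(1.2, 1.2)     # same tweak applied to both: output jumps


def grid_size(n_parameters, values_per_parameter):
    """Number of combinations in an exhaustive sweep over a parameter grid."""
    return values_per_parameter ** n_parameters


print(baseline, only_a, only_b, both)
for n in (2, 10, 30):
    print(n, "parameters, 5 values each:", grid_size(n, 5), "combinations")
```

Testing parameters one at a time would declare this toy system insensitive to both, which is exactly the trap; and testing them in combination is what makes the search space explode.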

5. Hype

In my own book, Genesis Machines (Atlantic Books, 2006), I illustrate the risk of promising too much at an early stage by describing the story of the “AI winter”. In the 1960s, researchers in artificial intelligence (AI) had promised human-level intelligence “in a box” within twenty years. By issuing such wild predictions, AI researchers set themselves up for a monumental fall, and, when the promised benefits failed to accrue, funding was slashed and interest dwindled. This AI winter (by analogy with “nuclear winter”) affected the field for over 15 years, and it would be disappointing (to say the least) if the same thing were to happen to synthetic biology.

Hubristic claims for synthetic biology should be avoided wherever possible; without singling out particular groups, I have already seen several predictions (again, often conflated with ambitions) that have absolutely no realistic chance of coming to fruition in any meaningful time-scale (if at all). In this more “media savvy” age, perhaps practitioners in synthetic biology might benefit, as their AI counterparts did not, from media training (I have personally benefited (June 2004) from the course provided by the Royal Society, and perhaps the Society might consider a “mass participation” version for new entrants to the field).

Friday, August 17, 2007

For the love of ants

To be published next week, one book on my Amazon wishlist is titled The Ants Are My Friends. Students of popular music may recognise the phrase as one of the great misheard lyrics of our time, up there with "Beelzebub had a devil for a sideboard", rather than an expression of insect infatuation (the response being, of course, "blowing in the wind").

But I rather like the idea of ants being my friends. I've always held these misunderstood creatures in high regard, and was charmed by the story, recounted in Surely You're Joking, Mr. Feynman! (p. 91 in the Vintage edition), of how Richard Feynman investigated ant trail-following behaviour in his Princeton accommodation. He eventually used his findings to persuade an ant colony to leave his larder; "No poison, you gotta be humane to the ants!"

Anyone who has ever watched an ant colony at work cannot fail to be entranced by its beauty and efficiency. A single colony can strip an entire moose carcass in under two hours, and their work is coordinated in an inherently decentralised fashion (that is, there is no "head ant" giving out orders). An ant colony can be considered as a class of "super-organism", that is, a "virtual" organism made up of many other single organisms. Other examples include bacterial colonies and (arguably) the Earth itself.

Ants communicate remotely by way of pheromones, chemicals that generate some sort of response amongst members of the same species. When ants forage for food, they lay a particular pheromone on the ground once they've found a source. When this signal is detected by other ants, they follow the trail and reinforce it by laying pheromone themselves. Chemical signals also evaporate over time, which allows colonies to "forget" good solutions (i.e., paths) and construct new solutions if the environment changes (e.g., a stone falls onto an existing path).
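The reinforce-and-evaporate mechanism is simple enough to sketch in Python. What follows is a minimal expected-value model under my own assumptions, not a full ant colony optimization algorithm: ants choose between paths in proportion to the pheromone on each, ants on a shorter path complete more trips and so deposit more per time step, and a fixed fraction of pheromone evaporates each step. The function name and the parameter values are illustrative only.

```python
def simulate_trails(lengths, evaporation=0.1, steps=50):
    """Track pheromone levels on competing paths to a single food source."""
    pheromone = [1.0] * len(lengths)  # no initial preference between paths
    for _ in range(steps):
        total = sum(pheromone)
        # Fraction of ants taking each path is proportional to its pheromone.
        share = [p / total for p in pheromone]
        # Evaporate a fixed fraction, then add new deposits; a path of
        # length L receives deposit share / L, so short paths gain faster.
        pheromone = [
            (1.0 - evaporation) * p + s / length
            for p, s, length in zip(pheromone, share, lengths)
        ]
    return pheromone


short_path, long_path = simulate_trails([1.0, 2.0])
print(f"short path: {short_path:.2f}, long path: {long_path:.2f}")
```

Starting from no preference at all, positive feedback alone is enough to concentrate the pheromone on the shorter path.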

By describing this mechanism in abstract terms, computer scientists have managed to harness the power of positive feedback in order to solve difficult computational problems. Perhaps the leading scientist in the field of ant colony optimization (ACO) is Marco Dorigo, and he has described how to use models of artificial ants to solve the problem of how to route text messages through a busy network of mobile base stations. We've also done some initial work on how ants build spatial structures, using an abstract model of pheromone deposition to explain how certain species can construct "bullseye"-like patterns of differently-sized objects.

Fundamentally, ongoing work in ACO reflects a wider interest in the notion of decentralised control. Rather than controlling everything from "on high" with global instructions, "bottom up" control emphasises the value of small, local interactions in keeping systems running smoothly. Software packages such as Netlogo have brought so-called agent-based modelling to a wider audience. I've just taken on a Ph.D. student to study the evacuation of tall buildings using this approach, and it's clear that, with ever-increasing computational power being available, the notion of simulating large systems of interacting entities will gain increasing influence.

Genesis Machines in the USA

I'm delighted to report that Atlantic have signed a deal to publish Genesis Machines in the USA. It's slated to appear on April 3rd of next year, and will be published by the Overlook Press (preorder here).

Friday, August 10, 2007

Molecules and Marx

My publisher kindly sends me copies of reviews of Genesis Machines that appear from time to time in the press. I was quite surprised to see the book featured in the June issue of the Marxist Review, the monthly theoretical magazine of the Workers Revolutionary Party. In his article, William Westwell invokes Richard Dawkins as the contemporary cheerleader of arch-reductionism and mechanical materialism. But, by concentrating purely on the first half of the book (which, by its very nature, consists largely of historical background), Westwell ignores one of its fundamental arguments: that 21st century science cannot succeed by insisting on the top-down, reductionist paradigm. Science is still, to a large extent, a reductionist enterprise, but the emerging field of systems biology is providing a complementary approach (in a way, occupying the region where top-down meets bottom up). By arguing for a notion of "quality of computation", Westwell reminded me of conversations I have enjoyed in the past with Brian Goodwin, who has argued that "Biology is returning to notions of space-time organisation as an intrinsic aspect of the living condition... They are now described as complex networks of molecules that somehow read and make sense of genes. These molecular networks have intriguing properties, giving them some of the same characteristics as words in a language. Could it be that biology and culture are not so different after all; that both are based on historical traditions and languages that are used to construct patterns of relationship embodied in communities, either of cells or of individuals?" Unfortunately, Westwell appears to have ignored the later detailed discussion of such matters.