Friday, May 25, 2007

DNA hash pooling

The draft paper that came out of our trip to Paris has now been lodged with the arXiv e-print server.

DNA Hash Pooling and its Applications

Dennis Shasha (Courant Institute, New York University), Martyn Amos (Computing and Mathematics, Manchester Metropolitan University)

Abstract: In this paper we describe a new technique for the characterisation of populations of DNA strands. Such tools are vital to the study of ecological systems, at both the micro (e.g., individual humans) and macro (e.g., lakes) scales. Existing methods make extensive use of DNA sequencing and cloning, which can prove costly and time consuming. The overall objective is to address questions such as: (i) (Genome detection) Is a known genome sequence present at least in part in an environmental sample? (ii) (Sequence query) Is a specific fragment sequence present in a sample? (iii) (Similarity Discovery) How similar in terms of sequence content are two unsequenced samples?

We propose a method involving multiple filtering criteria that result in "pools" of DNA of high or very high purity. Because our method is similar in spirit to hashing in computer science, we call the method DNA hash pooling. To illustrate this method, we describe examples using pairs of restriction enzymes. The in silico empirical results we present reflect a sensitivity to experimental error. The method requires minimal DNA sequencing and, when sequencing is required, little or no cloning.
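For readers who think in code, here is a toy in silico sketch of the hashing analogy, not the protocol from the paper. It assumes that a pool is keyed simply by fragment length after a double digest, and it compares two samples with a Jaccard overlap of occupied pools; both choices are illustrative assumptions, as are the helper names `cut_positions`, `double_digest`, `hash_pools` and `pool_similarity`. The recognition sites for EcoRI (G^AATTC) and BamHI (G^GATCC) are the standard ones, and only the top strand is modelled.

```python
from collections import defaultdict

# Recognition site and top-strand cut offset for two common enzymes.
ENZYMES = {"EcoRI": ("GAATTC", 1), "BamHI": ("GGATCC", 1)}


def cut_positions(seq, site, offset):
    """Return every index at which the enzyme would cut seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def double_digest(seq, enzyme_a, enzyme_b):
    """Cut seq with both enzymes and return the fragments in order."""
    cuts = set()
    for name in (enzyme_a, enzyme_b):
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:]) if j > i]


def hash_pools(sample, enzyme_a, enzyme_b):
    """Assumed hash function: map each fragment to a pool keyed by its length."""
    pools = defaultdict(set)
    for strand in sample:
        for fragment in double_digest(strand, enzyme_a, enzyme_b):
            pools[len(fragment)].add(fragment)
    return pools


def pool_similarity(pools_x, pools_y):
    """Jaccard overlap of occupied pool keys (an illustrative measure only)."""
    keys_x, keys_y = set(pools_x), set(pools_y)
    union = keys_x | keys_y
    return len(keys_x & keys_y) / len(union) if union else 1.0


sample = ["AAAGAATTCCCCGGATCCTTT"]
pools = hash_pools(sample, "EcoRI", "BamHI")
for length in sorted(pools):
    print(length, sorted(pools[length]))
```

In this toy model, a sequence query amounts to digesting the query fragment the same way and looking only in the pool with the matching key, which is the sense in which pooling narrows down what would need to be sequenced.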

Available at http://www.arxiv.org/abs/0705.3597.
