A Parallel DNA Algorithm Using a Microfluidic Device to Build Scheduling Grids
Artificial Intelligence Department, Universidad Politécnica de Madrid, Boadilla del Monte s/n, 28660 Madrid, Spain
mgarnau@dia.fi.upm.es, {dmanrique,arpaton}@fi.upm.es
Abstract. Microfluidic systems, which constitute a miniaturization of a conventional laboratory to the dimensions of a chip, are expected to become the key support for a revolution in the world of biology and chemistry. This article proposes a parallel algorithm that uses DNA and such a distributed microfluidic device to generate scheduling grids in polynomial time. Rather than taking a brute force approach, the algorithm presented here uses concatenation and separation operations to gradually build the DNA strings that represent the grids of a multiprocessor task scheduling problem. The microfluidic device used makes for an autonomous system, enabling it to solve the problem without the need for external control.
1 Introduction
In 1994 Leonard Adleman proved empirically what Richard Feynman had postulated several decades earlier: the chemical and electrical properties of matter give molecules the natural ability to make massively parallel calculations. So, for the first time, Adleman used DNA strands to solve an instance of the Hamiltonian path problem on a 7-node graph [1]. This fired the starting gun for a new branch of research known as biomolecular computing. A year later, Richard Lipton put forward a DNA computational model that generalized the techniques employed by Adleman, which he used to solve an instance of SAT [2]. Since then, many researchers have exploited DNA's potential for solving computationally difficult (NP-complete) problems. Two good examples can be found in [3] and [4]. NP-complete problems have two prominent features: 1) there are as yet no polynomial algorithms to solve them, and 2) all their 'yes' instances can be verified efficiently [5]. Microfluidic systems, also called microflow reactors or lab-on-a-chip (LOC), are passive fluidic devices built on a chip layer which is used as a substrate. They are basically composed of cavities or microchambers between which liquid can move along the microchannels that link them. Therefore, controlled chemical reactions can be carried out in each cavity independently, that is, in parallel.
J. Mira and J.R. Alvarez (Eds.): IWINAC 2007, Part I, LNCS 4527, pp. 193202, 2007. c Springer-Verlag Berlin Heidelberg 2007
These systems are tantamount to the miniaturization of a conventional laboratory to the dimensions of a chip and are expected to become the key support for a revolution in the world of biology and chemistry. New emerging disciplines, such as synthetic biology and systems biology, demand that computer scientists contribute to the resolution of new challenging problems such as drug discovery, the understanding of computational processes in cells, or the analysis of the genetic pathways controlling biological processes in living organisms. In all those problems, microfluidic systems implementing parallel algorithms may play an important role [6,7]. These needs have led to the fast development of microfluidic-based technology in recent years. For instance, a microfluidic chip for automated nucleic acid purification from bacterial or mammalian cells is constructed in [8]. The use of microfluidic systems to implement automated DNA sequencing devices has also been considered in [9]. Furthermore, the role played by valves and pumps in these kinds of devices is studied in [10,11]. Finally, in [12], the possibility of implementing microfluidic memory and control devices is presented. Besides that, some work also exists using microfluidic systems to attack computationally hard problems. Thus, a microflow reactor is used in [13] to solve the Minimum Clique Problem using a brute force strategy, codifying each possible subgraph as a DNA strand. Furthermore, two microfluidic systems, each implementing a DNA algorithm, are proposed in [14] and [15] for the Hamiltonian path problem and the shortest common superstring problem, respectively. Moreover, a microfluidic DNA computer is used in [16] to solve the satisfiability problem. The present work deals with a classic scheduling problem: the optimal scheduling of tasks with precedence constraints in a multiprocessor scenario.
This paper starts from the previous resolution of the problem of getting all the maximal independent sets of a dependency graph, and proposes a parallel DNA algorithm that uses a microfluidic device to build optimal scheduling grids. To do so, it uses a constructive approach that has nothing to do with traditional brute force strategies, gradually putting together correct solutions and removing invalid results along the way. The remainder of this article is organized as follows. Section 2 presents the problem. Section 3 describes the parallel algorithm and the microfluidic support system. An example of how the proposed system runs is given in section 4. Finally, section 5 sets out the final remarks.
2 The Problem

Let T be a set of tasks with unit execution times. Let G(T, E) be a directed acyclic graph (dag), which we will term the dependency graph, that establishes a partial order on T. Each graph node represents a task, and each arc a relation of precedence between two tasks. If there is a path between two tasks ti and tj, tj is said to be a descendant of ti and, therefore, its execution will not be able to start until ti has been run (ti ≺ tj). Finally, let the positive integers M and d be the number of processors and the scheduling deadline, respectively. A schedule for the tasks of T on M processors that respects the partial order established by the graph G is a function f : T → Z+ that satisfies [17]:

|{t ∈ T : f(t) = k}| ≤ M for every time unit k, 1 ≤ k ≤ d    (1)
f(ti) < f(tj) whenever ti ≺ tj    (2)
This schedule's end time will be max{f(ti) : ti ∈ T}. A schedule of |T| tasks on M processors that respects the dependency graph can actually be represented as a scheduling grid A of size N × M (with N ≤ |T|), where each element A(k, mr) = ti, (1 ≤ k ≤ N), (1 ≤ r ≤ M), indicates that task ti has been scheduled to run on processor mr at time k (ti ∈ T ∪ {λ}, where the symbol λ represents the null process). A scheduling grid A will be valid if it meets the following conditions: a) it does not contain repeated elements (with the exception of the null process λ), b) it contains all the tasks in the set T (it is complete), and c) it satisfies both the constraints expressed in (1) and (2). A valid grid will be optimal if it also maximises the schedule's parallelism, that is, if it has the least possible number of rows (Figure 1).
Fig. 1. Dependency graph associated with the set of tasks T = {1, 2, 3, 4, 5} together with an optimal scheduling grid for a two-processor scenario m1 and m2
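The three validity conditions can be captured in a short check. The sketch below is illustrative only: the `DESC` mapping (the descendants of each task in the dependency graph of Figure 1) is an assumption reconstructed from the example, and `λ` stands for the null process.

```python
# Illustrative validity check for a scheduling grid (conditions a-c).
# DESC is an assumed reconstruction of the Fig. 1 dependency graph.
DESC = {"1": {"5"}, "2": {"3", "4", "5"},
        "3": {"5"}, "4": {"5"}, "5": set()}
TASKS = set(DESC)

def grid_valid(grid):
    """grid: N rows of M symbols; 'λ' denotes the null process."""
    flat = [t for row in grid for t in row if t != "λ"]
    if len(flat) != len(set(flat)):      # a) no repeated tasks
        return False
    if set(flat) != TASKS:               # b) complete: all tasks of T
        return False
    row_of = {t: k for k, row in enumerate(grid)
              for t in row if t != "λ"}
    # c) precedence: every task starts strictly before its descendants
    return all(row_of[u] < row_of[v] for u in TASKS for v in DESC[u])
```

Under these assumptions, the optimal grid of Figure 1, `[["1","2"],["4","3"],["λ","5"]]`, passes the check, while any grid scheduling task 5 before its ancestors does not.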
To achieve such maximum parallelism in a schedule, it is essential to observe the partial order between the tasks established by the graph G. We know that if a task tj is a descendant of another task ti, then tj will not be able to be executed in parallel with (or, of course, before) ti. If we call two tasks without a relation of precedence in the partial order independent, then the largest set of tasks that can be run in parallel with a given task ti is the maximal independent set of tasks containing ti. Therefore, the greatest possible parallelism between all the problem tasks is given by all the maximal independent sets obtained from the dependency graph. An independent set si is maximal if there is no other independent set v such that si ⊂ v. Getting all these si is an NP-complete problem. There are multiple classical algorithms that have addressed its solution, as it has also been found to be a problem besetting some genomics efforts, such as mapping genome data [18]. However, there are very few proposals for solving it within the molecular or membrane computing paradigms. For instance, [19] proposes a P system with active membranes to solve it. Recently, our group has developed a DNA algorithm to solve the minimum clique cover problem for graphs, which is an equivalent problem, albeit on the complementary graph [20].
The algorithm proposed below is based on this result, that is, on getting the maximal independent sets si from the dependency graph in order to design a system capable of building optimal scheduling grids for our problem. For the example graph of Figure 1, those sets are: s1 = {1, 2}, s2 = {1, 3, 4} and s3 = {5}.
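For small instances, these maximal independent sets can also be enumerated classically by brute force. The sketch below assumes an edge list reconstructed from the example (it is not given explicitly in the text, so treat it as illustrative); two tasks are dependent iff one is a descendant of the other.

```python
from itertools import combinations

# Assumed edge list for the Fig. 1 dependency graph (reconstructed
# from the example sets s1, s2, s3; illustrative only).
EDGES = {("1", "5"), ("2", "3"), ("2", "4"), ("3", "5"), ("4", "5")}
NODES = {"1", "2", "3", "4", "5"}

def reachable(u):
    """Descendants of u in the dag."""
    seen, stack = set(), [u]
    while stack:
        x = stack.pop()
        for a, b in EDGES:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

# two tasks are dependent iff one is a descendant of the other
dep = {(u, v) for u in NODES for v in NODES
       if v in reachable(u) or u in reachable(v)}

def independent(s):
    return all((u, v) not in dep for u in s for v in s if u != v)

# enumerate all independent sets, then keep only the maximal ones
indep = [set(c) for r in range(1, len(NODES) + 1)
         for c in combinations(sorted(NODES), r) if independent(set(c))]
maximal = [s for s in indep if not any(s < t for t in indep)]
```

With the assumed edges, `maximal` contains exactly {1, 2}, {1, 3, 4} and {5}, matching the sets quoted above. (The exponential enumeration is only viable for toy graphs; the point of [20] is precisely to avoid it.)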
3 The Parallel Algorithm and Its Microfluidic Support System
3.1 Codification and System Architecture
We use a single DNA sequence to codify each task of the set T . A scheduling grid A is represented as a task sequence, composed of N subsequences of M elements each, where N is the schedule size (number of rows in A) and M is the number of processors. An example is shown in Figure 2.
Fig. 2. Representation of a scheduling grid in a DNA strand by means of row concatenation. (N = 3 and M = 2).
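The row-concatenation representation of Figure 2 is straightforward to mirror in software. A minimal sketch (with `λ` as the null symbol; the grid values are those of the Figure 1 example):

```python
# Encode a scheduling grid as a single symbol string by concatenating
# its rows left to right, top to bottom, and decode it back given M.
def encode(grid):
    return "".join(t for row in grid for t in row)

def decode(strand, m):
    return [list(strand[i:i + m]) for i in range(0, len(strand), m)]

grid = [["1", "2"], ["4", "3"], ["λ", "5"]]
strand = encode(grid)          # "1243λ5"
assert decode(strand, 2) == grid
```

The encoding is lossless as long as M is known, which is why a single DNA strand suffices to carry a whole grid.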
The present algorithm builds the possible scheduling grids in parallel (concatenation operation), removing any that are invalid as they are detected (separation operation). Both operations are based on DNA's natural property of hybridization through complementarity. Additionally, these operations are supported in this case not by test tubes but by the microfluidic device's microchambers and microchannels, around which the scheduling grids under construction circulate. We propose a two-layer system architecture S. It is composed of three different subsystems Si, each of which is associated with one of the maximal independent sets si of G. In turn, each subsystem Si contains several interconnected nodes vj, which correspond to the tasks of their associated set si. If |si| < M, a new element vλ, corresponding to the null process, is added to the subsystem Si. This way, λ symbols can be used to fill the grid positions that cannot house any other task in the set T. Figure 3 a) illustrates this point. The system S works as follows. All its subsystems Si operate in parallel and synchronously to add whole rows to the scheduling grids they receive. Once the respective row has been added, each Si sends the resulting grids to its neighbouring subsystems, which continue the operation. This process is done N times until the grid is complete (with N ≤ |T|). Within a subsystem Si, a row is generated in M steps that are taken by its nodes vj working in parallel to add, one by one, each of the M tasks of which that row is composed. The total number of steps that it takes to build the scheduling grids in the system S is,
Fig. 3. a) Diagram of the two-layer system S associated with the example in Fig. 1. It is composed of three subsystems, each with several nodes: S1 = {v1, v2}, S2 = {v1, v3, v4}, S3 = {v5, vλ}. b) Design of the microfluidic device for the system S shown in Fig. 3 a). Each element vj is composed of a filter chamber Fj and a concatenation chamber Cj. Additionally, each subsystem Si has an inlet chamber Ii and an outlet chamber Oi.
therefore, N × M. The system stops when it detects that the scheduling grids already contain all the tasks of the set T. For this to be possible, those grids (DNA strands) need to be sequenced after each iteration of system S. In the following, we detail how the microfluidic device has been designed to implement the functionality described. Each of the elements vj making up a subsystem Si is composed of a filter chamber Fj and a concatenation chamber Cj. Additionally, each subsystem Si has an inlet chamber Ii and an outlet chamber Oi. All the chambers linked by microchannels have pumps to circulate the fluid in the direction of the arrows (Fig. 3 b)). The function and connectivity of each chamber type of the proposed device is:

Filter Chamber (Fj): This chamber receives the strings from the chambers Cp of all the neighbouring vp or from the inlet chamber Ii of the subsystem Si. It retains the strings that already contain the task associated with vj (repeats) or any of its descendant tasks. It sends the other strands to its associated concatenation chamber Cj.

Concatenation Chamber (Cj): This chamber receives the strings from its associated Fj chamber. It concatenates the task associated with vj with the strings. It sends the resulting strands to the chambers Fp of all its neighbouring vp or to the outlet chamber Oi of the subsystem Si.

Inlet Chamber (Ii): This chamber receives and concentrates the flow of strands from the chambers Or of the neighbouring subsystems Sr. This way all the filtering chambers Fj of the subsystem Si are loaded in an orderly fashion.
Outlet Chamber (Oi): This chamber concentrates the strands from all the concatenation chambers Cj of the subsystem Si. This way those strands can be sent in an orderly fashion to the inlet chambers Ir of the neighbouring subsystems Sr.

3.2 Algorithm
It is assumed that the filtering and concatenation operations take one unit of time. The outputs of each chamber Fj are considered to be available at the inlet of the Cj chambers in that unit of time. Furthermore, the concentration time of all the outputs of the chambers Cj in the outlet chamber Oi is assumed to be one, as is the distribution time of the content of the inlet chamber Ii to all the filter chambers Fj of the subsystem Si:

1. Step (t = 0). Initial system loading:
   - Put enough strands matching the task associated with vj and its descendant tasks into the filter chambers Fj of each Si (except in Fλ).
   - Put enough copies of the strand of the task associated with vj, and enough copies of the auxiliary strands and enzymes to allow concatenation, into the concatenation chambers Cj of each subsystem Si.
   - Put enough copies of the strands of all the tasks associated with the vj of that subsystem into the inlet chambers Ii of each subsystem Si.

2. Steps (t = 1) to (t = N). While the strings are incomplete:
   For (n = 1) to (n = M) do
   - Computation: for all elements vj of all subsystems Si in parallel, do a filter operation in Fj(n) and a concatenation operation in Cj(n).
   - Internal communication: for all vj of all subsystems Si in parallel, pump the results of chambers Cj(n) to the chambers Fp(n+1) of their neighbouring vp:

     Input(Fp(n+1)) = ∪_{j≠p} Output(Cj(n))

   Load the chambers Oi with the content of all the chambers Cj of all subsystems Si in parallel:

     Oi(t) = ∪_j Cj(t)

   Communication between subsystems: load the chambers Ii of all the subsystems Si with the flow from the chambers Or of the neighbouring subsystems Sr in parallel, and divide the content of Ii among the chambers Fj of all their vj:

     Ii(t+1) = ∪_{r≠i} Or(t)
3. Step (t = N + 1). System output: The chambers Oi already contain the strings of length N × M, with all the tasks of T. The system stops and returns the resulting schedules as output:

   Output(S) = ∪_i Oi(t)
This is, therefore, a polynomial algorithm in terms of the number of tasks and processors. It takes N × M steps (with N ≤ |T|) to build the problem-solving grids.
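The chamber-level dynamics can be paraphrased in software at the coarser granularity of whole rows: in each round, every subsystem appends one full row to the strands it receives, applying the filter rule (no repeats, no task after one of its descendants) before concatenating. This is a behavioural sketch of the algorithm, not a model of the device; the dependency relation `DESC` and the independent sets are those assumed for the Figure 1 example.

```python
from itertools import permutations

# Assumed reconstruction of the Fig. 1 example: descendants of each task
DESC = {"1": {"5"}, "2": {"3", "4", "5"},
        "3": {"5"}, "4": {"5"}, "5": set()}
TASKS = set(DESC)
M, N = 2, 3                                  # processors, rows
SETS = [{"1", "2"}, {"1", "3", "4"}, {"5"}]  # maximal independent sets

def rows_of(si):
    """All orderings of M symbols drawn from one independent set,
    padded with the null process 'λ' when the set is small."""
    symbols = sorted(si) + ["λ"] * max(0, M - len(si))
    return set(permutations(symbols, M))

def can_append(strand, row):
    """Filter-chamber rule: reject repeats and precedence violations."""
    for t in row:
        if t == "λ":
            continue
        if t in strand:                      # repeated task
            return False
        if DESC[t] & set(strand):            # t would run after a descendant
            return False
    return True

def build_grids():
    strands = {""}                           # the empty strand enters every Si
    for _ in range(N):                       # N synchronous rounds
        nxt = set()
        for si in SETS:                      # subsystems operate in parallel
            for w in strands:
                for row in rows_of(si):
                    if can_append(w, row):
                        nxt.add(w + "".join(row))
        strands = nxt
    return {w for w in strands if TASKS <= set(w)}  # keep complete grids
```

Under these assumptions, `build_grids()` yields exactly the eight row-orderings of the optimal grid of Figure 1, such as `1243λ5` and `12435λ`, mirroring the system's final output in the example of section 4.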
4 Example
The system completes a total of 3 × 2 = 6 iterations before stopping and returning the problem-solving scheduling grids:

t0 (initial loading of Ii and Fj):
I1(t0) = {1, 2}   I2(t0) = {1, 3, 4}   I3(t0) = {5, λ}
t1 (execution of the subsystems Si and final loading of Oi):
S1: F1 = I1(t0), C1 = {21};  F2 = I1(t0), C2 = {12}
S2: F1 = I2(t0), C1 = {31, 41};  F3 = I2(t0), C3 = {13, 43};  F4 = I2(t0), C4 = {14, 34}
S3: F5 = I3(t0), C5 = {λ5};  Fλ = I3(t0), Cλ = {5λ}
O1(t1) = {21, 12}   O2(t1) = {31, 41, 13, 43, 14, 34}   O3(t1) = {λ5, 5λ}
t2 (communication between subsystems Si: loading of Ii and of Fj; execution of the subsystems Si):
I1(t2) = {31, 41, 13, 43, 14, 34, λ5, 5λ}
I2(t2) = {12, 21, λ5, 5λ}
I3(t2) = {12, 21, 31, 41, 13, 43, 14, 34}
S1: F1 = I1(t2), C1 = {431, 341};  F2 = I1(t2), C2 = {λ52, 5λ2}
S2: F1 = I2(t2), C1 = {};  F3 = I2(t2), C3 = {123, 213};  F4 = I2(t2), C4 = {124, 214}
S3: F5 = I3(t2), C5 = {125, 215, 315, 415, 135, 435, 145, 345};  Fλ = I3(t2), Cλ = {12λ, 21λ, 31λ, 41λ, 13λ, 43λ, 14λ, 34λ}
t3 (execution of the subsystems Si and final loading of Oi):
S1: F1 = C2(t2), C1 = {};  F2 = C1(t2), C2 = {}
S2: F1 = C3(t2) ∪ C4(t2), C1 = {};  F3 = C1(t2) ∪ C4(t2), C3 = {1243, 2143};  F4 = C1(t2) ∪ C3(t2), C4 = {1234, 2134}
S3: F5 = Cλ(t2), C5 = {12λ5, 21λ5, 31λ5, 41λ5, 13λ5, 43λ5, 14λ5, 34λ5};  Fλ = C5(t2), Cλ = {125λ, 215λ, 315λ, 415λ, 135λ, 435λ, 145λ, 345λ}
O1(t3) = {}
O2(t3) = {1243, 2143, 1234, 2134}
O3(t3) = {12λ5, 21λ5, 31λ5, 41λ5, 13λ5, 43λ5, 14λ5, 34λ5, 125λ, 215λ, 315λ, 415λ, 135λ, 435λ, 145λ, 345λ}

t4 (communication between subsystems Si: loading of Ii and of Fj; execution of the subsystems Si):
I1(t4) = {1243, 2143, 1234, 2134, 12λ5, 21λ5, 31λ5, 41λ5, 13λ5, 43λ5, 14λ5, 34λ5, 125λ, 215λ, 315λ, 415λ, 135λ, 435λ, 145λ, 345λ}
I2(t4) = {12λ5, 21λ5, 31λ5, 41λ5, 13λ5, 43λ5, 14λ5, 34λ5, 125λ, 215λ, 315λ, 415λ, 135λ, 435λ, 145λ, 345λ}
I3(t4) = {1243, 2143, 1234, 2134}
S1: F1 = I1(t4), C1 = {};  F2 = I1(t4), C2 = {}
S2: F1 = I2(t4), C1 = {};  F3 = I2(t4), C3 = {};  F4 = I2(t4), C4 = {}
S3: F5 = I3(t4), C5 = {12435, 21435, 12345, 21345};  Fλ = I3(t4), Cλ = {1243λ, 2143λ, 1234λ, 2134λ}

t5 (execution of the subsystems Si and final loading of Oi):
S1: F1 = C2(t4), C1 = {};  F2 = C1(t4), C2 = {}
S2: F1 = C3(t4) ∪ C4(t4), C1 = {};  F3 = C1(t4) ∪ C4(t4), C3 = {};  F4 = C1(t4) ∪ C3(t4), C4 = {}
S3: F5 = Cλ(t4), C5 = {1243λ5, 2143λ5, 1234λ5, 2134λ5};  Fλ = C5(t4), Cλ = {12435λ, 21435λ, 12345λ, 21345λ}
O1(t5) = {}   O2(t5) = {}
O3(t5) = {1243λ5, 2143λ5, 1234λ5, 2134λ5, 12435λ, 21435λ, 12345λ, 21345λ}

t6 (end of execution; results output): when the algorithm finished, we got eight strings codifying eight equivalent layouts of the problem-solving scheduling grid of Figure 1.
5 Concluding Remarks
This article has proposed a parallel algorithm that uses DNA and a distributed microfluidic device to generate scheduling grids in polynomial time. Microfluidic systems are passive fluidic devices built on a chip layer which is used as a substrate. They contain cavities, microchannels, pumps and valves that allow controlled chemical reactions to be carried out independently, that is, in parallel. The algorithm described in this paper takes a constructive approach based on concatenation and filter operations to get optimal scheduling grids. This uses fewer strings than would be necessary if we tried to generate those grids by brute force. Although, from the computational point of view, a tough combinatorial problem has to be solved beforehand, this algorithm constitutes an interesting approach to the possibilities offered by microfluidic systems' inherent parallelism. From now on, with the advent of synthetic and systems biology, computer scientists, biologists and chemists will together tackle the resolution of new and challenging problems, in which microfluidic systems implementing parallel algorithms may play an important role. The evolution of these systems since Adleman's and Lipton's early experiments proves that they constitute a promising and highly interesting technology for implementing distributed DNA algorithms. The structure of microfluidic systems can be exploited to design topologies, and these topologies can then be used to implement automated bioalgorithms on a miniaturized scale, as shown in this paper.
Acknowledgements
This research has been partially funded by the Spanish Ministry of Science and Education under project TIN2006-15595.
References
1. Adleman, L. M.: Molecular computation of solutions to combinatorial problems. Science. 266 (1994) 1021-1024
2. Lipton, R. J.: DNA solution of hard computational problems. Science. 268 (1995) 542-545
3. Ouyang, Q., Kaplan, P. D., Liu, S., Libchaber, A.: DNA Solution of the Maximal Clique Problem. Science. 278 (1997) 446-449
4. Sakamoto, K., Gouzu, H., Komiya, K., Kiga, D., Yokoyama, S., Yokomori, T., Hagiya, M.: Molecular Computation by DNA Hairpin Formation. Science. 288 (2000) 1223-1226
5. Garey, M. R., Johnson, D. S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco (1979)
6. Dittrich, P. S., Manz, A.: Lab-on-a-chip: microfluidics in drug discovery. Nature Reviews Drug Discovery. 5 (2006) 210-218
7. Breslauer, D. N., Lee, P. J., Lee, L. P.: Microfluidics-based systems biology. Molecular Biosystems. 2 (2006) 97-112
8. Hong, J. W., Studer, V., Hang, G., Anderson, W. F., Quake, S. R.: A nanoliter-scale nucleic acid processor with parallel architecture. Nature Biotechnology. 22 (2004) 435-439
9. Kartalov, E. P., Quake, S. R.: Microfluidic device reads up to four consecutive base pairs in DNA sequencing-by-synthesis. Nucleic Acids Research. 32 (2004) 2873-2879
10. Grover, W. H., Mathies, R. A.: An integrated microfluidic processor for single nucleotide polymorphism-based DNA computing. Lab Chip. 5 (2005) 1033-1040
11. Van Noort, D., Landweber, L. F.: Towards a re-programmable DNA Computer. Natural Computing. 4 (2005) 163-175
12. Groisman, A., Enzelberger, M., Quake, S. R.: Microfluidic memory and control devices. Science. 300 (2003) 955-958
13. McCaskill, J. S.: Optically programming DNA computing in microflow reactors. Biosystems. 59 (2001) 125-138
14. Ledesma, L., Pazos, J., Rodríguez-Patón, A.: A DNA Algorithm for the Hamiltonian Path Problem Using Microfluidic Systems. In: Jonoska, N., Paun, Gh., Rozenberg, G. (eds.): Aspects of Molecular Computing - Essays Dedicated to Tom Head on the Occasion of His 70th Birthday. LNCS 2950, Springer-Verlag (2004) 289-296
15. Ledesma, L., Manrique, D., Rodríguez-Patón, A.: A Tissue P System and a DNA Microfluidic Device for Solving the Shortest Common Superstring Problem. Soft Computing. 9 (2005) 679-685
16. Livstone, M., Weiss, R., Landweber, L.: Automated Design and Programming of a Microfluidic DNA Computer. Natural Computing. 5 (2006) 1-13
17. Papadimitriou, C. H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall (1982); second edition, Dover (1998)
18. Butenko, S., Wilhelm, W. E.: Clique-detection Models in Computational Biochemistry and Genomics. European Journal of Operational Research (2006). To appear
19. Head, T.: Aqueous simulations of membrane computations. Romanian J. of Information Science and Technology. 5 (2002) 355-364
20. García-Arnau, M., Manrique, D., Rodríguez-Patón, A.: A DNA algorithm for solving the Maximum Clique Cover Problem. Submitted