International Journal of Computational Intelligence Systems

Volume 14, Issue 1, 2021, Pages 228 - 237

Parallel DNA Algorithms of Generalized Traveling Salesman Problem-Based Bioinspired Computing Model

Authors
Xiaomin Ren1, Xiaoming Wang1, Zhaocai Wang1, *, ORCID, Tunhua Wu2, *
1College of Information, Shanghai Ocean University, Shanghai, 201306, P. R. China
2School of Information Engineering, Wenzhou Business College, Wenzhou, 325035, P. R. China
*Corresponding authors. Email: zcwang1028@163.com; appll188@163.com
Corresponding Authors
Zhaocai Wang, Tunhua Wu
Received 21 May 2020, Accepted 18 November 2020, Available Online 3 December 2020.
DOI
10.2991/ijcis.d.201127.001How to use a DOI?
Keywords
DNA computing; Adleman–Lipton model; Generalized traveling salesman problem; NP-hard problem
Abstract

Generalized traveling salesman problem (GTSP) is a classical combinatorial optimization problem, in which the optimization goal is the minimum route combination. Since the GTSP is a more complex problem than the traveling salesman problem (TSP), the GTSP can be considered an extension of the TSP. At present, compared with TSP, GTSP has been widely used in practice. In GTSP, n nodes are divided into m clusters, and the problem is divided into two categories according to different constraints. According to the second kind of the GTSP, we propose a parallel biochemical algorithm to solve it. We use deoxyribonucleic acid (DNA) biological strands to represent different vertices, point groups and weights, and get the optimal solutions to a series of different biochemical reaction combinations of DNA sequences. Compared with other algorithms, our algorithm reduces the time complexity to O(n2), and proves the feasibility.

Copyright
© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

It is well known that the generalized traveling salesman problem (GTSP) is a generalization of the traveling salesman problem (TSP), and therefore is a nondeterministic polynomial (NP) problem. TSP is a commodity salesman from a city to several cities to sell commodities, and he needs to back home after all. The question is how he should arrange the route to minimize the cost of the tour. In the language of graph theory, given a connected graph G (each side has nonnegative weight), a loop is required to visit each node only once and return the initial vertex, and then the total weight is minimized. Different from TSP, the vertices in GTSP are divided into groups and don't need to traverse every vertex. GTSP is generally divided into two categories: the first type requires that only one vertex in each point group be accessed (Figure 1) and the second type requires the loop to access at least one vertex of each point group (Figure 2). Then, the GTSP is defined as finding a special Hamilton cycle in a complete graph that accesses each point group and returns the weight of the initial vertices and the minimum. This paper discusses the second type of GTSP, which requires that point groups disjoint, and each vertex is accessed at most once.

Figure 1

The first kind of generalized traveling salesman problem (GTSP).

Figure 2

The second kind of generalized traveling salesman problem (GTSP).

The GTSP can be described as follows: let a complete undirected weighted graph G=(V,E,W), where V={v1,v2,,vn} is the vertex set, E={eij|vi,vjV} is the edge set, W={wij|wij0,wii=0,i,jV} is the weight set. In practical problems, weights can be interpreted as distance, time, etc. The set of points is divided into m groups V1,V2,,Vm, satisfy: ViVj= for all ij and V=V1V2Vm.

The object function of the GTSP is as follows:

Min D=i=1m1wpipi+1+wpmp1vpiVi

In the 1960s, GTSP was initially proposed by Henry-Labordere [1], Saksena [2], and Srivastava et al. [3] almost simultaneously. So far, there are many practical applications for this problem. Logistics system designs [4], email [5], route planning and naval fleet maritime supply planning [6] are all real-life scenarios of GTSP.

For the first type of GTSP solution, Henry-Labordere, Saksena and Srivastava [13] introduced the method of solving GTSP with a simple dynamic programming method. Still, the calculation efficiency of these methods was very low, which can only be used in the case of small problem scale. Then Laporte and Nobert [5] proposed an integer linear programming method with symmetric distance matrix and evaluated the performance of the algorithm using experimental data. This algorithm can solve larger problems than dynamic programming. Laporte et al. [7] discussed the asymmetric GTSP, using relaxation constraints to ensure access to all point groups, and then expanding or modifying the asymmetric TSP by branch and bound algorithm. This improved method increased the problem size to 104 vertices by calculating the results. Fischetti et al. [8] proposed a solution to accurately solve GTSP by using branch and cut, which provides the best solution for an instance of 442 vertices.

Since there are many sophisticated algorithms for TSP solution, some researchers convert GTSP into TSP and then solve it. The current optimization algorithms used to solve TSP problems, such as differential algorithm [9], particle swarm algorithm [10] and ant colony algorithm [11]. Among them, the difference algorithm is more suitable for solving continuous optimization problems and has strong stability. The particle swarm algorithm improves the local search ability of the algorithm when solving TSP, and can better obtain accurate solutions. The ant colony algorithm has also been improved in terms of convergence speed and solution accuracy. Many scholars focus on how to transform GTSP into TSP. For example, Noon and Beat [12] redefined the arc and arc cost of GTSP according to the rules and added a loop within the zero-cost cluster to convert GTSP into an equivalent TSP. Dimitrijevic and Saric [13] proposed to transform the GTSP into an asymmetric TSP with 2n cities, in which the route between the vertices of the same point group is directed, and this algorithm expanded the number of vertices to 2 times. Li and Zhao [14] proposed the method of separating the intersected GTSP and effectively transforming the separated GTSP. However, the results of these algorithms are not ideal, and the efficiency is too low when dealing with large-scale problems. And we need to solve the exact solution from the obtained TSP examples; otherwise, it will not be the optimal solution to GTSP.

An effective algorithm will run for a long time as the number of problem nodes or point groups increases. Later, researchers proposed several genetic algorithms (GA) to solve GTSP, and some of them combined local search algorithms with GA to obtain better-operating efficiency. The generalized chromosome genetic algorithm (GCGA) proposed by Wu et al. [15] is by using mixed coding to distinguish the vertices in different groups, and the algorithm generates various feasible solutions through some operations between chromosomes such as crossing and mutation. It then obtains the optimal solution by comparing the loop cost with the function. And in this paper, GCGA is used to solve some examples of processing simple geometric shapes in rectangular plates. In 2006, for the second type of GTSP, Zhao et al. [16] proposed a new GA to add virtual vertices to the generalized chromosome, in which the information of the shortest route between each pair of vertices is reserved in an additional matrix. Wang et al. [17] improved crossover and mutation operators based on the GCGA algorithm and designed a new coding method to obtain a hybrid chromosome genetic algorithm (HCGA). The HCGA uses binary mixed encoding to change the head crossing and increase the possibility of multiple offspring, so the global search effect is better. Snyder and Daskin [18] proposed an effective heuristic algorithm based on the GA. This algorithm improves the solution through random key encoding, and compared with other algorithms improves the quality and running time of the solution in solving the TSP. Ardalan et al. [19] solved the GTSP by making appropriate modifications to the imperialist competition algorithm, using the new coding changes the impact on the cluster and the node selection to produce different results. It has better performance than other algorithms in the benchmark tests of 27 and 40 instances. Kan and Zhang [20] designed an individual mutation method to change the operation of the ant colony optimization algorithm to solve GTSP, and reduced the execution frequency and time complexity of the routing algorithm. Smith and Imeson [21] conducted a research based on adaptive large neighborhood search heuristics. This algorithm has been implemented, and we can find a better solution based on various existing and new problems compared with well-known algorithm tests. Mehdi et al. [22] improved the breakthrough local search (BLS) meta-heuristic and GA to reduce the running time. Experiments show that the improved algorithm can successfully obtain the optimal solution from most research examples.

Up to now, there are many studies on the first kind of GTSP. There are few types of research on the second kind of GTSP. And with the expansion of the scale of the problem, it will become more difficult to solve the problem in polynomial time. On the other hand, DNA computing, as a new intelligent computing algorithm, has two important advantages: huge storage capacity and a large amount of parallelism. A large number of parallel operations can cause chemical reactions of small molecules, and billions of operations can be carried out simultaneously. As a carrier of information, DNA molecules have huge computational storage capacity. Compared with the exponential complexity of other classical algorithms, DNA computing can solve complex optimization problems in polynomial time complexity. Therefore, using DNA computing to solve GSTP will be a meaningful attempt.

We need to obtain an optimal route from the set of all possible routes, and this optimal solution usually has the following restrictions:

  1. All routes are continuous, starting from the specified vertex and ending;

  2. At least one vertex is accessed in each point group, and each vertex is accessed at most once;

  3. The sum of the weights of all routes is the smallest.

Figure 3 shows an example of a graph theory model for a GTSP, in which the eight vertices are divided into three different point groups. In Figure 3, we assume that the travel agent starts from the v1 node. Obviously, after logical calculation, the shortest routes that meet the requirements are: v1v7v6v1 and v1v6v7v1. The total length of the route is 5.

Figure 3

An example of the generalized traveling salesman problem (GTSP) with 8 vertices and 3 groups.

In this paper, for the second kind of GTSP, we propose a DNA algorithm based on the Adleman–Lipton model to solve the GTSP with O(n2) time computing complexity. Compared with the previous algorithms, the computational complexity of the algorithm is greatly reduced, and the feasibility can also be verified by theory and simulation experiments.

The rest of the paper is organized as follows: In Section 2, we review the development of DNA computing and introduce the specific operations of the Adleman–Lipton model. The algorithms and detailed operations of GTSP are presented in Section 3. In Section 4, the feasibility and time complexity of the algorithms are analyzed. Section 5 gives the simulation results of DNA algorithms. The advantages of the DNA algorithm research prospects are considered in Section 6.

2. BACKGROUND KNOWLEDGE

2.1. DNA Computing

DNA molecules in living cells are the leading stores of information. Countless years of biological evolution have improved the molecules and enzymes in the body that can both copy the information in DNA and pass it on to other molecules. DNA is a double-stranded polymer composed of tandem deoxynucleotide chains, where deoxynucleotides are composed of deoxyribose, phosphoric acid and nitrogen-containing bases. The diversity of DNA is due to the random arrangement of base pairs. Different nucleotides base by its definition, respectively for A (adenine), G (guanine), C (cytosine) and T (thymine). The four types of bases have Watson–Crick complementarity, that is, two relative bases conform to complementary characteristics, in which T matches A and C matches G. Base pairing is the basis of DNA double helix structure and the basis of replication, transcription and translation. Under the right conditions, two single-stranded DNA can form a double strand, and the number of nucleotides in a single strand is the length of the single-stranded DNA.

DNA computing emerged with the rise and development of molecular biology. Biological computing is done by taking advantage of the structure and function of DNA molecules and the parallel operations in their interactions. In the process of continuous development, the advantages of storage capacity and energy consumption gradually emerge. Massive parallelism occurs because molecules interact chemically in tiny volumes and can perform billions of operations simultaneously. Because of the DNA molecule as a carrier of information, calculation of storage capacity is huge. At the same time, the enzymes that perform DNA calculations have been produced over thousands of years of biological evolution and are highly energy efficient. DNA has a broad prospect of resource application. In 1994, Adleman [23] carried out a pioneering work using DNA computing to solve a well-known computational difficulty, namely the directed Hamiltonian cycle problem. As the number of variables increases, the problem becomes more and more difficult. But using DNA computing, the problem can be easy to be solved. The finding proves the feasibility of DNA computing to solve specific problems. Its novelty lies in setting a precedent for applying mathematical problems to calculations at the molecular level, and his pioneering work inspires people. After that, Lipton [24] solved the satisfiability problem based on Adleman's experiment. Qi Ouyang et al. [25] solved the maximum mass problem by using molecular biology techniques. Head et al. [26] solved the problem of maximum independent subset. So far, scientists devoted themselves to the research field of DNA computing and proposed many DNA computing models for NP problems in combinatorial optimization, such as the strand replacement model [2729], self-assembly model [30,31], tile model [32,33], nanoparticles model [3436], etc. GA can be used as a tunnel for DNA computing to transform complex optimization problems. Now, with the development of science and technology, GA has been combined with other evolutionary algorithms, such as the gravitational search algorithms (GSAs), particle swarm optimization (PSO) [37,38], etc. These hybrid algorithms have a better hierarchical search function and higher efficiency. At the same time, DNA computing has become the focus of new computing models, which can provide strong technical support for many problems that fail to have effective solutions.

2.2. The Adleman–Lipton Model

Suppose that a series of test tubes contain a limited number of DNA molecules of composed of {A,G,C,T}. The following is the operation of the biological experiment.

  1. Merge (T1,T2): the strands of test tubes T1, T2 are mixed in tube T1, and the tube T2 is empty;

  2. Annealing (T): it can produce all feasible double strands in T, and still store them in T after annealing;

  3. Denaturation (T): it can dissociate each double strand in tube T into two corresponding single strands;

  4. Separation (T1,X,T2): for a test tube T1 and a set of strings X, it removes all single strands containing X from tube T1, and produces a test tube T2 with the removed strands;

  5. Discard (T): for a test tube T, it discards the tube T;

  6. Copy (T1,T2): given a test tube T1, it can copy all the biological strands from tube T1 to test tube T2;

  7. Append (T, Z): given a test tube T and short singled DNA strand Z, it appends Z at the end of every strand in the tube T;

  8. Selection (T1,L,T2): given a test tube T1 and an integer L, it moves all biological strands of L-length from tube T1 to test tube T2, and the remaining biological strands are still in tube T1;

  9. Sort (T1,T2,T3): given a test tube T1, it can move the shortest biological strands to test tube T2, the longest biological strands to test tube T3, and the remaining strands still in tube T1;

  10. Read (T): given a test tube T, it recognizes composition of biological strands in tube T.

Although DNA computing differs from traditional intelligent methods, their computational thinking is the same. Since the above operations can be completed in a limited experimental step, the complexity of each operation can be reasonably set to O(1) time [3943].

3. DNA ALGORITHMS FOR THE GTSP

3.1. Preliminary Ideas

General ideas for conducting experiments using DNA operations are as follows: first, the vertices and point groups corresponding to the GTSP are represented by specific symbols; then, all possible DNA chains of the problem are generated by the biochemical reactions, and feasible chains are selected according to whether different constraints are met; finally, the biological strings corresponding to the optimal solution are obtained by searching and identifying.

It is divided into four steps:

  • Step 1:

    Construct all possible random routes from the specified fixed node to the end of the same node;

  • Step 2:

    For all feasible solutions, the route passing through each point group at least once, in turn, is filtered;

  • Step 3:

    Ensure that each route does not repeat through a node, so that it traverses all point groups once and each vertex at most once;

  • Step 4:

    By comparing the weights of each route, the optimal result of the GTSP is obtained.

3.2. Notations and Symbols

In this paper, in order to clarify and standardize the expression of our algorithm, it is necessary to define and explain the symbols and notations in the algorithm. Therefore, we use the symbols in Table 1 to illustrate.

Symbol Description Symbol Description
V Vertex set E Edge set
W Weight set vi The i-th vertex
Vi The i-th point group n Number of vertices
m Number of point groups # End of DNA strand
Ai,Bi DNA string of the i-th vertex pi point group of the i-th vertex
X DNA string representing weight ψ DNA string representing point group

DNA, deoxyribonucleic acid.

Table 1

Notations and symbols.

3.3. Graph Theory Expression

In the following, we set symbols #,A1,B1(i{1,2,,n}), pi(pi{1,2,,m}) to represent DNA different strands with the same length, which we make symbols for t mer length. Obviously, the longer the length of single-stranded DNA synthesis show that the more complicated problems. Then, we use a different single-stranded DNA symbol AipiBi for vertex vi, where pi means vi is contained in the group of pi points. The symbol # indicates the beginning and ending of the DNA strand. At the same time, using their complementary strand BiAj¯(eijE) synthetic double-stranded. Also, to calculate the weights of the different routes through the nodes, we designed the DNA string X and ψ with the length of t mer. Generally, we assume that v1 is the starting and ending point of the route.

3.4. Detailed DNA Algorithms

We will take the GTSP in Figure 3 as an example to introduce the algorithm process in detail.

3.4.1. Generate various possible chains

For a graph with m point groups, where n vertices are distributed in these point groups. We specify to start from v1 and traverse different vertices to form all possible routes. We set:

T1={#A1p1B1,A2p2B2,,AipiBi,,AnpnBn,A1p1B1#|viV,pi{1,2,,m}};

T2={BiAj¯|eijE};

T3={X}.

Algorithm 3.3.1: Generate various routings strands.

  • (1-1)

    Merge(T1,T2)

  • (1-2)

    Annealing(T1);

  • (1-3)

    Denaturation(T1);

  • (1-4)

    Separation(T1,{#A1p1B1},T4);

  • (1-5)

    Discard(T1);

  • (1-6)

    Separation(T4,{A1p1B1#},T5);

  • (1-7)

    Discard(T4).

We illustrate the process of the algorithm by taking the GTSP in Figure 3 (starting and ending node is v1) as an example. Then,

T1=#A11B1,A21B2,A31B3,A42B4,A52B5,A62B6,A73B7,A83B8,A11B1#;

T2={B1A2¯,B1A3¯,B1A4¯,B1A6¯,B1A7¯,B1A8¯,B2A1¯,B2A6¯,B2A7¯,

B3A1¯,B3A5¯,B3A8¯,B4A1¯,B4A5¯,B5A3¯,B5A4¯,B5A6¯,B5A8¯,B6A1¯,

B6A2¯,B6A5¯,B6A7¯,B6A8¯,B7A1¯,B7A2¯,B7A6¯,B7A8¯,B8A1¯,B8A3¯,

B8A5¯,B8A6¯,B8A7¯};

After the above seven steps, the single chain in tube T5 will encode all the routes from v1 to v1. For Figure 3, we have the DNA strands #A11B1A73B7A83B8A11B1# representing the route v1v7v8v1. This route goes through three vertices, but it doesn't go through all the point groups, so it doesn't satisfy the problem constraint. All we need are DNA strands that represent going through all of the points, so we have to remove the unqualified strands. Since each of the above operations executes in O(1) time [3943], this step can be finished in O(1).

3.4.2. Delete routes that do not pass through the point group

There is a limit to the feasible route of GTSP, and at least one vertex in each point group is passed. In other words, if a route does not contain any point group information, then it can't be a viable solution. We remove the unqualified chain by searching the feasible chains.

Algorithm 3.3.2: Select the route passing through each point group at least once.

For p=1 to p=m

  • (2-1)

    Separation(T5,p,T6);

  • (2-2)

    Discard(T5);

  • (2-3)

    Copy(T6,T5).

  • (2-4)

    Discard(T6).

End for

After this step, we delete the route chains that do not go through all the point groups. We consider in the Figure 3, which the value of m is equal to 3, so the DNA strand #A11B1A21B2A73B7A83B8A62B6A11B1# (representing the route of transportation v1v2v7v8v6v1) can be retained for passing 3 point groups. This operation uses one “For” clause, thus this step can be finished in O(m) since every single manipulation above works in O(1) time [3943].

3.4.3. Delete chains that pass through a vertex multiple times

We need to compare the weights of different routes and fill in the vertices in the graph that the route does not pass through. Since we require each vertex to be experienced at most once, we attach chains of information that represent vertices that the route does not pass through. If the length of the chain exceeds a certain length, this means that the corresponding chain represents a route that passes through a vertex multiple times, so the sum of the vertices required for each route should not exceed this limit.

Algorithm 3.3.3: Add vertices information that the route does not pass.

For j=2 to j=n

  • (3-1)

    Separation(T5,AjpjBj,T7);

  • (3-2)

    Appendhead(T5,ψψψ);

  • (3-3)

    Merge(T5,T7).

End for

  • (3-4)

    Selection(T5,(3n+5)t,T8).

After the above operations, we add the unpassed vertex information to the feasible chain. In Figure 3, we get the number of vertices is eight, and the chain length limit is 29t. Such as the singled strand

#A11B1A21B2A73B7A83B8A62B6A11B1#ψψψψψψψψψ33 is in the tube T8 after the step. This operation uses one “For” clause, thus this step can be finished in O(n) since each single manipulation above works in O(1) time [3943].

3.4.4. Add weight chains for different routes

To select the optimal route, we add the weight chains of corresponding routes to the end of DNA strands. We need to check the weight of the feasible route and then attach the X-chain to the end of the routing chain for comparison.

Algorithm 3.3.4: Add weight chains that the route pass.

For i=1 to i=n For j=1 to j=n

  • (4-1) Separation(T8,BjAi,T9);

  • (4-2-1) If(T9)

  • (4-2-2) Append(T9,XXXNumber:wij);

  • (4-2-3) Merge(T8,T9);

       Else

  • (4-3) Continue.

End for

End for

In Figure 3, after adding the weight chains, the route v1v2v7v6v1 is expressed as:

#A11B1A21B2A73B7A62B6A11B1#ψψψψψψψψψXXXW12XXXXW27XXW61XW76.

Meanwhile, we use two “For” clauses, thus this operation can be finished in O(n2).

3.4.5. Get the optimal solution chain

The optimal feasible solution chain refers to the minimum sum of edge weights. In the last test tube, we searched for the shortest DNA strands to represent the results of the GTSP. This operation works in O(1) steps.

Algorithm 3.3.5: Search the optimal solutions.

  • (5-1)

    Sort(T8,T10,T11);

  • (5-2)

    Read(T10).

We can get the shortest DNA strands for the Figure 3:

#A11B1A73B7A62B6A11B1#ψψψψψψψψψψψψψψψ15XXW17XXW61XW76

#A11B1A62B6A73B7A11B1#ψψψψψψψψψψψψψψψ15XXW17XXW61XW76.

So, the shortest route is either v1v7v6v1 or v1v6v7v1.

4. CORRECTNESS AND COMPUTING COMPLEXITY OF PROPOSED DNA ALGORITHMS

We demonstrate that the algorithm can obtain the solution of GTSP with O(n2) steps by DNA molecular operations.

Theorem 1.

The GTSP can be solved by the proposed DNA algorithms.

Proof.

We use different DNA biological chains to represent different vertices, point groups and weights to synthesize complete single strands. By eliminating the unsatisfactory chains in all possible paths, all possible DNA chains are obtained, and then through corresponding DNA operations, the solution to the GTSP is obtained. Specifically, we divided it into five steps: Step 1 generates a set T5 containing all possible solutions of GTSP. Through a series of DNA operations in Step 2, the chains in the set that do not pass through all the groups of points will be deleted. By Step 3, we attached the strand ψψψ to the DNA chains tail to ensure that the lengths of the strands are the same for the different routes. In Step 4, we attach profit chain X to the end of the corresponding chain to represent the weight value of the passing route. We read out the optimal solution that meets the requirements of the problem in Step 5.

Theorem 2.

Using DNA molecular algorithms, GTSP can be solved in O(n2) time complexity level.

Proof.

The complexity of each biological operation is within O(1) time [3943]. Algorithms 3.3.1 and 3.3.5 take O(1) time complexity. Due to the m<n, the Algorithm 3.3.2 takes time complexity is less than O(n). Algorithms 3.3.3 and 3.3.4 take O(n) and O(n2) time complexity respectively. The time complexity T of the algorithm is as follows:

Hence, the algorithms solve the GTSP in O(n2) time complexity level.

5. SIMULATION EXPERIMENT OF DNA ALGORITHMS

In order to prove the feasibility of the algorithm, the simulation experiment is an indispensable part. Our simulation experiments are mainly carried out in three aspects. One is to reasonably design the DNA representation of each element in the problem, which plays a decisive role in the accuracy of the experiment. The second is to reduce the hybridization error rate in the experimental steps to ensure that the corresponding information can be read quickly and accurately. Finally, the process and results of the experiment are presented.

The calculation of DNA depends on the accuracy of the operation of biochemical molecules; otherwise, it will lead to the accumulation and expansion of errors in biochemical reactions. Meanwhile, DNA sequences that may encourage unexpected probe library hybridization should be excluded. Therefore, designing a suitable DNA sequence is a necessary basis for ensuring accuracy with DNA calculations. To achieve these objectives, seven constraints for DNA sequence design were proposed [44]. These limitations include that long homopolymer chains may have unusual secondary structures; if there is no long homopolymer beam, the melting temperature of the probe library mixture will be more uniform; the probe will only bind weakly where it is not intended to bind; and the affinity of the library chain to itself is very low, etc. To this end, we used the sequence design methods in Ref. [44].

In this paper, we use the computational molecular biology tool Biopython as the system development platform, and Braich's program to generate “appropriate DNA sequences” suitable for biological computing algorithm, which can be used to solve the GTSP. Because the original Braich's program did not find the source code of two functions srante48() and drand48(), the standard function srand() was used instead of function srand48() and the source code of function drand48() was added. Our modified program is used to construct a random sequence of 4 bases for each bit of the library, and to check whether the library chain meets the seven constraints of DNA sequence [44]. If the produced DNA sequence does not meet the restrictions, the program will continue to generate another new sequence. When these restrictions can be met, the sequence is accepted for subsequent biochemical reactions. Using this method, we can get the “appropriate DNA sequences” which meets the restriction conditions to improve the accuracy of biochemical reactions [4548].

Therefore, taking the GTSP (Figure 3) as an example, the program generates random four-base sequences to form Ai, Bi, #, X and ψ as shown in Table 2. Table 3 demonstrates the DNA node sequence composed by improved Braich's methods.

Bit 35 DNA Sequence Bit 35 DNA Sequence
A1 ACAT B1 GTCC
A2 CTAT B2 GGCT
A3 TGTC B3 ATCC
A4 GAAT B4 ACTG
A5 AATC B5 CCGT
A6 GCTA B6 CTTC
A7 GCGT B7 AGCC
A8 TTGT B8 TAGT
1 TAAG 2 CTGA
3 AGTC # CGCT
X CAGC ψ TGCT
Table 2

Sequences chosen to represent Ai, Bi, pi, #, X and ψ(i{1,2,,8}) for the generalized traveling salesman problem (GTSP) in Figure 3.

Bit 35 DNA Sequence Bit 35 DNA Sequence
A11B1 ACATTAAGGTCC A21B2 CTATTAAGGGCT
A31B3 TGTCTAAGATCC A42B4 GAATCTGAACTG
A52B5 AATCCTGACCGT A62B6 GCTACTGACTTC
A73B7 GCGTAGTCAGCC A83B8 TTGTAGTCTAGT
Table 3

Sequences chosen to represent the elements AipiBi(i{1,2,,8}) for the generalized traveling salesman problem (GTSP) in Figure 3.

In Table 4, we also calculate the enthalpy, entropy and free energy of the binding of each probe to the corresponding region in the chains. Their average deviation and standard deviation levels are also shown in Table 4. Routes that pass through no more than two vertices of each point group in Figure 3 are shown in Table 5 (due to there are too many possible routes, we only represent some of them). In the simulation experiment, we derive the optimal solution of the GTSP through the composition structure of the last selected DNA sequences in Table 6.

Vertex Enthalpy Energy H Entropy Energy S Free Energy G
A11B1 110.5 287.7 24.6
A21B2 95.1 242.9 25.1
A31B3 103.3 272.1 23.6
A42B4 98.4 247.3 23.3
A52B5 109.6 281.1 25.9
A62B6 105.2 276.2 23.1
A73B7 104.5 271.9 23.2
A83B8 99.7 251.3 23.1
Average 103.288 266.313 23.989
Standard deviation 5.356 16.791 1.075
Table 4

The energies over all probe/library strand interactions.

Routing DNA Strands
v1v4v5v8 3CGCTACATTAAGGTCCGAATCTGAACTGAATCC
v1 TGACCGTTTGTAGTCTAGTACATTAAGGTCCCGCT5
v1v4v5v8 3CGCTACATTAAGGTCCGAATCTGAACTGAATCCTGACCGT
v3v1 TTGTAGTCTAGTTGTCTAAGATCCACATTAAGGTCCCGCT5
v1v4v5v8 3CGCTACATTAAGGTCCGAATCTGAACTGAATCCTGACCGT
v7v1 TTGTAGTCTAGTGCGTAGTCAGCCACATTAAGGTCCCGCT5
v1v4v5v8 3CGCTACATTAAGGTCCGAATCTGAACTGAATCCTGACCGTTTGTAG
v7v2v1 TCTAGTGCGTAGTCAGCCCTATTAAGGGCTACATTAAGGTCCCGCT5
v1v6v8v1 3CGCTACATTAAGGTCCGCTACTGACTTC
TTGTAGTCTAGTACATTAAGGTCCCGCT5
v1v6v8v5 3CGCTACATTAAGGTCCGCTACTGACTTCTTGTAG
v1 TCTAGTAATCCTGACCGTACATTAAGGTCCCGCT5
v1v6v8v7 3CGCTACATTAAGGTCCGCTACTGACTTCTTGTAG
v1 TCTAGTGCGTAGTCAGCCACATTAAGGTCCCGCT5
v1v6v8v7 3CGCTACATTAAGGTCCGCTACTGACTTCTTGTAGTCTAGT
v2v1 GCGTAGTCAGCCCTATTAAGGGCTACATTAAGGTCCCGCT5
v1v3v5v8 3CGCTACATTAAGGTCCTGTCTAAGATCCAATCCTGACC
v3v1 GTTTGTAGTCTAGTTGTCTAAGATCCACATTAAGGTCC5
v1v3v5v8 3CGCTACATTAAGGTCCTGTCTAAGATCCAATC
v1 CTGACCGTTTGTAGTCTAGTACATTAAGGTCC5
v1v3v5v8 3CGCTACATTAAGGTCCTGTCTAAGATCCAATCCTGACC
v7v1 GTTTGTAGTCTAGTGCGTAGTCAGCCACATTAAGGTCC5
v1v2v6v8 3CGCTACATTAAGGTCCCTATTAAGGGCTGCTACTGACTTCTTGTAG
v7v2v1 TCTAGTGCGTAGTCAGCCCTATTAAGGGCTACATTAAGGTCCCGCT5
v1v2v6v8 3CGCTACATTAAGGTCCCTATTAAGGGCTGCTACT
v1 GACTTCTTGTAGTCTAGTACATTAAGGTCCCGCT5
v1v2v6v8 3CGCTACATTAAGGTCCCTATTAAGGGCTGCTACTGACTTC
v7v1 TTGTAGTCTAGTGCGTAGTCAGCCACATTAAGGTCCCGCT5
v1v2v6v5 3CGCTACATTAAGGTCCCTATTAAGGGCTGCTACTGACTTC
v8v1 AATCCTGACCGTTTGTAGTCTAGTACATTAAGGTCCCGCT5
v1v2v6v5 3CGCTACATTAAGGTCCCTATTAAGGGCTGCTACTGACTTCAATCCT
v8v7v1 GACCGTTTGTAGTCTAGTGCGTAGTCAGCCACATTAAGGTCCCGCT5
v1v2v6 3CGCTACATTAAGGTCCCTATTAAGGGCTGCTACTG
v5v8v7 ACTTCAATCCTGACCGTTTGTAGTCTAGTGCGTAG
v2v1 TCAGCCCTATTAAGGGCTACATTAAGGTCCCGCT5
Table 5

Routes that pass through no more than two vertices of each point group in Figure 3.

Routing DNA Strands
v1v7v6v1 3CGCTACATTAAGGTCCGCGTAGTCAGCCGCTAC
TGACTTCACATTAAGGTCCCGCTTGCTTGCTTGCTT
GCTTGCTTGCTTGCTTGCTTGCTTGCTTGCTTGCTT
GCTTGCTTGCTCAGCCAGCCAGCCAGCCAGC5
v1v6v7v1 3CGCTACATTAAGGTCCGCTACTGACTTCGCGTA
GTCAGCCACATTAAGGTCCCGCTTGCTTGCTTGCTT
GCTTGCTTGCTTGCTTGCTTGCTTGCTTGCTTGCTT
GCTTGCTTGCTCAGCCAGCCAGCCAGCCAGC5
Table 6

DNA sequences chosen to represent the solutions to the generalized traveling salesman problem (GTSP) in Figure 3.

6. CONCLUSIONS

This paper presents DNA biological algorithms for solving GTSP based on the Adleman–Lipton model. In this process, the biological operations are used to produce combined results and to perform screening solutions. Advantages of using DNA algorithms are as follows: first, the algorithms are based on DNA molecules with strong parallel computing power, large storage capacity and low hybridization error rate as we use the specific reasonable coding DNA sequences. Secondly, the proposed algorithms can solve the GTSP at O(n2) time complexity level. At present, there are few methods to solve the second kind of GTSP, and most of them have the disadvantage of high computational complexity. For example, aiming at the second type of GTSP, Zhao [16] proposed a new GA for adding virtual vertices to generalized chromosomes, that the second kind of GTSP can be solved by the virtual vertex algorithm after a computation transformation. In 2013, Tan et al. [49] transformed the GTSP of the second type into the GTSP of the first type by reconstructing the algorithm of the distance matrix. They then used the HCGA algorithm to solve the transformed GTSP of the first type, thus indirectly solved the original problem. These two methods are the second type of GTSP transformation in the solution. In comparison, the algorithm proposed by the Adleman–Lipton model does not require transformation, and it is easier to understand and operate directly. Ant colony optimization algorithm [20] introduces individual variation change route selection. However, as this algorithm is a typical probability algorithm, parameter setting may affect the search time. Garg et al. [50] proposed a famous approximate algorithm for GTSP with n vertices and m point groups to generate approximate factor O(log2(n)log(log(n))log(m)). Bontoux et al. [51] addresses the solution of the GTSP using a dynamic programming algorithm with the O(n2m) time complexity. Pintea et al. [52] presented an effective meta-heuristic algorithm for solving the problem with the O((p1)!(nm+nlogn)) time complexity. In this paper, this algorithm is compared with other heuristic algorithms, and the results of the test problem are given (Table 7). Limited by the current biological computer simulation technology, DNA computing can not be compared by computational simulation. However, our DNA algorithm has many other advantages, such as generality, low energy consumption, efficiency, reducing time and space complexity and fault tolerance. The application potential of DNA molecular computing technology is huge. However, there are still many practical problems to be solved in the research of DNA Algorithm for different problems. Firstly, the complexity of the algorithm implementation, especially the coding ability, may be limited to a certain extent. Secondly, due to the instability of biochemical reaction, there may be some missed solutions, false solutions and wrong solutions. Our results are based on a theoretical model, which presents a DNA strand coding method, simulates DNA experiments and solves the GTSP. Each of the DNA operation used has been implemented at the laboratory level so that the method can be applied in practice. For complex NP problems, the algorithm can also be solved, but the experiment will be more complex. In previous studies, many researchers have used the same perspective to analyze the complexity of DNA computing to solve different problems [5356]. It needs to develop more powerful multifunctional DNA computing systems. Most DNA computing models can only address a specific type of problem, and the lack of a universal computing system hinders the large-scale promotion of DNA computing. This will lead us to more in-depth research and more challenging development in the field of biotechnology.

Algorithm Time Complexity
DNA algorithm O(n2)
Garg et al. [50] O(log2(n)log(log(n))log(m))
Bontoux et al. [51] O(n2m)
Pintea et al. [52] O((p1)!(nm+nlogn))
Table 7

Time complexity of different algorithms for generalized traveling salesman problem (GSTP).

In recent years, the research of DNA computing has always been a hot spot. Although there are still some difficulties to be broken through, it is undeniable that DNA has the advantages of storage capability, parallel capability and stability. Further research can consider applying the proposed algorithm to other TSP expansion problems, including mixed Chinese postman problem (MCPP), asymmetric traveling salesman problem (ATSP), generalized covering traveling salesman problem (GCTSP), etc., and trying to establish a unified and standardized model to solve them. We need to explore better ways to combine DNA computing with other computing methods in intelligent systems. We hope that this technological challenge will bring greater progress in science and technology. Moreover, in the development of DNA computing, information recognition, judgment and bio-chain screening can help us to comprehend the origin of computing more deeply, and also promote the process of DNA computing, making it as one of the potential parallel computing methods to solve more complex large data problems.

CONFLICTS OF INTEREST

The authors declare that they have no competing interests.

AUTHORS' CONTRIBUTIONS

All authors contributed to the work. All authors read and approved the manuscript.

ACKNOWLEDGMENTS

It is supported by the Open Research Fund of State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research (grant No. IWHR-SKL-201905). The project is also funded by research Start-up project of Wenzhou Business School (grant No. RC202002).

REFERENCES

1.A.L. Henry-Labordere, The record balancing problem: a dynamic programming solution of a generalized traveling salesman problem, RAIRO, Vol. 2, 1969, pp. 43-49.
2.J.P. Saksena, Mathematical model of scheduling clients through welfare agencies, CORS J., Vol. 8, 1970, pp. 185-200.
3.S.S. Srivastava, S. Kumar, R.C. Garg, and P. Sen, Generalized traveling salesman problem through n sets of nodes, CORS J., Vol. 7, 1969, pp. 97-101.
Journal
International Journal of Computational Intelligence Systems
Volume-Issue
14 - 1
Pages
228 - 237
Publication Date
2020/12/03
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891
DOI
10.2991/ijcis.d.201127.001How to use a DOI?
Copyright
© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

Cite this article

TY  - JOUR
AU  - Xiaomin Ren
AU  - Xiaoming Wang
AU  - Zhaocai Wang
AU  - Tunhua Wu
PY  - 2020
DA  - 2020/12/03
TI  - Parallel DNA Algorithms of Generalized Traveling Salesman Problem-Based Bioinspired Computing Model
JO  - International Journal of Computational Intelligence Systems
SP  - 228
EP  - 237
VL  - 14
IS  - 1
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.d.201127.001
DO  - 10.2991/ijcis.d.201127.001
ID  - Ren2020
ER  -