Bibliography on Parallel Multiple Sequence Alignment

QuickSearch:   Number of matching entries: 0.

Search Settings

    AuthorTitleYearJournal/ProceedingsBibTeX typeDOI/URL
    Ahmed, N., Pan, Y. & Vandenberg, A. Parallel Algorithm for Multiple Genome Alignment on the Grid Environment 2005 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), pp. 7  inproceedings DOI  
    Abstract: The multiple genome sequence alignment problem falls in the domain of problems that can be parallelized to address large sequence lengths. Although there is communication required for the computation of the aligned sequences, the proper distribution can reduce the overall problem to a set of tasks to be solved independently and then merged. A parallel algorithm for the alignment of multiple genome sequences is described. The algorithm is experimentally evaluated in a distributed Grid environment that provides very scalable and low cost computation performance. The Grid environment is evaluated with respect to a traditional cluster environment and results are compared to evaluate the effectiveness of a Grid environment for large computational biology.
    Review: The multiple sequence alignment algorithm appears to be based on Taylor's MULTAL algorithm [Taylor1987]. Although the title speaks of genome alignment, no anchor-based methods are used. Anchor methods are typically used in genome class alignment.
    BibTeX:
    @inproceedings{Ahmed2005,
      author = {Nova Ahmed and Yi Pan and Art Vandenberg},
      title = {Parallel Algorithm for Multiple Genome Alignment on the Grid Environment},
      booktitle = {Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05)},
      year = {2005},
      pages = {7},
      doi = {http://dx.doi.org/10.1109/IPDPS.2005.324}
    }
    
    Anbarasu, L.A., Narayanasamy, P. & Sundararajan, V. Multiple Sequence Alignment Using Parallel Genetic Algorithms 1999 Simulated Evolution and Learning, Vol. LNCS 1585, pp. 130-137  inproceedings DOI  
    Abstract: An efficient approach to solve multiple sequence alignment problem is presented in this paper. This approach is based on parallel genetic algorithm (PGA) that runs on a networked parallel environment. The algorithm optimizes an objective function `weighted sums of pairs' which measures alignment quality. Using isolated independent subpopulations of alignments in a quasi evolutionary manner this approach gradually improves the fitness of the subpopulations as measured by an objective function. This parallel approach is shown to perform better than the sequential approach and an alternative method, ClustalW. An investigation of the parameters of the algorithm further confirms the better performance.
    Review: See [Anbarasu2000]. An island genetic algorithm is used. The implementation uses C and PVM. The PARAM 10000 system has 40 nodes (SUN Ultra450) with 4 CPUs per node running at 300 MHz. The paper doesn't specify the number of nodes used, but most likely 5 were used for the 5 subpopulations of 20 individuals. For a slightly better score on 15 sequences of length 292, PGA runs in 2815 seconds while ClustalW runs in 22.35 seconds. That is 10 times slower with 5 processors.
    BibTeX:
    @inproceedings{Anbarasu1999,
      author = {L.~A. Anbarasu and P. Narayanasamy and V. Sundararajan},
      title = {Multiple Sequence Alignment Using Parallel Genetic Algorithms},
      booktitle = {Simulated Evolution and Learning},
      publisher = {Springer Berlin / Heidelberg},
      year = {1999},
      volume = {LNCS 1585},
      pages = {130-137},
      doi = {http://dx.doi.org/10.1007/3-540-48873-1_18}
    }
    
    Anbarasu, L.A., Narayanasamy, P. & Sundararajan, V. Multiple molecular sequence alignment by island parallel genetic algorithm 2000 Current Science, Vol. 78(7), pp. 858-863  article URL 
    Abstract: This paper presents an evolution-based approach for solving multiple molecular sequence alignment. The approach is based on the island parallel genetic algorithm that relies on the fitness distribution over the population of alignments. The algorithm searches for an alignment among the independent isolated evolving populations by optimizing weighted sum of pairs objective function which measures the alignment quality. The parallel approach is implemented on PARAM 10000, a parallel machine developed at the Center for Development of Advanced Computing, Pune, and is shown to consistently perform better than the sequential genetic algorithm. The algorithm yields alignments that are qualitatively better than an alternative method, ClustalW.
    Review: island Parallel Genetic Algorithm (iPGA). The PARAM 10000 system has 40 nodes (SUN Ultra450) with 4 CPUs per node running at 300 MHz. The implementation uses C and PVM. No run times or speedup is given. Quality scores are slightly better than ClustalW.
    BibTeX:
    @article{Anbarasu2000,
      author = {L.~A. Anbarasu and P. Narayanasamy and V. Sundararajan},
      title = {Multiple molecular sequence alignment by island parallel genetic algorithm},
      journal = {Current Science},
      year = {2000},
      volume = {78},
      number = {7},
      pages = {858-863},
      url = {http://www.ias.ac.in/currsci/apr102000/researcharticles2.pdf}
    }
    
    Araki, S., Goshima, M., ichiro Mori, S., Nakashima, H., Tomita, S., Akiyama, Y. & Kanehisa, M. Application of Parallelized DP and A* Algorithm to Multiple Sequence Alignment 1993 Proceedings of the Genome Informatics Workshop IV, pp. 94-102  inproceedings URL 
    Abstract: This paper makes two proposals to speed up the Parallel Iterative Method, which is based on the iterative strategy of the Berger-Munson algorithm. The first proposal is to exploit finer-grained parallelism in the DP(Dynamic Programming) procedure itself. This proposal makes the processing speed proportional to the number of processors. The second proposal is to apply the A* algorithm, a well known heuristic search algorithm, instead of DP. A* reduces the search space using heuristics, while DP traverses the whole space blindly. We have implemented these two proposals on a parallel computer, the AP1000. In a test of parallelizing DP, ten 1000-character sequences are aligned by using 10 processors per one DP procedure at a speed 8.11 times faster than sequential processing. By applying the A* algorithm to 30 sets of test problems, we obtain optimal alignment by reducing the search space by 95%.
    Review: See [Ishikawa1992] and [Araki1993] and [Yap1995] and [Martino1997] and [Yap1998] Section 4. The Berger-Munson algorithm is used. One contribution is to further parallelize the "parallel iterative method" of [Hoshida1992] and [Ishkawa1992b] by dividing the DP matrix column wise for execution on different processors. A speedup of 18 is realized for sequences of length 1000 with 30 processors on an AP1000 distributed memory computer. The other contribution is to use an A* algorithm instead of DP.
    BibTeX:
    @inproceedings{Araki1993,
      author = {Shiho Araki and Masahiro Goshima and Shin-ichiro Mori and Hiroshi Nakashima and Shinji Tomita and Yutaka Akiyama and Minoru Kanehisa},
      title = {Application of Parallelized DP and A* Algorithm to Multiple Sequence Alignment},
      booktitle = {Proceedings of the Genome Informatics Workshop IV},
      year = {1993},
      pages = {94-102},
      url = {http://www.genome.jp/manuscripts/GIW93/Oral/GIW93O11.html}
    }
    
    Aung, Y.L., Maskell, D.L., Oliver, T.F., Schmidt, B. & Bong, W. C-Based Design Methodology for FPGA Implementation of ClustalW MSA 2007 Pattern Recognition in Bioinformatics, PRIB 2007, Vol. LNBI 4774, pp. 11-18  inproceedings DOI  
    Abstract: Systolisation of the pairwise distance computation algorithm and mapping into field programmable gate arrays (FPGA) have proven to give superior performance at a lower cost, compared to the same algorithm running on a cluster of workstations. The primary design methodology for this approach is based on the hardware description languages such as VHDL and Verilog HDL. An alternative design methodology, however, is the use of a high level language such as C to describe the algorithms and generate equivalent hardware descriptions for implementation in FPGA so as to reduce time to market. This paper describes the design and implementation of the ClustalW first stage multiple sequence alignment based on the Smith-Waterman algorithm on a low cost FPGA development platform using a C language development tool suite. Performance evaluation results show that comparable performance could be achieved to that of Pentium 4 systems and other HDL-based solutions using even the smallest commercially available FPGA device with this design methodology.
    Review: Rating:** Impulse C and no HDL was used to develop the system. Hardware/software partitioning was done with Impulse CoDeveloper. Only Stage 1 is accelerated. Traceback in Stage 1 is avoided by calculating the number of identical elements (NIDs) directly. See [Oliver2005a] for the introduction of NID counting. ClustalW is accelerated on a ML403 platform with a Xilinx Virtex-4 FX FPGA. The system can only handle sequences with length up to 550. Up to three PEs take two sequences and return a score based upon the NIDs. The Stage 1 only speedup is 2.4x over a 2 GHz Pentium 4. The PEs and MicroBlaze controller run at 75 MHz. This paper really shouldn't fit into the MSA class since no overall performance was given. The only tie is that the NID score for ClustalW is calculated instead of the sum of match scores for pairwise alignment. They implemented affine gaps. See [Pellerin2007] for a similar work.
    BibTeX:
    @inproceedings{Aung2007,
      author = {Yan~Lin Aung and Douglas~L. Maskell and Timothy~F. Oliver and Bertil Schmidt and William Bong},
      title = {C-Based Design Methodology for FPGA Implementation of ClustalW MSA},
      booktitle = {Pattern Recognition in Bioinformatics, PRIB 2007},
      publisher = {Springer Berlin / Heidelberg},
      year = {2007},
      volume = {LNBI 4774},
      pages = {11-18},
      doi = {http://dx.doi.org/10.1007/978-3-540-75286-8_2}
    }
    
    Bai, J. & Rezaei, S. Multithreaded Multiple Sequence Alignments 2005 27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS 2005, pp. 2863-2866  inproceedings DOI  
    Abstract: Based on the mixed idea of progressive alignment and divide-and-conquer alignment, two different multithreaded multiple sequence alignment programs, depending on how the guide tree(s) would be applied, are implemented for checking the improvements of alignment speed and sensitivity. The single-tree alignment builds a uniformed guide tree for the full-length sequences at the beginning, which will be used by all the sub-alignments as guide tree. In multiple-tree alignment, sequences will be firstly cut into pieces and these sub-sequences will build their own guide trees to guide their individual alignments. Multiple-tree alignment seems having a better speedup performance than the single tree alignment, but neither of them, at this stage, shows ideal sensitivity results as the number of threads increases. Therefore, some heuristic methods for fixing the cut points were suggested for future improvement, such as overlapping alignment and sliding window alignment.
    Review: After dividing the sequences into smaller subsequences, a progressive alignment is performed on the subsequences. The paper claims this is the first time that different guide trees are considered for each set of subsequences.
    BibTeX:
    @inproceedings{Bai2005,
      author = {Joanne Bai and Siamak Rezaei},
      title = {Multithreaded Multiple Sequence Alignments},
      booktitle = {27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS 2005},
      year = {2005},
      pages = {2863-2866},
      doi = {http://dx.doi.org/10.1109/IEMBS.2005.1617071}
    }
    
    Boukerche, A., Correa, J.M., de Melo, A.C.M.A., Jacobi, R.P. & Rocha, A.F. An FPGA-Based Accelerator for Multiple Biological Sequence Alignment with DIALIGN 2007 High Performance Computing, HiPC 2007, Vol. LNCS 4873, pp. 71-82  inproceedings DOI  
    Abstract: Multiple sequence alignment (MSA) is a very important problem in Computational Biology since it is often used to identify evolutionary relation-ships among the organisms and predict secondary/tertiary structure. Since MSA is known to be a computationally challenging problem, many proposals were made to accelerate it either by using parallel processing or hardware accelerators. In this paper, we propose an FPGA based accelerator to execute the most compute intensive part of DIALIGN, an iterative method to obtain multiple sequence alignments. The experimental results collected in our 200-element FPGA prototype show that a speedup of 383.41 was obtained when compared with the software implementation.
    Review: Rating:*** DIALIGN is accelerated with an FPGA. The contribution is a hardware-based, pairwise alignment method that finds diagonals. Since only diagonals are found, no gaps are computed. A pairwise alignment in DIALIGN is defined to be a chain of diagonals. A peak speedup of 383.4 was achieved on a pairwise comparison that could be used in the first stage of DIALIGN. No overall application speedup comparison is given, but is left for future work.
    BibTeX:
    @inproceedings{Boukerche2007,
      author = {Azzedine Boukerche and Jan~Mendonca Correa and de~Melo, Alba~Cristina Magalhaes Alves and Ricardo~Pezzuol Jacobi and Adson~Ferreira Rocha},
      title = {An FPGA-Based Accelerator for Multiple Biological Sequence Alignment with DIALIGN},
      booktitle = {High Performance Computing, HiPC 2007},
      publisher = {Springer Berlin / Heidelberg},
      year = {2007},
      volume = {LNCS 4873},
      pages = {71-82},
      doi = {http://dx.doi.org/10.1007/978-3-540-77220-0_11}
    }
    
    Bradley, R.K., Roberts, A., Smoot, M., Juvekar, S., Do, J., Dewey, C., Holmes, I. & Pachter, L. Fast Statistical Alignment 2009 PLoS Computational Biology, Vol. 5(5), pp. e1000392  article DOI  
    Abstract: We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment---previously available only with alignment programs which use computationally expensive Markov Chain Monte Carlo approaches---yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa. sourceforge.net/.
    Review: An Author Summary is given in the online version.
    BibTeX:
    @article{Bradley2009,
      author = {Bradley, Robert K. AND Roberts, Adam AND Smoot, Michael AND Juvekar, Sudeep AND Do, Jaeyoung AND Dewey, Colin AND Holmes, Ian AND Pachter, Lior},
      title = {Fast Statistical Alignment},
      journal = {PLoS Computational Biology},
      publisher = {Public Library of Science},
      year = {2009},
      volume = {5},
      number = {5},
      pages = {e1000392},
      doi = {http://dx.doi.org/10.1371/journal.pcbi.1000392}
    }
    
    Catalyurek, U., Stahlberg, E., Ferreira, R., Kurç, T.M. & Saltz, J.H. Improving Performance of Multiple Sequence Alignment Analysis in Multi-Client Environments 2002 Proceedings of the 16th International Parallel and Distributed Processing Symposium, IPDPS '02, pp. 183-190  inproceedings DOI  
    Abstract: This paper is concerned with the efficient execution of multiple sequence alignment methods in a multiple client environment. Multiple sequence alignment (MSA) is a computationally expensive method, which is commonly used in computational and molecular biology. Large databases of protein and gene sequences are available to the scientific community. Oftentimes, these databases are accessed by multiple users to execute MSA queries. The data server has to handle multiple concurrent queries in such situations. We look at the effect of data caching on the performance of the data server. We describe an approach for caching intermediate results for reuse in subsequent or concurrent queries. We focus on progressive alignment-based strategies, in particular the CLUSTAL W algorithm. Our results for 350 sets of sequences show an average speedup of up to 2.5 is obtained by caching intermediate results. Our results also show that the cache-enabled CLUSTAL W program scales well on a SMP machine.
    Review: Rating:** Pairwise alignments are cached so that alignment jobs from several clients may get pairwise scores from cache instead of recomputing them. Multiple instances of ClustalW 1.82 are run on a shared memory data server (``24-processor Sun Fire 6800 system, equipped with 750 MHz CPUs and 48 GB of main memory''). From Figure 2 and 3, a speedup of approximately 2.5 is realized with the cache over several MSA jobs.
    BibTeX:
    @inproceedings{Catalyurek2002,
      author = {Umit Catalyurek and Eric Stahlberg and Renato Ferreira and Tahsin~M. Kurç and Joel~H. Saltz},
      title = {Improving Performance of Multiple Sequence Alignment Analysis in Multi-Client Environments},
      booktitle = {Proceedings of the 16th International Parallel and Distributed Processing Symposium, IPDPS '02},
      publisher = {IEEE Computer Society},
      year = {2002},
      pages = {183-190},
      doi = {http://dx.doi.org/10.1109/IPDPS.2002.1016584}
    }
    
    Catalyurek, U., Gray, M., Kurç, T., Saltz, J., Stahlberg, E. & Ferreira, R. A Component-Based Implementation of Multiple Sequence Alignment 2003 Proceedings of the 2003 ACM Symposium on Applied Computing, SAC '03, pp. 122-126  inproceedings DOI  
    Abstract: This paper addresses the efficient execution of a Multiple Sequence Alignment (MSA) method, in particular the progressive alignment-based CLUSTAL W algorithm, on a cluster of workstations. We describe a scalable component-based implementation of CLUSTAL W program targeting distributed memory machines and multiple query workloads. We look at the effect of data caching on the performance of the data server. We present a distributed, persistent cache approach for caching intermediate results for reuse in subsequent or concurrent queries. Our initial results show that the cache-enabled CLUSTAL W program scales well on a cluster of workstations.
    Review: Rating:** A component-based framework called DataCutter is used to manage work loads and communication in a distributed system. Pairwise alignments are cached as is done in [Catalyurek2002]. The system consists of 24 Linux PCs with 933 MHz Pentium III CPUs. For any one alignment job (or query as they call it) only the first-stage alignment is accelerated by distributing the pairwise alignment tasks and caching the results for other queries. From Figure 6a, the average speedup with no cache hits is approximately 3.3 using 8 pairwise alignment compute filters (work tasks). The speedup is about 5 with a 100% cache hit rate.
    BibTeX:
    @inproceedings{Catalyurek2003,
      author = {Umit Catalyurek and Mike Gray and Tahsin Kurç and Joel Saltz and Eric Stahlberg and Renato Ferreira},
      title = {A Component-Based Implementation of Multiple Sequence Alignment},
      booktitle = {Proceedings of the 2003 ACM Symposium on Applied Computing, SAC '03},
      publisher = {ACM},
      year = {2003},
      pages = {122-126},
      doi = {http://dx.doi.org/10.1145/952532.952559}
    }
    
    Chaichoompu, K., Kittitornkun, S. & Tongsima, S. MT-ClustalW: multithreading multiple sequence alignment 2006 20th International Parallel and Distributed Processing Symposium, IPDPS 2006, pp. 8  inproceedings DOI  
    Abstract: ClustalW is the most widely used tool for aligning multiple protein or nucleotide sequences. The alignment is achieved via three stages: pairwise alignment, guide tree generation and progressive alignment. This paper analyzes and enhances a multithreaded implementation of ClustalW called ClustalW-SMP for higher throughput. Our goal is to maximize the degree of parallelism on multithreading ClustalW called MultiThreading-ClustalW (MT-ClustalW). As a result, bioinformatics laboratories are able to use this MT-ClustalW with much less energy consumption on multicore and SMP (symmetric multiprocessor) machines than that of PC clusters. The experiment results show that the MT-ClustalW framework can achieve a considerable speedup over the sequential ClustalW and original multithreaded ClustalW-SMP implementations.
    Review: Stage 2 of ClustalW is run in parallel with pthreads going beyond ClustalW-SMP [Duzlevski2002] that only parallelized Stages 1 and 3. MT-Clustal is based on ClustalW-SMP (0.99-9) which is based on ClustalW (1.82). A speedup of about 67% is achieved with 8 threads running on a dual-core 2.8 GHz Pentium D with 2GB of memory using 800 protein sequences of length 200.
    BibTeX:
    @inproceedings{Chaichoompu2006a,
      author = {Kridsadakorn Chaichoompu and Surin Kittitornkun and Sissades Tongsima},
      title = {MT-ClustalW: multithreading multiple sequence alignment},
      booktitle = {20th International Parallel and Distributed Processing Symposium, IPDPS 2006},
      year = {2006},
      pages = {8},
      doi = {http://dx.doi.org/10.1109/IPDPS.2006.1639537}
    }
    
    Chaichoompu, K. & Kittitornkun, S. Multithreaded ClustalW with Improved Optimization for Intel Multi-core Processor 2006 International Symposium on Communications and Information Technologies, ISCIT '06, pp. 590-594  inproceedings DOI  
    Abstract: This paper presents the methodology that assists the compiler to optimize ClustalW; the most widely used tool for aligning multiple text-based protein or nucleotide sequences in Bioinformatics. Our goal is to minimize latency and maximize the throughput of execution on multithreading ClustalW called MT-ClustalW: our previous work. As a result, optimized MT-ClustalW is able to fully utilize the machine resources and achieves higher throughput on multicore computers. The experiment results show that our methodology can assist the compiler to optimize the code better than only compiler-optimization and achieve over 2 times faster than the sequential ClustalW. Finally, we analyze the overall result with Amdahl's Law.
    Review: MT-ClustalW. A 2.12x speedup is achieved with 4 threads on a dual-core 2.8 GHz Pentium D processor for 800 proteins of length 1000. The method uses pthreads and parallelizes all three stages. The Intel compiler is used with the /QxP option to get SSE3 optimization.
    BibTeX:
    @inproceedings{Chaichoompu2006b,
      author = {Kridsadakorn Chaichoompu and Surin Kittitornkun},
      title = {Multithreaded ClustalW with Improved Optimization for Intel Multi-core Processor},
      booktitle = {International Symposium on Communications and Information Technologies, ISCIT '06},
      year = {2006},
      pages = {590-594},
      doi = {http://dx.doi.org/10.1109/ISCIT.2006.340018}
    }
    
    Chaichoompu, K., Kittitornkun, S. & Tongsima, S. Speedup Bioinformatics Applications on Multicore-Based Processor Using Vectorizing and Multithreading Strategies 2007 Bioinformation, Vol. 2(5), pp. 182-184  article URL 
    Abstract: Many computational intensive bioinformatics software, such as multiple sequence alignment, population structure analysis, etc., written in C/C++ are not multicore-aware. A multicore processor is an emerging CPU technology that combines two or more independent processors into a single package. The Single Instruction Multiple Data-stream (SIMD) paradigm is heavily utilized in this class of processors. Nevertheless, most popular compilers including Microsoft Visual C/C++ 6.0, x86 gnu C-compiler gcc do not automatically create SIMD code which can fully utilize the advancement of these processors. To harness the power of the new multicore architecture certain compiler techniques must be considered. This paper presents a generic compiling strategy to assist the compiler in improving the performance of bioinformatics applications written in C/C++. The proposed framework contains 2 main steps: multithreading and vectorizing strategies. After following the strategies, the application can achieve higher speedup by taking the advantage of multicore architecture technology. Due to the extremely fast interconnection networking among multiple cores, it is suggested that the proposed optimization could be more appropriate than making use of parallelization on a small cluster computer which has larger network latency and lower bandwidth.
    Review: A methodology to profile, analyze, apply threading, vectorize, and validate is given. See [Chaichoompu2006b] for the test case with MT-ClustalW.
    BibTeX:
    @article{Chaichoompu2007,
      author = {Kridsadakorn Chaichoompu and Surin Kittitornkun and Sissades Tongsima},
      title = {Speedup Bioinformatics Applications on Multicore-Based Processor Using Vectorizing and Multithreading Strategies},
      journal = {Bioinformation},
      year = {2007},
      volume = {2},
      number = {5},
      pages = {182-184},
      note = {Views and Challenges},
      url = {http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2241925/}
    }
    
    Chandrasekaran, S., Hernandez, O., Maskell, D.L., Chapman, B. & Bui, V. Compilation and Parallelization Techniques with Tool Support to Realize Sequence Alignment Algorithm on FPGA and Multicore 2007 Proceedings of the Workshop on New Horizons in Compilers, pp. 12  inproceedings URL 
    Abstract: Reconfigurable computing (RC), such as computing using field programmable gate array (FPGA) technology has been shown as the field to accelerate a large variety of applications. RC fills the gap between hardware and software, achieving high performance on the hardware than the software and at the same time maintaining a remarkable amount of flexibility. Though there are bottlenecks associated with using the FPGA accelerators with legacy application code, it is becoming difficult to decide which parts of the code should be implemented in hardware (versus software), to provide an efficient mapping from code to the highly parallel FPGA fabric and to evaluate the costs associated with running in a hybrid (hardware/software) mode. In this paper we present a methodology to tune an application with the help of a tool environment consisting of an open source parallelizing compiler, and static and performance analysis tools. This serves as a high performance tuning strategy for identifying the bottlenecks in the application code, followed by preprocessing, before mapping the algorithm to the FPGA in order to take advantage of the intrinsic speed. Using this performance toolset and our tuning methodology, we were able to parallelize and tune a bioinformatics application to produce better load balances and higher performance, yielding almost linear speedup, of up to 80% on dynamic scheduling with 128 threads on a 1000 sequence data set.
    Review: More emphasis is given to the compiler, profiling, and scheduling tools than the parallel algorithms.
    BibTeX:
    @inproceedings{Chandrasekaran2007,
      author = {Sunita Chandrasekaran and Oscar Hernandez and Douglas~L. Maskell and Barbara Chapman and Van Bui},
      title = {Compilation and Parallelization Techniques with Tool Support to Realize Sequence Alignment Algorithm on FPGA and Multicore},
      booktitle = {Proceedings of the Workshop on New Horizons in Compilers},
      year = {2007},
      pages = {12},
      url = {http://www2.cs.uh.edu/~hpctools/pub/hipc.pdf}
    }
    
    Cheetham, J., Dehne, F., Pitre, S., Rau-Chaplin, A. & Taillon, P.J. Parallel CLUSTAL W for PC Clusters 2003 Computational Science and Its Applications, ICCSA 2003  inproceedings DOI  
    Abstract: This paper presents a parallel version of CLUSTALW, called pCLUSTAL. In contrast to the commercial SGI parallel Clustal, which requires an expensive shared memory SGI multiprocessor, pCLUSTAL can be run on a range of distributed and shared memory parallel machines, from high-end parallel multiprocessors (e.g. Sunfire 6800, IBM SP2, etc.) to PC clusters, to simple networks of workstations. We have implemented pCLUSTAL using C and the MPI communication library, and tested it on a PC cluster. Our experimental evaluation shows that our pCLUSTAL code achieves similar or better speedup on a distributed memory PC clusters than the commercial SGI parallel Clustal on a shared memory SGI multiprocessor.
    Review: pCLUSTAL, overall ~21x speedup on 64 processors and ~16x on 32 processors, cluster system using C and MPI, parallelizes stages 1 and 3.
    BibTeX:
    @inproceedings{Cheetham2003,
      author = {James Cheetham and Frank Dehne and Sylvain Pitre and Andrew Rau-Chaplin and Peter~J. Taillon},
      title = {Parallel CLUSTAL W for PC Clusters},
      booktitle = {Computational Science and Its Applications, ICCSA 2003},
      year = {2003},
      doi = {http://dx.doi.org/10.1007/3-540-44843-8_32}
    }
    
    Cheng, Y. Optimizing ClustalW1.83 using AltiVec Implementation 2006   techreport  
    BibTeX:
    @techreport{Cheng2006,
      author = {Yinhe Cheng},
      title = {Optimizing ClustalW1.83 using AltiVec Implementation},
      year = {2006}
    }
    
    Date, S., Kulkarni, R., Kulkarni, B., Kulkarni-Kale, U. & Kolaskar, A. Multiple alignment of sequences on parallel computers 1993 Comput. Appl. Biosci., Vol. 9(4), pp. 397-402  article DOI  
    Abstract: A software package that allows one to carry out multiple alignment of protein and nucleic acid sequences of almost unlimited length and number of sequences is developed on C-DAC parallel computer--a transputer-based machine. The farming approach is used for data parallelization. The speed gains are almost linear when the number of transputers is increased from 4 to 64. The software is used to carry out multiple alignment of 100 sequences each of chain and chain of hemoglobin and 83 cytochrome c sequences. The signature sequence of cytochrome c was found to be PGTKMXF. The single parameter, multiple alignment score, S, has been used to categorize proteins in different subfamilies and groups.
    Review: PRAS, overall 59x speedup on 64 transputers, parallelize stages 1 and 3 with a farm approach and dynamic load balancing. Used parallel C toolkit from 3LC. The progressive alignment method is called ``hierarchical clustering.'' Stage 3 parallelization is different than usual. ``Profiling is done by using the iterative procedure of Gribskov et al. (1988) and Vingron and Argos (1989)... In the profiling stage, the profile matrix requires two segments of the given sequence. These segments form the work unit. Calculation of each element in the profile matrix is treated as an independent process. In other words clustering, collapsing of sequences and calculation of the variability index (VI) are carried out by the farmer processor.'' The method of [Vingron1989] is a hybrid of the divide-and-conquer approach and progressive alignment. It finds segments with a graph-based algorithm after finding anchors with k-mers and then aligns the segments progressively with pairwise DP.
    BibTeX:
    @article{Date1993,
      author = {Date, Shashank and Kulkarni, Rajendra and Kulkarni, Bhavana and Kulkarni-Kale, Urmila and Kolaskar, A.S.},
      title = {Multiple alignment of sequences on parallel computers},
      journal = {Comput. Appl. Biosci.},
      year = {1993},
      volume = {9},
      number = {4},
      pages = {397-402},
      doi = {http://dx.doi.org/10.1093/bioinformatics/9.4.397}
    }
    
    Demichowicz, M. & Liśkiewicz, M. A Parallel Divide-and-Conquer Approach for Multiple Sequence Alignment 2009   techreport  
    Abstract: We consider the problem of multiple sequence alignment with the goal of minimizing sum-of-pairs cost. The method Divide-and-Conquer Alignment (DCA) introduced in Stoye, J. (1997), Gene 211, GC45-GC56, and developed in Stoye, J. et al. (1997), CABIOS 13 (6), 625- 626, is generalized to be executed in parallel efficiently. We present experimental results for real biological samples showing that our parallel algorithm improves a naive straightforward parallel version of DCA and achieves better alignments both in mathematical and in the biological sense than the sequential DCA program using comparable computational resources.
    BibTeX:
    @techreport{Demichowicz2009,
      author = {Monika Demichowicz and Maciej Liśkiewicz},
      title = {A Parallel Divide-and-Conquer Approach for Multiple Sequence Alignment},
      year = {2009}
    }
    
    Deng, X., Li, E., Shan, J. & Chen, W. Parallel implementation and performance characterization of MUSCLE 2006 20th International Parallel and Distributed Processing Symposium, IPDPS 2006  inproceedings DOI  
    Abstract: Multiple sequence alignment is a fundamental and very computationally intensive task in molecular biology. MUSCLE, a new algorithm for creating multiple alignments of protein sequences, achieves a highest rank in accuracy and the fastest speed compared to ClustalW as well as T-Coffee, some widely used tools in multiple sequence alignment. To further accelerate the computations, we present the parallel implementation of MUSCLE in this paper. It is decomposed into several independent modules, which are parallelized with different OpenMP paradigms. We also conduct detailed performance characterization on symmetric multiple processor systems. The experiments show that MUSCLE scales well with the increase of processors, and achieves up to 15.times speedup on 16-way shared memory multiple processor system.
    Review: MUSCLE is parallelized. A speedup of 15.2 is achieved on a 16 processor SMP system using OpenMP. The target data set consists of 50--150 proteins of average length 330. Most of the execution time on this data set is spent in the second round distance matrix calculation which uses the ProbCons probabilistic method. This method might not be used on nucleotide sequences and it has been added since the original MUSCLE paper by Edgar (2004). The first round of alignment (FRA) is parallelized. The group-to-group alignment following the guide tree in the third stage of progressive alignment is executed in parallel. A queuing module is used to make sure children nodes are aligned before parent nodes. Poor parallelism is reported for this stage. Two modules of the probabilistic distance matrix calculation are also parallelized, namely, Probability Matrix Computation (PMC) and Consistency Transformation (CT). For both of these modules, the pairwise computation (in a double nested "for" loop) can be executed independently. Instead of using a "parallel for" directive, a task queue is again used (Intel workqueuing model).
    BibTeX:
    @inproceedings{Deng2006,
      author = {Xi Deng and Eric Li and Jiulong Shan and Wenguang Chen},
      title = {Parallel implementation and performance characterization of MUSCLE},
      booktitle = {20th International Parallel and Distributed Processing Symposium, IPDPS 2006},
      year = {2006},
      doi = {http://dx.doi.org/10.1109/IPDPS.2006.1639616}
    }
    
    Do, J. & Dewey, C. Parallel FSA: Improving the Performance of Multiple Sequence Alignment using a Workstation Cluster and Database 2009   techreport  
    Abstract: Multiple Sequence Alignments (MSA) are widely-used tools for biological sequence analysis such as function prediction and phylogeny inference. Recently, a MSA algorithm based on a statistically-sound method for model selection and parameterization, Fast Statistical Alignment (FSA), has been introduced. Although FSA is state-of-the-art with respect to accuracy and ability to scale to thousands of sequences, it still suffers from problems of long execution time and high memory consumption when trying to align truly large datasets of sequences. In this paper, we propose a parallelized version of FSA which interacts with a database to address these problems. First, it runs over a cluster of processors to reduce its runtime for pairwise comparisons. In order to restrict the memory to be consumed, it requests from the database only the required datasets while aligning sequences. We show that utilizing a cluster of processors can lead to large performance improvements. Also, by using a database, we can not only align large datasets without worrying about physical RAM limitations, but also, reuse the results of previous pairwise comparisons, resulting in a dramatic execution time reduction.
    Review: PFSA, parallel fast statistical alignment
    BibTeX:
    @techreport{Do.Jaeyoung2009,
      author = {Jaeyoung Do and Colin Dewey},
      title = {Parallel FSA: Improving the Performance of Multiple Sequence Alignment using a Workstation Cluster and Database},
      year = {2009}
    }
    
    Du, Z. & Lin, F. pNJTree: A parallel program for reconstruction of neighbor-joining tree and its application in ClustalW 2006 Parallel Computing, Vol. 32(5-6), pp. 441-446  article DOI  
    Abstract: Neighbor-joining (NJ) is a distance-based method for tree construction. It is the most widely used method with polynomial time complexity at present. However, a fundamental problem with the previous implementations of this method is its limitation to handle large taxa sets within a reasonable time and memory resources. In this paper, we present a parallel implementation, pNJTree, for fast construction of very large phylogenetic trees. In comparison, pNJTree gets near-linear speedup for large taxa sets. It can be used to improve the speedup of the parallelized ClustalW methods.
    BibTeX:
    @article{Du2006,
      author = {Zhihua Du and Feng Lin},
      title = {pNJTree: A parallel program for reconstruction of neighbor-joining tree and its application in ClustalW},
      journal = {Parallel Computing},
      year = {2006},
      volume = {32},
      number = {5-6},
      pages = {441-446},
      doi = {http://dx.doi.org/10.1016/j.parco.2006.05.001}
    }
    
    Duzlevski, O. SMP version of ClustalW 1.82 2002   unpublished  
    Abstract: I (Ognen Duzlevski) have worked and been associated for more than one year with the National Research Council of Canada's Plant Biotechnology Institute where I have developed a multi-threaded version of clustalw available for free download in source code form from http://bioinfo.pbi.nrc.ca/clustalw-smp/ (application note pending review with the Journal of Bioinformatics).
    Review: Stages 1 and 3 are parallelized.
    BibTeX:
    @unpublished{Duzlevski2002,
      author = {Ognen Duzlevski},
      title = {SMP version of ClustalW 1.82},
      year = {2002},
      note = {was available from http://bioinfo.pbi.nrc.ca/clustalw-smp/ (broken link)}
    }
    
    Dziubecki, P. & Zola, J. Grid Portal for Multiple Sequence Alignment 2007 Computing, Multimedia and Intelligent Techniques, Vol. 3(1), pp. 33-42  article URL 
    Abstract: Multiple sequence alignment is by far the most common task in computational biology. At the same time it is very computationally demanding optimisation problem. In this paper we describe a new Grid environment designed to facilitate access to the parallel multiple sequence alignment software. Our user–oriented Grid interface provides a uniform access to several popular alignment packages like, for example, ClustalW–MPI or Parallel PhyloBuilder, and it allows large alignment instances to be solved in parallel efficiently.
    BibTeX:
    @article{Dziubecki2007,
      author = {Piotr Dziubecki and Jaroslaw Zola},
      title = {Grid Portal for Multiple Sequence Alignment},
      journal = {Computing, Multimedia and Intelligent Techniques},
      year = {2007},
      volume = {3},
      number = {1},
      pages = {33-42},
      url = {http://www.ece.iastate.edu/~zola/counter/counter.php?res=papers/dziubecki_cmit2007.pdf}
    }
    
    Ebedes, J. & Datta, A. Multiple sequence alignment in parallel on a workstation cluster 2004 Bioinformatics, Vol. 20(7), pp. 1193-1195  article DOI  
    Abstract: Summary: Multiple sequence alignment is the NP-hard problem of aligning three or more DNA or amino acid sequences in an optimal way so as to match as many characters as possible from the set of sequences. The popular sequence alignment program ClustalW uses the classical method of approximating a sequence alignment, by first computing a distance matrix and then constructing a guide tree to show the evolutionary relationship of the sequences. We show that parallelizing the ClustalW algorithm can result in significant speedup. We used a cluster of workstations using C and message passing interface for our implementation. Experimental results show that speedup of over 5.5 on six processors is obtainable for most inputs. Availability: The software is available upon request from the second author.
    Review: Parallel ClustalW, first stage speedup 5.8 on 6 processors, no second stage, third stage 3x on 6 processors, used C and message passing on a cluster.
    BibTeX:
    @article{Ebedes2004,
      author = {Justin Ebedes and Amitava Datta},
      title = {Multiple sequence alignment in parallel on a workstation cluster},
      journal = {Bioinformatics},
      year = {2004},
      volume = {20},
      number = {7},
      pages = {1193-1195},
      doi = {http://dx.doi.org/10.1093/bioinformatics/bth055}
    }
    
    Helal, M., Mullin, L., Gaëta, B. & El-Gindy, H. Multiple sequence alignment using massively parallel mathematics of arrays 2007 Proceedings of the International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-07), pp. 120-127  inproceedings  
    BibTeX:
    @inproceedings{Helal2007,
      author = {Manal Helal and Lenore Mullin and Bruno Gaëta and Hossam El-Gindy},
      title = {Multiple sequence alignment using massively parallel mathematics of arrays},
      booktitle = {Proceedings of the International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-07)},
      year = {2007},
      pages = {120-127}
    }
    
    Helal, M., El-Gindy, H., Mullin, L. & Gaëta, B. Parallelizing Optimal Multiple Sequence Alignment by Dynamic Programming 2008 International Symposium on Parallel and Distributed Processing with Applications, ISPA '08, pp. 669-674  inproceedings DOI  
    Abstract: Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-processor systems, due to the lack of suitable scheme to manage partitioning and dependencies. A scheme for parallel implementation of the dynamic programming multiple sequence alignment is presented, based on a peer to peer design and a multidimensional array indexing method. This design results in up to 5-fold improvement compared to a previously described master/slave design, and scales favourably with the number of processors used. This study demonstrates an approach for parallelising multi-dimensional dynamic programming and similar algorithms utilizing multi-processor architectures.
    BibTeX:
    @inproceedings{Helal2008,
      author = {Manal Helal and Hossam El-Gindy and Lenore Mullin and Bruno Gaëta},
      title = {Parallelizing Optimal Multiple Sequence Alignment by Dynamic Programming},
      booktitle = {International Symposium on Parallel and Distributed Processing with Applications, ISPA '08},
      year = {2008},
      pages = {669-674},
      doi = {http://dx.doi.org/10.1109/ISPA.2008.93}
    }
    
    Helal, M., Mullin, L., Potter, J. & Sintchenko, V. Search Space Reduction Technique for Distributed Multiple Sequence Alignment 2009 Sixth IFIP International Conference on Network and Parallel Computing (NPC'09), pp. 219-226  inproceedings DOI  
    Abstract: To take advantage of the various High Performance Computer (HPC) architectures for multithreaded and distributed computing, this paper parallelizes the dynamic programming algorithm for Multiple Sequence Alignment (MSA). A novel definition of a hyper-diagonal through a tensor space is used to reduce the search space. Experiments demonstrate that scoring less than 1% of the search space produces the same optimal results as scoring the full search space. The alignment scores are often better than other heuristic methods and are capable of aligning more divergent sequences.
    BibTeX:
    @inproceedings{Helal2009,
      author = {Manal Helal and Lenore Mullin and John Potter and Vitali Sintchenko},
      title = {Search Space Reduction Technique for Distributed Multiple Sequence Alignment},
      booktitle = {Sixth IFIP International Conference on Network and Parallel Computing (NPC'09)},
      publisher = {IEEE Computer Society},
      year = {2009},
      pages = {219-226},
      doi = {http://dx.doi.org/10.1109/NPC.2009.43}
    }
    
    Isaza, S., Sánchez, F., Gaydadjiev, G., Ramirez, A. & Valero, M. Preliminary Analysis of the Cell BE Processor Limitations for Sequence Alignment Applications 2008 Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2008, Vol. LNCS 5114, pp. 53-64  inproceedings DOI  
    Abstract: The fast growth of bioinformatics field has attracted the attention of computer scientists in the last few years. At the same time the increasing database sizes require greater efforts to improve the computational performance. From a computer architecture point of view, we intend to investigate how bioinformatics applications can benefit from future multi-core processors. In this paper we present a preliminary study of the Cell BE processor limitations when executing two representative sequence alignment applications (Ssearch and ClustalW). The inherent large parallelism of the targeted algorithms makes them ideal for architectures supporting multiple dimensions of parallelism (TLP and DLP). However, in the case of Cell BE we identified several architectural limitations that need a careful study and quantification.
    Review: Rating:** ClustalW is accelerated on the Cell BE. The parallel version is only compared to a single threaded G5 (PowerPC970) and a Cell-PPU. An overall speedup of ~2x is achieved with 8 SPUs compared with the G5.
    BibTeX:
    @inproceedings{Isaza2008,
      author = {Sebastian Isaza and Friman Sánchez and Georgi Gaydadjiev and Alex Ramirez and Mateo Valero},
      title = {Preliminary Analysis of the Cell BE Processor Limitations for Sequence Alignment Applications},
      booktitle = {Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2008},
      publisher = {Springer Berlin / Heidelberg},
      year = {2008},
      volume = {LNCS 5114},
      pages = {53-64},
      doi = {http://dx.doi.org/10.1007/978-3-540-70550-5_7}
    }
    
    Ishikawa, M., Hoshida, M., Hirosawa, M., Toya, T., Onizuka, K. & Nitta, K. Protein Sequence Analysis by Parallel Inference Machine 1992 Proceedings of the International Conference on Fifth Generation Computer Systems (FGCS '92), pp. 294-299  inproceedings  
    Abstract: We have developed a multiple alignment system for protein sequence analysis. The system works on a parallel inference machine PIM. The merits of PIM bring prominent features to the multiple alignment system. The system consists of two major components: a parallel iterative aligner and an intelligent refiner. The aligner uses a parallel iterative search for aligning protein sequences. The search algorithm is the Berger-Munson algorithm with its parallel extension. Our implementation shows that the algorithm extended in parallel can rapidly produce better solutions than the original Berger-Munson algorithm. The refiner uses condition-action rules for refining multiple sequence alignments given by the aligner. The rules help to extract motif patterns from ambiguous alignment patterns.
    Review: See [Ishikawa1992] and [Araki1993] and [Yap1995] and [Martino1997] and [Yap1998] Section 4. The Berger-Munson algorithm is used. This paper just implements the parallel method suggested by Berger and Munson. All (2^n-1)-1) group-to-group alignments are run in parallel and the best is selected for the next iteration. It is called "the best-choice iterative strategy." The data set is 7 protein sequences of length 80. The parallel version on 63 PEs is about 10 times faster than the sequential version. The PIM machine is a MIMD system with up to 256 PEs.
    BibTeX:
    @inproceedings{Ishikawa1992a,
      author = {Masato Ishikawa and Masaki Hoshida and Makoto Hirosawa and Tomoyuki Toya and Kentaro Onizuka and Katsumi Nitta},
      title = {Protein Sequence Analysis by Parallel Inference Machine},
      booktitle = {Proceedings of the International Conference on Fifth Generation Computer Systems (FGCS '92)},
      year = {1992},
      pages = {294-299}
    }
    
    Ishikawa, M., Hoshida, M., Hirosawa, M., Toya, T., Onizuka, K. & Nitta, K. Protein Sequence Analysis Program: Multiple Sequence Alignment by Parallel Iterative Aligner 1992 Demonstrations of the International Conference on Fifth Generation Computer Systems (FGCS'92), pp. 57-62  inproceedings  
    Abstract: Every protein is a chain of twenty kinds of amino acids. Its structure and function are determined by the sequence of these amino acids. Protein sequence analysis is vital to the prediction of protein structure and function, and to the analysis of the evolutional relationship among proteins. We built the "Parallel Iterative Aligner" for this analysis using a PIM machine. It solves a typical protein sequence analysis problem, that of multiple sequence alignment. This kind of combinatorial problem generally requires a large amount of computation. Our Parallel Iterative Aligner, however, can quickly solve practical-sized problems, producing high-quality answers.
    Review: See [Ishikawa1992] and [Araki1993] and [Yap1995] and [Martino1997] and [Yap1998] Section 4. The Berger-Munson algorithm is used. The "restricted partitioning technique" is introduced. A group of N sequences is divided into two groups of k and N-k sequences.
    BibTeX:
    @inproceedings{Ishikawa1992b,
      author = {Masato Ishikawa and Masaki Hoshida and Makoto Hirosawa and Tomoyuki Toya and Kentaro Onizuka and Katsumi Nitta},
      title = {Protein Sequence Analysis Program: Multiple Sequence Alignment by Parallel Iterative Aligner},
      booktitle = {Demonstrations of the International Conference on Fifth Generation Computer Systems (FGCS'92)},
      year = {1992},
      pages = {57-62}
    }
    
    Ishikawa, M., Toya, T., Hoshida, M., Nitta, K., Ogiwara, A. & Kanehisa, M. Multiple sequence alignment by parallel simulated annealing 1993 Comput. Appl. Biosci., Vol. 9(3), pp. 267-273  article DOI  
    Abstract: We have developed simulated annealing algorithms to solve the problem of multiple sequence alignment. The algorithm was shown to give the optimal solution as confirmed by the rigorous dynamic programming algorithm for three-sequence alignment. To overcome long execution times for simulated annealing, we utilized a parallel computer. A sequential algorithm, a simple parallel algorithm and the temperature parallel algorithm were tested on a problem. The results were compared with the result obtained by a conventional tree-based algorithm where alignments were merged by two-' dynamic programming. Every annealing algorithm produced a better energy value than the conventional algorithm. The best energy value, which probably represents the optimal solution, was reached within a reasonable time by both of the parallel annealing algorithms. We consider the temperature parallel algorithm of simulated annealing to be the most suitable for finding the optimal multiple sequence alignment because the algorithm does not require any scheduling for optimization. The algorithm is also useful for refining multiple alignments obtained by other heuristic methods.
    Review: See [Yap1998] [27,28,32]. The machine called Multi-PSI is a distributed, MIMD system with 64 processing elements and 80 MB of memory per node. A 2-D mesh (8x8) connects the PEs. The program was written in KL1 which is similar to Prolog.
    BibTeX:
    @article{Ishikawa1993a,
      author = {Masato Ishikawa and Tomoyuki Toya and Masaki Hoshida and Katsumi Nitta and Atushi Ogiwara and Minoru Kanehisa},
      title = {Multiple sequence alignment by parallel simulated annealing},
      journal = {Comput. Appl. Biosci.},
      year = {1993},
      volume = {9},
      number = {3},
      pages = {267-273},
      doi = {http://dx.doi.org/10.1093/bioinformatics/9.3.267}
    }
    
    Ishikawa, M., Toya, T., Totoki, Y. & Konagaya, A. Parallel Iterative Aligner with Genetic Algorithm 1993 Genome Informatics Workshop, Vol. 4, pp. 84-93  inproceedings  
    Abstract: This paper proposes a new methodology to improve the performance of multiple sequence alignment by combining a genetic algorithm and an iterative alignment algorithm. Iterative alignment algorithms usually achieve better alignment than other alignment algorithms, such as tournament based multiple alignment. They, also, can incorporate parallelism to improve execution performance. However, they sometimes suffer from being trapped in the local optima and result in relatively low-quality alignments due to their rapid convergence. A genetic algorithm can save this problem by exchanging partial alignment sequences between "individuals". Our experiments show that the combination of a genetic algorithm and an iterative alignment algorithm produces better results than iterative aligners which employ hill-climbing search strategies.
    Review: The machine (PIM) is a MIMD with 256 PEs.
    BibTeX:
    @inproceedings{Ishikawa1993b,
      author = {Masato Ishikawa and Tomoyuki Toya and Yasushi Totoki and Akihiko Konagaya},
      title = {Parallel Iterative Aligner with Genetic Algorithm},
      booktitle = {Genome Informatics Workshop},
      publisher = {Universal Academy Press, Inc.},
      year = {1993},
      volume = {4},
      pages = {84-93}
    }
    
    Ishikawa, M., Toya, T. & Totoki, Y. Parallel Application Systems in Genetic Information Processing 1994 Proceedings of the International Conference on Fifth Generation Computer Systems (FGCS '94), pp. 129-138  inproceedings  
    Abstract: We have developed parallel application systems for genetic sequence analysis. The systems are three parallel iterative aligners and an alignment workbench for the aligners. The parallel iterative aligners use a best-first search, a hill-climbing search, and a genetic algorithm. Each iterative aligner can produce higher-quality solutions than the conventional tree-based aligner. The genetic algorithm shows better performance than the best-first and hill-climbing searches. The alignment workbench, which features the parallel iterative aligners, realizes alignment which is not only fast and high-quality-it is also constraint-based. When a user has some biological knowledge which indicates that some characters might be aligned in a column, a constraint can be defined for those characters. The constraint set is considered simultaneously in each iteration cycle of parallel alignment. Then appropriate multiple alignment is generated by the aligner and displayed on the workbench's full-color display. The alignment workbench also contains the following characteristic sequence analysis modules: a phylogenetic tree drawer, a motif-database matcher, and a stem region specifier.
    BibTeX:
    @inproceedings{Ishikawa1994,
      author = {Masato Ishikawa and Tomoyuki Toya and Yasushi Totoki},
      title = {Parallel Application Systems in Genetic Information Processing},
      booktitle = {Proceedings of the International Conference on Fifth Generation Computer Systems (FGCS '94)},
      year = {1994},
      pages = {129-138}
    }
    
    Kleinjung, J., Douglas, N. & Heringa, J. Parallelized multiple alignment 2002 Bioinformatics, Vol. 18(9), pp. 1270-1271  article DOI  
    Abstract: Summary: Multiple sequence alignment is a frequently used technique for analyzing sequence relationships. Compilation of large alignments is computationally expensive, but processing time can be considerably reduced when the computational load is distributed over many processors. Parallel processing functionality in the form of single-instruction multiple-data (SIMD) technology was implemented into the multiple alignment program Praline by using `message passing interface' (MPI) routines. Over the alignments tested here, the parallelized program performed up to ten times faster on 25 processors compared to the single processor version. Availability: Example program code for parallelizing pairwise alignment loops is available from http://mathbio.nimr.mrc.ac.uk/~jkleinj/tools/mpicode. The `message passing interface' package (MPICH) is available from http:/www.unix.mcs.anl.gov/mpi/mpich. Contact: jhering@nimr.mrc.ac.uk Supplementary information: Praline is accessible at http://mathbio.nimr.mrc.ac.uk/praline.
    Review: Rating:** Praline MPI. A speedup of 10 was realized with 25 processors on a set of 200 sequences that are 200 residues in length. The pairwise sequence alignment stage is parallelized in the usual way with independent tasks. The progressive profile alignment (PPA) stage is also parallelized. After each step in this stage, n-2 communications are required. No guide tree is used in the progressive profile alignment (PPA) stage. A full profile search is done on each remaining sequence or profile to be merged and the highest scoring one is included in the final alignment. This profile search is the part that must be done in parallel.
    BibTeX:
    @article{Kleinjung2002,
      author = {Kleinjung, Jens and Douglas, Nigel and Heringa, Jaap},
      title = {Parallelized multiple alignment},
      journal = {Bioinformatics},
      year = {2002},
      volume = {18},
      number = {9},
      pages = {1270-1271},
      doi = {http://dx.doi.org/10.1093/bioinformatics/18.9.1270}
    }
    
    Lee, Y.S. & Oh, J. A Simple and Fast Web Alignment Tool for Large Amount of Sequence Data 2008 Genomics & Informatics, Vol. 6(3), pp. 157-159  article URL 
    Abstract: Multiple sequence alignment (MSA) is the most important step for many of biological sequence analyses, homology search, and protein structural assignments. However, large amount of data make biologists difficult to perform MSA analyses and it requires much computational time to align many sequences. Here, we have developed a simple and fast web alignment tool for aligning, editing, and visualizing large amount of sequence data. We used a cluster server installed ClustalW-MPI using web services and message passing interface (MPI). It also enables users to edit multiple sequence alignments for manual editing and to download the input data and results such as alignments and phylogenetic tree.
    BibTeX:
    @article{Lee.Yong2008,
      author = {Yong~Seok Lee and Jeongsu Oh},
      title = {A Simple and Fast Web Alignment Tool for Large Amount of Sequence Data},
      journal = {Genomics & Informatics},
      year = {2008},
      volume = {6},
      number = {3},
      pages = {157-159},
      url = {http://www.genominfo.org/html/UploadFile/article10_200809.pdf}
    }
    
    Li, K.-B. ClustalW-MPI: ClustalW analysis using distributed and parallel computing 2003 Bioinformatics, Vol. 19(12), pp. 1585-1586  article DOI  
    Abstract: Summary: ClustalW is a tool for aligning multiple protein or nucleotide sequences. The alignment is achieved via three steps: pairwise alignment, guide-tree generation and progressive alignment. ClustalW-MPI is a distributed and parallel implementation of ClustalW. All three steps have been parallelized to reduce the execution time. The software uses a message-passing library called MPI (Message Passing Interface) and runs on distributed workstation clusters as well as on traditional parallel computers. Availability:The source codes are written in ISO C and are available at http://www.bii.a-star.edu.sg/software/clustalw-mpi/. An open source implementations of MPI http://www-unix.mcs.anl.gov/mpi/. Contact: kuobin@bii.a-star.edu.sg
    Review: ClustalW-MPI, A speedup of 15.8 was observed in Stage 1 and a speedup of 4.3 in Stage 3 and a speedup of 14.6 overall with 16 processors (8 nodes: dual 800 MHz Pentium III). The data set consists of 500 protein sequences of length 1100. A portion of stage 2 is also accelerated. Guide tree parallelism is used in Stage 3. The recursive splitting of sections in the Myers-Miller algorithm are parallelized. The forward and backward sections can be done in parallel but processes must join to make a split point decision before recursively calling pdiff again. Independently, [Lin2005 DOI:10.1109/HPCASIA.2005.96] reports up to 29.1x speedup with 64 CPUs.
    BibTeX:
    @article{Li.Kuo-Bin2003,
      author = {Kuo-Bin Li},
      title = {ClustalW-MPI: ClustalW analysis using distributed and parallel computing},
      journal = {Bioinformatics},
      year = {2003},
      volume = {19},
      number = {12},
      pages = {1585-1586},
      doi = {http://dx.doi.org/10.1093/bioinformatics/btg192}
    }
    
    Li, Y. & Chen, C.-K. Parallelization of Multiple Genome Alignment 2005 High Performance Computing and Communications, HPCC 2005, Vol. LNCS 3726, pp. 910-915  inproceedings DOI  
    Abstract: In this work, we implement a genome alignment system which applies parallelization schemes to the ClustalW algorithm and the interface of database querying. Parallel construction of the distance matrices and parallelization of progressive alignment in the ClustalW algorithm are performed on PC-based Linux cluster with message-passing interface libraries. Achieved experiments show good speedup and significant parallel performance.
    Review: ClustalW algorithm has speedup of 14.95 with 16 processors using MPI on a Linux cluster. Parallelize stages 1 and 3.
    BibTeX:
    @inproceedings{Li.Yiming2005,
      author = {Yiming Li and Cheng-Kai Chen},
      title = {Parallelization of Multiple Genome Alignment},
      booktitle = {High Performance Computing and Communications, HPCC 2005},
      publisher = {Springer-Verlag Berlin Heidelberg},
      year = {2005},
      volume = {LNCS 3726},
      pages = {910-915},
      doi = {http://dx.doi.org/10.1007/11557654_102}
    }
    
    Lima, C.R.E., Lopes, H.S., Moroz, M.R. & Menezes, R.M. Multiple Sequence Alignment Using Reconfigurable Computing 2007 Reconfigurable Computing: Architectures, Tools and Applications, ARC 2007, Vol. LNCS 4419, pp. 379-384  inproceedings DOI  
    Abstract: The alignment of multiple protein (or DNA) sequences is a current problem in Bioinformatics. ClustalW is the most popular heuristic algorithm for multiple sequence alignment. Pairwise alignment has exponential complexity and it is the most time-consuming part of ClustalW. This part of ClustalW was implemented using a reconfigurable logic hardware solution: Hardalign. The system was evaluated using data sets of different dimensionality, and compared with a pure software version running in a embedded processor, as well as running in a desktop computer. Results indicate that such implementation is capable of accelerating significantly part of the algorithm, and this is especially important for processing large protein data sets.
    Review: Rating:* Traceback pointers are computed on the FPGA for Stage 1 pairwise alignment, but nothing else is really novel when compared with [Lin.Xu2005, Oliver2006b]. The accelerator is named Hardalign. ClustalW is accelerated on an Altera Stratix II EP2S60F672C5ES device clocked at 40 MHz with 8 MLPUs (PEs). A 10x (simulated 277.2x with DMA) speedup on Stage 1 is achieved compared with an AthlonXP1600+ processor. The Stage 1 speedup is 112.7x when compared with the Altera NIOS II soft core CPU. A 100% or 2x speedup overall is achieved when compared with the NIOS II core. Traceback pointers are stored in SDRAM from the 8 PEs. No overall speedup comparison is given with a commodity processor. They implemented constant gaps only.
    BibTeX:
    @inproceedings{Lima2007,
      author = {Carlos~R.~Erig Lima and Heitor~S. Lopes and Maiko~R. Moroz and Ramon~M. Menezes},
      title = {Multiple Sequence Alignment Using Reconfigurable Computing},
      booktitle = {Reconfigurable Computing: Architectures, Tools and Applications, ARC 2007},
      publisher = {Springer Berlin / Heidelberg},
      year = {2007},
      volume = {LNCS 4419},
      pages = {379-384},
      doi = {http://dx.doi.org/10.1007/978-3-540-71431-6_37}
    }
    
    Lin, X., Peiheng, Z., Dongbo, B., Shengzhong, F. & Ninghui, S. To Accelerate Multiple Sequence Alignment using FPGAs 2005 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05), pp. 5  inproceedings DOI  
    Abstract: Multiple sequence alignment is a fundamental and challenging problem in computational molecular biology. ClustalW, the most widely used multiple sequence alignment software, performs very slowly on hundreds of sequence. Here, we analyze the algorithm complexity of ClustalW as well as the time profile in practice, and then propose a strategy which uses the reconfigurable hardware FPGA to accelerate ClustalW. Comparison with other coarse-grained parallel strategies demonstrates a fine speedup of this strategy and savage of computing resource
    Review: Rating:***** A novel aspect is that the number of identical characters (NIDs) is deduced from the comparison score returned from the accelerator and the sequence lengths. The accelerator system is called the Dawning 4000H and consists of 10 Altera Stratix PEIS30 FPGAs. With a total of 3072 PEs clocked at 133 MHz, the system achieves 409.6 GCUPS. The Stage 1 speedup is 1697.5x and the overall speedup is 34.6x compared to ClustalW running on 2.8 GHz Xeon. They only implemented constant gaps and can only handle nucleotides. The performance is also compared with ClustalW-MPI running on 40 nodes (80 CPUs), which achieves a speedup of 29.1 over one CPU.
    BibTeX:
    @inproceedings{Lin.Xu2005,
      author = {Xu Lin and Zhang Peiheng and Bu Dongbo and Feng Shengzhong and Sun Ninghui},
      title = {To Accelerate Multiple Sequence Alignment using FPGAs},
      booktitle = {Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05)},
      year = {2005},
      pages = {5},
      doi = {http://dx.doi.org/10.1109/HPCASIA.2005.96}
    }
    
    Liu, W., Schmidt, B., Voss, G. & Müller-Wittig, W. GPU-ClustalW: Using Graphics Hardware to Accelerate Multiple Sequence Alignment 2006 High Performance Computing, HiPC 2006, Vol. LNCS 4297, pp. 363-374  inproceedings DOI  
    Abstract: Molecular Biologists frequently compute multiple sequence alignments (MSAs) to identify similar regions in protein families. However, aligning hundreds of sequences by popular MSA tools such as ClustalW requires several hours on sequential computers. Due to the rapid growth of biological sequence databases biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to reduce this runtime using graphics processing units (GPUs). To derive an efficient mapping onto this type of architecture, we have reformulated the computationally most expensive part of ClustalW in terms of computer graphics primitives. This results in a high-speed implementation with significant runtime savings on a commodity graphics card.
    Review: GPU-ClustalW, overall 7.2 speedup with 1 GPU card (GeForce 7800 GTX) compared with a 3.0 GHz Pentium 4, only accelerates Stage 1, the Stage 1 speedup is 11.7x, implemented with GLSL (OpenGL Shading Language). A recurrence relation is proposed for GPUs (same used on FPGAs) to assist in calculating the number of identical characters (NIDs).
    BibTeX:
    @inproceedings{Liu.Weiguo2006b,
      author = {Weiguo Liu and Bertil Schmidt and Gerrit Voss and Wolfgang Müller-Wittig},
      title = {GPU-ClustalW: Using Graphics Hardware to Accelerate Multiple Sequence Alignment},
      booktitle = {High Performance Computing, HiPC 2006},
      publisher = {Springer Berlin / Heidelberg},
      year = {2006},
      volume = {LNCS 4297},
      pages = {363-374},
      doi = {http://dx.doi.org/10.1007/11945918_37}
    }
    
    Liu, W., Schmidt, B., Voss, G. & Müller-Wittig, W. Streaming Algorithms for Biological Sequence Alignment on GPUs 2007 IEEE Transactions on Parallel and Distributed Systems, Vol. 18(9), pp. 1270-1281  article DOI  
    Abstract: Sequence alignment is a common and often repeated task in molecular biology. Typical alignment operations consist of finding similarities between a pair of sequences (pairwise sequence alignment) or a family of sequences (multiple sequence alignment). The need for speeding up this treatment comes from the rapid growth rate of biological sequence databases: every year their size increases by a factor of 1.5 to 2. In this paper, we present a new approach to high-performance biological sequence alignment based on commodity PC graphics hardware. Using modern graphics processing units (GPUs) for high-performance computing is facilitated by their enhanced programmability and motivated by their attractive price/performance ratio and incredible growth in speed. To derive an efficient mapping onto this type of architecture, we have reformulated dynamic-programming-based alignment algorithms as streaming algorithms in terms of computer graphics primitives. Our experimental results show that the GPU-based approach allows speedups of more than one order of magnitude with respect to optimized CPU implementations.
    Review: Only the first stage of MSA is accelerated with a speedup of 12. Overall speedup is 7.2 on a GeForce 7900 GTX 512 compared with a 3 GHz Pentium 4.
    BibTeX:
    @article{Liu.Weiguo2007,
      author = {Weiguo Liu and Bertil Schmidt and Gerrit Voss and Wolfgang Müller-Wittig},
      title = {Streaming Algorithms for Biological Sequence Alignment on GPUs},
      journal = {IEEE Transactions on Parallel and Distributed Systems},
      year = {2007},
      volume = {18},
      number = {9},
      pages = {1270-1281},
      doi = {http://dx.doi.org/10.1109/TPDS.2007.1069}
    }
    
    Liu, Y., Schmidt, B. & Maskell, D.L. Parallel Reconstruction of Neighbor-Joining Trees for Large Multiple Sequence Alignments using CUDA 2009 IEEE International Symposium on Parallel & Distributed Processing, IPDPS 2009, pp. 1-8  inproceedings DOI  
    Abstract: Computing large multiple protein sequence alignments using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. ClustalW uses a three-stage processing pipeline: (i) pairwise distance computation; (ii) phylogenetic tree reconstruction; and (iii) progressive multiple alignment computation. Previous work on accelerating ClustalW was mainly focused on parallelizing the first stage and achieved good speedups for a few hundred input sequences. However, if the input size grows to several thousand sequences, the second stage can dominate the overall runtime. In this paper, we present a new approach to accelerating this second stage using graphics processing units (GPUs). In order to derive an efficient mapping onto the GPU architecture, we present a parallelization of the neighbor-joining tree reconstruction algorithm using CUDA. Our experimental results show speedups of over 26times for large datasets compared to the sequential implementation.
    BibTeX:
    @inproceedings{Liu.Yongchao2009b,
      author = {Yongchao Liu and Bertil Schmidt and Douglas~L. Maskell},
      title = {Parallel Reconstruction of Neighbor-Joining Trees for Large Multiple Sequence Alignments using CUDA},
      booktitle = {IEEE International Symposium on Parallel & Distributed Processing, IPDPS 2009},
      year = {2009},
      pages = {1-8},
      doi = {http://dx.doi.org/10.1109/IPDPS.2009.5160923}
    }
    
    Liu, Y., Schmidt, B. & Maskell, D.L. MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP'09), pp. 121-128  inproceedings DOI  
    Abstract: Progressive alignment is a widely used approach for computing multiple sequence alignments (MSAs). However, aligning several hundred or thousand sequences with popular progressive alignment tools such as ClustalW requires hours or even days on state-of-the-art workstations. This paper presents MSA-CUDA, a parallel MSA program, which parallelizes all three stages of the ClustalW processing pipeline using CUDA and achieves significant speedups compared to the sequential ClustalW for a variety of large protein sequence datasets. Our tests on a GeForce GTX 280 GPU demonstrate average speedups of 36.91 (for long protein sequences), 18.74 (for average-length protein sequences), and 11.27 (for short protein sequences) compared to the sequential ClustalW running on a Pentium 4 3.0 GHz processor. Our MSA-CUDA outperforms ClustalW-MPI running on 32 cores of a high performance workstation cluster.
    Review: A peak speedup of 41.53 with 1 GPU card (GeForce GTX 280) is achieved when compared with a 3.0 GHz Pentium 4. CUDA is used for the GPU code. All three stages of progressive alignment (ClustalW) are accelerated by the GPU. The maximum speedup obtained in stages 1-3 is 47.13, 11.08, and 5.9.
    BibTeX:
    @inproceedings{Liu.Yongchao2009c,
      author = {Yongchao Liu and Bertil Schmidt and Douglas~L. Maskell},
      title = {MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA},
      booktitle = {20th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP'09)},
      publisher = {IEEE Computer Society},
      year = {2009},
      pages = {121-128},
      doi = {http://dx.doi.org/10.1109/ASAP.2009.14}
    }
    
    Lloyd, S. & Snell, Q. Accelerated Large-Scale Multiple Sequence Alignment with Reconfigurable Computing 2010 Presentation  electronic URL 
    Abstract: Multiple Sequence Alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware. We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. For a single pairwise alignment, a speedup of 300 has been demonstrated when compared to a 2.4 GHz Core2 processor. Our parallel algorithm and architecture accelerates large-scale MSA with reconfigurable computing and allows researchers to solve the larger problems that confront biologists today.
    BibTeX:
    @electronic{Lloyd2010,
      author = {Scott Lloyd and Quinn Snell},
      title = {Accelerated Large-Scale Multiple Sequence Alignment with Reconfigurable Computing},
      year = {2010},
      url = {http://sc10.supercomputing.org/schedule/event_detail.php?evid=drs115}
    }
    
    Lopes, H.S. & Moritz, G.L. A Distributed Approach for a Multiple Sequence Alignment Algorithm Using a Parallel Virtual Machine 2005 27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS 2005, pp. 2843-2846  inproceedings DOI  
    Abstract: Multiple sequence alignment is a central topic of extensive research in computational biology. Basically, two or more protein sequences are compared so as to evaluate their similarity. This work reports a methodology for parallel processing of a multiple sequence alignment algorithm (ClustalW) in an environment of networked computers. A detailed description of the modules that compose the distributed system is provided, giving special attention to the way a dynamic programming algorithm can be executed in parallel. Extensive experiments were done to evaluate performance and scalability of the method. Results show that the proposed method is efficient and offers a real advantage for large-scale multiple protein sequence alignment.
    BibTeX:
    @inproceedings{Lopes2005,
      author = {Heitor~S. Lopes and Guilherme~L. Moritz},
      title = {A Distributed Approach for a Multiple Sequence Alignment Algorithm Using a Parallel Virtual Machine},
      booktitle = {27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS 2005},
      year = {2005},
      pages = {2843-2846},
      doi = {http://dx.doi.org/10.1109/IEMBS.2005.1617066}
    }
    
    Low, D.H., Veeravalli, B. & Bader, D.A. On the design of high-performance algorithms for aligning multiple protein sequences on mesh-based multiprocessor architectures 2007 Journal of Parallel and Distributed Computing, Vol. 67(9), pp. 1007-1017  article DOI  
    Abstract: In this paper, we address the problem of multiple sequence alignment (MSA) for handling very large number of proteins sequences on mesh-based multiprocessor architectures. As the problem has been conclusively shown to be computationally complex, we employ divisible load paradigm (also, referred to as divisible load theory, DLT) to handle such large number of sequences. We design an efficient computational engine that is capable of conducting MSAs by exploiting the underlying parallelism embedded in the computational steps of multiple sequence algorithms. Specifically, we consider the standard Smith-Waterman (SW) algorithm in our implementation, however, our approach is by no means restrictive to SW class of algorithms alone. The treatment used in this paper is generic to a class of similar dynamic programming problems. Our approach is recursive in the sense that the quality of solutions can be refined continuously till an acceptable level of quality is achieved. After first phase of computation, we design a heuristic scheme that renders the final solution for MSA. We conduct rigorous simulation experiments using several hundreds of homologous protein sequences derived from the Rattus Norvegicus and Mus Musculus databases of olfactory receptors. We quantify the performance based on speed-up metric. We compare our algorithms to serial or single machine processing approaches. We testify our findings by comparing with conventional equal load partitioning (ELP) strategy that is commonly used in the parallel processing literature. Based on our extensive simulation study, we observe that DLT paradigm offers an excellent speed-up characteristics and provides avenues for its use in several other biological sequence processing related problem. This study is a first time attempt in using the DLT paradigm to devise efficient strategies to handle large scale multiple protein sequence alignment problem on mesh-based multiprocessor systems.
    Review: At the first level, this method appears to be a domain decomposition. Sequences are divided equally and randomly? among rows of nodes in a 2D mesh. This is the first level of parallelism. Each row of nodes aligns the group of sequences independently. The [Taylor1987] clustering method is used to order sequences for alignment. The DP calculations for pairwise alignment are divided into loads with divisible load theory and are allocated to a row of nodes. Within a row of 10 processors, a speedup of 3.8 can be approximated from Figure 9 for aligning 20 protein sequences of average length 300 (Rattus Norvegicus and Mus Musculus olfactory receptors). This speedup comes from parallelizing the DP calculations with DLT. The system is a 16-node Linux cluster with 32 2.2 GHz Intel Xeon processors.
    BibTeX:
    @article{Low2007,
      author = {Diana H.P. Low and Bharadwaj Veeravalli and David A. Bader},
      title = {On the design of high-performance algorithms for aligning multiple protein sequences on mesh-based multiprocessor architectures},
      journal = {Journal of Parallel and Distributed Computing},
      year = {2007},
      volume = {67},
      number = {9},
      pages = {1007-1017},
      doi = {http://dx.doi.org/10.1016/j.jpdc.2007.03.007}
    }
    
    Luo, J., Ahmad, I., Ahmed, M. & Paul, R. Parallel Multiple Sequence Alignment with Dynamic Scheduling 2005 International Conference on Information Technology: Coding and Computing, ITCC 2005, Vol. 1, pp. 8-13  inproceedings DOI  
    Abstract: This paper proposes a parallel implementation of the multiple sequence alignment algorithm, known as ClustalW, on distributed memory parallel machines. The proposed algorithm divides a progressive alignment into subtasks and schedules them dynamically. A task tree is built according to the dependency of the generated phylogenetic tree. The computation and communication costs of the tasks are estimated at run-time and updated periodically. With dynamic scheduling, tasks are allocated to the processors considering the tasks' estimated computation and communication costs and the processors' workload in order to minimize the completion time. The experiment results show that the proposed parallel implementation achieves a considerable speedup over the sequential ClustalW.
    Review: The cluster system has 16 nodes with dual 2.4 GHz Xeon processors and 2 GB of RAM, but only 1 to 8 processors were used in the benchmark. The nodes are connected with gigabit Ethernet. Beta purple bacterial RNA sequences were used in the tests with lengths ranging from 289 to 399 nucleotides. Stages 1 and 3 were parallelized. Dynamic scheduling was used for tasks that follow the guide tree. A speedup of 6 was achieved on 8 processors with 80 sequences. The program was written in C and uses MPI.
    BibTeX:
    @inproceedings{Luo2005,
      author = {Jiancong Luo and Ishfaq Ahmad and Munib Ahmed and Raymond Paul},
      title = {Parallel Multiple Sequence Alignment with Dynamic Scheduling},
      booktitle = {International Conference on Information Technology: Coding and Computing, ITCC 2005},
      year = {2005},
      volume = {1},
      pages = {8-13},
      doi = {http://dx.doi.org/10.1109/ITCC.2005.223}
    }
    
    Martino, R.L., Yap, T.K. & Suh, E.B. Parallel Algorithms in Molecular Biology 1997 High-Performance Computing and Networking, Vol. LNCS 1225, pp. 232-240  inproceedings DOI  
    Abstract: Scalable parallel computer architectures provide the computational performance needed for advanced computing problems in molecular biology. Many scientific challenges in molecular biology have associated with them a computational requirement that must be solved before scientific progress can be made. We have developed a number of parallel algorithms and techniques useful in determining biological structure and function. Two example applications are the alignment of multiple DNA and protein sequences using speculative computation and the calculation of the solvent accessible surface area of proteins used to predict the three-dimensional conformation of these molecules from their primary structure. Timing results demonstrate substantial performance improvements with parallel implementations compared with conventional sequential systems. As the developed methods allow molecular biologists to perform computational tasks that would not otherwise be possible, we continue to develop parallel algorithms useful to this important scientific field.
    Review: See [Ishikawa1992] and [Araki1993] and [Yap1995] and [Martino1997] and [Yap1998] Section 4. The method uses speculative computation which is similar to simulated annealing. Speculative computation is applied to parallelize the Berger-Munson algorithm. A draft alignment created manually or from Clustal is iteratively refined.
    BibTeX:
    @inproceedings{Martino1997,
      author = {Robert~L. Martino and Tieng~K. Yap and Edward~B. Suh},
      title = {Parallel Algorithms in Molecular Biology},
      booktitle = {High-Performance Computing and Networking},
      publisher = {Springer Berlin / Heidelberg},
      year = {1997},
      volume = {LNCS 1225},
      pages = {232-240},
      doi = {http://dx.doi.org/10.1007/BFb0031596}
    }
    
    Masuno, S., Maruyama, T., Yamaguchi, Y. & Konagaya, A. Multidimensional Dynamic Programming for Homology Search 2005 International Conference on Field Programmable Logic and Applications, FPL 2005, pp. 173-178  inproceedings DOI  
    Abstract: Alignment problems in computational biology have been focused recently because of the rapid growth of sequence databases. By computing alignment, we can understand similarity among the sequences. Many systems for alignment have been proposed to date, but most of them are designed for two-dimensional alignment (alignment between two sequences). In this paper, we describe a compact system with an off-the-shelf FPGA board and a host computer for more than three-dimensional alignment based on dynamic programming. In our approach, high performance is achieved (1) by configuring optimal circuit for each dimensional alignment, and (2) by two phase search in each dimension by reconfiguration. In order to realize multidimensional search with a common architecture, two-dimensional dynamic programming is repeated along other dimensions. With this approach, we can minimize the size of units for alignment and achieve high parallelism. Our system with one XC2V6000 enables about 300-fold speedup as compared with single Intel Pentium 4 2GHz processor for four-dimensional alignment, and 100-fold speedup for five-dimensional alignment.
    Review: Rating:*** See [Masuno2007b] for the journal version of this paper.
    BibTeX:
    @inproceedings{Masuno2005,
      author = {Shingo Masuno and Tsutomu Maruyama and Yoshiki Yamaguchi and Akihiko Konagaya},
      title = {Multidimensional Dynamic Programming for Homology Search},
      booktitle = {International Conference on Field Programmable Logic and Applications, FPL 2005},
      year = {2005},
      pages = {173-178},
      doi = {http://dx.doi.org/10.1109/FPL.2005.1515718}
    }
    
    Masuno, S., Maruyama, T., Yamaguchi, Y. & Konagaya, A. Multidimensional Dynamic Programming for Homology Search on Distributed Systems 2006 Euro-Par 2006 Parallel Processing, Vol. LNCS 4128, pp. 1127-1137  inproceedings DOI  
    Abstract: Alignment problems in computational biology have been focused recently because of the rapid growth of sequence databases. By computing alignment, we can understand similarity among the sequences. Dynamic programming is a technique to find optimal alignment, but it requires very long computation time. We have shown that dynamic programming for more than two sequences can be efficiently processed on a compact system which consists of an off-the-shelf FPGA board and its host computer (node). The performance is, however, not enough for comparing long sequences. In this paper, we describe a computation method for the multidimensional dynamic programming on distributed systems. The method is now being tested using two nodes connected by Ethernet. According to our experiments, it is possible to achieve 5.1 times speedup with 16 nodes, and more speedup can be expected for comparing longer sequences using more number of nodes. The performance is affected only a little by the data transfer delay when comparing long sequences. Therefore, our method can be mapped on any kinds of networks with large delays.
    Review: Rating:** This work uses the [Masuno2005] system and distributes the problem on 2 nodes with a projected 5.1 speedup on 16 nodes. The N-dimensional DP matrix is divided among the nodes connected in a ring. The next processor in the ring can begin when the boundary from the previous node is received.
    BibTeX:
    @inproceedings{Masuno2006,
      author = {Shingo Masuno and Tsutomu Maruyama and Yoshiki Yamaguchi and Akihiko Konagaya},
      title = {Multidimensional Dynamic Programming for Homology Search on Distributed Systems},
      booktitle = {Euro-Par 2006 Parallel Processing},
      publisher = {Springer Berlin / Heidelberg},
      year = {2006},
      volume = {LNCS 4128},
      pages = {1127-1137},
      doi = {http://dx.doi.org/10.1007/11823285_119}
    }
    
    Masuno, S., Maruyama, T., Yamaguchi, Y. & Konagaya, A. An FPGA Implementation of Multiple Sequence Alignment Based on Carrillo-Lipman Method 2007 International Conference on Field Programmable Logic and Applications, FPL 2007, pp. 489-492  inproceedings DOI  
    Abstract: Multiple sequence alignment problems in computational biology have been focused recently because of the rapid growth of sequence databases. By computing alignment, we can understand similarity among the sequences. In this paper, we describe a compact system with an FPGA board and a host computer for multiple sequence alignment based on Carrillo-Lipman method. In our system, two dimensional dynamic programming is repeatedly applied along other dimensions to realize multidimensional search with a simple and common architecture, and unnecessary parts of the search space for finding the optimal alignment are skipped using Carrillo-Lipman method to reduce the computation time.
    Review: Rating:** This work uses the [Masuno2005] system and adds the Carrillo-Lipman method for pruning the search space. Only up to 5 sequences are aligned. The overall speedup is 0.07 to 49, depending on the similarity of the sequences, compared with MSA 2.0 on a 3 GHz Pentium 4. When sequences are similar, the Carrillo-Lipman method does most of the work leaving little for the FPGA to do. When the sequences are dissimilar, more area needs to be searched by the FPGA, which shows better speedup.
    BibTeX:
    @inproceedings{Masuno2007a,
      author = {Shingo Masuno and Tsutomu Maruyama and Yoshiki Yamaguchi and Akihiko Konagaya},
      title = {An FPGA Implementation of Multiple Sequence Alignment Based on Carrillo-Lipman Method},
      booktitle = {International Conference on Field Programmable Logic and Applications, FPL 2007},
      year = {2007},
      pages = {489-492},
      doi = {http://dx.doi.org/10.1109/FPL.2007.4380696}
    }
    
    Masuno, S., Maruyama, T., Yamaguchi, Y. & Konagaya, A. Multiple Sequence Alignment Based on Dynamic Programming Using FPGA 2007 IEICE Transactions on Information and Systems, Vol. E90-D(12), pp. 1939-1946  article DOI  
    Abstract: Multiple sequence alignment problems in computational biology have been focused recently because of the rapid growth of sequence databases. By computing alignment, we can understand similarity among the sequences. Many hardware systems for alignment have been proposed to date, but most of them are designed for two-dimensional alignment (alignment between two sequences) because of the complexity to calculate alignment among more than two sequences under limited hardware resources. In this paper, we describe a compact system with an off-the-shelf FPGA board and a host computer for more than three-dimensional alignment based on dynamic programming. In our approach, high performance is achieved (1) by configuring optimal circuit for each dimensional alignment, and (2) by two phase search in each dimension by reconfiguration. In order to realize multidimensional search with a common architecture, two-dimensional dynamic programming is repeated along other dimensions. With this approach, we can minimize the size of units for alignment and achieve high parallelism. Our system with one XC2V6000 enables about 300-fold speedup as compared with single Intel Pentium4 2 GHz processor for four-dimensional alignment, and 100-fold speedup for five-dimensional alignment.
    Review: Rating:*** This is a journal version of [Masuno2005]. An optimal MSA is accelerated on an FPGA. The accelerator is an ADM-XRC-II by Alpha Data with one Xilinx XC2V6000. Compared with a 2 GHz Pentium 4, the speedup for 4 sequences is 298 (running at 36.6 MHz) and the speedup for 5 sequences is 104x (running at 31.0 MHz). The performance is limited by the memory bandwidth available to save traceback pointers. The size of the sequences is limited by the amount of external memory on the FPGA board. A 5-sequence alignment with a length of 64 and a 4-sequence alignment with a length of 256 is demonstrated. Alignment performance degrades with more than 5 sequences. A maximum of 6 sequences can be aligned with a reasonable performance gain. Only constant gaps are implemented. This work is an extension of the work by [Yamaguchi2004].
    BibTeX:
    @article{Masuno2007b,
      author = {Shingo Masuno and Tsutomu Maruyama and Yoshiki Yamaguchi and Akihiko Konagaya},
      title = {Multiple Sequence Alignment Based on Dynamic Programming Using FPGA},
      journal = {IEICE Transactions on Information and Systems},
      year = {2007},
      volume = {E90-D},
      number = {12},
      pages = {1939-1946},
      doi = {http://dx.doi.org/10.1093/ietisy/e90-d.12.1939}
    }
    
    Mikhailov, D., Cofer, H. & Gomperts, R. Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal, and MULTICLUSTAL 2001   techreport  
    Abstract: Multiple sequence alignments represent a class of powerful bioinformatics tools with many uses in computational biology. Knowledge of multiple alignments (MA) helps to predict secondary and tertiary structures and to detect homologies between newly sequenced genes and existing gene (protein) families. With the adoption of high-throughput (HT) automation, offering scientists significantly more data with which to make better decisions, it is increasingly important to run MA calculations as quickly as possible to allow real-time decision making in the lab. The popular MA application Clustal W (1) provides a very good example of how the resources of multiprocessor shared memory computers can be utilized more efficiently. This paper first outlines the efforts undertaken by SGI to parallelize the Clustal W application. The parallel version shows speedups of up to 10x when running Clustal W on 16 CPUs and significantly reduces the time required for data analysis. Second, the development of a high-throughput version of Clustal W called HT Clustal and the different methods of scheduling multiple MA jobs in HT Clustal are discussed. Third, improvements in the recently introduced MULTICLUSTAL algorithm (2) and its efficient use of Parallel Clustal W are reviewed.
    Review: shows a 10x speedup with 16 processors by parallelizing all three stages with OpenMP on a shared-memory SGI Origin machine. Stage 3 computes the scoring matrix in parallel.
    BibTeX:
    @techreport{Mikhailov2001,
      author = {Dmitri Mikhailov and Haruna Cofer and Roberto Gomperts},
      title = {Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal, and MULTICLUSTAL},
      year = {2001}
    }
    
    Monwar, M.M. & Rezaei, S. Parallelized Multiple Biological Sequence Alignment with MPI: The Divide-and-Conquer Approach 2006 International MultiConference of Engineers and Computer Scientists, IMECS '06, pp. 126-130  inproceedings  
    Review: This paper got the Best Student Paper Award. See [Monwar2007]
    BibTeX:
    @inproceedings{Monwar2006,
      author = {Md.~Maruf Monwar and Siamak Rezaei},
      title = {Parallelized Multiple Biological Sequence Alignment with MPI: The Divide-and-Conquer Approach},
      booktitle = {International MultiConference of Engineers and Computer Scientists, IMECS '06},
      year = {2006},
      pages = {126-130}
    }
    
    Monwar, M. & Rezaei, S. An Efficient Parallel Processing Approach For Multiple Biological Sequence Alignment 2007 IAENG International Journal of Computer Science, Vol. 33(2), pp. 5  article URL 
    Abstract: Multiple biological sequence alignment is a challenging task due to its high demands for computational power, memory capacity and bandwidth and a number of novel algorithms have been developed for this. In this paper, an MPI based parallel multiple sequence alignment (MSA) algorithm is implemented with the Divide-and-Conquer approach. With this approach, the sequences are first cut down into smaller subsequences to minimize the computational space. Then these subsequences are run in parallel on different available processors using MPI. Each of those processors first builds an individual guide tree and then aligns the subsequences by Needleman-Wunsch algorithm for biological sequence comparison. After aligning, the results are then sent to the main processor where they concatenate to produce the final alignment. Because of the creation of multiple guide trees, this approach achieves a significantly better speed up than a simple MPI based parallel MSA algorithm. But some quality of the alignment is compromised for the introduction of gaps at the start or end of subsequence alignments. Therefore, some heuristic methods for fixing the cut points were suggested for future improvement.
    BibTeX:
    @article{Monwar2007,
      author = {Maruf Monwar and Siamak Rezaei},
      title = {An Efficient Parallel Processing Approach For Multiple Biological Sequence Alignment},
      journal = {IAENG International Journal of Computer Science},
      year = {2007},
      volume = {33},
      number = {2},
      pages = {5},
      url = {http://www.iaeng.org/IJCS/issues_v33/issue_2/IJCS_33_2_5.pdf}
    }
    
    Nguyen, H.D., Yoshihara, I., Yamamori, K. & Yasunaga, M. A parallel hybrid genetic algorithm for multiple protein sequence alignment 2002 Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02), Vol. 1, pp. 309-314  inproceedings DOI  
    Abstract: This paper presents a parallel hybrid genetic algorithm (GA) for solving sum-of-pairs multiple protein sequence alignment. The method is based on a multiple population GENITOR-type GA and involves local search heuristics. It is then extended to parallel to exploit the benefit of a multiprocessor system. Benchmarks from the BAliBASE library are used to validate the method.
    Review: Parallel Hybrid Genetic Algorithm (PHGA).
    BibTeX:
    @inproceedings{Nguyen2002a,
      author = {Hung~Dinh Nguyen and Ikuo Yoshihara and Kunihito Yamamori and Moritoshi Yasunaga},
      title = {A parallel hybrid genetic algorithm for multiple protein sequence alignment},
      booktitle = {Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02)},
      year = {2002},
      volume = {1},
      pages = {309-314},
      doi = {http://dx.doi.org/10.1109/CEC.2002.1006252}
    }
    
    Nguyen, H.D., Yoshihara, I., Yamamori, K. & Yasunaga, M. Aligning Multiple Protein Sequences by Parallel Hybrid Genetic Algorithm 2002 Genome Informatics, Vol. 13, pp. 123-132  article  
    Abstract: This paper presents a parallel hybrid genetic algorithm (GA) for solving the sum-of-pairs multiple protein sequence alignment. A new chromosome representation and its corresponding genetic operators are proposed. A multi-population GENITOR-type GA is combined with local search heuristics. It is then extended to run in parallel on a multiprocessor system for speeding up. Experimental results of benchmarks from the BAliBASE show that the proposed method is superior to MSA, OMA, and SAGA methods with regard to quality of solution and running time. It can be used for finding multiple sequence alignment as well as testing cost functions.
    Review: Parallel Hybrid Genetic Algorithm (PHGA).
    BibTeX:
    @article{Nguyen2002b,
      author = {Hung~Dinh Nguyen and Ikuo Yoshihara and Kunihito Yamamori and Moritoshi Yasunaga},
      title = {Aligning Multiple Protein Sequences by Parallel Hybrid Genetic Algorithm},
      journal = {Genome Informatics},
      year = {2002},
      volume = {13},
      pages = {123-132}
    }
    
    Nguyen, H.D., Yamamori, K., Yoshihara, I. & Yasunaga, M. Improved GA-based method for multiple protein sequence alignment 2003 The 2003 Congress on Evolutionary Computation (CEC '03), Vol. 3, pp. 1826-1832  inproceedings DOI  
    Abstract: In previous work, we have proposed a parallel hybrid genetic algorithm (PHGA) which can find high quality solution from the mathematical viewpoint for the multiple protein sequence alignment. We present new improvements to the PHGA. Local alignment information is added to the weighted sum-of-pairs objective function to achieve better alignment from the biological viewpoint. We also extend our method to run in parallel on a cluster of machines instead of a multi-processor machine to speed it up.
    Review: Parallel Hybrid Genetic Algorithm (PHGA). The average speedup is 6.6 on an 8 processor cluster with 933 MHz Pentium III CPUs for the BAliBASE data set. Quality results are slightly better than ClustalW, PRRP, T-Coffee, and MAFFT. See Section 2.6 Parallelization on page 1828.
    BibTeX:
    @inproceedings{Nguyen2003,
      author = {Hung~Dinh Nguyen and Kunihito Yamamori and Ikuo Yoshihara and Moritoshi Yasunaga},
      title = {Improved GA-based method for multiple protein sequence alignment},
      booktitle = {The 2003 Congress on Evolutionary Computation (CEC '03)},
      year = {2003},
      volume = {3},
      pages = {1826-1832},
      doi = {http://dx.doi.org/10.1109/CEC.2003.1299894}
    }
    
    Notredame, C., O'Brien, E.A. & Higgins, D.G. RAGA: RNA sequence alignment by genetic algorithm 1997 Nucleic Acids Research, Vol. 25(22), pp. 4570-4580  article DOI  
    Abstract: We describe a new approach for accurately aligning two homologous RNA sequences when the secondary structure of one of them is known. To do so we developed two software packages, called RAGA and PRAGA, which use a genetic algorithm approach to optimize the alignments. RAGA is mainly an extension of SAGA, an earlier package for multiple protein sequence alignment. In PRAGA several genetic algorithms run in parallel and exchange individual solutions. This method allows us to optimize an objective function that describes the quality of a RNA pairwise alignment, taking into account both primary and secondary structure, including pseudoknots. We report results obtained using PRAGA on nine test cases of pairs of eukaryotic small subunit rRNA sequence (nuclear and mitochondrial).
    Review: The parallel version is called PRAGA. An island parallel approach is used. A speedup of 10 is observed on 13 processors. The processes synchronize after each generation.
    BibTeX:
    @article{Notredame1997,
      author = {Cédric Notredame and Emmet~A. O'Brien and Desmond~G. Higgins},
      title = {RAGA: RNA sequence alignment by genetic algorithm},
      journal = {Nucleic Acids Research},
      year = {1997},
      volume = {25},
      number = {22},
      pages = {4570-4580},
      doi = {http://dx.doi.org/10.1093/nar/25.22.4570}
    }
    
    Oliver, T., Schmidt, B., Nathan, D., Clemens, R. & Maskell, D. Multiple Sequence Alignment on an FPGA 2005 Proceedings of the 11th International Conference on Parallel and Distributed Systems (ICPADS'05), Vol. 2, pp. 326-330  inproceedings DOI  
    Abstract: Molecular biologists frequently compute multiple sequence alignments (MSAs) to identify similar regions in protein families. Progressive alignment is a widely used approach to compute MSAs. However, aligning a few hundred sequences by popular progressive alignment tools requires several hours on sequential computers. Due to the rapid growth of biological sequence databases biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to MSA on reconfigurable hardware platforms to gain high performance at low cost. To derive an efficient mapping onto this type of architecture, fine-grained parallel processing elements (PEs) have been designed. Using this PE design as a building block we have constructed a linear systolic array to perform a pairwise sequence distance computation using dynamic programming. This results in an implementation with significant runtime savings on a standard off-the-shelf FPGA.
    Review: Rating:***** See [Oliver2006b] for more details.
    BibTeX:
    @inproceedings{Oliver2005a,
      author = {Tim Oliver and Bertil Schmidt and Darran Nathan and Ralf Clemens and Douglas Maskell},
      title = {Multiple Sequence Alignment on an FPGA},
      booktitle = {Proceedings of the 11th International Conference on Parallel and Distributed Systems (ICPADS'05)},
      year = {2005},
      volume = {2},
      pages = {326-330},
      doi = {http://dx.doi.org/10.1109/ICPADS.2005.202}
    }
    
    Oliver, T., Schmidt, B., Nathan, D., Clemens, R. & Maskell, D. Using reconfigurable hardware to accelerate multiple sequence alignment with ClustalW 2005 Bioinformatics, Vol. 21(16), pp. 3431-3432  article DOI  
    Abstract: Summary: Aligning hundreds of sequences using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. We present a new approach to compute multiple sequence alignments in far shorter time using reconfigurable hardware. This results in an implementation of ClustalW with significant runtime savings on a standard off-the-shelf FPGA. Availability: An online server for ClustalW running on a Pentium IV 3 GHz with a Xilinx XC2V6000 FPGA PCI-board is available at http://beta.projectproteus.org. The PE hardware design in Verilog HDL is available on request from the first author. Contact: tim.oliver@pmail.ntu.edu.sg
    Review: Rating:***** See [Oliver2006b] for more details. This reference is the most often cited.
    BibTeX:
    @article{Oliver2005b,
      author = {Tim Oliver and Bertil Schmidt and Darran Nathan and Ralf Clemens and Douglas Maskell},
      title = {Using reconfigurable hardware to accelerate multiple sequence alignment with ClustalW},
      journal = {Bioinformatics},
      year = {2005},
      volume = {21},
      number = {16},
      pages = {3431-3432},
      doi = {http://dx.doi.org/10.1093/bioinformatics/bti508}
    }
    
    Oliver, T., Schmidt, B., Maskell, D., Nathan, D. & Clemens, R. High-speed Multiple Sequence Alignment on a reconfigurable platform 2006 International Journal of Bioinformatics Research and Applications (IJBRA), Vol. 2(4), pp. 394-406  article DOI  
    Abstract: Progressive alignment is a widely used approach to compute multiple sequence alignments (MSAs). However, aligning several hundred sequences by popular progressive alignment tools requires hours on sequential computers. Due to the rapid growth of sequence databases biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to MSA on reconfigurable hardware platforms to gain high performance at low cost. We have constructed a linear systolic array to perform pairwise sequence distance computations using dynamic programming. This results in an implementation with significant runtime savings on a standard FPGA.
    Review: Rating:***** The research presented by Oliver et al. [Oliver2005a, Oliver2005b, Oliver2006b] accelerates ClustalW with an FPGA. All three references appear to cover the same work. The contents of the papers can roughly be explained by: [Oliver2005a] + [Oliver2005b] ~= [Oliver2006b]. [Oliver2005a] does not have any overall speed comparisons, while [Oliver2005b] does. The [Oliver2005b] reference is the most often cited. Oliver's work only accelerates the first stage of progressive MSA on the FPGA, which is the pairwise sequence comparison, and leaves the second and third stages for execution on the host processor. Instead of calculating the similarity score by actually aligning the sequences and counting the number of identical characters (NIDs, number of exact matches--Oliver; number of identities in the best alignment--Thompson), custom hardware is developed to compute the NIDs during the forward scan. This avoids a full alignment and traceback, which is more difficult in FPGA technology. The best overall speedup was 13.3 compared to ClustalW running on a 3.0 GHz Pentium 4. The accelerator board is a PCI based, ADP-WRC-II from Alpha Data which allows 92 PEs in a Xilinx XC2V6000 clocked at 34 MHz giving 3.1 GCUPS. The accelerator reaches a peak speedup of 50.9x in Stage 1. Affine gaps are implemented.
    BibTeX:
    @article{Oliver2006b,
      author = {Tim Oliver and Bertil Schmidt and Douglas Maskell and Darran Nathan and Ralf Clemens},
      title = {High-speed Multiple Sequence Alignment on a reconfigurable platform},
      journal = {International Journal of Bioinformatics Research and Applications (IJBRA)},
      year = {2006},
      volume = {2},
      number = {4},
      pages = {394-406},
      doi = {http://dx.doi.org/10.1504/IJBRA.2006.011038}
    }
    
    Parmentier, G., Trystram, D. & Zola, J. Cache-Based Parallelization of Multiple Sequence Alignment Problem 2004 Euro-Par 2004 Parallel Processing, Vol. LNCS 3149, pp. 1005-1012  inproceedings DOI  
    Abstract: In this paper we present new approach to the problem of parallel multiple sequence alignment. The proposed method is based on the application of caching technique and is aimed to solve, with high precision, large alignment instances on the heterogeneous clusters. The cache is used to store partial alignment guiding trees which can be reused in future computations, and is applied to eliminate redundancy of computations in parallel environment. We describe an implementation based on the CaLi library, the software designed for caches implementation. We report preliminary experimental results and finally, we propose some extensions of our method.
    BibTeX:
    @inproceedings{Parmentier2004,
      author = {Gilles Parmentier and Denis Trystram and Jaroslaw Zola},
      title = {Cache-Based Parallelization of Multiple Sequence Alignment Problem},
      booktitle = {Euro-Par 2004 Parallel Processing},
      publisher = {Springer Berlin / Heidelberg},
      year = {2004},
      volume = {LNCS 3149},
      pages = {1005-1012},
      doi = {http://dx.doi.org/10.1007/978-3-540-27866-5_135}
    }
    
    Parmentier, G., Trystram, D. & Zola, J. Large Scale Multiple Sequence Alignment with Simultaneous Phylogeny Inference 2006 Journal of Parallel and Distributed Computing, Vol. 66(12), pp. 1534-1545  article DOI  
    Abstract: Multiple sequence alignment (MSA) and phylogenetic tree reconstruction are one of the most important problems in the computational biology. While both these problems are of great practical significance, in most cases they are very computationally demanding. In this paper we propose a new approach to the MSA problem which simultaneously infers an underlying phylogenetic tree. To process large data sets we provide parallel implementation of our method, which is based on the distributed caching of intermediate results. Finally, we show a parallel server designed for grid environments, and we report results of experiments performed with actual biological data, e.g. 1000 ribosomal RNA sequences.
    Review: See Figure 1 on Page 2 for an overview of the PhylTree method. The distance-matrix calculations of the first stage are distributed independently to worker nodes. After neighborhoods are found, each one is distributed to a worker node to do an alignment and find a phylogenetic tree. All possible trees and alignments are computed for the neighborhood which consists of 4 to 10 taxa. Neighborhoods are then clustered by the master node to give the overall tree and alignment. The sequential version is not compared with the parallel version because too much memory is required for the sequential version. A near linear relative speedup is shown for up to 64 CPUs.
    BibTeX:
    @article{Parmentier2006,
      author = {Gilles Parmentier and Denis Trystram and Jaroslaw Zola},
      title = {Large Scale Multiple Sequence Alignment with Simultaneous Phylogeny Inference},
      journal = {Journal of Parallel and Distributed Computing},
      year = {2006},
      volume = {66},
      number = {12},
      pages = {1534-1545},
      note = {Special Issue: Grids in Bioinformatics and Computational Biology},
      doi = {http://dx.doi.org/10.1016/j.jpdc.2006.03.003}
    }
    
    Pellerin, D., Trexel, E. & Xu, M. FPGA-based hardware acceleration of C/C++ based applications - Part 3 2007 Programmable Logic DesignLine  electronic URL 
    Abstract: The team at Impulse Accelerated Technologies explain how state-of-the-art C-to-hardware tools simplify the development of FPGA-accelerated algorithms.
    Review: Rating:** The multiple sequence alignment application described appears to follow closely the work by [Aung2007]. For the first stage only, Pellerin et al. achieve 6.1x speedup over a 2 GHz Pentium 4 on a Xilinx ML403 board that has a Virtex-4 FX12. The main contribution here is that all of the code was developed in C and not in a HDL. Affine gaps are supported.
    BibTeX:
    @electronic{Pellerin2007,
      author = {David Pellerin and Ed Trexel and Mei Xu},
      title = {FPGA-based hardware acceleration of C/C++ based applications - Part 3},
      year = {2007},
      url = {http://www.pldesignline.com/howto/201800344}
    }
    
    Rezaei, S. & Monwar, M.M. Divide-and-Conquer Algorithm for ClustalW-MPI 2006 Canadian Conference on Electrical and Computer Engineering, CCECE '06, pp. 717-720  inproceedings DOI  
    Abstract: Multiple sequence alignment continues to be an active field of research in computational biology and the most widely used tool for multiple sequence alignment is ClustalW, which achieves alignment via three steps: pair wise alignment, guide tree generation and progressive alignment. ClustalW-MPI is a parallel implementation of ClustalW. In this paper, a new approach, divide-and-conquer, is implemented which uses ClustalW-MPI for sequence alignment but it gets a better speed up performance than ClustalW-MPI. In this approach, the sequences are first cut down into smaller subsequences by divide-and-conquer technique to minimize the computational space. Then these subsequences are sent to different available processors using message passing interface technique. Those processors align the subsequences by executing ClustalW-MPI simultaneously. After aligning, the results are then sent to the main processor to be concatenated to produce the final alignment. But some quality of the alignment may be compromised in this approach for the introduction of gaps at the start or end of subsequences aligned. Therefore, some heuristic methods for fixing the cut points were suggested for future improvement, such as overlapping alignment and sliding window alignment.
    BibTeX:
    @inproceedings{Rezaei2006a,
      author = {Siamak Rezaei and Md.~Maruf Monwar},
      title = {Divide-and-Conquer Algorithm for ClustalW-MPI},
      booktitle = {Canadian Conference on Electrical and Computer Engineering, CCECE '06},
      year = {2006},
      pages = {717-720},
      doi = {http://dx.doi.org/10.1109/CCECE.2006.277630}
    }
    
    Rezaei, S., Monwar, M.M. & Bai, J. Performance Comparison of MPI-Based Parallel Multiple Sequence Alignment Algorithm Using Single and Multiple Guide Trees 2006 5th IEEE International Conference on Cognitive Informatics, ICCI 2006, Vol. 1, pp. 595-600  inproceedings DOI  
    Abstract: Multiple alignment of biological sequences is an interesting area of research for application of parallel processing algorithms. For multiple alignment of sequences, parallelism can be introduced at different levels to reduce the overall time. This paper is concerned with the comparison of an MPI-based multiple sequence alignment algorithm by using single and multiple guide trees. In this work, we will look at the application of divide-and-conquer algorithm for multiple sequence alignment and discuss two different implementations on MPI. We compare the speed up of the two approaches and we contrast the quality of the alignments for these two approaches. These two approaches differ in using one single tree for alignment vs. multiple trees for alignment that we further discuss in this paper.
    BibTeX:
    @inproceedings{Rezaei2006b,
      author = {Siamak Rezaei and Md.~Maruf Monwar and Joanne Bai},
      title = {Performance Comparison of MPI-Based Parallel Multiple Sequence Alignment Algorithm Using Single and Multiple Guide Trees},
      booktitle = {5th IEEE International Conference on Cognitive Informatics, ICCI 2006},
      year = {2006},
      volume = {1},
      pages = {595-600},
      doi = {http://dx.doi.org/10.1109/COGINF.2006.365552}
    }
    
    Rizvi, S.A.M. & Agarwal, P. A New Index-Based Parallel Algorithm for Finding Longest Common Subsequence in Multiple DNA Sequences 2005 Proceedings of the International Conference on Cognitive Systems (ICCS 2005), pp. 8  inproceedings URL 
    Abstract: This paper presents a new Parallel Algorithm for computing a Longest Common Subsequence in Multiple DNA Sequences. It uses a heuristic approach. Although a lot of research has been carried out to find LCS from the two or more given sequences of Protein, DNA, RNA etc, but not many parallel methods exists for finding LCS from multiple sequences. Normally in existing algorithms the time complexity for finding the LCS increases linearly with the increase in Sequences. This is an attempt to given an effective Parallel algorithm to find LCS from any given number of DNA sequences. Significance of this algorithm is that time complexity does not increase linearly with the increase in the number of sequences. However algorithm can also be applied to Protein sequences with the same effectiveness, though the requirement of processors will go up.
    BibTeX:
    @inproceedings{Rizvi2005,
      author = {S.~A.~M. Rizvi and Pankaj Agarwal},
      title = {A New Index-Based Parallel Algorithm for Finding Longest Common Subsequence in Multiple DNA Sequences},
      booktitle = {Proceedings of the International Conference on Cognitive Systems (ICCS 2005)},
      year = {2005},
      pages = {8},
      url = {http://www.niitcrcs.com/iccs/papers/2005_48.pdf}
    }
    
    Sachdeva, V., Kistler, M., Speight, E. & Tzeng, T.-H.K. Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications 2007 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2007, pp. 1-8  inproceedings DOI  
    Abstract: This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine recently developed at IBM. In particular we focus on two highly popular bioinformatics applications -- FASTA and ClustalW. The characteristics of these bioinformatics applications, such as small critical time-consuming code size, regular memory accesses, existing vectorized code and embarrassingly parallel computation, make them uniquely suitable for the Cell processing platform. The price and power advantages afforded by the Cell processor also make it an attractive alternative to general purpose processors. We report preliminary performance results for these applications, and contrast these results with the state-of-the-art hardware.
    Review: Rating:** See [Sachdeva2008] for the journal version of this paper. Stage 1 speedup Woodcrest/Cell(8 SPU) on large data set, class-C from bioperf: 492.92/75.7=6.51. Overall speedup Woodcrest/Cell(8 SPU) on large data set, class-C from bioperf: 666.085/1053.49=0.632. The slower overall performance is because Stages 2 and 3 run slower on the PPU.
    BibTeX:
    @inproceedings{Sachdeva2007,
      author = {Vipin Sachdeva and Michael Kistler and Evan Speight and Tzy-Hwa~Kathy Tzeng},
      title = {Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications},
      booktitle = {IEEE International Parallel and Distributed Processing Symposium, IPDPS 2007},
      year = {2007},
      pages = {1-8},
      doi = {http://dx.doi.org/10.1109/IPDPS.2007.370449}
    }
    
    Sachdeva, V., Kistler, M., Speight, E. & Tzeng, T.-H.K. Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications 2008 Parallel Computing, Vol. 34(11), pp. 616-626  article DOI  
    Abstract: This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine (Cell/B.E.) recently developed at IBM. In particular we focus on three highly popular bioinformatics applications -- FASTA, ClustalW, and HMMER. The characteristics of these bioinformatics applications, such as small critical time-consuming code size, regular memory accesses, existing vectorized code and embarrassingly parallel computation, make them uniquely suitable for the Cell/B.E. processing platform. The price and power advantages afforded by the Cell/B.E. processor also make it an attractive alternative to general purpose processors. We report preliminary performance results for these applications, and contrast these results with the state-of-the-art hardware.
    Review: Rating:** ClustalW is accelerated on a Cell BE. A speedup of 9.6x over an Opteron for Stage 1 is achieved with one Cell BE.
    BibTeX:
    @article{Sachdeva2008,
      author = {Vipin Sachdeva and Michael Kistler and Evan Speight and Tzy-Hwa~Kathy Tzeng},
      title = {Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications},
      journal = {Parallel Computing},
      year = {2008},
      volume = {34},
      number = {11},
      pages = {616-626},
      doi = {http://dx.doi.org/10.1016/j.parco.2008.04.001}
    }
    
    Saeed, F. & Khokhar, A. Sample-Align-D: A high performance Multiple Sequence Alignment system using phylogenetic sampling and domain decomposition 2008 IEEE International Parallel & Distributed Processing Symposium, IPDPS 2008, pp. 1-9  inproceedings DOI  
    Abstract: Multiple Sequence Alignment (MSA) is one of the most computationally intensive tasks in Computational Biology. Existing best known solutions for multiple sequence alignment take several hours (in some cases days) of computation time to align, for example, 2000 homologous sequences of average length 300. Inspired by the Sample Sort approach in parallel processing, in this paper we propose a highly scalable multiprocessor solution for the MSA problem in phylogenetically diverse sequences. Our method employs an intelligent scheme to partition the set of sequences into smaller subsets using k-mer count based similarity index, referred to as k-mer rank. Each subset is then independently aligned in parallel using any sequential approach. Further fine tuning of the local alignments is achieved using constraints derived from a global ancestor of the entire set. The proposed Sample-Align-D Algorithm has been implemented on a cluster of workstations using MPI message passing library. The accuracy of the proposed solution has been tested on standard benchmarks such as PREFAB. The accuracy of the alignment produced by our methods is comparable to that of well known sequential MSA techniques. We were able to align 2000 randomly selected sequences from the Methanosarcina acetivorans genome in less than 10 minutes using Sample-Align-D on a 16 node cluster, compared to over 23 hours on sequential MUSCLE system running on a single cluster node.
    Review: See [Saeed2009b]. Sample-Align-D aligned 2000 sequences of length 300 on a 16 node cluster in 9.82 minutes for a speedup of 142x over MUSCLE (23 hours). The program uses a domain decomposition approach.
    BibTeX:
    @inproceedings{Saeed2008,
      author = {Fahad Saeed and Ashfaq Khokhar},
      title = {Sample-Align-D: A high performance Multiple Sequence Alignment system using phylogenetic sampling and domain decomposition},
      booktitle = {IEEE International Parallel & Distributed Processing Symposium, IPDPS 2008},
      year = {2008},
      pages = {1-9},
      doi = {http://dx.doi.org/10.1109/IPDPS.2008.4536174}
    }
    
    Saeed, F. & Khokhar, A. A domain decomposition strategy for alignment of multiple biological sequences on multiprocessor platforms 2009 Journal of Parallel and Distributed Computing, Vol. 69(7), pp. 666-677  article DOI  
    Abstract: Multiple Sequences Alignment (MSA) of biological sequences is a fundamental problem in computational biology due to its critical significance in wide ranging applications including haplotype reconstruction, sequence homology, phylogenetic analysis, and prediction of evolutionary origins. The MSA problem is considered NP-hard and known heuristics for the problem do not scale well with increasing numbers of sequences. On the other hand, with the advent of a new breed of fast sequencing techniques it is now possible to generate thousands of sequences very quickly. For rapid sequence analysis, it is therefore desirable to develop fast MSA algorithms that scale well with an increase in the dataset size. In this paper, we present a novel domain decomposition based technique to solve the MSA problem on multiprocessing platforms. The domain decomposition based technique, in addition to yielding better quality, gives enormous advantages in terms of execution time and memory requirements. The proposed strategy allows one to decrease the time complexity of any known heuristic of O(N)x complexity by a factor of O(1/p)x, where N is the number of sequences, x depends on the underlying heuristic approach, and p is the number of processing nodes. In particular, we propose a highly scalable algorithm, Sample-Align-D, for aligning biological sequences using Muscle system as the underlying heuristic. The proposed algorithm has been implemented on a cluster of workstations using the MPI library. Experimental results for different problem sizes are analyzed in terms of quality of alignment, execution time and speed-up.
    Review: The program uses a domain decomposition approach. k-mer rank is used to decompose the domain. Sequences are distributed to separate processors based on their global k-mer rank. MUSCLE is used to align the sequences locally on a node. Performance is evaluated on a Beowulf Cluster with 16 2.40 GHz Xeon processors and 1GB DRAM memory. The network is Gigabit Ethernet. Sample-Align-D aligned 8000 sequences (of unknown length, most likely 100-2000) in 3.9 minutes compared with 2100 minutes for sequential MUSCLE giving a 538x speedup. The super linear speedup comes from the decomposition. No results are given for running the 16-way decomposition on a single processor to see what the real speedup is for 16 processors. Sample-Align-D is compared with Parallel T-Coffee and Parallel ClustalW (respectively 8.1 s, 9.1 h, 4.3 m) using the PF00500 data set from T-Coffee that consists of 1048 sequences with a maximal length of 523.
    BibTeX:
    @article{Saeed2009b,
      author = {Fahad Saeed and Ashfaq Khokhar},
      title = {A domain decomposition strategy for alignment of multiple biological sequences on multiprocessor platforms},
      journal = {Journal of Parallel and Distributed Computing},
      year = {2009},
      volume = {69},
      number = {7},
      pages = {666-677},
      doi = {http://dx.doi.org/10.1016/j.jpdc.2009.03.006}
    }
    
    Schmollinger, M., Nieselt, K., Kaufmann, M. & Morgenstern, B. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors 2004 BMC Bioinformatics, Vol. 5(1), pp. 128  article DOI  
    Abstract: BACKGROUND:Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. RESULTS:Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. CONCLUSIONS:By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
    Review: Two methods of parallelism are used: (1) the first stage distributes the all-to-all, pairwise, diagonal fragment finding across several processors; and (2) when aligning larger genomic sections, CHAOS is used to find anchors in the original sequences and split them into smaller sections that are independently aligned. The sequence splitting at anchor points is similar to Vingron's (1989) and Stoye's (1997) divide-and-conquer (DCA) approaches.
    BibTeX:
    @article{Schmollinger2004,
      author = {Schmollinger, Martin and Nieselt, Kay and Kaufmann, Michael and Morgenstern, Burkhard},
      title = {DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors},
      journal = {BMC Bioinformatics},
      year = {2004},
      volume = {5},
      number = {1},
      pages = {128},
      doi = {http://dx.doi.org/10.1186/1471-2105-5-128}
    }
    
    Shu, W.-L., Wu, C.-C. & Lai, W.-C. Design the Hardware of Genetic Algorithm for TSP and MSA 2004 Knowledge-Based Intelligent Information and Engineering Systems, KES 2004, Vol. LNAI 3214, pp. 1260-1267  inproceedings DOI  
    Abstract: Traveling Salesman’s Problem (TSP) can be applied to find the near optimal Multiple Sequence Alignments (MSA)[1]. TSP can be further calculated by using genetic algorithms (GA) [2,3]. In this paper, we develop the hardware of GA to improve the speed up of solving TSP and MSA. Software is used to process creating initial population and selecting operation, and hardware is used to process crossover, mutation and fitness operation. Our hardware system is designed and simulated by using VHDL. The speed up of our system is increased to 27 in worst case and 44 in best case.
    Review: Rating:** MSA is solved by first finding a traveling salesman tour. A genetic algorithm specific to solving the traveling salesman problem is accelerated in hardware. The hardware is an add-in board that most likely has an FPGA. The accelerator can handle up to 65534 sequences. A cluster of nodes, each with an accelerator, is suggested but not demonstrated. Simulations suggest a speedup of 26.9 to 44.7 with one hardware accelerator compared to a software only version.
    BibTeX:
    @inproceedings{Shu.Wen-Lung2004,
      author = {Wen-Lung Shu and Chen-Cheng Wu and Wei-Cheng Lai},
      title = {Design the Hardware of Genetic Algorithm for TSP and MSA},
      booktitle = {Knowledge-Based Intelligent Information and Engineering Systems, KES 2004},
      publisher = {Springer Berlin / Heidelberg},
      year = {2004},
      volume = {LNAI 3214},
      pages = {1260-1267},
      doi = {http://dx.doi.org/10.1007/978-3-540-30133-2_168}
    }
    
    Tajima, K. A New Multiple Alignment Algorithm for Protein and DNA Sequences Based on Vector-Scalar Matching 1988 Journal of Protein Chemistry, Vol. 7, pp. 292-293  article  
    Review: See [Tajima1988b].
    BibTeX:
    @article{Tajima1988a,
      author = {Koji Tajima},
      title = {A New Multiple Alignment Algorithm for Protein and DNA Sequences Based on Vector-Scalar Matching},
      journal = {Journal of Protein Chemistry},
      year = {1988},
      volume = {7},
      pages = {292-293}
    }
    
    Tajima, K. Multiple DNA and protein sequence alignment on a workstation and a supercomputer 1988 Comput. Appl. Biosci., Vol. 4(4), pp. 467-471  article DOI  
    Abstract: This paper describes a multiple alignment method using a workstation and supercomputer. The method is based on the alignment of a set of aligned sequences with the new sequence, and uses a recursive procedure of such alignment. The alignment is executed in a reasonable computation time on diverse levels from a workstation to a supercomputer, from the viewpoint of alignment results and computational speed by parallel processing. The application of the algorithm is illustrated by several examples of multiple alignment of 12 amino acid and DNA sequences of HIV (human immunodeficiency virus) env genes. Colour graphic programs on a workstation and parallel processing on a supercomputer are discussed.
    Review: Rating:*** The progressive alignment method of Barton and Sternberg (1987) is used. The DP calculations are vectorized on a FACOM VP-200 vector supercomputer. The alignment program was written in FORTRAN 77/VP. To avoid a table lookup for a symbol similarity matrix during vector operations, zero is returned if symbols are equal and one otherwise. UPGMA is used to determine the order of aligning sequences. After the first pairwise alignment, the next sequence is aligned with the previous group. DP between a sequence and a group is performed. The compiler vectorizes DO loops when possible. A speedup of 4 is demonstrated on sequence to sequence alignment, and a speedup of 2-3 is shown for sequence to group alignment.
    BibTeX:
    @article{Tajima1988b,
      author = {Tajima, Koji},
      title = {Multiple DNA and protein sequence alignment on a workstation and a supercomputer},
      journal = {Comput. Appl. Biosci.},
      year = {1988},
      volume = {4},
      number = {4},
      pages = {467-471},
      doi = {http://dx.doi.org/10.1093/bioinformatics/4.4.467}
    }
    
    Tajima, K. Multiple Sequence Alignment using Parallel Genetic Algorithms 1993 Genome Informatics Workshop, Vol. 4, pp. 183-187  inproceedings URL 
    Abstract: We propose a more sensitive algorithm for multiple sequence alignment using parallel genetic algorithms. With less computation than that needed for multi-dimensional dynamic programming approaches, we can obtain multiple alignments which have better similarity than that obtained by repeating two-dimensional dynamic programming. The parallel processing of genetic algorithms was performed on a Fujitsu parallel computer AP1000.
    BibTeX:
    @inproceedings{Tajima1993,
      author = {Koji Tajima},
      title = {Multiple Sequence Alignment using Parallel Genetic Algorithms},
      booktitle = {Genome Informatics Workshop},
      publisher = {Universal Academy Press, Inc.},
      year = {1993},
      volume = {4},
      pages = {183-187},
      url = {http://www.jsbi.org/pdfs/journal1/GIW93/Poster/GIW93P02.html}
    }
    
    Tan, G., Feng, S. & Sun, N. Parallel Multiple Sequences Alignment in SMP Cluster 2005 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05)  inproceedings DOI  
    Abstract: Multiple sequences alignment is a fundamental and challenging problem in computational molecular biology. It is commonly used to analyse the DNA/protein sequences. To develop a high efficient parallel algorithm is a very important solution to speedup this time consuming problem. However, due to the irregular computation behaviors based on tree, it is difficult to achieve good load balancing, so the utilization of processor is very low. This paper presents a parallel multiple sequences alignment algorithm featuring a mixed fine and coarse grained parallelization approach. The parallel algorithm is suitable to be implemented in SMP cluster, which is the main architecture of current cluster systems. We implemented the parallel algorithm in SMP cluster using a hybrid MPI/OpenMP method and the experimental results shows that the mixed fine and coarse algorithm achieves higher speedup.
    Review: ClustalW algorithm, use MPI/OpenMP, reports speedups of 35 with 80 processors (40 nodes) on a SMP-cluster. 2.8 GHz Intel Xeon SMP processors are used with gigabit Ethernet. Only stages 1 and 3 are parallelized. Speedup in Stage 1 = 80 and Stage 3 = 9.2.
    BibTeX:
    @inproceedings{Tan2005,
      author = {Guangming Tan and Shengzhong Feng and Ninghui Sun},
      title = {Parallel Multiple Sequences Alignment in SMP Cluster},
      booktitle = {Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05)},
      year = {2005},
      doi = {http://dx.doi.org/10.1109/HPCASIA.2005.70}
    }
    
    Tan, G., Peng, L., Feng, S. & Sun, N. Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation 2006 Euro-Par 2006 Parallel Processing, Vol. LNCS 4128, pp. 1138-1147  inproceedings DOI  
    Abstract: Multiple sequence alignment program, ClustalW, is time consuming, however, commonly used to compare the protein sequences. ClustalW includes two main time consuming parts: pairwise alignment and progressive alignment. Due to the irregular computation based on tree in progressive alignment, available parallel programs can not achieve reasonable speedups for large scale number of sequences. In this paper, progressive alignment is reduced to tree accumulation problem. Load balancing is ignored in previous efficient parallel tree accumulations. We proposed a load balancing strategy for parallelizing tree accumulation in progressive alignment. The new parallel progressive alignment algorithm reducing to tree accumulation with load balancing reduced the overall running time greatly and achieved reasonable speedups.
    Review: A speedup of 18 was achieved with 32 processors when aligning 3998 protein sequences. The system is a SMP cluster of 2.8 GHz Xeon processors with 4 GB of memory that are connected with gigabit Ethernet. The program was written in C and uses MPI. Load balancing is used in the third stage for the parallel tree accumulation.
    BibTeX:
    @inproceedings{Tan2006,
      author = {Guangming Tan and Liu Peng and Shengzhong Feng and Ninghui Sun},
      title = {Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation},
      booktitle = {Euro-Par 2006 Parallel Processing},
      publisher = {Springer Berlin / Heidelberg},
      year = {2006},
      volume = {LNCS 4128},
      pages = {1138-1147},
      doi = {http://dx.doi.org/10.1007/11823285_120}
    }
    
    Taylor, W.R., Sælensminde, G. & Eidhammer, I. Multiple protein sequence alignment using double-dynamic programming 2000 Computers & Chemistry, Vol. 24(1), pp. 3-12  article URL 
    Abstract: A method of multiple sequence alignment is described based on the double dynamic programming (DDP) algorithm previously used for treating structural constraints encountered in structure comparison and threading. Following these applications, the inconsistencies that emerge when trying to combine pair-wise alignments into a multiple alignment are reconciled by summing all the, possibly inconsistent, paths (low-level alignments) into a matrix which is then used to provide a final (high-level) alignment. This process is applied to all sequence pairs and the pair-wise results combined in a simple multiple sequence alignment program. From this alignment, further constraints are selected to bias the low-level alignments in the DDP algorithm and the process iterated. The results, however, showed that this overall iteration was not needed and one-pass gave results at least as good as the `standard' progressive method of multiple sequence alignment. Further applications of the method are discussed.
    Review: A technique called double dynamic programming (DDP) is used. The all-to-all pairwise calculations were performed in parallel across a Linux cluster.
    BibTeX:
    @article{Taylor2000,
      author = {William~R. Taylor and Gisle Sælensminde and Ingvar Eidhammer},
      title = {Multiple protein sequence alignment using double-dynamic programming},
      journal = {Computers & Chemistry},
      year = {2000},
      volume = {24},
      number = {1},
      pages = {3-12},
      url = {http://linkinghub.elsevier.com/retrieve/pii/S0097-8485(00)80003-0}
    }
    
    Tommaso, P.D., Orobitg, M., Guirado, F., Cores, F., Espinosa, T. & Notredame, C. Cloud-Coffee: implementation of a parallel consistency-based multiple alignment algorithm in the T-Coffee package and its benchmarking on the Amazon Elastic-Cloud 2010 Bioinformatics, Vol. 26(15), pp. 1903-1904  article DOI  
    Abstract: Summary: We present the first parallel implementation of the T-Coffee consistency-based multiple aligner. We benchmark it on the Amazon Elastic Cloud (EC2) and show that the parallelization procedure is reasonably effective. We also conclude that for a web server with moderate usage (10K hits/month) the cloud provides a cost-effective alternative to in-house deployment.Availability: T-Coffee is a freeware open source package available from http://www.tcoffee.org/homepage.html Contact: cedric.notredame@crg.es
    BibTeX:
    @article{Tommaso2010,
      author = {Paolo Di~Tommaso and Miquel Orobitg and Fernando Guirado and Fernado Cores and Toni Espinosa and Cedric Notredame},
      title = {Cloud-Coffee: implementation of a parallel consistency-based multiple alignment algorithm in the T-Coffee package and its benchmarking on the Amazon Elastic-Cloud},
      journal = {Bioinformatics},
      year = {2010},
      volume = {26},
      number = {15},
      pages = {1903-1904},
      doi = {http://dx.doi.org/10.1093/bioinformatics/btq304}
    }
    
    Totoki, Y. Parallel PRRN: Multiple sequence alignment by the best-first iterative refinement strategy with tree-dependent partitioning   unpublished URL 
    Abstract: The program Parallel PRRN is an implementation of the best-first search iterative refinement strategy with tree-dependent partitioning for multiple sequence alignment [5]. The basic alignment algorithms [2-4] are common to those of the programs prrp/prrn (now unified to "ordinary" or "serial" prrn [8] ftp://ftp.genome.ad.jp/pub/db/hgc/software/saitama-cc (bad link), which adopt the randomized iterative refinement strategy [1]. These strategies perform a large number of pairwise group-to-group alignments to gradually improve overall weighted sum-of-pairs score, where the pair weights are introduced to correct for uneven representations of the sequences to be aligned [6]. These hill-climbing strategies do not guarantee to achieve true optimization, but are proven powerful to solve practical alignment problems. Parallel PRRN, as well as the serial counterpart, uses the doubly nested randomized iterative (DNR) method [7] to make alignment, phylogenetic tree and pair weights mutually consistent. In Parallel PRRN, individual alignment processes are distributed to a number of (presently 32) processors running in parallel, which greatly reduces the overall execution time. The strategy works most effectively for refinement of a crude alignment obtained by other more rapid methods, e.g. a progressive alignment method. The default settings now generate such a provisional alignment from a set of totally unaligned sequences given by the user.
    Review: PRRP/PRRN. See [Gotoh1996]. A note from http://www.cbrc.jp/~gotoh/softdata.html: "Parallel prrn is a service provided by Human Genome Center of the University of Tokyo. The inner loop of the doubly nested iterations is executed in parallel with 32-64 CPUs. This parallelization, made possible by Y. Totoki, ensures fast convergence, although the accuracy of the resulting multiple sequence alignments is slightly reduced on average." A note from http://www.hgc.ims.u-tokyo.ac.jp/english/software.html: "Parallel PRRN is a sensitive multiple sequence alignment program originally developed by O. Gotoh (CBRC, AIST) as prrp/prrn. This version was implemented by Y. Totoki (GSC, RIKEN) to run on a SGI machine in parallel. Its algorithm is the best-first search iterative refinement strategy with tree-dependent partitioning. According to an objective test by a third party, it is one of the most sensitive multiple alignment programs in the world (Thompson et al., 1999)."
    BibTeX:
    @unpublished{Totoki,
      author = {Yasushi Totoki},
      title = {Parallel PRRN: Multiple sequence alignment by the best-first iterative refinement strategy with tree-dependent partitioning},
      note = {web page},
      url = {http://align.genome.jp/prrn/prrn_help.html}
    }
    
    Trelles, O. On the parallelisation of bioinformatics applications 2001 Briefings in Bioinformatics, Vol. 2(2), pp. 181-194  article DOI  
    Abstract: This paper surveys the computational strategies followed to parallelise the most used software in the bioinformatics arena. The studied algorithms are computationally expensive and their computational patterns range from regular, such as database-searching applications, to very irregularly structured patterns (phylogenetic trees). Fine- and coarse-grained parallel strategies are discussed for these very diverse sets of applications. This overview outlines computational issues related to parallelism, physical machine models, parallel programming approaches and scheduling strategies for a broad range of computer architectures. In particular, it deals with shared, distributed and shared/distributed memory architectures.
    BibTeX:
    @article{Trelles2001,
      author = {Oswaldo Trelles},
      title = {On the parallelisation of bioinformatics applications},
      journal = {Briefings in Bioinformatics},
      year = {2001},
      volume = {2},
      number = {2},
      pages = {181-194},
      doi = {http://dx.doi.org/10.1093/bib/2.2.181}
    }
    
    Trystram, D. & Zola, J. Parallel Multiple Sequence Alignment with Decentralized Cache Support 2005 Euro-Par 2005 Parallel Processing, Vol. LNCS 3648, pp. 1217-1226  inproceedings DOI  
    Abstract: In this paper we present a new method for aligning large sets of biological sequences. The method performs a sequence alignment in parallel and uses a decentralized cache to store intermediate results. The method allows alignments to be recomputed efficiently when new sequences are added or when alignments of different precisions are requested. Our method can be used to solve important biological problems like the adaptive update of a complete evolution tree when new sequences are added (without recomputing the whole tree). To validate the method, some experiments were performed using up to 512 Small Subunit Ribosomal RNA sequences, which were analyzed with different levels of precision.
    BibTeX:
    @inproceedings{Trystram2005,
      author = {Denis Trystram and Jaroslaw Zola},
      title = {Parallel Multiple Sequence Alignment with Decentralized Cache Support},
      booktitle = {Euro-Par 2005 Parallel Processing},
      publisher = {Springer Berlin / Heidelberg},
      year = {2005},
      volume = {LNCS 3648},
      pages = {1217-1226},
      doi = {http://dx.doi.org/10.1007/11549468_133}
    }
    
    Ukiyama, N. & Imai, H. Parallel Multiple Alignments and Their Implementation on CM5 1993 Genome Informatics Workshop, Vol. 4, pp. 103-108  inproceedings  
    Abstract: This paper addresses several issues in parallel multiple alignments, and reports some preliminary computational results of their implementation on CM5. Use of parallelism in the diagonal direction is laid stress on, which is quite useful especially when aligning similar strings. Some connection with the parallel approximate string matching algorithm by Landau and Vishkin [1] is also touched upon.
    Review: Optimal alignment is performed on three sequences. The CM5 system has 32 nodes with a SPARC processor connected in a fat tree. The DP matrix calculations are performed in parallel along a diagonal wavefront with virtual SIMD processors. On a data set with 800 sequences a speedup of 7.5 is realized over a SUN ELC workstation. A speedup of 18 is projected when using a MIMD approach for the DP calculations.
    BibTeX:
    @inproceedings{Ukiyama1993,
      author = {Naoto Ukiyama and Hiroshi Imai},
      title = {Parallel Multiple Alignments and Their Implementation on CM5},
      booktitle = {Genome Informatics Workshop},
      publisher = {Universal Academy Press, Inc.},
      year = {1993},
      volume = {4},
      pages = {103-108}
    }
    
    Vandierendonck, H., Rul, S., Questier, M. & De~Bosschere, K. Experiences with Parallelizing a Bio-informatics Program on the Cell BE 2008 High Performance Embedded Architectures and Compilers, HiPEAC 2008, Vol. LNCS 4917, pp. 161-175  inproceedings DOI  
    Abstract: The Cell Broadband Engine Architecture is a new heterogeneous multi-core architecture targeted at compute-intensive workloads. The architecture of the Cell BE has several features that are unique in high-performance general-purpose processors, such as static instruction scheduling, extensive support for vectorization, scratch pad memories, explicit programming of DMAs, mailbox communication, multiple processor cores, etc. It is necessary to make explicit use of these features to obtain high performance. Yet, little work reports on how to apply them and how much each of them contributes to performance. This paper presents our experiences with programming the Cell BE architecture. Our test application is Clustal W, a bio-informatics program for multiple sequence alignment. We report on how we apply the unique features of the Cell BE to ClustalW and how important each is to obtain high performance. By making extensive use of vectorization and by parallelizing the application across all cores, we speedup the pairwise alignment phase of ClustalW with a factor of 51.2 over PPU (superscalar) execution. The progressive alignment phase is sped up by a factor of 5.7 over PPU execution, resulting in an overall speedup by 9.1.
    Review: Rating:**** ClustalW is accelerated on the Cell BE by a factor of 9.1 over the PPU. A Stage 1 speedup of 51.2 and a Stage 3 speedup of 5.7 are achieved. See [Vandierendonck2009] for a comparison with an Intel processor.
    BibTeX:
    @inproceedings{Vandierendonck2008,
      author = {Hans Vandierendonck and Sean Rul and Michiel Questier and Koen De~Bosschere},
      title = {Experiences with Parallelizing a Bio-informatics Program on the Cell BE},
      booktitle = {High Performance Embedded Architectures and Compilers, HiPEAC 2008},
      publisher = {Springer Berlin / Heidelberg},
      year = {2008},
      volume = {LNCS 4917},
      pages = {161-175},
      doi = {http://dx.doi.org/10.1007/978-3-540-77560-7_12}
    }
    
    Vandierendonck, H., Rul, S. & De~Bosschere, K. Accelerating Multiple Sequence Alignment with the Cell BE Processor 2010 The Computer Journal, Vol. 53(6), pp. 814-826  article DOI  
    Abstract: The Cell Broadband Engine (BE) Architecture is a new heterogeneous multi-core architecture targeted at compute-intensive workloads. The architecture of the Cell BE has several features that are unique in high-performance general-purpose processors, most notably the extensive support for vectorization, scratch pad memories and explicit programming of direct memory accesses (DMAs) and mailbox communication. While these features strongly increase programming complexity, it is generally claimed that significant speedups can be obtained by using Cell BE processors. This paper presents our experiences with using the Cell BE architecture to accelerate Clustal W, a bio-informatics program for multiple sequence alignment. We report on how we apply the unique features of the Cell BE to Clustal W and how important each is in obtaining high performance. By making extensive use of vectorization and by parallelizing the application across all cores, we demonstrate a speedup of 24.4 times when using 16 synergistic processor units on a QS21 Cell Blade compared to single-thread execution on the power processing unit. As the Cell BE exploits a large number of slim cores, our highly optimized implementation is just 3.8 times faster than a 3-thread version running on an Intel Core2 Duo, as the latter processor exploits a small number of fat cores.
    Review: Rating:**** ClustalW is accelerated by a factor of 8 when compared with a 2.13 GHz Intel Core2 Duo processor running a single thread. Stages 1 and 3 only were parallelized on two Cell BEs by vectorizing DP matrix calculations and scheduling independent tasks across the 16 available synergistic processing elements. Core2: 32+0+26=58 seconds, Cell_original: 74+4+92=170 seconds, Cell_opt: 0.8+0+6.7=6.97 seconds, speedup_cell: 91.9, ?, 13.6=24.4, speedup_core2: 40, ?, 4=8.
    BibTeX:
    @article{Vandierendonck2010,
      author = {Hans Vandierendonck and Sean Rul and Koen De~Bosschere},
      title = {Accelerating Multiple Sequence Alignment with the Cell BE Processor},
      journal = {The Computer Journal},
      year = {2010},
      volume = {53},
      number = {6},
      pages = {814-826},
      doi = {http://dx.doi.org/10.1093/comjnl/bxp086}
    }
    
    Wang, C. & Lefkowitz, E.J. Genomic multiple sequence alignments: refinement using a genetic algorithm 2005 BMC Bioinformatics, Vol. 6(200), pp. 8  article DOI  
    Abstract: Background: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics -- the practice of comparing genomic sequences from different species -- plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only slightly; but significantly, this occurred at the same time that the overall alignment length decreased -- through the removal of gaps - by approximately 200 gapped regions representing roughly 1,300 gaps. Conclusion: We have implemented a genetic algorithm in parallel mode to optimize multiple genomic sequence alignments initially generated by various alignment tools. Benchmarking experiments showed that the refinement algorithm improved genomic sequence alignments within a reasonable period of time.
    Review: GenAlignRefine.
    BibTeX:
    @article{Wang.Chunlin2005,
      author = {Chunlin Wang and Elliot~J Lefkowitz},
      title = {Genomic multiple sequence alignments: refinement using a genetic algorithm},
      journal = {BMC Bioinformatics},
      year = {2005},
      volume = {6},
      number = {200},
      pages = {8},
      doi = {http://dx.doi.org/10.1186/1471-2105-6-200}
    }
    
    Wirawan, A., Schmidt, B. & Kwoh, C.K. Pairwise Distance Matrix Computation for Multiple Sequence Alignment on the Cell Broadband Engine 2009 Computational Science -- ICCS 2009, Vol. LNCS 5544, pp. 954-963  inproceedings DOI  
    Abstract: Multiple sequence alignment is an important tool in bioinformatics. Although efficient heuristic algorithms exist for this problem, the exponential growth of biological data demands an even higher throughput. The recent emergence of accelerator technologies has made it possible to achieve a highly improved execution time for many bioinformatics applications compared to general-purpose platforms. In this paper, we demonstrate how the PlayStation3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the distance matrix computation utilized in multiple sequence alignment algorithms.
    Review: Rating:**** ClustalW is accelerated on a Playstation3. A peak speedup of 108 was achieved on the first stage when compared with a 3.0 GHz Pentium 4. Overall, a speedup of 13.74 was observed on 1000 sequences with an average length of 446. The overall speedup was not reported in the paper but if the Pentium is used for Stages 2 and 3, the overall speedup can be calculated with additional information from [Oliver2005b]. 4711.6/(40.82+16+286)=13.74 A recurrence relation is proposed for the Cell (which is the same used on a FPGA and a GPU) to assist in calculating the number of identical characters (NIDs).
    BibTeX:
    @inproceedings{Wirawan2009,
      author = {Adrianto Wirawan and Bertil Schmidt and Chee~Keong Kwoh},
      title = {Pairwise Distance Matrix Computation for Multiple Sequence Alignment on the Cell Broadband Engine},
      booktitle = {Computational Science -- ICCS 2009},
      publisher = {Springer-Verlag Berlin Heidelberg},
      year = {2009},
      volume = {LNCS 5544},
      pages = {954-963},
      doi = {http://dx.doi.org/10.1007/978-3-642-01970-8_96}
    }
    
    Wood, E.K. ClustalW in Bioinformatics: Approaches and Optimization 2009   techreport  
    Abstract: Multiple sequence alignment is one of the most important bioinformatics tasks today. ClustalW is currently the most widely accepted computer program for multiple sequence alignment, and its performance and speed is constantly being improved upon by computer scientists. This paper will address the task of speeding up ClustalW using many methods, namely the Cell Broadband Engine as contained on PlayStation 3 consoles.
    Review: Rating:*** The paper has a good literature review of Clustal and its parallel variants. No results for the Cell BE are given but Stage 1 acceleration is proposed.
    BibTeX:
    @techreport{Wood2009,
      author = {Emily~K. Wood},
      title = {ClustalW in Bioinformatics: Approaches and Optimization},
      year = {2009}
    }
    
    Yamaguchi, Y., Maruyama, T. & Konagaya, A. Three-Dimensional Dynamic Programming for Homology Search 2004 Field Programmable Logic and Application, FPL 2004, Vol. LNCS 3203, pp. 505-515  inproceedings DOI  
    Abstract: Alignment problems in computational biology have been focused recently because of the rapid growth of sequence databases. Many systems for alignment have been proposed to date, but most of them are designed for two-dimensional alignment (alignment between two sequences), because huge amount of memory and very long computational time are required by alignment among three or more sequences. In this paper, we describe a compact system with an off-the-shelf FPGA board and a host computer for three-dimensional alignment using Dynamic Programming. Through our approach, high performance are attained by ``two phase search'' with reconfigurations of an FPGA and co-processing the FPGA and software. Furthermore, in order to achieve higher parallelism in the FPGA, we use a payoff matrix for matching elements in sequences and the matrix is divided into sub-matrices which are minimized. In comparison to a single Intel Pentium4 2.53GHz processor, our system with a single XC2V6000 enables more than 250-fold speedup.
    Review: Rating:*** The system only handles 3 sequences and does an optimal alignment with DP. An alignment with a sequence length up to 1024 is demonstrated. Compared with a 2.53 GHz Pentium 4, a speedup over 200 is achieved with 64 units (PEs) operating at 66 MHz on a XC2V6000 (ADM-XRC-II by Alpha Data). A speedup of 327 is achieved on 3 sequences of length 1024 in the ``first phase'' (without traceback). A short-sighted affine gap model is implemented. Only one variable per DP matrix cell is used as opposed to the three needed for true affine gaps. At each cell calculation, a gap flag from the prior cell determines if the gap open or extension cost should be used. Significant bottlenecks are encountered in saving the traceback pointers to host memory. See [Masuno2007b] for a 4 and 5 sequence extension of this work.
    BibTeX:
    @inproceedings{Yamaguchi2004,
      author = {Yoshiki Yamaguchi and Tsutomu Maruyama and Akihiko Konagaya},
      title = {Three-Dimensional Dynamic Programming for Homology Search},
      booktitle = {Field Programmable Logic and Application, FPL 2004},
      publisher = {Springer Berlin / Heidelberg},
      year = {2004},
      volume = {LNCS 3203},
      pages = {505-515},
      doi = {http://dx.doi.org/10.1007/978-3-540-30117-2_52}
    }
    
    Yap, T.K., Munson, P.J., Frieder, O. & Martino, R.L. Parallel Multiple Sequence Alignment using Speculative Computation 1995 Proceedings of the 1995 International Conference on Parallel Processing, Vol. 3, pp. 60-67  inproceedings  
    Abstract: Many different methods have been presented for aligning multiple biological sequences. These methods can be classified into three categories: rigorous, tree-based, and iterative. The rigorous method, which always generates the optimal alignment, requires memory space and computation time proportional to the product of the sequence lengths. Even for a modest number of sequences, this method becomes impractical. As a result, the other two methods were introduced. The iterative methods were shown to generate better alignments than the treebased methods. However, these methods require as much as 100 times longer computation time. A number of days may be required to align a large number of sequences (e.g., over 100 sequences) sequentially. We present a parallel speculative computational method which reduces the computation time of the iterative methods from days to minutes. To evaluate our parallel method, we implemented a speculative computation version of the iterative improvement method of Berger and Munson on an Intel iPSC/860 parallel computer. The empirical results demonstrate that our parallel method obtained a significant speed up in comparison to the sequential method.
    Review: See [Ishikawa1992] and [Araki1993] and [Yap1995] and [Martino1997] and [Yap1998] Section 4. Speculative computation is applied to the Berger-Munson algorithm.
    BibTeX:
    @inproceedings{Yap1995,
      author = {Tieng~K. Yap and Peter~J. Munson and Ophir Frieder and Robert~L. Martino},
      title = {Parallel Multiple Sequence Alignment using Speculative Computation},
      booktitle = {Proceedings of the 1995 International Conference on Parallel Processing},
      year = {1995},
      volume = {3},
      pages = {60-67}
    }
    
    Yap, T.K., Frieder, O. & Martino, R.L. Parallel Computation in Biological Sequence Analysis 1998 IEEE Transactions on Parallel and Distributed Systems, Vol. 9(3), pp. 283-294  article DOI  
    Abstract: A massive volume of biological sequence data is available in over 36 different databases worldwide, including the sequence data generated by the Human Genome project. These databases, which also contain biological and bibliographical information, are growing at an exponential rate. Consequently, the computational demands needed to explore and analyze the data contained in these databases is quickly becoming a great concern. To meet these demands, we must use high performance computing systems, such as parallel computers and distributed networks of workstations. We present two parallel computational methods for analyzing these biological sequences. The first method is used to retrieve sequences that are homologous to a query sequence. The biological information associated with the homologous sequences found in the database may provide important clues to the structure and function of the query sequence. The second method, which helps in the prediction of the function, structure, and evolutionary history of biological sequences, is used to align a number of homologous sequences with each other. These two parallel computational methods were implemented and evaluated on an Intel IPSC/860 parallel computer. The resulting performance demonstrates that parallel computational methods can significantly reduce the computational time needed to analyze the sequences contained in large databases.
    Review: See [Ishikawa1992] and [Araki1993] and [Yap1995] and [Martino1997] and [Yap1998] Section 4. The method uses speculative computation which is similar to simulated annealing. Speculative computation is applied to parallelize the Berger-Munson algorithm. A draft alignment created manually or from Clustal is iteratively refined.
    BibTeX:
    @article{Yap1998,
      author = {Tieng~K. Yap and Ophir Frieder and Robert~L. Martino},
      title = {Parallel Computation in Biological Sequence Analysis},
      journal = {IEEE Transactions on Parallel and Distributed Systems},
      year = {1998},
      volume = {9},
      number = {3},
      pages = {283-294},
      doi = {http://dx.doi.org/10.1109/71.674320}
    }
    
    Zola, J., Trystram, D., Tchernykh, A. & Brizuela, C. Parallel multiple sequence alignment with local phylogeny search by simulated annealing 2006 Proceedings of the 20th International Parallel and Distributed Processing Symposium, IPDPS 2006, pp. 8  inproceedings DOI  
    Abstract: The problem of multiple sequence alignment is one of the most important problems in computational biology. In this paper we present a new method that simultaneously performs multiple sequence alignment and phylogenetic tree inference for large input data sets. We describe a parallel implementation of our method that utilises simulated annealing metaheuristic to find locally optimal phylogenetic trees in reasonable time. To validate the method, we perform a set of experiments with synthetic as well as real-life data.
    Review: PT-SA. The distance-matrix calculations of the first stage are distributed independently to worker nodes. After neighborhoods are found, each one is distributed to a worker node to do an alignment and find a phylogenetic tree. For smaller data sets, searching for the best tree is done in parallel with simulated annealing. The annealing algorithm generates candidate trees. Each candidate requires MSA and subsequent phylogenetic analysis (parsimony). Neighborhoods are then clustered by the master node to give the overall tree and alignment. PT-SA is more than 10 times slower than ClustalW-MPI, but PT-SA has better scores.
    BibTeX:
    @inproceedings{Zola2006,
      author = {Jaroslaw Zola and Denis Trystram and Andrei Tchernykh and Carlos Brizuela},
      title = {Parallel multiple sequence alignment with local phylogeny search by simulated annealing},
      booktitle = {Proceedings of the 20th International Parallel and Distributed Processing Symposium, IPDPS 2006},
      year = {2006},
      pages = {8},
      doi = {http://dx.doi.org/10.1109/IPDPS.2006.1639536}
    }
    
    Zola, J., Yang, X., Rospondek, A. & Aluru, S. Parallel T-Coffee: A Parallel Multiple Sequence Aligner 2007 Proceedings of the ISCA 20th International Conference on Parallel and Distributed Computing Systems, pp. 248-253  inproceedings URL 
    Abstract: In this paper we present a parallel implementation of T-Coffee --- a widely used multiple sequence alignment package. Our software supports a majority of options provided by the sequential program, including the 3D-Coffee mode, and uses a message passing paradigm to distribute computations and memory. The main stages of T-Coffee, that is library generation and progressive alignment, have been parallelized and can be executed on distributed memory machines. Using our parallel software we report alignments of data sets consisting of hundreds of protein sequences, which is far beyond the capability of the sequential T-Coffee program.
    Review: Parallel T-COFFEE. The library generation and progressive alignment stages are parallelized. The library generation stage sees a near linear speedup up to 80 CPUs. The progressive alignment stage does not see any speedup above 16 CPUs.
    BibTeX:
    @inproceedings{Zola2007,
      author = {Jaroslaw Zola and Xiao Yang and Adrian Rospondek and Srinivas Aluru},
      title = {Parallel T-Coffee: A Parallel Multiple Sequence Aligner},
      booktitle = {Proceedings of the ISCA 20th International Conference on Parallel and Distributed Computing Systems},
      year = {2007},
      pages = {248-253},
      url = {http://www.ece.iastate.edu/~zola/papers/zola_pdcs2007.pdf}
    }
    

    Created by JabRef on 15/03/2011.