gnumap

History

The development of the Genomic Next-generation Universal MAPper (GNUMAP) was performed at Brigham Young University's Computational Sciences Laboratory under the direction of Dr. Mark Clement and Dr. Quinn Snell from the Computer Science Department and Dr. Evan Johnson from the Statistics Department.

October 28, 2011: Version 3.0.2 Introduced the -k flag (to control the number of matches). Also prints the same fastq scores that were in the input file.
July 25, 2011: Version 2.9.0 Some stability fixes. Also included code for memory optimizations
June 3, 2011: sam2sgr Version 2.0 Uses threading via openmp
May 23, 2011: Version 2.6.0 Several stability updates included, in addition to significant changes in bisulfite mapping. Backward compatibility is included so scripts will perform the same way.
February 22, 2011: Version 2.2.4 Introduced --print_all_sam flag in addition to XP and X0 flags
December 13, 2010: Fixed SAM MAPQ characters (30 is good match, 0 is very poor)
November 29, 2010: Ambiguity characters allowed in fasta reads
November 09, 2010: sam2consensus program added
November 06, 2010: Bisulfite .sgrex printing
October 28, 2010: Version 2.2.1 Fixed error with catastrophic cancellation in PHMM code. Algorithm is more stable with long sequences.
October 20, 2010: Version 2.2.0 Update to alignment matrices. Much greater accuracy on real data.
September 20, 2010: Update to SNP calling for larger read totals. Using ratio instead of pvalues for comparison
August 30, 2010: Updates to output and SNP calling. Included ratio requirement for diploid SNP calling.
August 03, 2010: Version 2.1.7 Changed what FAST mode does. Also bug fixes and time reductions
July 22, 2010: Version 2.1.6 Fixes to Bisulfite mapping output. Also added SAM to SGR converter, Version 1.0
July 13, 2010: Version 2.1.5 Fixed memory leak, added
May 19, 2010: Version 2.1.0 Disallowed program from searching for the same string multiple times. About 2x speedup.
May 13, 2010: Version 2.0.1 Fixed MPI bugs and SAM output
May 07, 2010: Version 2.0.0 Added MPI for both large memory machines (genome spread across nodes) and small memory machines (reads spread across nodes)
March 29, 2010: Version 1.5.75 Fixed a memory leak that was introduced in a previous version
February 23, 2010: Version 1.5.6 Sam output and a larger genome capability
February 17, 2010: Version 1.5.5 Added SAM output flag
February 03, 2010: Version 1.5.3 Minor changes to improve speed
January 11, 2010: Version 1.5 Fixed several bugs (including fastq sequence files and SNP output in the .sgrex output file)
December 29, 2009: Version 1.4.9 Fixed several bugs (including fasta sequence files). Now reads in only a section of the sequences at a time instead of all at once
December 14, 2009: Version 1.4.8 Fixed the number of sequences report
December 01, 2009: Version 1.4.7 Can read fasta sequences that are multi-lined
December 01, 2009: Version 1.4.5 and 1.4.6 Bug fixes
November 14, 2009: Version 1.4.0 Added functionality to read in both fastq and fasta files (in addition to prb and int files).
November 09, 2009: Version 1.3.3 Added examples/ directory
November 05, 2009: Version 1.3.2 Bug fixes. Added --snp param.
October 15, 2009: Version 1.3. Reduced memory footprint, added things to increase speed. Will also print out each base when needed for the .sgrex file
September 17, 2009: Version 1.2. Introduced flags for reading and writing of the genome (reduced hashing and storing the genome to just under 5 minutes). Also included a "sliding window" method to find a greater number of matches to the genome.
July 27, 2009: Version 1.02.1. Bug fixes and optimization. Introduction of the --fast flag.
02 July, 2009: Version 1.01. Additional performance-enhancing modifications. Also some bug fixes for reading in multiple genomes.
15 June, 2009: Version 1.0. Several modifications to improve performance.
05 May, 2009: Version 0.99_5. Added -0 flag to print output at every position for sequences of unequal length. Also included two other flags (-b and -d) to provide output for assorted methylation analyses.
01 May, 2009: Version 0.99_4. Fixed an error with reading multiple chromosomes. Also allowed for multiple chromosomes to be included in one file.
01 April, 2009: Version 0.99_3. Fixed an error with overflowing the size of a 32-bit unsigned int. Using 64-bit unsigned long instead. Must compile under 64-bit mode (included in Makefile).
10 February, 2009: Version 0.99_2 removed the SEQ_LENGTH variable so the sequences can have variable lengths. Also allowed an adaptor sequence to be supplied, removing any adaptor characters, especially for shorter sequences.
Version 0.99 (02 Feb release) removed the popt library, using built-in command line parsing methods.
Version 0.99 of gnumap was released on January 07, 2009, with optimized alignment speed.
On December 30, 2008, the algorithm was updated, allowing a command-line argument for the specification of a matrix identifying match and mismatch scores.
August 20, 2008: Version 0.96 was released, allowing the user to include the analysis for many sequence files.
On August 05, 2008, Version 0.95 was released, providing compatibility with both the _prb.txt and _int.txt files.
Version 0.9 (BETA) of gnumap was released on July 31, 2008 with full functionality. Multithreading support was included to improve performance. The datagen program, used to generate synthetic reads for duplicate analysis, was also included.
Version 0.5 of gnumap was released on June 2, 2008 with limited functionality. Synthetic benchmark datasets were also developed to measure the impact of duplicate reads on the accuracy of different read mapping programs.
At our research meeting on April 23, 2008, we developed the hashing algorithm. Most mapping programs hash the reads and then make one pass across the genome to map the reads. In order to account for duplicate reads, we decided that gnumap would need to hash the genome and then map each read into multiple locations.
On March 21, 2008, the architecture of gnumap was developed with a focus on statistical methods to account for duplicate reads.
January 11, 2008: Initial work on gnumap began when Dr. Evan Johnson and his statistics research group began meeting with researchers in the Computational Sciences Laboratory.