Running gnumap (quick start)
For help on how to run GNUMAP, just type ./bin/gnumap into a terminal and the usage
information will be displayed. A typical gnumap run requires several things.
For example, to run a test with the sequence file examples/example_sequences_prb.txt,
reporting only the locations with a local alignment score of 90% or better, using the
file examples/Cel_gen.fa as the genome, and writing the output to example.output, type:
./bin/gnumap -g examples/Cel_gen.fa -o example.output -a .9 -p -v 1 examples/example_sequences_prb.txt
(Note: This command can also be run by typing
- The -g option defines the genome.
- The -o option tells the program where to place the output. Two output files will be created: one with the alignment
report for each read and another in the .sgr format, usable with Integrated
Genome Browser (IGB) for convenient graphical display.
- The -a option defines the minimum alignment score that will be accepted for mapped reads.
- The -p option indicates that the score given in the -a option is to be interpreted as a percentage.
- The last parameter is the name of Illumina's *_int.txt or *_prb.txt
file to be used for the sequences. This is a tab- and space-delimited file
containing either the base intensities or the base quality scores, respectively, with
each line containing a separate read. In order to improve accuracy, the *_seq.txt
file is not used.
- Sample files are included so you can check that GNUMAP is running properly. To
run this sample set, type
or, for an example with SNP output,
For both examples, about 3,000 of the 10,000 sequences should map
to locations in the C. elegans genome.
- Following are some additional example files that can be used:
- The prb, fasta, and fastq files used for comparison in the Bioinformatics paper.
The spiked-in sequences (with original chromosome position) and the spiked
locations are also available.
- A Human Genome binary file (right click and select Save As).
This binary file was compiled on a 64-bit system with the following parameters:
- mer size: 13bp
- largest hash size: 100k
- bases skipped: 1
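The quick-start command described above can be assembled from named variables, which makes it easier to swap in a different genome or read file. The following is only a minimal sketch: the variable names and the build_gnumap_cmd function are illustrative and not part of GNUMAP, and the -v 1 flag is copied from the example command as-is. The script prints the command rather than running it, so it can be checked first.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble the quick-start gnumap command from variables.
# Only flags that appear in the quick start are used (-g, -o, -a, -p, -v).
# All variable and function names here are illustrative assumptions.

GNUMAP_BIN="./bin/gnumap"                    # path used in the quick start
GENOME="examples/Cel_gen.fa"                 # -g: genome file
OUTPUT="example.output"                      # -o: where to place the output
MIN_SCORE=".9"                               # -a: minimum alignment score
READS="examples/example_sequences_prb.txt"   # last parameter: *_int.txt or *_prb.txt

# Print the command line instead of executing it.
build_gnumap_cmd() {
    printf '%s -g %s -o %s -a %s -p -v 1 %s\n' \
        "$GNUMAP_BIN" "$GENOME" "$OUTPUT" "$MIN_SCORE" "$READS"
}

build_gnumap_cmd
```

The printed line can be pasted into a terminal once the paths have been checked.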
Running GNUMAP with MPI:
mpiexec -np N_MACH -machinefile MACH_FILE gnumap [options...]
N_MACH is the number of machines you are using and MACH_FILE is a file
listing the machines that are available to use. The option that specifies the number
of processors can also be included with these parameters.
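One way to keep the two mpiexec parameters consistent is to derive the machine count from the machine file itself. The sketch below assumes the machine file lists one host name per line (other layouts would need a different count). The host names and the count_machines helper are hypothetical, and the gnumap options are left as a placeholder. It only prints the resulting command.

```shell
#!/usr/bin/env bash
# Sketch: derive N_MACH from the machine file and print the mpiexec command.
# Assumes one host name per line; the hosts below are placeholders.

MACH_FILE="$(mktemp)"
printf '%s\n' node01 node02 node03 > "$MACH_FILE"   # hypothetical hosts

# Count non-empty lines so stray blank lines do not inflate the count.
count_machines() {
    grep -c -v '^[[:space:]]*$' "$1"
}

N_MACH="$(count_machines "$MACH_FILE")"

# Dry run: show the command rather than launching it.
echo "mpiexec -np $N_MACH -machinefile $MACH_FILE gnumap [options...]"
```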
For those that are using BYU's supercomputer (or another PBS supercomputer), here is an example submission script:
#PBS -N MPI_test
#PBS -l nodes=30:ppn=1:pmem=12gb,walltime=3:00:00
#PBS -q batch
#PBS -k oe
#PBS -m bea
#PBS -M email@example.com
PROGARGS="-g \"$(echo $GENOME | sed -e 's/ /,/g')\" -o $OUTPUT -a .9 -p -c 8 \"$(echo $SEQFILES | sed -e 's/ /,/g')\" -m 12 -j 10 -v 1"
cp $PBS_NODEFILE $MACH_FILE
echo "mpiexec -np $N_MACH -machinefile $MACH_FILE $PROG $PROGARGS"
mpiexec -np $N_MACH -machinefile $MACH_FILE $PROG $PROGARGS
$PBS_NODEFILE is a file that lists all the nodes your program is allowed to run on.
Alternatively, for a large genome would have the flag
on the end of the
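The submission script relies on variables such as $GENOME, $SEQFILES, $OUTPUT, $PROG, $N_MACH and $MACH_FILE being set elsewhere, and it turns the space-separated lists in $GENOME and $SEQFILES into comma-separated arguments with sed. The sketch below isolates that substitution. The file names and the join_with_commas helper are hypothetical, and file names that themselves contain spaces would not survive this approach.

```shell
#!/usr/bin/env bash
# Sketch of the list handling used in PROGARGS: a space-separated list of
# file names becomes one comma-separated argument.
# The file names are placeholders, not files shipped with GNUMAP.

SEQFILES="reads_1_prb.txt reads_2_prb.txt reads_3_prb.txt"   # hypothetical

# Same substitution as in the submission script: every space becomes a comma.
join_with_commas() {
    echo "$1" | sed -e 's/ /,/g'
}

SEQ_ARG="$(join_with_commas "$SEQFILES")"
echo "$SEQ_ARG"
```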
This page last modified Wednesday May 21, 2014