(bio)-informatics, data processing and visualization

Monday, May 17, 2021

BLAST-N Plus run parameters and tab-delimited output



==================================================

makeblastdb -in Genome_DNA.Fasta -out Genome_DNA.bdb -dbtype nucl -input_type fasta -max_file_sz 2GB -hash_index -parse_seqids 

makeblastdb \
    -in Genome_DNA.Fasta \
    -out Genome_DNA.bdb \
    -dbtype nucl \
    -input_type fasta \
    -max_file_sz 2GB \
    -hash_index \
    -parse_seqids

==================================================

blastn -task blastn -query Query_Seqs.Fasta -db Genome_DNA.bdb -out Query_Seqs.vs.Genome_DNA.blastplus.out.m0 -outfmt 0 -evalue 1e-120 -dbsize 1000000 -dust no -word_size 24 -xdrop_ungap 50 -xdrop_gap 500 -xdrop_gap_final 1000 -max_target_seqs 240 -line_length 100 -num_threads 6

blastn \
    -task blastn \
    -query Query_Seqs.Fasta \
    -db Genome_DNA.bdb \
    -out Query_Seqs.vs.Genome_DNA.blastplus.out.m0 \
    -outfmt 0 \
    -evalue 1e-120 \
    -dbsize 1000000 \
    -dust no \
    -word_size 24 \
    -xdrop_ungap 50 \
    -xdrop_gap 500 \
    -xdrop_gap_final 1000 \
    -max_target_seqs 240 \
    -line_length 100 \
    -num_threads 6

==================================================

blastn -task blastn -query Query_Seqs.Fasta -db Genome_DNA.bdb -out Query_Seqs.vs.Genome_DNA.blastplus.out.m7 -outfmt '7 qseqid sseqid evalue pident score length nident mismatch gaps frames qstart qend sstart send qcovhsp qlen slen' -evalue 1e-120 -dbsize 1000000 -dust no -word_size 24 -xdrop_ungap 50 -xdrop_gap 500 -xdrop_gap_final 1000 -max_target_seqs 240 -num_threads 6 

blastn \
    -task blastn \
    -query Query_Seqs.Fasta \
    -db Genome_DNA.bdb \
    -out Query_Seqs.vs.Genome_DNA.blastplus.out.m7 \
    -outfmt '7 qseqid sseqid evalue pident score length nident mismatch gaps frames qstart qend sstart send qcovhsp qlen slen' \
    -evalue 1e-120 \
    -dbsize 1000000 \
    -dust no \
    -word_size 24 \
    -xdrop_ungap 50 \
    -xdrop_gap 500 \
    -xdrop_gap_final 1000 \
    -max_target_seqs 240 \
    -num_threads 6   

# Fields: 

query id - 1 [0] (qseqid)
subject id - 2 [1] (sseqid)
evalue - 3 [2] (evalue)
% identity - 4 [3] (pident)
score - 5 [4] (score)
alignment length - 6 [5] (length)
identical - 7 [6] (nident)
mismatches - 8 [7] (mismatch)
gaps - 9 [8] (gaps)
query/sbjct frames - 10 [9] (frames)
q. start - 11 [10] (qstart)
q. end - 12 [11] (qend)
s. start - 13 [12] (sstart)
s. end - 14 [13] (send)
% query coverage per hsp - 15 [14] (qcovhsp)
query length - 16 [15] (qlen)
subject length - 17 [16] (slen)

==================================================


Tuesday, May 4, 2021

BLAST-N of low quality long sequences

blastall -p blastn -V T -F F -e 1e-20 -y 50 -X 75 -Z 500 -b 240 -v 240 -d DATABASE -i INPUT -o OUTPUT

Where:

-y X  dropoff value for ungapped extensions in bits (0.0 invokes default behavior)

      blastn 20, megablast 10, all others 7 [Real]


-X X  dropoff value for gapped alignment (in bits) (zero invokes default behavior)

      blastn 30, megablast 20, tblastx 0, all others 15 [Integer]


-Z X  dropoff value for final gapped alignment in bits (0.0 invokes default behavior)

      blastn/megablast 100, tblastx 0, all others 25 [Integer]