(bio)-informatics, data processing and visualization

Tuesday, October 26, 2010

NCBI UniVec Illumina Adaptors and Primers



>gnl|uv|NGB00361.1:1-92 Illumina PCR Primer
CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTAATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>gnl|uv|NGB00361.1:1-92-rev-comp 92 nt
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00362.1:1-61 Illumina Paired End PCR Primer 2.0
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>gnl|uv|NGB00362.1:1-61-rev-comp 61 nt
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00376.1:1-44 Illumina Gex PCR Primer 2
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA
>gnl|uv|NGB00376.1:1-44-rev-comp 44 nt
TCGGACTGTAGAACTCTGAACCTGTCGGTGGTCGCCGTATCATT

>gnl|uv|NGB00364.1:1-43 Illumina Multiplexing PCR Primer Index 1
CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTC
>gnl|uv|NGB00364.1:1-43-rev-comp 43 nt
GAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00365.1:1-43 Illumina Multiplexing PCR Primer Index 2
CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTC
>gnl|uv|NGB00365.1:1-43-rev-comp 43 nt
GAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00366.1:1-43 Illumina Multiplexing PCR Primer Index 3
CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTC
>gnl|uv|NGB00366.1:1-43-rev-comp 43 nt
GAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00367.1:1-43 Illumina Multiplexing PCR Primer Index 4
CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTC
>gnl|uv|NGB00367.1:1-43-rev-comp 43 nt
GAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00368.1:1-43 Illumina Multiplexing PCR Primer Index 5
CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTC
>gnl|uv|NGB00368.1:1-43-rev-comp 43 nt
GAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00369.1:1-43 Illumina Multiplexing PCR Primer Index 6
CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTC
>gnl|uv|NGB00369.1:1-43-rev-comp 43 nt
GAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00370.1:1-43 Illumina Multiplexing PCR Primer Index 7
CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTC
>gnl|uv|NGB00370.1:1-43-rev-comp 43 nt
GAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00371.1:1-43 Illumina Multiplexing PCR Primer Index 8
CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTC
>gnl|uv|NGB00371.1:1-43-rev-comp 43 nt
GAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00372.1:1-43 Illumina Multiplexing PCR Primer Index 9
CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTC
>gnl|uv|NGB00372.1:1-43-rev-comp 43 nt
GAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00373.1:1-43 Illumina Multiplexing PCR Primer Index 10
CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTTC
>gnl|uv|NGB00373.1:1-43-rev-comp 43 nt
GAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00374.1:1-43 Illumina Multiplexing PCR Primer Index 11
CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTTC
>gnl|uv|NGB00374.1:1-43-rev-comp 43 nt
GAACTCCAGTCACGGCTACATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00375.1:1-43 Illumina Multiplexing PCR Primer Index 12
CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTC
>gnl|uv|NGB00375.1:1-43-rev-comp 43 nt
GAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG

>gnl|uv|NGB00363.1:1-34 Illumina Multiplexing PCR Primer 2.0
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>gnl|uv|NGB00363.1:1-34-rev-comp 34 nt
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

>gnl|uv|NGB00377.1:1-32 Illumina DpnII Gex Sequencing Primer
CGACAGGTTCAGAGTTCTACAGTCCGACGATC
>gnl|uv|NGB00377.1:1-32-rev-comp 32 nt
GATCGTCGGACTGTAGAACTCTGAACCTGTCG

>gnl|uv|NGB00378.1:1-32 Illumina NlaIII Gex Sequencing Primer
CCGACAGGTTCAGAGTTCTACAGTCCGACATG
>gnl|uv|NGB00378.1:1-32-rev-comp 32 nt
CATGTCGGACTGTAGAACTCTGAACCTGTCGG

>gnl|uv|NGB00380.1:1-26 Illumina Small RNA 3' Adapter
AATCTCGTATGCCGTCTTCTGCTTGC
>gnl|uv|NGB00380.1:1-26-rev-comp 26 nt
GCAAGCAGAAGACGGCATACGAGATT

>gnl|uv|NGB00379.1:1-23 Illumina 3' RNA Adapter
TCGTATGCCGTCTTCTGCTTGTT
>gnl|uv|NGB00379.1:1-23-rev-comp 23 nt
AACAAGCAGAAGACGGCATACGA
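The "-rev-comp" records above are the reverse complements of the UniVec sequences. Here is a minimal Python sketch of how such records can be generated. The function names are hypothetical, and the header suffix only copies the style of this listing; it is not an NCBI format.

```python
# Sketch: build "-rev-comp" records in the style of the listing above.
# Function names are illustrative; the header suffix mimics this post, not NCBI.

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rev_comp_record(seq_id: str, seq: str) -> str:
    """Format a reverse-complement FASTA record, e.g. for 'gnl|uv|NGB00379.1:1-23'."""
    rc = reverse_complement(seq)
    return f">{seq_id}-rev-comp {len(rc)} nt\n{rc}"


if __name__ == "__main__":
    # Prints the NGB00379.1 reverse-complement record exactly as listed above.
    print(rev_comp_record("gnl|uv|NGB00379.1:1-23", "TCGTATGCCGTCTTCTGCTTGTT"))
```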

Tuesday, September 28, 2010

Illumina adaptor trimming


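### NGB00362.1:1-61 Illumina Paired End PCR Primer 2.0
# adapter start anywhere in the read: delete it and everything after it
perl -p -i -e 's/AGATCGGAAGAGCGGT.*//' z-trim-test.trim
# adapter start truncated at the end of the read
perl -p -i -e 's/AGATCGGAAGAGCGG$//' z-trim-test.trim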

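### NGB00361.1:1-92 Illumina PCR Primer
# adapter start anywhere in the read: delete it and everything after it
perl -p -i -e 's/AGATCGGAAGAGCGTC.*//' z-trim-test.trim
# adapter start truncated at the end of the read
perl -p -i -e 's/AGATCGGAAGAGCGT$//' z-trim-test.trim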
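The two adapter-specific patterns differ only from the 15th base on (...CGG for Paired End PCR Primer 2.0, ...CGT for PCR Primer). The sketch below checks the shared start using the reverse complements of NGB00361.1 and NGB00362.1 from the UniVec listing. It relies on the standard-library `os.path.commonprefix`, which compares strings character by character and does not interpret them as paths.

```python
import os

# Reverse complements of NGB00361.1 (Illumina PCR Primer) and
# NGB00362.1 (Illumina Paired End PCR Primer 2.0), copied from the UniVec listing.
PCR_PRIMER_RC = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"
PE_PCR_PRIMER_2_RC = "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG"

# Longest start shared by both adapters.
shared = os.path.commonprefix([PCR_PRIMER_RC, PE_PCR_PRIMER_2_RC])
print(shared, len(shared))  # AGATCGGAAGAGCG 14
```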

### Common start of Illumina PCR Primer and Paired End PCR Primer 2.0, truncated at the end of the read (14 down to 8 nt)
perl -p -i -e 's/AGATCGGAAGAGCG$//' z-trim-test.trim
perl -p -i -e 's/AGATCGGAAGAGC$//' z-trim-test.trim
perl -p -i -e 's/AGATCGGAAGAG$//' z-trim-test.trim
perl -p -i -e 's/AGATCGGAAGA$//' z-trim-test.trim
perl -p -i -e 's/AGATCGGAAG$//' z-trim-test.trim
perl -p -i -e 's/AGATCGGAA$//' z-trim-test.trim
perl -p -i -e 's/AGATCGGA$//' z-trim-test.trim

Removing homopolymer runs (8 or more identical bases) at the start and at the end of each read:

perl -p -i -e 's/^A{8,}//' z-trim-test.trim
perl -p -i -e 's/^T{8,}//' z-trim-test.trim
perl -p -i -e 's/^C{8,}//' z-trim-test.trim
perl -p -i -e 's/^G{8,}//' z-trim-test.trim

perl -p -i -e 's/A{8,}$//' z-trim-test.trim
perl -p -i -e 's/T{8,}$//' z-trim-test.trim
perl -p -i -e 's/G{8,}$//' z-trim-test.trim
perl -p -i -e 's/C{8,}$//' z-trim-test.trim
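Here is a similar check in Python. This sketch uses a backreference so one pattern covers all four bases: `([ACGT])\1{7,}` matches a base followed by at least seven more copies of the same base. It removes at most one run at each end. The separate Perl passes can also strip a second run exposed by an earlier pass, for example a T run directly after a leading A run, depending on the order of the commands.

```python
import re

# One base followed by at least 7 more copies of it, anchored at the start or the end.
LEADING_RUN = re.compile(r"^([ACGT])\1{7,}")
TRAILING_RUN = re.compile(r"([ACGT])\1{7,}$")


def trim_homopolymers(read: str) -> str:
    """Strip one homopolymer run of 8+ bases from each end of a read (no trailing newline)."""
    read = LEADING_RUN.sub("", read)
    return TRAILING_RUN.sub("", read)


if __name__ == "__main__":
    print(trim_homopolymers("TTTTTTTTTTACGTCCCCCCCC"))
```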

Friday, July 30, 2010

virtual splicing

Regular expressions to remove runs of lowercase characters enclosed by uppercase letters (virtual splicing), for example:
perl -p -i.1 -e 's/(?<=[A-Z])[a-z]*(?=[A-Z])//g' example.txt
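or, for a FASTA file, leaving the header lines (starting with ">") untouched:
perl -p -i.1 -e 's/(?<=[A-Z])[a-z]*(?=[A-Z])//g unless /^>/' example.txt
(-i.1 edits example.txt in place and keeps the original as example.txt.1)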

Either command will transform the string
atgcATGCcgtaACGTtgcaCGTAcgta
to
atgcATGCACGTCGTAcgta

(solution suggested by Leah McHale https://pro.osu.edu/profiles/mchale.21/)
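The same lookbehind/lookahead pattern also works in Python's `re` module. Below is a sketch with hypothetical function names that, like the second one-liner, leaves FASTA header lines alone. It uses `[a-z]+` instead of `[a-z]*`. The result is the same, because replacing an empty match with nothing changes nothing.

```python
import re

# Lowercase run with an uppercase letter directly before and directly after it.
ENCLOSED_LOWER = re.compile(r"(?<=[A-Z])[a-z]+(?=[A-Z])")


def virtual_splice(seq: str) -> str:
    """Remove lowercase stretches enclosed by uppercase letters."""
    return ENCLOSED_LOWER.sub("", seq)


def splice_fasta_lines(lines):
    """Yield lines with enclosed lowercase runs removed; '>' header lines pass through."""
    for line in lines:
        yield line if line.startswith(">") else virtual_splice(line)


if __name__ == "__main__":
    print(virtual_splice("atgcATGCcgtaACGTtgcaCGTAcgta"))  # atgcATGCACGTCGTAcgta
```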