Some Molecular Biology Scripts


All should be considered beta versions, and they may require a bit of tweaking to work.
Some take command-line arguments; for others you have to edit the script itself.
You can e-mail me (below) with questions, problems, or suggestions. For additional scripts, check our public code repo on Bitbucket...

NAME                 PURPOSE
[encodename.pl]      Convert a FASTA file with long names to PHYLIP with short names. Useful for PhyML and RAxML. Can also concatenate sequences or convert some file formats.
[decodename.pl]      Convert short names generated by encodename.pl back into long names, for example in a tree file. Can also edit the lookup table to clean up or add annotations to tree names.
[protcalc.pl]        From a FASTA file of amino acid sequences, print a list of calculated values including molecular weight, charge, and percent composition. Can also search proteins for short motifs.
[translateone.py]    Read a FASTA file, do a 6-frame translation, and print the best protein sequences as FASTA.
[mybio.py]           Mini library required for use with translatedna, seqlite, and filterseqs.
[genbanknames.py]    Retrieve GenBank records from accession numbers.
[blastplustable.py]  Perform BLAST searches on a FASTA file and save an abbreviated result table (requires Biopython and a local BLAST+ installation).
[lucidconvert.py]    Convert Lucid Builder CSV exported files to a NEXUS-formatted table.
[seqlite_nohtml.py]  From a sequence alignment, return only the variable sites and sequences (requires mybio.py, above).
[sizeonly.py]        From a FASTA file, return only the size of each sequence, for making a histogram, etc.
[gapmap.py]          Given an aligned amino acid file and a corresponding unaligned DNA file, insert gaps into the DNA file to create an alignment.
[smartcat.sh]        Join multiple files together as one, using *.fta formulation.
[findseqname.py]     Sort through a FASTA file, keeping or rejecting sequences with names that contain a certain string (requires mybio.py, above).
[findmotif.py]       Sort through a FASTA file, keeping or rejecting sequences that contain a certain subsequence. The search can be specified as a regexp: "ATG$" (requires mybio.py, above).
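
The gap-insertion idea behind gapmap.py can be sketched in a few lines. This is only an illustration of the general approach, not the code in the script itself: the function name gap_dna and its arguments are made up for this example, and it assumes one protein/DNA pair at a time, "-" as the gap character, and DNA that starts in frame with the first amino acid.

```python
def gap_dna(aligned_protein, dna, gap="-"):
    """Thread an unaligned DNA sequence onto an aligned protein sequence.

    Hypothetical helper for illustration only (not taken from gapmap.py).
    Each gap column in the protein becomes three gap characters in the DNA;
    each residue consumes the next codon from the DNA.
    """
    n_residues = sum(1 for residue in aligned_protein if residue != gap)
    if len(dna) < 3 * n_residues:
        raise ValueError("DNA is too short for the protein sequence")

    pieces = []
    pos = 0  # index of the next unused base in the DNA
    for residue in aligned_protein:
        if residue == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(dna[pos:pos + 3])
            pos += 3
    return "".join(pieces)


# Aligned protein M-KV with coding DNA ATG AAA GTT
print(gap_dna("M-KV", "ATGAAAGTT"))  # prints ATG---AAAGTT
```

Any bases left over after the last residue (a trailing stop codon, for instance) are simply ignored in this sketch.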

Return to Steve Haddock's Home Page     E-mail:     Last modified: Apr. 17, 2013