NCBI BLAST FTP Site
Tao Tao, Ph.D.
User Service
NCBI, NLM, NIH
 
1. Introduction
The NCBI BLAST FTP site (ftp.ncbi.nlm.nih.gov) provides the standalone blast, client-server blast (netblast), and wwwblast software packages for various platforms. It also provides commonly used blast databases in both preformatted and FASTA formats. Documents on the blast executables and other related subjects are also available from this site.

This file describes the subdirectories and files found on this FTP site. It also provides basic information on file content and on how the files should be used.

 
2. File list and content
This section lists and describes the files found on the BLAST FTP site. File content for each directory/subdirectory is described in a separate table.
 
    2.1 ftp.ncbi.nlm.nih.gov/blast/ general directory content
The blast FTP directory contains several subdirectories, each for a specific set of files.

Table 2.1 ftp.ncbi.nlm.nih.gov/blast/ general directory content
File/Dir Name   Content
blastftp.html   README on FTP site (this file)
db              databases in preformatted or FASTA form
demo            demonstration programs and documents from blast developers
documents       documents for the standalone blast, netblast, and wwwblast programs
executables     archives for binary distribution of blast programs
matrices        protein and nucleotide score matrices, only a subset of which is supported by blast
temp            temporary directory for miscellaneous files
 
 
    2.2 /blast/db/ directory content
Databases larger than two gigabytes (2 GB) are formatted in multiple volumes, which are named using the "database.##.tar.gz" convention. All relevant volumes are required. An alias file (.nal or .pal) is provided so that the database can be called by its alias name without the extension. For example, to call the est database, simply use the "-d est" option on the command line (without the quotes).
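
Because every volume is required, it can help to check a directory listing for gaps before running a search. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes that volumes are numbered consecutively starting at 00, which this document does not state.

```python
import re


def missing_volumes(file_names, database):
    """Return volume numbers that are absent from a listing for `database`.

    Assumption (not stated in this README): volumes are numbered
    consecutively from 00, e.g. est.00.tar.gz, est.01.tar.gz, ...
    A missing volume after the highest one seen cannot be detected.
    """
    pattern = re.compile(r"^" + re.escape(database) + r"\.(\d+)\.tar\.gz$")
    found = set()
    for name in file_names:
        match = pattern.match(name)
        if match:
            found.add(int(match.group(1)))
    if not found:
        return []
    return [n for n in range(max(found) + 1) if n not in found]
```

For a listing that contains est.00.tar.gz and est.02.tar.gz, the function reports volume 1 as missing. Mask archives such as est_human.tar.gz do not match the volume pattern and are ignored.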

Certain databases are subsets of a larger parent database. For those databases, mask files, rather than actual databases, are provided. A mask file needs its parent database to function properly, and the parent database should be generated on the same day as the mask file. For example, to use the preformatted swissprot database, swissprot.tar.gz, one will need to get the nr volumes (nr.*.tar.gz) with the same date stamp.
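
The dependency between mask files and their parent databases can be written down as a small lookup. The following Python sketch is illustrative only: the function name and both lookup tables are assumptions built from the entries of Table 2.2, not something shipped on the FTP site.

```python
# Mask databases and the parent each one needs, as listed in Table 2.2.
MASK_PARENT = {
    "est_human": "est",
    "est_mouse": "est",
    "est_others": "est",
    "pdbaa": "nr",
    "swissprot": "nr",
}

# Databases that Table 2.2 lists as multi-volume (name.*.tar.gz).
MULTI_VOLUME = {"est", "gss", "htgs", "nr", "nt", "other_genomic", "wgs"}


def archives_needed(database):
    """Return the archive name patterns to download for `database`."""
    def pattern(name):
        return name + (".*.tar.gz" if name in MULTI_VOLUME else ".tar.gz")

    needed = [pattern(database)]
    parent = MASK_PARENT.get(database)
    if parent is not None:
        # A mask file is useless without its parent database.
        needed.append(pattern(parent))
    return needed
```

With these tables, asking for swissprot yields swissprot.tar.gz plus nr.*.tar.gz. The sketch does not check date stamps; that comparison still has to be made against the FTP listing.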

To use a preformatted blast database file, first inflate the file using gzip (Unix, Linux), WinZip (Windows), or StuffIt Expander (Mac), then extract the component files from the resulting tar file using tar (Unix, Linux), WinZip (Windows), or StuffIt Expander (Mac). The resulting files are ready for BLAST.
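
Where none of the tools above is at hand, the same two steps can be done in one pass with Python's standard tarfile module. This is a minimal sketch, not an NCBI tool; the function name and the destination directory are placeholders.

```python
import tarfile
from pathlib import Path


def extract_database(archive_path, dest_dir):
    """Inflate and unpack a .tar.gz archive; return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse members that escape dest.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return names
```

For a multi-volume database, the function would be called once per volume with the same destination directory, so that all component files and the alias file end up side by side.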

Table 2.2 /blast/db/ directory content
File/Dir Name           Content
FASTA                   subdirectory with databases in FASTA format
blastdb.html            content list of the blast databases
est.*.tar.gz            volumes of the est database; all are needed to reconstitute the complete est database
est_human.tar.gz        human est database; a mask file that requires all volumes of est to work
est_mouse.tar.gz        mouse est database; a mask file that requires all volumes of est to work
est_others.tar.gz       non-human, non-mouse est database; a mask file that requires all volumes of est to work
gss.*.tar.gz            volumes of the genomic survey sequence (gss) database; all are needed to reconstitute the complete gss database
htgs.*.tar.gz           volumes of the htgs database; all are needed to reconstitute the complete htgs database
human_genomic.tar.gz    human chromosome database containing concatenated contigs with adjusted gaps represented by N's
nr.*.tar.gz             volumes of the non-redundant protein database; all are needed to reconstitute the complete nr database
nt.*.tar.gz             volumes of the nucleotide "nr" database; all are needed to reconstitute the complete nt database
other_genomic.*.tar.gz  volumes of the chromosome database for organisms other than human; all are needed to reconstitute the complete other_genomic database
pataa.tar.gz            patent protein database
patnt.tar.gz            patent nucleotide database
pdbaa.tar.gz            protein sequence database from pdb entries; a mask file that requires all nr.*.tar.gz to function
pdbnt.tar.gz            nucleotide sequence database from pdb entries. They are not the coding sequences for the corresponding protein structure entries!
sts.tar.gz              sequence tag site database
swissprot.tar.gz        swissprot sequence database, last major release; requires all nr.*.tar.gz to work properly
taxdb.tar.gz            taxonomy id database for use with the above files to retrieve taxonomic information for specific entries
wgs.*.tar.gz            volumes of the wgs assembly database; all volumes are needed to reconstitute the wgs database

        2.2.1 /blast/db/FASTA/ subdirectory content
The FASTA database files are now stored in this subdirectory. It also contains some additional databases that are not available via the NCBI BLAST pages. Due to file size issues, the full est database is not provided. One needs to get the three subsets and concatenate them to get the complete est database.
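
The concatenation of the three est subsets can be sketched in Python as follows. The function name and the output path are placeholders, and the sketch assumes that each subset ends with a newline so that the last sequence of one file does not run into the first header of the next.

```python
import gzip
import shutil

# Subset file names as listed for this subdirectory.
EST_SUBSETS = ["est_human.gz", "est_mouse.gz", "est_others.gz"]


def concatenate_fasta(gz_paths, output_path):
    """Decompress each gzip file in order and append it to output_path."""
    with open(output_path, "wb") as out:
        for path in gz_paths:
            with gzip.open(path, "rb") as part:
                # Stream in chunks; the est subsets are far too large
                # to hold in memory at once.
                shutil.copyfileobj(part, out)
```

Calling concatenate_fasta(EST_SUBSETS, "est") in the download directory would write an uncompressed FASTA file named est, which then still has to be formatted before use.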

These databases need to be formatted using the formatdb program found in the standalone blast executable package. The recommended command lines are:

formatdb -i input_db -p F -o T    for nucleotide
formatdb -i input_db -p T -o T    for protein
For additional information on formatdb, please see the formatdb.html at:
/blast/documents/formatdb.html
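
When many FASTA files have to be formatted, the recommended command lines can be generated from a script. The Python sketch below only builds the argument list; the function name is hypothetical, and the sketch uses just the three formatdb options shown above.

```python
def formatdb_command(input_db, is_protein):
    """Build the recommended formatdb argument list for one FASTA file."""
    return [
        "formatdb",
        "-i", input_db,                    # input FASTA file
        "-p", "T" if is_protein else "F",  # T for protein, F for nucleotide
        "-o", "T",                         # as in the recommended command lines
    ]


if __name__ == "__main__":
    print(" ".join(formatdb_command("input_db", is_protein=False)))
```

The returned list can be passed to subprocess.run(cmd, check=True), provided formatdb is installed and on the PATH.
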
Table 2.2.1 /blast/db/FASTA/ subdirectory content
File/Dir Name         Content
alu.a.gz              proteins translated from alu.n
alu.n.gz              alu repeat sequences
drosoph.aa.gz         Drosophila protein from genome annotation
drosoph.nt.gz         Drosophila genome
ecoli.aa.gz           E. coli K-12 proteins from genome annotation
ecoli.nt.gz           E. coli K-12 genomic contigs
est_human.gz          human subset of the est database
est_mouse.gz          mouse subset of the est database
est_others.gz         subset of est other than human or mouse entries
gss.gz                Genomic Survey Sequences (mostly BAC ends)
htgs.gz               High Throughput Genomic Sequences
human_genomic.gz      human chromosomes formed by concatenating genomic contig assemblies (NT_######) and adjusting the gaps with N’s
igSeqNt.gz            immunoglobulin nucleotide sequences
igSeqProt.gz          immunoglobulin protein sequences
mito.aa.gz            proteins from the annotated mitochondrial genomes
mito.nt.gz            mitochondrial genomes
month.aa.gz           protein sequences released or updated in the past 30 days
month.est_human.gz    human subset of EST released/updated in the past 30 days
month.est_mouse.gz    mouse subset of EST released/updated in the past 30 days
month.est_others.gz   non-human, non-mouse EST released/updated in the past 30 days
month.gss.gz          gss entries released/updated in the past 30 days
month.htgs.gz         htgs entries released/updated in the past 30 days
month.nt.gz           subset of nt released/updated in the past 30 days
nr.gz                 non-redundant protein sequence database
nt.gz                 nucleotide database from GenBank, excluding the batch divisions (htgs, est, gss, sts, pat) and wgs entries. Not non-redundant
other_genomic.gz      chromosome entries other than human
pataa.gz              patent protein sequence database
patnt.gz              patent nucleotide sequence database
pdbaa.gz              protein sequences for pdb entries
pdbnt.gz              nucleotide entries for pdb entries. They are NOT the coding sequences for the corresponding protein entries
sts.gz                Sequence Tag Sites database
swissprot.gz          swissprot database, last major release
vector.gz             vector sequences from the synthetic (syn) division of GenBank
wgs.gz                Whole Genome Shotgun sequence assembly
yeast.aa.gz           protein translations from yeast genome annotation
yeast.nt.gz           yeast genomic sequence

    2.3 File content for /blast/demo/ directory
This directory contains some technical presentations from the BLAST developers along with some demo tools or documentation relevant to BLAST.

Table 2.3 File content for /blast/demo/ directory
File/Dir Name           Content
README.blast_demo       readme for blast_demo package
README.first            readme for this directory
README.parse_blast_xml  readme for parse_blast_xml package
benchmark               package with sample database and query for gauging the performance of BLAST on different platforms
blast_demo.tar.gz       blast_demo package on blast db, blast object, and reformatting blast alignment from blastobj file
blast_programming.ppt   PowerPoint presentation on BLAST programming
ieee_blast.final.ppt    PowerPoint presentation (IEEE conference)
ieee_talk.pdf           above IEEE presentation in PDF format
old                     contains an older version of blast_demo.tar.gz
parse_blast_xml.tar.gz  demo package on parsing xml styled blast output
splitd.ppt              PowerPoint presentation on NCBI BLAST server’s splitd implementation
test_suite.tar.gz       test package

    2.4 File content for /blast/documents/ directory
This directory contains copies of the documentation on the different BLAST programs distributed from this FTP site under the /blast/executables/ directory. The file blast.txt also contains a detailed release history.

Table 2.4 File content for /blast/documents/ directory
File/Dir Name       Content
bl2seq.html         readme for bl2seq (standalone "Align Two Sequences")
blast-sc2004.pdf    poster describing NCBI splitd system
blast.html          general readme for blast setup
blastall.html       readme for blastall and blastpgp
blastclust.html     readme for blastclust
blastdb.html        readme for blast databases
blastftp.html       readme for blast ftp site (this document)
blastpgp.html       readme for blastpgp, standalone PSI-BLAST
developer           subdirectory with additional documentation:
                      blast_seqalign.txt: describes the seqalign function
                      readdb.txt: describes the readdb function
                      scoring.pdf: describes the scoring systems used by BLAST
                      urlapi.txt: a short introduction to the BLAST URL API
fastacmd.html       readme for fastacmd, a sequence retrieval tool
filter.html         readme on filter strings and their functions
formatdb.html       readme for formatdb, which converts fasta sequences into a blast database
formatrpsdb.html    readme for formatrpsdb, which converts scoremat into an rpsblast database
history.html        history of changes/bug fixes introduced to the blast package
impala.html         readme for impala
index.html          list of this group of files/subdirs
megablast.html      readme for megablast
netblast.html       readme for netblast (blastcl3)
rpsblast.html       readme for rpsblast
seedtop.html        readme for seedtop, a standalone pattern match tool
web_blast.pl        example perl script for doing URLAPI blast
xml                 subdirectory with .dtd and .mod field description files for blast xml output:
                      NCBI_BlastOutput.dtd: dtd file for blast xml output
                      NCBI_BlastOutput.mod: mod file for blast xml output
                      NCBI_Entity.mod: mod file for NCBI xml file
                      README.blxml: readme on blast xml output

    2.5 File content for /blast/executables/ directory
This directory contains several subdirectories, each for a specific subset of executable BLAST programs.

        2.5.1 File content for /blast/executables/LATEST/ subdirectory
This directory contains the latest official release of precompiled BLAST executable programs.

The archives can be divided into three groups. The archives whose names begin with "blast" are equivalent to one another: each contains the standalone command-line blast binary programs for a particular platform. Users need this package to set up BLAST locally. It also provides the tools necessary to prepare custom databases and to retrieve sequences from these prepared databases.

The archives whose names begin with "netblast" contain the blastcl3 program, which formulates a BLAST search locally and then forwards it to the NCBI BLAST server for processing. The search results returned by the NCBI BLAST server are saved to a user-specified file on the local disk. Users do not need to maintain local databases, but they cannot search custom databases locally.

The archives whose names begin with "wwwblast" contain web pages with embedded blast search forms similar to those of NCBI. They process BLAST requests submitted through the web, search against a local set of databases, and return the results to a browser window. wwwblast is now in sync with the NCBI toolkit and the two packages above. Installation requires an existing web server (Apache) setup.

Table 2.5.1 File content for /blast/executables/LATEST/ directory
File/Dir Name                           Content
blast-x.y.z-axp64-tru64.tar.gz          command line blast for COMPAQ/HP alpha machine (OSF 5.1 and above)
blast-x.y.z-ia32-freebsd.tar.gz         command line blast for PC running FreeBSD
blast-x.y.z-ia32-linux.tar.gz           command line blast for PC running Linux
blast-x.y.z-ia32-solaris.tar.gz         command line blast for PC running Solaris
blast-x.y.z-ia32-win32.exe              command line blast for PC running Windows
blast-x.y.z-mips64-irix.tar.gz          command line blast for 64-bit SGI machine
blast-x.y.z-ppc32-macosx.tar.gz         command line blast for MacOSX
blast-x.y.z-sparc64-solaris.tar.gz      command line blast for Sparc running Solaris
blast-x.y.z-x64-linux.tar.gz            command line blast for 64-bit Linux
ncbi.tar.gz                             NCBI toolkit
ncbiz.exe                               NCBI toolkit for PC
netblast-x.y.z-axp64-tru64.tar.gz       netblast for COMPAQ/HP alpha machine (OSF 5.1 and above)
netblast-x.y.z-ia32-freebsd.tar.gz      netblast for PC running FreeBSD
netblast-x.y.z-ia32-linux.tar.gz        netblast for PC running Linux
netblast-x.y.z-ia32-solaris.tar.gz      netblast for PC running Solaris
netblast-x.y.z-ia32-win32.exe           netblast for PC running Windows
netblast-x.y.z-mips64-irix.tar.gz       netblast for 64-bit SGI machine
netblast-x.y.z-ppc32-macosx.tar.gz      netblast for MacOSX
netblast-x.y.z-sparc64-solaris.tar.gz   netblast for Sparc running Solaris
netblast-x.y.z-x64-linux.tar.gz         netblast for 64-bit Linux
wwwblast-x.y.z-axp64-tru64.tar.gz       web server blast for COMPAQ/HP alpha machine (OSF 5.1 and above)
wwwblast-x.y.z-ia32-freebsd.tar.gz      web server blast for PC running FreeBSD
wwwblast-x.y.z-ia32-linux.tar.gz        web server blast for PC running Linux
wwwblast-x.y.z-ia32-solaris.tar.gz      web server blast for PC running Solaris
wwwblast-x.y.z-mips64-irix.tar.gz       web server blast for 64-bit SGI machine
wwwblast-x.y.z-ppc32-macosx.tar.gz      web server blast for MacOSX
wwwblast-x.y.z-sparc64-solaris.tar.gz   web server blast for Sparc running Solaris
wwwblast-x.y.z-x64-linux.tar.gz         web server blast for 64-bit Linux
NOTE:
Currently, we do not have access to an IBM AIX server to make the binary for that platform. Also, we do not make a wwwblast package for Windows.
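
The naming pattern in Table 2.5.1 is regular enough to generate an archive name from a package, a version, and a platform tag. The Python sketch below is an illustration only: the function and constant names are hypothetical, and the version number used in the example is simply one of the past releases listed in this document, not necessarily the one in LATEST.

```python
# Platform tags that ship as .tar.gz archives in Table 2.5.1.
TAR_PLATFORMS = (
    "axp64-tru64", "ia32-freebsd", "ia32-linux", "ia32-solaris",
    "mips64-irix", "ppc32-macosx", "sparc64-solaris", "x64-linux",
)
WINDOWS_PLATFORM = "ia32-win32"  # ships as a .exe instead
PACKAGES = ("blast", "netblast", "wwwblast")


def archive_name(package, version, platform):
    """Return the archive file name for a package/version/platform."""
    if package not in PACKAGES:
        raise ValueError("unknown package: " + package)
    if platform == WINDOWS_PLATFORM:
        if package == "wwwblast":
            # Per the note above, no wwwblast package exists for Windows.
            raise ValueError("no wwwblast package is provided for Windows")
        return "{}-{}-{}.exe".format(package, version, platform)
    if platform not in TAR_PLATFORMS:
        raise ValueError("unknown platform: " + platform)
    return "{}-{}-{}.tar.gz".format(package, version, platform)
```

For example, archive_name("blast", "2.2.13", "ia32-linux") gives blast-2.2.13-ia32-linux.tar.gz.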

        2.5.2 /blast/executables/release/ subdirectory content
This directory contains past major releases of BLAST, as far back as version 2.0.7. Each release is in its own subdirectory.

Table 2.5.2 File content for /blast/executables/release/ subdirectory
File/Dir Name   Content
2.0.10          Release version 2.0.10
2.0.11          Release version 2.0.11
2.0.12          Release version 2.0.12
2.0.13          Release version 2.0.13
2.0.14          Release version 2.0.14
2.0.7           Release version 2.0.7
2.0.8           Release version 2.0.8
2.0.9           Release version 2.0.9
2.1.2           Release version 2.1.2
2.1.3           Release version 2.1.3
2.2.10          Release version 2.2.10
2.2.11          Release version 2.2.11
2.2.12          Release version 2.2.12
2.2.13          Release version 2.2.13
2.2.3           Release version 2.2.3
2.2.4           Release version 2.2.4
2.2.5           Release version 2.2.5
2.2.6           Release version 2.2.6
2.2.7           Release version 2.2.7
2.2.8           Release version 2.2.8
2.2.9           Release version 2.2.9
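
The release subdirectories above appear in plain string order, so 2.0.10 is listed before 2.0.7. A script that needs the oldest or newest release should compare the version parts as numbers. The following Python sketch shows one way to do that; the function name is hypothetical.

```python
def version_key(name):
    """Turn a release name such as '2.0.10' into (2, 0, 10) for sorting."""
    return tuple(int(part) for part in name.strip().split("."))


if __name__ == "__main__":
    releases = ["2.0.10", "2.0.7", "2.2.9", "2.2.13", "2.1.3"]
    print(sorted(releases, key=version_key))
```

Using key=version_key with sorted(), min(), or max() puts 2.0.7 before 2.0.10 and 2.2.9 before 2.2.13.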

        2.5.3 /blast/executables/snapshot/ subdirectory content
This directory contains builds of the blast packages with bug fixes made between major versioned releases. The subdirectories are named by date. There has been no snapshot or patch since the end of 2004, when the last one was made for release 2.2.10. The old patches that are available are not listed here.

    2.6 /blast/matrices directory content
This directory contains the scoring matrices that BLAST can use to score alignments. The files are text files in a special format and can be viewed directly in a browser.

For valid statistical analysis, blastn uses only an identity matrix, and blastp supports only a limited subset of the BLOSUM and PAM matrices: BLOSUM45, BLOSUM62, BLOSUM80, PAM30, and PAM70.
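
A script that lets users pick a matrix for blastp can guard against unsupported choices with a simple membership test. The sketch below is hypothetical helper code, not part of BLAST; the set contains exactly the five matrices named above.

```python
# The blastp matrices named in this section.
SUPPORTED_BLASTP_MATRICES = frozenset(
    {"BLOSUM45", "BLOSUM62", "BLOSUM80", "PAM30", "PAM70"}
)


def is_supported_matrix(name):
    """Return True if `name` is one of the matrices blastp supports."""
    return name.strip().upper() in SUPPORTED_BLASTP_MATRICES
```

Other matrix files in this directory, whatever their names, would fail this check even though they can be downloaded and viewed.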

    2.7 /blast/temp/ directory content
This directory is for temporary file storage and for miscellaneous files or tools. It is currently empty.

3. Technical Support
Additional questions/comments on this FTP site should be directed to the NCBI blast-help group at:
blast-help@ncbi.nlm.nih.gov
Other questions on general NCBI resources should be directed to:
info@ncbi.nlm.nih.gov