CC-BY 4.0 | BLAST index
IBM Functional Genomics Platform BLAST Databases
NCBI BLAST formatted databases for bacterial and viral gene and protein sequences derived using analytics from the IBM Functional Genomics Platform.
Overview
This dataset provides gene and protein sequences from over 2,700 bacterial and viral genera for comparative analysis and search. The source genomes, both de novo assemblies (built from raw data in NCBI SRA) and reference genomes (from NCBI RefSeq and GenBank), were processed by the IBM Functional Genomics Platform, and the extracted sequences were then processed with NCBI BLAST tools to create nucleotide and protein BLAST databases. The databases contain over 76M gene and 58M protein sequences for bacteria and 662K gene and 521K protein sequences for viruses, and can be searched using the blastn and blastp command line tools from NCBI. These databases were built with NCBI BLAST version 2.9.0+.
Note: The data included in this dataset reflects a best effort based on references and tools available today in the public domain. IBM does not represent or guarantee the accuracy of data provided or of the original sources and tools. IBM does not represent or guarantee that conclusions drawn from these tools and this data are free from defects including false positive or false negative classifications.
Get this Dataset
Data Description | Zipped File Name |
---|---|
Genes and proteins for bacteria Dataset, 35 GB | ibm_fgp_blast_bacteria_gene_and_proteins.tar.gz |
Genes and proteins for virus Dataset, 256 MB | ibm_fgp_blast_virus_gene_and_proteins.tar.gz |
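Once an archive has been downloaded, it needs to be unpacked before the BLAST tools can read it. The following is a minimal sketch of one way to do that with the Python standard library, assuming the archive sits in the current directory. The helper names `has_room` and `extract_archive` and the destination folder are illustrative and not part of the dataset. The free-space check uses the uncompressed sizes listed on this page (113 GB for bacteria, 1.2 GB for virus) and treats 1 GB as 10^9 bytes, which is an assumption.

```python
from __future__ import annotations

import shutil
import tarfile
from pathlib import Path

# Uncompressed sizes as stated on this page; 1 GB = 10**9 bytes is an assumption.
UNCOMPRESSED_BYTES = {
    "ibm_fgp_blast_bacteria_gene_and_proteins.tar.gz": 113 * 10**9,
    "ibm_fgp_blast_virus_gene_and_proteins.tar.gz": int(1.2 * 10**9),
}


def has_room(dest: Path, required_bytes: int) -> bool:
    """Return True if the filesystem holding dest has at least required_bytes free."""
    return shutil.disk_usage(dest).free >= required_bytes


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract a gzip-compressed tar archive into dest and return its member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

A typical call might check `has_room(Path("."), UNCOMPRESSED_BYTES[name])` first and then run `extract_archive(Path(name), Path("blastdb"))`, where `blastdb` is a hypothetical output folder. The layout of the extracted files is not described on this page, so inspect the returned member names to find the database files.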
Dataset Metadata
Field | Value |
---|---|
Format | BLAST index |
License | CC-BY 4.0 |
Domain | Computational Biology and Bioinformatics |
Number of Records | 76,233,193 bacterial genes, 58,669,101 bacterial proteins, 662,729 viral genes, 521,851 viral proteins |
Data Split | 76,233,193 bacterial genes, 58,669,101 bacterial proteins, 662,729 viral genes, 521,851 viral proteins |
Size | Bacteria - 35 GB (compressed), 113 GB (uncompressed); Virus - 256 MB (compressed), 1.2 GB (uncompressed) |
Author | Ed Seabolt, Kristen L. Beck, Gowri Nayar, Akshay Agarwal, Harsha Krishnareddy, Hakan Bulu, Thuan Doan, James Kaufman, Vandana Mukherjee |
Dataset Origin | NCBI and IBM Functional Genomics Platform |
Dataset Version Update | 1.0.0 - December 23rd, 2020 |
Dataset Archive Contents
File or Folder | Description |
---|---|
ibm_fgp_blast_bacteria_gene_and_proteins.tar.gz | Contains the genes and proteins for bacteria |
ibm_fgp_blast_virus_gene_and_proteins.tar.gz | Contains the genes and proteins for viruses |
Example Records
Example nucleotide search:
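As a hedged illustration, a nucleotide search against one of the extracted gene databases could be driven from Python roughly as follows. The flags (`-query`, `-db`, `-out`, `-outfmt`, `-evalue`, `-max_target_seqs`) are standard `blastn` options from NCBI BLAST+. The query file name, the database path, and the threshold values are placeholders: the actual database prefix inside the archive is not stated on this page and must be read from the extracted files.

```python
import shutil
import subprocess


def build_blastn_command(query_fasta, db_prefix, out_path,
                         evalue=1e-10, max_targets=10):
    """Build a blastn argument list that writes tabular (outfmt 6) results."""
    return [
        "blastn",
        "-query", str(query_fasta),      # FASTA file of nucleotide queries
        "-db", str(db_prefix),           # database prefix, without file extension
        "-out", str(out_path),           # results file
        "-outfmt", "6",                  # tab-separated, one hit per line
        "-evalue", str(evalue),          # illustrative E-value cutoff
        "-max_target_seqs", str(max_targets),
    ]


def run_search(command):
    """Run a BLAST command, failing early if the executable is not on PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found; install NCBI BLAST+")
    subprocess.run(command, check=True)


# Hypothetical paths; replace with your query file and the extracted database prefix.
command = build_blastn_command("query_genes.fasta",
                               "blastdb/bacteria_genes",
                               "nucleotide_hits.tsv")
print(" ".join(command))
```

Calling `run_search(command)` executes the search. Passing the arguments as a list avoids shell quoting problems when paths contain spaces.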
Example protein search:
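A protein search follows the same pattern with `blastp` and a protein database. The sketch below builds such a command and parses the tabular output it would produce. It assumes the default 12-column `-outfmt 6` layout of NCBI BLAST+. The function names, file paths, and the sample line in the usage note are made up for illustration and are not records from this dataset.

```python
# Default column order for BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def build_blastp_command(query_fasta, db_prefix, out_path, evalue=1e-10):
    """Build a blastp argument list that writes tabular (outfmt 6) results."""
    return ["blastp", "-query", str(query_fasta), "-db", str(db_prefix),
            "-out", str(out_path), "-outfmt", "6", "-evalue", str(evalue)]


def parse_tabular(text):
    """Parse outfmt 6 text into a list of dicts with numeric fields converted."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        values = line.split("\t")
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
        hit = dict(zip(FIELDS, values))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

For instance, after running the command from `build_blastp_command("query_proteins.fasta", "blastdb/bacteria_proteins", "protein_hits.tsv")` (hypothetical paths), the results file can be read and passed to `parse_tabular`, and `best_hit_per_query` then reduces the hits to one top-scoring match per query.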