Available Databases
SmartsBio connects to 36+ major public bioinformatics databases across sequences, structures, variants, pathways, clinical data, literature, and patents. Each database is accessed through one or more tools in the API.
How it works: Databases are accessed via tools, and each tool wraps one or more database APIs. Call a tool directly with POST /v1/tools/:id/run, or let the agent pick the right databases automatically when you use the Query endpoint.

Database Reference
36 databases organized by data type. Click a database name to read the full parameter reference.
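Each Tool ID in the tables below is the value that goes in the `:id` slot of `POST /v1/tools/:id/run`. The sketch below only builds such a request with the standard library and does not send it. The base URL, the `Bearer` authorization header, and the `{"input": ...}` body shape are assumptions made for illustration (the body shape mirrors the SDK's `input=` argument); check the API reference for the real values.

```python
import json
import urllib.request

# Assumption: placeholder host, not the real API base URL.
BASE_URL = "https://api.example.com"


def build_tool_request(tool_id: str, tool_input: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tools/:id/run request."""
    # Assumption: the endpoint accepts a JSON body of the form {"input": {...}}.
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tools/{tool_id}/run",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Tool ID and parameter name come from the dbSNP row of the tables.
req = build_tool_request("ncbi_dbsnp", {"rs_id": "rs334"}, api_key="sk_live_...")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it once the placeholder values are replaced.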

Sequence Search (6 databases)
| Database | Tool ID | Scale | Description | Key Parameters |
|---|---|---|---|---|
| NCBI BLAST | ncbi_blast | Trillions of sequences | Nucleotide & protein similarity search (blastn, blastp, blastx, tblastn, tblastx) against nt, nr, refseq, swissprot. | sequence, program, database, evalue |
| EBI Job Dispatcher | ebi_job_dispatcher | 50+ EMBL-EBI tools | EMBL-EBI Job Dispatcher: BLAST, Clustal Omega, T-Coffee, InterProScan, HMMER, and 45+ more tools. | tool, sequence, database, params |
| SmartsMatch | smartsmatch_protein | 250M+ proteins (AI index) | Fast AI-powered protein similarity search using vector embeddings — faster and more context-aware than BLAST. Detects remote homologs and functional analogs. | sequence, top_k, threshold, search_mode, index |
| NCBI Nucleotide | ncbi_nucleotide | 400M+ sequences | DNA/RNA sequences from GenBank, RefSeq, and EMBL. Fetch by accession or search by keyword. | accession or query, database |
| NCBI Protein | ncbi_protein | 700M+ sequences | Protein sequences from NCBI Protein and RefSeq. Supports accession fetch and keyword search. | accession or query |
| NCBI SRA | ncbi_sra | 30+ petabytes of reads | Sequence Read Archive: raw sequencing runs. Search by organism, study, sample, or experiment. | query, organism, platform, library_strategy |
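The `ncbi_blast` row lists `sequence`, `program`, `database`, and `evalue` as key parameters. As a rough sketch, the helper below assembles that input and picks `blastn` or `blastp` from the sequence alphabet. The helper name, the alphabet heuristic, the default databases, and the numeric `evalue` type are illustrative assumptions, not part of the API.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def looks_like_nucleotide(sequence: str) -> bool:
    """Heuristic: True if every letter is one of A, C, G, T, U, or N."""
    letters = {ch for ch in sequence.upper() if ch.isalpha()}
    return bool(letters) and letters <= NUCLEOTIDE_LETTERS


def blast_input(sequence: str, evalue: float = 1e-5) -> dict:
    """Build an input dict for the ncbi_blast tool from a raw sequence."""
    seq = "".join(sequence.split())  # drop whitespace and line breaks
    if not seq:
        raise ValueError("sequence is empty")
    if looks_like_nucleotide(seq):
        program, database = "blastn", "nt"
    else:
        program, database = "blastp", "swissprot"
    return {"sequence": seq, "program": program, "database": database, "evalue": evalue}


print(blast_input("ATGGCC\nGGTA"))
print(blast_input("MKTAYIAKQR"))
```

The heuristic misclassifies a peptide made only of A, C, G, T, and N residues, so set `program` explicitly when the sequence type is known. The resulting dict would be passed as `input` to `client.tools.run(tool_id="ncbi_blast", ...)`.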

Protein & Structure (6 databases)
| Database | Tool ID | Scale | Description | Key Parameters |
|---|---|---|---|---|
| RCSB PDB | pdb_toolkit | 220,000+ structures | Experimentally determined 3D structures (X-ray, cryo-EM, NMR). Fetch PDB/mmCIF by ID or search by keyword. | pdb_id, format (pdb / mmcif) |
| AlphaFold DB | alphafold_db | 214M+ predicted structures | DeepMind AlphaFold predicted structures for the entire UniProt proteome with per-residue pLDDT confidence. | uniprot_id, format |
| UniProt | uniprot_toolkit | 250M+ entries (570K Swiss-Prot) | Protein knowledgebase: sequences, functional annotations, GO terms, pathways, and cross-references. Supports ID mapping. | accession or query, database (uniprot / swissprot), format |
| STRING | string_db | 20B+ interactions, 59M proteins | Protein-protein interaction networks: experimental, co-expression, co-occurrence, text-mining, and genomic context. | proteins (list), species, score_threshold, network_type |
| InterPro | interpro_toolkit | 48,000+ entries | Protein domain classification and functional prediction (Pfam, PANTHER, HAMAP, Gene3D, and 12+ member databases). | sequence or accession, databases |
| NCBI Structure | ncbi_structure | PDB-integrated | NCBI Structure database: structural neighbors, conserved domains, and integrated PDB access. | pdb_id or query |

Genomics & Variants (7 databases)
| Database | Tool ID | Scale | Description | Key Parameters |
|---|---|---|---|---|
| NCBI Gene | ncbi_gene | 50M+ gene records | Gene summaries, genomic coordinates, RefSeq IDs, expression data, and orthologs across species. | gene_symbol or query, organism |
| Ensembl | ensembl | 300+ vertebrate genomes | Genome browser and annotation for vertebrates: gene models, regulatory features, variants, and comparative genomics. | gene_id, species, features |
| dbSNP | ncbi_dbsnp | 1B+ variants | Reference SNP details by rs ID: allele frequencies, functional annotation, HGVS notation, and population data. | rs_id |
| GWAS Catalog | gwas_catalog | 500K+ associations | NHGRI-EBI GWAS Catalog: SNP-trait associations, effect sizes, p-values, and ancestral populations. | trait or gene, p_value_threshold |
| NCBI Assembly | ncbi_assembly | 1M+ genome assemblies | Reference and submitted genome assemblies. Search by organism, assembly level, or accession. | organism, assembly_level, query |
| NCBI Taxonomy | ncbi_taxonomy | 2M+ taxa | Taxonomic classification: lineage, synonyms, common names, and taxon IDs for organisms. | organism_name or taxon_id |
| UCSC Genome Browser | ucsc_genome_browser | 100+ genomes | UCSC genome tracks: conservation, regulatory elements, ENCODE data, genome coordinates, and liftover. | chrom, start, end, genome, tracks |
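The `ucsc_genome_browser` tool takes `chrom`, `start`, `end`, `genome`, and `tracks`. Regions are often written as a single string such as `chr17:43,000,000-43,100,000`, so a small parser can be handy. This is a sketch under assumptions: the helper name, the `hg38` default, and the integer coordinate type are illustrative, and this page does not say whether the tool expects 0-based or 1-based coordinates.

```python
import re

# chrom:start-end, with optional thousands separators in the numbers
REGION_RE = re.compile(r"^(?P<chrom>[\w.]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def ucsc_input(region: str, genome: str = "hg38", tracks=None) -> dict:
    """Turn 'chrom:start-end' into an input dict for ucsc_genome_browser."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    payload = {"chrom": match["chrom"], "start": start, "end": end, "genome": genome}
    if tracks:
        payload["tracks"] = list(tracks)
    return payload


# Arbitrary example region, chosen only to show the parsing
print(ucsc_input("chr17:43,000,000-43,100,000"))
```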

Pathway & Ontology (4 databases)
| Database | Tool ID | Scale | Description | Key Parameters |
|---|---|---|---|---|
| KEGG | kegg_toolkit | 550+ pathways (human) | KEGG pathway enrichment, gene-to-pathway mapping, metabolic networks, and SVG pathway diagrams. | genes (list), organism, p_cutoff, pathway_id |
| Gene Ontology | go_toolkit | 44,000+ GO terms | GO term search, gene-to-term lookup, and over-representation analysis (biological process, molecular function, cellular component). | genes (list), ontology, p_cutoff, organism |
| Biograph Knowledge Graph | biograph | 60K genes · 26K diseases · 1M+ variants | smarts.bio proprietary knowledge graph: query relationships between genes, proteins, diseases, variants, pathways, and drugs across 3 query modes (entity-lookup, search, network). | mode, entity, entity_type, relation_types, depth |
| EBI Search | ebi_search | 170+ biological datasets | EMBL-EBI unified cross-database search across UniProt, PDB, ChEMBL, Reactome, and 165+ more in a single query. | query, domain, fields, limit |

Clinical & Drug (5 databases)
| Database | Tool ID | Scale | Description | Key Parameters |
|---|---|---|---|---|
| ClinicalTrials.gov | clinical_trials_toolkit | 400K+ trials | Search clinical trials by condition, intervention, phase, status, sponsor, and location. | condition, intervention, phase, status, country |
| ChEMBL | chembl_toolkit | 2.4M+ compounds | Bioactivity database: compound-target interactions, IC50/Ki values, ADMET properties, and drug candidate profiles. | compound_id or smiles, target, activity_type |
| OpenFDA Drug | openfda_drug_toolkit | FDA/EMA/WHO data | Drug approvals, adverse event reports (FAERS), drug labeling, recalls, and WHO international drug data. | drug_name, event_type, date_range |
| FDA Medical Devices | medical_devices_toolkit | 500K+ devices | FDA 510k clearances, PMA approvals, device recalls, and SNOMED CT medical device codes. | device_name or k_number, device_class, recall_status |
| NLM Clinical Tables | clinical_tables_toolkit | 25+ reference tables | NLM Clinical Tables: RxNorm drugs, ICD codes, LOINC lab tests, HPO phenotypes, and genomics reference tables. | table, query, maxList |

Literature (6 databases)
| Database | Tool ID | Scale | Description | Key Parameters |
|---|---|---|---|---|
| PubMed | ncbi_pubmed | 37M+ citations | Boolean and MeSH-aware search of biomedical literature. Returns titles, abstracts, authors, and PMIDs. | query, maxResults, date_range |
| bioRxiv | biorxiv_search | 200K+ preprints | Biological sciences preprints — search by keyword, author, category, or DOI. | query, category, date_range |
| medRxiv | medrxiv_search | 50K+ preprints | Clinical and health sciences preprints — search by keyword, author, or category. | query, category, date_range |
| arXiv | arxiv_search | 2M+ papers | Physics, mathematics, and quantitative biology preprints. Useful for computational biology and bioinformatics algorithms. | query, category (q-bio.*), max_results |
| Crossref | crossref_search | 150M+ works | DOI resolution, citation counts, bibliographic metadata, and journal/publisher lookup for published articles. | doi or query, type, date_range |
| Web (Tavily) | tavily_search | Live web index | General web search for current news, tool documentation, and findings not yet indexed in academic databases. | query, maxResults |
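The PubMed row describes Boolean and MeSH-aware search. One way to compose such a query is with PubMed's standard field tags, `[Title/Abstract]` and `[MeSH Terms]`. The helper below is a hypothetical convenience, and it assumes the `ncbi_pubmed` tool passes the `query` string through to PubMed unchanged, which this page does not confirm.

```python
def pubmed_query(title_abstract_terms, mesh_terms=()):
    """Join terms into a Boolean PubMed query using standard field tags."""
    clauses = [f'"{term}"[Title/Abstract]' for term in title_abstract_terms]
    clauses += [f'"{term}"[MeSH Terms]' for term in mesh_terms]
    if not clauses:
        raise ValueError("at least one search term is required")
    return " AND ".join(clauses)


# Input dict using the parameter names listed in the PubMed row
payload = {
    "query": pubmed_query(["BRCA1"], ["Breast Neoplasms"]),
    "maxResults": 20,
}
print(payload["query"])
```

The payload would be passed as `input` to `client.tools.run(tool_id="ncbi_pubmed", ...)`. `date_range` is left out because its expected format is not documented here.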

Patents (2 databases)
| Database | Tool ID | Scale | Description | Key Parameters |
|---|---|---|---|---|
| Google Patents | patent_search | 87M+ patents | 17+ patent offices worldwide via Google Patents Public Data (BigQuery). Full-text and CPC classification code search. | query, country_code, date_range, assignee, cpc_code |
| USPTO PatentsView | uspto_patents_search | 10M+ US patents | US patent database via PatentsView API: full text, inventor/assignee details, citation counts, and CPC codes. | query, inventor, assignee, patent_date, cpc_subgroup |

Discover Databases at Runtime

from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")

# List every tool the API exposes
tools = client.tools.list()

db_categories = [
    "Sequence Search", "Protein & Structure Databases",
    "Genomics & Variant Databases", "Pathway & Ontology",
    "Clinical & Drug Databases", "Literature", "Patents",
]

# Print the ID and a short description of each tool, grouped by category
for category in db_categories:
    in_cat = [t for t in tools if t.category == category]
    if in_cat:
        print(f"\n{category}:")
        for t in in_cat:
            print(f" {t.id:35s} {t.description[:50]}")

Query Examples

# These examples reuse the `client` created in the previous section.

# UniProt: protein annotation
uniprot = client.tools.run(
    tool_id="uniprot_toolkit",
    input={"query": "BRCA1_HUMAN", "format": "json"},
)
print(uniprot["function"], uniprot["subcellular_location"])

# Ensembl: gene coordinates
ensembl = client.tools.run(
    tool_id="ensembl",
    input={"gene_id": "ENSG00000012048", "species": "human", "features": ["variants", "regulation"]},
)

# ChEMBL: drug-target bioactivity
chembl = client.tools.run(
    tool_id="chembl_toolkit",
    input={"target": "EGFR", "activity_type": "IC50", "limit": 20},
)
for compound in chembl["compounds"][:5]:
    print(f"{compound['molecule_chembl_id']} IC50={compound['standard_value']} nM")

# ClinicalTrials.gov: open trials
trials = client.tools.run(
    tool_id="clinical_trials_toolkit",
    input={"condition": "breast cancer", "intervention": "BRCA1", "status": "RECRUITING"},
)
for trial in trials["studies"][:5]:
    print(f"{trial['nct_id']}: {trial['brief_title']}")
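When several databases are queried in one script, it can help to collect failures instead of stopping at the first one. The sketch below wraps the `client.tools.run(tool_id=..., input=...)` call shape used in the examples above. `run_many` is a hypothetical helper, the SDK's exception types are not documented on this page (so it catches `Exception` broadly), and the stand-in client exists only so the sketch runs offline.

```python
from types import SimpleNamespace


def run_many(client, calls):
    """Run each (tool_id, input) pair; return (results, errors) keyed by tool ID."""
    results, errors = {}, {}
    for tool_id, tool_input in calls:
        try:
            results[tool_id] = client.tools.run(tool_id=tool_id, input=tool_input)
        except Exception as exc:  # assumption: SDK error classes are unknown
            errors[tool_id] = str(exc)
    return results, errors


# Stand-in for the real client, for offline demonstration only
def _fake_run(tool_id, input):
    if tool_id == "ncbi_dbsnp" and "rs_id" not in input:
        raise ValueError("rs_id is required")
    return {"tool": tool_id, "echo": input}


fake_client = SimpleNamespace(tools=SimpleNamespace(run=_fake_run))

results, errors = run_many(fake_client, [
    ("ncbi_gene", {"gene_symbol": "BRCA1", "organism": "human"}),
    ("ncbi_dbsnp", {}),  # missing rs_id, so this call fails in the stand-in
])
print(sorted(results), errors)
```

Because results are keyed by tool ID, calling the same tool twice in one batch keeps only the last result; switch to a list of pairs if that matters.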