
Available Databases

SmartsBio connects to 36 databases and data sources, most of them major public bioinformatics resources, spanning sequences, structures, variants, pathways, clinical data, literature, and patents. Each database is accessed through one or more tools in the API.

How it works: databases are accessed through tools, and each tool wraps one or more database APIs. Call a tool directly with `POST /v1/tools/:id/run`, or use the Query endpoint and let the agent pick the relevant databases automatically.
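For callers not using the Python SDK, a raw call to the run endpoint might look like the sketch below. Only the path `/v1/tools/:id/run` comes from this page; the base URL `https://api.smarts.bio`, the `Authorization: Bearer` header, and the `{"input": ...}` body (mirroring the SDK's `input=` argument) are hypothetical placeholders to check against the API reference.

```python
import json
import urllib.request

# HYPOTHETICAL: base URL, Bearer auth, and {"input": {...}} body are assumptions,
# not confirmed by this page. Only the /v1/tools/:id/run path is documented.
BASE_URL = "https://api.smarts.bio"


def build_run_request(tool_id: str, tool_input: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tools/:id/run request."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tools/{tool_id}/run",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_tool(tool_id: str, tool_input: dict, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_run_request(tool_id, tool_input, api_key)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `run_tool("ncbi_pubmed", {"query": "BRCA1 AND PARP inhibitor", "maxResults": 5}, "sk_live_...")`. Keeping request construction separate from sending makes it easy to inspect or log the exact payload before it goes out.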

Database Reference

The 36 databases below are grouped by data type. Each database name links to the full parameter reference for its tool.

Sequence Search (6 databases)

| Database | Tool ID | Scale | Description | Key Parameters |
| --- | --- | --- | --- | --- |
| NCBI BLAST | `ncbi_blast` | Trillions of sequences | Nucleotide and protein similarity search (blastn, blastp, blastx, tblastn, tblastx) against nt, nr, refseq, and swissprot. | `sequence`, `program`, `database`, `evalue` |
| EBI Job Dispatcher | `ebi_job_dispatcher` | 50+ EMBL-EBI tools | EMBL-EBI Job Dispatcher: BLAST, Clustal Omega, T-Coffee, InterProScan, HMMER, and 45+ more tools. | `tool`, `sequence`, `database`, `params` |
| SmartsMatch | `smartsmatch_protein` | 250M+ proteins (AI index) | Fast AI-powered protein similarity search using vector embeddings; faster and more context-aware than BLAST. Detects remote homologs and functional analogs. | `sequence`, `top_k`, `threshold`, `search_mode`, `index` |
| NCBI Nucleotide | `ncbi_nucleotide` | 400M+ sequences | DNA/RNA sequences from GenBank, RefSeq, and EMBL. Fetch by accession or search by keyword. | `accession` or `query`, `database` |
| NCBI Protein | `ncbi_protein` | 700M+ sequences | Protein sequences from NCBI Protein and RefSeq. Supports accession fetch and keyword search. | `accession` or `query` |
| NCBI SRA | `ncbi_sra` | 30+ petabytes of reads | Sequence Read Archive: raw sequencing runs. Search by organism, study, sample, or experiment. | `query`, `organism`, `platform`, `library_strategy` |
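The `ncbi_blast` tool takes `sequence`, `program`, `database`, and `evalue`. A minimal sketch for assembling that input from raw or FASTA text, choosing `blastn` or `blastp` from the sequence alphabet when no program is given. The 90% alphabet heuristic, the default `evalue`, and the helper names are illustrative choices, not part of the SmartsBio API.

```python
from typing import Optional

NUCLEOTIDE_CHARS = set("ACGTUN")
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}
PROTEIN_DB_PROGRAMS = {"blastp", "blastx"}  # these two search protein databases


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines and all whitespace, and uppercase the residues."""
    lines = (ln.strip() for ln in raw.splitlines())
    body = "".join(ln for ln in lines if ln and not ln.startswith(">"))
    return "".join(body.split()).upper()


def looks_like_nucleotide(seq: str, min_fraction: float = 0.9) -> bool:
    """Heuristic: DNA/RNA if at least 90% of characters are A, C, G, T, U, or N."""
    if not seq:
        raise ValueError("empty sequence")
    hits = sum(ch in NUCLEOTIDE_CHARS for ch in seq)
    return hits / len(seq) >= min_fraction


def make_blast_input(raw: str, program: Optional[str] = None,
                     database: Optional[str] = None, evalue: float = 1e-5) -> dict:
    """Build an `input` dict for ncbi_blast (hypothetical helper; defaults are illustrative)."""
    seq = clean_sequence(raw)
    if program is None:
        program = "blastn" if looks_like_nucleotide(seq) else "blastp"
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    if database is None:
        # protein-database programs default to nr, nucleotide-database programs to nt
        database = "nr" if program in PROTEIN_DB_PROGRAMS else "nt"
    if evalue <= 0:
        raise ValueError("evalue must be positive")
    return {"sequence": seq, "program": program, "database": database, "evalue": evalue}
```

Usage: `client.tools.run(tool_id="ncbi_blast", input=make_blast_input(fasta_text))`. Pass `program` explicitly for translated searches (blastx, tblastn, tblastx), since the alphabet alone cannot tell which one you want.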

Protein & Structure (6 databases)

| Database | Tool ID | Scale | Description | Key Parameters |
| --- | --- | --- | --- | --- |
| RCSB PDB | `pdb_toolkit` | 220,000+ structures | Experimentally determined 3D structures (X-ray, cryo-EM, NMR). Fetch PDB/mmCIF by ID or search by keyword. | `pdb_id`, `format` (`pdb` / `mmcif`) |
| AlphaFold DB | `alphafold_db` | 214M+ predicted structures | DeepMind AlphaFold predicted structures covering most of UniProtKB, with per-residue pLDDT confidence. | `uniprot_id`, `format` |
| UniProt | `uniprot_toolkit` | 250M+ entries (570K Swiss-Prot) | Protein knowledgebase: sequences, functional annotations, GO terms, pathways, and cross-references. Supports ID mapping. | `accession` or `query`, `database` (`uniprot` / `swissprot`), `format` |
| STRING | `string_db` | 20B+ interactions, 59M proteins | Protein-protein interaction networks: experimental, co-expression, co-occurrence, text-mining, and genomic context. | `proteins` (list), `species`, `score_threshold`, `network_type` |
| InterPro | `interpro_toolkit` | 48,000+ entries | Protein domain classification and functional prediction, integrating 12+ member databases including Pfam, PANTHER, HAMAP, and Gene3D. | `sequence` or `accession`, `databases` |
| NCBI Structure | `ncbi_structure` | PDB-integrated | NCBI Structure database: structural neighbors, conserved domains, and integrated PDB access. | `pdb_id` or `query` |
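AlphaFold DB model files store the per-residue pLDDT score in the B-factor field of each atom record. A minimal sketch that averages pLDDT over C-alpha atoms and counts residues in AlphaFold DB's confidence bands (very high above 90, confident 70 to 90, low 50 to 70, very low below 50). It assumes you requested `format="pdb"` from `alphafold_db` and have the file contents as a string; how the tool returns the file is not specified on this page.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ca_plddt(pdb_text: str) -> List[float]:
    """Per-residue pLDDT from the B-factor field (columns 61-66) of C-alpha ATOM records."""
    scores = []
    for line in pdb_text.splitlines():
        # atom name occupies columns 13-16; HETATM records are skipped
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_band(score: float) -> str:
    """AlphaFold DB bands: >90 very high, >70 confident, >50 low, otherwise very low."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def summarize_plddt(pdb_text: str) -> Tuple[float, Dict[str, int]]:
    """Return the mean C-alpha pLDDT and the number of residues in each band."""
    scores = ca_plddt(pdb_text)
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    bands = Counter(confidence_band(s) for s in scores)
    return sum(scores) / len(scores), dict(bands)
```

This fixed-column parser handles PDB format only; mmCIF output needs a different reader.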

Genomics & Variants (7 databases)

| Database | Tool ID | Scale | Description | Key Parameters |
| --- | --- | --- | --- | --- |
| NCBI Gene | `ncbi_gene` | 50M+ gene records | Gene summaries, genomic coordinates, RefSeq IDs, expression data, and orthologs across species. | `gene_symbol` or `query`, `organism` |
| Ensembl | `ensembl` | 300+ vertebrate genomes | Genome browser and annotation for vertebrates: gene models, regulatory features, variants, and comparative genomics. | `gene_id`, `species`, `features` |
| dbSNP | `ncbi_dbsnp` | 1B+ variants | Reference SNP details by rs ID: allele frequencies, functional annotation, HGVS notation, and population data. | `rs_id` |
| GWAS Catalog | `gwas_catalog` | 500K+ associations | NHGRI-EBI GWAS Catalog: SNP-trait associations, effect sizes, p-values, and ancestral populations. | `trait` or `gene`, `p_value_threshold` |
| NCBI Assembly | `ncbi_assembly` | 1M+ genome assemblies | Reference and submitted genome assemblies. Search by organism, assembly level, or accession. | `organism`, `assembly_level`, `query` |
| NCBI Taxonomy | `ncbi_taxonomy` | 2M+ taxa | Taxonomic classification: lineage, synonyms, common names, and taxon IDs for organisms. | `organism_name` or `taxon_id` |
| UCSC Genome Browser | `ucsc_genome_browser` | 100+ genomes | UCSC genome tracks: conservation, regulatory elements, ENCODE data, genome coordinates, and liftover. | `chrom`, `start`, `end`, `genome`, `tracks` |
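Ensembl and NCBI report genomic intervals as 1-based, fully closed coordinates, while UCSC's database tables, REST API, and BED files use 0-based, half-open intervals. This page does not say which convention the `ucsc_genome_browser` tool's `start`/`end` parameters follow, so treat the mapping as an assumption to verify. A minimal sketch of the conversion in both directions:

```python
from typing import Tuple


def one_based_to_ucsc(start: int, end: int) -> Tuple[int, int]:
    """1-based closed [start, end] (Ensembl, NCBI) -> 0-based half-open [start, end) (UCSC, BED)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end  # only the start shifts; the end number is unchanged


def ucsc_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """0-based half-open [start, end) (UCSC, BED) -> 1-based closed [start, end] (Ensembl, NCBI)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end
```

Usage, for a hypothetical region: `start0, end = one_based_to_ucsc(1000, 2000)`, then pass `{"chrom": "chr17", "start": start0, "end": end, "genome": "hg38"}` if the tool expects UCSC-style coordinates. Both forms describe the same 1,001 bases; getting this wrong shifts every interval by one base.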

Pathway & Ontology (4 databases)

| Database | Tool ID | Scale | Description | Key Parameters |
| --- | --- | --- | --- | --- |
| KEGG | `kegg_toolkit` | 550+ pathways (human) | KEGG pathway enrichment, gene-to-pathway mapping, metabolic networks, and SVG pathway diagrams. | `genes` (list), `organism`, `p_cutoff`, `pathway_id` |
| Gene Ontology | `go_toolkit` | 44,000+ GO terms | GO term search, gene-to-term lookup, and over-representation analysis (biological process, molecular function, cellular component). | `genes` (list), `ontology`, `p_cutoff`, `organism` |
| Biograph Knowledge Graph | `biograph` | 60K genes · 26K diseases · 1M+ variants | smarts.bio proprietary knowledge graph: query relationships between genes, proteins, diseases, variants, pathways, and drugs in 3 query modes (entity-lookup, search, network). | `mode`, `entity`, `entity_type`, `relation_types`, `depth` |
| EBI Search | `ebi_search` | 170+ biological datasets | EMBL-EBI unified search across UniProt, PDB, ChEMBL, Reactome, and 165+ more datasets in a single query. | `query`, `domain`, `fields`, `limit` |
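The KEGG and GO tools report over-representation results filtered by `p_cutoff`. This page does not name the statistic they use; a common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). A minimal sketch of that test for a single pathway or term, to show what a p-value below `p_cutoff` measures:

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background universe
    K: background genes annotated to the pathway or term
    n: genes in the input list
    k: input genes annotated to the pathway or term (the overlap)
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("invalid counts")
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)
```

Usage with hypothetical counts: `hypergeom_sf(8, 20000, 100, 200)` is the chance of seeing at least 8 pathway genes in a 200-gene list by random draw. When many pathways are tested, p-values are usually corrected (for example with Benjamini-Hochberg) before applying a cutoff; whether these tools apply a correction is not stated here.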

Clinical & Drug (5 databases)

| Database | Tool ID | Scale | Description | Key Parameters |
| --- | --- | --- | --- | --- |
| ClinicalTrials.gov | `clinical_trials_toolkit` | 400K+ trials | Search clinical trials by condition, intervention, phase, status, sponsor, and location. | `condition`, `intervention`, `phase`, `status`, `country` |
| ChEMBL | `chembl_toolkit` | 2.4M+ compounds | Bioactivity database: compound-target interactions, IC50/Ki values, ADMET properties, and drug candidate profiles. | `compound_id` or `smiles`, `target`, `activity_type` |
| OpenFDA Drug | `openfda_drug_toolkit` | FDA/EMA/WHO data | Drug approvals, adverse event reports (FAERS), drug labeling, recalls, and WHO international drug data. | `drug_name`, `event_type`, `date_range` |
| FDA Medical Devices | `medical_devices_toolkit` | 500K+ devices | FDA 510(k) clearances, PMA approvals, device recalls, and SNOMED CT medical device codes. | `device_name` or `k_number`, `device_class`, `recall_status` |
| NLM Clinical Tables | `clinical_tables_toolkit` | 25+ reference tables | NLM Clinical Tables: RxNorm drugs, ICD codes, LOINC lab tests, HPO phenotypes, and genomics reference tables. | `table`, `query`, `maxList` |
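ChEMBL IC50 values span several orders of magnitude, so they are commonly compared as pIC50 = -log10(IC50 in mol/L), which for values in nM is 9 - log10(IC50). A minimal sketch that ranks compounds by potency, assuming records shaped like the `chembl_toolkit` output used in the Query Examples (`molecule_chembl_id`, `standard_value` in nM); that some records may lack a value is an assumption.

```python
import math
from typing import Iterable, List, Mapping, Tuple


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def rank_by_potency(records: Iterable[Mapping]) -> List[Tuple[str, float]]:
    """Return (compound ID, pIC50) pairs, most potent first; records without a value are skipped."""
    ranked = []
    for rec in records:
        value = rec.get("standard_value")
        if value is None:
            continue
        # ChEMBL often serializes numbers as strings, so coerce with float()
        ranked.append((rec["molecule_chembl_id"], pic50_from_nm(float(value))))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Usage: `rank_by_potency(chembl["compounds"])`. A higher pIC50 means a more potent compound, and one unit corresponds to a tenfold change in IC50.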

Literature (6 databases)

| Database | Tool ID | Scale | Description | Key Parameters |
| --- | --- | --- | --- | --- |
| PubMed | `ncbi_pubmed` | 37M+ citations | Boolean and MeSH-aware search of biomedical literature. Returns titles, abstracts, authors, and PMIDs. | `query`, `maxResults`, `date_range` |
| bioRxiv | `biorxiv_search` | 200K+ preprints | Biological sciences preprints. Search by keyword, author, category, or DOI. | `query`, `category`, `date_range` |
| medRxiv | `medrxiv_search` | 50K+ preprints | Clinical and health sciences preprints. Search by keyword, author, or category. | `query`, `category`, `date_range` |
| arXiv | `arxiv_search` | 2M+ papers | Physics, mathematics, and quantitative biology preprints. Useful for computational biology and bioinformatics algorithms. | `query`, `category` (`q-bio.*`), `max_results` |
| Crossref | `crossref_search` | 150M+ works | DOI resolution, citation counts, bibliographic metadata, and journal/publisher lookup for published articles. | `doi` or `query`, `type`, `date_range` |
| Web (Tavily) | `tavily_search` | Live web index | General web search for current news, tool documentation, and findings not yet indexed in academic databases. | `query`, `maxResults` |
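PubMed's own search syntax supports field tags such as `[MeSH Terms]`, `[Title/Abstract]`, and `[dp]` (publication date, with `YYYY/MM/DD:YYYY/MM/DD` ranges). A minimal sketch of a hypothetical helper that composes such a query for `ncbi_pubmed`, assuming the tool passes `query` through to PubMed unchanged (not confirmed here). The tool's separate `date_range` parameter may be the preferred way to filter by date; its format is not documented on this page.

```python
from typing import Iterable, Optional


def pubmed_query(mesh: Iterable[str] = (), tiab: Iterable[str] = (),
                 date_from: Optional[str] = None, date_to: Optional[str] = None) -> str:
    """AND together MeSH terms, title/abstract phrases, and an optional [dp] date range."""
    clauses = [f'"{t}"[MeSH Terms]' for t in mesh]
    clauses += [f'"{t}"[Title/Abstract]' for t in tiab]
    if date_from or date_to:
        # open-ended ranges fall back to very early / far-future bounds
        start = date_from or "1800/01/01"
        end = date_to or "3000/12/31"
        clauses.append(f"{start}:{end}[dp]")
    if not clauses:
        raise ValueError("at least one search term is required")
    return " AND ".join(clauses)
```

Usage: `client.tools.run(tool_id="ncbi_pubmed", input={"query": pubmed_query(mesh=["Breast Neoplasms"], tiab=["PARP inhibitor"]), "maxResults": 20})`. Quoting each phrase keeps PubMed from splitting it into separate words.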

Patents (2 databases)

| Database | Tool ID | Scale | Description | Key Parameters |
| --- | --- | --- | --- | --- |
| Google Patents | `patent_search` | 87M+ patents | 17+ patent offices worldwide via Google Patents Public Data (BigQuery). Full-text and CPC code search. | `query`, `country_code`, `date_range`, `assignee`, `cpc_code` |
| USPTO PatentsView | `uspto_patents_search` | 10M+ US patents | US patent database via the PatentsView API: full text, inventor/assignee details, citation counts, and CPC codes. | `query`, `inventor`, `assignee`, `patent_date`, `cpc_subgroup` |
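Both patent tools filter by CPC classification (`cpc_code`, `cpc_subgroup`). A full CPC symbol has a section letter (A to H, or Y), a two-digit class, a subclass letter, and a main group/subgroup, as in `C12N15/113`. A minimal sketch that normalizes and checks symbols before they are sent; the exact form each tool expects (with or without a space, full subgroup or subclass only) is an assumption to verify.

```python
import re

# section (A-H or Y), class (2 digits), subclass (letter), main group / subgroup
CPC_FULL = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def normalize_cpc(symbol: str) -> str:
    """Uppercase and drop whitespace, e.g. 'c12n 15/113' -> 'C12N15/113'; reject malformed symbols."""
    s = re.sub(r"\s+", "", symbol).upper()
    if not CPC_FULL.match(s):
        raise ValueError(f"not a full CPC symbol: {symbol!r}")
    return s
```

Usage: `{"query": "antisense oligonucleotide", "cpc_code": normalize_cpc("C12N 15/113")}`. Failing early on a malformed symbol is cheaper than debugging an empty result set.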

Discover Databases at Runtime

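List all tools and print the database tools grouped by category:

```python
from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")  # replace with your API key
tools = client.tools.list()

# These strings must match each tool's `category` field exactly; several end in
# "Databases" and therefore differ from the section titles on this page.
db_categories = [
    "Sequence Search", "Protein & Structure Databases",
    "Genomics & Variant Databases", "Pathway & Ontology",
    "Clinical & Drug Databases", "Literature", "Patents",
]
for category in db_categories:
    in_cat = [t for t in tools if t.category == category]
    if in_cat:  # skip categories with no matching tools
        print(f"\n{category}:")
        for t in in_cat:
            # tool ID padded to 35 characters, then the first 50 characters of its description
            print(f"  {t.id:35s} {t.description[:50]}")
```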

Query Examples

```python
from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")  # replace with your API key

# UniProt: protein annotation
uniprot = client.tools.run(
    tool_id="uniprot_toolkit",
    input={"query": "BRCA1_HUMAN", "format": "json"},
)
print(uniprot["function"], uniprot["subcellular_location"])

# Ensembl: gene model with variant and regulatory features
# (ENSG00000012048 is the Ensembl ID for human BRCA1)
ensembl = client.tools.run(
    tool_id="ensembl",
    input={"gene_id": "ENSG00000012048", "species": "human", "features": ["variants", "regulation"]},
)

# ChEMBL: drug-target bioactivity (standardized IC50 values are in nM)
chembl = client.tools.run(
    tool_id="chembl_toolkit",
    input={"target": "EGFR", "activity_type": "IC50", "limit": 20},
)
for compound in chembl["compounds"][:5]:
    print(f"{compound['molecule_chembl_id']}  IC50={compound['standard_value']} nM")

# ClinicalTrials.gov: recruiting trials. `intervention` expects a treatment,
# not a gene, so search for a drug such as the PARP inhibitor olaparib.
trials = client.tools.run(
    tool_id="clinical_trials_toolkit",
    input={"condition": "breast cancer", "intervention": "olaparib", "status": "RECRUITING"},
)
for trial in trials["studies"][:5]:
    print(f"{trial['nct_id']}: {trial['brief_title']}")
```
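The queries above are independent of one another, so they can be issued concurrently. A minimal sketch using `concurrent.futures`, assuming `client.tools.run` is safe to call from several threads (not confirmed on this page). `run_many` is a hypothetical helper that accepts any `run(tool_id=..., input=...)` callable, so it can be exercised without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple

Job = Tuple[str, Dict[str, Any]]  # (tool_id, input)


def run_many(run: Callable[..., Any], jobs: Dict[str, Job], max_workers: int = 4) -> Dict[str, Any]:
    """Run named jobs in parallel; map each name to its result, or to the exception it raised."""
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run, tool_id=tool_id, input=tool_input)
            for name, (tool_id, tool_input) in jobs.items()
        }
        for name, fut in futures.items():
            try:
                results[name] = fut.result()
            except Exception as exc:  # one failed call should not discard the others
                results[name] = exc
    return results
```

Usage: `run_many(client.tools.run, {"uniprot": ("uniprot_toolkit", {"query": "BRCA1_HUMAN", "format": "json"}), "gwas": ("gwas_catalog", {"gene": "BRCA1"})})`. Returning exceptions as values lets you inspect partial results instead of losing the whole batch to a single error.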