Available Databases

SmartsBio connects to 71+ major public bioinformatics databases across sequences, structures, variants, pathways, clinical data, literature, and patents. Each database is accessed through one or more tools in the API.

How it works: Databases are accessed via tools — each tool wraps one or more database APIs. Call them directly with POST /v1/tools/:id/run or let the agent automatically pick the right databases when you use the Query endpoint.
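As a rough illustration of the direct route, the sketch below builds a `POST /v1/tools/:id/run` request with Python's standard library. The host `api.example.com`, the bearer-token header, and the `{"input": ...}` body shape are assumptions made for this example, not documented details. The tool ID `ncbi-pubmed` and its `query` and `retmax` parameters come from the reference table on this page.

```python
import json
import urllib.request

# Placeholder host: the real base URL is not stated on this page.
BASE_URL = "https://api.example.com"


def build_run_request(tool_id: str, tool_input: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tools/:id/run request."""
    # Assumption: parameters are wrapped in an "input" object, mirroring the SDK's input= argument.
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tools/{tool_id}/run",
        data=body,
        method="POST",
        headers={
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_run_request("ncbi-pubmed", {"query": "BRCA1", "retmax": 5}, api_key="sk_live_...")
print(req.get_method(), req.full_url)
# To send the request for real: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before any network call is made.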

Database Reference

71 databases organized by data type. Click a database name to read the full parameter reference.

Structural Biology (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Molecular Docking	molecular-docking	—	Blind molecular docking of small molecules onto protein targets using DiffDock-L (MIT, MIT/Stanford). Diffusion-based approach: automatically discovers binding pockets without prior knowledge. Success rate: 43% (RMSD <2Å), DockGen generalization benchmark: 22.6% — state-of-the-art blind docking. Generates multiple ranked docking pose candidates with confidence scores. Complements Chai-1 (co-folding from scratch); DiffDock-L docks onto an existing structure. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Use for drug discovery: identify binding mode and rank ligand candidates.	proteinPDB, ligandSMILES, numPoses, inferenceSteps

Proteomics (11 databases)

Database	Tool ID	Scale	Description	Key Parameters
Molecular Dynamics Simulation	molecular-dynamics	—	Run molecular dynamics (MD) simulation to validate protein stability using OpenMM (MIT/LGPL, Stanford). Simulates the protein in explicit water + ions at physiological conditions (300 K, 1 atm, 150 mM NaCl). Validates dynamic stability — ThermoMPNN checks static ΔΔG; OpenMM checks real behaviour over nanoseconds. OpenMM 8 supports ML potentials alongside physics-based force fields. Outputs: trajectory file, RMSD stability plot, per-residue flexibility (RMSF), and average structure PDB. Runtime: ~100–300 ns/day on g5.xlarge (A10G); 50–100 ns sufficient for stability validation. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Cost: ~$3–10 per job.	proteinPDB, simulationLength, forcefield, temperature, solvent
Protein Complex Prediction	protein-complex-prediction	—	Predict the 3D structure of multi-molecule complexes using Chai-1 (Apache 2.0, ChaiDiscovery). Co-folds proteins, small molecules, nucleic acids, glycans, and ions simultaneously in one model. Protein-ligand: 77% success rate (vs AlphaFold3 76%). Protein-protein multimers: 75.1% DockQ (beats AlphaFold-Multimer 67.7%). Antibody-antigen: 47.9% DockQ in single-sequence mode. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Use when you need multi-molecule complex structure prediction without an AlphaFold3 licence.	sequences, ligandSMILES, useTemplates, numRecycles
Protein Sequence Co-design	protein-sequence-codesign	—	Generate protein sequence and 3D structure simultaneously using DPLM-2 diffusion model. Performs true joint sequence+structure co-design in a single diffusion pass — no separate inverse folding step. Supports unconditional generation and motif-constrained scaffolding (design a protein around a fixed active site). 650M DPLM-2 model outperforms 3B-scale baselines (ICLR 2025 / ICML 2025 Spotlight). Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Use for de novo protein generation or motif-anchored scaffold design.	mode, proteinLength, numSamples, temperature, motifPDB
Protein Sequence Generation	protein-sequence-generation	—	Generate diverse protein sequences within a protein family using ProGen2 (Salesforce, BSD-3). Trained on 1 billion protein sequences, ProGen2 produces natural-looking sequences that respect the evolutionary grammar of the target protein family. Provide a family name (e.g. "GFP", "serine protease") or a seed sequence, and the model generates diverse variants with perplexity scores for quality ranking. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Ideal as a fast pre-screening step before expensive GPU design runs.	proteinFamily, numSequences, temperature, modelSize, maxLength
Protein Sequence Infilling	protein-sequence-infilling	—	Design protein scaffolds that preserve a required structural motif using DPLM-2 inverse folding. Given a PDB structure with a fixed functional site (e.g. catalytic triad, disulfide bonds, binding loop), generates diverse scaffold sequences that maintain the exact geometry of the fixed residues. Useful for enzyme engineering and epitope grafting where the active site geometry is non-negotiable. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge).	motifPDB, motifResidues, scaffoldLength, numDesigns, confidenceThreshold
Protein Variant Scoring	protein-variant-scoring	—	Zero-shot fitness scoring of protein variants using ProGen2 log-likelihood. Ranks point mutations (e.g. A123G), insertions, deletions, and combinatorial variants by their log-likelihood delta relative to the wildtype — no experimental training data required. Handles both substitutions AND indels, unlike most tools that only score substitutions. Ideal for pre-screening thousands of variants cheaply before committing to wet lab synthesis. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). BSD-3 license (Salesforce).	baseSequence, mutations, scanAllSingle, modelSize
Antibody Humanization	antibody-humanization	—	Humanize antibody/nanobody sequences and score immunogenicity using BioPhi (MIT, Merck). Two complementary methods: Sapiens — deep learning CDR humanization while preserving binding specificity. OASis — 9-mer humanness scoring against Observed Antibody Space; correlates with clinical immunogenicity. Critical downstream step for all Boltz nanobody designs before therapeutic development. A nanobody with excellent binding but poor OASis humanness score will fail in clinical trials. Runs on ECS Fargate (CPU). Fast: 1–2 minutes per antibody.	antibodySequence, humanizationMethod, species, preserveBinding
Protein Function Annotation	protein-function-annotation	—	Predict protein function (GO terms) and enzyme class (EC number) using DeepFRI (BSD-3, Flatiron Institute). GCN on contact maps + protein language model features for accurate zero-shot function prediction. Predicts: Molecular Function (MF), Biological Process (BP), Cellular Component (CC), and EC class. Identifies functional residues — which specific amino acids drive each predicted GO term. Critical quality control: ensures designed proteins have the intended functional annotation. Runs on ECS Fargate (CPU). Fast: 1–3 minutes per protein.	proteinInput, ontology, identifyFunctionalResidues
Protein Stability Prediction	protein-stability-prediction	—	Predict thermodynamic stability change (ΔΔG) for protein point mutations using ThermoMPNN (UNC). ThermoMPNN-D extends prediction to double mutants. Fast CPU inference: screens hundreds of variants in under a minute. ΔΔG < 0 kcal/mol = stabilising mutation; ΔΔG > 0 = destabilising. Critical gatekeeper: filter unstable designs before committing to expensive GPU runs or wet lab synthesis. License: CC BY-ND 4.0 (allows commercial use and self-hosting; prohibits modifying the model itself). Runs on ECS Fargate (CPU). Fast: <1 minute for single scan.	proteinPDB, mutations, scanMode, includeDoubles
STRING Database Tools	string-db	—	Comprehensive plugin for protein-protein interaction analysis using STRING database	tool_type, identifiers, species, required_score, limit
UniProt Database Tools	uniprot-toolkit	—	Comprehensive plugin for UniProt protein database operations including search, fetch, and ID mapping	tool_type, query, ids, organism, format
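The Molecular Dynamics Simulation row above quotes a throughput of roughly 100 to 300 ns/day and says 50 to 100 ns is enough for stability validation. The arithmetic below turns those two ranges into a wall-clock estimate. It is a back-of-the-envelope sketch only: it ignores queueing, system preparation, and Spot interruptions, and the helper name `wall_clock_hours` is made up for this example.

```python
def wall_clock_hours(simulated_ns: float, ns_per_day: float) -> float:
    """Hours needed to simulate `simulated_ns` at a throughput of `ns_per_day`."""
    return simulated_ns * 24 / ns_per_day


# Figures quoted in the Molecular Dynamics Simulation row.
fastest = wall_clock_hours(simulated_ns=50, ns_per_day=300)   # shortest run, best throughput
slowest = wall_clock_hours(simulated_ns=100, ns_per_day=100)  # longest run, worst throughput
print(f"Expected run time: {fastest:.0f} to {slowest:.0f} hours")
# prints "Expected run time: 4 to 24 hours"
```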

Bowtie2 (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Bowtie2 - Fast Read Alignment	bowtie2-toolkit	—	Ultrafast and memory-efficient tool for aligning sequencing reads to large reference genomes. Supports both end-to-end and local alignment modes with comprehensive parameter control for various sequencing platforms and read types.	command, referenceGenome, inputReads, outputFile, outputFormat

BWA (1 database)

Database	Tool ID	Scale	Description	Key Parameters
BWA - Burrows-Wheeler Aligner	bwa-toolkit	—	Fast and memory-efficient alignment tool for short DNA sequencing reads against large reference genomes using Burrows-Wheeler Transform. Supports multiple alignment algorithms and paired-end sequencing data with comprehensive parameter control.	command, referenceGenome, inputReads, outputFile, threads

HISAT2 (1 database)

Database	Tool ID	Scale	Description	Key Parameters
HISAT2 - Hierarchical Indexing for Spliced Alignment of Transcripts 2	hisat2-toolkit	—	Fast and sensitive splice-aware alignment of RNA-seq reads to a reference genome using hierarchical graph-based indexing.	command, indexPrefix, referenceGenome, inputReads, outputFile

3D Protein Structures (1 database)

Database	Tool ID	Scale	Description	Key Parameters
AlphaFold 3D Protein Structure Database	alphafold-db	—	AI-predicted 3D protein structure visualization and analysis. Access 214+ million protein structure predictions with confidence scores. This is the primary tool for 3D protein structures, molecular visualization, and structural analysis queries.	query_type, identifier, organism, confidence_threshold, format

Preprints (3 databases)

Database	Tool ID	Scale	Description	Key Parameters
arXiv Search	arxiv-search	—	Search arXiv preprint repository for cutting-edge research and early-stage scientific papers	query, category, max_results, sort_by, sort_order
bioRxiv Search	biorxiv-search	—	Search bioRxiv preprint repository for cutting-edge biological research and early-stage life science papers	query, max_results, start_date, end_date, cursor
medRxiv Search	medrxiv-search	—	Search medRxiv preprint repository for cutting-edge health sciences research and early-stage medical papers	query, max_results, start_date, end_date, cursor

Genomics (4 databases)

Database	Tool ID	Scale	Description	Key Parameters
Biograph Knowledge Graph	biograph	—	Query the Biograph Knowledge Graph for genes, proteins, diseases, variants, pathways, and drugs with their relationships	query_type, entity_type, entity_id, search_term, limit
Ensembl	ensembl	—	Comprehensive plugin for Ensembl genome browser and annotation database - providing access to vertebrate genomic data	toolType, species, division, id, symbol
Gene Ontology (GO) Toolkit	go-toolkit	—	Comprehensive toolkit for Gene Ontology analysis including term search, annotations, and enrichment	operation, query, go_id, identifiers, organism
NCBI-Gene	ncbi-gene	—	Search NCBI Gene database for gene information, locations, and annotations	query, organism, gene_type, chromosome, retmax

ChEMBL (1 database)

Database	Tool ID	Scale	Description	Key Parameters
ChEMBL Database Tools	chembl-toolkit	—	Comprehensive plugin for ChEMBL bioactivity database operations including compound search, target analysis, and drug discovery data	tool_type, query, chembl_id, smiles, target_type

Clinical Medicine (3 databases)

Database	Tool ID	Scale	Description	Key Parameters
Clinical Tables Search Service	clinical-tables-toolkit	—	Comprehensive search across 25+ clinical and genomics tables from National Library of Medicine (NLM) including conditions, drugs, genes, variants, ICD codes, and medical terminology	table, terms, count, offset, save_format
Clinical Trials Database Tools	clinical-trials-toolkit	—	Comprehensive clinical trials search using ClinicalTrials.gov API v2 with 400,000+ registered trials worldwide	tool_type, condition, intervention, phase, status
Medical Devices & Procedures Tools	medical-devices-toolkit	—	Medical device information from FDA (510k, PMA, recalls) and medical procedures/devices from SNOMED CT terminology	tool_type, query, limit, product_code, applicant

Citations (1 database)

Database	Tool ID	Scale	Description	Key Parameters
CrossRef Search	crossref-search	—	DOI resolution, citation tracking, and comprehensive bibliographic data retrieval using CrossRef API	query, doi, author, title, journal

Statistics (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Data Statistics - Statistical Analysis of Data Files	data-statistics	—	Run comprehensive statistical analysis on workspace data files. Computes descriptive statistics (mean, SD, median, CI), performs group comparisons (t-test, ANOVA, pairwise Tukey), correlation analysis, and data quality metrics (outliers, missing values). Supports multiple testing correction (Bonferroni, Benjamini-Hochberg). Use when the user needs statistical tests, group comparisons, or a numerical summary of their data.	fileKey, valueColumns, groupColumn, tests, multipleTestingCorrection

EBI (1 database)

Database	Tool ID	Scale	Description	Key Parameters
EMBL-EBI Job Dispatcher Tools	ebi-job-dispatcher	—	Comprehensive plugin for EMBL-EBI Job Dispatcher framework providing access to 50+ bioinformatics analysis tools including BLAST, Clustal, InterProScan, and more	tool_name, sequence, email, title, database

Cross Database Search (1 database)

Database	Tool ID	Scale	Description	Key Parameters
EBI Search Cross-Database Discovery Tools	ebi-search	—	Comprehensive plugin for EMBL-EBI Search providing unified cross-database search across 170+ biological datasets with faceted filtering and cross-reference exploration	query_type, query, domain, entry_id, facet_fields

BEDTools (1 database)

Database	Tool ID	Scale	Description	Key Parameters
BEDTools - Genome Interval Analysis	bedtools-toolkit	—	Powerful suite for genome interval manipulation and analysis. BEDTools provides comprehensive functionality for intersecting, merging, counting, complementing and many other operations on genomic intervals in BED, GFF/GTF, VCF and BAM file formats. Essential for comparative genomics and functional annotation analysis.	command, inputFile, secondFile, genomeFile, outputFile

Format Conversion (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Format Conversion Toolkit	format-conversion-toolkit	—	Comprehensive bioinformatics file format conversion toolkit. Converts between sequence formats (FASTA, FASTQ, GenBank, EMBL, AB1), alignment formats (SAM, BAM, CRAM, BED), variant formats (VCF, BCF), and annotation formats (GFF3, GTF, BED). Supports single files, multiple files, and automatic extraction of zip/tar archives.	conversionType, sourceFormat, targetFormat, inputFile, outputPrefix

GenomicRanges (1 database)

Database	Tool ID	Scale	Description	Key Parameters
GenomicRanges - Bioconductor Genomic Intervals	genomicranges-toolkit	—	Comprehensive genomic interval analysis using the Bioconductor GenomicRanges R package. Perform interval operations, overlap analysis, windowed analysis, and genomic annotation.	genomicRangesCommand, inputFile, outputFile, genomeBuild, chrPrefix

Picard (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Picard Tools - High-Throughput Sequencing Processing	picard-toolkit	—	Comprehensive suite of Java-based command-line utilities for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. Picard provides tools for data processing, quality control, metrics collection, and format conversion.	command, inputFile, outputFile, metricsFile, referenceSequence

Sequence Editing (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Sequence Editor Toolkit	sequence-editor-toolkit	—	Edit, trim, translate, and manipulate DNA/RNA/protein sequences. Supports trimming, extracting regions, reverse complement, translation, transcription, quality filtering (FASTQ), masking, find/replace, and more. Intelligently chooses instant execution for small files (<5MB) or background processing for large files.	inputFile, operation, outputFile, startPosition, endPosition

VCFtools (1 database)

Database	Tool ID	Scale	Description	Key Parameters
VCFtools Toolkit	vcftools-toolkit	—	Comprehensive VCF/BCF file manipulation and analysis toolkit. Provides filtering, statistics, format conversion, merging, annotation, and quality control capabilities for variant call format files.	command, inputFile, outputFile, outputFormat, referenceFile

File Reading (1 database)

Database	Tool ID	Scale	Description	Key Parameters
File Reader - Read and Parse Data Files	file-reader	—	Read and parse bioinformatics data files from workspace. Supports Word (.docx), Excel (.xlsx), CSV, PDF, FASTA (.fasta, .fa, .fna, .faa, .ffn, .frn, .fas, .fsa), and GenBank formats. Includes automatic OCR for scanned PDFs via AWS Textract. Provides multi-stage data intake: metadata only, structure preview, or full content. Use for reading experimental data, lab reports, protocols, understanding file structure, or extracting specific information.	fileKey, mode, maxRows, sheet, extractTables

File Search (1 database)

Database	Tool ID	Scale	Description	Key Parameters
File Search - Discover Files in Workspace	file-search	—	Use this tool when the user asks to find, search, list, show, or discover files in their workspace (e.g., 'find files with...', 'show me files that contain...', 'what files have...'). Searches by name, content keywords, metadata, format, topics, sections, and tags. Returns file metadata without reading full content. Essential for file discovery before using file-reader.	query, tags, format, keywords, topics

File Writer (1 database)

Database	Tool ID	Scale	Description	Key Parameters
File Writer - Save Reports and Documents to Workspace	file-writer	—	Save text content, reports, and documents as files in the user's workspace. Use after generating a report to save it as a markdown file that users can view and download. Supports plain text and markdown format.	filename, content, folder

GATK (1 database)

Database	Tool ID	Scale	Description	Key Parameters
GATK Toolkit	gatk-toolkit	—	Genome Analysis Toolkit (GATK) for variant discovery and genotyping in high-throughput sequencing data. Executes GATK tools on AWS infrastructure with automatic file management and process monitoring.	gatkTool, inputBam, referenceFasta, outputPrefix, intervals

Clinical Genomics (1 database)

Database	Tool ID	Scale	Description	Key Parameters
GWAS Catalog Disease Association Tools	gwas-catalog	—	Comprehensive plugin for NHGRI-EBI GWAS Catalog providing access to genome-wide association studies, SNP-trait associations, and disease genomics data	query_type, identifier, trait, variant_id, gene_name

InterPro (1 database)

Database	Tool ID	Scale	Description	Key Parameters
InterPro Database Tools	interpro-toolkit	—	Comprehensive plugin for InterPro protein functional analysis including domain prediction, family classification, and functional site identification	tool_type, query, accession, protein_id, sequence

HOMER (1 database)

Database	Tool ID	Scale	Description	Key Parameters
HOMER - Hypergeometric Optimization of Motif EnRichment	homer-toolkit	—	Comprehensive suite for motif discovery, ChIP-Seq analysis, and peak annotation. Identifies enriched sequence motifs and analyzes transcription factor binding sites.	homerCommand, inputFile, outputDir, genome, motifLength

NCBI (6 databases)

Database	Tool ID	Scale	Description	Key Parameters
NCBI-Assembly	ncbi-assembly	—	Search genome assembly information from NCBI Assembly database	query, retmax
NCBI-Nucleotide	ncbi-nucleotide	—	Search and retrieve nucleotide sequences (DNA/RNA) from NCBI Nucleotide database	query, ids, organism, sequence_type, molecular_type
NCBI-Protein	ncbi-protein	—	Search and retrieve protein sequences from NCBI Protein database	query, ids, organism, protein_class, molecular_weight
NCBI-SRA	ncbi-sra	—	Search Sequence Read Archive from NCBI SRA database	query, retmax
NCBI-Structure	ncbi-structure	—	Search protein structures from NCBI Structure database with PDB integration	query, ids, organism, structure_type, experimental_method
NCBI-Taxonomy	ncbi-taxonomy	—	Search taxonomic information and organism classification from NCBI Taxonomy database	query, ids, rank, division, genetic_code

BLAST (1 database)

Database	Tool ID	Scale	Description	Key Parameters
NCBI-BLAST	ncbi-blast	—	Perform sequence alignment and homology searches using NCBI BLAST	sequence, program, database, expect, hitlist_size

dbSNP (1 database)

Database	Tool ID	Scale	Description	Key Parameters
NCBI-dbSNP	ncbi-dbsnp	—	Search SNP variant information from NCBI dbSNP database	query, retmax, saveToWorkspace, outputFormat, outputFilename

Literature (1 database)

Database	Tool ID	Scale	Description	Key Parameters
NCBI-PubMed	ncbi-pubmed	—	Search PubMed literature database for scientific papers, articles, and reviews	query, retmax, sort, publication_type, date_range

Drug Discovery (1 database)

Database	Tool ID	Scale	Description	Key Parameters
OpenFDA Drug Database Tools	openfda-drug-toolkit	—	Comprehensive drug information from FDA, EMA, and WHO databases including approvals, adverse events, labeling, and recalls	tool_type, query, limit, patient_reaction, seriousness

Structural (1 database)

Database	Tool ID	Scale	Description	Key Parameters
PDB Experimental Structure Database	pdb-toolkit	—	Access experimentally determined protein structures from X-ray crystallography, NMR, and cryo-EM. Best for high-resolution experimental structures and validation data. Use AlphaFold for predicted structures.	operation, pdb_id, query, search_type, format

Pipeline (3 databases)

Database	Tool ID	Scale	Description	Key Parameters
List Pipelines - Discover Predefined Bioinformatics Pipelines	list-pipelines	—	List all available predefined bioinformatics pipelines. Returns pipeline names, descriptions, required inputs, and estimated duration. Use this BEFORE building a pipeline from scratch to check if a predefined pipeline already covers the user's need. Can filter by category (variant-calling, transcriptomics, genomics, epigenomics, quality-control, alignment, structural-biology).	category
Pipeline Status - Check Multi-Step Pipeline Progress	pipeline-status	—	Check the status and progress of a running multi-step bioinformatics pipeline. Returns overall pipeline status, individual step statuses, progress percentage, and output files when complete. Use after start-pipeline to monitor pipeline execution.	executionId
Start Pipeline - Launch Multi-Step Bioinformatics Pipeline	start-pipeline	—	Start a multi-step bioinformatics pipeline. Supports two modes: (1) pipelineId mode — pass a predefined pipeline id from list-pipelines and provide only the input parameters; the steps are loaded automatically from the registry. (2) Custom steps mode — provide an explicit steps[] array for one-off pipelines. Returns an executionId to track progress with pipeline-status tool.	pipelineId, name, description, steps, parameters
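The three pipeline tools above are meant to be used together: start a run, keep the returned `executionId`, then poll until the run finishes. The sketch below shows that loop under stated assumptions. The `status` field name and the terminal state names are borrowed from the values listed for `process-status` and may differ for pipelines. `run_tool` is a stand-in for whatever call you use to invoke a tool (for example `client.tools.run`), and the pipeline ID and `inputFile` parameter are hypothetical.

```python
import time
from typing import Callable, Dict

# Assumed terminal states, borrowed from the values listed for process-status.
TERMINAL_STATES = {"completed", "failed", "cancelled"}

ToolRunner = Callable[[str, Dict], Dict]


def run_pipeline(run_tool: ToolRunner, pipeline_id: str, parameters: Dict,
                 poll_seconds: float = 30.0, max_polls: int = 120) -> Dict:
    """Start a predefined pipeline and poll pipeline-status until it stops."""
    started = run_tool("start-pipeline", {"pipelineId": pipeline_id, "parameters": parameters})
    execution_id = started["executionId"]
    for _ in range(max_polls):
        status = run_tool("pipeline-status", {"executionId": execution_id})
        if status["status"] in TERMINAL_STATES:  # assumed response field
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"pipeline {execution_id} still running after {max_polls} polls")


# Stand-in runner so the sketch executes offline; replace it with a real client call.
_replies = iter(["running", "running", "completed"])
calls = []


def fake_run_tool(tool_id: str, tool_input: Dict) -> Dict:
    calls.append(tool_id)
    if tool_id == "start-pipeline":
        return {"executionId": "exec-demo"}
    return {"executionId": tool_input["executionId"], "status": next(_replies)}


final = run_pipeline(fake_run_tool, "example-pipeline-id", {"inputFile": "reads.fastq"}, poll_seconds=0)
print(final["status"])
```

Capping the number of polls keeps a stuck pipeline from blocking the caller indefinitely; the interval and cap shown here are arbitrary defaults.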

Process Management (2 databases)

Database	Tool ID	Scale	Description	Key Parameters
Process Results - Get Process Output Files	process-results	—	Retrieve output files and results from a completed bioinformatics process. Use when a process status shows 'completed' and you need to access the generated files (e.g., PDB structures, VCF variants, analysis reports). Returns file paths, URLs, or file content for downstream analysis.	processId, includeContent
Process Status - Check Running Process Status	process-status	—	Check the status of a long-running bioinformatics process by process ID. Use when the user asks 'is my process done?', 'what's the status of process X?', or wants to check if a process has completed. Returns process status (running, completed, failed, cancelled), progress, and error messages if applicable.	processId

FastQC (1 database)

Database	Tool ID	Scale	Description	Key Parameters
FastQC - A quality control tool for high throughput sequence data	fastqc-toolkit	—	Quality control tool for high throughput sequencing data providing comprehensive quality assessment reports with summary graphs and statistics for raw sequence data.	command, inputFiles, outputDirectory, quiet, nogroup

FMLRC (1 database)

Database	Tool ID	Scale	Description	Key Parameters
FMLRC - FM-index Long Read Error Correction	fmlrc-toolkit	—	Long-read error correction tool using FM-index based methods to correct errors in noisy long reads using high-quality short reads as reference.	fmlrcCommand, inputFile, outputFile, shortReadsFile, kmerSize

Trimmomatic (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Trimmomatic Toolkit	trimmomatic-toolkit	—	Comprehensive read trimming and adapter removal toolkit using Trimmomatic v0.39. Performs quality filtering, adapter trimming, and read preprocessing for both single-end and paired-end sequencing data.	command, inputFile, outputFile, outputFileR2, outputFileUnpairedR1

SAMtools (1 database)

Database	Tool ID	Scale	Description	Key Parameters
SAMtools Toolkit	samtools-toolkit	—	Comprehensive SAM/BAM/CRAM file manipulation toolkit using SAMtools v1.13. Provides format conversion, indexing, statistics, editing, and viewing capabilities for sequencing alignment data.	samtoolsCommand, inputFile, outputFormat, outputFile, referenceFile

Protein (1 database)

Database	Tool ID	Scale	Description	Key Parameters
SmartMatch Protein Search	smartsmatch-protein	—	Fast protein similarity search using AI-powered vector embeddings. Find similar proteins by sequence with subsecond response times. Alternative to BLAST for rapid protein identification.	sequence, limit, threshold, include_metadata

Tabix (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Tabix Query	tabix-query	—	Fast indexed genomic file queries using Tabix/TBI indexes. Query specific regions from compressed VCF, BED, GFF files without full extraction. Runs locally with HTTP range request support for efficient S3 access.	command, inputFile, indexFile, format, chromosome

Web (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Tavily Web Search	tavily-search	—	Search the web for scientific literature and bioinformatics information using Tavily API	query, maxResults, searchDepth

UCSC (1 database)

Database	Tool ID	Scale	Description	Key Parameters
UCSC Genome Browser	ucsc-genome-browser	—	Access genome annotation and visualization data from the UCSC Genome Browser database	operation, genome, chrom, start, end

Patents (1 database)

Database	Tool ID	Scale	Description	Key Parameters
USPTO Patents Search	uspto-patents-search	—	Search US patent database via PatentsView API for intellectual property research and prior art analysis	query, country_code, status, max_results, filing_date_start

Visualization (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Visualization Generator - Create Charts and Plots	visualization-generator	—	Generate professional charts, plots, and diagrams for scientific reports. Supports standard charts (bar, line, scatter, pie), specialized bioinformatics plots (volcano, MA, Manhattan, heatmap), workflow diagrams, and 3D molecular structures. Visualizations are saved to workspace and can be included in reports.	type, data, title, xLabel, yLabel

Compression (1 database)

Database	Tool ID	Scale	Description	Key Parameters
Zip Toolkit	zip-toolkit	—	Comprehensive file compression and decompression toolkit supporting ZIP, GZIP, BZIP2, XZ, 7-Zip, and TAR formats. Handles bioinformatics files that are commonly distributed in compressed formats.	command, inputFile, inputFiles, compressionType, compressionLevel

Discover Databases at Runtime

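from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")

# Fetch every registered tool; each entry exposes id, category, and description.
tools = client.tools.list()

# Categories to report, in display order.
db_categories = [
    "Sequence Search", "Protein & Structure Databases",
    "Genomics & Variant Databases", "Pathway & Ontology",
    "Clinical & Drug Databases", "Literature", "Patents",
]
for category in db_categories:
    # Keep only the tools whose category matches this label exactly.
    in_cat = [t for t in tools if t.category == category]
    if in_cat:  # skip categories with no matching tools
        print(f"\n{category}:")
        for t in in_cat:
            # Tool ID padded to 35 characters, then the first 50 characters of the description.
            print(f"  {t.id:35s} {t.description[:50]}")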
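If the category labels are not known in advance, one alternative is to group tools by whatever label each one reports. The sketch below uses `SimpleNamespace` stand-ins in place of the objects returned by `client.tools.list()` so it runs offline. The tool IDs are taken from the reference table, but the exact category strings the API returns are an assumption, and `group_by_category` is a helper invented for this example.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_category(tools):
    """Map each category label to the sorted IDs of the tools that report it."""
    grouped = defaultdict(list)
    for tool in tools:
        grouped[tool.category].append(tool.id)
    # Sort categories and IDs so the listing is stable between runs.
    return {category: sorted(ids) for category, ids in sorted(grouped.items())}


# Stand-ins for the objects returned by client.tools.list(); category strings are assumed.
sample_tools = [
    SimpleNamespace(id="biorxiv-search", category="Preprints"),
    SimpleNamespace(id="ncbi-blast", category="Blast"),
    SimpleNamespace(id="arxiv-search", category="Preprints"),
]

for category, ids in group_by_category(sample_tools).items():
    print(f"{category}: {', '.join(ids)}")
```

Grouping on the reported label avoids silently printing nothing when a hardcoded category name does not match.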

Query Examples

# UniProt — protein annotation
uniprot = client.tools.run(
    tool_id="uniprot-toolkit",
    input={"query": "BRCA1_HUMAN", "format": "json"},
)
print(uniprot["function"], uniprot["subcellular_location"])

# Ensembl — gene coordinates
ensembl = client.tools.run(
    tool_id="ensembl",
    input={"gene_id": "ENSG00000012048", "species": "human", "features": ["variants", "regulation"]},
)

# ChEMBL — drug-target bioactivity
chembl = client.tools.run(
    tool_id="chembl-toolkit",
    input={"target": "EGFR", "activity_type": "IC50", "limit": 20},
)
for compound in chembl["compounds"][:5]:
    print(f"{compound['molecule_chembl_id']}  IC50={compound['standard_value']} nM")

# ClinicalTrials.gov — open trials
trials = client.tools.run(
    tool_id="clinical-trials-toolkit",
    input={"condition": "breast cancer", "intervention": "BRCA1", "status": "RECRUITING"},
)
for trial in trials["studies"][:5]:
    print(f"{trial['nct_id']}: {trial['brief_title']}")
