
Available Tools

Computational bioinformatics tools available in SmartsBio — AI protein design, sequence alignment, quality control, variant analysis, file processing, and more. Tools can be called directly via the API or invoked automatically by the agent.

Looking for database tools? Sequence search, protein databases, genomics, pathways, clinical data, literature, and patents are covered on the Available Databases page.
Sync

Results returned immediately via POST /v1/tools/:id/run. Best for fast, lightweight operations.

Pipeline

Async execution via POST /v1/pipelines. Get a pipeline ID immediately, then stream or poll. Used for compute-heavy tools.

Both

Supports either mode. A single long-running tool submitted to /v1/pipelines is valid — no multi-step chain required.
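
For callers not using the Python SDK, here is a standard-library sketch that builds the raw HTTP request for each mode. It does not send anything. The base URL, the Bearer-token `Authorization` header, and the JSON body field names (`input`, `tool_id`, `workspace_id`, borrowed from the SDK keyword arguments) are assumptions for illustration, not the documented wire format. A "Both" tool can go through either branch.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical base URL; substitute the real API host for your account.
BASE_URL = "https://api.example.com"


def build_tool_request(tool_id: str, payload: Dict[str, Any], mode: str,
                       api_key: str, workspace_id: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a POST request for a tool in sync or pipeline mode.

    Assumed wire format: Bearer auth and JSON bodies that mirror the SDK's
    keyword arguments (input / tool_id / workspace_id).
    """
    if mode == "sync":
        url = f"{BASE_URL}/v1/tools/{tool_id}/run"
        body: Dict[str, Any] = {"input": payload}
    elif mode == "pipeline":
        if workspace_id is None:
            raise ValueError("pipeline mode needs a workspace_id")
        url = f"{BASE_URL}/v1/pipelines"
        body = {"tool_id": tool_id, "workspace_id": workspace_id, "input": payload}
    else:
        raise ValueError(f"mode must be 'sync' or 'pipeline', got {mode!r}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` would send it. For a "Both" tool, choosing `"pipeline"` when the job is expected to run long avoids holding a synchronous connection open.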

Tool Reference

71 tools across 46 categories. Tool IDs are stable across versions.

Structural Biology (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `molecular-docking` | Molecular Docking | Blind molecular docking of small molecules onto protein targets using DiffDock-L (MIT, MIT/Stanford). Diffusion-based approach: automatically discovers binding pockets without prior knowledge. Success rate: 43% (RMSD < 2 Å); DockGen generalization benchmark: 22.6%, state-of-the-art for blind docking. Generates multiple ranked docking pose candidates with confidence scores. Complements Chai-1 (co-folding from scratch); DiffDock-L docks onto an existing structure. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Use for drug discovery: identify binding modes and rank ligand candidates. | Pipeline |
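
The 2 Å success criterion above is a ligand RMSD between the predicted pose and the experimental pose. A minimal sketch, assuming both poses list the same heavy atoms in the same order and already share the receptor's coordinate frame (as in blind docking onto a fixed protein). Benchmark evaluations usually also correct for ligand symmetry, which this plain version does not. The function names are illustrative, not part of the SmartsBio API.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms, in the input units (Å).

    No superposition and no symmetry correction: atoms must be in the same order
    and both poses must share the receptor's coordinate frame.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference need the same, non-zero number of atoms")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pose, reference)
    )
    return math.sqrt(squared / len(pose))


def success_rate(top1_rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of complexes whose top-ranked pose lies within `cutoff` Å RMSD."""
    return sum(r < cutoff for r in top1_rmsds) / len(top1_rmsds)
```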

Proteomics (11 tools)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `molecular-dynamics` | Molecular Dynamics Simulation | Run molecular dynamics (MD) simulation to validate protein stability using OpenMM (MIT/LGPL, Stanford). Simulates the protein in explicit water + ions at physiological conditions (300 K, 1 atm, 150 mM NaCl). Validates dynamic stability: ThermoMPNN checks static ΔΔG, while OpenMM checks real behaviour over nanoseconds. OpenMM 8 supports ML potentials alongside physics-based force fields. Outputs: trajectory file, RMSD stability plot, per-residue flexibility (RMSF), and average structure PDB. Runtime: ~100–300 ns/day on g5.xlarge (A10G); 50–100 ns is sufficient for stability validation. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Cost: ~$3–10 per job. | Pipeline |
| `protein-complex-prediction` | Protein Complex Prediction | Predict the 3D structure of multi-molecule complexes using Chai-1 (Apache 2.0, Chai Discovery). Co-folds proteins, small molecules, nucleic acids, glycans, and ions simultaneously in one model. Protein-ligand: 77% success rate (vs AlphaFold3 76%). Protein-protein multimers: 75.1% DockQ (beats AlphaFold-Multimer 67.7%). Antibody-antigen: 47.9% DockQ in single-sequence mode. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Use when you need multi-molecule complex structure prediction without an AlphaFold3 licence. | Pipeline |
| `protein-sequence-codesign` | Protein Sequence Co-design | Generate protein sequence and 3D structure simultaneously using the DPLM-2 diffusion model. Performs true joint sequence+structure co-design in a single diffusion pass, with no separate inverse-folding step. Supports unconditional generation and motif-constrained scaffolding (design a protein around a fixed active site). The 650M-parameter DPLM-2 model outperforms 3B-scale baselines (ICLR 2025 / ICML 2025 Spotlight). Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Use for de novo protein generation or motif-anchored scaffold design. | Pipeline |
| `protein-sequence-generation` | Protein Sequence Generation | Generate diverse protein sequences within a protein family using ProGen2 (Salesforce, BSD-3). Trained on 1 billion protein sequences, ProGen2 produces natural-looking sequences that respect the evolutionary grammar of the target protein family. Provide a family name (e.g. "GFP", "serine protease") or a seed sequence, and the model generates diverse variants with perplexity scores for quality ranking. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). Ideal as a fast pre-screening step before expensive GPU design runs. | Pipeline |
| `protein-sequence-infilling` | Protein Sequence Infilling | Design protein scaffolds that preserve a required structural motif using DPLM-2 inverse folding. Given a PDB structure with a fixed functional site (e.g. catalytic triad, disulfide bonds, binding loop), generates diverse scaffold sequences that maintain the exact geometry of the fixed residues. Useful for enzyme engineering and epitope grafting where the active-site geometry is non-negotiable. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). | Pipeline |
| `protein-variant-scoring` | Protein Variant Scoring | Zero-shot fitness scoring of protein variants using ProGen2 log-likelihood. Ranks point mutations (e.g. A123G), insertions, deletions, and combinatorial variants by their log-likelihood delta relative to the wildtype; no experimental training data required. Handles both substitutions and indels, unlike most tools, which only score substitutions. Ideal for pre-screening thousands of variants cheaply before committing to wet-lab synthesis. Runs on GPU-accelerated AWS Batch (Spot g5.xlarge). BSD-3 license (Salesforce). | Pipeline |
| `antibody-humanization` | Antibody Humanization | Humanize antibody/nanobody sequences and score immunogenicity using BioPhi (MIT, Merck). Two complementary methods: Sapiens, deep-learning humanization of antibody variable regions designed to preserve binding specificity; and OASis, 9-mer humanness scoring against the Observed Antibody Space, which correlates with clinical immunogenicity. Critical downstream step for all Boltz nanobody designs before therapeutic development: a nanobody with excellent binding but a poor OASis humanness score carries a high immunogenicity risk in the clinic. Runs on ECS Fargate (CPU). Fast: 1–2 minutes per antibody. | Pipeline |
| `protein-function-annotation` | Protein Function Annotation | Predict protein function (GO terms) and enzyme class (EC number) using DeepFRI (BSD-3, Flatiron Institute). A graph convolutional network (GCN) over contact maps, combined with protein language model features, gives structure-aware function prediction. Predicts Molecular Function (MF), Biological Process (BP), Cellular Component (CC), and EC class. Identifies functional residues: the specific amino acids that drive each predicted GO term. Critical quality control: checks that designed proteins have the intended functional annotation. Runs on ECS Fargate (CPU). Fast: 1–3 minutes per protein. | Pipeline |
| `protein-stability-prediction` | Protein Stability Prediction | Predict thermodynamic stability change (ΔΔG) for protein point mutations using ThermoMPNN (UNC). ThermoMPNN-D extends prediction to double mutants. Fast CPU inference: screens hundreds of variants in under a minute. ΔΔG < 0 kcal/mol = stabilising mutation; ΔΔG > 0 = destabilising. Critical gatekeeper: filter unstable designs before committing to expensive GPU runs or wet-lab synthesis. License: CC BY-ND 4.0 (allows commercial use and self-hosting; prohibits distributing modified versions of the model). Runs on ECS Fargate (CPU). Fast: < 1 minute for a single scan. | Pipeline |
| `string-db` | STRING Database Tools | Comprehensive plugin for protein-protein interaction analysis using the STRING database | Sync |
| `uniprot-toolkit` | UniProt Database Tools | Comprehensive plugin for UniProt protein database operations, including search, fetch, and ID mapping | Sync |
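
Variant inputs for `protein-variant-scoring` and `protein-stability-prediction` use the A123G notation (wildtype residue, 1-based position, mutant residue), and the stability row above defines negative ΔΔG as stabilising. This sketch applies such a mutation with a wildtype check and splits a ΔΔG result set by sign. The helper names and the plain `{mutation: ddg}` dictionary are assumptions for illustration, not the tools' actual request or response schema.

```python
import re
from typing import Dict, List, Tuple

# One-letter wildtype residue, 1-based position, one-letter mutant residue (e.g. A123G).
_POINT_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutation(wildtype: str, mutation: str) -> str:
    """Return the mutant sequence after checking the stated wildtype residue."""
    match = _POINT_MUTATION.match(mutation)
    if not match:
        raise ValueError(f"not a point mutation: {mutation!r}")
    wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(wildtype):
        raise ValueError(f"position {pos} is outside a sequence of length {len(wildtype)}")
    if wildtype[pos - 1] != wt:
        raise ValueError(f"{mutation}: expected {wt} at position {pos}, found {wildtype[pos - 1]}")
    return wildtype[:pos - 1] + mut + wildtype[pos:]


def split_by_ddg(ddg: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split mutations by sign: ddG < 0 stabilising (most negative first),
    ddG > 0 destabilising (most positive first); ddG == 0 goes in neither list."""
    stabilising = sorted((m for m, v in ddg.items() if v < 0), key=ddg.__getitem__)
    destabilising = sorted((m for m, v in ddg.items() if v > 0),
                           key=ddg.__getitem__, reverse=True)
    return stabilising, destabilising
```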

Bowtie2 (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `bowtie2-toolkit` | Bowtie2 - Fast Read Alignment | Ultrafast and memory-efficient tool for aligning sequencing reads to large reference genomes. Supports both end-to-end and local alignment modes with comprehensive parameter control for various sequencing platforms and read types. | Pipeline |

Bwa (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `bwa-toolkit` | BWA - Burrows-Wheeler Aligner | Fast and memory-efficient alignment tool for short DNA sequencing reads against large reference genomes using the Burrows-Wheeler Transform. Supports multiple alignment algorithms and paired-end sequencing data with comprehensive parameter control. | Pipeline |

Hisat2 (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `hisat2-toolkit` | HISAT2 - Hierarchical Indexing for Spliced Alignment of Transcripts 2 | Fast and sensitive splice-aware alignment of RNA-seq reads to a reference genome using hierarchical graph-based indexing. | Pipeline |

3d Protein Structures (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `alphafold-db` | AlphaFold 3D Protein Structure Database [PRIMARY FOR 3D STRUCTURES] | Primary tool for 3D protein structure visualization and analysis: access AI-predicted 3D structures from the AlphaFold Database, covering 214+ million protein structure predictions with confidence scores. Use this tool for 3D protein structures, molecular visualization, and structural analysis queries. | Sync |
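
AlphaFold DB model files store the per-residue pLDDT confidence score (0 to 100) in the B-factor column, and the database groups scores into four bands: very high (above 90), confident (70 to 90), low (50 to 70) and very low (below 50). A minimal sketch for summarizing a downloaded model by band, reading CA atoms from fixed PDB columns. The function names are hypothetical, and exact handling of the 50/70/90 boundaries is this sketch's choice.

```python
from collections import Counter
from typing import List, Tuple


def plddt_band(plddt: float) -> str:
    """Map a pLDDT score (0-100) to an AlphaFold DB confidence band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def ca_plddt(pdb_text: str) -> List[Tuple[int, float]]:
    """(residue number, pLDDT) for each CA atom, read from fixed PDB columns.

    Columns 23-26 hold the residue number and 61-66 the B-factor, which
    AlphaFold DB model files use for pLDDT.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append((int(line[22:26]), float(line[60:66])))
    return scores


def band_summary(pdb_text: str) -> Counter:
    """Count residues per confidence band."""
    return Counter(plddt_band(score) for _, score in ca_plddt(pdb_text))
```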

Preprints (3 tools)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `arxiv-search` | arXiv Search | Search the arXiv preprint repository for cutting-edge research and early-stage scientific papers | Sync |
| `biorxiv-search` | bioRxiv Search | Search the bioRxiv preprint repository for cutting-edge biological research and early-stage life science papers | Sync |
| `medrxiv-search` | medRxiv Search | Search the medRxiv preprint repository for cutting-edge health sciences research and early-stage medical papers | Sync |

Genomics (4 tools)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `biograph` | Biograph Knowledge Graph | Query the Biograph Knowledge Graph for genes, proteins, diseases, variants, pathways, and drugs, along with their relationships | Sync |
| `ensembl` | Ensembl | Comprehensive plugin for the Ensembl genome browser and annotation database, providing access to vertebrate genomic data | Sync |
| `go-toolkit` | Gene Ontology (GO) Toolkit | Comprehensive toolkit for Gene Ontology analysis, including term search, annotations, and enrichment | Sync |
| `ncbi-gene` | NCBI-Gene | Search the NCBI Gene database for gene information, locations, and annotations | Sync |

Chembl (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `chembl-toolkit` | ChEMBL Database Tools | Comprehensive plugin for ChEMBL bioactivity database operations, including compound search, target analysis, and drug discovery data | Sync |

Clinical Medicine (3 tools)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `clinical-tables-toolkit` | Clinical Tables Search Service | Comprehensive search across 25+ clinical and genomics tables from the National Library of Medicine (NLM), including conditions, drugs, genes, variants, ICD codes, and medical terminology | Sync |
| `clinical-trials-toolkit` | Clinical Trials Database Tools | Comprehensive clinical trials search using the ClinicalTrials.gov API v2, covering 400,000+ registered trials worldwide | Sync |
| `medical-devices-toolkit` | Medical Devices & Procedures Tools | Medical device information from the FDA (510(k), PMA, recalls) and medical procedures/devices from SNOMED CT terminology | Sync |

Citations (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `crossref-search` | CrossRef Search | DOI resolution, citation tracking, and comprehensive bibliographic data retrieval using the CrossRef API | Sync |

Statistics (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `data-statistics` | Data Statistics - Statistical Analysis of Data Files | Run comprehensive statistical analysis on workspace data files. Computes descriptive statistics (mean, SD, median, CI), performs group comparisons (t-test, ANOVA, pairwise Tukey), correlation analysis, and data quality metrics (outliers, missing values). Supports multiple testing correction (Bonferroni, Benjamini-Hochberg). Use when the user needs statistical tests, group comparisons, or a numerical summary of their data. | Sync |
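
To make the two multiple-testing options concrete, here is a small standard-library sketch of the textbook Bonferroni and Benjamini-Hochberg adjustments. It illustrates the procedures, not the `data-statistics` tool's internal code; adjusted values come back in the original input order.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Bonferroni controls the family-wise error rate and is more conservative; Benjamini-Hochberg controls the false discovery rate, which usually retains more hits when many tests are run.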

Ebi (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `ebi-job-dispatcher` | EMBL-EBI Job Dispatcher Tools | Comprehensive plugin for the EMBL-EBI Job Dispatcher framework, providing access to 50+ bioinformatics analysis tools, including BLAST, Clustal, InterProScan, and more | Pipeline |

Cross Database Search (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `ebi-search` | EBI Search Cross-Database Discovery Tools | Comprehensive plugin for EMBL-EBI Search, providing unified cross-database search across 170+ biological datasets with faceted filtering and cross-reference exploration | Sync |

Bedtools (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `bedtools-toolkit` | BEDTools - Genome Interval Analysis | Powerful suite for genome interval manipulation and analysis. BEDTools provides comprehensive functionality for intersecting, merging, counting, complementing and many other operations on genomic intervals in BED, GFF/GTF, VCF and BAM file formats. Essential for comparative genomics and functional annotation analysis. | Pipeline |
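
BED intervals are 0-based and half-open, so `chr1 100 200` covers bases 101 to 200 in 1-based terms, and intervals that merely touch do not overlap. This sketch reports the shared sub-interval of each overlapping pair, which is what `bedtools intersect -a A -b B` prints by default. It is a naive quadratic loop for illustration only; the real tool uses far more efficient algorithms on large files.

```python
from typing import List, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Shared sub-interval for every overlapping (a, b) pair, in a-then-b order."""
    shared = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # half-open: intervals that only touch do not overlap
                shared.append((chrom_a, start, end))
    return shared
```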

Format Conversion (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `format-conversion-toolkit` | Format Conversion Toolkit | Comprehensive bioinformatics file format conversion toolkit. Converts between sequence formats (FASTA, FASTQ, GenBank, EMBL, AB1), alignment formats (SAM, BAM, CRAM, BED), variant formats (VCF, BCF), and annotation formats (GFF3, GTF, BED). Supports single files, multiple files, and automatic extraction of zip/tar archives. | Pipeline |
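
As an illustration of the simplest conversion listed (FASTQ to FASTA), here is a standard-library sketch. It assumes the common four-line FASTQ layout with unwrapped sequence and quality lines, drops the qualities, and wraps FASTA output at 60 columns by default; the toolkit itself handles many more formats and edge cases.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield header[1:].rstrip("\r\n"), seq, qual


def fastq_to_fasta(src: TextIO, dst: TextIO, width: int = 60) -> int:
    """Write FASTA records to dst, wrapping sequences at `width`; return the record count."""
    count = 0
    for header, seq, _quality in read_fastq(src):
        dst.write(f">{header}\n")
        for i in range(0, len(seq), width):
            dst.write(seq[i:i + width] + "\n")
        count += 1
    return count


def convert_text(fastq_text: str, width: int = 60) -> str:
    """Convenience wrapper for in-memory strings."""
    out = io.StringIO()
    fastq_to_fasta(io.StringIO(fastq_text), out, width)
    return out.getvalue()
```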

Genomicranges (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `genomicranges-toolkit` | GenomicRanges - Bioconductor Genomic Intervals | Comprehensive genomic interval analysis using the Bioconductor GenomicRanges R package. Perform interval operations, overlap analysis, windowed analysis, and genomic annotation. | Pipeline |

Picard (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `picard-toolkit` | Picard Tools - High-Throughput Sequencing Processing | Comprehensive suite of Java-based command-line utilities for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. Picard provides tools for data processing, quality control, metrics collection, and format conversion. | Pipeline |

Sequence Editing (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `sequence-editor-toolkit` | Sequence Editor Toolkit | Edit, trim, translate, and manipulate DNA/RNA/protein sequences. Supports trimming, extracting regions, reverse complement, translation, transcription, quality filtering (FASTQ), masking, find/replace, and more. Intelligently chooses instant execution for small files (< 5 MB) or background processing for large files. | Sync |
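
Two of the operations listed, reverse complement and translation, are short enough to sketch with the standard library, together with the size-based choice between instant and background execution. The codon table is the standard genetic code. Reading "5 MB" as 5 × 1024 × 1024 bytes, mapping U to A on complementing, and emitting `X` for codons with ambiguous bases are this sketch's assumptions, not documented behaviour of the toolkit.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Complement map for DNA/RNA input; U is complemented to A.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")

SIZE_THRESHOLD = 5 * 1024 * 1024  # assumption: "5MB" read as 5 MiB


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate reading frame 1; a trailing partial codon is ignored, ambiguous codons give 'X'."""
    dna = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def execution_mode(size_bytes: int) -> str:
    """Pick instant execution for small inputs and background processing otherwise."""
    return "instant" if size_bytes < SIZE_THRESHOLD else "background"
```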

Vcftools (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `vcftools-toolkit` | VCFtools Toolkit | Comprehensive VCF/BCF file manipulation and analysis toolkit. Provides filtering, statistics, format conversion, merging, annotation, and quality control capabilities for variant call format files. | Pipeline |
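
A typical VCFtools step keeps only sites whose QUAL score clears a threshold (VCFtools exposes this as `--minQ`). This sketch shows the same idea on uncompressed VCF text: header lines pass through unchanged and data lines are kept when column 6 (QUAL) meets the cutoff. Dropping records with a missing QUAL (`.`) and using `>=` are assumptions of this sketch; check VCFtools' own semantics before expecting identical output.

```python
from typing import Iterable, Iterator


def filter_by_qual(vcf_lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Yield header lines unchanged and data lines with QUAL >= min_qual.

    Records with a missing QUAL ('.') are dropped (an assumption of this sketch).
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            raise ValueError(f"VCF data line has {len(fields)} columns, expected at least 8")
        qual = fields[5]  # columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if qual != "." and float(qual) >= min_qual:
            yield line
```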

File Reading (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `file-reader` | File Reader - Read and Parse Data Files | Read and parse bioinformatics data files from workspace. Supports Word (.docx), Excel (.xlsx), CSV, PDF, FASTA (.fasta, .fa, .fna, .faa, .ffn, .frn, .fas, .fsa), and GenBank formats. Includes automatic OCR for scanned PDFs via AWS Textract. Provides multi-stage data intake: metadata only, structure preview, or full content. Use for reading experimental data, lab reports, protocols, understanding file structure, or extracting specific information. | Sync |

File Search (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `file-search` | File Search - Discover Files in Workspace | Use this tool when the user asks to find, search, list, show, or discover files in their workspace (e.g., 'find files with...', 'show me files that contain...', 'what files have...'). Searches by name, content keywords, metadata, format, topics, sections, and tags. Returns file metadata without reading full content. Essential for file discovery before using file-reader. | Sync |

File Writer (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `file-writer` | File Writer - Save Reports and Documents to Workspace | Save text content, reports, and documents as files in the user's workspace. Use after generating a report to save it as a markdown file that users can view and download. Supports plain text and markdown format. | Sync |

Gatk (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `gatk-toolkit` | GATK Toolkit | Genome Analysis Toolkit (GATK) for variant discovery and genotyping in high-throughput sequencing data. Executes GATK tools on AWS infrastructure with automatic file management and process monitoring. | Pipeline |

Clinical Genomics (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `gwas-catalog` | GWAS Catalog Disease Association Tools | Comprehensive plugin for the NHGRI-EBI GWAS Catalog, providing access to genome-wide association studies, SNP-trait associations, and disease genomics data | Sync |

Interpro (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `interpro-toolkit` | InterPro Database Tools | Comprehensive plugin for InterPro protein functional analysis, including domain prediction, family classification, and functional site identification | Sync |

Homer (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `homer-toolkit` | HOMER - Hypergeometric Optimization of Motif EnRichment | Comprehensive suite for motif discovery, ChIP-Seq analysis, and peak annotation. Identifies enriched sequence motifs and analyzes transcription factor binding sites. | Pipeline |
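
HOMER's name refers to hypergeometric enrichment testing: given target and background sequence sets, how surprising is the number of target sequences that contain a motif? A minimal sketch of that upper-tail hypergeometric p-value using exact integer arithmetic. HOMER itself adds motif optimization, scanning, and other statistics (such as a binomial test), so treat this as the core idea only; the function name is hypothetical.

```python
from math import comb


def motif_enrichment_p(target_hits: int, target_total: int,
                       background_hits: int, background_total: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= target_hits).

    Population = all target + background sequences; "successes" = sequences
    containing the motif; draws = the target set.
    """
    if not (0 <= target_hits <= target_total and 0 <= background_hits <= background_total):
        raise ValueError("hit counts must lie between 0 and their set sizes")
    population = target_total + background_total
    successes = target_hits + background_hits
    draws = target_total
    upper = min(draws, successes)
    # Exact integer numerator; a single division at the end avoids float underflow in the terms.
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(target_hits, upper + 1))
    return tail / comb(population, draws)
```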

Ncbi (6 tools)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `ncbi-assembly` | NCBI-Assembly | Search genome assembly information from the NCBI Assembly database | Sync |
| `ncbi-nucleotide` | NCBI-Nucleotide | Search and retrieve nucleotide sequences (DNA/RNA) from the NCBI Nucleotide database | Sync |
| `ncbi-protein` | NCBI-Protein | Search and retrieve protein sequences from the NCBI Protein database | Sync |
| `ncbi-sra` | NCBI-SRA | Search the NCBI Sequence Read Archive (SRA) | Sync |
| `ncbi-structure` | NCBI-Structure | Search protein structures from the NCBI Structure database with PDB integration | Sync |
| `ncbi-taxonomy` | NCBI-Taxonomy | Search taxonomic information and organism classification from the NCBI Taxonomy database | Sync |

Blast (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `ncbi-blast` | NCBI-BLAST | Perform sequence alignment and homology searches using NCBI BLAST | Pipeline |

Dbsnp (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `ncbi-dbsnp` | NCBI-dbSNP | Search SNP variant information from the NCBI dbSNP database | Sync |

Literature (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `ncbi-pubmed` | NCBI-PubMed | Search the PubMed literature database for scientific papers, articles, and reviews | Sync |

Drug Discovery (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `openfda-drug-toolkit` | OpenFDA Drug Database Tools | Comprehensive drug information from FDA, EMA, and WHO databases, including approvals, adverse events, labeling, and recalls | Sync |

Structural (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `pdb-toolkit` | PDB Experimental Structure Database | Access experimentally determined protein structures from X-ray crystallography, NMR, and cryo-EM. Best for high-resolution experimental structures and validation data. Use AlphaFold for predicted structures. | Sync |

Pipeline (3 tools)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `list-pipelines` | List Pipelines - Discover Predefined Bioinformatics Pipelines | List all available predefined bioinformatics pipelines. Returns pipeline names, descriptions, required inputs, and estimated duration. Use this BEFORE building a pipeline from scratch to check whether a predefined pipeline already covers the user's need. Can filter by category (variant-calling, transcriptomics, genomics, epigenomics, quality-control, alignment, structural-biology). | Sync |
| `pipeline-status` | Pipeline Status - Check Multi-Step Pipeline Progress | Check the status and progress of a running multi-step bioinformatics pipeline. Returns overall pipeline status, individual step statuses, progress percentage, and output files when complete. Use after start-pipeline to monitor pipeline execution. | Sync |
| `start-pipeline` | Start Pipeline - Launch Multi-Step Bioinformatics Pipeline | Start a multi-step bioinformatics pipeline. Supports two modes: (1) pipelineId mode: pass a predefined pipeline id from list-pipelines and provide only the input parameters; the steps are loaded automatically from the registry. (2) Custom steps mode: provide an explicit steps[] array for one-off pipelines. Returns an executionId to track progress with the pipeline-status tool. | Sync |
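
The two `start-pipeline` modes are mutually exclusive, so a client-side check can catch malformed requests before they are sent. This sketch builds the input object, taking `pipelineId` and `steps` from the description above. The `inputs` field name, the shape of each step dictionary, and the function name are hypothetical placeholders; confirm the real schema with `list-pipelines` or `GET /v1/tools`.

```python
from typing import Any, Dict, List, Optional


def start_pipeline_input(pipeline_id: Optional[str] = None,
                         steps: Optional[List[Dict[str, Any]]] = None,
                         params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a start-pipeline input in exactly one of the two documented modes.

    "pipelineId" and "steps" follow the tool description; "inputs" and the
    shape of each step dict are hypothetical placeholders.
    """
    if (pipeline_id is None) == (steps is None):
        raise ValueError("pass exactly one of pipeline_id or steps")
    body: Dict[str, Any] = {}
    if pipeline_id is not None:
        body["pipelineId"] = pipeline_id  # predefined mode: steps come from the registry
    else:
        if not steps:
            raise ValueError("custom steps mode needs at least one step")
        body["steps"] = list(steps)  # custom one-off pipeline
    if params:
        body["inputs"] = dict(params)  # hypothetical field name
    return body
```

For example, `start_pipeline_input(pipeline_id="wes-alignment", params={"reads": "sample_R1.fastq.gz"})` targets a predefined pipeline; `wes-alignment` here is an illustrative ID, not one taken from the registry.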

Process Management (2 tools)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `process-results` | Process Results - Get Process Output Files | Retrieve output files and results from a completed bioinformatics process. Use when a process status shows 'completed' and you need to access the generated files (e.g., PDB structures, VCF variants, analysis reports). Returns file paths, URLs, or file content for downstream analysis. | Sync |
| `process-status` | Process Status - Check Running Process Status | Check the status of a long-running bioinformatics process by process ID. Use when the user asks 'is my process done?', 'what's the status of process X?', or wants to check if a process has completed. Returns process status (running, completed, failed, cancelled), progress, and error messages if applicable. | Sync |
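
When polling `process-status` yourself, stop on a terminal state and back off between checks. A minimal sketch, assuming a caller-supplied `get_status` function that wraps the status tool and returns a dict whose `status` key uses the values listed above (`running`, `completed`, `failed`, `cancelled`). The function and parameter names are hypothetical; injecting `sleep` keeps the loop testable.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_process(get_status: Callable[[str], Dict[str, Any]], process_id: str,
                     interval: float = 5.0, max_interval: float = 60.0,
                     timeout: float = 3600.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the process reaches a terminal status, doubling the delay each time."""
    waited = 0.0
    delay = interval
    while True:
        status = get_status(process_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(
                f"process {process_id} still {status.get('status')!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_interval)  # exponential backoff, capped
```

A `failed` or `cancelled` result is returned rather than raised, so the caller can read any error message before deciding whether to call `process-results`.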

Fastqc (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `fastqc-toolkit` | FastQC - A quality control tool for high throughput sequence data | Quality control tool for high throughput sequencing data providing comprehensive quality assessment reports with summary graphs and statistics for raw sequence data. | Pipeline |

Fmlrc (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `fmlrc-toolkit` | FMLRC - FM-index Long Read Error Correction | Long-read error correction tool using FM-index based methods to correct errors in noisy long reads using high-quality short reads as reference. | Pipeline |

Trimmomatic (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `trimmomatic-toolkit` | Trimmomatic Toolkit | Comprehensive read trimming and adapter removal toolkit using Trimmomatic v0.39. Performs quality filtering, adapter trimming, and read preprocessing for both single-end and paired-end sequencing data. | Pipeline |
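
Trimmomatic's `SLIDINGWINDOW:<windowSize>:<requiredQuality>` step scans from the 5′ end and clips the read once the average quality within a window drops below the threshold. This sketch implements that idea on Phred+33 qualities, cutting at the start of the first failing window. Trimmomatic's exact cut point can differ, so treat it as an approximation for reasoning about parameter choices; the function names are hypothetical.

```python
from typing import List, Tuple


def phred33(quality: str) -> List[int]:
    """Decode Phred+33 quality characters ('!' = 0, 'I' = 40)."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        required: float = 20.0) -> Tuple[str, str]:
    """Cut the read at the start of the first window (scanning from the 5' end)
    whose mean quality is below `required`; reads shorter than the window are kept."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = phred33(quality)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < required:
            return seq[:start], quality[:start]
    return seq, quality
```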

Samtools (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `samtools-toolkit` | SAMtools Toolkit | Comprehensive SAM/BAM/CRAM file manipulation toolkit using SAMtools v1.13. Provides format conversion, indexing, statistics, editing, and viewing capabilities for sequencing alignment data. | Pipeline |
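
Much SAMtools filtering (for example `samtools view -F`) works on the bitwise FLAG field defined in the SAM specification. A small decoder using the same short names that `samtools flags` prints; `is_primary_mapped` is an illustrative helper that excludes unmapped, secondary, and supplementary records.

```python
from typing import List

# Bit values from the SAM specification, named as `samtools flags` prints them.
SAM_FLAGS = {
    0x1: "PAIRED", 0x2: "PROPER_PAIR", 0x4: "UNMAP", 0x8: "MUNMAP",
    0x10: "REVERSE", 0x20: "MREVERSE", 0x40: "READ1", 0x80: "READ2",
    0x100: "SECONDARY", 0x200: "QCFAIL", 0x400: "DUP", 0x800: "SUPPLEMENTARY",
}


def decode_flag(flag: int) -> List[str]:
    """Names of the bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_primary_mapped(flag: int) -> bool:
    """True when UNMAP, SECONDARY and SUPPLEMENTARY are all clear."""
    return not flag & (0x4 | 0x100 | 0x800)
```

For instance, FLAG 99 decodes to a properly paired first read whose mate maps to the reverse strand.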

Protein (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `smartsmatch-protein` | SmartMatch Protein Search | Fast protein similarity search using AI-powered vector embeddings. Find similar proteins by sequence with subsecond response times. Alternative to BLAST for rapid protein identification. | Pipeline |

Tabix (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `tabix-query` | Tabix Query | Fast indexed genomic file queries using Tabix/TBI indexes. Query specific regions from compressed VCF, BED, GFF files without full extraction. Runs locally with HTTP range request support for efficient S3 access. | Sync |
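
Tabix regions are written as `chrom:start-end` in 1-based, inclusive coordinates, whereas BED files and most programming APIs use 0-based, half-open intervals. A minimal sketch of the conversion. Stripping thousands separators and returning `None` as the end for a bare chromosome or open-ended region (meaning "to the end of the sequence") are choices of this sketch; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# chrom, optionally followed by :start and optionally -end
_REGION = re.compile(r"^([^:]+)(?::(\d+)(?:-(\d+))?)?$")


def parse_region(region: str) -> Tuple[str, int, Optional[int]]:
    """Convert 'chrom:start-end' (1-based, inclusive) to (chrom, start0, end), half-open.

    end is None when the region runs to the end of the sequence.
    """
    match = _REGION.match(region.replace(",", ""))
    if not match:
        raise ValueError(f"bad region: {region!r}")
    chrom, start, end = match.group(1), match.group(2), match.group(3)
    start0 = int(start) - 1 if start else 0  # 1-based inclusive -> 0-based
    end_ = int(end) if end else None  # inclusive 1-based end == exclusive 0-based end
    if start0 < 0 or (end_ is not None and end_ <= start0):
        raise ValueError(f"empty or invalid interval: {region!r}")
    return chrom, start0, end_
```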

Web (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `tavily-search` | Tavily Web Search | Search the web for scientific literature and bioinformatics information using the Tavily API | Sync |

Ucsc (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `ucsc-genome-browser` | UCSC Genome Browser | Access genome annotation and visualization data from the UCSC Genome Browser database | Sync |

Patents (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `uspto-patents-search` | USPTO Patents Search | Search the US patent database via the PatentsView API for intellectual property research and prior art analysis | Sync |

Visualization (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `visualization-generator` | Visualization Generator - Create Charts and Plots | Generate professional charts, plots, and diagrams for scientific reports. Supports standard charts (bar, line, scatter, pie), specialized bioinformatics plots (volcano, MA, Manhattan, heatmap), workflow diagrams, and 3D molecular structures. Visualizations are saved to workspace and can be included in reports. | Sync |

Compression (1 tool)

| Tool ID | Name | Description | Mode |
|---|---|---|---|
| `zip-toolkit` | Zip Toolkit | Comprehensive file compression and decompression toolkit supporting ZIP, GZIP, BZIP2, XZ, 7-Zip, and TAR formats. Handles bioinformatics files that are commonly distributed in compressed formats. | Pipeline |

Looking for predefined pipelines? Multi-step bioinformatics workflows (WES alignment, RNA-seq, GATK variant calling, protein binder design, and more) are documented on the Predefined Pipelines page.

Fetch the Tool List at Runtime

Call GET /v1/tools to retrieve the live tool registry with each tool's ID, parameter schema, and description.

```python
from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")

tools = client.tools.list()
for tool in tools:
    print(f"{tool.id:40s} [{tool.category}]")

# Filter by category (category names as listed in the Tool Reference)
proteomics = [t for t in tools if t.category == "Proteomics"]
pipeline_tools = [t for t in tools if t.invocation in ("pipeline", "both")]

# Inspect a tool's parameter schema (tool IDs use hyphens, e.g. "ncbi-blast")
blast = next(t for t in tools if t.id == "ncbi-blast")
for param in blast.parameters:
    status = "required" if param.required else "optional"
    print(f"  {param.name} ({status}): {param.description}")
```

Run a Tool Directly

```python
# Sync tool: results are returned immediately
result = client.tools.run(
    tool_id="ncbi-pubmed",
    input={"query": "BRCA1 breast cancer 2024", "maxResults": 5},
)
for article in result["results"]:
    print(f"{article['pmid']}: {article['title']}")

# Pipeline tool: submit, then poll until it finishes
ws_id = "ws_..."  # your workspace ID
pipeline = client.pipelines.create(
    # Not listed in the Tool Reference above; confirm the ID with client.tools.list()
    tool_id="protein-structure-prediction",
    workspace_id=ws_id,
    input={"sequence": "MKTIIALSYIFCLVFA...", "model": "boltz2"},
)
pipeline = client.pipelines.wait(
    pipeline["id"],
    workspace_id=ws_id,
    on_progress=lambda p: print(f"{p['progress_pct']}%"),
)
client.files.download(pipeline["output_keys"][0], workspace_id=ws_id, dest="./structure.pdb")
```