Predefined Pipelines

Predefined pipelines are ready-to-run, multi-step bioinformatics workflows that run on managed AWS infrastructure. Submit a pipeline ID, poll for progress, and download your results; no infrastructure setup is required.

1. Upload input files

2. Submit the pipeline (POST /v1/pipelines)

3. Poll for status

4. Download outputs

Predefined pipelines

NGS Sequencing (8 pipelines)

Pipeline ID	Name	Description	Runtime
quality-control	Quality Control	FastQC + MultiQC report generation for one or more sequencing datasets.	10–30 min
alignment-wes	WES Alignment	Whole-exome sequencing alignment: FastQC → Trimmomatic → BWA-MEM → Picard → GATK BQSR.	2–4 hours
whole-genome-sequencing	Whole Genome Sequencing	WGS alignment with QC, duplicate marking, and variant-ready BAM generation.	4–8 hours
rna-seq-analysis	RNA-seq Analysis	HISAT2 alignment + featureCounts. Produces a count matrix ready for DESeq2 / edgeR.	1–3 hours
atac-seq	ATAC-seq	Adapter trimming → Bowtie2 alignment → MACS2 peak calling → fragment size analysis.	1–2 hours
chip-seq	ChIP-seq	ChIP-seq alignment, MACS2 peak calling, and de novo motif enrichment with HOMER.	1–3 hours
gatk-variant-calling	GATK Variant Calling	Germline SNP & indel discovery using GATK4 HaplotypeCaller in GVCF mode.	1–4 hours
somatic-variant-calling	Somatic Variant Calling	Tumor-normal somatic SNV & indel detection using GATK Mutect2 with artifact filtering.	2–5 hours

AI Protein Design (4 pipelines)

Pipeline ID	Name	Description	Runtime
protein-binder-design-validated	Protein Binder Design	AI binder design (generative models) + automated binding affinity validation with Boltz-2.	30–90 min
nanobody-discovery	Nanobody Discovery	End-to-end VHH nanobody design, epitope targeting, structural screening, and binding prediction.	1–2 hours
enzyme-engineering	Enzyme Engineering	Generative enzyme design with catalytic activity prediction and thermostability optimization.	1–3 hours
structure-based-drug-discovery	Structure-Based Drug Discovery	Target structure retrieval or prediction, binding site identification, and virtual screening.	30–120 min
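
The Runtime column in the tables above can help when choosing how long a client should be willing to wait. The following is a minimal sketch, assuming estimates always follow the "low–high unit" shape shown in those tables (for example "10–30 min" or "2–4 hours"). The helper name runtime_bounds_minutes is hypothetical and is not part of the SDK.

```python
import re

# Minutes per unit, for the two units that appear in the tables above.
_UNIT_MINUTES = {"min": 1, "hour": 60, "hours": 60}

# Accepts an en dash or a plain hyphen between the two numbers.
_RUNTIME_RE = re.compile(r"^\s*(\d+)\s*[–-]\s*(\d+)\s*(min|hours?)\s*$")


def runtime_bounds_minutes(text: str) -> tuple:
    """Convert an estimate such as '2–4 hours' to (low, high) in minutes."""
    match = _RUNTIME_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized runtime estimate: {text!r}")
    low, high, unit = int(match.group(1)), int(match.group(2)), match.group(3)
    factor = _UNIT_MINUTES[unit]
    return low * factor, high * factor


# Estimates copied from the tables above.
qc_bounds = runtime_bounds_minutes("10–30 min")
wes_bounds = runtime_bounds_minutes("2–4 hours")
print(qc_bounds, wes_bounds)
```

One way to use the result is to set a client-side timeout somewhat above the upper bound, so that a job that is merely slow is not mistaken for a stuck one.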

Quick Start

Upload your input files, submit a pipeline by ID, poll until completion, then download outputs. The SDKs provide a .wait() helper that handles polling automatically.

from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")
ws_id = "ws_abc123"

# 1. Upload FASTQ files
r1 = client.files.upload("sample_R1.fastq.gz", workspace_id=ws_id, path="input/")
r2 = client.files.upload("sample_R2.fastq.gz", workspace_id=ws_id, path="input/")

# 2. Start the RNA-seq pipeline
pipeline = client.pipelines.create(
    pipeline_id="rna-seq-analysis",
    workspace_id=ws_id,
    input={
        "fastq_r1": r1["key"],
        "fastq_r2": r2["key"],
        "gtf": "input/genome.gtf",
        "reference": "GRCh38",
        "output_path": "results/rnaseq/",
    },
)
print(f"Pipeline {pipeline['id']} queued")

# 3. Wait for completion (polls every 30 s)
pipeline = client.pipelines.wait(
    pipeline["id"],
    workspace_id=ws_id,
    poll_interval=30,
    on_progress=lambda p: print(f"  [{p.get('current_step', '?')}]  {p['progress_pct']}%"),
)

# 4. Download output files
for key in pipeline["output_paths"]:
    client.files.download(key, workspace_id=ws_id, dest="./output/")
print("Done!")
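
For readers who want to see what a helper like .wait() does conceptually, here is a minimal polling-loop sketch. It is not the SDK's implementation. The function name wait_for_pipeline, the "status" field, and the terminal status names are all assumptions made for illustration. The fetch callable stands in for whatever call retrieves the current pipeline object.

```python
import time
from typing import Callable, Optional

# Assumed terminal states; the real API may use different status names.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_pipeline(
    fetch: Callable[[], dict],
    poll_interval: float = 30.0,
    timeout: Optional[float] = None,
    on_progress: Optional[Callable[[dict], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() repeatedly until it reports a terminal status."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = fetch()
        if on_progress is not None:
            on_progress(state)
        if state.get("status") in TERMINAL_STATES:
            return state
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("pipeline did not finish before the timeout")
        sleep(poll_interval)


# Usage with a stand-in fetch function that replays canned responses.
_responses = iter([
    {"status": "running", "progress_pct": 40},
    {"status": "running", "progress_pct": 85},
    {"status": "completed", "progress_pct": 100},
])
seen = []
final = wait_for_pipeline(
    fetch=lambda: next(_responses),
    poll_interval=0,
    on_progress=lambda p: seen.append(p["progress_pct"]),
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
print(final["status"], seen)
```

Passing sleep as a parameter keeps the loop testable without real delays. In application code, the status of the returned object should still be checked before downloading outputs, since a terminal state is not necessarily a successful one.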

Discover at Runtime

Use the list_pipelines tool to fetch the current pipeline registry with required inputs and estimated runtimes.

result = client.tools.run(tool_id="list_pipelines", input={})
for p in result["pipelines"]:
    print(f"{p['id']:45s} ~{p['estimated_runtime']}")
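
Because the registry lists required inputs, a client could check an input dictionary before submitting a pipeline. The sketch below assumes each registry entry exposes a list under a key named required_inputs. That key name, the helper missing_inputs, and the sample entry are hypothetical, so inspect the actual list_pipelines response before relying on them.

```python
def missing_inputs(registry_entry: dict, supplied: dict) -> list:
    """Return the required input names that are absent from `supplied`."""
    # "required_inputs" is an assumed field name, not confirmed by the docs.
    required = registry_entry.get("required_inputs", [])
    return [name for name in required if name not in supplied]


# Stand-in registry entry; in practice this would be one item of
# result["pipelines"]. The input names mirror the Quick Start example.
entry = {
    "id": "rna-seq-analysis",
    "required_inputs": ["fastq_r1", "fastq_r2", "gtf", "reference", "output_path"],
}
supplied = {"fastq_r1": "input/sample_R1.fastq.gz", "reference": "GRCh38"}

missing = missing_inputs(entry, supplied)
print(missing)
```

Catching a missing input locally avoids queuing a job that would fail at its first step.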

Full API details: The Pipelines API Reference covers all endpoints (POST, GET, DELETE), the pipeline object schema, polling options, log access, and input/output details for every predefined pipeline.
