
Predefined Pipelines

Ready-to-run multi-step bioinformatics workflows that run on managed AWS infrastructure. Submit a pipeline ID, poll for progress, and download your results — no infrastructure setup required.

1. Upload input files
2. POST /v1/pipelines
3. Poll for status
4. Download outputs
12 predefined pipelines · 2 categories · managed AWS infrastructure

NGS Sequencing (8 pipelines)

Pipeline ID | Name | Description | Runtime
quality-control | Quality Control | FastQC + MultiQC report generation for one or more sequencing datasets. | 10–30 min
alignment-wes | WES Alignment | Whole-exome sequencing alignment: FastQC → Trimmomatic → BWA-MEM → Picard → GATK BQSR. | 2–4 hours
whole-genome-sequencing | Whole Genome Sequencing | WGS alignment with QC, duplicate marking, and variant-ready BAM generation. | 4–8 hours
rna-seq-analysis | RNA-seq Analysis | HISAT2 alignment + FeatureCounts. Produces a count matrix ready for DESeq2 / edgeR. | 1–3 hours
atac-seq | ATAC-seq | Adapter trimming → Bowtie2 alignment → MACS2 peak calling → fragment size analysis. | 1–2 hours
chip-seq | ChIP-seq | ChIP-seq alignment, MACS2 peak calling, and de novo motif enrichment with HOMER. | 1–3 hours
gatk-variant-calling | GATK Variant Calling | Germline SNP & indel discovery using GATK4 HaplotypeCaller in GVCF mode. | 1–4 hours
somatic-variant-calling | Somatic Variant Calling | Tumor-normal somatic SNV & indel detection using GATK Mutect2 with artifact filtering. | 2–5 hours

AI Protein Design (4 pipelines)

Pipeline ID | Name | Description | Runtime
protein-binder-design-validated | Protein Binder Design | AI binder design (generative models) + automated binding affinity validation with Boltz-2. | 30–90 min
nanobody-discovery | Nanobody Discovery | End-to-end VHH nanobody design, epitope targeting, structural screening, and binding prediction. | 1–2 hours
enzyme-engineering | Enzyme Engineering | Generative enzyme design with catalytic activity prediction and thermostability optimization. | 1–3 hours
structure-based-drug-discovery | Structure-Based Drug Discovery | Target structure retrieval or prediction, binding site identification, and virtual screening. | 30–120 min
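
The runtime ranges listed for each pipeline can serve as a rough guide for a client-side deadline. The sketch below is illustrative only: max_runtime_seconds is a hypothetical helper, not part of the SDK, and it assumes runtimes are written exactly as in the tables on this page (for example "10–30 min" or "2–4 hours").

```python
import re

# Seconds per unit, for the unit spellings used in the tables on this page.
_UNIT_SECONDS = {"min": 60, "hour": 3600, "hours": 3600}


def max_runtime_seconds(runtime: str) -> int:
    """Return the upper bound of a range such as '10–30 min' in seconds.

    Hypothetical helper: accepts an en dash or a plain hyphen between
    the two numbers and raises ValueError for any other format.
    """
    match = re.fullmatch(r"\s*(\d+)\s*[–-]\s*(\d+)\s*(min|hours?)\s*", runtime)
    if match is None:
        raise ValueError(f"unrecognized runtime format: {runtime!r}")
    _low, high, unit = match.groups()
    return int(high) * _UNIT_SECONDS[unit]


# Upper bound for the rna-seq-analysis pipeline ("1–3 hours").
rna_seq_deadline = max_runtime_seconds("1–3 hours")
```

Since the listed runtimes are estimates, adding a safety margin on top of the upper bound is probably sensible before treating a run as stuck.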

Quick Start

Upload your input files, submit a pipeline by ID, poll until completion, then download outputs. The SDKs provide a .wait() helper that handles polling automatically.

from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")
ws_id = "ws_abc123"

# 1. Upload FASTQ files
r1 = client.files.upload("sample_R1.fastq.gz", workspace_id=ws_id, path="input/")
r2 = client.files.upload("sample_R2.fastq.gz", workspace_id=ws_id, path="input/")

# 2. Start the RNA-seq pipeline
pipeline = client.pipelines.create(
    pipeline_id="rna-seq-analysis",
    workspace_id=ws_id,
    input={
        "fastq_r1": r1["key"],
        "fastq_r2": r2["key"],
        "gtf": "input/genome.gtf",
        "reference": "GRCh38",
        "output_path": "results/rnaseq/",
    },
)
print(f"Pipeline {pipeline['id']} queued")

# 3. Wait for completion (polls every 30 s)
pipeline = client.pipelines.wait(
    pipeline["id"],
    workspace_id=ws_id,
    poll_interval=30,
    on_progress=lambda p: print(f"  [{p.get('current_step', '?')}]  {p['progress_pct']}%"),
)

# 4. Download output files
for key in pipeline["output_paths"]:
    client.files.download(key, workspace_id=ws_id, dest="./output/")
print("Done!")
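
If you prefer to control polling yourself instead of calling .wait(), a manual loop might look like the sketch below. It is not the SDK's implementation: poll_until_done is a hypothetical helper, the "status" field and the terminal status names are assumptions, and the fetch callable stands in for whatever call returns the current pipeline object (see the Pipelines API Reference for the actual endpoint and schema).

```python
import time
from typing import Any, Callable, Dict, Optional

# Assumed terminal status names; check the pipeline object schema for the real ones.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_done(
    fetch: Callable[[], Dict[str, Any]],
    poll_interval: float = 30.0,
    timeout: Optional[float] = None,
    on_progress: Optional[Callable[[Dict[str, Any]], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() repeatedly until the pipeline reaches a terminal state.

    fetch returns the current pipeline object as a dict. on_progress, if
    given, is invoked with every fetched object. Raises TimeoutError when
    timeout (in seconds) elapses before a terminal state is seen.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        pipeline = fetch()
        if on_progress is not None:
            on_progress(pipeline)
        if pipeline.get("status") in TERMINAL_STATES:
            return pipeline
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("pipeline did not finish before the timeout")
        sleep(poll_interval)
```

The sleep parameter exists only so the loop can be exercised in tests without real delays. In practice you would pass a fetch callable that wraps the SDK or HTTP call for retrieving a pipeline by ID.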

Discover at Runtime

Use the list_pipelines tool to fetch the current pipeline registry with required inputs and estimated runtimes.

result = client.tools.run(tool_id="list_pipelines", input={})
for p in result["pipelines"]:
    print(f"{p['id']:45s} ~{p['estimated_runtime']}")
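
Because the registry includes required inputs, it can also be used to check a submission before sending it. The following is a minimal sketch under stated assumptions: missing_inputs is a hypothetical helper, "required_inputs" is an assumed field name, and the sample entry is made up for illustration rather than copied from a real list_pipelines response.

```python
from typing import Any, Dict, List


def missing_inputs(entry: Dict[str, Any], supplied: Dict[str, Any]) -> List[str]:
    """Return the required input names that are absent from `supplied`.

    `entry` is one item from result["pipelines"]; the "required_inputs"
    key is an assumption about the registry schema.
    """
    required = entry.get("required_inputs", [])
    return [name for name in required if name not in supplied]


# Illustrative registry entry (made-up values, not real API output).
sample_entry = {
    "id": "rna-seq-analysis",
    "estimated_runtime": "1–3 hours",
    "required_inputs": ["fastq_r1", "fastq_r2", "gtf", "reference"],
}

# A submission that forgot two inputs.
gaps = missing_inputs(sample_entry, {"fastq_r1": "input/r1.fastq.gz", "gtf": "input/genome.gtf"})
print(gaps)
```

Checking inputs locally like this would surface an incomplete submission before any compute is queued.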
Full API details: The Pipelines API Reference covers all endpoints (POST, GET, DELETE), the pipeline object schema, polling options, log access, and input/output details for every predefined pipeline.