Predefined Pipelines
Ready-to-run multi-step bioinformatics workflows that run on managed AWS infrastructure. Submit a pipeline ID, poll for progress, and download your results — no infrastructure setup required.
1. Upload input files
2. POST /v1/pipelines
3. Poll for status
4. Download outputs
12 predefined pipelines, 2 categories, managed AWS infrastructure.
NGS Sequencing (8 pipelines)
| Pipeline ID | Name | Description | Runtime |
|---|---|---|---|
| quality-control | Quality Control | FastQC + MultiQC report generation for one or more sequencing datasets. | 10–30 min |
| alignment-wes | WES Alignment | Whole-exome sequencing alignment: FastQC → Trimmomatic → BWA-MEM → Picard → GATK BQSR. | 2–4 hours |
| whole-genome-sequencing | Whole Genome Sequencing | WGS alignment with QC, duplicate marking, and variant-ready BAM generation. | 4–8 hours |
| rna-seq-analysis | RNA-seq Analysis | HISAT2 alignment + featureCounts. Produces a count matrix ready for DESeq2 / edgeR. | 1–3 hours |
| atac-seq | ATAC-seq | Adapter trimming → Bowtie2 alignment → MACS2 peak calling → fragment size analysis. | 1–2 hours |
| chip-seq | ChIP-seq | ChIP-seq alignment, MACS2 peak calling, and de novo motif enrichment with HOMER. | 1–3 hours |
| gatk-variant-calling | GATK Variant Calling | Germline SNP & indel discovery using GATK4 HaplotypeCaller in GVCF mode. | 1–4 hours |
| somatic-variant-calling | Somatic Variant Calling | Tumor-normal somatic SNV & indel detection using GATK Mutect2 with artifact filtering. | 2–5 hours |
AI Protein Design (4 pipelines)
| Pipeline ID | Name | Description | Runtime |
|---|---|---|---|
| protein-binder-design-validated | Protein Binder Design | AI binder design (generative models) + automated binding affinity validation with Boltz-2. | 30–90 min |
| nanobody-discovery | Nanobody Discovery | End-to-end VHH nanobody design, epitope targeting, structural screening, and binding prediction. | 1–2 hours |
| enzyme-engineering | Enzyme Engineering | Generative enzyme design with catalytic activity prediction and thermostability optimization. | 1–3 hours |
| structure-based-drug-discovery | Structure-Based Drug Discovery | Target structure retrieval or prediction, binding site identification, and virtual screening. | 30–120 min |
Quick Start
Upload your input files, submit a pipeline by ID, poll until completion, then download outputs. The SDKs provide a .wait() helper that handles polling automatically.
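To make the polling step concrete, here is a minimal sketch of a poll-until-done loop. It is not the SDK's implementation: the function name `wait_for_pipeline`, the injected `fetch` callable, the `status` key, and the status values in `TERMINAL_STATUSES` are all assumptions made for illustration. Only `poll_interval`, `on_progress`, and `progress_pct` mirror names used on this page.

```python
import time
from typing import Any, Callable, Dict, Optional

# Assumed terminal status values; the real API may name them differently.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_pipeline(
    fetch: Callable[[], Dict[str, Any]],
    poll_interval: float = 30.0,
    timeout: Optional[float] = None,
    on_progress: Optional[Callable[[Dict[str, Any]], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the returned pipeline dict has a terminal status.

    `fetch` is a hypothetical callable that returns the current pipeline
    object, for example a thin wrapper around a status request.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        pipeline = fetch()
        if on_progress is not None:
            on_progress(pipeline)
        # "status" is an assumed key, not confirmed by this page.
        if pipeline.get("status") in TERMINAL_STATUSES:
            return pipeline
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("pipeline did not finish before the timeout")
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated responses standing in for real status requests.
    responses = iter([
        {"status": "running", "progress_pct": 40},
        {"status": "completed", "progress_pct": 100},
    ])
    final = wait_for_pipeline(
        lambda: next(responses),
        poll_interval=0,
        on_progress=lambda p: print(f"{p['progress_pct']}%"),
    )
    print(final["status"])
```

Passing `fetch` and `sleep` in as arguments keeps the loop testable without network access. In the SDK example that follows, `client.pipelines.wait(...)` takes the place of a hand-written loop like this one.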
```python
from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")
ws_id = "ws_abc123"

# 1. Upload FASTQ files
r1 = client.files.upload("sample_R1.fastq.gz", workspace_id=ws_id, path="input/")
r2 = client.files.upload("sample_R2.fastq.gz", workspace_id=ws_id, path="input/")

# 2. Start the RNA-seq pipeline
pipeline = client.pipelines.create(
    pipeline_id="rna-seq-analysis",
    workspace_id=ws_id,
    input={
        "fastq_r1": r1["key"],
        "fastq_r2": r2["key"],
        "gtf": "input/genome.gtf",
        "reference": "GRCh38",
        "output_path": "results/rnaseq/",
    },
)
print(f"Pipeline {pipeline['id']} queued")

# 3. Wait for completion (polls every 30 s)
pipeline = client.pipelines.wait(
    pipeline["id"],
    workspace_id=ws_id,
    poll_interval=30,
    on_progress=lambda p: print(f" [{p.get('current_step', '?')}] {p['progress_pct']}%"),
)

# 4. Download output files
for key in pipeline["output_paths"]:
    client.files.download(key, workspace_id=ws_id, dest="./output/")
print("Done!")
```

Discover at Runtime
Use the list_pipelines tool to fetch the current pipeline registry with required inputs and estimated runtimes.
```python
result = client.tools.run(tool_id="list_pipelines", input={})
for p in result["pipelines"]:
    print(f"{p['id']:45s} ~{p['estimated_runtime']}")
```

Full API details: The Pipelines API Reference covers all endpoints (POST, GET, DELETE), the pipeline object schema, polling options, log access, and input/output details for every predefined pipeline.
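Because the registry lists required inputs, it could also be used to check an input dictionary before submitting a pipeline. The sketch below assumes each registry entry carries a `required_inputs` list. That field name is hypothetical, as is the helper `missing_inputs`, and the sample registry is hand-written for illustration rather than real API output. Only `pipelines` and `id` match keys shown on this page.

```python
from typing import Any, Dict, List


def missing_inputs(
    registry: Dict[str, Any], pipeline_id: str, inputs: Dict[str, Any]
) -> List[str]:
    """Return the required input names that are absent from `inputs`."""
    for entry in registry["pipelines"]:
        if entry["id"] == pipeline_id:
            # "required_inputs" is an assumed field name.
            required = entry.get("required_inputs", [])
            return [name for name in required if name not in inputs]
    raise KeyError(f"unknown pipeline ID: {pipeline_id}")


if __name__ == "__main__":
    # Hand-written sample data; the real registry contents may differ.
    sample_registry = {
        "pipelines": [
            {"id": "rna-seq-analysis", "required_inputs": ["fastq_r1", "fastq_r2", "gtf"]},
        ]
    }
    user_inputs = {"fastq_r1": "input/sample_R1.fastq.gz"}
    print(missing_inputs(sample_registry, "rna-seq-analysis", user_inputs))
```

Checking locally before calling `client.pipelines.create(...)` surfaces a missing file key right away instead of after the job has been queued.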