SmartsMatch

Proprietary

smarts.bio's AI-powered protein similarity search. It uses protein language model embeddings to find structurally and functionally related proteins. It is faster than BLAST and can detect remote homologs that sequence-alignment methods miss.

How SmartsMatch works

Your query protein sequence is encoded into a high-dimensional vector using a protein language model. That vector is compared against the SmartsBio AI index — a pre-embedded snapshot of UniProtKB/Swiss-Prot, PDB chains, and selected AlphaFold models — using approximate nearest-neighbour (ANN) search.
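
The comparison step can be illustrated with a minimal sketch, assuming embeddings are plain lists of floats. It uses exact brute-force search rather than ANN, and the function names and toy vectors below are illustrative only; they are not part of the SmartsBio SDK or its index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_hits(query_vec, index, k=10, threshold=0.7):
    """Exact top-k search; `index` maps a protein ID to its embedding."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in index.items()]
    kept = [(pid, score) for pid, score in scored if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:k]


# Toy 3-dimensional "embeddings" with made-up values; real protein
# language model embeddings have far more dimensions.
toy_index = {
    "prot_A": [1.0, 0.0, 0.0],
    "prot_B": [0.9, 0.1, 0.0],
    "prot_C": [0.0, 1.0, 0.0],
}
hits = top_k_hits([1.0, 0.0, 0.0], toy_index, k=2, threshold=0.5)
for pid, score in hits:
    print(f"{pid}  sim={score:.3f}")
```

An ANN index trades a small amount of recall for speed by avoiding the full scan that `top_k_hits` performs here, which matters once the index holds hundreds of millions of vectors.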

< 1 s

Typical query time

30%+

Remote homolog recall vs BLAST

250 M+

Proteins in the AI index

TOOL: smartsmatch_protein (tools scope)

Find the most similar proteins in the SmartsBio AI index for a given query sequence. Returns ranked hits with similarity scores, UniProt/PDB accessions, and functional annotations.

Parameters

Parameter	Type	Description
sequence *	string	Required. Amino-acid sequence in single-letter IUPAC code. Min 10 residues, max 2048 residues.
top_k	integer	Number of top hits to return (default 10, max 100).
threshold	float	Minimum similarity score 0–1 (default 0.7). Lower values return more distant hits.
search_mode	string	`sequence` — prioritise sequence similarity; `functional` — prioritise functional analogs (may have low sequence identity). Default: `sequence`.
index	string	Which index to search: `all` (default), `swissprot`, `pdb`, or `alphafold`.
include_annotations	boolean	If true, each hit includes GO terms, EC numbers, and subcellular localisation from UniProt (default true).
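
It can help to check these limits client-side before sending a request. The helper below is a hypothetical sketch (`build_smartsmatch_input` is not an SDK function) that enforces the documented ranges and returns a dict suitable for the `input` argument. Two assumptions are marked in the comments: whitespace is stripped from the sequence, and `top_k` must be at least 1.

```python
VALID_SEARCH_MODES = ("sequence", "functional")
VALID_INDEXES = ("all", "swissprot", "pdb", "alphafold")


def build_smartsmatch_input(sequence, top_k=10, threshold=0.7,
                            search_mode="sequence", index="all",
                            include_annotations=True):
    """Validate arguments against the documented limits and return an input dict."""
    # Convenience (assumption): drop whitespace so pasted multi-line sequences work.
    seq = "".join(sequence.split())
    if not (seq.isascii() and seq.isalpha()):
        raise ValueError("sequence must contain only single-letter residue codes")
    if not 10 <= len(seq) <= 2048:
        raise ValueError(f"sequence length {len(seq)} is outside 10-2048 residues")
    # The docs give only a maximum for top_k; a minimum of 1 is assumed here.
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if search_mode not in VALID_SEARCH_MODES:
        raise ValueError(f"search_mode must be one of {VALID_SEARCH_MODES}")
    if index not in VALID_INDEXES:
        raise ValueError(f"index must be one of {VALID_INDEXES}")
    return {
        "sequence": seq,
        "top_k": top_k,
        "threshold": threshold,
        "search_mode": search_mode,
        "index": index,
        "include_annotations": include_annotations,
    }


payload = build_smartsmatch_input("MTEYKLVVVGAGGVGKSALTIQ", top_k=5)
```

The resulting `payload` could then be passed as `input=payload` to `client.tools.run`.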

Response fields (per hit)

Field	Description
accession	UniProt accession or PDB chain ID (e.g. `P38398` or `1JM7_A`)
name	Protein name from UniProt or PDB
organism	Source organism scientific name
similarity_score	Cosine similarity 0–1 between query and hit embeddings
sequence_identity	Pairwise sequence identity % (computed post-retrieval)
source	`swissprot` | `trembl` | `pdb` | `alphafold`
go_terms	List of GO term objects with `id`, `name`, `namespace`
ec_numbers	EC number(s) for enzyme hits (e.g. `3.1.1.1`)
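
For downstream code, a typed wrapper around each hit can make field access less error-prone. The sketch below assumes hits arrive as plain dicts with the fields listed above. The `SmartsMatchHit` class is hypothetical, and the example values are placeholders, not real API output.

```python
from dataclasses import dataclass, field


@dataclass
class SmartsMatchHit:
    accession: str
    name: str
    organism: str
    similarity_score: float
    sequence_identity: float
    source: str
    go_terms: list = field(default_factory=list)
    ec_numbers: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw):
        """Build a hit from a response dict, tolerating absent annotation lists."""
        return cls(
            accession=raw["accession"],
            name=raw["name"],
            organism=raw["organism"],
            similarity_score=float(raw["similarity_score"]),
            sequence_identity=float(raw["sequence_identity"]),
            source=raw["source"],
            go_terms=list(raw.get("go_terms") or []),
            ec_numbers=list(raw.get("ec_numbers") or []),
        )

    def go_ids(self, namespace=None):
        """Return GO IDs, optionally restricted to one namespace."""
        return [
            term["id"]
            for term in self.go_terms
            if namespace is None or term["namespace"] == namespace
        ]


# Placeholder values for illustration only.
example = {
    "accession": "P38398",
    "name": "placeholder protein name",
    "organism": "placeholder organism",
    "similarity_score": 0.9,
    "sequence_identity": 42.0,
    "source": "swissprot",
    "go_terms": [
        {"id": "GO:0000000", "name": "placeholder term", "namespace": "molecular_function"}
    ],
}
hit = SmartsMatchHit.from_dict(example)
```

Whether annotation fields are omitted or returned empty when `include_annotations` is false is not stated here, so `from_dict` accepts either.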

from smartsbio import SmartsBio

client = SmartsBio(api_key="sk_live_...")

result = client.tools.run(
    tool_id="smartsmatch_protein",
    input={
        # Sequence truncated here for display; pass the full sequence in practice.
        "sequence": "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSY...",
        "top_k": 10,
        "threshold": 0.75,
        "search_mode": "sequence",
        "index": "all",
    },
)

for hit in result["hits"]:
    print(
        f"{hit['accession']:<12} "
        f"sim={hit['similarity_score']:.3f}  "
        f"id={hit['sequence_identity']:.1f}%  "
        f"{hit['name'][:50]}"
    )

SmartsMatch vs BLAST

Use SmartsMatch when you need speed, remote homolog detection, or functional analog discovery. Fall back to BLAST (ncbi_blast) when you need strict statistical E-values or coverage of the full NCBI nt/nr databases.

Feature	SmartsMatch	BLAST (ncbi_blast)
Speed	< 1 second	10 s – 10 min
Remote homolog detection (<25% identity)	Excellent	Poor–Fair
Functional analog search	Supported (search_mode=functional)	Not supported
Statistical E-values	Not provided	Yes
Database coverage	250 M+ proteins (UniProt + PDB + AlphaFold)	NCBI nt/nr/refseq (trillions of bases)
Nucleotide search	Protein only	blastn, blastx, tblastn, tblastx
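
The table can be condensed into a small routing rule. The function below is a hypothetical sketch, not an SDK feature. It returns one of the two tool IDs named on this page, based on the three criteria where BLAST is the only option.

```python
def choose_similarity_tool(is_protein_query, needs_e_values=False,
                           needs_ncbi_coverage=False):
    """Pick a tool ID following the comparison table above."""
    # SmartsMatch is protein-only, reports no E-values, and does not
    # cover the NCBI nt/nr databases, so any of these needs means BLAST.
    if not is_protein_query or needs_e_values or needs_ncbi_coverage:
        return "ncbi_blast"
    return "smartsmatch_protein"


tool_for_protein = choose_similarity_tool(is_protein_query=True)
tool_for_dna = choose_similarity_tool(is_protein_query=False)
```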

Use cases

Find remote homologs across kingdoms

When sequence identity is too low for BLAST to find anything, SmartsMatch can still detect evolutionarily distant relatives that share the same fold.

result = client.tools.run(
    tool_id="smartsmatch_protein",
    input={
        "sequence": my_novel_sequence,
        "top_k": 20,
        "threshold": 0.6,   # Lower threshold for distant homologs
        "search_mode": "sequence",
        "index": "swissprot",  # Curated hits only
    },
)

# Keep only hits with low sequence identity (the "twilight zone" for alignment-based search)
remote = [h for h in result["hits"] if h["sequence_identity"] < 25]
print(f"Found {len(remote)} remote homologs below 25% identity:")
for h in remote:
    print(f"  {h['accession']}  sim={h['similarity_score']:.3f}  {h['organism']}")

Discover functional analogs (convergent evolution)

Set search_mode to `functional` to find proteins that perform the same biochemical function through a different fold, e.g. alternative proteases or convergent ATP-binding domains.

result = client.tools.run(
    tool_id="smartsmatch_protein",
    input={
        "sequence": serine_protease_sequence,
        "top_k": 30,
        "threshold": 0.65,
        "search_mode": "functional",   # Functional analogs, not just homologs
    },
)

from collections import Counter

# Explore EC numbers to confirm convergent function.
# Hits without an "ec_numbers" entry (or with an empty one) are skipped.
ec_counts = Counter(
    ec for h in result["hits"] for ec in (h.get("ec_numbers") or [])
)

print("Most common EC numbers among hits:")
for ec, count in ec_counts.most_common(5):
    print(f"  {ec}: {count} hits")

Let the agent pick SmartsMatch automatically

When you use the Query endpoint, the agent will automatically use SmartsMatch for protein similarity tasks and combine it with structural (PDB/AlphaFold) and functional (UniProt/InterPro) data sources.

response = client.query.run(
    "Find proteins similar to this sequence and tell me what they do: MTEYKLVVVGAGGVGK...",
)
# The agent uses SmartsMatch internally, then enriches hits
# with UniProt annotations and PDB structures automatically.
print(response.answer)
