
Biograph

Proprietary

Biograph is smarts.bio's unified biomedical knowledge graph. It connects genes, proteins, diseases, variants, pathways, and drugs into a single queryable graph, powered by Neo4j and curated from dozens of public databases.

How Biograph works

Biograph integrates curated relationships from UniProt, NCBI Gene, OMIM, ClinVar, KEGG, Reactome, DrugBank, and the scientific literature into a Neo4j property graph. You can look up individual entities, run full-text searches, or traverse multi-hop networks to discover indirect connections that would otherwise require querying a dozen databases separately.

60K+ genes · 26K+ diseases · 1M+ variants · 6 entity types

Entity types

| Type | Scale | ID formats accepted |
|---|---|---|
| gene | 60K+ | Gene symbol (BRCA1), NCBI Gene ID (672), Ensembl gene ID |
| protein | 20K+ | UniProt accession (P38398), UniProt entry name (BRCA1_HUMAN) |
| disease | 26K+ | OMIM ID (114480), MeSH ID, MONDO ID, disease name |
| variant | 1M+ | dbSNP rs ID (rs80357914), ClinVar variation ID, HGVS notation |
| pathway | 5K+ | KEGG pathway ID (hsa05212), Reactome ID, pathway name |
| drug | 10K+ | DrugBank ID (DB00619), drug name, ChEMBL ID |
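
Because entity_type is a required parameter, it can help to guess it from the shape of an identifier before calling the tool. The sketch below is a hypothetical client-side helper, not part of Biograph: the patterns are inferred only from the example IDs in the table (plus the common six-character UniProt accession form), and anything ambiguous, such as a bare number or a gene symbol, deliberately returns None.

```python
import re
from typing import Optional

# Patterns inferred from the example IDs above. They are illustrative only
# and do not reflect how Biograph itself resolves identifiers.
ID_PATTERNS = [
    ("variant", re.compile(r"^rs\d+$")),                       # dbSNP rs ID, e.g. rs80357914
    ("drug", re.compile(r"^DB\d{5}$")),                        # DrugBank ID, e.g. DB00619
    ("pathway", re.compile(r"^hsa\d{5}$")),                    # KEGG human pathway, e.g. hsa05212
    ("protein", re.compile(r"^[A-Z0-9]{1,5}_HUMAN$")),         # UniProt entry name, e.g. BRCA1_HUMAN
    ("protein", re.compile(r"^[OPQ][0-9][A-Z0-9]{3}[0-9]$")),  # one UniProt accession form, e.g. P38398
]


def guess_entity_type(identifier: str) -> Optional[str]:
    """Return a likely entity_type for an identifier, or None if it is ambiguous."""
    for entity_type, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return entity_type
    # Bare integers (NCBI Gene, OMIM, ClinVar) and free-text names cannot be
    # told apart by shape alone, so the caller has to supply the type.
    return None
```

When the helper returns None, a search query restricted to the plausible entity types is a safer way to find the right entity than guessing.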

Query mode: entity-lookup

Retrieve a single entity and its direct relationships. Best for fetching structured facts about a known entity of any of the six types, such as a gene, protein, disease, or variant.

TOOL biograph (tools scope)

Parameters

| Parameter | Type | Description |
|---|---|---|
| mode * | string | entity-lookup |
| entity * | string | Entity ID or name (gene symbol, UniProt accession, OMIM ID, rs ID, etc.) |
| entity_type * | string | One of: gene, protein, disease, variant, pathway, drug |
| relation_types | string[] | Filter to specific relationship types (e.g. ["AFFECTS", "ASSOCIATED_WITH"]). Default: all. |
| depth | integer | Relationship traversal depth (1 = direct neighbors only, max 2). Default: 1. |

result = client.tools.run(
    tool_id="biograph",
    input={
        "mode": "entity-lookup",
        "entity": "BRCA1",
        "entity_type": "gene",
    },
)

entity = result["entity"]
print(f"Gene: {entity['name']} ({entity['id']})")
print(f"Description: {entity['description']}")
print(f"\nRelationships ({len(result['relationships'])} total):")
for rel in result["relationships"][:10]:
    print(f"  [{rel['type']}] → {rel['target']['name']} ({rel['target']['entity_type']})")
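
The parameter constraints above (six allowed entity types, depth of 1 or 2) can be checked locally before a request is sent. The following is a minimal sketch of such a check; build_entity_lookup_input is a hypothetical helper name, and the returned dictionary is simply the input payload already shown in the example.

```python
ENTITY_TYPES = {"gene", "protein", "disease", "variant", "pathway", "drug"}


def build_entity_lookup_input(entity, entity_type, relation_types=None, depth=1):
    """Build an entity-lookup input dict, rejecting values the docs rule out."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"entity_type must be one of {sorted(ENTITY_TYPES)}")
    if depth not in (1, 2):
        raise ValueError("depth must be 1 or 2 for entity-lookup")

    payload = {
        "mode": "entity-lookup",
        "entity": entity,
        "entity_type": entity_type,
        "depth": depth,
    }
    # Omit relation_types entirely to keep the documented default (all types).
    if relation_types:
        payload["relation_types"] = list(relation_types)
    return payload
```

The result can be passed as the input argument of client.tools.run in place of a hand-written dictionary.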

Query mode: search

Full-text search across all entity types simultaneously. Returns ranked results with their type, identifiers, and a short description. Useful when you know a term but not its exact ID.

TOOL biograph

Parameters

| Parameter | Type | Description |
|---|---|---|
| mode * | string | search |
| query * | string | Search term: a disease name, gene name, pathway description, etc. |
| entity_types | string[] | Restrict to specific entity types. Default: all six types. |
| limit | integer | Max results to return (default 20, max 100). |

result = client.tools.run(
    tool_id="biograph",
    input={
        "mode": "search",
        "query": "hereditary breast ovarian cancer",
        "entity_types": ["gene", "disease", "variant"],
        "limit": 15,
    },
)

for hit in result["results"]:
    print(
        f"[{hit['entity_type']:8s}]  {hit['name']:<30s}  "
        f"id={hit['id']}  score={hit['score']:.2f}"
    )
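
Since a search returns hits of several entity types in one ranked list, grouping them by type often makes the output easier to scan. This sketch assumes only the hit fields used in the example above (entity_type and score); group_hits_by_type is a hypothetical helper name.

```python
from collections import defaultdict


def group_hits_by_type(hits):
    """Group search hits by entity_type, highest score first within each group."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["entity_type"]].append(hit)
    # Return a plain dict so missing types raise KeyError instead of
    # silently creating empty groups.
    return {
        entity_type: sorted(group, key=lambda h: h["score"], reverse=True)
        for entity_type, group in grouped.items()
    }
```

Calling group_hits_by_type(result["results"]) then gives one ranked list per entity type, for example to pick the best-scoring gene and the best-scoring disease separately.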

Query mode: network

Traverse the knowledge graph starting from one or more seed entities. Returns the sub-graph (nodes + edges) up to a given depth — ideal for building interaction networks, drug target mapping, or disease mechanism exploration.

TOOL biograph

Parameters

| Parameter | Type | Description |
|---|---|---|
| mode * | string | network |
| seeds * | object[] | List of seed entities, each with entity and entity_type. Max 5 seeds. |
| depth | integer | Traversal hops from seeds (1–3, default 2). Larger values return exponentially more nodes. |
| relation_types | string[] | Limit traversal to specific edge types (e.g. ["TARGETS", "ASSOCIATED_WITH"]). Default: all. |
| max_nodes | integer | Cap on returned nodes (default 200, max 500). Prevents oversized graphs. |

from collections import Counter

# Map the drug-target network around two oncogenes
result = client.tools.run(
    tool_id="biograph",
    input={
        "mode": "network",
        "seeds": [
            {"entity": "KRAS", "entity_type": "gene"},
            {"entity": "EGFR", "entity_type": "gene"},
        ],
        "depth": 2,
        "relation_types": ["TARGETS", "ASSOCIATED_WITH", "ENCODES"],
        "max_nodes": 150,
    },
)

nodes = result["nodes"]
edges = result["edges"]
print(f"Network: {len(nodes)} nodes, {len(edges)} edges")

# Count nodes by entity type
type_counts = Counter(n["entity_type"] for n in nodes)
for entity_type, count in type_counts.most_common():
    print(f"  {entity_type}: {count}")

# List the drugs found in the network
drugs = [n for n in nodes if n["entity_type"] == "drug"]
print(f"\nDrugs found: {[d['name'] for d in drugs[:10]]}")
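
Once a sub-graph is in hand, a common next step is to ask how two of its nodes are connected. The sketch below runs a breadth-first search over the returned edges, assuming each edge carries source and target node IDs (the fields used in the drug repurposing example further down). Treating edges as undirected is a simplifying assumption made here, not something Biograph prescribes.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each node ID to the set of node IDs it shares an edge with."""
    adjacency = defaultdict(set)
    for edge in edges:
        adjacency[edge["source"]].add(edge["target"])
        adjacency[edge["target"]].add(edge["source"])
    return adjacency


def shortest_path(edges, start_id, goal_id):
    """Return the node IDs on a shortest path, or None if the nodes are not connected."""
    adjacency = build_adjacency(edges)
    previous = {start_id: None}  # also serves as the visited set
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        if current == goal_id:
            # Walk the predecessor chain back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in adjacency[current]:
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None
```

For instance, shortest_path(edges, drug_id, gene_id) shows through which intermediate entities a drug in the network reaches a seed gene.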

Relationship types

Use these values in relation_types to filter traversals.

| Relationship | Source → Target | Meaning |
|---|---|---|
| ENCODES | gene → protein | Gene encodes a protein product |
| ASSOCIATED_WITH | gene / variant → disease | Genetic association from GWAS or OMIM |
| AFFECTS | variant → gene / protein | Variant has functional effect (ClinVar significance) |
| TARGETS | drug → protein / gene | Drug acts on this molecular target (DrugBank / ChEMBL) |
| TREATS | drug → disease | Approved or investigational treatment indication |
| PARTICIPATES_IN | gene / protein → pathway | Gene/protein is a member of a biological pathway |
| INTERACTS_WITH | protein ↔ protein | Physical protein-protein interaction (STRING / IntAct) |
| REGULATES | gene → gene | Transcriptional regulation relationship |
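
The table above can be transcribed into a small lookup structure to sanity-check a relation_types filter before running a traversal. This is a client-side sketch only: RELATION_SCHEMA and both helper names are hypothetical, and the schema simply mirrors the source and target columns listed here.

```python
# relationship name -> (allowed source types, allowed target types),
# transcribed from the relationship table above.
RELATION_SCHEMA = {
    "ENCODES": ({"gene"}, {"protein"}),
    "ASSOCIATED_WITH": ({"gene", "variant"}, {"disease"}),
    "AFFECTS": ({"variant"}, {"gene", "protein"}),
    "TARGETS": ({"drug"}, {"protein", "gene"}),
    "TREATS": ({"drug"}, {"disease"}),
    "PARTICIPATES_IN": ({"gene", "protein"}, {"pathway"}),
    "INTERACTS_WITH": ({"protein"}, {"protein"}),
    "REGULATES": ({"gene"}, {"gene"}),
}


def relations_touching(entity_type):
    """Relationship types that have entity_type on either end, sorted by name."""
    return sorted(
        name
        for name, (sources, targets) in RELATION_SCHEMA.items()
        if entity_type in sources or entity_type in targets
    )


def unknown_relation_types(requested):
    """Return the requested names that do not appear in the table."""
    return [name for name in requested if name not in RELATION_SCHEMA]
```

A filter that contains none of the relationships touching the seed's entity type cannot leave the seed, so relations_touching is a quick way to spot a traversal that would come back empty.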

Use cases

Drug repurposing — find drugs targeting a disease pathway

Start from a disease, traverse to its associated genes, then follow TARGETS edges to find drugs that act on those genes or their protein products. Any such drug that is not already indicated for the disease is a repurposing candidate.

# Start from a disease, discover drug repurposing candidates
result = client.tools.run(
    tool_id="biograph",
    input={
        "mode": "network",
        "seeds": [{"entity": "Pancreatic cancer", "entity_type": "disease"}],
        "depth": 3,
        "relation_types": ["ASSOCIATED_WITH", "ENCODES", "TARGETS", "TREATS"],
    },
)

# Extract drugs connected to this disease network
drugs = {n["id"]: n["name"] for n in result["nodes"] if n["entity_type"] == "drug"}

# Find edges where drugs target genes in the network
drug_targets = [
    e for e in result["edges"]
    if e["type"] == "TARGETS" and e["source"] in drugs
]

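# Index nodes by ID once so each target lookup is a dictionary access
nodes_by_id = {n["id"]: n for n in result["nodes"]}

print("Drug repurposing candidates:")
for edge in drug_targets[:10]:
    drug_name = drugs[edge["source"]]
    target_id = edge["target"]
    # Fall back to the raw ID if the target node was cut off by max_nodes
    target_name = nodes_by_id.get(target_id, {}).get("name", target_id)
    print(f"  {drug_name:25s} → targets {target_name}")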
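
The listing above prints one line per drug-target edge, so a drug with several targets appears several times. One way to condense this is to rank drugs by how many distinct targets they hit inside the disease network. The function below is a hypothetical post-processing sketch over the same nodes and edges fields; target count is a crude prioritisation heuristic assumed here for illustration, not a measure of therapeutic evidence.

```python
from collections import defaultdict


def rank_drugs_by_target_count(nodes, edges):
    """Return (drug name, number of distinct targets) pairs, most targets first."""
    drug_names = {n["id"]: n["name"] for n in nodes if n["entity_type"] == "drug"}

    # Collect the distinct target IDs for each drug.
    targets_by_drug = defaultdict(set)
    for edge in edges:
        if edge["type"] == "TARGETS" and edge["source"] in drug_names:
            targets_by_drug[edge["source"]].add(edge["target"])

    # Sort by descending target count, breaking ties alphabetically by name.
    ranked = sorted(
        targets_by_drug.items(),
        key=lambda item: (-len(item[1]), drug_names[item[0]]),
    )
    return [(drug_names[drug_id], len(targets)) for drug_id, targets in ranked]
```

Calling rank_drugs_by_target_count(result["nodes"], result["edges"]) gives a short list to review by hand; drugs with no TARGETS edge in the network are left out.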

Let the agent traverse Biograph automatically

When you use the Query endpoint, the agent will automatically use Biograph when your prompt involves relationships between biological entities — combining it with literature (PubMed), clinical (ClinVar), and structural (PDB) data as needed.

response = client.query.run(
    "What genes are associated with Alzheimer's disease and which drugs target them?",
)
# The agent queries Biograph, enriches with literature evidence from PubMed,
# and returns a structured summary with references.
print(response.answer)