Use this guide when you need validation KPIs outside the Speckle web app—for example custom dashboards, CI gates, or scheduled reports. You will use SpecklePy to run GraphQL queries with SpeckleClient and execute_query, fetch saved Data Validation checks, read pass/fail summaries, and optionally flatten rule results into a pandas DataFrame. The Python examples are split into notebook cells so you can run them in Jupyter or save as a script. In GraphQL, checks are queried with type: "model_validation". This guide is GraphQL-only and read-only. For what checks and results mean in the product UI, see Data Validation overview. For authentication and the /graphql endpoint, see GraphQL API.
Quick start: Run Step 4 cells on this page, or download the notebook from GitHub. Skim Find your IDs first. If you already have check or model IDs, start at Step 2 or Step 3.

Prerequisites

  • A personal access token with streams:read scope (Building with PATs)
  • Read access to a project that has at least one saved validation check
  • Python 3.10+ with specklepy, pandas, and python-dotenv
  • JupyterLab or VS Code with a Python kernel (optional — the same cells run as a .py script)
Python examples authenticate with SpeckleClient.authenticate_with_token. SpecklePy attaches the Bearer token to GraphQL requests — you do not need to set headers manually. This flow uses standard project read access and streams:read; you do not need server admin permissions or write scopes to consume saved results.
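For clients that do not use SpecklePy, the equivalent request can be assembled by hand. The sketch below builds (but does not send) a POST to /graphql carrying a Bearer token, using only the standard library. The helper name build_graphql_request and the placeholder token and project ID are illustrative assumptions, not part of any Speckle SDK.

```python
import json
import urllib.request


def build_graphql_request(
    host: str, token: str, query: str, variables: dict
) -> urllib.request.Request:
    """Assemble a POST /graphql request with a Bearer token (hypothetical helper)."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/graphql",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


QUERY = """
query DiscoverValidationChecks($projectId: String!) {
  projectInsights(projectId: $projectId, type: "model_validation") { id name }
}
"""

# Placeholder values; substitute your own host, token, and project ID.
request = build_graphql_request(
    "https://app.speckle.systems",
    "your_personal_access_token",
    QUERY,
    {"projectId": "PROJECT_ID"},
)
# To send it, pass `request` to urllib.request.urlopen and JSON-decode the body.
```

With SpecklePy none of this is needed; the sketch only makes explicit what the SDK does when it attaches the token for you.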
On self-hosted Speckle Enterprise Server, Intelligence and Data Validation may require the Intelligence feature flag. See Enterprise deployment — Intelligence for setup; this page does not reproduce Helm configuration.

Find your IDs

Use one consistent token set across this guide. Copy the segment after each path prefix from your browser URL, or read id fields from the discovery query below. Substitute your real IDs wherever you see PROJECT_ID, INSIGHT_ID, MODEL_ID, or VERSION_ID.
ID | Token | Full URL pattern
projectId | PROJECT_ID | https://app.speckle.systems/projects/{PROJECT_ID}/data-validation
Check ID (insightId) | INSIGHT_ID | https://app.speckle.systems/projects/{PROJECT_ID}/data-validation/{INSIGHT_ID}/
modelId | MODEL_ID | https://app.speckle.systems/projects/{PROJECT_ID}/models/{MODEL_ID}
versionId | VERSION_ID | https://app.speckle.systems/projects/{PROJECT_ID}/models/{MODEL_ID}@{VERSION_ID} or latestResults[].versionId
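If you copy IDs from browser URLs often, the URL patterns above can be parsed mechanically. This is a minimal sketch that assumes the path layouts listed in the table; the function name ids_from_speckle_url and the sample IDs are hypothetical.

```python
import re
from urllib.parse import urlparse


def ids_from_speckle_url(url: str) -> dict:
    """Extract whichever of the four IDs appear in a web app URL (hypothetical helper)."""
    path = urlparse(url).path
    ids = {}

    project = re.search(r"/projects/([^/]+)", path)
    if project:
        ids["PROJECT_ID"] = project.group(1)

    # The check ID is the segment after /data-validation/.
    check = re.search(r"/data-validation/([^/]+)", path)
    if check:
        ids["INSIGHT_ID"] = check.group(1)

    # Model URLs may carry an optional @VERSION_ID suffix.
    model = re.search(r"/models/([^/@]+)(?:@([^/]+))?", path)
    if model:
        ids["MODEL_ID"] = model.group(1)
        if model.group(2):
            ids["VERSION_ID"] = model.group(2)

    return ids


# Made-up IDs for illustration only.
check_ids = ids_from_speckle_url(
    "https://app.speckle.systems/projects/abc123/data-validation/chk42/"
)
version_ids = ids_from_speckle_url(
    "https://app.speckle.systems/projects/abc123/models/m1@v9"
)
```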
Replace tokens with IDs from your project when you run queries against live data.
POST /graphql
query DiscoverValidationChecks($projectId: String!) {
  projectInsights(projectId: $projectId, type: "model_validation") {
    id
    name
  }
}
You should see… a JSON projectInsights array; each item’s id is the check ID (insightId in GraphQL) for later steps. Run the query in Apollo Studio to explore the schema, or from Python using the Step 4 cells.

Overview

The read-only flow:
  1. List validation checks. Call projectInsights with type: "model_validation". Read each check’s latest KPI from the summary of aggregateResults(limit: 1); omit result here.
  2. Fetch check history. Open one check by insightId. Use aggregateResults for score history and latestResults for the newest result per tracked model.
  3. Load model results. Call modelResults(modelId, limit) when you need version history for one model. Request the result field only in this step, because it is the heaviest payload.
  4. Build a KPI DataFrame. Transform aggregate summaries into a pandas table for dashboards, exports, or scheduled reporting.
Use aggregate summary for dashboards and score history. Request result only when you need per-rule rows.
For dashboard KPIs, use aggregateResults(limit: 1) and omit result from your selection set until Step 3. If you already know INSIGHT_ID or MODEL_ID, each step notes what you can skip.

Step 1: List validation checks

List saved checks and the latest aggregate pass/fail counts for each.
Only have MODEL_ID? After this query, keep checks whose modelIds array includes your model. Use that check’s id as INSIGHT_ID in Step 2 or 3. A model can appear in multiple checks — pick the one you care about.
POST /graphql
query ProjectValidationChecks($projectId: String!) {
  projectInsights(projectId: $projectId, type: "model_validation") {
    id
    name
    modelIds
    metadata
    updatedAt
    aggregateResults(limit: 1) {
      id
      timestamp
      summary
    }
  }
}
You should see… one or more checks with name, modelIds, and aggregateResults[0].summary containing numeric pass and fail.
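When you start from a MODEL_ID, the filtering described above is a one-line list comprehension. The sketch below runs on a hand-written sample shaped like the Step 1 response; the IDs, names, and counts are made up, and checks_for_model is a hypothetical helper.

```python
def checks_for_model(checks: list, model_id: str) -> list:
    """Keep only the checks whose modelIds array includes model_id."""
    return [c for c in checks if model_id in (c.get("modelIds") or [])]


# Hypothetical payload mirroring the fields selected in the Step 1 query.
sample_response = {
    "projectInsights": [
        {"id": "chk1", "name": "Rooms", "modelIds": ["m1", "m2"],
         "aggregateResults": [{"summary": {"pass": 90, "fail": 10}}]},
        {"id": "chk2", "name": "Doors", "modelIds": ["m3"],
         "aggregateResults": []},
    ]
}

matches = checks_for_model(sample_response["projectInsights"], "m2")
insight_ids = [c["id"] for c in matches]  # candidates for INSIGHT_ID
```

Because a model can belong to several checks, the helper returns a list rather than a single check; choosing among the candidates is left to the caller.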

Step 2: Fetch check history

Load aggregate history and per-model latest summaries for one check. Do not request result yet. If you already have INSIGHT_ID from the check URL or your environment, skip Step 1.
POST /graphql
query ValidationCheckHistory($projectId: String!, $insightId: String!) {
  insight(id: $insightId, projectId: $projectId) {
    id
    name
    metadata
    aggregateResults(limit: 20) {
      id
      timestamp
      summary
    }
    latestResults {
      id
      modelId
      versionId
      timestamp
      summary
    }
  }
}
You should see… up to 20 historical aggregate rows (summary, timestamp) plus latestResults with one entry per tracked model. If you also know MODEL_ID, pick the latestResults row where modelId matches — that gives pass/fail for the newest stored result on that model without fetching result. Derive score and status from summary and metadata.displayConfig; see Compute KPI score and status.
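Picking the latestResults row for a known model can be sketched as follows. The payload is a made-up sample with the same fields as the query above, and latest_summary_for_model is a hypothetical helper name.

```python
from typing import Optional


def latest_summary_for_model(insight: dict, model_id: str) -> Optional[dict]:
    """Return the summary of the newest stored result for model_id, if any."""
    for row in insight.get("latestResults") or []:
        if row.get("modelId") == model_id:
            return row.get("summary")
    return None


# Hypothetical `insight` object as returned by the Step 2 query.
sample_insight = {
    "id": "chk1",
    "latestResults": [
        {"modelId": "m1", "versionId": "v7", "summary": {"pass": 40, "fail": 10}},
        {"modelId": "m2", "versionId": "v3", "summary": {"pass": 5, "fail": 0}},
    ],
}

summary = latest_summary_for_model(sample_insight, "m1")
```

A None return means the model has no stored result in this check, which matches the empty-response case described in Step 3.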

Step 3: Load model results

Request stored results for one model. Results are ordered newest first. If you already have PROJECT_ID, INSIGHT_ID, and MODEL_ID (common for CI gates or post-publish scripts), skip Steps 1–2. modelResults and versionResults require the model to appear in the check’s modelIds (visible in Step 1); otherwise responses are empty.

Version history

Use when you need multiple snapshots for one model, or the full result payload for rule breakdown.
POST /graphql
query ModelValidationResults(
  $projectId: String!
  $insightId: String!
  $modelId: String!
  $limit: Int
) {
  insight(id: $insightId, projectId: $projectId) {
    id
    modelResults(modelId: $modelId, limit: $limit) {
      id
      versionId
      timestamp
      summary
      result
    }
  }
}
You should see… modelResults ordered newest-first, each with versionId, summary, and a result object with columns and rows.

One version snapshot

Use when you also know VERSION_ID, either from the model version URL (.../models/{MODEL_ID}@{VERSION_ID}) or from a publish webhook, and want that snapshot only.
POST /graphql
query ModelVersionValidationResults(
  $projectId: String!
  $insightId: String!
  $modelId: String!
  $versionId: String!
) {
  insight(id: $insightId, projectId: $projectId) {
    id
    versionResults(modelId: $modelId, versionId: $versionId) {
      id
      versionId
      timestamp
      summary
      result
    }
  }
}
Omit result from the selection set for summary-only CI gates. Add it when you need per-rule rows. You should see… a (usually single-item) versionResults array for that version, or an empty array if validation has not run yet — see Wait for new results.
Single-model pipeline .env example:
SPECKLE_PROJECT_ID=your_project_id
SPECKLE_INSIGHT_ID=your_check_id
SPECKLE_MODEL_ID=your_model_id
INSIGHT_ID is the segment after /data-validation/ in the check URL.
In Step 3, result can be large and may include object IDs. Fetch one model at a time, keep limit small, and treat the payload as sensitive project data.
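A summary-only CI gate built on versionResults might look like the sketch below. The exit-code convention (0 pass, 1 fail, 2 no result yet) and the function name gate_exit_code are assumptions for illustration; the threshold default mirrors the passThreshold default of 0.9 used elsewhere in this guide.

```python
def gate_exit_code(version_results: list, pass_threshold: float = 0.9) -> int:
    """Map a versionResults array to a process exit code (assumed convention).

    0 = pass rate meets the threshold
    1 = pass rate is below the threshold
    2 = no result stored yet (validation still running, or nothing evaluated)
    """
    if not version_results:
        return 2
    summary = version_results[0].get("summary") or {}
    passed = summary.get("pass", 0) or 0
    failed = summary.get("fail", 0) or 0
    total = passed + failed
    if total == 0:
        return 2
    return 0 if passed / total >= pass_threshold else 1


# Hypothetical responses; `result` is deliberately not requested for a gate.
ok_code = gate_exit_code([{"versionId": "v9", "summary": {"pass": 95, "fail": 5}}])
bad_code = gate_exit_code([{"versionId": "v9", "summary": {"pass": 87, "fail": 13}}])
pending_code = gate_exit_code([])
```

In a pipeline script you would end with sys.exit(code), and treat 2 as a signal to poll again rather than as a failure.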

Step 4: Build a KPI DataFrame

Run the Python below as notebook cells or concatenate into a script. It authenticates with SpecklePy, runs the Step 1 query through the SDK GraphQL handler, and builds a KPI table. Helpers for score and status are in Extend the examples — start with Compute KPI score and status. Queries with GraphQL variables use the same authenticated client as execute_query, with variable_values passed to the underlying GraphQL client. Create a .env file next to your notebook or script:
SPECKLE_HOST=https://app.speckle.systems
SPECKLE_TOKEN=your_personal_access_token
SPECKLE_PROJECT_ID=your_project_id
Cell 1 — install dependencies (notebook only):
%pip install -q specklepy pandas python-dotenv
Cell 2 — imports:
import os
from collections import defaultdict

import pandas as pd
from dotenv import load_dotenv
from gql import gql
from specklepy.api.client import SpeckleClient

load_dotenv()
Cell 3 — authenticate:
HOST = os.getenv("SPECKLE_HOST", "https://app.speckle.systems")
TOKEN = os.getenv("SPECKLE_TOKEN")
PROJECT_ID = os.getenv("SPECKLE_PROJECT_ID")

if not TOKEN:
    raise ValueError("Set SPECKLE_TOKEN in your environment.")
if not PROJECT_ID:
    raise ValueError("Set SPECKLE_PROJECT_ID in your environment.")

client = SpeckleClient(host=HOST)
client.authenticate_with_token(TOKEN)
Cell 4 — query handler and KPI helpers:
DEFAULT_DISPLAY_CONFIG = {
    "passThreshold": 0.9,
    "warningThreshold": None,
    "rulePassThreshold": {},
    "ruleWarningThreshold": {},
    "ruleSeverity": {},
}

LIST_CHECKS_QUERY = gql("""
query ProjectValidationChecks($projectId: String!) {
  projectInsights(projectId: $projectId, type: "model_validation") {
    id
    name
    metadata
    aggregateResults(limit: 1) {
      timestamp
      summary
    }
  }
}
""")


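def run_query(query, variables: dict | None = None) -> dict:
    # Send every pre-parsed gql document through the underlying gql client;
    # variable_values=None is valid for queries that declare no variables.
    return client.httpclient.execute(query, variable_values=variables or None)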


def resolve_thresholds(display_config: dict, rule_name: str | None = None) -> dict:
    cfg = {**DEFAULT_DISPLAY_CONFIG, **(display_config or {})}
    if rule_name:
        pass_t = cfg["rulePassThreshold"].get(rule_name, cfg["passThreshold"])
        warn_map = cfg.get("ruleWarningThreshold") or {}
        warn_t = warn_map[rule_name] if rule_name in warn_map else cfg.get("warningThreshold")
        severity = cfg.get("ruleSeverity", {}).get(rule_name, "error")
        return {"passThreshold": pass_t, "warningThreshold": warn_t, "severity": severity}
    return {
        "passThreshold": cfg["passThreshold"],
        "warningThreshold": cfg.get("warningThreshold"),
        "severity": "error",
    }


def compute_pass_rate(summary: dict) -> float | None:
    pass_n = summary.get("pass", 0) or 0
    fail_n = summary.get("fail", 0) or 0
    total = pass_n + fail_n
    if total == 0:
        return None
    return pass_n / total


def compute_score_pct(summary: dict) -> int | None:
    rate = compute_pass_rate(summary)
    if rate is None:
        return None
    return round(rate * 100)


def compute_status(
    pass_rate: float | None,
    display_config: dict,
    rule_name: str | None = None,
) -> str:
    if pass_rate is None:
        return "na"
    thresholds = resolve_thresholds(display_config, rule_name)
    if rule_name and thresholds.get("severity") == "info":
        return "info"
    pass_t = thresholds["passThreshold"]
    warn_t = thresholds.get("warningThreshold")
    if pass_rate >= pass_t:
        return "pass"
    if warn_t is not None and pass_rate >= warn_t:
        return "warning"
    return "fail"


def checks_to_kpi_df(checks: list[dict]) -> pd.DataFrame:
    rows = []
    for check in checks:
        agg_list = check.get("aggregateResults") or []
        agg = agg_list[0] if agg_list else None
        summary = (agg or {}).get("summary") or {}
        metadata = check.get("metadata") or {}
        display_config = metadata.get("displayConfig") or {}
        pass_rate = compute_pass_rate(summary)
        rows.append(
            {
                "name": check.get("name"),
                "insight_id": check.get("id"),
                "pass": summary.get("pass", 0),
                "fail": summary.get("fail", 0),
                "score_pct": compute_score_pct(summary),
                "status": compute_status(pass_rate, display_config),
                "evaluated_at": (agg or {}).get("timestamp"),
            }
        )
    return pd.DataFrame(rows)
Cell 5 — list checks and show the KPI table:
result = run_query(LIST_CHECKS_QUERY, {"projectId": PROJECT_ID})
checks = result.get("projectInsights") or []
kpi_df = checks_to_kpi_df(checks)
kpi_df
In Jupyter, the last line renders the table. In a script, use print(kpi_df.to_string(index=False)). You should see… one row per check with columns name, score_pct, status, pass, fail, and evaluated_at. For related GraphQL and Python patterns, see SpecklePy model data analytics.

Notebook

Download the notebook. Save the file locally, add your .env in the same folder, and run top to bottom.

Extend the examples

Optional depth after the main flow: how scores map to the UI, rule-level DataFrames, and polling when results are still processing.

Compute KPI score and status

The web app score is not returned pre-computed. Derive it from summary and metadata.displayConfig:
  1. Read summary.pass and summary.fail from the latest aggregateResults[0].
  2. Set total = pass + fail. If total == 0, status is pending / no data (na).
  3. Set pass_rate = pass / total (a decimal between 0 and 1).
  4. Set score_pct = round(pass_rate * 100) — the large percentage on check cards.
  5. Load thresholds from metadata.displayConfig on each check. Defaults when absent: passThreshold: 0.9, optional warningThreshold.
  6. Apply status rules:
    • pass_rate >= passThreshold → pass
    • else if warningThreshold is set and pass_rate >= warningThreshold → warning
    • else → fail
  7. Per-rule status uses the same logic on each rule’s pass/fail ratio.
Worked example: pass=870, fail=130 → pass_rate=0.87 → score_pct=87. With the project-wide passThreshold=0.9 the rate is below the pass threshold, so status is warning when a warningThreshold at or below 0.87 is set (for example 0.7) and fail when no warningThreshold is set. The same counts yield pass for a rule with rulePassThreshold of 0.85.
Thresholds are stored in metadata.displayConfig on each check (returned on projectInsights / insight queries). They are set in the Data Validation UI. Consumers read the stored config; updating checks via API is out of scope for this guide.
Key | Scope | Example
passThreshold | Project-wide default (0–1) | 0.9
warningThreshold | Project-wide optional band | 0.7
rulePassThreshold | Per-rule override map | { "Room name required": 0.95 }
ruleWarningThreshold | Per-rule warn override | { "Room name required": 0.8 }
ruleSeverity | Per-rule advisory vs error | { "Optional note": "info" }
For PASS, WARN, and FAIL meaning in the UI, see Viewing results — result states. For threshold tuning in the product, see Checks — thresholds and status.
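The scoring steps can be checked by hand with a few lines of plain Python. This is a standalone sketch of the rules listed in this section, using the 870 / 130 counts from the worked example; score_and_status is a hypothetical name and is separate from the Step 4 helpers.

```python
from typing import Optional, Tuple


def score_and_status(
    pass_n: int,
    fail_n: int,
    pass_threshold: float = 0.9,
    warning_threshold: Optional[float] = None,
) -> Tuple[Optional[int], str]:
    """Return (score_pct, status) following the steps listed above."""
    total = pass_n + fail_n
    if total == 0:
        return None, "na"  # pending / no data
    pass_rate = pass_n / total
    score_pct = round(pass_rate * 100)
    if pass_rate >= pass_threshold:
        return score_pct, "pass"
    if warning_threshold is not None and pass_rate >= warning_threshold:
        return score_pct, "warning"
    return score_pct, "fail"


# 870 passes and 130 fails give a pass rate of 0.87.
no_band = score_and_status(870, 130)                            # no warning band set
with_band = score_and_status(870, 130, warning_threshold=0.7)   # band from the table
rule_override = score_and_status(870, 130, pass_threshold=0.85) # per-rule threshold
```

The three calls show how the same counts land in different states depending on which thresholds apply.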

Build a rule breakdown DataFrame

The result field is a tabular JSON object with columns and rows. Validation rows use dimensions rule, status, and gate, plus measure count.
def query_result_to_rules_df(result: dict, display_config: dict) -> pd.DataFrame:
    rule_entries: dict[str, list[dict]] = defaultdict(list)
    for row in result.get("rows", []):
        vals = row.get("values", {})
        rule_name = vals.get("rule")
        status = vals.get("status")
        if rule_name == "_total" or status not in ("pass", "fail"):
            continue
        rule_entries[rule_name].append(vals)

    out = []
    for rule_name, entries in rule_entries.items():
        max_gate = max(int(e.get("gate", 0)) for e in entries)
        pass_n = sum(
            int(e.get("count", 0))
            for e in entries
            if e.get("status") == "pass" and int(e.get("gate", 0)) == max_gate
        )
        fail_n = sum(
            int(e.get("count", 0))
            for e in entries
            if e.get("status") == "fail" and int(e.get("gate", 0)) == max_gate
        )
        total = pass_n + fail_n
        ratio = pass_n / total if total else 0.0
        out.append(
            {
                "rule": rule_name,
                "pass": pass_n,
                "fail": fail_n,
                "ratio": ratio,
                "status": compute_status(ratio if total else None, display_config, rule_name),
            }
        )
    return pd.DataFrame(out)


def aggregate_history_to_df(aggregate_results: list[dict], display_config: dict) -> pd.DataFrame:
    rows = []
    for entry in reversed(aggregate_results):
        summary = entry.get("summary") or {}
        rate = compute_pass_rate(summary)
        rows.append(
            {
                "timestamp": entry.get("timestamp"),
                "pass": summary.get("pass", 0),
                "fail": summary.get("fail", 0),
                "score_pct": compute_score_pct(summary),
                "status": compute_status(rate, display_config),
            }
        )
    return pd.DataFrame(rows)
aggregate_results from the API is newest-first; reverse for chronological charts.

Wait for new results

Results are computed asynchronously after a new model version is published. Poll until data appears, then stop.
Situation | Suggested interval | Stop when | Max wait guidance
New check or new version pushed | 15 seconds | aggregateResults has at least one row | ~10 minutes; then investigate
Waiting for aggregate rollup | 30 seconds | aggregate non-empty | ~15 minutes for large checks
Steady-state monitoring | No polling | Query on a schedule instead |
import time
from gql import gql
from specklepy.api.client import SpeckleClient

WAIT_FOR_AGGREGATE_QUERY = gql("""
query WaitForAggregate($projectId: String!, $insightId: String!) {
  insight(id: $insightId, projectId: $projectId) {
    aggregateResults(limit: 1) {
      timestamp
      summary
    }
  }
}
""")


def wait_for_aggregate_result(
    client: SpeckleClient,
    project_id: str,
    insight_id: str,
    interval_sec: float = 15.0,
    max_attempts: int = 40,
) -> dict | None:
    variables = {"projectId": project_id, "insightId": insight_id}
    for _ in range(max_attempts):
        result = client.httpclient.execute(WAIT_FOR_AGGREGATE_QUERY, variable_values=variables)
        results = (result.get("insight") or {}).get("aggregateResults") or []
        if results:
            return results[0]
        time.sleep(interval_sec)
    return None
Do not poll faster than every 15 seconds. For production dashboards, prefer scheduled queries over tight polling loops.
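To translate the wait guidance into loop parameters, divide the maximum wait by the polling interval. The helper below is a small sketch of that arithmetic; max_attempts_for is a hypothetical name, and the 15-second floor simply enforces the guidance above.

```python
import math


def max_attempts_for(max_wait_sec: float, interval_sec: float = 15.0) -> int:
    """Number of polls needed to cover max_wait_sec at the given interval."""
    if interval_sec < 15:
        raise ValueError("Do not poll faster than every 15 seconds.")
    return math.ceil(max_wait_sec / interval_sec)


new_version_attempts = max_attempts_for(10 * 60, interval_sec=15)  # ~10 minutes
rollup_attempts = max_attempts_for(15 * 60, interval_sec=30)       # ~15 minutes
```

The first value matches the max_attempts default in wait_for_aggregate_result; pass the second together with interval_sec=30 when waiting on a large aggregate rollup.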

Troubleshooting

Common errors

Error / symptom | Likely cause | What to do
"Insights module is not enabled on this server" | Intelligence feature disabled on self-hosted | Enable per Enterprise deployment
401 / auth errors | Missing or invalid token | Call authenticate_with_token with a valid PAT
403 / forbidden | No project read access, or token scoped to a different project | Verify streams:read and project access
Empty modelResults / versionResults | Model not in check modelIds or no run yet | Confirm MODEL_ID is in the check’s modelIds; poll as in Wait for new results
Empty aggregateResults | Check still processing or no models attached | Poll as in Wait for new results; confirm modelIds
Empty projectInsights | Wrong projectId or no checks saved | Run the Find your IDs discovery query; confirm checks in UI
GraphQL errors array | Malformed query or wrong variable types | Match variables to $projectId: String! and other declared types
Slow / timeout responses | Requesting result for many models at once | Drop result; reduce limit; fetch one model at a time
HTTP 429 Too Many Requests | Rate limit (common when polling too fast) | Poll no faster than every 15 seconds, or use scheduled queries instead
Score differs from UI | Threshold overrides or per-rule severity | Read metadata.displayConfig; use resolve_thresholds(..., rule_name=...)
Scoped tokens must include the target projectId.

Verify your notebook or script

After running Step 4:
  • The DataFrame has one row per check with score_pct, status, and evaluated_at.
  • Re-run with your project’s projectId; row count matches the number of checks in the UI.
  • If the table is empty, see Common errors above.

What developers need to know

Copy the segment after /projects/ in the web app URL — that value is your PROJECT_ID in https://app.speckle.systems/projects/{PROJECT_ID}/. You can also list projects via SpecklePy or GraphQL. Check IDs come from the discovery query (projectInsights[].id).
aggregateResults returns project-wide rollup history for a check (use limit: 1 for the latest KPI). latestResults returns the newest stored result per tracked model (excludes the aggregate row). modelResults(modelId, limit) returns version history for one model, newest first. If you know MODEL_ID already, use Step 1 to resolve INSIGHT_ID, Step 2 for latest summary per model, or Step 3 for history or a single version snapshot.
If aggregateResults is empty, the check may still be processing after a version push, or no models are attached. Poll every 15–30 seconds for up to 10–15 minutes. Confirm the check lists modelIds and those models have published versions.
Copy the Step 4 cells into your own Jupyter project, or download the notebook from GitHub. Authenticate a SpeckleClient with authenticate_with_token, wrap each operation in gql(), then call execute_query or pass variable_values to the same authenticated GraphQL client when the operation declares variables.
See GraphQL API and the Apollo Studio reference for the full schema, including projectInsights, insight, and related fields.
Out of scope for this guide:
  • Creating or updating checks (insightMutations.create, update, delete)
  • Ad-hoc validation without saving (executeQuery, executeVersionQuery)
  • Webhooks or subscriptions when a result is ready (poll instead)
  • REST endpoints for validation results
  • Authoring EAV query or rule DSL from scratch
  • Linking validation failures to Speckle issues (syncValidationIssues)
Last modified on July 1, 2026