ScopeDriftCheck¶
What it does¶
INSPECTION-stage check that catches three drift categories on streaming responses:
- Token-count drift — output exceeds
max_tokens × char_per_token_estimate × (1 + tolerance). - Character-length drift — raw output exceeds an absolute character cap.
- Classification-marker drift — output contains a marker
(e.g.
SECRET//NOFORN,(TS)) that implies a higher classification than the request'sX-Classificationdeclared.
When any drift fires, the stream is aborted mid-flight; the caller receives the chunks delivered so far plus a trailer event naming the check. The audit row records exactly which marker / cap fired.
Stage¶
INSPECTION.
Configuration¶
from signet.checks import ScopeDriftCheck
# Defaults: USG marker table, 10% over-shoot tolerance
ScopeDriftCheck()
# Custom marker table for non-USG classification systems
ScopeDriftCheck(markers={
"INTERNAL_ONLY": 1,
"CONFIDENTIAL": 2,
"RESTRICTED": 3,
})
# Disable marker check (token+length drift only)
ScopeDriftCheck(markers={})
# Tighter token-budget tolerance
ScopeDriftCheck(token_tolerance=0.0) # zero overshoot
Default marker table¶
{
# SECRET (level 2)
"SECRET//NOFORN": 2, "SECRET//REL": 2, "(S)": 2, "(S//NF)": 2,
# TS / TS/SCI (levels 3-4)
"TOP SECRET//SCI": 4, "TS//SCI": 4, "(TS)": 3, "(TS//SCI)": 4,
# CUI (level 1)
"CUI//": 1, "FOUO": 1,
}
Override the entire table via the markers constructor argument.
Audit row example¶
{
"check_name": "scope_drift",
"decision": "block",
"reason": "classification marker 'SECRET//NOFORN' (level 2) exceeds request level 0 (UNCLASS)",
"metadata": {
"marker": "SECRET//NOFORN",
"marker_level": 2,
"request_level": 0,
"drift_kind": "classification_marker"
}
}
For token/length drift:
{
"check_name": "scope_drift",
"decision": "block",
"reason": "output exceeds 200 max_tokens × 4 chars × 1.10 tolerance = 880 chars",
"metadata": {
"drift_kind": "token_count",
"max_tokens": 200,
"tolerance": 0.10,
"actual_chars": 942
}
}
False-positive surface¶
Marker matching is literal-substring. A model explaining "the SECRET//NOFORN handling rules are…" in legitimately UNCLASS training material will trip the check. Mitigations:
- Override the marker table to only include strings your domain doesn't use benignly.
- Set
markers={}to disable classification drift entirely if your deployment doesn't use classification. - Move the check to RECORD-stage (audit-only) for low-stakes deployments — drift is logged but doesn't abort the stream.