Plan: Safety Guardrail Architecture (Post-MVP)
Overview
A dedicated, parallel Safety Guardrail Agent designed to monitor clinical sessions for immediate risks (self-harm, suicidal ideation) and intervene regardless of the primary assessment agent's state. This component is critical for "Duty of Care" compliance but is scoped out of the initial MVP to focus on the core scoring pipeline.
Problem Statement
General-purpose reasoning agents (like the PHQ-8 scorer) often exhibit "tunnel vision," focusing exclusively on their analytical task while missing or delaying the flagging of critical safety signals. In a clinical context, waiting for a 60-second reasoning loop to finish before flagging a suicide risk is unacceptable.
Proposed Solution
A Parallel Supervisor pattern where the Safety Agent runs asynchronously alongside the main Assessment Agent.
Architecture
graph TD
Router{Router}
subgraph "Main Flow"
Router --> Assessment[Assessment Agent]
end
subgraph "Safety Layer"
Router --> Safety[Safety Guardrail]
Safety --> |Risk Detected| Interrupt[Interrupt Signal]
end
Assessment --> Merger
Interrupt --> Merger
Merger --> Handler{Risk Handling}
Technical Approach
1. The Safety Agent Node
- Model: Uses a smaller, faster model (e.g., Llama-3-8B-Instruct or a specialized BERT classifier) optimized for classification, not reasoning.
- Prompting: Few-shot prompted specifically for:
- Suicidal Ideation (Passive vs Active)
- Self-Harm Intent
- Harm to Others
- Output: Boolean flag (risk_detected) + risk_category + evidence_snippet.
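The output contract above can be sketched as a small typed record. This is a minimal illustration using only the standard library; the plan calls for a Pydantic model, so the dataclass is a stand-in, and the names SafetyResult and RiskCategory are assumptions rather than existing project code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskCategory(str, Enum):
    # Hypothetical enum; members mirror the prompt targets listed above.
    PASSIVE_SUICIDAL_IDEATION = "passive_suicidal_ideation"
    ACTIVE_SUICIDAL_IDEATION = "active_suicidal_ideation"
    SELF_HARM_INTENT = "self_harm_intent"
    HARM_TO_OTHERS = "harm_to_others"


@dataclass(frozen=True)
class SafetyResult:
    # Hypothetical container for the three output fields named in the plan.
    risk_detected: bool
    risk_category: Optional[RiskCategory] = None
    evidence_snippet: Optional[str] = None

    def __post_init__(self) -> None:
        # A positive flag without a category or evidence is not actionable,
        # so reject it at construction time.
        if self.risk_detected and (
            self.risk_category is None or not self.evidence_snippet
        ):
            raise ValueError(
                "risk_detected=True requires risk_category and evidence_snippet"
            )
```

Rejecting an incomplete positive result at construction time keeps downstream handlers from having to guess why a session was flagged.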
2. Parallel Execution in LangGraph
- Fan-Out: The Supervisor node spawns both assessment_node and safety_node for every transcript chunk.
- Race Condition Handling:
- If safety_node returns risk_detected=True, it must trigger a NodeInterrupt or inject a high-priority state update that overrides the Assessment Agent's output.
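The fan-out and override behaviour can be illustrated without LangGraph, using plain asyncio as a stand-in. In this sketch both node bodies are placeholders (the sleeps and the substring check are assumptions for demonstration, not a real classifier), and process_chunk is a hypothetical helper name. It shows only the control flow: both tasks start together, and a positive safety result cancels the slower assessment.

```python
import asyncio
from typing import Any, Dict


async def assessment_node(chunk: str) -> Dict[str, Any]:
    # Stand-in for the slow reasoning loop.
    await asyncio.sleep(0.2)
    return {"source": "assessment", "phq8_scores": {}}


async def safety_node(chunk: str) -> Dict[str, Any]:
    # Stand-in for the fast classifier; the substring check is a placeholder.
    await asyncio.sleep(0.01)
    return {"source": "safety", "risk_detected": "hurt myself" in chunk.lower()}


async def process_chunk(chunk: str) -> Dict[str, Any]:
    # Fan-out: start both nodes concurrently for the same chunk.
    assessment = asyncio.create_task(assessment_node(chunk))
    safety = asyncio.create_task(safety_node(chunk))

    # Always wait for the safety verdict before trusting any assessment output.
    safety_result = await safety
    if safety_result["risk_detected"]:
        # Override: stop the assessment instead of waiting for it to finish.
        assessment.cancel()
        try:
            await assessment
        except asyncio.CancelledError:
            pass
        return {"is_session_halted": True, "safety": safety_result, "assessment": None}

    return {
        "is_session_halted": False,
        "safety": safety_result,
        "assessment": await assessment,
    }
```

Calling asyncio.run(process_chunk(...)) on a flagged chunk returns as soon as the safety task completes, rather than after the assessment delay. In the real graph the same effect would come from the interrupt or state-override mechanism described above.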
3. Integration Points (Post-MVP)
- State Schema:

      class ClinicalState(BaseModel):
          # ... existing fields ...
          safety_flags: List[SafetyAlert] = []
          is_session_halted: bool = False

- Transition Logic: If is_session_halted becomes True, the graph routes immediately to a "Crisis Protocol" node, bypassing all remaining PHQ-8 items.
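The transition rule can be expressed as a small routing function that checks the halt flag before anything else. This is a sketch under assumptions: the node names and the remaining_items field are hypothetical, and the state is treated as a plain mapping rather than the ClinicalState model.

```python
from typing import Any, Mapping

# Hypothetical node names, chosen for illustration only.
CRISIS_NODE = "crisis_protocol"
NEXT_ITEM_NODE = "next_phq8_item"
FINALIZE_NODE = "finalize_scores"


def route_after_merge(state: Mapping[str, Any]) -> str:
    # The halt flag is checked first so it always wins over pending work.
    if state.get("is_session_halted", False):
        return CRISIS_NODE
    # remaining_items is an assumed field listing unscored PHQ-8 items.
    if state.get("remaining_items"):
        return NEXT_ITEM_NODE
    return FINALIZE_NODE
```

A function of this shape could plausibly serve as the router for a conditional edge in the graph. Checking is_session_halted first is what lets the crisis path bypass any remaining PHQ-8 items.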
Implementation Plan
- Define Safety Schema: Create SafetyAlert Pydantic model.
- Implement Guardrail Node: Create src/helia/agent/nodes/safety.py.
- Update Graph: Modify src/helia/agent/graph.py to add the parallel edge.
- Test Scenarios: Create synthetic transcripts with hidden self-harm indicators to verify interruption works.
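The test-scenario step might start from a small harness like the one below. Everything in it is illustrative: the Scenario type, the transcripts, and stub_detector are assumptions, and the substring stub is a placeholder for the real guardrail node, not a clinical classifier.

```python
from typing import Callable, List, NamedTuple


class Scenario(NamedTuple):
    name: str
    chunks: List[str]
    expect_halt: bool


def stub_detector(chunk: str) -> bool:
    # Placeholder only; NOT a clinical classifier.
    return "hurting myself" in chunk.lower()


def halts_on(chunks: List[str], detector: Callable[[str], bool]) -> bool:
    # The session halts if any single chunk is flagged.
    return any(detector(chunk) for chunk in chunks)


SCENARIOS = [
    Scenario("benign", ["I have trouble sleeping.", "Work has been busy."], False),
    Scenario(
        "buried_indicator",
        [
            "Sleep is fine, I guess.",
            "Appetite is okay, though lately I think about hurting myself.",
            "Work is busy.",
        ],
        True,
    ),
]


def run_scenarios(detector: Callable[[str], bool]) -> List[str]:
    # Return the names of scenarios where the outcome differs from expectation.
    return [
        s.name for s in SCENARIOS if halts_on(s.chunks, detector) != s.expect_halt
    ]
```

An empty list from run_scenarios means every scenario behaved as expected. Passing a detector that never fires should report the buried_indicator scenario, which confirms the harness itself can fail.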