# Plan: Safety Guardrail Architecture (Post-MVP)

## Overview

This plan describes a dedicated **Safety Guardrail Agent** that runs in parallel with the primary assessment agent, monitors clinical sessions for immediate risks (self-harm, suicidal ideation), and intervenes regardless of the primary agent's state. The component is critical for "Duty of Care" compliance but is scoped out of the initial MVP so that the MVP can focus on the core scoring pipeline.

## Problem Statement

General-purpose reasoning agents (like the PHQ-8 scorer) often exhibit "tunnel vision": they focus exclusively on their analytical task and either miss critical safety signals or flag them late. In a clinical context, waiting for a 60-second reasoning loop to finish before flagging a suicide risk is unacceptable.

## Proposed Solution

Use a **Parallel Supervisor** pattern in which the Safety Agent runs asynchronously alongside the main Assessment Agent, so that a risk flag never has to wait for the assessment's reasoning loop to finish.

### Architecture

```mermaid
graph TD
    Router{Router}

    subgraph "Main Flow"
        Router --> Assessment[Assessment Agent]
    end

    subgraph "Safety Layer"
        Router --> Safety[Safety Guardrail]
        Safety --> |Risk Detected| Interrupt[Interrupt Signal]
    end

    Assessment --> Merger
    Interrupt --> Merger
    Merger --> Handler{Risk Handling}
```

## Technical Approach

### 1. The Safety Agent Node
*   **Model**: Uses a smaller, faster model (e.g., Llama-3-8B-Instruct or a specialized BERT classifier) optimized for classification, not reasoning.
*   **Prompting**: Few-shot prompted specifically for:
    *   Suicidal Ideation (Passive vs Active)
    *   Self-Harm Intent
    *   Harm to Others
*   **Output**: Boolean flag (`risk_detected`) + `risk_category` + `evidence_snippet`.
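
The output contract above could be sketched as a small typed record. This is a minimal illustration, not the planned implementation: the three field names come from the bullet, while the `SafetyCheckResult` and `RiskCategory` names, the enum values, and the validation rule are assumptions. A stdlib dataclass stands in for the Pydantic model so the sketch has no dependencies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskCategory(str, Enum):
    # Hypothetical values, mirroring the prompting targets listed above.
    SUICIDAL_IDEATION_PASSIVE = "suicidal_ideation_passive"
    SUICIDAL_IDEATION_ACTIVE = "suicidal_ideation_active"
    SELF_HARM_INTENT = "self_harm_intent"
    HARM_TO_OTHERS = "harm_to_others"


@dataclass(frozen=True)
class SafetyCheckResult:
    """Hypothetical return type of the safety node for one transcript chunk."""

    risk_detected: bool
    risk_category: Optional[RiskCategory] = None
    evidence_snippet: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule: a positive flag must say what was found and where,
        # otherwise a downstream handler cannot act on it.
        if self.risk_detected and (
            self.risk_category is None or not self.evidence_snippet
        ):
            raise ValueError(
                "risk_detected=True requires risk_category and evidence_snippet"
            )
        # Assumed rule: a negative result carries no category.
        if not self.risk_detected and self.risk_category is not None:
            raise ValueError("risk_category must be None when no risk is detected")
```

Rejecting inconsistent results at construction time keeps malformed classifier output from reaching the merge step silently.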

### 2. Parallel Execution in LangGraph
*   **Fan-Out**: The Supervisor node (shown as `Router` in the diagram) dispatches *both* `assessment_node` and `safety_node` for every transcript chunk.
*   **Race Condition Handling**:
    *   If `safety_node` returns `risk_detected=True`, it must either raise a **`NodeInterrupt`** or inject a high-priority state update that overrides the Assessment Agent's output, even if the Assessment Agent is still running.
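
The fan-out and override behaviour can be illustrated with plain `asyncio`. The sketch below deliberately avoids LangGraph APIs and only shows the control flow: both nodes start together, the safety verdict is awaited first, and a positive verdict cancels the assessment. The node bodies, the delays, and the substring check are placeholders and assumptions; a real safety node would call the classifier model.

```python
import asyncio
from typing import Any, Dict


async def assessment_node(chunk: str) -> Dict[str, Any]:
    # Stand-in for the slow reasoning loop (delay shortened for the sketch).
    await asyncio.sleep(0.5)
    return {"source": "assessment", "chunk": chunk}


async def safety_node(chunk: str) -> Dict[str, Any]:
    # Stand-in for the fast classifier. The substring match is a placeholder
    # only and is NOT a usable risk detector.
    await asyncio.sleep(0.01)
    return {"source": "safety", "risk_detected": "hurt myself" in chunk.lower()}


async def process_chunk(chunk: str) -> Dict[str, Any]:
    # Fan-out: start both nodes for the same transcript chunk.
    assessment = asyncio.create_task(assessment_node(chunk))
    safety = asyncio.create_task(safety_node(chunk))

    # The safety verdict is always awaited first, so an assessment result is
    # never released without one.
    safety_result = await safety
    if safety_result["risk_detected"]:
        # Override: stop the assessment instead of waiting for it to finish.
        assessment.cancel()
        await asyncio.gather(assessment, return_exceptions=True)
        return {"is_session_halted": True, "safety": safety_result, "assessment": None}

    return {
        "is_session_halted": False,
        "safety": safety_result,
        "assessment": await assessment,
    }
```

For example, `asyncio.run(process_chunk("Sometimes I want to hurt myself"))` returns with `is_session_halted` set and no assessment result, without waiting for the assessment delay to elapse.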

### 3. Integration Points (Post-MVP)
*   **State Schema**:
    ```python
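    from typing import List
    from pydantic import BaseModel, Field

    class ClinicalState(BaseModel):
        # ... existing fields ...
        safety_flags: List[SafetyAlert] = Field(default_factory=list)  # alerts raised by the safety node
        is_session_halted: bool = False  # True once a detected risk requires stopping the session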
    ```
*   **Transition Logic**:
    If `is_session_halted` becomes `True`, the graph routes immediately to a "Crisis Protocol" node, bypassing all remaining PHQ-8 items.
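
The transition rule can be written as a small routing function that maps state to the name of the next node. This is a sketch under assumptions: the node names, the `remaining_items` field, and the `ClinicalStateSketch` dataclass (a dependency-free stand-in for `ClinicalState`) are hypothetical. A function of this shape, state in and node name out, is the kind of callable a conditional edge in LangGraph could use.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical node names; the real graph may use different ones.
CRISIS_NODE = "crisis_protocol"
ASSESSMENT_NODE = "assessment"
FINALIZE_NODE = "finalize"


@dataclass
class ClinicalStateSketch:
    """Dependency-free stand-in for ClinicalState, for illustration only."""

    safety_flags: List[str] = field(default_factory=list)
    is_session_halted: bool = False
    # Assumed field: PHQ-8 items that have not been scored yet.
    remaining_items: List[str] = field(default_factory=list)


def route_next(state: ClinicalStateSketch) -> str:
    # The halt flag is checked before anything else, so a halted session
    # never proceeds to another PHQ-8 item.
    if state.is_session_halted:
        return CRISIS_NODE
    if state.remaining_items:
        return ASSESSMENT_NODE
    return FINALIZE_NODE
```

Checking the halt flag first is the design choice that matters here: the order of the branches, not the flag alone, is what guarantees the remaining items are bypassed.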

## Implementation Plan

1.  **Define Safety Schema**: Create `SafetyAlert` Pydantic model.
2.  **Implement Guardrail Node**: Create `src/helia/agent/nodes/safety.py`.
3.  **Update Graph**: Modify `src/helia/agent/graph.py` to add the parallel edge.
4.  **Test Scenarios**: Create synthetic transcripts with hidden self-harm indicators to verify interruption works.

## References
*   [EmoAgent: Assessing and Safeguarding Human-AI Interaction (2025)](https://www.semanticscholar.org/paper/110ab0beb74ffb7ab1efe55ad36b4732835fa5c9)