This commit is contained in:
Santiago Martinez-Avial
2025-12-23 13:35:15 +01:00
parent a9346ccb34
commit 5ce6d7e1d3
12 changed files with 734 additions and 22 deletions


@@ -17,7 +17,7 @@ A **Hierarchical Agent Supervisor** architecture built with **LangGraph**:
* **Extract**: Quote relevant patient text.
* **Map**: Align quotes to PHQ-8 criteria.
* **Score**: Assign a 0-3 severity value.
- 3. **Ingestion**: Standardizes data from S3/Local into a `ClinicalState`.
+ 3. **Ingestion**: Standardizes data from MongoDB into a `ClinicalState`.
4. **Benchmarking**: Automates the comparison of generated scores against ground truth (DAIC-WOZ labels).
**Note:** A dedicated **Safety Guardrail** agent has been designed but is scoped out of this MVP. See `plans/safety-guardrail-architecture.md` for details.
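The Extract → Map → Score decomposition above can be sketched as three small functions over a transcript. This is an illustrative stand-in, not the project's implementation: the `Quote` structure, the `Participant:` speaker convention, and the stubbed return values are all assumptions (the real stages would be LLM calls).

```python
from dataclasses import dataclass

@dataclass
class Quote:
    text: str        # verbatim patient utterance
    phq8_item: int   # PHQ-8 criterion (1-8) the quote supports
    score: int = -1  # 0-3 severity, filled in by the Score stage

def extract(transcript: str) -> list[str]:
    # Stage 1: pull candidate patient quotes. Stubbed here as keeping
    # participant lines; in practice this is a model call.
    return [ln.split(":", 1)[1].strip()
            for ln in transcript.splitlines()
            if ln.startswith("Participant:")]

def map_to_criteria(quotes: list[str]) -> list[Quote]:
    # Stage 2: align each quote to a PHQ-8 item (stubbed to item 1).
    return [Quote(text=q, phq8_item=1) for q in quotes]

def score(mapped: list[Quote]) -> dict[int, int]:
    # Stage 3: assign a 0-3 value per item (stubbed to 0).
    for q in mapped:
        q.score = 0
    return {q.phq8_item: q.score for q in mapped}
```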
@@ -50,20 +50,20 @@ graph TD
* **Deliverables**:
* `src/helia/agent/state.py`: Define `ClinicalState` (transcript, current_item, scores).
* `src/helia/agent/graph.py`: Define the main `StateGraph` with Ingestion -> Assessment -> Persistence nodes.
- * `src/helia/ingestion/loader.py`: Add "Ground Truth" loading for DAIC-WOZ labels (critical for benchmarking).
+ * `src/helia/ingestion/loader.py`: Refactor to load Transcript documents from MongoDB.
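One way the Phase 1 `ClinicalState` could look, given the fields the plan names (transcript, current_item, scores); the `ground_truth` field and the node body are assumptions added for illustration. LangGraph nodes are plain callables that take the state and return a partial state update:

```python
from typing import TypedDict

class ClinicalState(TypedDict, total=False):
    transcript: str               # normalized interview text
    current_item: int             # PHQ-8 item currently being assessed (1-8)
    scores: dict[int, int]        # item -> assigned 0-3 score
    ground_truth: dict[int, int]  # DAIC-WOZ labels, for benchmarking

def ingestion_node(state: ClinicalState) -> ClinicalState:
    # In the real pipeline this would load a Transcript document from
    # MongoDB; here it just normalizes whatever transcript is present
    # and positions the assessment at the first PHQ-8 item.
    return {"transcript": state.get("transcript", "").strip(),
            "current_item": 1}
```

The Ingestion → Assessment → Persistence wiring in `graph.py` would then register such callables as nodes on a `StateGraph(ClinicalState)`.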
#### Phase 2: The "RISEN" Assessment Logic
* **Goal**: Replace monolithic `PHQ8Evaluator` with granular nodes.
* **Deliverables**:
- * `src/helia/agent/nodes/assessment.py`: Implement `extract_node`, `map_node`, `score_node`.
- * `src/helia/prompts/`: Create specialized prompt templates for each stage (optimized for Llama 3).
+ * `src/helia/agent/nodes/assessment.py`: Implement `extract_node`, `map_node`, `score_node` that fetch prompts from DB.
+ * `migrations/init_risen_prompts.py`: Database migration to seed the Extract/Map/Score prompts.
* **Refactor**: Update `PHQ8Evaluator` to be callable as a tool/node rather than a standalone class.
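A hedged sketch of how a node might fetch its prompt from the database rather than a static template. The prompt keys, collection shape, and in-memory store are assumptions standing in for the MongoDB collection that `init_risen_prompts.py` would seed:

```python
# Stand-in for a MongoDB "prompts" collection seeded by the migration
# (keys and text are illustrative, not from the codebase).
PROMPT_STORE = {
    "risen/extract": "Quote all patient statements relevant to PHQ-8.",
}

def get_prompt(key: str) -> str:
    # Real code would query MongoDB, e.g. db.prompts.find_one({"key": key}).
    try:
        return PROMPT_STORE[key]
    except KeyError:
        raise LookupError(f"Prompt {key!r} not seeded; run the migration.")

def extract_node(state: dict) -> dict:
    prompt = get_prompt("risen/extract")
    # ... invoke the model with `prompt` + state["transcript"] ...
    return {"quotes": []}  # placeholder for model output
```

Failing loudly when a prompt is missing makes a skipped migration an immediate, diagnosable error rather than a silent bad assessment.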
#### Phase 3: Tier Switching & Execution
* **Goal**: Implement dynamic model config.
* **Deliverables**:
* `src/helia/configuration.py`: Ensure `RunConfig` (Tier 1/2/3) propagates to LangGraph `configurable` params.
- * `src/helia/agent/runner.py`: CLI entry point to run batch benchmarks.
+ * `src/helia/agent/runner.py`: CLI entry point to run batch benchmarks using MongoDB transcripts.
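How the tier propagation in Phase 3 might look: LangGraph passes a `config` dict whose `"configurable"` entry carries per-run values down to every node. The `RunConfig` fields and tier-to-model mapping below are assumptions for illustration:

```python
from dataclasses import dataclass

# Assumed tier -> model mapping; the real values live in configuration.py.
TIER_MODELS = {1: "llama3:8b", 2: "llama3:70b", 3: "gpt-4o"}

@dataclass
class RunConfig:
    tier: int = 1

def to_langgraph_config(run_config: RunConfig) -> dict:
    # Shape expected by graph.invoke(state, config=...).
    return {"configurable": {"model": TIER_MODELS[run_config.tier],
                             "tier": run_config.tier}}

def score_node(state: dict, config: dict) -> dict:
    # Any node can read the active model from config["configurable"],
    # so switching tiers never requires rebuilding the graph.
    model = config["configurable"]["model"]
    return {"model_used": model}
```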
#### Phase 4: Human-in-the-Loop & Persistence
* **Goal**: Enable clinician review and data saving.
@@ -87,7 +87,7 @@ graph TD
## Dependencies & Risks
- **Risk**: Local models (Tier 1) may hallucinate formatting in the "Map" stage.
* *Mitigation*: Use `instructor` or constrained decoding (JSON mode) for Tier 1.
- - **Dependency**: Requires DAIC-WOZ dataset (assumed available locally or mocked).
+ - **Dependency**: Requires DAIC-WOZ dataset (loaded in MongoDB).
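The JSON-mode mitigation for the Map stage can be sketched without `instructor` as a validate-and-retry loop. The expected schema (quote id → PHQ-8 item) and the retry count are assumptions:

```python
import json

def parse_map_output(raw: str) -> dict[str, int]:
    # Validate Tier-1 model output for the Map stage: a JSON object
    # mapping quote ids to PHQ-8 item numbers in the 1-8 range.
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    out = {}
    for quote_id, item in data.items():
        if not isinstance(item, int) or not 1 <= item <= 8:
            raise ValueError(f"item {item!r} for {quote_id!r} out of range")
        out[quote_id] = item
    return out

def map_with_retries(generate, max_attempts: int = 3) -> dict[str, int]:
    # `generate` wraps the Tier-1 model call (in JSON mode); on a
    # formatting hallucination we simply re-sample up to max_attempts.
    last_err = None
    for _ in range(max_attempts):
        try:
            return parse_map_output(generate())
        except ValueError as err:
            last_err = err
    raise RuntimeError(f"Map stage failed after {max_attempts} attempts: {last_err}")
```

With `instructor` or a constrained decoder, the validation step would instead be a Pydantic schema enforced at generation time; the fallback loop above is the library-free equivalent.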
## References
- **LangGraph**: [State Management](https://langchain-ai.github.io/langgraph/concepts/high_level/#state)