# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Context: Bachelor Thesis

**Title:** A Modular Agent Framework for Therapeutic Interview Analysis
**Goal:** Systematically compare local-first/on-premise LLMs against cloud-based state-of-the-art models on a specific therapeutic task (PHQ-8 assessment).

**Core Hypothesis:** Small, quantized language models running locally can provide analytical performance comparable to large cloud models when supported by an appropriate agentic framework.
**Key Requirements:**
- **Privacy-First:** The architecture must support local/on-premise execution to address clinical data privacy concerns.
- **Modularity:** The system must allow easy swapping of the underlying model (Tier 1: local, Tier 2: self-hosted, Tier 3: cloud).
- **Benchmark:** The system is evaluated on how accurately it maps therapy transcripts from the DAIC-WOZ dataset to PHQ-8 (Patient Health Questionnaire) scores.
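For orientation, PHQ-8 scoring sums eight items rated 0-3 (total 0-24) and maps the total to the standard severity bands. A minimal, stdlib-only sketch (illustrative, not part of the Helia codebase):

```python
# Minimal PHQ-8 scoring sketch (illustrative only, not Helia's implementation).
# Each of the 8 items is rated 0-3, so the total ranges from 0 to 24.

SEVERITY_BANDS = [
    (0, 4, "none/minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 24, "severe"),
]

def phq8_total(item_scores: list[int]) -> int:
    """Sum the eight PHQ-8 item scores after validating their range."""
    if len(item_scores) != 8:
        raise ValueError("PHQ-8 requires exactly 8 item scores")
    if any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("each item score must be between 0 and 3")
    return sum(item_scores)

def phq8_severity(total: int) -> str:
    """Map a PHQ-8 total score to its conventional severity band."""
    for low, high, label in SEVERITY_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("total must be between 0 and 24")
```

Example: `phq8_severity(phq8_total([1, 2, 0, 3, 1, 1, 2, 1]))` yields `"moderate"` (total 11).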
## Commands

- Install dependencies: `uv sync`
- Run agent: `python -m helia.main "Your query here"`
- Verify prompts: `python scripts/verify_prompt_db.py`
- Lint: `uv run ruff check .`
- Format: `uv run ruff format .`
- Type check: `uv run ty check`
- Test: no test suite currently exists (priority roadmap item).
## Architecture

Helia is a modular ReAct-style agent framework designed for clinical interview analysis.

### Core Modules
- **Ingestion** (`src/helia/ingestion/`):
  - Parser: `TranscriptParser` parses clinical interview transcripts (e.g., the DAIC-WOZ dataset).
  - Loader: `ClinicalDataLoader` in `loader.py` retrieves `Transcript` documents from MongoDB.
  - Legacy: `S3DatasetLoader` (deprecated for runtime use; retained for initial dataset population).
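As a rough illustration of what parsing involves, here is a stdlib-only sketch assuming the tab-separated `start_time`/`stop_time`/`speaker`/`value` layout of DAIC-WOZ transcript files; the real `TranscriptParser` API may differ.

```python
# Illustrative sketch of parsing a DAIC-WOZ-style transcript line format;
# the actual TranscriptParser in src/helia/ingestion/ may expose a different API.
from dataclasses import dataclass

@dataclass
class ParsedTurn:
    start: float     # utterance start time in seconds
    stop: float      # utterance stop time in seconds
    speaker: str     # e.g., "Ellie" (interviewer) or "Participant"
    text: str

def parse_transcript(raw: str) -> list[ParsedTurn]:
    """Parse tab-separated lines of start_time, stop_time, speaker, value."""
    turns = []
    for line in raw.strip().splitlines():
        fields = line.split("\t")
        if fields[0] == "start_time":   # skip the header row
            continue
        start, stop, speaker, text = fields
        turns.append(ParsedTurn(float(start), float(stop), speaker, text))
    return turns
```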
- **Data Models** (`src/helia/models/`):
  - `Transcript`: document model for interview transcripts.
  - `Utterance`/`Turn`: standardized conversation units.
  - `Prompt`: manages prompt templates and versioning.
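A plain-dataclass sketch of the document shapes; the real models are Beanie documents in `src/helia/models/`, and the field names here are assumptions.

```python
# Stand-in sketch of the Transcript/Utterance shapes; the real classes are
# Beanie documents, and session_id and the field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Utterance:
    speaker: str
    text: str

@dataclass
class Transcript:
    session_id: str                       # hypothetical identifier field
    utterances: list[Utterance] = field(default_factory=list)

    def participant_text(self) -> str:
        """Concatenate everything the participant said, for downstream analysis."""
        return " ".join(u.text for u in self.utterances if u.speaker == "Participant")
```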
- **Assessment** (`src/helia/assessment/`):
  - Evaluator: `PHQ8Evaluator` (in `core.py`) orchestrates the LLM interaction.
  - Logic: implements clinical logic for standard instruments (e.g., PHQ-8).
  - Schema: `src/helia/assessment/schema.py` defines `AssessmentResult` and `Evidence`.
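A hypothetical sketch of what the schema might look like; the actual `AssessmentResult` and `Evidence` definitions in `src/helia/assessment/schema.py` may differ in names and fields.

```python
# Hypothetical schema sketch; all field names here are assumptions, not the
# actual definitions from src/helia/assessment/schema.py.
from dataclasses import dataclass

@dataclass
class Evidence:
    item: int          # PHQ-8 item index
    quote: str         # supporting utterance quoted from the transcript
    score: int         # assigned item score, 0-3

@dataclass
class AssessmentResult:
    evidence: list[Evidence]

    @property
    def total_score(self) -> int:
        """Total PHQ-8 score derived from the per-item evidence."""
        return sum(e.score for e in self.evidence)
```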
- **Persistence Layer** (`src/helia/db.py`):
  - Document-based storage: uses MongoDB with the Beanie ODM.
  - Data capture: stores the full context (configuration, evidence, outcomes) to support comparative analysis.
- **Agent Workflow** (`src/helia/agent/`):
  - Graph architecture: implements the RISEN pattern (Extract -> Map -> Score) using LangGraph in `src/helia/agent/graph.py`.
  - State: `ClinicalState` (in `state.py`) manages the transcript, scores, and execution status.
  - Nodes: specialized logic in `src/helia/agent/nodes/` (assessment, persistence).
  - Execution: run benchmarks via `python -m helia.agent.runner <run_id>`.
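The Extract -> Map -> Score flow can be sketched with plain functions over a state dict; the real implementation wires these as LangGraph nodes over a `ClinicalState`, and the node bodies below are placeholder logic.

```python
# Stdlib-only sketch of the Extract -> Map -> Score flow; the real graph uses
# LangGraph and ClinicalState. All function bodies are placeholder logic.

def extract(state: dict) -> dict:
    """Extract candidate evidence utterances from the transcript."""
    state["evidence"] = [u for u in state["transcript"] if "sleep" in u]
    return state

def map_items(state: dict) -> dict:
    """Map each piece of evidence to a PHQ-8 item (placeholder: one sleep item)."""
    state["items"] = {2: state["evidence"]}
    return state

def score(state: dict) -> dict:
    """Score each mapped item (placeholder: 1 point per evidence utterance, capped at 3)."""
    state["scores"] = {item: min(len(ev), 3) for item, ev in state["items"].items()}
    return state

def run_pipeline(transcript: list[str]) -> dict:
    state = {"transcript": transcript}
    for node in (extract, map_items, score):   # linear graph: Extract -> Map -> Score
        state = node(state)
    return state
```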
## Development Standards

- Environment: requires `OPENAI_API_KEY` and MongoDB credentials.
- Configuration: managed via Pydantic models in `src/helia/configuration.py`.
- Python: uses implicit namespace packages; `__init__.py` files may be missing by design in some subdirectories.
- Code style: follows PEP 8, enforced via `ruff`.
- Security: do not commit secrets. Avoid hardcoding model parameters; use configuration injection to support the comparative benchmark (Tiers 1-3).
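Configuration injection for the three tiers can be sketched as follows; the real settings are Pydantic models in `src/helia/configuration.py`, and every field name, model name, and URL below is an assumed placeholder.

```python
# Sketch of tier-swappable model configuration; the actual settings live as
# Pydantic models in src/helia/configuration.py. Fields, model names, and
# endpoints here are illustrative assumptions only.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    tier: str          # "local", "self-hosted", or "cloud"
    model_name: str
    base_url: str      # OpenAI-compatible endpoint
    temperature: float = 0.0

# Hypothetical presets for the three benchmark tiers.
TIERS = {
    "local": ModelConfig("local", "llama-3.1-8b-q4", "http://localhost:11434/v1"),
    "self-hosted": ModelConfig("self-hosted", "llama-3.1-70b", "http://gpu-server:8000/v1"),
    "cloud": ModelConfig("cloud", "gpt-4o", "https://api.openai.com/v1"),
}

def build_evaluator(config: ModelConfig) -> dict:
    """Inject the configuration instead of hardcoding model parameters."""
    return {"endpoint": config.base_url, "model": config.model_name}
```

Swapping tiers then only requires passing a different `ModelConfig`, keeping evaluator code identical across the benchmark.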