# Helia

**A Modular Agent Framework for Therapeutic Interview Analysis**

This project is the core implementation for the Bachelor Thesis: *"Comparing Local, Self-Hosted, and Cloud LLM Deployments for Therapeutic Interview Analysis"*.

## Project Context

Helia aims to bridge the gap between AI and mental healthcare by automating the analysis of diagnostic interviews. Specifically, it tests whether **local, privacy-first LLMs** can match the performance of cloud-based models when mapping therapy transcripts to standardized **PHQ-8** (Patient Health Questionnaire) scores. The PHQ-8 rates eight depression symptoms on a 0-3 frequency scale, yielding a total score from 0 to 24.

## Project Structure

```
src/helia/
├── agent/
│   └── workflow.py    # LangGraph agent & router
├── analysis/
│   └── extractor.py   # Metadata extraction (LLM-agnostic)
├── assessment/
│   ├── core.py        # Clinical assessment logic (PHQ-8)
│   └── schema.py      # Data models (AssessmentResult, PHQ8Item)
├── ingestion/
│   └── parser.py      # Transcript parsing (DAIC-WOZ support)
├── db.py              # MongoDB persistence layer
└── main.py            # CLI entry point
```

## Data Flow

```mermaid
graph TD
    A[Transcript File<br/>TSV/TXT] -->|TranscriptParser| B(Utterance Objects)
    B -->|MetadataExtractor<br/>+ LLM| C(Enriched Utterances)
    C -->|Assessment Engine<br/>+ Clinical Logic| D(PHQ-8 Scores & Evidence)
    D -->|Persistence Layer| E[(MongoDB / Beanie)]

    U[User Query] -->|LangGraph Agent| R{Router}
    R -->|Assessment Tool| D
    R -->|Search Tool| E
```

1. **Ingestion**: `TranscriptParser` reads clinical interview files (e.g., DAIC-WOZ).
2. **Analysis**: `MetadataExtractor` enriches each utterance with sentiment and tone metadata using interchangeable LLM backends.
3. **Assessment**: The core engine maps dialogue to the PHQ-8 clinical criteria, generating per-item scores and citing supporting evidence.
4. **Persistence**: Results are stored as structured `AssessmentResult` documents in MongoDB for analysis and benchmarking (see the sketch below).

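A minimal sketch of how these four stages might be wired together is shown below. `TranscriptParser` and `MetadataExtractor` are the module classes named above, while `AssessmentEngine`, `init_db`, and the method names (`parse`, `extract`, `assess`) are assumptions about the API, not the actual signatures.

```python
import asyncio

from helia.analysis.extractor import MetadataExtractor
from helia.assessment.core import AssessmentEngine  # hypothetical class name
from helia.db import init_db  # hypothetical init helper
from helia.ingestion.parser import TranscriptParser


async def run_pipeline(path: str) -> None:
    # 1. Ingestion: parse the transcript (e.g. DAIC-WOZ TSV) into Utterance objects
    utterances = TranscriptParser().parse(path)

    # 2. Analysis: enrich utterances with sentiment/tone via the configured LLM
    enriched = MetadataExtractor().extract(utterances)

    # 3. Assessment: map the dialogue to PHQ-8 items, scores, and evidence
    result = AssessmentEngine().assess(enriched)

    # 4. Persistence: store the AssessmentResult document in MongoDB
    await init_db()
    await result.save()  # Beanie documents persist via .save()


asyncio.run(run_pipeline("path/to/transcript.tsv"))
```
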
## Implemented Features

- **Modular LLM Backend**: Switches between cloud (OpenAI) and local models, enabling the Tier 1-3 comparative benchmark.
- **Clinical Parsing**: Native support for the DAIC-WOZ transcript format.
- **Structured Assessment**: Maps unstructured conversation to structured, verifiable PHQ-8 scores.
- **Document Persistence**: Stores the full experimental context (config + evidence + scores) in MongoDB using Beanie.
- **Unified Configuration**: Single YAML config with environment variable fallbacks for system settings.

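The persistence layer builds on the models in `assessment/schema.py`. A hypothetical sketch of how `PHQ8Item` and `AssessmentResult` might be modeled with Beanie and Pydantic follows; the field names are illustrative assumptions, not the actual schema:

```python
from beanie import Document
from pydantic import BaseModel, Field


class PHQ8Item(BaseModel):
    """One of the eight PHQ-8 symptom items (fields are illustrative)."""

    symptom: str                           # e.g. "anhedonia"
    score: int = Field(ge=0, le=3)         # each PHQ-8 item is rated 0-3
    evidence: list[str] = []               # supporting transcript quotes


class AssessmentResult(Document):
    """Beanie document persisted to MongoDB."""

    transcript_id: str
    run_config: dict                       # experimental context: provider, model, etc.
    items: list[PHQ8Item]
    total_score: int = Field(ge=0, le=24)  # sum of the eight item scores
```

Storing the run configuration alongside the scores and evidence is what makes the local-vs-cloud benchmark reproducible.
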
## Roadmap

- [ ] **Comparative Benchmark**: Run full evaluation across Local vs. Cloud tiers.
- [ ] **Vector Search**: Implement semantic search over transcript evidence.
- [ ] **Test Suite**: Add comprehensive tests for the assessment logic.

## Installation

Install the package using `uv`:

```sh
uv sync
```

## Quick Start

1. **Configuration**:

   Copy `example.config.yaml` to `config.yaml` and edit it:

   ```sh
   cp example.config.yaml config.yaml
   ```

   System settings (MongoDB, S3) can be set in the YAML file or via environment variables (e.g., `HELIA_MONGO_URI`, `HELIA_S3_BUCKET`). Provider API keys can likewise be set in YAML or via `{PROVIDER_NAME}_API_KEY` (uppercased), e.g. `ANTHROPIC_API_KEY` for `providers.anthropic`.

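   For example, a fully environment-driven setup might look like this (the values are placeholders):

   ```sh
   export HELIA_MONGO_URI="mongodb://localhost:27017"
   export HELIA_S3_BUCKET="my-helia-bucket"   # illustrative bucket name
   export ANTHROPIC_API_KEY="sk-ant-..."      # picked up for providers.anthropic
   ```
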
2. **Run an Assessment**:

   ```sh
   # Ensure MongoDB is running
   uv run python -m helia.main config.yaml
   ```

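   If MongoDB is not installed locally, one option is a disposable instance via Docker (assuming Docker is available):

   ```sh
   docker run -d --name helia-mongo -p 27017:27017 mongo:7
   ```
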
## Development

- **Linting**: `uv run ruff check .`
- **Formatting**: `uv run ruff format .`
- **Type Checking**: `uv run ty check`

## License

This project is available as open source under the terms of the [MIT License](LICENSE).