Santiago Martinez-Avial
2025-12-19 20:13:00 +01:00
commit 97b7a15977
17 changed files with 1913 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,113 @@
# Helia
Agentic Interview Framework for ingesting, analyzing, and querying transcript data.
## Project Structure
```
src/helia/
├── agent/
│ └── workflow.py # LangGraph agent workflow
├── analysis/
│ └── extractor.py # LLM metadata extraction
├── graph/
│ ├── loader.py # Neo4j data loading
│ └── schema.py # Pydantic graph models
├── ingestion/
│ └── parser.py # Transcript parsing logic
└── main.py # CLI entry point
```
## Data Flow
```mermaid
graph TD
A[Transcript File<br/>TSV/TXT] -->|TranscriptParser| B(Utterance Objects)
B -->|MetadataExtractor<br/>+ OpenAI LLM| C(Enriched UtteranceNodes)
C -->|GraphLoader| D[(Neo4j Database)]
E[User Question] -->|LangGraph Agent| F{Router}
F -->|Graph Tool| D
F -->|Vector Tool| G[(Vector Store)]
D --> H[Context]
G --> H
H -->|Synthesizer| I[Answer]
```
1. **Ingestion**: `TranscriptParser` reads TSV or TXT files into `Utterance` objects.
2. **Analysis**: `MetadataExtractor` enriches utterances with sentiment, tone, and speech acts using an OpenAI LLM.
3. **Graph**: `GraphLoader` writes nodes and relationships to a Neo4j database.
4. **Agent**: A LangGraph ReAct workflow routes each user question to a graph or vector tool and synthesizes an answer from the retrieved context.
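The router, tool, and synthesizer steps in the diagram can be illustrated with a minimal, dependency-free sketch. This is not Helia's implementation and does not use LangGraph; `AgentState`, `route`, `synthesize`, `answer`, and the keyword heuristic are hypothetical names chosen for illustration, and both tools are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """Hypothetical state passed between steps; not Helia's actual schema."""

    question: str
    context: list[str] = field(default_factory=list)
    answer: str = ""


def graph_tool(state: AgentState) -> AgentState:
    # Placeholder: a real tool would query Neo4j here.
    state.context.append(f"[graph] results for: {state.question}")
    return state


def vector_tool(state: AgentState) -> AgentState:
    # Placeholder: a real tool would run a similarity search here.
    state.context.append(f"[vector] passages for: {state.question}")
    return state


def route(state: AgentState) -> str:
    # Naive keyword heuristic (an assumption): counting and lookup questions
    # go to the graph tool, everything else goes to the vector tool.
    structural = ("how many", "who", "count", "when")
    question = state.question.lower()
    return "graph" if any(key in question for key in structural) else "vector"


TOOLS: dict[str, Callable[[AgentState], AgentState]] = {
    "graph": graph_tool,
    "vector": vector_tool,
}


def synthesize(state: AgentState) -> AgentState:
    # Placeholder: a real synthesizer would prompt an LLM with the context.
    state.answer = " | ".join(state.context)
    return state


def answer(question: str) -> AgentState:
    state = AgentState(question=question)
    state = TOOLS[route(state)](state)
    return synthesize(state)
```

Calling `answer("How many interruptions occurred?")` sends the question through the graph placeholder; a question without any of the keywords falls through to the vector placeholder.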
## Implemented Features
- Parse DAIC-WOZ transcripts and simple text formats.
- Extract metadata (sentiment, tone, speech acts) via OpenAI.
- Load `Utterance` and `Speaker` nodes into Neo4j.
- Run a basic LangGraph agent with a planner and a router.
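To make the transcript-parsing feature concrete, here is a minimal sketch of reading a tab-separated transcript with the standard library. It assumes the DAIC-WOZ column layout `start_time`, `stop_time`, `speaker`, `value`; the `Utterance` dataclass and `parse_tsv` function are illustrative stand-ins and may differ from Helia's own `Utterance` and `TranscriptParser`. The sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass
from typing import TextIO


@dataclass
class Utterance:
    """Illustrative stand-in; Helia's real class may have other fields."""

    start: float
    stop: float
    speaker: str
    text: str


def parse_tsv(handle: TextIO) -> list[Utterance]:
    # Assumes a header row naming the four columns used below.
    reader = csv.DictReader(handle, delimiter="\t")
    utterances = []
    for row in reader:
        text = (row.get("value") or "").strip()
        if not text:
            continue  # skip rows without spoken content
        utterances.append(
            Utterance(
                start=float(row["start_time"]),
                stop=float(row["stop_time"]),
                speaker=row["speaker"],
                text=text,
            )
        )
    return utterances


# Made-up sample data in the assumed layout.
sample = (
    "start_time\tstop_time\tspeaker\tvalue\n"
    "0.0\t1.5\tEllie\thi i'm ellie\n"
    "2.0\t3.1\tParticipant\thello\n"
)
utterances = parse_tsv(io.StringIO(sample))
```

Passing an open file handle instead of `io.StringIO` works the same way, since `csv.DictReader` accepts any iterable of text lines.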
## Roadmap
- Add robust error handling for LLM API failures.
- Implement real `graph_tool` and `vector_tool` logic.
- Enhance agent planning capabilities.
- Add comprehensive test suite.
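As one possible direction for the error-handling item, the sketch below retries a failing call with exponential backoff. It is an assumption about how this could be approached, not planned Helia code; `with_retries` and its parameters are hypothetical, and in practice `retry_on` should be narrowed to the transient exception types raised by the LLM client in use.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[Exception], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on `retry_on` with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")
```

A call such as `with_retries(lambda: extractor.extract(utterances))` would wrap the extraction step. The injectable `sleep` argument keeps the helper testable without real waiting.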
## Installation
Install the package using `uv`.
```sh
uv pip install helia
```
## Quick Start
Run the agent directly from the command line.
```sh
export OPENAI_API_KEY=sk-...
export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=password
python -m helia.main "How many interruptions occurred?"
```
## Usage
Parse a transcript file programmatically.
```python
from pathlib import Path

from helia.ingestion.parser import TranscriptParser

# Parse a TSV or TXT transcript into Utterance objects.
parser = TranscriptParser()
utterances = parser.parse(Path("transcript.tsv"))
```
Extract metadata from utterances.
```python
from helia.analysis.extractor import MetadataExtractor

# `utterances` is the output of TranscriptParser.parse() in the previous step.
extractor = MetadataExtractor()
nodes = extractor.extract(utterances)
```
Load data into Neo4j.
```python
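from helia.graph.loader import GraphLoader

loader = GraphLoader()
loader.connect()
try:
    # `nodes` is the output of MetadataExtractor.extract() in the previous step.
    loader.load_utterances(nodes)
finally:
    loader.close()  # release the connection even if loading fails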
```
## Contributing
Fork the project and submit a pull request.
## License
This project is available as open source under the terms of the [MIT License](LICENSE).