# Helia

Agentic Interview Framework for ingesting, analyzing, and querying transcript data.

## Project Structure

```
src/helia/
├── agent/
│   └── workflow.py      # LangGraph agent workflow
├── analysis/
│   └── extractor.py     # LLM metadata extraction
├── graph/
│   ├── loader.py        # Neo4j data loading
│   └── schema.py        # Pydantic graph models
├── ingestion/
│   └── parser.py        # Transcript parsing logic
└── main.py              # CLI entry point
```

## Data Flow

```mermaid
graph TD
    A[Transcript File<br/>TSV/TXT] -->|TranscriptParser| B(Utterance Objects)
    B -->|MetadataExtractor<br/>+ OpenAI LLM| C(Enriched UtteranceNodes)
    C -->|GraphLoader| D[(Neo4j Database)]
    E[User Question] -->|LangGraph Agent| F{Router}
    F -->|Graph Tool| D
    F -->|Vector Tool| G[(Vector Store)]
    D --> H[Context]
    G --> H
    H -->|Synthesizer| I[Answer]
```

1. **Ingestion**: `TranscriptParser` reads TSV/TXT files into `Utterance` objects.
2. **Analysis**: `MetadataExtractor` uses an LLM to enrich utterances with sentiment and tone.
3. **Graph**: `GraphLoader` pushes nodes and relationships to a Neo4j database.
4. **Agent**: A ReAct workflow queries graph and vector data to answer user questions.

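
The query half of the flow (router, tools, shared context, synthesizer) can be sketched in plain Python. This is an illustration only: it does not use LangGraph, the tool bodies are placeholders, and `route`, `synthesize`, `answer`, and the keyword rule are hypothetical stand-ins rather than Helia's actual workflow code.

```python
from typing import Callable


# Placeholder tools: real versions would query Neo4j or a vector store.
def graph_tool(question: str) -> str:
    return f"[graph results for: {question}]"


def vector_tool(question: str) -> str:
    return f"[similar utterances for: {question}]"


# Hypothetical keyword rule: counting or structural questions go to the graph.
GRAPH_KEYWORDS = ("how many", "who", "count", "between")


def route(question: str) -> Callable[[str], str]:
    """Pick a tool for the question (the Router node in the diagram)."""
    lowered = question.lower()
    if any(keyword in lowered for keyword in GRAPH_KEYWORDS):
        return graph_tool
    return vector_tool


def synthesize(question: str, context: str) -> str:
    """Combine retrieved context into an answer (here: a plain string)."""
    return f"Q: {question}\nContext: {context}"


def answer(question: str) -> str:
    tool = route(question)
    context = tool(question)
    return synthesize(question, context)


if __name__ == "__main__":
    print(answer("How many interruptions occurred?"))
```

In a real agent the routing decision would more likely come from the LLM than from a keyword list; the fixed rule here just keeps the control flow visible.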
## Implemented Features

- Parse DAIC-WOZ transcripts and simple text formats.
- Extract metadata (sentiment, tone, speech acts) via OpenAI.
- Load `Utterance` and `Speaker` nodes into Neo4j.
- Run a basic LangGraph agent with a planner and router.

## Roadmap

- Add robust error handling for LLM API failures.
- Implement real `graph_tool` and `vector_tool` logic.
- Enhance agent planning capabilities.
- Add a comprehensive test suite.

## Installation

Install the package using `uv`.

```sh
uv pip install helia
```

## Quick Start

Set the required environment variables, then run the agent directly from the command line.

```sh
export OPENAI_API_KEY=sk-...
export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=password

python -m helia.main "How many interruptions occurred?"
```

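
A script that wraps Helia might check for these variables up front so that a missing value is reported clearly before any API or database call is made. The sketch below is a suggestion, not part of the package: `REQUIRED_VARS` and `missing_settings` are hypothetical helpers, and the list only covers the variables named in this README.

```python
import os
from typing import Mapping

# Variable names taken from the Quick Start commands in this README.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_PASSWORD")


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    print("Missing:", ", ".join(missing) if missing else "none")
```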
## Usage

Parse a transcript file programmatically.

```python
from pathlib import Path

from helia.ingestion.parser import TranscriptParser

# Parse a transcript file into Utterance objects.
parser = TranscriptParser()
utterances = parser.parse(Path("transcript.tsv"))
```

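
For orientation, the sketch below shows roughly what parsing a DAIC-WOZ-style transcript involves. It is not `TranscriptParser`'s implementation: the `Utterance` fields and the `parse_tsv` helper are hypothetical, the column names (`start_time`, `stop_time`, `speaker`, `value`) are assumed from the DAIC-WOZ transcript layout, and the sample rows are made up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical fields; Helia's own Utterance model may differ.
    speaker: str
    text: str
    start: float
    stop: float


def parse_tsv(handle) -> list[Utterance]:
    """Read tab-separated rows with a header line into Utterance objects."""
    reader = csv.DictReader(handle, delimiter="\t")
    utterances = []
    for row in reader:
        text = (row.get("value") or "").strip()
        if not text:
            continue  # skip rows without any spoken text
        utterances.append(
            Utterance(
                speaker=row["speaker"],
                text=text,
                start=float(row["start_time"]),
                stop=float(row["stop_time"]),
            )
        )
    return utterances


# Made-up sample rows in the assumed column layout.
SAMPLE = (
    "start_time\tstop_time\tspeaker\tvalue\n"
    "0.5\t2.1\tEllie\thi i'm ellie\n"
    "2.8\t3.4\tParticipant\thello\n"
)

if __name__ == "__main__":
    for utterance in parse_tsv(io.StringIO(SAMPLE)):
        print(utterance)
```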
Extract metadata from utterances.

```python
from helia.analysis.extractor import MetadataExtractor

# `utterances` is the list returned by TranscriptParser.parse() above.
extractor = MetadataExtractor()
nodes = extractor.extract(utterances)
```

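
Because the extractor calls OpenAI, it can help to picture the enrichment step as a function applied to each utterance, which also allows offline experiments. The sketch below is a stand-in under that assumption: `EnrichedUtterance`, `enrich`, and the toy `keyword_classifier` are hypothetical and do not reflect `MetadataExtractor`'s real interface or output schema.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EnrichedUtterance:
    # Hypothetical shape; sentiment and tone are the fields this README names.
    text: str
    sentiment: str
    tone: str


# A classifier maps utterance text to a (sentiment, tone) pair.
Classifier = Callable[[str], tuple[str, str]]


def keyword_classifier(text: str) -> tuple[str, str]:
    """Toy replacement for the LLM call, for offline experiments only."""
    lowered = text.lower()
    if any(word in lowered for word in ("sad", "tired", "bad")):
        return "negative", "subdued"
    if any(word in lowered for word in ("good", "great", "happy")):
        return "positive", "upbeat"
    return "neutral", "flat"


def enrich(texts: Iterable[str], classify: Classifier) -> list[EnrichedUtterance]:
    """Apply the classifier to each utterance and collect the results."""
    enriched = []
    for text in texts:
        sentiment, tone = classify(text)
        enriched.append(EnrichedUtterance(text, sentiment, tone))
    return enriched


if __name__ == "__main__":
    for item in enrich(["i feel great", "i'm so tired"], keyword_classifier):
        print(item)
```

Passing the classifier in as an argument keeps the loop independent of any particular model, so an LLM-backed function could be swapped in without changing `enrich`.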
Load data into Neo4j.

```python
from helia.graph.loader import GraphLoader

# `nodes` is the list returned by MetadataExtractor.extract() above.
loader = GraphLoader()
loader.connect()
try:
    loader.load_utterances(nodes)
finally:
    loader.close()  # release the connection even if loading fails
```

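
A loader of this kind typically sends parameterized Cypher to the database. The sketch below only builds a query and its parameters, so it runs without a Neo4j instance. The `Utterance` and `Speaker` labels come from this README; the `SPOKE` relationship, the property names, and `build_merge` are assumptions for illustration, not `GraphLoader` internals.

```python
# Parameterized Cypher: MERGE keeps speakers unique, CREATE adds each utterance.
MERGE_UTTERANCE = """
MERGE (s:Speaker {name: $speaker})
CREATE (u:Utterance {text: $text, sentiment: $sentiment})
CREATE (s)-[:SPOKE]->(u)
"""


def build_merge(speaker: str, text: str, sentiment: str) -> tuple[str, dict]:
    """Return a (query, parameters) pair ready to pass to a Neo4j session."""
    params = {"speaker": speaker, "text": text, "sentiment": sentiment}
    return MERGE_UTTERANCE, params


if __name__ == "__main__":
    query, params = build_merge("Participant", "hello", "neutral")
    print(query.strip())
    print(params)
```

With the official `neo4j` Python driver, such a pair would be passed as `session.run(query, params)`. Keeping values in parameters rather than formatting them into the query string avoids quoting problems with transcript text.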
## Contributing

Fork the project and submit a pull request.

## License

This project is available as open source under the terms of the [MIT License](LICENSE).