# Helia

Agentic Interview Framework for ingesting, analyzing, and querying transcript data.

## Project Structure

```
src/helia/
├── agent/
│   └── workflow.py   # LangGraph agent workflow
├── analysis/
│   └── extractor.py  # LLM metadata extraction
├── graph/
│   ├── loader.py     # Neo4j data loading
│   └── schema.py     # Pydantic graph models
├── ingestion/
│   └── parser.py     # Transcript parsing logic
└── main.py           # CLI entry point
```

## Data Flow

```mermaid
graph TD
    A[Transcript File<br/>TSV/TXT] -->|TranscriptParser| B(Utterance Objects)
    B -->|MetadataExtractor<br/>+ OpenAI LLM| C(Enriched UtteranceNodes)
    C -->|GraphLoader| D[(Neo4j Database)]
    E[User Question] -->|LangGraph Agent| F{Router}
    F -->|Graph Tool| D
    F -->|Vector Tool| G[(Vector Store)]
    D --> H[Context]
    G --> H
    H -->|Synthesizer| I[Answer]
```

1. **Ingestion**: `TranscriptParser` reads TSV/TXT files into `Utterance` objects.
2. **Analysis**: `MetadataExtractor` enriches utterances with sentiment and tone using LLMs.
3. **Graph**: `GraphLoader` pushes nodes and relationships to the Neo4j database.
4. **Agent**: A ReAct workflow queries graph and vector data to answer user questions.
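For intuition, the ingestion step can be pictured as a small TSV parser. The sketch below is illustrative only, not Helia's actual `TranscriptParser`; the column names (`start_time`, `stop_time`, `speaker`, `value`) follow the DAIC-WOZ transcript layout, and the `Utterance` shape here is an assumption.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str
    start: float
    end: float


def parse_tsv(stream) -> list[Utterance]:
    """Read a DAIC-WOZ-style TSV (start_time, stop_time, speaker, value)."""
    reader = csv.DictReader(stream, delimiter="\t")
    return [
        Utterance(
            speaker=row["speaker"].strip(),
            text=row["value"].strip(),
            start=float(row["start_time"]),
            end=float(row["stop_time"]),
        )
        for row in reader
        if row.get("value")  # skip rows with no spoken text
    ]


sample = (
    "start_time\tstop_time\tspeaker\tvalue\n"
    "0.0\t1.5\tEllie\thi how are you\n"
    "1.5\t2.0\tParticipant\tgood\n"
)
utterances = parse_tsv(io.StringIO(sample))
```

The real parser also handles plain-text transcripts; this sketch covers only the tab-separated case.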

## Implemented Features

- Parse DAIC-WOZ transcripts and simple text formats.
- Extract metadata (sentiment, tone, speech acts) via OpenAI.
- Load `Utterance` and `Speaker` nodes into Neo4j.
- Run a basic LangGraph agent with planner and router.
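The extracted metadata can be pictured as extra fields on an enriched utterance node. The shape below is a hypothetical sketch using a plain dataclass; the real models are Pydantic classes in `src/helia/graph/schema.py`, and all field names here are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class UtteranceNode:
    # Hypothetical shape; Helia's actual Pydantic models live in graph/schema.py.
    speaker: str
    text: str
    sentiment: str = "neutral"   # e.g. positive / neutral / negative
    tone: str = "unknown"        # e.g. calm, anxious
    speech_acts: list[str] = field(default_factory=list)  # e.g. ["question"]


node = UtteranceNode(speaker="Participant", text="good", sentiment="positive")
```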
## Roadmap
|
||||
|
||||
- Add robust error handling for LLM API failures.
|
||||
- Implement real `graph_tool` and `vector_tool` logic.
|
||||
- Enhance agent planning capabilities.
|
||||
- Add comprehensive test suite.
|
||||
|
||||
## Installation
|
||||
|
||||
Install the package using `uv`.
|
||||
|
||||
```sh
|
||||
uv pip install helia
|
||||
```
|
||||
|

## Quick Start

Run the agent directly from the command line.

```sh
export OPENAI_API_KEY=sk-...
export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=password

python -m helia.main "How many interruptions occurred?"
```
## Usage
|
||||
|
||||
Parse a transcript file programmatically.
|
||||
|
||||
```python
|
||||
from helia.ingestion.parser import TranscriptParser
|
||||
from pathlib import Path
|
||||
|
||||
parser = TranscriptParser()
|
||||
utterances = parser.parse(Path("transcript.tsv"))
|
||||
```
|
||||
|
||||
Extract metadata from utterances.
|
||||
|
||||
```python
|
||||
from helia.analysis.extractor import MetadataExtractor
|
||||
|
||||
extractor = MetadataExtractor()
|
||||
nodes = extractor.extract(utterances)
|
||||
```
|
||||
|
||||
Load data into Neo4j.
|
||||
|
||||
```python
|
||||
from helia.graph.loader import GraphLoader
|
||||
|
||||
loader = GraphLoader()
|
||||
loader.connect()
|
||||
loader.load_utterances(nodes)
|
||||
loader.close()
|
||||
```
|
||||
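Under the hood, a loader like `GraphLoader` typically issues parameterized Cypher `MERGE` statements so that re-running ingestion does not duplicate nodes. The statement below is a hypothetical sketch, not Helia's actual Cypher; the labels, property names, and `SPOKE` relationship are assumptions.

```python
# Hypothetical Cypher a loader might run per utterance; labels and
# property names are assumptions, not Helia's actual schema.
MERGE_UTTERANCE = """
MERGE (s:Speaker {name: $speaker})
MERGE (u:Utterance {id: $id})
SET u.text = $text, u.sentiment = $sentiment
MERGE (s)-[:SPOKE]->(u)
"""


def build_params(idx: int, speaker: str, text: str, sentiment: str) -> dict:
    """Build the parameter map for one parameterized MERGE call."""
    return {"id": idx, "speaker": speaker, "text": text, "sentiment": sentiment}


params = build_params(0, "Participant", "good", "positive")
```

With the official `neo4j` Python driver, each call would run inside a session, e.g. `session.run(MERGE_UTTERANCE, **params)`; parameterizing keeps the query plan cached and avoids Cypher injection.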

## Contributing

Fork the project and submit a pull request.

## License

This project is available as open source under the terms of the [MIT License](LICENSE).