WIP

2025-12-19 20:13:00 +01:00
commit 97b7a15977
17 changed files with 1913 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,113 @@
+# Helia
+
+Helia is an agentic interview framework for ingesting, analyzing, and querying transcript data.
+
+## Project Structure
+
+```
+src/helia/
+├── agent/
+│   └── workflow.py      # LangGraph agent workflow
+├── analysis/
+│   └── extractor.py     # LLM metadata extraction
+├── graph/
+│   ├── loader.py        # Neo4j data loading
+│   └── schema.py        # Pydantic graph models
+├── ingestion/
+│   └── parser.py        # Transcript parsing logic
+└── main.py              # CLI entry point
+```
+
+## Data Flow
+
+```mermaid
+graph TD
+    A[Transcript File<br/>TSV/TXT] -->|TranscriptParser| B(Utterance Objects)
+    B -->|MetadataExtractor<br/>+ OpenAI LLM| C(Enriched UtteranceNodes)
+    C -->|GraphLoader| D[(Neo4j Database)]
+    E[User Question] -->|LangGraph Agent| F{Router}
+    F -->|Graph Tool| D
+    F -->|Vector Tool| G[(Vector Store)]
+    D --> H[Context]
+    G --> H
+    H -->|Synthesizer| I[Answer]
+```
+
+1. **Ingestion**: `TranscriptParser` reads TSV/TXT files into `Utterance` objects.
+2. **Analysis**: `MetadataExtractor` enriches utterances with sentiment, tone, and speech acts using an OpenAI LLM.
+3. **Graph**: `GraphLoader` pushes nodes and relationships to a Neo4j database.
+4. **Agent**: A ReAct workflow routes each user question to the graph or vector tool and synthesizes an answer from the retrieved context.
+
+## Implemented Features
+
+- Parse DAIC-WOZ transcripts and simple text formats.
+- Extract metadata (sentiment, tone, speech acts) via OpenAI.
+- Load `Utterance` and `Speaker` nodes into Neo4j.
+- Run a basic LangGraph agent with a planner and router.
+
+## Roadmap
+
+- Add robust error handling for LLM API failures.
+- Implement real `graph_tool` and `vector_tool` logic.
+- Enhance agent planning capabilities.
+- Add comprehensive test suite.
+
+## Installation
+
+Install the package using `uv`.
+
+```sh
+uv pip install helia
+```
+
+## Quick Start
+
+Set the required environment variables, then run the agent from the command line.
+
+```sh
+export OPENAI_API_KEY=sk-...
+export NEO4J_URI=bolt://localhost:7687
+export NEO4J_PASSWORD=password
+
+python -m helia.main "How many interruptions occurred?"
+```
+
+## Usage
+
+Parse a transcript file programmatically.
+
+```python
+from helia.ingestion.parser import TranscriptParser
+from pathlib import Path
+
+parser = TranscriptParser()
+utterances = parser.parse(Path("transcript.tsv"))
+```
+
+Extract metadata from utterances.
+
+```python
+from helia.analysis.extractor import MetadataExtractor
+
+extractor = MetadataExtractor()
+nodes = extractor.extract(utterances)
+```
+
+Load data into Neo4j.
+
+```python
+from helia.graph.loader import GraphLoader
+
+loader = GraphLoader()
+loader.connect()
+loader.load_utterances(nodes)
+loader.close()
+```
+
+## Contributing
+
+Fork the project and submit a pull request.
+
+## License
+
+This project is available as open source under the terms of the [MIT License](LICENSE).