Santiago Martinez-Avial 97b7a15977 WIP
2025-12-19 20:13:00 +01:00


Helia

Agentic Interview Framework for ingesting, analyzing, and querying transcript data.

Project Structure

```
src/helia/
├── agent/
│   └── workflow.py      # LangGraph agent workflow
├── analysis/
│   └── extractor.py     # LLM metadata extraction
├── graph/
│   ├── loader.py        # Neo4j data loading
│   └── schema.py        # Pydantic graph models
├── ingestion/
│   └── parser.py        # Transcript parsing logic
└── main.py              # CLI entry point
```

Data Flow

```mermaid
graph TD
    A[Transcript File<br/>TSV/TXT] -->|TranscriptParser| B(Utterance Objects)
    B -->|MetadataExtractor<br/>+ OpenAI LLM| C(Enriched UtteranceNodes)
    C -->|GraphLoader| D[(Neo4j Database)]
    E[User Question] -->|LangGraph Agent| F{Router}
    F -->|Graph Tool| D
    F -->|Vector Tool| G[(Vector Store)]
    D --> H[Context]
    G --> H
    H -->|Synthesizer| I[Answer]
```
  1. Ingestion: TranscriptParser reads TSV/TXT files into Utterance objects.
  2. Analysis: MetadataExtractor enriches utterances with sentiment, tone, and speech acts using an OpenAI LLM.
  3. Graph: GraphLoader pushes nodes and relationships to the Neo4j database.
  4. Agent: a ReAct-style LangGraph workflow queries graph and vector data to answer user questions.
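The Router, tool, and Synthesizer steps in the diagram can be sketched in plain Python to show how a question flows through the agent. This is a minimal, hypothetical sketch. `AgentState`, the keyword cues, and the placeholder tools are assumptions and are not taken from helia's `workflow.py`. That file uses LangGraph and presumably an LLM to route, and the roadmap lists real tool logic as future work.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cue words: counting/structural questions go to the graph,
# everything else to semantic (vector) search.
GRAPH_CUES = ("how many", "count", "interruption", "who spoke", "between")


@dataclass
class AgentState:
    """Hypothetical state passed between the agent's steps."""
    question: str
    route: str = ""
    context: List[str] = field(default_factory=list)
    answer: str = ""


def router(state: AgentState) -> AgentState:
    """Pick a tool with a keyword heuristic (a stand-in for an LLM router)."""
    q = state.question.lower()
    state.route = "graph" if any(cue in q for cue in GRAPH_CUES) else "vector"
    return state


def graph_tool(state: AgentState) -> AgentState:
    # Placeholder: a real tool would run a Cypher query against Neo4j.
    state.context.append(f"[graph] results for {state.question!r}")
    return state


def vector_tool(state: AgentState) -> AgentState:
    # Placeholder: a real tool would run a similarity search on the vector store.
    state.context.append(f"[vector] passages for {state.question!r}")
    return state


def synthesizer(state: AgentState) -> AgentState:
    # Placeholder: a real synthesizer would prompt an LLM with the gathered context.
    state.answer = " | ".join(state.context)
    return state


def run(question: str) -> AgentState:
    state = router(AgentState(question))
    tool = graph_tool if state.route == "graph" else vector_tool
    return synthesizer(tool(state))


print(run("How many interruptions occurred?").route)  # graph
```

In LangGraph, each function would become a node, and the router's decision would become a conditional edge. The sketch keeps that shape without depending on the library.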

Implemented Features

  • Parse DAIC-WOZ transcripts and simple text formats.
  • Extract metadata (sentiment, tone, speech acts) via OpenAI.
  • Load Utterance and Speaker nodes into Neo4j.
  • Run a basic LangGraph agent with a planner and a router.

Roadmap

  • Add robust error handling for LLM API failures.
  • Implement real graph_tool and vector_tool logic.
  • Enhance agent planning capabilities.
  • Add comprehensive test suite.
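One possible shape for the LLM error-handling item is a retry wrapper with exponential backoff and jitter around each API call. This is a hypothetical sketch: `with_retries` and the choice of retryable exceptions are assumptions. A real version would catch the OpenAI client's own error types, and the OpenAI Python client also has its own `max_retries` setting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying transient failures with exponential backoff and jitter."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay * 0.1))  # up to 10% jitter
    raise AssertionError("unreachable")


# Usage: a stand-in for an LLM call that times out twice, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("LLM API timed out")
    return "ok"


print(with_retries(flaky, sleep=lambda s: None))  # ok
```

Errors outside `retry_on`, such as a malformed request, propagate immediately, because retrying them would only repeat the failure.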

Installation

Install the package using uv.

```shell
uv pip install helia
```

Quick Start

Run the agent directly from the command line.

```shell
export OPENAI_API_KEY=sk-...
export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=password

python -m helia.main "How many interruptions occurred?"
```
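Because the CLI depends on three environment variables, it helps to fail fast when one is missing. The following is a hypothetical sketch of that check. The `Settings` class and `load_settings` are illustrative names, not part of helia, and how helia actually reads its configuration is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_PASSWORD")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings built from the Quick Start variables."""
    openai_api_key: str
    neo4j_uri: str
    neo4j_password: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), reporting every missing name."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        neo4j_uri=env["NEO4J_URI"],
        neo4j_password=env["NEO4J_PASSWORD"],
    )


# Passing an explicit mapping keeps the example independent of the real shell.
settings = load_settings({
    "OPENAI_API_KEY": "sk-test",
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_PASSWORD": "password",
})
print(settings.neo4j_uri)  # bolt://localhost:7687
```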

Usage

Parse a transcript file programmatically.

```python
from pathlib import Path

from helia.ingestion.parser import TranscriptParser

parser = TranscriptParser()
utterances = parser.parse(Path("transcript.tsv"))
```
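To show what parsing involves, here is a standalone sketch of both supported input styles. It assumes DAIC-WOZ-style files are tab-separated with the header `start_time`, `stop_time`, `speaker`, `value`, and that plain text files use `Speaker: text` lines. The `Utterance` class here is a hypothetical stand-in for helia's own. Unlike `TranscriptParser.parse`, these functions take the file contents as a string, and the sample rows are invented.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    """Hypothetical stand-in for helia's Utterance; fields are assumptions."""
    speaker: str
    text: str
    start: Optional[float] = None  # seconds, when the source has timestamps
    end: Optional[float] = None


def parse_tsv(text: str) -> List[Utterance]:
    """Parse DAIC-WOZ-style rows with columns start_time, stop_time, speaker, value."""
    # QUOTE_NONE treats quote characters in speech as literal text.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        Utterance(
            speaker=row["speaker"].strip(),
            text=row["value"].strip(),
            start=float(row["start_time"]),
            end=float(row["stop_time"]),
        )
        for row in reader
        if (row.get("value") or "").strip()  # skip rows with no spoken text
    ]


def parse_txt(text: str) -> List[Utterance]:
    """Parse plain 'Speaker: text' lines; lines without a colon are skipped."""
    utterances = []
    for line in text.splitlines():
        speaker, sep, body = line.partition(":")
        if sep and body.strip():
            utterances.append(Utterance(speaker.strip(), body.strip()))
    return utterances


sample_tsv = (
    "start_time\tstop_time\tspeaker\tvalue\n"
    "1.0\t2.5\tEllie\thow are you doing today\n"
    "3.0\t4.2\tParticipant\tpretty good\n"
)
utterances = parse_tsv(sample_tsv)
print(len(utterances), utterances[1].speaker)  # 2 Participant
```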

Extract metadata from utterances.

```python
from helia.analysis.extractor import MetadataExtractor

extractor = MetadataExtractor()
nodes = extractor.extract(utterances)
```
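Extraction amounts to prompting an LLM per utterance and validating its reply before it reaches the graph. This hypothetical sketch shows that pattern. `UtteranceNode`, the prompt, and the sentiment labels are assumptions, not helia's `schema.py` or `extractor.py`. A stub function stands in for the OpenAI call so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical label set; the README names the fields but not their allowed values.
SENTIMENTS = {"positive", "neutral", "negative"}

PROMPT = (
    "Classify the utterance below. Reply with JSON only, using the keys "
    '"sentiment" (positive, neutral or negative), "tone" and "speech_act".\n'
    "{speaker}: {text}"
)


@dataclass
class UtteranceNode:
    """Hypothetical enriched node; helia's real models live in graph/schema.py."""
    speaker: str
    text: str
    sentiment: str
    tone: str
    speech_act: str


def enrich(speaker: str, text: str, llm: Callable[[str], str]) -> UtteranceNode:
    """Ask `llm` for metadata and validate the reply before building a node."""
    raw = llm(PROMPT.format(speaker=speaker, text=text))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}  # malformed reply: fall back to defaults instead of crashing
    if not isinstance(data, dict):
        data = {}
    sentiment = str(data.get("sentiment", "neutral")).lower()
    if sentiment not in SENTIMENTS:
        sentiment = "neutral"  # never store a label outside the allowed set
    return UtteranceNode(
        speaker=speaker,
        text=text,
        sentiment=sentiment,
        tone=str(data.get("tone", "unknown")),
        speech_act=str(data.get("speech_act", "unknown")),
    )


# Stub in place of a real OpenAI call.
def fake_llm(prompt: str) -> str:
    return '{"sentiment": "Negative", "tone": "flat", "speech_act": "answer"}'


node = enrich("Participant", "not great, honestly", fake_llm)
print(node.sentiment, node.tone, node.speech_act)  # negative flat answer
```

Validating into a typed object is the same role a Pydantic model would play. The dataclass keeps the sketch dependency-free.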

Load data into Neo4j.

```python
from helia.graph.loader import GraphLoader

loader = GraphLoader()
loader.connect()
try:
    loader.load_utterances(nodes)
finally:
    loader.close()  # always release the Neo4j connection, even if loading fails
```
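Loading comes down to a few batched Cypher statements. This sketch builds them and their parameters without contacting a database. The graph shape and property names are assumptions: Speaker nodes `SPOKE` each Utterance, and `NEXT` links consecutive turns. They are not confirmed by the README. The driver calls in the trailing comments assume the official `neo4j` Python driver, version 5.

```python
from typing import Dict, List, Tuple

# Hypothetical graph shape (not confirmed by the README):
#   (:Speaker {name, transcript})-[:SPOKE]->(:Utterance {id, text, sentiment})
#   (:Utterance)-[:NEXT]->(:Utterance) to keep turn order.
CONSTRAINT = (
    "CREATE CONSTRAINT utterance_id IF NOT EXISTS "
    "FOR (u:Utterance) REQUIRE u.id IS UNIQUE"
)

LOAD_UTTERANCES = """
UNWIND $rows AS row
MERGE (s:Speaker {name: row.speaker, transcript: row.transcript})
MERGE (u:Utterance {id: row.id})
SET u.text = row.text, u.sentiment = row.sentiment
MERGE (s)-[:SPOKE]->(u)
"""

LINK_TURNS = """
UNWIND $pairs AS pair
MATCH (a:Utterance {id: pair.prev})
MATCH (b:Utterance {id: pair.next})
MERGE (a)-[:NEXT]->(b)
"""


def build_params(transcript_id: str, nodes: List[dict]) -> Tuple[Dict, Dict]:
    """Turn enriched nodes into batched parameters for the two queries."""
    rows = [
        {
            "id": f"{transcript_id}:{i}",  # stable id: re-loading MERGEs, not duplicates
            "transcript": transcript_id,
            "speaker": n["speaker"],
            "text": n["text"],
            "sentiment": n.get("sentiment"),
        }
        for i, n in enumerate(nodes)
    ]
    # Consecutive utterances become NEXT pairs.
    pairs = [{"prev": a["id"], "next": b["id"]} for a, b in zip(rows, rows[1:])]
    return {"rows": rows}, {"pairs": pairs}


row_params, pair_params = build_params("t1", [
    {"speaker": "Ellie", "text": "how are you doing today"},
    {"speaker": "Participant", "text": "pretty good", "sentiment": "positive"},
])
print(pair_params)  # {'pairs': [{'prev': 't1:0', 'next': 't1:1'}]}

# Running the queries with the official neo4j driver (v5); the user name
# "neo4j" is an assumption, since the README only sets NEO4J_PASSWORD:
# from neo4j import GraphDatabase
# with GraphDatabase.driver(uri, auth=("neo4j", password)) as driver:
#     driver.execute_query(CONSTRAINT)
#     driver.execute_query(LOAD_UTTERANCES, parameters_=row_params)
#     driver.execute_query(LINK_TURNS, parameters_=pair_params)
```

Speakers are keyed by name and transcript because interview transcripts typically reuse generic speaker labels. Keying by name alone would merge every interview's participant into one node.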

Contributing

Fork the project and submit a pull request.

License

This project is available as open source under the terms of the MIT License.