Phone Agent Training Pipeline

A pipeline for generating synthetic phone conversation data and fine-tuning local LLMs (via LoRA on Apple Silicon) to act as automated plumbing receptionists.

https://github.com/davidbmar/training  ·  public  ·  shipped

What it is

This project provides a complete workflow to train small, efficient language models (3.8B-7B parameters) to handle inbound phone calls for a plumbing business. It uses knowledge distillation where a large teacher model generates realistic conversations, which are then processed into structured training data with slot injection (context, state, tasks). The resulting models are fine-tuned using MLX LoRA adapters to run locally on Mac hardware, offering low-latency, zero-cost inference while maintaining high-quality receptionist behavior across 11 conversation phases.

Features

Quickstart

python3 scripts/normalize_and_split.py
python3 scripts/convert_to_chat_templates.py -i data/splits/train.json -f mlx -o data/splits/train_mlx.jsonl
python3 scripts/convert_to_chat_templates.py -i data/splits/val.json -f mlx -o data/splits/val_mlx.jsonl
pip install mlx-lm
python -m mlx_lm.lora --model microsoft/phi-4-mini-instruct --data data/splits/ --train --batch-size 2 --lora-rank 8 --iters 600 --adapter-path adapters/phi4-mini

Architecture

flowchart TD
    A[Scenario Matrix] -->|Defines| B(Raw Conversations)
    C[Teacher Model/Claude] -->|Generates| B
    B --> D[normalize_and_split.py]
    D -->|Stratified Splits| E[train/val/test JSON]
    E --> F[convert_to_chat_templates.py]
    F -->|MLX Format| G[MLX JSONL Data]
    G --> H[MLX LoRA Trainer]
    I[Base Model Phi-4] --> H
    H -->|Produces| J[LoRA Adapters]
    J --> K[Local Phone Agent]

How it's built

The pipeline is built in Python, leveraging the `mlx-lm` library for efficient fine-tuning on Apple Silicon. It processes raw JSON conversation logs through normalization scripts that map diverse user inputs to standardized FSM states. Data is enriched with structured context blocks ([CONTEXT], [SLOTS], [STATE], [TASK]) to teach the model slot-filling and workflow adherence. The final training data is converted into chat templates compatible with MLX, and LoRA adapters are trained against base models like Phi-4-mini-instruct.

How it runs

sequenceDiagram
    participant Dev as Developer
    participant Script as Normalization Script
    participant Data as Processed Data
    participant Trainer as MLX LoRA Trainer
    participant Model as Base LLM
    
    Dev->>Script: Run normalize_and_split.py
    Script->>Data: Create stratified train/val/test splits
    Dev->>Script: Run convert_to_chat_templates.py
    Script->>Data: Convert to MLX chat format
    Dev->>Trainer: Execute mlx_lm.lora command
    Trainer->>Model: Load base model & adapters
    Trainer->>Data: Load training batches
    loop Training Iterations
        Trainer->>Model: Compute loss & update weights
    end
    Trainer->>Dev: Save LoRA adapters

How to apply & reuse

Use this pipeline when you need a specialized, domain-specific voice agent that runs entirely on-device without API costs. It is ideal for businesses requiring consistent handling of scheduling, triage, and customer service queries where data privacy and low latency are critical. The modular script structure allows adaptation to other service industries by updating the scenario matrix and slot definitions.

At a glance

CapabilitiesSynthetic data generationLoRA fine-tuningSlot fillingFSM state trackingLocal inferenceConversation normalization
Componentsnormalize_and_split.pyconvert_to_chat_templates.pybuild_final_training.pyrewrite_with_slots.pyretrain_from_fsm.pyadd_json_responses.py
TechPythonMLXLoRAJSONLApple SiliconPhi-4-mini
Depends onmlx-lmpython3numpyjson
Integrates withphone-agent-schedulerClaude APIGGUF quantization tools
PatternsKnowledge DistillationSlot InjectionFinite State MachineChat Template ConversionStratified Sampling
Reuse tagsllm-trainingvoice-agentapple-siliconmlxlorasynthetic-data

Repo hygiene

✓ all on main — nothing unmerged.