A pipeline for generating synthetic phone conversation data and fine-tuning local LLMs (via LoRA on Apple Silicon) to act as automated plumbing receptionists.
https://github.com/davidbmar/training · public · shipped
This project provides a complete workflow to train small, efficient language models (3.8B-7B parameters) to handle inbound phone calls for a plumbing business. It uses knowledge distillation where a large teacher model generates realistic conversations, which are then processed into structured training data with slot injection (context, state, tasks). The resulting models are fine-tuned using MLX LoRA adapters to run locally on Mac hardware, offering low-latency, zero-cost inference while maintaining high-quality receptionist behavior across 11 conversation phases.
python3 scripts/normalize_and_split.py python3 scripts/convert_to_chat_templates.py -i data/splits/train.json -f mlx -o data/splits/train_mlx.jsonl python3 scripts/convert_to_chat_templates.py -i data/splits/val.json -f mlx -o data/splits/val_mlx.jsonl pip install mlx-lm python -m mlx_lm.lora --model microsoft/phi-4-mini-instruct --data data/splits/ --train --batch-size 2 --lora-rank 8 --iters 600 --adapter-path adapters/phi4-mini
flowchart TD
A[Scenario Matrix] -->|Defines| B(Raw Conversations)
C[Teacher Model/Claude] -->|Generates| B
B --> D[normalize_and_split.py]
D -->|Stratified Splits| E[train/val/test JSON]
E --> F[convert_to_chat_templates.py]
F -->|MLX Format| G[MLX JSONL Data]
G --> H[MLX LoRA Trainer]
I[Base Model Phi-4] --> H
H -->|Produces| J[LoRA Adapters]
J --> K[Local Phone Agent]
The pipeline is built in Python, leveraging the `mlx-lm` library for efficient fine-tuning on Apple Silicon. It processes raw JSON conversation logs through normalization scripts that map diverse user inputs to standardized FSM states. Data is enriched with structured context blocks ([CONTEXT], [SLOTS], [STATE], [TASK]) to teach the model slot-filling and workflow adherence. The final training data is converted into chat templates compatible with MLX, and LoRA adapters are trained against base models like Phi-4-mini-instruct.
sequenceDiagram
participant Dev as Developer
participant Script as Normalization Script
participant Data as Processed Data
participant Trainer as MLX LoRA Trainer
participant Model as Base LLM
Dev->>Script: Run normalize_and_split.py
Script->>Data: Create stratified train/val/test splits
Dev->>Script: Run convert_to_chat_templates.py
Script->>Data: Convert to MLX chat format
Dev->>Trainer: Execute mlx_lm.lora command
Trainer->>Model: Load base model & adapters
Trainer->>Data: Load training batches
loop Training Iterations
Trainer->>Model: Compute loss & update weights
end
Trainer->>Dev: Save LoRA adapters
Use this pipeline when you need a specialized, domain-specific voice agent that runs entirely on-device without API costs. It is ideal for businesses requiring consistent handling of scheduling, triage, and customer service queries where data privacy and low latency are critical. The modular script structure allows adaptation to other service industries by updating the scenario matrix and slot definitions.
✓ all on main — nothing unmerged.