AI-Powered Sales Call Coaching at Scale
Sales leaders needed a repeatable way to evaluate call quality across teams. Manual review was slow, subjective, and impossible to scale. We built a platform that ingests transcripts, scores them against custom playbooks, and delivers coaching feedback automatically.
Business Impact
Every call scored against the team playbook, not just a random sample
Executive Outcomes
From sampling ~10% to reviewing every conversation
Manual QA and coaching prep eliminated
Concept to production multi-tenant SaaS
Weekly QA labor eliminated, freeing budget for coaching and closing
The Challenge
A sales coaching company needed an automated system for call quality assurance. Their team was reviewing calls manually, making coaching inconsistent and difficult to scale across organizations. Transcripts came from multiple sources (Fathom, Fireflies, direct webhooks) with no unified processing pipeline.
Sales calls were reviewed manually, making coaching feedback slow, subjective, and impossible to scale across teams
Coaching quality varied wildly between reviewers, with no shared standard or playbook enforcement
Transcripts arrived from Fathom, Fireflies, and direct webhooks with no unified processing pipeline
No multi-tenant isolation between client organizations, creating security and data boundary risks
Usage billing and budget controls did not exist, making LLM costs unpredictable as client volume grew
The Transformation
What changed after we built the system
Before: Sales calls were reviewed manually, making coaching feedback slow, subjective, and impossible to scale across teams.
After: AI evaluates every call against custom playbooks and delivers structured coaching automatically.

Before: Coaching quality varied wildly between reviewers, with no shared standard or playbook enforcement.
After: Consistent, playbook-grounded scoring across every team and organization using RAG-based evaluation.

Before: Transcripts arrived from Fathom, Fireflies, and direct webhooks with no unified processing pipeline.
After: A single ingestion layer normalizes transcripts from all sources into a shared format with dedup and validation.

Before: No multi-tenant isolation between client organizations, creating security and data boundary risks.
After: Full multi-tenant isolation with per-organization data boundaries and API security at every endpoint.

Before: Usage billing and budget controls did not exist, making LLM costs unpredictable as client volume grew.
After: Stripe-synced tiered billing with OpenRouter key management and per-organization usage limits.
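To make the ingestion idea concrete, here is a minimal sketch of normalize-then-dedup. The type and field names are illustrative, not the platform's actual schema; the point is that every source maps into one shape, and a retry of the same call is dropped before analysis.

```typescript
// Hypothetical shared shape — every source (Fathom, Fireflies, webhook)
// is mapped into this before anything downstream sees it.
interface NormalizedTranscript {
  source: "fathom" | "fireflies" | "webhook";
  externalId: string;
  segments: { speaker: string; text: string }[];
}

// Dedup key: the same call delivered twice (e.g. a webhook retry)
// produces the same key.
function dedupKey(t: NormalizedTranscript): string {
  return `${t.source}:${t.externalId}`;
}

// Drop duplicates within a batch, keeping the first occurrence.
function dedupe(batch: NormalizedTranscript[]): NormalizedTranscript[] {
  const seen = new Set<string>();
  return batch.filter((t) => {
    const key = dedupKey(t);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Keeping the dedup key source-qualified means two providers that happen to reuse an ID never collide.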
Why 13 packages for a team of one
When you build a multi-tenant platform, the temptation is to move fast with a single package. Everything in one place, easy to navigate, ship quickly.
The problem shows up at integration boundaries. Billing logic bleeds into analysis code. Auth middleware gets coupled to transcript parsing. A change in the playbook ingestion pipeline accidentally breaks the webhook handler.
Splitting into 13 packages forces explicit contracts between modules. The billing package cannot import from the AI package without declaring the dependency. This made the system safe to modify at speed, which matters when a solo engineer is shipping 496 commits in 8 weeks.
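The enforcement mechanism is nothing exotic: ordinary workspace dependency declarations. A sketch with illustrative package names (not the real ones):

```json
{
  "name": "@repo/billing",
  "dependencies": {
    "@repo/db": "workspace:*",
    "@repo/auth": "workspace:*"
  }
}
```

Because there is no `@repo/ai` entry here, an `import` from the AI package inside billing code fails to resolve at build time. The dependency graph stays explicit and reviewable instead of accreting silently.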
How We Built It
Technical architecture for the curious
API
Type-safe API layer with strict validation at every boundary and tenant-aware authentication.
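A hedged sketch of what "strict validation at every boundary" and "tenant-aware" mean in practice. The names (`parseScoreCallInput`, `scopedTranscriptQuery`) are illustrative, not the production code; the two invariants are real: input shape is checked before any handler runs, and the organization ID comes from the verified session, never from the request body.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface ScoreCallInput {
  transcriptId: string;
  playbookId: string;
}

// Reject anything that is not exactly the expected shape.
function parseScoreCallInput(raw: unknown): Result<ScoreCallInput> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "expected an object" };
  }
  const r = raw as Record<string, unknown>;
  if (typeof r.transcriptId !== "string" || typeof r.playbookId !== "string") {
    return { ok: false, error: "transcriptId and playbookId must be strings" };
  }
  return { ok: true, value: { transcriptId: r.transcriptId, playbookId: r.playbookId } };
}

// Tenant scoping: orgId is taken from the authenticated session, so a
// client can never request another organization's transcript.
function scopedTranscriptQuery(orgId: string, transcriptId: string) {
  return {
    text: "SELECT * FROM transcripts WHERE org_id = $1 AND id = $2",
    values: [orgId, transcriptId],
  };
}
```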
AI Pipeline
Playbook content is chunked, embedded, and stored in pgvector. RAG retrieval drives structured call evaluations.
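The retrieval step above can be sketched as two small pieces: overlapping chunking before embedding, and a pgvector nearest-neighbor query scoped to the calling organization. Table and column names here are hypothetical; `<=>` is pgvector's cosine-distance operator.

```typescript
// Split playbook text into overlapping chunks before embedding, so
// context is not lost at chunk boundaries.
function chunkText(text: string, size = 200, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Retrieve the k playbook chunks closest to an embedded transcript
// segment, never crossing organization boundaries.
function nearestChunksQuery(orgId: string, embedding: number[], k = 5) {
  return {
    text: `SELECT content
             FROM playbook_chunks
            WHERE org_id = $1
            ORDER BY embedding <=> $2::vector
            LIMIT $3`,
    values: [orgId, JSON.stringify(embedding), k],
  };
}
```

The retrieved chunks are then placed in the evaluation prompt, which is what grounds the scoring in the team's own playbook rather than the model's general opinions.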
Orchestration
Background jobs handle analysis with synchronous fallback paths for resilience when job infrastructure degrades.
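The fallback pattern is simple to state in code. A minimal sketch, with hypothetical function names — the real system uses Trigger.dev for the enqueue path:

```typescript
type AnalyzeFn = (transcriptId: string) => Promise<string>;
type EnqueueFn = (transcriptId: string) => Promise<void>;

// Try the background-job path first; if the job infrastructure is
// degraded, run the analysis inline so calls still get scored.
async function analyzeWithFallback(
  transcriptId: string,
  enqueue: EnqueueFn,
  analyzeNow: AnalyzeFn,
): Promise<{ mode: "async" | "sync"; result?: string }> {
  try {
    await enqueue(transcriptId); // normal path: background job
    return { mode: "async" };
  } catch {
    // Degraded path: synchronous analysis in the request lifecycle.
    const result = await analyzeNow(transcriptId);
    return { mode: "sync", result };
  }
}
```

The cost, noted in the tradeoffs below, is that the inline path and the job handler must stay behaviorally identical — which is easiest when both call the same underlying analysis function.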
Data
Single database for relational data and vector search. 13-package monorepo with explicit module boundaries.
Billing
Stripe sync with tier-change verification and predictable usage limits per organization.
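What "tier-change verification" guards against is a race: a webhook arrives claiming an upgrade before the payment for that tier is actually confirmed. A minimal sketch of the check, with hypothetical tier names and price IDs — the production flow reads the paid price from the Stripe subscription object:

```typescript
// Hypothetical mapping from internal tier to its Stripe price ID.
const TIER_PRICE: Record<string, string> = {
  starter: "price_starter",
  growth: "price_growth",
  scale: "price_scale",
};

// Approve a tier change only if the price actually paid matches the
// tier being requested; anything else is rejected before budgets move.
function verifyTierChange(
  requestedTier: string,
  paidPriceId: string,
): { approved: boolean; reason?: string } {
  const expected = TIER_PRICE[requestedTier];
  if (!expected) {
    return { approved: false, reason: "unknown tier" };
  }
  if (expected !== paidPriceId) {
    return { approved: false, reason: "paid price does not match requested tier" };
  }
  return { approved: true };
}
```

Rejected changes are retried once the subscription events settle, which is where the extra round-trip noted in the tradeoffs below comes from.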
Engineering Decisions
Tradeoffs we made and why
13-package Turborepo monorepo instead of a single package
Benefit
Each module (auth, billing, AI, ingestion) has clear boundaries and can be tested independently
Cost
Higher initial setup complexity and longer CI build times for a solo engineer
pgvector for playbook embeddings instead of a dedicated vector database
Benefit
Single database for both relational data and vector search simplifies operations and deployment
Cost
PostgreSQL vector search is slower than Qdrant or Pinecone at very high embedding volumes
Synchronous fallback paths alongside Trigger.dev async
Benefit
System remains functional even when background job infrastructure is degraded
Cost
Duplicate code paths that must stay in sync during updates
oRPC over tRPC for the API layer
Benefit
Better compatibility with Hono middleware and non-Next.js API consumers
Cost
Smaller ecosystem and fewer community plugins compared to tRPC
Explicit payment authorization verification on tier changes
Benefit
Catches race conditions between subscription events and budget updates before they cause billing errors
Cost
Slightly slower checkout flow due to additional verification round-trip
Certain client names, proprietary workflows, screenshots, and internal assets referenced in this case study are protected under a non-disclosure agreement and have been anonymized or omitted to comply with our confidentiality obligations.
Need a multi-tenant platform built fast?
Book a free 30-minute call. We will walk through your requirements, identify the architectural decisions that matter, and scope a realistic delivery timeline.
30 minutes with Apurva. Not a sales call.
Book Your Free Audit