Six stages between a real conversation and a recovery card.
Each stage had to hold up on its own before we could call the loop closed. The hard parts
were our own builds, not wrappers around someone else’s API: Devanagari preserved exactly
as spoken, code-mixed Hindi · Marathi · English in one pass, and diarisation that works at
the noise floor of an Indian retail counter.
01 · Capture
Rover device · far-field mic · always-on
A purpose-built device on the counter and a wearable in the field. The hard constraint:
zero behaviour change from staff. No app, no login, no badge to tap. Audio goes up the
moment the device sees a network — including the cellular fallback we keep on for stores
with patchy wifi.
What we built
Far-field mic array tuned for the 60–90cm distance at a retail counter
Per-device acoustic calibration — no two stores sound the same
Cellular fallback so stores with patchy wifi do not drop hours of capture
Local encryption on the device; nothing leaves the floor unencrypted
02 · Assemble
Conversation windows · pause + proximity
Our AI infers the boundary of every customer conversation from pause and proximity
in the signal — not from uniform clock windows. A 17-minute interaction at one counter and a
38-second one at the next stay separate, with their own speaker assignments and event
tags, even if they share a shift.
What we built
Conversation boundaries inferred from the signal, not from clock buckets
Same-counter continuity across shifts and devices
Per-shift staff handoff that doesn’t lose speaker identity
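To make the idea concrete, here is a minimal Python sketch of pause-based boundary detection. It is an illustration under stated assumptions, not the production logic: the `Segment` fields, the `near_counter` proximity flag, and the 20-second `max_pause` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float         # seconds since the start of the shift
    end: float
    near_counter: bool   # hypothetical proximity flag from the mic array


def split_conversations(segments, max_pause=20.0):
    """Group speech segments into conversations.

    A new conversation starts whenever the silence since the previous
    near-counter segment exceeds max_pause. Segments that are not near
    the counter are ignored (an assumption for this sketch).
    """
    conversations = []
    current = []
    for seg in sorted(segments, key=lambda s: s.start):
        if not seg.near_counter:
            continue
        if current and seg.start - current[-1].end > max_pause:
            conversations.append(current)
            current = []
        current.append(seg)
    if current:
        conversations.append(current)
    return conversations
```

Under this rule the length of a conversation is whatever the signal says it is: a long interaction and a short one stay separate as long as a long enough pause falls between them, and no clock bucket cuts either of them in half.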
03 · Transcribe
Hindi · Marathi · English · code-mixed
Devanagari is preserved exactly as spoken. Multilingual code-switching inside a single
utterance — “sir, paracetamol नहीं है, but ये देख लीजिए” — is handled in one
pass. No language toggle. Our AI transcription pipeline routes between primary and
fallback models on confidence, not on language hints.
What we built
Code-mixed Hindi · Marathi · English in a single forward pass
Devanagari preserved as Devanagari — never transliterated to Latin
Multi-stage routing: primary model + fallback + confidence gate
Per-segment confidence used downstream — uncertain turns are flagged, not silently shipped
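The routing described above can be sketched roughly as follows. This is a simplified illustration, not the actual pipeline: the `Transcript` type, the `primary` and `fallback` callables, and both thresholds (`accept_at`, `review_below`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transcript:
    text: str
    confidence: float          # 0.0 to 1.0, per segment
    needs_review: bool = False


# A model is any callable that maps raw audio bytes to a Transcript.
Model = Callable[[bytes], Transcript]


def transcribe(audio: bytes, primary: Model, fallback: Model,
               accept_at: float = 0.85, review_below: float = 0.60) -> Transcript:
    """Route on confidence only; no language hint is consulted."""
    first = primary(audio)
    if first.confidence >= accept_at:
        return first
    # Primary was unsure: try the fallback and keep the more confident result.
    second = fallback(audio)
    best = max(first, second, key=lambda t: t.confidence)
    # Uncertain turns are flagged for review rather than silently shipped.
    best.needs_review = best.confidence < review_below
    return best
```

The point of the design is that the gate looks at confidence alone, so a code-mixed utterance never has to be labelled as one language before it is transcribed.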
What we see · spectrogram of one counter conversation
A frequency-vs-time view of a real conversation our pipeline ingested. The bright
vertical bands are speech; the warm-toned high-frequency clusters are sibilants and
consonants — the parts that carry word identity. The clarity of these bands
under our denoising is what makes Devanagari transcription possible at the noise floor
of an Indian retail counter.
Why we built our own transcription
We tested four of the major transcription vendors before deciding to build our own.
One transliterated Devanagari into Latin script, losing the language.
One missed Marathi entirely, defaulting to English. One
hallucinated language under noise — inventing words on long Hindi clips.
None of them surfaced the confidence signal we needed to know when to ask for human
review.
We keep the head-to-head numbers internal so we don’t end up in a comparison war.
Ask us on a call — we’ll show you the receipts.
04 · Diarise
Staff vs. customer, per turn
Our diarisation AI attributes each turn to the right person. Staff are recognised by voice
embedding across conversations and shifts. Customers stay anonymous. No enrolment
ritual — staff don’t have to read a script into a microphone to be identified.
What we built
Our own diarisation pipeline — purpose-built for noisy, multi-speaker counter audio
Staff recognised by voice across conversations; customers stay anonymous
No enrolment ritual — staff change nothing about how they work
When the audio is ambiguous, we ask before we answer — your team only sees what we’re sure of
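One common way to recognise a known speaker from a voice embedding is cosine similarity against stored profiles. The sketch below uses that technique as a stand-in; it is not a description of our diarisation models. The profile dictionary, the labels, and the two thresholds are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def attribute_turn(embedding, staff_profiles, match_at=0.80, reject_below=0.50):
    """Return (label, needs_review) for one turn.

    A close match to a staff profile returns that staff label, a clear
    non-match returns the anonymous "customer" label, and anything in
    between is marked ambiguous so a person can confirm it.
    """
    best_name, best_score = None, -1.0
    for name, profile in staff_profiles.items():
        score = cosine(embedding, profile)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= match_at:
        return best_name, False
    if best_score < reject_below:
        return "customer", False
    return "unknown", True
```

Customers never get a profile in this sketch, which is what keeps them anonymous: a turn is either matched to a known staff voice or it is not.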
Watch the pipeline split a conversation in real time
A real customer conversation, split into staff and customer turns as the audio plays.
The red tag at the top marks where our retail-event taxonomy flagged a stockout
with no recovery — the kind of moment that walks revenue out the door.
05 · Extract
Business events, not topics
Our extraction AI doesn’t hand operators a topic cloud. It surfaces specific business
events against a hand-built retail taxonomy: stockout uncoached · bounce unhandled
· substitution offered · prescription check missed · return handled well
· cross-sell success. Each one ships with the exact line of transcript that triggered it.
What we built
A taxonomy of retail conversation events, not generic NLP topic clusters
Every event ships with the exact line of evidence behind it — no ‘trust us’ signals
Lost-revenue valuation joined against real POS history
We don’t ship a moment to your team unless we’re sure
Where we’re unsure, we say so — your team confirms with one tap and the system gets better
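As a rough illustration of what "an event with its evidence" could look like as data, here is a small Python sketch. The event names come from the taxonomy listed above; everything else (the `Event` fields, the average-basket valuation rule, the 0.8 confidence cut-off, and the shape of `pos_history`) is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class EventType(Enum):
    STOCKOUT_UNCOACHED = "stockout uncoached"
    BOUNCE_UNHANDLED = "bounce unhandled"
    SUBSTITUTION_OFFERED = "substitution offered"
    PRESCRIPTION_CHECK_MISSED = "prescription check missed"
    RETURN_HANDLED_WELL = "return handled well"
    CROSS_SELL_SUCCESS = "cross-sell success"


@dataclass
class Event:
    kind: EventType
    evidence: str       # the exact transcript line that triggered the event
    product: str
    confidence: float   # 0.0 to 1.0


def lost_revenue(event, pos_history):
    """Hypothetical valuation: average past sale value for the product.

    pos_history maps product name to a list of past sale amounts.
    Returns None when there is no history to join against.
    """
    sales = pos_history.get(event.product, [])
    return mean(sales) if sales else None


def triage(events, min_confidence=0.8):
    """Split events into those shipped directly and those sent for one-tap confirmation."""
    confirmed = [e for e in events if e.confidence >= min_confidence]
    to_confirm = [e for e in events if e.confidence < min_confidence]
    return confirmed, to_confirm
```

Because `evidence` is a required field, an event in this sketch cannot exist without the transcript line behind it.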
06 · Coach
Inbox → practice → measure
Every event becomes a 90-second practice rep that our AI generates from the actual
conversation — not a generic scenario. The same evaluator runs on practice and
production, so improvement on the rep is measurable on the next month’s real
conversations.
What we built
Per-event remediation paths into practice scenarios
Same evaluator on practice and production — one standard, one feedback loop
Skill drift tracked across staff, stores, and cohorts
Built on six years of teaching 10,000+ professionals what actually changes behaviour in front of a customer
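The "same evaluator on practice and production" idea can be shown with a toy metric. The sketch below is illustrative only: it assumes each conversation or practice rep has been reduced to a set of event tags, and it uses the share of stockouts where an alternative was offered as the example metric. The tag strings and function names are hypothetical.

```python
def alt_offered_rate(conversations):
    """Share of stockout conversations in which a substitution was offered.

    Each conversation is a set of event tags. Returns None when there are
    no stockout conversations to score.
    """
    stockouts = [tags for tags in conversations if "stockout" in tags]
    if not stockouts:
        return None
    offered = sum(1 for tags in stockouts if "substitution offered" in tags)
    return offered / len(stockouts)


def coaching_report(practice, production_before, production_after):
    """Apply one evaluator to practice reps and to real conversations alike."""
    return {
        "practice": alt_offered_rate(practice),
        "production_before": alt_offered_rate(production_before),
        "production_after": alt_offered_rate(production_after),
    }
```

Since all three numbers come from the same function, a gain on the practice rep and a gain on the next month's real conversations are directly comparable.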
Where the intelligence shows up
Three output surfaces.
The pipeline produces intelligence that lands in three places, depending on who needs it and when.
Web dashboard — operations manager
Opens Monday morning. Fleet view → store view → conversation view. Search by time, store,
staff member, moment type, or language. Every number links back to a clip and a transcript line.
app.ostronaut.ai/today · Live
Today · Map · Stores · Coaching · Calendar
₹11,800 at risk across 3 stores today.
The one you can fix in 15 minutes: Priya at Virar West.
All 10 stores · Chain leadership view · Pharma · PM shift
Incriminating evidence · What to do tonight
Coach Priya Sharma to introduce alternative recovery before her 10:00 am shift. One walk-in this afternoon cost ₹3,700.
Open coaching plan · Listen to the walk-in
Top 3 stores to act on today · As of 14:32
Money first. Named accountability. Top-3 stores at risk, the one you can fix in 15 minutes.
MCP server — your AI talks to ours
Query our intelligence store from any AI assistant that supports the Model Context Protocol.
Ask natural-language questions over structured retail data. Your audio never leaves Ostronaut.
# Ask your AI assistant, connected to Ostronaut MCP →
Show me every conversation last week where a customer asked about a brand we don’t carry, ranked by store.
# Ostronaut MCP returns structured events, not text summaries ←
14 conversations · 4 stores · top brand gap: Dolo 650 (9 mentions) · Store 3 leads
Your data stays in our store. The MCP surface gives your existing AI assistant structured access — no new dashboards required.
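For readers wondering what "structured events, not text summaries" buys them, here is a hedged sketch of how a client could aggregate such events into the kind of answer shown above. It does not use any MCP SDK and does not reflect the real response schema: the `store` and `brand` keys, and the assumption of one event per conversation, are hypothetical.

```python
from collections import Counter


def rank_brand_gaps(events):
    """Summarise brand-gap events.

    events is a list of dicts with hypothetical keys "store" and "brand",
    one per conversation in which a customer asked for a brand not carried.
    """
    by_brand = Counter(e["brand"] for e in events)
    by_store = Counter(e["store"] for e in events)
    return {
        "conversations": len(events),
        "stores": len(by_store),
        # (brand, mention count) for the most requested missing brand
        "top_brand": by_brand.most_common(1)[0] if by_brand else None,
        # stores ordered from most to fewest brand-gap conversations
        "stores_ranked": [store for store, _ in by_store.most_common()],
    }
```

Because the events arrive as records rather than prose, the assistant can count, group, and rank them instead of paraphrasing a summary.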
Coaching artifact + practice rep — staff member
The staff member who lived a missed moment gets a 90-second practice rep on their phone,
tied to the exact conversation. Built from the real audio — not a generic objection-handling
module — and delivered before the next shift.
Priya Sharma (staff) · Customer (anonymous) · Hindi · Marathi · English
14:32:08 · Customer: पैन्टकाइंड 40 है क्या? [Brand request · Pentakind 40]
14:32:14 · Priya: नहीं दीदी, स्टॉक ख्बह गया। [Stockout · No alternative offered]
14:32:22 · Customer: ठीक है, देख्या दुसरी दुकान से। [Walk-out · ₹3,700 lost]
Every coaching rep ties back to a specific moment in a real conversation.
What the coaching loop actually looks like
A four-stop journey, every Monday morning.
Money at risk in the day → drill into the worst store → open the staff member’s scorecard for the 1:1 → send the practice rep tied to the recurring pattern. One loop, four stops, every morning.
app.ostronaut.ai/today · Live
Today · Map · Stores · Coaching
₹11,800 at risk across 3 stores today.
Fix in 15 min: Priya at Virar West.
Incriminating evidence
Coach Priya Sharma on alternative recovery before her 10:00 am shift. One walk-in cost ₹3,700.
Open plan · Listen
Top 3 stores to act on today · 14:32
01 · See
Open the day. ₹11,800 at risk, named to the staff who can fix it before the next shift.
app.ostronaut.ai/stores/mumbai-west · Live
Today · Map · Mumbai West · Staff
Mumbai West
₹3,700 walked out today · 4 conversations · 1 staff
₹3,700 today · 12 this week · 82% no-alt rate · 3 themes
Conversations that leaked · As of 14:32
14:32 · Priya · ₹3,700
Customer asked for 4 medicines back-to-back, none with alternatives offered.
02 · Drill
Open the worst store. Conversation by conversation, see exactly where revenue walked out, with the staff and the moment attached.
app.ostronaut.ai/staff/priya-sharma · Live
Today · Stores · Priya Sharma · Themes
Priya Sharma
Counter staff · Virar West · 14 shifts this month
₹9,400 at risk (30d) · 52 moments · 14% alt offered · 3 reps done
For your next 1:1 · 3 items
Top theme · 30d · 52
Stockout with no alternative offered: 52 moments, 78% of Virar West total.
03 · Prep
Open the staff member’s scorecard, the same view the manager pulls up in the 1:1. Trend, top themes, what to focus on this week.
04 · Coach
Open the recurring pattern. Send the 90-second practice rep, built from the actual conversations and delivered before the staff member’s next shift.
Coaching is one branch of the intelligence layer — shown last on purpose. The same platform
that surfaces a missed moment can wrap it into a 90-second practice rep that lands with the
staff member who lived it. Built on six years of teaching 10,000+ professionals what actually
changes behaviour in front of a customer.
Why this matters
We did not wrap an API.
Off-the-shelf transcription doesn’t handle Devanagari without transliterating it.
Generic diarisation tools fall apart when two voices overlap at a busy counter. And nobody
else has a retail event taxonomy — they hand you topics, not moments. Each stage of our
pipeline is its own build, joined to a real POS ledger and a real coaching loop. The numbers
on the home page — 25,078 transcribed turns, 3,044 coachable moments, ₹1.64L caught —
are what falls out of that stack running on real audio from a real Indian retail floor.