Conversational AI Platform (NDA, North America)
Conversational AI / SaaS

A North American conversational AI company needed 40 languages in 8 weeks for their international launch. We delivered in 6, including 12 languages their incumbent LSP had declined, at 38% under the tier-1 quote.

6 weeks
84 specialists across 40 language pods

Key Results

Languages delivered: 40
Words translated: 1.2M+
System prompt variants adapted: 14,000+
Safety templates per locale: 8,000+
Test conversations evaluated: 20,000+
Avg. human eval score: 4.6 / 5
MT baseline score: 3.1 / 5
Cost vs incumbent quote: 38% lower
Critical post-launch bugs: 0
Low-resource languages: 12

"Our incumbent quoted 14 weeks for 28 languages. Saytica delivered 40 in 6, including ones we'd been told weren't possible. What surprised me more was the evaluation work. Three jailbreaks in low-resource languages caught before they reached production. Honestly, that's not work we knew to ask for."
M. Chen
Head of Localization

The Challenge

The client is a North American conversational AI company. Their Series B closed in early 2026, and the lead investor wanted international expansion fast, before two well-funded competitors got there. The board signed off on a target: 40 languages in 8 weeks, covering EMEA, APAC, and LATAM.

Their internal localization team was two people. The incumbent LSP, a tier-1 agency they'd used for marketing translation, came back with a 14-week timeline and a list of 28 languages they could confidently deliver. The remaining 12, including Sylheti, Khmer, Amharic, Pashto, Burmese, and Tigrinya, were either declined or quoted at premium rates with no quality guarantee.

The harder problem showed up in beta testing. Machine-translated system prompts made the assistant feel cold in Japanese, oddly formal in Brazilian Portuguese, and culturally off in three other markets. Refusal templates (the "I can't help with that" messages) had been translated literally and came out rude in Japanese and excessively apologetic in German. None of this was anyone's fault exactly. It just wasn't work their LSP knew how to do.

Our Solution

We started with a hard look at the workflow. The standard agency pattern (translate, then review, then QA, in sequence) was never going to hit six weeks for forty languages. So we ran all forty languages in parallel: forty language pods, each with a senior translator, a reviewer, and an AI-context specialist (usually a linguist with prompt engineering or model evaluation experience). Product surface work (UI strings, help docs, marketing pages, error messages) ran continuously and synced to the client's repo daily through an API. This part most modern LSPs can do.

The behavior layer was where the actual work happened. We didn't translate system prompts word-for-word. The Japanese pod rewrote the assistant's default formality level because neutral keigo is technically polite but lands as distant. The Brazilian Portuguese pod swapped formal "você" patterns for warmer regional phrasing in onboarding. The Arabic pod sorted out gender agreement and RTL UI rendering together, because the assistant's text was breaking visually in mixed-language conversations and nobody had flagged it.

For evaluation, each pod ran a 500-conversation test set scored by native raters on a 5-point rubric covering naturalness, cultural fit, and safety behavior. The MT baseline averaged 3.1 across languages. Our localized output averaged 4.6. More importantly, three jailbreak prompts that bypassed safety in machine-translated Bengali, Pashto, and Burmese got caught and patched before launch. Those weren't on anyone's deliverable list. They were things our linguists noticed and flagged.

The 12 languages the incumbent declined went to our in-country specialist network. Native Sylheti reviewers in Sylhet. Amharic linguists in Addis Ababa. Pashto speakers in Peshawar. Vetted teams with credentials we can vouch for, not freelancer marketplace work.

All 40 languages were delivered in six weeks, two weeks ahead of the client's deadline.
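
For readers who want a concrete picture of the evaluation step, here is a minimal sketch of how per-locale rubric scores could be rolled up into a single cross-language average. It is an illustration, not the scoring platform used on the project: the `Rating` layout, the equal weighting of the three rubric dimensions, and the choice to average per-locale means (so that every language counts equally) are all assumptions, and the sample ratings are made up.

```python
from dataclasses import dataclass
from statistics import fmean

# Rubric dimensions named in the case study; the data layout is hypothetical.
DIMENSIONS = ("naturalness", "cultural_fit", "safety_behavior")


@dataclass(frozen=True)
class Rating:
    """One native rater's 1-5 scores for one test conversation."""
    locale: str
    naturalness: int
    cultural_fit: int
    safety_behavior: int

    def overall(self) -> float:
        # Assumption: the three dimensions are weighted equally.
        return fmean(getattr(self, d) for d in DIMENSIONS)


def locale_scores(ratings: list[Rating]) -> dict[str, float]:
    """Average the overall score of every rating within each locale."""
    by_locale: dict[str, list[float]] = {}
    for r in ratings:
        by_locale.setdefault(r.locale, []).append(r.overall())
    return {loc: fmean(vals) for loc, vals in by_locale.items()}


def cross_locale_average(ratings: list[Rating]) -> float:
    """Mean of the per-locale means, so each language counts once."""
    return fmean(locale_scores(ratings).values())


# Made-up sample ratings, for illustration only.
sample = [
    Rating("ja", 5, 4, 5),
    Rating("ja", 4, 4, 5),
    Rating("pt-BR", 5, 5, 4),
]
scores = locale_scores(sample)
average = cross_locale_average(sample)
```

Averaging per-locale means rather than pooling every rating keeps a locale with more rated conversations from dominating the headline number. Whether the reported 3.1 and 4.6 figures were computed this way is not stated in the case study.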

Technologies Used

Phrase TMS
Lokalise
Custom AI evaluation rubric
ICU MessageFormat
RTL/LTR rendering QA
Continuous localization API
Native rater scoring platform
Multi-script subtitle tooling

The Result

40 languages in 6 weeks, 38% under incumbent quote
