Multilingual AI Quality Evaluation Specialist
The world’s most popular audio streaming subscription service is looking for a Multilingual AI Quality Evaluation Specialist to join the band on a consultant assignment. The client transformed music listening forever when it launched in 2008.
Period: ASAP to 2026-06-14 (full-time), with a possibility of extension.
About the role
We’re looking for a Multilingual AI Quality Evaluation Specialist to define, test, and continuously improve the client’s multilingual AI quality standards.
You’ll design and optimize evaluation frameworks, datasets, and scoring methodologies that power QUAIL (our Quality Assessment AI for Language) and MAP (the Multilingual AI Portal). This role bridges linguistic and localization expertise, quality science and data analysis, and evaluation design, turning linguistic nuance and business intent into measurable, automatable evaluation logic. You’ll translate business and content goals into structured evaluation criteria, ensuring that every AI-generated or AI-translated output across the company is accurate, fluent, culturally relevant, and fit for purpose.
By joining this team, you will shape the evaluation intelligence layer that underpins the client’s multilingual AI ecosystem.
Your work ensures that AI outputs are linguistically accurate, culturally adaptive, and explainably evaluated, directly influencing the experience of hundreds of millions of global users.
What you'll do
- Build and implement evaluation methodologies across multilingual settings and content types.
- Develop and validate multilingual evaluation rubrics aligned with QUAIL’s multi-metric architecture (accuracy, fluency, tone, compliance, factuality).
- Design calibration studies comparing QUAIL’s LLM judgments with human-rated benchmarks to ensure scoring reliability and explainability.
- Define sampling and scoring protocols (human-in-the-loop validation, confidence thresholds, correlation metrics).
- Collaborate with ML engineers to train and fine-tune evaluators using gold datasets and human-annotated examples; contribute to the synthetic data generation pipeline (template-based data creation and validation).
- Analyze model outputs and error patterns, using QUAIL’s scoring results to identify quality gaps and update routing or prompt logic in MAP.
- Partner cross-functionally with Localization, PZN, and GLEE to ensure consistent language quality signals across the client’s agentic products.
- Partner with engineers to implement feedback loops that iteratively improve model accuracy and cultural fit.
- Define and document language-specific quality guidelines, thresholds, and evaluation protocols for internal use.
- Measure LLM evaluator stability and bias across locales and continuously improve prompt instructions for fairness and accuracy.
Who you are
- Background in Language Quality Evaluation, Applied Linguistics, Computational Linguistics, or Language Quality Research.
- Experience in LLM evaluation, machine translation evaluation, or linguistic annotation pipelines across multilingual settings.
- Strong understanding of linguistic or translation evaluation frameworks and metrics such as MQM, MetricX, or COMET; hands-on experience designing multidimensional rubrics or scoring schemes.
- Demonstrated understanding of GenAI evaluation methods: prompt testing, model calibration, and score validation.
- Experience collaborating with ML teams on linguistic training-data pipelines, annotation workflows, or fine-tuning evaluators.
- Deep linguistic and cultural sensitivity across multiple locales; able to articulate what “fit for purpose” quality means per content type.
- Bonus: experience with programmatic QA, e.g., rule-based or API-driven validation using Python, YAML, or gRPC-based systems.
- Bonus: experience with inter-rater reliability measurement (e.g., Krippendorff’s alpha, Cohen’s kappa) or human–AI agreement studies.
We are Market Partner
Market Partner is proud to be an equal opportunity employer. You are welcome to our community regardless of who you are, no matter where you come from, or what you look like. We apply ongoing selection and may fill the position as soon as we find the right candidate.
- Department
- Audio streaming
- Locations
- Stockholm
- Remote status
- Hybrid
About Market Partner
For over 10 years, Market Partner has helped companies develop their businesses by offering tailored customer solutions in Project Management, Business Development, Recruitment, and Training within IT & Telecom.