Multi-Lingual Research Workflow for LangGraph
Research topics often benefit from sources in multiple languages. Medical research in German journals, fashion trends in French publications, and technology developments in Chinese sources are all insights that a single-language workflow misses.
This pattern extends LangGraph research workflows with multi-lingual capabilities: automatic language selection, language-specific researcher agents, cross-language synthesis, and translation with citation preservation.
The Problem
Consider researching “AI regulation approaches worldwide.” English sources provide coverage of US and UK policies, but:
- German sources offer detailed EU regulatory perspectives (GDPR, AI Act)
- Japanese sources cover Asia-Pacific regulatory frameworks
- Chinese sources explain domestic AI governance approaches
A single-language workflow misses 60-70% of the relevant discourse. Traditional solutions, such as manually translating queries and synthesizing the results, are time-consuming and don’t scale.
The Solution
The pattern adds four capabilities to standard LangGraph research workflows:
1. LLM-Based Language Selection
Instead of hardcoding target languages, use an LLM to analyze which languages would provide unique value:
```python
from pydantic import BaseModel, Field

ANALYZE_LANGUAGES_SYSTEM = """Analyze this research topic and recommend which
languages would provide UNIQUE, valuable insights not readily available in English.
GUIDELINES:
- Only recommend languages offering genuinely unique perspectives
- Do NOT recommend a language just because it's widely spoken
- Consider: academic journals, regional expertise, cultural perspectives
- Maximum 3-4 languages unless exceptionally global topic"""


class LanguageRecommendation(BaseModel):
    language_code: str = Field(description="ISO 639-1 language code")
    rationale: str = Field(description="Why this language adds unique value")


class LanguageAnalysisResult(BaseModel):
    # In Pydantic v2, max_length on a list field caps the number of items.
    recommendations: list[LanguageRecommendation] = Field(max_length=5)
```

This avoids spending research effort on languages that won’t add unique insights, while making it less likely that relevant language communities are overlooked.
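The model's recommendations are advisory, so it can help to normalize them before building per-language configs. The sketch below is a hypothetical post-processing step. The select_languages helper, the assumption that English is always researched as a baseline, and the default cap of four are illustrations, not part of the original workflow. It uses a plain dataclass in place of the Pydantic model so it runs without dependencies.

```python
import re
from dataclasses import dataclass


# Dependency-free stand-in for the LanguageRecommendation model above.
@dataclass
class Recommendation:
    language_code: str
    rationale: str


# Two lowercase letters: the shape of an ISO 639-1 code.
_ISO_639_1 = re.compile(r"[a-z]{2}")


def select_languages(
    recommendations: list[Recommendation],
    baseline: str = "en",
    max_languages: int = 4,
) -> list[str]:
    """Return the baseline language plus up to max_languages recommended ones.

    Hypothetical helper: assumes the baseline language is always researched,
    so it does not count against the cap.
    """
    selected: list[str] = [baseline]
    for rec in recommendations:
        if len(selected) - 1 >= max_languages:
            break  # cap reached
        code = rec.language_code.strip().lower()
        # Skip malformed codes, the baseline itself, and duplicates.
        if not _ISO_639_1.fullmatch(code) or code in selected:
            continue
        selected.append(code)
    return selected
```

For example, recommendations for "DE", "de", and "ja" collapse to ["en", "de", "ja"].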
2. Round-Robin Question Distribution
Distribute research questions across language-specific researchers using LangGraph’s Send() API:
```python
from langgraph.types import Send

# DeepResearchState and ResearcherState are the workflow's state TypedDicts,
# defined elsewhere. The str branch of the return type is not shown here.


def route_supervisor_action(state: DeepResearchState) -> str | list[Send]:
    """Route questions to language-specific researchers using round-robin."""
    pending = state.get("pending_questions", [])
    language_configs = state["language_configs"]
    active_languages = list(language_configs.keys())

    researchers = []
    for i, question in enumerate(pending):
        # Round-robin: question 0 -> lang 0, question 1 -> lang 1, etc.
        target_lang = active_languages[i % len(active_languages)]
        lang_config = language_configs[target_lang]
        researchers.append(
            Send("researcher", ResearcherState(
                question=question,
                language_config=lang_config,
                research_findings=[],
            ))
        )
    return researchers
```

Each researcher receives a LanguageConfig that influences:
- Search query translation
- Search API locale settings
- Compression prompt language
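You can try out the round-robin assignment and the per-language settings without LangGraph. In this sketch, the LanguageConfig fields (code, name, search_locale) and the build_search_request helper are hypothetical, because the original does not show the config's shape. The sketch also assumes query translation has already happened (in practice, an LLM call). Likewise, having the compression step write in the source language is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageConfig:
    # Hypothetical fields; the workflow's real LanguageConfig may differ.
    code: str           # ISO 639-1 code, e.g. "de"
    name: str           # human-readable name used in prompts
    search_locale: str  # locale passed to the search API, e.g. "de-DE"


def assign_round_robin(
    questions: list[str], configs: dict[str, LanguageConfig]
) -> list[tuple[str, LanguageConfig]]:
    """Pair question i with language i mod N, mirroring route_supervisor_action."""
    languages = list(configs)  # dicts preserve insertion order
    if not languages:
        raise ValueError("at least one language config is required")
    return [
        (question, configs[languages[i % len(languages)]])
        for i, question in enumerate(questions)
    ]


def build_search_request(translated_query: str, config: LanguageConfig) -> dict[str, str]:
    """Combine an already-translated query with locale and prompt settings."""
    return {
        "query": translated_query,
        "locale": config.search_locale,
        # Assumed instruction for the compression prompt.
        "compression_instruction": f"Write the compressed findings in {config.name}.",
    }
```

With configs for en, de, and ja, four questions are assigned en, de, ja, en. When the number of questions is not a multiple of the number of languages, the languages listed first each receive one extra question.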
3. Cross-Language Synthesis
After aggregating findings, synthesize unique insights across language streams:
```python
from collections import defaultdict

# ResearchFinding is the workflow's finding TypedDict, defined elsewhere.

SYNTHESIS_SYSTEM = """You are synthesizing research findings from multiple languages.
For each language stream, identify:
1. UNIQUE insights not found in other languages
2. Cultural or regional perspectives specific to that language community
3. Consensus across languages (confirms findings)
4. Contradictions requiring resolution
Use format: "According to [Language] sources, ..." when attributing."""


def group_findings_by_language(
    findings: list[ResearchFinding],
) -> dict[str, list[ResearchFinding]]:
    """Group research findings by their source language."""
    grouped: dict[str, list[ResearchFinding]] = defaultdict(list)
    for finding in findings:
        # Findings without a language tag are treated as English.
        lang_code = finding.get("language_code") or "en"
        grouped[lang_code].append(finding)
    return dict(grouped)
```

This is the key differentiator: identifying what each language community uniquely contributes, rather than just aggregating all findings.
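The grouped findings still have to reach the synthesis model with their language labels attached. Here is a minimal sketch. It assumes each finding is a dict with "language_code" and "content" keys; the "content" key and the LANGUAGE_NAMES table are assumptions. It repeats a dict-based copy of the grouping function so the block runs on its own.

```python
from collections import defaultdict

# Hypothetical display names; codes not listed fall back to the code itself.
LANGUAGE_NAMES = {"en": "English", "de": "German", "ja": "Japanese", "zh": "Chinese"}


def group_findings_by_language(findings: list[dict]) -> dict[str, list[dict]]:
    """Same grouping as above: missing language codes default to English."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for finding in findings:
        grouped[finding.get("language_code") or "en"].append(finding)
    return dict(grouped)


def format_synthesis_input(grouped: dict[str, list[dict]]) -> str:
    """Render one labeled section per language stream for the synthesis LLM."""
    sections = []
    for code in sorted(grouped):  # sorted for a deterministic prompt
        name = LANGUAGE_NAMES.get(code, code)
        lines = [f"## {name} sources ({code})"]
        lines += [f"- {finding['content']}" for finding in grouped[code]]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

Sorting by language code keeps the prompt identical across runs with the same findings, which makes synthesis outputs easier to compare.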
4. Translation with Citation Preservation
Translate final reports while preserving academic formatting:
```python
# Fill {target_language} with str.format() before sending the prompt.
TRANSLATION_SYSTEM = """Translate the following research report to {target_language}.
CRITICAL REQUIREMENTS:
- Maintain academic tone and precision
- Preserve all citation references exactly as written (e.g., [1], [@AuthorYear])
- Keep direct quotes in original language with translation in parentheses
- Keep proper nouns, technical terms, and acronyms as appropriate
- Maintain paragraph structure and heading hierarchy"""
```

These requirements are meant to keep citations traceable and let direct quotes retain their original nuance.
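Prompt instructions alone do not guarantee that citations survive translation. A cheap post-check can compare citation markers before and after. This is an added safeguard, not part of the original pattern. The regex covers only the two marker styles named in the prompt ([1] and [@AuthorYear]), and the citation_diff name is hypothetical.

```python
import re
from collections import Counter

# Numeric markers like [1] or [12], and Pandoc-style keys like [@Smith2020].
CITATION_PATTERN = re.compile(r"\[(?:\d+|@[\w:.-]+)\]")


def citation_diff(source: str, translation: str) -> dict[str, Counter]:
    """Return citation markers missing from, or added to, the translation.

    Counts are compared, so a marker cited twice in the source but once
    in the translation is reported as missing once.
    """
    before = Counter(CITATION_PATTERN.findall(source))
    after = Counter(CITATION_PATTERN.findall(translation))
    # Counter subtraction keeps only positive counts.
    return {"missing": before - after, "added": after - before}
```

A non-empty "missing" or "added" counter could trigger a retry of translate_report or flag the report for review. That policy is left as a design choice.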
Workflow Structure
```mermaid
flowchart TD
    Start([START])
    Clarify["clarify_intent"]
    Brief["create_brief"]
    Analyze["analyze_languages"]
    Search["search_memory"]
    Supervisor["supervisor"]
    Researchers["Researchers (round-robin by language)"]
    Aggregate["aggregate_findings"]
    Synthesize["synthesize_languages"]
    Report["final_report"]
    Translate["translate_report"]
    End([END])

    Start --> Clarify --> Brief
    Brief -->|"multi_lingual=True"| Analyze
    Brief -->|else| Search
    Analyze --> Search
    Search --> Supervisor
    Supervisor --> Researchers --> Aggregate --> Supervisor
    Aggregate -->|"if multi_lingual"| Synthesize --> Report
    Aggregate -->|else| Report
    Report -->|"if translate_to"| Translate --> End
    Report --> End
```
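The conditional edges in the diagram reduce to small routing functions. The sketch below assumes state keys named multi_lingual, translate_to, and pending_questions. The first two mirror the run_research arguments. Looping back to the supervisor while questions are pending is an assumption, because the diagram does not state that condition.

```python
from typing import Literal


def route_after_brief(state: dict) -> Literal["analyze_languages", "search_memory"]:
    """create_brief -> analyze_languages when multi-lingual, else search_memory."""
    return "analyze_languages" if state.get("multi_lingual") else "search_memory"


def route_after_aggregate(
    state: dict,
) -> Literal["supervisor", "synthesize_languages", "final_report"]:
    """Loop back for more research, then synthesize (multi-lingual) or report."""
    if state.get("pending_questions"):  # assumption: loop while questions remain
        return "supervisor"
    if state.get("multi_lingual"):
        return "synthesize_languages"
    return "final_report"


def route_after_report(state: dict) -> Literal["translate_report", "__end__"]:
    """Translate only when a target language was requested.

    "__end__" is the string value of LangGraph's END constant.
    """
    return "translate_report" if state.get("translate_to") else "__end__"
```

In LangGraph, functions like these would be registered with StateGraph.add_conditional_edges, for example builder.add_conditional_edges("create_brief", route_after_brief).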
Usage Examples
```python
# run_research is awaited here, so these calls belong inside async code.

# Auto-detect valuable languages
result = await run_research(
    topic="Climate change policies and public perception",
    multi_lingual=True,  # LLM will recommend languages
    translate_to="en",
)

# Explicit language selection
result = await run_research(
    topic="Supply chain disruptions in automotive industry",
    multi_lingual=True,
    target_languages=["de", "ja", "zh"],
    translate_to="en",
)

# Single-language research with translation
result = await run_research(
    topic="AI regulation in Japan",
    language="ja",
    translate_to="en",
    preserve_quotes=True,  # Keep Japanese quotes with translations
)
```

Trade-offs
Benefits:
- Access language-specific sources unavailable in English
- Cross-cultural synthesis identifies consensus and cultural differences
- Flexible modes: single-language, auto-select, or explicit language lists
- Citation preservation maintains academic integrity through translation
Costs:
- Increased latency from language analysis, synthesis, and translation
- Higher LLM costs (3-4x for multi-lingual vs single-language)
- Translation quality varies by domain specialization
When to Use This Pattern
Good fit:
- Research topics span multiple cultural or regional contexts
- Non-English sources provide unique expertise (academic journals, local news)
- Cross-cultural comparison is valuable
- Final output needs translation while preserving formatting
Poor fit:
- Topic is well-covered in English (no unique insights elsewhere)
- Speed is critical (adds 30-50% latency)
- Single target audience with known language preference