# Translation Testing Framework A/B testing for Dify translation workflows. Primary user: **Claude Code**. ## Important - **DO NOT commit test results** - `results/` is gitignored - **DO NOT commit real API keys** - always redact with `app-***` before committing - **DO NOT commit mock_docs/** - temporary files copied for testing ## Quick Start ```bash # Setup (first time) ./setup.sh source venv/bin/activate # Run test python run_test.py # Compare results python compare.py results// ``` ## Test Spec Format ```markdown # Test Title ## keys app-xxx Description A app-yyy Description B ## test_content (Inline content - Claude Code generates this for each test) # OR reference existing file: ## test_file en/guides/workflow/some-doc.md ``` ## Workflow 1. User describes test scenario 2. Claude Code creates spec with `## test_content` tailored to the issue 3. Run: `source venv/bin/activate && python run_test.py spec.md` 4. Analyze: `python compare.py results//` 5. **Redact API keys** with `app-***` before committing ## Example: Punctuation Test ```markdown # Punctuation Test ## keys app-*** Sonnet app-*** Opus ## test_content --- title: Test Doc --- # Test Sentence with commas, colons: semicolons; and more. - Item one, comma - Item two; semicolon ``` See `example-model-comparison.md` for a complete example. ## Files | File | Purpose | |------|---------| | run_test.py | Test runner | | compare.py | Generate comparison reports | | example-model-comparison.md | Example test spec | | results/ | Output (gitignored) | | mock_docs/ | Temp test files (gitignored) | ## Language Policy All code and documentation in **English** (international project).