feat: add translation A/B testing framework (#564)

Add tools for comparing translation quality between different models
(e.g., Sonnet vs Opus) or prompt variations. Useful for evaluating
translation improvements before deploying changes.

- run_test.py: Test runner with Dify API streaming
- compare.py: Generate similarity reports between variants
- Example spec and documentation included

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Author: Alter-xyz
Date: 2025-11-28 00:02:13 -08:00
Committer: GitHub
Parent: 1b2f5edc6a
Commit: 4339f79a55
8 files changed, 677 insertions(+), 0 deletions(-)

@@ -128,10 +128,28 @@ SUCCESS: Moved cn/test-file to new location
SUCCESS: Moved jp/test-file to new location
```
## Translation A/B Testing
For comparing translation quality between models or prompt variations:
```bash
cd tools/translate-test-dify
./setup.sh
source venv/bin/activate
python run_test.py <spec.md>
python compare.py results/<folder>/
```
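Under the hood, `run_test.py` talks to Dify's `chat-messages` endpoint in streaming mode. As a rough sketch of that call: the endpoint and payload shape follow Dify's public API, but the environment variable, helper name, and exact fields here are assumptions, not the file's actual contents:

```python
import json
import os

import requests

DIFY_URL = "https://api.dify.ai/v1/chat-messages"  # self-hosted instances use their own base URL
API_KEY = os.environ["DIFY_API_KEY"]  # hypothetical env var; an app-*** key, never hard-coded

def stream_translation(text: str, user: str = "ab-test") -> str:
    """Send one translation request in streaming mode and return the full answer."""
    resp = requests.post(
        DIFY_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "inputs": {},            # app-specific variables, if the Dify app defines any
            "query": text,           # source text to translate
            "response_mode": "streaming",
            "user": user,
        },
        stream=True,
        timeout=300,
    )
    resp.raise_for_status()
    chunks = []
    for line in resp.iter_lines(decode_unicode=True):
        # Dify streams server-sent events; payload lines look like "data: {...}"
        if not line or not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("event") == "message":
            chunks.append(event.get("answer", ""))
    return "".join(chunks)
```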
**Important**:
- Never commit `results/`, `mock_docs/`, or real API keys
- Always redact keys with `app-***` before committing
- See `tools/translate-test-dify/README.md` for details
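The diff shown here doesn't include how `compare.py` scores variants; a plausible baseline is a pairwise `difflib` ratio over same-named outputs. A minimal sketch, assuming the hypothetical layout that each variant writes its outputs to its own subfolder under `results/<folder>/`:

```python
from difflib import SequenceMatcher
from pathlib import Path

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def report(results_dir: str) -> None:
    """Compare same-named outputs from the first two variant subfolders."""
    root = Path(results_dir)
    variants = sorted(p for p in root.iterdir() if p.is_dir())
    variant_a, variant_b = variants[0], variants[1]  # assumes at least two variants
    for file_a in sorted(variant_a.glob("*.md")):
        file_b = variant_b / file_a.name
        if not file_b.exists():
            continue  # variant B didn't produce this file
        score = similarity(file_a.read_text(), file_b.read_text())
        print(f"{file_a.name}: {score:.3f}")
```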
## Key Paths
- `docs.json` - Navigation structure
- `tools/translate/config.json` - Language configuration (single source of truth)
- `tools/translate/termbase_i18n.md` - Translation terminology database
- `tools/translate/sync_and_translate.py` - Core translation + surgical reconciliation logic
- `tools/translate-test-dify/` - Translation A/B testing framework
- `.github/workflows/sync_docs_*.yml` - Auto-translation workflow triggers