dify-docs/tools/translate-test-dify/example-model-comparison.md
Alter-xyz 4339f79a55 feat: add translation A/B testing framework (#564)
Add tools for comparing translation quality between different models
(e.g., Sonnet vs Opus) or prompt variations. Useful for evaluating
translation improvements before deploying changes.

- run_test.py: Test runner with Dify API streaming
- compare.py: Generate similarity reports between variants
- Example spec and documentation included

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-28 16:02:13 +08:00


Model Comparison Test

Background

Compare translation quality between different models (e.g., Sonnet vs Opus) to evaluate improvements in accuracy, style, and punctuation handling.
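As a rough illustration of what comparing two variants can look like, the sketch below scores how similar two translation outputs are and lists the lines where they diverge. It is not the repository's `compare.py`; the function names `similarity` and `differing_lines` are hypothetical, and the use of `difflib` from the Python standard library is an assumption made for the example.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a ratio in [0.0, 1.0]; 1.0 means the two outputs are identical."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def differing_lines(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Pair lines by position and keep only the pairs that differ.

    Assumes both variants preserve the source line structure; if one output
    has extra lines, the unmatched tail is ignored by zip().
    """
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    return [(a, b) for a, b in zip(lines_a, lines_b) if a != b]


if __name__ == "__main__":
    # Hypothetical outputs from variant A and variant B for the same source.
    variant_a = "Title\nInstall Docker first."
    variant_b = "Title\nInstall Docker first!"
    print(f"similarity: {similarity(variant_a, variant_b):.3f}")
    for a, b in differing_lines(variant_a, variant_b):
        print(f"A: {a}\nB: {b}")
```

A character-level ratio only shows how far the two outputs diverge. It says nothing about which one is the better translation, so accuracy, style, and punctuation still need a manual or model-based review.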

keys

app-*** Model A (e.g., Sonnet original)

app-*** Model B (e.g., Opus upgraded)

test_file

en/self-host/quick-start/docker-compose.mdx

Conclusion

(Record your findings here after testing. This section is to be filled in by the AI agent.)

| Variant | Config | Result |
|---------|--------|--------|
| A | Sonnet | |
| B | Opus | |

Recommendation: (Your recommendation based on test results)