Add tools for comparing translation quality between different models (e.g., Sonnet vs Opus) or prompt variations. Useful for evaluating translation improvements before deploying changes.

- run_test.py: Test runner with Dify API streaming
- compare.py: Generate similarity reports between variants
- Example spec and documentation included

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Model Comparison Test
Background
Compare translation quality between different models (e.g., Sonnet vs Opus) to evaluate improvements in accuracy, style, and punctuation handling.
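As a rough illustration of what comparing two variants could look like, the sketch below scores how similar two translated outputs are and counts punctuation marks in each. It is not the repository's compare.py: the function names `similarity` and `punctuation_profile` and the default set of punctuation marks are assumptions made for this example, and it relies only on Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0.0-1.0 character-level similarity ratio between two translations.

    Hypothetical helper for illustration; not taken from compare.py.
    """
    # autojunk=False disables difflib's popular-character heuristic,
    # which can skew ratios on long documents.
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()


def punctuation_profile(text: str, marks: str = "，。、；：！？,.;:!?") -> dict[str, int]:
    """Count occurrences of each punctuation mark of interest (assumed set)."""
    return {mark: text.count(mark) for mark in marks if mark in text}


if __name__ == "__main__":
    # Placeholder strings standing in for the outputs of Variant A and Variant B.
    output_a = "使用 Docker Compose 部署 Dify。"
    output_b = "通过 Docker Compose 部署 Dify."
    print(f"similarity: {similarity(output_a, output_b):.2f}")
    print("A punctuation:", punctuation_profile(output_a))
    print("B punctuation:", punctuation_profile(output_b))
```

A high ratio only says the two outputs are textually close. It does not say which one is the better translation, so accuracy and style still need a manual review.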
keys
app-*** Model A (e.g., Sonnet original)
app-*** Model B (e.g., Opus upgraded)
test_file
en/self-host/quick-start/docker-compose.mdx
Conclusion
(Record your findings here after testing; to be filled in by the AI agent.)
| Variant | Config | Result |
|---|---|---|
| A | Sonnet | |
| B | Opus | |
Recommendation: (Your recommendation based on test results)