mirror of
https://github.com/open-webui/docs.git
synced 2026-01-04 10:46:26 +07:00
chore: format
@@ -3,7 +3,6 @@ sidebar_position: 6
title: "📝 Evaluation"
---
## Why Should I Evaluate Models?
Meet **Alex**, a machine learning engineer at a mid-sized company. Alex knows there are numerous AI models out there—GPTs, LLaMA, and many more—but which one works best for the job at hand? They all sound impressive on paper, but Alex can’t just rely on public leaderboards. These models perform differently depending on the context, and some models may have been trained on the evaluation dataset (sneaky!). Plus, the way these models write can sometimes feel … off.
@@ -15,7 +14,7 @@ That's where Open WebUI comes in. It gives Alex and their team an easy way to ev
- **Why evaluations matter**: There are many models, but not all of them fit your specific needs, and general public leaderboards can't always be trusted.
- **How to solve it**: Open WebUI offers a built-in evaluation system. Use a thumbs up/down to rate model responses.
- **What happens behind the scenes**: Ratings adjust your personalized leaderboard, and snapshots from rated chats will be used for future model fine-tuning!
- **Evaluation options**:
  - **Arena Model**: Randomly selects models for you to compare.
  - **Normal Interaction**: Just chat like usual and rate the responses.
@@ -42,7 +41,7 @@ One cool feature? **Whenever you rate a response**, the system captures a **snap
### Two Ways to Evaluate an AI Model
Open WebUI provides two straightforward approaches for evaluating AI models.
### **1. Arena Model**
@@ -51,7 +50,7 @@ The **Arena Model** randomly selects from a pool of available models, making sur
How to use it:

- Select a model from the Arena Model selector.
- Use it like you normally would, but now you’re in “arena mode.”

For your feedback to affect the leaderboard, you need what’s called a **sibling message**. A sibling message is any alternative response generated by the same query (think of message regenerations or having multiple models generating responses side-by-side). This way, you’re comparing responses **head-to-head**.
- **Scoring tip**: When you give one response a thumbs up, the other automatically gets a thumbs down. So be mindful and only upvote the response you believe is genuinely the best!
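
To make the head-to-head idea concrete, here is a minimal Python sketch of how a single thumbs up among sibling responses could turn into leaderboard updates. It assumes an Elo-style update with a step size of 32 and a starting rating of 1000; the names `Response`, `thumbs_up`, and `update_elo` are hypothetical, and none of this is taken from Open WebUI's actual implementation.

```python
from dataclasses import dataclass

K = 32                # assumed step size, not taken from Open WebUI
START_RATING = 1000.0  # assumed rating for a model with no history


@dataclass
class Response:
    model: str       # model that produced this sibling response
    rating: int = 0  # +1 thumbs up, -1 thumbs down, 0 unrated


def thumbs_up(siblings, chosen):
    """Upvote one sibling; every other sibling gets a thumbs down.

    Returns (winner_model, loser_model) pairs, one per head-to-head comparison.
    """
    pairs = []
    for i, resp in enumerate(siblings):
        if i == chosen:
            resp.rating = 1
        else:
            resp.rating = -1
            pairs.append((siblings[chosen].model, resp.model))
    return pairs


def expected_score(r_a, r_b):
    """Probability that a model rated r_a beats a model rated r_b (Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update_elo(ratings, winner, loser):
    """Apply one Elo-style update in place for a single comparison."""
    r_w = ratings.setdefault(winner, START_RATING)
    r_l = ratings.setdefault(loser, START_RATING)
    gain = K * (1 - expected_score(r_w, r_l))
    ratings[winner] = r_w + gain
    ratings[loser] = r_l - gain


# Example: two side-by-side responses, the first one gets the thumbs up.
siblings = [Response("model-a"), Response("model-b")]
ratings = {}
for winner, loser in thumbs_up(siblings, chosen=0):
    update_elo(ratings, winner, loser)
```

Under these assumed constants, a single win between two previously unrated models moves the winner to 1016 and the loser to 984.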
@@ -67,7 +66,7 @@ Need more depth? You can even replicate a [**Chatbot Arena**](https://lmarena.ai
### **2. Normal Interaction**
No need to switch to “arena mode” if you don't want to. You can use Open WebUI normally and rate the AI model responses as you would in everyday operations. Just give a response a thumbs up or down whenever you feel like it. However, **if you want your feedback to be used for ranking on the leaderboard**, you'll need to **swap out the model and interact with a different one**. This ensures there's a **sibling response** to compare it with, because only comparisons between two different models influence rankings.
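
The rule that only comparisons between two different models influence rankings can be sketched as a filter over rated messages. This is an illustrative sketch, not Open WebUI's code: `RatedMessage`, `query_id`, and `leaderboard_pairs` are hypothetical names, and grouping siblings by a query id is an assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RatedMessage:
    query_id: str  # hypothetical id of the prompt this response answers
    model: str     # model that produced the response
    rating: int    # +1 thumbs up, -1 thumbs down


def leaderboard_pairs(messages):
    """Return (winner_model, loser_model) pairs that would count for ranking."""
    # Group responses by the query they answer, so siblings end up together.
    by_query = {}
    for msg in messages:
        by_query.setdefault(msg.query_id, []).append(msg)

    pairs = []
    for siblings in by_query.values():
        for a, b in combinations(siblings, 2):
            if a.model == b.model:
                continue  # same model on both sides: no effect on ranking
            if a.rating == b.rating:
                continue  # no clear winner between these two
            winner, loser = (a, b) if a.rating > b.rating else (b, a)
            pairs.append((winner.model, loser.model))
    return pairs


# Example: only the second query compares two different models.
messages = [
    RatedMessage("q1", "model-a", 1),   # regeneration pair, same model
    RatedMessage("q1", "model-a", -1),
    RatedMessage("q2", "model-a", 1),   # swapped model, counts for ranking
    RatedMessage("q2", "model-b", -1),
    RatedMessage("q3", "model-b", 1),   # rated, but no sibling to compare
]
pairs = leaderboard_pairs(messages)
```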
For instance, this is how you can rate during a normal interaction:
@@ -95,7 +94,7 @@ When you rate chats, you can **tag them by topic** for more granular insights. T
Open WebUI tries to **automatically tag chats** based on the conversation topic. However, depending on the model you're using, the automatic tagging feature might **sometimes fail** or misinterpret the conversation. When this happens, it’s best practice to **manually tag your chats** to ensure the feedback is accurate.
- **How to manually tag**: When you rate a response, you'll have the option to add your own tags based on the conversation's context.

Don't skip this! Tagging is super powerful because it allows you to **re-rank models based on specific topics**. For instance, you might want to see which model performs best for answering technical support questions versus general customer inquiries.
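
As a rough illustration of topic-based re-ranking, the sketch below keeps only comparisons that carry a given tag and ranks models by win rate on that subset. The tuple layout, the function name `rerank_by_tag`, and the use of a plain win rate instead of a rating are all assumptions made for the example, not a description of Open WebUI's implementation.

```python
from collections import defaultdict


def rerank_by_tag(comparisons, tag):
    """Rank models by win rate, using only comparisons that carry `tag`.

    `comparisons` is an iterable of (winner_model, loser_model, tags) tuples.
    Returns a list of (model, win_rate) sorted from best to worst.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser, tags in comparisons:
        if tag not in tags:
            continue  # comparison belongs to another topic
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rates = {model: wins[model] / games[model] for model in games}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Example data with hypothetical models and tags.
comparisons = [
    ("model-a", "model-b", {"technical support"}),
    ("model-a", "model-b", {"technical support"}),
    ("model-b", "model-a", {"customer inquiries"}),
]
support_ranking = rerank_by_tag(comparisons, "technical support")
inquiry_ranking = rerank_by_tag(comparisons, "customer inquiries")
```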
Here’s an example of how re-ranking looks: