Adding MCP Evals with Opik #44
base: main
also change parquetpath to path and add .parquet suffix to the path in config
KaliszS left a comment:
We need instructions for running these tests locally, without Actions. To achieve that, we need to update .env.example and the README.
dev = [
    "fastapi>=0.116.2",
    "opik>=1.8.56",
    "pydantic-ai>=1.0.10",
pydantic-ai is needed for the agent used in the Opik tests
generated from (old) template
generated from gist
This pull request adds evaluations and tests for this server using Opik. The tests run on a prepared dataset and include:
- query_tests.py
- e2e_tests.py
- opik/tool_calls.py: evaluating answers to a set of questions in Opik and judging them on metrics such as hallucination or answer relevancy

In addition, the unit tests and Opik tests are added to GitHub Actions; however, an Opik API key and workspace name need to be set in the repository secrets.
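Since the Opik tests depend on an API key and a workspace name, a small pre-flight check can make a missing secret fail fast, both in Actions and in a local run. The following is only a sketch: the variable names `OPIK_API_KEY` and `OPIK_WORKSPACE` are assumed here and should be matched to whatever the workflow and .env.example actually use, and `missing_opik_settings` is a hypothetical helper, not part of this PR.

```python
import os
from typing import Mapping

# Assumed variable names; align them with the workflow secrets and .env.example.
REQUIRED_VARS = ("OPIK_API_KEY", "OPIK_WORKSPACE")


def missing_opik_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

A test session could call `missing_opik_settings()` once at start-up and skip or abort the Opik tests with a readable message when the returned list is not empty.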
Opik experiment result example:

Actions results:


It is also possible to show the results of individual tests, instead of averages, in the pipeline (for Opik).
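To illustrate the difference between the two reporting modes, here is a minimal sketch of a formatter that prints either per-test scores or per-metric averages. It does not use the Opik SDK; the row shape (`test`, `metric`, `score` keys) and both function names are assumptions made for this example, not the structure this PR's pipeline actually produces.

```python
from collections import defaultdict
from statistics import mean


def average_by_metric(rows):
    """Group scores by metric name and return the mean score per metric."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["metric"]].append(row["score"])
    return {metric: mean(scores) for metric, scores in grouped.items()}


def format_report(rows, per_test=False):
    """Render one line per test when per_test is True, else one line per metric."""
    if per_test:
        lines = [f"{r['test']} | {r['metric']} | {r['score']:.2f}" for r in rows]
    else:
        averages = average_by_metric(rows)
        lines = [f"{m} | average | {averages[m]:.2f}" for m in sorted(averages)]
    return "\n".join(lines)
```

With a flag like `per_test`, the pipeline step could keep the compact averaged summary by default and switch to the detailed view when a regression needs to be traced to a specific question.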