diff --git a/docs/cli/index.rst b/docs/cli/index.rst deleted file mode 100644 index 4fc4440881..0000000000 --- a/docs/cli/index.rst +++ /dev/null @@ -1,12 +0,0 @@ -Command-line interface -====================== - -.. click:: giskard.cli:cli - :prog: giskard - :nested: full - -.. toctree:: - :maxdepth: 1 - :hidden: - - ngrok/index \ No newline at end of file diff --git a/docs/cli/ngrok/index.rst b/docs/cli/ngrok/index.rst deleted file mode 100644 index b17dc7c887..0000000000 --- a/docs/cli/ngrok/index.rst +++ /dev/null @@ -1,43 +0,0 @@ -Setup a :code:`ngrok` account -============================= - -In order to expose the Giskard Hub to the internet, you would need to perform the following steps - -1. Sign up `here `__ -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -You will be prompted by the following: - -.. image:: ../../assets/ngrok_aut.png - :width: 400 - -You would need to have either :code:`Google Authenticator` or :code:`1Password` on your phone to generate codes. - -2. Generate an API key `here `__ -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Copy the following key: - -.. image:: ../../assets/ngrok_aut2.png - :width: 400 - - -3. Expose the giskard hub -^^^^^^^^^^^^^^^^^^^^^^^^^ -Now you can run :code:`giskard hub expose --ngrok-token ` which should prompt you with the following instructions::: - - Exposing Giskard Hub to the internet... - Giskard Hub is now exposed to the internet. - You can now upload objects to the Giskard Hub using the following client: - - token=... - client = giskard.GiskardClient("", token) - - # To run your model with the current Python environment on Google Colab, execute these lines: - - %env GSK_API_KEY=... - !giskard worker start -d -u --name - - # To let Giskard Hub run your model in a managed Python environment, execute these lines: - - %env GSK_API_KEY=... 
- !giskard worker start -s -u --name - diff --git a/docs/getting_started/index.md b/docs/getting_started/index.md index e1d82c4601..85d0ad9783 100644 --- a/docs/getting_started/index.md +++ b/docs/getting_started/index.md @@ -4,17 +4,17 @@ Giskard is a **holistic Testing platform for AI models** to control all 3 types It addresses the following challenges in AI testing: -* Edge cases in AI are **domain-specific** and often seemingly **infinite**. -* The AI development process is an experimental, **trial-and-error** process where quality KPIs are multi-dimensional. -* Generative AI introduces new **security vulnerabilities** which requires constant vigilance and adversarial red-teaming. -* AI compliance with new regulations necessitate that data scientists write **extensive documentation**. +- Edge cases in AI are **domain-specific** and often seemingly **infinite**. +- The AI development process is an experimental, **trial-and-error** process where quality KPIs are multi-dimensional. +- Generative AI introduces new **security vulnerabilities** which requires constant vigilance and adversarial red-teaming. +- AI compliance with new regulations necessitate that data scientists write **extensive documentation**. Giskard provides a platform for testing all AI models, from tabular ML to LLMs. This enables AI teams to: + 1. **Reduce AI risks** by enhancing the test coverage on quality & security dimensions. 2. **Save time** by automating testing, evaluation and debugging processes. 3. **Automate compliance** with the EU AI Act and upcoming AI regulations & standards. - ## Giskard Library (open-source) An **open-source** library to scan your AI models for vulnerabilities and generate test suites automatically to aid in the Quality & Security evaluation process of ML models and LLMs. 
@@ -26,39 +26,16 @@ To help you solve these challenges, Giskard library helps to: - **Scan your model to find hidden vulnerabilities automatically**: The `giskard` scan automatically detects vulnerabilities such as performance bias, hallucination, prompt injection, data leakage, spurious correlation, overconfidence, etc. -

+

- - **Instantaneously generate domain-specific tests**: `giskard` automatically generates relevant, customizable tests based on the -vulnerabilities detected in the scan. + vulnerabilities detected in the scan.

- - **Integrate and automate** testing of AI models in **CI/CD** pipelines by leveraging native `giskard` integrations.

- Get started **now** with our [quickstart notebooks](../getting_started/quickstart/index.md)! ⚡️ - -## Giskard Hub - -An Enterprise Hub for teams to collaborate on top of the open-source Giskard library, with interactive testing dashboards, debugging interfaces with explainability & human feedback, and secure access controls for compliance audits. - -- 🔍 **Debug** your issues by inspecting the failing examples of your tests (⬇️ see below the DEBUG button) -

- ![](../assets/test_suite_tabular.png) - -- 📖 Leverage the Quality Assurance best practices of the most advanced ML teams with a centralized **catalog** of tests -

- ![](../assets/catalog.png) - -- 💡 Create hundreds of domain-specific tests thanks to **automated model insights** (⬇️ see below the bulbs 💡). -

- ![](../assets/push.png) - -- 💬 **Collect business feedback** and **share your model results** with data scientists, QA teams and auditors. -

- ![](../assets/credit_scoring_comment.png) diff --git a/docs/getting_started/quickstart/quickstart_llm.ipynb b/docs/getting_started/quickstart/quickstart_llm.ipynb index a092edfd7f..1668626b39 100644 --- a/docs/getting_started/quickstart/quickstart_llm.ipynb +++ b/docs/getting_started/quickstart/quickstart_llm.ipynb @@ -2928,168 +2928,6 @@ "test_suite = full_report.generate_test_suite(name=\"Test suite generated by scan\")\n", "test_suite.run()" ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false, - "id": "xrCXjxV3DlLA" - }, - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that covers a first layer of potential vulnerabilities for your LLM. From here, we encourage you to boost the coverage rate of your tests to anticipate as many failures as possible for your model. The base layer provided by the scan needs to be fine-tuned and augmented by human review, which is a great reason to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just fine-tuning tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models and prompts to decide which model or prompt to promote\n", - "* Test out input prompts and evaluation criteria that make your model fail\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false, - "id": "QiV8EoApDlLA" - }, - "source": [ - "Here's a sneak peek of the fine-tuning interface proposed by the Giskard Hub:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "UEd30Uj_Xi3_" - }, - "source": [ - "![](../../_static/test_suite_example.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Adding persistence to our Giskard Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To work with the Giskard Hub we need to be able to save and load the model, so that we can upload it and store. The `giskard.Model` class handles this automatically in most cases, but for more complex models you might need to do a bit of refactoring to get it working smoothly. This is especially the case if your model includes a custom index or other custom objects that are not easily serializable.\n", - "\n", - "In our case, we are using the FAISS index to retrieve the documents, and we need to tell Giskard how to save and load it. Luckily, Giskard provides a simple but powerful way to customize the model wrapper by extending the `giskard.Model` class. 
To make sure that we can save to disk out model, we will need to implement `save_model` and `load_model` method to save and load both the RetrievalQA and the FAISS index:" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "from pathlib import Path\n", - "from langchain.chains import load_chain\n", - "\n", - "\n", - "class FAISSRAGModel(giskard.Model):\n", - " def model_predict(self, df: pd.DataFrame):\n", - " # Same as our model_predict function above, but now using self.model,\n", - " # which we pass upon initialization.\n", - " return [self.model.run({\"query\": question}) for question in df[\"question\"]]\n", - "\n", - " def save_model(self, path: str, *args, **kwargs):\n", - " \"\"\"Saves the model object to the given directory.\"\"\"\n", - " out_dest = Path(path)\n", - "\n", - " # Save the langchain RetrievalQA object\n", - " self.model.save(out_dest.joinpath(\"model.json\"))\n", - "\n", - " # Save the FAISS-based retriever\n", - " db = self.model.retriever.vectorstore\n", - " db.save_local(out_dest.joinpath(\"faiss\"))\n", - "\n", - " @classmethod\n", - " def load_model(cls, path: str, *args, **kwargs):\n", - " \"\"\"Loads the model object from the given directory.\"\"\"\n", - " src = Path(path)\n", - "\n", - " # Load the FAISS-based retriever\n", - " db = FAISS.load_local(src.joinpath(\"faiss\"), OpenAIEmbeddings())\n", - "\n", - " # Load the chain, passing the retriever\n", - " chain = load_chain(src.joinpath(\"model.json\"), retriever=db.as_retriever())\n", - " return chain" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can wrap our model function as above, but using our custom model class:" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "giskard_model = FAISSRAGModel(\n", - " climate_qa_chain,\n", - " model_type=\"text_generation\",\n", - " name=\"Climate Change Question Answering\",\n", - " 
description=\"This model answers any question about climate change based on IPCC reports\",\n", - " feature_names=[\"question\"],\n", - ")\n", - "\n", - "# Let’s set this as our test suite model\n", - "test_suite.default_params[\"model\"] = giskard_model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false, - "id": "C3EtWiq0DlLC" - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model & tests to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "l71Vs1Q1DlLC", - "outputId": "8a5708fb-7736-4e61-ed10-32c9c069e70f" - }, - "outputs": [], - "source": [ - "from giskard import GiskardClient\n", - "\n", - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" # This can be found in the Settings tab of the Giskard Hub\n", - "hf_token = \"\" # If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "my_project = client.create_project(\"my_project\", \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, \"my_project\")" - ] } ], "metadata": { diff --git a/docs/getting_started/quickstart/quickstart_nlp.ipynb b/docs/getting_started/quickstart/quickstart_nlp.ipynb index 19eda182d9..09a8d870b7 100644 --- a/docs/getting_started/quickstart/quickstart_nlp.ipynb +++ b/docs/getting_started/quickstart/quickstart_nlp.ipynb @@ -23,11 +23,7 @@ "Outline:\n", "\n", "* Detect 
vulnerabilities automatically with Giskard's scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to:\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -83,7 +79,7 @@ "from scipy.special import softmax\n", "from transformers import AutoModelForSequenceClassification, AutoTokenizer\n", "\n", - "from giskard import Dataset, Model, scan, testing, GiskardClient, Suite" + "from giskard import Dataset, Model, scan, testing" ] }, { @@ -327,11 +323,11 @@ }, { "cell_type": "markdown", + "id": "9dd5baaaa6a7ee62", "metadata": {}, "source": [ "If you are running in a notebook, you can display the scan report directly in the notebook using `display(...)`, otherwise you can export the report to an HTML file. Check the [API Reference](https://docs.giskard.ai/en/stable/reference/scan/report.html#giskard.scanner.report.ScanReport) for more details on the export methods available on the `ScanReport` class." - ], - "id": "9dd5baaaa6a7ee62" + ] }, { "cell_type": "code", @@ -1851,126 +1847,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "id": "3f43ed2aac0d94a4", - "metadata": { - "collapsed": false, - "id": "3f43ed2aac0d94a4" - }, - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. 
Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ] - }, - { - "cell_type": "markdown", - "id": "rm5e4aDrW_lm", - "metadata": { - "id": "rm5e4aDrW_lm" - }, - "source": [ - "Here's a sneak peak of automated model insights on a credit scoring classification model." - ] - }, - { - "cell_type": "markdown", - "id": "iZ8HF7pmWzeO", - "metadata": { - "id": "iZ8HF7pmWzeO" - }, - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ] - }, - { - "cell_type": "markdown", - "id": "iwBobi_wW25Y", - "metadata": { - "id": "iwBobi_wW25Y" - }, - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ] - }, - { - "cell_type": "markdown", - "id": "ivXicvcMVdOM", - "metadata": { - "id": "ivXicvcMVdOM" - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "829536054bde5075", - "metadata": { - "id": "829536054bde5075" - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" # This can be found in the Settings tab of the Giskard hub\n", - "# hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "my_project = client.create_project(\"my_project\", \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, \"my_project\")" - ] - }, - { - "cell_type": "markdown", - "id": "6QCJMgauXPlc", - "metadata": { - "id": "6QCJMgauXPlc" - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to:\n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "uNUQSpb5XUsE", - "metadata": { - "id": "uNUQSpb5XUsE" - }, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, \"my_project\", 1753)\n", - "test_suite_downloaded.run()" - ] } ], "metadata": { diff --git a/docs/getting_started/quickstart/quickstart_tabular.ipynb b/docs/getting_started/quickstart/quickstart_tabular.ipynb index b22589b318..865914486d 100644 --- a/docs/getting_started/quickstart/quickstart_tabular.ipynb +++ b/docs/getting_started/quickstart/quickstart_tabular.ipynb @@ -20,11 +20,7 @@ "Outline:\n", "\n", "* Detect vulnerabilities automatically with Giskard's scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to:\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -38,7 +34,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "774e195a6fdbaa27", "metadata": { "ExecuteTime": { @@ -64,7 +60,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "id": "7e53bd53b81a6b37", "metadata": { "collapsed": false @@ -74,7 +70,7 @@ "import numpy as np\n", "import pandas as pd\n", "\n", - "from giskard import Model, Dataset, scan, testing, GiskardClient, demo, Suite" + "from giskard import Model, Dataset, scan, testing, demo" ] }, { @@ -89,7 +85,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "id": 
"54a2d07ad1ee745a", "metadata": { "ExecuteTime": { @@ -128,7 +124,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 3, "id": "1c439243e2552799", "metadata": { "ExecuteTime": { @@ -156,7 +152,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "a3fea25c991fe05c", "metadata": { "ExecuteTime": { @@ -198,7 +194,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "id": "c612f436dacff7c2", "metadata": { "collapsed": false @@ -222,7 +218,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "b09b62fba752975a", "metadata": { "ExecuteTime": { @@ -295,15 +291,15 @@ }, { "cell_type": "markdown", + "id": "28272f36e73f8a76", "metadata": {}, "source": [ "If you are running in a notebook, you can display the scan report directly in the notebook using `display(...)`, otherwise you can export the report to an HTML file. Check the [API Reference](https://docs.giskard.ai/en/stable/reference/scan/report.html#giskard.scanner.report.ScanReport) for more details on the export methods available on the `ScanReport` class." - ], - "id": "28272f36e73f8a76" + ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "id": "ecb49fa5", "metadata": { "ExecuteTime": { @@ -315,7 +311,7 @@ { "data": { "text/html": [ - "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -522,51 +2024,331 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." 
- ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": 14, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:25:12.590245Z", "start_time": "2023-11-08T22:24:56.084975Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Invariance to “Switch Gender”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + "2024-05-29 13:27:10,197 pid:61763 MainThread giskard.datasets.base INFO Casting dataframe columns from {'text': 'object'} to {'text': 'object'}\n", + "2024-05-29 13:27:10,199 pid:61763 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (375, 2) executed in 0:00:00.007260\n", + "2024-05-29 13:27:10,212 pid:61763 MainThread giskard.datasets.base INFO Casting dataframe columns from {'text': 'object'} to {'text': 'object'}\n", + "2024-05-29 13:27:10,214 pid:61763 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (375, 2) executed in 0:00:00.005033\n", + "2024-05-29 13:27:10,219 pid:61763 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:00.033015\n", + "2024-05-29 13:27:10,220 pid:61763 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.000322\n", + "Executed 'Invariance to “Switch Gender”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", " Test failed\n", " Metric: 0.95\n", - " - [TestMessageLevel.INFO] 20 rows were perturbed\n", + " - [INFO] 20 rows were perturbed\n", " \n", - "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + "2024-05-29 13:27:10,228 pid:61763 MainThread giskard.datasets.base INFO Casting dataframe columns from {'text': 'object'} to {'text': 
'object'}\n", + "2024-05-29 13:27:10,229 pid:61763 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (375, 2) executed in 0:00:00.004654\n", + "2024-05-29 13:27:10,248 pid:61763 MainThread giskard.datasets.base INFO Casting dataframe columns from {'text': 'object'} to {'text': 'object'}\n", + "2024-05-29 13:27:13,896 pid:61763 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (375, 2) executed in 0:00:03.650915\n", + "2024-05-29 13:27:13,899 pid:61763 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:03.676607\n", + "2024-05-29 13:27:13,900 pid:61763 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.000276\n", + "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", " Test failed\n", - " Metric: 0.87\n", - " - [TestMessageLevel.INFO] 352 rows were perturbed\n", + " Metric: 0.85\n", + " - [INFO] 355 rows were perturbed\n", " \n", - "Executed 'Overconfidence on data slice “`avg_whitespace(text)` < 0.149”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7140350877192984, 'p_threshold': 0.43497172683775553}: \n", + "2024-05-29 13:27:13,909 pid:61763 MainThread giskard.datasets.base INFO Casting dataframe columns from {'text': 'object'} to {'text': 'object'}\n", + "2024-05-29 13:27:13,910 pid:61763 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (87, 2) executed in 0:00:00.006215\n", + "Executed 'Overconfidence on data slice “`text_length(text)` < 65.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6886956521739132, 'p_threshold': 0.43497172683775553}: \n", " Test failed\n", - " Metric: 0.76\n", + " Metric: 0.74\n", " \n", - " \n" + " \n", + "2024-05-29 13:27:13,913 pid:61763 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + 
"2024-05-29 13:27:13,913 pid:61763 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 13:27:13,914 pid:61763 MainThread giskard.core.suite INFO Invariance to “Switch Gender” ({'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.95}\n", + "2024-05-29 13:27:13,914 pid:61763 MainThread giskard.core.suite INFO Invariance to “Add typos” ({'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.8450704225352113}\n", + "2024-05-29 13:27:13,914 pid:61763 MainThread giskard.core.suite INFO Overconfidence on data slice “`text_length(text)` < 65.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6886956521739132, 'p_threshold': 0.43497172683775553}): {failed, metric=0.7419354838709677}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n
 Test Invariance to “Switch Gender”\n Measured Metric = 0.95\n Failed\n model: e22c8893-a3d6-4ecd-8341-7cab2b8c311d\n dataset: Tweets sentiment dataset\n transformation_function: Switch Gender\n threshold: 0.95\n output_sensitivity: 0.05\n
 Test Invariance to “Add typos”\n Measured Metric = 0.87216\n Failed\n model: e22c8893-a3d6-4ecd-8341-7cab2b8c311d\n dataset: Tweets sentiment dataset\n transformation_function: Add typos\n threshold: 0.95\n output_sensitivity: 0.05\n
 Test Overconfidence on data slice “`avg_whitespace(text)` < 0.149”\n Measured Metric = 0.7561\n Failed\n model: e22c8893-a3d6-4ecd-8341-7cab2b8c311d\n dataset: Tweets sentiment dataset\n slicing_function: `avg_whitespace(text)` < 0.149\n threshold: 0.7140350877192984\n p_threshold: 0.43497172683775553\n
\n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + " Test suite failed.\n", + "
 Test Invariance to “Switch Gender”\n", + " Measured Metric = 0.95\n", + " Failed\n", + " model: Twitter sentiment classifier\n", + " dataset: Tweets sentiment dataset\n", + " transformation_function: Switch Gender\n", + " threshold: 0.95\n", + " output_sensitivity: 0.05\n", + "
 Test Invariance to “Add typos”\n", + " Measured Metric = 0.84507\n", + " Failed\n", + " model: Twitter sentiment classifier\n", + " dataset: Tweets sentiment dataset\n", + " transformation_function: Add typos\n", + " threshold: 0.95\n", + " output_sensitivity: 0.05\n", + "
 Test Overconfidence on data slice “`text_length(text)` < 65.500”\n", + " Measured Metric = 0.74194\n", + " Failed\n", + " model: Twitter sentiment classifier\n", + " dataset: Tweets sentiment dataset\n", + " slicing_function: `text_length(text)` < 65.500\n", + " threshold: 0.6886956521739132\n", + " p_threshold: 0.43497172683775553\n", + "
\n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 14, "metadata": {}, @@ -607,118 +2389,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ] - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." 
- ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to: \n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { diff --git a/docs/reference/notebooks/amazon_review_classification_sklearn.ipynb b/docs/reference/notebooks/amazon_review_classification_sklearn.ipynb index 5d4d002cdf..836b41bf7b 100644 --- a/docs/reference/notebooks/amazon_review_classification_sklearn.ipynb +++ b/docs/reference/notebooks/amazon_review_classification_sklearn.ipynb @@ -21,12 +21,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -41,7 +36,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-08-21T10:23:41.394690Z", @@ -63,13 +58,13 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 1, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T20:55:52.227278Z", "start_time": "2023-11-08T20:55:52.178313Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -86,33 +81,33 @@ "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import FunctionTransformer\n", "\n", - 
"from giskard import Dataset, Model, GiskardClient, testing, Suite, scan" + "from giskard import Dataset, Model, scan, testing" ] }, { "cell_type": "markdown", - "source": [ - "## Notebook-level settings" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Notebook-level settings" + ] }, { "cell_type": "code", - "execution_count": 3, - "outputs": [], - "source": [ - "# Disable chained assignment warning.\n", - "pd.options.mode.chained_assignment = None" - ], + "execution_count": 2, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T20:55:54.918094Z", "start_time": "2023-11-08T20:55:54.886389Z" - } - } + }, + "collapsed": false + }, + "outputs": [], + "source": [ + "# Disable chained assignment warning.\n", + "pd.options.mode.chained_assignment = None" + ] }, { "cell_type": "markdown", @@ -123,7 +118,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2023-11-08T20:55:58.032511Z", @@ -162,13 +157,13 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 4, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T20:56:00.293536Z", "start_time": "2023-11-08T20:56:00.234306Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -240,7 +235,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2023-11-08T20:56:57.696227Z", @@ -264,7 +259,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-11-08T20:57:20.112511Z", @@ -300,7 +295,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2023-11-08T20:57:21.472626Z", @@ -368,7 +363,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Wrap prediction function so that the whole pipeline is uploaded to the Hub\n", + "# Wrap prediction function\n", "def 
prediction_function(df):\n", " return pipeline.predict_proba(df)\n", "\n", @@ -391,12 +386,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -418,18 +413,2369 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 12, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T21:19:07.555323Z", "start_time": "2023-11-08T21:19:07.343077Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -441,12 +2787,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Generate comprehensive test suites automatically for your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Generate comprehensive test suites automatically for your model" + ] }, { "cell_type": "markdown", @@ -461,72 +2807,593 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 13, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T21:31:48.577324Z", "start_time": "2023-11-08T21:31:39.914226Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + "2024-05-29 11:31:46,497 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 'object'}\n", + "2024-05-29 11:31:46,501 pid:47376 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (9587, 2) executed in 0:00:00.015202\n", + "2024-05-29 11:31:47,278 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 
'object'}\n", + "2024-05-29 11:31:47,490 pid:47376 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (9587, 2) executed in 0:00:00.246625\n", + "2024-05-29 11:31:47,499 pid:47376 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:01.872730\n", + "2024-05-29 11:31:47,500 pid:47376 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.000723\n", + "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", " Test failed\n", - " Metric: 0.91\n", - " - [TestMessageLevel.INFO] 9506 rows were perturbed\n", + " Metric: 0.9\n", + " - [INFO] 9515 rows were perturbed\n", " \n", - "Executed 'Overconfidence on data slice “`reviewText` contains \"don\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.20729267505646984, 'p_threshold': 0.5}: \n", + "2024-05-29 11:31:47,559 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 'object'}\n", + "2024-05-29 11:31:47,560 pid:47376 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (1554, 2) executed in 0:00:00.013542\n", + "Executed 'Overconfidence on data slice “`reviewText` contains \"don\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.20769479469770452, 'p_threshold': 0.5}: \n", " Test failed\n", " Metric: 0.27\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`text_length(reviewText)` < 174.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.20729267505646984, 'p_threshold': 0.5}: \n", + "2024-05-29 11:31:47,582 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 'object'}\n", + "2024-05-29 11:31:47,584 pid:47376 MainThread giskard.utils.logging_utils INFO Predicted dataset 
with shape (4436, 2) executed in 0:00:00.016746\n", + "Executed 'Overconfidence on data slice “`text_length(reviewText)` < 174.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.20769479469770452, 'p_threshold': 0.5}: \n", " Test failed\n", " Metric: 0.21\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`reviewText` contains \"better\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03855220611244394, 'p_threshold': 0.95}: \n", - " Test failed\n", - " Metric: 0.05\n", - " \n", - " \n", - "Executed 'Underconfidence on data slice “`reviewText` contains \"got\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03855220611244394, 'p_threshold': 0.95}: \n", + "2024-05-29 11:31:47,641 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 'object'}\n", + "2024-05-29 11:31:47,642 pid:47376 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (734, 2) executed in 0:00:00.008658\n", + "Executed 'Underconfidence on data slice “`reviewText` contains \"way\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03912589965578388, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.05\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`reviewText` contains \"doesn\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03855220611244394, 'p_threshold': 0.95}: \n", + "2024-05-29 11:31:47,698 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 'object'}\n", + "2024-05-29 11:31:47,699 pid:47376 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (707, 2) executed in 0:00:00.008718\n", + "Executed 'Underconfidence on data slice “`reviewText` contains \"better\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 
0.03912589965578388, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.04\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`reviewText` contains \"way\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03855220611244394, 'p_threshold': 0.95}: \n", + "2024-05-29 11:31:47,756 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 'object'}\n", + "2024-05-29 11:31:47,757 pid:47376 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (849, 2) executed in 0:00:00.009271\n", + "Executed 'Underconfidence on data slice “`reviewText` contains \"want\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03912589965578388, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.04\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`reviewText` contains \"good\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03855220611244394, 'p_threshold': 0.95}: \n", + "2024-05-29 11:31:47,813 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 'object'}\n", + "2024-05-29 11:31:47,814 pid:47376 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (733, 2) executed in 0:00:00.007101\n", + "Executed 'Underconfidence on data slice “`reviewText` contains \"got\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03912589965578388, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.04\n", " \n", " \n", - "Executed 'Recall on data slice “`reviewText` contains \"download\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6416057519472739}: \n", + "2024-05-29 11:31:47,868 pid:47376 MainThread giskard.datasets.base INFO Casting dataframe columns from {'reviewText': 'object'} to {'reviewText': 'object'}\n", + "2024-05-29 11:31:47,869 pid:47376 
MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (538, 2) executed in 0:00:00.006875\n", + "Executed 'Recall on data slice “`reviewText` contains \"download\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6419472738166567}: \n", " Test failed\n", " Metric: 0.59\n", " \n", - " \n" + " \n", + "2024-05-29 11:31:47,872 pid:47376 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 11:31:47,873 pid:47376 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 11:31:47,873 pid:47376 MainThread giskard.core.suite INFO Invariance to “Add typos” ({'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.9038360483447189}\n", + "2024-05-29 11:31:47,873 pid:47376 MainThread giskard.core.suite INFO Overconfidence on data slice “`reviewText` contains \"don\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.20769479469770452, 'p_threshold': 0.5}): {failed, metric=0.26622296173044924}\n", + "2024-05-29 11:31:47,873 pid:47376 MainThread giskard.core.suite INFO Overconfidence on data slice “`text_length(reviewText)` < 174.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.20769479469770452, 'p_threshold': 0.5}): {failed, metric=0.21353196772191185}\n", + "2024-05-29 11:31:47,874 pid:47376 MainThread giskard.core.suite INFO Underconfidence on data slice “`reviewText` contains \"way\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03912589965578388, 'p_threshold': 0.95}): {failed, metric=0.04904632152588556}\n", + "2024-05-29 11:31:47,874 pid:47376 MainThread giskard.core.suite INFO Underconfidence on data slice “`reviewText` contains \"better\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03912589965578388, 'p_threshold': 0.95}): {failed, metric=0.042432814710042434}\n", + "2024-05-29 11:31:47,874 pid:47376 MainThread giskard.core.suite INFO 
Underconfidence on data slice “`reviewText` contains \"want\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03912589965578388, 'p_threshold': 0.95}): {failed, metric=0.04240282685512368}\n", + "2024-05-29 11:31:47,874 pid:47376 MainThread giskard.core.suite INFO Underconfidence on data slice “`reviewText` contains \"got\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.03912589965578388, 'p_threshold': 0.95}): {failed, metric=0.03956343792633015}\n", + "2024-05-29 11:31:47,875 pid:47376 MainThread giskard.core.suite INFO Recall on data slice “`reviewText` contains \"download\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6419472738166567}): {failed, metric=0.5929203539823009}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n Test Invariance to “Add typos”: Measured Metric = 0.90627, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, transformation_function: Add typos, threshold: 0.95, output_sensitivity: 0.05)\n Test Overconfidence on data slice “`reviewText` contains \"don\"”: Measured Metric = 0.26578, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, slicing_function: `reviewText` contains \"don\", threshold: 0.20729267505646984, p_threshold: 0.5)\n Test Overconfidence on data slice “`text_length(reviewText)` < 174.500”: Measured Metric = 0.2119, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, slicing_function: `text_length(reviewText)` < 174.500, threshold: 0.20729267505646984, p_threshold: 0.5)\n Test Underconfidence on data slice “`reviewText` contains \"better\"”: Measured Metric = 0.04526, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, slicing_function: `reviewText` contains \"better\", threshold: 0.03855220611244394, p_threshold: 0.95)\n Test Underconfidence on data slice “`reviewText` contains \"got\"”: Measured Metric = 0.04502, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, slicing_function: `reviewText` contains \"got\", threshold: 0.03855220611244394, p_threshold: 0.95)\n Test Underconfidence on data slice “`reviewText` contains \"doesn\"”: Measured Metric = 0.0415, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, slicing_function: `reviewText` contains \"doesn\", threshold: 0.03855220611244394, p_threshold: 0.95)\n Test Underconfidence on data slice “`reviewText` contains \"way\"”: Measured Metric = 0.04087, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, slicing_function: `reviewText` contains \"way\", threshold: 0.03855220611244394, p_threshold: 0.95)\n Test Underconfidence on data slice “`reviewText` contains \"good\"”: Measured Metric = 0.03938, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, slicing_function: `reviewText` contains \"good\", threshold: 0.03855220611244394, p_threshold: 0.95)\n Test Recall on data slice “`reviewText` contains \"download\"”: Measured Metric = 0.59292, Failed (model: 2bbb4689-1208-4860-971f-add01cc7f3eb, dataset: reviews, slicing_function: `reviewText` contains \"download\", threshold: 0.6416057519472739)
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n",
+ "Test suite failed.\n",
+ "Test Invariance to “Add typos”: Measured Metric = 0.90384, Failed (model: review_helpfulness_predictor, dataset: reviews, transformation_function: Add typos, threshold: 0.95, output_sensitivity: 0.05)\n",
+ "Test Overconfidence on data slice “`reviewText` contains \"don\"”: Measured Metric = 0.26622, Failed (model: review_helpfulness_predictor, dataset: reviews, slicing_function: `reviewText` contains \"don\", threshold: 0.20769479469770452, p_threshold: 0.5)\n",
+ "Test Overconfidence on data slice “`text_length(reviewText)` < 174.500”: Measured Metric = 0.21353, Failed (model: review_helpfulness_predictor, dataset: reviews, slicing_function: `text_length(reviewText)` < 174.500, threshold: 0.20769479469770452, p_threshold: 0.5)\n",
+ "Test Underconfidence on data slice “`reviewText` contains \"way\"”: Measured Metric = 0.04905, Failed (model: review_helpfulness_predictor, dataset: reviews, slicing_function: `reviewText` contains \"way\", threshold: 0.03912589965578388, p_threshold: 0.95)\n",
+ "Test Underconfidence on data slice “`reviewText` contains \"better\"”: Measured Metric = 0.04243, Failed (model: review_helpfulness_predictor, dataset: reviews, slicing_function: `reviewText` contains \"better\", threshold: 0.03912589965578388, p_threshold: 0.95)\n",
+ "Test Underconfidence on data slice “`reviewText` contains \"want\"”: Measured Metric = 0.0424, Failed (model: review_helpfulness_predictor, dataset: reviews, slicing_function: `reviewText` contains \"want\", threshold: 0.03912589965578388, p_threshold: 0.95)\n",
+ "Test Underconfidence on data slice “`reviewText` contains \"got\"”: Measured Metric = 0.03956, Failed (model: review_helpfulness_predictor, dataset: reviews, slicing_function: `reviewText` contains \"got\", threshold: 0.03912589965578388, p_threshold: 0.95)\n",
+ "Test Recall on data slice “`reviewText` contains \"download\"”: Measured Metric = 0.59292, Failed (model: review_helpfulness_predictor, dataset: reviews, slicing_function: `reviewText` contains \"download\", threshold: 0.6419472738166567)\n",
+ "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, - "execution_count": 16, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -565,113 +3432,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to: \n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ] - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. 
Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to: \n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -690,7 +3450,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/cancer_detection_xgboost.ipynb b/docs/reference/notebooks/cancer_detection_xgboost.ipynb index 00c454ef4b..b697c85755 100644 --- a/docs/reference/notebooks/cancer_detection_xgboost.ipynb +++ b/docs/reference/notebooks/cancer_detection_xgboost.ipynb @@ -2,6 +2,10 @@ "cells": [ { "cell_type": "markdown", + "id": "990eccb8", + "metadata": { + "collapsed": false + }, "source": [ "# Breast cancer detection [XGBoost]\n", "\n", @@ -18,17 +22,8 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" - ], - "metadata": { - "collapsed": false - }, - "id": "990eccb8" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" + ] }, { "attachments": {}, @@ -42,7 +37,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "id": "eb828d6da954f51d", "metadata": { "ExecuteTime": { @@ -83,37 +78,37 @@ "from sklearn.model_selection import 
train_test_split\n", "from xgboost import XGBClassifier\n", "\n", - "from giskard import Dataset, Model, scan, testing, Suite, GiskardClient" + "from giskard import Dataset, Model, scan, testing" ] }, { "cell_type": "markdown", - "source": [ - "## Define constants" - ], + "id": "9dac8b68bec87e9d", "metadata": { "collapsed": false }, - "id": "9dac8b68bec87e9d" + "source": [ + "## Define constants" + ] }, { "cell_type": "code", "execution_count": 2, + "id": "6f78eeb5c5d1e734", + "metadata": { + "ExecuteTime": { + "end_time": "2023-11-08T22:42:48.116285Z", + "start_time": "2023-11-08T22:42:48.066331Z" + }, + "collapsed": false + }, "outputs": [], "source": [ "# Constants.\n", "TARGET_COLUMN_NAME = \"target\"\n", "\n", "RANDOM_SEED = 42" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2023-11-08T22:42:48.116285Z", - "start_time": "2023-11-08T22:42:48.066331Z" - } - }, - "id": "6f78eeb5c5d1e734" + ] }, { "attachments": {}, @@ -126,13 +121,13 @@ }, { "cell_type": "markdown", - "source": [ - "### Load data" - ], + "id": "7a271dcc766e688f", "metadata": { "collapsed": false }, - "id": "7a271dcc766e688f" + "source": [ + "### Load data" + ] }, { "cell_type": "code", @@ -153,46 +148,54 @@ }, { "cell_type": "markdown", - "source": [ - "### Train-test split" - ], + "id": "bb43fe6e0c8de713", "metadata": { "collapsed": false }, - "id": "bb43fe6e0c8de713" + "source": [ + "### Train-test split" + ] }, { "cell_type": "code", "execution_count": 4, - "outputs": [], - "source": [ - "# Train/test split.\n", - "X_train, X_test, y_train, y_test = train_test_split(features.loc[:, features.columns != TARGET_COLUMN_NAME], target,\n", - " random_state=RANDOM_SEED)" - ], + "id": "b3cd0e34e8eb0a25", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:42:50.702481Z", "start_time": "2023-11-08T22:42:50.618451Z" - } + }, + "collapsed": false }, - "id": "b3cd0e34e8eb0a25" + "outputs": [], + "source": [ + "# Train/test split.\n", + "X_train, 
X_test, y_train, y_test = train_test_split(features.loc[:, features.columns != TARGET_COLUMN_NAME], target,\n", + " random_state=RANDOM_SEED)" + ] }, { "cell_type": "markdown", - "source": [ - "### Wrap dataset with Giskard\n", - "To prepare for the vulnerability scan, make sure to wrap your dataset using Giskard's Dataset class. More details [here](https://docs.giskard.ai/en/stable/open_source/scan/scan_nlp/index.html#step-1-wrap-your-dataset)." - ], + "id": "981e2b28aa4fda32", "metadata": { "collapsed": false }, - "id": "981e2b28aa4fda32" + "source": [ + "### Wrap dataset with Giskard\n", + "To prepare for the vulnerability scan, make sure to wrap your dataset using Giskard's Dataset class. More details [here](https://docs.giskard.ai/en/stable/open_source/scan/scan_nlp/index.html#step-1-wrap-your-dataset)." + ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, + "id": "e3c3e6a5", + "metadata": { + "ExecuteTime": { + "end_time": "2023-11-08T22:42:51.594977Z", + "start_time": "2023-11-08T22:42:51.547696Z" + }, + "collapsed": false + }, "outputs": [], "source": [ "raw_data = pd.concat([X_test, y_test], axis=1)\n", @@ -202,15 +205,7 @@ " target=\"target\", # Ground truth variable.\n", " name=\"breast_cancer\", # Optional.\n", ")" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2023-11-08T22:42:51.594977Z", - "start_time": "2023-11-08T22:42:51.547696Z" - } - }, - "id": "e3c3e6a5" + ] }, { "attachments": {}, @@ -278,13 +273,13 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], + "id": "b4fd45501d666893", "metadata": { "collapsed": false }, - "id": "b4fd45501d666893" + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -321,7 +316,3439 @@ "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -351,7 +3778,7 @@ }, { "cell_type": "code", - 
"execution_count": 10, + "execution_count": null, "id": "bea736a9", "metadata": { "ExecuteTime": { @@ -364,136 +3791,1376 @@ "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Nominal association (Theil's U) on data slice “`worst concave points` >= 0.144”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,362 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 
'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,366 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.007008\n", + "Executed 'Nominal association (Theil's U) on data slice “`worst concave points` >= 0.144”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.77\n", - " - [TestMessageLevel.INFO] metric = 0.7745014140794255, threshold = 0.5\n", + " - [INFO] metric = 0.7745014140794255, threshold = 0.5\n", " \n", - "Executed 'Nominal association (Theil's U) on data slice “`worst radius` >= 17.420”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,374 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to 
{'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,376 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.005423\n", + "Executed 'Nominal association (Theil's U) on data slice “`worst radius` >= 17.420”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.76\n", - " - [TestMessageLevel.INFO] metric = 0.7639160128151123, threshold = 0.5\n", + " - [INFO] metric = 0.7639160128151123, threshold = 0.5\n", " \n", - "Executed 'Nominal association (Theil's U) on data slice “`worst perimeter` >= 110.950”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,385 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 
'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,388 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.005598\n", + "Executed 'Nominal association (Theil's U) on data slice “`worst perimeter` >= 110.950”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.74\n", - " - [TestMessageLevel.INFO] metric = 0.7427987555404146, threshold = 0.5\n", + " - [INFO] metric = 
0.7427987555404146, threshold = 0.5\n", " \n", - "Executed 'Nominal association (Theil's U) on data slice “`worst area` >= 885.950”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,394 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 
'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,399 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.006882\n", + "Executed 'Nominal association (Theil's U) on data slice “`worst area` >= 885.950”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.71\n", - " - [TestMessageLevel.INFO] metric = 0.7131111241996619, threshold = 0.5\n", + " - [INFO] metric = 0.7131111241996619, threshold = 0.5\n", " \n", - "Executed 'Nominal association (Theil's U) on data slice “`mean concave points` >= 0.053”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,406 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean 
compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,409 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.005363\n", + "Executed 'Nominal association (Theil's U) on data slice “`mean concave points` >= 0.053”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.64\n", - " - [TestMessageLevel.INFO] metric = 0.6377395426892722, threshold = 0.5\n", + " - [INFO] metric = 0.6377395426892722, threshold = 0.5\n", " \n", - "Executed 'Nominal association (Theil's U) on data slice “`mean perimeter` >= 96.380”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,419 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 
'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,424 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.009371\n", + "Executed 'Nominal association (Theil's U) on data slice “`mean perimeter` >= 96.380”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.62\n", - " - [TestMessageLevel.INFO] metric = 0.6227293846650519, threshold = 0.5\n", + " - [INFO] metric = 0.6227293846650519, threshold = 0.5\n", " \n", - "Executed 'Nominal association (Theil's U) on data slice “`mean radius` >= 15.060”' with arguments 
{'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,433 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + 
"2024-05-29 11:38:26,439 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.009926\n", + "Executed 'Nominal association (Theil's U) on data slice “`mean area` >= 697.300”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.61\n", - " - [TestMessageLevel.INFO] metric = 0.6087180717056299, threshold = 0.5\n", + " - [INFO] metric = 0.6087180717056299, threshold = 0.5\n", " \n", - "Executed 'Nominal association (Theil's U) on data slice “`mean area` >= 697.300”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,449 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius 
error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,455 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.010707\n", + "Executed 'Nominal association (Theil's U) on data slice “`mean radius` >= 15.060”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.61\n", - " - [TestMessageLevel.INFO] metric = 0.6087180717056299, threshold = 0.5\n", + " - [INFO] metric = 0.6087180717056299, threshold = 0.5\n", " \n", - "Executed 'Nominal association (Theil's U) on data slice “`area error` >= 39.690”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", + "2024-05-29 11:38:26,463 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst 
radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,467 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (143, 31) executed in 0:00:00.007197\n", + "Executed 'Nominal association (Theil's U) on data slice “`area error` >= 39.690”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}: \n", " Test failed\n", " Metric: 0.52\n", - " - [TestMessageLevel.INFO] metric = 0.5166070702259753, threshold = 0.5\n", + " - [INFO] metric = 0.5166070702259753, threshold = 0.5\n", " \n", - "Executed 'Accuracy on data slice “`worst radius` >= 14.765 AND `worst radius` < 17.625”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,475 pid:50687 MainThread giskard.datasets.base INFO Casting 
dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,481 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 31) executed in 0:00:00.009332\n", + 
"Executed 'Accuracy on data slice “`worst radius` >= 14.765 AND `worst radius` < 17.625”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.8\n", " \n", " \n", - "Executed 'Accuracy on data slice “`mean radius` >= 13.310 AND `mean radius` < 15.005”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,493 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 
'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,497 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 31) executed in 0:00:00.008997\n", + "Executed 'Accuracy on data slice “`worst perimeter` >= 96.625 AND `worst perimeter` < 122.350”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.8\n", " \n", " \n", - "Executed 'Accuracy on data slice “`mean perimeter` >= 86.140 AND `mean perimeter` < 98.085”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,507 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 
'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,511 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 31) executed in 0:00:00.008990\n", + "Executed 'Accuracy on data slice “`mean perimeter` >= 86.140 AND `mean perimeter` < 98.085”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.8\n", " \n", " \n", - "Executed 'Accuracy on data slice “`worst perimeter` >= 96.625 AND `worst perimeter` < 122.350”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,523 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 
'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,527 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 31) executed in 0:00:00.008275\n", + "Executed 'Accuracy on data slice “`mean area` >= 518.300 AND `mean area` < 664.350”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.8\n", " \n", " \n", - "Executed 'Accuracy on data slice “`mean area` >= 518.300 AND `mean area` < 664.350”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + 
"2024-05-29 11:38:26,538 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,541 pid:50687 MainThread giskard.utils.logging_utils 
INFO Predicted dataset with shape (30, 31) executed in 0:00:00.010876\n", + "Executed 'Accuracy on data slice “`mean radius` >= 13.310 AND `mean radius` < 15.005”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.8\n", " \n", " \n", - "Executed 'Accuracy on data slice “`mean concavity` >= 0.078 AND `mean concavity` < 0.140”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,562 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 
'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,577 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 31) executed in 0:00:00.020399\n", + "Executed 'Accuracy on data slice “`mean concave points` >= 0.048 AND `mean concave points` < 0.079”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.8\n", " \n", " \n", - "Executed 'Accuracy on data slice “`mean concave points` >= 0.048 AND `mean concave points` < 0.079”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,592 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 
'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,601 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 31) executed in 0:00:00.013636\n", + "Executed 'Accuracy on data slice “`mean concavity` >= 0.078 AND `mean concavity` < 0.140”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.8\n", " \n", " \n", - "Executed 'Accuracy on data slice “`worst area` >= 610.200 AND `worst area` < 828.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,616 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 
'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,625 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (33, 31) executed in 0:00:00.013964\n", + "Executed 'Accuracy on data slice “`worst area` >= 610.200 AND `worst area` < 828.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.82\n", " \n", " \n", - "Executed 'Accuracy on data slice “`fractal dimension error` < 0.003 AND `fractal dimension error` >= 0.002”' 
with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,637 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal 
dimension': 'float64'}\n", + "2024-05-29 11:38:26,649 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 31) executed in 0:00:00.019259\n", + "Executed 'Accuracy on data slice “`fractal dimension error` < 0.003 AND `fractal dimension error` >= 0.002”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.83\n", " \n", " \n", - "Executed 'Accuracy on data slice “`worst concavity` >= 0.277 AND `worst concavity` < 0.415”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,662 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 
'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,667 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 31) executed in 0:00:00.010570\n", + "Executed 'Accuracy on data slice “`worst concavity` >= 0.277 AND `worst concavity` < 0.415”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.83\n", " \n", " \n", - "Executed 'Accuracy on data slice “`compactness error` >= 0.020 AND `compactness error` < 0.030”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,667 pid:50687 Thread-54 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,668 pid:50687 Thread-47 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,674 pid:50687 Thread-48 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. 
Connection pool size: 10\n", + "2024-05-29 11:38:26,675 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,681 pid:50687 
Thread-50 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,684 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (32, 31) executed in 0:00:00.013371\n", + "Executed 'Accuracy on data slice “`compactness error` >= 0.020 AND `compactness error` < 0.030”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.84\n", " \n", " \n", - "Executed 'Accuracy on data slice “`mean compactness` >= 0.099 AND `mean compactness` < 0.128”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,693 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean 
symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,697 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (32, 31) executed in 0:00:00.007443\n", + "Executed 'Accuracy on data slice “`mean compactness` >= 0.099 AND `mean compactness` < 0.128”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.84\n", " \n", " \n", - "Executed 'Recall on data slice “`concave points error` >= 0.009 AND `concave points error` < 0.015”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9179775280898876}: \n", + "2024-05-29 11:38:26,707 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst 
radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,710 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (51, 31) executed in 0:00:00.007322\n", + "Executed 'Recall on data slice “`concave points error` >= 0.009 AND `concave points error` < 0.015”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9179775280898876}: \n", " Test failed\n", " Metric: 0.88\n", " \n", " \n", - "Executed 'Accuracy on data slice “`perimeter error` < 2.151 AND `perimeter error` >= 1.505”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,720 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 
'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,723 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (35, 31) executed in 0:00:00.008618\n", + "Executed 'Accuracy on data slice “`perimeter error` < 2.151 AND `perimeter error` >= 1.505”' with 
arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.89\n", " \n", " \n", - "Executed 'Accuracy on data slice “`area error` < 40.745 AND `area error` >= 19.215”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", + "2024-05-29 11:38:26,732 pid:50687 MainThread giskard.datasets.base INFO Casting dataframe columns from {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'} to {'mean radius': 'float64', 'mean texture': 'float64', 'mean perimeter': 'float64', 'mean area': 'float64', 'mean smoothness': 'float64', 'mean compactness': 'float64', 'mean concavity': 'float64', 'mean concave points': 'float64', 'mean symmetry': 'float64', 'mean fractal dimension': 'float64', 'radius error': 'float64', 'texture error': 'float64', 'perimeter error': 'float64', 'area error': 'float64', 'smoothness error': 'float64', 'compactness error': 'float64', 'concavity error': 'float64', 'concave points error': 'float64', 'symmetry error': 'float64', 'fractal dimension error': 'float64', 'worst radius': 'float64', 'worst texture': 
'float64', 'worst perimeter': 'float64', 'worst area': 'float64', 'worst smoothness': 'float64', 'worst compactness': 'float64', 'worst concavity': 'float64', 'worst concave points': 'float64', 'worst symmetry': 'float64', 'worst fractal dimension': 'float64'}\n", + "2024-05-29 11:38:26,736 pid:50687 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (45, 31) executed in 0:00:00.009279\n", + "Executed 'Accuracy on data slice “`area error` < 40.745 AND `area error` >= 19.215”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}: \n", " Test failed\n", " Metric: 0.89\n", " \n", - " \n" + " \n", + "2024-05-29 11:38:26,737 pid:50687 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 11:38:26,738 pid:50687 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 11:38:26,738 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) on data slice “`worst concave points` >= 0.144” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.7745014140794255}\n", + "2024-05-29 11:38:26,738 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) on data slice “`worst radius` >= 17.420” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.7639160128151123}\n", + "2024-05-29 11:38:26,739 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) on data slice “`worst perimeter` >= 110.950” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.7427987555404146}\n", + "2024-05-29 11:38:26,740 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) on data slice “`worst area` >= 885.950” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.7131111241996619}\n", + "2024-05-29 11:38:26,740 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) 
on data slice “`mean concave points` >= 0.053” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.6377395426892722}\n", + "2024-05-29 11:38:26,741 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) on data slice “`mean perimeter` >= 96.380” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.6227293846650519}\n", + "2024-05-29 11:38:26,741 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) on data slice “`mean area` >= 697.300” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.6087180717056299}\n", + "2024-05-29 11:38:26,743 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) on data slice “`mean radius` >= 15.060” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.6087180717056299}\n", + "2024-05-29 11:38:26,744 pid:50687 MainThread giskard.core.suite INFO Nominal association (Theil's U) on data slice “`area error` >= 39.690” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5}): {failed, metric=0.5166070702259753}\n", + "2024-05-29 11:38:26,745 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`worst radius` >= 14.765 AND `worst radius` < 17.625” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8}\n", + "2024-05-29 11:38:26,746 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`worst perimeter` >= 96.625 AND `worst perimeter` < 122.350” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8}\n", + "2024-05-29 11:38:26,747 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`mean perimeter` >= 86.140 AND `mean perimeter` < 98.085” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8}\n", + "2024-05-29 11:38:26,748 pid:50687 MainThread 
giskard.core.suite INFO Accuracy on data slice “`mean area` >= 518.300 AND `mean area` < 664.350” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8}\n", + "2024-05-29 11:38:26,748 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`mean radius` >= 13.310 AND `mean radius` < 15.005” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8}\n", + "2024-05-29 11:38:26,749 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`mean concave points` >= 0.048 AND `mean concave points` < 0.079” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8}\n", + "2024-05-29 11:38:26,749 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`mean concavity` >= 0.078 AND `mean concavity` < 0.140” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8}\n", + "2024-05-29 11:38:26,752 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`worst area` >= 610.200 AND `worst area` < 828.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8181818181818182}\n", + "2024-05-29 11:38:26,752 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`fractal dimension error` < 0.003 AND `fractal dimension error` >= 0.002” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8333333333333334}\n", + "2024-05-29 11:38:26,752 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`worst concavity` >= 0.277 AND `worst concavity` < 0.415” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8333333333333334}\n", + "2024-05-29 11:38:26,753 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`compactness error` >= 0.020 AND `compactness error` < 0.030” 
({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.84375}\n", + "2024-05-29 11:38:26,753 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`mean compactness` >= 0.099 AND `mean compactness` < 0.128” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.84375}\n", + "2024-05-29 11:38:26,753 pid:50687 MainThread giskard.core.suite INFO Recall on data slice “`concave points error` >= 0.009 AND `concave points error` < 0.015” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9179775280898876}): {failed, metric=0.875}\n", + "2024-05-29 11:38:26,754 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`perimeter error` < 2.151 AND `perimeter error` >= 1.505” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8857142857142857}\n", + "2024-05-29 11:38:26,754 pid:50687 MainThread giskard.core.suite INFO Accuracy on data slice “`area error` < 40.745 AND `area error` >= 19.215” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9101398601398601}): {failed, metric=0.8888888888888888}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2024-05-29 11:38:26,789 pid:50687 Thread-59 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,796 pid:50687 Thread-58 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,822 pid:50687 Thread-60 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,848 pid:50687 Thread-61 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,853 pid:50687 Thread-63 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,873 pid:50687 Thread-62 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:38:26,899 pid:50687 Thread-64 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. 
Connection pool size: 10\n" + ] } ], "source": [ @@ -503,13 +5170,13 @@ }, { "cell_type": "markdown", - "source": [ - "## Customize your suite by loading objects from the Giskard catalog" - ], + "id": "882f4638", "metadata": { "collapsed": false }, - "id": "882f4638" + "source": [ + "## Customize your suite by loading objects from the Giskard catalog" + ] }, { "cell_type": "markdown", @@ -538,124 +5205,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "id": "cf824254", - "metadata": {}, - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to: \n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ] - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." 
- ], - "metadata": { - "collapsed": false - }, - "id": "b03d16b0b4dc2e6f" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "eee658d69dd8d8c2" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "ecde49539af0f699" - }, - { - "cell_type": "markdown", - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ], - "metadata": { - "collapsed": false - }, - "id": "984c06a4f3c1bb01" - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ], - "metadata": { - "collapsed": false - }, - "id": "a309531193bd89cc" - }, - { - "cell_type": "markdown", - "id": "193983c206c0103f", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - 
"After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. This allows you to: \n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - }, - "id": "e9dadbd02995199d" } ], "metadata": { @@ -674,7 +5223,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/churn_prediction_lgbm.ipynb b/docs/reference/notebooks/churn_prediction_lgbm.ipynb index 709fed0eb6..b56b11cd72 100644 --- a/docs/reference/notebooks/churn_prediction_lgbm.ipynb +++ b/docs/reference/notebooks/churn_prediction_lgbm.ipynb @@ -21,12 +21,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -41,7 +36,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-08-22T10:12:25.295802Z", @@ -96,7 +91,7 @@ "from sklearn.preprocessing import OneHotEncoder\n", "from sklearn.preprocessing import StandardScaler\n", "\n", - "from giskard import 
Dataset, Model, scan, GiskardClient, testing, Suite" + "from giskard import Dataset, Model, scan, testing" ] }, { @@ -112,11 +107,11 @@ "cell_type": "code", "execution_count": 2, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:45:06.756123Z", "start_time": "2023-11-08T22:45:06.722479Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -177,11 +172,11 @@ "cell_type": "code", "execution_count": 3, "metadata": { - "scrolled": false, "ExecuteTime": { "end_time": "2023-11-08T22:45:08.742520Z", "start_time": "2023-11-08T22:45:08.265673Z" - } + }, + "scrolled": false }, "outputs": [], "source": [ @@ -211,11 +206,11 @@ "cell_type": "code", "execution_count": 4, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:45:09.773474Z", "start_time": "2023-11-08T22:45:09.672374Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -238,13 +233,21 @@ "cell_type": "code", "execution_count": 5, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:45:10.845261Z", "start_time": "2023-11-08T22:45:10.797728Z" - } + }, + "collapsed": false }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2024-05-29 11:39:29,918 pid:51250 MainThread giskard.datasets.base INFO Your 'pandas.DataFrame' is successfully wrapped by Giskard's 'Dataset' wrapper class.\n" + ] + } + ], "source": [ "raw_data = pd.concat([X_test, Y_test], axis=1)\n", "giskard_dataset = Dataset(\n", @@ -275,11 +278,11 @@ "cell_type": "code", "execution_count": 6, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:45:12.378895Z", "start_time": "2023-11-08T22:45:12.356273Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -358,12 +361,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Detect vulnerabilities in your model" + ] }, 
{ "cell_type": "markdown", @@ -391,16 +394,3238 @@ "cell_type": "code", "execution_count": 10, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:45:57.660834Z", "start_time": "2023-11-08T22:45:57.392784Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -421,145 +3646,1269 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": 11, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:46:08.319084Z", "start_time": "2023-11-08T22:46:07.195401Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Overconfidence on data slice “`TotalCharges` >= 3246.925”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}: \n", + "2024-05-29 11:43:15,920 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 
'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:15,923 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (506, 20) executed in 0:00:00.018539\n", + "Executed 'Overconfidence on data slice “`TotalCharges` >= 3246.925”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}: \n", " Test failed\n", " Metric: 0.56\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`InternetService` == \"DSL\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}: \n", + "2024-05-29 11:43:15,939 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 
'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:15,941 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (561, 20) executed in 0:00:00.010149\n", + "Executed 'Overconfidence on data slice “`InternetService` == \"DSL\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}: \n", " Test failed\n", " Metric: 0.51\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`OnlineBackup` == \"Yes\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}: \n", + "2024-05-29 11:43:15,956 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:15,958 pid:51250 MainThread 
giskard.utils.logging_utils INFO Predicted dataset with shape (614, 20) executed in 0:00:00.010088\n", + "Executed 'Overconfidence on data slice “`OnlineBackup` == \"Yes\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}: \n", " Test failed\n", " Metric: 0.46\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`OnlineSecurity` == \"No\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 'p_threshold': 0.95}: \n", + "2024-05-29 11:43:15,974 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:15,976 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (870, 20) executed in 0:00:00.010450\n", + "Executed 'Underconfidence on data slice “`OnlineSecurity` == \"No\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 'p_threshold': 0.95}: \n", " Test 
failed\n", " Metric: 0.02\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`Contract` == \"Month-to-month\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 'p_threshold': 0.95}: \n", + "2024-05-29 11:43:15,993 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:15,995 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (980, 20) executed in 0:00:00.011023\n", + "Executed 'Underconfidence on data slice “`Contract` == \"Month-to-month\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.02\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`Dependents` == \"No\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 'p_threshold': 0.95}: \n", + "2024-05-29 11:43:16,017 pid:51250 MainThread 
giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,021 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (1248, 20) executed in 0:00:00.014565\n", + "Executed 'Underconfidence on data slice “`Dependents` == \"No\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.02\n", " \n", " \n", - "Executed 'Recall on data slice “`Contract` == \"One year\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,035 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 
'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,037 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (376, 20) executed in 0:00:00.009567\n", + "Executed 'Recall on data slice “`Contract` == \"One year\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.0\n", " \n", " \n", - "Executed 'Recall on data slice “`tenure` >= 44.500 AND `tenure` < 70.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,054 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 
'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,055 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (497, 20) executed in 0:00:00.008861\n", + "Executed 'Recall on data slice “`tenure` >= 44.500 AND `tenure` < 70.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.06\n", " \n", " \n", - "Executed 'Recall on data slice “`InternetService` == \"No\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,071 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 
'float64'}\n", + "2024-05-29 11:43:16,074 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (405, 20) executed in 0:00:00.009695\n", + "Executed 'Recall on data slice “`InternetService` == \"No\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.08\n", " \n", " \n", - "Executed 'Recall on data slice “`OnlineSecurity` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,092 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,093 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (405, 20) executed in 0:00:00.008504\n", + "Executed 'Recall on data slice “`OnlineSecurity` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " 
Test failed\n", " Metric: 0.08\n", " \n", " \n", - "Executed 'Recall on data slice “`OnlineBackup` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,109 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,111 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (405, 20) executed in 0:00:00.008597\n", + "Executed 'Recall on data slice “`OnlineBackup` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.08\n", " \n", " \n", - "Executed 'Recall on data slice “`DeviceProtection` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,125 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from 
{'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,128 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (405, 20) executed in 0:00:00.008774\n", + "Executed 'Recall on data slice “`DeviceProtection` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.08\n", " \n", " \n", - "Executed 'Recall on data slice “`TechSupport` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,143 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 
'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,145 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (405, 20) executed in 0:00:00.008562\n", + "Executed 'Recall on data slice “`TechSupport` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.08\n", " \n", " \n", - "Executed 'Recall on data slice “`StreamingTV` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,161 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 
'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,164 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (405, 20) executed in 0:00:00.010419\n", + "Executed 'Recall on data slice “`StreamingTV` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.08\n", " \n", " \n", - "Executed 'Recall on data slice “`StreamingMovies` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,179 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + 
"2024-05-29 11:43:16,182 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (405, 20) executed in 0:00:00.010005\n", + "Executed 'Recall on data slice “`StreamingMovies` == \"No internet service\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.08\n", " \n", " \n", - "Executed 'Recall on data slice “`MonthlyCharges` < 20.775”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,196 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,198 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (296, 20) executed in 0:00:00.008635\n", + "Executed 'Recall on data slice “`MonthlyCharges` < 20.775”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.1\n", " 
\n", " \n", - "Executed 'Recall on data slice “`TechSupport` == \"Yes\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,214 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,216 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (472, 20) executed in 0:00:00.009999\n", + "Executed 'Recall on data slice “`TechSupport` == \"Yes\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.21\n", " \n", " \n", - "Executed 'Recall on data slice “`OnlineSecurity` == \"Yes\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,231 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 
'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,233 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (483, 20) executed in 0:00:00.009687\n", + "Executed 'Recall on data slice “`OnlineSecurity` == \"Yes\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.21\n", " \n", " \n", - "Executed 'Recall on data slice “`PaymentMethod` == \"Credit card\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,256 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 
'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,268 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (368, 20) executed in 0:00:00.025869\n", + "Executed 'Recall on data slice “`PaymentMethod` == \"Credit card\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.28\n", " \n", " \n", - "Executed 'Recall on data slice “`InternetService` == \"DSL\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,297 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 
'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,305 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (561, 20) executed in 0:00:00.017681\n", + "Executed 'Recall on data slice “`InternetService` == \"DSL\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.32\n", " \n", " \n", - "Executed 'Recall on data slice “`Dependents` == \"Yes\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", + "2024-05-29 11:43:16,332 pid:51250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'} to {'gender': 'object', 'SeniorCitizen': 'int64', 'Partner': 'object', 'Dependents': 'object', 'tenure': 'int64', 'PhoneService': 'object', 'MultipleLines': 'object', 'InternetService': 'object', 'OnlineSecurity': 'object', 'OnlineBackup': 'object', 'DeviceProtection': 'object', 'TechSupport': 'object', 'StreamingTV': 'object', 'StreamingMovies': 'object', 'Contract': 'object', 'PaperlessBilling': 'object', 'PaymentMethod': 'object', 'MonthlyCharges': 'float64', 'TotalCharges': 'float64'}\n", + "2024-05-29 11:43:16,340 pid:51250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (510, 20) executed in 0:00:00.020697\n", + "Executed 'Recall on data 
slice “`Dependents` == \"Yes\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}: \n", " Test failed\n", " Metric: 0.33\n", " \n", - " \n" + " \n", + "2024-05-29 11:43:16,348 pid:51250 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 11:43:16,348 pid:51250 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 11:43:16,349 pid:51250 MainThread giskard.core.suite INFO Overconfidence on data slice “`TotalCharges` >= 3246.925” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}): {failed, metric=0.5568181818181818}\n", + "2024-05-29 11:43:16,349 pid:51250 MainThread giskard.core.suite INFO Overconfidence on data slice “`InternetService` == \"DSL\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}): {failed, metric=0.5053763440860215}\n", + "2024-05-29 11:43:16,349 pid:51250 MainThread giskard.core.suite INFO Overconfidence on data slice “`OnlineBackup` == \"Yes\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4486033519553073, 'p_threshold': 0.5}): {failed, metric=0.45588235294117646}\n", + "2024-05-29 11:43:16,349 pid:51250 MainThread giskard.core.suite INFO Underconfidence on data slice “`OnlineSecurity` == \"No\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 'p_threshold': 0.95}): {failed, metric=0.02413793103448276}\n", + "2024-05-29 11:43:16,350 pid:51250 MainThread giskard.core.suite INFO Underconfidence on data slice “`Contract` == \"Month-to-month\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 'p_threshold': 0.95}): {failed, metric=0.022448979591836733}\n", + "2024-05-29 11:43:16,350 pid:51250 MainThread giskard.core.suite INFO Underconfidence on data slice “`Dependents` == \"No\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.014391353811149032, 
'p_threshold': 0.95}): {failed, metric=0.016826923076923076}\n", + "2024-05-29 11:43:16,350 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`Contract` == \"One year\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.0}\n", + "2024-05-29 11:43:16,351 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`tenure` >= 44.500 AND `tenure` < 70.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.05970149253731343}\n", + "2024-05-29 11:43:16,351 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`InternetService` == \"No\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.07692307692307693}\n", + "2024-05-29 11:43:16,351 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`OnlineSecurity` == \"No internet service\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.07692307692307693}\n", + "2024-05-29 11:43:16,352 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`OnlineBackup` == \"No internet service\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.07692307692307693}\n", + "2024-05-29 11:43:16,352 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`DeviceProtection` == \"No internet service\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.07692307692307693}\n", + "2024-05-29 11:43:16,352 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`TechSupport` == \"No internet service\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.07692307692307693}\n", + "2024-05-29 11:43:16,353 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`StreamingTV` == \"No internet service\"” 
({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.07692307692307693}\n", + "2024-05-29 11:43:16,353 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`StreamingMovies` == \"No internet service\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.07692307692307693}\n", + "2024-05-29 11:43:16,353 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`MonthlyCharges` < 20.775” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.10344827586206896}\n", + "2024-05-29 11:43:16,353 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`TechSupport` == \"Yes\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.2054794520547945}\n", + "2024-05-29 11:43:16,353 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`OnlineSecurity` == \"Yes\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.2125}\n", + "2024-05-29 11:43:16,354 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`PaymentMethod` == \"Credit card\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.2777777777777778}\n", + "2024-05-29 11:43:16,354 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`InternetService` == \"DSL\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.3181818181818182}\n", + "2024-05-29 11:43:16,354 pid:51250 MainThread giskard.core.suite INFO Recall on data slice “`Dependents` == \"Yes\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.49131679389312977}): {failed, metric=0.32558139534883723}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Overconfidence on data slice “`TotalCharges` >= 3246.925”\n
\n \n Measured Metric = 0.55682\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `TotalCharges` >= 3246.925\n
\n \n
\n threshold\n 0.4486033519553073\n
\n \n
\n p_threshold\n 0.5\n
\n \n
\n
\n \n \n
\n Test Overconfidence on data slice “`InternetService` == "DSL"”\n
\n \n Measured Metric = 0.50538\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `InternetService` == "DSL"\n
\n \n
\n threshold\n 0.4486033519553073\n
\n \n
\n p_threshold\n 0.5\n
\n \n
\n
\n \n \n
\n Test Overconfidence on data slice “`OnlineBackup` == "Yes"”\n
\n \n Measured Metric = 0.45588\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `OnlineBackup` == "Yes"\n
\n \n
\n threshold\n 0.4486033519553073\n
\n \n
\n p_threshold\n 0.5\n
\n \n
\n
\n \n \n
\n Test Underconfidence on data slice “`OnlineSecurity` == "No"”\n
\n \n Measured Metric = 0.02414\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `OnlineSecurity` == "No"\n
\n \n
\n threshold\n 0.014391353811149032\n
\n \n
\n p_threshold\n 0.95\n
\n \n
\n \n \n \n
\n Test Underconfidence on data slice “`Contract` == "Month-to-month"”\n
\n \n Measured Metric = 0.02245\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `Contract` == "Month-to-month"\n
\n \n
\n threshold\n 0.014391353811149032\n
\n \n
\n p_threshold\n 0.95\n
\n \n
\n \n \n \n
\n Test Underconfidence on data slice “`Dependents` == "No"”\n
\n \n Measured Metric = 0.01683\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `Dependents` == "No"\n
\n \n
\n threshold\n 0.014391353811149032\n
\n \n
\n p_threshold\n 0.95\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`Contract` == "One year"”\n
\n \n Measured Metric = 0.0\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `Contract` == "One year"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`tenure` >= 44.500 AND `tenure` < 70.500”\n
\n \n Measured Metric = 0.0597\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `tenure` >= 44.500 AND `tenure` < 70.500\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`InternetService` == "No"”\n
\n \n Measured Metric = 0.07692\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `InternetService` == "No"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`OnlineSecurity` == "No internet service"”\n
\n \n Measured Metric = 0.07692\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `OnlineSecurity` == "No internet service"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`OnlineBackup` == "No internet service"”\n
\n \n Measured Metric = 0.07692\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `OnlineBackup` == "No internet service"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`DeviceProtection` == "No internet service"”\n
\n \n Measured Metric = 0.07692\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `DeviceProtection` == "No internet service"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`TechSupport` == "No internet service"”\n
\n \n Measured Metric = 0.07692\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `TechSupport` == "No internet service"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`StreamingTV` == "No internet service"”\n
\n \n Measured Metric = 0.07692\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `StreamingTV` == "No internet service"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`StreamingMovies` == "No internet service"”\n
\n \n Measured Metric = 0.07692\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `StreamingMovies` == "No internet service"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`MonthlyCharges` < 20.775”\n
\n \n Measured Metric = 0.10345\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `MonthlyCharges` < 20.775\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`TechSupport` == "Yes"”\n
\n \n Measured Metric = 0.20548\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `TechSupport` == "Yes"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`OnlineSecurity` == "Yes"”\n
\n \n Measured Metric = 0.2125\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `OnlineSecurity` == "Yes"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`PaymentMethod` == "Credit card"”\n
\n \n Measured Metric = 0.27778\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `PaymentMethod` == "Credit card"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`InternetService` == "DSL"”\n
\n \n Measured Metric = 0.31818\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `InternetService` == "DSL"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`Dependents` == "Yes"”\n
\n \n Measured Metric = 0.32558\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n dbe1aaf2-396d-4fc2-8c13-1128bb44cf24\n
\n \n
\n dataset\n Churn classification dataset\n
\n \n
\n slicing_function\n `Dependents` == "Yes"\n
\n \n
\n threshold\n 0.49131679389312977\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`TotalCharges` >= 3246.925”\n", + "
\n", + " \n", + " Measured Metric = 0.55682\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `TotalCharges` >= 3246.925\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.4486033519553073\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.5\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`InternetService` == "DSL"”\n", + "
\n", + " \n", + " Measured Metric = 0.50538\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `InternetService` == "DSL"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.4486033519553073\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.5\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`OnlineBackup` == "Yes"”\n", + "
\n", + " \n", + " Measured Metric = 0.45588\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `OnlineBackup` == "Yes"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.4486033519553073\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.5\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Underconfidence on data slice “`OnlineSecurity` == "No"”\n", + "
\n", + " \n", + " Measured Metric = 0.02414\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `OnlineSecurity` == "No"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.014391353811149032\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Underconfidence on data slice “`Contract` == "Month-to-month"”\n", + "
\n", + " \n", + " Measured Metric = 0.02245\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Contract` == "Month-to-month"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.014391353811149032\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Underconfidence on data slice “`Dependents` == "No"”\n", + "
\n", + " \n", + " Measured Metric = 0.01683\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Dependents` == "No"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.014391353811149032\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`Contract` == "One year"”\n", + "
\n", + " \n", + " Measured Metric = 0.0\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Contract` == "One year"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`tenure` >= 44.500 AND `tenure` < 70.500”\n", + "
\n", + " \n", + " Measured Metric = 0.0597\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `tenure` >= 44.500 AND `tenure` < 70.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`InternetService` == "No"”\n", + "
\n", + " \n", + " Measured Metric = 0.07692\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `InternetService` == "No"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`OnlineSecurity` == "No internet service"”\n", + "
\n", + " \n", + " Measured Metric = 0.07692\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `OnlineSecurity` == "No internet service"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`OnlineBackup` == "No internet service"”\n", + "
\n", + " \n", + " Measured Metric = 0.07692\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `OnlineBackup` == "No internet service"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`DeviceProtection` == "No internet service"”\n", + "
\n", + " \n", + " Measured Metric = 0.07692\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `DeviceProtection` == "No internet service"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`TechSupport` == "No internet service"”\n", + "
\n", + " \n", + " Measured Metric = 0.07692\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `TechSupport` == "No internet service"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`StreamingTV` == "No internet service"”\n", + "
\n", + " \n", + " Measured Metric = 0.07692\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `StreamingTV` == "No internet service"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`StreamingMovies` == "No internet service"”\n", + "
\n", + " \n", + " Measured Metric = 0.07692\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `StreamingMovies` == "No internet service"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`MonthlyCharges` < 20.775”\n", + "
\n", + " \n", + " Measured Metric = 0.10345\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `MonthlyCharges` < 20.775\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`TechSupport` == "Yes"”\n", + "
\n", + " \n", + " Measured Metric = 0.20548\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `TechSupport` == "Yes"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`OnlineSecurity` == "Yes"”\n", + "
\n", + " \n", + " Measured Metric = 0.2125\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `OnlineSecurity` == "Yes"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`PaymentMethod` == "Credit card"”\n", + "
\n", + " \n", + " Measured Metric = 0.27778\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `PaymentMethod` == "Credit card"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`InternetService` == "DSL"”\n", + "
\n", + " \n", + " Measured Metric = 0.31818\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `InternetService` == "DSL"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`Dependents` == "Yes"”\n", + "
\n", + " \n", + " Measured Metric = 0.32558\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Churn classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Churn classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Dependents` == "Yes"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.49131679389312977\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2024-05-29 11:43:16,426 pid:51250 Thread-36 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:43:16,454 pid:51250 Thread-37 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n", + "2024-05-29 11:43:16,485 pid:51250 Thread-38 (_track) urllib3.connectionpool WARNING Connection pool is full, discarding connection: api.mixpanel.com. Connection pool size: 10\n" + ] } ], "source": [ @@ -596,118 +4945,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to: \n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." 
- ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to: \n", - " \n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -737,7 +4974,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.15" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/credit_scoring.ipynb b/docs/reference/notebooks/credit_scoring.ipynb index 2d6bcb2ff4..db450b2446 100644 --- a/docs/reference/notebooks/credit_scoring.ipynb +++ b/docs/reference/notebooks/credit_scoring.ipynb @@ -21,12 +21,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -41,7 +36,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-08-21T12:06:20.563432Z", @@ -67,11 +62,11 @@ "cell_type": "code", "execution_count": 1, "metadata": { - "collapsed": true, "ExecuteTime": { "end_time": "2023-11-08T22:50:16.461436Z", "start_time": "2023-11-08T22:50:09.622133Z" - } + }, + "collapsed": true }, "outputs": [], "source": [ @@ -84,7 +79,7 @@ "from sklearn.pipeline import Pipeline\n", "from 
sklearn.preprocessing import OneHotEncoder, StandardScaler\n", "\n", - "from giskard import Model, Dataset, scan, testing, GiskardClient, Suite" + "from giskard import Model, Dataset, scan, testing" ] }, { @@ -100,11 +95,11 @@ "cell_type": "code", "execution_count": 2, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:50:22.598805Z", "start_time": "2023-11-08T22:50:22.551813Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -164,11 +159,11 @@ "cell_type": "code", "execution_count": 3, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:50:24.243291Z", "start_time": "2023-11-08T22:50:23.880752Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -188,11 +183,11 @@ "cell_type": "code", "execution_count": 4, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:50:24.772138Z", "start_time": "2023-11-08T22:50:24.740667Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -212,13 +207,13 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:50:25.702983Z", "start_time": "2023-11-08T22:50:25.668475Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -255,11 +250,11 @@ "cell_type": "code", "execution_count": 6, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:50:26.994317Z", "start_time": "2023-11-08T22:50:26.960393Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -343,12 +338,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -376,16 +371,2422 @@ "cell_type": "code", "execution_count": 10, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-08T22:51:12.440487Z", "start_time": 
"2023-11-08T22:51:12.179232Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -397,12 +2798,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Generate comprehensive test suites automatically for your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Generate comprehensive test suites automatically for your model" + ] }, { "cell_type": "markdown", @@ -418,52 +2819,500 @@ { "cell_type": "code", "execution_count": 11, + "metadata": { + "ExecuteTime": { + "end_time": "2023-11-08T22:51:26.616126Z", + "start_time": "2023-11-08T22:51:26.248461Z" + }, + "collapsed": false + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Precision on data slice “`other_installment_plans` == \"bank\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", + "2024-05-29 11:45:25,982 pid:52370 MainThread giskard.datasets.base INFO Casting dataframe columns from {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'} to {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 
'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'}\n", + "2024-05-29 11:45:25,986 pid:52370 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (32, 22) executed in 0:00:00.014548\n", + "Executed 'Precision on data slice “`other_installment_plans` == \"bank\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", " Test failed\n", " Metric: 0.6\n", " \n", " \n", - "Executed 'Precision on data slice “`account_check_status` == \"0 <= ... < 200 DM\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", + "2024-05-29 11:45:26,003 pid:52370 MainThread giskard.datasets.base INFO Casting dataframe columns from {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'} to {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 
'telephone': 'object', 'foreign_worker': 'object'}\n", + "2024-05-29 11:45:26,006 pid:52370 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (58, 22) executed in 0:00:00.011488\n", + "Executed 'Precision on data slice “`account_check_status` == \"0 <= ... < 200 DM\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", " Test failed\n", " Metric: 0.6\n", " \n", " \n", - "Executed 'Precision on data slice “`present_employment_since` == \"... < 1 year\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", + "2024-05-29 11:45:26,019 pid:52370 MainThread giskard.datasets.base INFO Casting dataframe columns from {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'} to {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'}\n", + "2024-05-29 11:45:26,021 pid:52370 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (37, 
22) executed in 0:00:00.008733\n", + "Executed 'Precision on data slice “`present_employment_since` == \"... < 1 year\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", " Test failed\n", " Metric: 0.65\n", " \n", " \n", - "Executed 'Recall on data slice “`personal_status` == \"divorced\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8617857142857143}: \n", + "2024-05-29 11:45:26,033 pid:52370 MainThread giskard.datasets.base INFO Casting dataframe columns from {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'} to {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'}\n", + "2024-05-29 11:45:26,036 pid:52370 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (68, 22) executed in 0:00:00.008518\n", + "Executed 'Recall on data slice “`personal_status` == \"divorced\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 
'threshold': 0.8617857142857143}: \n", " Test failed\n", " Metric: 0.8\n", " \n", " \n", - "Executed 'Precision on data slice “`duration_in_month` >= 16.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", + "2024-05-29 11:45:26,050 pid:52370 MainThread giskard.datasets.base INFO Casting dataframe columns from {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'} to {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'}\n", + "2024-05-29 11:45:26,054 pid:52370 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (112, 22) executed in 0:00:00.009998\n", + "Executed 'Precision on data slice “`duration_in_month` >= 16.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", " Test failed\n", " Metric: 0.71\n", " \n", " \n", - "Executed 'Precision on data slice “`property` == \"if not A121/A122 : car or other, not in attribute 6\"”' with 
arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", + "2024-05-29 11:45:26,072 pid:52370 MainThread giskard.datasets.base INFO Casting dataframe columns from {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'} to {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'}\n", + "2024-05-29 11:45:26,074 pid:52370 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (69, 22) executed in 0:00:00.012084\n", + "Executed 'Precision on data slice “`property` == \"if not A121/A122 : car or other, not in attribute 6\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", " Test failed\n", " Metric: 0.72\n", " \n", " \n", - "Executed 'Precision on data slice “`sex` == \"female\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", + "2024-05-29 11:45:26,087 pid:52370 MainThread giskard.datasets.base INFO Casting 
dataframe columns from {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'} to {'account_check_status': 'object', 'duration_in_month': 'int64', 'credit_history': 'object', 'purpose': 'object', 'credit_amount': 'int64', 'savings': 'object', 'present_employment_since': 'object', 'installment_as_income_perc': 'int64', 'sex': 'object', 'personal_status': 'object', 'other_debtors': 'object', 'present_residence_since': 'int64', 'property': 'object', 'age': 'int64', 'other_installment_plans': 'object', 'housing': 'object', 'credits_this_bank': 'int64', 'job': 'object', 'people_under_maintenance': 'int64', 'telephone': 'object', 'foreign_worker': 'object'}\n", + "2024-05-29 11:45:26,089 pid:52370 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (54, 22) executed in 0:00:00.009407\n", + "Executed 'Precision on data slice “`sex` == \"female\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}: \n", " Test failed\n", " Metric: 0.74\n", " \n", - " \n" + " \n", + "2024-05-29 11:45:26,093 pid:52370 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 11:45:26,093 pid:52370 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 11:45:26,094 pid:52370 MainThread giskard.core.suite INFO Precision on data slice “`other_installment_plans` == \"bank\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}): {failed, metric=0.6}\n", + 
"2024-05-29 11:45:26,094 pid:52370 MainThread giskard.core.suite INFO Precision on data slice “`account_check_status` == \"0 <= ... < 200 DM\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}): {failed, metric=0.6046511627906976}\n", + "2024-05-29 11:45:26,094 pid:52370 MainThread giskard.core.suite INFO Precision on data slice “`present_employment_since` == \"... < 1 year\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}): {failed, metric=0.6521739130434783}\n", + "2024-05-29 11:45:26,094 pid:52370 MainThread giskard.core.suite INFO Recall on data slice “`personal_status` == \"divorced\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8617857142857143}): {failed, metric=0.8048780487804879}\n", + "2024-05-29 11:45:26,095 pid:52370 MainThread giskard.core.suite INFO Precision on data slice “`duration_in_month` >= 16.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}): {failed, metric=0.7125}\n", + "2024-05-29 11:45:26,095 pid:52370 MainThread giskard.core.suite INFO Precision on data slice “`property` == \"if not A121/A122 : car or other, not in attribute 6\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}): {failed, metric=0.7192982456140351}\n", + "2024-05-29 11:45:26,095 pid:52370 MainThread giskard.core.suite INFO Precision on data slice “`sex` == \"female\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7540625}): {failed, metric=0.7368421052631579}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Precision on data slice “`other_installment_plans` == "bank"”\n
\n \n Measured Metric = 0.6\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n a4d8f299-351e-4722-8176-9b37fa35a9e8\n
\n \n
\n dataset\n German credit scoring dataset\n
\n \n
\n slicing_function\n `other_installment_plans` == "bank"\n
\n \n
\n threshold\n 0.7540625\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`account_check_status` == "0 <= ... < 200 DM"”\n
\n \n Measured Metric = 0.60465\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n a4d8f299-351e-4722-8176-9b37fa35a9e8\n
\n \n
\n dataset\n German credit scoring dataset\n
\n \n
\n slicing_function\n `account_check_status` == "0 <= ... < 200 DM"\n
\n \n
\n threshold\n 0.7540625\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`present_employment_since` == "... < 1 year"”\n
\n \n Measured Metric = 0.65217\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n a4d8f299-351e-4722-8176-9b37fa35a9e8\n
\n \n
\n dataset\n German credit scoring dataset\n
\n \n
\n slicing_function\n `present_employment_since` == "... < 1 year"\n
\n \n
\n threshold\n 0.7540625\n
\n \n
\n
\n \n \n
\n Test Recall on data slice “`personal_status` == "divorced"”\n
\n \n Measured Metric = 0.80488\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n a4d8f299-351e-4722-8176-9b37fa35a9e8\n
\n \n
\n dataset\n German credit scoring dataset\n
\n \n
\n slicing_function\n `personal_status` == "divorced"\n
\n \n
\n threshold\n 0.8617857142857143\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`duration_in_month` >= 16.500”\n
\n \n Measured Metric = 0.7125\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n a4d8f299-351e-4722-8176-9b37fa35a9e8\n
\n \n
\n dataset\n German credit scoring dataset\n
\n \n
\n slicing_function\n `duration_in_month` >= 16.500\n
\n \n
\n threshold\n 0.7540625\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`property` == "if not A121/A122 : car or other, not in attribute 6"”\n
\n \n Measured Metric = 0.7193\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n a4d8f299-351e-4722-8176-9b37fa35a9e8\n
\n \n
\n dataset\n German credit scoring dataset\n
\n \n
\n slicing_function\n `property` == "if not A121/A122 : car or other, not in attribute 6"\n
\n \n
\n threshold\n 0.7540625\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`sex` == "female"”\n
\n \n Measured Metric = 0.73684\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n a4d8f299-351e-4722-8176-9b37fa35a9e8\n
\n \n
\n dataset\n German credit scoring dataset\n
\n \n
\n slicing_function\n `sex` == "female"\n
\n \n
\n threshold\n 0.7540625\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`other_installment_plans` == "bank"”\n", + "
\n", + " \n", + " Measured Metric = 0.6\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Credit scoring classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " German credit scoring dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `other_installment_plans` == "bank"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7540625\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`account_check_status` == "0 <= ... < 200 DM"”\n", + "
\n", + " \n", + " Measured Metric = 0.60465\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Credit scoring classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " German credit scoring dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `account_check_status` == "0 <= ... < 200 DM"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7540625\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`present_employment_since` == "... < 1 year"”\n", + "
\n", + " \n", + " Measured Metric = 0.65217\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Credit scoring classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " German credit scoring dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `present_employment_since` == "... < 1 year"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7540625\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`personal_status` == "divorced"”\n", + "
\n", + " \n", + " Measured Metric = 0.80488\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Credit scoring classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " German credit scoring dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `personal_status` == "divorced"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.8617857142857143\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`duration_in_month` >= 16.500”\n", + "
\n", + " \n", + " Measured Metric = 0.7125\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Credit scoring classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " German credit scoring dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `duration_in_month` >= 16.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7540625\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`property` == "if not A121/A122 : car or other, not in attribute 6"”\n", + "
\n", + " \n", + " Measured Metric = 0.7193\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Credit scoring classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " German credit scoring dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `property` == "if not A121/A122 : car or other, not in attribute 6"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7540625\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`sex` == "female"”\n", + "
\n", + " \n", + " Measured Metric = 0.73684\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Credit scoring classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " German credit scoring dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `sex` == "female"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7540625\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 11, "metadata": {}, @@ -473,17 +3322,13 @@ "source": [ "test_suite = results.generate_test_suite(\"My first test suite\")\n", "test_suite.run()" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2023-11-08T22:51:26.616126Z", - "start_time": "2023-11-08T22:51:26.248461Z" - } - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Customize your suite by loading objects from the Giskard catalog\n", "\n", @@ -496,79 +3341,6 @@ "To create custom tests, refer to [this page](https://docs.giskard.ai/en/stable/open_source/customize_tests/test_model/index.html).\n", "\n", "For demo purposes, we will load a simple unit test (test_f1) that checks if the test F1 score is above the given threshold. For more examples of tests and functions, refer to the Giskard catalog." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. 
Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." 
] }, { @@ -579,49 +3351,7 @@ }, "outputs": [], "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to:\n", - " \n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" + "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] } ], @@ -634,14 +3364,14 @@ "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.6" + "pygments_lexer": "ipython3", + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/drug_classification_sklearn.ipynb b/docs/reference/notebooks/drug_classification_sklearn.ipynb index 1f4109015c..6c443f487b 100644 --- a/docs/reference/notebooks/drug_classification_sklearn.ipynb +++ b/docs/reference/notebooks/drug_classification_sklearn.ipynb @@ -22,12 +22,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -43,7 +38,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "id": "85a76ae027fad887", "metadata": { "ExecuteTime": { @@ -69,7 +64,7 @@ }, { "cell_type": 
"code", - "execution_count": null, + "execution_count": 1, "id": "5fdae2be34577a32", "metadata": { "collapsed": false @@ -88,7 +83,7 @@ "from sklearn.model_selection import train_test_split\n", "from imblearn.pipeline import Pipeline as PipelineImb\n", "\n", - "from giskard import Dataset, Model, scan, GiskardClient, testing, Suite" + "from giskard import Dataset, Model, scan, testing" ] }, { @@ -106,11 +101,11 @@ "execution_count": 2, "id": "d44430add2918aa1", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2024-02-09T09:29:15.513819Z", "start_time": "2024-02-09T09:29:15.470284Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -155,11 +150,11 @@ "execution_count": 3, "id": "5a2fbb53dd96b195", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2024-02-09T09:29:16.892145Z", "start_time": "2024-02-09T09:29:16.870124Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -204,9 +199,13 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "a1887adb", "metadata": { + "ExecuteTime": { + "end_time": "2024-02-09T09:29:17.524094Z", + "start_time": "2024-02-09T09:29:17.478053Z" + }, "execution": { "iopub.execute_input": "2022-05-04T02:53:16.032035Z", "iopub.status.busy": "2022-05-04T02:53:16.030941Z", @@ -222,21 +221,9 @@ "start_time": "2022-05-04T02:53:15.970178", "status": "completed" }, - "tags": [], - "ExecuteTime": { - "end_time": "2024-02-09T09:29:17.524094Z", - "start_time": "2024-02-09T09:29:17.478053Z" - } + "tags": [] }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Data was loaded!\n" - ] - } - ], + "outputs": [], "source": [ "df_drug = load_data()\n", "df_drug = bin_numerical(df_drug)" @@ -257,11 +244,11 @@ "execution_count": 5, "id": "7c32c64979960c7d", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2024-02-09T09:29:19.259540Z", "start_time": "2024-02-09T09:29:18.975922Z" - } + }, + "collapsed": false }, "outputs": [], 
"source": [ @@ -282,14 +269,14 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "3c6c6bea2652fe95", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2024-02-09T09:29:20.755641Z", "start_time": "2024-02-09T09:29:20.712569Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -322,13 +309,13 @@ }, { "cell_type": "markdown", - "source": [ - "### Build estimator" - ], + "id": "a753c44d4e184034", "metadata": { "collapsed": false }, - "id": "a753c44d4e184034" + "source": [ + "### Build estimator" + ] }, { "cell_type": "code", @@ -396,13 +383,13 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], + "id": "cba9f80f09e85818", "metadata": { "collapsed": false }, - "id": "cba9f80f09e85818" + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -430,19 +417,1127 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 10, "id": "82db2acb6c2ae6dd", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2024-02-09T10:02:51.276834Z", "start_time": "2024-02-09T10:02:50.767947Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -464,60 +1559,363 @@ }, { "cell_type": "markdown", + "id": "2047c63c4654ad4c", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." 
- ], - "metadata": { - "collapsed": false - }, - "id": "2047c63c4654ad4c" + ] }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 11, "id": "f44d26a78bda617e", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2024-02-09T10:11:54.015092Z", "start_time": "2024-02-09T10:11:53.880966Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Precision on data slice “`Age` == \"30s\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}: \n", + "2024-05-29 11:46:34,990 pid:52758 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Age': 'category', 'Sex': 'object', 'BP': 'object', 'Cholesterol': 'object', 'Na_to_K': 'category'} to {'Age': 'category', 'Sex': 'object', 'BP': 'object', 'Cholesterol': 'object', 'Na_to_K': 'category'}\n", + "2024-05-29 11:46:34,993 pid:52758 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (22, 6) executed in 0:00:00.009771\n", + "Executed 'Precision on data slice “`Age` == \"30s\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}: \n", " Test failed\n", " Metric: 0.68\n", " \n", " \n", - "Executed 'Precision on data slice “`BP` == \"NORMAL\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}: \n", + "2024-05-29 11:46:35,010 pid:52758 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Age': 'category', 'Sex': 'object', 'BP': 'object', 'Cholesterol': 'object', 'Na_to_K': 'category'} to {'Age': 'category', 'Sex': 'object', 'BP': 'object', 'Cholesterol': 'object', 'Na_to_K': 'category'}\n", + "2024-05-29 11:46:35,011 pid:52758 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (33, 6) executed in 0:00:00.007762\n", + "Executed 'Precision on data slice “`BP` == \"NORMAL\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}: \n", " Test failed\n", " 
Metric: 0.73\n", " \n", " \n", - "Executed 'Precision on data slice “`Na_to_K` == \"10-20\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}: \n", + "2024-05-29 11:46:35,023 pid:52758 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Age': 'category', 'Sex': 'object', 'BP': 'object', 'Cholesterol': 'object', 'Na_to_K': 'category'} to {'Age': 'category', 'Sex': 'object', 'BP': 'object', 'Cholesterol': 'object', 'Na_to_K': 'category'}\n", + "2024-05-29 11:46:35,025 pid:52758 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (59, 6) executed in 0:00:00.006482\n", + "Executed 'Precision on data slice “`Na_to_K` == \"10-20\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}: \n", " Test failed\n", " Metric: 0.73\n", " \n", " \n", - "Executed 'Precision on data slice “`Cholesterol` == \"HIGH\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}: \n", + "2024-05-29 11:46:35,034 pid:52758 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Age': 'category', 'Sex': 'object', 'BP': 'object', 'Cholesterol': 'object', 'Na_to_K': 'category'} to {'Age': 'category', 'Sex': 'object', 'BP': 'object', 'Cholesterol': 'object', 'Na_to_K': 'category'}\n", + "2024-05-29 11:46:35,036 pid:52758 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (53, 6) executed in 0:00:00.006638\n", + "Executed 'Precision on data slice “`Cholesterol` == \"HIGH\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}: \n", " Test failed\n", " Metric: 0.75\n", " \n", - " \n" + " \n", + "2024-05-29 11:46:35,039 pid:52758 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 11:46:35,040 pid:52758 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 11:46:35,040 pid:52758 MainThread giskard.core.suite INFO Precision on data slice “`Age` == 
\"30s\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}): {failed, metric=0.6818181818181818}\n", + "2024-05-29 11:46:35,040 pid:52758 MainThread giskard.core.suite INFO Precision on data slice “`BP` == \"NORMAL\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}): {failed, metric=0.7272727272727273}\n", + "2024-05-29 11:46:35,041 pid:52758 MainThread giskard.core.suite INFO Precision on data slice “`Na_to_K` == \"10-20\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}): {failed, metric=0.7288135593220338}\n", + "2024-05-29 11:46:35,041 pid:52758 MainThread giskard.core.suite INFO Precision on data slice “`Cholesterol` == \"HIGH\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.76}): {failed, metric=0.7547169811320755}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Precision on data slice “`Age` == "30s"”\n
\n \n Measured Metric = 0.68182\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n Drug classifier\n
\n \n
\n dataset\n Drug classification dataset\n
\n \n
\n slicing_function\n `Age` == "30s"\n
\n \n
\n threshold\n 0.76\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`BP` == "NORMAL"”\n
\n \n Measured Metric = 0.72727\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n Drug classifier\n
\n \n
\n dataset\n Drug classification dataset\n
\n \n
\n slicing_function\n `BP` == "NORMAL"\n
\n \n
\n threshold\n 0.76\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`Na_to_K` == "10-20"”\n
\n \n Measured Metric = 0.72881\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n Drug classifier\n
\n \n
\n dataset\n Drug classification dataset\n
\n \n
\n slicing_function\n `Na_to_K` == "10-20"\n
\n \n
\n threshold\n 0.76\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`Cholesterol` == "HIGH"”\n
\n \n Measured Metric = 0.75472\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n Drug classifier\n
\n \n
\n dataset\n Drug classification dataset\n
\n \n
\n slicing_function\n `Cholesterol` == "HIGH"\n
\n \n
\n threshold\n 0.76\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Age` == "30s"”\n", + "
\n", + " \n", + " Measured Metric = 0.68182\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Drug classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Drug classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Age` == "30s"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.76\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`BP` == "NORMAL"”\n", + "
\n", + " \n", + " Measured Metric = 0.72727\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Drug classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Drug classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `BP` == "NORMAL"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.76\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Na_to_K` == "10-20"”\n", + "
\n", + " \n", + " Measured Metric = 0.72881\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Drug classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Drug classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Na_to_K` == "10-20"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.76\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Cholesterol` == "HIGH"”\n", + "
\n", + " \n", + " Measured Metric = 0.75472\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Drug classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Drug classification dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Cholesterol` == "HIGH"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.76\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, - "execution_count": 66, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -558,126 +1956,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=wrapped_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - }, - "id": "850f69c9e106c2c7" - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." 
- ], - "metadata": { - "collapsed": false - }, - "id": "428e6ea983a1c8a5" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "d2987c5a0a27a36a" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "9f7b1a4e50f63aad" - }, - { - "cell_type": "markdown", - "id": "f2c270bda4037820", - "metadata": { - "collapsed": false - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "95186436fe201810", - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "id": "5e6a3c1e12f8cedd", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - 
"After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. This allows you to:\n", - " \n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - }, - "id": "e3d431fdd4fcc407" } ], "metadata": { @@ -696,7 +1974,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.10" + "version": "3.10.14" }, "papermill": { "default_parameters": {}, diff --git a/docs/reference/notebooks/enron_email_classification_sklearn.ipynb b/docs/reference/notebooks/enron_email_classification_sklearn.ipynb index 24647b3874..cc705937be 100644 --- a/docs/reference/notebooks/enron_email_classification_sklearn.ipynb +++ b/docs/reference/notebooks/enron_email_classification_sklearn.ipynb @@ -21,12 +21,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -62,7 +57,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 1, "id": "7d960163", "metadata": { "ExecuteTime": { @@ -90,22 +85,30 @@ "from sklearn.feature_extraction.text import CountVectorizer\n", "from 
sklearn.feature_extraction.text import TfidfTransformer\n", "\n", - "from giskard import Dataset, Model, scan, testing, Suite, GiskardClient" + "from giskard import Dataset, Model, scan, testing" ] }, { "cell_type": "markdown", - "source": [ - "## Define constants" - ], + "id": "80172d2eb14941f5", "metadata": { "collapsed": false }, - "id": "80172d2eb14941f5" + "source": [ + "## Define constants" + ] }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 2, + "id": "4e5025e1dfa0343f", + "metadata": { + "ExecuteTime": { + "end_time": "2023-11-09T11:24:20.678786Z", + "start_time": "2023-11-09T11:24:20.085220Z" + }, + "collapsed": false + }, "outputs": [], "source": [ "TEXT_COLUMN = \"Content\"\n", @@ -132,15 +135,7 @@ " 13: 'trip reports'}\n", "\n", "LABEL_CAT = 3" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2023-11-09T11:24:20.678786Z", - "start_time": "2023-11-09T11:24:20.085220Z" - } - }, - "id": "4e5025e1dfa0343f" + ] }, { "attachments": {}, @@ -251,31 +246,31 @@ }, { "cell_type": "markdown", - "source": [ - "### Train-test split" - ], + "id": "6d7d19a7ccedf125", "metadata": { "collapsed": false }, - "id": "6d7d19a7ccedf125" + "source": [ + "### Train-test split" + ] }, { "cell_type": "code", - "execution_count": 37, - "outputs": [], - "source": [ - "Y = data_filtered[TARGET_COLUMN]\n", - "X = data_filtered.drop(columns=[TARGET_COLUMN])[list(COLUMN_TYPES.keys())]\n", - "X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, random_state=RANDOM_STATE, stratify=Y)" - ], + "execution_count": 6, + "id": "1019a31f5c38dc95", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T11:24:32.909461Z", "start_time": "2023-11-09T11:24:32.904935Z" - } + }, + "collapsed": false }, - "id": "1019a31f5c38dc95" + "outputs": [], + "source": [ + "Y = data_filtered[TARGET_COLUMN]\n", + "X = data_filtered.drop(columns=[TARGET_COLUMN])[list(COLUMN_TYPES.keys())]\n", + "X_train, X_test, Y_train, 
Y_test = model_selection.train_test_split(X, Y, random_state=RANDOM_STATE, stratify=Y)" + ] }, { "attachments": {}, @@ -289,7 +284,7 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": null, "id": "b4adbbbd", "metadata": { "ExecuteTime": { @@ -385,13 +380,13 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], + "id": "6c5788eb97dd7e2b", "metadata": { "collapsed": false }, - "id": "6c5788eb97dd7e2b" + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -417,7 +412,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 11, "id": "ecb49fa5", "metadata": { "ExecuteTime": { @@ -428,7 +423,2709 @@ "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -440,13 +3137,13 @@ }, { "cell_type": "markdown", - "source": [ - "## Generate comprehensive test suites automatically for your model" - ], + "id": "5651210c188e81ca", "metadata": { "collapsed": false }, - "id": "5651210c188e81ca" + "source": [ + "## Generate comprehensive test suites automatically for your model" + ] }, { "cell_type": "markdown", @@ -460,7 +3157,7 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 12, "id": "bea736a9", "metadata": { "ExecuteTime": { @@ -473,94 +3170,1018 @@ "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + "2024-05-29 11:56:58,129 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:56:58,130 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (213, 2) executed in 0:00:00.018116\n", + "2024-05-29 11:57:01,545 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe 
columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:01,546 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (213, 2) executed in 0:00:00.022507\n", + "2024-05-29 11:57:01,552 pid:53166 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:03.448961\n", + "2024-05-29 11:57:01,554 pid:53166 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.000496\n", + "Executed 'Invariance to “Transform numbers to words”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", " Test failed\n", - " Metric: 0.95\n", - " - [TestMessageLevel.INFO] 213 rows were perturbed\n", + " Metric: 0.92\n", + " - [INFO] 192 rows were perturbed\n", " \n", - "Executed 'Precision on data slice “`Content` contains \"gives\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:01,567 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:01,568 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (213, 2) executed in 0:00:00.008830\n", + "2024-05-29 11:57:03,985 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,121 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (213, 2) executed in 0:00:00.155254\n", + "2024-05-29 11:57:04,123 pid:53166 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:02.564476\n", + "2024-05-29 11:57:04,124 pid:53166 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.000349\n", + "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 
'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + " Test failed\n", + " Metric: 0.93\n", + " - [INFO] 213 rows were perturbed\n", + " \n", + "2024-05-29 11:57:04,161 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,162 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (21, 2) executed in 0:00:00.017651\n", + "Executed 'Precision on data slice “`Content` contains \"gives\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.29\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"delay\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,199 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,199 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (20, 2) executed in 0:00:00.017908\n", + "Executed 'Precision on data slice “`Content` contains \"delay\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.3\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"sacramento\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,232 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,233 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.018628\n", + "Executed 'Precision on data slice “`Content` contains \"sacramento\"”' with arguments {'model': , 'dataset': , 
'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.3\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"dasovich\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,272 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,272 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.008717\n", + "Executed 'Precision on data slice “`Content` contains \"dasovich\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.3\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"pro\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,312 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,313 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (21, 2) executed in 0:00:00.016500\n", + "Executed 'Precision on data slice “`Content` contains \"pro\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.33\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"jeff\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,350 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,352 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (53, 2) executed in 0:00:00.017631\n", + "Executed 'Precision 
on data slice “`Content` contains \"jeff\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.34\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"alan\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,390 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,391 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (28, 2) executed in 0:00:00.013602\n", + "Executed 'Precision on data slice “`Content` contains \"alan\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.36\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"judge\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,426 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,427 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (21, 2) executed in 0:00:00.017925\n", + "Executed 'Precision on data slice “`Content` contains \"judge\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.38\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"blackouts\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,458 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,459 pid:53166 MainThread giskard.utils.logging_utils INFO 
Predicted dataset with shape (21, 2) executed in 0:00:00.019073\n", + "Executed 'Precision on data slice “`Content` contains \"blackouts\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.38\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"emergency\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,498 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,499 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.017635\n", + "Executed 'Precision on data slice “`Content` contains \"push\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.39\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"push\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,535 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,536 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.019982\n", + "Executed 'Precision on data slice “`Content` contains \"emergency\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.39\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"fair\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,568 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 
'object'}\n", + "2024-05-29 11:57:04,568 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (35, 2) executed in 0:00:00.019893\n", + "Executed 'Precision on data slice “`Content` contains \"governor\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.4\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"duke\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,609 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,609 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (25, 2) executed in 0:00:00.018383\n", + "Executed 'Precision on data slice “`Content` contains \"duke\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.4\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"governor\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,648 pid:53166 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,649 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (20, 2) executed in 0:00:00.018559\n", + "Executed 'Precision on data slice “`Content` contains \"fair\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.4\n", " \n", " \n", - "Executed 'Precision on data slice “`Content` contains \"friday\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", + "2024-05-29 11:57:04,686 pid:53166 MainThread 
giskard.datasets.base INFO Casting dataframe columns from {'Content': 'object'} to {'Content': 'object'}\n", + "2024-05-29 11:57:04,687 pid:53166 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (42, 2) executed in 0:00:00.020775\n", + "Executed 'Precision on data slice “`Content` contains \"friday\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}: \n", " Test failed\n", " Metric: 0.4\n", " \n", - " \n" + " \n", + "2024-05-29 11:57:04,689 pid:53166 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 11:57:04,689 pid:53166 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 11:57:04,690 pid:53166 MainThread giskard.core.suite INFO Invariance to “Transform numbers to words” ({'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.921875}\n", + "2024-05-29 11:57:04,690 pid:53166 MainThread giskard.core.suite INFO Invariance to “Add typos” ({'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.9342723004694836}\n", + "2024-05-29 11:57:04,690 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"gives\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.2857142857142857}\n", + "2024-05-29 11:57:04,690 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"delay\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.3}\n", + "2024-05-29 11:57:04,691 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"sacramento\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.30434782608695654}\n", + "2024-05-29 11:57:04,691 pid:53166 MainThread giskard.core.suite INFO 
Precision on data slice “`Content` contains \"dasovich\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.30434782608695654}\n", + "2024-05-29 11:57:04,691 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"pro\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.3333333333333333}\n", + "2024-05-29 11:57:04,691 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"jeff\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.33962264150943394}\n", + "2024-05-29 11:57:04,692 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"alan\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.35714285714285715}\n", + "2024-05-29 11:57:04,692 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"judge\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.38095238095238093}\n", + "2024-05-29 11:57:04,692 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"blackouts\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.38095238095238093}\n", + "2024-05-29 11:57:04,693 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"push\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.391304347826087}\n", + "2024-05-29 11:57:04,693 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"emergency\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.391304347826087}\n", + "2024-05-29 11:57:04,694 pid:53166 MainThread giskard.core.suite 
INFO Precision on data slice “`Content` contains \"governor\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.4}\n", + "2024-05-29 11:57:04,694 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"duke\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.4}\n", + "2024-05-29 11:57:04,695 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"fair\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.4}\n", + "2024-05-29 11:57:04,695 pid:53166 MainThread giskard.core.suite INFO Precision on data slice “`Content` contains \"friday\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.544131455399061}): {failed, metric=0.40476190476190477}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Invariance to “Add typos”\n
\n \n Measured Metric = 0.94836\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n transformation_function\n Add typos\n
\n \n
\n threshold\n 0.95\n
\n \n
\n output_sensitivity\n 0.05\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`Content` contains "gives"”\n
\n \n Measured Metric = 0.28571\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "gives"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`Content` contains "delay"”\n
\n \n Measured Metric = 0.3\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "delay"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`Content` contains "sacramento"”\n
\n \n Measured Metric = 0.30435\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "sacramento"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "dasovich"”\n
\n \n Measured Metric = 0.30435\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "dasovich"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "pro"”\n
\n \n Measured Metric = 0.33333\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "pro"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "jeff"”\n
\n \n Measured Metric = 0.33962\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "jeff"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "alan"”\n
\n \n Measured Metric = 0.35714\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "alan"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "judge"”\n
\n \n Measured Metric = 0.38095\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "judge"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "blackouts"”\n
\n \n Measured Metric = 0.38095\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "blackouts"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "emergency"”\n
\n \n Measured Metric = 0.3913\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "emergency"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "push"”\n
\n \n Measured Metric = 0.3913\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "push"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "fair"”\n
\n \n Measured Metric = 0.4\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "fair"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "duke"”\n
\n \n Measured Metric = 0.4\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "duke"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "governor"”\n
\n \n Measured Metric = 0.4\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "governor"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Content` contains "friday"”\n
\n \n Measured Metric = 0.40476\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 3ef6293a-8420-40ee-bb44-57602b025a6f\n
\n \n
\n dataset\n Email classifier\n
\n \n
\n slicing_function\n `Content` contains "friday"\n
\n \n
\n threshold\n 0.544131455399061\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Invariance to “Transform numbers to words”\n", + "
\n", + " \n", + " Measured Metric = 0.92188\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " transformation_function\n", + " Transform numbers to words\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " output_sensitivity\n", + " 0.05\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Invariance to “Add typos”\n", + "
\n", + " \n", + " Measured Metric = 0.93427\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " transformation_function\n", + " Add typos\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " output_sensitivity\n", + " 0.05\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "gives"”\n", + "
\n", + " \n", + " Measured Metric = 0.28571\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "gives"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "delay"”\n", + "
\n", + " \n", + " Measured Metric = 0.3\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "delay"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "sacramento"”\n", + "
\n", + " \n", + " Measured Metric = 0.30435\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "sacramento"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "dasovich"”\n", + "
\n", + " \n", + " Measured Metric = 0.30435\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "dasovich"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "pro"”\n", + "
\n", + " \n", + " Measured Metric = 0.33333\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "pro"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "jeff"”\n", + "
\n", + " \n", + " Measured Metric = 0.33962\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "jeff"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "alan"”\n", + "
\n", + " \n", + " Measured Metric = 0.35714\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "alan"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "judge"”\n", + "
\n", + " \n", + " Measured Metric = 0.38095\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "judge"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "blackouts"”\n", + "
\n", + " \n", + " Measured Metric = 0.38095\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "blackouts"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "push"”\n", + "
\n", + " \n", + " Measured Metric = 0.3913\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "push"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "emergency"”\n", + "
\n", + " \n", + " Measured Metric = 0.3913\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "emergency"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "governor"”\n", + "
\n", + " \n", + " Measured Metric = 0.4\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "governor"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "duke"”\n", + "
\n", + " \n", + " Measured Metric = 0.4\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "duke"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "fair"”\n", + "
\n", + " \n", + " Measured Metric = 0.4\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "fair"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Content` contains "friday"”\n", + "
\n", + " \n", + " Measured Metric = 0.40476\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Email category classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Emails of different categories\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Content` contains "friday"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.544131455399061\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, - "execution_count": 45, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -591,132 +4212,14 @@ { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [ - "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" - ], - "metadata": { - "collapsed": false - }, - "id": "13c049a762dcc4f" - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - }, - "id": "42bef6e343b97544" - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." 
- ], + "id": "13c049a762dcc4f", "metadata": { "collapsed": false }, - "id": "36f7ffad0a45f60e" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "8f25937d7f7f138e" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "cfe814d192e70ca1" - }, - { - "cell_type": "markdown", - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ], - "metadata": { - "collapsed": false - }, - "id": "f4e667202aed8fbd" - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8efd6bf3", - "metadata": {}, "outputs": [], "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "id": "7f594b5a762b09", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your 
test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. This allows you to: \n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" + "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - }, - "id": "c84fad7b733e0553" } ], "metadata": { @@ -735,7 +4238,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/fake_real_news_classification.ipynb b/docs/reference/notebooks/fake_real_news_classification.ipynb index 40e9fa9766..cb74b26e6b 100644 --- a/docs/reference/notebooks/fake_real_news_classification.ipynb +++ b/docs/reference/notebooks/fake_real_news_classification.ipynb @@ -19,12 +19,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -61,7 +56,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": { "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0", "_uuid": 
"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a", @@ -86,32 +81,32 @@ "from sklearn.model_selection import train_test_split\n", "from typing import Tuple, Callable\n", "\n", - "from giskard import Dataset, Model, scan, testing, GiskardClient, Suite" + "from giskard import Dataset, Model, scan, testing" ] }, { "cell_type": "markdown", - "source": [ - "## Notebook-level settings" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Notebook-level settings" + ] }, { "cell_type": "code", "execution_count": 2, - "outputs": [], - "source": [ - "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'" - ], "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:28:18.418915Z", "start_time": "2023-11-09T12:28:18.408860Z" - } - } + }, + "collapsed": false + }, + "outputs": [], + "source": [ + "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'" + ] }, { "cell_type": "markdown", @@ -126,11 +121,11 @@ "cell_type": "code", "execution_count": 3, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:28:21.279141Z", "start_time": "2023-11-09T12:28:21.238638Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -226,11 +221,11 @@ "cell_type": "code", "execution_count": 5, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:28:24.341850Z", "start_time": "2023-11-09T12:28:24.290063Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -257,13 +252,13 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:28:25.530437Z", "start_time": "2023-11-09T12:28:25.487051Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -298,11 +293,11 @@ "cell_type": "code", "execution_count": 7, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:28:32.029185Z", "start_time": "2023-11-09T12:28:27.412721Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -357,12 
+352,12 @@ "cell_type": "code", "execution_count": 8, "metadata": { - "_kg_hide-output": true, - "trusted": true, "ExecuteTime": { "end_time": "2023-11-09T12:29:05.258459Z", "start_time": "2023-11-09T12:28:32.775183Z" - } + }, + "_kg_hide-output": true, + "trusted": true }, "outputs": [], "source": [ @@ -509,14 +504,14 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Scan your model for vulnerabilities with Giskard\n", "\n", "Giskard's scan allows you to detect vulnerabilities in your model automatically. These include performance biases, robustness issues, data leakage, stochasticity, underconfidence, ethical issues, and more. For detailed information about the scan feature, please refer to our [scan documentation](https://docs.giskard.ai/en/stable/open_source/scan/scan_nlp/index.html)." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", @@ -533,16 +528,1327 @@ "cell_type": "code", "execution_count": 12, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:41:37.561048Z", "start_time": "2023-11-09T12:41:37.297675Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -563,106 +1869,461 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrates all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." 
- ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": 13, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:42:37.991107Z", "start_time": "2023-11-09T12:42:36.923834Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Recall on data slice “`text` contains \"october\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9306910569105691}: \n", - " Test failed\n", - " Metric: 0.87\n", - " \n", - " \n", - "Executed 'Accuracy on data slice “`avg_whitespace(title)` >= 0.147 AND `avg_whitespace(title)` < 0.152”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9386}: \n", + "2024-05-29 12:04:25,717 pid:57157 MainThread giskard.datasets.base INFO Casting dataframe columns from {'title': 'object', 'text': 'object'} to {'title': 'object', 'text': 'object'}\n", + "2024-05-29 12:04:25,720 pid:57157 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 3) executed in 0:00:00.013458\n", + "Executed 'Recall on data slice “`text_length(text)` >= 2570.500 AND `text_length(text)` < 2729.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}: \n", " Test failed\n", " Metric: 0.91\n", " \n", " \n", - "Executed 'Recall on data slice “`text_length(title)` >= 92.500 AND `text_length(title)` < 97.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9306910569105691}: \n", - " Test failed\n", - " Metric: 0.9\n", - " \n", - " \n", - "Executed 'Recall on data slice “`text` contains \"decision\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9306910569105691}: \n", + "2024-05-29 12:04:25,756 pid:57157 MainThread giskard.datasets.base INFO Casting dataframe columns from {'title': 'object', 'text': 'object'} to {'title': 'object', 'text': 'object'}\n", + "2024-05-29 12:04:25,757 
pid:57157 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (42, 3) executed in 0:00:00.006029\n", + "Executed 'Recall on data slice “`text` contains \"texas\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}: \n", " Test failed\n", " Metric: 0.91\n", " \n", " \n", - "Executed 'Recall on data slice “`text` contains \"texas\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9306910569105691}: \n", + "2024-05-29 12:04:25,781 pid:57157 MainThread giskard.datasets.base INFO Casting dataframe columns from {'title': 'object', 'text': 'object'} to {'title': 'object', 'text': 'object'}\n", + "2024-05-29 12:04:25,782 pid:57157 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (38, 3) executed in 0:00:00.005618\n", + "Executed 'Recall on data slice “`text` contains \"october\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}: \n", " Test failed\n", " Metric: 0.91\n", " \n", " \n", - "Executed 'Recall on data slice “`text` contains \"life\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9306910569105691}: \n", - " Test failed\n", - " Metric: 0.92\n", - " \n", - " \n", - "Executed 'Accuracy on data slice “`text` contains \"guilty\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9386}: \n", - " Test failed\n", - " Metric: 0.93\n", - " \n", - " \n", - "Executed 'Recall on data slice “`text` contains \"september\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9306910569105691}: \n", - " Test failed\n", - " Metric: 0.92\n", - " \n", - " \n", - "Executed 'Accuracy on data slice “`avg_word_length(text)` < 4.884 AND `avg_word_length(text)` >= 4.835”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9386}: \n", - " Test failed\n", - " Metric: 0.93\n", - " \n", - " \n", - "Executed 'Accuracy on 
data slice “`avg_digits(text)` >= 0.007 AND `avg_digits(text)` < 0.008”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9386}: \n", + "2024-05-29 12:04:25,808 pid:57157 MainThread giskard.datasets.base INFO Casting dataframe columns from {'title': 'object', 'text': 'object'} to {'title': 'object', 'text': 'object'}\n", + "2024-05-29 12:04:25,808 pid:57157 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (27, 3) executed in 0:00:00.005583\n", + "Executed 'Accuracy on data slice “`text` contains \"guilty\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9424}: \n", " Test failed\n", " Metric: 0.93\n", " \n", " \n", - "Executed 'Accuracy on data slice “`text_length(text)` >= 2446.500 AND `text_length(text)` < 2572.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9386}: \n", + "2024-05-29 12:04:25,832 pid:57157 MainThread giskard.datasets.base INFO Casting dataframe columns from {'title': 'object', 'text': 'object'} to {'title': 'object', 'text': 'object'}\n", + "2024-05-29 12:04:25,833 pid:57157 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (61, 3) executed in 0:00:00.006272\n", + "Executed 'Recall on data slice “`text` contains \"investigation\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}: \n", " Test failed\n", " Metric: 0.93\n", " \n", " \n", - "Executed 'Accuracy on data slice “`title` contains \"video\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9386}: \n", - " Test failed\n", - " Metric: 0.94\n", - " \n", - " \n", - "Executed 'Recall on data slice “`avg_digits(text)` >= 0.003 AND `avg_digits(text)` < 0.004”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9306910569105691}: \n", + "2024-05-29 12:04:25,844 pid:57157 MainThread giskard.datasets.base INFO Casting dataframe columns from {'title': 'object', 
'text': 'object'} to {'title': 'object', 'text': 'object'}\n", + "2024-05-29 12:04:25,845 pid:57157 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (33, 3) executed in 0:00:00.006265\n", + "Executed 'Recall on data slice “`text_length(title)` >= 92.500 AND `text_length(title)` < 97.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}: \n", " Test failed\n", " Metric: 0.93\n", " \n", " \n", - "Executed 'Accuracy on data slice “`text` contains \"elections\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9386}: \n", - " Test failed\n", - " Metric: 0.94\n", - " \n", - " \n" + "2024-05-29 12:04:25,847 pid:57157 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 12:04:25,847 pid:57157 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 12:04:25,848 pid:57157 MainThread giskard.core.suite INFO Recall on data slice “`text_length(text)` >= 2570.500 AND `text_length(text)` < 2729.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}): {failed, metric=0.9090909090909091}\n", + "2024-05-29 12:04:25,848 pid:57157 MainThread giskard.core.suite INFO Recall on data slice “`text` contains \"texas\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}): {failed, metric=0.9090909090909091}\n", + "2024-05-29 12:04:25,849 pid:57157 MainThread giskard.core.suite INFO Recall on data slice “`text` contains \"october\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}): {failed, metric=0.9130434782608695}\n", + "2024-05-29 12:04:25,849 pid:57157 MainThread giskard.core.suite INFO Accuracy on data slice “`text` contains \"guilty\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9424}): {failed, metric=0.9259259259259259}\n", + "2024-05-29 12:04:25,850 pid:57157 MainThread giskard.core.suite INFO Recall on data slice 
“`text` contains \"investigation\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}): {failed, metric=0.9333333333333333}\n", + "2024-05-29 12:04:25,850 pid:57157 MainThread giskard.core.suite INFO Recall on data slice “`text_length(title)` >= 92.500 AND `text_length(title)` < 97.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.9384146341463414}): {failed, metric=0.9333333333333333}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Recall on data slice “`text` contains "october"”\n
\n \n Measured Metric = 0.86957\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text` contains "october"\n
\n \n
\n threshold\n 0.9306910569105691\n
\n \n
\n
\n \n \n
\n Test Accuracy on data slice “`avg_whitespace(title)` >= 0.147 AND `avg_whitespace(title)` < 0.152”\n
\n \n Measured Metric = 0.90625\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `avg_whitespace(title)` >= 0.147 AND `avg_whitespace(title)` < 0.152\n
\n \n
\n threshold\n 0.9386\n
\n \n
\n
\n \n \n
\n Test Recall on data slice “`text_length(title)` >= 92.500 AND `text_length(title)` < 97.500”\n
\n \n Measured Metric = 0.9\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text_length(title)` >= 92.500 AND `text_length(title)` < 97.500\n
\n \n
\n threshold\n 0.9306910569105691\n
\n \n
\n
\n \n \n
\n Test Recall on data slice “`text` contains "decision"”\n
\n \n Measured Metric = 0.90909\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text` contains "decision"\n
\n \n
\n threshold\n 0.9306910569105691\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`text` contains "texas"”\n
\n \n Measured Metric = 0.90909\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text` contains "texas"\n
\n \n
\n threshold\n 0.9306910569105691\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`text` contains "life"”\n
\n \n Measured Metric = 0.91667\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text` contains "life"\n
\n \n
\n threshold\n 0.9306910569105691\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`text` contains "guilty"”\n
\n \n Measured Metric = 0.92593\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text` contains "guilty"\n
\n \n
\n threshold\n 0.9386\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`text` contains "september"”\n
\n \n Measured Metric = 0.92\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text` contains "september"\n
\n \n
\n threshold\n 0.9306910569105691\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`avg_word_length(text)` < 4.884 AND `avg_word_length(text)` >= 4.835”\n
\n \n Measured Metric = 0.93333\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `avg_word_length(text)` < 4.884 AND `avg_word_length(text)` >= 4.835\n
\n \n
\n threshold\n 0.9386\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`avg_digits(text)` >= 0.007 AND `avg_digits(text)` < 0.008”\n
\n \n Measured Metric = 0.93333\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `avg_digits(text)` >= 0.007 AND `avg_digits(text)` < 0.008\n
\n \n
\n threshold\n 0.9386\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`text_length(text)` >= 2446.500 AND `text_length(text)` < 2572.500”\n
\n \n Measured Metric = 0.93333\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text_length(text)` >= 2446.500 AND `text_length(text)` < 2572.500\n
\n \n
\n threshold\n 0.9386\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`title` contains "video"”\n
\n \n Measured Metric = 0.93548\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `title` contains "video"\n
\n \n
\n threshold\n 0.9386\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`avg_digits(text)` >= 0.003 AND `avg_digits(text)` < 0.004”\n
\n \n Measured Metric = 0.92857\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `avg_digits(text)` >= 0.003 AND `avg_digits(text)` < 0.004\n
\n \n
\n threshold\n 0.9306910569105691\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`text` contains "elections"”\n
\n \n Measured Metric = 0.9375\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5135f9ad-6d07-4db6-af80-a84dd1965396\n
\n \n
\n dataset\n fake_and_real_news\n
\n \n
\n slicing_function\n `text` contains "elections"\n
\n \n
\n threshold\n 0.9386\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`text_length(text)` >= 2570.500 AND `text_length(text)` < 2729.500”\n", + "
\n", + " \n", + " Measured Metric = 0.90909\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " fake_real_news_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fake_and_real_news\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `text_length(text)` >= 2570.500 AND `text_length(text)` < 2729.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.9384146341463414\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`text` contains "texas"”\n", + "
\n", + " \n", + " Measured Metric = 0.90909\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " fake_real_news_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fake_and_real_news\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `text` contains "texas"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.9384146341463414\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`text` contains "october"”\n", + "
\n", + " \n", + " Measured Metric = 0.91304\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " fake_real_news_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fake_and_real_news\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `text` contains "october"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.9384146341463414\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`text` contains "guilty"”\n", + "
\n", + " \n", + " Measured Metric = 0.92593\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " fake_real_news_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fake_and_real_news\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `text` contains "guilty"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.9424\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`text` contains "investigation"”\n", + "
\n", + " \n", + " Measured Metric = 0.93333\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " fake_real_news_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fake_and_real_news\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `text` contains "investigation"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.9384146341463414\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`text_length(title)` >= 92.500 AND `text_length(title)` < 97.500”\n", + "
\n", + " \n", + " Measured Metric = 0.93333\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " fake_real_news_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fake_and_real_news\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `text_length(title)` >= 92.500 AND `text_length(title)` < 97.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.9384146341463414\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 13, "metadata": {}, @@ -703,118 +2364,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces.\n" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. 
Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to: \n", - " \n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -833,7 +2382,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/hotel_text_regression.ipynb b/docs/reference/notebooks/hotel_text_regression.ipynb index 4471b6e6ab..74d4538f7a 100644 --- a/docs/reference/notebooks/hotel_text_regression.ipynb +++ b/docs/reference/notebooks/hotel_text_regression.ipynb @@ -21,12 +21,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -63,11 +58,11 @@ "cell_type": "code", "execution_count": 1, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:11:53.867789Z", "start_time": "2023-11-09T12:11:49.171837Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -83,7 +78,7 @@ "from sklearn.preprocessing import FunctionTransformer\n", "from typing import Iterable\n", "\n", - "from giskard import Model, Dataset, scan, testing, GiskardClient, Suite" + "from giskard import Model, 
Dataset, scan, testing" ] }, { @@ -99,11 +94,11 @@ "cell_type": "code", "execution_count": 2, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:12:05.303464Z", "start_time": "2023-11-09T12:12:05.254149Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -139,11 +134,11 @@ "cell_type": "code", "execution_count": 3, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:12:21.189537Z", "start_time": "2023-11-09T12:12:21.170978Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -194,12 +189,12 @@ "cell_type": "code", "execution_count": 5, "metadata": { - "_cell_guid": "1fc3041b-4143-4913-be91-522a80491717", - "_uuid": "6edbd3a2e85aced1897d44dbabf74ebfecf10110", "ExecuteTime": { "end_time": "2023-11-09T12:12:40.137753Z", "start_time": "2023-11-09T12:12:40.084154Z" - } + }, + "_cell_guid": "1fc3041b-4143-4913-be91-522a80491717", + "_uuid": "6edbd3a2e85aced1897d44dbabf74ebfecf10110" }, "outputs": [], "source": [ @@ -219,13 +214,13 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:12:51.323768Z", "start_time": "2023-11-09T12:12:51.270599Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -260,11 +255,11 @@ "cell_type": "code", "execution_count": 7, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:12:53.290346Z", "start_time": "2023-11-09T12:12:53.254228Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -336,7 +331,7 @@ }, "outputs": [], "source": [ - "# Wrap the prediction method to allow saving the whole pipeline to the Hub\n", + "# Wrap the prediction method\n", "def prediction_function(df):\n", " return pipeline.predict(df)\n", "\n", @@ -357,12 +352,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Detect 
vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -390,16 +385,1773 @@ "cell_type": "code", "execution_count": 11, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:13:26.545786Z", "start_time": "2023-11-09T12:13:26.215236Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -420,81 +2172,623 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": 12, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:13:35.489958Z", "start_time": "2023-11-09T12:13:29.988997Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + "2024-05-29 17:25:08,294 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,295 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (250, 2) executed in 0:00:00.003143\n", + "2024-05-29 17:25:08,317 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,322 pid:11998 MainThread 
giskard.utils.logging_utils INFO Predicted dataset with shape (250, 2) executed in 0:00:00.008373\n", + "2024-05-29 17:25:08,325 pid:11998 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:00.781405\n", + "2024-05-29 17:25:08,326 pid:11998 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.001536\n", + "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", " Test failed\n", - " Metric: 0.87\n", - " - [TestMessageLevel.INFO] 239 rows were perturbed\n", + " Metric: 0.91\n", + " - [INFO] 241 rows were perturbed\n", " \n", - "Executed 'MSE on data slice “`Full_Review` contains \"building\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", + "2024-05-29 17:25:08,335 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,337 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (25, 2) executed in 0:00:00.004939\n", + "Executed 'MSE on data slice “`Full_Review` contains \"building\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", " Test failed\n", " Metric: 3.6\n", " \n", " \n", - "Executed 'MSE on data slice “`Full_Review` contains \"stay\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", + "2024-05-29 17:25:08,345 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,347 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (25, 2) executed in 0:00:00.004945\n", + "Executed 'MSE on data slice “`Full_Review` contains \"stay\"”' with arguments {'model': , 'dataset': 
, 'slicing_function': , 'threshold': 2.3372302706019896}: \n", " Test failed\n", " Metric: 3.4\n", " \n", " \n", - "Executed 'MSE on data slice “`Full_Review` contains \"bed\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", + "2024-05-29 17:25:08,354 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,356 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (46, 2) executed in 0:00:00.003780\n", + "Executed 'MSE on data slice “`Full_Review` contains \"bed\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", " Test failed\n", " Metric: 2.96\n", " \n", " \n", - "Executed 'MSE on data slice “`Full_Review` contains \"comfy\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", + "2024-05-29 17:25:08,364 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,365 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.005052\n", + "Executed 'MSE on data slice “`Full_Review` contains \"comfy\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", " Test failed\n", " Metric: 2.72\n", " \n", " \n", - "Executed 'MSE on data slice “`Full_Review` contains \"area\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", + "2024-05-29 17:25:08,374 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,375 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (24, 2) executed in 0:00:00.004607\n", + 
"Executed 'MSE on data slice “`Full_Review` contains \"area\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", " Test failed\n", " Metric: 2.63\n", " \n", " \n", - "Executed 'MSE on data slice “`Full_Review` contains \"food\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", + "2024-05-29 17:25:08,385 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,386 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.005924\n", + "Executed 'MSE on data slice “`Full_Review` contains \"food\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", " Test failed\n", " Metric: 2.57\n", " \n", " \n", - "Executed 'MSE on data slice “`Full_Review` contains \"hotel\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", + "2024-05-29 17:25:08,394 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,396 pid:11998 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (80, 2) executed in 0:00:00.005724\n", + "Executed 'MSE on data slice “`Full_Review` contains \"hotel\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", " Test failed\n", " Metric: 2.52\n", " \n", " \n", - "Executed 'MSE on data slice “`Full_Review` contains \"room\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", + "2024-05-29 17:25:08,406 pid:11998 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Full_Review': 'object'} to {'Full_Review': 'object'}\n", + "2024-05-29 17:25:08,406 pid:11998 MainThread 
giskard.utils.logging_utils INFO Predicted dataset with shape (124, 2) executed in 0:00:00.003998\n", + "Executed 'MSE on data slice “`Full_Review` contains \"room\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}: \n", " Test failed\n", " Metric: 2.4\n", " \n", - " \n" + " \n", + "2024-05-29 17:25:08,408 pid:11998 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO Invariance to “Add typos” ({'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.9087136929460581}\n", + "2024-05-29 17:25:08,409 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains \"building\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}): {failed, metric=3.600595896699133}\n", + "2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains \"stay\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}): {failed, metric=3.3984339534286825}\n", + "2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains \"bed\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}): {failed, metric=2.9552583451970587}\n", + "2024-05-29 17:25:08,410 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains \"comfy\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}): {failed, metric=2.7172616774396765}\n", + "2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains \"area\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}): {failed, 
metric=2.6276261964130216}\n", + "2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains \"food\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}): {failed, metric=2.57312044778542}\n", + "2024-05-29 17:25:08,411 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains \"hotel\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}): {failed, metric=2.515134565551532}\n", + "2024-05-29 17:25:08,412 pid:11998 MainThread giskard.core.suite INFO MSE on data slice “`Full_Review` contains \"room\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 2.3372302706019896}): {failed, metric=2.4023965763964417}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Invariance to “Add typos”\n
\n \n Measured Metric = 0.87448\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n transformation_function\n Add typos\n
\n \n
\n threshold\n 0.95\n
\n \n
\n output_sensitivity\n 0.05\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`Full_Review` contains "building"”\n
\n \n Measured Metric = 3.6006\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n slicing_function\n `Full_Review` contains "building"\n
\n \n
\n threshold\n 2.3372302706019896\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`Full_Review` contains "stay"”\n
\n \n Measured Metric = 3.39843\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n slicing_function\n `Full_Review` contains "stay"\n
\n \n
\n threshold\n 2.3372302706019896\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`Full_Review` contains "bed"”\n
\n \n Measured Metric = 2.95526\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n slicing_function\n `Full_Review` contains "bed"\n
\n \n
\n threshold\n 2.3372302706019896\n
\n \n
\n \n \n \n
\n Test MSE on data slice “`Full_Review` contains "comfy"”\n
\n \n Measured Metric = 2.71726\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n slicing_function\n `Full_Review` contains "comfy"\n
\n \n
\n threshold\n 2.3372302706019896\n
\n \n
\n \n \n \n
\n Test MSE on data slice “`Full_Review` contains "area"”\n
\n \n Measured Metric = 2.62763\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n slicing_function\n `Full_Review` contains "area"\n
\n \n
\n threshold\n 2.3372302706019896\n
\n \n
\n \n \n \n
\n Test MSE on data slice “`Full_Review` contains "food"”\n
\n \n Measured Metric = 2.57312\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n slicing_function\n `Full_Review` contains "food"\n
\n \n
\n threshold\n 2.3372302706019896\n
\n \n
\n \n \n \n
\n Test MSE on data slice “`Full_Review` contains "hotel"”\n
\n \n Measured Metric = 2.51513\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n slicing_function\n `Full_Review` contains "hotel"\n
\n \n
\n threshold\n 2.3372302706019896\n
\n \n
\n \n \n \n
\n Test MSE on data slice “`Full_Review` contains "room"”\n
\n \n Measured Metric = 2.4024\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 5fb285f3-16dc-4a3f-8be6-0b495d9a5e41\n
\n \n
\n dataset\n hotel_text_regression_dataset\n
\n \n
\n slicing_function\n `Full_Review` contains "room"\n
\n \n
\n threshold\n 2.3372302706019896\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Invariance to “Add typos”\n", + "
\n", + " \n", + " Measured Metric = 0.90871\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " transformation_function\n", + " Add typos\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " output_sensitivity\n", + " 0.05\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`Full_Review` contains "building"”\n", + "
\n", + " \n", + " Measured Metric = 3.6006\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Full_Review` contains "building"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 2.3372302706019896\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`Full_Review` contains "stay"”\n", + "
\n", + " \n", + " Measured Metric = 3.39843\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Full_Review` contains "stay"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 2.3372302706019896\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`Full_Review` contains "bed"”\n", + "
\n", + " \n", + " Measured Metric = 2.95526\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Full_Review` contains "bed"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 2.3372302706019896\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`Full_Review` contains "comfy"”\n", + "
\n", + " \n", + " Measured Metric = 2.71726\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Full_Review` contains "comfy"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 2.3372302706019896\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`Full_Review` contains "area"”\n", + "
\n", + " \n", + " Measured Metric = 2.62763\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Full_Review` contains "area"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 2.3372302706019896\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`Full_Review` contains "food"”\n", + "
\n", + " \n", + " Measured Metric = 2.57312\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Full_Review` contains "food"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 2.3372302706019896\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`Full_Review` contains "hotel"”\n", + "
\n", + " \n", + " Measured Metric = 2.51513\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Full_Review` contains "hotel"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 2.3372302706019896\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`Full_Review` contains "room"”\n", + "
\n", + " \n", + " Measured Metric = 2.4024\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " hotel_text_regression\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " hotel_text_regression_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Full_Review` contains "room"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 2.3372302706019896\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 12, "metadata": {}, @@ -535,118 +2829,6 @@ "source": [ "test_suite.add_test(testing.test_r2(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. 
Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to: \n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -665,7 +2847,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.1" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/ieee_fraud_detection_adversarial_validation.ipynb b/docs/reference/notebooks/ieee_fraud_detection_adversarial_validation.ipynb index d8352fd5ee..3e98100999 100644 --- a/docs/reference/notebooks/ieee_fraud_detection_adversarial_validation.ipynb +++ b/docs/reference/notebooks/ieee_fraud_detection_adversarial_validation.ipynb @@ -22,12 +22,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -75,11 +70,11 @@ "cell_type": "code", "execution_count": 1, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:17:43.454764Z", "start_time": "2023-11-09T12:17:37.151513Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -94,7 +89,7 @@ "from sklearn.metrics import roc_auc_score\n", "from sklearn.model_selection import train_test_split\n", "\n", - "from 
giskard import GiskardClient, scan, testing, Dataset, Model, Suite" + "from giskard import Dataset, Model, scan, testing" ] }, { @@ -110,11 +105,11 @@ "cell_type": "code", "execution_count": 2, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:17:44.751420Z", "start_time": "2023-11-09T12:17:44.719440Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -149,11 +144,11 @@ "cell_type": "code", "execution_count": 3, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:17:45.925766Z", "start_time": "2023-11-09T12:17:45.904823Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -256,11 +251,11 @@ "cell_type": "code", "execution_count": 4, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:17:46.316557Z", "start_time": "2023-11-09T12:17:46.290804Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -320,11 +315,11 @@ "cell_type": "code", "execution_count": 6, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:18:39.711301Z", "start_time": "2023-11-09T12:18:39.634839Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -344,13 +339,13 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:18:40.628414Z", "start_time": "2023-11-09T12:18:40.394073Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -431,11 +426,11 @@ "cell_type": "code", "execution_count": 9, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:18:42.538241Z", "start_time": "2023-11-09T12:18:42.501449Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -467,12 +462,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": 
"markdown", @@ -500,16 +495,2509 @@ "cell_type": "code", "execution_count": 12, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:25:58.669132Z", "start_time": "2023-11-09T12:25:58.390530Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -530,111 +3018,920 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": 13, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:26:06.901452Z", "start_time": "2023-11-09T12:26:00.832299Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Precision on data slice “`D15` >= 4.000 AND `D15` < 344.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8444444444444444}: \n", + "2024-05-29 13:32:04,603 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 
'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 
'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 
'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 
'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 
'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 
'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 
'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 
'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:04,624 pid:62817 MainThread 
giskard.utils.logging_utils INFO Predicted dataset with shape (34, 434) executed in 0:00:00.050286\n", + "Executed 'Accuracy on data slice “`D3` >= 13.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.7\n", + " Metric: 0.68\n", " \n", " \n", - "Executed 'Precision on data slice “`D4` >= 81.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8444444444444444}: \n", + "2024-05-29 13:32:04,676 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 
'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 
'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 
'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 
'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 
'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 
'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 
'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 
'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:04,696 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (31, 434) executed in 0:00:00.042018\n", + "Executed 'Accuracy on data slice “`D5` >= 10.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.75\n", + " Metric: 0.68\n", " \n", " \n", - "Executed 'Accuracy on data slice “`D2` >= 108.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8664000000000001}: \n", + "2024-05-29 13:32:04,751 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 
'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 
'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 
'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 
'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 
'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 
'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 
'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 
'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 
'float32'}\n", + "2024-05-29 13:32:04,771 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (37, 434) executed in 0:00:00.044460\n", + "Executed 'Accuracy on data slice “`D11` >= 31.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.79\n", + " Metric: 0.7\n", " \n", " \n", - "Executed 'Accuracy on data slice “`C6` >= 1.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8664000000000001}: \n", + "2024-05-29 13:32:04,826 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 
'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 
'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 
'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 
'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 
'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 
'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 
'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 
'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:04,847 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (31, 434) executed in 0:00:00.046466\n", + "Executed 'Accuracy on data slice “`card1` < 12740.000 AND `card1` >= 7867.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.8\n", + " Metric: 0.71\n", " \n", " \n", - "Executed 'Accuracy on data slice “`D11` >= 65.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8664000000000001}: \n", + "2024-05-29 13:32:04,982 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 
'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 
'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 
'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 
'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 
'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 
'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 
'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 
'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 
'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,004 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (32, 434) executed in 0:00:00.126585\n", + "Executed 'Accuracy on data slice “`M8` == \"F\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.81\n", + " Metric: 0.72\n", " \n", " \n", - "Executed 'Precision on data slice “`V310` >= 48.475”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8444444444444444}: \n", + "2024-05-29 13:32:05,062 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 
'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 
'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 
'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 
'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 
'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 
'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 
'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 
'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,087 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (38, 434) executed in 0:00:00.051907\n", + "Executed 'Precision on data slice “`D15` >= 1.000 AND `D15` < 232.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7707547169811321}: \n", " Test failed\n", - " Metric: 0.79\n", + " Metric: 0.68\n", " \n", " \n", - "Executed 'Precision on data slice “`C11` >= 1.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8444444444444444}: \n", + "2024-05-29 13:32:05,144 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 
'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 
'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 
'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 
'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 
'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 
'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 
'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 
'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 
'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,166 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (42, 434) executed in 0:00:00.046181\n", + "Executed 'Accuracy on data slice “`M7` == \"F\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.79\n", + " Metric: 0.74\n", " \n", " \n", - "Executed 'Accuracy on data slice “`D5` >= 13.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8664000000000001}: \n", + "2024-05-29 13:32:05,221 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 
'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 
'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 
'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 
'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 
'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 
'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 
'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 
'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,244 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (31, 434) executed in 0:00:00.046322\n", + "Executed 'Accuracy on data slice “`D1` >= 0.500 AND `D1` < 120.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.81\n", + " Metric: 0.74\n", " \n", " \n", - "Executed 'Precision on data slice “`C1` >= 2.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8444444444444444}: \n", + "2024-05-29 13:32:05,301 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 
'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 
'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 
'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 
'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 
'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 
'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 
'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 
'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 
'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,324 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (50, 434) executed in 0:00:00.049778\n", + "Executed 'Precision on data slice “`C9` >= 0.500 AND `C9` < 1.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7707547169811321}: \n", " Test failed\n", - " Metric: 0.8\n", + " Metric: 0.69\n", " \n", " \n", - "Executed 'Precision on data slice “`V283` < 0.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8444444444444444}: \n", + "2024-05-29 13:32:05,505 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 
'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 
'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 
'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 
'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 
'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 
'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 
'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 
'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,525 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (35, 434) executed in 0:00:00.042524\n", + "Executed 'Accuracy on data slice “`V10` < 0.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.8\n", + " Metric: 0.77\n", " \n", " \n", - "Executed 'Precision on data slice “`V282` < 0.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8444444444444444}: \n", + "2024-05-29 13:32:05,579 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns 
from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 
'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 
'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 
'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 
'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 
'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 
'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 
'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 
'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,601 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (35, 434) executed in 0:00:00.045713\n", + "Executed 'Accuracy on data slice “`V11` < 0.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.8\n", + " Metric: 0.77\n", " \n", " \n", - "Executed 'Accuracy on data slice “`D3` >= 13.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8664000000000001}: \n", + "2024-05-29 13:32:05,656 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 
'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 
'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 
'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 
'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 
'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 
'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 
'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 
'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,678 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (49, 434) executed in 0:00:00.045881\n", + "Executed 'Precision on data slice “`P_emaildomain` == \"gmail.com\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7707547169811321}: \n", " Test failed\n", - " Metric: 0.82\n", + " Metric: 0.71\n", " \n", " \n", - "Executed 'Precision on data slice “`V285` >= 0.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 
0.8444444444444444}: \n", + "2024-05-29 13:32:05,731 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 
'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 
'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 
'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 
'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 
'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 
'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 
'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 
'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,753 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (46, 434) executed in 0:00:00.045060\n", + "Executed 'Precision on data slice “`M6` == \"T\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7707547169811321}: \n", " Test failed\n", - " Metric: 0.81\n", + " Metric: 0.71\n", " \n", " \n", - "Executed 'Recall on data slice “`addr1` >= 312.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8603773584905661}: \n", + "2024-05-29 13:32:05,809 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 
'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 
'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 
'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 
'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 
'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 
'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 
'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 
'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,909 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (40, 434) executed in 0:00:00.125281\n", + "Executed 'Accuracy on data slice “`C6` >= 1.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.83\n", + " Metric: 0.78\n", " \n", " \n", - "Executed 'Accuracy on 
data slice “`D10` >= 222.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8664000000000001}: \n", + "2024-05-29 13:32:05,959 pid:62817 MainThread giskard.datasets.base INFO Casting dataframe columns from {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 
'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 
'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 
'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 
'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'} to {'TransactionAmt': 'float32', 'ProductCD': 'category', 'card1': 'int16', 'card2': 'float32', 'card3': 'float32', 'card4': 'category', 'card5': 'float32', 'card6': 'category', 'addr1': 'float32', 'addr2': 'float32', 'dist1': 'float32', 'dist2': 'float32', 'P_emaildomain': 'category', 'R_emaildomain': 'category', 'C1': 'float32', 'C2': 'float32', 'C3': 'float32', 'C4': 'float32', 'C5': 'float32', 'C6': 'float32', 'C7': 'float32', 'C8': 'float32', 'C9': 'float32', 'C10': 'float32', 'C11': 'float32', 'C12': 'float32', 'C13': 'float32', 'C14': 'float32', 'D1': 'float32', 'D2': 'float32', 'D3': 'float32', 'D4': 'float32', 'D5': 'float32', 'D6': 'float32', 'D7': 'float32', 'D8': 'float32', 'D9': 'float32', 'D10': 'float32', 'D11': 'float32', 'D12': 'float32', 'D13': 'float32', 'D14': 'float32', 'D15': 'float32', 'M1': 'category', 'M2': 'category', 'M3': 'category', 'M4': 'category', 'M5': 'category', 'M6': 'category', 'M7': 'category', 'M8': 'category', 'M9': 'category', 'V1': 'float32', 'V2': 'float32', 'V3': 'float32', 'V4': 'float32', 'V5': 'float32', 'V6': 'float32', 'V7': 'float32', 'V8': 'float32', 'V9': 'float32', 'V10': 'float32', 'V11': 'float32', 'V12': 'float32', 'V13': 'float32', 'V14': 'float32', 'V15': 'float32', 'V16': 'float32', 'V17': 'float32', 'V18': 'float32', 'V19': 'float32', 'V20': 'float32', 'V21': 'float32', 'V22': 'float32', 'V23': 'float32', 'V24': 'float32', 'V25': 'float32', 'V26': 'float32', 'V27': 'float32', 'V28': 'float32', 'V29': 'float32', 'V30': 'float32', 'V31': 'float32', 'V32': 'float32', 'V33': 
'float32', 'V34': 'float32', 'V35': 'float32', 'V36': 'float32', 'V37': 'float32', 'V38': 'float32', 'V39': 'float32', 'V40': 'float32', 'V41': 'float32', 'V42': 'float32', 'V43': 'float32', 'V44': 'float32', 'V45': 'float32', 'V46': 'float32', 'V47': 'float32', 'V48': 'float32', 'V49': 'float32', 'V50': 'float32', 'V51': 'float32', 'V52': 'float32', 'V53': 'float32', 'V54': 'float32', 'V55': 'float32', 'V56': 'float32', 'V57': 'float32', 'V58': 'float32', 'V59': 'float32', 'V60': 'float32', 'V61': 'float32', 'V62': 'float32', 'V63': 'float32', 'V64': 'float32', 'V65': 'float32', 'V66': 'float32', 'V67': 'float32', 'V68': 'float32', 'V69': 'float32', 'V70': 'float32', 'V71': 'float32', 'V72': 'float32', 'V73': 'float32', 'V74': 'float32', 'V75': 'float32', 'V76': 'float32', 'V77': 'float32', 'V78': 'float32', 'V79': 'float32', 'V80': 'float32', 'V81': 'float32', 'V82': 'float32', 'V83': 'float32', 'V84': 'float32', 'V85': 'float32', 'V86': 'float32', 'V87': 'float32', 'V88': 'float32', 'V89': 'float32', 'V90': 'float32', 'V91': 'float32', 'V92': 'float32', 'V93': 'float32', 'V94': 'float32', 'V95': 'float32', 'V96': 'float32', 'V97': 'float32', 'V98': 'float32', 'V99': 'float32', 'V100': 'float32', 'V101': 'float32', 'V102': 'float32', 'V103': 'float32', 'V104': 'float32', 'V105': 'float32', 'V106': 'float32', 'V107': 'float32', 'V108': 'float32', 'V109': 'float32', 'V110': 'float32', 'V111': 'float32', 'V112': 'float32', 'V113': 'float32', 'V114': 'float32', 'V115': 'float32', 'V116': 'float32', 'V117': 'float32', 'V118': 'float32', 'V119': 'float32', 'V120': 'float32', 'V121': 'float32', 'V122': 'float32', 'V123': 'float32', 'V124': 'float32', 'V125': 'float32', 'V126': 'float32', 'V127': 'float32', 'V128': 'float32', 'V129': 'float32', 'V130': 'float32', 'V131': 'float32', 'V132': 'float32', 'V133': 'float32', 'V134': 'float32', 'V135': 'float32', 'V136': 'float32', 'V137': 'float32', 'V138': 'float32', 'V139': 'float32', 'V140': 'float32', 'V141': 'float32', 
'V142': 'float32', 'V143': 'float32', 'V144': 'float32', 'V145': 'float32', 'V146': 'float32', 'V147': 'float32', 'V148': 'float32', 'V149': 'float32', 'V150': 'float32', 'V151': 'float32', 'V152': 'float32', 'V153': 'float32', 'V154': 'float32', 'V155': 'float32', 'V156': 'float32', 'V157': 'float32', 'V158': 'float32', 'V159': 'float32', 'V160': 'float32', 'V161': 'float32', 'V162': 'float32', 'V163': 'float32', 'V164': 'float32', 'V165': 'float32', 'V166': 'float32', 'V167': 'float32', 'V168': 'float32', 'V169': 'float32', 'V170': 'float32', 'V171': 'float32', 'V172': 'float32', 'V173': 'float32', 'V174': 'float32', 'V175': 'float32', 'V176': 'float32', 'V177': 'float32', 'V178': 'float32', 'V179': 'float32', 'V180': 'float32', 'V181': 'float32', 'V182': 'float32', 'V183': 'float32', 'V184': 'float32', 'V185': 'float32', 'V186': 'float32', 'V187': 'float32', 'V188': 'float32', 'V189': 'float32', 'V190': 'float32', 'V191': 'float32', 'V192': 'float32', 'V193': 'float32', 'V194': 'float32', 'V195': 'float32', 'V196': 'float32', 'V197': 'float32', 'V198': 'float32', 'V199': 'float32', 'V200': 'float32', 'V201': 'float32', 'V202': 'float32', 'V203': 'float32', 'V204': 'float32', 'V205': 'float32', 'V206': 'float32', 'V207': 'float32', 'V208': 'float32', 'V209': 'float32', 'V210': 'float32', 'V211': 'float32', 'V212': 'float32', 'V213': 'float32', 'V214': 'float32', 'V215': 'float32', 'V216': 'float32', 'V217': 'float32', 'V218': 'float32', 'V219': 'float32', 'V220': 'float32', 'V221': 'float32', 'V222': 'float32', 'V223': 'float32', 'V224': 'float32', 'V225': 'float32', 'V226': 'float32', 'V227': 'float32', 'V228': 'float32', 'V229': 'float32', 'V230': 'float32', 'V231': 'float32', 'V232': 'float32', 'V233': 'float32', 'V234': 'float32', 'V235': 'float32', 'V236': 'float32', 'V237': 'float32', 'V238': 'float32', 'V239': 'float32', 'V240': 'float32', 'V241': 'float32', 'V242': 'float32', 'V243': 'float32', 'V244': 'float32', 'V245': 'float32', 'V246': 'float32', 
'V247': 'float32', 'V248': 'float32', 'V249': 'float32', 'V250': 'float32', 'V251': 'float32', 'V252': 'float32', 'V253': 'float32', 'V254': 'float32', 'V255': 'float32', 'V256': 'float32', 'V257': 'float32', 'V258': 'float32', 'V259': 'float32', 'V260': 'float32', 'V261': 'float32', 'V262': 'float32', 'V263': 'float32', 'V264': 'float32', 'V265': 'float32', 'V266': 'float32', 'V267': 'float32', 'V268': 'float32', 'V269': 'float32', 'V270': 'float32', 'V271': 'float32', 'V272': 'float32', 'V273': 'float32', 'V274': 'float32', 'V275': 'float32', 'V276': 'float32', 'V277': 'float32', 'V278': 'float32', 'V279': 'float32', 'V280': 'float32', 'V281': 'float32', 'V282': 'float32', 'V283': 'float32', 'V284': 'float32', 'V285': 'float32', 'V286': 'float32', 'V287': 'float32', 'V288': 'float32', 'V289': 'float32', 'V290': 'float32', 'V291': 'float32', 'V292': 'float32', 'V293': 'float32', 'V294': 'float32', 'V295': 'float32', 'V296': 'float32', 'V297': 'float32', 'V298': 'float32', 'V299': 'float32', 'V300': 'float32', 'V301': 'float32', 'V302': 'float32', 'V303': 'float32', 'V304': 'float32', 'V305': 'float32', 'V306': 'float32', 'V307': 'float32', 'V308': 'float32', 'V309': 'float32', 'V310': 'float32', 'V311': 'float32', 'V312': 'float32', 'V313': 'float32', 'V314': 'float32', 'V315': 'float32', 'V316': 'float32', 'V317': 'float32', 'V318': 'float32', 'V319': 'float32', 'V320': 'float32', 'V321': 'float32', 'V322': 'float32', 'V323': 'float32', 'V324': 'float32', 'V325': 'float32', 'V326': 'float32', 'V327': 'float32', 'V328': 'float32', 'V329': 'float32', 'V330': 'float32', 'V331': 'float32', 'V332': 'float32', 'V333': 'float32', 'V334': 'float32', 'V335': 'float32', 'V336': 'float32', 'V337': 'float32', 'V338': 'float32', 'V339': 'float32', 'id_01': 'float32', 'id_02': 'float32', 'id_03': 'float32', 'id_04': 'float32', 'id_05': 'float32', 'id_06': 'float32', 'id_07': 'float32', 'id_08': 'float32', 'id_09': 'float32', 'id_10': 'float32', 'id_11': 'float32', 'id_12': 
'category', 'id_13': 'float32', 'id_14': 'float32', 'id_15': 'category', 'id_16': 'category', 'id_17': 'float32', 'id_18': 'float32', 'id_19': 'float32', 'id_20': 'float32', 'id_21': 'float32', 'id_22': 'float32', 'id_23': 'category', 'id_24': 'float32', 'id_25': 'float32', 'id_26': 'float32', 'id_27': 'category', 'id_28': 'category', 'id_29': 'category', 'id_30': 'category', 'id_31': 'category', 'id_32': 'float32', 'id_33': 'category', 'id_34': 'category', 'id_35': 'category', 'id_36': 'category', 'id_37': 'category', 'id_38': 'category', 'DeviceType': 'category', 'DeviceInfo': 'category', 'TimeInDay': 'int32', 'Cents': 'float32'}\n", + "2024-05-29 13:32:05,981 pid:62817 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (32, 434) executed in 0:00:00.043595\n", + "Executed 'Accuracy on data slice “`D2` < 120.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}: \n", " Test failed\n", - " Metric: 0.84\n", + " Metric: 0.78\n", + " \n", " \n", - " \n" + "2024-05-29 13:32:05,999 pid:62817 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 13:32:05,999 pid:62817 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 13:32:05,999 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`D3` >= 13.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.6764705882352942}\n", + "2024-05-29 13:32:06,000 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`D5` >= 10.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.6774193548387096}\n", + "2024-05-29 13:32:06,000 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`D11` >= 31.000” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.7027027027027027}\n", + "2024-05-29 13:32:06,001 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice 
“`card1` < 12740.000 AND `card1` >= 7867.000” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.7096774193548387}\n", + "2024-05-29 13:32:06,003 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`M8` == \"F\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.71875}\n", + "2024-05-29 13:32:06,004 pid:62817 MainThread giskard.core.suite INFO Precision on data slice “`D15` >= 1.000 AND `D15` < 232.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7707547169811321}): {failed, metric=0.68}\n", + "2024-05-29 13:32:06,004 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`M7` == \"F\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.7380952380952381}\n", + "2024-05-29 13:32:06,005 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`D1` >= 0.500 AND `D1` < 120.000” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.7419354838709677}\n", + "2024-05-29 13:32:06,005 pid:62817 MainThread giskard.core.suite INFO Precision on data slice “`C9` >= 0.500 AND `C9` < 1.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7707547169811321}): {failed, metric=0.6923076923076923}\n", + "2024-05-29 13:32:06,005 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`V10` < 0.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.7714285714285715}\n", + "2024-05-29 13:32:06,006 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`V11` < 0.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.7714285714285715}\n", + "2024-05-29 13:32:06,008 pid:62817 MainThread giskard.core.suite INFO Precision on data slice “`P_emaildomain` == \"gmail.com\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7707547169811321}): {failed, 
metric=0.7142857142857143}\n", + "2024-05-29 13:32:06,008 pid:62817 MainThread giskard.core.suite INFO Precision on data slice “`M6` == \"T\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.7707547169811321}): {failed, metric=0.7142857142857143}\n", + "2024-05-29 13:32:06,008 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`C6` >= 1.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.775}\n", + "2024-05-29 13:32:06,009 pid:62817 MainThread giskard.core.suite INFO Accuracy on data slice “`D2` < 120.000” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.836}): {failed, metric=0.78125}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Precision on data slice “`D15` >= 4.000 AND `D15` < 344.500”\n
\n \n Measured Metric = 0.7\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `D15` >= 4.000 AND `D15` < 344.500\n
\n \n
\n threshold\n 0.8444444444444444\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`D4` >= 81.000”\n
\n \n Measured Metric = 0.75\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `D4` >= 81.000\n
\n \n
\n threshold\n 0.8444444444444444\n
\n \n
\n
\n \n \n
\n Test Accuracy on data slice “`D2` >= 108.500”\n
\n \n Measured Metric = 0.79412\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `D2` >= 108.500\n
\n \n
\n threshold\n 0.8664000000000001\n
\n \n
\n
\n \n \n
\n Test Accuracy on data slice “`C6` >= 1.500”\n
\n \n Measured Metric = 0.8\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `C6` >= 1.500\n
\n \n
\n threshold\n 0.8664000000000001\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`D11` >= 65.000”\n
\n \n Measured Metric = 0.80556\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `D11` >= 65.000\n
\n \n
\n threshold\n 0.8664000000000001\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`V310` >= 48.475”\n
\n \n Measured Metric = 0.78571\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `V310` >= 48.475\n
\n \n
\n threshold\n 0.8444444444444444\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`C11` >= 1.500”\n
\n \n Measured Metric = 0.79167\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `C11` >= 1.500\n
\n \n
\n threshold\n 0.8444444444444444\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`D5` >= 13.500”\n
\n \n Measured Metric = 0.8125\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `D5` >= 13.500\n
\n \n
\n threshold\n 0.8664000000000001\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`C1` >= 2.500”\n
\n \n Measured Metric = 0.8\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `C1` >= 2.500\n
\n \n
\n threshold\n 0.8444444444444444\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`V283` < 0.500”\n
\n \n Measured Metric = 0.8\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `V283` < 0.500\n
\n \n
\n threshold\n 0.8444444444444444\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`V282` < 0.500”\n
\n \n Measured Metric = 0.8\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `V282` < 0.500\n
\n \n
\n threshold\n 0.8444444444444444\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`D3` >= 13.500”\n
\n \n Measured Metric = 0.82353\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `D3` >= 13.500\n
\n \n
\n threshold\n 0.8664000000000001\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`V285` >= 0.500”\n
\n \n Measured Metric = 0.80645\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `V285` >= 0.500\n
\n \n
\n threshold\n 0.8444444444444444\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`addr1` >= 312.500”\n
\n \n Measured Metric = 0.83333\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `addr1` >= 312.500\n
\n \n
\n threshold\n 0.8603773584905661\n
\n \n
\n \n \n \n
\n Test Accuracy on data slice “`D10` >= 222.000”\n
\n \n Measured Metric = 0.84375\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n af2b2aaa-c0d6-4428-9a9c-9aa288c718f4\n
\n \n
\n dataset\n fraud_detection_adversarial_dataset\n
\n \n
\n slicing_function\n `D10` >= 222.000\n
\n \n
\n threshold\n 0.8664000000000001\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`D3` >= 13.500”\n", + "
\n", + " \n", + " Measured Metric = 0.67647\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `D3` >= 13.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`D5` >= 10.500”\n", + "
\n", + " \n", + " Measured Metric = 0.67742\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `D5` >= 10.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`D11` >= 31.000”\n", + "
\n", + " \n", + " Measured Metric = 0.7027\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `D11` >= 31.000\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`card1` < 12740.000 AND `card1` >= 7867.000”\n", + "
\n", + " \n", + " Measured Metric = 0.70968\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `card1` < 12740.000 AND `card1` >= 7867.000\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`M8` == "F"”\n", + "
\n", + " \n", + " Measured Metric = 0.71875\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `M8` == "F"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`D15` >= 1.000 AND `D15` < 232.500”\n", + "
\n", + " \n", + " Measured Metric = 0.68\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `D15` >= 1.000 AND `D15` < 232.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7707547169811321\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`M7` == "F"”\n", + "
\n", + " \n", + " Measured Metric = 0.7381\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `M7` == "F"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`D1` >= 0.500 AND `D1` < 120.000”\n", + "
\n", + " \n", + " Measured Metric = 0.74194\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `D1` >= 0.500 AND `D1` < 120.000\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`C9` >= 0.500 AND `C9` < 1.500”\n", + "
\n", + " \n", + " Measured Metric = 0.69231\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `C9` >= 0.500 AND `C9` < 1.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7707547169811321\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`V10` < 0.500”\n", + "
\n", + " \n", + " Measured Metric = 0.77143\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `V10` < 0.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`V11` < 0.500”\n", + "
\n", + " \n", + " Measured Metric = 0.77143\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `V11` < 0.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`P_emaildomain` == "gmail.com"”\n", + "
\n", + " \n", + " Measured Metric = 0.71429\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `P_emaildomain` == "gmail.com"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7707547169811321\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`M6` == "T"”\n", + "
\n", + " \n", + " Measured Metric = 0.71429\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `M6` == "T"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.7707547169811321\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`C6` >= 1.500”\n", + "
\n", + " \n", + " Measured Metric = 0.775\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `C6` >= 1.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Accuracy on data slice “`D2` < 120.000”\n", + "
\n", + " \n", + " Measured Metric = 0.78125\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " train_test_data_classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " fraud_detection_adversarial_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `D2` < 120.000\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.836\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 13, "metadata": {}, @@ -675,118 +3972,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to: \n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. 
Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to: \n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -805,7 +3990,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/insurance_prediction_lgbm.ipynb b/docs/reference/notebooks/insurance_prediction_lgbm.ipynb index 6c03460669..162e15d080 100644 --- a/docs/reference/notebooks/insurance_prediction_lgbm.ipynb +++ b/docs/reference/notebooks/insurance_prediction_lgbm.ipynb @@ -22,12 +22,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -55,28 +50,28 @@ }, { "cell_type": "markdown", - "source": [ - "We also install the project-specific dependencies for this tutorial." - ], + "id": "b831a269a6537938", "metadata": { "collapsed": false }, - "id": "b831a269a6537938" + "source": [ + "We also install the project-specific dependencies for this tutorial." 
+ ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [ - "%pip install lightgbm" - ], + "id": "797dd8f836bd3a03", "metadata": { - "collapsed": false, "ExecuteTime": { "start_time": "2023-08-01T15:21:01.831520Z" - } + }, + "collapsed": false }, - "id": "797dd8f836bd3a03" + "outputs": [], + "source": [ + "%pip install lightgbm" + ] }, { "cell_type": "markdown", @@ -105,11 +100,11 @@ "execution_count": 1, "id": "3c8f7165dcf10fbe", "metadata": { - "hidden": true, "ExecuteTime": { "end_time": "2023-11-09T12:44:00.915388Z", "start_time": "2023-11-09T12:43:55.199949Z" - } + }, + "hidden": true }, "outputs": [], "source": [ @@ -126,35 +121,35 @@ "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n", "\n", - "from giskard import GiskardClient, testing, Dataset, Model, scan, Suite" + "from giskard import Dataset, Model, scan, testing" ] }, { "cell_type": "markdown", - "source": [ - "## Notebook-level settings" - ], + "id": "86bca9ebc5a47498", "metadata": { "collapsed": false }, - "id": "86bca9ebc5a47498" + "source": [ + "## Notebook-level settings" + ] }, { "cell_type": "code", "execution_count": 2, - "outputs": [], - "source": [ - "logging.set_verbosity(logging.ERROR)\n", - "warnings.filterwarnings(\"ignore\", message=r\"Passing\", category=FutureWarning)" - ], + "id": "7024ed0a", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:44:01.669179Z", "start_time": "2023-11-09T12:44:01.634934Z" - } + }, + "collapsed": false }, - "id": "7024ed0a" + "outputs": [], + "source": [ + "logging.set_verbosity(logging.ERROR)\n", + "warnings.filterwarnings(\"ignore\", message=r\"Passing\", category=FutureWarning)" + ] }, { "cell_type": "markdown", @@ -171,11 +166,11 @@ "execution_count": 3, "id": "434314c0d4cf31fb", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:44:02.296748Z", "start_time": "2023-11-09T12:44:02.274216Z" - } + }, + 
"collapsed": false }, "outputs": [], "source": [ @@ -213,11 +208,11 @@ "execution_count": 4, "id": "fe19935c186fd365", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:44:03.389591Z", "start_time": "2023-11-09T12:44:03.347340Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -265,11 +260,11 @@ "execution_count": 6, "id": "2a5351b1c97f2a31", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:44:04.230046Z", "start_time": "2023-11-09T12:44:04.204640Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -289,14 +284,14 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "4d95d45185280742", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:44:04.870629Z", "start_time": "2023-11-09T12:44:04.828841Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -336,11 +331,11 @@ "execution_count": 8, "id": "64d42c05c8107e59", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T12:44:05.597336Z", "start_time": "2023-11-09T12:44:05.574669Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -400,7 +395,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Wrap the prediction function so that the whole pipeline is saved to the Hub\n", + "# Wrap the prediction function\n", "def prediction_function(df):\n", " return pipeline.predict(df)\n", "\n", @@ -422,13 +417,13 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], + "id": "8dc6d9def6fcfe66", "metadata": { "collapsed": false }, - "id": "8dc6d9def6fcfe66" + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -465,7 +460,1115 @@ "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -485,15 +1588,15 @@ }, { "cell_type": "markdown", + "id": "e94f621a1e7fb82d", + "metadata": { + 
"collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." - ], - "metadata": { - "collapsed": false - }, - "id": "e94f621a1e7fb82d" + ] }, { "cell_type": "code", @@ -510,32 +1613,335 @@ "name": "stdout", "output_type": "stream", "text": [ - "Executed 'MSE on data slice “`region` == \"northeast\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}: \n", + "2024-05-29 13:33:44,636 pid:63742 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'sex': 'object', 'bmi': 'float64', 'children': 'int64', 'smoker': 'object', 'region': 'object'} to {'age': 'int64', 'sex': 'object', 'bmi': 'float64', 'children': 'int64', 'smoker': 'object', 'region': 'object'}\n", + "2024-05-29 13:33:44,638 pid:63742 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (96, 7) executed in 0:00:00.010454\n", + "Executed 'MSE on data slice “`region` == \"northeast\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}: \n", " Test failed\n", " Metric: 20989697.45\n", " \n", " \n", - "Executed 'MSE on data slice “`sex` == \"female\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}: \n", + "2024-05-29 13:33:44,653 pid:63742 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'sex': 'object', 'bmi': 'float64', 'children': 'int64', 'smoker': 'object', 'region': 'object'} to {'age': 'int64', 'sex': 'object', 'bmi': 'float64', 'children': 'int64', 'smoker': 'object', 'region': 'object'}\n", + 
"2024-05-29 13:33:44,655 pid:63742 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (153, 7) executed in 0:00:00.008035\n", + "Executed 'MSE on data slice “`sex` == \"female\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}: \n", " Test failed\n", " Metric: 18686445.91\n", " \n", " \n", - "Executed 'MSE on data slice “`smoker` == \"no\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}: \n", + "2024-05-29 13:33:44,666 pid:63742 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'sex': 'object', 'bmi': 'float64', 'children': 'int64', 'smoker': 'object', 'region': 'object'} to {'age': 'int64', 'sex': 'object', 'bmi': 'float64', 'children': 'int64', 'smoker': 'object', 'region': 'object'}\n", + "2024-05-29 13:33:44,669 pid:63742 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (264, 7) executed in 0:00:00.007542\n", + "Executed 'MSE on data slice “`smoker` == \"no\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}: \n", " Test failed\n", " Metric: 18440360.15\n", " \n", " \n", - "Executed 'MSE on data slice “`region` == \"southeast\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}: \n", + "2024-05-29 13:33:44,677 pid:63742 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'sex': 'object', 'bmi': 'float64', 'children': 'int64', 'smoker': 'object', 'region': 'object'} to {'age': 'int64', 'sex': 'object', 'bmi': 'float64', 'children': 'int64', 'smoker': 'object', 'region': 'object'}\n", + "2024-05-29 13:33:44,678 pid:63742 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (88, 7) executed in 0:00:00.004697\n", + "Executed 'MSE on data slice “`region` == \"southeast\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 
17058376.93295317}: \n", " Test failed\n", " Metric: 17201661.08\n", " \n", - " \n" + " \n", + "2024-05-29 13:33:44,682 pid:63742 MainThread giskard.core.suite INFO Executed test suite 'Test suite'\n", + "2024-05-29 13:33:44,682 pid:63742 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 13:33:44,683 pid:63742 MainThread giskard.core.suite INFO MSE on data slice “`region` == \"northeast\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}): {failed, metric=20989697.452960182}\n", + "2024-05-29 13:33:44,683 pid:63742 MainThread giskard.core.suite INFO MSE on data slice “`sex` == \"female\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}): {failed, metric=18686445.912572958}\n", + "2024-05-29 13:33:44,683 pid:63742 MainThread giskard.core.suite INFO MSE on data slice “`smoker` == \"no\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}): {failed, metric=18440360.147996694}\n", + "2024-05-29 13:33:44,684 pid:63742 MainThread giskard.core.suite INFO MSE on data slice “`region` == \"southeast\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 17058376.93295317}): {failed, metric=17201661.078229036}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test MSE on data slice “`region` == "northeast"”\n
\n \n Measured Metric = 20989697.45296\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n d2f80988-54e1-404c-abc0-09a1bfc81298\n
\n \n
\n dataset\n insurance dataset\n
\n \n
\n slicing_function\n `region` == "northeast"\n
\n \n
\n threshold\n 17058376.93295317\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`sex` == "female"”\n
\n \n Measured Metric = 18686445.91257\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n d2f80988-54e1-404c-abc0-09a1bfc81298\n
\n \n
\n dataset\n insurance dataset\n
\n \n
\n slicing_function\n `sex` == "female"\n
\n \n
\n threshold\n 17058376.93295317\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`smoker` == "no"”\n
\n \n Measured Metric = 18440360.148\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n d2f80988-54e1-404c-abc0-09a1bfc81298\n
\n \n
\n dataset\n insurance dataset\n
\n \n
\n slicing_function\n `smoker` == "no"\n
\n \n
\n threshold\n 17058376.93295317\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`region` == "southeast"”\n
\n \n Measured Metric = 17201661.07823\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n d2f80988-54e1-404c-abc0-09a1bfc81298\n
\n \n
\n dataset\n insurance dataset\n
\n \n
\n slicing_function\n `region` == "southeast"\n
\n \n
\n threshold\n 17058376.93295317\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`region` == "northeast"”\n", + "
\n", + " \n", + " Measured Metric = 20989697.45296\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " insurance model\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " insurance dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `region` == "northeast"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 17058376.93295317\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`sex` == "female"”\n", + "
\n", + " \n", + " Measured Metric = 18686445.91257\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " insurance model\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " insurance dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `sex` == "female"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 17058376.93295317\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`smoker` == "no"”\n", + "
\n", + " \n", + " Measured Metric = 18440360.148\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " insurance model\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " insurance dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `smoker` == "no"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 17058376.93295317\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`region` == "southeast"”\n", + "
\n", + " \n", + " Measured Metric = 17201661.07823\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " insurance model\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " insurance dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `region` == "southeast"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 17058376.93295317\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 13, "metadata": {}, @@ -578,124 +1984,6 @@ "source": [ "test_suite.add_test(testing.test_rmse(model=giskard_model, dataset=giskard_dataset, threshold=10.0)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - }, - "id": "20b6dd6e50c9f9b6" - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - }, - "id": "a6259312ce27b32" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "8430cb1806cb0b36" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "7984f79517cb379f" - }, - { - "cell_type": "markdown", - "id": "4a31adf6", - "metadata": {}, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. 
Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3e139194b2d1076", - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to:\n", - " \n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ], - "metadata": { - "collapsed": false - }, - "id": "dedf0195ce8a1a6a" - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - }, - "id": "4e2b05cbe891bce0" } ], "metadata": { @@ -714,7 +2002,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.14" }, "toc": { "base_numbering": "0", diff --git a/docs/reference/notebooks/m5_sales_prediction_lgbm.ipynb b/docs/reference/notebooks/m5_sales_prediction_lgbm.ipynb index d17986afbd..06e8e45a86 100644 --- a/docs/reference/notebooks/m5_sales_prediction_lgbm.ipynb +++ b/docs/reference/notebooks/m5_sales_prediction_lgbm.ipynb @@ -19,12 +19,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -39,7 +34,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-08-24T10:39:48.395480Z", @@ -63,14 +58,14 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 1, "metadata": { - "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", - "_uuid": 
"8f2839f25d086af736a60e9eeb907d3b93b6e0e5", "ExecuteTime": { "end_time": "2023-11-09T13:52:15.838102Z", "start_time": "2023-11-09T13:52:07.960231Z" - } + }, + "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", + "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5" }, "outputs": [], "source": [ @@ -83,7 +78,7 @@ "from lightgbm import LGBMRegressor\n", "from sklearn.metrics import r2_score\n", "\n", - "from giskard import Dataset, Model, testing, GiskardClient, scan, Suite" + "from giskard import Dataset, Model, scan, testing" ] }, { @@ -97,13 +92,13 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T13:52:18.124518Z", "start_time": "2023-11-09T13:52:18.073663Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -276,13 +271,13 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T13:53:18.825367Z", "start_time": "2023-11-09T13:53:18.772930Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -369,12 +364,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -400,18 +395,1123 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 10, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T13:54:20.338407Z", "start_time": "2023-11-09T13:54:20.146169Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -432,58 +1532,361 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures 
to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 11, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T13:54:32.233432Z", "start_time": "2023-11-09T13:54:32.004498Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'MSE on data slice “`snap_TX` == 1”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}: \n", + "2024-05-29 13:43:58,662 pid:66104 MainThread giskard.datasets.base INFO Casting dataframe columns from {'item_id': 'int64', 'dept_id': 'int64', 'cat_id': 'int64', 'store_id': 'int64', 'state_id': 'int64', 'wm_yr_wk': 'int64', 'event_name_1': 'int64', 'event_type_1': 'int64', 'event_name_2': 'int64', 'event_type_2': 'int64', 'snap_CA': 'int64', 'snap_TX': 'int64', 'snap_WI': 'int64', 'sell_price': 'float64'} to {'item_id': 'int64', 'dept_id': 'int64', 'cat_id': 'int64', 'store_id': 'int64', 'state_id': 'int64', 'wm_yr_wk': 'int64', 'event_name_1': 'int64', 'event_type_1': 'int64', 'event_name_2': 'int64', 'event_type_2': 'int64', 'snap_CA': 'int64', 'snap_TX': 'int64', 'snap_WI': 'int64', 'sell_price': 'float64'}\n", + "2024-05-29 13:43:58,664 pid:66104 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (1000, 15) executed in 0:00:00.015511\n", + "Executed 'MSE on data slice “`snap_TX` == 1”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}: \n", " Test failed\n", " Metric: 8.92\n", " \n", " \n", - "Executed 'MSE on data slice “`snap_WI` == 1”' with arguments {'model': , 'dataset': , 
'slicing_function': , 'threshold': 7.1383301822423295}: \n", + "2024-05-29 13:43:58,683 pid:66104 MainThread giskard.datasets.base INFO Casting dataframe columns from {'item_id': 'int64', 'dept_id': 'int64', 'cat_id': 'int64', 'store_id': 'int64', 'state_id': 'int64', 'wm_yr_wk': 'int64', 'event_name_1': 'int64', 'event_type_1': 'int64', 'event_name_2': 'int64', 'event_type_2': 'int64', 'snap_CA': 'int64', 'snap_TX': 'int64', 'snap_WI': 'int64', 'sell_price': 'float64'} to {'item_id': 'int64', 'dept_id': 'int64', 'cat_id': 'int64', 'store_id': 'int64', 'state_id': 'int64', 'wm_yr_wk': 'int64', 'event_name_1': 'int64', 'event_type_1': 'int64', 'event_name_2': 'int64', 'event_type_2': 'int64', 'snap_CA': 'int64', 'snap_TX': 'int64', 'snap_WI': 'int64', 'sell_price': 'float64'}\n", + "2024-05-29 13:43:58,685 pid:66104 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (1000, 15) executed in 0:00:00.012052\n", + "Executed 'MSE on data slice “`snap_WI` == 1”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}: \n", " Test failed\n", " Metric: 8.64\n", " \n", " \n", - "Executed 'MSE on data slice “`wm_yr_wk` == 1.161e+04”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}: \n", + "2024-05-29 13:43:58,699 pid:66104 MainThread giskard.datasets.base INFO Casting dataframe columns from {'item_id': 'int64', 'dept_id': 'int64', 'cat_id': 'int64', 'store_id': 'int64', 'state_id': 'int64', 'wm_yr_wk': 'int64', 'event_name_1': 'int64', 'event_type_1': 'int64', 'event_name_2': 'int64', 'event_type_2': 'int64', 'snap_CA': 'int64', 'snap_TX': 'int64', 'snap_WI': 'int64', 'sell_price': 'float64'} to {'item_id': 'int64', 'dept_id': 'int64', 'cat_id': 'int64', 'store_id': 'int64', 'state_id': 'int64', 'wm_yr_wk': 'int64', 'event_name_1': 'int64', 'event_type_1': 'int64', 'event_name_2': 'int64', 'event_type_2': 'int64', 'snap_CA': 'int64', 'snap_TX': 'int64', 'snap_WI': 
'int64', 'sell_price': 'float64'}\n", + "2024-05-29 13:43:58,701 pid:66104 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (700, 15) executed in 0:00:00.009145\n", + "Executed 'MSE on data slice “`wm_yr_wk` == 1.161e+04”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}: \n", " Test failed\n", " Metric: 8.63\n", " \n", " \n", - "Executed 'MSE on data slice “`snap_CA` == 1”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}: \n", + "2024-05-29 13:43:58,716 pid:66104 MainThread giskard.datasets.base INFO Casting dataframe columns from {'item_id': 'int64', 'dept_id': 'int64', 'cat_id': 'int64', 'store_id': 'int64', 'state_id': 'int64', 'wm_yr_wk': 'int64', 'event_name_1': 'int64', 'event_type_1': 'int64', 'event_name_2': 'int64', 'event_type_2': 'int64', 'snap_CA': 'int64', 'snap_TX': 'int64', 'snap_WI': 'int64', 'sell_price': 'float64'} to {'item_id': 'int64', 'dept_id': 'int64', 'cat_id': 'int64', 'store_id': 'int64', 'state_id': 'int64', 'wm_yr_wk': 'int64', 'event_name_1': 'int64', 'event_type_1': 'int64', 'event_name_2': 'int64', 'event_type_2': 'int64', 'snap_CA': 'int64', 'snap_TX': 'int64', 'snap_WI': 'int64', 'sell_price': 'float64'}\n", + "2024-05-29 13:43:58,718 pid:66104 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (1000, 15) executed in 0:00:00.009801\n", + "Executed 'MSE on data slice “`snap_CA` == 1”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}: \n", " Test failed\n", " Metric: 7.92\n", " \n", - " \n" + " \n", + "2024-05-29 13:43:58,723 pid:66104 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 13:43:58,723 pid:66104 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 13:43:58,724 pid:66104 MainThread giskard.core.suite INFO MSE on data slice “`snap_TX` == 1” ({'model': , 'dataset': , 
'slicing_function': , 'threshold': 7.1383301822423295}): {failed, metric=8.915211806609342}\n", + "2024-05-29 13:43:58,724 pid:66104 MainThread giskard.core.suite INFO MSE on data slice “`snap_WI` == 1” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}): {failed, metric=8.637798447295712}\n", + "2024-05-29 13:43:58,725 pid:66104 MainThread giskard.core.suite INFO MSE on data slice “`wm_yr_wk` == 1.161e+04” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}): {failed, metric=8.630063641271192}\n", + "2024-05-29 13:43:58,725 pid:66104 MainThread giskard.core.suite INFO MSE on data slice “`snap_CA` == 1” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 7.1383301822423295}): {failed, metric=7.922860008805299}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test MSE on data slice “`snap_TX` == 1”\n
\n \n Measured Metric = 8.91521\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 953d4ee7-82dc-421e-910d-9043450544ae\n
\n \n
\n dataset\n M5 products timeseries dataset\n
\n \n
\n slicing_function\n `snap_TX` == 1\n
\n \n
\n threshold\n 7.1383301822423295\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`snap_WI` == 1”\n
\n \n Measured Metric = 8.6378\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 953d4ee7-82dc-421e-910d-9043450544ae\n
\n \n
\n dataset\n M5 products timeseries dataset\n
\n \n
\n slicing_function\n `snap_WI` == 1\n
\n \n
\n threshold\n 7.1383301822423295\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`wm_yr_wk` == 1.161e+04”\n
\n \n Measured Metric = 8.63006\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 953d4ee7-82dc-421e-910d-9043450544ae\n
\n \n
\n dataset\n M5 products timeseries dataset\n
\n \n
\n slicing_function\n `wm_yr_wk` == 1.161e+04\n
\n \n
\n threshold\n 7.1383301822423295\n
\n \n
\n
\n \n \n
\n Test MSE on data slice “`snap_CA` == 1”\n
\n \n Measured Metric = 7.92286\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 953d4ee7-82dc-421e-910d-9043450544ae\n
\n \n
\n dataset\n M5 products timeseries dataset\n
\n \n
\n slicing_function\n `snap_CA` == 1\n
\n \n
\n threshold\n 7.1383301822423295\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`snap_TX` == 1”\n", + "
\n", + " \n", + " Measured Metric = 8.91521\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " M5 sales timeseries regressor\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " M5 products timeseries dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `snap_TX` == 1\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 7.1383301822423295\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`snap_WI` == 1”\n", + "
\n", + " \n", + " Measured Metric = 8.6378\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " M5 sales timeseries regressor\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " M5 products timeseries dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `snap_WI` == 1\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 7.1383301822423295\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`wm_yr_wk` == 1.161e+04”\n", + "
\n", + " \n", + " Measured Metric = 8.63006\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " M5 sales timeseries regressor\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " M5 products timeseries dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `wm_yr_wk` == 1.161e+04\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 7.1383301822423295\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test MSE on data slice “`snap_CA` == 1”\n", + "
\n", + " \n", + " Measured Metric = 7.92286\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " M5 sales timeseries regressor\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " M5 products timeseries dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `snap_CA` == 1\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 7.1383301822423295\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, - "execution_count": 12, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -522,118 +1925,6 @@ "source": [ "test_suite.add_test(testing.test_r2(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to: \n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ] - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. 
Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to:\n", - "\n", - "* Check for regressions after training a new model\n", - "* Automate the test suite execution in a CI/CD pipeline\n", - "* Compare several models during the prototyping phase" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -652,7 +1943,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/medical_transcript_classification_sklearn.ipynb b/docs/reference/notebooks/medical_transcript_classification_sklearn.ipynb index b18c318240..4694220c18 100644 --- a/docs/reference/notebooks/medical_transcript_classification_sklearn.ipynb +++ b/docs/reference/notebooks/medical_transcript_classification_sklearn.ipynb @@ -22,12 +22,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -55,13 +50,13 @@ }, { "cell_type": "markdown", - "source": [ - "We also install the project-specific dependencies for this tutorial." - ], + "id": "a722bc090c468228", "metadata": { "collapsed": false }, - "id": "a722bc090c468228" + "source": [ + "We also install the project-specific dependencies for this tutorial." 
+ ] }, { "cell_type": "code", @@ -87,14 +82,14 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 1, "id": "14e64fb17dd952c", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:19:23.444151Z", "start_time": "2023-11-09T14:19:23.414422Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -114,7 +109,7 @@ "from sklearn.preprocessing import FunctionTransformer\n", "from typing import Iterable\n", "\n", - "from giskard import Dataset, Model, scan, GiskardClient, testing, Suite" + "from giskard import Dataset, Model, scan, testing" ] }, { @@ -129,14 +124,14 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 2, "id": "eb6ddf97e5bfaa17", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:19:24.265761Z", "start_time": "2023-11-09T14:19:24.208447Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -175,26 +170,26 @@ }, { "cell_type": "markdown", - "source": [ - "### Download NLTK stopwords corpus" - ], + "id": "51ae2ea76ba01e93", "metadata": { "collapsed": false }, - "id": "51ae2ea76ba01e93" + "source": [ + "### Download NLTK stopwords corpus" + ] }, { "cell_type": "code", "execution_count": null, + "id": "54d3dabd4484064e", + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "# Download list of english stopwords.\n", "nltk.download('stopwords')" - ], - "metadata": { - "collapsed": false - }, - "id": "54d3dabd4484064e" + ] }, { "cell_type": "markdown", @@ -208,14 +203,14 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 4, "id": "2016f55da2fb2636", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:19:26.322440Z", "start_time": "2023-11-09T14:19:26.198889Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -276,14 +271,14 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 6, "id": "ef066b868f02dea0", "metadata": { - "collapsed": false, 
"ExecuteTime": { "end_time": "2023-11-09T14:19:27.644994Z", "start_time": "2023-11-09T14:19:27.618840Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -305,14 +300,14 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "id": "2eadc4944d498729", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:19:28.691833Z", "start_time": "2023-11-09T14:19:28.637886Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -347,14 +342,14 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 8, "id": "cc4c51a3519004b1", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:19:29.581720Z", "start_time": "2023-11-09T14:19:29.501498Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -449,7 +444,7 @@ }, "outputs": [], "source": [ - "# Wrap the prediction function so that the whole pipeline get saved to the Hub \n", + "# Wrap the prediction function\n", "def prediction_function(df):\n", " return pipeline.predict_proba(df)\n", "\n", @@ -469,13 +464,13 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], + "id": "c6eb3203bcda614e", "metadata": { "collapsed": false }, - "id": "c6eb3203bcda614e" + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", @@ -503,19 +498,4430 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 12, "id": "eb4a2acdff290603", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:35:48.509583Z", "start_time": "2023-11-09T14:35:48.257791Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -527,13 +4933,13 @@ }, { "cell_type": "markdown", - "source": [ - "## Generate comprehensive test suites automatically for your model" - ], + "id": "72547cc27176d759", "metadata": { "collapsed": false }, 
- "id": "72547cc27176d759" + "source": [ + "## Generate comprehensive test suites automatically for your model" + ] }, { "cell_type": "markdown", @@ -549,183 +4955,1812 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 14, "id": "e740cee558970a9c", "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:34:00.381087Z", "start_time": "2023-11-09T14:33:54.189523Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + "2024-05-29 13:49:53,802 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:53,804 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (371, 2) executed in 0:00:00.009591\n", + "2024-05-29 13:49:54,057 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,048 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (371, 2) executed in 0:00:00.998800\n", + "2024-05-29 13:49:55,051 pid:66538 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:01.259399\n", + "2024-05-29 13:49:55,051 pid:66538 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.000281\n", + "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", " Test failed\n", - " Metric: 0.91\n", - " - [TestMessageLevel.INFO] 371 rows were perturbed\n", - " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"weight\"”' with arguments {'model': , 'dataset': , 
'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", - " Test failed\n", - " Metric: 0.74\n", - " \n", + " Metric: 0.9\n", + " - [INFO] 371 rows were perturbed\n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"having\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,070 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,071 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (60, 2) executed in 0:00:00.005307\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"temperature\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", - " Metric: 0.74\n", + " Metric: 0.73\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"today\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,090 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,091 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (97, 2) executed in 0:00:00.006231\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"dr\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", - " Metric: 0.74\n", + " Metric: 0.73\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"temperature\"”' with arguments {'model': , 'dataset': , 
'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,111 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,112 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (76, 2) executed in 0:00:00.007671\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"weight\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", - " Metric: 0.73\n", + " Metric: 0.72\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"dr\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,132 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,132 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (65, 2) executed in 0:00:00.006201\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"having\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", - " Metric: 0.73\n", + " Metric: 0.72\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"follow\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,151 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,152 pid:66538 MainThread giskard.utils.logging_utils INFO 
Predicted dataset with shape (71, 2) executed in 0:00:00.005367\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"today\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", " Metric: 0.72\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`avg_whitespace(transcription)` >= 0.160”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,172 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,173 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (80, 2) executed in 0:00:00.005348\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"follow\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", - " Metric: 0.71\n", + " Metric: 0.7\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"blood\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,191 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,192 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (191, 2) executed in 0:00:00.008549\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"blood\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", " Metric: 0.7\n", " \n", " \n", - "Executed 'Overconfidence on 
data slice “`transcription` contains \"distress\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,213 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,214 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (70, 2) executed in 0:00:00.007286\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"distress\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", " Metric: 0.69\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"mg\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,236 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,237 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (126, 2) executed in 0:00:00.007065\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"stable\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", - " Metric: 0.69\n", + " Metric: 0.68\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"continue\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,257 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + 
"2024-05-29 13:49:55,258 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (93, 2) executed in 0:00:00.006464\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"mg\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", - " Metric: 0.69\n", + " Metric: 0.68\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"stable\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,270 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,271 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (214, 2) executed in 0:00:00.009823\n", + "Executed 'Overconfidence on data slice “`text_length(transcription)` >= 2145.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", " Metric: 0.68\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`avg_word_length(transcription)` < 5.789”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,291 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,292 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (85, 2) executed in 0:00:00.006814\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"discharge\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: 
\n", " Test failed\n", - " Metric: 0.68\n", + " Metric: 0.67\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`text_length(transcription)` >= 2145.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,312 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,313 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (101, 2) executed in 0:00:00.007278\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"hospital\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", - " Metric: 0.68\n", + " Metric: 0.67\n", + " \n", + " \n", + "2024-05-29 13:49:55,334 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,335 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (52, 2) executed in 0:00:00.006081\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"continue\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", + " Test failed\n", + " Metric: 0.67\n", " \n", " \n", - "Executed 'Overconfidence on data slice “`transcription` contains \"discharge\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6569444444444444, 'p_threshold': 0.2468526289804987}: \n", + "2024-05-29 13:49:55,354 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,355 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset 
with shape (76, 2) executed in 0:00:00.005812\n", + "Executed 'Overconfidence on data slice “`transcription` contains \"vital\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}: \n", " Test failed\n", " Metric: 0.67\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"xyz\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,374 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,375 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (22, 2) executed in 0:00:00.005195\n", + "Executed 'Precision on data slice “`transcription` contains \"xyz\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.32\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"subjective\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,397 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,398 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (30, 2) executed in 0:00:00.004141\n", + "Executed 'Precision on data slice “`transcription` contains \"subjective\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.37\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"admission\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,417 pid:66538 MainThread 
giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,418 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (64, 2) executed in 0:00:00.005701\n", + "Executed 'Precision on data slice “`transcription` contains \"admission\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.38\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"daily\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,438 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,439 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (65, 2) executed in 0:00:00.005455\n", + "Executed 'Precision on data slice “`transcription` contains \"daily\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.38\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"coronary\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,460 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,462 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.006355\n", + "Executed 'Precision on data slice “`transcription` contains \"coronary\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", + " Test failed\n", + " Metric: 0.39\n", + " \n", + " \n", + "2024-05-29 13:49:55,482 pid:66538 
MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,484 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.005071\n", + "Executed 'Precision on data slice “`transcription` contains \"aspirin\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.39\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"followup\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,505 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,506 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (48, 2) executed in 0:00:00.005088\n", + "Executed 'Precision on data slice “`transcription` contains \"followup\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.4\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"lung\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,526 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,528 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (20, 2) executed in 0:00:00.005684\n", + "Executed 'Precision on data slice “`transcription` contains \"lung\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.4\n", " \n", " \n", - "Executed 'Precision on data slice 
“`transcription` contains \"count\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,549 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,550 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (50, 2) executed in 0:00:00.006254\n", + "Executed 'Precision on data slice “`transcription` contains \"count\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.4\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"function\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,581 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,593 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (29, 2) executed in 0:00:00.016641\n", + "Executed 'Precision on data slice “`transcription` contains \"function\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.41\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"abc\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,648 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,649 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (24, 2) executed in 0:00:00.008552\n", + "Executed 'Precision on data slice “`transcription` contains \"abc\"”' with arguments {'model': , 
'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.42\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"improved\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,679 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,681 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (31, 2) executed in 0:00:00.007550\n", + "Executed 'Precision on data slice “`transcription` contains \"improved\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.42\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"discharge\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,703 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,704 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (52, 2) executed in 0:00:00.006033\n", + "Executed 'Precision on data slice “`transcription` contains \"continue\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.42\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"greater\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,725 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,726 pid:66538 MainThread giskard.utils.logging_utils 
INFO Predicted dataset with shape (85, 2) executed in 0:00:00.006033\n", + "Executed 'Precision on data slice “`transcription` contains \"discharge\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", - " Metric: 0.43\n", + " Metric: 0.42\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"aspirin\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", + "2024-05-29 13:49:55,862 pid:66538 MainThread giskard.datasets.base INFO Casting dataframe columns from {'transcription': 'object'} to {'transcription': 'object'}\n", + "2024-05-29 13:49:55,863 pid:66538 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (23, 2) executed in 0:00:00.005574\n", + "Executed 'Precision on data slice “`transcription` contains \"greater\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}: \n", " Test failed\n", " Metric: 0.43\n", " \n", " \n", - "Executed 'Precision on data slice “`transcription` contains \"continue\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.581266846361186}: \n", - " Test failed\n", - " Metric: 0.44\n", - " \n", - " \n" + "2024-05-29 13:49:55,866 pid:66538 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 13:49:55,866 pid:66538 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 13:49:55,866 pid:66538 MainThread giskard.core.suite INFO Invariance to “Add typos” ({'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.9029649595687331}\n", + "2024-05-29 13:49:55,867 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"temperature\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): 
{failed, metric=0.7333333333333333}\n", + "2024-05-29 13:49:55,867 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"dr\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.7254901960784313}\n", + "2024-05-29 13:49:55,867 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"weight\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.71875}\n", + "2024-05-29 13:49:55,867 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"having\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.71875}\n", + "2024-05-29 13:49:55,868 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"today\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.717948717948718}\n", + "2024-05-29 13:49:55,868 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"follow\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.7027027027027027}\n", + "2024-05-29 13:49:55,869 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"blood\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.6973684210526315}\n", + "2024-05-29 13:49:55,869 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"distress\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 
0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.6944444444444444}\n", + "2024-05-29 13:49:55,870 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"stable\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.6808510638297872}\n", + "2024-05-29 13:49:55,870 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"mg\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.68}\n", + "2024-05-29 13:49:55,870 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`text_length(transcription)` >= 2145.000” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.6756756756756757}\n", + "2024-05-29 13:49:55,870 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"discharge\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.673469387755102}\n", + "2024-05-29 13:49:55,871 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"hospital\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.6666666666666666}\n", + "2024-05-29 13:49:55,871 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains \"continue\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.6666666666666666}\n", + "2024-05-29 13:49:55,871 pid:66538 MainThread giskard.core.suite INFO Overconfidence on data slice “`transcription` contains 
\"vital\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.6524137931034483, 'p_threshold': 0.2468526289804987}): {failed, metric=0.6666666666666666}\n", + "2024-05-29 13:49:55,871 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"xyz\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.3181818181818182}\n", + "2024-05-29 13:49:55,872 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"subjective\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.36666666666666664}\n", + "2024-05-29 13:49:55,873 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"admission\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.375}\n", + "2024-05-29 13:49:55,873 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"daily\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.38461538461538464}\n", + "2024-05-29 13:49:55,873 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"coronary\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.391304347826087}\n", + "2024-05-29 13:49:55,873 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"aspirin\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.391304347826087}\n", + "2024-05-29 13:49:55,873 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"followup\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.3958333333333333}\n", + "2024-05-29 13:49:55,874 pid:66538 
MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"lung\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.4}\n", + "2024-05-29 13:49:55,874 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"count\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.4}\n", + "2024-05-29 13:49:55,875 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"function\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.41379310344827586}\n", + "2024-05-29 13:49:55,875 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"abc\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.4166666666666667}\n", + "2024-05-29 13:49:55,876 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"improved\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.41935483870967744}\n", + "2024-05-29 13:49:55,876 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"continue\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.4230769230769231}\n", + "2024-05-29 13:49:55,877 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"discharge\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.4235294117647059}\n", + "2024-05-29 13:49:55,878 pid:66538 MainThread giskard.core.suite INFO Precision on data slice “`transcription` contains \"greater\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5787061994609164}): {failed, metric=0.43478260869565216}\n" ] }, 
{ "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Invariance to “Add typos”\n
\n \n Measured Metric = 0.91375\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n transformation_function\n Add typos\n
\n \n
\n threshold\n 0.95\n
\n \n
\n output_sensitivity\n 0.05\n
\n \n
\n
\n \n \n
\n Test Overconfidence on data slice “`transcription` contains "weight"”\n
\n \n Measured Metric = 0.74194\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "weight"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n
\n \n \n
\n Test Overconfidence on data slice “`transcription` contains "having"”\n
\n \n Measured Metric = 0.74194\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "having"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n
\n \n \n
\n Test Overconfidence on data slice “`transcription` contains "today"”\n
\n \n Measured Metric = 0.73684\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "today"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "temperature"”\n
\n \n Measured Metric = 0.73333\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "temperature"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "dr"”\n
\n \n Measured Metric = 0.72549\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "dr"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "follow"”\n
\n \n Measured Metric = 0.72222\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "follow"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`avg_whitespace(transcription)` >= 0.160”\n
\n \n Measured Metric = 0.71429\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `avg_whitespace(transcription)` >= 0.160\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "blood"”\n
\n \n Measured Metric = 0.69737\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "blood"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "distress"”\n
\n \n Measured Metric = 0.69444\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "distress"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "mg"”\n
\n \n Measured Metric = 0.69388\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "mg"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "continue"”\n
\n \n Measured Metric = 0.68966\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "continue"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "stable"”\n
\n \n Measured Metric = 0.68085\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "stable"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`avg_word_length(transcription)` < 5.789”\n
\n \n Measured Metric = 0.67647\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `avg_word_length(transcription)` < 5.789\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`text_length(transcription)` >= 2145.000”\n
\n \n Measured Metric = 0.67568\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `text_length(transcription)` >= 2145.000\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Overconfidence on data slice “`transcription` contains "discharge"”\n
\n \n Measured Metric = 0.67347\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "discharge"\n
\n \n
\n threshold\n 0.6569444444444444\n
\n \n
\n p_threshold\n 0.2468526289804987\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "xyz"”\n
\n \n Measured Metric = 0.31818\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "xyz"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "subjective"”\n
\n \n Measured Metric = 0.36667\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "subjective"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "admission"”\n
\n \n Measured Metric = 0.375\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "admission"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "daily"”\n
\n \n Measured Metric = 0.38462\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "daily"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "coronary"”\n
\n \n Measured Metric = 0.3913\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "coronary"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "followup"”\n
\n \n Measured Metric = 0.39583\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "followup"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "lung"”\n
\n \n Measured Metric = 0.4\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "lung"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "count"”\n
\n \n Measured Metric = 0.4\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "count"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "function"”\n
\n \n Measured Metric = 0.41379\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "function"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "abc"”\n
\n \n Measured Metric = 0.41667\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "abc"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "improved"”\n
\n \n Measured Metric = 0.41935\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "improved"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "discharge"”\n
\n \n Measured Metric = 0.42353\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "discharge"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "greater"”\n
\n \n Measured Metric = 0.43478\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "greater"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "aspirin"”\n
\n \n Measured Metric = 0.43478\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "aspirin"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`transcription` contains "continue"”\n
\n \n Measured Metric = 0.44231\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 082b3091-113e-4b56-b010-9902f9b9dcf8\n
\n \n
\n dataset\n medical_transcript_dataset\n
\n \n
\n slicing_function\n `transcription` contains "continue"\n
\n \n
\n threshold\n 0.581266846361186\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Invariance to “Add typos”\n", + "
\n", + " \n", + " Measured Metric = 0.90296\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " transformation_function\n", + " Add typos\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " output_sensitivity\n", + " 0.05\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "temperature"”\n", + "
\n", + " \n", + " Measured Metric = 0.73333\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "temperature"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "dr"”\n", + "
\n", + " \n", + " Measured Metric = 0.72549\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "dr"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "weight"”\n", + "
\n", + " \n", + " Measured Metric = 0.71875\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "weight"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "having"”\n", + "
\n", + " \n", + " Measured Metric = 0.71875\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "having"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "today"”\n", + "
\n", + " \n", + " Measured Metric = 0.71795\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "today"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "follow"”\n", + "
\n", + " \n", + " Measured Metric = 0.7027\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "follow"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "blood"”\n", + "
\n", + " \n", + " Measured Metric = 0.69737\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "blood"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "distress"”\n", + "
\n", + " \n", + " Measured Metric = 0.69444\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "distress"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "stable"”\n", + "
\n", + " \n", + " Measured Metric = 0.68085\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "stable"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "mg"”\n", + "
\n", + " \n", + " Measured Metric = 0.68\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "mg"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`text_length(transcription)` >= 2145.000”\n", + "
\n", + " \n", + " Measured Metric = 0.67568\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `text_length(transcription)` >= 2145.000\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "discharge"”\n", + "
\n", + " \n", + " Measured Metric = 0.67347\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "discharge"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "hospital"”\n", + "
\n", + " \n", + " Measured Metric = 0.66667\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "hospital"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "continue"”\n", + "
\n", + " \n", + " Measured Metric = 0.66667\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "continue"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`transcription` contains "vital"”\n", + "
\n", + " \n", + " Measured Metric = 0.66667\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "vital"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.6524137931034483\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.2468526289804987\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "xyz"”\n", + "
\n", + " \n", + " Measured Metric = 0.31818\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "xyz"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "subjective"”\n", + "
\n", + " \n", + " Measured Metric = 0.36667\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "subjective"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "admission"”\n", + "
\n", + " \n", + " Measured Metric = 0.375\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "admission"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "daily"”\n", + "
\n", + " \n", + " Measured Metric = 0.38462\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "daily"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "coronary"”\n", + "
\n", + " \n", + " Measured Metric = 0.3913\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "coronary"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "aspirin"”\n", + "
\n", + " \n", + " Measured Metric = 0.3913\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "aspirin"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "followup"”\n", + "
\n", + " \n", + " Measured Metric = 0.39583\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "followup"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "lung"”\n", + "
\n", + " \n", + " Measured Metric = 0.4\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "lung"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "count"”\n", + "
\n", + " \n", + " Measured Metric = 0.4\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "count"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "function"”\n", + "
\n", + " \n", + " Measured Metric = 0.41379\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "function"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "abc"”\n", + "
\n", + " \n", + " Measured Metric = 0.41667\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "abc"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "improved"”\n", + "
\n", + " \n", + " Measured Metric = 0.41935\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "improved"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "continue"”\n", + "
\n", + " \n", + " Measured Metric = 0.42308\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "continue"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "discharge"”\n", + "
\n", + " \n", + " Measured Metric = 0.42353\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "discharge"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`transcription` contains "greater"”\n", + "
\n", + " \n", + " Measured Metric = 0.43478\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " medical_transcript_classification\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " medical_transcript_dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `transcription` contains "greater"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5787061994609164\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, - "execution_count": 22, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" } @@ -766,126 +6801,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "id": "79d44e8a70afe38b", - "metadata": { - "collapsed": false - }, - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ] - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." 
- ], - "metadata": { - "collapsed": false - }, - "id": "750c995e5694fea7" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "92a34e0f12453610" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "f0e45c5b7f080e4c" - }, - { - "cell_type": "markdown", - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ], - "metadata": { - "collapsed": false - }, - "id": "8d18311f9990f724" - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2da07a815b6bcb09", - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, 
you can easily download them back into your environment. This allows you to:\n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ], - "metadata": { - "collapsed": false - }, - "id": "d7f52a115ae8d196" - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - }, - "id": "9a4aa5439a722398" } ], "metadata": { @@ -904,7 +6819,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.12" + "version": "3.10.14" }, "papermill": { "default_parameters": {}, diff --git a/docs/reference/notebooks/movie_review_sentiment_classification_pytorch_sklearn.ipynb b/docs/reference/notebooks/movie_review_sentiment_classification_pytorch_sklearn.ipynb index a5ccf6de34..3fe599fbcd 100644 --- a/docs/reference/notebooks/movie_review_sentiment_classification_pytorch_sklearn.ipynb +++ b/docs/reference/notebooks/movie_review_sentiment_classification_pytorch_sklearn.ipynb @@ -19,34 +19,29 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Install dependencies\n", "Make sure to install the `giskard`" - ], - "metadata": { - "collapsed": 
false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%pip install giskard --upgrade" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -77,8 +72,7 @@ "from sklearn.metrics import accuracy_score\n", "from sklearn.model_selection import train_test_split\n", "\n", - "from giskard import Model, Dataset, scan, testing, Suite\n", - "from giskard.client.giskard_client import GiskardClient" + "from giskard import Model, Dataset, scan, testing" ] }, { @@ -94,11 +88,11 @@ "cell_type": "code", "execution_count": 2, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:37:37.115118Z", "start_time": "2023-11-09T14:37:37.074171Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -176,13 +170,13 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:37:40.551228Z", "start_time": "2023-11-09T14:37:40.473965Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -201,13 +195,13 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:37:41.496296Z", "start_time": "2023-11-09T14:37:41.374398Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -353,23 +347,23 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Scan your model for vulnerabilities with Giskard\n", "\n", "Giskard's scan allows you to detect vulnerabilities in your model automatically. These include performance biases, unrobustness, data leakage, stochasticity, underconfidence, ethical issues, and more. 
For detailed information about the scan feature, please refer to our [scan documentation](https://docs.giskard.ai/en/stable/open_source/scan/scan_nlp/index.html)." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", @@ -384,17 +378,768 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 11, "metadata": { - "collapsed": false, "ExecuteTime": { "start_time": "2023-11-09T14:43:30.984093Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -415,47 +1160,207 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." 
- ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 12, "metadata": { - "collapsed": false, "ExecuteTime": { "start_time": "2023-11-09T14:43:31.178727Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Overconfidence on data slice “`avg_whitespace(text)` >= 0.172”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4033333333333333, 'p_threshold': 0.5}: \n", - " Test failed\n", - " Metric: 0.43\n", - " \n", - " \n", - "Executed 'Precision on data slice “`text` contains \"movie\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.808091286307054}: \n", + "2024-05-29 13:52:50,519 pid:67935 MainThread giskard.datasets.base INFO Casting dataframe columns from {'text': 'object'} to {'text': 'object'}\n", + "2024-05-29 13:52:50,520 pid:67935 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (62, 2) executed in 0:00:00.007627\n", + "Executed 'Precision on data slice “`text` contains \"movie\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8047520661157024}: \n", " Test failed\n", " Metric: 0.64\n", " \n", - " \n" + " \n", + "2024-05-29 13:52:50,530 pid:67935 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 13:52:50,536 pid:67935 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 13:52:50,543 pid:67935 MainThread giskard.core.suite INFO Precision on data slice “`text` contains \"movie\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.8047520661157024}): {failed, metric=0.6363636363636364}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Overconfidence on data slice “`avg_whitespace(text)` >= 0.172”\n
\n \n Measured Metric = 0.43103\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 47a0c059-70ff-456b-802d-ac8faf08081f\n
\n \n
\n dataset\n Movie reviews dataset\n
\n \n
\n slicing_function\n `avg_whitespace(text)` >= 0.172\n
\n \n
\n threshold\n 0.4033333333333333\n
\n \n
\n p_threshold\n 0.5\n
\n \n
\n
\n \n \n
\n Test Precision on data slice “`text` contains "movie"”\n
\n \n Measured Metric = 0.63636\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 47a0c059-70ff-456b-802d-ac8faf08081f\n
\n \n
\n dataset\n Movie reviews dataset\n
\n \n
\n slicing_function\n `text` contains "movie"\n
\n \n
\n threshold\n 0.808091286307054\n
\n \n
\n
\n \n
\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`text` contains "movie"”\n", + "
\n", + " \n", + " Measured Metric = 0.63636\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Movie reviews sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Movie reviews dataset\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `text` contains "movie"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.8047520661157024\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + "\n" + ], + "text/plain": [ + "" + ] }, - "execution_count": 11, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -494,118 +1399,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. 
Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to: \n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -624,7 +1417,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.10.14" }, "widgets": { "application/vnd.jupyter.widget-state+json": { diff --git a/docs/reference/notebooks/newspaper_classification_pytorch.ipynb b/docs/reference/notebooks/newspaper_classification_pytorch.ipynb index f6c28962b9..76c5a720fa 100644 --- a/docs/reference/notebooks/newspaper_classification_pytorch.ipynb +++ b/docs/reference/notebooks/newspaper_classification_pytorch.ipynb @@ -21,12 +21,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -41,7 +36,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-08-22T12:41:11.380265Z", @@ -67,11 +62,11 @@ "cell_type": "code", "execution_count": 1, "metadata": { - "id": "eup4gpgVoA10", "ExecuteTime": { "end_time": "2023-11-09T14:48:28.531176Z", "start_time": "2023-11-09T14:48:19.132247Z" - } + }, + "id": 
"eup4gpgVoA10" }, "outputs": [], "source": [ @@ -89,7 +84,7 @@ "from torchtext.vocab import build_vocab_from_iterator\n", "from torchtext.data.functional import to_map_style_dataset\n", "\n", - "from giskard import Model, Dataset, GiskardClient, scan, testing, Suite" + "from giskard import Model, Dataset, scan, testing" ] }, { @@ -105,11 +100,11 @@ "cell_type": "code", "execution_count": 2, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:48:29.990594Z", "start_time": "2023-11-09T14:48:29.949237Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -144,11 +139,11 @@ "cell_type": "code", "execution_count": 3, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:48:31.034568Z", "start_time": "2023-11-09T14:48:31.006365Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -167,13 +162,13 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:48:31.938024Z", "start_time": "2023-11-09T14:48:31.759090Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -272,20 +267,20 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T14:49:06.568434Z", "start_time": "2023-11-09T14:49:06.468046Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ "class TextClassificationModel(nn.Module):\n", " def __init__(self, vocab_size, embed_dim, num_class):\n", " super(TextClassificationModel, self).__init__()\n", - " self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse_output=False)\n", + " self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)\n", " self.fc = nn.Linear(embed_dim, num_class)\n", " self.init_weights()\n", "\n", @@ -436,23 +431,23 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], "metadata": { "collapsed": false - } + 
}, + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Scan your model for vulnerabilities with Giskard\n", "\n", "Giskard's scan allows you to detect vulnerabilities in your model automatically. These include performance biases, unrobustness, data leakage, stochasticity, underconfidence, ethical issues, and more. For detailed information about the scan feature, please refer to our [scan documentation](https://docs.giskard.ai/en/stable/open_source/scan/scan_nlp/index.html)." - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", @@ -469,16 +464,771 @@ "cell_type": "code", "execution_count": 11, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T15:20:30.991378Z", "start_time": "2023-11-09T15:20:30.768125Z" - } + }, + "collapsed": false }, "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -490,12 +1240,12 @@ }, { "cell_type": "markdown", - "source": [ - "## Generate comprehensive test suites automatically for your model" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "## Generate comprehensive test suites automatically for your model" + ] }, { "cell_type": "markdown", @@ -512,28 +1262,202 @@ "cell_type": "code", "execution_count": 12, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T15:20:39.594648Z", "start_time": "2023-11-09T15:20:36.942910Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + "2024-05-29 14:00:13,219 pid:68530 MainThread giskard.datasets.base INFO Casting dataframe columns from {'text': 'object'} to {'text': 'object'}\n", + "2024-05-29 14:00:13,225 pid:68530 MainThread 
giskard.utils.logging_utils INFO Predicted dataset with shape (7600, 2) executed in 0:00:00.016051\n", + "2024-05-29 14:00:13,689 pid:68530 MainThread giskard.datasets.base INFO Casting dataframe columns from {'text': 'object'} to {'text': 'object'}\n", + "2024-05-29 14:00:13,922 pid:68530 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (7600, 2) executed in 0:00:00.253734\n", + "2024-05-29 14:00:13,928 pid:68530 MainThread giskard.utils.logging_utils INFO Perturb and predict data executed in 0:00:00.729718\n", + "2024-05-29 14:00:13,930 pid:68530 MainThread giskard.utils.logging_utils INFO Compare and predict the data executed in 0:00:00.000818\n", + "Executed 'Invariance to “Add typos”' with arguments {'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", " Test failed\n", - " Metric: 0.89\n", - " - [TestMessageLevel.INFO] 7587 rows were perturbed\n", - " \n" + " Metric: 0.92\n", + " - [INFO] 7591 rows were perturbed\n", + " \n", + "2024-05-29 14:00:13,932 pid:68530 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 14:00:13,932 pid:68530 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 14:00:13,932 pid:68530 MainThread giskard.core.suite INFO Invariance to “Add typos” ({'model': , 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}): {failed, metric=0.917138716901594}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Invariance to “Add typos”\n
\n \n Measured Metric = 0.89073\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 94eabac3-e221-4832-b444-26794115dae9\n
\n \n
\n dataset\n Test Dataset\n
\n \n
\n transformation_function\n Add typos\n
\n \n
\n threshold\n 0.95\n
\n \n
\n output_sensitivity\n 0.05\n
\n \n
\n
\n \n
\n
\n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Invariance to “Add typos”\n", + "
\n", + " \n", + " Measured Metric = 0.91714\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Simple News Classification Model\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Test Dataset\n", + "
\n", + " \n", + "
\n", + " transformation_function\n", + " Add typos\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " output_sensitivity\n", + " 0.05\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + "
\n", + "\n" + ], + "text/plain": [ + "" + ] }, "execution_count": 12, "metadata": {}, @@ -574,118 +1498,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." 
- ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to: \n", - "\n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -708,7 +1520,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.10" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/tripadvisor_sentiment_classification.ipynb b/docs/reference/notebooks/tripadvisor_sentiment_classification.ipynb index 9e2cc242d3..177c145245 100644 --- a/docs/reference/notebooks/tripadvisor_sentiment_classification.ipynb +++ b/docs/reference/notebooks/tripadvisor_sentiment_classification.ipynb @@ -21,12 +21,7 @@ "Outline: \n", "\n", "* Detect vulnerabilities automatically with Giskard’s scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to: \n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -44,10 +39,10 @@ "execution_count": null, "id": "904bb40c24cd2d02", "metadata": { - "collapsed": false, "ExecuteTime": { "start_time": "2023-11-09T14:58:53.801518Z" - } + }, + "collapsed": false }, "outputs": [], "source": [ @@ -92,22 +87,30 @@ "from transformers import DistilBertForSequenceClassification, DistilBertTokenizer\n", "from typing import Union, List\n", "\n", - "from giskard import Dataset, Model, scan, Suite, 
GiskardClient, testing" + "from giskard import Dataset, Model, scan, testing" ] }, { "cell_type": "markdown", - "source": [ - "## Define constants" - ], + "id": "e25efcb2dd214fff", "metadata": { "collapsed": false }, - "id": "e25efcb2dd214fff" + "source": [ + "## Define constants" + ] }, { "cell_type": "code", "execution_count": 2, + "id": "9dc3ac4821372c97", + "metadata": { + "ExecuteTime": { + "end_time": "2023-11-09T15:22:42.596338Z", + "start_time": "2023-11-09T15:22:42.552639Z" + }, + "collapsed": false + }, "outputs": [], "source": [ "# Constants\n", @@ -123,19 +126,19 @@ "DATA_URL = \"ftp://sys.giskard.ai/pub/unit_test_resources/tripadvisor_reviews_dataset/{}\"\n", "DATA_PATH = Path.home() / \".giskard\" / \"tripadvisor_reviews_dataset\"\n", "DATA_FILE_NAME = \"tripadvisor_hotel_reviews.csv\"" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2023-11-09T15:22:42.596338Z", - "start_time": "2023-11-09T15:22:42.552639Z" - } - }, - "id": "9dc3ac4821372c97" + ] }, { "cell_type": "code", "execution_count": 3, + "id": "7d58054eb0568a57", + "metadata": { + "ExecuteTime": { + "end_time": "2023-11-09T15:22:46.542276Z", + "start_time": "2023-11-09T15:22:46.501005Z" + }, + "collapsed": false + }, "outputs": [], "source": [ "# Set random seeds\n", @@ -143,39 +146,35 @@ "np.random.seed(RANDOM_SEED)\n", "torch.manual_seed(RANDOM_SEED)\n", "torch.cuda.manual_seed_all(RANDOM_SEED)" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2023-11-09T15:22:46.542276Z", - "start_time": "2023-11-09T15:22:46.501005Z" - } - }, - "id": "7d58054eb0568a57" + ] }, { "cell_type": "markdown", - "source": [ - "## Dataset preparation" - ], + "id": "d77b1e9844a959b6", "metadata": { "collapsed": false }, - "id": "d77b1e9844a959b6" + "source": [ + "## Dataset preparation" + ] }, { "cell_type": "markdown", - "source": [ - "### Load data" - ], + "id": "dda9fcb9495b3477", "metadata": { "collapsed": false }, - "id": "dda9fcb9495b3477" + "source": [ 
+ "### Load data" + ] }, { "cell_type": "code", "execution_count": null, + "id": "e3c3e6a5", + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "nltk.download('stopwords')\n", @@ -285,11 +284,7 @@ " df.drop(columns=\"Rating\", inplace=True)\n", " df = text_preprocessor(df)\n", " return df" - ], - "metadata": { - "collapsed": false - }, - "id": "e3c3e6a5" + ] }, { "attachments": {}, @@ -303,7 +298,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "8223c2c2", "metadata": { "ExecuteTime": { @@ -451,7 +446,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "id": "931c5a5c", "metadata": {}, "outputs": [], @@ -468,25 +463,25 @@ }, { "cell_type": "markdown", - "source": [ - "## Detect vulnerabilities in your model" - ], + "id": "19aace313efe15af", "metadata": { "collapsed": false }, - "id": "19aace313efe15af" + "source": [ + "## Detect vulnerabilities in your model" + ] }, { "cell_type": "markdown", + "id": "cd3d8739ca1234a5", + "metadata": { + "collapsed": false + }, "source": [ "### Scan your model for vulnerabilities with Giskard\n", "\n", "Giskard's scan allows you to detect vulnerabilities in your model automatically. These include performance biases, unrobustness, data leakage, stochasticity, underconfidence, ethical issues, and more. For detailed information about the scan feature, please refer to our [scan documentation](https://docs.giskard.ai/en/stable/open_source/scan/scan_nlp/index.html)." 
- ], - "metadata": { - "collapsed": false - }, - "id": "cd3d8739ca1234a5" + ] }, { "cell_type": "code", @@ -500,7 +495,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 10, "id": "ecb49fa5", "metadata": { "ExecuteTime": { @@ -511,7 +506,2410 @@ "outputs": [ { "data": { - "text/html": "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -541,7 +2939,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 11, "id": "bea736a9", "metadata": { "ExecuteTime": { @@ -554,127 +2952,898 @@ "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Invariance to “Switch Religion”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", - " Test failed\n", - " Metric: 0.88\n", - " - [TestMessageLevel.INFO] 8 rows were perturbed\n", - " \n", - "Executed 'Invariance to “Switch countries from high- to low-income and vice versa”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", + "2024-05-29 14:07:23,236 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,237 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (54, 2) executed in 0:00:00.009851\n", + "Executed 'Precision on data slice “`Review` contains \"manager\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.85\n", - " - [TestMessageLevel.INFO] 137 rows were perturbed\n", + " Metric: 0.24\n", " \n", - "Executed 'Invariance to “Switch Gender”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 
'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", - " Test failed\n", - " Metric: 0.95\n", - " - [TestMessageLevel.INFO] 396 rows were perturbed\n", - " \n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/mykytaalekseiev/Work/GiskardDevelopVersion/giskard-client/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.\n", - " return bound(*args, **kwds)\n", - "/Users/mykytaalekseiev/Work/GiskardDevelopVersion/giskard-client/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2614: FutureWarning: The `pad_to_max_length` argument is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g. `max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).\n", - " warnings.warn(\n", - "/var/folders/4q/3_bfyqnn7yv5jcjq98x2jf680000gn/T/ipykernel_27805/3117328017.py:61: UserWarning: Implicit dimension choice for softmax has been deprecated. 
Change the call to include dim=X as an argument.\n", - " probs = torch.nn.functional.softmax(outputs.logits).detach().cpu().numpy()\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Executed 'Invariance to “Add typos”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'transformation_function': , 'threshold': 0.95, 'output_sensitivity': 0.05}: \n", - " Test failed\n", - " Metric: 0.67\n", - " - [TestMessageLevel.INFO] 999 rows were perturbed\n", " \n", - "Executed 'Precision on data slice “`Review` contains \"loved\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,259 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,260 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (56, 2) executed in 0:00:00.007041\n", + "Executed 'Precision on data slice “`Review` contains \"tiny\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.07\n", + " Metric: 0.3\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"complimentary\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,278 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,279 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (74, 2) executed in 0:00:00.005584\n", + "Executed 'Precision on data slice “`Review` contains \"said\"”' with arguments {'model': 
<__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.08\n", + " Metric: 0.39\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"wonderful\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,297 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,298 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (76, 2) executed in 0:00:00.005489\n", + "Executed 'Precision on data slice “`Review` contains \"air\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.09\n", + " Metric: 0.39\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"perfect\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,316 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,319 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (57, 2) executed in 0:00:00.009502\n", + "Executed 'Precision on data slice “`Review` contains \"elevator\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.1\n", + " Metric: 0.4\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"quarter\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 
0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,337 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,338 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (52, 2) executed in 0:00:00.006870\n", + "Executed 'Precision on data slice “`Review` contains \"reservation\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.1\n", + " Metric: 0.4\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"excellent\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,355 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,356 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (93, 2) executed in 0:00:00.005766\n", + "Executed 'Precision on data slice “`Review` contains \"bad\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.1\n", + " Metric: 0.43\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"french\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,374 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,375 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with 
shape (60, 2) executed in 0:00:00.006260\n", + "Executed 'Precision on data slice “`Review` contains \"hear\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.1\n", + " Metric: 0.43\n", " \n", " \n", - "Executed 'Precision on data slice “`avg_word_length(Review)` >= 5.983”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,392 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,393 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (53, 2) executed in 0:00:00.006900\n", + "Executed 'Precision on data slice “`Review` contains \"average\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.1\n", + " Metric: 0.43\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"fantastic\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,597 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,598 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (83, 2) executed in 0:00:00.006237\n", + "Executed 'Precision on data slice “`Review` contains \"work\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.11\n", + " Metric: 0.45\n", " \n", " \n", - "Executed 
'Precision on data slice “`Review` contains \"choice\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,615 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,616 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (61, 2) executed in 0:00:00.006405\n", + "Executed 'Precision on data slice “`Review` contains \"noisy\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.11\n", + " Metric: 0.48\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"beautiful\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,632 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,633 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (83, 2) executed in 0:00:00.006357\n", + "Executed 'Precision on data slice “`Review` contains \"times\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.11\n", + " Metric: 0.48\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"enjoyed\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,648 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to 
{'Review': 'object'}\n", + "2024-05-29 14:07:23,649 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (94, 2) executed in 0:00:00.006226\n", + "Executed 'Precision on data slice “`Review` contains \"days\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.11\n", + " Metric: 0.49\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"love\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,666 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,668 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (73, 2) executed in 0:00:00.006877\n", + "Executed 'Precision on data slice “`Review` contains \"know\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.12\n", + " Metric: 0.49\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"suite\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", + "2024-05-29 14:07:23,684 pid:70250 MainThread giskard.datasets.base INFO Casting dataframe columns from {'Review': 'object'} to {'Review': 'object'}\n", + "2024-05-29 14:07:23,685 pid:70250 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (81, 2) executed in 0:00:00.006585\n", + "Executed 'Precision on data slice “`Review` contains \"minutes\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 
'slicing_function': , 'threshold': 0.61275}: \n", " Test failed\n", - " Metric: 0.12\n", + " Metric: 0.49\n", " \n", " \n", - "Executed 'Precision on data slice “`Review` contains \"orleans\"”' with arguments {'model': <__main__.GiskardModelCustomWrapper object at 0x12dd82590>, 'dataset': , 'slicing_function': , 'threshold': 0.19474999999999998}: \n", - " Test failed\n", - " Metric: 0.12\n", - " \n", - " \n" + "2024-05-29 14:07:23,688 pid:70250 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 14:07:23,688 pid:70250 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 14:07:23,688 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"manager\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.24074074074074073}\n", + "2024-05-29 14:07:23,688 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"tiny\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.30357142857142855}\n", + "2024-05-29 14:07:23,689 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"said\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.3918918918918919}\n", + "2024-05-29 14:07:23,689 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"air\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.39473684210526316}\n", + "2024-05-29 14:07:23,689 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"elevator\"” ({'model': <__main__.GiskardModelCustomWrapper object at 
0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.40350877192982454}\n", + "2024-05-29 14:07:23,689 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"reservation\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.40384615384615385}\n", + "2024-05-29 14:07:23,690 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"bad\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.43010752688172044}\n", + "2024-05-29 14:07:23,690 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"hear\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.43333333333333335}\n", + "2024-05-29 14:07:23,690 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"average\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.4339622641509434}\n", + "2024-05-29 14:07:23,690 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"work\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.4457831325301205}\n", + "2024-05-29 14:07:23,691 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"noisy\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.47540983606557374}\n", + "2024-05-29 14:07:23,691 pid:70250 MainThread giskard.core.suite INFO Precision on data slice 
“`Review` contains \"times\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.4819277108433735}\n", + "2024-05-29 14:07:23,691 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"days\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.48936170212765956}\n", + "2024-05-29 14:07:23,691 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"know\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.4931506849315068}\n", + "2024-05-29 14:07:23,692 pid:70250 MainThread giskard.core.suite INFO Precision on data slice “`Review` contains \"minutes\"” ({'model': <__main__.GiskardModelCustomWrapper object at 0x3215bda50>, 'dataset': , 'slicing_function': , 'threshold': 0.61275}): {failed, metric=0.49382716049382713}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Invariance to “Switch Religion”\n
\n \n Measured Metric = 0.875\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n transformation_function\n Switch Religion\n
\n \n
\n threshold\n 0.95\n
\n \n
\n output_sensitivity\n 0.05\n
\n \n
\n
\n \n \n
\n Test Invariance to “Switch countries from high- to low-income and vice versa”\n
\n \n Measured Metric = 0.85401\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n transformation_function\n Switch countries from high- to low-income and vice versa\n
\n \n
\n threshold\n 0.95\n
\n \n
\n output_sensitivity\n 0.05\n
\n \n
\n
\n \n \n
\n Test Invariance to “Switch Gender”\n
\n \n Measured Metric = 0.94949\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n transformation_function\n Switch Gender\n
\n \n
\n threshold\n 0.95\n
\n \n
\n output_sensitivity\n 0.05\n
\n \n
\n
\n \n \n
\n Test Invariance to “Add typos”\n
\n \n Measured Metric = 0.66567\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n transformation_function\n Add typos\n
\n \n
\n threshold\n 0.95\n
\n \n
\n output_sensitivity\n 0.05\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "loved"”\n
\n \n Measured Metric = 0.07407\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "loved"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "complimentary"”\n
\n \n Measured Metric = 0.07547\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "complimentary"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "wonderful"”\n
\n \n Measured Metric = 0.09434\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "wonderful"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "perfect"”\n
\n \n Measured Metric = 0.09677\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "perfect"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "quarter"”\n
\n \n Measured Metric = 0.09804\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "quarter"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "excellent"”\n
\n \n Measured Metric = 0.10317\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "excellent"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "french"”\n
\n \n Measured Metric = 0.10345\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "french"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`avg_word_length(Review)` >= 5.983”\n
\n \n Measured Metric = 0.10471\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `avg_word_length(Review)` >= 5.983\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "fantastic"”\n
\n \n Measured Metric = 0.10714\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "fantastic"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "choice"”\n
\n \n Measured Metric = 0.10909\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "choice"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "beautiful"”\n
\n \n Measured Metric = 0.10989\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "beautiful"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "enjoyed"”\n
\n \n Measured Metric = 0.1134\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "enjoyed"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "love"”\n
\n \n Measured Metric = 0.11594\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "love"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "suite"”\n
\n \n Measured Metric = 0.11688\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "suite"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n \n
\n Test Precision on data slice “`Review` contains "orleans"”\n
\n \n Measured Metric = 0.12069\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 42c0d917-a7d4-43b9-a4ca-85b3f9309554\n
\n \n
\n dataset\n Trip advisor reviews sentiment\n
\n \n
\n slicing_function\n `Review` contains "orleans"\n
\n \n
\n threshold\n 0.19474999999999998\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "manager"”\n", + "
\n", + " \n", + " Measured Metric = 0.24074\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "manager"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "tiny"”\n", + "
\n", + " \n", + " Measured Metric = 0.30357\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "tiny"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "said"”\n", + "
\n", + " \n", + " Measured Metric = 0.39189\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "said"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "air"”\n", + "
\n", + " \n", + " Measured Metric = 0.39474\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "air"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "elevator"”\n", + "
\n", + " \n", + " Measured Metric = 0.40351\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "elevator"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "reservation"”\n", + "
\n", + " \n", + " Measured Metric = 0.40385\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "reservation"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "bad"”\n", + "
\n", + " \n", + " Measured Metric = 0.43011\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "bad"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "hear"”\n", + "
\n", + " \n", + " Measured Metric = 0.43333\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "hear"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "average"”\n", + "
\n", + " \n", + " Measured Metric = 0.43396\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "average"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "work"”\n", + "
\n", + " \n", + " Measured Metric = 0.44578\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "work"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "noisy"”\n", + "
\n", + " \n", + " Measured Metric = 0.47541\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "noisy"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "times"”\n", + "
\n", + " \n", + " Measured Metric = 0.48193\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "times"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "days"”\n", + "
\n", + " \n", + " Measured Metric = 0.48936\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "days"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "know"”\n", + "
\n", + " \n", + " Measured Metric = 0.49315\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "know"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Precision on data slice “`Review` contains "minutes"”\n", + "
\n", + " \n", + " Measured Metric = 0.49383\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " Trip advisor sentiment classifier\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " Trip advisor reviews sentiment\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `Review` contains "minutes"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.61275\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, - "execution_count": 15, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -711,122 +3880,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - }, - "id": "29b180b1bab598bf" - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." 
- ], - "metadata": { - "collapsed": false - }, - "id": "45cc31e6aa286189" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "4ac0c57148bca48a" - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - }, - "id": "ba2b6decf6ef3f9d" - }, - { - "cell_type": "markdown", - "id": "cf824254", - "metadata": {}, - "source": [ - "### Upload your test suite to the Giskard Hub\n", - "\n", - "The entry point to the Giskard Hub is the upload of your test suite. Uploading the test suite will automatically save the model, dataset, tests, slicing & transformation functions to the Giskard Hub." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8efd6bf3", - "metadata": {}, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "id": "639f0c2d048805be", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard 
Hub, you can easily download them back into your environment. This allows you to:\n", - " \n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - }, - "id": "30133c15995b7cb1" } ], "metadata": { @@ -845,7 +3898,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/notebooks/twitter_sentiment_analysis_roberta.ipynb b/docs/reference/notebooks/twitter_sentiment_analysis_roberta.ipynb index 9ea92c4248..33d9682192 100644 --- a/docs/reference/notebooks/twitter_sentiment_analysis_roberta.ipynb +++ b/docs/reference/notebooks/twitter_sentiment_analysis_roberta.ipynb @@ -23,12 +23,7 @@ "Outline:\n", "\n", "* Detect vulnerabilities automatically with Giskard's scan\n", - "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics\n", - "* Upload your model to the Giskard Hub to:\n", - "\n", - " * Debug failing tests & diagnose issues\n", - " * Compare models & decide which one to promote\n", - " * Share your results & collect feedback from non-technical team members" + "* Automatically generate & curate a comprehensive test suite to test your model beyond accuracy-related metrics" ] }, { @@ -67,7 +62,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 1, "id": "initial_id", "metadata": { "ExecuteTime": { @@ -85,7 +80,7 @@ "from datasets import load_dataset\n", "from transformers import AutoModelForSequenceClassification, AutoTokenizer\n", "\n", - "from giskard import Dataset, Model, scan, testing, 
GiskardClient, Suite " + "from giskard import Dataset, Model, scan, testing " ] }, { @@ -101,7 +96,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 2, "id": "856d520e800188b4", "metadata": { "ExecuteTime": { @@ -154,7 +149,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "id": "50d7817e0425d02c", "metadata": { "ExecuteTime": { @@ -183,7 +178,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "id": "9ff7255a655f3cc5", "metadata": { "ExecuteTime": { @@ -225,7 +220,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": null, "id": "e89a8d482a455064", "metadata": { "ExecuteTime": { @@ -254,7 +249,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": null, "id": "87c897d849e9e004", "metadata": { "ExecuteTime": { @@ -318,7 +313,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 8, "id": "7ae720b00159c8a6", "metadata": { "ExecuteTime": { @@ -332,7 +327,7 @@ { "data": { "text/html": [ - "\n" + "text/html": [ + "\n", + "" + ] }, "metadata": {}, "output_type": "display_data" @@ -433,98 +2824,794 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "### Generate test suites from the scan\n", "\n", "The objects produced by the scan can be used as fixtures to generate a test suite that integrate all detected vulnerabilities. Test suites allow you to evaluate and validate your model's performance, ensuring that it behaves as expected on a set of predefined test cases, and to identify any regressions or issues that might arise during development or updates." 
- ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 12, "metadata": { - "collapsed": false, "ExecuteTime": { "end_time": "2023-11-09T16:42:10.035419Z", "start_time": "2023-11-09T16:42:09.270253Z" - } + }, + "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Executed 'Overconfidence on data slice “`hours-per-week` < 41.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5045638359329867, 'p_threshold': 0.5}: \n", + "2024-05-29 14:14:33,648 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,654 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (6902, 10) executed in 0:00:00.041066\n", + "Executed 'Overconfidence on data slice “`hours-per-week` < 41.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4973034997131383, 'p_threshold': 0.5}: \n", " Test failed\n", - " Metric: 0.51\n", + " Metric: 0.5\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`relationship` == \"Husband\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.01384993346299519, 'p_threshold': 0.95}: \n", + "2024-05-29 14:14:33,676 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to 
{'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,684 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (879, 10) executed in 0:00:00.017064\n", + "Executed 'Underconfidence on data slice “`age` >= 41.500 AND `age` < 45.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.011710512846760161, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.02\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`age` >= 40.500 AND `age` < 56.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.01384993346299519, 'p_threshold': 0.95}: \n", + "2024-05-29 14:14:33,714 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,717 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (3923, 10) executed in 0:00:00.018021\n", + "Executed 'Underconfidence on data slice “`relationship` == \"Husband\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.011710512846760161, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.02\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`fnlwgt` < 128385.000 AND `fnlwgt` >= 99990.000”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.01384993346299519, 'p_threshold': 0.95}: \n", + "2024-05-29 14:14:33,742 pid:72955 MainThread 
giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,745 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (1335, 10) executed in 0:00:00.017211\n", + "Executed 'Underconfidence on data slice “`age` >= 48.500 AND `age` < 58.500”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.011710512846760161, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.02\n", " \n", " \n", - "Executed 'Underconfidence on data slice “`gender` == \"Male\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.01384993346299519, 'p_threshold': 0.95}: \n", + "2024-05-29 14:14:33,772 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,776 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (6485, 10) executed in 0:00:00.022294\n", + "Executed 'Underconfidence on data slice “`gender` == \"Male\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.011710512846760161, 'p_threshold': 0.95}: \n", " Test failed\n", " Metric: 0.01\n", " \n", " \n", - "Executed 'Recall on data 
slice “`relationship` == \"Own-child\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5277777777777778}: \n", + "2024-05-29 14:14:33,790 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,792 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (1498, 10) executed in 0:00:00.010130\n", + "Executed 'Recall on data slice “`relationship` == \"Own-child\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}: \n", " Test failed\n", - " Metric: 0.26\n", + " Metric: 0.3\n", " \n", " \n", - "Executed 'Recall on data slice “`workclass` == \"?\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5277777777777778}: \n", + "2024-05-29 14:14:33,807 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,809 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (576, 10) executed in 0:00:00.007495\n", + "Executed 'Recall on data slice “`workclass` == \"?\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 
'threshold': 0.5253512132822478}: \n", " Test failed\n", - " Metric: 0.33\n", + " Metric: 0.34\n", " \n", " \n", - "Executed 'Recall on data slice “`relationship` == \"Not-in-family\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5277777777777778}: \n", + "2024-05-29 14:14:33,827 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,829 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (2528, 10) executed in 0:00:00.013633\n", + "Executed 'Recall on data slice “`relationship` == \"Not-in-family\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}: \n", " Test failed\n", - " Metric: 0.36\n", + " Metric: 0.35\n", " \n", " \n", - "Executed 'Recall on data slice “`workclass` == \"Self-emp-not-inc\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5277777777777778}: \n", + "2024-05-29 14:14:33,849 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,850 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape 
(1076, 10) executed in 0:00:00.008265\n", + "Executed 'Recall on data slice “`relationship` == \"Unmarried\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}: \n", " Test failed\n", - " Metric: 0.39\n", + " Metric: 0.38\n", " \n", " \n", - "Executed 'Recall on data slice “`race` == \"Black\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5277777777777778}: \n", + "2024-05-29 14:14:33,865 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,867 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (935, 10) executed in 0:00:00.008937\n", + "Executed 'Recall on data slice “`race` == \"Black\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}: \n", " Test failed\n", - " Metric: 0.39\n", + " Metric: 0.38\n", " \n", " \n", - "Executed 'Recall on data slice “`relationship` == \"Unmarried\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5277777777777778}: \n", + "2024-05-29 14:14:33,880 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 
'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,882 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (721, 10) executed in 0:00:00.008114\n", + "Executed 'Recall on data slice “`workclass` == \"Self-emp-not-inc\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}: \n", " Test failed\n", - " Metric: 0.41\n", + " Metric: 0.39\n", " \n", " \n", - "Executed 'Recall on data slice “`gender` == \"Female\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5277777777777778}: \n", + "2024-05-29 14:14:33,902 pid:72955 MainThread giskard.datasets.base INFO Casting dataframe columns from {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'} to {'age': 'int64', 'workclass': 'object', 'fnlwgt': 'int64', 'relationship': 'object', 'race': 'object', 'gender': 'object', 'capital-gain': 'int64', 'capital-loss': 'int64', 'hours-per-week': 'int64'}\n", + "2024-05-29 14:14:33,905 pid:72955 MainThread giskard.utils.logging_utils INFO Predicted dataset with shape (3284, 10) executed in 0:00:00.014940\n", + "Executed 'Recall on data slice “`gender` == \"Female\"”' with arguments {'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}: \n", " Test failed\n", - " Metric: 0.5\n", + " Metric: 0.52\n", + " \n", " \n", - " \n" + "2024-05-29 14:14:33,917 pid:72955 MainThread giskard.core.suite INFO Executed test suite 'My first test suite'\n", + "2024-05-29 14:14:33,917 pid:72955 MainThread giskard.core.suite INFO result: failed\n", + "2024-05-29 14:14:33,917 pid:72955 MainThread giskard.core.suite INFO Overconfidence on data slice “`hours-per-week` < 41.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.4973034997131383, 'p_threshold': 0.5}): {failed, metric=0.5041237113402062}\n", + "2024-05-29 
14:14:33,918 pid:72955 MainThread giskard.core.suite INFO Underconfidence on data slice “`age` >= 41.500 AND `age` < 45.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.011710512846760161, 'p_threshold': 0.95}): {failed, metric=0.023890784982935155}\n", + "2024-05-29 14:14:33,918 pid:72955 MainThread giskard.core.suite INFO Underconfidence on data slice “`relationship` == \"Husband\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.011710512846760161, 'p_threshold': 0.95}): {failed, metric=0.02013764975783839}\n", + "2024-05-29 14:14:33,918 pid:72955 MainThread giskard.core.suite INFO Underconfidence on data slice “`age` >= 48.500 AND `age` < 58.500” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.011710512846760161, 'p_threshold': 0.95}): {failed, metric=0.01647940074906367}\n", + "2024-05-29 14:14:33,918 pid:72955 MainThread giskard.core.suite INFO Underconfidence on data slice “`gender` == \"Male\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.011710512846760161, 'p_threshold': 0.95}): {failed, metric=0.013415574402467233}\n", + "2024-05-29 14:14:33,919 pid:72955 MainThread giskard.core.suite INFO Recall on data slice “`relationship` == \"Own-child\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}): {failed, metric=0.2962962962962963}\n", + "2024-05-29 14:14:33,919 pid:72955 MainThread giskard.core.suite INFO Recall on data slice “`workclass` == \"?\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}): {failed, metric=0.3448275862068966}\n", + "2024-05-29 14:14:33,919 pid:72955 MainThread giskard.core.suite INFO Recall on data slice “`relationship` == \"Not-in-family\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}): {failed, metric=0.3490909090909091}\n", + "2024-05-29 14:14:33,919 pid:72955 MainThread giskard.core.suite INFO Recall on data slice “`relationship` == \"Unmarried\"” ({'model': , 
'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}): {failed, metric=0.38095238095238093}\n", + "2024-05-29 14:14:33,920 pid:72955 MainThread giskard.core.suite INFO Recall on data slice “`race` == \"Black\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}): {failed, metric=0.38333333333333336}\n", + "2024-05-29 14:14:33,920 pid:72955 MainThread giskard.core.suite INFO Recall on data slice “`workclass` == \"Self-emp-not-inc\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}): {failed, metric=0.391304347826087}\n", + "2024-05-29 14:14:33,920 pid:72955 MainThread giskard.core.suite INFO Recall on data slice “`gender` == \"Female\"” ({'model': , 'dataset': , 'slicing_function': , 'threshold': 0.5253512132822478}): {failed, metric=0.5231607629427792}\n" ] }, { "data": { - "text/plain": "", - "text/html": "\n\n\n\n\n\n
\n
\n
\n \n \n close\n \n \n Test suite failed.\n To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation)\n \n
\n
\n \n \n
\n Test Overconfidence on data slice “`hours-per-week` < 41.500”\n
\n \n Measured Metric = 0.50771\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `hours-per-week` < 41.500\n
\n \n
\n threshold\n 0.5045638359329867\n
\n \n
\n p_threshold\n 0.5\n
\n \n
\n
\n \n \n
\n Test Underconfidence on data slice “`relationship` == "Husband"”\n
\n \n Measured Metric = 0.02269\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `relationship` == "Husband"\n
\n \n
\n threshold\n 0.01384993346299519\n
\n \n
\n p_threshold\n 0.95\n
\n \n
\n
\n \n \n
\n Test Underconfidence on data slice “`age` >= 40.500 AND `age` < 56.500”\n
\n \n Measured Metric = 0.02032\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `age` >= 40.500 AND `age` < 56.500\n
\n \n
\n threshold\n 0.01384993346299519\n
\n \n
\n p_threshold\n 0.95\n
\n \n
\n
\n \n \n
\n Test Underconfidence on data slice “`fnlwgt` < 128385.000 AND `fnlwgt` >= 99990.000”\n
\n \n Measured Metric = 0.01793\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `fnlwgt` < 128385.000 AND `fnlwgt` >= 99990.000\n
\n \n
\n threshold\n 0.01384993346299519\n
\n \n
\n p_threshold\n 0.95\n
\n \n
\n \n \n \n
\n Test Underconfidence on data slice “`gender` == "Male"”\n
\n \n Measured Metric = 0.01496\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `gender` == "Male"\n
\n \n
\n threshold\n 0.01384993346299519\n
\n \n
\n p_threshold\n 0.95\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`relationship` == "Own-child"”\n
\n \n Measured Metric = 0.25926\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `relationship` == "Own-child"\n
\n \n
\n threshold\n 0.5277777777777778\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`workclass` == "?"”\n
\n \n Measured Metric = 0.32759\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `workclass` == "?"\n
\n \n
\n threshold\n 0.5277777777777778\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`relationship` == "Not-in-family"”\n
\n \n Measured Metric = 0.36364\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `relationship` == "Not-in-family"\n
\n \n
\n threshold\n 0.5277777777777778\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`workclass` == "Self-emp-not-inc"”\n
\n \n Measured Metric = 0.3913\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `workclass` == "Self-emp-not-inc"\n
\n \n
\n threshold\n 0.5277777777777778\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`race` == "Black"”\n
\n \n Measured Metric = 0.39167\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `race` == "Black"\n
\n \n
\n threshold\n 0.5277777777777778\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`relationship` == "Unmarried"”\n
\n \n Measured Metric = 0.4127\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `relationship` == "Unmarried"\n
\n \n
\n threshold\n 0.5277777777777778\n
\n \n
\n \n \n \n
\n Test Recall on data slice “`gender` == "Female"”\n
\n \n Measured Metric = 0.50409\n \n \n \n close\n \n \n Failed\n \n \n
\n
\n \n
\n model\n 018699ef-a664-48b8-b9a2-2db999094be4\n
\n \n
\n dataset\n salary_data\n
\n \n
\n slicing_function\n `gender` == "Female"\n
\n \n
\n threshold\n 0.5277777777777778\n
\n \n
\n \n \n\n \n\n" + "text/html": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
\n", + "
\n", + "
\n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Test suite failed.\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Overconfidence on data slice “`hours-per-week` < 41.500”\n", + "
\n", + " \n", + " Measured Metric = 0.50412\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `hours-per-week` < 41.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.4973034997131383\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.5\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Underconfidence on data slice “`age` >= 41.500 AND `age` < 45.500”\n", + "
\n", + " \n", + " Measured Metric = 0.02389\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `age` >= 41.500 AND `age` < 45.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.011710512846760161\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Underconfidence on data slice “`relationship` == "Husband"”\n", + "
\n", + " \n", + " Measured Metric = 0.02014\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `relationship` == "Husband"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.011710512846760161\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + "
\n", + " Test Underconfidence on data slice “`age` >= 48.500 AND `age` < 58.500”\n", + "
\n", + " \n", + " Measured Metric = 0.01648\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `age` >= 48.500 AND `age` < 58.500\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.011710512846760161\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Underconfidence on data slice “`gender` == "Male"”\n", + "
\n", + " \n", + " Measured Metric = 0.01342\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `gender` == "Male"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.011710512846760161\n", + "
\n", + " \n", + "
\n", + " p_threshold\n", + " 0.95\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`relationship` == "Own-child"”\n", + "
\n", + " \n", + " Measured Metric = 0.2963\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `relationship` == "Own-child"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5253512132822478\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`workclass` == "?"”\n", + "
\n", + " \n", + " Measured Metric = 0.34483\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `workclass` == "?"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5253512132822478\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`relationship` == "Not-in-family"”\n", + "
\n", + " \n", + " Measured Metric = 0.34909\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `relationship` == "Not-in-family"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5253512132822478\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`relationship` == "Unmarried"”\n", + "
\n", + " \n", + " Measured Metric = 0.38095\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `relationship` == "Unmarried"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5253512132822478\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`race` == "Black"”\n", + "
\n", + " \n", + " Measured Metric = 0.38333\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `race` == "Black"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5253512132822478\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`workclass` == "Self-emp-not-inc"”\n", + "
\n", + " \n", + " Measured Metric = 0.3913\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `workclass` == "Self-emp-not-inc"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5253512132822478\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + " Test Recall on data slice “`gender` == "Female"”\n", + "
\n", + " \n", + " Measured Metric = 0.52316\n", + " \n", + " \n", + " \n", + " close\n", + " \n", + " \n", + " Failed\n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + "
\n", + " model\n", + " salary_cls\n", + "
\n", + " \n", + "
\n", + " dataset\n", + " salary_data\n", + "
\n", + " \n", + "
\n", + " slicing_function\n", + " `gender` == "Female"\n", + "
\n", + " \n", + "
\n", + " threshold\n", + " 0.5253512132822478\n", + "
\n", + " \n", + "
\n", + " \n", + " \n", + "\n", + " \n", + "\n" + ], + "text/plain": [ + "" + ] }, - "execution_count": 14, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -563,107 +3650,6 @@ "source": [ "test_suite.add_test(testing.test_f1(model=giskard_model, dataset=giskard_dataset, threshold=0.7)).run()" ] - }, - { - "cell_type": "markdown", - "source": [ - "## Debug and interact with your tests in the Giskard Hub\n", - "\n", - "At this point, you've created a test suite that is highly specific to your domain & use-case. Failing tests can be a pain to debug, which is why we encourage you to head over to the Giskard Hub.\n", - "\n", - "Play around with a demo of the Giskard Hub on HuggingFace Spaces using [this link](https://huggingface.co/spaces/giskardai/giskard).\n", - "\n", - "More than just debugging tests, the Giskard Hub allows you to:\n", - "\n", - "* Compare models to decide which model to promote\n", - "* Automatically create additional domain-specific tests through our automated model insights feature\n", - "* Share your test results with team members and decision makers\n", - "\n", - "The Giskard Hub can be deployed easily on HuggingFace Spaces." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "Here's a sneak peek of automated model insights on a credit scoring classification model." 
- ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.09.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "![CleanShot 2023-09-26 at 18.38.50.png]()" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "# Create a Giskard client after having install the Giskard server (see documentation)\n", - "api_key = \"\" #This can be found in the Settings tab of the Giskard hub\n", - "#hf_token = \"\" #If the Giskard Hub is installed on HF Space, this can be found on the Settings tab of the Giskard Hub\n", - "\n", - "client = GiskardClient(\n", - " url=\"http://localhost:19000\", # Option 1: Use URL of your local Giskard instance.\n", - " # url=\"\", # Option 2: Use URL of your remote HuggingFace space.\n", - " key=api_key,\n", - " # hf_token=hf_token # Use this token to access a private HF space.\n", - ")\n", - "\n", - "project_key = \"my_project\"\n", - "my_project = client.create_project(project_key, \"PROJECT_NAME\", \"DESCRIPTION\")\n", - "\n", - "# Upload to the project you just created\n", - "test_suite.upload(client, project_key)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Download a test suite from the Giskard Hub\n", - "\n", - "After curating your test suites with additional tests on the Giskard Hub, you can easily download them back into your environment. 
This allows you to:\n", - " \n", - "- Check for regressions after training a new model\n", - "- Automate the test suite execution in a CI/CD pipeline\n", - "- Compare several models during the prototyping phase" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "test_suite_downloaded = Suite.download(client, project_key, suite_id=...)\n", - "test_suite_downloaded.run()" - ], - "metadata": { - "collapsed": false - } } ], "metadata": { @@ -675,14 +3661,14 @@ "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.6" + "pygments_lexer": "ipython3", + "version": "3.10.14" } }, "nbformat": 4, diff --git a/docs/reference/slicing-functions/index.rst b/docs/reference/slicing-functions/index.rst index 1d718c9b20..3f5abec88d 100644 --- a/docs/reference/slicing-functions/index.rst +++ b/docs/reference/slicing-functions/index.rst @@ -6,8 +6,6 @@ Slicing functions .. autoclass:: giskard.registry.slicing_function.SlicingFunction .. automethod:: execute - .. automethod:: upload - .. automethod:: download Textual slicing --------------- diff --git a/docs/reference/suite/index.rst b/docs/reference/suite/index.rst index ca1dffa54f..8ddc58994d 100644 --- a/docs/reference/suite/index.rst +++ b/docs/reference/suite/index.rst @@ -10,8 +10,6 @@ Test suite .. automethod:: remove_test .. automethod:: upgrade_test .. automethod:: update_test_params - .. automethod:: upload - .. automethod:: download .. autoclass:: giskard.core.suite.SuiteInput @@ -21,6 +19,4 @@ Test suite .. autoclass:: giskard.core.suite.TestSuiteResult - .. automethod:: upload - .. 
autoclass:: giskard.core.test_result.TestResult diff --git a/docs/reference/transformation-functions/index.rst b/docs/reference/transformation-functions/index.rst index 6528ccf7a1..5ad58538af 100644 --- a/docs/reference/transformation-functions/index.rst +++ b/docs/reference/transformation-functions/index.rst @@ -8,8 +8,6 @@ Transformation functions .. autoclass:: giskard.registry.transformation_function.TransformationFunction .. automethod:: execute - .. automethod:: upload - .. automethod:: download Textual transformation functions -------------------------------- diff --git a/giskard/visualization/templates/scan_report/html/_code_snippet.html b/giskard/visualization/templates/scan_report/html/_code_snippet.html index 3cae7daf1a..4bad732009 100644 --- a/giskard/visualization/templates/scan_report/html/_code_snippet.html +++ b/giskard/visualization/templates/scan_report/html/_code_snippet.html @@ -1,30 +1,12 @@
-

Debug your issues in the Giskard hub

- -

- Install the Giskard hub app to: -

-
    -
  • Debug and diagnose your scan issues
  • -
  • Save your scan result as a re-executable test suite to benchmark your model
  • -
  • Extend your test suite with our catalog of ready-to-use tests
  • -
-

- You can find installation instructions here. -

-
-
{% raw %}from giskard import GiskardClient
-
-# Create a test suite from your scan results
-test_suite = results.generate_test_suite("My first test suite")
-
-# Upload your test suite to your Giskard hub instance
-client = GiskardClient("http://localhost:19000", "GISKARD_API_KEY")
-client.create_project("my_project_id", "my_project_name")
-test_suite.upload(client, "my_project_id"){% endraw %}
-
-
+

What's next?

+
+

+ 1. Generate a test suite from your scan results +

+
{% raw %}test_suite = results.generate_test_suite("My first test suite"){% endraw %}
+
+

2. Run your test suite

+
{% raw %}test_suite.run(){% endraw %}
+
diff --git a/giskard/visualization/templates/suite_results/_suite_results_header.html b/giskard/visualization/templates/suite_results/_suite_results_header.html index 33c9c6b8c0..7bf6a24d42 100644 --- a/giskard/visualization/templates/suite_results/_suite_results_header.html +++ b/giskard/visualization/templates/suite_results/_suite_results_header.html @@ -1,19 +1,18 @@ -
+
{% if passed %} - + check - + Test suite passed. {% else %} - + close + d="M19,6.41L17.59,5L12,10.59L6.41,5L5,6.41L10.59,12L5,17.59L6.41,19L12,13.41L17.59,19L19,17.59L13.41,12L19,6.41Z" + /> Test suite failed. - To debug your failing test and diagnose the issue, please run the Giskard hub (see documentation) {% endif %}