{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Contents:\n", "- [Model interpretability](#Model-interpretability)\n", "  - [Interpretable models](#Interpretable-models)\n", "    - [Linear regression](#Linear-regression)\n", "  - [Model agnostic methods](#Model-agnostic-methods)\n", "    - [Feature Importance](#Feature-Importance)\n", "    - [Feature importance with pipelines](#Feature-importance-with-pipelines)\n", "    - [Partial dependence plots and Individual Conditional Expectations with sklearn](#Partial-dependance-plots-and-Individual-Conditional-Expectations-with-sklearn)\n", "    - [Partial dependence plots and Individual Conditional Expectations with pdpbox](#Partial-dependance-plots-and-Individual-Conditional-Expectations-with-pdpbox)\n", "    - [Local Interpretable Model-agnostic Explanations (LIME)](#Local-Interpretable-Model-agnostic-Explanations-(LIME))\n", "    - [SHAP (SHapley Additive exPlanations)](#SHAP-(SHapley-Additive-exPlanations))\n", "  - [Numerical and Categorical features in the same dataset](#Numerical-and-Categorical-features-in-the-same-dataset)\n", "    - [Feature Importance](#Feature-Importance)\n", "    - [PDP](#PDP)\n", "    - [LIME (optional)](#LIME-(optional))\n", "    - [SHAP](#SHAP)\n", "- [Bias and Fairness in Machine Learning](#Bias-and-Fairness-in-Machine-Learning)\n", "  - [Preprocessing and model training](#Preprocessing-and-model-training)\n", "  - [Bias Audit](#Bias-Audit)\n", "  - [A naive approach to Bias Mitigation](#A-naive-approach-to-Bias-Mitigation)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model interpretability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start with the basic imports."
] }, { "cell_type": "code", "execution_count": 256, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpretable models\n", "The easiest way to achieve interpretability is to use only a subset of algorithms that create interpretable models. Linear regression, logistic regression and the decision tree are commonly used interpretable models. Here we will quickly revisit the linear regression model." ] }, { "attachments": { "c7f95fb4-eee7-4e71-a3db-5ea0eee545e3.png": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAA7CAYAAABogb0zAAAQfklEQVR4Ae2dietVRRTH+58KyqSgwEopTFPUqFyyXbTU3EqjqBBbjGgn0Ygsk6xcoNAUicICbbe9rExsNVs0NZ34DByd33333XffvXPve/f3+w78eO/dbWa+c+Z8zzlz5v5OcypCQAgIASEgBHIgcFqOa3SJEBACQkAICAEnwpAQCAEhIASEQC4ERBi5YNJFQkAICAEhIMKQDAgBISAEhEAuBEQYuWDSRUJACAgBISDCkAwIASEgBIRALgREGLlg0kVCQAgIASEgwpAMCAEhIASEQC4ERBi5YNJFQkAICAEhIMKQDAgBISAEhEAuBEQYuWDSRUJACAgBISDCkAwIASEgBIRALgREGLlg0kVCQAgIASEgwpAMCAEhIASEQC4ERBi5YNJFQkAICAEhIMKQDAgBISAEhEAuBEQYuWA6ddGBAwfcM8884yZOnOhOP/10//ncc8+5Q4cOnbpokH07fvy427Vrl1uwYIE777zz/B/fP/30U3fixIlB1lt1p9cISN56PQLt6xdhtMem5QxK85JLLnH33nuv27t3r0Ow33vvPTdmzBi3ZMkS988//7Tc0/QD9Om+++5zl156qdu6dav7999/3V9//eUeeughTxzvvPNO07uo9vcRAp3kbceOHX3U2qHXFBFGzjH/+uuv3WWXXeZefPFF999//w246+mnn/bexptvvjngeNN/0M8nnnjCTZs2zf34448DuvPDDz94opw9e/agJMoBndWPWhAI5Q2DLCzI29ixYx3y9vfff4en9L1GBEQYOcDG6lm4cKG755573JEjR1ruePjhhz1hPPbYYy3nmnzg7bffdiNHjnQffvhhSze++eYbd/HFF7vRo0e77777ruW8DgiBbhEwefvggw9abkXe8O6Rtz179rSc14F6EBBh5MB58+bNXlAR2mQ5evSou/vuuz1hrFixInm6st/79u3zlv8nn3xSSR0HDx50N910k4MEkx4VFX700UfunHPOcePGjWvxPippUA8eWjXGPehS4SqRc0KTrN9VUfLI27nnnuvlLel9VNGeXjwTjO+//36Pcb+uDYowOkgG3sWtt97qHnzwwVTF+fvvv7upU6e6M88807311lsdnhbvtFn4O3fujPfQ4En0ZdSoUe6LL74Ijp76+uqrr3qSBBswGoylaoybhNnhw4fdbbfd5vCmqygmb59//nnq44eCvIHx7bff7jEWYaSKQf8fxJK+4IILvEWd1loEHbK4/vrrHVZSXaVKZWZeE54T35PFSJQssddffz15etD8rhLjpoFUJWHkkbd5
8+Z5A+W1115rGnS52yvCyA1V/164evVqd+ONN7o///yzpZFkC82ZM6cn2UJVKrOffvrJTZo0yW3ZsqWlzxxgcR+SHKyZYdbpKjG2OpryWSVh5JG3s846a9DL25AhDNwn0kunT5/uFcl1112XulD622+/+fjcpk2bUsM7/TZ5bJI88sgjfmGXmD6KEssar4NFOPYlYGXX7UJWqcwIc7G4+OWXX7o1a9b4vtJn+k6m2LBhw3y2CuNZprB35aWXXvJ7WXg2ckOaLhbnxo0bfRYWdZFwwHpC3aVKjOvuS9n6bC5UEZLqR3lLZgWWxS/P/VUSBuuQ4HznnXd6vcV85m/+/PldhZSjrGEwycnTJ8uBPGkaQlyf+L4VFCqKl3N5UzEJfbBRruxf0Rh7aPkQW73iiit8++kDihPSgDB6sRehSmVmXtWvv/7q02rpI33mc/z48VG8i19++cUvqrOnZf/+/d6DgxggDrwbUnnJvnrjjTd8vcTPmVB1lioxrrMfMeqqkjDqkrcZM2b4PVQmb4sWLRogb2RfIW/nn3++X6+pezNuVYTBloBrrrnGz2EiIu+///5JndrtnCpNGD///LO78sorHYtSFMIYKJdk9gwhHUI7nIM4Olnk5FrffPPN/nruKfMHWEXWF8zyQXGkFVMol19+uVd6addUdczqpo0xiymGdmOEpULmFOPxwgsvpI4j1zB+7caY1GRSlJMhLVvYJPuK1EoywIys2oUF6TsGQdpaS1lcqsK4bLt6cb/JRWwPw55bRt464YG8YZggb+EeDuTtjDPO8Nl+KFHkDbJAtpG3IjqjU1uyzoNF7EVvyIL9K/Tp8ccfd8eOHctqQsdzpQmDcAyEAXEwMLg8NI7PcM/CZ5995kaMGOHPtYuNd2xtzReQJpvlDWGBYBXT37BPf/zxh3v00Ud9WAdree7cua5d9kfRLlWlzHDFIfusTYh4VDapwrUdBJ5zhCazPALIgJBXmIEFubB7PHzu7t27vRdHWIoQVUhA7LLHA3zqqad8iCw2cTIuVWFcdMx7eZ8p9tiEkVfeUOxFlTjyRgQknINJeYMckLcLL7zQy9OGDRsGyFsd2McmDPqEV8WcIpyOXipbShPG999/7yc+A4ACuOiii7ybl0wxNeuRzV5MxH4vNkHaWT60365hQGwiEdcnY8qsZ3Ch75Bl2ga4djh0CscxCUh73b59+0n3Mi10h0IPFW27+ux4J6+K67iGPttYGg6s6RCK5FwWYRCO+vjjjwesY3XjgVI/JIKhQhuorwhh9Apjw7qfPvHQ0uTHjhHGIYWavRh2LO0TJZW2b6ddX/PKG4TBWGMxd1vayRtKFNnJmuPd1pV1fR6MyQYDY8L5afhyLC/GGPPgRh95G0XymczbbktpwggrxCKnccn1C4CyzW1ZFnv4rF5/z2P5MHDE2umz7fLm1SH8Dtc1IBGUW5YSDfuLZwbh8Jyyf7SPduYtnbwqnkPSAu1K2+VtZJK3r9auIh6oERVt6ZYweomx9bmfPm1My8oba1C8cy1vyStvKL6Yu7yRN9Yg6S8bc+soYGwKvAzOeTBG5xL2zaoPouyG3MEoGmFkWYimMAHJFGsdA1SmDhR+mkIMn2khC/pFSMoUGJZ/aAlZ6Mos8vAZRb9b3d0qyqz6rJ1ZY4S3gqDR57R1haKEYWtf3WBkeNOWmDgYRlVgbM9u2qdhbZ50jPYXkbdujJ+sNiJvKNOiXkvWs4ueA+NYaxjgdPXVV/t5itEYIxxFv6IRRmghhtY1ldhrJJjYWbHxEOheL3pj+UyePNm7hWG7wu/mTZA5xMvRcBe5J6n0bLLFVGxVKDPzqgihtSuW5EBfVq5c2RLuKkIYWDnLli3zwp30QDlH+IqwQrJUgWtYRxUYh89v0nfDOiZh5JW3q666ysuGyRsywb8UwDAjc5HF6lWrVvnsOo6xFpaVGcn9hH2QYeQtXAhPkzeyp2644Qa/II5eILQ8a9YsnzHIvznoxqPKGvOY
hIEumjJliu8j3j7kHKNEIwxTFEllSSNJm2NwkplTnTrQKcbcLsaXPJ4lPGlt4HoEiTBSu70GKE6YG/cQBYvlbQomiYFNNjDIUsZpbWl3zOqKaVlD5rRx3bp1qdXSx1deecVfQ+ZZmhI3OcgKSeGaswYBTuz1CEko6d3wPzfw9FizSZYQ15g4WD1VYIxssmeJTZ95Cov7GGO0he95Srd15HmmYR2TMLqVN+SEwlrp4sWLveJm3Yw5yH4hwoz2VlsjF65H3s4++2wvb9zLc4yE0uSNBXKypigQCAREW+k784N5D4kwHyAQDEbWcsuWmIQRhqT6kjDMi0iGY1AqlgOcpUTKgh3zfrN8EMQ0jwhCsTUGXv+NUFFMwTSVMBB+JgS52mkKDc+RNFf+/0cYcgux70QYYeiSZ2EdgjFYJwnVcG73lmBTYtzXBMJAmaFcaK95pSF2ye8oJBQf1/MXKsHktfa72zrsvk6fhnVMwsgjb6S5Im9fffXVySZyH0YoOocXEi5duvTkHIQssawtowp5s8Vt5A1vFXlj5ziYYgBZQd7uuOMOH/vn/75Q0AV4E+DKOizJKzzDCgYgoa0Y6yAxCYP2kXhEP8EvBqHxzGgehr0mg0F4/vnn/T/awXUjxdIEnkFuQkGgUPqkwwI2mxEhBVgbS5c+sWC2fv36k4JKv5pMGOZV3XLLLb7PDzzwgGPjHkqL7AomKV4BZJK167oTYZjiYfKySQpFQJIE3hykgeVIiAArEFJmsqd5MuBtz0K+mkAYeBZGjHzyO6skF+bBg2NZpds6sp4VnjOsYxFGGXnjXuYiypqxD712NnviIRAaRm5NCSNvrFsgb3gIobyhu0J5M0+G/jPvkUciDdyHrIZRB/Y20IaQeELcuvlubQVj5l3ZQtufffZZTxpkbn777bfeSwUXjhNOC4k4T33RCIPKWFgBQHMTAdf2ZbAZC4ugCQU3lf0VLBxBCvbvWJnk9AkBTbPAsW6auobBRCP0gxfB66PZBcskYzJAjlj55LF3Cot0IgzG36w28MQjZY8FCgACmTBhwsk6saizYq+mxJpCGCg6iHj48OH+k9+dCgSAAuSvE8HwrCJ1dGoD5w3rWIRRVt5Qhqx7YfETtrSCsYfFD7mGXgKbgMvIm2VVUadtfgNrDCieG2MdIzZhgAnEAymwcRG9zFzhk8SVIq+Jj0oYNmj2SWMtowZFC7P1e0EIWL/Aou622KRCiBEwK/bMZKjKzhf5NG8mlmXNROt2jSmt3XkII+2+IscM76YQRpE+9ss9hnUswigrb+gSLH7+LAMIfWNhLtY0YhYLPWHcWLH1ErwVNpGWLVUQRtk2Je+PQhhYPliGhDNCdw3XDjCZ0AxkDDcr2YHYv1HEWNpFFbG5yUwIK/ZOqphrODEJw4g9RvtEGDbqg+szJmHEkLfQ4sfboBBCvfbaa/3GWb7HKjyfrKqkN8NcZ42AXeExypAgDKxndoBCCqGlh1DwriGO5VngiwF4jGeg6HmdR1GLgTAWcXdCWmADDsQ3Ce8UJaG0fhH+Wr58eZTFLPOAYqwxEdJizOl/VjgprU/dHrM8fupLpnJ3+6y062NinPb8Jh1Daa5du9aHDcu2O4a82SY4PAzmKms7hJKTC+Rl28r94Z6Gl19+2c9psq14R1OY9FK2rpgYl21Lu/tLexgWt2fSsrBiHgZZNAweirKKydyuQ2WOx7B8qB8XmVg1nkpV75Iq08/kvWW9KrM+kYG0v5hESdvNi0mrK4aXlMRHv+MiUFbeUKy2fkHmEnqGv7vuuiszIaNoL8ybmTlzpjcGiabwwlFevUFbhlIpTRgwOwuiLKQALIui7777rs8ygix68b8iig6geUsxLO2ibejFfbyWvoxX1Ys2q87mIlBW3sziT2YsVYWIeTMxMqGqamNdzy1NGDQ0zI7C6iOr6Mknn/TZMHV1RPUIASEwNBAwi7/dvxCOiQIeBOsXZHl28/LQmG3op2dFIYx+6pDaIgSE
wOBGwBJL6ogEWDYWHjhv7B3qRYQx1CVA/RcCDUGAvU+sI4RrV7aju4oubNu2reVtrzF2dFfR1rqeKcKoC2nVIwSEgBBoOAIijIYPoJovBISAEKgLARFGXUirHiEgBIRAwxEQYTR8ANV8ISAEhEBdCIgw6kJa9QgBISAEGo6ACKPhA6jmCwEhIATqQkCEURfSqkcICAEh0HAERBgNH0A1XwgIASFQFwIijLqQVj1CQAgIgYYjIMJo+ACq+UJACAiBuhAQYdSFtOoRAkJACDQcARFGwwdQzRcCQkAI1IWACKMupFWPEBACQqDhCIgwGj6Aar4QEAJCoC4E/gcHFcrGJHyBXgAAAABJRU5ErkJggg==" } }, "cell_type": "markdown", "metadata": {}, "source": [ "### Linear regression\n", "Linear regression has the following form:\n", "

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
---|---|---|---|---|---|---|---|---|
0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 |
1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 |
2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 |
3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |

Dep. Variable: | MedHouseVal | R-squared (uncentered): | 0.145 |
---|---|---|---|
Model: | OLS | Adj. R-squared (uncentered): | 0.145 |
Method: | Least Squares | F-statistic: | 351.1 |
Date: | Fri, 04 Apr 2025 | Prob (F-statistic): | 0.00 |
Time: | 12:04:26 | Log-Likelihood: | -36399. |
No. Observations: | 16512 | AIC: | 7.281e+04 |
Df Residuals: | 16504 | BIC: | 7.288e+04 |
Df Model: | 8 | | |
Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
MedInc | 0.8544 | 0.027 | 31.400 | 0.000 | 0.801 | 0.908 |
HouseAge | 0.1225 | 0.019 | 6.453 | 0.000 | 0.085 | 0.160 |
AveRooms | -0.2944 | 0.048 | -6.128 | 0.000 | -0.389 | -0.200 |
AveBedrms | 0.3393 | 0.044 | 7.729 | 0.000 | 0.253 | 0.425 |
Population | -0.0023 | 0.018 | -0.127 | 0.899 | -0.038 | 0.033 |
AveOccup | -0.0408 | 0.017 | -2.380 | 0.017 | -0.074 | -0.007 |
Latitude | -0.8969 | 0.052 | -17.314 | 0.000 | -0.998 | -0.795 |
Longitude | -0.8698 | 0.051 | -17.101 | 0.000 | -0.970 | -0.770 |

Omnibus: | 3333.187 | Durbin-Watson: | 0.211 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 9371.466 |
Skew: | 1.071 | Prob(JB): | 0.00 |
Kurtosis: | 6.006 | Cond. No. | 6.54 |
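
The summary reports *R-squared (uncentered)*, which statsmodels prints when the design matrix has no constant column; `Df Residuals = 16512 - 8` confirms that only the eight features were fitted. If, as the coefficient sizes suggest, the features were standardized (mean zero), a model without an intercept cannot reproduce the mean of the target, so the uncentered R² stays low even though the slopes are sensible. The sketch below illustrates this on synthetic data (all numbers are made up, not the California housing data), using plain NumPy least squares in place of statsmodels:

```python
import numpy as np

def r2_scores(y, y_hat):
    """Return (centered R², uncentered R²) for the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    centered = 1 - ss_res / np.sum((y - y.mean()) ** 2)
    uncentered = 1 - ss_res / np.sum(y ** 2)  # the definition used when no constant is fitted
    return centered, uncentered

rng = np.random.default_rng(0)
n = 1_000
X = rng.normal(size=(n, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized features: each column has mean 0
# Synthetic target with a large nonzero mean (made-up coefficients)
y = 2.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# No intercept, analogous to sm.OLS(y, X) without sm.add_constant
beta_no_const, *_ = np.linalg.lstsq(X, y, rcond=None)
# Intercept column prepended, analogous to sm.OLS(y, sm.add_constant(X))
X_const = np.column_stack([np.ones(n), X])
beta_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)

r2_unc_no_const = r2_scores(y, X @ beta_no_const)[1]
r2_cen_const = r2_scores(y, X_const @ beta_const)[0]
print("slopes without intercept:", beta_no_const.round(3))
print("slopes with intercept:   ", beta_const[1:].round(3))
print(f"uncentered R² without intercept: {r2_unc_no_const:.3f}")
print(f"centered R² with intercept:      {r2_cen_const:.3f}")
```

Because standardized columns are orthogonal to the constant column, both fits give the same slopes; only the intercept (which absorbs the mean of `y`) and the R² definition differ. Passing `sm.add_constant(X)` to `sm.OLS` makes statsmodels report the usual centered R².
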

| | feature | coefficient |
---|---|---|
0 | MedInc | 0.854383 |
1 | HouseAge | 0.122546 |
2 | AveRooms | -0.294410 |
3 | AveBedrms | 0.339259 |
4 | Population | -0.002308 |
5 | AveOccup | -0.040829 |
6 | Latitude | -0.896929 |
7 | Longitude | -0.869842 |
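
Coefficients can only be compared across features when the features share a scale, so ranking them by absolute value is meaningful here only under the assumption that the inputs were standardized before fitting. A minimal sketch that ranks the coefficients from the table above by magnitude:

```python
# Coefficients copied from the table above
coefficients = {
    "MedInc": 0.854383,
    "HouseAge": 0.122546,
    "AveRooms": -0.294410,
    "AveBedrms": 0.339259,
    "Population": -0.002308,
    "AveOccup": -0.040829,
    "Latitude": -0.896929,
    "Longitude": -0.869842,
}

# Sort by absolute value: the magnitude gives the strength, the sign the direction
ranking = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, coef in ranking:
    direction = "increases" if coef > 0 else "decreases"
    print(f"{name:<11} {coef:+.4f}  (prediction {direction} as the feature grows, others fixed)")
```

Population ends up last, which agrees with the summary: its p-value of 0.899 means its coefficient is not distinguishable from zero.
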

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | radius error | texture error | perimeter error | area error | smoothness error | compactness error | concavity error | concave points error | symmetry error | fractal dimension error | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | 0.2419 | 0.07871 | 1.0950 | 0.9053 | 8.589 | 153.40 | 0.006399 | 0.04904 | 0.05373 | 0.01587 | 0.03003 | 0.006193 | 25.38 | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 |
1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | 0.5435 | 0.7339 | 3.398 | 74.08 | 0.005225 | 0.01308 | 0.01860 | 0.01340 | 0.01389 | 0.003532 | 24.99 | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 |
2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | 0.2069 | 0.05999 | 0.7456 | 0.7869 | 4.585 | 94.03 | 0.006150 | 0.04006 | 0.03832 | 0.02058 | 0.02250 | 0.004571 | 23.57 | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 |
3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | 0.2597 | 0.09744 | 0.4956 | 1.1560 | 3.445 | 27.23 | 0.009110 | 0.07458 | 0.05661 | 0.01867 | 0.05963 | 0.009208 | 14.91 | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 |
4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | 0.1809 | 0.05883 | 0.7572 | 0.7813 | 5.438 | 94.44 | 0.011490 | 0.02461 | 0.05688 | 0.01885 | 0.01756 | 0.005115 | 22.54 | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 |

RandomForestClassifier(random_state=42)
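
A random forest exposes impurity-based importances through its `feature_importances_` attribute. A model-agnostic alternative is permutation importance: shuffle one column at a time and measure how much the score drops. scikit-learn provides this as `sklearn.inspection.permutation_importance`; the NumPy sketch below only illustrates the idea, with a hypothetical `predict` function standing in for a fitted model:

```python
import numpy as np

def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is shuffled (illustrative only)."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between column j and y
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

# Hypothetical "model": predicts class 1 whenever feature 0 is positive, ignores feature 1
def predict(X):
    return (X[:, 0] > 0).astype(int)

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

importances = permutation_importance_sketch(predict, X, y)
print(importances)
```

The ignored feature gets an importance of exactly zero. With a real model the scores should be computed on held-out data, otherwise features the model has overfit on can look important.
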

Pipeline(steps=[('scaler', StandardScaler()),
                ('classifier', RandomForestClassifier(random_state=42))])
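
Partial dependence and ICE curves can be computed by hand: for each grid value of a feature, overwrite that column in every row and record the predictions. Keeping one curve per row gives the ICE curves, and averaging them gives the PDP. scikit-learn offers this through `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay`; the sketch below uses a hypothetical `predict` function rather than the fitted pipeline:

```python
import numpy as np

def ice_and_pdp(predict, X, feature, grid):
    """Return (ice, pdp): ice[i, k] is the prediction for row i with X[i, feature] = grid[k]."""
    ice = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value      # set the feature to the grid value for every row
        ice[:, k] = predict(X_mod)
    pdp = ice.mean(axis=0)             # the PDP is the average of the ICE curves
    return ice, pdp

# Hypothetical model with an interaction: the effect of feature 0 depends on feature 1
def predict(X):
    return X[:, 0] * X[:, 1]

X = np.array([[0.0, 1.0], [0.0, -1.0], [0.0, 2.0]])
grid = np.linspace(-1.0, 1.0, 5)
ice, pdp = ice_and_pdp(predict, X, feature=0, grid=grid)
print(ice)
print(pdp)
```

The second row's ICE curve slopes downward while the others slope upward; the PDP averages this heterogeneity away, which is why the two plots are usually shown together.
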

RandomForestClassifier(random_state=42)

| | Feature | Contribution |
---|---|---|
0 | worst radius > 18.55 | -0.149064 |
1 | worst area > 1033.50 | -0.128015 |
2 | worst perimeter > 124.95 | -0.077437 |
3 | worst concavity > 0.39 | -0.066762 |
4 | smoothness error <= 0.01 | -0.064571 |
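
The rule-style names such as `worst radius > 18.55` indicate that the explainer discretized the continuous features into bins. The core LIME idea can be sketched without the `lime` package: sample perturbations around the instance, weight them by proximity with an exponential kernel, and fit a weighted linear model whose coefficients act as local contributions. This sketch keeps the features continuous, uses a hypothetical `black_box` function, and its `scale` and `kernel_width` values are arbitrary choices for illustration; it is not the `lime` implementation:

```python
import numpy as np

def local_surrogate(predict, x, n_samples=2000, scale=1.0, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around x; return (intercept, coefficients)."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))  # perturbations around x
    y = predict(Z)                                             # black-box outputs
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)               # exponential proximity kernel
    A = np.column_stack([np.ones(n_samples), Z - x])           # intercept + centred features
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)  # weighted least squares
    return coef[0], coef[1:]

# Hypothetical black box: nonlinear in feature 0, linear in feature 1
def black_box(Z):
    return np.sin(Z[:, 0]) + 0.5 * Z[:, 1]

x0 = np.array([0.0, 1.0])
intercept, contributions = local_surrogate(black_box, x0)
print(intercept, contributions)
```

Near `x0` the surrogate's slope for feature 0 is close to the local derivative of `sin` (about 1), while feature 1's linear effect of 0.5 is recovered almost exactly; a narrower kernel makes the explanation more local but noisier.
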

| | age | workclass | education | capital-gain-category | hours-per-week | occupation | income |
---|---|---|---|---|---|---|---|
0 | 39 | State-gov | Bachelors | cat2 | 40 | Adm-clerical | <=50K |
1 | 50 | Self-emp-not-inc | Bachelors | cat1 | 13 | Exec-managerial | <=50K |
2 | 38 | Private | HS-grad | cat1 | 40 | Handlers-cleaners | <=50K |
3 | 53 | Private | 11th | cat1 | 40 | Handlers-cleaners | <=50K |
4 | 28 | Private | Bachelors | cat1 | 40 | Prof-specialty | <=50K |

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('one_hot',
                                                  OneHotEncoder(sparse_output=False),
                                                  [1, 5]),
                                                 ('ordinal',
                                                  OrdinalEncoder(categories=[['Preschool', '1st-4th', '5th-6th', '7th-8th', '9th',
                                                                              '10th', '11th', '12th', 'HS-grad', 'Prof-school',
                                                                              'Assoc-acdm', 'Assoc-voc', 'Some-college',
                                                                              'Bachelors', 'Masters', 'Doctorate'],
                                                                             ['cat1', 'cat2', 'cat3', 'cat4']]),
                                                  [2, 3])])),
                ('classifier', RandomForestClassifier(random_state=42))])
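
The `OrdinalEncoder` in this pipeline receives an explicit category order, so each education level is mapped to its position in that list and the capital-gain categories to 0 through 3. A plain-Python sketch of that mapping (not scikit-learn's implementation) makes the positions explicit:

```python
# Category orders copied from the OrdinalEncoder in the pipeline above
education_order = ['Preschool', '1st-4th', '5th-6th', '7th-8th', '9th', '10th', '11th',
                   '12th', 'HS-grad', 'Prof-school', 'Assoc-acdm', 'Assoc-voc',
                   'Some-college', 'Bachelors', 'Masters', 'Doctorate']
capital_gain_order = ['cat1', 'cat2', 'cat3', 'cat4']

def ordinal_map(order):
    """Map each category to its position in the given order."""
    return {category: position for position, category in enumerate(order)}

education_code = ordinal_map(education_order)
capital_gain_code = ordinal_map(capital_gain_order)

for level in ['HS-grad', 'Some-college', 'Assoc-acdm', 'Doctorate']:
    print(level, education_code[level])
print('cat4', capital_gain_code['cat4'])
```

If the order is meant to reflect years of schooling, it may be worth double-checking: the Adult dataset's own `education-num` column ranks `Prof-school` just below `Doctorate` and places `Some-college` below the two associate degrees.
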

| | age | workclass | education | capital-gain-category | hours-per-week | occupation |
---|---|---|---|---|---|---|
31630 | 41 | Private | Some-college | cat1 | 40 | Craft-repair |
30766 | 42 | Self-emp-not-inc | HS-grad | cat1 | 99 | Farming-fishing |
14330 | 67 | ? | Doctorate | cat4 | 5 | ? |
388 | 19 | Private | Some-college | cat1 | 16 | Adm-clerical |
1196 | 51 | Local-gov | Assoc-acdm | cat1 | 60 | Prof-specialty |

| | age | workclass | education | capital-gain-category | hours-per-week | occupation |
---|---|---|---|---|---|---|
31630 | 41 | 4 | 12.0 | 0.0 | 40 | 3 |
30766 | 42 | 6 | 8.0 | 0.0 | 99 | 5 |
14330 | 67 | 0 | 15.0 | 3.0 | 5 | 0 |
388 | 19 | 4 | 12.0 | 0.0 | 16 | 1 |
1196 | 51 | 2 | 10.0 | 0.0 | 60 | 10 |

| | Feature | Contribution |
---|---|---|
0 | capital-gain-category=cat1 | -0.303316 |
1 | age > 48.00 | 0.098196 |
2 | 40.00 < hours-per-week <= 45.00 | 0.080868 |
3 | occupation=Farming-fishing | -0.052247 |
4 | education=HS-grad | -0.051764 |
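
SHAP attributes a prediction to the features using Shapley values. For a handful of features they can be computed exactly by enumerating every coalition. The sketch below does so for a hypothetical `predict` function, filling in "absent" features from a single hypothetical `baseline` row; that is one simple choice of value function, whereas the `shap` library's explainers average over background data and use much faster algorithms:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, replacing absent features with baseline values."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their values from x, the rest from the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical model: additive in feature 0, interaction between features 1 and 2
def predict(z):
    return 3 * z[0] + 2 * z[1] * z[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(predict, x, baseline)
print(phi)
print(sum(phi), predict(x) - predict(baseline))
```

The additive feature receives its full effect, the interaction is split equally between the two features involved, and the values add up to the difference between the prediction and the baseline prediction (the efficiency property that SHAP plots rely on). The cost grows as 2^n, hence the approximations in practice.
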

| | minority | sex | ZIP | rent | education | age | income | loan_size | payment_timing | year | job_stability | default | occupation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | MT01RA | 0 | 5.007489 | 19.248863 | 14809.409378 | 9842.141099 | 1.554341 | 6 | 84.856189 | False | MZ01CD |
1 | 0 | 0 | MT01RA | 0 | 39.890228 | 55.478499 | 160198.976647 | 4893.175948 | 3.989138 | 28 | 77.270278 | False | MZ01CD |
2 | 0 | 1 | MT12RA | 0 | 4.950724 | 24.238907 | 15782.705073 | 7478.246814 | 3.526743 | 23 | 86.764053 | False | MZ01CD |
3 | 0 | 0 | MT01RA | 0 | 56.767243 | 38.685639 | 207510.864716 | 4932.583272 | 3.735728 | 6 | 57.549089 | True | MZ01CD |
4 | 0 | 0 | MT12RA | 1 | 61.487558 | 30.977675 | 211103.189551 | 7907.702555 | 3.520849 | 9 | 113.795162 | False | MZ01CD |

| | Sex | Minority | label_value | score |
---|---|---|---|---|
0 | F | Minor | 1 | 0 |
1 | M | NonMinor | 0 | 0 |
2 | F | Minor | 0 | 0 |
3 | M | Minor | 0 | 1 |
4 | M | NonMinor | 0 | 0 |

| | model_id | score_threshold | k | attribute_name | attribute_value | tpr | tnr | for | fdr | fpr | fnr | npv | precision | pp | pn | ppr | pprev | fp | fn | tn | tp | group_label_pos | group_label_neg | group_size | total_entities | prev |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | binary 0/1 | 4381 | Sex | F | 0.404462 | 0.591605 | 0.283671 | 0.719629 | 0.408395 | 0.595538 | 0.716329 | 0.280371 | 3556 | 5175 | 0.811687 | 0.407284 | 2559 | 1468 | 3707 | 997 | 2465 | 6266 | 8731 | 17522 | 0.282327 |
1 | 0 | binary 0/1 | 4381 | Sex | M | 0.093224 | 0.905899 | 0.290610 | 0.711515 | 0.094101 | 0.906776 | 0.709390 | 0.288485 | 825 | 7966 | 0.188313 | 0.093846 | 587 | 2315 | 5651 | 238 | 2553 | 6238 | 8791 | 17522 | 0.290411 |
2 | 0 | binary 0/1 | 4381 | Minority | Minor | 0.350962 | 0.640101 | 0.292208 | 0.715795 | 0.359899 | 0.649038 | 0.707792 | 0.284205 | 1798 | 3234 | 0.410409 | 0.357313 | 1287 | 945 | 2289 | 511 | 1456 | 3576 | 5032 | 17522 | 0.289348 |
3 | 0 | binary 0/1 | 4381 | Minority | NonMinor | 0.203257 | 0.791779 | 0.286464 | 0.719706 | 0.208221 | 0.796743 | 0.713536 | 0.280294 | 2583 | 9907 | 0.589591 | 0.206805 | 1859 | 2838 | 7069 | 724 | 3562 | 8928 | 12490 | 17522 | 0.285188 |
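
Each metric in this table is a ratio of confusion-matrix counts within one group. A minimal sketch of those definitions, checked against the counts reported for `Sex = F` (tp = 997, fp = 2559, tn = 3707, fn = 1468) and the k = 4381 predicted positives overall; the function name is hypothetical, not part of the Aequitas API:

```python
def group_metrics(tp, fp, tn, fn, total_predicted_positive):
    """Confusion-matrix ratios for one group, named as in the table above."""
    pp, pn = tp + fp, tn + fn              # predicted positives / negatives in the group
    group_size = pp + pn
    return {
        "tpr": tp / (tp + fn),
        "tnr": tn / (tn + fp),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "precision": tp / pp,
        "fdr": fp / pp,
        "npv": tn / pn,
        "for": fn / pn,
        "pprev": pp / group_size,               # share of the group predicted positive
        "ppr": pp / total_predicted_positive,   # group's share of all predicted positives
        "prev": (tp + fn) / group_size,         # actual positive rate in the group
    }

female = group_metrics(tp=997, fp=2559, tn=3707, fn=1468, total_predicted_positive=4381)
for name, value in female.items():
    print(f"{name:<9} {value:.6f}")
```

Note the difference between `pprev` (how often members of the group are flagged) and `ppr` (how much of the flagged population the group makes up); women are about 50% of the data here but receive about 81% of the positive predictions.
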

| | attribute_name | attribute_value | ppr_disparity | pprev_disparity | precision_disparity | fdr_disparity | for_disparity | fpr_disparity | fnr_disparity | tpr_disparity | tnr_disparity | npv_disparity |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Sex | F | 4.310303 | 4.339924 | 0.971875 | 1.011403 | 0.976124 | 4.339974 | 0.656763 | 4.338625 | 0.653059 | 1.009781 |
1 | Sex | M | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2 | Minority | Minor | 0.696090 | 1.727775 | 1.013951 | 0.994567 | 1.020050 | 1.728446 | 0.814614 | 1.726692 | 0.808434 | 0.991950 |
3 | Minority | NonMinor | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
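
Each disparity is a group's metric divided by the same metric for a reference group; here the reference groups are `M` and `NonMinor`, which is why their rows are all 1.0. A minimal sketch reproducing three entries from the group-metrics values (the helper name is hypothetical):

```python
def disparity(group_metric, reference_metric):
    """Ratio of a group's metric to the reference group's metric (1.0 means parity)."""
    return group_metric / reference_metric

# Values copied from the group-metrics table: F against the reference group M
ppr_f, ppr_m = 0.811687, 0.188313
fpr_f, fpr_m = 0.408395, 0.094101
# Minor against the reference group NonMinor
ppr_minor, ppr_nonminor = 0.410409, 0.589591

print(round(disparity(ppr_f, ppr_m), 4))              # compare with ppr_disparity for F
print(round(disparity(fpr_f, fpr_m), 4))              # compare with fpr_disparity for F
print(round(disparity(ppr_minor, ppr_nonminor), 4))   # compare with ppr_disparity for Minor
```

Small differences in the last digits come from starting with the rounded metrics rather than the raw counts.
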

| | attribute_name | attribute_value | tpr | tnr | for | fdr | fpr | fnr | npv | precision | ppr | pprev | prev | ppr_disparity | pprev_disparity | precision_disparity | fdr_disparity | for_disparity | fpr_disparity | fnr_disparity | tpr_disparity | tnr_disparity | npv_disparity | FNR Parity | Impact Parity | FPR Parity | TNR Parity | Equalized Odds | TPR Parity | NPV Parity | TypeII Parity | Supervised Fairness | Statistical Parity | TypeI Parity | Precision Parity | FDR Parity | Unsupervised Fairness | FOR Parity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Sex | F | 0.404462 | 0.591605 | 0.283671 | 0.719629 | 0.408395 | 0.595538 | 0.716329 | 0.280371 | 0.811687 | 0.407284 | 0.282327 | 4.310303 | 4.339924 | 0.971875 | 1.011403 | 0.976124 | 4.339974 | 0.656763 | 4.338625 | 0.653059 | 1.009781 | False | False | False | False | False | False | True | False | False | False | False | True | True | False | True |
1 | Sex | M | 0.093224 | 0.905899 | 0.290610 | 0.711515 | 0.094101 | 0.906776 | 0.709390 | 0.288485 | 0.188313 | 0.093846 | 0.290411 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
2 | Minority | Minor | 0.350962 | 0.640101 | 0.292208 | 0.715795 | 0.359899 | 0.649038 | 0.707792 | 0.284205 | 0.410409 | 0.357313 | 0.289348 | 0.696090 | 1.727775 | 1.013951 | 0.994567 | 1.020050 | 1.728446 | 0.814614 | 1.726692 | 0.808434 | 0.991950 | True | False | False | True | False | False | True | True | False | False | False | True | True | False | True |
3 | Minority | NonMinor | 0.203257 | 0.791779 | 0.286464 | 0.719706 | 0.208221 | 0.796743 | 0.713536 | 0.280294 | 0.589591 | 0.206805 | 0.285188 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | True | True | True | True | True | True | True | True | True | True | True | True | True | True | True |
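
The True/False columns are consistent with a parity band of 0.8 to 1/0.8 = 1.25 applied to each disparity (the threshold is an assumption inferred from the table, not read from the notebook's code), plus composite criteria built from the individual checks, for example Equalized Odds requiring both FPR and TPR parity. A sketch for the `Minor` group, with hypothetical names:

```python
TAU = 0.8  # assumed fairness threshold; the parity band is [TAU, 1 / TAU]

def is_fair(disparity, tau=TAU):
    """A disparity passes if it lies within [tau, 1 / tau]."""
    return tau <= disparity <= 1 / tau

# Disparities for the Minor group, copied from the disparity table
minor = {"fnr": 0.814614, "fpr": 1.728446, "tpr": 1.726692,
         "for": 1.020050, "fdr": 0.994567}

fnr_parity = is_fair(minor["fnr"])
fpr_parity = is_fair(minor["fpr"])
tpr_parity = is_fair(minor["tpr"])
equalized_odds = fpr_parity and tpr_parity                   # both error-rate checks
type_i_parity = fpr_parity and is_fair(minor["fdr"])         # false-positive side
type_ii_parity = fnr_parity and is_fair(minor["for"])        # false-negative side
supervised_fairness = type_i_parity and type_ii_parity

print(fnr_parity, equalized_odds, type_i_parity, type_ii_parity, supervised_fairness)
```

These match the `Minor` row: FNR Parity and TypeII Parity pass, while Equalized Odds, TypeI Parity and Supervised Fairness fail because the false positive rate is about 1.73 times that of `NonMinor`.
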

sex | 0.0 | 1.0 |
---|---|---|
occupation_MZ11CD | | |
0.0 | 26345 | 18778 |
1.0 | 0 | 7440 |

minority | 0.0 | 1.0 |
---|---|---|
occupation_MZ11CD | | |
0.0 | 37631 | 7492 |
1.0 | 0 | 7440 |
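
Both crosstabs show that `occupation_MZ11CD = 1` only occurs for rows with `sex = 1` and `minority = 1`, so this occupation indicator can stand in for both protected attributes even after they are dropped from the features. A minimal sketch that makes this explicit by normalizing each row of the counts above (the helper name is hypothetical):

```python
import pandas as pd

def row_shares(table):
    """Normalize each row of a count table so that it sums to 1."""
    return table.div(table.sum(axis=1), axis=0)

# Counts copied from the two crosstabs above
index = pd.Index([0.0, 1.0], name="occupation_MZ11CD")
sex_counts = pd.DataFrame([[26345, 18778], [0, 7440]], index=index,
                          columns=pd.Index([0.0, 1.0], name="sex"))
minority_counts = pd.DataFrame([[37631, 7492], [0, 7440]], index=index,
                               columns=pd.Index([0.0, 1.0], name="minority"))

print(row_shares(sex_counts).round(3))
print(row_shares(minority_counts).round(3))
```

On the raw data, assuming the data frame is called `df`, `pd.crosstab(df["occupation_MZ11CD"], df["sex"], normalize="index")` gives the same shares directly.
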