explainX 3.0 is a modern, LLM-native rewrite of the explainability engine. Train any machine-learning model, then let a human or an LLM agent inspect it: understand why a prediction was made, surface bias, find the minimal change that flips a decision, and feed those insights back into training.
Where the original explainX rendered a human-only Plotly dashboard on a 2020 dependency stack, this rewrite returns structured, machine-readable results (typed objects that serialize to JSON) plus a natural-language summary — usable from plain Python and over the Model Context Protocol (MCP) so agents like Claude can call it as tools while they build models.
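Here is a minimal standard-library sketch of the "typed objects that serialize to JSON" pattern. The `FeatureImportance` class and its fields are hypothetical and chosen for illustration. They are not explainX's actual result types, which may carry more fields.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FeatureImportance:
    """Hypothetical typed result: a method name plus per-feature scores."""

    method: str
    scores: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # asdict() recursively converts the dataclass into plain dicts/lists
        return asdict(self)

    def to_json(self) -> str:
        # sort_keys gives a stable, diff-friendly serialization
        return json.dumps(self.to_dict(), sort_keys=True)


result = FeatureImportance(method="permutation", scores={"income": 0.12, "age": 0.03})
payload = result.to_json()
restored = json.loads(payload)  # an agent can parse this back into plain data
```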
The goal: bring state-of-the-art explainability research into one place, with an interface designed for the era where LLMs train and debug models.
- Explain predictions — global feature importance + per-prediction reasoning.
- Debug models — counterfactuals and partial-dependence curves show what the model actually learned.
- Detect bias — group-fairness metrics (disparate impact, demographic parity, equal opportunity) answer "is my model rejecting one group regardless of profile?"
- Build trust — a plain-language summary for stakeholders and agents.
- Close the loop — results are structured so an LLM can read them and decide how to fix the training data or model.
A unified API over the methods the XAI literature identifies as the most deployed — SHAP, LIME, surrogate trees and counterfactuals — plus modern additions (ALE, anchors) and a 2024–2025 research frontier most tools skip: quantifying whether an explanation can be trusted.
| Capability | Method | Notes |
|---|---|---|
| Global importance | SHAP (auto) → permutation → intrinsic | SHAP used automatically when installed |
| Local explanation | SHAP (auto) → model-agnostic ablation | per-prediction signed contributions |
| Local surrogate | LIME (from scratch) | local weighted linear approximation |
| Sufficient rules | Anchors | high-precision IF-THEN rule per prediction |
| Counterfactuals | greedy model-agnostic search | smallest change that flips a decision |
| Global surrogate | decision tree + fidelity | inspectable glassbox rules + how faithful they are |
| Feature effects | PDP and ALE | ALE stays unbiased under feature correlation |
| Explanation quality | faithfulness + stability | does the explanation reflect the model, and is it robust? |
| Counterfactuals & recourse | greedy search with immutable / monotonic constraints | actionable "what to change" |
| Uncertainty | conformal prediction | distribution-free prediction sets / intervals with coverage guarantee |
| Fairness / bias | demographic parity, disparate impact (4/5 rule), equal opportunity | per sensitive attribute |
| Bias mitigation | post-processing per-group thresholds | detect → fix |
| Interactions | Friedman's H-statistic | which features matter together |
| Example-based | prototypes & criticisms (MMD) | representative vs. atypical cases |
| Metrics | classification + regression | accuracy/precision/recall/f1/auc, r²/mae/rmse |
| Monitoring | data drift (PSI + KS) | reference vs. current dataset |
| LLM narration | Claude (claude-opus-4-8) | plain-language briefings / Q&A grounded in the report |
| Reporting | HTML export + CLI + dashboard | shareable artifact; no-code usage |
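When SHAP is not installed, global importance falls back to permutation importance. The idea is model-agnostic: shuffle one feature column, re-score the model, and treat the drop in the score as that feature's importance. The sketch below uses only the standard library. It assumes the model is exposed as a plain `predict(rows)` function and that accuracy is the score. The toy model and data are made up, and explainX's own implementation may use a different score or number of repeats.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(rows):
    # Toy model: approves when feature 0 exceeds 0.5; feature 1 is ignored.
    return [int(row[0] > 0.5) for row in rows]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_predict(X)  # labels agree with the model, so baseline accuracy is 1.0
scores = permutation_importance(toy_predict, X, y)
```

The toy model ignores the second feature, so shuffling that column changes no prediction and its importance is exactly zero.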
Beyond explaining a model, explainX helps you make it more accurate — the data-centric-AI playbook, returned as actionable reports:
| Diagnostic | What it finds | Why it improves accuracy |
|---|---|---|
| Error analysis | data slices with the highest error (slice discovery) | tells you where to add data / features or split the model |
| Label issues | likely-mislabeled rows (confident learning) | cleaning labels is often the highest-ROI accuracy lever |
| Target leakage | features that alone predict the target | catches inflated offline accuracy that collapses in production |
| Calibration | ECE / Brier + reliability | flags untrustworthy probabilities and recommends a fix |
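ECE (expected calibration error) bins predictions by confidence. Within each bin it compares the mean predicted probability with the observed frequency, and it averages those gaps weighted by bin size. The Brier score is the mean squared error of the probabilities. Below is a minimal binary-classification sketch using equal-width bins on the positive-class probability. The 10-bin default is an assumption, not necessarily what `ex.calibration()` uses.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: size-weighted |mean predicted prob - observed positive rate| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_prob = sum(p for p, _ in b) / len(b)
        pos_rate = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(mean_prob - pos_rate)
    return ece


def brier_score(probs, labels):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


# Overconfident toy model: predicts 0.9 but is right only 60% of the time.
probs = [0.9] * 10
labels = [1] * 6 + [0] * 4
```

Here the model claims 90% confidence but is right only 60% of the time. ECE comes out at 0.3 and the Brier score at 0.33.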
ex.error_analysis() # ErrorAnalysis: worst slices + recommendation
ex.label_issues() # LabelIssues: rows to relabel (cross-validated)
ex.leakage() # LeakageReport: suspected leaky features
ex.calibration() # CalibrationReport: ECE, Brier, fix recommendation

explainX speaks the scikit-learn `predict` / `predict_proba` convention, so many
frameworks work with no wrapping: scikit-learn, XGBoost, LightGBM, and
CatBoost (via their sklearn-API estimators). For anything else, `wrap_model()`
adapts it: native XGBoost/LightGBM Boosters, Keras/TensorFlow, PyTorch,
statsmodels, or any custom prediction function:
from explainx import explain_model, wrap_model
explain_model(sklearn_or_xgb_or_lgbm_or_catboost_model, X, y) # direct
explain_model(wrap_model(keras_or_torch_model, task="classification"), X, y)
explain_model(wrap_model(predict_proba_fn=my_api_call, classes=[0, 1]), X, y)

Runnable examples for every framework live in `examples/` (one file per framework).
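An adapter like `wrap_model(predict_proba_fn=..., classes=...)` only has to expose the scikit-learn surface. The sketch below shows that general idea for a bare probability function: `predict_proba` passes rows through, and `predict` returns the highest-probability class. `ProbaFunctionAdapter` and the stand-in `my_api_call` are hypothetical. This is not explainX's implementation.

```python
from typing import Callable, Sequence


class ProbaFunctionAdapter:
    """Hypothetical adapter: wraps a raw probability function in the sklearn API."""

    def __init__(self, predict_proba_fn: Callable, classes: Sequence):
        self._fn = predict_proba_fn
        self.classes_ = list(classes)  # sklearn exposes the label order as classes_

    def predict_proba(self, X):
        # One row of class probabilities per input row, in classes_ order.
        return [list(row) for row in self._fn(X)]

    def predict(self, X):
        # Pick the class with the highest probability for each row.
        return [
            self.classes_[max(range(len(row)), key=row.__getitem__)]
            for row in self.predict_proba(X)
        ]


def my_api_call(X):
    # Stand-in for e.g. a remote scoring API: P(class 1) is x[0] clipped to [0, 1].
    probs = []
    for x in X:
        p1 = min(max(x[0], 0.0), 1.0)
        probs.append([1.0 - p1, p1])
    return probs


model = ProbaFunctionAdapter(my_api_call, classes=[0, 1])
```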
⚠️ `pip install explainx` does not give you 3.0 yet. This LLM-native rewrite (v3.0.0) has not been published to PyPI, so `pip install explainx` still installs the legacy 2.x package. Until the 3.0 release is on PyPI, install from GitHub or from source as shown below.
From PyPI (works once 3.0.0 is published):
pip install "explainx[all]" # core + SHAP + MCP + drift + LLM narration + dashboard
# or minimal:
pip install explainx # core only (extras optional)

From GitHub (installs the current 3.0 code on master):
pip install "git+https://github.com/explainX/explainx.git"
# with all optional extras:
pip install "explainx[all] @ git+https://github.com/explainX/explainx.git"

From source (for development / running the examples & tests):
git clone https://github.com/explainX/explainx.git
cd explainx
pip install -e ".[all]" # editable install with every extra
pytest # run the test suite

Extras can be combined or used individually: `shap`, `mcp`, `drift`, `llm`,
`dashboard`, or `all` (e.g. `pip install "explainx[shap,dashboard]"`).
from explainx import explain_model
report = explain_model(
model, X_test, y_test,
sensitive_features=["gender"], # run bias analysis on these columns
n_local=3, # explain a few individual predictions
)
print(report.summary) # natural-language briefing for a human/LLM
report.to_dict() # full structured result (JSON-ready)
report.to_json()

Need finer control? Use the stateful explainer:
from explainx import ModelExplainer
ex = ModelExplainer(model, X_test, y_test)
ex.metrics() # ModelMetrics
ex.importance() # GlobalImportance (SHAP when available)
ex.explain(index=0, top_k=5) # LocalExplanation (SHAP/ablation)
ex.lime(index=0) # LocalExplanation (LIME)
ex.anchor(index=0) # Anchor: high-precision sufficient rule
ex.fairness("gender") # FairnessReport
ex.counterfactual(index=0) # Counterfactual: minimal flip
ex.recourse(index=0, immutable_features=["age", "gender"]) # actionable recourse
ex.surrogate() # SurrogateExplanation: glassbox tree + fidelity
ex.partial_dependence("income") # PartialDependence curve
ex.ale("income") # ALEResult: correlation-robust effect
ex.explanation_quality(index=0) # ExplanationQuality: faithfulness + stability
ex.conformal(X_cal, y_cal, X_test) # ConformalResult: guaranteed-coverage sets/intervals
ex.mitigate_bias("gender") # MitigationResult: per-group thresholds that fix parity
ex.interactions(top_k=5) # InteractionResult: Friedman H-statistic
ex.prototypes() # PrototypesResult: representative + atypical rows

For a plain-language briefing, have Claude narrate the report:

from explainx import explain_model
from explainx.narrate import narrate_report # needs: pip install "explainx[llm]"
report = explain_model(model, X_test, y_test, sensitive_features=["gender"])
print(narrate_report(report, question="Why was applicant 5 rejected, and what would change it?"))

The engine computes the evidence (SHAP, fairness, counterfactuals, conformal sets), and Claude only narrates it. The numbers stay in the engine and the prose comes from the LLM, so the explanation is grounded in computed results rather than invented.
from explainx import detect_drift, save_html
detect_drift(reference_df, current_df) # DriftReport (PSI + KS per feature)
save_html(report, "report.html") # shareable page; embeds the full JSON

To launch the no-code dashboard:

pip install "explainx[dashboard]"
explainx-dashboard

This opens a Streamlit app. Upload a fitted model and a dataset, then run any module (importance, local/LIME/anchor, counterfactual & recourse, PDP/ALE, interactions, fairness, mitigation, conformal, prototypes, quality, drift) or the full report. The app shows live tables and charts, and you can download the results as HTML/JSON.
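The PSI half of `detect_drift` compares how a feature's values spread across bins in the reference data and in the current data: PSI = Σ (cur% − ref%) · ln(cur% / ref%). The stdlib sketch below puts bin edges at reference quantiles. The bin count and the small epsilon that guards empty bins are assumptions, not explainX's documented settings.

```python
import bisect
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population stability index of `current` relative to `reference`."""
    ref_sorted = sorted(reference)
    # Interior bin edges at reference quantiles (roughly equal reference mass per bin).
    edges = [ref_sorted[len(ref_sorted) * k // n_bins] for k in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_left(edges, v)] += 1  # bin index = number of edges below v
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = [i / 1000 for i in range(1000)]
shifted = [v + 0.5 for v in reference]  # same shape, moved to the right
```

Identical samples give a PSI of exactly 0. The shifted sample empties the lower bins and piles into the top bin, which drives PSI far above zero.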
Dashboard screenshots: Global importance, Local explanation, Fairness (bias detected), Full report.
explainx report --model m.joblib --data d.csv --target y --sensitive gender --html out.html
explainx bias --model m.joblib --data d.csv --target y --sensitive gender
explainx drift --reference train.csv --current prod.csv

To run the bundled demo:

python -m explainx.examples.demo

The demo trains a deliberately gender-biased loan model and shows the fairness check firing, plus a counterfactual that flips a rejection to an approval.
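The demo's counterfactual comes from a greedy, model-agnostic search. The sketch below shows the general idea under simple assumptions. At each step it tries one fixed-size change per mutable feature and keeps the change that most raises the positive-class probability. It stops once the decision flips. The toy logistic scorer, feature names, step sizes, and the `immutable` argument are illustrative. They are not explainX's actual search or signature.

```python
import math


def greedy_counterfactual(predict_proba, x, steps, immutable=(), max_iter=100):
    """Greedily edit features of dict `x` until P(positive) >= 0.5.

    predict_proba(row) -> probability of the positive class for one row.
    steps maps feature name -> step size (its sign sets the allowed direction).
    Returns the modified row, or None if no flip is found within max_iter.
    """
    current = dict(x)
    for _ in range(max_iter):
        if predict_proba(current) >= 0.5:
            return current
        best, best_p = None, predict_proba(current)
        for feat, step in steps.items():
            if feat in immutable:
                continue  # never touch protected or unchangeable attributes
            candidate = dict(current, **{feat: current[feat] + step})
            p = predict_proba(candidate)
            if p > best_p:
                best, best_p = candidate, p
        if best is None:
            return None  # no single step improves the score
        current = best
    return None


def toy_proba(row):
    # Toy loan model: logistic score on credit_score and debt_ratio only.
    z = 0.02 * (row["credit_score"] - 650) - 5.0 * (row["debt_ratio"] - 0.4)
    return 1 / (1 + math.exp(-z))


applicant = {"credit_score": 560.0, "debt_ratio": 0.5, "gender": 0}
cf = greedy_counterfactual(
    toy_proba,
    applicant,
    steps={"credit_score": 20.0, "debt_ratio": -0.05, "gender": 1},
    immutable=("gender",),
)
```

Holding `gender` immutable mirrors the recourse setting, so only `credit_score` and `debt_ratio` can move. Here the search raises `credit_score` from 560 to 680, because each credit step adds more probability than each debt step.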
Start the server (stdio transport):
explainx-mcp # installed console script
# or: python -m explainx.mcp_server

Register it with an MCP client (e.g. Claude Desktop / Claude Code):
{
"mcpServers": {
"explainx": { "command": "explainx-mcp" }
}
}

The agent saves a fitted model and dataset to disk, then calls tools by path:
| Tool | Purpose |
|---|---|
| `explain_model` | Full report (metrics, importance, local, fairness, surrogate, quality) |
| `feature_importance` | Global importance ranking |
| `explain_prediction` | Why one row was predicted as it was (SHAP/ablation) |
| `lime_explain_prediction` | Local LIME explanation for one row |
| `anchor_rule` | High-precision sufficient rule for one row |
| `counterfactual` | Minimal change that flips a row's class |
| `surrogate_rules` | Glassbox decision-tree rules + fidelity |
| `check_bias` | Group-fairness analysis on a sensitive feature |
| `model_metrics` | Performance metrics |
| `partial_dependence` | Marginal effect curve for a feature |
| `accumulated_local_effects` | Correlation-robust effect curve (ALE) |
| `explanation_quality` | Faithfulness + stability of an explanation |
| `conformal_prediction` | Guaranteed-coverage prediction sets / intervals |
| `actionable_recourse` | Minimal flip respecting immutable features |
| `mitigate_bias` | Per-group thresholds that equalize selection rate |
| `feature_interactions_tool` | Strongest pairwise interactions (H-statistic) |
| `prototypes_and_criticisms_tool` | Representative + atypical rows |
| `detect_data_drift` | Distribution drift between two datasets |
| `error_analysis` | Worst-performing data slices (slice discovery) |
| `label_issues` | Likely-mislabeled rows (confident learning) |
| `detect_target_leakage` | Features that leak the target |
| `assess_calibration` | Probability calibration (ECE / Brier) |
| `html_report` | Write a shareable HTML report |
Each tool returns a JSON-ready dict the agent can reason over. For example,
the agent can read a `disparate_impact_ratio` below 0.8, conclude the model is
biased, and rebalance the training data.
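A `disparate_impact_ratio` like the one the agent reads can be computed from predictions and the sensitive column alone. It is the lowest group selection rate divided by the highest, and the four-fifths rule flags values below 0.8. The demographic parity gap is the difference between those two rates. A minimal sketch follows. The function name and dictionary keys are illustrative, not the tool's exact output schema.

```python
from collections import defaultdict


def group_fairness(y_pred, sensitive, positive=1):
    """Selection rate per group, disparate impact ratio, and parity gap."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(y_pred, sensitive):
        totals[group] += 1
        positives[group] += pred == positive  # bool counts as 0/1
    rates = {g: positives[g] / totals[g] for g in totals}
    lo, hi = min(rates.values()), max(rates.values())
    return {
        "selection_rates": rates,
        "disparate_impact_ratio": lo / hi if hi > 0 else 1.0,
        "demographic_parity_gap": hi - lo,
        "biased": hi > 0 and lo / hi < 0.8,  # four-fifths rule
    }


# Group "a" is approved 2/10 times, group "b" 6/10 times.
y_pred = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4
groups = ["a"] * 10 + ["b"] * 10
report = group_fairness(y_pred, groups)
```

The demo's selection rates are 22.6% and 61.3%. The same formula gives 0.226 / 0.613 ≈ 0.37, the ratio reported in the demo summary.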
# what the agent does first:
import joblib
joblib.dump(model, "model.joblib")
df.to_csv("data.csv", index=False) # features + target column
# then it calls: check_bias(model_path="model.joblib", data_path="data.csv",
# sensitive_feature="gender", target_column="approved")

All outputs below come from the bundled demo, a deliberately gender-biased
loan-approval model. Reproduce them with `python docs/generate_examples.py`.
Natural-language summary (explain_model(...).summary):
Model `RandomForestClassifier` is a classification model evaluated on 800 samples across 4 features.
Performance: accuracy=1.000, precision=1.000, recall=1.000, f1=1.000, roc_auc=1.000.
The most influential features (via shap_mean_abs) are: credit_score (0.257), gender (0.168), debt_ratio (0.128), income (0.062).
A depth-4 decision-tree surrogate reproduces the model with accuracy=0.896 fidelity, giving an inspectable rule set.
Explanation quality (shap): faithfulness=1.00, stability=0.97 (higher is more trustworthy; ~1.0 is excellent).
Fairness on `gender`: BIAS DETECTED. Disparate impact ratio 0.37 is below the 0.8 four-fifths threshold:
group '0' receives the positive outcome (1) at 22.6% vs '1' at 61.3%. Demographic parity gap of 38.7%.
Recommended next steps: rebalance/reweight the training data across the sensitive groups, consider
removing or decorrelating proxy features, or apply a fairness constraint, then re-evaluate.
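The surrogate's fidelity is its agreement rate with the model it imitates. That is why a surrogate is fit to the model's predictions rather than to the true labels. To keep the stdlib sketch below short, it uses a one-split decision stump in place of the depth-4 tree. The black-box rule and the data are hypothetical.

```python
def fit_stump(X, targets):
    """Return the single-feature, single-threshold rule that best matches `targets`."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, targets) if row[j] <= threshold]
            right = [t for row, t in zip(X, targets) if row[j] > threshold]
            # Predict the majority label on each side of the split.
            left_label = max(set(left), key=left.count) if left else 0
            right_label = max(set(right), key=right.count) if right else 0
            hits = left.count(left_label) + right.count(right_label)
            if best is None or hits > best[0]:
                best = (hits, j, threshold, left_label, right_label)
    _, j, threshold, left_label, right_label = best
    return lambda row: left_label if row[j] <= threshold else right_label


def fidelity(surrogate, black_box_preds, X):
    """Fraction of rows where the surrogate reproduces the black-box prediction."""
    return sum(surrogate(row) == p for row, p in zip(X, black_box_preds)) / len(X)


# Hypothetical black box: approves when feature 0 > 0.6 and feature 1 < 0.8.
X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
black_box_preds = [int(a > 0.6 and b < 0.8) for a, b in X]
stump = fit_stump(X, black_box_preds)  # trained on model outputs, not true labels
score = fidelity(stump, black_box_preds, X)
```

On this data the best stump splits on the first feature at 0.6 and agrees with the black box on 18 of 20 rows, a fidelity of 0.9. The two misses are rows where the second condition matters, which a deeper tree could capture.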
Example plots: Global importance (`ex.importance()`), Local explanation (`ex.explain(0)`), Feature effects (`ex.partial_dependence(...)` / `ex.ale(...)`), Fairness (`ex.fairness("gender")`), Interactions (`ex.interactions()`), Conformal coverage (`ex.conformal(...)`), Drift (`detect_drift(...)`).
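`ex.conformal(X_cal, y_cal, X_test)` rests on split conformal prediction. You score how wrong the model is on a held-out calibration set, take a finite-sample-corrected quantile of those scores, and widen each new prediction by that amount. Below is a minimal regression sketch with absolute-residual scores. It assumes a fitted point predictor and exchangeable calibration and test data. The toy predictor and data are illustrative.

```python
import math


def conformal_quantile(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def conformal_intervals(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal intervals; marginal coverage >= 1 - alpha under exchangeability."""
    residuals = [abs(y - predict(x)) for x, y in zip(X_cal, y_cal)]
    q = conformal_quantile(residuals, alpha)
    return [(predict(x) - q, predict(x) + q) for x in X_test]


def predict(x):
    # Toy point predictor.
    return 2 * x


# Calibration residuals are 0.0, 0.1, ..., 1.9 by construction.
X_cal = list(range(20))
y_cal = [2 * x + r / 10 for x, r in zip(X_cal, range(20))]
intervals = conformal_intervals(predict, X_cal, y_cal, [5.0], alpha=0.1)
```

With 20 calibration points and alpha = 0.1, the quantile is the 19th smallest residual (1.8). The interval around the prediction 10.0 is therefore about (8.2, 11.8).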
Counterfactual / recourse (gender held immutable):
credit_score: 530.3 -> 739.3 => prediction flips 0 (rejected) -> 1 (approved)
Anchor (sufficient rule): IF 410 <= credit_score <= 584 THEN rejected (precision 0.96, coverage 0.21)
Glassbox surrogate (accuracy=0.859 fidelity to the model):
|--- credit_score <= 672.83
| |--- gender <= 0.50
| | |--- income <= 62.72 -> rejected
| | |--- income > 62.72 -> rejected
| |--- gender > 0.50
| | |--- debt_ratio <= 0.44 -> approved
| | |--- debt_ratio > 0.44 -> rejected
|--- credit_score > 672.83 ...
Bias mitigation — ex.mitigate_bias("gender"): demographic-parity gap 38.7% → 0.2% via per-group thresholds.
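Per-group thresholds are a post-processing fix. The model and its scores stay unchanged, and each group gets its own cut-off so that selection rates line up. The sketch below assumes positive-class scores and group labels are available, and that every group should approve the same target fraction. Taking the top k scores per group is one simple strategy, not necessarily what `mitigate_bias` implements.

```python
def per_group_thresholds(scores, groups, target_rate):
    """Pick a threshold per group so about `target_rate` of that group is approved."""
    thresholds = {}
    for g in set(groups):
        g_scores = sorted((s for s, gg in zip(scores, groups) if gg == g), reverse=True)
        k = round(target_rate * len(g_scores))  # number of rows to approve
        # Threshold at the k-th highest score; ties at the cut-off may approve a few more.
        thresholds[g] = g_scores[k - 1] if k > 0 else float("inf")
    return thresholds


def apply_thresholds(scores, groups, thresholds):
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


def selection_rate(preds, groups, g):
    members = [p for p, gg in zip(preds, groups) if gg == g]
    return sum(members) / len(members)


# Group "a" gets systematically lower scores than group "b".
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9]
groups = ["a"] * 5 + ["b"] * 5
global_preds = [int(s >= 0.5) for s in scores]  # one shared cut-off
thr = per_group_thresholds(scores, groups, target_rate=0.4)
fair_preds = apply_thresholds(scores, groups, thr)
```

A single 0.5 cut-off approves nobody in group `a` and everyone in group `b`. The per-group thresholds (0.4 and 0.8) bring both selection rates to 40%.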
To run the test suite:

pytest # or: python -m pytest explainx/tests

The 2020 Dash dashboard (explain.py, main.py, lib/) and its pinned,
no-longer-installable stack have been removed in favour of this engine. The new
import is explainx; explanations are returned as data rather than rendered
as a web app, which is what makes them consumable by both humans and LLMs.











