explainX: LLM-native Explainable AI

explainX 3.0 is a modern, LLM-native rewrite of the explainability engine. Train any machine-learning model, then let a human or an LLM agent inspect it: understand why a prediction was made, surface bias, find the minimal change that flips a decision, and feed those insights back into training.

Where the original explainX rendered a human-only Plotly dashboard on a 2020 dependency stack, this rewrite returns structured, machine-readable results (typed objects that serialize to JSON) plus a natural-language summary. Both are usable from plain Python and over the Model Context Protocol (MCP), so agents like Claude can call the engine as tools while they build models.

The goal: bring state-of-the-art explainability research into one place, with an interface designed for the era where LLMs train and debug models.


Why

  1. Explain predictions — global feature importance + per-prediction reasoning.
  2. Debug models — counterfactuals and partial-dependence curves show what the model actually learned.
  3. Detect bias — group-fairness metrics (disparate impact, demographic parity, equal opportunity) answer "is my model rejecting one group regardless of profile?"
  4. Build trust — a plain-language summary for stakeholders and agents.
  5. Close the loop — results are structured so an LLM can read them and decide how to fix the training data or model.

What's inside

A unified API over the methods the XAI literature identifies as the most deployed — SHAP, LIME, surrogate trees and counterfactuals — plus modern additions (ALE, anchors) and a 2024–2025 research frontier most tools skip: quantifying whether an explanation can be trusted.

| Capability | Method | Notes |
| --- | --- | --- |
| Global importance | SHAP (auto) → permutation → intrinsic | SHAP used automatically when installed |
| Local explanation | SHAP (auto) → model-agnostic ablation | per-prediction signed contributions |
| Local surrogate | LIME (from scratch) | local weighted linear approximation |
| Sufficient rules | Anchors | high-precision IF-THEN rule per prediction |
| Counterfactuals | greedy model-agnostic search | smallest change that flips a decision |
| Global surrogate | decision tree + fidelity | inspectable glassbox rules + how faithful they are |
| Feature effects | PDP and ALE | ALE stays unbiased under feature correlation |
| Explanation quality | faithfulness + stability | does the explanation reflect the model, and is it robust? |
| Counterfactuals & recourse | greedy search with immutable / monotonic constraints | actionable "what to change" |
| Uncertainty | conformal prediction | distribution-free prediction sets / intervals with coverage guarantee |
| Fairness / bias | demographic parity, disparate impact (4/5 rule), equal opportunity | per sensitive attribute |
| Bias mitigation | post-processing per-group thresholds | detect → fix |
| Interactions | Friedman's H-statistic | which features matter together |
| Example-based | prototypes & criticisms (MMD) | representative vs. atypical cases |
| Metrics | classification + regression | accuracy/precision/recall/f1/auc, r²/mae/rmse |
| Monitoring | data drift (PSI + KS) | reference vs. current dataset |
| LLM narration | Claude (claude-opus-4-8) | plain-language briefings / Q&A grounded in the report |
| Reporting | HTML export + CLI + dashboard | shareable artifact; no-code usage |
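Permutation importance, the second step in the global-importance fallback chain listed above, can be sketched in a few lines. This is a generic pure-Python illustration and not explainX's implementation; the function name `permutation_importance`, the `toy_predict` model and the toy data are all assumptions made for the example.

```python
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(row) == target for row, target in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                      # break the feature-target link
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy model that only looks at feature 0; feature 1 is ignored.
X = [[i % 2, (i * 7) % 5] for i in range(20)]
y = [row[0] for row in X]
toy_predict = lambda row: int(row[0] > 0.5)

scores = permutation_importance(toy_predict, X, y)
print(scores)   # feature 1 scores exactly 0.0 because the model never reads it
```

Because the method only needs a prediction function, it works for any model, which is presumably why it serves as the fallback when SHAP is not installed.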

Improve accuracy (data-centric diagnostics)

Beyond explaining a model, explainX helps you make it more accurate — the data-centric-AI playbook, returned as actionable reports:

| Diagnostic | What it finds | Why it improves accuracy |
| --- | --- | --- |
| Error analysis | data slices with the highest error (slice discovery) | tells you where to add data / features or split the model |
| Label issues | likely-mislabeled rows (confident learning) | cleaning labels is often the highest-ROI accuracy lever |
| Target leakage | features that alone predict the target | catches inflated offline accuracy that collapses in production |
| Calibration | ECE / Brier + reliability | flags untrustworthy probabilities and recommends a fix |

ex.error_analysis()   # ErrorAnalysis: worst slices + recommendation
ex.label_issues()     # LabelIssues: rows to relabel (cross-validated)
ex.leakage()          # LeakageReport: suspected leaky features
ex.calibration()      # CalibrationReport: ECE, Brier, fix recommendation
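For readers unfamiliar with the two calibration numbers, here is a minimal sketch of how ECE and the Brier score are commonly computed for a binary classifier. It assumes equal-width probability bins and compares the mean predicted probability with the observed positive rate in each bin; explainX's binning scheme is not documented here and may differ.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary labels: weighted mean of |observed rate - mean probability|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(rate - mean_p)
    return ece

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

# An over-confident toy model: it says 0.9 four times but is right only twice.
probs, labels = [0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0]
print(round(expected_calibration_error(probs, labels), 3))   # 0.4
print(round(brier_score(probs, labels), 3))                  # 0.41
```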

Works with any ML framework

explainX speaks the scikit-learn predict / predict_proba convention, so many frameworks work with no wrapping: scikit-learn, XGBoost, LightGBM, CatBoost (their sklearn-API estimators). For anything else, wrap_model() adapts it — native XGBoost/LightGBM Boosters, Keras/TensorFlow, PyTorch, statsmodels, or any custom prediction function:

from explainx import explain_model, wrap_model

explain_model(sklearn_or_xgb_or_lgbm_or_catboost_model, X, y)      # direct
explain_model(wrap_model(keras_or_torch_model, task="classification"), X, y)
explain_model(wrap_model(predict_proba_fn=my_api_call, classes=[0, 1]), X, y)
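The predict / predict_proba convention mentioned above amounts to an object exposing those two methods. As a rough sketch of what an adapter around a bare probability function might look like: `FunctionModel` is a hypothetical class written for illustration, and it is not how `wrap_model()` is implemented.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]

class FunctionModel:
    """Hypothetical adapter exposing predict / predict_proba around a plain function."""

    def __init__(self, predict_proba_fn: Callable[[List[Row]], List[List[float]]],
                 classes: Sequence[int]):
        self._fn = predict_proba_fn
        self.classes_ = list(classes)

    def predict_proba(self, X: List[Row]) -> List[List[float]]:
        return self._fn(X)

    def predict(self, X: List[Row]) -> List[int]:
        # Pick the class with the highest probability for each row.
        return [self.classes_[max(range(len(p)), key=p.__getitem__)]
                for p in self.predict_proba(X)]


def my_api_call(X):
    # Stand-in for a remote scoring call: P(class 1) is the first feature clipped to [0, 1].
    out = []
    for row in X:
        p1 = min(max(row[0], 0.0), 1.0)
        out.append([1.0 - p1, p1])
    return out

model = FunctionModel(my_api_call, classes=[0, 1])
print(model.predict([[0.2], [0.9]]))   # [0, 1]
```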

Runnable, studyable examples for every framework live in examples/ (one file per framework).

Install

⚠️ pip install explainx does not give you 3.0 yet. The LLM-native rewrite (v3.0.0) has not been published to PyPI, so that command still installs the legacy 2.x package. Until the 3.0 release is on PyPI, install from GitHub or from source as shown below.

From PyPI (works once 3.0.0 is published):

pip install "explainx[all]"   # core + SHAP + MCP + drift + LLM narration + dashboard
# or minimal:
pip install explainx          # core only (extras optional)

From GitHub (installs the current 3.0 code on master):

pip install "git+https://github.com/explainX/explainx.git"
# with all optional extras:
pip install "explainx[all] @ git+https://github.com/explainX/explainx.git"

From source (for development / running the examples & tests):

git clone https://github.com/explainX/explainx.git
cd explainx
pip install -e ".[all]"       # editable install with every extra
pytest                        # run the test suite

Extras can be combined or used individually: shap, mcp, drift, llm, dashboard, or all (e.g. pip install "explainx[shap,dashboard]").

Python API

from explainx import explain_model

report = explain_model(
    model, X_test, y_test,
    sensitive_features=["gender"],   # run bias analysis on these columns
    n_local=3,                       # explain a few individual predictions
)

print(report.summary)      # natural-language briefing for a human/LLM
report.to_dict()           # full structured result (JSON-ready)
report.to_json()

Need finer control? Use the stateful explainer:

from explainx import ModelExplainer

ex = ModelExplainer(model, X_test, y_test)
ex.metrics()                       # ModelMetrics
ex.importance()                    # GlobalImportance (SHAP when available)
ex.explain(index=0, top_k=5)       # LocalExplanation (SHAP/ablation)
ex.lime(index=0)                   # LocalExplanation (LIME)
ex.anchor(index=0)                 # Anchor: high-precision sufficient rule
ex.fairness("gender")              # FairnessReport
ex.counterfactual(index=0)         # Counterfactual: minimal flip
ex.recourse(index=0, immutable_features=["age", "gender"])  # actionable recourse
ex.surrogate()                     # SurrogateExplanation: glassbox tree + fidelity
ex.partial_dependence("income")    # PartialDependence curve
ex.ale("income")                   # ALEResult: correlation-robust effect
ex.explanation_quality(index=0)    # ExplanationQuality: faithfulness + stability
ex.conformal(X_cal, y_cal, X_test) # ConformalResult: guaranteed-coverage sets/intervals
ex.mitigate_bias("gender")         # MitigationResult: per-group thresholds that fix parity
ex.interactions(top_k=5)           # InteractionResult: Friedman H-statistic
ex.prototypes()                    # PrototypesResult: representative + atypical rows

LLM narration (optional)

from explainx.narrate import narrate_report   # needs: pip install "explainx[llm]"

report = explain_model(model, X_test, y_test, sensitive_features=["gender"])
print(narrate_report(report, question="Why was applicant 5 rejected, and what would change it?"))

The engine computes the evidence (SHAP, fairness, counterfactuals, conformal sets) and Claude narrates it. Numbers stay in the engine and prose comes from the LLM, so the narration is grounded in computed results rather than in figures the LLM made up.

Monitoring & reporting

from explainx import detect_drift, save_html

detect_drift(reference_df, current_df)   # DriftReport (PSI + KS per feature)
save_html(report, "report.html")         # shareable page; embeds the full JSON
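To make the PSI half of the drift report concrete, here is a small sketch of the Population Stability Index for one numeric feature. It assumes equal-width bins over the reference range and a small floor to avoid dividing by zero; the binning and thresholds that `detect_drift` uses are not documented here, so treat this as an illustration only.

```python
import math

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins of the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0            # constant feature: fall back to width 1

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1   # clamp out-of-range values
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

reference = list(range(100))
shifted = [v + 50 for v in reference]

print(psi(reference, reference))        # 0.0, identical distributions
print(psi(reference, shifted) > 0.25)   # True
```

A common rule of thumb reads PSI below 0.1 as stable and above 0.25 as a significant shift, though the right cut-off depends on the application.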

Interactive dashboard

pip install "explainx[dashboard]"
explainx-dashboard

Opens a Streamlit app: upload a fitted model + dataset, then run any module (importance, local/LIME/anchor, counterfactual & recourse, PDP/ALE, interactions, fairness, mitigation, conformal, prototypes, quality, drift) or the full report, see live tables and charts, and download the HTML/JSON.

[Screenshots: Global importance, Local explanation, Fairness (bias detected), Full report]

No-code CLI

explainx report --model m.joblib --data d.csv --target y --sensitive gender --html out.html
explainx bias   --model m.joblib --data d.csv --target y --sensitive gender
explainx drift  --reference train.csv --current prod.csv

Try the demo

python -m explainx.examples.demo

It trains a deliberately gender-biased loan model and shows the fairness check firing, plus a counterfactual that flips a rejection to an approval.

Use it from an LLM agent (MCP)

Start the server (stdio transport):

explainx-mcp              # installed console script
# or:  python -m explainx.mcp_server

Register it with an MCP client (e.g. Claude Desktop / Claude Code):

{
  "mcpServers": {
    "explainx": { "command": "explainx-mcp" }
  }
}

The agent saves a fitted model and dataset to disk, then calls tools by path:

| Tool | Purpose |
| --- | --- |
| explain_model | Full report (metrics, importance, local, fairness, surrogate, quality) |
| feature_importance | Global importance ranking |
| explain_prediction | Why one row was predicted as it was (SHAP/ablation) |
| lime_explain_prediction | Local LIME explanation for one row |
| anchor_rule | High-precision sufficient rule for one row |
| counterfactual | Minimal change that flips a row's class |
| surrogate_rules | Glassbox decision-tree rules + fidelity |
| check_bias | Group-fairness analysis on a sensitive feature |
| model_metrics | Performance metrics |
| partial_dependence | Marginal effect curve for a feature |
| accumulated_local_effects | Correlation-robust effect curve (ALE) |
| explanation_quality | Faithfulness + stability of an explanation |
| conformal_prediction | Guaranteed-coverage prediction sets / intervals |
| actionable_recourse | Minimal flip respecting immutable features |
| mitigate_bias | Per-group thresholds that equalize selection rate |
| feature_interactions_tool | Strongest pairwise interactions (H-statistic) |
| prototypes_and_criticisms_tool | Representative + atypical rows |
| detect_data_drift | Distribution drift between two datasets |
| error_analysis | Worst-performing data slices (slice discovery) |
| label_issues | Likely-mislabeled rows (confident learning) |
| detect_target_leakage | Features that leak the target |
| assess_calibration | Probability calibration (ECE / Brier) |
| html_report | Write a shareable HTML report |

Each returns a JSON-ready dict the agent can reason over — e.g. read a disparate_impact_ratio below 0.8, conclude the model is biased, and rebalance the training data.

# what the agent does first:
import joblib
joblib.dump(model, "model.joblib")
df.to_csv("data.csv", index=False)   # features + target column
# then it calls:  check_bias(model_path="model.joblib", data_path="data.csv",
#                            sensitive_feature="gender", target_column="approved")
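On the agent side, acting on a tool result can be as simple as parsing the returned JSON and comparing one field against the four-fifths threshold. The sketch below is hypothetical: only the `disparate_impact_ratio` key is named in this README, and the rest of the payload is invented for the example.

```python
import json

def needs_rebalancing(result_json: str, threshold: float = 0.8) -> bool:
    """Return True when the reported disparate impact ratio is below the threshold."""
    result = json.loads(result_json)
    return result["disparate_impact_ratio"] < threshold

# Hypothetical payload; apart from disparate_impact_ratio the field names are assumptions.
payload = json.dumps({"disparate_impact_ratio": 0.37, "sensitive_feature": "gender"})
print(needs_rebalancing(payload))   # True
```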

Example outputs

All outputs below come from the bundled demo — a deliberately gender-biased loan-approval model. Reproduce them with python docs/generate_examples.py.

Natural-language summary (explain_model(...).summary):

Model `RandomForestClassifier` is a classification model evaluated on 800 samples across 4 features.
Performance: accuracy=1.000, precision=1.000, recall=1.000, f1=1.000, roc_auc=1.000.
The most influential features (via shap_mean_abs) are: credit_score (0.257), gender (0.168), debt_ratio (0.128), income (0.062).
A depth-4 decision-tree surrogate reproduces the model with accuracy=0.896 fidelity, giving an inspectable rule set.
Explanation quality (shap): faithfulness=1.00, stability=0.97 (higher is more trustworthy; ~1.0 is excellent).
Fairness on `gender`: BIAS DETECTED. Disparate impact ratio 0.37 is below the 0.8 four-fifths threshold:
group '0' receives the positive outcome (1) at 22.6% vs '1' at 61.3%. Demographic parity gap of 38.7%.
Recommended next steps: rebalance/reweight the training data across the sensitive groups, consider
removing or decorrelating proxy features, or apply a fairness constraint, then re-evaluate.
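The two fairness figures in the summary follow directly from the per-group selection rates. A short sketch of the arithmetic, using the rates quoted above (the function names are illustrative and not part of the explainX API):

```python
def selection_rates(predictions, groups, positive=1):
    """Share of rows in each group that receive the positive outcome."""
    totals, hits = {}, {}
    for pred, g in zip(predictions, groups):
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + (pred == positive)
    return {g: hits[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest (assumes the highest is non-zero)."""
    return min(rates.values()) / max(rates.values())

def demographic_parity_gap(rates):
    """Difference between the highest and lowest selection rate."""
    return max(rates.values()) - min(rates.values())

rates = {"0": 0.226, "1": 0.613}   # selection rates quoted in the summary above
print(round(disparate_impact_ratio(rates), 2))   # 0.37, below the 0.8 four-fifths threshold
print(round(demographic_parity_gap(rates), 3))   # 0.387, i.e. 38.7%
```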

[Figure: Global importance, ex.importance() | Local explanation, ex.explain(0)]

[Figure: Feature effects, ex.partial_dependence(...) / ex.ale(...) | Fairness, ex.fairness("gender")]

[Figure: Interactions, ex.interactions() | Conformal coverage, ex.conformal(...) | Drift, detect_drift(...)]

Counterfactual / recourse (gender held immutable):

credit_score: 530.3 -> 739.3   =>  prediction flips 0 (rejected) -> 1 (approved)
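A greedy search with immutable features, as used for the recourse result above, can be sketched as repeatedly taking the single-feature step that raises the model score the most. This is a simplified stand-in and not explainX's algorithm; `toy_score`, the step sizes and the starting row are assumptions chosen for the example.

```python
def greedy_counterfactual(score, row, steps, immutable=(), threshold=0.5, max_iter=100):
    """Nudge one mutable feature at a time until score(row) reaches the threshold."""
    current = dict(row)
    for _ in range(max_iter):
        if score(current) >= threshold:
            return current
        best, best_score = None, score(current)
        for name, step in steps.items():
            if name in immutable:
                continue                      # never touch protected features
            for delta in (step, -step):
                candidate = dict(current, **{name: current[name] + delta})
                s = score(candidate)
                if s > best_score:
                    best, best_score = candidate, s
        if best is None:                      # no single step improves the score
            return None
        current = best
    return None

def toy_score(r):
    # Hypothetical approval score: rises with credit_score, with a built-in gender bias.
    return min(1.0, max(0.0, (r["credit_score"] - 500) / 400 + 0.2 * r["gender"]))

applicant = {"credit_score": 530, "gender": 0}
cf = greedy_counterfactual(toy_score, applicant,
                           steps={"credit_score": 10, "gender": 1},
                           immutable=("gender",))
print(cf)   # {'credit_score': 700, 'gender': 0}
```

Holding `gender` immutable forces the search to find a change the applicant could actually make, which is what separates recourse from an unconstrained counterfactual.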

Anchor (sufficient rule): IF 410 <= credit_score <= 584 THEN rejected (precision 0.96, coverage 0.21)
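Precision and coverage for a rule like this one can be illustrated on a handful of rows. The sketch evaluates the rule on dataset rows for simplicity, whereas the Anchors method estimates precision over perturbed samples; the rows and predictions below are invented.

```python
def rule_precision_coverage(rows, predictions, rule, target):
    """Precision: share of covered rows predicted as target. Coverage: share of rows covered."""
    covered = [pred for row, pred in zip(rows, predictions) if rule(row)]
    if not covered:
        return 0.0, 0.0
    precision = sum(p == target for p in covered) / len(covered)
    coverage = len(covered) / len(rows)
    return precision, coverage

# Invented toy data, not the demo dataset.
rows = [{"credit_score": s} for s in (420, 500, 580, 600, 700)]
predictions = ["rejected", "rejected", "approved", "rejected", "approved"]
rule = lambda r: 410 <= r["credit_score"] <= 584

precision, coverage = rule_precision_coverage(rows, predictions, rule, "rejected")
print(round(precision, 2), coverage)   # 0.67 0.6
```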

Glassbox surrogate (accuracy=0.859 fidelity to the model):

|--- credit_score <= 672.83
|   |--- gender <= 0.50
|   |   |--- income <= 62.72  -> rejected
|   |   |--- income >  62.72  -> rejected
|   |--- gender >  0.50
|   |   |--- debt_ratio <= 0.44 -> approved
|   |   |--- debt_ratio >  0.44 -> rejected
|--- credit_score >  672.83 ...

Bias mitigation (ex.mitigate_bias("gender")): demographic-parity gap 38.7% → 0.2% via per-group thresholds.
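Post-processing with per-group thresholds means each group gets its own score cut-off so that selection rates line up. A minimal sketch of the idea, assuming a shared target selection rate; the helper name and toy scores are hypothetical and this is not explainX's procedure.

```python
def per_group_thresholds(scores, groups, target_rate):
    """For each group, pick the score cut-off that accepts about target_rate of its rows."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    thresholds = {}
    for g, values in by_group.items():
        ranked = sorted(values, reverse=True)
        k = round(target_rate * len(ranked))     # how many rows to accept
        k = min(max(k, 1), len(ranked))          # always accept at least one row
        thresholds[g] = ranked[k - 1]            # accept scores >= this value
    return thresholds

scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.4, 0.35, 0.1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# A single 0.5 cut-off accepts 50% of group a but only 25% of group b.
thresholds = per_group_thresholds(scores, groups, target_rate=0.5)
print(thresholds)   # {'a': 0.8, 'b': 0.4}
```

With these cut-offs both groups are accepted at a 50% rate, which closes the demographic-parity gap in the toy data at the cost of applying different thresholds to different groups.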

Tests

pytest          # or: python -m pytest explainx/tests

Migrating from legacy explainX

The 2020 Dash dashboard (explain.py, main.py, lib/) and its pinned, no-longer-installable stack have been removed in favour of this engine. The new import is explainx; explanations are returned as data rather than rendered as a web app, which is what makes them consumable by both humans and LLMs.

License

MIT