Gini Impurity Calculator
Compute the Gini impurity of a decision-tree node from its class counts or proportions, with the full 1 − Σ pₖ² working, a Shannon-entropy comparison, and the Gini gain of a candidate split. Matches scikit-learn. No signup, runs in your browser.
How it works
Gini impurity measures how mixed the class labels are at a node in a classification tree. It is the default splitting criterion in scikit-learn's DecisionTreeClassifier and the diversity index that Breiman, Friedman, Olshen and Stone adopted as a splitting criterion in Classification and Regression Trees (CART, 1984).
Let a node hold counts n₁, …, n_K over K classes, with total N = Σ nₖ. The calculation has three steps, followed by a check on the range of the result:
- Class proportions. For each class, pₖ = nₖ / N. These are the fractions of samples in the node belonging to each class.
- Sum of squares. Square each proportion to get pₖ², then add them: Σ pₖ². This sum is the probability that two samples drawn at random from the node, with replacement, share a class.
- Gini impurity. G = 1 − Σ pₖ². Equivalently G = Σ pₖ(1 − pₖ), the probability that the two draws differ. The calculator computes both forms and shows that they agree, as a built-in cross-check.
- Range. G lies in [0, 1 − 1/K]. It is 0 for a pure node (one class only) and reaches its maximum 1 − 1/K when all classes are equally frequent.
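The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own source: the function names `gini_impurity` and `gini_impurity_alt` and the example counts are assumptions made for this sketch.

```python
from typing import Sequence


def gini_impurity(counts: Sequence[float]) -> float:
    """Gini impurity G = 1 - sum(p_k^2) from one node's class counts.

    Hypothetical helper for illustration only.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("node must contain at least one sample")
    proportions = [n / total for n in counts]         # step 1: p_k = n_k / N
    sum_of_squares = sum(p * p for p in proportions)  # step 2: sum of p_k^2
    return 1.0 - sum_of_squares                       # step 3: G = 1 - sum


def gini_impurity_alt(counts: Sequence[float]) -> float:
    """Equivalent form G = sum(p_k * (1 - p_k)), used as a cross-check."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("node must contain at least one sample")
    return sum((n / total) * (1.0 - n / total) for n in counts)


if __name__ == "__main__":
    counts = [40, 30, 30]                # illustrative 3-class node
    g = gini_impurity(counts)
    upper_bound = 1.0 - 1.0 / len(counts)  # range check: G <= 1 - 1/K
    print(round(g, 4), round(gini_impurity_alt(counts), 4), round(upper_bound, 4))
```

For the counts 40, 30, 30 the proportions are 0.4, 0.3, 0.3, so Σ pₖ² = 0.16 + 0.09 + 0.09 = 0.34 and G = 0.66, just under the three-class maximum of 1 − 1/3 ≈ 0.6667.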
For comparison, the tool also reports Shannon entropy, H = −Σ pₖ log₂ pₖ bits, using the convention 0·log₂0 = 0 so that empty classes (pₖ = 0) do not produce NaN. Entropy is the alternative splitting criterion (scikit-learn's criterion="entropy"); in practice Gini and entropy usually choose the same or a very similar split.
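The 0·log₂0 = 0 convention amounts to skipping empty classes in the sum. The sketch below shows one way to do that; the function name `shannon_entropy` is an assumption for illustration and is not taken from the tool.

```python
import math
from typing import Sequence


def shannon_entropy(counts: Sequence[float]) -> float:
    """Shannon entropy H = -sum(p_k * log2(p_k)) in bits.

    Hypothetical helper: classes with zero count are skipped, which
    implements the convention 0 * log2(0) = 0 and avoids a math error.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("node must contain at least one sample")
    entropy = 0.0
    for n in counts:
        if n > 0:                      # skip empty classes
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    print(shannon_entropy([5, 5]))     # balanced two-class node
    print(shannon_entropy([10, 0]))    # pure node with one empty class
```

A balanced two-class node gives 1 bit, the two-class maximum, while a pure node gives 0 bits, mirroring the 0.5 and 0 that Gini impurity returns for the same nodes.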
In split mode, the tool evaluates a candidate split of a parent node into two children. It computes each child's Gini Gⱼ, the sample weights wⱼ = Nⱼ / N, the weighted child impurity Σ wⱼ·Gⱼ, and the Gini gain ΔG = G_parent − Σ wⱼ·Gⱼ. This impurity decrease is exactly what the CART algorithm maximises when it chooses which feature and threshold to split on: the larger the Gini gain, the better the split separates the classes.
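The split calculation can be sketched as follows. The names `gini` and `gini_gain` and the example counts are hypothetical and chosen only for illustration; the sketch assumes the children's counts add up to the parent's.

```python
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Gini impurity 1 - sum(p_k^2) of one node (hypothetical helper)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("node must contain at least one sample")
    return 1.0 - sum((n / total) ** 2 for n in counts)


def gini_gain(parent: Sequence[float],
              children: Sequence[Sequence[float]]) -> float:
    """Gini gain = G_parent - sum(w_j * G_j), with w_j = N_j / N.

    Assumes the children partition the parent's samples.
    """
    n_parent = sum(parent)
    if sum(sum(child) for child in children) != n_parent:
        raise ValueError("children must contain exactly the parent's samples")
    weighted_child_impurity = sum(
        (sum(child) / n_parent) * gini(child) for child in children
    )
    return gini(parent) - weighted_child_impurity


if __name__ == "__main__":
    parent = [50, 50]                   # illustrative balanced parent node
    left, right = [40, 10], [10, 40]    # candidate split into two children
    print(round(gini_gain(parent, [left, right]), 4))
```

Here the parent has G = 0.5, each child has G = 1 − (0.8² + 0.2²) = 0.32 with weight 0.5, so the weighted child impurity is 0.32 and the Gini gain is 0.18.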
Sources & references
- scikit-learn — Decision Trees, Mathematical formulation (Gini impurity & impurity decrease)
- Wikipedia — Decision tree learning → Gini impurity (formula, range, entropy comparison)
- Breiman, Friedman, Olshen & Stone — Classification and Regression Trees (CART, 1984)
The Gini impurity and Gini-gain formulas were last cross-checked against the scikit-learn documentation on 2026-06-10. They are standard textbook formulas, so there are no rates or policies that could drift out of date.