
Gini Impurity Calculator

Compute the Gini impurity of a decision-tree node from its class counts or proportions, with the full 1 − Σ pₖ² working, a Shannon-entropy comparison, and the Gini gain of a candidate split. Matches scikit-learn. No signup, runs in your browser.

By Induwara Ashinsana · Updated Jun 10, 2026

Comma- or space-separated, e.g. 6, 4 for 6 of class A and 4 of class B.

Example output for counts 6, 4 (N = 10, 2 classes):
Gini impurity: 0.4800
Shannon entropy: 0.9710 bits
Max impurity (1 − 1/2): 0.5000
Σ pₖ²: 0.5200

G = 1 − (0.6000² + 0.4000²) = 1 − 0.5200 = 0.4800

Cross-checked against the sum form Σ pₖ(1 − pₖ) = 0.4800.

Per-class working

Class            Count nₖ    Proportion pₖ    pₖ²
A                6           0.6000           0.3600
B                4           0.4000           0.1600
Σ pₖ²                                         0.5200
G = 1 − Σ pₖ²                                 0.4800

Gini vs entropy: the two criteria nearly always pick the same split. Entropy is shown for comparison — see the ML metrics tools for the Shannon entropy and cross-entropy calculators.

The formula G = 1 − Σ pₖ² and the weighted impurity decrease (Gini gain) follow scikit-learn's Decision Trees mathematical formulation. Last verified 2026-06-10. Full sources are listed below the calculator.

How it works

Gini impurity measures how mixed the class labels are at a node in a classification tree. It is the default splitting criterion in scikit-learn's DecisionTreeClassifier and the diversity index introduced by Breiman, Friedman, Olshen and Stone in Classification and Regression Trees (CART, 1984).

Let a node hold counts n₁, …, n_K over K classes, with total N = Σ nₖ. The calculation is four steps:

  1. Class proportions. For each class, pₖ = nₖ / N. These are the fractions of samples in the node belonging to each class.
  2. Sum of squares. Square each proportion to get pₖ², then add them: Σ pₖ². This sum is the probability that two samples drawn independently at random (with replacement) from the node share a class.
  3. Gini impurity. G = 1 − Σ pₖ². Equivalently, G = Σ pₖ(1 − pₖ), the probability that two such draws differ in class. The two forms agree because Σ pₖ = 1. The calculator computes both and shows that they match, as a built-in cross-check.
  4. Range. G lies in [0, 1 − 1/K]. It is 0 for a pure node (one class only) and reaches its maximum 1 − 1/K when all classes are equally frequent.
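The four steps above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual source; the function name `gini_impurity` and the keys of the returned dictionary are assumptions made for this example.

```python
def gini_impurity(counts):
    """Gini impurity of a node from its per-class counts (hypothetical helper)."""
    if not counts or any(c < 0 for c in counts):
        raise ValueError("counts must be a non-empty list of non-negative numbers")
    n = sum(counts)
    if n == 0:
        raise ValueError("the node must contain at least one sample")

    # Step 1: class proportions p_k = n_k / N
    props = [c / n for c in counts]
    # Step 2: sum of squared proportions
    sum_sq = sum(p * p for p in props)
    # Step 3: both equivalent forms of the impurity
    g = 1.0 - sum_sq
    g_alt = sum(p * (1.0 - p) for p in props)
    # Step 4: upper bound 1 - 1/K for K classes
    g_max = 1.0 - 1.0 / len(counts)
    return {"gini": g, "sum_sq": sum_sq, "gini_alt": g_alt, "max": g_max}


result = gini_impurity([6, 4])
print(f"G = {result['gini']:.4f}, "
      f"sum form = {result['gini_alt']:.4f}, "
      f"max = {result['max']:.4f}")
# prints: G = 0.4800, sum form = 0.4800, max = 0.5000
```

Here K is taken to be the number of counts entered, so a class listed with a count of 0 still raises the upper bound 1 − 1/K.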

For comparison, the tool also reports Shannon entropy, H = −Σ pₖ log₂ pₖ bits, using the convention 0·log₂0 = 0 so that classes with no samples (pₖ = 0) do not produce NaN. Entropy is the alternative splitting criterion (scikit-learn's criterion="entropy"); in practice Gini and entropy nearly always choose the same split.
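One way to apply the 0·log₂0 = 0 convention in code is simply to skip classes whose count is zero. The sketch below assumes a hypothetical helper named `shannon_entropy`; it is not taken from the tool itself.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a node's class counts (hypothetical helper)."""
    n = sum(counts)
    if n <= 0:
        raise ValueError("the node must contain at least one sample")
    # Skipping zero counts implements the 0 * log2(0) = 0 convention,
    # so math.log2 is never called with 0.
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)


for counts in ([6, 4], [50, 50], [4, 0]):
    print(counts, f"{shannon_entropy(counts):.4f} bits")
```

For counts [6, 4] this gives the 0.9710 bits shown in the calculator output, and a pure node such as [4, 0] evaluates to 0 instead of raising a math domain error.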

In split mode, the tool evaluates a candidate split of a parent node into two children. It computes each child's Gini Gⱼ, the sample weights wⱼ = Nⱼ / N, the weighted child impurity Σ wⱼ·Gⱼ, and the Gini gain ΔG = G_parent − Σ wⱼ·Gⱼ. This impurity decrease is exactly what the CART algorithm maximises when it chooses which feature and threshold to split on: the larger the Gini gain, the better the split separates the classes.
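The split-mode calculation can be sketched as below. The names `gini` and `gini_gain` are illustrative assumptions, and the check that the children add up to the parent is a design choice of this sketch rather than documented behaviour of the tool.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) from class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_gain(parent, children):
    """Impurity decrease G_parent - sum(w_j * G_j) for a candidate split."""
    # The children must partition the parent, class by class.
    totals = [sum(col) for col in zip(*children)]
    if totals != list(parent):
        raise ValueError("child counts must add up to the parent counts")

    n = sum(parent)
    # Weight each child's impurity by its share of the parent's samples;
    # empty children contribute nothing and are skipped.
    weighted = sum(
        (sum(child) / n) * gini(child) for child in children if sum(child) > 0
    )
    return gini(parent) - weighted


gain = gini_gain([6, 4], [[4, 0], [2, 4]])
print(f"Gini gain = {gain:.4f}")  # prints: Gini gain = 0.2133
```

Because `children` is a list, the same function also covers splits into more than two children, although the tool described here evaluates binary splits.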

Worked examples

Maximally mixed binary node — counts [50, 50]

  1. N = 100, proportions p = [0.5, 0.5]
  2. Squares: 0.5² = 0.25 and 0.5² = 0.25
  3. Σ pₖ² = 0.25 + 0.25 = 0.50
  4. G = 1 − 0.50 = 0.5 (the maximum 1 − 1/2 for two classes)
  5. Entropy = −(0.5·log₂0.5 + 0.5·log₂0.5) = 1 bit

Skewed node — counts [3, 1]

  1. N = 4, proportions p = [0.75, 0.25]
  2. Squares: 0.75² = 0.5625 and 0.25² = 0.0625
  3. Σ pₖ² = 0.5625 + 0.0625 = 0.625
  4. G = 1 − 0.625 = 0.375
  5. Entropy = −(0.75·log₂0.75 + 0.25·log₂0.25) ≈ 0.8113 bits

Split mode (Gini gain) — parent [6, 4] → [4, 0] | [2, 4]

  1. Parent: G = 1 − (0.6² + 0.4²) = 1 − 0.52 = 0.48
  2. Left child [4, 0]: pure, G = 0, weight = 4/10 = 0.4
  3. Right child [2, 4]: p = [1/3, 2/3], G = 1 − (0.1111 + 0.4444) = 0.4444, weight = 6/10 = 0.6
  4. Weighted child impurity = 0.4·0 + 0.6·0.4444 = 0.2667
  5. Gini gain ΔG = 0.48 − 0.2667 = 0.2133
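The split above is a single candidate. To illustrate how a tree learner could compare several candidates and keep the one with the largest Gini gain, the sketch below scans every threshold on one numeric feature. The data are made up for this example (chosen so that the best split reproduces [4, 0] | [2, 4]), and `best_threshold` is a hypothetical helper, not scikit-learn's implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (threshold, gain) with the largest Gini gain for one feature."""
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])
    labels = [y for _, y in pairs]
    n = len(labels)
    parent = gini(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal feature values cannot be separated
        left, right = labels[:i], labels[i:]
        weighted = (i / n) * gini(left) + ((n - i) / n) * gini(right)
        gain = parent - weighted
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)  # midpoint between neighbours
    return best


# Hypothetical data: 6 samples of class A and 4 of class B.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = ["A", "A", "A", "A", "B", "B", "A", "B", "B", "A"]

threshold, gain = best_threshold(xs, ys)
print(f"best threshold = {threshold}, Gini gain = {gain:.4f}")
# prints: best threshold = 4.5, Gini gain = 0.2133
```

This version recomputes the child impurities from scratch at every threshold, which keeps it short; a production implementation would typically update running class counts instead.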

Sources & references

The Gini impurity and Gini-gain formulas were last cross-checked against the scikit-learn documentation on 2026-06-10. These are standard textbook formulas, so unlike rates or policies they do not drift over time.
