How do you calculate the Pearson correlation coefficient by hand?

Find the mean of X and of Y. For each pair, subtract the means to get the deviations (xᵢ−x̄) and (yᵢ−ȳ). Sum the products of the deviations to get Sxy, and sum each squared deviation to get Sxx and Syy. Then r = Sxy / √(Sxx·Syy). For X=[1,2,3,4,5], Y=[2,4,5,4,5]: Sxy=6, Sxx=10, Syy=6, so r = 6/√60 = 0.7746.
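The hand calculation above maps directly onto a few lines of Python. This is a minimal sketch for illustration, not this calculator's own code (the page says that runs as plain JavaScript). The function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the deviation-score formula r = Sxy / sqrt(Sxx * Syy)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]  # deviations (x_i - x_bar)
    dy = [y - y_bar for y in ys]  # deviations (y_i - y_bar)
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squared x deviations
    syy = sum(b * b for b in dy)              # sum of squared y deviations
    if sxx == 0 or syy == 0:
        # A constant column has no spread, so r is undefined.
        raise ValueError("r is undefined when a column is constant")
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))  # 0.7746
```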

What does an r value of 0.7 mean?

An r of 0.7 is a strong positive linear relationship: as one variable rises, the other tends to rise too. Squaring it gives r² = 0.49, so roughly 49% of the variation in one variable is explained by a straight-line fit on the other. The remaining 51% comes from other factors or scatter. Strength labels are a guide; always check the p-value and the scatter plot too.

What is the difference between r and r-squared?

r (from −1 to +1) gives both the direction and the strength of a linear relationship. r² (from 0 to 1) is r multiplied by itself and drops the sign — it is the share of variance in one variable explained by a linear fit on the other. So r = −0.8 and r = +0.8 are opposite in direction but share the same r² = 0.64, meaning 64% of variance explained in both cases.
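One way to see that r² is the share of variance explained: fit the least-squares line and compare 1 − SS_res/SS_tot with r·r. The sketch below reuses the data from the first question and assumes a hypothetical helper, `r_and_r_squared`. Negating Y flips the sign of r but leaves r² unchanged.

```python
import math

def r_and_r_squared(xs, ys):
    """Return (r, r^2, 1 - SS_res/SS_tot) for the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # this is SS_tot
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    explained = 1 - ss_res / syy  # share of variance explained by the line
    return r, r * r, explained

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(r_and_r_squared(xs, ys))
# Flipping the sign of Y flips r but not r^2.
print(r_and_r_squared(xs, [-y for y in ys]))
```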

Is a Pearson correlation of 0.5 significant?

It depends on the sample size. Significance is decided by the t-test t = r√(n−2)/√(1−r²), not by r alone. With n=10, r=0.5 gives t≈1.63, p≈0.14 — not significant at α=0.05. With n=30, the same r=0.5 gives t≈3.06, p≈0.005 — clearly significant. This calculator reports the exact two-tailed p-value so you do not have to guess.
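The t-statistic itself is plain arithmetic; only the p-value needs the Student-t distribution. A minimal sketch that reproduces the two t values quoted above. The function name `t_statistic` is hypothetical.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), tested against Student-t with df = n - 2.

    Requires |r| < 1 and n >= 3.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Same r = 0.5, two different sample sizes.
for n in (10, 30):
    print(n, round(t_statistic(0.5, n), 2))
# 10 1.63
# 30 3.06
```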

What is the difference between Pearson and Spearman correlation?

Pearson measures the strength of a straight-line relationship using the actual values. It suits roughly linear data, and its significance test assumes the data are roughly normally distributed. Spearman first ranks the values and correlates the ranks, so it captures any monotonic relationship, including curved ones, and is much less sensitive to outliers. Use Pearson for linear data; switch to Spearman when the relationship is monotonic but not straight, or when outliers distort the values.
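A hedged sketch of the difference. Spearman's ρ can be computed as Pearson's r on ranks, with tied values given the average of their positions (the mid-rank approach the related Spearman tool mentions). The helper names are hypothetical, and this is not that tool's implementation. On y = x³, which is monotonic but curved, the rank correlation is exactly 1 while Pearson's r falls below 1.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions (mid-ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks

def spearman(xs, ys):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but curved
print(round(pearson(xs, ys), 4), spearman(xs, ys))
```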

Does a strong correlation mean one variable causes the other?

No. Correlation measures how two variables move together, not why. A high r can come from one variable driving the other, from a hidden third factor influencing both, or from coincidence in a small sample. Establishing cause needs a controlled experiment or careful causal design. Always read a Pearson r as evidence of association, never of causation on its own.

How many data points do I need for Pearson's r?

You need at least 3 paired points for this tool, because the significance test uses n−2 degrees of freedom. With only 2 points (and neither column constant), r is always exactly ±1, since any two points lie on a straight line, so the value tells you nothing. In practice, larger samples give more trustworthy results: small samples can show a high r purely by chance, which is exactly why the p-value matters as much as r itself.
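A quick check of the two-point case. Any two points with distinct x and distinct y values lie exactly on a line, so r comes out as +1 or −1 whatever the numbers are, and df = n − 2 = 0 leaves nothing to test. The `pearson` helper is a hypothetical sketch.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two arbitrary pairs: one rising, one falling.
for pair in ([(1.0, 3.0), (2.0, 7.5)], [(0.2, 9.0), (4.0, -1.0)]):
    xs, ys = zip(*pair)
    print(round(pearson(xs, ys), 12))  # always +1.0 or -1.0
```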

Does this calculator send my data anywhere?

No. Parsing your two columns, computing the means, deviations, sums, r, r², the t-statistic and the p-value all run in your browser with plain JavaScript. Nothing is uploaded, logged, or stored, so you can paste real research or coursework data safely. The page keeps working offline once it has loaded.


Pearson Correlation Coefficient Calculator

Paste two columns of numbers and get the Pearson correlation r, r², covariance, a significance test (t-statistic and two-tailed p-value), and a scatter plot with the full step-by-step working. Results match scipy.stats.pearsonr, and everything runs in your browser: no signup, nothing uploaded.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 10, 2026

Pearson correlation calculator

X values

Numbers separated by commas, spaces, or new lines. Paste two Excel columns here to fill both.

Y values

Must have the same count as X — each X needs a matching Y.


Pearson r

-0.9948

Range −1 to 1 · n = 5

r² (determination)

0.9897

99.0% of variance explained

Covariance

-4.2500

Sxy / (n−1)

Strength

Very strong negative

Scatter plot

with least-squares trend line (y = -1.700x + 11.700)

Significance test

t-statistic

-17.0000

Degrees of freedom

3

p-value (two-tailed)

0.0004

At α = 0.05: Significant

Cross-check. The deviation-score formula gives r = -0.9948; the independent raw-score formula [n·Σxy − ΣxΣy] / √(…) gives -0.9948. They reconcile, as they must — and both match scipy.stats.pearsonr.

Step-by-step working

#	xᵢ	yᵢ	xᵢ−x̄	yᵢ−ȳ	(xᵢ−x̄)(yᵢ−ȳ)	(xᵢ−x̄)²	(yᵢ−ȳ)²
1	1.0000	10.0000	-2.0000	3.4000	-6.8000	4.0000	11.5600
2	2.0000	8.0000	-1.0000	1.4000	-1.4000	1.0000	1.9600
3	3.0000	7.0000	0.0000	0.4000	0.0000	0.0000	0.1600
4	4.0000	5.0000	1.0000	-1.6000	-1.6000	1.0000	2.5600
5	5.0000	3.0000	2.0000	-3.6000	-7.2000	4.0000	12.9600
Σ	15.0000	33.0000			-17.0000	10.0000	29.2000

x̄ = 15.0000 / 5 = 3.0000 · ȳ = 33.0000 / 5 = 6.6000

SD(X) = 1.5811 · SD(Y) = 2.7019 (sample)

r = -17.0000 / √(10.0000 × 29.2000) = -0.9948
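The covariance, standard deviations, and trend line reported for this data all come from the same three sums. The sketch below uses the step-by-step data (X = [1, 2, 3, 4, 5], Y = [10, 8, 7, 5, 3]) and should reproduce the covariance, sample SDs, r, and the trend line y = -1.700x + 11.700. The variable names are illustrative. It also shows why the sample-versus-population choice leaves r untouched: the (n−1) divisors cancel.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 5, 3]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

cov_sample = sxy / (n - 1)       # sample covariance, Sxy / (n - 1)
cov_population = sxy / n         # population covariance, Sxy / n
sd_x = math.sqrt(sxx / (n - 1))  # sample standard deviations
sd_y = math.sqrt(syy / (n - 1))
r = cov_sample / (sd_x * sd_y)   # (n - 1) cancels, so r = Sxy / sqrt(Sxx * Syy)

slope = sxy / sxx                # least-squares slope
intercept = y_bar - slope * x_bar
print(round(cov_sample, 4), round(sd_x, 4), round(sd_y, 4), round(r, 4))
print(f"y = {slope:.3f}x + {intercept:.3f}")
```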

Method: r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √(Σ(xᵢ−x̄)²·Σ(yᵢ−ȳ)²); significance via t = r√(n−2)/√(1−r²) with df = n−2 — NIST e-Handbook §1.3.5.13, matched to scipy.stats.pearsonr. Nothing leaves this page.

How it works

The Pearson product-moment correlation coefficient r measures the strength and direction of the linear relationship between two paired variables. It runs from −1 (a perfect decreasing line), through 0 (no linear association), to +1 (a perfect increasing line). The definition is the one in the NIST/SEMATECH e-Handbook of Statistical Methods §1.3.5.13.

For n paired observations, with means x̄ = (Σxᵢ)/n and ȳ = (Σyᵢ)/n, the coefficient is the sum of cross-products of the deviations divided by the root of the product of the squared deviations:

r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √( Σ(xᵢ−x̄)² · Σ(yᵢ−ȳ)² ) = Sxy / √(Sxx · Syy)

The tool computes this in four steps:

Means and deviations. It averages each column, then subtracts the mean from every value to get the deviations (xᵢ−x̄) and (yᵢ−ȳ).
Sums. It accumulates Sxy, Sxx, and Syy from the deviation table. If Sxx or Syy is zero — a constant column — r is undefined, so the tool shows a clear message instead of a divide-by-zero.
r and r². The correlation is Sxy / √(Sxx·Syy), and r² (the coefficient of determination) is r squared — the share of variance in one variable explained by a linear fit on the other. Covariance is Sxy/(n−1) for a sample (or Sxy/n for a whole population); r itself is unaffected by that choice.
Significance. Under the null hypothesis that the true correlation is zero, t = r√(n−2)/√(1−r²) follows a Student-t distribution with df = n−2. The two-tailed p-value is the regularized incomplete beta function I_x(df/2, 1/2) evaluated at x = df/(df+t²), an exact identity equivalent to SciPy's calculation.
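The incomplete-beta route to the p-value can be sketched with only the Python standard library. This is a standalone illustration, assuming a Numerical Recipes style continued fraction (modified Lentz method) for I_x(a, b); it is not SciPy's code or this calculator's code, and every function name here is hypothetical.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_regularized(a, b, x):
    """Regularized incomplete beta I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the fraction where it converges fast; otherwise use
    # the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def pearson_p_value(r, n):
    """Two-tailed p-value for H0: rho = 0, via t with df = n - 2."""
    df = n - 2
    if abs(r) >= 1.0:
        return 0.0  # t is infinite for a perfect line
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return betainc_regularized(df / 2.0, 0.5, df / (df + t * t))

print(round(pearson_p_value(-17 / math.sqrt(292), 5), 6))
```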

As a credibility check the calculator also recomputes r a second way — the raw-score formula [n·Σxy − ΣxΣy] / √(…) — and confirms both routes agree to floating-point precision, matching scipy.stats.pearsonr. Pearson's r assumes a roughly linear relationship; for ranked or monotonic-but-curved data, a rank correlation such as Spearman fits better. And a strong r is evidence of association, never of causation on its own.
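The two-route cross-check can be sketched as follows. The raw-score formula is written out in its standard textbook form, [nΣxy − ΣxΣy] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]); the function names are hypothetical and this is not the calculator's own code.

```python
import math

def r_deviation(xs, ys):
    # Deviation-score route: Sxy / sqrt(Sxx * Syy).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_raw_score(xs, ys):
    # Raw-score route: works from plain sums, never forms deviations.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

data = [
    ([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]),
    ([1, 2, 3, 4, 5], [10, 8, 7, 5, 3]),
    ([-2, -1, 0, 1, 2], [0.5, 0.2, 0.0, 0.3, 0.9]),
]
for xs, ys in data:
    a, b = r_deviation(xs, ys), r_raw_score(xs, ys)
    assert math.isclose(a, b, rel_tol=1e-9), (a, b)  # the two routes must agree
    print(round(a, 4), round(b, 4))
```

The raw-score route subtracts large, nearly equal sums when values are big relative to their spread, so it can lose floating-point precision on such data. That is one reason the deviation-score form is the better primary calculation and the raw-score form works best as an independent check.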

Worked examples

Classic positive — X = [1, 2, 3, 4, 5], Y = [2, 4, 5, 4, 5]

Means: x̄ = 15/5 = 3, ȳ = 20/5 = 4
Sxy = (−2)(−2)+(−1)(0)+(0)(1)+(1)(0)+(2)(1) = 4+0+0+0+2 = 6
Sxx = 4+1+0+1+4 = 10; Syy = 4+0+1+0+1 = 6
r = 6 / √(10·6) = 6/√60 = 0.7746; r² = 0.6000 (60% of variance)
t = 0.7746·√3/√0.4 = 2.1213, df = 3, p = 0.1240 → not significant at α=0.05

Strong negative — study hours X = [1, 2, 3, 4, 5] vs exam errors Y = [10, 8, 7, 5, 3]

Means: x̄ = 3, ȳ = 33/5 = 6.6
Deviations y: 3.4, 1.4, 0.4, −1.6, −3.6
Sxy = −6.8−1.4+0−1.6−7.2 = −17; Sxx = 10; Syy = 29.2
r = −17 / √(10·29.2) = −17/√292 = −0.9948; r² = 0.9897
t = −17.0000, df = 3, p = 0.000443 → significant: more study hours go with fewer errors

Weak / not significant — X = [−2, −1, 0, 1, 2], Y = [0.5, 0.2, 0.0, 0.3, 0.9]

Means: x̄ = 0, ȳ = 1.9/5 = 0.38
Sxy = (−2)(0.12)+(−1)(−0.18)+0+(1)(−0.08)+(2)(0.52) = 0.90
Sxx = 10; Syy = 0.468
r = 0.90 / √(10·0.468) = 0.90/√4.68 = 0.4160; r² = 0.1731
t = 0.7924, df = 3, p = 0.4860 → a weak hint, not significant at α=0.05
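All three worked examples have n = 5, so df = 3. For that case the Student-t distribution has a closed-form CDF, which gives an independent check on the p-values above without any special functions. This is a minimal sketch; `summarize` is a hypothetical name, and the closed form applies only to df = 3.

```python
import math

def summarize(xs, ys):
    """Return r, r^2, t and the two-tailed p-value for n = 5 (df = 3) data."""
    n = len(xs)
    if n != 5:
        raise ValueError("the closed-form tail below is only valid for df = 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # Student-t with 3 df: two-tailed p = 1 - (2/pi) * (atan(u) + u / (1 + u^2)),
    # where u = |t| / sqrt(3).
    u = abs(t) / math.sqrt(3)
    p = 1 - (2 / math.pi) * (math.atan(u) + u / (1 + u * u))
    return r, r * r, t, p

examples = {
    "classic positive": ([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]),
    "strong negative": ([1, 2, 3, 4, 5], [10, 8, 7, 5, 3]),
    "weak": ([-2, -1, 0, 1, 2], [0.5, 0.2, 0.0, 0.3, 0.9]),
}
for name, (xs, ys) in examples.items():
    r, r2, t, p = summarize(xs, ys)
    print(f"{name}: r={r:.4f} r2={r2:.4f} t={t:.4f} p={p:.6f}")
```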

Frequently asked questions

Sources & references

The formulas on this page were last cross-checked against these sources on 2026-06-10. Pearson's r is a stable mathematical definition, so this tool needs no rate or schedule updates — only the worked examples are periodically re-reconciled against SciPy.

Related tools


Spearman Correlation

Paste two columns to get Spearman's ρ (rho), a t-test with two-tailed p-value, a strength reading, and the full rank working — with mid-rank tie correction. Matches scipy.stats.spearmanr, runs entirely in your browser.


MCC Calculator

Compute the Matthews Correlation Coefficient from a confusion matrix or two label columns, with formula breakdown and imbalanced-data interpretation, entirely in the browser.


Confusion Matrix Calculator

Enter the four cells of a binary confusion matrix (TP, FP, FN, TN) and instantly get accuracy, precision, recall, specificity, F1, F-beta, balanced accuracy and the Matthews correlation coefficient — each shown with its exact formula. Runs in your browser, no signup.
