Linear Regression

Ordinary least-squares slope, intercept and R² for any (x, y) list.

Overview

The Linear Regression tool fits a straight line y = m * x + b to any list of (x, y) points by ordinary least squares. It reports the slope, intercept, R-squared goodness of fit and a residuals summary, so you can see how well the line captures your data.

It is built for analysts spotting trends in a small dataset, students doing lab reports, marketers correlating spend with revenue and engineers calibrating sensors. For a quick fit like this, there is no need to reach for scikit-learn.

How it works

Given n pairs (x_i, y_i), the OLS estimates are the values of m and b that minimise the sum of squared residuals. The closed-form solution is m = Σ((x_i - x̄)(y_i - ȳ)) / Σ((x_i - x̄)^2) and b = ȳ - m * x̄, where x̄ and ȳ are the sample means. The x values must not all be identical; otherwise the denominator is zero and the slope is undefined.
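
The closed-form estimates above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the tool's actual implementation; the function name fit_line is hypothetical, and it assumes at least two points whose x values are not all identical.

```python
def fit_line(points):
    """Return (slope, intercept) of the OLS line through (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    # Numerator and denominator of the closed-form slope.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = s_xy / s_xx
    b = y_mean - m * x_mean
    return m, b


m, b = fit_line([(1, 2), (2, 4), (3, 6), (4, 8)])
print(m, b)  # 2.0 0.0
```

Only sums over the data are needed, so a single pass per sum is enough and no matrix library is required.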

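R-squared is 1 - SS_res / SS_tot where SS_res = Σ(y_i - ŷ_i)^2, SS_tot = Σ(y_i - ȳ)^2 and ŷ_i = m * x_i + b is the fitted value. It ranges from 0 (line explains nothing) to 1 (perfect fit). Negative values are impossible for an OLS fit with an intercept. If every y value is the same, SS_tot is zero and R-squared is undefined.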
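
As a rough sketch of the definition above, the following Python computes R-squared from SS_res and SS_tot. The function name r_squared is hypothetical, returning None for the undefined case is an assumption made for illustration, and the x values are assumed not to be all identical.

```python
def r_squared(points):
    """Return R-squared of the OLS line through (x, y) pairs, or None if undefined."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    m = s_xy / s_xx  # assumes the x values are not all identical
    b = y_mean - m * x_mean
    # Residual and total sums of squares.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in points)
    ss_tot = sum((y - y_mean) ** 2 for _, y in points)
    if ss_tot == 0:
        return None  # no variance in y, so R-squared is undefined
    return 1 - ss_res / ss_tot


print(r_squared([(10, 15), (20, 25), (30, 30), (40, 50)]))  # roughly 0.93
print(r_squared([(1, 5), (2, 5), (3, 5), (4, 5)]))          # None
```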

Examples

Points: (1,2), (2,4), (3,6), (4,8)
   →  slope 2, intercept 0, R² = 1

Points: (1,1), (2,2.1), (3,2.9), (4,4.2)
   →  slope ≈ 1.04, intercept ≈ -0.05, R² ≈ 0.992

Points: (10,15), (20,25), (30,30), (40,50)
   →  slope 1.1, intercept 2.5, R² ≈ 0.931

Points: (1,5), (2,5), (3,5), (4,5)
   →  slope 0, intercept 5, R² undefined (no y variance)

FAQ

What does R-squared mean?

The fraction of variance in y explained by x. 0.9 means the line accounts for 90% of the variation; 0.1 means it explains very little.

Is R-squared the same as correlation?

For simple linear regression, R-squared equals the square of the Pearson correlation coefficient.
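
The identity can be checked numerically. The sketch below computes the Pearson coefficient and R-squared independently and compares them; the helper names sums, pearson_r and r_squared are hypothetical, and the data is assumed to have some variance in both x and y.

```python
import math


def sums(points):
    """Return (x_mean, y_mean, Sxx, Syy, Sxy) for a list of (x, y) pairs."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    s_yy = sum((y - y_mean) ** 2 for _, y in points)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    return x_mean, y_mean, s_xx, s_yy, s_xy


def pearson_r(points):
    """Pearson correlation coefficient of x and y."""
    _, _, s_xx, s_yy, s_xy = sums(points)
    return s_xy / math.sqrt(s_xx * s_yy)


def r_squared(points):
    """R-squared of the OLS line, computed as 1 - SS_res / SS_tot."""
    x_mean, y_mean, s_xx, s_yy, s_xy = sums(points)
    m = s_xy / s_xx
    b = y_mean - m * x_mean
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in points)
    return 1 - ss_res / s_yy  # Syy is the same quantity as SS_tot


pts = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2)]
print(math.isclose(pearson_r(pts) ** 2, r_squared(pts)))  # True
```

The sign of the correlation is lost when squaring, so R-squared alone does not say whether the trend is upward or downward; the slope does.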

Does it work for non-linear data?

OLS will still produce a line, but a straight line cannot follow curvature, so the fit will usually be poor. For curves, see Polynomial Curve Fit.

How many points do I need?

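At least two points with different x values for a unique line, but for a meaningful R-squared aim for 10+. With only two points the line passes through both, so R² = 1 trivially (or is undefined if the two y values are equal).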

Are outliers a problem?

Yes. OLS is sensitive to outliers because residuals are squared. Robust regression methods downweight extreme values, though they aren't included here.
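
To illustrate the sensitivity, the sketch below fits the same four points twice, once as a clean line and once with the last y value replaced by an outlier. The function name slope and the outlier value are made up for this illustration.

```python
def slope(points):
    """Return the OLS slope for a list of (x, y) pairs."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    return s_xy / s_xx  # assumes the x values are not all identical


clean = [(1, 2), (2, 4), (3, 6), (4, 8)]
with_outlier = clean[:-1] + [(4, 20)]  # last point moved far off the line

print(slope(clean))         # 2.0
print(slope(with_outlier))  # 5.6
```

A single bad reading nearly triples the slope here, which is why it is worth inspecting the residuals before trusting a fit on a small dataset.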
