Linear Regression
Ordinary least-squares slope, intercept and R² for any (x, y) list.
Overview
The Linear Regression tool fits a straight line y = m * x + b to any list of (x, y) points by ordinary least squares. It reports the slope, intercept, R-squared goodness of fit and a residuals summary, so you can see how well the line captures your data.
It is built for analysts spotting trends in a small dataset, students doing lab reports, marketers correlating spend with revenue and engineers calibrating sensors. It is meant for quick fits where a full library such as scikit-learn would be overkill.
How it works
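Given n pairs (x_i, y_i), the OLS estimates are the values of m and b that minimise the sum of squared residuals Σ(y_i - (m * x_i + b))^2. The closed-form solution is m = Σ((x_i - x̄)(y_i - ȳ)) / Σ((x_i - x̄)^2) and b = ȳ - m * x̄, where x̄ and ȳ are the sample means. The slope is defined only when the x values are not all equal; otherwise the denominator is zero.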
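The closed-form formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool's actual implementation; the function name fit_line and its error handling are assumptions made for the example.

```python
from typing import Sequence, Tuple


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line (hypothetical helper)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    # Numerator and denominator of the closed-form slope.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = s_xy / s_xx
    b = y_mean - m * x_mean
    return m, b


m, b = fit_line([(1, 2), (2, 4), (3, 6), (4, 8)])
print(m, b)  # 2.0 0.0
```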
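R-squared is 1 - SS_res / SS_tot where SS_res = Σ(y_i - ŷ_i)^2, SS_tot = Σ(y_i - ȳ)^2 and ŷ_i = m * x_i + b is the fitted value. It ranges from 0 (line explains nothing) to 1 (perfect fit), and is undefined when SS_tot = 0, that is, when every y value is the same. Negative values are impossible for an OLS fit with an intercept.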
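As a sketch of the R-squared definition above, the hypothetical helper r_squared below takes the points together with an already-fitted slope and intercept. Returning None when SS_tot is zero is an assumption made for this example; the tool itself may report that case differently.

```python
from typing import Optional, Sequence, Tuple


def r_squared(points: Sequence[Tuple[float, float]],
              m: float, b: float) -> Optional[float]:
    """Return 1 - SS_res / SS_tot, or None when SS_tot is zero (hypothetical helper)."""
    n = len(points)
    y_mean = sum(y for _, y in points) / n
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in points)
    ss_tot = sum((y - y_mean) ** 2 for _, y in points)
    if ss_tot == 0:
        return None  # no variance in y, so the ratio is undefined
    return 1 - ss_res / ss_tot


# Slope 1.1 and intercept 2.5 are the fitted values for these four points.
points = [(10, 15), (20, 25), (30, 30), (40, 50)]
print(round(r_squared(points, 1.1, 2.5), 3))  # 0.931

# A flat series has SS_tot = 0, so no R-squared is returned.
print(r_squared([(1, 5), (2, 5), (3, 5), (4, 5)], 0.0, 5.0))  # None
```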
Examples
Points: (1,2), (2,4), (3,6), (4,8)
→ slope 2, intercept 0, R² = 1
Points: (1,1), (2,2.1), (3,2.9), (4,4.2)
→ slope = 1.04, intercept = -0.05, R² ≈ 0.992
Points: (10,15), (20,25), (30,30), (40,50)
→ slope 1.1, intercept 2.5, R² ≈ 0.931
Points: (1,5), (2,5), (3,5), (4,5)
→ slope 0, intercept 5, R² undefined (no y variance)
FAQ
What does R-squared mean?
The fraction of variance in y explained by x. 0.9 means the line accounts for 90% of the variation; 0.1 means it explains very little.
Is R-squared the same as correlation?
Not quite. For simple linear regression with an intercept, R-squared equals the square of the Pearson correlation coefficient r, so it carries the strength of the relationship but not its sign.
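A quick numerical check of this relationship, as a sketch: the data points below are made up for illustration, and the helper names pearson_r and ols_r_squared are assumptions, not part of the tool.

```python
import math
from typing import Sequence, Tuple

Points = Sequence[Tuple[float, float]]


def _centered_sums(points: Points) -> Tuple[float, float, float]:
    """Return (S_xy, S_xx, S_yy) computed around the sample means."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    s_yy = sum((y - y_mean) ** 2 for _, y in points)
    return s_xy, s_xx, s_yy


def pearson_r(points: Points) -> float:
    s_xy, s_xx, s_yy = _centered_sums(points)
    return s_xy / math.sqrt(s_xx * s_yy)


def ols_r_squared(points: Points) -> float:
    """Fit y = m * x + b by OLS, then return 1 - SS_res / SS_tot."""
    n = len(points)
    s_xy, s_xx, s_yy = _centered_sums(points)
    m = s_xy / s_xx
    b = sum(y for _, y in points) / n - m * sum(x for x, _ in points) / n
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in points)
    return 1 - ss_res / s_yy


# Made-up decreasing data: r is negative, R-squared is not.
points = [(1, 9.8), (2, 8.1), (3, 6.2), (4, 3.9)]
r = pearson_r(points)
print(r < 0)                                      # True
print(math.isclose(r * r, ols_r_squared(points)))  # True
```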
Does it work for non-linear data?
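OLS will still produce a line, but it can miss the shape of the data: the residuals show a systematic pattern, and R-squared may be low or, for smoothly curved data, misleadingly high. For curves, see Polynomial Curve Fit.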
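To illustrate what a straight line does with curved data, here is a small sketch that fits y = x² sampled at x = 1..5. The data and the helper fit_line are assumptions for the example only. The thing to look at is the sign pattern of the residuals, which a summary figure such as R-squared can hide.

```python
from typing import Sequence, Tuple


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Closed-form OLS slope and intercept (hypothetical helper)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    m = s_xy / s_xx
    return m, y_mean - m * x_mean


# Quadratic data: (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)
points = [(x, x * x) for x in range(1, 6)]
m, b = fit_line(points)
residuals = [y - (m * x + b) for x, y in points]

y_mean = sum(y for _, y in points) / len(points)
ss_res = sum(r * r for r in residuals)
ss_tot = sum((y - y_mean) ** 2 for _, y in points)

print(m, b)                    # 6.0 -7.0
print(residuals)               # [2.0, -1.0, -2.0, -1.0, 2.0]
print(round(1 - ss_res / ss_tot, 3))  # 0.963
```

The residuals are positive at both ends and negative in the middle, which is the signature of curvature, even though R-squared is above 0.96.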
How many points do I need?
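At least two points with distinct x values for a unique line, but for a meaningful R-squared aim for 10+. With exactly two points the line passes through both, so R² = 1 trivially (or is undefined if the two y values are equal).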
Are outliers a problem?
Yes. OLS is sensitive to outliers because residuals are squared. Robust regression methods downweight extreme values, though they aren't included here.
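The effect is easy to see on a small made-up dataset. In the sketch below, five points lie exactly on y = 2x, and then the last y value is replaced by an extreme one; fit_line is a hypothetical helper using the closed-form OLS formulas.

```python
from typing import Sequence, Tuple


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Closed-form OLS slope and intercept (hypothetical helper)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    m = s_xy / s_xx
    return m, y_mean - m * x_mean


clean = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
with_outlier = clean[:-1] + [(5, 30)]  # one extreme y value

print(fit_line(clean))         # (2.0, 0.0)
print(fit_line(with_outlier))  # (6.0, -8.0)
```

A single bad point triples the slope here, because its large residual is squared and so dominates the sum being minimised.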