Re: Calculate slope of a data set or trend line

kux wrote:
Hello everyone,
I hope someone is out here who can help me with a simple calculation...

I have a sales data base in access with monthly sales history by
product. to make future predictions I need the slope (representing
growth) of monthly sales.
the data is setup in a crosstab with the months in different collums,
say column 1 to 12 for last year. I want to calculate the slope for
every row (on the same query or another if needed)

Does anyone have an idea how to calculate the slope of the trendline
for those values? In excel I can just use =slope(known y's, known x's).
the slope i need is from the trendline in common statistics (I believe
it is calcuated by minimizing the sum of the squared deviation - OLS)

I cannot use an ordinary excel export (too many lines).

Anyone an idea how to work around that?

Help is much appreciated,


Data points: (X1,Y1), ..., (XN, YN)

Best Fit Line: Y = AX + B

Error from the line: Ei = Yi - AXi - B

Square of each error:

Ei^2 = Yi^2 + A^2 Xi^2 + B^2 - 2A XiYi - 2B Yi + 2AB Xi

Sum of the squares of the errors:

Sum(Ei^2) = Sum(Yi^2) + A^2 Sum(Xi^2) + B^2 Sum(1) - 2A Sum(XiYi) - 2B
Sum(Yi) + 2AB Sum(Xi), i = 1,...,N

Taking the partial derivatives with respect to A and B and setting them
to 0,

w.r.t. A: 0 = 2A Sum(Xi^2) - 2 Sum(XiYi) + 2B Sum(Xi)
w.r.t. B: 0 = 2BN + 2A Sum(Xi) -2Sum(Yi)

Rearranging so that A and B are the variables:

Sum(Xi^2) A + Sum(Xi) B = Sum(XiYi)
Sum(Xi) A + N B = Sum(Yi)

as a Matrix Equation:

- - - - - -
| Sum(Xi^2) Sum(Xi) | * | A | = | Sum(XiYi) |
| Sum(Xi) N | | B | | Sum(Yi) |
- - - - - -

As long as Abs(N Sum(Xi^2) - [Sum(Xi)]^2) is not miniscule, Cramer's
rule will work without numerical instabilities:

A = [N Sum(XiYi) - Sum(Xi) Sum(Yi)] / [N Sum(Xi^2) - Sum(Xi) ^ 2]
B = [Sum(Xi) ^ 2 Sum(Yi) - Sum(Xi) Sum(XiYi)] / [N Sum(Xi^2) - Sum(Xi)
^ 2]

where each sum is performed on i = 1,...,N

So if tblData has N records of (X, Y) values:

SELECT SUM(1) * SUM(X*X) - SUM(X) * SUM(X) AS Denom, (SUM(1) * SUM(X *
Y) - SUM(X) * SUM(Y)) / Denom AS A, (SUM(X * X) * SUM(Y) - SUM(X) *
SUM(X * Y)) / Denom AS B FROM tblData;

should return unrounded values for A and B.

I didn't check the equations against a book but they seem to be giving
the correct results for a couple of simple data sets. Least squares
regression of physical data often results in miniscule Denom values.
You'll know that numerical instabilities are creeping in when the
answers you get no longer solve the 2 X 2 matrix equation. In those
cases, alternate matrix solutions such as partial or full (matrix)
pivoting during gaussian elimination, may be required. A new wrinkle
is added when the numbers are in rows. Perhaps create a public
function with 12 TotalMonthlySales arguments, that returns A (the
slope) for an N value of 12.

I hope this helps,

James A. Fortune