 |
 |
|
 |
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
4c0e4c89$1@news.povray.org...
>> OK. I'll go see if I can get that to work...
>
> Right, that seems to work quite well...
>
> Any idea how to get a confidence estimate?
Years ago I wrote some VBA code to calculate regressions with all
parameters. The code was adapted from pseudocode found in books of
statistics, so parts of it are pure magic: I don't understand how it works.
I can send it to you if you want.
G.
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Gilles Tran wrote:
> Years ago I wrote some VBA code to calculate regressions with all
> parameters. The code was adapted from pseudocode found in books of
> statistics, so parts of it are pure magic: I don't understand how it works.
> I can send it to you if you want.
Well, it could certainly be worth a giggle...
(It's surprising how many companies have this "box of VBA magic" where
they don't know how it works, but it does.)
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Invisible wrote:
>> OK. I'll go see if I can get that to work...
>
> Right, that seems to work quite well...
>
> Any idea how to get a confidence estimate?
You could take a look at the section labeled "Confidence Intervals" on
the page that scott linked.
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
4c0e6853$1@news.povray.org...
> Well, it could certainly be worth a giggle...
Here it is. Have fun figuring that out. Note that it pulls the data from a
database first. Also I've removed the code that writes the results in the
database.
G.
Public Sub Regression(RelationCode, NEXP)
Dim MaBd As Database, stSQL As String, DataSet As Recordset
Dim I As Integer, J As Integer, K As Integer, NOBS As Integer
Dim S(0 To 9) As Double, SSY As Double, D As Double, A As Double
Dim SPYX(0 To 9) As Double, SPXX(0 To 9, 0 To 9) As Double, VY As Double
Dim VYX(0 To 9) As Double, VXX(0 To 9, 0 To 9) As Double
Dim B(0 To 9) As Double, SB(0 To 9) As Double, R2 As Double, R2ADJ As
Double, SY As Double
Dim F As Double, T(0 To 9) As Double, P As Double, PCoeff(0 To 9) As Double
Dim ta As Double, Pa As Double
Dim Unitcode As Integer, DataBasisCode As Integer, stCoeff As String
Dim SST As Double, SSR As Double, SSD As Double, SA As Double
Dim MSD As Double, MSR As Double, M(0 To 9) As Double
Set MaBd = CurrentDb
' NEXP = number of predictors
stSQL = "SELECT DISTINCTROW Y "
For I = 1 To NEXP
stSQL = stSQL & ", X" & CStr(I)
Next I
stSQL = stSQL & " FROM RELATION_TEMP"
'Debug.Print stSQL
NOBS = 0
Set DataSet = MaBd.OpenRecordset(stSQL)
If DataSet.EOF Then Exit Sub
DataSet.MoveFirst
' Browse the data table
Do Until DataSet.EOF
' sum of squares Y
SSY = SSY + (DataSet(0) ^ 2)
' for all variables
For I = 0 To NEXP
'sum of data
S(I) = S(I) + DataSet(I)
Next I
' for all predictors
For I = 0 To NEXP - 1
' sum of products Y * X
SPYX(I) = SPYX(I) + DataSet(0) * DataSet(I + 1) ' sum of products
Y*x
For J = 0 To NEXP - 1
' sum of products X * X
SPXX(I, J) = SPXX(I, J) + DataSet(I + 1) * DataSet(J + 1)
Next J
Next I
' number of observations
NOBS = NOBS + 1
DataSet.MoveNext
Loop
DataSet.Close
If (NOBS - NEXP - 1 <= 0) Then
frm!CurrentMessage = "Relation " & RelationCode & " - Not calculable :
N obs <= " & Str(NEXP + 1)
Exit Sub
End If
' total sum of squares and variance of Y
SST = (SSY - S(0) ^ 2 / NOBS)
VY = SST / NOBS
' Means of Y and X (M(0)=mean for Y)
For I = 0 To NEXP
M(I) = S(I) / NOBS
Next I
' covariances of Y and X
For I = 0 To NEXP - 1
VYX(I) = (SPYX(I) - (S(0) * S(I + 1)) / NOBS) / NOBS
Next I
' covariances of the predictors X
For I = 0 To NEXP - 1
For J = 0 To NEXP - 1
VXX(I, J) = (SPXX(I, J) - (S(I + 1) * S(J + 1)) / NOBS) / NOBS
Next J
Next I
' inversion (Gauss method) of the XX covariance matrix
' the original matrix is destroyed
For K = 0 To NEXP - 1
D = VXX(K, K)
If (D = 0) Then End 'null determinant
VXX(K, K) = 1
For J = 0 To NEXP - 1
VXX(K, J) = VXX(K, J) / D
Next J
For I = 0 To NEXP - 1
If (I <> K) Then
D = VXX(I, K)
VXX(I, K) = 0
For J = 0 To NEXP - 1
VXX(I, J) = VXX(I, J) - D * VXX(K, J)
Next J
End If
Next I
Next K
' calculation of the coefficients
For I = 0 To NEXP - 1
For J = 0 To NEXP - 1
B(I) = B(I) + VXX(I, J) * VYX(J)
Next J
Next I
' calculation of the sum of squares of deviations attribuable to the
regression
For I = 0 To NEXP - 1
SSR = SSR + B(I) * VYX(I) * NOBS
Next I
' calculation of the sum of squares of deviations about the regression
SSD = SST - SSR
' R2 and R2 for the degrees of freedom
R2 = 1 - (SSD / SST)
R2ADJ = 1 - (SSD * (NOBS - 1)) / (SST * (NOBS - NEXP - 1))
' mean square attribuable to the regression
MSR = SSR / NEXP
' mean square of deviations about the regression
MSD = SSD / (NOBS - NEXP - 1)
' standard error estimate
SY = Sqr(MSD)
' F value and Probability
F = MSR / MSD
P = FISHER(F, NEXP, NOBS - NEXP - 1)
' calculation of the standard error of the regression coefficients
For I = 0 To NEXP - 1
SB(I) = Sqr(VXX(I, I) * MSD / NOBS)
T(I) = B(I) / SB(I)
PCoeff(I) = TSTUDENT(T(I), NOBS - NEXP - 1)
Next I
' constant a
A = M(0)
For I = 0 To NEXP - 1
A = A - B(I) * M(I + 1)
Next I
' standard error of constant a, t-value and probability
SA = 0
For I = 1 To NEXP
For J = 1 To NEXP
SA = SA + M(I) * M(J) * VXX(I - 1, J - 1)
Next J
Next I
SA = Sqr(MSD * (1 + SA) / NOBS)
ta = A / SA
Pa = TSTUDENT(ta, NOBS - NEXP - 1)
'For i = 0 To NEXP - 1
' Debug.Print SY
'Next i
End Sub
Public Function TSTUDENT(T, DF)
' Calculates the probability of Student's t-distribution value of being less
in absolute value than t
Dim teta As Double, I As Integer, K As Integer, A As Double, P As Double, Q
As Double, cs As Double
Dim Pi As Double
Pi = 3.14159265358979
T = Abs(T)
teta = Atn(T / Sqr(DF))
If DF Mod 2 = 0 Then ' even degrees of freedom
I = DF / 2 - 2
Q = 1
For K = 0 To I
Q = Q * (K * 2 + 1) / (K * 2 + 2)
cs = Cos(teta) ^ (K * 2 + 2)
A = A + Q * cs
Next K
A = Sin(teta) * (A + 1)
Else ' odd degrees of freedom
If DF > 1 Then
I = (DF - 1) / 2 - 1
Q = 1
For K = 0 To I
If K = 0 Then
Q = 1
Else
Q = Q * (K * 2) / (K * 2 + 1)
End If
cs = Cos(teta) ^ (K * 2 + 1)
A = A + Q * cs
Next K
A = (2 / Pi) * (teta + Sin(teta) * A)
Else
A = 2 * teta / Pi
End If
End If
A = 1 - A
If (A < 0.0001) Then A = 0
TSTUDENT = A
' Debug.Print P
End Function
Public Function FISHER(F, DF1, DF2)
' Calculates the probability of a F = (X1/DF1)/(X2/DF2)
Dim teta As Double, X As Double, I As Integer, K As Integer, A As Double, B
As Double
Dim P As Double, Q As Double, cs As Double, NU1 As Double, NU2 As Double
Dim Pi As Double
Pi = 3.14159265358979
If DF1 Mod 2 = 0 Or DF2 Mod 2 = 0 Then
X = DF2 / (DF2 + DF1 * F)
If DF1 Mod 2 = 0 Then
NU1 = DF1
NU2 = DF2
Else
NU2 = DF1
NU1 = DF2
X = 1 - X
End If
I = (NU1 - 2) / 2 - 1
Q = 1
P = 1
For K = 0 To I
Q = Q * (NU2 + K * 2) / (K * 2 + 2)
P = P + Q * ((1 - X) ^ (K + 1))
Next K
If DF1 Mod 2 = 0 Then
P = (X ^ (NU2 / 2)) * P
Else
P = 1 - (X ^ (NU2 / 2)) * P
End If
Else
teta = Atn(Sqr(DF1 * F / DF2))
' Debug.Print "teta=", teta
NU1 = DF1
NU2 = DF2
' calculation of A
If NU2 > 1 Then
I = (NU2 - 1) / 2 - 1
For K = 0 To I
If K = 0 Then
Q = 1
Else
Q = Q * (K * 2) / (K * 2 + 1)
End If
A = A + Q * Cos(teta) ^ (K * 2 + 1)
Next K
A = (2 / Pi) * (teta + Sin(teta) * A)
Else
A = 2 * teta / Pi
End If
' calculation of B
If NU1 > 1 Then
B = 1
Q = 1
If (NU1 > 3) Then
I = (NU1 - 3) / 2 - 1
For K = 0 To I
Q = Q * (NU2 + K * 2 + 1) / (K * 2 + 3)
B = B + Q * Sin(teta) ^ (K * 2 + 2)
Next K
End If
' Debug.Print "b=", B
' note : the exponentation is necessary due to overflow
B = (2 / Sqr(Pi)) * Exp(FACTLN((NU2 - 1) / 2) - FACTLN((NU2 - 2) /
2)) * Sin(teta) * (Cos(teta) ^ NU2) * B
Else
B = 0
End If
P = 1 - A + B
End If
If (P < 0.0001) Then P = 0
FISHER = P
End Function
Public Function FACTRL(N)
Dim J As Integer, ntop As Integer
Dim A(1 To 33) As Double
ntop = 0
A(1) = 1
If (N < 0) Then
End
ElseIf (N <= ntop) Then
FACTRL = A(N + 1)
ElseIf (N <= 32) Then
For J = ntop + 1 To N
A(J + 1) = J * A(J)
Next J
ntop = N
FACTRL = A(N + 1)
Else
FACTRL = Exp(GAMMALN(N + 1))
End If
End Function
Public Function GAMMALN(xx)
' Holy cow!
Dim ser As Double, stp As Double, tmp As Double
Dim X As Double, Y As Double, cof(1 To 6) As Double
Dim J As Integer
cof(1) = 76.1800917294715
cof(2) = -86.5053203294168
cof(3) = 24.0140982408309
cof(4) = -1.23173957245015
cof(5) = 1.20865097386618E-03
cof(6) = -5.395239384953E-06
stp = 2.506628274631
X = xx
Y = X
tmp = X + 5.5
tmp = (X + 0.5) * Log(tmp) - tmp
ser = 1.00000000019001
For J = 1 To 6
Y = Y + 1
ser = ser + cof(J) / Y
Next J
GAMMALN = tmp + Log(stp * ser / X)
End Function
Public Function FACTLN(N)
Dim A(1 To 100) As Double
Dim I As Integer
For I = 1 To 100
A(I) = -100
Next I
If N < 0 Then End
If N <= 99 Then
If A(N + 1) < 0 Then A(N + 1) = GAMMALN(N + 1)
FACTLN = A(N + 1)
Else
FACTLN = GAMMALN(N + 1)
End If
End Function
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
>> Any idea how to get a confidence estimate?
>
> You could take a look at the section labeled "Confidence Intervals" on
> the page that scott linked.
Yes. Except, as originally explained, I don't understand it.
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Orchid XP v8 wrote:
>>> Any idea how to get a confidence estimate?
>>
>> You could take a look at the section labeled "Confidence Intervals" on
>> the page that scott linked.
>
> Yes. Except, as originally explained, I don't understand it.
>
Considering that you're only looking for an implementation it doesn't
really matter that you understand what's going on so long as you can
just type in the equations. Everything is pretty straightforward except
for determining t*_(n-2), for which I suggest either finding an existing
implementation of the inverse cumulative Student's t-distribution
function or looking up the appropriate value from the table on
http://en.wikipedia.org/wiki/Student's_t-distribution.
In case you're still having trouble, I whipped up some Python code to do
what you want. It's almost pseudocode, so hopefully it'll be pretty
easy to follow even if you don't know Python.
------
from math import sqrt
from scipy.stats import t as tFcn
def linRegress(X, Y, conf):
# maximum likelihood estimates
n = len(X)
Sx = sum(X)
Sy = sum(Y)
Sxx = sum(x*x for x in X)
Sxy = sum(x*y for x, y in zip(X,Y))
Syy = sum(y*y for y in Y)
b = (n*Sxy - Sx*Sy)/(n*Sxx - Sx*Sx)
a = Sy/n - b*Sx/n
# confidence interval stuff
ve = (n*Syy - Sy*Sy - b*b*(n*Sxx - Sx*Sx))/(n*(n-2))
vb = n*ve / (n*Sxx - Sx*Sx)
va = vb*Sxx / n
tstar = tFcn(n-2).isf(1-conf)
return (a, tstar*sqrt(va)), (b, tstar*sqrt(vb))
if __name__ == "__main__":
# example data from the Wikipedia page
X = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65, 1.68, 1.70, 1.73,
1.75, 1.78, 1.80, 1.83]
Y = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29, 63.11,
64.47, 66.28, 68.10, 69.92, 72.19, 74.46]
a, b = linRegress(X,Y,0.975)
print "intercept: mean=%f min=%f max=%f" % (a[0], a[0]-a[1], a[0]+a[1])
print "slope: mean=%f min=%f max=%f" % (b[0], b[0]-b[1], b[0]+b[1])
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Gilles Tran wrote:
> Here it is. Have fun figuring that out.
Heh. On line 300 it says
' Holy cow!
So I'm guessing that's a code comment then. I've also just spent 20
minutes sorting out all the indentation so that it's actually possible
to follow the flow control. We'll see if I can deduce anything further...
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Invisible wrote:
> We'll see if I can deduce anything further...
Public Function FACTLN(N)
Dim A(1 To 100) As Double
Dim I As Integer
For I = 1 To 100
A(I) = -100
Next I
If N < 0 Then End
If N <= 99 Then
If A(N + 1) < 0 Then A(N + 1) = GAMMALN(N + 1)
FACTLN = A(N + 1)
Else
FACTLN = GAMMALN(N + 1)
End If
End Function
Uh... anybody want to venture a guess is to how this is different from
Public Function FACTLN(N)
IF N < 0 Then End
FACTLN = GAMMALN(N + 1)
End Function
From what I can tell, it creates an array, initialises it with negative
values, then checks to see if a given cell is negative (which it will
be, 100% of the time), and if so, calculates a value, puts it into the
array, and returns it.
It _looks like_ it's trying to cache previously-computed results. Except
that the array gets recreated each time the function is called. (Unless
I've missed something.) And even if it doesn't, each call to the
function zeros out the whole array anyway! WTF?
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
4c0f5e84$1@news.povray.org...
> It _looks like_ it's trying to cache previously-computed results. Except
> that the array gets recreated each time the function is called. (Unless
> I've missed something.) And even if it doesn't, each call to the function
> zeros out the whole array anyway! WTF?
The source is here:
http://books.google.com/books?id=gn_4mpdN9WkC&pg=PA208
It seems to make sense in Fortran 77 (possibly due to the way it handles
arrays?), so I just adapted it to VBA without bothering to streamline it. In
any case it doesn't make much sense in VBA for sure. Perhaps the FACTRL
function can be simplified as well.
When I wrote that, the whole Fisher part seemed to run on pixie dust anyway
and I was in a hurry. It worked, so I didn't change anything for fear of
breaking stuff.
G.
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Gilles Tran wrote:
> It seems to make sense in Fortran 77 (possibly due to the way it handles
> arrays?),
Due to the statement that says
data a
That's the fortran equivalent of "static" in C, sort of.
--
Darren New, San Diego CA, USA (PST)
Eiffel - The language that lets you specify exactly
that the code does what you think it does, even if
it doesn't do what you wanted.
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|
 |