POV-Ray: Newsgroups: povray.off-topic: Linear regression

POV-Ray : Newsgroups : povray.off-topic : Linear regression		Server Time 1 Jul 2025 05:28:54 EDT (-0400)

<<< Previous 5 Messages

Goto Initial 10 Messages

From: Gilles Tran
Subject: Re: Linear regression
Date: 8 Jun 2010 11:52:01
Message: <4c0e6721@news.povray.org>


4c0e4c89$1@news.povray.org...
>> OK. I'll go see if I can get that to work...
>
> Right, that seems to work quite well...
>
> Any idea how to get a confidence estimate?

Years ago I wrote some VBA code to calculate regressions with all 
parameters. The code was adapted from pseudocode found in books of 
statistics, so parts of it are pure magic: I don't understand how it works. 
I can send it to you if you want.

G.

Post a reply to this message

From: Invisible
Subject: Re: Linear regression
Date: 8 Jun 2010 11:57:07
Message: <4c0e6853$1@news.povray.org>

Gilles Tran wrote:

> Years ago I wrote some VBA code to calculate regressions with all 
> parameters. The code was adapted from pseudocode found in books of 
> statistics, so parts of it are pure magic: I don't understand how it works. 
> I can send it to you if you want.

Well, it could certainly be worth a giggle...

(It's surprising how many companies have this "box of VBA magic" where 
they don't know how it works, but it does.)

Post a reply to this message

From: Kevin Wampler
Subject: Re: Linear regression
Date: 8 Jun 2010 12:07:40
Message: <4c0e6acc$1@news.povray.org>

Invisible wrote:
>> OK. I'll go see if I can get that to work...
> 
> Right, that seems to work quite well...
> 
> Any idea how to get a confidence estimate?

You could take a look at the section labeled "Confidence Intervals" on 
the page that scott linked.

Post a reply to this message

From: Gilles Tran
Subject: Re: Linear regression
Date: 8 Jun 2010 12:15:08
Message: <4c0e6c8c@news.povray.org>


4c0e6853$1@news.povray.org...
> Well, it could certainly be worth a giggle...

Here it is. Have fun figuring that out. Note that it pulls the data from a 
database first. Also I've removed the code that writes the results in the 
database.

G.

Public Sub Regression(RelationCode, NEXP)
Dim MaBd As Database, stSQL As String, DataSet As Recordset
Dim I As Integer, J As Integer, K As Integer, NOBS As Integer
Dim S(0 To 9) As Double, SSY As Double, D As Double, A As Double
Dim SPYX(0 To 9) As Double, SPXX(0 To 9, 0 To 9) As Double, VY As Double
Dim VYX(0 To 9) As Double, VXX(0 To 9, 0 To 9) As Double
Dim B(0 To 9) As Double, SB(0 To 9) As Double, R2 As Double, R2ADJ As 
Double, SY As Double
Dim F As Double, T(0 To 9) As Double, P As Double, PCoeff(0 To 9) As Double
Dim ta As Double, Pa As Double
Dim Unitcode As Integer, DataBasisCode As Integer, stCoeff As String
Dim SST As Double, SSR As Double, SSD As Double, SA As Double
Dim MSD As Double, MSR As Double, M(0 To 9) As Double
Set MaBd = CurrentDb
' NEXP = number of predictors
stSQL = "SELECT DISTINCTROW Y "
For I = 1 To NEXP
    stSQL = stSQL & ", X" & CStr(I)
Next I
stSQL = stSQL & " FROM RELATION_TEMP"
'Debug.Print stSQL
NOBS = 0
Set DataSet = MaBd.OpenRecordset(stSQL)
If DataSet.EOF Then Exit Sub
DataSet.MoveFirst
' Browse the data table
Do Until DataSet.EOF
    ' sum of squares Y
    SSY = SSY + (DataSet(0) ^ 2)
    ' for all variables
    For I = 0 To NEXP
        'sum of data
        S(I) = S(I) + DataSet(I)
    Next I
    ' for all predictors
    For I = 0 To NEXP - 1
        ' sum of products Y * X
        SPYX(I) = SPYX(I) + DataSet(0) * DataSet(I + 1) ' sum of products 
Y*x
        For J = 0 To NEXP - 1
            ' sum of products X * X
            SPXX(I, J) = SPXX(I, J) + DataSet(I + 1) * DataSet(J + 1)
        Next J
    Next I
    ' number of observations
    NOBS = NOBS + 1
    DataSet.MoveNext
Loop
DataSet.Close
If (NOBS - NEXP - 1 <= 0) Then
    frm!CurrentMessage = "Relation " & RelationCode & " - Not calculable  : 
N obs <= " & Str(NEXP + 1)
    Exit Sub
End If
' total sum of squares and variance of Y
SST = (SSY - S(0) ^ 2 / NOBS)
VY = SST / NOBS

' Means of Y and X (M(0)=mean for Y)
For I = 0 To NEXP
    M(I) = S(I) / NOBS
Next I

' covariances of Y and X
For I = 0 To NEXP - 1
    VYX(I) = (SPYX(I) - (S(0) * S(I + 1)) / NOBS) / NOBS
Next I

' covariances of the predictors X
For I = 0 To NEXP - 1
    For J = 0 To NEXP - 1
        VXX(I, J) = (SPXX(I, J) - (S(I + 1) * S(J + 1)) / NOBS) / NOBS
    Next J
Next I

' inversion (Gauss method) of the XX covariance matrix
' the original matrix is destroyed
For K = 0 To NEXP - 1
    D = VXX(K, K)
    If (D = 0) Then End 'null determinant
    VXX(K, K) = 1
    For J = 0 To NEXP - 1
        VXX(K, J) = VXX(K, J) / D
    Next J
    For I = 0 To NEXP - 1
        If (I <> K) Then
            D = VXX(I, K)
            VXX(I, K) = 0
            For J = 0 To NEXP - 1
                VXX(I, J) = VXX(I, J) - D * VXX(K, J)
            Next J
        End If
    Next I

Next K
' calculation of the coefficients

For I = 0 To NEXP - 1
    For J = 0 To NEXP - 1
        B(I) = B(I) + VXX(I, J) * VYX(J)
    Next J

Next I

' calculation of the sum of squares of deviations attribuable to the 
regression
For I = 0 To NEXP - 1
        SSR = SSR + B(I) * VYX(I) * NOBS
Next I

' calculation of the sum of squares of deviations about the regression
SSD = SST - SSR

' R2 and R2 for the degrees of freedom
R2 = 1 - (SSD / SST)
R2ADJ = 1 - (SSD * (NOBS - 1)) / (SST * (NOBS - NEXP - 1))

' mean square attribuable to the regression
MSR = SSR / NEXP

' mean square of deviations about the regression
MSD = SSD / (NOBS - NEXP - 1)

' standard error estimate
SY = Sqr(MSD)

' F value and Probability
F = MSR / MSD
P = FISHER(F, NEXP, NOBS - NEXP - 1)

' calculation of the standard error of the regression coefficients
For I = 0 To NEXP - 1
        SB(I) = Sqr(VXX(I, I) * MSD / NOBS)
        T(I) = B(I) / SB(I)
        PCoeff(I) = TSTUDENT(T(I), NOBS - NEXP - 1)
Next I

' constant a
A = M(0)
For I = 0 To NEXP - 1
        A = A - B(I) * M(I + 1)
Next I

' standard error of constant a, t-value and probability
SA = 0
For I = 1 To NEXP
    For J = 1 To NEXP
        SA = SA + M(I) * M(J) * VXX(I - 1, J - 1)
    Next J
Next I

SA = Sqr(MSD * (1 + SA) / NOBS)
ta = A / SA
Pa = TSTUDENT(ta, NOBS - NEXP - 1)

'For i = 0 To NEXP - 1
'    Debug.Print SY
'Next i

End Sub



Public Function TSTUDENT(T, DF)
' Calculates the probability of Student's t-distribution value of being less 
in absolute value than t
Dim teta As Double, I As Integer, K As Integer, A As Double, P As Double, Q 
As Double, cs As Double
Dim Pi As Double
Pi = 3.14159265358979
T = Abs(T)
teta = Atn(T / Sqr(DF))
If DF Mod 2 = 0 Then ' even degrees of freedom
    I = DF / 2 - 2
    Q = 1
    For K = 0 To I
       Q = Q * (K * 2 + 1) / (K * 2 + 2)
       cs = Cos(teta) ^ (K * 2 + 2)
       A = A + Q * cs
    Next K
    A = Sin(teta) * (A + 1)
Else    ' odd degrees of freedom
    If DF > 1 Then
        I = (DF - 1) / 2 - 1
        Q = 1
        For K = 0 To I
            If K = 0 Then
                Q = 1
            Else
                Q = Q * (K * 2) / (K * 2 + 1)
            End If
            cs = Cos(teta) ^ (K * 2 + 1)
            A = A + Q * cs
        Next K
        A = (2 / Pi) * (teta + Sin(teta) * A)

    Else
        A = 2 * teta / Pi
    End If
End If
    A = 1 - A
    If (A < 0.0001) Then A = 0
    TSTUDENT = A
'    Debug.Print P
End Function

Public Function FISHER(F, DF1, DF2)
' Calculates the probability of a F = (X1/DF1)/(X2/DF2)
Dim teta As Double, X As Double, I As Integer, K As Integer, A As Double, B 
As Double
Dim P As Double, Q As Double, cs As Double, NU1 As Double, NU2 As Double
Dim Pi As Double
Pi = 3.14159265358979
If DF1 Mod 2 = 0 Or DF2 Mod 2 = 0 Then
    X = DF2 / (DF2 + DF1 * F)
    If DF1 Mod 2 = 0 Then
        NU1 = DF1
        NU2 = DF2
    Else
        NU2 = DF1
        NU1 = DF2
        X = 1 - X
    End If
    I = (NU1 - 2) / 2 - 1
    Q = 1
    P = 1
    For K = 0 To I
       Q = Q * (NU2 + K * 2) / (K * 2 + 2)
       P = P + Q * ((1 - X) ^ (K + 1))
    Next K
    If DF1 Mod 2 = 0 Then
        P = (X ^ (NU2 / 2)) * P
    Else
        P = 1 - (X ^ (NU2 / 2)) * P
    End If

Else
    teta = Atn(Sqr(DF1 * F / DF2))
'    Debug.Print "teta=", teta
    NU1 = DF1
    NU2 = DF2
    ' calculation of A
    If NU2 > 1 Then
        I = (NU2 - 1) / 2 - 1
        For K = 0 To I
            If K = 0 Then
                Q = 1
            Else
                Q = Q * (K * 2) / (K * 2 + 1)
            End If
            A = A + Q * Cos(teta) ^ (K * 2 + 1)

        Next K
        A = (2 / Pi) * (teta + Sin(teta) * A)
    Else
        A = 2 * teta / Pi
    End If


    ' calculation of B
    If NU1 > 1 Then
        B = 1
        Q = 1
        If (NU1 > 3) Then
            I = (NU1 - 3) / 2 - 1
            For K = 0 To I
                Q = Q * (NU2 + K * 2 + 1) / (K * 2 + 3)
                B = B + Q * Sin(teta) ^ (K * 2 + 2)
            Next K
        End If
'        Debug.Print "b=", B
        ' note : the exponentation is necessary due to overflow
        B = (2 / Sqr(Pi)) * Exp(FACTLN((NU2 - 1) / 2) - FACTLN((NU2 - 2) / 
2)) * Sin(teta) * (Cos(teta) ^ NU2) * B
    Else
        B = 0
    End If

    P = 1 - A + B

End If
    If (P < 0.0001) Then P = 0
    FISHER = P
End Function


Public Function FACTRL(N)
Dim J As Integer, ntop As Integer
Dim A(1 To 33) As Double
ntop = 0
A(1) = 1
If (N < 0) Then
    End
ElseIf (N <= ntop) Then
    FACTRL = A(N + 1)
ElseIf (N <= 32) Then
    For J = ntop + 1 To N
    A(J + 1) = J * A(J)
    Next J
    ntop = N
    FACTRL = A(N + 1)
Else
    FACTRL = Exp(GAMMALN(N + 1))
End If

End Function

Public Function GAMMALN(xx)
' Holy cow!
Dim ser As Double, stp As Double, tmp As Double
Dim X As Double, Y As Double, cof(1 To 6) As Double
Dim J As Integer
cof(1) = 76.1800917294715
cof(2) = -86.5053203294168
cof(3) = 24.0140982408309
cof(4) = -1.23173957245015
cof(5) = 1.20865097386618E-03
cof(6) = -5.395239384953E-06
stp = 2.506628274631
X = xx
Y = X
tmp = X + 5.5
tmp = (X + 0.5) * Log(tmp) - tmp
ser = 1.00000000019001
For J = 1 To 6
    Y = Y + 1
    ser = ser + cof(J) / Y
Next J
GAMMALN = tmp + Log(stp * ser / X)
End Function

Public Function FACTLN(N)
Dim A(1 To 100) As Double
Dim I As Integer
For I = 1 To 100
    A(I) = -100
Next I
If N < 0 Then End
If N <= 99 Then
    If A(N + 1) < 0 Then A(N + 1) = GAMMALN(N + 1)
    FACTLN = A(N + 1)
Else
    FACTLN = GAMMALN(N + 1)
End If
End Function

Post a reply to this message

From: Orchid XP v8
Subject: Re: Linear regression
Date: 8 Jun 2010 13:16:56
Message: <4c0e7b08$1@news.povray.org>

>> Any idea how to get a confidence estimate?
> 
> You could take a look at the section labeled "Confidence Intervals" on 
> the page that scott linked.

Yes. Except, as originally explained, I don't understand it.

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

Post a reply to this message

From: Kevin Wampler
Subject: Re: Linear regression
Date: 8 Jun 2010 15:43:15
Message: <4c0e9d53$1@news.povray.org>

Orchid XP v8 wrote:
>>> Any idea how to get a confidence estimate?
>>
>> You could take a look at the section labeled "Confidence Intervals" on 
>> the page that scott linked.
> 
> Yes. Except, as originally explained, I don't understand it.
> 

Considering that you're only looking for an implementation it doesn't 
really matter that you understand what's going on so long as you can 
just type in the equations.  Everything is pretty straightforward except 
for determining t*_(n-2), for which I suggest either finding an existing 
implementation of the inverse cumulative Student's t-distribution 
function or looking up the appropriate value from the table on 
http://en.wikipedia.org/wiki/Student's_t-distribution.

In case you're still having trouble, I whipped up some Python code to do 
what you want.  It's almost pseudocode, so hopefully it'll be pretty 
easy to follow even if you don't know Python.

------

from math import sqrt
from scipy.stats import t as tFcn

def linRegress(X, Y, conf):
	# maximum likelihood estimates
	n = len(X)
	Sx = sum(X)
	Sy = sum(Y)
	Sxx = sum(x*x for x in X)
	Sxy = sum(x*y for x, y in zip(X,Y))
	Syy = sum(y*y for y in Y)
	b = (n*Sxy - Sx*Sy)/(n*Sxx - Sx*Sx)
	a = Sy/n - b*Sx/n

	# confidence interval stuff
	ve = (n*Syy - Sy*Sy - b*b*(n*Sxx - Sx*Sx))/(n*(n-2))
	vb = n*ve / (n*Sxx - Sx*Sx)
	va = vb*Sxx / n

	tstar = tFcn(n-2).isf(1-conf)
	return (a, tstar*sqrt(va)), (b, tstar*sqrt(vb))

if __name__ == "__main__":
	# example data from the Wikipedia page
	X = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65, 1.68, 1.70, 1.73, 
1.75, 1.78, 1.80, 1.83]
	Y = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29, 63.11, 
64.47, 66.28, 68.10, 69.92, 72.19, 74.46]

	a, b = linRegress(X,Y,0.975)
	print "intercept:  mean=%f min=%f max=%f" % (a[0], a[0]-a[1], a[0]+a[1])
	print "slope:  mean=%f min=%f max=%f" % (b[0], b[0]-b[1], b[0]+b[1])

Post a reply to this message

From: Invisible
Subject: Re: Linear regression
Date: 9 Jun 2010 05:12:33
Message: <4c0f5b01$1@news.povray.org>

Gilles Tran wrote:

> Here it is. Have fun figuring that out.

Heh. On line 300 it says

   ' Holy cow!

So I'm guessing that's a code comment then. I've also just spent 20 
minutes sorting out all the indentation so that it's actually possible 
to follow the flow control. We'll see if I can deduce anything further...

Post a reply to this message

From: Invisible
Subject: Re: Code regression
Date: 9 Jun 2010 05:27:32
Message: <4c0f5e84$1@news.povray.org>

Invisible wrote:

> We'll see if I can deduce anything further...

Public Function FACTLN(N)
   Dim A(1 To 100) As Double
   Dim I As Integer
   For I = 1 To 100
     A(I) = -100
   Next I
   If N < 0 Then End
   If N <= 99 Then
     If A(N + 1) < 0 Then A(N + 1) = GAMMALN(N + 1)
     FACTLN = A(N + 1)
   Else
     FACTLN = GAMMALN(N + 1)
   End If
End Function

Uh... anybody want to venture a guess is to how this is different from

Public Function FACTLN(N)
   IF N < 0 Then End
   FACTLN = GAMMALN(N + 1)
End Function

 From what I can tell, it creates an array, initialises it with negative 
values, then checks to see if a given cell is negative (which it will 
be, 100% of the time), and if so, calculates a value, puts it into the 
array, and returns it.

It _looks like_ it's trying to cache previously-computed results. Except 
that the array gets recreated each time the function is called. (Unless 
I've missed something.) And even if it doesn't, each call to the 
function zeros out the whole array anyway! WTF?

Post a reply to this message

From: Gilles Tran
Subject: Re: Code regression
Date: 9 Jun 2010 10:54:02
Message: <4c0fab0a$1@news.povray.org>


4c0f5e84$1@news.povray.org...

> It _looks like_ it's trying to cache previously-computed results. Except 
> that the array gets recreated each time the function is called. (Unless 
> I've missed something.) And even if it doesn't, each call to the function 
> zeros out the whole array anyway! WTF?

The source is here:
http://books.google.com/books?id=gn_4mpdN9WkC&pg=PA208
It seems to make sense in Fortran 77 (possibly due to the way it handles 
arrays?), so I just adapted it to VBA without bothering to streamline it. In 
any case it doesn't make much sense in VBA for sure. Perhaps the FACTRL 
function can be simplified as well.
When I wrote that, the whole Fisher part seemed to run on pixie dust anyway 
and I was in a hurry. It worked, so I didn't change anything for fear of 
breaking stuff.

G.

Post a reply to this message

From: Darren New
Subject: Re: Code regression
Date: 9 Jun 2010 12:05:28
Message: <4c0fbbc8$1@news.povray.org>

Gilles Tran wrote:
> It seems to make sense in Fortran 77 (possibly due to the way it handles 
> arrays?),

Due to the statement that says
   data a


That's the fortran equivalent of "static" in C, sort of.

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.

Post a reply to this message

<<< Previous 5 Messages

Goto Initial 10 Messages