> I don't know what database engine you're using,
MS SQL CE, I think you pointed me to that one :-)
> select count(distinct expid) from exp where eventid=1;
> and you'll get just the number of rows.
>
> Other servers use the term "unique" instead of "distinct". Check your
> documentation.
Hmmm, SELECT DISTINCT expid works, but SELECT COUNT(DISTINCT expid) throws a
parse error at "DISTINCT" from SQL CE at runtime. I actually hit this
before when I was experimenting the other night and couldn't work out
why it didn't work.
<googling>
OK, it seems that SQL CE doesn't support DISTINCT inside COUNT, great. There
are suggested workarounds, and this one works:
SELECT COUNT(*) FROM (SELECT DISTINCT expid FROM exp WHERE eventid=1) AS T
Finally!
So I take it using SELECT DISTINCT X ... is better than using SELECT X ...
GROUP BY X for what I want to do?
> Another way might be to do something really messy with outer joins on a
> table so you get something with an eventid column that's either "1" or
> NULL, turn the NULLs into 0's, and then find the AVG(), but it's probably
> way, way easier to do two SQL statements, even if you wind up using temp
> variables and doing the math on the server.
OK thanks for the tips, I think I will try and get the basic queries working
neatly and fast first :-)
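For what it's worth, the workaround above and the two-statement approach can be sketched like this in Python against SQLite (used purely for illustration, since SQL CE isn't scriptable here; the rows are made up):

```python
# Sketch of the queries discussed above, against SQLite for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exp (expid INTEGER, eventid INTEGER)")
conn.executemany(
    "INSERT INTO exp VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (2, 1), (3, 2)],  # hypothetical rows
)

# The subquery workaround for an engine without COUNT(DISTINCT ...):
(with_event,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT expid FROM exp WHERE eventid=1) AS T"
).fetchone()

# Second statement: total number of distinct experiments, then do the
# percentage math in the application rather than in one big query.
(total,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT expid FROM exp) AS T"
).fetchone()

print(f"{100.0 * with_event / total:.1f}% of experiments saw eventid 1")
```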
> So I take it using SELECT DISTINCT X ... is better than using SELECT X ...
> GROUP BY X for what I want to do?
OK, so this is weird.
When I call COUNT(*) FROM (SELECT X ... GROUP BY X) on my test data set
(about 100k eventID rows) it runs in 0.25s. That's a bit slow, but hmm ok.
If I change the SQL statement to COUNT(*) FROM (SELECT DISTINCT X ... ) it
takes 11.27s. Huh?
Also, with the DISTINCT method, as I repeatedly execute the query the
database size grows from 7MB to 14MB and then to 31MB... HUH? The
database size goes up by more than 4x just from executing a query a few
times? Maybe it's optimising something for future queries, but it doesn't
seem to get any faster the 2nd or 3rd time...
"scott" <sco### [at] scottcom> wrote in message
news:49b7765e$1@news.povray.org...
>My goal is to calculate, for example, the % of experiments that have had a
>certain EventID happen one or more times.
This probably won't help much given the direction of the other posts, but
if you were just trying to analyze the data from within an app, you could
try the C# aggregate functions on a DataTable... You can do percentages and
counts, as well as many other things, with those:
saleTotalValueNet_label.Text = Convert.ToString(
    (decimal)newDataSet.SaleItems.Compute("SUM([totalWithTax])", ""));
A solution based on this sort of thing won't run as quickly as a T-SQL
statement, but I seriously doubt it would grow your DB size either. If you
had a partially narrowed-down DataSet, you could perform the requisite
calculations with aggregate functions (and their output) once to get the
values you are looking for. I don't know what the scope of analysis or the
refresh interval might be, but both will affect your choice of an
acceptable compound solution.
Just a half-asleep thought...
ian
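For anyone not on C#, the same in-app aggregation idea looks roughly like this in Python (the rows here are hypothetical, and this stands in for the DataTable.Compute() call above):

```python
# Once the rows are in memory, aggregate in the app instead of in SQL.
rows = [(1, 1), (1, 1), (1, 2), (2, 1), (3, 2)]  # (expid, eventid) pairs

all_experiments = {expid for expid, _ in rows}
with_event = {expid for expid, eventid in rows if eventid == 1}

# Percentage of experiments that saw eventid 1 at least once.
pct = 100.0 * len(with_event) / len(all_experiments)
print(f"{pct:.1f}%")
```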
scott wrote:
> So I take it using SELECT DISTINCT X ... is better than using SELECT X
> ... GROUP BY X for what I want to do?
I don't know. I'd say it's more "correct" in some abstract theoretical
sense, since the only reason you're grouping is to get unique rows. And it
works in the case you're trying. :-)
If you wanted to know how many different types of events each experiment
yielded, you'd do something like
select expid, count(eventid) from exp group by expid
or something like that. (Not gonna fire up SQL to try it.)
But you see what I'm saying there.
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."
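Since the query above is admittedly untested, here is a quick check of the GROUP BY idea in SQLite (illustration only, made-up rows): COUNT(eventid) counts event rows per experiment, while COUNT(DISTINCT eventid) counts the different event types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exp (expid INTEGER, eventid INTEGER)")
conn.executemany("INSERT INTO exp VALUES (?, ?)",
                 [(1, 1), (1, 1), (1, 2), (2, 1)])

# Per experiment: total event rows vs. number of distinct event types.
per_exp = conn.execute(
    "SELECT expid, COUNT(eventid), COUNT(DISTINCT eventid) "
    "FROM exp GROUP BY expid ORDER BY expid"
).fetchall()
print(per_exp)  # expid 1: 3 events of 2 types; expid 2: 1 event of 1 type
```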
scott wrote:
> When I call COUNT(*) FROM (SELECT X ... GROUP BY X) on my test data set
> (about 100k eventID rows) it runs in 0.25s. That's a bit slow, but hmm ok.
>
> If I change the SQL statement to COUNT(*) FROM (SELECT DISTINCT X ... )
> it takes 11.27s. Huh?
I'm guessing when you do a SELECT DISTINCT, it's internally creating a
temporary table.
> Also, with the DISTINCT method, as I repeatedly execute the query the
> database size seems to have grown from 7MB to 14MB then to 31MB... HUH?
Do you have an option to compact the database again and see if it shrinks?
Did you perchance close the connection to the database between queries so it
frees up temporary tables? (I'm just guessing here.)
Do these questions not even make sense for the compact edition? :-)
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."
> Do you have an option to compact the database again and see if it shrinks?
> Did you perchance close the connection to the database between queries so
> it frees up temporary tables? (I'm just guessing here.)
>
> Do these questions not even make sense for the compact edition? :-)
Yes, they make sense, but I haven't tried that, because the speed penalty
seems bad anyway, so I won't use the DISTINCT method for now.
> I don't know. I'd say it's more "correct" in some abstract theoretical
> sense, since the only reason you're grouping is to get unique rows. And it
> works in the case you're trying. :-)
>
> If you wanted to know how many different types of events each experiment
> yielded, you'd do something like
> select expid, count(eventid) from exp group by expid
> or something like that. (Not gonna fire up SQL to try it.)
>
> But you see what I'm saying there.
Yeah, I see what you're saying. I'll have to work on importing more realistic
data, as my test data is a bit artificial. But the speed so far seems fine
and I'm pretty confident that SQL CE will be capable of doing everything I
need. If I had known it was this easy to set up and work with from .NET, I
might have used it for some other projects too, rather than struggling with
filtering and sorting arrays and other data containers!
scott wrote:
> Yes they make sense, but I haven't tried that because the speed penalty
> seems bad anyway so won't use the distinct method for now.
What indexes have you defined? If you define an index over expid, you may
see a speed increase. 11 seconds seems a bit long for any query on a table
with indexes.
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."
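A rough sketch of checking whether an index actually gets used, again in SQLite for illustration (SQL CE has its own query-plan tooling; the table and index names here just follow the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exp (expid INTEGER, eventid INTEGER)")
conn.execute("CREATE INDEX idx_exp_expid ON exp (expid)")

# EXPLAIN QUERY PLAN reports whether the engine searches via the index
# or falls back to a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM exp WHERE expid = 1"
).fetchall()
print(plan)
```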
> What indexes have you defined? If you define an index over expid, you may
> see a speed increase. 11 seconds seems a bit long for any query on a table
> with indexes.
I had an index on expid already, but looking at it again I think the test
case I used was artificial, because almost all of the data was from just one
expid. Querying for that one, where 99% of records belonged to it, took the
11 seconds, but querying for other expids (which made up less than 1% of the
table) was way, way faster, around 0.1 seconds. I guess if the intermediate
result somewhere consists of almost the whole table, it causes some
slowdown...
scott wrote:
> I guess if the intermediate result somewhere
> consists of almost the whole table it causes some slowdown...
That it will do, yes! I suspect DISTINCT and GROUP BY are doing things
differently, with GROUP BY accumulating results as it goes and DISTINCT
post-processing them.
You definitely want a realistic data set before you worry about speed.
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."