POV-Ray: Newsgroups: povray.off-topic: SQL question

POV-Ray : Newsgroups : povray.off-topic : SQL question		Server Time 17 Dec 2025 10:28:40 EST (-0500)

Goto Latest 10 Messages

Next 7 Messages >>>

From: scott
Subject: SQL question
Date: 11 Mar 2009 04:29:18
Message: <49b7765e$1@news.povray.org>

I told you these would be coming :-)

OK so I have a table like this (there are other columns, but these are the 
important 2):

ExpID  EventID
1      1
1      3
1      3
1      6
2      1
2      2
2      7
3      2
3      2
3      3
3      4
...

Where ExpID is the number of an experiment and EventID signifies some event 
has happened during that experiment.  Note that events can happen more than 
once for each experiment.

My goal is to calculate, for example, the % of experiments that have had a 
certain EventID happen one or more times.

I have got this far (for counting experiments where EventID 1 happened):

SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID

This gives me a list of ExpIDs where eventID 1 has happened.  I can manually 
count the rows, but how do I write the SQL to just return the number of 
rows?  I've tried all sorts of COUNT but it won't work (either returns 1 or 
an error).

Also is there any way to calculate the % of experiments (result from that 
query above divided by the total number of experiments) in one SQL 
statement, or do I have to do that by hand after issuing two statements?

Post a reply to this message

From: Invisible
Subject: Re: SQL question
Date: 11 Mar 2009 05:14:11
Message: <49b780e3$1@news.povray.org>

scott wrote:

> My goal is to calculate, for example, the % of experiments that have had 
> a certain EventID happen one or more times.
> 
> I have got this far (for counting experiments where EventID 1 happened):
> 
> SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID
> 
> This gives me a list of ExpIDs where eventID 1 has happened.  I can 
> manually count the rows, but how do I write the SQL to just return the 
> number of rows?  I've tried all sorts of COUNT but it won't work (either 
> returns 1 or an error).

I don't have an SQL engine in front of me, but I would have expected

   SELECT ExpID, COUNT(*) FROM Table WHERE EventID=1 GUOP BY ExpID;

to do what you want.

> Also is there any way to calculate the % of experiments (result from 
> that query above divided by the total number of experiments) in one SQL 
> statement, or do I have to do that by hand after issuing two statements?

Again, off the top of my head, I would suggest that

   SELECT ExpID, COUNT(*) / Total
   FROM Table, (SELECT COUNT(*) as Total FROM Table2)
   GOUP BY ExpID;

or similar ought to do it. (As you can see, you can run a query inside 
another query to generate the total.)

Post a reply to this message

From: scott
Subject: Re: SQL question
Date: 11 Mar 2009 05:28:00
Message: <49b78420@news.povray.org>

> I don't have an SQL engine in front of me, but I would have expected
>
>   SELECT ExpID, COUNT(*) FROM Table WHERE EventID=1 GUOP BY ExpID;

I think I tried that and got back something like:

ExpID Count
1     1
2     1

BTW does it matter if I drop the ExpID after the SELECT, I'm not interested 
in any experimentID, just the count of them that have an ID=1 event.

I will try later to see if I actually did try that or not (the code and 
database is on another machine).

>   SELECT ExpID, COUNT(*) / Total
>   FROM Table, (SELECT COUNT(*) as Total FROM Table2)
>   GOUP BY ExpID;

Oh ok cool, will experiment with that sort of syntax.

The reason I want it all in one SQL function is that I'm thinking of making 
some external text-based file that lists all the possible queries people 
might want to run on the data, so then the code will be much simpler (no 
repeats of lots of SQL code and associated fluff) and I can change the 
queries without modifying the actual code.  That way, the user can simply 
choose "% failed due to Event blah blah" and the code will get the 
associated query from the text file and run it automatically.  Someone has 
probably already invented a much better way to do this sort of thing, but 
hey this is my first database app!

Post a reply to this message

From: Invisible
Subject: Re: SQL question
Date: 11 Mar 2009 05:43:30
Message: <49b787c2$1@news.povray.org>

>> I don't have an SQL engine in front of me, but I would have expected
>>
>>   SELECT ExpID, COUNT(*) FROM Table WHERE EventID=1 GUOP BY ExpID;
> 
> I think I tried that and got back something like:
> 
> ExpID Count
> 1     1
> 2     1

Mmm, OK. I'm a little rusty with this stuff. It's been a while. ;-)

> BTW does it matter if I drop the ExpID after the SELECT, I'm not 
> interested in any experimentID, just the count of them that have an ID=1 
> event.

OK, what *precisely* are you trying to count?

1. The number of times event X occurred in each experiment?

2. The number of experiments where event X occurred at least once?

If there's one thing you quickly learn about SQL, it's that you need to 
have a very clear picture in your brain of exactly what you're answer 
you're trying to get. And if you ask people for help, you need to be 
very careful about explaining to them what you're after. ;-)

>>   SELECT ExpID, COUNT(*) / Total
>>   FROM Table, (SELECT COUNT(*) as Total FROM Table2)
>>   GOUP BY ExpID;
> 
> Oh ok cool, will experiment with that sort of syntax.

That or something like that ought to do it, I believe.

> The reason I want it all in one SQL function is that I'm thinking of 
> making some external text-based file that lists all the possible queries 
> people might want to run on the data, so then I can 
> change the queries without modifying the actual code.

Makes sense.

> Someone has probably already invented a much better way 
> to do this sort of thing, but hey this is my first database app!

Well, you could always store the SQL in the database itself! ;-)

Post a reply to this message

From: scott
Subject: Re: SQL question
Date: 11 Mar 2009 06:05:16
Message: <49b78cdc$1@news.povray.org>

> OK, what *precisely* are you trying to count?
>
> 2. The number of experiments where event X occurred at least once?

This one ^ the number of experiments where event X occurred at least once.

In my first post I said what I had got already:

SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID

This gives me a list of unique ExpIDs where event X occurred at least once. 
I simply want it to return the count of those returned rows rather than the 
rows themselves, so I just want a single number returned.  This will then be 
divided by the total number of experiments in the database, to give a % of 
experiments where event X occurred one or more times.

> Well, you could always store the SQL in the database itself! ;-)

Cunning :-)  Although it would need to be a separate database, as currently 
I'm keeping the option open for the user to specify database file to create 
and use, so if they want they can keep several separate databases and my app 
just works on one of them at a time.

Post a reply to this message

From: Invisible
Subject: Re: SQL question
Date: 11 Mar 2009 06:12:02
Message: <49b78e72$1@news.povray.org>

scott wrote:
>> OK, what *precisely* are you trying to count?
>>
>> 2. The number of experiments where event X occurred at least once?
> 
> This one ^ the number of experiments where event X occurred at least once.
> 
> In my first post I said what I had got already:
> 
> SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID
> 
> This gives me a list of unique ExpIDs where event X occurred at least 
> once. I simply want it to return the count of those returned rows rather 
> than the rows themselves, so I just want a single number returned.

In that case, I think you can just do

   SELECT count(ExpID) FROM Table WHERE EventID=1 GROUP BY ExpID;

I could be wrong on that though...

> This 
> will then be divided by the total number of experiments in the database, 
> to give a % of experiments where event X occurred one or more times.

Right. OK.

>> Well, you could always store the SQL in the database itself! ;-)
> 
> Cunning :-)  Although it would need to be a separate database, as 
> currently I'm keeping the option open for the user to specify database 
> file to create and use, so if they want they can keep several separate 
> databases and my app just works on one of them at a time.

Ah, I see.

Apparently the way the "big boys" do it is with "prepaired statements". 
Basically the database contains something like

   SELECT count(*) FROM Table WHERE EventID=%X% GROUP BY ExpID;

and each time you use the statement, you get to say what %X% should be. 
(This means the database can parse the text of the statement, build a 
query plan, optimise it and so forth just once - when the statement is 
"prepaired" - and then each time you execute it, it just reuses the 
execution plan with the new variable value.) Of course, that's more of 
an issue if you're trying to run Amazon.com or something. ;-)

Post a reply to this message

From: Darren New
Subject: Re: SQL question
Date: 11 Mar 2009 16:49:39
Message: <49b823e3$1@news.povray.org>

scott wrote:
> SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID
> 
> This gives me a list of ExpIDs where eventID 1 has happened.  I can 
> manually count the rows, but how do I write the SQL to just return the 
> number of rows?  I've tried all sorts of COUNT but it won't work (either 
> returns 1 or an error).

<Clippy> It looks like you're trying to use a relational database! </Clippy>

Seriously, the problem you're running into is that you're using something 
that's only mostly relational, without enforcing the 
one-identical-row-per-table rule. In particular, you're using "group by" to 
eliminate duplicates. But enough theory.

I don't know what database engine you're using, but under MySql, you can say
select distinct expid from exp where eventid=1;
and it'll throw out the duplicate expid's. Then you can say
select count(distinct expid) from exp where eventid=1;
and you'll get just the number of rows.

Of course, the same without the "where" will give you the number of 
experiments, so just divide them.

Other servers use the term "unique" instead of "distinct". Check your 
documentation.

> Also is there any way to calculate the % of experiments (result from 
> that query above divided by the total number of experiments) in one SQL 
> statement, or do I have to do that by hand after issuing two statements?

You can probably assign the results of the two counts to two variables and 
then return the ratio all in one transaction without too much trouble. I 
don't remember ever coming up with a way of having two "where" statements 
easily in one final result.

Another way might be to do something really messy with outer joins on a 
table so you get something with an eventid column that's either "1" or NULL, 
turn the NULLs into 0's, and then find the AVG(), but it's probably way, way 
easier to do two SQL statements, even if you wind up using temp variables 
and doing the math on the server.

-- 
   Darren New, San Diego CA, USA (PST)
   My fortune cookie said, "You will soon be
   unable to read this, even at arm's length."

Post a reply to this message

From: scott
Subject: Re: SQL question
Date: 12 Mar 2009 03:58:36
Message: <49b8c0ac@news.povray.org>

> I don't know what database engine you're using,

MS SQL CE, I think you pointed me to that one :-)

> select count(distinct expid) from exp where eventid=1;
> and you'll get just the number of rows.
>
> Other servers use the term "unique" instead of "distinct". Check your 
> documentation.

Hmmm, SELECT DISTINCT expid works, but SELECT COUNT(DISTINCT expid) throws a 
parse error at "DISTINCT" from SQL CE at runtime.  Actually I got this 
before when I was semi-experimenting the other night and couldn't work out 
why that didn't work.

<googling>

OK it seems that SQL CE doesn't support DISTINCT inside COUNT, great.  They 
suggest workarounds, and this works:

SELECT COUNT(*) FROM (SELECT DISTINCT expid FROM exp WHERE eventid=1) AS T

Finally!

So I take it using SELECT DISTINCT X ... is better than using SELECT X ... 
GROUP BY X for what I want to do?

> Another way might be to do something really messy with outer joins on a 
> table so you get something with an eventid column that's either "1" or 
> NULL, turn the NULLs into 0's, and then find the AVG(), but it's probably 
> way, way easier to do two SQL statements, even if you wind up using temp 
> variables and doing the math on the server.

OK thanks for the tips, I think I will try and get the basic queries working 
neatly and fast first :-)

Post a reply to this message

From: scott
Subject: Re: SQL question
Date: 12 Mar 2009 06:29:35
Message: <49b8e40f$1@news.povray.org>

> So I take it using SELECT DISTINCT X ... is better than using SELECT X ... 
> GROUP BY X for what I want to do?

OK, so this is weird.

When I call COUNT(*) FROM (SELECT X ... GROUP BY X) on my test data set 
(about 100k eventID rows) it runs in 0.25s. That's a bit slow, but hmm ok.

If I change the SQL statement to COUNT(*) FROM (SELECT DISTINCT X ... ) it 
takes 11.27s.  Huh?

Also, with the DISTINCT method, as I repeatedly execute the query the 
database size seems to have grown from 7MB to 14MB then to 31MB... HUH?  The 
database size goes up by 4x just by executing a query a few times?  Maybe 
it's optimising something for future queries, but it doesn't seem to get any 
faster the 2nd or 3rd time...

Post a reply to this message

From: [GDS|Entropy]
Subject: Re: SQL question
Date: 12 Mar 2009 07:23:46
Message: <49b8f0c2$1@news.povray.org>

"scott" <sco### [at] scottcom> wrote in message 
news:49b7765e$1@news.povray.org...
>My goal is to calculate, for example, the % of experiments that have had a 
>certain EventID happen one or more times.

This probably won't help too much following the context established by some 
other posts, but if you were just attempting to analyze the data from within 
an app you could try C# aggregate functions... You can do % and count, as 
well as many other things with those:

saleTotalValueNet_label.Text = 
Convert.ToString((decimal)newDataSet.SaleItems.Compute("SUM([totalWithTax])", 
""));

Some solution based on this sort of thing would not work as quickly as a 
TSQL statement, but I seriously doubt it would increase your DB size either.

If you had a partially narrowed down dataSet, you could just perform the 
requisite calculations with aggregate functions (and their output) once to 
get the values you are looking for. I don't know what the scope of analysis 
or refresh interval might be, and both of which will affect your selection 
of an acceptable compound solution.

Just a half-asleep thought...

ian

Post a reply to this message

Goto Latest 10 Messages

Next 7 Messages >>>