|
|
|
|
|
|
| |
| |
|
|
|
|
| |
| |
|
|
I told you these would be coming :-)
OK so I have a table like this (there are other columns, but these are the
important 2):
ExpID EventID
1 1
1 3
1 3
1 6
2 1
2 2
2 7
3 2
3 2
3 3
3 4
...
Where ExpID is the number of an experiment and EventID signifies some event
has happened during that experiment. Note that events can happen more than
once for each experiment.
My goal is to calculate, for example, the % of experiments that have had a
certain EventID happen one or more times.
I have got this far (for counting experiments where EventID 1 happened):
SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID
This gives me a list of ExpIDs where eventID 1 has happened. I can manually
count the rows, but how do I write the SQL to just return the number of
rows? I've tried all sorts of COUNT but it won't work (either returns 1 or
an error).
Also is there any way to calculate the % of experiments (result from that
query above divided by the total number of experiments) in one SQL
statement, or do I have to do that by hand after issuing two statements?
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
scott wrote:
> My goal is to calculate, for example, the % of experiments that have had
> a certain EventID happen one or more times.
>
> I have got this far (for counting experiments where EventID 1 happened):
>
> SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID
>
> This gives me a list of ExpIDs where eventID 1 has happened. I can
> manually count the rows, but how do I write the SQL to just return the
> number of rows? I've tried all sorts of COUNT but it won't work (either
> returns 1 or an error).
I don't have an SQL engine in front of me, but I would have expected
SELECT ExpID, COUNT(*) FROM Table WHERE EventID=1 GUOP BY ExpID;
to do what you want.
> Also is there any way to calculate the % of experiments (result from
> that query above divided by the total number of experiments) in one SQL
> statement, or do I have to do that by hand after issuing two statements?
Again, off the top of my head, I would suggest that
SELECT ExpID, COUNT(*) / Total
FROM Table, (SELECT COUNT(*) as Total FROM Table2)
GOUP BY ExpID;
or similar ought to do it. (As you can see, you can run a query inside
another query to generate the total.)
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
> I don't have an SQL engine in front of me, but I would have expected
>
> SELECT ExpID, COUNT(*) FROM Table WHERE EventID=1 GUOP BY ExpID;
I think I tried that and got back something like:
ExpID Count
1 1
2 1
BTW does it matter if I drop the ExpID after the SELECT, I'm not interested
in any experimentID, just the count of them that have an ID=1 event.
I will try later to see if I actually did try that or not (the code and
database is on another machine).
> SELECT ExpID, COUNT(*) / Total
> FROM Table, (SELECT COUNT(*) as Total FROM Table2)
> GOUP BY ExpID;
Oh ok cool, will experiment with that sort of syntax.
The reason I want it all in one SQL function is that I'm thinking of making
some external text-based file that lists all the possible queries people
might want to run on the data, so then the code will be much simpler (no
repeats of lots of SQL code and associated fluff) and I can change the
queries without modifying the actual code. That way, the user can simply
choose "% failed due to Event blah blah" and the code will get the
associated query from the text file and run it automatically. Someone has
probably already invented a much better way to do this sort of thing, but
hey this is my first database app!
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
>> I don't have an SQL engine in front of me, but I would have expected
>>
>> SELECT ExpID, COUNT(*) FROM Table WHERE EventID=1 GUOP BY ExpID;
>
> I think I tried that and got back something like:
>
> ExpID Count
> 1 1
> 2 1
Mmm, OK. I'm a little rusty with this stuff. It's been a while. ;-)
> BTW does it matter if I drop the ExpID after the SELECT, I'm not
> interested in any experimentID, just the count of them that have an ID=1
> event.
OK, what *precisely* are you trying to count?
1. The number of times event X occurred in each experiment?
2. The number of experiments where event X occurred at least once?
If there's one thing you quickly learn about SQL, it's that you need to
have a very clear picture in your brain of exactly what you're answer
you're trying to get. And if you ask people for help, you need to be
very careful about explaining to them what you're after. ;-)
>> SELECT ExpID, COUNT(*) / Total
>> FROM Table, (SELECT COUNT(*) as Total FROM Table2)
>> GOUP BY ExpID;
>
> Oh ok cool, will experiment with that sort of syntax.
That or something like that ought to do it, I believe.
> The reason I want it all in one SQL function is that I'm thinking of
> making some external text-based file that lists all the possible queries
> people might want to run on the data, so then I can
> change the queries without modifying the actual code.
Makes sense.
> Someone has probably already invented a much better way
> to do this sort of thing, but hey this is my first database app!
Well, you could always store the SQL in the database itself! ;-)
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
> OK, what *precisely* are you trying to count?
>
> 2. The number of experiments where event X occurred at least once?
This one ^ the number of experiments where event X occurred at least once.
In my first post I said what I had got already:
SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID
This gives me a list of unique ExpIDs where event X occurred at least once.
I simply want it to return the count of those returned rows rather than the
rows themselves, so I just want a single number returned. This will then be
divided by the total number of experiments in the database, to give a % of
experiments where event X occurred one or more times.
> Well, you could always store the SQL in the database itself! ;-)
Cunning :-) Although it would need to be a separate database, as currently
I'm keeping the option open for the user to specify database file to create
and use, so if they want they can keep several separate databases and my app
just works on one of them at a time.
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
scott wrote:
>> OK, what *precisely* are you trying to count?
>>
>> 2. The number of experiments where event X occurred at least once?
>
> This one ^ the number of experiments where event X occurred at least once.
>
> In my first post I said what I had got already:
>
> SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID
>
> This gives me a list of unique ExpIDs where event X occurred at least
> once. I simply want it to return the count of those returned rows rather
> than the rows themselves, so I just want a single number returned.
In that case, I think you can just do
SELECT count(ExpID) FROM Table WHERE EventID=1 GROUP BY ExpID;
I could be wrong on that though...
> This
> will then be divided by the total number of experiments in the database,
> to give a % of experiments where event X occurred one or more times.
Right. OK.
>> Well, you could always store the SQL in the database itself! ;-)
>
> Cunning :-) Although it would need to be a separate database, as
> currently I'm keeping the option open for the user to specify database
> file to create and use, so if they want they can keep several separate
> databases and my app just works on one of them at a time.
Ah, I see.
Apparently the way the "big boys" do it is with "prepaired statements".
Basically the database contains something like
SELECT count(*) FROM Table WHERE EventID=%X% GROUP BY ExpID;
and each time you use the statement, you get to say what %X% should be.
(This means the database can parse the text of the statement, build a
query plan, optimise it and so forth just once - when the statement is
"prepaired" - and then each time you execute it, it just reuses the
execution plan with the new variable value.) Of course, that's more of
an issue if you're trying to run Amazon.com or something. ;-)
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
scott wrote:
> SELECT ExpID FROM Table WHERE EventID=1 GROUP BY ExpID
>
> This gives me a list of ExpIDs where eventID 1 has happened. I can
> manually count the rows, but how do I write the SQL to just return the
> number of rows? I've tried all sorts of COUNT but it won't work (either
> returns 1 or an error).
<Clippy> It looks like you're trying to use a relational database! </Clippy>
Seriously, the problem you're running into is that you're using something
that's only mostly relational, without enforcing the
one-identical-row-per-table rule. In particular, you're using "group by" to
eliminate duplicates. But enough theory.
I don't know what database engine you're using, but under MySql, you can say
select distinct expid from exp where eventid=1;
and it'll throw out the duplicate expid's. Then you can say
select count(distinct expid) from exp where eventid=1;
and you'll get just the number of rows.
Of course, the same without the "where" will give you the number of
experiments, so just divide them.
Other servers use the term "unique" instead of "distinct". Check your
documentation.
> Also is there any way to calculate the % of experiments (result from
> that query above divided by the total number of experiments) in one SQL
> statement, or do I have to do that by hand after issuing two statements?
You can probably assign the results of the two counts to two variables and
then return the ratio all in one transaction without too much trouble. I
don't remember ever coming up with a way of having two "where" statements
easily in one final result.
Another way might be to do something really messy with outer joins on a
table so you get something with an eventid column that's either "1" or NULL,
turn the NULLs into 0's, and then find the AVG(), but it's probably way, way
easier to do two SQL statements, even if you wind up using temp variables
and doing the math on the server.
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
> I don't know what database engine you're using,
MS SQL CE, I think you pointed me to that one :-)
> select count(distinct expid) from exp where eventid=1;
> and you'll get just the number of rows.
>
> Other servers use the term "unique" instead of "distinct". Check your
> documentation.
Hmmm, SELECT DISTINCT expid works, but SELECT COUNT(DISTINCT expid) throws a
parse error at "DISTINCT" from SQL CE at runtime. Actually I got this
before when I was semi-experimenting the other night and couldn't work out
why that didn't work.
<googling>
OK it seems that SQL CE doesn't support DISTINCT inside COUNT, great. They
suggest workarounds, and this works:
SELECT COUNT(*) FROM (SELECT DISTINCT expid FROM exp WHERE eventid=1) AS T
Finally!
So I take it using SELECT DISTINCT X ... is better than using SELECT X ...
GROUP BY X for what I want to do?
> Another way might be to do something really messy with outer joins on a
> table so you get something with an eventid column that's either "1" or
> NULL, turn the NULLs into 0's, and then find the AVG(), but it's probably
> way, way easier to do two SQL statements, even if you wind up using temp
> variables and doing the math on the server.
OK thanks for the tips, I think I will try and get the basic queries working
neatly and fast first :-)
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
> So I take it using SELECT DISTINCT X ... is better than using SELECT X ...
> GROUP BY X for what I want to do?
OK, so this is weird.
When I call COUNT(*) FROM (SELECT X ... GROUP BY X) on my test data set
(about 100k eventID rows) it runs in 0.25s. That's a bit slow, but hmm ok.
If I change the SQL statement to COUNT(*) FROM (SELECT DISTINCT X ... ) it
takes 11.27s. Huh?
Also, with the DISTINCT method, as I repeatedly execute the query the
database size seems to have grown from 7MB to 14MB then to 31MB... HUH? The
database size goes up by 4x just by executing a query a few times? Maybe
it's optimising something for future queries, but it doesn't seem to get any
faster the 2nd or 3rd time...
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
"scott" <sco### [at] scottcom> wrote in message
news:49b7765e$1@news.povray.org...
>My goal is to calculate, for example, the % of experiments that have had a
>certain EventID happen one or more times.
This probably won't help too much following the context established by some
other posts, but if you were just attempting to analyze the data from within
an app you could try C# aggregate functions... You can do % and count, as
well as many other things with those:
saleTotalValueNet_label.Text =
Convert.ToString((decimal)newDataSet.SaleItems.Compute("SUM([totalWithTax])",
""));
Some solution based on this sort of thing would not work as quickly as a
TSQL statement, but I seriously doubt it would increase your DB size either.
If you had a partially narrowed down dataSet, you could just perform the
requisite calculations with aggregate functions (and their output) once to
get the values you are looking for. I don't know what the scope of analysis
or refresh interval might be, and both of which will affect your selection
of an acceptable compound solution.
Just a half-asleep thought...
ian
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
|
|