> I don't know what database engine you're using,
MS SQL CE, I think you pointed me to that one :-)
> select count(distinct expid) from exp where eventid=1;
> and you'll get just the number of rows.
>
> Other servers use the term "unique" instead of "distinct". Check your
> documentation.
Hmmm, SELECT DISTINCT expid works, but SELECT COUNT(DISTINCT expid) throws a
parse error at "DISTINCT" from SQL CE at runtime. I actually hit this
before when I was experimenting the other night and couldn't work out
why it didn't work.
<googling>
OK, it seems that SQL CE doesn't support DISTINCT inside COUNT, great. There
are suggested workarounds, and this one works:
SELECT COUNT(*) FROM (SELECT DISTINCT expid FROM exp WHERE eventid=1) AS T
Finally!
So I take it using SELECT DISTINCT X ... is better than using SELECT X ...
GROUP BY X for what I want to do?
> Another way might be to do something really messy with outer joins on a
> table so you get something with an eventid column that's either "1" or
> NULL, turn the NULLs into 0's, and then find the AVG(), but it's probably
> way, way easier to do two SQL statements, even if you wind up using temp
> variables and doing the math on the server.
OK thanks for the tips, I think I will try and get the basic queries working
neatly and fast first :-)
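For what it's worth, the workaround above and the two-statement approach can be sketched like this in Python against SQLite (used purely for illustration, since SQL CE isn't scriptable here; the rows are made up):

```python
# Sketch of the queries discussed above, against SQLite for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exp (expid INTEGER, eventid INTEGER)")
conn.executemany(
    "INSERT INTO exp VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (2, 1), (3, 2)],  # hypothetical rows
)

# The subquery workaround for an engine without COUNT(DISTINCT ...):
(with_event,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT expid FROM exp WHERE eventid=1) AS T"
).fetchone()

# Second statement: total number of distinct experiments, then do the
# percentage math in the application rather than in one big query.
(total,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT expid FROM exp) AS T"
).fetchone()

print(f"{100.0 * with_event / total:.1f}% of experiments saw eventid 1")
```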
> So I take it using SELECT DISTINCT X ... is better than using SELECT X ...
> GROUP BY X for what I want to do?
OK, so this is weird.
When I call COUNT(*) FROM (SELECT X ... GROUP BY X) on my test data set
(about 100k eventID rows) it runs in 0.25s. That's a bit slow, but hmm ok.
If I change the SQL statement to COUNT(*) FROM (SELECT DISTINCT X ... ) it
takes 11.27s. Huh?
Also, with the DISTINCT method, as I repeatedly execute the query the
database size grows from 7MB to 14MB and then to 31MB... HUH? The
database size goes up by more than 4x just from executing a query a few
times? Maybe it's optimising something for future queries, but it doesn't
seem to get any faster the 2nd or 3rd time...
"scott" <sco### [at] scottcom> wrote in message
news:49b7765e$1@news.povray.org...
>My goal is to calculate, for example, the % of experiments that have had a
>certain EventID happen one or more times.
This probably won't help much given the direction of the other posts, but
if you were just trying to analyze the data from within an app, you could
try the C# aggregate functions on a DataTable... You can do percentages and
counts, as well as many other things, with those:
saleTotalValueNet_label.Text = Convert.ToString(
    (decimal)newDataSet.SaleItems.Compute("SUM([totalWithTax])", ""));
A solution based on this sort of thing won't run as quickly as a T-SQL
statement, but I seriously doubt it would grow your DB size either. If you
had a partially narrowed-down DataSet, you could perform the requisite
calculations with aggregate functions (and their output) once to get the
values you are looking for. I don't know what the scope of analysis or the
refresh interval might be, but both will affect your choice of an
acceptable compound solution.
Just a half-asleep thought...
ian
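For anyone not on C#, the same in-app aggregation idea looks roughly like this in Python (the rows here are hypothetical, and this stands in for the DataTable.Compute() call above):

```python
# Once the rows are in memory, aggregate in the app instead of in SQL.
rows = [(1, 1), (1, 1), (1, 2), (2, 1), (3, 2)]  # (expid, eventid) pairs

all_experiments = {expid for expid, _ in rows}
with_event = {expid for expid, eventid in rows if eventid == 1}

# Percentage of experiments that saw eventid 1 at least once.
pct = 100.0 * len(with_event) / len(all_experiments)
print(f"{pct:.1f}%")
```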
scott wrote:
> So I take it using SELECT DISTINCT X ... is better than using SELECT X
> ... GROUP BY X for what I want to do?
I don't know. I'd say it's more "correct" in some abstract theoretical
sense, since the only reason you're grouping is to get unique rows. And it
works in the case you're trying. :-)
If you wanted to know how many different types of events each experiment
yielded, you'd do something like
select expid, count(eventid) from exp group by expid
or something like that. (Not gonna fire up SQL to try it.)
But you see what I'm saying there.
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."
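Since the query above is admittedly untested, here is a quick check of the GROUP BY idea in SQLite (illustration only, made-up rows): COUNT(eventid) counts event rows per experiment, while COUNT(DISTINCT eventid) counts the different event types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exp (expid INTEGER, eventid INTEGER)")
conn.executemany("INSERT INTO exp VALUES (?, ?)",
                 [(1, 1), (1, 1), (1, 2), (2, 1)])

# Per experiment: total event rows vs. number of distinct event types.
per_exp = conn.execute(
    "SELECT expid, COUNT(eventid), COUNT(DISTINCT eventid) "
    "FROM exp GROUP BY expid ORDER BY expid"
).fetchall()
print(per_exp)  # expid 1: 3 events of 2 types; expid 2: 1 event of 1 type
```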
scott wrote:
> When I call COUNT(*) FROM (SELECT X ... GROUP BY X) on my test data set
> (about 100k eventID rows) it runs in 0.25s. That's a bit slow, but hmm ok.
>
> If I change the SQL statement to COUNT(*) FROM (SELECT DISTINCT X ... )
> it takes 11.27s. Huh?
I'm guessing when you do a SELECT DISTINCT, it's internally creating a
temporary table.
> Also, with the DISTINCT method, as I repeatedly execute the query the
> database size seems to have grown from 7MB to 14MB then to 31MB... HUH?
Do you have an option to compact the database again and see if it shrinks?
Did you perchance close the connection to the database between queries so it
frees up temporary tables? (I'm just guessing here.)
Do these questions not even make sense for the compact edition? :-)
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."
> Do you have an option to compact the database again and see if it shrinks?
> Did you perchance close the connection to the database between queries so
> it frees up temporary tables? (I'm just guessing here.)
>
> Do these questions not even make sense for the compact edition? :-)
Yes, they make sense, but I haven't tried that, because the speed penalty
seems bad anyway, so I won't use the DISTINCT method for now.
> I don't know. I'd say it's more "correct" in some abstract theoretical
> sense, since the only reason you're grouping is to get unique rows. And it
> works in the case you're trying. :-)
>
> If you wanted to know how many different types of events each experiment
> yielded, you'd do something like
> select expid, count(eventid) from exp group by expid
> or something like that. (Not gonna fire up SQL to try it.)
>
> But you see what I'm saying there.
Yeah, I see what you're saying. I'll have to work on importing more realistic
data, as my test data is a bit artificial. But the speed so far seems fine
and I'm pretty confident that SQL CE will be capable of doing everything I
need. If I had known it was this easy to set up and work with from .NET, I
might have used it for some other projects too, rather than struggling with
filtering and sorting arrays and other data containers!
scott wrote:
> Yes they make sense, but I haven't tried that because the speed penalty
> seems bad anyway so won't use the distinct method for now.
What indexes have you defined? If you define an index over expid, you may
see a speed increase. 11 seconds seems a bit long for any query on a table
with indexes.
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."
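A rough sketch of checking whether an index actually gets used, again in SQLite for illustration (SQL CE has its own query-plan tooling; the table and index names here just follow the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exp (expid INTEGER, eventid INTEGER)")
conn.execute("CREATE INDEX idx_exp_expid ON exp (expid)")

# EXPLAIN QUERY PLAN reports whether the engine searches via the index
# or falls back to a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM exp WHERE expid = 1"
).fetchall()
print(plan)
```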
> What indexes have you defined? If you define an index over expid, you may
> see a speed increase. 11 seconds seems a bit long for any query on a table
> with indexes.
I had an index on expid already, but looking at it again I think the test
case I used was artificial, because almost all of the data was from just one
expid. Querying for that one, where 99% of records belonged to it, took the
11 seconds, but querying for other expids (which made up less than 1% of the
table) was way, way faster, around 0.1 seconds. I guess if the intermediate
result somewhere consists of almost the whole table, it causes some
slowdown...
scott wrote:
> I guess if the intermediate result somewhere
> consists of almost the whole table it causes some slowdown...
That it will do, yes! I suspect DISTINCT and GROUP BY are doing things
differently, with GROUP BY accumulating results as it goes and DISTINCT
post-processing them.
You definitely want a realistic data set before you worry about speed.
--
Darren New, San Diego CA, USA (PST)
My fortune cookie said, "You will soon be
unable to read this, even at arm's length."