We have an application that stores operational data as a single master table and a set of child tables. Based on the class of data, a different child table gets joined in e.g.
master inner_join pets -> gives us pet data master inner_join cars -> gives us car data
It works reasonably well, and it makes the developer's lives easier. Normally the working set of records is quite small as well e.g. < 1k records at a time so the actual queries being issued are something like:
select <some fields> from a inner join b on a .sys_id = b.sys_id where a.<some indexed column> = <some fairly selective criteria>
There's one part of the application though that computes counts via the not terribly complex:
select count(*) from a inner join b on a sys_id = b.syss_id where a.<some indexed column> = <some selective criteria>
Recently though we've had some customers with some unusual data shape such that the particular count query (above) is joining 200k records or so together to implement the count.
Looking at profile on the database, that count alone is something like 68% of the server CPU load. Not surprisingly, I'd dearly love to do something about this. Problem is, I'm stumped.
Question I have is:
Short of constructing a summary table (which is probably a non starter in this environment), is there anything I could to to speed up the return time of a count query like the above?
Apologies if this is a stupid question, but as you can tell from the batch of posts I've been putting on the newsgroup I'm on something of a jihad to clean up some of the Oracle performance bottlenecks in this app, that that is definately one of them.
Pat <pat.ca...@service-now.com> wrote in news:0653963d-1d32-4e85-9bae- 55e038b9e...@f63g2000hsf.googlegroups.com:
> Question I have is:
> Short of constructing a summary table (which is probably a non starter > in this environment), is there anything I could to to speed up the > return time of a count query like the above?
Ready, Fire, Aim!
If you don't know specifically where the time is being spent, you'd only be guessing what change is most optimal.
Run your query with SQL_TRACE=TRUE & run the results thru tkprof.
Post the results & EXPLAIN PLAN back here is you still need more asisstance
> Pat <pat.ca...@service-now.com> wrote in news:0653963d-1d32-4e85-9bae- > 55e038b9e...@f63g2000hsf.googlegroups.com:
> > Question I have is:
> > Short of constructing a summary table (which is probably a non starter > > in this environment), is there anything I could to to speed up the > > return time of a count query like the above?
> Ready, Fire, Aim!
> If you don't know specifically where the time is being spent, > you'd only be guessing what change is most optimal.
> Run your query with SQL_TRACE=TRUE & run the results thru tkprof.
> Post the results & EXPLAIN PLAN back here is you still need more asisstance
I'm working over a remote desktop at the moment so I can't cut and paste the plan so I'll transcribe:
select count(*) as recordcount from (a inner join b on a.sys_id = b.sys_id) where a.active = 1
Table has aprox 180k rows, of which 151k have active = true
SELECT STATEMENT _SORT AGGREGATE __NESTED LOOPS ___INDEX RANGE SCAN (index on a.active, a.sys_id) ___INDEX UNIQUE SCAN (index on b.sys_id)
Total Query time: 2.39 seconds
I've got a covering index on the primary table and an index on the child table's join condition, so there's no actual table reads going on here e.g. it's just an index join. I have current stats on the table (as of last night). From what I can see this is not an absurdly bad optimizer plan, but then I suspect there's something I'm not seeing, hence my post here.
> On Apr 12, 4:44 pm, "Ana C. Dent" <anaced...@hotmail.com> wrote:
> > Pat <pat.ca...@service-now.com> wrote in news:0653963d-1d32-4e85-9bae- > > 55e038b9e...@f63g2000hsf.googlegroups.com:
> > > Question I have is:
> > > Short of constructing a summary table (which is probably a non starter > > > in this environment), is there anything I could to to speed up the > > > return time of a count query like the above?
> > Ready, Fire, Aim!
> > If you don't know specifically where the time is being spent, > > you'd only be guessing what change is most optimal.
> > Run your query with SQL_TRACE=TRUE & run the results thru tkprof.
> > Post the results & EXPLAIN PLAN back here is you still need more asisstance
> I'm working over a remote desktop at the moment so I can't cut and > paste the plan so I'll transcribe:
> select count(*) as recordcount from (a inner join b on a.sys_id = > b.sys_id) where a.active = 1
> Table has aprox 180k rows, of which 151k have active = true
> SELECT STATEMENT > _SORT AGGREGATE > __NESTED LOOPS > ___INDEX RANGE SCAN (index on a.active, a.sys_id) > ___INDEX UNIQUE SCAN (index on b.sys_id)
> Total Query time: 2.39 seconds
> I've got a covering index on the primary table and an index on the > child table's join condition, so there's no actual table reads going > on here e.g. it's just an index join. I have current stats on the > table (as of last night). From what I can see this is not an absurdly > bad optimizer plan, but then I suspect there's something I'm not > seeing, hence my post here.- Hide quoted text -
> - Show quoted text -
The nested loops technique works pretty well with a limited volume of matching records but if you are really getting counts of 200k at times well it's a non-performer from the get go.
A hash join might perfom a ( little ) better ... but still may not be adequate.
What release level and patchset are you running on exactly? What kind of hardware platform?
Does the application really require an absolutely accurate count in these circumstances and if so how is this accurate count going to be used?
Re-design may be called for here or at least some bypass type techniques.
If absolutely correct counts are not required one way of thinking about a bypass is limiting the count to some base number of rows via rownum ( select count(*) from whatever the tables are with join condition AND rownum <= 10,100,1000 ).
Another bypass technique might be to create a table that has counts already calculated and refresh that table on a periodic basis. Then point the query against the pre-aggregated table.
> On Apr 12, 8:01 pm, Pat <pat.ca...@service-now.com> wrote:
> > On Apr 12, 4:44 pm, "Ana C. Dent" <anaced...@hotmail.com> wrote:
> > > Pat <pat.ca...@service-now.com> wrote in news:0653963d-1d32-4e85-9bae- > > > 55e038b9e...@f63g2000hsf.googlegroups.com:
> > > > Question I have is:
> > > > Short of constructing a summary table (which is probably a non starter > > > > in this environment), is there anything I could to to speed up the > > > > return time of a count query like the above?
> > > Ready, Fire, Aim!
> > > If you don't know specifically where the time is being spent, > > > you'd only be guessing what change is most optimal.
> > > Run your query with SQL_TRACE=TRUE & run the results thru tkprof.
> > > Post the results & EXPLAIN PLAN back here is you still need more asisstance
> > I'm working over a remote desktop at the moment so I can't cut and > > paste the plan so I'll transcribe:
> > select count(*) as recordcount from (a inner join b on a.sys_id = > > b.sys_id) where a.active = 1
> > Table has aprox 180k rows, of which 151k have active = true
> > SELECT STATEMENT > > _SORT AGGREGATE > > __NESTED LOOPS > > ___INDEX RANGE SCAN (index on a.active, a.sys_id) > > ___INDEX UNIQUE SCAN (index on b.sys_id)
> > Total Query time: 2.39 seconds
> > I've got a covering index on the primary table and an index on the > > child table's join condition, so there's no actual table reads going > > on here e.g. it's just an index join. I have current stats on the > > table (as of last night). From what I can see this is not an absurdly > > bad optimizer plan, but then I suspect there's something I'm not > > seeing, hence my post here.- Hide quoted text -
> > - Show quoted text -
> The nested loops technique works pretty well with a limited volume of > matching records but if you are really getting counts of 200k at times > well it's a non-performer from the get go.
> A hash join might perfom a ( little ) better ... but still may not be > adequate.
> What release level and patchset are you running on exactly? What kind > of hardware platform?
> Does the application really require an absolutely accurate count in > these circumstances and if so how is this accurate count going to be > used?
> Re-design may be called for here or at least some bypass type > techniques.
> If absolutely correct counts are not required one way of thinking > about a bypass is limiting the count to some base number of rows via > rownum ( select count(*) from whatever the tables are with join > condition AND rownum <= 10,100,1000 ).
> Another bypass technique might be to create a table that has counts > already calculated and refresh that table on a periodic basis. Then > point the query against the pre-aggregated table.
Unfortunately (from a database efficiency standpoint), exact counts are required, or at least expected by our current user base. Summary tables are probably not going to work either since I can't predict what the count is going to be over. In this case, this user is hammering the subset active=1, but tomorrow it may be based on some other specifier.
Switching this to a hash join sounds like it has promise (or at least be worth a shot).
Is there an optimizer hint I can add to force a hash join in this case? Be worth a shot.
I have to admit I am a bit surprised this is all that slow though. It's a modern box with 3.0 ghz CPUs in it (intel) and virtually no load apart from this query. He can basically monopolize an entire core (and he does according to vmstat), and it's still taking him about 2.2 seconds to finish. Seems like that's an aweful lot of CPU time just to join a couple of hundred thousand index nodes together and count the result. Virtually no physical IOs involved here either. Far as I can tell, this is pure CPU load.
> On Apr 12, 5:51 pm, hpuxrac <johnbhur...@sbcglobal.net> wrote: > > The nested loops technique works pretty well with a limited volume of > > matching records but if you are really getting counts of 200k at times > > well it's a non-performer from the get go.
> > A hash join might perfom a ( little ) better ... but still may not be > > adequate.
> Unfortunately (from a database efficiency standpoint), exact counts > are required, or at least expected by our current user base. Summary > tables are probably not going to work either since I can't predict > what the count is going to be over. In this case, this user is > hammering the subset active=1, but tomorrow it may be based on some > other specifier.
> Switching this to a hash join sounds like it has promise (or at least > be worth a shot).
> Is there an optimizer hint I can add to force a hash join in this > case? Be worth a shot.
> I have to admit I am a bit surprised this is all that slow though. > It's a modern box with 3.0 ghz CPUs in it (intel) and virtually no > load apart from this query. He can basically monopolize an entire core > (and he does according to vmstat), and it's still taking him about 2.2 > seconds to finish. Seems like that's an aweful lot of CPU time just to > join a couple of hundred thousand index nodes together and count the > result. Virtually no physical IOs involved here either. Far as I can > tell, this is pure CPU load.
> 10.2.0.4 64 bit on RHEL 5.0.
I set up a simple test case to test the suggestion to try a hash join, rather than a nested loops join. Your results may vary slightly from these results.
First, two tables that have sufficiently wide enough columns so that Oracle does not try a full table scan: CREATE TABLE T1 ( SYS_ID NUMBER(10) NOT NULL, ACTIVE NUMBER(1) NOT NULL, C3 VARCHAR2(255), C4 VARCHAR2(255), C5 VARCHAR2(255), C6 VARCHAR2(255), C7 VARCHAR2(255), C8 VARCHAR2(255), C9 VARCHAR2(255), C10 VARCHAR2(255));
Now, insert rows into the T1 table (note that the DECODE contains an error, reverse the last 1 and 0 in the DECODE, but this will work as is by slightly adjusting the SQL statement): INSERT INTO T1 SELECT ROWNUM, DECODE(SIGN(MOD(ROWNUM,100)-84),1,1,0), DBMS_RANDOM.STRING('A',255), DBMS_RANDOM.STRING('A',255), DBMS_RANDOM.STRING('A',255), DBMS_RANDOM.STRING('A',255), DBMS_RANDOM.STRING('A',255), DBMS_RANDOM.STRING('A',255), DBMS_RANDOM.STRING('A',255), DBMS_RANDOM.STRING('A',255) FROM DUAL CONNECT BY LEVEL<=180000;
Now to save time, we will use the T1 table to build the T2 table: INSERT INTO T2 SELECT SYS_ID, C3, C3, C4, C5, C6, C7, C8, C9, C10 FROM T1;
COMMIT;
Now, collect statistics on the tables and indexes: EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE);
Your query, rewritten to work with the above (note that a.active is specified as 0 in this case): select count(*) as recordcount from (T1 a inner join T2 b on a.sys_id = b.sys_id) where a.active = 0;
Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("A"."ACTIVE"=0 AND "A"."SYS_ID"="B"."SYS_ID")
The above shows that the T1_IND1 index was probed 180,000 times. Note that the plan order is not the same as what you posted, likely because your table B (T2) contains more than 1 row per SYS_ID. Plan statistics from a 10046 trace, 3000 consistent reads: (Rows 1) SORT AGGREGATE (cr=3000 pr=0 pw=0 time=0 us) (Rows 153000) NESTED LOOPS (cr=3000 pr=0 pw=0 time=2255352 us cost=121 size=1170000 card=90000) (Rows 180000) INDEX FAST FULL SCAN T2_IND1 (cr=382 pr=0 pw=0 time=266066 us cost=104 size=900000 card=180000) (Rows 153000) INDEX UNIQUE SCAN T1_IND1 (cr=2618 pr=0 pw=0 time=0 us cost=0 size=8 card=1)
Slightly modifying your query with a hint to force a hash join: select /*+ use_hash(a b) */ count(*) as recordcount from (T1 a inner join T2 b on a.sys_id = b.sys_id) where a.active = 0; The DBMS_XPLAN showing actual timing: --------------------------------------------------------------------------- -------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | --------------------------------------------------------------------------- -------------------------------------------- | 1 | SORT AGGREGATE | | 1 | 1 | 1 | 00:00:02.42 | 818 | | | | |* 2 | HASH JOIN | | 1 | 90000 | 153K| 00:00:02.13 | 818 | 3624K| 1381K| 5510K (0)| |* 3 | INDEX FAST FULL SCAN| T1_IND1 | 1 | 90000 | 153K| 00:00:00.16 | 436 | | | | | 4 | INDEX FAST FULL SCAN| T2_IND1 | 1 | 180K| 180K| 00:00:00.18 | 382 | | | | --------------------------------------------------------------------------- --------------------------------------------
Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("A"."SYS_ID"="B"."SYS_ID") 3 - filter("A"."ACTIVE"=0)
The above executed 0.03 seconds faster, and the consistent reads dropped to 818. (Rows 1) SORT AGGREGATE (cr=818 pr=0 pw=0 time=0 us) (Rows 153000) HASH JOIN (cr=818 pr=0 pw=0 time=1352636 us cost=458 size=1170000 card=90000) (Rows 153000) INDEX FAST FULL SCAN T1_IND1 (cr=436 pr=0 pw=0 time=238965 us cost=120 size=720000 card=90000) (Rows 180000) INDEX FAST FULL SCAN T2_IND1 (cr=382 pr=0 pw=0 time=278655 us cost=104 size=900000 card=180000)
With just an ORDERED hint to force the same plan order as what you posted: | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | --------------------------------------------------------------------------- ----------------- | 1 | SORT AGGREGATE | | 1 | 1 | 1 | 00:00:02.24 | 3075 | | 2 | NESTED LOOPS | | 1 | 90000 | 153K| 00:00:01.99 | 3075 | |* 3 | INDEX FAST FULL SCAN| T1_IND1 | 1 | 90000 | 153K| 00:00:00.16 | 436 | |* 4 | INDEX UNIQUE SCAN | T2_IND1 | 153K| 1 | 153K| 00:00:00.72 | 2639 | --------------------------------------------------------------------------- -----------------
Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("A"."ACTIVE"=0) 4 - access("A"."SYS_ID"="B"."SYS_ID")
0.21 seconds faster than the first execution, even though the consistent reads increased to 3075 (Rows 1) SORT AGGREGATE (cr=3075 pr=0 pw=0 time=0 us) (Rows 153000) NESTED LOOPS (cr=3075 pr=0 pw=0 time=2026659 us cost=128 size=1170000 card=90000) (Rows 153000) INDEX FAST FULL SCAN T1_IND1 (cr=436 pr=0 pw=0 time=239888 us cost=120 size=720000 card=90000) (Rows 153000) INDEX UNIQUE SCAN T2_IND1 (cr=2639 pr=0 pw=0 time=0 us cost=0 size=5 card=1)
In the plans, look closely at the timings of each step in the plan. The timings of a child step will roll up into the parent step (indicated by the indentation in the plan). The actual scan of the indexes happens reasonably fast, it is the join that takes the majority of the elapsed time.
Interesting fun with mathematics, which may not be entirely relevant. On a computer with a computer marketed as having a 1333MHz bus speed, using 333MHz quad pumped dual channel memory chips, each memory clock cycle retrieves up to 32 bytes in 0.000000003003003 seconds (maximum transfer speed of 10,162.35 MB per second), and the CPU core will wait for this duration on every memory access. A standard 8KB block requires a minimum of 256 memory clock cycles to be read, resulting in a minimum delay of 0.000000768768768 seconds to read an 8KB block from system memory. If you require the computer to perform 180,000 8KB reads (assuming the data is not cached in the CPU registers, L1, L2, or L3 caches), it will take a minimum of 0.138 seconds (consistent reads of 8KB blocks might take 5-10 times longer). What seems like a simple problem becomes a bit complicated when you dig into the details.
Charles Hooper IT Manager/Oracle DBA K&M Machine-Fabricating, Inc.
> On Apr 12, 9:58 pm, Pat <pat.ca...@service-now.com> wrote:
> > On Apr 12, 5:51 pm, hpuxrac <johnbhur...@sbcglobal.net> wrote: > > > The nested loops technique works pretty well with a limited volume of > > > matching records but if you are really getting counts of 200k at times > > > well it's a non-performer from the get go.
> > > A hash join might perfom a ( little ) better ... but still may not be > > > adequate.
> > Unfortunately (from a database efficiency standpoint), exact counts > > are required, or at least expected by our current user base. Summary > > tables are probably not going to work either since I can't predict > > what the count is going to be over. In this case, this user is > > hammering the subset active=1, but tomorrow it may be based on some > > other specifier.
> > Switching this to a hash join sounds like it has promise (or at least > > be worth a shot).
> > Is there an optimizer hint I can add to force a hash join in this > > case? Be worth a shot.
> > I have to admit I am a bit surprised this is all that slow though. > > It's a modern box with 3.0 ghz CPUs in it (intel) and virtually no > > load apart from this query. He can basically monopolize an entire core > > (and he does according to vmstat), and it's still taking him about 2.2 > > seconds to finish. Seems like that's an aweful lot of CPU time just to > > join a couple of hundred thousand index nodes together and count the > > result. Virtually no physical IOs involved here either. Far as I can > > tell, this is pure CPU load.
> > 10.2.0.4 64 bit on RHEL 5.0.
> I set up a simple test case to test the suggestion to try a hash join, > rather than a nested loops join. Your results may vary slightly from > these results.
> First, two tables that have sufficiently wide enough columns so that > Oracle does not try a full table scan: > CREATE TABLE T1 ( > SYS_ID NUMBER(10) NOT NULL, > ACTIVE NUMBER(1) NOT NULL, > C3 VARCHAR2(255), > C4 VARCHAR2(255), > C5 VARCHAR2(255), > C6 VARCHAR2(255), > C7 VARCHAR2(255), > C8 VARCHAR2(255), > C9 VARCHAR2(255), > C10 VARCHAR2(255));
> CREATE UNIQUE INDEX T1_IND1 ON T1(ACTIVE,SYS_ID);
> Now, insert rows into the T1 table (note that the DECODE contains an > error, reverse the last 1 and 0 in the DECODE, but this will work as > is by slightly adjusting the SQL statement): > INSERT INTO > T1 > SELECT > ROWNUM, > DECODE(SIGN(MOD(ROWNUM,100)-84),1,1,0), > DBMS_RANDOM.STRING('A',255), > DBMS_RANDOM.STRING('A',255), > DBMS_RANDOM.STRING('A',255), > DBMS_RANDOM.STRING('A',255), > DBMS_RANDOM.STRING('A',255), > DBMS_RANDOM.STRING('A',255), > DBMS_RANDOM.STRING('A',255), > DBMS_RANDOM.STRING('A',255) > FROM > DUAL > CONNECT BY > LEVEL<=180000;
> Now to save time, we will use the T1 table to build the T2 table: > INSERT INTO > T2 > SELECT > SYS_ID, > C3, > C3, > C4, > C5, > C6, > C7, > C8, > C9, > C10 > FROM > T1;
> COMMIT;
> Now, collect statistics on the tables and indexes: > EXEC > DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE); > EXEC > DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE);
> Your query, rewritten to work with the above (note that a.active is > specified as 0 in this case): > select count(*) as recordcount from (T1 a inner join T2 b on a.sys_id > = > b.sys_id) where a.active = 0;
> Predicate Information (identified by operation id): > --------------------------------------------------- > 4 - access("A"."ACTIVE"=0 AND "A"."SYS_ID"="B"."SYS_ID")
> The above shows that the T1_IND1 index was probed 180,000 times. Note > that the plan order is not the same as what you posted, likely because > your table B (T2) contains more than 1 row per SYS_ID. > Plan statistics from a 10046 trace, 3000 consistent reads: > (Rows 1) SORT AGGREGATE (cr=3000 pr=0 pw=0 time=0 us) > (Rows 153000) NESTED LOOPS (cr=3000 pr=0 pw=0 time=2255352 us > cost=121 size=1170000 card=90000) > (Rows 180000) INDEX FAST FULL SCAN T2_IND1 (cr=382 pr=0 pw=0 > time=266066 us cost=104 size=900000 card=180000) > (Rows 153000) INDEX UNIQUE SCAN T1_IND1 (cr=2618 pr=0 pw=0 > time=0 us cost=0 size=8 card=1)
> Slightly modifying your query with a hint to force a hash join: > select /*+ use_hash(a b) */ > count(*) as recordcount from (T1 a inner join T2 b on a.sys_id = > b.sys_id) where a.active = 0; > The DBMS_XPLAN showing actual timing: > --------------------------------------------------------------------------- -------------------------------------------- > | Id | Operation | Name | Starts | E-Rows | A-Rows > | A-Time | Buffers | OMem | 1Mem | Used-Mem | > --------------------------------------------------------------------------- -------------------------------------------- > | 1 | SORT AGGREGATE | | 1 | 1 | 1 | > 00:00:02.42 | 818 | | | | > |* 2 | HASH JOIN | | 1 | 90000 | 153K| > 00:00:02.13 | 818 | 3624K| 1381K| 5510K (0)| > |* 3 | INDEX FAST FULL SCAN| T1_IND1 | 1 | 90000 | 153K| > 00:00:00.16 | 436 | | | | > | 4 | INDEX FAST FULL SCAN| T2_IND1 | 1 | 180K| 180K| > 00:00:00.18 | 382 | | | | > --------------------------------------------------------------------------- --------------------------------------------
> Predicate Information (identified by operation id): > --------------------------------------------------- > 2 - access("A"."SYS_ID"="B"."SYS_ID") > 3 - filter("A"."ACTIVE"=0)
> The above executed 0.03 seconds faster, and the consistent reads > dropped to 818. > (Rows 1) SORT AGGREGATE (cr=818 pr=0 pw=0 time=0 us) > (Rows 153000) HASH JOIN (cr=818 pr=0 pw=0 time=1352636 us > cost=458 size=1170000 card=90000) > (Rows 153000) INDEX FAST FULL SCAN T1_IND1 (cr=436 pr=0 pw=0 > time=238965 us cost=120 size=720000 card=90000) > (Rows 180000) INDEX FAST FULL SCAN T2_IND1 (cr=382 pr=0 pw=0 > time=278655 us cost=104 size=900000 card=180000)
> With just an ORDERED hint to force the same plan order as what you > posted: > | Id | Operation | Name | Starts | E-Rows | A-Rows > | A-Time | Buffers | > --------------------------------------------------------------------------- ----------------- > | 1 | SORT AGGREGATE | | 1 | 1 | 1 | > 00:00:02.24 | 3075 | > | 2 | NESTED LOOPS | | 1 | 90000 | 153K| > 00:00:01.99 | 3075 | > |* 3 | INDEX FAST FULL SCAN| T1_IND1 | 1 | 90000 | 153K| > 00:00:00.16 | 436 | > |* 4 | INDEX UNIQUE SCAN | T2_IND1 | 153K| 1 | 153K| > 00:00:00.72 | 2639 | > --------------------------------------------------------------------------- -----------------
> Predicate Information (identified by operation id): > --------------------------------------------------- > 3 - filter("A"."ACTIVE"=0) > 4 - access("A"."SYS_ID"="B"."SYS_ID")
> 0.21 seconds faster than the first execution, even though the > consistent reads increased to 3075 > (Rows 1) SORT AGGREGATE (cr=3075 pr=0 pw=0 time=0 us) > (Rows 153000) NESTED LOOPS (cr=3075 pr=0 pw=0 time=2026659 us > cost=128 size=1170000 card=90000) > (Rows 153000) INDEX FAST FULL SCAN T1_IND1 (cr=436 pr=0 pw=0 > time=239888 us cost=120 size=720000 card=90000) > (Rows 153000) INDEX UNIQUE SCAN T2_IND1 (cr=2639 pr=0 pw=0 > time=0 us cost=0 size=5 card=1)
> In the plans, look closely at the timings of each step in the plan. > The timings of a child step will roll up into the parent step > (indicated by the indentation in the plan). The actual scan of the > indexes happens reasonably fast, it is the join that takes the > majority of the elapsed time.
> Interesting fun with mathematics, which may not be entirely relevant. > On a computer with a computer marketed as having a 1333MHz bus speed, > using 333MHz quad pumped dual channel memory chips, each memory clock > cycle retrieves up to 32 bytes in 0.000000003003003 seconds (maximum > transfer speed of 10,162.35 MB per second), and the CPU core will wait > for this duration on every memory access. A standard 8KB block > requires a minimum of 256 memory clock cycles to be read, resulting in > a minimum delay of 0.000000768768768 seconds to read an 8KB block from > system memory. If you require the computer to perform 180,000 8KB > reads (assuming the data is not cached in the CPU registers, L1, L2, > or L3 caches), it will take a minimum of 0.138 seconds (consistent > reads of 8KB blocks might take 5-10 times longer). What seems like a > simple problem becomes a bit complicated when you dig into the
> We have an application that stores operational data as a single master > table and a set of child tables. Based on the class of data, a > different child table gets joined in e.g.
> master inner_join pets -> gives us pet data > master inner_join cars -> gives us car data
> It works reasonably well, and it makes the developer's lives easier. > Normally the working set of records is quite small as well e.g. < 1k > records at a time so the actual queries being issued are something > like:
> select <some fields> from a inner join b on a .sys_id = b.sys_id where > a.<some indexed column> = <some fairly selective criteria>
> There's one part of the application though that computes counts via > the not terribly complex:
> select count(*) from a inner join b on a sys_id = b.syss_id where > a.<some indexed column> = <some selective criteria>
> Recently though we've had some customers with some unusual data shape > such that the particular count query (above) is joining 200k records > or so together to implement the count.
> Looking at profile on the database, that count alone is something like > 68% of the server CPU load. Not surprisingly, I'd dearly love to do > something about this. Problem is, I'm stumped.
> Question I have is:
> Short of constructing a summary table (which is probably a non starter > in this environment), is there anything I could to to speed up the > return time of a count query like the above?
> Apologies if this is a stupid question, but as you can tell from the > batch of posts I've been putting on the newsgroup I'm on something of > a jihad to clean up some of the Oracle performance bottlenecks in this > app, that that is definately one of them.
Another option to consider is looking at is using/creating a materialized view to support the count query.
Going down the posts in the other part of this thread, if you force the count query to use a hash join, this might help the queries that return a high count ( 200k ) but could ( possibly severely ) impact the queries that ( running now with a nested loop ) return a small count.
On Apr 13, 11:20 am, hpuxrac <johnbhur...@sbcglobal.net> wrote:
> On Apr 12, 5:55 pm, Pat <pat.ca...@service-now.com> wrote:
> Another option to consider is looking at is using/creating a > materialized view to support the count query.
> Going down the posts in the other part of this thread, if you force > the count query to use a hash join, this might help the queries that > return a high count ( 200k ) but could ( possibly severely ) impact > the queries that ( running now with a nested loop ) return a small > count.
I'm going to show my ignorance here, but would a materialized view help if there's a lot of transactional volume on the table e.g. if there are, say, 500 or so updates per hour across that table? Wouldn't I need to refresh the view every few seconds to keep it current? Or is this the part I'm missing?
> On Apr 13, 11:20 am, hpuxrac <johnbhur...@sbcglobal.net> wrote: >> On Apr 12, 5:55 pm, Pat <pat.ca...@service-now.com> wrote:
>> Another option to consider is looking at is using/creating a >> materialized view to support the count query.
>> Going down the posts in the other part of this thread, if you force >> the count query to use a hash join, this might help the queries that >> return a high count ( 200k ) but could ( possibly severely ) impact >> the queries that ( running now with a nested loop ) return a small >> count.
> I'm going to show my ignorance here, but would a materialized view > help if there's a lot of transactional volume on the table e.g. if > there are, say, 500 or so updates per hour across that table? Wouldn't > I need to refresh the view every few seconds to keep it current? Or is > this the part I'm missing?
There is an option to "refresh on commit".
It's not a good idea to use it on a very busy table as the overheads are dramatic (a single row update resulted in about 45 statement executions on the commit the last time I tested it). However, at one update every 7 seconds, you may find that the overheads are acceptable.
There are a couple of oddities with read-consistency, though.
Check what happens if you update the table, then (from the same session) run the query BEFORE committing. Your version of the query has to go to the base tables to see your update rather than using the materialized view - and you may then be caught in a trap where other users start sharing the cursor that visited the base tables rather than the view. Unless you have very good control over how the updates and queries synchronise, you could get caught in the trap of randomly changing performance.
A completely different thought - this may be a case where you want to get literal values into your query and build histograms to support the query. You had 150,000 out of 180,000 rows with status 'active'. If you create a histogram on the column, and use literals in your SQL you may find that the optimizer uses the 'damage-limiting' hash join on the worst case, and the nested loop on the other cases. The cost of the extra optimisation stages may be insignificant compared to the benefit of doing the right thing every time.
> > On Apr 13, 11:20 am, hpuxrac <johnbhur...@sbcglobal.net> wrote: > >> On Apr 12, 5:55 pm, Pat <pat.ca...@service-now.com> wrote:
> >> Another option to consider is looking at is using/creating a > >> materialized view to support the count query.
> >> Going down the posts in the other part of this thread, if you force > >> the count query to use a hash join, this might help the queries that > >> return a high count ( 200k ) but could ( possibly severely ) impact > >> the queries that ( running now with a nested loop ) return a small > >> count.
> > I'm going to show my ignorance here, but would a materialized view > > help if there's a lot of transactional volume on the table e.g. if > > there are, say, 500 or so updates per hour across that table? Wouldn't > > I need to refresh the view every few seconds to keep it current? Or is > > this the part I'm missing?
> There is an option to "refresh on commit".
> It's not a good idea to use it on a very busy table as the overheads > are dramatic (a single row update resulted in about 45 statement > executions on the commit the last time I tested it). However, at > one update every 7 seconds, you may find that the overheads are > acceptable.
> There are a couple of oddities with read-consistency, though.
> Check what happens if you update the table, then (from the same session) > run the query BEFORE committing. Your version of the query has to go to > the base tables to see your update rather than using the materialized > view - > and you may then be caught in a trap where other users start sharing the > cursor that visited the base tables rather than the view. Unless you have > very good control over how the updates and queries synchronise, you > could get caught in the trap of randomly changing performance.
> A completely different thought - this may be a case where you want to > get literal values into your query and build histograms to support the > query. You had 150,000 out of 180,000 rows with status 'active'. > If you create a histogram on the column, and use literals in your SQL > you may find that the optimizer uses the 'damage-limiting' hash join on > the worst case, and the nested loop on the other cases. The cost of > the extra optimisation stages may be insignificant compared to the benefit > of doing the right thing every time.
In this case, looks like having a histogram didn't help. The active column has two potential values (0 or 1), so it's essentially a boolean. Even with a 2 columns histogram on it, the optimizer is still going down nested loops.
I may take a crack at putting in an a refreshing materialized view though. Need to sit down and do a little match on relative transaction frequency though, want to make sure I end up saving cycles rather than burning them.
> In this case, looks like having a histogram didn't help. The active > column has two potential values (0 or 1), so it's essentially a > boolean. Even with a 2 columns histogram on it, the optimizer is still > going down nested loops.
> I may take a crack at putting in an a refreshing materialized view > though. Need to sit down and do a little match on relative transaction > frequency though, want to make sure I end up saving cycles rather than > burning them.
The costing in this plan are a little surprising - but there are two possibilities that I can think of might explain them
What's your value for optimizer_index_caching - has it been set to something close to 100 ?
Is this (very specifically) running 10.2.0.1 ?
I guess it's is also possible that the NLS index is introducing a side effect that I've not come across before - can you show us the plan you get (including costs and predicates as above) when you hint the hash join suggested by Charles Hooper.
>>> On Apr 13, 11:20 am, hpuxrac <johnbhur...@sbcglobal.net> wrote: >>>> On Apr 12, 5:55 pm, Pat <pat.ca...@service-now.com> wrote: >>>> Another option to consider is looking at is using/creating a >>>> materialized view to support the count query. >>>> Going down the posts in the other part of this thread, if you force >>>> the count query to use a hash join, this might help the queries that >>>> return a high count ( 200k ) but could ( possibly severely ) impact >>>> the queries that ( running now with a nested loop ) return a small >>>> count. >>> I'm going to show my ignorance here, but would a materialized view >>> help if there's a lot of transactional volume on the table e.g. if >>> there are, say, 500 or so updates per hour across that table? Wouldn't >>> I need to refresh the view every few seconds to keep it current? Or is >>> this the part I'm missing? >> There is an option to "refresh on commit".
>> It's not a good idea to use it on a very busy table as the overheads >> are dramatic (a single row update resulted in about 45 statement >> executions on the commit the last time I tested it). However, at >> one update every 7 seconds, you may find that the overheads are >> acceptable.
>> There are a couple of oddities with read-consistency, though.
>> Check what happens if you update the table, then (from the same session) >> run the query BEFORE committing. Your version of the query has to go to >> the base tables to see your update rather than using the materialized >> view - >> and you may then be caught in a trap where other users start sharing the >> cursor that visited the base tables rather than the view. Unless you have >> very good control over how the updates and queries synchronise, you >> could get caught in the trap of randomly changing performance.
>> A completely different thought - this may be a case where you want to >> get literal values into your query and build histograms to support the >> query. You had 150,000 out of 180,000 rows with status 'active'. >> If you create a histogram on the column, and use literals in your SQL >> you may find that the optimizer uses the 'damage-limiting' hash join on >> the worst case, and the nested loop on the other cases. The cost of >> the extra optimisation stages may be insignificant compared to the benefit >> of doing the right thing every time.
> In this case, looks like having a histogram didn't help. The active > column has two potential values (0 or 1), so it's essentially a > boolean. Even with a 2 columns histogram on it, the optimizer is still > going down nested loops.
> I may take a crack at putting in an a refreshing materialized view > though.
An alternative approach may be to use a trigger and a counter table. You might face locking issues though. That depends on how much concurrency you have on that table, how much original rows fold down into one row and on the duration of your transactions.
> Need to sit down and do a little match on relative transaction > frequency though, want to make sure I end up saving cycles rather than > burning them.
On Apr 15, 1:09 am, "Shakespeare" <what...@xs4all.nl> wrote:
> Just another "probably stupid" question:
> Unfortunaltely, my testdatabase could not handle the test scripts (it's > sized too small I think) but would reversing the columns in
> CREATE UNIQUE INDEX T1_IND1 ON T1(ACTIVE,SYS_ID); > to
> CREATE UNIQUE INDEX T1_IND1 ON T1(SYS_ID,ACTIVE);
> do some good here? (Since active is not very discriminating and the join is > on SYS_ID)? Would like to have tested this myself, but no success...
> Shakespeare
You'd think it might help a little bit, but I've got an index on just sys_id as well (for other reasons, not for this query), and the optimizer seems to not want to use it so he thinks at least that he gets enough discrimination out of active to make it worthwhile.
In case anyone is curious though, I did run down some more information here and it makes sense (to me at least).
It was kind of weird that the optimizer kept choosing nested loop joins even when a hash join was demonstrably much faster. Turns out one of my colleages (in an effort to work around an unrelated issue) had decided to set:
With those settings in place, he ran nested loops consistently.
With the default optimizer_index_caching=0 optimizer_index_cost_adj=100
He runs a hash join across this same data set.
Gonna run some more tests, but if that looks good I'm going to look at adjusting those parameters on production box. Issue they were implemented to work around was resolved another way in any event.