After another workshop of OracleTurk

I had my 2 presentation this weekend. It was a fantastic book of Lewis, CBO optimizer. I had met possible future interns of Turkcell. After nearly 10 session they got really confused :) . During my first session, “Selectivity issues”, I showed my own examples. Since the topic is issues, we really dealt with lots of confusing scenerios. Hereby, i will give an example of dynamic sampling and constraints over selectivity in action.

First of all, create a table of 2 columns. Both columns are dependent to each other, in fact they are the same.


drop table dependent_cols purge;
create table dependent_cols
as
select c1 c1, c1 c2
from (select trunc(dbms_random.value(0,35)) c1 from dual connect by level < 100000);

10g XE did not gathered statistics for me, well it is the stuation I want.


SQL> select low_value, high_value from user_tab_columns utc where utc.TABLE_NAME = ‘DEPENDENT_COLS’;

LOW_VALUE HIGH_VALUE
—————————————————————- —————————————————————-

same view from “select * from user_tables ut where ut.TABLE_NAME = ‘DEPENDENT_COLS’;” SQL, lots of nulls in row. Oracle does not really know a glimps of table. How will it going to calculate cardinality? After workshop one of my collagues (Derya) and me stayed in office and make tests on 9i and on 10g. The key point is 9i uses “choose” mod of optimizer_mode, by default, on the other hand 10g uses all_rows.


SQL> select display_value from v$parameter p where p.NAME = ‘optimizer_mode’;

DISPLAY_VALUE
——————————————————————————–
ALL_ROWS

Below there are some results of cardinality of ALL_ROWS, FIRST_ROWS and RULE from explain plans. This was the table without statistics I created above on my 10g XE.


explain plan for
select *
from dependent_cols dc
where dc.c1 = 22;

MODE | Cardinality
ALL_ROWS | 3127
RULE | NULL
FIRST_ROWS | 3127

actual cardinality is 2834.


SQL> select count(*)
2 from dependent_cols dc
3 where dc.c1 = 22;

COUNT(*)
———-
2834

Since the data is random, 3127 is not a bad estimate. After an equality predicate, I tried 2 dependent-column based predicate SQLs.


SQL> alter session set optimizer_mode=all_rows;
SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 = dc.c2;
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2076668931
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 115K| 2927K| 52 (12)| 00:00:
|* 1 | TABLE ACCESS FULL| DEPENDENT_COLS | 115K| 2927K| 52 (12)| 00:00:
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DC"."C1"="DC"."C2")
Note
-----
- dynamic sampling used for this statement

SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 != dc.c2;

Explained
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2076668931
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 52 (12)| 00:00:
|* 1 | TABLE ACCESS FULL| DEPENDENT_COLS | 2 | 52 | 52 (12)| 00:00:
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DC"."C1""DC"."C2")
Note
-----
- dynamic sampling used for this statement

It is an absolutely good estimate. Without knowled of statistics c1=c2 cardinality is 115000, c1!=c2 cardinality is 2. Our domain knowledge of data tells, c1=c2 is true in all rows. But how can Oracle does a good job without statistics? Well, it was dynamic sampling. Our tests on 9i showed us, “choose” mod of optimizer chooses “rule” mod if there is no statistic, so rule based execution will be used instead of CBO based execution. But default behavior of “all_rows” and “first_rows” is try to make sampling. According to DBAzine.com level 2 of optimizer_dynamic_sampling parameter means make dynamic sampling on all unanalized objects that have more than 32 blocks.


SQL> select value from v$parameter where name like ‘%samp%’;
VALUE
2

How does oracle do dynamic sampling? I made a 10046 trace.


SQL> alter session set events ‘10046 trace name context forever, level 2′;
SQL> alter session set tracefile_identifier = ‘dyna_sample_test’;
SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 != dc.c2;

[A part of trace file]
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param(‘parallel_execution_enabled’, ‘false’) NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */
NVL(SUM(C1), :”SYS_B_0″), NVL(SUM(C2), :”SYS_B_1″)
FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL(“DC”) FULL(“DC”) NO_PARALLEL_INDEX(“DC”) */
:”SYS_B_2″ AS C1,
CASE
WHEN “DC”.”C1″ = “DC”.”C2″ THEN
:”SYS_B_3″
ELSE
:”SYS_B_4″
END AS C2
FROM “DEPENDENT_COLS” SAMPLE BLOCK(:”SYS_B_5″, :”SYS_B_6″) SEED(:”SYS_B_7″) “DC”) SAMPLESUB
[A part of trace file]

So what is oracle telling us with this SQL? “Do not ever never do parallel query”, “read %n of table”, “check out how many of C1 = C2″. If I gather statistics on this table, Oracle gives up with dynamic sampling. This is the realistic behavior, because sampling does not come free. Another issue I saw is misusage of constraints. In fact oracle does well with transitive closure at joins, however it is unable to use this feature on single table selectivity. Below scenerio is just a waste of resources, I think.


SQL> alter table dependent_cols add constraint c1_eq_c2 check(c1=c2);
SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 = dc.c2;
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
——————————————————————————–
Plan hash value: 2076668931
——————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
——————————————————————————–
| 0 | SELECT STATEMENT | | 115K| 2927K| 52 (12)| 00:00:
|* 1 | TABLE ACCESS FULL| DEPENDENT_COLS | 115K| 2927K| 52 (12)| 00:00:
——————————————————————————–
Predicate Information (identified by operation id):
—————————————————
1 – filter(“DC”.”C1″=”DC”.”C2″)
Note
—–
– dynamic sampling used for this statement

SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 != dc.c2;
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
——————————————————————————–
Plan hash value: 2076668931
——————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
——————————————————————————–
| 0 | SELECT STATEMENT | | 2 | 52 | 52 (12)| 00:00:
|* 1 | TABLE ACCESS FULL| DEPENDENT_COLS | 2 | 52 | 52 (12)| 00:00:
——————————————————————————–
Predicate Information (identified by operation id):
—————————————————
1 – filter(“DC”.”C1″”DC”.”C2″)
Note
—–
– dynamic sampling used for this statement

Oracle really knows, C1 IS EQUAL TO C2, but it does dynamic sampling. This is not real execution path of SQL, just CBO’s estimates, I do not know it really uses this constraint on runtime.

Post a Comment

Your email is never published nor shared. Required fields are marked *
*
*