Filtered Indexes on Oracle

Today I was reading Tonguç's blog post http://tonguc.wordpress.com/2008/06/26/oracle-vs-sql-server-a-never-ending-story/ and I really liked MS SQL Server's filtered index idea. It is possible to emulate it in Oracle with a workaround.


Selecting Kth

Not an Oracle-related post. Life is strange, full of unexpected scenarios; I think it won't be easy to post about Oracle from now on. The code below is an algorithm comparison for finding the Kth element of a given list.
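The original code sits behind the link. As a sketch of the idea, here is quickselect in Python, the standard average-O(n) way to find the Kth smallest element; this is my own illustration, not necessarily one of the algorithms compared in the original post:

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (1-based) of items, average O(n)."""
    assert 1 <= k <= len(items)
    pivot = random.choice(items)
    lows   = [x for x in items if x < pivot]
    pivots = [x for x in items if x == pivot]
    highs  = [x for x in items if x > pivot]
    if k <= len(lows):
        return quickselect(lows, k)           # answer is in the low partition
    if k <= len(lows) + len(pivots):
        return pivot                          # answer equals the pivot
    return quickselect(highs, k - len(lows) - len(pivots))
```

Unlike a full sort (O(n log n)), only the partition containing the answer is recursed into.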

A quick overview of diff_table*

10.2.0.4 came with some 11g features, including an extended dbms_stats package. I have made some basic demonstrations below.


Parallelism Partitioning CBO issue

Parallelism and partitioning are two excellent features of Oracle. What is the relation between the CBO and these two? Well, I caught some issues. First of all, let's create a table with range partitioning:


SQL> select version from v$instance;
VERSION
-----------------
10.2.0.1.0


CREATE TABLE part_table
PARTITION BY RANGE (c1)(
PARTITION p1 VALUES LESS THAN (100000),
PARTITION p2 VALUES LESS THAN (200000),
PARTITION p3 VALUES LESS THAN (MAXVALUE))
nologging
as
select
level c1, cast(' ' as  char(2000)) filler
from dual
connect by level < 300000;


CREATE INDEX part_table_c1_idx
ON part_table (c1)
LOCAL (
PARTITION p1,
PARTITION p2,
PARTITION p3);

-- gather table and index statistics here

Just a table with three partitions, each partition with a local index. The C1 column holds strictly sequential data. Let's do some exercises.


select * from part_table where c1 = 10000;
--------------------------------------------------------------------------------
| Id  | Operation                          | Name              | Rows  | Bytes |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                   |     1 |  2005 |
|   1 |  PARTITION RANGE SINGLE            |                   |     1 |  2005 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| PART_TABLE        |     1 |  2005 |
|*  3 |    INDEX RANGE SCAN                | PART_TABLE_C1_IDX |     1 |       |
--------------------------------------------------------------------------------

Here we see partition pruning: only partition P1 and its local index are considered as the data source. Of course the index path is used, because this is a single-row access and there is no need for a FTS. Below is an example where two partitions are accessed via the index path.


select * from part_table where c1 in (10000, 190000);
--------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |     2 |  4010
|   1 |  INLIST ITERATOR                    |                   |       |
|   2 |   PARTITION RANGE ITERATOR          |                   |     2 |  4010
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| PART_TABLE        |     2 |  4010
|*  4 |     INDEX RANGE SCAN                | PART_TABLE_C1_IDX |     2 |
--------------------------------------------------------------------------------

Now I want to see an index access combined with a FTS. What I mean is:


select * from part_table
where c1 between 0 and 100000-1
or c1 = 150000;
--------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |   100K|   191M
|   1 |  CONCATENATION                      |                   |       |
|   2 |   PARTITION RANGE SINGLE            |                   |     1 |  2005
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| PART_TABLE        |     1 |  2005
|*  4 |     INDEX RANGE SCAN                | PART_TABLE_C1_IDX |     1 |
|   5 |   PARTITION RANGE SINGLE            |                   | 99999 |   191M
|*  6 |    TABLE ACCESS FULL                | PART_TABLE        | 99999 |   191M
--------------------------------------------------------------------------------
cost: 7393

Well done. I wanted to read all of P1 and one row from P2: a FTS for P1 and an index access for P2. But now I want it to be parallel, with some processes reading from the index of P2 while others perform a FTS on P1:


select /*+ PARALLEL(part_table,2) */ * from part_table
where c1 between 0 and 100000-1
or c1 = 150000;
--------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |   100K|   191M
|   1 |  CONCATENATION                      |                   |       |
|   2 |   PARTITION RANGE SINGLE            |                   |     1 |  2005
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| PART_TABLE        |     1 |  2005
|*  4 |     INDEX RANGE SCAN                | PART_TABLE_C1_IDX |     1 |
|   5 |   PARTITION RANGE SINGLE            |                   | 99999 |   191M
|*  6 |    TABLE ACCESS FULL                | PART_TABLE        | 99999 |   191M
--------------------------------------------------------------------------------
cost: 4085

select /*+ PARALLEL(part_table,4) */ * from part_table
where c1 between 0 and 100000-1
or c1 = 150000;
[same exec plan]
cost: 2044

The CBO's formula for costing the concatenation is cost(INDEX RANGE SCAN) + cost(TABLE ACCESS FULL) / DOP (degree of parallelism). Not dividing the cost of the INDEX RANGE SCAN by the DOP is realistic, but I do not find simply dividing the cost of the TABLE ACCESS FULL by the DOP realistic. If one programmer writes a basic ERP program in 2 months, one cannot say 2 programmers will write the same program in 1 month. As I said before, I want it to be parallel, with some processes reading from the index of P2 while others perform a FTS on P1.
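A minimal sketch of that costing rule; the 7/7386 split of the serial cost 7393 into index and FTS components is my assumption, used only to illustrate the shape of the formula:

```python
def concat_cost(index_cost, fts_cost, dop):
    # CONCATENATION cost as inferred in the text: the index range scan
    # component is not divided by the DOP, the full table scan component is
    return index_cost + fts_cost / dop

# Hypothetical split of the serial cost 7393 into 7 (index) + 7386 (FTS):
serial = concat_cost(7, 7386, 1)   # 7393
dop2   = concat_cost(7, 7386, 2)   # only the FTS part shrinks
dop4   = concat_cost(7, 7386, 4)
```

Under this model the cost falls almost linearly with the DOP, which is exactly the optimism the text objects to.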


begin
  update part_table set c1 = 10000 where c1 < 100000;
  commit;
  dbms_stats.gather_table_stats(ownname => USER, tabname => 'PART_TABLE', cascade => TRUE);
end;
/

I updated all of P1, so now there is only one value in the C1 column for that partition.


select  * from part_table where c1 in (10000);
--------------------------------------------------------------------------------
| Id  | Operation              | Name       | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |            |   100K|   191M|  7389   (1)| 00:01
|   1 |  PARTITION RANGE SINGLE|            |   100K|   191M|  7389   (1)| 00:01
|*  2 |   TABLE ACCESS FULL    | PART_TABLE |   100K|   191M|  7389   (1)| 00:01
--------------------------------------------------------------------------------

select  * from part_table where c1 in (190000);
--------------------------------------------------------------------------------
| Id  | Operation                          | Name              | Rows  | Bytes |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                   |     1 |  2005 |
|   1 |  PARTITION RANGE SINGLE            |                   |     1 |  2005 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| PART_TABLE        |     1 |  2005 |
|*  3 |    INDEX RANGE SCAN                | PART_TABLE_C1_IDX |     1 |       |
--------------------------------------------------------------------------------

Alright, but if I use a predicate that includes both values, the CBO does not use an index range scan for P2:


select * from part_table where c1 in (10000, 190000);
--------------------------------------------------------------------------------
| Id  | Operation              | Name       | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |            | 98966 |   189M| 14778   (1)| 00:02
|   1 |  PARTITION RANGE INLIST|            | 98966 |   189M| 14778   (1)| 00:02
|*  2 |   TABLE ACCESS FULL    | PART_TABLE | 98966 |   189M| 14778   (1)| 00:02
--------------------------------------------------------------------------------

select /*+ PARALLEL(part_table,4) */ * from part_table where c1 in (10000, 190000);
--------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            | 98966 |   189M|  4082   (1)| 00:00:4
|   1 |  PX COORDINATOR      |            |       |       |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000   | 98966 |   189M|  4082   (1)| 00:00:4
|   3 |    PX BLOCK ITERATOR |            | 98966 |   189M|  4082   (1)| 00:00:4
|*  4 |     TABLE ACCESS FULL| PART_TABLE | 98966 |   189M|  4082   (1)| 00:00:4
--------------------------------------------------------------------------------

What I expect from Oracle is to assign one process to read P2 via the index and the other processes to full-scan P1. Forcing parallelism on this partitioned table causes a parallel FTS on P2, whose cost is higher than a non-parallel index access. Anyway, the best case was the manual concatenation below, which also has nearly the same cost as the CONCATENATION plan with parallelism 4:


select /*+ PARALLEL(p1,4) */ * from part_table partition (p1) p1 where c1 = 10000
union all
select /*+ PARALLEL(p2,1) */ * from part_table partition (p2) p2 where c1 = 190000;
--------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |   100K|   191M
|   1 |  UNION-ALL                          |                   |       |
|   2 |   PX COORDINATOR                    |                   |       |
|   3 |    PX SEND QC (RANDOM)              | :TQ10000          |   100K|   191M
|   4 |     PX BLOCK ITERATOR               |                   |   100K|   191M
|*  5 |      TABLE ACCESS FULL              | PART_TABLE        |   100K|   191M
|   6 |   PARTITION RANGE SINGLE            |                   |     1 |  2005
|   7 |    TABLE ACCESS BY LOCAL INDEX ROWID| PART_TABLE        |     1 |  2005
|*  8 |     INDEX RANGE SCAN                | PART_TABLE_C1_IDX |     1 |
--------------------------------------------------------------------------------
cost: 2043

I want it to be parallel: some processes should read from the index of P2 while others perform a FTS on P1 at the same time. Each partition should have its own process group, and each group should have its very own execution plan; maybe one group using a bitmap plan, one group doing rowid-range parallelism, one group doing an index skip scan. "Buradan yetkililere sesleniyorum" ("I am calling on the authorities from here").

After another workshop of OracleTurk

I gave my two presentations this weekend, built around Lewis's fantastic book on the CBO. I met possible future interns of Turkcell; after nearly 10 sessions they got really confused 🙂. During my first session, "Selectivity issues", I showed my own examples. Since the topic was issues, we dealt with lots of confusing scenarios. Below I give an example of dynamic sampling and constraints over selectivity in action.

First of all, create a table with two columns. The columns are dependent on each other; in fact they are identical.


drop table dependent_cols purge;
create table dependent_cols
as
select c1 c1, c1 c2
from (select trunc(dbms_random.value(0,35)) c1 from dual connect by level < 100000);

10g XE did not gather statistics for me; well, that is the situation I want.


SQL> select low_value, high_value from user_tab_columns utc where utc.TABLE_NAME = 'DEPENDENT_COLS';

LOW_VALUE                        HIGH_VALUE
-------------------------------- --------------------------------

The same picture appears in "select * from user_tables ut where ut.TABLE_NAME = 'DEPENDENT_COLS';": lots of NULLs in the row. Oracle does not know even a glimpse of the table. How is it going to calculate cardinality? After the workshop, one of my colleagues (Derya) and I stayed in the office and ran tests on 9i and on 10g. The key point is that 9i uses the "choose" optimizer_mode by default, while 10g uses all_rows.


SQL> select display_value from v$parameter p where p.NAME = 'optimizer_mode';

DISPLAY_VALUE
--------------------------------------------------------------------------------
ALL_ROWS

Below are the cardinality estimates under ALL_ROWS, FIRST_ROWS and RULE, taken from explain plans, for the statistics-free table I created above on my 10g XE.


explain plan for
select *
from dependent_cols dc
where dc.c1 = 22;

MODE       | Cardinality
ALL_ROWS   | 3127
RULE       | NULL
FIRST_ROWS | 3127

The actual cardinality is 2834.


SQL> select count(*)
  2  from dependent_cols dc
  3  where dc.c1 = 22;

COUNT(*)
----------
2834

Since the data is random, 3127 is not a bad estimate. After the equality predicate, I tried two SQLs with dependent-column predicates.
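For comparison, the classic no-histogram estimate the CBO would use if it had basic statistics is num_rows / num_distinct. A quick check, using the row and distinct-value counts from the CREATE TABLE above:

```python
# The table was built with 99,999 rows drawn uniformly from 35 distinct
# values (0..34), so the uniform-distribution guess for "c1 = 22" is:
num_rows = 99_999
num_distinct = 35
uniform_estimate = round(num_rows / num_distinct)   # ~2857
```

Dynamic sampling's 3127 and the actual 2834 both sit in the same ballpark as this uniform guess.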


SQL> alter session set optimizer_mode=all_rows;
SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 = dc.c2;
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2076668931
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 115K| 2927K| 52 (12)| 00:00:
|* 1 | TABLE ACCESS FULL| DEPENDENT_COLS | 115K| 2927K| 52 (12)| 00:00:
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DC"."C1"="DC"."C2")
Note
-----
- dynamic sampling used for this statement

SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 != dc.c2;

Explained
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2076668931
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 52 (12)| 00:00:
|* 1 | TABLE ACCESS FULL| DEPENDENT_COLS | 2 | 52 | 52 (12)| 00:00:
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DC"."C1"<>"DC"."C2")
Note
-----
- dynamic sampling used for this statement

These are absolutely good estimates. Without any knowledge of statistics, the c1 = c2 cardinality is 115K and the c1 != c2 cardinality is 2. Our domain knowledge of the data tells us that c1 = c2 holds in all rows. But how can Oracle do such a good job without statistics? Well, it is dynamic sampling. Our tests on 9i showed that the "choose" optimizer mode falls back to "rule" when there are no statistics, so rule-based execution is used instead of the CBO. The default behavior of "all_rows" and "first_rows", on the other hand, is to try sampling. According to DBAzine.com, level 2 of the optimizer_dynamic_sampling parameter means: dynamically sample all unanalyzed objects that have more than 32 blocks.


SQL> select value from v$parameter where name like '%samp%';

VALUE
----------
2

How does oracle do dynamic sampling? I made a 10046 trace.


SQL> alter session set events '10046 trace name context forever, level 2';
SQL> alter session set tracefile_identifier = 'dyna_sample_test';
SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 != dc.c2;

[A part of trace file]
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */
NVL(SUM(C1), :"SYS_B_0"), NVL(SUM(C2), :"SYS_B_1")
FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("DC") FULL("DC") NO_PARALLEL_INDEX("DC") */
:"SYS_B_2" AS C1,
CASE
WHEN "DC"."C1" = "DC"."C2" THEN
:"SYS_B_3"
ELSE
:"SYS_B_4"
END AS C2
FROM "DEPENDENT_COLS" SAMPLE BLOCK(:"SYS_B_5", :"SYS_B_6") SEED(:"SYS_B_7") "DC") SAMPLESUB
[A part of trace file]

So what is Oracle telling us with this SQL? "Do not ever run this in parallel", "read n% of the table", "check how many rows have C1 = C2". If I gather statistics on this table, Oracle gives up on dynamic sampling. That is realistic behavior, because sampling does not come free. Another issue I saw is the non-use of constraints: Oracle does well with transitive closure at joins, yet it is unable to use that knowledge for single-table selectivity. The scenario below is just a waste of resources, I think.
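A toy model of what that OPT_DYN_SAMP query does: read a sample of the table, count predicate hits, and scale up to a cardinality estimate. The function and its parameters are my invention for illustration, not Oracle's internals:

```python
import random

def sampled_match_estimate(rows, predicate, sample_fraction, seed=42):
    """Estimate how many rows satisfy predicate by sampling, then scaling."""
    rng = random.Random(seed)                 # SEED(...) makes the sample repeatable
    sample = [r for r in rows if rng.random() < sample_fraction]
    if not sample:
        return 0
    hit_ratio = sum(1 for r in sample if predicate(r)) / len(sample)
    return round(hit_ratio * len(rows))       # scale the sample ratio to the table
```

With perfectly dependent columns (every row has c1 = c2), even a small sample yields the exact answer, which is why the estimates above are so good.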


SQL> alter table dependent_cols add constraint c1_eq_c2 check(c1=c2);
SQL> explain plan for
2 select *
3 from dependent_cols dc
4 where dc.c1 = dc.c2;
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2076668931
--------------------------------------------------------------------------------
| Id | Operation         | Name           | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                | 115K | 2927K |    52 (12)| 00:00:
|* 1 |  TABLE ACCESS FULL| DEPENDENT_COLS | 115K | 2927K |    52 (12)| 00:00:
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DC"."C1"="DC"."C2")
Note
-----
- dynamic sampling used for this statement

SQL> explain plan for
  2  select *
  3  from dependent_cols dc
  4  where dc.c1 != dc.c2;
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2076668931
--------------------------------------------------------------------------------
| Id | Operation         | Name           | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                |    2 |    52 |    52 (12)| 00:00:
|* 1 |  TABLE ACCESS FULL| DEPENDENT_COLS |    2 |    52 |    52 (12)| 00:00:
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DC"."C1"<>"DC"."C2")
Note
-----
- dynamic sampling used for this statement

Oracle really knows that C1 IS EQUAL TO C2, yet it still does dynamic sampling. Note that these are only the CBO's estimates, not the real execution path of the SQL; I do not know whether it really uses this constraint at runtime.

Nested Loop / Hash / Merge over Event 10053 trace file

One of my colleagues asked me how he can write better SQL. Although he is a Java developer, I appreciate his will to understand the database. When I said "I really cannot teach you this in ten minutes", he lost that will. Anyway, let's talk about the 10053 trace file. Oracle dumps the optimizer's decision path to disk. It is a human-readable document, but not so "user-friendly"; no problem, I like "developer-friendly" programs. If I want to join two tables in SQL, what will Oracle do? Well, it has several ways to join; it compares them and selects the best execution path. Once upon a time, in the ancient days when I was learning HTML (16 years old), Oracle used to look at the tables and the SQL and, without comparing execution paths, pick a plan from a decision tree: the Rule Based Optimizer. Now it is dead; the Cost Based Optimizer is the king. A 10053 trace contains a lot of text about the behavior of the CBO. There are 3 main ways to join tables.
The first is the nested loop: for each row in the first table, go and find a matching row in the second table. How do we reach each table? That really depends on the table type, index type, data characteristics, clustering factor, whether histograms are used, fresh or stale statistics, and server parameters such as asynchronous I/O, single-block read cost, multiblock read count, current workload, etc. The nested loop is a commonly used way to join tables.
Hashing is based on regular hash functions. A hash function is an irreversible function with O(1) lookup complexity. Comparing hashing with nested loops, NL has O(m*n) complexity (m = row count of the first dataset from the first table, n = the same for the second; in Oracle terminology, cardinality). Oracle reads the first dataset and builds a hash table, which may have collisions; then it reads the second dataset and probes the hash table with its values. When a hit occurs, it makes an exact comparison between the values sharing that hash value. That is all, but the real problem is the size of the hash table. "OK, let's make it bigger, without collisions"? No: databases do not serve a single client, and we need to use the least resources we can, for happy end users. If the hash area given for a dataset is not big enough to hold it, Oracle partially dumps the hash table and datasets to disk and acts like an acrobat. One may ask, "Hashing has O(1) access while NL has O(m*n), so why do we need NL?" Because hashing may use much more CPU and memory. The O() thing is academic; forget about it when you graduate from college.
Merge is basically an NL with sorting; in fact, it is all about sorting. We take the first dataset and sort it, and likewise the second. After sorting, the two datasets are much easier to join. No sort is needed when the optimizer chooses index access to read the join column, because index leaves are already sorted on disk. Also, the sorts of the two datasets do not have to run serially. Like the hash area in hashing, merging makes use of a sort area; of course the same problems and similar solutions apply to its size.
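The three join methods above can be sketched in Python. These are minimal in-memory versions under the obvious simplifications: no work areas, no spilling to disk, no index access:

```python
def nested_loop_join(outer, inner, key_o, key_i):
    # O(m*n): for each driving (outer) row, scan the inner rows for matches
    return [(o, i) for o in outer for i in inner if key_o(o) == key_i(i)]

def hash_join(build, probe, key_b, key_p):
    # build a hash table on the (smaller) build input, then probe it: ~O(m+n)
    table = {}
    for b in build:
        table.setdefault(key_b(b), []).append(b)
    return [(b, p) for p in probe for b in table.get(key_p(p), [])]

def merge_join(left, right, key_l, key_r):
    # sort both inputs, then advance two cursors in step
    left = sorted(left, key=key_l)
    right = sorted(right, key=key_r)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key_l(left[i]), key_r(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # emit all right-side rows sharing this key, then move the left cursor
            j2 = j
            while j2 < len(right) and key_r(right[j2]) == kl:
                out.append((left[i], right[j2]))
                j2 += 1
            i += 1
    return out
```

All three return the same set of pairs; they differ only in how much work (and memory) they spend getting there, which is exactly what the CBO is costing in the trace below.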

I will use an example based on one by Jonathan Lewis (he gave me permission by email). Let's create two tables (I use Oracle XE with very default parameters):


drop table build_tab purge;
drop table probe_tab purge;

create table probe_tab
as
select
10000 + rownum id,
trunc(dbms_random.value(0,5000)) n1,
rpad(rownum,20) probe_vc,
rpad('x',500) padding
from dual connect by level <= 15000;

alter table probe_tab add constraint pt_pk primary key(id);

create table build_tab
as
select
rownum id,
10001 + trunc(dbms_random.value(0,5000)) id_probe,
rpad(rownum,20) build_vc,
rpad('x',500) padding
from all_objects
where rownum <= 5000;  -- row limit lost in formatting; 5000 is an assumption consistent with the trace below

begin
  dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'probe_tab');
  dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'build_tab');
end;
/

I gathered table stats; Oracle now has the chance to taste the data. With the help of the analysis of the tables, the CBO has up-to-date knowledge about the join operation.


SQL> select name, display_value from v$parameter where name like '%block%';

NAME                              DISPLAY_VALUE
--------------------------------- --------------------------------
db_block_buffers                  0
db_block_checksum                 TRUE
db_block_size                     8192
db_file_multiblock_read_count     128
db_block_checking                 FALSE

SQL> select name, display_value from v$parameter where name like '%area_size%';

NAME                              DISPLAY_VALUE
--------------------------------- --------------------------------
create_bitmap_area_size           8388608
bitmap_merge_area_size            1048576
hash_area_size                    131072
sort_area_size                    65536
workarea_size_policy              AUTO

SQL> select name, display_value from v$parameter where name like '%pga%';

NAME                              DISPLAY_VALUE
--------------------------------- --------------------------------
pga_aggregate_target              190M

These are some of my parameters. db_file_multiblock_read_count and db_block_size dictate whether Oracle chooses multiblock or single-block reads; a lower multiblock read cost means we will see more full table scans. Blocks are the atoms of Oracle: when Oracle wants to do I/O, it will be at least db_block_size bytes. Any change in these parameters will change everything in the system.
Once upon a time workarea_size_policy was not auto. If it is not auto, hash_area_size and sort_area_size dictate the hash and sort area sizes; no matter how loaded your system is, Oracle will use those fixed sizes. Doesn't that sound a bit hazardous? Regardless of how many clients are connected, each client gets the same size of area, even though their operations are not the same (it is possible to override these parameters at session level). It is the 21st century; we have workarea_size_policy = auto. This means "keep the sum of the PGAs near the pga_aggregate_target value". It is an upper limit; an individual PGA work area can take 5% or 10% of it (I always confuse which). A user may be using parallel execution; each slave is limited to this percentage, and of course there is a limit on slave count. So it is vital to know these parameters before reading the trace file.

alter session set events '10053 trace name context forever, level 2';
alter session set tracefile_identifier = 'compare';

Here comes the "dump the trace file" command. As you see it is at session level; all executions will be dumped to the %ORACLE_HOME%\app\oracle\admin\XE\udump directory.

-- 543
select /*+ USE_NL (bt pt) */ *
from build_tab bt, probe_tab pt
where bt.id between 1 and 500
and bt.id_probe = pt.id;

-- 367
select *
from build_tab bt, probe_tab pt
where bt.id between 1 and 500
and bt.id_probe = pt.id;

-- 1293
select /*+ USE_MERGE (pt bt) */ *
from build_tab bt, probe_tab pt
where bt.id between 1 and 500
and bt.id_probe = pt.id;

These are the three queries I mentioned above, using the NL, hash and merge join strategies (the comment on each is its plan cost). Let's have a look at some parts of the trace output:

SQL:******* UNPARSED QUERY IS *******
SELECT /*+ USE_NL ("PT") USE_NL ("BT") */ "BT"."ID" "ID","BT"."ID_PROBE" "ID_PROBE","BT"."BUILD_VC" "BUILD_VC","BT"."PADDING" "PADDING","PT"."ID" "ID","PT"."N1" "N1","PT"."PROBE_VC" "PROBE_VC","PT"."PADDING" "PADDING" FROM "HR"."BUILD_TAB" "BT","HR"."PROBE_TAB" "PT" WHERE "BT"."ID">=1 AND "BT"."ID"<=500 AND "BT"."ID_PROBE"="PT"."ID"
[join-order costing lines lost in formatting; an order is abandoned once its running cost exceeds the best plan cost]
***********************
(newjo-stop-1) k:0, spcnt:0, perm:2, maxperm:2000
*********************************
Number of join permutations tried: 2
*********************************
(newjo-save) [1 0 ]
Final – All Rows Plan: Best join order: 1

All these rows mean that Oracle compared the left-to-right and right-to-left join orders. "Outer table" means: use this table as the driving table and iterate over its rows, so Oracle makes a single pass over it. Here the best outer table is BUILD_TAB, with a single table access cost of 42.11. The driven (inner) table will be accessed once per row of the driving dataset. Without an index, the cost of accessing PROBE_TAB plus 42.11 is 161367.82; with an index, a unique index access costs 1, so accessing 500 separate rows costs 500. So 500 + 42 = 542, far, far from 161367.82, hence "Best join order: 1".
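The arithmetic above, spelled out (numbers rounded from the trace):

```python
# Nested-loop cost as the trace reasons about it:
driving_cost = 42   # single pass over BUILD_TAB (42.11 rounded)
driving_card = 500  # rows surviving "bt.id between 1 and 500"
inner_access = 1    # one unique index probe into PROBE_TAB per driving row
nl_cost = driving_cost + driving_card * inner_access
```

The inner access cost multiplies by the driving cardinality, which is why NL plans live or die by how small the driving rowset is.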

============
Plan Table
============
------------------------------------------------+-----------------------------------+
| Id | Operation                     | Name      | Rows | Bytes | Cost | Time     |
------------------------------------------------+-----------------------------------+
| 0  | SELECT STATEMENT              |           |      |       |  543 |          |
| 1  |  NESTED LOOPS                 |           |  500 |  518K |  543 | 00:00:07 |
| 2  |   TABLE ACCESS BY INDEX ROWID | BUILD_TAB |  500 |  259K |   42 | 00:00:01 |
| 3  |    INDEX RANGE SCAN           | BT_PK     |  500 |       |    3 | 00:00:01 |
| 4  |   TABLE ACCESS BY INDEX ROWID | PROBE_TAB |    1 |   530 |    1 | 00:00:01 |
| 5  |    INDEX UNIQUE SCAN          | PT_PK     |    1 |       |    0 |          |
------------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
3 - access("BT"."ID">=1 AND "BT"."ID"<=500)

The execution order is 3->2->5->4->1->0; pruning occurs at the index access steps (3) and (5).

Current SQL statement for this session:
-- 367
select *
from build_tab bt, probe_tab pt
where bt.id between 1 and 500
and bt.id_probe = pt.id

Here is the second SQL. I did not use any hint, because the CBO selects hashing for this one. The same SINGLE TABLE ACCESS PATH and BASE STATISTICAL INFORMATION sections about the tables are printed.

HA Join
Outer table:
resc: 42.11 card 500.10 bytes: 530 deg: 1 resp: 42.11
Inner table: PROBE_TAB Alias: PT
resc: 324.52 card: 15000.00 bytes: 530 deg: 1 resp: 324.52
using dmeth: 2 #groups: 1
Cost per ptn: 0.84 #ptns: 1
hash_area: 0 (max=0) Hash join: Resc: 367.48 Resp: 367.48 [multiMatchCost=0.00]
HA cost: 367.48
resc: 367.48 resc_io: 364.00 resc_cpu: 15939091
resp: 367.48 resp_io: 364.00 resp_cpu: 15939091
Best:: JoinMethod: Hash
Cost: 367.48 Degree: 1 Resp: 367.48 Card: 500.10 Bytes: 1060
[..skipped..]
HA Join
Outer table:
resc: 324.52 card 15000.00 bytes: 530 deg: 1 resp: 324.52
Inner table: BUILD_TAB Alias: BT
resc: 42.11 card: 500.10 bytes: 530 deg: 1 resp: 42.11
using dmeth: 2 #groups: 1
Cost per ptn: 399.93 #ptns: 1
hash_area: 0 (max=0) Hash join: Resc: 766.57 Resp: 766.57 [multiMatchCost=0.00]
HA Join (swap)
Outer table:
resc: 42.11 card 500.10 bytes: 530 deg: 1 resp: 42.11
Inner table: PROBE_TAB Alias: PT
resc: 324.52 card: 15000.00 bytes: 530 deg: 1 resp: 324.52
using dmeth: 2 #groups: 1
Cost per ptn: 0.84 #ptns: 1
hash_area: 0 (max=0) Hash join: Resc: 367.48 Resp: 367.48 [multiMatchCost=0.00]
HA cost: 367.48
resc: 367.48 resc_io: 364.00 resc_cpu: 15939091
resp: 367.48 resp_io: 364.00 resp_cpu: 15939091

Again the same question: "left or right"? Here we use the SINGLE TABLE ACCESS costs for each table directly, because we do not need to sort or pick up rows individually. resc = cost of serial access; deg = degree of parallelism; resp = cost of parallel access (since deg = 1, resc = resp). The total best cost is 367.48. Subtract the SINGLE TABLE ACCESS costs: 367.48 - (42.11 + 324.52) = 0.85; building the hash table for this operation is nearly free, I think. And the answer to "left or right": in hashing, left means the builder of the hash table and right means the prober. The smaller dataset builds the hash table, here BUILD_TAB. I also skipped many lines; since I gave no hint, there were merge-cost calculations in those rows as well.
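The subtraction above, as a quick check:

```python
# Hash-join overhead = total HA cost minus the two single-table access costs
single_table = 42.11 + 324.52   # BUILD_TAB + PROBE_TAB access costs from the trace
hash_total   = 367.48           # "HA cost" reported in the trace
build_overhead = round(hash_total - single_table, 2)
```

Less than one cost unit for building and probing the hash table, versus roughly 500 extra units for the nested-loop probes in the previous plan.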

Plan Table
============
------------------------------------------------+-----------------------------------+
| Id | Operation                     | Name      | Rows | Bytes | Cost | Time     |
------------------------------------------------+-----------------------------------+
| 0  | SELECT STATEMENT              |           |      |       |  367 |          |
| 1  | HASH JOIN                     |           |  500 |  518K |  367 | 00:00:05 |
| 2  |  TABLE ACCESS BY INDEX ROWID  | BUILD_TAB |  500 |  259K |   42 | 00:00:01 |
| 3  |   INDEX RANGE SCAN            | BT_PK     |  500 |       |    3 | 00:00:01 |
| 4  |  TABLE ACCESS FULL            | PROBE_TAB |  15K | 7764K |  325 | 00:00:04 |
------------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
1 - access("BT"."ID_PROBE"="PT"."ID")
3 - access("BT"."ID">=1 AND "BT"."ID"<=500)

I did not dive any deeper. If you need more about NL/hash/merge, the turkcellstaj group is having a two-day seminar this weekend at Turkcell Maltepe. The seminar's topic will be the CBO, and our primary resource is Mr. Lewis's CBO book. I will be talking about this topic on Sunday afternoon. All seats are reserved 🙂.

Latch-Lock Part I

When concurrency is involved, access to resources must be controlled by certain constraints. Why must it be controlled? Let us explain: assume we have a symbolic microprocessor language. Its instructions:

YÜK : loads a value from a memory address into a microprocessor register.
YAZ : stores a value from a microprocessor register to a memory address.
ART : increments the register's value by 1.

Our program is as follows:

kod1:
YÜK $4000, A   ; load the value at $4000 into register A
ART A          ; increment A
YAZ A, $4000   ; store A back to $4000

As you can see, the value at memory location $4000 is first loaded into register A, then incremented by 1 and written back to the memory location. What we have to keep in mind is that modern operating systems run many processes concurrently. Considering that 4 CPUs is now a standard count in servers, if this program is executed by 2 different processes, it is all too easy for the increment at $4000 to end with a wrong result. Our scenario:

$4000 holds the value 20

time  process-1       process-2
T1    YÜK $4000, A    -               // 20 is in the processor
T2    -               YÜK $4000, A    // 20 is in the processor
T3    ART A           -               // proc1 computes 20+1
T4    -               ART A           // proc2 computes 20+1
T5    YAZ A, $4000    -               // proc1 writes 21
T6    -               YAZ A, $4000    // proc2 writes 21


Two different processes, unaware of each other, read the value at the same memory location, increment it, and write it back. As a result, the value we wanted to become 22 is computed as 21. Because access to the shared resource is not controlled, concurrency creates a problem here. Oracle's answer for shared resources is the LATCH. A latch is a structure that lets a process lock a data structure in memory against other processes while it is using it. All the lines of the kod1 example form a CRITICAL SECTION: the block at whose start the latch must be acquired and at whose end the latch must be released.
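The interleaving above can be replayed deterministically in Python; two simulated processes share one memory cell and lose an update:

```python
# Deterministic replay of the Z1..Z6 interleaving: two "processes" each do
# LOAD / INCREMENT / STORE on the same address without any latch.
memory = {0x4000: 20}

reg1 = memory[0x4000]   # T1: process 1 loads 20
reg2 = memory[0x4000]   # T2: process 2 loads 20
reg1 += 1               # T3: process 1 increments its private copy
reg2 += 1               # T4: process 2 increments its private copy
memory[0x4000] = reg1   # T5: process 1 stores 21
memory[0x4000] = reg2   # T6: process 2 stores 21, silently overwriting T5
```

The final value is 21, not the 22 both processes together intended: the classic lost update.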

So which method has Oracle used to implement the latch structure? Microprocessor instructions such as TAS (test-and-set). Within its own single cycle, the TAS instruction checks a flag variable in memory and can direct the flow of the code according to that variable's value. Because it is atomic, it has no problem with a flag variable shared with another process.

While one process is executing inside the critical section, the other processes keep looping on the latch (TAS). During this loop the process is considered to be in the SPIN state. Although in practice the process does no useful work, in theory it consumes CPU time and is seen as RUNNING by the operating system. Question: why is the process put into a loop on TAS? Wouldn't it simply be more sensible for it to yield its place to a process that has work to do? Answer: the TAS instruction is cheap, since it runs at the microprocessor level. A context switch is more expensive, because it happens at the kernel level and has extra work (and multiply this cost by 2, since a switch back is also required). Oracle caps the SPIN count at a certain limit. After TAS has been executed that many times, a context switch occurs and the CPU is handed over to another process that will do work. In the end, since the SPIN count chosen by Oracle costs less than a context switch, it is a sensible solution.

Optimistic or Pessimistic

Concurrency comes with the overhead of locking. Oracle uses pessimistic locking by default. SELECT * FOR UPDATE locks the related rows against possible writes, because SELECT * FOR UPDATE means "I will write to these rows, do not touch them". OK, here comes the question: why does a writer really block other writers? Because a writer alters the data for a purpose. If that purpose is broken by another write operation, data inconsistency will occur. This is not only an ACID matter; your business logic will be corrupted as well.
As we said, FOR UPDATE is pessimistic. Consider a web page selling used cars. Customers want a list of cars matching some search criteria. Is it good to show this list with FOR UPDATE? It is possible, but it is not a realistic solution; we may use stateful session beans or another middle-layer session solution. But FOR UPDATE will defeat concurrency (in this specific example; I do not say never use FOR UPDATE).
We must not use FOR UPDATE, yet we still must overcome the lost-update problem. We will use optimistic locking. We will add a version column to each car in our inventory; when a user comes with a request of 'buy this car, its ID is ...' we will check the version column. If the version is higher than the version we read, it means "this car was sold while you were looking at the list". A nice solution, widely used in many persistence technologies.
Oracle comes with another solution: ROWDEPENDENCIES. Each row will have an extra place to store its ROW_SCN (the SCN is a number, an internal clock; the SCN increases on each commit). In the absence of ROWDEPENDENCIES, Oracle uses a single place in the data block to store the block's SCN.

Here is a demo:

ert@ORCL> create table t_no_rowdep as select * from all_objects where rownum < 5;
ert@ORCL> create table t_row_dep rowdependencies as select * from t_no_rowdep;

We created two tables with the same data.

ert@ORCL> select ora_rowscn, dbms_rowid.rowid_block_number(rowid) bn, object_name from t_row_dep;

ORA_ROWSCN BN OBJECT_NAME
---------- ---------- ------------------------------
899479 18780 ICOL$
899479 18780 I_USER1
899479 18780 CON$
899479 18780 UNDO$

ert@ORCL> select ora_rowscn, dbms_rowid.rowid_block_number(rowid) bn, object_name from t_no_rowdep;

ORA_ROWSCN BN OBJECT_NAME
---------- ---------- ------------------------------
899426 18772 ICOL$
899426 18772 I_USER1
899426 18772 CON$
899426 18772 UNDO$

Each table is small enough to fit in a single data block. Now we update only one row.

ert@ORCL> update t_no_rowdep set object_name = reverse(object_name) where object_name = 'CON$';

1 row updated.

ert@ORCL> commit;

Commit complete.

ert@ORCL> select ora_rowscn, dbms_rowid.rowid_block_number(rowid) bn, object_name from t_no_rowdep;

ORA_ROWSCN BN OBJECT_NAME
---------- ---------- ------------------------------
899625 18772 ICOL$
899625 18772 I_USER1
899625 18772 $NOC
899625 18772 UNDO$

All rows seem to have an updated SCN. But no, not all of them were updated. We are not using the ROWDEPENDENCIES option, and each data block has only one physical place to store the block's last SCN.

ert@ORCL> update t_row_dep set object_name = reverse(object_name) where object_name = 'CON$';

1 row updated.

ert@ORCL> commit;

Commit complete.

ert@ORCL> select ora_rowscn, dbms_rowid.rowid_block_number(rowid) bn, object_name from t_row_dep;

ORA_ROWSCN BN OBJECT_NAME
---------- ---------- ------------------------------
899479 18780 ICOL$
899479 18780 I_USER1
899666 18780 $NOC
899479 18780 UNDO$

Only one row’s ORA_ROWSCN was updated, just as we need. It is your option to use ORA_ROWSCN with the ROWDEPENDENCIES option to implement optimistic locking. So “ORA_ROWSCN can be set to be unique for each row of a table by ROWDEPENDENCIES” is not a myth…

A formatter for WordPress ver1

It is very hard for me to publish code on my blog: lost indentation, lack of colors, etc.

http://rapidshare.com/files/12000281/codeFormatter.zip.html

This is a simple program that formats your files for publishing on WordPress.

usage is:

ert@SEMERKAND:~/jdevhome/mywork/codeFormatter/Client/classes/client$ java -classpath .. client.Main in.tex out.tex 4
java.lang.Exception: no valid #CODE# found
at client.BlockHandler.process(BlockHandler.java:79)
at client.Main.main(Main.java:57)

Yes, by default it throws an exception🙂. “in.tex” is the input file, “out.tex” is the output file, and “4” is how many spaces each tab will be expanded to. Use it, change it. All is ours.

Pragma pack over structures.

I have been working on my friend’s post.

here is my test case:
————————————————————————————————————————–

ert@SEMERKAND:~$ cat dnm.cpp

#include <iostream>

union manken_u {
        char    ch;
        int     integer;
        double  db;
};

#pragma pack(2)
struct manken_s_p {
        char    ch;
        int     integer;
        double  db;
};

#pragma pack()
struct manken_s {
        char    ch;
        int     integer;
        double  db;
};

#define sub(x,y) (int)((const char*)(x)-(const char*)(y))

int main()

{
        std::cout
                << "size of union: " << sizeof(union manken_u) << std::endl
                << "size of struct: " << sizeof(struct manken_s) << std::endl
                << "size of packed struct: " << sizeof(struct manken_s_p) << std::endl
                << "size of char+int+double: " << sizeof(char)+sizeof(int)+sizeof(double) << std::endl;

        manken_s my_struct;
        char* base = &my_struct.ch;
        std::cout
                << "offset of char " << sub(base,&my_struct.ch) << std::endl
                << "offset of int " << sub(base,&my_struct.integer) << std::endl
                << "offset of db " << sub(base,&my_struct.db) << std::endl;

        std::cout << " pack is 2 " << std::endl;

        manken_s_p my_struct_p;
        base = &my_struct_p.ch;
        std::cout
                << "offset of char " << sub(base,&my_struct_p.ch) << std::endl
                << "offset of int " << sub(base,&my_struct_p.integer) << std::endl
                << "offset of db " << sub(base,&my_struct_p.db) << std::endl;

        return 0;
}
ert@SEMERKAND:~$ g++ dnm.cpp
ert@SEMERKAND:~$ ./a.out
size of union: 8
size of struct: 16
size of packed struct: 14
size of char+int+double: 13
offset of char 0
offset of int -4
offset of db -8
 pack is 2
offset of char 0
offset of int -2
offset of db -6


————————————————————————————————————————–

So, what is the moral of the story:

#pragma pack is a compiler directive that sets the maximum alignment the compiler uses when laying out non-primitive data types (structs, for example) in memory.
If you set it to 2, a char member followed by a larger member will occupy 2 bytes instead of 1, because the compiler pads it to align the next member.
In my example the first set of offset values is for pack(4) (the default on my system), so the char gets 3 bytes of padding to expand it to 4 bytes.
The following offset values show the pack(2) results: the char occupies a total of 2 bytes, BUT IT DOES NOT USE 2 BYTES of data; 1 byte of it is padding.

And why does the compiler do this alignment at all? Because different architectures require (or strongly prefer) that each type sit at an address that is a multiple of its size; misaligned accesses are slower on some CPUs and illegal on others.

references:
http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.vacpp6m.doc/compiler/ref/rnpgpack.htm
http://docs.hp.com/en/8/pragmas.htm
and http://emreknlk.blogspot.com/