Wednesday, July 03, 2013

Oracle 12c - Session-specific statistics for global temporary tables

It’s been more than a week since Oracle Database 12c officially became available. I spent the very first night on reading the documentation, New Features chapter in particular.

New version, new features, improvements, enhancements, some more exciting than others. One however, intrigued me a bit .That is Session-specific statistics for global temporary tables. From time to time I am involved in optimizing queries that use global temporary tables so I am well aware of the challenges that come with them.

As the name suggest temporary tables contain temporary data. The data is visible only to the current session and persist either to the next commit or until the session is terminated. Either way, two sessions may populate the same temporary table with different data and concurrently run identical queries against it.

As we all know queries are executed using an execution plan generated by the optimizer. The efficiency of the execution plan depends upon the information the optimizer has about the data at the time the query is parsed. In my opinion, in order to get good execution plans(I am deliberately avoiding to say optimal plans) , accurate statistics are the most important piece of information you want to feed the optimizer with . 

However, when temporary tables are involved, it is often a challenge for the optimizer to generate efficient execution plan(s) that will be appropriate for all use cases.

In some cases when temporary tables are populated with data having similar characteristics, it is often a practice to have appropriate statistics collected and fixed (dynamic sampling may also fit in this case) so the optimizer produces efficient and stable execution plan(s). (one size fits all)

However, often temporary tables do not have statistics simply because it is difficult to find ones that will cover all use cases.

Consider the following example:

There is table MY_OBJECTS created using the following CTAS statement:

 CREATE TABLE my_objects AS  
 SELECT * FROM all_objects  
       , (SELECT * FROM dual CONNECT BY rownum<=10);  

A table MY_TYPES that contains all distinct data types derived from ALL_OBJECTS

 CREATE TABLE my_types AS  
 SELECT DISTINCT object_type FROM all_objects;  

A table T that has the same structure as MY_OBJECTS

 CREATE TABLE t AS   
 SELECT * FROM my_objects WHERE 1=2;  

And a global temporary table TEMP_OBJECTS that has only one column which will accept OBJECT_IDs

 CREATE GLOBAL TEMPORARY TABLE temp_objects   
 (object_id NUMBER NOT NULL)   
 ON COMMIT PRESERVE ROWS;  

Collect statistics for MY_OBJECTS and MY_TYPES tables

 EXEC dbms_stats.gather_table_stats(ownname=>'TEST',tabname=>'MY_OBJECTS',cascade=>true);  
 EXEC dbms_stats.gather_table_stats(ownname=>'TEST',tabname=>'MY_TYPES',cascade=>true);  

Let's suppose there are two sessions that load some data in TEMP_OBJECTS and then execute the following insert statement:

 INSERT INTO t  
 SELECT /*+ gather_plan_statistics */ o.*   
 FROM my_objects o  
   , my_types t  
   , temp_objects tt  
 WHERE o.object_type=t.object_type   
  AND o.object_id=tt.object_id;  

The first session (SID 22) inserts one record in TEMP_OBJECTS and then executes the insert statement from above by invoking insert_1.sql script.


 test[22,39]> INSERT INTO temp_objects (SELECT object_id FROM all_objects WHERE rownum<=1);  
   
 1 row created.  
   
 test[22,39]> commit;  
   
 Commit complete.  
   
 test[22,39]> @insert_1  
   
 10 rows created.  
   
   

Let's check the execution plan being used. It is based on Merge Cartesian Join, which is somewhat acceptable in this case.

 test[22,39]> @explain_last  
   
 PLAN_TABLE_OUTPUT  
 ----------------------------------------------------------------------
SQL_ID 8a1c7phuna9vn, child number 0  
 -------------------------------------  
 insert into t select /*+ gather_plan_statistics */ o.* from my_objects  
 o, my_types t, temp_objects tt where o.object_type=t.object_type and  
 o.object_id=tt.object_id  
   
 Plan hash value: 3579371359  
   
 ---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |00:00:02.56 |   14067 |  14028 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL |              |      1 |        |      0 |00:00:02.56 |   14067 |  14028 |       |       |          |
|*  2 |   HASH JOIN              |              |      1 |     10 |     10 |00:00:00.04 |   14036 |  14028 |  1519K|  1519K| 1521K (0)|
|   3 |    MERGE JOIN CARTESIAN  |              |      1 |     38 |     38 |00:00:00.02 |       6 |      2 |       |       |          |
|   4 |     TABLE ACCESS FULL    | TEMP_OBJECTS |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |       |       |          |
|   5 |     BUFFER SORT          |              |      1 |     38 |     38 |00:00:00.02 |       3 |      2 | 73728 | 73728 |          |
|   6 |      TABLE ACCESS FULL   | MY_TYPES     |      1 |     38 |     38 |00:00:00.02 |       3 |      2 |       |       |          |
|   7 |    TABLE ACCESS FULL     | MY_OBJECTS   |      1 |    882K|    882K|00:00:02.09 |   14030 |  14026 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------   
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   
   2 - access("O"."OBJECT_TYPE"="T"."OBJECT_TYPE" AND "O"."OBJECT_ID"="TT"."OBJECT_ID")  
   
 Note  
 -----  
   - dynamic statistics used: dynamic sampling (level=2)  
   
   
 30 rows selected.  
   


Session with SID (251) kicks in a moment later loading 1000 rows in TEMP_OBJECTS and runs the same insert statement. Since the same SQL statement has already been parsed , this session will reuse the existing cursor. However, in this case the plan being used is the one that should be avoided since cartesian join operation may severely impact performances:

 test[251,65]> INSERT INTO temp_objects (SELECT object_id FROM all_objects WHERE rownum<=1000);  
   
 1000 rows created.  
   
 test[251,65]> @insert_1  
   
 10000 rows created.  
   
 test[251,65]> @explain_last  
   
 PLAN_TABLE_OUTPUT  
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 SQL_ID 8a1c7phuna9vn, child number 0  
 -------------------------------------  
 insert into t select /*+ gather_plan_statistics */ o.* from my_objects  
 o, my_types t, temp_objects tt where o.object_type=t.object_type and  
 o.object_id=tt.object_id  
   
 Plan hash value: 3579371359  
   
 ---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |00:00:03.87 |   15495 |  14026 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL |              |      1 |        |      0 |00:00:03.87 |   15495 |  14026 |       |       |          |
|*  2 |   HASH JOIN              |              |      1 |     10 |  10000 |00:00:00.22 |   14037 |  14026 |  3162K|  2024K| 3101K (0)|
|   3 |    MERGE JOIN CARTESIAN  |              |      1 |     38 |  38000 |00:00:00.73 |       7 |      0 |       |       |          |
|   4 |     TABLE ACCESS FULL    | TEMP_OBJECTS |      1 |      1 |   1000 |00:00:00.02 |       4 |      0 |       |       |          |
|   5 |     BUFFER SORT          |              |   1000 |     38 |  38000 |00:00:00.21 |       3 |      0 | 73728 | 73728 |          |
|   6 |      TABLE ACCESS FULL   | MY_TYPES     |      1 |     38 |     38 |00:00:00.01 |       3 |      0 |       |       |          |
|   7 |    TABLE ACCESS FULL     | MY_OBJECTS   |      1 |    882K|    882K|00:00:01.97 |   14030 |  14026 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------
   
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   
   2 - access("O"."OBJECT_TYPE"="T"."OBJECT_TYPE" AND "O"."OBJECT_ID"="TT"."OBJECT_ID")  
   
 Note  
 -----  
   - dynamic statistics used: dynamic sampling (level=2)  
   
   
 30 rows selected.  
   

As expected, the same cursor was reused by both sessions.

 sys[263,185]> select plan_hash_value, child_number, CHILD_ADDRESS, executions, invalidations, is_shareable, is_obsolete from v$sql where sql_id='8a1c7phuna9vn';  
   
 PLAN_HASH_VALUE CHILD_NUMBER CHILD_ADDRESS  EXECUTIONS INVALIDATIONS I I  
 --------------- ------------ ---------------- ---------- ------------- - -  
    3579371359      0 00000000618B54B8     2       0 Y N  
   
 sys[263,185]>  


As you can sense, plan stability is one of the biggest challenges when dealing with queries that depend on temporary tables.

There are several strategies available, that I am aware of, on how to attack this problem. They include: use of stored outlines; fixed statistics; baselines; hints (cardinality hint for example); periodically invalidating plans; using separate workspaces (schemas) for different types of workloads; and couple of others more or less effective. Each of these comes with its own limitations.

With Oracle 12c users have an ability to gather session specific statistics for global temporary tables. This feature seems to nicely address the challenges from above.

When session specific statistics are used, each session collects “private” statistics and have CBO generate execution plan/cursor based on them. These cursors are not shared with other sessions. Also a cursor that’s been generated using session specific statistics is invalidated when statistics are re-collected within the same session.

Session specific statistics are enabled when GLOBAL_TEMP_TABLE_STATS preference is set to SESSION.

 exec dbms_stats.set_table_prefs(ownname=>'TEST', tabname=>'TEMP_OBJECTS',pname=>'GLOBAL_TEMP_TABLE_STATS',pvalue=>'SESSION');  

Once the preference is set all one needs to do is to gather stats for session based statistics to be collected.

And that is all, now let’s see which plans will be generated for each of the sessions if we repeat the example from above when session statistics are being used.

Session with SID 251 loads 1000 rows into TEMP_OBJECTS table, collects stats and executes the same insert statement.

 test[251,69]> INSERT INTO temp_objects (SELECT object_id FROM all_objects WHERE rownum<=1000);  
   
 1000 rows created.  
   
 test[251,69]> exec dbms_stats.gather_table_stats(ownname=>'TEST', tabname=>'TEMP_OBJECTS');  
   
 PL/SQL procedure successfully completed.  
   
 test[251,69]> @insert_1  
   
 10000 rows created.  

Now, the execution plan used is based on a HASH JOIN rather than MERGE CARTESIAN JOIN operation

 test[251,69]> @explain_last  
   
 PLAN_TABLE_OUTPUT  
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 SQL_ID 8a1c7phuna9vn, child number 1  
 -------------------------------------  
 insert into t select /*+ gather_plan_statistics */ o.* from my_objects  
 o, my_types t, temp_objects tt where o.object_type=t.object_type and  
 o.object_id=tt.object_id  
   
 Plan hash value: 4256520316  
   
 ---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |00:00:02.71 |   15514 |  14026 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL |              |      1 |        |      0 |00:00:02.71 |   15514 |  14026 |       |       |          |
|*  2 |   HASH JOIN              |              |      1 |   9858 |  10000 |00:00:00.07 |   14037 |  14026 |  1696K|  1696K| 1583K (0)|
|   3 |    TABLE ACCESS FULL     | MY_TYPES     |      1 |     38 |     38 |00:00:00.01 |       3 |      0 |       |       |          |
|*  4 |    HASH JOIN             |              |      1 |   9858 |  10000 |00:00:00.04 |   14034 |  14026 |  2293K|  2293K| 1607K (0)|
|   5 |     TABLE ACCESS FULL    | TEMP_OBJECTS |      1 |   1000 |   1000 |00:00:00.01 |       4 |      0 |       |       |          |
|   6 |     TABLE ACCESS FULL    | MY_OBJECTS   |      1 |    882K|    882K|00:00:01.64 |   14030 |  14026 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------- 
   
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   
   2 - access("O"."OBJECT_TYPE"="T"."OBJECT_TYPE")  
   4 - access("O"."OBJECT_ID"="TT"."OBJECT_ID")  
   
 Note  
 -----  
   - Global temporary table session private statistics used  
   
   
 30 rows selected.  

Note the “Note” - Global temporary table session private statistics used.

If you check v$sqlarea or v$sql you will see there are two child cursors for the same sql id.


 sys[263,185]> select plan_hash_value, child_number, CHILD_ADDRESS, executions, invalidations, is_shareable, is_obsolete from v$sql where sql_id='8a1c7phuna9vn';  
   
 PLAN_HASH_VALUE CHILD_NUMBER CHILD_ADDRESS  EXECUTIONS INVALIDATIONS I I  
 --------------- ------------ ---------------- ---------- ------------- - -  
    3579371359      0 00000000618B54B8     2       0 Y N  
    4256520316      1 00000000625A3928     1       0 Y N  
   
 sys[263,185]>   


It is worth to mention that sessions that use global statistics will keep using without interfering with the session specific stats and cursors.

For example, session with SID 26 loads 1000 rows, but doesn't collect stats on TEMP_OBJECTS table. The insert statement in this case will use the globally available cursors (child cursor 0)

   
 test[26,145]> INSERT INTO temp_objects (SELECT object_id FROM all_objects WHERE rownum<=1000);  
   
 1000 rows created.  
   
 test[26,145]> commit;  
   
 Commit complete.  
   
 test[26,145]> @insert_1  
   
 10000 rows created.  
   
 test[26,145]> @explain_last  
   
 PLAN_TABLE_OUTPUT  
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 SQL_ID 8a1c7phuna9vn, child number 0  
 -------------------------------------  
 insert into t select /*+ gather_plan_statistics */ o.* from my_objects  
 o, my_types t, temp_objects tt where o.object_type=t.object_type and  
 o.object_id=tt.object_id  
   
 Plan hash value: 3579371359  
   
 ---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |00:00:03.71 |   15513 |  14026 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL |              |      1 |        |      0 |00:00:03.71 |   15513 |  14026 |       |       |          |
|*  2 |   HASH JOIN              |              |      1 |     10 |  10000 |00:00:00.13 |   14039 |  14026 |  3162K|  2024K| 3114K (0)|
|   3 |    MERGE JOIN CARTESIAN  |              |      1 |     38 |  38000 |00:00:00.64 |       9 |      0 |       |       |          |
|   4 |     TABLE ACCESS FULL    | TEMP_OBJECTS |      1 |      1 |   1000 |00:00:00.01 |       6 |      0 |       |       |          |
|   5 |     BUFFER SORT          |              |   1000 |     38 |  38000 |00:00:00.19 |       3 |      0 | 73728 | 73728 |          |
|   6 |      TABLE ACCESS FULL   | MY_TYPES     |      1 |     38 |     38 |00:00:00.01 |       3 |      0 |       |       |          |
|   7 |    TABLE ACCESS FULL     | MY_OBJECTS   |      1 |    882K|    882K|00:00:01.88 |   14030 |  14026 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------  
   
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   
   2 - access("O"."OBJECT_TYPE"="T"."OBJECT_TYPE" AND "O"."OBJECT_ID"="TT"."OBJECT_ID")  
   
 Note  
 -----  
   - dynamic statistics used: dynamic sampling (level=2)  
   
   
 30 rows selected.  
   
 sys[263,185]> select plan_hash_value, child_number, CHILD_ADDRESS, executions, invalidations, is_shareable, is_obsolete from v$sql where sql_id='8a1c7phuna9vn';  
   
 PLAN_HASH_VALUE CHILD_NUMBER CHILD_ADDRESS  EXECUTIONS INVALIDATIONS I I  
 --------------- ------------ ---------------- ---------- ------------- - -  
    3579371359      0 00000000618B54B8     3       0 Y N  
    4256520316      1 00000000625A3928     1       0 Y N  
   

Let's suppose the session with sid 251 modifies the TEMP_OBJECTS table, deletes its content and loads only 1 row followed by refreshing the session specific statistics. In this case it is expected to see the plan with hash value 3579371359 to be used.

 test[251,69]> DELETE FROM temp_objects;  
   
 1000 rows deleted.  
   
 test[251,69]> INSERT INTO temp_objects (SELECT object_id FROM all_objects WHERE rownum<=1);  
   
 1 row created.  
   
 test[251,69]> commit;  
   
 Commit complete.  
   
 test[251,69]> exec dbms_stats.gather_table_stats(ownname=>'TEST', tabname=>'TEMP_OBJECTS');  
   
 PL/SQL procedure successfully completed.  
   
 test[251,69]>  
   


If you check the cursor state (valid/invalid) you can see the session private cursor (child cursor 1) was just invalidated.

 PLAN_HASH_VALUE CHILD_NUMBER CHILD_ADDRESS  EXECUTIONS INVALIDATIONS I I  
 --------------- ------------ ---------------- ---------- ------------- - -  
    3579371359      0 00000000618B54B8     3       0 Y N  
    4256520316      1 00000000625A3928     1       1 Y N  
   
 test[251,69]> @insert_1  
   
 10 rows created.  
   
 test[251,69]> @explain_last  
   
 PLAN_TABLE_OUTPUT  
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 SQL_ID 8a1c7phuna9vn, child number 1  
 -------------------------------------  
 insert into t select /*+ gather_plan_statistics */ o.* from my_objects  
 o, my_types t, temp_objects tt where o.object_type=t.object_type and  
 o.object_id=tt.object_id  
   
 Plan hash value: 3579371359  
   
 ---------------------------------------------------------------------------------------------------------------------------------------  
 | Id | Operation        | Name     | Starts | E-Rows | A-Rows |  A-Time  | Buffers | Reads | OMem | 1Mem | Used-Mem |  
 ---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |              |      1 |        |      0 |00:00:02.43 |   14051 |  14026 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL |              |      1 |        |      0 |00:00:02.43 |   14051 |  14026 |       |       |          |
|*  2 |   HASH JOIN              |              |      1 |     10 |     10 |00:00:00.01 |   14037 |  14026 |  1519K|  1519K| 1426K (0)|
|   3 |    MERGE JOIN CARTESIAN  |              |      1 |     38 |     38 |00:00:00.01 |       7 |      0 |       |       |          |
|   4 |     TABLE ACCESS FULL    | TEMP_OBJECTS |      1 |      1 |      1 |00:00:00.01 |       4 |      0 |       |       |          |
|   5 |     BUFFER SORT          |              |      1 |     38 |     38 |00:00:00.01 |       3 |      0 | 73728 | 73728 |          |
|   6 |      TABLE ACCESS FULL   | MY_TYPES     |      1 |     38 |     38 |00:00:00.01 |       3 |      0 |       |       |          |
|   7 |    TABLE ACCESS FULL     | MY_OBJECTS   |      1 |    882K|    882K|00:00:02.05 |   14030 |  14026 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------
   
 Predicate Information (identified by operation id):  
 ---------------------------------------------------  
   
   2 - access("O"."OBJECT_TYPE"="T"."OBJECT_TYPE" AND "O"."OBJECT_ID"="TT"."OBJECT_ID")  
   
 Note  
 -----  
   - Global temporary table session private statistics used  
   
   
 30 rows selected.  
   

As expected the plan with hash value 3579371359 was used.

 PLAN_HASH_VALUE CHILD_NUMBER CHILD_ADDRESS  EXECUTIONS INVALIDATIONS I I  
 --------------- ------------ ---------------- ---------- ------------- - -  
    3579371359      0 00000000618B54B8     3       0 Y N  
    3579371359      1 00000000625A3928     1       1 Y N   
   

You could also use v$sql_shared_cursor view to observe how the cursors are created and shared.
In order to track statistics one could use [DBA|USER|ALL]_TAB_STATISTICS views. These views have SCOPE column that indicate whether the statistics are shared or session specific:

 test[251,69]> SELECT owner, table_name, num_rows, last_analyzed, scope FROM dba_tab_statistics WHERE table_name='TEMP_OBJECTS';  
   
OWNER           TABLE_NAME        NUM_ROWS LAST_ANAL SCOPE
--------------- --------------- ---------- --------- -------
TEST            TEMP_OBJECTS                         SHARED
TEST            TEMP_OBJECTS             1 30-JUN-13 SESSION
   

Finally, something to think about is the possibility to increase hard parsing. As you may have observed from above, each parse made after the stats are gathered is a hard parse. Therefore, depending upon your workload you may expect to see more hard parsing.

Resources

2 comments:

Nakinov said...

Awesome post Mihajlo! Gathering session level statistics on GTTs is something very refreshing for me.

Thanks for sharing!

Unknown said...

Well jobs keep it up see oracle community for oracle consulting configuration.