Thursday, August 23, 2007

advanced SQL tuning

Subject: TROUBLESHOOTING: Advanced Query Tuning
Doc ID: Note:163563.1 Type: TROUBLESHOOTING
Last Revision Date: 18-JUL-2006 Status: PUBLISHED

Purpose

The purpose of this article is to assist with the handling of query tuning issues. This article provides guidelines for dealing with tuning problems. It assumes that there really is a query tuning problem as opposed to some other issue that makes queries perform in a sub-optimal way. There are already plenty of articles on this topic. These will be referenced from this article where appropriate rather than trying to duplicate existing material. Obviously an article such as this cannot cover all possibilities but rather attempts to indicate what to look for.

Scope and Application

All

Resolving Query Tuning Issues

Queries can run slowly for any number of reasons. In order to determine where the problem lies, and thus how to deal with it, this article has been divided into a number of sections containing pertinent questions to help the reader. Remember that initially you should be thinking towards providing at least a satisfactory workaround to the issue if the absolute root cause cannot be determined. Once a workaround has been put in place and the pressure is off (to some degree at least), then the root cause can be determined (if necessary).

The points in this article are applicable to both the Rule Based (RBO) and Cost Based (CBO) optimizers.

Investigation

The first step must be to collect information about the problem itself:

Do you know which query is causing the problems?
If not, then Note 33089.1 contains information that may help identify problem queries.
The Tuning Pack in Oracle Enterprise Manager is also very useful for identifying problematic statements. There is also useful information in Chapter 6 of the Oracle9i Database Performance Guide and Reference.

Tools such as STATSPACK can also assist in determining which query is causing problems. See Note 94224.1
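For example, where no monitoring tool is available, a quick look at the shared pool can highlight candidate statements by resource usage. The following is only a sketch; the columns and ordering should be adjusted to suit the system being examined:

  -- List the most expensive statements currently in the shared pool
  SELECT sql_text,
         executions,
         buffer_gets,
         disk_reads,
         ROUND(buffer_gets / executions) AS gets_per_exec
    FROM v$sql
   WHERE executions > 0
   ORDER BY buffer_gets DESC;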

Information on this particular query
At this stage it is not recommended to gather exhaustive information regarding the query. With experience, it is often possible to resolve issues with only small amounts of data, such as the query text itself and the explain plan. If more is needed, Note 68735.1 (Information Required to Diagnose a Query Performance Problem) provides an exhaustive list of the type of information that may be required.



Diagnostics

Once the statement (or statements) has been determined, the next stage is to find the reason for the query running slowly.

Has this query always run slowly? See Note 179668.1
Investigate the following areas:
Explain plan for the query
In a perfect world, you would wish to gather the explain plan for both the 'slow' and 'not so slow' queries. Obviously this may not be possible but if this can be achieved it makes generation of a workaround much quicker because there is immediately a goal to aim at that has known and acceptable performance.

The best (standard) way of getting an explain plan is to use autotrace.
See Note 68735.1 regarding how to do this.
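As a sketch, gathering the plan from SQL*Plus could look like the following (the EMP table and the literal used are purely illustrative):

  SET AUTOTRACE TRACEONLY EXPLAIN
  SELECT * FROM emp WHERE empno = 7839;
  SET AUTOTRACE OFF

  -- Alternatively, on versions where the DBMS_XPLAN package is available:
  EXPLAIN PLAN FOR
    SELECT * FROM emp WHERE empno = 7839;
  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);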

Once you have the explain plan, review it alongside the query text looking for anything unusual. The goal here is to identify any problem areas or anything that could be modified to run more efficiently.

Identify the optimizer in use (See Note 66484.1)
By far the most common issue encountered will be that the 'wrong' optimizer has been chosen and the system has not been set up to use it. This results in choices being made based on incorrect information.

Look for cost figures in the explain output to indicate that the CBO has been used
Missing cost figures do not necessarily indicate that the RBO has been used.
Look for CBO specific features (Hash Join, Index Fast Full Scan etc) as these confirm CBO usage
Once the optimizer has been determined, establish that the system is set up to work effectively with this optimizer. For example, if the CBO is in use then ensure that statistics are up to date and reflect the dataset. See the Statistics and analysis strategy section below.
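A quick way of checking which optimizer a session is likely to use is to look at the optimizer_mode parameter. This is only a sketch; the values available differ between versions:

  SHOW PARAMETER optimizer_mode

  -- or, from any session with access to V$PARAMETER:
  SELECT name, value FROM v$parameter WHERE name = 'optimizer_mode';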

Identify operations that may cause problems on your system:
Remember that these can only be general suggestions, and many systems that do not match this structure will work fine.

Online Transaction Processing (OLTP) System

Full Table Scan
Hash or Sort merge joins
Index fast full scans or (large) index range scans
Parallel query
Nested loop joins with high cardinality outer table
Batch/Warehouse system

Nested loop joins
Index lookups
See the Query text section below for more general suggestions of things to look for.

Identify expensive operations:
There are 2 main categories of potentially expensive operations: High cardinality (large number of rows) and High cost. If the explain plan has steps that show large cardinality or cost figures then these may be areas where the biggest benefits of tuning can be reaped.
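Where EXPLAIN PLAN has been used to populate PLAN_TABLE, the cost and cardinality figures can be listed directly. A minimal sketch using the standard PLAN_TABLE columns:

  -- cardinality and cost are only populated for CBO plans
  SELECT id, parent_id, operation, options, object_name, cardinality, cost
    FROM plan_table
   ORDER BY id;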

Statistics and analysis strategy
The CBO requires accurate statistics to enable it to work efficiently. By default no statistics are gathered on any objects. If the CBO is to be used effectively, statistics must be gathered. See Note 44961.1 for Analysis recommendations. A quick way to see if statistics are present is to select the NUM_ROWS column for your base table from dba_tables. If the column value is non-NULL then statistics have been gathered at some point. If it is suspected that the statistics may be old, then the LAST_ANALYZED column in DBA_TABLES can be used to determine when the table was last analyzed.
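For example (the SCOTT.EMP table is used purely for illustration):

  -- Check whether, and when, statistics were last gathered
  SELECT table_name, num_rows, last_analyzed
    FROM dba_tables
   WHERE owner = 'SCOTT' AND table_name = 'EMP';

  -- Gather statistics if they are missing or stale
  EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP', cascade => TRUE);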

Query text
Review the query looking for any constructs that may cause problems.
These come from experience, but you are looking for:

Large IN lists / OR statements - see Note 62153.1
Outer joins - There are a number of restrictions on the use of outer joins - see Oracle9i Database Performance Guide and Reference
Set operators (UNION etc) - Incorrect usage of set operations can be very inefficient
Partitioned tables (or partition views in Oracle 7.3) - ensure that partition elimination is taking place as expected
No where clause or missing join predicates (potential for Cartesian Products)
Aggregate functions (CPU intensive functions applied to large row sources can cause problems in some cases)
Sorting issues (These will typically be identified in combination with Explain plan) Note 67409.1 and Note 10577.1
Implicit type conversion - ensure that the datatypes of compared objects are the same, or conversion overheads can occur (see the sketch after this list)
Views, inline views or subqueries - is view merging or subquery unnesting taking place or not? Are predicates being pushed into the view as expected?
Any other 'strange' or unexpected constructs
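As an illustration of the implicit type conversion point above, assume a hypothetical EMP table where EMPNO has been declared as VARCHAR2:

  -- The literal 7839 is numeric, so Oracle applies TO_NUMBER() to the EMPNO
  -- column, which prevents a normal index on EMPNO from being used:
  SELECT * FROM emp WHERE empno = 7839;

  -- Comparing like with like allows the index to be considered:
  SELECT * FROM emp WHERE empno = '7839';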
Finding an acceptable plan

Examination of the explain plan for the query can only give information on the query as it is running currently. If the query is running poorly then this may not help determine how the query could run more efficiently. Also, this may be a new query for which no acceptable plan has yet been determined, so there is nothing to aim for. The following sections give suggestions for how a better plan may be found.

Trace statistics
I/O and timing statistics (elapsed and CPU) can be very helpful in determining which part of a query is the root cause of the problem. Especially useful are actual row counts, as these can be compared against the predicted row counts: expected row counts (cardinalities, e.g. Card=12345) can be found in explain plans for CBO queries, while actual row counts can be found in TKPROF output or in the STAT lines of raw SQL trace (event 10046) output. Any anomalies can then be investigated. For example, if the actual number of rows returned by a particular explain plan step differs significantly from the CBO cardinality estimates, then it is possible that this is a source of a bad plan choice. The cause of the incorrect statistics can be investigated and corrected. A fully detailed analysis is beyond the scope of this document.
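A common way of collecting these figures is SQL trace plus TKPROF, along the following lines (the trace file name shown is a placeholder):

  ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
  -- run the problem query here
  ALTER SESSION SET EVENTS '10046 trace name context off';

  -- then, at the operating system prompt:
  -- tkprof <tracefile>.trc tuning.prf sys=no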

Where excessive I/O has been identified, some potential causes are:

Full Table Scans and Index Fast Full Scans - these would be characterised by multi block I/O and likely waits for 'db file scattered read'. Note that a full table scan on the right hand side (inner table) of a nested loops join is unlikely to perform well, since the table will be scanned once for every row on the left hand side (outer table).
Index range or full scan - if a large proportion (or indeed all) of an index is scanned then this can lead to excessive I/O (and CPU). This would be characterised by high single block I/O activity and likely waits for 'db file sequential read'. Join order can play a very significant part in this.
Join order - It is advisable to choose a join order that eliminates as many rows as possible early in the query execution plan. If this does not occur then large volumes of data may be scanned that is later eliminated by the application of join predicates. If the join predicates can be applied earlier, then this volume of data may not need to be scanned and may reduce I/O (and CPU) requirements.
Excessive sort/hash areas - if sort/hash areas are excessive, the optimizer may start to choose plans which take advantage of these, which may result in large amounts of I/O. For example, hash or sort merge joins may be chosen in preference to other methods. If parallel execution is also being used then this is even more likely.
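The wait events mentioned above can be confirmed for a particular session, for example (the SID value is a placeholder to be supplied at run time):

  SELECT event, total_waits, time_waited
    FROM v$session_event
   WHERE sid = &sid
     AND event LIKE 'db file%'
   ORDER BY time_waited DESC;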

Where excessive CPU usage has been identified, some potential causes are:

Index range or full scan - if a large proportion (or indeed all) of an index is scanned then this can lead to excessive CPU, since much of the index may be cached. Join order can play a very significant part in this.
Join order - As with I/O, it is advisable to choose a join order that eliminates as many rows as possible early in the query execution plan. If this does not occur then large volumes of data may be scanned that is later eliminated by the application of join predicates. If the join predicates can be applied earlier, then this volume of data may not need to be scanned and may reduce CPU (and I/O) requirements.
Excessive sort/hash areas - Sorting is very CPU intensive.
Nested loops joins with a high number of rows in the left hand side (outer table) will result in a large number of lookups on the right hand side (inner table). The repeated lookup process is CPU intensive.

Break Query down into component parts
Most complex queries are a combination of many simpler queries. If a query is broken into its constituent parts and these are then optimized individually, a good indication of the overall (desired) plan can be built up by combining each tuned section, as in the sketch below. Remember that in some cases significant benefits can be gained from the combination of steps, so even better performance may be obtained from a combined query than from a number of standalone queries. Generally it is much easier to tune a simple query than a complex one.
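For example, a three table join might be examined one piece at a time. The tables, columns and predicates below are purely illustrative:

  -- Full query
  SELECT COUNT(*)
    FROM orders o, customers c, products p
   WHERE o.cust_id = c.cust_id
     AND o.prod_id = p.prod_id
     AND c.region  = 'NORTH';

  -- Start with the most selective piece...
  SELECT COUNT(*) FROM customers c WHERE c.region = 'NORTH';

  -- ...then add one join at a time, checking the plan and row counts at each step
  SELECT COUNT(*)
    FROM orders o, customers c
   WHERE o.cust_id = c.cust_id
     AND c.region  = 'NORTH';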

Application knowledge
It may be that the application designer has information about how a particular query should run based on the application design. If that is the case then this can be used to compare the plan that is generated with the expected (designed) access path. For example, if it is felt that a query should read table A with a full table scan and then use the information to drive an index lookup on table B with a nested loop join, then check the actual plan against this design. If it does not match up then it may perform more acceptably if it is forced to use the designed access path, perhaps by using hints.
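Using the table A / table B example above, the designed access path could be forced with hints along the following lines (the table, column and index names are illustrative):

  SELECT /*+ ORDERED FULL(a) USE_NL(b) INDEX(b b_id_idx) */
         a.col1, b.col2
    FROM table_a a, table_b b
   WHERE a.id = b.id
     AND a.status = 'OPEN';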

If specific application knowledge is not available then similar results may be attainable by considering the table sizes and organizing them so that the most rows are eliminated as early as possible in the query. If rows are eliminated early, then they do not have to be considered later and therefore may improve the overall query performance.

Trial and Error
Often simply trying a different approach can give a pointer to a potentially better plan. For example, if a query performs badly under the RBO, analyze the tables and try the CBO. It may be that the plan produced is a good one and can then be used in your environment. Even if the method used to create the plan is not feasible for use in the real environment, the fact that it actually produces the required output in a timely manner may be useful in that it proves that the operations are possible.
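For example, when testing CBO behaviour against the RBO in a throwaway session (the EMP table is illustrative; the older ANALYZE syntax is shown for brevity, although DBMS_STATS is normally preferred):

  ANALYZE TABLE emp COMPUTE STATISTICS;
  ALTER SESSION SET OPTIMIZER_MODE = CHOOSE;
  -- re-run the query and compare the new plan and timings with the original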

Summary: Compare real figures with optimizer estimates, break the query down into its constituent pieces and tune these individually, utilise application knowledge to suggest improvements in join orders and utilise trial and error to test potential plan choices.

Solutions

By looking at the diagnostics above, hopefully the root cause of the problem can be determined. The articles below cover most common solution areas.

Query does not use an index See Note 67522.1 Why is my index not used?
My hints are being ignored See Note 69992.1 Why is my hint ignored?
Changed oracle version and now query(ies) are slow See Note 160089.1 Why are my queries slow after upgrading my database?
Query produces wrong results See Note 150895.1 Handling Wrong Results Issues
Workarounds

Often the provision of a workaround is enough to defuse a serious situation. It is worth determining early on whether a workaround is applicable so that you can formulate your strategy around that. Even the knowledge that a workaround is not applicable can be useful, as it can concentrate the mind on other solutions to the problem. Also, creating a workaround can actually confirm that a solution is viable.

Is a Workaround going to be a valid solution to the problem?
Usually the most effective workaround is to use a hint. Can the query be hinted to force the desired plan?
See Note 29236.1 Hints Quick Reference or
Note 69992.1 Why hints are ignored.

What to do if the query cannot be modified?
For more suggestions, and for what to do if the query itself cannot be modified (because it is generated or fixed code, or for any other reason), see Note 122812.1



RELATED DOCUMENTS

Note 233112.1 START HERE> Diagnosing Query Tuning Problems Using a Decision Tree

Note 372431.1 TROUBLESHOOTING: Tuning a New Query
Note 179668.1 TROUBLESHOOTING: Tuning Slow Running Queries
Note 122812.1 Tuning Suggestions When Query Cannot be Modified
Note 67522.1 Diagnosing Why a Query is Not Using an Index

Note 214106.1 Using TKProf to compare actual and predicted row counts