|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Brain Cramps: How simple mistakes can cripple your database
Donald K. Burleson, Burleson Consulting
It’s amazing how simple unobtrusive mistakes can cripple your database. Oracle 10g is the world most robust and flexible database and it’s not uncommon to find inappropriate configurations that perform miserably. This presentation will draw from real-world consulting experiences and show actual examples of Oracle Issues and how to detect and the most common Oracle performance issues.
This presentation explores some of the root causes of Oracle performance bottlenecks and offers specific advice for detecting and correcting sub-optimal configurations. Drawn from real-word experiences with actual examples, this presentation shows how simple mistakes in global Oracle settings can have unforeseen side effects. Examples will include sub-optimal initialization parameters, inappropriate parameters for tables & indexes, SQL optimizer traps and external issues such as disk and network configurations. This is a do-not-miss session for anyone who wants to understand how small changes can have a dramatic impact on Oracle performance.
Key points include:
- Understand common disk problems - Watch out for those optimizer settings - See legacy issues with 10g upgrades - See “Silly” Oracle configurations - See what happens when you forget to take your DBA training
What is a Brain Cramp?
By brain cramp, I refer to the phenomenon of being overwhelmed by the sheer number of choices offered by Oracle. As the worlds most robust database, Oracle has an almost infinite number of configurations, over 250 initialization parameters, and an often bewildering array of choices. Get one wrong, and you could have a seriously sub-optimal performance situation.
Fixing Issues
Another distinguishing characteristic of a brain cramp fix is a dramatic improvement in database performance. In by book “Oracle Silver Bullets”, I note that changing a single global setting can have a profound positive impact on overall database performance. Of course, a single change to a global parameter can also cause poor Oracle performance.
I have grouped these brain cramp issues into several categories:
External Issues:
SQL Tuning Issues:
Disk Issues:
Network Issues:
OS Issues:
Table & Index Issues
Init.ora Issues
Management Issues:
The problem of super-large disks See: http://www.dba-oracle.com/t_plague_large_oracle_disks.htm This issue of single channel access also imposes a bottleneck on Oracle disk devices, and the large disks (over 144 gigabytes) often perform more slowly for high concurrent access than their smaller predecessors.
Solutions to the large disk plague?Obviously, the Oracle professional must take action to relieve disk I/O bottlenecks. There are several solutions to this issue:
Initialization Parameters Starting with Oracle9i you have the ability to view the estimated CPU, TEMP and I/O costs for every SQL execution plan step. Oracle Corporation has noted that typical OLTP databases are becomingly increasingly CPU-bound and has provided the ability for the DBA to make the optimizer consider the CPU costs associated with each SQL execution step. Oracle 10g has recognized this trend toward CPU-based optimization by providing you with the ability to choose CPU-based or I/O-based costing during SQL optimization (the 10g default is CPU-costing). In Oracle10g, system stats are gathered by default, and in Oracle9i the DBA must manually execute the dbms_stat.gather_system_stats package to get CBO statistics.
alter session set "_optimizer_cost_model"=choose; -- default value alter session set "_optimizer_cost_model"=io; alter session set "_optimizer_cost_model"=cpu; You can use this parameter to choose the best optimizer costing model for your particular database, based on your own I/O and CPU load:
Your choice of relative weighting for these factors depends upon the existing state of your database. Databases using 32-bit technology (and the corresponding 1.7 gig limit on SGA RAM size) tend to have databases that are I/O-bound with the top timed events being spent performing disk reads:
Top 5 Timed Events ~~~~~~~~~~~~~~~~~ % Total Event Waits Time (s) Ela Time ---------------------------------- ------------ ----------- -------- db file sequential read xxxx xxxx 30 db file scattered read xxxx xxxx 40
Once 64-bit became popular, Oracle SGA sizes increased, more frequently-referenced data was cached, and databases became increasingly CPU-bound. Also, solid-state disk (RAM SAN) has removed disk I/O as a source of waits:
Top 5 Timed Events ~~~~~~~~~~~~~~~~~ % Total Event Waits Time (s) Ela Time ---------------------------------- ------------ ----------- -------- CPU time xxxx xxxx 55.76 db file sequential read xxxx xxxx 27.55
The gathered statistics are captured via the dbms_stats package (in 9.2 and above) and CPU statistics are captured automatically in 10g and stored in the sys.aux_stat$ view.
As we have noted, in database where CPU is the top timed event may benefit from changing their SQL optimizer to consider the CPU costs associated with each execution plan. Your cpu_count parameter is also important, and I have details here: See: http://www.dba-oracle.com/oracle_tips_cpu_count_wrong.htm
Using CPU costing may not be good for databases that are I/O-bound. Also,
note that changing to CPU-based optimizer costing will change the predicate
evaluation order of your query (See MetaLink bulletin 276877.1). Turning on CPU costing The default for the optimizer cost model is “choose”, meaning that the presence of CBO statistics will influence whether or not CPU costs are considered. According to the documentation, CPU costs are considered when you collect SQL optimizer schema statistics with the dbms_stat.gather_system_stats package (the default behavior in Oracle10g), and CPU costs will be considered in all of your SQL optimization. But it gets tricky because of Bug 2820066 where CPU cost is computed whenever optimizer_index_cost_adj is set to a non-default value. Unless you have applied the 9.2.0.6 server patch set, your Oracle9i database may be generating CPU statistics, regardless of your CBO stats collection method. To ensure that you are using CPU costing:
Turning off CPU costing As we noted, I/O-bound databases (especially 32-bit databases) may want to use I/O-based SQL costing. The default optimizer costing in Oracle 10g is “cpu”, and you can change to “io” costing by using these techniques:
Notes on Bug 2820066: CPU cost is computed when optimizer_index_cost_adj is set to a non-default value.
Bug description: If optimizer_index_cost_adj is set to a non-default value CPU costs are calculated regardless of the optimizer cost model used. If you have optimizer_index_cost_adj set and you are not using the optimizer CPU cost model, but explain plan shows that for queries not using domain indexes CPU costs are being calculated, you are probably hitting this bug. In sum, CPU cost is always computed regardless of optimizer mode when optimizer_index_cost_adj is set in un-patched Oracle versions less than 10.1.0.2.
Tuning with Text Indexes One serious SQL performance problem occurs when you use the SQL “LIKE clause” operator to find a string within a large Oracle table column (e.g. VARCHAR(2000), CLOB, BLOB): Select stuff from bigtab where text_column like ‘%ipod%’; Select stuff from bigtab where full_name like ‘%JONES’; Because standard Oracle cannot index into a large column, there “like” queries cause full-table scans, and Oracle must examine every row in the table, even when the result set is very small. These unnecessary full-table scans are a problem:
Oracle Regular Expression syntax has profound implications for Oracle tuning, especially in the area of indexing where indexes can be created on regular expressions, eliminating expensive full-table scans in-favor of fast index access. Regular expressions are extremely powerful for extracting facts from large text columns, especially the regexp_like syntax. Whenever an ad-hoc query seeks information inside a text column, Oracle is often forced to do full-table scans, and this extra work can cripple a database with excessive I/O. For example, an insurance company may request all claim descriptions that contain the words (“fraud”, or “fraudulent”) and the trick is to service these queries with an index. There are two approaches to dealing with high-volume queries against text columns:
Let’s look at each approach. Tuning with Oracle*Text Indexes
One obscure trick for indexing queries with a leading wildcard character (like '%SON') is to create a REVERSE index and them programmatically reverse the SQL like clause to read "like 'NOS%'", effectively indexing on the other side of the text, clumsy, yet effective. For details, see Oracle SQL "like clause" and index access.
Oracle*Text Indexes
The Oracle*Text utility (formally called Oracle ConText and Oracle Intermedia) allows us to parse through a large text column and index on the words within the column.
Unlike ordinary b-tree or bitmap indexes, Oracle context, ctxcat and ctxrule indexes are NOT updated as content is changed. Since most standard Oracle databases will use the ctxcat index with standard relational tables, you must decide on a refresh interval.
Hence, Oracle Text indexes are only useful for removing full-table scans when the tables are largely read-only and/or the end-users don’t mind not having 100% search recall:
Oracle Text works with traditional data columns and also with MS-Word docs and Adobe PDF files that are stored within Oracle. Oracle Text has several index types:
CTXCAT Indexes - A CTXCAT index is best for smaller text fragments that must be indexed along with other standard relational data (VARCHAR2). WHERE CATSEARCH(text_column, 'ipod')> 0; CONTEXT Indexes - The CONTEXT index type is used to index large amounts of text such as Word, PDF, XML, HTML or plain text documents. WHERE CONTAINS(test_column, 'ipod', 1) > 0 CTXRULE Indexes - A CTXRULE index can be used to build document classification applications.
These types of indexes allow you to replace the old-fashioned SQL “LIKE” syntax with “CONTAINS” or “CATSEARCH” SQL syntax:
When we execute the query with the new index we see that the full-table scan is replaced with a index scan, greatly reducing execution speed and improving hardware stress:
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=FIRST_ROWS 1 0 SORT (ORDER BY) 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BIGTAB' 3 2 DOMAIN INDEX OF 'TEXT-COLUMN_IDX' Index re-synchronization Because rebuilding an Oracle Text index (context, ctxcat, ctxrule) requires a full-table scan and lots of internal parsing, it is not practical to use triggers for instantaneous index updates. Updating Oracle Text indexes is easy and they can be schedules using dbms_job or the Oracle 10g dbms_scheduler utility package: Oracle text provides a CTX_DDL package with the sync_index and optimize_index procedures: SQL> EXEC CTX_DDL.SYNC_INDEX('text_column_idx');
SQL> EXEC CTX_DDL.OPTIMIZE_INDEX('text_column_idx','FULL');
For example, if you create a nightly dbms_scheduler job to call sync_index, your index will be refreshed, but the structure will become sub-optimal over time. Oracle recommends that you periodically use the optimize_index package to periodically re-build the whole index from scratch. Index optimization can be performed in three modes (FAST, FULL or TOKEN).
In sum, the Oracle Text indexes are great for removing unnecessary
full-table scans from static Oracle tables and they can reduce I/O by
several orders of magnitude, greatly improving overall SQL performance. Oracle Regular Expression Indexes
Oracle expert Jonathan Gennick notes that regular expression can be used in an Oracle index, a powerful tool for improving the speed of complex SQL queries, and notes that regular expression indexes (a type of function-based index) can dramatically reduce database overhead for pattern-matching queries. I've noted in my book "Oracle Tuning: The Definitive Reference", that function-based indexes are one of the most powerful and underutilized tools for Oracle professionals.
Let’s take a closer look and see how a regular expression index can reduce
stress on Oracle. Indexing on regular expressions In Parsing with regular expressions regexp_like Jonathan Gennick shows a great example where we use Oracle regular expressions to extract “acreage” references from inside a test string, ignoring important factors such as case sensitivity and words stems (acre, acres, acreage): COLUMN park_name format a30 COLUMN acres format a13 SELECT park_name, REGEXP_SUBSTR(description,'[^ ]+[- ]acres?',1,1,'i') acres FROM michigan_park WHERE REGEXP_LIKE(description, '[^ ]+[- ]acres?','i');
Here is the output, where we see that the regular expression has parsed-out the acreage figures, just as-if they were a discrete data column with the table:
PARK_NAME ACRES ____________________________ ___________ Mackinac Island State Park 1800 acres Muskallonge Lake State Park 217-acre Porcupine Mountains State Park 60,000 acres Tahquamenon Falls State Park 40,000+ acres The only problem with this query is that it will always perform a large-table full-table scan on the michigan_park table, causing unnecessary overhead for Oracle. However, using the powerful function-based indexes we could eliminate the unnecessary overhead by using the regular expression directly in the index: CREATE INDEX parks_acreage ON michigan_parks (REGEXP_LIKE(description, '[^ ]+[- ]acres?','i')); This simple index definition would create a yes/no index on all park records that contain a reference to "acre", "acres", "acreage". The database overhead would be once, when each rows is added to the table, and not over-and-over again when queries are executed. The rules for choosing a function-based index on a complex expression (regular expression, decode) is a trade-off between several factors:
It's the age-old quandary. If we build the regular expression once (at insert time) it can be used over-and-over again with little overhead. Conversely, using regular expressions in SQL without a supporting index will cause repeated full-table scans.
For detailed information on learning regular expression and using them in the database see Jonathan Gennick and Peter Linsley's book “Oracle Regular Expressions Pocket Reference”.
Indexing Opportunities
Jonathan Gennick shows a great example where we use Oracle regular expressions to extract “acreage” references from inside a test string, ignoring important factors such as case sensitivity and words stems (acre, acres, acreage): COLUMN park_name format a30 COLUMN acres format a13 SELECT park_name, REGEXP_SUBSTR(description,'[^ ]+[- ]acres?',1,1,'i') acres FROM michigan_park WHERE REGEXP_LIKE(description, '[^ ]+[- ]acres?','i');
Here is the output, where we see that the regular expression has parsed-out the acreage figures, just as-if they were a discrete data column with the table:
PARK_NAME ACRES ____________________________ ___________ Mackinac Island State Park 1800 acres Muskallonge Lake State Park 217-acre Porcupine Mountains State Park 60,000 acres Tahquamenon Falls State Park 40,000+ acres The only problem with this query is that it will always perform a large-table full-table scan on the michigan_park table, causing unnecessary overhead for Oracle. However, using the powerful function-based indexes we could eliminate the unnecessary overhead by using the regular expression directly in the index:
CREATE INDEX parks_acreage ON michigan_parks (REGEXP_LIKE(description, '[^ ]+[- ]acres?','i')); This simple index definition would create a yes/no index on all park records that contain a reference to "acre", "acres", "acreage". The database overhead would be once, when each rows is added to the table, and not over-and-over again when queries are executed. The rules for choosing a function-based index on a complex expression (regular expression, decode) is a trade-off between several factors:
It's the age-old quandary. If we build the regular expression once (at insert time) it can be used over-and-over again with little overhead. Conversely, using regular expressions in SQL without a supporting index will cause repeated full-table scans.
Init.ora parameter issues
Obvious missing parameters cause havoc:
Db_cache_size Sort_area_size (pga_aggregate_target)
Optimizer parameters
Optimizer_mode Optimizer_index_cost_adj Optimizer_index_caching
Starting in Oracle10g, the parameter optimizer_max_permutations parameters becomes a hidden parameter, starting with an underscore as _optimizer_max_permutations with a default values of 2,000.
Pga_aggregate_target
There are important limitations of pga_aggregate_target:
· The total work area cannot exceed 200 megabytes of RAM because of the default setting for _pga_max_size. · No RAM sort may use more than 5% of pga_aggregate_target or _pga_max_size, whichever is smaller. This means that no task may use more than 200 megabytes for sorting or hash joins. The algorithm further reduces this to (200/2) for sorts so the actual limit for pure sorts will be 100 megabytes. · These restrictions were made to ensure that no large sorts or hash joins hog the PGA RAM area, but there are some secrets to optimize the PGA. For example, the following set of parameters may be mutually-exclusive: · sort_area_size=1048576 <-- sort_area_size is ignored when pga_aggregate_target is set and when workarea_size_policy =auto, unless you are using a specialized feature such as the MTS. If dedicated server connections are used, the sort_area_size parameter is ignored. · · pga_aggregate_target = 500m <-- The maximum default allowed value is 200 megabytes, this limits sorts to 25 megabytes (5% of 500m). · mts_servers<>0 <-- If Multi-threaded server is being used, the pga_aggregate_target setting would be ignored in all versions except Oracle10g.
We also see these additional undocumented parameters:
Parameter Name Description _smm_advice_enabled if TRUE, enable v$pga_advice _smm_advice_log_size overwrites default size of the PGA advice workarea history log _smm_auto_cost_enabled if TRUE, use the AUTO size policy cost functions _smm_auto_max_io_size Maximum IO size (in KB) used by sort/hash-join in auto mode _smm_auto_min_io_size\ Minimum IO size (in KB) used by sort/hash-join in auto mode _smm_bound overwrites memory manager automatically computed bound _smm_control provides controls on the memory manager _smm_max_size maximum work area size in auto mode (serial) _smm_min_size minimum work area size in auto mode _smm_px_max_size maximum work area size in auto mode (global) _smm_trace Turn on/off tracing for SQL memory manager
WARNING - These are unsupported parameters and they should not be used unless you have tested their behavior on your own database and you are willing to accept full responsibility for any issues.
Super-size me For certain Oracle applications the Oracle professional will want to allow individual tasks to exceed the default limits imposed by Oracle. For example, PC-based, 64 bit Oracle servers (1 or 2 CPU's with 8 gigabytes of RAM) will often have unused RAM available. For example, a fully-cached 5 gigabyte database on an 8 gigabyte dedicated Oracle server will have approximately 1 gigabyte available for the PGA (allowing 20% for the OS and other SGA regions): • O/S - 1.6 gig • SGA - 5 gig • PGA Space - 1 gig • Total - 8 gig The system has a pga_aggregate_target setting of 1 gigabyte and the undocumented parameters are at their default settings. While it is unusual for an online system to require super-sized regions for sorting (because the result sets for online screens are normally small), there can be a benefit to having large RAM regions available for the Oracle optimizer. The Oracle cost-based optimizer will determine whether a hash join would be beneficial over a nested-loop join, so making more PGA available for hash joins will not have any detrimental effect since the optimizer will only invoke a super-sized hash join if it is better than a nested-loop join. In a system like the example above, the following settings would increase the default sizes for large sorts and hash joins while limiting those for parallel sorts.
· pga_aggregate_target = 4g · _pga_max_size = 400m · _smm_px_max_size = 333m
With these hidden parameters set we see significant size increase for serial sorts and a throttling effect for parallel queries and sorts. To see a reproducible, artificial test case demonstrating sort throttling, Mike Ault has prepared a 230 page artificial test case: Validation of Sort Sizes in a Linux Oracle10g Database. However, bear in mind that it only valid for a specific release of Oracle10g, on a specific hardware and OS environment, and not using any optional features such as the MTS.
· A RAM sort or hash join may now have up to the full 200 megabytes (5% of pga_aggegate_target) a 400% increase over a 1 gigabyte pga_aggregate_target setting. With the default settings, only a 200% (100 megabyte size) increase would be possible. · Parallel queries are now limited to 333 megabytes of RAM (30% of pga_aggregate_target or _smm_px_max_size), such that a DEGREE=4 parallel query would have a maximum of 83 megabytes (333 meg/4) per slave which may actually be less due to internal sizing algorithms that set the memory increments used in setting sort areas. This throttling is to prevent one parallel query using all available memory since _smm_px_max_size would default to 1.2 gigabytes with the setting for pga_aggregate_target at 4 gigabytes. · You must be careful in setting the PGA_AGGREGATE_TARGET to greater than the available memory, calculate the maximum number of users who would be sorting/hashing and multiple that times the predicted size to get your actual limitations otherwise ORA-4030 errors or swapping may occur. Laurent Schneider notes in Oracle MetaLink that overriding the PGA defaults made a large batch processes run more than 8x faster: "I set appropriate values for pga_aggregate_target and _pga_max_size... alter system set pga_aggregate_target=6G; alter system set "_pga_max_size"=2000000000; ...and I gave the query some hints "NOREWRITE FULL USE_HASH ORDERED". As a result, it boosted my query performance from 12 hours to 1.5 hour."
Hidden initialization parameters Hidden initialization parameters are very dangerous because their use is undocumented, but they can be very valuable if you are careful. We can use a SQL*Plus query to quickly find any new initialization parameters in a new version of Oracle:
select name from v$parameter@oracle9i minus select name from v$parameter@oracle8i ;
COLUMN parameter FORMAT a37 COLUMN description FORMAT a30 WORD_WRAPPED COLUMN "Session Value" FORMAT a10 COLUMN "Instance Value" FORMAT a10 SET LINES 100 SET PAGES 0 SPOOL undoc.lis
SELECT a.ksppinm "Parameter", a.ksppdesc "Description", b.ksppstvl "Session Value", c.ksppstvl "Instance Value" FROM x$ksppi a, x$ksppcv b, x$ksppsv c WHERE a.indx = b.indx AND a.indx = c.indx AND a.ksppinm LIKE '/_%' escape '/' /
Hidden parameters in TPC benchmarks
This world-record benchmark used a $6,000,000 HP server with 64-Intel Itanium2 processors and 768 gig or RAM and achieved over one million transactions per minute. We see that the benchmark DBA employed several Oracle hidden parameters to boost performance. Like most vendors, they take advantage of hardware-specific performance features:
_in_memory_undo=false _cursor_cache_frame_bind_memory = true _db_cache_pre_warm = false _in_memory_undo=false _check_block_after_checksum = false _lm_file_affinity
For each of the 16 RAC nodes, this benchmark used about 44 gigabytes of RAM data buffers each, distributed into five separate RAM data block buffers. The total RAM data block buffer storage was over 700 billion bytes: Here are the data block buffer parameters for each RAC node:
db_cache_size = 4000M db_recycle_cache_size = 500M db_8k_cache_size = 200M db_16k_cache_size = 4056M db_2k_cache_size = 35430M
hidden parameter _like_with_bind_as_equality: In situations in which the LIKE pattern is expected to match very few rows, it is possible to set the hidden parameter _like_with_bind_as_equality to TRUE. This makes the optimizer treat the predicate as though it were COLUMN = :BIND. In other words, the optimizer uses the column’s density as the selectivity instead of a fixed five percent.
Problems and “silver bullet” solutions:
See: http://www.dba-oracle.com/oracle_news/2005_10_18_silver_bullets.htm
Oracle silver bullet tuning is a top-down approach where a global change can have a profound positive impact on Oracle database performance. Jeff Moss, an Oracle DBA from the UK reports in a publication titled “IO problem!” that an external disk bottleneck was hurting his overall Oracle performance: “the temporary tablespace datafiles were on a filesystem which was marked as a archive redo log area on the RAID array and were set up as a concatenated volume rather than a stripe….with obvious performance impact!” By analyzing statistics and creating a prediction, Moss was able to predict that his I/O was about 50x too slow: “It will take a while to organise the array reorganisation to fix the problem but it should boost performance since the current stats are showing that it’s about 50 times slower service times on that part of the array than the place the files are supposed to be at.” In a subsequent publication titled “What a boost!", Moss notes that this single reorg made his batch suite run time go down from 12 hours down to only 4 hours:
“The temp tablespace area on that IO issue was reorganised a few days ago and the batch suite has reduced in time from 12 hours to 4...with more to come from the ability to parallelise more of the jobs now that the system isn't bottlenecking on temp. Much Better.” In one, my client had a table with 85,000,000 records, but was used as an OLTP table. The table itself was indexed by the primary key. They had over three hundred partitions, all in their own datafiles (so there were lots of those too), and they never accessed the table by primary key. It was a nightmare both for the maintenance of the datafiles, and the table itself. Consequently, they hadn't rebuilt the local indexes because they'd have to rebuild each partition at a time! Unfortunately for their DBA team, they had all inherited this setup, and were stuck with it.
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 1996 - 2009 by
Burleson Enterprises, Inc. All rights reserved.
Oracle® is the registered trademark
of Oracle Corporation. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||