Tuesday, July 27. 2010Comments (0) Trackbacks (0) Demantra Configuration and Tuning Case Study by Mike Swing, TruTek
TruTek recently participated in a tuning exercise for one of our Demantra customers. This paper describes some of our lessons learned.
What is Demantra? Demantra provides demand management, sales and operations planning, and trade promotions management solutions. Demantra provides powerful analytics for more accurate demand-driven planning, forecasting and modeling. You can read more about Demantra performance at “Oracle Demantra Performance, An Oracle White Paper, September 2007”. This paper describes recommended design considerations, appserver properties, database parameters, and client server attributes for Demantra Worksheets, which are the primary user interface for the Demantra tools. The bottom line is that there are many areas where tuning can improve the overall performance of Demantra worksheets. Our efforts focused on tuning the operating system and database. The Hardware Configuration Our client recently implemented new AIX hardware with 64 threads, 128 GB RAM, 1.6 TB of SSD and 2 TB of 15K Raid 10. In general, when compared to the initial configuration on 6/21/2010, we were able to improve the performance of worksheets from about 300% to 1000% on 7/2/2010. The following table illustrates the worksheet runtimes before and after performance tuning. ![]() The Operating System (AIX) Configuration Concurrent I/O Our testing concluded that using Concurrent I/O would improve performance. Concurrent I/O was first introduced in AIX Version 5.2. This feature invokes direct I/O, so it has all the other performance considerations associated with direct I/O. With standard direct I/O, inodes (data structures associated with a file) are locked to prevent a condition where multiple threads might try to change the consults of a file simultaneously. Concurrent I/O bypasses the inode lock, which allows multiple threads to read and write data concurrently to the same file. This is due to the way in which JFS2 is implemented with a write-exclusive inode lock, allowing multiple users to read the same file simultaneously. As you can imagine, direct I/O can cause major problems with databases that continuously read from the same file. Concurrent I/O solves this problem, which is why it is known as a feature that is used primarily for relational databases. Similar to direct I/O, you can implement this either through an open system call or by mounting the file system, as follows: # mount -o cio /u When you mount the file system with this command, all its files use concurrent I/O. Even more so than using direct I/O, concurrent I/O provides almost all the advantages of using raw logical volumes, while still keeping the ease of administration available with file systems. Note that you cannot use concurrent I/O with JFS (only JFS2). Further, applications that might benefit from having a file system read ahead or that have high buffer cache hit rates might actually see performance degradation. Performance Improvement – Oracle won’t work without this setting: AIXTHREAD_SCOPE=S AIX has a setting that allows process PIDs to change. In order for Oracle to work correctly, the PIDs need to be static. set AIXTHREAD_SCOPE=S in the environment: export AIXTHREAD_SCOPE=S Demantra I/O Requirements Demantra installations for medium to large companies will need to have I/O subsystems that can accommodate 150K - 300K I/Os per second (IOPS). Low-end solid state drives (SSD) have IOPS of about 120K. High-end SSDs have IOPS over 500K. With 15K hard disks, the IOPS is about 200. Achieving 150K IOPS with JBOD would require 750 drives. Database Configuration and Performance Tuning 1. Solid State Disks (SSD) improve I/O performance by 1000 times. For moved objects that require fast access to the SSD, two SSD mount points would be best: o SALES_DATA, MDP_MATRIX tables, partitions, local indexes and indexes o redo logs, TEMP tablespace, undo tablespace Reading and writing from SSD is about 1000 times faster than comparable operations on JBOD. In combination with 128 GB of RAM, the use of SSD has minimized the physical read wait times and the IOPS is roughly 1000 times greater than the previous system. Coupled with 64 processor threads the SSD I/O system is more than 1000 times faster than the old I/O system. 2. Running the worksheets with multiple CPUs in parallel improved performance by 8 times. We decided to evaluate the effect of varying degrees of parallelism. In our example, we have 64 processor threads available. We found that the degree of parallelism for tables could be high, but that indexes generally should have a degree of 1, to avoid issues with the CBO. However, by setting the SALES_DATA to a higher degree we were able to increase the CPU utilization of the worksheet. For example, if the SALES_DATA table is set to a parallel degree of 8, 8 processes will execute the parallel query. The following diagram shows the same Capacity Planning worksheet run twice: once for 2:00 minutes with 8 CPUs and the second run for 15:45 minutes with 1 CPU. ![]() The CAPACITY_PLANNING worksheet uses the parallel index SALES_DATA_144_IDXN. The following example shows how increasing the parallelism of the worksheet decreases the runtime of the worksheet, almost linearly. ![]() The associated worksheet runtimes to this graph are as follows: ![]() Note that as the number of processors doubles, the time should be reduced by half. For example, with a parallel degree of 8, 8 CPUs are used to process the worksheet and the runtime is 1:55. By doubling the CPUs running a worksheet, we might expect the runtime to be about 1:00 minute. However, the runtime with 16 processors is 1:20, meaning this is a non-linear process with some loss due to ‘friction’ and approximately 63% efficient. 3. Partitioning improved performance by up to 8 times (degree of parallelism) for most worksheets Partitioning allows for performance improvements by reducing the physical reads by using partition pruning for SELECT statements and allows for parallel updates, when the updates fall in different partitions and parallel execution is enabled. Evaluate the effect of varying the number of partitions. Partition – different partitioning options were evaluated SALES_DATA – change partitions to be based on week. This could be a finer resolution, maybe “day”, but usually data is loaded weekly , so the effective resolution is weekly. MDP_MATRIX – change the number of partitions of locations to be 1500, about 200 locations per partition. Initial testing with 30,000 location partitions proved to be unusable. 4. Enable Parallel DML for Load Programs: alter session enable parallel_dml; For tables with partitions and loads that cross partitions with parallel dml enabled, this will cause normal update statements to work in parallel. 5. We changed multiblock_read_count from 8 to 32. This resulted in 4 times faster I/O during full object scans. Overall worksheet performance improved an average of 20% – 25%. 6. We researched the most efficient memory utilization – after much testing, 65% SGA and 35% PGA, with 128 GB of available RAM turned out to be the best. By effectively tuning the database memory usage, we were able to reduce physical I/Os by allowing the memory to cache both data blocks in the buffer cache, and code in the PGA. This allowed for a performance improvement of about 10% to 20%, depending on how this was initially set. 7. Reorganizing tables resulted in about a 5% improvement when the SALES_DATA table was 28% out of order: SALES_DATA 28% out of order Capacity worksheet 2:05 SALES_DATA 0% out of order Capacity worksheet 1:59 8. We changed the block size from 8K to 32K – reducing chaining can result in fewer physical reads and could improve performance by a factor of 2 or more if every row was chained. However, in our case, about 5% of the rows were chained in the worst table. While we have reduced future chaining, the performance improvement was small. 9. We evaluated the performance of the Default Buffer Pool, Keep Buffer Pool and Recycle Buffer Pool for use with the Solid State disks. Our recommendation – use the Default BUFFER_POOL Best Settings for Performance ![]() Maintenance We determined the best way to rebuild the SALES_DATA table, using sales_date, item_id and location_id in the ORDER BY of the INSERT. The code was generated by TOAD and manually updated with the ORDER BY on sales_date, item_id and location_id. For help implementing this solution or other configuration or tuning tasks, please contact Sherri Dudley, sherri@trutek.com or Barb Matthews, barb@trutek.com. |
QuicksearchArchivesCategoriesSyndicate This BlogBlog Administration |

Comments
Sun, 09.05.2010 23:52
Barb, you take amazing notes. Thanks for sharing them with the members of our EBS communi ty who couldn't make it [...]
Fri, 04.09.2009 11:06
Lon, I am getting same RW-2 0019 error while installing Or acle Release 12 on Linux. I ha ve setup my network as D [...]
Mon, 09.02.2009 10:36
This is extremely helpful!