Posted by: akolk | February 12, 2009

forcedirectio: Another victim

I see it all the time: people following Best Practices and ending up in a big mess afterwards. This time it is the mount option forcedirectio. According to a NetApp best practice for Oracle, one should always use forcedirectio for the file systems that store the Oracle files. So people migrating to these systems read the white papers and best practices, and then run into performance problems. A quick diagnosis shows that it is all related to I/O. Of course the NAS is blamed and NetApp gets a bad reputation. It is not only NetApp; it is true for all vendors that advise you to use forcedirectio.

What does forcedirectio do?

It basically bypasses the File System buffer cache, and because of that it uses a shorter and faster code path in the OS to get the I/O done. That is of course what we want; however, you are now no longer using the File System Buffer Cache. Depending on your OS and its defaults, a large portion of your physical memory could be used as the FS Buffer Cache. Most DBAs don't dare to set Oracle Buffer Caches bigger than 2-3 GB and don't dare to use Raw Devices, so the FS cache is used very heavily. It is not uncommon to see an Oracle Database use 2 to 10 times more caching in the FS than in the Oracle Buffer Cache, and I have seen a system that used 20 to 40 times more caching in the FS than in the Oracle Buffer Cache.
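
Forcedirectio is a mount-level setting, but as a side note Oracle can also request direct (and asynchronous) I/O by itself. A minimal sketch for checking how the instance is configured, assuming you have SQL*Plus access to the database:

    -- filesystemio_options controls whether Oracle asks the OS for direct
    -- and/or asynchronous I/O (typical values: NONE, DIRECTIO, ASYNCH, SETALL)
    SELECT name, value
    FROM   v$parameter
    WHERE  name = 'filesystemio_options';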

So just imagine what happens if one then bypasses the FS Buffer Cache :) Most I/Os will become physical I/Os, and the reads especially will suffer. If the reads suffer, the end-user response time suffers directly. If end users are unhappy, managers will start to realise that they are important, and they will show their face and interest again.

So how can we fix all of this? Easy. Just remember that if you remove the FS caching, you want to cache the data somewhere else! And don't rely on the NAS or SAN cache. The best and cheapest place to cache the data is the Oracle Buffer Cache. This will help to improve your Buffer Cache Hit Ratio again :)
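
If you want to see where the memory currently sits, a minimal sketch (assuming a 10g or later instance) is to list the SGA components; once the FS cache is out of the picture, the buffer cache should be the dominant one:

    -- Current SGA components, including the buffer cache, in MB
    SELECT name, ROUND(bytes/1024/1024) AS mb
    FROM   v$sgainfo
    ORDER  BY bytes DESC;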

So if you use forcedirectio, have a look at your Oracle Buffer Cache size!
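
To get a feel for how big the cache should be, a sketch along these lines can help. The advisory view is only populated when db_cache_advice is enabled, and the 8G figure below is purely illustrative:

    -- How many physical reads would we do at other buffer cache sizes?
    SELECT size_for_estimate       AS cache_mb,
           size_factor,
           estd_physical_read_factor,
           estd_physical_reads
    FROM   v$db_cache_advice
    WHERE  name = 'DEFAULT'
    ORDER  BY size_for_estimate;

    -- Illustrative only: grow the default buffer cache (assumes an spfile
    -- and enough free memory on the host)
    ALTER SYSTEM SET db_cache_size = 8G SCOPE = BOTH;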

Responses

  1. Excellent,

    What is its behavior in the case of RAW devices? Second, is there any recommended ratio/size for setting the Oracle buffer cache after changing to forcedirectio?

    regards,

  2. For RAW you should apply the same logic (forcedirectio mimics RAW at the filesystem level). With RAW you also lose the file system caching, and one should compensate for the loss of the FS buffer cache.

  3. Two years ago, I was part of a project where all Oracle databases mounted on VxFS had to start using directio. After skipping the FS buffer cache, all segment scans (or scattered reads, so to say) became horrible in terms of response time. It turned out that VxFS read-ahead seemed to work only with buffering turned on. All those queries became highly dependent on the dbfmrc (db_file_multiblock_read_count) parameter, and we didn't have the time to analyze all the execution plan changes that would occur if we increased it. The filesystem remained buffered in the end… :)

  4. The preferred method is to let Oracle handle its own cache mechanism: disable the FS cache and give Oracle a big buffer cache, then try to tune the various parameters that affect I/O efficiency. By the way, why would you do scattered reads (FTS)? How about trying to turn them into sequential reads?

  5. It is important to know that there is significant variation between various filesystem’s “Direct I/O” implementations. Not all deliver write concurrency as do, for example, NFS and UFS under Solaris – or RAW anywhere. With QFS, its ‘samaio’ option is the best direct I/O mode. With VxFS, one must use ‘Quick I/O’, ODM, or its new ‘cio’ option to get write concurrency. Absent write concurrency, LGWR and DBWR can encounter often-confounding speed limits!

    Of course, when filesystem caching is removed, so is filesystem prefetching. I’ve lost count of how many have failed to consider that in precipitous moves to RAC! All RAC I/O options are unbuffered as a matter of architectural necessity. It works much better to have the DB do its own prefetching than to have a filesystem guess at it, but historically, one had to use Parallel Query to get Oracle to do its own prefetching. I’m told 11g is capable of DB-based prefetching without adding PARALLEL options to applications. Now *that* is a great step forward!

    Yep; this topic has many dimensions!

  6. Hi

    I have a customer who migrated from single instance to extended RAC with ASM on Sun Solaris.

    We also hit the “physical I/O became real physical I/O after ASM” problem around 6 months ago. For some databases, increasing the buffer cache helped.

    However, for a couple of databases with heavy batch jobs the problem was not so easy to fix. On their old machine with 96 GB of memory there were 5 databases running, each with a 2 GB SGA, and the physical reads were lightning fast, of course because of the filesystem buffer cache. But after migrating to ASM it turned out to be not so easy to calculate how large db_cache_size should be set for these two problematic databases. This is because on the old machine the 96 GB minus all the SGA memory was one big memory pool available to all databases, and it is hard to know how much memory each database was actually using (SGA + filesystem cache).

    The problem has been more or less fixed after several tweaks for the online operations, but the batch processes are still taking too much time (about three times as long).

    It looks like, if you have enough CPU resources, the filesystem cache isn’t that bad at all. And I am saying so even though in my experience from last year, turning the filesystem cache off at 3 customers virtually saved their lives, because CPU resources were scarce and double buffering was killing their systems!

    The filesystem cache is also very useful for 32-bit Linux databases, due to the memory addressing limit on that platform. (Yes, there are still people implementing 32-bit Linux :-( )

  7. As hinted above, there is a difference between the OS file caching and Oracle caching – the OS file cache does not distinguish between full scans and random IO. So even when you do switch to direct IO -and- make sure the memory previously used for the OS file cache is assigned to the Oracle buffer cache, you might still have slow (‘large table’) full scans that ran much faster from the OS file cache. To counter this, one could force the full scans to be cached by Oracle, but that requires schema changes (sketched after this comment), and one must figure out beforehand which segments are scanned often.
    Obviously, not using directIO has downsides: you ‘waste’ memory and CPU cycles. Think about switching to directIO when all of the following are true:
    – end users are not happy
    – the database is CPU bound
    – the database is doing phenomenal IO rates
    – the buffer cache advice (or your common sense) predicts that increasing the buffer cache size will significantly reduce the #physical reads

    Sometimes we follow the strategy to transfer memory from the OS file cache to the Oracle buffer cache -without- switching to directIO, observe the new IO rate, and re-evaluate the wisdom of enabling directIO based on these new IO rates. Also a good method to predict the effects of switching to ASM :)
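
A minimal sketch of the kind of schema change mentioned in the comment above, i.e. telling Oracle to keep a frequently full-scanned segment cached. The table name is hypothetical, and the KEEP pool must be given a sensible size first:

    -- Hypothetical example: assign a frequently full-scanned table to the KEEP pool
    -- (requires db_keep_cache_size to be set beforehand)
    ALTER TABLE sales_history STORAGE (BUFFER_POOL KEEP);

    -- Alternatively, keep its full-scanned blocks at the hot end of the default cache
    ALTER TABLE sales_history CACHE;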

