Posted by: akolk | February 11, 2009

IO performance: Oracle 9.2.0.7/Solaris 8/6130 SAN/Veritas Volume Manager.

I had a look at a system here in the Netherlands for a company that was having severe I/O performance problems. They tried to switch to DirectIO (filesystemio_options=setall), but they discovered that the I/O performance got worse, so they quickly turned it back to ASYNC. The system admins told the DBAs several times that there were no OS or file system issues. However, they had recently migrated from one SAN to another with Volume Manager mirroring, and the local SAN was remotely mirrored, supposedly in asynchronous mode.

So then I had a look at the system. The system has 24 CPUs, 57GB of internal memory and 14 Oracle databases running on it. The two most important databases did, according to statspack, around 1700 I/Os per second. So I did a quick scan of the physical reads per SQL statement and found that some of the statements were missing indexes (based on Tapio’s Rules of Engagement). I also noticed that the buffer caches were really small.
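
For the quick scan itself, something along the lines of the sketch below does the job. This is only an illustration of the idea, not what was actually run on this system: the python-oracledb driver, the connection details and the cut-off at 20 statements are all my assumptions.

    import oracledb  # python-oracledb driver (an assumption; any Oracle client works the same way)

    # Placeholder credentials and DSN.
    conn = oracledb.connect(user="system", password="change_me", dsn="dbhost/orcl")
    cur = conn.cursor()

    # Top 20 statements by disk reads; a high reads-per-execution figure is the
    # usual sign of a missing index.
    cur.execute("""
        SELECT sql_text, disk_reads, executions,
               ROUND(disk_reads / executions) AS reads_per_exec
          FROM v$sql
         WHERE executions > 0
         ORDER BY disk_reads DESC
    """)
    for sql_text, disk_reads, executions, reads_per_exec in cur.fetchmany(20):
        print(f"{reads_per_exec:>10} {disk_reads:>12} {sql_text[:60]}")

Statements that combine a high reads-per-execution figure with a full table scan in their plan are the usual missing-index suspects.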

After adding some indexes and increasing the buffer cache(s), it was discovered that the writes were still a bit (understatement) slow. To shift the focus away from the database and onto the system and OS, I decided to use the write test program from Jonathan Lewis. The tests performed were sequential writes of 10,000 1K blocks, random writes of 10,000 8K blocks, and sequential writes of 10,000 16K blocks. The tests were run on different mount points and some interesting things were observed. The tests also performed slowly on the database mount points, so the database was no longer the root of the problem; something else was. Now the system administrators had to pay attention 🙂 Then it was quickly discovered that Volume Manager mirroring was still enabled for the database mount point. When that was discovered and turned off, the performance improved but was still not good enough (actually it was 3-4 times too slow). Then it was discovered that the remote SAN had its cache disabled and the asynchronous operations became synchronous. When that was fixed, the writes almost returned to normal.
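
For anyone who wants to reproduce this kind of check, the sketch below shows the idea of such a write test. It is a minimal Python sketch of my own, not Jonathan Lewis’s actual program, and the /oradata/testfile path is just a placeholder for a file on the mount point you want to test.

    import os, random, time

    def write_test(path, block_size, blocks=10000, sequential=True):
        """Write `blocks` blocks of `block_size` bytes and report writes/sec."""
        buf = os.urandom(block_size)
        # O_DSYNC forces each write out to the device (POSIX; available on Solaris
        # and Linux), so we measure the storage path rather than the file system cache.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
        try:
            os.ftruncate(fd, block_size * blocks)  # pre-extend so random writes land inside the file
            start = time.time()
            for i in range(blocks):
                block = i if sequential else random.randrange(blocks)
                os.lseek(fd, block * block_size, os.SEEK_SET)
                os.write(fd, buf)
            elapsed = time.time() - start
        finally:
            os.close(fd)
        kind = "sequential" if sequential else "random"
        print(f"{path}: {blocks} x {block_size}B {kind} writes, {blocks / elapsed:.0f} writes/sec")

    # The three variants from the post, run against a file on the mount point under test.
    write_test("/oradata/testfile", 1024,  sequential=True)
    write_test("/oradata/testfile", 8192,  sequential=False)
    write_test("/oradata/testfile", 16384, sequential=True)

Running the same three variants on every mount point makes the comparison trivial: if a bare write test is already slow on the database mount points, the database itself is off the hook.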

Lessons learned:

  • Don’t rely on the SA to tell you what the problem is not; prove him wrong.
  • Indexes still matter.
  • Buffer caches are way too small for today’s hardware 🙂
  • Use the I/O test program more often 🙂

Responses

  1. Anjo,

    Good article. Thanks for sharing valuable experiences.

  2. At least you had sysadmins that were minimally responsible and investigated the problem.

    Here is what happened here:
    EMC SAN, mirrored RAID10 to DR, synced.
    Write performance was of the order of hundreds of IOPS, not even around 1000!

    Of course: I was told to tune the database! Couldn’t be anything wrong with the SAN or OS, the sysadmins couldn’t possibly have got it wrong, only DBAs do that!

    When I measured the IO write speeds, the figures were horrendous. Worse: any writes going through SPA were half the speed of those going through SPB.

    Given the RAID10 stripes were evenly spread across both, I also had the luxury of variable (slow or slower) write speed!

    EMC of course was exemplary in their adamant positioning that nothing was wrong and that what we had to do was “spread the IO” to “even out performance”. Had I followed their ‘advice’, we’d have ended up with evenly BAD performance across everything…

    Eventually we got an external consultant to actually tune the darn SAN and its setup, at both ends of the sync string. So far, he’s managed to triple the IOPS rate by doing simple, basic tuning steps.

    We’re going to implement a standby setup instead of the mirrored db very soon, with the redo logs being sent via async SAN mirror.

    Hopefully that will give me back the write speed of 3000 IOPS I got without the silly sync in there…

  3. Hi Anjo,

    Can you please share with us which program you use to test the writes? Thanks.

  4. Interesting article.
    Do you have any experience with Oracle on ZFS?
    I would be very interested in that topic.

