Posted by: akolk | January 29, 2009

Lost SCSI devices after reboot

After an (involuntary)  reboot  of a SuSE 10 Enterprise edition, the Oracle instance wouldn’t come up. After a quick investigation it turned out that some of the LUNs were ‘missing’. LUNs are kind of important and nice to have to store datafiles,and in order to open the database these files need to be present.  So So that explains why the DB didnt’t come up. Now how do we find the SCSI devices on the Qlogic HBA card? I had to do the following as root:

Because there was only one active HBA card (but you need to do this for all your cards):

# echo scsi-qlascan >/proc/scsi/qla2xxx/1

And then you do:

# probe-luns -a

And all the devices were found again. This was confirmed with the dmesg command.
Posted by: akolk | January 28, 2009

Oracle Support: How do they do it?

A customer of mine ran into a problem with datapump exp and Spatial data. They wanted to export data with a sub query to control the area that they wanted to export. So they created a parfile and started the export. It ran into a problem. The export produced error ORA-29900 and complains about the subquery that was entered (an SDO_RELATE call to only export the objects of an certain area). After searching for some time on the internet and metalink we found bug 5029151. So the problem had been encountered by another user and they opened a bug, but the bug had been closed with status 92 (not a bug).  If Oracle Support had closed this bug with a different status like ‘Could not reproduce’ or ‘Not enough information available’, I would understand the reason for closing the bug. 

Now it leaves me with the feeling that the spreadsheet game is being played and that certain bugs get closed and not fixed, and that Oracle support doesn’t really care about it. So we will open an tar and let Oracle Support know that we want it fixed. I already know what is going to happen. Stalling and more data is needed etc. before they can open the bug.

Will keep you updated on how this goes 🙂

Posted by: akolk | January 28, 2009

XMLDOM PLSQL performance tip

I used the xmldom in PLSQL for the first time last year for a customer to generate XML files with product data that get loaded into FREDHOPPER (search engine). There were close to 300K products that needed to be exported into these XML files and each XML file could only have 1000 products (limitation of FREDHOPPER not Oracle). So on average I would create 300 xml files, but I noticed that the PLSQL procedure started to slow down and take more PGA the longer it was running (it went as far as eating up close to 50+ percent of a 12GB system). So there was a memory leak, I checked my code but didn’t find a problem. The only thing I kept doing was opening and closing these DOMS in memory (PGA) so I started to suspect that. I remembered this little PLSQL procedure that would free unused memory that was still allocated to your heaps. It is called dbms_session.free_unused_user_memory.  So now just before I call xmldom.newDocument, I call this dbms_session.free_unused_user_memory procedure and the performance improved, as the heap no longer grew and was steady around 10 percent of the memory. So I was able to improve the performance of the XMLDOM dramatically by releasing the unused memory.

Two weeks ago I got an phone call from a consultant who had a problem with XMLDOM on a windows machine (32 bit). The memory problems are even worse there and they he a small number of processes eat up all the available memory and error out with ora-4030. He asked if anything could be done about this and I remembered my little XMLDOM performance issue so I told him to call dbms_session.free_unused_user_memory every so often. And again the PLSQL program no longer runs into ora-4030 errors.

This is of course an Oracle problem, but at least we have an way to make XMLDOM perform better and use less memory.

Posted by: akolk | January 3, 2009

Found a customer still using the Oracle Precompiler :)

On new years eve I got a phone call from a company that had a performance problem. All of a sudden the batch job was no longer finishing and that gave problems because this process was supposed to update their website. So together with the customer we decided to start looking at this on January 1st 2009 (after lunch :)).  I arrived at the house of the DBA and we looked at the batch and of course the error didn’t reproduce, so it was decided that I should return home.  While driving home the batch hang again. So I instructed the DBA to find out if the batch was hanging in the database or on the client side. Turned out to be the client side. So the next couple of hours I looked at Precompiler bugs (hadn’t done that in a long time), as the batch program was written in Pro*C. While looking at the program, I was reminded of the fact that some of the Precompiler paramete have very strange (as in low) defaults.  The following parameters maxopencursors (10), hold_cursor (no), release_cursor(no). These settings will help to increase the amount of soft parsing  (if more than 10 cursors have to be open at the same time). Setting hold_cursor to yes and maxopencursors much higher should help to reduce the amount of soft parsing.

Back to the real problem. Because the production system didn’t have a Debugger installed (it was an AIX system), it was very difficult to find out where the process/program was hanging. But then I found the tool that I needed: procstack. With procstack one can dump the stack of a running process like this: procstack <pid>. Once we had that, it took less than 30 minutes to find the problem in the code and produce a fix. The program is now working again 🙂

Posted by: akolk | January 2, 2009

Back

Well I have decided to start blogging again. So watch this space for new articles 🙂

Anjo Kolk

« Newer Posts

Categories