Discussing all things virtualization and storage in the data center.

Storage metrics and what to do with them?

I run Hitachi Tuning Manager (HTnM) and have been wrestling with what to do with it for some time. The information it provides is good (if anything, a little too good), but I would like to know what to do with the performance data I see.

Today I was doing real-time monitoring of the top 10 array groups (AGs) in one of our USPs. One AG sat at about 45% busy for the hour or so I watched it. It holds a couple of Exchange servers' LUNs, and they have been working hard for the last day or so. Yesterday, one 25 GB LUN was chewing up 600 IOPS on one AG. I figure that a 7D+1P AG can do about 1000 IOPS at best, so what about the other LUNs sitting on that AG? Was I just lucky that no one was watching? I bet some other servers would have been paying the price for that Exchange LUN, because I have over 900 GB of disk on that AG that still has to be used.
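To put rough numbers on that worry, here is a minimal Python sketch of the arithmetic. The 1000 IOPS ceiling is only my own ballpark for a 7D+1P AG, not an HDS figure, and the function names are made up for illustration.

```python
# Rough headroom estimate for one array group (AG).
# AG_MAX_IOPS is my own ballpark for a 7D+1P AG, not a vendor number.
AG_MAX_IOPS = 1000   # assumed best-case IOPS for the AG
HOT_LUN_IOPS = 600   # what the busy Exchange LUN was consuming
HOT_LUN_GB = 25      # size of that LUN
OTHER_GB = 900       # remaining capacity on the same AG


def headroom(max_iops, used_iops):
    """IOPS left over for everything else on the AG (never negative)."""
    return max(max_iops - used_iops, 0)


def iops_per_gb(iops, gb):
    """I/O density: how many IOPS each GB gets."""
    return iops / gb


left = headroom(AG_MAX_IOPS, HOT_LUN_IOPS)
print(f"IOPS left for the other LUNs: {left}")
print(f"Hot LUN density: {iops_per_gb(HOT_LUN_IOPS, HOT_LUN_GB):.1f} IOPS/GB")
print(f"Density left for the rest: {iops_per_gb(left, OTHER_GB):.2f} IOPS/GB")
```

If my ceiling is anywhere near right, one 25 GB LUN takes 24 IOPS per GB while the other 900 GB has to share well under half an IOPS per GB.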

Time and time again, I have asked HDS for guidance on what to do with certain stats. I basically get the runaround, as they are not willing to give guidance on their own hardware. HTnM can create alerts/events. I tried it but got flooded with alerts. The thresholds were set high, but it seems that the USP can run hot, especially during backup windows.
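One way I could imagine cutting down the flood is to alert only when an AG stays over the threshold for several samples in a row, and to ignore the backup window altogether. The Python sketch below is just that idea applied to exported samples. It is not how HTnM alerts work, and the 70% threshold, the three-sample rule, and the 22:00 to 06:00 backup window are all assumptions of mine.

```python
from datetime import datetime, time

# All of these values are assumptions for illustration only.
BACKUP_START = time(22, 0)   # assumed start of the backup window
BACKUP_END = time(6, 0)      # assumed end of the backup window


def in_backup_window(ts):
    """True if the timestamp falls inside the overnight backup window."""
    t = ts.time()
    return t >= BACKUP_START or t < BACKUP_END


def sustained_alerts(samples, threshold=70.0, min_consecutive=3):
    """Return the timestamps at which busy% has been above the threshold
    for min_consecutive samples in a row, outside the backup window.
    Each hot run produces a single alert, not one per sample."""
    alerts = []
    run = 0
    for ts, busy in samples:
        if in_backup_window(ts) or busy <= threshold:
            run = 0          # a quiet or backup-time sample resets the run
            continue
        run += 1
        if run == min_consecutive:
            alerts.append(ts)
    return alerts


# Made-up samples: (timestamp, AG busy %), five minutes apart.
samples = [
    (datetime(2010, 1, 1, 10, 0), 80.0),
    (datetime(2010, 1, 1, 10, 5), 85.0),
    (datetime(2010, 1, 1, 10, 10), 90.0),
    (datetime(2010, 1, 1, 10, 15), 95.0),
    (datetime(2010, 1, 1, 10, 20), 50.0),
    (datetime(2010, 1, 1, 23, 0), 99.0),
    (datetime(2010, 1, 1, 23, 5), 99.0),
    (datetime(2010, 1, 1, 23, 10), 99.0),
]
print(sustained_alerts(samples))
```

With these made-up samples the morning run raises one alert and the hot samples at 23:00 raise none, which is closer to what I actually want to be told about.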

I think that the best bet is to wait until the customer/user complains and then look at the problem. I do try to be proactive in making sure the storage is behaving, but I would dearly love to figure out a regime for when to do something.

Ignorance is bliss: before I got HTnM, I never had enough information to concern me. Now I want to do something, but when is the right time to do it?

The most surprising thing about HTnM is that it uses Command Devices to query the USP. They can really skew metrics, especially for logical devices. When I run a couple of real-time reports, the Command Device often shows a response time of 500 to 800 milliseconds. That makes the graph useless, as I am looking for devices at 15 milliseconds or above, especially in our HSC clusters, which use synchronous TrueCopy.
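A workaround I can picture is to export the logical device numbers and drop the Command Devices before looking for slow LDEVs. The Python sketch below assumes the report can be turned into simple rows. The field names, the LDEV IDs and the 15 ms cut-off are hypothetical and do not come from any HTnM export format.

```python
# Hypothetical LDEV IDs of the Command Devices; substitute your own.
COMMAND_DEVICES = {"00:1F"}
SLOW_MS = 15  # assumed response-time cut-off in milliseconds


def slow_ldevs(rows, command_devices=COMMAND_DEVICES, slow_ms=SLOW_MS):
    """Return rows at or above slow_ms, slowest first,
    leaving out any LDEV that is a Command Device."""
    wanted = [
        r for r in rows
        if r["ldev"] not in command_devices and r["resp_ms"] >= slow_ms
    ]
    return sorted(wanted, key=lambda r: r["resp_ms"], reverse=True)


# Made-up report rows for illustration.
rows = [
    {"ldev": "00:1F", "resp_ms": 650.0},  # Command Device, skews the graph
    {"ldev": "01:20", "resp_ms": 18.5},
    {"ldev": "01:21", "resp_ms": 4.2},
    {"ldev": "02:10", "resp_ms": 22.0},
]

for row in slow_ldevs(rows):
    print(row["ldev"], row["resp_ms"])
```

With the Command Device gone, the scale is set by the real LDEVs and the ones over the cut-off stand out.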

Perhaps EMC are better at offering advice on what to do with their storage?

Comments on this subject would be more than welcome.

Stephen
