EMC RAID Disks Test

This test monitors the current state, overall health, and load-balancing capability of each disk in the EMC storage system. Using this test, administrators can identify failed disks and also predict the potential failure of a disk, so that steps can be taken to avert it. The test also points administrators to disks that are handling more I/O requests than the rest, thus revealing irregularities in the distribution of I/O load across disks and prompting administrators to fine-tune the load-balancing algorithm. Finally, the test proactively alerts administrators to probable slowdowns in I/O processing by specific disks, enabling them to initiate pre-emptive action.

This test is disabled by default. To enable the test, go to the enable / disable tests page using the menu sequence: Agents -> Tests -> Enable/Disable. Pick EMC Clariion SAN as the desired Component type, set Performance as the Test type, choose the test from the DISABLED TESTS list, and click the < button to move the test to the ENABLED TESTS list. Finally, click the Update button.

Target of the test: An EMC CLARiiON storage device

Agent deploying the test: A remote agent

Outputs of the test: One set of results for each disk on the storage system.

Configurable parameters for the test
Parameter	Description
Test Period	How often the test should be executed.
Host	The IP address of the storage device for which this test is to be configured.
Port	The port number at which the storage device listens. The default is NULL.
User Name and Password	The SMI-S Provider is paired with the EMC CIM Object Manager Server to provide an SMI-compliant interface for CLARiiON arrays. In the User Name and Password parameters, specify the credentials of a user who has been assigned Monitor access to the EMC CIM Object Manager Server paired with EMC CLARiiON’s SMI-S provider.
Confirm Password	Confirm the Password by retyping it here.
SSL	Set this flag to Yes, if the storage device being monitored is SSL-enabled.
IsEmbedded	By default, this flag is set to False for an EMC CLARiiON device. Do not disturb this default setting.
SerialNumber	If the SMI-S provider has been implemented as a proxy, it can be configured to manage multiple storage devices. You therefore have to explicitly specify which storage system you want the eG agent to monitor. Since each storage system is uniquely identified by a serial number, specify that serial number here. The serial number of an EMC CLARiiON device will be of the format FCNMM094900059.
NameSpace	Specify the NameSpace that uniquely identifies the profiles specific to the provider in use. For EMC CLARiiON, this parameter will be set to root/emc by default.
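
As a rough illustration, the Host, Port, and SSL parameters above can be thought of as forming the URL of the CIM Object Manager that the agent contacts. The following is a minimal sketch and not the eG agent's actual logic: the function name build_cimom_url, the sample IP address, and the config dictionary are hypothetical, and falling back to ports 5988 (HTTP) and 5989 (HTTPS), the IANA-registered WBEM ports, when Port is left at NULL is an assumption.

```python
from typing import Optional

# IANA-registered WBEM ports; using them as a fallback is an assumption.
DEFAULT_HTTP_PORT = 5988
DEFAULT_HTTPS_PORT = 5989


def build_cimom_url(host: str, port: Optional[int] = None, ssl: bool = False) -> str:
    """Build a CIM Object Manager URL from the Host, Port, and SSL parameters."""
    scheme = "https" if ssl else "http"
    if port is None:  # Port left at its NULL default
        port = DEFAULT_HTTPS_PORT if ssl else DEFAULT_HTTP_PORT
    return f"{scheme}://{host}:{port}"


# Hypothetical values mirroring the parameter table above.
config = {
    "url": build_cimom_url("192.168.10.20", ssl=True),
    "namespace": "root/emc",            # default NameSpace for EMC CLARiiON
    "serial_number": "FCNMM094900059",  # format example from the table
    "is_embedded": False,               # default IsEmbedded setting
}
print(config["url"])
```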

Measurements made by the test

Measurement	Description	Measurement Unit	Interpretation

Health state

Indicates how healthy this disk currently is.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value	Numeric Value
OK	0
Unknown	1
Degraded/Warning	2
Minor failure	3
Major failure	4
Critical failure	5
Non-recoverable error	6

Note:

By default, this measure reports the Measure Values listed above to indicate the state of a disk. In the graph of this measure, however, states are represented using their numeric equivalents only.
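
When reading graph data, the numeric equivalents have to be translated back into state names. The sketch below shows one way to do that in Python; the HealthState enumeration and the describe_health function are hypothetical helpers, and only the numeric values come from the table above.

```python
from enum import IntEnum


class HealthState(IntEnum):
    """Numeric equivalents of the Health state measure, as listed in the table."""
    OK = 0
    UNKNOWN = 1
    DEGRADED_WARNING = 2
    MINOR_FAILURE = 3
    MAJOR_FAILURE = 4
    CRITICAL_FAILURE = 5
    NON_RECOVERABLE_ERROR = 6


def describe_health(value: int) -> str:
    """Return the state name for a graphed numeric value, or flag it as unmapped."""
    try:
        return HealthState(value).name
    except ValueError:
        return f"UNMAPPED({value})"


print(describe_health(4))
```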

Operational status

Indicates the current operational state of this disk.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value	Numeric Value
OK	0
In Service	1
Power Mode	2
Completed	3
Starting	4
Dormant	5
Other	6
Unknown	7
Stopping	8
Stressed	9
Stopped	10
Supporting Entity in Error	11
Degraded or Predicted Failure	12
Predictive Failure	13
Lost Communication	14
No Contact	15
Aborted	16
Error	17
Non-Recoverable Error	18

Note:

By default, this measure reports the Measure Values listed above to indicate the operational state of a disk. In the graph of this measure, however, operational states are represented using their numeric equivalents only.
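
Because two of the values above (12 and 13) signal a predicted failure, this measure can be used to single out disks that are likely to fail. The following sketch illustrates the idea; the function name and the sample disk names are hypothetical, and only the numeric codes are taken from the table.

```python
from typing import Dict

# Numeric values taken from the Operational status table above.
PREDICTED_FAILURE_CODES = {
    12: "Degraded or Predicted Failure",
    13: "Predictive Failure",
}


def disks_predicting_failure(status_by_disk: Dict[str, int]) -> Dict[str, str]:
    """Return {disk: state name} for disks whose status signals a predicted failure."""
    return {
        disk: PREDICTED_FAILURE_CODES[code]
        for disk, code in status_by_disk.items()
        if code in PREDICTED_FAILURE_CODES
    }


# Hypothetical sample readings; the disk names are made up for illustration.
sample_status = {"Disk_0_0_1": 0, "Disk_0_0_2": 13, "Disk_0_0_3": 10}
print(disks_predicting_failure(sample_status))
```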

Detailed operational state

Describes the current operational state of this disk.

This measure will be reported only if the API provides a detailed operational state.

Typically, the detailed state will describe why the disk is in a particular operational state. For instance, if the Operational status measure reports the value Stopping for a disk, then this measure will explain why that disk is being stopped.

The values that this measure can report and their corresponding numeric values are discussed in the table below:

Measure Value	Numeric Value
Online	0
Success	1
Power Saving Mode	2
Write Protected	3
Write Disabled	4
Not Ready	5
Removed	6
Rebooting	7
Offline	8
Failure	9

Note:

By default, this measure reports the Measure Values listed above to indicate the detailed operational state of a disk. In the graph of this measure, however, detailed operational states are represented using their numeric equivalents only.

Data transmitted

Indicates the rate at which data was transmitted by this disk.

MB/Sec

IOPS

Indicates the rate at which I/O operations were performed on this disk.

IOPS

Compare the value of this measure across disks to know which disk handled the maximum number of I/O requests and which handled the least. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across disks.

You may then want to take a look at the Reads and Writes measures to understand what to fine-tune: the load-balancing algorithm for read requests or the one for write requests.
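
The cross-disk comparison described above can be sketched as follows. This is an illustration only: the function name iops_spread, the sample values, and the IMBALANCE_RATIO threshold are assumptions, since the test itself does not prescribe what counts as a "very high" gap.

```python
from typing import Dict, Tuple


def iops_spread(iops_by_disk: Dict[str, float]) -> Tuple[str, str, float]:
    """Return (busiest disk, least busy disk, ratio of highest to lowest IOPS)."""
    busiest = max(iops_by_disk, key=iops_by_disk.get)
    idlest = min(iops_by_disk, key=iops_by_disk.get)
    high, low = iops_by_disk[busiest], iops_by_disk[idlest]
    if high == 0:
        ratio = 1.0           # no disk is doing any I/O, so nothing is imbalanced
    elif low == 0:
        ratio = float("inf")  # at least one disk is idle while another is active
    else:
        ratio = high / low
    return busiest, idlest, ratio


# Hypothetical IOPS readings for three disks.
sample_iops = {"Disk_A": 1200.0, "Disk_B": 300.0, "Disk_C": 900.0}
busiest, idlest, ratio = iops_spread(sample_iops)

IMBALANCE_RATIO = 3.0  # assumed threshold; tune it for your environment
if ratio >= IMBALANCE_RATIO:
    print(f"{busiest} handles {ratio:.1f}x the I/O of {idlest}")
```

The same comparison can be applied to the Reads and Writes measures to see whether read or write traffic is responsible for the gap.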

Reads

Indicates the rate at which read operations were performed on this disk.

Reads/Sec

Compare the value of this measure across disks to know which disk handled the maximum number of read requests and which handled the least. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across disks.

Writes

Indicates the rate at which write operations were performed on this disk.

Writes/Sec

Compare the value of this measure across disks to know which disk handled the maximum number of write requests and which handled the least. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across disks.

Data reads

Indicates the rate at which data is read from this disk.

MB/Sec

Compare the values of the Data reads and Data written measures across disks to identify the disk that is slowest in servicing read requests and write requests, respectively.

Data written

Indicates the rate at which data is written to this disk.

MB/Sec

Disk busy

Indicates the percentage of time this disk was busy processing requests.

Percent

Compare the value of this measure across disks to know which disk was the busiest and which was the least busy. If the gap between the two is very high, then it indicates serious irregularities in load-balancing across disks.

Average read size

Indicates the amount of data read from this disk per I/O operation.

MB/Op

Compare the values of the Average read size and Average write size measures across disks to identify the disk that is slowest in servicing read requests and write requests, respectively.

Average write size

Indicates the amount of data written to this disk per I/O operation.

MB/Op
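
The documentation does not state how the average sizes are derived. Going by the units alone, an MB/Op figure is consistent with dividing a data rate (MB/Sec) by an operation rate (Reads/Sec or Writes/Sec). The sketch below illustrates that arithmetic under this assumption; the function name and the sample figures are hypothetical.

```python
def average_io_size_mb(data_rate_mb_per_sec: float, ops_per_sec: float) -> float:
    """Estimate MB per operation from a data rate and an operation rate.

    Assumption: average size = (MB/Sec) / (operations/Sec). Returns 0.0 when
    no operations were performed, to avoid dividing by zero.
    """
    if ops_per_sec <= 0:
        return 0.0
    return data_rate_mb_per_sec / ops_per_sec


# Hypothetical readings: 64 MB/Sec read at 512 Reads/Sec.
print(average_io_size_mb(64.0, 512.0))
```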

Read hit

Indicates the percentage of read requests that were serviced by the cache of this disk.

Percent

A high value is desired for this measure. A very low value is a cause for concern, as it indicates that cache usage is very poor; this in turn implies that direct disk accesses, which are expensive operations, are high.

Write hit

Indicates the percentage of write requests that were serviced by the cache of this disk.

Percent

Average response time

Indicates the time taken by this disk to respond to I/O requests.

Microsecs

Ideally, this value should be low. A high value implies that the disk is slow to respond to I/O requests.

EMC queue length

Indicates the number of requests that are in queue for this disk.

Number

A consistent increase in this value indicates a potential processing bottleneck with the disk.
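
One simple way to spot a consistent increase is to check whether the most recent samples of this measure rise without interruption. The following is a minimal sketch of that check; the function name, the min_points window of four samples, and the sample data are assumptions, not part of the test.

```python
from typing import Sequence


def is_consistently_increasing(samples: Sequence[float], min_points: int = 4) -> bool:
    """Return True if the last `min_points` samples are strictly increasing."""
    if len(samples) < min_points:
        return False  # not enough history to judge a trend
    window = samples[-min_points:]
    return all(earlier < later for earlier, later in zip(window, window[1:]))


# Hypothetical queue-length readings from successive test executions.
queue_lengths = [2, 3, 5, 8, 13]
if is_consistently_increasing(queue_lengths):
    print("Queue length is growing steadily; check the disk for a bottleneck")
```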

EMC queue arrivals

Indicates the number of times a user request arrived while at least one other request was being processed.

Number