Storage Processor Cache Test

EMC VNX systems have two storage processors (SPs), usually suffixed “A” or “B” to identify them. A storage processor retrieves data from disk and writes data to disk on request. It also handles all RAID operations, as well as read and write caching. The read cache uses a read-ahead mechanism that lets the storage system prefetch data from disk, so that the data is already in the cache when the application needs it. The write cache buffers and optimizes writes by absorbing peak loads, combining small writes, and eliminating rewrites. The read and write caches, and the cache pages, must be sized adequately for the storage system to perform optimally. Otherwise, the result may be poor cache hit rates, a high rate of direct disk accesses, and significant degradation in storage system performance. To avert such problems, it is good practice to run the Storage Processor Cache test periodically.

This test continuously monitors the current state, size, and usage of the read and write caches of each storage processor of the EMC VNX storage system, and proactively alerts administrators to abnormal cache states, ineffective cache usage, and insufficient cache sizes. This enables administrators to initiate remedial measures pre-emptively, so that problems are resolved before storage system performance is impacted.

Target of the test : An EMC VNX Unified Storage system

Agent deploying the test : A remote agent

Outputs of the test : One set of results for the EMC VNX Unified Storage system.

Configurable parameters for the test
Parameter Description

Test Period

Indicates how often the test should be executed.

Host

The IP address of the storage device for which this test is to be configured.

Port

The port number at which the storage device listens. The default is NULL.

Controller IP

Specify the IP address of the storage controller on the block-only storage system in the Controller IP text box. By default, this text box is populated with the IP address of the Host.

NaviseccliPath

The eG agent uses the command-line utility NaviSecCli.exe, which is part of the Navisphere Management Suite, to communicate with and monitor the storage device. To enable the eG agent to invoke the CLI, configure the full path to the CLI in the NaviseccliPath text box.

User Name and Password

Provide the credentials of a user with Administrator rights to the storage controller in the User Name and Password text boxes.

Confirm Password

Confirm the password by retyping it here.

User Scope

To use the Navisphere CLI, the eG agent must be configured with a User Scope. The scope defines the access range of the user account (User Name and Password) configured for this test. Set User Scope to Local if that user account applies only to the monitored storage system. Set User Scope to Global if the user account applies to all the storage systems within a domain.

Timeout

Indicate the duration (in seconds) for which this test should wait for a response from the storage device. By default, this is set to 120 seconds. Note that the Timeout value must always be between 3 and 600 seconds.
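As a rough illustration of how these parameters fit together, the Python sketch below assembles a Navisphere Secure CLI command line from them and enforces the 3 to 600 second timeout range. It is not the eG agent's implementation. The `getcache` command, the `-h`, `-user`, `-password` and `-scope` options, and the scope codes (0 for global, 1 for local) are assumptions based on general Navisphere CLI usage; verify them against the CLI documentation for your VNX release. All helper names are hypothetical.

```python
import subprocess
from typing import List

# Assumed Navisphere Secure CLI scope codes; verify for your CLI release.
SCOPE_CODES = {"Global": "0", "Local": "1"}

MIN_TIMEOUT_S = 3
MAX_TIMEOUT_S = 600


def validate_timeout(seconds: int) -> int:
    """Reject timeouts outside the 3-600 second range the test accepts."""
    if not MIN_TIMEOUT_S <= seconds <= MAX_TIMEOUT_S:
        raise ValueError(
            f"Timeout must be between {MIN_TIMEOUT_S} and {MAX_TIMEOUT_S} seconds")
    return seconds


def build_command(cli_path: str, controller_ip: str, user: str,
                  password: str, scope: str) -> List[str]:
    """Assemble a hypothetical naviseccli invocation that queries cache info."""
    return [cli_path, "-h", controller_ip, "-user", user,
            "-password", password, "-scope", SCOPE_CODES[scope], "getcache"]


def run_cache_query(cmd: List[str], timeout_s: int) -> str:
    """Run the CLI and return stdout.

    Raises subprocess.TimeoutExpired if no response arrives in time, and
    subprocess.CalledProcessError if the CLI exits with a non-zero status.
    """
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=validate_timeout(timeout_s), check=True)
    return result.stdout
```

For example, `build_command("naviseccli", "10.0.0.5", "admin", "pw", "Local")` returns an argument list that can be passed to `run_cache_query` together with the configured timeout. Enforcing the timeout in the caller keeps a hung CLI call from blocking the test indefinitely.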

Measurements made by the test
Measurement Description Measurement Unit Interpretation

Cache page size

Indicates the size of each cache page.

KB

To service I/O requests faster, reduce disk overload, and avoid unnecessary disk accesses, the read and write caches should be sized with sufficient memory pages.

Cache page size determines the minimum amount of storage processor memory used to service a single I/O operation. Given below are some guidelines to right-size your cache:

  • The default of 8 KB is fine for the majority of workloads.
  • Increase to the maximum of 16 KB if large-block I/O is predominant in the environment.
  • With predominantly small-block access, such as in 2 KB and 4 KB database environments, match the cache page size to the predominant I/O size.
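The guidelines above can be condensed into a small decision function. This is a minimal sketch under stated assumptions: the 16 KB threshold for "large-block" I/O is a hypothetical cut-off chosen for illustration, and the predominant I/O size is an input you would obtain from your own workload analysis.

```python
DEFAULT_PAGE_SIZE_KB = 8
MAX_PAGE_SIZE_KB = 16
# Hypothetical cut-off: I/O of this size or larger is treated as "large-block".
LARGE_BLOCK_THRESHOLD_KB = 16
# Small-block sizes that the guidelines say the page size should match.
SMALL_BLOCK_SIZES_KB = (2, 4)


def recommend_page_size_kb(predominant_io_kb: int) -> int:
    """Suggest a cache page size (KB) for the predominant I/O size (KB)."""
    if predominant_io_kb >= LARGE_BLOCK_THRESHOLD_KB:
        # Large-block workloads: increase to the 16 KB maximum.
        return MAX_PAGE_SIZE_KB
    if predominant_io_kb in SMALL_BLOCK_SIZES_KB:
        # Small-block (e.g. 2 KB / 4 KB database) workloads: match the I/O size.
        return predominant_io_kb
    # Everything else: the 8 KB default suits most workloads.
    return DEFAULT_PAGE_SIZE_KB
```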

Cache pages owned

Indicates the number of cache pages owned.

Number

 

Dirty cache pages

Indicates the number of dirty cache pages.

Number

These are pages in the write cache that have received new data from hosts but have not yet been flushed to disk. A high value for this measure (between 60% and 80% of the write cache) is good, as it increases the chance that a read is served from cache or that additional writes to the same block of data are absorbed by the cache. A very high value, i.e., one equal to or close to the total number of pages in the write cache, is a sign of bad health, as it indicates that the write cache is over-stressed.
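To turn this guidance into an alert rule, the dirty page count can be expressed as a percentage of the total write cache pages and compared against the 60% to 80% band. The sketch below does this; the 95% "near full" threshold and the band labels are hypothetical choices, not eG or EMC defaults.

```python
def dirty_page_band(dirty_pages: int, write_cache_pages: int,
                    near_full_pct: float = 95.0) -> str:
    """Classify dirty cache pages as a share of the write cache."""
    if write_cache_pages <= 0:
        raise ValueError("write_cache_pages must be positive")
    pct = 100.0 * dirty_pages / write_cache_pages
    if pct >= near_full_pct:
        return "over-stressed"  # cache (nearly) full of unflushed data
    if pct > 80.0:
        return "elevated"       # above the desirable band, approaching full
    if pct >= 60.0:
        return "good"           # within the 60-80% band
    return "low"                # below the 60-80% band
```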

Read hit ratio

Indicates the percentage of read requests to the storage system that were serviced by the cache.

Percent

Ideally, the value of this measure should be high. A low value indicates that many read requests are being serviced by direct disk accesses, which are more expensive in terms of processing overhead.
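The ratio itself is simple arithmetic: cache hits divided by total requests, multiplied by 100. A minimal Python sketch, assuming you already have the hit and request counts; the function name is hypothetical, and the same calculation applies to the Write hit ratio measure.

```python
def hit_ratio(cache_hits: int, total_requests: int) -> float:
    """Percentage of requests served from cache; 0.0 when there were no requests."""
    if cache_hits < 0 or cache_hits > total_requests:
        raise ValueError("cache_hits must be between 0 and total_requests")
    if total_requests == 0:
        return 0.0
    return 100.0 * cache_hits / total_requests
```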

SP read cache

Indicates whether the read cache of the storage processor is enabled or not.

 

If the read cache of the storage processor (SP) is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Measure Value Numeric Value
Enabled 1
Disabled 0

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the read cache. In the graph of this measure, however, the cache state is represented using the numeric equivalents, 0 or 1.
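If you post-process these values outside the eG console, for example to graph them yourself, the mapping in the table above can be applied as shown below. The function name is hypothetical; the same mapping serves every Enabled/Disabled measure in this test.

```python
STATE_TO_NUMERIC = {"Enabled": 1, "Disabled": 0}


def state_to_numeric(state: str) -> int:
    """Map a reported cache state to its numeric equivalent (1 or 0)."""
    try:
        # Normalize case and whitespace, e.g. " enabled " -> "Enabled".
        return STATE_TO_NUMERIC[state.strip().capitalize()]
    except KeyError:
        raise ValueError(f"Unknown cache state: {state!r}") from None
```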

SP write cache

Indicates whether the write cache of the storage processor is enabled or not.

 

If the write cache of the storage processor (SP) is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Measure Value Numeric Value
Enabled 1
Disabled 0

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the write cache. In the graph of this measure, however, the cache state is represented using the numeric equivalents, 0 or 1.

SPA cache pages

Indicates the total number of pages in the cache of Storage Processor A.

Number

For best performance, each Storage Processor (SP) should devote the maximum amount of its memory to cache and should use the default settings for the cache properties. Ideally, therefore, the number of pages in the cache should be high; otherwise, storage system performance will suffer.

SPA free memory size

Indicates the amount of physical memory of storage processor A that is currently unused.

MB

 

SPA physical memory

Indicates the total physical memory of storage processor A.

MB

SPA read cache size

Indicates the current size of the read cache of Storage Processor A.

MB

 

The read cache holds data that is expected to be accessed in the near future. If a request arrives for data that is already in the cache, the request can be serviced from the cache faster than from the disks. Each request satisfied from cache eliminates a disk access, reducing disk load. Typically, it is good practice to set the read cache to roughly 10% of the available cache; 200 MB is the recommended minimum, and 1024 MB is the recommended maximum. For block-only VNX systems, the minimum can be set to 100 MB.

The initial read cache settings that EMC recommends for the different VNX models are listed in the table below:

EMC VNX Model Initial Read Cache Setting (in MB)
VNX5100 100
VNX5300 400
VNX5500 700
VNX5700 1024
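The rule of thumb above (about 10% of available cache, clamped between the recommended minimum and maximum) can be expressed as a function, alongside the initial settings from the table. This is a sketch only; the function name is hypothetical, and the result is a starting point rather than a substitute for workload-based tuning.

```python
# Initial read cache settings (MB) recommended by EMC, per the table above.
INITIAL_READ_CACHE_MB = {
    "VNX5100": 100,
    "VNX5300": 400,
    "VNX5500": 700,
    "VNX5700": 1024,
}

RECOMMENDED_MAX_MB = 1024


def suggested_read_cache_mb(available_cache_mb: float,
                            block_only: bool = False) -> float:
    """Roughly 10% of available cache, clamped to the recommended min/max."""
    minimum_mb = 100 if block_only else 200
    target_mb = available_cache_mb / 10
    return min(max(target_mb, minimum_mb), RECOMMENDED_MAX_MB)
```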

If the workload exhibits "locality of reference" behavior, where a relatively small set of data is accessed frequently and repeatedly, the read cache can improve performance. In read-intensive environments, where more than 70 percent of all requests are reads, the read cache should be large enough to accommodate the most frequently accessed dataset. For sequential reads from a LUN, data that is expected to be accessed by subsequent read requests is read (prefetched) into the cache before it is requested. Therefore, for optimal performance, the read cache should be large enough to accommodate the prefetched data for sequential reads from each LUN. An improperly sized read cache can increase direct disk reads and hence adversely impact storage system performance.
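Whether an environment counts as read-intensive by the 70 percent criterion above can be checked from read and write request counts. A minimal sketch; the function name is hypothetical, and the counts would come from your own I/O statistics.

```python
READ_INTENSIVE_THRESHOLD = 0.70  # more than 70 percent of requests are reads


def is_read_intensive(read_requests: int, write_requests: int) -> bool:
    """True when reads make up more than 70 percent of all requests."""
    total = read_requests + write_requests
    if total == 0:
        return False  # no I/O observed, nothing to classify
    return read_requests / total > READ_INTENSIVE_THRESHOLD
```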

SPA read cache state

Indicates the current state of the read cache for Storage Processor (SP) A.

 

If the read cache of storage processor (SP) A is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Measure Value Numeric Value
Enabled 1
Disabled 0

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the read cache. In the graph of this measure, however, the cache state is represented using the numeric equivalents, 0 or 1.

SPA system buffer

Indicates the size of the system buffer of storage processor A.

MB

 

SPA write cache size

Indicates the current size of the write cache of Storage Processor A.

MB

The write cache serves as a temporary buffer in which data is held before it is written to the disks. Cache writes are far faster than disk writes. In addition, write-cached data is consolidated into larger I/Os where possible and written to the disks more efficiently (this reduces the expensive small writes on RAID 5 LUNs). Where data is modified frequently, it is overwritten in the cache and written to the disks only once for several cache updates, which reduces disk load. Consequently, the write cache absorbs write data during heavy load periods and writes it to the disks, in an optimal fashion, during light load periods. However, if the amount of write data during an I/O burst exceeds the write cache size, the cache fills, and subsequent requests must wait for cached data to be flushed and for cache pages to become available for new data. It is therefore imperative that you size the write cache correctly and set the cache watermarks appropriately. Cache watermarks control the flushing behavior of the write cache. Given below are a few recommendations in this regard:

  • Start with a low watermark of 60% and a high watermark of 80%. This is suitable for the majority of workloads.
  • If frequent forced flushing occurs, reduce the watermark values.
  • Maintain a difference of about 20% between the low and high watermarks.
  • Avoid drastic changes to these values unless advised by EMC Support.
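The interplay between the two watermarks is a form of hysteresis: flushing starts when dirty pages reach the high watermark and continues until they drop to the low watermark, while a completely full cache forces flushing. The sketch below is a deliberately simplified model of that behavior for illustration only; it ignores idle flushing and other array internals, and the class and state names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WatermarkFlusher:
    """Simplified watermark-flushing model (not the array's actual logic)."""
    low_pct: float = 60.0
    high_pct: float = 80.0
    flushing: bool = False

    def __post_init__(self) -> None:
        if not 0 <= self.low_pct < self.high_pct <= 100:
            raise ValueError("require 0 <= low_pct < high_pct <= 100")

    def mode(self, dirty_pct: float) -> str:
        """Return 'forced', 'watermark' or 'none' for the current dirty-page level."""
        if dirty_pct >= 100.0:
            self.flushing = True
            return "forced"          # cache full: writes wait on flushing
        if dirty_pct >= self.high_pct:
            self.flushing = True     # crossed the high watermark: start flushing
        elif dirty_pct <= self.low_pct:
            self.flushing = False    # reached the low watermark: stop flushing
        # Between the watermarks, keep whatever state we were already in.
        return "watermark" if self.flushing else "none"
```

Feeding the model dirty-page percentages of 50, 70, 85, 70, 55 and 100 yields none, none, watermark, watermark, none and forced. At 70% the model keeps its previous state, which is why a gap between the two watermarks prevents flushing from starting and stopping too often.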

SPA write cache state

Indicates the current state of the write cache for Storage Processor (SP) A.

 

If the write cache of storage processor (SP) A is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Measure Value Numeric Value
Enabled 1
Disabled 0

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the write cache. In the graph of this measure, however, the cache state is represented using the numeric equivalents, 0 or 1.

SPB cache pages

Indicates the total number of pages in the cache of Storage Processor B.

Number

For best performance, each Storage Processor (SP) should devote the maximum amount of its memory to cache and should use the default settings for the cache properties. Ideally, therefore, the number of pages in the cache should be high; otherwise, storage system performance will suffer.

SPB free memory size

Indicates the amount of physical memory of storage processor B that is currently unused.

MB

 

SPB physical memory

Indicates the total physical memory of storage processor B.

MB

 

SPB read cache size

Indicates the current size of the read cache of Storage Processor B.

MB

The read cache holds data that is expected to be accessed in the near future. If a request arrives for data that is already in the cache, the request can be serviced from the cache faster than from the disks. Each request satisfied from cache eliminates a disk access, reducing disk load. Typically, it is good practice to set the read cache to roughly 10% of the available cache; 200 MB is the recommended minimum, and 1024 MB is the recommended maximum. For block-only VNX systems, the minimum can be set to 100 MB.

The initial read cache settings that EMC recommends for the different VNX models are listed in the table below:

EMC VNX Model Initial Read Cache Setting (in MB)
VNX5100 100
VNX5300 400
VNX5500 700
VNX5700 1024

If the workload exhibits "locality of reference" behavior, where a relatively small set of data is accessed frequently and repeatedly, the read cache can improve performance. In read-intensive environments, where more than 70 percent of all requests are reads, the read cache should be large enough to accommodate the most frequently accessed dataset. For sequential reads from a LUN, data that is expected to be accessed by subsequent read requests is read (prefetched) into the cache before it is requested. Therefore, for optimal performance, the read cache should be large enough to accommodate the prefetched data for sequential reads from each LUN. An improperly sized read cache can increase direct disk reads and hence adversely impact storage system performance.

Because the read cache is not mirrored, each storage processor has its own read cache. To use the available storage processor memory efficiently, allocate the same amount of read cache to both storage processors, A and B.
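A check that both storage processors have been allocated the same read cache can be built from the SPA read cache size and SPB read cache size measures. A minimal sketch; the function name and the tolerance parameter are hypothetical.

```python
from typing import Optional


def read_cache_imbalance(spa_read_mb: float, spb_read_mb: float,
                         tolerance_mb: float = 0.0) -> Optional[str]:
    """Return a warning if SPA and SPB read cache sizes differ by more than tolerance_mb."""
    diff = abs(spa_read_mb - spb_read_mb)
    if diff > tolerance_mb:
        return (f"Read cache sizes differ by {diff:g} MB "
                f"(SPA={spa_read_mb:g} MB, SPB={spb_read_mb:g} MB)")
    return None  # allocations are balanced within the tolerance
```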

SPB read cache state

Indicates the current state of the read cache for Storage Processor (SP) B.

 

If the read cache of storage processor (SP) B is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Measure Value Numeric Value
Enabled 1
Disabled 0

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the read cache. In the graph of this measure, however, the cache state is represented using the numeric equivalents, 0 or 1.

SPB system buffer

Indicates the size of the system buffer of storage processor B.

MB

 

SPB write cache size

Indicates the current size of the write cache of Storage Processor B.

MB

The write cache serves as a temporary buffer in which data is held before it is written to the disks. Cache writes are far faster than disk writes. In addition, write-cached data is consolidated into larger I/Os where possible and written to the disks more efficiently (this reduces the expensive small writes on RAID 5 LUNs). Where data is modified frequently, it is overwritten in the cache and written to the disks only once for several cache updates, which reduces disk load. Consequently, the write cache absorbs write data during heavy load periods and writes it to the disks, in an optimal fashion, during light load periods. However, if the amount of write data during an I/O burst exceeds the write cache size, the cache fills, and subsequent requests must wait for cached data to be flushed and for cache pages to become available for new data. It is therefore imperative that you size the write cache correctly and set the cache watermarks appropriately. Cache watermarks control the flushing behavior of the write cache. Given below are a few recommendations in this regard:

  • Start with a low watermark of 60% and a high watermark of 80%. This is suitable for the majority of workloads.
  • If frequent forced flushing occurs, reduce the watermark values.
  • Maintain a difference of about 20% between the low and high watermarks.
  • Avoid drastic changes to these values unless advised by EMC Support.

Because the write cache is mirrored, the write cache allocation applies to both storage processors, A and B.

SPB write cache state

Indicates the current state of the write cache for Storage Processor (SP) B.

 

If the write cache of storage processor (SP) B is enabled, this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Measure Value Numeric Value
Enabled 1
Disabled 0

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the status of the write cache. In the graph of this measure, however, the cache state is represented using the numeric equivalents, 0 or 1.

Write cache mirrored

Indicates the write cache mirrored status.

 

Each storage processor (SP) has a write cache in its memory, which mirrors the write cache on the other SP. Because these caches mirror each other, they are always either both enabled or both disabled, and always the same size. On power-up, a storage system automatically enables the write cache on each SP if the write cache size is non-zero.

Using this measure, you can determine whether the write cache of both SPs is currently enabled or disabled.

If the write cache is enabled (and therefore mirrored on both SPs), this measure reports the value Enabled. Otherwise, it reports the value Disabled.

The numeric values that correspond to each of the states discussed above are available in the table below:

Measure Value Numeric Value
Enabled 1
Disabled 0

Note:

By default, this measure reports the above-mentioned Measure Values to indicate the mirror status of the write cache. In the graph of this measure, however, the mirror status is represented using the numeric equivalents, 0 or 1.
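Because mirrored write caches are always both enabled or both disabled and always the same size, a mismatch between the SPA and SPB write cache measures is worth investigating. The sketch below compares them; the function name is hypothetical, and in practice the inputs would be the values this test reports for SPA write cache state, SPB write cache state, SPA write cache size and SPB write cache size.

```python
from typing import List


def write_cache_mirror_issues(spa_state: str, spb_state: str,
                              spa_size_mb: float, spb_size_mb: float) -> List[str]:
    """List any differences between the SPA and SPB write cache measures."""
    issues: List[str] = []
    if spa_state != spb_state:
        issues.append(f"State mismatch: SPA={spa_state}, SPB={spb_state}")
    if spa_size_mb != spb_size_mb:
        issues.append(f"Size mismatch: SPA={spa_size_mb:g} MB, SPB={spb_size_mb:g} MB")
    return issues  # an empty list means the two SPs agree
```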

Write hit ratio

Indicates the percentage of write requests to the storage system that were serviced by the cache.

Percent

Ideally, the value of this measure should be high. A low value indicates that many write requests are being serviced by direct disk accesses, which are more expensive in terms of processing overhead.