AWS DynamoDB Test

Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. You can use Amazon DynamoDB to create a database table that can store and retrieve any amount of data, and serve any level of request traffic. Amazon DynamoDB automatically spreads the data and traffic for the table over a sufficient number of servers to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent and fast performance.

If a table is provisioned with inadequate read and write capacity, the DynamoDB service will be unable to service the read and/or write requests to it rapidly and successfully. This can cause critical read/write requests to be throttled, thus negatively impacting user productivity. To avoid this, administrators should continuously monitor how each table services the read/write requests to it, measure the usage of read/write capacity by every table, and accurately isolate those tables where requests may potentially be throttled owing to inadequate capacity. This is exactly where the AWS DynamoDB test helps!

This test automatically discovers the tables created using the DynamoDB service. For each table, the test reports the provisioned read/write capacity and the capacity consumed, thus turning the spotlight on tables that are under-sized - i.e., that do not have adequate resources to process their workload. The test also reports the latency and the number of requests throttled per table, thus enabling administrators to gauge the impact of improper sizing on performance. In the process, administrators are also provided pointers on how table capacity can be fine-tuned to improve performance. Additionally, the test captures and promptly reports errors and request failures, which also contribute to a sub-par experience with the DynamoDB service.

Target of the test: Amazon Cloud

Agent deploying the test: A remote agent

Output of the test: One set of results for each table on DynamoDB

First-level descriptor: AWS Region

Second-level descriptor: TableName

Configurable parameters for the test
Parameter Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

Access Type

eG Enterprise monitors the AWS cloud using the AWS API. By default, the eG agent accesses the AWS API using a valid AWS account ID, which is assigned a special role that is specifically created for monitoring purposes. Accordingly, the Access Type parameter is set to Role by default. To enable the eG agent to use this default access approach, you will have to configure the eG tests with a valid AWS Account ID to Monitor and the special AWS Role Name you created for monitoring purposes.

Some AWS cloud environments however, may not support the role-based approach. Instead, they may allow cloud API requests only if such requests are signed by a valid Access Key and Secret Key. When monitoring such a cloud environment therefore, you should change the Access Type to Secret. Then, you should configure the eG tests with a valid AWS Access Key and AWS Secret Key.

Note that the Secret option may not be ideal when monitoring high-security cloud environments. This is because such environments may enforce a security mandate that requires administrators to change the Access Key and Secret Key frequently. Owing to this dynamic nature of the key-based approach, Amazon recommends the Role-based approach for accessing the AWS API.
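For reference, the two access approaches map to the standard AWS credential mechanisms. The following is a minimal boto3 sketch of both approaches, shown purely for illustration; the account ID, role name, and key values are placeholders, and the eG agent's internal implementation may differ.

    import boto3

    # Role-based access (Access Type = Role): assume a monitoring role via STS
    # and use the temporary credentials it returns. The account ID and role
    # name below are placeholders.
    sts = boto3.client('sts')
    creds = sts.assume_role(
        RoleArn='arn:aws:iam::123456789012:role/eg-monitoring-role',
        RoleSessionName='eg-monitoring',
    )['Credentials']
    session = boto3.Session(
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken'],
    )

    # Key-based access (Access Type = Secret): sign requests directly with a
    # long-lived access key / secret key pair (placeholder values shown).
    session = boto3.Session(
        aws_access_key_id='AKIAEXAMPLE',
        aws_secret_access_key='secret-key-example',
    )

    cloudwatch = session.client('cloudwatch')   # client used to pull DynamoDB metrics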

AWS Account ID to Monitor

This parameter appears only when the Access Type parameter is set to Role. Specify the AWS Account ID that the eG agent should use for connecting and making requests to the AWS API. To determine your AWS Account ID, follow the steps below:

  • Log in to the AWS management console with your credentials.

  • Click on your IAM user/role on the top right corner of the AWS Console. You will see a drop-down menu containing the Account ID (see Figure 1).

    Figure 1: Identifying the AWS Account ID
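The account ID can also be retrieved programmatically if an AWS SDK is already configured with your credentials; the boto3 call below, shown only as an illustration, returns the twelve-digit account ID of the calling identity.

    import boto3

    # Returns the account ID, user ID, and ARN of the credentials in use.
    identity = boto3.client('sts').get_caller_identity()
    print(identity['Account'])   # e.g. '123456789012'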

AWS Role Name

This parameter appears when the Access Type parameter is set to Role. Specify the name of the role that you have specifically created on the AWS cloud for monitoring purposes. The eG agent uses this role and the configured Account ID to connect to the AWS Cloud and pull the required metrics. To know how to create such a role, refer to Creating a New Role.

AWS Access Key, AWS Secret Key, Confirm AWS Access Key, Confirm AWS Secret Key

These parameters appear only when the Access Type parameter is set to Secret. To monitor an Amazon cloud instance using the Secret approach, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. The procedure for this has been detailed in the Obtaining an Access key and Secret key topic. Make sure you reconfirm the access and secret keys you provide here by retyping them in the corresponding Confirm text boxes.

Proxy Host and Proxy Port

In some environments, all communication with the AWS cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy User Name, Proxy Password, and Confirm Password

If the proxy server requires authentication, then specify a valid proxy user name and password in the Proxy User Name and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication.
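As an illustration of how SDK-based metric collection is typically routed through a proxy (the eG agent itself is configured only through the parameters above), a boto3 client can be pointed at a proxy as follows; the host, port, and credentials are placeholders.

    import boto3
    from botocore.config import Config

    # Route AWS API calls through a proxy; credentials can be embedded in the
    # URL if the proxy requires basic authentication (values are placeholders).
    proxy_config = Config(proxies={
        'https': 'http://proxyuser:proxypass@proxy.example.com:3128',
    })
    cloudwatch = boto3.client('cloudwatch', config=proxy_config)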

Proxy Domain and Proxy Workstation

If a Windows NTLM proxy is to be used, then you will additionally have to configure the Windows domain name and the Windows workstation name required for that proxy against the Proxy Domain and Proxy Workstation parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.

Exclude Region

Here, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain 'east' and 'west' from monitoring, your specification should be: *east*,*west*
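The patterns behave like simple wildcard expressions matched against the region names. The snippet below is only a sketch of how such a specification is typically evaluated; the agent's actual matching logic is internal to eG Enterprise.

    from fnmatch import fnmatch

    exclude_patterns = ['*east*', '*west*']     # value of the Exclude Region parameter
    regions = ['us-east-1', 'us-west-2', 'eu-central-1', 'ap-south-1']

    # Keep only the regions that match none of the exclusion patterns.
    monitored = [r for r in regions
                 if not any(fnmatch(r, p) for p in exclude_patterns)]
    print(monitored)   # ['eu-central-1', 'ap-south-1']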

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measures reported by the test:

Measurement

Description

Measurement Unit

Interpretation

Conditional check failed requests

Indicates the number of conditional write attempts to this table that failed.

Number

During a write request such as a PutItem, UpdateItem, or DeleteItem operation, you can define a logical condition that determines whether the item can be modified or not; for example, the item can be updated only if it is not marked as "protected". This logical condition must evaluate to "true" for the operation to proceed. If it evaluates to "false", this measure is incremented and a 400 error (Bad Request) is returned. Note that a conditional write failure does not increment the value of the User errors measure.
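For illustration, the boto3 sketch below (the table, key, and attribute names are hypothetical) issues a conditional UpdateItem that succeeds only if the item is not marked as protected; when the condition evaluates to false, DynamoDB rejects the write with a ConditionalCheckFailedException and this measure is incremented.

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource('dynamodb').Table('Orders')   # illustrative table name

    try:
        table.update_item(
            Key={'OrderId': 'order-1001'},
            UpdateExpression='SET OrderStatus = :new',
            # The write proceeds only if the item is not marked as protected.
            ConditionExpression='Protected = :no',
            ExpressionAttributeValues={':new': 'ARCHIVED', ':no': False},
        )
    except ClientError as err:
        if err.response['Error']['Code'] == 'ConditionalCheckFailedException':
            # Condition evaluated to false: DynamoDB returns HTTP 400.
            print('Conditional write failed; item left unchanged')
        else:
            raise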

Provisioned read capacity

Indicates the number of read capacity units that have been provisioned for this table.

Number

The value of this measure is the sum of read capacity units provisioned for the table and the global secondary index on that table.

You can use the detailed diagnosis of this measure to know how many read capacity units have been provisioned to this table and to the global secondary index on this table.

One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
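The read-capacity arithmetic can be expressed as a simple calculation; the helper below is only a worked illustration of the sizing rule described above.

    import math

    def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
        """Read capacity units needed for a steady read workload."""
        # Each read consumes one unit per 4 KB of item size (rounded up) when
        # strongly consistent, and half that when eventually consistent.
        units_per_read = math.ceil(item_size_kb / 4)
        if not strongly_consistent:
            units_per_read /= 2
        return units_per_read * reads_per_second

    print(read_capacity_units(3, 80))          # 80   (strongly consistent reads)
    print(read_capacity_units(3, 80, False))   # 40.0 (eventually consistent reads)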

Consumed read capacity

Indicates the number of read capacity units consumed by this table.

Number

Tracking changes in consumed read capacity allows administrators to spot abnormal peaks or drops in read activity. In particular, administrators can make sure that consumption does not exceed the value of the Provisioned read capacity measure.

The value of the Consumed read capacity measure is the sum of read capacity units consumed by the table and by the global secondary index on that table. To know how many capacity units have been consumed by the table and by the index, use the detailed diagnosis of this measure.

If the value of the Consumed read capacity measure for a table is close to or equal to the value of its Provisioned read capacity measure, or if the value of the Read capacity measure for a table is close to or equal to 100%, it indicates that the read requests to that table have been or will potentially be throttled. Throttling prevents your application from consuming too many capacity units. When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException.

Administrators might want to set up a first alert before a table consumes its entire capacity; for instance, they can configure an alert to be triggered at a threshold of 80%. This would give them time to scale up capacity before any requests are throttled.

If administrators do not want to risk requests getting throttled, they can enable auto scaling. With DynamoDB auto scaling, a table can increase its provisioned read and write capacity to handle sudden increases in traffic, without request throttling. When the workload decreases, DynamoDB auto scaling can decrease the throughput so that administrators do not pay for unused provisioned capacity.
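Auto scaling for a table is configured through the Application Auto Scaling service. The sketch below registers a table's read capacity as a scalable target and attaches a target-tracking policy; the table name, capacity limits, and target value are placeholders.

    import boto3

    autoscaling = boto3.client('application-autoscaling')

    # Register the table's read capacity as a scalable target (limits are placeholders).
    autoscaling.register_scalable_target(
        ServiceNamespace='dynamodb',
        ResourceId='table/Orders',
        ScalableDimension='dynamodb:table:ReadCapacityUnits',
        MinCapacity=5,
        MaxCapacity=500,
    )

    # Scale so that consumed read capacity stays near 70% of provisioned capacity.
    autoscaling.put_scaling_policy(
        PolicyName='orders-read-scaling',
        ServiceNamespace='dynamodb',
        ResourceId='table/Orders',
        ScalableDimension='dynamodb:table:ReadCapacityUnits',
        PolicyType='TargetTrackingScaling',
        TargetTrackingScalingPolicyConfiguration={
            'TargetValue': 70.0,
            'PredefinedMetricSpecification': {
                'PredefinedMetricType': 'DynamoDBReadCapacityUtilization',
            },
        },
    )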

Alternatively, administrators can purchase reserved capacity in advance. By reserving read and write capacity units ahead of time, administrators can realize significant cost savings compared to on-demand provisioned throughput settings.

Read capacity

Indicates the percentage of read capacity consumed by this table.

Percent

Read throttle events

Indicates the number of read events that exceeded the provisioned read throughput of this table.

Number

A low value for this measure indicates that the table is well-tuned. If the value of this measure is very high or is increasing consistently, it indicates that many reads are exceeding the provisioned read throughput, and are hence being throttled. If too many read requests to a table are being throttled, use the detailed diagnosis of this measure to determine which type of read request is throttled the most: requests to the table itself, or requests to the global secondary index on the table.

To avoid read throttle events, administrators should make sure that the table/index is provisioned with adequate read throughput, based on the size of items and the performance level they expect from it. For instance, suppose that administrators want to read 80 items per second from a table containing items of 3 KB in size, using strongly consistent reads. Since each strongly consistent read of an item up to 4 KB in size consumes one read capacity unit, administrators have to set the table's provisioned read throughput to 80 read capacity units.

Alternatively, administrators can enable auto scaling for the table. With DynamoDB auto scaling, a table can increase its provisioned read and write capacity to handle sudden increases in traffic, without request throttling. When the workload decreases, DynamoDB auto scaling can decrease the throughput so that administrators do not pay for unused provisioned capacity.

Another option is to purchase reserved capacity in advance. By reserving read and write capacity units ahead of time, administrators can realize significant cost savings compared to on-demand provisioned throughput settings.

Provisioned write capacity

Indicates the number of write capacity units provisioned for this table.

Number

The value of this measure is the sum of write capacity units provisioned for the table and the global secondary index on that table.

You can use the detailed diagnosis of this measure to know how many write capacity units have been provisioned to this table and to the global secondary index on this table.

One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units. The total number of write capacity units required depends on the item size.
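As with read capacity, the write-capacity requirement can be worked out directly; the snippet below is just a worked illustration of the 1 KB rounding rule.

    import math

    def write_capacity_units(item_size_kb, writes_per_second):
        """Write capacity units needed for a steady write workload."""
        # Each write consumes one unit per 1 KB of item size, rounded up.
        return math.ceil(item_size_kb) * writes_per_second

    print(write_capacity_units(0.5, 100))   # 100 (512-byte items round up to 1 KB)
    print(write_capacity_units(2.5, 10))    # 30  (2.5 KB items consume 3 units each)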

Consumed write capacity

Indicates the number of write capacity units consumed by this table.

Number

Tracking changes in consumed write capacity allows administrators to spot abnormal peaks or drops in write activity. In particular, administrators can make sure that consumption does not exceed the value of the Provisioned write capacity measure.

The value of the Consumed write capacity measure for a table is the sum of the write capacity units consumed by that table and by its global secondary index. To know how many write capacity units were consumed by the table and by the index, use the detailed diagnosis of this measure.

If the value of the Consumed write capacity measure for a table is close to or equal to the value of the Provisioned write capacity measure, or if the value of the Write capacity measure for a table is close to or equal to 100%, it indicates that the write requests to that table have been or will potentially be throttled.

Throttling prevents an application from consuming too many capacity units. When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException.

Administrators might want to set up a first alert before a table consumes its entire capacity; for instance, they can configure an alert to be triggered at a threshold of 80%. This would give them time to scale up capacity before any requests are throttled.

If administrators do not want to risk requests getting throttled, they can enable auto scaling. With DynamoDB auto scaling, a table can increase its provisioned read and write capacity to handle sudden increases in traffic, without request throttling. When the workload decreases, DynamoDB auto scaling can decrease the throughput so that administrators do not pay for unused provisioned capacity.

Alternatively, administrators can purchase reserved capacity in advance. By reserving read and write capacity units ahead of time, administrators can realize significant cost savings compared to on-demand provisioned throughput settings.

Write capacity

Indicates the percentage of provisioned write capacity that has been consumed by this table.

Percent

Write throttle events

Indicates the number of write events that exceeded the write throughput provisioned for this table.

Number

A low value for this measure indicates that the table is well-tuned. If the value of this measure is very high or is increasing consistently, it indicates that many writes are exceeding the provisioned write throughput, and are hence being throttled. If too many write requests to a table are being throttled, use the detailed diagnosis of this measure to determine which type of write request is throttled the most: requests to the table itself, or requests to the global secondary index on the table.

Throttling prevents applications from consuming too many capacity units. When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException. To avoid write throttle events, administrators should make sure that the table/index is provisioned with adequate write throughput, based on the size of items and the performance level they expect from it. For instance, suppose that administrators want to write 100 items per second to a table, and that the items are 512 bytes in size. Since each write of an item up to 1 KB in size consumes one write capacity unit, administrators have to set the table's provisioned write throughput to 100 write capacity units.

Alternatively, administrators can enable auto scaling for the table. With DynamoDB auto scaling, a table can increase its provisioned read and write capacity to handle sudden increases in traffic, without request throttling. When the workload decreases, DynamoDB auto scaling can decrease the throughput so that administrators do not pay for unused provisioned capacity.

Another option is to purchase reserved capacity in advance. By reserving read and write capacity units ahead of time, administrators can realize significant cost savings compared to on-demand provisioned throughput settings.

Onlineindex consumed write capacity

Indicates the number of write capacity units consumed when adding a new global secondary index to this table.

Number

Monitor this measure when a new Global Secondary Index is being added, so that you know whether enough write capacity has been provisioned for it. If it has not, incoming write requests during the index-building phase might be throttled, which will severely slow down index creation and cause upstream delays or problems. In that case, adjust the index's write capacity using the UpdateTable operation; this can be done even while the index is still being built.

Note:

This metric does not take into account ordinary write throughput consumed during index creation. To know how much write capacity was consumed ordinarily by the table during index creation, use the detailed diagnosis for this measure; this reveals how many write capacity units were consumed by the table and by the index.
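For illustration, the boto3 call below raises the write capacity of an index that is still being backfilled, as described for the UpdateTable operation above; the table name, index name, and throughput values are placeholders.

    import boto3

    dynamodb = boto3.client('dynamodb')

    # Increase the write throughput of a global secondary index, even while it
    # is still being built (names and values are placeholders).
    dynamodb.update_table(
        TableName='Orders',
        GlobalSecondaryIndexUpdates=[{
            'Update': {
                'IndexName': 'CustomerId-index',
                'ProvisionedThroughput': {
                    'ReadCapacityUnits': 10,
                    'WriteCapacityUnits': 200,
                },
            },
        }],
    )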

Onlineindex percentage progress

Indicates the percentage of completion when a new global secondary index is being added to this table.

Percent

This metric allows you to track the progress of Global Secondary Index creation. Keep an eye on this measure and correlate it with the rest of your DynamoDB metrics to make sure the index creation does not impact overall performance. If the index takes too long to build, it might be due to throttle events; in that case, check the Onlineindex throttle events measure.

DynamoDB must first allocate resources for the new index, and then backfill attributes from the table into the index. For large tables, this process might take a long time. You should monitor this statistic to view the relative progress as DynamoDB builds the index.

You can use the detailed diagnosis of this measure to know the percentage of progress of global secondary index creation and table creation. This way, you can assess the impact of index creation on table creation time.

Onlineindex throttle events

Indicates the number of write throttle events that occur when adding a new global secondary index to this table.

Number

Write-throttled events happening when adding a new Global Secondary Index to a table can dramatically slow down its creation. In such a situation, use the detailed diagnosis of this measure to know the count of write throttle events that occurred when adding the index to the table and the count of write throttle events that occurred ordinarily on the table. This will point you to the type of throttle events that are actually delaying table creation.

If this metric is not equal to zero, adjust the index's write capacity using UpdateTable. You can prevent throttling by closely monitoring the Onlineindex consumed write capacity measure. You can also use the detailed diagnosis of this measure for detailed information on the throttle events.

Note:

The Write throttle events measure does not count the throttle events that occur during index creation.

Returned items

Indicates the number of items returned by a Scan or Query operation to this table.

Number

Use the detailed diagnosis of this measure to view the count of items returned per operation. This will point you to the operation that returned the maximum number of items.
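The per-operation item counts correspond to what the Query and Scan responses themselves report. The minimal sketch below (table and key names are hypothetical) shows where those counts come from.

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource('dynamodb').Table('Orders')   # illustrative table name

    response = table.query(KeyConditionExpression=Key('CustomerId').eq('cust-42'))
    print(response['Count'])         # items returned by this Query
    print(response['ScannedCount'])  # items examined before any filtering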

Successful request latency

Indicates the response time of successful requests to this table.

Secs

If you see this measure increasing above normal levels, investigate quickly, since high latency can significantly impact your application's performance. High latency can be caused by network issues, or by requests taking too much time owing to your table design. In the latter case, using global secondary indexes can help maintain reasonable performance.

In the event of high latency, you can also use the detailed diagnosis of this measure to figure out the latency per operation. This will point you to the most latent operations on the table.

System errors

Indicates the number of requests to this table resulting in an HTTP 500 error.

Number

The value of this measure should always be zero. If it is not, you may want to intervene: perhaps by restarting portions of the service, temporarily disabling some functionality in your application, or getting in touch with AWS support.

Throttled requests

Indicates the number of user requests to this table that contained at least one event which exceeded your provisioned throughput.

Number

Compare the value of this measure across tables to identify the table that received the maximum number of requests containing throttle events.

If too many requests to a table are getting throttled, use the detailed diagnosis of this measure to identify the precise operations on that table that are being affected by the throttling. The detailed diagnostics list the number of throttled events per operation, thus leading you to the operation that is being throttled the most.

When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException. To avoid throttle events, administrators should make sure that the table is provisioned with adequate read and write throughput, based on the size of items in that table and the performance level they expect from it.
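Applications typically treat ProvisionedThroughputExceededException as a retryable error; the AWS SDKs also retry it automatically up to their configured retry limit. The sketch below (hypothetical table name, simple exponential backoff) illustrates the usual client-side handling.

    import time

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource('dynamodb').Table('Orders')   # illustrative table name

    def put_with_backoff(item, max_attempts=5):
        """Retry a throttled write with exponential backoff."""
        for attempt in range(max_attempts):
            try:
                return table.put_item(Item=item)
            except ClientError as err:
                if err.response['Error']['Code'] != 'ProvisionedThroughputExceededException':
                    raise
                time.sleep(2 ** attempt * 0.1)   # back off: 0.1s, 0.2s, 0.4s, ...
        raise RuntimeError('write still throttled after retries')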

User errors

Indicates the number of requests to this table that resulted in an HTTP 400 error.

Number

If your client application is interacting correctly with DynamoDB, this metric should always be equal to zero. It is incremented for any 400 error except for ProvisionedThroughputExceededException, ThrottlingException, and ConditionalCheckFailedException. It is usually due to a client error such as an authentication failure.

Returned data

Indicates the amount of data returned by GetRecords operations (Amazon DynamoDB Streams) performed on this table.

KB

Use the detailed diagnosis of this measure to view the amount of data returned per operation. This will point you to the operation that returned the maximum amount of data.

Returned records

Indicates the number of stream records returned by GetRecords operations (Amazon DynamoDB Streams) performed on this table.

Number

Use the detailed diagnosis of this measure to view the number of records returned per operation. This will point you to the operation that returned the maximum number of records.
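Both of the stream-related measures above derive from GetRecords calls against the table's DynamoDB stream. The sketch below is a minimal illustration of how such records are read; the table name is hypothetical, and the stream ARN is obtained from the table description.

    import boto3

    dynamodb = boto3.client('dynamodb')
    streams = boto3.client('dynamodbstreams')

    # The stream ARN comes from the table description (table name is illustrative).
    stream_arn = dynamodb.describe_table(TableName='Orders')['Table']['LatestStreamArn']

    shard = streams.describe_stream(StreamArn=stream_arn)['StreamDescription']['Shards'][0]
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard['ShardId'],
        ShardIteratorType='TRIM_HORIZON',
    )['ShardIterator']

    # GetRecords returns the stream records counted by these measures.
    records = streams.get_records(ShardIterator=iterator)['Records']
    print(len(records))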