Azure IoT Hub Test

IoT Hub is a managed service hosted in the cloud that acts as a central message hub for communication between an IoT application and its attached devices. You can connect millions of devices and their backend solutions reliably and securely. Almost any device can be connected to an IoT hub.

Every IoT hub has an identity registry that stores information about the devices and modules permitted to connect to it. Before a device or module can connect, there must be an entry for that device or module in the IoT hub's identity registry. Azure IoT hub maintains a device twin for each device that you connect to IoT hub. Device twins are JSON documents that store device state information including metadata, configurations, and conditions.
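To make the device twin description concrete, the following minimal Python sketch models a twin-like JSON document. The device ID, tag names, and property names are hypothetical, and the helper `pending_changes` is only an illustration, not an IoT Hub API. The layout (tags, plus desired and reported properties) follows the commonly documented twin shape.

```python
import json

# Hypothetical twin document, for illustration only.
twin = {
    "deviceId": "thermostat-01",
    "tags": {"building": "43", "floor": "3"},
    "properties": {
        "desired": {"telemetryIntervalSeconds": 30},
        "reported": {"telemetryIntervalSeconds": 60, "firmware": "1.0.2"},
    },
}


def pending_changes(twin_doc):
    """Return desired properties whose values the device has not yet reported."""
    desired = twin_doc["properties"]["desired"]
    reported = twin_doc["properties"]["reported"]
    return {key: value for key, value in desired.items() if reported.get(key) != value}


print(json.dumps(pending_changes(twin), indent=2))
```

In this sketch, a non-empty result means the device has not yet applied (or not yet reported) a configuration that the back end requested.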

A device or module must also authenticate with the IoT hub based on credentials stored in the identity registry. After authentication, the internet connection between the IoT device and IoT Hub is secured using the Transport Layer Security (TLS) standard.

Typically, IoT devices send telemetry from the sensors to back-end services in the cloud. Examples of telemetry received from a device can include sensor data such as speed or temperature, an error message such as a missed event, or an information message to indicate the device is in good health. However, other types of communication are possible, such as a back-end service sending commands to your devices - e.g., a back-end service sending a command to a device to change the frequency at which it sends telemetry to help diagnose a problem. IoT Hub implements commands by allowing you to invoke direct methods on devices. Direct methods represent a request-reply interaction with a device similar to an HTTP call in that they succeed or fail immediately (after a user-specified timeout).
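The request-reply behavior of a direct method can be sketched locally, without any Azure dependency. In the Python example below, the function `invoke_direct_method`, the handler names, and the status codes 200 and 504 are all hypothetical stand-ins chosen for illustration. The sketch only shows the idea that a call either returns a reply or fails once the caller's timeout expires.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def invoke_direct_method(handler, payload, timeout_seconds):
    """Call a device-side handler and wait at most timeout_seconds for its reply."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(handler, payload)
    try:
        return {"status": 200, "payload": future.result(timeout=timeout_seconds)}
    except FutureTimeout:
        # Hypothetical status code marking a timed-out call.
        return {"status": 504, "payload": None}
    finally:
        pool.shutdown(wait=False)


def set_telemetry_interval(payload):
    """Hypothetical device handler that replies immediately."""
    return {"intervalSeconds": payload["intervalSeconds"], "applied": True}


def slow_handler(payload):
    """Hypothetical device handler that replies too late."""
    time.sleep(0.5)
    return {"applied": True}


ok = invoke_direct_method(set_telemetry_interval, {"intervalSeconds": 10}, timeout_seconds=1.0)
timed_out = invoke_direct_method(slow_handler, {}, timeout_seconds=0.1)
print(ok["status"], timed_out["status"])
```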

A built-in endpoint collects data from your device by default. The data is collected using a request-response pattern over dedicated IoT device endpoints, is available for a maximum duration of seven days, and can be used to take actions on a device. Data can also be routed to different services for further processing. Once a message route has been created, data stops flowing to the built-in endpoint unless a route to that endpoint has been created or a fallback route has been enabled.
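The interplay between routes and the fallback route can be illustrated with a small in-memory model. This Python sketch is an assumption-laden simplification: the endpoint names, the route conditions, and the function `route_message` are hypothetical, and real routing queries are written in IoT Hub's own query language rather than as Python functions.

```python
BUILT_IN = "built-in endpoint"


def route_message(message, routes, fallback_enabled):
    """Return the endpoints a message would be delivered to in this simplified model."""
    matched = [endpoint for condition, endpoint in routes if condition(message)]
    if matched:
        return matched
    # No route matched: the message reaches the built-in endpoint only via the fallback route.
    return [BUILT_IN] if fallback_enabled else []


# Hypothetical routes: (condition, endpoint name).
routes = [
    (lambda m: m.get("level") == "critical", "alerts-queue"),
    (lambda m: m.get("temperature", 0) > 30, "hot-storage"),
]

print(route_message({"level": "critical", "temperature": 35}, routes, fallback_enabled=True))
print(route_message({"temperature": 20}, routes, fallback_enabled=True))
print(route_message({"temperature": 20}, routes, fallback_enabled=False))
```

In this model, the last call returns an empty list, which corresponds to a message that matches no route and has no fallback to catch it.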

Back-end apps can also be used to enable device administrators and operators to update and interact with IoT devices in bulk and at a scheduled time. Jobs execute device twin updates and direct methods against a set of devices at a scheduled time. For example, an operator may want to use a back-end app that initiates and tracks a job to reboot a set of devices in building 43 and floor 3 at a time that would not be disruptive to the operations of the building.
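The building 43, floor 3 example can be sketched as a simple target selection over device tags. Everything in this Python snippet is hypothetical: the device list, the tag names, the `plan_job` helper, and the shape of the returned job description. It does not use or mirror the IoT Hub jobs API. It only shows how a job pairs a set of target devices with a scheduled start time.

```python
from datetime import datetime, timezone

# Hypothetical device inventory with twin-style tags.
devices = [
    {"deviceId": "hvac-01", "tags": {"building": "43", "floor": "3"}},
    {"deviceId": "hvac-02", "tags": {"building": "43", "floor": "4"}},
    {"deviceId": "door-07", "tags": {"building": "43", "floor": "3"}},
    {"deviceId": "hvac-09", "tags": {"building": "12", "floor": "3"}},
]


def plan_job(job_id, method_name, device_list, start_time, **required_tags):
    """Select devices whose tags match all required_tags and describe the planned job."""
    targets = [
        device["deviceId"]
        for device in device_list
        if all(device["tags"].get(key) == value for key, value in required_tags.items())
    ]
    return {
        "jobId": job_id,
        "methodName": method_name,
        "startTime": start_time.isoformat(),
        "targets": targets,
    }


job = plan_job(
    "reboot-b43-f3",
    "reboot",
    devices,
    datetime(2030, 1, 5, 2, 0, tzinfo=timezone.utc),  # an off-hours start time
    building="43",
    floor="3",
)
print(job)
```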

While IoT hubs simplify business workflows by enabling 'near-hands-free' communication and action between devices and backend solutions, their many moving parts can also make them a 'problem hotspot'. For example, an unavailable IoT hub can suspend business operations, as IoT applications will no longer be able to communicate with the devices attached to that hub. If commands sent by back-end services to devices fail, they can cause unexpected errors in a business-critical workflow. Likewise, some telemetry messages may not be delivered to the desired endpoints, disrupting a crucial business practice, and slow message routing may delay key business processes. Device twin updates, scheduled jobs, and job queries can also fail, causing problems in communication and corresponding action. Furthermore, if an IoT hub operates beyond its established quotas, the performance of the hub and of the business services that depend on it will be compromised. To avoid this, administrators need to closely track how each IoT hub interacts with its attached devices, and rapidly detect anomalies. This is where the Azure IoT Hub test helps!

This test auto-discovers the IoT hubs created for each resource group of a target subscription. For every hub so discovered, the test reports the status of that hub, and alerts administrators if any hub is unavailable. Commands executed on the devices attached to each hub are monitored, and command aborts, rejections, and abandonment are brought to the attention of administrators. Message routing by every hub is also monitored; in the process, latencies in message delivery to specific endpoints are revealed. Furthermore, the test tracks operations - e.g., reads and updates - performed on device twins maintained by each hub, and captures and reports operational failures. The progress of scheduled jobs is tracked, and job failures, cancellation failures, and job query failures are highlighted. The test also periodically measures the operational levels of every hub by reporting the count of devices registered with a hub, the size of requests and responses, the count of messages sent to and by devices, and so on. Administrators are notified if any hub is about to use up its operational capacity, urging them to increase the capacity before performance suffers. This way, the test rapidly points administrators to problems in the functioning of an IoT hub and prompts them to immediately initiate corrective action, so that the hub operates uninterrupted.

Target of the Test: A Microsoft Azure Subscription

Agent deploying the test: A remote agent

Output of the test: One set of results for every IoT hub configured for each resource group of the target subscription

Configurable parameters for the test
Parameters	Description
Test Period	How often should the test be executed.
Host	The host for which the test is to be configured.
Subscription ID	Specify the GUID that uniquely identifies the Microsoft Azure Subscription to be monitored. To determine the ID of the target subscription: (1) Log in to the Microsoft Azure Portal. (2) When the portal opens, click the Subscriptions option (Figure 1: Clicking on the Subscriptions option). (3) The page that appears next (Figure 2: Determining the Subscription ID) lists all the subscriptions that have been configured for the target Azure AD tenant. Locate the subscription that is being monitored in the list, and check the value displayed for that subscription in the Subscription ID column. (4) Copy that Subscription ID to the text box corresponding to the SUBSCRIPTION ID parameter in the test configuration page.
Tenant ID	Specify the Directory ID of the Azure AD tenant to which the target subscription belongs. To know how to determine the Directory ID, refer to Configuring the eG Agent to Monitor a Microsoft Azure Subscription Using Azure ARM REST API.
Client ID, Client Password, and Confirm Password	To connect to the target subscription, the eG agent requires an Access token in the form of an Application ID and the client secret value. For this purpose, you should register a new application with the Azure AD tenant. To know how to create such an application and determine its Application ID and client secret, refer to Configuring the eG Agent to Monitor a Microsoft Azure Subscription Using Azure ARM REST API. Specify the Application ID of the created Application in the Client ID text box and the client secret value in the Client Password text box. Confirm the Client Password by retyping it in the Confirm Password text box.
Proxy Host and Proxy Port	In some environments, all communication with the Azure cloud may be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.
Proxy Username, Proxy Password and Confirm Password	If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box.
Detailed Diagnosis	To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled: (a) the eG manager license should allow the detailed diagnosis capability; (b) both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
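To show how the Tenant ID, Client ID, and Client Password parameters fit together, here is a minimal Python sketch of an OAuth 2.0 client-credentials token request against the Microsoft identity platform. It only builds the request and does not send it. How the eG agent obtains its token internally is not described in this document, so treat this as an illustration of the general flow. The function name `build_token_request` and the placeholder values are hypothetical.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret):
    """Build the URL and form body for a client-credentials token request (not sent here)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,  # Application ID of the registered application
            "client_secret": client_secret,  # client secret value
            "scope": "https://management.azure.com/.default",  # Azure Resource Manager scope
        }
    )
    return url, form.encode("ascii")


# Placeholder values, for illustration only.
url, body = build_token_request("my-tenant-id", "my-application-id", "my-client-secret")
print(url)
```

In an environment that uses a proxy, the same request would be sent through the server named by the Proxy Host and Proxy Port parameters.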

Measures made by the test:

Measurement

Description

Measurement Unit

Interpretation

Status

Indicates whether or not this IoT hub is available.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value	Numeric Value
Available	1
Unavailable	0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of an IoT hub. In the graph of this measure however, the same is represented using the numeric equivalents only.

Use the detailed diagnosis of this measure to know the location, SKU, tier and capacity of the IoT hub.

Provisioning status

Indicates the current provisioning status of this IoT hub.

The values reported by this measure and its numeric equivalents are mentioned in the table below:

Measure Value	Numeric Value
Succeeded	1
Updating	2
Error	3
Unknown	0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current provisioning status. In the graph of this measure however, the same is represented using the numeric equivalents only.

Telemetry message send attempts

Indicates the number of device-to-cloud telemetry messages attempted to be sent to this IoT hub.

Number

Make sure that the value of this measure is within the prescribed limits. If not, then throttling errors will occur, causing messages to be dropped.

Telemetry messages sent

Indicates the number of device-to-cloud telemetry messages sent successfully to this IoT hub.

Number

Commands completed

Indicates the number of cloud-to-device message deliveries completed successfully by this IoT hub.

Number

To guarantee at-least-once message delivery, your IoT hub persists cloud-to-device messages in per-device queues. For the IoT hub to remove the messages from the queue, the devices must explicitly acknowledge completion. This approach guarantees resiliency against connectivity and device failures.

Commands abandoned

Indicates the number of cloud-to-device commands abandoned by the devices attached to this IoT hub.

Number

When a device abandons a message, the IoT hub puts the message back in the queue, with the state set to Enqueued.

Commands rejected

Indicates the number of cloud-to-device commands rejected by the devices attached to this hub.

Number

Ideally, the value of this measure should be 0.

A non-zero value implies that one or more messages were rejected.

If a message is rejected, the IoT hub sets it to the Dead lettered state.

Total devices

Indicates the total number of devices registered with this hub.

Number

The total number of devices plus modules that can be registered to a single IoT hub is capped at 1,000,000. If the value of this measure is equal to this cap, the performance of the hub will be compromised. To avoid this, you may want to increase the limit by contacting Microsoft support.

Connected devices

Indicates the number of devices currently connected to this hub.

Number

Ideally, the value of this measure should be the same as that of the Total devices measure. If the gap between these two measures is large, it could mean that many registered devices are currently disconnected from the hub.

Telemetry messages delivered

Indicates the number of telemetry messages that this hub successfully delivered to endpoints.

Number

Dropped messages

Indicates the total count of messages dropped by this hub.

Number

Typically, a message is dropped either because it did not match any routing query, or because the endpoint was dead and the message could not be delivered after several retries.

Ideally, the value of this measure should be 0.

Orphaned messages

Indicates the number of messages that this hub orphaned.

Number

Orphaned messages are those that do not match any routes, including fallback routes.

Ideally, the value of this measure should be 0.

Invalid messages

Indicates the number of messages that this hub could not deliver.

Number

A message is considered invalid if it is incompatible with the endpoint.

If this measure reports a non-zero value, then you may want to check the configuration of endpoints for incompatibilities.

Messages matching fallback condition

Indicates the number of messages that this hub has written to the fallback endpoint.

Number

If message routing is turned on, you can enable the fallback route capability. Once a route is created, data stops flowing to the built-in endpoint, unless a route is created to that endpoint. If there are no routes to the built-in endpoint and a fallback route is enabled, only messages that don't match any query conditions on routes will be sent to the built-in endpoint. Also, if all existing routes are deleted, the fallback route must be enabled to receive all data at the built-in endpoint.

If this measure reports a non-zero value, it could imply that either no routes have been created to the built-in endpoint, or that some messages did not match the query conditions of any of the routes created.

Messages delivered to event hub endpoints

Indicates the number of messages that this hub delivered to event hub endpoints.

Number

Apart from the built-in Event Hubs-compatible endpoint, you can also route data to custom endpoints of type Event Hubs.

Message latency for event hub endpoints

Indicates the average latency between message ingress to this IoT hub and message ingress into custom endpoints of type Event Hub.

Milliseconds

If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest.

Messages delivered to service bus queue endpoints

Indicates the number of messages that this hub delivered to service bus queue endpoints.

Number

Message latency for service bus queue endpoints

Indicates the average latency between message ingress to this IoT hub and message ingress into a service bus queue endpoint.

Milliseconds

Messages delivered to service bus topic endpoint

Indicates the number of messages that this hub delivered to service bus topic endpoints.

Number

Message latency for service bus topic endpoints

Indicates the average latency between message ingress to this IoT hub and message ingress into a service bus topic endpoint.

Milliseconds

Messages delivered to the built-in endpoint

Indicates the number of messages that this hub delivered to the built-in endpoint.

Number

Message latency for the built-in endpoint

Indicates the average latency between message ingress to this IoT hub and message ingress into the built-in endpoint and fallback route.

Milliseconds

Messages delivered to storage endpoints

Indicates the number of messages that this hub delivered to storage endpoints.

Number

Message latency for storage endpoints

Indicates the average latency between message ingress to this IoT hub and message ingress into a storage endpoint.

Milliseconds

Data written to storage

Indicates the amount of data this IoT hub delivered to storage endpoints.

Blobs written to storage

Indicates the number of times this hub delivered blobs to storage endpoints.

Number

Successful twin reads from devices

Indicates the count of all successful device-initiated twin reads from this hub.

Number

Failed twin reads from devices

Indicates the count of all failed device-initiated twin reads from this hub.

Number

Ideally, the value of this measure should be 0.

Response size of twin reads from devices

Indicates the average response size of all successful device-initiated twin reads from this hub.

Successful twin updates from devices

Indicates the count of all successful device-initiated twin updates to this hub.

Number

Failed twin updates from devices

Indicates the count of all failed device-initiated twin updates to this hub.

Number

Size of twin updates from devices

Indicates the average size of all successful device-initiated twin updates to this hub.

Successful direct method invocations

Indicates the count of all successful direct method calls made by this hub.

Number

Failed direct method invocations

Indicates the count of all failed direct method calls made by this hub.

Number

Ideally, the value of this measure should be 0.

Request size of direct method invocations

Indicates the average request size of Cloud to Device method invocations made by this hub.

Response size of direct method invocations

Indicates the average response size of Cloud to Device method invocations made by this hub.

Successful twin reads from back end

Indicates the count of all successful back-end-initiated twin reads processed by this hub.

Number

Failed twin reads from back end

Indicates the count of all failed back-end-initiated twin reads processed by this hub.

Number

Ideally, the value of this measure should be 0.

Response size of twin reads from back end

Indicates the average response size of backend-initiated twin reads processed by this hub.

Successful twin updates from back end

Indicates the total count of successful back-end-initiated twin updates processed by this hub.

Number

Failed twin updates from back end

Indicates the total count of failed back-end-initiated twin updates processed by this hub.

Number

Ideally, the value of this measure should be 0.

Size of twin updates from back end

Indicates the average size of back-end-initiated twin updates processed by this hub.

Successful twin queries

Indicates the total number of twin queries processed by this hub that were successful.

Number

Failed twin queries

Indicates the total number of twin queries processed by this hub that failed.

Number

Ideally, the value of this measure should be 0.

Twin queries result size

Indicates the average result size of successful twin queries processed by this hub.

Successful creations of twin update jobs

Indicates the number of twin update jobs this hub successfully created.

Number

Failed creations of twin update jobs

Indicates the number of twin update jobs this hub could not create.

Number

Ideally, the value of this measure should be 0.

Successful creations of method invocation jobs

Indicates the number of direct method invocation jobs successfully created by this hub.

Number

Failed creations of method invocation jobs

Indicates the number of direct method invocation jobs this hub could not create.

Number

Ideally, the value of this measure should be 0.

Successful calls to list jobs

Indicates the number of calls made by this hub to list jobs that succeeded.

Number

Failed calls to list jobs

Indicates the number of calls made by this hub to list jobs that failed.

Number

Ideally, the value of this measure should be 0.

Successful job cancellations

Indicates the number of calls made by this hub to cancel jobs that succeeded.

Number

Failed job cancellations

Indicates the number of calls made by this hub to cancel jobs that failed.

Number

Ideally, the value of this measure should be 0.

Successful job queries

Indicates the total count of calls to query jobs that were successfully processed by this hub.

Number

Failed job queries

Indicates the total count of calls to query jobs that this hub failed to process.

Number

Ideally, the value of this measure should be 0.

Completed jobs

Indicates the total count of jobs completed by this hub.

Number

Failed jobs

Indicates the total count of jobs processed by this hub that failed.

Number

Ideally, the value of this measure should be 0.

Throttling errors

Indicates the number of throttling errors encountered by this hub.

Number

Throttling errors occur if an IoT hub's throttling limits have been exceeded for the requested operation. Ideally, the value of this measure should be 0. However, if this measure reports a non-zero value, then check if you are hitting the throttling limit by comparing your Telemetry message send attempts metric against the limits set.

Also, note that throttling errors occur only after the limit has been violated for too long a period. This is done so that your messages are not dropped if your IoT hub gets burst traffic. In the meantime, IoT hub processes the messages at the operation throttle rate, which might be slow if there is too much traffic in the backlog.

Total messages used

Indicates the total number of messages used by this hub today.

Number

This is a cumulative value that is reset to zero at the beginning of each day.
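As a rough illustration of how the Total messages used measure might be compared against a hub's capacity, the Python sketch below computes the share of the daily message quota consumed. The per-unit quota figures are assumptions that should be checked against current Azure documentation for your tier. The function names and the 80% threshold are hypothetical. The SKU and capacity (number of units) are the values that the detailed diagnosis of the Status measure is described as reporting.

```python
# Assumed daily message quota per unit, by tier; verify against current Azure documentation.
DAILY_QUOTA_PER_UNIT = {
    "F1": 8_000,
    "S1": 400_000,
    "S2": 6_000_000,
    "S3": 300_000_000,
}


def quota_usage_percent(messages_used, sku, units=1):
    """Return the percentage of the (assumed) daily message quota already consumed."""
    quota = DAILY_QUOTA_PER_UNIT[sku] * units
    return 100.0 * messages_used / quota


def needs_attention(messages_used, sku, units=1, threshold_percent=80.0):
    """Flag a hub whose daily usage has reached a hypothetical warning threshold."""
    return quota_usage_percent(messages_used, sku, units) >= threshold_percent


print(quota_usage_percent(300_000, "S1"))
print(needs_attention(700_000, "S1", units=2))
```

Because the measure resets at the beginning of each day, a check like this is only meaningful when it compares values taken within the same day.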