Azure IoT Hub Test
IoT Hub is a managed service hosted in the cloud that acts as a central message hub for communication between an IoT application and its attached devices. You can connect millions of devices and their backend solutions reliably and securely. Almost any device can be connected to an IoT hub.
Every IoT hub has an identity registry that stores information about the devices and modules permitted to connect to it. Before a device or module can connect, there must be an entry for that device or module in the IoT hub's identity registry. Azure IoT Hub also maintains a device twin for each device that you connect to it. Device twins are JSON documents that store device state information, including metadata, configurations, and conditions.
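To make the idea of a device twin concrete, here is a minimal sketch of a twin document and a helper that lists the settings a device has not yet applied. The device ID, tag names, and property names are hypothetical; only the general layout (tags, desired properties, reported properties) follows the usual twin structure.

```python
import json

# Hypothetical device twin. The layout (tags / desired / reported) follows the
# general structure of IoT Hub twins; all names and values are made up.
twin_json = """
{
  "deviceId": "thermostat-01",
  "tags": {"building": "43", "floor": "3"},
  "properties": {
    "desired": {"telemetryIntervalSeconds": 30},
    "reported": {"telemetryIntervalSeconds": 60, "firmware": "1.0.2"}
  }
}
"""


def pending_changes(twin: dict) -> dict:
    """Return desired properties that the device has not yet reported as applied."""
    desired = twin["properties"]["desired"]
    reported = twin["properties"]["reported"]
    return {
        key: value
        for key, value in desired.items()
        # Keys starting with "$" carry twin metadata, not settings.
        if not key.startswith("$") and reported.get(key) != value
    }


twin = json.loads(twin_json)
print(pending_changes(twin))
```

In this sample, the back end wants a 30-second telemetry interval while the device still reports 60, so that one setting is listed as pending.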
A device or module must also authenticate with the IoT hub based on credentials stored in the identity registry. After authentication, the internet connection between the IoT device and IoT Hub is secured using the Transport Layer Security (TLS) standard.
Typically, IoT devices send telemetry from their sensors to back-end services in the cloud. Examples of telemetry received from a device include sensor data such as speed or temperature, an error message such as a missed event, or an informational message indicating that the device is in good health. However, other types of communication are possible, such as a back-end service sending commands to your devices; e.g., a back-end service may send a command to a device to change the frequency at which it sends telemetry, to help diagnose a problem. IoT Hub implements commands by allowing you to invoke direct methods on devices. Direct methods represent a request-reply interaction with a device, similar to an HTTP call, in that they succeed or fail immediately (after a user-specified timeout).
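The request-reply nature of a direct method can be sketched as follows. The request body uses the field names of the IoT Hub direct method invocation API (`methodName`, `responseTimeoutInSeconds`, `payload`); the method name `setTelemetryInterval`, the `fake_device` handler, and the status codes it returns are hypothetical stand-ins for a real device.

```python
import json


def build_direct_method_request(method_name, payload, response_timeout_s=30):
    """Build the JSON body for a direct method invocation."""
    return {
        "methodName": method_name,
        "responseTimeoutInSeconds": response_timeout_s,  # user-specified timeout
        "payload": payload,
    }


def fake_device(request):
    """Hypothetical stand-in for a device: handles one method, rejects the rest."""
    if request["methodName"] == "setTelemetryInterval":
        seconds = request["payload"]["seconds"]
        return {"status": 200, "payload": {"appliedSeconds": seconds}}
    # Status codes are chosen by the device; 404 here is only an illustration.
    return {"status": 404, "payload": {"error": "unknown method"}}


request = build_direct_method_request("setTelemetryInterval", {"seconds": 10})
response = fake_device(request)
print(json.dumps(request))
print(response["status"])
```

The caller gets a definite success or failure for each invocation, which is what distinguishes a direct method from a fire-and-forget message.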
A built-in endpoint collects data from your device by default. The data is collected using a request-response pattern over dedicated IoT device endpoints, is available for a maximum duration of seven days, and can be used to take actions on a device. Data can also be routed to different services for further processing. Once a message route has been created, data stops flowing to the built-in endpoint unless a fallback route has been configured.
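The interaction between routes and the fallback route can be illustrated with a small simulation. This is a sketch of the behaviour described above, not IoT Hub's actual routing engine; the `Route` class, the endpoint names, and the sample conditions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

BUILT_IN = "built-in"  # label used here for the built-in endpoint


@dataclass
class Route:
    name: str
    endpoint: str
    condition: Callable[[dict], bool]  # stands in for a routing query


def route_message(message: dict, routes: List[Route], fallback_enabled: bool) -> List[str]:
    """Return the endpoints that would receive the message."""
    matched = [r.endpoint for r in routes if r.condition(message)]
    if matched:
        return matched
    # No route matched: only the fallback route still delivers the message.
    return [BUILT_IN] if fallback_enabled else []


routes = [
    Route("hot", "storage", lambda m: m.get("temperature", 0) > 30),
    Route("alerts", "service-bus-queue", lambda m: m.get("level") == "critical"),
]

print(route_message({"temperature": 35, "level": "info"}, routes, True))
print(route_message({"temperature": 20, "level": "info"}, routes, True))
print(route_message({"temperature": 20, "level": "info"}, routes, False))
```

An empty result corresponds to a message that matches no route and has no fallback route to catch it.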
Back-end apps can also be used to enable device administrators and operators to update and interact with IoT devices in bulk and at a scheduled time. Jobs execute device twin updates and direct methods against a set of devices at a scheduled time. For example, an operator may want to use a back-end app that initiates and tracks a job to reboot a set of devices in building 43 and floor 3 at a time that would not be disruptive to the operations of the building.
On the one hand, IoT hubs simplify business workflows by enabling 'near-hands-free' communication and action between devices and backend solutions. On the other hand, they can also become a 'problem hotspot' because of their many moving parts. For example, an unavailable IoT hub can suspend business operations, as IoT applications will no longer be able to communicate with the devices attached to that hub. If commands sent by back-end services to devices fail, they can cause unexpected errors in a business-critical workflow. Likewise, some telemetry messages may not be delivered to the desired endpoints, thereby disrupting a crucial business practice. Slowness in message routing may delay key business processes. Device twin updates, scheduled jobs, and job queries can also fail, resulting in problems in communication and the corresponding action. Furthermore, if an IoT hub operates at a level higher than its established quotas, the performance of the hub and of the business services that depend on it will be compromised. To avoid this, administrators need to closely track how each IoT hub interacts with its attached devices, and rapidly detect anomalies. This is where the Azure IoT Hub test helps!
This test auto-discovers the IoT hubs created for each resource group of a target subscription. For every hub so discovered, the test reports the status of that hub, and alerts administrators if any hub is unavailable. Commands executed on the devices attached to each hub are monitored, and command aborts, rejections, and abandonment are brought to the attention of administrators. Message routing by every hub is also monitored; in the process, latencies in message delivery to specific endpoints are revealed. Furthermore, the test tracks operations (e.g., reads and updates) performed on the device twins maintained by each hub, and captures and reports operational failures. The progress of scheduled jobs is tracked, and job failures, cancellation failures, and job query failures are highlighted. The test also periodically measures the operational levels of every hub by reporting the count of devices registered with a hub, the size of requests and responses, the count of messages sent to and by devices, and so on. Administrators are notified if any hub is about to use up its operational capacity, thus urging them to increase the capacity before performance suffers. This way, the test rapidly points administrators to problems in the functioning of an IoT hub and prompts them to immediately initiate corrective action, so that the hub operates uninterrupted.
Target of the Test: A Microsoft Azure Subscription
Agent deploying the test: A remote agent
Output of the test: One set of results for every IoT hub configured for each resource group of the target subscription
Parameters | Description |
---|---|
Test Period |
Specify how often the test should be executed. |
Host |
The host for which the test is to be configured. |
Subscription ID |
Specify the GUID which uniquely identifies the Microsoft Azure Subscription to be monitored. To know the ID that maps to the target subscription, do the following:
|
Tenant ID |
Specify the Directory ID of the Azure AD tenant to which the target subscription belongs. To know how to determine the Directory ID, refer to Configuring the eG Agent to Monitor a Microsoft Azure Subscription Using Azure ARM REST API. |
Client ID, Client Password, and Confirm Password |
To connect to the target subscription, the eG agent requires an Access token in the form of an Application ID and the client secret value. For this purpose, you should register a new application with the Azure AD tenant. To know how to create such an application and determine its Application ID and client secret, refer to Configuring the eG Agent to Monitor a Microsoft Azure Subscription Using Azure ARM REST API. Specify the Application ID of the created Application in the Client ID text box and the client secret value in the Client Password text box. Confirm the Client Password by retyping it in the Confirm Password text box. |
Proxy Host and Proxy Port |
In some environments, all communication with the Azure cloud may need to be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy. |
Proxy Username, Proxy Password and Confirm Password |
If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. |
Detailed Diagnosis |
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
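The Tenant ID, Client ID, and Client Password parameters correspond to the inputs of an Azure AD client-credentials token request, and the Proxy Host and Proxy Port parameters to an HTTPS proxy setting. The sketch below only builds such a request and a proxy-aware opener; it does not send anything, and it is not necessarily how the eG agent itself obtains its token. The placeholder values and the helper names `build_token_request` and `build_opener` are hypothetical.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request for Azure Resource Manager."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # Application ID of the registered app
        "client_secret": client_secret,  # client secret value
        "resource": "https://management.azure.com/",
    }).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def build_opener(proxy_host="none", proxy_port="none"):
    """Return a URL opener; route HTTPS via the proxy unless the host is 'none'."""
    if proxy_host == "none":
        return urllib.request.build_opener()
    proxy = f"http://{proxy_host}:{proxy_port}"
    return urllib.request.build_opener(urllib.request.ProxyHandler({"https": proxy}))


# Placeholders only; substitute real values before sending anything.
req = build_token_request("<tenant-id>", "<application-id>", "<client-secret>")
opener = build_opener("none", "none")
print(req.full_url)
print(req.get_method())
```

Sending the request with `opener.open(req)` would return a JSON document whose `access_token` field is then passed as a bearer token on subsequent Azure ARM REST API calls.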
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Status |
Indicates whether/not this IoT hub is available. |
|
The values reported by this measure and its numeric equivalents are mentioned in the table below:
Note: By default, this measure reports the Measure Values listed in the table above to indicate the current status of an IoT hub. In the graph of this measure however, the same is represented using the numeric equivalents only. Use the detailed diagnosis of this measure to know the location, SKU, tier and capacity of the IoT hub. |
||||||||||
Provisioning status |
Indicates the current provisioning status of this IoT hub. |
|
The values reported by this measure and its numeric equivalents are mentioned in the table below:
Note: By default, this measure reports the Measure Values listed in the table above to indicate the current provisioning status. In the graph of this measure however, the same is represented using the numeric equivalents only. |
||||||||||
Telemetry message send attempts |
Indicates the number of device-to-cloud telemetry messages attempted to be sent to this IoT hub. |
Number |
Make sure that the value of this measure is within the prescribed limits. If not, then throttling errors will occur, causing messages to be dropped. |
||||||||||
Telemetry messages sent |
Indicates the number of device-to-cloud telemetry messages sent successfully to this IoT hub. |
Number |
|
||||||||||
Commands completed |
Indicates the number of cloud-to-device message deliveries completed successfully by this IoT hub. |
Number |
To guarantee at-least-once message delivery, your IoT hub persists cloud-to-device messages in per-device queues. For the IoT hub to remove the messages from the queue, the devices must explicitly acknowledge completion. This approach guarantees resiliency against connectivity and device failures. |
||||||||||
Commands abandoned |
Indicates the number of cloud-to-device commands abandoned by the devices attached to this IoT hub. |
Number |
When a device abandons a message, the IoT hub puts the message back in the queue, with its state set to Enqueued. |
||||||||||
Commands rejected |
Indicates the number of cloud-to-device commands rejected by the devices attached to this hub. |
Number |
Ideally, the value of this measure should be 0. A non-zero value implies that one or more messages were rejected. If a message is rejected, the IoT hub sets it to the Dead lettered state. |
||||||||||
Total devices |
Indicates the total number of devices registered with this hub. |
Number |
The total number of devices plus modules that can be registered to a single IoT hub is capped at 1,000,000. If the value of this measure is equal to this cap, the performance of the hub will be compromised. To avoid this, you may want to increase the limit by contacting Microsoft support. |
||||||||||
Connected devices |
Indicates the number of devices currently connected to this hub. |
Number |
Ideally, the value of this measure should be the same as that of the Total devices measure. If the gap between these two measures is large, it could mean that many registered devices are currently disconnected from the hub. |
||||||||||
Telemetry messages delivered |
Indicates the number of telemetry messages that this hub successfully delivered to endpoints. |
Number |
|
||||||||||
Dropped messages |
Indicates the total count of messages dropped by this hub. |
Number |
Typically, a message is dropped because it did not match any routing query, or because the endpoint was dead and the message could not be delivered after several retries. Ideally, the value of this measure should be 0. |
||||||||||
Orphaned messages |
Indicates the number of messages that this hub orphaned. |
Number |
Orphaned messages are those that do not match any routes, including fallback routes. Ideally, the value of this measure should be 0. |
||||||||||
Invalid messages |
Indicates the number of messages that this hub could not deliver. |
Number |
A message is considered invalid if it is incompatible with the endpoint. If this measure reports a non-zero value, then you may want to check the configuration of endpoints for incompatibilities. |
||||||||||
Messages matching fallback condition |
Indicates the number of messages that this hub has written to the fallback endpoint. |
Number |
If message routing is turned on, you can enable the fallback route capability. Once a route is created, data stops flowing to the built-in endpoint, unless a route is created to that endpoint. If there are no routes to the built-in endpoint and a fallback route is enabled, only messages that don't match any query conditions on routes will be sent to the built-in endpoint. Also, if all existing routes are deleted, the fallback route must be enabled to receive all data at the built-in endpoint. If this measure reports a non-zero value, it could imply that either no routes have been created to the built-in endpoint, or some messages did not match the query conditions of any of the routes created. |
||||||||||
Messages delivered to event hub endpoints |
Indicates the number of messages that this hub delivered to event hub endpoints. |
Number |
Apart from the built-in Event Hubs-compatible endpoint, you can also route data to custom endpoints of type Event Hubs. |
||||||||||
Message latency for event hub endpoints |
Indicates the average latency between message ingress to this IoT hub and message ingress into custom endpoints of type Event Hub. |
Milliseconds |
If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest. |
||||||||||
Messages delivered to service bus queue endpoints |
Indicates the number of messages that this hub delivered to service bus queue endpoints. |
Number |
|
||||||||||
Message latency for service bus queue endpoints |
Indicates the average latency between message ingress to this IoT hub and message ingress into a service bus queue endpoint. |
Milliseconds |
If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest. |
||||||||||
Messages delivered to service bus topic endpoint |
Indicates the number of messages that this hub delivered to service bus topic endpoints. |
Number |
|
||||||||||
Message latency for service bus topic endpoints |
Indicates the average latency between message ingress to this IoT hub and message ingress into a service bus topic endpoint. |
Milliseconds |
If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest. |
||||||||||
Messages delivered to the built-in endpoint |
Indicates the number of messages that this hub delivered to the built-in endpoint. |
Number |
|
||||||||||
Message latency for the built-in endpoint |
Indicates the average latency between message ingress to this IoT hub and message ingress into the built-in endpoint and fallback route. |
Milliseconds |
If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest. |
||||||||||
Messages delivered to storage endpoints |
Indicates the number of messages that this hub delivered to storage endpoints. |
Number |
|
||||||||||
Message latency for storage endpoints |
Indicates the average latency between message ingress to this IoT hub and message ingress into a storage endpoint. |
Milliseconds |
If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest. |
||||||||||
Data written to storage |
Indicates the amount of data this IoT hub delivered to storage endpoints. |
MB |
|
||||||||||
Blobs written to storage |
Indicates the number of times this hub delivered blobs to storage endpoints. |
Number |
|
||||||||||
Successful twin reads from devices |
Indicates the count of all successful device-initiated twin reads from this hub. |
Number |
|
||||||||||
Failed twin reads from devices |
Indicates the count of all failed device-initiated twin reads from this hub. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Response size of twin reads from devices |
Indicates the average response size of all successful device-initiated twin reads from this hub. |
MB |
|
||||||||||
Successful twin updates from devices |
Indicates the count of all successful device-initiated twin updates to this hub. |
Number |
|
||||||||||
Failed twin updates from devices |
Indicates the count of all failed device-initiated twin updates to this hub. |
Number |
|
||||||||||
Size of twin updates from devices |
Indicates the average size of all successful device-initiated twin updates to this hub. |
MB |
|
||||||||||
Successful direct method invocations |
Indicates the count of all successful direct method calls made by this hub. |
Number |
|
||||||||||
Failed direct method invocations |
Indicates the count of all failed direct method calls made by this hub. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Request size of direct method invocations |
Indicates the average request size of Cloud to Device method invocations made by this hub. |
MB |
|
||||||||||
Response size of direct method invocations |
Indicates the average response size of Cloud to Device method invocations made by this hub. |
MB |
|
||||||||||
Successful twin reads from back end |
Indicates the count of all successful back-end-initiated twin reads processed by this hub. |
Number |
|
||||||||||
Failed twin reads from back end |
Indicates the count of all failed back-end-initiated twin reads processed by this hub. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Response size of twin reads from back end |
Indicates the average response size of backend-initiated twin reads processed by this hub. |
MB |
|
||||||||||
Successful twin updates from back end |
Indicates the total count of successful back-end-initiated twin updates processed by this hub. |
Number |
|
||||||||||
Failed twin updates from back end |
Indicates the total count of failed back-end-initiated twin updates processed by this hub. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Size of twin updates from back end |
Indicates the average size of back-end-initiated twin updates processed by this hub. |
MB |
|
||||||||||
Successful twin queries |
Indicates the total number of twin queries processed by this hub that were successful. |
Number |
|
||||||||||
Failed twin queries |
Indicates the total number of twin queries processed by this hub that failed. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Twin queries result size |
Indicates the average result size of successful twin queries processed by this hub. |
MB |
|
||||||||||
Successful creations of twin update jobs |
Indicates the number of twin update jobs this hub successfully created. |
Number |
|
||||||||||
Failed creations of twin update jobs |
Indicates the number of twin update jobs this hub could not create. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Successful creations of method invocation jobs |
Indicates the number of direct method invocation jobs successfully created by this hub. |
Number |
|
||||||||||
Failed creations of method invocation jobs |
Indicates the number of direct method invocation jobs this hub could not create. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Successful calls to list jobs |
Indicates the number of calls made by this hub to list jobs that succeeded. |
Number |
|
||||||||||
Failed calls to list jobs |
Indicates the number of calls made by this hub to list jobs that failed. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Successful job cancellations |
Indicates the number of calls made by this hub to cancel jobs that succeeded. |
Number |
|
||||||||||
Failed job cancellations |
Indicates the number of calls made by this hub to cancel jobs that failed. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Successful job queries |
Indicates the total count of calls to query jobs that were successfully processed by this hub. |
Number |
|
||||||||||
Failed job queries |
Indicates the total count of calls to query jobs that this hub failed to process. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Completed jobs |
Indicates the total count of jobs completed by this hub. |
Number |
|
||||||||||
Failed jobs |
Indicates the total count of jobs processed by this hub that failed. |
Number |
Ideally, the value of this measure should be 0. |
||||||||||
Throttling errors |
Indicates the number of throttling errors encountered by this hub. |
Number |
Throttling errors occur if an IoT hub's throttling limits have been exceeded for the requested operation. Ideally, the value of this measure should be 0. However, if this measure reports a non-zero value, then check if you are hitting the throttling limit by comparing your Telemetry message send attempts metric against the limits set. Also, note that throttling errors occur only after the limit has been violated for too long a period. This is done so that your messages are not dropped if your IoT hub gets burst traffic. In the meantime, IoT hub processes the messages at the operation throttle rate, which might be slow if there is too much traffic in the backlog. |
||||||||||
Total messages used |
Indicates the total number of messages used by this hub today. |
Number |
This is a cumulative value that is reset to zero at the beginning of each day. |
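The interpretation guidance in the table above can be turned into simple checks. The following sketch assumes the measures have already been collected into a plain dictionary keyed by measure name; the `review` and `slowest_endpoint` helpers, the sample values, and the 50% connected-device threshold are hypothetical and only illustrate the guidance.

```python
DEVICE_CAP = 1_000_000  # devices plus modules per hub, as stated in the table

# Measures for which the table says the ideal value is 0 (a subset, for brevity).
ZERO_IDEAL = [
    "Commands rejected",
    "Dropped messages",
    "Orphaned messages",
    "Failed jobs",
    "Throttling errors",
]


def review(measures: dict, min_connected_ratio: float = 0.5) -> list:
    """Return human-readable findings for one IoT hub's measures."""
    findings = []
    for name in ZERO_IDEAL:
        if measures.get(name, 0) > 0:
            findings.append(f"{name} is non-zero ({measures[name]})")
    total = measures.get("Total devices", 0)
    connected = measures.get("Connected devices", 0)
    if total >= DEVICE_CAP:
        findings.append("Total devices has reached the registration cap")
    # The ratio threshold is an arbitrary illustration of a 'large gap'.
    if total and connected / total < min_connected_ratio:
        findings.append(f"Only {connected} of {total} registered devices are connected")
    return findings


def slowest_endpoint(latencies_ms: dict) -> str:
    """Return the endpoint type with the highest average routing latency."""
    return max(latencies_ms, key=latencies_ms.get)


sample = {
    "Commands rejected": 2,
    "Throttling errors": 0,
    "Total devices": 1000,
    "Connected devices": 400,
}
for finding in review(sample):
    print(finding)
print(slowest_endpoint({"event hub": 120, "storage": 900, "built-in": 80}))
```

Comparing the latency measures side by side, as `slowest_endpoint` does, mirrors the advice to check whether deliveries to one endpoint type are much slower than the rest.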