Azure ADF Activity Runs by Pipeline Test
A Data Factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. For example, a pipeline could contain a set of activities that ingest and clean log data, and then kick off a mapping data flow to analyze the log data. The pipeline allows you to manage the activities as a set instead of each one individually.
When a pipeline is executed, each activity within that pipeline generates an activity run. This is a record of that activity’s execution, including its status (e.g., succeeded, failed, in progress), start and end times, and any error messages if it fails. This test monitors these activity runs and alerts administrators to failed and cancelled runs that could affect the pipeline.
Target of the test : Microsoft Azure Data Factory
Agent deploying the test : An external agent
Outputs of the test : One set of results for each Resource Group : Pipeline in the target Microsoft Azure Data Factory.
Parameters | Description |
---|---|
Test Period |
How often the test should be executed. |
Host |
The host for which the test is to be configured. |
Subscription ID |
Specify the GUID which uniquely identifies the Microsoft Azure Subscription to be monitored. To know the ID that maps to the target subscription, do the following:
|
Tenant ID |
Specify the Directory ID of the Azure AD tenant to which the target subscription belongs. To know how to determine the Directory ID, refer to Configuring the eG Agent to Monitor Microsoft Azure Data Factory Using Azure ARM REST API |
Client ID, Client Password, and Confirm Password |
To connect to the target subscription, the eG agent requires an Access token in the form of an Application ID and the client secret value. For this purpose, you should register a new application with the Azure AD tenant. To know how to create such an application and determine its Application ID and client secret, refer to Configuring the eG Agent to Monitor Microsoft Azure Data Factory Using Azure ARM REST API. Specify the Application ID of the created Application in the Client ID text box and the client secret value in the Client Password text box. Confirm the Client Password by retyping it in the Confirm Password text box. |
Proxy Host |
In some environments, all communication with the Azure cloud could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy. |
Proxy Username, Proxy Password and Confirm Password |
If the proxy server requires authentication, then, specify a valid proxy user name and password in the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. |
DD Frequency |
Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD Frequency. |
Detailed Diagnosis |
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
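The Tenant ID, Client ID, Client Password, and proxy parameters described above map naturally onto an OAuth 2.0 client-credentials request. The following Python sketch only illustrates that mapping; it is not the eG agent's implementation. It assumes the Microsoft identity platform v2.0 token endpoint and the Azure Resource Manager default scope, and the helper names `build_token_request` and `build_proxies` are hypothetical. No network call is made.

```python
from urllib.parse import urlencode

# Assumed scope for Azure Resource Manager (ARM) REST API access.
ARM_SCOPE = "https://management.azure.com/.default"


def build_token_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for a client-credentials token request.

    tenant_id     -> the Tenant ID parameter (Directory ID)
    client_id     -> the Client ID parameter (Application ID)
    client_secret -> the Client Password parameter (client secret value)
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": ARM_SCOPE,
    })
    return url, body


def build_proxies(proxy_host, proxy_port):
    """Return a proxy mapping, or {} when either parameter is 'none'."""
    if str(proxy_host).lower() == "none" or str(proxy_port).lower() == "none":
        return {}
    address = f"http://{proxy_host}:{proxy_port}"
    return {"http": address, "https": address}
```

The URL and form body would be sent as an HTTP POST by whichever client is in use, and the returned proxy mapping has the shape that common Python HTTP clients accept. An empty mapping corresponds to the default none setting of Proxy Host and Proxy Port.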
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Queued activity runs |
Indicates the number of activity runs that were queued in this pipeline. |
Number |
The value of this measure should be low. As more activities are queued, the overall pipeline execution time will increase, as the activities have to wait for prior runs to complete or for sufficient resources to become available. |
Inprogress activity runs |
Indicates the number of activity runs that were in progress in this pipeline. |
Number |
The value of this measure should be low. If there are too many in-progress activities consuming resources or taking longer than expected to complete, it can increase the risk of timeouts for long-running activities or the entire pipeline. Timeouts are typically due to insufficient resources or activities waiting too long for completion, leading to failure. |
Succeeded activity runs |
Indicates the number of activity runs that succeeded in this pipeline. |
Number |
Ideally, the value of this measure should be significantly high. |
Failed runs |
Indicates the number of activity runs that failed in this pipeline. |
Number |
The value of this measure should be low. A large number of failed runs in Azure Data Factory (ADF) can have a significant impact on pipelines, resources, and the overall workflow. Failed runs are instances where an activity or a pipeline execution fails to complete successfully due to issues like incorrect configurations, resource exhaustion, or connectivity problems. |
Cancelling runs |
Indicates the number of cancelling runs in this pipeline. |
Number |
The value of this measure should be low. In Azure Data Factory (ADF), cancelling a run refers to the action of stopping a pipeline or activity that is currently in progress or queued for execution. When you cancel a run, ADF will attempt to halt the execution of the activities in the pipeline, preventing any further progress for the current execution. |
Cancelled runs |
Indicates the number of activity runs that were cancelled in this pipeline. |
Number |
The value of this measure should be low. When a pipeline run is cancelled, all dependent activities that were in progress will be stopped, and any downstream tasks that rely on the results of those activities might fail as a result. Frequent cancellations can lead to cascading failures in dependent activities, which may require manual intervention to reset or restart the pipeline. |
Long running runs |
Indicates the number of activity runs that were long-running in this pipeline. |
Number |
The value of this measure should be low. If many activities are running for long periods, other pipelines or activities that are waiting to be executed may be queued up. This can create a backlog, delaying the execution of subsequent tasks. |
Total runs |
Indicates the total number of activity runs in this pipeline. |
Number |
This measure alerts on activity run failures and queued activity runs for each pipeline. This measure also helps the administrators to identify the pipelines with maximum activity run failures. |
Availability |
Indicates the percentage of activity runs that were available in this pipeline. |
Percent |
|
Total copy duration |
Indicates the time taken by the activity run in this pipeline for copy process. |
Seconds |
The value of this measure should be low. If the total copy duration is unexpectedly long, it could be challenging to pinpoint where the bottleneck is occurring. Is the source data slow to retrieve? Is the network or infrastructure causing delays? Without clear diagnostics, troubleshooting can become more time-consuming. |
Data flow execution duration |
Indicates the time taken by the activity run in this pipeline for executing data flow. |
Seconds |
|
Data reads |
Indicates the number of data reads performed by the activity run in this pipeline. |
Number |
|
Data writes |
Indicates the number of data writes performed by the activity run in this pipeline. |
Number |
|
Failed runs rate |
Indicates the rate at which the activity runs failed in this pipeline. |
Percent |
Ideally, the value of this measure should be low. |
Total queue time |
Indicates the total queue time taken by the activity runs in this pipeline. |
Seconds |
|
Average queue time |
Indicates the average queue time taken by the activity runs in this pipeline. |
Seconds |
|
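To make the relationship between the count, rate, and queue-time measures above concrete, the following Python sketch tallies a list of activity-run records for one pipeline. It is an illustration under stated assumptions, not the test's actual logic: the record layout (`status`, `queue_seconds`), the status labels (which simply mirror the measure names in the table), and the formulas for Failed runs rate (failed runs as a percentage of total runs) and Average queue time (total queue time divided by total runs) are all assumed.

```python
from collections import Counter


def summarize_activity_runs(runs):
    """Summarize hypothetical activity-run records for a single pipeline.

    Each record is a dict with an assumed 'status' label and an assumed
    'queue_seconds' value; neither name comes from the product itself.
    """
    counts = Counter(run["status"] for run in runs)
    total = len(runs)
    failed = counts.get("Failed", 0)
    total_queue = sum(run.get("queue_seconds", 0) for run in runs)
    return {
        "queued": counts.get("Queued", 0),
        "in_progress": counts.get("InProgress", 0),
        "succeeded": counts.get("Succeeded", 0),
        "failed": failed,
        "cancelling": counts.get("Cancelling", 0),
        "cancelled": counts.get("Cancelled", 0),
        "total": total,
        # Assumed formula: failed runs as a percentage of all runs.
        "failed_rate_pct": 100.0 * failed / total if total else 0.0,
        "total_queue_seconds": total_queue,
        # Assumed formula: mean queue time across all runs.
        "avg_queue_seconds": total_queue / total if total else 0.0,
    }


# Made-up sample data for illustration only.
sample_runs = [
    {"status": "Succeeded", "queue_seconds": 2},
    {"status": "Succeeded", "queue_seconds": 4},
    {"status": "Failed", "queue_seconds": 6},
    {"status": "Queued", "queue_seconds": 8},
]
summary = summarize_activity_runs(sample_runs)
```

With the sample data, one failed run out of four gives a failed-run rate of 25 percent, and 20 seconds of total queue time gives an average of 5 seconds per run. The guard on `total` avoids a division by zero for a pipeline with no activity runs in the measurement period.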