Azure ADF Activity Runs by Pipeline Test

A Data Factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. For example, a pipeline could contain a set of activities that ingest and clean log data, and then kick off a mapping data flow to analyze the log data. The pipeline allows you to manage the activities as a set instead of each one individually.

When a pipeline is executed, each activity within that pipeline generates an activity run. This is a record of that activity’s execution, including its status (e.g., succeeded, failed, in progress), start and end times, and any error messages if it fails. This test monitors those activity runs and alerts administrators to failed and cancelled runs that could affect the pipeline.

Target of the test : Microsoft Azure Data Factory

Agent deploying the test : An external agent

Outputs of the test : One set of results for each Resource Group : Pipeline in the target Microsoft Azure Data Factory.
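As a rough illustration of where activity run records come from, the sketch below builds the Azure Resource Manager request that lists the activity runs of one pipeline run. It assumes the public queryActivityruns operation with API version 2018-06-01. This document does not state which calls the eG agent actually makes, and the function name, the default one-hour window, and all argument values are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-06-01"  # assumed Data Factory REST API version


def activity_runs_query(subscription_id, resource_group, factory, run_id,
                        window_minutes=60, now=None):
    """Return (url, json_body) for a queryActivityruns POST request.

    Illustrative only: nothing is sent over the network here.
    """
    now = now or datetime.now(timezone.utc)
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory}"
        f"/pipelineruns/{run_id}/queryActivityruns"
        f"?api-version={API_VERSION}"
    )
    # The operation filters runs by their last-updated time window.
    body = {
        "lastUpdatedAfter": (now - timedelta(minutes=window_minutes)).isoformat(),
        "lastUpdatedBefore": now.isoformat(),
    }
    return url, json.dumps(body)
```

The request would be sent with a bearer token in the Authorization header; the resource group and pipeline names in the response are what the test reports as Resource Group : Pipeline descriptors.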

Configurable parameters for the test

Parameters Description

Test Period

How often should the test be executed.

Host

The host for which the test is to be configured.

Subscription ID

Specify the GUID which uniquely identifies the Microsoft Azure Subscription to be monitored. To know the ID that maps to the target subscription, do the following:

  1. Log in to the Microsoft Azure Portal.

  2. When the portal opens, click on the Subscriptions option (as indicated by Figure 1).

    Figure 1 : Clicking on the Subscriptions option

  3. Figure 2, which appears next, lists all the subscriptions that have been configured for the target Azure AD tenant. Locate the subscription that is being monitored in the list, and check the value displayed for that subscription in the Subscription ID column.

    Figure 2 : Determining the Subscription ID

  4. Copy the Subscription ID in Figure 2 to the text box corresponding to the SUBSCRIPTION ID parameter in the test configuration page.
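Because the Subscription ID is a GUID, a copied value can be sanity-checked before it is pasted into the test configuration page. The following is a minimal sketch that uses only the Python standard library; the function name is hypothetical and the check is not part of the eG product.

```python
import uuid


def is_valid_subscription_id(value):
    """Return True if value is a GUID in canonical hyphenated form."""
    try:
        text = value.strip()
        parsed = uuid.UUID(text)
    except (ValueError, AttributeError):
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex digits,
    # so compare against the canonical 8-4-4-4-12 representation.
    return str(parsed) == text.lower()
```

For example, a value copied with a trailing space still passes, while a value with its hyphens stripped is rejected.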

Tenant ID

Specify the Directory ID of the Azure AD tenant to which the target subscription belongs. To know how to determine the Directory ID, refer to Configuring the eG Agent to Monitor Microsoft Azure Data Factory Using Azure ARM REST API.

Client ID, Client Password, and Confirm Password

To connect to the target subscription, the eG agent requires an Access token in the form of an Application ID and the client secret value. For this purpose, you should register a new application with the Azure AD tenant. To know how to create such an application and determine its Application ID and client secret, refer to Configuring the eG Agent to Monitor Microsoft Azure Data Factory Using Azure ARM REST API. Specify the Application ID of the created Application in the Client ID text box and the client secret value in the Client Password text box. Confirm the Client Password by retyping it in the Confirm Password text box.
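To show how an Application ID and client secret are typically exchanged for an access token, the sketch below builds a client credentials request against the Microsoft identity platform token endpoint. This is an assumption about the general mechanism, not a description of the eG agent's internals; the function name is hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 client credentials request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # Application ID of the registered app
        "client_secret": client_secret,  # client secret value
        # Scope for Azure Resource Manager, which fronts the ARM REST API.
        "scope": "https://management.azure.com/.default",
    }).encode("ascii")
    return Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to urllib.request.urlopen would submit the request; a successful response carries the token in its access_token field.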

Proxy Host

In some environments, all communication with the Azure cloud could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.

Proxy Username, Proxy Password and Confirm Password

If the proxy server requires authentication, specify a valid proxy user name and password in the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box.
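The effect of the proxy parameters can be pictured with a small sketch that turns a host, port, and optional credentials into a proxy URL and an HTTP opener. It assumes an HTTP proxy and treats the literal value none as "not configured" for the user name as well, which this document states only for Proxy Host and Proxy Port; both function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import ProxyHandler, build_opener


def proxy_url(host, port, username="none", password="none"):
    """Return a proxy URL, or None when no proxy is configured."""
    if str(host).lower() == "none":
        return None
    auth = ""
    if str(username).lower() != "none":
        # Percent-encode credentials so characters such as '@' stay unambiguous.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"http://{auth}{host}:{port}"


def opener_for(url):
    """Build an opener that routes traffic via url, or directly if url is None."""
    # An empty mapping tells ProxyHandler to bypass any auto-detected proxy.
    proxies = {"http": url, "https": url} if url else {}
    return build_opener(ProxyHandler(proxies))
```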

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD Frequency.

Detailed Diagnosis

To make diagnosis more efficient and accurate, the eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability.
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

Measurements made by the test

Measurement Description Measurement Unit Interpretation

Queued activity runs

Indicates the number of activity runs that were queued in this pipeline.

Number

The value of this measure should be low.

As more activities are queued, the overall pipeline execution time will increase, as the activities have to wait for prior runs to complete or for sufficient resources to become available.

Inprogress activity runs

Indicates the number of activity runs that were in progress in this pipeline.

Number

The value of this measure should be low.

If there are too many in-progress activities consuming resources or taking longer than expected to complete, it can increase the risk of timeouts for long-running activities or the entire pipeline. Timeouts are typically due to insufficient resources or activities waiting too long for completion, leading to failure.

Succeeded activity runs

Indicates the number of activity runs that succeeded in this pipeline.

Number

Ideally, the value of this measure should be significantly high.

Failed runs

Indicates the number of activity runs that failed in this pipeline.

Number

The value of this measure should be low.

A large number of failed runs in Azure Data Factory (ADF) can have a significant impact on pipelines, resources, and the overall workflow. Failed runs are instances where an activity or a pipeline execution fails to complete successfully due to issues like incorrect configurations, resource exhaustion, or connectivity problems.

Cancelling runs

Indicates the number of activity runs that were in the process of being cancelled in this pipeline.

Number

The value of this measure should be low.

In Azure Data Factory (ADF), cancelling a run refers to the action of stopping a pipeline or activity that is currently in progress or queued for execution. When you cancel a run, ADF will attempt to halt the execution of the activities in the pipeline, preventing any further progress for the current execution.

Cancelled runs

Indicates the number of activity runs that were cancelled in this pipeline.

Number

The value of this measure should be low.

When a pipeline run is cancelled, all dependent activities that were in progress will be stopped, and any downstream tasks that rely on the results of those activities might fail as a result. Frequent cancellations can lead to cascading failures in dependent activities, which may require manual intervention to reset or restart the pipeline.

Long running runs

Indicates the number of activity runs that were long-running in this pipeline.

Number

The value of this measure should be low.

If many activities are running for long periods, other pipelines or activities that are waiting to be executed may be queued up. This can create a backlog, delaying the execution of subsequent tasks.

Total runs

Indicates the total number of activity runs in this pipeline.

Number

This measure alerts on activity run failures and queued activity runs for each pipeline. This measure also helps the administrators to identify the pipelines with maximum activity run failures.

Availability

Indicates the percentage of activity runs that were available in this pipeline.

Percent

 

Total copy duration

Indicates the time taken by the activity run in this pipeline for the copy process.

Seconds

The value of this measure should be low.

If the total copy duration is unexpectedly long, it could be challenging to pinpoint where the bottleneck is occurring. Is the source data slow to retrieve? Is the network or infrastructure causing delays? Without clear diagnostics, troubleshooting can become more time-consuming.

Data flow execution duration

Indicates the time taken by the activity run in this pipeline to execute the data flow.

Seconds

 

Data reads

Indicates the number of data reads performed by the activity run in this pipeline.

Number

 

Data writes

Indicates the number of data writes performed by the activity run in this pipeline.

Number

 

Failed runs rate

Indicates the percentage of activity runs that failed in this pipeline.

Percent

Ideally, the value of this measure should be low.
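The run-count measures and the failure percentage can be reproduced from a list of activity run statuses, as sketched below. The status strings and the formula (failed runs divided by total runs, times 100) are assumptions made for illustration; this document does not state how the test computes the value, and the function name is hypothetical.

```python
from collections import Counter


def summarize_runs(statuses):
    """Count activity runs per status and derive the failed-run percentage."""
    counts = Counter(statuses)
    total = sum(counts.values())
    failed = counts.get("Failed", 0)
    return {
        "total": total,
        "queued": counts.get("Queued", 0),
        "in_progress": counts.get("InProgress", 0),
        "succeeded": counts.get("Succeeded", 0),
        "failed": failed,
        "cancelling": counts.get("Canceling", 0),
        "cancelled": counts.get("Cancelled", 0),
        # Guard against division by zero when no runs were recorded.
        "failed_rate_percent": (100.0 * failed / total) if total else 0.0,
    }
```

For instance, seven succeeded runs, two failed runs, and one queued run give a total of 10 and a failed-run percentage of 20.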

Total queue time

Indicates the total time that the activity runs in this pipeline spent waiting in the queue.

Seconds

 

Average queue time

Indicates the average time that an activity run in this pipeline spent waiting in the queue.

Seconds
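
The relationship between the two queue-time measures can be illustrated with a short calculation. The sketch assumes that the average is the total queue time divided by the number of activity runs, which this document does not state explicitly; the function name and input values are hypothetical.

```python
def queue_time_stats(queue_seconds):
    """Return (total, average) queue time in seconds for a list of runs."""
    total = sum(queue_seconds)
    # An empty list means no runs were observed, so report an average of 0.
    average = total / len(queue_seconds) if queue_seconds else 0.0
    return total, average
```

With per-run queue times of 2, 4, and 9 seconds, the total is 15 seconds and the average is 5 seconds.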