Azure Data Factory Test
Azure Data Factory (ADF) is a cloud-based data integration service provided by Microsoft Azure. ADF allows users to connect to a wide range of data sources, both on-premises and in the cloud, including databases, file systems, and SaaS applications. Data can be moved from one location to another using activities such as copying data from a source to a destination. ADF also supports data transformation using data flows, which perform complex transformations on data as it moves between sources and destinations. A pipeline is a logical grouping of activities that carries out the data movement and data transformation. When a pipeline is executed, each activity within that pipeline generates an activity run: a record of that activity's execution, including its status (e.g., succeeded, failed, in progress), its start and end times, and any error messages if it fails.
This test reports the failed trigger runs, failed pipeline runs, and failed pipeline activity runs that occurred during pipeline execution. Early detection of these failures helps administrators troubleshoot issues quickly, reducing downtime and minimizing the impact on data availability and quality. The test also helps identify bottlenecks and performance issues, so that pipeline activities can be optimized for faster data processing and improved efficiency. (An illustrative sketch of how such run records can be queried appears below; a companion sketch covering the underlying Azure Monitor metrics appears after the measurements table.)
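For orientation, the sketch below shows how the pipeline runs and activity runs described above can be listed with the Azure SDK for Python (azure-identity and azure-mgmt-datafactory). It is a minimal illustration of the underlying ARM API shape, not the eG agent's own collection logic; the tenant, client, subscription, resource group and factory values are placeholders.

```python
# Minimal sketch: list recent failed pipeline runs and their activity runs for one
# Data Factory via the ARM API. All identifiers below are placeholders; this is an
# illustration of the API, not the eG agent's implementation.
from datetime import datetime, timedelta, timezone

from azure.identity import ClientSecretCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

credential = ClientSecretCredential(
    tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<client-secret>"
)
adf = DataFactoryManagementClient(credential, "<subscription-id>")

now = datetime.now(timezone.utc)
failed_only = RunFilterParameters(
    last_updated_after=now - timedelta(hours=1),
    last_updated_before=now,
    filters=[RunQueryFilter(operand="Status", operator="Equals", values=["Failed"])],
)

runs = adf.pipeline_runs.query_by_factory("<resource-group>", "<factory-name>", failed_only)
for run in runs.value:
    # A pipeline run record carries status, start/end times and an error message.
    print(run.pipeline_name, run.status, run.run_start, run.run_end, run.message)

    # Each activity in the pipeline produces its own activity run record.
    activities = adf.activity_runs.query_by_pipeline_run(
        "<resource-group>", "<factory-name>", run.run_id, failed_only
    )
    for act in activities.value:
        print("  ", act.activity_name, act.status, act.error)
```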
Target of the test : Microsoft Azure Data Factory
Agent deploying the test : An external agent
Outputs of the test : One set of results for the target Microsoft Azure Data Factory.
Parameters | Description |
---|---|
Test Period |
How often should the test be executed. |
Host |
The host for which the test is to be configured. |
Subscription ID |
Specify the GUID which uniquely identifies the Microsoft Azure Subscription to be monitored. To know the ID that maps to the target subscription, do the following:
|
Tenant ID |
Specify the Directory ID of the Azure AD tenant to which the target subscription belongs. To know how to determine the Directory ID, refer to Configuring the eG Agent to Monitor Microsoft Azure Data Factory Using Azure ARM REST API |
Client ID, Client Password, and Confirm Password |
To connect to the target subscription, the eG agent requires an access token, which it obtains using an Application ID and a client secret value. For this purpose, you should register a new application with the Azure AD tenant. To know how to create such an application and determine its Application ID and client secret, refer to Configuring the eG Agent to Monitor Microsoft Azure Data Factory Using Azure ARM REST API. Specify the Application ID of the registered application in the Client ID text box and the client secret value in the Client Password text box. Confirm the Client Password by retyping it in the Confirm Password text box. (An illustrative sketch of this client-credentials flow appears after this table.) |
Proxy Host |
In some environments, all communication with the Azure cloud may be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server when collecting metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the Proxy Host and Proxy Port parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy. |
Proxy Username, Proxy Password and Confirm Password |
If the proxy server requires authentication, then specify a valid proxy user name and password in the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box. |
DD Frequency |
Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD Frequency. |
Detailed Diagnosis |
To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option. The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:
|
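The connection parameters above correspond to a standard Azure AD client-credentials flow against Azure Resource Manager. The snippet below is a hedged illustration of how the Tenant ID, Client ID, client secret and an optional proxy are typically combined with the Azure SDK for Python; it describes the general mechanism only, not the eG agent's internals, and every value shown is a placeholder.

```python
# Illustrative only: how the Tenant ID, Client ID and client secret from the table
# above are typically used to obtain an ARM access token, with an optional proxy.
# All values are placeholders; this is not the eG agent's internal code.
import os

from azure.identity import ClientSecretCredential

# Optional proxy: the SDK's default transport honours the standard proxy environment
# variables, which mirrors the Proxy Host / Proxy Port / Proxy Username parameters.
os.environ["HTTPS_PROXY"] = "http://proxy-user:proxy-password@proxy-host:8080"

credential = ClientSecretCredential(
    tenant_id="<directory-id-of-the-azure-ad-tenant>",
    client_id="<application-id-of-the-registered-app>",
    client_secret="<client-secret-value>",
)

# Request a bearer token for Azure Resource Manager, the API family this test uses.
token = credential.get_token("https://management.azure.com/.default")
print("token acquired, expires at (epoch):", token.expires_on)
```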
Measurement | Description | Measurement Unit | Interpretation | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Longer pipeline runs |
Indicates the number of pipeline runs that consume more processing time than expected. |
Number |
The value of this measure should be low. Large datasets can significantly increase processing time. The more data being moved or transformed, the longer the pipeline will take. |
||||||||||||
Failed pipeline runs |
Indicates the number of pipeline runs that failed. |
Number |
Failed pipeline runs in Azure Data Factory (ADF) can disrupt data workflows and impact downstream processes. An alert will be triggered whenever the total Failed pipeline runs metric is greater than 0. |
||||||||||||
Succeeded pipeline runs |
Indicates the number of pipeline runs that succeeded. |
Number |
Ideally, the value of this measure should be significantly high. |
||||||||||||
Cancelled pipeline runs |
Indicates the number of pipeline runs that were cancelled. |
Number |
The value of these measures should be low. The pipeline and activity runs will be cancelled when a user executes a manual cancellation through the Azure portal or API. Also, if a parent pipeline is cancelled, any child pipelines or activities may also get cancelled. |
||||||||||||
Cancelled activity runs |
Indicates the number of activity runs that were cancelled. |
Number |
|||||||||||||
Failed activity runs |
Indicates the number of activity runs that failed. |
Number |
Failed activity runs in Azure Data Factory (ADF) can disrupt data workflows and complicate data processing. |
||||||||||||
Succeeded activity runs |
Indicates the number of activity runs that succeeded. |
Number |
Ideally, the value of this measure should be significantly high. |
||||||||||||
Failed trigger runs |
Indicates the number of trigger runs that failed. |
Number |
Failed trigger runs in Azure Data Factory (ADF) can lead to disruptions in scheduled data workflows. |
||||||||||||
Succeeded trigger runs |
Indicates the number of trigger runs that succeeded. |
Number |
Ideally, the value of this measure should be significantly high. |
||||||||||||
Cancelled trigger runs |
Indicates the number of trigger runs that were cancelled. |
Number |
The value of this measure should be low. If a parent pipeline is cancelled, any active trigger runs associated with that pipeline may also be cancelled. |
||||||||||||
Failed starts |
Indicates the number of start operations that failed. |
Number |
Failed starts in Azure Data Factory (ADF) refer to scenarios where triggers or pipelines fail to initiate successfully. This can disrupt data workflows and impact scheduled tasks. |
||||||||||||
Succeeded starts |
Indicates the number of start operations that succeeded. |
Number |
This measure is used to identify resource utilization and failed executions when SSIS is integrated with Azure Data Factory. Ideally, the value of this measure should be significantly high. |
||||||||||||
Cancelled starts |
Indicates the number of start operations that were cancelled. |
Number |
The value of this measure should be low. Cancelled starts in Azure Data Factory (ADF) refer to instances where triggers or pipelines are not initiated due to manual or automated cancellations. |
||||||||||||
Stuck stop |
Indicates the number of stuck stop processes. |
Number |
In Azure Data Factory, insufficient resources or high load on the integration runtime may prevent a stop command from processing in a timely manner. |
||||||||||||
Succeeded stop |
Indicates the number of stop operations that succeeded. |
Number |
Ideally, the value of this measure should be significantly high. |
||||||||||||
Succeeded execution |
Indicates the number of executions that succeeded. |
Number |
Ideally, the value of this measure should be significantly high. |
||||||||||||
Failed execution |
Indicates the number of executions that failed. |
Number |
Failed execution in Azure Data Factory (ADF) refers to situations where a pipeline or activity does not complete successfully. The failed execution could disrupt data workflows and affect data processing. |
||||||||||||
Cancelled execution |
Indicates the number of executions that were cancelled. |
Number |
The value of this measure should be low. Limited resources in the integration runtime can lead to the cancellation of additional executions, especially if the system is under heavy load. |
||||||||||||
CPU utilization |
Indicates the percentage of CPU utilized in Azure Data Factory. |
Percent |
High CPU utilization may indicate that the integration runtime is under heavy load, which can slow down pipeline execution. |
||||||||||||
Available memory |
Indicates the memory available for use in Azure Data Factory. |
MB |
Low available memory can lead to slower processing times and may cause activities to fail if they exceed memory limits. |
||||||||||||
Queue duration |
Indicates the time that a pipeline or activity spends in a queued state before execution. |
Secs |
Long queue durations can indicate resource bottlenecks or inefficiencies in pipeline design, affecting overall performance. |
||||||||||||
Queue length |
Indicates the number of pipeline or activity runs awaiting execution. |
Number |
A long queue length can signal resource bottlenecks, indicating that the system is struggling to keep up with demand. |
||||||||||||
Available nodes count |
Indicates the number of nodes that are available for processing activities within an integration runtime. |
Number |
A low count of available nodes can indicate that the system is overloaded, potentially leading to longer queue durations and slower execution times. |
||||||||||||
Copy capacity utilization |
Indicates how effectively the resources allocated for copy activities are being utilized during data transfer operations. |
Percent |
Efficient use of copy capacity can lead to lower costs by preventing the need for unnecessary resource scaling. |
||||||||||||
Copy available capacity percentage |
Indicates the proportion of available capacity for copy activities. |
Percent |
High available capacity percentage may indicate that resources are underutilized, while low values can signal bottlenecks that need addressing. |
||||||||||||
Copy waiting queue length |
Indicates the number of copy activities awaiting execution. |
Number |
A long waiting queue indicates potential bottlenecks in resource allocation or pipeline design, which can slow down data processing. |
||||||||||||
Pipeline capacity utilization |
Indicates how effectively the resources allocated for pipeline activities are used for execution. |
Percent |
High capacity utilization can indicate that resources are effectively used, while low utilization may suggest inefficiencies or underuse of allocated resources. |
||||||||||||
Pipeline available capacity percentage |
Indicates the proportion of available capacity for executing pipeline activities. |
Percent |
A high available capacity percentage may suggest that the pipeline has capacity to handle additional workloads, while a low percentage could indicate that resources are strained. |
||||||||||||
Pipeline waiting queue length |
Indicates the number of pipeline runs that are currently queued and awaiting execution. |
Number |
A long waiting queue can indicate that the system is unable to process incoming requests quickly enough, which may affect overall data processing efficiency. |
||||||||||||
External capacity utilization |
Indicates how effectively external resources like external data sources are being utilized. |
Percent |
High utilization can indicate that external resources are effectively leveraged, while low utilization may suggest inefficiencies or underuse. |
||||||||||||
External available capacity percentage |
Indicates the proportion of unused capacity of external resources such as databases and APIs. |
Percent |
A high percentage indicates that there is significant unused capacity, suggesting potential for better resource allocation or workload distribution. |
||||||||||||
External waiting queue length |
Indicates the number of requests or activities that are queued and waiting to access external resources. |
Number |
A long waiting queue can indicate that external resources are overloaded or that there are constraints preventing activities from executing efficiently. |
||||||||||||
Maximum allowed entities count |
Indicates the maximum number of entities such as datasets, pipelines, triggers, etc. that can be utilized within a given ADF instance. |
Number |
Managing the count of entities can help maintain performance and prevent slowdowns caused by excessive configurations. |
||||||||||||
Maximum allowed factory size |
Indicates the maximum size of resources/configurations that can be used within a single Data Factory instance. |
|
Managing factory size effectively ensures that performance remains optimal, as exceeding limits can lead to throttling and slower execution times. |
||||||||||||
Total entities count |
Indicates the cumulative number of various entities within a Data Factory instance. |
Number |
A high number of entities can impact performance, so monitoring helps identify potential bottlenecks and areas for optimization. |
||||||||||||
Total factory size |
Indicates overall capacity and resource allocation for a specific Data Factory instance. |
|
A larger factory size can lead to performance bottlenecks if not managed correctly. |
||||||||||||
Job starts |
Indicates the number of job starts associated with the initiation of a pipeline or activity. |
Number |
Analyzing job starts can help identify trends in usage, peak times for data processing, and potential bottlenecks in data workflows. |
||||||||||||
Job ends |
Indicates the number of job ends associated with the execution of a pipeline or activity. |
Number |
This measure is used to identify failures when MVNet/Airflow is integrated with Azure Data Factory. Tracking job ends helps administrators analyze how long jobs take to complete, which can indicate efficiency and potential bottlenecks in data workflows. |
||||||||||||
Failed heartbeats |
Indicates the number of heartbeats that failed. |
Number |
Failed heartbeats refer to the instances where the monitoring system fails to receive a regular signal (or heartbeat) from an integration runtime or a linked service. These heartbeats are crucial for ensuring that components are functioning correctly and are responsive. Monitoring failed heartbeats can provide insights into potential issues within data integration processes. |
||||||||||||
Failed operators |
Indicates the number of operators that failed. |
Number |
In Azure Data Factory (ADF), operators are essential components used in data integration and transformation processes. They help define how data is processed, transformed, or transferred between various data sources and destinations. The overall pipeline activity will typically be marked as Failed if one or more operators fail. |
||||||||||||
Succeeded operators |
Indicates the number of operators that succeeded. |
Number |
Ideally, the value of this measure should be significantly high. |
||||||||||||
Failed instance tasks |
Indicates the number of instance tasks that failed. |
Number |
In Azure Data Factory (ADF), an instance task typically refers to the execution of a specific activity or job within a pipeline. Each time a pipeline is triggered, an instance of that pipeline is created, and each activity within the pipeline runs as part of that instance. The overall status of the pipeline instance will typically be marked as Failed if one or more activities fail. |
||||||||||||
Succeeded instance tasks |
Indicates the number of instance tasks that succeeded. |
Number |
Ideally, the value of this measure should be significantly high. |
||||||||||||
Previously succeeded instance tasks |
Indicates the number of instance tasks that completed successfully during prior execution of a pipeline instance. |
Number |
A large number of previously succeeded instance tasks may indicate a well-functioning pipeline, but it can also complicate data state management. Administrators may need to ensure that the outputs from these tasks are still valid and relevant for future executions. |
||||||||||||
Zombie killed tasks |
Indicates the number of pipeline activities that were running but are no longer active. |
Number |
If multiple zombie tasks are killed, the pipeline instance may be marked as Failed, especially if the killed tasks are critical to the workflow. This can disrupt the entire data processing sequence. |
||||||||||||
Scheduler heartbeats |
Indicates the number of Scheduler heartbeats. |
Number |
In Azure Data Factory (ADF), scheduler heartbeats refer to a mechanism that ensures the orchestration and scheduling components are functioning correctly and actively monitoring the status of various tasks and workflows. More frequent heartbeats can increase the load on the system, consuming more network bandwidth and processing resources. This could lead to higher operational costs or impact the performance of other tasks and services running in ADF. |
||||||||||||
DAG processing processes |
Indicates the number of DAG processes. |
Number |
A Directed Acyclic Graph (DAG) is a graphical representation of tasks and their dependencies, often used in workflow management systems of Azure Data Factory (ADF). More DAGs can lead to increased complexity in managing data workflows. Identifying issues and debugging failures may take longer, as more processes mean more potential points of failure. |
||||||||||||
DAG processing manager stalls |
Indicates the number of DAG processing manager stalls. |
Number |
The value of this measure should be low. DAG Processing Manager Stalls refer to a situation where the component responsible for managing and executing Directed Acyclic Graphs (DAGs) becomes unresponsive or significantly slows down. |
||||||||||||
DAG file refresh error |
Indicates the number of errors caused due to DAG file refresh activity. |
Number |
The value of this measure should be low. A DAG file refresh error typically occurs in workflow orchestration systems of Azure Data Factory where the system fails to update or reload a DAG (Directed Acyclic Graph) file. This can lead to issues in task scheduling and execution. |
||||||||||||
Scheduler tasks killed externally |
Indicates the number of Scheduler tasks that were killed. |
Number |
The value of this measure should be low. Scheduler tasks refer to the individual units of work managed and executed by a scheduling system in Azure Data Factory. These tasks are defined within a Directed Acyclic Graph (DAG) and are responsible for performing specific operations in a data pipeline or workflow. If a task is killed, any downstream tasks that depend on it may not execute, leading to incomplete data processing or failed workflows. |
||||||||||||
Scheduler orphaned tasks cleared |
Indicates the number of Scheduler orphaned tasks that were cleared. |
Number |
The value of this measure should be high. A scheduler orphan task refers to a task in Azure Data Factory that remains in the task queue or task state but has lost its parent dependencies. Clearing orphan tasks releases resources (CPU, memory, I/O) that were previously tied up by those tasks, allowing the system to allocate them to active workflows and tasks. |
||||||||||||
Scheduler orphaned tasks adopted |
Indicates the number of Scheduler orphaned tasks that were adopted. |
Number |
When scheduler orphan tasks are adopted in Azure Data Factory, it typically means that those tasks are reassigned or re-integrated into the workflow, often after having been previously orphaned. A higher count of adopted orphan tasks can complicate workflow management, making it harder to track dependencies and the overall state of tasks. |
||||||||||||
Scheduler critical section busy |
Indicates the number of times the Scheduler critical section was busy. |
Number |
When the scheduler's critical section is busy in Azure Data Factory, it means that the part of the system responsible for managing task execution, dependencies, and resource allocation is currently occupied. A high value of this measure indicates that more tasks are waiting for access to the critical section, resulting in longer queues and longer task execution times. |
||||||||||||
Scheduler failed sla email attempts |
Indicates the number of scheduler SLA email notification attempts that failed. |
Number |
An increase in failed scheduler SLA email attempts may indicate systemic issues within ADF pipelines, prompting a need to review pipeline designs, triggers, or dependencies. |
||||||||||||
Runtime started task instances |
Indicates the number of task instances started at runtime. |
Number |
A surge in task instances can cause delays in execution if the Integration Runtime is unable to handle the workload efficiently, resulting in longer processing times. |
||||||||||||
DAG callback exceptions |
Indicates the number of DAG callback exceptions. |
Number |
A DAG is a collection of tasks with defined dependencies. When using callbacks in DAGs, you may encounter exceptions that can disrupt data workflow execution. Frequent callback exceptions may lead to repeated task failures, causing pipelines to halt or enter an inconsistent state. |
||||||||||||
Task Celery timeout error |
Indicates the number of task celery timeout errors. |
Number |
A task timeout error in Celery typically indicates that a task is taking longer to execute than the specified timeout limit. Frequent timeouts can put additional strain on worker processes, consuming more CPU and memory as they attempt to handle retries or process long-running tasks. |
||||||||||||
Tasks removed from dag |
Indicates the number of tasks removed from DAG. |
Number |
When tasks are removed from a DAG, it can lead to failures in downstream tasks that expect outputs from the removed tasks. |
||||||||||||
Tasks restored to dag |
Indicates the number of tasks restored to DAG. |
Number |
If tasks are restored to a DAG, they will have access to historical run data from previous executions, which may affect their behavior. |
||||||||||||
Task instances created using operator |
Indicates the number of task instances created using an operator. |
Number |
As more task instances are created, overall execution times may increase due to resource contention. High contention for resources can result in slower execution of individual tasks. |
||||||||||||
Triggers blocked main thread |
Indicates the number of triggers that blocked the main thread. |
Number |
When more triggers block the main thread, the ADF scheduler becomes slow or unresponsive, leading to delays in scheduling new task instances. This can cause subsequent tasks to wait longer than expected to start. |
||||||||||||
Triggers failed |
Indicates the number of triggers that failed. |
Number |
The failure of triggers can cause delays in the execution of the entire workflow. Tasks that depend on the successful execution of the trigger will remain in a queued or waiting state. |
||||||||||||
Triggers succeeded |
Indicates the number of triggers that succeeded. |
Number |
Ideally, the value of this measure should be significantly high. |
||||||||||||
DAG bag size |
Indicates the size of the DAG bag. |
|
DAG bag size typically refers to the amount of memory and resources used by the DAG bag, which is the collection of all the DAGs defined in the ADF environment. If the DAG bag is excessively large, it may lead to out-of-memory errors, especially in environments with limited resources. |
||||||||||||
DAG processing import errors |
Indicates the number of import errors in DAG processing. |
Number |
DAG processing import errors in ADF occur when the scheduler encounters issues while trying to import or parse the Python files that define the DAGs. These errors can prevent DAGs from being recognized and executed properly. Continuous import errors can put a strain on the scheduler, which may consume additional resources trying to process the faulty DAGs, potentially affecting the performance of other DAGs. |
||||||||||||
DAG total processing parse time |
Indicates the time taken by scheduler to parse and process all DAG files when loaded. |
Secs |
The higher the parse time, the longer the scheduler takes to load and process DAGs, delaying the scheduling of tasks. This can cause cascading delays in dependent workflows. |
||||||||||||
DAG processing last run seconds ago |
Indicates the time elapsed since the DAG was last processed or executed. |
Secs |
If a DAG takes longer to complete its last run, subsequent tasks may be delayed, leading to a backlog in the execution of dependent workflows. |
||||||||||||
DAG processing processor timeouts |
Indicates the duration within which a DAG processor must complete processing before it is timed out. |
Secs |
A processor timeout typically occurs when a task in a DAG does not complete within a specified time limit, causing the scheduler to terminate the task and report a failure. Frequent timeouts can cause tasks to fail, leading to incomplete data workflows. This can disrupt data pipelines and business processes that depend on the DAGs. |
||||||||||||
Scheduler tasks running |
Indicates the number of scheduler tasks in running state. |
Number |
In Azure Data Factory (ADF), a scheduler task typically refers to the scheduling of activities within a pipeline to run at specific times or intervals. An increase in running scheduler tasks can lead to resource contention, potentially affecting the performance of data processing activities. |
||||||||||||
Scheduler tasks starving |
Indicates the number of scheduler tasks in starving state. |
Number |
In Azure Data Factory, scheduler task starvation occurs when there are more scheduled pipeline runs than the system can handle concurrently, leading to some tasks being delayed or queued. A growing number of starved tasks can create backlogs, causing a domino effect where subsequent tasks are also delayed. This can complicate workflows that depend on timely data availability. |
||||||||||||
Scheduler tasks executable |
Indicates the number of times a scheduled task has been executed over a specified period. |
Number |
By analyzing this count, you can assess whether the pipelines are running as expected. A sudden increase or decrease in execution counts can indicate issues or changes in data processing needs. |
||||||||||||
Executor open slots |
Indicates the number of available execution slots used to run pipeline activities. |
Number |
A consistently low count may indicate that the pipelines are hitting concurrency limits or that resources are being underutilized. |
||||||||||||
Executor queued tasks |
Indicates the number of tasks that are currently queued and awaiting execution. |
Number |
A very high value of this measure often indicates that the system is experiencing bottlenecks, either due to resource limits (like maximum concurrent activities) or contention for shared resources (like databases or storage). |
||||||||||||
Executor running tasks |
Indicates the number of tasks that are currently executing. |
Number |
A consistently high value of this measure may suggest healthy throughput, while a sudden drop could indicate issues such as task failures or bottlenecks. |
||||||||||||
Pool open slots |
Indicates the number of available execution slots within a specific pool. |
Number |
If the count of this measure is consistently low, it may indicate that the integration runtime is nearing its capacity or that tasks are taking longer to complete. |
||||||||||||
Pool queued slots |
Indicates the number of tasks that are currently queued and awaiting execution within a specific resource pool. |
Number |
A high value of this measure can signal bottlenecks in the data processing pipeline, prompting a review of pipeline design or resource allocation. |
||||||||||||
Pool running slots |
Indicates the number of execution slots that are currently occupied by tasks being executed within a specific resource pool. |
Number |
A high value of this measure can indicate that the system is effectively utilizing its available resources. |
||||||||||||
Pool starving tasks |
Indicates the number of execution slots that are currently occupied but unable to execute tasks. |
Number |
A high value of this measure indicates that the system is unable to process tasks efficiently. This can lead to increased latency, as tasks remain queued longer than expected. |
||||||||||||
Triggers running |
Indicates the number of trigger instances that are currently in execution for a given pipeline. |
Number |
A consistently high value of this measure indicates that multiple instances of a pipeline are running simultaneously, which can impact resource usage and performance. |
||||||||||||
DAG run dependency check |
Indicates the time taken to verify and validate dependencies between tasks in a Directed Acyclic Graph (DAG). |
Secs |
A longer dependency check time delays the start of downstream tasks, leading to increased overall pipeline execution time. This can affect time-sensitive processes and data freshness. |
||||||||||||
Task instances duration |
Indicates the total time taken for a specific instance of a task to execute from start to finish. |
Secs |
Prolonged task durations can result in resource contention, where available resources are tied up for longer than necessary. This may lead to queuing of other tasks that depend on the completion of the long-running tasks. |
||||||||||||
DAG run duration success |
Indicates the total time taken for a successful execution of a DAG from the moment it starts running until it completes all tasks without errors. |
Secs |
A high value of this measure indicates bottlenecks within specific tasks or dependencies, which can hinder the overall workflow. This could lead to queuing of other tasks waiting for resources to become available. |
||||||||||||
DAG run duration failed |
Indicates the total time taken for a DAG execution that did not complete successfully. |
Secs |
A high value of this measure indicates that a significant amount of time is spent executing tasks that ultimately do not succeed. This can disrupt the flow of subsequent tasks that depend on the successful completion of the DAG, causing delays throughout the workflow. |
||||||||||||
DAG run schedule delay |
Indicates the amount of time that elapses between the scheduled start time of a DAG run and the actual time when the run begins executing. |
Secs |
A high value of this measure can disrupt downstream tasks that rely on the output of the delayed run. This can create a cascading effect, causing subsequent processes to be delayed as well. |
||||||||||||
Scheduler critical section duration |
Indicates the amount of time that a critical section (or task) is allowed to run before it is considered problematic. |
Secs |
A high value of this measure can indicate inefficiencies in task execution. This may lead to prolonged resource usage, which can impede the performance of other tasks awaiting execution. |
||||||||||||
DAG run first task scheduling delay |
Indicates the amount of time that elapses between the scheduled start time of a DAG run and the actual start time of the first task within that DAG. |
Secs |
A high scheduling delay for the first task indicates that the entire DAG takes longer to execute. This can lead to delays in data processing and reporting, affecting business decisions based on timely data. |
||||||||||||
Collected db dags |
Indicates the number of bags that hold data and metadata collected during the execution of data pipelines and activities. |
|
In Azure Data Factory (ADF), collected DB bags refer to a specific mechanism related to the management of data and metadata within ADF's execution environment. As the number of collected DB bags increases, it may slow down data processing and retrieval times. The overhead of managing a larger number of bags can impact the speed of data operations. |
||||||||||||
Memory usage |
Indicates the size of memory utilized by Azure Data Factory. |
|
|
||||||||||||
Memory Utilization |
Indicates the percent of memory utilized by Azure Data Factory. |
Percent |
High memory usage can slow down processing speeds, leading to longer execution times for data pipelines and tasks. This may affect the overall throughput of data processing. |
||||||||||||
CPU Utilization |
Indicates the percent of CPU utilized by Azure Data Factory. |
Percent |
High CPU usage can result in slower processing speeds for tasks, leading to longer execution times for data pipelines. This can affect overall workflow efficiency and throughput. |
||||||||||||
Nodes count |
Indicates the number of nodes in ADF pipeline. |
Number |
Each node represents an activity that requires resources. If there are too many nodes running concurrently, it may lead to resource contention, resulting in slower performance and longer execution times for the pipeline. |
||||||||||||
Status |
Indicates the status of the resources in Azure Data Factory. |
|
The resources in Azure Data Factory represent pipelines, activities and triggers. The values reported by this measure and their numeric equivalents are mentioned in the table below:
Note: By default, this measure reports the Measure Values listed in the table above to indicate the status of the resources in Azure Data Factory. Use the detailed diagnosis of this measure to know Factory name, Location, Provisioning state and Pipeline enable details. |
||||||||||||
Total triggers |
Indicates the total count of triggers in Azure Data Factory. |
Number |
A high value of this measure can lead to overlapping executions, where multiple instances of the same pipeline are running at the same time. This can complicate data management and may lead to data integrity issues if not handled correctly. |
||||||||||||
Total pipelines |
Indicates the total count of pipelines in Azure Data Factory. |
Number |
If multiple pipelines are running concurrently, they may compete for limited resources (CPU, memory, etc.), leading to performance degradation. This can result in slower execution times for some pipelines. |
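Many of the counts reported in the table above map onto platform metrics that Azure Monitor exposes for the Microsoft.DataFactory/factories resource type (for example, PipelineFailedRuns, ActivityFailedRuns and TriggerFailedRuns). The sketch below, offered only as an illustration and assuming the azure-mgmt-monitor package, shows how such metrics can be pulled for a single factory; the resource names and credentials are placeholders, the metric list is not exhaustive, and this is not the eG agent's own collection code.

```python
# Illustrative sketch: read a few Azure Monitor platform metrics for a Data Factory.
# Metric names (e.g. PipelineFailedRuns) are Azure Monitor names, not eG measure
# names; resource IDs and credentials are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import ClientSecretCredential
from azure.mgmt.monitor import MonitorManagementClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<client-secret>"
)
monitor = MonitorManagementClient(credential, "<subscription-id>")

factory_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.DataFactory/factories/<factory-name>"
)

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

response = monitor.metrics.list(
    factory_id,
    timespan=f"{start.isoformat()}/{end.isoformat()}",
    interval="PT5M",  # 5-minute grain
    metricnames="PipelineFailedRuns,ActivityFailedRuns,TriggerFailedRuns",
    aggregation="Total",
)

for metric in response.value:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name.value, point.time_stamp, point.total)
```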