.NET Windows Service Business Transactions Test

Standalone Windows service applications execute .NET methods/functions to transact business. The responsiveness of these methods/functions is a key determinant of user experience with the corresponding transactions; if response time increases, user experience deteriorates. To keep users happy, a business transaction should be processed rapidly by each of the servers in its path. A processing bottleneck on a single system/server in the transaction path can slow down or stall an entire business transaction, or can cause serious transaction errors. This in turn can badly scar the experience of application users. To avoid this, administrators should promptly identify slow/stalled/error function calls, isolate where the slowness/error occurred, and uncover what caused the aberration. The .NET Windows Service Business Transactions test helps with this!

This test runs on a Windows server/system, where one/more .NET applications run as standalone Windows services. For each standalone Windows service:method/function pair that is configured for monitoring, this test computes and reports the average time taken by that method/function to finish execution and send out the response to the corresponding service. In the process, the test identifies the slow/stalled functions, and reports the count of such function calls and their responsiveness. Detailed diagnostics provided by the test accurately pinpoint the exact function calls that are slow/stalled, and when those calls were made. The slowest function call can thus be identified.

Moreover, to enable administrators to figure out if the slowness can be attributed to a bottleneck in SQL query processing, the test also reports the average time a specific function took to execute SQL queries. If a majority of the queries are slow, then the test will instantly capture the same and notify administrators.

Additionally, the test promptly alerts administrators to functions that encountered errors during execution. To identify the exact function calls that spewed errors, the detailed diagnosis capability of the test can be used.

This way, the test effortlessly measures the performance of the methods/functions configured for monitoring for a specific standalone Windows service application, highlights methods/functions that are under-performing, and takes administrators close to the root-cause of poor application performance.

Target of the Test : A BTM-enabled .NET application that runs as a standalone Windows service

Agent deploying the test : An internal agent

Output of the test : One set of results for every method/function type that is configured for monitoring for a chosen standalone Windows service

Configurable parameters for the test
Parameter Description

Test period

How often should the test be executed

Host

The host for which the test is to be configured.

Port

The port number through which the monitored application communicates. The default port is 80.

Services and Functions

This test will run and report metrics only if one/more services are configured for monitoring, and at least one function per service is configured for monitoring.

To know how to configure the above, refer to the Configuring Services and Functions for Monitoring topic.

Healthy URL Trace

By default, this flag is set to No. This means that eG Enterprise will not collect detailed diagnostics for those transactions - i.e., methods/functions - that are healthy. If you want to enable the detailed diagnosis capability for healthy functions as well, then set this flag to Yes.

Max Healthy Traces

This parameter is applicable only if the Healthy URL Trace flag is set to ‘Yes’. Here, specify the number of top function calls that should be listed in the detailed diagnosis of the Healthy transactions measure, every time the test runs. By default, this is set to 50, indicating that the detailed diagnosis of the Healthy transactions measure will by default list the top-50 function calls, arranged in the descending order of their response times.

Max Slow Traces

Specify the number of top function calls that should be listed in the detailed diagnosis of the Slow transactions measure, every time the test runs. By default, this is set to 10, indicating that the detailed diagnosis of the Slow transactions measure will by default list the top-10 function calls, arranged in the descending order of their response times.

Max Stalled Traces

Specify the number of top function calls that should be listed in the detailed diagnosis of the Stalled transactions measure, every time the test runs. By default, this is set to 10, indicating that the detailed diagnosis of the Stalled transactions measure will by default list the top-10 function calls, arranged in the descending order of their response times.

Max Error Traces

Specify the number of top function calls that should be listed in the detailed diagnosis of the Error transactions measure, every time the test runs. By default, this is set to 10, indicating that the detailed diagnosis of the Error transactions measure will by default list the top-10 function calls, in terms of the number of errors they encountered.
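The effect of the Max Healthy/Slow/Stalled/Error Traces limits can be pictured with a minimal sketch. It assumes each monitored call is available as a simple record; the `FunctionCall` class, the `top_traces` helper, and the sample function names are hypothetical and are not part of eG Enterprise. Slow, stalled, and healthy calls are ranked by response time, while error calls would be ranked by error count.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FunctionCall:
    """Hypothetical record of one monitored function call."""
    name: str
    response_ms: float
    error_count: int = 0


def top_traces(calls: List[FunctionCall], limit: int,
               key: Callable[[FunctionCall], float]) -> List[FunctionCall]:
    """Return at most `limit` calls, highest `key` value first."""
    return sorted(calls, key=key, reverse=True)[:limit]


slow_calls = [
    FunctionCall("OrderService:Submit", 5200.0),
    FunctionCall("OrderService:Validate", 8100.0),
    FunctionCall("OrderService:Notify", 4300.0),
]
# With a limit of 2, only the two slowest calls would be listed.
for call in top_traces(slow_calls, 2, key=lambda c: c.response_ms):
    print(call.name, call.response_ms)
```

For the Error transactions measure, the same helper could be called with `key=lambda c: c.error_count`.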

Method Execution Cutoff (MS)

From the detailed diagnosis of slow/stalled/error function calls, you can drill down and perform deep execution analysis of a particular call. In this drill-down, the methods invoked by that slow/stalled/error function are listed in the order in which the function calls the methods. By configuring a method execution cutoff, you can make sure that only those methods that have been executing for a duration greater than the specified cutoff are listed when performing execution analysis. For instance, if you specify 5 here, then the Execution Analysis window for a slow/stalled/error function will list only those methods that have been executing for over 5 milliseconds. This way, you get to focus on only those methods that could have caused the slowness, without being distracted by inconsequential methods. By default, the value of this parameter is set to 250 ms.

SQL Execution Cutoff (MS)

Typically, from the detailed diagnosis of a slow/stalled/error function call made by a standalone Windows service application, you can drill down to view the SQL queries (if any) executed by that function from that node and the execution time of each query. By configuring a SQL Execution Cutoff, you can make sure that only those queries that have been executing for a duration greater than the specified cutoff are listed when performing query analysis. For instance, if you specify 5 here, then for a slow/stalled/error function, the SQL Queries window will display only those queries that have been executing for over 5 milliseconds. This way, you get to focus on only those queries that could have contributed to the slowness. By default, the value of this parameter is set to 10 ms.

Max Grouped URLs per Measure Period

This parameter is not applicable to this test. You can therefore leave it at its default setting.

Max SQL Queries Per Transaction

Typically, from the detailed diagnosis of a slow/stalled/error function, you can drill down to view the SQL queries (if any) executed by that function from that node and the execution time of each query. By default, eG picks the first 500 SQL queries executed by the function, compares the execution time of each query with the SQL Execution Cutoff configured for this test, and displays only those queries with an execution time that is higher than the configured cutoff. This is why the 'Max SQL Queries Per Transaction' parameter is set to 500 by default.

To improve agent performance, you may want the SQL Execution Cutoff to be compared with the execution time of fewer queries, say, 200 queries. Similarly, to increase the probability of capturing more long-running queries, you may want the SQL Execution Cutoff to be compared with the execution time of a larger number of queries, say, 1000 queries. For this, you just need to modify the 'Max SQL Queries Per Transaction' specification to suit your purpose.
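The interplay between Max SQL Queries Per Transaction and SQL Execution Cutoff can be sketched as follows. This is only an illustration of the behavior described above, not the agent's actual code; the `queries_to_display` function and the (query text, execution time) tuple layout are assumptions made for the example.

```python
from typing import List, Tuple

Query = Tuple[str, float]  # (SQL text, execution time in milliseconds)


def queries_to_display(queries: List[Query],
                       sql_cutoff_ms: float = 10,
                       max_queries: int = 500) -> List[Query]:
    """Keep the first `max_queries` queries, then list only those
    whose execution time is higher than the cutoff."""
    considered = queries[:max_queries]
    return [(sql, ms) for sql, ms in considered if ms > sql_cutoff_ms]


executed = [("SELECT 1", 2.0), ("SELECT * FROM orders", 45.0),
            ("UPDATE stock SET qty = 0", 120.0)]
# Only the first two queries are compared with the cutoff here,
# so the slow third query is never examined.
print(queries_to_display(executed, sql_cutoff_ms=10, max_queries=2))
```

The example also shows the trade-off noted above: a smaller cap means less work per transaction, but long-running queries beyond the cap go unnoticed.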

Filtered URL Patterns

This parameter is not applicable to this test. You can therefore leave it at its default setting.

Show Cookies

This parameter is not applicable to this test. You can therefore leave it at its default setting.

Show Headers

This parameter is not applicable to this test. You can therefore leave it at its default setting.

URL Segments

This parameter is not applicable to this test. You can therefore leave it at its default setting.

Mask SQL

If eG BTM finds that one/more slow database queries are responsible for the slowness that users are experiencing with a transaction, then you can drill down from the cross-application transaction topology of that transaction to view the slow SQL queries run by that transaction. This way, you can accurately isolate the root-cause of the slowness. By default, if any of these queries include confidential information such as passwords, then eG Enterprise will mask the passwords in the query using the '?' (question mark) character. This way, eG Enterprise protects your business-critical data from abuse by intruders. Accordingly, this flag is set to Yes by default. For security reasons, we recommend that you do not change the default status of this flag. However, if for some reason you want the password to be visible in clear text in the SQL queries displayed as part of the detailed analytics, then you can unmask the password by setting this flag to No.
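To get a feel for what masking with the '?' character looks like, consider the simplified sketch below. How eG Enterprise actually detects confidential values is not described here, so the sketch makes a loud assumption: it replaces every single-quoted string literal in the query, not just passwords. The `mask_sql` function and its `mask` argument are hypothetical names.

```python
import re

# Matches a single-quoted SQL string literal, including doubled quotes ('').
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")


def mask_sql(sql: str, mask: bool = True) -> str:
    """Replace string literals with '?' when masking is on (assumed rule)."""
    if not mask:
        return sql
    return _STRING_LITERAL.sub("?", sql)


query = "SELECT id FROM users WHERE name = 'bob' AND password = 's3cret'"
print(mask_sql(query))              # literals replaced by ?
print(mask_sql(query, mask=False))  # query shown in clear text
```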

Mask Public IP, Mask Private IP

These parameters are not applicable to this test. You can therefore leave them at their default settings.

Mask URL Params

This parameter is not applicable to this test. You can therefore leave it at its default setting.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD Frequency.
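As a small illustration of the DD Frequency format, the sketch below splits a value such as 1:1 into its two parts. It assumes the two numbers correspond to the normal and abnormal frequencies mentioned elsewhere in this topic, and that none disables detailed diagnosis; the `parse_dd_frequency` helper is a hypothetical name, not an eG Enterprise API.

```python
from typing import Optional, Tuple


def parse_dd_frequency(value: str) -> Optional[Tuple[int, int]]:
    """Return (normal, abnormal) frequencies, or None if DD is disabled."""
    text = value.strip().lower()
    if text == "none":
        return None  # detailed diagnosis switched off for this test
    normal, abnormal = text.split(":")
    return int(normal), int(abnormal)


print(parse_dd_frequency("1:1"))
print(parse_dd_frequency("none"))
```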

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measures reported by the test
Measurement Description Measurement Unit Interpretation

All transactions

Indicates the total number of requests received for this function during the last measurement period.

Number

By comparing the value of this measure across functions, you can identify the most popular functions. Using the detailed diagnosis of this measure, you can then figure out which specific functions are called often.

For the Summary descriptor, this measure will reveal the total number of times this function was called by the Windows service application during the last measurement period. This is a good indicator of the workload of the application.

Avg response time

Indicates the average time taken by this function to complete execution.

Secs

Compare the value of this measure across functions to isolate the slowest one. You can then use the detailed diagnosis of the All transactions measure of that function to know how much time each call to that function took to execute. This will lead you to the slowest call.

For the Summary descriptor, this measure will reveal the average responsiveness of all the function calls made by this Windows service application during the last measurement period. An abnormally high value for this measure for the Summary descriptor could indicate a serious processing bottleneck.

Healthy transactions

Indicates the number of times this function call was healthy.

Number

By default, this measure will report the count of function calls with a response time less than 4000 milliseconds. You can change this default setting by modifying the thresholds of the Avg response time measure using the eG admin interface.

For the Summary descriptor, this measure will report the total number of healthy function calls made by the standalone service application.

Healthy transactions percentage

Indicates what percentage of the total number of calls of this function is healthy.

Percent

To know which are the healthy function calls, use the detailed diagnosis of this measure. For the Summary descriptor, this measure will report the overall percentage of healthy function calls made by this standalone Windows service application.

Slow transactions

Indicates the number of times this function call was slow during the last measurement period.

Number

By default, this measure will report the number of function calls with a response time higher than 4000 milliseconds and lower than 60000 milliseconds. You can change this default setting by modifying the thresholds of the Avg response time measure using the eG admin interface.

A high value for this measure is a cause for concern, as too many slow function calls mean that the performance of the standalone Windows service application is deteriorating. For the Summary descriptor, this measure will report the total number of slow function calls made by the application. This is a good indicator of the processing power of the target service application.

Slow transaction response time

Indicates the average time taken by the slow calls of this function to execute.

Secs

For the Summary descriptor, this measure will report the average response time of all the slow functions called by the target standalone Windows service application.

Slow transactions percentage

Indicates the percentage of calls to this function that were slow.

Percent

Use the detailed diagnosis of this measure to know which precise function calls were slow. You can drill down from a slow function to know what is causing the slowness. For the Summary descriptor, this measure will report the overall percentage of slow function calls made by the target standalone Windows service application.

Error transactions

Indicates the number of times this function call experienced errors during the last measurement period.

Number

A high value is a cause for concern, as function calls that throw too many errors can degrade application performance. For the Summary descriptor, this measure will report the total number of functions that encountered errors during execution. This is a good indicator of how error-prone the target service application is.

Error transactions response time

Indicates the average duration for which this function executed before an error condition was detected.

Secs

The value of this measure will help you discern if error functions were also slow. For the Summary descriptor, this measure will report the average response time of all error function calls made by the target service application.

Error transactions percentage

Indicates the percentage of calls to this function that experienced errors.

Percent

Use the detailed diagnosis of this measure to isolate the error functions. You can even drill down from an error function in the detailed diagnosis to determine the cause of the error. For the Summary descriptor, this measure will report the overall percentage of functions executed by the target service application that are currently throwing errors.

Stalled transactions

Indicates the number of times that this function was stalled during execution in the last measurement period.

Number

By default, this measure will report the number of function calls with a response time higher than 60000 milliseconds. You can change this default setting by modifying the thresholds of the Avg response time measure using the eG admin interface.

A high value is a cause for concern, as too many stalled function calls means sub-par application performance. For the Summary descriptor, this measure will report the total number of stalled functions executed by the monitored standalone Windows service application.
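The default thresholds quoted for the Healthy, Slow, and Stalled transactions measures can be summarized in a short sketch. The `classify_call` function is hypothetical; in particular, how the boundary values (exactly 4000 ms or 60000 ms) are treated, and the choice to let an error take precedence over response time, are assumptions of this example rather than documented behavior.

```python
HEALTHY_LIMIT_MS = 4000    # default: below this, a call is healthy
STALLED_LIMIT_MS = 60000   # default: above this, a call is stalled


def classify_call(response_ms: float, had_error: bool = False) -> str:
    """Bucket one function call using the default thresholds."""
    if had_error:
        return "error"      # assumption: errors take precedence
    if response_ms > STALLED_LIMIT_MS:
        return "stalled"
    if response_ms > HEALTHY_LIMIT_MS:
        return "slow"
    return "healthy"        # assumption: exactly 4000 ms counts as healthy


for ms in (1200, 4500, 61000):
    print(ms, classify_call(ms))
```

If the thresholds of the Avg response time measure are changed in the eG admin interface, the two constants would change accordingly.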

Stalled transactions response time

Indicates the average time taken by the stalled calls of this function to execute.

Secs

For the Summary descriptor, this measure will report the average response time of all stalled function calls made by the target standalone Windows service application.

Stalled transactions percentage

Indicates the percentage of calls to this function that stalled.

Percent

Use the detailed diagnosis of this measure to know which precise function calls stalled. You can drill down from a stalled function call to know what is causing that function to stall. For the Summary descriptor, this measure will report the overall percentage of functions called by the target service application that are stalling.
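The healthy, slow, error, and stalled percentage measures all follow the same arithmetic, which the sketch below works through. It assumes that the total call count is the sum of the four categories; the `percentages` helper and the sample counts are made up for illustration.

```python
from typing import Dict


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-category call counts into percentages of all calls."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


counts = {"healthy": 80, "slow": 10, "stalled": 5, "error": 5}
print(percentages(counts))
```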

Slow SQL statements executed

Indicates the number of slow SQL queries that were executed by this function during the last measurement period.

Number

For the Summary descriptor, this measure will report the total number of slow SQL queries executed by all functions of the target standalone Windows service application.

Avg slow SQL statement time

Indicates the average execution time of the slow SQL queries that were run by this function.

Secs

If any function is executing very slowly, you may want to check the value of this measure for that function to figure out if query execution is slowing down the function. Use the detailed diagnosis of the Slow transactions measure to identify the precise slow function call. Then, drill down from that slow call to confirm whether/not database queries have contributed to the slowness. Deep-diving into the queries will reveal the slowest queries and their impact on the execution time of the transaction.

Total transactions per minute

Indicates the number of times this function was executed per minute.

Number

 

Error transactions per minute

Indicates the number of times in a minute this function encountered errors.

Number

A very low value is desired for this measure.

Compare the value of this measure across functions to find the error-prone function.

Entry point request count

Indicates the total number of calls to this function for which the monitored node was the entry point.

Number

Any monitored node typically processes two types of function calls. They are:

  • Function calls passing through the monitored node - these are calls that will also be processed by nodes before and after the target node in the transaction topology;

  • Functions for which the monitored node is the entry point - these are function calls triggered by the standalone Windows services on the target node; these calls may be processed by other nodes after the target node in the transaction topology, but will not have any predecessors.

The All transactions measure reports the sum of the function calls passing through the target node and the calls for which the target node is the entry point. This way, the All transactions measure paints the true picture of function load on a specific node.

Service managers/owners on the other hand, will be less interested in measuring load at the node-level; instead, they will be keen to determine load at the business service-level. This is where the Entry point request count helps! For a specific node, this measure reports the count of 'unique' functions for which that node is the entry point. By monitoring this count across all nodes that are engaged in delivering a business service, service managers can accurately ascertain the total load on the service.
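The difference between node-level load and service-level load can be made concrete with a small sketch. The `NodeCounts` record, the `service_load` helper, and the sample node names and counts are all hypothetical; the only relationships taken from the text are that All transactions is the sum of pass-through and entry point calls, and that service load is gauged from entry point counts across nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeCounts:
    """Hypothetical per-node call counts for one measurement period."""
    node: str
    entry_point_calls: int   # calls that start at this node
    pass_through_calls: int  # calls that arrive from another node

    @property
    def all_transactions(self) -> int:
        # Node-level load: both kinds of calls are counted.
        return self.entry_point_calls + self.pass_through_calls


def service_load(nodes: List[NodeCounts]) -> int:
    """Service-level load: each call is counted once, at its entry point."""
    return sum(n.entry_point_calls for n in nodes)


nodes = [NodeCounts("svc-node-1", 100, 20), NodeCounts("svc-node-2", 30, 100)]
for n in nodes:
    print(n.node, n.all_transactions)
print("service load:", service_load(nodes))
```

Summing All transactions across nodes would count a call once per node it touches, which is why the entry point count is the better basis for service-level load.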

Entry point avg response time

Indicates the average time taken by this function for which the target node is the entry point, to complete execution.

Milliseconds

The Avg response time measure reveals how responsive a function is from the time a function call is received and processed by the target node. This measure does not differentiate between pass-through and entry point functions, and is hence a good indicator of the workload processing ability of the target node.

The Entry point avg response time measure on the other hand, enables service owners/managers to accurately measure performance at the business service-level. If the target node serves as a function's entry point, then this measure reports the responsiveness of that function, end-to-end - i.e., from the entry point, through all the other nodes in its path, till a response is returned to the entry point. By monitoring this measure across all nodes engaged in delivering a business service, you can identify the 'unique' functions run by the service applications on each node, measure end-to-end responsiveness of each function, and isolate the slowest function at the service-level.

Entry point slow request count

Indicates the number of slow calls to this function for which the target node is the entry point.

Number

The Slow transactions measure captures the total count of calls to a function that have been observed to be slow, from the time the target node receives the calls, till responses are returned to the target node. Since this measure does not differentiate between entry point and pass through functions, you can use this measure to accurately determine the total number of function calls to a 'target' node that slowed down.

On the other hand, if you want to determine the count of slow function calls that are impacting the health of a 'business service', you can use the Entry point slow request count measure. With the help of the values reported by this measure for every node that supports a business service, you can ascertain how many 'distinct' functions are slowing down that service.

Entry point slow response time

Indicates the average execution time of the slow calls to this function for which the target node is the entry point.

Milliseconds

The Slow transaction response time measure reports the time that elapsed from when the target node received slow function calls till when it responded to the calls. Since this measure does not differentiate between pass-through and entry point functions, it will give you a fair idea of overall slowness on the 'target node'.

The Entry point slow response time measure on the other hand, captures slowness by observing function response time from its entry point. Using the values reported by this measure for every node that supports a business service, you can monitor slowness at the business service-level.

Entry point error request count

Indicates the count of error calls to this function for which the target node is the entry point.

Number

The Error transactions measure captures the total count of functions that have encountered errors, from the time the function calls reach the target node, till responses are returned to the target node. Since this measure does not differentiate between entry point and pass through functions, you can use this measure to accurately determine the total number of error functions to a 'target' node.

On the other hand, if you want to determine the count of error functions that are impacting the health of a 'business service', you can use the Entry point error request count measure. With the help of the values reported by this measure for every node that supports a business service, you can ascertain how many 'distinct' functions run by the service applications on each node have encountered errors.

Entry point error response time

Indicates the average execution time of the error calls to this function for which the target node is the entry point.

Milliseconds

The Error transaction response time measure reports the time that elapsed from when the target node received error function calls till when responses were returned to that node. Since this measure does not differentiate between pass-through and entry point functions, it will give you a fair idea of the overall responsiveness of error functions on the 'target node'.

The Entry point error response time measure on the other hand, reports the average response time of 'unique' error functions from their entry point. Using the values reported by this measure for every node that supports a business service, you can understand how responsive error functions are at the service level.

Entry point stalled request count

Indicates the number of stalled calls to this function for which the target node is the entry point.

Number

The Stalled transactions measure captures the total count of function calls that have been observed to be stalling, from the time the calls reached the target node, till responses are returned to the target node. Since this measure does not differentiate between entry point and pass through function calls, you can use this measure to accurately determine the total number of calls to a 'target' node that have stalled.

On the other hand, if you want to determine the count of stalled function calls to a 'business service', you can use the Entry point stalled request count measure. With the help of the values reported by this measure for every node that supports a business service, you can ascertain how many 'distinct' function calls run by the service applications on each node are stalled.

Entry point stalled response time

Indicates the average execution time of the stalled calls to this function for which the target node is the entry point.

Milliseconds

The Stalled transaction response time measure reports the time that elapsed from when the target node received stalled function calls till when it received responses to those requests. Since this measure does not differentiate between pass-through and entry point function calls, it is a good measure of the overall responsiveness of stalled functions to the 'target node'.

The Entry point stalled response time measure on the other hand, reports the average responsiveness of 'unique' stalled function calls from their entry point. Using the values reported by this measure for every node that supports a business service, you can gauge the responsiveness of stalled function calls at the business service level.

Avg CPU time

Indicates the average time for which this function was utilizing the CPU.

Msecs

Compare the value of this measure across functions to accurately identify the CPU-intensive functions.

For the Summary descriptor, this measure will report the average time for which all the functions executed by the target standalone Windows service application used the CPU.

Avg block time

Indicates the average duration for which this function was blocked and could not execute.

Milliseconds

If the Avg response time for any function is very high, you may want to check the value of this measure for that function. This will help you figure out whether/not prolonged blocking is causing the function to slow down or stall.

For the Summary descriptor, this measure will report the average time for which all the functions executed by the target standalone Windows service application were blocked.

Avg wait time

Indicates the average duration for which this function was waiting before it resumed execution.

Milliseconds

If the Avg response time for any function is very high, you may want to check the value of this measure for that function. This will help you figure out whether/not a very high waiting time is what is causing the function to slow down/stall.

For the Summary descriptor, this measure will report the average time for which all the functions executed by the target service application were waiting.

Entry point healthy request count

Indicates the number of healthy calls to this function for which the target node is the entry point.

Number

The Healthy transactions measure captures the total count of function calls that have been observed to be healthy, from the time the function calls reach the target node, till responses are returned to the target node. Since this measure does not differentiate between entry point and pass through functions, you can use this measure to accurately determine the total number of healthy functions executed on a 'target' node.

On the other hand, if you want to determine the count of healthy functions run by a target service application for a 'business service', you can use the Entry point healthy request count measure. With the help of the values reported by this measure for every node that supports a business service, you can ascertain how many 'distinct' functions run by the service applications on each node are healthy.

Entry point total calls per minute

Indicates the total number of calls to this function that were received per minute by the entry point.

Number

 

Entry point error transactions per minute

Indicates the number of error calls to this function that were received per minute by the entry point.

Number

 

Satisfaction score

Indicates the Apdex score / Satisfaction score of this function.

Number

Apdex (Application Performance Index) is an open standard developed by an alliance of companies. It defines a standard method for reporting and comparing the performance of software applications in computing. Its purpose is to convert measurements into insights about user satisfaction, by specifying a uniform way to analyze and report on the degree to which measured performance meets user expectations.

The Apdex method converts many measurements into one number on a uniform scale of 0-to-1 (0 = no users satisfied, 1 = all users satisfied). The resulting Apdex score is a numerical measure of user satisfaction with the performance of enterprise applications. This metric can be used to report on any source of end-user performance measurements for which a performance objective has been defined.

The Apdex formula is:

Apdex_t = (Satisfied Count + (Tolerating Count / 2)) / Total Samples

This is nothing but the number of satisfied samples plus half of the tolerating samples plus none of the frustrated samples, divided by all the samples.

A score of 1.0 means all responses were satisfactory. A score of 0.0 means none of the responses were satisfactory. Tolerating responses half satisfy a user. For example, if all responses are tolerating, then the Apdex score would be 0.50.

Ideally therefore, the value of this measure should be 1.0. A value less than 1.0 indicates that the experience with the function has been less than satisfactory.
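The Apdex formula above translates directly into a few lines of code. The sketch below takes the satisfied, tolerating, and frustrated counts as inputs; how eG Enterprise assigns individual function calls to those three buckets is not covered here, and the `apdex` function name is simply chosen for the example.

```python
def apdex(satisfied: int, tolerating: int, frustrated: int) -> float:
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    total = satisfied + tolerating + frustrated
    if total == 0:
        raise ValueError("Apdex is undefined when there are no samples")
    return (satisfied + tolerating / 2) / total


print(apdex(60, 30, 10))  # mixed experience
print(apdex(0, 10, 0))    # all responses tolerating
```

Frustrated samples add nothing to the numerator but still count in the denominator, which is what pulls the score below 1.0.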