Thresholds are upper and lower bounds that determine whether a metric is performing as expected. Whenever the actual value of the metric falls outside these limits, the monitoring system raises an alarm.
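The basic check described above can be sketched in a few lines. This is a minimal illustration, not eG Enterprise code: the `Threshold` class, the `check` function, the metric name, and the alarm message format are all hypothetical, and it assumes a value exactly on a bound counts as within limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical upper and lower bounds for one metric."""
    lower: float
    upper: float

    def violated_by(self, value: float) -> bool:
        # Values strictly outside [lower, upper] are violations;
        # a value equal to a bound is treated as acceptable.
        return value < self.lower or value > self.upper


def check(metric: str, value: float, threshold: Threshold) -> Optional[str]:
    """Return a (hypothetical) alarm message if value is out of bounds, else None."""
    if threshold.violated_by(value):
        return f"ALARM: {metric}={value} outside [{threshold.lower}, {threshold.upper}]"
    return None


print(check("cpu_util_pct", 97.0, Threshold(0.0, 90.0)))
# ALARM: cpu_util_pct=97.0 outside [0.0, 90.0]
```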
Typically, administrators have to define thresholds for every metric that a monitoring system collects. Because a large infrastructure can have thousands of metrics, setting each threshold manually is a laborious process.
For some metrics, such as availability and response time, administrators can set fixed thresholds based on their service level expectations and agreements (SLAs). Other metrics are not bound by SLAs, and their values can vary over time. The number of users connected to a web server is one example: it varies with the time of day (there are more connections during the day and fewer at night) and even with the day of the week. A single fixed threshold is therefore not feasible for such time-varying metrics.
eG Enterprise includes an intelligent thresholding engine that handles both metrics with fixed thresholds and metrics whose values vary with the time of day. For service quality metrics (e.g., availability and response time), administrators can set multiple fixed thresholds.
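One common way to use multiple fixed thresholds on a single metric is to map each one to a severity level. The sketch below assumes that interpretation; the severity names, the millisecond limits, and the `classify` function are hypothetical and are not taken from eG Enterprise.

```python
from typing import List, Optional, Tuple

# Hypothetical fixed thresholds for a response-time metric, in milliseconds,
# ordered from most to least severe. Real limits would come from the SLA.
RESPONSE_TIME_THRESHOLDS_MS: List[Tuple[str, float]] = [
    ("critical", 2000.0),
    ("major", 1000.0),
    ("minor", 500.0),
]


def classify(
    value_ms: float,
    thresholds: List[Tuple[str, float]] = RESPONSE_TIME_THRESHOLDS_MS,
) -> Optional[str]:
    """Return the most severe level whose limit is exceeded, or None if none is."""
    for severity, limit in thresholds:
        if value_ms > limit:
            return severity
    return None


print(classify(1200.0))  # major
```

Checking the limits from most to least severe guarantees that a value breaching several thresholds is reported only at its highest severity.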
For other metrics, eG Enterprise computes time-varying thresholds automatically. It applies tested statistical quality control techniques to the historical values of each metric to set that metric's upper and lower bounds. Because the metric values themselves vary over time, the thresholds derived from this history vary over time as well.
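The article does not say which statistical technique eG Enterprise uses, so the following is only a sketch of one well-known quality control approach: Shewhart-style control limits (mean plus or minus k standard deviations), computed separately for each hour of the day so that the bounds follow the daily pattern. The hourly bucketing, the default k = 3, the clamping at zero, and the `hourly_thresholds` name are all assumptions made for illustration.

```python
import statistics
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def hourly_thresholds(
    history: Iterable[Tuple[datetime, float]], k: float = 3.0
) -> Dict[int, Tuple[float, float]]:
    """Compute hypothetical (lower, upper) bounds for each hour of the day.

    Each hour's bounds are mean +/- k population standard deviations of the
    historical values seen in that hour. The lower bound is clamped at zero
    because counts such as connections cannot be negative.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in history:
        buckets[ts.hour].append(value)

    limits: Dict[int, Tuple[float, float]] = {}
    for hour, values in buckets.items():
        mean = statistics.fmean(values)
        # pstdev of a single sample is 0.0, giving a zero-width band.
        sigma = statistics.pstdev(values)
        limits[hour] = (max(0.0, mean - k * sigma), mean + k * sigma)
    return limits
```

With history showing about 100 connections at 14:00 and about 10 at 03:00, a reading of 50 would fall below the afternoon band yet far above the night-time band, which is exactly the case a single fixed threshold cannot handle.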
The key benefits of this approach are:
A threshold policy determines how the thresholds for a metric are computed, while an alarm policy determines when an alarm is generated to notify administrators of a problem. Depending on their criticality, different metrics may require different alarm policies. Alarm policies can also take into account how frequently a metric violates its thresholds.
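A frequency-aware alarm policy can be sketched as "alarm only if at least N of the last M measurements violated a threshold." This rule and the `FrequencyAlarmPolicy` class are hypothetical illustrations of the idea, not eG Enterprise's documented policy format.

```python
from collections import deque
from typing import Deque


class FrequencyAlarmPolicy:
    """Hypothetical policy: alarm if `required` of the last `window` checks violated."""

    def __init__(self, required: int, window: int) -> None:
        if not 1 <= required <= window:
            raise ValueError("need 1 <= required <= window")
        self.required = required
        # Sliding window of recent results; True means a threshold was violated.
        self.recent: Deque[bool] = deque(maxlen=window)

    def observe(self, violated: bool) -> bool:
        """Record one measurement result and return True if an alarm should fire."""
        self.recent.append(violated)
        return sum(self.recent) >= self.required


policy = FrequencyAlarmPolicy(required=2, window=3)
print([policy.observe(v) for v in [True, False, False, True, True]])
# [False, False, False, False, True]
```

A critical metric might use `required=1, window=1` so that every violation raises an alarm, while a noisy, less critical metric could use a wider window to ignore isolated spikes.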
To accommodate different types of metrics, the eG alarm manager gives administrators complete flexibility in setting alarm policies: they can define policies for each individual server, each server group, or each server type.
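When policies can be set at several levels, some precedence rule decides which one applies. The article does not state eG's rule, so this sketch assumes that the most specific setting wins (server, then server group, then server type, then a default). The `resolve_policy` function and the policy names are hypothetical.

```python
from typing import Dict, Optional


def resolve_policy(
    server: str,
    group: Optional[str],
    server_type: str,
    by_server: Dict[str, str],
    by_group: Dict[str, str],
    by_type: Dict[str, str],
    default: str = "default",
) -> str:
    """Return the most specific alarm policy configured for a server.

    Assumed precedence: server > server group > server type > default.
    """
    if server in by_server:
        return by_server[server]
    if group is not None and group in by_group:
        return by_group[group]
    return by_type.get(server_type, default)


per_server = {"web01": "strict"}
per_group = {"dmz": "moderate"}
per_type = {"Apache": "lenient"}
print(resolve_policy("web02", "dmz", "Apache", per_server, per_group, per_type))
# moderate
```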
A single problem can have many side effects, triggering a large number of alarms across the network and making it difficult to know where to begin diagnosis.
eG Enterprise gives administrators a head start. Its patented event correlation engine analyzes the measurements reported by eG agents in real time, assesses the interdependencies between infrastructure components, and automatically prioritizes alarms into different levels of criticality. This capability is especially valuable in multi-domain environments, where the involvement of many staff members and tools can otherwise lead to prolonged downtime and reduced customer satisfaction.
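The patented correlation algorithm itself is not described here, so the following is only a generic sketch of dependency-based alarm prioritization: an alarming component is labeled a likely side effect if anything it depends on (directly or transitively) is also alarming, and a probable root cause otherwise. The dependency map, labels, and `prioritize` function are hypothetical.

```python
from typing import Dict, List, Set


def prioritize(depends_on: Dict[str, List[str]], alarming: Set[str]) -> Dict[str, str]:
    """Label each alarming component as a probable root cause or a side effect."""

    def upstream(node: str, seen: Set[str]) -> Set[str]:
        # Depth-first walk over everything `node` depends on; `seen` guards
        # against revisiting nodes, so dependency cycles cannot loop forever.
        for dep in depends_on.get(node, []):
            if dep not in seen:
                seen.add(dep)
                upstream(dep, seen)
        return seen

    labels: Dict[str, str] = {}
    for comp in alarming:
        # Exclude the component itself in case it sits on a dependency cycle.
        deps = upstream(comp, set()) - {comp}
        labels[comp] = "side-effect" if deps & alarming else "root-cause"
    return labels


deps = {"web": ["app"], "app": ["db"], "db": []}
print(sorted(prioritize(deps, {"web", "app", "db"}).items()))
# [('app', 'side-effect'), ('db', 'root-cause'), ('web', 'side-effect')]
```

Here a database failure that also makes the application and web tiers alarm is surfaced as the single root cause, which is where diagnosis should start.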