Hadoop Snapshot Operations Test
HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.
Using the Hadoop Snapshot Operations test, administrators can track administrator- and user-initiated snapshot operations on the HDFS.
Target of the test : A Hadoop cluster
Agent deploying the test : A remote agent
Outputs of the test : One set of the results for the Hadoop cluster being monitored
Parameter | Description |
---|---|
Test Period |
How often should the test be executed. |
Host |
The IP address of the NameNode that processes client connections to the cluster. NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a very highly available server that manages the File System Namespace and controls access to files by clients. |
Port |
The port at which the NameNode accepts client connections. NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a very highly available server that manages the File System Namespace and controls access to files by clients. By default, the NameNode's client connection port is 8020. |
Name Node Web Port |
The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a very highly available server that manages the File System Namespace and controls access to files by clients. To run API commands on the NameNode and pull metrics, the eG agent needs access to the NameNode's web port. To determine the correct web port of the NameNode, do the following:
Configure the <Port_Number> in the specification as the Name Node Web Port. In the case of the above sample configuration, this will be 50070. |
Name Node User Name |
The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. NameNode is the master node in the Apache Hadoop HDFS Architecture that maintains and manages the blocks present on the DataNodes (slave nodes). NameNode is a very highly available server that manages the File System Namespace and controls access to files by clients. In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the NameNode. When monitoring such Hadoop installations, specify the name of the simple authentication user here. If no such user is available/required, then do not disturb the default value none of this parameter. |
Resource Manager IP and Resource Manager Web Port |
The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. The YARN Resource Manager Service (RM) is the central controlling authority for resource management and makes resource allocation decisions. To pull metrics from the resource manager, the eG agents first needs to connect to the resource manager. For this, you need to configure this test with the IP address/host name of the resource manager and its web port. Use the Resource Manager IP and Resource Manager Web Port parameters to configure these details. To determine the IP/host name and web port of the resource manager, do the following:
Configure the <IP_Address_or_Host_Name> in the specification as the Resource Manager IP, and the <Port_Number> as the Resource Manager Web Port. In the case of the above sample configuration, this will be 8080. |
Resource Manager Username |
The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. The YARN Resource Manager Service (RM) is the central controlling authority for resource management and makes resource allocation decisions. In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the resource manager. When monitoring such Hadoop installations, specify the name of the simple authentication user here. If no such user is available/required, then do not disturb the default value none of this parameter. |
Measurement | Description | Measurement Unit | Interpretation |
---|---|---|---|
Allow operations |
Indicates the rate at which Allow Snapshots operations were performed on the HDFS. |
Operations/Sec |
The Allow Snapshots operation allows snapshots of a directory to be created. If the operation is successful, the directory becomes snapshottable. Superuser privileges are required for performing this operation. |
Disallow operations |
Indicates the rate at which Disallow Snapshots operations were performed on the HDFS. |
Operations/Sec |
The Disallow Snapshots operations disallows the creation of snapshots for a directory. All snapshots of the directory must be deleted before disallowing snapshots. Superuser privileges are required for performing this operation. |
Create operations |
Indicates the rate at which Create Snapshots operations were performed on the HDFS. |
Operations/Sec |
Use this operation to create a snapshot of a snapshottable directory. Owner privilege of the snapshottable directory are required for performing this operation. |
Delete Snapshots |
Indicates the rate at which Delete Snapshots operations were performed on the HDFS. |
Operations/Sec |
This operation deletes a snapshot of a om a snapshottable directory. Owner privilege of the snapshottable directory are required for performing this operation. |
Rename Snapshots |
Indicates the rate at which Rename Snapshots operations were performed on the HDFS. |
Operations/Sec |
This operation renames a snapshot. Owner privilege of the snapshottable directory are required for performing this operation. |