How to Set Up High Availability for Namenode – Part 5

Hadoop has two core components: HDFS and YARN. HDFS (Hadoop Distributed File System) stores the data, and YARN handles resource management and job scheduling for processing that data. HDFS has the Namenode as its master service and the Datanode as its slave service.

The Namenode is the critical component of Hadoop because it stores the metadata of the data held in HDFS. If the Namenode goes down, the entire cluster becomes inaccessible, which makes it a single point of failure (SPOF). Production environments therefore run Namenode High Availability to avoid an outage when one Namenode goes down because of a machine crash, planned maintenance, and so on.

Hadoop 2.x makes it possible to run two Namenodes: one is the Active Namenode and the other is the Standby Namenode.

  • Active Namenode – It manages all client operations.
  • Standby Namenode – It is a redundant copy of the Active Namenode. If the Active NN goes down, the Standby NN takes over all of its responsibilities.

Enabling Namenode High Availability with automatic failover requires Zookeeper. ZKFC (Zookeeper Failover Controller) is a Zookeeper client that monitors and manages the state of the Namenode.
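The Active/Standby hand-over described above can be sketched as a toy model. This is only an illustration, not how ZKFC is implemented: the single lock stands in for the lock that the failover controllers compete for in Zookeeper, and the names `NameNode`, `FailoverModel`, `nn1` and `nn2` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NameNode:
    name: str
    healthy: bool = True
    state: str = "standby"


class FailoverModel:
    """Toy model: whichever Namenode holds the single lock is Active."""

    def __init__(self, nodes: List[NameNode]) -> None:
        self.nodes: Dict[str, NameNode] = {node.name: node for node in nodes}
        self.lock_holder: Optional[str] = None

    def elect(self) -> Optional[str]:
        # Release the lock if the current holder is no longer healthy.
        if self.lock_holder is not None:
            holder = self.nodes[self.lock_holder]
            if not holder.healthy:
                holder.state = "failed"
                self.lock_holder = None
        # If the lock is free, the first healthy node takes it and becomes Active.
        if self.lock_holder is None:
            for node in self.nodes.values():
                if node.healthy:
                    node.state = "active"
                    self.lock_holder = node.name
                    break
        return self.lock_holder


if __name__ == "__main__":
    nn1, nn2 = NameNode("nn1"), NameNode("nn2")
    model = FailoverModel([nn1, nn2])
    print(model.elect())   # nn1 takes the lock and becomes Active
    nn1.healthy = False    # simulate a crash of the Active Namenode
    print(model.elect())   # nn2 takes over
```

The point of the model is that at most one Namenode holds the lock at any time, so at most one can be Active.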

Requirements

In this article, we are going to enable Namenode High Availability in Cloudera Manager.

Step 1: Installation of Zookeeper

1. Log in to Cloudera Manager.

http://Your-IP:7180/cmf/home
Cloudera Manager Dashboard

2. In the actions menu of the cluster (tecmint), select “Add Service”.

Add Service in Cloudera Manager

3. Select the service “Zookeeper”.

Zookeeper Service

4. Select the servers where we are going to have Zookeeper installed.

Add Zookeeper Service

5. We are going to use 3 Zookeeper servers to form a Zookeeper quorum. Select the servers as shown below.

Create Zookeeper Quorum

6. Configure the Zookeeper properties. Here we keep the defaults. In a production environment, you should use separate directories or mount points for storing Zookeeper data. Part 1 explains the storage configuration for each service. Click ‘Continue’ to proceed.

Configure Zookeeper Properties

7. The installation begins, and Zookeeper is started once it is installed. You can view the background operations here.

Installing Zookeeper Service

8. After successful completion of the above step, Status will be ‘Finished’.

Zookeeper Installed

9. Zookeeper is now successfully installed and configured. Click ‘Finish’.

Zookeeper Configured

10. You can view the Zookeeper service on the Cloudera Manager Dashboard.

View Zookeeper Service
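The reason for choosing 3 Zookeeper servers can be checked with a little arithmetic. A quorum needs a strict majority of the servers, so the sketch below computes the majority size and how many server failures an ensemble can survive. The function names are illustrative, not part of any Zookeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for size in (3, 4, 5):
        print(size, quorum_size(size), tolerated_failures(size))
    # 3 2 1
    # 4 3 1
    # 5 3 2
```

A fourth server does not add any failure tolerance over three, which is why odd ensemble sizes such as 3 or 5 are the usual choice.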

Step 2: Enabling Namenode High Availability

11. Go to Cloudera Manager –> HDFS –> Actions –> Enable High Availability.

Enabling High Availability

12. Enter the Nameservice name as “nameservice1”. This is a common namespace for both the Active and Standby Namenodes.

Add Nameservice Name

13. Select the host for the second Namenode, which will run the Standby Namenode.

Choose Second Namenode

14. Here we are selecting master2.tecmint.com for the Standby Namenode.

Select Host for Namenode

15. Select the Journal nodes. These services are mandatory for synchronizing the Active and Standby Namenodes.

Select Journal Nodes

16. We are creating a Quorum Journal by placing Journal nodes on 3 servers as shown below. Select 3 servers and click ‘OK’.

Create a Quorum Journal

17. Click ‘Continue’ to proceed.

Assign Roles to Quorum Journal

18. Enter the Journal Node directory path. We only need to specify the path; the service creates the directory automatically during installation. Here we are using ‘/jn’. Click ‘Continue’ to proceed.

Add Journal Node Path

19. Cloudera Manager starts enabling High Availability.

Enabling High Availability

20. Once all the background processes have completed, the status shows ‘Finished’.

Finished High Availability

21. Finally, we will get a notification ‘Successfully enabled High Availability’. Click ‘Finish’.

High Availability Enabled
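Cloudera Manager generates the HDFS configuration for you, so nothing needs to be written by hand. To illustrate how the values entered in the wizard (the nameservice, the Journal nodes and the ‘/jn’ path) relate to standard HDFS HA properties, here is a minimal sketch. The property names and the default JournalNode port 8485 are standard Hadoop settings; the host names and the Namenode IDs `nn1` and `nn2` are hypothetical, and Cloudera Manager assigns its own IDs.

```python
from typing import Dict, List

JOURNALNODE_RPC_PORT = 8485  # default JournalNode RPC port in Hadoop


def qjournal_uri(journal_hosts: List[str], nameservice: str,
                 port: int = JOURNALNODE_RPC_PORT) -> str:
    """Build the shared edits URI that points the Namenodes at the Journal nodes."""
    if not journal_hosts:
        raise ValueError("at least one Journal node host is required")
    hosts = ";".join(f"{host}:{port}" for host in journal_hosts)
    return f"qjournal://{hosts}/{nameservice}"


def ha_properties(nameservice: str, namenode_ids: List[str],
                  journal_hosts: List[str], journal_dir: str) -> Dict[str, str]:
    """Collect the main HA-related hdfs-site.xml properties in one dictionary."""
    return {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenode_ids),
        "dfs.namenode.shared.edits.dir": qjournal_uri(journal_hosts, nameservice),
        "dfs.journalnode.edits.dir": journal_dir,
        "dfs.ha.automatic-failover.enabled": "true",
    }


if __name__ == "__main__":
    # Hypothetical hosts; substitute the three servers chosen in the wizard.
    hosts = ["jn1.example.com", "jn2.example.com", "jn3.example.com"]
    for key, value in ha_properties("nameservice1", ["nn1", "nn2"], hosts, "/jn").items():
        print(f"{key} = {value}")
```

To see the values your cluster actually uses, check the HDFS configuration in Cloudera Manager rather than relying on this sketch.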

22. Verify the Active and Standby Namenode by going to Cloudera Manager –> HDFS –> Instances.

Verify High Availability

23. Here, you can see two Namenodes: one will be in the ‘Active’ state and the other will be in the ‘Standby’ state.

Verify Namenodes
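The same check can be done from the command line with `hdfs haadmin -getServiceState`, which prints the state of one Namenode. The sketch below wraps that command and confirms that exactly one Namenode is Active and the rest are Standby. It assumes the `hdfs` client is installed and configured on the host where it runs; the Namenode IDs are hypothetical, so use the IDs shown in your HDFS configuration.

```python
import subprocess
from typing import Dict, List


def get_service_state(namenode_id: str) -> str:
    """Ask HDFS for the HA state of one Namenode ('active' or 'standby')."""
    result = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", namenode_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def is_ha_healthy(states: Dict[str, str]) -> bool:
    """True when exactly one Namenode is active and all others are standby."""
    active = [name for name, state in states.items() if state == "active"]
    standby = [name for name, state in states.items() if state == "standby"]
    return len(active) == 1 and len(standby) == len(states) - 1 and len(standby) >= 1


def check_cluster(namenode_ids: List[str]) -> bool:
    """Query every Namenode and evaluate the combined result."""
    states = {nn_id: get_service_state(nn_id) for nn_id in namenode_ids}
    return is_ha_healthy(states)


if __name__ == "__main__":
    # Hypothetical IDs; replace them with the IDs from your cluster.
    print(check_cluster(["nn1", "nn2"]))
```

Keeping `is_ha_healthy` separate from the command call makes the rule easy to test without a running cluster.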
Conclusion

In this article, we have gone through the step-by-step process to enable Namenode High Availability. It is highly recommended to have Namenode High Availability on every cluster in a production environment. Please post your questions if you face any errors while following this process. We will cover Resource Manager High Availability in the next article.

Mohan Sivam
A Solution Architect and Bigdata infrastructure with over 10 years of experience in Information technology.
