How to Install CDH and Configure Service Placements on CentOS/RHEL 7 – Part 4

In an earlier article, we explained how to install Cloudera Manager. In this article, you will learn how to install and configure CDH (Cloudera Distribution Hadoop) on RHEL/CentOS 7.

When installing the CDH parcel, we have to make sure that the Cloudera Manager and CDH versions are compatible. A Cloudera version has 3 parts: <major>.<minor>.<maintenance>. The Cloudera Manager major and minor version must be the same as the CDH major and minor version.

For example, we are using Cloudera Manager 6.3.1 and CDH 6.3.2. Here 6 is the major version and 3 is the minor version. The major and minor versions must be the same to avoid compatibility issues.
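
The rule above can be sketched as a small shell check. This is only an illustration, assuming both version strings follow the <major>.<minor>.<maintenance> form; the function names major_minor and versions_compatible are hypothetical helpers and not part of any Cloudera tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Cloudera Manager): compare the
# <major>.<minor> prefix of two <major>.<minor>.<maintenance> versions.

# Print the "<major>.<minor>" part of a version string.
major_minor() {
    echo "$1" | cut -d. -f1,2
}

# Return 0 when both versions share the same major and minor numbers.
versions_compatible() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# The versions used in this article: Cloudera Manager 6.3.1 and CDH 6.3.2.
if versions_compatible "6.3.1" "6.3.2"; then
    echo "compatible"
else
    echo "not compatible"
fi
```

With the versions from this article the script prints "compatible", because both share the 6.3 prefix.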

Requirements

We will use the 2 nodes below for installing CDH. Cloudera Manager is already installed on master1, which also serves as the repo server.

master1.tecmint.com
worker1.tecmint.com

Step 1: Download CDH Parcels on Master Server

1. First, connect to the master1 server and download the CDH parcel files into the /var/www/html/cloudera-repos/cdh directory. We have to download the 3 files listed below, which are compatible with RHEL/CentOS 7.

CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1
manifest.json

2. Before downloading these files, create a cdh directory under /var/www/html/cloudera-repos/.

$ cd /var/www/html/cloudera-repos/
$ sudo mkdir cdh
$ cd cdh

3. Next, download the 3 files mentioned above using the following wget commands.

$ sudo wget https://archive.cloudera.com/cdh6/6.3.2/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel 
$ sudo wget https://archive.cloudera.com/cdh6/6.3.2/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1 
$ sudo wget https://archive.cloudera.com/cdh6/6.3.2/parcels/manifest.json 
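
After the download, it can be worth comparing the parcel with its checksum file before continuing. The sketch below assumes the .sha1 file holds the hex digest as its first whitespace-separated field; verify_parcel is a hypothetical helper name, and sha1sum is the standard coreutils command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a parcel against its .sha1 file.
# Assumption: the .sha1 file holds the hex digest as its first field.
verify_parcel() {
    local parcel="$1" sha_file="$2"
    local expected actual
    expected="$(awk '{print $1; exit}' "$sha_file")"
    actual="$(sha1sum "$parcel" | awk '{print $1}')"
    # Fail on an empty checksum file as well as on a mismatch.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Run from /var/www/html/cloudera-repos/cdh after the downloads finish.
parcel="CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel"
if [ -f "$parcel" ] && [ -f "$parcel.sha1" ]; then
    if verify_parcel "$parcel" "$parcel.sha1"; then
        echo "checksum OK"
    else
        echo "checksum mismatch, download the parcel again"
    fi
fi
```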

Step 2: Setup Cloudera Manager Repo On Worker Clients

4. Now, connect to the worker servers and copy the repo file (cloudera-manager.repo) from the repo server (master1) to all the remaining worker servers. This repo file ensures that all the required parcels and RPMs are downloaded from the repo server during installation.

$ sudo tee /etc/yum.repos.d/cloudera-manager.repo >/dev/null <<EOL
[cloudera-repo]
name=cloudera-manager
baseurl=http://104.211.95.96/cloudera-repos/cm6/
enabled=1
gpgcheck=0
EOL
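
When the same file has to be created on several workers, the snippet above can be wrapped in a small function. This is a sketch only: write_cm_repo is a hypothetical helper name, and the scratch path under /tmp is an assumption made so the result can be reviewed before it is installed under /etc/yum.repos.d/.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a cloudera-manager.repo file for a repo host.
# $1 = repo server host or IP
# $2 = output path (the default location needs root permission to write)
write_cm_repo() {
    local repo_host="$1"
    local out="${2:-/etc/yum.repos.d/cloudera-manager.repo}"
    cat > "$out" <<EOL
[cloudera-repo]
name=cloudera-manager
baseurl=http://${repo_host}/cloudera-repos/cm6/
enabled=1
gpgcheck=0
EOL
}

# Example: write to a scratch file first and review it before installing it.
write_cm_repo "104.211.95.96" "/tmp/cloudera-manager.repo"
cat /tmp/cloudera-manager.repo
```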

5. Once the repo is added, list the enabled repositories to make sure the cloudera-manager repo is enabled.

$ yum repolist
Check Enabled Repos

Step 3: Install Cloudera Manager Daemons and Agent on Worker Servers

6. Now, we need to install cloudera-manager-daemons and cloudera-manager-agent on all the remaining servers.

$ sudo yum install cloudera-manager-daemons cloudera-manager-agent
Install Cloudera Manager Daemons and Agent

7. Next, you need to configure the Cloudera Manager agent to report to the Cloudera Manager server.

$ sudo vi /etc/cloudera-scm-agent/config.ini

Replace localhost in the server_host setting with the Cloudera Manager server IP address.
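
On a cluster with many workers, the same change can be made without opening an editor. The sketch below assumes config.ini contains a line that starts with server_host= and that GNU sed is available (as on RHEL/CentOS 7); set_server_host is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the agent at the Cloudera Manager server.
# $1 = path to config.ini, $2 = server host name or IP (must not contain "/").
set_server_host() {
    local cfg="$1" host="$2"
    # Rewrite only the server_host line; sed keeps the original as <file>.bak.
    sed -i.bak "s/^server_host=.*/server_host=${host}/" "$cfg"
}

# Example (run as root on each worker; the IP is the one used in this article):
# set_server_host /etc/cloudera-scm-agent/config.ini 104.211.95.96
```

The backup copy makes it easy to compare the file before and after the change.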

Configure Cloudera Manager

8. Start Cloudera Manager Agent and verify the status.

$ sudo systemctl start cloudera-scm-agent
$ sudo systemctl status cloudera-scm-agent
Check Cloudera Manager Status

Step 4: Install and Setup CDH

The CDH parcels are on master1, the repo server. If you use automatic installation through Cloudera Manager, make sure all the servers have the Cloudera Manager repo file in /etc/yum.repos.d/.

9. Log in to Cloudera Manager using the below URL at port 7180 and use the default username and password of Cloudera Manager.

http://104.211.95.96:7180/cmf/login
Username: admin
Password: admin
Cloudera Manager Login

10. Once you log in, you will be greeted with the Welcome page. Here you can find the release notes and new features of Cloudera Manager.

Cloudera Manager Welcome

11. Accept the License and Continue.

Cloudera Manager License

12. Select the edition. The trial version is selected by default, and we can proceed with that.

Cloudera Manager Edition

13. Now, follow the Cluster Installation steps. Continue from the Welcome page.

Cloudera Manager Cluster Installation

14. Name the cluster and continue. We have named it “tecmint”. There are 2 types of cluster you can define. We are proceeding with a Regular Cluster.

  • Regular Cluster: Will consist of storage nodes, compute nodes, and other necessary services.
  • Compute Cluster: Will consist only of compute nodes. External storage can be used for storing data.
Cloudera Manager Cluster Name

15. We have already installed the Cloudera Manager agents on all the servers. You can find those servers under “Currently Managed Hosts”. For automatic installation, you have to enter the FQDN or IP of the servers in the “New Hosts” option and search. Cloudera Manager will automatically detect the hosts on which we need to install CDH.

Here, click the “Currently Managed Hosts”, select all hosts by selecting ‘Hostname’ and continue.

Currently Managed Hosts

16. Select Repository – using Parcel is the recommended way. Click ‘More Options’ to configure the repository.

Select Repository

17. Enter the local repository URL as mentioned below. Remove all the remaining public repositories that point to the web (Cloudera repositories).

Parcel Repository Settings

This is the URL of the local CDH repository hosted on master1.

http://104.211.95.96/cloudera-repos/cdh/
Cloudera Repo
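
Before entering this URL, it can help to confirm on master1 that the directory behind it holds the three files downloaded in Step 1. The sketch below is only an illustration; check_parcel_repo is a hypothetical helper name, and the file names are the ones listed earlier in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a local parcel directory holds the
# three files downloaded in Step 1. Returns 0 only when all are present.
check_parcel_repo() {
    local dir="$1" missing=0 f
    for f in \
        "CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel" \
        "CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha1" \
        "manifest.json"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Run on master1 against the directory served as /cloudera-repos/cdh/.
if check_parcel_repo /var/www/html/cloudera-repos/cdh; then
    echo "parcel repo looks complete"
fi
```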

18. Once the repository URL is entered, this page will show only the available parcels. Continue from this step.

Select Repository Settings

19. Now the parcels are being downloaded, distributed, unpacked, and activated in all the available servers.

Install Parcels

20. Once the CDH parcels are activated, inspect the cluster. This step performs a health check of the cluster. Here we skip it and continue.

Inspect Cluster

Step 5: Cluster Configuration

21. Here we need to select the services to be installed in the cluster. Some predefined combinations are available by default; we are going with Custom Services.

Cluster Configuration

22. In Custom Services, we are installing only the core components (HDFS and YARN) for this demo.

Install Services

23. Assign the roles to the servers. We can customize them according to our requirements. The chart below describes the recommended role distribution for a basic small cluster of 5 to 20 nodes with High Availability.

Assign Roles
Recommended Role Distributions

24. Select the database type, hostname, DB name, username, and password. As we are using the embedded PostgreSQL, it is selected by default. Test the connection; it should succeed.

Setup Database

25. This page shows the default configuration parameters of HDFS and YARN, including the data directories. Review all the configuration details and make changes if required. Then continue.

Review Configuration Parameters

26. This page shows the details of the ‘First Run’ command. You can expand it to see the details of the running commands. If there are any network or permission issues in the cluster, this step will fail. Usually, this step determines whether the cluster build goes smoothly.

First Run Commands
Running Commands

27. Once the above step has completed, click ‘Finish’ to complete the installation. This is the Dashboard of Cloudera Manager after installing CDH.

http://104.211.95.96:7180/cmf/home
Cloudera Manager Dashboard
Conclusion

We have completed the Cloudera Manager and CDH installation successfully. In the Cloudera Manager Dashboard, you can find a predefined set of charts for monitoring cluster CPU, disk I/O, and other metrics. We can manage the entire cluster using Cloudera Manager. We will cover the administrative operations in the upcoming articles.

Mohan Sivam
A Solution Architect and Bigdata infrastructure with over 10 years of experience in Information technology.
