How to Install and Configure Apache Hadoop on a Single Node in CentOS 7

Apache Hadoop is an open-source framework built for distributed Big Data storage and processing across computer clusters. The project is based on the following components:

  1. Hadoop Common – contains the Java libraries and utilities needed by other Hadoop modules.
  2. HDFS (Hadoop Distributed File System) – a Java-based, scalable file system distributed across multiple nodes.
  3. MapReduce – a YARN-based framework for parallel processing of large data sets.
  4. Hadoop YARN – a framework for cluster resource management and job scheduling.
Install Hadoop in CentOS 7

This article will guide you through installing Apache Hadoop on a single-node cluster in CentOS 7 (it also works for RHEL 7 and Fedora 23+). This type of configuration is also referred to as Hadoop Pseudo-Distributed Mode.

Step 1: Install Java on CentOS 7

1. Before proceeding with the Java installation, first log in as the root user (or a user with root privileges) and set your machine hostname with the following command.

# hostnamectl set-hostname master
Set Hostname in CentOS 7

Also, add a new record in the hosts file that maps your machine's FQDN to your system IP address.

# vi /etc/hosts

Add the following line:

192.168.1.41 master.hadoop.lan
Set Hostname in /etc/hosts File

Replace the above hostname and FQDN records with your own settings.
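
Optionally, you can confirm that the new hostname is set and that the hosts record resolves correctly before moving on (replace master.hadoop.lan with your own FQDN):

# hostnamectl status
# getent hosts master.hadoop.lan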

2. Next, go to the Oracle Java download page and grab the latest version of Java SE Development Kit 8 using the curl command:

# curl -LO -H "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u92-b14/jdk-8u92-linux-x64.rpm"
Download Java SE Development Kit 8
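
Note that Oracle's otn-pub download links expire over time. If the above URL no longer works, the same JDK 8u92 RPM can be fetched from a mirror instead, as one commenter points out at the end of this article, for example:

# wget "https://mirror.its.sfu.ca/mirror/CentOS-Third-Party/NSG/common/x86_64/jdk-8u92-linux-x64.rpm"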

3. After the Java binary download finishes, install the package by issuing the following command:

# rpm -Uvh jdk-8u92-linux-x64.rpm
Install Java in CentOS 7
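
You can quickly verify that Java was installed correctly by checking the reported version:

# java -version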

Step 2: Install Hadoop Framework in CentOS 7

4. Next, create a new user account without root privileges, which we'll use for the Hadoop installation path and working environment. The new account's home directory will reside in the /opt/hadoop directory.

# useradd -d /opt/hadoop hadoop
# passwd hadoop
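
To double-check that the account was created with the expected home directory, you can run:

# id hadoop
# grep hadoop /etc/passwd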

5. In the next step, visit the Apache Hadoop page to get the link to the latest stable version and download the archive to your system.

# curl -O http://apache.javapipe.com/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz 
Download Hadoop Package
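
If the downloaded file is not a valid gzip archive (for example, the mirror returned an HTML error page instead of the tarball), the extraction in the next step will fail with a "not in gzip format" error, as also reported in the comments below. A quick sanity check before extracting:

# file hadoop-2.7.2.tar.gz
# tar -tzf hadoop-2.7.2.tar.gz | head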

6. Extract the archive and copy the directory contents to the hadoop account's home path. Also, make sure you change the permissions of the copied files accordingly.

# tar xfz hadoop-2.7.2.tar.gz
# cp -rf hadoop-2.7.2/* /opt/hadoop/
# chown -R hadoop:hadoop /opt/hadoop/
Extract and Set Permissions on Hadoop
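
Optionally, list the hadoop home directory to confirm the files were copied and are now owned by the hadoop user:

# ls -l /opt/hadoop/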

7. Next, log in as the hadoop user and configure the Hadoop and Java environment variables on your system by editing the .bash_profile file.

# su - hadoop
$ vi .bash_profile

Append the following lines at the end of the file:

## JAVA env variables
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar

## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Configure Hadoop and Java Environment Variables

8. Now, load the environment variables into the current session and check their values by issuing the commands below:

$ source .bash_profile
$ echo $HADOOP_HOME
$ echo $JAVA_HOME
Initialize Linux Environment Variables
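
With the variables loaded, you can also run the hadoop command itself to confirm that the binaries are reachable through your PATH:

$ hadoop version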

9. Finally, configure SSH key-based authentication for the hadoop account by running the commands below (replace the hostname or FQDN used with the ssh-copy-id command accordingly).

Also, leave the passphrase field blank in order to automatically log in via SSH.

$ ssh-keygen -t rsa
$ ssh-copy-id master.hadoop.lan
Configure SSH Key Based Authentication
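
You can optionally verify that passwordless login works by opening an SSH session to your own FQDN; it should not ask for a password (replace the hostname with your own):

$ ssh master.hadoop.lan
$ exit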

46 thoughts on “How to Install and Configure Apache Hadoop on a Single Node in CentOS 7”

  1. Step 5: Start and Test Hadoop Cluster

    After entering this command I am getting the following error. Please help me to resolve this issue.

    [hadoop@master ~]$ start-dfs.sh
    

    Sample Error

    18/07/02 15:41:05 ERROR conf.Configuration: error parsing conf mapred-site.xml
    com.ctc.wstx.exc.WstxParsingException: Illegal processing instruction target ("xml"); 
    xml (case insensitive) is reserved by the specs.
     at [row,col,system-id]: [2,5,"file:/opt/hadoop/etc/hadoop/mapred-site.xml"]
    	at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:621)
    	at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:491)
    	at com.ctc.wstx.sr.BasicStreamReader.readPIPrimary(BasicStreamReader.java:4019)
    	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2141)
    	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1181)
    	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2799)
    	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2728)
    	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2605)
    	at org.apache.hadoop.conf.Configuration.get(Configuration.java:1103)
    	at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1157)
    	at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1434)
    	at org.apache.hadoop.security.Groups.<init>(Groups.java:112)
    	at org.apache.hadoop.security.Groups.<init>(Groups.java:101)
    	at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:449)
    	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:323)
    	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:290)
    	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:850)
    	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
    	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
    	at org.apache.hadoop.hdfs.tools.GetConf.run(GetConf.java:315)
    	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    	at org.apache.hadoop.hdfs.tools.GetConf.main(GetConf.java:332)
    Exception in thread "main" java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: 
    Illegal processing instruction target ("xml"); xml (case insensitive) is reserved by the specs.
     at [row,col,system-id]: [2,5,"file:/opt/hadoop/etc/hadoop/mapred-site.xml"]
    ....
    
  2. In .bash_profile, I have appended the following lines:

    ## JAVA env variables
    export JAVA_HOME=/usr/java/default
    export PATH=$PATH:$JAVA_HOME/bin
    export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
    ## HADOOP env variables
    export HADOOP_HOME=/opt/hadoop
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_YARN_HOME=$HADOOP_HOME
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    

    Java version is:

    [hadoop@master ~]$ java -version
    openjdk version "1.8.0_171"
    OpenJDK Runtime Environment (build 1.8.0_171-b10)
    OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode)
    

    but, after entering the command:

    # hdfs namenode -format
    

    I am getting the below error.

    /opt/hadoop/bin/hdfs: line 319: /usr/java/default//bin/java: No such file or directory

    Please help me out to resolve this issue, sir.

  3. Hi Sir,

    When I am entering the following command:

    # hdfs namenode -format
    

    I am getting an error like: /opt/hadoop/bin/hdfs: line 319: /usr/java/default//bin/java: No such file or directory

    My complete command is:

    [hadoop@master ~]$ hdfs namenode -format
    /opt/hadoop/bin/hdfs: line 319: /usr/java/default//bin/java: No such file or directory
    

    Will you please help me with this, sir?

  4. I am not able to install the Java file. When I run the command in the terminal, it shows “rpm failed file not found”.

  5. [root@master ~]# tar xfz hadoop-2.7.2.tar.gz

    gzip: stdin: not in gzip format
    tar: Child returned status 1
    tar: Error is not recoverable: exiting now

    I did try to install gzip but it did not work.

      • The Oracle URL has expired; use curl or wget as follows.

        # curl -LO "https://mirror.its.sfu.ca/mirror/CentOS-Third-Party/NSG/common/x86_64/jdk-8u92-linux-x64.rpm"
        OR
        # wget "https://mirror.its.sfu.ca/mirror/CentOS-Third-Party/NSG/common/x86_64/jdk-8u92-linux-x64.rpm"
        
  6. Thanks for the great tutorial on how to install Hadoop. A lot of beginners like me will benefit from your work. I just want to suggest that perhaps including a small addendum on how to read from and write to HDFS would be great.
