Hadoop Installation on Linux Systems
If you have ever installed Hadoop on any system, you know how painful and unnecessarily tiresome the setup process can be. In this tutorial we will walk through installing Hadoop on a Linux system. I will also point out the common mistakes I ran into while installing it on my own machines and on those of my colleagues.
Downloading Requirements
I recommend installing Hadoop from the terminal, since it provides an easy way to check whether each step of the installation succeeded. This tutorial assumes an open terminal throughout. On most Ubuntu systems you can open one with Ctrl+Alt+T. Once the terminal is open, download the requirements with the following command.
sudo apt update && sudo apt install openjdk-8-jdk
The first command refreshes the package lists from the configured repositories; the second installs OpenJDK 8, which this installation requires. To verify the Java installation, run the following command:
java -version
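The first line of the output reports the Java version. OpenJDK 8 still uses the legacy 1.8.x numbering, so a script that needs the major version has to strip the leading "1." prefix. Here is a small sketch of that parsing; the banner string is a stand-in for real `java -version` output, whose exact vendor text varies between builds:

```shell
# Stand-in for the first line of `java -version` output;
# the build number will differ on your machine
banner='openjdk version "1.8.0_392"'
# Strip everything up to the quoted version, then map 1.8 -> 8
major=$(echo "$banner" | sed -E 's/.*"1\.([0-9]+)\.[^"]*".*/\1/')
echo "$major"
```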
Next we will create a dedicated hadoop user on Ubuntu. You will be prompted for a password and some optional information; fill it in as you see fit.
sudo adduser hadoop
Install ssh (secure shell) to enable secure connections between the nodes in the cluster.
sudo apt install ssh
Installing Hadoop
First we need to switch to the new user. You might need to prefix the command with sudo, depending on your system configuration.
su - hadoop
Now configure password-less SSH access for the newly created hadoop user; press Enter at every prompt that follows to accept the defaults.
ssh-keygen -t rsa
Copy the generated public key to the authorized_keys file and set the proper permissions:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
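The 640 mode matters: with its default StrictModes setting, sshd refuses to use an authorized_keys file that is writable by group or others. The octal digits mean owner read+write (6), group read (4), others nothing (0). A quick demonstration on a throwaway file (the `demo` variable is just for illustration):

```shell
# Create a scratch file and apply the same mode used above
demo=$(mktemp)
chmod 640 "$demo"
# Print the octal permission bits (GNU coreutils stat)
mode=$(stat -c '%a' "$demo")
echo "$mode"
rm -f "$demo"
```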
The first time you connect, you will be asked to confirm the host's key fingerprint. Type yes and hit Enter to add localhost to the known hosts.
ssh localhost
Switch to the hadoop user again (with or without sudo, as before):
su - hadoop
Download Hadoop 3.3.6:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
Once the download finishes, extract the archive:
tar -xvzf hadoop-3.3.6.tar.gz
Rename the extracted folder to drop the version information. This step is optional, but if you skip it, adjust the paths in the remaining configuration accordingly.
mv hadoop-3.3.6 hadoop
Next, you will need to configure the Hadoop and Java environment variables on your system. Open the ~/.bashrc file in your favorite text editor. Here I am using nano: paste with Ctrl+Shift+V, then save with Ctrl+X, press Y to confirm, and hit Enter:
nano ~/.bashrc
Append the below lines to the file.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Load the above configuration in the current environment.
source ~/.bashrc
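A quick way to sanity-check the result is to echo the variables, e.g. `echo $HADOOP_HOME`. The PATH line is what makes the `hadoop`, `hdfs`, and `start-*.sh` commands used later resolvable from any directory. A self-contained sketch of its effect, using local assignments so it runs anywhere:

```shell
# Mirror the two PATH entries appended in ~/.bashrc
HADOOP_HOME=/home/hadoop/hadoop
PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
# Show the two entries that were just appended
echo "$PATH" | tr ':' '\n' | tail -n 2
```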
You also need to set JAVA_HOME in the hadoop-env.sh file. Open the Hadoop environment variable file in the text editor:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Search for the line that sets JAVA_HOME, uncomment it, and point it at the JDK:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Configuring Hadoop
First, you will need to create the NameNode and DataNode directories inside the hadoop user's home directory. Run the following command to create both directories:
mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}
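The braces are shell brace expansion (a bash feature): the single command expands to two mkdir targets, and -p creates any missing parent directories. A demonstration in a throwaway temp directory:

```shell
# {namenode,datanode} expands to two paths; -p creates parents as needed
base=$(mktemp -d)
mkdir -p "$base"/hdfs/{namenode,datanode}
# ls prints the two directories in alphabetical order
created=$(ls "$base"/hdfs)
echo "$created"
rm -rf "$base"
```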
Next, edit the core-site.xml file to set the default filesystem URI:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
For a single-node setup, localhost is correct as-is; on a multi-node cluster, replace it with your system hostname:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Save and close the file.
Then, edit the hdfs-site.xml file:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Set the replication factor and the NameNode and DataNode directory paths as shown below:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
Then, edit the mapred-site.xml file:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Make the following changes. mapreduce.framework.name tells MapReduce jobs to run on YARN, and HADOOP_MAPRED_HOME must point at the Hadoop installation directory:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop</value>
  </property>
</configuration>
Then, edit the yarn-site.xml file:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Make the following changes:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Save and close the file.
Starting the Hadoop Cluster
Before starting the Hadoop cluster, you need to format the NameNode as the hadoop user. Run the following command to format the Hadoop NameNode:
hdfs namenode -format
Once the NameNode directory is successfully formatted with the HDFS filesystem, you will see the message “Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted”.
Then start the Hadoop cluster with the following command.
start-all.sh
You can now check the status of all Hadoop services using the jps command:
jps
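On a healthy single-node setup, jps lists five Hadoop daemons besides Jps itself: NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. A sketch of checking for them in a script; the `sample` variable is a hard-coded stand-in for real jps output, and the PIDs are made up:

```shell
# Sample jps output for illustration; real PIDs will differ
sample='2101 NameNode
2245 DataNode
2466 SecondaryNameNode
2693 ResourceManager
2831 NodeManager
3120 Jps'
missing=0
# -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  echo "$sample" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons running"
```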
If the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager daemons all appear in the output, the cluster has started successfully. To stop the cluster, run the following command:
stop-all.sh