How to Install Apache Hadoop on Ubuntu 20.04 - 22.04

Published: 5 October 2024
on the channel: MivoCloud

Apache Hadoop is an open-source framework for processing and storing big data, and it has become the standard big-data framework across today's industries. Hadoop is designed to run on distributed systems with hundreds or even thousands of clustered computers or dedicated servers. With this design, Hadoop can handle large datasets of high volume and complexity, for both structured and unstructured data.

Every Hadoop deployment contains the following components:

Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

In this video, we install the latest version of Apache Hadoop (3.3.4 at the time of recording) on an Ubuntu 22.04 server. Hadoop is installed on a single node, creating a pseudo-distributed deployment in which every Hadoop daemon runs as a separate process on one machine.

Useful Links:
VPS/VDS - https://www.mivocloud.com/
Hadoop - https://hadoop.apache.org/


Commands Used:
sudo apt install default-jdk
java -version
sudo apt install openssh-server openssh-client pdsh
sudo useradd -m -s /bin/bash hadoop
sudo passwd hadoop
sudo usermod -aG sudo hadoop
su - hadoop
ssh-keygen -t rsa
ls ~/.ssh/
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost
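
To confirm that passwordless login actually works (Hadoop's start scripts rely on it), a non-interactive check such as the following should print OK without asking for a password; BatchMode makes ssh fail instead of prompting:
ssh -o BatchMode=yes localhost 'echo OK'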
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
tar -xvzf hadoop-3.3.4.tar.gz
sudo mv hadoop-3.3.4 /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop
nano ~/.bashrc

Append the following at the end of ~/.bashrc:

# Hadoop environment variables
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

source ~/.bashrc
echo $JAVA_HOME
echo $HADOOP_HOME
echo $HADOOP_OPTS
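
If PATH was updated correctly, the shell should now resolve the Hadoop binaries from /usr/local/hadoop; a quick sanity check:
which hadoop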

nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
hadoop version

sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://IP:9000</value>
  </property>
</configuration>
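
Here IP is a placeholder for your server's address; on a strictly single-node setup, localhost works as well, e.g.:
<value>hdfs://localhost:9000</value>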

sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}
sudo chown -R hadoop:hadoop /home/hadoop/hdfs
sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
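
A side note: dfs.name.dir and dfs.data.dir are legacy property names; Hadoop 3.x still honors them but logs deprecation warnings. If you prefer the current names, the equivalents are:
<name>dfs.namenode.name.dir</name>
<name>dfs.datanode.data.dir</name>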

hdfs namenode -format
start-dfs.sh

IF YOU HAVE AN ERROR (start-dfs.sh often fails because of pdsh):
sudo apt-get remove pdsh
start-dfs.sh
IF IT DOESN'T HELP, regenerate the SSH key pair and re-add the public key:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
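
An alternative to removing pdsh is telling it to use ssh instead of rsh (a common workaround, assuming pdsh stays installed):
export PDSH_RCMD_TYPE=ssh

Once start-dfs.sh succeeds, jps should list NameNode, DataNode, and SecondaryNameNode, and the NameNode web UI becomes reachable on port 9870 (the Hadoop 3.x default), e.g. http://SERVER_IP:9870 with SERVER_IP standing in for your server's address:
jps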

YARN MANAGER
sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>

sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

start-yarn.sh
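
To verify YARN, jps should now also show ResourceManager and NodeManager, and the ResourceManager web UI listens on port 8088 by default. As an end-to-end test, the examples jar bundled with the release (path assumes the 3.3.4 tarball installed above) can run a small MapReduce job that estimates pi:
jps
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 4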