How to Install Apache Spark on Ubuntu 22.04

Published: 05 October 2024
Channel: MivoCloud

Apache Spark is a free, open-source, general-purpose data processing engine that data scientists use to run fast queries over large amounts of data. It caches working data directly in the main memory of the cluster nodes. It offers high-level APIs in Java, Scala, Python, and R, and supports a rich set of higher-level tools such as Spark SQL, MLlib, GraphX, and Spark Streaming.

In this video, I will show you how to install it.

Useful Links:
VPS/VDS - https://www.mivocloud.com/

WARNING - ANGLE BRACKETS AREN'T ALLOWED IN THE DESCRIPTION, SO WATCH THE VIDEO CLOSELY WHEN FILES ARE EDITED IN THE NANO EDITOR

Commands Used:
sudo apt update
sudo apt upgrade -y
apt-get install default-jdk curl -y
java -version
wget https://dlcdn.apache.org/spark/spark-...
tar xvf spark-3.5.1-bin-hadoop3.tgz
mv spark-3.5.1-bin-hadoop3/ /opt/spark
nano ~/.bashrc

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
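
The two export lines above only take effect once ~/.bashrc has been reloaded. A minimal sketch for confirming that afterwards; path_contains and check_spark_env are hypothetical helper names, not Spark commands:

```shell
# Sketch: report whether the Spark environment from ~/.bashrc is active.
# path_contains and check_spark_env are hypothetical helpers, not part of Spark.

path_contains() {
    # $1 = directory to look for, $2 = PATH-style string (defaults to $PATH)
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

check_spark_env() {
    if [ -z "${SPARK_HOME:-}" ]; then
        echo "SPARK_HOME is not set"
        return 1
    fi
    # Both directories are appended to PATH by the export line above.
    for dir in "$SPARK_HOME/bin" "$SPARK_HOME/sbin"; do
        if ! path_contains "$dir"; then
            echo "$dir is missing from PATH"
            return 1
        fi
    done
    echo "Spark environment looks fine: SPARK_HOME=$SPARK_HOME"
}

# Usage, after reloading ~/.bashrc:
#   check_spark_env
```

If the check reports a missing entry, open a new terminal or reload ~/.bashrc and run it again.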

source ~/.bashrc
useradd spark
chown -R spark:spark /opt/spark
nano /etc/systemd/system/spark-master.service

[Unit]
Description=Apache Spark Master
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target

nano /etc/systemd/system/spark-slave.service

[Unit]
Description=Apache Spark Slave
After=network.target

[Service]
Type=forking
User=spark
Group=spark
ExecStart=/opt/spark/sbin/start-slave.sh spark://your-server-ip:7077
ExecStop=/opt/spark/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target
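
The ExecStart line above needs your-server-ip replaced with the real address of the master. A small sketch of how that URL could be assembled; spark_master_url is a hypothetical helper, and 7077 is simply the port used in the unit file above:

```shell
# Sketch: build the spark://host:port master URL used in ExecStart.
# spark_master_url is a hypothetical helper, not a Spark command.

spark_master_url() {
    host="${1:-}"
    port="${2:-7077}"   # port taken from the unit file above
    if [ -z "$host" ]; then
        echo "usage: spark_master_url HOST [PORT]" >&2
        return 1
    fi
    echo "spark://${host}:${port}"
}

# Usage: print the URL for the first address this machine reports
#   spark_master_url "$(hostname -I | awk '{print $1}')"
```

The printed value is what goes after start-slave.sh in the service file.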

systemctl daemon-reload
systemctl start spark-master
systemctl enable spark-master
systemctl status spark-master
systemctl start spark-slave
systemctl enable spark-slave
systemctl status spark-slave
spark-shell
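
If one of the services fails to start, one thing worth ruling out is a missing or non-executable script under /opt/spark. A minimal sketch of such a check; check_spark_layout is a hypothetical helper, and the script names are the ones used in the commands and service files above:

```shell
# Sketch: confirm the scripts referenced above exist and are executable.
# check_spark_layout is a hypothetical helper, not part of Spark.

check_spark_layout() {
    spark_dir="${1:-/opt/spark}"   # install directory used in this guide
    status=0
    for script in bin/spark-shell \
                  sbin/start-master.sh sbin/stop-master.sh \
                  sbin/start-slave.sh sbin/stop-slave.sh; do
        if [ -x "$spark_dir/$script" ]; then
            echo "ok       $script"
        else
            echo "missing  $script"
            status=1
        fi
    done
    # Non-zero exit status if any script was missing or not executable.
    return "$status"
}

# Usage:
#   check_spark_layout /opt/spark
```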