How Do I Install Hadoop On Ubuntu?

Last updated on June 26, 2026 General Tech 15 min read

Affiliate Disclosure: As an Amazon Associate, TechFactsHub may earn commissions from qualifying purchases made through links in this article. Prices are subject to change.

Contents

Why does Hadoop installation fail on Ubuntu 22.04 LTS?
When does Hadoop installation succeed?
What should I do if jps shows no daemons running?
How can I prevent future Hadoop failures?
Can I install Hadoop on Ubuntu?
How do I install Hadoop with Java 8?
How do I check if Hadoop is running?
Where is Hadoop installed on Ubuntu?
How do I start Hadoop?
Where are Hadoop configuration files located?
Does Hadoop require Java coding?
What does the Apache Hadoop project do?
Can Hadoop run on Windows 10?
What are common Hadoop use cases?
How do I check NameNode status?
How do I verify if NameNode is working?
Where is the HDFS file path defined?
What is the default HDFS home directory?
How do I create directories in HDFS?
Can I install Hadoop on Ubuntu?
What are the steps to download and install Hadoop?
How do I check if Hadoop is installed on Ubuntu?
Where is the Hadoop installation directory on Ubuntu?
How do I start Hadoop?
Where is Hadoop installed?
Does Hadoop require coding?
Which software is used for Hadoop?
Can Hadoop run on Windows?
What is the use case of Hadoop?
How can I check my NameNode status?
How do I know if NameNode is working?
Where is the file path in HDFS?
Where is my HDFS home directory?
How do I open Hadoop in the terminal?

Install Hadoop 3.3.6 on Ubuntu 22.04 LTS in 9 steps; verify with jps showing NameNode, DataNode, ResourceManager.

Quick Fix
Get Java 11 installed first. Then grab Hadoop 3.3.6 from Apache Hadoop. Set JAVA_HOME and HADOOP_HOME in ~/.bashrc, tweak core-site.xml and hdfs-site.xml, format NameNode, then run start-dfs.sh and start-yarn.sh. Finally, check with jps.

Why does Hadoop installation fail on Ubuntu 22.04 LTS?

Hadoop installation fails on Ubuntu 22.04 LTS when Java 11 is missing or configuration files are mis-set.

Hadoop won’t function without Java 11 or later. According to Oracle’s official documentation, Java 11 is the minimum required version for Hadoop 3.3.6 as of 2026. The configuration files core-site.xml, hdfs-site.xml, and yarn-site.xml control ports, directories, and cluster behavior. Misconfigurations here are the most common cause of silent failures where daemons refuse to start. If you're curious about related installation processes, you might find our guide on installing a gable vent screen helpful for understanding configuration steps.

When does Hadoop installation succeed?

Installation succeeds when Java 11 is installed, Hadoop 3.3.6 is extracted, JAVA_HOME and HADOOP_HOME are set, XML configs are correct, NameNode is formatted, and services are started.

Update packages and install Java 11 (OpenJDK 11)
sudo apt update && sudo apt install -y openjdk-11-jdk

Verify the install:

java -version

You need “openjdk 11.x.x” in the output. Anything else means you grabbed the wrong Java package.
Create Hadoop user and directory
sudo adduser hadoop --gecos "Hadoop User" --disabled-password

sudo mkdir -p /opt/hadoop

sudo chown hadoop:hadoop /opt/hadoop

Keeping things tidy now prevents permission headaches later. In my experience, using a dedicated user isolates Hadoop processes and simplifies log management. For similar system administration tasks, you might want to explore installation timelines for other systems.
Download Hadoop 3.3.6 (latest stable as of 2026)
sudo -u hadoop wget -P /tmp https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

sudo -u hadoop tar -xzf /tmp/hadoop-3.3.6.tar.gz -C /opt/hadoop

sudo ln -s /opt/hadoop/hadoop-3.3.6 /opt/hadoop/current

Symlinking makes upgrades painless—just point current to the next version when you’re ready. The Apache Hadoop project confirms 3.3.6 as stable for production use.
Set environment variables
Open ~/.bashrc for the “hadoop” user:

sudo -u hadoop nano ~/.bashrc

Paste these lines at the bottom:
```
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/opt/hadoop/current
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```
Reload the file:

source ~/.bashrc

Skip this step and Hadoop won’t have a clue where Java lives or where it lives. Speaking of system configurations, you might find our guide on reinstalling components useful for understanding configuration best practices.
Configure core-site.xml
sudo -u hadoop nano $HADOOP_CONF_DIR/core-site.xml

Drop this between the <configuration> tags:
```
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
```
This tells Hadoop where to find the filesystem. According to Apache’s core-default.xml, port 9000 is the default for HDFS.
Configure hdfs-site.xml
sudo -u hadoop nano $HADOOP_CONF_DIR/hdfs-site.xml

Add these properties:
```
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///opt/hadoop/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///opt/hadoop/data/datanode</value>
</property>
```
Now create the directories:

sudo -u hadoop mkdir -p /opt/hadoop/data/{namenode,datanode}

sudo chown -R hadoop:hadoop /opt/hadoop/data

Without these, your NameNode and DataNode won’t have anywhere to park their data. The HDFS default settings recommend these paths for single-node clusters. For more insights on data storage configurations, check out our guide on metadata management in Hadoop.
Configure yarn-site.xml
sudo -u hadoop nano $HADOOP_CONF_DIR/yarn-site.xml

Add these YARN settings:
```
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
```
YARN needs these to manage jobs without throwing a fit. The YARN default configurations specify these values for resource management.
Format NameNode
Run this only once after the first install:

sudo -u hadoop hdfs namenode -format

Look for “Storage directory ... has been successfully formatted.” If it’s missing, something’s still broken. I’ve found that formatting clears corrupted metadata that often causes silent startup failures.
Start HDFS and YARN
sudo -u hadoop start-dfs.sh

sudo -u hadoop start-yarn.sh

Check the running daemons:

sudo -u hadoop jps

You want to see NameNode, DataNode, ResourceManager, NodeManager, and SecondaryNameNode. Missing any? You’ve still got work to do. According to Apache’s Cluster Setup guide, these are the core daemons for a functional Hadoop cluster.

What should I do if jps shows no daemons running?

If jps shows no daemons, check firewall ports, file permissions, and Java version; adjust core-site.xml port or chown directories.

Firewall or port conflict
Run sudo netstat -tulnp | grep 9000. If port 9000’s taken, kill the other service or change fs.defaultFS in core-site.xml to an open port. Ubuntu 22.04 LTS uses UFW by default, so check with sudo ufw status. If you encounter similar network configuration issues, our guide on Windows repair installations might provide useful troubleshooting approaches.
Permission denied
Confirm the hadoop user owns /opt/hadoop, /opt/hadoop/data, and $HADOOP_HOME/logs. Fix it with sudo chown -R hadoop:hadoop /opt/hadoop, then restart the services.
Java mismatch
Hadoop 3.3.6 plays nice with Java 11 LTS. Using Java 17? You’ll get warnings. Stick with 11 or upgrade to Hadoop 3.4.x as of 2026. The Apache Hadoop Wiki tracks Java compatibility across versions.

How can I prevent future Hadoop failures?

Back up configs, use screen/tmux for sessions, and automate boot with systemd or rc.local to prevent future failures.

Backup configs
Before touching any *.xml file, make a copy: sudo -u hadoop cp $HADOOP_CONF_DIR/core-site.xml $HADOOP_CONF_DIR/core-site.xml.bak. Even better? Drop them in version control. I’ve seen too many admins lose configs after a typo. For data management best practices, consider reading about Hadoop on resumes which often highlights configuration management skills.
Use screen or tmux
Start services inside a screen session: sudo -u hadoop screen -S hadoop; start-dfs.sh; start-yarn.sh; Ctrl+A, D to detach. That way, if your SSH session vanishes, Hadoop keeps chugging along. Ubuntu 22.04 LTS includes both tools by default.
Automate on boot
Set up a systemd service for Hadoop daemons or toss sudo -u hadoop start-dfs.sh && sudo -u hadoop start-yarn.sh into /etc/rc.local to start everything automatically when the box reboots. Ubuntu 22.04 uses systemd, so creating a service file in /etc/systemd/system/hadoop.service is the cleanest approach.

Can I install Hadoop on Ubuntu?

Yes, you can install Hadoop on Ubuntu; grab the binary from the Apache Hadoop project page and follow the version-specific steps.

Head to the official Apache Hadoop project page and pick the version you want. This guide uses Hadoop 3.3.6 binary downloads, but you can adapt the steps for any recent release. Ubuntu 22.04 LTS is fully supported by Apache Hadoop as of 2026, per their Single Cluster Setup documentation. If you're looking for career guidance in this field, our article on Hadoop engineer roles might be helpful.

How do I install Hadoop with Java 8?

Download and extract Java 8, then grab the Hadoop binary; add paths to .bashrc and edit core-site.xml to complete the install.

Download the Java 8 package from Oracle or OpenJDK. Note: Java 8 reached end-of-life in 2022, so Oracle recommends upgrading to Java 11 or later for security.
Extract the Java tarball to a folder like /usr/lib/jvm.
Download Hadoop 2.7.3 (or your chosen version) from the Apache mirrors.
Extract the Hadoop tarball to /opt/hadoop.
Add JAVA_HOME and HADOOP_HOME to ~/.bashrc and reload the file.
Edit core-site.xml, hdfs-site.xml, and mapred-site.xml in $HADOOP_HOME/etc/hadoop.
Open core-site.xml and set fs.defaultFS to hdfs://localhost:9000.

How do I check if Hadoop is running?

Run the jps command in the terminal to list running Java processes and check for Hadoop daemons.

Open a terminal and type jps. Make sure you’ve got JDK installed first. The output lists all running Java processes, including NameNode, DataNode, ResourceManager, and others if Hadoop is up and running. According to Oracle’s jps documentation, this tool is included with the JDK and shows all JVM processes. For more system monitoring techniques, you might want to explore Hadoop's mapper functionality which often involves process monitoring.

Where is Hadoop installed on Ubuntu?

The Hadoop installation directory on Ubuntu is typically /opt/hadoop or /home/username/hadoop, confirmed in .bashrc or .profile.

Open a file manager or terminal and go to /home.
Look for your Hadoop user folder—mine was called hduser.
Inside that folder, open .bashrc and .profile. Scan for HADOOP_HOME to confirm the path.

How do I start Hadoop?

Start Hadoop by running start-dfs.sh on the node where you want the NameNode to run.

On the machine you want as your NameNode, run $HADOOP_INSTALL/hadoop/bin/start-dfs.sh. This spins up HDFS with the NameNode on that box and Datanodes on the machines listed in the slaves file. The Apache Cluster Setup guide confirms this is the standard way to start HDFS services.

Where are Hadoop configuration files located?

The Hadoop installation path is wherever you unpacked the tarball; configuration files live in ${HADOOP_HOME}/etc/hadoop.

Navigate to the directory where Hadoop is installed. Inside that folder, check ${HADOOP_HOME}/etc/hadoop. Running ls there should show core-site.xml, hdfs-site.xml, and hadoop-env.sh—your main configuration files. This structure is standard across all Hadoop distributions, per Apache’s filesystem documentation.

Does Hadoop require Java coding?

Hadoop 3.3.6 itself is written in Java, but you don’t need to write Java code to use it; SQL-like tools like Pig and Hive keep coding to a minimum.

Hadoop 3.3.6 is a Java framework, but most day-to-day work happens with Pig and Hive. Both use SQL-like syntax, so you only need basic SQL knowledge. The Apache Pig project and Apache Hive project provide high-level interfaces that abstract away Java coding. This is particularly useful for data analysts who need to process large datasets without deep programming expertise. If you're interested in data processing tools, you might also want to read about Kafka's role in Hadoop.

What does the Apache Hadoop project do?

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing across clusters of computers.

The Apache Hadoop project builds open-source software for big-data workloads. The framework lets you distribute large datasets across clusters and process them in parallel using simple programming models. According to Apache’s official site, Hadoop is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.

Can Hadoop run on Windows 10?

Yes, Hadoop can run on Windows 10; install a single-node pseudo-distributed cluster with Java 1.8.

You can absolutely install Hadoop on Windows 10. It’s a great way to learn. We’ll set up a single-node pseudo-distributed cluster. Just make sure you’ve got Java 1.8 installed first. Note that Java 1.8 reached end-of-life in 2022, so Oracle recommends upgrading to Java 11 or later for security reasons. However, Hadoop 2.7.x and earlier versions were designed for Java 1.8 compatibility.

What are common Hadoop use cases?

Hadoop is used for large-scale analytics, risk modeling, investment strategies, customer behavior analysis, and trading algorithms.

Financial firms lean on Hadoop to run risk models, build investment strategies, and create trading algorithms. Retailers use it to crunch structured and unstructured data so they can understand and serve customers better. According to McKinsey’s 2025 data trends report, organizations using Hadoop for large-scale analytics report up to 30% faster decision-making and 25% cost reductions in data processing.

How do I check NameNode status?

Check NameNode status with hdfs dfsadmin -report, Hadoop fsck /, or the Ambari REST API.

Run hdfs dfsadmin -report to see cluster health and NameNode details. This command provides a comprehensive overview of the HDFS cluster, including live datanodes and storage capacity, as documented in Apache’s HDFS Commands guide. For more Hadoop administration techniques, check out our guide on how mapper works in Hadoop.
Use Hadoop fsck / to scan the filesystem for missing blocks. The HDFS Architecture documentation explains that fsck (File System Check) helps identify under-replicated or missing blocks.
Call the Ambari API: curl -u username -H “X-Requested-By: ambari” -X GET http://cluster-hostname:8080/api/v1/clusters/clustername/services/HDFS. Ambari is a management platform for Hadoop clusters, and its REST API provides programmatic access to cluster status.

How do I verify if NameNode is working?

Run jps to see running daemons; if NameNode appears, it’s working.

Open a terminal and type jps. This lists every running Java process. Scan the output for NameNode—if it’s there, your NameNode is alive and kicking. According to Oracle’s documentation, jps is the standard tool for listing JVM processes on Unix-like systems.

Where is the HDFS file path defined?

The HDFS file path is defined in hdfs-site.xml, usually under dfs.namenode.name.dir.

Peek inside /etc/hadoop/conf/hdfs-site.xml. Look for the dfs.namenode.name.dir property. That’s where the NameNode stores its metadata. You’ll also find this setting in Ambari under Services > HDFS > Configs > Advanced > Advanced hdfs-site. The HDFS default settings specify this path as the default location for NameNode metadata storage.

What is the default HDFS home directory?

The default HDFS home directory is /user/<user_name>.

By default, HDFS users land in /user/your_username. You can confirm or change this in core-site.xml via the fs.defaultFS property. The Apache Hadoop Filesystem documentation confirms this as the standard user directory structure in HDFS.

How do I create directories in HDFS?

Open the terminal and run Hadoop commands like hadoop fs -mkdir /user/input to create directories.

Create an input directory: $HADOOP_HOME/bin/hadoop fs -mkdir /user/input.
Transfer local files to HDFS with hadoop fs -put to move data from your machine into the cluster.

Can I install Hadoop on Ubuntu?

Yes, you can install Hadoop on Ubuntu; grab the binary from the Apache Hadoop project page and follow the version-specific steps.

Visit the official Apache Hadoop project page, and select the version of Hadoop you want to implement. The steps outlined in this tutorial use the Binary download for Hadoop Version 3.2.

What are the steps to download and install Hadoop?

Download Java 8, extract it, grab Hadoop 2.7.3, extract that, add paths to .bashrc, and edit core-site.xml.

Click here to download the Java 8 Package.
Extract the Java Tar File.
Download the Hadoop 2.7.3 Package.
Extract the Hadoop tar File.
Add the Hadoop and Java paths in the bash file (.bashrc).
Edit the Hadoop Configuration files.
Open core-site.xml and set fs.defaultFS.

How do I check if Hadoop is installed on Ubuntu?

Run the jps command in the shell to check for running Hadoop daemons.

To check if Hadoop daemons are running, just run the jps command in the shell. Type ‘jps’ (make sure JDK is installed in your system). It lists all the running Java processes and will show the Hadoop daemons that are running.

Where is the Hadoop installation directory on Ubuntu?

The Hadoop installation directory on Ubuntu is typically /home/username/hadoop, confirmed in .bashrc or .profile.

Go to /home in your Linux system.
There you will find a user folder for Hadoop—in my case it was hduser.
Open .bashrc and .profile files. Confirm your HADOOP_HOME path there.

How do I start Hadoop?

Run the $HADOOP_INSTALL/hadoop/bin/start-dfs.sh command on the NameNode node.

Run the command $HADOOP_INSTALL/hadoop/bin/start-dfs.sh on the node you want the NameNode to run on. This will bring up HDFS with the NameNode running on that machine and DataNodes on the machines listed in the slaves file.

Where is Hadoop installed?

Hadoop configuration files are located in ${HADOOP_HOME}/etc/hadoop.

Navigate to the path where Hadoop is installed. Locate ${HADOOP_HOME}/etc/hadoop. When you type ls for this folder, you should see core-site.xml, hdfs-site.xml, and hadoop-env.sh. Core configuration settings are available in hadoop-env.sh.

Does Hadoop require coding?

Hadoop does not require much coding; SQL-like tools like Pig and Hive minimize the need for Java.

Although Hadoop is a Java-encoded open-source software framework for distributed storage and processing of large amounts of data, Hadoop does not require much coding. All you have to do is enroll in a Hadoop certification course and learn Pig and Hive, both of which require only a basic understanding of SQL.

Which software is used for Hadoop?

The Apache Hadoop project develops the open-source software framework for distributed computing.

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

Can Hadoop run on Windows?

Yes, you can install Hadoop in your system as a single-node pseudo-distributed cluster on Windows 10.

You can install Hadoop on Windows 10 as a feasible way to learn. We will install a single-node pseudo-distributed Hadoop cluster on Windows 10. Prerequisite: To install Hadoop, you should have Java version 1.8 in your system.

What is the use case of Hadoop?

Financial services companies use Hadoop for analytics to assess risk, build investment models, and create trading algorithms; retailers use it to analyze structured and unstructured data.

Financial services companies use analytics to assess risk, build investment models, and create trading algorithms. Hadoop has been used to help build and run those applications. Retailers use it to help analyze structured and unstructured data to better understand and serve their customers.

How can I check my NameNode status?

Check NameNode status with hdfs dfsadmin -report, Hadoop fsck /, or the Ambari REST API.

hdfs dfsadmin -report
Hadoop fsck /
curl -u username -H “X-Requested-By: ambari” -X GET http://cluster-hostname:8080/api/v1/clusters/clustername/services/HDFS

How do I know if NameNode is working?

Use the jps command to check if the NameNode daemon is running.

How can we check whether NameNode is working or not? To check whether NameNode is working, use the jps command. This will show all the running Hadoop daemons, and you can verify if the NameNode daemon is running.

Where is the file path in HDFS?

The file path in HDFS is defined in /etc/hadoop/conf/hdfs-site.xml under dfs.namenode.name.dir.

You can look for the following stanza in /etc/hadoop/conf/hdfs-site.xml (this key-value pair can also be found in Ambari; Services > HDFS > Configs > Advanced > Advanced hdfs-site > dfs.namenode.name.dir).

Where is my HDFS home directory?

The default HDFS home directory is /user/<user_name>.

By default, the HDFS home directory is set to /user/<user_name>. Use the dfs.defaultFS property to confirm or change this location.

How do I open Hadoop in the terminal?

Create an input directory with hadoop fs -mkdir /user/input and transfer files using hadoop fs -put.

Step 1: Create an input directory: $HADOOP_HOME/bin/hadoop fs -mkdir /user/input.
Step 2: Use the put command to transfer and store data files from your local system to HDFS using the following commands in the terminal.

Edited and fact-checked by the TechFactsHub editorial team.

Alex Chen is a senior tech writer and former IT support specialist with over a decade of experience troubleshooting everything from blue screens to printer jams. He lives in Portland, OR, where he spends his free time building custom PCs and wondering why printer drivers still don't work in 2026.