Guide to Building a Fully Distributed Hadoop Cluster on CentOS 7
- Hands-on Notes
- 2025-03-24
I. Environment Preparation
1. System Requirements
- OS: CentOS 7 (64-bit)
- Recommended specs per VM: 2-core CPU / 4 GB RAM / 50 GB disk
2. Software
- VMware Workstation 16 Pro
- MobaXterm v23.2
- JDK 8u202 (jdk-8u202-linux-x64.tar.gz)
- hadoop-2.7.6
3. Cluster Plan
- hadoop101: 192.168.220.30 (NameNode + ResourceManager)
- hadoop102: 192.168.220.31 (DataNode + NodeManager)
- hadoop103: 192.168.220.32 (DataNode + NodeManager + SecondaryNameNode)
Note: adjust the static IPs to match your own virtual network; they do not have to be exactly the same as here. Giving the 3 VMs consecutive addresses is recommended for convenience.
4. Basic Configuration
# Execute on all nodes
# Set the hostnames of the three VMs to hadoop101, hadoop102, and hadoop103; hadoop101 is the master and the other two are slaves.
[root@hadoop101 ~]# hostnamectl set-hostname hadoop101 # set hadoop102/hadoop103 on the slave nodes
# Stop the firewall and disable it at boot
[root@hadoop101 ~]# systemctl stop firewalld && systemctl disable firewalld
[root@hadoop101 ~]# cd /etc/selinux/
[root@hadoop101 selinux]# vi config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
# Change SELINUX=enforcing to SELINUX=disabled
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
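The same edit can also be made non-interactively; a small sketch using sed (assuming the stock CentOS 7 config shown above), plus setenforce so no immediate reboot is needed:
# Permanently disable SELinux (takes effect at next boot)
[root@hadoop101 ~]# sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# Put SELinux into permissive mode for the current session
[root@hadoop101 ~]# setenforce 0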
5. Hosts File Configuration
# Execute on all nodes: append the three cluster entries below to /etc/hosts
[root@hadoop101 ~]# vi /etc/hosts
[root@hadoop101 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.220.30 hadoop101
192.168.220.31 hadoop102
192.168.220.32 hadoop103
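A quick way to confirm name resolution works (a small check not in the original notes; run it on every node):
[root@hadoop101 ~]# for h in hadoop101 hadoop102 hadoop103; do ping -c 1 $h; done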
II. Java Environment Installation
1. Upload the JDK package to /opt/software and extract it to /opt/apps
# Under /opt, create the software and apps folders, which hold the unextracted packages and the extracted files respectively
[root@hadoop101 ~]# mkdir -p /opt/software /opt/apps
# Use MobaXterm to upload the JDK package to /opt/software
# Extract the JDK
[root@hadoop101 ~]# cd /opt/software/
[root@hadoop101 software]# tar zxvf jdk-8u202-linux-x64.tar.gz -C /opt/apps/
2. Configure the Java environment variables in /etc/profile
[root@hadoop101 software]# vi /etc/profile
# Append the following at the end of the file
# JAVA_HOME
export JAVA_HOME=/opt/apps/jdk1.8.0_202 # adjust to wherever you extracted the JDK
export PATH=$PATH:$JAVA_HOME/bin
3. Apply the Java environment variables immediately
[root@hadoop101 software]# source /etc/profile
[root@hadoop101 software]# java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
III. Hadoop Installation and Configuration
1. Upload hadoop-2.7.6.tar.gz to /opt/software and extract it to /opt/apps
[root@hadoop101 software]# tar zxvf hadoop-2.7.6.tar.gz -C /opt/apps/
2. Configure the Hadoop environment variables in /etc/profile
[root@hadoop101 hadoop-2.7.6]# vi /etc/profile
# Append the following at the end of the file
export HADOOP_HOME=/opt/apps/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@hadoop101 software]# source /etc/profile # apply immediately
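To confirm the variables took effect, you can ask Hadoop for its version (output abbreviated; the build details will differ):
[root@hadoop101 software]# hadoop version
Hadoop 2.7.6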
3. Edit the configuration files (※※ the core part ※※)
- hadoop-env.sh, yarn-env.sh, mapred-env.sh
# The configuration files live in $HADOOP_HOME/etc/hadoop; change there first
[root@hadoop101 software]# cd /opt/apps/hadoop-2.7.6/etc/hadoop
# Point JAVA_HOME at the path configured above
[root@hadoop101 hadoop]# vi hadoop-env.sh
export JAVA_HOME=/opt/apps/jdk1.8.0_202
[root@hadoop101 hadoop]# vi yarn-env.sh
export JAVA_HOME=/opt/apps/jdk1.8.0_202 # remove the leading "#" and make sure export starts at column 1 with no leading spaces
[root@hadoop101 hadoop]# vi mapred-env.sh # same: remove the "#" and put export at the start of the line
export JAVA_HOME=/opt/apps/jdk1.8.0_202
- core-site.xml
[root@hadoop101 hadoop]# vi core-site.xml
# Add the following between the <configuration></configuration> tags
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop101:9000</value>
</property>
<!-- Directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/apps/hadoop-2.7.6/data/tmp</value>
</property>
- hdfs-site.xml
[root@hadoop101 hadoop]# vi hdfs-site.xml
# Add the following between the <configuration></configuration> tags
<!-- Replication factor; note this cluster has only two DataNodes, so a value of 2 avoids permanently under-replicated blocks -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Host for the secondary NameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop103:50090</value>
</property>
- mapred-site.xml
[root@hadoop101 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop101 hadoop]# vi mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
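- yarn-site.xml (not covered in the original notes, but normally required on a multi-node cluster: without yarn.resourcemanager.hostname the NodeManagers on hadoop102/103 cannot locate the ResourceManager, and MapReduce needs the shuffle auxiliary service; a minimal sketch consistent with the plan above)
[root@hadoop101 hadoop]# vi yarn-site.xml
# Add the following between the <configuration></configuration> tags
<!-- Auxiliary shuffle service required by MapReduce -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the ResourceManager so all nodes can find it -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop101</value>
</property>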
- the slaves file
[root@hadoop101 hadoop]# vi slaves
# Delete the original localhost and add the hadoop102 and hadoop103 nodes
hadoop102
hadoop103
4. Shut down the hadoop101 VM and use the full-clone feature to create hadoop102 and hadoop103
- Change the hostname and IP address on the hadoop102 and hadoop103 hosts
# Change the hostname and IP address on hadoop102
[root@hadoop101 ~]# hostnamectl set-hostname hadoop102
[root@hadoop101 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens32
# Only IPADDR needs to change, to hadoop102's address from the cluster plan
IPADDR=192.168.220.31
# Restart the network service
[root@hadoop102 ~]# systemctl restart network
# Change the hostname and IP address on hadoop103
[root@hadoop101 ~]# hostnamectl set-hostname hadoop103
[root@hadoop101 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens32
# Only IPADDR needs to change, to hadoop103's address from the cluster plan
IPADDR=192.168.220.32
# Restart the network service
[root@hadoop103 ~]# systemctl restart network
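After both clones are reconfigured, it is worth confirming that the new hostname and address took effect before continuing (a small sanity check; ens32 is the interface name used above):
[root@hadoop102 ~]# hostname # should print hadoop102 (hadoop103 on the other clone)
[root@hadoop102 ~]# ip addr show ens32 # should list 192.168.220.31 (.32 on hadoop103)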
IV. Passwordless SSH Configuration
1. Generate a key pair (all nodes)
[root@hadoop101 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): # just press Enter here
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): # just press Enter here
Enter same passphrase again: # just press Enter here
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:N7zesnSkQxBrTWmEqYf13DJGXVXjDYcOx75qB5JIAi8 root@hadoop101
The key's randomart image is:
+---[RSA 2048]----+
| .+oo oo+=|
| . +=+ o=oo|
| o ++=..=..|
| E=.oo* . o |
| . +So=+. . |
| .oo=. . |
|=..o |
| o.+o . |
| o+.. |
+----[SHA256]-----+
2. Distribute the public keys (all nodes)
# Execute on hadoop101
[root@hadoop101 ~]# ssh-copy-id hadoop101
[root@hadoop101 ~]# ssh-copy-id hadoop102
[root@hadoop101 ~]# ssh-copy-id hadoop103
# Execute on hadoop102
[root@hadoop102 ~]# ssh-copy-id hadoop101
[root@hadoop102 ~]# ssh-copy-id hadoop102
[root@hadoop102 ~]# ssh-copy-id hadoop103
# Execute on hadoop103
[root@hadoop103 ~]# ssh-copy-id hadoop101
[root@hadoop103 ~]# ssh-copy-id hadoop102
[root@hadoop103 ~]# ssh-copy-id hadoop103
3. Verify passwordless login
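The original leaves this step empty; a minimal verification is to SSH from each node to every node and confirm no password prompt appears, for example:
[root@hadoop101 ~]# for h in hadoop101 hadoop102 hadoop103; do ssh $h hostname; done
hadoop101
hadoop102
hadoop103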
V. Cluster Initialization and Startup
1. Format the NameNode on hadoop101
[root@hadoop101 hadoop-2.7.6]# hdfs namenode -format
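Format only once. If a re-format ever becomes necessary, stop the cluster and remove the data and log directories on every node first; otherwise the DataNodes keep the old clusterID and refuse to register. A sketch based on the hadoop.tmp.dir configured above:
# On hadoop101: stop everything
[root@hadoop101 hadoop-2.7.6]# stop-dfs.sh && stop-yarn.sh
# On every node: wipe runtime data and logs, then re-run the format command
[root@hadoop101 hadoop-2.7.6]# rm -rf /opt/apps/hadoop-2.7.6/data /opt/apps/hadoop-2.7.6/logs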
2. Start HDFS on hadoop101
[root@hadoop101 hadoop-2.7.6]# start-dfs.sh
Starting namenodes on [hadoop101]
hadoop101: starting namenode, logging to /opt/apps/hadoop-2.7.6/logs/hadoop-root-namenode-hadoop101.out
hadoop102: starting datanode, logging to /opt/apps/hadoop-2.7.6/logs/hadoop-root-datanode-hadoop102.out
hadoop103: starting datanode, logging to /opt/apps/hadoop-2.7.6/logs/hadoop-root-datanode-hadoop103.out
Starting secondary namenodes [hadoop103]
hadoop103: starting secondarynamenode, logging to /opt/apps/hadoop-2.7.6/logs/hadoop-root-secondarynamenode-hadoop103.out
3. Start YARN on hadoop101
[root@hadoop101 hadoop-2.7.6]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/apps/hadoop-2.7.6/logs/yarn-root-resourcemanager-hadoop101.out
hadoop102: starting nodemanager, logging to /opt/apps/hadoop-2.7.6/logs/yarn-root-nodemanager-hadoop102.out
hadoop103: starting nodemanager, logging to /opt/apps/hadoop-2.7.6/logs/yarn-root-nodemanager-hadoop103.out
4. Verify the daemons (all nodes)
[root@hadoop101 hadoop-2.7.6]# jps # the master (hadoop101) should show the NameNode and ResourceManager processes
[root@hadoop101 hadoop-2.7.6]# hdfs dfsadmin -report # check DataNode status
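Given the cluster plan above, jps should report roughly the following on each node (process IDs omitted):
# hadoop101
NameNode
ResourceManager
# hadoop102
DataNode
NodeManager
# hadoop103
DataNode
NodeManager
SecondaryNameNode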
5. Hadoop Web UI
NameNode monitoring page: http://192.168.220.30:50070
The YARN ResourceManager UI is on port 8088 by default: http://192.168.220.30:8088
VI. Cluster Function Verification
1. HDFS file operation test
# Create a local sample file first, then copy it into HDFS
[root@hadoop101 hadoop-2.7.6]# echo "hello hadoop" > localfile.txt
[root@hadoop101 hadoop-2.7.6]# hdfs dfs -mkdir /test
[root@hadoop101 hadoop-2.7.6]# hdfs dfs -put localfile.txt /test/
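To confirm the upload landed, list the directory and read the file back:
[root@hadoop101 hadoop-2.7.6]# hdfs dfs -ls /test
[root@hadoop101 hadoop-2.7.6]# hdfs dfs -cat /test/localfile.txt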
2. YARN job test
[root@hadoop101 hadoop-2.7.6]# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar pi 10 100
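If the NodeManagers can reach the ResourceManager, the example job runs to completion and prints a pi estimate on its final line, something like:
Estimated value of Pi is 3.14800000000000000000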