Building a Fully Distributed Hadoop Cluster on CentOS 7

I. Environment Preparation

1. System Requirements

  • 3 CentOS 7 virtual machines (1 master + 2 slaves)

  • Recommended specs: 2 CPU cores / 4 GB RAM / 50 GB disk

  • Network connectivity between all nodes (disable the firewall or open the required ports)

2. Software

  • CentOS-7-x86_64-Minimal-2009

  • VMware Workstation 16 Pro

  • MobaXterm v23.2

  • hadoop-2.7.6

  • jdk-8u202-linux-x64

3. Cluster Plan

  • hadoop101: 192.168.220.30 (NameNode + ResourceManager)

  • hadoop102: 192.168.220.31 (DataNode + NodeManager)

  • hadoop103: 192.168.220.32 (DataNode + NodeManager)

Note: adjust the static IP addresses to match your own virtual network; they do not have to be identical to the ones used here. Giving the three VMs consecutive addresses is recommended for convenience.
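
Each VM needs a static IP that matches the plan above. A minimal sketch of the interface file, assuming the interface is named ens32 (the name used in the clone step later) and a typical VMware NAT gateway; adjust the gateway and DNS values to your own virtual network:

# /etc/sysconfig/network-scripts/ifcfg-ens32 on hadoop101 (example values)
BOOTPROTO=static          # static address instead of DHCP
ONBOOT=yes                # bring the interface up at boot
IPADDR=192.168.220.30     # this node's address from the cluster plan
NETMASK=255.255.255.0
GATEWAY=192.168.220.2     # assumption: adjust to your VMware NAT gateway
DNS1=192.168.220.2        # assumption: adjust as needed

After editing, apply the change with systemctl restart network.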

4. Basic Configuration

# Run on all nodes
# Set the hostnames of the three VMs to hadoop101, hadoop102 and hadoop103; hadoop101 is the master, the other two are slaves.
[root@hadoop101 ~]# hostnamectl set-hostname hadoop101  # on the slave nodes use hadoop102 / hadoop103
# Check the firewall status
[root@hadoop101 ~]# systemctl status firewalld
# Stop the firewall and disable it at boot
[root@hadoop101 ~]# systemctl stop firewalld && systemctl disable firewalld

Set the hostname

Check the firewall status

Stop the firewall and disable it at boot

[root@hadoop101 ~]# cd /etc/selinux/
[root@hadoop101 selinux]# vi config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
# Change SELINUX=enforcing to SELINUX=disabled
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Change enforcing to disabled
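
The change in /etc/selinux/config only takes effect after a reboot. To switch SELinux off for the current session as well, it can be put into permissive mode immediately (an optional extra step, not shown in the original screenshots):

# Put SELinux into permissive mode for the running system
[root@hadoop101 selinux]# setenforce 0
# Confirm the current mode
[root@hadoop101 selinux]# getenforce
Permissive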

5. Hosts File Configuration

# Run on all nodes
# /etc/hosts
192.168.220.30 hadoop101
192.168.220.31 hadoop102
192.168.220.32 hadoop103
[root@hadoop101 ~]# vi /etc/hosts
[root@hadoop101 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.220.30  hadoop101
192.168.220.31  hadoop102
192.168.220.32  hadoop103

Map IP addresses to hostnames
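
To confirm that the mapping and the network both work, each node should be able to reach the others by hostname, for example:

# From hadoop101, ping the other two nodes by hostname
[root@hadoop101 ~]# ping -c 3 hadoop102
[root@hadoop101 ~]# ping -c 3 hadoop103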

II. Installing the Java Environment

1. Upload the JDK package to /opt/software and extract it to /opt/apps

# Create software and apps under /opt to hold the downloaded packages and the extracted installations respectively
[root@hadoop101 ~]# mkdir -p /opt/software /opt/apps
# Upload the JDK package to /opt/software using MobaXterm
# Extract the JDK
[root@hadoop101 ~]# cd /opt/software/
[root@hadoop101 software]# tar zxvf jdk-8u202-linux-x64.tar.gz -C /opt/apps/

2. Configure the Java environment variables in /etc/profile

[root@hadoop101 software]# vi /etc/profile
# Add the following at the end of the file
# JAVA_HOME
export JAVA_HOME=/opt/apps/jdk1.8.0_202	# adjust to match the path where the JDK was extracted
export PATH=$PATH:$JAVA_HOME/bin

Configure the Java environment variables

3. Apply the Java environment variables immediately

[root@hadoop101 software]# source /etc/profile
[root@hadoop101 software]# java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)

III. Hadoop Installation and Configuration

1. Upload hadoop-2.7.6.tar.gz to /opt/software and extract it to /opt/apps

[root@hadoop101 software]# tar zxvf hadoop-2.7.6.tar.gz -C /opt/apps/

2. Configure the Hadoop environment variables in /etc/profile

[root@hadoop101 hadoop-2.7.6]# vi /etc/profile
# Add the following at the end of the file
export HADOOP_HOME=/opt/apps/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@hadoop101 software]# source /etc/profile	# apply immediately

Configure the Hadoop environment variables
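
With HADOOP_HOME on the PATH, a quick sanity check is to print the Hadoop version; the first line of the output should read Hadoop 2.7.6, followed by build details:

[root@hadoop101 software]# hadoop version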

3. Edit the configuration files (the core part)

  • hadoop-env.sh, yarn-env.sh, mapred-env.sh
# The configuration files are under $HADOOP_HOME/etc/hadoop
[root@hadoop101 ~]# cd /opt/apps/hadoop-2.7.6/etc/hadoop
# Set JAVA_HOME to the path configured above
[root@hadoop101 hadoop]# vi hadoop-env.sh
export JAVA_HOME=/opt/apps/jdk1.8.0_202

[root@hadoop101 hadoop]# vi yarn-env.sh
export JAVA_HOME=/opt/apps/jdk1.8.0_202		# remove the leading "#" and make sure export starts at the beginning of the line with no spaces

[root@hadoop101 hadoop]# vi mapred-env.sh	# remove the leading "#" and make sure export starts at the beginning of the line with no spaces
export JAVA_HOME=/opt/apps/jdk1.8.0_202
  • core-site.xml
[root@hadoop101 hadoop]# vi core-site.xml
# Add the following between the <configuration></configuration> tags
<!-- Address of the HDFS NameNode -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop101:9000</value>
</property>
<!-- Directory where Hadoop stores files generated at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/apps/hadoop-2.7.6/data/tmp</value>
</property>

Configure the core-site.xml properties

  • hdfs-site.xml
[root@hadoop101 hadoop]# vi hdfs-site.xml
# Add the following between the <configuration></configuration> tags
<!-- Number of block replicas; note that this cluster has only two DataNodes, so a value of 2 is also sufficient -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Host and port of the Secondary NameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop103:50090</value>
</property>

Configure the hdfs-site.xml properties

  • mapred-site.xml
[root@hadoop101 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop101 hadoop]# vi mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Set the MapReduce execution framework

  • slaves file
[root@hadoop101 hadoop]# vi slaves
# Remove the existing localhost entry and add the hadoop102 and hadoop103 nodes
hadoop102
hadoop103
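
The original steps do not show a yarn-site.xml edit. For the NodeManagers on hadoop102/hadoop103 to find the ResourceManager on hadoop101, and for MapReduce jobs to actually run on YARN, a minimal yarn-site.xml along these lines is normally also required (a sketch, not part of the original screenshots):

[root@hadoop101 hadoop]# vi yarn-site.xml
# Add the following between the <configuration></configuration> tags
<!-- Host that runs the ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop101</value>
</property>
<!-- Auxiliary service needed by the MapReduce shuffle -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>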

4. Shut down the hadoop101 VM and use VMware's full-clone feature to create the hadoop102 and hadoop103 VMs, which together form the Hadoop cluster.

  • Change the hostname and IP address on hadoop102 and hadoop103
# On hadoop102, change the hostname and IP address (the prompt still shows hadoop101 until the clone is renamed)
[root@hadoop101 ~]# hostnamectl set-hostname hadoop102
[root@hadoop101 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens32
# Only the IPADDR entry needs to change, to the address planned for hadoop102
IPADDR=192.168.220.31
# Restart the network service
[root@hadoop102 ~]# systemctl restart network
# On hadoop103, change the hostname and IP address
[root@hadoop101 ~]# hostnamectl set-hostname hadoop103
[root@hadoop101 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens32
# Only the IPADDR entry needs to change, to the address planned for hadoop103
IPADDR=192.168.220.32
# Restart the network service
[root@hadoop103 ~]# systemctl restart network

Change the hostname and IP address on the cloned VMs
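
After a re-login (or reboot) on each clone, the new hostname and address can be confirmed; ens32 is the interface name assumed from the ifcfg file edited above:

# On hadoop102 (hadoop103 is checked the same way)
[root@hadoop102 ~]# hostname
hadoop102
[root@hadoop102 ~]# ip addr show ens32 | grep inet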

IV. Configuring Passwordless SSH Login

1. Generate a key pair (all nodes)

[root@hadoop101 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):	# just press Enter here
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):		# just press Enter here
Enter same passphrase again:	# just press Enter here
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:N7zesnSkQxBrTWmEqYf13DJGXVXjDYcOx75qB5JIAi8 root@hadoop101
The key's randomart image is:
+---[RSA 2048]----+
| .+oo oo+=|
| . +=+ o=oo|
| o ++=..=..|
| E=.oo* . o |
| . +So=+. . |
| .oo=. . |
|=..o |
| o.+o . |
| o+.. |
+----[SHA256]-----+

Generate the SSH key pair

2. Distribute the public keys (all nodes)

# Run on hadoop101
[root@hadoop101 ~]# ssh-copy-id hadoop101
[root@hadoop101 ~]# ssh-copy-id hadoop102
[root@hadoop101 ~]# ssh-copy-id hadoop103
# Run on hadoop102
[root@hadoop102 ~]# ssh-copy-id hadoop101
[root@hadoop102 ~]# ssh-copy-id hadoop102
[root@hadoop102 ~]# ssh-copy-id hadoop103
# Run on hadoop103
[root@hadoop103 ~]# ssh-copy-id hadoop101
[root@hadoop103 ~]# ssh-copy-id hadoop102
[root@hadoop103 ~]# ssh-copy-id hadoop103

Distribute the SSH public keys

3. Verify passwordless SSH login to the other hosts
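
A minimal check: from any node, an SSH login to another node should no longer prompt for a password, for example:

# From hadoop101, log in to hadoop102 without a password, then return
[root@hadoop101 ~]# ssh hadoop102
[root@hadoop102 ~]# exit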

V. Cluster Initialization and Startup

1. Format the NameNode on hadoop101

[root@hadoop101 hadoop-2.7.6]# hdfs namenode -format 

Format the NameNode
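
If the format succeeds, the console log should include a line reporting that the storage directory under /opt/apps/hadoop-2.7.6/data/tmp has been successfully formatted. The NameNode should only be formatted once; reformatting generates a new cluster ID, after which the DataNode data directories would have to be cleared.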

2. Start HDFS on hadoop101

[root@hadoop101 hadoop-2.7.6]# start-dfs.sh
Starting namenodes on [hadoop101]
hadoop101: starting namenode, logging to /opt/apps/hadoop-2.7.6/logs/hadoop-root-namenode-hadoop101.out
hadoop102: starting datanode, logging to /opt/apps/hadoop-2.7.6/logs/hadoop-root-datanode-hadoop102.out
hadoop103: starting datanode, logging to /opt/apps/hadoop-2.7.6/logs/hadoop-root-datanode-hadoop103.out
Starting secondary namenodes [hadoop103]
hadoop103: starting secondarynamenode, logging to /opt/apps/hadoop-2.7.6/logs/hadoop-root-secondarynamenode-hadoop103.out

Start HDFS

3. Start YARN on hadoop101

[root@hadoop101 hadoop-2.7.6]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/apps/hadoop-2.7.6/logs/yarn-root-resourcemanager-hadoop101.out
hadoop102: starting nodemanager, logging to /opt/apps/hadoop-2.7.6/logs/yarn-root-nodemanager-hadoop102.out
hadoop103: starting nodemanager, logging to /opt/apps/hadoop-2.7.6/logs/yarn-root-nodemanager-hadoop103.out

Start YARN

4. Verify the services (all nodes)

[root@hadoop101 hadoop-2.7.6]# jps
# The master (hadoop101) should show the NameNode and ResourceManager processes
[root@hadoop101 hadoop-2.7.6]# hdfs dfsadmin -report
# Check the DataNode status
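
Based on the cluster plan and the configuration above, the processes expected in the jps output of each node are roughly:

  • hadoop101: NameNode, ResourceManager, Jps
  • hadoop102: DataNode, NodeManager, Jps
  • hadoop103: DataNode, NodeManager, SecondaryNameNode, Jps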

5. Open the Hadoop web UI monitoring page in a browser: http://192.168.220.30:50070

Access the HDFS monitoring page
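
The YARN ResourceManager web UI runs on its default port 8088 and should be reachable at http://192.168.220.30:8088 once YARN is started, with hadoop102 and hadoop103 listed as active nodes.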

VI. Cluster Functional Verification

1. HDFS file operation test

[root@hadoop101 hadoop-2.7.6]# hdfs dfs -mkdir /test
[root@hadoop101 hadoop-2.7.6]# hdfs dfs -put localfile.txt /test/	# localfile.txt is any local file used for testing
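
Listing the directory afterwards confirms the upload:

[root@hadoop101 hadoop-2.7.6]# hdfs dfs -ls /test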

2. YARN job test

[root@hadoop101 hadoop-2.7.6]# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar pi 10 100
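
If the job completes successfully, the console output ends with an estimated value of Pi. When finished, the cluster can be stopped in the reverse order of startup:

[root@hadoop101 hadoop-2.7.6]# stop-yarn.sh
[root@hadoop101 hadoop-2.7.6]# stop-dfs.sh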