Hadoop 3.0.3 Cluster Deployment

#Preface

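  This was my first time deploying Hadoop, and it took a whole day. I searched everywhere for material and found that every guide configured things slightly differently, which left me confused. Luckily I'm not busy these days, so I could take my time. By evening the deployment seemed to work. How do I verify it? I'll check whether jobs run and produce results once Spark is set up.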


#Small talk

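  From last weekend until now (on workdays only in the evenings), I turned the basic hoovip.com into a movie site. It scrapes 5 movie sites every 3 hours, and I added automatic sharing of movies to Weibo. I'll keep optimizing it. After that I plan to build an API offering downloads for YouTube, Bilibili and Tumblr, because the backend shows these three account for the largest share of queries. And today, looking at Google's webmaster statistics, I found that a CSDN post titled "Bilibili video fast download and backup in MP4 format - videofk.com" brings in 100 to 200 visitor IPs. Ha.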


#On to the code

1. vim /etc/hosts
  192.168.1.101 had001    # master
  192.168.1.102 had002
  192.168.1.103 had003
  #Copy the same entries to hosts 102 and 103
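The three entries can also be added by a small script that skips hostnames already mapped, so it is safe to re-run on had001, had002 and had003. A minimal sketch in POSIX shell; `ensure_host` and `add_cluster_hosts` are hypothetical names, and the IPs and hostnames are the ones from step 1. Writing to the default /etc/hosts needs root, so pass a scratch file to try it out first.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already mapped there.
ensure_host() {
  local file="$1" ip="$2" name="$3"
  touch "$file"
  # Field 1 is the IP, fields 2..NF are hostnames; skip comment lines, stop at a trailing '#'
  if awk -v n="$name" '
      $1 ~ /^#/ { next }
      { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == n) found = 1 } }
      END { exit !found }' "$file"; then
    return 0   # already mapped: leave the file unchanged
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Hypothetical wrapper for the cluster from step 1; the file defaults to /etc/hosts
add_cluster_hosts() {
  local file="${1:-/etc/hosts}"
  ensure_host "$file" 192.168.1.101 had001   # master
  ensure_host "$file" 192.168.1.102 had002
  ensure_host "$file" 192.168.1.103 had003
}
```

Matching on the hostname rather than the whole line means an existing `had001` entry with a different IP is left untouched, so stale entries still need a manual look.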
  
I. On host had001
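2. Copy ~/.ssh/authorized_keys from 101 to the two servers 102 and 103. This gives 101 passwordless access only if 101's own public key is already in that file (start-dfs.sh also SSHes into had001 itself); if it is not, first create a key pair with ssh-keygen -t rsa (empty passphrase) and append ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys. Note that scp overwrites the file on the target host.
  scp -i /home/xxx.pem ~/.ssh/authorized_keys root@had002:/root/.ssh/
  scp -i /home/xxx.pem ~/.ssh/authorized_keys root@had003:/root/.ssh/
  #If everything works, ssh root@had002 on 101 now logs in to 102 without a password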
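Passwordless login depends on had001's public key being in authorized_keys, both on 102/103 and on had001 itself. A sketch of an idempotent append, assuming the key pair was created without a passphrase via `ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa` (the start scripts cannot answer a passphrase prompt); `append_key_once` is a hypothetical helper.

```shell
# Hypothetical helper: add a public key to authorized_keys only if that exact line is missing.
# Defaults assume root on had001 with the key pair in ~/.ssh/id_rsa(.pub).
append_key_once() {
  local pub="${1:-$HOME/.ssh/id_rsa.pub}" auth="${2:-$HOME/.ssh/authorized_keys}" key
  key="$(head -n 1 "$pub")"
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  chmod 600 "$auth"   # with sshd's default StrictModes, a group/world-writable file is ignored
  # -x: whole line, -F: fixed string (keys contain '+' and '/')
  if ! grep -qxF -- "$key" "$auth"; then
    # Terminate an unterminated last line so the new key starts on its own line
    if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
      echo >> "$auth"
    fi
    printf '%s\n' "$key" >> "$auth"
  fi
}
```

To push the key to a worker without overwriting its file, something like `ssh -i /home/xxx.pem root@had002 'cat >> ~/.ssh/authorized_keys' < ~/.ssh/id_rsa.pub` works; `ssh -o BatchMode=yes root@had002 hostname` then fails instead of prompting if key login is not in place.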
  
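3. Write the environment variables into /etc/profile (vim /etc/profile) and run source /etc/profile after saving. The *_USER variables are needed because Hadoop 3's start scripts refuse to launch daemons as root unless they are defined.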
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  export HADOOP_HOME=/home/hadoop
  export HDFS_NAMENODE_USER=root
  export HDFS_DATANODE_USER=root
  export HDFS_SECONDARYNAMENODE_USER=root
  export YARN_RESOURCEMANAGER_USER=root
  export YARN_NODEMANAGER_USER=root 

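  # Daemons started over SSH do not read /etc/profile, so JAVA_HOME also has to be set in hadoop-env.sh
  echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh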
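A missing variable usually only surfaces later as a start-script error, so it can help to check them right after `source /etc/profile`. A sketch; `check_hadoop_env` is a hypothetical name and the variable list is the one from step 3.

```shell
# Hypothetical helper: list step-3 variables that are not exported in the current shell,
# and check that JAVA_HOME actually contains bin/java.
check_hadoop_env() {
  local v status=0
  for v in JAVA_HOME HADOOP_HOME HDFS_NAMENODE_USER HDFS_DATANODE_USER \
           HDFS_SECONDARYNAMENODE_USER YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
    # printenv prints an exported variable's value and prints nothing if it is unset
    if [ -z "$(printenv "$v")" ]; then
      echo "missing: $v"
      status=1
    fi
  done
  if [ -n "$(printenv JAVA_HOME)" ] && [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no bin/java under JAVA_HOME=$JAVA_HOME"
    status=1
  fi
  return "$status"
}
```

Typical use: `source /etc/profile && check_hadoop_env && echo OK`.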
4. Configure core-site: vim $HADOOP_HOME/etc/hadoop/core-site.xml
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/usr/local/hadoop/tmp</value> <!-- temporary directory -->
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://had001:9000</value> <!-- NameNode address and port -->
        </property>
    </configuration>
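Hand-editing the *-site.xml files makes it easy to drop a closing tag. One option is to generate them from name=value pairs; a sketch with a hypothetical `write_hadoop_xml` function, shown producing the core-site.xml from step 4 (without its comments).

```shell
# Hypothetical helper: render name=value pairs as a Hadoop <configuration> document.
# Values are written verbatim, so they must not contain XML special characters (< & >).
write_hadoop_xml() {
  local pair
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<configuration>'
  for pair in "$@"; do
    # Split on the first '=': property names never contain '=', values might
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
      "${pair%%=*}" "${pair#*=}"
  done
  echo '</configuration>'
}
```

For example, `write_hadoop_xml hadoop.tmp.dir=file:/usr/local/hadoop/tmp fs.defaultFS=hdfs://had001:9000 > $HADOOP_HOME/etc/hadoop/core-site.xml` overwrites the file, which in the stock distribution holds little more than an empty `<configuration>`.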

5. Configure vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name> <!-- number of replicas per block -->
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name> <!-- NameNode metadata directory -->
            <value>file:/usr/local/hadoop/tmp/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name> <!-- DataNode block storage directory -->
            <value>file:/usr/local/hadoop/tmp/dfs/data</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name> <!-- SecondaryNameNode web address; the HDFS status page is the NameNode UI (port 9870 by default in Hadoop 3) -->
            <value>had001:9001</value>
        </property>
    </configuration>
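To double-check what a file actually says without starting Hadoop, a small awk reader is enough for files laid out like the ones above. A sketch; `get_conf_value` is hypothetical and deliberately not a general XML parser: it assumes each `<value>` sits on the same line as its `<name>` or on a later line.

```shell
# Hypothetical helper: print the <value> that follows <name>KEY</name> in a *-site.xml file.
get_conf_value() {
  local file="$1" key="$2"
  awk -v tag="<name>$key</name>" '
    index($0, tag) { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)   # strip "<value>" (7) and "</value>" (8)
      exit
    }' "$file"
}
```

For example, `get_conf_value $HADOOP_HOME/etc/hadoop/hdfs-site.xml dfs.replication`. Once the cluster is installed, `$HADOOP_HOME/bin/hdfs getconf -confKey dfs.replication` is the more reliable check, since it reports the effective value including defaults.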

6. Configure vim $HADOOP_HOME/etc/hadoop/mapred-site.xml (run MapReduce on YARN)
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

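7. Configure vim $HADOOP_HOME/etc/hadoop/yarn-site.xml. Once yarn.resourcemanager.hostname is set, the five *.address properties below already default to had001 with these same ports, so listing them is optional.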
    <configuration>
<!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>had001</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>had001:8032</value>
        </property>
        <property>
           <name>yarn.resourcemanager.scheduler.address</name>
           <value>had001:8030</value>
        </property>
        <property>
           <name>yarn.resourcemanager.resource-tracker.address</name>
           <value>had001:8031</value>
        </property>
        <property>
           <name>yarn.resourcemanager.admin.address</name>
           <value>had001:8033</value>
        </property>
        <property>
           <name>yarn.resourcemanager.webapp.address</name>
           <value>had001:8088</value>
        </property>
    </configuration>

8. List the DataNode hostnames (NodeManagers are started on the same hosts) in vim $HADOOP_HOME/etc/hadoop/workers
    had002
    had003
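start-dfs.sh and start-yarn.sh reach every host in workers by name over SSH, so each name must resolve on had001. A sketch that checks the workers file against a hosts file; `check_workers` is hypothetical, and since it only reads the file, hosts resolved through DNS would be reported as missing.

```shell
# Hypothetical helper: report workers that have no entry in the hosts file.
check_workers() {
  local workers="$1" hosts="${2:-/etc/hosts}" host status=0
  # Skip blank lines and comments in the workers file
  for host in $(grep -Ev '^[[:space:]]*(#|$)' "$workers"); do
    if ! awk -v n="$host" '
        $1 ~ /^#/ { next }
        { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == n) found = 1 } }
        END { exit !found }' "$hosts"; then
      echo "not in $hosts: $host"
      status=1
    fi
  done
  return "$status"
}
```

On had001: `check_workers $HADOOP_HOME/etc/hadoop/workers`.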

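9. Once configured, pack hadoop on 101 and send it to the two hosts 102 and 103 (zip needs -r to recurse into the directory; -R is a different option)
   cd /home && zip -r hadoop.zip hadoop
   scp /home/hadoop.zip root@had002:/home
   scp /home/hadoop.zip root@had003:/home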

10. After the transfer, unpack on 102 and 103 (shown for had002; repeat on had003)
    ssh root@had002  #log in
    cd /home && unzip hadoop.zip
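Steps 9 and 10 can also be collapsed into one rsync per worker, which keeps file permissions (including the executable bit on the sbin/*.sh scripts) and needs no archive. A sketch assuming rsync is installed on all three hosts (the original setup does not use it) and that passwordless root SSH from step 2 works; `run` and `sync_hadoop` are hypothetical helpers.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: copy the Hadoop directory from had001 to every host in workers.
sync_hadoop() {
  local workers="${1:-$HADOOP_HOME/etc/hadoop/workers}" src="${2:-/home/hadoop}" host
  # A for loop (not `while read`) so ssh inside rsync cannot swallow the host list on stdin
  for host in $(grep -Ev '^[[:space:]]*(#|$)' "$workers"); do
    # Trailing slashes copy the directory contents; /logs/ (anchored) keeps had001's logs local
    run rsync -a --exclude /logs/ "$src/" "root@$host:$src/"
  done
}
```

`DRY_RUN=1 sync_hadoop` shows what would be copied; plain `sync_hadoop` then pushes the same configuration to had002 and had003 after every change.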
    
11. Format HDFS on 101 (only once, before the first start)
    $HADOOP_HOME/bin/hdfs namenode -format
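Formatting a second time is a common reason for DataNodes not showing up: the NameNode gets a new clusterID, and DataNodes that still hold the old one in their data directory fail to register. A guard like the following can replace the bare command. A sketch; `safe_format` is hypothetical, and it relies on a formatted NameNode directory containing current/VERSION.

```shell
# Hypothetical helper: run `hdfs namenode -format` only if the NameNode directory is unformatted.
# The default path is dfs.namenode.name.dir from step 5 without the "file:" prefix.
safe_format() {
  local name_dir="${1:-/usr/local/hadoop/tmp/dfs/name}"
  local hdfs="${HADOOP_HOME:-/home/hadoop}/bin/hdfs"
  if [ -f "$name_dir/current/VERSION" ]; then
    echo "already formatted: $name_dir/current/VERSION exists" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$hdfs namenode -format"   # DRY_RUN=1: only show the command
  else
    "$hdfs" namenode -format
  fi
}
```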

12. Start HDFS with $HADOOP_HOME/sbin/start-dfs.sh and use jps to see which daemons are running (stop-dfs.sh stops them)

13. Start YARN (resource management and job scheduling) with $HADOOP_HOME/sbin/start-yarn.sh
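After HDFS and YARN are up, the web UIs give a quick sanity check: the ResourceManager UI is had001:8088 per yarn-site.xml, and in Hadoop 3 the NameNode UI listens on port 9870 by default (dfs.namenode.http-address, not set in step 5). A sketch of a port probe; `check_port` is a hypothetical helper that needs bash (for /dev/tcp) and coreutils' `timeout`.

```shell
# Hypothetical helper: test whether host:port accepts TCP connections.
# bash's /dev/tcp pseudo-device opens the connection; timeout keeps a filtered port from hanging.
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "closed: $host:$port"
    return 1
  fi
}
```

Usage: `check_port had001 9870; check_port had001 8088`.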

14. Check HDFS status with $HADOOP_HOME/bin/hdfs dfsadmin -report
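For a scripted check, the number of live DataNodes can be pulled out of the report and compared with the two hosts in workers. A sketch assuming the report contains a line of the form `Live datanodes (N):`, as Hadoop 3's dfsadmin prints; both function names are hypothetical.

```shell
# Hypothetical helper: read a dfsadmin report on stdin and print N from "Live datanodes (N):".
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the report on stdin shows exactly WANT live DataNodes.
expect_live() {
  local want="$1" got
  got="$(live_datanodes)"
  got="${got:-0}"
  if [ "$got" -eq "$want" ]; then
    echo "OK: $got live DataNodes"
  else
    echo "expected $want live DataNodes, got $got"
    return 1
  fi
}
```

Usage: `$HADOOP_HOME/bin/hdfs dfsadmin -report | expect_live 2`.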

15. Finally, run jps to list the daemons running in the background
   On host 101:
        22656 Jps
        21205 ResourceManager
        20405 SecondaryNameNode
        10233 NodeManager
        20155 NameNode

   On hosts 102 and 103:
        3697 Jps
        3001 DataNode
        3389 NodeManager
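The jps listings above can be checked per host with a small filter. A sketch; `check_daemons` is hypothetical, and running it against a worker over SSH assumes `jps` is on the PATH of a non-interactive SSH session.

```shell
# Hypothetical helper: read `jps` output on stdin and report expected daemons that are missing.
# jps prints "<pid> <main class name>" per JVM, so the daemon name is field 2.
check_daemons() {
  local jps_out d status=0
  jps_out="$(cat)"
  for d in "$@"; do
    if ! printf '%s\n' "$jps_out" | awk -v d="$d" '$2 == d { f = 1 } END { exit !f }'; then
      echo "not running: $d"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `jps | check_daemons NameNode SecondaryNameNode ResourceManager` on had001, and `ssh root@had002 jps | check_daemons DataNode NodeManager` for each worker.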

References
http://www.cnvirtue.com/547.html
https://www.linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/
And these official docs (default value and description of every configuration key), handy for lookups:
http://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-common/core-default.xml
http://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://hadoop.apache.org/docs/r3.0.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
http://hadoop.apache.org/docs/r3.0.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml