Spark from Zero to Development (Part 2): Spark Installation and Cluster Setup

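My "Java Interview Guide", compiled over half a year, helped me land offers from Tencent and other big tech companies. It is open-sourced on GitHub; stars welcome!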

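This article is included in GitHub https://github.com/OUYANGSIHAI/JavaInterview, a collection of Java interview notes for top-tier companies that took me 6 months to compile. I have received offers from major companies; stars welcome.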

Original link: blog.ouyangsihai.cn >> Spark from Zero to Development (Part 2): Spark Installation and Cluster Setup

Preparation

First, install Scala.

If you have not set up a pseudo-distributed cluster before, see: CentOS7.x Hadoop Cluster Setup
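Before going further, it helps to confirm on every node that Java, Scala and the Hadoop scripts are actually on PATH. The following is a minimal sketch; check_prereqs is a hypothetical helper (not part of Spark or Hadoop), and the command names you pass to it are assumptions about your installation.

```shell
# Hypothetical pre-flight helper: report every required command that is
# not found on PATH, and return non-zero if any is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_prereqs java scala start-dfs.sh jps prints one "missing: ..." line per command it cannot find.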

Download and extract
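The article does not show the extraction commands. A sketch under stated assumptions: the tarball name (for example spark-1.5.1-bin-hadoop2.4.tgz, inferred from the directory name used later) and pointing /home/fantj/spark (the later SPARK_HOME) at the extracted directory through a symlink are both assumptions, and extract_and_link is a hypothetical helper.

```shell
# Hypothetical helper: extract a Spark binary tarball into DEST and point a
# stable symlink LINK at the extracted top-level directory.
# Assumes the archive's first entry is its top-level directory.
extract_and_link() {
  local tarball=$1 dest=$2 link=$3 top
  mkdir -p "$dest" || return 1
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1) || return 1
  [ -n "$top" ] || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  # -n replaces an existing symlink instead of descending into it.
  ln -sfn "$dest/$top" "$link"
}
```

Usage: extract_and_link spark-1.5.1-bin-hadoop2.4.tgz /home/fantj/download /home/fantj/spark. A symlink keeps SPARK_HOME unchanged if you later unpack a different version.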

Configuration

1. Configure environment variables

Add the following to /etc/profile:


export SPARK_HOME=/home/fantj/spark
export PATH=$PATH:$SPARK_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
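Appending to /etc/profile by hand is easy to repeat by accident. A sketch of an idempotent version, assuming a marker comment of my own choosing ("# >>> spark env >>>") to detect an earlier run; add_spark_profile is a hypothetical helper and the default paths are the article's.

```shell
# Hypothetical helper: append the Spark variables to a profile file once.
# Defaults: /etc/profile and the article's SPARK_HOME.
add_spark_profile() {
  local profile=${1:-/etc/profile} spark_home=${2:-/home/fantj/spark}
  # The marker line lets later runs detect that the block is already there.
  if grep -q '^# >>> spark env >>>$' "$profile" 2>/dev/null; then
    return 0
  fi
  # Unquoted EOF: $spark_home is expanded now; \$ keeps the rest literal.
  cat >> "$profile" <<EOF
# >>> spark env >>>
export SPARK_HOME=$spark_home
export PATH=\$PATH:\$SPARK_HOME/bin
export CLASSPATH=.:\$CLASSPATH:\$JAVA_HOME/lib:\$JAVA_HOME/jre/lib
# <<< spark env <<<
EOF
}
```

Run it as root (add_spark_profile /etc/profile /home/fantj/spark), then source /etc/profile so the current shell picks up the new variables.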

2. Configure conf/spark-env.sh under the Spark root directory. In the conf directory, run:

cp spark-env.sh.template spark-env.sh

Then append the following environment variables at the end of the file:


export JAVA_HOME=/home/fantj/jdk
export SCALA_HOME=/home/fantj/scala
export SPARK_MASTER_IP=s166
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/fantj/hadoop/etc/hadoop
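The two steps above (copy the template, append the variables) can be scripted. A sketch, assuming it is run against the conf directory of the Spark installation; write_spark_env is a hypothetical helper, and it skips the append if a SPARK_MASTER_IP line is already present.

```shell
# Hypothetical helper: create spark-env.sh from the template (if needed)
# and append the article's settings once.
write_spark_env() {
  local conf_dir=$1 env_file
  env_file="$conf_dir/spark-env.sh"
  # Equivalent to: cp spark-env.sh.template spark-env.sh (first run only).
  if [ ! -f "$env_file" ]; then
    cp "$conf_dir/spark-env.sh.template" "$env_file" || return 1
  fi
  # Skip the append if an earlier run already added the settings.
  if grep -q '^export SPARK_MASTER_IP=' "$env_file"; then
    return 0
  fi
  cat >> "$env_file" <<'EOF'
export JAVA_HOME=/home/fantj/jdk
export SCALA_HOME=/home/fantj/scala
export SPARK_MASTER_IP=s166
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/fantj/hadoop/etc/hadoop
EOF
}
```

Usage: write_spark_env /home/fantj/spark/conf. This article uses Spark 1.5.1; as far as I know, the template in later releases (2.x onwards) documents SPARK_MASTER_HOST instead of SPARK_MASTER_IP, so check the template that ships with your version.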

3. Configure conf/slaves (the launch scripts read a file named exactly slaves; a file named slaves.conf is ignored)

cp slaves.template slaves

Add the worker hostnames, one per line:


spark2
spark3
spark4
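Spark's launch scripts look for the worker list in a file named slaves (no extension) in the conf directory: in Spark 1.x, sbin/slaves.sh reads conf/slaves by default. A minimal sketch that writes the hostnames one per line; write_slaves is a hypothetical helper.

```shell
# Hypothetical helper: write one worker hostname per line to CONF_DIR/slaves,
# the file read by sbin/start-slaves.sh (called from start-all.sh).
write_slaves() {
  local conf_dir=$1
  shift
  printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

Usage: write_slaves /home/fantj/spark/conf spark2 spark3 spark4. If I remember correctly, Spark 3.x renamed this file to conf/workers, so adjust the name for newer releases.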

Sync the configuration to the slave nodes

Copy Scala, Spark and the configuration files to every slave node.


scp -r scala-2.11.7 spark-1.5.1-bin-hadoop2.4/ s168:/home/fantj/download/
scp -r scala-2.11.7 spark-1.5.1-bin-hadoop2.4/ s169:/home/fantj/download/

scp /etc/profile s167:/etc/profile
scp /etc/profile s168:/etc/profile
scp /etc/profile s169:/etc/profile
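The copy commands repeat per host, so a loop keeps the list of nodes in one place. A sketch, assuming the same paths as above; sync_to_workers is a hypothetical helper, and DRY_RUN=1 (my own convention) prints the commands instead of running them.

```shell
# Hypothetical helper: copy Scala, Spark and /etc/profile to each host.
# With DRY_RUN=1 the scp commands are only printed.
sync_to_workers() {
  local run="" host
  [ "${DRY_RUN:-0}" = 1 ] && run="echo"
  for host in "$@"; do
    $run scp -r scala-2.11.7 spark-1.5.1-bin-hadoop2.4/ "$host:/home/fantj/download/" || return 1
    $run scp /etc/profile "$host:/etc/profile" || return 1
  done
}
```

Usage: sync_to_workers s167 s168 s169. After /etc/profile has been copied, run source /etc/profile on each node (or log in again) so the variables take effect.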

Starting Spark

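  1. First start Hadoop, or at least HDFS, with the start-dfs.sh command.
  2. Run jps and make sure HDFS is up on the master and on every slave (master: NameNode, slaves: DataNode).
  3. From the Spark root directory, run ./sbin/start-all.sh. For the slave nodes to start along with it, passwordless SSH login must be set up; without it, you can start the nodes one at a time by running the same command on each.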

[root@s166 spark]# ./sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/fantj/download/spark-1.5.1-bin-hadoop2.4/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-s166.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/fantj/download/spark-1.5.1-bin-hadoop2.4/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-s166.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/fantj/download/spark-1.5.1-bin-hadoop2.4/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-s167.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/fantj/download/spark-1.5.1-bin-hadoop2.4/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-s168.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/fantj/download/spark-1.5.1-bin-hadoop2.4/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-s169.out

  4. Check jps again:

------- s166 jps -------
1397 NameNode
52854 Worker
1559 SecondaryNameNode
53671 Jps
52719 Master
------- s167 jps -------
1764 DataNode
29092 Jps
28414 Worker
------- s168 jps -------
33921 Worker
1756 DataNode
34063 Jps
------- s169 jps -------
27384 Jps
1754 DataNode
27242 Worker
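Step 3 depends on passwordless SSH from the master to every host in the worker list. A sketch of the usual OpenSSH setup; setup_passwordless_ssh is a hypothetical helper, it assumes an RSA key at ~/.ssh/id_rsa, and DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical helper: create a key pair if needed, then install the public
# key on each host with ssh-copy-id. DRY_RUN=1 only prints the commands.
setup_passwordless_ssh() {
  local run="" host
  [ "${DRY_RUN:-0}" = 1 ] && run="echo"
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    $run mkdir -p -m 700 "$HOME/.ssh" || return 1
    # Empty passphrase, so start-all.sh can log in without a prompt.
    $run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" || return 1
  fi
  for host in "$@"; do
    # Appends the public key to the remote ~/.ssh/authorized_keys.
    $run ssh-copy-id "$host" || return 1
  done
}
```

Run it on the master, for example setup_passwordless_ssh s167 s168 s169. ssh-copy-id asks for each host's password once; afterwards ssh s167 should log in without a prompt.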

As you can see, there is one Master and four Workers (s166 runs a Worker as well as the Master).
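Rather than reading each jps listing by eye, the daemon count can be checked with a small filter. A sketch; count_daemon is a hypothetical helper that counts lines of the form "<pid> <name>" whose name matches exactly, so the per-host separator lines are ignored.

```shell
# Hypothetical helper: read jps output on stdin and print how many
# processes have exactly the given name (prints 0 if none).
count_daemon() {
  awk -v name="$1" '$2 == name { n++ } END { print n + 0 }'
}
```

Usage: for h in s166 s167 s168 s169; do ssh "$h" jps; done | count_daemon Worker. A non-interactive ssh session may not read /etc/profile, so you might need the full path to jps (under $JAVA_HOME/bin).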

Then open port 8080 on the master node's IP in a browser to see the Spark Master web UI.

Opening spark-shell


[root@s166 bin]# spark-shell
18/07/30 12:34:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/07/30 12:34:20 INFO spark.SecurityManager: Changing view acls to: root
18/07/30 12:34:20 INFO spark.SecurityManager: Changing modify acls to: root
18/07/30 12:34:20 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/07/30 12:34:22 INFO spark.HttpServer: Starting HTTP Server
18/07/30 12:34:23 INFO server.Server: jetty-8.y.z-SNAPSHOT
18/07/30 12:34:23 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:35005
18/07/30 12:34:23 INFO util.Utils: Successfully started service 'HTTP class server' on port 35005.
......
18/07/30 12:38:39 INFO session.SessionState: Created local directory: /tmp/2c350bb0-1297-40d8-a9bd-47446b116bf3_resources
18/07/30 12:38:39 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/2c350bb0-1297-40d8-a9bd-47446b116bf3
18/07/30 12:38:39 INFO session.SessionState: Created local directory: /tmp/root/2c350bb0-1297-40d8-a9bd-47446b116bf3
18/07/30 12:38:40 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/2c350bb0-1297-40d8-a9bd-47446b116bf3/_tmp_space.db
18/07/30 12:38:40 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala>

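This shows that spark-shell started successfully. Likewise, you can visit port 4040, the web UI of the running application.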
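Both web UIs can also be checked from a terminal. A minimal sketch using bash's /dev/tcp redirection (so bash must be installed) and coreutils timeout; port_open is a hypothetical helper that only tests whether something accepts a TCP connection on the port, not whether it is actually the Spark UI.

```shell
# Hypothetical helper: succeed if HOST:PORT accepts a TCP connection
# within 3 seconds. /dev/tcp is a bash feature, hence the explicit bash -c.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

Usage: port_open s166 8080 && echo 'Master UI reachable', and port_open s166 4040 while spark-shell is running on s166.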
