
Big Data Analysis with Apache Spark (5)


Zeppelin

Installation

$ wget https://downloads.apache.org/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-all.tgz
$ tar -zxvf zeppelin-0.9.0-bin-all.tgz
$ mv zeppelin-0.9.0-bin-all zeppelin
$ ./zeppelin/bin/zeppelin-daemon.sh start
Please specify HADOOP_CONF_DIR if USE_HADOOP is true
Zeppelin is not running
Zeppelin start                                             [  OK  ]
$ curl http://localhost:8080
Shell
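If another service is already bound to port 8080 (the Spark standalone master web UI also defaults to it), Zeppelin's port can be changed before starting the daemon. A minimal sketch, assuming the zeppelin/ directory layout created above; ZEPPELIN_PORT is a standard variable read from conf/zeppelin-env.sh:

```shell
$ cp zeppelin/conf/zeppelin-env.sh.template zeppelin/conf/zeppelin-env.sh
$ echo 'export ZEPPELIN_PORT=8180' >> zeppelin/conf/zeppelin-env.sh
$ ./zeppelin/bin/zeppelin-daemon.sh restart
```

After the restart, the web UI should answer on the new port (e.g. `curl http://localhost:8180`).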

Hands-on Practice

Word Count (Object Save)

scala> val f = sc.textFile("README.md")
scala> val wc = f.flatMap(l => l.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> wc.saveAsObjectFile("hdfs://master-01:9000/wc_out.obj")
scala> val obj = sc.objectFile[(String, Int)]("hdfs://master-01:9000/wc_out.obj")
scala> obj.collect.foreach(println)
(package,1)
(this,1)
...
scala> val obj2 = obj.sortBy(x => x._2, false)
scala> val arr = obj2.take(20)
scala> arr.foreach(x => println(x))
(,73)
(the,23)
(to,16)
(Spark,14)
(for,12)
(##,9)
(a,9)
(and,9)
(is,7)
(run,7)
(on,7)
(can,6)
(also,5)
(in,5)
(of,5)
(Please,4)
(*,4)
(if,4)
(including,4)
(an,4)
Scala
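The flatMap → map → reduceByKey pipeline above can be mimicked on plain Scala collections, which is a handy way to sanity-check the logic without a cluster. This is a local sketch (no SparkContext, made-up sample input); `groupMapReduce` (Scala 2.13+) plays the role of `reduceByKey`:

```scala
// Local word count mirroring the RDD pipeline above (no Spark needed).
val lines = Seq("spark spark spark hadoop hadoop zeppelin")

// flatMap(l => l.split(" ")): one element per word
val words = lines.flatMap(_.split(" "))

// map(word => (word, 1)).reduceByKey(_ + _) collapses to groupMapReduce:
// group by the word itself, map each occurrence to 1, reduce by addition.
val wc = words.groupMapReduce(identity)(_ => 1)(_ + _)

// sortBy(x => x._2, false).take(n): top n words by count
val top = wc.toSeq.sortBy(-_._2).take(2)
println(top)  // List((spark,3), (hadoop,2))
```

The key difference is that `reduceByKey` combines values per partition before shuffling, whereas a local `groupMapReduce` runs in one pass in memory; the results are the same.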