Spark Notes
log4j.rootLogger=${root.logger}
log.dir=/var/log/spark
log.file=spark-worker-gmo-cl-data-01.log
max.log.file.size=200MB
-max.log.file.backup.index=10
+max.log.file.backup.index=5
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${log.dir}/${log.file}
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
Find the log4j for Spark and change the log directory manually:
find / -ctime -0.1 2>/dev/null | grep log4j.properties   <- find files whose status changed in the last 24*60*0.1 = 144 minutes
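The fractional-day argument to -ctime is not portable; a minimal sketch of the same search using -cmin (status changed within the last N minutes), scoped to the current directory for the demo. The touched file name here is made up:

```shell
# Demo: -cmin -144 matches files whose status changed in the last 144 minutes
# (the same window as -ctime -0.1). fresh-log4j.properties is a stand-in file.
touch fresh-log4j.properties
find . -cmin -144 2>/dev/null | grep log4j.properties
```

On the cluster the real search stays `find / -ctime -0.1 ... | grep log4j.properties` as above.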
/etc/hadoop/conf.cloudera.yarn/log4j.properties
/etc/hadoop/conf.cloudera.hdfs/log4j.properties
/etc/spark/conf.cloudera.spark/log4j.properties
/var/run/cloudera-scm-agent/process/2340-deploy-client-config/yarn-conf/log4j.properties
/var/run/cloudera-scm-agent/process/2336-deploy-client-config/aux/client/log4j.properties
/var/run/cloudera-scm-agent/process/2336-deploy-client-config/spark-conf/log4j.properties
/var/run/cloudera-scm-agent/process/2330-deploy-client-config/hadoop-conf/log4j.properties
/var/run/cloudera-scm-agent/process/2324-spark-SPARK_WORKER/config/log4j.properties <- this one had old /var/log/spark and new index
/var/run/cloudera-scm-agent/process/2324-spark-SPARK_WORKER/aux/client/log4j.properties
/var/run/cloudera-scm-agent/process/2324-spark-SPARK_WORKER/log4j.properties <- this one had old /var/log/spark
/var/run/cloudera-scm-agent/process/2324-spark-SPARK_WORKER/hadoop-conf/log4j.properties
log.dir=/var/log/spark
log.dir=/mnt/vol1/var/log/spark (worker default group advanced configuration safety valve)
log.threshold=INFO
main.logger=RFA
root.logger=${log.threshold},${main.logger}
log4j.rootLogger=${root.logger}
-log.dir=/var/log/spark
+log.dir=/mnt/vol1/var/log/spark
log.file=spark-worker-gmo-cl-data-01.log
max.log.file.size=200MB
max.log.file.backup.index=5
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
[dchtchou@gmo-cl-edge-02 SimpleApp]$ ls -a
. .. project run run~ simple.sbt simple.sbt~ src target
[dchtchou@gmo-cl-edge-02 SimpleApp]$ mkdir lib
[dchtchou@gmo-cl-edge-02 SimpleApp]$ cp /home/dchtchou/src/ion-pcm/SPARK/jars/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0-cdh5.1.0.jar lib
[dchtchou@gmo-cl-edge-02 SimpleApp]$ ./run
[info] Set current project to Simple Project (in build file:/home/dchtchou/src/ion-pcm/SPARK/SimpleApp/)
[info] Compiling 1 Scala source to /home/dchtchou/src/ion-pcm/SPARK/SimpleApp/target/scala-2.10/classes...
[info] Packaging /home/dchtchou/src/ion-pcm/SPARK/SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar ...
[info] Done packaging.
[success] Total time: 6 s, completed Jul 30, 2014 5:50:35 PM
14/07/30 17:50:36 INFO SecurityManager: Changing view acls to: dchtchou
Compiled ok, but getting:
14/07/30 17:50:40 INFO SparkContext: Job finished: count at SimpleApp.scala:15, took 0.025717261 s
Lines with a: 1, Lines with b: 0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveContext
        at SimpleApp$.main(SimpleApp.scala:21)
        at SimpleApp.main(SimpleApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveContext
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 9 more
Will a restart of the Spark role help?
File /mnt/vol1/var/log/spark/spark-master-gmo-cl-edge-02.log
nope
Put /etc/hive/conf.cloudera.hive/hive-site.xml into /etc/alternatives/spark-conf/ on every node.
Since this is SPARK_HOME, we need to replace the Spark assembly as well:
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/spark
mv /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/spark/assembly/lib/spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/spark/assembly/lib/spark-assembly-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar.Original
cp spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0-cdh5.1.0.jar /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/spark/assembly/lib/
Do this by FTP (spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0-cdh5.1.0.jar).
Did this - no reboot required.
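Swapping the assembly jar works, but for the SimpleApp build itself the missing HiveContext class could also come from an sbt dependency. A sketch only: it assumes the spark-hive artifact for this 1.1.0-SNAPSHOT is resolvable (e.g. published to a local repository), which is not the case here, where the classes come from the jar dropped into lib/:

```scala
// simple.sbt sketch (hypothetical: assumes spark-hive 1.1.0-SNAPSHOT is
// resolvable from a repository; versions match the assembly jar used above)
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0-SNAPSHOT" % "provided",
  "org.apache.spark" %% "spark-hive" % "1.1.0-SNAPSHOT" % "provided"
)
```

"provided" keeps the classes out of the application jar, since the cluster's assembly supplies them at runtime.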
scala> hiveContext.hql("SELECT COUNT(*) FROM bom_table_top").collect().foreach(println)
14/07/30 19:15:45 INFO ParseDriver: Parsing command: SELECT COUNT(*) FROM bom_table_top
14/07/30 19:15:45 INFO ParseDriver: Parse Completed
14/07/30 19:15:45 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/07/30 19:15:45 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/07/30 19:15:45 INFO metastore: Trying to connect to metastore with URI thrift://gmo-cl-edge-02:9083
14/07/30 19:15:45 INFO metastore: Waiting 1 seconds before next connection attempt.
14/07/30 19:15:46 INFO metastore: Connected to metastore.
14/07/30 19:15:47 INFO Analyzer: Max iterations (2) reached for batch Check Analysis
14/07/30 19:15:47 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/07/30 19:15:47 INFO MemoryStore: ensureFreeSpace(404207) called with curMem=0, maxMem=278302556
14/07/30 19:15:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 394.7 KB, free 265.0 MB)
14/07/30 19:15:47 INFO SQLContext$$anon$1: Max iterations (2) reached for batch Add exchange
14/07/30 19:15:47 INFO SQLContext$$anon$1: Max iterations (2) reached for batch Prepare Expressions
14/07/30 19:15:47 INFO SparkContext: Starting job: collect at SparkPlan.scala:52
14/07/30 19:15:48 INFO FileInputFormat: Total input paths to process : 1
14/07/30 19:15:48 INFO DAGScheduler: Registering RDD 5 (mapPartitions at Exchange.scala:69)
14/07/30 19:15:48 INFO DAGScheduler: Got job 0 (collect at SparkPlan.scala:52) with 1 output partitions (allowLocal=false)
14/07/30 19:15:48 INFO DAGScheduler: Final stage: Stage 0(collect at SparkPlan.scala:52)
14/07/30 19:15:48 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/07/30 19:15:48 INFO DAGScheduler: Missing parents: List(Stage 1)
14/07/30 19:15:48 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[5] at mapPartitions at Exchange.scala:69), which has no missing parents
14/07/30 19:15:48 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[5] at mapPartitions at Exchange.scala:69)
14/07/30 19:15:48 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/07/30 19:15:48 INFO TaskSetManager: Re-computing pending task lists.
14/07/30 19:15:48 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 5668 bytes)
14/07/30 19:15:48 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 5668 bytes)
14/07/30 19:15:48 INFO Executor: Running task 1.0 in stage 1.0 (TID 1)
14/07/30 19:15:48 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
14/07/30 19:15:48 INFO BlockManager: Found block broadcast_0 locally
14/07/30 19:15:48 INFO BlockManager: Found block broadcast_0 locally
14/07/30 19:15:48 INFO HadoopRDD: Input split: hdfs://gmo-cl-name-01:8020/user/hive/warehouse/bom_table_top/part-m-00000:39554203+39554204
14/07/30 19:15:48 INFO HadoopRDD: Input split: hdfs://gmo-cl-name-01:8020/user/hive/warehouse/bom_table_top/part-m-00000:0+39554203
14/07/30 19:15:48 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/07/30 19:15:48 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/07/30 19:15:48 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/07/30 19:15:48 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/07/30 19:15:48 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/07/30 19:15:49 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 1868 bytes result sent to driver
14/07/30 19:15:49 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 1868 bytes result sent to driver
14/07/30 19:15:49 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1338 ms on localhost (1/2)
14/07/30 19:15:49 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 1330 ms on localhost (2/2)
14/07/30 19:15:49 INFO DAGScheduler: Stage 1 (mapPartitions at Exchange.scala:69) finished in 1.356 s
14/07/30 19:15:49 INFO DAGScheduler: looking for newly runnable stages
14/07/30 19:15:49 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/07/30 19:15:49 INFO DAGScheduler: running: Set()
14/07/30 19:15:49 INFO DAGScheduler: waiting: Set(Stage 0)
14/07/30 19:15:49 INFO DAGScheduler: failed: Set()
14/07/30 19:15:49 INFO DAGScheduler: Missing parents for Stage 0: List()
14/07/30 19:15:49 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[9] at map at SparkPlan.scala:52), which is now runnable
14/07/30 19:15:49 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[9] at map at SparkPlan.scala:52)
14/07/30 19:15:49 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/07/30 19:15:49 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 5843 bytes)
14/07/30 19:15:49 INFO Executor: Running task 0.0 in stage 0.0 (TID 2)
14/07/30 19:15:49 INFO BlockManager: Found block broadcast_0 locally
14/07/30 19:15:49 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/30 19:15:49 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/30 19:15:49 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms
14/07/30 19:15:49 INFO Executor: Finished task 0.0 in stage 0.0 (TID 2). 1076 bytes result sent to driver
14/07/30 19:15:49 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 93 ms on localhost (1/1)
scala> hiveContext.hql("SELECT LEVEL, COUNT(*) AS level_counts FROM dchtchou_bom_table GROUP BY LEVEL").collect().foreach(println)
14/07/30 19:40:35 INFO ParseDriver: Parsing command: SELECT LEVEL, COUNT(*) AS level_counts FROM dchtchou_bom_table GROUP BY LEVEL
14/07/30 19:40:36 INFO ParseDriver: Parse Completed
14/07/30 19:40:36 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/07/30 19:40:36 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/07/30 19:40:36 INFO metastore: Trying to connect to metastore with URI thrift://gmo-cl-edge-02:9083
14/07/30 19:40:36 INFO metastore: Waiting 1 seconds before next connection attempt.
14/07/30 19:40:37 INFO metastore: Connected to metastore.
14/07/30 19:40:38 INFO Analyzer: Max iterations (2) reached for batch Check Analysis
14/07/30 19:40:38 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/07/30 19:40:38 INFO MemoryStore: ensureFreeSpace(404207) called with curMem=0, maxMem=278302556
14/07/30 19:40:38 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 394.7 KB, free 265.0 MB)
14/07/30 19:40:38 INFO MemoryStore: ensureFreeSpace(56) called with curMem=404207, maxMem=278302556
14/07/30 19:40:38 INFO MemoryStore: Block broadcast_0_meta stored as values in memory (estimated size 56.0 B, free 265.0 MB)
14/07/30 19:40:38 INFO BlockManagerInfo: Added broadcast_0_meta in memory on gmo... [lines lost in capture] ...02/192.168.1.15:48984]
14/07/30 19:40:42 INFO SendingConnection: Connected to [gmo-cl-data-04/192.168.1.17:37802], 1 messages pending
14/07/30 19:40:42 INFO SendingConnection: Connected to [gmo-cl-data-03/192.168.1.16:60851], 1 messages pending
14/07/30 19:40:42 INFO SendingConnection: Connected to [gmo-cl-data-02/192.168.1.15:48984], 1 messages pending
14/07/30 19:40:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on gmo-cl-data-04:37802 (size: 88.4 KB, free: 265.3 MB)
14/07/30 19:40:42 INFO ConnectionManager: Accepted connection from [gmo-cl-data-01/192.168.1.14]
14/07/30 19:40:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on gmo-cl-data-03:60851 (size: 88.4 KB, free: 265.3 MB)
14/07/30 19:40:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on gmo-cl-data-02:48984 (size: 88.4 KB, free: 265.3 MB)
14/07/30 19:40:42 INFO SendingConnection: Initiating connection to [gmo-cl-data-01/192.168.1.14:49107]
14/07/30 19:40:42 INFO SendingConnection: Connected to [gmo-cl-data-01/192.168.1.14:49107], 1 messages pending
14/07/30 19:40:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on gmo-cl-data-01:49107 (size: 88.4 KB, free: 265.3 MB)
14/07/30 19:40:46 INFO TaskSetManager: Starting task 15.0 in stage 1.0 (TID 16, gmo-cl-data-04, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:46 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 0) in 4833 ms on gmo-cl-data-04 (1/1095)
14/07/30 19:40:47 INFO TaskSetManager: Starting task 17.0 in stage 1.0 (TID 17, gmo-cl-data-03, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:47 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 5593 ms on gmo-cl-data-03 (2/1095)
14/07/30 19:40:47 INFO TaskSetManager: Starting task 19.0 in stage 1.0 (TID 18, gmo-cl-data-02, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:47 INFO TaskSetManager: Starting task 22.0 in stage 1.0 (TID 19, gmo-cl-data-02, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:47 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 5607 ms on gmo-cl-data-02 (3/1095)
14/07/30 19:40:47 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 6) in 5597 ms on gmo-cl-data-02 (4/1095)
14/07/30 19:40:47 INFO TaskSetManager: Starting task 18.0 in stage 1.0 (TID 20, gmo-cl-data-04, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:47 INFO TaskSetManager: Finished task 12.0 in stage 1.0 (TID 12) in 5842 ms on gmo-cl-data-04 (5/1095)
14/07/30 19:40:47 INFO TaskSetManager: Starting task 20.0 in stage 1.0 (TID 21, gmo-cl-data-04, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:47 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 8) in 5977 ms on gmo-cl-data-04 (6/1095)
14/07/30 19:40:48 INFO TaskSetManager: Starting task 23.0 in stage 1.0 (TID 22, gmo-cl-data-02, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:48 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 10) in 6129 ms on gmo-cl-data-02 (7/1095)
14/07/30 19:40:48 INFO TaskSetManager: Starting task 24.0 in stage 1.0 (TID 23, gmo-cl-data-02, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:48 INFO TaskSetManager: Finished task 16.0 in stage 1.0 (TID 14) in 6329 ms on gmo-cl-data-02 (8/1095)
14/07/30 19:40:48 INFO TaskSetManager: Starting task 21.0 in stage 1.0 (TID 24, gmo-cl-data-04, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:48 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 4) in 6372 ms on gmo-cl-data-04 (9/1095)
14/07/30 19:40:48 INFO TaskSetManager: Starting task 25.0 in stage 1.0 (TID 25, gmo-cl-data-03, NODE_LOCAL, 18007 bytes)
14/07/30 19:40:48 INFO TaskSetManager: Finished task 10.0 in stage 1.0 (TID 9) i... [capture truncated through 19:43:54]
14/07/30 19:43:54 INFO TaskSetManager: Finished task 1041.0 in stage 1.0 (TID 10..) in 44 ms on gmo-cl-data-04 (44/200)
[... TaskSetManager Starting/Finished lines for tasks 45/200 through 191/200 truncated in the capture; all PROCESS_LOCAL at 8734 bytes each, finishing in roughly 16-940 ms on gmo-cl-data-01 through gmo-cl-data-04 between 19:43:57 and 19:43:58 ...]
14/07/30 19:43:58 INFO TaskSetManager: Finished task 199.0 in stage 0.0 (TID 1294) in 43 ms on gmo-cl-data-02 (192/200)
14/07/30 19:43:58 INFO TaskSetManager: Finished task 198.0 in stage 0.0 (TID 1293) in 49 ms on gmo-cl-data-04 (193/200)
14/07/30 19:43:58 INFO TaskSetManager: Finished task 192.0 in stage 0.0 (TID 1287) in 74 ms on gmo-cl-data-01 (194/200)
14/07/30 19:43:58 INFO TaskSetManager: Finished task 195.0 in stage 0.0 (TID 1290) in 73 ms on gmo-cl-data-03 (195/200)
14/07/30 19:43:58 INFO TaskSetManager: Finished task 191.0 in stage 0.0 (TID 1286) in 372 ms on gmo-cl-data-01 (196/200)
14/07/30 19:43:58 INFO TaskSetManager: Finished task 138.0 in stage 0.0 (TID 1233) in 641 ms on gmo-cl-data-01 (197/200)
14/07/30 19:43:58 INFO TaskSetManager: Finished task 104.0 in stage 0.0 (TID 1199) in 939 ms on gmo-cl-data-02 (198/200)
14/07/30 19:43:58 INFO TaskSetManager: Finished task 137.0 in stage 0.0 (TID 1232) in 793 ms on gmo-cl-data-04 (199/200)
14/07/30 19:43:59 INFO TaskSetManager: Finished task 165.0 in stage 0.0 (TID 1260) in 1605 ms on gmo-cl-data-03 (200/200)
14/07/30 19:43:59 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/07/30 19:43:59 INFO DAGScheduler: Stage 0 (collect at SparkPlan.scala:52) finished in 2.771 s
14/07/30 19:43:59 INFO SparkContext: Job finished: collect at SparkPlan.scala:52, took 200.469593935 s
[16,476377]
[3,2220985]
[10,11041237]
[14,1852499]
[4,4947681]
[13,1353556]
[12,1686848]
[17,15441]
[15,1771336]
[18,3537]
[6,16810306]
[2,764589]
[7,49529326]
[19,5143]
[11,4848319]
[5,7586226]
[8,29843447]
[20,1743]
[9,123893626]
[1,249340]
scala>
SparkSQL can't do Hive functions like concat yet... have to program it in Spark :)
14/08/01 03:49:46 INFO ParseDriver: Parsing command: SELECT
ASSEMBLY_NAME
,ASSEMBLY_DESC
,ASSEMBLY_PF
,ASSEMBLY_BU
,ASSEMBLY_TG
,ASSEMBLY_ITEM_ID
,BILL_SEQUENCE_ID
,COMPONENT_NAME
,COMPONENT_DESC
,COMPONENT_PF
,COMPONENT_BU
,COMPONENT_TG
,COMPONENT_ITEM_ID
,COMPONENT_QUANTITY
,EFFECTIVITY_DATE
,DISABLE_DATE
,ALTERNATE_BOM_DESIGNATOR
,ORGANIZATION_ID
,ORGANIZATION_CODE
,TEST_COMMENT
,concat('<i>',table_top.COMPONENT_NAME,'</i>') AS PATH
,1 AS LEVEL
,table_top.COMPONENT_NAME AS LEVEL_1
,LEVEL_2
,LEVEL_3
,LEVEL_4
,LEVEL_5
,LEVEL_6
,LEVEL_7
,LEVEL_8
,LEVEL_9
,LEVEL_10
,LEVEL_11
,LEVEL_12
,LEVEL_13
,LEVEL_14
,LEVEL_15
,LEVEL_16
,LEVEL_17
,LEVEL_18
,LEVEL_19
,LEVEL_20
,LEVEL_21
,LEVEL_22
,LEVEL_23
,LEVEL_24
,LEVEL_25
,LEVEL_26
,LEVEL_27
,LEVEL_28
,LEVEL_29
,LEVEL_30
,1 as level_partition
FROM bom_table_top
14/08/01 03:49:46 INFO ParseDriver: Parse Completed
14/08/01 03:49:47 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@gmo-cl-data-03:41181/user/Executor#1938503466] with ID 0
14/08/01 03:49:47 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@gmo-cl-data-04:37881/user/Executor#2064189298] with ID 3
14/08/01 03:49:47 INFO BlockManagerMasterActor: Registering block manager gmo-cl-data-03:59963 with 2.1 GB RAM
14/08/01 03:49:47 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@gmo-cl-data-02:49467/user/Executor#114880777] with ID 2
14/08/01 03:49:47 INFO BlockManagerMasterActor: Registering block manager gmo-cl-data-04:57370 with 2.1 GB RAM
14/08/01 03:49:47 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/08/01 03:49:47 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/08/01 03:49:47 INFO BlockManagerMasterActor: Registering block manager gmo-cl-data-02:51039 with 2.1 GB RAM
14/08/01 03:49:47 INFO metastore: Trying to connect to metastore with URI thrift://gmo-cl-edge-02:9083
14/08/01 03:49:47 INFO metastore: Waiting 1 seconds before next connection attempt.
14/08/01 03:49:48 INFO metastore: Connected to metastore.
Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'concat(<i>,'table_top.component_name,</i>) AS path#4,'table_top.component_name AS level_1#6, tree:
Project [assembly_name#8,assembly_desc#9,assembly_pf#10,assembly_bu#11,assembly_tg#12,assembly_item_id#13,bill_sequence_id#14,component_name#15,component_desc#16,component_pf#17,component_bu#18,component_tg#19,component_item_id#20,component_quantity#21,effectivity_date#22,disable_date#23,alternate_bom_designator#24,organization_id#25,organization_code#26,test_comment#27,'concat(<i>,'table_top.component_name,</i>) AS path#4,1 AS level#5,'table_top.component_name AS level_1#6,level_2#31,level_3#32,level_4#33,level_5#34,level_6#35,level_7#36,level_8#37,level_9#38,level_10#39,level_11#40,level_12#41,level_13#42,level_14#43,level_15#44,level_16#45,level_17#46,level_18#47,level_19#48,level_20#49,level_21#50,level_22#51,level_23#52,level_24#53,level_25#54,level_26#55,level_27#56,level_28#57,level_29#58,level_30#59,1 AS level_partition#3]
 LowerCaseSchema
  MetastoreRelation default, bom_table_top, None
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:71)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:69)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:69)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:67)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:62)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:60)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
        at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:60)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:52)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:52)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:317)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:317)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.optimizedPlan$lzycompute(HiveContext.scala:250)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.optimizedPlan(HiveContext.scala:249)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:320)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:320)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:323)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:323)
        at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:428)
        at sparksql_shell$.main(sparksql_shell.scala:49)
        at sparksql_shell.main(sparksql_shell.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:313)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[16,476377]
[17,15441]
[18,3537]
[19,5143]
[20,1743]
sparksql-shell: finished in 2.13 min
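Since the analyzer can't resolve concat here, the PATH column can be produced in Spark itself by mapping over the query results. A hedged sketch for the spark-shell session above; the column index is an assumption (counting the SELECT list puts COMPONENT_NAME 8th, i.e. index 7), so verify it against the actual schema:

```scala
// Sketch: compute what concat('<i>', COMPONENT_NAME, '</i>') would have
// returned, in Spark rather than in HiveQL. Assumes COMPONENT_NAME sits at
// 0-based position 7 of the selected columns - verify before relying on it.
val rows = hiveContext.hql("SELECT * FROM bom_table_top")
val paths = rows.map { row =>
  val componentName = row.getString(7)   // assumed column position
  "<i>" + componentName + "</i>"         // the PATH value
}
paths.take(5).foreach(println)
```

The same map can emit the full tuple of columns plus PATH and LEVEL before writing the result back out.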
INSERT INTO dchtchou_bom_pairs_parquet
SELECT * FROM bom_pairs_tsv
See if we can catch bom_pairs_tsv
It's doing this weird thing:
14/08/07 03:58:57 INFO NetworkTopology: Adding a new node: /default/192.168.1.14:50010
14/08/07 03:58:58 INFO DAGScheduler: Registering RDD 5 (mapPartitions at Exchange.scala:69)
14/08/07 03:58:58 INFO DAGScheduler: Got job 0 (collect at SparkPlan.scala:52) with 1 output partitions (allowLocal=false)
14/08/07 03:58:58 INFO DAGScheduler: Final stage: Stage 0(collect at SparkPlan.scala:52)
14/08/07 03:58:58 INFO DAGScheduler: Parents of final stage: List(Stage 1)
14/08/07 03:58:58 INFO DAGScheduler: Missing parents: List(Stage 1)
14/08/07 03:58:58 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[5] at mapPartitions at Exchange.scala:69), which has no missing parents
14/08/07 03:58:59 INFO DAGScheduler: Submitting 4394 missing tasks from Stage 1 (MapPartitionsRDD[5] at mapPartitions at Exchange.scala:69)
14/08/07 03:58:59 INFO TaskSchedulerImpl: Adding task set 1.0 with 4394 tasks
14/08/07 03:59:14 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
[dchtchou@gmo-cl-edge-02 sparksql-shell]$ /home/dchtchou/src/ion-pcm/src/spark/sparksql-shell/sparksql-shell.sh -q "SELECT COUNT(*) FROM dchtchou_bom_table_sparksql" 2>log.txt
sparksql-shell: using context:org.apache.spark.SparkContext@1629aeb2
sparksql-shell: running query:
SELECT COUNT(*) FROM dchtchou_bom_table_sparksql
C-c C-c
[dchtchou@gmo-cl-edge-02 sparksql-shell]$ /home/dchtchou/src/ion-pcm/src/spark/sparksql-shell/sparksql-shell.sh -q "SELECT COUNT(*) FROM dchtchou_bom_table_sparksql" 2>log.txt
sparksql-shell: using context:org.apache.spark.SparkContext@7a86b09d
sparksql-shell: running query:
SELECT COUNT(*) FROM dchtchou_bom_table_sparksql
C-c C-c
[dchtchou@gmo-cl-edge-02 sparksql-shell]$