Spark

Getting spark-sql shell to work

There is a lesser-known CLI for Spark SQL, spark-sql.

Environment: CDH 5.3.0, Spark 1.2.0

spark-sql is not in /usr/bin, so it has to be run by its full path, SPARK_HOME/bin/spark-sql. On CDH it currently needs a workaround to give it access to the Hive jars. In addition, hive-site.xml should be in SPARK_HOME/conf, as described here.
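The hive-site.xml step can be done with a symlink rather than a copy. This is a minimal sketch, assuming the CDH Hive client configuration lives in /etc/hive/conf (the usual CDH location, but check your cluster). The function name link_hive_site is hypothetical.

```shell
# Symlink the Hive client config into Spark's conf directory so that
# spark-sql can find the Hive metastore settings.
# Assumption: CDH deploys the Hive client config to /etc/hive/conf.
link_hive_site() {
  local spark_home="$1"                        # path to SPARK_HOME
  local hive_conf_dir="${2:-/etc/hive/conf}"   # assumed CDH default
  local src="$hive_conf_dir/hive-site.xml"
  if [ ! -f "$src" ]; then
    echo "hive-site.xml not found in $hive_conf_dir" >&2
    return 1
  fi
  # -s: create a symbolic link; -f: replace an existing file or link
  ln -sf "$src" "$spark_home/conf/hive-site.xml"
}
```

Usage: `link_hive_site /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark`, run as a user that can write to the parcel directory. A symlink should follow later client-config redeployments, whereas a copy would go stale. A parcel upgrade likely replaces the directory, so the link may need re-creating afterwards.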
spark-sql, like the other Spark shell scripts, accepts the same parameters as spark-submit.

Unlike spark-shell or pyspark, the spark-sql shell requires Hive: it calls HiveShim, which HiveContext also uses.
On CDH there is no convenient /usr/bin script for it, and JAVA_HOME must also be set.

$ JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera \
    /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/spark-sql \
    --master yarn \
    --driver-class-path '/opt/cloudera/parcels/CDH/lib/hive/lib/*' \
    --driver-java-options '-Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*'

Note: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/ is SPARK_HOME.
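Since CDH provides no /usr/bin wrapper, the long command can be wrapped in a shell function. This is a sketch, not a Cloudera-supplied script: run_spark_sql and the HIVE_LIB override variable are hypothetical names, and the defaults are simply the paths used in the command shown earlier. Extra arguments are passed straight through, so any spark-submit option (or a spark-sql CLI option such as -e) can be appended.

```shell
# Hypothetical wrapper around CDH 5.3's spark-sql with the Hive jars on the
# driver and executor classpaths. Defaults match the command shown earlier;
# SPARK_HOME, HIVE_LIB and JAVA_HOME may be overridden from the environment.
run_spark_sql() {
  local spark_home="${SPARK_HOME:-/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark}"
  local hive_lib="${HIVE_LIB:-/opt/cloudera/parcels/CDH/lib/hive/lib}"
  # The classpath wildcards are quoted so the shell does not expand them;
  # Java expands "dir/*" to all jars in that directory itself.
  JAVA_HOME="${JAVA_HOME:-/usr/java/jdk1.7.0_67-cloudera}" \
    "$spark_home/bin/spark-sql" \
    --master yarn \
    --driver-class-path "$hive_lib/*" \
    --driver-java-options "-Dspark.executor.extraClassPath=$hive_lib/*" \
    "$@"
}
```

Usage: `run_spark_sql` opens the interactive prompt; `run_spark_sql --executor-memory 2g -e 'show tables;'` should run a single statement and exit, assuming spark-sql separates spark-submit options from CLI options as its launcher script does.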

With spark-sql running, queries can be entered at the prompt:

spark-sql> show tables;
table1
Table2
t1

spark-sql> describe t1;
c1  int NULL
c2  string  NULL
tc  timestamp   NULL

Without the Hive jars on the classpath, it fails with:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more
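When this error appears, it can help to confirm which jar in the Hive lib directory actually provides CliDriver, presumably the hive-cli jar. The helper below, find_class_jar, is hypothetical and only a heuristic: zip archives (and therefore jars) store entry names uncompressed, so a fixed-string grep for the .class path usually finds the right jar. `unzip -l` or `jar tf` give an exact listing where available.

```shell
# Heuristic: print every jar in a directory whose bytes contain the entry
# name of the given class, e.g. org/apache/hadoop/hive/cli/CliDriver.class.
# Returns non-zero if no jar matches.
find_class_jar() {
  local lib_dir="$1"
  local class_name="$2"   # dotted name, e.g. org.apache.hadoop.hive.cli.CliDriver
  local entry
  entry="$(printf '%s' "$class_name" | tr . /).class"
  local found=1
  local jar
  for jar in "$lib_dir"/*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    if grep -qF "$entry" "$jar"; then
      echo "$jar"
      found=0
    fi
  done
  return "$found"
}
```

Usage: `find_class_jar /opt/cloudera/parcels/CDH/lib/hive/lib org.apache.hadoop.hive.cli.CliDriver`. If it prints nothing, the directory given to --driver-class-path is probably not the one holding the Hive jars.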
