

Creating Scala Fat Jars for Spark on SBT with sbt-assembly Plugin

Source: http://queirozf.com/entries/creating-scala-fat-jars-for-spark-on-sbt-with-sbt-assembly-plugin


One way to build a fat jar (for Scala-based projects) is to use the sbt-assembly plugin.

Add sbt-assembly plugin to sbt

Create a file called assembly.sbt under the project/ directory, like this:

├── src/
└── project/
    └── assembly.sbt

In that file, add:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.2")

After adding the plugin to SBT, you can run $ sbt assembly and a jar file containing all dependencies (a so-called fat jar) will be created for you.
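As a rough sketch of what to expect: by default, sbt-assembly names the jar <name>-assembly-<version>.jar and writes it under target/scala-<binary version>/. The settings below, placed in build.sbt, show how the jar name and entry point could be set explicitly. The jar name and the main class com.example.StreamingJob are hypothetical placeholders, not taken from the original project.

```scala
// build.sbt (sbt 0.13 syntax, matching sbt-assembly 0.14.x)

// Hypothetical file name for the fat jar; without this setting the
// default is <name>-assembly-<version>.jar
assemblyJarName in assembly := "my-project-fat.jar"

// Hypothetical entry point, written into the jar manifest
mainClass in assembly := Some("com.example.StreamingJob")

// Do not run the test suite as part of `sbt assembly`
test in assembly := {}
```

None of these settings is required; the plugin works with its defaults, and they are shown only to make the output of $ sbt assembly easier to locate.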

Troubleshooting: "deduplicate: different file contents found in the following:"

This is a very common error. It arises when several of the dependencies being packaged into the fat jar contain a file at the same path but with different contents.

You need to tell sbt-assembly how to fix those in order to have a clean packaged jar.
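To make the mechanism concrete, here is a minimal sketch of a merge strategy setting in sbt 0.13 syntax. The setting is a function from a file path inside the jar to a strategy that decides what happens when more than one dependency provides that path. The paths matched here are illustrative assumptions; the ones your build needs are those listed in the error message.

```scala
assemblyMergeStrategy in assembly := {
  // Keep the copy that comes last in classpath order
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  // Keep the copy that comes first in classpath order
  case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.first
  // Concatenate all copies into a single file
  case "application.conf" => MergeStrategy.concat
  // Anything else: fall back to the plugin's default strategy
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

The final case matters: without the fallback, every path not matched above would fail with a MatchError instead of being handled by the defaults.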

The following build.sbt file, which I've used in a Spark Streaming project, can serve as an example. Paste the assemblyMergeStrategy block into your build file and the deduplicate errors should go away.

Note: The following works for Spark 1.x!

If you are using Spark 2, skip to the Spark 2 section further down.

// this file was written for spark 1.6.0 and scala 2.10.4
// it will not work on spark 2!

version := "1.0"

name := "my-sample-spark-streaming-project"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.6.0"
libraryDependencies += "com.amazonaws" % "amazon-kinesis-client" % "1.6.1"
libraryDependencies += "com.amazonaws" % "amazon-kinesis-producer" % "0.10.2"

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
  case "META-INF/mailcap" => MergeStrategy.last
  case "META-INF/mimetypes.default" => MergeStrategy.last
  case "plugin.properties" => MergeStrategy.last
  case "log4j.properties" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

Troubleshooting Spark 2: "deduplicate: different file contents found in the following:"

For Spark 2 you need to add a couple of lines to the above solution.
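Comparing the merge strategy blocks of the two build files in this post, the Spark 2 version adds two cases, one for org/aopalliance and one for javax/inject. As a sketch, they are shown here on their own, wrapped in the same fallback so that the fragment is a valid setting by itself; in practice they go at the top of the existing block.

```scala
assemblyMergeStrategy in assembly := {
  // The two cases that the Spark 2 build file adds
  case PathList("org", "aopalliance", xs @ _*) => MergeStrategy.last
  case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
  // ... the Spark 1.x cases follow here unchanged ...
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```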

Here's a working build.sbt for Spark 2:

// this file was written for spark 2.0.0 and scala 2.11.8

version := "1.0"

name := "my-sample-spark2-streaming-project"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"

assemblyMergeStrategy in assembly := {
  case PathList("org", "aopalliance", xs @ _*) => MergeStrategy.last
  case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
  case "META-INF/mailcap" => MergeStrategy.last
  case "META-INF/mimetypes.default" => MergeStrategy.last
  case "plugin.properties" => MergeStrategy.last
  case "log4j.properties" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
