Spark Dev (1) - debugging a Spark program on an M1/M2 chip

When I first switched to an M2-chip MacBook Air, I ran into trouble downloading JDK 8 from oracle.com. As a result, my plan to set up Spark locally was put on hold for a while. However, needing to re-compile Spark (3.2.1) for my research, I recently picked up where I left off and finally completed the setup.

Setup

  1. JDK 8 (Zulu Community 8)
    $ java -version
    openjdk version "1.8.0_362"
    OpenJDK Runtime Environment (Zulu 8.68.0.21-CA-macos-aarch64) (build 1.8.0_362-b09)
    OpenJDK 64-Bit Server VM (Zulu 8.68.0.21-CA-macos-aarch64) (build 25.362-b09, mixed mode)
    
  2. Scala 2.12
    $ scala -version
    Scala code runner version 2.12.17 -- Copyright 2002-2022, LAMP/EPFL and Lightbend, Inc. 
    
  3. SBT version: 1.5.7
# make sure the sbt version is set in `./project/build.properties`
    sbt.version=1.5.7
    

Get hands dirty

I have put an example at https://github.com/Angryrou/spark-starter. Here are the key steps:

  1. create the build file build.sbt to specify the dependencies (you can add more entries to libraryDependencies; see the sketch after these steps)

    name := "Spark Starter"
    version := "1.0"
    scalaVersion := "2.12.17"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1"
    
  2. create the package directory src/main/scala/debug

    mkdir -p src/main/scala/debug
    
  3. add an example Scala object at src/main/scala/debug/SimpleApp.scala

    import org.apache.spark.sql.SparkSession

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        val logFile = "README.md" // Should be some file on your system
        val spark = try {
          // works when the master is provided externally, e.g., via spark-submit
          SparkSession
            .builder()
            .appName("Simple Application")
            .getOrCreate()
        } catch {
          case _: Exception =>
            // fallback for running directly in the IDE: hardcode a local master
            SparkSession
              .builder()
              .appName("Simple Application")
              .config("spark.master", "local[2]")
              .getOrCreate()
        }
        val logData = spark.read.textFile(logFile).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println(s"Lines with a: $numAs, Lines with b: $numBs")
        spark.stop()
      }
    }
    
  4. build the package from the command line

    sbt package
    
  5. IntelliJ (IJ): run SimpleApp directly from its main entry point.
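
As promised in step 1, here is a sketch of how one might extend libraryDependencies. The extra spark-mllib artifact and the "provided" scope are assumptions for illustration (they are not part of the spark-starter repo); "provided" only makes sense when the jar will be submitted to a cluster that already ships Spark:

    // build.sbt -- a sketch, assuming the same Spark 3.2.1 / Scala 2.12 versions as above
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql"   % "3.2.1" % "provided",  // the cluster supplies Spark at runtime
      "org.apache.spark" %% "spark-mllib" % "3.2.1" % "provided"   // further Spark modules can be added the same way
    )

Note that "provided" dependencies are not on the runtime classpath when you run the main class directly, so for pure local debugging the plain compile scope from step 1 is simpler.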

Takeaways

  1. It is not necessary to deploy Spark locally to debug your Spark program. You can run it in IJ by hardcoding spark.master (e.g., to local[4]) in the program.

  2. To run the program on a server, you further need to package your project (e.g., sbt package) after removing the hardcoded master from the SparkSession creation (or use the try-catch above to automate the switch); a minimal sketch of the un-hardcoded version is shown below.
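
Here is a minimal Scala sketch of the un-hardcoded session creation (it is simply the SimpleApp above with the local fallback dropped); on a server, the master is supplied by spark-submit --master ... rather than by the program:

    import org.apache.spark.sql.SparkSession

    // sketch: no master is hardcoded; spark-submit / the cluster manager provides it
    val spark = SparkSession
      .builder()
      .appName("Simple Application")
      .getOrCreate()

    // for local debugging in IJ, the only change is to set a master explicitly, e.g.:
    //   .master("local[2]")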
