Cincinnati Spark Meetup Wrap-up

Cincinnati Spark Meetup, April 15, 2015.

by Curt Kohler & Darin McBeath

The Cincinnati Spark Meetup continues to attract new members at a brisk pace. After only three months, we have grown to more than 70 members interested in learning about this compelling new technology platform. On April 15, 2015, about 30 members gathered at a local business for our latest Meetup. During the first half of the evening, we watched a video of Matei Zaharia’s Spark Summit East keynote address and were joined by phone by Michael Armbrust (lead developer for Spark SQL at Databricks) for a Q&A session on Spark 1.3.0 and the new DataFrame API. The second half of the evening offered a beginner’s “hands-on” programming session with Spark, with experienced members available to help. The session provided sample data, a set of questions about the data to answer using typical Spark processing patterns, and solutions to the exercises in Scala, Python, and Java.

As we reflected on the meeting, a few items really stood out:

·      Spark is definitely a trending, hot tech topic. One group at the Meetup made a two-hour drive from Lexington, Kentucky, to learn more about the platform. Even in the Midwest, companies are releasing production processes that leverage Spark.

·      DataFrames will be the API of the future for Spark. The named-column programming paradigm makes code much less cryptic than the positional offsets you typically encounter with the lower-level RDD APIs. The DataFrame approach also reduces the amount of code you need to write, because the optimizer handles much of the data remapping needed for multiple joins and similar operations.
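
To make the readability point concrete, here is a minimal sketch in plain Python. It deliberately does not use Spark: tuples stand in for positional RDD records and a namedtuple stands in for named DataFrame columns. The sample rows and the field names (name, city, age) are hypothetical and are not taken from the Meetup exercises.

```python
from collections import namedtuple

# Hypothetical sample records: (name, city, age).
rows = [
    ("Ann", "Cincinnati", 34),
    ("Bob", "Lexington", 45),
    ("Cid", "Cincinnati", 29),
]

# Positional style, similar in spirit to tuples in an RDD: the reader
# must remember that index 1 is the city and index 2 is the age.
positional = [r[0] for r in rows if r[1] == "Cincinnati" and r[2] > 30]

# Named style, similar in spirit to DataFrame columns: the same filter
# reads almost like the question being asked of the data.
Person = namedtuple("Person", ["name", "city", "age"])
people = [Person(*r) for r in rows]
named = [p.name for p in people if p.city == "Cincinnati" and p.age > 30]

print(positional)
print(named)
```

Both lists contain the same names; only the second version shows which fields are being filtered without the reader having to look up the record layout.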

Finally, for those who are interested, the material from the programming session is available in an S3 bucket: http://cincy-spark.s3.amazonaws.com/