Spark DataFrames often contain null values, and Spark's null conventions differ from those of standard programming languages. This post explains how to deal with null in Spark and avoid the dreaded NullPointerException.
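A taste of the approach the post covers: filter nulls out explicitly with `isNotNull` and let Spark's built-in functions propagate nulls, rather than risking a NullPointerException in custom code. A minimal sketch, assuming a local SparkSession; the column name and data are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

// Seq of Options so the DataFrame actually contains a null value
val df = Seq(Some("hello"), None).toDF("word")

// Drop null rows explicitly, then transform the remaining values.
// Built-in functions like upper() return null for null input anyway,
// which is the null convention Spark's own functions follow.
val result = df
  .where($"word".isNotNull)
  .withColumn("upper_word", upper($"word"))
```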
The spread between emerging market value and emerging market growth is the widest it's ever been and Rob Arnott says emerging market value stocks are a buy. This post looks at options for retail investors.
Spark provides a rich set of SQL functions that make it easy to transform DataFrame columns. This blog post explains how to use common SQL functions and how to write your own SQL functions.
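A custom SQL function in this style is just a Scala function from `Column` to `Column`, built by composing Spark's own functions. A minimal sketch; the function name and regex here are illustrative, not taken from the post.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

// Composes the built-in regexp_replace into a reusable "SQL function"
// that strips all whitespace from a column (hypothetical example).
def removeAllWhitespace(col: Column): Column = {
  regexp_replace(col, "\\s+", "")
}
```

Usage follows the same pattern as the built-ins: `df.withColumn("clean_text", removeAllWhitespace(col("text")))`.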
Spark transformations often depend on columns that are added by other transformations, thus creating an order dependency. This post will discuss tactics that reduce order dependencies in transformation libraries.
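One tactic for reducing order dependencies is to pass the column a transformation needs as a parameter instead of hard-coding its name, so the dependency is explicit at the call site. A sketch under assumed names (all functions and columns here are hypothetical):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// This transformation adds the "greeting" column
def withGreeting(df: DataFrame): DataFrame =
  df.withColumn("greeting", lit("hello"))

// Order-dependent version would hard-code col("greeting") and silently
// require withGreeting to run first. Parameterizing on the column name
// surfaces the dependency to the caller instead.
def withCheerfulGreeting(greetingColName: String)(df: DataFrame): DataFrame =
  df.withColumn("cheerful_greeting", concat(col(greetingColName), lit("!!!")))

// df.transform(withGreeting).transform(withCheerfulGreeting("greeting"))
```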
Environment config files supply different values for the test, development, staging, and production environments. This episode will show you how to add environment configuration to your Spark projects.
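The idea can be sketched as a plain Scala object that selects a value map based on an environment variable. A minimal sketch, assuming the environment is chosen via a `PROJECT_ENV` variable; the key names and paths are placeholders.

```scala
// Environment-aware config: test runs read local files, production
// reads from S3. PROJECT_ENV and the paths are illustrative choices.
object Config {
  val test: Map[String, String] =
    Map("dataPath" -> "./src/test/resources/data/")

  val production: Map[String, String] =
    Map("dataPath" -> "s3a://my-bucket/data/")

  // Default to production so clusters need no extra setup;
  // the test suite sets PROJECT_ENV=test.
  def get(key: String): String =
    if (sys.env.getOrElse("PROJECT_ENV", "production") == "test") test(key)
    else production(key)
}
```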
Spark JAR files let you package the code in a GitHub repository so it can be run on a cluster. This episode will demonstrate how to build JAR files with SBT and how to customize the code that's included in JAR files.
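The key build setting for Spark JARs is marking Spark itself as `provided`, so the fat JAR stays small and doesn't clash with the Spark distribution already installed on the cluster. A `build.sbt` sketch with placeholder names and versions, assuming the sbt-assembly plugin is installed:

```scala
// build.sbt (fragment) -- project name and versions are placeholders
name := "my-spark-project"
scalaVersion := "2.12.15"

// "provided" excludes Spark from the assembled JAR, since the cluster
// supplies its own Spark runtime at spark-submit time.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"
```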
The Spark rlike method allows for powerful SQL REGEXP pattern matching. This episode shows how to make simple rlike matches and then dives into techniques for defining multiple match criteria in a CSV file.
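A simple rlike match looks like the following sketch: `rlike` takes a Java regular expression, so alternation with `|` gives multiple match criteria in one pattern. The data and column name are made up for illustration, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

val df = Seq("the cat", "a dog", "the bird").toDF("phrase")

// Keep rows whose phrase contains "cat" or "dog" (Java regex syntax)
val matches = df.where($"phrase".rlike("cat|dog"))
```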
uTest is the 'essential test framework' for the Scala programming language and provides an elegant interface for writing tests. This blog post shows how to test Spark functions with the uTest framework.
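A uTest suite registers tests inside a `Tests` block on an object extending `TestSuite`. This sketch exercises a pure function so it stays self-contained; it assumes uTest is on the test classpath, and the function under test is hypothetical.

```scala
import utest._

object StringUtilsTests extends TestSuite {
  // Hypothetical function under test; real suites would import it
  // from the project's main source tree
  def capitalizeFully(s: String): String =
    s.split(" ").map(_.capitalize).mkString(" ")

  val tests = Tests {
    test("capitalizeFully uppercases each word") {
      assert(capitalizeFully("hello spark") == "Hello Spark")
    }
  }
}
```

The same structure works for Spark functions: build a small DataFrame in the test, run the function, and assert on the collected results.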