I'm writing a solution in Scala on Spark 3.5 that has complex map/seq data structures of dates, where I need to represent missing dates with null or None. As this is Scala, my preference is None; however, I get an exception when trying to encode Option[LocalDate] inside a sequence or map. As a workaround I switched to null, but that doesn't work for maps, since null can't be used as a map key.
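For reference, the null workaround I mean is converting the Options before building the Dataset. This is just a sketch of that conversion (the `dates` value is a made-up example, not from my real code), using Scala's standard Option.orNull:

```scala
import java.time.LocalDate

object NullWorkaroundSketch extends App {
  val aDate = LocalDate.parse("2024-04-29")

  // The Option-based representation I would prefer to keep...
  val dates: Seq[Option[LocalDate]] = Seq(Some(aDate), None)

  // ...flattened to nulls so Spark's date encoder accepts the elements.
  val asNulls: Seq[LocalDate] = dates.map(_.orNull)

  println(asNulls) // List(2024-04-29, null)
}
```

This works for sequence elements, but it's exactly the approach that breaks down for map keys.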
This is code that reproduces the problem:
```scala
import java.time.LocalDate

case class SubClass[T](i: Option[T])

case class TestClass[T](
    i: Option[T] = None,
    st: Option[SubClass[T]],
    sq: Option[Seq[Option[T]]] = None
)

it should "handle sequence of date" in {
  val sparkSession = spark
  import sparkSession.implicits._
  val aDate = LocalDate.parse("2024-04-29")
  val ds = Seq(
    TestClass[LocalDate](Some(aDate), Some(SubClass[LocalDate](Some(aDate))), Some(Seq(Some(aDate))))
  ).toDS
  ds.show
}

it should "handle date structure" in {
  val sparkSession = spark
  import sparkSession.implicits._
  val aDate = LocalDate.parse("2024-04-29")
  val ds = Seq(
    TestClass[LocalDate](Some(aDate), Some(SubClass[LocalDate](Some(aDate))))
  ).toDS
  ds.show
}

it should "handle sequence of double" in {
  val sparkSession = spark
  import sparkSession.implicits._
  val aNum = 2.42
  val ds = Seq(
    TestClass[Double](Some(aNum), Some(SubClass[Double](Some(aNum))), Some(Seq(Some(aNum))))
  ).toDS
  ds.show
}

it should "handle sequence of string" in {
  val sparkSession = spark
  import sparkSession.implicits._
  val aString = "Foo"
  val ds = Seq(
    TestClass[String](Some(aString), Some(SubClass[String](Some(aString))), Some(Seq(Some(aString))))
  ).toDS
  ds.show
}
```
All of the above tests pass except "handle sequence of date", which fails with:

```
org.apache.spark.SparkRuntimeException: Error while encoding:
java.lang.RuntimeException: scala.Some is not a valid external type for schema of date
```
This only seems to be an issue with LocalDate: the Double and String tests, which use the same nested Option structure, work just fine.
Can anyone suggest a workaround or fix for this?
Thanks,
David