This is a generalization of scala/scala-xml#254, reported by @ashawley and analyzed by @xuwei-k.
steps
To minimize the scala.xml.XMLTestJVM.serializeAttribute failure, Yoshida-san created a minimal reproduction using a custom collection that looks like this:
class MyCollection[B](val list: List[B]) extends scala.collection.Iterable[B] {
  override def iterator = list.iterator
  // protected[this] override def writeReplace(): AnyRef = this
}
I'm breaking up the test into the following steps:
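import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import org.junit.Test

@Test
def testMyCollection: Unit = {
  val list = List(1, 2, 3)
  val arr = serialize(new MyCollection(list))
  val obj2 = deserialize[MyCollection[Int]](arr)
  assert(obj2.list == list)
}

// Writes obj with Java serialization and returns the raw bytes.
def serialize[A <: Serializable](obj: A): Array[Byte] = {
  val o = new ByteArrayOutputStream()
  val os = new ObjectOutputStream(o)
  os.writeObject(obj)
  os.close() // flushes any buffered data before the bytes are read
  o.toByteArray()
}

// Reads an object back and casts it to the expected type A.
def deserialize[A <: Serializable](bytes: Array[Byte]): A = {
  val s = new ByteArrayInputStream(bytes)
  val is = new ObjectInputStream(s)
  is.readObject().asInstanceOf[A]
}
problem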
When I run this, I get the following error:
[error] Test issue.IssueTest.testMyCollection failed: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to issue.MyCollection, took 0.009 sec
[error] at issue.IssueTest.testMyCollection(IssueTest.scala:20)
[error] ...
In other words, the test never reaches the assertion. Instead, it fails at the cast after deserialization, because what was deserialized is a $colon$colon (that is, a non-empty List) rather than a MyCollection. This looks like a different problem from #9237.
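The following is a minimal, library-independent sketch of the mechanism that appears to be at work, assuming plain Java serialization: `writeReplace` substitutes a proxy for the object being written, and the proxy's `readResolve` returns a `List`, so the reader gets back a different class from the one that was written. `ProxyMismatchDemo`, `Bag`, `ListProxy` and `roundTrip` are hypothetical names, not part of the collections library.

```scala
import java.io._

object ProxyMismatchDemo {

  // Hypothetical proxy: it remembers only the elements and, when read back,
  // resolves to a plain List rather than to the original class.
  final class ListProxy(elems: List[Int]) extends Serializable {
    private def readResolve(): AnyRef = elems
  }

  // Hypothetical stand-in for MyCollection: Java serialization calls
  // writeReplace and writes the proxy in place of this object.
  final class Bag(val list: List[Int]) extends Serializable {
    protected def writeReplace(): AnyRef = new ListProxy(list)
  }

  // Serializes obj and immediately deserializes it.
  def roundTrip(obj: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject() finally in.close()
  }
}
```

Here `roundTrip(new Bag(List(1, 2, 3)))` yields a `List`, and casting that result to `Bag` would throw a ClassCastException analogous to the one above.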
expectation
Either the serialization works out of the box, or MyCollection does not compile without providing some serialization mechanism.
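One way to get the second outcome is to declare `writeReplace` abstract in the base type, so that a concrete subclass without a serialization strategy is rejected at compile time (the compiler reports that the class needs to be abstract). The sketch below models this with a hypothetical abstract class `StrictIterable`, a stand-in rather than the real `Iterable`; `MyBag` and `roundTrip` are likewise illustrative.

```scala
import java.io._

object ForcedSerializationDemo {

  // Hypothetical base class: writeReplace is abstract, so every concrete
  // subclass must choose a serialization strategy or it will not compile.
  abstract class StrictIterable extends Serializable {
    def toList: List[Int]
    protected def writeReplace(): AnyRef
  }

  // A user collection that opts into plain Java serialization of its fields.
  final class MyBag(val list: List[Int]) extends StrictIterable {
    def toList: List[Int] = list
    protected def writeReplace(): AnyRef = this
  }

  // Serializes obj and immediately deserializes it.
  def roundTrip(obj: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject() finally in.close()
  }
}
```

Removing the `writeReplace` definition from `MyBag` turns the problem into a compile error instead of a runtime surprise.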
workaround
A workaround identified by Yoshida-san is to uncomment the following line in MyCollection:
// protected[this] override def writeReplace(): AnyRef = this
note
scala/scala#6676 makes Iterable Serializable by default.
trait Iterable[+A] extends IterableOnce[A] with IterableOps[A, Iterable, Iterable[A]] with Serializable {
with writeReplace implemented as follows:
protected[this] def writeReplace(): AnyRef = new DefaultSerializationProxy(iterableFactory.iterableFactory, this)
In other words, serialization of every Iterable, including all the subtypes that exist in the wild, is delegated to DefaultSerializationProxy. Perhaps it should instead fail at the point of serialization when it detects a type that it cannot handle.
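To illustrate the default and the workaround together, here is a simplified model under stated assumptions: a hypothetical base class `ModelIterable` whose default `writeReplace` returns a proxy (standing in for `DefaultSerializationProxy`) that can only rebuild a `List`, plus a subclass that overrides `writeReplace` to return `this`, as in Yoshida-san's workaround. All names are hypothetical. An abstract class is used rather than a trait because Java serialization looks up `writeReplace` along the superclass chain.

```scala
import java.io._

object DefaultProxyDemo {

  // Hypothetical stand-in for a default serialization proxy: it keeps only
  // the elements and rebuilds a plain List when deserialized.
  final class ElemsProxy(elems: List[Int]) extends Serializable {
    private def readResolve(): AnyRef = elems
  }

  // Hypothetical base class modelling the default: unless a subclass
  // overrides writeReplace, it is serialized through ElemsProxy.
  abstract class ModelIterable extends Serializable {
    def toList: List[Int]
    protected def writeReplace(): AnyRef = new ElemsProxy(toList)
  }

  // Relies on the default, so it comes back as a List.
  final class Proxied(val list: List[Int]) extends ModelIterable {
    def toList: List[Int] = list
  }

  // Mirrors the workaround: returning `this` makes Java serialization write
  // the object's own fields, so the original class is restored.
  final class SelfSerialized(val list: List[Int]) extends ModelIterable {
    def toList: List[Int] = list
    override protected def writeReplace(): AnyRef = this
  }

  // Serializes obj and immediately deserializes it.
  def roundTrip(obj: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject() finally in.close()
  }
}
```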
Letting it serialize the data but not deserialize it sounds like potentially data-losing behavior. Another option to consider is forcing subclasses of Iterable to implement a serialization method. Making it easy to roll your own collection, only for that collection to blow up on Spark by default, is not a happy experience.
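A sketch of the fail-fast alternative, assuming the default proxy can only faithfully rebuild classes it knows about: the hypothetical base class checks the runtime class in `writeReplace` and throws `NotSerializableException` for anything else, so the error surfaces when writing rather than as a ClassCastException when reading. `CheckedIterable`, `LibraryBag`, `UserBag` and `LibraryBagProxy` are illustrative names, not library API.

```scala
import java.io._

object FailFastDemo {

  // Hypothetical proxy that can only rebuild a LibraryBag.
  final class LibraryBagProxy(elems: List[Int]) extends Serializable {
    private def readResolve(): AnyRef = new LibraryBag(elems)
  }

  // Hypothetical base class: the default writeReplace refuses any runtime
  // class that its proxy cannot rebuild as the same class.
  abstract class CheckedIterable extends Serializable {
    def toList: List[Int]
    protected def writeReplace(): AnyRef =
      if (getClass == classOf[LibraryBag]) new LibraryBagProxy(toList)
      else throw new NotSerializableException(
        s"${getClass.getName} must override writeReplace to be serialized")
  }

  // A class the library knows how to rebuild.
  final class LibraryBag(val list: List[Int]) extends CheckedIterable {
    def toList: List[Int] = list
  }

  // A user-defined subclass with no serialization strategy of its own.
  final class UserBag(val list: List[Int]) extends CheckedIterable {
    def toList: List[Int] = list
  }

  // Serializes obj and immediately deserializes it.
  def roundTrip(obj: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject() finally in.close()
  }
}
```

`ObjectOutputStream` rethrows an `ObjectStreamException` raised inside `writeReplace`, so `roundTrip(new UserBag(...))` fails inside `writeObject`, before any attempt to read the data back.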