Friday, January 11, 2013

Serialization 201 | Cheat sheet


Serialization in Java 201

This blog / cheat sheet is meant as a ready reckoner for intermediate / advanced Java serialization users. It provides a quick look at the features of Java serialization and answers standard FAQs.

Absolute Basics

Serializable - marker interface


The default mechanism works fine, even with circular references, e.g. when the spouse of a person has this person as spouse. The critical changes that the Java Object Serialization specification can manage automatically are:
Adding new fields to a class
Changing the fields from static to nonstatic
Changing the fields from transient to nontransient
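
As a minimal sketch of the circular reference case, the class below round-trips two objects that point at each other through the default mechanism. The names CircularRefDemo, Person and roundTrip are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CircularRefDemo {
    // Hypothetical class: each person references the spouse, who references back.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Person spouse;
        Person(String name) { this.name = name; }
    }

    // Serializes an object to a byte array and reads it back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Person a = new Person("Asha");
        Person b = new Person("Ravi");
        a.spouse = b;
        b.spouse = a;
        Person copy = (Person) roundTrip(a);
        // The cycle survives: the copy's spouse points back at the copy.
        System.out.println(copy.spouse.spouse == copy); // true
    }
}
```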

Q: Why is Object not serializable?
A:
1. If you do not make the effort to design a custom serialized form, but merely accept the default, the serialized form will forever be tied to the class’s original internal representation. In other words, if you accept the default serialized form, the class’s private and package-private instance fields become part of its exported API, and the practice of minimizing access to fields loses its effectiveness as a tool for information hiding. Ref: Effective Java
2. A second cost of implementing Serializable is that it increases the likelihood of bugs and security holes. Relying on the default deserialization mechanism can easily leave objects open to invariant corruption and illegal access, e.g. a data source holding DB credentials may get serialized, or deserialization may bypass normal object initialisation and create an object in an invalid state.
3. A third cost of implementing Serializable is that it increases the testing burden associated with releasing a new version of a class.
4. A host of classes do not need serialization, e.g. Thread, OutputStream, Socket, ServletRequest.
5. Inner classes serialize poorly: they rely on compiler-generated synthetic fields that reference the enclosing instance, so their default serialized form is ill-defined.

serialVersionUID

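If not defined, it is computed at runtime by a complex algorithm over the class name, modifiers, implemented interfaces, fields, static initializer, constructors and methods. Because the computed value depends on such details, declaring serialVersionUID explicitly is the safer choice.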
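
A small sketch of the difference, using ObjectStreamClass to look up the value the runtime will use. The class names here are invented for the example; the computed value for Implicit is deliberately not shown, since it depends on the class details.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidDemo {
    // Hypothetical class with an explicitly declared version id.
    static class Explicit implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Hypothetical class without one: the runtime computes a hash-based id.
    static class Implicit implements Serializable {
        int value;
    }

    public static void main(String[] args) {
        System.out.println(ObjectStreamClass.lookup(Explicit.class).getSerialVersionUID()); // 42
        System.out.println(ObjectStreamClass.lookup(Implicit.class).getSerialVersionUID());
    }
}
```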

Behind the scenes

Q: If there are multiple instances of an object in a given object chain, will it be written multiple times?
A: No. The first time an object is written, the ObjectOutputStream assigns it a handle. On all subsequent writes to the same stream, it writes only this handle and not the entire object.
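
The effect of handles can be observed from the reading side, as in this sketch (SharedHandleDemo is a name made up for the example): the same array is written twice, and both reads return one and the same instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class SharedHandleDemo {
    // Writes the same array twice to one stream, then reports whether both
    // reads return the very same instance.
    static boolean readsSameInstance() throws IOException, ClassNotFoundException {
        int[] shared = {1, 2, 3};
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(shared); // full object data, a handle is assigned
            out.writeObject(shared); // only a back-reference to that handle
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return first == second;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readsSameInstance()); // true
    }
}
```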

Q: If Objects are written using Object streams, how are primitives written?
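A: ObjectOutputStream implements DataOutput, just like DataOutputStream, so it has methods to write primitives, e.g. writeInt, writeChar, writeUTF. ObjectInputStream has the matching read methods.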
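
A short sketch of writing and reading primitives directly on the object streams. PrimitiveWriteDemo and its method are hypothetical names; values must be read back in the order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class PrimitiveWriteDemo {
    // Writes three primitive-style values and reads them back in the same order.
    static String writeAndRead() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(42);
            out.writeChar('J');
            out.writeUTF("hello");
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readInt() + ":" + in.readChar() + ":" + in.readUTF();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAndRead()); // 42:J:hello
    }
}
```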

serialPersistentFields

Q: Is there an alternative to marking fields as transient so that only some fields are serialized?
A: Yes, define serialPersistentFields as below to list the fields that will be serialised
private static final ObjectStreamField[] serialPersistentFields = {new ObjectStreamField("next", List.class)};
If the field's value is null or is otherwise not an instance of ObjectStreamField[], or if the field does not have the required modifiers, then the behavior is as if the field were not declared at all.
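
A fuller sketch of the same idea, with an invented Account class: only owner is listed in serialPersistentFields, so pin is left out of the stream even though it is not transient.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

class PersistentFieldsDemo {
    // Hypothetical class for illustration.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        // Only "owner" belongs to the serialized form; "pin" is not listed.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("owner", String.class)
        };
        String owner;
        int pin;
        Account(String owner, int pin) { this.owner = owner; this.pin = pin; }
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = (Account) roundTrip(new Account("alice", 1234));
        System.out.println(copy.owner); // alice
        System.out.println(copy.pin);   // 0, the field was never written
    }
}
```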

Q: How can a subclass of a Serializable class be prevented from being serializable?
A: Implement writeObject (and readObject) in the subclass and throw NotSerializableException from them.
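
A minimal sketch of this technique, with made-up classes Base and Locked: the parent serializes normally, while the subclass refuses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class BlockSubclassDemo {
    static class Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int id = 1;
    }

    // Inherits Serializable from Base but opts out again.
    static class Locked extends Base {
        private static final long serialVersionUID = 1L;
        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException(Locked.class.getName());
        }
        private void readObject(ObjectInputStream in) throws IOException {
            throw new NotSerializableException(Locked.class.getName());
        }
    }

    // Returns true if the object can be written, false if it refuses.
    static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(canSerialize(new Base()));   // true
        System.out.println(canSerialize(new Locked())); // false
    }
}
```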

Q: Is constructor called in deserialization? What about constructor of first non-serializable parent?
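A: No, the constructors of the serialized class and of its serializable parent classes are not invoked. The no-arg constructor of the first non-serializable parent is invoked, which in turn runs the constructors of all non-serializable parents above it.

Q: If I have ComputerTable extends Table implements Serializable, where Table has a single constructor Table(int seatingCapacity), what will the seating capacity be on deserialization if I set seatingCapacity to 1 on ComputerTable?
A: Deserialization fails with java.io.InvalidClassException (no valid constructor), because the non-serializable parent Table has no accessible no-arg constructor. If Table did have one, seatingCapacity would still not come from the stream, since Table is not serializable; it would be whatever that constructor leaves it as, i.e. 0 unless the constructor sets it.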
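
The sketch below tries both variants of a non-serializable parent. Desk and ComputerDesk are invented counterparts to Table and ComputerTable that add a no-arg constructor; in the JDKs I am aware of, the Table variant can be written but fails on reading with InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class NonSerializableParentDemo {
    static class Table {                 // not Serializable, no no-arg constructor
        int seatingCapacity;
        Table(int seatingCapacity) { this.seatingCapacity = seatingCapacity; }
    }
    static class ComputerTable extends Table implements Serializable {
        private static final long serialVersionUID = 1L;
        ComputerTable() { super(1); }
    }

    static class Desk {                  // not Serializable, has a no-arg constructor
        int seatingCapacity;
        Desk() { seatingCapacity = 4; }
    }
    static class ComputerDesk extends Desk implements Serializable {
        private static final long serialVersionUID = 1L;
        ComputerDesk(int seats) { seatingCapacity = seats; }
    }

    static byte[] write(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = write(new ComputerTable()); // writing succeeds
        try {
            read(data);
        } catch (InvalidClassException e) {
            System.out.println("cannot deserialize ComputerTable");
        }
        ComputerDesk desk = (ComputerDesk) read(write(new ComputerDesk(9)));
        // Desk() ran during deserialization; the 9 was never in the stream.
        System.out.println(desk.seatingCapacity); // 4
    }
}
```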

Modifying what is written / read

The default serialized format is publicly documented and can be read and understood by anyone, hence it is not secure.
private void writeObject(ObjectOutputStream) and private void readObject(ObjectInputStream) can be used to provide custom hooks, e.g. to obscure a field by applying a function to it. Notice that these MUST be private: they are neither inherited nor overridden, and the serialization runtime finds and invokes them reflectively for each class in the hierarchy.
Q: What if writeObject is provided but no readObject is provided?
A:
If writeObject contains only defaultWriteObject(), it will work fine.
If writeObject writes anything beyond defaultWriteObject(), that extra data will be lost and only the default fields will be read.
If writeObject does not call defaultWriteObject() and writes only custom data, java.io.StreamCorruptedException is thrown while deserializing.
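
A sketch of a matched writeObject / readObject pair. The Credentials class is hypothetical, and reversing the string merely stands in for "applying a function" to a field; it is not a security measure. The sketch assumes secret is never null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CustomHooksDemo {
    static class Credentials implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient String secret; // kept out of the default serialized form

        Credentials(String user, String secret) { this.user = user; this.secret = secret; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();                                     // writes user
            out.writeUTF(new StringBuilder(secret).reverse().toString()); // obscured extra data
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();                                       // reads user
            secret = new StringBuilder(in.readUTF()).reverse().toString();
        }
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Credentials copy = (Credentials) roundTrip(new Credentials("bob", "s3cret"));
        System.out.println(copy.user + " / " + copy.secret); // bob / s3cret
    }
}
```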

Q: What does "this" refer to in readObject?
A: It refers to the new object that is under construction.

If another custom object should be written/read instead of the actual object, the methods private Object writeReplace() and private Object readResolve() may be provided. These may be used for serialization across different (otherwise mismatched) versions, or to serialize/deserialize custom data. The custom object may be a proxy of the original object.

Q: If I have a singleton object in a JVM, I serialize it and then deserialize it in the same JVM, how many instances will I have?
A: Two instances, breaking the singleton. The alternative is to provide readResolve and return the existing singleton from it.
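
A minimal sketch of the readResolve approach, with an invented Registry singleton: the freshly deserialized object is discarded in favour of the existing instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SingletonDemo {
    static class Registry implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Registry INSTANCE = new Registry();
        private Registry() {}

        // Called by the serialization runtime after the object has been read.
        private Object readResolve() {
            return INSTANCE;
        }
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Registry.INSTANCE) == Registry.INSTANCE); // true
    }
}
```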

Q: What if object is modified after it has been written to stream?
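A: The changes will be lost: the stream already holds a handle for the object, so writing it again emits only that handle. The alternative is to reset() the ObjectOutputStream and write the object again.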
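
The following sketch (ResetDemo is a made-up name) writes one array three times: unchanged, modified without reset, and modified after reset. Only the write after reset() carries the new value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class ResetDemo {
    // Returns the value seen by each of the three reads.
    static int[] writeModifyRead() throws IOException, ClassNotFoundException {
        int[] data = {1};
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data); // full write, value 1
            data[0] = 2;
            out.writeObject(data); // handle only, the change is lost
            out.reset();           // the stream forgets all handles
            out.writeObject(data); // full write again, value 2
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            int[] first = (int[]) in.readObject();
            int[] second = (int[]) in.readObject();
            int[] third = (int[]) in.readObject();
            return new int[] {first[0], second[0], third[0]};
        }
    }

    public static void main(String[] args) throws Exception {
        int[] seen = writeModifyRead();
        System.out.println(seen[0] + " " + seen[1] + " " + seen[2]); // 1 1 2
    }
}
```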

Q: What would be the best way to provide a custom implementation of how an enum is serialized?
A: There is none. Enum constants are maintained by the JVM and are serialized by name only; their serialization cannot be customised.
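
One visible consequence, sketched with a hypothetical Color enum: a deserialized enum constant is the very same instance as the original, with no readResolve needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class EnumDemo {
    enum Color { RED, GREEN }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Color.GREEN) == Color.GREEN); // true
    }
}
```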

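Q: Can a subclass override the super class’s serialization mechanism?
A: Yes, for subclasses of the stream classes: a subclass of ObjectOutputStream / ObjectInputStream that is created through the protected no-arg constructor can override protected writeObjectOverride and protected readObjectOverride, which then replace the default writeObject / readObject behaviour completely.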
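
As a rough sketch of the stream-subclass hooks: when an ObjectOutputStream subclass uses the protected no-arg constructor, writeObject delegates to writeObjectOverride. RecordingOutput is an invented class that only records what it is asked to write; a real implementation would produce its own wire format.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class OverrideStreamDemo {
    static class RecordingOutput extends ObjectOutputStream {
        final List<Object> written = new ArrayList<>();

        RecordingOutput() throws IOException {
            super(); // protected no-arg constructor: enables the override hook
        }

        @Override
        protected void writeObjectOverride(Object obj) throws IOException {
            written.add(obj); // replaces the default serialization entirely
        }
    }

    public static void main(String[] args) throws Exception {
        // Not closed on purpose: this sketch has no underlying stream.
        RecordingOutput out = new RecordingOutput();
        out.writeObject("first");
        out.writeObject(42);
        System.out.println(out.written); // [first, 42]
    }
}
```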

Sensitive information

The easiest technique is to mark fields that contain sensitive data as private transient. If they must be serialized, the following may be used.

Serialization can be made secure easily by wrapping the object in a SealedObject (for confidentiality) or a SignedObject (for integrity) or both, rather than by putting custom algorithms in the read/writeObject methods.
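
A minimal sketch of the SealedObject route, assuming an AES key generated on the spot and the AES/CBC/PKCS5Padding transformation; key management and the choice of cipher mode are assumptions of this example, not recommendations.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SealedObject;
import javax.crypto.SecretKey;

class SealedObjectDemo {
    // Seals a string, serializes the SealedObject, reads it back and unseals it.
    static String sealAndUnseal(String secret) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key); // the provider picks a random IV
        SealedObject sealed = new SealedObject(secret, cipher);

        // SealedObject is itself Serializable, so it travels like any other object.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(sealed);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            SealedObject restored = (SealedObject) in.readObject();
            return (String) restored.getObject(key); // decrypts and deserializes
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sealAndUnseal("db-password")); // db-password
    }
}
```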

Object Validation

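To validate an object after it has been deserialized, ObjectInputValidation may be implemented. The method validateObject() needs to be implemented, and the object has to register itself from readObject via ObjectInputStream.registerValidation(). validateObject() is then called once the complete object graph has been read.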
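
A sketch of the validation callback with an invented Range class whose invariant is low <= high. The broken instance is produced by poking the field directly, which stands in for a tampered stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ValidationDemo {
    static class Range implements Serializable, ObjectInputValidation {
        private static final long serialVersionUID = 1L;
        int low;
        int high;
        Range(int low, int high) { this.low = low; this.high = high; }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.registerValidation(this, 0); // must be called from within readObject
            in.defaultReadObject();
        }

        @Override
        public void validateObject() throws InvalidObjectException {
            if (low > high) {
                throw new InvalidObjectException("low must not exceed high");
            }
        }
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(((Range) roundTrip(new Range(1, 5))).high); // 5
        Range broken = new Range(1, 5);
        broken.low = 10; // violates the invariant
        try {
            roundTrip(broken);
        } catch (InvalidObjectException e) {
            System.out.println("rejected on deserialization");
        }
    }
}
```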


Externalizable

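public void writeExternal(ObjectOutput out) throws IOException;
public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;

Notice that these methods are "public", unlike writeObject / readObject. The class also needs a public no-arg constructor, which is invoked on deserialization.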
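
A minimal Externalizable sketch with a made-up Point class. The class takes full control of its serialized form, so fields must be read back in exactly the order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    public static class Point implements Externalizable {
        int x;
        int y;

        public Point() {} // public no-arg constructor, invoked on deserialization
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = (Point) roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y); // 3,4
    }
}
```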


ref:
http://www.ibm.com/developerworks/java/library/j-5things1/index.html
http://java.sun.com/developer/technicalArticles/Programming/serialization/
http://docs.oracle.com/javase/7/docs/platform/serialization/spec/serialTOC.html
Effective Java by Joshua Bloch, 2nd Ed
