Friday, January 11, 2013

Serialization 201 | Cheat sheet


Serialization in Java 201

This blog post / cheat sheet is meant as a ready reckoner for intermediate and advanced users of Java serialization. It gives a quick look at the features serialization provides, along with answers to common questions.

Absolute Basics

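Serializable is a marker interface. It declares no methods. Implementing it simply tells ObjectOutputStream that instances of the class may be serialized.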


The default mechanism handles whole object graphs, including circular references. For example, a person's spouse has that same person as spouse. As long as serialVersionUID stays the same, the Java Object Serialization Specification also treats some class changes as compatible and handles them automatically:
Adding new fields to a class
Changing the fields from static to nonstatic
Changing the fields from transient to nontransient
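The circular-reference claim above can be checked with a small round trip. This is a minimal sketch. The Person class and the roundTrip helper are hypothetical and exist only for illustration. The helper serializes to a byte array and reads the result straight back.

```java
import java.io.*;

class CircularReferenceDemo {

    // Hypothetical class: two instances will point at each other.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Person spouse;

        Person(String name) { this.name = name; }
    }

    // Serializes obj into memory and deserializes it again.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Person alice = new Person("Alice");
        Person bob = new Person("Bob");
        alice.spouse = bob;
        bob.spouse = alice; // cycle: alice -> bob -> alice

        Person copy = roundTrip(alice);
        System.out.println(copy.spouse.name);           // Bob
        System.out.println(copy.spouse.spouse == copy); // true: the cycle is rebuilt
    }
}
```

The cycle survives because each object is written only once per stream. Later references to the same object become back-references.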

Q: Why is Object not serializable?
A: Every class extends Object. If Object implemented Serializable, every class would be serializable, whether or not it was designed for it. Every class would then pay these costs:
1. If you do not make the effort to design a custom serialized form and merely accept the default, the serialized form will forever be tied to the class's original internal representation. In other words, if you accept the default serialized form, the class's private and package-private instance fields become part of its exported API. Minimizing access to fields then loses its effectiveness as a tool for information hiding. (Ref: Effective Java)
2. A second cost of implementing Serializable is that it increases the likelihood of bugs and security holes. Relying on the default deserialization mechanism can easily leave objects open to invariant corruption and illegal access. For example, a data source holding database credentials may get serialized. Also, constructors are bypassed on deserialization, so initialization may not run and an object in an invalid state may be created.
3. A third cost of implementing Serializable is that it increases the testing burden associated with releasing a new version of a class.
4. The state of many classes only makes sense inside the running JVM or operating system, so serializing them is meaningless. Examples include Thread, OutputStream, Socket and ServletRequest.
5. Inner (non-static nested) classes use compiler-generated synthetic fields to refer to the enclosing instance. As a result, their default serialized form is ill-defined. (Ref: Effective Java)
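Point 4 shows up in practice as soon as a serializable class holds one of those types. This is a minimal sketch. The Worker class and the trySerialize helper are hypothetical. Worker is Serializable but has a Thread field, and Thread does not implement Serializable.

```java
import java.io.*;

class NotSerializableDemo {

    // Hypothetical class: Serializable itself, but it holds a non-serializable Thread.
    static class Worker implements Serializable {
        private static final long serialVersionUID = 1L;
        Thread thread = new Thread(() -> { });
    }

    // Returns "ok", or the simple name of the IOException thrown while serializing obj.
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new Worker()));   // NotSerializableException
        System.out.println(trySerialize("plain string")); // ok
    }
}
```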

serialVersionUID

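If serialVersionUID is not declared, it is computed at runtime as a hash of the class's structure. The hash covers the class name, modifiers, implemented interfaces, fields, non-private constructors and methods, and any static initializer. Even a harmless change, such as adding a public method, can therefore change it. Streams written by the old version then fail to deserialize with InvalidClassException. Declaring it explicitly avoids this, e.g. private static final long serialVersionUID = 1L;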
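The difference between a declared and a computed value can be observed through ObjectStreamClass.lookup(...).getSerialVersionUID(). This is a minimal sketch. The Declared and Computed classes are hypothetical.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidDemo {

    // Declared explicitly: stays fixed while the class evolves.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Not declared: the runtime computes a hash from the class's members.
    static class Computed implements Serializable {
        int value;
    }

    // Returns the serialVersionUID that serialization will use for the given class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Declared.class)); // 42
        System.out.println(uidOf(Computed.class)); // hash-based; may change if members change
    }
}
```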
Behind the scenes
Q: If there are multiple instances of an object in a given object chain, will it be written multiple times?
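A: No. The first time an object is written, the ObjectOutputStream assigns it a handle. Every later write of the same object (same reference) to that stream writes only this handle, not the entire object. This is also how circular references are handled.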
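The handle mechanism can be observed by writing one object twice and reading both copies back. This is a minimal sketch. The Address class and the helper method are hypothetical. Both reads return the same Java object, because the second write was only a back-reference.

```java
import java.io.*;

class SharedHandleDemo {

    // Hypothetical serializable class.
    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        String street = "Baker Street";
    }

    // Writes obj twice to one stream, then reads two objects back.
    static Object[] writeTwiceReadTwice(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.writeObject(obj); // only a handle (back-reference) is written this time
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return new Object[] { in.readObject(), in.readObject() };
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] copies = writeTwiceReadTwice(new Address());
        System.out.println(copies[0] == copies[1]); // true
    }
}
```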

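Q: If objects are written using object streams, how are primitives written?
A: ObjectOutputStream implements DataOutput (through ObjectOutput), so it has the same primitive-writing methods as DataOutputStream, e.g. writeInt, writeChar, writeUTF. ObjectInputStream has the matching readInt, readChar, readUTF. Primitive fields of a serialized object are written by the default mechanism.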
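Primitives and objects can be mixed in the same stream, as long as they are read back in the order they were written. This is a minimal sketch, and the writeAndRead helper is hypothetical.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;

class PrimitivesDemo {

    // Writes an int, a UTF string and an object, then reads them back in the same order.
    static String writeAndRead() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(7);                                    // DataOutput method
            out.writeUTF("seven");                              // DataOutput method
            out.writeObject(new ArrayList<>(Arrays.asList(1, 2))); // ordinary object
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            int n = in.readInt();
            String s = in.readUTF();
            Object list = in.readObject();
            return n + " " + s + " " + list;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAndRead()); // 7 seven [1, 2]
    }
}
```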

serialPersistentFields

Q: Is there an alternative to marking fields as transient, so that only some fields are serialized?
A: Yes. Declare a private static final field named serialPersistentFields that lists the fields to be serialized:
private static final ObjectStreamField[] serialPersistentFields = {new ObjectStreamField("next", List.class)};
If the field's value is null or is otherwise not an instance of ObjectStreamField[], or if the field does not have the required modifiers, then the behavior is as if the field were not declared at all.
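With serialPersistentFields in place, a field left out of the array is not written, even though it is not transient. This is a minimal sketch. The Account class and its fields are hypothetical. The sessionToken field comes back as null because Account's constructor is not run on deserialization.

```java
import java.io.*;

class PersistentFieldsDemo {

    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;

        // Only "owner" is serialized; "sessionToken" is skipped without being transient.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("owner", String.class)
        };

        String owner;
        String sessionToken;

        Account(String owner, String sessionToken) {
            this.owner = owner;
            this.sessionToken = sessionToken;
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = roundTrip(new Account("alice", "secret-123"));
        System.out.println(copy.owner);        // alice
        System.out.println(copy.sessionToken); // null
    }
}
```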

Q: How can a subclass of a Serializable class be prevented from being serialized?
A: Give the subclass private writeObject and readObject methods that throw NotSerializableException.
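This is a minimal sketch of that answer, using hypothetical Base and Unserializable classes. The exception thrown inside the private writeObject propagates out of ObjectOutputStream.writeObject.

```java
import java.io.*;

class BlockSubclassDemo {

    static class Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int id = 1;
    }

    // Hypothetical subclass that must not be serialized although Base is Serializable.
    static class Unserializable extends Base {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException(getClass().getName());
        }

        private void readObject(ObjectInputStream in) throws IOException {
            throw new NotSerializableException(getClass().getName());
        }
    }

    // Returns false if serialization is refused with NotSerializableException.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Base()));           // true
        System.out.println(canSerialize(new Unserializable())); // false
    }
}
```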

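Q: Is the constructor called during deserialization? What about the constructor of the first non-serializable parent?
A: The constructors of the serialized class and of its serializable superclasses are not invoked. The no-arg constructor of the first non-serializable superclass is invoked, and as usual it chains up to its own parents. That constructor must exist and be accessible to the subclass. Otherwise deserialization fails with InvalidClassException ("no valid constructor").

Q: If I have ComputerTable extends Table implements Serializable, where Table has a single constructor Table(int seatingCapacity), what will the seating capacity be on deserialization if I set seatingCapacity to 1 on ComputerTable?
A: As stated, deserialization fails. Table is not serializable and has no no-arg constructor, so reading the object throws InvalidClassException. If Table also had a no-arg constructor, seatingCapacity would be whatever that constructor assigns (0 if it assigns nothing). This is because Table is not serializable, so its field is never written to the stream.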
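Both outcomes can be reproduced with a minimal sketch. It assumes a variant of Table that also has a no-arg constructor. The names FixedTable and BrokenComputerTable, and the monitor field, are hypothetical. FixedTable keeps only the int constructor, so reading BrokenComputerTable fails.

```java
import java.io.*;

class ConstructorDemo {

    // Not Serializable: on deserialization its no-arg constructor runs instead.
    static class Table {
        int seatingCapacity;
        Table() { }
        Table(int seatingCapacity) { this.seatingCapacity = seatingCapacity; }
    }

    static class ComputerTable extends Table implements Serializable {
        private static final long serialVersionUID = 1L;
        String monitor;
        ComputerTable(int seats, String monitor) { super(seats); this.monitor = monitor; }
    }

    // Hypothetical parent with only an int constructor, as in the question.
    static class FixedTable {
        final int seatingCapacity;
        FixedTable(int seatingCapacity) { this.seatingCapacity = seatingCapacity; }
    }

    static class BrokenComputerTable extends FixedTable implements Serializable {
        private static final long serialVersionUID = 1L;
        BrokenComputerTable(int seats) { super(seats); }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ComputerTable copy = roundTrip(new ComputerTable(1, "24 inch"));
        System.out.println(copy.seatingCapacity); // 0: Table() ran, Table's field was not written
        System.out.println(copy.monitor);         // 24 inch

        try {
            roundTrip(new BrokenComputerTable(1)); // writing succeeds, reading fails
        } catch (InvalidClassException e) {
            System.out.println("InvalidClassException");
        }
    }
}
```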

Modifying what is written / read

The default serialized form is a documented format. Anyone holding the bytes can read every non-transient field, private ones included, so the default form offers no confidentiality.
private void writeObject(ObjectOutputStream out) and private void readObject(ObjectInputStream in) can be declared to hook into the process, e.g. to transform a field before it is written. They must be private, so neither method is inherited, overridden or overloaded. Each serializable class in the hierarchy handles its own fields, using its own pair of methods if it declares them.
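This is a minimal sketch of such a hook pair. The Credentials class is hypothetical. Reversing the string is a toy transformation chosen only to show where the hooks run. It is not real protection. Inside readObject, "this" is the new object being built, so the restored value can be assigned straight to the field.

```java
import java.io.*;

class CustomHooksDemo {

    static class Credentials implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient String pin; // written manually below, in transformed form

        Credentials(String user, String pin) {
            this.user = user;
            this.pin = pin;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();                                  // writes user
            out.writeUTF(new StringBuilder(pin).reverse().toString()); // toy transformation
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();                                    // restores user
            pin = new StringBuilder(in.readUTF()).reverse().toString(); // undo the transformation
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Credentials copy = roundTrip(new Credentials("alice", "1234"));
        System.out.println(copy.user + " " + copy.pin); // alice 1234
    }
}
```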
Q: What if writeObject is provided but no readObject is provided?
A:
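If writeObject only calls defaultWriteObject(), it works fine.
If writeObject writes extra data after defaultWriteObject(), the default fields are read back and the extra data is silently skipped.
If writeObject never calls defaultWriteObject() (or writeFields()) and writes only custom data, the default reading logic misreads the stream. Deserialization then typically fails, e.g. with java.io.StreamCorruptedException. The specification requires defaultWriteObject() or writeFields() to be called exactly once, before any optional data is written.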
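The second case can be demonstrated with a minimal sketch. The Order class and its auditCount field are hypothetical. The extra int written by writeObject is skipped on reading because there is no readObject, so auditCount keeps its default value of 0.

```java
import java.io.*;

class WriteWithoutReadDemo {

    static class Order implements Serializable {
        private static final long serialVersionUID = 1L;
        String item;
        transient int auditCount; // written as extra data, never read back

        Order(String item, int auditCount) {
            this.item = item;
            this.auditCount = auditCount;
        }

        // No matching readObject is declared.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // required once, before any optional data
            out.writeInt(auditCount); // optional data, skipped when reading
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Order copy = roundTrip(new Order("book", 5));
        System.out.println(copy.item);       // book
        System.out.println(copy.auditCount); // 0
    }
}
```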

Q: What does "this" refer to in readObject?
A: It refers to the new object that is under construction.

If a different object should be written or read in place of the actual one, Object writeReplace() and Object readResolve() methods may be provided. Any access modifier is allowed. Making them private prevents subclasses from inheriting them. These may be used to serialize across different (otherwise mismatched) versions, or to serialize/deserialize custom data. The replacement object may be a proxy for the original object.
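The proxy idea is the serialization proxy pattern described in Effective Java. This is a minimal sketch of it. The Period and Proxy classes are hypothetical. writeReplace sends a Proxy in place of the Period. The proxy's readResolve rebuilds the Period through its constructor, so the constructor's validity check runs again. Period's own readObject rejects any stream that tries to bypass the proxy.

```java
import java.io.*;

class ProxyDemo {

    // Hypothetical immutable class that is only ever serialized through a proxy.
    static final class Period implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long start;
        private final long end;

        Period(long start, long end) {
            if (start > end) throw new IllegalArgumentException("start > end");
            this.start = start;
            this.end = end;
        }

        // Serialize a Proxy instead of this object.
        private Object writeReplace() {
            return new Proxy(start, end);
        }

        // A stream containing a raw Period was not produced by writeReplace: reject it.
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("Proxy required");
        }

        @Override
        public String toString() { return start + ".." + end; }
    }

    static final class Proxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long start;
        private final long end;

        Proxy(long start, long end) { this.start = start; this.end = end; }

        // Replace the deserialized proxy with a real Period built by its constructor.
        private Object readResolve() {
            return new Period(start, end);
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Period(1, 5))); // 1..5
    }
}
```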
Q: If I have a singleton object in a JVM, I serialize it and then deserialize it in same JVM, how many instances will I have?
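A: Two instances. Deserialization creates a new object, which breaks the singleton. A readResolve method that returns the existing instance fixes this. An enum singleton avoids the problem entirely.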
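This is a minimal sketch of the readResolve fix. The Config and BrokenConfig classes are hypothetical, and BrokenConfig omits readResolve to show the problem.

```java
import java.io.*;

class SingletonDemo {

    static final class Config implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Config INSTANCE = new Config();

        private Config() { }

        // Swap the freshly deserialized copy for the existing instance.
        private Object readResolve() {
            return INSTANCE;
        }
    }

    // Same shape without readResolve: deserialization yields a second instance.
    static final class BrokenConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        static final BrokenConfig INSTANCE = new BrokenConfig();

        private BrokenConfig() { }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Config.INSTANCE) == Config.INSTANCE);             // true
        System.out.println(roundTrip(BrokenConfig.INSTANCE) == BrokenConfig.INSTANCE); // false
    }
}
```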

Q: What if object is modified after it has been written to stream?
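A: If the same ObjectOutputStream writes the object again, only a back-reference to the first copy is written, so the changes are lost. Call reset() on the ObjectOutputStream before writing the object again, or use writeUnshared().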
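This is a minimal sketch comparing both behaviours. The Counter class and the helper method are hypothetical. The object is written, modified and written again, with and without reset() in between.

```java
import java.io.*;
import java.util.Arrays;

class ResetDemo {

    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes c, changes it, writes it again; optionally resets the stream in between.
    static int[] writeModifyWrite(boolean reset) throws IOException, ClassNotFoundException {
        Counter c = new Counter();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            c.value = 1;
            out.writeObject(c);
            c.value = 2;
            if (reset) {
                out.reset();    // forget every object written so far
            }
            out.writeObject(c); // without reset: just a back-reference, so value 2 is lost
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Counter first = (Counter) in.readObject();
            Counter second = (Counter) in.readObject();
            return new int[] { first.value, second.value };
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(writeModifyWrite(false))); // [1, 1]
        System.out.println(Arrays.toString(writeModifyWrite(true)));  // [1, 2]
    }
}
```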

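Q: What would be the best way to provide a custom implementation of how an enum is serialized?
A: There is none. An enum constant is serialized as its name only. Any writeObject, readObject, readObjectNoData, writeReplace or readResolve methods declared by an enum type are ignored, and deserialization always returns the existing constant.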
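This is a minimal sketch showing that a customisation hook is ignored for enums. The Level enum is hypothetical. If its readResolve were honoured, LOW would come back as HIGH. Because it is ignored, the same constant is returned.

```java
import java.io.*;

class EnumDemo {

    enum Level {
        LOW, HIGH;

        // Would map every constant to HIGH if it were honoured; enums ignore it.
        private Object readResolve() {
            return HIGH;
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Level.LOW) == Level.LOW); // true
    }
}
```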

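Q: Can the serialization mechanism itself be overridden?
A: Yes, by subclassing the streams. A subclass of ObjectOutputStream or ObjectInputStream that calls the protected no-arg constructor has writeObject / readObject delegated to its protected writeObjectOverride / readObjectOverride methods. A subclass of a Serializable class, by contrast, cannot override its superclass's private writeObject / readObject.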

Sensitive information

The easiest technique is to mark fields that contain sensitive data as private transient. If they must be serialized, the following approach may be used.

Rather than putting custom algorithms into readObject/writeObject, wrap the object in a javax.crypto.SealedObject (for confidentiality), a java.security.SignedObject (for integrity), or both.
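This is a minimal sketch of both wrappers. It assumes AES in CBC mode for sealing and SHA256withRSA with a freshly generated 2048-bit key pair for signing. These algorithm choices and the helper methods are illustrative only. Real code would load its keys from a key store rather than generate them on the spot.

```java
import java.io.Serializable;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.security.SignatureException;
import java.security.SignedObject;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SealedObject;
import javax.crypto.SecretKey;

class SealAndSignDemo {

    // Confidentiality: encrypt the serialized value, then decrypt it with the same key.
    static Object sealAndUnseal(Serializable value) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key); // a random IV is generated here
        SealedObject sealed = new SealedObject(value, cipher);
        return sealed.getObject(key); // uses the algorithm and IV stored in the SealedObject
    }

    // Integrity: sign the serialized value and verify the signature before using it.
    static Object signAndVerify(Serializable value) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();
        SignedObject signed =
            new SignedObject(value, pair.getPrivate(), Signature.getInstance("SHA256withRSA"));
        if (!signed.verify(pair.getPublic(), Signature.getInstance("SHA256withRSA"))) {
            throw new SignatureException("signature check failed");
        }
        return signed.getObject();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sealAndUnseal("salary=100")); // salary=100
        System.out.println(signAndVerify("salary=100")); // salary=100
    }
}
```

The two can also be combined, e.g. by signing a SealedObject, since SealedObject is itself Serializable.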

Object Validation

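To validate an object after deserialization, implement ObjectInputValidation and register the object from its readObject method with ObjectInputStream.registerValidation(this, priority). Its validateObject() method is called once the entire object graph has been restored. It can reject the object by throwing InvalidObjectException.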
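This is a minimal sketch of a validation callback. The Range class and its min <= max invariant are hypothetical. The test mutates a field after construction to simulate a corrupted or hostile stream, and the callback then rejects it.

```java
import java.io.*;

class ValidationDemo {

    static class Range implements Serializable, ObjectInputValidation {
        private static final long serialVersionUID = 1L;
        int min;
        int max;

        Range(int min, int max) {
            if (min > max) throw new IllegalArgumentException("min > max");
            this.min = min;
            this.max = max;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.registerValidation(this, 0); // validateObject() runs after the whole graph is read
            in.defaultReadObject();
        }

        @Override
        public void validateObject() throws InvalidObjectException {
            if (min > max) {
                throw new InvalidObjectException("min > max: " + min + " > " + max);
            }
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    // True if the range deserializes without failing validation.
    static boolean survivesRoundTrip(Range r) throws IOException, ClassNotFoundException {
        try {
            roundTrip(r);
            return true;
        } catch (InvalidObjectException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(survivesRoundTrip(new Range(1, 5))); // true
        Range tampered = new Range(1, 5);
        tampered.max = 0; // break the invariant behind the constructor's back
        System.out.println(survivesRoundTrip(tampered));        // false
    }
}
```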


Externalizable

public void writeExternal(ObjectOutput out) throws IOException;
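public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;

Notice that these methods are public. An Externalizable class takes full control of its serialized form, because only the identity of its class is written automatically. It must have a public no-arg constructor, which is called before readExternal() during deserialization.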
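This is a minimal sketch of an Externalizable class. The Point class is hypothetical. The class writes and reads every field itself, in the same order, and the public no-arg constructor is what deserialization uses to create the instance.

```java
import java.io.*;

class ExternalizableDemo {

    public static class Point implements Externalizable {
        private int x;
        private int y;

        public Point() { } // required: invoked before readExternal during deserialization

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x); // the class writes its whole state itself
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            x = in.readInt(); // read in the same order as written
            y = in.readInt();
        }

        @Override
        public String toString() { return "(" + x + ", " + y + ")"; }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Point(3, 4))); // (3, 4)
    }
}
```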


References:
http://www.ibm.com/developerworks/java/library/j-5things1/index.html
http://java.sun.com/developer/technicalArticles/Programming/serialization/
http://docs.oracle.com/javase/7/docs/platform/serialization/spec/serialTOC.html
Joshua Bloch, Effective Java, 2nd Edition

Thursday, January 10, 2013

Resolving External dependencies in Maven

Problem

Let's say we have an external dependency, such as a jar or a .so file, provided to us by a third party or another team. It is not available in any public repository or in the organization's own repository. We are using Maven and need to include this file in our build.

Just to make things more interesting, let's say we also have to share our project with someone outside the team.

We will look at two simple solutions to resolve this external dependency.

Solutions

1. Standard

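Get the external file added to your local repository. The following command, taken directly from the Maven docs, installs it into your machine's local repository. The -DlocalRepositoryPath option is only needed to install into a repository other than the default local one.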

mvn install:install-file  -Dfile=path-to-your-artifact-jar \
                          -DgroupId=your.groupId \
                          -DartifactId=your-artifactId \
                          -Dversion=version \
                          -Dpackaging=jar \
                          -DlocalRepositoryPath=path-to-specific-local-repo


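This works well. Once installed, the artifact is referenced in the pom like any other dependency, using the same groupId, artifactId and version. However, the install step has to be repeated by everyone on the team, and by anyone outside the project you want to share it with.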

2. Quick and Dirty

Keep the file in any location you like and make sure everyone building the project knows where it is. Then modify your pom to declare the external dependency explicitly, so that it is not resolved from any repository.


<dependency>
    <groupId>thirdParty.groupId</groupId>
    <artifactId>myCustomJar.artifactId</artifactId>
    <version>myCustomJar.version</version>
    <!-- systemPath is only allowed together with the system scope -->
    <scope>system</scope>
    <systemPath>C:/customJars/myCustomJar.jar</systemPath>
</dependency>

Here, you explicitly provide the location of the dependency via the systemPath element, which requires <scope>system</scope>. You then tell people to change it to wherever they keep the file. This is also much easier to explain to people who are not yet Maven-ated.

If people on the project share a common drive, the path can point to the jar on that drive, and nobody on the project will have to edit their copy of pom.xml.