
1. Summary

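A while ago our system ran into a bug: after an object was serialized and then deserialized, text that had been fine came back garbled. After a sleepless night of debugging, I found the cause: when a Java object was serialized and then deserialized, read(byte[]) did not read all of the data.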

2. Symptom

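The object being serialized implements the Externalizable interface. Its writeExternal writes the bytes of a string field s with write(byte[]), and its readExternal reads them back with a single read(byte[]) call. The object is serialized with ObjectOutputStream.writeObject and deserialized with ObjectInputStream.readObject.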

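When the byte[] is longer than 1024 bytes, the string s in the deserialized obj is truncated: the part past the cut-off comes back as blanks, because those bytes of the array are never filled in.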
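A hypothetical reconstruction of this setup is sketched below (the class name Msg, the UTF-8 encoding and the 4-byte length prefix are assumptions, not the author's code). It writes a string's bytes with write(byte[]) and reads them back with a single read(byte[]) call:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TruncationDemo {

    // Hypothetical Externalizable class: the name Msg, the field s and the
    // 4-byte length prefix are assumptions made for this sketch.
    public static class Msg implements Externalizable {
        String s;

        public Msg() { }                        // public no-arg constructor required by Externalizable

        Msg(String s) { this.s = s; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);         // length prefix
            out.write(bytes);                   // ends up split into 1024-byte data blocks
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte[] bytes = new byte[in.readInt()];
            in.read(bytes);                     // BUG: may return fewer bytes than bytes.length
            s = new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Serialize with writeObject, then deserialize with readObject.
    static String roundTrip(String original) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new Msg(original));
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return ((Msg) ois.readObject()).s;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        char[] chars = new char[2000];
        Arrays.fill(chars, 'a');
        String original = new String(chars);
        String copy = roundTrip(original);
        System.out.println("equal: " + original.equals(copy));
        System.out.println("first NUL at index: " + copy.indexOf('\0'));
    }
}
```

Under these assumptions, a 2,000-character string of 'a' should come back unequal to the original: only the first 1,020 characters survive (the length prefix takes 4 bytes of the first 1,024-byte block), and the rest are NUL characters.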

3. Cause

Tracing through the JDK source shows that Java uses a concept called block data (BlockData) when serializing and deserializing.

3.1. Serialization and writeObject

The writeObject() call chain:

ObjectOutputStream.writeObject(obj) -> ObjectOutputStream.writeOrdinaryObject(obj,class) -> ObjectOutputStream.writeExternalData(obj)

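The key code is in ObjectOutputStream.writeExternalData: unless the stream uses PROTOCOL_VERSION_1, it switches its BlockDataOutputStream into block-data mode with setBlockDataMode(true), calls obj.writeExternal(this), turns block-data mode off again with setBlockDataMode(false), and then writes TC_ENDBLOCKDATA.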

When obj.writeExternal(this) calls back into ObjectOutputStream's write(byte[] buf), the call is passed on to BlockDataOutputStream's write(buf, 0, buf.length, false).

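In BlockDataOutputStream, data written in block-data mode first goes into an internal buffer. When the buffer reaches 1024 bytes, or when setBlockDataMode is called, a block header is written to the actual output, followed by the buffered data. The header has two formats: if the data following it is shorter than 256 bytes, the header takes 2 bytes (TC_BLOCKDATA + a 1-byte length); otherwise it takes 5 bytes (TC_BLOCKDATALONG + a 4-byte length). A serialized object therefore looks roughly like this: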

meta ..block-header(5byte)..data(1024byte)..block-header(5byte)..data(1024byte)..block-header(2byte)..data(<256byte)..block-end

where (constants from java.io.ObjectStreamConstants):
TC_BLOCKDATA = (byte)0x77;
TC_BLOCKDATALONG = (byte)0x7A;
TC_ENDBLOCKDATA = (byte)0x78;
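The layout above can be checked against a real JDK. The sketch below models the framing rule in a hypothetical helper frame (split the block data into 1,024-byte chunks, give each chunk a 2-byte header if it holds at most 255 bytes and a 5-byte header otherwise, then append TC_ENDBLOCKDATA) and compares the result with the tail of a stream produced by ObjectOutputStream. It assumes the default protocol version and a hypothetical Externalizable, Payload, that makes a single write(byte[]) call:

```java
import java.io.*;
import java.util.Arrays;

class BlockLayoutDemo {

    // Hypothetical Externalizable that writes its bytes as block data and nothing else.
    public static class Payload implements Externalizable {
        private byte[] data = new byte[0];

        public Payload() { }

        Payload(byte[] data) { this.data = data; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.write(data);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // not used in this sketch
        }
    }

    // Model of the framing rule: 1024-byte chunks, 2-byte header for <= 255 bytes,
    // 5-byte header otherwise, then the block-end marker.
    static byte[] frame(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dout = new DataOutputStream(out);
        try {
            for (int off = 0; off < data.length; off += 1024) {
                int len = Math.min(1024, data.length - off);
                if (len <= 0xFF) {
                    dout.writeByte(ObjectStreamConstants.TC_BLOCKDATA);
                    dout.writeByte(len);                 // 1-byte length
                } else {
                    dout.writeByte(ObjectStreamConstants.TC_BLOCKDATALONG);
                    dout.writeInt(len);                  // 4-byte length
                }
                dout.write(data, off, len);
            }
            dout.writeByte(ObjectStreamConstants.TC_ENDBLOCKDATA);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Serialize a Payload with the real ObjectOutputStream.
    static byte[] serialize(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new Payload(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // True if the real stream ends with exactly the modeled block layout.
    static boolean matchesModel(int n) {
        byte[] data = new byte[n];
        Arrays.fill(data, (byte) 'a');
        byte[] expected = frame(data);
        byte[] actual = serialize(data);
        if (actual.length < expected.length) return false;
        byte[] tail = Arrays.copyOfRange(actual, actual.length - expected.length, actual.length);
        return Arrays.equals(tail, expected);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 255, 256, 1024, 2100}) {
            System.out.println(n + " bytes -> layout matches model: " + matchesModel(n));
        }
    }
}
```

For a 2,100-byte payload the model produces 5 + 1024 + 5 + 1024 + 2 + 52 + 1 = 2,113 bytes, which is the block-header(5)..data(1024)..block-header(5)..data(1024)..block-header(2)..data(<256)..block-end shape described above.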

3.2. Deserialization and readObject

ObjectInputStream.readObject() -> ObjectInputStream.readObject0() -> ObjectInputStream.readOrdinaryObject() -> ObjectInputStream.readExternalData(obj) -> obj.readExternal(objectInputStream) -> callback to InputStream.read(byte[]) -> ObjectInputStream.read(byte[] buffer, 0, buffer.length)

That call is passed on to BlockDataInputStream.read(buf, off, len, false).

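On a read, BlockDataInputStream first reads a block header and then loads that block's data into its buffer. Each read request is answered straight from this buffer: if the request asks for more bytes than the buffer has left, only the remaining bytes are returned, and the buffer is refilled with the next block only on the following call. A single read(byte[]) call can therefore return incomplete data, leaving the rest of the caller's array unfilled, which is why the tail of the string showed up as blanks.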
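To see this directly, the sketch below logs the value returned by each read(byte[], int, int) call inside readExternal when the payload spans several blocks. The class Chunked and its 4-byte length prefix are hypothetical; only standard java.io calls are used:

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class ReadChunksDemo {

    // Hypothetical Externalizable whose readExternal records what each read call returns.
    public static class Chunked implements Externalizable {
        static final List<Integer> READS = new ArrayList<>();
        private int size;

        public Chunked() { }

        Chunked(int size) { this.size = size; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(size);                 // length prefix
            out.write(new byte[size]);          // payload split into 1024-byte blocks
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte[] buf = new byte[in.readInt()];
            int off = 0;
            while (off < buf.length) {
                // returns at most what is left in the current block
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) throw new EOFException();
                READS.add(n);
                off += n;
            }
        }
    }

    // Round-trips a Chunked object and returns the sizes of the individual reads.
    static List<Integer> readSizes(int size) {
        Chunked.READS.clear();
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new Chunked(size));
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
        return new ArrayList<>(Chunked.READS);
    }

    public static void main(String[] args) {
        System.out.println("read sizes: " + readSizes(3000));
    }
}
```

With a 3,000-byte payload the reads should return 1020, 1024 and 956 bytes: the first block also carries the 4-byte length prefix, and every call stops at the end of the current block.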

4. Fix

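Replace read(byte[]) with readFully(byte[]). readFully keeps reading, across block boundaries, until the array is completely filled, and throws EOFException if the stream ends first.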
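Applied to a hypothetical Externalizable like the one below (class name, UTF-8 encoding and length prefix are assumptions), the fix is a one-line change in readExternal. Multi-byte text is used so that the round trip also exercises UTF-8 sequences that straddle a block boundary:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class ReadFullyFix {

    // Hypothetical Externalizable class; only readExternal differs from the buggy version.
    public static class Msg implements Externalizable {
        String s;

        public Msg() { }

        Msg(String s) { this.s = s; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);
            out.write(bytes);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);   // loops over block boundaries; EOFException if data runs out
            s = new String(bytes, StandardCharsets.UTF_8);
        }
    }

    static String roundTrip(String original) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(new Msg(original));
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return ((Msg) ois.readObject()).s;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 5000) {
            sb.append("\u4e2d\u6587");          // 3 bytes per character in UTF-8
        }
        String original = sb.toString();
        System.out.println("equal: " + original.equals(roundTrip(original)));
    }
}
```

With readFully the round trip should return a string equal to the original, however many 1,024-byte blocks its bytes span.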

5. Why does serialization use block data?

The Java documentation describes block data as follows (http://docs.oracle.com/javase/7/docs/platform/serialization/spec/protocol.html):

6.3 Stream Protocol Versions

It was necessary to make a change to the serialization stream format in JDK 1.2 that is not backwards compatible to all minor releases of JDK 1.1. To provide for cases where backwards compatibility is required, a capability has been added to indicate what PROTOCOL_VERSION to use when writing a serialization stream. The method ObjectOutputStream.useProtocolVersion takes as a parameter the protocol version to use to write the serialization stream. The Stream Protocol Versions are as follows:

ObjectStreamConstants.PROTOCOL_VERSION_1: Indicates the initial stream format.

ObjectStreamConstants.PROTOCOL_VERSION_2: Indicates the new external data format. Primitive data is written in block data mode and is terminated with TC_ENDBLOCKDATA. Block data boundaries have been standardized. Primitive data written in block data mode is normalized to not exceed 1024 byte chunks. The benefit of this change was to tighten the specification of serialized data format within the stream. This change is fully backward and forward compatible.
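As a rough illustration of the two versions, the sketch below serializes the same hypothetical Externalizable (Payload, which writes a 4-byte length and then the data) once with the default protocol and once after useProtocolVersion(ObjectStreamConstants.PROTOCOL_VERSION_1). Under version 1 the external data is written raw, with no block headers and no TC_ENDBLOCKDATA terminator, so the stream should end with the payload bytes themselves:

```java
import java.io.*;
import java.util.Arrays;

class ProtocolVersionDemo {

    // Hypothetical Externalizable: a 4-byte length followed by the data.
    public static class Payload implements Externalizable {
        byte[] data = new byte[0];

        public Payload() { }

        Payload(byte[] data) { this.data = data; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(data.length);
            out.write(data);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            data = new byte[in.readInt()];
            in.readFully(data);
        }
    }

    static byte[] payload(int n) {
        byte[] data = new byte[n];
        Arrays.fill(data, (byte) 'a');
        return data;
    }

    // Serialize with PROTOCOL_VERSION_1 (raw external data) or the default (block data).
    static byte[] serialize(int n, boolean useVersion1) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            if (useVersion1) {
                // must be called before any object is written
                oos.useProtocolVersion(ObjectStreamConstants.PROTOCOL_VERSION_1);
            }
            oos.writeObject(new Payload(payload(n)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static byte[] deserialize(byte[] stream) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return ((Payload) ois.readObject()).data;
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    static boolean endsWith(byte[] stream, byte[] suffix) {
        if (stream.length < suffix.length) return false;
        return Arrays.equals(
            Arrays.copyOfRange(stream, stream.length - suffix.length, stream.length), suffix);
    }

    public static void main(String[] args) {
        byte[] v1 = serialize(2100, true);
        byte[] v2 = serialize(2100, false);
        System.out.println("v1 size " + v1.length + ", ends with raw payload: "
            + endsWith(v1, payload(2100)));
        System.out.println("v2 size " + v2.length + ", ends with TC_ENDBLOCKDATA: "
            + (v2[v2.length - 1] == ObjectStreamConstants.TC_ENDBLOCKDATA));
    }
}
```

For a 2,100-byte payload the version-2 stream should be 13 bytes longer: three block headers (5 + 5 + 2 bytes) plus the end marker. ObjectInputStream can still read both, since the class descriptor's SC_BLOCK_DATA flag tells it which format the external data uses.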