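Java serialization and block data
1. Summary
A while ago our system hit a bug: after an object was serialized and then deserialized, its text came back garbled. After a whole night of head-scratching, I found the cause: during Java deserialization, read(byte[]) did not read all of the data.
2. Symptoms
The object being serialized implements the Externalizable interface: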
```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Test implements Externalizable {
    private String s;

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        if (s == null) {
            out.writeShort(-1);                 // -1 marks a null string
        } else {
            byte[] bb = s.getBytes("utf-8");
            out.writeShort(bb.length);          // 2-byte length prefix
            out.write(bb);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        int len = in.readShort();
        if (len < 0) {
            s = null;
            return;
        }
        byte[] bb = new byte[len];
        in.read(bb);                            // the bug: read may fill only part of bb (see section 3)
        s = new String(bb, "utf-8");
    }
}
```
The serialization logic is as follows:
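```java
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(value);
oos.flush(); // defensive: make sure nothing is left in ObjectOutputStream's internal buffer
val = bos.toByteArray();
```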
The deserialization logic is roughly as follows:
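```java
// ContextObjectInputStream is presumably a project-specific ObjectInputStream subclass that resolves classes via classLoader
ContextObjectInputStream ois = new ContextObjectInputStream(new ByteArrayInputStream(buf), classLoader);
Object obj = ois.readObject();
```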
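When the encoded byte[] is longer than roughly 1024 bytes, the string s in obj is truncated and everything past the cut shows up as blanks. More precisely, the cut happens once the 2-byte length prefix plus the string bytes exceed one 1024-byte block, i.e. for strings longer than 1022 bytes. The "blanks" are actually '\u0000' characters decoded from the zero bytes of the array that were never overwritten.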
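Here is a minimal, self-contained reproduction of the symptom. It assumes a JDK that uses the default stream protocol (PROTOCOL_VERSION_2). A plain ObjectInputStream stands in for the project's ContextObjectInputStream, and the names TruncationRepro and BuggyText are made up for this sketch. With a 2000-byte ASCII string, read(byte[]) stops at the end of the first 1024-byte block, which also holds the 2-byte length prefix. Only 1022 string bytes arrive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TruncationRepro {

    // Hypothetical stand-in for the article's Test class, keeping the read(byte[]) bug.
    public static class BuggyText implements Externalizable {
        String s;

        public BuggyText() { } // Externalizable requires a public no-arg constructor

        BuggyText(String s) { this.s = s; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            byte[] bb = s.getBytes(StandardCharsets.UTF_8);
            out.writeShort(bb.length); // 2-byte length prefix lands in the same block as the first string bytes
            out.write(bb);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte[] bb = new byte[in.readShort()];
            in.read(bb); // BUG: return value ignored; may fill only part of bb
            s = new String(bb, StandardCharsets.UTF_8);
        }
    }

    static String roundTrip(String text) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new BuggyText(text));
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return ((BuggyText) ois.readObject()).s;
        }
    }

    public static void main(String[] args) throws Exception {
        char[] cs = new char[2000];
        Arrays.fill(cs, 'x');
        String back = roundTrip(new String(cs));
        // Bytes after the first 1022 stay zero in bb and decode as NUL characters.
        System.out.println("length=" + back.length() + ", first NUL at " + back.indexOf(0));
    }
}
```

Running it prints `length=2000, first NUL at 1022`. The string keeps its full length, but its tail consists of zero characters.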
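3. Cause
Digging into the JDK source, I found that Java serialization and deserialization use a concept called block data.
3.1. Serialization and writeObject
The writeObject() call chain: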
ObjectOutputStream.writeObject(obj) -> ObjectOutputStream.writeOrdinaryObject(obj,class) -> ObjectOutputStream.writeExternalData(obj)
The key code of ObjectOutputStream is as follows:
```java
public class ObjectOutputStream extends OutputStream implements ObjectOutput, ObjectStreamConstants {
    private final BlockDataOutputStream bout;

    public void write(byte[] buf) throws IOException {
        bout.write(buf, 0, buf.length, false);
    }

    private void writeExternalData(Externalizable obj) throws IOException {
        ……
        bout.setBlockDataMode(true);
        obj.writeExternal(this); // calls the object's own writeExternal
        bout.setBlockDataMode(false);
        bout.writeByte(TC_ENDBLOCKDATA);
        ……
    }
}
```
While obj.writeExternal(this) runs, its out.write(bb) call goes back into ObjectOutputStream.write(byte[] buf), which in turn calls BlockDataOutputStream.write(buf, 0, buf.length, false).
The key parts of BlockDataOutputStream are as follows:
```java
private static class BlockDataOutputStream extends OutputStream implements DataOutput {
    /** maximum data block length */
    private static final int MAX_BLOCK_SIZE = 1024;
    /** buffer for writing general/block data */
    private final byte[] buf = new byte[MAX_BLOCK_SIZE];
    /** underlying output stream */
    private final OutputStream out;

    void write(byte[] b, int off, int len, boolean copy) throws IOException {
        ……
        while (len > 0) {
            if (pos >= MAX_BLOCK_SIZE) { // key: a full buffer is drained as one block
                drain();
            }
            if (len >= MAX_BLOCK_SIZE && !copy && pos == 0) {
                // avoid unnecessary copy
                writeBlockHeader(MAX_BLOCK_SIZE);
                out.write(b, off, MAX_BLOCK_SIZE);
                off += MAX_BLOCK_SIZE;
                len -= MAX_BLOCK_SIZE;
            } else {
                int wlen = Math.min(len, MAX_BLOCK_SIZE - pos);
                System.arraycopy(b, off, buf, pos, wlen);
                pos += wlen;
                off += wlen;
                len -= wlen;
            }
        }
    }

    /**
     * Writes all buffered data from this stream to the underlying stream,
     * but does not flush underlying stream.
     */
    void drain() throws IOException {
        if (pos == 0) {
            return;
        }
        if (blkmode) { // key: in block-data mode every drained chunk gets a header
            writeBlockHeader(pos);
        }
        out.write(buf, 0, pos);
        pos = 0;
    }

    /**
     * Writes block data header. Data blocks shorter than 256 bytes are
     * prefixed with a 2-byte header; all others start with a 5-byte
     * header.
     */
    private void writeBlockHeader(int len) throws IOException {
        if (len <= 0xFF) {
            hbuf[0] = TC_BLOCKDATA; // TC_BLOCKDATA = (byte)0x77
            hbuf[1] = (byte) len;
            out.write(hbuf, 0, 2);
        } else {
            hbuf[0] = TC_BLOCKDATALONG;
            Bits.putInt(hbuf, 1, len);
            out.write(hbuf, 0, 5);
        }
    }

    /**
     * Sets block data mode to the given mode (true == on, false == off)
     * and returns the previous mode value. If the new mode is the same as
     * the old mode, no action is taken. If the new mode differs from the
     * old mode, any buffered data is flushed before switching to the new
     * mode.
     */
    boolean setBlockDataMode(boolean mode) throws IOException {
        if (blkmode == mode) {
            return blkmode;
        }
        drain();
        blkmode = mode;
        return !blkmode;
    }
}
```
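In block-data mode, bytes first go into BlockDataOutputStream's 1024-byte buffer. When the buffer is full (pos >= 1024) or setBlockDataMode is called, a block header is written to the underlying output, followed by the buffered data. A write of 1024 bytes or more that starts with an empty buffer skips the copy and is emitted directly as a full 1024-byte block. The header comes in two formats. If the data that follows is shorter than 256 bytes, the header takes 2 bytes (TC_BLOCKDATA + a 1-byte length). Otherwise it takes 5 bytes (TC_BLOCKDATALONG + a 4-byte length). A serialized object therefore looks roughly like this: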
meta ..block-header(5byte)..data(1024byte)..block-header(5byte)..data(1024byte)..block-header(2byte)..data(<256byte)..block-end
where:
TC_BLOCKDATA = (byte)0x77;
TC_BLOCKDATALONG = (byte)0x7A;
TC_ENDBLOCKDATA = (byte)0x78;
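To see these headers in real output, here is a small sketch. It assumes a current JDK whose ObjectOutputStream still uses the 1024-byte block size described above. To keep the parsing trivial, it writes raw bytes at the top level of an ObjectOutputStream instead of going through writeExternal. The top level is also in block-data mode after the 4-byte stream header, and the chunking comes from the same BlockDataOutputStream. The names BlockLayoutDump and blockLayout are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class BlockLayoutDump {
    static final int TC_BLOCKDATA = 0x77;     // 2-byte header: tag + 1-byte length
    static final int TC_BLOCKDATALONG = 0x7A; // 5-byte header: tag + 4-byte length

    /** Writes n raw bytes as top-level block data and describes the block headers found in the output. */
    static List<String> blockLayout(int n) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos); // writes the stream header, then enters block-data mode
        oos.write(new byte[n]);
        oos.flush();                                          // drains the last partial block with its header
        byte[] b = bos.toByteArray();

        List<String> blocks = new ArrayList<>();
        int i = 4;                                            // skip STREAM_MAGIC + STREAM_VERSION
        while (i < b.length) {
            int tag = b[i] & 0xFF;
            int len;
            if (tag == TC_BLOCKDATA) {
                len = b[i + 1] & 0xFF;
                i += 2;
                blocks.add("TC_BLOCKDATA:" + len);
            } else if (tag == TC_BLOCKDATALONG) {
                len = ByteBuffer.wrap(b, i + 1, 4).getInt(); // big-endian 4-byte length
                i += 5;
                blocks.add("TC_BLOCKDATALONG:" + len);
            } else {
                throw new IllegalStateException("unexpected tag 0x" + Integer.toHexString(tag));
            }
            i += len;                                         // skip this block's payload
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(blockLayout(2500)); // [TC_BLOCKDATALONG:1024, TC_BLOCKDATALONG:1024, TC_BLOCKDATALONG:452]
        System.out.println(blockLayout(1100)); // [TC_BLOCKDATALONG:1024, TC_BLOCKDATA:76]
    }
}
```

The 1100-byte case shows both header formats. The first 1024 bytes take the zero-copy path with a 5-byte header, and the 76-byte remainder gets the short 2-byte header when flushed.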
3.2. Deserialization and readObject
ObjectInputStream.readObject() -> ObjectInputStream.readObject0() -> ObjectInputStream.readOrdinaryObject() -> ObjectInputStream.readExternalData(obj) -> obj.readExternal(objectInputStream) -> callback into InputStream.read(byte[]) -> ObjectInputStream.read(byte[] buffer, 0, buffer.length)
This in turn calls BlockDataInputStream.read(buffer, 0, buffer.length, false). The key code is as follows:
```java
private class BlockDataInputStream extends InputStream implements DataInput {
    /**
     * Attempts to read len bytes into byte array b at offset off. Returns
     * the number of bytes read, or -1 if the end of stream/block data has
     * been reached. If copy is true, reads values into an intermediate
     * buffer before copying them to b (to avoid exposing a reference to
     * b).
     */
    int read(byte[] b, int off, int len, boolean copy) throws IOException {
        if (len == 0) {
            return 0;
        } else if (blkmode) {
            if (pos == end) { // key: refill only when the current block is used up
                refill();
            }
            if (end < 0) {
                return -1;
            }
            int nread = Math.min(len, end - pos); // never more than what is left in this block
            System.arraycopy(buf, pos, b, off, nread);
            pos += nread;
            return nread;
        } else if (copy) {
            int nread = in.read(buf, 0, Math.min(len, MAX_BLOCK_SIZE));
            if (nread > 0) {
                System.arraycopy(buf, 0, b, off, nread);
            }
            return nread;
        } else {
            return in.read(b, off, len);
        }
    }

    /**
     * Refills internal buffer buf with block data. Any data in buf at the
     * time of the call is considered consumed. Sets the pos, end, and
     * unread fields to reflect the new amount of available block data; if
     * the next element in the stream is not a data block, sets pos and
     * unread to 0 and end to -1.
     */
    private void refill() throws IOException {
        try {
            do {
                pos = 0;
                if (unread > 0) {
                    int n = in.read(buf, 0, Math.min(unread, MAX_BLOCK_SIZE));
                    if (n >= 0) {
                        end = n;
                        unread -= n;
                    } else {
                        throw new StreamCorruptedException(
                            "unexpected EOF in middle of data block");
                    }
                } else {
                    int n = readBlockHeader(true);
                    if (n >= 0) {
                        end = 0;
                        unread = n;
                    } else {
                        end = -1;
                        unread = 0;
                    }
                }
            } while (pos == end);
        } catch (IOException ex) {
            pos = 0;
            end = -1;
            unread = 0;
            throw ex;
        }
    }
}
```
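On the read side, BlockDataInputStream first reads a block header and then loads that block's data into its buffer. Each read request is served straight from this buffer. If a request asks for more than what remains, only the remaining bytes are returned, and the buffer is refilled from the next block on the following call. A single read(byte[]) call can therefore return fewer bytes than requested. The InputStream.read contract allows this: the method returns the number of bytes actually read. Our readExternal ignored that return value, so the tail of the array kept its initial zero bytes, and these showed up as blank characters in the string.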
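You can observe the per-block behaviour of read directly. The sketch below uses hypothetical names and top-level block data rather than an Externalizable. It writes n bytes, then reads them back in a loop and records how many bytes each ObjectInputStream.read(byte[], int, int) call returns. With a ByteArrayInputStream underneath, each call stops at a block boundary. Over a socket, refill() itself may receive fewer bytes than the block holds, so the chunks could be smaller still.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class BlockReadDemo {

    /** Writes n bytes as top-level block data, then records what each read(byte[], int, int) call returns. */
    static List<Integer> readCounts(int n) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.write(new byte[n]);
        } // close() flushes the final partial block
        List<Integer> counts = new ArrayList<>();
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            byte[] target = new byte[n];
            int off = 0;
            while (off < n) {
                int r = ois.read(target, off, n - off); // returns at most the rest of the current block
                if (r < 0) {
                    break;                              // no more block data
                }
                counts.add(r);
                off += r;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readCounts(2500)); // [1024, 1024, 452]
    }
}
```

Even though every call asks for all remaining bytes, three calls are needed. A loop that keeps calling read until the target is full is essentially what ObjectInputStream.readFully does internally.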
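4. Fix
Replace read(byte[]) with readFully(byte[]). readFully is declared in DataInput, which ObjectInput extends. It blocks until the whole array is filled, and throws EOFException if the data runs out first.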
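Below is a sketch of the corrected class, assuming the same length-prefixed UTF-8 format as Test. FixedText and ReadFullyFix are made-up names, and a plain ObjectInputStream stands in for ContextObjectInputStream. Because readFully keeps reading across block boundaries, the string survives the round trip regardless of its length, up to the 32767-byte limit of the short length prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

class ReadFullyFix {

    // Same wire format as the article's Test class, with read(byte[]) replaced by readFully(byte[]).
    public static class FixedText implements Externalizable {
        String s;

        public FixedText() { } // Externalizable requires a public no-arg constructor

        FixedText(String s) { this.s = s; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            if (s == null) {
                out.writeShort(-1);                    // -1 marks a null string
            } else {
                byte[] bb = s.getBytes(StandardCharsets.UTF_8);
                out.writeShort(bb.length);             // assumes fewer than 32768 bytes
                out.write(bb);
            }
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int len = in.readShort();
            if (len < 0) {
                s = null;
                return;
            }
            byte[] bb = new byte[len];
            in.readFully(bb);                          // loops across blocks until bb is full, or throws EOFException
            s = new String(bb, StandardCharsets.UTF_8);
        }
    }

    static String roundTrip(String text) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new FixedText(text));
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return ((FixedText) ois.readObject()).s;
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 3000; i++) {
            sb.append((char) ('a' + i % 26));
        }
        String text = sb.toString();
        System.out.println(text.equals(roundTrip(text))); // true
    }
}
```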
5. Why serialization uses block data
The Java documentation describes block data here: http://docs.oracle.com/javase/7/docs/platform/serialization/spec/protocol.html
6.3 Stream Protocol Versions
It was necessary to make a change to the serialization stream format in JDK 1.2 that is not backwards compatible to all minor releases of JDK 1.1. To provide for cases where backwards compatibility is required, a capability has been added to indicate what PROTOCOL_VERSION to use when writing a serialization stream. The method ObjectOutputStream.useProtocolVersion takes as a parameter the protocol version to use to write the serialization stream. The Stream Protocol Versions are as follows:
ObjectStreamConstants.PROTOCOL_VERSION_1: Indicates the initial stream format.
ObjectStreamConstants.PROTOCOL_VERSION_2: Indicates the new external data format. Primitive data is written in block data mode and is terminated with TC_ENDBLOCKDATA. Block data boundaries have been standardized. Primitive data written in block data mode is normalized to not exceed 1024 byte chunks. The benefit of this change was to tighten the specification of serialized data format within the stream. This change is fully backward and forward compatible.
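To see the PROTOCOL_VERSION_2 framing concretely, the sketch below serializes the same Externalizable object twice: once with useProtocolVersion(PROTOCOL_VERSION_1) and once with PROTOCOL_VERSION_2. It then compares the sizes. Blob and ProtocolVersionDemo are hypothetical names. The expected difference follows from the BlockDataOutputStream code in section 3.1. Under version 2, a 2500-byte payload becomes blocks of 1024, 1024 and 452 bytes, each with a 5-byte header, followed by one TC_ENDBLOCKDATA byte. Under version 1, the same bytes are written raw.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;

class ProtocolVersionDemo {

    // Hypothetical payload whose writeExternal emits n raw bytes in a single write call.
    public static class Blob implements Externalizable {
        private byte[] data = new byte[0];

        public Blob() { }

        Blob(int n) { data = new byte[n]; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.write(data);
        }

        @Override
        public void readExternal(ObjectInput in) {
            throw new UnsupportedOperationException("this sketch only serializes");
        }
    }

    static byte[] serialize(Object obj, int protocol) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.useProtocolVersion(protocol); // must be called before any object is written
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] v1 = serialize(new Blob(2500), ObjectStreamConstants.PROTOCOL_VERSION_1);
        byte[] v2 = serialize(new Blob(2500), ObjectStreamConstants.PROTOCOL_VERSION_2);
        // v2 adds three 5-byte block headers (1024 + 1024 + 452) and one TC_ENDBLOCKDATA byte
        System.out.println(v2.length - v1.length);        // 16
        System.out.printf("0x%02X%n", v2[v2.length - 1]); // 0x78, i.e. TC_ENDBLOCKDATA
    }
}
```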
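This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. When reposting, please credit the author and link to the original URL.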