Skip to content

Incremental Deserialization

laforge49 edited this page Oct 10, 2011 · 39 revisions

Most approaches to serialization do not handle updates well, forcing you to deserialize an entire block and then reserialize it every time you make a small update. This is simply too slow when implementing a persistent store.

The answer is to deserialize only what you need, keep the serialized form of everything that was deserialized, and then reserialize only what was changed. For large blocks, the speed increase is quite significant. This was implemented in the jit package of AgileWiki5. But the problem with using the jit package is that queries should be able to share these structures but queries on jit structures are not threadsafe, as queries may deserialize more of the data. The Incremental Deserialization project then begins with a rewrite of the AgileWiki5 jit package, this time using actors.

The IncDes classes allow you to create serializable tree structures, where the objects in these structures interoperate synchronously. (All the objects in a given structure must use the same mailbox.)

Application logic, in the form of a component, can also be attached to an IncDes actor. But to do this you need to define, and register, a new factory class.

  1. IncDes actors and supported messages
    1. Items
      1. IncDesInt
      2. IncDesLong
      3. IncDesString
      4. IncDesBoolean
      5. IncDesBytes
      6. IncDesIncDes
    2. Basic Collections
      1. IncDesList --Updated in release 1.3.0
      2. IncDesNavMap --Updated in release 1.3.0
      3. IncDesNavSet
  2. Examples
    1. Hit
    2. Mashup --Updated in release 1.3.0

###Advanced IncDes

Unfortunately the jit package did not adequately address the needs of an important use case--balanced trees (b-trees), as b-tree nodes were implemented over a TreeMap. And that meant that before any data could be accessed, all the keys had to be deserialized to built that TreeMap. Rather, we should build b-tree nodes over a table of key/value pairs and do binary searches. This will allow us to access and update data more quickly.

We can also support CoW (Copy on Write) optimization. In a CoW database, the indexes used to locate application data are b-tree nodes. The problem is that indexes are updated each time the application data is updated (updates are always written to newly allocated disk space), and when you update an index, you need to update the index which references it, recursively. To reduce this overhead, small updates to indexes then should be saved in the root node of the database when they are small enough, or stored in the higher-level index nodes, with leaf index nodes being updated as a last recourse. So when you access an index node, you apply a map which provides the latest updates. This approach should significantly increase the speed of updates with small slowdown in the speed of queries--with the balance between the speed of updates and the speed of queries governed by the number of updates which are tracked before forcing a rewrite of an index node. (CoW is an important technology, as it lends itself easily to implementing a transactional database.)

Advanced IncDes will be covered in a later project.

Clone this wiki locally