Since we intend to favor streaming parsing, we need to consider a format suited for streaming.
Strings + Lazy Parsing
One of the problems we are going to encounter is the combination of strings and lazy parsing:
- consider two independent lazy functions foo and bar, where bar is somewhere further down the stream from foo;
- assume that foo defines a literal string s that does not show up in our AOT dictionary;
- how should bar refer to s in such a way that we do not first need to parse foo?
One way to do this is the following:
- divide the stream into packets;
- each packet starts with a table of strings, which may then be used by this packet and by every packet further down the line.
If we do so, the packet containing foo will define literal string s. The packet containing bar will either be the same packet or a packet further down the line, and will be able to access s.
As a bonus, this will let us compress these string tables using a well-known algorithm, such as brotli.
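The packet scheme above can be sketched as follows. This is only an illustration under stated assumptions, not a proposed wire format: the names Packet, make_packet and StringTable are hypothetical, the string table is serialized as JSON for brevity, and zlib stands in for brotli so that the sketch runs with the Python standard library alone.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    # Compressed string table that opens the packet
    # (assumption: zlib is used here as a stand-in for brotli).
    compressed_strings: bytes
    # Opaque payload; a real packet would hold encoded (lazy) functions.
    body: bytes = b""


def make_packet(strings, body=b""):
    """Build a packet whose header is a compressed table of strings."""
    raw = json.dumps(strings).encode("utf-8")
    return Packet(zlib.compress(raw), body)


class StringTable:
    """All strings declared so far, in stream order."""

    def __init__(self):
        self.strings = []

    def read_packet_header(self, packet):
        # Only the header is decoded; the body is left untouched.
        table = json.loads(zlib.decompress(packet.compressed_strings))
        self.strings.extend(table)

    def lookup(self, index):
        return self.strings[index]


# foo lives in the first packet and declares "s"; bar lives in the second.
packets = [make_packet(["s"], b"<foo>"), make_packet(["t"], b"<bar>")]

table = StringTable()
for packet in packets:
    table.read_packet_header(packet)

# bar can refer to "s" by index without foo's body ever being parsed.
print(table.lookup(0))
```

In this sketch a string index is global to the stream, so a reference emitted in a later packet stays valid no matter which lazy functions have actually been parsed.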
Model State + Lazy Parsing
We will need to adapt our models to restart from a well-specified state whenever parsing a lazy function.
(TBD)
Offsets + Entropy + Streaming
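We need the ability to tell the decoder where to fetch a lazy function. In non-entropy-coding versions, we could reference the actual offset at which a lazy function was encoded. With entropy coding, such offsets make no sense: encoded symbols are not aligned on byte boundaries, and decoding a symbol depends on the state built up by everything decoded before it.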
A partial solution would be the following:
- each packet may contain a number of (aligned) lazy declarations;
- each packet's header declares the lazy declarations included in this packet, each identified by a key (the key itself is an arbitrary string) and its starting offset within the packet;
- when encoding a [lazy] field, we specify the key at which to find the content of the field;
- note that a lazy declaration could span several packets.
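A minimal sketch of the key-based lookup described in the list above, under assumptions of its own: LazyPacket and build_index are hypothetical names, the header is modeled as a plain dictionary, and bodies are uncompressed byte strings. A declaration that spans several packets is represented only by its starting position here.

```python
from dataclasses import dataclass


@dataclass
class LazyPacket:
    # Header: key of each lazy declaration that starts in this packet,
    # mapped to its starting offset within this packet's body.
    lazy_offsets: dict
    # Opaque body holding the (aligned) lazy declarations.
    body: bytes


def build_index(packets):
    """Map each key to (packet number, starting offset in that packet)."""
    index = {}
    for number, packet in enumerate(packets):
        for key, offset in packet.lazy_offsets.items():
            index[key] = (number, offset)
    return index


packets = [
    LazyPacket({"foo": 0, "bar": 5}, b"<foo><bar>"),
    LazyPacket({"baz": 2}, b"..<baz>"),
]
index = build_index(packets)

# A [lazy] field stores only the key; the decoder resolves it via the headers.
number, offset = index["baz"]
print(packets[number].body[offset:])
```

Because a [lazy] field carries a key rather than a position in the stream, the encoder can lay out packets freely, and the decoder only needs the packet headers to find where a declaration starts.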