Preserves: a tutorial
This document, like Preserves itself, is released under version 2.0 of the Apache license.
1 Overview
Preserves is a serialization system which supplies both a human-readable textual syntax and an efficient binary syntax; converting between the two is straightforward. Preserves' human-readable syntax is easy to read and should be mostly familiar if you already know systems like JSON. However, Preserves is more precisely specified than JSON, and also has a clean extension mechanism.
This document is a tutorial; it does not get into all the details of Preserves. For that, see the Preserves specification.
2 Preserves basics
2.1 Starting with the familiar
If you're familiar with JSON, Preserves looks fairly similar:
{"name": "Missy Rose", "species": "Felis Catus", "age": 13, "foods": ["kibble", "cat treats", "tinned meat"]}
Preserves also has something we can use for debugging/development information called "annotations"; they aren't actually read in as data but we can use them for comments. (They can also be used for other development tools and are not restricted to strings; more on this later, but for now interpret them as comments.)
@"I'm an annotation... basically a comment. Ignore me!" "I'm data! Don't ignore me!"
Preserves supports some data types you're probably already familiar with from JSON, and which look fairly similar in the textual format:
    @"booleans"
    #true
    #false
    @"various kinds of numbers:"
    42
    123556789012345678901234567890
    -10
    13.5
    @"strings"
    "I'm feeling stringy!"
    @"sequences (lists)"
    ["cat", "dog", "mouse", "goldfish"]
    @"dictionaries (hashmaps)"
    {"cat": "meow", "dog": "woof", "goldfish": "glub glub", "mouse": "squeak"}
2.2 Going beyond JSON
We can observe a few differences from JSON already: it's possible to express integers of arbitrary size in Preserves, and booleans look a little bit different. A few more interesting differences:
    @"Preserves treats commas as whitespace, so these are the same"
    ["cat", "dog", "mouse", "goldfish"]
    ["cat" "dog" "mouse" "goldfish"]
    @"We can use anything as keys in dictionaries, not just strings"
    {1: "the loneliest number",
     ["why", "was", 6, "afraid", "of", 7]: "because 7 8 9",
     {"dictionaries": "as keys???"}: "well, why not?"}
Preserves technically provides a few types of numbers:
    @"Signed Integers"
    42
    -42
    5907212309572059846509324862304968273468909473609826340
    -5907212309572059846509324862304968273468909473609826340
    @"Floats (Single-precision IEEE floats) (notice the trailing f)"
    3.1415927f
    @"Doubles (Double-precision IEEE floats)"
    3.141592653589793
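To see why the distinction between Floats and Doubles matters, here is a small Python sketch that rounds a double to single precision using the standard `struct` module. It does not use any Preserves library; the helper name `to_single_precision` is made up for this illustration.

```python
import struct

def to_single_precision(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

double_pi = 3.141592653589793
single_pi = to_single_precision(double_pi)

# A single keeps roughly 7 significant decimal digits, so the two values differ.
print(single_pi == double_pi)    # False
print(f"{single_pi:.7f}")        # 3.1415927
```

This is why the single-precision example above is written with fewer digits than the double-precision one.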
Preserves also provides some types that don't come in JSON.
Symbols are fairly interesting; they look a lot like strings but really aren't meant to represent text as much as they are, well… a symbolic name. Often they're meant to be used for something that has symbolic importance to the program, but not textual importance (other than to guide the programmer… not unlike variable names).
    @"A symbol (NOT a string!)"
    JustASymbol
    @"You can do mixedCase or CamelCase too of course, pick your poison"
    @"(but be consistent, for the sake of your collaborators!)"
    iAmASymbol
    i-am-a-symbol
    @"A list of symbols"
    [GET, PUT, POST, DELETE]
    @"A symbol with spaces in it"
    |this is just one symbol believe it or not|
We can also add binary data, aka ByteStrings:
    @"Some binary data, base64 encoded"
    #base64{cGljdHVyZSBvZiBhIGNhdA==}
    @"Some other binary data, hexadecimal encoded"
    #hex{616263}
    @"Same binary data as above, base64 encoded"
    #base64{YWJj}
What's neat about this is that we don't have to "pay the cost" of base64 or hexadecimal encoding when we serialize this data to binary; the length of the binary data is the length of the binary data.
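The overhead of the textual encodings is easy to check with Python's standard library alone. This sketch decodes the byte strings from the examples above and compares the length of the text form with the length of the raw bytes; it says nothing about the framing that the Preserves binary syntax adds around those bytes.

```python
import base64

# Byte strings taken from the examples above.
cat_text = "cGljdHVyZSBvZiBhIGNhdA=="
cat_raw = base64.b64decode(cat_text)

abc_from_hex = bytes.fromhex("616263")
abc_from_base64 = base64.b64decode("YWJj")

# The same payload, whichever textual encoding was used to write it down.
print(abc_from_hex == abc_from_base64)   # True

# The base64 text is longer than the bytes it describes.
print(len(cat_text), len(cat_raw))       # 24 16
```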
Conveniently, Preserves also includes Sets, which are collections of unique elements where ordering of items is unimportant.
#set{flour, salt, water}
2.3 Total ordering and canonicalization
This is a good time to mention that even though from a semantic perspective sets and dictionaries do not carry information about the ordering of their elements (and Preserves doesn't care what order we enter them in for our hand-written-as-text Preserves documents), Preserves has a well-defined "total ordering".
FULL WARNING: the following claim is not implemented yet. :) Coming soon!
Based on this total ordering, Preserves provides support for canonical ordering when serializing; in this mode, Preserves will always write out the elements in the same order, every time. When combined with binary serialization, this is Preserves' "canonical form". This is important and useful for many contexts, but especially for cryptographic signatures.
    @"This hand-typed Preserves document..."
    {monkey: {"noise": "ooh-ooh",
              "eats": #set{"bananas", "berries"}}
     cat: {"noise": "meow",
           "eats": #set{"kibble", "cat treats", "tinned meat"}}}
    @"Will always, always be written out in this order when canonicalized:"
    {cat: {"eats": #set{"cat treats", "kibble", "tinned meat"}, "noise": "meow"}
     monkey: {"eats": #set{"bananas", "berries"}, "noise": "ooh-ooh"}}
This is a bit more expensive than normal serialization (because sorting needs to occur), but is still quite fast in general.
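The idea of writing elements out in a fixed order can be sketched in a few lines of Python. Note the assumptions: `canonical_text` is a hypothetical helper, it sorts by the rendered text of each element rather than by the total ordering that the Preserves specification defines, it uses plain Python strings where the example above uses symbols as keys, and it does no string escaping. It only illustrates that sorting makes the output independent of entry order.

```python
def canonical_text(value):
    """Render a value with dictionary entries and set elements sorted.

    Sorting by rendered text is a stand-in chosen for this sketch; it is
    NOT the ordering defined by the Preserves specification.
    """
    if isinstance(value, dict):
        entries = sorted(
            (canonical_text(k), canonical_text(v)) for k, v in value.items()
        )
        return "{" + ", ".join(f"{k}: {v}" for k, v in entries) + "}"
    if isinstance(value, (set, frozenset)):
        return "#set{" + ", ".join(sorted(canonical_text(v) for v in value)) + "}"
    if isinstance(value, list):
        return "[" + ", ".join(canonical_text(v) for v in value) + "]"
    if isinstance(value, str):
        return '"' + value + '"'
    return repr(value)

hand_typed = {
    "monkey": {"noise": "ooh-ooh", "eats": {"bananas", "berries"}},
    "cat": {"noise": "meow", "eats": {"kibble", "cat treats", "tinned meat"}},
}
reordered = {
    "cat": {"eats": {"tinned meat", "cat treats", "kibble"}, "noise": "meow"},
    "monkey": {"eats": {"berries", "bananas"}, "noise": "ooh-ooh"},
}

# Both spellings of the document render to exactly the same text.
print(canonical_text(hand_typed) == canonical_text(reordered))   # True
```

Identical output for equal values is what makes a canonical form suitable as input to a hash or signature.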
2.4 Defining our own types using Records
Finally, there is one more type that Preserves provides… but in a sense, it's a meta-type. Record objects have a tag and a series of arguments (or "slots"). For example, we can make a Date record:
<Date 2019 8 15>
In this example, the Date tag is a symbol; 2019, 8, and 15 are the year, month, and day slots respectively.
Why do we care about this? We could instead just decide to encode our date data in a string, like "2019-08-15". A document using such a date structure might look like so:
{"name": "Gregor Samsa", "description": "humanoid trapped in an insect body", "born": "1915-10-04"}
Unfortunately, say our boss comes along and tells us that the people doing data entry have complained that it isn't always possible to get an exact date. They would like to be able to type in what they know if they don't know the date exactly.
This causes a problem. Now we might have two kinds of entries:
    @"Exact date known"
    {"name": "Gregor Samsa",
     "description": "humanoid trapped in an insect body",
     "born": "1915-10-04"}
    @"Not sure about exact date..."
    {"name": "Gregor Samsa",
     "description": "humanoid trapped in an insect body",
     "born": "Sometime in October 1915? Or was that when he became an insect?"}
This is a mess. We could try matching the string against a regular expression to see if it "looks like a date", but this kind of thing is prone to errors and weird edge cases. No, it's better to have a separate type:
    @"Exact date known"
    {"name": "Gregor Samsa",
     "description": "humanoid trapped in an insect body",
     "born": <Date 1915 10 04>}
    @"Not sure about exact date..."
    {"name": "Gregor Samsa",
     "description": "humanoid trapped in an insect body",
     "born": "Sometime in October 1915? Or was that when he became an insect?"}
Now we can distinguish the two.
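On the reading side, telling the two cases apart becomes a type check instead of string guesswork. The Python sketch below assumes a hypothetical `Record` class (a tag plus a tuple of slots) standing in for whatever a real Preserves library would hand back; `describe_birth` is likewise invented for illustration.

```python
from typing import NamedTuple

class Record(NamedTuple):
    """Hypothetical stand-in for a Preserves Record: a tag plus its slots."""
    tag: str
    slots: tuple

def describe_birth(person: dict) -> str:
    born = person["born"]
    # A Date record with three slots is an exact date ...
    if isinstance(born, Record) and born.tag == "Date" and len(born.slots) == 3:
        year, month, day = born.slots
        return f"born on {year:04d}-{month:02d}-{day:02d}"
    # ... anything else is treated as free-form text from data entry.
    return f"birth date uncertain: {born}"

exact = {"name": "Gregor Samsa", "born": Record("Date", (1915, 10, 4))}
vague = {"name": "Gregor Samsa", "born": "Sometime in October 1915?"}

print(describe_birth(exact))   # born on 1915-10-04
print(describe_birth(vague))   # birth date uncertain: Sometime in October 1915?
```

No regular expression is involved: the shape of the value itself says which case we are in.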
We can make as many Record types as our program needs, though it is up to our program to make sense of what these mean. Since Preserves does not specify the Date itself, both the program (or person) writing the Preserves document and the program reading it need to have a mutual understanding of how many slots it has and what the tag signifies for it to be of use.
Still, there are plenty of interesting tags we can define. Here is one for an "iri", a hyperlink:
<iri "https://dustycloud.org/blog/">
That's nice enough, but here's another interesting detail… tags on Records are usually symbols but aren't necessarily so. They can also be strings or numbers or even dictionaries. And very interestingly, they can also be other records:
    <<iri "https://www.w3.org/ns/activitystreams#Note">
     {"to": [<iri "https://chatty.example/ben/">],
      "attributedTo": <iri "https://social.example/alyssa/">,
      "content": "Say, did you finish reading that book I lent you?"}>
Do you see it? This Record's tag is… an iri Record! The link here points to a more precise term saying that "this is a note meant to be sent around in social networks". It is considerably more precise than just using the string or symbol "Note", which could be ambiguous. (A social networking note? A footnote? A music note?) While not all systems need this, this (partial) example hints at how Preserves can also be used to coordinate meaning in larger, more decentralized systems.
Likewise, it is also possible to tag records with integers. A system could use this to reduce the redundancy cost of tags sent over the wire by indexing tags and substituting them after reading the structures.
    @"The ordered index of tags for this session"
    [Employee, Role, Date]
    @"We could then transform this structure..."
    #set{<Employee
           @"employee name" "Alyssa P. Hacker"
           @"employee roles" #set{<Role Programmer>, <Role Manager>},
           @"when hired" <Date 2018, 1, 24>>,
         <Employee
           @"employee name" "Ben Bitdiddle"
           @"employee roles" #set{<Role Programmer>},
           @"when hired" <Date 2019, 2, 13>>}
    @"... to this structure, which in binary is 91 as opposed to 127 bytes"
    #set{<0
           @"employee name" "Alyssa P. Hacker"
           @"employee roles" #set{<1 Programmer>, <1 Manager>},
           @"when hired" <2 2018, 1, 24>>,
         <0
           @"employee name" "Ben Bitdiddle"
           @"employee roles" #set{<1 Programmer>},
           @"when hired" <2 2019, 2, 13>>}
Even in this trivial example, this is roughly a 28% reduction in the binary size (91 versus 127 bytes). Although tooling to do this does not come out of the box in Preserves, the fact that Record tags can be anything makes it possible to build this or any similar structure.
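The tag substitution itself is a small recursive transformation. Here is a minimal Python sketch, assuming a hypothetical `Record` class and hypothetical `compress` and `expand` helpers; it uses plain strings for symbols and omits annotations. It does not measure encoded sizes, and it assumes the original data contains no integer tags of its own.

```python
from typing import NamedTuple

class Record(NamedTuple):
    """Hypothetical stand-in for a Preserves Record."""
    tag: object
    slots: tuple

def compress(value, index):
    """Replace each tag found in `index` with its integer position."""
    if isinstance(value, Record):
        tag = index.index(value.tag) if value.tag in index else value.tag
        return Record(tag, tuple(compress(s, index) for s in value.slots))
    if isinstance(value, frozenset):
        return frozenset(compress(v, index) for v in value)
    return value

def expand(value, index):
    """Inverse of compress: turn integer tags back into the indexed tags."""
    if isinstance(value, Record):
        tag = value.tag
        if isinstance(tag, int) and not isinstance(tag, bool):
            tag = index[tag]
        return Record(tag, tuple(expand(s, index) for s in value.slots))
    if isinstance(value, frozenset):
        return frozenset(expand(v, index) for v in value)
    return value

TAG_INDEX = ["Employee", "Role", "Date"]
staff = frozenset({
    Record("Employee", ("Alyssa P. Hacker",
                        frozenset({Record("Role", ("Programmer",)),
                                   Record("Role", ("Manager",))}),
                        Record("Date", (2018, 1, 24)))),
    Record("Employee", ("Ben Bitdiddle",
                        frozenset({Record("Role", ("Programmer",))}),
                        Record("Date", (2019, 2, 13)))),
})

compact = compress(staff, TAG_INDEX)
print(expand(compact, TAG_INDEX) == staff)   # True
```

Both sides must agree on the tag index for the session, just as they must agree on what each tag means.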
2.5 Annotations
Annotations are not strictly a necessary feature, but they are useful in some circumstances. We have previously shown them used as comments:
@"I'm a comment!" "I am not a comment, I am data!"
Annotations annotate the values they precede. It is possible to have multiple annotations on a value.
    @"I am annotating this number"
    @"And so am I!"
    42
As said, annotations are not really data. They are merely meant for development tooling or debugging. You have to explicitly ask for them when reading, and when you do, they come back wrapped around every value.
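One way to picture "annotations wrap the values" is a wrapper type that a reader returns only when asked. The Python sketch below is an assumption-laden illustration: `Annotated` and `strip_annotations` are hypothetical names, not the API of any particular Preserves library.

```python
from typing import NamedTuple

class Annotated(NamedTuple):
    """Hypothetical wrapper pairing a value with the annotations preceding it."""
    annotations: tuple
    item: object

def strip_annotations(value):
    """Recursively discard annotation wrappers, leaving only the data."""
    if isinstance(value, Annotated):
        return strip_annotations(value.item)
    if isinstance(value, list):
        return [strip_annotations(v) for v in value]
    if isinstance(value, dict):
        return {strip_annotations(k): strip_annotations(v)
                for k, v in value.items()}
    return value

# Roughly what a reader might return for the two-annotation example above
# when annotations are requested.
with_annotations = Annotated(("I am annotating this number", "And so am I!"), 42)

print(with_annotations.annotations)          # ('I am annotating this number', 'And so am I!')
print(strip_annotations(with_annotations))   # 42
```

Code that only cares about the data works with the stripped form; tooling that cares about the annotations inspects the wrappers.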
So what's the point of them then? If annotations were just for comments, there would indeed be hardly any point at all… it would be simpler to just provide a comment syntax.
However, annotations can be used for more than just comments. They can also be used for debugging or other development-tool-oriented data. For instance, here is some game data annotated with who the "project owner" is of each object.
    <NpcCatalog "Monsters"
      #set{@<ProjectLead Alyssa>
           {name: "Ogre",
            spriteSheet: #base64{T2dyZSBzcHJpdGVzIGdvIGhlcmU=},
            attributes: #set{biped, brute, rage, clumsy}},
           @<ProjectLead Ben>
           {name: "Jackal",
            spriteSheet: #base64{V2l0Y2ggc3ByaXRlcyBnbyBoZXJl},
            attributes: #set{quadruped, swift, pack-animal, weak}}}>
Each monster described in the set is annotated with a ProjectLead record. While this is useful information for the game company's organization system, it doesn't particularly matter to code that simply reads in the data.
3 Conclusions
We've covered the broad strokes of Preserves, but not everything that is possible with it. We leave it as an exercise to the reader to try reading these examples into their languages (several libraries exist already) and writing them out as binary objects.
But as we've seen, Preserves is a flexible system which comes with well-defined, carefully specified built-in types, as well as a meta-type which can be used as an extension point.
Happy preserving!