Posts with tag "foss"

Gush: A stack-based language eventually for genetic programming

By Christopher Allan Webber on Thu 06 April 2017

(This blogpost was going to be just about a project I'm working on, Gush, but instead it's turned into a whole lot of backstory, and then a short tutorial about Gush. If you're just interested in the tutorial, skip to the bottom I guess.)

I recently wrote about possible routes for anti-abuse systems. One of the goofier routes I wrote about on there discussed genetic programming. I get the sense that few people believe I could be serious... in some ways, I'm not sure if I myself am serious. But the idea is so alluring! (And, let's be honest, entertaining!) Imagine if you had anti-abuse programs on your computer, and they're growing and evolving based on user feedback (hand-waving aside exactly what that feedback is, which might be the hardest problem), adapting to new threats somewhat invisibly to the user benefiting from them. They have a set of friends who have similar needs and concerns, and so their programs propagate and reproduce with programs in their trust network (along with their datasets, which may be taught to child programs also via a genetic program). Compelling! Would it work? I dunno.

A different, fun use case I can't get out of my mind is genetic programs as enemy AI in roguelikes. After all, cellular automata are a class of programs frequently used to study genetic programs. And roguelikes aren't tooooooo far away from cellular automata where you beat things up. What if the roguelike adapted to you? Heck, maybe you could even collect and pit your genetic program roguelike monsters against your friends'. Roguelike Pokémon! (Except unlike in Pokémon, "evolving" actually really means "evolving".)

Speculating on the future with Lee Spector

Anyway, how did I get on this crazy kick? On the way back from LibrePlanet (which went quite well, and deserves its own blogpost), I had the good fortune to be able to meet up with Lee Spector. I had heard of Lee because my friend Bassam Kurdali works in the same building as him at Hampshire College, and Bassam had told me a few years ago about Lee's work on genetic algorithms. The system Lee Spector works on is called PushGP (Push is the stack language, and PushGP is Push used for genetic programming). Well, of course once I found out that Push was a lisp-family language (traditionally hosted on lisps, though not always, and Push itself is kind of like Lisp meets Forth), I was intrigued, and so by the time we met up I was relatively familiar with Lee's work.

Lee and I met up at the Haymarket Cafe, which is a friendly coffee shop in Northampton. I mentioned that I had just come from LibrePlanet where I had given a talk on The Lisp Machine and GNU. I was entertained that almost immediately after these words left my mouth, Lee dove into his personal experiences with lisp machines, and his longing for the kind of development experiences lisp machines gave you, which he hasn't been able to find since. That's kind of an aside from this blogpost I suppose, but it was nice that we had something immediately to connect on, including on a topic I had recently been exploring and talking about myself. Anyway, the conversation was pretty wild and wide-ranging.

I had also had the good fortune to speak to Gerald Sussman again at LibrePlanet this year (he also showed up to my talk and answered some questions for the audience). One thing I observed in talking with both Sussman and Spector is that they're both very interested in thinking about where to bring computing in terms of examining biological systems, but there was a big difference in their ideas; Sussman is very interested in holding machines "accountable", which seems to frequently also mean being able to examine how they came to conclusions. (You can see more of Sussman's thinking about this in the writeup I did of the first time I ever got to speak with Sussman when I was at FSF's 30th anniversary party... maybe I should try to capture some information about the most recent chat too, before it gets lost to the sands of time...)

Spector, on the other hand, seems convinced that to make it to the next level of computing (and maybe even the next level of humanity), we have to be willing to give up on demanding that we can truly understand a system, and that we have to allow processes to run wild and develop into their own things. There's a tinge of Vernor Vinge's original vision of the Singularity, which is that there's a more advanced level of intelligence than humans are currently able to comprehend, and to understand it you have to have crossed that boundary yourself, like the impossibility of seeing past the event horizon of a black hole. (Note that this is a pretty different definition from some of the definitions of Singularity that have come since, which are more about possible outcomes of such a change than about the concept of an intellectual/technological event horizon itself.) That's a possible vision for the future of humanity, but it's also a vision of what maybe the right direction is for our programming too.

Of course, this vision that code may be generative in a way where the source is mostly unintelligible to us feels possibly at odds with our current understanding of software freedom, a movement which spends a lot of time talking about inspecting source code. The implications of that are a topic of interest of mine (I wrote about it in the FSF bulletin not too long ago) and I did needle Spector about it... what does copyright mean in a world where humans aren't writing software? Spector seems to acknowledge that it's a concern (he agrees that examples of seed-DRM and genetic patents in the case of Monsanto are troubling) but I got the sense that it's not his biggest interest... Spector thinks that the future of that side of software freedom and copyright might be that humanity realizes how absurd software copyright and patents and other intellectual restriction regimes are once things generate far enough. But I get the sense that more than talking about the legal/licensing aspects of the auto-generative future, Spector would rather be building it. Fair enough!

Anyway, somewhere along these lines I mentioned my interest in distributed anti-abuse systems. We talked about how more basic approaches such as Bayesian filtering might not be good enough to combat modern abuse beyond just spam, especially because the patterns attacks take change so frequently. Suddenly it hit me: I wonder whether genetic programs would work well in a distributed system... after all, you could use your web of trust to breed the appropriate filtering programs with your friends' programs... would it work?

Anyway, on the ride back I began playing with some of Push's ideas, and (with a lot of helpful feedback from Lee Spector... thank you, Lee!) I started to put together a toy design for a language inspired by PushGP but with some properties that I think might be more applicable to an anti-abuse system that needs to keep around "memories" between generations. (Whether it's better or not, I don't really know yet.) So...

A little Gush tutorial

At this point, Gush exists. It has the stack-based language down, but none of the genetic programming. Nonetheless, it's fun to hack around in, looks an awful lot like Push but is also far enough along to demonstrate its differences, and if you've never played with a stack-based language before, it might be a good place to start.

Let's do some fun things. First of all, what does a Push program look like?

(1 2 + dup *)

Ok, that looks an awful lot like a lisp program, and yet not at the same time! If you've installed Gush, you can run this example:

> (use-modules (gush))
> (run '(1 2 + dup *))
$1 = (9)

Whoo, our program ran! But what happened? Gush programs operate on two primary stacks... there's an "exec" stack, which contains the program being evaluated in progress, and a "values" stack, with all the values currently built up by the program. Evaluation happens like so:

exec> '(1 2 + dup *)  ; initial exec stack
vals> '()             ; initial empty values stack
exec> '(2 + dup *)    ; [=> 1] popped off from exec stack
vals> '(1)            ; push 1 (a literal) onto values stack
exec> '(+ dup *)      ; [=> 2] popped off from exec stack
vals> '(2 1)          ; push 2 (a literal) onto values stack
exec> '(dup *)        ; [=> +] popped off of exec stack
vals> '(3)            ; apply `+', which is bound to an operation which takes
                      ;   top two numbers on the values stack and adds them,
                      ;   pushing result onto values stack... 2 + 1 = 3,
                      ;   so 2 and 1 are removed and 3 is added
exec> '(*)            ; [=> dup] popped off of exec stack
vals> '(3 3)          ; `dup' takes top item on stack and duplicates it
exec> '()             ; [=> *] popped off of exec stack
vals> '(9)            ; apply `*', which is bound to an operation which takes
                      ;   top two numbers on values stack and multiplies them,
                      ;   pushing result onto values stack... 3 * 3 = 9,
                      ;   so 3 and 3 are removed and 9 is added
                      ; Nothing left to do on exec, so we're done!

Okay, great! What else can we run?

;; Complicated arithmetic runs
> (run '(3 2 / 4 +))
$2 = (14/3)

;; We can assign variables to values and then reference them
> (run '(88 'foo define foo 22 +))
$3 = (110)

;; However, base operations "know" what types they apply to, and search
;; the stack... the string will be "skipped over" in search for
;; a number here.  This means that we can randomly generate code
;; and we won't run into type errors.
> (run '(1 "two" 3 +))
$4 = (4 "two")

;; Nested parentheses will be "unnested" and applied inline
> (run '(98 ("balloons" "red") 1 +))
$5 = (99 "red" "balloons")
;; so that's the same as
> (run '(98 "balloons" "red" 1 +))
$6 = (99 "red" "balloons")

;; Variables are actually stacks!  Which means we can build up
;; complicated operations on them...
> (run '('+ 'foo define      ; set foo to '(+)
         88 'foo var-push    ; append 88, so foo is now '(88 +)
         2 foo))             ; apply variable stack foo
$7 = (90)

;; Conditionals, etc also work.
> (run '(1 1 + 'b define  ; assign b to the value of 1 + 1
         2 b = if         ; check if b is 2
           "two b"        ; if-then clause
           "not two b"))  ; if-else clause
$8 = ("two b")

There's more to it than that, but that should get you started.

How is Gush different from Push?

Gush takes almost all its good ideas from Push, but there are two big differences.

Both Gush and Push try to avoid type errors. You can do all sorts of code mutation, and whether or not things will actually do anything useful is up for grabs, but it shouldn't crash to a halt. The way Push does it is via different stacks for each value type. This is really clever: it means that each operation applies to very specific types, and if you always know your input types carefully, you can always be safe on a type level and shouldn't have programs that unexpectedly crash (if there aren't enough values on the appropriate type stack, Push just no-ops).
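
To sketch the idea (this is my own illustration in Guile, not Push's actual implementation; integer-add and the alist-of-stacks representation are made up for the example), a typed operation checks only its own stack and leaves everything untouched when there isn't enough there:

(use-modules (srfi srfi-1) (ice-9 match))

;; Each type gets its own stack; here the stacks live in an alist.
(define (integer-add stacks)
  "Add the top two values on the integer stack, or no-op."
  (match (assq-ref stacks 'integer)
    ((a b . rest)
     ;; enough integers: replace the top two with their sum
     (acons 'integer (cons (+ a b) rest)
            (alist-delete 'integer stacks)))
    ;; fewer than two integers available: leave everything alone
    (_ stacks)))

(integer-add '((integer 2 1) (float 4.5)))  ; => ((integer 3) (float 4.5))
(integer-add '((integer 2) (float 4.5)))    ; => unchanged, a no-op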

But what if you want to run operations that might apply to more than one type? For example, in Gush you might do:

> (run '(1 2 / 4.5 +))
$9 = (6.5)

In Push, you'd probably do something like this:

> (run '(1 2 INTEGER./ FLOAT.FROMINTEGER 4.5 FLOAT.+))
$9 = (6.5)

(And of course, if you wanted rational numbers rather than just floats, you'd have to add another type stack to that...)

I really wanted generic methods that were able to determine what types they could apply to. For one thing, imagine you have a program that's doing a lot of complicated algebra... it should be able to operate on a succession of numbers without doing type coercion and hit-or-miss guessing over which of several typed operators is the right one, when it could just pick one operator that can apply to several kinds of items.

I also wanted to be able to add new types without much difficulty. As it is, I don't have to rewire anything to throw hash-maps into Gush:

> (run '(42 "meaning of life" make-hash hash-set))
$10 = (<hash-table>)

This would just work, no need to wire anything new up!

The way Gush does it is with generic operators which know how to check the predicates for each type, and which "search the stack" for values they know they can apply to. (It also no-ops if it can't find anything.) If bells and alarms are going off, you're not wrong! In Gush's current implementation, this does have the consequence that any given operation might be worst case O(n) in the size of the values stack! Owch! However, I'm not too worried. Gush checks how many operations every program takes (and has the option to bail out if a program is taking too many steps), and searching the stack after failing to match initially counts against a program. I figured that if programs are auto-generated, one fitness check can be how many steps it takes for the program to finish its computation, and so programs would be incentivized to keep the appropriate types near where they would be useful. I'm happy to say that it turns out I'm not the only one to think this; unknown to me when I started down this path, there's another Push derivative named Push-Forth which has only one stack altogether (not even separate stacks for exec and values!) and it does some similar-ish (but not quite the same) searching (or converging on a fixed point) by currying operations until the appropriate types are available. (Pretty cool stuff, but to be honest I have a hard time following the Push-Forth examples I've seen.) It comes to the same conclusion: by checking the number of steps a program takes to execute as part of its fitness, programs will be encouraged to keep types in good places anyway. However! There are more reasons not to despair; I'm relatively sure that there are some clever things that can be done with Gush's value stack so that predicate information is cached and looking for the right type can be made O(1). That has yet to be proven though. :)
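
To make that concrete, here's a rough sketch of the "search the stack" behavior (my illustration, not Gush's actual code; apply-generic and its calling convention are invented for the example). The operation walks down the values stack, consuming the first items that satisfy its predicate and skipping, but keeping, everything else:

;; Walk down the stack collecting the first N-ARGS values satisfying
;; PRED; push OP's result on top and keep the skipped items beneath it.
;; If the whole stack doesn't hold enough suitable values, no-op.
(define (apply-generic op pred n-args vals)
  (let loop ((remaining vals) (args '()) (skipped '()))
    (cond
     ((= (length args) n-args)   ; found enough arguments: apply and push
      (cons (apply op (reverse args))
            (append (reverse skipped) remaining)))
     ((null? remaining) vals)    ; exhausted the stack first: no-op
     ((pred (car remaining))     ; a match: consume it as an argument
      (loop (cdr remaining) (cons (car remaining) args) skipped))
     (else                       ; not a match: step over it, but keep it
      (loop (cdr remaining) args (cons (car remaining) skipped))))))

(apply-generic + number? 2 '(3 "two" 1))  ; => (4 "two"), just like the
                                          ;    (1 "two" 3 +) example above

That walk is exactly where the worst-case O(n) comes from, and also why caching predicate information could bring the common lookup back down.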

The other way Gush differentiates itself from Push is that Gush variables are stacks rather than single values. This ties in nicely with the classic Push approach that lists are unwrapped and applied to the exec stack at the time they are to be evaluated anyway, so it makes no behavioral change in the case that you just use "define" (which will always clobber the state of the stack, whether or not it exists, replacing it with a stack whose single element is the new value). But it also allows you to build up collections of information over time... or even collections of code. An individual variable can be appended to and modified as the program runs, so you could write or even modify subroutines in variables. (Code that writes code! Very lispy, but also a bit crazier because it might happen at runtime.) Push also has this feature, but it has one specific, restricted stack for it, appropriately named the CODE stack. Why have just one of these stacks, when you could have an unlimited number of them?
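
For example, continuing in the style of the tutorial session above (a hypothetical session; the result label is just illustrative), we can grow a little squaring subroutine at runtime and then call it:

> (run '('* 'square define       ; square starts as '(*)
         'dup 'square var-push   ; now square is '(dup *)
         7 square))              ; unwraps to 7 dup *
$11 = (49)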

That wasn't the original intent behind variables being stacks, though; I only realized later that you could make each variable into a kind of CODE stack. My original intent was driven by a concern/need to be able to carry information from parent process to child process. I added a structure to Gush programs named "memories", and I figured that parent programs could "teach" their memories to child programs. This was really just a hash table of symbols to stacks that persisted after the program ended (and since run-application gives you the whole state of the program as the same immutable structure that is folded over during execution, that information comes attached to the application anyway). The idea of "memories" was that parent programs could have another program that, after spawning a child program, could "teach" the things they knew to their children (possibly by simply copying, or more likely through a separate genetic program applied to that same data). That way a database of accrued information could be passed around from generation to generation... a type of genetic programming educational system (or folklore). So that was there, but then when I began adding variables around the same time, I realized that a variable containing a single value, when pushed onto the exec stack, was (due to the way Push "unwraps" lists) exactly the same as if that value alone had been pushed onto the stack. Plus, it seemed to open up more paths by having the cool effect of letting any variable take on the power of Push's CODE stack. (Not to mention, removing the need for a redundant CODE stack!)

Are these really improvements? I don't know, it's hard to say without actually testing with some genetic programming examples. That part doesn't exist yet in Gush... probably I'll follow the current lead of the Push community and do mutation on the linearized Plush representation of Push code.

Anyway, I also want to give a huge thank you to Lee Spector. Lee has been really patient in answering a lot of questions, and even in the case that Gush does have some improvements, they're minor tweaks compared to the years of work and experimentation that has gone into the Push/PushGP designs.

And hey, it was a lot of fun! Not to mention, a great way to procrastinate on the things I should be working on...

Wireworld in Emacs

By Christopher Allan Webber on Fri 10 March 2017

It is a truth universally acknowledged, that a hacker under the pressure of a deadline must be in want of a distraction. So it has been with me; I've a TODO list a mountain high, and I've been especially cracking under the stress of trying to get things moving along with ActivityPub. I have a test suite to write, and it's turned out to be very hard, and this after several other deadlines in a row. I've also meant to blog about several things: say, the talks I gave at FOSDEM or at ChicagoLUG. I've got a leak in my inbox that's been running for so long that the basement of my email has developed an undertow. So today, instead of doing what I knew I should be doing, I went off and did something much more interesting, which is to say, I implemented Wireworld in emacs.

Wireworld in emacs screenshot

What is Wireworld? It's a cellular automaton, not unlike Conway's Game of Life. Except with Wireworld, the "cells" in play are a bit more constrained... you have a set of wires, and electrons run along them, multiply, and die out, but the paths stay the same. The rules are very simple to implement (Wikipedia says all there is to know). But you can build incredible things with it... even a fully working computer!
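
For reference, here's the whole cell rule sketched in Guile-flavored Scheme (the actual mode is Emacs Lisp, but the rules are the same):

;; A Wireworld cell is one of: 'empty, 'head (electron head),
;; 'tail (electron tail), or 'conductor (wire).
(define (next-cell cell head-neighbors)
  "Compute a cell's next state, given how many of its eight Moore
neighbors are currently electron heads."
  (case cell
    ((empty) 'empty)       ; empty stays empty forever
    ((head) 'tail)         ; an electron head always decays into a tail
    ((tail) 'conductor)    ; a tail always becomes plain wire again
    ((conductor)           ; wire fires only when exactly 1 or 2 of its
     (if (memv head-neighbors '(1 2))  ; neighbors are electron heads
         'head
         'conductor))))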

Anyway, like many hacks, this one appeared out of boredom/distraction. I had long wanted to play with Wireworld, and I was reminded of it by seeing this cool hack with a digital clock implemented in Conway's Game of Life. It reminded me just how much I wanted to try implementing that computer, or even much simpler circuitry, but I had never been able to get started, because I couldn't find a working implementation that was easy for me to package. (I started packaging Golly for Guix but got stuck for reasons I can't remember.) I started thinking about how much I liked typing out ASCII art in Emacs, and how cool it would be to just "draw out" circuits in a buffer. I started experimenting... and within two hours, I had a working implementation! Two more hours later, I had a major mode with syntax highlighting and a handy C-c C-c keybinding for "advancing" the buffer. Live hacking in Emacs is amazing!

More could be done. It would be nice to have a shortcut, say C-c C-s, that starts up a simulation in a new buffer and runs through the simulation automatically without clobbering your main buffer. (It could work the way M-x life does.) Anyway, the code is here should you want to play around.

Happy (circuit) hacking!

Hash tables are easy (in Guile)

By Christopher Allan Webber on Mon 09 November 2015

As a programmer, I use hash tables of varying kinds pretty much all day, every day. But one of the odd and embarrassing parts of being a community-trained programmer is that I've never actually implemented one. Eek! Well, today I pulled an algorithms book off the shelf and decided to see how long it would take me to implement their simplest example in Guile. It turns out that it takes less than 25 lines of code to implement a basic hash table with O(1) best time, O(1) average time, and O(n) worst case time. The worst case won't be too common, depending on how we size things, so this isn't so bad, but we'll get into that as we go along.

Here's the code:

;;; Simple hash table implementation -- (C) 2015 Christopher Allan Webber
;;; Released under the "Any Free License 2015-11-05", whose terms are the following:
;;;   This code is released under any of the free software licenses listed on
;;;     https://www.gnu.org/licenses/license-list.html
;;;   which for archival purposes is
;;;     https://web.archive.org/web/20151105070140/http://www.gnu.org/licenses/license-list.html

(use-modules (srfi srfi-1))

(define (make-dumbhash size)
  "Make a dumb hash table: an array of buckets"
  (make-array '() size))

(define* (dumbhash-ref dumbhash key #:optional (default #f))
  "Pull a value out of a dumbhash"
  (let* ((hashed-key (hash key (array-length dumbhash)))
         (bucket (array-ref dumbhash hashed-key)))
    (or (find (lambda (x) (equal? (car x) key))
              bucket)
        default)))

(define (dumbhash-set! dumbhash key val)
  "Set a value in a dumbhash"
  (let* ((hashed-key (hash key (array-length dumbhash)))
         (bucket (array-ref dumbhash hashed-key)))
    ;; Only act if it's not already a member
    (if (not (find (lambda (x) (equal? (car x) key))
                   bucket))
        (array-set! dumbhash
                    ;; extend the bucket with the key-val pair
                    (cons (cons key val) bucket)
                    hashed-key))))

You might even notice that some of these lines are shared between dumbhash-ref and dumbhash-set!, so this could be even shorter. As-is, sans comments and docstrings, it's a mere 17 lines. That's nothing.
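
For instance (a quick sketch, not part of the listing above), both procedures hash the key and fetch its bucket, so that could be pulled out into a helper:

(define (dumbhash-bucket dumbhash key)
  "Return the hashed key and the bucket it selects, as two values"
  (let ((hashed-key (hash key (array-length dumbhash))))
    (values hashed-key (array-ref dumbhash hashed-key))))

dumbhash-ref and dumbhash-set! could then pick both up with receive (from (ice-9 receive)) or call-with-values.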

We also cheated a little: we're using hash and equal? to generate a hash and to test for equality, which are arguably the hard parts of the job. But these are provided by Guile, and it's one less thing to worry about. Here's a brief demonstration though:

(equal? 'a 'a)               ;; => #t, or true
(equal? 'a 'b)               ;; => #f, or false
(equal? "same" "same")       ;; => #t
(equal? "same" "different")  ;; => #f
(hash "foo" 10)              ;; => 6
(hash 'bar 10)               ;; => 5

equal? is self-explanatory. The important thing to know about hash is that it'll pick a hash value for a key (the first parameter) for a hash table of some size (the second parameter).

So let's jump into an example. make-dumbhash is pretty simple. It just creates an array of whatever size we pass into it. Let's make a simple hash now:

scheme@(guile-user)> (define our-hash (make-dumbhash 8))
scheme@(guile-user)> our-hash
$39 = #(() () () () () () () ())

This literally made an array of 8 items which each start out with the empty list as their value (that's nil for you common lispers). (You can ignore the $39 part, which may be different when you try this; Guile's REPL lets you refer to previous results at your prompt by number for fast & experimental hacking.)

So our implementation of hash tables is of fixed size. That doesn't limit the number of items we can put into it, since buckets can contain multiple values in case of collision (and collisions tend to happen a lot in hash tables, so we come prepared for that), but it does mean we need a guess up front about how many buckets we'll want for efficiency. (Resizing hash tables is left as an exercise for the reader.) Our hash table also uses simple linked lists for its buckets, which isn't too uncommon as it turns out.

Let's put something in the hash table. Animal noises are fun, so:

scheme@(guile-user)> (dumbhash-set! our-hash 'monkey 'ooh-ooh)
scheme@(guile-user)> our-hash
$40 = #(() () () ((monkey . ooh-ooh)) () () () ())

The monkey was added to bucket 3 (counting from zero). This makes sense, because the hash of monkey for size 8 is 3:

scheme@(guile-user)> (hash 'monkey 8)
$41 = 3

We can get back the monkey:

scheme@(guile-user)> (dumbhash-ref our-hash 'monkey)
$42 = (monkey . ooh-ooh)

We've set this up so that it returns a pair when we get a result, but if we try to access something that's not there, we get #f instead of a pair, unless we set a default value:

scheme@(guile-user)> (dumbhash-ref our-hash 'chameleon)
$43 = #f
scheme@(guile-user)> (dumbhash-ref our-hash 'chameleon 'not-here-yo)
$44 = not-here-yo

So let's try adding some more things to our-hash:

scheme@(guile-user)> (dumbhash-set! our-hash 'cat 'meow)
scheme@(guile-user)> (dumbhash-set! our-hash 'dog 'woof)
scheme@(guile-user)> (dumbhash-set! our-hash 'rat 'squeak)
scheme@(guile-user)> (dumbhash-set! our-hash 'horse 'neigh)
scheme@(guile-user)> ,pp our-hash
$45 = #(()
        ((horse . neigh))
        ()
        ((rat . squeak) (monkey . ooh-ooh))
        ((cat . meow))
        ()
        ((dog . woof))
        ())

(,pp is a shortcut to pretty-print something at the REPL, and I've taken the liberty of doing some extra alignment of its output for clarity.)

So we can see we have a collision in here, but it's no problem. Both rat and monkey are in the same bucket, but when we look up a key in our implementation, we fetch that key's bucket (a list) and search it for a matching entry.
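
We can check that lookups still work for both residents of the shared bucket (the result numbers here just continue the session):

scheme@(guile-user)> (dumbhash-ref our-hash 'rat)
$46 = (rat . squeak)
scheme@(guile-user)> (dumbhash-ref our-hash 'monkey)
$47 = (monkey . ooh-ooh)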

We can figure out why this is O(1) average / best time, but O(n) worst time. Assume we made a hash table of the same size as the number of items we put in... assuming our hash procedure gives pretty good distribution, most of these things will end up in an empty bucket, and if they end up colliding with another item (as the rat and monkey did), no big deal, they're in a list. Even though linked lists are of O(n) complexity to traverse, assuming a properly sized hash table, most buckets don't contain any or many items. There's no guarantee of this though... it's entirely possible that we could have a table where all the entries end up in the same bucket. Luckily, given a reasonably sized hash table, this is unlikely. Of course, if we ended up making a hash table that started out with 8 buckets, and then we added 88 entries... collisions are guaranteed in that case. But I already said resizing hash tables is an exercise for the reader. :)

If you're familiar enough with any Scheme (or probably any other Lisp), reading dumbhash-ref and dumbhash-set! should be pretty self-explanatory. If not, go read an introductory Scheme tutorial, and come back! (Relatedly, I think there aren't many good introductory Guile tutorials... I have some ideas though!)

What lessons are there to be learned from this post? One might be that Guile is a pretty dang nice hacking environment, which is true! Another might be that it's amazing how far I've gotten in my career without ever writing a hash table, which is also true! But the lesson I'd actually like to convey is: most of these topics are not as unapproachable as they seem. I had a long-time fear that I would never understand such code until I took the time to actually sit down and attempt to write it.

As an additional exercise for the reader, here's a puzzle: is the Any Free License this code released under actually a free license? And what commentary, if any, might the author be making? :)

Activipy v0.1 released!

By Christopher Allan Webber on Tue 03 November 2015

Hello all! I'm excited to announce v0.1 of Activipy. This is a new library targeting ActivityStreams 2.0.

If you're interested in building and expressing the information of a web application which contains social networking features, Activipy may be a great place to start.

Some things I think are interesting about Activipy:
  • It wraps ActivityStreams documents in pythonic style objects
  • Has a nice and extensible method dispatch system that even works well with ActivityStreams/json-ld's composite types.
  • It has an "Environment" feature: different applications might need to represent different vocabularies or extensions, and also might need to hook up entirely different sets of objects.
  • It hits a good middle ground in keeping things simple, until you need complexity. Everything's "just json", until you need to get into extension-land, in which case json-ld features are introduced. (Under the hood, that's always been there, but users don't necessarily need to understand json-ld to work with it.)
  • Good docs! I think! Or I worked really hard on them, at least!

As you may have guessed, this has a lot to do with our work on federation and the Social Working Group. I intend to build some interesting things on top of this myself.

In the meanwhile, I spent a lot of time on the docs, so I hope you find reading them to be enjoyable, and maybe you can build something neat with it? If you do, I'd love to hear about it!

Hitchhiker's guide to data formats

By Christopher Allan Webber on Wed 21 October 2015

Just thinking out loud this morning on what data formats there are and how they work with the world:

  • XML: 2000's hippest technology. Combines a clear, parsable tree-based syntax with extension mechanisms and a schema system. Still moderately popular, though not as much as it once was. Tons of tooling. Many seem to think the tooling makes it overly complex, and JSON has taken over much of its place. Has the advantage of unambiguity over vanilla JSON, if you know how to use it right, but it's more effort to work with.
  • SGML: XML's soupier grandmother. Influential.
  • HTML: Kind of like SGML and XML but for some specific data. Too bad XHTML never fulfilled its dream. Without XHTML, it's even soupier than SGML, but there's enough tooling for soup-processing that most developers don't worry about it.
  • JSON: Also tree-based, but keeps things minimal, just your basic types. Loved by web developers everywhere. Also ambiguous since on its own, it's schema-free... this may lead to conflicts between applications. But if you know the source and the destination perfectly it's fine. Has the advantage of transforming into basic types in pretty much every language and widespread tooling. (Don't be evil about being evil, though? #vaguejokes) If you want to send JSON between a lot of locations and want to be unambiguous in your meaning, or if you want more than just the basic types provided, you're going to need something more... we'll come to that in a bit.
  • S-expressions: the language of lisp, and lispers claim you can represent anything as s-expressions, which is true, but also that's kind of ambiguous on its own. Capable also of representing code just as well, which is why lispers claim benefits of symmetry and "code that can write code". However, serializing "pure data" is also perfectly possible with s-expressions. So many variations between languages though... it's more of a "generalized family" or even better, a pattern, of data (and code) formats. Some damn cool representations of some of these other formats via sexps. Some people get scared away by all the parens, though, which is too bad, because (though this strays into code + data, not just data) homoiconicity can't be beat. (Maybe Wisp can help there?)
  • Canonical s-expressions: S-expressions, with a canonical representation... cool! Most developers don't know about it, but it was designed for public key cryptography usage, and is still actively used there (libgcrypt uses canonical s-expressions under the hood, for instance). No schema system, and actually pretty much just lists and binary strings, but the binary strings can be marked with "display hints" so systems can know how to unpack the data into appropriate types. (See the example just after this list.)
  • RDF and friends: The "unicode" of graph-oriented data. Not a serialization itself, but a specification on the conceptual modeling of data, and you'll hear "linked data" people talking about it a lot. A graph of "subject, predicate, object" triples. Pretty cool once you learn what it is, though the introductory material is really overwhelming. (Also, good luck representing ordered lists.) However, there is no one serialization of RDF, which leads to much confusion among many developers (including myself, for a long time, despite explanations to the contrary). For example, rdf/xml looks like XML, but woe be upon ye who uses XML tooling upon it. So, deserialize to RDF, then deal with RDF in RDF land, then serialize again... that's the way to go with RDF. Has more sane formats than just rdf/xml, for example Turtle is easy to read. The RDF community seems to get mad when you want to interpret data as anything other than RDF, which can be very off-putting, though the goal of a "platonic form" of data is highly admirable. That said, graph-based tooling is definitely harder for most developers to work with than tree-based tooling, but hopefully "the jQuery of RDF" library will become available some day, and things will be easier. Interesting stuff to learn, anyway!
  • json-ld: A "linked data format", technically can transform itself into RDF, but unlike other forms of RDF syntax, can often be parsed just on its own as simple JSON. So, say you want to have JSON and keep things easy for most of your users who just use their favorite interpreted language to extract key-value pairs from your API. Okay, no problem for them! But suddenly you're also consuming JSON from multiple origins, and one of them uses "run" to say "run a mile" whereas your system uses "run" to mean "run a program". How do you tell these apart? With json-ld you can "expand" a JSON representation with a supplied context to an unambiguous form, and you can "compact" it down again to the terms you know and understand in your system, leaving out those you don't. No more executing a program for a mile! (See the example just after this list.)
  • Microformats and RDFa: Two communities which have been notoriously and exasperatingly at odds with each other for over a decade, so why do I link them together? Well, both of these take the same approach of embedding data in HTML. Great when you have HTML for your data to go with, though not all data needs an HTML wrapper. But it's good to be able to extract it! RDFa simply extracts to RDF, which we've discussed plenty; Microformats extracts to its own thing. A frequent form of contention between these groups is vocabulary, and how to represent vocabulary. RDFa people like their vocabulary to have canonical URIs for each term (well, that's an RDF thing, so not surprising), Microformats people like to document everything in a wiki. Arguments about extensibility are a frequent topic... if you want to get into that, see Amy Guy's summary of things.
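
To picture the canonical s-expressions bullet above (my own toy example): every byte string is length-prefixed, so a structure has exactly one canonical serialization. The list (foo (bar baz)) serializes as

(3:foo(3:bar3:baz))

and a display hint rides along in square brackets, e.g. [10:text/plain]5:hello for the five bytes of "hello" tagged as text/plain.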

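And to picture the json-ld expand/compact dance (again a toy example; the vocabulary URI is made up), this ambiguous document

{"@context": {"run": "http://example.org/fitness#run"},
 "run": "a mile"}

expands to

[{"http://example.org/fitness#run": [{"@value": "a mile"}]}]

so two systems compare full URIs rather than the bare word "run", and each can compact back down to whatever short names its own context defines.
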
Of course, there's more data formats than that. Heck, even on top of these data formats there's a lot more out there (these days I spend a lot of time working on ActivityStreams 2.0 related tooling, which is just JSON with a specific structure, until you want to get fancier, add extensions, or jump into linked data land, in which case you can process it as json-ld). And maybe you'd also find stuff like Cap'n Proto or Protocol Buffers to be interesting. But the above are the formats that, today, I think are generally most interesting or impactful upon my day to day work. I hope this guide was interesting to you!