This article is long, and it's taken me a long time to get to. I procrastinated enough that the only way I could do it was to write it as [social network] [threads] first, and this is a compilation and expansion of those ideas.

But still, this post is also too long, so here's a tl;dr (too long; didn't read) summary:

• Spritely Goblins has made significant headway with CapTP, a design referred to as the "Capability Transport Protocol"
• CapTP is not original to Spritely, but Spritely is making some improvements; various implementations have been around for over two decades
• CapTP enables secure, distributed object capability programming in mutually suspicious networks
• CapTP reduces the work of writing "protocols" to merely thinking about writing ordinary programs
• CapTP has cool features like "distributed acyclic garbage collection" and "promise pipelining"
• We plan a general version of CapTP which can be used across many different languages (and we're already talking with the Agoric folks about interoperability with their JavaScript ecosystem); a ways out, but on the horizon
• Nearly everything cool that comes out of Spritely in the future will use CapTP as part of its foundation

Okay, now if you want the verbose version, read on.

# WHITEBOARD IMAGE HERE

For a while I've had the following whiteboard hovering around my desk in my office. It has unintentionally served as a fun conversation starter in some video calls, but its real purpose was to help me think through some difficult implementation problems. If you've been following me on the fediverse or on Twitter you probably have a guess as to what this is, because I've been saying it over and over again: this diagram represents [CapTP]. I've been making major strides lately in the CapTP implementation in [Spritely Goblins] and posting about it as I go.
I've even made wild claims like "I think CapTP is the most important work I've done yet in my life" (yes, in the long run, I think it will be a bigger deal than [ActivityPub]'s standardization work, which I am very proud of, but the two are not at odds; the plan for Spritely is that the two [live side by side]). But while I've made it very clear that /I am/ excited, I haven't done a very good job of explaining why /others/ should be excited.

So the real question is: what does CapTP do? Or more importantly, what does CapTP enable us to do? I don't think I've done a good job of explaining this so far, so this blogpost is an attempt to do better.

Admittedly, even the best attempt here might not succeed until people get to use it. I used to have /the hardest/ time explaining to anyone what ActivityPub was, even though I used mostly the same language then that I do now; suddenly, when it started gaining major adoption, it was as if everyone I was talking to got it, and my life became much easier. I suspect a similar thing will happen with CapTP… at the moment, a lot of what I am going to say will sound abstract and maybe even barely believable. As it turns out, simply [talking at people] is a rather horrible way to get people to understand concepts, but [showing people] is another matter. A [small demo] does exist, but the big CapTP demo is yet to come. Nonetheless, this blogpost will attempt to explain in plain (or plain-ish) language what CapTP can enable us to do.

Let me speak at a high level first. Spritely makes the claim that we are aiming for /the ability to/ build such rich things as distributed virtual worlds. That requirement is not there just because distributed virtual worlds are fun things to work on or use (though they are), but because right now the social and computing systems we have tend to be /insufficiently/ capable of performing the kind of rich and secure interactions that this goalpost requires.
CapTP becomes the foundation for making this and many of the other spaces Spritely aims to work in.

But I'm already multiple paragraphs in with no explanation yet, so here we go. Are you ready? CapTP is a protocol to allow "distributed object programming over mutually suspicious networks, meant to be combined with an object capability style of programming".

Wow, clear as mud, right? There's a lot to unpack there! So let's break it apart piece by piece…

First of all, "distributed object programming": Yes, CapTP allows you to program across networks with the same level of convenience as if you were programming against /local objects/. (Throw out [your assumptions of "objects" as OOP], this can be functional; Spritely Goblins more or less is.) This is done by creating local proxies representing remote objects that the programmer can operate against. This has been done wrong many times in the past (e.g. the NeXT model); doing this right is the result of [significant research]. But the result is significantly better programmer ergonomics in building distributed systems.

Next, "mutually suspicious networks": there's no assumption that trust exists on server boundaries… CapTP is built to allow collaboration /without/ full trust. Curiously, this approach allows for /increased/ collaboration and building of more trust; collaboration is more consensual.

This is no small matter. To draw parallels to non-computing life, I feel safer knowing I do not need to trust all people equally and with the same things in my own life… it is important to permit building the /appropriate/ level of trust, rather than an /absolute/ level of trust, in all parties. We do this in our daily lives, but our computing systems are generally not privy to all of our thoughts (that too might result in trust violations) and yet must act on our behalf. The ability to scope the amount of trust permitted means living a life of greater collaboration, less paranoia, and less distrust.
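To make the "local proxies representing remote objects" idea from the first piece a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `Counter`, `Exporter`, and `Proxy` are hypothetical, the "network" is just a function call carrying JSON text, and calls are synchronous, whereas a real CapTP implementation such as Goblins works with asynchronous messages and promises. It is not how Spritely's code looks; it only shows the shape of programming against a proxy.

```python
import json


class Counter:
    """An ordinary local object; nothing about it knows it is being shared."""

    def __init__(self):
        self.count = 0

    def increment(self, by=1):
        self.count += by
        return self.count


class Exporter:
    """Hypothetical far side: receives serialized messages and delivers them."""

    def __init__(self):
        self._objects = {}

    def export(self, obj):
        # Hand out a small integer instead of the object itself.
        ref = len(self._objects)
        self._objects[ref] = obj
        return ref

    def deliver(self, wire_message):
        msg = json.loads(wire_message)
        if msg["method"].startswith("_"):
            # Only public methods are reachable through the proxy.
            raise PermissionError("private methods are not exported")
        target = self._objects[msg["ref"]]
        result = getattr(target, msg["method"])(*msg["args"])
        return json.dumps({"result": result})


class Proxy:
    """Hypothetical near side: looks like the object, but only sends messages."""

    def __init__(self, send, ref):
        self._send = send
        self._ref = ref

    def __getattr__(self, method):
        # Called only for names not found normally, i.e. the remote methods.
        def call(*args):
            wire = json.dumps(
                {"ref": self._ref, "method": method, "args": list(args)}
            )
            return json.loads(self._send(wire))["result"]

        return call


# The "network" here is a direct function call standing in for a connection.
far_side = Exporter()
counter = Proxy(far_side.deliver, far_side.export(Counter()))
first = counter.increment()
second = counter.increment(5)
```

The calling code at the bottom never mentions serialization or connections; it only calls `counter.increment(...)`, which is the ergonomic point being made above.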
The decision to not assume the need for trust on machine/server boundaries may also seem surprising, but is important. If you've ever tried to configure CORS, you'll be aware of how hard and error-prone this is. Even the most advanced security architects find themselves frequently making mistakes in this area. But making decisions based on node boundaries seems like a strange system if we think too long about it anyway. In general (though it often requires much social un-conditioning), I try not to draw trust boundaries in a way that treats all members of one nation-state the same. Similarly, there are many households whose members I trust to varying degrees and with different things. So the machine boundary seems like a poor indicator of trust.

It is an even poorer one when we examine the needs of fully peer to peer systems. Server-boundary-oriented systems with only a small number of trusted servers barely scale in the post-web-2.0 era of increasing consolidation of the web into just a few service providers. They cannot stand up when making new nodes is extremely trivial. So, nodes are mutually suspicious and do not hand out access to each other simply because they happen to be on trusted lists of server identities.

So how is authority handed out? This is where "combined with object capability style of programming" comes in: this combination is where the power really comes out. *Safe, cooperative interaction* is very /easy/ in the ocap style: it turns out capability flows can be encoded as normal programming: argument passing and scope! This is the fundamental observation of [A Security Kernel Based on the Lambda Calculus]; if we take our models of programming security, within them is the best security model we have, and the easiest for programmers to reason about. CapTP takes that observation and applies it on the network level. A nicely implemented CapTP system will abstract this for programmers so they can focus on the programming part. It wasn't handed to you?
Then it's not in your scope and you can't access it.

This simplifies program construction dramatically. Recently [I did a 250 line client/server p2p chat "protocol"] (well, 250 lines for the protocol, a mere 300 lines more for the GUI), but I didn't really have to think about the protocol at all. In fact I designed it locally first, in one process; it "automatically" worked over the network, but that's because CapTP took care of the network considerations for me.

By contrast, in most programming systems, an /enormous/ amount of time is spent on protocol design and APIs, which tend to be bespoke and mostly disconnected from the actual implementation. They also tend to be made of many moving parts which are hard to reason about. We know that building good abstractions can lead to significant gains in programmer productivity; TCP and TLS are clear examples of this. CapTP, when combined with object capability security systems, brings a similar type of abstraction gain; the more tedious parts of protocols are handled in a general way, and we can focus on the specific ways our programs work and need to communicate. In general this will often correspond to what we would have put at the API perimeter anyway, but now we need less confusing wiring to do it.

Okay, if you've really read this far already, time for an intermission. Spritely has not invented CapTP (though it is helping with some of the innovations which have been planned for this generation). The idea is somewhere around two and a half decades old and was part of the [E programming language] (which I often jokingly call "the most interesting programming language you've never heard of"). E actually came out of another distributed virtual worlds system of the late 90s, [Electric Communities Habitat]. Even though EC Habitat did not make it out of the dot-com crash, E did, and lived on as an open source project.
(It's no exaggeration to say that the vast majority of the space Spritely is exploring comes out of work that was trailblazed most especially by E; I can't recommend [Mark Miller's dissertation] enough.) In-between then and now, CapTP has seen several variants (maybe the most famous of which is [Cap'N Proto], which in some ways I think of as a mostly-CapTP for people who don't know [they're using a CapTP], and whose [rpc.capnp] was of enormous help to me in learning how CapTP works).

Another funny thing happened in-between E's CapTP and now: most of the E and ocap folks joined JavaScript's standardization efforts and over the course of the last decade and a half or so have helped beat it into a suitable shape to finally also achieve the distributed object dream. Most of those folks have gone on to start an organization named [Agoric] (whose namesake goes back to [The Agoric Papers], which laid out the vision for all this work all the way back in 1988 (holy cow!)), which is just now bringing that dream of distributed ocap networks to JavaScript land and beyond (with a bit more focus on economic systems in contrast to Spritely's focus on social networks).

Which may lead you to ask: shouldn't Agoric and Spritely's CapTPs interoperate? I'm happy to say: [we're already talking, and that's the plan]. In fact, as part of my ([documented, but long]) process of learning CapTP, I submitted a [PR adding some comments to Agoric's implementation] since I was reading and trying to make sense of it anyway (happily, it was merged). It's a long-term high priority (but not an urgent short-term priority) for both sides to implement the same protocol. When this happens, it won't matter whether you're using Lisp'y Spritely code or JavaScript'y Agoric code; you should be able to do distributed object programming in each and both should interoperate happily.

Okay, back to the cool features that CapTP provides.
CapTP is very /efficient/: you may have used ocap systems that have huge certificates or long URIs. In CapTP a shared capability is merely a /bidirectional integer assignment/ between the machine importing and the machine exporting! (But users need not be aware of this, since again, a well designed CapTP interface encapsulates the underlying semantics in the same way that good TLS and TCP libraries encapsulate the layers of encrypted and ordered network connections.)

CapTP also has distributed acyclic garbage collection. That means that two servers can collaborate to say "oh yeah, thanks for giving me that object, but you don't need to hold onto it any more on my behalf." Wow! (The original Electric Communities proto-CapTP even could handle [collecting cycles that span machines]; this seems to require more significant support from the underlying language runtime than most provide. This turns out to be rarely needed anyway and is definitely a deeper rabbit hole than the already deep rabbit holes this blogpost has gone down, so we will save that for another time.)

Why should you care about distributed acyclic GC though? Let's put it another way: imagine you were building some sort of distributed role playing game. Your players are regularly fighting bats, which are cheap enemies that generally don't stick around long. It doesn't take long for your system to be bogged down by bat corpses! CapTP helps solve this problem by allowing servers to cooperatively know when they no longer need object references held on their behalf. (Before you start asking about non-cooperative scenarios, there are abstraction layers for that too, but we won't worry about those in this particular post.) This is a big win that few other protocols provide, but which is cheap and efficient under CapTP.

CapTP also has ["promise pipelining"], which reduces round trips.
I can send a message to a remote car factory and ask it to drive the car once it makes it, even before I've been told the car is made! (Spritely, Agoric, and even E all have tooling that makes this look like a "natural" code flow as well.) To quote [Mark Miller's dissertation]:

"Machines grow faster and memories grow larger. But the speed of light is constant and New York is not getting any closer to Tokyo. As hardware continues to improve, the latency barrier between distant machines will increasingly dominate the performance of distributed computation."

All in all, this reduces the amount of work for rich, networked collaborations with safety properties we can reason about from something which only protocol hyper-experts are deemed worthy to consider, to something that we mere mortals can think about. Focus on writing your code and think about where access is being passed around on that layer.

Now let's discuss the current state of things.

Some of the biggest pieces of the CapTP puzzle have recently landed. One of those, "handoffs", is what was being puzzled over on the whiteboard shown earlier; since much of CapTP's efficiency and operation comes from the local, pairwise meaning of integers between two machines, handing machine B a capability that machine A holds to an object on machine C is a tricky process. We now have a certificate-oriented solution that has gone through some community review (though we would like more, and more will come at protocol codification time for certain) and is also in alignment with the plans expressed by the folks over at Agoric, so this is good news: getting the "key features" of CapTP in is (mostly) no longer a blocker for fleshing out other layers of Spritely.
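The "local pairwise meaning of integers" is easier to see in a small sketch. The Python below is an illustration under stated assumptions, not CapTP's actual tables or wire format: the `Connection` and `Thing` classes and their method names are hypothetical, and real implementations also track imports, reference counts, and promises. It shows machine C keeping a separate export table per peer, why an integer that means something between A and C means something else (or nothing) between B and C, and how a cooperative "drop" lets C stop holding an object on a peer's behalf.

```python
class Thing:
    """A stand-in for any object living on machine C."""

    def __init__(self, name):
        self.name = name


class Connection:
    """Machine C's view of one pairwise session (hypothetical, simplified)."""

    def __init__(self):
        self._exports = {}  # integer -> local object, valid on this connection only
        self._next_id = 0

    def export(self, obj):
        """Share obj with this peer; the peer only ever sees the integer."""
        for ref, existing in self._exports.items():
            if existing is obj:
                return ref  # already shared, reuse the same integer
        ref = self._next_id
        self._next_id += 1
        self._exports[ref] = obj
        return ref

    def lookup(self, ref):
        """Resolve an integer sent by this peer back to the local object."""
        return self._exports[ref]

    def drop(self, ref):
        """The peer says it no longer needs ref, so stop holding it for them."""
        del self._exports[ref]

    def live_exports(self):
        return len(self._exports)


# Machine C talks to A and to B over two separate connections.
c_to_a = Connection()
c_to_b = Connection()

bat = Thing("bat")
chest = Thing("chest")

bat_ref_for_a = c_to_a.export(bat)        # A now refers to the bat by integer
chest_ref_for_b = c_to_b.export(chest)    # B refers to the chest by integer

# If A simply told B "use my integer", B would reach the wrong object on C.
what_b_would_get = c_to_b.lookup(bat_ref_for_a)

# Cooperative GC: the bat is defeated, A drops its reference, C lets go.
c_to_a.drop(bat_ref_for_a)
```

Here both integers come out the same even though they name different objects, which is exactly why a handoff needs more than forwarding a number, and why the certificate-oriented approach mentioned above exists.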
At the moment, Spritely has actually gotten a bit ahead of Agoric in terms of /current implementations/ of CapTP (this won't be the case for long though, it's just a matter of time until we're at feature parity); we already have distributed acyclic garbage collection, handoffs, and shortening/unwrapping of object references that return home.

This is easily explained. The Agoric folks have actually implemented such features in past systems, so they truly are the main innovators in this space (though some of these features have been broadly planned to be done differently in this generation of CapTP, and Spritely's implementation has proceeded by implementing those new approaches… this is probably Spritely's main source of innovation CapTP-wise). But Agoric is also working on helping JavaScript standardize the appropriate tooling to make all these pieces possible in such runtimes, whereas Spritely's choice of lisp'y languages, while more obscure, puts it in a space of languages built for language design enthusiasts, so all of those pieces were already there. We should be glad that Agoric is doing the hard work to bring these tools to a wide audience through JavaScript standardization; this is hard work (trust me, I'm not a stranger to standards work, but language standardization requires /extra/ care).

Additionally, the differing focuses of Spritely and Agoric mean we're both working on similar things with different priorities: Agoric is more focused on things that are economic-oriented and so has made enormous strides in the features that are needed for this area, whereas Spritely has made innovations in the areas necessary for distributed virtual worlds and social networks. In the future we should see feature parity on the CapTP layer; this is also a big win because it means that users of either system can leverage the features of the other side without having to disagree over /which/ language is the right foundation.
This also means that we have a way to collaborate even with us focusing on different pieces of the end-user puzzle. (I.e., Spritely need not be focusing on the economic layer right now… once we have CapTP interoperability, we'll already have a bridge into that world thanks to the hard work of the Agoric folks! And the Agoric folks can also benefit from our work on the social side of things.)

But Spritely's CapTP implementation still needs some cleanup and to be documented. (I did my best to [document my process of learning and implementing CapTP as I went], though forewarning, the linked mailing list thread is full of many twists and turns. In a cleaned up primer, introducing the core concepts should be much simpler.) This work will begin soon-ish, but again, while Spritely and Agoric are in close conversation, both also have pressing matters to attend to which precede this work. But likely you'll hear more about this soon-ish.

The more exciting and immediate thing to do is to start building demos. Longform textual explanations are well and good, but "seeing is believing" (and shinier demos are better; the [Terminal Phase time travel demo] showed off features that had existed for some time in Goblins, but was the first time I saw people raising their heads and saying "gosh, wow, what's happening /over here/!"). CapTP is only exciting because it's a powerful /foundation/ for what's to come. I fear this blogpost, long and rambly as it is, still will not capture minds in the appropriate way. Hopefully by building demos people can get a sense and feeling that indeed, something truly interesting is going on here, something they really want to use.

I believe Spritely's future is bright, but part of this is because of the long and hard work on its architectural foundations. Those pieces are coming together, and CapTP is probably the most shining pillar of all of those. At the beginning of this process, CapTP was a strange and mysterious thing, yet with seemingly alluring powers.
At present, those alluring powers have shown themselves true and are increasingly available to the Spritely system. In time as CapTP is codified, we aim to chip away at the strange and mysterious component, distributing its power to all.

[social network] [threads] [CapTP] [Spritely Goblins] [ActivityPub] [live side by side] [talking at people] [showing people] [small demo] [your assumptions of "objects" as OOP] [significant research] [A Security Kernel Based on the Lambda Calculus] [I did a 250 line client/server p2p chat "protocol"] [E programming language] [Electric Communities Habitat] [Mark Miller's dissertation] [Cap'N Proto] [they're using a CapTP] [rpc.capnp] [Agoric] [The Agoric Papers] [we're already talking, and that's the plan] [documented, but long] [PR adding some comments to Agoric's implementation] [collecting cycles that span machines] ["promise pipelining"] [document my process of learning and implementing CapTP as I went] [Terminal Phase time travel demo]