Recent Posts -- Dustycloud Brainstorms

[<-Previous] [--latest--] [--archive--] [Next->]

The Little Learner and hotel room hacking

By Christine Lemmer-Webber on Fri 20 June 2025

Right now I'm reading The Little Learner. Like how The Little Schemer introduces fundamental concepts of how computing and programming languages work, and The Reasoned Schemer teaches logic programming, and in each you build your own implementation of the language in question at the end, The Little Learner is like that but for deep learning neural networks.

I tend to be vocally skeptical of the hype around LLMs right now, so it may seem strange that I'm working through this book, but the truth is that I mostly think that LLMs are insufficient in many ways, and that there's more to the puzzle, but not having the complete puzzle is a dangerous situation for society. More on this in an upcoming blogpost.

But I wanted to really actually understand things on an algorithm level before I said the things I am thinking, and the Little Books have introduced me to multiple deep subjects, and I thought I'd give this one a try. It seemed pretty difficult to believe that they'd be able to explain something that's much more statistical and what I largely thought of as a big ol' number soup, but they've done it quite well.

I have my MNT Pocket Reform out and I've been working on it through that. It's nice insofar as, well, with my normal laptop out it's easy enough to get distracted, but the Pocket feels focused.

Originally I was using Racket and DrRacket, I figured I'd try to go through the book using the Racket package malt. But for whatever reason, DrRacket is terribly slow on this device despite actually having a reasonably powerful board on it, and the load time for setting up Malt seems to be huge. Maybe Racket performs slowly on ARM64? I'm not actually sure. But anyway, eventually I decided, they have the implementation at the back of the book in an appendix (well actually they have two versions, Appendix A for the simpler but slower version, and Appendix B for the faster parallelizable one). So I started typing in the implementation in the appendix into my own little Guile modules, and that turned out to be a good idea, because I am already getting a better sense of what's going on.

And to my delight I realized that their neural network kernel is a metacircular evaluator! (If you don't know what this is but are now very curious, you might try reading A Scheme Primer, which I think does a good job of introducing the ideas towards the end, but hey I'm biased.) There's the equivalent of eval and apply in there, but the arguments to functions are basically matrices of floating point numbers, but that idea is there! And as the metacircular evaluator evaluates things, it "updates its intuitions" about its statistics. Well, anyway, that's how I see it.

The point is: eval/apply will never die.

Anyway, the book is helping me understand, respect, and also better see the limitations of these systems. My opinions on a social level haven't changed much, but my vision for how to forge a better future than the present are congealing.

And in the meanwhile, I'm in a beautiful city, and I suppose I should be out exploring more. Instead I've locked myself mostly in this hotel room, venturing out occasionally for boba tea to fuel my hacking sessions, locking myself in again. It's not even a great hotel experience, I'm staying in hotel dorms (and the University wifi's nanny firewall, hilariously, blocks this very website, who knows why).

But it's so hard to find hack time on things like this anymore. I've spent many a vacation with some time head down, doing the kind of programming and research I wouldn't be able to do day to day. I guess it feels silly. But I'm still venturing out, wandering around, just a little bit.

But I'm here for a wedding, one of a good friend, who loves this kind of stuff also. That makes it feel a little bit more appropriate.

Anyway, rambly post. Something more of substance about these topics is coming soon. I should finish getting ready! The wedding is tomorrow, but people are gathering tonight, and I should leave soon. As excited as I am to hack, I am also excited to step out and spend time with friends. Off I go!

Ode to Cleaning Robots (a song)

By Christine Lemmer-Webber on Fri 13 June 2025

Close to a year ago I wrote about making two songs in Milkytracker. Well today, I have a new one! An Ode to Cleaning Robots:

(Here's a direct link, if the audio tag above doesn't work!)

And here's the Milkytracker source file should you want to look at it. This song is released as CC BY-SA 4.0 International, so have fun with it.

The goal here was to test out the new Milkytracker synthesizer, which is pretty good! All the samples in this song were generated within Milkytracker itself. Less surprising for the robot noises, but also the "strings" too.

I feel like I am getting better at making music; things are starting to feel more layered and like the kind of music I actually want to put out. It's nice! Milkytracker remains a comfortable place for me to do it.

I have a pretty clear vision in my head about a music video that could go with this one, and I'd like to do it, but who knows if I'll have time. Probably not!

There's so much else going on, I keep thinking "I need to blog, I need to blog, I need to blog about it" but I haven't been writing about nearly anything. But hey, I wrote about this song! Hope you enjoy it.

Crystal Chariot: a poem for the moment

By Christine Lemmer-Webber on Sun 27 April 2025

Held apart
across a crystal divide
A chariot holding me
made of glass

I should be so lucky
as to ride inside this thing

I can see
but I cannot touch
I can travel
and I cannot go home

How long until it shatters?
What will happen as I fall?

The DIY FOSS cyborg

By Christine Lemmer-Webber on Wed 15 January 2025

Do you feel the allure of becoming a cyborg? Are you one of those people who loves computing, but hates what computing has become? Do you wish to become one with your computing environment, without having to give into an immersively curated dystopian corporate version of computing we experience today? Throw off the shiny shackles of Apple, dismiss the metaverse garbage Mark Zuckerberg is trying to sell you. We are going for something simpler, something more powerful, something only beholden to you.

Zacchae's ergonomic keyboard, hanging off his hips

And also something a little bit clunkier. But in a good way. The way that installing your first Linux distro on that old laptop felt. Freedom, in a dorkier, clunkier, but more liberating sense.

Encompassing, but configurable. Fully embracing computing as an extension of you, but within your control.

My friends. That future is here. I have met the DIY FOSS cyborg. He lives not in an overly saturated 3d graphical environment, wears not an awkward bucket over his head. No, we are talking about a fulltime Linux and Guix and Emacs cyborg, living with an org-mode overlay over his eyes. The hacker's cyborg.

And lo, I bring you the good news: you too can become such a cyborg. The technology is here today! And it is far simpler, and far dorkier (in a delightful way). And you can have it. You too can become a DIY FOSS cyborg.

Meeting Zacchae, the Emacs and Guix dadcore cyborg

It's best to have Zacchae introduce himself and his own setup. I will do so in a moment, and I even have a video. You can skip ahead to it in the next section if you like. But if you are willing, perhaps indulge me in a story.

When I first met Zacchae, it was at DWeb Camp earlier this year. The experience had a strange religious overtone to it.

DWeb Camp is always overwhelming to me, and I forget just how overwhelming until I am there. "See you Sunday, Christine!" said Dave Thompson (Spritely's CTO), as we pulled up our rented car onto the gravel parking lot of the campgrounds. I barely got out a "What do you mean?" before I found myself consumed by the nonstop series of conversations, presentations, and lying face-down in the darkness in our cabin trying to recover social spoons in-between the above.

On the first night of DWeb Camp, people were gathered around. There was a bonfire. The redwoods loomed large.

Zacchae approached me. "I am running Emacs and Guix on my computer. I have org-mode projected over my vision and I can access it wherever I wander. Would you like to see?"

I stood there, dazed, as the fire flickered shadows about. "I am interested, but I cannot possibly mentally process this right now. Please ask me tomorrow."

Zacchae nodded politely and left me to lesser conversations of the evening. Protocols, the ethics of decentralization. Simpler, more familiar topics for my mind in such a state.

The next day passed. More socializing, nonstop. Presentations. We ran a booth where we showed off the "Spritely Arcade", playable demos of our tech while we explained what the ramifications were to interested audiences. Our booth was so popular it jammed the space and we were demonstrating and talking for two hours straight.

We packed up. I wandered, dizzy, over to the fire to socialize again, unable to speak or think about even the slightest of technical topics.

Zacchae approached again. "Hello. Would you like to look into the display now? It is very easy." He removed the overlay from his eye and offered it to me, hand outstretched.

Zacchae's eyepiece

But my brain could not process anything. I understood this was something I wanted to see, but I could not quite comprehend it.

An offer twice refused. Zacchae nodded politely and let me resume simpler conversations.

In the morning of the third day I had another presentation, jointly given with Dave. And then I was done! I was free of all the main things I was scheduled to do and say. I felt elated. Dave and I high-fived. We could enjoy the conference now.

I headed down to the center of the event to socialize with some friends. Lunch would be starting soon. I was feeling bubbly with the relief of being free of obligations.

Zacchae approached once more and said, "Hey, just curious, would now be a good time to look at the Emacs and Guix computer thing?" But by this time I was a little bit irritated. My brain went into the mode of "this is someone who I have brushed off several times, probably it's because I am avoiding them" and I politely said "I'm busy but why don't we get in contact later?"

Zacchae nodded. "Oh sure. Could you give me your email?" I rattled it off and nearly turned away.

But it was an offer twice refused. A third refusal was not possible, try as I foolishly may have.

Zacchae stared into the distance and started typing at his hips. My brain jolted to reality.

This is not how a person normally interacts with a computer! my brain said, kicking me in the metaphorical shins. This is a thing you have been waiting to experience your whole adult life!

"Oh wait, holy shit, no... you've got to tell me about this! You've got to show me!"

And Zacchae did. And it was so exciting I ran off to gather the group of friends who I knew would all be equally excited. And we gathered around as he explained to us how his system worked.

Meet Zacchae's setup

I promised you that I would let Zacchae explain his computer setup himself. Luckily, I caught it on video! Here it is!

Watch on Peertube or on YouTube

When you look at it, it's astounding how simple the whole damn thing is. These are off the shelf components! A popular ergonomic keyboard. A Linux phone. A pile of extra batteries. And what's most exciting of all: the fact that we are now entering an era of technology where heads-up displays are ordinary HDMI devices!

It's hard to not be overwhelmed with the aesthetic shape of Zacchae's design. Earlier I said it was dadcore, and I think that this is true. To me, Zacchae's computer evokes similar feelings to utilikilts, swiss army knives, Linux User Groups, hackerspace projects, a shoebox of Slackware floppies, or that desktop computer your friend always left the panel of slightly unhinged because they were always opening it up to mess with it a little bit more.

It feels like an era of computing forgotten. But when you blow off the layer of accrued dust, it also feels like an era of computing vision that has nearly been lost. And here it is, afresh!

The truth is that wearable and mobile computing has been long important to me. But over time, I have become disillusioned with it. In highschool and early college, I had a series of PDAs; Palm Pilot with a fold-out keyboard, then a Sharp Zaurus which was much less useful but way cooler for actually running Linux on it. I bought an OpenMoko and then an n900; the former was a brick that never worked, but the latter was hands-down not only the greatest smartphone I have ever used, but the only one I have enjoyed using. It felt like we were on the verge of something wonderful: computers available to people in every moment of their lives, but also the freedom of control seemed more possible than ever in those days.

But the days of the Linux User Group ended, and the first big entry into wearable computing for the mass market was the creepy Google Glass, which felt like being scanned nonconsensually under Google's surveillance ray every time someone wearing one looked at you. Apple brought out the iPhone and it took over and defined the vision of smartphones, which lost any sort of useful keyboard, and critically lost the ability to install and do anything useful with them. The FOSS world pivoted away from the Debian-based n900 to Google's Android for the mobile world, which had the veneer of FOSS but without ever seeming to meaningfully deliver real practical user freedom. And I got bitter and disillusioned with my dreams.

To see Zacchae's FOSS cyborg setup revived all my interests. I'm not interested in buying a corporation's idea of a software + hardware "experience". I don't need to "click" on a remote projected object by blinking in a particular way or tapping some awkward controllers which make the Wiimote look like a comfortable experience. By god! I just want to run Emacs over my vision! I want org-mode everywhere! I want to run a terminal! I want to program! I want to have access to all my tools!

I have mentioned the dadcore + fosscore energy of Zacchae's setup a couple of times now. Well, I am fine with the fosscore side, but I can't deal with the dadcore side; that would be too dysphoric to me personally. But as Zacchae said, the main work of putting together his design is sewing. And that hints at the ability to be aesthetically adjusted to one's purposes.

What about FOSScore techno-witchocracy? Now there's an aesthetic I can get behind.

In many ways, I am a software person, not a hardware person. But I can open up a computer, I can tinker with it, even if I don't often. For a while though, I have lamented the state of hardware, as everything has gotten more locked down, more miserable, I have felt increasingly like "I wish I could live in software only... that hardware could fade to the background. I wish using my computer felt like using my computer again." Computers have become sleek, like jewelry. I don't mind jewelry; I wear it myself. But I don't want computers to only be jewelry. I want to feel empowered when I sit behind them. I don't want something closed off from me. And so more and more, I feel like being part of computing is disincentivized. I am walled off from my computing experience.

For the first time in ages, this feeling has been changing for me. MNT's computers feel community-oriented, accessible, usable in a way I miss, even if they do still feel like they're in the "enthusiast who is willing to get their hands dirty" way. I wrote a bit about my MNT Pocket Reform; even more excitingly the MNT Reform Next looks like we are starting to achieve the direction of a computer that can be hacked on and modified by others but which increasingly looks like a direction I can recommend others pick up and use. MNT's open hardware computers feel hackable and extensible, and like they have a future behind them. And feeling like computing has a future, or even a present, is something I desperately need right now.

Zacchae's designs feel future-facing in a different way, dare I say futuristic, kind of. It's a kind of retro-futurism that felt destroyed by corporate visions that locked users out and pushed them to the side instead of inviting them in. I think both of these directions, building computers that are community-oriented versions of the kinds of computing form factors in use today, and form factors that are future-facing, are worth pursuing.

And the fact is, components are finally getting cheap enough and feasible enough that the future can be here with just a bit of vision. Since recording that video a few months back, Zacchae has told me that he's now using a different output device, the XReal Air 2, which he says have been "life changing". (Zacchae also told me he has skateboarded around San Francisco while typing on his computer using these devices also... not something I can recommend the average person do or which I ever would, but I think that means we've fully achieved the clunky-FOSS version of cyberpunk movies of the 80s and 90s.) While these devices are sold with all the never-quite-living-up-to-reality augmented reality visions that a certain subset of humanity is just wild about hitting, but for me, I'm really just excited that this is an HDMI monitor that can overlay over your vision and be carried with you everywhere. And it's plug-and-play... well, heck, all of the components Zacchae has shown off are fairly modular, pluggable components. The tech is here; the big vision is in bringing them together.

And if you're interested in learning about Zacchae's setup, then good news! He's published his setup on his website. It's all there!

And so it is. I have been sketching designs on paper and soliciting the thoughts of my wife Morgan (who is far more of a textile witch than I am). I am plotting, and I am scheming my own Guix Emacs fosscore techno-witchocracy cyborg conversion.

And I am, at last, excited about computers all over again.

Re: Re: Bluesky and Decentralization

By Christine Lemmer-Webber on Fri 13 December 2024

A few weeks ago I wrote How decentralized is Bluesky really?, a blogpost which received far more attention than I expected on the fediverse and Bluesky both. Thankfully, the blogpost was received well generally, including by Bluesky's team. Bryan Newbold, core Bluesky engineer, wrote a thoughtful response article: Reply on Bluesky and Decentralization, which I consider worth reading.

I have meant to reply for a bit, but life has been busy. We launched a fundraising campaign over at Spritely and while I consider it important to shake out the topics in this blog series, that work has taken priority. So it is about a week and a half later than I would like, but here are my final thoughts (for my blog at least) on Bluesky and decentralization.

For the most part, if you've read my previous piece, you'll remember that my assertion was that Bluesky was neither decentralized nor federated. In my opinion many of the points raised in Bryan's article solidify those arguments and concerns. But I still think "credible exit" remains a "valuable value" for Bluesky. Furthermore, Bryan specifically solicited a request to highlight the values of ActivityPub and Spritely, so I will do so here. Finally, we'll conclude by what I think is the most important thing: what's a positive path forward from here?

Some high-level meta about this exchange

Before we get back into protocol-analysis territory, and for that matter, protocol-analysis-analysis, I suppose I'd like to do some protocol-analysis-analysis-analysis, which is to say, talk about the posting of and response to my original blogpost. Skip ahead if you don't care about this kind of thing, but I think there are interesting things to say about the discourse and meta-discourse of technology analysis both generally and how it has played out here.

Much talk of technology tends to treat said tech to be as socially neutral as a hammer. When I buy a hammer from a hardware store, it feels pretty neutral and bland to me. But of course the invention of the hammer had massive social ramifications, the refinement of its design was performed by many humans, and the manufacture of said hammer happens within social contexts of which I am largely disconnected. To someone, the manufacture and design of hammers is I am sure deeply personal, even though it is not to me.

To me, decentralized networking tech standards and protocols and software are deeply personal territory. I have poured many years of my life into them, I have had challenging meetings where I fought for things I believed important. I have made many concessions in standards which I really did not want, but where something else was more important, and we had to come to agreement, so a compromise was made. I write code and I work on projects that I believe in. To me, tech is deeply personal, especially decentralized networking tech.

So it took me a long time and effort and thinking to write the previous piece, not only because I wanted to put down my technical analysis carefully, but because I have empathy for how hearing a critique of tech you have poured your life into feels.

I probably would not have written anything if it were actually not for the invitation and encouragement of Bryan Newbold, whose piece is the "Re:" in this "Re: Re:" article. People had been asking me what I thought about ATProto and I said on a public fediverse thread that I had been "holding my tongue". Bryan reached out and said he would be "honored" if I wrote up my thoughts. So I did.

So I tried to be empathetic, but I still didn't want to hold back on technical critique. If I was going to give a technical critique, I would give the whole thing. The previous post was... a lot. Roughly 24 pages of technical critique. That's a lot to be thrown at you, invitation or otherwise. So when I went to finally post the article, I sighed and said to Morgan, "time for people to be mad at me on the internet."

I then posted the article, and absurdly, summarized all 24 pages of tech in social media threads both on the fediverse and on bluesky. I say "summarized" but I think I restated nearly every point and also a few more. It took me approximately eight hours, a full work day, to summarize the thing.

And people... by and large weren't mad! (For the most part. Well, some Nostr fans were mad, but I was pretty hard on Nostr's uncomfortable vibes, which I still stand by my feelings on that. Everyone else who was initially upset said they were happy with things once they actually read it.) This includes Bluesky's engineering team, which responded fairly positively and thoughtfully overall, and a few even said it was the first critique of ATProto they thought was worthwhile.

After finishing posting the thread, I reached out to a friend who is a huge Bluesky fan and was happy to hear she was happy with it too and that it had gotten her motivated to work on decentralized network protocol stuff herself again. I asked if the piece seemed mean to her, because she was one of the people I kept in mind as "someone I wouldn't want to be upset with me when reading the article", and she said something like "I think the only way in which you could be considered mean was that you were unrelentingly accurate in your technical analysis." But she was overall happy, so I considered that a success.

Why am I bringing this up at all? I guess to me it feels like the general assessment is that "civility is dead" when it comes to any sort of argument these days. You can't win by being polite, and the trolls will always try to use it against you, don't bother. And the majority of tech critiques that one sees online are scathingly, drippingly, caustically venomous, because that's what gets one the most attention. So I guess it's worth seeing that here's an example where that's definitively not the case. I'm glad Bryan's reply was thoughtful and nice as well.

Finally, speaking of things that one is told that simply don't work anymore, my previous article was so long I was sure nobody would read it. And, truth be told, people maybe mostly read the social media threads that summarized it in bitesized chunks, but there were so many of those bitesized chunks. As a little game, I hid "easter eggs" throughout the social media threads and encouraged people to announce when they found them. For whatever reason, the majority of people who reported finding them were on the fediverse. So from a collective standpoint, congratulations to the fediverse for your thorough reading, for collective collection of the egg triforce, and for defeating Gender Ganon.

Okay, that's enough meta and meta-meta and so on. Let's hop over to the tech analysis.

Interesting notes and helpful acknowledgments

Doing everything out of order of what one would be considered "recommendable writing style", I am putting some of the least important tidbits up front, but I think these are interesting and in some ways open the framing for the most important parts that come next. I wanted to highlight some direct quotes from Bryan's article and just comment on them here, just some miscellaneous things that I thought were interesting. If you want to see more pointed, specific responses, jump ahead again to the next section.

Anything you see quoted in this section comes straight from Bryan's article, so take that as implicit.

A technical debate, but a polite one

First off:

I am so happy and grateful that Christine took the time to write up her thoughts and put them out in public. Her writing sheds light on substantive differences between protocols and projects, and raises the bar on analysis in this space.

I've already said above that I am glad the exchange has been a positive one and I'm grateful to see that my writing was well received. So just highlighting this as an opening to that.

However, I disagree with some of the analysis, and have a couple specific points to correct.

I highlight this only to remind that despite the polite exchange, and several acknowledgments about things Bryan does say I am right about, there are some real points of disagreement which are highlighted in Bryan's article, and that's mostly what I'll be responding to.

"Shared heap" and "message passing" seem to stick

Christine makes the distinction between "message passing" systems (email, XMPP, and ActivityPub) and "shared heap" systems (atproto). Yes! They are very different in architecture, which will likely have big impacts on outcomes and network structure. Differences like this make "VHS versus BetaMax" analogies inaccurate.

I'm glad that the terms "message passing" and "shared heap" seem to have caught on when it comes to analyzing the technical differences in approach between these systems. "Message passing" is hardly a new term, but I think (I could be wrong) that "shared heap" is a term I introduced here, though I didn't really state that I was doing so. I'm glad to have seen these terms highlighted as being useful for understanding what's going on, and I've even seen the Bluesky team use the term "shared heap" to describe their system including around some of the positive qualities that come from their design, and I consider that to be a good thing.

If I were going to pull on a deeper amount of computer science history, another way to have said things would have been "actor model" vs "global shared tuplespaces". However, this wouldn't have been as helpful; the important thing to deliver for me was a metaphor that even non-CS nerds could catch onto, and sending letters was the easiest way to do that. "Message passing" and "shared heap" thus attached to that metaphor, and it seems like overall there has been increased clarity for many starting with said metaphor.

Acknowledgment of scale goals

One thing I thought was good is that Bryan acknowledged Bluesky's goals in terms of scaling and "no compromises". Let me highlight a few places:

Other data transfer mechanisms, such as batched backfill, or routed delivery of events (closer to "message passing") are possible and likely to emerge. But the "huge public heap" concept is pretty baked-in.

In particular, "big-world public spaces" with "zero compromises" is a good highlight to me:

Given our focus on big-world public spaces, which have strong network effects, our approach is to provide a "zero compromises" user experience. We want Bluesky (the app) to have all the performance, affordances, and consistency of using a centralized platform.

And finally:

So, yes, the atproto network today involves some large infrastructure components, including relays and AppViews, and these might continue to grow over time. Our design goal is not to run the entire network on small instances. It isn't peer-to-peer, and isn't designed to run entirely on phones or Raspberry Pis. It is designed to ensure "credible exit", adversarial interop, and other properties, for each component of the overall system. Operating some of these components might require collective (not individual) resources.

By the way, I had anticipated in my previous blogpost that we would see the space hosting requirements for Bluesky's public network to double within the month. I underestimated!

The cost of running a full-network, fully archiving relay has increased over time. After recent growth, our out-of-box relay implementation (bigsky) requires on the order of 16 TBytes of fast NVMe disk, and that will grow proportional to content in the network. We have plans and paths forward to reducing costs (including Jetstream and other tooling).

What's also highlighted above is that there are some new tools which don't require the "whole network". I will comment on this at length later.

Sizable endeavors

This section raised an eyebrow for me:

This doesn't mean only well-funded for-profit corporations can participate! There are several examples in the fediverse of coop, club, and non-profit services with non-trivial budgets and infrastructure. Organizations and projects like the Internet Archive, libera.chat, jabber.ccc.de, Signal, Let's Encrypt, Wikipedia (including an abandoned web search project), the Debian package archives, and others all demonstrate that non-profit orgs have the capacity to run larger services. Many of these are running centralized systems, but they could be participating in decentralized networks as well.

The choice of community and nonprofit orgs here surprised me, because for the most part I know the numbers on them. Libera.chat and and jabber.ccc.de might be small enough, because IRC and XMPP are in decline of use for one thing, but also because they're primarily sending around low-volume plaintext messages which are ephemeral.

The other cases are particularly curious. The annual budgets of some of these organizations:

Wikimedia's annual expenses: ~$178 million/year
Signal's annual expenses: ~$50 million/year
Let's Encrypt/ISRG's annual expenses: ~$7 million/year
Internet Archive's annual expenses: ~$25 million/year

These may sound like overwhelming numbers, but it is true that each of these organizations is extremely efficient relative to the value they're providing, especially compared to equivalent for-profit institutions. My friend Nathan Freitas of the Guardian Project likes to point out that US military fighter jets cost hundreds of millions of dollars... "when people complain about public funding of open source infrastructure, I like to point out that funding signal is just asking for a wing of a fighter jet!" Great point.

But for me personally, this is a strange set of choices in terms of "non-profits/communities can host large infrastructure!" Well yes, but not because they don't cost a lot. People often don't realize the size and scale of running these kinds of organizations or their infrastructure, so I'm highlighting that to show that it's not something your local neighborhood block can just throw together out of pocket change.

(But seriously though, could open source orgs have some of that fighter jet wing money?)

Decentralization and federation terminology

If you are going to read any section of this writeup, if you are going to quote any section, this one is the important one. For I believe the terms we choose are important: how we stake the shape of language affects what kinds of policies and actions and designs spring forth.

Language is loose, but language matters. So let us look at the terminology we have.

A comparison of definitions

Bryan acknowledges my definitions of decentralization and federation, and also acknowledges that perhaps Bluesky does not meet either definition. Bryan instead "chooses his own fighter" and proposes two different definitions of decentralization and federation from Mark Nottingham's RFC 9518: Centralization, Decentralization, and Internet Standards.

First let us compare definitions. Usefully, Bryan highlights Mark's definition of centralization (which I had not defined myself):

[...] "centralization" is the state of affairs where a single entity or a small group of them can observe, capture, control, or extract rent from the operation or use of an Internet function exclusively.

So far so good. I agree with this definition.

Now let us get onto decentralization. First my definition of decentralization:

Decentralization is the result of a system that diffuses power throughout its structure, so that no node holds particular power at the center.

Now here is Bryan's definition (more accurately Mark Nottingham's definition (more accurately, Paul Baran's definition)) of decentralization:

[Decentralization is when] "complete reliance upon a single point is not always required" (citing Baran, 1964)

Perhaps Bluesky matches this version of decentralization, but if so, it is because it is an incredibly weak definition of decentralization, at least taken independently. This may well say, taken within the context it is provided, "users of this network may occasionally not rely on a gatekeeper, as a treat".

Put more succinctly, the delta between the definition I gave and the definition chosen by Bryan is:

The discussion of power dynamics, and diffusion thereof, is removed
The phrase complete reliance is introduced, opening acceptability within the definition that incomplete reliance is an acceptable part of decentralization
The phrase not always required is introduced, opening acceptability that even complete reliance may be acceptable, as long as it is not always the case

When I spoke of my concerns of moving the goalpost, the delta between the goalpost chosen in my definition and the goalpost chosen in Bryan's chosen definition are miles away.

We'll come back to this in a second, because the choice of the definition by Baran is more interesting when explored in its original context.

But for now, let's examine federation. Here is my definition:

[Federation] is a technical approach to communication architecture which achieves decentralization by many independent nodes cooperating and communicating to be a unified whole, with no node holding more power than the responsibility or communication of its parts.

Here is Bryan's definition (more accurately Mark Nottingham's definition):

[...] federation, i.e., designing a function in a way that uses independent instances that maintain connectivity and interoperability to provide a single cohesive service.

At first these two seem very similar. What, again is the delta?

The discussion of power dynamics, once again, is not present.
"Cooperation" is not present.
And very specifically, "decentralization" and "no node holding more power than the responsibility or communication of its parts" is not present.

Reread the definition above and the definition I gave and compare: under these definitions, any corporate but proprietary and internal microservices architecture or devops platform would qualify. (Not an original observation; thanks to Vivi for pointing this out.) Dropping power dynamics and decentralization from the definition reduces this to "communicating components", which isn't enough.

Bryan then goes on to acknowledge that this definition is a comparative low bar:

What about federation? I do think that atproto involves independent services collectively communicating to provide a cohesive and unified whole, which both definitions touch on, and meets Mark's low-bar definition.

However, in the context of Nottingham's paper, it's admittedly stronger, because federation is specifically upheld as a decentralization technique, which is missing when quoted out of context (though Nottingham notably challenges whether or not it achieves that goal in practice). Which turns out to be important. The "power dynamics" part and specifically "immersing this definition in decentralization" parts are actually really both very important parts of the definition I gave.

Bryan then goes on to acknowledge that maybe federation isn't the best term for Bluesky, and leaves some interesting history I feel is worthwhile including here:

Overall, I think federation isn't the best term for Bluesky to emphasize going forward, though I also don't think it was misleading or factually incorrect to use it to date. An early version of what became atproto actually was peer-to-peer, with data and signing keys on end devices (mobile phones). When that architecture was abandoned and PDS instances were introduced, "federation" was the clearest term to describe the new architecture. But the connotation of "federated" with "message passing" seems to be pretty strong.

So on that note, I think it's fine to say, Bluesky is not federated, and there's enough general acknowledgement of such. Thus it's probably best if we move onto an examination of decentralization, and in particular, where that definition came from.

"Decentralization" from RFC 9518, in context

Earlier I said "now here is Bryan's definition (more accurately Mark Nottingham's definition (more accurately, Paul Baran's definition)) of decentralization" and those nested parentheses were very intentional. In order to understand the context in which this definition arises, we need to understand each source.

First, let us examine Mark Nottingham's IETF independent submission, RFC 9518: Centralization, Decentralization, and Internet Standards. Mark Nottingham has a long and respected history of participating in standards, and most of his work history is doing so for fairly sizable corporate participants. From the title, one might think it a revolutionary call-to-arms towards decentralization, but that isn't what the RFC does at all. Instead, Nottingham's piece is best summarized by its own words:

This document argues that, while decentralized technical standards may be necessary to avoid centralization of Internet functions, they are not sufficient to achieve that goal because centralization is often caused by non-technical factors outside the control of standards bodies. As a result, standards bodies should not fixate on preventing all forms of centralization; instead, they should take steps to ensure that the specifications they produce enable decentralized operation.

The emphasis is mine, but I believe captures well what the rest of the document says. Mark examines centralization, as well as those who are concerned about it. In the section "Centralization Can Be Harmful", Mark's description of certain kinds of standards authors and internet activists might as well be an accurate summation of myself:

Many engineers who participate in Internet standards efforts have an inclination to prevent and counteract centralization because they see the Internet's history and architecture as incompatible with it.

Mark then helpfully goes on to describe many kinds of harms that do occur with centralization, and which "decentralization advocates" such as myself are concerned about: power imbalance, limits on innovation, constraints on competition, reduced availability, monoculture, self-reinforcement.

However, the very next section is titled "Centralization Can Be Helpful"! And Mark goes into great lengths also about ways in which centralized systems can sometimes provide superior service or functionality.

While Mark weighs both, the document reads as a person who authors standards document who would like the internet to be more decentralized where it's possible, but also operates from the "pragmatic" perspective that things are going to re-centralize most of the time anyway, and when they do this ultimately tends to be useful. It is also important to realize that this is occuring in a context where many people are worrying about increasing centralization of the internet, and wondering to what degree standards groups should play a role. From Mark's own words:

Centralization and decentralization are increasingly being raised in technical standards discussions. Any claim needs to be critically evaluated. As discussed in Section 2, not all centralization is automatically harmful. Per Section 3, decentralization techniques do not automatically address all centralization harms and may bring their own risks.

Note this framing: centralization is not necessarily harmful, and decentralization may not address problems and may cause new ones. Rather than a rallying cry for decentralization, Mark's position is in many ways a call for a preservation of the increasing status quo: large corporations tend to be capturing and centralizing more of the internet, and we should be worried about that, but should it really be the job of standards? Remember, this is a concern within IETF and other standards groups. Mark says:

[...] approaches like requiring a "Centralization Considerations" section in documents, gatekeeping publication on a centralization review, or committing significant resources to searching for centralization in protocols are unlikely to improve the Internet.
Similarly, refusing to standardize a protocol because it does not actively prevent all forms of centralization ignores the very limited power that standards efforts have to do so. Almost all existing Internet protocols -- including IP, TCP, HTTP, and DNS -- fail to prevent centralized applications from using them. While the imprimatur of the standards track is not without value, merely withholding it cannot prevent centralization.
Thus, discussions should be very focused and limited, and any proposals for decentralization should be detailed so their full effects can be evaluated.

Mark evaluates several structural concerns, many of which I strongly agree with. For example, Mark points out that email has, by and large, become centralized, despite starting as a decentralized system. I fully agree! "How does this system not result in the same re-centralization problems which we've seen happen to email" is a question I often throw around. And Mark also highlights paths to which standards groups may reduce centralization.

But ultimately, the path which Mark leans most heavily into is the section "Enable Switching":

The ability to switch between different function providers is a core mechanism to control centralization. If users are unable to switch, they cannot exercise choice or fully realize the value of their efforts because, for example, "learning to use a vendor's product takes time, and the skill may not be fully transferable to a competitor's product if there is inadequate standardization".
Therefore, standards should have an explicit goal of facilitating users switching between implementations and deployments of the functions they define or enable.

Does this sound familiar? If so, it's because it's awfully close to "credible exit"!

There is a common ring between Mark and Bryan's articles: centralization actually provides a lot of features we want, and we don't want to lose those, and it's going to happen anyway, so what's really important is that users have the ability to move away. While this provides a safety mechanism against centralization gone badly, it is not a path to decentralization on its own. Credible exit is useful, but as a decentralization mechanism, it isn't sufficient. If the only options in town are Burger King and McDonalds for food, one may have a degree of options and choice, but this really isn't satisfying to assuage my concerns, even if Taco Bell comes into town.

What's missing from Mark's piece altogether is "Enable Participation". Yes, email has re-centralized. But we should be upset and alarmed that it is incredibly difficult to self-host email these days. This is a real problem. It's not unjustified in the least to be upset about it. And work to try to mitigate it is worthwhile.

"Decentralization" within Baran's "On Distributed Communications"

In the last subsection, we unpacked the outer parenthetical of "now here is Bryan's definition (more accurately Mark Nottingham's definition (more accurately, Paul Baran's definition)) of decentralization". In this subsection, we unpack the inner parenthetical. (Can you tell that I like lispy languages yet? Now if there was only also a hint that I also enjoy pattern matching ...)

Citing again the definition chosen by Bryan (or more accurately ... (or more accurately ...)):

[Decentralization is when] "complete reliance upon a single point is not always required" (citing Baran, 1964)

Citations, in a way, are a game of telephone, and to some degree this is inescapable for the sake of brevity in many situations. Sometimes we must take an effort to return to the source, and here we absolutely must.

The cited paper by Paul Baran is none other than "On Distributed Communications: I. Introduction to Distributed Communication Networks published by Paul Baran in 1964. There is perhaps no other paper which has influenced networked systems as highly as this work of Baran's has. One might assume from the outset that the paper is too dense, but I encourage the interested reader: print it out, go read it away from your computer, mark it up with a pen (one should know: there is no other good way to read a paper, the internet is too full of distractions). There is a reason the paper stands the test of time, and it is a joy to read. Robust communication in error-prone networks! Packet switching! Wi-fi, telephone/cable, satellite internet predicted as all mixing together in one system! And the gall to argue that one can build it and that it would be a dramatically superior system if we focus on having a lot of cheap and interoperable components rather than big, heavy centralized ones!

It may come as a surprise, then, that I have called the above definition of decentralization too weak if I am heaping praise on Baran's paper as such. But actually, this definition of "decentralized" is the only time in the paper that the term comes up. How could this be?

To understand, we need only look at the extremely famous "Figure 1" of the paper which, if you have worked on "decentralized" (or "distributed") network architecture at all, you have certainly seen:

Paul Baran's diagram of "centralized" (central hub and spoke), "decentralized" (tiered hub and spoke), and "distributed" (what we might think of as a decentralized mesh)

The full paragraph linked to the cited figure is worth citing in its entirety:

The centralized network is obviously vulnerable as destruction of a single central node destroys communication between the end station. In practice, a mixture of star and mesh components is used to form communication networks. For example, type (b) in Fig. 1 shows the hierarchical structure of a set of stars connected in the form of a larger star with an additional link forming a loop. Such a network is sometimes called a "decentralized" network, because complete reliance upon a single point is not always required.

In other words, in Baran's paper, where he is defining a new and more robust vision for what he calls "distributed networks", he is providing "decentralized" as a pre-existing term, not his own definition, for a topology he is criticizing for still being centralized! (Observe! If you read the paragraph carefully Baran is saying that "decentralized" networks like this are still "centralized"!)

Let's observe that again. Baran is effectively saying that a tiered, hierarchical system with many nodes, being called "decentralized" (because that is a term that already existed for these kinds of networks), was in fact centralized. So the very definition selected by Mark Nottingham (and thus Bryan as well) was being criticized for being too centralized by the original cited author!

Baran had to introduce a new term because the term "decentralized" was already being used. However, when we talk about "centralized" vs "decentralized" as if they are polar ends of a spectrum, we are actually talking about type (a) of "Figure 1" being "centralized" and type (c) being the "ideal" version of "decentralized", with (b) sometimes showing up as kind of a grey-area space. Notably, Mark Nottingham makes no such distinction as Baran does between "decentralized" and "distributed", yet uses the definition of "decentralized" that instead resembles tiered, hierarchically centralized systems... not the version of "decentralized" to which Mark Nottingham then goes at great length to analyze.

That is why Baran's definition of "decentralized" is so weak. This is critical history to understand!

In other words:

Contemporary nomenclature: "Centralized" and "Decentralized" as polar ends of a spectrum.
Baran nomenclature: "Centralized and "decentralized" are both centralized topologies, but the latter is hierarchical. "Distributed" is the more robust and revolutionary view.

Do you see it? To use the latter's definition of decentralization to describe the former is to use a definition of centralization. This is not a good starting point.

Baran, notably, is bold about what he calls distributed systems, and it is important to understand Baran's vision as being bold and revolutionary for its time. I can't resist quoting one more paragraph before we wrap up this section (remember! nothing like the internet had yet been proposed or envisioned as something possible before!):

It would be treacherously easy for the casual reader to dismiss the entire concept as impractically complicated -- especially if he is unfamiliar with the ease with which logical transformations can be performed in a time-shared digital apparatus. The temptation to throw up one's hands and decide that it is all "too complicated," or to say, "It will require a mountain of equipment which we all know is unreliable," should be deferred until the fine print has been read.

May that we all be so bold as Baran in envisioning the system we could have!

Why is this terminology discussion so important?

Here I have, at even greater length than my previous post or even Bryan's own response to mine, gone into great length about terminology. But this is important. As I have stated, there are great risks in moving the goalposts. It is hard for those who do not work on networked systems day in and day out to make sense of any of this. "Decentralization washing" is a real problem. I don't find it acceptable.

Bryan "chose his fighter" with Mark Nottingham's RFC, and the choice of fighter informs much that follows. Mark Nottingham himself is advocating that a push too hard on decentralization is something standards people should not be doing, and if they do, should be scoping. Some amount of centralization, according to Mark, is useful, good, and inevitable, and we should scope down the amount of decentralization vs centralization topics that come up in standards groups to something actionable. Mark's reasons are well studied, and while Mark's history often comes from a background in representing standards on behalf of larger corporations, I believe he would like to see decentralization where possible, but is "pragmatic" about it.

(This is also somewhat of a personal issue for me; my participation in standards has been generally as more "outside the system" of corporate standards work than the average standards person, and there's a real push and pull between how much standards orgs tend to be dominated by corporate influence. I'm not against corporate participation, I think it's important but... I highly recommend reading Manu Sporny's Rebalancing How the Web is Built which describes the governance challenges, particularly leading to corporate capture, which tend to happen within standards orgs. One of the only reasons I believe ActivityPub was able to be as "true to its goals" as it was is partly that the big players refused to participate at the time of standardization, which at the time was an existential threat to the continuation of the group, but in retrospect ended up being a blessing for the spec. It both is and is not an aside, but getting further into that is a long story, and this is already a long article.)

Mark's choice to use the definition of "decentralization" from Baran is however dangerous to read without understanding the surrounding context. The way Baran used the term was as a criticism of hierarchical centralization, and was introducing a new term as an alternative. This is why Baran's definition of "decentralization" appears so weak: Baran was not advocating for the ideas he was scoping under that (pre-existing in the context he was arguing within) term.

I personally don't believe we need to support the three-word "centralization", "decentralization", and "distributed" phrases which Baran used, it's fine for me to have a spectrum between "centralized" and "decentralized". But we should not conflate a situation where "decentralization" means "tiered centralization" with the contemporary usage of "resisting centralization".

However, all this is to say that I think Nottingham's view on how much we should be bothering to be concerned with centralization vs decentralization aligns well with ATProto and Bluesky's own interpretations. "Credible exit", I still assert, is a separate view, a particular mechanism to avoid some of the challenges of centralization going bad, and indeed in Nottingham's own RFC, it is only one path of several examined, but the one Nottingham appears most aligned with as practically possible.

Regardless, I'd still say then: if Bluesky does not meet my definition of "decentralized", the solution is not to move the goalposts. I think I've made it clear enough, with a thorough enough reading of the literature, of why to accept the proposed definition within Bryan's post would be to move the goalposts. I don't think that's intentional, or malicious, but it is the result, and I'm not satisfied with that result.

That's enough said on the topics of terminology. Let's move on.

What happens when ATProto scales down?

A specific form of scale-down which is an important design goal is that folks building new applications (new Lexicons) can "start small", with server needs proportional to the size of their sub-network. We will continue to prioritize functionality that ensures independent apps can scale down. The "Statusphere" atproto tutorial demonstrates this today, with a full-network AppView fitting on tiny server instances.

I won't spend too long on this other than to say: a large portion of the arguments for why to choose ATProto's architecture specifically was to "not miss replies/messages", and as said in my previous article, that requires a god's-eye view of the system. Here it's argued that ATProto can scale down, and yes it can, but is that the architecture you want?

Given that message passing systems, by having directed messaging, is able to scale down quite beautifully but still interoperate as much as one would like with a larger system, what is the value of using an architecture which scales down with much more difficulty and which is oblivious of external interactions without knowing all of them?

I made a claim here: ATProto doesn't scale down well. That's mainly because to me, scaling down still means being participatory with as much as you'd like of a wider system while still having small resources. What I would like to analyze in greater detail is why ATProto doesn't scale wide. To me, these two arguments are inter-related. Let's analyze them.

Defending my claim: decentralized ATProto has quadratic scaling costs

In my previous article, I said the following:

If this sounds infeasible to do in our metaphorical domestic environment, that's because it is. A world of full self-hosting is not possible with Bluesky. In fact, it is worse than the storage requirements, because the message delivery requirements become quadratic at the scale of full decentralization: to send a message to one user is to send a message to all. Rather than writing one letter, a copy of that letter must be made and delivered to every person on earth.

However, clearly not everyone agreed with me:

"Agency" really is important to me, probably the most important thing, but we will leave this aside for the moment and focus on a different phrase: "participatory infrastructure".

Was I right or wrong that as nodes are added to ATProto that the scaling costs are quadratic? After I read this exchange, I really doubted myself for a bit. I don't have a formal background in Computer Science; I learned software engineering through community educational materials and honed my knowledge through the "school of hard thunks".

So I spent a morning on the Spritely engineering call distracting my engineering team by walking through the problem. It's easy to get lost in the details of thinking about the roles of the communicating components, so Spritely's CTO, David Thompson, decided to throw my explainer aside and work through the problem independently. Dave came to the same conclusion I did. I also called up one of my oldschool MIT AI Lab type buddies and asked hey, what do you think? I think this is a quadratic scaling problem am I wrong? He said (from vague memory of course) "I think it's pretty clear immediately that it's quadratic. This is basic engineering considerations, the first thing you do when you start designing a system." Well that's a relief that I wasn't confused. But if it seemed obvious to me, why wasn't it obvious to everyone else? "It seemed pretty clear the way you described it to me. So why don't you just repeat that?"

So okay, that's what I'll do.

Let's start with the following points before we begin our analysis:

We will assume that, since ATProto has partly positioned itself as having one of its key values be "no compromises on centralized use cases" including "no missed messages or replies", that at minimum ATProto cannot do worse than ActivityPub, in its current deployment, does today. Replies and messages addressed to specific users (whether or not addressing is on the protocol layer or extrapolated on top of it) must, at least, be seen by the their intentional recipients.
We will start with the assumption that the most centralized infrastructure is one in which there is only one provider controlling the storage and distribution of all messages: the least amount of user participation in the operation of the network.
We will, on the flip side, consider decentralization to be the inverse, with the most amount of user participation in the operation of the network. In other words, "every user fully self hosts".
We will also take the lessons of my previous post at face value; just as blogging is decentralized but Google (and Google Reader) are not, it is not enough to have just PDS'es in Bluesky be self-hosted. When we say self-hosted, we really mean self-hosted: users are participating in the distribution of their content.
We will consider this a gradient. We can analyze the system from the greatest extreme of centralization which can "scale towards" the greatest degree of decentralization.
We will analyze both in terms of the load of a single participant on the network but also in terms of the amount of network traffic as a whole.

With that in place, it's time to analyze the "message passing" architecture vs the "shared heap" architecture in terms of how they perform when scaling.

Here is my assertion in terms of the network costs of scaling towards decentralization, before I back it up (I will give the computer science'y terms then explain in plain language after):

There is an inherent linear cost to users participating on the network, insofar as for n users, there will always be an O(n) cost of operation.
"Message passing" systems such as ActivityPub, at full decentralization:
- Operate at O(1) from a single user's perspective
- Operate at O(n) from a whole-network perspective (and this is, by definition, the best you can do)
"Public no-missed-messages shared-heap" systems such as ATProto, at full decentralization:
- Operate at O(n) from a single user's perspective
- Operate at O(n^2) from a whole-network perspective

In other words, as we make our systems more decentralized, message passing systems handle things fairly fine. Individual nodes can participate on the network no matter how big the network gets. Zooming out, as more users are added to the decentralized network, the message load is roughly the normal amount of adding more users to the network. However, as we make things more decentralized for the public shared heap model, everything explodes, both on the individual node level, but especially when we zoom out to how many messages need to be sent.

And there is no solution to this without adding directed message passing. Another way to say this is: to fix a system like ATProto to allow for self-hosting, you have to ultimately fundamentally change it to be a lot more like a system like ActivityPub.

This can easy to get lost about; the example above of stating that "gossip" can improve things indicates that talking about message sending is confusing the matter. It will be easier to understand by thinking about message receiving.

To start with a very small example by which we can clearly observe the explosion, let's set up a highly simplified scenario. First let me give the parameters, then I will tell a story. (You can skip the following paragraph to jump to the story if that's more your thing.)

That n number we mentioned previously will now stand for the number-of-users on the network. We will also introduce m which will be number-of-machines, which represents the number nodes on the network. Decentralizing the system involves n moving towards m, so at full decentralization, n and m would be the same; at intermediate levels of decentralization it may be less, but n converges towards m as we decentralize. Each user is individually somewhat chatty, and sends a number of daily-messages-per-user, but we can average these out across all users, so this is just a constant which for our cases we can simplify to 1 for a modeled scenario (though it can scale up and down, it does not affect the rate of growth). Likewise, each message individually sent by a user has a number-of-intended-recipients-per-message, which we can average by the amount of people who were individually intended to receive such message, such as directed messages or subscribers in a publish-subscribe system; however this too can be averaged, so we can also simplify this to 1 (so this also does not affect the rate of growth).

Lost? No worries. Let's tell a story.

In the beginning of our network, we have 26 users, which conveniently for us map to each letter of the English alphabet: [Alice, Bob, Carol, ... Zack]. Each user sends one message per day, which is intended to have one recipient. (This may sound unrealistic, but this is fine to do to model our scenario.) To simplify things, we'll have each user send a message in a ring: Alice sends a message to Bob, Bob sends a message to Carol, and so on, all the way up to Zack, who simply we wrap around and have message Alice. This could be because these messages have specific intended recipients or it could be because Bob is the sole "follower" of Alice's posts, Carol is the sole "follower" of Bob's, etc.

Let's look at what happens in a single day under both systems.

Under message passing, Alice sends her message to Bob. Only Bob need receive the message. So on and so forth.
- From an individual self-hosted server, only one message is passed per day: 1.
- From the fully decentralized network, the total number of messages passed, zooming out, is the number of participants in the network: 26.
Under the public-gods-eye-view-shared-heap model, each user must know of all messages to know what may be relevant. Each user must receive all messages.
- From an individual self-hosted server, 26 messages must be received.
- Zooming out, the number of messages which must be transmitted in the day is 26 * 26: 676, since each user receives each message.

Okay, so what does that mean? How bad is this? With 26 users, this doesn't sound like so much. Now let's add 5 users.

Under message passing:
- Per server, still 1 message received per user per day.
- Per the network, it's 5 extra messages transmitted per day, which makes sense: we've added 5 users.
Under the public-gods-eye-view-shared-heap model:
- Per server: 5 new messages received per user per day.
- Per the network, it's ((31 * 31) - (26 * 26)): 285 new messages per day!

But we aren't actually running networks of 26 users. We are running networks of millions of users. What would happen if we had a million self-hosted users and five new users were added to the network? Zooming out, once again, the message passing system simply has five new messages sent. Under the public shared heap model, it is 10,000,025 new messages sent! For adding five new self-hosted users! (And that's even just with our simplified model of only sending one message per day per user!)

Maybe this sounds silly, if you're a Bluesky enthusiast. I could hear you saying: well Christine, we really aren't planning on everyone self hosting. Yes, but how many nodes can participate in the system at all? The fediverse currently hosts around 27,000 servers (many more users, but let's focus on servers). Adding just 5 more servers would be a blip in terms of the affect on the network. Adding 5 more servers to an ATProto ecosystem with that many fully participating nodes would be an exhausting number of additional messages sent on the network. ATProto does not scale wide: it's a liability to add more fully participating nodes onto the network. Meaningfully self-hosting ATProto is a risk to the ATProto network, there is active reason to disincentivize it for those already participating. But it's not just that. Spreading things around so that more full Bluesky-like nodes are present is something server operators will have to come to discourage if they don't want their already existing high hosting costs to not skyrocket.

Now, what about that mention of "well gossip could help"? This is why I said it is important to think of messages as they are received as opposed to how they are sent. The scenario I gave above was a more ideal scenario than gossip. In a gossip protocol, a node often receives messages more than once. The scenario I gave was more generous: messages are only received once. You can't know information unless it's told to you (well, unless you can infer it, but that's not relevant for this case). It's best to think about receiving.

Architecture matters. There is a reason message passing exists. I don't believe in the distinction between "it's a technical problem" or "it's a social problem" most of the time when designing systems, because it's usually both: the kinds of social power dynamics we can have are informed by the power dynamics of our tech and vice versa. Who can participate here? I agree with the agency concern, I am always deeply concerned with agency, but here agency depends on providers. How big do they have to be? How many of them can there be?

A lot of hope in Bluesky and ATProto is in terms of the dreams of what seems possible. Well, for decentralization of Bluesky and ATProto to even be possible, it must change its architecture fundamentally. ATProto doesn't need to switch to ActivityPub, but in order to become a real decentralized protocol, it has to become a lot more like it.

Reframing portable identity

Bryan has some nice responses to the did:plc stuff in his article, I won't go over it again in depth here. I'll just say it was nice to see.

I actually think that despite all the concerns I laid out about the centralization of did:plc, it's not something I'm all too worried about in terms of the governance of the ledger of updates. It seems like the right things are being done so that did:plc can be audited by multiple parties in terms of working towards a certificate transparency log, etc. That's good to hear.

My bigger concern is that if Bluesky shuts down tomorrow or is bought by a larger player, in practice if Bluesky refuses to allow for a path to rotating keys to move away, it'll be hard to do anything about that. Still, Bluesky is doing more work in the decentralized identity space than most at this point. I want to give them some credit there, and end this little subsection on that positive note.

Bluesky's expectations of public content vs community expectations

ATProto's main design is built upon replicating and indexing the firehose. That is its fundamental design choice and method of operation.

I won't go into this too far here other than to say, I'm not sure this is in alignment with what many of its users want. And we're seeing this, increasingly, as users are being upset about finding out that other providers have replicated and indexed their data. This is happening in a variety of ways, from LLM training concerns, to moderation concerns, etc.

I won't say too much more on that. I think it's just... this all just gives me the feeling that the "speech vs reach" approach, and the idea of a global public firehose, a "global town square" type approach... it all feels very web 2.0, very "Millennial social media"... for Millenials, by Millenials, trying to capture the idea that society would be better if we all got everyone to talk to each other at once.

I think Bluesky is doing about as good a job as a group of people can do with the design they have and are trying to preserve. But I don't think the global context-collapse firehose works, and I'm not sure it's what users want it either, and if they do, they really seem to want both strong central control to meet their needs but also to not have strong central control be a thing that exists when it doesn't.

And who can blame users for that? An alternative can not usually be envisioned unless an alternative is presented.

So, what's the alternative?

On the values and design goals of projects and protocols

One thing I appreciated was where Bryan laid out Bluesky's values and design goals:

Over the summer, I wrote a summary of Bluesky's progress on atproto on my personal blog: "Progress on atproto Values and Value Proposition". Christine identified "Credible Exit" as one of these key properties. Some of the other high-level goals mentioned there were:
Own Your Identity and Data
Algorithmic Choice
Composable Multi-Party Moderation
Foundation for New Apps
Any of these could be analyzed individually; I have my own self-assessment of our progress in the linked article.

I think this is great for Bryan to lay out. They're a nice set of goals. (I don't love the term "own your data" for various "intellectual property" term-confusion adjacent reasons, but that's an aside; the intended meaning is good.) Overall I think this is a pretty reasonably set of goals and you can see why they would inform the design of Bluesky significantly. You don't see many projects lay out their values like this, and it would be good to see done more often.

On that note...

One thing I'd be curious to see is an equivalent set of design goals for ActivityPub (or for Spritely's work, for that matter). This might exist somewhere obvious and I just haven't seen it. It might all differ for the distinct original projects and individuals which participated in the standards process.

This was a nice ask to make. Let me address them separately.

ActivityPub's values and design goals

In a way, it's a bit harder for me to talk about the values and design goals of ActivityPub. It happened in a larger standards group and involved a lot of passing of hands. I think if I were to be robust about it, I would also ask Evan Prodromou, Erin Shepherd, and Amy Guy to weigh in, and maybe they should; I think it would be nice to hear. But since I work with Jessica Tallon (and I'm kind of tired of writing this and want to just get it out there) we had a brief talk this morning and I'll just discuss what we talked about.

The SocialWG charter is informative, first of all. It says the following:

The Social Web Working Group will create Recommendation Track deliverables that standardize a common JSON-based syntax for social data, a client-side API, and a Web protocol for federating social information such as status updates. This should allow Web application developers to embed and facilitate access to social communication on the Web. The client-side API produced by this Working Group should be capable of being deployed in a mobile environment and based on HTML5 and the Open Web Platform. For definitions of terms such as "social" and "activity", please see the W3C Social XG report A Standards-based, Open and Privacy-aware Social Web.
There are a number of use cases that the work of this Working Group will enable, including but not limited to:
User control of personal data: Some users would like to have autonomous control over their own social data, and share their data selectively across various systems. For an example (based on the IndieWeb initiative), a user could host their own blog and use federated status updates to both push and pull their social information across a number of different social networking sites.
Cross-Organization Ad-hoc Federation: If two organizations wish to co-operate jointly on a venture, they currently face the problem of securely interoperating two vastly different systems with different kinds of access control and messaging systems. An interoperable system that is based on the federation of decentralized status updates and private groups can help two organizations communicate in a decentralized manner.
Embedded Experiences: When a user is involved in a social process, often a particular action in a status update may need to cause the triggering of an application. For example, a travel request may need to redirect a user to the company's travel agent. Rather than re-direct the user, this interaction could be securely embedded within page itself.
Enterprise Social Business: In any enterprise, different systems need to communicate with each other about the status of various well-defined business processes without having crucial information lost in e-mail. A system built on the federation of decentralized status updates with semantics can help replace email within an enterprise for crucial business processes.

I think the "user control of personal data" is kind of like "owning your own data" but with terminology I am more comfortable with personally. Cooperation, even if organization-focused, is there, and embedding is I guess also present. The "enterprise" use case... well, I can't say that ever ended up being important to me, but "business-to-business" use cases is partly how the Social Web Working Group was able to describe that it would have enough W3C member organization support to be able to run as a group (which the corporate members quickly dropped out, leaving a pile of independent spec authors... in most ways for the best for the specs, but it seemed like an existential crisis at the time).

But those don't really speak as values to me. When Jessica and I spoke, we identified, from our memories (and without looking at the above):

The need to provide a federation API and client-to-server api for federated social networks
Relatively easy to implement
Feasible to self-host without relying on big players
Social network domain agnosticism: entirely different kinds of applications should be able to usefully talk to and collaborate with each other with the same protocol
Flexibility and extensibility (which fell out of json-ld for ActivityPub, though it could have been accomplished other ways)
A unified design for client-to-server and server-to-server. This was important for ActivityPub at least. Amy Guy ultimately did the important work of separating the two enough where you could just implement one or the other.
An implementation guide which told a story, included in the spec. (Well, maybe I was the only one who really was opinionated about that, but I still do think it was one of the things that lead AP to be successful.)

In some ways though, that still doesn't speak enough of values to me, though. I added this late in the spec, and I kind of did it without consulting anyone until after the fact, sneaking it into a commit where I was adding acknowledgments. It felt important, and ultimately it turned out that everyone else in the group liked it a lot. Here it is, the final line of the ActivityPub spec:

This document is dedicated to all citizens of planet Earth. You deserve freedom of communication; we hope we have contributed in some part, however small, towards that goal and right.

Spritely's values and design goals

Spritely is the decentralized networking research organization I'm the head of. We're trying to build the next generation of internet infrastructure, and I think we're doing incredibly cool things.

It's easier for me to talk about the values of Spritely than ActivityPub, having founded the project technically from the beginning and co-founded it organizationally. Here is the original mission statement which Karen Sandler and I put together:

The purpose of The Spritely Institute is to advance networked user freedom. People deserve the right to communicate and have communication systems which respect their agency and autonomy. Communities deserve the right to organize, govern, and protect and enrich their members. All of these are natural outgrowths of applying the principles of fundamental human rights to networked systems.
Achieving these goals requires dedicated effort. The Spritely Foundation stewards the standardization and base implementation for decentralized networked communities, promotes user freedom and agency of participants on the network, develops the relevant technologies as free, libre, and open source software, and facilitates the framing and narrative of network freedom.

But still, though we have a mission statement, we haven't written out a bullet point list like this before and so I tried to gather Spritely staff input on this:

Secure collaboration: Spritely is trying to enable safe cooperation between individuals and communities in an unsafe world. We are working on tools to make this possible.
Networks of consent: The cooperation mechanism we use is capability security, which allows for consent-granted mechanisms which are intentional, granted, contextual, accountable, and revocable. Rather than positioning trust as all-or-nothing or advocating for "zero trust" environments, we consider trust as something fundamental to cooperation, but it's also something that is built. We want individuals and communities to be able to build trust to collaborate cooperatively together.
Healthy communities: We must build tech that allows communities to self-govern according to their needs, which vary widely from community to community. We may not know all of these needs or mechanisms required for all communities in advance, but we should have the building blocks so communities can easily put them in place.
User empowerment and fostering agency: We believe in users having the freedom to communicate, but also to be able to live healthy lives protected from dangerous or bad interactions. We want users to be able to live the lives they want to live, as agents in the system, to the degree that it does not harm the agency of other users in the system. Maximizing agency and minimizing subjection, not just for you and me, but for everyone, is thus is a foundation.
Contextual communication: There is no "global town square", and we are deeply concerned about context collapse. Communication and collaboration should happen from contextual flows.
Decentralized is the default: We are building technology foundations on top of which then the rest of our user-facing technology is built. These foundations change the game: instead of peer-to-peer, decentralized, secure tech being the realm of experts, it's the default output of software built on top of our tech.
Participatory, gatekeeper-free technology: Everyone should be able to participate in the tech, without gatekeepers. This means we have a high bar for our tech being possible for individuals to meaningfully run and for a wide variety of participants to be able to cooperate on the network at once.
We should not pretend we can prevent what we cannot: Much harm is caused by giving people the impression that we provide features and guarantees that we cannot provide. We should be clear about the limitations of our architecture, because if we don't, users may believe they are operating with safety mechanisms which they do not have, and may thus be hurt in ways they do not expect.
Contribute to the commons: We are a research institution, and everything we build is free and open source software, user-freedom empowering tech and documentation. This also informs our choice to run the Spritely Institute, organizationally, as a nonprofit building technology for the public good.
Fun is a revolutionary act: The reason technology tends to succeed is that people enjoy using it and get excited about it. We care deeply about human rights and activism. This is not in opposition to building tech and a community environment that fosters a sense of fun; planned carefully, fun is at the core of getting people to understand and adopt any technology we make.

I will note that the second to last post, contributing to the commons, makes running the Spritely Institute challenging in so far as the commons, famously, benefits everyone but is difficult to fund. If the above speaks to you, I will note that the Spritely Institute is, at the time of writing, running a supporter drive, and we could really use your support. Thanks. 💜

This is not a post about Spritely, but I appreciate that Bryan invited me talking about Spritely a bit here. And ultimately, this is important, because I would next like to talk about the present and the future, and the world that I think we can build.

Where to from here?

I am relieved that the previous piece was overall received well and was not perceived as me "attacking" Bluesky. I hope that this piece can be seen the same way. I may have been "harshly analytical" in my analysis at times, but I have tried to not be mean. I care about the topics discussed within this blogpost, and that's why I spent so much time on them. I know Bryan feels the same way, and one thing we both agree on is that we don't want to be caught in an eternal back-and-forth: we want to build the future.

But building the future does mean clear communication about terminology. I will (quasi)quote Jonathan Rees again, as I have previously when talking about the nature of language and defining terminology:

Language is a continuous reverse engineering effort between all parties involved.

(Somewhat humorously, I seem to adjust the phrasing of this quoting-from-memory just slightly every time I quote it.)

If we aren't careful and active in trying to understand each other, words can easily lose their meaning. They can even lose their meaning when shifting between defined contexts over time. The fact that Baran defined the term "decentralization" as a particular kind of centralization was because he was responding to a context in which that term had already been defined (and thus introducing a new term "distributed" to describe what we might call "decentralization"). The fact that today we use "centralization" and "decentralization" as two ends of a spectrum is also fine. I don't think Bryan quoting Mark quoting Baran in this way and thus introducing this error was intentional, but ultimately it helps explain exactly why the term chosen produced a real risk of decentralization-washing.

I agree that Bluesky does use some decentralization techniques in interesting and useful ways. These enable "credible exit", and are also key to enabling some of the other values and design goals which Bryan articulated Bluesky and ATProto as having. But to me, a system which does not permit user participation in its infrastructure and which is dependent on a few centralized gatekeepers is not itself decentralized.

So what of my analysis of the public-global-gods-eye-view-shared-heap approach as growing quadratically in complexity as the system decentralizes? I'm not trying to be rude in any way. I made a statement about the behavior of the system algorithmically, and it felt important that I not only assess whether or not that statement was true because if it wasn't true then I would want to understand myself and retract it. But there's interest and belief right now by many people that ATProto can be "self hosted". It's important, at least, to understand to the degree which that simply, under the current architecture, it is not possible to do. Especially because right now a lot of people are operating on this information out of belief in and hope for the future. If my assertion about the quadratic explosion problems of meaningfully decentralizing ATProto are false, and that it is possible for self-hosting to become common in the system with the properties that Bluesky has set out as being key features still being possible to be preserved, then I will welcome and retract that assertion.

However, I suspect that the reality is that I am not wrong, and instead what we will see is a shift in expectations about what is possible for Bluesky to be decentralized and in what capacity. Some people will be upset to have a new realization about what is and isn't possible, some people will simply update their expectations and say that having only a few large players be able to provide a Bluesky-like experience is actually good enough for them (and that what they're interested in instead is API-consumer-and-redistributor features on top of Bluesky's API), and the majority of the network will have the same level concern they have always had: none.

The reality is that most of Bluesky's userbase doesn't know or care about or understand the degree to which Bluesky is decentralized, except for potentially as a reassurance that "the same thing can't happen here" as happened on X-Twitter. "Decentralization" is not the escape hatch people think it might be in Bluesky, but it's true that "credible exit" may be. However, the credibility of that exit currently predicates on another organization of the same cost and complexity of Bluesky standing in if-or-when Bluesky ends up becoming unsatisfying to its users.

But the indifference towards Bluesky's "credible exit", indeed the indifference towards very architecture on which Bluesky is built, puts Bluesky at an immediate collision course of expectations. ATProto's entire design is built on the foundational expectation of replicating and indexing its content by anyone, but the discovery that this is possible for purposes which users are not excited about has begun to lead to an increased backlash by users, many of whom are increasingly asking for solutions which are effectively centralized.

To me, this collision course is unsurprising, and I am empathetic towards users insofar as that I think we are seeing that the global public firehose worldview is perhaps not the right way to do things. I laid out a different set of values that Spritely is pursuing, and I think that a system that encompasses these values is a system which better fits the needs of users. I think we need systems which empower users and healthy communities, secure collaboration, and all the other values we put out above. Those are the design goals, but Spritely is on a longer roadmap in terms of deliverables than Bluesky is. And Bluesky has a userbase now. So perhaps this observation sounds thoroughly unhelpful. I don't know. But I will say I am not surprised to see that the vibes on Bluesky shifted dramatically between three weeks ago when I wrote the first article and today. In many ways, Bluesky is speedrunning the history of Twitter. Investor pressure towards centralization compounded with users who are upset to find their content replicated and indexed by people they don't like will likely combine into a strong push to restrict Bluesky's API, and I'm not sure myself how this will play out for certain.

And all of that sounds fairly negative, so let me shift towards something positive.

I still do truly believe that "credible exit" is a worthy goal. Actually, I think that (perhaps with one mentioned wording change) all of Bluesky's stated goals are actually quite good. I think Bluesky should continue to pursue them. And I think Bluesky has a team that is interested in doing so. There may be opportunities to share knowledge and collaborate on solutions between Bluesky and other projects, including those I work on. I know Bryan and I are both interested in such. And I said in the previous article how much I respect Jay Graber, and that's true. I also respect Bryan Newbold tremendously. One thing is true for certain: Bryan is a believer in all of the ideals he previously stated. I respect him for that. I would like to see those ideals succeed as far as they possibly can. Perhaps there are even ways to do so together. I will not waver in my goals and values, but I am a strong believer in collaboration where it is fruitful.

And that is the conclusion to what I have to say on the matters of Bluesky and decentralization. I will probably comment on the fediverse and Bluesky itself, but I don't think I will write another blogpost like these two mega-posts I have written. I am not personally interested in going back-and-forth on this any longer. More than I am interested in laying out concerns, by far, I am interested in building the future.

Thanks for listening.

How decentralized is Bluesky really?

By Christine Lemmer-Webber on Fri 22 November 2024

Recently due to various events (namely a lot of people getting off of X-Twitter), Bluesky has become a lot more popular, and excitement for its underlying protocol, ATProto, is growing. Since I worked on ActivityPub which connects together Mastodon, Sharkey, Peertube, GotoSocial, etc, etc, etc in the present-day fediverse, I often get asked whether or not I have opinions about ATProto vs ActivityPub, and the answer is that I do have opinions, but I am usually head-down focused on building what I hope to be the next generation of decentralized (social) networking tech, and so I keep to myself about such things except in private channels.

This debate has been growing harder to ignore, with articles ranging from "Bluesky is cosplaying decentralization" on the one hand and "Nobody cares about decentralization until they do" on the other (which I suppose went subscriber-only; it had a big splash recently and wasn't previously) in favor of ATProto and arguing that other approaches are not as decentralized. Still, I mostly believed that anything I had to say on the subject would not be received productively, and so I figured it was best to reserve comment to myself and those close to me. But recently I have received some direct encouragement from a core Bluesky developer that they have found my writings insightful and useful and would be happy to see me write on the subject. So here are my thoughts.

Let us open with a framing. Decentralization is the result of a system that diffuses power throughout its structure, so that no node holds particular power at the center. "Federation", as has been used as a technical term since the emergence of the "Fediverse" (which presently is mostly associated with ActivityPub, though I would argue XMPP and email are also federated), is a technical approach to communication architecture which achieves decentralization by many independent nodes cooperating and communicating to be a unified whole, with no node holding more power than the responsibility or communication of its parts.

Under these definitions, Bluesky and ATProto are not meaningfully decentralized, and are not federated either. However, this is not to say that Bluesky is not achieving something useful; while Bluesky is not building what is presently a decentralized Twitter, it is building an excellent replacement for Twitter, and Bluesky's main deliverable goal is something else instead: a Twitter replacement, with the possibility of "credible exit".

Bluesky's strengths

I'm sure some people are already bristling having read the previous paragraph, and I will get to explaining my rationale, which is a technical analysis in nature. But let's open with the positive, because I think there are positive things to say about Bluesky as well.

Bluesky has done an incredible job scaling to meet the present moment. Right now, Bluesky is facing a large influx of users who are looking for an alternative to X-Twitter since Musk's takeover (and particularly since Trump's re-election). In other words, the type of user who would be sympathetic to the post X is a White-Supremacist Site is now looking for someplace else to be. The future of X-Twitter is a place where only hard-right people are going to feel comfortable; anyone else is going to be looking for a replacement now, and Bluesky is going to be their quickest and easiest option.

In many ways, Bluesky was built for this. Its experience is basically a one-to-one feature-for-feature replacement for the Twitter that many people loved. And the original directive that Bluesky was given, when it was Jack Dorsey and Parag Agrawal's joint pet project (my understanding at the time was that it was Jack's vision, but Parag took the lead, and my impression was also that they were both very sincere about this) to kick off a decentralized protocol which Twitter could adopt. This informed a lot of the initial architectural decisions of Bluesky, including its scaling needs. It also incidentally lead it to become an excellent offboarding platform when it turned out that many X-Twitter users no longer felt comfortable on the platform. You miss old Twitter? Bluesky already has been building an alternative: hop on board, it's just like old Twitter!

The fact that Jack Dorsey kicked off Bluesky as an initiative and a funded effort and that Jack was originally on Bluesky's board often leads to snarky or snide comments on the fediverse that Bluesky is owned by Jack Dorsey. However, this isn't true: Jack Dorsey quit Bluesky and has been focusing on Nostr (which I can best describe as "a more uncomfortable version of Secure Scuttlebutt for Bitcoin people to talk about Bitcoin"). So I don't think this particular criticism holds true. Bluesky is also fully independent of Twitter; my impression is that this only happened because Jay Graber (Bluesky's CEO) very carefully negotiated to make sure that when Bluesky got its funds that it would receive them without Twitter having control, and this shows a lot of foresight on Jay's part.

For that matter, I think the part of Bluesky I probably respect most personally is Jay Graber. I was not surprised when she was awarded the position of leading Bluesky; she was the obvious choice given her leadership in the process and project, and every interaction I have had with Jay personally has been a positive one. I believe she leads her team with sincerity and care. Furthermore, though a technical critique and reframing follows, I know Jay's team is full of other people who sincerely care about Bluesky and its stated goals as well.

There is one other thing which Bluesky gets right, and which the present-day fediverse does not. This is that Bluesky uses content-addressed content, so that content can survive if a node goes down. In this way (well, also allegedly with identity, but I will critique that part because it has several problems), Bluesky achieves its "credible exit" (Bluesky's own term, by the way) in that the main node or individual hosts could go down, posts can continue to be referenced. This is possible to also do on the fediverse, but is not done presently; today, a fediverse user has to worry a lot about a node going down. indeed I intentionally fought for and left open the possibility within ActivityPub of adding content-addressed posts, and several years ago I wrote a demo of how to combine content addressing with ActivityPub. But nonetheless, even though such a thing is spec-compatible with ActivityPub, content-addressing is not done today on ActivityPub, and is done on Bluesky.

Bluesky's architecture and centralization

On blogs, search engines, and Google Reader

When you build architecture that in theory anyone can participate in, but the barrier to entry is so high so that only those with the highest number of resources can participate, then you've still built a walled garden. -- Morgan Lemmer-Webber, (summarizing things succinctly in our household over breakfast)

Think of our app like a Google. -- ATProto quick start guide

I believe that to this day, the web and blogs are still perceived as a decentralized and open publishing platform. However, I don't know of anyone who would consider Google to be decentralized. In theory, anyone could build their own search engine: the web, which is being consumed and indexed, is trivial enough to parse and aggregate. However, few can, and in practice, we see two (or maybe three) search engines in practice alive today: Google, Bing, and maybe DuckDuckGo (which, per my understanding, uses several sources, but is largely Bing).

This is all to say, in many ways Bluesky's developers have described Bluesky as being a bunch of blogs aggregated by Bluesky as a search engine, and while this isn't really true, it's a good starting point for understanding its challenges. (To understand the rest of the terms involved in the document in detail, Bluesky's architecture document is a good source.)

The same way that in theory the web and blogs are not tied to Google, neither are ATProto's Personal Data Stores (necessarily) tied to Bluesky, the company. Since a small number of people are running Personal Data Stores right now, which is quite viable, Bluesky may have the appearance of being decentralized. And at present, there's really only one each of the Relay and (Twitter-like) AppView components used in practice, but there is a real possibility of this changing and real architectural affordance work to allow it. So perhaps things to not seem all that bad.

However, if we look back at the metaphor of blogs and Google, it's important to note that before social networking in its present form took off, blogs and the "blogosphere" were the primary mechanism of communication on the internet, aggregated by RSS and Atom feeds. And RSS and Atom feed readers started out with an enormous amount of "biodiversity" and largely ran on peoples' desktops. Then along came Google Reader and... friends, if you are reading this and are of a certain age range, there is a strong chance you have feelings just seeing the phrase "Google Reader" mentioned. Google Reader did a great job of providing not all of RSS and Atom feed readers, but enough of it that when Google shuttered it, blogging (and my favorite offshoot of blogging, independent webcomics) crumbled as a primary medium. Blogs still exist, but blog feed aggregation fell quickly to the wayside. Browsers removed the feed icon, and right around that time, social networks in their present shape took their place. To this day, blogs are now primarily shared on social networks.

This is all to say: blogging plus feed readers started out a lot more decentralized than Bluesky has, and having one big player enter the room and then exit effectively killed the system.

And that's even without feed readers being particularly expensive or challenging to run as independent software. I have this concern for the fediverse as well (in case you think this article is harsh on Bluesky and I am a fediverse fangirl because I co-authored the spec it uses, stay tuned; I plan on releasing a critical analysis of the fediverse as-is here shortly). mastodon.social is certainly a "meganode" and Threads... let's not even get started with Threads, that's a long topic. And running your own server is much more challenging than I'd like. But even so, if you check some of the fediverse aggregators such as FediDB or Fediverse Observer you will see thousands of servers across many interoperating implementations.

Self-hosting resources: ActivityPub and ATProto comparison

Hosting a fediverse server is cheap particularly if you use something like GotoSocial is lightweight enough where one could host a server for one's family or friends on a device as light as a Raspberry Pi style form factor. It may require a lot of technical expertise, I may have many critiques of how it runs, but it's possible to host a fully participating fediverse node quite cheaply.

Now, you may see people say, running an ATProto node is fairly cheap! And this is because comparatively speaking, running a Personal Data Store is fairly cheap, because running a Personal Data Store is more akin to running a blog. But social networks are much more interactive than blogs, and in this way the rest of Bluesky's architecture is a lot more involved than a search engine: users expect real-time notifications and interactivity with other users. This is where the real architecture of Bluesky/ATProto comes in: Relays and AppViews.

So how challenging is it to run those? In July 2024, running a Relay on ATProto already required 1 terabyte of storage. But more alarmingly, just a four months later in November 2024, running a relay now requires approximately 5 terabytes of storage. That is a nearly 5x increase in just four months, and my guess is that by next month, we'll see that doubled to at least ten terabytes due to the massive switchover to Bluesky which has happened post-election. As Bluesky grows in popularity, so does the rate of growth of the expected resources to host a meaningfully participating node.

"Message passing" vs "shared heap" architectures

The best way to understand the reason for this difference in hosting requirements is to understand the underlying architecture of these systems. ActivityPub follows an message passing architecture (utilizing publish-subscribe architecture prominently for most "subscription" oriented uses), the same as email, XMPP, and so on. A message is addressed, and then delivered to recipients. (Actually a more fully peer-to-peer system would deliver more directly; all of email, XMPP, ActivityPub and so on use a client-server architecture, so there is a particular server which tends to operate on behalf of a particular user. See comments on the fediverse later in this article for how things can be moved more peer-to-peer.) This turns out to be pretty efficient; if only users on five servers need to know about a message, out of tens of thousands of servers, only those five servers will be contacted. Until recently, every system I knew of described as federated used a message passing architecture, to the degree where I and others assumed that federation implied a message passing architecture, because achieving the architectural goal of many independent nodes cooperating to produce a unified whole seemed to imply this was necessary for efficiency of a substantially sized network. If Alyssa wants to write a piece of mail to Ben, she can send it directly to Ben, and it can arrive at Ben's house. If Ben wants to reply, Ben can reply directly to Alyssa. Your intuitions about email apply exactly here, because that's effectively what this design is.

Bluesky does not utilize message passing, and instead operates in what I call a shared heap architecture. In a shared heap architecture, instead of delivering mail to someone's house (or, in a client-to-server architecture as most non p2p mailing lists are, at least their apartment's mail room), letters which may be interesting all are dumped at a post office (called a "relay") directly. From there it's the responsibility of interested parties to show up and filter through the mail to see what's interesting to them. This means there is no directed delivery; if you want to see replies which are relevant to your messages, you (or someone operating on behalf of you) had better sort through and know about every possible message to find out what messages could be a reply.

It is curious then that the reason for not taking a message passing architecture such as ActivityPub as the foundation for Bluesky and ATProto is often described as wanting users to not have the experience of seeing a reply thread containing missed messages. From the AT Protocol paper:

The distinction between servers in Mastodon introduces complexity for users that does not exist in centralized services. For example, a user viewing a thread of replies in the web interface of one server may see a different set of replies compared to viewing the same thread on another server, because a server only shows those replies that it knows about.

This is particularly curious because experiencing missed messages is a frequent complaint of other shared heap architecture designs such as Secure Scuttlebutt and Nostr, where missing message replies are even more common than on ActivityPub and other message-passing federated architectures. (Both Secure Scuttlebutt and Nostr take steps so you don't need to necessarily fetch everything; in SSB you fetch the feeds of your friends and 3 degrees removed from your friends from the hubs you use, and anything else you simply don't see. In Nostr you simply "embrace the chaos" of only grabbing the information from hubs you use, and hubs don't try to fetch all information.) For instance, if Ben replies to Alyssa's message in one of these systems but does not leave the reply message in the relay which Alyssa pulls from, Alyssa would never see Ben's reply. If multiple relays were to exist in Bluesky, this same problem would presumably occur, so how does Bluesky solve this?

The answer is: Bluesky solves this problem via centralization. Since there is really just one very large relay which everyone is expected to participate in, this relay has a god's-eye knowledge base. Entities which sort through mail and relevant replies for users are AppViews, which pull from the relay and also have a god's-eye knowledge base, and also do filtering. So too do any other number of services which participate in the network: they must operate at the level of gods rather than mortals.

The reality of the fediverse today is that due to the complexity of hosting an instance, many users join nodes hosted by either a friend or a larger group, but there are still many nodes on the network. As mentioned earlier, this is closer to being an apartment building than a house, but the ideal version of decentralization is that everyone self-hosts, and from a resource perspective, this is perfectly possible to do. It would be possible, for mere tens of dollars, for everyone to get a cheap computer and self-host something like GotoSocial and (ignoring the challenges of firewalls and ISPs frowning upon self-hosting at home these days) from an architectural perspective, it's certainly possible. The primary challenge preventing this future is in the technical difficulty of hosting these services presently (and the way the internet has generally become hostile to home-hosting, but shared hosting for this level of service is also still relatively cheap for individuals). The ideal version of decentralization is, from a message transmission perspective (we shall look at other aspects later), fully possible.

The physical world equivalent for a fully decentralized fediverse then is that every user sends mail to every other user's house, as needed, similar to how sending letters works in the physical world. This is decidedly not the case with a fully decentralized ATProto. The physical world equivalent would be that every user had their own house at which they stored a copy of every piece of mail delivered to every other user at their house.

If this sounds infeasible to do in our metaphorical domestic environment, that's because it is. A world of full self-hosting is not possible with Bluesky. In fact, it is worse than the storage requirements, because the message delivery requirements become quadratic at the scale of full decentralization: to send a message to one user is to send a message to all. Rather than writing one letter, a copy of that letter must be made and delivered to every person on earth.

The costs of decentralizing ATProto

Bluesky's architecture documentation does acknowledge this, to some degree:

The federation architecture allows anyone to host a Relay, though it’s a fairly resource-demanding service. In all likelihood, there may be a few large full-network providers, and then a long tail of partial-network providers. Small bespoke Relays could also service tightly or well-defined slices of the network, like a specific new application or a small community. -- Federation Architecture Overview (Bluesky blog)

What is not mentioned is that any smaller bespoke relays would have a greater problem with missing message replies than a directed message-passing architecture has. If larger nodes are gods, then I suppose smaller nodes are demi-gods, from which one could say that only truly fully and complete gods can participate meaningfully in the network.

In the meanwhile, many users of Bluesky seem to be operating under the impression that things are more decentralized than they are:

Part of the concern I have with Bluesky presently is thus that people are gaining the impression that it's a decentralized system in ways that it is not. There are multiple ways this could end up being a problem for the decentralized world; one irritating way is that people might believe there's an "easy decentralized way to do things" that Bluesky has discovered which isn't actually that at all, and another is that Bluesky could collapse at some point and that people might walk away with the impression of "oh well, we tried decentralization and that didn't work... remember Bluesky?"

But perhaps we should look at making Bluesky more decentralized by adding more meaningfully fully participating nodes to it again. How much would that cost today, and how much will that cost in the future?

Again, returning to alice's previously mentioned blogpost, the most recent time that costs of running a Bluesky relay were calculated (which does not also include the costs of running an AppView node or any other critical components), just looking at storage, the amount required was 5 terabytes of storage. The first glance thing I did was to look up, if you were going to pick up a complete shared hosting server configured with that storage size, how much would that be on Linode, as just a common example of a shared hosting provider? At first glance, this appeared to end up around $55k/year, just to host the last estimate of a current relay:

Bryan Newbold pointed out that this was fairly expensive and that there were cheaper options even on Linode using Linode's block storage options (though this doesn't account for still needing a database in addition):

But as both alice and Bryan Newbold have pointed out, using a dedicated server would be much cheaper. Unfortunately, this solution only scales for so long. In Bryan's previous article on running a relay, costs were calculated for one terabyte, and the server came to $152/month plus a one-time setup fee of $92. Cheap! However, the network is clearly growing and already exceeds that size, so let's take the same bare metal server and see how much storage we can add and how expensive this will get:

That's nearly 5x the expense to move to what it looks like the expenses for running a relay will be very soon. And this is for a single server that isn't being used, as we say, "in anger" against an actual userbase, of which the expenses of hosting such a thing are unknown, because nobody is really using it. We are now hitting the limits of a dedicated server regardless, so one will have to move towards more abstracted and clustered storage and indexing mechanisms past this point to keep the network running (unless disk manufacturers surprise us all with an enormous leap in capacity which is rolled out in the very short term future).

And that is just for storage running without backups or any of the other things one would need to keep such a thing going, including bandwidth and CPU cycles and so on and so forth. A single machine does not look like it can be a viable solution for very long, so pointing to dedicated servers which can currently handle an entire relay (when not actually relied upon by any number of users) isn't particularly convincing to me.

There is also the legal liability that one is taking on by effectively hosting the equivalent of all of Twitter! While Bluesky/ATProto does provide multiple filtering techniques which are very interesting, the relay does need to be in the business of identifying what content is not safe to use:

[...] the Relay performs some initial data cleaning (discarding malformed updates, filtering out illegal content and high-volume spam) -- Bluesky and the AT Protocol: Usable Decentralized Social Media

The likely answer to this is that there will always have to be a large corporation at the heart of Bluesky/ATProto, and the network will have to rely on that corporation to do the work of abuse mitigation, particularly in terms of illegal content and spam. This may be a good enough solution for Bluesky's purposes, but on the economics alone it's going to be a centralized system that relies on trusting centralized authorities.

Everything is public, including who you block

You may be reading so far at this point and be wondering: so far I have only analyzed Bluesky from the perspective of public content. What about private or semi-private content? How does Bluesky provide its various services of filtering and labeling and so on in such an environment, and how does Bluesky know which messages are sent in reply to your messages with limited or entirely private messages?

The answer is that Bluesky and ATProto have no design for this at present, and most of the architectural assumptions assume public messages only. Now this could change of course, but everything within Bluesky's current literature and architecture assume public-only content. In fact, even blocks are public information:

[...] for example, if one user has blocked another, and one of the users’ repositories contains a record of an interaction that should not have been allowed due to the block, then the App View drops that interaction so that nobody can see it in the client apps. This behavior is consistent with how blocking works on Twitter/X, and it is also the reason why blocks are public records in Bluesky: every protocol-conforming App View needs to know who is blocking who in order to enforce the block. -- Bluesky and the AT Protocol: Usable Decentralized Social Media

I'm not sure this behavior is consistent after all with how blocking works on X-Twitter; it was not my understanding that blocking someone would be public information. But blocks are indeed public information on Bluesky, and anyone can query who is blocking or being blocked by anyone. It is true that looking at a blocking account from a blocked account on most social media systems or observing the results of interactions can reveal information about who is blocked, but this is not the same as this being openly queryable information. There is a big difference between "you can look at someone's post and see who is being blocked" to "you can query the network for every person who is blocking or is blocked by JK Rowling".

I found this very surprising; in ActivityPub's development, I remember a conversation between Amy Guy and myself where we decided it was very important to not deliver Block activities between servers. We encoded this in ActivityPub's specification thusly:

The Block activity is used to indicate that the posting actor does not want another actor (defined in the object property) to be able to interact with objects posted by the actor posting the Block activity. The server SHOULD prevent the blocked user from interacting with any object posted by the actor.
Servers SHOULD NOT deliver Block Activities to their object. -- ActivityPub

The reason for this is very simple: we have seen people who utilize blocklists be retaliated against for blocking someone who is angry about being blocked. It was our opinion that sharing such information could result in harassment. (Last I checked, Mastodon provides the user with the choice of whether or not to send a "report" about a block to the offending instance so that moderators of that server can notice a problematic user and take action, but delivering such information is not required.)

That said, to Bluesky's credit, this is an issue that is being openly considered. There is an open issue to consider whether or not private blocks are possible. Which does lead to a point, despite my many critiques here: it is true that even many of the things I have talked about could be changed and evaluated in the future. But nonetheless, in many ways I consider the decision to have blocks be publicly queryable to be an example of emergent behavior from initial decisions... early architectural decisions can have long-standing architectural results, and while many things can be changed, some things are particularly difficult to change form an initial starting point.

Direct messages are fully centralized

But you may notice! Bluesky provides direct messages! So surely not all information is publicly available, because otherwise else direct messages would simply not work! So how do direct messages work in Bluesky?

The answer, if you guessed it, is centralization. All direct messages, no matter what your Personal Data Store is, no matter what your relay is, go through Bluesky, the company.

If you find this shocking, so did I, but then again, this information was publicly available even when direct messages were announced. Bluesky's direct messages are also not end-to-end encrypted, and don't use any particular kind of protocol which is amenable to decentralization or federation.

But perhaps we should back up... am I being too harsh? After all, while the fediverse works like email (actually, ActivityPub's architecture is (contrary to many users' expectations since Mastodon is also a Twitter clone and is how most people experience the fediverse), designed for direct communication first and foremost... public communication is clearly supported, but the default and simplest case is direct individual or group messaging), it's also "about as private as email". Which is to say, not private enough for many kinds of security concerns these days: your administrator can read your DMs but hopefully does not in the general case, but messages are not end-to-end encrypted at present. But I can know my administrator personally, and thus the trust dynamics are often not the same.

A feature complete Twitter ASAP

So direct messages on Bluesky are centralized, and while Bluesky does say so in their blogposts (past the point at which most people have read), most users I have talked to have assumed they worked the same way that the rest of ATProto works. Why would Bluesky roll out a direct message system that they have acknowledged is not the long term direct message system they would like long term? (Though I am also still puzzled... why they didn't at least use XMPP?)

The presumable answer is: Bluesky wanted to provide a feature-complete platform from the perspective of a user who is looking for an exit from Twitter now. And this is honestly a fair decision to make in many ways, since as I have said previously, while I don't see Bluesky as a very good decentralized Twitter, I do see it as a good replacement for Twitter, which is what most users are looking for immediately. But the lack of understanding of these detail by many users and media coverage is a bit maddening from someone like myself who actually really does look at and care about decentralization, and I know it's a bit maddening to much of the fediverse too.

But again: to many users, this doesn't matter. What many users fleeing X-Twitter right now care about is a replacement for Twitter. For that matter, if you're coming from Twitter, whether or not Bluesky is truly decentralized, it certainly seems more decentralized than Twitter, the same way that Twitter may seem more decentralized than cable news. Things are sometimes more decentralized in degrees, and I certainly think the fediverse could be more decentralized than it is. (More, again, on this later.) But in all ways related to the distribution of power, Bluesky's technology is notably much less distributed than existing and prominent decentralized technology in deployment today.

Bluesky is centralized, but "credible exit" is a worthy pursuit

In many places, Bluesky acknowledges that it is more centralized than other alternatives in its own writing. From its own paper:

Even though the majority of Bluesky services are currently operated by a single company, we nevertheless consider the system to be decentralized because it provides credible exit: if Bluesky Social PBC goes out of business or loses users’ trust, other providers can step in to provide an equivalent service using the same dataset and the same protocols. -- Bluesky and the AT Protocol: Usable Decentralized Social Media

It is not a bad choice for Bluesky to be focused on providing an alternative to X-Twitter for those who miss Twitter-of-yore and are immediately looking for an offboarding from an abusive environment. I understand and support this effort! Bluesky does use several decentralization tricks which may lend themselves more towards its self-stated goal of "credible exit". But these do not make Bluesky decentralized, which it is not within any reasonable metric of the power dynamics we have of decentralized protocols which exist today, and it does not use federation in any way that resembles the way that technical term has been used within decentralized social networking efforts. (I have heard the term "federation-washing" used to describe the goalpost-moving involved here, and I'm sympathetic to that phrase personally.)

In my opinion, this should actually be the way Bluesky brands itself, which I believe would be more honest: an open architecture (that's fair to say!) with the possibility of credible exit. This would be more accurate and reflect better what is provided to users.

ATProto's portable identity challenges

Bluesky's credible exit claims rely both on content addressing but also on its use of Decentralized Identifiers (DIDs) for account migration. This is certainly a good goal, and account migration support is something we should see more broadly (including on the fediverse).

However, there are several major problems:

ATProto supports two DID methods, did:web and did:plc, which (despite the "D" in "DID"!) are both centralized.
The cyclic relationship between ATProto's approach to DIDs and DNS causes problems which undercuts the utility of DIDs (this is addressable, but it's not clear to me that there will be interest).
Even if a user wishes to switch away from Bluesky's infrastructure, Bluesky probably has effective permanent control over that user's identity destiny, removing the reassurance that one need not trust Bluesky as a corporation in the long term.
Several other concerning details about did:plc generated DIDs.

Some history: Decentralized Identifiers (including the centralized ones)

First, some background. The DID Core spec spec is really more of an abstract interface on which specific "DID methods" can be implemented. What a DID method mostly provides (it does some other things too, but but as importantly) is a mechanism by which cryptographic public keys can be registered, retrieved, and rotated (though rotation ability is not strictly a requirement; did:key cannot be rotated, for instance).

Surprisingly, despite the name and the original intents of the DID architects when DIDs were being envisioned, that a DID method be decentralized is not a requirement.

I said surprisingly, so: are you surprised? I used to be very active in this space (I never worked on the DID spec directly, but when I worked at Digital Bazaar I was sometimes active in the Verifiable Credentials calls, and I did co-author a spec also developed in that space for linked data capabilities, zcap-ld). I say this because I want to provide context that allowing decentralized identifier methods to not be decentralized at all was a debate.

I think I had stepped out of the group by then, but I remember talking to colleagues about did:web; it was the first real argument for centralized DIDs. But wait, didn't I previously say that the web was open and decentralized? Yes, but the naming+encryption system the web runs on top of is not: DNS+TLS relies on trusting ICANN on down and TLS Certificate Authorities, both of which are centralized approaches. My understanding of the justification for did:web was primarily that everyone would have a trivial DID method that would allow all conforming implementations to easily pass the DID test suite. I was in the camp that blessing DIDs which were centralized as "Decentralized Identifiers" would lead to decentralization-washing, and that's exactly what's happening here.

There's another silly thing about did:web: there's really not a real reason for did:web, since all did:web does is effectively get rewritten via a trivial regular expression to an https: link, and you could just use that very https: link instead of did:web and serve the same information in any relevant context. But the thing is, people hear that did:web is a decentralized identifier, so they assume it must be, even though again, did:web never gets us past the centralization challenges inherent in DNS+TLS, it simply uses them! Unfortunately, due to the name, many people think did:web provides a more robust layer of security than simply retrieving a key over https: does. I'm here to tell you that it doesn't, because that's exactly what did:web does anyway.

did:plc, the "placeholder" DID

But Bluesky has developed its own DID method, did:plc. Today, did:plc stands for "Public Ledger of Credentials", however it originally stood for "Placeholder DIDs", with the hope of replacing them with something else later. The way that did:plc works is that Bluesky hosts a web service from which one can register, retrieve, and rotate keys (and other associated DID document information). However, this ledger is centrally controlled by Bluesky.

This aspect of centralization, on its own, doesn't bother me as much as a reader might think! For one thing, if all works right, Bluesky can only deny rotations or retrieval of did:plc documents, but since future updates to the document are signed by the original DID document's key, Bluesky shouldn't (hm, we'll return to "shouldn't" in a second) be able to forge future updates to said document. And Bluesky's developers are very open to acknowledging that did:plc is centralized, and have expressed some interest in moving to something else, or improving its governance so that the organization is controlled by another more neutral org (Paul Frazee in particular suggests that one solution could even be to move to an ICANN-like organization).

However, there are other aspects to did:plc which seem strange. For one thing, did:plc documents' identifiers are, as best as I can tell, sha256 hashes of the DID document truncated to 15 bytes (120 bits) of entropy. This seems like a strange decision to me; it does mean that did:plc URIs fit in 32 characters (8 characters for did:plc:, 20 characters for the truncated hash) which I guess is a nice round "computer'y" number but why throw away all that valuable entropy? For aesthetics? DID identifiers aren't meant to be read by humans, they should be encapsulated, so this is a strange decision to me. (I'm admittedly an amateur when it comes to cryptography, but I'm old enough to remember Debian getting into trouble over using PGP short ids.) (Also choosing sha256 over sha256d, there's maybe the question of length extension attacks, but I suppose the parsing of the document means this is maybe not a problem, I'm not sure.) It's just strange decisions. But again, did:plc was meant to be a placeholder. The problem, of course, is that this placeholder is now the basis of identifiers for many users who have already joined the system today.

I have not spent much time auditing did:plc myself, just reading high level details and wondering, but there are some other strange details which can be found in the blogpost Hijacking Bluesky Identities with a Malleable Deputy. From that post, the most alarming is:

However, there's one other factor that raises this from "a curiosity" to "a big problem": bsky.social uses the same rotationKeys for every account. This is an eyebrow-raising decision on its own; apparently the cloud HSM product they use does billing per key, so it would be prohibitively expensive to give each user their own. (I hear they're planning on transitioning from "cloud" to on-premise hosting, so maybe they'll get the chance to give each user their own keypair then?)

I have not looked, but I would assume this is not the case anymore, but I find it surprising and alarming that reusing the same key per user was ever the case. It feels like this flies in the face of the fundamental goals one would have around building a DID system and it is difficult for me to fathom how such a decision could ever have been made.

But there is a bigger problem regarding centralization and did:plc:

In principle, the cryptographic keys for signing repository updates and DID document updates can be held directly on the user’s devices, e.g. using a cryptocurrency wallet, in order to minimize trust in servers. However, we believe that such manual key management is not appropriate for most users, since there is a significant risk of the keys being compromised or lost.
The Bluesky PDSes therefore hold these signing keys custodially on behalf of users, and users log in to their home PDS via username and password. This provides a familiar user experience to users, and enables standard features such as password reset by email. The AT Protocol does not make any assumptions about how PDSes authenticate their users; other PDS operators are free to use different methods, including user-managed keys.
-- Bluesky and the AT Protocol: Usable Decentralized Social Media

I am sympathetic to this position: it is true that key management for users is an incredibly hard user experience and coordination problem, so this decision might not even be wrong. But the more concerning thing is that users are being told that if they want to walk away from Bluesky the company, they can at any time! After all, it's possible for a user to change both their key and the location it points to at a future time. However, what does that look like for a user who trusts Bluesky today but does not trust Bluesky tomorrow? The truth of the matter is: Bluesky controls users' keys, and therefore even if users "move away" they must trust Bluesky to perform this move on their behalf. And even if Bluesky delegates authority to that user to control their identity information in the future, there is still a problem in that Bluesky will always have control over that user's key, and thus their identity future.

Zooko's triangle, petnames, and a cyclic dependency between DNS and DIDs on Bluesky

Alas, this is not the end of the identity challenges, because there is a fundamental challenge (or set of challenges) around the way Bluesky binds a user's DID to the handle of that user which the user sees. Zooko's triangle tells us that a decentralized and globally unique name cannot also be human meaningful, and indeed Decentralized Identifiers which are actually decentralized cannot be. Ignoring, again, that did:plc is not actually decentralized, it is a non-human-meaningful identifier. So what's the solution? How do we provide a human meaningful name that the user can understand?

I strongly believe that the right answer is a Petname System, which allows for local human meaning to globally non-human-meaningful names. However, the discussion of why I believe that is the right approach and how to accomplish it is too large for this writeup; I will only say that Ink and Switch did a great petnames demo and (while not particularly polished) there are more ideas one can read about in a prototype Spritely put together. But admittedly, petname systems have not been widely deployed to this date, and so the UX challenges around them are not fully solved.

Perhaps because of this reason, Bluesky did not adopt a petname system for users' handles and instead adopts something much more familiar to users today: domain names! Every user on Bluesky's handle is effectively its own domain name. But users can also change their handle by associating with a different domain name later!

In one way, Bluesky is doing the right thing here by taking a human meaningful name and mapping to the less human meaningful identifier, which is what you would want to do if using an actually decentralized Decentralized Identifier, so this appears to be a good sign for the future. But wait... let's take a look a bit deeper, and the situation seems to bet a bit murkier.

ATProto uses the alsoKnownAs field on the DID document itself for the DID to proclaim what URLs it is associated with. It is not really possible for DID documents to be able to verify this information on their own since verifying such things is a "live" operation, and the did:plc method's documentation correctly identifies this:

The PLC server does not cross-validate alsoKnownAs or service entries in operations. This means that any DID can "claim" to have any identity, or to have an active account with any service (identified by URL). This data should not be trusted without bi-directionally verification, for example using handle resolution.

As such, it's the job of the rest of the ATProto consuming infrastructure (at every step of the process, if we are being robust, but less robustly we could choose to trust a source of incoming data) to verify whether or not a DID does indeed map to the handle it claims to.

But this is puzzling. Consider: the point of DIDs originally was to provide a decentralized path to identity, which DNS+TLS is decidedly not. But even in the case of did:plc, one must still rely on the liveness of the web to ensure that a handle and a DID document bidirectionally map to each other. So the problems of did:web, in effect, still exist for did:plc too.

But here are some other thorny questions: if at one point we verified that did:plc:<blah> mapped bidirectionally to alyssa.example, what happens if https://alyssa.example goes down temporarily or permanently? What if a new user claiming to represent alyssa.example shows up instead, and this seems to "check out"? How do we represent posts by the previous alyssa.example to the user? The current one? I don't know how to answer these questions... do you?

For users, DIDs don't really come into account: if bsky.app (and bsky.social) went down because Bluesky the company folded, it would be challenge for users to tell whether or not alyssa.bsky.social continues to represent Alyssa. It isn't clear to me how a sudden "null" mapping of identity to a domain that no longer exists should be represented to the user, and I'm not sure there is one, and at any rate as far as users are concerned, the DNS record of alyssa.bsky.social is the record for Alyssa, not the DID. A petname system solves these problems, Bluesky's user experience does not.

Can you credibly exit with your identity from a Bluesky takeover?

But in total, if a hostile company were to take over Bluesky, did:plc seems to not fare well:

Bluesky controls most users' keys anyway, so can control their identity future regardless, including in terms of signed updates
Most users are mapped to domains controlled as *.bsky.social subdomains, so Bluesky can remap those to different users, and this will be true with any "Personal Data Store" provider one might use too
It's not clear how Ben would know that the person who used to be alyssa.bsky.social is now alyssa.example even if she migrated to her own domain without Ben reading previous interactions (in general, "sudden changes" in one's handle being the norm means that any domain update seems to be a ripe opportunity for phishing attacks anyway)
Bluesky could block future did:plc document updates, and the proposed solution seems to be "another ICANN"

At any rate, in both did:plc and did:web, ICANN must literally be trusted, because domain names are what users know. But if the solution to did:plc is an ICANN-like entity, then I guess users must trust two ICANNs.

So at the moment, Bluesky's identity system is in no real way decentralized, but that's only part of the problems, as outlined above. Still, the fact that Bluesky is using the Decentralized Identifiers interface means that perhaps actually decentralized identity solutions can be layered on top. In the meanwhile, effectively everything bottoms out to trusting the domain name system, and for that matter, trusting Bluesky. Users effectively trust domains, so in the end, we might as well be just retrieving a key from a particular domain. While continuity of identity is in theory possible from one domain onward, this is not done in a useful way that users can understand; a shift from one domain to the next is a complete shift of the handle one is known by, potentially even opening a phishing attack vector. Petname systems could address this issue, but integrating them at this point would be a major shift in how users perceive of the network, and it seems unlikely that downplaying the role of domains is something Bluesky as an organization will be motivated to do since selling domains is currently a Bluesky business strategy.

What should the fediverse do?

I promised a critique of the fediverse, and the reality is that I have been doing critiques of the fediverse the entire time since ActivityPub has been released. It is not the case that I believe that ActivityPub-as-deployed is an end-all-be-all solution. Quite the opposite.

The most succinct version of what I think the fediverse/ActivityPub should do is actually in ActivityPub + OCaps which was, of all things, a proposal about what Bluesky might be which I co-submitted with Jay Graber when Twitter was still evaluating Bluesky proposals. I am not bringing this up because I think it was the proposal which should have been chosen; I think Bluesky had directive based needs around scaling quickly and I ultimately think both that Jay Graber was the correct choice to lead Bluesky and that it also made most sense to run Spritely as a separate organization, because Spritely needed to spend its first few years focused on research fundamentals. However, I think the proposal really is the best writeup I know of on how to transform the fediverse from where it is to where it should be:

Answer the missing authorization part of the ActivityPub spec by integrating capability security throughout. A writeup about why this is important, which includes significant critique of the fediverse, can be found in my OCapPub writeup.
Integrate decentralized / content addressed storage (ideally with an encryption layer a-la Tahoe LAFS) for posts which can survive server shutdown (like in the Golem demo I put together).
Use mutable files within decentralized storage (Tahoe and IPFS are both examples of immutable systems which layer mutable files on top) to permit portable identity. When a user wishes to switch servers, point the inbox property at a new endpoint.
Use a petname system to make the portable / decentralized identifiers (not DIDs necessarily) more human-understandable and to make the system more robust against phishing attacks generally.
More anti-spam / anti-harassment tooling built on top of capability security foundations.
Improve privacy by bringing in End-to-End Encryption (there has been some work on this but I can't say I have followed closely at all).

Some of these tasks are quite feasible for the fediverse to pick up today: the content-addressed storage and the portable identity stuff I think would be a major thing to introduce into the system but would be quite doable and would give the fediverse properties of surviving nodes going down better.

Some of the rest might be more challenging, though these aren't new directions for me to be pushing in the direction of. I put together a document called OCapPub a few years ago to present an alternative vision for how the fediverse should go.

However at the time I found that many fediverse implementers didn't really understand what I was pushing for, and for that matter, didn't really understand how they could possibly implement this on top of web 2.0 frameworks like, say, Ruby on Rails. Fair enough, and the work we've been doing at Spritely is in many ways what I think is the answer: we're designing things such that capability secure, distributed systems are what falls out of Spritely's tech when you write a program in it.

Blaine Cook said that the correct version of ActivityPub and the correct version of ATProto are "the same picture" at one point. This is true insofar as I believe addressing the serious issues of both converges on a shared direction: the fediverse needs to adopt content addressing and portable identity (criticisms of Bluesky's approach to this latter one aside at the moment), Bluesky needs to support a messaging architecture such that participating meaningfully and fully in decentralization does not mean needing to host everything (adopting such a solution will probably mean adopting something that ultimately looks a lot like ActivityPub). And of course, I think both need to move towards supporting privacy and stronger collaboration tools with capability security. While others have argued that these are "different approaches" -- and perhaps this is because I am overly ambitious in what I think decentralized networks should do -- to me this is because both are not being all they could be. Instead to me it feels that there is a "fixed point" of resolving these issues to iterate towards.

But perhaps that's too ambitious to suggest taking on for either camp. And maybe it doesn't matter insofar as the real lessons of Worse is Better is that both first mover advantage on a quicker and popular solution outpaces the ability to deliver a more correct and robust position, and entrenches the less ideal system. It can be really challenging for a system that is in place to change itself from its present position, which is a bit depressing.

This last paragraph applies to both Bluesky and the fediverse, but again, the fediverse is currently actually decentralized, and from my analysis, if there was willingness to take on the work, the gap of moving towards resolving content addressing and portable identity at least are not so large architecturally. But I'm not sure there's interest or not. Maybe there will be so more now.

But for me, I am more interested in "secure collaboration" these days than anything else, and that's where my work continues at Spritely. We are working our ways towards our own answer for social systems, but we aren't there yet. And so in the meanwhile, I feel like I am sounding incredibly grouchy about all of the above, but really, it's just that I think these really are important things to get right. Conway's law applies in both directions: a technical system reflects the communication and social structure of those who build it, but the communication and social structures that we have available to ourselves are informed by what technology is available to ourselves.

Regardless, that's enough of my Cassandra complex about the fediverse and otherwise. Perhaps, at the risk of making me sound grouchy and bitter (I'm not, I hope, I usually am just focused on building the things I think are heading in the right direction) you have seen that I am not lacking in fediverse critiques also. But one thing I think is true: the fediverse is decentralized and is federated. My critiques of Bluesky as not achieving either such thing still hold. So let us move onward to: can such concerns be addressed in time? Or, at least, can a "credible exit" be made possible?

Preparing for the organization as a future adversary

One interesting thing about Bluesky is that its team uses a very self-reflective phrase: "the organization is a future adversary" (here are a couple of examples). This is a very self-aware phrase that one rarely sees in an organization and is thus commendable. In many ways it reminds me of Google saying "Don't Be Evil", which was an internal rallying cry which, while perhaps never fully sincere, gave a lot of opportunity to challenge decisions internally and externally and hold Google to account to some degree. While questionable things happened while the phrase was in place, when the term was decommissioned, things at Google really did seem to be getting a lot worse.

Bluesky is a Public Benefit Corporation, which means that profit is not its only motive; Bluesky has also declared its work as being for the public good in addition to seeking profit. I can say with confidence that many of the people working at Bluesky fully believe this and, as I have emphasized earlier in the article, I think the people working at Bluesky are good and earnest about the goals of Bluesky.

So "The organization is a future adversary" is thus a prescient phrase for the moment. In addition to its launching funds from Twitter (which my understanding is Bluesky received with few if any strings attached other than to carry out its stated work), Bluesky has raised two rounds of venture capital funding. I have respect for this insofar as I have done nearly every role possible in free and open source software orgs at some point or another, and fundraising is by far the hardest and most stressful of all of them. And when you're building an organization and building good people in, their future livelihood can really weigh on you. So I am glad to see Bluesky get funding, in this regard.

But venture capital is not a donation; investors want a return. I have many friends who have taken VC money, including some running decentralized social network orgs, and have seen an exciting and positive time early on and then have seen their organization clawed away from them by investors looking for returns. I'm not judging the choice to take venture capital negatively, just acknowledging: this is the state of affairs, and we should recognize it.

And by using the phrase "the organization is a future adversary", Bluesky has acknowledged it. The right next step then is to start planning all work to survive this situation of course.

I've analyzed previously in the document the challenges Bluesky has in achieving meaningful decentralization or federation. Bluesky now has much bigger pressures than decentralization, namely to satisfy the massive scale of users who wish to flock to the platform now, to satisfy investors which will increasingly be interested in whether or not they can see a return, and to achieve enough income to keep their staff and servers going. Rearchitecting towards meaningful decentralization will be a big pivot and will likely introduce many of the problems that Bluesky has touted their platform as not having that other decentralized platforms have.

There are early signs that Bluesky the company is already considering or exploring features that only make sense in a centralized context. Direct messages were discussed previously in this document, but with the announcement of premium accounts, it will be interesting to see what happens. Premium accounts would be possible to handle in a fully decentralized system: higher quality video uploads makes sense. What becomes more uncertain is what happens when a self-hosted PDS user uploads their own higher quality videos, will those be mirrored onto Bluesky's CDN in higher quality as well? Likewise, ads seem likely to be coming to Bluesky:

A common way to make premium accounts more valuable is to make them ad-free. But if Bluesky is sufficiently decentralized and its filtering and labeling tools work as described, it will be trivial for users to set up filters which remove ads from the stream. Traditionally when investors realize users are doing this and removing a revenue stream, that is the point at which they start pressuring hard on enshittification and removing things like public access to APIs, etc. What will happen in Bluesky's case?

Here is where "credible exit" really is the right term for Bluesky's architectural goals. Rearchitecting towards meaningful decentralization and federation is a massive overhaul of Bluesky's infrastructure, but providing "credible exit" is not. It is my opinion that leaning into "credible exit" is the best thing that Bluesky can do: perhaps a large corporation or two always have to sit at the center of Bluesky, but perhaps also it will be possible for people to leave.

Conclusions

Bluesky is built by good people who care, and it is providing something that people desperately want and need. If you are looking for a Twitter replacement, you can find it in Bluesky today.

However, I stand by my assertions that Bluesky is not meaningfully decentralized and that it is certainly not federated according to any technical definition of federation we have had in a decentralized social network context previously. To claim that Bluesky is decentralized or federated in its current form moves the goalposts of both of those terms, which I find unacceptable.

However, "credible exit" is a reasonable term to describe what Bluesky is aiming for. It is Bluesky's term, and I think Bluesky should embrace that term fully in all contexts and work that they can.

MNT Pocket Reform first impressions

By Christine Lemmer-Webber on Mon 02 September 2024

I got my MNT Pocket Reform. In short, it's an absolutely gorgeous device and lovely for doing some light hacking or chiptune tracking or etc. It's incredibly built and also feels like it has a lot of potential. It's very clearly upgradeable which is a refreshing change of pace from modern electronics. On the downside, if you get an MNT reform today, you will probably find that you will need an upgrade or two because there are some rough edges, and you will need to be willing to spend some time hacking and on community support forums.

But maybe you're into that kind of thing. If you're willing to go with those caveats, it's hard to imagine a better future for computing than the stuff that MNT Research puts out. It's a cost and time investment, but it does feel like a cost and time investment moving towards a better computing future.

A bit more of a bulleted list set of impressions appear below.

The good stuff:

It's hard to understate how beautiful this device is and all its packaging and etc. It feels like it was put together by a bunch of indie artist queers because oh, it was. Lovely.
Despite the name pocket, it's more purse sized (which I was expecting). It's a bit hefty but it feels okay that it is because it feels very well built. It's a chonker, but it definitely feels like a device where if you pull it out and show it to your friends they're all going to gasp, and they should.
The manual is incredibly informative; it feels like parts of it really could be extracted for a general "intro to running a linux'y system" book. The schematics are also beautiful.
The keyboard feels incredible to use. Hard to believe a portable device is allowed to have a keyboard this good. Every click and clack warms my heart. And CTRL is in the right place! Amazing.
For me, it was the right decision to pick the purple version; it looks so good.
The installer is really good and just works, leaving you with a lightly customized Debian environment.

The rough stuff:

None of the rough things here feel insurmountable, but they all feel like things that are not ready yet, and you have to expect to pour time into them.
While the construction of the exterior is nigh perfect and a thing of beauty, there's a lot of rough edges in terms of the intersection of hardware things and software things. Expect to spend time on the community forum, expect to spend time tinkering, and maybe you find you'll want to upgrade it (but at least you can upgrade it pretty easily, and that's encouraged). For instance, there's a wifi upgrade kit already, and you can even swap out the whole main processor module.
It's running Debian Unstable which is an incredibly anxiety-inducing thing to upgrade. A couple things got worse after doing a system upgrade. Found myself missing Guix's rollback features. Would really like to see Guix running on these things.
Wifi is disconnecting pretty much constantly for me after upgrading Debian, but is it a driver issiue or hardware? It didn't have a problem before, but now it disconnects after seconds and then refuses to connect again. It seems there are known issues around wifi stuff generally and some upgrade kits coming. One way or another it's solvable, but I raise this as the type of issue that one can expect to run into. UPDATE: Also it seems fine when tethering to my phone. So I guess that probably there is a hardware component to it.
Battery life isn't super phenomenal, but more concerning, using a generic usb-c wall charger I seem to drain the battery faster than it charges. I hear that better USB-C chargers do better, maybe I will try to get one.

So that's my feelings so far. The Pocket Reform has only just recently started making its way into users' hands. It feels like something that could have a long life ahead of it, and the fact that all the schematics are right there and in the open (and in the manual and on a gorgeous poster, did I mention the gorgeous poster) means that all your eggs aren't necessarily in the MNT Research basket. If you want to get one, you have to be aware that you're probably investing your time and money into making that long life of better computing available for others.

What I will say is that it feels like the MNT Pocket Reform feels like carrying around a computer that really is mine, and which has a future to it. I hope more comes from it, and this is just the beginning.

Two songs in Milkytracker

By Christine Lemmer-Webber on Wed 07 August 2024

A Fairy Leaves Home displayed in Milkytracker

Recently I've been making some music in Milkytracker, a decidedly oldschool piece of music tracking software. I've made two songs which I am proud of, one of which is original, the other is a cover.

Here's the original piece, titled "A Fairy Leaves Home" (released as CC BY-SA 4.0, source file):

I'm fairly proud of this one. It's the first ever piece of original composition that came out the way I wanted with the level of complexity I wanted.

Halfway through the fairy meets some "frogs" (or toads?) who join her on her journey for a bit... see if you can identify what I mean.

Preceding this, I also did a cover of a song (source file) that's haunted me since I was very young. See if you can recognize it:

Your likeliness of guessing it is based on whether or not you or a member of your family watched a lot of daytime soap operas as a child. My mom watched Days of our Lives, often taping the show and watching it when she'd get home from work, but would occasionally turn on this one, which was The Young and the Restless, which had an incredible opening song. (I could never get into the plots of these shows, but the music was captivating... it's amazing how much nostalgia and attachment for television shows begins and builds with a theme song, honestly.)

I used to try to play this theme when I was a kid and I'd find a couple of chords at a time and then lose my place. I took piano lessons when I was young, but I was (in my view) never very good. And by that I mean I could never play a pre-existing song live very well. I'd get lost finding the keys partway through, and have to back up and try again. But I could improvise music I was happy with, and I could "find" the music I wanted... I was just not a very good performer.

But I think I am coming to realize that I am a better composer / arranger of music than I am a performer. Part of the reason I like tracker software in particular is that it's laid out in a sensible grid, spreadsheet-like. Even though very little programming tends to happen inside of trackers, it's a fairly common observation that trackers seem to appeal to computer programmers. But I had another realization, which is that the way I make music in Milkytracker also resembles the way lisp programmers especially tend to program: in programming, experiment in the REPL, then commit that experiment to code; in music, play keys on the keyboard (midi attachment or typing; many trackers are designed to be played on computer keyboards, and I use both) then commit to the tracker sheet.

If I were to start all over with learning music I might still take piano lessons but I would also start with a tracker right away. I'm a clumsy performer, but with trackers I can be the composer and let the software itself be the performer.

Still, even though I didn't feel confident in myself taking piano lessons all that time ago, I feel happy that some of it has stuck in my memory, and I'm grateful that my parents encouraged me. If only I had known more about how I had learned so I could better take advantage of such lessons at the time!

Thanks to my dad for encouraging me to flesh out the fairy piece and take it seriously after I shared an early draft. He challenged me to build several movements around the melody, and I did. When I finished I shared the piece with him and he was so excited he called me up to talk about it. It was good to hear him so happy with the piece; I was too. Maybe I will make more.

I appeared on PBS NOVA

By Christine Lemmer-Webber on Tue 21 May 2024

Christine gesturing while talking on PBS' NOVA documentary

I appeared on a PBS's NOVA documentary, Secrets in Your Data! (It's also on YouTube and, well, on broadcast television I suppose!) This actually aired a few days ago but I hadn't really had time to write anything about it and well, I still don't, but I wanted to mark the time before too much time passed and I never wrote anything.

The documentary maker was Structure Films and I have to say, working with all of them (Jason Sussberg, Jennifer Wiley, and David Alvarado) was really great. You can watch the film to see for yourself but they clearly did their research and pulled in a lot of wonderful people. I was really impressed! I think the film came out really well too. It covers a lot of ground, so I think it's easy to focus on any one part and think "gosh this should be a whole episode on its own", but they did a really incredible job producing things.

And it was also a delight working Alok Patel, the host of the program (his website is very entertaining, by the way)! Alok is kind of a character in the documentary, but the funny thing is that he really is as much a character in real life, the kind of person who just oozes charisma, like he was custom-built to be on television.

So it was extra charming that at one point in-between filming, Alok turned to me and said "So, you do television spots regularly." Not as a question, just as a statement. I replied that I didn't, and he said he wouldn't have guessed it. I don't know if he was trying to put me at ease or if he really thought that, but I really appreciated it regardless.

My appearance is both short and yet still one of the longer ones in the documentary, but gosh, speaking of "any of these sub-topics could be its own film", they probably actually had that much good content that fell on the cutting room floor. It was somewhere between ten and twelve hours from when we showed up to everything being wrapped up for the day, and there was this whole incredible and I must assume gorgeous bit filmed in a model railroad museum that got cut. We were showing off the idea of interoperability by talking about trains running on the same track. And ultimately, the entirety of that section got summed up into a much shorter line by me in the film, talking about how email is an example of a decentralized social network networks people already know and the role the standards plays there. I'm guessing it was painful to choose to cut that bit. But I don't mind: it was a really great experience and I have some funny stories to tell friends about it.

Anyway, PBS NOVA is one of those programs that I grew up with and so it was really cool to do this! I hope you enjoy the film, and I'm glad to have been a part of it.

My Moon and Stars

By Christine Lemmer-Webber on Sun 21 January 2024

My moon and my stars
Guided me through dark nights
My moon and my stars
Sang me sweet songs
I had found my muse
I found my song
My moon and stars were there

One night the clouds came,
darker than before
My moon and stars
she called for me to help
My moon and stars called out

How to help, how to guide
that which guides you, finds you peace?
The clouds closed in and I tried
But I could not stop the clouds

I sat in darkness, I lost myself
I scrambled and could not find you
I cried and thrashed
I lost my voice
My moon and stars were gone

I heard her voice beyond the clouds
A whisper, a small song
I rose and searched
I searched the pools
But my moon and stars were gone

I laid beneath a tree
The darkness turned to day
But I could not see the light because
My moon and stars were gone

The sun spinner shone above me then
She kissed me with her touch
Remember me, for I am here,
when your moon and stars are gone

She wove a blanket for me to sleep
I laid beneath her rays
She told me she loved me as I loved her
While my moon and stars were gone

Friends came to me and gathered round
They brought me food and drink
They sat a lantern by my feet
While my moon and stars were gone

That night I rose
I searched the pools
I ate, I shone a light
I wandered in the darkness
my blanket warmed my fright

The clouds opened like an eyelid
My moon shone like an eye
They closed again, I lost my vision
But the moon and stars were there

I slept, I ate, I searched the pools
I basked beneath the sun
I made a space for you to sleep
For my muse to rest at home

I slept, I ate, I searched the pools
I laid beneath the rug
The clouds departed, you were there
My moon and stars came home

They shone so bright, they shimmered
They rose into the air
I felt alive, I knew I could
My moon and stars were there

[<-Previous] [--latest--] [--archive--] [Next->]