Posts with tag "federation"

An even more distributed ActivityPub

By Christopher Allan Webber on Thu 06 October 2016

So ActivityPub is nearing Candidate Recommendation status. If you want to hear a lot more about that whole process of getting there, and my recent trip to TPAC, and more, I wrote a post on the MediaGoblin blog about it.

Last night my brother Stephen came over and he was talking about how he wished ActivityPub was more of a "transactional" system. I've been thinking about this myself. ActivityPub as it is designed is made for the social network of 2014 more or less: trying to reproduce what the silos do, which is mutate a big database for specific objects, but reproduce that in a distributed way. Well, mutating distributed systems is a bit risky. Can we do better, without throwing out the majority of the system? I think it's possible, with a couple of tweaks.

  • The summary is to move to objects and pointers to objects. There's no mutation, only "changing" pointers (and even this is done via appending to a log, mostly).

    If you're familiar with git, you could think of the objects as, well, objects, and the pointers as branches.

    Except... the log isn't in the objects pointing at their previous revisions really, the logging is on the pointers:

    [pointer id] => [note content id]
  • There's (activitystreams) objects (which may be content addressed, to be more robust), and then "pointers" to those, via signed pointer-logs.

  • The only mutation in the system is in the "pointers", which are signed logs (replace "logs" with "ledger" and I guess that makes it a "blockchain", loosely). These are append-only structures that say where the new content is. If something changes a lot, it can have "checkpoints", so you can eventually ignore old stuff.

  • Updating content means making a new object, and updating the pointer-log to point to it.

  • This of course leads to a problem: what identifier should objects use to point at each other? The "content" id, or the "pointer-log" id? One route is that when one object links to another object, it could link to both the pointer-log id and the object id, but that hardly seems desirable...

  • Maybe the best route is to have all content ids point back at their official log id... this isn't as crazy as it sounds! Have a three step process for creating a brand new object:

    • Open a new pointer-log, which is empty, and get the identifier
    • Create the new object with all its content, and also add a link back to the pointer-log in the content's body
    • Add the new object as the first item in the pointer-log
  • At this point, I think we can get rid of all side effects in ActivityPub! The only mutation thing is append-only to that pointer-log. As for everything else:

    • Create just means "This is the first time you've seen this object." And in fact, we could probably drop Create in a system like this, because we don't need it.
    • Update is really just informing that there's a new entry on the pointer-log.
    • Delete... well, you can delete your own copy. You're mostly informing other servers to delete their copy, but they have a choice if they really will... though that's always been true! You now can also switch to the nice property that removing old content is now really garbage collection :)
  • Addressing and distribution still happen in the same ways they did before, I assume? So, you still won't get access to an object unless you have permissions? Though that gets more confusing if you use the (optional) content addressed storage here.

  • You now get a whole lot of things for free:

    • You have a built in history log of everything
    • Even if someone else's node goes down, you can keep a copy of all their content, and keep around the signatures to show that yeah, that really was the content they put there!
    • You could theoretically distribute storage pretty nicely
    • Updates/deletes are less dangerous
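
To make the shape of this a bit more concrete, here is a minimal sketch in Python of the three-step creation process, updates, and delete-as-garbage-collection. Everything in it is an assumption made for illustration: the names `Store`, `content_id`, and `pointerLog`, the example URL, and the choice of SHA-256 over sorted JSON as the content address are all hypothetical, and signing of the log entries is left out entirely. It is not part of ActivityPub or any existing library.

```python
import hashlib
import json


def content_id(obj):
    """Content address (assumed scheme): SHA-256 of the sorted-key JSON encoding."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


class Store:
    """Immutable objects plus append-only pointer-logs (signatures omitted)."""

    def __init__(self):
        self.objects = {}  # content id -> object, never mutated once stored
        self.logs = {}     # pointer-log id -> list of content ids, append-only

    def open_log(self, log_id):
        # Step 1: open a new, empty pointer-log and get its identifier.
        if log_id in self.logs:
            raise ValueError("log already exists: " + log_id)
        self.logs[log_id] = []
        return log_id

    def publish(self, log_id, body):
        # Step 2: build the object, including a link back to its pointer-log.
        obj = dict(body, pointerLog=log_id)
        cid = content_id(obj)
        self.objects[cid] = obj
        # Step 3 (and every later update): append the content id to the log.
        self.logs[log_id].append(cid)
        return cid

    def current(self, log_id):
        # The pointer's "value" is whatever the newest log entry names.
        return self.objects[self.logs[log_id][-1]]

    def collect_garbage(self, log_id, keep=1):
        # "Delete" as garbage collection: drop local copies of objects that
        # only older entries refer to. The log keeps its full list of ids.
        live = set(self.logs[log_id][-keep:])
        for cid in set(self.logs[log_id]) - live:
            self.objects.pop(cid, None)


store = Store()
log = store.open_log("https://example.org/logs/note-1")
first = store.publish(log, {"type": "Note", "content": "Hello"})
second = store.publish(log, {"type": "Note", "content": "Hello, world"})
print(store.current(log)["content"])
```

Note that an "update" here never touches the first object: it stores a second object and appends its id to the log, so the history stays in the log even after the old copy is garbage collected.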

(Thanks to Steve for encouraging me to think this through more clearly, and lending your own thoughts, a lot of which is represented here! Thanks also to Manu Sporny who was the first to get me thinking along these lines with some comments at TPAC. Though, any mistakes in the design are mine...)

Of course, you can hit even more distributed-system-nerd points by tossing in the possibility of encrypting everything in the system, but let's leave that as an exercise for the reader. (It's not too much extra work if you already have public keys on profiles.)

Anyway, is this likely to happen? Well, time is running out in the group, so I'm unlikely to push for it in this iteration. But the good news, as I said, is that I think it can be built on top without too much extra work... The systems might even be straight-up compatible, and eventually the old mutation-heavy-system could be considered the "crufty" way of doing things.

Architectural astronaut'ing? Maybe! Fun to think about! Hopefully fun to explore. Gotta get the 2014-made-distributed version of the social web out first though. :)

Activipy v0.1 released!

By Christopher Allan Webber on Tue 03 November 2015

Hello all! I'm excited to announce v0.1 of Activipy. This is a new library targeting ActivityStreams 2.0.

If you're interested in building and expressing the information of a web application which contains social networking features, Activipy may be a great place to start.

Some things I think are interesting about Activipy:
  • It wraps ActivityStreams documents in pythonic style objects
  • Has a nice and extensible method dispatch system that even works well with ActivityStreams/json-ld's composite types.
  • It has an "Environment" feature: different applications might need to represent different vocabularies or extensions, and also might need to hook up entirely different sets of objects.
  • It hits a good middle ground in keeping things simple, until you need complexity. Everything's "just json", until you need to get into extension-land, in which case json-ld features are introduced. (Under the hood, that's always been there, but users don't necessarily need to understand json-ld to work with it.)
  • Good docs! I think! Or I worked really hard on them, at least!

As you may have guessed, this has a lot to do with our work on federation and the Social Working Group. I intend to build some interesting things on top of this myself.

In the meanwhile, I spent a lot of time on the docs, so I hope you find reading them to be enjoyable, and maybe you can build something neat with it? If you do, I'd love to hear about it!

A conversation with Sussman on AI and asynchronous programming

By Christopher Allan Webber on Wed 14 October 2015


A couple weeks ago I made it to the FSF's 30th anniversary party. It was a blast in many ways, and a good generator of many fond memories, but I won't go into depth of them here. One particularly exciting thing that happened for me though was I got to meet Gerald Sussman (of SICP!) The conversation has greatly impacted me, and I've been spinning it over and over again in my mind over the last few weeks... so I wanted to capture as much of it here while I still could. There are things Sussman said that I think are significant... especially in the ways he thinks contemporary AI is going about things wrong, and a better path forward. So here's an attempt to write it all down... forgive me that I didn't have a real tape recorder, so I've written some of this in a conversational style as I remember it, but of course these are not the precise things said. Anyway!

I wasn't sure initially if the person I was looking at was Gerald Sussman or not, but then I noticed that he was wearing the same "Nerd Pride" labeled pocket protector I had seen him wear in a lecture I had watched recently. When I first introduced myself, I said, are you Sussman? (His first reaction was to look astonished and say something like, "Am I known?") I explained that I've been reading the Structure and Interpretation of Computer Programs and that I'm a big fan of his work. He grinned and said, "Good, I hope you're enjoying it... and the jokes! There's a lot of jokes in there. Are you reading the footnotes? I spent a lot of time on those footnotes!" (At this point my friend David Thompson joined me, and they both chuckled about some joke about object oriented programmers in some footnote I either hadn't gotten to or simply hadn't gotten.)

He also started to talk enthusiastically about his other book, the Structure and Interpretation of Classical Mechanics, in which classical engineering problems and electrical circuits are simply modeled as computer programs. He expressed something similar to what he had said in the aforementioned talk, that conventional mathematical notation is unhelpful, and that we ought to be able to express things more clearly as programs. I agreed that I find conventional mathematical notation unhelpful; when I try to read papers there are concepts I easily understand as code but whose symbolic math I can't parse. "There's too much operator overloading", I said, "and that makes it hard for me to understand in which way a symbol is being used, and papers never seem to clarify." Sussman replied, "And it's not only the operator overloading! What's that 'x' doing there! That's why we put 'let' in Scheme!" Do you still get to write much code or Scheme these days, I asked? "Yes, I write tens of thousands of lines of Scheme per year!" he replied.

I mentioned that I work on distributed systems and federation, and that I had seen that he was working on something that was called the propagator model, which I understood was some way of going about asynchronous programming, and maybe was an alternative to the actor model? "Yes, you should read the paper!" Sussman replied. "Did you read the paper? It's fun! Or it should be. If you're not having fun reading it, then we wrote it wrong!" (Here is the paper, as well as the documentation/report on the software... see for yourself!) I explained that I was interested in code that can span multiple processes or machines, are there any restrictions on that in the propagator model? "No, why would there be? Your brain, it's just a bunch of hunks of independent grey stuff sending signals to each other."

At some point Sussman expressed how he thought AI was on the wrong track. He explained that he thought most AI directions were not interesting to him, because they were about building up a solid AI foundation, then the AI system runs as a sort of black box. "I'm not interested in that. I want software that's accountable." Accountable? "Yes, I want something that can express its symbolic reasoning. I want it to tell me why it did the thing it did, what it thought was going to happen, and then what happened instead." He then said something that took me a long time to process, and that at first I mistook for being very science-fiction'y, along the lines of, "If an AI driven car drives off the side of the road, I want to know why it did that. I could take the software developer to court, but I would much rather take the AI to court." (I know, that definitely sounds like out-there science fiction, bear with me... keeping that frame of mind is useful for the rest of this.)

"Oh! This is very interesting to me, I've been talking with some friends about how AI systems and generative software may play in with software freedom, and if our traditional methods of considering free software still applied in that form," I said. I mentioned a friend of a friend who is working on software that is generated via genetic programming, and how he claims that eventually you won't be looking at code anymore, that it'll be generating this black box of stuff that's running all our computers.

Sussman seemed to disagree with that view of things. "Software freedom is a requirement for the system I'm talking about!" I liked hearing this, but didn't understand fully what he meant... was he talking about the foundations on top of which the AI software ran?

Anyway, this all sounded interesting, but it also sounded very abstract. Is there any way this could be made more concrete? So I asked him: if he had a student who was totally convinced by this argument and wanted to start working on this, where would he recommend that student start their research? "Read the propagators paper!" Sussman said.

OH! Prior to this moment, I thought we were having two separate conversations, one about asynchronous programming, and one about AI research. Suddenly it was clear... Sussman saw these as interlinked, and that's what the propagator system is all about!

One of the other people who were then standing in the circle said, "Wait a minute, I saw that lecture you gave recently, the one called 'We Don't Really Know How to Compute!', and you talked about the illusion of seeing the triangle when there wasn't the triangle" (watch the video) "and what do you mean, that you can get to that point, and it won't be a black box? How could it not be a black box?"

"How am I talking to you right now?" Sussman asked. Sussman seemed to be talking about the shared symbolic values being held in the conversation, and at this point I started to understand. "Sure, when you're running the program, that whole thing is a black box. So is your brain. But you can explain to me the reasoning of why you did something. At that point, being able to inspect the symbolic reasoning of the system is all you have." And, Sussman explained, the propagator model carries its symbolic reasoning along with it.

A more contrived relation to this in real life that I've been thinking about: if a child knocks over a vase, you might be angry at them, and they might have done the wrong thing. But why did they do it? If a child can explain to you that they knew you were afraid of insects, and swung at a fly going by, that can help you debug that social circumstance so you and the child can work together towards better behavior in the future.

So now, hearing the above, you might start to wonder if everything Sussman is talking about means needing a big complicated suite of natural language processing tools, etc. Well, after this conversation, I got very interested in the propagator model, and to me at least, it's starting to make a lot of sense... or at least seems to. Cells' values are propagated from the results of other cells, but they also carry the metadata of how they achieved that result.

I recommend that you read the materials yourself if this is starting to catch your interest. (A repeat: here is the paper, as well as the documentation/report on the software... see for yourself!). But I will highlight one part that may help drive the above points more clearly.

The best way to catch up on this is to watch the video of Sussman talking about this while keeping the slides handy. The whole thing is worth watching, but about halfway through he starts talking about propagators, and then he gets to an example where you're trying to measure the height of a building by a variety of factors, and you have these relationships set up where, as information is filled in by a cell's dependencies, that cell merges what it already knows with what it just learned. In that way, you might use multiple measurements to "narrow down" the information. Again, watch the video, but the key part that comes out of the demonstration is this:

(content fall-time)
=> #(contingent #(interval 3.0255 3.0322)
                (shadow super))

What's so special about this? Well, the fall-time has been updated to a narrower interval... but that last part (shadow and super) holds the symbols of the other cells which propagated the information of this updated state. Pretty cool! And no fancy natural language parsing involved.
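
To get a feel for the merge step, here is a rough Python sketch of a cell that narrows an interval and remembers which premises got it there. This is only my own toy illustration, not the real propagator software (which is written in Scheme): the names `Contingent`, `merge`, and `Cell` are made up for this example, the numbers are invented, and there is no propagation network here, just the "merge what I know with what I just learned" part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingent:
    """An interval estimate plus the names of the premises it depends on."""
    low: float
    high: float
    premises: frozenset


def merge(old, new):
    """Combine two estimates of the same quantity by intersecting intervals."""
    if old is None:
        return new
    low, high = max(old.low, new.low), min(old.high, new.high)
    if low > high:
        raise ValueError("contradiction between %s and %s"
                         % (sorted(old.premises), sorted(new.premises)))
    if (low, high) == (old.low, old.high):
        return old  # the new estimate added no information
    if (low, high) == (new.low, new.high):
        return new  # the new estimate is strictly better on its own
    # Both estimates contributed a bound, so the result depends on both.
    return Contingent(low, high, old.premises | new.premises)


class Cell:
    """Holds the best known content for one named quantity."""

    def __init__(self, name):
        self.name = name
        self.content = None

    def add_content(self, increment):
        self.content = merge(self.content, increment)


fall_time = Cell("fall-time")
# Invented numbers: one estimate derived from a "shadow" measurement,
# another from a "super" measurement.
fall_time.add_content(Contingent(3.0, 3.2, frozenset({"shadow"})))
fall_time.add_content(Contingent(3.1, 3.3, frozenset({"super"})))
print(fall_time.content)
```

After both additions the cell holds the interval from 3.1 to 3.2 and lists both `shadow` and `super` as premises, which is the same kind of "here is what I believe and here is why" record as the Scheme output above.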

There's certainly more to be extrapolated from that, and more to explore (the Truth Maintenance Systems are particularly something interesting to me). But here were some interesting takeaways from that conversation, things I've been thinking over since:

  • AI should be "accountable", in the sense that it should be able to express its symbolic reasoning, and be held up to whether or not its assumptions held up to that.
  • Look more into the propagators model... it's like asynchronous programming meets functional programming meets neural nets meets a bunch of other interesting AI ideas that have been, like so many things, dropped on the floor for the last few decades from the AI winter, and which people are only now realizing they should be looking at again.
  • On that note, there's so much computing history to look at, yet so many people are focused on looking at what the "new hotness" is in web development or deployment or whatever. But sometimes looking backwards can help us better look forwards. There are ideas in SICP that people are acting as if they just discovered today. (That said, the early expressions of these ideas are not always the best, and so the past should be a source of inspiration, but we should be careful not to get stuck there.)
  • Traditional mathematical notation and electrical engineering diagrams might not convey clearly their meaning, and maybe we can do better. SICM seems to be an entire exploration of this idea.
  • Free software advocates have long known that if you can't inspect a system, you're held prisoner by it. Yet this applies not just to the layers that programmers currently code on, but also into new and more abstract frontiers. A black box that you can't ask to explain itself is a dangerous and probably poorly operating device or system.
  • And for a fluffier conclusion: "If you didn't have fun, we were doing it wrong." There's fun to be had in all these things, and don't lose sight of that.