Posts with tag "xudd"

Why XUDD is stuck (or: why Python needs better immutable structures)

By Christopher Lemmer Webber on Mon 23 February 2015

Update: Well, you post a thing, and sometimes that's enough for people to come and help you realize how wrong you are. Which is good! There are a number of ways forward (some obvious in retrospect). For one, pyrsistent does exist and looks nice and... well, it's even actively developed. But even aside from that, there are several clean solutions: wrapper objects which "lock" the child object with getters but no setters, or even just using alist-style tuples of tuples for a fake hashmap. Options do indeed abound.

And the exception thing? Well, that wasn't listed as a permanent problem below, but the solution is even easier! It would be simple to have a MessageError("some_identifier") which has a minimalist identifier which can be passed across the wire, and the directive of this error can be a special case.
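As a rough illustration of that idea, here is a minimal sketch. It assumes the error travels as a small JSON payload; the "directive" and "identifier" field names and the to_wire/from_wire helpers are hypothetical, not XUDD's actual API.

```python
import json


class MessageError(Exception):
    """Error that carries only a short string identifier (hypothetical sketch)."""

    def __init__(self, identifier):
        super().__init__(identifier)
        self.identifier = identifier

    def to_wire(self):
        # Only the identifier crosses the wire, so the receiving side never
        # has to import or reconstruct an arbitrary exception class.
        # The field names here are assumptions made for this example.
        return json.dumps({"directive": "error", "identifier": self.identifier})

    @classmethod
    def from_wire(cls, payload):
        data = json.loads(payload)
        if data.get("directive") != "error":
            raise ValueError("payload is not an error message")
        return cls(data["identifier"])


if __name__ == "__main__":
    wire = MessageError("item_not_found").to_wire()
    print(MessageError.from_wire(wire).identifier)
```

The receiving actor can then branch on the identifier string instead of on an exception type that may not exist on its side.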

Anyway, you can read the original post below. But it's good to be wrong. XUDD no longer has reason to be dead. Long live XUDD!

tl;dr: Kind of a long post, but basically the lack of good functional data structures in Python has kinda killed the project.

One of my favorite projects ever has been to work on XUDD, an asynchronous actor model system for Python. It was originally born out of a quest to build a MUD (hence the name), but eventually the actor model itself became the focus, and it was a really interesting exploration for me.

There are a lot of things I like about the actor model... for one thing, functional programming is all the rage, right? But not all systems are easy to express in a purely functional style. An actor model can be fairly object oriented'y, but done right, you can have your mutable cake and eat it too, safely! Your actor can mutate some of its own variables, but when it communicates across the wire, since it's a "shared nothing" environment, you can get even better scale-to-the-moon type functionality than in many functional language systems: it's trivial to write code where actors communicate across multiple processes or machines in just the same way as if they were all on the same machine in the same process and thread.

I put a lot of thought into XUDD, and I've looked into some of the alternatives like Pykka, and I still think XUDD has some ideas that kick butt on other systems. I still think the use of coroutines feels very clean and easy to read, the "hive" model is pretty nice, and the way it's built on top of the awesome asyncio system for Python are all things I'm happy with.

So, every once in a while I get an email from someone who reads the XUDD documentation and also gets excited and asks me what's going on.

The sad reality is: I'm stuck. I'm stuck on two fronts, and one I can figure my way out of, but the other one doesn't seem easy to deal with in Python as-is.

The first issue is error propagation. This is solvable, but when an exception is raised, it would be nice to propagate this back to the original actor. There are some side issues I'm not sure about: in an inter-hive-communication (read: multiple machines or processes) type scenario, should we use standard exceptions and try to import and reproduce the same exception that was raised elsewhere? That seems like it could be... gnarly to do. Raising the error inside the original routine is also a bit tricky, but not too hard; Python's coroutines can support it and I just need to think about it. So exceptions are annoying but solvable.
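To show what "raising the error inside the original routine" can look like, here is a small sketch using a plain generator-based coroutine and its throw() method. The actor_routine and run names are made up for this example and are not part of XUDD.

```python
def actor_routine():
    """Hypothetical generator-based handler that waits for a reply."""
    try:
        reply = yield "request"  # suspend until a reply (or error) is delivered
        return "got " + reply
    except RuntimeError as exc:
        # The error surfaces right at the yield, inside the original routine.
        return "handled: " + str(exc)


def run(deliver_error):
    """Drive the routine by hand, standing in for the hive's scheduler."""
    coro = actor_routine()
    next(coro)  # advance to the first yield
    try:
        if deliver_error:
            coro.throw(RuntimeError("remote failure"))
        else:
            coro.send("ok")
    except StopIteration as stop:
        return stop.value  # the generator's return value


if __name__ == "__main__":
    print(run(False))
    print(run(True))
```

The scheduler only has to decide whether to call send() or throw() when the reply arrives; the routine itself handles the error with an ordinary try/except.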

But the other issue... I'm not sure what to do about it. Basically, it's an issue of a lack of immutable types, or we might even say "purely functional datastructures" that are robust enough to continue with. Why does that matter? Messages sent between actors shouldn't have any mutable data. It's fine and well and even a nice feature for actors to be able to have mutable data within themselves, and it actually even provides a nice way to pull off things that are just damned hard in purely functional systems, but between actors, mutable data is a no-no.

It's easy to see why: say we have a function that has a list in it, say of a number of children in my classroom, and I send this list over to an actor that controls some sort of database, right? I'm doing things in a nice, fancy coroutine'ish type way, which means my function can just suspend mid-execution while it waits for that database actor to generate some sort of reply and send it back to me. What happens if that other function pops one of the items off the list, or appends to it, or in some way actually mutates the list? Now, when my function continues, it'll be operating on a differently formed list than the one it thinks it has. I might have a reference to the third item in the list, but it turns out that there isn't a third item in the list anymore, because the other function popped it off. This can introduce all sorts of subtle bugs, and it's bumming me out that I don't have a good solution to them.
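The scenario above can be reproduced in a few lines of asyncio. This is only a sketch: database_actor and send_roster are hypothetical stand-ins, not XUDD code, and the receiver mutates its message on purpose.

```python
import asyncio


async def database_actor(children):
    """Hypothetical receiver that (wrongly) mutates the message it was sent."""
    await asyncio.sleep(0)  # yield to the event loop, as a real actor would
    children.pop()          # mutates the sender's list in place
    return "stored"


async def send_roster(children, message):
    """Sender suspends while the receiver works, then looks at its list again."""
    before = len(children)
    try:
        reply = await database_actor(message)
    except AttributeError:
        # Tuples have no pop(), so the mutation attempt fails in the receiver.
        reply = "receiver tried to mutate an immutable message"
    return before, len(children), reply


async def main():
    roster = ["Ada", "Grace", "Linus"]
    shared = await send_roster(roster, roster)         # same list object sent
    roster = ["Ada", "Grace", "Linus"]
    frozen = await send_roster(roster, tuple(roster))  # immutable snapshot sent
    return shared, frozen


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the first call the sender's list shrinks from three items to two while it is suspended; in the second, the sender's list is untouched and the attempted mutation fails loudly in the receiver instead.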

Now, there's a way around this: you can serialize each and every message completely before sending it to another actor. And if actors are on different threads, processes, or even entirely different machines, of course we'd do this. But XUDD has the concept of actors being on the same hive, and there are a number of reasons for this, but one of them is that for local message passing, packing and unpacking data in some sort of serialized format for every call slows things down by a lot. When I originally began designing XUDD, the plan was for games that might need to shard out to a number of different servers but have players that can traverse different parts of the system and communicate with other shards (without the players, or for the most part the code, knowing that they're communicating with actors that are technically remote). I want to be able to pass many messages at once to actors that are on the same hive, while still having a totally safe time of doing so. But there's no way to do so without a nice set of immutable / "purely functional" types, and Python just doesn't have this right now. None of the third party libraries I've found seem well maintained (am I missing something?), and the standard library is fairly deficient here. Why? I'm not really sure. I guess Python's history is just synchronously imperative enough that it just hasn't mattered.
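For a sense of what a standard-library-only workaround might look like, here is a sketch that converts a message into nested tuples and frozensets before it is handed to a local actor, with dicts becoming alist-style tuples of (key, value) pairs. The freeze and alist_get helpers are hypothetical names for this example. Note that freezing still copies every container, so it avoids serialization to bytes but is not free, and a frozen dict is indistinguishable from a frozen list of pairs.

```python
def freeze(obj):
    """Return an immutable copy of a message built from basic Python types."""
    if isinstance(obj, dict):
        # alist style: a tuple of (key, value) pairs, in insertion order
        return tuple((key, freeze(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, (str, bytes, int, float, bool, type(None))):
        return obj  # already immutable
    raise TypeError("cannot freeze object of type " + type(obj).__name__)


def alist_get(alist, key, default=None):
    """Look up a key in a tuple of (key, value) pairs; linear time."""
    for candidate, value in alist:
        if candidate == key:
            return value
    return default


if __name__ == "__main__":
    message = freeze({"directive": "store", "children": ["Ada", "Grace"]})
    print(message)
    print(alist_get(message, "children"))
```

Lookups in an alist are linear rather than constant time, which is usually fine for small messages; a persistent-data-structure library would be the more robust choice for larger ones.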

I'd like to continue research into the actor model... I have some projects I'd like to work on where the actor model seems perfectly tuned to those tasks. What to do?

Well, I'm not really sure... I guess I could just serialize everything all the time, but it's kind of a bummer to me that so many cycles would be wasted for local computation. Maybe it's a dumb reason to feel exhausted with things, but that's the state of it. I'm not enough of a datastructure wizard to implement these things myself, but they exist. I've thought about giving up on XUDD being a Python project and moving over to something else... Guile has a cooperative REPL which would be great for debugging, and I really like the community there, so maybe that would be a nice place to go. Not really sure there's anything else I'm interested enough in at the moment. I think I'd miss Python. Or maybe I'm over-thinking everything in the first place? (Wouldn't be the first time.)

Maybe there's another way out. If you have any ideas, contact me.

Life update: Late November 2013

By Christopher Lemmer Webber on Tue 26 November 2013

I thought I'd give a brief "life update" post. In some ways, this is a more me-centric version of a "state of the goblin" post. Life is pretty intertwined with that these days.

I gave my block o' conferencing reflections already, so we'll consider that out of the way. We're also about to put out a new release of MediaGoblin. Stay tuned to the MediaGoblin blog... it'll be an exciting one I think.

What can I say about this last year though? We're nearly at the end of it. For this last year, I ate, breathed and lived MediaGoblin. This has been simultaneously the greatest thing ever, and also super exhausting. I really have not had much in terms of breaks; role-wise, I have worn more hats than I thought I could fit on my head (among other things, this includes writing core architecture, code review, promoting and speaking about the project, plenty of behind the scenes communication, plenty of management and project administration, budgeting things, the project's "art identity", some system administration (though thankfully simonft is helping), grant writing, all the many roles that went into running the crowdfunding campaign and producing the associated video). I'm glad I was an Interdisciplinary Humanities major; it couldn't have been a more interdisciplinary year. I'm also glad I use Org-Mode; it will sound silly, but MediaGoblin could not exist without that program.

And as tiring as it may have been, I am hoping I can continue with it. The MediaGoblin community is... dare I say while admitting tons of bias... one of the best communities I have seen in free software. (Maybe even the best? Again, I am admitting bias! ;))

But Joar Wandborg summarized the situation well:

The challenge at the moment, at least from what I see, is time. MediaGoblin would greatly benefit from more resources, having either one or more funded MediaGoblin developers would greatly benefit the project, as it is now, we have a lot of separate volunteers contributing code, thus putting a lot of work on the lead developer to review code. If we could increase the throughput on reviewing by assigning more people to review it would make the lead developer able to concentrate on increasingly keeping the project coherent and flexible while moving forward.

Well said. :)

On that note, I am simultaneously working on trying to get more resources on board and growing MediaGoblin upward and outward. This is achievable, I believe, and if we can get enough resources in front of ourselves, I think MediaGoblin can easily be sustainable. But to get there, we need to split my role into multiple people. That's hard to do because splitting my role into multiple people requires more resources, but it's hard to do the work to get more resources in while I am the only full time person, even with the amazing, amazing community we have (which is, again, super amazing!). This is solvable, but as a friend of mine accurately described it over dinner, it's a "bootstrapping problem". In the meanwhile, I am also playing a role of trying to bootstrap things just so, but that means actively wearing another hat, one that the MediaGoblin community does not usually see. It's hard not to feel bad while I'm doing that kind of work, because I feel like I am neglecting other things I want to move forward. But it needs to be done. And I think we can and will get there.

On that note, we will be running another crowdfunding campaign. I won't go into details here, but I have elsewhere, and if you're interested, you can read a relevant IRC log. There will be more to say soon, and of course you will hear about it here.

Another way to summarize things: next year I want to wrap up the features we need to get MediaGoblin 1.0 out the door (and that includes federation work) and then work on pushing forward MediaGoblin adoption. Plans are moving ahead on those fronts, and I am feeling optimistic. (One way to advance those plans is, if you or an organization you are working with are interested in running an instance, do it! And even better, if you are interested in funding either us developing relevant features or helping you run an instance, by all means contact me!)

By the way, have I mentioned XUDD? I don't get that much time to talk about it, but the very rare times I get to work on code that isn't MediaGoblin (sadly, it's pretty rare) I have been spending on XUDD. In short, I think the way we're writing a lot of asynchronous network applications is wrong, and I think we can massively improve the situation. XUDD is an attempt to show how I think that could happen through an implementation of the actor model in Python. The architecture is shaping up nicely, and I feel good about the ideas and directions of the project. It's too bad it's so hard to allocate time for it. As you may have guessed, this may tie back into MediaGoblin some day, but if it does it will be some time in the future.

Anyway, that's enough of me yammering on for now. I think we've got an exciting year ahead. Now, back to working on this release!