XUDD: An Async / Actor Model System for Python

 FROM THE DARKNESS, OLD GODS AROSE TO
   BRING NEW ORDER TO THE WORLD.
   BEHOLD, ALL SHALL SUBMIT TO...

  .-   .-   -. ..  .. .---.  .---.     -.
  \ \  \ \ / // |  | \| .- \ | .- \   / /
/\ \ \  \ ' / | |  | || | \ || | \ | / / /\
\/ /_/  / . \ \ \__/ /| |_/ .| |_/ . \_\ \/
       \_/ \_/ '____' '.___/ '.___/
          _                __              
        /  .        ___   /  \
       / /o \    .-'   './ /o\_
      /.- \o \.-/ oooo--o\/ /  \
           \O/  \  .-----.\/ /o\\
           /' /\ \  ^^^^^   / \o\\
          / ./\o          _/   \o\\
         / /   '---______/      \_\'
         '                        '

"I have seen the future… and the future is XUDD!"
  – Acolyte of the Cult of XUDD

"The greatest threat to our children since Dungeons and Dragons."
  – Somebody's Relative

"It's an asyncronous actor model system for Python… I don't
understand what this has to do with chaotic deities, or why it's
called XUDD."
  – Someone reading this document

1 Braindump

1.1 Rationale

So we started XUDD a while ago and it didn't go very far. Originally, XUDD stood for "eXtensible User Dungeon Design" because it was designed to be an actor model for a Multi User Dungeon framework. That game system didn't really go anywhere.

But I think the core ideas were pretty good. More specifically, I'm very interested in the actor model that's there.

Everyone's interested in async. I still hate the node.js and event-driven-crazy models of twisted and node.js. They're quite simply hard for me to follow. I keep thinking that if we had the system we were designing with XUDD, it could change the way people do async programming in python: it's a system that allows for writing async code that feels synchronous, and if the design fulfills its full goals, it'll be as easy to write code that scales across multiple machines as it does run on a single machine.

The actor model has interested me for some time. We built a good proof of concept with a neat idea of a task queue that seemed to work. I'd like to revive that code.

Also, there's a set of new possibilities opened up by Hy. Several things involving yield and message passing and constructing good flow felt a bit awkward in Python, but with Hy, we could build macros to make some of that more friendly.

I also want hivemind to do no monkeypatching of core python things, like network stuff.

1.2 Use cases

Hivemind should be designed to be used for multiple things:

writing web applications that are a lot more distributed
- we could have websockets and wsgi running nicely together in the same system
- get rid of celery entirely
a pump.io type system
an mmorpg
distributed data crunching

1.3 tl;dr version of the design

1.4 Core design

1.4.1 Actors

All processing that happens is encapsulated in actors.
Actors only communicate with each other over messages (which includes the abstracted read-only property access system)
There is NO LOCKING of resources in this system. Any access of resources is done via message passing. For a competing resource, set one actor to take responsibility of maintainting it.
Actors are not constantly "alive"… they often exist but are in a state of suspension and are woken up as things need them on the hive.
Actors do not have direct reference to each other… instead they get an ActorProxy, which is mostly an id (possibly uuid) with some other abstractions wrapped around it (including network access, etc)
Only an actor's hive should keep direct reference to it.
Actors are normally in a state of semi-suspension (with maybe some exceptions that need to be kept alive constantly, such as ones watching a socket) and are woken up in the event that a message is passed to them (which can include them requesting of the hive or a timer actor that they be woken up periodically)

1.4.2 Hives (actor queues)

Hives are collections of actors. A hive is responsible for a variety of things:

Hives register and maintain actors
Hives manage execution availability. In the default "Hive" pattern (though actors could be spawned in a way different from this) there is a pool of threads available for actors to execute from. Whenever an actor needs to act (because it is initializing or because it is receiving and responding to a thread)

Each actor when spawned registers itself with a HiveProxy. Like actor proxies, an actor does not touch the Hive directly. But the HiveProxy has several methods for sending messages from one actor to another.

Notes from Elrond on letting Hives handle transport:

<Elrond> BTW: Letting the hive do much of the "transporty" things has the    
         advantage of:                                                 
<Elrond> - No need to convert to json, if everything is local.
<Elrond> - No need to dns lookup uuids, if to and from a relocal objects.
<Elrond> - No need to assign a complex message-id, if it's local. Just use 
           something that is hash()able.

1.4.3 Messaging between actors

Message basics
Messages, when sent across the wire, are simple. They're just json documents, with the following fields supported:
- from: the id of the actor that sent this message
- to: a list of identifier(s), probably another actor's id (we might have some special "dns-like-lookup" actors later for special things)
- message-id: this message
- in-reply-to: the message this is replying to
- body: a json document full of
(They should be json-format as in termss of what fields anyway. They may be serialized when sent across the wire as msgpack or something… that's not important for now.)

Maybe you're running a very simple text game which has two characters in a room… one fires a shot at another.
```
{"from": "fd309f44-fb28-441d-b228-c9f35c520a79",
 "to": ["3c108d16-2acb-4a54-9398-dc8747fb4928"],
 "message-id": "bbb1d4c5-8565-4e17-a5cf-c94bc228d2f3",
 "directive": "attack",
 "body": {
   "type": "gunshot",
   "damage": 3,
   "accuracy": 40}}
```
The recipient does some calculations regarding its own ability to dodg, determines that it is indeed hit, and lowers its hit points. It sends a message back to the other user informing that it was indeed damaged for 3 hit points, and is now at only 32 hit points.
```
{"from": "3c108d16-2acb-4a54-9398-dc8747fb4928",
 "to": ["fd309f44-fb28-441d-b228-c9f35c520a79"],
 "message-id": "82f85f6a-a0b7-4e28-8e73-1e83e9ba3984",
 "in-reply-to": "bbb1d4c5-8565-4e17-a5cf-c94bc228d2f3",
 "directive": "attack_result",
 "body": {
   "success": True,
   "damage": 3,
   "hit-points-left": 32}}
```
(We've ignore the mechanics of informing a room, etc… it might be a better pattern to pass the message through the room itself if your game works that way, that way the room can negotiate if the character is still there. But this is just a simple example!)

For convenience, and since messages always follow this pattern, messages are wrapped in a Message object that also contains ActorProxy objects for each of the ids in from/to/message-id, etc.

Receiving and handling a message

Actors operate whenever they receive a message. It's up to the actor to decide how to handle a message, but all actors must provide a receive_message() method.

The default actor comes with a basic receive_message() method, but you can override it.

By default, you define on an actor a set of message_handlers. Messages are routed depending on the directive.

Let's give an example with a "vote counting" robot. This robot is super simple: it takes ballots and counts them until it's told to not accept ballots any more.

class VotingMachine(Actor):
    def __init__(self):
        self.message_routing = {
            "enter_ballot": self.enter_ballot,
            "election_over": self.election_over}

        self.tally_sheet = {}
        self.election_running = True

    def enter_ballot(self, message):
        """
        Receive a ballot and increase the votes for the candidate by one.
        """
        if self.election_running:
            candidate = message.body['candidate']
            self.tally_sheet['candidate'] = self.tally_sheet.get(
                message.body['candidate'], 0) + 1

    def election_over(self, message):
        """
        End the election!
        """
        self.election_running = False

Sending messages

There are two ways to submit messages to an actor's hive.

Fire and forget

Basically, for a message that you don't need to "return" to in the same function, it should be possible to fire off a message to another actor and just continue on your way. For example, maybe we want to tell the voter from the above example whether or not their vote was accepted or rejected.

Notice that we're adding a new self.hive.send_message call:

class VotingMachine(Actor):
    def __init__(self):
        self.message_routing = {
            "enter_ballot": self.enter_ballot,
            "election_over": self.election_over}

        self.tally_sheet = {}
        self.election_running = True

    def enter_ballot(self, message):
        """
        Receive a ballot and increase the votes for the candidate by one.
        """
        if self.election_running:
            candidate = message.body['candidate']
            self.tally_sheet['candidate'] = self.tally_sheet.get(
                message.body['candidate'], 0) + 1

            # Inform the user that their ballot was accepted
            self.hive.send_message(
                to=message.from,
                directive="inform_vote_result",
                in_reply_to=message.id,
                body={
                    result="accepted"})
        else:
            # Inform the user that their ballot was rejected
            self.hive.send_message(
                to=message.from,
                directive="inform_vote_result",
                in_reply_to=message.id,
                body={
                    result="rejected",
                    reason="The election is over."})

    def election_over(self, message):
        """
        End the election!
        """
        self.election_running = False

With the above change, we send a message to the voter about whether or not their vote was accepted. But that's all we really care to do… once our votingmachine has processed the ballot and replied to the voter, they consider their job done.

Another example of this might be something like a web application that needs to send off some task… say it got a video and it needs it transcoded. The web application doesn't need to sit around and wait for the video to finish transcoding before it sends a response to the user who submitted the video, so it fires off a task Celery-style to some worker actor, who will handle the rest of the transcoding.

Yielding and waiting
What happens if you send a message and you need to wait back on a reply? You could handle state on your actor and break things apart into a state machine with different functions depending on what part of the state of code you're writing. But shattering your functionality into a bunch of tiny functions can result in code that is really hard to follow.

Enter python's "yield" statement. Yield does something fairly cool… it suspends a function mid-execution until it is woken up again (potentially with some result). This means we can fire a message off and our code will halt, but will wake up again and will continue executing once it has a response.

(Note, this is a built-in functionality with the default Actor, but not all actors must implement things this way.)

Say our voter is going to submit their ballot. They want to make sure they get their vote in… if they do, they'll stop feeling so anxious about voting and feel content. But if their vote isn't accepted for whatever reason (they've been hearing about problems with these machines…) they'll complain to the government and become indignant.

Let's see what our Voter looks like then.
```
class Voter(Actor):
    def __init__(self):
        self.message_routing = {
            "go_vote": self.go_vote}

        # Our voter is anxious by default!
        self.mood = "anxious"

    def go_vote(self, message):
        """
        Our voter is reminded to go vote after polls open.
        """
        polling_place = message.body["voting_machine_id"]

        voting_result = yield self.wait_on_message(
            to=polling_place,
            directive="enter_ballot",
            body={
                "candidate": "Wesley Prelvis"})

        if voting_result["result"] == "accepted":
            self.mood = "content"

        else:
            self.hive.send_message(
                to=message.body["municipal_id"],
                directive="formal_complaint",
                body={
                    "type": "voting problems",
                    "text": (
                        "Your stupid voting machine gave me this error: "
                        "'%s'") % voting_result["reason"]})

            self.mood = "indignant"
```
Our function pauses mid-execution at tye "yield" and waits for a response. Despite this all being one method, there's no blocking while waiting for that response, yet this is all written as a single, coherent method!

What happens in the background is that this actor has a registry of messages that are "waiting" on a response. It tacks on a message id and waits for a reply. The next time the handle_message() method sees an incoming message with this id, it wakes up this method (suspended at that yield) and passes the incoming message to it.

TODO:
- document how to set timeouts!
- better error handling? Or maybe not in this example to keep it simple ;)

1.4.4 Initial boot-up

Hive should have an init, just as a wsgi has an init.

Hives are subclassed for this, and specify their own boot-up methods. Different types of applications may boot up in different ways.

We might want to provide some common boot-ups, as well as some "chained component" bootups (eg, both a wsgi app and a celery-like task handler and a websocket thing may all boot up together)

1.4.5 Spawning an actor

Spawning at init
Spawning off another actor

1.4.6 Inter-hive communication

For lack of a better term, a set of hives that are connected together are called a "HiveMesh".

Inter-hive representatives
Actor lookup

1.4.7 Read-only actor properties

All lookup of properties is abstracted into a macro. This macro checks to see if the actor is within the hive… if it is, it can actually just do a simple read of the property. Otherwise, it's fetched and returned as a message.

If you want to implement this yourself, you can (by doing it in plain python), but it's easier to just use the macro (or you can always request it via message passing).

1.4.8 Subscribing to an actor's "events"

1.4.9 Serializing actors

All actors come with a .serialize() method and a .serializable property that's set on the class level. .serialize() is only available

1.4.10 Load balancing

Actors which can be fully serialized to a state can also be restored on other servers. Some hives may handle load balancing, or may negotiate with a load balancer from another server.

1.4.11 "Safe passage" of local data

1.4.12 Supporting old applications

1.4.13 "Virtual" hives

1.4.14 Actor shutdown

1.4.15 Other ideas

Functional registry actors

Faster-ids

import datetime

def fast_message_id(origin_actor_id):
    dt = datetime.datetime.now()
    return "%s%s%s%s%s%s%s%s" % (
        "existing-actor-id", dt.year, dt.month,
        dt.day, dt.hour, dt.minute, dt.second, dt.microsecond)

Even better idea:

<Elrond> paroneayea - You could also create one uuid-per-actor at "boot" and  
         then append a counter. That should be pretty fast.                   
<paroneayea> Elrond: oh yeah                                                  
<paroneayea> that's a good idea                                               
<paroneayea> so a counter per actor?                                          
<Elrond> Right.

<Elrond> paroneayea - Saw it.                                          
<Elrond> paroneayea - Still could let the hive do the sending.
<paroneayea> the hive will still do the sending
<paroneayea> but the actor itself has to register the id               
<paroneayea> so it knows what suspended coroutine to wake back up
<Elrond> def self.wait_on_message():                                   
<Elrond>   message_id = self.hive.send_message(...)
<Elrond>   self.wait_for_message_with_id(message_id)
<Elrond> Yeah.
<Elrond> So that's why hive.send_message could return the message_id.
<Elrond> So why not let hive.send_message return a message_id?
<paroneayea> oh                                                        
<Elrond> And let actor.wait_on_message take note of it?
<paroneayea> yeah I suppose it could!
<paroneayea> good point
<Elrond> I am not saying, that this is best way since sliced bread though!   
         Just pointing out another option.

<Elrond> *gna* That one is very tricky:
         http://29a.ch/2009/2/20/atomic-get-and-increment-in-python
<Elrond> paroneayea - You might want to look at this ^^... If you want to
         implement the message-id_by_counter.

1.5 Problems

1.5.1 Error handling

1.5.2 Debugging

1.5.3 Clean shutdowns

1.5.4 How to "lock" actors? An actor should only do one task at a time.

If an actor can't keep up with the amount of messages it is receiving, it should probably get some "assistants"

1.6 How far does this go?

2 Examples

2.1 The robot sentry

2.2 MediaGoblin submits to the hivemind

2.3

3 Tasks

3.1 TODO Inter-hive communication round 1

3.1.1 Braindump

What do we need?

Each actor should have a actor@hive naming?
Hive recognition/registry of other hives
Forwarding mechanics
Test out zeromq
Set up zeromq thing

Okay, okay, that's not clear enough. massive braindump of everything I thought of on that walk.

Actors declaring themselves responsible for IHC
- They should do this by sending a message to the hive declaring themselves responsible for routing messages to actors on a hive with this ID
- This means the hive does have to be an actor!
- Excitingly, for certain kind of hives (hives that run in their own process via multiprocess for example), this means that we can have an actor like a "MultiprocessActor" that:
  - spawns the hive in a process
  - declares itself responsible for that hive(!)
  - handles the shutdown of that hive
Demos to try:
- Simplest demo: just an actor waiting for input
  - No back and forth, all it does is keep polling on the IPC socket
  - Should use ZeroMQ
- lotsamessages.py split into multiple subprocess hives (test IHC)
  - Would be cool to specify how many!
  - In this case, this also means that it should be possible to request actors to be spawned over IHC!
  - Would also be neat if they were all sent shutdown messages via the departmentchair
  - This also requires Hive-is-itself-an-actor
- IHC where two hives wait for each other to come online, then actually start talking to each other
  - Uses ZeroMQ
  - This could be made maybe a little bit easier with actor event subscriptions, but I'm not sure.
- "Dedicated actor" demo with wsgi server?
- "Dedicated actor" demo with XMPP server?
About hives-as-actors
- I don't see any way around it
- The only demo it's not a prerequisite for is the "simplest demo"
About dedicated actors
- Examples of dedicated actors:
  - PyGame main loops
  - gobject event loops
  - SleekXMPP bots
  - Certain kind of networked IO things that run in their own thread
  - Your standard hive will become a dedicated actor (yes! read more about this later)
- Fun fact! These are actually a separate "problem domain" from Inter-Hive-Communication. They could make some Inter-Hive-Communication nicer, but they are no means needed for it.
- Dedicated actors run in their own thread and have their own while loop. (I haven't thought of a clear API for spawning them though…)
- Once per loop, they should check their messages
- They get a special HiveProxy that actually dumps to a Queue() object on the Hive. The hive only ever looks at this queue on its own while loop if it has dedicated actors!
- The extra cool thing: this means that when the hive itself becomes an actor, the hive itself is just a special kind of dedicated actor!
Hive cleanup should happen with considerations of all the above.
Documentation should be done before things get too crazy, but on the other hand, I'm pretty excited, and if I wait too long maybe I'll lose sight of some of this?
What about "actor event subscriptions"? This is a really nice feature and it may make the "two hives talking" demo easier because some of the other actors in the systems might actually want to wait on that.
On the other hand, it's by no means necessary. There can be a "Overseer" actor that simply yields on the hives establishing communication before it agrees to start the other "talk to each other" actors up.

That seems really clean.
actor@hive naming
- This isn't too complicated, though it does involve some refactoring
- A message sent to a hive is like "hive@hive-id-here".
- A message that's sent without the "@hive-id-here" is automatically assumed to be for a recipient on the same hive as this actor
What order should we try to accomplish these things in? What are the big milestones here?
- Should do the proof-of-concept listening demo first just to feel comfortable with the polling and receiving bit
- This also means that we need the ability for actors to be able to "reschedule" themselves easily. That's already more or less possible, but it would be nice if they could actually yield to themselves on a while loop!
- Clean up the hive.
- Turn the hive into an actor.
- Enable actor@hive naming.
- Write some docs
- Write some tests
- Write first inter-hive communication demo
- Generalize "dedicated actors"
- Add XMPP or WSGI dedicated actors demo

… is that order sane?

Misc things:

We should also put a TODO for yielding on messages with multiple recipients, but given all the above, that seems like a non-priority at the moment.
At some point it also seems like we should have some performance tests… I'm not sure how to do that though.
I'm really looking forward also to have a "Clock" actor that actors can schedule themselves against

3.1.2 DONE Test out zeromq

So it seems we want to set up two separate sockets for each…

3.2 TODO Dedicated actors

3.2.1 TODO embed XMPP actor

3.3 TODO Docs for XUDD core

3.3.1 DONE Add docstrings->docs support

3.3.2 TODO intro.rst

3.3.3 TODO quickstart.rst

3.3.4 TODO core.rst

3.4 TODO yielded-call macro

<paroneayea> paultag: thought of a good use of hy macros for XUDD
<paultag> yeah?
<paroneayea> paultag: so say you've got this guy
<paroneayea> class MyLittleActor(Actor):
<paroneayea>   def some_message_handler(message):
<paroneayea>     foo
<paroneayea>     result = yield some_message()
<paroneayea>     blah_utility(message)
<paroneayea>     bar
<paroneayea> paultag: what if you wanted blah_utility to *also* be able to
             yield back a message
<paroneayea> if you're just calling that, there's no way it can pass back
             through some_message_handler() yielding
<paroneayea> you could do it manually
<paroneayea> but you'd have to do a big rigamarule basically
<paroneayea> of starting the generator, .send(None), seeing if it sends a
             thing, if so yielding
<paroneayea> but you *have to* do it inside that message_handler
<paroneayea> because the message handler would have to yield also at that
             point
<paroneayea> so, if you wrap the yield-rigamarule into a macro like so
<paroneayea> (yielded-call blah-utility self message arg1 arg2)
<paroneayea> (or you could probably even skip self and message if lazy)
<paroneayea> it could handle all that rigamarule, expanded in-line
<paroneayea> there's no way otherwise that I can tell to nicely have shared
             utilities that can also yield

3.5 TODO Put use cases in intro.rst

<moggers87> still trying to get my head around what xudd is and isn't <paroneayea> moggers87: hm, yeah <paroneayea> I need to put examples of what to do and how it would be done in intro.rst <paroneayea> I think that would explain more clearly some use cases and will help people wrap their heads around it faster

3.6 TODO Tests

3.7 TODO Messages with multiple recipients

3.7.1 TODO Allowing messages to have multiple recipients in to:

3.7.2 TODO Handling "collecting" this in yield

We'll probably have to restore wait_on_message which was removed in d1df939

3.8 TODO Review other competing thinking

3.8.1 TODO Twisted

TODO Re-try twisted tutorial
TODO Check out twisted coroutines

3.8.2 TODO Watch Guido's talk on his async stuff

3.8.3 TODO Look at whatever that coroutine network one is

3.8.4 TODO Look at Go

3.9 TODO Support old XUDD tests

3.9.1 TODO abcxy_talk

TODO Break this down into the core ideas
TODO Re-code

3.9.2 TODO noisyab

TODO Break this down into the core ideas
TODO Re-code

3.10 TODO Have meeting with Paul

3.11 TODO Need way for actors to wait on other actors initing?

4 Marketing

If we stick with XUDD, we'll use "lovecraftian cult which is raising the evil gods" of blah blah

Acronymns:

eXtra Universal Destruction Deity
eXtensible User Dungeon Design (the original)