Researchcation, concluded -- Dustycloud Brainstorms

Researchcation, concluded

By Christine Lemmer-Webber on Tue 03 June 2014

I just got done with a three week thing I've dubbed "researchcation". It's exactly what it sounds like, research + vacation.

It's hard for me to take time away from MediaGoblin right now and have it still meet its goals as a project. On the other hand, there's a lot that we have planned for the year ahead, but some of it I'm not really prepared enough for to make optimal decisions on. In addition, the last year and a half really have just not given me much of a break at all, and running a crowdfunding campaign (not to mention two over two years) is really exhausting. (Not that I'm complaining about success!)

I was feeling pretty close to burnout, but given how much there is to get done, I decided to take a compromise on this break... instead of taking a full fledged vacation, I'd take a "researchcation": three weeks to recharge my batteries and step away from the day to day of the project. In the meanwhile of that, I'd work on some projects to prepare me for the year ahead. A number of good things came out of it, though not exactly the same things I expected coming in. But I think it was worth the time invested.

My original plan going in was that I would work on two things: something related to the Pump API and federation, and something related to deployment. It turns out I didn't get around to the deployment part, but working on the federation part was insightful, though not in all the ways I anticipated. Though I've read the Pump API document and helped advise a bit on the design of PyPump (not to take credit for that, clearly credit belongs to Jessica Tallon generally, not me), there's nothing really like having a solid project to toss you into things, and I wanted to take a non-MediaGoblin-codebase approach to playing around with the Pump API.

I started out by hacking on a small project called PumpBus, which was going to be a daemon which wrapped pypump and exposed a d-bus API. I figured this would make writing clients easier (and even make it possible to write an emacs client... yeah I know). I got far enough to where I was able to post a message from emacs lisp, then decided that what I was working on just wasn't that interesting and wasn't teaching me much more than I already knew.

Given that there was both the "research" but also the "-cation" component to this, I figured the risks of failure were low, so I'd up the challenge of what I was working on a bit. I instead started working on something I've dubbed Pydraulics: a python-powered implementation of the Pump API. Worst came to worst I'd learn a few things.

I decided from the outset to keep a few assumptions related to pydraulics:

The end goal would be to have something that provided interfaces for object storage and retreival... not to wrap the database itself, but hopefully there would not be too many views, and maybe this could happen on a per-view basis. This way you could easily wrap Pydraulics around whatever application and use the storage/database it's already using. That's the end goal. (I didn't get there ;))
I'd keep things simple database-wise: assuming you're not providing your own interface, the default interface provided is postgres + sqlalchemy only. There's some new JSON-related features in Postgresql that are pretty exciting and would be appropriate here.
I'd use this as an oportunity to think about MediaGoblin's codebase. I decided I'd see how easy or hard it would be to split out components from MediaGoblin as I needed them into something I dubbed "libgoblin". For now, I'd allow this to be messy, but hopefully would give me a chance to think about what libgoblin should be.
I'd also use to think about where MediaGoblin fits as in terms of recent developments in asynchronous Python coding.

So, what came out of it?

Turns out SQLAlchemy does a nice job of making use of Postgres' built-in JSON support. Early tests seemed to indicate that this choice would pay off well. (Left me wondering: how hard would it be for someone to write a python API-compatible implementation of pymongo or something?)
I ended up spending a lot more time on the libgoblin side of things than I expected. I didn't realize that MediaGoblin had become such a self-contained microframework until this point. I wanted to port the MediaGoblin OAuth views over to Pydraulics to save time, but it turned out this required porting over a significant amount of MediaGoblin code over to libgoblin. I did get the oauth views working though!
Asynchronous stuff turned out to be interesting to explore, and I'll expand on what I've been thinking below.
I did end up getting a much, much stronger sense of the Pump API, which of course was the main goal, though the implementation of that is not yet complete.

Pondering asynchronous coding developments and MediaGoblin/libgoblin/pydraulics turned out to be fruitful. Mostly I have been looking at "what would it take for libgoblin to be usefully integrated into asyncio?"

This turns out to be a bit more challenging than it appears at the outset for one reason: mg_globals. mg_globals is a pretty sad design in MediaGoblin that I'd like to get rid of; basically it makes it easy to write functions that don't have to have the database session and friends, template environment and etc passed into them, because those are set on a global variable level. That works (but is nasty) as long as you're not in a multithreaded environment, but breaks as soon as you are. I recently created a ticket reflecting such, suggesting switching over to werkzeug context locals (Flask makes heavy use of this). Werkzeug's hack is clever, using thread locals so that even in a multi-threaded environment, the objects you're accessing are still globals, but they're the right globals.

But Werkzeug's solution is not good enough for integration with asyncio, where you might be doing "asynchronous" behavior in the same thread, suspending and resuming coroutines or coming back to tasks or etc. As such, it's almost guaranteed in this system that you'll be clobbering the variables another task needs.

What to do? I did research to see if anyone had ideas. It looks like you could do such a thing with Task.current_task() in asyncio, and that would be fairly equivalent. I think you'd need careful implementation though... if you're not paying close attention you might not attach the right things to the right subtask, and that whole thing just seems... fragile. But it still is a neat idea to play around with.

But here's some ideas that I think are neat all combined, related to this problem:

The idea that the request or asyncio task is the main object that you attach useful variables to, and you just pass that thing around as a "universal context" like crazy. (The downside: what happens when you aren't using asyncio, or don't need an http request, like in a migration script?)
I like the idea of the application being multi-instance'able, and then having requests and a local context as a layer on top of that. So you've got the "instantiated application" layer, and on top of that the "current request" layer, and at both levels you can have variables attached (like the database engine on the application level, but the database session on the request level). That's an awesome distinction.

But you don't really know whether or not some bit of code is using an asyncio task, a web request, or whatever to pass around. Here's the thing though: it doesn't really matter most of the time. With rare exceptions, you're just looking for $OBJECT.db or $OBJECT.templates or something. You just need some kind of object you can tack attributes on to.

So that's my idea in libgoblin/pydraulics: you have an application and you want to do something with it (handle a request, execute a task, etc), you can tack stuff onto that object. So either create a fresh context object to tack stuff onto or just start tacking things onto an object you have!

Currently, this looks like:

# Pydraulics -- Easy Pump API integration into your software.
#
# Copyright (C) 2014 Pydraulics contributors.  See AUTHORS.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# Large parts of this heavily borrowed from MediaGoblin.
# Copyright (C) 2011-2014 MediaGoblin contributors, see MEDIAGOBLIN-AUTHORS

class PydraulicsApp(object):
    """
    Pydraulics "basic" WSGI application
    """

    # ...

    def gen_context(self, request=None):
        """
        Generate or apply a context.

        If we have a request, use that as the context instead.
        """
        if request is not None:
            c = request
            c.app = self
        else:
            c = Context(self)

        c.template_env = jinja2.Environment(
            loader=self.template_loader, autoescape=True,
            undefined=jinja2.StrictUndefined,
            extensions=[
                'jinja2.ext.autoescape'])

        # Set up database
        c.db = self.Session()

        if request is not None:
            c.url_map = url_map.bind_to_environ(request.environ, server_name=self.netloc)
        else:
            c.url_map = url_map.bind(
                self.netloc,
                # Maybe support this later
                script_name=None)

Anyway, simple enough. Then you have request.db made, or if you've just got a command line script and you need the equivalent of that and you already have your instantiated application, just run application.gen_context(). Thus, for utilities that are working with this application and need a variety of instantiated things (the database and the template engine and so on and so on) it's easy enough to just accept "context" as the first argument of the function, then use context.db and etc. (I've considered using just "c" or "ctx" instead of "context" as the variable name since it seems so common and since it conflicts a bit with template context and friends, though that's not very explicit.) So, this seems good.

At one point I got frustrated with the massive amount of porting to libgoblin I was having to do and thought "I really should probably just use Django or Flask itself." However, I found that neither framework really addresses the asyncio stuff I was dealing with above, and once I got enough ported of libgoblin over, libgoblin-based development is very fast and comfortable.

That said, it took up enough time working through those things where I didn't complete the Pump implementation. That's okay, I've got enough to do what's required on my end from MediaGoblin (and we've got good direction and help on the federation end this upcoming year where the most important thing is that I have a good understanding). I still think pydraulics is a pretty neat idea, and I may finish it, though it'll be back-burner'ed for now.

However, libgoblin is something I'm likely to extract. I'm convinced that MediaGoblin is at a point where it's stable enough to know what works and doesn't about the technical design, so that gives me a good basis to know what to build from here. There are other applications I'd like to build which should mesh nicely with MediaGoblin but which really don't belong as part of MediaGoblin itself, and would be kind of hacky add-ons. Clearly this is not the most important development, but towards the end of the summer as we hopefully get the Python 3 branch merged, I will be looking towards this.

Aside from this, on the "-cation" end of things, I took some time to relax and also reapproach my health. I may have a separate post on that soon.

So, that's that. Overall it was productive, but again, not quite in the ways I was expecting. I feel okay about that though... I wanted to do some hacking and not feel deeply pressured or stressed about it... if that wasn't true, I think the "-cation" part wouldn't have held up. So I feel okay that I wandered a bit, and the other things I worked on / found I think are important anyhow, and have me much better prepared for the year ahead. Not to mention the most important part: I feel pretty refreshed and capable of taking it on!

What's next for the coming week? Well now that this is all over, I'm organizing plans so we can get rewards out the door and do project planning for the year ahead. We've got a lot of promises to fulfill. Better get to it!