GNU MediaGoblin

Table of Contents

1 About

What is MediaGoblin? I'm shooting for:

  • Initially, a place to store all your photos that's as awesome as, more awesome than, existing proprietary solutions
  • Later, a place for all sorts of media, such as video, music, etc hosting.
  • Federated, like statusnet/ostatus (we should use ostatus, in fact!)
  • Customizable
  • A place for people to collaborate and show off original and derived creations
  • Free, as in freedom. Under the GNU AGPL, v3 or later. Encourages free formats and free licensing for content, too.

Wow! That's pretty ambitious. Hopefully we're cool enough to do it. I think we can.

It's also necessary, for multiple reasons. Centralization and proprietization of media on the internet is a serious problem and makes the web go from a system of extreme resilience to a system of frightening fragility. People should be able to own their data. Etc. If you're reading this, chances are you already agree though. :)

2 Milestones

Excepting the first, not necessarily in this order.

2.1 Basic image hosting

2.2 Multi-media hosting (including video and audio)

2.3 API(s)

2.4 Federation

Maybe this is 0.2 :)

2.5 Plugin system

3 Technology

I have a pretty specific set of tools that I expect to use in this project. Those are:

  • Python: because I love, and know well, the language
  • MongoDB: a "document database". Because it's extremely flexible (and scales up well, but I guess not down well)
  • MongoKit: a lightweight ORM for mongodb. Helps us define our structures better, does schema validation, schema evolution, and helps make things more fun and pythonic.
  • Jinja2: for templating. Pretty much django templates++ (wow, I can actually pass arguments into method calls instead of tediously writing custom tags!)
  • WTForms: for form handling, validation, abstraction. Almost just like Django's templates,
  • WebOb: gives nice request/response objects (also somewhat djangoish)
  • Paste Deploy and Paste Script: as the default way of configuring and launching the application. Since MediaGoblin will be fairly wsgi minimalist though, you can probably use other ways to launch it, though this will be the default.
  • Routes: for URL routing. It works well enough.
  • JQuery: for all sorts of things on the javascript end of things, for all sorts of reasons.
  • Beaker: for sessions, because that seems like it's generally considered the way to go I guess.
  • nose: for unit tests, because it makes testing a bit nicer.
  • Celery: for task queueing (think resizing images, encoding video) because some people like it, and even the people I know who don't don't seem to know of anything better :)
  • RabbitMQ: for sending tasks to celery, because I guess that's what most people do. Might be optional, might also let people use MongoDB for this if they want.

3.1 Why python

Because I (Chris Webber) know Python, love Python, am capable of actually making this thing happen in Python (I've worked on a lot of large free software web applications before in Python, including Miro Community, the Miro Guide, a large portion of Creative Commons' site, and a whole bunch of things while working at Imaginary Landscape). I know Python, I can make this happen in Python, me starting a project like this makes sense if it's done in Python.

You might say that PHP is way more deployable, that rails has way more cool developers riding around on fixie bikes, and all of those things are true, but I know Python, like Python, and think that Python is pretty great. I do think that deployment in Python is not as good as with PHP, but I think the days of shared hosting are (thankfully) coming to an end, and will probably be replaced by cheap virtual machines spun up on the fly for people who want that sort of stuff, and Python will be a huge part of that future, maybe even more than PHP will. The deployment tools are getting better. Maybe we can use something like Silver Lining. Maybe we can just distribute as .debs or .rpms. We'll figure it out.

But if I'm starting this project, which I am, it's gonna be in Python.

3.2 Why mongodb

In case you were wondering, I am not a NOSQL fanboy, I do not go around telling people that MongoDB is web scale. Actually my choice for MongoDB isn't scalability, though scaling up really nicely is a pretty good feature and sets us up well in case large volume sites eventually do use MediaGoblin. But there's another side of scalability, and that's scaling down, which is important for federation, maybe even more important than scaling up in an ideal universe where everyone ran servers out of their own housing. As a memory-mapped database, MongoDB is pretty hungry, so actually I spent a lot of time debating whether the inability to scale down as nicely as something like SQL has with sqlite meant that it was out.

But I decided in the end that I really want MongoDB, not for scalability, but for flexibility. Schema evolution pains in SQL are almost enough reason for me to want MongoDB, but not quite. The real reason is because I want the ability to eventually handle multiple media types through MediaGoblin, and also allow for plugins, without the rigidity of tables making that difficult. In other words, something like:

{"title": "Me talking until you are bored",
 "description": "blah blah blah",
 "media_type": "audio",
 "media_data": {
     "length": "2:30",
     "codec": "OGG Vorbis"},
 "plugin_data": {
     "licensing": {
         "license": "http://creativecommons.org/licenses/by-sa/3.0/"}}}

Being able to just dump media-specific information in a media_data hashtable is pretty great, and even better is having a plugin system where you can just let plugins have their own entire key-value space cleanly inside the document that doesn't interfere with anyone else's stuff. If we were to let plugins to deposit their own information inside the database, either we'd let plugins create their own tables which makes SQL migrations even harder than they already are, or we'd probably end up creating a table with a column for key, a column for value, and a column for type in one huge table called "plugin_data" or something similar. (Yo dawg, I heard you liked plugins, so I put a database in your database so you can query while you query.) Gross.

I also don't want things to be too lose so that we forget or lose the structure of things, and that's one reason why I want to use MongoKit, because we can cleanly define a much structure as we want and verify that documents match that structure generally without adding too much bloat or overhead (mongokit is a pretty lightweight wrapper and doesn't inject extra mongokit-specific stuff into the database, which is nice and nicer than many other ORMs in that way).

3.3 Why wsgi minimalism / Why not Django

If you notice in the technology list above, I list a lot of components that are very Django-like, but not actually Django components. What can I say, I really like a lot of the ideas in Django! Which leads to the question: why not just use Django?

While I really like Django's ideas and a lot of its components, I also feel that most of the best ideas in Django I want have been implemented as good or even better outside of Django. I could just use Django and replace the templating system with Jinja2, and the form system with wtforms, and the database with MongoDB and MongoKit, but at that point, how much of Django is really left?

I also am sometimes saddened and irritated by how coupled all of Django's components are. Loosely coupled yes, but still coupled. WSGI has done a good job of providing a base layer for running applications on and if you know how to do it yourself it's not hard or many lines of code at all to bind them together without any framework at all (not even say Pylons, Pyramid, or Flask which I think are still great projects, especially for people who want this sort of thing but have no idea how to get started). And even at this already really early stage of writing MediaGoblin, that glue work is mostly done.

Not to say I don't think Django isn't great for a lot of things. For a lot of stuff, it's still the best, but not for MediaGoblin, I think.

One thing that Django does super well though is documentation. It still has some faults, but even with those considered I can hardly think of any other project in Python that has as nice of documentation as Django. It may be worth learning some lessons on documentation from Django, on that note.

I'd really like to have a good, thorough hacking-howto and deployment-howto, especially in the former making some notes on how to make it easier for Django hackers to get started.

Author: Christopher Allan Webber

Org version 7.5 with Emacs version 24

Validate XHTML 1.0