Goodbye 2015, Hello 2016

By Christine Lemmer-Webber on Mon 04 January 2016

I'm sitting on a train traveling from Illinois to California, the long stretch of a journey from Madison to San Francisco. Morgan sits next to me. We are staring out the windows of the observation deck of this train as we watch the snow covered mountains pass by. I am feeling more relaxed and at peace than I have in years.

2016 is opening in a big way for me. As you may have heard (I mentioned it in the last State of the Goblin post) MediaGoblin was accepted into the Stripe Open Source Retreat program. Basically, Stripe gives us no-strings-attached funding for me to advance our work on MediaGoblin, but they wanted me to work from their office during that time. Seems like quite a deal to me! Unfortunately it does mean leaving Morgan behind in Madison for that time period. But that's why we splurged on a fancy train car and why she's joining me in San Francisco for the first week, so we can spend some quality time together. (Plus, Morgan has a conference that first week in San Francisco anyway; double plus, Amtrak has an extremely generous baggage policy so I'm able to get all of the belongings I need for that period shipped along with me fairly easily.) Morgan and I have been talking about but not really taking a vacation for a while, so we decided the moving-scenery approach would be a nice way to do things. It's great... we're mostly reading and drinking tea and staring out the window at the beautiful passings-by. I could hardly imagine a nicer send-off. (So yeah, if you're considering taking such a journey with your loved ones, I recommend it.)

The passage of scenery leads to reflection on the passage of time. Now seems a good time to write a bit about 2015 and what it meant. It was a very eventful year for me. I have come recently to explain to people that "I live a magical and high-stress life"; 2015 evoked that well. From a personal standpoint, my relationship with Morgan runs strong, maybe stronger than ever, and I am thankful for that. From the broader family standpoint, the graph advances steadily at times, with strong peaks and valleys, perhaps more pronounced than usual. Love, gain, success, loss... it feels that everything has happened this year. Our lives have also been rearranged dramatically in an attempt to help a family member in a time of need, and that has its own set of peaks and valleys, as is to be expected. But that is the stuff of life, and you do what you can when you can, and you try your best, and you hope that others will try their best, and what happens from there happens, and you use it to plan the next round of doing the best you can.

That's all very vague I suppose, but many things feel too private to discuss so publicly. Nonetheless, I wanted to record the texture of the year.

So what in the way of, you know, that thing we call a "career"? Well, it has continued to be magical, in the way that I have had a lot of freedom to explore things and address issues I really care about. Receiving an award (particularly since I did not know I had even been a candidate before being notified that I received it) has also been gratifying and reassuring in some ways; I regularly fear that I am not doing well enough at advancing the issues I care about, but clearly some people think I am, and that's nice. It has also continued to be high stress, in that the things I worry about feel very high stakes on a global level, the difficulty of accomplishing them feels just as high, and of course many of them are not there yet. Nonetheless, there has been a lot of progress this year, though it has come with a worrying increase in the scope and number of things I am attempting to accomplish.

We're much nearer to 1.0 on MediaGoblin, which is a huge relief. Of course, this is mostly due to Jessica Tallon's hard work on getting federation in MediaGoblin working, and other MediaGoblin community members doing many other interesting things. Embarrassingly, I have done a lot less on MediaGoblin than in the last few years. In a sense, this is okay, because the money from the campaign has been going to pay Jessica Tallon, and not me. I still feel bad about it though. The good news is that the Stripe retreat should give me the space and focus to hopefully get 1.0 actually out the door. So that leads to strong optimism.

The reduced time spent coding on MediaGoblin proper has been deceptive, since most of the projects I've worked on have spun out of work I believe is essential for MediaGoblin's long-term success. I took a sabbatical from MediaGoblin proper mid-year to focus on two goals: advancing federation standards (and my own understanding of them), and advancing the state of free software deployment. (I'm aware of a whiff of yak fumes here, though I can't see how MediaGoblin can succeed while either remains in its present state.) I believe I have made a lot of progress in both areas. As for federation, I've worked hard at participating in the W3C Social Working Group, I have done some test implementations, and recently I became co-editor on ActivityPump. On deployment, much work has been done on the UserOps side, both in talks and in actual code. After initially trying to use Salt/Ansible as a base and hitting limitations, then trying to build my own Salt/Ansible'esque system in Hy and then Guile and hitting limitations there too, I eventually came to look into (after much prodding) Guix. At the moment, I think it's the only foundation solid enough on which to build the tooling to get us out of this mess. I've made some contributions, albeit mostly minor ones, have begun promoting the project more heavily, and am trying to work towards getting more deployment tooling done for it (so little time though!). I'm also now dual booting between GuixSD and Debian, and that's nice.

(Speaking of, towards the end of the year I switched to a Minifree x200 on which I'm dual booting Debian and Guix. I believe this puts me much deeper into the "free software vegan" territory.)


I also believe that over the last year I have changed dramatically as a programmer. For nearly ten years I identified as a "python web developer", but that identity no longer feels like an ideal description. One thing I have always been self-conscious of is how little I've known about deeper computer science fundamentals. This has changed a lot, and much of that has come from spending so much time in the Guile and Scheme communities, and reading the copious interesting literature available there. My brother Steve and I also now often meet together and watch various programming lectures and discuss them, which has been both illuminating and also a great way to understand a side of my brother I never knew. It's a nice mix; I'm a very get-things-done person, he's a very theoretical person, and we're meeting partway in the middle, and I think both of us are stretching our brains in ways we hadn't before. I feel like a different programmer than I was. A year and a half ago, I remember being on a bike ride with Steve, complaining to him that I didn't understand why functional programmers are so obsessed with immutability... mutation is so useful, I exclaimed! Steve paused and said very carefully, "Well... mutation brings a lot of problems..." but I just didn't understand what he was getting at. Now I look back on that bike ride and wonder at the former me taking that position.

(All that said though, I'm glad that I've had the background I have of being a "python web developer" first, for a matter of perspective...)

I do feel that much has changed in my life in this last year. There were hard things, but overall, life has been good to me, and I still am doing what I believe in and care about. Not everyone has that opportunity. And this train ride already points the way to a year that should be productive, and will certainly be eventful.

Anyway, that's enough navel-gazing-reflection, I suppose. One more navel-gaze: here's to the changed person on the other end of 2016. I hope I can do them justice. And I hope you can do yourself justice in 2016 too.

VCS friendly, patchable, document line wrapping

By Christine Lemmer-Webber on Thu 17 December 2015

If you do enough work in any sort of free software environment, you get used to doing lots of writing of documentation or all sorts of other things in some plaintext system which exports to some non-plaintext system. One way or another you have to decide: are you going to wrap your lines with newlines? And of course the answer should be "yes" because lines that trail all the way off the edge of your terminal is a sin against the plaintext gods, who are deceptively mighty, and whose wrath is to be feared (and blessings to be embraced). So okay, of course one line per paragraph is off the table. So what do you do?

For years I've taken the lazy way out. I'm an emacs user, and emacs comes with the `fill-paragraph' command, so conveniently mapped to M-q. So day in and day out I'm either whacking M-q now and then, or I'm being lazy and letting something like `auto-fill-mode' do the job. Overall this results in something rather pleasing to the plaintext-loving eye. If we take our first paragraph as an example, it would look like this:

If you do enough work in any sort of free software environment, you get used to
doing lots of writing of documentation or all sorts of other things in some
plaintext system which exports to some non-plaintext system.  One way or
another you have to decide: are you going to wrap your lines with newlines?
And of course the answer should be "yes" because lines that trail all the way
off the edge of your terminal is a sin against the plaintext gods, who are
deceptively mighty, and whose wrath is to be feared (and blessings to be
embraced).  So okay, of course one line per paragraph is off the table.  So
what do you do?

But my friends, you know as well as I do: this isn't actually good. And we know it's not good because one of the primary benefits of plaintext is that we have nice tools to diff it and patch it and check it into version control systems and so on. And the sad reality is, if you make a change at the start of a paragraph and then you re-fill (or re-wrap for you non-emacs folks) it, you are going to have a bad time! Why? Because imagine you and your friends are working on this document together, and you're working in some branch of your document, and then your friend Sarah or whoever sends you a patch and you're so excited to merge it, and she does a nice job and edits a bunch of paragraphs and re-wraps it or re-fills them because why wouldn't she do that, it's the best convention you have, so you happily merge it in and say thanks, you look forward to future edits, and then you go to merge in your own branch you've been working on privately, but oh god oh no you were working on your own overhaul which re-wrapped many of the same paragraphs and now there are merge conflicts everywhere.

That's not an imaginary possibility; if you've worked on a documentation project big enough, I suspect you've hit it. And hey, look, maybe you haven't hit it, because maybe most of your writing projects aren't so fast paced. But have you ever looked at your version control log? Ever done a `git/svn/foo blame', `git/svn/foo praise', or whatever convention? Eventually you can't figure out what commit anything came from, and my friends, that is a bad time.

In trying to please the plaintext gods, we have defiled their temple. Can we do better?

One interesting suggestion I've heard, but just can't get on board with, is to keep each sentence on its own line. It's a nice idea, and I want to like it, because the core idea is good: each sentence doesn't interfere with the one before or after it, so if you change a sentence, it's easy for both you and the computer to tell which one. This means you can check things in and out of version control, send and receive patches, and from that whole angle, things are great.

But it's a sin to the eye to have stuff scrolling off the edge of your terminal like that, and each sentence on its own line, well... it just confuses me. Let's re-look at that first paragraph again in this style:

If you do enough work in any sort of free software environment, you get used to doing lots of writing of documentation or all sorts of other things in some plaintext system which exports to some non-plaintext system.
One way or another you have to decide: are you going to wrap your lines with newlines?
And of course the answer should be "yes" because lines that trail all the way off the edge of your terminal is a sin against the plaintext gods, who are deceptively mighty, and whose wrath is to be feared (and blessings to be embraced).
So okay, of course one line per paragraph is off the table.
So what do you do?

Ugh, it's hard to put into words why this is so offensive to me. I guess it's because each sentence can get so long that it looks like the separation between sentences is a bigger break than the separation between paragraphs. And I just hate things scrolling off to the right like that. I don't want to be halfway through reading a word on my terminal and then have to jump back so I can keep reading it.

So no, this is not good either. But it is on the right track. Is there a way to get the best of both worlds?

Recently, when talking about this problem with my good friend David Thompson, I came to realize that there is a potentially great solution that makes a hybrid of the technical merits of the one-sentence-per-line approach and the visually pleasing merits of the wrap/fill-your-paragraph approach. And the answer is: put each sentence on its own line, and wrap each sentence!

This is best seen to be believed, so let's take a look at that first paragraph again... this time, as I typed it into my blogging system:

If you do enough work in any sort of free software environment, you get used
  to doing lots of writing of documentation or all sorts of other things in
  some plaintext system which exports to some non-plaintext system.
One way or another you have to decide: are you going to wrap your lines with
  newlines?
And of course the answer should be "yes" because lines that trail all the way
  off the edge of your terminal is a sin against the plaintext gods, who are
  deceptively mighty, and whose wrath is to be feared (and blessings to be
  embraced).
So okay, of course one line per paragraph is off the table.
So what do you do?

Yes, yes, yes! This is what we want! Now it looks good, and it merges good. And we still can preserve the multi-line separation between paragraphs. Also, you might notice that I continue each sentence by giving two spaces before its wrapped continuation, and I think that's an extra nice touch (but you don't have to do it).

This is how I'm writing all my documentation from now on, and the style in which I'll ask that documentation be written for projects I start. Now if you're writing an email, or something else that's meant to be read in plaintext as-is (you do read/write your email in plaintext, right?), then maybe you should just do the traditional fill-paragraph approach. After all, you want that to look nice, and in many of those cases, the text doesn't change too much. But if you're writing something where the plaintext version is just an intermediate form, and you have some other export which is what people will mostly read, I think this is a rather dandy approach.
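This convention is also easy to automate. Here's a rough Python sketch of the idea (a hypothetical illustration, not any tooling from this post): split a paragraph into sentences, then fill each sentence on its own, with a two-space continuation indent:

```python
import re
import textwrap

def fill_sentences(paragraph, width=79, indent="  "):
    """Format a paragraph one sentence per line, wrapping long
    sentences with a two-space continuation indent."""
    # Naive sentence split: break after ./!/? followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    filled = [textwrap.fill(sentence, width=width,
                            subsequent_indent=indent)
              for sentence in sentences]
    return "\n".join(filled)

print(fill_sentences(
    "One way or another you have to decide: are you going to wrap "
    "your lines with newlines? So okay, of course one line per "
    "paragraph is off the table. So what do you do?"))
```

Editing one sentence now only touches that sentence's lines, so diffs and merges stay clean.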

I hope you find it useful as well! Happy documentation hacking!

Parallels Between Codes of Conduct and Copyleft (and their objectors)

By Christine Lemmer-Webber on Wed 09 December 2015

Here's a strawman I'm sure you've seen before:

Why all this extra process? Shouldn't we trust people to do the right thing? Wouldn't you rather people do the right thing because it's the right thing to do? People are mostly doing the right thing anyway! Trust in the hacker spirit to triumph!

The question is: is this an objection to copyleft, or an objection to codes of conduct? I've seen objections raised to both that go along these lines. I think there's little coincidence in that, since both are objections to added process which defines (and provides enforcement mechanisms for) doing the right thing.

Note that I don't think copyleft and codes of conduct are exactly the same thing. They aren't, and the problems they try to prevent are probably not comparable in scale.

But I do think there's an argument that achieving real-world social justice involves a certain amount of process: laying out ground rules for what's permitted and what isn't, and (if you have to, but hopefully you don't) a specified means of requiring compliance with that correct behavior.

Curiously, we also have people who are pro-copyleft and strongly anti-code-of-conduct, and the reverse. Maybe examining the parallels between the objections to both can help a supporter of one see that the other makes sense, too.

Update: This was originally posted to the pumpiverse and since cross-posting to my blog, at least one interesting response to it has been made there.

Another update: Sumana Harihareswara pointed out that not only had we probably talked about this when we hung out in May, but that she had previously written on the subject even before then! I had forgotten about reading this (I forget about a lot of things), though honestly, my post is probably a reflection of Sumana's original thoughts. She goes into more detail (and probably much better) than I did here... you should read her writing on the subject! It's good stuff!

Hash tables are easy (in Guile)

By Christine Lemmer-Webber on Mon 09 November 2015

As a programmer, I use hash tables of varying kinds pretty much all day, every day. But one of the odd and embarrassing parts of being a community-trained programmer is that I've never actually implemented one. Eek! Well, today I pulled an algorithms book off the shelf and decided to see how long it would take me to implement their simplest example in Guile. It turns out that it takes less than 25 lines of code to implement a basic hash table with O(1) best time, O(1) average time, and O(n) worst case time. The worst case won't come up often if we size things sensibly, so this isn't so bad, but we'll get into that as we go along.

Here's the code:

;;; Simple hash table implementation -- (C) 2015 Chris Lemmer-Webber
;;; Released under the "Any Free License 2015-11-05", whose terms are the following:
;;;   This code is released under any of the free software licenses listed on
;;;     https://www.gnu.org/licenses/license-list.html
;;;   which for archival purposes is
;;;     https://web.archive.org/web/20151105070140/http://www.gnu.org/licenses/license-list.html

(use-modules (srfi srfi-1))

(define (make-dumbhash size)
  "Make a dumb hash table: an array of buckets"
  (make-array '() size))

(define* (dumbhash-ref dumbhash key #:optional (default #f))
  "Pull a value out of a dumbhash"
  (let* ((hashed-key (hash key (array-length dumbhash)))
         (bucket (array-ref dumbhash hashed-key)))
    (or (find (lambda (x) (equal? (car x) key))
              bucket)
        default)))

(define (dumbhash-set! dumbhash key val)
  "Set a value in a dumbhash"
  (let* ((hashed-key (hash key (array-length dumbhash)))
         (bucket (array-ref dumbhash hashed-key)))
    ;; Only act if it's not already a member
    (if (not (find (lambda (x) (equal? (car x) key))
                   bucket))
        (array-set! dumbhash
                    ;; extend the bucket with the key-val pair
                    (cons (cons key val) bucket)
                    hashed-key))))

You might even notice that some of these lines are shared between `dumbhash-ref` and `dumbhash-set!`, so this could be even shorter. As-is, sans comments and docstrings, it's a mere 17 lines. That's nothing.

We also cheated a little: we're using `hash` and `equal?` to generate a hash and to test for equality, which are arguably the hard parts of the job. But these are provided by Guile, and it's one less thing to worry about. Here's a brief demonstration though:

(equal? 'a 'a)               ;; => #t, or true
(equal? 'a 'b)               ;; => #f, or false
(equal? "same" "same")       ;; => #t
(equal? "same" "different")  ;; => #f
(hash "foo" 10)              ;; => 6
(hash 'bar 10)               ;; => 5

`equal?` is self-explanatory. The important thing to know about `hash` is that it'll pick a hash value for a `key` (the first parameter) for a hash table of some `size` (the second parameter).

So let's jump into an example. `make-dumbhash` is pretty simple. It just creates an array of whatever `size` we pass into it. Let's make a simple hash now:

scheme@(guile-user)> (define our-hash (make-dumbhash 8))
scheme@(guile-user)> our-hash
$39 = #(() () () () () () () ())

This literally made an array of 8 items which each start out with the empty list as their value (that's `nil` for you common lispers). (You can ignore the `$39` part, which may be different when you try this; Guile's REPL lets you refer to previous results at your prompt by number for fast & experimental hacking.)

So our implementation of hash tables is of fixed size. That doesn't limit the number of items we can put into it, since buckets can contain multiple values in case of collision (and collisions happen a lot in hash tables, so we come prepared for that), but it does mean we need an up-front guess of about how many buckets we'll want for efficiency. (Resizing hash tables is left as an exercise for the reader.) Our hash table also uses simple linked lists for its buckets, which isn't too uncommon, as it turns out.

Let's put something in the hash table. Animal noises are fun, so:

scheme@(guile-user)> (dumbhash-set! our-hash 'monkey 'ooh-ooh)
scheme@(guile-user)> our-hash
$40 = #(() () () ((monkey . ooh-ooh)) () () () ())

The monkey went into bucket 3 (counting from zero). This makes sense, because the hash of `monkey` for size 8 is 3:

scheme@(guile-user)> (hash 'monkey 8)
$41 = 3

We can get back the monkey:

scheme@(guile-user)> (dumbhash-ref our-hash 'monkey)
$42 = (monkey . ooh-ooh)

We've set this up so that it returns a pair when we get a result, but if we try to access something that's not there, we get #f instead of a pair, unless we set a default value:

scheme@(guile-user)> (dumbhash-ref our-hash 'chameleon)
$43 = #f
scheme@(guile-user)> (dumbhash-ref our-hash 'chameleon 'not-here-yo)
$44 = not-here-yo

So let's try adding some more things to `our-hash`:

scheme@(guile-user)> (dumbhash-set! our-hash 'cat 'meow)
scheme@(guile-user)> (dumbhash-set! our-hash 'dog 'woof)
scheme@(guile-user)> (dumbhash-set! our-hash 'rat 'squeak)
scheme@(guile-user)> (dumbhash-set! our-hash 'horse 'neigh)
scheme@(guile-user)> ,pp our-hash
$45 = #(()
        ((horse . neigh))
        ()
        ((rat . squeak) (monkey . ooh-ooh))
        ((cat . meow))
        ()
        ((dog . woof))
        ())

(`,pp` is a shortcut to `pretty-print` something at the REPL, and I've taken the liberty of doing some extra alignment of its output for clarity.)

So we can see we have a collision in here, but it's no problem. Both `rat` and `monkey` are in the same bucket, but a lookup in our implementation first finds the right bucket (a list), then searches that list for the key.

We can figure out why this is O(1) average / best time, but O(n) worst time. Assume we made a hash table of the same size as the number of items we put in... assuming our `hash` procedure gives pretty good distribution, most of these things will end up in an empty bucket, and if they end up colliding with another item (as the rat and monkey did), no big deal, they're in a list. Even though linked lists are of O(n) complexity to traverse, assuming a properly sized hash table, most buckets don't contain any or many items. There's no guarantee of this though... it's entirely possible that we could have a table where all the entries end up in the same bucket. Luckily, given a reasonably sized hash table, this is unlikely. Of course, if we ended up making a hash table that started out with 8 buckets, and then we added 88 entries... collisions are guaranteed in that case. But I already said resizing hash tables is an exercise for the reader. :)
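A quick way to build intuition for that distribution argument is to simulate it. Here's a hypothetical Python sketch (not part of the original Guile code) that drops random keys into buckets and reports how full they get:

```python
import random

def bucket_stats(n_keys, n_buckets, seed=42):
    """Hash n_keys random keys into n_buckets and report
    (empty buckets, buckets with collisions, deepest bucket)."""
    rng = random.Random(seed)
    buckets = [0] * n_buckets
    for _ in range(n_keys):
        key = rng.getrandbits(64)          # stand-in for a real key
        buckets[hash(key) % n_buckets] += 1
    empty = sum(1 for count in buckets if count == 0)
    collided = sum(1 for count in buckets if count > 1)
    return empty, collided, max(buckets)

# With as many buckets as keys, most buckets hold 0 or 1 entries,
# so the lists stay short and lookups stay effectively O(1).
print(bucket_stats(1000, 1000))
# But 88 keys in 8 buckets guarantees collisions (pigeonhole):
# at least one bucket must hold 11 or more entries.
print(bucket_stats(88, 8))
```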

If you're familiar enough with any Scheme (or probably any other Lisp), reading `dumbhash-ref` and `dumbhash-set!` should be pretty self-explanatory. If not, go read an introductory Scheme tutorial, and come back! (Relatedly, I think there aren't many good introductory Guile tutorials... I have some ideas though!)

What lessons are there to be learned from this post? One might be that Guile is a pretty dang nice hacking environment, which is true! Another might be that it's amazing how far I've gotten in my career without ever writing a hash table, which is also true! But the lesson I'd actually like to convey is: most of these topics are not as unapproachable as they seem. I had a long-time fear that I would never understand such code until I took the time to actually sit down and attempt to write it.
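To underline that last point: the same structure translates almost line for line into other languages. Here's a rough Python sketch of the "dumbhash" (my own translation, not the original code): a fixed-size list of buckets, each bucket a list of key-value pairs:

```python
def make_dumbhash(size):
    """Make a dumb hash table: a fixed-size list of buckets."""
    return [[] for _ in range(size)]

def dumbhash_ref(dumbhash, key, default=None):
    """Pull a (key, value) pair out of a dumbhash, or default if absent."""
    bucket = dumbhash[hash(key) % len(dumbhash)]
    for pair in bucket:
        if pair[0] == key:
            return pair
    return default

def dumbhash_set(dumbhash, key, val):
    """Set a value in a dumbhash, only if the key isn't already present."""
    bucket = dumbhash[hash(key) % len(dumbhash)]
    if not any(pair[0] == key for pair in bucket):
        bucket.append((key, val))

our_hash = make_dumbhash(8)
dumbhash_set(our_hash, "monkey", "ooh-ooh")
print(dumbhash_ref(our_hash, "monkey"))                    # ('monkey', 'ooh-ooh')
print(dumbhash_ref(our_hash, "chameleon", "not-here-yo"))  # not-here-yo
```

Python's built-in `hash` plays the role of Guile's `hash` procedure here, with `% len(dumbhash)` doing the reduction to a bucket index.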

As an additional exercise for the reader, here's a puzzle: is the `Any Free License` this code is released under actually a free license? And what commentary, if any, might the author be making? :)

Activipy v0.1 released!

By Christine Lemmer-Webber on Tue 03 November 2015

Hello all! I'm excited to announce v0.1 of Activipy. This is a new library targeting ActivityStreams 2.0.

If you're interested in building and expressing the information of a web application which contains social networking features, Activipy may be a great place to start.

Some things I think are interesting about Activipy:

- It wraps ActivityStreams documents in pythonic style objects.
- Has a nice and extensible method dispatch system that even works well with ActivityStreams/json-ld's composite types.
- It has an "Environment" feature: different applications might need to represent different vocabularies or extensions, and also might need to hook up entirely different sets of objects.
- It hits a good middle ground in keeping things simple, until you need complexity. Everything's "just json", until you need to get into extension-land, in which case json-ld features are introduced. (Under the hood, that's always been there, but users don't necessarily need to understand json-ld to work with it.)
- Good docs! I think! Or I worked really hard on them, at least!

As you may have guessed, this has a lot to do with our work on federation and the Social Working Group. I intend to build some interesting things on top of this myself.

In the meanwhile, I spent a lot of time on the docs, so I hope you find reading them to be enjoyable, and maybe you can build something neat with it? If you do, I'd love to hear about it!

Hitchhiker's guide to data formats

By Christine Lemmer-Webber on Wed 21 October 2015

Just thinking out loud this morning on what data formats there are and how they work with the world:

  • XML: 2000's hippest technology. Combines a clear, parsable, tree-based syntax with extension mechanisms and a schema system. Still moderately popular, though not what it once was. Tons of tooling. Many seem to think the tooling makes it overly complex, and JSON has taken over much of its place. Has the advantage of unambiguity over vanilla JSON, if you know how to use it right, but is more effort to work with.
  • SGML: XML's soupier grandmother. Influential.
  • HTML: Kind of like SGML and XML but for some specific data. Too bad XHTML never fulfilled its dream. Without XHTML, it's even soupier than SGML, but there's enough tooling for soup-processing that most developers don't worry about it.
  • JSON: Also tree-based, but keeps things minimal, just your basic types. Loved by web developers everywhere. Also ambiguous since on its own, it's schema-free... this may lead to conflicts between applications. But if you know the source and the destination perfectly it's fine. Has the advantage of transforming into basic types in pretty much every language and widespread tooling. (Don't be evil about being evil, though? #vaguejokes) If you want to send JSON between a lot of locations and want to be unambiguous in your meaning, or if you want more than just the basic types provided, you're going to need something more... we'll come to that in a bit.
  • S-expressions: the language of lisp, and lispers claim you can represent anything as s-expressions, which is true, but also that's kind of ambiguous on its own. Capable also of representing code just as well, which is why lispers claim benefits of symmetry and "code that can write code". However, serializing "pure data" is also perfectly possible with s-expressions. So many variations between languages though... it's more of a "generalized family" or even better, a pattern, of data (and code) formats. Some damn cool representations of some of these other formats via sexps. Some people get scared away by all the parens, though, which is too bad, because (though this strays into code + data, not just data) homoiconicity can't be beat. (Maybe Wisp can help there?)
  • Canonical s-expressions: S-expressions, with a canonical representation... cool! Most developers don't know about it, but was designed for public key cryptography usage, and still actively used there (libgcrypt uses canonical s-expressions under the hood, for instance). No schema system, and actually pretty much just lists and binary strings, but the binary strings can be marked with "display hints" so systems can know how to unpack the data into appropriate types.
  • RDF and friends: The "unicode" of graph-oriented data. Not a serialization itself, but a specification on the conceptual modeling of data, and you'll hear "linked data" people talking about it a lot. A graph of "subject, predicate, object" triples. Pretty cool once you learn what it is, though the introductory material is really overwhelming. (Also, good luck representing ordered lists.) However, there is no one serialization of RDF, which leads to much confusion among many developers (myself included, for a long time, even after being told otherwise). For example, rdf/xml looks like XML, but woe be upon ye who uses XML tooling upon it. So: deserialize to RDF, then deal with RDF in RDF land, then serialize again... that's the way to go with RDF. It has saner formats than just rdf/xml; Turtle, for example, is easy to read. The RDF community seems to get mad when you want to interpret data as anything other than RDF, which can be very off-putting, though the goal of a "platonic form" of data is highly admirable. That said, graph-based tooling is definitely harder for most developers to work with than tree-based tooling, but hopefully "the jQuery of RDF" library will become available some day, and things will be easier. Interesting stuff to learn, anyway!
  • json-ld: A "linked data format", technically can transform itself into RDF, but unlike other forms of RDF syntax, can often be parsed just on its own as simple JSON. So, say you want to have JSON and keep things easy for most of your users who just use their favorite interpreted language to extract key value pairs from your API. Okay, no problem for them! But suddenly you're also consuming JSON from multiple origins, and one of them uses "run" to say "run a mile" whereas your system uses "run" to mean "run a program". How do you tell these apart? With json-ld you can "expand" a JSON representation with supplied context to an unambiguous form, and you can "compact" it down again to the terms you know and understand in your system, leaving out those you don't. No more executing a program for a mile!
  • Microformats and RDFa: Two communities which have been notoriously and exasperatingly at odds with each other for over a decade, so why do I link them together? Well, both take the same approach of embedding data in HTML. Great when you have HTML for your data to go with, though not all data needs an HTML wrapper. But it's good to be able to extract it! RDFa simply extracts to RDF, which we've discussed plenty; Microformats extracts to its own thing. A frequent point of contention between these groups is vocabulary, and how to represent it. RDFa people like their vocabulary terms to have canonical URIs (well, that's an RDF thing, so not surprising), while Microformats people like to document everything in a wiki. Arguments about extensibility are a frequent topic... if you want to get into that, see Amy Guy's summary of things.
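To make the json-ld "expand" idea above a bit more concrete, here's a toy sketch in Python. This is a hand-rolled illustration, not a real json-ld processor (a real one does far more than this), and the context mappings and example IRIs are all invented:

```python
# Toy illustration of json-ld style "expansion": replace ambiguous short
# terms with unambiguous IRIs supplied by a per-document context.

def expand(doc, ctx):
    """Replace each short key with the full IRI given by the context."""
    return {ctx.get(key, key): value for key, value in doc.items()}

# Two documents both say "run", but their contexts disambiguate them.
fitness_ctx = {"run": "http://example.org/fitness#run"}
compute_ctx = {"run": "http://example.org/computing#run"}

print(expand({"run": "a mile"}, fitness_ctx))
# {'http://example.org/fitness#run': 'a mile'}
print(expand({"run": "a program"}, compute_ctx))
# {'http://example.org/computing#run': 'a program'}
```

Once expanded, the two "run"s can't be confused; "compacting" is just the inverse mapping back into your system's own terms.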

Of course, there are more data formats than that. Heck, even on top of these data formats there's a lot more out there (these days I spend a lot of time working on ActivityStreams 2.0 related tooling, which is just JSON with a specific structure, until you want to get fancier, add extensions, or jump into linked data land, in which case you can process it as json-ld). And maybe you'd also find stuff like Cap'n Proto or Protocol Buffers to be interesting. But the above are the formats that, today, I think are generally most interesting or impactful upon my day to day work. I hope this guide was interesting to you!

A conversation with Sussman on AI and asynchronous programming

By Christine Lemmer-Webber on Wed 14 October 2015

Sussman!

A couple weeks ago I made it to the FSF's 30th anniversary party. It was a blast in many ways, and a good generator of many fond memories, but I won't go into depth on them here. One particularly exciting thing that happened for me, though, was that I got to meet Gerald Sussman (of SICP!). The conversation has greatly impacted me, and I've been turning it over and over in my mind over the last few weeks... so I wanted to capture as much of it here while I still could. There are things Sussman said that I think are significant... especially in the ways he thinks contemporary AI is going about things wrong, and a better path forward. So here's an attempt to write it all down... forgive me that I didn't have a real tape recorder, so I've written some of this in a conversational style as I remember it, but of course these are not the precise things said. Anyway!

I wasn't sure initially if the person I was looking at was Gerald Sussman or not, but then I noticed that he was wearing the same "Nerd Pride" labeled pocket protector I had seen him wear in a lecture I had watched recently. When I first introduced myself, I said, are you Sussman? (His first reply was to look astonished and say something like, "am I known?") I explained that I've been reading the Structure and Interpretation of Computer Programs and that I'm a big fan of his work. He grinned and said, "Good, I hope you're enjoying it... and the jokes! There's a lot of jokes in there. Are you reading the footnotes? I spent a lot of time on those footnotes!" (At this point my friend David Thompson joined me, and they both chuckled about some joke about object oriented programmers in some footnote I either hadn't gotten to or simply hadn't gotten.)

He also started to talk enthusiastically about his other book, the Structure and Interpretation of Classical Mechanics, in which classical engineering problems and electrical circuits are simply modeled as computer programs. He expressed something similar to what he had said in the aforementioned talk: that conventional mathematical notation is unhelpful, and that we ought to be able to express things more clearly as programs. I agreed that I find conventional mathematical notation unhelpful; when I try to read papers there are concepts I easily understand as code but can't parse in symbolic math. "There's too much operator overloading", I said, "and that makes it hard for me to understand in which way a symbol is being used, and papers never seem to clarify." Sussman replied, "And it's not only the operator overloading! What's that 'x' doing there! That's why we put 'let' in Scheme!" Do you still get to write much code in Scheme these days, I asked? "Yes, I write tens of thousands of lines of Scheme per year!" he replied.

I mentioned that I work on distributed systems and federation, and that I had seen that he was working on something called the propagator model, which I understood was some way of going about asynchronous programming, and maybe was an alternative to the actor model? "Yes, you should read the paper!" Sussman replied. "Did you read the paper? It's fun! Or it should be. If you're not having fun reading it, then we wrote it wrong!" (Here is the paper, as well as the documentation/report on the software... see for yourself!) I explained that I was interested in code that can span multiple processes or machines; are there any restrictions on that in the propagator model? "No, why would there be? Your brain, it's just a bunch of hunks of independent grey stuff sending signals to each other."

At some point Sussman expressed how he thought AI was on the wrong track. He explained that most AI directions were not interesting to him, because they were about building up a solid AI foundation and then running the AI system as a sort of black box. "I'm not interested in that. I want software that's accountable." Accountable? "Yes, I want something that can express its symbolic reasoning. I want it to tell me why it did the thing it did, what it thought was going to happen, and then what happened instead." He then said something that took me a long time to process, and that at first I mistook for being very science-fiction'y, along the lines of, "If an AI driven car drives off the side of the road, I want to know why it did that. I could take the software developer to court, but I would much rather take the AI to court." (I know, that definitely sounds like out-there science fiction... bear with me; keeping that frame of mind is useful for the rest of this.)

"Oh! This is very interesting to me. I've been talking with some friends about how AI systems and generative software may play in with software freedom, and whether our traditional methods of considering free software still apply in that form," I said. I mentioned a friend of a friend who is working on software that is generated via genetic programming, and how he claims that eventually you won't be looking at code anymore, that it'll just generate this black box of stuff that's running all our computers.

Sussman seemed to disagree with that view of things. "Software freedom is a requirement for the system I'm talking about!" I liked hearing this, but didn't understand fully what he meant... was he talking about the foundations on top of which the AI software ran?

Anyway, this all sounded interesting, but it also sounded very abstract. Is there any way this could be made more concrete? So I asked him, if he had a student who was totally convinced by this argument, that wanted to start working on this, where would you recommend he start his research? "Read the propagators paper!" Sussman said.

OH! Prior to this moment, I thought we were having two separate conversations, one about asynchronous programming, and one about AI research. Suddenly it was clear... Sussman saw these as interlinked, and that's what the propagator system is all about!

One of the other people standing in the circle said, "Wait a minute, I saw that lecture you gave recently, the one called 'We Don't Really Know How to Compute!', and you talked about the illusion of seeing the triangle when there wasn't a triangle" (watch the video) "and what do you mean, that you can get to that point, and it won't be a black box? How could it not be a black box?"

"How am I talking to you right now?" Sussman asked. Sussman seemed to be talking about the shared symbolic values being held in the conversation, and at this point I started to understand. "Sure, when you're running the program, that whole thing is a black box. So is your brain. But you can explain to me the reasoning of why you did something. At that point, being able to inspect the symbolic reasoning of the system is all you have." And, Sussman explained, the propagator model carries its symbolic reasoning along with it.

A more contrived relation to this in real life that I've been thinking about: if a child knocks over a vase, you might be angry at them, and they might have done the wrong thing. But why did they do it? If a child can explain to you that they knew you were afraid of insects, and swung at a fly going by, that can help you debug that social circumstance so you and the child can work together towards better behavior in the future.

So now, hearing the above, you might start to wonder if everything Sussman is talking about means needing a big complicated suite of natural language processing tools, etc. Well, after this conversation, I got very interested in the propagator model, and to me at least, it's starting to make a lot of sense... or at least seems to. Cells' values are propagated from the results of other cells, but they also carry the metadata of how they achieved that result.

I recommend that you read the materials yourself if this is starting to catch your interest. (A repeat: here is the paper, as well as the documentation/report on the software... see for yourself!). But I will highlight one part that may help drive the above points more clearly.

The best way to catch up on this is to watch the video of Sussman talking about this while keeping the slides handy. The whole thing is worth watching, but about halfway through he starts talking about propagators, and then he gets to an example where you're trying to measure the height of a building by a variety of factors, and you have these relationships set up where, as information is filled in by a cell's dependencies, that cell merges what it already knows with what it just learned. In that way, you might use multiple measurements to "narrow down" the information. Again, watch the video, but the key part that comes out of the demonstration is this:

(content fall-time)
=> #(contingent #(interval 3.0255 3.0322)
                (shadow super))

What's so special about this? Well, the fall-time has been updated to a narrower interval... but that last part (shadow and super) gives the symbols of the other cells which propagated the information behind this updated state. Pretty cool! And no fancy natural language parsing involved.
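To get a feel for that, here's a tiny toy sketch in Python of a cell that merges interval information and carries along the names of the cells whose contributions narrowed it. This is my own illustration, not the real propagator system (read the actual Scheme in the report for the genuine article), and all names here are invented:

```python
# Toy model of a propagator cell: it holds an interval plus the set of
# "premises" (other cells) whose information produced that interval.
# Merging intersects intervals and unions provenance, so a narrowed
# value remembers which measurements narrowed it.

class Cell:
    def __init__(self, name):
        self.name = name
        self.low, self.high = float("-inf"), float("inf")
        self.premises = set()

    def add_content(self, low, high, premises):
        # Intersect the new interval with what we already believe,
        # and record where the new information came from.
        self.low = max(self.low, low)
        self.high = min(self.high, high)
        self.premises |= premises

fall_time = Cell("fall-time")
fall_time.add_content(2.9, 3.0322, {"shadow"})  # one measurement
fall_time.add_content(3.0255, 3.5, {"super"})   # an independent one

print((fall_time.low, fall_time.high))  # (3.0255, 3.0322)
print(fall_time.premises)               # {'shadow', 'super'} in some order
```

Each measurement alone is wide, but their intersection is narrow, and the cell can still "explain" which sources justified the result.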

There's certainly more to be extrapolated from that, and more to explore (the Truth Maintenance Systems are particularly something interesting to me). But here were some interesting takeaways from that conversation, things I've been thinking over since:

  • AI should be "accountable", in the sense that it should be able to express its symbolic reasoning, and be held to account for whether or not its assumptions held up.
  • Look more into the propagators model... it's like asynchronous programming meets functional programming meets neural nets meets a bunch of other interesting AI ideas that have been, like so many things, dropped on the floor for the last few decades from the AI winter, and which people are only now realizing they should be looking at again.
  • On that note, there's so much computing history to look at, yet so many people are focused on whatever the "new hotness" is in web development or deployment or whatever. But sometimes looking backwards can help us better look forwards. There are ideas in SICP that people are acting as if they just discovered today. (That said, the early expressions of these ideas are not always the best, and so the past should be a source of inspiration, but we should be careful not to get stuck there.)
  • Traditional mathematical notation and electrical engineering diagrams might not convey clearly their meaning, and maybe we can do better. SICM seems to be an entire exploration of this idea.
  • Free software advocates have long known that if you can't inspect a system, you're held prisoner by it. Yet this applies not just to the layers that programmers currently code on, but also into new and more abstract frontiers. A black box that you can't ask to explain itself is a dangerous and probably poorly operating device or system.
  • And for a fluffier conclusion: "If you didn't have fun, we were doing it wrong." There's fun to be had in all these things, and don't lose sight of that.

Minimalist bundled and distributed bugtracker w/ orgmode

By Christine Lemmer-Webber on Sun 11 October 2015

Thinking out loud here... this isn't a new idea but maybe here's a solid workflow...

"Distributed" as in the project's existing DVCS.

  • Check a TODO.org orgmode file right into your project's git repo
  • Accept additions/adjustments to TODO.org via patches on your mailing list
  • As soon as a bug is "accepted", it's committed to the project.
  • When a bug is finished, it's closed and archived.
  • Contributors are encouraged to submit closing tasks in the orgmode tree as part of their patch.
  • Bug commentary happens on-list, but if users have useful information to contribute to someone working on a bug, they can submit that as a patch.
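For instance, a TODO.org entry in such a repo might look something like this (every detail here is invented for illustration):

```org
* TODO Fix crash when uploading zero-byte media files
  Reported-by: a user on the mailing list
  Thread: <link to list archive>
** Notes
   - Reproduced on current master
   - Probably in the upload view's file-size check
```

Closing it is then just flipping TODO to DONE and archiving the subtree, which travels with the patch that fixes it.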

I think this would be a reasonably complete but very emacs user oriented bugtracker solution, so maybe in addition:

  • A script can be provided which renders a static html copy for browsing open/closed bugs.
  • A "form" can be provided on that page to email the list about newly discovered bugs, formatting the submission as an orgmode TODO subsection. This way maintainers can easily file the bug into the tracker file if they deem it appropriate.

I think this would work. Lately I've been hacking on a project that's mostly just me so far, so I have an orgmode file bundled with the repo, and I must say that it's rather nice to hack on an orgmode file and have your mini-bugtracker distributed with your project. I've done this a few times, but as soon as a project grows to multiple contributors, I move everything over to some web based bugtracker UI. But why not distribute all bugs with the project itself? My main concern is that there's a tool-oriented barrier to entry, but maybe the web page render can help with that.

I've also been spending more time working on more oldschool projects that just take bugs submitted on their mailing lists. They seem to do just fine. So I guess it entirely depends on the type of project, but this may work well for some.

And yes, there are a lot of obvious downsides to this too; paultag points out a few :)

Chicago GNU/Linux talk on Guix retrospective

By Christine Lemmer-Webber on Thu 01 October 2015

GuixSD logo

Friends... friends! I gave a talk on Guix last night in Chicago, and it went amazingly well. That feels like an undersell actually; it went remarkably well. There were 25 people, and apparently there was quite the waitlist, but I was really happy with the set of people who were in the room. I haven't talked about Guix in front of an audience before and I was afraid it would be a dud, but it's hard to explain the reaction I got. It felt like there was a general consensus in the room: Guix is taking the right approach to things.

I didn't expect this! I know some extremely solid people who were in the audience, and some of them are working with other deployment technologies, so I expected at some point to be told "you are wrong", but that moment didn't come. Instead, I met a large amount of enthusiasm for the subject, a lot of curious questions, and well... there was some criticism of the talk, though it mostly came down to presentation style and approach. Also, I had promised to give two talks, one about federation and one about Guix, but instead I just gave the latter and splatted over the former's time. Though people seemed to enjoy themselves enough that I was asked to come back again and give the federation talk as well.

Before coming to this talk, I had wondered whether I had gone totally off the yaks in heading in this direction, but giving this talk was worth it, at least in that the community reaction has been a huge confidence booster. It's worth pursuing!

So, here are some things that came out of the talk for me:

  • The talk was clear, and generally people said that though I went into the subject to quite some depth, things were well understood and unambiguous to them.
  • This is important, because it means that once people understood the details of what I was saying, it gave them a better opportunity to evaluate for whether it was true or not... and so the general sense of the room that this is the right approach was reassuring.
  • A two tier strategy for pushing early adoption with "practical developers" probably makes sense:

    • Developers seem really excited about the "universal virtualenv" aspect of Guix (using "guix environment") and this is probably a good feature to start gaining adoption.
    • Getting GuixOps working and solidly demonstrable
  • The talk was too long. I think everything I said was useful, but I literally filled two talk slots. There are some obvious things that can be cut or reduced from the talk.
  • In a certain sense, this is also because the talk was not one, but multiple talks. Each of these items could be cut to a brief slide or two and then expanded into its own talk:

    • An intro to functional programming. I'm glad to see this intro was very clear; though already concise, it could be reduced within the scope of this talk to two quick slides rather than four with code examples.
    • An "Intro to Guile"
    • Lisp history, community, and its need for outreach and diversity
    • "Getting over your parenthesis phobia"
  • I simply unfolded an orgmode tree while presenting the talk, and while this made things easy on me, it's not very interesting for most audience members (though my friend Aeva clearly enjoyed it)

Additionally, upon hearing my talk, my friend Karl Fogel seemed quite convinced about Guix's direction (and there are few people whose analysis I'd rate higher). He observed that Guix's fundamentals seem solid, but that what it probably needs at this point is institutional adoption to bring it to the next level, and he's probably right. He also pointed out that it's not too much to ask for an organization to invest in Guix at this point, considering that developers are using far less stable software than Guix to do deployments. He suggested I try to give this talk at various companies, which could be interesting... well, maybe you'll hear more about this in the future. Maybe as a trial run I should submit some podcast episodes to Hacker Public Radio or something!

Anyway, starting next week I'm putting my words to action and working on doing actual deployments using Guix. Now that'll be interesting to write about! So stay tuned, I guess!

PS: You can download the orgmode file of the talk or peruse the html rendered version or even better check out my talks repo!

Userops Acid Test v0.1

By Christine Lemmer-Webber on Sun 27 September 2015

Hello all!

So a while ago we started talking about this userops thing. Basically, the idea is "deployment for the people", focusing on user computing/networking freedom (in contrast to "devops"; benefits to large institutions are sure to come as a side effect, but are not the primary focus). There's kind of a loose community surrounding the term now, and a number of us are working towards solutions. But I think something that has been missing, for me at least, is something to test against. Everyone in the group wants to make deployment easier. But what does that mean?

This is an attempt to sketch out requirements. Keep in mind that I'm writing out this draft on my own, so it might be that I'm missing some things. And of course, some of this can be interpreted in multiple ways. But it seems to me that if we want to make running servers something for mere mortals to do for themselves, their friends, and their families, these are some of the things that are needed:

  • Free as in Freedom:

I think this one's a given. If your solution isn't free and open source software, there's no way it can deliver proper network freedoms. I feel like this goes without saying, but it's not considered a requirement in the "devops" world... the focus is different there. We're aiming to liberate users, so your software solution should of course itself start with a foundation of freedom.

  • Reproducible:

    It's important to users that they be able to have the same system produced over and over again. This is important for experimenting with a setup before deployment, and for ensuring that issues are reproducible so that friends and communities can help each other debug problems when they run into them. It's also important for security; you should be able to be sure that the system you have described is the system you are really running, and that if someone has compromised your system, you are able to rebuild it. And you shouldn't be relying on someone to build a binary version of your system for you, unless there's a way to rebuild that binary version yourself and to verify that it corresponds to the system's description and source. (Use the source, Luke!)

    Nonetheless, I've noticed that when people talk about reproducibility, they sometimes are talking about two distinct but highly related things.

    1. Reproducible packages:

    The ability to compile from source any given package in a distribution, with clear methods and procedures to do so. While this has been a given in the free software world for a long time, there's been a trend in the devops-type world towards a determination that packaging and deployment in modern languages has gotten too complex, so people simply rely on binary deployment. For the reasons described above and more, you should be able to rebuild your packages... *and* all of your packages' dependencies... and their dependencies too. If you can't do this, it's not really reproducible.

    An even better goal is to guarantee not only that packages can be built, but that they are byte-for-byte identical to each other when built upon all their previous dependencies on the same architecture. The Debian Reproducibility Project is a clear example of this principle in action.

    2. Reproducible systems:

    Take the package reproducibility description above, and apply it to a whole system. It should be possible to, one way or another, either describe or keep record of (or even better, both) the system that is to be built, and rebuild it again. Given selected packages, configuration files, and anything else that is not "user data" (which is addressed in the next section), it should be possible to have the very same system that existed previously.

    As with many things on this list, this is somewhat of a gradient. But one extrapolation, if taken far enough, I believe is a useful one (and ties in with the "recoverable system" part): systems should not necessarily be dependent upon the date and time they are deployed. That is to say, if I deployed a system yesterday, I should be able to redeploy that same system today on an entirely new machine using all the packages that were installed yesterday, even if my distribution now has newer, updated packages. It should be possible to reproduce a system at any state, no matter what fixed point in time we were originally referring to.

  • Recoverable:

    Few things are more stressful than having a computer that works, is doing something important for you, and then something happens... and suddenly it doesn't, and you can't get back to the point where your computer was working anymore. Maybe you even lost important data!

    If something goes wrong, it should be possible to set things right again. A good userops system should do this. There are two domains to which this applies:

    1. Recoverable data:

    In other words, backups. Anything that's special, mutable data that the user wants to keep fits in this territory. As much as possible, a userops system should seek to make running backups easy. Identifying, based on system configuration, which files to copy and providing this information to a backup system, or simply leaving all mutable user data in an easy-to-back-up location, would save users from having to determine what to back up on their own, which can easily be overwhelming and error-prone for an individual.

    Some data (such as data in many SQL databases) is a bit more complex than just copying over files. For something like this, it would be best if the system could help serialize that data into a form more appropriate for backup.

    2. Recoverable system:

    Linking somewhat to the "reproducible system" concept, a user should be able to upgrade without fear of becoming stuck. Upgrade paralysis is something I know I and many others have experienced. Sometimes it even appears that an upgrade will go totally fine, and you may have tested carefully to make sure it will, but you do an upgrade, and suddenly things are broken. The state of the system has moved to a point where you can't get back! This is a problem.

    If a user experiences a problem in upgrading their system software and configuration, they should have a good means of rolling back. I believe this would remove much of the anxiety from server administration, especially for smaller scale deployments... I know it would for me.

  • Friendly GUI:

    It should be possible to install the system via a friendly GUI. This probably should be optional; there may be lower level interfaces to the deployment system that some users would prefer to use. But many things probably can be done from a GUI, and thus should be able to be.

    Many aspects of configuring a system require filling in data shared between components; a system should generally follow a Don't Repeat Yourself philosophy. A web application may require the configuration details of a mail transfer agent, and the web application may also need to provide its own details to a web server such as Nginx or Apache. Users should only have to fill in each of these details in one place, and the system should propagate the configuration to the other components.

  • Scriptable:

    Not everyone should have to work with this layer directly, but everyone benefits from scriptability. Having your system be scriptable means that users can properly build interfaces on top of your system, and additional components that extend it in directions you may not be able to take it yourself. For example, you might not have to build a web interface yourself; if your system exposes its internals in a language capable of building web applications, someone else can do that for you. Similarly with provisioning, etc.

    Building on the previous section, bonus points if the GUI can "guide users" into learning how to work with the lower level components; the Blender UI is a good example of this. Most of its users are artists who are not programmers, but hovering over user interface elements exposes their Python equivalents, and so many artists who did not start out as developers become so by extending the program for their needs bit by bit. (Emacs has similar behavior, but is already built for developers, so is not as good an example.) "Self extensibility" is another way to look at this.

  • Collaboration friendly:

    Though many individuals will be deploying on their own, many server deployments are set up to serve a community. It should be possible for users to help each other collaborate on deployment. This may mean a variety of things, from being able to collaborate on configuration, to having an easy means to reproduce a system locally.

    Additionally, many deployments share steps. Users should be able to help each other out and share "recipes" of deployment steps. The most minimalist (and least useful) version of this is something akin to snippet sharing on a wiki. Such wikis most likely already exist, so more interestingly, it should be possible to share deployment strategies via code that is proceduralized in some form. In an ideal form, deployment recipes would be made available similarly to how packages are in present distributions, with the appropriate slots left open for customization of a particular deployment.

  • Fleet manageable:

    Many users have not one but many computers to take care of these days. Keeping so many systems up to date can be very hard; being able to do so for many systems at once (especially if your system allows them to share configuration components) can help a user actually keep on top of things and lead to fewer neglected systems.

    There may be different sets, or "fleets", of computers to take care of... you may find that a user discovers that she needs to both take care of a set of computers for her (and maybe her loved ones') personal use, but she also has servers to take care of for a hobby project, and another set of servers for work.

    Not all users require this, and perhaps this can be provided on another layer via some other scripting. But enough users are in "maintenance overload" from keeping track of too many computers that this should probably be provided.

  • Secure:

    One of the most important and yet open ended requirements, proper security is critical. Security decisions usually involve tradeoffs, so which security decisions are made is left somewhat open ended, but there should be a focus on security within your system. Most importantly, good security hygiene should be made easy for your users, ideally as easy as or easier than not following good hygiene.

    Particular areas of interest include: encrypted communication, preferring or enforcing key based authentication over passwords, isolating and sandboxing applications.
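To make a couple of the points above a bit more concrete (the Don't Repeat Yourself configuration idea and shareable recipes with open slots), here's a toy sketch in Python. Every name, step, and format here is invented for illustration; it's a sketch of the shape of the thing, not any real deployment tool:

```python
# Toy sketch of two userops ideas:
# 1. DRY configuration: one site description feeds several components.
# 2. Shareable recipes: a deployment plan is a procedure with open
#    slots that each deployer fills in for their own site.

site = {
    "domain": "example.org",
    "app_port": 8000,
    "admin_email": "admin@example.org",
}

def nginx_snippet(site):
    # The web server's config is derived from the shared site
    # description, never typed in a second time by the user.
    return (
        "server {\n"
        f"    server_name {site['domain']};\n"
        f"    location / {{ proxy_pass http://127.0.0.1:{site['app_port']}; }}\n"
        "}\n"
    )

def webapp_recipe(site, use_https=True):
    # A shareable "recipe": a list of declarative steps some
    # deployment tool could execute, parameterized by `site`.
    steps = [
        ("install-packages", ["webapp", "nginx", "postgresql"]),
        ("write-config", {"nginx": nginx_snippet(site),
                          "admin_email": site["admin_email"]}),
    ]
    if use_https:
        steps.append(("provision-certificate", site["domain"]))
    return steps

for step in webapp_recipe(site):
    print(step)
```

The point is that the user states each fact once, and recipes published by others leave the site-specific slots open rather than baking in one deployment's details.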

To my knowledge, at this time no system provides all the features above in a way that is usable for many everyday users. (I've also left some ambiguity in how to achieve these properties, so in a sense this is not a pass/fail test, but rather a set of properties to measure a system against.) In an ideal future, more userops type systems will provide the above properties, and ideally most users won't have to think too much about their benefits. (Though it's always great to give the opportunity to users who are interested in thinking about these things!) In the meanwhile, I hope this document provides a useful basis for implementing such systems, and for measuring implementations against!