Why is it hard to move from one machine to another? An analysis. [x-post from Userops] -- Dustycloud Brainstorms

Why is it hard to move from one machine to another? An analysis. [x-post from Userops]

By Christine Lemmer-Webber on Wed 08 April 2015

NOTE: This is a shameless cross-post of something I originally sent to the userops list, where we discuss deployment things.

Hello all,

For a while I've been considering, why is it so harder for me to migrate from server to server than it is for me to migrate from desktop to desktop? For years, ever since I discovered rsync, migrating between machines has not been hard. I simply rsync my home directory over to the new machine (or maybe even just keep the old /home/ directory's partition where it is!) and bam, I am done. Backing this up is easy; it's just another rsync away. (I use dirvish as a simple wrapper around rsync so it can manage incremental backups.)

If I set up a new machine, it is no worry. Even if my current machine dies, it is mostly no worry. Rsync back my home directory, and done. I will spend a week or so discovering that certain programs I rely on are not there, and I'll install them one by one. In a way it's refreshing: I can install the programs I need, and the old cruft is gone!

This is not true for servers. At the back of my mind I realized this, but until the end of Stefano Zacchiroli's excellent LibrePlanet talk when I posed a question surrounding this situation, I hadn't totally congealed in my head: why is it so much harder for me to move from server to server? Assume I even have the old server around and I want to move. It isn't easy!

So here are some thoughts that come out of this:

For my user on my workstation, configuration and data are in the same place: /home/ (including lots of little dotfiles for the configs, and the rest is mostly data). Sure, there's some configuration stuff in /etc/ and data in /var/ but it mostly doesn't really matter, and copying that between machines is not hard.
Similarly, for my user on workstation experience, it is very little stress if I set up a machine and am missing some common packages. I can just install them again as I find them missing.
Neither of these are true for my server! In addition to caring about /home/, and even more importantly, I have to worry about configuration in /etc/ and data in /var/. These are both pains in the butt for me for different reasons.
Lots of stuff in /etc/ is configuration that interacts with the rest of the system in specific ways. I could rsync it to a new machine, but I feel like that's just blindly copying over stuff where I really need to know how it works and how it was set up with the rest of the machine in the first place.
This is compounded by the fact that people rarely set up one machine these days; usually they have to set up several machines for several users. Remembering how all that stuff worked is hard. The only solution seems to be to have some sort of reproducible configuration system. Hence the rise of salt, ansible, etc. But these aren't really "userops" systems, they're "devops"... developer focused. Not only do you need to know how they work, you need to know how the rest of the system works. And it's not easy to share that knowledge.
/var/ is another matter. Theoretically, most of my program data goes there (unless, of course, it went to /srv/, god help us). But I can't just rsync it! There are some processes that are very persnickety about the stuff there. I have to dump my databases and etc before I can move them or back up. Nothing sets up an automatic cronjob for me on these, I have to know to dump postgres. Hopefully I set up a cronjob!
While I as a workstation user don't stress too much if I'm missing some packages (just install them as I go), that is NOT true of my servers. If my mail servers aren't running, if jabber isn't on, (if SSH isn't running!!!), there are other servers expecting to communicate with my machine, and if I don't set them up, I miss out.
Not only this, assuming I have moved between servers correctly, even once I have set up my machine and it has become a perfectly okay running special snowflake, there are certain routine tasks that require a lot of manual intervention, and I have only picked up the right steps by knowing the right friends, having run across the right tutorials which hopefully have shown me the right setup, etc. SSL configuration, I'm looking at you; the only savior that I have is that I have written myself my own little orgmode notes on what to do the next time my certs expire.
My servers do become special snowflakes, and that is very stressful to me. I will, in the future, need to set up one more server, and remembering what I did in the past will be very hard.
Assuming I use all the mainstream tools, not talking about "upcoming" ones, a better configuration management solution is probably the answer, right? That's a lot to ask users though: it's not a solution to existing deployment, because it doesn't remove the need to know about all the layers underneath, it just adds a new layer to understand.

Those are all headaches, and they are not the only headaches. But here are some thoughts on things that can help:

If I recognize which parts of my system are "immutable" and which parts are "mutable", it's easier to frame how my system works.
- /var/ is mutable, it's data. There's no making this "reproducible" really: it needs to be backed up and moved around.
- My packages and system are immutable, or mostly should be. Even if not using a perfectly immutable system like guix/nix, it's helpful to act like this part of the system is pseudo-immutable, and simply derived from some listing of things I said I wanted installed. (Nix/Guix probably do this the most nicely though.)
- /etc/ is similarly "immutable but derived" in the best case. I should be able to give the same system configuration inputs and always get the same system of packages and configuration files.
I like Guix/Nix, but my usage of Debian and Fedora and friends is not going away anytime soon. Nonetheless, configuration management systems like puppet/ansible/salt help give the illusion of an immutable system derived from a set of inputs, even though they are working within a mutable one.
Language packaging for deployment needs to die. Yes, I say this as a project that advocates that very route. We're doing it wrong, and I want to change it. (Language packaging for development though: that's great!)
Asking people to use systems like ansible/salt/puppet is asking users too much. You're just asking them to learn one more layer on top of knowing how the whole system works. Sharing common code is mostly copy and paste. There are some layers built on top of here to mitigate this but afaict they aren't really good, not good enough. (I am working on something to solve this...)
Pre-built containers are not the solution. Sorry container people! Containers can be really useful but only if they are built in some reproducible way. But very few people using Docker and etc seem to be doing this. But here's another thing: Docker and friends contain their own deployment domain specific languages, which is dumb. If a reproducible configuration system is good enough, it should be good enough for a VM or a container or a vanilla server or a desktop. So maybe we can use containers as lightweight and even sandboxed VMs, but we shouldn't be installing prebuilt containers on our servers alone as a system.
Otherwise else you're running 80 heavy and expensive Docker images that slowly go out of date... now you're not maintaining 1 distribution install, you're maintaining 81 of them. Yikes! Good luck with the next Shellshock!
Before Asheesh jumps in here: yes I will say that Sandstorm is taking maybe the best route as in terms of a system that uses containers heavily (and unlike Docker, they seem actually sandboxed) in that it seems to have a separation between mutable parts and immutable parts: the container is more or less an immutable machine from what I can tell that has /var/ mounted into it, which is a pretty good route.
In this sense I think Sandstorm has a good picture of things. There are other things that I am still very unsure about, and Asheesh knows because I have expressed them to him (I sure hope that iframe thing goes away, and that daemons like Celery can run, and etc!) but at least in this sense, Sandstorm's container story is more sane.

So there are some reflections in case you are planning on debugging why these things are hard.

-- Chris

PS: If you haven't gotten the sense, the direction I'm thinking of is more along the lines of Guix becoming our Glorious Future (TM) assuming something like GuixOps can happen (go Dave Thompson, go Guix crew!) and a web UI can be built on top of it with some sort of common recipe system.

But I don't think our imperative systems like Debian are going away anytime soon; I certainly don't intend to move all my stuff over to Guix at this time. For that reason, I think there needs to be another program to fit the middle ground: something like salt/ansible/puppet, but with less insane one-off domain specific languages, with a sharable recipe system, and scalable both from developer-oriented scripts but also having a user-friendly web interface. I've begun working on this tool, and it's called Opstimal. Expect to hear more about it soon.