mkdir -p $(dirname $0)/data; cd data
print_messages() {
clear
cat $(ls -tr | tail -n30)
printf "\033[31m$USER:\033[0m "
}
export -f print_messages
watchexec 2> /dev/null -- bash -c "print_messages" &
while read text; do
printf "\033[31m$USER:\033[0m $text\n\n" > "$(uuidgen)"
donePut that script inside a folder, share the folder with someone via Syncthing or Dropbox or whatever, run it, and you should get this:
This is hardly a Discord killer, but as far as messengers go there are some interesting properties:
- There is no central authority or server that “owns” the messages
- An offline machine can write new messages that will propagate once it’s back online
- All participating machines will show the same messages in the same order once they’re synced, no matter what
There’s nothing really novel about those three things; that’s what you get out of the box with Conflict Free Replicated Data Types (CRDTs). So my goal with this blog post is to plant the seed in your mind that CRDTs are just generally cool, and they are very simple.
And even though this little messenger is kinda toy-ish, it’s completely solid and I use it to communicate with a (equally nerdy) friend of mine. I’ve used the same technique to create a time tracker that I can use on different machines without every worrying about being online or things getting out of sync. We’re obviously relying on a file sync program to do some heavy lifting here, but because the data is “conflict free”, something as simple as rsync or scp would work (and always work) just fine.
The Bash Script
There’s not much to it, so I’ll run through it quick.
mkdir -p $(dirname $0)/data; cd dataCreate a data directory (if needed) and move into it.
print_messages() {
clear
cat $(ls -tr | tail -n30)
printf "\033[31m$USER:\033[0m "
}Clear the screen, print the contents of the last 30 messages, and
then print the mike: prompt. The gross looking
\033[31m stuff is just ANSI escape codes to set and unset
the color.
export -f print_messagesSome Bash nonsense to “export a function”. Otherwise the watchexec subprocess can’t see it.
watchexec 2> /dev/null -- bash -c "print_messages" &Start up watchexec. Send it’s stderr output into
/dev/null so it doesn’t bother us. Whenever it sees file changes,
reprint the messages. The & symbol makes it run in the background so
our script can do other things.
BTW, I used watchexec to watch for file changes because it works on
Termux, which lets me use this on Android. If you want to use
fswatch (which seems more common) instead, replace that
line with this:
print_messages # fswatch doesn't fire on startup, so print messages first
fswatch -o . | while read -r event; do
print_messages
done &while read text; do
printf "\033[31m$USER:\033[0m $text\n\n" > "$(uuidgen)"
doneRead user input. When they hit Enter, put whatever they
wrote into a file. Critically, use a Universally Unique
Identifier for that file.
So basically, stuff messages into files that all have UUIDs. If you
look inside data you’ll see this:
$ ls data
035aa216-3e23-4921-8d14-b79bdc150232 5d07ed32-8f0c-4c88-9a93-f12606d57ea1
04455d8b-b58a-40da-a01a-7631e90ccbd8 6187df26-8a6e-4729-9553-ffe1acf0d45f
1d621700-322e-4ba9-9d66-16a739838adf 650b8c73-9ef5-40e5-8014-a8d97d617f1fWhy This Works
Using UUIDs means that any machine can create files without having to worry about another machine creating an identically-named file and causing a conflict, which would break our whole system.
We can’t delete files, because if a machine deletes a file and then tries to sync, it won’t be able to tell if it deleted something or if it’s just talking to a machine that has a new file (we actually could get away with this because Syncthing keeps a local database to log file deletions, but that’s cheating, and simpler tools like scp definitely don’t. Plus there’s better ways anyway).
Lastly, after two machines exchange files, it’s critical that they
both can display messages in the same order. Using ls -tr
to order the files actually works perfectly, because -tr
(order by time, reversed) uses the file modification date, and that gets
preserved when copying the files. It’s technically possible to create
files with the same modification date on two different machines and
therefore have an arbitrary ordering, but at least on Linux with most
modern filesystems you get billionth-of-a-second granularity, which is
more than fine. On a filesystem like FAT32 with 2 second granularity
this would very much be a problem.
So, those 3 properties mean that we have created a CRDT. CRDTs are just data structures that:
- Can be replicated across an arbitrary number of nodes
- Can be modified concurrently
- Will always converge to the same thing after nodes sync with each other
Specifically, we’ve created a grow-only set. If we ignored the
contents of each file we could still count them with something
like ls -1 | wc -l, and that would be an even simpler CRDT
called a grow-only counter.
That’s what I used in the timer-tracker thing I mentioned earlier.
Just add a file with a UUID into a directory called
25_minute_pomodoros, and now you have a distributed,
conflict-free pomodoro counter.
Edits and Deletions
So an obvious problem is that you can’t edit or delete a message. And, in fact, it’s fundamental to the design that once you create a new file, you absolutely do not mess with it.
To get around that, you just create more files. So in the
Pomodoro example, there’s a folder called
25_minute_pomodoros_deletions. If I decide that I want to
decrement my Pomodoro counter, just
touch 25_minute_pomodoros_deletions/$(uuidgen). Then
subtract the number of files in
25_minute_pomodoros_deletions from
25_minute_pomodoros. This is called a positive-negative
counter.
For messages, rather than just putting the plain text contents in each file, we could do more structured data like:
message:Hey it's mike what's up?
or
delete:2880dbc8-a2c6-43c0-8f88-e0fb2672755c
or
edit:2880dbc8-a2c6-43c0-8f88-e0fb2672755c:Hey, it's Mike what's up???
We’d then have to actually inspect the contents of each messsage and decide if it should be displayed or if it affects a previous message (so we’re well beyond 12 lines of bash at this point) but it doesn’t change anything about the properties of the system. Any machine can make those changes freely, and messages will always be rendered the same way.
The Takeaway
The important concept here is that data is stored in one of these very simple CRDT models, and you can use that basic model to deterministically “render” whatever data you want.
Flat files and uuidgen is enough to implement the data
structure (not saying you should, but cleary you can). The sync
part is what’s mind blowing. You can sync arbitrarily complex data
between an arbitrary number of devices without knowing anything
about it. rsync or scp could easily handle this job.
If we were doing the same thing in a more sane way (like, say, storing these messages in a local sqlite database), you can still pump messages between machines without any care for what’s in them or what they mean.
Even if you want a dedicated server, the server does not need to know
how to render them, so the entirely of the server logic can be: Hey,
let’s compare messages. Please give me the ones that I’m missing. Here’s
the ones that you’re missing. And you have one endpoint:
/sync.
I’ve been building things with CRDTs for a while now and have developed a real love for what they let you do. I’d love to talk more about them soon, but for now, I hope that’s at least a fun introduction for anyone who isn’t familiar yet. I really think they’re being slept on and I hope more people start using them.
A Little Note: If you actually want to play with this and you’re using Syncthing, messages are kinda slow by default. There’s a setting in ~/.config/syncthing/config.xml called fsWatcherDelayS. Set it to “1” for the folder you’re keeping messages in and it will be much faster. If you’re using Google Drive or Dropbox or whatever, you’re on your own.