&yet


Backbone.js and Capsule and Thoonk, oh my! A scalable realtime architecture

This last year, we’ve learned a lot about building scalable realtime web apps, most of which has come from shipping &bang.

&bang is the app we use to keep our team in sync. It helps us stay on the same page, bug each other less and just get stuff done as a team.

The process of actually trying to get something out the door on a bootstrapped budget helped us focus on the most important problems that needed to be solved to build a dynamic, interactive, real-time app in a scalable way.

A bit of history

I’ve written a couple of posts on Backbone.js since discovering it. The first one introduces Backbone.js as a lightweight client-side framework for building clean, stateful client apps. In the second post I introduced Capsule.js, a tool I built on top of Backbone that adds nested models and collections and also lets you keep a mirror of your client-side state on a node.js server to seamlessly synchronize state between different clients.

That approach was great for quickly prototyping an app. But as I pointed out in that post, it keeps a lot of in-memory state on the server and simply doesn’t scale very well.

At the end of that post I hinted at what we were aiming to do to ultimately solve that problem. So this post is meant to be a bit of an update on those thoughts.

Our new approach

Redis is totally freakin’ amazing. Period. I can’t say enough good things about it. Salvatore Sanfilippo is a god among men, in my book.

Redis can scale.

Redis can do PubSub.

PubSub just means events. Just like you can listen for click events in JavaScript in a browser, you can listen for events in Redis.

Redis, however, is a generic tool. It’s purposely fairly low-level so as to be broadly applicable.

What makes Redis so interesting, from my perspective, is that you can treat it as a shared memory between processes, languages and platforms. What that means, in a practical sense, is that as long as each app that uses it interacts with it according to a pre-defined set of rules, you can write a whole ecosystem of functionality for an app in whatever language makes the most sense for that particular task.

Enter Thoonk

My co-worker, Nathan Fritz, is the closest thing you can get to being a veteran of realtime technologies.

He’s a member of the XSF council for the XMPP standard and probably wrote his first chat bot before you knew what chat was. His SleekXMPP Python library is iconic in the XMPP community. He has a self-declared unnatural love for XEP-0060, which describes the XMPP PubSub standard.

He took everything he learned from his work on that standard and built Thoonk. (In fact, he actually kept the PubSub spec open as he built the JavaScript and Python implementations of Thoonk.)

What is Thoonk??

Thoonk is an abstraction on Redis that provides higher-level datatypes for a more approachable interface. Essentially, staring at Redis as a newbie is a bit intimidating. Not that it’s hard to interface with, it’s just kind of tricky to figure out how to logically structure and retrieve your data. Thoonk simplifies that into a few data types that describe common use cases, primarily “feeds”, “sorted feeds”, “queues” and “jobs”.

You can think of a feed as an ad-hoc database table. They’re “cheap” to create and you simply declare them to make them or use them. For example, in &bang, we have all our users in a feed called “users” for looking up user info. But also, each user has a variety of individual feeds. For example, they have a “task” feed and a “shipped” feed. This is where it veers from what people are used to in a relational database model, because each user’s tasks are not a part of a global “tasks” feed. Instead, each user has a distinct feed of tasks because that’s the entity we want to be able to subscribe to.

So rather than simply breaking down a model into types of data, we end up breaking things into groups of items (a.k.a. “feeds”) that we want to be able to track changes to. So, as an example, we may have something like this:

// our main user feed
var userFeed = thoonk.feed('users');

// an individual task feed for a user
var userTaskFeed = thoonk.sortedFeed('team.andyet.members.{{memberID}}.tasks');
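
As a rough sketch of that naming idea, a small helper could build the per-member feed names. The helper name, the sample ids and the "shipped" path are assumptions for illustration that follow the pattern of the feed name above; none of this is part of Thoonk itself.

```javascript
// Hypothetical helper (not part of Thoonk): builds per-member feed names
// following the 'team.<team>.members.<memberId>.<kind>' pattern used above.
function memberFeedName(teamSlug, memberId, kind) {
    return ['team', teamSlug, 'members', memberId, kind].join('.');
}

var taskFeedName = memberFeedName('andyet', '42', 'tasks');
var shippedFeedName = memberFeedName('andyet', '42', 'shipped');

console.log(taskFeedName);    // team.andyet.members.42.tasks
console.log(shippedFeedName); // team.andyet.members.42.shipped
```

Because every member gets their own feed names, subscribing to one member's tasks never means filtering a global "tasks" feed.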

Marrying Thoonk and Capsule

Capsule was actually written with Thoonk in mind. In fact, that’s why they were named the way they were: You know these lovely pneumatic tube systems they use to send cash to bank tellers and at Costco? (PPSHHHHHHHTHOONK! And here’s your capsule.)

Anyway, the integration didn’t end up being quite as tight as we had originally thought but it still works quite well. Loose coupling is better anyway right?

The core problem I was trying to solve with Capsule was unifying the models that are used to represent the state of the app in the browser and the models you use to describe your data on the server—ideally, not just unifying the data structure, but also letting me share behavior of those objects.

Let me explain.

As I mentioned, we recently shipped &bang. It lets a group of people share their task lists and what they’re actively working on with each other.

It spares you from a lot of “what are you working on?” conversations and increases accountability by making your work quite public to the team.

It’s a realtime, keyboard-driven, web app that is designed to feel like a desktop app. &bang is a node.js application built entirely with the methods described here.

So, in &bang, a team model has attributes as well as a couple of nested backbone collections such as members and chat messages. Each member has attributes and other nested collections, tasks, shipped items, etc.

Initial state push

When a user first logs in we have to send the entire model state for the team(s) they’re on so we can build out the interface (see my previous post for more on that). So, the first thing we do when a user logs in is subscribe them to the relevant Thoonk feeds and perform the initial state transfer to the client.

To do this, we init an empty team model on the client (a backbone/capsule model shared between client/server). Then we recurse through our Thoonk feed structures on the server to export the data from the relevant feeds into a data structure that Capsule can use to import that data. The team model is inflated with the data from the server and we draw the interface.

From there, the application is kept in sync using events from Thoonk that get sent over websockets and applied to the client interface. Events like “publish”, “change”, “retract” and “position”.
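
To make that concrete, here is a minimal sketch of applying those four event types to a client-side list. The event names come from the text above; the payload shape (`type`, `id`, `data`, `before`) and the plain array standing in for a Capsule collection are assumptions, not the actual &bang wire format.

```javascript
// Find the index of an item by id, or -1 if it is not in the list.
function findIndex(items, id) {
    for (var i = 0; i < items.length; i++) {
        if (items[i].id === id) return i;
    }
    return -1;
}

// Apply one feed event to a plain array that mirrors a sorted feed.
function applyFeedEvent(items, event) {
    var index = findIndex(items, event.id);
    switch (event.type) {
    case 'publish':
        if (index === -1) items.push({id: event.id, data: event.data});
        break;
    case 'change':
        if (index !== -1) items[index].data = event.data;
        break;
    case 'retract':
        if (index !== -1) items.splice(index, 1);
        break;
    case 'position':
        // Move the item directly before `event.before`,
        // or to the end when `before` is missing or unknown.
        if (index !== -1) {
            var moved = items.splice(index, 1)[0];
            var target = event.before ? findIndex(items, event.before) : -1;
            if (target === -1) items.push(moved);
            else items.splice(target, 0, moved);
        }
        break;
    }
    return items;
}

var tasks = [];
applyFeedEvent(tasks, {type: 'publish', id: 'a', data: 'write post'});
applyFeedEvent(tasks, {type: 'publish', id: 'b', data: 'ship it'});
applyFeedEvent(tasks, {type: 'change', id: 'a', data: 'write better post'});
applyFeedEvent(tasks, {type: 'position', id: 'b', before: 'a'});
applyFeedEvent(tasks, {type: 'retract', id: 'a'});
// tasks now holds only item 'b'
```

The client never decides what the list contains; it only replays what the server tells it.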

Once we got the app to the point where this was all working, it was kind of a magical moment, because at this point, any edits that happen in Thoonk simply get pushed out through the event propagation all the way to the client. Essentially, the interface that a user sees is largely a slave to the server, except, of course, for the portions of state that we let the user manipulate locally.

At this point, user interactions with the app that change data are all handled through RPC calls. Let’s jump back to the server and you’ll see what I mean.

I thought you were still using Capsule on the server?

We do, but differently. Here’s how that is handled.

In short… it’s a job system.

Sounds intimidating right? As someone who started in business school, then gradually got into front-end dev, then back-end dev, then a pile of JS, job systems sounded scary. In my mind they’re for “hardcore” programmers like Fritzy or Nate or Lance from our team. Job systems don’t have to be that scary.

At a very high level you can think of a “job” as a function call. The key difference being, you don’t necessarily expect an immediate result. To continue with examples from &bang: a job may be to “ship a task”. So, what do we need to know to complete that action? We need the following:

  • the member id of the user shipping the task
  • the id of the task being completed (we call this “shipping”, because it’s cooler, and it’s a reminder that finishing is what’s important)

We can derive everything else we need from those key pieces of information.

So, rather than call a function somewhere:

shipTask(memberId, taskId)

We can just describe a job as a simple JSON object:

{
    userId: <user requesting the job>,
    taskId: <id of task to 'ship'>,
    memberId: <id of team member>
}

Then we can add that to our “shipTask” job queue like so:

thoonk.job('shipTask').put(JSON.stringify(jobObject));
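
A sketch of how the RPC handler might assemble that job object before queueing it. The function name and the shapes of `session` and `params` are assumptions for illustration; the point is only that the user id is taken from the server-side session rather than from anything the client sent.

```javascript
// Build a "shipTask" job from an RPC call. The userId always comes from
// the server-side session, never from the client-supplied params.
// `session` and `params` are hypothetical shapes for this sketch.
function buildShipTaskJob(session, params) {
    return JSON.stringify({
        userId: session.userId,
        taskId: String(params.taskId),
        memberId: String(params.memberId)
    });
}

var job = buildShipTaskJob(
    {userId: 'u1'},
    {taskId: 't9', memberId: 'm1', userId: 'someone-else'} // spoofed userId is ignored
);

// With a connected Thoonk instance, this string would then be queued
// exactly as shown above: thoonk.job('shipTask').put(job);
```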

The cool part about the event propagation I talked about above is we really don’t care so much when that job gets done. Obviously fast is key, but what I mean is, we don’t have to sit around and wait for a synchronous result because the event propagation we’ve set up will handle all the application state changes.

So, now we can write a worker that listens for jobs from that job queue. In that worker we’ll perform all the necessary related logic. Specifically stuff like:

  • Validating that the job is properly formatted (contains required fields of the right type)
  • Validating that the user is the owner of that task and is therefore allowed to “ship” it.
  • Modifying Thoonk feeds accordingly.

Encapsulating and reusing model logic

You’ll notice that part of that list requires some logic. Specifically, checking to see if the user requesting the action is allowed to perform it. We could certainly write that logic right here, in this worker. But, in the client we’re also going to want to know if a user is allowed to ship a given task, right? Why write that logic twice?

Instead we write that logic as a method of a Capsule model that describes a task. Then, we can use the same method to determine whether to show the UI that lets the user perform the action in the browser as we use on the back end to actually perform the validation. We do that by re-inflating a Capsule model for that task in our worker code and calling the canEdit() method on it, passing it the id of the user requesting the action. The only difference is that on the server side we don’t trust the user to tell us who they are. On the server we roll the user id we have for that session into the job when it’s created rather than trust the client.
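
A minimal sketch of that shared method, using a plain constructor as a stand-in for a Capsule model. The `owner` attribute and the ownership rule are assumptions; only the `canEdit()` name comes from the text.

```javascript
// Plain-object stand-in for a Capsule task model. In the real app this
// would be a Capsule/Backbone model shared between browser and worker.
function Task(attrs) {
    this.attributes = attrs;
}

// Assumed rule for this sketch: only the task's owner may edit or ship it.
Task.prototype.canEdit = function (userId) {
    return this.attributes.owner === userId;
};

var task = new Task({id: 't9', owner: 'u1', title: 'write post'});

// Browser: decide whether to show the "ship" control.
var showShipButton = task.canEdit('u1');

// Worker: decide whether to accept a job, using the session's user id.
var jobAllowed = task.canEdit('u2');

console.log(showShipButton, jobAllowed);
```

The same file can be loaded in both places, so the rule only has to change once.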

Security

One other hugely important thing that we get by using Capsule models on the server is some security features. There are some model attributes that are read-only as far as the client is concerned. What if we get a job that tries to edit a user’s ID? In a backbone model if I call:

backboneModelInstance.set({id: 'newId'});

That will change the ID of the object. Clearly that’s not good in a server environment when you’re trusting that to be a unique ID. There are also lots of other fields you may want on the client but you don’t want to let users edit.

Again, we can encapsulate that logic in our Capsule models. Capsule models have a safeSet method that assumes all inputs are evil. Unless an attribute is whitelisted as clientEditable it won’t set it. So when we go to set attributes within the worker on the server we use safeSet when dealing with untrusted input.
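
The whitelist idea can be sketched in a few lines. This is not Capsule's actual implementation; the constructor, the `attributes` object and the array form of `clientEditable` are assumptions made to keep the example self-contained.

```javascript
// Stand-in model for illustration only (not Capsule's real code).
function Team(attrs) {
    this.attributes = {};
    this.set(attrs || {});
}

// Assumed form of the whitelist: attribute names clients may change.
Team.prototype.clientEditable = ['name', 'description'];

// Trusted set: copies every own property.
Team.prototype.set = function (attrs) {
    for (var key in attrs) {
        if (Object.prototype.hasOwnProperty.call(attrs, key)) {
            this.attributes[key] = attrs[key];
        }
    }
    return this;
};

// Untrusted set: silently drops anything not whitelisted.
Team.prototype.safeSet = function (attrs) {
    var allowed = {};
    for (var i = 0; i < this.clientEditable.length; i++) {
        var key = this.clientEditable[i];
        if (Object.prototype.hasOwnProperty.call(attrs, key)) {
            allowed[key] = attrs[key];
        }
    }
    return this.set(allowed);
};

var team = new Team({id: 'team1', name: 'andyet'});
team.safeSet({id: 'evil', name: '&yet'});
// team.attributes.id is still 'team1'; team.attributes.name is now '&yet'
```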

The other important piece of securing a system that lets users indirectly add jobs to your job system is ensuring that the jobs you receive validate against your schema. I’m using a node implementation of JSON Schema for this. I’ve heard some complaints about that proposed standard, but it works really well for the fairly simple use case I need it for.

A typical worker may look something like this:

// editTeamJob is assumed to be a Thoonk job queue created elsewhere,
// e.g. thoonk.job('editTeam')
workers.editTeam = function () {
    var schema = {
        type: 'object',
        properties: {
            user: {
                type: 'string',
                required: true
            },
            id: {
                type: 'string',
                required: true
            },
            data: {
                type: 'object',
                required: true
            }
        }
    };
    editTeamJob.get(0, function (err, json, jobId, timeout) {
        var feed = thoonk.feed('teams'),
            result,
            team,
            newAttributes,
            inflated;

        async.waterfall([
            function (cb) {
                // validate our job
                validateSchema(json, schema, cb);
            },
            function (clean, cb) {
                // store some variables from our cleaned job
                result = clean;
                team = result.id;
                newAttributes = result.data;
                verifyOwnerTeam(team, cb);
            },
            function (teamData, cb) {
                // inflate our capsule model
                inflated = new Team(teamData);
                // untrusted input, so use safeSet
                // (if the change came from the server, use normal 'set')
                inflated.safeSet(newAttributes);
                // continue to the next step of the waterfall
                cb();
            },
            function (cb) {
                // do the edit, all we're doing is storing JSON strings w/ ids
                feed.edit(JSON.stringify(inflated.toJSON()), result.id, cb);
            }
        ], function (err) {
            var code;
            if (!err) {
                code = 200;
                logger.info('edited team', {team: team, attrs: newAttributes});
            } else if (err === 'notAllowed') {
                code = 403;
                logger.warn('not allowed to edit');
            } else {
                code = 500;
                logger.error('error editing team', {err: err, job: json});
            }
            // finish the job
            editTeamJob.finish(jobId, null, JSON.stringify({code: code}));
            // keep the loop crankin'
            process.nextTick(workers.editTeam);
        });
    });
};
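
The worker calls a `validateSchema` helper that the post doesn’t show. As a rough idea of what it has to do, here is a hypothetical stand-in that handles only the schema shape used above (a `type` plus a per-property `required: true`). It assumes the job arrives as a JSON string and that errors are passed as plain strings; a real app would hand this off to a full JSON Schema library.

```javascript
// Check a parsed value against the small set of types this sketch supports.
function matchesType(value, type) {
    if (type === 'object') {
        return value !== null && typeof value === 'object' && !Array.isArray(value);
    }
    return typeof value === type;
}

// Hypothetical helper: parse the job string, check it against the schema,
// then call cb(err) or cb(null, cleanJob) in node callback style.
function validateSchema(json, schema, cb) {
    var job, name, rule, value;
    try {
        job = JSON.parse(json);
    } catch (e) {
        return cb('invalidJSON');
    }
    if (!matchesType(job, 'object')) return cb('invalidJob');
    for (name in schema.properties) {
        rule = schema.properties[name];
        value = job[name];
        if (value === undefined) {
            if (rule.required) return cb('missing:' + name);
            continue;
        }
        if (!matchesType(value, rule.type)) return cb('wrongType:' + name);
    }
    cb(null, job);
}

var schema = {
    type: 'object',
    properties: {
        user: {type: 'string', required: true},
        id: {type: 'string', required: true},
        data: {type: 'object', required: true}
    }
};

var results = [];
validateSchema('{"user":"u1","id":"team1","data":{"name":"&yet"}}', schema, function (err, clean) {
    results.push(err || clean.id);
});
validateSchema('{"user":"u1","id":7,"data":{}}', schema, function (err) {
    results.push(err);
});
// results: ['team1', 'wrongType:id']
```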

Sounds like a lot of work

Granted, writing a worker for each type of action a user can perform in the app, with all the related job handling and validation, is not an insignificant amount of work. However, it worked rather well for us to use the state syncing stuff in Capsule while we were still in the prototyping stage, then convert the server-side code to a Thoonk-based solution when we were ready to roll out to production.

So why does any of this matter?

It works.

What this ultimately means is that we can now push the system until Redis is our bottleneck. We can spin up as many workers as we want to crank through jobs and we can write those workers in any language we want. We can put our node app behind HAProxy or Bouncy and spin up a bunch of ‘em. Do we have all of this solved and done? No. But the core ideas and scaling paths seem fairly clear and doable.

[update: Just to add a bit more detail here, from our tests we feel confident that we can scale to tens of thousands of users on a single server and we believe we can scale horizontally after doing some intelligent sharding with multiple servers.]

Is this the “Rails of Realtime?”

Nope.

Personally, I’m not convinced there ever will be one. Even Owen Barnes (who originally set out to build just that with SocketStream) said at KRTConf: “There will not be a black box type framework for realtime.” His new approach is to build a set of interconnected modules for structuring out a realtime app based on the unique needs of its specific goals.

The kinds of web apps being built these days don’t fit into a neat little box. We’re talking to multiple web services, multiple databases, and pushing state to the client.

Mikeal Rogers gave a great talk at KRTConf about that exact problem. It’s going to be really, really hard to create a framework that solves all those problems in the same way that Rails or Django can solve 90% of the common problems with routes and MVC.

Can you support a BAJILLION users?

No, but a single Redis db can handle a fairly ridiculous number of users. At the point that actually becomes our bottleneck, (1) we can split different feeds out into different databases, and (2) we’d have a user base that would make the app wildly profitable, certainly more than enough to spend some more time on engineering. What’s more, Salvatore and the Redis team are putting a lot of work into clustering and scaling solutions for Redis that very well may outpace our need for sharding, etc.

Have you thought about X, Y, Z?

Maybe not! The point of this post is simply to share what we’ve learned so far.

You’ll notice this isn’t a “use our new framework” post. We would still need to do a lot of work to cleanly extract and document a complete realtime app solution from what we’ve done in &bang—particularly if we were trying to provide a tool that can be used to quickly spin up an app. If your goal is to find a tool like that, definitely check out what Owen and team are doing with SocketStream and what Nate and Brian are doing with Derby.

We love the web, and love the kinds of apps that can be built with modern web technologies. It’s our hope that by sharing what we’ve done, we can push things forward. If you find this post helpful, we’d love your feedback.

Technology is just a tool. Ultimately, it’s all about building cool stuff. Check out &bang and follow me @HenrikJoreteg, Adam @AdamBrault and the whole @andyet team on the twitterwebz.


If you’re building a single page app, keep in mind that &yet offers consulting, training and development services. Hit us up (henrik@andyet.net) and tell us what we can do to help.

An Introduction to Thoonk!

A persistent (and fast!) system for push feeds, queues, and jobs, leveraging Redis.

As application developers, we persist data in tables which are constantly updated, leaving most of the application’s components and user-interface in the dark until it asks for the data.

[Movie trailer voice] Imagine a world where these tables push change-events to any piece of your application stack, in diverse languages and on multiple servers.[/Movie trailer voice]

Enter Thoonk.

Clustering Node.js instances, communicating between service components in different languages and on different machines, forking off asynchronous jobs for reliability and queuing of work, communicating between APIs and views, and sending events to real-time webapps are all problems that can be solved with messaging.

Thoonk solves these problems more gracefully than simple messaging because the messages are change-events on persisted data.

Thoonk is a Redis schema for manipulating advanced, live objects (feeds, sorted-feeds, queues, job-queues, etc.). Thoonk is also a couple of implementations of this schema (currently thoonk.js for Node.js and thoonk.py for Python).

Thoonk is a lot of things, which I will describe, but really what I would like you to get out of this is what the concept is useful for.

A feed is a list of data entries that have publish, edit, retract, and other events associated with those entries. A feed brings to mind ATOM or RSS to most people, but I think feeds are more useful when the associated events are broadcast on publish-subscribe channels so that data can be synchronized. Redis contains both of the necessary components (object storage and publish-subscribe channels).

Thoonk feeds enable our “live tables” fantasy.

Let’s get specific about Thoonk feed-types.


Please refer to the Thoonk.js and Thoonk.py documentation for examples.

The basic feed is a list of items sorted by publish time. Verbs on these objects include publish, edit, and retract. Feeds may be configured to have a max-number of items, which when exceeded, drops the oldest items. Every item may have a unique assigned id, or Thoonk will generate one for you.
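
To illustrate those semantics without a Redis connection, here is an in-memory stand-in for a basic feed. It is not the Thoonk API: the constructor, method signatures and event callback shape are assumptions, and nothing here is persisted or broadcast across processes.

```javascript
// In-memory stand-in for a basic feed: items kept in publish order, with an
// optional max length that drops the oldest items. NOT the Thoonk API.
function MemoryFeed(maxLength) {
    this.maxLength = maxLength || Infinity;
    this.ids = [];       // publish order, oldest first
    this.items = {};     // id -> item
    this.listeners = [];
    this.nextId = 1;
}

MemoryFeed.prototype.subscribe = function (fn) {
    this.listeners.push(fn);
};

MemoryFeed.prototype.emit = function (type, id, item) {
    this.listeners.forEach(function (fn) { fn(type, id, item); });
};

// Publishing with an existing id is treated as an edit.
MemoryFeed.prototype.publish = function (item, id) {
    id = id || String(this.nextId++);
    var exists = Object.prototype.hasOwnProperty.call(this.items, id);
    if (!exists) this.ids.push(id);
    this.items[id] = item;
    this.emit(exists ? 'edit' : 'publish', id, item);
    while (this.ids.length > this.maxLength) this.retract(this.ids[0]);
    return id;
};

MemoryFeed.prototype.retract = function (id) {
    var index = this.ids.indexOf(id);
    if (index === -1) return;
    this.ids.splice(index, 1);
    delete this.items[id];
    this.emit('retract', id);
};

var feed = new MemoryFeed(2);
var seen = [];
feed.subscribe(function (type, id) { seen.push(type + ':' + id); });
feed.publish('first', 'a');
feed.publish('second', 'b');
feed.publish('third', 'c'); // exceeds the max of 2, so 'a' is dropped
// seen: publish:a, publish:b, publish:c, retract:a
```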

Sorted-Feeds are similar to feeds, but they have no item limit (beyond practical memory limitations) and are sorted by publishing items relative to existing item ids. Verbs for sorted-feeds include append, prepend, publishBefore, publishAfter, move, edit, and retract. Sorted-feeds emit position updates when an item is published or moved in addition to publish, edit, and retract events.

Queues contain items that can be placed at the beginning or end, producing FIFO and LIFO queues. A queue get is a blocking operation with an optional timeout that pops an item off of the end. Queues can be used for simple messaging and task distribution.

Job channels distribute items in a guaranteed completion manner. Jobs consist of three queues: available jobs, in-flight jobs, and stalled jobs. Like queues, jobs can be pushed to the beginning or end of available jobs and getting a job is a blocking operation with a timeout. Job verbs include: publish, retract, get, cancel (place an in-flight job back into available jobs), stall (place a job that has been a problem out of the way), retry (place a stalled job back into available jobs).

Sets will be added in the near future as a means for maintaining live filters/queries for feeds and other data.
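
The way a job moves between those three queues can be sketched in memory. This is an illustration of the state transitions only, not the Thoonk implementation: `get` here returns immediately instead of blocking, a `finish` verb is assumed for completed jobs, and stalling is assumed to apply to in-flight jobs.

```javascript
// In-memory sketch of the three job lists. NOT the Thoonk API.
function MemoryJobs() {
    this.available = []; // job ids waiting for a worker
    this.inFlight = [];  // job ids currently claimed by a worker
    this.stalled = [];   // job ids set aside after causing problems
    this.payloads = {};  // id -> job data
    this.nextId = 1;
}

// Move an id from one list to another; returns false if it was not there.
function moveId(from, to, id) {
    var index = from.indexOf(id);
    if (index === -1) return false;
    from.splice(index, 1);
    to.push(id);
    return true;
}

MemoryJobs.prototype.publish = function (payload) {
    var id = String(this.nextId++);
    this.payloads[id] = payload;
    this.available.push(id);
    return id;
};

// Non-blocking stand-in for get: claims the oldest available job or returns null.
MemoryJobs.prototype.get = function () {
    var id = this.available.shift();
    if (id === undefined) return null;
    this.inFlight.push(id);
    return {id: id, payload: this.payloads[id]};
};

MemoryJobs.prototype.finish = function (id) {
    var index = this.inFlight.indexOf(id);
    if (index === -1) return false;
    this.inFlight.splice(index, 1);
    delete this.payloads[id];
    return true;
};

MemoryJobs.prototype.cancel = function (id) { return moveId(this.inFlight, this.available, id); };
MemoryJobs.prototype.stall = function (id) { return moveId(this.inFlight, this.stalled, id); };
MemoryJobs.prototype.retry = function (id) { return moveId(this.stalled, this.available, id); };

var jobs = new MemoryJobs();
var publishedId = jobs.publish({taskId: 't9'});
var job = jobs.get();     // available -> in-flight
jobs.stall(job.id);       // in-flight -> stalled
jobs.retry(job.id);       // stalled -> available
var again = jobs.get();   // claimed a second time
jobs.finish(again.id);    // done; all three lists are empty again
```

A job is always in exactly one of the three lists until it is finished, which is what makes the "guaranteed completion" bookkeeping possible.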

An example Thoonk ecosystem:


Thoonk is a tool which allows you to create an Internet service as a wide ecosystem rather than a deep application. Say we provide a series of 8 node.js processes to take advantage of the number of CPU threads available. This node.js application provides a websocket interface to a browser-js application with live events coming from Thoonk feeds on Redis, organized by individual users and teams. In another process, we might run a Ruby service that provides a REST interface for manipulating and querying objects within users and groups. Say also that we want to peer certain data with other services: we can run a Python process which provides XMPP Publish-Subscribe (XEP-0060) and a Java process which provides a PubSubHubbub interface. In addition to that, background jobs that absolutely have to be done can be pushed through a job system with workers running in C.

All of these separate components subscribe to the feeds pertinent to their function as well as provide relevant ACL and interface to the end-points. You are now free to use the most appropriate tools for the job, distribute load, organize application data, and selectively synchronize state easily. Of course, if you don’t have to have a lot of processes on a lot of servers in a lot of languages, you can still take advantage of compartmentalizing and duplicating your components.

Backstory


I find messaging to be an interesting problem, particularly when machines communicate to share state, make requests, etc. However, messaging has limited use without persistent data, which is why I like XMPP Publish-Subscribe (XEP-0060) so much. Feeds of data, which combine data persistence with publish-subscribe events about changes to that data, are incredibly valuable in machine-to-machine communication.

This is something that I’ve been applying to clustering, configuration distribution, job distribution and management, real-time webapps, and other problems for years now in my consulting work.

Then, I discovered Redis, which is a very fast key-store-with-containers database that also includes publish-subscribe, and I immediately knew what I had to build.

I’m publishing this as MIT because I not only want to share it, but I want your feedback, harsh criticism, and contributions. We need more implementations in other languages, and I’d love to see people publish tools that contribute to Thoonk interfaces. In addition, please point out flaws in the contract.txt (schema) document, show us your extensions and own object types, etc.

Just hit me (@fritzy) and/or Lance Stout (@lancestout) up on Twitter, follow the GitHub projects (Thoonk.js and Thoonk.py), and watch http://thoonk.com.

Our team at &yet always seems to find our way to work on interesting things, so be sure to follow us on Twitter for the latest.

-Nathan Fritz, &yet Chief Architect
