@roidrage’s presentation on metrics, measurements, and logging kept me thinking. At that point I already had working resource metrics: CPU, RAM, etc. were watched properly, and I would get warned when something bad happened. What I was still missing for one of my applications, however, was application metrics.

This application is an update server for my Kinopilot mobile app. A job runs daily, collects movie listings from various sources, enriches them with movie information etc., and packs the result into a database. The Kinopilot app then fetches updates from that server.

In my case I would like to see how many cinemas, movies, and listings are pulled and merged into the data source. If the number of cinemas drops from, say, 70 down to 10, there is probably something wrong with either the data or the import process. So seeing a nice graph would help.
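The same sanity check could also be written down in code. A minimal sketch, assuming a hypothetical helper `suspicious_drop?` and an arbitrary 50% threshold; neither is part of the actual update script, and all numbers are made up:

```ruby
# Hypothetical helper: flags a count that fell below a fraction of the
# previous run's count. The 0.5 default is an arbitrary assumption.
def suspicious_drop?(previous, current, min_ratio: 0.5)
  return false if previous.zero? # nothing to compare against

  current.to_f / previous < min_ratio
end

previous_counts = { cinemas: 70, movies: 120, listings: 900 } # made-up numbers
current_counts  = { cinemas: 10, movies: 115, listings: 880 } # made-up numbers

# Keep only the metrics whose count dropped suspiciously since the last run.
dropped = current_counts.select do |name, count|
  suspicious_drop?(previous_counts.fetch(name), count)
end.keys

puts "Suspicious drop in: #{dropped.join(', ')}" unless dropped.empty?
```

A graph shows the same thing at a glance, of course; a check like this would only matter if I wanted to be warned automatically.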

Application metrics are probably a bit different from resource metrics. I could write a munin-plugin for sure, but then I would have to put on my “root” cap and install it. I like my application as self-contained as possible, and application metrics belong to the application domain, not the server administration domain.

Enter metrics.librato.com: @roidrage mentioned these guys a couple of times, along the lines of “I don’t want to sell you anything but this stuff is good”. So, yeah, why not give it a try? For starters, librato is close to free. Each measurement costs US-$0.000002. There is a calculator which tells you how much you would pay depending on how much data you send. (Gotcha: once logged in you cannot see this page any longer.) In my case: nothing. Nice.
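The arithmetic behind such a calculator is easy to redo by hand. A back-of-the-envelope sketch, assuming four measurements per daily run and a 30-day month (both are assumptions for illustration; only the per-measurement price is the one quoted above):

```ruby
# Price per measurement in US-$, as quoted above. A Rational keeps the
# arithmetic exact instead of accumulating floating-point error.
PRICE_PER_MEASUREMENT = Rational(2, 1_000_000)

# Assumptions for illustration: 4 measurements per run, 1 run per day, 30 days.
measurements_per_month = 4 * 1 * 30
monthly_cost = PRICE_PER_MEASUREMENT * measurements_per_month

puts format("%d measurements: about US-$%.5f per month",
            measurements_per_month, monthly_cost.to_f)
```

That is 120 measurements at a total of US-$0.00024 per month, which for all practical purposes rounds to nothing.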

Next step: signup, get API credentials. Works flawlessly, and even looks nice. Good!

Next step: pull in the gem (gem "librato-metrics"), browse the examples, add measurements to my update script. The gem offers a nice interface to the librato-metrics service, and yes, it just works. Nice thing: when you send a piece of data to a metric that doesn’t exist yet, the metric is simply created.

So I add some measurements to my update script: time needed for an update, and the number of cinemas, movies, and listings found during the update. Now running the update… Oh boy, why is this so slow? But hey, there is my data, right on the website, which is split into Metrics, Instruments, and Dashboards. Here they are, in Metrics. This is just too easy.
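The instrumentation itself is only a few lines. A minimal sketch of the idea, not the actual update script: `run_update`, the metric names, and the `submit` lambda are hypothetical stand-ins, and the numbers are made up. With the gem, the lambda would presumably wrap whatever submit call the gem's examples show.

```ruby
# Hypothetical stand-in for the real update job; returns what it found.
def run_update
  { cinemas: 70, movies: 120, listings: 900 } # made-up numbers
end

# Runs the block and returns [block result, elapsed seconds].
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Stand-in for the metrics client: it only records what would be sent.
sent = []
submit = ->(measurements) { sent << measurements }

counts, seconds = timed { run_update }

# One value per metric; the metric names are assumptions for illustration.
measurements = {
  "update.duration" => seconds,
  "update.cinemas"  => counts[:cinemas],
  "update.movies"   => counts[:movies],
  "update.listings" => counts[:listings]
}
submit.call(measurements)
```

Keeping the client behind a small callable like this also makes it easy to switch between sending immediately and queueing.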

But delivering each measurement takes ~500 msecs? (Note: my server is in Germany. Theirs probably not.) Ok, the gem offers a queueing option. Which is nice. Now I queue my measurements, run the script – fast as hell, as before – and then, …, no data. Sure, one probably has to flush the queue manually. No problem for me:

# Assumption: `queue` is the Librato::Metrics::Queue filled during the update.
at_exit { queue.submit }

The at_exit handler nicely flushes the queue. This works nicely for a run-once script; I wouldn’t want this inside a request/response cycle though. There is certainly room for improvement: just run the queue in a separate thread.
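The “separate thread” idea could look roughly like this. It is a sketch built on Ruby's standard `Queue` and `Thread`, not on the gem's own queue; `BackgroundSender` and the delivery block are hypothetical names, and the `sleep` merely simulates a slow round trip to the service.

```ruby
# Hypothetical helper: hands measurements to a worker thread so that the
# caller never waits for the network.
class BackgroundSender
  def initialize(&deliver)
    @queue = Queue.new
    @worker = Thread.new do
      # Queue#pop returns nil once the queue is closed and empty,
      # which ends the loop and lets the thread finish.
      while (measurement = @queue.pop)
        deliver.call(measurement)
      end
    end
  end

  # Returns immediately; delivery happens on the worker thread.
  def add(measurement)
    @queue << measurement
  end

  # Stops accepting measurements and waits until everything queued
  # so far has been delivered.
  def flush
    @queue.close
    @worker.join
  end
end

delivered = []
sender = BackgroundSender.new do |measurement|
  sleep 0.01 # simulated network latency
  delivered << measurement
end

sender.add("update.cinemas" => 70)
sender.add("update.movies" => 120)
sender.flush # in a run-once script this call could live in an at_exit block
```

The script still has to wait for the flush at the very end, but the update itself is no longer slowed down by each delivery.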

Still, I don’t have a clue why delivering the data takes that long. Authentication is reasonably fast (~50 msecs), so this is certainly not a network issue. For some reason metrics.librato.com takes too long to respond, and the Librato::Metrics code works synchronously and blocks until it sees the response (which I don’t even use).

Now, back to the GUI. Once you have data you start building “Instruments”. Which is quite nice: just choose some of your metrics to add to your instrument. This works best when you set some properties for your metrics, like label and min-value. Tip: In my case the label is '#' for cinemas, movies, and listings alike. Which metric is which is color-coded in the instruments and can be seen easily by just hovering the mouse.

(Gotcha: you have to Save the changes, and the Save button is on the bottom right of the page. On the other hand, when you save some metric properties, the Save button is on the bottom left of the page.)

Now head over to Dashboards: as in a car, a dashboard is just a collection of instruments. You create a dashboard, add a couple of instruments, save it, and done. (Gotcha: the Save confirmation – with its Growl-like look IMHO out of place in a web app, but that is probably just a matter of taste – obscures the “Dashboard” menu item.) There are not many options for laying out the Dashboard. I would like to have one row of 3 instruments, and one row with one instrument spanning the entire page, but apparently this is not possible.

When done, you just click your dashboard, choose a time frame in the upper right, and see your data. I am told that this updates itself in realtime (or whatever passes as realtime these days). A few points here, though:

  • It seems impossible to share the dashboard with someone else, without giving out my account credentials.
  • The dashboard forgets the time frame entered. In my case, the default time frame (last 5 mins, if I remember right) just doesn’t make sense, so it would be nice if the dashboard remembered my choice.
  • And a future improvement: instruments that span different timespans. I could probably go with my application metrics as they are now for the last 4 weeks or so, and, say, number of update requests on a finer scale.

metrics.librato.com: All in all a nice experience. A few improvements, mainly in the GUI, and this is my metrics service for the foreseeable future.

PS: one more addition: I really would like an “instrument”, which just gives me the latest readings for one or more specific metrics, as numbers, in a nice, large font.