Specifications and Interpretations
So I’ve just spent about an hour reading an uber-discussion in a forum thread prompted by the release of Apache Apollo v1.0, the next-generation messaging broker based on ActiveMQ. For those who aren’t familiar, Apollo is currently a sub-project of ActiveMQ, simply because its approach is a radical departure from the way ActiveMQ works. Apollo is fully asynchronous, which allows it to make full use of multi-core hardware. It’s anticipated that Apollo will become part of ActiveMQ in some future release (possibly as soon as 6.0).
It’s fair to say that the bulk of the effort so far has gone into running Apollo over STOMP, a text-based protocol that is rapidly becoming preferred over binary protocols like OpenWire, thanks to modern improvements in how quickly text headers can be parsed compared with binary ones.
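To make “text-based” concrete, here’s a minimal Python sketch that builds and parses a STOMP SEND frame: a command line, name:value header lines, a blank line, the body and a terminating NUL byte. It’s an illustration, not a client library: the queue name is hypothetical, STOMP 1.2’s header escaping rules are left out, and persistent:true is a broker-specific header (ActiveMQ uses it to request persistence) rather than part of the core STOMP spec. The standard receipt header asks the broker to reply with a RECEIPT frame once it has processed the send.

```python
def build_frame(command, headers, body=""):
    """Build a STOMP frame: command, header lines, blank line, body, NUL.

    STOMP 1.2 header escaping (for ':' and newlines) is omitted for brevity.
    """
    lines = [command] + [f"{name}:{value}" for name, value in headers.items()]
    return ("\n".join(lines) + "\n\n" + body + "\0").encode("utf-8")


def parse_frame(frame):
    """Split a frame built above back into (command, headers, body)."""
    text = frame.decode("utf-8").rstrip("\0")
    head, _, body = text.partition("\n\n")   # first blank line ends the headers
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


frame = build_frame(
    "SEND",
    {
        "destination": "/queue/orders",  # hypothetical queue name
        "persistent": "true",            # broker extension, not core STOMP
        "receipt": "msg-1",              # ask the broker for a RECEIPT frame
    },
    "hello, Apollo",
)
print(frame)
print(parse_frame(frame))
```

Everything on the wire, headers included, is readable text, which is exactly what the “text vs binary” argument above is about.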
I use a number of Apache products today (via FuseSource) and was curious to see how Apollo was shaping up, so I read the 1.0 press release with interest. Then, via a tweet from Rob Davies, I came across this discussion, which I’ve only just finished reading. Suffice it to say that, as in most other areas of technology, there’s no shortage of “competitive feedback” from the developers, architects and founders of other messaging brokers and protocols. I guess that’s to be expected: messaging is driven by performance and reliability, so anything that claims to be better than the alternatives is going to be met with the usual scepticism. However, this forum thread took it to a whole new level.
The ‘problem’, as I understand it, centres on the interpretation of the JMS Specification, specifically in the area of “Guaranteed Delivery of Persistent Messages”. JMS API 1.1, section 4.7 states that:
“A JMS provider must deliver a PERSISTENT message once-and-only-once.”
The crux of the issue is that Apollo introduces an optimisation that skips disk writes for persistent messages if the message can be delivered directly to a consumer. That is, if the broker can deliver the message and receive an ACK, there is no need to persist it. All OK so far, except that it introduces a scenario in which the Consumer’s JMS client receives a message but the broker dies before the client can respond with an ACK. Technically, then, the message has not been “received” (if we assume that receipt implies an ACK). But because the message was never persisted, the broker cannot re-send it. So what’s the solution? In this case, Apollo relies on the Producer to re-send the message. Why would it do that? Because, for persistent messages, the Producer waits for an ACK from the broker before returning from the call to send(), and only then considers the message successfully sent. If send() doesn’t return successfully, the Producer should treat the message as not sent and send it again.
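To make the trade-off concrete, here’s a toy Python model of the optimisation as I understand it. It’s a hypothetical sketch, not Apollo’s actual code: ToyBroker, BrokerCrashed and the crash_before_ack flag are made up for illustration, and the model assumes the broker only confirms a persistent send to the Producer once the message has either been ACKed by a Consumer or written to the store.

```python
class BrokerCrashed(Exception):
    """Signals to the producer that the broker died before confirming the send."""


class ToyBroker:
    """Hypothetical, simplified model of the 'skip the disk write' optimisation.

    Assumption: a persistent send is confirmed to the producer only once the
    message has been ACKed by a consumer or written to the store.
    """

    def __init__(self, crash_before_ack=False):
        self.store = []            # stands in for the on-disk message store
        self.consumer = None       # callable(msg) invoked on direct delivery
        self.crash_before_ack = crash_before_ack

    def send(self, msg):
        if self.consumer is not None:
            self.consumer(msg)     # the consumer "sees" the message
            if self.crash_before_ack:
                # Broker dies before the consumer's ACK arrives: nothing was
                # persisted, and the producer never gets its confirmation.
                raise BrokerCrashed(msg)
            # Fast path: in this toy model the ACK arrives, so no disk write.
            return "delivered"
        # Slow path: no consumer attached, so persist before confirming.
        self.store.append(msg)
        return "persisted"


seen = []
fast = ToyBroker()
fast.consumer = seen.append
print(fast.send("order-1"), fast.store)   # delivered []

slow = ToyBroker()
print(slow.send("order-2"), slow.store)   # persisted ['order-2']

crashy = ToyBroker(crash_before_ack=True)
crashy.consumer = seen.append
try:
    crashy.send("order-3")
except BrokerCrashed:
    # send() never returned successfully, so the producer must re-send.
    print("order-3 was seen by the consumer but is not stored:", crashy.store)
```

The last case is exactly the window being argued about: the Consumer has seen order-3, the broker has no copy, and the only thing that can recover it is the Producer re-sending.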
The rather long debate centred on the possibility that the Consumer might “see” a persistent message, suffer a broker failure, and then never receive it again if the Producer also dies and never recovers, because the message was never stored on the broker.
So, my take on all this? From a Spec point of view, we need to agree on what it means to “send” a message and to “deliver” a message, and herein lies the slightly wobbly ground. Personally, I think that, in general, it’s perfectly fine for Consumers to care only about messages that were successfully sent. And I think it’s OK to consider a persistent message successfully sent if and only if the Producer’s call to send() returns successfully. So, from a professional standpoint, I totally agree that Apollo is adhering to the rules and giving users a solid improvement in performance. Yay! I look forward to reaping the benefits when it makes its way into ActiveMQ at some point.
However, it is an example of how specifications and their interpretations can get us into hot water when they are sufficiently vague. For this optimisation (and probably in other situations too), you could certainly argue that the “Sent” vs “Received” reasoning above is circular: “When receiving a message, I only consider it valid if it was successfully sent, which will only happen if I successfully acknowledged it, which surely means that I received it?”
Nevertheless, nit-picking over the JMS Specification (or any other, for that matter) is kinda pointless. Ultimately, it comes down to whether a Client Application and a Server Application can send and receive data according to their understanding of the rules without losing any messages. If a Client never processes a message that it hasn’t ACKed, and a Producer always retries messages that it never successfully sent, then I can’t see the problem. If you want to hold a pedantic debate, that’s fine, but it really doesn’t help (or hinder) my ability to write fault-tolerant message-based systems.
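Here’s a minimal Python sketch of those two rules working together. Everything in it is hypothetical: FlakyBroker, BrokerDown, Consumer and send_with_retry are made-up names, the broker “crashes” a configurable number of times between delivering a message and receiving the ACK, and the same object stands in for the restarted broker. It only models that one failure window, not every way a real broker can fail.

```python
class BrokerDown(Exception):
    """The broker died between delivering a message and receiving the ACK."""


class Consumer:
    """Rule 1: only act on a message once its ACK has gone through."""

    def __init__(self):
        self.pending = None      # received, but not yet ACKed
        self.processed = []      # messages the application actually acted on

    def receive(self, msg):
        self.pending = msg       # seen, but deliberately not processed yet

    def ack_confirmed(self):
        self.processed.append(self.pending)
        self.pending = None

    def connection_lost(self):
        self.pending = None      # the ACK never completed, so drop it unprocessed


class FlakyBroker:
    """Hypothetical broker that dies `crashes` times after delivery, before the ACK."""

    def __init__(self, crashes=1):
        self.crashes_left = crashes

    def send(self, msg, consumer):
        consumer.receive(msg)
        if self.crashes_left > 0:
            self.crashes_left -= 1
            consumer.connection_lost()
            raise BrokerDown(msg)    # the producer's send() fails
        consumer.ack_confirmed()     # ACKed, so the producer's send() succeeds


def send_with_retry(broker, consumer, msg, max_attempts=5):
    """Rule 2: a message counts as sent only when send() returns; else retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            broker.send(msg, consumer)
            return attempt
        except BrokerDown:
            continue                 # a real producer would reconnect or fail over here
    raise RuntimeError(f"gave up on {msg!r} after {max_attempts} attempts")


consumer = Consumer()
attempts = send_with_retry(FlakyBroker(crashes=1), consumer, "order-42")
print(attempts, consumer.processed)   # 2 ['order-42']
```

The Consumer “saw” order-42 twice but acted on it once, and the Producer only stopped retrying once send() returned.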
So let’s not debate “Does it meet (y)our interpretation of the JMS 1.1 Spec?”; let’s debate “Will it keep my system running as I expect?”. I believe the answer is a resounding “YES”.