LMX 1.0 released

2007-03-03 23:56:18 -08:00

Some of you know that I’m a developer on Adium. (Hopefully all of you; it is mentioned in the sidebar. ;)

Adium has a feature called “message history”. When you open a new chat with a person, message history shows you the last n messages from your previous chat with that person. Since Adium 1.0, message history has been implemented using a library that I wrote called LMX. That release changed message history to draw from the logs rather than from separate storage, and changed the log format from bastardized HTML to XML; the Adium blog post has more info.

LMX is a reverse XML parser. Whereas most XML parsers (AFAIK, all of them except LMX) parse the XML data from the start to the end, LMX parses it from the end to the start. Thus, while characters are kept in their original order (“foo” will still be “foo”; it will not become “oof”), everything else is reported in the reverse order: elements close before they are opened, and appear from last to first. All this is by design, so that Adium can retrieve the last n message elements without having to parse all the message elements before them.
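
To make the idea concrete, here is a minimal sketch of reverse tokenizing in Python. It is not LMX's code or API (LMX is an Objective-C library), and reverse_events is a hypothetical name. It scans a string from the end and yields events in reverse document order, while each character run keeps its original order. It is deliberately simplified: it assumes no comments, CDATA sections, processing instructions or XML declaration, and no ">" inside attribute values, and it leaves entities unexpanded.

```python
from typing import Iterator, Tuple

# Each event is (kind, value): kind is "start", "end" or "text".
Event = Tuple[str, str]


def reverse_events(xml: str) -> Iterator[Event]:
    """Yield parse events for `xml`, last to first (hypothetical sketch).

    Simplifications: no comments, CDATA, processing instructions or XML
    declaration; no '>' inside attribute values; entities are not expanded.
    """
    pos = len(xml)
    while pos > 0:
        if xml[pos - 1] == ">":
            # Found the end of a tag: scan backward to its '<'.
            lt = xml.rfind("<", 0, pos)
            if lt < 0:
                raise ValueError("'>' without a matching '<'")
            tag = xml[lt + 1 : pos - 1]
            if tag.startswith("/"):
                yield ("end", tag[1:].strip())
            elif tag.endswith("/"):
                # Empty element: in reverse, its close comes before its open.
                name = tag[:-1].split()[0]
                yield ("end", name)
                yield ("start", name)
            else:
                # Start tag: report only the element name, not attributes.
                yield ("start", tag.split()[0])
            pos = lt
        else:
            # Character run: back up to the previous '>' (or the start of
            # the string), but yield the text itself in its original order.
            start = xml.rfind(">", 0, pos) + 1
            yield ("text", xml[start:pos])
            pos = start
```

For "<log><m>foo</m><m>bar</m></log>" this yields the end of log first and the start of log last, with "bar" reported before "foo" but each still spelled forward. Because it is a generator, a caller can simply stop pulling events once it has the last n messages, which is the same property that lets Adium skip the older messages.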

Today, LMX gets its very own webpage (not just a page on the Adium wiki, but a real webpage), and is released at version 1.0. It’s the same code as shipped with Adium 1.0.1, but shined up into a release tarball.

So, if you too ever find yourself in desperate need of a reverse XML parser, now there is one.

15 Responses to “LMX 1.0 released”

  1. Matt Thomas Says:

    Another solution could be to run it through XSLT. The syntax would go something like this:

    <xsl:for-each select="message">
    <xsl:sort select="position()" order="descending" data-type="number" />
    <xsl:apply-templates select="."/>
    </xsl:for-each>

    (hopefully WordPress will render this correctly)

  2. Joshua Gooden Says:

    That solution still requires that a DOM tree be built and parsed in order to reorder the elements. DOM trees are hugely memory-hungry beasts, and though I haven’t looked at the LMX source, I would imagine that it parses the XML doc lazily, saving hugely on memory and CPU requirements.

  3. Simon Rolfe Says:

    This could be hugely useful as a tail replacement for XML-formatted log files of any sort. Seeing as more and more services are logging to XML-ish structures, I can see LMX becoming a really handy tool.

  4. Peter Hosey Says:

    Matt Thomas: (building on Joshua Gooden’s comment) It’s theoretically possible that an implementation of XSL could do the job lazily (as lazily as it can, after analyzing the style-sheet to find out how lazy that is), but I don’t know if any such implementations exist. Besides, I don’t think any of us thought of using XSL for this. :)

    (Also, interestingly, your comment didn’t get emailed to me. I wonder how that happened.)

    Joshua Gooden: Yes, LMX is lazy. It reports start tags, end tags, and character runs as it finds them, and any of those callbacks can pause the parse and resume it later. Adium takes advantage of this by pausing the parse after n messages and then releasing the parser without ever resuming it. The API subtly guarantees that this is safe (see the comment for -pause in LMXParser.h).

    Simon Rolfe: Well, LMX is a library, not a tool unto itself, but yes, LMX could definitely be used to create such a tool. :)

  5. Peter Hosey Says:

    Ah, the email just came in. Guess it got held up for some reason.

  6. Eugene Morozov Says:

    Why use XML at all in this case? I’d say that using XML for logs and then writing another parser for it is like creating a problem out of nothing and spending significant effort to solve it.

    There are plenty of ways to store logs that wouldn’t require a backwards XML parser, or any XML parser at all.

  7. Peter Hosey Says:

    Because the XML format satisfies all the requirements on our LogFormatIdeas page.

  8. Håkan W Says:

    Can’t you just add every new message to the beginning of the file, putting them in newest-first order?

  9. Peter Hosey Says:

    No, because then we would need to rewrite everything previously written to the file every time a message is added (file I/O does not have an insert mode, only overwrite). This would get very expensive for very long logs, and some people leave the chat open all the time.

  10. Håkan W Says:

    Oh, that’s interesting. Off-topic random question: do you know if this is something HFS+-specific, or is it a general disk problem? Are there file systems that do have a “prepend” mode? I admit my knowledge of that kind of territory is not the strongest.

  11. Peter Hosey Says:

    Neither POSIX nor File Manager nor NSFileHandle has an insert mode, regardless of underlying file-system. I don’t know about any other operating systems’ file I/O APIs.

  12. Dethe Elza Says:

    What I’ve done in similar circumstances is to put each XML snippet on its own line and use standard line-oriented tools (tail in the shell, for instance) to trim off the ones I’m interested in. For parsing snippets of XML you can use two approaches. The first is to wrap them in a single tag before passing them to your parser. The second is useful when you want to get the whole list but don’t want to have an end tag (so you can keep appending to the file): add the starting tag, omit the end tag, parse with the SAX API, and simply catch the error about the missing end tag. Either way it’s pretty easy to get the last N messages.

  13. Alastair Says:

    Interesting choice of name. Did you know that there is already another application called LMX? It is used for code generation and reading/writing in an XML Beans fashion. It was somewhat confusing for me, as we actually use LMX in our development (not this particular LMX, that is).

  14. Peter Hosey Says:

    Yeah, I’ve seen a couple of other LMXen on Google since releasing 1.0 (I didn’t look previously—oops). Here’s hoping that nobody makes a beef about it.

  15. Jason Says:

    Wow, everyone but Simon seems unnecessarily negative about this. Anyway, thank you for making this available; it’s a very useful idea.
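
Håkan W’s question about writing newest-first (comments 8 through 11) comes down to cost. The following minimal Python sketch compares the two strategies; the function names are hypothetical and this is not Adium’s logging code. Appending writes only the new message, while prepending, with no insert mode available, must read and rewrite everything already in the file, so adding n messages one at a time writes on the order of n² bytes in total.

```python
from pathlib import Path


def append_message(path: Path, line: str) -> int:
    """Append one message; returns the number of bytes written."""
    data = line.encode("utf-8")
    with path.open("ab") as f:
        f.write(data)
    return len(data)


def prepend_message(path: Path, line: str) -> int:
    """Prepend one message; returns the number of bytes written.

    There is no insert mode, so the old contents are read back and the
    whole file is rewritten with the new message in front.
    """
    data = line.encode("utf-8")
    old = path.read_bytes() if path.exists() else b""
    path.write_bytes(data + old)
    return len(data) + len(old)
```

With 100 equal-sized messages, appending writes 100 messages’ worth of bytes, while prepending writes 1 + 2 + ... + 100 = 5,050 messages’ worth.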

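Dethe Elza’s first approach (one snippet per line, keep the last N lines, wrap them in a single tag before parsing) can be sketched with Python’s standard library. It assumes every message is a complete element on a single line with no embedded newlines; last_n_messages and the <log> wrapper tag are hypothetical. This sketch reads the file front to back, keeping only the last n lines in memory; reading from the end, as LMX does, would avoid the full scan.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import List


def last_n_messages(path: str, n: int) -> List[ET.Element]:
    """Return the last n one-line XML snippets in `path`, parsed as elements."""
    with open(path, encoding="utf-8") as f:
        # deque(maxlen=n) keeps only the final n non-blank lines, like tail -n.
        tail = deque((line.strip() for line in f if line.strip()), maxlen=n)
    # Wrap the snippets in a single root so they form one well-formed document.
    root = ET.fromstring("<log>" + "".join(tail) + "</log>")
    return list(root)
```

The file itself never needs a closing root tag, so new messages can keep being appended to it.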

Do not delete the second sentence.