WORK – Web Object Records – is a way of describing the messages we pass over the web: a single header object called the “meta” and zero or more objects called “items”. Each object can be encoded as a JSON record, though individual items within a WORK object can also be accessed using a WORK Path, which allows quite a bit of latitude for type coercion and vagaries in packaging.
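The meta/items split and the forgiving path lookup are easier to see concretely. Below is a minimal Python sketch, not Pipe Cleaner's actual WORK Path implementation: the dotted path syntax, the hypothetical `work_path` function and its coercion rules (a bare value behaves like a one-item list, and a named field looked up on a list is read from the list's first element) are all assumptions made for illustration.

```python
from typing import Any

# A WORK message: one "meta" header object plus zero or more "items".
message = {
    "meta": {"title": "Example feed"},
    "items": [
        {"title": "First", "author": {"name": "A. Writer", "email": "a@example.com"}},
        {"title": "Second", "author": [{"name": "B. Writer"}]},  # author packaged as a list
    ],
}


def work_path(obj: Any, path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as "items.0.author.email" (hypothetical syntax).

    Assumed coercions, for illustration only:
    - a numeric segment indexes a list; a bare value acts like a one-item list
    - a named segment applied to a list looks inside the list's first element
    """
    current = obj
    for part in path.split("."):
        if part.isdigit():
            candidates = current if isinstance(current, list) else [current]
            index = int(part)
            if index >= len(candidates):
                return default
            current = candidates[index]
            continue
        if isinstance(current, list):
            if not current:
                return default
            current = current[0]
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


print(work_path(message, "items.0.author.email"))  # a@example.com
print(work_path(message, "items.1.author.name"))   # B. Writer (list read via its first element)
print(work_path(message, "meta.title.0"))          # Example feed (value treated as a list)
```

The point of the coercions is that a script author does not need to know whether a given feed packaged `author` as a single object or as a list; the same path works either way.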
Pipe Cleaner is a project I’ve been working on for the last two months that lets one script data using WORK, to accomplish tasks such as remixing and filtering RSS feeds, reading or producing OPML, making JSON interfaces and so forth. I actually have one live deployment, which I will blog about soon, and I hope to have it productized as a beta by March.
Atom is a standard for syndicating feeds, not dissimilar to RSS but with a richer, better-described vocabulary. I already have one major “project” built around Atom: the hAtom microformat for describing microcontent and information that can be syndicated. hAtom has also been morphed by Microsoft into the Web Slice format, so you may be seeing that around. Atom conforms to WORK: there’s a “feed” meta header and zero or more “entry” items.
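To show how naturally Atom fits the WORK shape, here is a small Python sketch using the standard library's `xml.etree.ElementTree`: the children of `<feed>` become the meta and each `<entry>` becomes an item. The function names `atom_to_work` and `_record`, and the choice of which elements get special handling, are illustrative assumptions rather than Pipe Cleaner code; the `link`/`links` split follows the convention described in the element rundown that comes next.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def _local(tag: str) -> str:
    """Strip the Atom namespace from an ElementTree tag."""
    return tag[len(ATOM_NS):] if tag.startswith(ATOM_NS) else tag


def _record(elem: ET.Element) -> dict:
    """Flatten one <feed> or <entry> element into a plain dict."""
    record: dict = {}
    for child in elem:
        name = _local(child.tag)
        if name == "entry":
            continue  # entries become items, not meta fields
        if name == "author":
            record["author"] = {_local(s.tag): (s.text or "").strip() for s in child}
        elif name == "link":
            href = child.get("href")
            # The first alternate link is the main URI; the rest are extras.
            if child.get("rel", "alternate") == "alternate" and "link" not in record:
                record["link"] = href
            else:
                record.setdefault("links", []).append(href)
        elif name == "category":
            record.setdefault("category", []).append({"term": child.get("term")})
        else:
            record[name] = (child.text or "").strip()
    return record


def atom_to_work(xml_text: str) -> dict:
    """Split an Atom document into a WORK-style {"meta": ..., "items": [...]}."""
    root = ET.fromstring(xml_text)
    return {
        "meta": _record(root),
        "items": [_record(e) for e in root.findall(ATOM_NS + "entry")],
    }


sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <id>urn:example:feed</id>
  <entry>
    <title>First post</title>
    <id>urn:example:1</id>
    <link href="http://example.com/1"/>
    <author><name>A. Writer</name><email>a@example.com</email></author>
    <category term="atom"/>
  </entry>
</feed>"""

work = atom_to_work(sample)
print(work["meta"]["title"])       # Example feed
print(work["items"][0]["author"])  # {'name': 'A. Writer', 'email': 'a@example.com'}
```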
With Pipe Cleaner I’m trying not only to make a way for feeds and other data to be remixed, but also to make it easy to do so! To that end, I’ve decided that by default, even when you are working with (say) OPML or RSS, we’ll translate all the terms to their Atom equivalents as best we can. You’ll have to read the spec yourselves, but here’s a quick rundown of common elements, not all required by any means:
author, with possible sub-fields uri and email
content – the body
summary – a summary of the body; currently my feeling is that content & summary must always be HTML
updated – when last updated
created – when created; assumed to be the same as updated if not present
link – the main URI
links – for alternate URIs (this is a variance from the Atom spec; it should be easy to find the main URI for an element; I may reconsider this before release)
id – a unique identifier
category – tags, encoded in a sub-field term
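As a rough illustration of "translate all the terms to their Atom equivalents", here is a Python sketch that maps one RSS 2.0 `<item>` onto the fields above. The specific mapping (`description` to summary, `guid` falling back to `link` for id, RSS `<author>` treated as an email) is my assumption about a sensible translation, not a dump of Pipe Cleaner's tables; `rss_item_to_atom` is a hypothetical name. The `created` fallback follows the rule in the list.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Namespace of <content:encoded>, a common RSS extension carrying full HTML.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Assumed one-to-one renames from RSS 2.0 elements to Atom-style field names.
SIMPLE_FIELDS = (
    ("title", "title"),
    ("link", "link"),
    ("description", "summary"),
    (CONTENT_NS + "encoded", "content"),
)


def rss_item_to_atom(item: ET.Element) -> dict:
    """Translate one RSS <item> into a dict using Atom-style terms."""

    def text(tag: str):
        el = item.find(tag)
        return el.text.strip() if el is not None and el.text else None

    record: dict = {}
    for rss_tag, atom_field in SIMPLE_FIELDS:
        value = text(rss_tag)
        if value:
            record[atom_field] = value
    if text("author"):
        record["author"] = {"email": text("author")}  # RSS <author> is an email address
    identifier = text("guid") or text("link")
    if identifier:
        record["id"] = identifier
    if text("pubDate"):
        # RSS dates are RFC 822; Atom uses RFC 3339 (ISO 8601) timestamps.
        record["updated"] = parsedate_to_datetime(text("pubDate")).isoformat()
        record["created"] = record["updated"]  # RSS has no creation date
    terms = [c.text.strip() for c in item.findall("category") if c.text]
    if terms:
        record["category"] = [{"term": t} for t in terms]
    return record


item = ET.fromstring("""<item>
  <title>Hello</title>
  <link>http://example.com/hello</link>
  <description>&lt;p&gt;Hi there&lt;/p&gt;</description>
  <pubDate>Mon, 07 Jan 2008 10:00:00 GMT</pubDate>
  <category>news</category>
</item>""")
print(rss_item_to_atom(item))
```

Note that the escaped HTML in `<description>` comes out as real HTML in `summary`, which fits the "content & summary are always HTML" convention.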
Note that I’m not slavish about making the output conformant to all the SHOULDs, MUSTs, etc. that are in the Atom spec: my pragmatic programming approach says “do the best we can” and if the user needs better, they can walk the extra mile.
Here are some examples of data that’s been run through Pipe Cleaner, translated to Atom on input and translated back to whatever is needed on output. The JSON (actually pretty-printed JSON) output is the most instructive for seeing what’s going on inside Pipe Cleaner.
RSS Feed
OPML Data
Note how the OPML is “flattened”, with hierarchy being encoded into the Category. This can be turned off if needed.
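One way such flattening could work is sketched below in Python: every `<outline>` becomes one item, and the titles of its ancestors are joined into a single category term. The `/` separator, the decision to emit folder outlines as items too, and the `flatten_opml` name are assumptions for illustration; the switch for turning flattening off is not modeled.

```python
import xml.etree.ElementTree as ET


def flatten_opml(xml_text: str, separator: str = "/") -> dict:
    """Turn a nested OPML outline into a flat WORK-style {"meta", "items"} dict.

    Every <outline> becomes one item; its ancestors' titles are joined into
    a single category term, e.g. "Tech/Microformats".
    """
    root = ET.fromstring(xml_text)
    meta = {"title": root.findtext("head/title")}
    items: list = []

    def walk(outline: ET.Element, ancestors: list) -> None:
        title = outline.get("title") or outline.get("text") or ""
        item = {"title": title}
        link = outline.get("htmlUrl") or outline.get("xmlUrl")
        if link:
            item["link"] = link
        if ancestors:
            item["category"] = [{"term": separator.join(ancestors)}]
        items.append(item)
        for child in outline.findall("outline"):
            walk(child, ancestors + [title])  # depth-first, parents before children

    for top in root.findall("body/outline"):
        walk(top, [])
    return {"meta": meta, "items": items}


sample = """<opml version="2.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Tech">
      <outline text="Microformats">
        <outline text="hAtom" htmlUrl="http://microformats.org/wiki/hatom"/>
      </outline>
    </outline>
  </body>
</opml>"""

for entry in flatten_opml(sample)["items"]:
    print(entry)
```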
hCard microformat (in HTML)
Note the neat namespacing in the RSS output. The OPML is almost devoid of useful information; further consideration is needed.
hCalendar microformat (in HTML)
Similar to hCard. We’ll probably also (or exclusively) encode the hCalendar data in an xCal extension.
hAtom microformat (in HTML)
hAtom -> RSS is basically turning an hAtom page into a feed!
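The output half of that trip can be sketched in Python as well. This assumes the hAtom page has already been parsed into a WORK-style dict with Atom field names (the parsing itself is out of scope here), and writes RSS 2.0 with `xml.etree.ElementTree`. The `work_to_rss` name and the field choices, such as preferring summary over content for `<description>` and marking every guid as not a permalink, are assumptions, not Pipe Cleaner's output rules.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime


def work_to_rss(work: dict) -> str:
    """Serialize a WORK-style {"meta", "items"} dict using Atom terms as RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    meta = work.get("meta", {})
    for field, tag in (("title", "title"), ("link", "link"), ("summary", "description")):
        if meta.get(field):
            ET.SubElement(channel, tag).text = meta[field]

    for entry in work.get("items", []):
        item = ET.SubElement(channel, "item")
        if entry.get("title"):
            ET.SubElement(item, "title").text = entry["title"]
        if entry.get("link"):
            ET.SubElement(item, "link").text = entry["link"]
        body = entry.get("summary") or entry.get("content")
        if body:
            # HTML goes in as text; ElementTree entity-escapes it, as RSS expects.
            ET.SubElement(item, "description").text = body
        if entry.get("id"):
            ET.SubElement(item, "guid", isPermaLink="false").text = entry["id"]
        if entry.get("updated"):
            # ISO 8601 timestamp back to the RFC 822 form RSS uses.
            stamp = datetime.fromisoformat(entry["updated"])
            ET.SubElement(item, "pubDate").text = format_datetime(stamp)
        for cat in entry.get("category", []):
            ET.SubElement(item, "category").text = cat["term"]
    return ET.tostring(rss, encoding="unicode")


print(work_to_rss({
    "meta": {"title": "Demo"},
    "items": [{
        "title": "Hello",
        "link": "http://example.com/hello",
        "summary": "<p>Hi</p>",
        "updated": "2008-01-07T10:00:00+00:00",
        "category": [{"term": "news"}],
    }],
}))
```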
Source example
Since no blog post is complete without a little source code, here’s a Pipe Cleaner script to parse the hCard document. If you’re following closely, the output format is selected by the user at runtime. All the other scripts are of similar terseness.
import module:api_microformat; api_microformat.HCard uri:"http://tantek.com/" to:items meta:meta;