Rorkvell - pingback 1.01

This version

http://www.rorkvell.de/tech/specs/pingback

Latest version

http://www.rorkvell.de/tech/specs/pingback

Previous versions

Editors

Abstract

Pingback is a method for web authors to request notification when somebody links to one of their documents. Typically, web publishing software will automatically inform the relevant parties on behalf of the user, allowing for the possibility of automatically creating links to referring documents.

For example, Alice writes an interesting article on her Web log. Bob then reads this article and comments about it, linking back to Alice's original post. Using pingback, Bob's software can automatically notify Alice that her post has been linked to, and Alice's software can then include this information on her site.

Status of This Document

This is a stable specification. It contains a very small addition to Ian Hickson's specification v1.0. Comments are welcome.

Available languages

This document is available in English and German (and in the formats HTML 4.01 and XHTML 1.1). The version delivered depends on automatic content negotiation and thus on your browser settings. The default fallback language is German. The normative version is the English version. A link to the alternate versions is provided in the footer.

Errata

This section was added after the final publication date of the specification.

(2007-01-16) In order to avoid susceptibility to denial of service attacks, pingback servers that fetch the specified source document (as described in section 3) are urged to impose limits on the size of the source document to be examined and the rate of data transfer. Thanks to Blake Matheny for bringing this issue to our attention.
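The size limit urged by this erratum can be sketched as a bounded read. This is a minimal illustration, not part of the specification: the function name read_limited, the 64 KiB cap and the chunk size are assumptions, and limiting the transfer rate (for example with socket timeouts) is not shown.

```python
import io

# Assumed cap; the erratum does not name a concrete value.
MAX_SOURCE_BYTES = 64 * 1024


def read_limited(stream, limit=MAX_SOURCE_BYTES, chunk_size=4096):
    """Read at most `limit` bytes from a binary stream.

    Raises ValueError as soon as the stream turns out to be longer,
    so an oversized source document is never held in memory in full.
    """
    chunks = []
    total = 0
    while True:
        # Ask for one byte more than allowed so an overrun is detectable.
        chunk = stream.read(min(chunk_size, limit + 1 - total))
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise ValueError("source document exceeds size limit")
        chunks.append(chunk)
    return b"".join(chunks)


# Usage with an in-memory stream standing in for an HTTP response body.
print(len(read_limited(io.BytesIO(b"x" * 100), limit=1024)))
```

A server would pass the response object of its HTTP library in place of the in-memory stream, and treat the ValueError as a reason to reject the ping.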

Introduction
1. Technical Details
2. Definitions
Server Autodiscovery
XML-RPC Interface
Conformance Requirements
Example
References

Introduction

The pingback system is a way for a blog to be automatically notified when other Web sites link to it. It is entirely transparent to the linking author, requiring no user intervention to work, and operates on principles of automatic discovery of everything that it needs to know. A sample blog post involving pingback might go like this:

Alice posts to her blog. The post she's made includes a link to a post on Bob's blog.
Alice's blogging system contacts Bob's blogging system and says "look, Alice made a post which linked to one of your posts".
Bob's blogging system then includes a link back to Alice's post on his original post.
Readers of Bob's article can follow this link to Alice's post to read her opinion.

It enables reverse linking — a way of going back up a chain of links rather than merely drilling down.

Technical Details

The pingback mechanism uses an HTTP header and an HTML or XHTML link element for autodiscovery, and uses a single XML-RPC call for notifying the target site of the link on the source site.

It is intended that compliant pingback clients and pingback servers be implementable with minimal effort using libraries typically available in CGI environments. For this reason, the requirements on parsing HTTP headers and HTML documents have been kept to a strict minimum.

Definitions

sourceURI

The address of the entry on the site containing the link.

This is the URI of document B, which contains an article related to an original article in document A. If you write a related article, this is your URI.

The document referred to by the sourceURI should be HTML by default. The server should request this URI with an HTTP Accept header of text/html, application/xml, application/rss+xml or others. The client must be able to send at least text/html, application/xml or application/rss+xml (the last two both refer to an RSS document). It is explicitly legal to use any MIME type for sourceURI. If the pingback server does not know how to handle the format, it may either reject the ping or include only the URI in the pinged document.

pingback client

The software that establishes the connection to inform the server about the link from the source to the target. Typically, the source will be the client.

pingback server

The software that accepts XML-RPC connections. Typically, the target URI will be associated with the server (e.g. on the same host).

pingback user agent

A single system, which is both a pingback client and a pingback server.

targetURI

The target of the link on the source site. This SHOULD be a pingback-enabled page.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

Server Autodiscovery

There are two mechanisms for the automatic discovery of pingback servers: HTML (or XHTML) link elements and HTTP headers. A pingback-enabled resource MUST either be served with an X-Pingback HTTP header or contain a link element, or both. Pingback-enabled HTML and XHTML pages MUST be valid. Clients MAY refuse to search invalid pages for pingback information.

HTTP Header

Pingback-enabled resources MAY be returned with an X-Pingback HTTP header. For example, a PNG image served with the following headers would be pingback-enabled:


HTTP/1.1 200 OK
Date: Sun, 08 Sep 2002 15:05:37 GMT
Server: Apache/1.3.26 (Unix)
Last-Modified: Thu, 28 Dec 2000 03:18:26 GMT
ETag: "65044-15b9c-3a4ab102"
Accept-Ranges: bytes
Content-Length: 88988
Connection: close
Content-Type: image/png
X-Pingback: http://charlie.example.com/pingback/xmlrpc

.PNG...

The value of the X-Pingback header MUST be the absolute URI of the pingback XML-RPC server.

Pages MUST NOT include more than one such header. HTML and XHTML documents MAY include a link element in addition to an HTTP header, although this is discouraged. If included, the header SHOULD have exactly the same value as the link element. In the case of a discrepancy, the HTTP header SHALL override the link element; however, authors should be aware that some clients will not process HTTP headers due to limitations of their environment.

Pingback-enabled resources MUST NOT use the HTTP Link header for advertising pingback servers. HTTP Link headers require non-trivial parsing, and were therefore deemed too heavy-duty for the purposes of pingback server autodiscovery.

Link element

An HTML or XHTML pingback-enabled page MAY contain a link element in one of the following two forms:

html

<link rel="pingback" href="pingback server">

html 4.x or xhtml served as text/html

<link rel="pingback" href="pingback server" />

If used, the link element MUST match the appropriate form exactly (including the whitespace before the slash, for instance).

xhtml served as text/xml, application/xml or application/xhtml+xml

<link rel="pingback" href="pingback server" />

If used, the link element MAY include the whitespace before the trailing slash, but it MUST be well-formed XML.

Pages MUST NOT include more than one such element, and MUST NOT include such a string matching the pattern described below unless it is intended to be the link element.

The pingback server placeholder MUST be replaced by the absolute URI of the pingback XML-RPC server. This URI MUST NOT include entities other than &amp;, &lt;, &gt;, and &quot;. Other characters that would not be valid in the HTML document or that cannot be represented in the document's character encoding MUST be escaped using the %xx mechanism as described in [RFC 2396].

These strict requirements are intended to drastically reduce the requirements on clients implementing server autodiscovery, as it was deemed that requiring clients to implement an HTML parser in addition to an XML parser was too heavy a burden, given how easy it would be for page authors to comply with the restrictions described above.
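For page authors, producing a conforming link element mostly comes down to escaping the server URI with only the four allowed entities. The following is a minimal sketch under that reading; the function name render_link_element is hypothetical, and the %xx escaping of characters that cannot be represented in the document is assumed to have been applied to the URI already.

```python
def render_link_element(server_uri):
    """Return the HTML form of the pingback link element for `server_uri`.

    Only the four entities the specification allows are produced. The
    ampersand is replaced first so that the other entities are not
    escaped a second time.
    """
    escaped = server_uri
    for char, entity in (("&", "&amp;"), ("<", "&lt;"),
                         (">", "&gt;"), ('"', "&quot;")):
        escaped = escaped.replace(char, entity)
    return '<link rel="pingback" href="%s">' % escaped


print(render_link_element("http://bob.example.net/xmlrpcserver"))
```

A general-purpose escaping helper is deliberately avoided here, because such helpers may emit entities (for example for the apostrophe) that fall outside the four allowed ones.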

Autodiscovery Algorithm

Pingback clients, given a source URI and a target URI, SHOULD fetch the target URI and follow the following steps to find the pingback server URI.

Examine the HTTP headers of the response. If there are any X-Pingback headers then the first such header's value should be used as the pingback server URI. Clients MUST examine the HTTP headers if they are able to. If for some reason the HTTP headers are not available to the implementation then this step MAY be skipped.
Otherwise, search the entity body for the first match of the following regular expression: <link +rel="?pingback"? +href=[^ ]+"? *[^>]* */?>
The original regular expression was <link rel="pingback" href="([^"]+)" ?/?>. The new one allows some more flexibility without adding significantly to parsing complexity; it is still a simple regular expression. It allows any number of attributes, as long as rel="pingback" is the first and href is the second. Attribute values may be quoted, as specified in newer HTML versions, or the quotes may be omitted, as in older HTML versions. Any number of blanks between the attributes is allowed, as long as the attributes are separated by at least one blank.
If the regular expression matched, clients MUST then expand the four allowed entities (&amp; for &, &lt; for <, &gt; for >, and &quot; for ").

Having extracted this pingback server URI, it SHOULD be used to send an XML-RPC request as described below.

If there is no X-Pingback header and the regular expression does not match, then the target in question does not support pingback as defined by this specification and the client MAY do whatever it likes. However, it is RECOMMENDED that clients do not attempt to be more lenient (e.g. by correctly parsing the HTML and looking for link elements that look like pingback links from an HTML point of view) because this will lead to some systems recognising the link and others ignoring it.
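The autodiscovery steps can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the response headers are assumed to arrive as a dict with lower-cased names, and the pattern is an adaptation of the one in step 2 that adds a capture group around the href value and keeps the optional quotes outside that group.

```python
import re

# Adapted from the regular expression in step 2; the capture group is an
# addition made for this sketch so the URI can be extracted directly.
LINK_RE = re.compile(r'<link +rel="?pingback"? +href="?([^ ">]+)"? *[^>]* */?>')


def expand_entities(value):
    """Expand only the four allowed entities; &amp; goes last so that
    a sequence such as &amp;lt; is decoded exactly once."""
    for entity, char in (("&lt;", "<"), ("&gt;", ">"),
                         ("&quot;", '"'), ("&amp;", "&")):
        value = value.replace(entity, char)
    return value


def discover_pingback_server(headers, body):
    """Return the pingback server URI for a fetched target, or None.

    headers: dict of response headers with lower-cased names (assumption).
    body:    decoded entity body as a string.
    """
    header_value = headers.get("x-pingback")
    if header_value:
        # Step 1: the HTTP header takes precedence over the link element.
        return header_value.strip()
    match = LINK_RE.search(body)
    if match:
        # Steps 2 and 3: first match, then entity expansion.
        return expand_entities(match.group(1))
    return None


page = '<head><link rel="pingback" href="http://bob.example.net/xmlrpcserver"></head>'
print(discover_pingback_server({}, page))
```

A return value of None corresponds to the case where the target does not support pingback as defined here.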

Clients MAY optimise the search. For example:

The client MAY initially only send an HTTP HEAD request in the hope that the header will be found and the content will not have to be fetched.
Since link elements may only appear in the document's head, clients MAY abort when the strings </head> or <body> are seen (e.g. if the client reads the content one line at a time).
Since the pingback links are most likely to appear near the top of the document, clients MAY abort the search after passing a certain size threshold. Clients MAY similarly use the HTTP Range request header to only fetch the first few kilobytes of the target URI.

Note, however, that these optimisations are prone to being caught out by legitimate documents, for example those having comments containing the strings given above, or those with large inline stylesheets appearing before the pingback link. Authors are encouraged to take these possible optimisations into account when deciding where to place their pingback links.
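Two of these optimisations, aborting at the end of the head and aborting after a size threshold, can be sketched for a client that reads the content one line at a time. The function name scan_head_for_link and the 5 kilobyte default are assumptions, the pattern is the same adapted form with a capture group used for illustration only, and entity expansion is left out for brevity.

```python
import re

# Adapted pattern with a capture group (an addition for this sketch).
LINK_RE = re.compile(r'<link +rel="?pingback"? +href="?([^ ">]+)"? *[^>]* */?>')


def scan_head_for_link(lines, max_bytes=5 * 1024):
    """Scan an iterable of text lines for the pingback link element.

    Stops early when the head ends, the body starts, or more than
    `max_bytes` of content have been examined. Returns the raw href
    value (entities not yet expanded) or None.
    """
    seen = 0
    for line in lines:
        # Check for the link first, in case it shares a line with </head>.
        match = LINK_RE.search(line)
        if match:
            return match.group(1)
        if "</head" in line or "<body" in line:
            return None
        seen += len(line.encode("utf-8"))
        if seen >= max_bytes:
            return None
    return None


doc = ["<html><head>",
       '<link rel="pingback" href="http://bob.example.net/xmlrpcserver">',
       "</head><body>"]
print(scan_head_for_link(doc))
```

As the note above warns, a comment containing one of the abort strings would end this scan prematurely, which is the trade-off of not parsing the HTML.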

XML-RPC Interface

Pingback clients, having discovered a pingback server, SHOULD send the server an XML-RPC request with the method name pingback.ping and two arguments, the source URI and the target URI respectively.
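As an illustration, the request can be built and sent with an XML-RPC library. The sketch below uses Python's standard xmlrpc.client module; the function names build_ping_request and send_ping are hypothetical, and the URIs are the ones from the example in this document.

```python
import xmlrpc.client


def build_ping_request(source_uri, target_uri):
    """Serialise a pingback.ping call. The argument order is fixed:
    source URI first, target URI second."""
    return xmlrpc.client.dumps((source_uri, target_uri),
                               methodname="pingback.ping")


def send_ping(server_uri, source_uri, target_uri):
    """Send the ping over HTTP and return the server's string result.

    Raises xmlrpc.client.Fault if the server answers with a fault.
    Not called below, since it needs a reachable pingback server.
    """
    proxy = xmlrpc.client.ServerProxy(server_uri)
    return proxy.pingback.ping(source_uri, target_uri)


# Show the request body that would be posted to the pingback server.
print(build_ping_request("http://alice.example.org/#p123",
                         "http://bob.example.net/#foo"))
```

Building the body separately from sending it makes it possible to inspect or log the exact XML before any network traffic takes place.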

pingback.ping

Notifies the server that a link has been added to sourceURI, pointing to targetURI.

Parameters

sourceURI of type string: The absolute URI of the post on the source page containing the link to the target site.
targetURI of type string: The absolute URI of the target of the link, as given on the source page.

Return Value

A string, as described below.

Faults

If an error condition occurs, then the appropriate fault code from the following list should be used. Clients can quickly determine the kind of error from bits 5-8. 0x001x fault codes are used for problems with the source URI, 0x002x codes are for problems with the target URI, and 0x003x codes are used when the URIs are fine but the pingback cannot be acknowledged for some other reason.

0: A generic fault code. Servers MAY use this error code instead of any of the others if they do not have a way of determining the correct fault code.
0x0010 (16): The source URI does not exist.
0x0011 (17): The source URI does not contain a link to the target URI, and so cannot be used as a source.
Please note the new fault code below.
0x0012 (18) (new): The source URI could not be parsed because of unknown format (MIME type).
0x0020 (32): The specified target URI does not exist. This MUST only be used when the target definitely does not exist, rather than when the target may exist but is not recognised. See the next error.
0x0021 (33): The specified target URI cannot be used as a target. It either doesn't exist, or it is not a pingback-enabled resource. For example, on a blog, typically only permalinks are pingback-enabled, and trying to pingback the home page, or a set of posts, will fail with this error.
0x0030 (48): The pingback has already been registered.
0x0031 (49): Access denied.
0x0032 (50): The server could not communicate with an upstream server, or received an error from an upstream server, and therefore could not complete the request. This is similar to HTTP's 502 Bad Gateway error. This error SHOULD be used by pingback proxies when propagating errors.

In addition, [FaultCodes] defines some standard fault codes that servers MAY use to report higher level errors.
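The fault codes listed above and the grouping by bits 5-8 can be sketched as follows. The enumeration member names and the function fault_category are hypothetical labels chosen for this illustration; only the numeric values come from the list.

```python
from enum import IntEnum


class PingbackFault(IntEnum):
    """Fault codes from the list above; member names are illustrative."""
    GENERIC = 0
    SOURCE_NOT_FOUND = 0x0010
    SOURCE_HAS_NO_LINK = 0x0011
    SOURCE_UNKNOWN_FORMAT = 0x0012  # new in version 1.01
    TARGET_NOT_FOUND = 0x0020
    TARGET_NOT_USABLE = 0x0021
    ALREADY_REGISTERED = 0x0030
    ACCESS_DENIED = 0x0031
    UPSTREAM_ERROR = 0x0032


def fault_category(code):
    """Classify a fault code by bits 5-8 (the second hex digit)."""
    if code == 0:
        return "generic"
    if not 0 < code <= 0xFF:
        # For example the standard codes defined by [FaultCodes].
        return "unknown"
    groups = {1: "source", 2: "target", 3: "other"}
    return groups.get((code >> 4) & 0xF, "unknown")


print(fault_category(PingbackFault.SOURCE_UNKNOWN_FORMAT))
```

Codes outside the range used by this specification are reported as "unknown" rather than guessed at.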

Servers MUST respond to this function call either with a single string or with a fault code.

If the pingback request is successful, then the return value MUST be a single string, containing as much information as the server deems useful. This string is only expected to be used for debugging purposes.

If the result is unsuccessful, then the server MUST respond with an RPC fault value. The fault code should be either one of the codes listed above, or the generic fault code zero if the server cannot determine the correct fault code.

Clients MAY ignore the return value, whether the request was successful or not. It is RECOMMENDED that clients do not show the result of successful requests to the user.
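A client that does want to look at the result, for instance for logging, might distinguish the two kinds of response as sketched below. The function name parse_ping_response and the shape of its return value are assumptions made for this example; the parsing itself uses Python's standard xmlrpc.client module.

```python
import xmlrpc.client


def parse_ping_response(xml_text):
    """Decode a pingback.ping response body.

    Returns ("ok", message) for a successful response and
    ("fault", code, message) for an XML-RPC fault.
    """
    try:
        params, _ = xmlrpc.client.loads(xml_text)
    except xmlrpc.client.Fault as fault:
        return ("fault", fault.faultCode, fault.faultString)
    return ("ok", params[0])


# Sample responses generated locally; a real client would read them
# from the HTTP response of the pingback server.
success = xmlrpc.client.dumps(("Pingback registered",), methodresponse=True)
failure = xmlrpc.client.dumps(xmlrpc.client.Fault(0x0011, "no link found"))
print(parse_ping_response(success))
print(parse_ping_response(failure))
```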

Upon receiving a request, servers MAY do what they like. However, the following steps are RECOMMENDED:

The server MAY attempt to fetch the source URI to verify that the source does indeed link to the target.

If the server does so, it MUST be able to parse at least the formats text/html and application/xml or application/rss+xml (the RSS format). The server SHOULD send an appropriate HTTP Accept header to inform the server at the other end about acceptable MIME types (formats). The server MAY add any known parsable MIME types to the Accept header.
Option: If the server finds a rev attribute in the corresponding link, it MAY add a rel attribute to its own link and set the value to the rev attribute value found in the sourceURI. Possible link relations for these attributes may be found, for example, at microformats vote-links.
If the document served is not parsable, the server MAY discard the pingback or MAY just include the sourceURI in the article.
The server MAY check its own data to ensure that the target exists and is a valid entry.
The server MAY check that the pingback has not already been registered.
The server MAY record the pingback.
The server MAY regenerate the site's pages (if the pages are static).
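The recommended checks can be sketched for an HTML source document as follows. This is one possible arrangement, not the prescribed one: the names HrefCollector and handle_ping are hypothetical, the source document is assumed to have been fetched already and passed in as a string, and the sets known_targets and registered stand in for whatever storage a real server uses.

```python
from html.parser import HTMLParser
import xmlrpc.client


class HrefCollector(HTMLParser):
    """Collects the href values of all a elements in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def handle_ping(source_uri, target_uri, source_html, known_targets, registered):
    """Validate one ping; return a debug string or raise xmlrpc.client.Fault."""
    collector = HrefCollector()
    collector.feed(source_html)
    collector.close()
    if target_uri not in collector.hrefs:
        raise xmlrpc.client.Fault(0x0011, "source does not link to target")
    if target_uri not in known_targets:
        raise xmlrpc.client.Fault(0x0021, "target cannot be used as a target")
    if (source_uri, target_uri) in registered:
        raise xmlrpc.client.Fault(0x0030, "pingback already registered")
    registered.add((source_uri, target_uri))  # record the pingback
    return "Pingback from %s to %s registered" % (source_uri, target_uri)


targets = {"http://bob.example.net/#foo"}
seen = set()
page = '<p>See <a href="http://bob.example.net/#foo">Bob</a>.</p>'
print(handle_ping("http://alice.example.org/#p123",
                  "http://bob.example.net/#foo", page, targets, seen))
```

Comparing the href values for exact equality is the simplest choice; a real server might normalise URIs before comparing them.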

Conformance Requirements

To claim conformance to this specification a pingback client MUST support server autodiscovery as described in this specification and MUST correctly send pingback XML-RPC calls.

To claim conformance to this specification a pingback server MUST be able to receive pingback XML-RPC calls and MUST always return results that conform to the allowed return values. Returning detailed (non-zero) fault codes is OPTIONAL.

Note that some pingback servers may not have associated pages. For example, a pingback gateway server could be standalone, and other pages would then use the link element to link to this gateway server instead of providing a server of their own. To claim conformance to this specification a pingback-enabled resource MUST have either an HTTP X-Pingback header or a link element in order to allow for server autodiscovery.

To claim conformance to this specification a pingback user agent MUST support server autodiscovery as described in this specification, MUST correctly send pingback XML-RPC calls, MUST be able to receive pingback XML-RPC calls, MUST always return results that conform to the allowed return values (returning detailed (non-zero) fault codes is OPTIONAL), and MUST have either an HTTP X-Pingback header or a link element on all potential target pages in order to allow for server autodiscovery.

Example

Here is a more detailed look at what could happen between Alice and Bob during the example described in the introduction.

Alice posts to her blog. The post she's made includes a link to a post on Bob's blog. The permalink to Alice's new post is http://alice.example.org/#p123, and the URL of the link to Bob's blog is http://bob.example.net/#foo.
Alice's blogging system parses all the external links out of Alice's post, and finds http://bob.example.net/#foo.
It then requests the first 5 kilobytes of the page referred to by the link.
It looks for an X-Pingback header, but fails to find one.
It scans this page fragment for the pingback link tag, which it finds: <link rel="pingback" href="http://bob.example.net/xmlrpcserver"> If this tag had not been contained in the page, then Bob's blog would not support pingback, so Alice's software would have given up here (moving on to the next link found in step 2).
Next, since the link was there, it executes the following XML-RPC call to http://bob.example.net/xmlrpcserver: pingback.ping('http://alice.example.org/#p123', 'http://bob.example.net/#foo')
Alice's blogging system repeats steps 3 to 6 for each external link that was found in the post.

There ends the work undertaken by Alice's system. The rest of the work is performed by Bob's blog.

Bob's blog receives a ping from Alice's blog (the ping sent in step 6 above), naming http://alice.example.org/#p123 (the site linking to Bob) and http://bob.example.net/#foo (the page Alice linked to).
Bob's blog confirms that http://bob.example.net/#foo is in fact a post on this blog.
It then requests the content of http://alice.example.org/#p123 and checks the Content-Type of the entity returned to make sure it is text of some sort.
It verifies that this content does indeed contain a link to http://bob.example.net/#foo (to prevent spamming of pingbacks).
Bob's blog also retrieves other data required from the content of Alice's new post, such as the page title, an extract of the page content surrounding the link to Bob's post, any attributes indicating which language the page is in, and so forth.
Finally, Bob's post records the pingback in its database, and regenerates the static pages referring to Bob's post so that they mention the pingback.

Manual example

Now let's assume something other than a blog, and thus no pingback client whatsoever. Alice can still send a pingback to Bob's blog:

Alice publishes something on the web. Preferably this should be some kind of HTML document, but it might be anything publishable on the web. This document is somehow related to an article in Bob's blog. Alice knows its URI to be http://bob.example.net/#foo.
Alice downloads Bob's document with some client that allows her to take a look at the HTTP response headers. She might for example use wget (a Windows version exists, too): wget -S -O bob.html http://bob.example.net/#foo Now Alice has the HTTP response headers on the screen and Bob's HTML file locally.
Alice looks for an X-Pingback header on the screen, but fails to find one.
Alice now opens the saved file and looks for the pingback link tag and finds it, containing <link rel="pingback" href="http://bob.example.net/xmlrpcserver"> If this tag had not been contained in the page, then Bob's blog would not support pingback, so Alice would have given up here.
Next, since the link was there, Alice opens any text editor and manually creates an XML file containing her sourceURI and the URI of Bob's article as targetURI:
    <?xml version="1.0"?>
    <methodCall>
      <methodName>pingback.ping</methodName>
      <params>
        <param><value><string>http://alice.example.org/#p123</string></value></param>
        <param><value><string>http://bob.example.net/#foo</string></value></param>
      </params>
    </methodCall>
Now Alice manually sends an XML-RPC request to Bob, for example again using wget: wget --header='Content-Type: text/xml' -O response.xml --post-file=$XML http://bob.example.net/xmlrpcserver Replace $XML with the name of the XML file just created.
Alice gets a file named response.xml, which she may now check for success or fault codes.

References

RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. IETF, March 1997. RFC 2119 is available at http://www.normos.org/ietf/rfc/rfc2119.txt.
RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter. IETF, August 1998. RFC 2396 is available at http://www.normos.org/ietf/rfc/rfc2396.txt.
XML-RPC: XML-RPC Specification, D. Winer. UserLand Software, Inc., June 1999. XML-RPC is available at http://www.xmlrpc.com/spec.
FaultCodes: Specification for Fault Code Interoperability, D. Libby, et al. May 2001. The Specification for Fault Code Interoperability is available at http://xmlrpc-epi.sourceforge.net/specs/rfc.fault_codes.php.
