
Meet the Extensible Messaging and Presence Protocol (XMPP)

Instant messaging (IM) is a popular application among casual Internet users as well as business users. It lets users not only communicate with others in real time but also see their presence information (available, away from the computer, offline, and so on). One of the earliest open IM protocols was Jabber, which Jeremie Miller began developing as a nonstandard IM protocol in 1998. As an extensible protocol built with XML, Jabber quickly found other applications as a general transport or message-oriented middleware (MoM). Eventually, XMPP arose from Jabber as a standards-based protocol, published by an IETF working group as RFC 3920, “Extensible Messaging and Presence Protocol (XMPP).”

XMPP is not alone as a general-purpose messaging transport. Other popular protocols such as XML-RPC and SOAP can provide this capability with function call-like semantics. Newer methods such as Representational State Transfer (REST) provide managed access to resources, using URLs to specify the location and object and HTTP methods to specify the operation.

XMPP architecture

XMPP has similarities to other application-layer protocols like SMTP. In these architectures, a client with a unique name communicates with another client with a unique name through an associated server. Each client implements the client form of the protocol, while the server provides routing capability. Figure 1 illustrates this simple architecture. In this case, each client is part of the same domain (discovery.nasa.guv).

Figure 1. A simple XMPP architecture, consisting of a server and two clients
Diagram of a simple XMPP architecture, consisting of a server and two clients

Servers can also communicate for purposes of routing between domains (for example, between discovery.nasa.guv and europa.nasa.guv). Further, gateways can exist for purposes of translating between foreign messaging domains and protocols. The example in Figure 2 shows an XMPP network with gateways to a Short Message Service (SMS) domain and an SMTP domain. Gateways are most often used in this context to translate between IM protocols (for example, XMPP to Internet Relay Chat [IRC]). As an extensible protocol, XMPP is an ideal backbone protocol to provide universal connectivity among different endpoint protocols. The XMPP gateway permits the termination of a given client-to-server session and the initiation of a new session to the target endpoint protocol (along with the necessary protocol translation).
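The routing decision described above can be sketched in a few lines of plain Ruby. This is only an illustration, not how any particular XMPP server is implemented: the gateway domains (sms.nasa.guv, mail.nasa.guv), the user names, and the route helper are all assumptions made for the example.

```ruby
# Hypothetical routing decision for a server in the discovery.nasa.guv domain.
LOCAL_DOMAIN = 'discovery.nasa.guv'

# Assumed gateway table: foreign domain => protocol the gateway translates to.
GATEWAYS = {
  'sms.nasa.guv'  => :sms,
  'mail.nasa.guv' => :smtp
}.freeze

# Decide what to do with a stanza addressed to to_jid.
def route(to_jid)
  # Drop the optional resource, then keep whatever follows the "@".
  domain = to_jid.split('/', 2).first.split('@', 2).last
  if domain == LOCAL_DOMAIN
    [:deliver_locally, domain]        # recipient is on this server
  elsif GATEWAYS.key?(domain)
    [:translate, GATEWAYS[domain]]    # hand off to a protocol gateway
  else
    [:forward_to_server, domain]      # server-to-server routing
  end
end

p route('FrankPoole@discovery.nasa.guv/terminal')
p route('hal@europa.nasa.guv')
p route('15551234@sms.nasa.guv')
```

Keeping the gateway table separate from the routing logic mirrors the idea that a gateway is just another destination from the server's point of view.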

Figure 2. A more complex XMPP architecture, including XMPP gateways
Diagram of a more complex XMPP architecture, with XMPP gateways connecting to SMS and SMTP clients and servers

Addresses in XMPP

Addresses (or Jabber IDs [JIDs]) in XMPP are similar to standard e-mail addresses with a couple of notable differences. JIDs include an optional node, a domain, and an optional resource in the form:

[ node "@" ] domain [ "/" resource ]

The most common use is defining an IM user (like an e-mail address), such as DavidBowman@discovery.nasa.guv. A user can log in to an XMPP server multiple times, and in this case, the resource can denote a location. For example, the sample user might have a JID for his main terminal (DavidBowman@discovery.nasa.guv/terminal) and another JID (session) from an EVA pod (DavidBowman@discovery.nasa.guv/eva_pod1). A sender can therefore target a particular location or omit the resource to reach the user at whichever location he happens to be logged in.
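To make the address format concrete, here is a minimal sketch that splits a JID into its node, domain, and resource. SimpleJID and parse_jid are hypothetical names invented for this example (a real XMPP library supplies its own JID class), and the sketch skips the character and length validation a real implementation would need.

```ruby
# Hypothetical value object for the three parts of a JID.
SimpleJID = Struct.new(:node, :domain, :resource) do
  # The bare JID omits the resource, so it names the user, not one session.
  def bare
    node ? "#{node}@#{domain}" : domain
  end

  def to_s
    resource ? "#{bare}/#{resource}" : bare
  end
end

# Parse [ node "@" ] domain [ "/" resource ] without validating characters.
def parse_jid(text)
  # Everything after the first "/" is the resource.
  address, _, resource = text.partition('/')
  resource = nil if resource.empty?

  # Everything before the first "@" is the node; it may be absent.
  if address.include?('@')
    node, _, domain = address.partition('@')
  else
    node = nil
    domain = address
  end
  raise ArgumentError, "JID needs a domain: #{text.inspect}" if domain.empty?

  SimpleJID.new(node, domain, resource)
end

terminal = parse_jid('DavidBowman@discovery.nasa.guv/terminal')
pod      = parse_jid('DavidBowman@discovery.nasa.guv/eva_pod1')
puts terminal.resource
puts terminal.bare == pod.bare
```

Both sessions share one bare JID, which is what lets a sender omit the resource and still reach the user.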

The XMPP protocol

XMPP is a relatively simple protocol that runs over TCP sockets using XML messages. Asynchronous communication occurs within XML streams and with XML stanzas. An XML stream is an envelope that encapsulates the exchange of XML information between two entities. XML streams communicate XML stanzas, which are discrete units of information. For example, XML stanzas are used within XMPP to communicate messages (text between IM users) as well as presence information. To illustrate these concepts, look at a simple example of an IM communication using XMPP between two clients.

Figure 3 illustrates a simple conversation between two entities. Note that at least one server appears within the conversation (in this case, because both clients exist within the same domain, there’s exactly one server). In Figure 3, the left client is known as the initiating entity (it initiates the XMPP communication between the two entities). This XML stream uses the to attribute to identify the receiving domain (as well as define the XML namespace). The receiving client on the right receives this XML stream and responds with an XML stream response (in this case, using the from attribute). At this stage, a number of different negotiations are possible, such as authentication and encryption. Ignore this aspect for this discussion (in addition to server-to-server communication when IM clients appear in separate domains).

Figure 3. Sample (simplified) XMPP communication
Diagram of a sample (simplified) XMPP communication

The next step in the XML stream from Figure 3 is communicating messages. This communication occurs within the message stanza and includes the source and destination XMPP addresses (from and to), the language used, and a message contained within the body of the stanza. The peer responds with its own message, the key difference being the source and destination XMPP addresses. Finally, the XML stream is closed by issuing the stream closure message (which occurs on both sides of the connection).
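The sequence just described (open the stream, exchange message stanzas, close the stream) can be sketched as the XML that one side would write to the socket. This is a simplified illustration that builds the strings by hand: the helper names, the hal@discovery.nasa.guv address, and the message text are assumptions for the example, and negotiation steps such as authentication are left out, as in Figure 3.

```ruby
# Minimal escaping for XML text and attribute values.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Opening tag of the stream; it stays open until stream_close is sent.
def stream_open(to_domain)
  "<stream:stream to='#{xml_escape(to_domain)}' xmlns='jabber:client' " \
    "xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>"
end

# A message stanza: source, destination, language, and a body.
def message_stanza(from:, to:, body:, lang: 'en')
  "<message from='#{xml_escape(from)}' to='#{xml_escape(to)}' " \
    "xml:lang='#{xml_escape(lang)}'>" \
    "<body>#{xml_escape(body)}</body></message>"
end

# Closing tag that ends the stream.
def stream_close
  '</stream:stream>'
end

conversation = [
  stream_open('discovery.nasa.guv'),
  message_stanza(from: 'DavidBowman@discovery.nasa.guv/eva_pod1',
                 to: 'hal@discovery.nasa.guv',
                 body: 'Open the pod bay doors, HAL.'),
  stream_close
]
puts conversation
```

Note that the stream element is deliberately left unclosed while stanzas flow inside it; in practice an XMPP library handles this framing for you.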

Either side can return an error, such as the one shown below. In this case, the peer sent an XML stream or stanza that was invalid.

  <xml-not-well-formed xmlns='urn:ietf:params:xml:ns:xmpp-streams'/>

Although this example demonstrated simple IM communication, it’s easy to see how the message stanzas might be transformed into RPC messages, piggybacking the security aspects from the peer negotiation. Instead of users within a domain, you can register functions as the nodes to build a dynamic Web services framework. Now, let’s look at how to build a simple application that communicates across XMPP.
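As a rough sketch of that idea, the node portion of the destination address can select a registered function, with the message body carrying its arguments. Everything here is hypothetical: the FUNCTIONS registry, the dispatch helper, the rpc.discovery.nasa.guv domain, and the space-separated argument format are invented for illustration and do not correspond to any standard XMPP RPC mechanism.

```ruby
# Hypothetical registry: node name (the part before "@") => callable.
FUNCTIONS = {
  'add'    => ->(args) { args.map { |a| Integer(a) }.sum.to_s },
  'upcase' => ->(args) { args.join(' ').upcase }
}.freeze

# Treat the destination node as the function name and the message body
# as a space-separated argument list; return the text of the reply.
def dispatch(to_jid, body)
  node = to_jid.split('@', 2).first
  handler = FUNCTIONS[node]
  return "no such function: #{node}" unless handler

  handler.call(body.split)
end

puts dispatch('add@rpc.discovery.nasa.guv', '2 3 4')
puts dispatch('upcase@rpc.discovery.nasa.guv', 'open the doors')
puts dispatch('launch@rpc.discovery.nasa.guv', '')
```

In a real agent, dispatch would be called from the message callback with the stanza's destination address and body, and its return value would be sent back to the originator.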

An XMPP example with Ruby

One of the other interesting aspects of XMPP is the large number of libraries you can choose from, including a wide spectrum of languages. This example uses the Ruby language and the XMPP4R library.

To demonstrate XMPP through a library, explore the development of a simple IM agent that acts as a technical dictionary. In this way, through a standard instant messenger, you can type a word, and the IM agent will return its definition.

This example implements an IM agent that connects through XMPP to another IM agent and, once connected, resolves words to definitions. Listing 1 provides the simple XMPP agent.

Listing 1. Simple XMPP agent for word definitions
require 'xmpp4r/client'

# Create a very simple dictionary using a hash
hash = {}
hash['ruby'] = 'Greatest little object oriented scripting language'
hash['xmpp4r'] = 'Simple XMPP library for ruby'
hash['xmpp'] = 'Extensible Messaging and Presence Protocol'

# Connect to the server and authenticate
jid = Jabber::JID.new('bot@default.rs/Home')
cl = Jabber::Client.new(jid)
cl.connect
cl.auth('password')   # placeholder: replace with the bot account's password

# Indicate our presence to the server
cl.send Jabber::Presence.new

# Send a salutation to a given user that we're ready
salutation = Jabber::Message.new('hal@default.rs', 'DictBot ready')
cl.send salutation

# Add a message callback to respond to peer requests
cl.add_message_callback do |inmsg|
  # Skip stanzas that carry no text (for example, typing notifications)
  next if inmsg.body.nil?

  # Look up the word in the dictionary
  resp = hash[inmsg.body]
  if resp.nil?
    resp = "don't know about " + inmsg.body
  end

  # Send the response
  outmsg = Jabber::Message.new(inmsg.from, resp)
  cl.send outmsg
end

# Keep the main thread alive; XMPP4R runs the callback in the background
loop { sleep 1 }

Listing 1 begins with the creation of a simple dictionary. For this purpose, you use the Hash class in Ruby, which allows you to create key-value pairs (in what appears to be an array), but then easily reference them later by key. Next, you use the XMPP4R library to connect to a server. Start by creating a JID and a new client connection with the Client class. To actually connect to the IM server, use the connect method. Once connected, you call the auth method with the password. The connection is then ready for messaging.

The next step (optional) is to indicate your presence to the IM server. To do this, send a presence stanza to the server. You also send an optional message to the peer to indicate that you’re available. To do this, create a message stanza and initialize it with the peer address and a message. After the message is initialized, send it using your Client class instance with the send method.

To react to messages sent to you, use the add_message_callback method of your client connection. Whenever a message arrives, the block of code is invoked to handle the message. The incoming message is represented as inmsg (Message instance). Start by checking to see whether the word defined by the incoming message body is a word in your dictionary. If a nil was returned, the word was not found, so you provide a default response. Construct a new message using the source of the incoming message (inmsg.from) and your response string. With the initialization complete, send the new message through your client instance to the originator.
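The lookup step inside the callback can also be pulled out into a small function, which makes it easy to exercise without a server. The sketch below is one possible variation, not the listing's own code: the reply_for name is hypothetical, and trimming whitespace and lowercasing the word are assumptions added so that input such as "XMPP " still matches.

```ruby
DICTIONARY = {
  'ruby'   => 'Greatest little object oriented scripting language',
  'xmpp4r' => 'Simple XMPP library for ruby',
  'xmpp'   => 'Extensible Messaging and Presence Protocol'
}.freeze

# Return the definition for a message body, or a default reply
# when the word is not in the dictionary.
def reply_for(body)
  # Assumption: normalize the input so case and stray spaces do not matter.
  word = body.to_s.strip.downcase
  DICTIONARY.fetch(word) { "don't know about #{word}" }
end

puts reply_for('XMPP ')
puts reply_for('soap')
```

Hash#fetch with a block replaces the explicit nil check: the block runs only when the key is missing.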

Figure 4 shows a sample run of the application. This example uses the popular Pidgin universal chat client. The Pidgin client supports all the major chat protocols and can be used with many of the available chat networks (even simultaneously). Figure 4 shows the messaging pop-up window created when your IM agent connects to the server and begins a conversation with the defined user.

Figure 4. Sample IM session with the IM agent
Screen capture of the sample IM session with the IM agent

This application was extremely simple, but XMPP4R provides a large number of classes and methods for other functionality, such as account registration, discovery, file transfers, multi-user chat, publish/subscribe, and even RPC. You can find a “browseable” class API that provides a convenient way to view all the XMPP4R files, classes, and methods.

Applications of XMPP

XMPP provides a general framework for messaging across a network. Not surprisingly, this has a multitude of applications beyond traditional IM and the distribution of presence data.

A close application to IM is group or multi-party messaging or the development of multi-user chat rooms. With multi-party communication, features similar to micro-blogging as provided by Twitter can be implemented. But text is not the only data that can be transmitted through XMPP. Other forms of communication could include audio, image, and video data.

Service discovery protocols exist today (such as Bonjour, or the Service Location Protocol), but XMPP provides a solid base for both discovery of services on a network and advertisement of services and capabilities.

Online gaming could make considerable use of XMPP. XMPP natively provides a crucial set of features for online games, including authentication, presence information, chat, and extensible near-real-time communication of game state information.

Finally, XMPP is a perfect protocol for the new era of cloud computing. Cloud computing and storage systems rely on various levels and forms of communication, including not only messaging between systems to relay state but also migration of larger objects, such as storage or virtual machines. Along with authentication and in-transit data protection, XMPP can be applied at a variety of levels and is ideal as a middleware protocol.

Note here that the majority of the applications have nothing to do with human communication but instead with machine communication (M2M, or machine-to-machine communication). It’s quite interesting that a protocol intended for IM is finding such diverse uses.

Multi-language XMPP

XMPP is implemented as a set of libraries providing XMPP capabilities to the application. It’s easy to tell how useful XMPP is as a protocol from the vast number of languages that now support it. You’ll find XMPP library software for traditional languages like C, C++, and the Java™ language, and for popular scripting languages like Ruby, Python, Perl, and Tcl. You’ll also find XMPP libraries for languages like Erlang, C#, and Lisp. Therefore, whatever your environment, there’s likely an XMPP library that you can use to gain XMPP access.

Going further

Many useful technologies are often applied in ways their originators never considered. For example, HTTP is the de facto standard protocol for serving Web pages over the Internet, but it is also used as an application-layer transport for other protocols like SOAP and XML-RPC (including protocol models like REST). XMPP is another useful technology that is finding many new applications beyond simply IM. How will you apply XMPP to your solution today?