IBM Developer Advocacy

logshare



Glynn Bird
3/14/16

logshare is a command-line tool that lets you share real-time, streaming text data with your colleagues.

I work for IBM, but I don’t drive to an IBM office every day; I work from home instead. The rest of my team is 250 miles away in Bristol or thousands of miles away in the USA and beyond. I work closely with other IBMers using Slack as a virtual office, but sometimes you need your colleagues to see what you’re seeing on your screen. Imagine I’m developing and running an app locally, and it writes logs to a terminal or a log file. I can watch the logs go by with:

    > tail -f thelogs.txt

but my remote colleagues can’t. I could share my screen with them, but that is bandwidth-heavy and doesn’t allow folks to scroll through the logs themselves, or cut-and-paste bits out. This frustration brought about the development of logshare, the simple log-sharing service.

[Figure: logshare schematic]

logshare is a command-line tool that lets you share real-time, streaming text data with your colleagues. The consumers of the data can see it either in their terminal or on a web page.

Install logshare

logshare is a Node.js app and is published to the npm registry, so installing it is as simple as:

    > npm install -g logshare

This installs the logshare command-line utility and its dependencies globally on your machine. You can then begin sharing streaming data immediately:

    > tail -f /var/log/system.log | logshare
    Share URL: https://logshare.mybluemix.net/share/kkdgapgdx

logshare outputs a URL which you can share with the folks who also need to view the logs. If they open the URL in their web browser, they see a real-time stream of logs:

[Figure: screenshot]

If your colleagues have also installed logshare and would prefer to see the shared data on their terminals, then they can type:

    > logshare kkdgapgdx

where kkdgapgdx is the unique token generated for each logshare session.

How does it work?

logshare consists of two software projects:

  • logshare-server is the server-side code that hosts the website, publishes the sharing API, and hosts the web-based sharing tool
  • logshare-client is the client-side code that either publishes or consumes data on the command-line

The logshare-server code is deployed to Bluemix, IBM’s platform-as-a-service, where it connects to a Redis service to handle pubsub and meta data storage, and to a Cloudant NoSQL service, which records the stats for each completed logshare session.

Redis

Redis is an in-memory database that lets you store and retrieve simple data structures very quickly. It also has pubsub channels that broker the flow of data between the producer of the data (the originator of the logshare session) and zero or more consumers of it (other command-line or web-based clients). Each logshare session results in the creation of a new pubsub channel to which incoming data is published. Every command-line and web client connects to the logshare server via a WebSocket connection. As new data arrives on the pubsub channel, it is dispatched to the WebSocket clients subscribed to that session.

[Figure: pubsub]
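The fan-out from one producer to many consumers can be sketched in a few lines. This is only an illustration, not the logshare-server code: Node's built-in EventEmitter stands in for Redis, plain callbacks stand in for WebSocket clients, and the channel naming scheme and the function names (channelFor, subscribe, publish) are hypothetical.

```javascript
// Minimal in-memory stand-in for the pubsub fan-out described above.
// Assumption: EventEmitter replaces Redis, and plain callbacks replace
// WebSocket clients. All names here are hypothetical.
const { EventEmitter } = require('events');

const broker = new EventEmitter();

// Hypothetical channel name: one channel per logshare session token.
function channelFor(token) {
  return 'logshare_' + token;
}

// A consumer (terminal or web client) subscribes to one session.
// Returns a function that unsubscribes the consumer again.
function subscribe(token, onLine) {
  const channel = channelFor(token);
  broker.on(channel, onLine);
  return () => broker.off(channel, onLine);
}

// The producer publishes each line to its session's channel.
// Returns true if at least one consumer was listening.
function publish(token, line) {
  return broker.emit(channelFor(token), line);
}

// Usage: two consumers on one session, none on another.
const seenByAlice = [];
const seenByBob = [];
const leaveAlice = subscribe('kkdgapgdx', (line) => seenByAlice.push(line));
subscribe('kkdgapgdx', (line) => seenByBob.push(line));

publish('kkdgapgdx', 'GET /index.html 200');
leaveAlice(); // Alice disconnects and misses the next line
publish('kkdgapgdx', 'GET /missing 404');
const delivered = publish('othertoken', 'nobody is listening');
```

A single-process broker like this only reaches clients connected to the same server, which is the limitation an external broker removes.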

The same effect could be achieved without Redis, but if we want our app to scale across multiple logshare servers, then we need Redis to pass the data between the servers:

[Figure: pubsub across multiple servers]

Redis also stores the meta data about a log-sharing session in a Redis hash, including:

  • start date
  • end date
  • number of lines of data
  • number of bytes of data

The arrival of every line of data results in the associated meta data record being updated. Redis is a good fit for this task because its in-memory storage provides low-latency commands that increment values in place:

   HINCRBY logshare_kkdgapgdx_meta messages 1
   HINCRBY logshare_kkdgapgdx_meta bytes 251
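As a rough sketch of how the two commands above might be produced for each incoming line, the function below builds them as plain arrays without talking to Redis. The function name metaUpdateCommands is hypothetical, the key pattern is copied from the examples above, and counting bytes with Buffer.byteLength is an assumption rather than a description of what logshare-server does.

```javascript
// Builds the Redis commands for one incoming line of data.
// Assumptions: the key pattern and field names follow the two HINCRBY
// examples above; the function name is hypothetical.
function metaUpdateCommands(token, line) {
  const key = 'logshare_' + token + '_meta';
  // Count bytes rather than characters, so multi-byte UTF-8 text is
  // measured correctly.
  const bytes = Buffer.byteLength(line, 'utf8');
  return [
    ['HINCRBY', key, 'messages', 1],
    ['HINCRBY', key, 'bytes', bytes],
  ];
}

// 'h\u00e9llo' is 5 characters but 6 bytes in UTF-8.
const commands = metaUpdateCommands('kkdgapgdx', 'h\u00e9llo');
console.log(commands.map((c) => c.join(' ')).join('\n'));
```

Because HINCRBY is atomic, several servers could apply these increments to the same hash without reading the current value first.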

Deploying and maintaining a multi-node Redis cluster is easy with Compose.io, which provides dedicated Redis hosting in a choice of data centres with a 30-day free trial.

API

The logshare server has an HTTP API that lets you start and stop logshare sessions and publish data. There is also a WebSocket API that lets you publish and subscribe to data. The logshare-client project uses a combination of the HTTP and WebSocket APIs to publish and consume data.

Cloudant

Cloudant is a scalable NoSQL database run as-a-service by IBM, with free, pay-as-you-go, and dedicated tiers.

The Cloudant component of this project is optional: it is used to archive the logshare meta data once a sharing session has completed. The meta data is converted into a JSON object and stored in a Cloudant database:

{
  "_id": "f9f6122a76f64d8790b2351714f07622",
  "_rev": "1-bc9567e660f69e6709461c89534c6c3d",
  "start": "2016-02-23T09:17:33+00:00",
  "messages": 17,
  "bytes": 1327,
  "end": "2016-02-23T09:18:34+00:00",
  "duration": 61
}
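The conversion from a Redis hash to a document like the one above might look roughly like this. It is a sketch under assumptions: the function name toArchiveDoc is hypothetical, the hash is assumed to hold string values under the field names start, end, messages and bytes, and the duration is assumed to be the number of seconds between start and end (which matches the 61 in the example).

```javascript
// Converts a session's Redis hash into an archive document.
// Assumptions: the hash holds string values (as Redis returns them) under
// the field names start, end, messages and bytes; the function name is
// hypothetical. _id and _rev are left for the database to assign.
function toArchiveDoc(meta) {
  const startMs = Date.parse(meta.start);
  const endMs = Date.parse(meta.end);
  return {
    start: meta.start,
    messages: parseInt(meta.messages, 10),
    bytes: parseInt(meta.bytes, 10),
    end: meta.end,
    // Duration in whole seconds between the start and end timestamps.
    duration: Math.round((endMs - startMs) / 1000),
  };
}

const archiveDoc = toArchiveDoc({
  start: '2016-02-23T09:17:33+00:00',
  end: '2016-02-23T09:18:34+00:00',
  messages: '17',
  bytes: '1327',
});
console.log(JSON.stringify(archiveDoc, null, 2));
```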

A MapReduce view calculates totals and averages across the entire meta data collection:

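function (doc) {
  // Skip documents that have no (or zero) duration.
  if (doc.duration) {
    // "2016-02-23T09:17:33+00:00" -> "2016-02-23"
    var bits = doc.start.split('T');
    var d = bits[0];
    var datebits = d.split('-');
    var year = parseInt(datebits[0], 10);
    var month = parseInt(datebits[1], 10);
    var day = parseInt(datebits[2], 10);
    // Key: [year, month, day]; value: [messages, bytes, duration].
    emit([year, month, day], [doc.messages, doc.bytes, doc.duration]);
  }
}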

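Making the key of the MapReduce index an array lets us group the data at query-time (using the group_level parameter) by year, year & month, or year & month & day. The built-in “_stats” reducer returns the sum, count, minimum, maximum, and sum of squares of the emitted values, so the data can be summarised (with averages derived as sum divided by count) as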

  • number of messages published
  • number of bytes published
  • average duration

for a given time period.
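To make the query-time grouping concrete, here is a small local simulation that needs no database. It is a sketch only: the sample documents are invented for illustration, the helper names mapDoc and groupAt are hypothetical, and the code mimics just the sum and count parts of a stats-style reduce.

```javascript
// Local simulation of querying the view at different group levels.
// Assumptions: the sample documents are invented for illustration, and
// this only mimics the sum and count parts of a stats-style reducer.
const docs = [
  { start: '2016-02-23T09:17:33+00:00', messages: 17, bytes: 1327, duration: 61 },
  { start: '2016-02-23T14:02:10+00:00', messages: 3, bytes: 200, duration: 19 },
  { start: '2016-03-01T08:00:00+00:00', messages: 10, bytes: 500, duration: 30 },
];

// Same key and value shape as the view: key is [year, month, day].
function mapDoc(doc) {
  const parts = doc.start.split('T')[0].split('-');
  const key = parts.map((p) => parseInt(p, 10));
  return { key, value: [doc.messages, doc.bytes, doc.duration] };
}

// Truncate each key to `level` elements and total the values per group.
function groupAt(level) {
  const groups = new Map();
  for (const { key, value } of docs.map(mapDoc)) {
    const groupKey = JSON.stringify(key.slice(0, level));
    const g = groups.get(groupKey) || { count: 0, sum: [0, 0, 0] };
    g.count += 1;
    g.sum = g.sum.map((total, i) => total + value[i]);
    groups.set(groupKey, g);
  }
  return groups;
}

// Group by year & month, then derive the average duration for February.
const byMonth = groupAt(2);
const feb = byMonth.get('[2016,2]');
const febAverageDuration = feb.sum[2] / feb.count; // (61 + 19) / 2
console.log(feb, febAverageDuration);
```

Calling groupAt(1) groups by year only, and groupAt(3) groups by individual day.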

Privacy and data retention

This project makes no guarantees as to the privacy of the data that you stream to logshare. If you are using https://logshare.mybluemix.net, then the data is encrypted between the producer and the server, and between the server and the consumers. However, there is no authentication mechanism to prevent an unknown third party from observing your data stream if they can guess the nine-character session token, so don’t consider it safe for confidential data. It is designed to relay streaming data across development teams temporarily, not for anything you wouldn’t want others to see.

This project does not store your data at any time. Log data is published to a Redis pubsub channel and relayed immediately to any connected clients that have subscribed to that session. The data is then discarded; only meta data about the session (the number of lines of data and the number of bytes of data received) is retained.
