Hi. I am Prachi, a backend developer for IBM Graph, a fully-managed, enterprise-grade graph database service built on the cloud. Our development team works via a continuous delivery pipeline to regularly add new features, enhance existing ones and deliver bug fixes. Several weeks ago, I was working on the backend code to improve the graph upload experience, adding REST API methods for asynchronous graph uploads. When the service receives an asynchronous graph upload request, it notifies the user that the request has been accepted and generates an upload Id. The upload Id can then be used to query the status of the upload via the service’s REST API, as in the following commands. This setup provides a nice user experience: users are not blocked waiting on a large upload or a slow service.

# Session auth
curl -X GET -H 'Content-Type:application/json' -u 'cffb672f-fe5e-4810-a5da-a6ce182014e2:2eafd208-841d-4afd-aa35-6bdb2214d84b' https://ibmgraph-alpha.ng.bluemix.net/32b7fa84-df0e-4546-b38e-74a71a1e69c7/_session
{"gds-token":"Y2ZmYjY3MmYtZmU1ZS00ODEwLWE1ZGEtYTZjZTE4MjAxNGUyOjE0Nzk1MDgxMTUxNTU6eXZNdUxBSGxNSXYvUEszM3pMVDJEakh6QlVkRFdEdStucFFiRFd2d2xmcz0="}

# Asynchronous graph upload
curl -X POST -H 'Content-Type:multipart/form-data' -H 'Authorization: gds-token Y2ZmYjY3MmYtZmU1ZS00ODEwLWE1ZGEtYTZjZTE4MjAxNGUyOjE0Nzk1MDgxMTUxNTU6eXZNdUxBSGxNSXYvUEszM3pMVDJEakh6QlVkRFdEdStucFFiRFd2d2xmcz0=' -F 'graphml=@./air-routes-small.graphml' https://ibmgraph-alpha.ng.bluemix.net/32b7fa84-df0e-4546-b38e-74a71a1e69c7/g/uploads/graphml
{"uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","operation":"bulkload","status":"ACCEPTED","code":202}

# Graph upload status using uploadId
curl -X GET -H 'Content-Type:application/json' -H 'Authorization: gds-token Y2ZmYjY3MmYtZmU1ZS00ODEwLWE1ZGEtYTZjZTE4MjAxNGUyOjE0Nzk1MDgxMTUxNTU6eXZNdUxBSGxNSXYvUEszM3pMVDJEakh6QlVkRFdEdStucFFiRFd2d2xmcz0=' https://ibmgraph-alpha.ng.bluemix.net/32b7fa84-df0e-4546-b38e-74a71a1e69c7/g/uploads/502f4f57-f60c-4e92-ae9a-63eca980817a/status
{"uploads":[{"serviceId":"32b7fa84-df0e-4546-b38e-74a71a1e69c7","graphId":"g","uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","startTimestamp":1479508238184,"completionTimestamp":null,"statusCode":202,"statusMessage":"ACCEPTED","type":"bulkload"},{"serviceId":"32b7fa84-df0e-4546-b38e-74a71a1e69c7","graphId":"g","uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","startTimestamp":1479508238184,"completionTimestamp":1479508249550,"statusCode":201,"statusMessage":"COMPLETED","type":"bulkload"}]}

Part of this effort required storing state in a Cloudant database. Initially, I added three indexes to query upload status in different ways – using the Service Id, the Graph Id and the Upload Id. These queries looked like this:

# Graph upload status using uploadId
curl -X GET -H 'Content-Type:application/json' -H 'Authorization: gds-token Y2ZmYjY3MmYtZmU1ZS00ODEwLWE1ZGEtYTZjZTE4MjAxNGUyOjE0Nzk1MDgxMTUxNTU6eXZNdUxBSGxNSXYvUEszM3pMVDJEakh6QlVkRFdEdStucFFiRFd2d2xmcz0=' https://ibmgraph-alpha.ng.bluemix.net/32b7fa84-df0e-4546-b38e-74a71a1e69c7/g/uploads/502f4f57-f60c-4e92-ae9a-63eca980817a/status
{"uploads":[{"serviceId":"32b7fa84-df0e-4546-b38e-74a71a1e69c7","graphId":"g","uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","startTimestamp":1479508238184,"completionTimestamp":null,"statusCode":202,"statusMessage":"ACCEPTED","type":"bulkload"},{"serviceId":"32b7fa84-df0e-4546-b38e-74a71a1e69c7","graphId":"g","uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","startTimestamp":1479508238184,"completionTimestamp":1479508249550,"statusCode":201,"statusMessage":"COMPLETED","type":"bulkload"}]}

# Graph upload status using graphId
curl -X GET -H 'Content-Type:application/json' -H 'Authorization: gds-token Y2ZmYjY3MmYtZmU1ZS00ODEwLWE1ZGEtYTZjZTE4MjAxNGUyOjE0Nzk1MDgxMTUxNTU6eXZNdUxBSGxNSXYvUEszM3pMVDJEakh6QlVkRFdEdStucFFiRFd2d2xmcz0=' https://ibmgraph-alpha.ng.bluemix.net/32b7fa84-df0e-4546-b38e-74a71a1e69c7/g/uploads/status
{"uploads":[{"serviceId":"32b7fa84-df0e-4546-b38e-74a71a1e69c7","graphId":"g","uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","startTimestamp":1479508238184,"completionTimestamp":null,"statusCode":202,"statusMessage":"ACCEPTED","type":"bulkload"},{"serviceId":"32b7fa84-df0e-4546-b38e-74a71a1e69c7","graphId":"g","uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","startTimestamp":1479508238184,"completionTimestamp":1479508249550,"statusCode":201,"statusMessage":"COMPLETED","type":"bulkload"}]}

# Graph upload status using serviceId
curl -X GET -H 'Content-Type:application/json' -H 'Authorization: gds-token Y2ZmYjY3MmYtZmU1ZS00ODEwLWE1ZGEtYTZjZTE4MjAxNGUyOjE0Nzk1MDgxMTUxNTU6eXZNdUxBSGxNSXYvUEszM3pMVDJEakh6QlVkRFdEdStucFFiRFd2d2xmcz0=' https://ibmgraph-alpha.ng.bluemix.net/32b7fa84-df0e-4546-b38e-74a71a1e69c7/uploads/status
{"uploads":[{"serviceId":"32b7fa84-df0e-4546-b38e-74a71a1e69c7","graphId":"g","uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","startTimestamp":1479508238184,"completionTimestamp":null,"statusCode":202,"statusMessage":"ACCEPTED","type":"bulkload"},{"serviceId":"32b7fa84-df0e-4546-b38e-74a71a1e69c7","graphId":"g","uploadId":"502f4f57-f60c-4e92-ae9a-63eca980817a","startTimestamp":1479508238184,"completionTimestamp":1479508249550,"statusCode":201,"statusMessage":"COMPLETED","type":"bulkload"}]}

At first, I used Cloudant map-reduce views for index creation, but code review feedback recommended Cloudant Query instead. This meant rewriting a lot of code, which was painful to contemplate when the existing logic already worked. On the plus side, we’d gain a performance improvement. So I rewrote index creation using Cloudant Query. But index creation was still slow. The problem was that, to keep things organized, I had put all three indexes in a single Cloudant design document and created them one after another. A colleague suggested that separate design documents might help. At first, that approach seemed unorganized and sloppy, until I realized: backend development is like general surgery. As Dr. Richard Webber said in Grey’s Anatomy:

I don’t need pretty. And I don’t need perfect. What I need is for this to work. And what’s gonna make it work is for me to take out that tumor and put these healthy organs inside my very sick patient. It won’t be pretty, but it will work, and it will keep my patient alive.

In engineering school, they teach us the importance of performance and agility. This real-world example shows how prioritizing engineering concerns over organization and prettiness is smart and effective. I ended up giving each index its own design document and invoking the three index creation requests in parallel, which was so much faster! It’s learning moments like this that just make me smile. The fact that Cloudant Query is a REST API (stateless, predictable and easy to use) just added to my joy. :)
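
For anyone who wants to try the same idea, here is a rough sketch of creating three Cloudant Query indexes in parallel, each in its own design document. It is not the actual IBM Graph code. `CLOUDANT_URL` and `DB_NAME` are hypothetical placeholders, the `<field>-ddoc` and `<field>-index` names are made up for illustration, the field names are assumed to match those in the status responses above, and authentication is left out for brevity.

```shell
# Sketch only: CLOUDANT_URL and DB_NAME are hypothetical placeholders.
CLOUDANT_URL="${CLOUDANT_URL:-https://ACCOUNT.cloudant.com}"
DB_NAME="${DB_NAME:-uploads}"

# Print a Cloudant Query index definition for one field. The "ddoc" property
# places the index in its own design document (names are illustrative).
index_body() {
  printf '{"index":{"fields":["%s"]},"ddoc":"%s-ddoc","name":"%s-index","type":"json"}' \
    "$1" "$1" "$1"
}

# Send the three index creation requests concurrently as background jobs,
# then wait for all of them to finish. Credentials are omitted here.
create_indexes() {
  for field in serviceId graphId uploadId; do
    curl -sS -X POST -H 'Content-Type: application/json' \
      -d "$(index_body "$field")" "$CLOUDANT_URL/$DB_NAME/_index" &
  done
  wait
}
```

Running `create_indexes` posts all three definitions to the database's `_index` endpoint at once instead of one after another. Because each index lives in a separate design document, none of the requests updates the same document as another.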
