Extract data from XML and expose it as a service

Most of us have dealt with XML files and used numerous tools to parse them and extract information. However, the extraction code can get complicated when the XML has a complex structure. In this tutorial, we discuss how to use an IBM Cloudant® view to simplify XML data extraction and expose the extracted data as a service.

Prerequisites

  • An active IBM Cloudant account
  • Basic JavaScript knowledge
  • Knowledge of Java or any other language that can process the XML file

We used Java to convert the sample XML file to JSON and stored the result in the Cloudant database. You can use any other language of your choice to accomplish the same thing. The XML file contains country names and their respective codes in multiple languages. We extracted the list of country names and codes in English and exposed it as a service using a Cloudant view.

Estimated time

Completing this tutorial should take about 30 minutes.

Steps

  1. Convert the XML to JSON and store it in the Cloudant database
  2. Configure Cloudant view to extract data and expose it as a service

Figure 1

Step 1. Convert the XML to JSON and store it in the Cloudant database

Create a Java project using any IDE (Eclipse, for example).

Add json.jar and commons-io.jar to the Java build path.

Create a CreateJSON class and paste the code below.

To keep the tutorial simple, we used the following Java code to convert the XML to JSON and store the result in a file. We then copied the content of the file to create a database document. This could also be done programmatically.

package com.test;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
import org.json.JSONObject;
import org.json.XML;

public class CreateJSON {
  public static void main(String[] args) throws Exception {
    // Read the whole XML file into a string.
    String xmlStr;
    try (InputStream in = new FileInputStream("src/Country_List.xml")) {
      xmlStr = IOUtils.toString(in, StandardCharsets.UTF_8);
    }

    // Convert the XML to JSON and set the ID of the Cloudant document.
    JSONObject jsonObject = XML.toJSONObject(xmlStr);
    jsonObject.put("_id", "countrylist_id");

    // Write the JSON document to a text file; the writer is closed automatically.
    try (Writer writer = new OutputStreamWriter(
        new FileOutputStream("src/Country_List.txt"), StandardCharsets.UTF_8)) {
      writer.write(jsonObject.toString());
    }
    System.out.println(">>> Finished");
  }
}

Open your Cloudant account and create a database called sample_db.

Open the sample_db database and click New Doc. Paste the content of Country_List.txt into the document, then click Save to create the document.
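The manual copy-and-paste step could also be scripted. The following is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch) and Cloudant's standard PUT /{db}/{docid} document endpoint. The helper names buildPutRequest and uploadDocument and the account URL https://example.cloudant.com are hypothetical, and authentication is left out because it depends on how your account is set up.

```javascript
// Hypothetical helper: builds the HTTP request that stores a JSON document
// in a Cloudant database under the document's own _id (PUT /{db}/{docid}).
function buildPutRequest(baseUrl, dbName, doc) {
  const base = baseUrl.replace(/\/+$/, ""); // drop any trailing slash
  const url = `${base}/${encodeURIComponent(dbName)}/${encodeURIComponent(doc._id)}`;
  return {
    url,
    options: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    },
  };
}

// Hypothetical helper: sends the request. It is defined but not called here,
// and it omits authentication headers, which a real account would require.
async function uploadDocument(baseUrl, dbName, doc) {
  const { url, options } = buildPutRequest(baseUrl, dbName, doc);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP status ${response.status}`);
  }
  return response.json();
}

// Placeholder document; in practice this would be the parsed Country_List.txt.
const request = buildPutRequest("https://example.cloudant.com/", "sample_db", {
  _id: "countrylist_id",
  picklist: { entry: [] },
});
console.log(request.url);
```

Separating the request builder from the network call keeps the URL and body easy to inspect before anything is sent.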

Step 2. Configure Cloudant view to extract data and expose it as a service

Create a view by clicking the plus symbol next to All Documents or Design Documents and selecting New View. Name the index countries-list-us-en and enter data as the _design document name.

Add the following JavaScript in the Map function textbox:

function (doc) {
  // Process only the country list document.
  if (doc._id === "countrylist_id") {
    var entries = doc.picklist.entry;
    for (var i in entries) {
      var descriptions = entries[i].description;
      for (var j in descriptions) {
        // Emit the English country name as the key and the country code as the value.
        if (descriptions[j].language === "en-US") {
          emit(descriptions[j].content, entries[i].name);
        }
      }
    }
  }
}
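To see what the map function emits before building the index, it can be run locally against a small document. The sketch below is illustrative only: the sample document is hypothetical, with a structure inferred from the field names the map function reads (picklist.entry, name, description, language, content), and emit is a local stand-in for the function Cloudant provides.

```javascript
// Hypothetical sample document; the real one is generated from the XML file.
const sampleDoc = {
  _id: "countrylist_id",
  picklist: {
    entry: [
      {
        name: "IN",
        description: [
          { language: "en-US", content: "India" },
          { language: "it-IT", content: "India" },
        ],
      },
      {
        name: "DE",
        description: [
          { language: "en-US", content: "Germany" },
          { language: "it-IT", content: "Germania" },
        ],
      },
    ],
  },
};

// Local stand-in for Cloudant's emit(): collects rows in the view row shape.
const rows = [];
function emit(key, value) {
  rows.push({ id: sampleDoc._id, key: key, value: value });
}

// Same logic as the map function above, given a name so it can be called.
function map(doc) {
  if (doc._id === "countrylist_id") {
    var entries = doc.picklist.entry;
    for (var i in entries) {
      var descriptions = entries[i].description;
      for (var j in descriptions) {
        if (descriptions[j].language === "en-US") {
          emit(descriptions[j].content, entries[i].name);
        }
      }
    }
  }
}

map(sampleDoc);

// Cloudant returns view rows ordered by key; a plain string comparison
// approximates that ordering for this small sample.
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});
console.log(rows);
```

Each en-US description produces one row, so this sample yields two rows: Germany with DE and India with IN.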

Now click Create Document > Build Index. This starts the process of creating the view.

Figure 2

Once the view creation is complete, click the view name and then click the JSON option, outlined in a red box in the following image.

Figure 3

This opens a unique URL, as shown below, through which the data can be consumed. Here, "key" represents the country name, and "value" represents the corresponding two-character country code.

Figure 4
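A consuming application might turn the view output into a lookup table. The sketch below assumes the standard Cloudant view path (/{db}/_design/{ddoc}/_view/{view}) and the usual response shape with a rows array of id, key, and value. The account URL, the sample response data, and the helper names buildViewUrl and toCountryLookup are hypothetical.

```javascript
// Hypothetical helper: builds the URL of a view from its parts.
function buildViewUrl(baseUrl, dbName, designDoc, viewName) {
  const base = baseUrl.replace(/\/+$/, ""); // drop any trailing slash
  return `${base}/${dbName}/_design/${designDoc}/_view/${viewName}`;
}

// Hypothetical helper: converts the rows of a view response into a Map
// from country name (key) to country code (value).
function toCountryLookup(viewResponse) {
  const lookup = new Map();
  for (const row of viewResponse.rows) {
    lookup.set(row.key, row.value);
  }
  return lookup;
}

// Sample data in the shape of a view response; the values are made up.
const sampleResponse = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: "countrylist_id", key: "Germany", value: "DE" },
    { id: "countrylist_id", key: "India", value: "IN" },
  ],
};

const viewUrl = buildViewUrl(
  "https://example.cloudant.com",
  "sample_db",
  "data",
  "countries-list-us-en"
);
const lookup = toCountryLookup(sampleResponse);
console.log(viewUrl);
console.log(lookup.get("India"));
```

In a real client, sampleResponse would be replaced by the parsed JSON body of a GET request to viewUrl.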

Similarly, views for the other languages available in the attached XML file can be created with a minor change to the JavaScript code, like this:

function (doc) {
  // Process only the country list document.
  if (doc._id === "countrylist_id") {
    var entries = doc.picklist.entry;
    for (var i in entries) {
      var descriptions = entries[i].description;
      for (var j in descriptions) {
        // Changed language below to show data in Italian
        if (descriptions[j].language === "it-IT") {
          emit(descriptions[j].content, entries[i].name);
        }
      }
    }
  }
}
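Because the per-language map functions differ only in the language tag, they could be generated rather than copied by hand. The sketch below builds a design document object with one view per language, assuming the standard design document layout (an _id of _design/data and a views object whose entries hold the map source as a string). The helper names and the view naming scheme, which mirrors countries-list-us-en, are hypothetical.

```javascript
// Hypothetical helper: returns the map function source for one language tag.
function buildMapSource(language) {
  return [
    "function (doc) {",
    '  if (doc._id === "countrylist_id") {',
    "    var entries = doc.picklist.entry;",
    "    for (var i in entries) {",
    "      var descriptions = entries[i].description;",
    "      for (var j in descriptions) {",
    "        if (descriptions[j].language === " + JSON.stringify(language) + ") {",
    "          emit(descriptions[j].content, entries[i].name);",
    "        }",
    "      }",
    "    }",
    "  }",
    "}",
  ].join("\n");
}

// Hypothetical naming scheme: "en-US" becomes "countries-list-us-en".
function viewNameFor(language) {
  const [lang, region] = language.toLowerCase().split("-");
  return "countries-list-" + region + "-" + lang;
}

// Builds a design document object with one view per language tag.
function buildDesignDoc(languages) {
  const views = {};
  for (const language of languages) {
    views[viewNameFor(language)] = { map: buildMapSource(language) };
  }
  return { _id: "_design/data", views: views };
}

const designDoc = buildDesignDoc(["en-US", "it-IT"]);
console.log(Object.keys(designDoc.views));
```

If the design document already exists in the database, an update would also need its current _rev value, which this sketch does not handle.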

Summary

With minimal coding, we can extract data from XML and expose it as a service using built-in IBM Cloudant features. The Cloudant view has a unique URL that can be accessed through a GET call, which makes it convenient for applications to consume.