Apache Accumulo is a high-performance distributed data store modeled after Google’s BigTable. It is similar to HBase but has some special features of its own, such as fine-grained access control. This post describes how to install and set up Apache Accumulo on top of IBM Open Platform 4.1.

Installing Accumulo

Download the version 1.7.0 binary distribution of Apache Accumulo from the Apache Accumulo website.

Unpack as follows.

    cd <install location>
    tar xzf <download location>/accumulo-X.Y.Z-bin.tar.gz
    cd accumulo-X.Y.Z

Accumulo has some optional native code that improves its performance and stability.
Before configuring Accumulo, attempt to build this native code with the following command.

    ./bin/build_native_library.sh

If the command fails, it is okay to continue with the setup and resolve the issue later.

Configuring

The Accumulo conf directory needs to be populated with initial configuration files.
The following script is provided to assist with this. Run the script and respond to the prompts.
When prompted for the memory-map type (Choose the Accumulo memory-map type:), type Native if the native build script succeeded during installation; otherwise type Java.

    ./bin/bootstrap_config.sh
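
As a quick aid for the memory-map prompt, the sketch below reports which answer applies by checking whether the native build left a shared library behind. It assumes the build script places a libaccumulo shared library under lib/native in the Accumulo directory, which is worth confirming on your system; memory_map_type is a hypothetical helper name, not part of Accumulo.

```shell
# Print "Native" if a native map library appears to have been built, else "Java".
# Assumption: build_native_library.sh leaves libaccumulo.* under <accumulo dir>/lib/native.
memory_map_type() {
  accumulo_dir="${1:-.}"
  for lib in "$accumulo_dir"/lib/native/libaccumulo.*; do
    # An unmatched glob stays literal, so test that the file really exists.
    if [ -f "$lib" ]; then
      echo "Native"
      return 0
    fi
  done
  echo "Java"
}
```

Running memory_map_type with no argument from the Accumulo directory prints the value to type at the prompt.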

The script will prompt for memory usage. Note that the sizes offered cover only the Accumulo system processes, so leave ample memory for other processes such as Hadoop, Zookeeper, and the Accumulo client code.

One of the steps in bootstrap_config.sh is to choose a Hadoop version. Based on the chosen Hadoop distribution, the script produces configuration files that match the directory layout of your Hadoop and dependency jars.
The current Accumulo release does not list IBM Open Platform 4.1 as a choice. There is an open JIRA issue in the open source project to fix that.

In the meantime, a modified bootstrap_config.sh can be downloaded from this blog (bootstrap_config).

After the script runs, the conf directory should be populated. A few manual edits are still needed.

Secret

Accumulo coordination and worker processes can only communicate with each other if they share the same secret key.
To change the secret, set instance.secret in conf/accumulo-site.xml. Changing this secret key from the default is highly recommended.
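
One way to pick a non-default secret is to generate a random string. The following is a minimal sketch, assuming /dev/urandom, head -c, and od are available; generate_secret and secret_property are hypothetical helper names, and the 32-character length is an arbitrary choice.

```shell
# Generate a random 32-character hexadecimal string to use as instance.secret.
generate_secret() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print the property element to paste into conf/accumulo-site.xml.
secret_property() {
  printf '<property>\n<name>instance.secret</name>\n<value>%s</value>\n</property>\n' "$1"
}

secret_property "$(generate_secret)"
```

The same value has to appear in conf/accumulo-site.xml on every node, since the processes must share it.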

Dependencies

Accumulo requires running Zookeeper and HDFS instances. Also, the Accumulo binary distribution does not include jars for Zookeeper and Hadoop. When configuring Accumulo the following information about these dependencies must be provided.

Location of Zookeepers:

Provide this by setting instance.zookeeper.host in conf/accumulo-site.xml.
For example,

    <property>
    <name>instance.zookeeper.host</name>
    <value>bdvs1163.svl.ibm.com:2181</value>
    <description>comma separated list of zookeeper servers</description>
    </property>
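
Before going further, it can help to confirm that each server in the list answers. This sketch assumes nc is installed, that every entry is written as host:port, and that ZooKeeper's four-letter "ruok" command is enabled; zk_servers and zk_is_ok are hypothetical helper names.

```shell
# Split a comma-separated instance.zookeeper.host value into one host:port per line.
zk_servers() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Ask one server "ruok"; a healthy ZooKeeper answers "imok".
# Assumptions: nc is installed and the entry has the form host:port.
zk_is_ok() {
  host="${1%%:*}"
  port="${1##*:}"
  [ "$(printf 'ruok' | nc -w 5 "$host" "$port" 2>/dev/null)" = "imok" ]
}

# Usage with the example value shown above:
# for server in $(zk_servers "bdvs1163.svl.ibm.com:2181"); do
#   zk_is_ok "$server" && echo "$server ok" || echo "$server not responding"
# done
```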

Where to store data:

Provide this by setting instance.volumes in conf/accumulo-site.xml.
For example,

    <property>
    <name>instance.volumes</name>
    <value>hdfs://bdvs1163.svl.ibm.com:8020/apps/accumulo</value>
    <description>comma separated list of URIs for volumes. example: hdfs://localhost:9000/accumulo</description>
    </property>

Ensure that the HDFS location exists and is writable by the user starting Accumulo.
For example, if ‘biadmin’ is the user to start Accumulo:

    sudo su hdfs
    hadoop fs -mkdir -p /apps/accumulo
    hadoop fs -chown biadmin /apps/accumulo
    exit
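
To double-check the result, the owner of the directory can be compared against the user that will start Accumulo. The sketch below assumes that hadoop fs -stat supports the %u (owner) format, as it does in Hadoop 2.x; check_volume_owner and the HADOOP_CMD variable are hypothetical names introduced here.

```shell
# Command used to reach HDFS; overridable, defaults to "hadoop".
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Succeed only if the HDFS path exists and is owned by the given user.
# Assumption: "hadoop fs -stat %u" prints the owner of the path.
check_volume_owner() {
  path="$1"
  expected="$2"
  if ! owner="$($HADOOP_CMD fs -stat %u "$path" 2>/dev/null)"; then
    echo "cannot stat $path"
    return 1
  fi
  if [ "$owner" != "$expected" ]; then
    echo "$path is owned by $owner, expected $expected"
    return 1
  fi
  echo "$path is owned by $expected"
}

# Usage with the example values above: check_volume_owner /apps/accumulo biadmin
```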

Location of Zookeeper and Hadoop jars:

Setting ZOOKEEPER_HOME and HADOOP_PREFIX in conf/accumulo-env.sh will help Accumulo find these jars.

    export HADOOP_PREFIX=/usr/iop/4.1.0.0/hadoop
    export ZOOKEEPER_HOME=/usr/iop/4.1.0.0/zookeeper

If Accumulo has problems later on finding jars, then run bin/accumulo classpath to print out info about where Accumulo is finding jars.
If the settings mentioned above are correct, then inspect general.classpaths in conf/accumulo-site.xml.
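
The classpath output can be long, so a small filter helps spot what is missing. This is a sketch only: missing_jars is a hypothetical helper, and the three name patterns are assumptions about how the Zookeeper and Hadoop jars are typically named.

```shell
# Read classpath output on stdin and report which expected jars are absent.
# The patterns below are assumed jar name prefixes, not an official list.
missing_jars() {
  listing="$(cat)"
  status=0
  for pattern in zookeeper hadoop-common hadoop-hdfs; do
    if ! printf '%s\n' "$listing" | grep -q "$pattern.*\.jar"; then
      echo "no jar matching: $pattern"
      status=1
    fi
  done
  return $status
}

# Usage: ./bin/accumulo classpath | missing_jars
```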

Initialization

Accumulo needs to initialize the locations where it stores data in Zookeeper and HDFS. The following command will do this.

    ./bin/accumulo init

The initialization command will prompt for the following information.

Instance name : This is the name of the Accumulo instance. Accumulo clients need to know it in order to connect.
Root password : Initialization sets up an initial Accumulo root user and prompts for its password. This information will be needed to later connect to Accumulo.

In conf/accumulo-site.xml, set trace.token.property.password to the root password chosen during initialization.

    <property>
    <name>trace.token.property.password</name>
    <!-- change this to the root user's password -->
    <value>secret</value>
    </property>

Modify conf/monitor to specify the full hostname of the Monitor server.

Multiple Nodes

Skip this section if running Accumulo on a single node. Accumulo has coordinating, monitoring, and worker processes that run on specified nodes in the cluster. The following files should be populated with a newline-separated list of node names, replacing the default localhost entry.

conf/masters : Accumulo primary coordinating process. Must specify one node. Can specify a few for fault tolerance.
conf/gc : Accumulo garbage collector. Must specify one node. Can specify a few for fault tolerance.
conf/monitor : Node where Accumulo monitoring web server is run.
conf/slaves : Accumulo worker processes. List all of the nodes where tablet servers should run in this file.
conf/tracers : Optional capability. Can specify zero or more nodes.

The Accumulo, Hadoop, and Zookeeper software should be present at the same location on every node. The files in the conf directory must also be copied to every node. There are many ways to replicate the software and configuration; two tools that can help are pdcp and prsync.
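
As an illustration of populating a host file and pushing the configuration out, here is a minimal sketch that uses plain rsync over ssh in place of pdcp or prsync. It assumes rsync is installed on every node and that Accumulo sits at the same path everywhere; write_hosts, sync_conf, and the node names are hypothetical.

```shell
# Write a newline-separated host list to a conf file, e.g. conf/slaves.
write_hosts() {
  file="$1"
  shift
  printf '%s\n' "$@" > "$file"
}

# Copy the conf directory to every host named in conf/slaves.
# Assumptions: rsync exists on all nodes and the install path is identical.
sync_conf() {
  accumulo_dir="$1"
  while read -r host; do
    # Skip blank lines and comment lines.
    case "$host" in ''|'#'*) continue ;; esac
    rsync -a "$accumulo_dir/conf/" "$host:$accumulo_dir/conf/" < /dev/null
  done < "$accumulo_dir/conf/slaves"
}

# Usage (hypothetical node names):
# write_hosts conf/slaves node1.example.com node2.example.com
# sync_conf "$(pwd)"
```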

Starting Accumulo

The Accumulo scripts use ssh to start processes on remote nodes. Before attempting to start Accumulo, passwordless ssh must be set up on the cluster.
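
A quick way to test this is to attempt a non-interactive login to every listed host. The sketch below relies on the ssh option BatchMode=yes, which makes ssh fail instead of prompting for a password; can_ssh_without_password, check_hosts, and the SSH_CMD variable are hypothetical names introduced for illustration.

```shell
# Command used to log in; overridable, defaults to "ssh".
SSH_CMD="${SSH_CMD:-ssh}"

# Return success only if the host accepts a login without prompting.
can_ssh_without_password() {
  $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$1" true < /dev/null > /dev/null 2>&1
}

# Report every host in the given files that cannot be reached without a password.
check_hosts() {
  failed=0
  for host in $(cat "$@" | grep -v '^#' | sort -u); do
    if ! can_ssh_without_password "$host"; then
      echo "passwordless ssh failed: $host"
      failed=1
    fi
  done
  return $failed
}

# Usage: check_hosts conf/masters conf/slaves conf/gc conf/monitor conf/tracers
```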

After configuring and initializing Accumulo, use the following command to start it.

    ./bin/start-all.sh

Once the start-all.sh script completes, use the following command to run the Accumulo shell.

    ./bin/accumulo shell -u root

Use your web browser to connect to the Accumulo monitor page on port 50095.

    http://<hostname in conf/monitor>:50095/

(Screenshot: the Accumulo Overview page of the monitor, Accumulo 1.7.0)
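
On a machine without a browser, the monitor can be probed from the command line instead. This is a sketch assuming curl is installed; monitor_url and monitor_status are hypothetical helpers, and the host is read from conf/monitor as described above.

```shell
# Build the monitor URL from the first non-comment, non-empty line of conf/monitor.
monitor_url() {
  host="$(grep -v '^#' "$1" | grep -v '^$' | head -n 1)"
  echo "http://$host:50095/"
}

# Print the HTTP status code returned by the monitor page.
# Assumption: curl is installed; curl prints 000 if the connection fails.
monitor_status() {
  curl -s -o /dev/null -w '%{http_code}' "$(monitor_url "$1")"
}

# Usage: monitor_status conf/monitor
```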

When finished, use the following command to stop Accumulo.

    ./bin/stop-all.sh

Sample commands in the shell

The following are a few basic Accumulo shell commands, some of which provide a good illustration of Accumulo access control.

    ./bin/accumulo shell -u root
    Password: ********
    2015-10-15 12:52:45,254 [trace.DistributedTrace] INFO : SpanReceiver org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.
      Shell - Apache Accumulo Interactive Shell
      - version: 1.7.0
      - instance name: instance1
      - instance id: 5567cbf9-aefb-4744-934e-7581c7da7d4d
      - type 'help' for a list of available commands
      root@instance1> help
      root@instance1> createtable table1
      root@instance1 table1> tables
      accumulo.metadata
      accumulo.replication
      accumulo.root
      table1
      trace
      root@instance1 table1> insert row1 family1 column1 value1 -l private
      root@instance1 table1> scan

The last command returns nothing because the cell is protected by the label ‘private’, and the root user does not yet hold that authorization. Use setauths to grant it to the user.

      root@instance1 table1> setauths -s private -u root
      root@instance1 table1> scan
      row1 family1:column1 [private] value1

The following commands test a new user, ‘user1’.

      root@instance1 table1> createuser 'user1'
      Enter new password for 'user1': ********
      Please confirm new password for 'user1': ********
      root@instance1 table1> user user1
      Enter password for user user1: ********
      user1@instance1 table1> scan
      2015-10-15 13:39:18,331 [shell.Shell] ERROR: java.lang.RuntimeException: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED for user user1 on table table1(ID:2) - User does not have permission to perform this action

Now, as root, grant the table read permission and set the authorizations for ‘user1’.

      user1@instance1 table1> user root
      Enter password for user root: ********
      root@instance1 table1> grant Table.READ -t table1 -u user1
      root@instance1 table1> setauths -s private -u user1
      root@instance1 table1> user user1
      Enter password for user user1: ********
      user1@instance1 table1> scan
      row1 family1:column1 [private] value1
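
The grant and setauths steps above can also be scripted rather than typed interactively. The sketch below only prints the two shell commands used in this post for a given table, user, and label; grant_read_commands is a hypothetical helper, and the suggestion to feed the output to the shell with its -f option is an assumption to verify against your version.

```shell
# Print the Accumulo shell commands that let a user read a table
# and see cells carrying a given visibility label.
grant_read_commands() {
  table="$1"
  user="$2"
  label="$3"
  printf 'grant Table.READ -t %s -u %s\n' "$table" "$user"
  printf 'setauths -s %s -u %s\n' "$label" "$user"
}

# Usage (assumes the shell's -f option runs commands from a file):
# grant_read_commands table1 user1 private > /tmp/grant.txt
# ./bin/accumulo shell -u root -f /tmp/grant.txt
```

Note that setauths replaces the user's whole set of authorizations, so list every label the user should keep.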
