A production customer on the a private cloud had a HBase cluster of about a few dozens of servers. It held about 50TB of data. Multiple clients obtained data from Kakfa topics and ingested them into HBase. The data size had been increasing constantly. To accommodate the increasing data size, the team decided to add additional dozens of servers to the cluster.

Servers were added to the cluster and each served as both HDFS datanode and HBase region server. After the servers were started, HDFS balancer and HBase balancer kicked in to balance the HDFS data blocks and HBase regions. It was fairly quick that the HBase regions were distributed to the new region servers. But the HDFS data blocks were fairly slow to be on the new datanodes. But overall the new servers were up and running and seemed to have joined the cluster nicely.

But there was a notable problem. The clients that ingested data from Kafka to HBase saw their throughput dropped significantly, moving at a crawl. The clients were getting exceptions like this and others:

INFO htable-pool7-t6382 [AsyncProcess] #86, table=RAWHITS_20160705203248, attempt=20/35 failed=116ops, last exception: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.RpcServer$CallQueueTooBigException):
Call queue is full on /xx.154.33.147:16020, is hbase.ipc.server.max.callqueue.size too small?
on xxx.com,16020,1505889514955, tracking started null, retrying after=20079ms,

The clients used HBase BufferedMutator. We had increased the ‘hbase.ipc.server.max.callqueue.size’ to 2GB from the default 1GB.
On the region server side, we saw many in the log:

    WARN [B.defaultRpcServer.handler=9,queue=9,port=16020] ipc.RpcServer: (responseTooSlow): {“processingtimems”:27142,“call”:“Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)“,
    ”client”:“xx.154.33.180:44124”,“starttimems”:1505953597039,“queuetimems”:59197,
    “class”:“HRegionServer”,“responsesize”:62,“method”:“Multi”}

Also,
INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 719 ms, current pipeline:

The problem continued. We checked the system resources on the nodes. The servers had plenty of CPU cores and memory. The network was 10 Gb Ethernet. From ‘nmon’, the CPUs and disk IO were good, not even near full utilization, and peek network transfer at around 70 GB/s.

The cluster had been running well with satisfactory throughput before the addition of the new servers. The problem had to be caused by the presence of the new servers. But the hardware profile and node configuration of the new servers were identical to the old ones. The data locality were low on the new servers, which was expected. Compaction would gradually improve the locality. But the compactions were extremely slow:

    regionserver.CompactSplitThread: Aborted compaction: Request = regionName=RAWHITS_20170309185447,D1CB38984A13104A21C6E7C1D2E32908_20170605122536_BD9070EC99A543888E392D8CFE6783DD,1505893790907.
    01a90223203e8816af1943e637263122., storeName=f1, fileCount=15, fileSize=24.2 G (23.4 G, 437.2 M, 248.2 M, 31.4 M, 2.9 M, 11.6 M,
    5.8 M, 2.8 M, 2.7 M, 7.4 M, 9.2 M, 2.6 M, 1.9 M, 935.7 K, 2.5 M), priority=180, time=3050883393679985;
    duration=1hrs, 49mins, 46sec

We expected some degradation in performance because of the low data locality on the new servers. But it was too much degradation.

There were messages in the datanode logs:

WARN datanode.DataNode (BlockReceiver.java:receivePacket(562)) – Slow BlockReceiver write packet to mirror took 1631ms (threshold=300ms)

It seemed to point to network congestion. We ran ‘iperf’ to look at the network performance. Between the old nodes, between the new nodes, and finally between the old and new nodes.

    [ 3] local xx.154.33.32 port 38954 connected with xx.154.33.247 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 10.9 GBytes 9.35 Gbits/sec
    [ 3] local xx.154.33.32 port 52064 connected with xx.155.37.35 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 50.8 MBytes 42.5 Mbits/sec

The first ‘iperf’ output was between old nodes. The second ‘iperf’ output was between old node and new node, and it was apparently too slow. It was the problem!
Without going into the details, it turned out that the network was mis-configured when adding the new servers. The data transfer between new and old nodes was not up to speed while HBase’s data locality was low.

After the network issue was fixed, HBase performance got back to normal, and clients were able to feed into HBase nicely again.

Join The Discussion

Your email address will not be published. Required fields are marked *