Working with Stanislav Barton on Load Balancer in HBase 0.90, he asked for document on how load balancer works. This writeup would touch the internals of load balancer and how it evolved over time.
Master code (including load balancer) has been rewritten for HBase 0.90
When a region receives many writes and is split, the daughter regions are placed on the same region server as the parent region. Stan proposed to change this behavior and I summarized in HBASE-3779.
HBASE-3586 tried to solve the problem where load balancer moves inactive regions off an overloaded region server by randomly selecting regions to offload. This is to handle the potential problem of moving too many hot regions onto a region server which recently joined the cluster.
But this random selection isn't optimal. For Stan's cluster, there're around 600 regions on each region server. When 30 new regions were created on the same region server, random selector only chose 3 out of the 30 new regions for reassignment. The other region selection was from inactive (old) regions. This is expected behavior because new and old regions were selected equally probably.
Basically we traded some optimization for safety of not overloading a newly discovered region server.
So I continued enhancement using HBASE-3609 where one of the goals is to remove randomness from LoadBalancer so that we can deterministically produce near-optimal balancing actions.
If at least one region server joined the cluster just before the current balancing action, both new and old regions from overloaded region servers would be moved onto underloaded region servers. Otherwise, I find the new regions and put them on different underloaded servers. Previously one underloaded server would be filled up before the next underloaded server is considered.
I also utilize the randomizer which shuffles the list of underloaded region servers.
This way we can avoid distributing offloaded regions to few region servers.
HBASE-3609 has been integrated into trunk as of Apr 18th, 2011.
HBASE-3704 would help users observe the distribution of regions. It is currently only in HBase trunk code.
Also related is HBASE-3373. Stan proposed to make it more general. A new policy for load balancer can be added to have balanced number of regions per RS per table and not in total number of regions from all tables.
If you're interested in more detail, please take a look at the javadoc for LoadBalancer.balanceCluster()
For HBase trunk, I implemented HBASE-3681 upon Jean-Daniel Cryans's request. For 0.90.2 and later, the default value of sloppiness is 0.
I am planning for the next generation of load balancer where request histogram would play an important role in deciding which regions to move. Please take a look at HBASE-3679
HBaseWD project introduced multiple scanners for bucketed writes. I plan to accommodate this new feature through HBASE-3811 where additional attributes in Scan object would allow balancer to group the scanners generated by HBaseWD.
HBASE-3943 is supposed to solve the problem where region reassignment disrupts (potentially long) compaction.
HBASE-3945 tries to give regions more stability by not reassigning region(s) in consecutive balancing actions.