Sunday, September 25, 2011

Streamlining patch submission

I have spent considerable time fixing up HBase builds on Apache Jenkins. I want to share a few points for HBase contributors so that it is easier to maintain stable builds.

For contributors, I understand that it takes so much time to run whole test suite that he/she may not have the luxury of doing this - Apache Jenkins wouldn't do it when you press Submit Patch button.
If this is the case, please use Eclipse (or other tool) to identify tests that exercise the classes/methods in your patch and run them. Also clearly state what tests you ran in the JIRA.

If you have a Linux box where you can run whole test suite, it would be nice to utilize such resource and run whole suite. Then please state this fact on the JIRA as well.
Considering Todd's suggestion of holding off commit for 24 hours after code review, 2 hour test run isn't that long.

Sometimes you may see the following (from 0.92 build 18):
Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
[INFO] ----------------------------------
[INFO] ----------------------------------
[INFO] Total time: 1:51:41.797s
You should examine the test summary above these lines and find out which
 test(s) hung. For this case it was TestMasterFailover:
Running org.apache.hadoop.hbase.master.TestMasterFailover
Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
Write deterministic test cases
Sometimes the new tests you added passed for you but fail when committer runs them.
Here is analysis for one such case - the original version of TestRegionServerCoprocessorExceptionWithAbort.
The following call would create 25 regions for the test table:
TEST_UTIL.createMultiRegions(table, TEST_FAMILY);
If we look at HBaseTestingUtility:
public static final byte[][] KEYS = {
    HConstants.EMPTY_BYTE_ARRAY, Bytes.toBytes("bbb"),
We can see that the row in the test
final byte[] ROW = Bytes.toBytes("bbb");
actually doesn't belong to the first region.
The following call was used to track the underlying region server:
    HRegionServer regionServer =

This means regionServer which we were waiting for may not host the region where NPE happened.
I modified the above line slightly and the test passed reliably:
final byte[] ROW = Bytes.toBytes("aaa");
The takeaway is that scoping the test scenario properly and deterministically would simply patch submission.