2009-04-26

Come on, bitten, give me a hint

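I have been using bitten for integration testing for a while. It works well, and these days I can't get through a day without it. Lately, though, it has been driving me up the wall, because bitten-slave spits out one cryptic error message: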
[DEBUG   ] Sending POST request to 'https://mytrac.com/build/5632/steps/'
[WARNING ] Server returned error 403: Forbidden
[ERROR ] HTTP Error 403: Forbidden

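It started when I upgraded bitten recently and, while I was at it, decided to split the single slave process into several to speed up the builds. As the saying goes, one monk carries his own water, two monks carry it together, and three monks end up with no water at all. I did not expect software to behave the same way. As soon as two slaves ran side by side, things broke: sometimes one failed, sometimes the other, always with the same cryptic 403. Watching more closely, I saw that the error always followed the same pattern: slave A starts first, slave B starts a little later, and then A's next POST to the server fails. I guessed at a conflict between users and tried several variations: a separate username for each slave, running them from different directories, running them on different machines. The problem persisted. Today I put in some effort, spent a long time tracing through the bitten code, and found that the program calls this function: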
    def reset_orphaned_builds(self):
        """Reset all in-progress builds to ``PENDING`` state if they've been
        running so long that the configured timeout has been reached.

        This is used to cleanup after slaves that have unexpectedly cancelled
        a build without notifying the master, or are for some other reason not
        reporting back status updates.
        """

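Luckily the code has comments, and suddenly it all made sense: the timeout parameter was the culprit. I went into the trac admin interface, raised the timeout from 10 seconds to 500 seconds, and the problem was finally solved.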
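
To make the effect of the timeout concrete, here is a minimal model of the check that the docstring describes. It is only a sketch, not bitten's actual code: the `Build` class, the `reset_orphaned` function, and the status strings are all made up for illustration, and it assumes the master runs the check whenever some slave contacts it, which would match the pattern of A failing right after B starts.

```python
from dataclasses import dataclass

# Hypothetical status values, for illustration only.
PENDING = "PENDING"
IN_PROGRESS = "IN_PROGRESS"


@dataclass
class Build:
    id: int
    status: str
    last_activity: float  # time (in seconds) of the last update from the slave


def reset_orphaned(builds, now, timeout):
    """Reset in-progress builds that have been silent for longer than `timeout`.

    Returns the ids of the builds that were reset.
    """
    reset_ids = []
    for build in builds:
        if build.status == IN_PROGRESS and now - build.last_activity > timeout:
            build.status = PENDING
            reset_ids.append(build.id)
    return reset_ids


# Slave A last reported at t=0 and is busy with a step that takes a minute.
build_a = Build(id=5632, status=IN_PROGRESS, last_activity=0.0)

# With a 500 second timeout, a check at t=60 leaves A's build alone.
print(reset_orphaned([build_a], now=60.0, timeout=500))  # []

# With a 10 second timeout, the same check takes the build away from A.
print(reset_orphaned([build_a], now=60.0, timeout=10))   # [5632]
```

In this model, once the build is back in the pending state, A's next POST refers to a build it no longer owns, which is presumably what the server turns into the 403.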

In fact, the bitten document that describes the client protocol already points out this very issue:
To handle the case of build slaves going away at some point between having created a build and completing the build, the build master should have a configurable timeout. All in-progress builds would be checked against this timeout; if there has been no activity on the build for an amount of time exceeding the timeout, the master should cancel the build, resetting it the PENDING state. If a slave later does decide to come back to life and post results, it would get 404 (Not Found) or 409 (Conflict) errors, and should cancel the build on its side, too.

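But for reasons unknown, bitten's implementation uses 403 in place of 404 or 409, and gives no further explanation. If only bitten-slave printed one extra line telling me that the 403 might be related to the timeout setting, how nice that would be.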
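
The kind of extra line I have in mind could be as simple as the sketch below. This is not bitten-slave's code: `describe_http_error` and the `HINTS` table are hypothetical names, and the hint texts are my own wording based on the experience described here.

```python
# Hypothetical hints for status codes a slave may see while posting results.
HINTS = {
    403: "the master may have reset this build; check the build timeout setting",
    404: "the build no longer exists on the master; cancel it locally",
    409: "the build is in a conflicting state on the master; cancel it locally",
}


def describe_http_error(code, reason):
    """Return an error message, with a hint appended when one is known."""
    message = "HTTP Error %d: %s" % (code, reason)
    hint = HINTS.get(code)
    if hint is not None:
        message += " (hint: %s)" % hint
    return message


print(describe_http_error(403, "Forbidden"))
```

A message like this costs one dictionary lookup, and it would have pointed me at the timeout setting on the first day.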

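The moral of the story: don't dismiss diagnostic messages as too wordy. Write as much as you can, because you never know when it will save you and others a good deal of time. After three days of thrashing around, I have grown very fond of chatty software.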