[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The standard purpose of regression testing is to avoid getting the same bug twice. When a bug is found, the programmer fixes the bug and adds a test to the test suite. The test should fail before the fix and pass after the fix. When a new version is about to be released, all the tests in the regression test suite are run and if an old bug reappears, this will be seen quickly since the appropriate test will fail.
The regression testing in GNU Go is slightly different. A typical test case involves specifying a position and asking the engine what move it would make. This is compared to one or more correct moves to decide whether the test case passes or fails. It is also stored whether a test case is expected to pass or fail, and deviations in this status signify whether a change has solved some problem and/or broken something else. Thus the regression tests both include positions highlighting some mistake being done by the engine, which are waiting to be fixed, and positions where the engine does the right thing, where we want to detect if a change breaks something.
20.1 Regression testing in GNU Go | Regression Testing in GNU Go | |
20.2 Test suites | Test Suites | |
20.3 Running the Regression Tests | ||
20.4 Running regress.pike | ||
20.5 Viewing tests with Emacs | ||
20.6 HTML Regression Views | HTML Views |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Regression testing is performed by the files in the ‘regression/’
directory. The tests are specified as GTP commands in files with the
suffix ‘.tst’, with corresponding correct results and expected
pass/fail status encoded in GTP comments following the test. To run a
test suite the shell scripts ‘test.sh’, ‘eval.sh’, and
‘regress.sh’ can be used. There are also Makefile targets to do
this. If you make all_batches
most of the tests are run. The
Pike script ‘regress.pike’ can also be used to run all tests or a
subset of the tests.
Game records used by the regression tests are stored in the directory ‘regression/games/’ and its subdirectories.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The regression tests are grouped into suites and stored in files as GTP commands. A part of a test suite can look as follows:
# Connecting with ko at B14 looks best. Cutting at D17 might be # considered. B17 (game move) is inferior. loadsgf games/strategy25.sgf 61 90 gg_genmove black #? [B14|D17] # The game move at P13 is a suicidal blunder. loadsgf games/strategy25.sgf 249 95 gg_genmove black #? [!P13] loadsgf games/strategy26.sgf 257 100 gg_genmove black #? [M16]* |
Lines starting with a hash sign, or in general anything following a hash
sign, are interpreted as comments by the GTP mode and thus ignored by
the engine. GTP commands are executed in the order they appear, but only
those on numbered lines are used for testing. The comment lines starting
with #?
are magical to the regression testing scripts and
indicate correct results and expected pass/fail status. The string
within brackets is matched as a regular expression against the response
from the previous numbered GTP command. A particular useful feature of
regular expressions is that by using ‘|’ it is possible to specify
alternatives. Thus B14|D17
above means that if either B14
or D17
is the move generated in test case 90, it passes. There is
one important special case to be aware of. If the correct result string
starts with an exclamation mark, this is excluded from the regular
expression but afterwards the result of the matching is negated. Thus
!P13
in test case 95 means that any move except P13
is
accepted as a correct result.
In test case 100, the brackets on the #?
line is followed by an
asterisk. This means that the test is expected to fail. If there is no
asterisk, the test is expected to pass. The brackets may also be
followed by a ‘&’, meaning that the result is ignored. This is
primarily used to report statistics, e.g. how many tactical reading
nodes were spent while running the test suite.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
./test.sh blunder.tst
runs the tests in ‘blunder.tst’ and
prints the results of the commands on numbered lines, which may look
like:
1 E5 2 F9 3 O18 4 B7 5 A4 6 E4 7 E3 8 A3 9 D9 10 J9 11 B3 12 C6 13 C6 |
This is usually not very informative, however. More interesting is
./eval.sh blunder.tst
which also compares the results above
against the correct ones in the test file and prints a report for each
test on the form:
1 failed: Correct '!E5', got 'E5' 2 failed: Correct 'C9|H9', got 'F9' 3 PASSED 4 failed: Correct 'B5|C5|C4|D4|E4|E3|F3', got 'B7' 5 PASSED 6 failed: Correct 'D4', got 'E4' 7 PASSED 8 failed: Correct 'B4', got 'A3' 9 failed: Correct 'G8|G9|H8', got 'D9' 10 failed: Correct 'G9|F9|C7', got 'J9' 11 failed: Correct 'D4|E4|E5|F4|C6', got 'B3' 12 failed: Correct 'D4', got 'C6' 13 failed: Correct 'D4|E4|E5|F4', got 'C6' |
The result of a test can be one of four different cases:
passed
: An expected pass
This is the ideal result.
PASSED
: An unexpected pass
This is a result that we are hoping for when we fix a bug. An old test case that used to fail is now passing.
failed
: An expected failure
The test failed but this was also what we expected, unless we were trying to fix the particular mistake highlighted by the test case. These tests show weaknesses of the GNU Go engine and are good places to search if you want to detect an area which needs improvement.
FAILED
: An unexpected failure
This should nominally only happen if something is broken by a change. However, sometimes GNU Go passes a test, but for the wrong reason or for a combination of wrong reasons. When one of these reasons is fixed, the other one may shine through so that the test suddenly fails. When a test case unexpectedly fails, it is necessary to make a closer examination in order to determine whether a change has broken something.
If you want a less verbose report, ./regress.sh . blunder.tst
does the same thing as the previous command, but only reports unexpected
results. The example above is compressed to
3 unexpected PASS! 5 unexpected PASS! 7 unexpected PASS! |
For convenience the tests are also available as makefile targets. For
example, make blunder
runs the tests in the blunder test suite by
executing eval.sh blunder.tst
. make all_batches
runs all
test suites in a sequence using the regress.sh
script.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A more powerful way to run regressions is with the script ‘regress.pike’. This requires that you have Pike (http://pike.ida.liu.se) installed.
Executing ./regress.pike
without arguments will run all
testsuites that make all_batches
would run. The difference is
that unexpected results are reported immediately when they have been
found (instead of after the whole file has been run) and that statistics
of time consumption and node usage is presented for each test file and
in total.
To run a single test suite do e.g. ./regress.pike nicklas3.tst
or
./regress.pike nicklas3
. The result may look like:
nicklas3 2.96 614772 3322 469 Total nodes: 614772 3322 469 Total time: 2.96 (3.22) Total uncertainty: 0.00 |
The numbers here mean that the test suite took 2.96 seconds of processor
time and 3.22 seconds of real time. The consumption of reading nodes was
614772 for tactical reading, 3322 for owl reading, and 469 for
connection reading. The last line relates to the variability of the
generated moves in the test suite, and 0 means that none was decided by
the randomness contribution to the move valuation. Multiple testsuites
can be run by e.g. ./regress.pike owl ld_owl owl1
.
It is also possible to run a single testcase, e.g. ./regress.pike
strategy:6
, a number of testcases, e.g. ./regress.pike
strategy:6,23,45
, a range of testcases, e.g. ./regress.pike
strategy:13-15
or more complex combinations e.g. ./regress.pike
strategy:6,13-15,23,45 nicklas3:602,1403
.
There are also command line options to choose what engine to run, what
options to send to the engine, to turn on verbose output, and to use a
file to specify which testcases to run. Run ./regress.pike --help
for a complete and up to date list of options.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
To get a quick regression view, you may use the graphical
display mode available with Emacs (see section GNU Go mode in Emacs). You will
want the cursor in the regression buffer when you enter
M-x gnugo
, so that GNU Go opens in the correct
directory. A good way to be in the right directory is to
open the window of the test you want to investigate. Then
you can cut and past GTP commands directly from the test to
the minibuffer, using the :
command from
Emacs. Although Emacs mode does not have a coordinate grid,
you may get an ascii board with the coordinate grid using
: showboard
command.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Extremely useful HTML Views of the regression tests may be produced using two perl scripts ‘regression/regress.pl’ and ‘regression/regress.plx’.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There are many ways configuring Apache to permit CGI scripts, all of them are featured in Apache documentation, which can be found at http://httpd.apache.org/docs/2.0/howto/cgi.html
Below you will find one example.
This documentation assumes an Apache 2.0 included in Fedora Core distribution, but it should be fairly close to the config for other distributions.
First, you will need to configure Apache to run CGI scripts in the directory you wish to serve the html views from. In ‘/etc/httpd/conf/httpd.conf’ there should be a line:
DocumentRoot "/var/www/html"
Search for a line <Directory "/path/to/directory">
, where
/path/to/directory
is the same as provided in DocumentRoot
,
then add ExecCGI
to list of Options
.
The whole section should look like:
<Directory "/var/www/html"> ... Options ... ExecCGI ... </Directory> |
This allows CGI scripts to be executed in the directory used by regress.plx. Next, you need to tell Apache that ‘.plx’ is a CGI script ending. Your ‘httpd.conf’ file should contain a line:
AddHandler cgi-script ...
If there isn’t already, add it; add ‘.plx’ to the list of extensions, so line should look like:
AddHandler cgi-script ... .plx
You will also need to make sure you have the necessary modules loaded to run
CGI scripts; mod_cgi and mod_mime should be sufficient. Your ‘httpd.conf’
should have the relevant LoadModule cgi_module modules/mod_cgi.so
and
LoadModule mime_module modules/mod_mime.so
lines; uncomment them if
necessary.
Next, you need to put a copy of ‘regress.plx’ in the DocumentRoot
directory /var/www/html
or it subdirectories where you plan to serve the
html views from.
You will also need to install the Perl module GD (http://search.cpan.org/dist/GD/), available from CPAN.
Finally, run ‘regression/regress.pl’ to create the xml data used to generate the html views (to do all regression tests run ‘regression/regress.pl -a 1’); then, copy the ‘html/’ directory to the same directory as ‘regress.plx’ resides in.
At this point, you should have a working copy of the html regression views.
Additional notes for Debian users: The Perl GD module can be installed
by apt-get install libgd-perl
. It may suffice to add this to
the apache2 configuration:
<Directory "/var/www/regression"> Options +ExecCGI AddHandler cgi-script .plx RedirectMatch ^/regression$ /regression/regress.plx </Directory> |
and then make a link from ‘/var/www/regression’ to the GNU Go
regression directory. The RedirectMatch
statement is only
needed to set up a shorter entry URL.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by root on November 12, 2019 using texi2html 1.82.