
6 Data Analysis with Getstats

In the basic Auto-pilot workflow, Auto-pilot produces two files for each experiment (as defined by a TEST directive in your .ap file). The first file, with the suffix .log, is a log of stdout and stderr. No processing is done on this file; it is there so that you can see what happened (for example, if something broke or you cannot explain your results). The second file is more interesting: it is the results file, which has the suffix .res. The results file contains information that the benchmarker deems worthy of automatic extraction (the default Auto-pilot scripts record time and, optionally, several other quantities; see Included Script Plugins).

Results files are made up of blocks. Each block starts with [blockname] and then contains simple text (for example, the system snapshot block contains the output of various commands, without any special formatting). Each command that is measured with the ap_measure function creates a measurement block, which looks like the following:

     thread = 2
     epoch = 2
     command = postmark /tmp/postmark_config-9868
     user = 0.200000
     sys = 1.170000
     elapsed = 5.470682
     status = 0

Using a measure hook, arbitrary fields can be added to this measurement. The Auto-pilot distribution includes two measure hooks by default. The first keeps track of the number of SCSI commands queued on Adaptec SCSI cards. The second determines the amount of CPU time used by all processes in the system, which can be compared with the amount of CPU time your benchmark used. Both hooks have proven useful for investigating possible anomalies. They are installed in @pkgdatadir@/commonsettings.d and can be used as samples for creating your own measure hooks. By default, @pkgdatadir@ is /usr/local/share/auto-pilot.
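
Because a measurement block is a list of "name = value" lines, it is easy to inspect outside of Getstats. The following is a minimal Python sketch, not part of Auto-pilot and not how Getstats itself parses results files. The function name parse_measurement is hypothetical, and the sketch assumes that every field sits on its own line in the form shown in the measurement block earlier. Fields added by measure hooks would be picked up the same way, provided they follow that form.

```python
def parse_measurement(text):
    """Parse 'name = value' lines into a dict, converting numeric values."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue                      # skip blank or unrelated lines
        key, value = line.split("=", 1)   # split on the first '=' only
        key, value = key.strip(), value.strip()
        for cast in (int, float):         # try int first, then float
            try:
                value = cast(value)
                break
            except ValueError:
                pass                      # leave non-numeric values as text
        fields[key] = value
    return fields


# The measurement block shown earlier in this chapter.
sample = """\
     thread = 2
     epoch = 2
     command = postmark /tmp/postmark_config-9868
     user = 0.200000
     sys = 1.170000
     elapsed = 5.470682
     status = 0
"""

m = parse_measurement(sample)
# Elapsed time not accounted for by user or system CPU time.
non_cpu = m["elapsed"] - (m["user"] + m["sys"])
print(m["command"], round(non_cpu, 6))
```

For real analysis, Getstats remains the intended tool; a sketch like this is mainly useful for spot-checking a single block.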

After you have these results files, you can pass them through the Getstats program. Getstats is an automated and powerful way to transform the results files into nicely formatted tables, and to compare two different results files.

If you just want to know how to use Getstats, and not how it works, then read only this chapter. If you want to understand the internals and perform more complex transformations, then you should also read Getstats Internals.

Getstats starts off by processing its command line. The command line consists of options and transformations, followed by a list of files to read.

Getstats takes each transformation that is specified on the command line and pushes it onto a stack (@TRANSFORMS) for later use.

Each file that is specified on the command line is parsed into a two-dimensional array. You can specify a CSV file, an Auto-pilot results file, or a sequence of GNU time output. Getstats automatically determines the file type and parses the file accordingly. Getstats Parsers describes how to add your own parser. The two-dimensional array is a relation that Getstats then manipulates. The relation consists of labels (or names) for each field, followed by rows (or tuples) with a value for each field.
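
To make the idea of a relation concrete, here is a minimal Python sketch that reads CSV text into a list of labels plus a list of rows. It is an illustration only, not the Getstats parser. The names read_relation and column are hypothetical, the sketch assumes that the first CSV line holds the field labels, and the sample values are made up.

```python
import csv
import io


def read_relation(csv_text):
    """Return (labels, rows): one label per field, one list per tuple."""
    reader = csv.reader(io.StringIO(csv_text))
    labels = next(reader)                  # assumption: first line names the fields
    rows = [row for row in reader if row]  # ignore blank lines
    return labels, rows


def column(labels, rows, name):
    """Return every value of the field called `name`, in row order."""
    index = labels.index(name)
    return [row[index] for row in rows]


# Illustrative values only, not real benchmark output.
sample = "epoch,elapsed\n1,5.1\n2,5.4\n"

labels, rows = read_relation(sample)
elapsed = [float(v) for v in column(labels, rows, "elapsed")]
print(labels, rows, elapsed)
```

Keeping the labels separate from the rows means that a field can be looked up by name, while each row stays a plain list of values.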

Each transformation on the @TRANSFORMS stack is applied to each of the relations in turn. For example, if two transformations, A and B, are specified and there are two files, R and S, then A is applied to R, A is applied to S, B is applied to R, and finally B is applied to S.
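
The ordering described above amounts to two nested loops: the outer loop walks the transformations and the inner loop walks the relations. The Python sketch below illustrates only that ordering. Getstats implements it differently, and the function apply_transforms and the two sample transformations are hypothetical.

```python
def apply_transforms(transforms, relations):
    """Apply each transformation to every relation before moving to the next."""
    applied = []
    for t_name, transform in transforms:   # outer loop: transformations
        for r_name in list(relations):     # inner loop: relations (one per file)
            relations[r_name] = transform(relations[r_name])
            applied.append((t_name, r_name))
    return applied


# Hypothetical transformations on a relation stored as (labels, rows).
def drop_first_row(relation):
    labels, rows = relation
    return labels, rows[1:]


def add_row_count(relation):
    labels, rows = relation
    return labels + ["n"], [row + [len(rows)] for row in rows]


relations = {
    "R": (["x"], [[1], [2], [3]]),
    "S": (["x"], [[4], [5]]),
}
order = apply_transforms([("A", drop_first_row), ("B", add_row_count)], relations)
print(order)   # [('A', 'R'), ('A', 'S'), ('B', 'R'), ('B', 'S')]
```

Because B runs only after A has finished on every relation, B always sees the output of A, whichever file it is working on.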

This chapter describes how to produce tabular reports, how to convert a results file into a CSV file, and how to do simple hypothesis testing with Getstats.