Configuring the arbtt categorizer (arbtt-stats)

Once arbtt-capture is running, it records data without any configuration. You only need to configure the categorizer in order to analyze the recorded data. Every time the categorizer (arbtt-stats) runs, it applies the categorization rules to all recorded data and tags it accordingly. Thus, if you improve your categorization rules later, they also apply to all previous data samples!

Configuration example

The configuration file needs to be placed in ~/.arbtt/categorize.cfg. An example is included in the source distribution, and it is reproduced here: see Example 1, “categorize.cfg”. It should be more enlightening than a formal description.

Example 1. categorize.cfg

-- Comments in this file use the Haskell syntax:
-- A "--" comments the rest of the line.
-- A set of {- ... -} comments out a group of lines.

-- This defines some aliases, to make the reports look nicer:
aliases (
	"sun-awt-X11-XFramePeer"  -> "java",
	"sun-awt-X11-XDialogPeer" -> "java",
	"sun-awt-X11-XWindowPeer" -> "java",
	"gramps.py"               -> "gramps",
	"___nforschung"           -> "ahnenforschung",
	"Pidgin"                  -> "pidgin"
	)

-- A rule that probably everybody wants. Being inactive for over a minute
-- causes this sample to be ignored by default.
$idle > 60 ==> tag inactive,

-- A rule that matches on a list of strings
current window $program == ["Navigator","galeon"] ==> tag Web,

current window $program == "sun-awt-X11-XFramePeer" &&
current window $title == "I3P"
  ==> tag Program:I3P,

current window $program == "sun-awt-X11-XDialogPeer" &&
current window $title == " " &&
any window $title == "I3P"
  ==> tag Program:I3P,

-- Simple rule that just tags the current program
tag Program:$current.program,

-- Another simple rule, just tags the current desktop (a.k.a. workspace)
tag Desktop:$desktop,

-- I'd like to know which evolution folders I'm working in. But when sending a
-- mail, the window title only contains the (not very helpful) subject. So I do
-- not necessarily tag by the active window title, but by the title that
-- contains the folder.
current window $program == "evolution" &&
any window ($program == "evolution" && $title =~ /^(.*) \([0-9]+/)
  ==> tag Evo-Folder:$1,

-- A general rule that works well with gvim and gnome-terminal and tells me
-- what project I'm currently working on
current window $title =~ m!(?:~|home/jojo)/projekte/(?:programming/(?:haskell/)?)?([^/)]*)!
  ==> tag Project:$1,
current window $title =~ m!(?:~|home/jojo)/debian!
  ==> tag Project:Debian,

-- This was a frequently viewed PDF file
current window $title =~ m!output.pdf! &&
any window ($title =~ /nforschung/)
  ==> tag Project:ahnenforschung,


-- My diploma thesis is in a different directory
current window $title =~ [ m!(?:~|home/jojo)/dokumente/Uni/DA!
                         , m!Diplomarbeit.pdf!
                         , m!LoopSubgroupPaper.pdf! ]
  ==> tag Project:DA,

current window $title =~ m!TDM!
  ==> tag Project:TDM,

( $date >= 2010-08-01 &&
  $date <= 2010-12-01 &&
  ( current window $program == "sun-awt-X11-XFramePeer" &&
      current window $title == "I3P" ||
    current window $program == "sun-awt-X11-XDialogPeer" &&
      current window $title == " " &&
      any window $title == "I3P" ||
    current window $title =~ m!(?:~|home/jojo)/dokumente/Uni/SA! ||
    current window $title =~ m!Isabelle200! ||
    current window $title =~ m!isar-ref.pdf! ||
    current window $title =~ m!document.pdf! ||
    current window $title =~ m!outline.pdf! ||
    current window $title =~ m!Studienarbeit.pdf! )
) ==> tag Project:SA,


-- Out of curiosity: what percentage of my time am I actually coding Haskell?
current window ($program == "gvim" && $title =~ /^[^ ]+\.hs \(/ )
  ==> tag Editing-Haskell,

{-
-- Example of time-related rules. I do not use these myself.

-- To be able to match on the time of day, I introduce tags for that as well.
-- $time evaluates to local time.
$time >=  2:00 && $time <  8:00 ==> tag time-of-day:night,
$time >=  8:00 && $time < 12:00 ==> tag time-of-day:morning,
$time >= 12:00 && $time < 14:00 ==> tag time-of-day:lunchtime,
$time >= 14:00 && $time < 18:00 ==> tag time-of-day:afternoon,
$time >= 18:00 && $time < 22:00 ==> tag time-of-day:evening,
$time >= 22:00 || $time <  2:00 ==> tag time-of-day:late-evening,

-- This tag always refers to the last 24h
$sampleage <= 24:00 ==> tag last-day,

-- To categorize by calendar periods (months, weeks, or arbitrary periods),
-- I use the $date variable and some auxiliary functions. All these functions
-- evaluate dates in local time. Set the TZ environment variable if you need
-- statistics in a different time zone.

-- You can compare dates:
$date >= 2001-01-01 ==> tag this_century,
-- You have to write them in YYYY-MM-DD format, else they will not be recognized. 

-- “format $date” produces a string with the date in ISO 8601 format
-- (YYYY-MM-DD); it may be compared with strings. For example, to match
-- one particular day of the year (here, March 19 of any year) I can use
format $date =~ /.*-03-19/  ==> tag period:on_a_special_day,
-- but note that this is a rather expensive operation and will slow down your
-- data processing considerably.

-- “day of month $date” gives the day of month (1..31),
-- “day of week $date” gives a sequence number of the day of week
-- (1..7, Monday is 1):
(day of month $date == 13) && (day of week $date == 5) ==> tag day:friday_13,

-- “month $date” gives a month number (1..12), “year $date” gives a year:
month $date == 1 ==> tag month:January,
month $date == 2 ==> tag month:February,
year $date == 2010 ==> tag year:2010,
-}

}

The semantics (informal)

A data sample consists of the time of recording, the time passed since the user’s last action, the name of the current workspace and the list of windows. For each window this information is available:

  • the window title
  • the program name
  • whether the window was the active window

Based on this information and on the rules in categorize.cfg, the categorizer (arbtt-stats) assigns tags to each sample.

A simple rule consists of a condition followed by an arrow (==>) and a tag expression (the keyword tag followed by a tag name). The rule ends with a comma (,).

The keyword tag, usually preceded by a condition, assigns a tag to the sample; it is followed by a tag name (any sequence of alphanumeric characters, underscores and hyphens). If the tag name contains a colon (:), the part of the name before the colon is considered to be the tag category.
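The naming rule can be illustrated with a small Python sketch. It is not arbtt's parser (arbtt is written in Haskell); `split_tag` and `TAG_PART` are hypothetical names, and the sketch ignores variable interpolation such as $1. It assumes that only the first colon separates category and name.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of arbtt: split a tag such as "month:January"
# into (category, name). Each part may contain only alphanumeric characters,
# underscores and hyphens; variable interpolation is not modeled here.
TAG_PART = re.compile(r"^[A-Za-z0-9_-]+$")


def split_tag(tag: str) -> Tuple[Optional[str], str]:
    """Return (category, name); category is None for an uncategorized tag."""
    if ":" in tag:
        category, name = tag.split(":", 1)  # text before the first colon
    else:
        category, name = None, tag
    for part in (category, name):
        if part is not None and not TAG_PART.match(part):
            raise ValueError(f"invalid tag part: {part!r}")
    return category, name


print(split_tag("month:January"))  # ('month', 'January')
print(split_tag("inactive"))       # (None, 'inactive')
```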

For example, this rule

month $date == 1 ==> tag month:January,

if it succeeds, assigns the tag January in the category month.

If the tag has a category, it will only be assigned if no other tag of that category has already been assigned. This means that for each sample and each category, there can be at most one tag in that category. Tags can contain references to group matches in the regular expressions used in conditions ($1, $2, ...). Tags can also reference some variables, such as the window title ($current.title) or the program name ($current.program).
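The "at most one tag per category" rule can be modeled as a filter over the tags a sample receives. This Python sketch is not arbtt's implementation; `assign_tags` is a hypothetical name, and it assumes the rules are evaluated from top to bottom, so the first tag produced for a category wins. That assumption matches the ordering in Example 1, where the specific Program:I3P rules come before the generic `tag Program:$current.program`.

```python
from typing import List, Optional, Set


# Hypothetical model, not arbtt's code: candidate tags for one sample, in rule
# order, are filtered so that each category keeps only its first tag.
# Uncategorized tags are always kept.
def assign_tags(candidates: List[str]) -> List[str]:
    assigned: List[str] = []
    seen_categories: Set[str] = set()
    for tag in candidates:
        category: Optional[str] = tag.split(":", 1)[0] if ":" in tag else None
        if category is not None:
            if category in seen_categories:
                continue  # this category already has a tag for this sample
            seen_categories.add(category)
        assigned.append(tag)
    return assigned


# "java" is what the alias turns sun-awt-X11-XFramePeer into in Example 1.
print(assign_tags(["Program:I3P", "Program:java", "Desktop:2", "Editing-Haskell"]))
# ['Program:I3P', 'Desktop:2', 'Editing-Haskell']
```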

The variable $idle contains the idle time of the user, measured in seconds. It is usually used to assign the tag inactive, which arbtt-stats handles specially (by default, samples tagged inactive are ignored), as can be seen in Example 1, “categorize.cfg”.
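A minimal Python sketch of the inactive handling, assuming only what Example 1 states: the rule `$idle > 60 ==> tag inactive` tags the sample, and inactive samples are left out by default. The `Sample` class and both functions are hypothetical names, not arbtt code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    idle: int                                  # $idle, in seconds
    tags: List[str] = field(default_factory=list)


def apply_inactive_rule(samples: List[Sample], threshold: int = 60) -> None:
    """Model of: $idle > 60 ==> tag inactive"""
    for s in samples:
        if s.idle > threshold:
            s.tags.append("inactive")


def counted(samples: List[Sample]) -> List[Sample]:
    """Samples that contribute to the statistics by default."""
    return [s for s in samples if "inactive" not in s.tags]


samples = [Sample(5), Sample(61), Sample(60)]
apply_inactive_rule(samples)
print([s.idle for s in counted(samples)])  # [5, 60]: 60 s is not "over a minute"
```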

When applying the rules, the categorizer has a notion of the window in scope, and the variables $title, $program and $active always refer to the window in scope. By default, no window is in scope. A condition that uses these variables should therefore be prefixed with either current window or any window to define their scope.

The name of the current desktop (or workspace) is available as $desktop.

For current window, the currently active window is in scope. If there is no such window, the condition is false.

For any window, the condition is applied to each window in turn, and if any of the windows matches, the result is true. If more than one window matches, it is not defined which match the variables $1, ... are taken from (see more about regular expressions below).
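The two scoping forms can be sketched in Python as higher-order functions over a list of windows. This is an illustration, not arbtt's implementation; `Window`, `current_window`, `any_window` and the Evolution window titles are hypothetical, and capture groups ($1, ...) are not modeled. The usage mirrors the Evolution rule from Example 1.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Window:
    title: str    # $title
    program: str  # $program
    active: bool  # $active


WindowCond = Callable[[Window], bool]


def current_window(cond: WindowCond, windows: List[Window]) -> bool:
    """Evaluate cond with the active window in scope; false if none is active."""
    active = [w for w in windows if w.active]
    return bool(active) and cond(active[0])


def any_window(cond: WindowCond, windows: List[Window]) -> bool:
    """True if cond holds with at least one window in scope."""
    return any(cond(w) for w in windows)


def is_evolution(w: Window) -> bool:
    return w.program == "evolution"


def has_folder(w: Window) -> bool:
    return is_evolution(w) and re.search(r"^(.*) \([0-9]+", w.title) is not None


# The active window is a message being written; another window shows the folder.
windows = [Window("Re: meeting", "evolution", True),
           Window("Inbox (3 unread)", "evolution", False)]
print(current_window(is_evolution, windows) and any_window(has_folder, windows))  # True
print(current_window(has_folder, windows))  # False
```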

The variable $time refers to the time of day of the sample (i.e. the time since 00:00 that day, local time), while $sampleage refers to the time span from when the sample was recorded until now, the time of evaluating the statistics. The latter variable is especially useful when passed to the --filter option of arbtt-stats. Both can be compared with time literals of the form hh:mm, written without quotes, for example

$time >=  8:00 && $time < 12:00 ==> tag time-of-day:morning
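The following Python sketch models $time, $sampleage and hh:mm literals as minute counts and reproduces the commented-out time-of-day rules of Example 1. The helper names are hypothetical, and dropping the seconds of a timestamp is a simplification of this sketch, not documented arbtt behavior.

```python
from datetime import datetime, timedelta


# Hypothetical model, not arbtt's code: time literals such as "8:00" or
# "24:00" become minutes, $time is the minutes since local midnight, and
# $sampleage is the minutes between recording and evaluation.
def parse_timediff(literal: str) -> int:
    hours, minutes = literal.split(":")
    if not (hours.isdigit() and minutes.isdigit() and len(minutes) == 2):
        raise ValueError(f"not an hh:mm literal: {literal!r}")
    return int(hours) * 60 + int(minutes)


def time_of_day(ts: datetime) -> int:
    return ts.hour * 60 + ts.minute  # seconds are ignored in this sketch


def sample_age(ts: datetime, now: datetime) -> int:
    return int((now - ts).total_seconds() // 60)


# The commented-out time-of-day rules from Example 1, as half-open ranges.
BUCKETS = [("2:00", "8:00", "night"), ("8:00", "12:00", "morning"),
           ("12:00", "14:00", "lunchtime"), ("14:00", "18:00", "afternoon"),
           ("18:00", "22:00", "evening")]


def time_of_day_tag(ts: datetime) -> str:
    t = time_of_day(ts)
    for start, end, name in BUCKETS:
        if parse_timediff(start) <= t < parse_timediff(end):
            return "time-of-day:" + name
    return "time-of-day:late-evening"  # $time >= 22:00 || $time < 2:00


ts = datetime(2010, 8, 13, 23, 30)
print(time_of_day_tag(ts))  # time-of-day:late-evening
# $sampleage <= 24:00 ==> tag last-day
print(sample_age(ts, ts + timedelta(hours=23)) <= parse_timediff("24:00"))  # True
```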

The variable $date refers to the date and time of the recorded sample. It can be compared with date literals of the form YYYY-MM-DD, which stand for midnight at the start of that day. So

$date == 2001-01-01

will not do what you want (it only matches a sample recorded exactly at midnight), but

$date >= 2001-01-01 && $date < 2001-01-02

would. All dates are evaluated in local time.
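The midnight pitfall is easy to reproduce with Python's `datetime`. This sketch only models the comparison described above; `date_literal` is a hypothetical helper, not an arbtt function.

```python
from datetime import datetime


# Hypothetical model: a YYYY-MM-DD literal denotes midnight (local time) at
# the start of that day, whereas $date carries the full time of the sample.
def date_literal(text: str) -> datetime:
    return datetime.strptime(text, "%Y-%m-%d")  # time part defaults to 00:00


sample = datetime(2001, 1, 1, 14, 30)  # recorded in the afternoon of Jan 1

# $date == 2001-01-01 only holds for a sample taken exactly at midnight.
print(sample == date_literal("2001-01-01"))  # False
# $date >= 2001-01-01 && $date < 2001-01-02 covers the whole day.
print(date_literal("2001-01-01") <= sample < date_literal("2001-01-02"))  # True
```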

The expression format $date evaluates to a string containing the date formatted according to ISO 8601, i.e. as "YYYY-MM-DD"; for example, the 19th of March 2010 is formatted as "2010-03-19". Formatted dates can be compared to strings and may be useful to tag particular date ranges. Note, however, that this is a rather expensive operation that can slow down your data processing.
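A short Python sketch of what format $date produces and why string comparisons on it work. `format_date` is a hypothetical stand-in for the arbtt expression; the range check relies on the grammar allowing `String CmpOp String`.

```python
import re
from datetime import datetime


def format_date(ts: datetime) -> str:
    """Model of format $date: the sample's date as YYYY-MM-DD."""
    return ts.strftime("%Y-%m-%d")


ts = datetime(2010, 3, 19, 16, 45)
print(format_date(ts))  # 2010-03-19

# March 19 of any year, as in the commented-out rule of Example 1.
print(re.search(r"-03-19$", format_date(ts)) is not None)  # True

# Zero padding makes string order agree with date order, so a date range
# can be written as two string comparisons.
print("2010-03-01" <= format_date(ts) <= "2010-03-31")  # True
```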

The expression month $date evaluates to an integer from 1 to 12, corresponding to the month number. The expression year $date evaluates to an integer, the year number. The expression day of month $date evaluates to an integer from 1 to 31, corresponding to the day of the month. The expression day of week $date evaluates to an integer from 1 to 7, corresponding to the day of the week, where Monday is 1 and Sunday is 7. These expressions can be compared to integers.
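In Python terms, these four expressions correspond to standard `date` attributes; `isoweekday()` uses the same Monday = 1, Sunday = 7 convention. `calendar_fields` is a hypothetical helper used only for illustration. The example date, 13 August 2010, was a Friday, so it satisfies the friday_13 rule from Example 1.

```python
from datetime import date


def calendar_fields(d: date) -> dict:
    """Mirror month/year/day of month/day of week; Monday is 1, Sunday is 7."""
    return {
        "month": d.month,
        "year": d.year,
        "day of month": d.day,
        "day of week": d.isoweekday(),
    }


fields = calendar_fields(date(2010, 8, 13))
print(fields)  # {'month': 8, 'year': 2010, 'day of month': 13, 'day of week': 5}
# (day of month $date == 13) && (day of week $date == 5) ==> tag day:friday_13
if fields["day of month"] == 13 and fields["day of week"] == 5:
    print("day:friday_13")
```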

Expressions can be compared to literal values with the operators == (equal), /= (not equal), <, <=, >= and >. String expressions such as $program and $title can be matched against regular expressions with the =~ operator. With these operators, the right-hand side can also be a comma-separated list of literals enclosed in square brackets ([ ..., ..., ]), which succeeds if any of them succeeds.
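The list form can be modeled by recursing over the right-hand side and combining the results with `any`. This Python sketch is illustrative only: `compare`, `matches` and `CMP_OPS` are hypothetical names. Following rules such as `m!TDM!` in Example 1, it assumes that a regular expression may match anywhere in the string unless it is anchored with ^ or $.

```python
import operator
import re
from typing import Any, Callable, Dict

# Hypothetical model of the comparison operators, not arbtt's implementation.
CMP_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq, "/=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def compare(lhs: Any, op: str, rhs: Any) -> bool:
    """lhs op rhs; a list on the right succeeds if any element succeeds."""
    if isinstance(rhs, list):
        return any(compare(lhs, op, item) for item in rhs)
    return CMP_OPS[op](lhs, rhs)


def matches(lhs: str, rhs: Any) -> bool:
    """lhs =~ rhs; assumed unanchored search, as the examples suggest."""
    if isinstance(rhs, list):
        return any(matches(lhs, pattern) for pattern in rhs)
    return re.search(rhs, lhs) is not None


print(compare("galeon", "==", ["Navigator", "galeon"]))  # True
print(matches("Diplomarbeit.pdf - Evince",
              [r"(?:~|home/jojo)/dokumente/Uni/DA", r"Diplomarbeit.pdf"]))  # True
```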

Regular expressions are written either between slashes (/ regular expression /) or after the letter m followed by any symbol (m c regular expression c, where c is any symbol). The second appearance of that symbol ends the expression. You can find both variants in Example 1, “categorize.cfg”.
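To make the delimiter rule concrete, here is a small Python tokenizer for the two literal forms. It is a hypothetical helper, not arbtt's parser. Like the description above, it treats the second occurrence of the delimiter as the end of the expression and does not handle escaped delimiters.

```python
import re
from typing import Tuple


def parse_regex_literal(src: str) -> Tuple[str, str]:
    """Split a leading /.../ or m c ... c literal off src.

    Returns (pattern, rest_of_input). Hypothetical helper, not arbtt's parser.
    """
    if src.startswith("/"):
        delim, body_start = "/", 1
    elif src.startswith("m") and len(src) >= 2:
        delim, body_start = src[1], 2  # any character after m is the delimiter
    else:
        raise ValueError("not a regular expression literal")
    end = src.find(delim, body_start)  # the next occurrence ends the pattern
    if end == -1:
        raise ValueError(f"missing closing {delim!r}")
    return src[body_start:end], src[end + 1:]


pattern, rest = parse_regex_literal("m!(?:~|home/jojo)/debian! ==> tag Project:Debian")
print(pattern)  # (?:~|home/jojo)/debian
print(bool(re.search(pattern, "vim ~/debian/changelog")))  # True
```

The m form is what lets a pattern such as `(?:~|home/jojo)/debian` contain slashes without ending the expression early.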

Complex conditions may be constructed from simpler ones using the Boolean operators AND (&&), OR (||) and NOT (!), and parentheses.

The syntax

categorize.cfg is a plain text file. Whitespace is insignificant and Haskell-style comments are allowed. A formal grammar is provided in Figure 1, “The formal grammar of categorize.cfg”.

Figure 1. The formal grammar of categorize.cfg

[1]  Rules        ::= [ AliasSpec ] Rule ( ( , Rule )* | ( ; Rule )* )
[2]  AliasSpec    ::= aliases ( Alias ( , Alias )* )
[3]  Alias        ::= Literal -> Literal
[4]  Rule         ::= { Rules }
                    | Cond ==> Rule
                    | if Cond then Rule else Rule
                    | tag Tag
[5]  Cond         ::= ( Cond )
                    | ! Cond
                    | Cond && Cond
                    | Cond || Cond
                    | $active
                    | String CmpOp String
                    | String CmpOp [ ListOfString ]
                    | String =~ RegEx
                    | String =~ [ ListOfRegex ]
                    | Number CmpOp Number
                    | TimeDiff CmpOp TimeDiff
                    | Date CmpOp Date
                    | current window Cond
                    | any window Cond
[6]  String       ::= $title
                    | $program
                    | $desktop
                    | format Date
                    | " string literal "
[7]  ListOfString ::= " string literal "
                    | " string literal " , ListOfString
[8]  Number       ::= $idle
                    | day of week Date
                    | day of month Date
                    | month Date
                    | year Date
                    | number literal
[9]  Date         ::= $date
[10] TimeDiff     ::= $time
                    | $sampleage
                    | ( Digit )* Digit : Digit Digit
[11] Tag          ::= [ Literal : ] Literal
[12] RegEx        ::= / Literal /
                    | m c Literal c      /* Where c can be any character. */
[13] ListOfRegex  ::= RegEx
                    | RegEx , ListOfRegex
[14] CmpOp        ::= <= | < | == | /= | > | >=

A String refers to a double-quoted string of characters, while a Literal is not quoted. Tags may only consist of letters, digits, dashes and underscores, or variable interpolations. A Tag may optionally be prefixed with a category, separated by a colon. The category itself follows the same lexical rules as the tag. A variable interpolation can be one of the following:

$1, $2, ...
    will be replaced by the respective group in the last successfully applied regular expression in the conditions enclosing the current rule.
$current.title, $current.program
    will be replaced by the title of the currently active window, or by the name of the currently active program, respectively. If no window happens to be active, this tag will be ignored.
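The two kinds of interpolation can be sketched in Python with a single substitution pass. This is not arbtt code: `interpolate` and `VAR` are hypothetical names, the window title below is made up, and other variables that appear in tags (such as $desktop) are not modeled. Following the description above, the sketch returns None, meaning "ignore this tag", when a $current variable is used but no window is active.

```python
import re
from typing import Optional, Sequence

# $1, $2, ... or $current.title / $current.program
VAR = re.compile(r"\$([1-9][0-9]*|current\.title|current\.program)")


def interpolate(template: str, groups: Sequence[str],
                active_title: Optional[str],
                active_program: Optional[str]) -> Optional[str]:
    """Expand a tag template; None means the tag is ignored."""
    missing = False

    def repl(match) -> str:
        nonlocal missing
        name = match.group(1)
        if name.isdigit():
            return groups[int(name) - 1]  # $1 is the first capture group
        value = active_title if name == "current.title" else active_program
        if value is None:  # no active window
            missing = True
            return ""
        return value

    result = VAR.sub(repl, template)
    return None if missing else result


title = "Main.hs (~/projekte/programming/haskell/arbtt/src) - GVIM"
m = re.search(r"(?:~|home/jojo)/projekte/(?:programming/(?:haskell/)?)?([^/)]*)", title)
print(interpolate("Project:$1", m.groups(), title, "gvim"))       # Project:arbtt
print(interpolate("Program:$current.program", (), None, None))   # None
```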

A regular expression is, as in Perl, either enclosed in forward slashes or, alternatively, in any character of your choice with an m (for match) in front. The latter form is handy for regular expressions that match directory names, since these contain slashes. Otherwise, the syntax of the regular expressions is that of Perl-compatible regular expressions (PCRE).