The Bro Cluster

This document explains how to set up a Bro Cluster, i.e., a set of commodity PCs jointly analyzing the traffic of a network link. See this RAID Paper for more information about the general architecture.

A cluster consists of four types of components:

One or more frontends. Frontends load-balance the traffic across a set of worker machines.
Worker nodes. Workers perform the actual analysis, with each seeing a slice of the overall traffic as split up by the frontends.
One or more proxies. Proxies relay the communication between worker nodes.
One manager. The manager provides the cluster's user-interface for controlling and logging. During operation, the user only interacts with the manager.

This document focuses on the installation of the manager, the workers, and the proxies. See <somewhere different> for a discussion of how to set up a frontend. We also assume general familiarity with configuring Bro in a traditional stand-alone setup.

In this document the terms "manager", "worker", and "proxy" each refer to one Bro instance, not to a physical machine. There may be multiple such instances running on the same host. For example, it's typical to run a proxy on the same host as the manager.

This documentation assumes that all cluster systems

are running a Unix. FreeBSD, Linux, and MacOS are supposed to work, though FreeBSD has seen the most testing. Other Unix systems will quite likely require some tweaking. Note that all systems must be running the same operating system.
have Python >= 2.4 installed.
have ssh and rdist installed.
have a user account set up on all nodes which can run Bro and has monitoring access to the network interface. ssh access from the manager to this account must be set up on all machines to work without asking for a password/passphrase.
have some storage available for the cluster on the local disks. In the following we will use /data/cluster as the base path for this. The Bro user must be able to either create this directory or, where it already exists, must have write permission inside this directory on all nodes.
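
The password-less ssh requirement can be checked before installing. The following is a minimal sketch, not part of the cluster shell: the function names are hypothetical, and it assumes a modern Python 3 on the manager and an OpenSSH client, whose BatchMode option makes ssh fail instead of prompting for a password or passphrase.

```python
import subprocess


def ssh_probe_command(host, timeout=5):
    """Build an ssh command line that fails rather than prompting for input."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never ask for a password/passphrase
        "-o", "ConnectTimeout=%d" % timeout,  # give up quickly on dead hosts
        host,
        "true",                               # harmless remote command
    ]


def can_login_without_password(host):
    """Return True if the probe command exits with status 0."""
    try:
        return subprocess.call(ssh_probe_command(host)) == 0
    except OSError:
        # The ssh client itself is missing or not executable.
        return False
```

Calling can_login_without_password() once per node from the manager, as the user that will run Bro, shows which hosts still need their keys set up.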

In the following, as an example setup, we will assume that the cluster consists of four machines (not counting the frontend). The host names of the systems are host1, host2, host3, and host4. We will configure the cluster so that host1 runs the manager and the (only) proxy, and host{2,3,4} are each running one worker. This is a typical setup which will work well for many sites.
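
To make the example layout concrete, here is a small sketch that writes it down as data and checks it against the component rules listed earlier (one manager, one or more proxies, and workers). The list format and the check_layout helper are purely illustrative assumptions; they are not the format of the cluster's node.cfg.

```python
from collections import Counter

# Hypothetical in-memory description of the example setup; this is NOT
# the syntax of the real node configuration file.
EXAMPLE_LAYOUT = [
    ("manager", "host1"),
    ("proxy", "host1"),
    ("worker", "host2"),
    ("worker", "host3"),
    ("worker", "host4"),
]


def check_layout(layout):
    """Return a list of problems; an empty list means the layout is plausible."""
    counts = Counter(role for role, _host in layout)
    problems = []
    if counts["manager"] != 1:
        problems.append("expected exactly one manager, found %d" % counts["manager"])
    if counts["proxy"] < 1:
        problems.append("expected at least one proxy")
    if counts["worker"] < 1:
        problems.append("expected at least one worker")
    return problems


if __name__ == "__main__":
    print(check_layout(EXAMPLE_LAYOUT) or "layout looks consistent")
```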

Cluster Installation

Get the current version of Bro with the cluster shell. Note that at the moment the cluster framework is still under development and requires you to use Robin's development branch. This is made available for testing only and is not suitable for production use.

Configure and compile Bro (do not run make install):

> svn checkout http://svn.icir.org/bro/branches/robin/work
> cd work
> ./autogen.sh && ./configure && make

Change into the cluster's distribution directory:
```
> cd aux/cluster
```
Configure the cluster installation with the cluster's base path as --prefix (/data/cluster in our example as discussed above).
```
> ./configure --prefix=<prefix>
```
If your system is set up to compile Python extension modules, build Broccoli's Python module (otherwise some functionality will be disabled). If you're unsure, just try it:
```
> make pybroccoli
```
Install the cluster files on the manager:
```
> make install
```
Add <prefix>/bin to your PATH.
Create a cluster configuration file. There is an example which you can edit according to the contained instructions:
```
> cd <prefix>
> cp etc/cluster.cfg.example etc/cluster.cfg
> vi etc/cluster.cfg
```
Create a node configuration file to define where manager, workers, and proxies are to run. There is an example which defines the example scenario described above and can be edited as needed:
```
> cd <prefix>
> cp etc/node.cfg.example etc/node.cfg
> vi etc/node.cfg
```
Create a network configuration file which lists all of the networks which the cluster should consider local to the monitored environment. Again there is an example which you can use as a template:
```
> cd <prefix>
> cp etc/networks.cfg.example etc/networks.cfg
> vi etc/networks.cfg
```
Install workers and proxies using the cluster shell:
```
> cluster install
```
This install process uses ssh and rdist to copy the configuration over to the remote machines so, as described above, you need to ensure that these services work before the install will succeed.
Some tasks need to be run on a regular basis. Insert a line like this into the crontab of the user running the cluster:
```
0-59/5 * * * * <prefix>/bin/cluster cron
```
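
The minute field 0-59/5 in the crontab line above means "every fifth minute of the hour". As an illustration only, the following sketch expands such a field; expand_minute_field is a hypothetical helper that handles just the start-end/step form used here, not the full crontab syntax.

```python
def expand_minute_field(field):
    """Expand a 'start-end/step' crontab minute field into the matching minutes."""
    span, _, step = field.partition("/")
    start, _, end = span.partition("-")
    # A missing step means every minute in the range.
    return list(range(int(start), int(end) + 1, int(step or 1)))


if __name__ == "__main__":
    # Twelve runs per hour: minutes 0, 5, 10, ..., 55.
    print(expand_minute_field("0-59/5"))
```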

Getting Started

The shell is an interactive interface to the cluster which allows you to, e.g., start/stop the cluster nodes or update their configuration. The shell is started with the cluster script and then expects commands on its command-line:

> cluster
Welcome to BroCluster 0.1

Type "help" for help.

[BroCluster] >

As the message says, type help to see a list of all commands. We will now briefly summarize the most important commands. A full reference follows below.

Once cluster.cfg and node.cfg are set up as described above, the cluster can be started with the start command. This will successively start manager, proxies, and workers. The status command should then show all nodes as operating. To stop the cluster again, issue the stop command. exit leaves the shell.

On the manager system, you will find the current set of (aggregated) cluster logs in spool/manager/. Similarly, the proxies and workers log into spool/proxy/ and spool/<worker-name>/, respectively. The manager's logs are archived in logs/, by default once a day. Log files of workers and proxies are discarded at the same rotation interval.

Whenever the cluster configuration is modified in any way (including changes to custom or provided policy files and new versions of the cluster environment), install installs the new version. No changes will take effect until install is run. Before you run install, check can be used to check for any potential errors in the new configuration, e.g., typos in scripts. If check does not report any problems, doing install is unlikely to break anything.

Note that configuration changes generally take effect only after a restart of the affected cluster nodes. The restart command triggers this. Some changes, however, can be put into effect on the fly without restarting any of the nodes by using the update command (again after doing install first). Such dynamic updates work with all changes done via the analysis command (see below) as well as generally with all policy changes which only modify global variables declared as redefinable (i.e., with Bro's &redef attribute).
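
The order of operations described in the last two paragraphs can be summarized in a short sketch. The helper plan_commands is hypothetical and not part of the cluster shell; it only returns the command lines to run, assuming the shell commands are given on the command line of the cluster script.

```python
def plan_commands(dynamic_update_possible):
    """Return the cluster commands to run after a configuration change.

    dynamic_update_possible should be True only if the change is limited to
    values of &redef variables or to settings made via the analysis command.
    """
    steps = ["check", "install"]
    # Everything else needs a restart of the affected nodes.
    steps.append("update" if dynamic_update_possible else "restart")
    return ["cluster " + step for step in steps]


if __name__ == "__main__":
    for command in plan_commands(dynamic_update_possible=False):
        print(command)
```

If in doubt about whether a change qualifies for a dynamic update, treating it as requiring a restart is the safer choice.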

Generally, site-specific tuning needs to be done with local policy scripts, as in a single-Bro setup. This is described below. Some general types of analysis can, however, be enabled/disabled via the shell's analysis command.

The shell provides various options to control the behaviour of the cluster. These options can be set by editing etc/cluster.cfg. The config command gives a list of all options with their current values. A list of the most important options also follows below.

Site-specific Customization

As with a stand-alone setup, you'll likely want to adapt the Bro policy to the local environment. While some types of analysis can be customized via the analysis command, much of the more specific tuning requires writing local policy files.

By default, it is assumed that you put site-specific policy scripts into the policy/local sub-directory inside the manager's base path. To change the location of site policies, set the option SitePolicyPath in cluster.cfg to a different path.

During the first install, sample policy scripts are installed in policy/local which you can edit as appropriate: local-manager.bro and local-worker.bro are loaded by the manager and the workers, respectively. In turn, they both load local.bro, which contains all configuration code shared by manager and workers. If in doubt, put your customizations into local.bro so that all nodes see them. If you want to change which local scripts are loaded by the nodes, you can set SitePolicyManager for the manager and SitePolicyWorker for the workers.

The main exception to putting everything into local.bro is notice filtering, which should be done only on the manager. The example local-manager.bro comes with an example setup to configure notice policy and notice actions. You will likely want to adapt this to the local environment.

In general, all the cluster's policy scripts are loaded before any site-specific policy so that you can redefine any of the defaults locally.

Please note that enabling a particular kind of analysis via the shell's analysis command only has an effect if the corresponding scripts are loaded by the local site policy in local.bro.

It is also possible to add scripts to individual nodes only. This works by setting the option aux_scripts for the corresponding node(s) in etc/node.cfg. For example, one could add a script experimental.bro to a single worker for trying out new experimental code.

Command Reference

The following summary lists all commands supported by the shell. All may be either entered interactively or specified on the command line. If not specified otherwise, commands taking [<nodes>] as arguments apply their action either to the given set of nodes, or to all nodes if none is given.

analysis enable|disable <type>

This command enables or disables certain kinds of analysis without the need for doing any changes to Bro scripts. Currently, the analyses shown in the table below can be controlled (in parentheses the corresponding Bro scripts; the effect of enabling/disabling is similar to loading or not loading these scripts, respectively). The list will be extended in the future. Any changes performed via this command are applied by update and therefore do not require a cluster restart.

Type	Description
dns	DNS analysis (`dns.bro`)
ftp	FTP analysis (`ftp.bro`)
http-body	HTTP body analysis (`http-body.bro`)
http-request	Client-side HTTP analysis only (`http-request.bro`)
http-reply	Client- and server-side HTTP analysis (`http-request.bro`/`http-reply.bro`)
http-header	HTTP header analysis (`http-headers.bro`)
scan	Scan detection (`scan.bro`)
smtp	SMTP analysis (`smtp.bro`)

attachgdb [<nodes>]

Primarily for debugging, the command attaches a gdb to the main Bro process on the given nodes.

capstats [<interval>] [<nodes>]

Determines the current load on the network interfaces monitored by each of the given worker nodes. The load is measured over the specified interval (in seconds), or by default over 10 seconds. This command requires the capstats tool, which needs to be available in the user's search PATH at the time of install. (Note: When using a CFlow and the CFlow command line utility is installed as well, the capstats command can also query the device for port statistics. TODO: document how to set this up.)

check [<nodes>]

Verifies a modified configuration in terms of syntactical correctness (most importantly correct policy script syntax). This command should be executed for each configuration change before install is used to put the update into place. Note that check is the only command which operates correctly without a prior install; check uses the policy files as found in SitePolicyPath to make sure they compile correctly. If they do, install will then copy them over to an internal place from where the nodes will read them at the next start. This approach ensures that errors in a policy script will not affect currently running nodes, even when one or more of them need to be restarted.

cleanup [--all] [<nodes>]

Removes the nodes' spool directories (if they are not currently running). This implies that their persistent state is flushed. Nodes that have crashed are reset to the stopped state. If --all is specified, this command also removes the content of the node's TmpDir, in particular deleting any data saved there for reference from previous crashes. Generally, if you want to reset the cluster back into a clean state, you can first stop all nodes, then execute cleanup --all, and finally start all nodes again.

config

Prints all configuration options with their current values.

cron [<nodes>]

As the name implies, this command should be executed regularly via cron, as described above. It performs a set of maintenance tasks, including the logging of various statistical information, expiring old log files, checking for dead hosts, and restarting nodes which terminated unexpectedly. While not intended for interactive use, no harm will be caused by executing the command manually: all the maintenance tasks will then just be performed one more time.

df [<nodes>]

Reports the amount of disk space available on the nodes. Shows only paths relevant to the cluster installation.

diag [<nodes>]

If a node has terminated unexpectedly, this command prints a (somewhat cryptic) summary of its final state including excerpts of any stdout/stderr output, resource usage, and also a stack backtrace if a core dump is found. This is the same information that is sent out via mail when a node is found to have crashed (the "crash report"). While the information is mainly intended for debugging the shell and Bro itself, it can also help to find misconfigurations (which are usually, but not always, caught by the check command).

exec <command line>

Executes the given Unix shell command line on all hosts configured to run at least one cluster node. This is handy to quickly perform an action across all cluster systems.

exit

Terminates the shell.

help

Prints a brief summary of all commands understood by the shell.

install [<nodes>]

Reinstalls the given nodes, including all configuration files, policies, scripts, and binaries. This command must be executed after all changes to any part of the cluster configuration/installation, otherwise the modifications will not take effect. Usually all nodes should be reinstalled at the same time, as any inconsistencies between them will lead to strange effects. Before executing install, it is recommended to verify the configuration with check.

netstats [<nodes>]

Queries each of the nodes for their current counts of captured and dropped packets.

nodes

Prints a list of all configured cluster nodes.

peerstatus [<nodes>]

Primarily for debugging, peerstatus reports statistics about the network connections the nodes are using to communicate with other cluster nodes.

print <id> [<nodes>]

Reports the current value of the given Bro script ID on all of the specified nodes (which obviously must be running). This can for example be useful to (1) check that policy scripts are working as expected, or (2) confirm that configuration changes have in fact been applied. Note that IDs defined inside a Bro namespace must be prefixed with <namespace>:: (e.g., print SSH::did_ssh_version to print the corresponding table from ssh.bro).

quit

Terminates the shell.

restart [<nodes>]

Restarts the given nodes, or the whole cluster if none are specified. The effect is the same as first executing stop followed by a start and giving the same nodes in both cases. This command is most useful to activate any changes made to Bro policy scripts (after running install first). Note that a subset of policy changes can also be installed on the fly via the update command, without requiring a restart.

scripts [-p|-c] [<nodes>]

Primarily for debugging Bro configurations, the scripts command lists all the Bro scripts loaded by each of the nodes in the order in which they will be parsed at node startup. If -p is given, all scripts are listed with their full paths. If -c is given, the command operates as check does: it reads the policy files from their original location, not the copies installed by install. The latter option is useful to check a not yet installed configuration.

start [<nodes>]

Starts the given nodes, or the whole cluster if none are specified. Nodes already running are left untouched.

status [<nodes>]

Prints the current status of the given nodes.

stop [<nodes>]

Stops the given nodes, or the whole cluster if none are specified. Nodes not running are left untouched.

top [<nodes>]

For each of the nodes, prints the status of the two Bro processes (parent process and child process) in a top-like format, including CPU usage and memory consumption. If executed interactively, the display is updated frequently until aborted with CTRL-C. If invoked non-interactively, the status is printed only once.

update [<nodes>]

After a change to Bro policy scripts, this command updates the Bro processes on the given nodes while they are running (i.e., without requiring a restart). However, such dynamic updates work only for a subset of Bro's full configuration. The following changes can be applied on the fly: (1) the value of all script variables defined as &redef can be changed; and (2) all configuration changes performed via the analysis command can be put into effect.

More extensive script changes are not possible during runtime and always require a restart; if you change more than just the values of &redef variables and still issue update, the results are undefined and can lead to crashes. Also note that before running update, you still need to do an install (preferably after check), as otherwise update will not see the changes and resend the old configuration.

Option Reference

This section summarizes the options that can be set in etc/cluster.cfg for customizing the behaviour of the shell. Usually, one only needs to change the "user options", which are listed first. The "internal options" are, as the name suggests, primarily used internally and set automatically. They are documented here only for reference.

User Options

AuxScriptsManager (string, default empty): Additional Bro scripts loaded on the manager, separated by spaces.
AuxScriptsStandalone (string, default empty): Additional Bro scripts loaded on a standalone Bro, separated by spaces.
AuxScriptsWorker (string, default empty): Additional Bro scripts loaded on the workers, separated by spaces.
BroArgs (string, default empty): Additional arguments to pass to Bro on the command-line.
CFlowAddr (string, default empty): If a cFlow load-balancer is used, the address of the device (format: <ip>:<port>).
CFlowPassword (string, default empty): If a cFlow load-balancer is used, the password for accessing its configuration interface.
CFlowUser (string, default empty): If a cFlow load-balancer is used, the user name for accessing its configuration interface.
CustomInstallBin (string, default empty): Additional executables to be installed into ${BinDir}, including full path and separated by spaces.
Debug (bool, default 0): Enable extensive debugging output in spool/debug.log.
HaveNFS (bool, default 0): True if shared files are mounted across all nodes via NFS (see FAQ).
LogDir (string, default "${BroBase}/logs"): Directory for archived log files.
LogExpireInterval (int, default 30): Number of days log files are kept.
MailAlarmPrefix (string, default "ALERT:"): Subject prefix for individual alerts triggered by NOTICE_EMAIL.
MailAlarms (bool, default 1): True if Bro should send mails for NOTICE_EMAIL alerts.
MailFrom (string, default "Big Brother <bro@localhost>"): Originator address for cluster-generated mails.
MailReplyTo (string, default empty): Reply-to address for cluster-generated mails.
MailSubjectPrefix (string, default "[Bro]"): General subject prefix for cluster-generated mails.
MailTo (string, default "<user>"): Destination address for cluster-generated mails.
MinDiskSpace (int, default 5): Percentage of minimum disk space available before warning is mailed.
Prefixes (string, default "local"): Additional script prefixes for Bro, separated by colons. Use this instead of @prefix.
SaveTraces (bool, default 0): True to let backends capture short-term traces via -w. These are not archived but might be helpful for debugging.
SendMail (bool, default 1): True if shell may send mails.
SitePolicyManager (string, default "local-manager.bro"): Local policy file for manager.
SitePolicyPath (string, default "${BroBase}/policy/local"): Directories to search for local policy files, separated by colons.
SitePolicyStandalone (string, default "local.bro"): Local policy file for standalone Bro.
SitePolicyWorker (string, default "local-worker.bro"): Local policy file for workers.
TimeFmt (string, default "%d %b %H:%M:%S"): Format string to print date/time specifications (see man strftime).
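
To see what the default TimeFmt produces, the format string can be tried with any strftime implementation. The sketch below uses Python's time.strftime; format_timestamp is a hypothetical helper, and rendering the time in UTC is an assumption made only to keep the example reproducible (the cluster shell may well use local time).

```python
import time

# Default value of the TimeFmt option as listed above.
TIME_FMT = "%d %b %H:%M:%S"


def format_timestamp(seconds_since_epoch, fmt=TIME_FMT):
    """Render a Unix timestamp in UTC using a strftime format string."""
    return time.strftime(fmt, time.gmtime(seconds_since_epoch))


if __name__ == "__main__":
    # Day of month, abbreviated month name, then hours:minutes:seconds.
    print(format_timestamp(0))
```

Note that the default format carries no year, and that %b depends on the active locale.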

Internal Options

AnalysisCfg (string, default "${CfgDir}/analysis.dat"): Configuration file defining types of analysis which can be toggled on-the-fly.
BinDir (string, default "${BroBase}/bin"): Directory for executable files.
BroBase (string, default empty): Base path of cluster installation on all nodes.
Capstats (string, default empty): Path to capstats binary; empty if not available.
CfgDir (string, default "${BroBase}/etc"): Directory for configuration files.
ClusterCfg (string, default "${CfgDir}/config.cfg"): Primary cluster configuration file.
DebugLog (string, default "${SpoolDir}/debug.log"): Log file for debugging information.
DistDir (string, default empty): Path to Bro distribution directory.
HaveBroccoli (bool): True if Broccoli interface is available.
HelperDir (string, default "${BroBase}/bin/cluster-helper"): Directory for cluster helper scripts.
LibDir (string, default "${BroBase}/lib"): Directory for library files.
LocalNetsCfg (string, default "${CfgDir}/networks.cfg"): File defining the local networks.
LockFile (string, default "${SpoolDir}/lock"): Lock file preventing concurrent shell operations.
NodeCfg (string, default "${CfgDir}/node.cfg"): Node configuration file.
OS (string, default empty): Name of the operating system as reported by uname.
PolicyDir (string, default "${BroBase}/policy"): Directory for standard policy files.
Scripts-Manager (string, default "cluster-manager"): Bro scripts loaded on the manager, separated by spaces.
Scripts-Proxy (string, default "cluster-proxy"): Bro scripts loaded on the proxies, separated by spaces.
Scripts-Standalone (string, default "standalone"): Bro scripts loaded on a standalone Bro, separated by spaces.
Scripts-Worker (string, default "cluster-worker"): Bro scripts loaded on the workers, separated by spaces.
SpoolDir (string, default "${BroBase}/spool"): Directory for run-time data.
StandAlone (bool, default 0): True if running in stand-alone mode (see elsewhere).
StateFile (string, default "${SpoolDir}/cluster.dat"): File storing the current cluster state.
StatsLog (string, default "${SpoolDir}/stats.log"): Log file for statistics.
Time (string, default empty): Path to time binary.
TmpDir (string, default "${SpoolDir}/tmp"): Directory for temporary data.
TmpExecDir (string, default "${SpoolDir}/tmp"): Directory where binaries are copied before execution.
TraceSummary (string, default empty): Path to trace-summary script; empty if not available.
Version (string, default empty): Version of the cluster shell.
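
Many of the defaults above refer to other options using the ${Name} notation, e.g., TmpDir is defined in terms of SpoolDir, which in turn is defined in terms of BroBase. The following sketch shows how such references can be resolved recursively. It illustrates the idea only and is not the shell's actual implementation; the expand function and the dictionary are assumptions, with /data/cluster taken from the example setup.

```python
import re

# Matches references such as ${BroBase} or ${Scripts-Manager}.
_REFERENCE = re.compile(r"\$\{([A-Za-z0-9_-]+)\}")


def expand(name, options, _seen=()):
    """Return the value of option `name` with all ${...} references resolved."""
    if name in _seen:
        raise ValueError("circular reference involving %s" % name)
    value = options[name]
    # Replace each reference by the (recursively expanded) value it names.
    return _REFERENCE.sub(
        lambda match: expand(match.group(1), options, _seen + (name,)),
        value,
    )


if __name__ == "__main__":
    options = {
        "BroBase": "/data/cluster",
        "SpoolDir": "${BroBase}/spool",
        "TmpDir": "${SpoolDir}/tmp",
    }
    print(expand("TmpDir", options))
```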

Miscellaneous

Mails

The cluster sends four types of mails to the address given in MailTo:

When logs are rotated (default: once a day), a list of all alerts during the last rotation interval is sent. This can be disabled by setting MailAlarms=0.
When the cron command notices that a node has crashed, it restarts it and sends a notification. It may also send a more detailed crash report containing information about the crash.
NOTICES with a notice action of NOTICE_EMAIL; see the Bro documentation for how to configure notice priorities.
If trace-summary is installed, a traffic summary is sent each rotation interval.

Performance Analysis

TODO: cluster cron logs quite a few statistics which can be analyzed/plotted for understanding the cluster's run-time behaviour.

Standalone Mode

The cluster shell does not actually need to control a full cluster but can also be used to operate just a traditional single Bro instance on the local machine. To facilitate this, the shell has a special standalone mode. Using the terminology of the Bro cluster, in this mode the single Bro acts as both manager and worker (and there's obviously no need for a proxy). If the standalone mode turns out to work well, it might eventually replace the BroLite framework which you currently get by doing make install-brolite (see the user manual). BroLite is no longer maintained.

Setting up a standalone installation is pretty easy:

Get the right Bro version and compile Bro, as described above.

Change into the cluster's distribution directory, configure the cluster framework for standalone operation and install it:

> cd aux/cluster
> ./configure --standalone --prefix=/usr/local/bro
> make pybroccoli (*)
> make install

(*) Skip this if you're not set up to compile Python extension modules
    (you can just try it if you aren't sure).

Add <prefix>/bin to your PATH.
Unlike a full cluster installation, the standalone mode automatically installs suitable default configuration files. Initially, you need to make only two changes:
1. edit the line interface in <prefix>/etc/node.cfg to tell Bro which network interface it should monitor; and
2. add a list of your local networks to <prefix>/etc/networks.cfg.
Install the changed configuration:
```
> cluster install
```
Now you can start the standalone Bro:
```
> cluster start
```
A default policy is installed in <prefix>/policy/local/local.bro, which you should edit.
Some tasks need to be run on a regular basis. Insert a line like this into the crontab of the user running Bro:
```
0-59/5 * * * * <prefix>/bin/cluster cron
```
Everything else works just as it does in a "real" cluster setup, including configuration, mail notifications, log archival, and dynamic updates.

Warning

Please note that at the moment the standalone mode is less tested than full cluster setups. The cluster shell is still under development and there might be some quirks (also, but not only, with respect to platform portability). Feel free to send a mail to the Bro mailing list if you encounter any problems.

Questions and Answers

Can I use an NFS-mounted partition as the cluster's base directory to avoid the rsync'ing?

Yes. BroBase can be on an NFS partition. Configure and install the shell as usual with --prefix=<BroBase>. Then add HaveNFS=1 and SpoolDir=<spath> to etc/cluster.cfg, where <spath> is a path on the local disks of the nodes; <spath> will be used for all non-shared data (make sure that the parent directory exists and is writable on all nodes!). Then run install again. Finally, you can remove <BroBase>/spool (or link it to <spath>). In addition, you might want to keep the log files locally on the nodes as well by setting LogDir to a non-NFS directory. Usually only the manager's logs are interesting. (In some later version, the default will likely be to not archive worker/proxy logs at all.)

When I'm using the standalone mode, do I still need to have ssh and rsync installed and configured?

No. The standalone mode performs all operations directly on the local file system.