
/*
 * Copyright (c) 2001 BBNT Solutions LLC
 *
 * Permission to use, copy, modify, and distribute this software
 * and its documentation for any purpose is hereby granted without
 * fee, provided that the above copyright notice and this permission
 * appear in all copies and in supporting documentation, and that the
 * name of BBN Technologies not be used in advertising or publicity
 * pertaining to distribution of the software without specific,
 * written prior permission.  BBN makes no representations about the
 * suitability of this software for any purposes.  It is provided "AS
 * IS" without express or implied warranties.
 */

/*
 * Copyright (c) 2006--2012 International Computer Science Institute
 * 
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation files
 * (the "Software"), to deal in the Software without restriction,
 * including without limitation the rights to use, copy, modify, merge,
 * publish, distribute, sublicense, and/or sell copies of the Software,
 * and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 * 
 * The names and trademarks of copyright holders may not be used in
 * advertising or publicity pertaining to the software without specific
 * prior permission. Title to copyright in this software and any
 * associated documentation will at all times remain with the copyright
 * holders.
 * 
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

stats v2.7
Mark Allman (mallman@icir.org)
May 2012

The aim of this program is to do small amounts of data manipulation
and provide basic statistics for datasets.  For in-depth statistical
analysis you should probably look at another utility.  This program
started as a small pile of perl when I was an undergrad (stats v0),
progressing to a nice little C utility when I was in grad school
(stats v1) and then to a slightly nicer and more featureful pile of
C (if there is such a thing!) after grad school.

If you add something useful to the program, please drop me a note
and I'll add it to the code base.  If you find a bug, I'd appreciate
hearing about that, as well.

The original "stats" program has been augmented over the years by
several additional small tools that are included.  Each tool is
described below.  

Building is as easy as running "make".  The tool requires the GNU
"readline" library -- which I assume is fairly standard these days.
After building, you can relocate the binaries to wherever you like
to keep such things (~/bin, /usr/local/bin, etc.).

I routinely use stats under FreeBSD and OSX.  I have also used it
under NetBSD, Linux and Solaris in the past.  I expect that it
should compile and run fine under any Unix variant.

Changes since version 2.4 are outlined in the ChangeLog file.  In
addition, several possibly useful items that are not compiled by
default are enumerated at the end of the ChangeLog, for the
interested.



STATS
=====================================================================

The tool expects input to be one data point per line.  On the
command line, "-" denotes that the input should come from standard
input.
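
As a rough illustration of the input convention above, here is a
minimal C sketch.  The function names open_input() and parse_point()
are hypothetical and are not taken from the stats source; rejecting
blank or malformed lines is likewise an assumption made for the
example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map "-" to standard input, anything else to
 * a file opened for reading.  Returns NULL if the file cannot be
 * opened. */
FILE *open_input(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "-") == 0)
        return stdin;
    return fopen(name, "r");
}

/* Hypothetical helper: parse one line holding one data point.
 * Returns 1 and stores the value in *out on success, 0 if the line
 * holds no number or has trailing junk (an assumed policy). */
int parse_point(const char *line, double *out)
{
    char *end;
    double v;

    assert(line != NULL && out != NULL);
    v = strtod(line, &end);
    if (end == line)
        return 0;                       /* no number found */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')
        return 0;                       /* trailing junk */
    *out = v;
    return 1;
}
```

A caller would read lines with fgets() from the stream returned by
open_input() and hand each one to parse_point().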

Internally the data is kept in a big array of doubles.  This makes
lots of things easy (such as finding percentiles).  However, the
downside is that the memory usage is somewhat gross and, since I
dynamically increase the size of the array, the program can be slow
at times.  If you know in advance that you have a lot of data, you
can use the "-M X" option to give the tool a hint about the number
of data items.  This boosts performance because realloc() is called
less often.
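
To show why a size hint helps, here is a minimal sketch of a
growable array of doubles.  The dvec type, its function names, the
default capacity of 16 and the doubling policy are all assumptions
made for illustration; the actual stats code may grow its array
differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array; not the actual stats internals. */
typedef struct {
    double *data;
    size_t  len;    /* number of values stored */
    size_t  cap;    /* number of values allocated */
} dvec;

/* Allocate room for `hint` values (the role a "-M X" hint might
 * play); falls back to a small default when hint is 0.
 * Returns 0 on success, -1 on allocation failure. */
int dvec_init(dvec *v, size_t hint)
{
    assert(v != NULL);
    v->len = 0;
    v->cap = hint > 0 ? hint : 16;
    v->data = malloc(v->cap * sizeof *v->data);
    return v->data ? 0 : -1;
}

/* Append one value, doubling the allocation when it is full.
 * If `reallocs` is non-NULL it counts how often realloc() ran. */
int dvec_push(dvec *v, double x, size_t *reallocs)
{
    assert(v != NULL && v->data != NULL);
    if (v->len == v->cap) {
        size_t ncap = v->cap * 2;
        double *p = realloc(v->data, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = ncap;
        if (reallocs != NULL)
            (*reallocs)++;
    }
    v->data[v->len++] = x;
    return 0;
}
```

With an accurate hint the array never has to grow, so realloc() is
not called at all while the data is loaded.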

Stats has three basic modes of operation.  Each will be explained in
turn.  

    (1) Interactive

        Type "help" at the stats command line for a list of
        commands.

    (2) Batch mode (or command line mode)

        Type "stats -h" in the shell for command-line usage
        instructions.

    (3) Script mode

        You can write a script containing any of the commands you
        would give in interactive mode.  Invoking "stats -f
        scriptname" will then execute the stats script.

One of the bugs is that there is very little documentation.
However, after figuring out a few things I am sure you'll agree that
it is a fairly straightforward and flexible utility to use.  I hope
it serves you well.



LESQ
=====================================================================

This tool takes (x,y) points from the input file(s) (formatted as
"x<space>y") and performs Least Squares fitting of the data to
produce the equation for the line that characterizes the data.  The
arguments on the command line are files to process (with "-"
indicating that the tool should read from standard input).
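
For reference, the textbook ordinary least squares fit of a line
y = slope * x + intercept can be sketched as below.  The function
name lesq_fit() and its error convention are hypothetical; this is
the standard closed-form formula, not necessarily the exact code
path the tool uses.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least squares fit of y = slope * x + intercept.
 *
 *   slope     = (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx)
 *   intercept = (Sy - slope*Sx) / n
 *
 * Returns 0 on success, -1 when the fit is undefined (fewer than
 * two points, or all x values equal). */
int lesq_fit(const double *x, const double *y, size_t n,
             double *slope, double *intercept)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, denom;
    size_t i;

    assert(x != NULL && y != NULL);
    assert(slope != NULL && intercept != NULL);
    if (n < 2)
        return -1;
    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)
        return -1;                      /* vertical line: no slope */
    *slope = ((double)n * sxy - sx * sy) / denom;
    *intercept = (sy - *slope * sx) / (double)n;
    return 0;
}
```

For the points (0,1), (1,3), (2,5) and (3,7) this yields a slope of
2 and an intercept of 1.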



DUMPDIFF
=====================================================================

This tool dumps the difference between consecutive samples of the
input files.  The format of the input files is one data point per
line.  The differences are written to standard output by default,
or to a file when using the "-o filename" option.  The input comes
from the filename(s) listed on the command line (or standard input
if "-" is given).

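The computation can be sketched as follows.  The function name
consecutive_diffs() is hypothetical, and the sign convention (each
sample minus the one before it) is an assumption; check the tool's
output if the sign matters to you.

```c
#include <assert.h>
#include <stddef.h>

/* Write in[i] - in[i-1] for i = 1 .. n-1 into out, which must have
 * room for n-1 values.  Returns the number of differences written
 * (0 when there are fewer than two samples). */
size_t consecutive_diffs(const double *in, size_t n, double *out)
{
    size_t i;

    assert(in != NULL && out != NULL);
    if (n < 2)
        return 0;
    for (i = 1; i < n; i++)
        out[i - 1] = in[i] - in[i - 1];
    return n - 1;
}
```

Under that convention the samples 1, 4, 9, 16 produce the
differences 3, 5, 7.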


TRANSFORM
=====================================================================

This tool takes input from files with one data point per line and
then transforms these values by applying a user-specified function.
There are a number of functions provided and more can be added
fairly easily.  As with the other tools, "-" indicates that input
should be taken from standard input.
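
One way such a tool can be organized so that functions are easy to
add is a table mapping names to function pointers.  The sketch
below is only an illustration of that idea: the names tf_square,
tf_half, find_transform() and apply_transform() are hypothetical
and do not reflect the functions or option names that transform
actually provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef double (*transform_fn)(double);

/* Two example transforms, invented for this sketch. */
double tf_square(double x) { return x * x; }
double tf_half(double x)   { return x / 2.0; }

/* Adding a transform means adding one function and one table row. */
static const struct {
    const char  *name;
    transform_fn fn;
} transforms[] = {
    { "square", tf_square },
    { "half",   tf_half   },
};

/* Look a transform up by name; returns NULL if it is unknown. */
transform_fn find_transform(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof transforms / sizeof transforms[0]; i++)
        if (strcmp(transforms[i].name, name) == 0)
            return transforms[i].fn;
    return NULL;
}

/* Apply fn to each of the n values in place. */
void apply_transform(double *v, size_t n, transform_fn fn)
{
    size_t i;

    assert(v != NULL && fn != NULL);
    for (i = 0; i < n; i++)
        v[i] = fn(v[i]);
}
```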
