Changes between 2.8.1 and 2.8.2
=====================================================================

stats:
  + No changes.

dumpdiff:
  + No changes.

lesq:
  + No changes.

transf:
  + The 'transform' tool has been renamed 'transf' because the old
    name conflicts with a GNU tool named 'transform'.
  + Added the 'sum' transformation.


Changes between 2.8 and 2.8.1
=====================================================================

stats:
  + Fixed a stupid assumption about strtok() that caused a segfault.
    Ethan Blanton tripped on this one and fixed it.

dumpdiff:
  + No changes.

lesq:
  + No changes.

transform:
  + No changes.


Changes between 2.7 and 2.8
=====================================================================

stats:
  + Bugfix: only seed the random generator once when sampling data (!).
  + Refactored the sampling routine to do less work on every value run
    through the sampler.
  + Added the "-p" option to stats to crunch data for whatever can be
    had in a single pass, i.e., the data is not stored. This speeds up
    processing large data files for simple statistics (but means some
    functions are unavailable, e.g., percentiles, IQR and CDFs).
  + Added the "-F" option to stats to allow setting the numeric output
    format for basic statistics. E.g., "%0.2f".

dumpdiff:
  + No changes.

lesq:
  + No changes.

transform:
  + No changes.


Changes between 2.6 and 2.7
=====================================================================

stats:
  + Added the "all" argument to the "perc" command.
  + Basic usage message now goes to stderr instead of stdout. (Flagged
    by Ethan Blanton.)
  + Memory is now freed after calculating the MAD. Oops.
  + Made some internal changes to sorting (which is now faster):
      - Data is tracked as it is read in to determine whether it is
        already sorted. If so, we don't sort it.
      - Use qsort(3) instead of the internal version we have been
        using forever.
        (In response to (a) a bug reported by Matt Sargent whereby we
        were apparently blowing up the stack with a massive amount of
        recursion while trying to sort an already sorted sample and
        (b) a suggestion from Ethan Blanton that qsort(3) is likely
        iterative and a better choice.)
  + Removed the old code to transform data within stats. The
    "transform" tool has done this for a long time, and the code has
    been commented out for a long time, so this is just cleanup.

dumpdiff:
  + No changes.

lesq:
  + No changes.

transform:
  + No changes.


Changes between 2.5 and 2.6
=====================================================================

General issues:
  + The "transform" functionality of the "stats" tool has been
    separated out into an independent tool called "transform". Its
    initial functionality is exactly that of the previous transform
    command.
  + Factored some similar bits of various tools into util.c instead of
    repeating the code in each tool.

stats:
  + Added the "p1" and "p5" basic commands to dump the 1st and 5th
    percentiles.
  + The usual basic listing now includes the 1st, 5th, 95th and 99th
    percentiles.
  + Fixed some slight bogosity in the CDF & CCDF generation. (Bug
    reported by Tom Callahan.)

dumpdiff:
  + Added the -d option to dumpdiff to dump the differences as well as
    the raw data.
  + Added the -i option to dumpdiff to set an initial value for the
    difference to dump for the first sample.

lesq:
  + No changes.

transform:
  + New tool that captures the functionality of the old 'transform'
    command in stats.
  + Added the "sample" transform to randomly sample out points at a
    given probability.


Changes between 2.4 and 2.5
=====================================================================

stats:
  * Added an alias for the "standd" basic variable: "stddev" now also
    works.
  * The transform command now takes an optional "-d" argument. When
    given, -d causes the dataset to be displayed immediately after the
    transform.
  * New transforms:
      average:   each value is replaced with the average of previous
                 values in the list.
      ceil:      replace each value with the ceiling of the value.
      div:       divides all values in the dataset by some given
                 value. (This is essentially the same as the "scale"
                 function, using the reciprocal of the given value
                 instead of the value itself. This is just for
                 convenience.)
      ewma:      each value is replaced with the exponentially
                 weighted moving average of the previous values in
                 the dataset.
      floor:     replace each value with the floor of the value.
      mod:       apply modulo to each value.
      relative:  replace each value with its offset from the first
                 value (x_i - x_0).
      slide-win: replace each value with the average of the previous
                 window of values.
    [Some of these transforms are sort of nonsensical in the context
    of stats because stats keeps all data in sorted order (most of the
    time). This limits the use of a number of the transforms.
    (Although, I will note that I add transforms as I need them and so
    I have used these in some context.) The plan is to rip out the
    transform functionality into a separate tool that will not sort
    data. That should increase the usefulness of some of these things.
    (The alternative is to make stats deal with unordered data, but
    that complicates a range of functionality that expects ordered
    data now.)]
  * Added "zero suppression" for frequency tables. This is enabled by
    adding a "z" to the command line option (so, e.g., "-thz") or by
    specifying the "zero" modifier on the command interactively or
    from a script.
  * Added the p25, p75, p95 and p99 basic statistics for dumping
    percentiles.
  * Added the coefficient of variation basic statistic. (Contributed
    by Wes Eddy.)
  * Added the "sort" command (mostly to re-sort after the mod
    transform).
  * Internal changes:
      - Changed the new memory increment from linear to exponential by
        doubling the increment every time new memory is needed.
      - General cleanup, especially of the code that produces basic
        statistics.
  * Bug fixes:
      - Reading data from files in interactive mode is now more robust
        (i.e., the program does not exit() when a non-existent file is
        accessed!). (From Joe Ishac.)

dumpdiff:
  * Added the "-r" option to dumpdiff to produce ratios
    (current_value / previous_value) between subsequent values rather
    than differences.

lesq:
  None.


Changes between 2.4 and 2.4.1
=====================================================================

stats:
  * Fixed a bug in the CDF and CCDF generation. In an effort to save
    data points, stats dumped only one point per x-value. However,
    that distorts plots. What really should happen is that the first
    and last y-value for every x-value should be dumped. That is now
    happening. (Found by Ethan Blanton.)


Changes between 2.3 and 2.4
=====================================================================

stats:
  * Added "incr" and "power" transform functions.
  * Added the "trim" command to winnow datasets.
  * New "basic" statistics:
      - inter-quartile range (IQR)
      - median absolute deviation (MAD)
  * Fixed a bug whereby the program dumped core if the data file given
    was non-existent. (Found by Joseph Ishac.)
  * Added the -C option to take a single command on the command line.
  * Added the -M option (new version) that controls the increment of
    the realloc() function call. That is, how quickly the array grows.
    (This can be quite important for very large datasets.)
  * Added the ability to generate a complementary CDF plot with the
    "ccdf" command or the "-cc" command-line option.
  * Fixed up the command-line processing a bit. It should now be more
    robust to dumb user input (i.e., mine!).
  * The cdf (and ccdf) generation now consolidates data points. So, if
    some X appears in the dataset multiple times, only one value is
    dumped. This saves space in the resulting output file and also
    makes subsequent analysis (e.g., plotting) easier. (It is not
    clear that this is always the right approach and future work may
    include an option to dump all the points.)
  * Added the "-o" option to stats. This option is used to dump all
    basic statistics to a single line (for ease of parsing by other
    scripts/programs).

lesq:
  * Introduced in 2.4.

dumpdiff:
  * Introduced in 2.4.


Non-default Items
=====================================================================

  * Added a new tool to perform the ANOVA test. This is not built into
    the main tool and must be compiled specifically with "make anova"
    at the moment. This may change later after the tool is tested a
    bit more. (This tool is basically not wrung out and you use it at
    your own risk.)
  * A start at incorporating the Anderson-Darling goodness of fit
    test. Currently "IMPL_A2" must be defined at compile time to use
    it. This test works (the code is from RFC 2330), but in running
    the test the array is changed. So, it is not built by default
    until I can figure out some sane way to deal with this oddity.
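
The Anderson-Darling item above notes that running the test changes
the array. One possible way around that, sketched here purely as an
illustration and not as what stats does, is to hand the test a scratch
copy of the data. The names run_on_copy() and destructive_test() are
hypothetical, and destructive_test() is only a stand-in that sorts its
input; it is not the RFC 2330 code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort(3) on doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical stand-in for a test that changes its input in place.
 * It sorts the array and returns the element at index n/2, purely so
 * the sketch has something destructive to call. */
static double destructive_test(double *data, size_t n)
{
    qsort(data, n, sizeof *data, cmp_double);
    return data[n / 2];
}

/* Run a destructive test on a scratch copy so the caller's array is
 * left untouched. Returns 0 on success, -1 if n is 0 or malloc()
 * fails. The test's return value is stored in *result. */
static int run_on_copy(const double *data, size_t n,
                       double (*test)(double *, size_t), double *result)
{
    double *scratch;

    assert(data != NULL && test != NULL && result != NULL);
    if (n == 0)
        return -1;

    scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;

    memcpy(scratch, data, n * sizeof *scratch);
    *result = test(scratch, n);   /* only the copy is modified */
    free(scratch);
    return 0;
}
```

A caller would use run_on_copy(values, count, destructive_test, &out)
and check for a zero return. The cost is a second copy of the dataset
in memory for the duration of the test, which may matter for very
large datasets.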