RAMBIN HOWTO : Programs

4. Programs

The programs in the RAMBIN suite are described one by one, grouped into two categories.

4.1 Tools

`average`

Usage: average [-d] [input-file] [output-file]

Calculates the average of a set of numbers.

average calculates the average of a set of numbers; the -d option gives detailed information including the standard deviation and the accuracy.

`bootstrap`, `bootstrap_lines`

Usage: bootstrap [input-file] [output-file] [output-data-size] [seed]

Bootstraps a set of values or lines.

bootstrap uses a statistical technique known as "bootstrapping" (duh!), a Stanford-based invention, to randomly generate arbitrary sets of data from existing sets. bootstrap operates on a file containing a set of floating-point values and, by default, generates an equal number of values by randomly selecting (with replacement) from the initial set of values. bootstrap_lines operates on any ASCII file and, by default, randomly selects an equal number of lines from the initial set of lines to output.

The number of values or lines output can be changed with the optional output-data-size argument, which can be used to generate smaller or larger sets from the initial set. By default, the random number generator is seeded with the value returned by time(); a seed can also be given with the optional seed argument for reproducible results.
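
The core of bootstrap can be sketched as resampling with replacement from a seeded generator. This is a minimal sketch assuming the standard srand()/rand() pair; the actual program may use a different generator, and the function name and signature are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill out[0..m-1] by drawing from in[0..n-1] uniformly at random with
   replacement. Seeding srand() stands in for the documented seed
   argument; using rand() is an assumption, not necessarily what the
   RAMBIN source does. */
void bootstrap(const double *in, size_t n, double *out, size_t m, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < m; i++)
        out[i] = in[(size_t)rand() % n];  /* modulo bias is negligible for small n */
}
```

Passing (unsigned)time(NULL) as the seed mirrors the documented default; passing a fixed seed gives the same resample on every run, and choosing m different from n gives the smaller or larger sets described above.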

The reason to bootstrap is to assess the statistical validity of the data. When one has a powerful hammer, everything looks like a nail: bootstrapping is a powerful technique and should be used wisely.

`ccoef`, `lsqr`

Usage: ccoef|lsqr [input-file] [output-file]

Calculates the correlation coefficient of pairs of numbers.

ccoef calculates the correlation coefficient of pairs of numbers given in two-column format. When invoked as lsqr, it outputs detailed information, including the least squares fitting slope and its standard deviation.

`clog`, `log10`, `loge`

Usage: clog [value | input-file] [output-file]

Calculates the logarithm.

clog calculates the log of a value or a set of values. When invoked as loge, log base e is calculated. When invoked as log10, log base 10 is calculated.

`compare_numbers`, `gt`, `gt1`, `gte`, `gte1`, `lt`, `lt1`, `lte`, `lte1`, `eq`, `eq1`

Usage: compare_numbers [value | input-file] [compare-value] [output-file] [comparison-operator]

Compare pairs of numbers and return a truth value.

compare_numbers compares pairs of numbers and returns a truth value. Its argument order is non-standard; this was done so that values can be piped freely to the program. Normally, compare_numbers should be invoked as one of the comparison operators:


gt   - true if greater than
gt1  - true if greater than one
gte  - true if greater than or equal to
gte1 - true if greater than or equal to one
lt   - true if less than
lt1  - true if less than one
lte  - true if less than or equal to
lte1 - true if less than or equal to one
eq   - true if equal
eq1  - true if equal to one

Because the number of arguments depends on the operator (for example, gt1 requires only one argument whereas gt requires two), the program needs only one argument to produce a result (a truth value). When an operator that requires two arguments is given only one, the second argument is assumed to be one. Also, when comparison-operator is given explicitly, it takes precedence over argv[0], which is otherwise used to determine the comparison operator.

Examples:


gt 1 2 foo gte

will invoke gt to compare the numbers 1 and 2 and write the result to the file foo. However, since gte is given as the comparison operator, it overrides the argv[0] specification, and the program will check whether 1 is greater than or equal to 2.

lt1

will read numbers from stdin, check whether each is less than one, and output the number of times that comparison was true (1).


eq1 foo 2 bar

will read the numbers from the file foo, compare each of them to one, and write the number of times that comparison was true (1) to the file bar. In this particular case the compare-value argument, 2, is ignored, but it must still be given so that the output goes to the file bar.
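
The operator semantics listed above can be summarised in a small evaluator. This is a hypothetical sketch, not the RAMBIN source: it parses the operator name, treats a trailing "1" as "compare against one", and counts true results the way the lt1 and eq1 examples do. When only one argument is available for a two-argument operator, the caller would pass 1.0 as b, matching the documented default.

```c
#include <stddef.h>
#include <string.h>

/* Evaluate a comparison operator by name. A trailing "1" (gt1, lte1,
   eq1, ...) means "compare against one", so b is ignored.
   Returns 1 (true), 0 (false), or -1 for an unknown operator. */
int compare(const char *op, double a, double b)
{
    size_t len = strlen(op);

    if (len > 0 && op[len - 1] == '1') {
        b = 1.0;
        len--;
    }
    if (len == 2 && strncmp(op, "gt", 2) == 0)  return a > b;
    if (len == 3 && strncmp(op, "gte", 3) == 0) return a >= b;
    if (len == 2 && strncmp(op, "lt", 2) == 0)  return a < b;
    if (len == 3 && strncmp(op, "lte", 3) == 0) return a <= b;
    if (len == 2 && strncmp(op, "eq", 2) == 0)  return a == b;
    return -1;
}

/* Count how many of the n values make the comparison true. */
size_t count_true(const char *op, const double *v, size_t n, double b)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (compare(op, v[i], b) == 1)
            hits++;
    return hits;
}
```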

`compound`

Usage: compound [-a] value percent [iterations] [output-file]

Compounds a value at a given percentage.

compound demonstrates the "magic" of compounding by calculating the result of compounding value at percent over a number of iterations. The -a option adds value to the compounded amount at each iteration. The default number of iterations is set by DEFAULT_NUMBER_OF_ITERATIONS in the source and can be overridden with the iterations argument.
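
A minimal sketch of the compounding loop follows. It assumes percent is given as a percentage (10 meaning 10%) and that -a adds the original value after each compounding step; both are assumptions, and the function name is hypothetical.

```c
/* Compound value at percent per iteration (percent = 10 means 10%).
   With add_each set, mirroring -a, the original value is added again
   after each compounding step; that ordering is an assumption. */
double compound(double value, double percent, int iterations, int add_each)
{
    double total = value;

    for (int i = 0; i < iterations; i++) {
        total *= 1.0 + percent / 100.0;
        if (add_each)
            total += value;
    }
    return total;
}
```

For example, compound(100.0, 10.0, 2, 0) gives 121 (up to rounding), while with add_each set the steps are 100 -> 110 + 100 = 210 -> 231 + 100 = 331.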

Features: if an output file is specified, iterations must also be given explicitly for the program to produce the correct result.

`count`

Usage: count [input-file] [output-file]

Calculates the sum of a set of numbers.

count calculates the sum of a set of numbers (taken from the first column if the input has several columns).

Features: the program tries to be clever by ignoring values that it thinks are not numbers. This works most of the time.
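
One way to "ignore values that are not numbers" is to let strtod() parse the first field and accept it only if the whole field was consumed. This is a sketch of that idea, not count's actual heuristic; the function name is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Sum the first field of every line in fp, skipping lines whose first
   field is not a number (e.g. "foo 3" or "4abc"). Note that strtod()
   also accepts forms such as "inf" and "nan". Lines longer than the
   buffer are split, which this sketch does not handle. */
double sum_first_column(FILE *fp)
{
    char line[1024];
    double sum = 0.0;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *end;
        double x = strtod(line, &end);  /* skips leading whitespace */

        /* accept only if a number was parsed and the field ends there */
        if (end != line && (*end == '\0' || isspace((unsigned char)*end)))
            sum += x;
    }
    return sum;
}
```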

`downcase_filename`, `upcase_filename`

Usage: downcase_filename filename

Changes the case of a filename.

downcase_filename changes the case of a filename to lower case (and its counterpart, upcase_filename does the opposite).

`find_cliques`

Usage: find_cliques [input-file] [output-file]

Finds all the cliques in a graph.

find_cliques reads the size of the graph and then the graph itself, given as a matrix of 1s and 0s (each line corresponds to a vertex), and outputs all the maximal completely connected subgraphs in the graph. An example graph of three vertices would look like:

The program uses the Bron-Kerbosch algorithm to find the cliques.

The reference for this program is: Bron, C. and Kerbosch, J. Algorithm 457: Finding all cliques of an undirected graph. Communications of the ACM, 16(9): 575-577, 1973.

Features: the graph must be undirected, that is, the matrix must be symmetric. The diagonal entries must also be 1.
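
The basic (non-pivoting) form of the Bron-Kerbosch recursion is sketched below. It assumes at most 32 vertices so that vertex sets fit in a uint32_t bitmask, and it takes the 0/1 matrix as an array rather than parsing the input file. The published Algorithm 457 also describes a refined version, so this is not necessarily the variant find_cliques implements; all names here are hypothetical.

```c
#include <stdint.h>

#define MAXV 32
#define MAXCLIQUES 64

/* adj[v] is a bitmask of v's neighbours; the diagonal (self) bit is cleared. */
static uint32_t adj[MAXV];
static int nverts;

uint32_t cliques[MAXCLIQUES];   /* maximal cliques found, as vertex bitmasks */
int ncliques;

/* r: clique being grown, p: candidates that can extend it,
   x: vertices already tried (any clique using them was reported). */
static void bron_kerbosch(uint32_t r, uint32_t p, uint32_t x)
{
    if (p == 0 && x == 0) {              /* nothing can extend r: maximal */
        if (ncliques < MAXCLIQUES)
            cliques[ncliques++] = r;
        return;
    }
    for (int v = 0; v < nverts; v++) {
        uint32_t bit = UINT32_C(1) << v;
        if (!(p & bit))
            continue;
        bron_kerbosch(r | bit, p & adj[v], x & adj[v]);
        p &= ~bit;                        /* move v from the candidates ... */
        x |= bit;                         /* ... to the excluded set        */
    }
}

/* Load an n x n 0/1 matrix (n <= MAXV) and find all maximal cliques. */
int find_cliques(int n, int m[][MAXV])
{
    nverts = n;
    ncliques = 0;
    for (int i = 0; i < n; i++) {
        adj[i] = 0;
        for (int j = 0; j < n; j++)
            if (m[i][j] && i != j)
                adj[i] |= UINT32_C(1) << j;
    }
    bron_kerbosch(0, n == 32 ? UINT32_MAX : (UINT32_C(1) << n) - 1, 0);
    return ncliques;
}
```

For a hypothetical four-vertex graph with edges 0-1, 0-2, 1-2 and 2-3, this finds the two maximal cliques {0, 1, 2} and {2, 3}.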

`find_duplicate_words`

Usage: find_duplicate_words [input-file] [output-file]

Find occurrences of duplicate words in a document.

find_duplicate_words is a simple program to find duplicate words in a document. It reads every word in the document (words are separated by WHITESPACE, as defined in the source file), remembers the previous word, and checks whether the current word is the same as the previous one.
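
The "remember the last word" approach can be sketched as a character loop. The separator set below is an assumption standing in for the WHITESPACE definition in the RAMBIN source, and the function signature is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAXWORD 256
/* Word separators; the real WHITESPACE set lives in the RAMBIN source,
   so this definition is an assumption. */
#define WHITESPACE " \t\n\r\f\v"

/* Print every word that repeats the word just before it (even across a
   line break) and return how many were found. Comparison is
   case-sensitive; words longer than MAXWORD - 1 are truncated. */
int find_duplicate_words(FILE *in, FILE *out)
{
    char last[MAXWORD] = "", word[MAXWORD];
    size_t len = 0;
    int dups = 0, c;

    do {
        c = getc(in);
        if (c == EOF || strchr(WHITESPACE, c) != NULL) {
            if (len > 0) {                 /* a word has just ended */
                word[len] = '\0';
                if (strcmp(word, last) == 0) {
                    fprintf(out, "%s\n", word);
                    dups++;
                }
                strcpy(last, word);
                len = 0;
            }
        } else if (len < MAXWORD - 1) {
            word[len++] = (char)c;
        }
    } while (c != EOF);
    return dups;
}
```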

`histogram`

Usage: histogram input-file start-value increment-value stop-value [output-file]

Makes a histogram from a set of numbers.

histogram uses the input data to create a histogram from start-value to stop-value. The size of each bin is given by increment-value, so the number of bins is the difference between stop-value and start-value divided by increment-value.

`ic`

Usage: ic [input-file] [output-file]

Gives the information content for a set of probabilities.

ic outputs the information content for a set of probabilities using the formula P * log(P). The units are decimal digits ("dits") because log(P) is calculated in base 10.

`max`, `min`

Usage: max|min input-file [output-file]

Find the maximum or minimum value of a set of numbers.

max finds the maximum value of a set of numbers. min finds the minimum value.

Features: can't handle numbers less than MIN_VALUE or greater than MAX_VALUE, as defined in ramp/src/tools/maxmin.c.

`mypaste`

Usage: mypaste input-file1 input-file2 [paste-string] [output-file]

Concatenates the lines in two files sequentially.

mypaste concatenates the lines in two files sequentially. If paste-string is specified, then it is used as a conjunction between the lines.

Features: the number of lines output is always the same as the number of lines in input-file1. If input-file1 has more lines than input-file2, the lines in input-file1 without a corresponding line in input-file2 are output as is. If input-file1 has fewer lines than input-file2, the extra lines in input-file2 are ignored.

This routine is better than the paste commonly found on Unix systems in that it allows an arbitrary paste string, whereas the -d option of paste accepts only single-character delimiters.

`mysplit`

Usage: mysplit input-file number-of-lines [output-file-prefix]

Splits a file into chunks of a specified number of lines.

mysplit splits an input file into N files, where N is the number of lines in the input file divided by the value specified for number-of-lines. If output-file-prefix is not specified, then the value for input-file is used in its place.

This routine is better than the split commonly found on Unix systems in that it names its output files with a numeric index suffix (0..N).

`normalise`

Usage: normalise [-lt] [input-file] [output-file]

Normalises a set of numbers.

normalise divides a set of numbers by the largest value (-l) or by the total (-t) of the numbers.
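
The two modes can be sketched as a single in-place pass. Using the plain maximum (not the largest absolute value) for -l, and refusing to divide by zero, are assumptions; the function name is hypothetical.

```c
#include <stddef.h>

/* Divide each value in place by the largest value (by_largest != 0,
   like -l) or by the total (like -t). Returns 0 on success, -1 if the
   set is empty or the divisor is zero. */
int normalise(double *v, size_t n, int by_largest)
{
    if (n == 0)
        return -1;

    double d = by_largest ? v[0] : 0.0;
    for (size_t i = 0; i < n; i++) {
        if (by_largest) {
            if (v[i] > d)
                d = v[i];
        } else {
            d += v[i];
        }
    }
    if (d == 0.0)
        return -1;
    for (size_t i = 0; i < n; i++)
        v[i] /= d;
    return 0;
}
```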

Features: Can't handle sets larger than MAX_VALUES in ramp/src/tools/normalise.c.

`random`

Usage: random [seed] [output-file]

Generates a random number.

random generates a random number using the random() function. The default seed is the value returned by time(). The seed can also be specified with the optional seed argument.

`rotate_text`

Usage: rotate_text [input-file] [output-file]

Rotates a block of ASCII text.

rotate_text rotates a block of ASCII text by 90 degrees. The input does not have to be an MxN matrix and can contain free-flowing text, but the output is printed as an NxM matrix (i.e., padded with spaces).
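
The padding behaviour can be seen in a small sketch that rotates lines already held in memory. The clockwise direction is an assumption (the text only says "90 degrees"), and the function signature is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Print m lines of text rotated 90 degrees clockwise (the direction is
   an assumption). Short lines are padded with spaces, so an input of
   m lines whose longest line has n characters prints as n lines of m
   characters each. */
void rotate_text(const char *lines[], size_t m, FILE *out)
{
    size_t n = 0;

    for (size_t i = 0; i < m; i++)
        if (strlen(lines[i]) > n)
            n = strlen(lines[i]);

    for (size_t j = 0; j < n; j++) {        /* output row = input column j */
        for (size_t k = 0; k < m; k++) {    /* read input lines bottom-up  */
            const char *src = lines[m - 1 - k];
            fputc(j < strlen(src) ? src[j] : ' ', out);
        }
        fputc('\n', out);
    }
}
```

For the two input lines "ab" and "c", this prints "ca" followed by " b": the missing second character of "c" becomes a space.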

`sizeof`

Usage: sizeof [output-file]

Outputs sizes for various types in C.

sizeof outputs the sizes (in bytes) of various types in C, as reported by the sizeof operator. This is useful for cross-platform development.
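
A sketch of what such a program boils down to is shown below. The selection of types and the output layout are assumptions; the point is that sizeof is a compile-time operator, so the numbers describe the compiler and target platform the program was built for.

```c
#include <stdio.h>

/* Print the size in bytes of common C types. sizeof is evaluated at
   compile time, so the figures describe the compiler and target
   platform this file was built for. */
void print_type_sizes(FILE *out)
{
    fprintf(out, "%-12s%zu\n", "char", sizeof(char));
    fprintf(out, "%-12s%zu\n", "short", sizeof(short));
    fprintf(out, "%-12s%zu\n", "int", sizeof(int));
    fprintf(out, "%-12s%zu\n", "long", sizeof(long));
    fprintf(out, "%-12s%zu\n", "long long", sizeof(long long));
    fprintf(out, "%-12s%zu\n", "float", sizeof(float));
    fprintf(out, "%-12s%zu\n", "double", sizeof(double));
    fprintf(out, "%-12s%zu\n", "long double", sizeof(long double));
    fprintf(out, "%-12s%zu\n", "void *", sizeof(void *));
}
```

Only sizeof(char) is fixed by the C standard (always 1); the other lines are exactly the values that differ between platforms, such as long being 4 bytes on some 64-bit systems and 8 on others.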

4.2 Miscellaneous programs

`text2html`

Usage: text2html [input-file] [output-file]

Convert a text file into an HTML file.

text2html takes a simple text file (divided into paragraphs) and converts it into an HTML file, essentially by adding <p> and </p> tags whenever two consecutive newlines are encountered. The program also prompts for a title and header string, and appends the file .signature in the current directory (if it exists) to the output.

In this preliminary version of the program, there is also an attempt to interpret characters normally used in ASCII text as style changes. For example, when the program encounters a word of the form /foo/, it offers a set of italics options for converting the word into an appropriate style.

The nice thing about this program is that it provides a way to learn how lex and yacc are used to write and parse formal language grammars (i.e., it touches upon topics in compiler theory, deterministic pushdown automata, etc.).

Features: the program is still a bit buggy (it does what I want it to do, so there's little incentive to fix it) and needs lex and yacc installed.
