The programs in the RAMBIN suite are described one by one. The programs are grouped into two categories.
Calculates the average of a set of numbers.
average calculates the average of a set of numbers; the
-d option gives detailed information including the standard
deviation and the accuracy.
Bootstraps a set of values or lines.
bootstrap uses a statistical technique known as
"bootstrapping" (duh!), a Stanford-based invention, to randomly
generate arbitrary sets of data from existing sets.
operates on a file containing a set of floating-point values and
generates an equal number of values (default) by randomly selecting
from the initial set of values.
bootstrap_lines operates on
any ASCII file and randomly selects an equal number of lines (default)
to output from the initial set of lines.
The number of values/lines output can be changed by specifying a
value for the optional argument output-data-size. This can
be used to generate smaller or larger subsets from the initial set.
The default seed used for the random number generator is the value
returned by time(). The seed can also be specified with the
optional seed argument for reproducible results.
The reason to bootstrap is to assess the statistical validity of the data. When one has a powerful hammer, everything looks like a nail. Bootstrapping is a powerful technique, and should be used wisely.
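The resampling described above can be sketched in a few lines of Python. This is only an illustration, not the RAMBIN source: the function name and parameters are hypothetical, and sampling with replacement is assumed because the output may be larger than the input.

```python
import random
import time


def bootstrap(items, output_size=None, seed=None):
    """Resample `items` with replacement (hypothetical helper)."""
    # Assumption: by default, output as many items as were read in.
    if output_size is None:
        output_size = len(items)
    # Assumption: the default seed is the current time, as in the text.
    if seed is None:
        seed = int(time.time())
    rng = random.Random(seed)
    # Each output item is drawn independently from the initial set.
    return [rng.choice(items) for _ in range(output_size)]
```

Passing the same seed twice yields the same resampled set, which is the point of the optional seed argument.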
Calculates the correlation coefficient of pairs of numbers.
ccoef calculates the correlation coefficient of pairs of
numbers specified in two column format. When invoked as
detailed information, including the least-squares fitting slope and
its standard deviation, is output.
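As a rough sketch of the quantities involved, the following Python computes the Pearson correlation coefficient, the least-squares slope, and the standard error of that slope. The function name is hypothetical, and the textbook formulas used here are an assumption; the actual ccoef source may compute these differently.

```python
import math


def correlation(pairs):
    """Return (r, slope, slope_std) for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    # Standard error of the slope; needs more than two points.
    slope_std = None
    if n > 2:
        residual = max(0.0, syy - slope * sxy)
        slope_std = math.sqrt(residual / (n - 2) / sxx)
    return r, slope, slope_std
```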
Calculates the logarithm.
clog calculates the log of a value or a set of
values. When invoked as
loge, log base e is calculated. When invoked as
log10, log base 10 is calculated.
compare_numbers, gt, gt1, gte, gte1, lt, lt1, lte, lte1, eq, eq1
Compare pairs of numbers and return a truth value.
compare_numbers compares pairs of numbers and returns a
truth value. The usage of this program is not standard, but it
is done in the interest of being able to freely pipe values to the program.
compare_numbers should be invoked as one
of the comparison operators:
gt - true if greater than
gt1 - true if greater than one
gte - true if greater than or equal to
gte1 - true if greater than or equal to one
lt - true if less than
lt1 - true if less than one
lte - true if less than or equal to
lte1 - true if less than or equal to one
eq - true if equal
eq1 - true if equal to one
As the number of arguments for the operator depends on the specific
operation (for example,
gt1 requires only one argument, while
gt requires two arguments), the program needs only one
argument for it to produce a result (truth value). However, in the
cases where the operator requires two arguments, and only one is
specified, the second argument is assumed to be one. Also, when
comparison-operator is explicitly specified, it takes
precedence over the
argv variable which is used to
determine the comparison operator (when one is not specified).
gt 1 2 foo gte
would use gt to compare the numbers 1 and 2, and output the
result to the file
foo, but since
gte is given as
the comparison operator, it will override the gt
specification and will see if 1 is greater than or equal to 2.
will take input numbers from stdin, see whether they are less than one, and output the number of times the truth value of that comparison was 1 (true).
eq1 foo 2 bar
will take the input numbers from the file
foo, compare them to
one, and output the number of times the truth value of that comparison
was 1 (true) to the file
bar. In this particular case, the
third argument, 2, is ignored but does need to be specified for the
output to occur in the file bar.
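The operator selection described above (name of the invoked program, overridden by an explicit comparison operator, with a missing second argument assumed to be one) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, file and stdin handling is left out, and the values are passed in as a list.

```python
import operator

# Base operators; a trailing "1" on the name means "compare to one".
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
}


def resolve_operator(program_name, explicit=None):
    """Pick the operator; an explicit operator overrides the program name."""
    name = explicit if explicit is not None else program_name
    compare_to_one = name.endswith("1")
    base = name[:-1] if compare_to_one else name
    return OPERATORS[base], compare_to_one


def count_true(values, program_name, second=None, explicit=None):
    """Count how many values satisfy the chosen comparison."""
    op, compare_to_one = resolve_operator(program_name, explicit)
    # The second operand is one for the "...1" forms, or when omitted.
    if compare_to_one or second is None:
        second = 1.0
    return sum(1 for v in values if op(v, second))
```

For instance, `count_true([1], "gt", second=2, explicit="gte")` mirrors the `gt 1 2 foo gte` example: the explicit gte wins, and 1 is not greater than or equal to 2.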
Compounds a value given a certain percent.
compound demonstrates the "magic" of compounding by
calculating the result after compounding value by the given percent.
The -a option will add
value to the compounded value for each iteration. The
default number of iterations is given by
DEFAULT_NUMBER_OF_ITERATIONS in the source, which can be
modified on the command line.
Features: if an output file is specified, then for the program to produce the correct result, iterations must also be explicitly specified.
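A minimal sketch of the calculation, assuming that each iteration multiplies the running total by (1 + percent/100) and that the -a behaviour adds the original value after each iteration. The function name and the `add` parameter are hypothetical, and the number of iterations is passed explicitly because the value of DEFAULT_NUMBER_OF_ITERATIONS is not given here.

```python
def compound(value, percent, iterations, add=False):
    """Compound `value` by `percent` for `iterations` steps."""
    total = value
    for _ in range(iterations):
        # Grow the running total by the given percentage.
        total *= 1 + percent / 100.0
        # Assumed -a behaviour: add the original value each iteration.
        if add:
            total += value
    return total
```

With these assumptions, 100 compounded at 10 percent for two iterations gives about 121, and about 331 when the value is added each time.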
Calculates the sum of a set of numbers.
count calculates the sum of a set of numbers (in the
first column, if many columns are available).
Features: The program tries to be clever by ignoring values that it thinks are not numbers. This may work for the most part.
Changes the case of a filename.
downcase_filename changes the case of a filename to lower
case (and its counterpart,
upcase_filename, does the opposite).
Finds all the cliques in a graph.
find_cliques reads the size of the graph, the graph
itself specified by a matrix of 1 and 0 (each line corresponds to a
vertex number) and outputs all the maximal completely connected
sub-graphs in the graph. An example graph of three vertices would look
like this:
3
111
110
001
The program uses the Bron and Kerbosch algorithm to do clique finding.
The reference for this program is: Bron C, Kerbosch J. Algorithm 457: Finding all cliques of an undirected graph. Communications of the ACM, 16: 575-577, 1973.
Features: the graph must be undirected. That is, the matrix must be symmetric and the diagonals must be 1.
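For readers unfamiliar with the algorithm, here is a short sketch of the basic Bron and Kerbosch recursion (the version without pivoting) in Python. It is not the find_cliques source; the function name is hypothetical, and the graph is assumed to be given as a list of row strings from a symmetric 0/1 matrix.

```python
def find_cliques(rows):
    """Return all maximal cliques as sorted lists of vertex indices."""
    n = len(rows)
    # Neighbour sets; the diagonal (self-loops) is ignored.
    adj = [
        {j for j in range(n) if j != i and rows[i][j] == "1"}
        for i in range(n)
    ]
    cliques = []

    def expand(current, candidates, excluded):
        # No vertex can extend the clique: it is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            # v is done: move it from the candidates to the excluded set.
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return sorted(cliques)
```

The `excluded` set is what prevents the same clique from being reported more than once.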
Finds occurrences of duplicate words in a document.
find_duplicate_words is a simple program to find the
duplicate words in a document. It basically reads in every word in a
document (separated by
WHITESPACE, as defined in the source
file), stores the last word, and checks to see if the current
word is the same as the last one.
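The last-word comparison can be sketched as follows. The function name is hypothetical, and Python's default whitespace splitting stands in for the WHITESPACE definition in the source file.

```python
def find_duplicate_words(text):
    """Return each word that immediately repeats the previous word."""
    duplicates = []
    last_word = None
    # str.split() with no argument splits on any run of whitespace.
    for word in text.split():
        if word == last_word:
            duplicates.append(word)
        last_word = word
    return duplicates
```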
Makes a histogram from a set of numbers.
histogram uses the input data to create a histogram from
start-value to stop-value. The size of each bin is
determined by increment-value (and the number of bins will
be determined by the difference between start-value and
stop-value divided by increment-value).
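The binning rule can be illustrated with a small Python sketch. The function name is hypothetical, and two details are assumptions rather than documented behaviour: values outside the range from start-value up to (but not including) stop-value are ignored, and the bin count is rounded to the nearest integer.

```python
def histogram(values, start, stop, increment):
    """Count values per bin of width `increment` between start and stop."""
    # Number of bins: (stop - start) / increment, rounded to an integer.
    n_bins = int(round((stop - start) / increment))
    counts = [0] * n_bins
    for v in values:
        # Assumption: out-of-range values are silently skipped.
        if start <= v < stop:
            index = min(int((v - start) / increment), n_bins - 1)
            counts[index] += 1
    return counts
```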
Gives the information content for a set of probabilities.
ic outputs the information content for a set of
probabilities using the formula P * log(P). The units are decimal
digits ("dits") since the log(P) is calculated using log base 10.
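A minimal sketch of the formula, applied term by term. The function name is hypothetical; the sign convention follows the P * log(P) formula exactly as stated above, and treating a probability of zero as contributing zero is an assumption.

```python
import math


def information_content(probabilities):
    """Return P * log10(P) for each probability, in dits."""
    # Assumption: P = 0 contributes 0 (log10(0) is undefined).
    return [p * math.log10(p) if p > 0 else 0.0 for p in probabilities]
```

Whether ic reports the individual terms or their sum is not stated here; the terms can be summed by the caller if a single figure is wanted.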
Finds the maximum or minimum value of a set of numbers.
max finds the maximum value of a set of numbers;
min finds the minimum value.
Features: Can't handle numbers lesser/greater than
Concatenates the lines in two files sequentially.
mypaste concatenates the lines in two files sequentially. If
paste-string is specified, then it is used as a conjunction
between the lines.
Features: The number of lines output is always the same as the number of lines in input-file1. If input-file1 has more lines than input-file2, the lines in input-file1 that have no corresponding line in input-file2 are output as is. If input-file1 has fewer lines than input-file2, the extra lines in input-file2 are not output.
This routine is better than the
paste commonly found in Unix
systems in that it allows you to specify an arbitrary paste string.
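The line-pairing behaviour described under Features can be sketched like this. The function name is hypothetical, the files are represented as lists of lines, and an empty default paste string is an assumption.

```python
def mypaste(lines1, lines2, paste_string=""):
    """Join corresponding lines; output length follows lines1."""
    output = []
    for i, line in enumerate(lines1):
        if i < len(lines2):
            # Both files have this line: join with the paste string.
            output.append(line + paste_string + lines2[i])
        else:
            # No partner line in the second file: output as is.
            output.append(line)
    return output
```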
Splits a file into chunks with a specified line length.
mysplit splits an input file into N files, where
N is the number of lines in the input file divided by the
value specified for number-of-lines. If
output-file-prefix is not specified, then the value for
input-file is used in its place.
This routine is better than the
split commonly found in
Unix systems in that it outputs files using a numeric index suffix.
Normalises a set of numbers.
normalises divides a set of numbers by the largest value
(-l) or the total (
-t) of the numbers.
Features: Can't handle sets larger than
Generates a random number.
random generates a random number using the
random() function. The default seed is the value returned by
time(). The seed can also be specified with the optional
Rotates a block of ASCII text.
rotate_text rotates a block of ASCII text by 90
degrees. While the input does not have to be in the form of an MxN matrix
and can contain free-flowing text, the output is printed as an NxM
matrix (i.e., padded with spaces).
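One way to picture the rotation: pad every line with spaces to the width of the longest line, then read the columns off as rows. The sketch below uses a hypothetical function name and assumes a clockwise rotation; the direction rotate_text actually uses is not stated here.

```python
def rotate_text(lines):
    """Rotate a block of text 90 degrees clockwise (assumed direction)."""
    width = max((len(line) for line in lines), default=0)
    # Pad short lines with spaces so the block is a full rectangle.
    padded = [line.ljust(width) for line in lines]
    # Each output row is one input column, read from bottom to top.
    return [
        "".join(row[col] for row in reversed(padded))
        for col in range(width)
    ]
```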
Outputs sizes for various types in C.
sizeof outputs the sizes (in bytes) for various types in
C (as reported by the
sizeof operator). This is useful for
Converts a text file into an HTML file.
text2html takes a simple text file (formatted in
different paragraphs) and converts it into an HTML file (essentially
adding <p> and </p> tags whenever two consecutive new
lines are encountered). The program also prompts for a title and
header string, and appends the file
.signature in the current
directory (if it exists) to the output.
In this preliminary version of the program, there is also an
attempt to use characters normally used in ASCII text for various
style changes. For example, when the program encounters a word of the
form /foo/, it will give you a set of italics options to use
to convert the word into an appropriate style.
The nice thing about this program is that it provides a way to
yacc work in terms of writing and
parsing formal language grammars (i.e., it touches upon topics in
compiler theory, deterministic pushdown automata, etc.).
Features: program is still a bit buggy (it does what I want it to
do, so there's little incentive to fix it) and needs