Department of Computer Engineering
12 August 2000
Copyright (c) 2000 by Tolga Aydin and H.Altay Guvenir.
Permission is expressly granted to use this code in any non-commercial work, provided that this notice is preserved.
The C program (rsbf.c) implements the RSBF (Regression by Selecting Best Features) method, which approximates a continuous function from a given data set.
rsbf is invoked as:
rsbf <DOMAIN> [-v <V>]
Here <DOMAIN> is the name of the domain, and the -v option sets the level of verbosity.
The rsbf program expects the following files in the current directory:
<DOMAIN>.info : Information file that records types of features
<DOMAIN>.train : Training set (Predicted feature is the last column)
<DOMAIN>.test : Querying set (Predicted feature is the last column)
The output is written to a file:
If the verbosity option is set, the intermediate activities are reported to a file called <DOMAIN>.log.rsbf.
An example run on the buying data is invoked as:
rsbf buying -v 3
The rsbf program reads information about the domain from the <DOMAIN>.info file. This file gives information about the number of features and their types. It must contain a line starting with the keyword Features. For example,
Features l l n l
indicates that there are 4 features; 1st, 2nd and 4th features take on linear values, while the 3rd feature is categorical.
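The Features line described above can be read with a few lines of C. The following is only a minimal sketch, not the code used in rsbf.c: the function name parse_features_line is hypothetical, and it assumes that each feature is described by a single character separated by whitespace, with l meaning linear and n meaning categorical as in the example. Any other type code is rejected here, although the real program may accept more.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not taken from rsbf.c.
 * Parses a line such as "Features l l n l".
 * On success, stores one type code per feature in types[]
 * ('l' = linear, 'n' = categorical) and returns the feature count.
 * Returns -1 if the line does not start with the keyword Features,
 * contains an unknown type code, or has more than max_features codes. */
int parse_features_line(const char *line, char *types, int max_features)
{
    static const char keyword[] = "Features";
    const size_t klen = sizeof keyword - 1;
    const char *p;
    int count = 0;

    assert(line != NULL && types != NULL);

    if (strncmp(line, keyword, klen) != 0)
        return -1;

    for (p = line + klen; *p != '\0'; p++) {
        /* Skip whitespace between type codes. */
        if (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
            continue;
        /* Assumption: only 'l' and 'n' are valid type codes. */
        if ((*p != 'l' && *p != 'n') || count >= max_features)
            return -1;
        types[count++] = *p;
    }
    return count;
}
```

Called on the example line, the function reports 4 features and fills types with l, l, n, l, which matches the description above.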
To measure performance, a shell script, cv, can be used. The cv script is invoked as:
cv <inducer> <DOMAIN> <fold>
An example run for rsbf is:
cv rsbf buying 10
This example runs rsbf on the buying data set using 10-fold cross-validation. To use cv, two files must be present in the directory:
<DOMAIN>.data : Data set
<DOMAIN>.info : Information file.
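To illustrate what a <fold> value of 10 means, the sketch below shows one simple way to assign the instances of a data set to folds. It is an assumption for illustration only: the cv script itself is a shell script, and how it actually splits <DOMAIN>.data (for example, whether it shuffles first) is not described here. The names fold_of and fold_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of cv or rsbf.c.
 * Round-robin assignment: instance i goes to fold i % folds.
 * In each round, one fold is held out as the test set and the
 * remaining folds form the training set. */
size_t fold_of(size_t i, size_t folds)
{
    assert(folds > 0);
    return i % folds;
}

/* Number of instances (out of n) that land in the given fold,
 * i.e. the test-set size when that fold is held out. */
size_t fold_size(size_t n, size_t folds, size_t fold)
{
    assert(folds > 0 && fold < folds);
    /* The first n % folds folds receive one extra instance. */
    return n / folds + (fold < n % folds ? 1 : 0);
}
```

With 25 instances and 10 folds, the first five folds hold 3 instances each and the remaining five hold 2, so every instance is used for testing exactly once.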