PREDICTION USING CLASSIFICATIONS

Classifications can be used to predict class membership for new cases. So in addition to possibly giving you some insight into the structure behind your data, you can now use AutoClass directly to make predictions, and compare AutoClass to other learning systems.

This technique for predicting class probabilities is applicable to all attributes, regardless of data type/sub_type or likelihood model term type.

In the event that the class membership of a data case does not exceed 0.0099999 for any of the "training" classes, the following message will appear in the screen output for each case:

    xref_get_data: case_num xxx => class 9999

Class 9999 members will appear in the "case" and "class" cross-reference reports with a class membership of 1.0.

Cautionary Points:

The usual way of using AutoClass is to put all of your data in a data_file, describe that data with model and header files, and run "autoclass -search". For prediction, instead of one data_file you will have two: a training_data_file and a test_data_file.

It is most important that both databases have the same AutoClass internal representation. If they do not, AutoClass will exit or, in some situations, crash. The prediction mode is designed to steer the user toward meeting this requirement.

Preparation:

Prediction requires a training classification and a test database. The training classification is generated by running "autoclass -search" on the training data_file ("data/soybean/soyc.db2"), for example:

    % autoclass -search data/soybean/soyc.db2 data/soybean/soyc.hd2 data/soybean/soyc.model data/soybean/soyc.s-params

This will produce "soyc.results-bin" and "soyc.search".

Then create a "reports" parameter file, such as "soyc.r-params" (see "reports-c.text"), and run AutoClass in "reports" mode, such as:

    % autoclass -reports data/soybean/soyc.results-bin data/soybean/soyc.search data/soybean/soyc.r-params

This will generate class and case cross-reference files, and an influence values file. The file names are based on the ".r-params" file name:

    data/soybean/soyc.class-text-1
    data/soybean/soyc.case-text-1
    data/soybean/soyc.influ-text-1

These will describe the classes found in the training_data_file.

Now this classification can be used to predict the probabilistic class membership of the test_data_file cases ("data/soybean/soyc-predict.db2") in the training_data_file classes:

    % autoclass -predict data/soybean/soyc-predict.db2 data/soybean/soyc.results-bin data/soybean/soyc.search data/soybean/soyc.r-params

This will generate class and case cross-reference files for the test_data_file cases, predicting their probabilistic class memberships in the training_data_file classes. The file names are based on the ".db2" file name:

    data/soybean/soyc-predict.class-text-1
    data/soybean/soyc-predict.case-text-1

--------------------------------------------------------------------------------
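The file naming rule described in this section (report names follow the ".r-params" file in "reports" mode and the ".db2" file in "predict" mode) can be sketched in a few lines of Python. This is only an illustrative helper, not part of AutoClass: the function names output_names and predict_command are hypothetical, and the trailing "-1" in each extension is assumed to stay as it appears in the soybean example.

```python
from pathlib import PurePosixPath

# Assumption: the trailing "-1" matches the example file names in this section.
REPORT_KINDS = ("class-text-1", "case-text-1", "influ-text-1")
PREDICT_KINDS = ("class-text-1", "case-text-1")


def output_names(base_file, kinds):
    """Swap the final extension of base_file for each report extension."""
    base = PurePosixPath(base_file)
    return [str(base.with_suffix("." + kind)) for kind in kinds]


def predict_command(test_db, results_bin, search_file, r_params):
    """Argument list in the same order as the -predict example above."""
    return ["autoclass", "-predict", test_db, results_bin, search_file, r_params]


if __name__ == "__main__":
    # "reports" mode: names follow the .r-params file.
    for name in output_names("data/soybean/soyc.r-params", REPORT_KINDS):
        print(name)
    # "predict" mode: names follow the test .db2 file.
    for name in output_names("data/soybean/soyc-predict.db2", PREDICT_KINDS):
        print(name)
```

A helper like this could be used to check that the expected report files exist after each run, without hard-coding every path.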