Decision-tree classification for GNU Guile.

Eric Bavier bdfcb149a7 Update TODO. 10 years ago
classification d793976650 decision-trees: Rewrite without goops. 10 years ago
README 79381b1034 data-mining: notes/doc and namespace move 10 years ago
TODO bdfcb149a7 Update TODO. 10 years ago
attributes.scm 371c1a1b7b attributes: Dissector fixes. 10 years ago
dataset.scm 14b546cdff dataset: dataset->delimited and dataset-label-idx. 10 years ago
hash-util.scm 055a74f934 Rewrite dataset without goops. 10 years ago
indexed-matrix.scm 055a74f934 Rewrite dataset without goops. 10 years ago
test-util.scm 055a74f934 Rewrite dataset without goops. 10 years ago
type-conversions.scm b0e266379e Initial commit. 10 years ago
util.scm 3aab6857ba util: Documentation and combinatorial procedures. 10 years ago
wttree.scm a4fb898a91 Fix typo in wttree procedure name. 10 years ago

README

========
Design
========

Use cases (or, the reason I wrote this code instead of using someone
else's code):

1. Construct a dataset programmatically, by adding new entries one by one::

      (define d (make-dataset-from-arff "/path/to/arff/file"))
      => #
      (dataset-derive-new-attribute! d '(/ rows cols))

2. Output classifiers as code, e.g. so that a decision tree can be
   included in C source. This extends the usefulness of the
   data-mining results and also acts as a sort of visual documentation
   of the results.

3. Be able to easily plug into and manipulate how parts of the mining
   process are carried out. This is much easier to do in a language
   like Scheme, but quite a bit harder in something like C/C++, in which
   many of the other decision-tree implementations are written.
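
As an illustration of use case 2, here is a minimal sketch of emitting a
decision tree as C source. The tree representation and the ``tree->c``
procedure are assumptions made for this example only; they are not this
library's data structures or API. A leaf is a string holding the class
label, and an internal node is a list ``(attribute threshold left right)``.

```scheme
;; Hypothetical tree representation (not this library's):
;;   leaf: a string holding the class label
;;   node: (attribute threshold left right), read as
;;         "if attribute <= threshold then left else right"
(define (tree->c tree indent)
  (let ((pad (make-string indent #\space)))
    (if (string? tree)
        ;; Leaf: return the class label.
        (string-append pad "return " tree ";\n")
        ;; Internal node: emit an if/else and recurse into both branches.
        (let ((attr (list-ref tree 0))
              (threshold (list-ref tree 1))
              (left (list-ref tree 2))
              (right (list-ref tree 3)))
          (string-append
           pad "if (" attr " <= " (number->string threshold) ") {\n"
           (tree->c left (+ indent 2))
           pad "} else {\n"
           (tree->c right (+ indent 2))
           pad "}\n")))))

;; Made-up example tree; attribute names and labels are illustrative.
(define example-tree
  '("rows" 10 "CLASS_SMALL" ("cols" 4 "CLASS_TALL" "CLASS_WIDE")))

(display (tree->c example-tree 0))
```

The emitted text is a nested ``if``/``else`` chain that could be pasted
into the body of a C function whose parameters are named after the
attributes.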


===============
TODO/Wishlist
===============

1. Support using the "Gain Ratio" measure for determining a split's goodness.
   This should be as easy as dividing the normal gain measure by::

      -sum(i,1,k){ P(v_i) log2(P(v_i)) }

   where P(v_i) is the fraction of records that were put into split i.
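
A minimal sketch of that computation, assuming the split is described by a
list of record counts, one per partition. The names ``split-information``
and ``gain-ratio`` are hypothetical and do not exist in this code base.
Empty partitions are skipped so that ``log`` is never applied to zero, and
a zero divisor (all records in one partition) yields a ratio of 0 by
convention in this sketch.

```scheme
(use-modules (srfi srfi-1))   ; for fold

;; Base-2 logarithm.
(define (log2 x) (/ (log x) (log 2)))

;; -sum(i,1,k){ P(v_i) log2(P(v_i)) }, where COUNTS holds the number of
;; records placed into each of the k splits.
(define (split-information counts)
  (let ((total (apply + counts)))
    (- (fold (lambda (n acc)
               (if (zero? n)
                   acc                      ; empty split contributes nothing
                   (let ((p (/ n total)))   ; P(v_i)
                     (+ acc (* p (log2 p))))))
             0
             counts))))

;; Divide an already computed information gain by the split information.
(define (gain-ratio gain counts)
  (let ((si (split-information counts)))
    (if (zero? si)
        0
        (/ gain si))))
```

For example, ``(gain-ratio 0.5 '(5 5))`` divides the gain by the split
information of an even two-way split.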