bavier/guile-data-mining: Decision-tree classification for GNU Guile.


		
========
 Design
========

Use cases (or, the reason I wrote this code instead of using someone
else's code):

1. Construct a dataset programmatically, by adding new entries one by one::

  (define d (make-dataset-from-arff "/path/to/arff/file"))
  => #
  (dataset-derive-new-attribute! d '(/ rows cols))

2. Output classifiers as code, e.g. so that a decision tree can be
   included in C source.  This extends the usefulness of the
   data-mining results and also acts as a sort of visual documentation
   of the results.

3. Be able to easily plug into and manipulate how parts of the mining
   process are carried out.  This is much easier to do in a language
   like Scheme, but quite a bit harder in something like C/C++, in which
   many of the other decision-tree codes are written.
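
As a rough illustration of use case 2, the sketch below turns a small
decision tree into nested C ``if``/``else`` statements.  The tree shape
(a string for a leaf, a four-element list for an internal node) and the
procedure name ``tree->c`` are assumptions made for this example only;
they are not this library's actual representation or API.

```scheme
(use-modules (ice-9 format)
             (ice-9 match))

;; Hypothetical tree shape, for illustration only:
;;   leaf:     a string naming the class to return
;;   internal: (attribute threshold left right), where records with
;;             attribute <= threshold follow LEFT, all others RIGHT
(define (tree->c tree indent)
  "Return a string of C code for TREE, indented by INDENT spaces."
  (let ((pad (make-string indent #\space)))
    (match tree
      ((? string? label)
       (format #f "~areturn ~a;~%" pad label))
      ((attribute threshold left right)
       (string-append
        (format #f "~aif (~a <= ~a) {~%" pad attribute threshold)
        (tree->c left (+ indent 2))
        (format #f "~a} else {~%" pad)
        (tree->c right (+ indent 2))
        (format #f "~a}~%" pad))))))
```

For example, ``(display (tree->c '(rows 10 "SMALL" (cols 4 "TALL" "WIDE")) 0))``
prints a nested conditional that could be pasted into the body of a C
function, given that ``rows``, ``cols`` and the class names are defined
there.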

===============
 TODO/Wishlist
===============

1. Support using the "Gain Ratio" measure for determining a split's goodness.
   This should be as easy as dividing the normal gain measure by::

     -sum(i,1,k){ P(v_i) log2(P(v_i)) }

   where P(v_i) is the fraction of records that were put into split i.
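
   A minimal sketch of that division, assuming the ordinary gain has
   already been computed and that each split is summarized by its record
   count.  The names ``split-information`` and ``gain-ratio`` are
   hypothetical and not part of the existing code; returning 0 when the
   divisor is zero (all records in one split) is likewise a choice made
   for this sketch.

```scheme
(use-modules (srfi srfi-1))

(define (log2 x)
  (/ (log x) (log 2)))

;; Split information: -sum(i,1,k){ P(v_i) log2(P(v_i)) }, where COUNTS
;; lists the number of records placed in each of the k splits.
;; Empty splits contribute nothing to the sum.
(define (split-information counts)
  (let ((total (apply + counts)))
    (- (fold (lambda (n acc)
               (if (zero? n)
                   acc
                   (let ((p (/ n total)))
                     (+ acc (* p (log2 p))))))
             0
             counts))))

;; Gain ratio: the ordinary gain divided by the split information.
;; Assumption: a split information of zero yields a ratio of 0 rather
;; than a division-by-zero error.
(define (gain-ratio gain counts)
  (let ((si (split-information counts)))
    (if (zero? si)
        0
        (/ gain si))))
```

   For an even two-way split such as ``(split-information '(5 5))`` the
   divisor is about 1.0, so the gain ratio equals the ordinary gain;
   splits with many small partitions have a larger divisor and are
   penalized accordingly.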