- ========
- Design
- ========
- Use cases (or, the reason I wrote this code instead of using someone
- else's code):
- 1. Construct a dataset programmatically, by adding new entries one by one::
- (define d (make-dataset-from-arff "/path/to/arff/file"))
- => #<dataset>
- (dataset-derive-new-attribute! d '(/ rows cols))
- 2. Output classifiers as code, e.g. so that a decision tree can be
- included in C source. This extends the usefulness of the
- data-mining results and also acts as a sort of visual documentation
- of the results.
- 3. Be able to easily plug into and manipulate how parts of the mining
- process are carried out. This is much easier to do in a language
- like Scheme, but quite a bit harder in something like C/C++, in which
- many of the other decision-tree packages are written.
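As a rough illustration of use case 2, the sketch below prints a small decision tree as nested C ``if``/``else`` statements. The tree representation and the names ``indent`` and ``emit-c`` are assumptions made up for this example; they are not this library's actual data structures or API.

```scheme
;; Hypothetical tree representation (an assumption, not the library's own):
;;   leaf: a string naming the class, e.g. "small"
;;   node: (attribute threshold left-subtree right-subtree)
;; A node means "if attribute <= threshold take left, otherwise right".

;; Two spaces of indentation per nesting level.
(define (indent depth) (make-string (* 2 depth) #\space))

;; Write TREE to the current output port as nested C if/else statements.
(define (emit-c tree depth)
  (if (string? tree)
      ;; Leaf: return the class label as a C string literal.
      (begin
        (display (indent depth))
        (display "return \"")
        (display tree)
        (display "\";")
        (newline))
      ;; Internal node: test the attribute, then recurse into both branches.
      (let ((attr (list-ref tree 0))
            (threshold (list-ref tree 1))
            (left (list-ref tree 2))
            (right (list-ref tree 3)))
        (display (indent depth))
        (display "if (")
        (display attr)
        (display " <= ")
        (display threshold)
        (display ") {")
        (newline)
        (emit-c left (+ depth 1))
        (display (indent depth))
        (display "} else {")
        (newline)
        (emit-c right (+ depth 1))
        (display (indent depth))
        (display "}")
        (newline))))

;; Example: a two-level tree over the hypothetical attributes rows and cols.
(emit-c '(rows 10 "small" (cols 4 "narrow" "wide")) 0)
```

The emitted text is only the body of a C function. Wrapping it in a function signature and declaring the attribute variables is left to the caller.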
- ===============
- TODO/Wishlist
- ===============
- 1. Support using the "Gain Ratio" measure for determining a split's goodness.
- This should be as easy as dividing the normal gain measure by::
- -sum(i,1,k){ P(v_i) log2(P(v_i)) }
- where P(v_i) is the fraction of records that were put into split i.
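A minimal sketch of that division, assuming the ordinary gain has already been computed and that the number of records in each split is available as a list. The names ``log2``, ``split-info`` and ``gain-ratio`` are hypothetical and are not part of the existing code.

```scheme
;; Hypothetical helpers; none of these names come from the library itself.

;; Logarithm base 2, built from the natural logarithm.
(define (log2 x) (/ (log x) (log 2)))

;; Split information: -sum(i,1,k){ P(v_i) log2(P(v_i)) }, where P(v_i) is
;; the fraction of records put into split i. COUNTS is a list holding the
;; number of records in each split. Empty splits are skipped, since
;; p * log2(p) tends to 0 as p tends to 0.
(define (split-info counts)
  (let ((total (apply + counts)))
    (let loop ((cs counts) (acc 0))
      (cond ((null? cs) (- acc))
            ((zero? (car cs)) (loop (cdr cs) acc))
            (else
             (let ((p (/ (car cs) total)))
               (loop (cdr cs) (+ acc (* p (log2 p))))))))))

;; Gain ratio: the normal gain divided by the split information.
;; When all records land in a single split the split information is 0;
;; returning 0 in that case is a convention assumed here.
(define (gain-ratio gain counts)
  (let ((si (split-info counts)))
    (if (zero? si)
        0
        (/ gain si))))
```

For an even two-way split such as ``(split-info '(5 5))`` the split information is 1 bit, so the gain ratio equals the gain. Splits into many small partitions have larger split information and are penalized accordingly.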