========
 Design
========

Use cases (or, the reason I wrote this code instead of using someone
else's code):

1. Construct a dataset programmatically, by adding new entries one by one::

     (define d (make-dataset-from-arff "/path/to/arff/file"))
     => #<dataset>
     (dataset-derive-new-attribute! d '(/ rows cols))

2. Output classifiers as code, e.g. so that a decision tree can be
   included in C source. This extends the usefulness of the
   data-mining results and also acts as a sort of visual documentation
   of the results.

3. Be able to easily plug into and manipulate how parts of the mining
   process are carried out. This is much easier to do in a language
   like Scheme, but quite a bit harder in something like C/C++, which
   many of the other decision-tree implementations are written in.
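
Use case 2 can be illustrated with a minimal sketch. It assumes a made-up
tree shape (a leaf is a class-label string; an inner node is a list of the
form (attribute threshold left right)) and a hypothetical procedure named
tree->c. Neither is part of this code's actual API::

    ;; Hypothetical sketch, NOT this project's API.
    ;; Tree shape assumed here:
    ;;   leaf:  a class label, as a string
    ;;   node:  (attribute threshold left-subtree right-subtree)
    ;;          where attribute is a symbol and threshold a number

    ;; Return the tree as a string of nested C if/else statements,
    ;; indented two spaces per nesting level.
    (define (tree->c tree depth)
      (let ((pad (make-string (* 2 depth) #\space)))
        (if (string? tree)
            ;; Leaf: return the class label.
            (string-append pad "return \"" tree "\";\n")
            ;; Inner node: test attribute <= threshold, then recurse.
            (let ((attr (symbol->string (car tree)))
                  (thr  (number->string (cadr tree))))
              (string-append
               pad "if (" attr " <= " thr ") {\n"
               (tree->c (caddr tree) (+ depth 1))
               pad "} else {\n"
               (tree->c (cadddr tree) (+ depth 1))
               pad "}\n")))))

    ;; Example call:
    ;; (display (tree->c '(rows 10 "small" (cols 3 "narrow" "wide")) 0))

Returning a string instead of writing to a port keeps the sketch easy to
check at the REPL; a real generator would more likely write straight to a
file.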

===============
 TODO/Wishlist
===============

1. Support using the "Gain Ratio" measure for determining a split's goodness.
   This should be as easy as dividing the normal gain measure by::

     -sum(i,1,k){ P(v_i) log2(P(v_i)) }

   where P(v_i) is the fraction of records that were put into split i.
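
As a rough sketch of the gain-ratio item above, the divisor can be computed
from the number of records that land in each split. The procedure names
split-info and gain-ratio are hypothetical, the gain itself is assumed to be
computed elsewhere, and returning 0 when the divisor is zero is a choice made
for this sketch only::

    ;; Hypothetical sketch, NOT procedures defined by this project.

    (define (log2 x) (/ (log x) (log 2)))

    ;; counts: list of record counts, one per split.
    ;; Returns -sum(i,1,k){ P(v_i) log2(P(v_i)) }, with P(v_i) = count_i / total.
    (define (split-info counts)
      (let ((total (apply + counts)))
        (let loop ((cs counts) (acc 0))
          (cond ((null? cs) acc)
                ;; An empty split contributes nothing (0 * log2(0) taken as 0).
                ((zero? (car cs)) (loop (cdr cs) acc))
                (else
                 (let ((p (/ (car cs) total)))
                   (loop (cdr cs) (- acc (* p (log2 p))))))))))

    ;; gain: the normal gain measure for the split.
    (define (gain-ratio gain counts)
      (let ((si (split-info counts)))
        (if (zero? si)
            0               ; every record fell into one split: ratio undefined
            (/ gain si))))

For an even two-way split such as (split-info '(5 5)), the divisor is 1 up
to floating-point rounding, so the gain ratio equals the plain gain. More
uneven or more numerous splits change the divisor, which is the point of the
measure.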