To do

Documentation [0/2]

  • [ ] I should write docstrings for the remaining procedures.
  • [ ] Update the readme file.

Testing [0/1]

  • [ ] I should check whether there are more functions or procedures that need unit tests.

Multiprocessing [0/4]

  • [ ] Decide which tool or library to use for parallelizing parts of the algorithm. Here are some choices:
    • [ ] futures (apparently GNU Guile futures are not the same as Racket futures)
      • [ ] Do Guile futures have restrictions similar to those of Racket futures?
        • [ ] For example, can they only be used for side-effect-free function calls?
    • [ ] parallel forms (building on futures)
    • [ ] fibers library
  • [ ] Add an abstraction layer for running something in parallel.
    • Perhaps 2 interfaces are a good idea:
      1. do a task in n processes (basically a process pool)
      2. do a task in another process

This way, no matter which Scheme is used, the multiprocessing can be ported by changing only the code behind the interface, and the algorithm code does not need to care.

  • [ ] Implement parallel evaluation using a chosen tool or library.
  • [ ] Consider implementing with an additional tool or library.
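
The two interfaces proposed above could be sketched roughly as follows. This is only a sketch under assumptions: the names run-in-parallel, run-in-background and await-result are hypothetical, and the backing implementation uses n-par-map from (ice-9 threads) and make-future / touch from (ice-9 futures), which run on threads rather than on separate OS processes. Whether that is good enough, or whether fibers or real processes are needed, is exactly the open decision above.

```scheme
(use-modules (ice-9 threads)   ; n-par-map
             (ice-9 futures))  ; make-future, touch

;; Interface 1 (hypothetical name): apply PROC to every element of ITEMS,
;; using at most N workers. Results come back in the order of ITEMS.
(define (run-in-parallel n proc items)
  (n-par-map n proc items))

;; Interface 2 (hypothetical name): start evaluating THUNK in the
;; background and return a handle for the pending result.
(define (run-in-background thunk)
  (make-future thunk))

;; Block until the background task behind HANDLE has finished and
;; return its result.
(define (await-result handle)
  (touch handle))

;; Usage: the algorithm code only sees the three procedures above.
(define squares
  (run-in-parallel 2 (lambda (x) (* x x)) '(1 2 3 4)))

(define sum
  (await-result (run-in-background (lambda () (+ 1 2)))))
```

Porting to another Scheme, or switching to another library, would then mean rewriting only the bodies of these three procedures.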

Abstraction layers [0/1]

  • [ ] I should check whether the code still uses anything that I would like to abstract from.
    • For example, list primitives such as car, cdr, and null? instead of dataset-empty? and similar procedures.
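
A minimal sketch of what such a layer might look like, assuming a dataset is currently represented as a plain list of rows. Only dataset-empty? is named in this todo; dataset-first, dataset-rest and dataset-length are hypothetical names used for illustration.

```scheme
;; Assumption: a dataset is a list of rows. Only these definitions know
;; about that representation.
(define (dataset-empty? dataset) (null? dataset))
(define (dataset-first dataset) (car dataset))   ; hypothetical name
(define (dataset-rest dataset) (cdr dataset))    ; hypothetical name

;; Algorithm code is written against the abstraction only, so the
;; representation can change later (for example to vectors) without
;; touching it.
(define (dataset-length dataset)                 ; hypothetical name
  (let loop ((remaining dataset) (count 0))
    (if (dataset-empty? remaining)
        count
        (loop (dataset-rest remaining) (+ count 1)))))
```

Searching the code for remaining direct uses of car, cdr and null? on datasets would then show where the abstraction is still bypassed.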

Logging [0/1]

  • [ ] Logging should not appear when undesired.
    • For example, debug logs should not appear when running the tests.
    • Maybe there is a good logging module for Guile / Scheme? Maybe even an SRFI?
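
Until a suitable module is found, one possible stopgap is a log level held in a parameter, so that tests can silence debug output with parameterize. This is a sketch, not an existing library: current-log-level, log-levels, level-rank and log-message are all hypothetical names, and messages are written to the current error port.

```scheme
;; Hypothetical minimal logger. The default level 'debug shows everything.
(define current-log-level (make-parameter 'debug))

;; Levels ordered by severity; 'off is above every real level and is only
;; meant as a threshold, so setting it suppresses all messages.
(define log-levels
  '((debug . 0) (info . 1) (warning . 2) (error . 3) (off . 4)))

(define (level-rank level)
  (assq-ref log-levels level))

;; Print PARTS on one line if LEVEL is at or above the current threshold.
;; LEVEL must be one of the symbols in log-levels.
(define (log-message level . parts)
  (when (>= (level-rank level) (level-rank (current-log-level)))
    (let ((port (current-error-port)))
      (display level port)
      (display ": " port)
      (for-each (lambda (part) (display part port)) parts)
      (newline port))))
```

A test runner could then wrap its work in (parameterize ((current-log-level 'off)) ...) to keep debug logs out of the test output, while normal runs keep the default.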

Old todo (possibly obsolete, need to check) [0/4]

  • [ ] Remove data from non-leaf nodes by using struct setters.
    • This might save some memory.
    • However, it might be computationally expensive if done in a purely functional way.
    • Each level of the tree holds at most one copy of the dataset, split across its nodes. So the memory cost of keeping copies is at most (reached for a perfectly balanced binary tree) the depth of the tree times the memory used by the whole dataset.
  • [ ] Find any remaining randomness (if there is any) that is not yet determined by random-seed keyword arguments.
  • [ ] Prediction
    • Return not only the predicted label, but also how sure we are about the prediction (percentage of data points in the leaf node that have the predicted label).
    • A decision tree predicts the label for a data point by following the split features and values in the tree. How is this "How sure is the model about the prediction?" calculated? Does it even exist for decision trees? I should check scikit-learn and see what they offer.
    • One way to return how sure the model is about the prediction is to create a "Prediction" struct, which contains the prediction and the number expressing how sure the model is, and to return that struct.
    • Another way is to use multiple return values.
  • [ ] Pruning [0/1]
    • [ ] Offer a pruning step that removes all splits with less improvement in cost than x?
      • Argument against this: This can already be achieved with early stopping parameters. Why would anyone not use early stopping parameters, but then do this in a pruning step?
        • Perhaps the person doing the pruning is not the same person who fitted the model.
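
For the Prediction item above, both return styles can be sketched side by side. For reference, scikit-learn's DecisionTreeClassifier offers predict_proba, which reports the fraction of training samples of each class in the leaf, so the measure does exist for decision trees. The sketch below assumes the labels of the data points in a leaf are available as a non-empty list. The <prediction> record and the procedures count-label, majority-label, predict-from-leaf and predict-from-leaf/values are hypothetical names, not existing code.

```scheme
(use-modules (srfi srfi-1)   ; count, delete-duplicates
             (srfi srfi-9))  ; define-record-type

;; Hypothetical "Prediction" struct: the label plus how sure the model is.
(define-record-type <prediction>
  (make-prediction label confidence)
  prediction?
  (label prediction-label)
  (confidence prediction-confidence))

;; Number of elements of LABELS that are equal to LABEL.
(define (count-label label labels)
  (count (lambda (other) (equal? other label)) labels))

;; Most frequent label in LABELS; ties go to the label seen first.
(define (majority-label labels)
  (let loop ((candidates (delete-duplicates labels))
             (best #f)
             (best-count 0))
    (if (null? candidates)
        best
        (let ((n (count-label (car candidates) labels)))
          (if (> n best-count)
              (loop (cdr candidates) (car candidates) n)
              (loop (cdr candidates) best best-count))))))

;; Variant 1: return a struct. The confidence is the exact fraction of
;; data points in the leaf that carry the predicted label.
;; Assumption: LEAF-LABELS is a non-empty list.
(define (predict-from-leaf leaf-labels)
  (let ((label (majority-label leaf-labels)))
    (make-prediction label
                     (/ (count-label label leaf-labels)
                        (length leaf-labels)))))

;; Variant 2: return the same information as multiple values.
(define (predict-from-leaf/values leaf-labels)
  (let ((prediction (predict-from-leaf leaf-labels)))
    (values (prediction-label prediction)
            (prediction-confidence prediction))))
```

With multiple values, a caller would use call-with-values or receive to get at both results. The struct is easier to pass around and to extend later, for example with the full label distribution.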

Testing [0/2]

  • [ ] Make a list of procedures that are not tested yet.
  • [ ] Decide which of those procedures really need tests.
    • [ ] Candidates: anything that is not predefined in Scheme / GNU Guile.