This repository is intended to hold all of my machine learning algorithm implementations written in GNU Guile or other Scheme dialects.
You can run the tests by running the script run-tests.bash in the scripts/ directory as follows:
# from the root directory of this project:
bash scripts/run-tests.bash
This example is outdated: it was written for the older Racket code and has not yet been ported to GNU Guile.
(define shuffled-dataset (shuffle dataset))
(define small-dataset
  (data-range shuffled-dataset
              0
              ;; take only a fifth of the data to make this example run faster
              (exact-floor (/ (dataset-length shuffled-dataset)
                              5))))
;; be sure to collect all garbage, apparently this should be called thrice
(collect-garbage)
(collect-garbage)
(collect-garbage)
;; requires a ~time~ macro
(time
 ;; ~for/list~ -- a Racketism, needs to be rewritten
 (for/list ([i (in-range 1)])
   (mean
    (evaluate-algorithm #:dataset (shuffle dataset)
                        #:n-folds 10
                        #:feature-column-indices (list 0 1 2 3)
                        #:label-column-index 4
                        #:max-depth 5
                        #:min-data-points 24
                        #:min-data-points-ratio 0.02
                        #:min-impurity-split (expt 10 -7)
                        #:stop-at-no-impurity-improvement #t
                        #:random-seed 0))))
;; be sure to collect all garbage, apparently this should be called thrice
(collect-garbage)
(collect-garbage)
(collect-garbage)
(time
 ;; ~for/list~ -- a Racketism, needs to be rewritten
 (for/list ([i (in-range 1)])
   ;; run with the whole dataset as an example, no random seed
   (define tree (fit #:train-data dataset
                     #:feature-column-indices (list 0 1 2 3)
                     #:label-column-index 4
                     #:max-depth 5
                     #:min-data-points 12
                     #:min-data-points-ratio 0.02
                     #:min-impurity-split (expt 10 -7)
                     #:stop-at-no-impurity-improvement #t))
   'done))
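The comments in the example point out two Racket-specific pieces, the time form and for/list, that would need replacing in a Guile port. The following is a minimal sketch of how that might look in GNU Guile, not the repository's actual port. The time macro here is hand-rolled for illustration, and run-once is a hypothetical placeholder standing in for the real work (such as a call to the evaluation procedure, whose Guile signature is not assumed here).

```scheme
;; Sketch only: a hand-rolled `time` macro for illustration.
;; It evaluates the body, prints the elapsed wall-clock time in seconds,
;; and returns the value of the last body expression.
(define-syntax-rule (time body ...)
  (let* ((start (get-internal-real-time))
         (result (begin body ...))
         (end (get-internal-real-time)))
    (format #t "real time: ~a s~%"
            (exact->inexact (/ (- end start)
                               internal-time-units-per-second)))
    result))

;; Guile's `(gc)` plays the role of Racket's `(collect-garbage)`.
(gc)

;; `run-once` is a hypothetical placeholder for the real work.
(define (run-once i)
  (* i i))

;; `(map proc (iota n))` collects one result per index 0 .. n-1,
;; much like Racket's `(for/list ([i (in-range n)]) ...)`.
(define results
  (time (map run-once (iota 3))))
```

Using map over iota keeps the shape of the original loop, so the body of each for/list could be moved into a lambda with few other changes.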