Random forest using MATLAB

Matlab /Source Code 2017. 3. 12. 21:01

https://www.kaggle.com/c/titanic/discussion/6288

Hello,

Here's some MATLAB code to download the data and try random forests with k-fold cross-validation. Extensive tests on the number of trees and mtry suggest that the default parameters are fine and that the model is robust to changes in these hyperparameters (including the k for k-fold).
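The repeated k-fold estimate described above can be sketched roughly as follows. This is not the code from the attached files: the data is synthetic and a nearest-centroid classifier stands in for the forest, so that the sketch runs without the mex files. The variable names (`acc`, `mAcc`, `sAcc`) and the values of `n`, `k` and `nRep` are assumptions chosen for illustration.

```matlab
% Sketch: repeated k-fold cross-validation, reporting mean and SD of accuracy.
% Synthetic data and a nearest-centroid stand-in keep this self-contained;
% the forest's train/predict calls would replace the marked lines.
rng(1);                                        % seed MATLAB's generator
n = 200; k = 5; nRep = 30;                     % assumed sizes for the sketch
X = [randn(n/2, 2) + 1; randn(n/2, 2) - 1];    % two Gaussian blobs (made-up data)
Y = [ones(n/2, 1); 2 * ones(n/2, 1)];          % class labels 1 and 2

acc = zeros(nRep, 1);
for r = 1:nRep
    perm = randperm(n);                        % fresh random split per repetition
    fold = zeros(n, 1);
    fold(perm) = mod(0:n-1, k) + 1;            % assign each row to one of k folds
    nCorrect = 0;
    for f = 1:k
        te = (fold == f);                      % held-out rows
        tr = ~te;                              % training rows
        % --- stand-in model: class centroids (replace with the forest) ---
        c1 = mean(X(tr & Y == 1, :), 1);
        c2 = mean(X(tr & Y == 2, :), 1);
        d1 = sum(bsxfun(@minus, X(te, :), c1).^2, 2);
        d2 = sum(bsxfun(@minus, X(te, :), c2).^2, 2);
        pred = 1 + (d2 < d1);                  % 2 where centroid 2 is closer
        nCorrect = nCorrect + sum(pred == Y(te));
    end
    acc(r) = nCorrect / n;                     % accuracy over all held-out rows
end
mAcc = mean(acc);
sAcc = std(acc);
fprintf('CV accuracy: %.3f +/- %.3f (mean - 3 SD = %.3f)\n', ...
        mAcc, sAcc, mAcc - 3 * sAcc);
```

The SD here only measures how much the estimate moves when the folds are reshuffled on the same training data. It does not account for differences between the training and test samples, so a test score can fall outside this band without any bug in the code.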

Two questions please:

1) Do you see why the performance is so different between the train and test sets? K-fold cross-validation suggests the model should score 0.82, but on the test set it scores 0.74, which is below 0.82 minus 3 SD (computed over 30 random repetitions). Does this sound plausible, or would you guess it indicates a bug somewhere?

2) I'm trying to seed the random number generator to make my predictions deterministic, but something isn't working. MATLAB's random generator seems to behave fine (namely, A=rand(3,2) produces the very same numbers again and again), but overall the model still produces random predictions. I suspect the mex files don't rely on the random generator set in MATLAB. Do you see a way to deal with this issue?
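The check mentioned in the second question can be written as a small sketch. It assumes a MATLAB version that provides `rng`, and the seed 42 is arbitrary. `predictOnce` is a hypothetical stand-in for one train-and-predict run; here it only draws from MATLAB's own generator, so it is reproducible by construction.

```matlab
% Sketch: verify that reseeding makes MATLAB-side random draws reproducible.
rng(42);                    % fix the seed of MATLAB's global generator
A1 = rand(3, 2);
rng(42);                    % reset to the same seed
A2 = rand(3, 2);
same = isequal(A1, A2);     % true when both draws are identical
fprintf('rand reproducible after reseeding: %d\n', same);

% Same idea applied to a whole prediction run.
% predictOnce is a hypothetical stand-in, not a function from the post.
predictOnce = @() double(rand(10, 1) > 0.5);
rng(42); p1 = predictOnce();
rng(42); p2 = predictOnce();
deterministic = isequal(p1, p2);
fprintf('predictions reproducible: %d\n', deterministic);
```

If the same comparison is run with the real training and prediction calls in place of `predictOnce` and `deterministic` comes out false, that would be consistent with the suspicion that the compiled code draws its random numbers from somewhere other than MATLAB's generator.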

Regards,

PS:

The train.csv and test.csv files are assumed to be in a folder "data", with all *.m files in its parent folder. In addition to the first three attached files (get_titanic... and pred.m), one needs to get the mex and m files available through:

https://code.google.com/p/randomforest-matlab/downloads/list

(if your system matches mine, you can probably just use the attached files)
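A quick way to confirm the layout described above before running anything is sketched below. It assumes the current folder is the parent of "data" and only checks for the two CSV files and two of the function names from the attachment list. The `missing` variable is introduced here for illustration.

```matlab
% Sketch: check the assumed folder layout (data/*.csv plus *.m/mex on the path).
dataFiles = {fullfile('data', 'train.csv'), fullfile('data', 'test.csv')};
funcNames = {'classRF_train', 'classRF_predict'};
missing = {};
for i = 1:numel(dataFiles)
    if exist(dataFiles{i}, 'file') ~= 2        % 2 means "file found"
        missing{end+1} = dataFiles{i};
    end
end
for i = 1:numel(funcNames)
    if isempty(which(funcNames{i}))            % empty when not on the path
        missing{end+1} = funcNames{i};
    end
end
if isempty(missing)
    disp('All expected files and functions were found.');
else
    fprintf('Missing: %s\n', missing{:});
end
```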

get_titanic_test.m (2.39 KB)

get_titanic_train.m (2.39 KB)

mexClassRF_predict.mexw64 (26 KB)

mexClassRF_train.mexw64 (45 KB)

mexRF_predict.mexw64 (11 KB)

mexRF_train.mexw64 (33.5 KB)

classRF_predict.m (2.12 KB)

classRF_train.m (14.48 KB)

regRF_predict.m (986 B)

regRF_train.m (12.56 KB)

License: Attribution, NonCommercial, ShareAlike
