Description
Describe the bug
When calling predict, I obtain the error:
Error in factor(predictions, labels = labels) :
invalid 'labels'; length n should be 1 or k
where n > k every time (n and k stand in for whichever integers appear in the message; the constant is that n exceeds k).
To Reproduce
I believe the error occurs when predictions contains no predictions for a class that exists in the training data, because the way the factor is built then fails. A minimal reproducible example showing the flaw in how predictions are assigned class labels:
x <- rep(letters[1:5], 3) # x has only 5 unique elements
factor(x, labels=LETTERS[1:10]) # note that there are more labels than unique elements of x
Error in factor(x, labels = LETTERS[1:10]) :
invalid 'labels'; length 10 should be 1 or 5
I noticed this bug with a training set in which one class was extremely sparsely represented (30 samples out of 10,000). My guess is that this class is simply never predicted, which is what triggers the error.
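To make the suspected failure mode concrete, here is a minimal sketch of the sparse-class scenario. The names labels and predictions mirror the error message, but their contents are hypothetical stand-ins (integer class indices and made-up class names); the package's actual internals are not shown in this report.

```r
# Hypothetical stand-ins: 'labels' are the classes seen in training,
# 'predictions' are integer class indices returned by a model.
labels <- c("common_a", "common_b", "rare")
predictions <- rep(1:2, times = 10)  # class 3 ("rare") is never predicted

# factor() infers its levels from the values present (here 1 and 2),
# so three labels cannot be matched to two levels and an error is raised.
result <- tryCatch(
  factor(predictions, labels = labels),
  error = function(e) conditionMessage(e)
)
print(result)
```

If this assumption about the inputs is right, the error depends only on which classes happen to appear in the predictions, not on the model itself.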
Expected behavior
The predictions are returned.
Desktop (please complete the following information):
- OS: Ubuntu 18.04
- Language: R
- Version 2.0.4
Additional context
It would appear this issue can be fixed by simply specifying the levels explicitly:
x <- rep(letters[1:5], 3) # x has only 5 unique elements
factor(x, levels=letters[1:10], labels=LETTERS[1:10]) # levels may outnumber the unique elements of x; unused levels stay empty
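Applied to predictions, the same idea might look like the sketch below. It assumes predictions are integer class indices from 1 to the number of training classes, which is a guess about the package's internals rather than something confirmed here; labels and predictions are hypothetical example values. Note that the levels must match the values actually present in the data, otherwise every element becomes NA.

```r
# Hypothetical stand-ins, as in the failing case: class 3 is never predicted.
labels <- c("common_a", "common_b", "rare")
predictions <- rep(1:2, times = 10)

# Declare every possible class index as a level, then attach the labels.
# The number of labels now matches the number of levels, whether or not
# each class actually occurs in 'predictions'.
predicted <- factor(predictions, levels = seq_along(labels), labels = labels)

levels(predicted)  # all three classes are retained
table(predicted)   # the unpredicted class has a count of 0
```

Keeping the unused level also means downstream steps such as confusion matrices see the full set of classes.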