Issue with Predict() when number of unique predicted labels is less than number of possible labels #341

@ebridge2

Describe the bug
When calling predict, I obtain the error:

 Error in factor(predictions, labels = labels) : 
  invalid 'labels'; length n should be 1 or k

where n and k are integers with n > k every time: n is the number of labels supplied and k is the number of distinct values among the predictions.

To Reproduce

I believe the error occurs when predictions contains no predictions for at least one class that exists in the training data; the way the factor is constructed then fails. A minimal reproducible example of the flaw in how the predictions are assigned class labels:

x <- rep(letters[1:5], 3)  # x has only 5 unique elements
factor(x, labels=LETTERS[1:10])  # note that there are more labels than unique elements of x

Error in factor(x, labels = LETTERS[1:10]) : 
  invalid 'labels'; length 10 should be 1 or 5
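
To make the mechanism explicit, here is a minimal sketch in base R. The variable names `inferred_levels` and `msg` are illustrative only and are not taken from the package. When no `levels` argument is given, `factor()` derives the levels from the sorted unique values of its input, and `labels` must then have length 1 or the same length as those levels.

```r
x <- rep(letters[1:5], 3)   # only 5 distinct values
labels <- LETTERS[1:10]     # 10 candidate labels

# Without `levels`, factor() infers the levels from the unique values of x.
inferred_levels <- levels(factor(x))
length(inferred_levels)     # number of distinct values in x
length(labels)              # number of labels supplied

# Because the two lengths differ, the call fails; capture the message
# instead of stopping the script.
msg <- tryCatch(
  factor(x, labels = labels),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The same mismatch would arise in any prediction step where fewer classes are predicted than were present in training.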

I noticed this bug with a training set in which a single class was extremely sparse (30 samples of 10,000). My guess is that this class is simply never predicted, and that is what triggers the error.

Expected behavior
The predictions are returned.

Desktop:

  • OS: Ubuntu 18.04
  • Language: R
  • Version: 2.0.4

Additional context
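It would appear this issue can be fixed by passing the full set of possible classes through levels, so that classes absent from the input are still valid:

x <- rep(letters[1:5], 3)  # x has only 5 unique elements
factor(x, levels=letters[1:10], labels=LETTERS[1:10])  # levels may include values that never occur in x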
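
Applied to a prediction step, the idea might look like the sketch below. This assumes that the training class labels are available as a character vector (here called `labels`, mirroring the name in the error message) and that the raw predictions are either class names or integer class indices. Both are assumptions about the package internals, which are not shown in this issue.

```r
# Hypothetical training classes; "d" is rare and never predicted below.
labels <- c("a", "b", "c", "d")

# Case 1: predictions are class names.
predictions <- c("a", "c", "a", "b")
pred_factor <- factor(predictions, levels = labels)

# Case 2: predictions are integer indices into `labels`.
pred_idx <- c(1L, 3L, 1L, 2L)
pred_factor_idx <- factor(pred_idx, levels = seq_along(labels), labels = labels)

# Both factors keep all training classes as levels, with a zero count
# for the class that was never predicted.
print(table(pred_factor))
print(table(pred_factor_idx))
```

Declaring `levels` explicitly also keeps the factor levels identical across calls, regardless of which classes happen to be predicted.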
