Conversation
selmanozleyen
left a comment
There was a problem hiding this comment.
Hi thanks for the PR!
After removing the print statements and besides my comments I will still give an appoval to speed up the development on this project.
| category: str | ||
| axis: Literal[0, 1, "obs", "var"] = "obs" | ||
| preserve_categories: bool = True | ||
| preserve_categories: Optional[list[str]] = None |
There was a problem hiding this comment.
I like this idea but I think it might be better if we do it like this
preserve_categories: Optional[list[str]] = []
So empty list by default which would mean no preservation. And if it is None that would mean all categories. So then it would make sense to change the variable name to preserved_categories. Also to make the typing more modern finally it would look like this
preserved_categories: Sequence[str] | None = tuple()here default is tuple instead because its not good to have default arguments as mutuable objects. Could you also update docstring with that?
There was a problem hiding this comment.
Thanks for the review.
If we change preserved_categories to a tuple then how are we to distinguish between the "obs" and "var" variables?
There was a problem hiding this comment.
But we specify if its a var or obs in axis right?
There was a problem hiding this comment.
Yes true. I am now questioning whether this even works for .var. In the Ann2DataByCategory object the ToCategoryIterator is by default initialised with obs. Also in thetest_to_category_iterator() there is nothing tested for "var". Am I overlooking where the ToCategoryIteratorwould be initialised withaxis = var`?
Added possibility to add:
varvariables in addition toobs(to_category_iterator)one_hot_encodingfor features to perform mapping back to feature names