-
Notifications
You must be signed in to change notification settings - Fork 176
chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
#3266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
IntoSchema typing, allow Sequence[tuple[str, IntoDType]IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]]
| # TODO @dangotbanned: fix this? | ||
| # Constructor allows tuples, but we don't support that *everywhere* yet | ||
| IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema" | ||
| IntoSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dangotbanned I hope you are proud 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳
But now to ruin your day slightly ...
We got our typing for Schema.__init__ from polars, but even this is too strict.
Here's what typeshed has for dict, which we support (except **kwargs)
Luckily we don't need to use any overloads, but we should have this in IntoSchema somewhere:
Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting timing though, I just pushed a test where I passed a non-Mapping object to nw.Schema 😂
narwhals/tests/plan/schema_test.py
Lines 18 to 20 in 1579b32
| # NOTE: Would type-check if `Schema.__init__` didn't make liskov unhappy | |
| assert schema == nw.Schema(frozen_schema) # type: ignore[arg-type] | |
| assert mapping == dict(frozen_schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dangotbanned
Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
SupportsKeysAndGetItemwhat the hell is this 😂 how's Mapping not enough anymore?- Regarding
Iterable, I always have a hard time being comfortable with it, since it can be infinite (and lazy).- How does infinite converts to a dict?
- Lazy makes it a bit more of a headache when we want to check it's content (e.g. first element is a string or a tuple), but that's alright
- This is the reason why I went with
Sequence, although polarsSchema.__init__signature is with theIterable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We got our typing for Schema.init from polars, but even this is too strict.
can we resolve that later and just get the type[Dtype] / Dtype part sorted out in this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SupportsKeysAndGetItemwhat the hell is this 😂 how's Mapping not enough anymore?
Lol, it is an object that supports those methods ofc 😉
just import it in a TYPE_CHECKING block from here
from _typeshed import SupportsKeysAndGetItemThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding Iterable, I always have a hard time being comfortable with it, since it can be infinite
If someone passes an infinite stream to an initializer - they need to learn the lesson that the call will never complete
(and lazy).
This is a desirable property for the caller and I think if we want to validate things, then that is our responsibility
Reducing scope
How about aligning just Schema.__init__ with dict.__init__?
We are only checking for None there, so the issues you have seem to not apply there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dangotbanned attempt in 9ce61fe, which now fails for mypy when the dict has mix of initialized and un-initialized dtypes, which was the initial purpose of the PR 🤔
Edit: By changing Iterable into Sequence there are no issue (see 0e0d7a8). I am personally not interested in diving deeper than this.
Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a if TYPE_CHEKING: block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a
if TYPE_CHEKING:block
tubular uses a runtime type checker (beartype).
I ran into a similar problem the last time I changed the definition of Schema:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483418646), (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483400506)
This is my bad, but we should keep that part and my original TODO as a todo
Lines 307 to 308 in d7bb050
| # TODO @dangotbanned: fix this? | |
| # Constructor allows tuples, but we don't support that *everywhere* yet |
These things are related, but the timing of #3252 complicated both this PR and my TODO 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Randomly removing some initialization to test the typechecking
narwhals/typing.py
Outdated
| IntoNullableSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType | None] | Sequence[tuple[str, IntoDType | None]]" | ||
| ) | ||
| """Schema specification with possible None values.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Please let's come up with a better description
| return (value,) * n_match if isinstance(value, bool) else tuple(value) | ||
|
|
||
|
|
||
| class NullableSchema(OrderedDict[str, "IntoDType | None"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason for this class is mostly two folded:
- Use as a utility to convert a
Sequence[tuple[str, DType {|None}]]into a mapping. Hence make it easy to use the same API (key, dtype in obj.items()). - Easily set a flag to know if any value passed is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name Nullable* is making me think this is related to #3176 (comment) again 🫣
I'm not opposed to having classes to make our internal API cleaner btw 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name Nullable* is making me think this is related to #3176 (comment) again 🫣
Not sure I see how it's related. Passing None seems more like to be a free card, "live and let live" kind of behavior.
I'm not opposed to having classes to make our internal API cleaner btw 👍
Any preference in having it prefixed by _?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any preference in having it prefixed by
_?
It is already in _utils, so no need for a prefix 🙂
The current _ names are an artifact from when we had everything in utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I see how it's related. Passing
Noneseems more like to be a free card, "live and let live" kind of behavior.
I'll try to give a more complete explanation later, but I was hinting at Nullable, Null, None being a bit overloaded.
Linking to (#3176 (comment)) was supposed to show that I made this mistake already 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I see how it's related. Passing
Noneseems more like to be a free card, "live and let live" kind of behavior.I'll try to give a more complete explanation later, but I was hinting at
Nullable,Null,Nonebeing a bit overloaded.
I guess now still counts as later 😂
Somebody wrote docs about the term nullable, and this usage is not to do with that
|
Regarding CI failures: the only one related to this PR seem to be tubular 👀 |
IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]]IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
narwhals/typing.py
Outdated
| "IntoFrameT", | ||
| "IntoLazyFrame", | ||
| "IntoLazyFrameT", | ||
| "IntoNullableSchema", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure about making this a public type?
Is it possible to only focus on Dtype -> intodtype changes here without introducing new public types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I honestly think there might be something wrong in the PR to begin with. If I refocus on the basic changes to fix #3257, then passing a dictionary with mixed initialized and un-initialized dtypes does still complain (pyright is ok with it, mypy is not)
|
I will keep this PR open for a bit longer - if anyone is interested to bringing to the finish line, I am happy to hand it over 🙈 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR (and reminding me about it 🙏) @FBruzzesi
I think there are parts of this that are ready to go!
But the combination of (#3262 and #3252) + my suggestion took the PR in a direction that you couldn't have anticipated when opening #3262
It seems like I made a suggestion on that PR that may have been lost (#3262 (comment)):
@FBruzzesi I think if you were looking to clean things up following #3252, then #2486 may be what we're missing?
This issue is still open and we've both mentioned it recently, maybe there's something to that?
| return Version.MAIN.dataframe.from_dict(data, schema, backend=backend) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change and think we can split those like it into a PR that makes a dent into #2786
| f" nw.from_numpy(arr, backend='pyarrow').lazy('{implementation}')" | ||
| ) | ||
| raise ValueError(msg) | ||
| return Version.MAIN.dataframe.from_numpy(data, schema, backend=backend) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same comment as from_dict)
| # TODO @dangotbanned: fix this? | ||
| # Constructor allows tuples, but we don't support that *everywhere* yet | ||
| IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema" | ||
| IntoSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a
if TYPE_CHEKING:block
tubular uses a runtime type checker (beartype).
I ran into a similar problem the last time I changed the definition of Schema:
| return list(self.keys()) | ||
|
|
||
| def dtypes(self) -> list[DType]: | ||
| def dtypes(self) -> list[IntoDType]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may be difficult to work with.
list[DType] is invariant but names a single concrete type.
OTOH, list[IntoDType] will expand to this:
list[DType | type[NumericType | TemporalType | String | Boolean | Binary | Categorical | Unknown | Object]And the problem there is that type would be rejected by another annotation which used list[DType] 😢
| # TODO @dangotbanned: fix this? | ||
| # Constructor allows tuples, but we don't support that *everywhere* yet | ||
| IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema" | ||
| IntoSchema: TypeAlias = ( | ||
| "Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483418646), (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483400506)
This is my bad, but we should keep that part and my original TODO as a todo
Lines 307 to 308 in d7bb050
| # TODO @dangotbanned: fix this? | |
| # Constructor allows tuples, but we don't support that *everywhere* yet |
These things are related, but the timing of #3252 complicated both this PR and my TODO 😞
| return (value,) * n_match if isinstance(value, bool) else tuple(value) | ||
|
|
||
|
|
||
| class NullableSchema(OrderedDict[str, "IntoDType | None"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I see how it's related. Passing
Noneseems more like to be a free card, "live and let live" kind of behavior.I'll try to give a more complete explanation later, but I was hinting at
Nullable,Null,Nonebeing a bit overloaded.
I guess now still counts as later 😂
Somebody wrote docs about the term nullable, and this usage is not to do with that
What type of PR is this? (check all applicable)
Related issues
typing.IntoSchematoo narrow #3257Checklist
If you have comments or can explain your changes, please do so below