Skip to content

Conversation

@FBruzzesi
Copy link
Member

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@FBruzzesi FBruzzesi added enhancement New feature or request typing labels Oct 31, 2025
@FBruzzesi FBruzzesi mentioned this pull request Oct 31, 2025
10 tasks
@FBruzzesi FBruzzesi changed the title chore: Fix IntoSchema typing, allow Sequence[tuple[str, IntoDType] chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] Oct 31, 2025
Comment on lines -306 to +307
# TODO @dangotbanned: fix this?
# Constructor allows tuples, but we don't support that *everywhere* yet
IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema"
IntoSchema: TypeAlias = (
"Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangotbanned I hope you are proud 😇

Copy link
Member

@dangotbanned dangotbanned Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

But now to ruin your day slightly ...

We got our typing for Schema.__init__ from polars, but even this is too strict.
Here's what typeshed has for dict, which we support (except **kwargs)

https://github.com/python/typeshed/blob/bf7214784877c52638844c065360d4814fae4c65/stdlib/builtins.pyi#L1158-L1187

Luckily we don't need to use any overloads, but we should have this in IntoSchema somewhere:

Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting timing though, I just pushed a test where I passed a non-Mapping object to nw.Schema 😂

# NOTE: Would type-check if `Schema.__init__` didn't make liskov unhappy
assert schema == nw.Schema(frozen_schema) # type: ignore[arg-type]
assert mapping == dict(frozen_schema)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dangotbanned

Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType]
  • SupportsKeysAndGetItem what the hell is this 😂 how's Mapping not enough anymore?
  • Regarding Iterable, I always have a hard time being comfortable with it, since it can be infinite (and lazy).
    • How does infinite converts to a dict?
    • Lazy makes it a bit more of a headache when we want to check it's content (e.g. first element is a string or a tuple), but that's alright
    • This is the reason why I went with Sequence, although polars Schema.__init__ signature is with the Iterable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We got our typing for Schema.init from polars, but even this is too strict.

can we resolve that later and just get the type[Dtype] / Dtype part sorted out in this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SupportsKeysAndGetItem what the hell is this 😂 how's Mapping not enough anymore?

Lol, it is an object that supports those methods ofc 😉

just import it in a TYPE_CHECKING block from here

https://github.com/python/typeshed/blob/be34e9201db75891a67d4d3ce5e5705ee6636f6f/stdlib/_typeshed/__init__.pyi#L161-L164

from _typeshed import SupportsKeysAndGetItem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding Iterable, I always have a hard time being comfortable with it, since it can be infinite

If someone passes an infinite stream to an initializer - they need to learn the lesson that the call will never complete

(and lazy).

This is a desirable property for the caller and I think if we want to validate things, then that is our responsibility

Reducing scope

How about aligning just Schema.__init__ with dict.__init__?

We are only checking for None there, so the issues you have seem to not apply there

Copy link
Member Author

@FBruzzesi FBruzzesi Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dangotbanned attempt in 9ce61fe, which now fails for mypy when the dict has mix of initialized and un-initialized dtypes, which was the initial purpose of the PR 🤔

Edit: By changing Iterable into Sequence there are no issue (see 0e0d7a8). I am personally not interested in diving deeper than this.

Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a if TYPE_CHEKING: block

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a if TYPE_CHEKING: block

tubular uses a runtime type checker (beartype).
I ran into a similar problem the last time I changed the definition of Schema:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483418646), (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483400506)

This is my bad, but we should keep that part and my original TODO as a todo

narwhals/narwhals/typing.py

Lines 307 to 308 in d7bb050

# TODO @dangotbanned: fix this?
# Constructor allows tuples, but we don't support that *everywhere* yet

These things are related, but the timing of #3252 complicated both this PR and my TODO 😞

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Randomly removing some initialization to test the typechecking

IntoNullableSchema: TypeAlias = (
"Mapping[str, IntoDType | None] | Sequence[tuple[str, IntoDType | None]]"
)
"""Schema specification with possible None values."""
Copy link
Member Author

@FBruzzesi FBruzzesi Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Please let's come up with a better description

return (value,) * n_match if isinstance(value, bool) else tuple(value)


class NullableSchema(OrderedDict[str, "IntoDType | None"]):
Copy link
Member Author

@FBruzzesi FBruzzesi Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason for this class is mostly two folded:

  • Use as a utility to convert a Sequence[tuple[str, DType {|None}]] into a mapping. Hence make it easy to use the same API (key, dtype in obj.items()).
  • Easily set a flag to know if any value passed is None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name Nullable* is making me think this is related to #3176 (comment) again 🫣

I'm not opposed to having classes to make our internal API cleaner btw 👍

Copy link
Member Author

@FBruzzesi FBruzzesi Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name Nullable* is making me think this is related to #3176 (comment) again 🫣

Not sure I see how it's related. Passing None seems more like to be a free card, "live and let live" kind of behavior.

I'm not opposed to having classes to make our internal API cleaner btw 👍

Any preference in having it prefixed by _?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any preference in having it prefixed by _?

It is already in _utils, so no need for a prefix 🙂

The current _ names are an artifact from when we had everything in utils.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I see how it's related. Passing None seems more like to be a free card, "live and let live" kind of behavior.

I'll try to give a more complete explanation later, but I was hinting at Nullable, Null, None being a bit overloaded.
Linking to (#3176 (comment)) was supposed to show that I made this mistake already 😂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I see how it's related. Passing None seems more like to be a free card, "live and let live" kind of behavior.

I'll try to give a more complete explanation later, but I was hinting at Nullable, Null, None being a bit overloaded.

I guess now still counts as later 😂

Somebody wrote docs about the term nullable, and this usage is not to do with that

@FBruzzesi
Copy link
Member Author

Regarding CI failures: the only one related to this PR seem to be tubular 👀

@FBruzzesi FBruzzesi changed the title chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] chore: Fix IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType] Nov 1, 2025
@FBruzzesi FBruzzesi changed the title chore: Fix IntoSchema typing, allow to pass Iterable[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType] chore: Fix IntoSchema typing, allow to pass Sequence[tuple[str, IntoDType]] | SupportsKeysAndGetItem[str, IntoDType] Nov 1, 2025
"IntoFrameT",
"IntoLazyFrame",
"IntoLazyFrameT",
"IntoNullableSchema",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure about making this a public type?

Is it possible to only focus on Dtype -> intodtype changes here without introducing new public types?

Copy link
Member Author

@FBruzzesi FBruzzesi Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I honestly think there might be something wrong in the PR to begin with. If I refocus on the basic changes to fix #3257, then passing a dictionary with mixed initialized and un-initialized dtypes does still complain (pyright is ok with it, mypy is not)

@FBruzzesi
Copy link
Member Author

I will keep this PR open for a bit longer - if anyone is interested to bringing to the finish line, I am happy to hand it over 🙈

@FBruzzesi FBruzzesi added the help wanted Extra attention is needed label Jan 7, 2026
Copy link
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR (and reminding me about it 🙏) @FBruzzesi

I think there are parts of this that are ready to go!

But the combination of (#3262 and #3252) + my suggestion took the PR in a direction that you couldn't have anticipated when opening #3262

It seems like I made a suggestion on that PR that may have been lost (#3262 (comment)):

#3262 (comment)

@FBruzzesi I think if you were looking to clean things up following #3252, then #2486 may be what we're missing?

This issue is still open and we've both mentioned it recently, maybe there's something to that?

Comment on lines +269 to 270
return Version.MAIN.dataframe.from_dict(data, schema, backend=backend)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change and think we can split those like it into a PR that makes a dent into #2786

f" nw.from_numpy(arr, backend='pyarrow').lazy('{implementation}')"
)
raise ValueError(msg)
return Version.MAIN.dataframe.from_numpy(data, schema, backend=backend)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same comment as from_dict)

Comment on lines -306 to +307
# TODO @dangotbanned: fix this?
# Constructor allows tuples, but we don't support that *everywhere* yet
IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema"
IntoSchema: TypeAlias = (
"Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last to fix is the forward reference for tubular. Which I am unsure if it's our fault or them not importing something within a if TYPE_CHEKING: block

tubular uses a runtime type checker (beartype).
I ran into a similar problem the last time I changed the definition of Schema:

return list(self.keys())

def dtypes(self) -> list[DType]:
def dtypes(self) -> list[IntoDType]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be difficult to work with.

list[DType] is invariant but names a single concrete type.

OTOH, list[IntoDType] will expand to this:

list[DType | type[NumericType | TemporalType | String | Boolean | Binary | Categorical | Unknown | Object]

And the problem there is that type would be rejected by another annotation which used list[DType] 😢

Comment on lines -306 to +307
# TODO @dangotbanned: fix this?
# Constructor allows tuples, but we don't support that *everywhere* yet
IntoSchema: TypeAlias = "Mapping[str, dtypes.DType] | Schema"
IntoSchema: TypeAlias = (
"Mapping[str, IntoDType] | Sequence[tuple[str, IntoDType]] | Schema"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483418646), (https://github.com/narwhals-dev/narwhals/pull/3266/files#r2483400506)

This is my bad, but we should keep that part and my original TODO as a todo

narwhals/narwhals/typing.py

Lines 307 to 308 in d7bb050

# TODO @dangotbanned: fix this?
# Constructor allows tuples, but we don't support that *everywhere* yet

These things are related, but the timing of #3252 complicated both this PR and my TODO 😞

return (value,) * n_match if isinstance(value, bool) else tuple(value)


class NullableSchema(OrderedDict[str, "IntoDType | None"]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I see how it's related. Passing None seems more like to be a free card, "live and let live" kind of behavior.

I'll try to give a more complete explanation later, but I was hinting at Nullable, Null, None being a bit overloaded.

I guess now still counts as later 😂

Somebody wrote docs about the term nullable, and this usage is not to do with that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request help wanted Extra attention is needed typing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

typing.IntoSchema too narrow

4 participants