INTPYTHON-698 Allow defining embedded model indexes on the top-level model #376

WaVEV · 2025-08-25T02:05:36Z

NOW:
Allow simple expressions (F() references) so that indexes and constraints on embedded model fields can be declared on the top-level model.

class Address(EmbeddedModel):
    unique_together_one = models.CharField(max_length=10)
    unique_together_two = models.CharField(max_length=10)

    class Meta:
        app_label = "schema_"

class Author(EmbeddedModel):
    address = EmbeddedModelField(Address)
    unique_together_three = models.CharField(max_length=10)
    unique_together_four = models.CharField(max_length=10)

    class Meta:
        app_label = "schema_"

class Book(models.Model):
    author = EmbeddedModelField(Author)

    class Meta:
        app_label = "schema_"
        constraints = [
            models.UniqueConstraint(
                F("author__unique_together_three").asc(),
                F("author__unique_together_four").desc(),
                name="unique_together_34",
            ),
            models.UniqueConstraint(
                F("author__address__unique_together_one"),
                F("author__address__unique_together_two").asc(),
                name="unique_together_12",
            ),
        ]

BEFORE:

Indexes on EmbeddedModelField (EMF) are defined using dot notation:

class Address(EmbeddedModel):
    unique_constraint_one = models.CharField(max_length=10)

class Author(EmbeddedModel):
    address = EmbeddedModelField(Address)
    unique_constraint_two = models.CharField(max_length=10)

class Book(models.Model):
    author = EmbeddedModelField(Author)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["author.unique_constraint_two"],
                name="unique_two",
            ),
            models.UniqueConstraint(
                fields=["author.address.unique_constraint_one"],
                name="unique_one",
            ),
        ]

Notes on Django checks

Django enforces models.E016, which requires all UniqueConstraint and Index definitions to reference an existing local field.
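To make the constraint concrete, here is a minimal, Django-free sketch of the kind of comparison such a check performs. The function `non_local_references` and the sample field names are hypothetical; this is a stand-in under the assumption that the check boils down to comparing each referenced name against the model's local field names, not Django's actual implementation.

```python
def non_local_references(local_field_names, referenced_names):
    """Return the referenced names that are not local fields of the model."""
    local = set(local_field_names)
    return [name for name in referenced_names if name not in local]


# Hypothetical model: Book only declares `id` and `author` locally, so a
# dotted path into the embedded model is reported unless that path is also
# materialized as a field on Book.
book_local_fields = ["id", "author"]
print(non_local_references(book_local_fields, ["author.unique_constraint_two"]))
```

This is why the dot-notation approach needs the embedded fields to appear among the top-level model's fields.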

Migration output

The generated migrations expand embedded models into flat fields using dot-notation:

# Generated by Django 5.2.4.dev20250616223441 on 2025-09-02 01:27

import django_mongodb_backend.fields
import polls.models
from django.db import migrations, models


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='Address',
            fields=[
                ('id', django_mongodb_backend.fields.ObjectIdAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('city', models.CharField(max_length=20)),
                ('state', models.CharField(max_length=2)),
                ('zip_code', models.IntegerField()),
                ('uid', models.IntegerField()),
            ],
            options={
                'abstract': False,
            },
        ),
        migrations.CreateModel(
            name='Author',
            fields=[
                ('address.id', django_mongodb_backend.fields.ObjectIdAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=10)),
                ('age', models.IntegerField()),
                ('address', django_mongodb_backend.fields.EmbeddedModelField(embedded_model=polls.models.Address)),
                ('employee_id', models.IntegerField()),
                ('address.city', models.CharField(max_length=20)),
                ('address.state', models.CharField(max_length=2)),
                ('address.zip_code', models.IntegerField()),
                ('address.uid', models.IntegerField()),
            ],
            options={
                'abstract': False,
            },
        ),
        migrations.CreateModel(
            name='Book',
            fields=[
                ('author.address.id', django_mongodb_backend.fields.ObjectIdAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=100)),
                ('author', django_mongodb_backend.fields.EmbeddedModelField(embedded_model=polls.models.Author)),
                ('author.name', models.CharField(max_length=10)),
                ('author.age', models.IntegerField()),
                ('author.address', django_mongodb_backend.fields.EmbeddedModelField(embedded_model=polls.models.Address)),
                ('author.employee_id', models.IntegerField()),
                ('author.address.city', models.CharField(max_length=20)),
                ('author.address.state', models.CharField(max_length=2)),
                ('author.address.zip_code', models.IntegerField()),
                ('author.address.uid', models.IntegerField()),
            ],
            options={
                'indexes': [models.Index(fields=['author.address.zip_code'], name='polls_book_author._adffc9_idx')],
                'constraints': [models.UniqueConstraint(fields=('author.name',), name='unique')],
            },
        ),
    ]

Current issue

Because embedded fields are materialized both inside the main model and within the embedded model itself, columns are projected twice. For example:

Book._meta.local_fields
Out[1]: 
[<django_mongodb_backend.fields.auto.ObjectIdAutoField: author.address.id>,
 <django.db.models.fields.CharField: name>,
 <django_mongodb_backend.fields.embedded_model.EmbeddedModelField: author>,
 <django.db.models.fields.CharField: author.name>,
 <django.db.models.fields.IntegerField: author.age>,
 <django_mongodb_backend.fields.embedded_model.EmbeddedModelField: author.address>,
 <django.db.models.fields.IntegerField: author.employee_id>,
 <django.db.models.fields.CharField: author.address.city>,
 <django.db.models.fields.CharField: author.address.state>,
 <django.db.models.fields.IntegerField: author.address.zip_code>,
 <django.db.models.fields.IntegerField: author.address.uid>]

This duplication means fields can be referenced redundantly. A side effect is that they are still reachable through F("embedded.something"), but projections end up with repeated columns.
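As a rough illustration of how the flat, dotted field list above arises, the following sketch walks a nested description of the models and emits one dotted name per field. The dict-based `schema` representation and the `flatten_fields` helper are assumptions made for this example only; the backend works on real field objects.

```python
def flatten_fields(schema, prefix=""):
    """Yield dotted names for every field, descending into embedded models.

    `schema` maps a field name to None (a plain field) or to a nested dict
    (an embedded model).
    """
    for name, embedded in schema.items():
        dotted = f"{prefix}{name}"
        yield dotted
        if embedded is not None:
            # The embedded model's own fields are yielded again under the
            # parent's prefix, which is where the duplication comes from.
            yield from flatten_fields(embedded, prefix=f"{dotted}.")


address = {"city": None, "state": None, "zip_code": None, "uid": None}
author = {"name": None, "age": None, "address": address, "employee_id": None}
book = {"name": None, "author": author}

print(list(flatten_fields(book)))
```

Every field of `address` exists once in the embedded model's own definition and once more under each parent that embeds it.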

timgraham · 2025-08-25T22:40:41Z

django_mongodb_backend/schema.py

+            # Remove the top level indexes.
+            # TODO: Find a workaround
+            for index in model._meta.indexes:
+                if any(
+                    field_name.startswith(f"{field.column}{LOOKUP_SEP}")
+                    for field_name in index.fields
+                ):
+                    self.remove_index(model, index)
+            for constraint in model._meta.constraints:
+                if any(
+                    field_name.startswith(f"{field.column}{LOOKUP_SEP}")
+                    for field_name in constraint.fields
+                ):
+                    self.get_collection(model._meta.db_table).drop_index(constraint.name)


I would think this is unnecessary since the Index/Constraint would have to be removed from the model's Meta.indexes/constraints before removing the field.

🤔 OK, I will try to move the logic into _remove_model_indexes and _remove_field_unique. The downside is that I have to pass down all the indexes from the parents, concatenating them all the way down the recursion, because the index could have been created anywhere in the hierarchy.

I think it's simpler than you describe. Example:

class Book(models.Model):
    author = EmbeddedModelField(Author)

    class Meta:
        app_label = "schema_"
        indexes = [
            models.Index(fields=["author__indexed_two"]),
            models.Index(fields=["author__address__indexed_one"]),
        ]

If I remove an index, the generated migration operation is:

migrations.RemoveIndex(
    model_name='book',
    name='embed_index_author._f84329_idx',
),

The logic you describe isn't needed.
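For reference, the prefix test used in the diff above can be exercised in isolation. This is a small sketch with plain strings; `indexes_touching_field` and the index names are hypothetical, and `LOOKUP_SEP` is set to the double underscore that Django uses as its lookup separator.

```python
LOOKUP_SEP = "__"  # Django's lookup separator


def indexes_touching_field(column, indexes):
    """Return names of indexes that reference a path below `column`.

    `indexes` maps an index name to the list of field names it covers.
    """
    prefix = f"{column}{LOOKUP_SEP}"
    return [
        name
        for name, fields in indexes.items()
        if any(field_name.startswith(prefix) for field_name in fields)
    ]


indexes = {
    "idx_two": ["author__indexed_two"],
    "idx_name": ["name"],
    "idx_one": ["author__address__indexed_one"],
}
print(indexes_touching_field("author", indexes))
```

Including the separator in the prefix keeps a field such as `authority__x` from matching `author`.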

timgraham · 2025-08-25T22:43:06Z

django_mongodb_backend/lookups.py

+class Options(base.Options):
+    def get_field(self, field_name):
+        if LOOKUP_SEP in field_name:
+            previous = self
+            keys = field_name.split(LOOKUP_SEP)
+            path = []
+            for field in keys:
+                field = base.Options.get_field(previous, field)
+                if isinstance(field, EmbeddedModelField):
+                    previous = field.embedded_model._meta
+                else:
+                    previous = field
+                path.append(field.column)
+            column = ".".join(path)
+            embedded_column = field.clone()
+            embedded_column.column = column
+            return embedded_column
+        return super().get_field(field_name)


Monkey patching at such a low level seems risky, though I haven't thought much about what could go wrong. Instead, I imagined an Index subclass with this sort of logic. Did you consider it?

🤔 Maybe it is feasible if we create something like EMFIndex. I haven't considered it yet. I could take a look.

Hmm, this approach (monkey patching) will not work. I forgot that foreign fields, lookups, and many other things use this method, so it isn't very straightforward. I have to think of another way to solve this.
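The path walk in the get_field diff above can be shown without touching Options at all. The sketch below is only an illustration of that walk on plain dicts; `resolve_column` and the dict-based `schema` are hypothetical and are not part of the backend.

```python
LOOKUP_SEP = "__"  # Django's lookup separator


def resolve_column(schema, field_name):
    """Translate "a__b__c" into the dotted column path "a.b.c".

    `schema` maps a field name to None (a plain field) or to a nested dict
    (an embedded model). Raises KeyError for an unknown field and ValueError
    when the path descends through a field that is not an embedded model.
    """
    current = schema
    path = []
    for key in field_name.split(LOOKUP_SEP):
        if current is None:
            raise ValueError(f"{'.'.join(path)} is not an embedded model")
        current = current[key]  # KeyError for unknown fields
        path.append(key)
    return ".".join(path)


schema = {"name": None, "author": {"name": None, "address": {"zip_code": None}}}
print(resolve_column(schema, "author__address__zip_code"))
```

Keeping the translation in a standalone helper, rather than in a patched get_field, avoids affecting the other callers mentioned above (foreign fields, lookups).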

Jibola

Everything looks good to me!
Only missing component is documentation.

Jibola · 2025-09-30T15:03:21Z

django_mongodb_backend/indexes.py

+        query = Query(model=model, alias_cols=False)
+        compiler = query.get_compiler(connection=schema_editor.connection)
+        for expression in self.expressions:
+            query = Query(model=model, alias_cols=False)


Do we need to reinstantiate this? Does expression.resolve_expression(query) mutate query?

You are right, I moved the query outside the loop, and forgot to remove this one.

timgraham · 2025-10-17T18:46:30Z

Concurrent with the release of this, will we stop supporting unique constraints and indexes declared on embedded models and their fields? Or perhaps we should try to put that through a deprecation process?

aclark4life · 2025-10-17T19:16:52Z

Probably should be deprecated in case anyone is using them.

Jibola · 2025-10-28T17:35:40Z

Concurrent with the release of this, will we stop supporting unique constraints and indexes declared on embedded models and their fields? Or perhaps we should try to put that through a deprecation process?

We should put this in a deprecation process. For the first bump where this change occurs, we'll let them know that the deprecation is coming.
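One way such a notice could be surfaced is with Python's standard warnings machinery. This is a sketch only: the function `warn_if_declared_on_embedded_model`, its parameters, and the message wording are hypothetical, and where the backend would actually call it is not decided in this thread.

```python
import warnings


def warn_if_declared_on_embedded_model(model_name, is_embedded, indexes, constraints):
    """Emit a DeprecationWarning when an embedded model declares indexes."""
    if is_embedded and (indexes or constraints):
        warnings.warn(
            f"{model_name}: declaring indexes or constraints on an embedded "
            "model is deprecated; declare them on the top-level model instead.",
            DeprecationWarning,
            stacklevel=2,
        )


# Hypothetical usage: only the embedded model triggers the warning.
warn_if_declared_on_embedded_model("Author", True, ["indexed_two"], [])
warn_if_declared_on_embedded_model("Book", False, ["indexed_two"], [])
```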

Jibola · 2025-11-11T22:42:48Z

BEFORE:

Indexes on EmbeddedModelField (EMF) are defined using dot notation:

class Address(EmbeddedModel):
    unique_constraint_one = models.CharField(max_length=10)

class Author(EmbeddedModel):
    address = EmbeddedModelField(Address)
    unique_constraint_two = models.CharField(max_length=10)

class Book(models.Model):
    author = EmbeddedModelField(Author)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["author.unique_constraint_two"],
                name="unique_two",
            ),
            models.UniqueConstraint(
                fields=["author.address.unique_constraint_one"],
                name="unique_one",
            ),
        ]

This section needs to be rewritten in the PR description. The constraints/indexes for nested fields need to be defined within the EmbeddedModel. Right now this "before" looks like a better system than the AFTER. 😭

class Address(EmbeddedModel):
    unique_constraint_one = models.CharField(max_length=10)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["author.address.unique_constraint_one"],
                name="unique_one",
            ),
        ]

class Author(EmbeddedModel):
    address = EmbeddedModelField(Address)
    unique_constraint_two = models.CharField(max_length=10)
    
    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["author.unique_constraint_two"],
                name="unique_two",
            ),
        ]

class Book(models.Model):
    author = EmbeddedModelField(Author)

Jibola · 2025-11-11T22:52:59Z

Potentially out of scope for this PR, but could we introduce an indexing-specific expression subclass that just converts dot notation to __ ?

from django.db.models import F

class MongoDBIndexExpression(F):
    def __init__(self, name, *args, **kwargs):
        # Accept "author.address.city" and store it as "author__address__city".
        super().__init__(name.replace(".", "__"), *args, **kwargs)

WaVEV · 2025-11-11T23:06:29Z

This section needs to be rewritten in the PR description. The constraints/indexes for nested fields need to be defined within the EmbeddedModel. Right now this "before" looks like a better system than the AFTER. 😭

Yes, I’ll rewrite it.
I know, it looks better, but it was impossible to implement as it was. I tried using some artificial columns, but that caused failures in migrations and queries. Since Django doesn’t provide an interface similar to EMF, we can’t reference columns that way (not even with __), but we can through F(), which resolves to a column in the end. Another option would be using key_transforms from EMF, but that’s not any better.

timgraham · 2025-12-17T01:46:49Z

django_mongodb_backend/expressions/builtins.py

+def index_expression(self, compiler, connection, as_expr=False):  # noqa: ARG001
+    result = []
+    for expr in self.get_source_expressions():
+        if expr is None:
+            continue
+        for sub_expr in expr.get_source_expressions():
+            try:
+                result.append(sub_expr.as_mql(compiler, connection))
+            except FullResultSet:
+                result.append(Value(True).as_mql(compiler, connection))
+    return result


It seems to me this should be in indexes.py rather than the expressions module?

timgraham · 2025-12-17T01:49:34Z

django_mongodb_backend/schema.py

+    def _unique_supported(
+        self,
+        condition=None,
+        deferrable=None,
+        include=None,
+        expressions=None,
+        nulls_distinct=None,
+    ):
+        return (
+            (not condition or self.connection.features.supports_partial_indexes)
+            and (not deferrable or self.connection.features.supports_deferrable_unique_constraints)
+            and (not include or self.connection.features.supports_covering_indexes)
+            and (
+                not expressions
+                or self.connection.features.supports_expression_indexes
+                or self._check_expression_indexes_applicable(expressions)
+            )
+            and (
+                nulls_distinct is None
+                or self.connection.features.supports_nulls_distinct_unique_constraints
+            )
+        )


This copies much from the base class. It needs a comment explaining why the method is overridden. I guess it's about _check_expression_indexes_applicable.

timgraham · 2025-12-17T01:51:32Z

docs/topics/indexes.rst

+Indexes from Expressions
+========================
+
+Django MongoDB Backend now supports creating indexes from expressions.
+Currently, only ``F()`` expressions are supported, which allows referencing
+fields from the top-level model inside embedded fields.
+
+Example::
+
+    from django.db import models
+    from django.db.models import F
+
+    class Author(models.EmbeddedModel):
+        name = models.CharField()
+
+    class Book(models.Model):
+        author = models.EmbeddedField(Author)
+
+        class Meta:
+            indexes = [
+                models.Index(F("author__name")),
+            ]


Instead of this document, probably "Indexing embedded models" should be part of that topic guide.

timgraham · 2025-12-17T01:52:57Z

docs/releases/5.2.x.rst

+- Added support for creating indexes from expressions.
+  Currently, only ``F()`` expressions are supported to reference top-level
+  model fields inside embedded models.


This should say something like "You can now index embedded model fields by adding an index to the parent model." and link to the new documentation.

timgraham · 2025-12-17T02:07:02Z

tests/schema_/test_embedded_model.py

+        class Book(models.Model):
+            author = EmbeddedModelField(Author)
+
+            class Meta:
+                app_label = "schema_"
+                indexes = [
+                    models.Index(F("author__indexed_two").asc(), name="indexed_two"),
+                    models.Index(F("author__address__indexed_one").asc(), name="indexed_one"),
+                ]
+
+        new_field = EmbeddedModelField(Author)
+        new_field.set_attributes_from_name("author")
+
+        with connection.schema_editor() as editor:
+            # Create the table and add the field.
+            editor.create_model(Book)
+            editor.add_field(Book, new_field)


I think the test you copied this from needs to be fixed, but this logic isn't correct. You started with Book that already has an author field, then added the field again. Anyway, this doesn't simulate how the migration operations will be generated. When you add the author field like this, you'll also have AddIndex operations, so really I think there's nothing additional to test. Unlike the case that you copied this test from, you haven't added any logic to SchemaEditor that requires testing. I'll try to provide some more guidance about this later.

timgraham · 2025-12-17T02:11:33Z

django_mongodb_backend/schema.py

+    def _unique_supported(
+        self,
+        condition=None,
+        deferrable=None,
+        include=None,
+        expressions=None,
+        nulls_distinct=None,
+    ):


I didn't analyze this completely, but it could be problematic to override this without also overriding Index.check. This method will silently ignore unsupported indexes. The system check framework is what gives the user a warning that the index they declared isn't supported.
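The difference between silently skipping and reporting can be sketched as follows. Everything here is hypothetical: `FieldRef` and `FuncCall` are stand-ins for expression objects, and `check_index_expressions` only mimics the idea of a system check returning messages. It assumes, as the docs diff in this PR states, that only simple F() references are supported.

```python
from dataclasses import dataclass


@dataclass
class FieldRef:
    """Hypothetical stand-in for a simple F("a__b") reference."""

    name: str


@dataclass
class FuncCall:
    """Hypothetical stand-in for any non-trivial expression."""

    func: str
    arg: str


def check_index_expressions(index_name, expressions):
    """Return one warning string per expression that cannot be indexed."""
    return [
        f"{index_name}: expression {expr!r} is not supported; "
        "the index would be skipped."
        for expr in expressions
        if not isinstance(expr, FieldRef)
    ]


for message in check_index_expressions(
    "indexed_one", [FieldRef("author__name"), FuncCall("Lower", "author__name")]
):
    print(message)
```

Returning messages instead of a bare boolean gives the user something to act on when an index declaration is ignored.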

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch from 998a080 to 5dd116b Compare August 25, 2025 03:05

timgraham reviewed Aug 25, 2025

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch from 3e722d1 to 2010776 Compare August 25, 2025 23:16

timgraham changed the title ~~Intpython 698 define indexes on embedded models in top level model~~ INTPYTHON-698 Allow defining embedded model indexes on the top-level model Aug 26, 2025

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch 6 times, most recently from 86aed5e to 9636c17 Compare September 6, 2025 03:34

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch from 9636c17 to e1c98f3 Compare September 6, 2025 19:33

WaVEV marked this pull request as ready for review September 6, 2025 19:40

Jibola requested changes Oct 7, 2025

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch from e1c98f3 to eb29e72 Compare October 11, 2025 15:19

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch from eb29e72 to c0707cc Compare October 30, 2025 00:08

WaVEV mentioned this pull request Nov 1, 2025

Add as_expr parameter to OrderBy #438

Merged

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch 3 times, most recently from d40c24a to 9c56be4 Compare November 3, 2025 01:38

WaVEV closed this Nov 3, 2025

WaVEV deleted the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch November 3, 2025 01:49

WaVEV restored the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch November 3, 2025 02:18

WaVEV reopened this Nov 3, 2025

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch 3 times, most recently from 42d9062 to 4ad1c49 Compare November 4, 2025 04:58

Support index definition on Embedded Models in top level model.

59c6b51

WaVEV added 2 commits November 9, 2025 18:04

Add docs.

2d55d9b

Add docs.

d9c04dc

WaVEV force-pushed the INTPYTHON-698-Define-indexes-on-Embedded-Models-in-Top-Level-Model branch from 1c28e45 to d9c04dc Compare November 9, 2025 21:04

Jibola approved these changes Dec 16, 2025

timgraham reviewed Dec 17, 2025

4 participants
