
Change the way we handle series categories in a post-MCA world #652

@JulianKniephoff


As already implied by #648, the way series categories are implemented leads to problems with some changes made during the development of the MCA feature.

Specifically, the problem is that in the MCA-world, annotations refer to their labels by ID instead of having a copy of their title, abbreviation, and color. That leads to deleted labels having to stick around.

Series categories, or more specifically their labels, are managed in a rather roundabout way that involves a lot of deletion and (re-)creation every time they are requested. In combination, these facts lead to a rapidly growing number of duplicate labels in the database, and consequently in the data the tool fetches initially.

I will include my understanding of how series categories are implemented below the fold, as I (re-)develop it; the goal of this issue is then to think up and evaluate alternatives to this approach.


How are series categories implemented?

Throughout this investigation I will be referring to code in the snapshot ef2f54e, which is the current state of the next branch at the time of this writing.

Creation

It probably makes sense to look at how series categories and labels are created first, so we know what we are dealing with in terms of database and application state.

The main entry point for that is postCategoryResponse, which the frontend calls by POST-ing to /video/.../categories. The newly created category is not a series category yet, though; after that, the frontend PUT-s the same category, calling putCategoryResponse, and sets series_category_id to its own id. This, then, is the marker of a series category: a category where series_category_id === id. This call also encodes the series ID into the category. This will be important later on.
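To make the marker concrete, here is a rough sketch of it as a predicate. It is written in Python purely for illustration and is not the tool's actual code; the `Category` class, the `series_extid` field name, and the idea that a local copy stores its master's id in `series_category_id` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    # Hypothetical, stripped-down stand-in for a category record.
    id: int
    name: str
    series_category_id: Optional[int] = None
    series_extid: Optional[str] = None


def is_series_category(category: Category) -> bool:
    """A category is a (master) series category iff it points at itself."""
    return category.series_category_id == category.id


def is_local_copy(category: Category) -> bool:
    """Assumption: a local copy points at a different category as its master."""
    return (
        category.series_category_id is not None
        and category.series_category_id != category.id
    )


# Step 1: the POST creates a plain category.
created = Category(id=7, name="Gestures")
was_series_after_post = is_series_category(created)

# Step 2: the follow-up PUT sets series_category_id to the category's own id
# and records the series on it.
created.series_category_id = created.id
created.series_extid = "some-series"
```

After the second step `is_series_category(created)` holds, while it did not after the first.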

Now, a category is nothing without its labels, and in fact, if everything goes right, you shouldn't be able to create a category without at least one, so if we look at the creation of a category, we also need to consider the creation of a label.

Labels

The labels are created by the frontend by POST-ing to .../category/.../labels for every label created in the modal. This eventually leads to a call to postLabelResponse. Note that before that, though, the labels of the newly requested category are queried twice, once after the initial POST and once after the PUT, only to come back empty both times. This might be important, though, since, as we will see later, querying labels of a series category is kind of the workhorse of this implementation and also the crux of the associated bugs in combination with MCAs.

Anyway, postLabelResponse doesn't really do anything special, meaning you can't really tell the labels of a series category apart from “normal labels;” “series labels” aren't really a thing, yet, they are all just labels. You have to look at the category.

A note about updates

Now, one important caveat is that when an existing category is saved/updated, it is treated slightly differently, both by the frontend and the backend, when it has or gets a series_category_id. The frontend kind of “redirects” certain API requests, and the backend makes its own adaptations to the request. We will look at this in more detail later, since even though this happens once every time during (series) category creation, this should only be relevant when said series_category_id differs from the category's id.

The POV of another video

If we go through the whole feature in a kind of chronological order, the next thing we might want to look at is how we are getting to these categories/labels in another video that belongs to the same series.

The tool loads all the categories and labels it needs in the beginning, starting with the categories:

Categories

The tool calls GET /video/.../categories, passing the series-extid as URL parameter. This call ultimately lands in getCategories, which first gathers all the categories belonging to the video directly. This might contain series categories, if they were created on this video, but assuming, like we do here, that a series category was created on another video and this is the first time we load this video, those won't be in here, yet. TODO Is this right?

Next we look for all categories that belong to this series, and that refer to themselves as the series category; these are the “master categories.” We then compare the categories that belong to the video with all of these, and when we find a “match” (i.e. name, description, settings, and tags are equal), we override all of its properties with those of the master category. This will be important for changing series categories as we'll see later.

Next, we go through the master categories again and see if we find a “local copy” in the video categories already and if not, we create one. The result is that we get a local copy of each master category and we keep these up to date with changes to the master category.
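The two passes described above could be sketched roughly as follows. This is Python for illustration only, not the backend's code: the in-memory `db` list, the field names besides `series_category_id`, the ID generator, and which properties get overridden on a match are all assumptions.

```python
from dataclasses import dataclass, replace
from itertools import count
from typing import List, Optional

_next_id = count(100)  # hypothetical ID source standing in for the database


@dataclass
class Category:
    id: int
    video_id: int
    name: str
    description: str = ""
    settings: str = ""
    tags: str = ""
    series_extid: Optional[str] = None
    series_category_id: Optional[int] = None


def content_key(c: Category):
    # The "match" criterion from the text: name, description, settings, tags.
    return (c.name, c.description, c.settings, c.tags)


def get_categories(db: List[Category], video_id: int, series_extid: str) -> List[Category]:
    # Categories that belong to the video directly.
    video_categories = [c for c in db if c.video_id == video_id]
    # Master categories: same series, and pointing at themselves.
    masters = [
        c for c in db
        if c.series_extid == series_extid and c.series_category_id == c.id
    ]

    # Pass 1: override matching video categories with the master's properties
    # (assumption: here that means the series-related fields).
    for master in masters:
        for category in video_categories:
            if category is not master and content_key(category) == content_key(master):
                category.series_extid = master.series_extid
                category.series_category_id = master.id

    # Pass 2: create a local copy for every master that has none on this video.
    for master in masters:
        if master.video_id == video_id:
            continue  # the master itself lives on this video
        if not any(content_key(c) == content_key(master) for c in video_categories):
            local_copy = replace(master, id=next(_next_id), video_id=video_id)
            db.append(local_copy)
            video_categories.append(local_copy)

    return video_categories


db = [Category(id=1, video_id=10, name="Gestures",
               series_extid="s", series_category_id=1)]
first = get_categories(db, video_id=20, series_extid="s")
second = get_categories(db, video_id=20, series_extid="s")
```

In this sketch the first load of video 20 creates one local copy and the second load finds it again by comparing content, which is exactly the comparison that breaks down once any of the four compared fields diverge.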

Labels

Now, with all of these categories in hand, the tool queries the labels for each of them. The frontend calls GET /.../categories/.../labels for either the video category or a local copy of a master series category. This lands in getLabels. This works similarly, but not identically, to the above:

First, we again get all the labels that are associated with the given category. Then we check if we are looking at a series category, and if so, we just delete all the labels we just found (read: mark as deleted) and create new ones as copies from the labels of the series category. They are marked as “series labels” by setting their series_label_id to the id of the “master label.”
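A small simulation shows why this grows the label table on every request. Again, this is only an illustrative Python sketch under assumptions: the `Label` class, the `labels` list standing in for the database, the ID generator, and the choice to only rebuild labels for local copies (not for the master itself) are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

_next_id = count(1000)  # hypothetical ID source


@dataclass
class Label:
    id: int
    category_id: int
    value: str
    series_label_id: Optional[int] = None  # set on copies, points at the master label
    deleted: bool = False                  # labels are only ever marked as deleted


def get_labels(labels: List[Label], category_id: int,
               master_category_id: Optional[int]) -> List[Label]:
    """master_category_id is None for a category that is not part of a series."""
    current = [l for l in labels if l.category_id == category_id and not l.deleted]
    if master_category_id is None or master_category_id == category_id:
        return current

    # Local copy of a series category: throw away what we found ...
    for label in current:
        label.deleted = True
    # ... and recreate everything from the master category's labels.
    masters = [l for l in labels
               if l.category_id == master_category_id and not l.deleted]
    fresh = [Label(next(_next_id), category_id, m.value, series_label_id=m.id)
             for m in masters]
    labels.extend(fresh)
    return fresh


# Master category 1 has two labels; category 50 is its local copy on another video.
labels = [Label(1, 1, "wave"), Label(2, 1, "point")]
for _ in range(3):
    result = get_labels(labels, category_id=50, master_category_id=1)
```

After three requests the table holds eight rows for two logical labels, four of them marked as deleted, and each request hands out new label IDs. Since annotations refer to labels by ID, none of the deleted rows can actually be dropped.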

Updates and deletion

The meat of the implementation happens during querying. This is a rather lazy approach to implementing something like this. All that's left to do during updating and deleting is “redirecting” the requests for updating/deleting local copies to their corresponding master, and to “propagate” updating/deleting from the master to the copies where this doesn't happen automatically by the Schrödinger's series category/label approach above. Let's breeze through it.

Updating/deleting categories

Whenever the frontend sync-s a local copy of a series category (which happens when you save it in the modal, and specifically also happens when you edit the labels, before doing anything with the labels), it switches its ID for the series category ID, effectively redirecting any subsequent API calls related to that category. That should make working with the local copy more or less equivalent to working with the master category, which, for updating, means the “right” category is updated, and for deleting, it means deleting all the copies as well, and also deleting all labels hanging off of any of those categories.

This should also make new labels “land” in the right (master series) category.

Updating/deleting labels

Updating series labels from a video where they didn't originate isn't possible at the moment, apparently; the frontend just doesn't send any request, but (temporarily until the next refresh) duplicates labels. See #654. In theory, looking at the code, the frontend would just PUT to the copy of the label under the copy of the category, but updateLabel then checks whether this label is a series label and properly redirects the update to the master label.

The same is true for deletion.
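The redirect for labels, as far as it can be read from the description above, amounts to something like the following. This is a hypothetical Python sketch, not updateLabel itself; the dictionary standing in for the database and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Label:
    id: int
    value: str
    series_label_id: Optional[int] = None
    deleted: bool = False


def resolve_target(labels: Dict[int, Label], label_id: int) -> Label:
    """Return the master label if label_id names a series label copy."""
    label = labels[label_id]
    if label.series_label_id is not None:
        return labels[label.series_label_id]
    return label


def update_label(labels: Dict[int, Label], label_id: int, value: str) -> Label:
    target = resolve_target(labels, label_id)
    target.value = value
    return target


def delete_label(labels: Dict[int, Label], label_id: int) -> Label:
    target = resolve_target(labels, label_id)
    target.deleted = True  # marked as deleted, never physically removed
    return target


labels = {1: Label(1, "wave"), 1000: Label(1000, "wave", series_label_id=1)}
updated = update_label(labels, 1000, "wave (hand)")
```

Note that in this sketch the copy itself is left untouched; it only catches up the next time the labels are queried.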

Changing the affiliation of categories

TKTKTK

Weird things ...

... I found which we could fix at the same time or which might just become obsolete

  • Why do we pass around the series ID? The backend can just know it. Also it doesn't really seem to be used for anything, at least for categories?
  • When creating a category, it's first POST-ed only to be immediately updated via another PUT call.
    • It wouldn't surprise me if this happened for other resources as well.
    • After it is created, but before the labels are created, it has its labels requested once, as well ...
      • Actually twice, once after POST, once after PUT.
  • putCategoryResponse has logic to handle updating a local copy of a series category, but that is never called, because the frontend already redirects these calls, right? 🤔
  • Comparing categories to link them together is brittle

Potential fixes and/or alternative approaches

Keep series labels in sync just like series categories

While the entire approach outlined above is super convoluted and bad, the simplest solution might just be to transfer the “syncing logic” from categories to labels, instead of deleting and recreating them. That would entail at least the following changes:

  • getLabels no longer deletes labels.
  • Instead it overrides existing ones and creates new ones, like getCategories.
  • And that could already be it. 👀
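A rough sketch of what such a syncing getLabels could look like, to compare against the delete-and-recreate behavior. This is illustrative Python under assumptions, not a proposal for the actual code: the `Label` class, the `labels` list standing in for the database, the ID generator, and the handling of copies whose master is gone are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

_next_id = count(1000)  # hypothetical ID source


@dataclass
class Label:
    id: int
    category_id: int
    value: str
    series_label_id: Optional[int] = None
    deleted: bool = False


def get_labels_synced(labels: List[Label], category_id: int,
                      master_category_id: Optional[int]) -> List[Label]:
    current = [l for l in labels if l.category_id == category_id and not l.deleted]
    if master_category_id is None or master_category_id == category_id:
        return current

    # Index the existing copies by the master label they were created from.
    copy_by_master = {l.series_label_id: l for l in current
                      if l.series_label_id is not None}
    masters = [l for l in labels
               if l.category_id == master_category_id and not l.deleted]

    result = []
    for master in masters:
        copy = copy_by_master.get(master.id)
        if copy is None:
            # No copy yet: create one, once.
            copy = Label(next(_next_id), category_id, master.value,
                         series_label_id=master.id)
            labels.append(copy)
        else:
            # Existing copy: override in place, keeping its ID stable.
            copy.value = master.value
        result.append(copy)
    # Assumption: copies whose master no longer exists are simply not returned.
    return result


labels = [Label(1, 1, "wave"), Label(2, 1, "point")]
first = get_labels_synced(labels, category_id=50, master_category_id=1)
labels[0].value = "wave (hand)"  # the master label is renamed
second = get_labels_synced(labels, category_id=50, master_category_id=1)
```

Here repeated requests return the same label IDs, the table stays at four rows, and the rename of the master shows up on the copy at the next query.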

Others

TKTKTK

Optional

  • Whatever approach we take, labels won't be duplicated (as much) anymore, and thus, annotations created with them will have more of the same label ID. We might want to provide a script to improve the situation for older annotations as well.
    • This should not be necessary with the current feature set, though.
