⚡️ Speed up method LabelStudioAnnotation.to_dict by 3,362%
#4
+2
−6
📄 3,362% (33.62x) speedup for `LabelStudioAnnotation.to_dict` in `unstructured/staging/label_studio.py`
⏱️ Runtime: 13.3 milliseconds → 385 microseconds (best of 135 runs)
📝 Explanation and details
The optimization achieves a 3,362% speedup by eliminating expensive deep copy operations and by filtering out `None` values with a dictionary comprehension instead of mutating a copied dictionary.
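A minimal sketch of the before/after shape of such a change may help. It uses hypothetical stand-in classes (`Result`, `Annotation`) and method names (`to_dict_deepcopy`, `to_dict_shallow`) rather than the actual `LabelStudioAnnotation` code, so it should be read as an illustration under those assumptions, not as the diff in this PR.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Result:
    """Hypothetical stand-in for a nested result/review object."""

    value: Dict[str, Any]

    def to_dict(self) -> Dict[str, Any]:
        return dict(self.__dict__)


@dataclass
class Annotation:
    """Hypothetical stand-in; not the real LabelStudioAnnotation."""

    result: List[Result]
    id: Optional[int] = None
    reviews: Optional[List[Result]] = None

    def to_dict_deepcopy(self) -> Dict[str, Any]:
        # "Before" shape: deep-copy everything, then deep-copy again and
        # pop the keys whose value is None.
        annotation_dict = deepcopy(self.__dict__)
        annotation_dict["result"] = [r.to_dict() for r in annotation_dict["result"]]
        if annotation_dict.get("reviews") is not None:
            annotation_dict["reviews"] = [r.to_dict() for r in annotation_dict["reviews"]]
        _annotation_dict = deepcopy(annotation_dict)
        for key, value in annotation_dict.items():
            if value is None:
                _annotation_dict.pop(key)
        return _annotation_dict

    def to_dict_shallow(self) -> Dict[str, Any]:
        # "After" shape: a shallow copy of the top-level dict is enough
        # because the nested lists are rebuilt right away.
        annotation_dict = dict(self.__dict__)
        annotation_dict["result"] = [r.to_dict() for r in self.result]
        if self.reviews is not None:
            annotation_dict["reviews"] = [r.to_dict() for r in self.reviews]
        # Filter None values in one pass instead of copying and popping.
        return {k: v for k, v in annotation_dict.items() if v is not None}


annotation = Annotation(result=[Result(value={"label": "Positive"})], id=1)
before = annotation.to_dict_deepcopy()
after = annotation.to_dict_shallow()
```

In this sketch both methods return equal dictionaries, and the `result` list in the output is a new list rather than the instance's own. Whether values deeper inside each element are shared depends on what the nested `to_dict()` returns, which is worth checking before applying the same pattern elsewhere.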
Key optimizations:
- Replaced `deepcopy(self.__dict__)` with `dict(self.__dict__)`: The original code used `deepcopy` twice, once to create the initial dictionary and again to create `_annotation_dict`. The profiler shows these `deepcopy` calls consumed 97.8% of the total runtime (65.1% + 32.7%). The optimization uses a shallow copy instead, which is orders of magnitude faster since we only need the top-level dictionary structure.
- Dictionary comprehension instead of pop operations: The final filtering step, previously a loop with `.pop()` calls, is now a single dictionary comprehension: `{k: v for k, v in annotation_dict.items() if v is not None}`. This eliminates the need for the second `deepcopy` entirely and is more efficient than iterating and mutating the dictionary.

Why this works safely: The shallow copy is sufficient because the code immediately replaces the nested objects (`result` and `reviews`) with new lists created by calling `.to_dict()` on their elements. This means the original nested structure isn't shared, maintaining the same isolation behavior as the deep copy.

Performance impact: The profiler shows the time dropped from 87.9 ms to 1.9 ms for the core function logic, with most time now spent on the actual work (converting results/reviews) rather than unnecessary copying. This optimization is particularly beneficial for objects with many fields or nested structures, making it ideal for data serialization workflows where `to_dict()` may be called frequently.

✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
`staging/test_label_studio.py::test_created_annotation`

To edit these changes, `git checkout codeflash/optimize-LabelStudioAnnotation.to_dict-mjcbd4c4` and push.