fix(jobs): changed copy host job to use scroll API for hosts with large content volume #34044
base: main
Branch updated from c6ab7b3 to ca53053.
```java
} catch (final InterruptedException e) {
    Logger.warn(this, "Batch pause was interrupted", e);
    Thread.currentThread().interrupt();
}
```
I think it should always sleep, to avoid starvation. If the minimum is set to 0, we should default to something.
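A minimal sketch of that suggestion, assuming a hypothetical `DEFAULT_MIN_BATCH_PAUSE_MS` floor (the PR does not name one): any non-positive configured pause falls back to the floor, so the job always yields between batches. On interruption it restores the interrupt flag, as the `catch` block quoted above does.

```java
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a batch pause that always sleeps, even when the configured pause is 0.
 * DEFAULT_MIN_BATCH_PAUSE_MS and the method names are hypothetical.
 */
public final class BatchPause {

    // Hypothetical floor used when the configured pause is not positive.
    static final long DEFAULT_MIN_BATCH_PAUSE_MS = 50L;

    private BatchPause() {
    }

    /** Returns the configured pause, or the default floor if the configured value is 0 or negative. */
    static long effectivePauseMs(final long configuredPauseMs) {
        return configuredPauseMs > 0 ? configuredPauseMs : DEFAULT_MIN_BATCH_PAUSE_MS;
    }

    /** Sleeps between batches; returns false (with the interrupt flag restored) if interrupted. */
    static boolean pauseBetweenBatches(final long configuredPauseMs) {
        try {
            TimeUnit.MILLISECONDS.sleep(effectivePauseMs(configuredPauseMs));
            return true;
        } catch (final InterruptedException e) {
            // Preserve the interruption so callers can stop processing further batches.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Returning a boolean lets the batch loop stop early once the thread has been interrupted, rather than continuing with the next batch.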
```java
private static final String SELECT_SYSTEM_HOST = "SELECT id FROM identifier WHERE id = '"+ Host.SYSTEM_HOST+"' ";

private static final String FROM_JOINED_TABLES = "INNER JOIN identifier i " +
        "ON c.identifier = i.id AND i.asset_subtype = '" + Host.HOST_VELOCITY_VAR_NAME + "' " +
```
Why is this getting removed?
Is it good enough to filter by the structure inode?
Is there a performance gain?
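To make the question concrete, here is a hypothetical sketch of the two query shapes: a join against `identifier` on `asset_subtype`, as in `FROM_JOINED_TABLES`, versus a direct filter on `contentlet.structure_inode`, as the PR description proposes. The `SELECT` list, the `title` column, the `?` placeholders, and all method names are assumptions. Whether dropping the join is actually faster depends on the indexes and data in a given database, so it would need to be confirmed with PostgreSQL's `EXPLAIN ANALYZE`.

```java
/**
 * Sketch contrasting two ways to restrict a query to Host contentlets.
 * The SELECT list, the "title" column, and the method names are hypothetical.
 */
public final class HostQuerySketch {

    private HostQuerySketch() {
    }

    /** Join-based filter: the identifier table is needed to check the asset subtype. */
    static String joinedFilter() {
        return "SELECT c.inode FROM contentlet c "
                + "INNER JOIN identifier i ON c.identifier = i.id AND i.asset_subtype = ? "
                + "WHERE c.title ILIKE ?";
    }

    /** Column-based filter: uses contentlet.structure_inode directly, with no join. */
    static String structureInodeFilter() {
        return "SELECT c.inode FROM contentlet c "
                + "WHERE c.structure_inode = ? AND c.title ILIKE ?";
    }

    /**
     * Builds a case-insensitive "contains" pattern for ILIKE, escaping the LIKE wildcards
     * with a backslash (PostgreSQL's default LIKE escape character).
     */
    static String containsPattern(final String term) {
        final String escaped = term.replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_");
        return "%" + escaped + "%";
    }
}
```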
Very nice set of improvements!
```java
        sourceContentlets.size(), batchSize, relationshipBatchSize, batchPauseMs));

// Strategy: Process simple content immediately, collect HTML pages for later processing
final List<String> htmlPageInodes = new ArrayList<>();
```
Perhaps a LinkedList is better suited here, or an ArrayList initialized with the size of dbResults.
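A sketch of the second option, assuming a hypothetical row shape in which each entry of `dbResults` is a map with `inode` and `is_html_page` keys (the real query columns are not shown here). The list is sized from the number of rows, which is an upper bound on how many HTML pages can be collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch: collect HTML page inodes into an ArrayList pre-sized from the query results.
 * The "inode" and "is_html_page" keys are hypothetical.
 */
public final class HtmlPageCollector {

    private HtmlPageCollector() {
    }

    static List<String> collectHtmlPageInodes(final List<Map<String, Object>> dbResults) {
        // Every row could be an HTML page, so this capacity avoids any resizing while appending.
        final List<String> htmlPageInodes = new ArrayList<>(dbResults.size());
        for (final Map<String, Object> row : dbResults) {
            if (Boolean.TRUE.equals(row.get("is_html_page"))) {
                htmlPageInodes.add((String) row.get("inode"));
            }
        }
        return htmlPageInodes;
    }
}
```

Both options avoid the array copies an unsized ArrayList may perform as it grows; for an append-then-iterate list, a pre-sized ArrayList also avoids LinkedList's per-node allocation.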
Branch updated from 640ce0b to 9ef0091.
…e content volume (#33661)
Branch updated from 9ef0091 to 163e018.
Closes #33661
This PR addresses performance issues and pagination errors in the site copy job by using the Elasticsearch Scroll API for large result sets.
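Elasticsearch's scroll API keeps a search context open and returns consecutive batches for a scroll ID until a request returns no hits, after which the scroll should be cleared. This avoids re-running the query with a growing `from` offset. The sketch below illustrates that consumption pattern in plain Java. `ScrollCursor` and `InMemoryScrollCursor` are hypothetical stand-ins, not the `ESContentletScroll` interface added by this PR, whose methods are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of scroll-style iteration: a cursor returns consecutive batches until it is
 * exhausted and is then closed to release its context. All types here are hypothetical.
 */
public final class ScrollSketch {

    private ScrollSketch() {
    }

    /** Hypothetical cursor over a scrolled result set. */
    interface ScrollCursor<T> extends AutoCloseable {
        /** Returns the next batch, or an empty list once all hits have been returned. */
        List<T> nextBatch();

        @Override
        void close();
    }

    /** In-memory stand-in that serves fixed-size batches from a list. */
    static final class InMemoryScrollCursor<T> implements ScrollCursor<T> {
        private final List<T> hits;
        private final int batchSize;
        private int position;
        boolean closed;

        InMemoryScrollCursor(final List<T> hits, final int batchSize) {
            this.hits = hits;
            this.batchSize = batchSize;
        }

        @Override
        public List<T> nextBatch() {
            final int end = Math.min(position + batchSize, hits.size());
            final List<T> batch = new ArrayList<>(hits.subList(position, end));
            position = end;
            return batch;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Drains the cursor batch by batch; try-with-resources guarantees close() runs. */
    static <T> List<T> drain(final ScrollCursor<T> cursor) {
        final List<T> all = new ArrayList<>();
        try (cursor) {
            List<T> batch = cursor.nextBatch();
            while (!batch.isEmpty()) {
                all.addAll(batch);
                batch = cursor.nextBatch();
            }
        }
        return all;
    }
}
```

In a real job, each batch would be processed (and possibly followed by a pause) before the next one is requested, instead of being accumulated into a single list.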
Proposed Changes
- `indexSearchScroll` method from the `ESContentFactoryImp` class to expose the ES scroll API in a new wrapper interface, `ESContentletScroll`. The `PaginatedContentlets` class uses this new interface to iterate over results with the ES scroll API.
- `HostFactoryImpl` were optimized to use the `structure_inode` field from the `contentlet` table to filter hosts, and to use the `ILIKE` clause in SQL conditions for case-insensitive matching.

Checklist