Skip to content

Conversation

@Aliaksandr-Fedasiuk
Copy link
Contributor

Purpose

Using a partitioned journal_records table for increase performance for DI summary

Approach

SQL script for converting journal_records table as partitioned table.

Learn

MODSOURMAN-1204

@sonarqubecloud
Copy link

@NikitaSedyx
Copy link
Contributor

Code Review for MODSOURMAN-1204: Database Partitioning Migration and Related Updates

Overview of the code changes: This branch introduces database table partitioning for journal records by entity type, updates Lombok version, adds tenant_id to SQL queries, and includes additional test logging. The main feature implements a partitioning strategy to improve query performance by segmenting journal records based on entity types.

Files Changed:

  • mod-source-record-manager-server/pom.xml - Lombok version update
  • mod-source-record-manager-server/src/main/resources/templates/db_scripts/create_get_job_log_entries_function.sql - SQL function update
  • mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql - New partitioning migration script
  • mod-source-record-manager-server/src/main/resources/templates/db_scripts/schema.json - Schema version updates
  • mod-source-record-manager-server/src/test/java/org/folio/rest/impl/metadataProvider/MetaDataProviderJobLogEntriesAPITest.java - Test improvements

Suggestions

🔧 Missing SQL Transaction Management in Migration Script

  • Priority: 🔥 Critical
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql
  • Details: The migration script performs critical data operations (table creation, data migration, table renaming) without explicit transaction boundaries. If any step fails, the database could be left in an inconsistent state.
  • Suggested Change:
BEGIN;

-- Create partitioned table structure
CREATE TABLE IF NOT EXISTS ${myuniversity}_${mymodule}.journal_records_entity_type (
  -- ... existing definition
);

-- Create partitions
-- ... existing partition creation

-- Migrate data with error handling
INSERT INTO ${myuniversity}_${mymodule}.journal_records_entity_type (...)
SELECT ... FROM ${myuniversity}_${mymodule}.journal_records;

-- Create indexes
-- ... existing index creation

-- Rename tables atomically
ALTER TABLE ${myuniversity}_${mymodule}.journal_records RENAME TO journal_records_backup;
ALTER TABLE ${myuniversity}_${mymodule}.journal_records_entity_type RENAME TO journal_records;

COMMIT;

🔧 Potential Data Loss Risk in Migration

  • Priority: 🔥 Critical
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql
  • Details: The migration creates partitions for specific entity types but may not handle unexpected entity_type values. Records with entity types not covered by the partitions could cause insertion failures.
  • Suggested Change: Add a default partition to catch any unexpected entity types:
CREATE TABLE IF NOT EXISTS ${myuniversity}_${mymodule}.journal_records_default PARTITION OF journal_records_entity_type DEFAULT;

⛏️ Typo in SQL Comment

  • Priority: 🟢 Low
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql
  • Details: Line 51 contains a typo: --ANALIZE VERBOSE should be --ANALYZE VERBOSE
  • Suggested Change:
--ANALYZE VERBOSE ${myuniversity}_${mymodule}.journal_records;

🔧 Missing Error Handling in Data Migration

  • Priority: ⚠️ High
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql
  • Details: The INSERT statement migrating data doesn't include error handling or validation. Large datasets could cause timeouts or constraint violations.
  • Suggested Change: Consider adding batch processing or validation:
-- Validate data before migration
DO $$
BEGIN
  IF (SELECT COUNT(*) FROM ${myuniversity}_${mymodule}.journal_records WHERE entity_type NOT IN 
     ('MARC_BIBLIOGRAPHIC', 'PO_LINE', 'MARC_HOLDINGS', 'MARC_AUTHORITY', 
      'HOLDINGS', 'AUTHORITY', 'INSTANCE', 'ITEM', 'ORDER', 'INVOICE', 'EDIFACT', '', NULL)) > 0 THEN
    RAISE EXCEPTION 'Unexpected entity_type values found. Please review data before migration.';
  END IF;
END $$;

👍 Good Practice: Lombok Version Update

  • Priority: 🟢 Low
  • File: mod-source-record-manager-server/pom.xml
  • Details: Updating Lombok from 1.18.30 to 1.18.34 is a good practice for security and bug fixes. The version jump appears reasonable and shouldn't introduce breaking changes.

🔧 Incomplete Index Strategy for Partitioned Table

  • Priority: ⚠️ High
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql
  • Details: The script creates a unique index including entity_type in the key, but partition pruning efficiency could be improved with additional indexes on individual partitions for frequently queried columns.
  • Suggested Change: Consider adding partition-specific indexes:
-- Add indexes on individual partitions for better query performance
CREATE INDEX CONCURRENTLY IF NOT EXISTS journal_records_marc_bib_job_exec_idx 
  ON ${myuniversity}_${mymodule}.journal_records_marc_bibliographic (job_execution_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS journal_records_marc_bib_source_idx 
  ON ${myuniversity}_${mymodule}.journal_records_marc_bibliographic (source_id);
-- Repeat for other high-traffic partitions

💭 Consider Impact on Existing Queries

  • Priority: 🟡 Medium
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql
  • Details: The migration changes the underlying table structure. Existing queries that don't include entity_type in WHERE clauses may perform poorly as they'll scan all partitions.
  • Suggested Change: Review and optimize existing queries to include entity_type filters where possible.

⛏️ Excessive Logging in Tests

  • Priority: 🟢 Low
  • File: mod-source-record-manager-server/src/test/java/org/folio/rest/impl/metadataProvider/MetaDataProviderJobLogEntriesAPITest.java
  • Details: Multiple .log().all() calls have been added, including duplicate calls on lines 135 and 136. This will create verbose test output.
  • Suggested Change: Remove duplicate logging calls and consider using conditional logging:
.then()
.log().ifValidationFails() // Only log on failures
.statusCode(HttpStatus.SC_OK)

👍 Good Addition: Stronger Test Assertions

  • Priority: 🟢 Low
  • File: mod-source-record-manager-server/src/test/java/org/folio/rest/impl/metadataProvider/MetaDataProviderJobLogEntriesAPITest.java
  • Details: The addition of object deserialization and assertThat statements (lines 1062-1081) provides stronger validation than just REST Assured body assertions.

📝 Missing Documentation for Schema Version

  • Priority: 🟡 Medium
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/schema.json
  • Details: The schema.json adds two new migration scripts but doesn't include documentation about what mod-source-record-manager-3.10.0 introduces.
  • Suggested Change: Consider adding comments or documentation describing the purpose of the new version and migrations.

🔧 SQL Query Modification Impact Analysis Needed

  • Priority: ⚠️ High
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/create_get_job_log_entries_function.sql
  • Details: The addition of tenant_id to the GROUP BY clause (line 409) changes the query's behavior. This could potentially create more result rows if the same source records exist across different tenants.
  • Suggested Change: Verify that this change aligns with the expected business logic and doesn't break existing functionality.

🌱 Future Consideration: Partition Pruning Optimization

  • Priority: 🟡 Medium
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql
  • Details: The current partitioning strategy is by entity_type, but queries might benefit from compound partitioning (e.g., by entity_type and date ranges) for time-series data.
  • Suggested Change: Monitor query patterns after deployment and consider adding sub-partitioning if beneficial.

❓ Question: Rollback Strategy

  • Priority: ⚠️ High
  • File: mod-source-record-manager-server/src/main/resources/templates/db_scripts/migration_to_partitioning_logs.sql
  • Details: The migration script renames the original table to journal_records_backup but doesn't provide a clear rollback procedure.
  • Suggested Change: Document the rollback process or provide a companion script for reverting the migration if needed.

Summary

This code review identified several critical issues that should be addressed before deployment:

Critical Issues (🔥):

  • Missing transaction management in the migration script could leave the database in an inconsistent state
  • Potential data loss risk for records with unexpected entity types
  • Missing comprehensive error handling during data migration

High Priority Issues (⚠️):

  • Incomplete index strategy may impact query performance
  • SQL query modification needs impact analysis
  • No documented rollback strategy for the migration

Positive Aspects (👍):

  • Lombok version update follows best practices
  • Enhanced test assertions provide better validation
  • Partitioning strategy should improve query performance for large datasets

Recommendations:

  1. Add transaction boundaries and error handling to the migration script
  2. Include a default partition for unexpected entity types
  3. Document rollback procedures
  4. Review the impact of the tenant_id addition to the GROUP BY clause
  5. Consider adding partition-specific indexes for optimal performance

The partitioning migration is a significant database change that requires careful testing in a staging environment before production deployment.

@Aliaksandr-Fedasiuk
Copy link
Contributor Author

This is an experimental PR to test one of the ideas for speeding up the receiving of import results from the DB.
This was a very helpful code review. I'm confident this functionality will be a great time-saver and streamline our code review process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants