SKU/molecule, Phase 2 #1455
base: sku/molecule
* GQL: split Molecule into MoleculeV1 & MoleculeV2 * GQL: MoleculeMutV2 WIP * Move account_mut.rs to ./account_mut/ * GQL: Account quotas * test_search_accounts_by_name_pattern(): fix rebase collision * GQL: MoleculeDataRoomMut * GQL: MoleculeV2: announcements & whole file tweaks * GQL: MoleculeMutV2: announcements * GQL: MoleculeV2::activity() * Fixes after self review * AccountQuotasMut::set_user_level_quotas(): add doc string * Backported changes to make clippy a bit happier
resources/schema.gql
Outdated
  Access the underlying core Dataset
  """
  dataset: Dataset!
  entries(pathPrefix: CollectionPath, maxDepth: Int, page: Int, perPage: Int): MoleculeDataRoomEntryV2Connection!
The data room entries listing should also take a filters parameter to enable filtering by tags and categories.
Additionally, I think we should include accessLevel in the filters too. Currently Molecule fetches all entries and filters them on their end based on the user's access level, but this doesn't work well with pagination. Instead, they should tell us which access levels are allowed, and we will account for those when filtering on our side.
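To illustrate the pagination concern above, here is a minimal sketch in Rust. It is not the kamu implementation: `Entry`, `page_filtered`, `page_then_filter`, `sample_entries`, and the zero-based `page` index are all hypothetical names and assumptions made for this example. It only shows why filtering by access level has to happen before a page is cut.

```rust
/// Hypothetical, simplified data room entry (not the real domain type).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    path: &'static str,
    access_level: &'static str,
}

/// Server-side approach: filter by allowed access levels first, then paginate.
/// `page` is assumed to be zero-based here.
fn page_filtered(entries: &[Entry], allowed: &[&str], page: usize, per_page: usize) -> Vec<Entry> {
    entries
        .iter()
        .filter(|e| allowed.contains(&e.access_level))
        .skip(page * per_page)
        .take(per_page)
        .cloned()
        .collect()
}

/// Client-side approach: the server paginates without knowing the access
/// levels, and the caller filters the page afterwards. Pages can come back
/// shorter than `per_page` even though more matching entries exist.
fn page_then_filter(entries: &[Entry], allowed: &[&str], page: usize, per_page: usize) -> Vec<Entry> {
    entries
        .iter()
        .skip(page * per_page)
        .take(per_page)
        .filter(|e| allowed.contains(&e.access_level))
        .cloned()
        .collect()
}

/// Small fixture using the "public" and "holders" access levels.
fn sample_entries() -> Vec<Entry> {
    vec![
        Entry { path: "/a", access_level: "public" },
        Entry { path: "/b", access_level: "holders" },
        Entry { path: "/c", access_level: "holders" },
        Entry { path: "/d", access_level: "public" },
        Entry { path: "/e", access_level: "public" },
    ]
}
```

With two entries per page and only "public" allowed, `page_then_filter` returns a single entry for the first page, while `page_filtered` returns a full page. The caller cannot tell from the short page whether more matching entries remain.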
* Tests: mark the current molecule tests as v1 * Modularize Molecule queries * Molecule: extract common things into common.rs * molecule/v2: create a dir * molecule/v1: create a dir * molecule_mut/: create a dir * molecule_project_v2.rs: extract * molecule_activity_event_v2.rs: extract * molecule_data_room_dataset_v2.rs: extract * molecule/: split rest entities * molecule_mut/: split rest entities * MoleculeV2::activity(): add "filters" arguments * MoleculeAnnouncementsDatasetV2::tail(): add "filters" argument * schema.gql: regenerate
* UpdateVersionFileUseCaseHelper: extract * MoleculeDataRoomMutV2::start_upload_file(): implement * QueryService::get_changelog_projection(): implement * QueryService::get_changelog_projection(): add "hint" options * MoleculeDataRoomMutV2: polished methods except "finish_upload_file()" * MoleculeDataRoomMutV2::finish_upload_file_new_file() * MoleculeDataRoomFinishUploadFileV2 * MoleculeDataRoomMutV2::update_file_metadata(): a try to add the second step * MoleculeDataRoomMutV2::finish_upload_file_new_file_version() * clippy fixes * schema.gql: regenerate * Fix Molecule v1 tests * Working on Molecule v2 data room api * Molecule v2 API: implement basic data room operations * Migration fix * Inlined `UpdateVersionFileUseCaseHelper` within GQL utils, as we may not use `UploadService` in the "datasets" domain. + fixed MySQL migration * Try fixing ODF code generation flow * MoleculeDataRoomMutV2::finish_upload_file_new_file_version(): use UpdateCollectionEntriesUseCase * MoleculeDataRoomMutV2::finish_upload_file_new_file(): use UpdateCollectionEntriesUseCase * MoleculeDataRoomMutV2::move_entry(): use UpdateCollectionEntriesUseCase * MoleculeDataRoomMutV2::remove_entry(): use UpdateCollectionEntriesUseCase * MoleculeDataRoomMutV2::update_file_metadata(): implement * Schema regenerated * test_molecule_v2_data_room_operations(): use pretty_assertions::assert_eq * moveEntry test * removeEntry test * updateFileMetadata test * schema.gql: regenerate --------- Co-authored-by: Sergii Mikhtoniuk <mikhtoniuk@gmail.com> Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com>
…1463) * Established Molecule domain & service crates. Extracted "View Molecule Projects" use case, and plugged into v1/v2 APIs * Extracted "Find Molecule project" use case * Extracted "Create Molecule project use case". Molecule dataset snapshots, as well as generic VersionedFile/Collection dataset snapshots are now a domain, not GQL concern. * Extracted `MoleculeProjectEntity` object to use instead of untyped JSON objects at domain level. GQL objects converted to [Object], replacing [SimpleObject], and keep entity. * Improved telemetry in new Molecule service layer * Sketched MoleculeProjectMessage outbox events. For now, sending "Created" message from the corresponding use case. * Review notes fixed
* UpdateCollectionEntriesUseCaseImpl::build_data_batches(): return note * MoleculeDatasetSnapshots::data_room_v2(): create alias internally * MoleculeDatasetSnapshots::projects(): create alias internally * MoleculeDatasetSnapshots::announcements(): create alias internally * StageDataResult: update doc strings * PushIngestOpts: fix a typo * MoleculeDatasetSnapshots::global_data_room_activity() * MoleculeProjectService -> MoleculeDatasetService * MoleculeDatasetService::get_global_data_room_activity_dataset() * MoleculeProjectV2::ipnft_token_id(): use U256 * MoleculeAppendDataRoomActivityUseCaseImpl * Tests stabilized * MoleculeDatasetSnapshots::global_data_room_activity(): update comment re LIST<BYTE_ARRAY item (STRING)>) * MoleculeDataRoomMutV2::finish_upload_file_new_file_version(): write data room activity * MoleculeDataRoomMutV2::finish_upload_file_new_file(): add maintainer permissions to molecule * DatasetHandleLoader: add AccessCheckedDatasetRef-related load() method * DatasetHandleLoader: add AccessCheckedDatasetRef-related load() method + ResolvedDataset * MoleculeVersionedFile::latest(): use data loader * MoleculeViewDataRoomActivitiesUseCaseImpl * MoleculeV2::activity() * MoleculeV2::activity() * test_molecule_v2_data_room_operations(): global activity checks (part) * MoleculeProjectV2::get_data_room_activity_events(): fixed, unit-tested * clippy fixes * test_internal_error(): simplify * OperationType::deserialize(): simplify * access_level: update todo * Add todos * Consts for snapshot names * Add global prefix
#1468) * Datasets domain use cases reorganized by folders * Extracted 'ViewCollectionEntriesUseCase' use case in datasets domain * Merge corrections * Extracted `FindCollectionEntryUseCase` * FindCollectionEntryUseCase => FindCollectionEntriesUseCase * More renaming cleanups * Listing structs: use EntityPageListing template. Introduced `CollectionPath` at datasets domain level. Domain's `ExtraDataFields` applied more systematically. * Extracted `ViewVersionedFileHistoryUseCase` * Extracted `FindVersionedFileVersionUseCase` use case * Simplified structures in `UpdateCollectionEntriesUseCase` * Cleanups in GQL adapters for collections and versioned files * Minor review
* Finish global data room activity * Correct project data room activities
* Add disable/enable project api * Fix clippy * Add comments * Fix review comments - Iter 1 * Refactor changelog entry duplication * Refactor: use GraphQLQueryRequest in tests * Add chain length asserts * Make chain search with alias parameter * Fix review comments * Fix tests * Update schema
…1472) * Molecule use cases need some folder structure too. Extracted use cases `MoleculeFindProjectDataRoomEntryUseCase` and `MoleculeViewProjectDataRoomEntriesUseCase`: those indirectly request project data rooms as a collection dataset, and map structures. A direct collection adapter talking to service layer of `kamu-datasets` (not to GQL!), with the extension seam for future federation (invoking collection entries from base GQL API remotely) * Got rid of manual DataFrame at GQL level when writing or updating file versions * `MoleculeDataRoomEntry`: simplified domain structure and GQL equivalent * Some intermediate cleanups after merge * First attempt to extract data room UPSERT use case * Upsert data room entry: returning new data room record * Extracted `MoleculeRemoveProjectDataRoomEntryUseCase` * Extracted `MoleculeMoveProjectDataRoomEntryUseCase` + aligned common parts with removals * Update metadata uses data-room level upsert use case * Telemetry cleanup * Naming cleanups * First sketch of data room outbox message: sending for move and remove * Split upsert data room entry into create and update UC, as they need to produce different outbox output * Propagating source event time for collection entry operations * Propagating system time from versioned file ingest properly * Got rid of extra ReBAC check at highest data room access point
* MoleculeDatasetService::get_global_announcements_dataset() * MoleculeCreateAnnouncementUseCase * MoleculeProjectMutV2::announcements() * format-utils crate * MoleculeAnnouncementsDatasetMutV2::create() * MoleculeViewGlobalDataRoomActivitiesUseCaseImpl: respect announcements * Molecule use cases: activity/ -> activities/ * Adaptation to the latest refactoring * MoleculeV2::activity() * MoleculeAnnouncements (project) * MoleculeProjectV2::activity(): update with announcements * MoleculeCreateAnnouncementUseCaseImpl: register * Tests fix * Tests fix [2] * Tests fix [3] * MoleculeProjectAnnouncementDataRecord: add a TODO * schema.gql: regenerate
* test_molecule_v2_announcements_operations(): checkpoint -- add 2 files * test_molecule_v2_announcements_operations(): checkpoint -- create empty announcement * test_molecule_v2_announcements_operations(): checkpoint -- Create an announcement with one attachment * test_molecule_v2_announcements_operations(): checkpoint -- Create an announcement with two attachments * test_molecule_v2_announcements_operations(): checkpoint -- Create an announcement with attachment DID that does not exist * test_molecule_v2_announcements_operations(): checkpoint -- Announcements are listed as expected * test_molecule_v2_announcements_operations(): finish * MoleculeAnnouncementEntry: system_time/event_time
* MoleculeEncryptionMetadata * schema.gql: regenerate * MoleculeDatasetSnapshots::versioned_file_v2(): remove todo * MoleculeEncryptionMetadata: extract to domain
…nPathV2` scalar (#1480) * DatasetNameGenerator * MoleculeDataRoomMutV2::build_new_file_dataset_alias(): use DatasetNameGenerator * CollectionPathV2 (domain) * CollectionPathV2 (domain): updates * CollectionPathV2 (GQL) * CollectionPathV2 (GQL): tests * schema.gql: regenerate * kamu-datasets: remove unused dep * Test fixes after recent changes * RUSTSEC-2025-0134
* ViewCollectionEntriesUseCase: support extra data filters * From<GetDataRoomCollectionEntriesFilters> for Option<kamu_datasets::ExtraDataFieldsFilter> * MoleculeDataRoomCollectionService::get_data_room_collection_entries(): add filters * MoleculeViewDataRoomEntriesUseCaseImpl: filters * MoleculeDataRoomProjection::entries(): filters * MoleculeDatasetSnapshots::global_announcements(): update SetInfo * utils::DataFrameExtraDataFieldsFilterApplier: extract * MoleculeAnnouncements::tail(): filters * GetDataRoomCollectionEntriesFilters -> GetMoleculeDataRoomCollectionEntriesFilters * A clearer separation of filter entities * MoleculeAnnouncements::tail(): filters * MoleculeProjectV2::get_data_room_activity_events(): filters * MoleculeProjectV2::get_data_room_activity_events(): filters [2] * MoleculeProjectV2::activity(): filters * MoleculeViewGlobalActivitiesUseCase: filters * schema.gql: regenerate * test_molecule_v2_activity(): start unlocking * test_molecule_v2_activity(): Activities are empty
…#1489) * Versioned files: sketched and plugged create/update use cases * Specialized use case for update file metadata. Integrated read file version use case, and simplified read model. * Minor: versioned file API moved out of data room file * Minor: avoid cloning file info for serde * Minor: unifying arguments of update/upload use cases * Clarified access checking in versioned file use cases * Isolated versioned file content access behind a service * Avoiding ResolvedDataset and similar in Molecule domain interface * MoleculeVersionedFile::asOf supported. Drafted MoleculeVersionedFile::matching (not public) - takes the versioned file version that exactly matches data room entry. MoleculeVersionedFile::latest is correctly not reusing denormalized data, as it's not guaranteed the data room entry is the latest one. * Revised schema optionals * Spelling * Enabled MoleculeVersionedFile::matching endpoint. Optimized MoleculeVersionedFile::latest endpoint, when data room entry is also the latest, using denormalized data. * Guiding comments
) * test_molecule_v2_activity(): Create a few versioned files * test_molecule_v2_activity(): Upload new file versions * test_molecule_v2_activity(): Link new file into the project data room -- not relevant for v2 * test_molecule_v2_activity(): Move a file (retract + append) * test_molecule_v2_activity(): Update a file (correction from-to) -- not relevant for v2 * test_molecule_v2_activity(): Create an announcement * test_molecule_v2_activity(): Upload a new file version * test_molecule_v2_activity(): Remove a file * test_molecule_v2_activity(): Check project activity events * test_molecule_v2_activity(): Create another project * test_molecule_v2_activity(): Create an announcement for the second project * test_molecule_v2_activity(): Check global activity events * test_molecule_v2_activity(): In-between activity asserts * test_gql_custom_molecule_v2: remove misleading clone() * test_molecule_v2_activity(): Filters without values * datafusion: register array functions * DataFrameExtraDataFieldsFilterApplier:: respect array columns * test_molecule_v2_activity(): Filters by tags: tag1 * test_molecule_v2_activity(): Filters by tags: [tag2] * test_molecule_v2_activity(): // Filters by tags: [tag2, tag1] * test_molecule_v2_activity(): Filters by categories: [test-category-1] * test_molecule_v2_activity(): Filters by categories: [test-category-2] * test_molecule_v2_activity(): Filters by categories: [test-category-2, test-category-1] * test_molecule_v2_activity(): Filters by access levels: [public] * test_molecule_v2_activity(): Filters by access levels: [holders] * test_molecule_v2_activity(): Filters by access levels: [public, holders] * test_molecule_v2_activity(): Filters combination: [test-tag2] AND [test-category-1] AND [holders] * test_molecule_v2_activity(): Project filters * test_molecule_v2_announcements_operations(): announcements filters * test_molecule_v2_data_room_operations(): announcements filters
…ouncements) * Extracted 'view project announcements' use case * Extracted "find project announcement" use case * Split Molecule dataset services * Moved most services from Molecule domain to services crate, broke all dependencies from API level * Minor cleanups * Sketched new approach of dataset accessor and used it to simplify announcements use cases for now * Same accessor approach applied to activities * Same accessor approach applied to projects dataset * Better reader/writer helpers for projects * Naming simplifications * Simplified announcements data model * Further model unifications: clear separation between changelog entry, changelog insertion record, payload record, and entity. Cleaned up event time / system time propagation in all Molecule write use cases. * Outbox events for announcements * Outbox event for activities * Activities: extracted view project activity use case, related structures cleanup * Minor correction
* MoleculeDatasetSnapshots::global_announcements(): event_time should be NOT optional * MoleculeDatasetSnapshots::global_data_room_activity(): event_time should be NOT optional * MoleculeDatasetSnapshots::global_data_room_activity(): content_hash not null, content_type nullable * MoleculeCreateAnnouncementUseCase/MoleculeAppendGlobalDataRoomActivityUseCase: require event_time
* MoleculeDataRoomMutV2::finish_upload_file_new_file(): check path * MoleculeDataRoomMutV2::move_entry(): check path * schema.gql: regenerate * MoleculeDataRoomMutV2::finish_upload_file_new_file_version(): check ref * schema.gql: regenerate
* Add account quotas * Update schema * Fix tests * Fix review comments - Iter 1 * Fix review comments - Iter 2 * Fix imports * Use correct defaults * Fix test ingest * Reduce default quota * Fix review comments - Iter 3 * resolve account id from target dataset * Skip quota checks for single-tenant mode * set account quotas only for admins * Fix review comments - Iter 4 * Fix error message propagation * Add e2e quota tests
* Add get quota default fallback * Update schema
Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com>
* Add expected head field to update file metadata method * Add tests * Update schema --------- Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com>
* Add activity byKind filters * Refactor activity kind --------- Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com>
… alias resolution in multi-tenant mode (#1529) * test_dataset_entry_service.rs -> test_dataset_entry_service_impl.rs * test_dataset_entry_service_impl.rs: use explicit pretty_assertions imports * test_utils::test_for_each_tenancy(): implement * test_dataset_entry_service_impl.rs: use test_for_each_tenancy() macro * test_resolve_dataset_handles_by_refs(): by ids * test_resolve_dataset_handles_by_refs(): by aliases * test_resolve_dataset_handles_by_refs(): use resolution_report * test_resolve_dataset_handles_by_refs(): by handles * test_resolve_dataset_handles_by_refs(): mixed * test_resolve_dataset_handles_by_refs(): special case for mixed aliases in multi-tenant * DatasetEntryServiceImpl::resolve_dataset_handles_by_dataset_aliases(): resolve empty account alias name * clippy fixes * Self-review --------- Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com>
* Remove molecule_ prefixes from gql args * Modify domain structs * Revert dataset schema changes
* Add change by for entry operations * Fix tests * Cleanup * Implement two-step remove entry logic * Refactor diff helper method
…ters within same logical group via OR operator
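The OR-within-a-group rule mentioned in this commit title, together with the AND combination across groups exercised in test_molecule_v2_activity(), can be sketched as follows. This is an illustration only: `Filters`, `Record`, `group_matches`, and `record_matches` are hypothetical names, and treating an empty group as "no constraint" is an assumption made for the example, not something confirmed by the PR.

```rust
/// Hypothetical filter set: one list of accepted values per logical group.
#[derive(Default)]
struct Filters {
    tags: Vec<&'static str>,
    categories: Vec<&'static str>,
    access_levels: Vec<&'static str>,
}

/// Hypothetical record being filtered (not the real domain type).
struct Record {
    tags: Vec<&'static str>,
    categories: Vec<&'static str>,
    access_level: &'static str,
}

/// Values inside one group are combined with OR: the group matches if the
/// record carries at least one wanted value.
/// Assumption: an empty group places no constraint on the record.
fn group_matches(wanted: &[&str], actual: &[&str]) -> bool {
    wanted.is_empty() || wanted.iter().any(|w| actual.contains(w))
}

/// Different groups are combined with AND: every group must match.
fn record_matches(filters: &Filters, record: &Record) -> bool {
    group_matches(&filters.tags, &record.tags)
        && group_matches(&filters.categories, &record.categories)
        && group_matches(&filters.access_levels, &[record.access_level])
}
```

Under these assumptions, a record tagged only `tag1` matches the tag filter `[tag2, tag1]`, but it stops matching as soon as a category filter is added that the record does not satisfy.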
…Entry]` not `[DatasetID]` (#1541) * FindCollectionEntriesUseCase::execute_find_by_ref(): ref as a ref not slice * MoleculeFindDataRoomEntryUseCaseImpl: update error handling * MoleculeFindDataRoomEntryUseCase::execute_find_by_refs(): impl * MoleculeDataRoomCollectionServiceImpl::find_data_room_collection_entries_by_refs(): implement * MoleculeCreateAnnouncementUseCaseImpl::validate_attachments(): corrections * MoleculeAnnouncementPayloadRecord: fix typos * MoleculeAnnouncementEntry::attachments(): return versioned files not refs * MoleculeAnnouncementEntry::attachments(): return dataroom entries not refs * MoleculeDataRoomCollectionServiceImpl::find_data_room_collection_entries_by_refs(): restore refs order * test_molecule_v2_activity(): update * test_molecule_v2_announcements_operations(): update * test_molecule_v2_search(): update * Self-review * test_molecule_v2_activity_change_by_for_remove(): fix
* Add molecule per project access level filter * Add access level rule deduplication logic * Make ipnftUid required field for MoleculeAccessLevelRule * Reduce clones
* GQL: guards module breakdown * Molecule::v1 -- feature gate * FeatureEnabledGuard: tests * Self-review
No description provided.