-
-
Notifications
You must be signed in to change notification settings - Fork 268
Feat/add config registry controller #7668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
No dependency changes detected. Learn more about Socket for GitHub. 👍 No dependency changes detected in pull request |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
@metamaskbot publish-previews |
|
Preview builds have been published. See these instructions for more information about preview builds. Expand for full list of packages and versions. |
| const stateMetadata = { | ||
| configs: { | ||
| persist: true, | ||
| anonymous: false, | ||
| includeInStateLogs: false, | ||
| includeInDebugSnapshot: true, | ||
| usedInUi: true, | ||
| }, | ||
| version: { | ||
| persist: true, | ||
| anonymous: false, | ||
| includeInStateLogs: true, | ||
| includeInDebugSnapshot: true, | ||
| usedInUi: false, | ||
| }, | ||
| lastFetched: { | ||
| persist: true, | ||
| anonymous: false, | ||
| includeInStateLogs: true, | ||
| includeInDebugSnapshot: true, | ||
| usedInUi: false, | ||
| }, | ||
| fetchError: { | ||
| persist: true, | ||
| anonymous: false, | ||
| includeInStateLogs: true, | ||
| includeInDebugSnapshot: true, | ||
| usedInUi: false, | ||
| }, | ||
| etag: { | ||
| persist: true, | ||
| anonymous: false, | ||
| includeInStateLogs: false, | ||
| includeInDebugSnapshot: false, | ||
| usedInUi: false, | ||
| }, | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use satisfies StateMetadata<ConfigRegistryState> to ensure that the stateMetadata object conforms to the expected structure for state metadata. As an example, anonymous is not accepted as a valid key anymore:
| const stateMetadata = { | |
| configs: { | |
| persist: true, | |
| anonymous: false, | |
| includeInStateLogs: false, | |
| includeInDebugSnapshot: true, | |
| usedInUi: true, | |
| }, | |
| version: { | |
| persist: true, | |
| anonymous: false, | |
| includeInStateLogs: true, | |
| includeInDebugSnapshot: true, | |
| usedInUi: false, | |
| }, | |
| lastFetched: { | |
| persist: true, | |
| anonymous: false, | |
| includeInStateLogs: true, | |
| includeInDebugSnapshot: true, | |
| usedInUi: false, | |
| }, | |
| fetchError: { | |
| persist: true, | |
| anonymous: false, | |
| includeInStateLogs: true, | |
| includeInDebugSnapshot: true, | |
| usedInUi: false, | |
| }, | |
| etag: { | |
| persist: true, | |
| anonymous: false, | |
| includeInStateLogs: false, | |
| includeInDebugSnapshot: false, | |
| usedInUi: false, | |
| }, | |
| }; | |
| const stateMetadata = { | |
| configs: { | |
| persist: true, | |
| includeInStateLogs: false, | |
| includeInDebugSnapshot: true, | |
| usedInUi: true, | |
| }, | |
| version: { | |
| persist: true, | |
| includeInStateLogs: true, | |
| includeInDebugSnapshot: true, | |
| usedInUi: false, | |
| }, | |
| lastFetched: { | |
| persist: true, | |
| includeInStateLogs: true, | |
| includeInDebugSnapshot: true, | |
| usedInUi: false, | |
| }, | |
| fetchError: { | |
| persist: true, | |
| includeInStateLogs: true, | |
| includeInDebugSnapshot: true, | |
| usedInUi: false, | |
| }, | |
| etag: { | |
| persist: true, | |
| includeInStateLogs: false, | |
| includeInDebugSnapshot: false, | |
| usedInUi: false, | |
| }, | |
| } satisfies StateMetadata<ConfigRegistryState>; |
|
|
||
| const DEFAULT_FALLBACK_CONFIG: Record<string, RegistryConfigEntry> = {}; | ||
|
|
||
| type ConfigRegistryPollingInput = Record<string, never>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think consumers should be able to arbitrarily set different polling cycles. We can probably remove this type and always (and only) accept null as polling input
| startPolling(input: ConfigRegistryPollingInput = {}): string { | ||
| return super.startPolling(input); | ||
| } | ||
|
|
||
| stopPolling(): void { | ||
| super.stopAllPolling(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would make sense to start and stop the polling autonomously by listening to KeyringController:unlock and KeyringController:lock events?
This way we would simplify the controller API, and make the client implementation easier
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| (state.configs as any) = { networks: {} }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast to any seems unnecessary:
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | |
| (state.configs as any) = { networks: {} }; | |
| state.configs = { networks: {} }; |
| }); | ||
| } | ||
|
|
||
| removeConfig(key: string): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious: why should the client arbitrarily choose to remove config keys from the registry?
| clearConfigs(): void { | ||
| this.update((state) => { | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| (state.configs as any) = { networks: {} }; | ||
| }); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to removeConfig, I'm curious to know in what scenarios should the consumer clear all configurations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. The controller should be read-only for consumers, registry configs should be managed internally via polling. I'll remove these methods
|
|
||
| export type ConfigRegistryState = { | ||
| configs: { | ||
| networks?: Record<string, RegistryConfigEntry>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we already know that this configurations is for networks only, do you think it would make sense to make types more strict here, instead of generic RegistryConfigEntry?
Using specific types for network configurations would greately enhance type safety downstream, and help catch potential issues by types alone
|
|
||
| const controllerName = 'ConfigRegistryController'; | ||
|
|
||
| export const DEFAULT_POLLING_INTERVAL = 24 * 60 * 60 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a cool function that we can use from @metamask/utils:
import { Duration, inMilliseconds } from '@metamask/utils';
| export const DEFAULT_POLLING_INTERVAL = 24 * 60 * 60 * 1000; | |
| export const DEFAULT_POLLING_INTERVAL = inMilliseconds(1, Duration.Day); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back at this, I think that having a default polling that is so long is equivalent to not having polling at all, because most sessions won't last that long so we would execute the polling cycle at init time only. Perhaps we should reduce this to something like 5 or 10 minutes?
| state.etag = result.etag ?? null; | ||
| }); | ||
| } catch (error) { | ||
| this.#handleFetchError(error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can intercept errors if we need to perform actions over them, but we should never suppress them. Errors should always be bubbled up to the caller unless there's a very good reason not to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, going back at this, there is a good reason to suppress the error in this case. Though, perhaps we can record the event with messenger.captureException?(Error)
| state = {}, | ||
| pollingInterval = DEFAULT_POLLING_INTERVAL, | ||
| fallbackConfig = DEFAULT_FALLBACK_CONFIG, | ||
| apiService = new ConfigRegistryApiService(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, alternatively, initialize this service in the client, allowing the client to inject custom configurations (e.g. during testing environments).
The controller would use the service via messenger
| export type AbstractConfigRegistryApiService = Partial< | ||
| Pick<ServicePolicy, 'onBreak' | 'onDegraded'> | ||
| > & { | ||
| fetchConfig(options?: FetchConfigOptions): Promise<FetchConfigResult>; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of having an abstract class? Do we plan to add multiple services?
| * @param comparisonOptions - Options for comparing with existing networks. | ||
| * @returns Result containing networks to add and existing chain IDs. | ||
| */ | ||
| export function processNetworkConfigs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a lot of logic in this file that supports this function, but I don't see this used anywhere. What do we need this for?
| const response = await this.#policy.execute(async () => { | ||
| const res = await fetchWithTimeout(); | ||
|
|
||
| if (res.status === 304) { | ||
| return { | ||
| status: 304, | ||
| headers: res.headers, | ||
| } as unknown as Response; | ||
| } | ||
|
|
||
| if (!res.ok) { | ||
| throw new Error( | ||
| `Failed to fetch config: ${res.status} ${res.statusText}`, | ||
| ); | ||
| } | ||
|
|
||
| return res; | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on simply returning the fetchWithTimeout promise directly instead of returning another object in case 304 is received? I feel like it adds unnecessary complexity here, but maybe I'm missing something
| this.#apiBaseUrl = apiBaseUrl; | ||
| this.#endpointPath = endpointPath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughs on using something similar to ProfileMetricsService for configuring the environment?
| this.#baseURL = getAuthUrl(env); |
It would make the fetchConfig method way easier as we wouldn't have to validate an arbitrary string passed to the constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, the service is currently constructed by the controller only, so these constructor options would never really differ from the default ones, making the added complexity probably not worth it
| const fetchWithTimeout = async (): Promise<Response> => { | ||
| const controller = new AbortController(); | ||
| const timeoutId = setTimeout(() => controller.abort(), this.#timeout); | ||
|
|
||
| try { | ||
| const response = await this.#fetch(url.toString(), { | ||
| headers, | ||
| signal: controller.signal, | ||
| }); | ||
| clearTimeout(timeoutId); | ||
| return response; | ||
| } catch (error) { | ||
| clearTimeout(timeoutId); | ||
| if (error instanceof Error && error.name === 'AbortError') { | ||
| throw new Error(`Request timeout after ${this.#timeout}ms`); | ||
| } | ||
| throw error; | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that other services in the core packages don't handle directly timeouts, leaving it to the service policy execution. Was there a specific reason to handle timeouts directly in this service?
| return { | ||
| data, | ||
| etag, | ||
| notModified: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we can avoid the double negation on this parameter and call it modified (inverting the semantic)? It would probably be easier to read that way.
| } | ||
|
|
||
| const etag = response.headers.get('ETag') ?? undefined; | ||
| const data = (await response.json()) as RegistryConfigApiResponse; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using a wide type on the controller where we should know the type of data we have, while we cast a specific type in the service where we suppose to receive data in that shape. We should probably have a validation here, and we can use @metamask/superstruct for that - so we can avoid the type cast and give a guarantee that the data has the expected shape
| /** | ||
| * Checks if the config registry API feature flag is enabled. | ||
| * | ||
| * @param messenger - The controller messenger. | ||
| * @returns True if the feature flag is enabled, false otherwise. | ||
| */ | ||
| export function isConfigRegistryApiEnabled( | ||
| messenger: ConfigRegistryMessenger, | ||
| ): boolean { | ||
| try { | ||
| const state = messenger.call('RemoteFeatureFlagController:getState'); | ||
| const featureFlags = state.remoteFeatureFlags; | ||
|
|
||
| const flagValue = featureFlags[FEATURE_FLAG_KEY]; | ||
|
|
||
| if (typeof flagValue === 'boolean') { | ||
| return flagValue; | ||
| } | ||
|
|
||
| return DEFAULT_FEATURE_FLAG_VALUE; | ||
| } catch { | ||
| return DEFAULT_FEATURE_FLAG_VALUE; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clients may want to use different feature flags in different environments. Perhaps it would be easier to simply have a function constructor parameter passed to the controller, maybe even with this same name, and have the controller call that function to establish if it should continue the polling loop?
| fetchError: { | ||
| persist: true, | ||
| anonymous: false, | ||
| includeInStateLogs: true, | ||
| includeInDebugSnapshot: true, | ||
| usedInUi: false, | ||
| }, | ||
| etag: { | ||
| persist: true, | ||
| anonymous: false, | ||
| includeInStateLogs: false, | ||
| includeInDebugSnapshot: false, | ||
| usedInUi: false, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify what these two properties are used for? In what scenarios would they be publicly inspected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fetchError: Stores API fetch error messages. Updated to includeInStateLogs: false and includeInDebugSnapshot: false to avoid exposing error details.
etag: Stores the HTTP ETag header for caching. Already configured to not be exposed.
They remain persisted for internal use but are excluded from public logs and error reports
| if (hasNoConfigs) { | ||
| this.useFallbackConfig(errorMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of handling this case this way, have you considered assigning the fallback config to the state at controller construction time? This way, you can ensure that the state is always initialized with a valid config, and you won't need to handle the undefined case later in the code.
|
It seems to me that we are relying on the polling cycle only, without considering how much time has passed since the last fetch. Though this means that we'll fetch configs everytime the user opens the app instead of every 24 hours |
| let clock: sinon.SinonFakeTimers; | ||
| let messenger: ConfigRegistryMessenger; | ||
| let rootMessenger: RootMessenger; | ||
| let apiService: AbstractConfigRegistryApiService; | ||
| let mockRemoteFeatureFlagGetState: jest.Mock; | ||
|
|
||
| beforeEach(() => { | ||
| clock = useFakeTimers(); | ||
| const messengers = getConfigRegistryControllerMessenger(); | ||
| messenger = messengers.messenger; | ||
| rootMessenger = messengers.rootMessenger; | ||
| apiService = buildMockApiService(); | ||
|
|
||
| mockRemoteFeatureFlagGetState = jest.fn().mockReturnValue({ | ||
| remoteFeatureFlags: { | ||
| configRegistryApiEnabled: true, | ||
| }, | ||
| cacheTimestamp: Date.now(), | ||
| }); | ||
|
|
||
| rootMessenger.registerActionHandler( | ||
| 'RemoteFeatureFlagController:getState', | ||
| mockRemoteFeatureFlagGetState, | ||
| ); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually try to avoid using shared variables and encoding test initialization logic in beforeEach blocks, as it can make tests harder to read and maintain. Usually this can be easily done by using wrapper functions like withController:
| async function withController<ReturnValue>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 8 potential issues.
| state.etag = result.etag ?? null; | ||
| } | ||
| }); | ||
| return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastFetched not updated on 304 Not Modified
High Severity
When the API returns a 304 Not Modified response, the controller clears fetchError and updates etag but doesn't update lastFetched. This causes the polling interval check to use a stale timestamp, resulting in premature re-fetching even though a successful fetch just occurred. Every 304 response is treated as if no time has passed, defeating the purpose of the polling interval mechanism.
| } | ||
|
|
||
| this.#handleFetchError(error); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastFetched not updated on fetch errors
High Severity
When a fetch fails, lastFetched is never updated. The polling interval check only prevents re-fetching if lastFetched is set and recent. If the last successful fetch was 23 hours ago and a fetch fails now, when the app restarts minutes later, it will retry immediately because lastFetched still reflects the 23-hour-old timestamp. This causes retry spam when the API is down, especially across app restarts where polling reinitializes.
Additional Locations (1)
| key: string; | ||
| value: Json; | ||
| metadata?: Json; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused RegistryConfigEntry type exported
Low Severity
The RegistryConfigEntry type is defined and exported but never used anywhere in the codebase. A separate NetworkConfigEntry type serves the actual needs. This unused type adds clutter to the API surface and may confuse consumers about which type to use.
| // Subscribe to lock event - stop polling | ||
| this.messenger.subscribe('KeyringController:lock', () => { | ||
| this.stopPolling(); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Event listeners never cleaned up causing memory leak
Medium Severity
The messenger event subscriptions for KeyringController:unlock and KeyringController:lock are created in the constructor but never cleaned up. If the controller is destroyed and recreated during the application lifecycle, the old event listeners persist, causing a memory leak. Each controller instance adds new listeners without removing old ones, and the old controller instances are kept alive by these listener references.
|
|
||
| stopPolling(): void { | ||
| super.stopAllPolling(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent API breaks independent polling control
Medium Severity
The startPolling method returns a unique token suggesting per-consumer polling control, but stopPolling calls stopAllPolling which terminates polling for all consumers. If consumer A and consumer B both call startPolling, they each receive different tokens but share the polling mechanism. When consumer A calls stopPolling, consumer B's polling unexpectedly stops too. The returned tokens cannot be used to stop individual polling sessions, violating the API contract.
| key: network.chainId, | ||
| value: network, | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate chainIds cause silent network data loss
Medium Severity
When the API returns multiple networks with the same chainId, the controller silently overwrites earlier networks with later ones in the loop. This causes complete data loss for the overwritten networks without any error or warning. If the API has a bug or data corruption, users lose access to networks and developers have no indication this occurred, making debugging extremely difficult.
| networksToAdd: transformedNetworks, | ||
| existingChainIds: [], | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused exported transformer functions add unnecessary complexity
Medium Severity
Functions transformNetworkConfig, compareWithExistingNetworks, and processNetworkConfigs are exported from the package but never used in the controller implementation. They only appear in tests. These functions add 239 lines of unused production code, making the API surface unnecessarily large and confusing for consumers. The controller only uses filterNetworks from this module.
| Pick<ServicePolicy, 'onBreak' | 'onDegraded'> | ||
| > & { | ||
| fetchConfig(options?: FetchConfigOptions): Promise<FetchConfigResult>; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misleading name AbstractConfigRegistryApiService is not abstract
Low Severity
AbstractConfigRegistryApiService is named like an abstract class but is actually a TypeScript type alias. The naming convention suggests inheritance and abstraction that doesn't exist, causing confusion. A PR comment explicitly questioned "What's the purpose of having an abstract class? Do we plan to add multiple services?" highlighting this confusion.
Explanation
Core Package (@metamask/config-registry-controller)
What was delivered:
Business value:
References
Checklist
Note
Adds a new package to fetch and manage network configurations from a remote API with robust polling and fallback.
@metamask/config-registry-controllerwithConfigRegistryControllerandConfigRegistryApiService(ETag, retries, timeout, circuit breaker)StaticIntervalPollingController(24h), starts/stops on Keyring unlock/lockversion,etag,lastFetched, and configsRemoteFeatureFlagController; uses fallback configs when disabled/unavailableWritten by Cursor Bugbot for commit d628f08. This will update automatically on new commits. Configure here.