@@ -220,32 +220,134 @@ Since Helix is aware of the global state of the system, it can send the message
This is a very generic API and can also be used to schedule periodic tasks in the cluster, such as data backups.
System admins can also perform ad hoc tasks, such as an on-demand backup, or execute a system command (like rm -rf ;-)) across all nodes.

#### Understanding Criteria and DataSource

The `Criteria` object lets you specify message recipients using various attributes. A critical setting is the `DataSource`, which determines where Helix looks up the cluster state to resolve your criteria.

**Available DataSource Options:**

Helix provides four DataSource types, each reading from different znodes in ZooKeeper:

| DataSource | Description | When to Use |
|------------|-------------|-------------|
| **LIVEINSTANCES** | Reads from `/LIVEINSTANCES` znodes | Targeting live instances without needing resource/partition/state filtering |
| **INSTANCES** | Reads from `/INSTANCES/[instance]` znodes | Targeting specific configured instances (live or not) based on instance configuration |
| **EXTERNALVIEW** | Reads from `/EXTERNALVIEWS/[resource]` znodes | Targeting based on actual replica placement, partition ownership, or replica state (MASTER/SLAVE) |
| **IDEALSTATES** | Reads from `/IDEALSTATES/[resource]` znodes | Targeting based on ideal state configuration (intended placement) |

**Key Differences:**

- **LIVEINSTANCES**: Contains only the instance names of currently connected participants. No resource or partition information. Smallest dataset.
- **EXTERNALVIEW**: Contains the actual current state: which instances own which partitions, and their states (MASTER/SLAVE/OFFLINE). Large dataset at scale.
- **IDEALSTATES**: Contains the desired state: which instances should own which partitions. Similar in size to ExternalView.

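The table and key differences above can be summarized in a few lines of code. The sketch below is a self-contained model, not the Helix API: the enum `DataSourceSketch`, its fields, and the `choose` helper are hypothetical names introduced for illustration, and the znode paths are copied from the table.

```java
// Hypothetical model of the four DataSource options; this is NOT the Helix enum.
enum DataSourceSketch {
  LIVEINSTANCES("/LIVEINSTANCES", false),
  INSTANCES("/INSTANCES/[instance]", false),
  EXTERNALVIEW("/EXTERNALVIEWS/[resource]", true),
  IDEALSTATES("/IDEALSTATES/[resource]", true);

  // Znodes consulted when resolving a criteria (as listed in the table).
  final String znodes;
  // Whether resource/partition/state filters are meaningful for this source.
  final boolean honorsResourceFilters;

  DataSourceSketch(String znodes, boolean honorsResourceFilters) {
    this.znodes = znodes;
    this.honorsResourceFilters = honorsResourceFilters;
  }

  // Pick a source from what the sender needs, following the "When to Use" column.
  static DataSourceSketch choose(boolean needsResourceFilters,
                                 boolean intendedPlacement,
                                 boolean liveOnly) {
    if (needsResourceFilters) {
      return intendedPlacement ? IDEALSTATES : EXTERNALVIEW;
    }
    return liveOnly ? LIVEINSTANCES : INSTANCES;
  }

  public static void main(String[] args) {
    System.out.println(choose(false, false, true)); // LIVEINSTANCES
    System.out.println(choose(true, false, true));  // EXTERNALVIEW
  }
}
```

The point of the model is the ordering of the questions: decide first whether you need resource, partition, or state filtering at all, and only reach for `EXTERNALVIEW` or `IDEALSTATES` when you do.
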
**Choosing the Right DataSource:**

| Your Goal | Correct DataSource | Example Use Case |

| Method | Values | Description | Wildcard Support | Example |
|--------|--------|-------------|------------------|---------|
| `setDataSource(DataSource)` | LIVEINSTANCES, INSTANCES, EXTERNALVIEW, IDEALSTATES | **MOST IMPORTANT:** Determines which znodes to read | N/A | `DataSource.LIVEINSTANCES` |
| `setInstanceName(String)` | Instance name | Target specific instance(s) by name | Yes (`%` = all) | `"localhost_12918"` or `"%"` |
| `setResource(String)` | Resource name | Filter by resource name (only meaningful for EXTERNALVIEW/IDEALSTATES) | Yes (`%` = all) | `"MyDatabase"` or `"%"` |
| `setPartition(String)` | Partition name | Filter by specific partition (only meaningful for EXTERNALVIEW/IDEALSTATES) | Yes (`%` = all) | `"MyDatabase_0"` or `"%"` |
| `setPartitionState(String)` | State name | Filter by replica state such as MASTER, SLAVE, ONLINE, OFFLINE (only for EXTERNALVIEW/IDEALSTATES) | Yes (`%` = all) | `"MASTER"` or `"%"` |
| `setRecipientInstanceType(InstanceType)` | PARTICIPANT, CONTROLLER, SPECTATOR | Type of Helix process to target | No | `InstanceType.PARTICIPANT` |
| `setSessionSpecific(boolean)` | true/false | If true, the message is only delivered to currently active sessions (not redelivered after restart) | No | `true` (recommended) |

**Important Notes:**

- **Wildcards:** Use `%` (SQL-style) or `*` to match all. A single underscore `_` matches any single character.
- **DataSource Compatibility:** Setting `resource`, `partition`, or `partitionState` only makes sense with the `EXTERNALVIEW` or `IDEALSTATES` DataSource. They are ignored for `LIVEINSTANCES` and `INSTANCES`.
- **Session-Specific:** Set to `true` for most use cases to avoid redelivering messages after a participant restarts.
- **Empty vs Wildcard:** The empty string `""` and the wildcard `"%"` are treated the same: both match all.

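The wildcard rules in the notes above can be modeled as a translation from a SQL-style pattern to a regular expression. The following is a minimal, self-contained sketch and not Helix's actual implementation: the class `WildcardSketch` and its methods are hypothetical, and details such as case sensitivity may differ in Helix.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of SQL-style criteria wildcards; not Helix code.
class WildcardSketch {

  // Translate a criteria pattern into a regular expression.
  // Null, empty, and "*" are all treated like "%" (match everything).
  static String toRegex(String pattern) {
    if (pattern == null || pattern.isEmpty() || pattern.equals("*")) {
      pattern = "%";
    }
    StringBuilder regex = new StringBuilder();
    for (char ch : pattern.toCharArray()) {
      if (ch == '%') {
        regex.append(".*"); // % matches any run of characters
      } else if (ch == '_') {
        regex.append('.');  // _ matches exactly one character
      } else {
        regex.append(Pattern.quote(String.valueOf(ch))); // literal character
      }
    }
    return regex.toString();
  }

  // True if the whole value matches the criteria pattern.
  static boolean matches(String pattern, String value) {
    return Pattern.matches(toRegex(pattern), value);
  }

  public static void main(String[] args) {
    System.out.println(matches("%", "localhost_12918"));               // true
    System.out.println(matches("localhost_1291_", "localhost_12918")); // true
    System.out.println(matches("MyDatabase_0", "MyDatabase_1"));       // false
  }
}
```

One consequence worth noting: because `_` is itself a single-character wildcard, a pattern such as `"localhost_12918"` would, under this model, also match a name like `"localhostX12918"`.
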
See HelixManager.getMessagingService for more info.
File: website/0.9.9/src/site/markdown/tutorial_messaging.md (20 additions, 0 deletions)
@@ -25,6 +25,26 @@ under the License.

In this chapter, we'll learn about messaging, a convenient feature in Helix for sending messages between nodes of a cluster. This is an interesting feature that is quite useful in practice. It is common that nodes in a distributed system require a mechanism to interact with each other.

### Performance Considerations

**IMPORTANT:** When using the messaging API with `Criteria`, be aware of the following performance characteristics:

- **ExternalView Scanning:** By default, the messaging service uses `DataSource.EXTERNALVIEW` to resolve criteria. This can scan **all** ExternalView znodes in the cluster, even when targeting specific instances. At high resource cardinality, this can cause severe performance degradation.

**Recommended Patterns:**

- **Use `DataSource.LIVEINSTANCES`** when you only need to target live instances and do not require resource/partition-level filtering. This is much faster and more efficient.
- **Specify exact resource names** instead of wildcards if you must use ExternalView scanning.

```java
Criteria recipientCriteria = new Criteria();
recipientCriteria.setDataSource(DataSource.LIVEINSTANCES); // Efficient: avoids EV scan
recipientCriteria.setSessionSpecific(true);
```

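To get a feel for why the choice of DataSource matters at scale, consider a rough cost model that counts how many entries must be examined to resolve a criteria. This is an illustrative sketch under simplifying assumptions, not a measurement of Helix: it assumes one entry per live instance for `LIVEINSTANCES` and one entry per replica of every partition in each matched resource for `EXTERNALVIEW`. The class `CriteriaCostSketch`, its methods, and the cluster sizes are hypothetical.

```java
// Hypothetical back-of-the-envelope cost model; not Helix code.
class CriteriaCostSketch {

  // Resolving against live instances: one entry per connected instance.
  static long liveInstancesEntries(int liveInstances) {
    return liveInstances;
  }

  // Resolving against ExternalView: one entry per replica of every
  // partition in each resource that the criteria matches.
  static long externalViewEntries(int matchedResources,
                                  int partitionsPerResource,
                                  int replicasPerPartition) {
    return (long) matchedResources * partitionsPerResource * replicasPerPartition;
  }

  public static void main(String[] args) {
    // Assumed cluster shape, chosen only for illustration.
    int instances = 100, resources = 1000, partitions = 64, replicas = 3;

    System.out.println(liveInstancesEntries(instances));                      // 100
    System.out.println(externalViewEntries(resources, partitions, replicas)); // 192000
    System.out.println(externalViewEntries(1, partitions, replicas));         // 192
  }
}
```

Under these assumptions, a wildcard resource with `EXTERNALVIEW` examines orders of magnitude more entries than `LIVEINSTANCES`, and naming one exact resource removes the factor contributed by the number of resources.
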
### Example: Bootstrapping a Replica

Consider a search system where an index replica starts up without an index. A typical solution is to get the index from a common location, or to copy the index from another replica.