-
Notifications
You must be signed in to change notification settings - Fork 0
UnionFind
The Union-Find data structure, also known as the Disjoint Set Union (DSU) data structure, is a powerful tool for managing a collection of disjoint sets. It provides an efficient way to keep track of partitions of a set into non-overlapping subsets.
At its core, Union-Find solves the problem of maintaining dynamic sets and supporting two primary operations on these sets:
- Find: Determine which set a particular element belongs to.
- Union: Merge two sets into a single set.
This data structure is particularly useful for problems involving connectivity. Consider a collection of elements, where relationships (like "connected to") can be established between pairs of elements over time. We are interested in determining if any two elements are connected (possibly indirectly through a path of relationships) or in grouping elements into connected components.
The "connected to" relationship typically has the following properties:
- Reflexivity: An element is connected to itself.
- Symmetry: If A is connected to B, then B is connected to A.
- Transitivity: If A is connected to B, and B is connected to C, then A is connected to C.
These properties define an equivalence relation. An equivalence relation partitions a set into disjoint subsets called equivalence classes. In the context of connectivity, these equivalence classes are the connected components. The Union-Find data structure naturally models these disjoint connected components.
Practical Examples where Union-Find is applicable:
- Kruskal's Algorithm for Minimum Spanning Tree (MST): Used to detect cycles efficiently. When considering an edge (u, v), if u and v are already in the same connected component (same set), adding the edge would create a cycle. Otherwise, they are in different sets, and the edge can be added, merging the two components.
- Finding Connected Components in a Graph: Given a graph, Union-Find can quickly determine which nodes belong to the same connected component.
- Image Processing: Analyzing connected regions of pixels.
- Network Connectivity: Determining if two nodes in a network are connected.
- Percolation: Studying the properties of random graphs and connectivity.
The Union-Find data structure operates on a fixed number of elements, let's say N, which are often represented by integers from 0 to N-1. The core operations defining the Union-Find ADT are:
-
UnionFind(N):-
Purpose: Creates a Union-Find structure for
Nelements. -
Initial State: Initially, each element is in its own separate set. There are
Nsets.
-
Purpose: Creates a Union-Find structure for
-
find(p):-
Purpose: Returns the representative (or root) of the set containing element
p. -
Property:
find(p)andfind(q)return the same value if and only ifpandqare in the same set.
-
Purpose: Returns the representative (or root) of the set containing element
-
union(p, q):-
Purpose: Merges the sets containing elements
pandqinto a single set. -
Effect: After
union(p, q),pandqwill be in the same set (i.e.,find(p) == find(q)). If they were already in the same set, this operation does nothing.
-
Purpose: Merges the sets containing elements
-
connected(p, q):-
Purpose: Checks if elements
pandqare in the same set. -
Implementation: Typically implemented as
return find(p) == find(q).
-
Purpose: Checks if elements
-
count():- Purpose: Returns the total number of disjoint sets currently managed by the data structure.
-
Initial State: Starts at
N. Decreases by 1 for each successfulunionoperation that merges two distinct sets.
The efficiency of Union-Find operations (find and union) depends heavily on the underlying data structure used to represent the sets. We will explore several common implementations, evolving from simple but inefficient approaches to highly optimized ones.
This implementation prioritizes fast find and connected operations.
-
Data Structure: An integer array
idof sizeN.id[i]stores the component identifier (an integer) for elementi.-
Representation: All elements belonging to the same set have the same value in the
idarray. -
Initial State:
id[i] = ifor allifrom 0 to N-1. Initially, each element is in its own set, and its identifier is itself.
-
Representation: All elements belonging to the same set have the same value in the
-
Operations:
-
find(p): Returnsid[p]. This is very fast.def find(self, p): return self.id[p]
-
connected(p, q): Checks ifid[p] == id[q]. Also very fast.def connected(self, p, q): return self.find(p) == self.find(q)
-
union(p, q): This is the costly operation. To merge the set containingpand the set containingq:- Find the component identifiers for
pandq(pid = self.find(p),qid = self.find(q)). - If
pid == qid, they are already in the same set; do nothing. - If
pid != qid, iterate through the entireidarray. For every elementi, ifid[i]is equal topid, change it toqid. This effectively re-labels all elements inp's set to haveq's set identifier.
def union(self, p, q): pid = self.find(p) qid = self.find(q) if pid == qid: return # Already in the same set # Iterate through all elements and change component IDs for i in range(len(self.id)): if self.id[i] == pid: self.id[i] = qid self._count -= 1 # Decrement the number of sets
- Find the component identifiers for
-
-
Complexity Analysis:
-
find,connected: O(1). Accessing an array element is constant time. -
union: O(N). In the worst case, we might iterate through the entire array of size N.
-
-
Drawback: While
findis fast, theunionoperation is very slow. PerformingNunion operations could take O(N^2) time in the worst case (e.g., merging sets one by one, each requiring a full array scan). This is inefficient for large N.
This implementation uses a tree structure to represent each set. The root of the tree serves as the representative for the set.
-
Data Structure: An integer array
idof sizeN.id[i]stores the parent of elementi.-
Representation: A root element
ris its own parent, meaningid[r] == r. Following parent pointers from any elementieventually leads to the root of its tree (and thus, its set representative). -
Initial State:
id[i] = ifor allifrom 0 to N-1. Each element is initially a root of its own single-node tree.
-
Representation: A root element
-
Operations:
-
find(p): To find the representative ofp, traverse up the tree frompby following parent pointers (id[p],id[id[p]], ...) until a root is reached (an elementrwhereid[r] == r).def find(self, p): # Traverse up until root is found while p != self.id[p]: p = self.id[p] return p
-
connected(p, q): Check iffind(p) == find(q).def connected(self, p, q): return self.find(p) == self.find(q)
-
union(p, q): To merge the sets containingpandq:- Find the roots of
pandq(rootP = self.find(p),rootQ = self.find(q)). - If
rootP == rootQ, they are already in the same set; do nothing. - If
rootP != rootQ, merge the two trees by setting the parent of one root to the other root. For example, setid[rootP] = rootQ. This connects the tree rooted atrootPto the tree rooted atrootQ.
def union(self, p, q): rootP = self.find(p) rootQ = self.find(q) if rootP == rootQ: return # Already in the same set # Attach rootP's tree to rootQ's tree self.id[rootP] = rootQ self._count -= 1 # Decrement the number of sets
- Find the roots of
-
-
Complexity Analysis:
-
find,connected,union: O(depth of tree). The time taken is proportional to the height of the tree containing the elements. In the worst case (e.g., union operations always creating a long, unbalanced tree resembling a linked list), the depth can be N. So, worst-case complexity is O(N).
-
-
Drawback: While
unionis faster on average than Quick-Find's O(N)union(it only involves finding roots and changing one parent pointer), the worst-case time for all operations is still O(N) if the trees become tall and degenerate.
To improve Quick-Union's worst-case performance, we can avoid creating tall trees by always attaching the smaller tree to the larger tree during the union operation. This strategy is called "weighting" or "balancing". We can balance by the number of nodes (size) or by the height/rank of the trees. Balancing by size is slightly easier to implement.
-
Data Structure:
-
idarray: Same as Quick-Union (id[i]is parent ofi). -
szarray: An integer array of sizeN, wheresz[i]stores the number of nodes in the tree rooted ati. This value is only relevant ifiis a root (id[i] == i). -
Initial State:
id[i] = iandsz[i] = 1for allifrom 0 to N-1. Each element is initially a root of a tree of size 1.
-
-
Operations:
-
find(p): Same as Quick-Union. O(depth).def find(self, p): while p != self.id[p]: p = self.id[p] return p
-
connected(p, q): Same as Quick-Union. O(depth).def connected(self, p, q): return self.find(p) == self.find(q)
-
union(p, q):- Find the roots of
pandq(rootP = self.find(p),rootQ = self.find(q)). - If
rootP == rootQ, return. - If
rootP != rootQ, compare the sizes of the trees rooted atrootPandrootQ(self.sz[rootP]vsself.sz[rootQ]). - Attach the root of the smaller tree to the root of the larger tree.
- Update the size of the new root (add the size of the smaller tree to the size of the larger tree).
def union(self, p, q): rootP = self.find(p) rootQ = self.find(q) if rootP == rootQ: return # Already in the same set # Attach smaller tree to larger tree's root if self.sz[rootP] < self.sz[rootQ]: self.id[rootP] = rootQ self.sz[rootQ] += self.sz[rootP] else: self.id[rootQ] = rootP self.sz[rootP] += self.sz[rootQ] self._count -= 1 # Decrement the number of sets
- Find the roots of
-
-
Complexity Analysis:
-
find,connected,union: O(log N). By attaching the smaller tree to the larger, the maximum depth of any node is guaranteed to be logarithmic. When we merge a tree of sizes_1with a tree of sizes_2, the size of the new tree iss_1 + s_2. A node's depth increases by 1 only when its tree is attached to a larger tree. Each time a node's depth increases, the size of the tree it belongs to at least doubles. A tree rooted at a node containing elementicannot double in size more than log₂N times before reaching size N. Therefore, the maximum depth is O(log N).
-
- Benefit: This is a significant improvement over basic Quick-Union, providing logarithmic time complexity for all operations.
Path compression is an optimization that can be applied to the find operation in tree-based implementations (Quick-Union or Weighted Quick-Union). Its goal is to flatten the tree structure by making nodes directly point to the root.
-
Technique: During a
find(p)operation, as we traverse up the tree frompto find its root, we can update the parent pointer of each node visited along that path to point directly to the root. -
Illustration:
- Imagine finding the root of element
p. You traversep -> parent(p) -> parent(parent(p)) -> ... -> root. - Once the root is found, go back down the path you just traversed. For each node
xon the path (excluding the root), setid[x] = root.
- Imagine finding the root of element
-
Implementation (Recursive approach is often cleanest):
An iterative two-pass approach is also possible: first pass to find the root, second pass to update pointers. A slightly simpler one-pass iterative approach is also common, setting
def find(self, p): # Base case: p is the root if p == self.id[p]: return p # Recursive step: Find the root of the parent root = self.find(self.id[p]) # Path compression: Set parent of p directly to the root self.id[p] = root return root
id[i] = id[id[i]](halving the path) during traversal, though this isn't full compression. The recursive version shown achieves full path compression. -
Effect: Path compression doesn't change the worst-case time of a single
findoperation (it could still traverse a long path), but it significantly improves the performance of futurefindoperations involving any node on the compressed path or its descendants. It effectively flattens the tree over time.
Combining Weighted Quick-Union (using size or rank) with Path Compression yields the most efficient and standard implementation of the Union-Find data structure.
-
Algorithms:
-
idarray: Stores parent pointers. -
szarray: Stores sizes of trees rooted ati(ifiis a root). -
find(p): Use the path compression algorithm (recursive or iterative). -
union(p, q): Use the weighted union algorithm (attach smaller tree's root to larger tree's root, update size), but call the path-compressedfindwithin it.
-
-
Example
find(recursive path compression):def find(self, p): if p == self.id[p]: return p self.id[p] = self.find(self.id[p]) # Path compression return self.id[p]
-
Example
union(calls path-compressed find):def union(self, p, q): rootP = self.find(p) # Uses path-compressed find rootQ = self.find(q) # Uses path-compressed find if rootP == rootQ: return # Already in the same set # Weighted union if self.sz[rootP] < self.sz[rootQ]: self.id[rootP] = rootQ self.sz[rootQ] += self.sz[rootP] else: self.id[rootQ] = rootP self.sz[rootP] += self.sz[rootQ] self._count -= 1
-
Amortized Complexity Analysis:
- The combination of weighted union and path compression is remarkably efficient. The amortized time complexity for
find,connected, andunionoperations on a structure with N elements is O(α(N)), where α is the inverse Ackermann function. - The inverse Ackermann function grows extremely slowly. For any practically obtainable number of elements N, α(N) is less than 5. This means that, on average over a sequence of operations, each operation takes nearly constant time.
- Amortized Analysis: It's important to understand that this is an amortized bound. A single operation might still take slightly longer (e.g., traversing a path before compression), but over a sequence of many operations, the total time divided by the number of operations approaches this nearly constant value. Path compression makes subsequent operations faster, compensating for the cost of earlier ones.
- The combination of weighted union and path compression is remarkably efficient. The amortized time complexity for
- Conclusion: This is the standard and most efficient implementation of the Union-Find data structure, widely used in practice.
The Union-Find data structure is a classic example of how combining simple ideas (array representation, tree representation) with clever optimizations (weighting, path compression) can lead to extremely efficient algorithms.
We started with Quick-Find (O(1) find, O(N) union), moved to Quick-Union (O(N) worst-case for all), improved to Weighted Quick-Union (O(log N) for all), and finally, by adding Path Compression, achieved the standard implementation with O(α(N)) amortized time per operation, effectively making operations constant time for practical purposes.
The Union-Find data structure with weighted union and path compression is a cornerstone in algorithms dealing with connectivity, set partitioning, and equivalence relations, proving indispensable in areas like graph algorithms (especially MST) and network problems.
While we focused on weighted union by size, weighted union by rank (tree height) provides the same O(α(N)) complexity. Sometimes, the structure might need to store additional data associated with each set (e.g., the sum of values in the set, the minimum element). This can be added to the root node and updated during the union operation.
Understanding Union-Find provides insight into efficient data structure design and the power of amortized analysis.