Skip to content

BasicDataStructures

Jorge Londoño edited this page May 27, 2025 · 1 revision

Basic Data Structures

Introduction: Organizing Data for Algorithms

In Computer Science, data structures are fundamental. They are not just ways to store data, but specific methods of organizing and managing data to enable efficient access and modification. Data structures are the bedrock upon which algorithms are built, and the choice of structure significantly impacts the performance (efficiency) of an algorithm in terms of both time and space.

Understanding basic data structures is crucial for several reasons:

  1. Efficiency: Different data structures are optimized for different types of operations. Choosing the right structure can dramatically reduce the time and memory required for an algorithm.
  2. Building Blocks: Complex data structures (like trees, graphs, hash tables) and algorithms are often built using these basic structures as components.
  3. Problem Solving: Many computational problems can be simplified by modeling the data using an appropriate data structure.

Often, we distinguish between Abstract Data Types (ADTs) and Data Structures. An ADT is a logical model of data and a specification of the operations that can be performed on that data (e.g., Stack, Queue). It defines what the operations do, but not how they are implemented. A Data Structure, on the other hand, is a concrete way of implementing an ADT, providing the actual storage representation and algorithms for the operations (e.g., using an array or a linked list to implement a Stack). In this document, we will look at both concrete structures (Linked Lists) and ADTs (Stack, Queue) and discuss their common implementations.

Let's explore some of the most common basic data structures.

Commonly Used Data Structures

Linked Lists

Unlike arrays which store elements in contiguous memory locations and access them via an index, linked lists store elements in scattered memory locations. Each element, called a node, contains the data itself and one or more pointers (or references) to other nodes in the sequence. This structure allows for efficient insertion and deletion of elements compared to arrays, especially in the middle of the list, but sacrifices the ability to access elements directly by index (O(1) access).

Singly-Linked List

A singly-linked list is a sequence of nodes where each node contains:

  1. The data (or value) of the element.
  2. A pointer (or reference) to the next node in the sequence.

The list itself is typically represented by a pointer to the first node, called the head. The last node in the list points to None (or null), signifying the end. Elements are ordered sequentially, but accessing the Nth element requires traversing from the head.

Conceptual Structure: Head -> [Data | Next] -> [Data | Next] -> ... -> [Data | None]

Key Operations and Time Complexity:

Operation Description Time Complexity (Worst Case)
is_empty() Check if the list is empty O(1)
size() Get the number of elements O(1) (if size is tracked)
add_first(item) Add an item to the beginning of the list O(1)
add_last(item) Add an item to the end of the list O(n) (requires traversal)
remove_first() Remove the first item O(1)
remove_last() Remove the last item O(n) (requires traversal to find the second-to-last node)
remove(item) Remove the first occurrence of an item O(n) (requires search)
search(item) Check if an item exists in the list O(n) (requires traversal)
access(index) Get the item at a specific index O(n) (requires traversal)
traverse() Visit each element in order O(n)

Space Complexity: O(n) to store n elements, plus O(n) for pointers.

Single-linked list vs Array:

Feature Singly-Linked List Array
Access O(n) (sequential traversal) O(1) (direct index access)
Insertion/Deletion (at arbitrary position) O(n) (requires traversal to find position) O(n) (requires shifting elements)
Insertion/Deletion (at beginning) O(1) O(n) (requires shifting all elements)
Insertion/Deletion (at end) O(n) (without tail pointer), O(1) (with tail pointer) O(1) (amortized for dynamic array), O(n) (fixed array if resize needed)
Memory Dynamically allocated, scattered Contiguous block of memory
Overhead O(n) for pointers O(1) per element (if fixed size), may have unused space in dynamic arrays
Size Dynamic, easily expandable/shrinkable Fixed (for static arrays) or dynamic (with potential reallocations)

Uses of the Single-Linked List:

  • Implementing other data structures (e.g., Stacks, Queues, Hash Tables using chaining).
  • Dynamic memory allocation (managing free lists).
  • Representing polynomial equations.
  • Building file systems.

Python Example Implementation (Basic):

from typing import Any, Optional

class Node:
    """Represents a node in a single-linked list."""
    def __init__(self, value: Any):
        self.value = value
        self.next: Optional[Node] = None # Pointer to the next node

class SingleLinkedList:
    """Basic implementation of a single-linked list."""
    def __init__(self):
        self._head: Optional[Node] = None   # Head of the list
        self._size: int = 0                 # Number of elements in the list

    def is_empty(self) -> bool:
        """Checks if the list is empty."""
        return self._head is None

    def __len__(self) -> int:
        """Returns the number of items in the list."""
        return self._size

    def add_first(self, item: Any):
        """Adds an item to the beginning of the list (O(1))."""
        new_node = Node(item)
        new_node.next = self._head  # The new node points to the current head
        self._head = new_node       # The list's head is now the new node
        self._size += 1

    # Note: add_last (append) would be O(n) without a tail pointer:
    # def add_last(self, item: Any):
    #     new_node = Node(item)
    #     if self.is_empty():
    #         self._head = new_node
    #     else:
    #         current = self._head
    #         while current.next is not None:
    #             current = current.next
    #         current.next = new_node
    #     self._size += 1

    def search(self, item: Any) -> bool:
        """Searches for an item in the list (O(n))."""
        current = self._head
        while current is not None:
            if current.value == item:
                return True
            current = current.next
        return False

    def __contains__(self, item: Any) -> bool:
        """Allows 'item in list' syntax. Uses search(item)."""
        return self.search(item)

    def remove_first(self) -> Any:
        """Removes and returns the first item (O(1)). Raises IndexError if empty."""
        if self.is_empty():
            raise IndexError("remove_first called on empty list")
        removed_value = self._head.value
        self._head = self._head.next # Move the head pointer to the next node
        self._size -= 1
        return removed_value

    # Note: remove_last would be O(n) without storing previous nodes (e.g., in DLL) or traversing
    # Note: remove(item) would also be O(n) requiring traversal to find the item and its predecessor.

    def __iter__(self):
        """Allows iteration (e.g., 'for item in list')."""
        current = self._head
        while current is not None:
            yield current.value
            current = current.next

    def __str__(self) -> str:
        """String representation for easy printing."""
        items = [str(item) for item in self]
        return f"SingleLinkedList([{' -> '.join(items)}])"

    def __repr__(self) -> str:
        """Representation for debugging."""
        return str(self)


# Example Usage:
sll = SingleLinkedList()
sll.add_first(10)       # List: 10
sll.add_first(20)       # List: 20 -> 10
sll.add_first(30)       # List: 30 -> 20 -> 10

print(sll)              # Output: SingleLinkedList([30 -> 20 -> 10])
print(f"Size: {len(sll)}") # Output: Size: 3
print(f"Contains 20? {20 in sll}") # Output: Contains 20? True
print(f"Contains 50? {50 in sll}") # Output: Contains 50? False

first = sll.remove_first()
print(f"Removed first: {first}") # Output: Removed first: 30
print(sll)              # Output: SingleLinkedList([20 -> 10])
print(f"Size: {len(sll)}") # Output: Size: 2

for item in sll:
    print(f"Iterating: {item}") # Output: Iterating: 20, Iterating: 10

# sll.remove_first() # List: 10
# sll.remove_first() # List: []
# sll.remove_first() # Raises IndexError

Comparison with Python's Built-in List:

Python's built-in list is implemented as a dynamic array. While it provides operations that mimic some linked list behaviors, the underlying mechanism is different, leading to different performance characteristics for some operations:

# Same operations using Python's built-in list
python_list = []

# add_first equivalent: insert at index 0
# Note: This is O(n) for Python's list because it requires shifting all existing elements.
python_list.insert(0, 10) # List: [10]
python_list.insert(0, 20) # List: [20, 10]
python_list.insert(0, 30) # List: [30, 20, 10]

print(python_list) # Output: [30, 20, 10]
print(f"Size: {len(python_list)}") # Output: Size: 3

# search equivalent: 'in' operator
print(f"Contains 20? {20 in python_list}") # Output: Contains 20? True
print(f"Contains 50? {50 in python_list}") # Output: Contains 50? False

# remove_first equivalent: pop(0)
# Note: This is O(n) for Python's list because it requires shifting all remaining elements.
first = python_list.pop(0)
print(f"Removed first: {first}") # Output: Removed first: 30
print(python_list) # Output: [20, 10]
print(f"Size: {len(python_list)}") # Output: Size: 2

# Iteration is O(n) for both, but arrays might benefit from cache locality
for item in python_list:
    print(f"Iterating: {item}") # Output: Iterating: 20, Iterating: 10

The Python list is generally faster for random access (list[i]) and appending (list.append()), while a linked list (especially a doubly-linked list) is faster for insertions and deletions within the list or at the beginning/end if you have direct pointers to those locations. The overhead per element is also higher in a linked list due to the storage required for pointers.

Doubly-Linked List

A doubly-linked list is a sequence of nodes where each node contains:

  1. The data (or value) of the element.
  2. A pointer (or reference) to the next node in the sequence.
  3. A pointer (or reference) to the previous node in the sequence.

The list is typically represented by a pointer to the first node (head) and a pointer to the last node (tail). Both the head's previous pointer and the tail's next pointer point to None.

Conceptual Structure: None <- [Data | Prev | Next] <-> [Data | Prev | Next] <-> ... <-> [Data | Prev | Next] -> None Head ----------------------------------------------------------------------------^ Tail -------------------------------------------------------------^

Advantage over Singly-Linked List: The main advantage is the ability to traverse the list in both forward and backward directions. This also allows for more efficient operations, particularly removing a node or inserting before a node, as you can easily access the previous node without traversing from the head (assuming you have a reference to the node itself). Removing from the tail is also O(1) because the tail's prev pointer gives direct access to the new tail.

Key Operations and Time Complexity:

Operation Description Time Complexity (Worst Case)
is_empty() Check if the list is empty O(1)
size() Get the number of elements O(1) (if size is tracked)
add_first(item) Add an item to the beginning of the list O(1)
add_last(item) Add an item to the end of the list O(1)
remove_first() Remove the first item O(1)
remove_last() Remove the last item O(1)
remove(node) Remove a specific node (given reference) O(1)
remove(item) Remove the first occurrence of an item O(n) (requires search)
search(item) Check if an item exists in the list O(n)
access(index) Get the item at a specific index O(n)
traverse_forward() Visit each element from head to tail O(n)
traverse_backward() Visit each element from tail to head O(n)

Space Complexity: O(n) to store n elements, plus O(2n) for pointers (each node has two pointers). Higher space overhead than SLL.

Uses of the Doubly-Linked List:

  • Implementing caches (e.g., LRU cache).
  • Implementing undo/redo functionality in applications.
  • Implementing a deque (double-ended queue).
  • Representing decks of cards or playlists where you might move backward and forward.

Python Basic Implementation Outline:

from typing import Any, Optional

class DoublyNode:
    """Represents a node in a doubly-linked list."""
    def __init__(self, value: Any):
        self.value = value
        self.prev: Optional[DoublyNode] = None # Pointer to the previous node
        self.next: Optional[DoublyNode] = None # Pointer to the next node

class DoublyLinkedList:
    """Basic implementation structure for a doubly-linked list."""
    def __init__(self):
        self._head: Optional[DoublyNode] = None
        self._tail: Optional[DoublyNode] = None
        self._size: int = 0

    def is_empty(self) -> bool:
        """Checks if the list is empty."""
        return self._head is None # or self._tail is None or self._size == 0

    def __len__(self) -> int:
         """Returns the number of items in the list."""
         return self._size

    def add_first(self, item: Any):
        """Adds an item to the beginning (O(1))."""
        new_node = DoublyNode(item)
        if self.is_empty():
            self._head = self._tail = new_node
        else:
            new_node.next = self._head # New node points to old head
            self._head.prev = new_node # Old head points back to new node
            self._head = new_node      # List head is now new node
        self._size += 1

    def add_last(self, item: Any):
        """Adds an item to the end (O(1))."""
        new_node = DoublyNode(item)
        if self.is_empty():
            self._head = self._tail = new_node
        else:
            new_node.prev = self._tail # New node points back to old tail
            self._tail.next = new_node # Old tail points to new node
            self._tail = new_node      # List tail is now new node
        self._size += 1

    def remove_first(self) -> Any:
        """Removes and returns the first item (O(1)). Raises IndexError if empty."""
        if self.is_empty():
            raise IndexError("remove_first called on empty list")
        removed_value = self._head.value
        if self._head == self._tail: # List has only one node
            self._head = self._tail = None
        else:
            self._head = self._head.next # Move head pointer
            self._head.prev = None       # New head's prev is None
        self._size -= 1
        return removed_value

    def remove_last(self) -> Any:
        """Removes and returns the last item (O(1)). Raises IndexError if empty."""
        if self.is_empty():
             raise IndexError("remove_last called on empty list")
        removed_value = self._tail.value
        if self._head == self._tail: # List has only one node
            self._head = self._tail = None
        else:
            self._tail = self._tail.prev # Move tail pointer
            self._tail.next = None       # New tail's next is None
        self._size -= 1
        return removed_value

    # remove(item), search(item), __contains__, __iter__ (forward and backward)
    # implementations would be similar to SLL but using prev/next pointers appropriately.
    # Removing a specific *node* given its reference is O(1) in DLL.

    def __str__(self) -> str:
        """String representation."""
        items = []
        current = self._head
        while current:
            items.append(str(current.value))
            current = current.next
        return f"DoublyLinkedList([{' <-> '.join(items)}])"

    def __repr__(self) -> str:
        """Representation."""
        return str(self)

# Example Usage (basic):
dll = DoublyLinkedList()
dll.add_first(10) # List: 10
dll.add_last(20)  # List: 10 <-> 20
dll.add_first(5)  # List: 5 <-> 10 <-> 20
print(dll)        # Output: DoublyLinkedList([5 <-> 10 <-> 20])
print(f"Size: {len(dll)}") # Output: Size: 3

print(f"Removed first: {dll.remove_first()}") # Output: Removed first: 5
print(dll)        # Output: DoublyLinkedList([10 <-> 20])

print(f"Removed last: {dll.remove_last()}")   # Output: Removed last: 20
print(dll)        # Output: DoublyLinkedList([10])

Abstract Data Types: Stack and Queue

Stack and Queue are examples of Abstract Data Types (ADTs). They define a set of operations and constraints on how data can be accessed, regardless of the underlying implementation (which could be an array, a linked list, etc.).

Stack

A stack is an ADT that represents a collection of elements with two main operations: adding an element to the collection and removing an element from the collection. The crucial characteristic is that the element removed is always the one most recently added. This is known as the LIFO (Last-In, First-Out) principle.

Imagine a stack of plates – you can only add a plate to the top, and you can only take a plate from the top. The last plate you put on is the first one you take off.

Stack API:

  • push(item): Adds item to the top of the stack.
  • pop(): Removes and returns the item from the top of the stack. Raises an error if the stack is empty.
  • peek() or top(): Returns the item at the top of the stack without removing it. Raises an error if the stack is empty.
  • is_empty(): Returns True if the stack contains no elements, False otherwise.
  • size(): Returns the number of elements in the stack.

Common Implementations and Time Complexity (Worst Case):

Stacks are commonly implemented using:

  • Arrays (or dynamic arrays): Elements are added/removed from one end. push is typically append (O(1) amortized), pop is pop() (O(1)).
  • Linked Lists: Elements are added/removed from the head. push (add to head) is O(1), pop (remove from head) is O(1).
Operation Time Complexity (Array) Time Complexity (Linked List)
push O(1) (amortized) O(1)
pop O(1) O(1)
peek/top O(1) O(1)
is_empty O(1) O(1)
size O(1) O(1) (if size is tracked)

Space Complexity: O(n) to store n elements.

Applications of a Stack:

  • Function Call Stack: Managing function calls in programs.
  • Undo/Redo Functionality: Storing states to revert to.
  • Expression Evaluation: Converting infix to postfix/prefix, evaluating postfix expressions.
  • Backtracking Algorithms: Exploring possibilities and returning if a path is invalid.
  • Parsing: Validating syntax (e.g., matching parentheses).

Python Example Implementing a Stack (using Linked List concept internally):

This implementation will use the concept of nodes and pointers (like the SLL) to manage the stack elements, demonstrating how the Stack ADT can be built on top of a linked structure, without directly using the built-in list's stack methods (append, pop).

from typing import Any, Optional

class StackNode:
    """Node for use in the Stack implementation."""
    def __init__(self, value: Any):
        self.value = value
        self.next: Optional[StackNode] = None

class Stack:
    """Implementation of the Stack ADT using a linked list structure."""
    def __init__(self):
        self._top: Optional[StackNode] = None # Corresponds to the head of a linked list
        self._size: int = 0

    def push(self, item: Any):
        """Adds an item to the top of the stack (O(1))."""
        new_node = StackNode(item)
        new_node.next = self._top # New node points to the current top
        self._top = new_node      # The top is now the new node
        self._size += 1

    def pop(self) -> Any:
        """Removes and returns the item from the top (O(1)). Raises IndexError if empty."""
        if self.is_empty():
            raise IndexError("pop from empty stack")
        removed_value = self._top.value
        self._top = self._top.next # Move top pointer down
        self._size -= 1
        return removed_value

    def peek(self) -> Any:
        """Returns the item from the top without removing (O(1)). Raises IndexError if empty."""
        if self.is_empty():
            raise IndexError("peek from empty stack")
        return self._top.value

    def is_empty(self) -> bool:
        """Checks if the stack is empty (O(1))."""
        return self._top is None

    def size(self) -> int:
        """Returns the number of elements (O(1))."""
        return self._size

    def __len__(self) -> int:
        """Allows len(stack) syntax."""
        return self._size

    def __str__(self) -> str:
        """String representation (shows top to bottom)."""
        items = []
        current = self._top
        while current:
            items.append(str(current.value))
            current = current.next
        # Items are collected top to bottom, so join them in that order
        return f"Stack([{', '.join(items)}])"

    def __repr__(self) -> str:
        return str(self)

# Example Usage:
stack = Stack()
print(f"Is stack empty? {stack.is_empty()}") # Output: True

stack.push(10) # Stack: [10]
stack.push(20) # Stack: [20, 10]
stack.push(30) # Stack: [30, 20, 10]

print(stack)               # Output: Stack([30, 20, 10])
print(f"Size: {stack.size()}") # Output: Size: 3
print(f"Top element: {stack.peek()}") # Output: Top element: 30

item = stack.pop()
print(f"Popped item: {item}") # Output: Popped item: 30
print(stack)               # Output: Stack([20, 10])
print(f"Size after pop: {len(stack)}") # Output: Size after pop: 2

print(f"Is stack empty? {stack.is_empty()}") # Output: False

# pop all elements
stack.pop()
stack.pop()
print(f"Is stack empty? {stack.is_empty()}") # Output: True
# stack.pop() # Raises IndexError

Queue

A queue is an ADT that represents a collection of elements supporting two primary operations: adding an element to the collection and removing an element from the collection. The key characteristic is that the element removed is always the one that has been in the collection the longest. This is known as the FIFO (First-In, First-Out) principle.

Think of a queue of people waiting in line – the first person to join the line is the first person to be served.

Queue API:

  • enqueue(item) or add(item): Adds item to the rear (or tail) of the queue.
  • dequeue() or remove(): Removes and returns the item from the front (or head) of the queue. Raises an error if the queue is empty.
  • peek() or front(): Returns the item at the front of the queue without removing it. Raises an error if the queue is empty.
  • is_empty(): Returns True if the queue contains no elements, False otherwise.
  • size(): Returns the number of elements in the queue.

Common Implementations and Time Complexity (Worst Case):

Queues are commonly implemented using:

  • Arrays:
    • Using append for enqueue and pop(0) for dequeue results in O(n) dequeue due to element shifting (like Python's list).
    • Using a circular array allows O(1) enqueue and dequeue by managing head and tail indices that wrap around the array.
  • Linked Lists: Elements are added at the tail and removed from the head. Requires pointers to both the head and the tail for O(1) operations. enqueue (add to tail) is O(1), dequeue (remove from head) is O(1).
Operation Time Complexity (Array, pop(0)) Time Complexity (Circular Array) Time Complexity (Linked List, Head/Tail)
enqueue O(1) (amortized) O(1) O(1)
dequeue O(n) O(1) O(1)
peek/front O(1) O(1) O(1)
is_empty O(1) O(1) O(1)
size O(1) O(1) O(1) (if size is tracked)

Space Complexity: O(n) to store n elements.

Applications of the Queue:

  • Task Scheduling: Managing tasks in operating systems (e.g., CPU scheduling).
  • Breadth-First Search (BFS): Graph traversal algorithm.
  • Simulations: Modeling waiting lines.
  • Buffering: Handling data flow in systems (e.g., I/O buffers).
  • Spooling: Managing print jobs.

Python Example Implementing a Queue (using Linked List concept internally):

This implementation will use the concept of nodes and pointers with both head and tail references to efficiently manage the queue elements, demonstrating the FIFO principle based on a linked structure.

from typing import Any, Optional

class QueueNode:
    """Node for use in the Queue implementation."""
    def __init__(self, value: Any):
        self.value = value
        self.next: Optional[QueueNode] = None

class Queue:
    """Implementation of the Queue ADT using a linked list structure with head and tail pointers."""
    def __init__(self):
        self._head: Optional[QueueNode] = None # Front of the queue
        self._tail: Optional[QueueNode] = None # Rear of the queue
        self._size: int = 0

    def enqueue(self, item: Any):
        """Adds an item to the rear of the queue (O(1))."""
        new_node = QueueNode(item)
        if self.is_empty():
            self._head = self._tail = new_node # If empty, new node is both head and tail
        else:
            self._tail.next = new_node # Old tail points to new node
            self._tail = new_node      # New node becomes the tail
        self._size += 1

    def dequeue(self) -> Any:
        """Removes and returns the item from the front (O(1)). Raises IndexError if empty."""
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        removed_value = self._head.value
        self._head = self._head.next # Move head pointer
        if self._head is None:       # If head becomes None, queue is now empty
            self._tail = None        # So tail must also be None
        self._size -= 1
        return removed_value

    def peek(self) -> Any:
        """Returns the item from the front without removing (O(1)). Raises IndexError if empty."""
        if self.is_empty():
            raise IndexError("peek from empty queue")
        return self._head.value

    def is_empty(self) -> bool:
        """Checks if the queue is empty (O(1))."""
        return self._head is None # or self._size == 0

    def size(self) -> int:
        """Returns the number of elements (O(1))."""
        return self._size

    def __len__(self) -> int:
        """Allows len(queue) syntax."""
        return self._size

    def __str__(self) -> str:
        """String representation (shows front to rear)."""
        items = []
        current = self._head
        while current:
            items.append(str(current.value))
            current = current.next
        return f"Queue([{', '.join(items)}])"

    def __repr__(self) -> str:
        return str(self)

# Example Usage:
queue = Queue()
print(f"Is queue empty? {queue.is_empty()}") # Output: True

queue.enqueue(10) # Queue: [10]
queue.enqueue(20) # Queue: [10, 20]
queue.enqueue(30) # Queue: [10, 20, 30]

print(queue)               # Output: Queue([10, 20, 30])
print(f"Size: {queue.size()}") # Output: Size: 3
print(f"Front element: {queue.peek()}") # Output: Front element: 10

item = queue.dequeue()
print(f"Dequeued item: {item}") # Output: Dequeued item: 10
print(queue)               # Output: Queue([20, 30])
print(f"Size after dequeue: {len(queue)}") # Output: Size after dequeue: 2

print(f"Is queue empty? {queue.is_empty()}") # Output: False

# dequeue all elements
queue.dequeue()
queue.dequeue()
print(f"Is queue empty? {queue.is_empty()}") # Output: True
# queue.dequeue() # Raises IndexError

Clone this wiki locally