Open
68 commits
90ca00a
first part of BST, init class.
chelseadole Nov 24, 2017
4a5a47f
changes to BST, changing depth calculation.
chelseadole Nov 24, 2017
832b109
changes to bst, first traversal methods.
chelseadole Nov 24, 2017
0b26b1f
preorder gen traversal.
chelseadole Nov 24, 2017
498a807
post_order
chelseadole Nov 24, 2017
1c0b988
bredth_first
chelseadole Nov 24, 2017
806f991
finishing breadth first and post order gen.
chelseadole Nov 24, 2017
116a52f
adding tests
chelseadole Nov 24, 2017
7c7b83e
adding 8 new tests for BST edge cases.
chelseadole Nov 24, 2017
5e219b1
fixing small error in BST tests.
chelseadole Nov 24, 2017
1cd5bc0
last changes
chelseadole Nov 24, 2017
bd0e281
adding tests Nathan and I wrote on Friday.
chelseadole Nov 26, 2017
06f48b7
small changes to naming of tests and docstrings to be more semantic.
chelseadole Nov 26, 2017
d46e7be
moving and changing sample trees
chelseadole Nov 26, 2017
6184767
last commit, further semantic changes
chelseadole Nov 26, 2017
7f33e61
first part of README.
chelseadole Nov 26, 2017
05ea5d1
completing README with runtime of all functions.
chelseadole Nov 26, 2017
2a53b96
adding tox, tox.ini, and setup.py as required by tox.
chelseadole Nov 26, 2017
dec712d
adding if __name__ is main at bottom of BST.
chelseadole Nov 26, 2017
a054665
getting rid of duplicate test_bst file outside of src/ directory.
chelseadole Nov 26, 2017
31f4293
added Node class with letter and children.
chelseadole Dec 2, 2017
105ecd9
initializing Trie class with root and size.
chelseadole Dec 2, 2017
b9f53fb
beginnings of insert function, which takes in word.
chelseadole Dec 2, 2017
7f6faa9
adding contains method, rehash of insert method. first four tests.
chelseadole Dec 2, 2017
14b85fb
adding fifth test, for adding the same word twice.
chelseadole Dec 2, 2017
f6e63a7
error handing. fixing insert method to work with a word that is only …
chelseadole Dec 2, 2017
e6b4018
continuing to troubleshoot one-letter word issue.
chelseadole Dec 2, 2017
a8b8e7d
commenting out nonworking code for later correction.
chelseadole Dec 2, 2017
916b6a2
adding size function.
chelseadole Dec 2, 2017
c5285a4
adding contains test
chelseadole Dec 2, 2017
24dba69
incorrect code commented out for later.
chelseadole Dec 2, 2017
fc43db3
fixing contains method so it doesnt pick up fragments of words.
chelseadole Dec 2, 2017
667baec
redoing contains method to use for loop, and be shorter.'
chelseadole Dec 3, 2017
ab1df91
changing insert to use a for loop
chelseadole Dec 3, 2017
31e5559
working on remove method.
chelseadole Dec 3, 2017
226bc0e
adding size to remove method.
chelseadole Dec 3, 2017
9345b21
finishing last test, Trie complete woooo
chelseadole Dec 3, 2017
11b9599
adding one last test and changing remove method.
chelseadole Dec 3, 2017
0299d59
adding new functions and Trie info to README.
chelseadole Dec 3, 2017
60daa5f
adding Travis.CI badge/file.
chelseadole Dec 3, 2017
7732ece
adding big-o runtime to all fns in README.
chelseadole Dec 3, 2017
a10dd2e
travis badge
chelseadole Dec 3, 2017
b33a685
adding first part of trie_traversal, finding start in tree from root.
chelseadole Dec 7, 2017
c47630c
adding _trie_gen generator with recursive helper function _recursive_…
chelseadole Dec 7, 2017
aef0585
changes to new, combined version of two recursive functions.
chelseadole Dec 7, 2017
f16656e
more options for trie traversal
chelseadole Dec 7, 2017
3c8606e
adding basics of autocomplete
chelseadole Dec 7, 2017
5aa97a2
wrote bubble sort, no tests.
chelseadole Dec 9, 2017
501a826
Accidentally wrote fn in wrong file. Moved back to bubblesort.py.
chelseadole Dec 9, 2017
33a3417
two first bubblesort tests. Testing error handing for non-lists and e…
chelseadole Dec 9, 2017
81e094f
more error handling test.
chelseadole Dec 9, 2017
cecdd52
test for longer bubblesort, sort on neg numbers.
chelseadole Dec 9, 2017
7a0a0cc
test on presorted list.
chelseadole Dec 9, 2017
10c95dd
adding if __name__ == main block for bubblesort.
chelseadole Dec 9, 2017
37412e2
last changes to bubble sort.
chelseadole Dec 9, 2017
5ebdb1f
adding bubblesort to README.
chelseadole Dec 9, 2017
3ff15ed
adding a bunch of tests. that should work but mostly dont because thi…
chelseadole Dec 11, 2017
35f3a28
adding if name is main block to mergesort.
chelseadole Dec 11, 2017
c8947b0
fixing merge_sort to return item, use only index 0 of popped lists.
chelseadole Dec 11, 2017
db0f058
adding to README for mergesort.
chelseadole Dec 11, 2017
2951799
radix sorting.
chelseadole Dec 12, 2017
749fa69
wrote beginning of quick_sort fn.
chelseadole Dec 12, 2017
12ffbe6
writing for loop and finishing quick_sort.
chelseadole Dec 12, 2017
985c22c
adding tests for quicksort.
chelseadole Dec 12, 2017
97ca380
adding a few more tests.
chelseadole Dec 12, 2017
5ca28d5
adding more edgecase tests in quicksort.
chelseadole Dec 12, 2017
881e745
added README.
chelseadole Dec 12, 2017
2f3afc6
adding timeit sort type to quicksort.
chelseadole Dec 14, 2017
9 changes: 9 additions & 0 deletions .travis.yml
@@ -0,0 +1,9 @@
language: python
python:
- "2.7"
- "3.6"

install:
- pip install pytest
script:
- pytest
56 changes: 54 additions & 2 deletions README.md
@@ -1,2 +1,54 @@
# data-structures
Data Structures in Python. Code Fellows 401.
# Data-Structures

**Author**: Chelsea Dole

**Coverage**: [![Build Status](https://travis-ci.org/chelseadole/data-structures.svg?branch=master)](https://travis-ci.org/chelseadole/data-structures)

**Resources/Shoutouts**: Nathan Moore (lab partner/amigo)

**Testing Tools**: pytest, pytest-cov

## Data Structures:

* **Binary Search Tree** - *A BST is a "tree shaped" data structure made of nodes. Each node has at most two children, and every node is positioned by value relative to its parent: values in a node's left subtree are smaller than the node's value, and values in its right subtree are larger. There are no duplicate values.*

* **Trie Tree** - *A Trie Tree is a "tree shaped" data structure containing nodes with references to letters. These nodes string together (using each node's "children" and "parent" attributes) to form words. This tree allows for quick lookup of words, and is used for things such as word suggestion/auto-complete.*
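
The ordering rule described for the Binary Search Tree can be illustrated with a small, self-contained sketch. This is not the `BST` class from `bst.py`: `SketchNode`, `insert`, `in_order`, and `height` are hypothetical names used only for this example. It shows that an in-order walk yields sorted values, and that the same number of values can produce a shallow or a deep tree depending on insertion order.

```python
class SketchNode(object):
    """Hypothetical node for illustration; not the Node class in bst.py."""

    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def insert(root, val):
    """Insert val below root, ignoring duplicates, and return the root."""
    if root is None:
        return SketchNode(val)
    if val < root.val:
        root.left = insert(root.left, val)
    elif val > root.val:
        root.right = insert(root.right, val)
    return root


def in_order(node):
    """Yield the left subtree, then the node, then the right subtree."""
    if node is not None:
        for val in in_order(node.left):
            yield val
        yield node.val
        for val in in_order(node.right):
            yield val


def height(node):
    """Count the levels in the tree (an empty tree has 0 levels)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


balanced = None
for value in [20, 12, 30, 10, 16, 28, 42]:
    balanced = insert(balanced, value)

skewed = None
for value in [1, 2, 3, 4, 5, 6, 7]:
    skewed = insert(skewed, value)

print(list(in_order(balanced)))  # values come out in sorted order
print(height(balanced), height(skewed))  # shallow tree vs. one-sided tree
```

The one-sided tree has as many levels as it has values, which is why lookups and inserts on it degrade toward O(n).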

## Sorting Algorithms:

* **Bubblesort** - *Bubblesort sorts an input list numerically from smallest to largest by stepping through each index and, if the value at the next index is lower, swapping the two values. The runtime for this algorithm is O(n^2), because in the worst case it makes about n passes over the list, and each pass compares up to n - 1 adjacent pairs.*

* **Mergesort** - *Mergesort sorts an input list numerically from smallest to largest by repeatedly dividing it into smaller sections, then merging those sections back together in sorted order. As the sections are merged, the number of sections shrinks until there are only two left to merge. Once they are merged, the list is fully sorted. The runtime for this algorithm is O(n log n), because the list is halved about log n times and each level of merging touches all n items.*

* **Quicksort** - *Quicksort sorts an input list numerically from smallest to largest by selecting a "pivot" value, and then rearranging the rest of the list around the pivot. The average runtime for this algorithm is O(n log n), because each partitioning pass does O(n) work and a reasonable pivot splits the list into about log n levels of sublists. With consistently poor pivots (for example, always picking the first item of an already sorted list) it degrades to O(n^2).*
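
The three algorithms described above can be sketched as follows. These are minimal illustrations written for this README, not the contents of `bubblesort.py`, `mergesort.py`, or the quicksort module in this repository; the function names and the choice of the first item as the quicksort pivot are assumptions made for the example.

```python
def bubble_sort(items):
    """Return a sorted copy by swapping adjacent out-of-order pairs."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:  # a pass with no swaps means the list is sorted
            break
    return result


def merge_sort(items):
    """Return a sorted copy by splitting in half and merging the halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two still has items
    merged.extend(right[j:])
    return merged


def quick_sort(items):
    """Return a sorted copy by partitioning around a pivot.

    Assumption for this sketch: the pivot is simply the first item.
    """
    if len(items) <= 1:
        return list(items)
    pivot = items[0]
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)


sample = [5, -2, 9, 0, 5, 3]
print(bubble_sort(sample))
print(merge_sort(sample))
print(quick_sort(sample))
```

All three return a new list and leave the input untouched, which keeps the sketch easy to compare against Python's built-in `sorted`.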

## Time Complexities:

* balance() = *This BST function returns the balance of the tree: the depth of the right side minus the depth of the left side. Its runtime is O(1), because it only subtracts two stored depth values, regardless of tree size.*

* size() = *This BST function returns the number of nodes/leaves in a tree. Its runtime is O(1), because runtime never changes regardless of tree size. It only returns the value of tree_size, which is created at BST initialization, and changed during the insertion of nodes.*

* insert() = *This BST function inserts a new node into a tree, and uses a helper function called _find_home() to find its correctly sorted place in the tree. Its runtime is between O(log n) and O(n), depending on the shape of the tree. In a relatively balanced tree, every comparison roughly halves the number of nodes left to consider, giving O(log n). In a one-sided tree, the insert may pass through every node, giving O(n).*

* search() = *This BST function calls the recursive helper _check_for_equivalence(), which compares the value to one node and then recurs into only the left or the right child. Each call moves one level down the tree, so the runtime is between O(log n) for a relatively balanced tree and O(n) for a one-sided tree.*

* contains() = *This BST function calls search(), described above, and converts the result to a boolean, so it has the same runtime: between O(log n) and O(n).*

* depth() = *This BST function returns the number of "levels" that the tree has, by finding which of the two sides has the greatest depth, and returning that. It has a runtime of O(1), because no matter the size of the tree, it only performs a comparison operation.*

* in_order() = *This BST traversal function returns a generator that outputs the node values in numerical order. It takes O(n) traversal steps: the generator may pass through a node more than once (on the way down and again on the way back up), but only a constant number of times per node. In this implementation each step also checks membership in the visited list, which is itself an O(n) operation, so the worst case for a full traversal is O(n^2); tracking visited values in a set would keep it at O(n).*

* pre_order() = *This BST traversal function returns a generator that outputs each node's value first, then the values in its left subtree, then the values in its right subtree. After finishing a subtree, the traversal backs up to the parent and repeats until the whole tree has been traversed. Like in_order, it passes through each node a constant number of times, so it takes O(n) steps, with the same visited-list caveat.*

* post_order() = *This BST traversal function returns a generator that outputs the values in a node's left subtree, then the values in its right subtree, and only then the node's own value, so the root comes last. After yielding a node it backs up to the parent and repeats until the whole tree has been traversed. Like in_order and pre_order, it passes through each node a constant number of times, so it takes O(n) steps, with the same visited-list caveat.*

* breadth_first() = *This BST traversal returns a generator that outputs the node values in order of their "levels". It produces first the root, then all nodes (left to right) in the first depth level, then all nodes (left to right) in the second depth level, et cetera. Each node is added to the queue and yielded exactly once, so it takes O(n) steps. As with the other traversals, this implementation's list-based visited check (and its queue slicing) adds up to O(n) work per step.*

* insert() = *This Trie insert method adds a word to the trie tree. It first checks to see if the word is already in the tree (in which case it does nothing). Then, it goes through each letter of the word and uses the dictionary function setdefault to add a new letter node if it doesn't already exist, and string together the letters. Finally, it increases the tree's size attribute. The time complexity is O(len(word)), because the length of runtime depends on the size of the word you're inserting.*

* contains() = *This Trie method checks if the tree contains a certain word. It does this by iterating through each letter of the word and checking if the current letter node's children dictionary contains a key for the next letter in the word. If at any point it doesn't (or if the last letter's node doesn't have its "end" attribute set to True), it returns False. The worst-case time complexity is O(n), where n is the length of the word, because every letter of the word has to be checked.*

* size() = *This Trie method returns the number of words in the tree by returning the tree's size attribute, which is incremented and decremented in insert() and remove() respectively. Time complexity should be O(1), because it just returns a number: the attribute of Trie.*

* remove() = *This Trie method removes a word from the tree. First it traverses to the node of the last letter of the word (and raises an error if the word doesn't exist). Once at the last letter, it moves backwards, deleting references to the children/letters below. The time complexity is O(n), where n is the length of the word: in the worst case the word is the only one in the tree, so the method walks down all n letters and back up again, and 2n steps is still O(n).*
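
The Trie methods described above can be sketched as follows. The Trie module itself is not shown here, so this is an illustration under stated assumptions rather than the author's implementation: `SketchTrieNode`, `SketchTrie`, and the use of `ValueError` are hypothetical choices. The `children` dictionary, the `end` flag, and the use of `setdefault` follow the descriptions in the list.

```python
class SketchTrieNode(object):
    """Hypothetical trie node: child letters plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.end = False


class SketchTrie(object):
    """Hypothetical trie for illustration; not the Trie class in this repo."""

    def __init__(self):
        self.root = SketchTrieNode()
        self._size = 0

    def insert(self, word):
        """Add a word in O(len(word)); inserting it twice changes nothing."""
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, SketchTrieNode())
        if not node.end:
            node.end = True
            self._size += 1

    def contains(self, word):
        """Return True only for whole words, not for fragments of words."""
        node = self.root
        for letter in word:
            if letter not in node.children:
                return False
            node = node.children[letter]
        return node.end

    def size(self):
        """Return the stored word count in O(1)."""
        return self._size

    def remove(self, word):
        """Remove a word, pruning letters that no other word uses."""
        path = [self.root]
        for letter in word:
            if letter not in path[-1].children:
                raise ValueError('Word is not in the trie.')
            path.append(path[-1].children[letter])
        if not path[-1].end:
            raise ValueError('Word is not in the trie.')
        path[-1].end = False
        self._size -= 1
        # Walk back up; stop at the first node another word still needs.
        for index in range(len(word) - 1, -1, -1):
            child = path[index + 1]
            if child.children or child.end:
                break
            del path[index].children[word[index]]


trie = SketchTrie()
for new_word in ['cat', 'car', 'cat']:
    trie.insert(new_word)
print(trie.size(), trie.contains('ca'), trie.contains('car'))
trie.remove('cat')
print(trie.size(), trie.contains('cat'), trie.contains('car'))
```

Removing a word walks down its letters once and back up once, which is the 2n steps (still O(n)) mentioned for remove().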

273 changes: 273 additions & 0 deletions bst.py
@@ -0,0 +1,273 @@
"""Implementation of a binary search tree data structure."""
import timeit as time


class Node(object):
    """Define the Node-class object."""

    def __init__(self, value, left=None, right=None, parent=None):
        """Constructor for the Node class."""
        self.val = value
        self.left = left
        self.right = right
        self.parent = parent
        self.depth = 0


class BST(object):
    """Define the BST-class object."""

    def __init__(self, starting_values=None):
        """Constructor for the BST class."""
        self.tree_size = 0
        self.left_depth = 0
        self.right_depth = 0
        self.visited = []
        self.root = None

        if starting_values is None:
            pass

        elif isinstance(starting_values, (list, str, tuple)):
            # insert() creates the root from the first value, so an empty
            # iterable simply leaves the tree empty instead of raising.
            for value in starting_values:
                self.insert(value)

        else:
            raise TypeError('Only iterables or None are valid parameters!')

    def balance(self):
        """Return the current balance of the BST."""
        return self.right_depth - self.left_depth

    def size(self):
        """Return the current size of the BST."""
        return self.tree_size

    def insert(self, value):
        """Insert a new node into the BST, and adjust the balance."""
        new_node = Node(value)

        if self.root:
            if new_node.val > self.root.val:
                if self.root.right:
                    self._find_home(new_node, self.root.right)
                    if new_node.depth > self.right_depth:
                        self.right_depth = new_node.depth
                else:
                    new_node.parent = self.root
                    self.root.right = new_node
                    self.root.right.depth = 1
                    if self.root.right.depth > self.right_depth:
                        self.right_depth = self.root.right.depth
                    self.tree_size += 1

            elif new_node.val < self.root.val:
                if self.root.left:
                    self._find_home(new_node, self.root.left)
                    if new_node.depth > self.left_depth:
                        self.left_depth = new_node.depth
                else:
                    new_node.parent = self.root
                    self.root.left = new_node
                    self.root.left.depth = 1
                    if self.root.left.depth > self.left_depth:
                        self.left_depth = self.root.left.depth
                    self.tree_size += 1
        else:
            self.root = new_node
            self.tree_size += 1

    def _find_home(self, node_to_add, node_to_check):
        """Place node_to_add in the subtree rooted at node_to_check.

        Check if the node_to_add belongs on the left or right
        of the node_to_check, then place it there if that spot is empty,
        otherwise recur. A duplicate value is ignored.
        """
        if node_to_add.val > node_to_check.val:
            if node_to_check.right:
                self._find_home(node_to_add, node_to_check.right)
            else:
                node_to_add.parent = node_to_check
                node_to_check.right = node_to_add
                node_to_check.right.depth = node_to_check.depth + 1
                self.tree_size += 1

        elif node_to_add.val < node_to_check.val:
            if node_to_check.left:
                self._find_home(node_to_add, node_to_check.left)
            else:
                node_to_add.parent = node_to_check
                node_to_check.left = node_to_add
                node_to_check.left.depth = node_to_check.depth + 1
                self.tree_size += 1

    def search(self, value):
        """If a value is in the BST, return its node."""
        return self._check_for_equivalence(value, self.root)

    def contains(self, value):
        """Return whether or not a value is in the BST."""
        return bool(self.search(value))

    def _check_for_equivalence(self, value, node_to_check):
        """Return the node holding value, or None if it is not in the tree.

        Check if the value matches that of the node_to_check;
        if it does, return the node. If it doesn't, go left or right
        as appropriate and recur. If you reach a dead end, return None.
        """
        try:
            if value == node_to_check.val:
                return node_to_check

        except AttributeError:
            # node_to_check is None, which happens when the tree is empty.
            return None

        if value > node_to_check.val and node_to_check.right:
            return self._check_for_equivalence(value, node_to_check.right)

        elif value < node_to_check.val and node_to_check.left:
            return self._check_for_equivalence(value, node_to_check.left)

        return None

    def depth(self):
        """Return the depth of the BST."""
        if self.left_depth > self.right_depth:
            return self.left_depth
        return self.right_depth

    def in_order(self):
        """Return a generator to perform an in-order traversal."""
        self.visited = []

        if self.root is None:
            raise IndexError("Tree is empty!")

        gen = self._in_order_gen()
        return gen

    def _in_order_gen(self):
        """Iterative helper generator for in-order traversal."""
        current = self.root

        while len(self.visited) < self.tree_size:
            if current.left:
                if current.left.val not in self.visited:
                    current = current.left
                    continue

            if current.val not in self.visited:
                self.visited.append(current.val)
                yield current.val

            if current.right:
                if current.right.val not in self.visited:
                    current = current.right
                    continue

            current = current.parent

    def pre_order(self):
        """Return a generator to perform a pre-order traversal."""
        self.visited = []

        if self.root is None:
            raise IndexError("Tree is empty!")

        gen = self._pre_order_gen()
        return gen

    def _pre_order_gen(self):
        """Iterative helper generator for pre-order traversal."""
        current = self.root

        while len(self.visited) < self.tree_size:
            if current.val not in self.visited:
                self.visited.append(current.val)
                yield current.val

            if current.left:
                if current.left.val not in self.visited:
                    current = current.left
                    continue

            if current.right:
                if current.right.val not in self.visited:
                    current = current.right
                    continue

            current = current.parent

    def post_order(self):
        """Return a generator to perform a post-order traversal."""
        self.visited = []

        if self.root is None:
            raise IndexError("Tree is empty!")

        gen = self._post_order_gen()
        return gen

    def _post_order_gen(self):
        """Iterative helper generator for post-order traversal."""
        current = self.root

        while len(self.visited) < self.tree_size:
            if current.left:
                if current.left.val not in self.visited:
                    current = current.left
                    continue

            if current.right:
                if current.right.val not in self.visited:
                    current = current.right
                    continue

            if current.val not in self.visited:
                self.visited.append(current.val)
                yield current.val

            current = current.parent

    def breadth_first(self):
        """Return a generator to perform a breadth-first traversal."""
        self.visited = []

        if self.root is None:
            raise IndexError("Tree is empty!")

        gen = self._breadth_first_gen(self.root)
        return gen

    def _breadth_first_gen(self, root_node):
        """Helper generator for breadth-first traversal."""
        # Start from the node that was passed in rather than ignoring it.
        queue = [root_node]
        while queue:
            current = queue[0]
            yield current.val
            queue = queue[1:]

            if current not in self.visited:
                self.visited.append(current)

            if current.left:
                if current.left not in self.visited:
                    queue.append(current.left)

            if current.right:
                if current.right not in self.visited:
                    queue.append(current.right)


if __name__ == '__main__':  # pragma: no cover
    left_bigger = BST([6, 5, 4, 3, 2, 1])
    right_bigger = BST([1, 2, 3, 4, 5, 6])
    bal_tree = BST([20, 12, 10, 1, 11, 16, 30, 42, 28, 27])

    # Store the timings under their own names so the trees are not rebound.
    left_time = time.timeit(
        "left_bigger.search(5)", setup="from __main__ import left_bigger")
    right_time = time.timeit(
        "right_bigger.search(5)", setup="from __main__ import right_bigger")
    bal_time = time.timeit(
        "bal_tree.search(8)", setup="from __main__ import bal_tree")

    print('Left-Skewed Search Time: ', left_time)
    print('Right-Skewed Search Time: ', right_time)
    print('Balanced Search Time: ', bal_time)
15 changes: 15 additions & 0 deletions setup.py
@@ -0,0 +1,15 @@
"""Setup module for Chelsea's data structures."""

from setuptools import setup

setup(
    name='Data structures',
    description='Various data structures in python',
    author='Chelsea Dole',
    author_email='chelseadole@gmail',
    package_dir={' ': 'src'},
    py_modules=['bst'],
    # timeit is part of the Python standard library, so it is not listed here.
    install_requires=[],
    extras_require={
        'test': ['pytest', 'pytest-cov', 'pytest-watch', 'tox'],
        'development': ['ipython']})