Carter Rows - 1170615
October 13, 2023
IMPLEMENTATION:
- Insertion works as expected. The only thing I have noticed in testing is that the table size must be large enough for every key to be inserted, or else the program quits.
- Searching works as expected. One thing to note: searching for a key that is not in the table costs more, and how much more depends on the probing algorithm. With linear probing, an unsuccessful search can end up probing every cell in the table before concluding the key is absent; quadratic and double-hash probing realize the key is missing much faster (see the sketch after this list).
- Deletion works as expected. The same probing costs apply here for keys that are not in the table.
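Below is a minimal sketch of that linear-probing worst case, using illustrative names (Cell, CellState, linearProbeSearch) rather than the assignment's real types. Because deleted cells cannot stop the probe, an unsuccessful search in a full or heavily used table can cost up to one probe per cell.

    #include <stddef.h>
    #include <string.h>

    typedef enum { EMPTY, OCCUPIED, DELETED } CellState;

    typedef struct {
        CellState state;
        char     *key;
    } Cell;

    /* Returns the index of 'key', or -1 once the key is known to be absent. */
    long linearProbeSearch(const Cell *table, size_t size,
                           const char *key, size_t home)
    {
        for (size_t probe = 0; probe < size; probe++) {
            size_t i = (home + probe) % size;    /* step of 1 = linear */
            if (table[i].state == EMPTY)
                return -1;                       /* empty cell: key absent */
            if (table[i].state == OCCUPIED &&
                strcmp(table[i].key, key) == 0)
                return (long)i;                  /* found */
            /* DELETED cells force the probe to continue */
        }
        return -1; /* probed every cell without finding the key */
    }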
VALGRIND:
- I've run valgrind for insertion, deletion, and searching. All dynamically allocated blocks are freed, and I have no leaks.
hash_CARTER:
My hashing algorithm is an adapted version of the Fowler-Noll-Vo (FNV) hash function.
The algorithm starts by setting the return HashIndex to the offset basis 0x811c9dc5, the value defined for the 32-bit FNV implementation. This offset basis helps spread the hash indexes produced for different keys.
Then, for every byte in the AAKeyType key, we XOR it into the current HashIndex and multiply the result by the 32-bit FNV prime, 2^24 + 2^8 + 0x93 (0x01000193). Once every byte of the key has been processed, we simply return HashIndex % size_of_array, which guarantees every returned index lies inside the table's bounds.
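Here is a short sketch of that loop in C. I am assuming AAKeyType is a pointer to bytes paired with an explicit length (the real typedef comes from the assignment starter code); note that the XOR-then-multiply order described above corresponds to the FNV-1a variant.

    #include <stddef.h>
    #include <stdint.h>

    typedef unsigned char *AAKeyType;      /* assumption for this sketch */

    int hash_CARTER(AAKeyType key, size_t keyLength, size_t size)
    {
        uint32_t hashIndex = 0x811c9dc5u;  /* 32-bit FNV offset basis */

        for (size_t i = 0; i < keyLength; i++) {
            hashIndex ^= key[i];           /* fold the next byte in */
            hashIndex *= 0x01000193u;      /* 32-bit FNV prime */
        }
        return (int)(hashIndex % size);    /* keep the index in [0, size) */
    }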
I have experimented with the 64-bit FNV prime and offset values, and found that for this specific use case, the 32-bit offset and prime values lower the insertion cost of the program.
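For reference, the standard 64-bit FNV constants compared against in that experiment are:

    /* Standard 64-bit FNV-1a constants:
     *   offset basis: 0xcbf29ce484222325
     *   prime:        0x00000100000001b3  (2^40 + 2^8 + 0xb3)
     * For this assignment, the 32-bit pair produced the lower insertion cost. */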
Here is a comparison of the three hashing algorithms used in this assignment, with different table sizes:
    Table size   hash_CARTER cost   hashByLength cost   hashBySum cost
    250          144                5662                157
    500          121                5662                136
    1000         122                5662                131
As seen above, my hashing algorithm performs better than hashBySum and hashByLength at all three table sizes. The distribution of keys appears uniform, with minimal clustering, for every table size tested. By reducing collisions, my algorithm reduces the need for probing and speeds up the program as a result.
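For context, here is a hypothetical sketch of how such costs can be tallied: each insertion reports how many cells it examined, and the per-function totals are the sums over all keys. The table layout and function names are illustrative, not the assignment's actual driver code.

    #include <stddef.h>
    #include <string.h>

    typedef int (*HashFn)(unsigned char *key, size_t keyLength, size_t size);

    /* Insert 'key' by linear probing; returns how many cells were examined,
     * or -1 if the table is full (the "table too small" failure noted under
     * IMPLEMENTATION). */
    static int insertWithCost(char **table, size_t size, char *key, HashFn hash)
    {
        size_t home = (size_t)hash((unsigned char *)key, strlen(key), size);
        for (size_t probe = 0; probe < size; probe++) {
            size_t i = (home + probe) % size;
            if (table[i] == NULL) {
                table[i] = key;
                return (int)(probe + 1);
            }
        }
        return -1;          /* no free cell left */
    }

    /* Sum the insertion cost over a whole key set for one hash function. */
    static long totalInsertCost(char **table, size_t size,
                                char **keys, size_t nKeys, HashFn hash)
    {
        long cost = 0;
        for (size_t k = 0; k < nKeys; k++) {
            int c = insertWithCost(table, size, keys[k], hash);
            if (c < 0)
                return -1;  /* table too small for every key */
            cost += c;
        }
        return cost;
    }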
hash_CARTER meets the must-have requirements of a hashing algorithm:
- For a given key, hash_CARTER returns the same index every time.
- The indexes returned for different keys are uniformly distributed across every hash table size I used.
hash_CARTER meets the should-have goals:
- A small change in the input key produces a significantly different index, which helps with uniform distribution (see the sketch after this list).
- It uses all the space available in the table: some keys are inserted at the end, others at the start, in the middle, and so on.
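As a quick check of that first property, hashing two keys that differ in a single byte with the hash_CARTER sketch above typically yields indexes far apart; the key strings below are just examples.

    #include <stdio.h>
    #include <string.h>

    /* prototype of the hash_CARTER sketch shown earlier */
    int hash_CARTER(unsigned char *key, size_t keyLength, size_t size);

    int main(void)
    {
        unsigned char a[] = "carter";
        unsigned char b[] = "carteR";   /* one byte changed */

        printf("index(\"carter\") = %d\n",
               hash_CARTER(a, strlen((char *)a), 1000));
        printf("index(\"carteR\") = %d\n",
               hash_CARTER(b, strlen((char *)b), 1000));
        return 0;
    }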
hash_CARTER meets the ideal goals:
- Reduced clustering and hotspots in the indexing; the offset basis helps with this.
- Reduced collisions: as shown in the cost comparison, my algorithm lowers collisions and insertion cost.
REFERENCES:
To help create my hashing algorithm, I used the following: