An online markdown blog and knowledge repository.
Code Fellows What Is A Hashtable
Paul Programming on YouTube What Is A HashTable Data Structure
HackerEarth.com Basics of Hash Tables
Hash Table on Wikipedia
Hash: Algorithm that takes an input and returns a value that can be used for security, or for an index into an array.
Buckets: Containers located at eash index of an array that store Values. This can be a 1:1 or 1:many relationship.
Collisions: More than key, when hashed, returns the same hash value.
Use Hash Tables to hold unique values, creating dictionaries, or developing a Library.
Each Node (or Bucket) is made up of a Key and Value pair.
Goal is to quickly store and find values by keys without having to search across many/all other items.
Searching an array is an O(n) read operation because worst-case scenario we would have to touch every item in the array.
Knowing exactly which index the value is at is a O(1) read operation because of N items in the array, only 1 is traversed/accessed.
A Hash Function is used to take an input, and guarantee the output is always the same, so it a good index-defining mechanism.
Using a Hash Function against and Array with n buckets (each containing zero to one value) is a O(1) read operation.
Turn a key into an integer: K => int
Hashtables are typically an Array of buckets of some large size (1024 is the author's favorite).
Suggested Hash Function Steps:
Consider:
More than one key hashes to the same index in an array.
A perfect hash would never have collisions, and the worst possible hash always returns the exact same index into the array.
HashMaps need to be designed to handle collisions.
One way to work around collision issues is to modify the Array: Make each bucket a linked list, another array, or a BST.
When implementing buckets deeper than 1 level, store the index alongside the value in the bucket's Nodes.
The curriculum author suggests this method to STORE a value in a hash map:
The curriculum author suggests this method to RETREIVE a value from a hash map:
Tells us how full the hash table is.
To summarize Load Factor:
The curriculum author suggests these methods to use in hash tables.
Datastructure for storing information.
Utilizes Key Value pairs. Key is a sort of index, and a Value is the 'bucket' that stores the data associated with the key.
Make the initial hashtable storage size very large.
Hash Function: Takes a K, evaluates it to determine an indexing number, to determine:
Hash Functions always return the same, predictable value given the same input.
Hash Collision: A hashed value returns an index where a value is already stored.
Ways to handle Collisions:
Uniquely ID's an object from a group of objects.
Use a hashing algorithm to define and utilize keys that are large or complex.
Ideal Hash Tables will store data within their structure evenly/smoothly.
Recommended 2-step Hashing:
Suggested hashing algorithm:
hash = hashfunc(key)
index = hash % array_size
Use Prime Numbers in the calculation, like '599', to reduce the probability of collisions.
Use key * prime modulu size_of_array to further segregate similar data.
The larger the backing storage structure, the better the modulus will do at returning an empty index.
Stores KVPs.
Leverages a Hash Function to find values based on keys, and find locations to store KVPs.
The average time required to search for an element in a reasonably well designed hash table is O(1).
Commonly used.
Uses Linked Lists.
Each bucket is a Linked List if it contains data at all.
KVP's with same hash-function output values (hash codes) get stored, in-order, within the Linked List bucket.
Worst-case scenario is when all entries are inserted into the same linked list.
All entry records are stored in the array itself.
When a collision occurs, a Probe Sequence is followed until the an empty index is located.
Probe Sequence:
Linear Probing: Use a fixed count to traverse beyond any collided slot.
Linear Probing Algorithm example:
index = index % hash_table_size # 1st try collides
index = (index + 1) % hash_table_size # might also collide so
index = (index + 1) % hash_table_size # ...etc
Jon Note: This technique could also force your Search and Get methods to run additional code in order to find the stored value.
Similar to Linear Probing but interval is increased dramatically between collision detected slots.
index = index % hash_table_size # collision
index = (index + pow(1, 2) % hash_table_size) # might also collide
index = (index + pow(2, 2) % hash_table_size) # etc
When using Quadratic Probing you must be certain the underlying array size is large enough that the hashing algorithm will likely find an open spot.
Jon Note: This technique could also force your Search and Get methods to run additional code in order to find the stored value.
Similar to Linear Probing but the interval between successive probes is computing using a second hash function.
Example probing sequence pseducode:
index = ((index + 1) * indexH) % hash_table_size # might collide
index = ((index + 2) * indexH) % hash_table_size # etc
Jon Note: This technique could also force your Search and Get methods to run additional code in order to find the stored value.
There are multiple re-hashing techniques that can be utilized to resize an existing hash table.
Some re-hashing methods perform the re-hasing session against the entire table, others do so proportionally and during usual operation.
One method that could be implemented is to have an old-new-hashcode algorithm where a new hashing algorithm is implemented and the add and search functions that call data after table is incrementally increased use the newer algorithm, but the data blocks not yet touched do not.
Return to Root README