Hashing and Randomizing


The Hashing Definition:

Hashing is the transformation of a string of characters into a usually shorter, fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find an item using the shorter hashed key than using the original value. It is also used in many cryptographic algorithms.
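To illustrate the fixed-length property, the sketch below uses SHA-256 from Python's standard `hashlib` module. This is a choice made for illustration; the text does not name a specific hash function. Whatever the length of the input string, the digest is always 32 bytes, printed as 64 hexadecimal characters.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_digest = sha256_hex("Moore, Wilfred")
long_digest = sha256_hex("Moore, Wilfred " * 1000)  # roughly 15,000 characters

# Both digests have the same fixed length, regardless of input size.
print(len(short_digest), len(long_digest))  # 64 64
```

The same input always produces the same digest, which is what lets a stored key be recomputed later for lookup.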

Hash functions:

Hash functions are mostly used in hash tables to quickly locate a data record (for example, a dictionary definition) given its search key (the headword). Specifically, the hash function maps the search key to the index of the slot in the table where the corresponding record is expected to be stored.
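A minimal sketch of mapping a search key to a slot index, assuming a table with a fixed number of slots. The function name `slot_index` and the polynomial formula (multiply by 31, add each character's code point) are hypothetical, illustrative choices and not taken from the text.

```python
def slot_index(key: str, table_size: int) -> int:
    """Hypothetical hash function: map a search key to a slot in [0, table_size).

    Computes a polynomial string hash, h = h * 31 + code point, reducing
    modulo table_size at every step so the intermediate value stays small.
    """
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size
    return h

# A headword is mapped to one of 101 slots; the same word always lands in the same slot.
print(slot_index("headword", 101))
```

A prime table size such as 101 is a common choice because it tends to spread keys more evenly across slots.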

A hash table (or hash map) is a data structure that associates keys with values. The primary operation it supports efficiently is lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number). It works by passing the key through the hash function to produce a hash, a number used as an index into an array to locate the slot ("bucket") where the value should be found.
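The bucket mechanism described above can be sketched as a small hash map, assuming separate chaining (each bucket holds a list of key/value pairs so that keys hashing to the same index can coexist). The class name `HashMap` and the phone numbers are hypothetical; Python's built-in `hash()` stands in for the hash function, and Python's own `dict` is already a hash table, so this only shows the mechanics.

```python
class HashMap:
    """Hypothetical minimal hash map using separate chaining."""

    def __init__(self, num_buckets: int = 16) -> None:
        # An array of buckets; each bucket is a list of (key, value) pairs.
        self.buckets: list[list[tuple[str, str]]] = [[] for _ in range(num_buckets)]

    def _index(self, key: str) -> int:
        # Hash the key, then reduce it to a valid array index.
        return hash(key) % len(self.buckets)

    def put(self, key: str, value: str) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key: str) -> str:
        # Only the one bucket the key hashes to needs to be searched.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

phone_book = HashMap()
phone_book.put("Smith, David", "555-0142")  # hypothetical number
print(phone_book.get("Smith, David"))  # 555-0142
```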

As a simple example of the use of hashing in databases, a group of people could be arranged in a database like this:

Abernathy, Sara
Epperdingle, Roscoe
Moore, Wilfred
Smith, David
(and many more, sorted in alphabetical order)

Each of these names would be the key in the database for that person's data. A database search mechanism would have to compare the search name against the stored names character by character until it found a match (or ruled out the other entries). But if each name were hashed, it might be possible (depending on the number of names in the database) to generate a unique four-digit key for each name. For example:

7864 Abernathy, Sara
9802 Epperdingle, Roscoe
1990 Moore, Wilfred
8822 Smith, David
(and so forth)

A search for any name would first compute the hash value (using the same hash function that was used to store the item) and then look for a match using that value. In general, it would be much faster to match a fixed-length key of four digits, each with only 10 possibilities, than a name of unpredictable length in which each character has 26 possibilities.
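The name example can be sketched end to end as follows. The function `four_digit_key` is hypothetical: it reduces a SHA-256 digest to a value from 0000 to 9999, so it will not reproduce the illustrative keys listed above (7864, 9802, and so on). It uses `hashlib` rather than the built-in `hash()` because Python randomizes string hashes per process, so keys stored in one run would not match keys computed in the next. Each key maps to a list of names so that two names sharing a key (a collision) are both kept.

```python
import hashlib
from collections import defaultdict

def four_digit_key(name: str) -> str:
    """Hypothetical hash function: reduce SHA-256 of the name to '0000'-'9999'."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return f"{int.from_bytes(digest, 'big') % 10_000:04d}"

def build_index(names: list[str]) -> dict[str, list[str]]:
    """Store each name under its four-digit key; a list per key keeps
    colliding names instead of overwriting one of them."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for name in names:
        index[four_digit_key(name)].append(name)
    return dict(index)

def search(index: dict[str, list[str]], name: str) -> bool:
    """Compute the key with the same hash function used for storage,
    then compare only against the names stored under that key."""
    return name in index.get(four_digit_key(name), [])

people = ["Abernathy, Sara", "Epperdingle, Roscoe", "Moore, Wilfred", "Smith, David"]
index = build_index(people)
print(search(index, "Moore, Wilfred"))  # True
print(search(index, "Jones, Alice"))    # False
```

With 10,000 possible keys, collisions become likely once the database holds a few hundred names, which is why a real index must still compare the full name after the key lookup.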

