class Dictionary { public: Dictionary(); ~Dictionary(); insert(string key, int value); int& operator[] (string key); private: ... };
The inner structure of the class may vary depending on the particular needs, tools, or designer preferences. We will mention only few possibilities:
A simple example of a hash function would be like this. Assume we need to put numbers between 1000 and 1099 into an array
of 100 elements, then we can use the hash function that takes a number to place and returns this number
Unfortunately, this is not always a case. For example, if we are talking about an array of all possible strings, then we do not know array of what size we need to keep to store all of them. We may maintain an array of 26 elements and then our hash function will just return code based on the first letter of the word (0 for 'a', 1 for 'b', etc). In this case, we will definitely end up putting more than one word in the same cell. In other words, we need to deal with a collision. To deal with it, we can, for example, create not an array of words, but an array of word-lists and add each word to the corresponding list.
Another way to divide words into groups is to use a hash function that returns the length of each word as its value. Being used against the same set of words it results in the following picture
Both of these functions are not very good also because they map a word into a number between 0 and M, where M is small (either a number of letters in the alphabet or the length of the longest word). What should we do if we would like to map a word into an interval from 0 to 1024?
Thus, our good hash function should be able to map a word into a number from interval 0 - M for pretty much any M, and at the same time provide equal distribution of the words. That is, we do not want to map half of the words into index 0 and the other half into 1 left all other M-2 elements unused.
union HashKey { unsigned int int_value; char letter[4]; }; unsigned int hash(string s) { HashKey key; for(int i=0;i<4;i++) key.letter[i] = i < s.length() ? s[i] : 0; return key.int_value % M; }To map the int_value into a number from [0, M-1] range, we now need to compute the reminder of the division of the int_value by M. For example, this function maps the word test to 9 and word algorithm to 95. The disadvantage of the function is that it considers only the first 4 letters of a word; that is, all the words starting from the same group of letters will end up with the same hash value. For example, the word testing is also mapped to 9.
In our next hash function will take into consideration all letters of a word and also will mix their ASCII representations a little bit. To explain the idea of mixing up, we can look at the previous hash function and realize that it actually was received by doing this:
int key = 0; for(i=0;i<4;i++) key = ((key<<8) + s[i]) % M; return key % M;To mix the letter codes we can shift them not by 8, but by 7 positions (which is equivalent to multiplying by 128) or even multiply by 127, which mix them even more. Thus our next function will look
unsigned int hash(string s) { unsigned int key = 0; for(int i=0;i<s.length();i++) key = (key * 127 + s[i]) % M; return key % M; }
To mix the codes even better we can change the factor we multiply by (127) every iteration of the algorithm. The following code is called "Universal hash function". For details see Algorithms in C++ by Robert Sedgewick.
unsigned int hash(string s) { unsigned int key = 0, a = 31415, b = 27183; for(int i=0;i<s.length();i++){ key = (key * a + s[i]) % M; a = (a * b) % (M-1); } return key % M; }