Comparing Hashes

Create a class called HashComparer that examines two hashing functions and compares their effectiveness in the following way:

Given the following file which consists of a list of 28,500 words, one word per line. Apply two hashing functions to each word. Do not actually store the words in the hash tables; rather use integer arrays initialized to all zeros. If a word hashes to position i, simply increment the value in position i. In running your trials, use an array of size 5701. The ideal expected average size for the chains in this case would be 28500/5701 ~ = 5.

For each function, report the average chain length(for chains with values), the number of empty chains and the longest five chains. Report on the “goodness” of the functions used based on the following criteria: These are just some ideas for criteria, many others are possible. Write the reports as output statements in your program.

  • The number of chains of length zero is less than 10% of all chains.
  • The length of the longest chain will be less than eight times the expected length of the average chain.

Hashing Functions:

  1. add the Unicode values of the characters in the word, save as a long integer and obtain the remainder using 5701
  2. For each of the following character strings s, compute the location h(s) = ns mod 5701 for this string in a hash table of size 5701, where ns is the (generally, extremely large) integer that is obtained by mapping each symbol in s to a number between 0 and 127 and using these as a “digits” of a base-128 integer.

Finally, make some conclusions about your results in a short summary printed out by your program.