As I said in the comment, this is not very compatible with hashing because hash functions deliberately try to "erase" the magnitude information in a bit's position so that values with similar magnitude are unlikely to map to the same hash value.Ĭonsequently, I think that you will not do better than an optimal binary search tree, or equivalently a code generator that produces an optimal "tree" of "if else" statements. I see now that you are trying to map all integers in the ranges to the same function value. I just ran across this library I had not seen before. A literature search in ACM and IEEE literature will certainly turn up more. There are some formal references on the Wikipedia page. Just add a length byte so that integers are 5 byte strings and ranges are 9 bytes. Gnu has a perfect hash function generator called gperf that is meant for strings but might possibly work on your data. In your case a set of N 32-bit integers and ranges would each be mapped to a unique integer in a range of size a small multiple of N. The idea of a minimal perfect hash is that a set of distinct inputs is mapped to a dense set of integers in 1-1 fashion. There is a pretty extensive theory of minimal perfect hash functions that I think will meet your requirement. Note that the input stream can be considered to be randomly distributed. We are thinking this will, for most cases of input streams, yield something quite a bit faster than a binary search on the bounds of the ranges. Note that most of the incoming integers will be uninteresting, and it is of critical importance to make a very quick decision for as large a percentage of that portion of the stream as possible (which is why Bloom filters looked interesting at first (before we starting thinking that their implementation required more computational complexity than other solutions)).īecause the first decision is so important, we are also considering having multiple tables, the first of which would be inverse masks (masks to select uninteresting numbers) for the easy to find large ranges of data not included in a given "switch statement", to be followed by subsequent tables that would expand the smaller ranges. The current plan is to create a custom algorithm that preprocesses the list of interesting integers and ranges (for a given object at a given time) looking for shifts and masks that can be applied to input stream to help delineate the ranges. Perhaps we are not looking in the right places or using the correct vernacular.) (We could not even find a hash or map function that considered these types of ranges as inputs, much less a highly efficient one. We have searched around the internet, looking at Bloom filters, various hash functions listed on Wikipedia's "hash function" page and elsewhere, quite a few Stack Overflow questions, abstract algebra (mostly Galois theory which is attractive for its computationally simple operands), various ciphers, etc., but have not found a solution that appears to be targeted to this problem. Hash/map algorithms can be re-generated slowly with each update based on the specific integers and ranges of interest (for a given object at a given time). The values of interest are subject to real time change, but performance associated with managing these changes is not critical. Many of the objects will contain "switch statements" with only a few entries. The implementation details are tangential to the request, but provided to help with visualization.) It will instead be a jump table (which is an array of function pointers) or maybe a combination of the Command and Observer patterns, etc. (Note that the actual implementation will not include switch statements. (In the pseudo code above 33-122 represents all integers from 33 to 122.) There will be a large number of objects containing these "switch statements." In a typical example, there would be only a few thousand of the "cases" shown above, which could include a full range of 32-bit integer values and ranges. The performance critical portion is in processing the incoming stream and selecting the appropriate function.Īs a simple example, please consider the following pseudo-code: switch (highFrequencyIntegerStream) The number of integers/ranges of interest in most cases will be small (zero to a few thousand). It is OK if the hash/map function selection itself varies based on the specific integer and range requirements, and the performance associated with the part of the code that selects this algorithm is not critical. We are looking for the computationally simplest function that will enable an indexed look-up of a function to be determined by a high frequency input stream of widely distributed integers and ranges of integers.
0 Comments
Leave a Reply. |