123456789101112131415161718192021222324252627282930313233343536373839 |
- A structure for constant databases
- 19960914
- Copyright 1996
- D. J. Bernstein, djb@pobox.com
- A cdb is an associative array: it maps strings (``keys'') to strings
- (``data'').
- A cdb contains 256 pointers to linearly probed open hash tables. The
- hash tables contain pointers to (key,data) pairs. A cdb is stored in
- a single file on disk:
- +----------------+---------+-------+-------+-----+---------+
- | p0 p1 ... p255 | records | hash0 | hash1 | ... | hash255 |
- +----------------+---------+-------+-------+-----+---------+
- Each of the 256 initial pointers states a position and a length. The
- position is the starting byte position of the hash table. The length
- is the number of slots in the hash table.
- Records are stored sequentially, without special alignment. A record
- states a key length, a data length, the key, and the data.
- Each hash table slot states a hash value and a byte position. If the
- byte position is 0, the slot is empty. Otherwise, the slot points to
- a record whose key has that hash value.
- Positions, lengths, and hash values are 32-bit quantities, stored in
- little-endian form in 4 bytes. Thus a cdb must fit into 4 gigabytes.
- A record is located as follows. Compute the hash value of the key in
- the record. The hash value modulo 256 is the number of a hash table.
- The hash value divided by 256, modulo the length of that table, is a
- slot number. Probe that slot, the next higher slot, and so on, until
- you find the record or run into an empty slot.
- The cdb hash function is ``h = ((h << 5) + h) ^ c'', with a starting
- hash of 5381.
|