jff: checksum algorithm is not as intended
От | Yura Sokolov |
---|---|
Тема | jff: checksum algorithm is not as intended |
Дата | |
Msg-id | 7d018a5e73186a08f891e46fa25bdf18@postgrespro.ru обсуждение исходный текст |
Ответы |
Re: jff: checksum algorithm is not as intended
|
Список | pgsql-hackers |
Good day. Current checksum is not calculated in intended way and has the flaw. Single round function is written as: #define CHECKSUM_COMP(checksum, value) do {\ uint32 __tmp = (checksum) ^ (value);\ (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 17);\ } while (0) And looks like it was intended to be (checksum) = (__tmp * FNV_PRIME) ^ (__tmp >> 17); At least original Florian Pflug suggestion were correctly written in this way (but with shift 1): https://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396%40phlo.org But due to C operation precedence it is actually calculated as: (checksum) = __tmp * (FNV_PRIME ^ (__tmp >> 17)); It has more longer collision chains and worse: it has 256 pathological result slots of shape 0xXX000000 each has 517 collisions in average. Totally 132352 __tmp values are collided into this 256 slots. That is happens due to if top 16 bits are happens to be 0x0326 or 0x0327, then `FNV_PRIME ^ (__tmp >> 17) == 0x1000000`, and then `__tmp * 0x1000000` keeps only lower 8 bits. That means, 9 bits masked by 0x0001ff00 are completely lost. mix(0x03260001) == mix(0x03260101) = mix(0x0327aa01) == mix(0x0327ff01) (where mix is a `__tmp` to `checksum` transformation) regards, Yura Sokolov y.sokolov@postgrespro.ru funny.falcon@gmail.com PS. Test program in Crystal language is attached and output for current CHECKSUM_COMP implementation and "correct" (intended). Excuse me for Crystal, it is prettier to write for small compiled scripts.
Вложения
В списке pgsql-hackers по дате отправления: