Re: Derived columns / denormalization
От | Jamie Tufnell |
---|---|
Тема | Re: Derived columns / denormalization |
Дата | |
Msg-id | b0a4f3350901181633x12714f15te5ea65adee5f7617@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Derived columns / denormalization (Erik Jones <ejones@engineyard.com>) |
Список | pgsql-sql |
On 1/17/09, Erik Jones <ejones@engineyard.com> wrote: > On Jan 15, 2009, at 8:06 PM, Tom Lane wrote: >> "Jamie Tufnell" <diesql@googlemail.com> writes: >>> item_count int -- this is derived from (select count(*) from items >>> where group_id = id) >>> ... >> >>> item_count would be updated by insert/update/delete triggers on the >>> items table, hopefully that would ensure it is always correct? >> >> Concurrent updates to the items table make this much harder than >> it might first appear. If you're willing to serialize all your >> updating >> transactions then you can make it work, but ... > > That was exactly the caveat I was about to point out. That being > said, keeping COUNT() values and other computed statistics based on > other data in the database *is* a fairly common "tactic". On method > that I've used to great success to avoid the serialization problem is > to have your triggers actually insert the necessary information for > the update into a separate "update queue" table. You then have > separate process that routinely sweeps that update queue, aggregates > the updates and then updates your count values in the groups table > with the total update values for each groups entry with updates. Fortunately our items table rarely sees concurrent writes. It's over 99% reads and is typically updated by just one user. We are already caching these aggregates and other data in a separate layer and my goal is to see if I can get rid of that layer. In light of your advice though, I will think things through a bit more first. Thanks for your help! Jamie
В списке pgsql-sql по дате отправления: