Given the performance problems caused by having to touch entity usage rows (T111769, T122429), I looked at Wikibase for a bit to see whether we could drop eu_touched entirely.
I hope the analysis below is conclusive (did I miss any major use case for the field?) and supports the conclusion I draw from it.
Apart from tests, the field is currently only used directly in EntityUsageTable, in four places:
1. EntityUsageTable::touchUsages (which updates the field)
2. EntityUsageTable::makeUsageRows, which is only called in EntityUsageTable::addUsages (for adding new rows)
3. EntityUsageTable::queryUsages (which can optionally be filtered by the touched time)
4. EntityUsageTable::pruneStaleUsages (which removes usages older than a given value of that field)
1 and 2 are only used in SqlUsageTracker::trackUsedEntities which in turn is only used in UsageUpdater::addUsagesForPage. That function is used in two places:
- DataUpdateHookHandlers::doLinksUpdateComplete, which calls it immediately before pruning old values with the current timestamp, so every row touched before that point is pruned.
- AddUsagesForPageJob, which is only queued when a new ParserCache entry is saved.
3 has only one caller for which eu_touched actually matters: EntityUsageTable::pruneStaleUsages (which is 4), where it is used so that the stale rows can be deleted by primary key. It therefore needs no separate treatment here.
4 is only used in SqlUsageTracker::pruneStaleUsages, which in turn is only used in UsageUpdater::pruneUsagesForPage. That function in turn is only used in DataUpdateHookHandlers, in two places:
- To prune entries for deleted pages (where the timestamp obviously doesn't matter)
- In DataUpdateHookHandlers::doLinksUpdateComplete immediately after touching the batch of relevant usages (with the current timestamp, thus all values in the table before that will be pruned).
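To make the touch-then-prune pattern above concrete, here is a minimal in-memory sketch. It is a Python model, not the actual PHP code: the class, its method names and the strict "older than" comparison are assumptions made for illustration only.

```python
class EntityUsageTableModel:
    """Hypothetical stand-in for the usage table: (page_id, entity_id, aspect) -> touched."""

    def __init__(self):
        self.rows = {}

    def add_or_touch(self, page_id, usages, touched):
        # Models addUsages + touchUsages: new rows are inserted,
        # existing rows get a fresh timestamp.
        for entity_id, aspect in usages:
            self.rows[(page_id, entity_id, aspect)] = touched

    def prune_stale(self, page_id, last_updated):
        # Models pruneStaleUsages: drop the page's rows touched before
        # last_updated (strict comparison is an assumption).
        stale = [k for k, t in self.rows.items()
                 if k[0] == page_id and t < last_updated]
        for k in stale:
            del self.rows[k]
        return len(stale)

    def usages_for_page(self, page_id):
        return {(e, a) for (p, e, a) in self.rows if p == page_id}


table = EntityUsageTableModel()
table.add_or_touch(1, {("Q1", "L"), ("Q2", "S")}, touched=100)

# LinksUpdate at time 200: touch the current usages, then prune with
# the same timestamp.
table.add_or_touch(1, {("Q2", "S"), ("Q3", "T")}, touched=200)
pruned = table.prune_stale(1, last_updated=200)
```

In this model, touching and pruning with the same timestamp leaves exactly the usages that were just touched, which is why the timestamp carries no extra information in that code path.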
In conclusion, looking at DataUpdateHookHandlers should be enough to get the big picture.
Behaviour on edit:
After a user edits a page we (immediately) run DataUpdateHookHandlers::doParserCacheSaveComplete, which adds the new usage entries to the table (without touching any of the old values yet). Some time after that a LinksUpdate job runs (asynchronously) and triggers DataUpdateHookHandlers::doLinksUpdateComplete, which deletes all usage entries except for those in the ParserOutput of the edit that triggered the LinksUpdate.
Usages recorded by page views that happen after the page save but before the LinksUpdate run will be lost (we initially insert them via DataUpdateHookHandlers::doParserCacheSaveComplete, but purge them later in our LinksUpdate hook handler). That is a problem with the current implementation and will remain one in the new implementation without eu_touched.
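The lost-usage window described above can be reproduced with a small Python model. This is only a sketch: the helper names, the integer timestamps and the example usages (a page view rendered with a different label language) are hypothetical and not taken from Wikibase.

```python
def touch(rows, usages, now):
    # Insert new usages or refresh the timestamp of existing ones.
    for usage in usages:
        rows[usage] = now


def prune(rows, before):
    # Delete every usage whose timestamp is older than `before`.
    for usage in [u for u, t in rows.items() if t < before]:
        del rows[usage]


rows = {}
edit_usages = {("Q1", "L.en")}
view_usages = {("Q1", "L.de")}  # hypothetical: a view rendered in another language

touch(rows, edit_usages, now=100)  # page save -> parser cache save
touch(rows, view_usages, now=150)  # page view before the LinksUpdate runs
touch(rows, edit_usages, now=200)  # LinksUpdate touches the edit's usages ...
prune(rows, before=200)            # ... and prunes everything older

lost = view_usages - set(rows)
```

Here `lost` ends up containing the usage that was only recorded by the intermediate page view, while the usages from the edit survive.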
As far as I can see, we can get around using eu_touched entirely by making two changes:
- Simply delete all old usages and insert the new ones afterwards in DataUpdateHookHandlers::doLinksUpdateComplete (obviously a real implementation would do a diff and only touch the rows it needs to).
- Simply keep letting AddUsagesForPageJob insert all new usages it has. In order to avoid race conditions with doLinksUpdateComplete, the job should know the page_touched of the page in question and only insert its rows if that timestamp hasn't changed.
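The two changes could look roughly like the following Python sketch. It models the table as a plain dict and is not the actual PHP implementation; the function names, the `touched_at_enqueue` parameter and the data layout are assumptions made for illustration.

```python
def replace_usages(table, page_id, new_usages):
    """Model of the LinksUpdate path: diff old against new, touch only changed rows."""
    old = table.get(page_id, set())
    to_delete = old - new_usages
    to_insert = new_usages - old
    table[page_id] = (old - to_delete) | to_insert
    return to_insert, to_delete


def run_add_usages_job(table, page_touched, page_id, usages, touched_at_enqueue):
    """Model of the job path: insert only if page_touched is unchanged since enqueue."""
    if page_touched.get(page_id) != touched_at_enqueue:
        return False  # the page changed in the meantime, so drop the job
    table.setdefault(page_id, set()).update(usages)
    return True


table = {7: {("Q1", "L"), ("Q2", "S")}}
page_touched = {7: "20160101000000"}

inserted, deleted = replace_usages(table, 7, {("Q2", "S"), ("Q3", "T")})

# A job enqueued against an older page_touched value is dropped ...
stale_ran = run_add_usages_job(table, page_touched, 7, {("Q9", "X")}, "20151231000000")
# ... while one enqueued against the current value inserts its rows.
fresh_ran = run_add_usages_job(table, page_touched, 7, {("Q4", "L")}, "20160101000000")
```

In this model no per-row timestamp is needed: the LinksUpdate path only writes the rows that differ, and the job path relies on the page-level page_touched value to detect that it is outdated.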