Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Page MenuHomePhabricator

Slow access to Special:Contributions on mediawiki.org (due to enabling actor table WRITE_BOTH mode)
Closed, ResolvedPublic

Description

Today at least, accessing Special:Contributions for any user (even if it has less than 5 contributions) takes a long time to load (in the order of 25 - 30 seconds), while other pages load near-instantly. example

I suspect there's a problem with today's deploy: https://www.mediawiki.org/wiki/MediaWiki_1.32/wmf.22

curl https://www.mediawiki.org/wiki/Special:Contributions/Andrew0005?forceprofile=1


<!--
100.00% 10800.014      1 - main()
100.00% 10799.744      1 - {internal}
100.00% 10799.541      1 - run_init::/srv/mediawiki/php-1.32.0-wmf.20/index.php
 99.34% 10729.084      1 - MediaWiki::run
 98.95% 10686.479      1 - MediaWiki::main
 97.57% 10537.861      1 - MediaWiki::performRequest
 97.50% 10529.659      1 - MediaWiki\Special\SpecialPageFactory::executePath
 97.49% 10529.381      1 - SpecialPage::run
 97.45% 10524.883      1 - SpecialContributions::execute
 93.27% 10072.893     39 - Wikimedia\Rdbms\Database::select
 93.24% 10069.615     58 - Wikimedia\Rdbms\Database::query
 93.18% 10063.914     64 - Wikimedia\Rdbms\Database::doProfiledQuery
 93.15% 10059.856     84 - Wikimedia\Rdbms\DatabaseMysqli::doQuery
 92.46% 9985.890      2 - IndexPager::doQuery
 92.41% 9979.937      3 - IndexPager::getNumRows
 92.41% 9979.896      1 - section.IndexPager::doQuery (ContribsPager)
 92.41% 9979.756      1 - ContribsPager::reallyDoQuery
  3.99% 430.981    871 - Hooks::callHook

Event Timeline

I'm confused. Deployment of wmf.22 is supposed to happen today https://www.mediawiki.org/wiki/MediaWiki_1.32/Roadmap but it hasn't happened yet on mediawiki.org, which is on .20, but the problem seems to happen since today.

I'll leave the parent task as it is until someone can clarify that

Confirming. Source code of https://www.mediawiki.org/wiki/Special:Contributions/Andrew0005?debug=true says:

<script>(window.RLQ=window.RLQ||[]).push(function(){mw.config.set( {
    "wgBackendResponseTime": 24761,
    "wgHostname": "mw2184"
} );});</script>
hashar subscribed.

From profiling (with https://www.mediawiki.org/wiki/Special:Contributions/Andrew0005?forceprofile=1 ) that comes from ContribsPager::reallyDoQuery . HHVM reports a slow sql query:

That is for mediawikiwiki database:

SELECT /* IndexPager::buildQueryInfo (contributions page unfiltered) xxx */
   rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,
  COALESCE( comment_rev_comment.comment_text, rev_comment ) AS `rev_comment_text`,
  comment_rev_comment.comment_data AS `rev_comment_data`,
  comment_rev_comment.comment_id AS `rev_comment_cid`,
  COALESCE( actor_rev_user.actor_user, rev_user ) AS `rev_user`,
  COALESCE( actor_rev_user.actor_name, rev_user_text ) AS `rev_user_text`,
  temp_rev_user.revactor_actor AS `rev_actor`,
  rev_text_id,rev_content_format,rev_content_model,
  page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,
  (SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')
   FROM `change_tag`    WHERE ct_rev_id=rev_id  ) AS `ts_tags`
FROM `revision`
LEFT JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id))
LEFT JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id))
LEFT JOIN `revision_actor_temp` `temp_rev_user` ON ((temp_rev_user.revactor_rev = rev_id))
LEFT JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor))
INNER JOIN `page` ON ((page_id = rev_page))
LEFT JOIN `user` ON (
    (COALESCE( actor_rev_user.actor_user, rev_user ) != 0)
    AND (user_id = COALESCE( actor_rev_user.actor_user, rev_user ))
 )

WHERE (
  ( (temp_rev_user.revactor_actor IS NOT NULL)
    AND temp_rev_user.revactor_actor = 'YYYYYYY'
  )
    OR
  (
     temp_rev_user.revactor_actor IS NULL AND rev_user = 'XXXXXXXX')
  )
  AND ((rev_deleted & 4) = 0)

ORDER BY rev_timestamp DESC LIMIT 51

I have obfuscated revactor_actor and rev_user

The query plan shows it is using a temporary and filesort against the page table which is a big NO/NO:

           id: 1
  select_type: PRIMARY
        table: page
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 682929
        Extra: Using temporary; Using filesort

I have quickly poked the DBA to check whether something was going on, and the servers looks all fine.

From our Logstash, a query for ""contributions page unfiltered"" shows it had some occurrences for the last 30 days. There is slightly more since 09/17 ~ 16:00 UTC.

For those having access: https://logstash.wikimedia.org/goto/d9085504b0f3a18e89685a739473812e

Anomie subscribed.

The specific query in T204669#4593536 is due to the enabling of actor table WRITE_BOTH mode on mediawiki.org (07a6db7), so we can hopefully find slow queries like this and fix them before they go out to big content wikis and cause more trouble. Currently that config change is also enabled on testwiki, test2wiki, and testwikidatawiki.

In the somewhat-long term any slow queries like this would go away once we're done with the actor migration and no longer have to check both the 'actor' and 'user' fields. In the short term, I'll have to add code to do a union of the "old" and "new" queries instead of combining the different conditions with an OR.

Aklapper renamed this task from Slow access to Special:Contributions on mediawiki.org to Slow access to Special:Contributions on mediawiki.org (due to enabling actor table WRITE_BOTH mode).Sep 18 2018, 1:46 PM

I may decide to instead revert the config change and rewrite the migration process to use "write-both-read-old"→"write-both-read-new" instead of "write-both-read-both"→"write-new-read-both"; particularly when the revision table and special pages are involved, "read both" quickly gets complicated.

Change 461209 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set ActorTableSchemaMigrationStage => OLD on test wikis, mw.org

https://gerrit.wikimedia.org/r/461209

Change 461209 merged by jenkins-bot:
[operations/mediawiki-config@master] Set ActorTableSchemaMigrationStage => OLD on test wikis, mw.org

https://gerrit.wikimedia.org/r/461209

Stashbot subscribed.

Mentioned in SAL (#wikimedia-operations) [2018-09-18T20:58:52Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Set ActorTableSchemaMigrationStage back to OLD on test wikis, mediawikiwiki (T188327, T204669) (duration: 00m 57s)

Change 461440 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] ActorMigration: Remove possibility of read-both

https://gerrit.wikimedia.org/r/461440

Change 461441 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/ORES@master] Adjust for core change I4764c1c78

https://gerrit.wikimedia.org/r/461441

Change 461442 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Renameuser@master] Adjust for core change I4764c1c78

https://gerrit.wikimedia.org/r/461442

Change 461443 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UserMerge@master] Adjust for core change I4764c1c78

https://gerrit.wikimedia.org/r/461443

Change 461444 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/WikimediaMaintenance@master] Adjust for core change I4764c1c78

https://gerrit.wikimedia.org/r/461444

I don't know if this is late to the game, but from the perspective of one who uses the http API: in the past 2 weeks or so, on en.wikipedia.org, a multi-user "usercontribs" query went from always returning immediately (sub-second) to always timing out. Is this change intended to fix that?

Redacted test case that times out on en.wikipedia.org:
GET /w/api.php?action=query&format=xml&list=usercontribs&ucuser=fakeuser1%7Cfakeuser2&uclimit=20&ucnamespace=0&ucprop=title%7Ctimestamp%7Ccomment%7Cflags&continue= HTTP/1.1

Remove the "%7Cfakeuser2" and it returns immeidately.

Change 461441 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Adjust for core change I4764c1c78

https://gerrit.wikimedia.org/r/461441

Change 461444 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Adjust for core change I4764c1c78

https://gerrit.wikimedia.org/r/461444

Change 461440 merged by jenkins-bot:
[mediawiki/core@master] ActorMigration: Remove possibility of read-both

https://gerrit.wikimedia.org/r/461440

Change 461443 merged by jenkins-bot:
[mediawiki/extensions/UserMerge@master] Adjust for core change I4764c1c78

https://gerrit.wikimedia.org/r/461443

Change 466720 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Improve some queries ordering by rev_timestamp with actor migration READ_NEW

https://gerrit.wikimedia.org/r/466720

Change 466720 merged by jenkins-bot:
[mediawiki/core@master] Improve some queries ordering by rev_timestamp with actor migration READ_NEW

https://gerrit.wikimedia.org/r/466720

Should we backport the fix (and others) to REL1_31 / REL1_32?

I don't know. rMW993baa3493f1: ActorMigration: Remove possibility of read-both is a bit of a large change, the default is MIGRATION_OLD, and if a wiki goes straight to MIGRATION_NEW they wouldn't run into this either.

Change 461442 merged by Umherirrender:
[mediawiki/extensions/Renameuser@master] Adjust for core change I4764c1c78

https://gerrit.wikimedia.org/r/461442