Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Page MenuHomePhabricator

Page categorization logs expose user's IP
Closed, ResolvedPublic

Description

Tested on pl.wiktionary.org at version 1.26wmf19, which included T9148. Some background: we have implemented a Lua module that retrieves the latest wikitext of the page (mw.title:getContent()), searches for specific strings (let's say, a Noun or Adjective header) and categorizes the page conveniently if a match is found (adding a category link such as [[Category:Language - nouns]]). [Prior to this, no parts of speech were categorized at all as it would require creating lots of templates and including them in every single entry. Instead of that, we put the said Lua module's invocation inside a master template that was already transcluded in all entries, simplifying everything a lot.] As a side effect, an edit that causes the module to insert (or remove) a category link and, consequently, to (de)categorize the page may not update the db links tables and the page has to be purged by either performing a null edit or calling the API (/api.php?action=purge&forcelinkupdate=). The latter exposes the user's IP in the enhanced recent changes log (2nd pic) and in /api.php?action=query&list=recentchanges.

The latest two log entries correspond to an API purge:
{F1903149}{F1903215}

Event Timeline

PeterBowman raised the priority of this task from to Needs Triage.
PeterBowman updated the task description. (Show Details)
PeterBowman subscribed.
Restricted Application changed the visibility from "Public (No Login Required)" to "Custom Policy". · View Herald TranscriptAug 20 2015, 2:02 AM
Restricted Application changed the edit policy from "All Users" to "Custom Policy". · View Herald Transcript
Restricted Application added a project: acl*security. · View Herald Transcript

@PeterBowman, can you link to the lua module that is being used for this?

@Anomie, do you understand what's going on here?

greg triaged this task as Unbreak Now! priority.Aug 20 2015, 4:50 PM
greg subscribed.

Reverted https://gerrit.wikimedia.org/r/#/c/211526/ with https://gerrit.wikimedia.org/r/#/c/232764/, to stop the issue.

Now we need to cleanup the rc entries that leak this. I confirmed with Kai_WMDE that anything with mw.categorize should be from this patch. We can probably just delete the rows, although we could also set rc_user_text to 127.0.0.1 (or something else benign) when rc_user = 0.

The bigger issue is getting that query to run efficiently, since rc_source isn't indexed. @jcrespo, I'm working on a query, but if you see a good way to do it, let me know.

SAL entry for deploying wmf19 started with

2015-08-18 18:30 logmsgbot: twentyafterfour@tin Started scap: testwiki to 1.26wmf19

So only need to deal with rc's after that timestamp.

jynus suggested doing a limit until 0 rows changed, so probably a loop of

delete from recentchanges where rc_timestamp > 20150818183000 AND rc_source = 'mw.categorize' AND rc_user = 0 limit 1000

I'll write up a script so we can run it on each (non-wikipedia) wiki.

Here's what I'll run for group0.dblist and group1.dblist once we've stopped adding new rc rows

<?php

$IP = getenv( 'MW_INSTALL_PATH' );
if ( $IP === false ) {
        $IP = __DIR__ . '/../../..';
}
require_once( "$IP/maintenance/Maintenance.php" );

class DeleteRCs extends Maintenance {
        public function execute() {
                $dbw = wfGetDB( DB_MASTER );
                do {
                        $dbw->query( "DELETE FROM `recentchanges` where `rc_timestamp` > 20150818183000 AND rc_source = 'mw.categorize' AND rc_user = 0 limit 1000" );
                        echo 'updated ' . $dbw->affectedRows() . " rows\n";
                        wfWaitForSlaves();
                } while ( $dbw->affectedRows() > 0 );
        }
}

$maintClass = 'DeleteRCs';
require_once RUN_MAINTENANCE_IF_MAIN;

@Legoktm pointed out we should probably get rid of all mw.categorize rows since the patch has been reverted entirely, so that makes it:

foreachwikiindblist /srv/mediawiki-staging/group0.dblist runBatchedQuery.php "DELETE FROM recentchanges where rc_timestamp > 20150818183000 AND rc_source = 'mw.categorize' limit 1000"

Revert was deployed to all wikis, and DELETE has been run on all group0/1 wikis, so the major issue has been addressed.

@ArielGlenn, I'm not entirely sure what tables we include in the dumps, but is there a chance any recentchanges tables since Tuesday were included in any exports?

Any other replications that we need to consider before closing this?

@PeterBowman, can you link to the lua module that is being used for this?

@csteipp it's Module:nagłówek języka (< invoked from Template:nagłówek języka < transcluded by these templates, which are directly called in all entries).

From https://dumps.wikimedia.org/backup-index.html, it looks like there haven't been any dumps since the 15th, and @Reedy says we don't dump recentchanges. So I think we're ok there.

I also checked tool labs, and the deletes look like they've been replicated.

@PeterBowman, are those IP addresses in the screenshots your own? Or do we need to remove those from this task before we make it public?

@PeterBowman, are those IP addresses in the screenshots your own? Or do we need to remove those from this task before we make it public?

@csteipp no, mine was the one behind that red rectangle in the second pic, and the others are regular IP edits shown for comparison.

Screenshots removed to protect those IP's

At this point, I think we've fully recovered from the incident. I'll add an incident report and make this public, unless someone thinks of any other places we need to sanitize.

@ArielGlenn, I'm not entirely sure what tables we include in the dumps, but is there a chance any recentchanges tables since Tuesday were included in any exports?

@ArielGlenn doesn't have access to this task. Phabricator tries to be helpful by showing such users as greyed out in actual submitted comments, but doesn't do the same in previews. I guess it's complicated by the ability to add subscribers from the same form.

Do we know how a module was able to cause IPs to be logged like this?

Do we know how a module was able to cause IPs to be logged like this?

The issue was https://gerrit.wikimedia.org/r/#/c/211526/20/includes/changes/CategoryMembershipChange.php, around line 76. The categorization patch added a recent changes entry when a category was added. The code accounted for categories being added inside of parser functions, but set the user performing the change with User::newFromId(0) instead of looking into the request context, which generated an anonymous user with a "username" of the editor's IP address.

So the plwiktionary lua module (which added categories from lua) triggered this. I'm going to guess that a few other wikis were affected, but we deleted the data before that analysis, so not totally sure how broad the impact actually was.

instead of looking into the request context,

Not that looking into the request context would have made much more sense: why attribute the adding of the category to the user who happened to purge or null-edit the page?

Timeline:

  • 2015-08-18 18:30 UTC: 1.26wmf19 (which contained gerrit 211526) was deployed to group0 wikis (test, test2, mediawiki.org, zero, testwikidata)
  • 2015-08-19 18:06 UTC: 1.26wmf19 deployed to non wikipedias
  • 2015-08-19 21:49 UTC: Peter Bowman reports T109638
  • ~2015-08-20 18:00 UTC (not in SAL): @20after4 deploys revert (gerrit 232764)
  • 2015-08-20 20:25 UTC: Script to delete recentchanges rows finishes

@jcrespo - after all that work to delete the entries with the IP's... do you know what it would take to get a copy of those tables with the IP's still in them, so we can find out exact users who were affected? We can also try to reconstruct which wikis were affected from our best guess at which ones had parser functions that caused the bug, but would be nice to know for sure.

@csteipp I can give you the log table at 2015-08-20 18:00 UTC from several wikis. I need:

  • The exact timestamp, and/or something unique like the query of the first delete executed to point to a particular database event.
  • The list of wikis affected, or possibly affected
  • The query to get you the list of ip's affected

Recovering from an arbitrary point in time in the past is a relatively slow process. Luckily there is a backup from the 19th that I can roll forward, but it still has to be done for each wiki involved, and because of the size involved, it will take hours. This offers expires (because of log retention) in one month since the 20th.

Note on the impact.

I'm able to trigger this in dev when,

  • A new article is created that contains a category
  • A parser function adds a category as part of an edit

However, uploading a new file with a category in the summary seems to not trigger this (category association is correctly attributed to the username), so of the ~200k commons recent changes entries, very few should have exposed the uploader's IP.

Incident documentation to be posted to https://wikitech.wikimedia.org/wiki/Incident_documentation

I'll work with communications to get a version appropriate for posting in a more public location as well.


Summary

Between Aug 18th and Aug 20th, a patch was deployed to some wikis (group0 and group1 deployment groups) on the WMF cluster that caused MediaWiki to report IP addresses of editors who cause a category to be added to a page under certain conditions. When the category was added as part of a parser function triggered from a subsequent call to the purge API, the recent changes log reported the IP address from which the purge originated, even when called by a logged in editor. The proximity in the recent changes log of these changes to other edits could have allowed an observer to associate the IP address and username of an editor. The publication of this combination of data is considered a violation of our privacy policy.

Once the issue was confirmed and problem patch identified, the Security team immediately began the process of reverting the feature, and deleting all recent changes entries created by the patch to prevent leaking this private data further.

During the time that the patch was deployed, the API call that could have triggered this leak of private information was called 340 times (326 times on commons.wikimedia.org, 10 times on pl.wiktionary.org, and 4 times on en.wiktionary.org) by 69 different IP addresses.

Timeline

  • 2015-08-18 18:30 UTC: 1.26wmf19 (which contained Gerrit {{gerrit|211526}}) was deployed to group0 wikis (testwiki, test2wiki, mediawikiwiki, zerowiki, testwikidatawiki)
  • 2015-08-19 18:06 UTC: 1.26wmf19 deployed to non Wikipedias
  • 2015-08-19 21:49 UTC: User Peter Bowman reports {{Bug|T109638}}
  • Approximately 2015-08-20 18:00 UTC: Release Engineering deploys revert (Gerrit {{gerrit|232764}})
  • 2015-08-20 20:25 UTC: Script to delete recentchanges rows finishes

Conclusions

The patch that caused the issue was created, reviewed, and merged by experienced staff and volunteer developers, yet each of these developers failed to identify the potential security impact of the design. This indicates that the security ramifications of the functions involved are likely not structured or documented in a way that makes the issues clear, and this particular code pattern has not been highlighted as dangerous in documentation or training. Additionally, there are currently no automated or manual testing procedures specifically to identify privacy issues in new version of MediaWiki.

Actionables

In the near term, we will add notice about User::newFromId(0) to secure code training {{Bug|T110620}}, and investigate updating the function definition or documentation to make the security impact more clear to developers.

Longer term, we will investigate using static analysis tools to automatically flag this code pattern (depends on {{Bug|T110617}}, and look at developing security training specifically for QA testers, which would include looking for privacy issues.

[[Category:Incident documentation]]

User::newFromId( 0 ) causing issues rings a bell...

@csteipp: seems mostly good to me. Should probably replace reference to @20after4 with something that works outside of Phabricator. Also, please remove references to people's affiliations (first paragraph of 'Conclusions'), and replace "WMF Staff" with "authorised people" or something along those lines.

@jcrespo, it would be great if you could roll forward the plwiktionary just to get an idea of how many users were affected there to start. Potentially all non-wikipedias could have been affected, but I'd like to make sure we can actually find them on plwiki before looking through all the other db's.

The table should get rolled forward to about 2015-08-20 18:00, or just before the first delete which was an hour or so after that (I'm not sure the exact time). The exact query should have been "DELETE FROM recentchanges where rc_timestamp > 20150818183000 AND rc_source = 'mw.categorize' limit 1000".

To get a list of the users affected, I think the query should be something like,

SELECT DISTINCT a.rc_user FROM recentchanges AS a JOIN recentchanges AS b ON ( a.rc_namespace = b.rc_namespace AND a.rc_title = b.rc_title AND b.rc_id > a.rc_id ) WHERE b.rc_user = 0 AND a.rc_user > 0 AND b.rc_source = 'mw.categorize' AND a.rc_ip = b.rc_ip;

But that may need some help to actually run.

@csteipp I can give you the log table at 2015-08-20 18:00 UTC from several wikis. I need:

  • The exact timestamp, and/or something unique like the query of the first delete executed to point to a particular database event.
  • The list of wikis affected, or possibly affected
  • The query to get you the list of ip's affected

But that may need some help to actually run.

That is ok, I do not need exact data (I will provide that), but something to get me started. I will try now with plwiktionary, will get you back to you if I need more info.

Given the size of this feature (It touches 27 files), perhaps it should not have rode the MediaWiki deployment train, but instead been behind a feature flag, and restricted to mw.org and test.wikipedia.org for a week in order to get increased testing before wide deployment. This obviously wouldn't have prevented the issue, and its unclear if the issue would still have been identified with the lesser amount of testing it would have received restricted to those wikis (Although given that it appears that @Krinkle independently [I assume independently, since he hasn't commented on this bug] discovered this bug, albeit perhaps did not realize the security implications, I feel like a week of testing on mw.org would have revealed this issue), and assuming it did get discovered during a slow rollout, such a slow rollout would have reduced the impact of this bug quite a bit.

Ok, here is what I got. I rolled forward a backup up to binlog position 606217714, because that is where I found the first delete event (at 2015-08-20 19:15:58):

# at 606217714
#150820 19:15:58 server id 101627  end_log_pos 606217925        Query   thread_id=813712648     exec_time=0     error_code=0
use `mediawikiwiki`/*!*/;
SET TIMESTAMP=1440098158/*!*/;
SET @@session.pseudo_thread_id=813712648/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C binary *//*!*/;
SET @@session.character_set_client=63,@@session.collation_connection=63,@@session.collation_server=63/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
DELETE /* BatchedQueryRunner::execute  */ FROM recentchanges where rc_timestamp > 20150818183000 AND rc_source = 'mw.categorize' limit 1000

(please note that the delete is on mediawikiwiki, the table recovered is plwiktionary)

At that point there are 37K events on recent changes (similar to right now, where I counted 36K), 1219 of those, mw.categorize events (0 right now):

MariaDB MISC m1 localhost plwiktionary > SELECT count(*) FROM recentchanges;
+----------+
| count(*) |
+----------+
|    36891 |
+----------+
1 row in set (0.01 sec)
MariaDB MISC m1 localhost plwiktionary > SELECT count(*) FROM recentchanges WHERE rc_source = 'mw.categorize';
+----------+
| count(*) |
+----------+
|     1219 |
+----------+
1 row in set (0.03 sec)

And your query returns 1 row:

MariaDB MISC m1 localhost plwiktionary > SELECT DISTINCT a.rc_user FROM recentchanges AS a JOIN recentchanges AS b ON ( a.rc_namespace = b.rc_namespace AND a.rc_title = b.rc_title AND b.rc_id > a.rc_id ) WHERE b.rc_user = 0 AND a.rc_user > 0 AND b.rc_source = 'mw.categorize' AND a.rc_ip = b.rc_ip;
+---------+
| rc_user |
+---------+
|   23*** |  (sanitized just in case)
+---------+
1 row in set (0.36 sec)

I can provide you access to this table, if needed.

Disclaimers: I cannot guarantee 100% the validity of the results for the following reasons:

  • I only rolled forward the recentchanges table. If some insertion there depends on other tables (INSERT...SELECT), those would have been be ignored
  • I didn't have into account deleted events (shouldn't affect the total) or events added while the deletion were ongoing

@jcrespo, Can you add the full user id here? That is not sensitive. Want to
see if that is the original poster. If so, this might be much less impact
than we thought. If not, then my query is probably off..

sure: 23133

Let me help with the query if you find me live, but I will probably wind down for the day today.

That matches up with the original report, so it appears that Peter was possibly the only person affected by this on plwiktionary.

In light of that, I'd like to check meta, enwiktionary, and possibly commons to see how many users we affected on those wikis. If we don't see users affected on those wikis, this may be a non-incident.

@jcrespo, totally understandable that you're off for the night. If you can run those on Monday that would be helpful. We'll hold off on making this public until then.

I'm able to trigger this in dev when,

A new article is created that contains a category
A parser function adds a category as part of an edit

I looked back at this while working out the query for jaime, and realize that I got logged out part way through my testing. So I have not actually been able to duplicate yet.

Ok, able to reproduce, but only by using action=purge&forcelinkupdate=, not null edits.

Conclusions

My conclusions about the cause of this, after reading the code:

*The author was unaware of the proper way to create a dummy user in the case that the dummy user is inserted into the database. (Not surprising, that's documented pretty much nowhere)
*It was assumed that either User::newFromId(0) did something other than what it does, or that the fields of the recentchanges table aren't readable except via the code that generates Special:Recentchanges. It may be worthwhile to add to the security documentation, that you should assume all data in db is public (via tool labs, api, etc) unless the field in question is explicitly documented as being private data.

@jcrespo, I pulled the number of potentially affected users from the API logs, which gives us an upper limit to the number of times users could have been affected across these wikis-- only 69 potential users impacted, and those were only on commons, plwiktionary, and enwiktionary. In light of that, I don't think it's worth the time restoring each of those tables and finding out the exact users affected.

@Bawolff / @Krenair, thanks for the feedback. I've tried to incorporate your comments, and the comments from Comms, into the final incident report. I'll post that on wikitech soon.

csteipp changed the visibility from "Custom Policy" to "Public (No Login Required)".Sep 1 2015, 6:15 PM
csteipp changed the edit policy from "Custom Policy" to "All Users".
csteipp changed Security from Software security bug to None.

Change 239065 had a related patch set (by Addshore) published:
Enable users to watch category membership changes #2

https://gerrit.wikimedia.org/r/239065

Change 233441 abandoned by Addshore:
Enable users to watch category membership changes #2

Reason:
Please see the following changes which have been factored out of this one:
https://gerrit.wikimedia.org/r/#/c/239061/
https://gerrit.wikimedia.org/r/#/c/239060/
https://gerrit.wikimedia.org/r/#/c/239065/

https://gerrit.wikimedia.org/r/233441

Change 239065 merged by jenkins-bot:
Enable users to watch category membership changes #2

https://gerrit.wikimedia.org/r/239065