This task involves investigating how we might detect, in a scalable way, whether new content edits contain copyrighted material.
At a minimum, this investigation will enable the Editing Team to evaluate to what extent, if any, Paste Check (T359107) affects the proportion of new content edits (editcheck-newcontent) published by newcomers and inexperienced volunteers that are at risk of creating a copyright violation.
Requirements
Identify approaches for aggregating edits that are reverted because they pose, or are at risk of posing, a copyright violation.
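To make the target metric concrete, here is a minimal sketch of what such an aggregation could compute. It is illustrative only, not a proposed approach. The record fields (`tags`, `is_reverted`, `revert_comment`) and the keyword pattern are hypothetical assumptions; only the `editcheck-newcontent` tag comes from this ticket. Matching on revert summaries is a rough heuristic that will miss reverts without an explanatory summary and summaries in other languages.

```python
import re
from typing import Iterable, Mapping

# Hypothetical keyword pattern; real revert summaries vary by wiki and language.
COPYVIO_PATTERN = re.compile(
    r"\b(copyvio|copyright|copy[- ]?paste)\w*", re.IGNORECASE
)


def copyvio_revert_rate(edits: Iterable[Mapping]) -> float:
    """Share of new-content edits reverted with a copyright-related summary.

    Each edit is assumed to be a mapping with hypothetical keys:
    'tags' (list of str), 'is_reverted' (bool), 'revert_comment' (str or None).
    """
    total = 0
    flagged = 0
    for edit in edits:
        # Only consider edits tagged as new content.
        if "editcheck-newcontent" not in edit.get("tags", ()):
            continue
        total += 1
        summary = edit.get("revert_comment") or ""
        if edit.get("is_reverted") and COPYVIO_PATTERN.search(summary):
            flagged += 1
    # Avoid division by zero when no new-content edits are present.
    return flagged / total if total else 0.0
```

Comparing this rate between edits where Paste Check was and was not shown would give one rough signal for the evaluation described above, subject to the limitations of summary matching.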
Approaches
- TBD
- TBD
- TBD
References
- In T376064, we are asking a question similar to this ticket's. However, the approach there depends on manually reviewing a sample of edits, which we do not consider scalable. Hence this ticket.
- In T378112, we are planning to investigate whether Earwig's Copyvio Detector could be of use here.
Done
- "Approaches" that meet the "Requirements" above are documented, including:
  - What tradeoffs we'd need to consider
  - A recommendation for which approach to move forward with