
[SPIKE] Investigate approaches for evaluating presence/absence of copyrighted material in new content edits
Open, Needs Triage, Public

Description

This task involves investigating how we might detect, in a scalable way, the presence/absence of copyrighted material in new content edits.

At a minimum, this will enable the Editing Team to evaluate the extent, if any, to which Paste Check (T359107) affects the proportion of new content edits (editcheck-newcontent) published by newcomers and inexperienced volunteers that are at risk of creating a copyright violation.

Requirements

Identify approaches for aggregating edits that are reverted because they pose, or are at risk of posing, a copyright violation.
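
To make the requirement above concrete, here is a minimal sketch of what "aggregating" could look like if revert summaries were used as the signal. Everything in it is an assumption made for illustration: the `Edit` record, the `COPYVIO_PATTERN` keyword list, and the function names are hypothetical and not part of any existing tool. Real revert summaries vary by wiki and language, so keyword matching would likely both miss and over-count cases.

```python
import re
from dataclasses import dataclass
from typing import Iterable

# Hypothetical keyword pattern; an assumption, not a vetted list.
COPYVIO_PATTERN = re.compile(r"copyvio|copyright", re.IGNORECASE)


@dataclass
class Edit:
    """Hypothetical record for one new content edit."""
    rev_id: int
    reverted: bool
    revert_summary: str = ""


def is_copyvio_revert(edit: Edit) -> bool:
    """True if the edit was reverted and the summary mentions copyright."""
    return edit.reverted and bool(COPYVIO_PATTERN.search(edit.revert_summary))


def copyvio_revert_rate(edits: Iterable[Edit]) -> float:
    """Share of edits reverted with a copyright-related summary."""
    edits = list(edits)
    if not edits:
        return 0.0
    return sum(is_copyvio_revert(e) for e in edits) / len(edits)
```

Comparing this rate between edits made with and without Paste Check would be one way to frame the evaluation, subject to the caveats above.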

Approaches

  • TBD
  • TBD
  • TBD

References

  • T376064 asks a question similar to this ticket's. However, the approach there depends on manually reviewing a sample of edits, which we do not consider scalable. Hence this ticket.
  • In T378112, we are planning to investigate whether Earwig's Copyvio Detector could be of use here.

Done

  • "Approaches" that meet the "Requirements" above are documented, including:
    • What tradeoffs we'd need to consider
    • A recommendation for which approach to move forward with