Epic: Fix importation of bogus authors and cleanup old data #331

Open
LeadSongDog opened this issue Sep 25, 2022 · 0 comments


There are tens of thousands of bogus author records whose names match the patterns "* Publishing" or "* Books". Somewhat fewer match "* Editions" or the equivalents in other languages.
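As a starting point for finding these records, here is a minimal sketch of a name check. It assumes the three English suffixes named above are enough for a first pass; `looks_like_publisher` and `PUBLISHER_WORDS` are hypothetical names, not part of the Open Library codebase, and the other-language equivalents would have to be added by hand.

```python
import re

# Only the three English words mentioned in this issue; other-language
# equivalents are deliberately left out and would need to be added.
PUBLISHER_WORDS = ("Publishing", "Books", "Editions")

# Require at least one preceding word (the "* " part of the pattern),
# so a bare name such as "Books" is not flagged.
_PUBLISHER_RE = re.compile(
    r"\S\s+(?:" + "|".join(map(re.escape, PUBLISHER_WORDS)) + r")\b",
    re.IGNORECASE,
)


def looks_like_publisher(name: str) -> bool:
    """Return True if the author name contains a publisher-style word."""
    return _PUBLISHER_RE.search(name) is not None
```

This is only a name-based filter, so it will produce false positives (a real person whose surname is Books, for example) and should feed a review list rather than a delete list.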

Many originate from imports of low-quality records from BWB (Better World Books) or AMZ (Amazon), such as https://www.betterworldbooks.com/product/detail/9783110367737
which was imported as
https://openlibrary.org/books/OL34526350M/Quantenmechanik
where the authors include
https://openlibrary.org/authors/OL9711355A/Perseus_Books_Perseus_Books_LLC.

Many (about 30%) of these author records have no associated work record. Those are low-hanging fruit that could simply be bulk removed.
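A rough sketch of how that split could look, assuming each flagged author has already been summarised with a work count. `AuthorSummary` and its `work_count` field are hypothetical; how the count is obtained (data dump, search index, or database query) is left open here.

```python
from typing import Iterable, List, NamedTuple, Tuple


class AuthorSummary(NamedTuple):
    key: str         # author key, e.g. "/authors/EXAMPLE1" (placeholder)
    name: str
    work_count: int  # hypothetical: number of works linked to this author


def split_by_work_count(
    authors: Iterable[AuthorSummary],
) -> Tuple[List[AuthorSummary], List[AuthorSummary]]:
    """Separate authors with no works from those that need a closer look."""
    orphans: List[AuthorSummary] = []
    needs_review: List[AuthorSummary] = []
    for author in authors:
        if author.work_count == 0:
            orphans.append(author)       # candidate for bulk removal
        else:
            needs_review.append(author)  # has works; handle separately
    return orphans, needs_review
```

Only the first list would be a candidate for bulk removal; the second list covers the harder cases described below.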

More have only work records that are misattributed: the "author" is one of these publisher names, and the "publisher" is shown as "Independently Published", "CreateSpace", or the like. For these, there is often another, correct work record with a similar title that shows the correct authorship. Some heuristics might help with these.
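One possible heuristic, sketched under the assumption that comparing normalised titles is a good enough first signal: score each candidate title against the suspect one with the standard library's `difflib.SequenceMatcher` and keep the close matches. The function names and the 0.9 threshold are illustrative guesses, not tuned values.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, List


def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    without_punct = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(without_punct.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalised titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def find_likely_duplicates(
    suspect_title: str,
    candidate_titles: Iterable[str],
    threshold: float = 0.9,  # assumed cut-off; would need tuning on real data
) -> List[str]:
    """Return candidate titles that closely match the suspect title."""
    return [
        title
        for title in candidate_titles
        if title_similarity(suspect_title, title) >= threshold
    ]
```

A title match alone does not prove two records describe the same work, so a match would best be treated as a suggestion for a merge review, ideally combined with other signals such as ISBN or publication date.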

A substantial group, however, are corporate authorships: works written by publisher staff writers with no public attribution to an individual. This is particularly common in bibliographies, reference works, study notes, and textbooks.

Suggestions?
