Wikipedia:Wikipedia Signpost/Single/2024-01-31

From Wikipedia, the free encyclopedia
The Signpost
Single-page Edition
WP:POST/1
31 January 2024

 

Image: File:Osama Khalid 2016.jpg, by Fjmustak, CC-BY-SA-4.0

Wikipedian Osama Khalid celebrated his 30th birthday in jail

Osama Khalid (left) together with fellow Wikipedian Ziyad Alsufyani: both of them are currently imprisoned in Saudi Arabia (see previous Signpost coverage).

As reported by Palestinian Wikimedian Farah Jack Mustaklem, on 19 January 2024 Osama Khalid, a Saudi pediatrician, blogger and Wikipedia volunteer, turned 30 years old while still being detained in al-Ha'ir Prison, together with fellow doctor and Wikipedian Ziyad Alsufyani.

As detailed in previous Signpost coverage, Osama and Ziyad had both been arrested in July 2020 and sentenced, respectively, to 5 and 8 years in prison, with the former’s sentence having been increased to 32 years on appeal in September 2022; however, reports about their incarceration first went public in January 2023. The two, whose arrest and detention are considered to be connected to the Saudi government’s recent crackdown on online dissent, had both served as administrators on the Arabic Wikipedia for several years, while being deeply involved within the Wiki Project Med Foundation and contributing to several articles about Saudi human rights activists, such as Loujain al-Hathloul.

Nine different civil rights organizations, including Access Now and Digital Citizenship, have co-signed an open letter dated 19 January 2024 asking for the immediate release of Osama, Ziyad and all of the other activists currently detained in Saudi Arabian prisons. – O and AK

Child Rights Impact Assessment

Title page of WMF Child Rights Impact Assessment Report 2023 (pdf)
Article One – WMF Child Rights Impact Assessment Report 2023 (English)

The Wikimedia Foundation has announced the publication of a Child Rights Impact Assessment, described as an independent report commissioned from consultancy Article One to "understand the impacts, risks, and opportunities posed to children who access and participate in Wikimedia projects":

While the Wikimedia Foundation’s commitments to privacy and data minimization make it impossible to know just how many young readers and editors there are on Wikimedia projects, an untold number of people less than 18 years of age seek out verifiable, encyclopedic information on Wikipedia. [...] Some of them progress from readers to editors as they start to contribute their own knowledge to these pages. Protecting child safety, both of readers and editors, is a top priority not only for the Foundation, but also for Wikimedia community groups and affiliates around the globe. [...]

This Child Rights Impact Assessment (CRIA) is the latest initiative that we have undertaken in order to meet our commitment to protect and uphold the human rights of all those who interact with Wikimedia projects. In 2020, the Foundation carried out an organization-wide Human Rights Impact Assessment (HRIA), with a report and update on progress published in 2022. A key recommendation of that assessment was to conduct a targeted CRIA, which could help us to better understand the benefits to and risks to children participating in Wikimedia projects.

The Foundation—in partnership with Article One, a specialized strategy and management consultancy with expertise in human rights, responsible innovation, and sustainability—started work on the CRIA in late 2021, and completed it in March 2023. The publication of this report (redacted to protect security and privacy of volunteers, readers, and those who were interviewed for the report) represents both a continuation of the Foundation’s commitments to human rights, and an important opportunity to revitalize conversations across the Wikimedia movement around how to best protect children on Wikimedia projects.

– AK

Chinese Wikipedia logo

Chinese Wikipedia

An article (Google translation) in the German Wikipedia's Kurier notes that the Chinese Wikipedia (zh:WP) has decided:

  • to introduce an ArbCom;
  • to lower the RfA approval threshold from the previous 80% to about 70–75%;
  • to trial probationary adminship for candidates getting at least 50% approval.

The Chinese Wikipedia has apparently been in administrative crisis ever since the 2021 WMF desysops (see previous coverage in the 26 September 2021 Signpost issue: 1, 2, 3). The above measures have been designed to mitigate the situation. – AK

Brief notes

The Deoband Community Wikimedia ran a "Heritage Lens" project last year. The above picture is from the project's second iteration in December 2023, focused on South Kashmir, a region in the Kashmir Valley.




Image: File:DALL-E_manga_style_copying_work.jpg, by Bri, PD

Until it happens to you

In the last issue of The Signpost, Smallbones addressed the explosive public fallout between Business Insider and Bill Ackman. This arose from the newspaper's exclusive story about the hedge fund billionaire's wife, Neri Oxman, who had been accused of plagiarizing from several academic and online sources (including Wikipedia) without any attribution. In a series of tweets on his Twitter profile, Ackman quite vehemently defended his wife from the accusations, and questioned whether somebody could even plagiarize Wikipedia to begin with.

Bill Ackman, the man who inadvertently inspired both an article in our last issue and mine with his claims about Wikipedia.

Now, the claims of Ackman and several other commentators who dived into the original discussion have already been tackled by Smallbones in his brief piece, as well as by fellow Wikipedian Molly White, who explained how Wikipedia ultimately works in a detailed YouTube video. If you haven't already, I'd suggest you check both sources out before moving on. In this article, though, I'd like to reflect on the same theme from a slightly different perspective, which involves a user (me, myself and I, more specifically) who did fall victim to plagiarism.

So, to be fair, I come from a generation of pain, where murder is mi– [Vinyl scratch noise] No, wait, I've picked up the wrong script, I'm sorry. Allow me to do a second take, please… So, to be fair, I come from a generation that is considered to be very used to the dynamics of the Internet, and rightfully so: in some instances, maybe we're even too tied to the online world. This, though, doesn't necessarily mean teens and people in their twenties are better at selecting, double-checking and, most importantly, citing their sources. In fact, I made my own mistakes as a little kid, and I suspect that there are many more students across the globe who have copied from Wikipedia or other sites for their school projects/assignments in good faith (hopefully), without knowing that using those text blocks without proper attribution potentially violates the CC BY-SA license and copyright law. Although I've been lucky to have high school and university teachers who emphasized the importance of declaring and checking your sources, I had never fully understood how serious this aspect is until I got more familiar with fact-checking, while also continuing to learn through my experiences on Genius, and then here on Wikipedia. This summer, finally, I had kind of an epiphany in this sense.

Remember to be honest about your sources, kids (and youngsters, and every other generation)...
DALL-E 3, prompt: Bri

So, on both the English and Italian Wikipedia I very often cover articles related to football (no, not this football... not this, either... ah, there we go!). Among other things, I've also tried creating new pages from scratch, including the one about French-Malian footballer Coli Saco, which first saw the light of day roughly a year ago. In August 2023, on the “deadline day” of the summer transfer window, Saco was sent on loan to an Italian club, whose name will be kept undisclosed here (you can see for yourself, anyway). Geregen2 updated the page first, but I still wanted to check the official announcement by the club, out of pure curiosity. As soon as I started reading through the text, I was like, "Hmm… looks familiar, but they still did their research!" After a closer look, I realized it was more than just familiar: in fact, whoever wrote the announcement most likely copied the information from Saco's article, slapped it into a translator, trimmed it down slightly and, finally, pasted it on the club's website.

"They... they c-c-copied... my p-precious boy... How could you be so cruel?", I mumbled in desperation, as I felt my mind deeply descending into the arms of– [Vinyl scratch noise] Just kidding, I wasn't too bothered by that, to be honest. Nevertheless, it was quite evident that the article had been plagiarized, even when assuming good faith one more time and imagining that the club's website admin was likely scrambling to get the press release done in a reasonable amount of time. By the way, for anyone who's not too familiar with association football, this is just one of the many stressful phases of transfers.

Back to the topic, though. Was I annoyed by seeing "my" article[1] getting copied so blatantly? Yes.

Was that an example of the Italian media's chronic bad habit of not double-checking their sources, or even not citing them appropriately, before publishing their articles? Yes, kind of.

Is this incident as bad as the one that is currently putting the career of Neri Oxman at high risk? Well, no, not even close. Let's try to let it sink in for a moment, though.

Do you remember I listed fact-checking as one of the reasons why I started taking sources and citations more seriously? That's because it helped me discover not only how to recognize and debunk fake news, but also how much damage it can do if left unchecked. Wikipedia isn't immune to disinformation, either: just last July, we reported on the series of over-enthusiastic edits made by a suspicious user on the article about OceanGate, which (disturbingly enough) might have played a role in the tragedy of the Titan submersible implosion. Obviously, not all lies are this dangerous, but putting in place a solid system to detect and tackle them, as Wikipedia volunteers have done, still plays a key role in preserving a community built on trust, reciprocal respect, reliability and neutrality.

The same goes for plagiarism: whether we're talking about a plain and simple Wikipedia page, an article from a respected newspaper, your schoolbook or the Sacred Scriptures, they have all been written by someone who (hopefully) cared about the information or the message they had intended to convey. They might not necessarily take you to court if you don't give them credit, but in most cases, it could hurt their feelings, and you might not realize it until it happens to you.

So, even if we're all just piles of flesh, blood and bones trapped in a life-long state of imperfection, a.k.a. humans, or at least until artificial intelligence has improved so much that we'll be toe-to-toe with humanoid versions of HAL 9000 who will constantly threaten to destroy us if we don't take them to eat the best Bolognese spaghetti in the world every freaking day... [Inhaling intensely] In other words, even if we all make mistakes or go out of character sometimes, we should always remember to disclose the sources that are helping us in our research, if only out of respect for the people behind them. It could make their day, but also help us nurture that positive cycle of trust, accountability and quality information we all desperately need in these challenging times.

  1. ^ Technically, as we all know, Wikipedia articles are not an exclusive property of their creators; rather, they are shared between everyone who decides to improve them. Still, I hope you've got what I'm saying here...




Image: File:Pickpocket warning sign, train station, Turin, Italy (17783621312) (fixed).jpg, by Cory Doctorow, CC-BY-SA-2.0

How paid editors squeeze you dry

Red, black and white placard warning sign displayed in a public facility, with surrounding text "SECURITY WARNING"
Keep an eye on your wallet

The Signpost has identified an extensive scam perpetrated by a company that calls itself "Elite Wiki Writers" or "Wiki Moderator", among many other names. Some of the other names they are suspected of using include wikicuratorz.com, wikiscribes.com, wikimastery.com, and wikimediafoundetion.com.

Annie Rauwerda described the general situation in a series of tweets a year ago. Her recommendation:

You should know that 99% of all "Wikipedia editing companies" are scams that charge you $1000 for articles then never write them. Do not give them your money
— Annie Rauwerda

That "99%" may be an unscientific estimate, but it's not far from our own estimate of over 95%. Wikipedia has made great strides in fighting this type of paid editor. But we've barely made a start with another part of the problem.

Shaun Spalding, legal counsel at the Wikimedia Foundation (WMF), told The Signpost:

The volunteer community is extremely diligent about finding and deleting any promotional material that marketing companies try to post on Wikipedia. Thanks to the community, paid editing firms rarely succeed in influencing the actual content on Wikipedia. There are paid editing firms that we are aware of with a 0% success rate. Unfortunately, victims of these scams don't know this, and many paid editing firms act in a predatory manner taking advantage of this. If anyone receives an email or social media message from someone offering to make a Wikipedia page for you, then it is almost certainly a scam.

My advice is to be extremely skeptical of anyone claiming that they can guarantee to make a Wikipedia article for or about you, even if they claim to be an administrator. It is likely that they are not telling the truth. The portfolios that they might send you as "proof" of their experience are almost certainly fabricated. As are the physical addresses that they list on their websites; they don't list their real addresses because they don't want to get caught by the Foundation. There are no paid editing firms that have a relationship with the Wikimedia Foundation.

To maintain the privacy of the victims of the scam, we have not linked to any website that might embarrass them, and have not revealed their names without their permission.

Both "Elite Wiki Writers" and "Wiki Moderator" list the same address, 99 Wall Street in New York City — which appears to be a maildrop rather than a real office — and a second address in Skokie, Illinois. The company appears to have preyed upon more than 100 customers in 2023, according to a customer list obtained by The Signpost.

These customers include many small businesses (such as a single-store painting supply company), writers and artists, churches and religious publishers, a couple of lawyers, some little-known financial firms, nonprofits, a private detective agency, young entrepreneurs, former government officials and retired military officers — the whole range of people who would like some publicity and aspire to a Wikipedia article.

These people are certainly victims. Less than 5% of them succeed in getting an article on Wikipedia. Prices start at about $750, and quickly escalate to $1,500 or $10,000 — or more — as the "Elite Wiki Writers" claim that extra work is needed, or requirements have been raised by Wikipedia. We estimate that the proceeds of this scam in 2023 were at least $500,000, and perhaps well over $1,000,000.

The low number of commissioned articles posted as actual articles (as opposed to drafts or user pages) could be from a lack of trying. There's little evidence that many of the proposed articles are written up as drafts at all, and if they are, they're commonly left without improvement until they are deleted automatically.

In 2021, three editors who claimed to work for Elite Wiki Writers (or related firms) did post a total of 71 articles as drafts or articles, in a clumsy attempt to become rule-abiding "declared" paid editors. They were globally locked by stewards following a sockpuppet investigation in which 41 editors were confirmed as sockpuppets of CharmenderDeol and blocked. The reason the three declared paid editors were globally locked is likely that they were sockpuppets secretly controlled by undeclared paid editors.

How can you tell who is honest?

How is somebody who wants to pay for an article on Wikipedia to know which websites are legitimate and which are scammers? As a rule, potential customers of these firms should be very skeptical of "Wikipedia editing firms" who advertise on the web. They nearly all show some signs of a scam.

For example, in this archived homepage of Elite Wiki Writers, you'll see several drawings and illustrations taken from Wikipedia, including the Wikipedia trademark known as the "puzzle globe" — albeit one much the worse for wear.

Wikipedia's puzzle globe symbol is a trademark. Permission to use is governed by WMF trademark policy.

A good first question for a potential customer to ask Elite Wiki Writers might be: "have you complied with the license on those illustrations and do you have permission to use the trademarked puzzle globe?" If they answer "yes" on the trademark, you can contact the Wikimedia Foundation's legal department at one of the email addresses listed here. If they answer "no", you'll know that they don't follow the rules.

Spalding says:

You do not need to pay someone to get a Wikipedia page; the majority of firms that want you to pay them to get one are scams. They will not succeed in delivering a live Wikipedia page. The very small handful of legitimate marketing companies and reputation management firms engaged in this work are subject to the new "Marketing Company Mediation" provision added to the terms of use when it was updated in mid-2023. The updated terms of use have enabled the Foundation to increase our enforcement of the community's undisclosed paid editing rules in the last six months. We will continue to refine and resource this enforcement in 2024.

These are some tips to help people identify paid editing scams:

  • If they reached out to you (via email or message) rather than you reaching out to them, it's likely a scam.
  • If the company doesn't have all of their user account names posted on their sales pages, then it's likely a scam since this is required by the terms of use. This goes for websites as well as things like freelancing profiles on Upwork.
  • Since these companies tend to lie and impersonate existing long-time editors, even if they mention their accounts, one should be additionally skeptical.

These are my most impactful suggestions for people who want to "do something" to help the problem:

  • If someone is scammed or has been scammed in the near past, they might be able to charge back the fraudulent transaction on their credit card.
  • If someone receives a spam email from someone offering paid editing services, they should report the email to their email provider as spam to prevent it from going to others.
  • If someone receives a LinkedIn or social media message with an offer for paid editing, they should report the profile to the social media platform.

We have been here before

Scams by "paid editing companies" have been happening on Wikipedia since at least the 2015 Operation Orangemoody scandal, which was documented by the Wikimedia Foundation, as well as by the Guardian, Independent, and Signpost.

The Orangemoody scam worked like an extortion racket. Targeted articles would be nominated for deletion, or denied approval for publication. Then other editors, presumably working for the same firm, would offer their services to reinstate the article and "protect" it from deletion or unwanted changes — for a monthly charge. In reality, they couldn't protect anything, they didn't protect anything and their victims had no way to get the money back. See this warning for further details.

So another sign that a company might be scamming you is if they promise to "protect" the article. See Wikipedia's policy Ownership of content for why this is not allowed.

The current scam is much simpler, and doesn't involve extortion. The company advertises on its own websites and via email, or approaches people through social media sites such as LinkedIn.

They then quickly write a low-quality article, sending the customers a copy of the text. The scammer bills them, telling their victims that approval of the article by Wikipedia will take some time. Once they've received payment, rather than going through the effort of trying to publish the article (and the risk of getting caught), the scammer may simply abandon the article, keeping the money. When the customers complain, the scammer blames the delay on Wikipedia, or tells the victim that other paid services are needed to ensure publication.

What can a fake administrator tell you?

While researching this article on wikimoderator.com, this reporter was asked to chat online by a purported "Sr. Wikipedia administrator." The transcript of the discussion, generated by the website, follows. Only the names of this reporter (Visitorxxx) and the "Sr. Wikipedia administrator" (GreySWA) have been changed. A link to a real administrator's page has also been removed. The real administrator denies any association with the firm, and says they have never edited for pay.

Chat Transcript with Visitorxxx

Chat started on 02 Nov 2023, 12:38 AM (GMT+0)
12:38:35 *** GreySWA joined the chat ***
12:38:35 *** Visitorxxx joined the chat ***
12:38:42 GreySWA    Hi there!
12:38:44 GreySWA    How are you?
12:38:44 GreySWA    Are you looking to create a personal or business Wikipedia page?
12:39:40 Visitorxxx I'm just interested - how do I know you are a "Sr. Wikipedia Administrator"
12:39:51 GreySWA    Sure
12:40:02 GreySWA    May I have your name, please?
12:40:21 GreySWA    I can share my Wikipedia ID with you
12:41:14 GreySWA    My Wikipedia ID: (link to Wikipedia administrator's userpage removed)
12:41:36 GreySWA    Now may I have your name and details?
12:42:23 Visitorxxx I just want to be sure I'm dealing with a reputable person here
12:43:05 GreySWA    I can understand, I've been on Wikipedia for 17 years now and I have published over 900 pages.
12:43:34 Visitorxxx Are you really a Sr. Wikipedia Administrator?
12:44:12 GreySWA    Yes, I guess so
12:44:34 GreySWA    I provided you my ID, You may take a look yourself
12:46:12 GreySWA    I didn't catch your name.
12:47:16 Visitorxxx I'm looking up your history. Are you a paid editor, or does somebody else handle that?
12:47:54 GreySWA    We have a team of 10-15 editors and administrators.
12:48:49 Visitorxxx Wikipedia editors and administrators?
12:48:59 GreySWA    Yes
12:49:33 Visitorxxx Can you give me one that has made a paid editing declaration?
12:50:05 GreySWA    No. Unfortunately, I can not.
12:50:53 GreySWA    You're asking for all of this sensitive information yet not even telling your name.
12:52:11 Visitorxxx I just don't want to have anything to do with a disreputable organization. I'd want any article I pay for to be strictly above board
12:53:10 GreySWA    There won't be any mention that It is a paid article. It will be independent.
12:53:35 GreySWA    Part of the fee goes directly to other administrators to approve your page.
12:54:19 GreySWA    Hence, There is no mention that it is a sponsored article or that you paid for it.
12:55:32 Visitorxxx I think I'll go check around to see if another company looks better. Can you make any recommendations?
12:56:27 GreySWA    Sure, You may have a look around and do your due diligence.
12:57:13 GreySWA    Sorry, I don't have any recommendations.
12:58:29 Visitorxxx OK, thanks
01:03:51 *** Visitorxxx left the chat ***

Later, on the site elitewikiwriters.com, a different website apparently owned by the same company, I was invited to chat again by somebody giving the same name (shown as "GreySWA" above). That transcript shows much the same wording in places, possibly indicating that the same sales script was used.

The transcript indicates that the company was breaking many of Wikipedia's rules. According to the policy for paid contribution disclosure, every paid editor must declare when they are being paid and include the name of their employer, the client, and other affiliated parties. There are no exceptions for the client or the employer. Administrators who edit for pay must also declare their paid status.

Some questions a potential client might wish to ask are:

  • Are you required to disclose my name on Wikipedia when I pay you to post an article?
  • Are administrators required to declare if they have been paid when they edit or accept a paid-for article?
  • Can you direct me to one of your paid editors' declaration of their paid status?

If they answer any of these questions "no", then they are likely trying to scam you.

How they were ripped off

Many of the apparent scam victims on the Elite Wiki Writers customer list feared harassment, or were too embarrassed to be quoted by name in The Signpost.

One victim, whom we will call "Melissa", did not wish to use her real name because she feared harassment. When asked how she learned about Elite Wiki Writers, she told The Signpost that she was first approached on LinkedIn. "It was a fake photo and profile," she said, which has since been removed. She was sold the basic + startup package last fall, which was to include a personal Wikipedia page and a page about her business, costing over $2,000.

Another victim — let's call him "Jared" — asked not to be identified, but gave The Signpost an extensive interview. Jared is clearly a notable subject for a Wikipedia biography. He's had two successful careers: the first as a lawyer and judge, then as an author.

He told The Signpost that he first ran into the article about him in Wikipedia several years ago, some time after it was first published. He didn't like the tag on the top about a possible conflict-of-interest by an editor. As time went by it started to look out-of-date, and he especially didn't like the photo. He would also have liked to see more about his career as a lawyer and a judge. But mostly he just wanted to see something that his grandchildren could read and be proud of.

He was approached out of the blue on LinkedIn by a woman who offered to rewrite the article for a fee. He must have asked too many questions, and she eventually dropped the discussion. Then a new person appeared on LinkedIn to help rope him in. After some discussion about the article, the new guy said that he was moving to Australia, but gave Jared a contact at Elite Wiki Writers.

Celeste Mergens founded the nonprofit Days for Girls in 2008, and was its CEO until she retired in 2022. She's since written a book on the organization and its work, and is already promoting it with a book tour. Unfortunately, she's not familiar with Wikipedia's rules. She thought having a biography of her own on Wikipedia would help book sales, so she contacted Elite Wiki Writers, who promised that she would not have to reveal that she paid for the article because they "use legitimate Wiki-moderators". She even got a $100 discount off the $750 fee for sending in a detailed draft.

Yvonna Cazares is a community organizer who has previously held several community-oriented positions in California's state and local governments. Her first contact of any sort with Elite Wiki Writers was when she was approached on LinkedIn. She told The Signpost that her main interest in Elite Wiki Writers was having them provide editorial services for a book that was near completion. They offered her a Wikipedia article as part of a package deal. The editorial services are a sideline that is not currently offered on the Elite Wiki Writers website. But following the online chats shown above with "GreySWA", they offered this reporter the same type of services.

Elite Wiki Writers offered Cazares several book editing packages, costing up to $8,500. Due to the low quality of the services and high cost, she soon believed that she was being scammed. She had difficulty getting a bank to accept a credit card payment, even though she had previously seen no problems with her credit card. Many credit card issuers have an effective system for avoiding liability for fraudulent credit card transactions. Charges coming from merchants who have had too many recent customer complaints can trigger a refusal of the transaction during the time you're waiting on the phone. Apparently, this is what happened to one of Cazares's transactions. She later tried to cancel her previous transactions, and with determination and some luck she is now off the hook. Luckily, she had saved the documentation on all the credit card payments and refusals.

Both Mergens and Jared had similar difficulties with credit card payments, even though neither had any previous problems with their credit cards.

Mergens agreed to have the company write the article about her, but later was told that "Wikipedia moderators" now require eight citations, three more than the draft then had. And they would be willing to write the needed stories and quickly place them in top quality publications for only $5,000 apiece. Mergens settled for three "C-level" articles for only $1,500 apiece.

The surprises kept coming. She was dissatisfied with the quality of the new draft — and the inaccuracies in it — and she only saw one new source in the draft. After she was offered another discount, she attempted to pay with a credit card. But the payment wouldn't go through. So another credit card was used, but the invoice arrived with the name of another company.

Then Mergens discovered multiple complaints on an online customer review site against Elite Wiki Writers. Finally, she was able to totally cancel the order, and get most of the credit card charges reversed.

Melissa also had troubles with her credit card. Elite Wiki Writers sold her additional services (for almost $4,000) to ensure that the Wikipedia page could go live and "be protected". The services supposedly included eight ghostwritten articles to serve as citations, a Google Knowledge Graph, wiki linking, and a semi-protection lock. After she was charged for this, an additional $400 was charged to her credit card "for taxes" without her being notified or giving permission.

She was shown a mockup of a Wikipedia article on a non-Wikipedia page. She asked how she could edit it and was told that Wikipedia would charge her $5,000 to have external admin access to the page.

After she demanded a refund of all credit card charges, she was harassed and had to block Elite Wiki Writers employees on her telephone.

Jared told The Signpost that Elite Wiki Writers did not tell him that he would have to be disclosed on Wikipedia as paying for the rewrite. They did tell him that Wikipedia required five citations and that they could provide these quickly from reputable news sources for an extra fee. He did his own research and provided Elite Wiki Writers with articles about him from local newspapers and major news sources from his state, but they did not use those sources.

After he first suspected that he was being scammed, he learned that Wikipedia required that he be identified as paying for the article. He asked Elite Wiki Writers about it, but they wouldn't give him a straight answer. At that point he decided to cancel all payments.

Cazares posted a complaint at Trustpilot.com. Then she emailed Elite Wiki Writers: "I feel completely taken advantage of, I spent so many hours and energy. Please stop doing this to people. I am not a wealthy person, but regardless, we are all people with dreams that we are trying to actualize. I'm so disappointed in myself."

They emailed her back, blaming her.

She told The Signpost: "This company needs to be exposed and people need to be aware. Most importantly, people who have been scammed should know they are not alone and it is not their fault that somebody misled them and took their money."

Protect yourself, protect Wikipedia

Everybody associated with this scam, except for Elite Wiki Writers, comes out a loser. Wikipedians are victims because the encyclopedia's trademarks are used without permission. Volunteers' time, Wikipedia's most valuable resource, is wasted sorting out hundreds of poorly researched articles, looking for just one or two notable subjects. Much valuable time is taken from some of our most senior and energetic editors, working on sockpuppet investigations and deletion discussions. Administrators' names and reputations are dragged through the mud. Nobody on Wikipedia benefits from having scammers operate here.

The main victims, of course, are the scammers' customers. People with very little knowledge of Wikipedia and its rules are recruited via email, social media sites, and the company's own websites and then lied to. They are told that they can get a valuable article about themselves on Wikipedia. They pay their money and then they wait a few weeks, a few months, perhaps forever.

Potential customers can protect themselves by asking normal questions about who they are dealing with. Get names of the people who contact you and of their bosses. Check if they are using the puzzle globe Wikipedia trademark. Also write down telephone numbers and addresses and save invoices and other documents. These are just the steps you should take for any large transaction.

To screen people who contact you about a Wikipedia article or to help prevent them from scamming others, see this section above.

If you think you are being scammed ask these questions:

  • Are you required to disclose my name on Wikipedia when I pay you to post an article?
  • Are administrators required to declare if they have been paid when they edit or accept a paid-for article?
  • Can you direct me to one of your paid editors' declarations of their paid status?
  • Can other people change my preferred version of my Wikipedia article?

If they answer any of these questions "no" then don't believe another word they say. You are almost certainly being scammed.



Reader comments

File:George Washington Masonic Memorial at Night.jpg
Daniel M Horowitz
CC-BY-SA-4.0
2024-01-31

Katherine Maher new NPR CEO, go check Wikipedia, race in the race

Katherine Maher to head NPR

Katherine Maher in 2019

National Public Radio has announced that former Wikimedia Foundation CEO Katherine Maher will take the reins as NPR's CEO at the end of March, following a conference ending her five-month gig as CEO of Web Summit. NPR itself (maintaining its editorial firewall) introduced her as the former CEO of WMF, quoting her saying "There is a strong alignment in both [Wikipedia and NPR] around integrity and autonomy." The New York Times emphasizes the challenges currently facing NPR, and indeed most of the media, writing that she "will take over at NPR during a critical period. Listenership of traditional radio is waning as Americans adopt alternatives ... pressuring NPR to reach its audiences in new formats." RTÉ, an Irish public service broadcaster, highlights her recent connection to Web Summit. Maher was formerly Chief Communications Officer at the WMF before her CEO role; she has resigned from the US Department of State's Foreign Affairs Policy Board following her appointment to NPR, remaining Chair of the Signal Foundation and on the board of Consumer Reports.

The Signpost wishes her all the best. Congratulations Katherine!

See this 2019 interview with Maher in The Signpost

– S, F

Tell it like it is

Tempest in a teapot
One of these again.

The New York Post was shocked to learn that Katherine Maher, the new NPR CEO, had tweeted in 2018 that "Donald Trump is a racist". They consider the six-year-old personal tweet to be inconsistent with NPR's policy that they provide "fact-based reporting; opinion and commentary are secondary." The Post also seemed shocked that some time since 2018, Maher deleted the tweet, implying that she was hiding something.

They might also be shocked to learn that many people have called this guy that thing, since early in his term as president. In 2018 and 2019, a majority of Americans agreed with the statement "Donald Trump is a racist", according to two polls; in 2019, 84% of African-Americans agreed. Nevertheless, another 2018 poll had only 49% agreeing against 47% disagreeing; at any rate it's difficult to see this as evidence of extremism.

The controversy about Trump's perceived racism has not subsided since. His attacks this month on Asian-American Nikki Haley are causing even more controversy. – S

Conservative commentator races to "go check Wikipedia"

Media watchdog Media Matters for America reports on Matt Walsh's use of Wikipedia to verify the skin color of Nikki Haley, the other candidate for the GOP nomination for the U.S. presidency. Walsh's commentary is simply dishonest. He says he never noticed that Haley is brown-skinned and had to "check Wikipedia" to see if it's true. With a sleight of hand he reports that Wikipedia confirms the fact that her parents are from India. (More precisely, they are Sikh.) Then he says Haley's claims of discrimination in a 1980s South Carolina beauty pageant based on her skin color "strain credulity" and that all kids get teased about something.

What did he leave unsaid?

In less than five minutes, he puts race back into the presidential race. – S

Former Wikimedia Italy president reflects on the state of Wikipedia and the open access movement

In Il Post (in Italian), Viola Stefanello breaks down the last ten years of the evolution and decline of the open access movement in academic publications, focusing on the controversies involving "shadow libraries" such as Sci-Hub and Anna's Archive, the legacy of the late Aaron Swartz and the current state of Wikipedia.

Anna's Archive, a website which hosts some 25 million books and 100 million papers totally unencumbered by copyright law (generally by virtue of just not following it), has recently been blocked by AGCOM, at the request of the Italian Publishers Association.

Andrea Zanni in 2012

Among the experts cited by Stefanello for her article, former Wikimedia Italy president and Wikisource admin Andrea Zanni stands out. Now a digital librarian for openMLOL and a journalist for several Italian media outlets, as well as the co-author of an e-book about the life of Aaron Swartz, Zanni says the death of the American hacktivist is not the only reason why the open access movement has lost the momentum it had gained throughout the 2000s and the early 2010s. According to him, this also happened due to the different priorities many of the people involved had to focus on when transitioning to adulthood – Zanni left Wikipedia himself, in order to spend more time with his family – and a decline in interest by newer generations, whose best IT talents often choose to make a personal profit out of their skills, instead. The former Wikimedia Italy president also reflects on the changes that have made the Internet more "capitalistic" and "egotistic" than it was ten years ago, underlining the web’s "centralization" in just a handful of privately owned social and entertainment media, its "mobilization" as a result of the shift of most online traffic from computers to smartphones, and its "dopaminization" through the wide spread of personalized content and advertisements.

Zanni ends his reflection on a high note, celebrating the success and the very existence of Wikipedia for over twenty years as one of the "huge battles won" by the movement, a topic he already wrote about for Domani in 2021. Given the disputes related to open access and public domain we still witness worldwide and the challenges Wikimedia projects will likely face in the near future, perhaps his words should be taken as more than just a good omen to start from. – O

In brief

George Washington Masonic Memorial in Alexandria, Virginia
Is someone smearing dirt on the 49ers?



Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit our next issue in the Newsroom or leave a tip on the suggestions page.




Reader comments

File:Grandstand and clubhouse at Fleetwood Park.jpg
The Horseman
PD
2024-01-31

The long road of a featured article candidate, part 2

In our last issue, part 1 of this story introduced you to the process of a featured article candidate review. This second part of the story gives some of the more important details. Details are important in the featured article candidate process.
Grandstand and clubhouse at Fleetwood Park, Bronx, New York
A FA candidate rounding the final curve

Last issue, I gave some thoughts on the general process of a Featured article candidate review. Today I'm going to take a deeper dive into some specific problems which came up at FAC. This won't make a lot of sense if you haven't already read part 1 to get the context.

I'll start with a couple of things that got left out of part 1. One of the unusual bits of FAC culture is that it's totally acceptable to solicit reviews. You could ask on other users' talk pages, or a wikiproject talk page. Another thought is that when you think your article is ready for FA, walk away from it for a few weeks. Once you've read something 100 times, you're burned out, and taking a long break will often let you see things you glossed over before.

I also strongly suggest you get your article to WP:GA first; every time you get somebody else to read your stuff and tell you which parts of it suck, the end result is a better article. Also, if you don't bag a GA on the way to FA, you won't be eligible for WP:FOUR.

Types of sources

I was shocked that people objected to my sources. Many of my sources were news reports in The New York Times from the late 1800s. Going into this, I thought that was a good thing. Several reviewers objected to relying too heavily on contemporary news reports. It's not that the Times isn't a reliable source (although there was some pushback that the modern NYT's reputation doesn't necessarily extend back to the 1800s); it's that it was contemporary. People wanted to see what modern writers had written about the topic retrospectively. Unfortunately, for the topic I was writing on, there wasn't much. There were lots of modern mentions of the race track, but most of the coverage was cursory, and many of them were just regurgitating the same stories.

During the review, some reviewers did locate better sources, which I took advantage of to improve the article, but it would have been better if I had found them earlier (see my comments in part 1 about ticking clocks). This is the kind of thing which would have come out at peer review.

Make sure you've satisfied FA criterion 1c: it is a thorough and representative survey of the relevant literature. As great as Google is, it's not enough. Search other databases and aggregators, many of which are available through TWL. Search JSTOR. For historical topics, newspapers.com is always worth checking. TWL federated search (the "Search the library" box at the top of the collections page) performs a search on many of the individual collections in parallel; this turned up things I failed to find through my other efforts. The Library of Congress is always worth trying.

Source review

I was totally unprepared for the FA source review. WP:FACR talks about consistently formatted inline citations but that doesn't come close to explaining to what level "consistent" is taken. I think most of it is silly, but it is what it is and if you want to get your FA star, you need to comply. In some places I cited New York Times, in other places, The New York Times. Sometimes I wikilinked the paper name, sometimes not. ISSN, access=subscription, via, publisher, location, ISBN, LCCN, OCLC; all of these were "inconsistent" and had to be fixed. There's not any "right" or "wrong" way to do these things, just pick a way and do it consistently.
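As a rough illustration of the kind of consistency check described above, here is a minimal Python sketch that scans wikitext for citation parameters and reports source names that appear in more than one spelling, plus page ranges written with a hyphen. The function names and the regular expressions are hypothetical simplifications made up for this example (they ignore nested templates and most citation styles), and this is not how any existing review script works.

```python
import re
from collections import defaultdict

# Hypothetical, simplified patterns: they only look at a few common
# citation parameters and do not understand nested templates.
PARAM_RE = re.compile(r"\|\s*(work|newspaper|publisher)\s*=\s*([^|}]+)")
PAGES_RE = re.compile(r"\|\s*pages\s*=\s*(\d+)-(\d+)")


def normalize(name):
    """Reduce a source name to a comparison key.

    Strips wikilink brackets, a leading "The", and case differences,
    so that variant spellings of one source share the same key.
    """
    name = name.strip().strip("[]").strip()
    name = re.sub(r"^the\s+", "", name, flags=re.IGNORECASE)
    return name.lower()


def find_inconsistencies(wikitext):
    """Return (mixed_names, hyphen_ranges) found in the wikitext.

    mixed_names maps a normalized key to the sorted list of distinct
    spellings, only for keys that have more than one spelling.
    hyphen_ranges lists page ranges written with a plain hyphen.
    """
    variants = defaultdict(set)
    for _param, value in PARAM_RE.findall(wikitext):
        variants[normalize(value)].add(value.strip())
    mixed = {key: sorted(vals) for key, vals in variants.items() if len(vals) > 1}
    hyphen_ranges = [f"{a}-{b}" for a, b in PAGES_RE.findall(wikitext)]
    return mixed, hyphen_ranges
```

Run over an article's wikitext, a non-empty result is only a hint about where to look; which spelling to standardize on is still a judgment call.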

Black and white image of a PDP-11 based computer center
You think wiki markup is hard? Try using this as a word processor.

This brought up some bitter personal memories. One of my first jobs out of college (in the early 1980s) was working at a scientific research institute doing IT stuff. Unix was the hot new thing back then, and something I knew from school. So, using a cast-off PDP-11, I set up a Unix system and started teaching a bunch of microbiologists how to do word processing with troff and ed. That included a bibliographic preprocessor called (what else?) "bib". Every journal had their own reference style and I got sucked into an endless vortex of writing complicated macros to produce the proper format references for each. And debugging each one when some hitherto unexpected situation came up. It was all so pointless. It was also essential, because journals would reject manuscripts if the punctuation wasn't correct. Here we are, 40 years later, and I'm getting dinged because my page numbers have hyphens instead of en-dashes, or some such. Pardon me if I can't get excited about that stuff.

Anyway, install User:Lingzhi2/reviewsourcecheck-sb.js and just keep fixing things until it stops complaining. That should at least get you close. But then you might break Who Wrote That (see T348906).

Image review

This was a pain because I had a bunch of public domain (PD) images based on their being published 75 years ago. What I didn't anticipate is that being created 75 years ago isn't enough; it needs to have been published 75 years ago, and you need to be able to demonstrate this. As much of a pain as this is, I get it. Copyright is important and the problem isn't that our rules are twisted; it's that copyright law in general is twisted. If you're depending on PD images, make sure you understand the difference between "created" and "published". Don't trust the license tags from Commons; FA is stricter than Commons. In your PR request, specifically mention you want somebody to double-check the images you believe are PD. If a reviewer isn't happy with the licensing of one of your images, look around to see if there's another image you could use in its place which comes with a better provenance. Swapping out the image will probably be a straighter line to "looks good to me" (LGTM) than arguing with your reviewer.

I don't know if this is standard FA practice, but it was suggested to me that I make my images larger than normal, using the "upright" parameter, i.e. [[File:whatever.jpg|upright=1.4]], and I agree that this produces a nicer display. Don't get too hung up on the exact layout; everybody with different screen sizes, different fonts, different browsers (not to mention mobile vs desktop) is going to see it slightly differently. But if you are going to make the images larger than the default, "upright" is the way to do it (as opposed to specifying width and height in pixels) because that will get you some degree of device independence.
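To make the device-independence point concrete, here is a small Python sketch of how an "upright" factor translates into pixels. It assumes, as I understand the MediaWiki behavior, that the rendered width is the reader's base thumbnail width multiplied by the factor and rounded to the nearest multiple of 10. The function name and the 220-pixel default are assumptions for illustration; readers can set a different base width in their preferences, which is exactly why a relative factor travels better than a fixed pixel size.

```python
import math


def upright_width(factor, base_width=220):
    """Approximate rendered width in pixels for upright=factor.

    Assumes width = base thumbnail width * factor, rounded to the
    nearest multiple of 10 pixels (halves rounded up). The 220 px
    default is an assumption, not a guaranteed setting.
    """
    return int(math.floor(factor * base_width / 10 + 0.5) * 10)


# The same markup scales with each reader's own base width.
for base in (220, 300):
    print(base, upright_width(1.4, base))
```

A hard-coded pixel width would ignore the reader's preference entirely, whereas the factor keeps the image proportionally larger for everyone.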

It's not clear if it's an FA requirement to add alt texts to images, but that was also suggested. If it's not strictly a requirement, I think it should be. It provides a descriptive text which screen reader (i.e. text-to-speech) software can use to aid an unsighted person reading your article. Don't just put some minimal "picture of a person" description so you can check off the "images have alt texts" review box. Put some effort into writing a good description which gives the unsighted user as much as possible the same experience a sighted person would have looking at the image. You can download browser plugins which let you preview the alt texts as you view the article.

I should note that there is disagreement about what makes a good alt text. Many people would look at the ones I write, point to various official recommendations, and say that I'm being way too verbose. That's fair, and you can make up your own mind what style makes sense to you, but don't allow "there's no clear consensus on the best way to do it" to become your excuse to not do it at all.
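For anyone who wants a quick self-check before a review, the following Python sketch lists images in a chunk of wikitext that have no alt text, or whose alt text is only a few words long. The function name, the regular expression, and the four-word threshold are all hypothetical choices for this example; given the disagreement about how long alt texts should be, the threshold is purely illustrative. The pattern is also deliberately simple and will miss images whose captions contain wikilinks.

```python
import re

# Simplified pattern: matches [[File:...|...]] or [[Image:...|...]] only
# when no parameter contains a nested link (an assumption of this sketch).
FILE_RE = re.compile(r"\[\[(?:File|Image):([^\]|]+)((?:\|[^\]|]*)*)\]\]")


def audit_alt_texts(wikitext, min_words=4):
    """Return a list of (file name, problem) pairs.

    Flags images with no alt= parameter and images whose alt text
    has fewer than min_words words.
    """
    problems = []
    for name, params in FILE_RE.findall(wikitext):
        alt = None
        for part in params.split("|"):
            part = part.strip()
            if part.startswith("alt="):
                alt = part[len("alt="):].strip()
        if alt is None:
            problems.append((name.strip(), "missing alt text"))
        elif len(alt.split()) < min_words:
            problems.append((name.strip(), "alt text is very short"))
    return problems
```

A word count says nothing about whether a description is any good, so treat the output as a list of places to look rather than a verdict.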

Source-to-text review

For me, this was the absolute killer, mostly because my sources were a mess. This review is to verify that each referenced statement actually is backed up by what the cited source says. I thought I had been quite careful to only say things that were backed up by a WP:RS. The problem is that as the article developed, statements slowly got disconnected from the sources which backed them up. And once that starts to happen, getting it stitched back together properly is a colossal pain. The fact that the Visual Editor is totally brain-dead when it comes to reference numbering (the numbers change when you open the page for editing, although it appears that this has very recently been fixed) only makes this worse. I ended up having to have three windows open: the review page, the exact revision the reviewer was talking about, and the current revision which I was editing. If the reviewer said "ref n", I'd search in window 2 for "[n]", find some piece of text next to it, and then search in the third window for that piece of text. The user experience was beyond abysmal. At a minimum, if your reviewer is referring to references by number, ask them for a permalink so at least you're sure both of you are looking at the same revision.

The take-home lesson is to stay on top of this while you're editing. Try not to have multiple references in a cluster; that makes it harder to understand which statements are supported by which citations. As you rearrange text during the normal course of editing, be paranoid about making sure you move the associated citations, duplicating or combining them as required. It's a pain, but it's less of a pain to stay on top of it while you edit than to try to dig yourself out of a mess later.

Churn, etc.

Successive reviewers will give you conflicting feedback on the same issue. Unless it's something that you're really passionate about, don't sweat it. I had one sentence which got rewritten four times; twice to add a particular word, and twice to remove that same word. It's just not worth worrying about. And certainly not worth arguing about.

Somewhat related to this are repeated comments from multiple reviewers. Let's say one reviewer says they don't like X, but you push back on that and convince them it's OK the way it is now. Then another reviewer makes the same comment about X. This is the time to suck it up and deal with the problem.

Summary

A screenshot of the Wikipedia main page, with text "I wrote this" overlaid
A black and white text list consisting of the words or phrases "Ealdgyth", "Nikkimaria", "Girth Summit", "Epicgenius", "MyCatIsAChonk", "Serial Number 54129", "Eddie891", and "JennyOz", one per line
List of reviewers for the author's Wikipedia Day 2024 lightning talk

I started Fleetwood Park Racetrack in December 2021, and submitted it to DYK at that time. In April 2022 I started another editing sprint and got it to GA in May of that year. I did minor tweaks for a while and submitted it to FAC in late August 2023. Keeping up with review comments and rewriting pretty much consumed all my wiki time for the next two months, and it finally passed FA at the end of October. There was one little chore left, nominating it for TFA; that was straightforward and it ran on January 14th, a couple of days after part 1 of this story. By total coincidence, that turned out to be the same day as Wikipedia Day 2024, so I indulged myself with a 30-second, 2-slide lightning talk.

Was it worth it? Yes. I'll admit, there were times during the process when I would have said, "Hell, no!" At one point, I swore I would stubbornly see this to its conclusion and never do another FAC again, but I've already broken that promise. There's no doubt that my writing is better now than it was six months ago. It's also more FACR-compliant, but I'm not convinced those are the same thing. And while I now have a better handle on when to use a hyphen, en-dash, or em-dash (and when each one should or should not be surrounded by whitespace), I still think worrying about stuff like that is a pretty dumb way for a human to be investing their time.



Reader comments

File:Research Library, International UFO Museum-06.jpg
Myotus
CC 4.0 BY-SA
2024-01-31

Croatian takeover was enabled by "lack of bureaucratic openness and rules constraining [admins]"


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


A "lack of bureaucratic openness and rules constraining administrator behavior" enabled nationalist takeover of Croatian Wikipedia

Reviewed by Bri and Tilman Bayer
Presentation at Wikimania 2023 about the findings

A paper titled "Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias"[1] (accepted for publication in the CSCW 2024 proceedings) examines the well-known case of the Croatian Wikipedia's hijacking by far-right nationalists (from at least 2011 to 2020), and asks why the similarly situated Serbian, Bosnian and Serbo-Croatian Wikipedias managed to escape this fate.

As summarized in a post by the University of Washington's Center for an Informed Public (an interdisciplinary center involving UW's Information School, School of Law, and Department of Human Centered Design & Engineering), on the Croatian Wikipedia:

[A] cabal [of nationalist editors] seized complete control of the governance of the encyclopedia, banned and blocked those who disagreed with them, and operated a network of fake accounts to give the appearance of grassroots support for their policies...
— CIP summary

This has already been documented in detail in a report commissioned by the Wikimedia Foundation (see e.g. prior Signpost coverage: "Croatian Wikipedia: capture and release", Disinformation report, 2021-06-27 and "Wikimedia Foundation builds 'Knowledge Integrity Risk Observatory' to enable communities to monitor at-risk Wikipedias", Recent research, 2022-11-28). As summarized in the present paper, "In part, the [WMF's] report attributed Croatian Wikipedia’s capture to a unique situation in which there were distinct Wikipedia editions for the standardized national variants of a pluricentric language: Bosnian-Croatian Montenegrin-Serbian (BCMS), sometimes referred to as Serbo-Croatian. This explanation, however, raises the question of why Serbian and Bosnian Wikipedia did not appear to suffer Croatian [Wikipedia's] fate."

To answer this question, the authors focus in particular on the comparison with Serbian Wikipedia (the largest of the four BCMS language Wikipedias; a Montenegrin Wikipedia does not exist currently, whereas the Serbo-Croatian Wikipedia, while catering to all the national variants, was deemed to be a less attractive takeover target due to its smaller audience and lack of "national resonance"). Their findings point at weak policies and norms that allowed capture to happen, especially the lack of policies around blocking, and the importance of integrity amongst the community's bureaucrats (users who can grant and remove admin permissions).

The researchers used a grounded theory approach, specifically a "qualitative analysis of interview data with a range of participants in Croatian and Serbian Wikipedia and in the broader Wikipedia community" (15 interviews in total). Based on this,

... we arrived at three propositions that, together, help explain why Croatian Wikipedia succumbed to capture while Serbian Wikipedia did not:

1. Perceived Value as a Target. Is the project worth expending the effort to capture?

2. Bureaucratic Openness. How easy is it for contributors outside the core founding team to ascend to local governance positions?

3. Institutional Formalization. To what degree does the project prefer personalistic, informal forms of organization over formal ones?

We found that both Croatian Wikipedia and Serbian Wikipedia were attractive targets for far-right nationalist capture due to their sizable readership and resonance with a national identity. However, we also found that the two projects diverged early on in their trajectories in terms of how open they remained to new contributors ascending to local governance positions and the degree to which they privileged informal relationships over formal rules and processes as organizing principles of the project. Ultimately, Croatian [Wikipedia's] relative lack of bureaucratic openness and rules constraining administrator behavior created a window of opportunity for a motivated contingent of editors to seize control of the governance mechanisms of the project.


— CIP summary

The authors state that their paper is the first academic work they know of "that has considered how distributed influence operations target, become deeply engaged with, and are facilitated by institutional and organizational arrangements within peer production communities like Wikipedia".

Among the limitations acknowledged in the paper, "none of its authors are fluent BCMS speakers. As a result, interviews were conducted in English." However, they attempted to compensate for this potential loss of relevant interviewees by also examining policy-related talk page discussion using Google Translate.

Perhaps more seriously, while the paper's insights certainly deserve wide attention by everyone concerned with similar issues in the Wikimedia movement, they are based on a single case – the authors note "that Croatian Wikipedia reflects only one potential path." They point to the case of Chinese Wikipedia, where "infiltration concerns" had led the Wikimedia Foundation to ban several admins in 2021 (Signpost coverage), illustrating "government pressure" as an important additional factor that "Future research could extend our framework" with. However, the authors do not mention that the Chinese Wikipedia case also provides important information relevant to factors that their paper did focus on and made conclusions about. For example, the Chinese Wikipedia community decided early on to build a single language project instead of separate ones for national variants of the Chinese language, aided by an (at the time) innovative automatic conversion system. As summarized in a 2009 paper,[supp 1]

"Chinese Wikipedia (CW) [...] has accommodated diverse Chinese-speaking contributors, despite the linguistic, regional, and political differences between four regions (Mainland China, Hong Kong/Macau, Taiwan, and Singapore/Malaysia). In the creation of CW, a technological polity was built by localizing Wikipedia’s governance principles, implementing Chinese character conversion, and establishing the “Anti-Regionalism Policy” (避免地域中心) [...an editorial policy that] addresses regional issues beyond those at the technolinguistic level. This policy does not exist in the English Wikipedia. An antidote to the current [2009] Chinese cyber-nationalism, the policy mandates that China-centric, Han-centric, and Chinese-centric statements should be avoided."

One can't help wondering if a similar "anti-regionalism policy" could have been an effective "antidote" against Croatian nationalism, too, and whether using a similar technology-aided conversion between writing systems of Serbo-Croatian early on could have helped maintain Serbo-Croatian Wikipedia as a common locus of collaboration instead of being overtaken by the nationally focused Croatian and Serbian Wikipedia. (Both the Serbian and Serbo-Croatian Wikipedia did eventually adopt automatic conversion systems.) Unfortunately, the present interview study fails to address such questions.


Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer


"Why do you need 400 photographs of 400 different Lockheed Constellations" on Commons?

A shiny aircraft taking off, with mountains behind it

From the abstract:[2]

"We review prior studies of Commons-Based Peer Production (CBPP) identifying four common value dimensions previously noted as present in CBPP: usage value, social value, ideological value, and monetary value. We use this synthetic framework to analyze a dataset of 32 interviews with contributors to Wikimedia Commons and editors of Wikipedia who use Commons resources. Our analysis supports the prior values categories while expanding how some dimensions are expressed by participants. We also highlight four additional value dimensions that were not previously identified in CBPP: cultural heritage value, rarity value, aesthetic value, and administrative value."

These 32 interviews are apparently the same as those that already served as the basis of an earlier, related paper by the same authors (cf. our review: "Unpacking Stitching between Wikipedia and Wikimedia Commons: Barriers to Cross-Platform Collaboration").

"From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?"

From the abstract:[3]

"[...] in most cases estimates of scientific reputation are based on composite or weighted indicators and absolute positions in university rankings. In this study, we adopt a more granular approach to assessment of universities' scientific performance using a multidimensional set of indicators from the Leiden Ranking and testing their individual effects on university [English] Wikipedia page views. We distinguish between international and local attention and find a positive association between research performance and Wikipedia attention which holds for regions and linguistic areas. Additional analysis shows that productivity, scientific impact, and international collaboration have a curvilinear effect on universities' Wikipedia attention. This finding suggests that there may be other factors than scientific reputation driving the general public's interest in universities."


"NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages"

Including loan words in a training corpus for natural language processing, a linguistic-computational technique closely interrelated with recent advances in artificial intelligence, can degrade the fidelity of the model that is supposed to represent the native language, not the language of the loan words. According to the authors, the Wikipedias in Indonesian languages (there are several) suffer from this defect because of their relatively high fraction of loan words. From a Twitter/X thread by one of the authors of this preprint:[4]

"Scraped data such as from Wikipedia is vital for NLP, but how reliable is it in low-resource settings? [...]
We explore 2 methods of building a corpus for 12 underrepresented Indonesian languages: by human translation, and by doing free-form paragraph writing given a theme.
We then compare their quality vs Wikipedia text.
[Compared to] Wikipedia data, both Nusa Translation (NusaT) and Nusa Paragraph (NusaP) are generally more lexically diverse and use fewer loan words. We also realize that apparently some of the Wikipedia pages for low-resource languages are mostly boilerplate. [...]
To conclude:
- We release NusaT and NusaP, high-quality corpus for 12 underrepresented languages

- Underrepresented languages corpus from Wikipedia does not represent the true language distribution [...]"

"Loanword identification based on web resources: A case study on Wikipedia"

From the abstract:[5]

"To alleviate the resource scarcity and improve the robustness in loanword identification, the current study proposes a novel loanword identification method based on Wikipedia. In this paper, we first present how to obtain loanword candidate datasets and comparable corpora from Wikipedia. On the basis of these corpora, we develop a pseudo-data generation model for loanword identification tasks. And then we put forward a loanword identification model [...]"

From the introduction:

"In order to evaluate the performance of our method, we have applied it to different receipt languages (Uyghur, Chinese and English). Experimental results showed that the proposed method achieves the best performance compared with other baseline systems in all domains."


"Time Lag Analysis of Adding Scholarly References to English Wikipedia"

From the abstract:[6]

"... [In] a time-series analysis of adding scholarly references to the English Wikipedia as of October 2021 [...] we detect no tendencies in Wikipedia articles created recently to refer to more fresh references because the time lag between publishing the scholarly articles and adding references of the corresponding paper to Wikipedia articles has remained generally constant over the years. In contrast, tendencies to decrease over time in the time lag between creating Wikipedia articles and adding the first scholarly references are observed. The percentage of cases where scholarly references were added simultaneously as Wikipedia articles are created is found to have increased over the years, particularly since 2007–2008. This trend can be seen as a response to the policy changes of the Wikipedia community at that time ..."

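The two quantities discussed in the abstract, the lag between a paper's publication and its addition to Wikipedia and the share of references added at article creation, could be computed along these lines. This is a hedged sketch under stated assumptions: the record layout, the grouping by the year the reference was added, and the use of the median are illustrative choices, not details taken from the paper.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def lag_days(earlier, later):
    """Number of days between two dates."""
    return (later - earlier).days


def median_lag_by_year(records):
    """Median publication-to-citation lag in days, grouped by the year the
    reference was added. `records` is an iterable of (published, added) dates."""
    by_year = defaultdict(list)
    for published, added in records:
        by_year[added.year].append(lag_days(published, added))
    return {year: median(lags) for year, lags in sorted(by_year.items())}


def simultaneous_share(pairs):
    """Share of (article_created, first_reference_added) pairs with equal dates."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(1 for created, added in pairs if created == added) / len(pairs)


if __name__ == "__main__":
    # Toy records only, not data from the study.
    records = [
        (date(2010, 1, 1), date(2010, 1, 11)),
        (date(2009, 1, 1), date(2010, 1, 1)),
        (date(2020, 1, 1), date(2021, 1, 1)),
    ]
    print(median_lag_by_year(records))
```

A roughly flat series from `median_lag_by_year` would correspond to the "generally constant" lag the authors report, while a rising `simultaneous_share` per year would correspond to the increase they observe since 2007–2008.
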

"Wikipedia as a tool for contemporary history of science: A case study on CRISPR"

From the abstract:[7]

"Using a mixed-method approach, we qualitatively and quantitatively analyzed the CRISPR article’s text, sections and references, alongside 50 affiliated articles. These, we found, documented the CRISPR field’s maturation from a fundamental scientific discovery to a biotechnological revolution with vast social and cultural implications. We developed automated tools to support such research and demonstrated its applicability to two other scientific fields–coronavirus and circadian clocks."

"Titles of the [CRISPR] article’s sections throughout 2010–2022, sampled biannually. Subsections and those listing sources were removed for clarity [...] Alignment and coloring were added manually to highlight sections repeating in consecutive revisions." (figure 3 B from the paper)

References

  1. ^ Kharazian, Zarine; Starbird, Kate; Hill, Benjamin Mako (2023-11-06). "Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias". arXiv:2311.03616 [cs.CY]. Accepted for publication in Proceedings of the ACM on Human-Computer Interaction (CSCW 2024)
  2. ^ Yu, Yihan; McDonald, David W. (2023-09-28). ""Why do you need 400 photographs of 400 different Lockheed Constellation?": Value Expressions by Contributors and Users of Wikimedia Commons". Proceedings of the ACM on Human-Computer Interaction. 7 (CSCW2): 1–34. doi:10.1145/3610094. ISSN 2573-0142.
  3. ^ Arroyo-Machado, Wenceslao; Díaz-Faes, Adrián A.; Herrera-Viedma, Enrique; Costas, Rodrigo (2023-11-23). "From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?". Journal of the Association for Information Science and Technology. doi:10.1002/asi.24856. ISSN 2330-1635.
  4. ^ Cahyawijaya, Samuel; Lovenia, Holy; Koto, Fajri; Adhista, Dea; Dave, Emmanuel; Oktavianti, Sarah; Akbar, Salsabil Maulana; Lee, Jhonson; Shadieq, Nuur; Cenggoro, Tjeng Wawan; Linuwih, Hanung Wahyuning; Wilie, Bryan; Muridan, Galih Pradipta; Winata, Genta Indra; Moeljadi, David; Aji, Alham Fikri; Purwarianti, Ayu; Fung, Pascale (2023-09-19). "NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages". arXiv:2309.10661 [cs.CL]. code and dataset
  5. ^ Mi, Chenggang (June 2023). "Loanword identification based on web resources: A case study on wikipedia". Computer Speech & Language. 81: 101517. doi:10.1016/j.csl.2023.101517. ISSN 0885-2308. S2CID 257800179.
  6. ^ Kikkawa, Jiro; Takaku, Masao; Yoshikane, Fuyuki (2023). "Time Lag Analysis of Adding Scholarly References to English Wikipedia". In Isaac Sserwanga; Anne Goulding; Heather Moulaison-Sandy; Jia Tina Du; António Lucas Soares; Viviane Hessami; Rebecca D. Frank (eds.). Information for a Better World: Normality, Virtuality, Physicality, Inclusivity. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland. pp. 425–438. doi:10.1007/978-3-031-28032-0_33. ISBN 9783031280320.
  7. ^ Benjakob, Omer; Guley, Olha; Sevin, Jean-Marc; Blondel, Leo; Augustoni, Ariane; Collet, Matthieu; Jouveshomme, Louise; Amit, Roy; Linder, Ariel; Aviram, Rona (2023-09-13). "Wikipedia as a tool for contemporary history of science: A case study on CRISPR". PLOS ONE. 18 (9): 0290827. Bibcode:2023PLoSO..1890827B. doi:10.1371/journal.pone.0290827. ISSN 1932-6203. PMC 10499201. PMID 37703244.
Supplementary references and notes:
  1. ^ Liao, H.-T. (2009). Conflict and consensus in the Chinese version of Wikipedia. IEEE Technology and Society Magazine, 28(2), 49–56. doi:10.1109/mts.2009.932799





File:Punch-1887-11-19 Rather a Close Shave.PNG
Punch
PD
2024-01-31

We've all got to start somewhere


(-10,993) . . (Fixed article) (Tags: Mobile edit, Mobile app edit, Newcomer task: copyedit, references removed, Replaced)








File:Drayton House, stables - geograph.org.uk - 2484908.jpg
John Sutton
CC BY-SA 2.0
2024-01-31

DJ, gonna burn this goddamn house right down

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, CAWylie, Krimuk2.0, Shuipzv3, I am RedoStone and Rahcmander.

It's murder on the dancefloor (January 7 to 13)

Rank | Article | Views | Notes/about
1 | Saltburn (film) | 2,212,357 | After a brief theatrical run, this British thriller has gathered enough attention on Prime Video, along with its appearances on the awards circuit, to ultimately top this list. Writer-director Emerald Fennell tells the story of an Oxford freshman from a humble background who spends a vacation in a mansion with a rich classmate's eccentric family and ends up making quite an impact on their lives.
2 | Uruguayan Air Force Flight 571 | 1,633,713 | This gruesome plane crash and the subsequent survival tale are dramatized in the Netflix film Society of the Snow by Spanish director J. A. Bayona (pictured).
3 | Jeffrey Epstein | 1,336,605 | He's still dead, and over 170 of his associates were given until the first of the year to disassociate from him. I suppose people look at his article in hopes of finding names?
4 | Murder of Dee Dee Blanchard | 1,206,375 | As chronicled in 2019's The Act (starring Patricia Arquette, pictured), Dee Dee Blanchard fabricated illnesses and disabilities for daughter Gypsy Rose, down to forcing her to move around in a wheelchair, until Gypsy Rose got fed up and arranged for her online boyfriend to stab Dee Dee to death in 2015. The two had sex after the fact, and later got caught and convicted. The daughter is now a free woman and married, while the now ex-boyfriend is still in prison.
5 | Barry Keoghan | 1,171,071 | Viewers keep on being curious about the young Irish actor starring in #1, where he performs graveyard sex, licks bodily fluids, and dances in the nude.
6 | 12th Fail | 1,007,134 | This Bollywood sleeper hit, about a poor teen who strives hard to become a police service officer, has gained much more popularity after its streaming release on Disney+ Hotstar.
7 | Jim Harbaugh | 960,486 | College football's second top story of the week (see #10) was Harbaugh's Michigan Wolverines winning the national championship.
8 | Kalen DeBoer | 936,711 | Following the retirement of #10, DeBoer, the head coach at the University of Washington, has accepted an offer to become head coach at the University of Alabama.
9 | Deaths in 2024 | 936,637 | I have spoken with the tongue of angels. / I have held the hand of a devil. / It was warm in the night, I was cold as a stone. / But I Still Haven't Found What I'm Looking For...
10 | Nick Saban | 904,986 | Arguably one of the greatest college football coaches ever, having won seven national titles as a head coach, he retired this week.

But you'd better not kill the groove (January 14 to 20)

Rank | Article | Views | Notes/about
1 | Saltburn (film) | 1,529,852 | There's no stopping this psychological black comedy that's ruling the streaming charts, our views, and bathtub use, for weeks on end. It even made Americans finally discover the banger that is "Murder on the Dancefloor".
2 | Nikki Haley | 1,367,460 | This American politician of Punjabi descent was once a governor of South Carolina and an ambassador to the UN. She is currently in the running to be the Republican Party candidate in the 2024 U.S. presidential election, which will occur in November. With the January 15 Iowa caucus causing several candidates to withdraw (e.g. Vivek Ramaswamy), she is now second behind the heavily favored former president, Donald Trump.
3 | 2023 AFC Asian Cup | 968,589 | Qatar is again hosting a big football event, and again with some delays to avoid scorching desert temperatures (which is why it's labeled 2023, but started in the following year). This week, the third round of the group stage will reduce the 24 teams to the 16 advancing to the knockout rounds; those chasing a spot include Palestine, who certainly need some relief, and India, who are always in this Report and usually prefer lawn sports with bats or sticks.
4 | Australian Open | 936,821 | The 112th edition of the tennis tournament started in Melbourne this week.
5 | Deaths in 2024 | 884,820 | No, I'll never forget you / I'll never let you out of my heart / You will always be here with me / I'll hold on to your memories, baby.
6 | The Beekeeper (2024 film) | 848,821 | Jason Statham had four movies in 2023 alone (albeit one would've come out in 2022 if not for the Russian invasion of Ukraine), and he already started this year's quota of beating up people in David Ayer's (pictured) The Beekeeper, where Statham plays a black ops agent who wanted to retire to raise bees, but changes his mind once his neighbor kills herself after losing her money in a phishing scam, eventually indulging in violent vengeance. Reviewers praised the straightforward approach with brutal action guided by an easy, compelling plot. Already, The Beekeeper has made more money worldwide ($75 million) than the movie it opened opposite, the musical remake of Mean Girls ($66 million).
7 | Uruguayan Air Force Flight 571 | 834,609 | 45 people flew together into the Andes, but only 16 survived. Society of the Snow is the latest film to document the crash.
8 | Hanu Man | 829,891 | The first Indian film hit of 2024 comes from the Telugu film industry. This well-received mythological superhero film starring Teja Sajja marks the beginning of yet another cinematic universe, proving once again how much the MCU formula has taken over Indian pop culture.
9 | Africa Cup of Nations | 788,206 | Another continental football tournament featuring the wrong year in the latest edition, as the 2023 Africa Cup of Nations was delayed to January 2024 due to adverse weather conditions in the Ivory Coast. A few traditional powerhouses of the competition have been underperforming, like Ghana, Cameroon, Tunisia, and Algeria, which also saw one of its supporters deported after insulting the host nation.
10 | Christina Applegate | 753,336 | The 75th Primetime Emmy Awards were supposed to take place in September, but were delayed to this month due to the 2023 Hollywood labor disputes. The first category, Best Supporting Actress in a Comedy, was co-presented by this actress who retired due to multiple sclerosis that downright forced her to walk the stage with a cane, where she was met with a standing ovation.

Most edited articles

For the December 22 – January 22 period, per this database report (with some additions from the weekly equivalent).

Title | Revisions | Notes
Deaths in 2024 | 1494 | New year, and the obituary already saw the additions of Franz Beckenbauer, Mário Zagallo, Gigi Riva, David Soul, Christian Oliver, Adan Canto and Alec Musser.
2024 Sea of Japan earthquake | 1179 | On New Year's Day, a magnitude 7 earthquake and its resulting tsunami hit the Noto Peninsula, leading to 233 fatalities, 22 missing, and over 1,200 injured. It even affected the biggest vehicle maker in the world, which stated its 2024 operations would be delayed domestically due to damage to its suppliers.
2024 Haneda Airport runway collision | 1157 | Another incident in Japan, with fewer casualties: a Japan Coast Guard aircraft and an Airbus A350 collided on the runway at Haneda Airport as the Airbus was landing, leading to both planes catching fire. On the smaller plane only the captain survived, with the five other crewmen dying, while the bigger one had at most 14 injuries.
Bigg Boss (Hindi season 17) | 1001 | Like its cinema, India has editions of Big Brother for all its languages.
2024 Australian Open – Men's singles | 924 | The first tennis Grand Slam, hosted in Melbourne. Novak Djokovic wants to reach a record 25th Grand Slam title, his 11th in Australia, while Carlos Alcaraz wishes to stop him and also top the ATP rankings.
2023–24 NFL playoffs | 848 | 14 teams trying to get to Super Bowl LVIII in Las Vegas. The final four are the defending champions Kansas City Chiefs, 2012 winners Baltimore Ravens, five-time winners with a 30-year drought San Francisco 49ers, and long-time chew toy Detroit Lions.
Religion of the Shang dynasty | 799 | While Legalism (Chinese philosophy) takes a break from this list, another Chinese article appears, mostly thanks to Strongman13072007.
2024 Australian Open – Women's singles | 787 | The female side of the Grand Slam. There were six past champions in the main draw: defending one Aryna Sabalenka along with Naomi Osaka, Sofia Kenin, Caroline Wozniacki, Angelique Kerber, and Victoria Azarenka.
Anton Webern | 731 | MONTENSEM is cleaning up the article of this Austrian composer.
History of Christianity | 708 | A vital page brought up to Good Article status.
2023 Israel–Hamas war | 700 | As I said in the annual report, the conflict has entered its third month; Benjamin Netanyahu said it would last for six, but everyone would rather just see the bloodshed end.
2024 PDC World Darts Championship | 692 | The tourney ran from December 15 to January 3, with "Cool Hand Luke" emerging victorious. Might as well, since he's World No. 1.
2024 presidential eligibility of Donald Trump | 689 | A very valid concern; after all, if you're going through two prosecutions, there's precedent for denying you the right to run for public office. (I should know, having witnessed firsthand an election that only ended the way it did, with horrible repercussions, because a candidate was rejected.)
Line 5 (Chennai Metro) | 657 | Some IPs decided to expand the article on the trains covering the Chennai Metro.
Andrew Johnson's drunken vice-presidential inaugural address | 653 | Courtesy of Jengod, this article with an eye-catching title forked off from Andrew Johnson alcoholism debate, regarding a plastered VP who would become president just 42 days later.

Exclusions

  • These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because they are very likely to be automated views based on our experience and research of the issue. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
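
The mobile-share rule above can be sketched as a simple filter. This is an illustrative sketch, not the Report's actual tooling: the function names, the row layout, and the exact cutoffs (6% and 94%, picked from the "5–6%" and "94–95%" ranges quoted above) are assumptions.

```python
def is_likely_automated(mobile_views, total_views, low=0.06, high=0.94):
    """Flag an article whose mobile share is almost nothing or almost everything.

    The thresholds are assumed values within the ranges quoted in the text.
    Articles with no recorded views are also flagged, since no share can be computed.
    """
    if total_views <= 0:
        return True
    share = mobile_views / total_views
    return share <= low or share >= high


def filter_report(rows, low=0.06, high=0.94):
    """Keep only rows that pass the mobile-share check.

    `rows` is a list of dicts with hypothetical keys 'title', 'mobile', 'total'.
    """
    return [
        row for row in rows
        if not is_likely_automated(row["mobile"], row["total"], low, high)
    ]


if __name__ == "__main__":
    # Toy numbers only, not real view counts.
    rows = [
        {"title": "A", "mobile": 500, "total": 1000},
        {"title": "B", "mobile": 10, "total": 1000},
        {"title": "C", "mobile": 990, "total": 1000},
    ]
    print([row["title"] for row in filter_report(rows)])
```

A filter like this only flags candidates; as the note above says, removals remain open to discussion on the Top 25 Report talk page.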



