User Details
- User Since: Oct 25 2014, 6:11 AM (526 w, 4 d)
- Availability: Available
- LDAP User: Mitar
- MediaWiki User: Mitar [ Global Accounts ]
Sep 6 2024
Yes, https://commons.wikimedia.org/wiki/File:Raspberrymusic_-_Aliens_(trailer_music;_cinematic_epic_electronic_classical_music).flac seems like a different problem, so maybe another issue would be better: that one shows 0:0 in Commons but not in ffmpeg, while the files I fixed were 0:0 in ffmpeg as well.
No, the duration is not set correctly locally. For the files I fixed above, the following did not work:
Sep 5 2024
I fixed them by running:
May 23 2024
This last change seems reasonable? Size is increasing now?
May 6 2024
So what is the solution?
Apr 15 2024
Not sure if we would be able to do that kind of comparison in a SQL query.
Mar 18 2024
So this has been resolved? Why was the 20230701 dump so large at 13 GB? Because it contained duplicate documents? Otherwise it is unclear why it is just 9.6 GB now.
Jan 4 2024
Those are pretty much useless because they contain only wiki markup; this is the reason why the Enterprise HTML dumps are so useful: they contain the rendered page. Wiki markup content on its own is useless because a lot of content gets pulled in through templates and other mechanisms. The HTML dumps contain all of that.
Jan 3 2024
Some time ago I made T300907 to ask for an Enterprise HTML dump of Wikimedia Commons. I think that could be seen as 10% of what this issue is about. The images namespace is already dumped for English Wikipedia, and having it also done for Commons would allow one to use it for AI training: you would get the image description, other information, and links to the image/media. This is similar to how other AI training datasets look: they just point to media on the Internet but generally do not include the media itself (also for copyright reasons). So it is something people are familiar with.
Is there any way I could help push this further? I really think it would be very useful to have coverage of Commons as well. I would guess that Enterprise users would also love to have those dumps now that everyone is training AI models on images + descriptions.
Is there any way I could help push this further? I really think this would be beneficial, as it allows one to better understand the structure of the content you can obtain from the HTML dumps.
Nov 9 2023
Is the On-Demand API (docs link I gave prior) what you're looking for in regards to functionality? It grabs the current version of a single article. That is what you're asking for yes?
Nov 7 2023
And the article lookup API is not something which would be made public? Maybe as a deployment with a lower SLA and rate limited differently than for enterprise users? I simply find it very useful to get the contents aggregated in JSON per article in one API query, especially if you start with a dump and then want to fill in gaps or update local info about articles.
Is the push endpoint available to community members, or just enterprise customers? Could you please provide a link to the documentation? Maybe I missed something.
Oct 16 2023
@FNavas-foundation Not sure what you mean. Maybe I am missing something, but revision.id by itself does not give you the count I am asking for, i.e., the id is not local to the article and does not increase by 1 every time the article has changed. Or am I mistaken?
Oct 4 2023
So why not simply base the HTML dumps off them?
Sadly, the stats are not easily available for other namespaces (while dumps are made for them as well). I think currently the best way to get those counts is to use the ElasticSearch endpoint, as described in T312200.
Aug 18 2023
BTW, there is a related issue, T300124, about some categories and templates not always being extracted.
You are right. It is now listed in the docs.
May 13 2023
@Sascha also added benchmarks for different compression algorithms to https://github.com/brawer/wikidata-qsdump. Thanks for that!
Awesome! Thanks. This looks really amazing. I am not too convinced that we should introduce a different dump format, but changing the compression really seems to be low-hanging fruit.
May 9 2023
Yes, great summary. Thanks.
May 8 2023
I think it would be useful to have a benchmark with more options: JSON with gzip, bzip2 (decompressed with lbzip2), and zstd. And then the same for QuickStatements. Could you do that?
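For illustration, this is roughly the kind of decompression timing I have in mind, as a Go sketch (my own, with placeholder file names; note that lbzip2 is a parallel CLI tool, so the single-threaded standard library bzip2 reader here is only a stand-in for it):

```
package main

import (
	"compress/bzip2"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"time"

	"github.com/klauspost/compress/zstd"
)

// timeDecompress streams a compressed file through a decompressor and
// reports decompressed bytes and wall-clock time.
func timeDecompress(path string, wrap func(io.Reader) (io.Reader, error)) {
	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r, err := wrap(f)
	if err != nil {
		panic(err)
	}

	start := time.Now()
	n, err := io.Copy(io.Discard, r)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d bytes in %s\n", path, n, time.Since(start))
}

func main() {
	// Placeholder file names; the real dump files would go here.
	timeDecompress("dump.json.gz", func(r io.Reader) (io.Reader, error) {
		return gzip.NewReader(r)
	})
	// The stdlib bzip2 reader is single-threaded, unlike lbzip2.
	timeDecompress("dump.json.bz2", func(r io.Reader) (io.Reader, error) {
		return bzip2.NewReader(r), nil
	})
	timeDecompress("dump.json.zst", func(r io.Reader) (io.Reader, error) {
		return zstd.NewReader(r)
	})
}
```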
May 1 2023
To my knowledge it is. https://www.mediawiki.org/wiki/Wikimedia_REST_API#Terms_and_conditions still says that 200 requests/second per REST API endpoint is fine (unless documented to allow less, for example the transform API endpoint), while the configuration says otherwise: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/varnish/templates/text-frontend.inc.vcl.erb#431.
Mar 17 2023
Yes, I made a library for processing those dumps in Go.
In large dumps there are multiple files inside one archive. tar serves as a standard way to combine those multiple files into one file, and the compression is then applied over all of that.
The tar format is really made for streaming, so I am surprised that this is hard to do in your programming language. Seeking is what is problematic in tar, but streaming is really easy: it is just a concatenation of files, so it is similar to any other buffered stream.
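For example, in Go (a minimal sketch with a placeholder file name, not code from my library), streaming a gzipped tar archive looks like this:

```
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

func main() {
	// Open the compressed dump archive (placeholder file name).
	f, err := os.Open("dump.tar.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Decompress on the fly; nothing is buffered to disk.
	gz, err := gzip.NewReader(f)
	if err != nil {
		panic(err)
	}
	defer gz.Close()

	// tar is just a concatenation of headers and file bodies,
	// so it can be consumed as a plain forward-only stream.
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s (%d bytes)\n", hdr.Name, hdr.Size)
		// Read the entry body here if needed; tr is limited to the current entry.
		if _, err := io.Copy(io.Discard, tr); err != nil {
			panic(err)
		}
	}
}
```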
I will respond about the tar layer in T332045 which you made.
Feb 15 2023
In OIDC however the same data should also be returned alongside the access token
I agree. Let's close this issue then and track OIDC parameters in the issue you referenced.
Feb 14 2023
So there is an API call to get all categories/templates, which is then included in the dumps. The issue here is that sometimes that call seems to fail and the data is not included.
You should use a parallel gzip decompressor. Just using standard gzip (which is what tar invokes) is not parallel.
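For example, in Go one option (a sketch of my assumption about the setup, using the third-party github.com/klauspost/pgzip package) is to swap the standard reader for one that decompresses with parallel readahead:

```
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	f, err := os.Open("dump.tar.gz") // placeholder file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// 16 blocks of 1 MiB readahead, decompressed concurrently,
	// unlike compress/gzip or the gzip binary invoked by tar.
	gz, err := pgzip.NewReaderN(f, 1<<20, 16)
	if err != nil {
		panic(err)
	}
	defer gz.Close()

	n, err := io.Copy(io.Discard, gz)
	if err != nil {
		panic(err)
	}
	fmt.Println("decompressed bytes:", n)
}
```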
How can you parse all the templates used from the Parsoid HTML? Some templates do not produce any output in the HTML.
Feb 13 2023
Since I opened that issue I have learned more about OIDC and I think there is really no need for the authenticate endpoint anymore; the authorize and access_token endpoints are currently good enough. What threw me off (at least at the time; I have not checked recently) is that there seemed to be no way to get automatically redirected back to the app if the user is still signed in to the MediaWiki instance AND has already authorized the app. Other OIDC providers just redirect back, but MediaWiki (at the time at least) showed the authorization dialog every time. So the one-click sign-in flow was less fluid than one would like.
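For context, this is the generic OIDC behaviour I mean (a sketch of my own; it is not specific to the MediaWiki OAuth extension, which may or may not support it, and all URLs and identifiers are placeholders): standard providers let a client ask to skip any dialog for an already signed-in, already authorized user via the prompt=none parameter from OpenID Connect Core.

```
package main

import (
	"fmt"
	"net/url"
)

func main() {
	authorize, err := url.Parse("https://provider.example/oauth2/authorize") // placeholder
	if err != nil {
		panic(err)
	}
	q := url.Values{
		"response_type": {"code"},
		"client_id":     {"my-client-id"},           // placeholder
		"redirect_uri":  {"https://app.example/cb"}, // placeholder
		"scope":         {"openid profile"},
		"state":         {"random-state"},
		// With prompt=none the provider must redirect straight back
		// (with a code or an error) instead of showing any dialog.
		"prompt": {"none"},
	}
	authorize.RawQuery = q.Encode()
	fmt.Println(authorize.String())
}
```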
Dec 27 2022
Is this something a community contribution could help with?
So what is the plan about this? Is this something a community contribution could help with?
Sep 22 2022
A gentle ping on this. I understand that it would be a large dump, but on the other hand it is a very important one: Wikimedia Commons lacks any other substantial dump, so a dump of file pages would at least let one obtain the descriptions of all files. That can be useful for many use cases, like training AI models, search engines, etc.
Any luck finding a reason?
Aug 6 2022
A workaround is available in this StackOverflow question/answer: https://stackoverflow.com/questions/73223844/get-the-number-of-pages-in-a-mediawiki-wikipedia-namespace
Jul 13 2022
Just HTML dumps. So what you provide here, https://dumps.wikimedia.org/other/enterprise_html/, but also for the Commons wiki. (You already provide namespace 6 for other wikis.)
I have now tried to use the API to fetch things myself, but it is going very slowly (also because the rate limit on the HTML REST API endpoint is 100 requests per second and not the documented 200 requests per second, see T307610). I would like to understand whether I should at least hope for this to be done at some point soon, or not at all. I find it surprising that so many dumps are made but just this one is missing. Would it be just one switch to enable the dump on one more wiki?
Jul 5 2022
OK, it is not connected to the characters in the filename. There are files with the above characters that do have entities. But I do not get why not all files on Wikimedia Commons have entities.
Jul 4 2022
Oh, what a sad issue T149410. :-(
Jun 29 2022
Hm, I am pretty sure that I am doing rate limiting correctly on my side, but I am hitting 429s after a brief time when trying a 1000/10s rate limit against the REST API endpoint. If I lower it to 500/10s then I do not hit 429s. No idea why; I am doing many requests in parallel.
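For reference, this is roughly how I throttle the parallel requests (a simplified Go sketch using golang.org/x/time/rate, not my actual client; the titles are placeholders and the 100/s value just mirrors the limit discussed here):

```
package main

import (
	"context"
	"fmt"
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

func main() {
	// Stay at or below ~100 requests per second overall, even though
	// many requests run in parallel.
	limiter := rate.NewLimiter(rate.Limit(100), 10)

	titles := []string{"Earth", "Moon", "Sun"} // placeholder work items
	var wg sync.WaitGroup
	for _, title := range titles {
		wg.Add(1)
		go func(title string) {
			defer wg.Done()
			// Block until the limiter allows one more request.
			if err := limiter.Wait(context.Background()); err != nil {
				return
			}
			resp, err := http.Get("https://en.wikipedia.org/api/rest_v1/page/html/" + title)
			if err != nil {
				fmt.Println(err)
				return
			}
			resp.Body.Close()
			fmt.Println(title, resp.StatusCode)
		}(title)
	}
	wg.Wait()
}
```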
Jun 28 2022
Hm, there has been no response since February. :-( OK, I will wait.
Who could I ask from their team about this?
So if I understand correctly, those files have never been generated, so that particular dump for that particular date will not be available?
@ArielGlenn: Do you think dumps of file descriptions (so not the media files themselves, but the rendered wikitext) could be provided for Wikimedia Commons as part of the public Enterprise dumps? Given that so many other wikis are generated, why not also Wikimedia Commons? This could help me obtain descriptions for files on Wikimedia Commons (and given there are no other dumps for Wikimedia Commons, it would help me hit its API less).
@Protsack.stephan Was there any progress on this?
Jun 24 2022
I checked commons-20220620-mediainfo.json.bz2 and it contains the title field (alongside other fields which are present in the API).
I checked wikidata-20220620-all.json.bz2 and it now contains the modified field (alongside other fields which are present in the API).
Jun 13 2022
So this will now be included in the next dump which runs? Or is some deployment still necessary?
Jun 11 2022
Awesome. I will try to do so when you are online, but feel free also to just merge it without me. I do not know if I can be of much help being around anyway. :-)
Jun 9 2022
What is this subsetting you are talking about?
So what is the next step here?
Jun 8 2022
Yes, this change should fix both this issue and T278031.
Jun 7 2022
Awesome, thanks!
Thanks for testing!
Jun 5 2022
Done. Added it to the June 7 puppet request window. Please review/advise if I did something wrong.
May 27 2022
Awesome. Thanks for explaining.
So the fix to the dump script has been merged into the Wikibase extension. It is gated behind a CLI switch. What is the process for getting this turned on for dumps from Wikimedia Commons (and ideally also for Wikidata)?
May 22 2022
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/793934 is ready for review; it has both an opt-in configuration option and a test.
May 21 2022
I think this might be related to T274359.
I think this might be related to T305407.
I made another pass, adding a configuration option to not include page metadata (the dump is then without title and other page metadata).
May 20 2022
I made a first pass. Feedback welcome.
So the plan is:
May 11 2022
Most of that is controlled by the SRE team at a level in front of the REST API, since the frontend caching layer is a shared resource across everything.
May 10 2022
Because our edge traffic code enforces a stricter limit of ~100/s (for responses that aren't frontend cache hits due to popularity), before the requests ever get to the Restbase service.
Sadly bulk downloads do not have HTML dumps, and Enterprise dumps do not offer them for template/module documentation (only articles, categories, and files). Also, there are no Enterprise dumps for Wikimedia Commons.
Hm, but documentation for REST API says I can use 200 requests per second? https://en.wikipedia.org/api/rest_v1/
May 4 2022
Even if you request a single title, I think you might still get a continue param.
May 3 2022
Are you using the MediaWiki API to obtain categories and templates? I am betting you are not processing continue properly to merge multiple API responses when one batch of data is distributed across several of them. You have to merge the data, otherwise some pages look like they have no templates/categories. I just encountered that when I was using the API to populate templates/categories manually (because the dumps are missing them randomly). I used the following API query and, with some luck, you can see that some of the returned pages are missing templates/categories; you then have to follow the continue params, and in those later responses other pages are missing. Only when batchcomplete is true do you know you got everything (but you have to merge everything you got before that). A rough sketch of the handling is below.
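Roughly, the continue handling has to look like this (a simplified Go sketch of my own; the titles and prop values are placeholders, not the exact query I used):

```
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

type item struct {
	Title string `json:"title"`
}

type page struct {
	Title      string `json:"title"`
	Templates  []item `json:"templates"`
	Categories []item `json:"categories"`
}

type apiResponse struct {
	BatchComplete bool              `json:"batchcomplete"`
	Continue      map[string]string `json:"continue"`
	Query         struct {
		Pages []page `json:"pages"`
	} `json:"query"`
}

func main() {
	params := url.Values{
		"action":        {"query"},
		"prop":          {"templates|categories"},
		"titles":        {"File:Example.jpg"}, // placeholder title(s)
		"format":        {"json"},
		"formatversion": {"2"},
	}

	templates := map[string][]string{}
	categories := map[string][]string{}

	for {
		resp, err := http.Get("https://commons.wikimedia.org/w/api.php?" + params.Encode())
		if err != nil {
			panic(err)
		}
		var r apiResponse
		if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
			panic(err)
		}
		resp.Body.Close()

		// Merge: a page with no templates/categories in this response may
		// still get them in a later response of the same batch.
		for _, p := range r.Query.Pages {
			for _, t := range p.Templates {
				templates[p.Title] = append(templates[p.Title], t.Title)
			}
			for _, c := range p.Categories {
				categories[p.Title] = append(categories[p.Title], c.Title)
			}
		}

		// Only once batchcomplete is true is the merged data complete.
		if r.BatchComplete || len(r.Continue) == 0 {
			break
		}
		// Carry all continue parameters into the next request.
		for k, v := range r.Continue {
			params.Set(k, v)
		}
	}

	fmt.Println(templates)
	fmt.Println(categories)
}
```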
May 1 2022
I think in T301039 I misunderstood the documentation as saying that those pointers point to the text table.
Apr 28 2022
I would be interested in doing that, but I probably need a helping hand to do it. I have a programming background, but zero understanding of where and how this could be fixed. My understanding is that the hackathon would be suitable for this? Do I have to make a session? How do I find other people who might be able to help me?
Apr 27 2022
I added it to wikimedia-hackathon-2022. I think it would be a nice thing to fix as part of it.
Apr 19 2022
Thanks for linking to that task.
I was interested in this primarily for my own self-approved app used only by me. There it should be trivial to just change the grants.