I did a small test of RDF dump generation for SDC/mediainfo. Elasticsearch data shows that there are about 500k files on Commons with labels and about 850k files with statements (these largely intersect). The way we dump entities right now, we scan all the files (page IDs) and skip those that do not have structured data. However, since right now only about 2% of files have data, this is a very wasteful process: we process on the order of 100 pages to find one proper mediainfo entity, essentially. We may want to find a way to do better, though I'm not sure the current classes allow it; we may have to implement some special class instead of SqlEntityIdPager.
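One way such a special pager could avoid scanning every page is to only yield pages whose latest revision actually has a mediainfo slot, by joining against the MCR slot tables instead of walking all page IDs. A rough sketch, not a worked-out implementation (table and column names are from MediaWiki's multi-content-revision schema; `:lastPageId`/`:batchSize` are hypothetical continuation parameters mirroring how SqlEntityIdPager batches, and whether this join is efficient enough at Commons scale would need checking):

```sql
-- Hypothetical query for a mediainfo-aware replacement of SqlEntityIdPager:
-- return only pages whose latest revision carries a 'mediainfo' slot.
SELECT page_id
FROM page
JOIN slots ON slot_revision_id = page_latest
JOIN slot_roles ON slot_role_id = role_id
WHERE role_name = 'mediainfo'
  AND page_id > :lastPageId   -- continuation token, as in SqlEntityIdPager
ORDER BY page_id
LIMIT :batchSize;
```

This would skip the ~98% of files without structured data entirely, at the cost of a join per batch.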
I tried dumping 100K mediainfo entities, and that took 166.5 minutes. On one hand, given that we can parallelize, if we split the dump into 8 shards we might be done in a reasonable time. On the other hand, an average of 10 items per second is too slow. If we expect coverage of files with mediainfo to increase significantly (e.g. 10x or more), then maybe it's not that big of a deal (though T222497: dumpRDF for MediaInfo entities loads each page individually still remains a factor), but as it stands now, the RDF dumping process for mediainfo is very inefficient.
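For reference, the back-of-the-envelope math behind those numbers (using the 100K-entities-in-166.5-minutes test run and the ~850k files-with-statements figure from above; the 8-way split is the sharding idea mentioned there):

```python
# Throughput observed in the test dump run.
entities_dumped = 100_000
minutes_taken = 166.5

rate_per_sec = entities_dumped / (minutes_taken * 60)

# Extrapolate to all files with statements, single-threaded and with 8 shards.
files_with_statements = 850_000
single_thread_hours = files_with_statements / rate_per_sec / 3600
sharded_hours = single_thread_hours / 8

print(round(rate_per_sec, 1))         # ~10 entities/s
print(round(single_thread_hours, 1))  # ~23.6 h single-threaded
print(round(sharded_hours, 1))        # ~2.9 h with 8 shards
```

So even at today's coverage a full dump is roughly a day of single-threaded work, which sharding brings down to a few hours; a 10x coverage increase at the same per-item cost would push it back up accordingly.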