
Allow CHUNK value to be passed in as an option for munge.sh
Closed, Resolved (Public)

Description

When importing munged data into a fresh query service some timeouts happen:

Processing wikidump-000000274.ttl.gz
SPARQL-UPDATE: updateStr=LOAD <file:///mnt/disks/ssddata/mungeOut/wikidump-000000274.ttl.gz>
java.util.concurrent.TimeoutException

Reducing the chunk size of the munge step seems to resolve this.

Right now the chunk size is not customizable without changing the file.
It would be great to be able to pass the chunk size in as an option.
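
For illustration only, the request could look roughly like the sketch below. It is not the actual munge.sh: the `-c` flag name, the `parse_chunk_size` helper, and the default of 100000 are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real munge.sh. The -c flag, the helper name,
# and the default value below are assumptions for illustration.
parse_chunk_size() {
  local chunk=100000   # assumed default, not taken from munge.sh
  local opt
  local OPTIND=1 OPTARG
  # Leading ':' puts getopts in silent mode so errors are handled here.
  while getopts ":c:" opt; do
    case "$opt" in
      c) chunk="$OPTARG" ;;
      *) echo "usage: [-c chunk_size]" >&2; return 1 ;;
    esac
  done
  # Reject empty or non-numeric values.
  case "$chunk" in
    ''|*[!0-9]*) echo "chunk size must be a positive integer" >&2; return 1 ;;
  esac
  # Reject zero.
  if [ "$chunk" -le 0 ]; then
    echo "chunk size must be a positive integer" >&2
    return 1
  fi
  echo "$chunk"
}
```

Called as `parse_chunk_size -c 50000` this prints 50000; with no arguments it falls back to the assumed default.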

Event Timeline

Smalyshev triaged this task as Medium priority. Aug 15 2019, 6:23 AM

This should not be hard to do.

Change 530634 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Allow specifying chunk size in the script

https://gerrit.wikimedia.org/r/530634

Change 530634 merged by jenkins-bot:
[wikidata/query/rdf@master] Allow specifying chunk size in the script

https://gerrit.wikimedia.org/r/530634

Amazing, would it be possible to get this backported to 0.3.1?
Or should I just bake it into the docker images? :)

@Addshore I was planning to do 0.3.2 pretty soon; if that is easier for you, you could use that.

Sounds great to me. I'll keep an eye out.