
OutOfMemoryError: GC overhead limit exceeded #37

Closed
visenger opened this issue Feb 4, 2016 · 7 comments

Comments

visenger commented Feb 4, 2016

Hello,

Currently, I am running a set of experiments on the HOSP dataset:
1k-100k tuples with 2%-10% noise that I introduced myself.
HOSP is provided within the NADEEF GitHub repository.

At some point, I get an OutOfMemoryError (GC overhead limit exceeded):
Exception in thread "Thread-762" java.lang.OutOfMemoryError: GC overhead limit exceeded

I increased the maximum heap size and ran NADEEF on a Linux machine with the following command:

java -Xmx14G -cp out/bin/*:examples/:out/test qa.qcri.nadeef.console.Console
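For reference, a few standard HotSpot options can be layered onto a command like this one: an explicit initial heap size, a heap dump on OOM for offline diagnosis, and disabling the GC overhead check (which only changes how the failure surfaces, not how much memory the run needs). The values below are illustrative, not taken from the original run:

java -Xms4G -Xmx14G -XX:+HeapDumpOnOutOfMemoryError -XX:-UseGCOverheadLimit -cp out/bin/*:examples/:out/test qa.qcri.nadeef.console.Console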
Any idea what else I could configure to get NADEEF running without OOM errors?

Thank you for your help.

hammady (Contributor) commented Feb 8, 2016

Hi @visenger,
Can you please share the dataset after introducing the noise?
It would also be helpful to share the full error log.
Thank you.

visenger (Author) commented Feb 8, 2016

Hi @hammady, I created a repo for this: https://github.com/visenger/noiselog.git

Let me know if you have any questions. Thank you!

hammady (Contributor) commented Feb 11, 2016

Thanks @visenger, do you remember which $i/$j combinations gave you the memory issue?

visenger (Author) commented

Hi Hossam, I ran these experiments several times, and the OOM issue is not deterministic, e.g. i=4%, j=40k or i=8%, j=100k.
Different combinations and different execution orders can lead to different points of OOM.
Here is an example of the current setting: https://github.com/visenger/noiselog/blob/master/args.txt

Thanks a lot, @hammady!

hammady (Contributor) commented Feb 11, 2016

For documentation purposes, here is the error trace:

Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3236)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at qa.qcri.nadeef.core.pipeline.ViolationExportToCSV.execute(ViolationExportToCSV.java:90)
        at qa.qcri.nadeef.core.pipeline.ViolationExportToCSV.execute(ViolationExportToCSV.java:34)
        at qa.qcri.nadeef.core.pipeline.Node.execute(Node.java:68)
        at qa.qcri.nadeef.core.pipeline.Flow$1.run(Flow.java:174)

hammady (Contributor) commented Feb 15, 2016

@visenger I have changed the way the violations table is written to CSV; this should fix the memory problem (at least at this point in the code). However, please note that NADEEF is currently not scalable and may not be able to handle big data. We are working on a scalable NADEEF in a private repo, if you are interested.
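For context, the trace above shows the exporter buffering the entire violations output in an in-memory ByteArrayOutputStream before writing it, so a fix along these lines would stream rows to the file incrementally instead. A minimal sketch of that pattern, assuming a generic row representation (class and method names here are illustrative, not the actual ViolationExportToCSV code):

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Illustrative sketch only; not the actual NADEEF ViolationExportToCSV code.
    final class StreamingCsvExport {
        // Write each row straight to disk instead of accumulating the whole table
        // in memory first. In practice the rows would come from a database cursor
        // rather than being fully materialized.
        static void export(Iterable<String[]> rows, Path target) throws IOException {
            try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                for (String[] row : rows) {
                    writer.write(String.join(",", row));
                    writer.newLine();
                }
            }
        }
    }

Even with streaming writes, violation detection itself can still exhaust the heap on larger inputs, which is consistent with the scalability caveat above.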

visenger (Author) commented

@hammady Thanks a lot! I will check this out.
