High RAM/CPU utilization #442

Open
dorinmun opened this issue Oct 1, 2024 · 9 comments

dorinmun commented Oct 1, 2024

I am running the edifact-to-xml example and I see what seems to be very high RAM and CPU utilization. I am running it with Java 17 on a Mac with an M2 and 24GB under the latest macOS, as well as on a Basic DigitalOcean droplet (Ubuntu 22.04.5, 2GB RAM, Regular Intel, 1 vCPU).
On both machines the java process goes up to 1.2-1.5 GB of RAM, with full CPU utilization.
On the droplet the example runs in around 80 seconds, which seems quite a long time for the transformation, even for the droplet specs.

Is this the expected behavior? Is there a way to optimize this?
Thanks

cjmamo self-assigned this Oct 1, 2024
Member

cjmamo commented Oct 2, 2024

I'll dig deeper but it's possible that this is happening only on the first execution since the DFDL schema has to be compiled before the EDIFACT document can be ingested. Does resource utilisation remain that high while processing a second document?
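
One way to answer this is to time each run separately inside a single JVM. The following is a minimal sketch in plain Java and is not part of Smooks: RunTimer, Sample, and measure are hypothetical names, and the stand-in workload is an assumption that would need to be replaced with the call that filters one EDIFACT message through a reused Smooks instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Smooks): times repeated runs of a task
// and records the JVM's used heap after each run.
class RunTimer {

    // One measurement: run number, wall-clock time, and used heap after the run.
    static final class Sample {
        final int run;
        final long elapsedMillis;
        final long usedHeapBytes;

        Sample(int run, long elapsedMillis, long usedHeapBytes) {
            this.run = run;
            this.elapsedMillis = elapsedMillis;
            this.usedHeapBytes = usedHeapBytes;
        }
    }

    // Runs the task the given number of times and returns one sample per run.
    static List<Sample> measure(int runs, Runnable task) {
        Runtime rt = Runtime.getRuntime();
        List<Sample> samples = new ArrayList<>();
        for (int i = 1; i <= runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            // Used heap is read without forcing a GC, so it is only approximate.
            long usedHeap = rt.totalMemory() - rt.freeMemory();
            samples.add(new Sample(i, elapsedMillis, usedHeap));
        }
        return samples;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with the code that filters
        // one EDIFACT message using the same Smooks instance on every run.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        for (Sample s : measure(2, task)) {
            System.out.printf("run %d: %d ms, %d MB used heap%n",
                    s.run, s.elapsedMillis, s.usedHeapBytes / (1024 * 1024));
        }
    }
}
```

If the first run is much slower than the second, one-off work such as schema compilation is the likely cost. The heap figure is a rough reading, since it depends on when the garbage collector last ran.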

Author

dorinmun commented Oct 2, 2024

Hello, thank you for your attention, @cjmamo!

I tested this simply by changing the example to process the same message a second time within the main method.
On the M2 with 24GB, the two messages complete in 18 seconds, while RAM usage hits 3GB.

I also tested adding the following attributes to the edifact:parser element in smooks-config.xml: cacheOnDisk="true" and validationMode="Off".

RAM usage, completion time:

  • cacheOnDisk="true" - 750-850MB, 27-28 sec
  • validationMode="Off" - 3GB, 17 sec
  • cacheOnDisk="true" and validationMode="Off" - 800-850MB, 27-28 sec

It seems that about 750MB of RAM is used for the first message, and the second uses an additional 50-100MB.
The increased processing time seems counterintuitive, but that is the result.

Is there any way to have a precompiled/cached schema so that the first message would benefit?

In my particular case, I would use Smooks by passing a single message at a time to stdin and reading the output from stdout. While this is not the most elegant approach, it would work. As such, there are no subsequent messages that would benefit from initial compilation, caching, or any other optimization that becomes available after a first message is processed.

If you have any other hints on what to test further and how, I would be very happy to try them.

Thanks

Member

cjmamo commented Oct 2, 2024

Which version of the EDIFACT cartridge are you running? I'm getting very different numbers over here.

Author

dorinmun commented Oct 2, 2024

2.0.0-RC1, as specified in the pom.xml.

Switching to 2.0.0-RC4 causes mvn clean package to fail with:

Caused by: org.xml.sax.SAXParseException; cvc-complex-type.3.2.2: Attribute 'schemaURI' is not allowed to appear in element 'edifact:parser'.

Member

cjmamo commented Oct 2, 2024

I'm getting a 404 from following that link. There were performance improvements since RC1 so you should run the example from the v2 tag: https://github.com/smooks/smooks-examples/tree/v2

schemaURI was renamed to schemaUri, as noted in the breaking change release notes of RC3: https://github.com/smooks/smooks-edi-cartridge/releases/tag/v2.0.0-RC3

Member

cjmamo commented Oct 4, 2024

Was the issue resolved? Can this be closed?

Author

dorinmun commented Oct 4, 2024

I experience the issue with v2, as well.
I was looking into profiling the example app so I could provide a bit more info but as of now I have no additional data to provide.

Author

dorinmun commented Oct 7, 2024

[image: JProfiler view] [app.jfr.zip](https://github.com/user-attachments/files/17284325/app.jfr.zip)

@cjmamo Here is a JProfiler view of the Java Flight Recorder data from running the edifact-to-xml example app. The compressed JFR file is attached.

Member

cjmamo commented Oct 9, 2024

I noticed that the reader pool size is 0 by default, which means that Smooks constructs a new reader every time it filters the input. Can you set this to 1 and observe how it performs when filtering multiple inputs?

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd"
                      xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-2.0.xsd">

  <core:filterSettings readerPoolSize="1"/> 

  ...
  ...

</smooks-resource-list>
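
To illustrate why the pool size matters, here is a toy object pool in plain Java. It is only a sketch of the general pooling idea and does not reflect how Smooks implements its reader pool: ReaderPoolSketch, borrow, giveBack, and constructedCount are hypothetical names, and a plain Object stands in for a reader. It is also not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Toy pool (assumption: illustrates pooling in general, not Smooks internals).
class ReaderPoolSketch<R> {
    private final int poolSize;
    private final Supplier<R> factory;
    private final Deque<R> idle = new ArrayDeque<>();
    private int constructed = 0;

    ReaderPoolSketch(int poolSize, Supplier<R> factory) {
        this.poolSize = poolSize;
        this.factory = factory;
    }

    // Reuses an idle reader if one exists, otherwise constructs a new one.
    R borrow() {
        R reader = idle.poll();
        if (reader == null) {
            reader = factory.get();
            constructed++;
        }
        return reader;
    }

    // Keeps the reader for reuse only while the pool has room; with a pool
    // size of 0 every returned reader is discarded.
    void giveBack(R reader) {
        if (idle.size() < poolSize) {
            idle.push(reader);
        }
    }

    int constructedCount() {
        return constructed;
    }

    public static void main(String[] args) {
        for (int size : new int[] {0, 1}) {
            ReaderPoolSketch<Object> pool = new ReaderPoolSketch<>(size, Object::new);
            // Simulate filtering five inputs one after another.
            for (int i = 0; i < 5; i++) {
                Object reader = pool.borrow();
                pool.giveBack(reader);
            }
            System.out.println("poolSize=" + size
                    + " constructed=" + pool.constructedCount());
        }
    }
}
```

In this sketch, five sequential inputs construct five readers with a pool size of 0 and a single reader with a pool size of 1. If reader construction is expensive, that difference would show up when filtering multiple inputs.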
