High RAM/CPU utilization #442

dorinmun · 2024-10-01T11:53:25Z

I am running the edifact-to-xml example and I am seeing what seems to be very high RAM and CPU utilization. I ran it on a Mac with an M2 and 24GB under the latest macOS, as well as on a Basic DigitalOcean droplet (Ubuntu 22.04.5, 2GB RAM, 1 Regular Intel vCPU), with Java 17.
On both machines the Java process goes up to 1.2-1.5 GB of RAM, with full CPU utilization.
On the droplet the example runs in around 80 seconds, which seems like a long time for the transformation, even for the droplet specs.

Is this the expected behavior? Is there a way to optimize this?
Thanks

cjmamo · 2024-10-02T04:17:01Z

I'll dig deeper but it's possible that this is happening only on the first execution since the DFDL schema has to be compiled before the EDIFACT document can be ingested. Does resource utilisation remain that high while processing a second document?
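One way to check this is to time the same work twice in a single JVM and compare the two runs. The following is only a minimal sketch using the JDK alone: `CachedWorkload` is a hypothetical stand-in for "compile the schema once, then filter a message" and does not call Smooks at all. In the real example, the `Runnable` passed to `measure` would wrap the example's own filtering call.

```java
final class FirstVsSecondRun {

    /** Wall-clock time and approximate heap growth observed for one run. */
    static final class Measurement {
        final long millis;
        final long heapDeltaBytes;

        Measurement(long millis, long heapDeltaBytes) {
            this.millis = millis;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    /** Hypothetical workload: expensive one-time setup, cheap per-call work. */
    static final class CachedWorkload implements Runnable {
        private int[] compiled;   // stands in for a compiled schema
        int compilations = 0;     // how many times the setup actually ran
        long checksum = 0;        // keeps the per-call work observable

        @Override
        public void run() {
            if (compiled == null) {
                compiled = new int[1_000_000];
                for (int i = 0; i < compiled.length; i++) {
                    compiled[i] = i;
                }
                compilations++;
            }
            // Cheap per-message work that reuses the cached structure.
            for (int i = 0; i < 1_000; i++) {
                checksum += compiled[i];
            }
        }
    }

    /** Runs the task once and reports elapsed time and used-heap change. */
    static Measurement measure(Runnable task) {
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        task.run();
        long millis = (System.nanoTime() - start) / 1_000_000;
        long heapAfter = rt.totalMemory() - rt.freeMemory();
        // The delta is approximate: a GC during the run can make it negative.
        return new Measurement(millis, heapAfter - heapBefore);
    }

    public static void main(String[] args) {
        CachedWorkload workload = new CachedWorkload();
        Measurement first = measure(workload);
        Measurement second = measure(workload);
        System.out.println("first run:  " + first.millis + " ms, heap delta "
                + first.heapDeltaBytes + " bytes");
        System.out.println("second run: " + second.millis + " ms, heap delta "
                + second.heapDeltaBytes + " bytes");
    }
}
```

If the second run is much cheaper than the first, the cost is dominated by one-time setup rather than by per-message processing.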

dorinmun · 2024-10-02T08:35:01Z

Hello, thank you for your attention, @cjmamo!

I tested this simply by changing the example to process the same message a second time within the main method.
On the M2 with 24GB, the two messages complete in 18 seconds, while RAM usage hits 3GB.

I also tested adjusting the edifact:parser element in smooks-config.xml by adding the following attributes: cacheOnDisk="true" and validationMode="Off".

RAM usage and completion time per setting:

| Setting | RAM usage | Completion time |
| --- | --- | --- |
| cacheOnDisk="true" | 750-850MB | 27-28 sec |
| validationMode="Off" | 3GB | 17 sec |
| cacheOnDisk="true" and validationMode="Off" | 800-850MB | 27-28 sec |

It seems that the first message uses around 750MB of RAM and the second uses an additional 50-100MB.
The increased processing time seems counterintuitive, but that is the result.

Is there any way to have a precompiled/cached schema so that the first message would benefit?

In my particular case, I would use Smooks by passing a single message at a time to stdin and reading the output from stdout. While this is not the most elegant approach, it would work. As such, there are no subsequent messages that would benefit from the initial compilation, caching, or any other optimization that becomes available only after a first message has been processed.

If you have any other hints on what and how to further test this I would be very happy to test.

Thanks

cjmamo · 2024-10-02T15:45:21Z

Which version of the EDIFACT cartridge are you running? I'm getting very different numbers over here.

dorinmun · 2024-10-02T17:58:44Z

2.0.0-RC1, as specified in the pom.xml.

Switching to 2.0.0-RC4 causes mvn clean package to fail with:

Caused by: org.xml.sax.SAXParseException; cvc-complex-type.3.2.2: Attribute 'schemaURI' is not allowed to appear in element 'edifact:parser'.

cjmamo · 2024-10-02T18:10:37Z

I'm getting a 404 from following that link. There have been performance improvements since RC1, so you should run the example from the v2 tag: https://github.com/smooks/smooks-examples/tree/v2

schemaURI was renamed to schemaUri, as noted in the breaking change release notes of RC3: https://github.com/smooks/smooks-edi-cartridge/releases/tag/v2.0.0-RC3

cjmamo · 2024-10-04T14:54:26Z

Was the issue resolved? Can this be closed?

dorinmun · 2024-10-04T16:20:14Z

I experience the issue with v2, as well.
I was looking into profiling the example app so I could provide a bit more info but as of now I have no additional data to provide.

dorinmun · 2024-10-07T20:44:26Z

[app.jfr.zip](https://github.com/user-attachments/files/17284325/app.jfr.zip)

@cjmamo
Here is a JProfiler view of the Java Flight Recorder data for running the example edifact-to-xml app. Compressed JFR file attached.

cjmamo · 2024-10-09T14:22:01Z

I noticed that the reader pool size is 0 by default, which means that Smooks constructs a new reader every time it filters the input. Can you set this to 1 and observe how it performs when filtering multiple inputs?

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd"
                      xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-2.0.xsd">

  <core:filterSettings readerPoolSize="1"/> 

  ...
  ...

</smooks-resource-list>
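To illustrate why the pool size matters, here is a generic sketch of a bounded object pool. It is not Smooks' actual implementation: `ReaderPoolSketch`, `borrow`, `giveBack`, and `constructionsFor` are hypothetical names used only to show that a pool of size 0 constructs a fresh object for every input, while a pool of size 1 reuses a single object across sequential inputs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

final class ReaderPoolSketch<R> {
    private final int maxPooled;
    private final Supplier<R> factory;
    private final Deque<R> idle = new ArrayDeque<>();
    private int constructed = 0;

    ReaderPoolSketch(int maxPooled, Supplier<R> factory) {
        this.maxPooled = maxPooled;
        this.factory = factory;
    }

    /** Returns an idle reader if one is pooled, otherwise constructs a new one. */
    R borrow() {
        R reader = idle.poll();
        if (reader == null) {
            reader = factory.get();
            constructed++;
        }
        return reader;
    }

    /** Keeps the reader for reuse only while the pool has room. */
    void giveBack(R reader) {
        if (idle.size() < maxPooled) {
            idle.push(reader);
        }
        // Otherwise the reader is dropped and left to the garbage collector.
    }

    int constructed() {
        return constructed;
    }

    /** Counts how many readers get built when filtering inputs one after another. */
    static int constructionsFor(int poolSize, int inputs) {
        ReaderPoolSketch<Object> pool = new ReaderPoolSketch<>(poolSize, Object::new);
        for (int i = 0; i < inputs; i++) {
            Object reader = pool.borrow();
            pool.giveBack(reader);
        }
        return pool.constructed();
    }

    public static void main(String[] args) {
        // prints "pool size 0: 3 readers constructed"
        System.out.println("pool size 0: " + constructionsFor(0, 3) + " readers constructed");
        // prints "pool size 1: 1 readers constructed"
        System.out.println("pool size 1: " + constructionsFor(1, 3) + " readers constructed");
    }
}
```

With a single input per process, as in the stdin/stdout use case described earlier in the thread, pooling in this sketch would make no difference: one reader is constructed either way.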

cjmamo self-assigned this Oct 1, 2024
