Restarting the appliance leaves old podman tmpfs directories causing failures on next boot #116

Open
agrare opened this issue Sep 10, 2024 · 17 comments
Assignees: agrare
Labels: bug (Something isn't working), stale

Comments

@agrare
Member

agrare commented Sep 10, 2024

After restarting the appliance, subsequent podman runs fail with the following error:
Error: current system boot ID differs from cached boot ID; an unhandled reboot has occurred. Please delete directories "/tmp/storage-run-988/containers" and "/tmp/storage-run-988/libpod/tmp" and re-run Podman

The issue is that podman expects its temp directories to be on a tmpfs filesystem, and thus cleared on boot; however, the ManageIQ appliance has an XFS-formatted logical volume for /tmp, so it persists across reboots.
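
For reference, the layout can be confirmed and the failure recovered from manually (a sketch; the storage-run suffix is the UID of the user running podman, 988 in this case):

# Check what backs /tmp -- on the appliance this shows an XFS logical volume, not tmpfs
findmnt -o TARGET,SOURCE,FSTYPE /tmp

# One-off recovery: remove the stale directories named in the error, then re-run podman
rm -rf /tmp/storage-run-988/containers /tmp/storage-run-988/libpod/tmp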

@agrare agrare added the bug Something isn't working label Sep 10, 2024
@agrare agrare self-assigned this Sep 10, 2024
@Fryguy
Member

Fryguy commented Sep 10, 2024

Is there a reason we don't use tmpfs? Maybe @bdunne or @jrafanie know.

@agrare
Member Author

agrare commented Sep 10, 2024

Yeah, that was a surprise to me as well. According to containers/podman#23193 (comment), podman expects its temp dirs to be on tmpfs.

@jrafanie
Member

jrafanie commented Sep 10, 2024

I assume it was never changed. For debugging, some stuff would be retained in /tmp, so it might be helpful not to lose it on reboot.

@Fryguy
Member

Fryguy commented Sep 10, 2024

Interesting - I can't think of what we would put diagnostically in /tmp, except maybe ansible runner things, and those are cleaned up anyway.

@jrafanie
Member

jrafanie commented Sep 10, 2024

Interesting - I can't think of what we would put diagnostically in /tmp, except maybe ansible runner things, and those are cleaned up anyway.

Right, I think that's the question. If it uses the system /tmp by default, things will use it. Ansible was indeed one of the things I was thinking of, but even automate. I'm not sure if anything on the system uses it, as most things should use the journal. Does kickstart use it for storing the kickstart or whatever it's applying? I vaguely recall that might be the home directory. I'm not sure if that changed after we stopped running as root. cloud-init? I'm not sure. It should be using the journal, but I'm not sure if any temporary stuff useful for diagnostics was kept there.

I'm not saying we can't make it tmpfs, only that this was probably the reason we wanted it persisted.

@agrare
Member Author

agrare commented Sep 10, 2024

It appears that an LV-backed /tmp was added here: ManageIQ/manageiq-appliance-build#51

I'm not sure why STIG recommends a persistent /tmp, but if that is no longer a requirement, then it looks like we'd be safe to delete it.

Honestly, a 1GB-partition-backed /tmp isn't guaranteed to be "more space" than a RAM-backed tmpfs anyway, given the amount of memory we recommend, so I don't think this is a "usage" argument. Also, nothing (well designed) should ever depend on /tmp persisting across reboots, and I highly doubt ansible depends on that, but we can always check.

Anything an ansible-runner execution leaves behind when a reboot interrupts it should be safe to clear on boot, since the execution was certainly cut short anyway.
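
If size ever did become a concern, a tmpfs /tmp can be capped explicitly anyway; a sketch of an /etc/fstab entry (the 1g value is only illustrative, chosen to match the current LV):

# /etc/fstab -- mount /tmp as tmpfs; size=1g is illustrative, matching the old 1GB LV
tmpfs  /tmp  tmpfs  defaults,size=1g,nosuid,nodev  0 0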

@jrafanie
Member

I'm not sure why STIG recommends a persistent /tmp, but if that is no longer a requirement, then it looks like we'd be safe to delete it.

It looks like STIG wanted separate filesystems, not specifically persistent ones.

@agrare
Member Author

agrare commented Sep 10, 2024

It looks like STIG wanted separate filesystems, not specifically persistent ones.

👍 Now that I 100% agree with, but we would have already had that (I hope).

@jrafanie
Member

Also, nothing (well designed) should ever depend on /tmp persisting across reboots, and I highly doubt ansible depends on that, but we can always check.

Anything an ansible-runner execution leaves behind when a reboot interrupts it should be safe to clear on boot, since the execution was certainly cut short anyway.

Right. I don't recall anything depending on persistent data in /tmp, only that the diagnostics were there for review if there was a problem. Again, it's been so long, and I have a strong feeling the only thing we'd ever lose is diagnostics. I think we've moved a lot of the diagnostics to actual logging through the journal, so this is less of an issue than it was even at the time of that PR.

@jrafanie
Member

It looks like STIG wanted separate filesystems, not specifically persistent ones.

👍 Now that I 100% agree with, but we would have already had that (I hope).

Before that PR, logs, tmp, home, etc. were all on the / volume. He created separate filesystems for each. I think we're agreeing, but I want to make it clear that it was really about splitting up the different types of data onto different volumes with different filesystems, and less about the implementation of the volume.

@agrare
Member Author

agrare commented Sep 10, 2024

Before that PR, logs, tmp, home, etc. were all on the / volume.

Oh interesting, thanks. Yeah, I didn't realize /tmp was just on /...

@Fryguy
Member

Fryguy commented Sep 10, 2024

Also, nothing (well designed) should ever depend on /tmp persisting across reboots, and I highly doubt ansible depends on that, but we can always check.

We temporarily check out git repos to /tmp and then run ansible-runner in them, so that's not ansible doing it. For diagnostics, you can set an ENV var to tell our Ansible::Runner code not to clean up after itself so you can see what's in there, but even in that case I'd want tmpfs so it goes away and gets cleaned up on reboot.

@Fryguy
Member

Fryguy commented Sep 10, 2024

I'm 100% for /tmp being a separate volume that is tmpfs - and based on the discussion here I'm pretty sure that satisfies STIG as well.
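
One way to get there, assuming the appliance's base OS ships the stock systemd tmp.mount unit (a sketch, not verified against the appliance build; an alternative to an explicit fstab entry):

# Drop or comment out the existing /tmp LV entry in /etc/fstab first, then:
systemctl unmask tmp.mount        # some releases mask it by default
systemctl enable --now tmp.mount
findmnt /tmp                      # should now report tmpfs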

@Fryguy
Member

Fryguy commented Sep 10, 2024

Strangely, I can't find anything STIG-related in our code base. I was pretty sure we had something in the appliance console, but maybe that was CloudForms-specific? I also don't see anything STIG-related in the Appliance Hardening Guide.

That being said, I also recall that we tried to default as much as we could to the STIG and SCAP rules so that we could minimize the changes customers needed to make themselves.

@agrare
Member Author

agrare commented Sep 13, 2024

Interestingly, there is a podman-clean-transient.service oneshot service, but it doesn't clean up anything in /tmp; it assumes those directories are automatically cleared.

[root@manageiq ~]# runuser --login manageiq --command 'podman system prune --external'
Error: current system boot ID differs from cached boot ID; an unhandled reboot has occurred. Please delete directories "/tmp/storage-run-986/containers" and "/tmp/storage-run-986/libpod/tmp" and re-run Podman
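
If /tmp has to stay persistent, a boot-time cleanup via systemd-tmpfiles could be another workaround; a sketch (the drop-in file name and glob are illustrative):

# /etc/tmpfiles.d/podman-stale-tmp.conf (file name illustrative)
# 'R' removes the path recursively; the trailing '!' restricts it to boot time,
# clearing podman's per-user temp dirs even though /tmp itself persists.
R! /tmp/storage-run-*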

@miq-bot
Member

miq-bot commented Dec 16, 2024

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.
