Investigate arkouda.random.Generator.shuffle() function performance #3949

Open
ajpotts opened this issue Dec 19, 2024 · 2 comments

Comments

ajpotts commented Dec 19, 2024

A customer reported: "I have been using the arkouda.random.Generator.shuffle() function and I notice it's been a bottleneck in my computational speed, and it has not scaled up well with nnz or compute nodes."

@bengionz

@ajpotts ajpotts self-assigned this Dec 19, 2024
ajpotts commented Dec 30, 2024

From @e-kayrakli on Slack:
On a quick look, Chapel’s randomStream.shuffle is serial, and Arkouda’s is built on top of that. A relatively quick, but memory-heavy, solution could be to use fillRandom to create a random index mapping array, and then move data in parallel, ideally using aggregators, based off of that. That is, Arkouda’s shuffle could use a server implementation like that.
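
To make the suggestion concrete, here is a minimal single-process sketch in plain Python. It is not Arkouda or Chapel code. The function name `shuffle_via_random_keys` is hypothetical, and deriving the index mapping by sorting positions on their random keys is an assumption, since the comment does not say how the mapping would be built from the random values.

```python
import random


def shuffle_via_random_keys(data, seed=None):
    """Return a shuffled copy of `data` using one random key per element.

    Hypothetical illustration only; not the Arkouda server implementation.
    """
    rng = random.Random(seed)
    n = len(data)

    # Step 1: fill an array with random keys (stand-in for fillRandom).
    keys = [rng.random() for _ in range(n)]

    # Step 2 (assumption): sorting the positions by key gives the index mapping.
    perm = sorted(range(n), key=keys.__getitem__)

    # Step 3: gather. Each output slot depends only on perm[i], so in a
    # distributed setting this data movement could run in parallel.
    return [data[src] for src in perm]


if __name__ == "__main__":
    print(shuffle_via_random_keys(list(range(10)), seed=42))
```

The extra memory is visible here: the keys, the index mapping, and the output copy each take space proportional to the input. Tied keys would be resolved by the stable sort, which introduces a very small bias that a real implementation would need to address.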

@ajpotts ajpotts added the User Reported A user submitted the issue label Dec 30, 2024
ajpotts added a commit to ajpotts/arkouda that referenced this issue Dec 31, 2024
ajpotts commented Dec 31, 2024

One option would be a distributed MergeShuffle: https://arxiv.org/pdf/1508.03167
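
For orientation, below is a rough serial sketch in Python of a divide-and-conquer shuffle in the spirit of the linked paper: shuffle two halves independently, then merge them using random bits. The function names, the `cutoff` parameter, and the exact merge loop are assumptions written from memory, not a verified transcription of the paper and not Arkouda code. The two recursive calls are independent of each other, which is where parallelism would come from.

```python
import random


def _fisher_yates(a, start, end, rng):
    # Standard in-place Fisher-Yates shuffle of a[start:end].
    for i in range(end - 1, start, -1):
        m = rng.randint(start, i)
        a[i], a[m] = a[m], a[i]


def _merge(a, start, mid, end, rng):
    # Randomly interleave the shuffled blocks a[start:mid] and a[mid:end].
    i, j = start, mid
    while True:
        if rng.getrandbits(1) == 0:
            if i == j:          # left block exhausted
                break
        else:
            if j == end:        # right block exhausted
                break
            a[i], a[j] = a[j], a[i]
            j += 1
        i += 1
    # Insert the leftover elements at random positions, Fisher-Yates style.
    while i < end:
        m = rng.randint(start, i)
        a[i], a[m] = a[m], a[i]
        i += 1


def merge_shuffle(a, start, end, rng, cutoff=4):
    """Shuffle a[start:end] in place (hypothetical sketch)."""
    if end - start <= cutoff:
        _fisher_yates(a, start, end, rng)
        return
    mid = (start + end) // 2
    merge_shuffle(a, start, mid, rng, cutoff)   # independent of the call below
    merge_shuffle(a, mid, end, rng, cutoff)
    _merge(a, start, mid, end, rng)


if __name__ == "__main__":
    values = list(range(16))
    merge_shuffle(values, 0, len(values), random.Random(7))
    print(values)
```

Because the sketch only ever swaps elements, the result is always a permutation of the input. Whether the permutation is uniformly distributed depends on the merge step matching the paper, which should be checked against the source before relying on it.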

ajpotts added a commit to ajpotts/arkouda that referenced this issue Jan 10, 2025