Distributed Setup is taking up a huge amount of memory #1402
Labels
Distributed
Issues related to all things distributed
need-user-input
The issue needs more information from the reporter before moving forward
Hello,
I am running a distributed setup to perform inference with an 8-billion-parameter LLaMA model. I expected the workload to fit on two machines (16 GB of memory each; see the rough estimate below), but I had to use four machines to avoid out-of-memory errors. Even after removing the KV-cache initialization, memory usage on some passes still exceeded 9 GB per machine.
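For context, here is the back-of-envelope estimate behind my expectation. This is only a sketch assuming fp16/bf16 weights (2 bytes per parameter) and an even split of the weights across machines; the constants below describe my setup rather than any library API:

```python
# Rough per-machine memory estimate for model weights only.
# Assumes fp16/bf16 (2 bytes/param) and an even split across machines;
# activations, KV cache, and framework overhead come on top of this.

NUM_PARAMS = 8e9        # 8-billion-parameter model
BYTES_PER_PARAM = 2     # fp16 / bf16
NUM_MACHINES = 2

total_gib = NUM_PARAMS * BYTES_PER_PARAM / 1024**3
per_machine_gib = total_gib / NUM_MACHINES

print(f"Total weights:       {total_gib:.1f} GiB")        # ~14.9 GiB
print(f"Per machine (split): {per_machine_gib:.1f} GiB")  # ~7.5 GiB
```

By the same arithmetic, a four-way split should put the weights alone under 4 GiB per machine, so the more than 9 GB I am observing on four machines is well above what the weights account for.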
Could you please help identify potential reasons for this behavior, or let me know if there is something I might be overlooking in the setup?
Thank you!