
AssertionError when running example scripts for Llama #1145

Open
Noblezhong opened this issue Oct 27, 2024 · 1 comment

Hi, when I try to run pippy_llama.py from this repo, it fails with the following error:

root@6e61f182b97b:/zt/code/my_dev# torchrun --nproc-per-node 4 pippy_llama.py
W1027 12:28:26.326000 25180 torch/distributed/run.py:793] 
W1027 12:28:26.326000 25180 torch/distributed/run.py:793] *****************************************
W1027 12:28:26.326000 25180 torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W1027 12:28:26.326000 25180 torch/distributed/run.py:793] *****************************************
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00,  7.45s/it]
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
[... identical checkpoint-loading progress and model printout repeated by the other three ranks ...]
layers_per_rank = 8
layers_per_rank = 8
layers_per_rank = 8
layers_per_rank = 8
[rank3]: Traceback (most recent call last):
[rank3]:   File "/zt/code/my_dev/pippy_llama.py", line 36, in <module>
[rank3]:     pipe = pipeline(llama, mb_args=(mb_inputs["input_ids"],))
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/pipelining/_IR.py", line 1238, in pipeline
[rank3]:     return Pipe.from_tracing(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/pipelining/_IR.py", line 1051, in from_tracing
[rank3]:     pipe = Pipe._from_traced(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/pipelining/_IR.py", line 750, in _from_traced
[rank3]:     new_submod = _outline_submodules(submodule.graph)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/pipelining/_unflatten.py", line 24, in _outline_submodules
[rank3]:     ).run_outer()
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 1014, in run_outer
[rank3]:     self.run_from(node_idx)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 1094, in run_from
[rank3]:     ).run_from(node_idx)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 1094, in run_from
[rank3]:     ).run_from(node_idx)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 1043, in run_from
[rank3]:     self.finalize_outputs()
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 993, in finalize_outputs
[rank3]:     _verify_graph_equivalence(self.cached_graph_module, self.module)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 655, in _verify_graph_equivalence
[rank3]:     assert graph_dump(x.graph) == graph_dump(y.graph)
[rank3]: AssertionError
[rank0]: Traceback (most recent call last):
[rank0]:   File "/zt/code/my_dev/pippy_llama.py", line 36, in <module>
[rank0]:     pipe = pipeline(llama, mb_args=(mb_inputs["input_ids"],))
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/pipelining/_IR.py", line 1238, in pipeline
[rank0]:     return Pipe.from_tracing(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/pipelining/_IR.py", line 1051, in from_tracing
[rank0]:     pipe = Pipe._from_traced(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/pipelining/_IR.py", line 750, in _from_traced
[rank0]:     new_submod = _outline_submodules(submodule.graph)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/pipelining/_unflatten.py", line 24, in _outline_submodules
[rank0]:     ).run_outer()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 1014, in run_outer
[rank0]:     self.run_from(node_idx)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 1094, in run_from
[rank0]:     ).run_from(node_idx)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 1094, in run_from
[rank0]:     ).run_from(node_idx)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 1043, in run_from
[rank0]:     self.finalize_outputs()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 993, in finalize_outputs
[rank0]:     _verify_graph_equivalence(self.cached_graph_module, self.module)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/export/unflatten.py", line 655, in _verify_graph_equivalence
[rank0]:     assert graph_dump(x.graph) == graph_dump(y.graph)
[rank0]: AssertionError
[rank0]:[W1027 12:29:03.612855149 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
W1027 12:29:04.124000 25180 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 25245 closing signal SIGTERM
W1027 12:29:04.126000 25180 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 25246 closing signal SIGTERM
W1027 12:29:04.131000 25180 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 25247 closing signal SIGTERM
E1027 12:29:05.080000 25180 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 3 (pid: 25248) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 919, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
pippy_llama.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-10-27_12:29:04
  host      : 6e61f182b97b
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 25248)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

The only change I made to the script is the path to the Llama 2 model: I downloaded the weights to a local directory because of a network connection error. My environment is a server with 8 A40 GPUs, and I run the code in the PyTorch NGC container after upgrading the original PyTorch to the stable release (2.5.0).
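
For reference, here is a condensed sketch of what the script sets up before the failing call, following the structure of the upstream pippy_llama.py example. The local model path and the micro-batch prompts below are placeholders for my setup; the only real change from the example is the model path.

import os
import torch
import torch.distributed as dist
from torch.distributed.pipelining import SplitPoint, pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder for the local directory the Llama 2 7B weights were downloaded into
model_path = "/path/to/local/llama-2-7b-chat-hf"

llama = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
print(llama)

# torchrun --nproc-per-node 4 sets RANK / WORLD_SIZE for each process
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
dist.init_process_group(rank=rank, world_size=world_size)

# Split the 32 decoder layers evenly across the 4 ranks (32 / 4 = 8, as in the log)
layers_per_rank = llama.config.num_hidden_layers // world_size
print(f"layers_per_rank = {layers_per_rank}")
split_spec = {
    f"model.layers.{i * layers_per_rank}": SplitPoint.BEGINNING
    for i in range(1, world_size)
}

# Trace the model into pipeline stages; this is the call that raises the AssertionError
mb_prompts = ["How do you", "I like to"]  # placeholder micro-batch prompts
mb_inputs = tokenizer(mb_prompts, return_tensors="pt", padding=True)
pipe = pipeline(llama, mb_args=(mb_inputs["input_ids"],), split_spec=split_spec)

As the tracebacks show, the AssertionError comes from _verify_graph_equivalence inside pipeline(), while the traced graph is being outlined/unflattened, i.e. before any pipeline stage or schedule is ever created.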

Noblezhong reopened this Nov 3, 2024

eppane commented Nov 14, 2024

Any updates on this? Does this script actually work in some environment?
