An error occurs: module 'torch.distributed' has no attribute 'ReduceOp' #2674

Open
fangbaolei opened this issue Jan 9, 2025 · 2 comments

Comments

@fangbaolei

jetson agx orin 64G version

Platform:
  Machine: aarch64
  System: Linux
  Distribution: Ubuntu 22.04 Jammy Jellyfish
  Release: 5.15.148-tegra
  Python: 3.10.12

Hardware:
  Model: NVIDIA Jetson AGX Orin Developer Kit
  Module: NVIDIA Jetson AGX Orin (64GB RAM)
  699-level Part Number: 699-13701-0005-500 M.0
  P-Number: p3701-0005
  SoC: tegra234
  CUDA Arch BIN: 8.7
  L4T: 36.4.0
  Jetpack: 6.1
  Hostname: ubuntu

Libraries:
  CUDA: 12.6.68
  cuDNN: 9.3.0.75
  TensorRT: 10.3.0.30
  VPI: 3.2.4
  Vulkan: 1.3.204
  OpenCV: 4.8.0 (with CUDA: NO)

TensorRT-LLM version: 0.12.0-jetson

Trying the multimodal VILA demo.
While setting up the environment (installing deepspeed), an error occurred: module 'torch.distributed' has no attribute 'ReduceOp'.
https://forums.developer.nvidia.com/t/module-torch-distributed-has-no-attribute-reduceop/256581/5 says PyTorch 1.11 is needed, but the TensorRT-LLM v0.12.0-jetson branch has to run on JetPack 6.1, which requires PyTorch 2.5.
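
A quick way to confirm the cause (my assumption: the NVIDIA Jetson wheel may be built without distributed support, so torch.distributed imports but lacks ReduceOp):

import torch

print(torch.__version__)                 # e.g. 2.5.0a0+872d972e41.nv24.8
print(torch.distributed.is_available())  # False when the wheel was built without USE_DISTRIBUTED
if torch.distributed.is_available():
    print(torch.distributed.ReduceOp.SUM)  # ReduceOp only exists in distributed-enabled builds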

@fangbaolei
Author

Solved the problem by installing torch 2.5.0 (instead of 2.5.0a0+872d972e41.nv24.8), but a new error appears, shown below:
python3 build_visual_engine.py --model_path tmp/hf_models/${MODEL_NAME} --model_type vila --vila_path ${VILA_PATH} # for VILA

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
[2025-01-09 13:56:09,944] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-01-09 13:56:10,123] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/alpha/work/multimodal/VILA/llava/model/qlinear_te.py:95: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@amp.custom_fwd(cast_inputs=torch.bfloat16)
/home/alpha/work/multimodal/VILA/llava/model/qlinear_te.py:147: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(ctx, grad_output):
/home/alpha/work/multimodal/VILA/llava/model/llava_arch.py:113: UserWarning: model_dtype not found in config, defaulting to torch.float16.
warnings.warn("model_dtype not found in config, defaulting to torch.float16.")
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.30s/it]
Traceback (most recent call last):
File "/home/alpha/work/multimodal/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 12, in
builder.build()
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/tools/multimodal_builder.py", line 85, in build
build_vila_engine(args)
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/tools/multimodal_builder.py", line 391, in build_vila_engine
model = AutoModel.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "/home/alpha/work/multimodal/VILA/llava/model/language_model/llava_llama.py", line 67, in from_pretrained
return cls.load_pretrained(
File "/home/alpha/work/multimodal/VILA/llava/model/llava_arch.py", line 132, in load_pretrained
vlm = cls(config, *args, **kwargs)
File "/home/alpha/work/multimodal/VILA/llava/model/language_model/llava_llama.py", line 49, in init
self.init_vlm(config=config, *args, **kwargs)
File "/home/alpha/work/multimodal/VILA/llava/model/llava_arch.py", line 74, in init_vlm
self.llm, self.tokenizer = build_llm_and_tokenizer(llm_cfg, config, *args, **kwargs)
File "/home/alpha/work/multimodal/VILA/llava/model/language_model/builder.py", line 203, in build_llm_and_tokenizer
tokenizer.stop_tokens = infer_stop_tokens(tokenizer)
File "/home/alpha/work/multimodal/VILA/llava/utils/tokenizer.py", line 176, in infer_stop_tokens
template = tokenize_conversation(DUMMY_CONVERSATION, tokenizer, overrides={"gpt": SENTINEL_TOKEN})
File "/home/alpha/work/multimodal/VILA/llava/utils/tokenizer.py", line 110, in tokenize_conversation
text = tokenizer.apply_chat_template(
File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1803, in apply_chat_template
chat_template = self.get_chat_template(chat_template, tools)
File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1967, in get_chat_template
raise ValueError(
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
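
A minimal workaround sketch, assuming the root cause is simply an unset tokenizer.chat_template (the path and the template below are illustrative, not what VILA ships; the actual fix in NVlabs/VILA#160 may differ):

from transformers import AutoTokenizer

model_dir = "tmp/hf_models/MODEL_NAME/llm"  # placeholder: the LLM subfolder of the VILA checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_dir)
if tokenizer.chat_template is None:
    # Illustrative Jinja template so apply_chat_template() stops raising;
    # a real template must match the model's training format.
    tokenizer.chat_template = (
        "{% for message in messages %}"
        "{{ message['role'] }}: {{ message['content'] }}\n"
        "{% endfor %}"
    )
    tokenizer.save_pretrained(model_dir)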

fangbaolei reopened this Jan 9, 2025
@fangbaolei
Author

Solved via this issue (NVlabs/VILA#160), but the model output is not good:

python3 run.py \
    --max_new_tokens 100 \
    --hf_model_dir tmp/hf_models/${MODEL_NAME} \
    --visual_engine_dir tmp/trt_engines/${MODEL_NAME}/vision_encoder \
    --llm_engine_dir tmp/trt_engines/${MODEL_NAME}/fp16/1-gpu \
    --image_path=cat.jpg \
    --input_text="\n\n Please elaborate what you see in the images?" \
    --batch_size=1

[attached image: cat.jpg]

result:
[Q]: \n\n Please elaborate what you see in the images?
[01/10/2025-02:59:42] [TRT-LLM] [I]
[A]: ["nThe main focus of the image is a close-up of a person's hand."]
[01/10/2025-02:59:42] [TRT-LLM] [I] Generated 19 tokens
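
One possible explanation for the off-topic answer (an assumption, since GitHub strips raw tags from comments): the VILA example expects an <image> placeholder in --input_text for each input image, and the bare \n\n above suggests those tags were eaten by the renderer. A trivial pre-flight check:

# Hypothetical pre-flight check, not part of run.py: one <image> tag per image.
prompt = "<image>\n Please elaborate what you see in the images?"
num_images = 1  # a single --image_path was passed
assert prompt.count("<image>") == num_images, "need one <image> placeholder per input image"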
