recognize and issue error if GPU does not support bf16 #1344

Closed
wants to merge 3 commits into from


mikekgfb (Contributor) commented Nov 5, 2024

Addresses #1298, in which models using bf16 as their dtype fail on T4 (and other pre-9.0 architecture GPUs), by selecting an alternate dtype when possible and otherwise issuing a clear error that describes the problem.
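The fallback-or-error behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolve_dtype`, `UnsupportedDtypeError`, and the `float16` fallback are hypothetical, and the `(9, 0)` cutoff is simply the one named in the description, kept as a parameter rather than treated as an established threshold. The function takes the compute capability as a plain tuple so it runs without a GPU.

```python
from typing import Tuple


class UnsupportedDtypeError(RuntimeError):
    """Hypothetical error type for a dtype the device cannot run."""


def resolve_dtype(
    requested: str,
    capability: Tuple[int, int],
    min_bf16_capability: Tuple[int, int] = (9, 0),  # assumption: cutoff named in the PR text
    fallback: str = "float16",  # assumption: the PR does not name the alternate dtype
    allow_fallback: bool = True,
) -> str:
    """Return the dtype name to use on a device with the given compute capability."""
    # Any dtype other than bf16, or a new enough device, passes through unchanged.
    if requested != "bfloat16" or capability >= min_bf16_capability:
        return requested
    # Older device: substitute the alternate dtype when the caller allows it.
    if allow_fallback:
        return fallback
    # Otherwise fail with a message that names the device and a way forward.
    major, minor = capability
    raise UnsupportedDtypeError(
        f"bfloat16 was requested, but this device has compute capability "
        f"{major}.{minor}; choose another dtype such as {fallback}."
    )
```

In a real setup the capability tuple would presumably come from `torch.cuda.get_device_capability()`, which returns `(major, minor)`; a T4 reports `(7, 5)`.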


pytorch-bot bot commented Nov 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1344

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit fd0f53e with merge base 9480258:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Nov 5, 2024

mikekgfb (Contributor, Author) commented Nov 6, 2024

Because we emulate BF16 on pre-9.0 CUDA architectures, this test is overly restrictive. In a nutshell, the only problem is a small set of functions, such as torchao's linear:int4 operator (xref: pytorch/ao#1110), that don't emulate BF16. Given the general posture of PyTorch, emulating BF16 in those operators, rather than issuing an error, would be the way to go (or, alternatively, the error could be limited to a more specific architecture check in the linear:int4 transformation).
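The narrower alternative mentioned above, raising an error only when the linear:int4 transformation is combined with BF16 on an older device, might look something like this. It is a sketch under stated assumptions: `check_linear_int4` is a hypothetical helper, the quantization config is assumed to be a mapping keyed by scheme name, and the `(9, 0)` cutoff is again taken from this thread rather than from torchao.

```python
from typing import Any, Mapping, Tuple


def check_linear_int4(
    quantize: Mapping[str, Any],
    dtype: str,
    capability: Tuple[int, int],
    min_capability: Tuple[int, int] = (9, 0),  # assumption: cutoff used in this thread
) -> None:
    """Raise only for the one combination reported as unsupported."""
    # Configurations that do not request linear:int4 are never rejected,
    # because BF16 is emulated for other operators on older devices.
    if "linear:int4" not in quantize:
        return
    if dtype == "bfloat16" and capability < min_capability:
        major, minor = capability
        raise ValueError(
            f"linear:int4 with bfloat16 is not supported on compute capability "
            f"{major}.{minor}; use a different dtype or quantization scheme."
        )
```

With this shape, an unquantized BF16 model on a T4 passes the check, and only the int4 path is stopped early with a specific message.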

mikekgfb (Contributor, Author) commented Nov 6, 2024

Closing, as this PR implements an overly restrictive check given the emulation of BF16 for older architectures.

mikekgfb closed this Nov 6, 2024