add nvfp4 cast benchmarks #3188
Conversation
Stack from ghstack (oldest at bottom):

Summary: as titled, extends the cast benchmarks for nvfp4

Test Plan:

```bash
# 0.8 TB/s (9% peak bandwidth)
python benchmarks/mx_formats/cast_bench.py --mode dim0_nvfp4

# 3.3 TB/s (42% peak bandwidth)
python benchmarks/mx_formats/cast_bench.py --mode dim0_nvfp4_triton_swizzle
```

ghstack-source-id: 8c5500b
ghstack-comment-id: 3410272031
Pull-Request: #3188
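The percentages in the test plan express achieved throughput relative to the GPU's peak memory bandwidth. A minimal sketch of that conversion, assuming a hypothetical ~8 TB/s peak (the PR does not state which GPU produced these numbers, so the peak figure here is an assumption):

```python
# Hypothetical sketch: express achieved throughput as a fraction of
# peak memory bandwidth. The 8.0 TB/s peak is an assumed figure; the
# PR does not state which GPU produced the reported numbers.
def bandwidth_fraction(achieved_tb_s: float, peak_tb_s: float) -> float:
    """Achieved bandwidth as a fraction of peak."""
    return achieved_tb_s / peak_tb_s

PEAK_TB_S = 8.0  # assumed peak, not from the PR

print(f"dim0_nvfp4:                {bandwidth_fraction(0.8, PEAK_TB_S):.0%}")
print(f"dim0_nvfp4_triton_swizzle: {bandwidth_fraction(3.3, PEAK_TB_S):.0%}")
```

With the PR's own peak figure the ratios come out near the 9% and 42% it reports; the takeaway is that the triton swizzle path moves roughly 4x more bytes per second than the reference compiled path.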
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3188

✅ No failures as of commit c3c9b07 with merge base 30082cb.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
From the diff in `benchmarks/mx_formats/cast_bench.py`:

```python
bps = (bytes_r + bytes_w) / (time_us / 1e6)
```

```python
elif mode == "dim0_nvfp4":
    to_nvfp4_reference_c = torch.compile(to_nvfp4_reference)
```
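The `bps` line is the usual bytes-moved-over-elapsed-time calculation, with the benchmark's timings in microseconds. A self-contained sketch of the same arithmetic (the example tensor sizes below are made up for illustration):

```python
# Sketch of the benchmark's bandwidth metric: total bytes read plus
# bytes written, divided by elapsed time converted from microseconds
# to seconds. The example inputs below are illustrative only.
def bytes_per_second(bytes_r: int, bytes_w: int, time_us: float) -> float:
    return (bytes_r + bytes_w) / (time_us / 1e6)

# e.g. reading 1 GiB and writing 0.5 GiB in 500 microseconds:
bps = bytes_per_second(2**30, 2**29, 500.0)
print(f"{bps / 1e12:.2f} TB/s")  # about 3.22 TB/s
```

Note that for a cast benchmark, `bytes_r` counts the high-precision input while `bytes_w` counts the much smaller packed output plus scales, so the metric is dominated by the read side.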
can you set `fullgraph=True`
can do in a separate PR
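The reviewer's suggestion, deferred to a follow-up PR, can be sketched as below. Passing `fullgraph=True` makes `torch.compile` raise on graph breaks instead of silently splitting the function into multiple graphs, so the benchmark measures a single compiled region. The body of `to_nvfp4_reference` here is a placeholder; the real implementation lives in `cast_bench.py`.

```python
# Hedged sketch of compiling the reference cast with fullgraph=True.
# The function body is a stand-in, not the PR's actual nvfp4 cast.
import torch

def to_nvfp4_reference(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0  # placeholder for the real nvfp4 cast logic

# Compilation is lazy: graph capture happens on the first call, and
# with fullgraph=True any graph break becomes a hard error rather
# than a silent fallback to eager execution.
to_nvfp4_reference_c = torch.compile(to_nvfp4_reference, fullgraph=True)
```

This matters for benchmarks in particular: a silent graph break can leave part of the work running eagerly, quietly skewing the bandwidth numbers being compared.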
Update [ghstack-poisoned]