Broadcom Tomahawk Ultra Redefines Ethernet Switching for AI and HPC Scale-Up
Nov 13, 2025
As generative AI and large language models drive unprecedented growth in compute demands, the limitations of proprietary interconnects like NVIDIA NVLink/NVSwitch and costly InfiniBand networks have become more evident. To meet the industry's call for an open, scalable, and cost-effective alternative, Broadcom spent over three years developing the Tomahawk Ultra, a next-generation Ethernet switching ASIC designed to deliver InfiniBand-class latency and lossless performance. By combining ultra-low latency, high bandwidth, and memory-fabric capabilities, Tomahawk Ultra redefines Ethernet as the foundation for AI and HPC scale-up networks.

Core Performance: Redefining the Standard for AI Interconnects
Built on TSMC 5nm process, Broadcom Tomahawk Ultra delivers next-generation Ethernet performance optimized for AI and HPC workloads — achieving higher throughput, ultra-low latency, and seamless scalability for GPU-driven environments.
Massive Switching Capacity: 51.2 Tbps throughput, matching Tomahawk 5 but optimized for small-packet efficiency in AI training and inference.
Ultra-Low Latency: Achieves 250 ns port-to-port latency and sustains line rate even with 64-byte packets, minimizing communication overhead in distributed AI clusters.
Unmatched Packet Processing: Handles up to 77 billion packets per second (PPS), ideal for frequent, small-packet exchanges in model synchronization and inference feedback.
Scale-Up Ethernet (SUE) Support: Enables sub-400 ns XPU-to-XPU latency, narrowing the gap between Ethernet and InfiniBand for tightly coupled AI workloads.
Seamless Compatibility: Maintains pin compatibility with Tomahawk 5, simplifying platform upgrades, reducing deployment costs, and accelerating AI infrastructure evolution.
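The bullets above quote both 51.2 Tbps of capacity and roughly 77 billion packets per second at 64-byte line rate. A quick back-of-the-envelope check shows how those two numbers relate, assuming standard Ethernet preamble and inter-frame gap overhead (the framing constants below are from the Ethernet spec, not from Broadcom's datasheet):

```python
# Derive the headline packet rate from the line rate for minimum-size frames.
LINE_RATE_BPS = 51.2e12   # 51.2 Tbps aggregate switching capacity
PACKET_BYTES = 64         # minimum Ethernet frame size
PREAMBLE_BYTES = 8        # preamble + start-of-frame delimiter
IFG_BYTES = 12            # inter-frame gap

wire_bytes = PACKET_BYTES + PREAMBLE_BYTES + IFG_BYTES  # 84 bytes on the wire
pps = LINE_RATE_BPS / (wire_bytes * 8)

print(f"{pps / 1e9:.1f} billion packets per second")  # ≈ 76.2, matching the ~77 Bpps headline
```

The slight gap between 76.2 and 77 Bpps is consistent with the vendor rounding up, or with the reduced header formats discussed in the next section shaving per-packet overhead.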

Architectural Innovation: Redefining Ethernet for True Zero Packet Loss
Building on its performance breakthroughs, Tomahawk Ultra introduces a new level of architectural refinement that fundamentally redefines the role of Ethernet in AI-scale networking.
Optimized Ethernet Header: Tomahawk Ultra reduces the traditional 46-byte Ethernet header to 10 bytes, minimizing metadata overhead and boosting efficiency for small-packet workloads common in AI training and gradient synchronization.
Lossless Flow Control: Incorporates Link Layer Retransmission (LLR) and Credit-Based Flow Control (CBFC) to achieve true zero-packet-loss performance, even under high throughput — eliminating congestion-related data loss common in traditional Ethernet.
Open Scale-Up Ethernet Standard: Introduces Broadcom SUE — an open alternative to NVLink and UALink — supporting up to 1,024 accelerators. Its variant, SUE-Lite, offers flexible, cost-efficient options for diverse AI architectures.
Expanded Buffer and Traffic Management: Features a larger, adaptive buffer to handle microbursts and maintain stable data delivery in distributed AI training, preventing network stalls and improving reliability.
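The credit-based flow control (CBFC) mentioned above achieves losslessness by construction: a sender only transmits when the receiver has guaranteed buffer space. The sketch below illustrates the idea in miniature; the class, buffer size, and packet format are illustrative, not Broadcom's implementation:

```python
# Minimal sketch of credit-based flow control: the sender holds one credit
# per free receive-buffer slot, so a packet is never sent into a full buffer.
from collections import deque

class CreditLink:
    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots   # one credit per free receive-buffer slot
        self.rx_buffer = deque()

    def send(self, packet) -> bool:
        if self.credits == 0:         # no guaranteed space: apply back-pressure
            return False              # the packet waits; it is never dropped
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def drain(self):
        if self.rx_buffer:
            pkt = self.rx_buffer.popleft()
            self.credits += 1         # credit returned to the sender
            return pkt
        return None

link = CreditLink(buffer_slots=4)
sent = sum(link.send(i) for i in range(10))  # only 4 fit before back-pressure
print(sent, link.credits)                    # 4 sends succeed, 0 credits remain
```

Contrast this with plain Ethernet, where the excess packets would be buffered best-effort and eventually dropped under sustained overload; here the overload is converted into back-pressure instead.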
Application Scenario: Scale-Up Supernode Rack Architecture
Scale-up interconnects typically consist of multiple parallel interconnect planes, each built from dedicated HBD (high-bandwidth domain) switches, with dedicated links running between every GPU and every HBD switch. In a single NVIDIA server node such as the A100, shown in the figure below, the GPUs and NVSwitches are connected directly through the server motherboard.

The following figure shows the scale-up networking architecture of the NVL72 supernode (a single rack), in which each B200 GPU connects to all 18 NVSwitch chips.

The Tomahawk Ultra is purpose-built for scale-up supernode rack architectures, where it acts as the high-speed communication backbone for tightly coupled AI training clusters. Within these rack-level environments, it serves as a communication controller connecting dozens to hundreds of adjacent accelerators — such as GPUs, TPUs, or custom AI processors — and managing massive volumes of intra-rack data traffic with exceptional precision and efficiency.
Leveraging the scale-up Ethernet framework, Tomahawk Ultra enables direct connectivity for up to 1,024 accelerators within a single rack, a scale far beyond the 72-GPU limitation of NVIDIA NVLink Switch. This capability empowers data center architects to build ultra-dense AI compute racks, maximizing performance per rack unit while simplifying the physical and logical interconnect design.
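Since a single 51.2 Tbps ASIC exposes a fixed port budget, reaching 1,024 accelerators implies multiple switches per interconnect plane. The rough sizing below is a sketch under assumptions not stated in the article (one 800G link from each accelerator to each plane), not a published Broadcom topology:

```python
# Back-of-the-envelope port budget for one scale-up interconnect plane,
# assuming each accelerator attaches to the plane with a single 800G link.
SWITCH_CAPACITY_BPS = 51.2e12
PORT_SPEED_BPS = 800e9

ports_per_switch = int(SWITCH_CAPACITY_BPS / PORT_SPEED_BPS)  # 64 x 800G ports
accelerators = 1024
switches_per_plane = -(-accelerators // ports_per_switch)     # ceil(1024 / 64) = 16

print(ports_per_switch, switches_per_plane)
```

Running accelerator links at lower speeds (e.g., 4x 200G breakout) would change the arithmetic; the point is only that the 1,024-endpoint figure is a fabric-level limit, not a single-chip port count.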
For the differences between scale-up and scale-out, see our earlier blog post: Scale-Up vs. Scale-Out in AI Infrastructure
Tomahawk Ultra vs. NVLink Switch: Quick Overview
Positioned as a direct competitor to NVIDIA NVLink switch, Broadcom Tomahawk Ultra takes a different approach to AI interconnect design — emphasizing openness, scalability, and Ethernet-native performance.
Scalability Advantage: Supports up to 1,024 accelerators per rack, far exceeding the NVLink Switch's 72-GPU limit and enabling larger, more flexible AI cluster deployments.
Lower Latency: With 250 ns switching delay (64B packets), Tomahawk Ultra achieves faster inter-node communication, reducing synchronization time and accelerating model convergence.
Open Compatibility: Based on open Ethernet standards, it supports a wide range of accelerators — including Google TPUs and third-party AI chips — unlike a proprietary NVIDIA-only ecosystem.
Higher Bandwidth: Delivers 51.2 Tbps switching capacity, nearly double that of NVIDIA's fifth-generation NVLink Switch (28.8 Tbps), offering higher throughput and more efficient network scaling.
The following table compares the core parameters of the Tomahawk Ultra switch and the NVLink switch:
Comparison Item | Broadcom Tomahawk Ultra | NVIDIA NVLink Switch |
Manufacturing Process | TSMC 5nm | TSMC 7nm |
Switching Capacity | 51.2 Tbps | 28.8 Tbps |
Switching Latency | 250 ns (64B packet) | 9–18 μs (depending on connection type) |
Maximum Connectivity | Up to 1,024 accelerators | Up to 72 GPUs (single chassis) |
Protocol Standard | Scale-Up Ethernet (SUE) | Proprietary NVLink protocol |
Data Processing Capability | 77 Bpps (77 billion packets per second) | Not disclosed |
Market Impact: Breaking Monopolies and Driving an Open Ecosystem
The launch of Tomahawk Ultra has significantly reshaped the AI chip interconnect landscape, offering AI infrastructure providers more choices and challenging NVIDIA’s dominance in the AI networking space.
Broadcom Senior VP Ram Velaga emphasized that Tomahawk Ultra is designed to compete with NVIDIA NVLink Switch, asserting that Ethernet can deliver even faster performance. "Using the same technology across all parts of the network offers huge benefits," said Peter Del Vecchio, Broadcom's product manager for the Tomahawk line, "and Ethernet provides advantages in monitoring, telemetry, and debugging tools." This stance highlights Broadcom's commitment to an open ecosystem, contrasting sharply with NVIDIA's proprietary protocols.
In practice, Tomahawk Ultra has already begun shipping to customers, including Google, where it supports the networks built around Google's in-house AI chips. This collaboration demonstrates the switch's practical value in building heterogeneous AI infrastructures, enabling interoperability with accelerators from different vendors and giving AI practitioners greater flexibility.
Looking ahead, Tomahawk Ultra's open-standard design may accelerate the standardization of AI interconnect technology, reducing cost and complexity across AI infrastructure.
FS 51.2 Tb/s AI Switch: Lossless Networking Powered by Broadcom Tomahawk 5
The FS N9600-64OD 800G AI switch, powered by the Broadcom Tomahawk 5 chip, delivers the same 51.2 Tb/s of bandwidth as the Tomahawk Ultra chip while supporting sub-microsecond latency and zero packet loss. Built for spine/leaf architectures, it is suited to core deployments in business-critical networks such as AI clusters, high-performance computing, and distributed storage.

Extreme performance core: Tomahawk 5 delivers up to 51.2 Tb/s throughput with ~1 µs latency (64B), meeting the demands of high-concurrency, high-density data center traffic.
800GbE Ultra-Speed Ports with Low Power Consumption: Cuts both rack space and total power draw in an AI data center with 64 ports of 800/400/200/100GbE in a 2U form factor, supporting 2x 400GbE, 4x 200GbE, or 8x 100GbE breakout for flexible GPU connectivity.
Redundancy and intelligent cooling: 1+1 hot-swappable AC power, 3+1 hot-swappable front/back fans with temperature-based dynamic control, ensuring efficient heat dissipation, enhanced system stability, extended lifespan, and reduced maintenance costs.
Lossless, low-latency networking: Supports RoCEv2 for RDMA-based interconnects, Priority Flow Control (PFC) for zero-loss queues, and Explicit Congestion Notification (ECN) for congestion avoidance.
Automated RoCE Deployment: RoCE EasyDeploy simplifies lossless network deployment by automating PFC and ECN configuration, reducing manual effort and minimizing errors.
Adaptive traffic optimization: Global Load Balancing (GLB) improves service availability, reduces congestion, and minimizes network latency.
Full lifecycle automation: AmpCon-DC Management Platform offers Day 0 to Day 2+ capabilities to manage PicOS® Ethernet AI switches, enabling provisioning, monitoring, troubleshooting, and maintenance for higher resource utilization and lower opex.
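The PFC and ECN mechanisms listed above cooperate at the queue level: ECN marks packets early so senders slow down, while PFC pauses the upstream port before the buffer can overflow. The sketch below illustrates that interaction in the abstract; the thresholds and queue model are illustrative, and this is not PicOS configuration syntax:

```python
# Illustrative model of a lossless egress queue: ECN marking kicks in at a
# low threshold, PFC pause at a higher one, so packets are never dropped.
def enqueue(queue, pkt, ecn_threshold=20, pfc_threshold=30):
    if len(queue) >= pfc_threshold:
        return "PAUSE"            # PFC: upstream must stop sending this priority
    if len(queue) >= ecn_threshold:
        pkt["ecn_ce"] = True      # ECN: congestion experienced, sender should slow
    queue.append(pkt)
    return "ENQUEUED"

q = []
results = [enqueue(q, {"seq": i}) for i in range(35)]
print(results.count("ENQUEUED"), results.count("PAUSE"))  # 30 enqueued, 5 paused
```

In a real deployment the ECN threshold is set well below the PFC threshold for exactly this reason: marking should throttle senders before pausing becomes necessary, since widespread PFC pauses can cause head-of-line blocking.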
Additionally, in an H200 GPU server cluster, the 800G RoCE lossless network solution uses the N9600-64OD as both spine and leaf to deliver a scalable, high-bandwidth 800G backbone, enabling low-latency, lossless communication optimized for intensive AI training workloads.
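For scale, a non-blocking two-tier spine/leaf fabric built from 64-port switches has a well-known capacity limit. The figures below follow from that standard topology arithmetic under an assumed even split of leaf ports (half toward GPUs, half toward spines); they are not FS-published cluster sizes:

```python
# Two-tier non-blocking spine/leaf sizing with 64-port 800G switches.
LEAF_PORTS = 64
down = up = LEAF_PORTS // 2      # 32 GPU-facing ports, 32 spine-facing uplinks
max_leaves = LEAF_PORTS          # each spine connects once to every leaf
max_endpoints = max_leaves * down

print(down, max_endpoints)       # 32 GPU ports per leaf, 2048 x 800G endpoints total
```

Using breakout (e.g., 4x 200GbE per physical port) multiplies the endpoint count further at lower per-GPU bandwidth.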

Conclusion
The launch of Broadcom Tomahawk Ultra marks a major leap forward in Ethernet innovation, setting a new benchmark for AI and HPC interconnects with its ultra-low latency, zero packet loss, and memory fabric capabilities. As the industry shifts toward scalable, Ethernet-based AI infrastructures, FS stands at the forefront of this transformation.
Powered by Broadcom Tomahawk 3/4/5 chips, FS PicOS® Ethernet AI switches deliver unmatched performance for high-density GPU clusters and AI training networks. Combined with end-to-end AI data center solutions, FS empowers enterprises to build scalable, efficient, and future-ready AI infrastructures. Explore our products and solutions today and contact our experts to accelerate your journey toward next-generation AI performance.