How Upscale AI is building AI-native networking
By Deepti Chandra, VP Product and Marketing, and Julissa Benavente, Product Strategy & Alliances
Most AI clusters still rely on networks built for general-purpose computing. These networks were designed for north-south traffic, bursty flows, and asynchronous applications. This is not a minor inefficiency; the limitation is structural.
AI behaves very differently.
Workloads are synchronized, not asynchronous. Modern workloads such as large-scale model training, mixture-of-experts architectures, and distributed inference place extreme synchronization pressure on the network. Training moves gradients across thousands of GPUs in tightly synchronized waves. Inference creates massive fan-out with strict latency requirements. When the network cannot keep up, GPUs stall, tail latency grows, and cluster efficiency collapses.
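The synchronization pressure described above can be sketched with a toy model (illustrative only, not Upscale AI code or any specific vendor's algorithm): in a ring all-reduce over N GPUs, each GPU exchanges 2·(N−1) chunks with its neighbors, and every step waits on a neighbor, so a single slow link delays the entire collective. All link timings below are hypothetical.

```python
# Simplified model: why synchronized collectives are gated by the
# slowest participant. In steady state, each ring step completes only
# when the slowest link finishes -- the barrier AI training imposes.

def ring_allreduce_time(n_gpus, chunk_ms):
    """Approximate total time for a ring all-reduce, where chunk_ms[i]
    is the per-chunk transfer time (ms) on the link out of GPU i.
    Simplification: every step is gated by the slowest link."""
    steps = 2 * (n_gpus - 1)  # reduce-scatter + all-gather phases
    return steps * max(chunk_ms)

uniform = ring_allreduce_time(8, [1.0] * 8)            # healthy fabric
straggler = ring_allreduce_time(8, [1.0] * 7 + [3.0])  # one congested link

print(uniform)    # 14.0 ms
print(straggler)  # 42.0 ms -- one 3x-slower link triples the collective
```

Even in this crude model, one congested link stalls all eight GPUs for the full duration of the collective, which is why tail latency, not average bandwidth, governs cluster efficiency.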
This is not a tuning problem.
It is an architectural mismatch.
Upscale AI was founded around a simple premise: AI infrastructure requires a network built specifically for AI.
Why General-Purpose Networks Break at AI Scale
For decades, networking platforms were designed as one-size-fits-all systems serving enterprises, service providers, and general data centers. Over time, these platforms accumulated layers of legacy across silicon, systems, and software.
When applied to AI environments, the result is predictable: a square-peg-in-a-round-hole architecture.
The complexity that once supported many workloads now becomes friction. Deterministic communication and synchronized GPU collectives push traditional networking beyond its design limits.
AI clusters require something different: a network engineered for deterministic, synchronized, high-throughput communication at scale. You cannot simply tune your way out of legacy limitations. AI networking must be built from the ground up for the specific demands of scale-up and scale-out connectivity.
Building an AI-Native Network
AI infrastructure operates at two distinct layers:
- Rack-scale GPU connectivity (scale-up)
- Cluster-scale fabric connectivity (scale-out)
Both must work together to keep thousands of GPUs operating as a single distributed compute engine.
Upscale AI addresses both halves of this networking equation with purpose-built architecture:
The Two Pillars of the Upscale AI Architecture
SkyHammer™: Rack-Scale AI Interconnect (Scale-Up)
SkyHammer™ is Upscale AI’s silicon architecture for ultra-low-latency GPU/XPU connectivity within the rack, built on open standards.
It enables GPUs and XPUs to operate as a tightly synchronized compute engine by delivering deterministic communication and eliminating latency and synchronization bottlenecks that cause GPUs to idle during collective operations.
The result is higher cluster efficiency and predictable performance for large-scale training workloads.
Open Ethernet: Cluster-Scale AI Fabric (Scale-Out)
At cluster scale, AI systems require openness, interoperability, and massive bandwidth.
Upscale AI delivers AI-optimized Open Ethernet fabrics powered by NVIDIA Spectrum-X switch silicon. These systems connect thousands of GPUs into a unified high-performance fabric capable of supporting distributed training and large-scale inference.
A Full-Stack AI Networking Platform
Purpose-built AI networking requires more than fast switches.
It demands tight integration across silicon, systems, and software.
Through its collaboration with NVIDIA, Upscale AI integrates Spectrum-X switching with an AI-optimized SONiC software stack designed for large-scale AI deployments.
Operating large AI clusters requires continuous visibility into congestion, synchronization behavior, and GPU utilization across the fabric.
The combined stack provides:
- High-performance RDMA networking
- Adaptive congestion management
- GPU-aware telemetry and observability
- Real-time operational visibility across the fabric
Together, these capabilities enable the deterministic networking required to operate modern AI clusters.
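As a loose illustration of the fabric-wide visibility described above (hypothetical port names, fields, and thresholds; not the Upscale AI or SONiC API), a telemetry pass might aggregate per-port queue occupancy and flag links likely to stall synchronized collectives:

```python
# Hypothetical sketch of fabric observability: flag switch ports whose
# egress queue occupancy suggests congestion that would stall GPU
# collectives. All values and the 80% threshold are illustrative.

from dataclasses import dataclass

@dataclass
class PortSample:
    port: str          # e.g. "leaf1/eth12" (illustrative name)
    queue_depth: int   # current egress queue occupancy (cells)
    queue_limit: int   # configured queue capacity (cells)

def congested_ports(samples, threshold=0.8):
    """Return ports whose egress queue exceeds the occupancy threshold.
    On a real fabric these samples would stream from switch telemetry;
    here they are static example values."""
    return [s.port for s in samples
            if s.queue_depth / s.queue_limit > threshold]

samples = [
    PortSample("leaf1/eth12", 900, 1000),  # 90% full -> flagged
    PortSample("leaf1/eth13", 200, 1000),  # healthy
    PortSample("spine2/eth4", 850, 1000),  # 85% full -> flagged
]

print(congested_ports(samples))  # ['leaf1/eth12', 'spine2/eth4']
```

In production, signals like these would feed adaptive congestion management rather than a simple report, but the principle is the same: per-port telemetry correlated across the fabric, fast enough to act before GPUs idle.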
Toward the AI Factory
AI infrastructure is evolving from experimental clusters to production-scale systems. General-purpose networks were not built for this environment.
As this transition accelerates, the network is becoming the backbone of the AI factory.
It must be designed for AI from the start.
Upscale AI was built from day one to deliver that foundation.
Join us at GTC (booth #7037) to see it in action.