Published Date: 17/10/2025
Meta shared details about its AI and networking advances at this week’s 2025 OCP Global Summit in San Jose, Calif. Facebook was a founding member of the Open Compute Project (OCP) in 2011, and its now-parent company Meta has used the annual conference to showcase systems it’s developing to stay on the bleeding edge of technology. At this year’s event, that meant detailing Meta’s AI and networking advances.
“The advent of AI has changed all our assumptions on how to scale our infrastructure. Building infrastructure for AI requires innovation at every layer of the stack, from hardware and software, to our networks, to our data centers themselves,” wrote Yee Jiun Song, vice president, and Kaushik Veeraraghavan, software engineer, Infra Foundation at Meta, in a blog post about Meta’s AI networking efforts.
One of Meta’s themes over the years has been to support open systems development, and it’s continuing that effort. “We have a long way to go in continuing to push open standards. We need standardization of systems, racks and power as rack power density continues to increase. We need standardization of the scale up and scale out network that these AI clusters use so that customers can mix/match different GPUs and accelerators to always use the latest and more cost-effective hardware,” Song and Veeraraghavan wrote.
“We need software innovation and standards to allow us to run jobs across heterogeneous hardware types that may be spread in different geographic locations. These open standards need to exist all the way through the stack, and there are massive opportunities to eliminate friction that is slowing down the build out of AI infrastructure.”
As part of its standardization efforts, Meta said it would be a key player in the new Ethernet for Scale-Up Networking (ESUN) initiative that brings together AMD, Arista, ARM, Broadcom, Cisco, HPE Networking, Marvell, Microsoft, NVIDIA, OpenAI, and Oracle to advance the networking technology to handle the growing scale-up domain for AI systems. ESUN will focus solely on open, standards-based Ethernet switching and framing for scale-up networking—excluding host-side stacks, non-Ethernet protocols, application-layer solutions, and proprietary technologies. The group will focus on the development and interoperability of XPU network interfaces and Ethernet switch ASICs for scale-up networks.
ESUN will actively engage with other organizations such as the Ultra-Ethernet Consortium (UEC) and long-standing IEEE 802.3 Ethernet to align open standards, incorporate best practices, and accelerate innovation.
The launch of ESUN is just one of the AI networking developments Meta shared at the event. Meta engineers also announced three data center networking innovations aimed at making its infrastructure more flexible, scalable, and efficient:
- The evolution of Meta’s Disaggregated Scheduled Fabric (DSF) to support scale-out interconnect for large AI clusters that span entire data center buildings.
- A new Non-Scheduled Fabric (NSF) architecture based entirely on shallow-buffer, disaggregated Ethernet switches that will support Meta’s largest AI clusters like Prometheus.
- The addition of Minipack3N, based on Nvidia’s Ethernet Spectrum-4 ASIC, to Meta’s portfolio of 51Tbps OCP switches that use OCP’s Switch Abstraction Interface and Meta’s Facebook Open Switching System (FBOSS) software stack.
DSF is Meta’s open networking fabric that decouples switch hardware, NICs, endpoints, and other networking components from the underlying network, using OCP-SAI and FBOSS to do so. It supports Ethernet-based RDMA over Converged Ethernet (RoCE) to endpoints, accelerators, and NICs from multiple vendors, including Nvidia, AMD, and Broadcom, as well as Meta’s own MTIA accelerator stack. Between endpoints it uses scheduled-fabric techniques, notably Virtual Output Queuing (VOQ), to schedule traffic and proactively avoid congestion rather than merely react to it.
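To make the scheduled-fabric idea concrete, here is a minimal, hypothetical Python sketch of Virtual Output Queuing: each ingress keeps a separate queue per egress port and transmits only when the scheduler has granted credit for that egress, so traffic destined for a busy port never blocks traffic headed elsewhere. The class, credit logic, and names are illustrative assumptions, not Meta’s DSF implementation.

```python
from collections import deque

class VOQIngress:
    """Toy model of Virtual Output Queuing at one ingress port.

    Instead of a single FIFO (which suffers head-of-line blocking), the
    ingress keeps one queue per egress port and sends a packet only when
    the fabric scheduler has granted credit for that egress.
    Illustrative only -- not Meta's DSF implementation.
    """

    def __init__(self, num_egress_ports: int):
        self.queues = [deque() for _ in range(num_egress_ports)]
        self.credits = [0] * num_egress_ports  # grants issued by the scheduler

    def enqueue(self, egress_port: int, packet: bytes) -> None:
        self.queues[egress_port].append(packet)

    def grant(self, egress_port: int, count: int = 1) -> None:
        """Scheduler proactively grants credit only when the egress can absorb traffic."""
        self.credits[egress_port] += count

    def transmit(self):
        """Send at most one packet per egress that currently holds credit."""
        sent = []
        for port, queue in enumerate(self.queues):
            if queue and self.credits[port] > 0:
                self.credits[port] -= 1
                sent.append((port, queue.popleft()))
        return sent


# Usage: packets for a congested egress wait in their own queue,
# so they never block traffic heading to an uncongested egress.
ingress = VOQIngress(num_egress_ports=4)
ingress.enqueue(0, b"to-port-0")
ingress.enqueue(2, b"to-port-2")
ingress.grant(2)              # only egress 2 has capacity right now
print(ingress.transmit())     # [(2, b'to-port-2')] -- port 0 traffic waits for credit
```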
“Over the last year, we have evolved DSF to a 2-stage architecture, scaling to support a non-blocking fabric that interconnects up to 18,432 XPUs,” wrote a group of Meta engineers in a co-authored blog post about the new advances. “These clusters are a fundamental building block for constructing AI clusters that span regions (and even multiple regions) in order to meet the increased capacity and performance demands of Meta’s AI workloads.”
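As a back-of-envelope illustration of why adding a second fabric stage multiplies reach, the hypothetical sketch below computes the endpoint capacity of a generic non-blocking two-stage (leaf/spine) fabric. The radix values are assumptions chosen for illustration and do not attempt to reproduce Meta’s 18,432-XPU figure or the details of DSF’s scheduled fabric elements.

```python
def two_stage_nonblocking_capacity(leaf_radix: int, spine_radix: int) -> int:
    """Endpoint count for a generic non-blocking two-stage (leaf/spine) fabric.

    For a non-blocking design, each leaf splits its ports evenly: half face
    endpoints, half face spine switches. Each spine port reaches one leaf,
    so the spine radix caps how many leaves the fabric can hold.
    Illustrative only -- real DSF stage sizing is not reconstructed here.
    """
    endpoints_per_leaf = leaf_radix // 2
    max_leaves = spine_radix          # one spine port per leaf
    return endpoints_per_leaf * max_leaves


# Example with hypothetical 128-port leaves and 256-port spines:
# a single 128-port switch tops out at 128 endpoints, while the two-stage
# build reaches 64 * 256 = 16,384 -- the same kind of jump that lets a
# 2-stage fabric interconnect many thousands of XPUs.
print(two_stage_nonblocking_capacity(leaf_radix=128, spine_radix=256))
```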
To its DSF architecture, Meta has added a new design called the Non-Scheduled Fabric (NSF), built on shallow-buffer OCP Ethernet switches to deliver low round-trip latency, the engineers wrote. NSF is a three-tier fabric that supports adaptive routing for effective load balancing, which helps minimize congestion and keep GPUs fully utilized, critical for maximizing performance in Meta’s largest AI factories. “NSF supports adaptive routing for effective load-balancing, ensuring optimal utilization and minimizing congestion and serves as a foundational building block for Gigawatt-scale AI clusters such as Meta’s Gigawatt-scale AI cluster, Prometheus,” the engineers added.
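As a rough illustration of adaptive routing versus static hashing, the hypothetical sketch below steers each new packet to the least-loaded of the available next hops; the load metric, tie-breaking, and names are assumptions for illustration, not details of Meta’s NSF.

```python
import random

def ecmp_static(flow_hash: int, next_hops: list[str]) -> str:
    """Static ECMP: the hash alone picks the path, regardless of congestion."""
    return next_hops[flow_hash % len(next_hops)]

def adaptive_route(link_load: dict[str, float]) -> str:
    """Adaptive routing: steer traffic to the currently least-loaded next hop.

    `link_load` maps next-hop names to a congestion metric (e.g. queue depth);
    both the metric and the random tie-break are illustrative assumptions.
    """
    least = min(link_load.values())
    candidates = [hop for hop, load in link_load.items() if load == least]
    return random.choice(candidates)  # break ties randomly to spread load


# A hot-spotted fabric: static hashing may keep hitting the congested link,
# while the adaptive choice shifts new traffic onto spine2/spine3.
loads = {"spine1": 0.9, "spine2": 0.2, "spine3": 0.2}
print(ecmp_static(flow_hash=3, next_hops=list(loads)))  # picks spine1, ignoring load
print(adaptive_route(loads))                            # avoids the congested spine1
```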
Going forward, Meta will use both DSF and NSF, depending on its needs: DSF will provide a high-efficiency, highly scalable network for large but still modular AI clusters, while NSF will target the extreme demands of its largest, gigawatt-scale AI factories such as Prometheus, where low latency and robust adaptive routing are paramount.
Meta targeted the optical networking world as well. Last year, it introduced 2x400G FR4 BASE (3-km) optics, the primary solution supporting next-generation 51T platforms across both backend and frontend networks and DSFs. These optics have now been widely deployed throughout Meta’s data centers. This year, Meta is expanding its portfolio with the launch of 2x400G FR4 LITE (500-m) optics. FR4 LITE is optimized for the majority of intra-data center use cases, supporting fiber links up to 500 meters. This new variant is designed to accelerate optics cost reduction while maintaining robust performance for shorter-reach applications.
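For a simple picture of how the two FR4 variants divide the work, the hypothetical sketch below selects an optic by fiber reach using the figures given above (FR4 LITE up to 500 m, FR4 BASE up to 3 km); the function name and any policy beyond reach are illustrative assumptions, not Meta’s deployment logic.

```python
def pick_fr4_optic(link_length_m: float) -> str:
    """Choose between the two 2x400G FR4 variants by fiber reach.

    Reach thresholds come from the article; everything else is illustrative.
    """
    if link_length_m <= 500:
        return "2x400G FR4 LITE"   # shorter-reach, cost-optimized intra-data-center links
    if link_length_m <= 3000:
        return "2x400G FR4 BASE"   # longer spans, up to 3 km
    raise ValueError("link exceeds the reach of the FR4 optics discussed here")


print(pick_fr4_optic(120))    # 2x400G FR4 LITE
print(pick_fr4_optic(1800))   # 2x400G FR4 BASE
```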
In addition, Meta added the 400G DR4 OSFP-RHS optics — its first-generation DR4 package for AI host-side NIC connectivity. Complementing this, the new 2x400G DR4 OSFP optics are being deployed on the switch side, providing connectivity from host to switch.
Q: What is the Open Compute Project (OCP)?
A: The Open Compute Project (OCP) is an open-source initiative aimed at designing and sharing the most efficient data center hardware and software solutions. It was founded by Facebook in 2011 and has since grown to include many major technology companies.
Q: What is the ESUN initiative?
A: The Ethernet for Scale-Up Networking (ESUN) initiative is a collaboration among major tech companies, including Meta, to advance open, standards-based Ethernet switching and framing for scale-up networking to support the growing demands of AI systems.
Q: What is Meta’s Disaggregated Scheduled Fabric (DSF)?
A: Meta’s Disaggregated Scheduled Fabric (DSF) is an open networking fabric that separates switch hardware, NICs, endpoints, and other networking components from the underlying network. It uses OCP-SAI and FBOSS to achieve this and supports Ethernet-based RoCE RDMA to endpoints, accelerators, and NICs from multiple vendors.
Q: What is the Non-Scheduled Fabric (NSF) architecture?
A: The Non-Scheduled Fabric (NSF) is a new architecture based on shallow-buffer OCP Ethernet switches that delivers low round-trip latency. It supports adaptive routing for effective load-balancing and is designed for Meta’s largest AI clusters, such as Prometheus.
Q: What are the new optical networking solutions introduced by Meta?
A: Meta introduced 2x400G FR4 LITE (500-m) optics, optimized for intra-data center use cases, and 400G DR4 OSFP-RHS optics for AI host-side NIC connectivity. These solutions are designed to reduce costs and maintain performance for shorter-reach applications.