US20200322287A1 - Switch-managed resource allocation and software execution
- Publication number
- US20200322287A1 (Application US16/905,761)
- Authority
- United States
- Prior art keywords
- switch
- rack
- memory
- server
- vee
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- H04L67/34: Network arrangements or protocols for supporting network services or applications involving the movement of software or configuration parameters
- H04L12/4645: Virtual LANs [VLANs]; details on frame tagging
- H04L47/125: Avoiding congestion; recovering from congestion by balancing the load, e.g. traffic engineering
- H04L47/32: Flow control; congestion control by discarding or delaying data units, e.g. packets or frames
- H04L67/00: Network arrangements or protocols for supporting network services or applications
- H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data, e.g. network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
- H04L69/22: Parsing or analysis of headers
- G06F9/45558: Hypervisor-specific management and integration aspects
- G06F2009/4557: Distribution of virtual machine instances; migration and load balancing
- G06F2009/45583: Memory management, e.g. access or allocation
- G06F2009/45595: Network integration; enabling network access in virtual machine instances
- H04L49/356: Switches specially adapted for storage area networks
- H04L63/0227: Filtering policies (network security, separating internal from external traffic)
- H04L63/0272: Virtual private networks
- H04L63/1408: Detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/166: Implementing security features at the transport layer
- H04L67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L69/161: Implementation details of TCP/IP or UDP/IP stack architecture; specification of modified or new header fields
Definitions
- Cloud service providers (CSPs) offer various services, such as infrastructure as a service (IaaS), software as a service (SaaS), or platform as a service (PaaS), to other companies or individuals.
- a hardware infrastructure including compute, memory, storage, accelerators, networking, and so forth executes and supports software stacks provided by the CSPs and their customers.
- CSPs can experience complex networking environments where packets are parsed, de-encapsulated, decrypted, and sent to the proper virtual machine (VM).
- Network processing typically occurs in the servers within a datacenter. Central processing units (CPUs) and other server processor resources are used for packet processing, but CPUs and other processor resources could instead be used for other services that are billable or generate higher revenue than packet processing. The impact of this problem increases significantly when using high bit-rate network devices such as 100 Gbps and higher speed networks.
- FIGS. 1A-1D depict example switch systems.
- FIG. 2A depicts an example overview of a system of managing resources in a rack.
- FIG. 2B depicts an example overview of various management hierarchies.
- FIG. 3 depicts an example system in which a switch can respond to a memory access request.
- FIG. 4A shows examples of a Memcached server executing on a server and in a switch.
- FIG. 4B shows the Ethernet packet flow for a single request.
- FIGS. 5A-5C depict example systems in which packets can terminate at a switch.
- FIG. 6 depicts an example of a switch that executes an orchestration control plane to manage what device executes a virtualized execution environment.
- FIG. 7A depicts an example of migration of a virtualized execution environment from a server to another server.
- FIG. 7B depicts an example of migration of a virtualized execution environment.
- FIGS. 8A-8C depict example processes.
- FIG. 9 depicts a system.
- FIG. 10 depicts an environment.
- FIG. 11 depicts an example network element.
- north-south traffic can include packets that flow in or out of the data center whereas east-west traffic can include packets that flow between nodes (e.g., racks of servers) within the data center.
- North-south traffic can be considered a product for serving customers, whereas east-west traffic can be considered overhead.
- The amount of east-west traffic has been growing at a rate that is significantly higher than north-south traffic, and processing east-west traffic in a timely manner to comply with applicable service level agreements (SLAs) while reducing data center total cost of ownership (TCO) is a growing challenge within the datacenter.
- Increasing networking speeds within a data center (e.g., 100 Gbps Ethernet and above) is one manner of addressing traffic growth.
- an increase in network speed can involve even more packet processing activities, which use processor resources that could otherwise be used for other tasks.
- Some solutions reduce CPU utilization and accelerate packet processing by offloading these tasks to network controllers with specialized hardware.
- specialized hardware may be limited to current day workloads and not be flexible to handle future, different workloads or packet processing activities.
- Various embodiments attempt to reduce server processor utilization and to reduce or control growth of east-west traffic within a data center while providing sufficiently fast packet processing.
- Various embodiments provide a switch with infrastructure offload capabilities, including one or more CPUs or other accelerator devices.
- Various embodiments provide a switch with certain packet processing network interface card (NIC) functionality to allow the switch to perform packet processing or network termination, thereby freeing server CPUs to perform other tasks.
- the switch can include or access server class processors, switching blocks, accelerators, offload engines, ternary content-addressable memory (TCAM) and packet processing pipelines.
- the packet processing pipeline(s) could be programmable via P4 or other programming languages.
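- As an illustrative sketch only (not the patented implementation), the following Python example models the match-action behavior that a P4-programmable pipeline could expose; the table layout, keys, and actions are assumptions chosen for illustration.

```python
# Illustrative sketch: a minimal match-action table in the style of a
# programmable packet processing pipeline.
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class Packet:
    dst_ip: str
    vlan: int
    out_port: int = -1      # filled in by a forwarding action
    dropped: bool = False   # set by a drop action

MatchKey = Tuple[str, int]
Action = Callable[[Packet], None]

class MatchActionTable:
    def __init__(self, default_action: Action):
        self.entries: Dict[MatchKey, Action] = {}
        self.default_action = default_action

    def add_entry(self, key: MatchKey, action: Action) -> None:
        self.entries[key] = action

    def apply(self, pkt: Packet) -> None:
        # Match on parsed header fields; fall back to the default action.
        action = self.entries.get((pkt.dst_ip, pkt.vlan), self.default_action)
        action(pkt)

def forward(port: int) -> Action:
    def act(pkt: Packet) -> None:
        pkt.out_port = port
    return act

def drop(pkt: Packet) -> None:
    pkt.dropped = True

# Usage: route one destination, drop everything else.
table = MatchActionTable(default_action=drop)
table.add_entry(("10.0.0.2", 100), forward(port=3))
p = Packet(dst_ip="10.0.0.2", vlan=100)
table.apply(p)
assert p.out_port == 3
```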
- the switch can be connected to one or more CPUs or host servers using various connections.
- Direct attach copper (DAC), fiber optic cable, or other cables can be used to connect the switch with one or more CPUs, compute hosts, or servers, including servers in a rack.
- connections can be less than 6 feet in length to reduce bit error rate (BER).
- reference to a switch can refer to multiple connected switches or a distributed switch and a rack may include multiple switches to logically split a rack into two half racks or into pods (e.g., one or more racks).
- Various embodiments of the rack switch can be configured to perform one or more of: (1) telemetry aggregation via high speed connections of packet transmit rates, response latency, cache misses, virtualized execution environment requests, and so forth; (2) orchestration of server resources connected to the switch based at least on telemetry; (3) orchestration of virtual execution environments executing on various servers based at least on telemetry; (4) network termination and protocol processing; (5) memory transaction completion by retrieving data associated with a memory transaction and providing the data to the requester or forwarding the memory transaction to a target that can retrieve the data associated with the memory transaction; (6) caching of data for access by one or more servers in the rack or group of racks; (7) Memcached resource management at the switch; (8) execution of one or more virtualized execution environments to perform packet processing (e.g., header processing in accordance with applicable protocols); (9) management of execution of virtualized execution environments in the switch or in a server or both for load balancing or redundancy; or (10) migration of virtualized execution environments between the switch and one or more servers.
- Various embodiments can terminate network processing in the switch, in place of a server.
- Protocol termination, decryption, decapsulation, acknowledgements (ACKs), integrity checks, and other network-related tasks can be performed by the switch and not handled by the server.
- The switch can include specialized offload engines for known protocols or calculations and be extensible or programmable to process new protocols or vendor specific protocols via software or field programmable gate array (FPGA) to flexibly support future needs.
- Network termination at the switch can reduce or eliminate transfers of data for processing by multiple VEEs that are potentially on different servers or even different racks for service function chain processing.
- the switch can perform network processing and provide the resulting data, after processing, to the destination server within the rack.
- The switch can manage memory input/output (I/O) requests by directing them to the target device directly, instead of sending them to a server that would determine the target device and transmit the I/O request to another server or target device.
- Servers can include a memory pool, storage pool or server, compute server, or provide other resources.
- Various embodiments can be used in a scenario where a server 1 issues an I/O request to access memory where a near memory is accessed from a server 2 and a far memory is accessed from a server 3 (e.g., 2 level memory (2LM), memory pooling, or thin memory provisioning).
- the switch can receive a request from server 1 that requests a read or write to memory directed to system 2 .
- the switch can be configured to identify that a memory address referenced by the request is in a memory associated with a server 3 and the switch can forward the request to server 3 instead of sending the request to server 2 , which would transmit the request to server 3 .
- the switch can reduce a time taken to complete a memory transaction.
- the switch can perform caching of data on the same rack to reduce east-west traffic for subsequent requests for the data.
- the switch can notify server 2 that an access to memory of server 3 has taken place so that server 2 and server 3 can maintain coherency or consistency of the data associated with the memory address. If server 2 has posted writes or dirties (modifies) cache lines, coherency protocols and/or producer consumer models can be used to maintain consistency of data stored in server 2 and server 3 .
- the switch can execute orchestration, hypervisor functionality, as well as manage service chain functionality.
- the switch can orchestrate processor and memory resources and virtual execution environment (VEE) execution for an entire rack of servers to provide aggregated resources of a rack as a single, composite server.
- VEE virtual execution environment
- the switch can allocate use of compute sleds, memory sleds, and accelerator sleds for execution by one or more VEEs.
- the switch is positioned top-of-rack (TOR) or middle of rack (MOR) relative to connected servers to reduce a length of connection between the switch and servers.
- servers connect to the switch so that copper cabling from the servers to the rack switch stay within the rack.
- the switch can link the rack to the data center network with fiber optic cable running from the rack to an aggregation region.
- In a MOR switch position, the switch is positioned towards the center of the rack, between the bottom of the rack and the top of the rack.
- Other rack positions for the switch can be used, such as end of rack (EOR).
- FIG. 1A depicts an example switch system.
- Switch 100 can include or access switch circuitry 102 that is communicatively coupled to port circuitry 104 - 0 to 104 -N.
- Port circuitry 104 - 0 to 104 -N can receive packets and provide packets to switch circuitry 102 .
- In some examples, port circuitry 104-0 to 104-N is Ethernet compatible and can include a physical layer interface (PHY) (e.g., physical medium attachment (PMA) sublayer, Physical Medium Dependent (PMD), a forward error correction (FEC), and a physical coding sublayer (PCS)), media access control (MAC) encoding or decoding, and a Reconciliation Sublayer (RS).
- An optical-to-electrical signal interface can provide electrical signals to the network port.
- Modules can be built using a standard mechanical and electrical form factors such as the Small Form-factor Pluggable (SFP), Quad Small Form-factor Pluggable (QSFP), Quad Small Form-factor Pluggable Double Density (QSFP-DD), Micro QSFP, or OSFP (Octal Small Format Pluggable) interfaces, described in Annex 136C of IEEE Std 802.3cd-2018 and references therein, or other form factors.
- a packet may be used herein to refer to various formatted collections of bits that may be sent across a network, such as Ethernet frames, IP packets, TCP segments, UDP datagrams, etc.
- L2, L3, L4, and L7 layers are references respectively to the second data link layer, the third network layer, the fourth transport layer, and the seventh application layer of the OSI (Open System Interconnection) layer model.
- A flow can be a sequence of packets being transferred between two endpoints, generally representing a single session using a known protocol. Accordingly, a flow can be identified by a set of defined N tuples and, for routing purposes, a flow can be identified by tuples that identify the endpoints, e.g., the source and destination addresses. For content based services (e.g., load balancer, firewall, intrusion detection system, etc.), flows can be identified at a finer granularity by using five or more tuples (e.g., source address, destination address, IP protocol, transport layer source port, and destination port). A packet in a flow is expected to have the same set of tuples in the packet header. A flow can be unicast, multicast, anycast, or broadcast.
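- The following Python sketch illustrates the 5-tuple flow identification described above; the header field names and the hash are illustrative assumptions, not fields defined by the patent.

```python
# Illustrative sketch: derive a flow key from a packet's 5-tuple and hash it
# so packets of the same flow consistently map to the same bucket or queue.
import hashlib
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    ip_proto: int      # e.g., 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int

def flow_hash(t: FiveTuple, num_buckets: int) -> int:
    """Map a flow to one of num_buckets buckets deterministically."""
    key = f"{t.src_ip}|{t.dst_ip}|{t.ip_proto}|{t.src_port}|{t.dst_port}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets

# Packets with identical 5-tuples land in the same bucket.
a = FiveTuple("10.0.0.1", "10.0.0.2", 6, 40000, 443)
b = FiveTuple("10.0.0.1", "10.0.0.2", 6, 40000, 443)
assert flow_hash(a, 128) == flow_hash(b, 128)
```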
- Switch circuitry 102 can provide connectivity to, from, and among multiple servers and performs one or more of: traffic aggregation, and match action tables for routing, tunnels, buffering, VxLAN routing, Network Virtualization using Generic Routing Encapsulation (NVGRE), Generic Network Virtualization Encapsulation (Geneve) (e.g., currently a draft Internet Engineering Task Force (IETF) standard), and access control lists (ACLs) to permit or inhibit progress of a packet.
- Processors 108 - 0 to 108 -M can be coupled to switch circuitry 102 via respective interfaces 106 - 0 to 106 -M.
- Interfaces 106 - 0 to 106 -M can provide a low latency, high bandwidth memory-based interface such as Peripheral Component Interconnect express (PCIe), Compute Express Link (CXL), memory interface (e.g., any type of Double Data Rate (DDRx), CXL.io, CXL.cache, or CXL.mem), and/or a network connection (e.g., Ethernet or InfiniBand).
- processor modules 108 - 0 to 108 -M can represent servers with CPUs, random access memory (RAM), persistent or non-volatile storage, accelerators and the processor modules could be one or more servers in the rack.
- processor modules 108 - 0 to 108 -M can represent multiple distinct physical servers that are communicatively coupled to switch 100 using connections.
- a physical server can be distinct from another physical server by providing different physical CPU devices, random access memory (RAM) devices, persistent or non-volatile storage devices, or accelerator devices. Distinct physical servers can, however, include the devices with the same performance specifications.
- a server as used herein, can refer to a physical server or a composite server that aggregates resources from one or more distinct physical servers.
- Processor modules 108 - 0 to 108 -M and processor 112 - 0 or 112 - 1 can include one or more cores and system agent circuitry.
- a core can be an execution core or computational engine that can execute instructions.
- a core can access to its own cache and read only memory (ROM), or multiple cores can share a cache or ROM.
- Cores can be homogeneous (e.g., same processing capabilities) and/or heterogeneous devices (e.g., different processing capabilities). Frequency or power use of a core can be adjustable. Any type of inter-processor communication techniques can be used, such as but not limited to messaging, inter-processor interrupts (IPI), inter-processor communications, and so forth.
- Cores can be connected in any type of manner, such as but not limited to, bus, ring, or mesh. Cores may be coupled via an interconnect to a system agent (uncore).
- System agent can include a shared cache which may include any type of cache (e.g., level 1, level 2, or last level cache (LLC)).
- System agent can include one or more of: a memory controller, a shared cache, a cache coherency manager, arithmetic logic units, floating point units, core or processor interconnects, or bus or link controllers.
- System agent or uncore can provide one or more of: direct memory access (DMA) engine connection, non-cached coherent master connection, data cache coherency between cores with arbitration of cache requests, or Advanced Microcontroller Bus Architecture (AMBA) capabilities.
- System agent or uncore can manage priorities and clock speeds for receive and transmit fabrics and memory controllers.
- Cores can be communicatively connected using a high-speed interconnect compatible with any of but not limited to Intel QuickPath Interconnect (QPI), Intel Ultra Path Interconnect (UPI), Intel On-Chip System Fabric (IOSF), Omnipath, Compute Express Link (CXL).
- The number of core tiles is not limited to this example and can be any number, such as 4, 8, and so forth.
- An orchestration control plane, a Memcached server, or one or more virtualized execution environments (VEEs) can execute on one or more of processor modules 108-0 to 108-M or on processor 112-0 or 112-1.
- a VEE can include at least a virtual machine or a container.
- a virtual machine can be software that runs an operating system and one or more applications.
- a VM can be defined by specification, configuration files, virtual disk file, non-volatile random-access memory (NVRAM) setting file, and the log file and is backed by the physical resources of a host computing platform.
- a VM can be an OS or application environment that is installed on software, which imitates dedicated hardware. The end user has the same experience on a virtual machine as they would have on dedicated hardware.
- Specialized software, called a hypervisor, emulates the PC client or server's CPU, memory, hard disk, network and other hardware resources completely, enabling virtual machines to share the resources.
- the hypervisor can emulate multiple virtual hardware platforms that are isolated from each other, allowing virtual machines to run Linux® and Windows® Server operating systems on the same underlying physical host.
- A container can be a software package of applications, configurations, and dependencies so the applications run reliably from one computing environment to another. Containers can share an operating system installed on the server platform and run as isolated processes. A container can be a software package that contains everything the software needs to run such as system tools, libraries, and settings.
- Driver software can be provided for various operating systems (e.g., VMWare®, Linux®, Windows® Server, FreeBSD, Android®, MacOS®, iOS®, or any other operating system) for applications or VEEs to access switch 100.
- the driver can present the switch as a peripheral device.
- the driver can present the switch as a network interface controller or network interface card.
- a driver can provide a VEE with ability to configure and access the switch as a PCIe endpoint.
- a virtual function driver such as Adaptive Virtual Function (AVF) can be used to access the switch.
- An example of AVF is described at least in “Intel® Ethernet Adaptive Virtual Function Specification” Revision 1.0 (2016).
- a VEE can interact with a driver to turn on or off any feature of the switch described herein.
- Device drivers (e.g., NDIS for Windows or NetDev for Linux) on processor modules 108-0 to 108-M can bind to switch 100 and advertise capabilities of switch 100 to a host operating system (OS) or any OS executed in a VEE.
- An application or VEE can configure or access switch 100 using SIOV, SR-IOV, MR-IOV, or PCIe transactions.
- Switch 100 can be enumerated on any of processor modules 108-0 to 108-M as a PCIe Ethernet or CXL device, i.e., as a locally attached Ethernet device.
- switch 100 can be presented as a physical function (PF) to any server (e.g., any of processor modules 108 - 0 to 108 -M).
- A resource (e.g., memory, accelerator, networking, CPU) could appear logically to the server as if attached via a high-speed link (e.g., CXL or PCIe).
- the server could access the resource (e.g., memory or accelerator) as a hot plugged resource.
- these resources could appear as pooled resources that are now available to the server.
- processor modules 108 - 0 to 108 -M and switch 100 can support use of single-root I/O virtualization (SR-IOV).
- PCI-SIG Single Root IO Virtualization and Sharing Specification v1.1 and predecessor and successor versions describe use of a single PCIe physical device under a single root port to appear as multiple separate physical devices to a hypervisor or guest operating system.
- SR-IOV uses physical functions (PFs) and virtual functions (VFs) to manage global functions for the SR-IOV devices.
- PFs can be PCIe functions that can configure and manage the SR-IOV functionality.
- a PF can configure or control a PCIe device, and the PF has ability to move data in and out of the PCIe device.
- the PF is a PCIe function of switch 100 that supports SR-IOV.
- the PF includes capability to configure and manage SR-IOV functionality of switch 100 , such as enabling virtualization and managing PCIe VFs.
- a VF is associated with a PCIe PF on switch 100 , and the VF represents a virtualized instance of switch 100 .
- a VF can have its own PCIe configuration space but can share one or more physical resources on switch 100 , such as an external network port, with the PF and other PFs or other VFs.
- a VEE executing on switch 100 can utilize a VF to configure or access any server.
- In some examples, a platform and a network interface controller can interact using Multi-Root IOV (MR-IOV), specified by the PCI Special Interest Group (SIG).
- processor modules 108 - 0 to 108 -M and switch 100 can support use of Intel® Scalable I/O Virtualization (SIOV).
- processor modules 108 - 0 to 108 -M can access switch 100 as a SIOV capable device or switch 100 can access processor modules 108 - 0 to 108 -M as SIOV capable devices.
- a SIOV capable device can be configured to group its resources into multiple isolated Assignable Device Interfaces (ADIs). Direct Memory Access (DMA) transfers from/to each ADI are tagged with a unique Process Address Space identifier (PASID) number.
- Processor modules 108-0 to 108-M, network controllers, storage controllers, graphics processing units, and other hardware accelerators can utilize SIOV across many virtualized execution environments.
- SIOV enables software to flexibly compose virtual devices utilizing the hardware-assists for device sharing at finer granularity.
- Performance critical operations on the composed virtual device are mapped directly to the underlying device hardware, while non-critical operations are emulated through device-specific composition software in the host.
- a technical specification for SIOV is Intel® Scalable I/O Virtualization Technical Specification, revision 1.0, June 2018.
- Multitenant security can be employed where switch 100 is granted access to some or all server resources in the rack. Accesses by switch 100 to any server can require use of crypto keys, checksums, or other integrity checks. Any server can employ an access control list (ACL) to ensure communications from switch 100 are permitted but can filter out communications from other sources (e.g., drop communications).
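- As a hedged illustration of the ACL filtering described above, the following Python sketch permits traffic from an allow-listed switch address and drops other sources; the addresses and field names are hypothetical.

```python
# Illustrative sketch: a server-side access control list (ACL) that permits
# communications from the rack switch and drops traffic from other sources.
from typing import NamedTuple, Set

class Frame(NamedTuple):
    src_mac: str
    payload: bytes

class AccessControlList:
    def __init__(self, permitted_sources: Set[str]):
        self.permitted_sources = permitted_sources

    def permit(self, frame: Frame) -> bool:
        # Only frames from permitted sources (e.g., the rack switch) pass.
        return frame.src_mac in self.permitted_sources

acl = AccessControlList(permitted_sources={"aa:bb:cc:00:00:01"})  # hypothetical switch MAC
good = Frame(src_mac="aa:bb:cc:00:00:01", payload=b"config")
bad = Frame(src_mac="de:ad:be:ef:00:02", payload=b"probe")
assert acl.permit(good) and not acl.permit(bad)
```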
- switch 100 can act as a network proxy for a VEE running on a server.
- a VEE executing on switch 100 can form the packets for transmission using a network connection of switch 100 according to any applicable communications protocol (e.g., standardized or proprietary protocol).
- switch 100 can originate a packet transmission where a workload or VEE running on the cores is in switch 100 or accessible to switch 100 .
- Switch 100 can access connected internal cores in a similar manner as accessing any other externally connected host.
- One or more host(s) can be placed inside the same chassis as switch 100 .
- a VEE or service runs on a CPU of switch 100 , such VEE can originate packets for transmission.
- switch 100 could originate packets for transmission to respond to any request for data or, in the case of a cache miss, query another server or system for the data and retrieve the data to update its cache.
- FIG. 1B depicts an example system.
- Switch system 130 can include or access switch circuitry 132 that is communicatively coupled to port circuitry 134 - 0 to 134 -N.
- Port circuitry 134 - 0 to 134 -N can receive packets and provide packets to switch circuitry 132 .
- Port circuitry 134 - 0 to 134 -N can be similar to any of port circuitry 104 - 0 to 104 -N.
- Interfaces 136 - 0 to 136 -M can provide communication with respective processor modules 138 - 0 to 138 -M.
- an orchestration control plane can execute on one or more of processor modules 138 - 0 to 138 -M.
- processor modules 138 - 0 to 138 -M can be similar to respective processor modules 108 - 0 to 108 -M.
- FIG. 1C depicts an example system.
- Switch system 140 can include or access switch circuitry 142 that is communicatively coupled to port circuitry 144 - 0 to 144 - 4 .
- Port circuitry 144 - 0 to 144 - 4 can receive packets and provide packets to switch circuitry 142 .
- Port circuitry 144 - 0 to 144 -N can be similar to any port circuitry 104 - 0 to 104 -N.
- Interfaces 146 - 0 to 146 - 1 can provide communication with respective processor modules 148 - 0 to 148 - 1 .
- an orchestration control plane can execute on one or more of processors 147 - 0 or 147 - 1 or processor modules 148 - 0 to 148 - 1 .
- Processor modules 148 - 0 to 148 - 1 can be similar to any of processor modules 108 - 0 to 108 -M.
- FIG. 1D depicts an example system.
- aggregation switch 150 is coupled to multiple switches of different racks.
- a rack can include switch 152 coupled to servers 154 - 0 to 154 -N.
- Another rack can include switch 156 coupled to servers 158 - 0 to 158 -N.
- One or more of the switches can operate in accordance with embodiments described herein.
- a core switch or other access point can connect aggregation switch 150 to the Internet for packet transmission and receipt with another data center.
- Note that the depiction of servers relative to the switch in FIG. 1 is not intended to show a physical arrangement, as a TOR, MOR, or any other switch position (e.g., end of rack (EOR)) can be used relative to servers.
- Embodiments described herein are not limited to data center operation and can apply to operations among multiple data centers, enterprise networks, on-premises, or hybrid data centers.
- Any type of configuration that requires power cycling (e.g., after an NVM update or a firmware update, such as an update of a Basic Input/Output System (BIOS), Universal Extensible Firmware Interface (UEFI), or a boot loader) can be performed in isolation and not require the entire switch to power cycle, to avoid impacting all servers connected to the switch and in the rack.
- FIG. 2A depicts an example overview of a system of managing resources in a rack.
- Various embodiments provide switch 200 with orchestration control plane 202 that can manage control planes in one or more servers 210 - 0 to 210 -N connected to switch 200 .
- Orchestration control plane 202 can receive SLA information 206 for one or more VEEs (e.g., any of 214-0-0 to 214-0-P or 214-N-0 to 214-N-P) and telemetry information 204 from servers in the rack, such as resource utilization, measured device throughput (e.g., memory read or write completion times), available memory or storage bandwidth, or resource needs of a server connected to the switch or, more broadly, in the rack.
- orchestration control plane 202 can proactively control, moderate, or quiesce network bandwidth allocated to a server (e.g., data transmission rates from switch 200 to a server or from the server to switch 200 ) and thereby moderate a rate of communications sent from or received by VEEs running on a server.
- Orchestration control plane 202 can allocate to any server's hypervisor (e.g., 212-0 to 212-N) one or more of: compute resources, network bandwidth (e.g., between switch 200 and another switch, such as an aggregation switch or a switch for another rack), and memory or storage bandwidth.
- switch 200 can proactively manage data transmission or receipt bandwidths to any VEE in a rack and prior to receipt of any flow control message, but can also manage data transmission bandwidth from any VEE in the event of receipt of a flow control message (e.g., XON/XOFF or Ethernet PAUSE) to reduce or pause transmission of a flow.
- Orchestration control plane 202 can monitor activities of all servers 210 - 0 to 210 -N in its rack at least based on telemetry data and can manage hypervisors 212 - 0 to 212 -N to control traffic generation of VEEs.
- switch 200 can perform flow control to quiesce a packet transmitter from either a local VEE or a remote sender in cases where congestion is detected.
- hypervisors 212 - 0 to 212 -N can compete for resources from orchestration control plane 202 to allocate for managed VEEs, but such a scheme may not lead to under allocation of resources to some VEEs.
- orchestration control plane 202 can configure a hypervisor (e.g., 212 - 0 or 212 -N) associated with a server that executes one or more VEEs.
- servers 210 - 0 to 210 -N can execute respective hypervisor control plane 212 - 0 to 212 -N to manage data planes for VEEs running on a server.
- a VEE can manage the contention between flows within the resource that it is granted.
- Orchestration control plane 202 can be afforded privileges within switch 200 and servers 210 - 0 to 210 -N at least to configure resource allocations to servers. Orchestration control plane 202 can be insulated from untrusted VEEs that may compromise a server. Orchestration control plane 202 can monitor and shutdown a VEE's VF or a server's PF for a NIC if malicious activity is detected.
- a hypervisor control plane 212 (e.g., any of hypervisor control plane 212 - 0 to 212 -N) for a server can determine whether to configure resources afforded to a VEE and operations of the VEE in response to a physical host configuration request having been received, such as from orchestration control plane 202 , an administrator, as a result of an update to a policy associated with a tenant for which the VEE executes, etc.
- a configuration from orchestration control plane 202 can be classified as trusted or untrusted.
- Hypervisor control plane 212 for a server can allow any trusted configuration to be enacted for a VEE. In some examples, bandwidth allocation, initiation of VEE migration or termination, and resource allocations made by orchestration control plane 202 can be classified as trusted.
- Hypervisor 212 can allow untrusted configurations to perform certain configurations, but not hardware access or configuration operations that exceed a trust level. For example, an untrusted configuration cannot issue device resets, change the link configuration, write sensitive or device-wide registers, or update the device firmware. By separating configurations into trusted or untrusted, hypervisor 212 can neutralize a potential attack surface by sanitizing untrusted requests. In addition, hypervisor 212 can expose different capabilities for each of its different VEEs, thus allowing the host/provider to segregate tenants as needed.
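- The sketch below illustrates this trusted/untrusted split in Python; the operation names and the trusted-only operation set are assumptions chosen to mirror the examples above, not the patent's actual interface.

```python
# Illustrative sketch: a hypervisor control plane classifying configuration
# requests as trusted or untrusted and sanitizing the untrusted ones.
from typing import NamedTuple

# Operations the text treats as requiring trust (examples, not exhaustive).
TRUSTED_ONLY_OPS = {"device_reset", "link_config", "write_device_registers",
                    "firmware_update"}

class ConfigRequest(NamedTuple):
    source: str       # e.g., "orchestration", "tenant_vee"
    operation: str    # e.g., "bandwidth_allocation", "device_reset"

def is_trusted_source(req: ConfigRequest) -> bool:
    # Configurations from the switch orchestration control plane are trusted.
    return req.source == "orchestration"

def handle(req: ConfigRequest) -> str:
    if is_trusted_source(req):
        return f"applied {req.operation}"
    if req.operation in TRUSTED_ONLY_OPS:
        return f"rejected {req.operation} (exceeds trust level)"
    return f"applied {req.operation} (sanitized)"

print(handle(ConfigRequest("orchestration", "device_reset")))   # applied
print(handle(ConfigRequest("tenant_vee", "device_reset")))      # rejected
print(handle(ConfigRequest("tenant_vee", "bandwidth_request"))) # sanitized
```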
- FIG. 2B depicts an example overview of various management hierarchies.
- orchestration control plane issues trusted configurations to hypervisor control plane of a server. Some or all commands or configurations from orchestration control plane sent to hypervisor control plane can be considered trusted.
- Hypervisor control plane institutes the configurations for VEEs managed by the hypervisor.
- the switch controls servers as though the servers represent physical functions (PFs) and associated virtual functions (VF- 0 to VF-N) represent VEEs.
- OS hypervisor corresponds to a PF and VEEs access the PF using their corresponding VF.
- the orchestration control plane manages a hypervisor control plane.
- orchestration control plane can manage data planes DP- 0 to DP-N of a server to control allocated resources, allocated network bandwidth (e.g., transmit or receive), and migration or termination of any VEE.
- FIG. 3 depicts an example system in which a switch can respond to a memory access request.
- a requester device or VEE in or executing on server 310 can request data stored in server 312 .
- Switch 300 can receive and process the memory access request and determine that the destination server or device (e.g., identified by IP address or MAC address) to which the memory access request is to be provided for completion (e.g., read or write) is memory pool 332.
- switch 300 can transfer the request to memory pool 332 .
- switch 300 can access mapping table 302 that indicates a mapping of a memory address associated with a memory access request to a device physical address (e.g., destination IP address or MAC address).
- switch 300 can be trusted with addresses of target devices and conversion of virtual addresses (provided with the memory access request) to physical address.
- switch 300 can request a memory access (e.g., read or write) on behalf of a requester of the memory access at the target device.
- switch 300 can directly access memory pool 332 to retrieve data for a read operation or write data. For example, when server 310 requests data from server 312 but the data is stored in memory pool 332 , switch 300 may retrieve the requested data from memory pool 332 (or other server) and provide the data to server 310 and potentially store the data in memory 304 or server 312 . Switch 300 can fetch the data from memory pool 332 (or other device, server, or storage pool) by issuing a data read request to switch 320 to retrieve the data. Memory pool 332 can be located within a same data center as switch 300 or outside of the data center.
- Switch 300 can store the fetched data in memory 304 (or server 312 ) to allow multiple read-write transactions with low latency by servers in a same rack as switch 300 .
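- A minimal sketch of the mapping-table lookup and read-through caching behavior described for switch 300 follows; the address ranges, device names, and cache policy are hypothetical.

```python
# Illustrative sketch: a switch-resident mapping table that resolves a memory
# address to its target device, with a local read-through cache so repeat
# reads within the rack avoid another fetch from the remote memory pool.
from typing import Dict, List, Optional, Tuple

class MappingTable:
    """Maps address ranges to the device that actually holds the data."""
    def __init__(self, ranges: List[Tuple[int, int, str]]):
        self.ranges = ranges  # (start, end, target_device)

    def lookup(self, addr: int) -> Optional[str]:
        for start, end, target in self.ranges:
            if start <= addr < end:
                return target
        return None

class SwitchMemoryAgent:
    def __init__(self, table: MappingTable, backends: Dict[str, Dict[int, bytes]]):
        self.table = table
        self.backends = backends          # stand-in for remote devices
        self.cache: Dict[int, bytes] = {} # switch-local memory (e.g., memory 304)

    def read(self, addr: int) -> Optional[bytes]:
        if addr in self.cache:            # served from switch memory
            return self.cache[addr]
        target = self.table.lookup(addr)
        if target is None:
            return None
        data = self.backends[target].get(addr)  # fetch from the target device
        if data is not None:
            self.cache[addr] = data       # cache for later requests in the rack
        return data

table = MappingTable([(0x0000, 0x1000, "server_3"), (0x1000, 0x2000, "memory_pool")])
agent = SwitchMemoryAgent(table, {"server_3": {0x10: b"near"},
                                  "memory_pool": {0x1800: b"far"}})
assert agent.read(0x1800) == b"far"   # fetched from the pool, then cached
assert agent.read(0x1800) == b"far"   # served from switch-local cache
```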
- a highspeed connection can provide data from memory 304 to server 310 or vice versa.
- CXL.mem is used to transfer data from server 310 to memory 304 or vice versa
- Switch 300 can update the data in memory pool 332 if the data from memory 304 is modified.
- a two-level memory (2LM) architecture can be implemented to copy data to a local memory accessible over a fast connection for processing by VEEs and significantly alleviating the latency penalty associated with retrieving data.
- switch 300 can forward the request to the target device that stores the data to respond to the memory request. For example, switch 300 can use packet processing 306 to change a destination IP or MAC address of the packet that conveyed the memory access request to be that of the target device or encapsulate the request in another packet but maintain the destination IP or MAC address of the received memory access request.
- Thin memory provisioning allows less memory on a compute node and building a memory pool that is shared by multiple compute nodes.
- the shared memory can be dynamically allocated/deallocated to compute nodes with allocation set at page or cache line granularity.
- With thin memory provisioning, the physical memory present on the compute nodes plus the memory in the shared pool can be less than the total amount of memory advertised to the compute nodes. For example, where thin memory provisioning is used for server 310, data can be stored in a memory on a same rack as that of server 310 and potentially in a remote memory pool 332.
- For a write operation, switch 300 can queue the write, report the write operation as complete to server 310 (e.g., to the VEE), and then update memory pool 332 as memory bandwidth allows or as memory ordering and cache coherency require (e.g., flushing posted writes).
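- A small sketch of this posted-write behavior follows, assuming an in-switch write queue that acknowledges immediately and flushes to the memory pool later; the names are illustrative.

```python
# Illustrative sketch: posted writes at the switch. The write is queued and
# acknowledged to the requester immediately; the backing memory pool is
# updated later, when bandwidth allows or ordering requires a flush.
from collections import deque
from typing import Deque, Dict, Tuple

class PostedWriteQueue:
    def __init__(self, pool: Dict[int, bytes]):
        self.pool = pool                                  # remote memory pool
        self.pending: Deque[Tuple[int, bytes]] = deque()  # queued writes

    def write(self, addr: int, data: bytes) -> str:
        self.pending.append((addr, data))
        return "complete"     # acknowledged before the pool is updated

    def flush(self) -> None:
        # Drain queued writes in order to preserve memory ordering.
        while self.pending:
            addr, data = self.pending.popleft()
            self.pool[addr] = data

pool: Dict[int, bytes] = {}
q = PostedWriteQueue(pool)
assert q.write(0x40, b"v1") == "complete"  # requester sees completion now
assert 0x40 not in pool                    # pool not yet updated
q.flush()
assert pool[0x40] == b"v1"                 # consistent after flush
```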
- switch 300 can process a memory access to a region of a memory with a corresponding address and, in the case of a write, corresponding data to write.
- Switch 300 can read data from or store data to memory pool 332 using remote direct memory access (e.g., InfiniBand, iWARP, RoCE and RoCE v2), NVMe over Fabrics (NVMe-oF) or NVMe.
- NVMe-oF is described at least in NVM Express Base Specification Revision 1.4 (2019), as well as predecessors, successors, and proprietary variations thereof.
- NVMe is described, for example, in NVM Express™ Base Specification, Revision 1.3c (2016), as well as predecessors, successors, and proprietary variations thereof.
- switch 300 can retrieve data or write data as though the data were stored in a server of a same rack as server 310 .
- In some examples, switch 300 may also contribute to the aggregated cache space.
- Smart cache allocation could place data in a memory of a server that accesses the data.
- Data that is thrashed e.g., accessed and modified by several servers
- Memcached can provide a distributed memory-caching system within a data center or across multiple data centers. For example, Memcached can provide distributed databases to speed up applications by alleviating database load.
- dedicated servers can be used as Memcached servers to consolidate resources across servers (e.g., via Ethernet) and cache commonly accessed data to speed up access to that data.
- a switch can manage data stored as part of a Memcached object, data, or string storage in at least some memory resources in servers connected to the switch.
- FIG. 4A shows examples of a Memcached server executing on a server (system 400 ) and in a switch (system 450 ).
- Use of Memcached allows frequently requested data to be provided faster by use of a hash look-up instead of a database (or any other complex) query, although a database query can be used in any embodiment.
- a first request for data can be relatively slow as it causes retrieval of data. Future requests for the same data can be faster as the data is stored and can be provided from the data server.
- a requestor can be a client/server on a different rack in a row of the data center, on a different row in the data center, or an external request from outside of the data center.
- In system 400, the request can be received at aggregation switch 402 and provided to switch 404 using an Ethernet link.
- Switch 404 can use an Ethernet link to provide the request to Memcached server 408 running on server 406-0, which in turn provides a request for data to server 406-1. In this example, there are multiple Ethernet communications within the same rack to provide the desired data. Ethernet communications can contribute to east-west traffic within a datacenter.
- the request can be received at aggregation switch 402 and provided to switch 452 using an Ethernet link.
- Switch 452 executes Memcached server 408 using one or more processors and determines a server device that stores requested data.
- The request can be provided to server 460-1 and not contribute to east-west traffic. If the requestor were in the same rack (e.g., server 460-N), as switch 452 is a network endpoint, the request could be handled internally to switch 452 and not travel over Ethernet to be fulfilled.
- In the event of a cache miss (e.g., data is not stored in server 460-1), data can be retrieved from another server (e.g., 460-0) over the connection.
- switch 452 can execute Memcached in a VEE running on the switch and can consolidate resources in the entire rack into a virtual pool of combined cache and memory via a high-speed connection.
- Memcached server VEE could automatically update its cache (e.g., shown as data in server 460 - 1 ) based on how it is configured to improve data locality to requesters and reduce further latency.
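- The following Python sketch captures the look-up-then-fetch behavior of a Memcached-style service hosted in the switch; the hashing scheme, server names, and update policy are assumptions for illustration.

```python
# Illustrative sketch: an in-switch Memcached-style service. A key is hashed
# to the rack server that caches it; on a miss, the data is fetched from the
# origin data server and the rack cache is updated for later requests.
import hashlib
from typing import Dict, Optional

class RackCacheService:
    def __init__(self, cache_servers: Dict[str, Dict[str, bytes]],
                 origin: Dict[str, bytes]):
        self.cache_servers = cache_servers  # rack servers reachable over PCIe/CXL
        self.names = sorted(cache_servers)
        self.origin = origin                # authoritative data source

    def _owner(self, key: str) -> str:
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        return self.names[h % len(self.names)]

    def get(self, key: str) -> Optional[bytes]:
        cache = self.cache_servers[self._owner(key)]
        if key in cache:
            return cache[key]            # hit: no east-west Ethernet needed
        value = self.origin.get(key)     # miss: fetch from the data source
        if value is not None:
            cache[key] = value           # update the rack cache
        return value

svc = RackCacheService({"server_460_1": {}, "server_460_N": {}},
                       origin={"user:42": b"profile-bytes"})
assert svc.get("user:42") == b"profile-bytes"  # first request fills the cache
assert svc.get("user:42") == b"profile-bytes"  # second request is a cache hit
```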
- FIG. 4B shows the Ethernet packet flow for a single request.
- Each arrow represents a traversal of an Ethernet link and contribution to east-west or north-south traffic.
- A requester sends a request to an aggregation switch, the aggregation switch provides the request to a switch, and, in turn, the switch provides the request to the Memcached server.
- the Memcached server provides a request to be sent to a data server through the switch.
- the data server responds by indicating that data is not present via the switch to the Memcached server.
- the Memcached server receives a response of a cache miss so that the Memcached server can update its cache with the data so subsequent requests for that data no longer result in a cache miss.
- the Memcached server provides the data to the requester even in cases of a cache miss.
- If the Memcached server is in a different rack in the data center than a rack that stores the data, the request travels to a different rack and a response is provided to the Memcached server.
- the switch could issue an Ethernet request to a rack that stores the data.
- the switch could bypass the Memcached servers and request data from the data source directly.
- In system 450, a requester provides a request to the switch via an aggregation switch, and the switch accesses the Memcached server and data in its rack via a connection (e.g., PCIe, CXL, DDRx) and provides the response data to the requester via an aggregation switch.
- In this case, four Ethernet link traversals occur.
- Providing a Memcached service in a switch can reduce the network accesses to databases on other racks and even reduce the east-west traffic within the rack by performing the Memcached data location look-up in the switch.
- If the requested data is stored in a memory of the switch (e.g., in memory 304), the switch can directly supply the requested data in response to the request.
- fewer Ethernet communications are made by system 450 because servers in the same rack are accessible via switch 452 ( FIG. 4A ) using high-speed connections (PCIe, CXL, DDR, etc.) to retrieve data to cache.
- FIG. 5A depicts an example system in which packets can terminate at a switch.
- a packet can be received by switch 502 from an aggregation switch, for example.
- the packet can be Ethernet compatible and use any type of transport layer (e.g., Transmission Control Protocol (TCP), Data Center TCP (DCTCP), User Datagram Protocol (UDP), quick User Datagram Protocol Internet Connections (QUIC)).
- Various embodiments of switch 502 can execute one or more VEEs (e.g., 504 or 506 ) to terminate a packet by performing network protocol activity.
- VEEs 504 or 506 can perform network protocol processing or network termination on switch 502 such as one or more of: segmentation, reassembly, acknowledgements (ACKs), negative-acknowledgements (NACKs), packet retransmit identification and requests, congestion management (e.g., flow control of a transmitter), Secure Sockets Layer (SSL) or Transport Layer Security (TLS) termination for HTTP and TCP.
- protocol processing VEE 504 or 506 can perform network service chain features such as firewalls, network address translation (NAT), intrusion protection, decryption, evolved packet core (EPC), encryption, filtering of packets based on virtual local area network (VLAN) tag, encapsulation, and so forth.
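- The sketch below illustrates a simple service function chain of the kind named above (VLAN filtering, firewall, NAT) applied at the switch; the rules and packet fields are hypothetical.

```python
# Illustrative sketch: a service function chain executed by a protocol
# processing VEE: VLAN filter -> firewall -> NAT, applied in order.
from dataclasses import dataclass
from typing import Callable, List, Optional, Set

@dataclass
class Pkt:
    src_ip: str
    dst_ip: str
    dst_port: int
    vlan: int

Stage = Callable[[Pkt], Optional[Pkt]]  # return None to drop the packet

def vlan_filter(allowed_vlans: Set[int]) -> Stage:
    return lambda p: p if p.vlan in allowed_vlans else None

def firewall(blocked_ports: Set[int]) -> Stage:
    return lambda p: None if p.dst_port in blocked_ports else p

def nat(public_ip: str, private_ip: str) -> Stage:
    def stage(p: Pkt) -> Pkt:
        if p.dst_ip == public_ip:
            p.dst_ip = private_ip      # translate destination address
        return p
    return stage

def run_chain(chain: List[Stage], p: Pkt) -> Optional[Pkt]:
    for stage in chain:
        p = stage(p)
        if p is None:
            return None                # dropped by a stage
    return p

chain = [vlan_filter({100}), firewall({23}), nat("203.0.113.10", "10.0.0.5")]
out = run_chain(chain, Pkt("198.51.100.7", "203.0.113.10", 443, vlan=100))
assert out is not None and out.dst_ip == "10.0.0.5"
```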
- In some examples, switch 502 can execute protocol processing VEEs 504 and 506 when there is low utilization of the switch's processors.
- a protocol processing VEE could execute on the computing resources of one or more servers in a rack.
- Switch 502 can include or access via high-speed connections packet buffers for receipt or transmission of packets.
- VEEs 504 or 506 can perform packet protocol termination or network termination of at least some received packets at switch 502 .
- VEEs 504 or 506 can perform packet processing for any of layers 2-4 of the Open Systems Interconnection model (OSI model) (e.g., Data link layer, Network layer, or Transport layer (e.g., TCP, UDP, QUIC)).
- VEEs 504 or 506 can perform packet processing for any of layers 5-7 of the OSI model (e.g., Session layer, Presentation layer, or Application layer).
- VEEs 504 or 506 can provide a tunnel endpoint by performing tunnel origination or termination by providing encapsulation or decapsulation for technologies such as, but not limited to, Virtual Extensible LAN (VXLAN) or Network Virtualization using Generic Routing Encapsulation (NVGRE).
- VEEs 504 or 506 or any device (e.g., programmable or fixed function) in switch 502 can perform one or more of: large receive offload (LRO), large send/segmentation offload (LSO), TCP segmentation offload (TSO), Transport Layer Security (TLS) offload, receive side scaling (RSS) to allocate a queue or core to process a payload, dedicated queue allocation, or another layer protocol processing.
- LRO can refer to switch 502 (e.g., VEEs 504 or 506 or a fixed or programmable device) reassembling incoming network packets and transferring packet contents (e.g., payloads) into larger contents and transferring the resulting larger contents but fewer packets for access by the host system or a VEE.
- LSO can refer to switch 502 (e.g., VEEs 504 or 506 ) or server 510 - 0 or 510 - 1 (e.g., VEE 514 - 0 or 514 - 1 ) generating a multipacket buffer and providing content of the buffer to switch 502 (e.g., VEEs 504 or 506 or a fixed or programmable device) to split into separate packets for transmission.
- TSO can permit switch 502 or a server 510 - 0 or 510 - 1 to build a larger TCP (or other transport layer) message (e.g., 64 KB in length) and switch 502 (e.g., VEEs 504 or 506 or a fixed or programmable device) to segment the message into smaller data packets for transmission.
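- As an illustrative sketch only (not a description of any particular hardware), the following Python fragment shows the segmentation step a TSO-style offload performs: a large transport-layer message is split into payload chunks that each fit in one packet. The MTU and header-length values are assumptions.

```python
# Hypothetical sketch of TSO-style segmentation: a large transport-layer
# message is split into MTU-sized payload chunks before transmission.
MTU = 1500          # assumed link MTU in bytes
HEADER_LEN = 40     # assumed combined IP + TCP header length

def segment_message(message: bytes, mtu: int = MTU, header_len: int = HEADER_LEN):
    """Split `message` into payload chunks that each fit in one packet."""
    max_payload = mtu - header_len
    return [message[i:i + max_payload] for i in range(0, len(message), max_payload)]

if __name__ == "__main__":
    big_message = bytes(64 * 1024)          # e.g., a 64 KB TCP message
    segments = segment_message(big_message)
    print(f"{len(big_message)} bytes split into {len(segments)} segments")
```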
- TLS is defined at least in The Transport Layer Security (TLS) Protocol Version 1.3, RFC 8446 (March 2018).
- TLS offload can refer to offload of encryption or decryption of contents in accordance with TLS to switch 502 (e.g., VEEs 504 or 506 or a fixed or programmable device).
- Switch 502 can receive data for encryption from server 510 - 0 or 510 - 1 (e.g., VEE 514 - 0 or 514 - 1 ) or VEEs 504 or 506 , and perform the encryption of data prior to transmission of encrypted data in one or more packets.
- Switch 502 can receive packets and decrypt content of packets prior to transfer of decrypted data to server 510 - 0 or 510 - 1 for access by VEE 514 - 0 or 514 - 1 or VEEs 504 or 506 .
- any type of encryption or decryption can be performed by switch 502, such as but not limited to Secure Sockets Layer (SSL).
- RSS can refer to switch 502 (e.g., VEEs 504 or 506 or a fixed or programmable device) calculating a hash or making another determination based on contents of a received packet to determine and select which CPU or core is to process payload from the received packet. Other manners of distributing payloads to cores can be performed.
- switch 502 (e.g., VEEs 504 or 506 or a fixed or programmable device) can perform RSS to select a core on switch 502 or a server that is to store and process payload from the received packet. In some examples, switch 502 can perform RSS to allocate one or more cores (on switch 502 or a server) to perform packet processing.
- switch 502 can allocate a dedicated queue in a memory to an application or VEE according to Application Device Queue (ADQ) or similar technology.
- Use of ADQ can dedicate queues to applications or VEEs, and these queues can be exclusively accessed by the applications or VEEs.
- ADQ can prevent network traffic contention, in which different applications or VEEs attempt to access the same queue, causing locking or contention and making the performance (e.g., latency) of packet availability unpredictable.
- ADQ provides quality of service (QoS) control for dedicated application traffic queues for received packets or packets to be transmitted.
- switch 502 can allocate packet payload content to one or more queues where the one or more queues are mapped to access by software such as an application or VEE.
- switch 502 can utilize ADQ to dedicate one or more queues for packet header processing operations.
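- A minimal sketch, assuming hypothetical queue counts and application names, of how ADQ-style dedication could be modeled: each application or VEE reserves queues for its exclusive use, and an attempt by another application to enqueue into those queues is rejected.

```python
# Hypothetical sketch of ADQ-style dedicated queues: each application or VEE
# is given exclusive queues, so traffic for different applications never
# contends for the same queue. Names and queue counts are illustrative only.
from collections import defaultdict

class DedicatedQueues:
    def __init__(self, num_queues: int):
        self.free = list(range(num_queues))
        self.owner = {}                      # queue id -> application/VEE name
        self.queues = defaultdict(list)      # queue id -> list of packets

    def dedicate(self, app: str, count: int = 1):
        """Reserve `count` queues for exclusive use by `app`."""
        ids = [self.free.pop() for _ in range(count)]
        for q in ids:
            self.owner[q] = app
        return ids

    def enqueue(self, app: str, queue_id: int, packet: bytes):
        if self.owner.get(queue_id) != app:
            raise PermissionError(f"queue {queue_id} is not dedicated to {app}")
        self.queues[queue_id].append(packet)

qs = DedicatedQueues(num_queues=8)
web_queues = qs.dedicate("webserver-vee", count=2)
qs.enqueue("webserver-vee", web_queues[0], b"payload")
```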
- FIG. 5C depicts an example manner of non-uniform memory access (NUMA) node, CPU, or server selection by switch 502 (e.g., VEEs 504 or 506 or a fixed or programmable device).
- resource selector 572 can perform a hash calculation on a received packet's header (e.g., a hash calculation on a packet flow identifier) to index an indirection table stored on switch 502 that maps to a queue (e.g., among queues 576 ), which in turn maps to a NUMA node, CPU or server.
- Resource mappings 574 can include an indirection table and mapping to queue as well as indicator of which connection (e.g., CXL link, PCIe connection or DDR interface) to use to copy a header and/or payload of a received packet to a memory (or cache) associated with a selected NUMA node, CPU or server.
- resource selector 572 performs RSS to select a NUMA node, CPU or server. For example, resource selector 572 can select a CPU1 in NUMA Node 0 on server 580 - 1 to process the header and/or payload of the received packet.
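- A minimal sketch of the selection path described above, with an assumed indirection table and queue-to-target mapping (a real hash function, table size, and mapping would differ): a hash of the packet flow identifier indexes an indirection table, which yields a queue, which in turn maps to a NUMA node, CPU, or server.

```python
# Hypothetical sketch of RSS-style selection: hash of flow identifier ->
# indirection table -> queue -> (server, NUMA node, CPU). Values are assumptions.
import zlib

INDIRECTION_TABLE = [0, 1, 2, 3, 0, 1, 2, 3]      # hash index -> queue id
QUEUE_TO_TARGET = {
    0: ("server 580-0", "NUMA node 0", "CPU0"),
    1: ("server 580-0", "NUMA node 1", "CPU2"),
    2: ("server 580-1", "NUMA node 0", "CPU1"),
    3: ("server 580-1", "NUMA node 1", "CPU3"),
}

def select_target(flow_id: bytes):
    """Return (queue, (server, NUMA node, CPU)) for a packet's flow identifier."""
    h = zlib.crc32(flow_id)                        # stand-in for an RSS hash
    queue = INDIRECTION_TABLE[h % len(INDIRECTION_TABLE)]
    return queue, QUEUE_TO_TARGET[queue]

flow = b"192.168.0.5:443->10.0.0.7:51514/TCP"
print(select_target(flow))
```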
- a NUMA node on a server could have its own connection to switch 570 to allow writing to memory in a server without traversing a UPI bus.
- a VEE can be executed on one or more cores or CPUs and the VEE can process the received payload.
- VEEs 504 or 506 can execute processes based on Data Plane Development Kit (DPDK), Storage Performance Development Kit (SPDK), OpenDataPlane, Network Function Virtualization (NFV), software-defined networking (SDN), Evolved Packet Core (EPC), or 5G network slicing.
- EPC is a 3GPP-specified core architecture at least for Long Term Evolution (LTE) access.
- 5G network slicing can provide for multiplexing of virtualized and independent logical networks on the same physical network infrastructure.
- any protocol processing, protocol termination, network termination, or offload operation can be performed by a programmable or fixed function device in switch 502 instead of or in addition to use of a VEE executing in switch 502 .
- processing packets in switch 502 can allow for a faster decision on packet handling (e.g., forward or discard) than if the decision on packet handling were made in a server.
- use of bandwidth on a connection between a server and the switch can be avoided in the event of packet discard. If a packet were identified as related to malicious activity (e.g., a DDoS attack), the packet could be discarded at switch 502 to insulate a server from potential exposure to the malicious activity.
- VEEs 504 and 506 running on compute resources of switch 502 can complete network processing and cause the resulting data to be transferred to a data buffer for a VEE 514 - 0 or 514 - 1 via DMA, RDMA, PCIe, or CXL.mem, regardless of the network protocol that was used to deliver the packet.
- VEEs 504 and 506 running on the compute resources of switch 502 can act as proxy VEEs for respective VEEs 514 - 0 or 514 - 1 running on respective servers 510 - 0 and 510 - 1 .
- VEE 504 or 506 can perform protocol stack processing.
- after a VEE executing on switch 502 (e.g., VEE 504 or 506 ) performs protocol stack processing, a payload from the packet can be copied into a memory buffer (e.g., 512 - 0 or 512 - 1 ) in a destination server (e.g., 510 - 0 or 510 - 1 ).
- VEEs 504 and 506 can cause performance of a direct memory access (DMA) or RDMA operation to copy the packet payload to a buffer associated with a VEE (e.g., VEEs 514 - 0 and 514 - 1 ) that is to process the packet payload.
- a descriptor can be a data structure provided by an orchestrator or VEEs 514 - 0 and 514 - 1 to switch 502 to identify available regions of memory or cache to receive packets.
- VEEs 504 and 506 can complete receive descriptors to indicate destination locations of packet payloads in a buffer of a destination server (e.g., 510 - 0 or 510 - 1 ) and copy the completed receive descriptors for access by the VEE that is to process the packet payload.
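- The descriptor mechanics can be sketched as follows, with illustrative field names (not an actual descriptor format): the server-side VEE or orchestrator posts descriptors pointing at free buffer regions, and the switch-side VEE completes a descriptor after copying a payload, so the server-side VEE knows where the payload landed.

```python
# Hypothetical sketch of receive-descriptor handling. Field names and buffer
# addresses are assumptions for illustration only.
from dataclasses import dataclass
from collections import deque

@dataclass
class RxDescriptor:
    buffer_addr: int        # destination buffer address in the server
    buffer_len: int         # size of the available region
    written_len: int = 0    # filled in on completion
    completed: bool = False

# Descriptors posted by the server-side VEE (or an orchestrator).
free_descriptors = deque(RxDescriptor(buffer_addr=0x1000 + i * 2048, buffer_len=2048)
                         for i in range(4))

def complete_receive(payload: bytes):
    """Consume a posted descriptor and mark it completed for the server VEE."""
    desc = free_descriptors.popleft()
    desc.written_len = min(len(payload), desc.buffer_len)
    desc.completed = True          # in hardware this would be written back via DMA
    return desc

print(complete_receive(b"payload bytes from a processed packet"))
```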
- switch 502 can execute a VEE for each of the VEEs executing on servers within its rack, or for an optimized subset.
- a subset of VEEs to execute on the switch can correspond to VEEs running on servers that have low latency requirements, are primarily network focused, or meet other criteria.
- switch 502 is connected to servers 510 - 0 and 510 - 1 using connections that permit switch 502 to access all of the CPUs, memory, and storage in the rack.
- An orchestration layer can manage resource allocation to VEEs in some or all of switch 502 and any server in the rack.
- VEEs 514 - 0 and 514 - 1 executed in respective servers 510 - 0 and 510 - 1 can select a mode of being informed of data availability such as: polling mode, busy poll, or interrupt.
- Polling mode can include a VEE polling for a new packet by actively sampling a status of a buffer to determine if there is a new packet arrival.
- Busy polling can allow socket layer code to poll a receive queue and disable network interrupts.
- An interrupt can cause an executing process to save its state and perform a process associated with the interrupt (e.g., process a packet or data).
- Server 510 - 0 or 510 - 1 in a rack can receive interrupts instead of running in polling mode for packet processing. Interrupts can be issued by switch 502 to a server for higher level transactions, rather than per packet. For example, where a VEE 514 - 0 or 514 - 1 runs a database, an interrupt could be provided by VEE 504 or 506 to VEE 514 - 0 or 514 - 1 when a record update is complete even if a record update is provided using many packets.
- where a VEE 514 - 0 or 514 - 1 runs a webserver, an interrupt could be provided by VEE 504 or 506 to VEE 514 - 0 or 514 - 1 after a complete form is received, despite one or multiple packets providing the form. Polling for received packets or data could be used in any case.
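- A minimal sketch of transaction-level notification, assuming a hypothetical end-of-record marker: the switch-side proxy VEE accumulates payloads from multiple packets and issues a single notification (standing in for an interrupt to the server-side VEE) only once a complete record or form has arrived.

```python
# Hypothetical sketch: raise one notification per completed higher-level unit
# (e.g., a record or form), rather than per packet. The marker is an assumption.
END_OF_RECORD = b"\n--END--\n"

class TransactionNotifier:
    def __init__(self, notify):
        self.notify = notify      # callback standing in for an interrupt to the server VEE
        self.buffer = bytearray()

    def on_packet(self, payload: bytes):
        self.buffer.extend(payload)
        if self.buffer.endswith(END_OF_RECORD):
            self.notify(bytes(self.buffer))   # one notification for many packets
            self.buffer.clear()

notifier = TransactionNotifier(notify=lambda rec: print(f"record complete: {len(rec)} bytes"))
for chunk in (b"field1=a&", b"field2=b", END_OF_RECORD):
    notifier.on_packet(chunk)
```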
- FIG. 5B depicts an example of a composition of VEEs on a server and switch.
- VEE 552 executes on switch 550 to perform protocol processing or packet protocol termination for packets having payloads to be processed by VEE 562 , which executes on server 560 .
- VEE 552 can execute on one or more cores on switch 550 .
- VEE 552 can process packet headers for packets utilizing TCP/IP or other protocol or protocol combinations.
- VEE 552 can write a payload of a processed packet to a socket buffer 566 in server 560 via socket interface 554 -to-socket interface 564 and high speed connection 555 (e.g., PCIe, CXL, DDRx (where x is an integer)).
- Socket buffer 566 can be represented as a memory address.
- An application (e.g., running in VEE 562 executing on server 560 ) can access socket buffer 566 to utilize or process the data.
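- The hand-off through socket buffer 566 can be sketched as below, where a shared bytearray stands in for memory reachable over high speed connection 555 ; the buffer size and function names are assumptions.

```python
# Hypothetical sketch of the socket-buffer hand-off: the switch-side VEE writes
# a processed payload into memory representing socket buffer 566, and the
# application in the server-side VEE reads from the same buffer.
SOCKET_BUFFER_SIZE = 4096
socket_buffer = bytearray(SOCKET_BUFFER_SIZE)     # stands in for socket buffer 566
write_offset = 0

def switch_vee_deliver(payload: bytes):
    """Switch-side VEE: copy a payload (headers already stripped) into the buffer."""
    global write_offset
    socket_buffer[write_offset:write_offset + len(payload)] = payload
    write_offset += len(payload)

def server_app_read():
    """Server-side application: consume whatever data is available."""
    return bytes(socket_buffer[:write_offset])

switch_vee_deliver(b"payload with headers already stripped")
print(server_app_read())
```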
- VEE 552 can provide operations of a TCP Offload Engine (TOE) without requiring any of the protocol stack changes (such as TCP Chimney).
- network termination occurs in VEE 552 of switch 550 and server 560 does not receive any packet headers in socket buffer 566 .
- VEE 552 of switch 550 can perform protocol processing of Ethernet, IP, and transport layer (e.g., TCP, UDP, QUIC) headers, and such headers would not be provided to server 560 .
- switch 550 can also transfer or copy headers or markers, in addition to the payload data, to socket buffer 566 . Accordingly, VEE 562 can access data in socket buffer 566 regardless of the protocols used to transmit the data (e.g., Ethernet, asynchronous transfer mode (ATM), synchronous optical networking (SONET), synchronous digital hierarchy (SDH), Token Ring, and so forth).
- VEEs 552 and 562 can be related by network service chaining (NSC) or service function chaining (SFC), whereby VEE 552 hands data off to VEE 562 within a trusted environment or at least by sharing of memory space.
- Network service VEE 552 can be chained to an application service VEE 562 and VEEs 552 and 562 could have a shared memory buffer for layer 7 data passing.
- telemetry data examples include device temperature readings, application monitoring, network usage, disk space usage, memory consumption, CPU utilization, fan speeds, as well as application-specific telemetry streams from VEEs running on a server.
- telemetry data can include counters or performance monitoring events related to: processor or core usage statistics, input/output statistics for devices and partitions, memory usage information, storage usage information, bus or interconnect usage information, or processor hardware registers that count hardware events such as instructions executed, cache-misses suffered, and branches mispredicted.
- telemetry data can include, but is not limited to, outputs from the Top-down Micro-Architecture Method (TMAM), execution of the Unix system activity reporter (SAR) command, or the Emon command monitoring tool that can profile application and system performance.
- additional information can be collected such as outputs from a variety of monitoring tools including but not limited to output from use of the Linux perf command, Intel PMU toolkit, iostat, VTune Amplifier, or monCli or other Intel Benchmark Install and Test Tool (Intel® BITT) tools.
- Other telemetry data can be monitored such as, but not limited to, power usage, inter-process communications, and so forth.
- Various telemetry techniques such as those described with respect to the collectd daemon can be used.
- when a high-speed connection is used between a server and switch, much more information can pass from the server to the switch without burdening east-west traffic.
- the switch can collect more than a minimum set of telemetry (e.g., key performance indicators (KPIs)) from the server while not burdening the network with excessive east-west traffic overhead.
- a server can send KPIs to the switch unless more data or history is requested such as in the case of an error.
- An orchestrator (e.g., orchestration control plane 202 of FIG. 2A ) executed for the switch can use expanded telemetry data (e.g., telemetry 204 of FIG. 2A ) to determine available capacity on each of the servers on its rack and can provide refined multi-server job placements to maximize performance considering telemetry of multiple servers.
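- A minimal sketch of KPI-driven placement, under assumed KPI fields and a simple headroom score: each server reports a few utilization figures over the high-speed connection, and the orchestrator picks the server with the most remaining capacity for the next job.

```python
# Hypothetical sketch of telemetry-based placement. KPI fields, values, and the
# scoring rule are assumptions for illustration only.
kpis = {
    "server-0": {"cpu_util": 0.82, "mem_util": 0.60, "nic_util": 0.40},
    "server-1": {"cpu_util": 0.35, "mem_util": 0.55, "nic_util": 0.20},
    "server-2": {"cpu_util": 0.50, "mem_util": 0.90, "nic_util": 0.30},
}

def headroom(sample: dict) -> float:
    """Simple score: the scarcest remaining resource on the server."""
    return min(1.0 - v for v in sample.values())

def place_job(telemetry: dict) -> str:
    """Pick the server with the most headroom for the next job."""
    return max(telemetry, key=lambda name: headroom(telemetry[name]))

print(place_job(kpis))   # -> "server-1" with these sample values
```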
- FIG. 6 depicts an example of a switch that executes an orchestration control plane to manage what device executes a VEE.
- Orchestration control plane 604 executing on switch 602 can monitor one or more VEEs' performance in terms of compliance with an applicable SLA. When a VEE does not comply with an SLA requirement (e.g., application availability (e.g., 99.999% during workdays and 99.9% for evenings or weekends), maximum permitted response times to queries or other invocations, requirements on the actual physical location of stored data, or encryption or security requirements), or is within a range close to non-compliance with an SLA requirement, orchestration control plane 604 can instantiate one or more new VEEs to balance the workload among VEEs.
- a workload can include any type of activity, such as protocol processing and network termination for packets, or a Memcached server, database, or webserver.
- VEE 606 can perform protocol processing, and if a workload increases, multiple instances of VEE 606 can be instantiated on switch 602 .
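- A minimal sketch of the scale-out decision, with illustrative SLA threshold and margin values: when an observed response time exceeds, or comes within a margin of, the SLA limit, another VEE instance is requested.

```python
# Hypothetical sketch of SLA-driven scale-out. The SLA limit and margin are
# illustrative assumptions, not values from the disclosure.
SLA_MAX_RESPONSE_MS = 20.0
NEAR_VIOLATION_MARGIN = 0.9        # scale out at 90% of the SLA limit

def scale_decision(observed_p99_ms: float, instances: int) -> int:
    """Return the desired number of VEE instances."""
    if observed_p99_ms >= SLA_MAX_RESPONSE_MS * NEAR_VIOLATION_MARGIN:
        return instances + 1       # add a VEE to balance the workload
    return instances

print(scale_decision(observed_p99_ms=19.0, instances=2))   # -> 3
print(scale_decision(observed_p99_ms=8.0, instances=2))    # -> 2
```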
- orchestration control plane 604 executing on switch 602 can determine whether to migrate any VEE executing on switch 602 or a server to execution on another server. For example, migration can depend on a shut down or restart of switch 602 on which the VEE executes, which can cause the VEE to be executed on a server. For example, VEE migration can depend on a shut down or restart of a server on which the VEE executes, which can cause the VEE to be executed on switch 602 or another server.
- orchestration control plane 604 can decide whether to execute a VEE on a particular processor or migrate the VEE among switch 602 or any server 608 - 0 to 608 -N.
- VEE 606 or VEE 610 can migrate from a server to a switch, a switch to a server, or a server to another server as needed.
- VEE 606 could execute on switch 602 for a short-term in connection with a server being rebooted and the VEE can be migrated back to the rebooted server or another server.
- switch 602 can execute a virtual switch (vSwitch) that allows communication between VEEs running on switch 602 , or any server connected to switch 602 .
- a virtual switch can include Microsoft Hyper-V, Open vSwitch, VMware vSwitches, and so forth.
- Switch 602 can support S-IOV, SR-IOV, or MR-IOV for its VEEs.
- the VEE running on switch 602 utilizes resources in one or more servers via S-IOV, SR-IOV, or MR-IOV.
- S-IOV, SR-IOV, or MR-IOV can permit connection or bus sharing across VEEs.
- where a VEE running on switch 602 operates as a network termination proxy VEE, one or more corresponding VEEs run on one or more servers in the rack and on switch 602 .
- VEEs running on switch 602 can process packets and VEEs running on cores on the server or switch 602 can execute applications (e.g., database, webserver, and so forth).
- Use of SIOV, SR-IOV, or MR-IOV can allow the server resources to be composed whereby physically disaggregated servers are logically one system, but the tasks are divided such that the network processing occurs on switch 602 .
- switch 602 can use a high speed connection to at least some of the resources on one or more servers 608 - 0 to 608 -N in a rack, providing access to resources from any of the servers in the rack to VEE 606 running on switch 602 .
- Orchestration control plane 604 can efficiently allocate VEEs to resources and is not limited to what can execute in a single server, but can also allocate execution across switch 602 and servers 608 - 0 to 608 -N. This feature allows potentially constrained resources, such as accelerators, to be optimally allocated.
- FIG. 7A depicts an example of migration of a VEE from a server to another server.
- in connection with a live migration (e.g., Microsoft® HyperV or VMware® vSphere), at (1), the VEE is transmitted to a TOR switch.
- at (2), the VEE is transmitted through a data center core network and at (3), the VEE is transmitted to a TOR switch of another rack.
- at (4), the VEE is transmitted to a server, where the VEE can commence execution in another hardware environment.
- FIG. 7B depicts an example of migration of a VEE.
- a VEE can be executed on a switch that uses resources of the switch and connected servers in the rack.
- at (1), the VEE is transmitted from the switch to the core network.
- at (2), the VEE is transmitted to another switch for execution.
- the other switch can use resources of that switch and connected servers in its rack.
- the destination for the VEE can be a server, as in the example of FIG. 7A . Accordingly, by executing a VEE on a switch with expanded server resources, fewer steps are taken in migrating a VEE and the VEE can commence executing sooner in the scenario of FIG. 7B than in the scenario of FIG. 7A .
- FIG. 8A depicts an example process.
- the process can be performed by a processor enhanced switch in accordance with various embodiments.
- a switch can be configured to execute an orchestration control plane.
- the orchestration control plane can manage compute, memory, and software resources of the switch and one or more servers connected to the switch in a same rack as that of the switch. Servers can execute hypervisors that control execution of virtualized execution environments and also permit or do not permit configurations by the orchestration control plane.
- the connection can be used to provide communication between the switch and the servers.
- the orchestration control plane can receive telemetry from servers in a rack via the connection without the telemetry contributing to east-west traffic within a data center.
- Various examples of the connection are described herein.
- the switch can be configured to execute a virtualized execution environment to perform protocol processing for at least one virtualized execution environment executing on a server.
- Various examples of protocol processing are described herein.
- the switch performs network termination of received packets and can provide data from received packets to a memory buffer of a server or the switch.
- a virtualized execution environment can perform any type of operation related to or unrelated to packet or protocol processing.
- the virtualized execution environment can execute a Memcached server, a webserver, or a database, or retrieve data from memory devices in another rack or outside of the data center.
- orchestration control plane can determine whether to change an allocation of resources to the virtualized execution environment. For example, based on whether an applicable SLA for the virtualized execution environment or a flow of packets processed by the virtualized execution environment is being met or is not met, the orchestration control plane can determine whether to change the allocation of resources to the virtualized execution environment. For a scenario where the SLA is not being met or is considered likely to be violated, at 808 , orchestration control plane can add additional computing, networking, or memory resources for use by the virtualized execution environment, or instantiate one or more additional virtualized execution environments to assist with processing. In some examples, the virtualized execution environment can be migrated from the switch to a server to improve resource availability.
- orchestration control plane can de-allocate computing resources available to the virtualized execution environment.
- the virtualized execution environment can be migrated from the switch to a server to provide resources for another virtualized execution environment to utilize.
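- A minimal sketch of the allocation decision at 806 and 808 and the de-allocation path described above (the utilization threshold is an assumption): add resources or another VEE when the SLA is violated or nearly violated, de-allocate or migrate when the VEE is clearly over-provisioned, and otherwise leave the allocation unchanged.

```python
# Hypothetical sketch of the allocation decision; the 0.25 utilization
# threshold is an illustrative assumption.
def resource_decision(sla_met: bool, near_violation: bool, utilization: float) -> str:
    if not sla_met or near_violation:
        return "add resources or instantiate an additional VEE"   # block 808
    if utilization < 0.25:
        return "de-allocate resources or migrate the VEE"         # de-allocation path
    return "no change"

print(resource_decision(sla_met=False, near_violation=False, utilization=0.80))
print(resource_decision(sla_met=True, near_violation=False, utilization=0.10))
```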
- FIG. 8B depicts an example process.
- the process can be performed by a processor enhanced switch in accordance with various embodiments.
- a virtualized execution environment executing on a switch can perform packet processing of a received packet.
- Packet processing can include one or more of: header parsing, flow identification, segmentation, reassembly, acknowledgements (ACKs), negative-acknowledgements (NACKs), packet retransmit identification and requests, congestion management (e.g., flow control of a transmitter), checksum validation, decryption, encryption, or secure tunneling (e.g., Transport Layer Security (TLS) or Secure Sockets Layer (SSL)) or other operations.
- the packet and protocol processing virtualized execution environment can perform polling, busy polling, or rely on interrupts to detect for new received packets received in a packet buffer from one or more ports. Based on detection of a new received packet, the virtualized execution environment can process the received packet.
- the virtualized execution environment executing on the switch can determine whether data from the packet is to be made available or discarded. For example, if the packet is subject to a deny status on an access control list (ACL), the packet can be discarded. If the data is determined to be provided to a next virtualized execution environment, the process can continue to 824 . If the packet is determined to be discarded, the process can continue to 826 , where the packet is discarded.
- the virtualized execution environment can notify a virtualized execution environment executed on a server that data is available and provide the data for access by a virtualized execution environment executed on a server.
- the virtualized execution environment executed on the switch can cause the data to be copied to a buffer accessible to the virtualized execution environment executed on the server.
- for example, direct memory access (DMA), RDMA, or another direct copy scheme can be used to copy the data to the buffer.
- the data is made available to a virtualized execution environment executed on the switch for processing.
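- The check at 822 can be sketched as follows, assuming a simple ACL keyed by source address and a deliver() callback standing in for the DMA copy and notification at 824 : a packet from a denied source is discarded (as at 826), and any other payload is handed to the server-side VEE.

```python
# Hypothetical sketch of ACL-based discard versus delivery. The ACL contents
# and the deliver callback are assumptions for illustration only.
ACL_DENY = {"203.0.113.9"}                 # example denied source addresses

def handle_packet(src_addr: str, payload: bytes, deliver):
    """deliver() stands in for the DMA copy plus notification to the server VEE."""
    if src_addr in ACL_DENY:               # deny status on the ACL -> discard
        return "discarded"
    deliver(payload)                       # make the data available
    return "delivered"

print(handle_packet("203.0.113.9", b"attack", deliver=lambda p: None))
print(handle_packet("198.51.100.4", b"hello", deliver=lambda p: print("to server VEE:", p)))
```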
- FIG. 8C depicts an example process.
- the process can be performed by a processor enhanced switch in accordance with various embodiments.
- the switch can be configured to execute a virtualized execution environment to perform retrieval of data from a device in the same or different rack as that of the switch or copying of data to a device in the same or different rack as that of the switch.
- the virtualized execution environment can be configured with information on destination devices that are associated with memory addresses.
- information can indicate a translation of a destination device or server (e.g., IP address or MAC address) that corresponds to a memory address in a memory transaction.
- the device or server can store data corresponding to the memory address and the data can be read from the memory address at the device or server.
- the device or server can receive and store data corresponding to the address for the write transaction.
- the switch can receive a memory access request from a server of the same rack.
- the virtualized execution environment executing on the switch can manage the memory access request.
- performance of 836 can include performance of 838 , where the virtualized execution environment executing on the switch can transfer the memory access request to the destination server.
- the switch can re-direct the memory access request to the destination server that stores the requested data instead of sending the memory access request to the server, which in turn, sends the request to the destination server.
- performance of 836 can include performance of 840 , where the virtualized execution environment executing on the switch can perform the memory access request. If the memory access request is a write command, the virtualized execution environment can write data to a memory address corresponding to the memory access request in a device in a same or different rack. If the memory access request is a read command, the virtualized execution environment can copy data from a memory address corresponding to the memory access request in a device in a same or different rack. For example, remote direct memory access can be used to write or read the data.
- the switch can locally cache the data for access by a server connected to the switch.
- the retrieved data can be stored in a memory device of the switch or any server such that any virtualized execution environment executed on any server of a rack can access or modify the data.
- the data can be accessed as a near memory from a memory device accessible to the switch and the servers of the rack.
- the switch can write the updated data to the memory device that stores the data.
- block 840 can be performed in a scenario where the switch executes a Memcached server and data is stored in a server that is in a same rack as that of the switch.
- the Memcached server executing on the switch can respond to memory access request that corresponds to a cache miss by retrieving data from another server and storing the retrieved data in a cache in a memory or storage of the rack.
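- A minimal sketch of the address-to-device mapping and caching described above, with an assumed address map and dictionaries standing in for remote memory and the rack-local cache (a real implementation would use RDMA reads and writes):

```python
# Hypothetical sketch: translate a memory address to a destination device,
# perform the read or write there, and cache read data locally for the rack.
# The address map, device names, and stored values are assumptions.
ADDRESS_MAP = {0x1000: "server-rack1-0", 0x2000: "server-rack2-3"}   # addr -> device
remote_memory = {("server-rack2-3", 0x2000): b"remote record"}       # stands in for remote devices
local_cache = {}                                                     # rack-local cache

def handle_memory_request(op: str, addr: int, data: bytes = b""):
    target = ADDRESS_MAP[addr]                       # address -> destination device
    if op == "read":
        if addr in local_cache:                      # serve from the rack-local cache
            return local_cache[addr]
        value = remote_memory[(target, addr)]        # RDMA read in a real system
        local_cache[addr] = value                    # cache for other servers in the rack
        return value
    if op == "write":
        remote_memory[(target, addr)] = data         # RDMA write in a real system
        local_cache[addr] = data
        return b"ok"

print(handle_memory_request("read", 0x2000))
print(handle_memory_request("write", 0x1000, b"new value"))
```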
- FIG. 9 depicts a system.
- the system can utilize a switch to manage resources in the system and perform other embodiments described herein.
- System 900 includes processor 910 , which provides processing, operation management, and execution of instructions for system 900 .
- Processor 910 can include any type of microprocessor, central processing unit (CPU), graphics processing unit (GPU), processing core, or other processing hardware to provide processing for system 900 , or a combination of processors.
- Processor 910 controls the overall operation of system 900 , and can be or include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
- system 900 includes interface 912 coupled to processor 910 , which can represent a higher speed interface or a high throughput interface for system components that need higher bandwidth connections, such as memory subsystem 920 or graphics interface components 940 , or accelerators 942 .
- Interface 912 represents an interface circuit, which can be a standalone component or integrated onto a processor die.
- graphics interface 940 interfaces to graphics components for providing a visual display to a user of system 900 .
- graphics interface 940 can drive a high definition (HD) display that provides an output to a user.
- High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater and can include formats such as full HD (e.g., 1080p), retina displays, 4K (ultra-high definition or UHD), or others.
- the display can include a touchscreen display.
- graphics interface 940 generates a display based on data stored in memory 930 or based on operations executed by processor 910 or both.
- Accelerators 942 can be programmable or fixed function offload engines that can be accessed or used by a processor 910 .
- an accelerator among accelerators 942 can provide compression (DC) capability, cryptography services such as public key encryption (PKE), cipher, hash/authentication capabilities, decryption, or other capabilities or services.
- an accelerator among accelerators 942 provides field select controller capabilities as described herein.
- accelerators 942 can be integrated into a CPU or connected to CPU by various devices (e.g., a connector to a motherboard or circuit board that includes a CPU and provides an electrical interface with the CPU).
- accelerators 942 can include a single or multi-core processor, graphics processing unit, logical execution unit, single or multi-level cache, functional units usable to independently execute programs or threads, application specific integrated circuits (ASICs), neural network processors (NNPs), programmable control logic, and programmable processing elements such as field programmable gate arrays (FPGAs). Accelerators 942 can provide multiple neural networks, CPUs, processor cores, general purpose graphics processing units, or graphics processing units that can be made available for use by artificial intelligence (AI) or machine learning (ML) models.
- the AI model can use or include any or a combination of a reinforcement learning scheme, Q-learning scheme, deep-Q learning, or Asynchronous Advantage Actor-Critic (A3C), combinatorial neural network, recurrent combinatorial neural network, or other AI or ML model.
- Multiple neural networks, processor cores, or graphics processing units can be made available for use by AI or ML models.
- Memory subsystem 920 represents the main memory of system 900 and provides storage for code to be executed by processor 910 , or data values to be used in executing a routine.
- Memory subsystem 920 can include one or more memory devices 930 such as read-only memory (ROM), flash memory, one or more varieties of random-access memory (RAM) such as DRAM, or other memory devices, or a combination of such devices.
- Memory 930 stores and hosts, among other things, operating system (OS) 932 to provide a software platform for execution of instructions in system 900 .
- applications 934 can execute on the software platform of OS 932 from memory 930 .
- Applications 934 represent programs that have their own operational logic to perform execution of one or more functions.
- Processes 936 represent agents or routines that provide auxiliary functions to OS 932 or one or more applications 934 or a combination.
- OS 932 , applications 934 , and processes 936 provide software logic to provide functions for system 900 .
- memory subsystem 920 includes memory controller 922 , which is a memory controller to generate and issue commands to memory 930 . It will be understood that memory controller 922 could be a physical part of processor 910 or a physical part of interface 912 .
- memory controller 922 can be an integrated memory controller, integrated onto a circuit with processor 910 .
- system 900 can include one or more buses or bus systems between devices, such as a memory bus, a graphics bus, interface buses, or others.
- Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple the components.
- Buses can include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry or a combination.
- Buses can include, for example, one or more of a system bus, a Peripheral Component Interconnect (PCI) bus, a Hyper Transport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (Firewire).
- system 900 includes interface 914 , which can be coupled to interface 912 .
- interface 914 represents an interface circuit, which can include standalone components and integrated circuitry.
- Network interface 950 provides system 900 the ability to communicate with remote devices (e.g., servers or other computing devices) over one or more networks.
- Network interface 950 can include an Ethernet adapter, wireless interconnection components, cellular network interconnection components, USB (universal serial bus), or other wired or wireless standards-based or proprietary interfaces.
- Network interface 950 can transmit data to a device that is in the same data center or rack or a remote device, which can include sending data stored in memory.
- Network interface 950 can receive data from a remote device, which can include storing received data into memory.
- Various embodiments can be used in connection with network interface 950 , processor 910 , and memory subsystem 920 .
- system 900 includes one or more input/output (I/O) interface(s) 960 .
- I/O interface 960 can include one or more interface components through which a user interacts with system 900 (e.g., audio, alphanumeric, tactile/touch, or other interfacing).
- Peripheral interface 970 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 900 . A dependent connection is one where system 900 provides the software platform or hardware platform or both on which operation executes, and with which a user interacts.
- system 900 includes storage subsystem 980 to store data in a nonvolatile manner.
- storage subsystem 980 includes storage device(s) 984 , which can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination.
- Storage 984 holds code or instructions and data 986 in a persistent state (e.g., the value is retained despite interruption of power to system 900 ).
- Storage 984 can be generically considered to be a “memory,” although memory 930 is typically the executing or operating memory to provide instructions to processor 910 .
- storage 984 is nonvolatile
- memory 930 can include volatile memory (e.g., the value or state of the data is indeterminate if power is interrupted to system 900 ).
- storage subsystem 980 includes controller 982 to interface with storage 984 .
- controller 982 is a physical part of interface 914 or processor 910 or can include circuits or logic in both processor 910 and interface 914 .
- a volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state.
- a memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electronic Device Engineering Council) on Jun. 27, 2007).
- DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD325, originally published by JEDEC in October 2013), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications.
- DDR or DDRx can refer to any version of DDR, where x is an integer.
- a non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device.
- the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell (“SLC”), Multi-Level Cell (“MLC”), Quad-Level Cell (“QLC”), Tri-Level Cell (“TLC”), or some other NAND).
- an NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte addressable write-in-place NVM device (also referred to as persistent memory), such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS), Intel® Optane™ memory, NVM devices that use chalcogenide phase change material (for example, chalcogenide glass), resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM), nanowire memory, ferroelectric random access memory (FeRAM, FRAM), magneto resistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Transfer) based device, a thyristor based memory device, or a combination of any of the above.
- a power source (not depicted) provides power to the components of system 900 . More specifically, power source typically interfaces to one or multiple power supplies in system 900 to provide power to the components of system 900 .
- the power supply includes an AC to DC (alternating current to direct current) adapter to plug into a wall outlet.
- AC power can be a renewable energy (e.g., solar power) power source.
- power source includes a DC power source, such as an external AC to DC converter.
- power source or power supply includes wireless charging hardware to charge via proximity to a charging field.
- power source can include an internal battery, alternating current supply, motion-based power supply, solar power supply, or fuel cell source.
- system 900 can be implemented using interconnected compute sleds of processors, memories, storages, network interfaces, and other components.
- High speed interconnects can be used such as PCIe, Ethernet, or optical interconnects (or a combination thereof).
- High speed interconnects can be used such as: Ethernet (IEEE 802.3), remote direct memory access (RDMA), InfiniBand, Internet Wide Area RDMA Protocol (iWARP), Transmission Control Protocol (TCP), User Datagram Protocol (UDP), quick User Datagram Protocol Internet Connections (QUIC), RDMA over Converged Ethernet (RoCE), Peripheral Component Interconnect express (PCIe), Intel QuickPath Interconnect (QPI), Intel Ultra Path Interconnect (UPI), Intel On-Chip System Fabric (IOSF), Omnipath, Compute Express Link (CXL), HyperTransport, high-speed fabric, NVLink, Advanced Microcontroller Bus Architecture (AMBA) interconnect, OpenCAPI, Gen-Z, Cache Coherent Interconnect for Accelerators (CCIX), 3GPP Long Term Evolution (LTE) (4G), 3GPP 5G, and variations thereof.
- Embodiments herein may be implemented in various types of computing devices (e.g., smart phones, tablets, personal computers) and networking equipment, such as switches, routers, racks, and blade servers such as those employed in a data center and/or server farm environment.
- the servers used in data centers and server farms comprise arrayed server configurations such as rack-based servers or blade servers. These servers are interconnected in communication via various network provisions, such as partitioning sets of servers into Local Area Networks (LANs) with appropriate switching and routing facilities between the LANs to form a private Intranet.
- cloud hosting facilities may typically employ large data centers with a multitude of servers.
- a blade comprises a separate computing platform that is configured to perform server-type functions, that is, a “server on a card.” Accordingly, each blade includes components common to conventional servers, including a main printed circuit board (main board) providing internal wiring (e.g., buses) for coupling appropriate integrated circuits (ICs) and other components mounted to the board.
- FIG. 10 depicts an environment 1000 that includes multiple computing racks 1002 , each including a Top of Rack (ToR) switch 1004 , a pod manager 1006 , and a plurality of pooled system drawers.
- Embodiments of the switch herein can be used to manage device resources, virtual execution environment operation, and data locality to a VEE (e.g., storage of data in the same rack as that which executes the VEE).
- the pooled system drawers may include pooled compute drawers and pooled storage drawers.
- the pooled system drawers may also include pooled memory drawers and pooled Input/Output (I/O) drawers.
- the pooled system drawers include an Intel® XEON® pooled compute drawer 1008 , an Intel® ATOM™ pooled compute drawer 1010 , a pooled storage drawer 1012 , a pooled memory drawer 1014 , and a pooled I/O drawer 1016 .
- Each of the pooled system drawers is connected to ToR switch 1004 via a high-speed link 1018 , such as a 40 Gigabit/second (Gb/s) or 100 Gb/s Ethernet link or a 100+Gb/s Silicon Photonics (SiPh) optical link.
- Multiple of the computing racks 1002 may be interconnected via their ToR switches 1004 (e.g., to a pod-level switch or data center switch), as illustrated by connections to a network 1020 .
- groups of computing racks 1002 are managed as separate pods via pod manager(s) 1006 .
- a single pod manager is used to manage all of the racks in the pod.
- distributed pod managers may be used for pod management operations.
- Environment 1000 further includes a management interface 1022 that is used to manage various aspects of the environment. This includes managing rack configuration, with corresponding parameters stored as rack configuration data 1024 .
- FIG. 11 depicts an example network element that can be used by embodiments of the switch herein.
- Various embodiments of a switch can perform any operations of network interface 1100 .
- network interface 1100 can be implemented as a network interface controller, network interface card, a host fabric interface (HFI), or host bus adapter (HBA).
- Network interface 1100 can be coupled to one or more servers using a bus, PCIe, CXL, or DDRx.
- Network interface 1100 may be embodied as part of a system-on-a-chip (SoC) that includes one or more processors, or included on a multichip package that also contains one or more processors.
- Network interface 1100 can include transceiver 1102 , processors 1104 , transmit queue 1106 , receive queue 1108 , memory 1110 , bus interface 1112 , and DMA engine 1152 .
- Transceiver 1102 can be capable of receiving and transmitting packets in conformance with the applicable protocols such as Ethernet as described in IEEE 802.3, although other protocols may be used.
- Transceiver 1102 can receive and transmit packets from and to a network via a network medium (not depicted).
- Transceiver 1102 can include PHY circuitry 1114 and media access control (MAC) circuitry 1116 .
- PHY circuitry 1114 can include encoding and decoding circuitry (not shown) to encode and decode data packets according to applicable physical layer specifications or standards.
- MAC circuitry 1116 can be configured to assemble data to be transmitted into packets, that include destination and source addresses along with network control information and error detection hash values.
- Processors 1104 can be any combination of a processor, core, graphics processing unit (GPU), field programmable gate array (FPGA), application specific integrated circuit (ASIC), or other programmable hardware device that allows programming of network interface 1100 .
- processors 1104 can provide for identification of a resource to use to perform a workload and generation of a bitstream for execution on the selected resource.
- a “smart network interface” can provide packet processing capabilities in the network interface using processors 1104 .
- Packet allocator 1124 can provide distribution of received packets for processing by multiple CPUs or cores using timeslot allocation described herein or RSS. When packet allocator 1124 uses RSS, packet allocator 1124 can calculate a hash or make another determination based on contents of a received packet to determine which CPU or core is to process a packet.
- Interrupt coalesce 1122 can perform interrupt moderation whereby network interface interrupt coalesce 1122 waits for multiple packets to arrive, or for a time-out to expire, before generating an interrupt to host system to process received packet(s).
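- Interrupt moderation can be sketched as follows, with illustrative batch-size and time-out values: packets accumulate until either the batch fills or the time-out elapses, and then a single interrupt is raised for the whole batch.

```python
# Hypothetical sketch of interrupt moderation: one interrupt per batch of
# packets or per time-out, rather than per packet. Values are assumptions.
import time

class InterruptCoalescer:
    def __init__(self, raise_interrupt, max_batch=8, timeout_s=0.001):
        self.raise_interrupt = raise_interrupt   # callback standing in for a host interrupt
        self.max_batch = max_batch
        self.timeout_s = timeout_s
        self.pending = []
        self.first_arrival = None

    def on_packet(self, pkt: bytes):
        if not self.pending:
            self.first_arrival = time.monotonic()
        self.pending.append(pkt)
        if (len(self.pending) >= self.max_batch or
                time.monotonic() - self.first_arrival >= self.timeout_s):
            self.raise_interrupt(self.pending)   # one interrupt for the batch
            self.pending = []

coalescer = InterruptCoalescer(lambda batch: print(f"interrupt for {len(batch)} packets"))
for _ in range(20):
    coalescer.on_packet(b"pkt")
```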
- Receive Segment Coalescing can be performed by network interface 1100 whereby portions of incoming packets are combined into segments of a packet. Network interface 1100 provides this coalesced packet to an application.
- Direct memory access (DMA) engine 1152 can copy a packet header, packet payload, and/or descriptor directly from host memory to the network interface or vice versa, instead of copying the packet to an intermediate buffer at the host and then using another copy operation from the intermediate buffer to the destination buffer.
- DMA engine 1152 can perform writes of data to any cache such as by using Data Direct I/O (DDIO).
- Memory 1110 can be any type of volatile or non-volatile memory device and can store any queue or instructions used to program network interface 1100 .
- Transmit queue 1106 can include data or references to data for transmission by network interface.
- Receive queue 1108 can include data or references to data that was received by network interface from a network.
- Descriptor queues 1120 can include descriptors that reference data or packets in transmit queue 1106 or receive queue 1108 .
- Bus interface 1112 can provide an interface with host device (not depicted).
- bus interface 1112 can be compatible with PCI, PCI Express, PCI-x, PHY Interface for the PCI Express (PIPE), Serial ATA, and/or USB compatible interface (although other interconnection standards may be used).
- network interface and other embodiments described herein can be used in connection with a base station (e.g., 3G, 4G, 5G and so forth), macro base station (e.g., 5G networks), picostation (e.g., an IEEE 802.11 compatible access point), nanostation (e.g., for Point-to-MultiPoint (PtMP) applications), on-premises data centers, off-premises data centers, edge network elements, fog network elements, and/or hybrid data centers (e.g., data center that use virtualization, cloud and software-defined networking to deliver application workloads across physical data centers and distributed multi-cloud environments).
- hardware elements may include devices, components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, ASICs, PLDs, DSPs, FPGAs, memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.
- software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, APIs, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation.
- a processor can be one or more combination of a hardware state machine, digital control logic, central processing unit, or any hardware, firmware and/or software elements.
- a computer-readable medium may include a non-transitory storage medium to store logic.
- the non-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth.
- the logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, API, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.
- a computer-readable medium may include a non-transitory storage medium to store or maintain instructions that when executed by a machine, computing device or system, cause the machine, computing device or system to perform methods and/or operations in accordance with the described examples.
- the instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like.
- the instructions may be implemented according to a predefined computer language, manner or syntax, for instructing a machine, computing device or system to perform a certain function.
- the instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
- One or more aspects of at least one example may be implemented by representative instructions stored on at least one machine-readable medium which represents various logic within the processor, which when read by a machine, computing device or system causes the machine, computing device or system to fabricate logic to perform the techniques described herein.
- Such representations known as “IP cores,” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
- The terms “coupled” and “connected,” along with their derivatives, may be used herein. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
- The terms “first,” “second,” and the like, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another.
- the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.
- The term “asserted,” used herein with reference to a signal, denotes a state of the signal in which the signal is active, and which can be achieved by applying any logic level, either logic 0 or logic 1, to the signal.
- The terms “follow” or “after” can refer to immediately following or following after some other event or events. Other sequences of steps may also be performed according to alternative embodiments. Furthermore, additional steps may be added or removed depending on the particular applications. Any combination of changes can be used and one of ordinary skill in the art with the benefit of this disclosure would understand the many variations, modifications, and alternative embodiments thereof.
- Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. Additionally, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, should also be understood to mean X, Y, Z, or any combination thereof, including “X, Y, and/or Z.”
- An embodiment of the devices, systems, and methods disclosed herein are provided below.
- An embodiment of the devices, systems, and methods may include any one or more, and any combination of, the examples described below.
- Example 1 includes a method comprising a switch device for a rack of two or more physical servers, wherein the switch device is coupled to the two or more physical servers and the switch device performs packet protocol processing termination for received packets and provides payload data from the received packets without a received packet header to a destination buffer of a destination physical server in the rack.
- Example 2 includes any example, wherein the switch device comprises at least one central processing unit, the at least one central processing unit is to execute packet processing operations on the received packets.
- Example 3 includes any example, wherein a physical server executes at least one virtualized execution environment (VEE) and the at least one central processing unit executes a VEE for packet processing of packets with data to be accessed by the physical server that executes the VEE.
- Example 4 includes any example, wherein the switch device stores a mapping of memory addresses and corresponding destination devices, and based on receipt of a memory transaction from a physical server in the rack, the switch device performs the memory transaction.
- Example 5 includes any example, wherein the switch device performs the memory transaction comprises: for a read request, the switch device retrieves data from a physical server connected to the rack or another device of a different rack based on the mapping and stores the data into a memory managed by the switch device.
- Example 6 includes any example, wherein the switch device stores a mapping of memory addresses and corresponding destination devices, and based on receipt of a memory transaction from a physical server in the rack: based on a memory address associated with the memory transaction being associated with a destination server in another rack according to the mapping, transmitting the memory transaction to the destination server, receiving a response to the memory transaction, and storing the response in a memory of the rack.
- Example 7 includes any example, wherein the switch device comprises at least one central processing unit, the at least one central processing unit to execute a control plane for one or more physical servers that are part of the rack and the control plane to collect telemetry data from the one or more physical servers and based on the telemetry data, perform one or more of: allocation of execution of a virtualized execution environment (VEE) to a physical server of the rack, migration of a VEE from a physical server of the rack to execution on at least one central processing unit of the switch device, migration of a VEE from a physical server of the rack to execution on another physical server of the rack, or allocation of memory of a physical server of the rack for access by a VEE executing on a physical server of the rack.
- Example 8 includes any example, wherein the switch device comprises at least one central processing unit, the at least one central processing unit to execute a control plane for one or more physical servers that are part of the rack and the control plane distributes execution of virtualized execution environment (VEEs) among one or more physical servers of the rack and selectively terminates a VEE or migrates a VEE to execution on another physical server of the rack or on the switch device.
- Example 9 includes any example, and includes an apparatus comprising: a switch comprising: at least one processor, wherein the at least one processor is to perform packet termination processing of a received packet and copy payload data from the received packet without an associated received packet header to a destination buffer of a destination physical server through a connection.
- Example 10 includes any example, wherein the at least one processor is to execute a virtualized execution environment (VEE) and wherein the VEE is to perform the packet termination processing.
- Example 11 includes any example, wherein based on receipt of a memory transaction from a physical server through the connection, the at least one processor is to perform the memory transaction based on a mapping of memory addresses and corresponding destination devices.
- Example 12 includes any example, wherein to perform the memory transaction, the at least one processor is to: for a read request: retrieve data from a physical server connected to the at least one processor through the connection or another device of a different rack and store the data into a memory managed by the at least one processor.
- Example 13 includes any example, wherein: based on receipt of a memory transaction from a physical server in a rack associated with the switch: based on a memory address associated with the memory transaction being associated with a destination server in another rack according to a mapping of memory addresses and corresponding destination devices, the at least one processor is to cause transmission of the memory transaction to the destination server, the at least one processor is to access a response to the memory transaction, and the at least one processor is to cause the response to be stored in a memory of the rack.
- Example 14 includes any example, wherein: the at least one processor is to execute a control plane for one or more physical servers that are part of a rack associated with the switch and the control plane is to collect telemetry data from the one or more physical servers and based on the telemetry data, perform one or more of: allocation of execution of a virtualized execution environment (VEE) to a physical server of the rack, migration of a VEE from a physical server of the rack to execution on the at least one central processing unit of the switch, migration of a VEE from a physical server of the rack to execution on another physical server of the rack, or allocation of memory of a server of the rack for access by a VEE executing on a physical server of the rack.
- Example 15 includes any example, wherein: the at least one processor is to execute a control plane for one or more physical servers that are part of a rack associated with the switch and the control plane is to distribute execution of virtualized execution environment (VEEs) among one or more physical servers of the rack and selectively terminate a VEE or migrate a VEE to execution on another physical server of the rack or at least one processor that is part of the switch.
- Example 16 includes any example, wherein the connection is compatible with one or more of: Peripheral Component Interconnect express (PCIe), Compute Express Link (CXL), or any type of Double Data Rate (DDR).
- Example 17 includes any example, and includes at least one non-transitory computer-readable medium comprising instructions stored thereon, that if executed by a switch, cause the switch to: execute a control plane at the switch to collect telemetry data from one or more physical servers and based on the telemetry data, perform one or more of: allocation of execution of a virtualized execution environment (VEE) to a physical server of a rack that includes the switch, migration of a VEE from a physical server of the rack to execution on at least one central processing unit of the switch, migration of a VEE from a physical server of the rack to execution on another physical server of the rack, or allocation of memory of a server of the rack for access by a VEE executing on a physical server of the rack.
- Example 18 includes any example, comprising instructions stored thereon, that if executed by a switch, cause the switch to: store a mapping of memory addresses and corresponding destination devices and based on receipt of a memory transaction from a physical server through a connection and based on mapping of memory addresses and corresponding destination devices, the switch is to retrieve data from a physical server connected to the switch through the connection or another device of a different rack and store the data into a memory managed by the switch.
- Example 19 includes any example, comprising instructions stored thereon, that if executed by a switch, cause the switch to: store a mapping of memory addresses and corresponding destination devices, and based on receipt of a memory transaction from a server in a rack associated with the switch: based on a memory address associated with the memory transaction being associated with a destination server in another rack according to the mapping, the switch is to transmit the memory transaction to the destination server, the switch is to receive a response to the memory transaction, and the switch is to store the response in a memory of the rack.
- Example 20 includes any example, wherein a connection between the switch and one or more physical servers of the rack is compatible with one or more of: Peripheral Component Interconnect express (PCIe), Compute Express Link (CXL), or any type of Double Data Rate (DDR).
- Example 21 includes any example, and includes a network device comprising: circuitry to perform network protocol termination for received packets; at least one Ethernet port; and multiple connections to be connected to different physical servers in a rack, wherein the circuitry to perform network protocol termination for received packets is to provide a payload of a received packet without an associated header to a physical server.
Abstract
Description
-  In the context of cloud computing, cloud service providers (CSPs) offer various services to other companies or individuals for use such as infrastructure as a service (IaaS), software as a service (SaaS) or platform as a service (PaaS). A hardware infrastructure including compute, memory, storage, accelerators, networking, and so forth executes and supports software stacks provided by the CSPs and their customers.
-  CSPs can experience complex networking environments where packets are parsed, de-encapsulated, decrypted, and sent to a proper virtual machine (VM). In some cases, packet flows are balanced and metered to achieve service level agreement (SLA) requirements. In some cases, network processing occurs in the servers within a datacenter. However, with increased volumes of packets and increased amounts and complexity of packet processing activities, the burden on the servers is increasing. Central processing units (CPUs) or other server processor resources are used for packet processing, but CPUs and other processor resources can be used for other services that are billable or generate higher revenue than packet processing. The impact of this problem is significantly increased when using high bit-rate network devices such as 100 Gbps and higher speed networks.
-  FIGS. 1A-1D depict example switch systems.
-  FIG. 2A depicts an example overview of a system of managing resources in a rack.
-  FIG. 2B depicts an example overview of various management hierarchies.
-  FIG. 3 depicts an example system in which a switch can respond to a memory access request.
-  FIG. 4A shows examples of a Memcached server executing on a server and in a switch.
-  FIG. 4B shows the Ethernet packet flow for a single request.
-  FIGS. 5A-5C depict example systems in which packets can terminate at a switch.
-  FIG. 6 depicts an example of a switch that executes an orchestration control plane to manage what device executes a virtualized execution environment.
-  FIG. 7A depicts an example of migration of a virtualized execution environment from a server to another server.
-  FIG. 7B depicts an example of migration of a virtualized execution environment.
-  FIGS. 8A-8C depict example processes.
-  FIG. 9 depicts a system.
-  FIG. 10 depicts an environment.
-  FIG. 11 depicts an example network element.
-  Within a data center, north-south traffic can include packets that flow in or out of the data center whereas east-west traffic can include packets that flow between nodes (e.g., racks of servers) within the data center. North-south traffic can be considered a product for serving customers, whereas east-west traffic can be considered overhead. The amount of east-west traffic has been growing at a rate that is significantly higher than north-south traffic and processing east-west traffic flow in a timely manner to comply with applicable SLAs while reducing data center total cost of ownership (TCO) is a growing challenge within the datacenter.
-  Increasing networking speeds within a data center (e.g., 100 Gbps Ethernet and above) to provide for faster traffic rates within the data center is a manner of addressing traffic growth. However, an increase in network speed can involve even more packet processing activities, which use processor resources that could otherwise be used for other tasks.
-  Some solutions reduce CPU utilization and accelerate packet processing by offloading those tasks to network controllers with specialized hardware. However, specialized hardware may be limited to current day workloads and not be flexible enough to handle future, different workloads or packet processing activities.
-  Some solutions seek to reduce the overhead of packet processing through simplified protocols but still use significant CPU utilization to perform packet processing.
-  Various embodiments provide for attempting to reduce server processor utilization and attempting to reduce or control growth of east-west traffic within a data center while providing sufficiently fast packet processing. Various embodiments provide a switch with infrastructure offload capabilities, including one or more CPUs and/or other accelerator devices. Various embodiments provide a switch with certain packet processing network interface card (NIC) functionality to allow the switch to perform packet processing or network termination, freeing server CPUs to perform other tasks. The switch can include or access server class processors, switching blocks, accelerators, offload engines, ternary content-addressable memory (TCAM) and packet processing pipelines. The packet processing pipeline(s) could be programmable via P4 or other programming languages. The switch can be connected to one or more CPUs or host servers using various connections. For example, direct attach copper (DAC), fiber optic cable, or other cables can be used to connect the switch with one or more CPUs, compute hosts, or servers, including servers in a rack. In some examples, connections can be less than 6 feet in length to reduce bit error rate (BER). Note that reference to a switch can refer to multiple connected switches or a distributed switch, and a rack may include multiple switches to logically split a rack into two half racks or into pods (e.g., one or more racks).
-  Various embodiments of the rack switch can be configured to perform one or more of: (1) telemetry aggregation via high speed connections of packet transmit rates, response latency, cache misses, virtualized execution environment requests, and so forth; (2) orchestration of server resources connected to the switch based at least on telemetry; (3) orchestration of virtual execution environments executing on various servers based at least on telemetry; (4) network termination and protocol processing; (5) memory transaction completion by retrieving data associated with a memory transaction and providing the data to the requester or forwarding the memory transaction to a target that can retrieve the data associated with the memory transaction; (6) caching of data for access by one or more servers in the rack or group of racks; (7) Memcached resource management at the switch; (8) execution of one or more virtualized execution environments to perform packet processing (e.g., header processing in accordance with applicable protocols); (9) management of execution of virtualized execution environments in the switch or in a server or both for load balancing or redundancy; or (10) migration of virtualized execution environments between the switch and a server or server to server. Accordingly, by enhancement to operations of a rack switch, server CPU cycles can be freed to use for billable or value add services.
-  Various embodiments can terminate network processing in the switch, in place of a server. For example, protocol termination, decryption, decapsulation, acknowledgements (ACKs), integrity checks, and other network-related tasks can be performed by the switch and not handled by the server. The switch can include specialized offload engines for known protocols or calculations and be extensible or programmable to process new protocols or vendor specific protocols via software or field programmable gate arrays (FPGAs) to flexibly support future needs.
-  Network termination at the switch can reduce or eliminate transfers of data for processing by multiple VEEs that are potentially on different servers or even different racks for service function chain processing. The switch can perform network processing and provide the resulting data, after processing, to the destination server within the rack.
-  In some examples, the switch can manage memory input/output (I/O) requests by directing memory I/O requests to the target device instead of to a server for the server to determine a target device and the server transmitting the I/O request to another server or target device. Servers can include a memory pool, storage pool or server, compute server, or provide other resources. Various embodiments can be used in a scenario where a server 1 issues an I/O request to access memory where a near memory is accessed from a server 2 and a far memory is accessed from a server 3 (e.g., 2 level memory (2LM), memory pooling, or thin memory provisioning). For example, the switch can receive a request from server 1 that requests a read or write to memory directed to server 2. The switch can be configured to identify that a memory address referenced by the request is in a memory associated with server 3 and the switch can forward the request to server 3 instead of sending the request to server 2, which would transmit the request to server 3. As such, the switch can reduce a time taken to complete a memory transaction. In some examples, the switch can perform caching of data on the same rack to reduce east-west traffic for subsequent requests for the data.
-  Note that the switch can notify server 2 that an access to memory of server 3 has taken place so that server 2 and server 3 can maintain coherency or consistency of the data associated with the memory address. If server 2 has posted writes or dirties (modifies) cache lines, coherency protocols and/or producer consumer models can be used to maintain consistency of data stored in server 2 and server 3.
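-  The address-based forwarding described above can be illustrated with a short sketch. The following Python fragment is a minimal, hypothetical model (names such as AddressMap, route_memory_request, and the server identifiers are illustrative, not from this disclosure) of a switch consulting an address map to send a request directly to the device that owns the address rather than relaying it through an intermediate server.

      # Minimal sketch (hypothetical names): route a memory request at the switch
      # using an address-to-owner map, so server 1's request for an address owned
      # by server 3 is forwarded directly to server 3 instead of via server 2.

      from dataclasses import dataclass

      @dataclass
      class Region:
          start: int
          end: int      # exclusive
          owner: str    # e.g., "server2", "server3", "memory_pool"

      class AddressMap:
          def __init__(self, regions):
              self.regions = regions

          def owner_of(self, addr):
              for r in self.regions:
                  if r.start <= addr < r.end:
                      return r.owner
              raise KeyError(f"unmapped address {addr:#x}")

      def route_memory_request(addr_map, requester, addr, op):
          """Return (destination, note) for a read/write issued by `requester`."""
          dest = addr_map.owner_of(addr)
          note = f"{op} from {requester} for {addr:#x} forwarded directly to {dest}"
          return dest, note

      if __name__ == "__main__":
          amap = AddressMap([
              Region(0x0000_0000, 0x4000_0000, "server2"),   # near memory
              Region(0x4000_0000, 0x8000_0000, "server3"),   # far memory
          ])
          print(route_memory_request(amap, "server1", 0x5000_0000, "read"))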
-  In some examples, the switch can execute orchestration, hypervisor functionality, as well as manage service chain functionality. The switch can orchestrate processor and memory resources and virtual execution environment (VEE) execution for an entire rack of servers to provide aggregated resources of a rack as a single, composite server. For example, the switch can allocate use of compute sleds, memory sleds, and accelerator sleds for execution by one or more VEEs.
-  In some examples, the switch is positioned top-of-rack (TOR) or middle of rack (MOR) relative to connected servers to reduce a length of connection between the switch and servers. For example, for a switch positioned TOR (e.g., furthest from the floor of the rack), servers connect to the switch so that copper cabling from the servers to the rack switch stays within the rack. The switch can link the rack to the data center network with fiber optic cable running from the rack to an aggregation region. For a MOR switch position, the switch is positioned towards the center of the rack between the bottom of the rack and the top of the rack. Other rack positions for the switch can be used, such as end of rack (EOR).
-  FIG. 1A depicts an example switch system. Switch 100 can include or access switch circuitry 102 that is communicatively coupled to port circuitry 104-0 to 104-N. Port circuitry 104-0 to 104-N can receive packets and provide packets to switch circuitry 102. When port circuitry 104-0 to 104-N is Ethernet compatible, port circuitry 104-0 to 104-N can include a physical layer interface (PHY) (e.g., physical medium attachment (PMA) sublayer, Physical Medium Dependent (PMD), a forward error correction (FEC), and a physical coding sublayer (PCS)), media access control (MAC) encoding or decoding, and a Reconciliation Sublayer (RS). An optical-to-electrical signal interface can provide electrical signals to the network port. Modules can be built using standard mechanical and electrical form factors such as the Small Form-factor Pluggable (SFP), Quad Small Form-factor Pluggable (QSFP), Quad Small Form-factor Pluggable Double Density (QSFP-DD), Micro QSFP, or OSFP (Octal Small Format Pluggable) interfaces, described in Annex 136C of IEEE Std 802.3cd-2018 and references therein, or other form factors.
-  A packet may be used herein to refer to various formatted collections of bits that may be sent across a network, such as Ethernet frames, IP packets, TCP segments, UDP datagrams, etc. Also, as used in this document, references to L2, L3, L4, and L7 layers (or layer 2, layer 3, layer 4, and layer 7) are references respectively to the second data link layer, the third network layer, the fourth transport layer, and the seventh application layer of the OSI (Open System Interconnection) layer model.
-  A flow can be a sequence of packets being transferred between two endpoints, generally representing a single session using a known protocol. Accordingly, a flow can be identified by a set of defined N tuples and, for routing purpose, a flow can be identified by tuples that identify the endpoints, e.g., the source and destination addresses. For content based services (e.g., load balancer, firewall, intrusion detection system etc.), flows can be identified at a finer granularity by using five or more tuples (e.g., source address, destination address, IP protocol, transport layer source port, and destination port). A packet in a flow is expected to have the same set of tuples in the packet header. A flow can be unicast, multicast, anycast, or broadcast.
-  Switch circuitry 102 can provide connectivity to, from, and among multiple servers and perform one or more of: traffic aggregation; match action tables for routing, tunnels, buffering, VxLAN routing, Network Virtualization using Generic Routing Encapsulation (NVGRE), and Generic Network Virtualization Encapsulation (Geneve) (e.g., currently a draft Internet Engineering Task Force (IETF) standard); and access control lists (ACLs) to permit or inhibit progress of a packet.
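-  To make the flow identification and match-action behavior described above concrete, the following Python sketch (the tuple fields and the permit/deny table are hypothetical and purely illustrative) derives a five-tuple flow key from a packet and checks it against a simplified ACL-style match-action table.

      # Sketch: identify a flow by its five-tuple and apply a match-action rule
      # (permit/deny) as a simplified stand-in for ACL processing in a switch.

      def flow_key(pkt):
          # pkt is a dict with parsed header fields
          return (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
                  pkt["src_port"], pkt["dst_port"])

      # Match-action table keyed on the five-tuple; wildcard entries keyed on
      # destination port only (illustrative).
      EXACT_RULES = {
          ("10.0.0.5", "10.0.1.9", "tcp", 51512, 80): "permit",
      }
      PORT_RULES = {22: "deny"}          # e.g., block a destination port
      DEFAULT_ACTION = "permit"

      def lookup_action(pkt):
          key = flow_key(pkt)
          if key in EXACT_RULES:
              return EXACT_RULES[key]
          return PORT_RULES.get(pkt["dst_port"], DEFAULT_ACTION)

      if __name__ == "__main__":
          pkt = {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.9", "proto": "tcp",
                 "src_port": 51512, "dst_port": 80}
          print(flow_key(pkt), "->", lookup_action(pkt))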
-  Processors 108-0 to 108-M can be coupled to switch circuitry 102 via respective interfaces 106-0 to 106-M. Interfaces 106-0 to 106-M can provide a low latency, high bandwidth memory-based interface such as Peripheral Component Interconnect express (PCIe), Compute Express Link (CXL), memory interface (e.g., any type of Double Data Rate (DDRx), CXL.io, CXL.cache, or CXL.mem), and/or a network connection (e.g., Ethernet or InfiniBand). In cases where a memory interface is used, the switch can be identified as a memory address.
-  One or more of processor modules 108-0 to 108-M can represent servers with CPUs, random access memory (RAM), persistent or non-volatile storage, and accelerators, and the processor modules could be one or more servers in the rack. For example, processor modules 108-0 to 108-M can represent multiple distinct physical servers that are communicatively coupled to switch 100 using connections. A physical server can be distinct from another physical server by providing different physical CPU devices, random access memory (RAM) devices, persistent or non-volatile storage devices, or accelerator devices. Distinct physical servers can, however, include devices with the same performance specifications. A server, as used herein, can refer to a physical server or a composite server that aggregates resources from one or more distinct physical servers.
-  Processor modules 108-0 to 108-M and processor 112-0 or 112-1 can include one or more cores and system agent circuitry. A core can be an execution core or computational engine that can execute instructions. A core can have access to its own cache and read only memory (ROM), or multiple cores can share a cache or ROM. Cores can be homogeneous (e.g., same processing capabilities) and/or heterogeneous devices (e.g., different processing capabilities). Frequency or power use of a core can be adjustable. Any type of inter-processor communication techniques can be used, such as but not limited to messaging, inter-processor interrupts (IPI), inter-processor communications, and so forth. Cores can be connected in any type of manner, such as but not limited to, bus, ring, or mesh. Cores may be coupled via an interconnect to a system agent (uncore).
-  System agent can include a shared cache which may include any type of cache (e.g., level 1, level 2, or last level cache (LLC)). System agent can include one or more of: a memory controller, a shared cache, a cache coherency manager, arithmetic logic units, floating point units, core or processor interconnects, or bus or link controllers. System agent or uncore can provide one or more of: direct memory access (DMA) engine connection, non-cached coherent master connection, data cache coherency between cores and arbitration of cache requests, or Advanced Microcontroller Bus Architecture (AMBA) capabilities. System agent or uncore can manage priorities and clock speeds for receive and transmit fabrics and memory controllers.
-  Cores can be communicatively connected using a high-speed interconnect compatible with any of, but not limited to, Intel QuickPath Interconnect (QPI), Intel Ultra Path Interconnect (UPI), Intel On-Chip System Fabric (IOSF), Omnipath, or Compute Express Link (CXL). The number of core tiles is not limited to this example and can be any number such as 4, 8, and so forth.
-  As is described in more detail herein, an orchestration control plane, a Memcached server, or one or more virtualized execution environments (VEEs) can execute on one or more of processor modules 108-0 to 108-M or on processor 112-0 or 112-1.
-  A VEE can include at least a virtual machine or a container. A virtual machine (VM) can be software that runs an operating system and one or more applications. A VM can be defined by a specification, configuration files, a virtual disk file, a non-volatile random-access memory (NVRAM) setting file, and a log file, and is backed by the physical resources of a host computing platform. A VM can be an OS or application environment that is installed on software, which imitates dedicated hardware. The end user has the same experience on a virtual machine as they would have on dedicated hardware. Specialized software, called a hypervisor, emulates the PC client or server's CPU, memory, hard disk, network and other hardware resources completely, enabling virtual machines to share the resources. The hypervisor can emulate multiple virtual hardware platforms that are isolated from each other, allowing virtual machines to run Linux® and Windows® Server operating systems on the same underlying physical host.
-  A container can be a software package of applications, configurations and dependencies so the applications run reliably from one computing environment to another. Containers can share an operating system installed on the server platform and run as isolated processes. A container can be a software package that contains everything the software needs to run such as system tools, libraries, and settings.
-  Various embodiments provide driver software for various operating systems (e.g., VMWare®, Linux®, Windows® Server, FreeBSD, Android®, MacOS®, iOS®, or any other operating system) for applications or VEEs to access switch 100. In some examples, the driver can present the switch as a peripheral device. In some examples, the driver can present the switch as a network interface controller or network interface card. For example, a driver can provide a VEE with ability to configure and access the switch as a PCIe endpoint. In some examples, a virtual function driver such as Adaptive Virtual Function (AVF) can be used to access the switch. An example of AVF is described at least in “Intel® Ethernet Adaptive Virtual Function Specification” Revision 1.0 (2018). In some examples, a VEE can interact with a driver to turn on or off any feature of the switch described herein.
-  Device drivers (e.g., NDIS-Windows, NetDev-Linux for example) running on processor modules 108-0 to 108-M can bind to switch 100 and advertise capabilities of switch 100 to a host operating system (OS) or any OS executed in a VEE. An application or VEE can configure or access switch 100 using SIOV, SR-IOV, MR-IOV, or PCIe transactions. By incorporating a PCIe endpoint as an interface of switch 100, switch 100 can be enumerated on any of processor modules 108-0 to 108-M as a PCIe Ethernet or CXL device, as a locally attached Ethernet device. For example, switch 100 can be presented as a physical function (PF) to any server (e.g., any of processor modules 108-0 to 108-M). When a resource (e.g., memory, accelerator, networking, CPU) of switch 100 is allocated to a server, the resource could appear logically to the server as if attached via a high-speed link (e.g., CXL or PCIe). The server could access the resource (e.g., memory or accelerator) as a hot plugged resource. Alternatively, these resources could appear as pooled resources that are now available to the server.
-  In some examples, processor modules 108-0 to 108-M and switch 100 can support use of single-root I/O virtualization (SR-IOV). PCI-SIG Single Root IO Virtualization and Sharing Specification v1.1 and predecessor and successor versions describe use of a single PCIe physical device under a single root port to appear as multiple separate physical devices to a hypervisor or guest operating system. SR-IOV uses physical functions (PFs) and virtual functions (VFs) to manage global functions for the SR-IOV devices. PFs can be PCIe functions that can configure and manage the SR-IOV functionality. For example, a PF can configure or control a PCIe device, and the PF has ability to move data in and out of the PCIe device. For example, for switch 100, the PF is a PCIe function of switch 100 that supports SR-IOV. The PF includes capability to configure and manage SR-IOV functionality of switch 100, such as enabling virtualization and managing PCIe VFs. A VF is associated with a PCIe PF on switch 100, and the VF represents a virtualized instance of switch 100. A VF can have its own PCIe configuration space but can share one or more physical resources on switch 100, such as an external network port, with the PF and other PFs or other VFs. In other examples, an opposite relationship can be used where any server (e.g., processor modules 108-0 to 108-M) is represented as a PF and a VEE executing on switch 100 can utilize a VF to configure or access any server.
-  In some examples, processor modules 108-0 to 108-M and switch 100 can interact using Multi-Root IOV (MR-IOV). Multiple Root I/O Virtualization (MR-IOV) and Sharing Specification, revision 1.0, May 12, 2008, from the PCI Special Interest Group (SIG), is a specification for sharing PCI Express (PCIe) devices among multiple computers.
-  In some examples, processor modules 108-0 to 108-M and switch 100 can support use of Intel® Scalable I/O Virtualization (SIOV). For example, processor modules 108-0 to 108-M can access switch 100 as a SIOV capable device or switch 100 can access processor modules 108-0 to 108-M as SIOV capable devices. A SIOV capable device can be configured to group its resources into multiple isolated Assignable Device Interfaces (ADIs). Direct Memory Access (DMA) transfers from/to each ADI are tagged with a unique Process Address Space identifier (PASID) number. Switch 100, processor modules 108-0 to 108-M, network controllers, storage controllers, graphics processing units, and other hardware accelerators can utilize SIOV across many virtualized execution environments. Unlike the coarse-grained device partitioning approach of SR-IOV to create multiple VFs on a PF, SIOV enables software to flexibly compose virtual devices utilizing the hardware-assists for device sharing at finer granularity. Performance critical operations on the composed virtual device are mapped directly to the underlying device hardware, while non-critical operations are emulated through device-specific composition software in the host. A technical specification for SIOV is Intel® Scalable I/O Virtualization Technical Specification, revision 1.0, June 2018.
-  Multitenant security can be employed where switch 100 is granted access to some or all server resources in the rack. Accesses by switch 100 to any server can require use of crypto keys, checksums, or other integrity checks. Any server can employ an access control list (ACL) to ensure communications from switch 100 are permitted but can filter out communications from other sources (e.g., drop communications).
-  Examples of packet transmission using switch 100 are described next. In some examples, switch 100 acts as a network proxy for a VEE running on a server. A VEE executing on switch 100 can form the packets for transmission using a network connection of switch 100 according to any applicable communications protocol (e.g., standardized or proprietary protocol). In some examples, switch 100 can originate a packet transmission where a workload or VEE running on the cores is in switch 100 or accessible to switch 100. Switch 100 can access connected internal cores in a similar manner as accessing any other externally connected host. One or more host(s) can be placed inside the same chassis as switch 100. In some examples where a VEE or service runs on a CPU of switch 100, such VEE can originate packets for transmission. For example, where a VEE runs a Memcached server on a CPU of switch 100, switch 100 could originate packets for transmission to respond to any request for data or, in the case of a cache miss, query another server or system for the data and retrieve data to update its cache.
-  FIG. 1B depicts an example system. Switch system 130 can include or access switch circuitry 132 that is communicatively coupled to port circuitry 134-0 to 134-N. Port circuitry 134-0 to 134-N can receive packets and provide packets to switch circuitry 132. Port circuitry 134-0 to 134-N can be similar to any of port circuitry 104-0 to 104-N. Interfaces 136-0 to 136-M can provide communication with respective processor modules 138-0 to 138-M. As is described in more detail herein, an orchestration control plane, Memcached server, or one or more virtualized execution environments (VEEs) running any application (e.g., webserver, database, Memcached server) can execute on one or more of processor modules 138-0 to 138-M. Processor modules 138-0 to 138-M can be similar to respective processor modules 108-0 to 108-M.
-  FIG. 1C depicts an example system. Switch system 140 can include or access switch circuitry 142 that is communicatively coupled to port circuitry 144-0 to 144-4. Port circuitry 144-0 to 144-4 can receive packets and provide packets to switch circuitry 142. Port circuitry 144-0 to 144-4 can be similar to any of port circuitry 104-0 to 104-N. Interfaces 146-0 to 146-1 can provide communication with respective processor modules 148-0 to 148-1. As is described in more detail herein, an orchestration control plane, Memcached server, or one or more virtualized execution environments (VEEs) running any application (e.g., webserver, database, Memcached server) can execute on one or more of processors 147-0 or 147-1 or processor modules 148-0 to 148-1. Processor modules 148-0 to 148-1 can be similar to any of processor modules 108-0 to 108-M.
-  FIG. 1D depicts an example system. In this example, aggregation switch 150 is coupled to multiple switches of different racks. A rack can include switch 152 coupled to servers 154-0 to 154-N. Another rack can include switch 156 coupled to servers 158-0 to 158-N. One or more of the switches can operate in accordance with embodiments described herein. A core switch or other access point can connect aggregation switch 150 to the Internet for packet transmission and receipt with another data center.
-  Note that depiction of servers relative to switch is not intended to show a physical arrangement as a TOR, MOR or any other switch position can be used (e.g., end of rack (EOR)) relative to servers.
-  Embodiments described herein are not limited to data center operation and can apply to operations among multiple data centers, enterprise networks, on-premises, or hybrid data centers.
-  As network processing can be moved to a switch, any type of configuration that requires power cycling (e.g., after NVM update or firmware update (e.g., update of a Basic Input/Output System (BIOS), Universal Extensible Firmware Interface (UEFI), or a boot loader)) can be performed in isolation and not require the entire switch to power cycle to avoid impacting all servers connected to the switch and in the rack.
-  FIG. 2A depicts an example overview of a system of managing resources in a rack. Various embodiments provide switch 200 with orchestration control plane 202 that can manage control planes in one or more servers 210-0 to 210-N connected to switch 200. Orchestration control plane 202 can receive SLA information 206 for one or more VEEs (e.g., any of 214-0-0 to 214-0-P or 214-N-0 to 214-N-P) and telemetry information 204 from servers in the rack, such as resource utilization, measured device throughput (e.g., memory read or write completion times), available memory or storage bandwidth, or resource needs of a server connected to the switch or, more broadly, in the rack. Using telemetry information 204 to affect compliance with SLAs of VEEs, orchestration control plane 202 can proactively control, moderate, or quiesce network bandwidth allocated to a server (e.g., data transmission rates from switch 200 to a server or from the server to switch 200) and thereby moderate a rate of communications sent from or received by VEEs running on a server.
-  In some examples, orchestration control plane 202 can allocate to any server's hypervisor (e.g., 212-0 to 212-N) one or more of: compute resources, network bandwidth (e.g., between switch 200 and another switch (e.g., aggregation switch or switch for another rack)), and memory or storage bandwidth. For example, switch 200 can proactively manage data transmission or receipt bandwidths to any VEE in a rack prior to receipt of any flow control message, but can also manage data transmission bandwidth from any VEE in the event of receipt of a flow control message (e.g., XON/XOFF or Ethernet PAUSE) to reduce or pause transmission of a flow. Orchestration control plane 202 can monitor activities of all servers 210-0 to 210-N in its rack at least based on telemetry data and can manage hypervisors 212-0 to 212-N to control traffic generation of VEEs. For example, switch 200 can perform flow control to quiesce a packet transmitter from either a local VEE or a remote sender in cases where congestion is detected. In other cases, hypervisors 212-0 to 212-N can compete for resources from orchestration control plane 202 to allocate for managed VEEs, but such a scheme may not lead to under allocation of resources to some VEEs.
-  For example, to allocate or moderate resources, orchestration control plane 202 can configure a hypervisor (e.g., 212-0 or 212-N) associated with a server that executes one or more VEEs. For example, servers 210-0 to 210-N can execute respective hypervisor control planes 212-0 to 212-N to manage data planes for VEEs running on a server. For a server, a hypervisor control plane (e.g., 212-0 to 212-N) can track SLA requirements for VEEs running on its server and manage those requirements within the allocated compute resources, network bandwidth, and memory or storage bandwidth. Similarly, a VEE can manage the contention between flows within the resource that it is granted.
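-  As a rough illustration of the telemetry-driven moderation described above, the Python sketch below (all names, fields, and thresholds are hypothetical, not from this disclosure) reduces the transmit bandwidth granted to a server when observed latency threatens a VEE's SLA, and restores bandwidth when headroom exists.

      # Sketch: an orchestration control plane adjusting per-server bandwidth caps
      # from telemetry so VEE SLAs are met. Thresholds and fields are illustrative.

      def adjust_bandwidth(telemetry, sla, current_cap_gbps, max_cap_gbps=100.0):
          """Return a new transmit-bandwidth cap for one server.

          telemetry: {"link_util": 0..1, "p99_latency_us": float}
          sla:       {"p99_latency_us": float}
          """
          if telemetry["p99_latency_us"] > sla["p99_latency_us"]:
              # Latency SLA at risk: quiesce the sender before flow control kicks in.
              return max(current_cap_gbps * 0.8, 1.0)
          if telemetry["link_util"] < 0.5:
              # Headroom available: restore bandwidth gradually.
              return min(current_cap_gbps * 1.1, max_cap_gbps)
          return current_cap_gbps

      if __name__ == "__main__":
          cap = 40.0
          cap = adjust_bandwidth({"link_util": 0.9, "p99_latency_us": 250.0},
                                 {"p99_latency_us": 200.0}, cap)
          print(f"new cap: {cap:.1f} Gbps")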
-  Orchestration control plane 202 can be afforded privileges within switch 200 and servers 210-0 to 210-N at least to configure resource allocations to servers. Orchestration control plane 202 can be insulated from untrusted VEEs that may compromise a server. Orchestration control plane 202 can monitor and shut down a VEE's VF or a server's PF for a NIC if malicious activity is detected.
-  An example of tiered configurability by orchestration control plane 202 of a hypervisor control plane 212 is described next. A hypervisor control plane 212 (e.g., any of hypervisor control planes 212-0 to 212-N) for a server can determine whether to configure resources afforded to a VEE and operations of the VEE in response to a physical host configuration request having been received, such as from orchestration control plane 202, from an administrator, as a result of an update to a policy associated with a tenant for which the VEE executes, etc.
-  A configuration from orchestration control plane 202 can be classified as trusted or untrusted. Hypervisor control plane 212 for a server can allow any trusted configuration to be enacted for a VEE. In some examples, bandwidth allocation, initiation of VEE migration or termination, and resource allocations made by orchestration control plane 202 can be classified as trusted. Hypervisor 212 can limit untrusted configurations to certain operations, but not hardware access/configuration operations that exceed a trust level. For example, an untrusted configuration cannot issue device resets, change the link configuration, write sensitive/device wide registers, or update the device firmware, etc. By separating configurations into trusted or untrusted, hypervisor 212 can neutralize a potential attack surface by sanitizing untrusted requests. In addition, hypervisor 212 can expose different capabilities for each of its different VEEs, thus allowing the host/provider to segregate tenants as needed.
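-  The sanitizing of configuration requests described above can be sketched as follows; the operation names and the allow-list are hypothetical and purely illustrative. Trusted sources may perform any operation, while untrusted requests are limited to a safe subset.

      # Sketch: classify configuration requests as trusted or untrusted and block
      # untrusted requests that exceed the allowed trust level.

      TRUSTED_SOURCES = {"orchestration_control_plane"}

      # Operations an untrusted requester may perform (illustrative allow-list).
      UNTRUSTED_ALLOWED = {"set_rx_queue_count", "set_rss_key", "read_stats"}

      # Operations that must never be allowed from untrusted requesters.
      PRIVILEGED = {"device_reset", "change_link_config",
                    "write_global_registers", "update_firmware"}

      def apply_config(source, operation):
          trusted = source in TRUSTED_SOURCES
          if trusted:
              return f"applied {operation} (trusted)"
          if operation in PRIVILEGED or operation not in UNTRUSTED_ALLOWED:
              return f"rejected {operation} (untrusted request exceeds trust level)"
          return f"applied {operation} (untrusted, sanitized)"

      if __name__ == "__main__":
          print(apply_config("orchestration_control_plane", "update_firmware"))
          print(apply_config("tenant_vee_7", "device_reset"))
          print(apply_config("tenant_vee_7", "set_rss_key"))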
-  FIG. 2B depicts an example overview of various management hierarchies. In representation 250, as described earlier, orchestration control plane issues trusted configurations to hypervisor control plane of a server. Some or all commands or configurations from orchestration control plane sent to hypervisor control plane can be considered trusted. Hypervisor control plane institutes the configurations for VEEs managed by the hypervisor.
-  In representation 260, the switch controls servers as though the servers represent physical functions (PFs) and associated virtual functions (VF-0 to VF-N) represent VEEs. In cases where SR-IOV is used, a bare metal server (e.g., single tenant server) or OS hypervisor corresponds to a PF and VEEs access the PF using their corresponding VF.
-  In representation 270, the orchestration control plane manages a hypervisor control plane. Indirectly, orchestration control plane can manage data planes DP-0 to DP-N of a server to control allocated resources, allocated network bandwidth (e.g., transmit or receive), and migration or termination of any VEE.
-  FIG. 3 depicts an example system in which a switch can respond to a memory access request. A requester device or VEE in or executing on server 310 can request data stored in server 312. Switch 300 can receive and process the memory access request and determine that the destination server or device (e.g., IP address or MAC address) to which the memory access request is to be provided for completion (e.g., read or write) is memory pool 332. Instead of providing the memory access request to server 312, which would transmit the request to memory pool 332, switch 300 can transfer the request to memory pool 332.
-  In some examples, switch 300 can access mapping table 302 that indicates a mapping of a memory address associated with a memory access request to a device physical address (e.g., destination IP address or MAC address). In some examples, switch 300 can be trusted with addresses of target devices and conversion of virtual addresses (provided with the memory access request) to physical addresses. In some examples, switch 300 can request a memory access (e.g., read or write) on behalf of a requester of the memory access at the target device.
-  In some examples, switch 300 can directly access memory pool 332 to retrieve data for a read operation or write data. For example, when server 310 requests data from server 312 but the data is stored in memory pool 332, switch 300 may retrieve the requested data from memory pool 332 (or other server) and provide the data to server 310 and potentially store the data in memory 304 or server 312. Switch 300 can fetch the data from memory pool 332 (or other device, server, or storage pool) by issuing a data read request to switch 320 to retrieve the data. Memory pool 332 can be located within a same data center as switch 300 or outside of the data center. Switch 300 can store the fetched data in memory 304 (or server 312) to allow multiple read-write transactions with low latency by servers in a same rack as switch 300. A high-speed connection can provide data from memory 304 to server 310 or vice versa. In cases where CXL.mem is used to transfer data from server 310 to memory 304 or vice versa, applicable protocol rules can be followed. Switch 300 can update the data in memory pool 332 if the data from memory 304 is modified.
-  Accordingly, a two-level memory (2LM) architecture can be implemented to copy data to a local memory accessible over a fast connection for processing by VEEs, significantly alleviating the latency penalty associated with retrieving data.
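-  The read path just described (serve from a switch-managed local memory when possible, otherwise fetch from a remote pool and cache locally) can be sketched as below; the class, function, and variable names are hypothetical and purely illustrative.

      # Sketch: a switch serving reads from a local memory/cache when possible and
      # fetching from a remote memory pool on a miss, then caching the result.

      class SwitchMemoryProxy:
          def __init__(self, fetch_remote):
              self.local = {}              # address -> data (switch-managed memory)
              self.fetch_remote = fetch_remote

          def read(self, addr):
              if addr in self.local:       # near/local copy: low-latency path
                  return self.local[addr], "local"
              data = self.fetch_remote(addr)   # e.g., read issued to a memory pool
              self.local[addr] = data          # cache for subsequent rack-local reads
              return data, "remote"

          def write(self, addr, data):
              self.local[addr] = data      # modified data; remote copy updated later

      if __name__ == "__main__":
          pool = {0x1000: b"payload"}
          proxy = SwitchMemoryProxy(lambda a: pool[a])
          print(proxy.read(0x1000))        # (b'payload', 'remote') on first access
          print(proxy.read(0x1000))        # (b'payload', 'local') afterwards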
-  In cases where the memory access request is a read request and data is stored by a server or device connected to another switch (e.g., switch 320) and in another rack, switch 300 can forward the request to the target device that stores the data to respond to the memory request. For example, switch 300 can use packet processing 306 to change a destination IP or MAC address of the packet that conveyed the memory access request to be that of the target device or encapsulate the request in another packet but maintain the destination IP or MAC address of the received memory access request.
-  Thin memory provisioning allows less memory on a compute node by building a memory pool that is shared by multiple compute nodes. The shared memory can be dynamically allocated/deallocated to compute nodes with allocation set at page or cache line granularity. In aggregate, memory installed on all compute nodes plus memory in the shared pool can be less than the total amount of memory allocated to the compute nodes. For example, where thin memory provisioning is used for server 310, data can be stored in a memory on a same rack as that of server 310 and potentially in a remote memory pool 332.
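-  As a hypothetical illustration of such provisioning (the numbers are illustrative only, not from this disclosure): four compute nodes could each be presented with 1 TB of addressable memory (4 TB in aggregate) while being backed by 256 GB of installed memory per node plus a 1 TB shared pool, so roughly 2 TB of physical memory stands behind the 4 TB presented to the nodes.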
-  For a memory access request from server 310 that is a write operation, if a target device is not on a rack of switch 300, switch 300 can queue the write, report the write operation as complete to server 310 (e.g., the VEE), and then update memory pool 332 as memory bandwidth allows or as memory ordering and cache coherency require (e.g., flushing posted writes).
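-  The posted-write behavior described above can be sketched as a small write-behind queue (hypothetical names): the switch acknowledges the write immediately and flushes it to the remote pool later.

      # Sketch: acknowledge writes at the switch immediately and flush queued
      # (posted) writes to the remote memory pool as bandwidth permits.

      from collections import deque

      class PostedWriteQueue:
          def __init__(self, flush_to_pool):
              self.pending = deque()           # queued (addr, data) writes
              self.flush_to_pool = flush_to_pool

          def write(self, addr, data):
              self.pending.append((addr, data))
              return "complete"                # reported complete to the requesting VEE

          def flush(self, max_writes=None):
              """Push posted writes to the pool, e.g., when bandwidth allows."""
              n = 0
              while self.pending and (max_writes is None or n < max_writes):
                  addr, data = self.pending.popleft()
                  self.flush_to_pool(addr, data)
                  n += 1
              return n

      if __name__ == "__main__":
          pool = {}
          q = PostedWriteQueue(lambda a, d: pool.__setitem__(a, d))
          print(q.write(0x2000, b"dirty"))     # immediate completion
          print(q.flush(), pool)               # later flush updates the pool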
-  In some examples, switch 300 can process a memory access to a region of a memory with a corresponding address and, in the case of a write, corresponding data to write. Switch 300 can read data from or store data to memory pool 332 using remote direct memory access (e.g., InfiniBand, iWARP, RoCE and RoCE v2), NVMe over Fabrics (NVMe-oF) or NVMe. For example, NVMe-oF is described at least in NVM Express Base Specification Revision 1.4 (2019), as well as predecessors, successors, and proprietary variations thereof. NVMe is described, for example, in NVM Express™ Base Specification, Revision 1.3c (2018), as well as predecessors, successors, and proprietary variations thereof. In cases where the data is stored by a server or device (e.g., memory pool 332) connected to another switch (e.g., switch 320), switch 300 can retrieve data or write data as though the data were stored in a server of a same rack as server 310.
-  In addition to the cache or memory space on each server, switch 300 may also contribute to the aggregated cache space as well. Smart cache allocation could place data in a memory of a server that accesses the data. Data that is thrashed (e.g., accessed and modified by several servers) could be placed in memory 304 of switch 300 or server 312 where it could be accessed with the fewest connection or Ethernet link traversals.
-  Memcached can provide a distributed memory-caching system within a data center or across multiple data centers. For example, Memcached can provide distributed databases to speed up applications by alleviating database load. In some examples, dedicated servers can be used as Memcached servers to consolidate resources across servers (e.g., via Ethernet) and cache commonly accessed data to speed up access to that data. In various embodiments, a switch can manage data stored as part of a Memcached object, data, or string storage in at least some memory resources in servers connected to the switch.
-  FIG. 4A shows examples of a Memcached server executing on a server (system 400) and in a switch (system 450). Use of Memcached allows frequently requested data to be provided faster by use of a hash look-up instead of a database (or any other complex) query, although a database query can be used in any embodiment. A first request for data can be relatively slow as it causes retrieval of data. Future requests for the same data can be faster as the data is stored and can be provided from the data server. In system 400, a requestor can be a client/server on a different rack in a row of the data center, on a different row in the data center, or an external request from outside of the data center. The request can be received at aggregation switch 402 and provided to switch 404 using an Ethernet link. Switch 404, in turn, can use an Ethernet link to provide the request to Memcached server 408 running on server 406-0, which in turn provides a request for data to server 406-1. Despite data server 406-1 being in the same rack as Memcached server 406-0, there are multiple Ethernet communications within the same rack to provide the desired data. Ethernet communications can contribute to east-west traffic within a datacenter.
-  In system 450, the request can be received at aggregation switch 402 and provided to switch 452 using an Ethernet link. Switch 452 executes Memcached server 408 using one or more processors and determines a server device that stores requested data. In cases where data is stored in a same rack for which switch 452 provides connectivity (e.g., using PCIe, CXL, DDRx), the request can be provided to server 460-1 and not contribute to east-west traffic. If the requestor were in the same rack (e.g., server 460-N), as switch 452 is a network endpoint, the request could be handled internally to switch 452 and not travel over Ethernet to be fulfilled. In cases of a cache miss (e.g., data is not stored in server 460-1), in some scenarios, data can be retrieved from another server (e.g., 460-0) over the connection.
-  For example, switch 452 can execute Memcached in a VEE running on the switch and can consolidate resources in the entire rack into a virtual pool of combined cache and memory via a high-speed connection.
-  Additionally, with switch 452 handling NIC endpoint operations, all requests could automatically route through Memcached server 408 running in a VEE executing on switch 452 and the client requester no longer needs to maintain a list of Memcached servers. A Memcached server VEE could automatically update its cache (e.g., shown as data in server 460-1) based on how it is configured to improve data locality to requesters and reduce further latency.
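-  The hash look-up versus fall-back fetch behavior of a Memcached-style service running at the switch can be sketched as follows; the class and the backing-store callable are hypothetical and purely illustrative.

      # Sketch: a Memcached-style get at the switch; a hit is served from the
      # in-rack cache, a miss fetches from the backing store and updates the cache.

      class RackCache:
          def __init__(self, fetch_from_origin):
              self.cache = {}                     # key -> value held in rack memory
              self.fetch_from_origin = fetch_from_origin

          def get(self, key):
              if key in self.cache:               # hash look-up, no database query
                  return self.cache[key], "hit"
              value = self.fetch_from_origin(key) # e.g., database on another rack
              self.cache[key] = value             # subsequent requests stay in-rack
              return value, "miss"

      if __name__ == "__main__":
          origin = {"user:42": {"name": "example"}}
          cache = RackCache(origin.get)
          print(cache.get("user:42"))             # miss, fetched and cached
          print(cache.get("user:42"))             # hit, served from rack memory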
-  FIG. 4B shows the Ethernet packet flow for a single request. Each arrow represents a traversal of an Ethernet link and a contribution to east-west or north-south traffic. For system 400, in the case of a cache miss, whereby data is not available at the data server, a total of 10 Ethernet link (or other format) traversals are made. A requester sends a request to an aggregation switch, the aggregation switch provides the request to a switch, and in turn, the switch provides the request to the Memcached server. The Memcached server provides a request to be sent to a data server through the switch. The data server responds by indicating that data is not present via the switch to the Memcached server. The Memcached server receives a response of a cache miss so that the Memcached server can update its cache with the data so subsequent requests for that data no longer result in a cache miss. The Memcached server provides the data to the requester even in cases of a cache miss.
-  Where the Memcached server is in a different rack in the data center than a rack that stores the data, for the request to be fulfilled, the request travels to a different rack and a response is provided to the Memcached server. However, the switch could issue an Ethernet request to a rack that stores the data. In some examples, the switch could bypass the Memcached servers and request data from the data source directly.
-  For system 450, a requester provides a request to the switch via an aggregation switch, and the switch accesses the Memcached server and data in its rack via a connection (e.g., PCIe, CXL, DDRx) and provides the response data to the requester via the aggregation switch. In this example, 4 Ethernet link traversals occur. Providing a Memcached service in a switch can reduce the network accesses to databases on other racks and even reduce the east-west traffic within the rack by performing the Memcached data location look-up in the switch. In some cases, where data is cached in a memory of the switch (e.g., in memory 304) or in a server of the rack, the switch can directly supply the requested data in response to the request. In cases of cache misses, fewer Ethernet communications are made by system 450 because servers in the same rack are accessible via switch 452 (FIG. 4A) using high-speed connections (PCIe, CXL, DDR, etc.) to retrieve data to cache.
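-  The traversal counts above can be checked by enumerating each Ethernet hop; the hop lists below are an illustrative paraphrase of the flows just described, not an exhaustive trace.

      # Sketch: count Ethernet link traversals for the two flows described above.

      CACHE_MISS_VIA_SERVER_MEMCACHED = [
          "requester -> aggregation switch",
          "aggregation switch -> rack switch",
          "rack switch -> Memcached server",
          "Memcached server -> rack switch",
          "rack switch -> data server",
          "data server -> rack switch",
          "rack switch -> Memcached server",
          "Memcached server -> rack switch",
          "rack switch -> aggregation switch",
          "aggregation switch -> requester",
      ]

      SWITCH_HOSTED_MEMCACHED = [
          "requester -> aggregation switch",
          "aggregation switch -> rack switch",   # Memcached and data reached over PCIe/CXL/DDR
          "rack switch -> aggregation switch",
          "aggregation switch -> requester",
      ]

      print(len(CACHE_MISS_VIA_SERVER_MEMCACHED), "vs", len(SWITCH_HOSTED_MEMCACHED))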
-  FIG. 5A depicts an example system in which packets can terminate at a switch. A packet can be received by switch 502 from an aggregation switch, for example. The packet can be Ethernet compatible and use any type of transport layer (e.g., Transmission Control Protocol (TCP), Data Center TCP (DCTCP), User Datagram Protocol (UDP), quick User Datagram Protocol Internet Connections (QUIC)). Various embodiments of switch 502 can execute one or more VEEs (e.g., 504 or 506) to terminate a packet by performing network protocol activity. For example, VEEs 504 or 506 can perform network protocol processing for packets received at switch 502, such as one or more of: segmentation, reassembly, acknowledgements (ACKs), negative-acknowledgements (NACKs), packet retransmit identification and requests, congestion management (e.g., flow control of a transmitter), Secure Sockets Layer (SSL) or Transport Layer Security (TLS) termination for HTTP and TCP. As memory pages are filled (such as at the socket layer), pages can be copied to a destination server on the rack using a high speed connection and corresponding protocol (e.g., CXL.mem) for access by the bare metal host or a VEE.
-  For example, switch 502 can execute protocol processing VEEs 504 or 506. In some examples, protocol processing VEEs 504 or 506 execute on one or more processors of switch 502.
-  In some examples, VEEs 504 or 506 executing on switch 502 can perform one or more of: large receive offload (LRO), large send/segmentation offload (LSO), TCP segmentation offload (TSO), Transport Layer Security (TLS) offload, receive side scaling (RSS) to allocate a queue or core to process a payload, dedicated queue allocation, or other layer protocol processing.
-  LRO can refer to switch 502 (e.g., VEEs 504 or 506) coalescing received packets into a multipacket buffer and providing content of the buffer to server 510-0 or 510-1 (e.g., VEE 514-0 or 514-1) as fewer, larger units. LSO or TSO can allow switch 502 or a server 510-0 or 510-1 to build a larger TCP message (or other transport layer message) (e.g., 64 KB in length), and switch 502 (e.g., VEEs 504 or 506) can segment the message into smaller packets for transmission.
-  TLS is defined at least in The Transport Layer Security (TLS) Protocol Version 1.3, RFC 8446 (August 2018). TLS offload can refer to offload of encryption or decryption of contents in accordance with TLS to switch 502 (e.g., VEEs 504 or 506). Other secure protocol processing can be offloaded to switch 502, such as but not limited to Secure Sockets Layer (SSL).
-  RSS can refer to switch 502 (e.g., VEEs 504 or 506) calculating a hash or making another determination based on contents of a received packet (e.g., its header) to select which CPU or core on switch 502 or a server is to store and process payload from the received packet. In some examples, switch 502 can perform RSS to allocate one or more cores (on switch 502 or a server) to perform packet processing.
-  In some examples, switch 502 can allocate a dedicated queue in a memory to an application or VEE according to Application Device Queue (ADQ) or similar technology. Use of ADQ can dedicate queues to applications or VEEs, and these queues can be exclusively accessed by the applications or VEEs. ADQ can prevent network traffic contention whereby different applications or VEEs attempt to access the same queue and cause locking or contention, and the performance (e.g., latency) of packet availability becomes unpredictable. Moreover, ADQ provides quality of service (QoS) control for dedicated application traffic queues for received packets or packets to be transmitted. For example, using ADQ, switch 502 can allocate packet payload content to one or more queues where the one or more queues are mapped to access by software such as an application or VEE. In some examples, switch 502 can utilize ADQ to dedicate one or more queues for packet header processing operations.
-  FIG. 5C depicts an example manner of NUMA node, CPU or server selection by switch 502 (e.g., VEEs 504 or 506). For example, resource selector 572 of switch 570 can perform a hash calculation on a received packet's header (e.g., hash calculation on a packet flow identifier) to determine an indirection table stored on switch 502 that maps to a queue (e.g., among queues 576), which in turn maps to a NUMA node, CPU or server. Resource mappings 574 can include an indirection table and mapping to queue as well as an indicator of which connection (e.g., CXL link, PCIe connection or DDR interface) to use to copy a header and/or payload of a received packet to a memory (or cache) associated with a selected NUMA node, CPU or server. In some cases, resource selector 572 performs RSS to select a NUMA node, CPU or server. For example, resource selector 572 can select a CPU1 in NUMA Node 0 on server 580-1 to process the header and/or payload of the received packet. A NUMA node on a server could have its own connection to switch 570 to allow writing to memory in a server without traversing a UPI bus. A VEE can be executed on one or more cores or CPUs and the VEE can process the received payload.
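-  The hash-to-indirection-table-to-queue selection described for FIG. 5C can be sketched as follows. CRC32 stands in for the hash (RSS implementations commonly use a Toeplitz hash), and the table contents, queue-to-resource mapping, and names are assumptions for illustration only.

```python
# Sketch of hash -> indirection table -> queue -> NUMA node/CPU/server
# selection. Table contents and names are hypothetical.
import zlib

INDIRECTION_TABLE = [0, 1, 2, 3] * 4          # 16 entries -> queue ids
QUEUE_TO_RESOURCE = {
    0: {"server": "580-0", "numa": 0, "cpu": 0, "link": "CXL"},
    1: {"server": "580-0", "numa": 1, "cpu": 4, "link": "PCIe"},
    2: {"server": "580-1", "numa": 0, "cpu": 1, "link": "CXL"},
    3: {"server": "580-1", "numa": 1, "cpu": 5, "link": "DDR"},
}

def select_resource(flow_id: bytes):
    h = zlib.crc32(flow_id)                   # stand-in for the RSS hash
    queue = INDIRECTION_TABLE[h % len(INDIRECTION_TABLE)]
    return queue, QUEUE_TO_RESOURCE[queue]

queue, target = select_resource(b"10.0.0.5:443->10.0.1.9:51512/TCP")
print(queue, target)   # e.g., a queue mapping to CPU1 on NUMA node 0 of server 580-1
```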
-  Referring again to FIG. 5A, to perform packet protocol processing, VEEs 504 or 506 can perform polling, busy polling, or rely on interrupts to detect new packets received in a packet buffer from one or more ports of switch 502.
-  In some examples, any protocol processing, protocol termination, network termination, or offload operation can be performed by a programmable or fixed function device in switch 502 instead of or in addition to use of a VEE executing in switch 502.
-  In some examples, processing packets in switch 502 can allow for a faster decision of packet handling (e.g., forward or discard) than if a decision of packet handling were made in a server. In addition, in the event of packet discard, bandwidth of a connection between a server and the switch is not consumed. If a packet were identified as related to malicious activity (e.g., a DDoS attack), the packet could be discarded, insulating a server from potential exposure to the malicious activity.
-  VEEs 504 or 506 executing on switch 502 can complete network processing, and resulting data can be transferred to a data buffer for a VEE 514-0 or 514-1 via DMA, RDMA, PCIe, or CXL.mem, regardless of the network protocol that was used to deliver the packet. In other words, VEEs 504 and 506 executing on switch 502 can act as proxy VEEs for respective VEEs 514-0 or 514-1 running on respective servers 510-0 and 510-1. For example, a VEE (e.g., VEE 504 or 506) can provide a socket buffer entry to a host and data in a buffer (e.g., 512-0 or 512-1).
-  Based at least on successful protocol layer processing and absence of any deny condition in an ACL, a payload from the packet can be copied into a memory buffer (e.g., 512-0 or 512-1) in a destination server (e.g., 510-0 or 510-1). For example, VEEs 504 or 506 can cause the payload to be copied to the memory buffer via a connection such as PCIe, CXL, or DDRx.
-  In some examples, switch 502 can execute VEEs for each of the VEEs executing on servers within its rack, or for an optimized subset. In some examples, a subset of VEEs to execute on the switch can correspond to VEEs running on servers that have low latency requirements, are primarily network focused, or meet other criteria.
-  In some examples, switch 502 is connected to servers 510-0 and 510-1 using connections that permit switch 502 to access all of the CPUs, memory, and storage in the rack. An orchestration layer can manage resource allocation to VEEs in some or all of switch 502 and any server in the rack.
-  VEEs 514-0 and 514-1 executed in respective servers 510-0 and 510-1 can select a mode of being informed of data availability such as: polling mode, busy poll, or interrupt. Polling mode can include a VEE polling for a new packet by actively sampling a status of a buffer to determine if there is a new packet arrival. Busy polling can allow socket layer code to poll a receive queue and disable network interrupts. An interrupt can cause an executing process to save its state and perform a process associated with the interrupt (e.g., process a packet or data).
-  Server 510-0 or 510-1 in a rack can receive interrupts instead of running in polling mode for packet processing. Interrupts can be issued by switch 502 to a server for higher level transactions, rather than per packet. For example, where a VEE 514-0 or 514-1 runs a database, an interrupt could be provided by VEE 504 or 506 after a complete query or transaction has been received, rather than for each packet that carries a portion of it.
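-  The difference between per-packet notification and the transaction-level notification just described can be sketched as follows: a switch-side proxy accumulates packet payloads and invokes a server-side callback only once a complete request has been assembled. The class name, the newline-terminated request format, and the callback are assumptions for illustration.

```python
# Sketch (assumed, simplified) of transaction-level notification: the
# switch-side proxy buffers payload fragments and notifies the server-side
# VEE only when a complete request (here, newline-terminated) is assembled,
# rather than raising an interrupt per packet.
class TransactionNotifier:
    def __init__(self, on_complete):
        self.buffer = bytearray()
        self.on_complete = on_complete     # server-side handler (callback)

    def on_packet(self, payload: bytes):
        self.buffer.extend(payload)
        while b"\n" in self.buffer:        # one or more complete requests
            request, _, rest = bytes(self.buffer).partition(b"\n")
            self.buffer = bytearray(rest)
            self.on_complete(request)      # single notification per request

notifier = TransactionNotifier(lambda req: print("handle:", req))
notifier.on_packet(b"SELECT * FROM t WH")  # no notification yet
notifier.on_packet(b"ERE id=1\n")          # full request -> one notification
```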
-  FIG. 5B depicts an example of a composition of VEEs on a server and switch. In this example, VEE 552 executes on switch 550 to perform protocol processing or packet protocol termination for packets having payloads to be processed by VEE 562, which executes on server 560. VEE 552 can execute on one or more cores on switch 550. For example, VEE 552 can process packet headers for packets utilizing TCP/IP or other protocol or protocol combinations. VEE 552 can write a payload of a processed packet to a socket buffer 566 in server 560 via socket interface 554-to-socket interface 564 and high speed connection 555 (e.g., PCIe, CXL, DDRx (where x is an integer)). Socket buffer 566 can be represented as a memory address. An application (e.g., running in VEE 562) executing on server 560 can access socket buffer 566 to utilize or process the data. VEE 552 can provide operations of a TCP Offload Engine (TOE) without requiring any of the protocol stack changes (such as TCP Chimney).
-  In some examples, network termination occurs in VEE 552 of switch 550 and server 560 does not receive any packet headers in socket buffer 566. For example, VEE 552 of switch 550 can perform protocol processing of Ethernet, IP, and transport layer (e.g., TCP, UDP, QUIC) headers and such headers would not be provided to server 560.
-  Some applications have their own headers or markers, and switch 550 can transfer or copy those headers or markers, in addition to the payload data, to socket buffer 566. Accordingly, VEE 562 can access data in socket buffer 566 regardless of the protocols used to transmit the data (e.g., Ethernet, asynchronous transfer mode (ATM), Synchronous optical networking (SONET), synchronous digital hierarchy (SDH), Token Ring, and so forth).
-  In some examples, VEEs 552 and 562 can be composed so that VEE 552 hands data off to VEE 562 within a trusted environment or at least by sharing of memory space. Network service VEE 552 can be chained to an application service VEE 562, and VEEs 552 and 562 could have a shared memory buffer for layer 7 data passing.
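-  A minimal sketch of the layer-7 handoff is shown below: a network-service producer (standing in for VEE 552) strips protocol headers and places only payload bytes into a shared buffer that an application-side consumer (standing in for VEE 562) drains. A thread-safe queue stands in for the PCIe/CXL-backed socket buffer 566; the packet dictionary layout and names are assumptions.

```python
# Sketch (assumed, simplified) of the layer-7 handoff: the producer removes
# protocol headers and shares only payload bytes; the consumer never sees
# packet headers. A Queue stands in for socket buffer 566.
import threading, queue

socket_buffer = queue.Queue()

def network_service_vee(raw_packets):
    for pkt in raw_packets:
        payload = pkt["payload"]          # headers already terminated here
        socket_buffer.put(payload)
    socket_buffer.put(None)               # end-of-stream marker

def application_vee():
    while (data := socket_buffer.get()) is not None:
        print("app received", len(data), "bytes")

packets = [{"eth": "...", "ip": "...", "tcp": "...", "payload": b"hello"},
           {"eth": "...", "ip": "...", "tcp": "...", "payload": b"world"}]
producer = threading.Thread(target=network_service_vee, args=(packets,))
consumer = threading.Thread(target=application_vee)
producer.start(); consumer.start()
producer.join(); consumer.join()
```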
-  In data centers, device (e.g., compute or memory) utilization and performance and software performance can be measured to evaluate server usage and whether adjustments are to be made or not made to available resources or software. Examples of telemetry data include device temperature readings, application monitoring, network usage, disk space usage, memory consumption, CPU utilization, fan speeds, as well as application-specific telemetry streams from VEEs running on a server. For example, telemetry data can include counters or performance monitoring events related to: processor or core usage statistics, input/output statistics for devices and partitions, memory usage information, storage usage information, bus or interconnect usage information, or processor hardware registers that count hardware events such as instructions executed, cache-misses suffered, and branches mispredicted. For a workload request that is being performed or has completed, one or more of the following can be collected: telemetry data such as, but not limited to, outputs from the Top-down Micro-Architecture Method (TMAM), output from execution of the Unix system activity reporter (SAR) command, or output from the Emon command monitoring tool that can profile application and system performance. However, additional information can be collected such as outputs from a variety of monitoring tools including but not limited to output from use of the Linux perf command, Intel PMU toolkit, iostat, VTune Amplifier, or monCli or other Intel Benchmark Install and Test Tool (Intel® BITT) Tools. Other telemetry data can be monitored such as, but not limited to, power usage, inter-process communications, and so forth. Various telemetry techniques such as those described with respect to the collectd daemon can be used.
-  As VEEs in a datacenter transmit telemetry data to a central orchestrator, the bandwidth requirements could be enormous and east-west traffic could be overwhelmed by telemetry data. In some cases, key performance indicators (KPIs) are provided by a server and if one of these KPIs indicates a problem, a server sends a more robust set of telemetry to allow more detailed investigation.
-  In some embodiments, when a high-speed connection is used between a server and switch, much more information can pass from the server to the switch without burdening east-west traffic. The switch can collect more than a minimum set of telemetry (e.g., KPIs) from the server while not burdening the network with excessive east-west traffic overhead. However, in some examples, a server can send KPIs to the switch unless more data or history is requested such as in the case of an error. An orchestrator (e.g., orchestration control plane 202 of FIG. 2A) executed for the switch can use expanded telemetry data (e.g., telemetry 204 of FIG. 2A) to determine available capacity on each of the servers on its rack and can provide refined multi-server job placements to maximize performance considering telemetry of multiple servers.
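-  The KPI-then-escalate reporting pattern described above can be sketched as follows: a server reports a compact KPI set each interval and escalates to a fuller telemetry snapshot only when a KPI breaches a threshold or when the switch explicitly requests detail. The metric names and threshold values are assumptions for illustration.

```python
# Sketch (assumed thresholds and metric names): report KPIs only in the
# normal case; include the full telemetry snapshot when a KPI breaches a
# threshold or when more detail is explicitly requested.
KPI_THRESHOLDS = {"cpu_util": 0.90, "p99_latency_ms": 5.0, "drop_rate": 0.01}

def report(kpis, full_telemetry, detail_requested=False):
    breached = [k for k, limit in KPI_THRESHOLDS.items()
                if kpis.get(k, 0) > limit]
    if breached or detail_requested:
        return {"kpis": kpis, "detail": full_telemetry, "breached": breached}
    return {"kpis": kpis}                 # normal case: KPIs only

sample_kpis = {"cpu_util": 0.95, "p99_latency_ms": 2.1, "drop_rate": 0.0}
sample_detail = {"per_core_util": [0.99, 0.91, 0.40], "cache_miss_rate": 0.07}
print(report(sample_kpis, sample_detail))   # cpu_util breach -> full detail
```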
-  FIG. 6 depicts an example of a switch that executes an orchestration control plane to manage what device executes a VEE. Orchestration control plane 604 executing on switch 602 can monitor the performance of one or more VEEs in terms of compliance with an applicable SLA. When a VEE does not comply with an SLA requirement (e.g., application availability (e.g., 99.999% during workdays and 99.9% for evenings or weekends), maximum permitted response times to queries or other invocations, requirements of actual physical location of stored data, or encryption or security requirements), or is within a range close to non-compliance with an SLA requirement, orchestration control plane 604 can instantiate one or more new VEEs to balance the workload among VEEs. As a workload drops, the extra VEEs can be torn down or deactivated, freeing resources to be allocated to another VEE (or the same VEE at a later time) to use when load again approaches capacity. For example, a workload can include at least any type of activity such as protocol processing and network termination for packets, or a Memcached server, database or webserver. For example, VEE 606 can perform protocol processing, and if a workload increases, multiple instances of VEE 606 can be instantiated on switch 602.
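-  The scale-out/scale-in behavior described above can be pictured with a short control-loop sketch: an instance is added when measured latency nears the SLA bound and removed when load drops well below it. The SLA bound, the scale-out/scale-in thresholds, and the function names are assumptions, not values from the described embodiments.

```python
# Sketch of SLA-driven scale-out/scale-in (assumed thresholds and names):
# add a VEE instance when measured latency nears the SLA bound, remove one
# when load drops well below it, keeping at least one instance running.
SLA_MAX_LATENCY_MS = 10.0
SCALE_OUT_AT = 0.9 * SLA_MAX_LATENCY_MS   # act before the SLA is violated
SCALE_IN_AT = 0.4 * SLA_MAX_LATENCY_MS

def adjust_instances(current_instances: int, measured_latency_ms: float) -> int:
    if measured_latency_ms >= SCALE_OUT_AT:
        return current_instances + 1               # instantiate another VEE
    if measured_latency_ms <= SCALE_IN_AT and current_instances > 1:
        return current_instances - 1               # tear down an extra VEE
    return current_instances

instances = 1
for latency in (3.0, 9.5, 12.0, 4.1, 2.0):
    instances = adjust_instances(instances, latency)
    print(f"latency={latency}ms -> instances={instances}")
```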
-  In some examples, orchestration control plane 604 executing on switch 602 can determine whether to migrate any VEE executing on switch 602 or a server to execution on another server. For example, migration can depend on a shut down or restart of switch 602 on which the VEE executes, which can cause the VEE to be executed on a server. For example, VEE migration can depend on a shut down or restart of a server on which the VEE executes, which can cause the VEE to be executed on switch 602 or another server.
-  In some examples, orchestration control plane 604 can decide whether to execute a VEE on a particular processor or migrate the VEE among switch 602 or any server 608-0 to 608-N. VEE 606 or VEE 610 can migrate from a server to a switch, a switch to a server, or a server to another server as needed. For example, VEE 606 could execute on switch 602 for a short term in connection with a server being rebooted, and the VEE can be migrated back to the rebooted server or another server.
-  In some examples, switch 602 can execute a virtual switch (vSwitch) that allows communication between VEEs running on switch 602 or any server connected to switch 602. A virtual switch can include Microsoft Hyper-V, Open vSwitch, VMware vSwitches, and so forth.
-  Switch 602 can support S-IOV, SR-IOV, or MR-IOV for its VEEs. In this example, the VEE running on switch 602 utilizes resources in one or more servers via S-IOV, SR-IOV, or MR-IOV. S-IOV, SR-IOV, or MR-IOV can permit connection or bus sharing across VEEs. In some examples, where a VEE running on switch 602 operates as a network termination proxy VEE, one or more corresponding VEEs run on one or more servers in the rack and in switch 602. VEEs running on switch 602 can process packets and VEEs running on cores on the server or switch 602 can execute applications (e.g., database, webserver, and so forth). Use of S-IOV, SR-IOV, or MR-IOV (or other schemes) can allow the server resources to be composed whereby physically disaggregated servers are logically one system, but the tasks are divided such that the network processing occurs on switch 602.
-  As described earlier, switch 602 can use a high speed connection to at least some of the resources on one or more servers 608-0 to 608-N in a rack, providing access to resources from any of the servers in the rack to VEE 606 running on switch 602. Orchestration control plane 604 can efficiently allocate VEEs to resources and is not limited to what can execute in a single server, but can also allocate execution across switch 602 and servers 608-0 to 608-N. This feature allows potentially constrained resources such as accelerators to be optimally allocated.
-  FIG. 7A depicts an example of migration of a VEE from a server to another server. For example, live migration (e.g., Microsoft® Hyper-V or VMware® vSphere) of a VEE can be performed to migrate an active VEE. At (1), the VEE is transmitted to a TOR switch. At (2), the VEE is transmitted through a data center core network and at (3), the VEE is transmitted to a TOR switch of another rack. At (4), the VEE is transmitted to a server, where the VEE can commence execution in another hardware environment.
-  FIG. 7B depicts an example of migration of a VEE. In this example, a VEE can be executed on a switch that uses resources of the switch and connected servers in the rack. At (1), the VEE is transmitted from the switch to the core network. At (2), the VEE is transmitted to another switch for execution. The other switch can use resources of that switch and connected servers in its rack. In other examples, the destination for the VEE can be a server, as in the example of FIG. 7A. Accordingly, by executing a VEE on a switch with expanded server resources, fewer steps are taken in migrating a VEE and the VEE can commence executing sooner in the scenario of FIG. 7B than in the scenario of FIG. 7A.
-  FIG. 8A depicts an example process. The process can be performed by a processor enhanced switch in accordance with various embodiments. At 802, a switch can be configured to execute an orchestration control plane. For example, the orchestration control plane can manage compute, memory, and software resources of the switch and one or more servers connected to the switch in a same rack as that of the switch. Servers can execute hypervisors that control execution of virtualized execution environments and can permit or not permit configuration by the orchestration control plane. For example, the connection can be used to provide communication between the switch and the servers. The orchestration control plane can receive telemetry from servers in a rack via the connection without the telemetry contributing to east-west traffic within a data center. Various examples of the connection are described herein.
-  At 804, the switch can be configured to execute a virtualized execution environment to perform protocol processing for at least one virtualized execution environment executing on a server. Various examples of protocol processing are described herein. In some examples, the switch performs network termination of received packets and can provide data from received packets to a memory buffer of a server or the switch. However, a virtualized execution environment can perform any type of operation related to or unrelated to packet or protocol processing. For example, the virtualized execution environment can execute a Memcached server, a webserver, or a database, or retrieve data from memory devices in another rack or outside of the data center.
-  At 806, the orchestration control plane can determine whether to change an allocation of resources to the virtualized execution environment. For example, based on whether an applicable SLA for the virtualized execution environment or a flow of packets processed by the virtualized execution environment is being met or is not met, the orchestration control plane can determine whether to change an allocation of resources to the virtualized execution environment. For a scenario where the SLA is not being met or is considered likely to be violated, at 808, the orchestration control plane can add additional computing, networking, or memory resources for use by the virtualized execution environment, or instantiate one or more additional virtualized execution environments to assist with processing. In some examples, the virtualized execution environment can be migrated from the switch to a server to improve resource availability.
-  For a scenario where the SLA is being met, the process returns to 806. Note that in some cases, where packet processing activity is low or idle, the orchestration control plane can de-allocate computing resources available to the virtualized execution environment. In some examples, where the SLA is being met, the virtualized execution environment can be migrated from the switch to a server to provide resources for another virtualized execution environment to utilize.
-  FIG. 8B depicts an example process. The process can be performed by a processor enhanced switch in accordance with various embodiments. At 820, a virtualized execution environment executing on a switch can perform packet processing of a received packet. Packet processing can include one or more of: header parsing, flow identification, segmentation, reassembly, acknowledgements (ACKs), negative-acknowledgements (NACKs), packet retransmit identification and requests, congestion management (e.g., flow control of a transmitter), checksum validation, decryption, encryption, secure tunneling (e.g., Transport Layer Security (TLS) or Secure Sockets Layer (SSL)), or other operations. For example, the packet and protocol processing virtualized execution environment can perform polling or busy polling, or rely on interrupts, to detect new packets received in a packet buffer from one or more ports. Based on detection of a new received packet, the virtualized execution environment can process the received packet.
-  At 822, the virtualized execution environment executing on the switch can determine whether data from the packet is to be made available or discarded. For example, if the packet is subject to a deny status on an access control list (ACL), the packet can be discarded. If the data is determined to be provided to a next virtualized execution environment, the process can continue to 824. If the packet is determined to be discarded, the process can continue to 826, where the packet is discarded.
-  At 824, the virtualized execution environment can notify a virtualized execution environment executed on a server that data is available and provide the data for access by a virtualized execution environment executed on a server. The virtualized execution environment executed on the switch can cause the data to be copied to a buffer accessible to the virtualized execution environment executed on the server. For example, direct memory access (DMA), RDMA, or other direct copy scheme can be used to copy the data to the buffer. In other examples, the data is made available to a virtualized execution environment executed on the switch for processing.
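-  A compact sketch of blocks 822 through 826 appears below: a packet's flow is checked against deny rules, allowed payloads are copied into the destination buffer (a list stands in for the DMA/RDMA copy), and denied packets are discarded at the switch. The ACL rule format, field names, and counters are assumptions for illustration.

```python
# Sketch of blocks 822-826 (assumed ACL format and buffer model): a packet's
# flow is checked against deny rules; allowed payloads are copied into the
# destination VEE's buffer, denied packets are discarded and counted.
ACL_DENY = {("10.9.9.9", 80), ("203.0.113.7", 443)}   # (src_ip, dst_port)

server_buffer = []        # stands in for a buffer such as 512-0/512-1
discarded = 0

def handle_packet(pkt):
    global discarded
    if (pkt["src_ip"], pkt["dst_port"]) in ACL_DENY:
        discarded += 1                      # block 826: discard at the switch
        return
    server_buffer.append(pkt["payload"])    # block 824: copy data to server

handle_packet({"src_ip": "10.9.9.9", "dst_port": 80, "payload": b"bad"})
handle_packet({"src_ip": "10.0.0.2", "dst_port": 80, "payload": b"good"})
print(len(server_buffer), discarded)        # 1 delivered, 1 discarded
```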
-  FIG. 8C depicts an example process. The process can be performed by a processor enhanced switch in accordance with various embodiments. At 830, the switch can be configured to execute a virtualized execution environment to perform retrieval of data from a device in the same or different rack as that of the switch or copying of data to a device in the same or different rack as that of the switch.
-  At 832, the virtualized execution environment can be configured with information on destination devices that are associated with memory addresses. For example, information can indicate a translation of a destination device or server (e.g., IP address or MAC address) that corresponds to a memory address in a memory transaction. For example, for a read memory transaction, the device or server can store data corresponding to the memory address and the data can be read from the memory address at the device or server. For example, for a write memory transaction, the device or server can receive and store data corresponding to the address for the write transaction.
-  At 834, the switch can receive a memory access request from a server of the same rack. At 836, the virtualized execution environment executing on the switch can manage the memory access request. In some examples, performance of 836 can include performance of 838, where the virtualized execution environment executing on the switch can transfer the memory access request to the destination server. In some examples, if a memory access request is to be sent to a server but the server does not store the requested data, the switch can re-direct the memory access request to the destination server that stores the requested data instead of sending the memory access request to the server, which in turn, sends the request to the destination server.
-  In some examples, performance of 836 can include performance of 840, where the virtualized execution environment executing on the switch can perform the memory access request. If the memory access request is a write command, the virtualized execution environment can write data to a memory address corresponding to the memory access request in a device in a same or different rack. If the memory access request is a read command, the virtualized execution environment can copy data from a memory address corresponding to the memory access request in a device in a same or different rack. For example, remote direct memory access can be used to write or read the data.
-  For a read request, the switch can locally cache the data for access by a server connected to the switch. In cases where an orchestration control plane manages memory resources of the switch and servers, the retrieved data can be stored in a memory device of the switch or any server such that any virtualized execution environment executed on any server of a rack can access or modify the data. For example, the switch and the servers of the rack can access the data in such a memory device as a near memory. In cases where the data is updated, the switch can write the updated data to the memory device that stores the data.
-  For example, block 840 can be performed in a scenario where the switch executes a Memcached server and data is stored in a server that is in a same rack as that of the switch. The Memcached server executing on the switch can respond to a memory access request that corresponds to a cache miss by retrieving data from another server and storing the retrieved data in a cache in a memory or storage of the rack.
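-  The address-to-destination handling of blocks 832 through 840 can be pictured with the following sketch: the switch maps a memory address to the device that owns it, serves reads from a switch-local cache when possible, and otherwise fetches from the owning device and caches the result for later access by servers in the rack. The address ranges, device names, and dictionary-backed stores are assumptions, not elements of the described embodiments.

```python
# Sketch of blocks 832-840 (assumed address ranges and names): map an
# address to its owning device, serve reads from a rack-local cache when
# possible, otherwise fetch from the owner and cache the result.
ADDRESS_MAP = [                              # (start, end, owning device)
    (0x0000_0000, 0x0FFF_FFFF, "server-510-0"),
    (0x1000_0000, 0x1FFF_FFFF, "server-510-1"),
    (0x2000_0000, 0x2FFF_FFFF, "rack-7/server-3"),   # device in another rack
]
BACKING_STORE = {"server-510-0": {}, "server-510-1": {}, "rack-7/server-3": {}}
switch_cache = {}

def owner_of(addr):
    for start, end, device in ADDRESS_MAP:
        if start <= addr <= end:
            return device
    raise ValueError("unmapped address")

def read(addr):
    if addr in switch_cache:                 # served from rack-local cache
        return switch_cache[addr]
    value = BACKING_STORE[owner_of(addr)].get(addr)   # fetch from the owner
    switch_cache[addr] = value               # block 840: cache as near memory
    return value

def write(addr, value):
    BACKING_STORE[owner_of(addr)][addr] = value       # send to owning device
    switch_cache[addr] = value               # keep the cached copy consistent

write(0x2000_0010, b"remote data")
print(read(0x2000_0010))                     # served from the switch cache
```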
-  FIG. 9 depicts a system. The system can utilize a switch to manage resources in the system and perform other embodiments described herein. System 900 includes processor 910, which provides processing, operation management, and execution of instructions for system 900. Processor 910 can include any type of microprocessor, central processing unit (CPU), graphics processing unit (GPU), processing core, or other processing hardware to provide processing for system 900, or a combination of processors. Processor 910 controls the overall operation of system 900, and can be or include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
-  In one example, system 900 includes interface 912 coupled to processor 910, which can represent a higher speed interface or a high throughput interface for system components that need higher bandwidth connections, such as memory subsystem 920, graphics interface components 940, or accelerators 942. Interface 912 represents an interface circuit, which can be a standalone component or integrated onto a processor die. Where present, graphics interface 940 interfaces to graphics components for providing a visual display to a user of system 900. In one example, graphics interface 940 can drive a high definition (HD) display that provides an output to a user. High definition can refer to a display having a pixel density of approximately 100 PPI (pixels per inch) or greater and can include formats such as full HD (e.g., 1080p), retina displays, 4K (ultra-high definition or UHD), or others. In one example, the display can include a touchscreen display. In one example, graphics interface 940 generates a display based on data stored in memory 930 or based on operations executed by processor 910 or both.
-  Accelerators 942 can be programmable or fixed function offload engines that can be accessed or used by processor 910. For example, an accelerator among accelerators 942 can provide compression (DC) capability, cryptography services such as public key encryption (PKE), cipher, hash/authentication capabilities, decryption, or other capabilities or services. In some embodiments, in addition or alternatively, an accelerator among accelerators 942 provides field select controller capabilities as described herein. In some cases, accelerators 942 can be integrated into a CPU or connected to a CPU by various devices (e.g., a connector to a motherboard or circuit board that includes a CPU and provides an electrical interface with the CPU). For example, accelerators 942 can include a single or multi-core processor, graphics processing unit, logical execution units, single or multi-level cache, functional units usable to independently execute programs or threads, application specific integrated circuits (ASICs), neural network processors (NNPs), programmable control logic, and programmable processing elements such as field programmable gate arrays (FPGAs). Accelerators 942 can provide multiple neural networks, CPUs, processor cores, general purpose graphics processing units, or graphics processing units that can be made available for use by artificial intelligence (AI) or machine learning (ML) models. For example, the AI model can use or include any or a combination of a reinforcement learning scheme, Q-learning scheme, deep-Q learning, or Asynchronous Advantage Actor-Critic (A3C), combinatorial neural network, recurrent combinatorial neural network, or other AI or ML model. Multiple neural networks, processor cores, or graphics processing units can be made available for use by AI or ML models.
-  Memory subsystem 920 represents the main memory of system 900 and provides storage for code to be executed by processor 910, or data values to be used in executing a routine. Memory subsystem 920 can include one or more memory devices 930 such as read-only memory (ROM), flash memory, one or more varieties of random-access memory (RAM) such as DRAM, or other memory devices, or a combination of such devices. Memory 930 stores and hosts, among other things, operating system (OS) 932 to provide a software platform for execution of instructions in system 900. Additionally, applications 934 can execute on the software platform of OS 932 from memory 930. Applications 934 represent programs that have their own operational logic to perform execution of one or more functions. Processes 936 represent agents or routines that provide auxiliary functions to OS 932 or one or more applications 934 or a combination. OS 932, applications 934, and processes 936 provide software logic to provide functions for system 900. In one example, memory subsystem 920 includes memory controller 922, which is a memory controller to generate and issue commands to memory 930. It will be understood that memory controller 922 could be a physical part of processor 910 or a physical part of interface 912. For example, memory controller 922 can be an integrated memory controller, integrated onto a circuit with processor 910.
-  While not specifically illustrated, it will be understood that system 900 can include one or more buses or bus systems between devices, such as a memory bus, a graphics bus, interface buses, or others. Buses or other signal lines can communicatively or electrically couple components together, or both communicatively and electrically couple the components. Buses can include physical communication lines, point-to-point connections, bridges, adapters, controllers, or other circuitry or a combination. Buses can include, for example, one or more of a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (Firewire).
-  In one example, system 900 includes interface 914, which can be coupled to interface 912. In one example, interface 914 represents an interface circuit, which can include standalone components and integrated circuitry. In one example, multiple user interface components or peripheral components, or both, couple to interface 914. Network interface 950 provides system 900 the ability to communicate with remote devices (e.g., servers or other computing devices) over one or more networks. Network interface 950 can include an Ethernet adapter, wireless interconnection components, cellular network interconnection components, USB (universal serial bus), or other wired or wireless standards-based or proprietary interfaces. Network interface 950 can transmit data to a device that is in the same data center or rack or a remote device, which can include sending data stored in memory. Network interface 950 can receive data from a remote device, which can include storing received data into memory. Various embodiments can be used in connection with network interface 950, processor 910, and memory subsystem 920.
-  In one example, system 900 includes one or more input/output (I/O) interface(s) 960. I/O interface 960 can include one or more interface components through which a user interacts with system 900 (e.g., audio, alphanumeric, tactile/touch, or other interfacing). Peripheral interface 970 can include any hardware interface not specifically mentioned above. Peripherals refer generally to devices that connect dependently to system 900. A dependent connection is one where system 900 provides the software platform or hardware platform or both on which operation executes, and with which a user interacts.
-  In one example, system 900 includes storage subsystem 980 to store data in a nonvolatile manner. In one example, in certain system implementations, at least certain components of storage 980 can overlap with components of memory subsystem 920. Storage subsystem 980 includes storage device(s) 984, which can be or include any conventional medium for storing large amounts of data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination. Storage 984 holds code or instructions and data 986 in a persistent state (e.g., the value is retained despite interruption of power to system 900). Storage 984 can be generically considered to be a “memory,” although memory 930 is typically the executing or operating memory to provide instructions to processor 910. Whereas storage 984 is nonvolatile, memory 930 can include volatile memory (e.g., the value or state of the data is indeterminate if power is interrupted to system 900). In one example, storage subsystem 980 includes controller 982 to interface with storage 984. In one example, controller 982 is a physical part of interface 914 or processor 910 or can include circuits or logic in both processor 910 and interface 914.
-  A volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM (Dynamic Random-Access Memory), or some variant such as Synchronous DRAM (SDRAM). Another example of volatile memory includes cache or static random-access memory (SRAM). A memory subsystem as described herein may be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electronic Device Engineering Council) on Jun. 27, 2007), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD235, originally published by JEDEC in October 2013), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications. For example, DDR or DDRx can refer to any version of DDR, where x is an integer.
-  A non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device. In one embodiment, the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell (“SLC”), Multi-Level Cell (“MLC”), Quad-Level Cell (“QLC”), Tri-Level Cell (“TLC”), or some other NAND). An NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte addressable write-in-place NVM device (also referred to as persistent memory), such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS), Intel® Optane™ memory, NVM devices that use chalcogenide phase change material (for example, chalcogenide glass), resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM), nanowire memory, ferroelectric random access memory (FeRAM, FRAM), magnetoresistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Transfer) based device, a thyristor based memory device, or a combination of any of the above, or other memory.
-  A power source (not depicted) provides power to the components of system 900. More specifically, the power source typically interfaces to one or multiple power supplies in system 900 to provide power to the components of system 900. In one example, the power supply includes an AC to DC (alternating current to direct current) adapter to plug into a wall outlet. Such AC power can be provided by a renewable energy (e.g., solar power) power source. In one example, the power source includes a DC power source, such as an external AC to DC converter. In one example, the power source or power supply includes wireless charging hardware to charge via proximity to a charging field. In one example, the power source can include an internal battery, alternating current supply, motion-based power supply, solar power supply, or fuel cell source.
-  In an example, system 900 can be implemented using interconnected compute sleds of processors, memories, storages, network interfaces, and other components. High speed interconnects can be used such as PCIe, Ethernet, or optical interconnects (or a combination thereof).
-  In an example, system 900 can be implemented using interconnected compute sleds of processors, memories, storages, network interfaces, and other components. High speed interconnects can be used such as: Ethernet (IEEE 802.3), remote direct memory access (RDMA), InfiniBand, Internet Wide Area RDMA Protocol (iWARP), Transmission Control Protocol (TCP), User Datagram Protocol (UDP), quick User Datagram Protocol Internet Connections (QUIC), RDMA over Converged Ethernet (RoCE), Peripheral Component Interconnect express (PCIe), Intel QuickPath Interconnect (QPI), Intel Ultra Path Interconnect (UPI), Intel On-Chip System Fabric (IOSF), Omnipath, Compute Express Link (CXL), HyperTransport, high-speed fabric, NVLink, Advanced Microcontroller Bus Architecture (AMBA) interconnect, OpenCAPI, Gen-Z, Cache Coherent Interconnect for Accelerators (CCIX), 3GPP Long Term Evolution (LTE) (4G), 3GPP 5G, and variations thereof.
-  Embodiments herein may be implemented in various types of computing devices, smart phones, tablets, personal computers, and networking equipment, such as switches, routers, racks, and blade servers such as those employed in a data center and/or server farm environment. The servers used in data centers and server farms comprise arrayed server configurations such as rack-based servers or blade servers. These servers are interconnected in communication via various network provisions, such as partitioning sets of servers into Local Area Networks (LANs) with appropriate switching and routing facilities between the LANs to form a private Intranet. For example, cloud hosting facilities may typically employ large data centers with a multitude of servers. A blade comprises a separate computing platform that is configured to perform server-type functions, that is, a “server on a card.” Accordingly, each blade includes components common to conventional servers, including a main printed circuit board (main board) providing internal wiring (e.g., buses) for coupling appropriate integrated circuits (ICs) and other components mounted to the board.
-  FIG. 10 depicts an environment 1000 that includes multiple computing racks 1002, each including a Top of Rack (ToR) switch 1004, a pod manager 1006, and a plurality of pooled system drawers. Embodiments of the switch herein can be used to manage device resources, virtual execution environment operation, and data locality to a VEE (e.g., storage of data in the same rack as that which executes the VEE). Generally, the pooled system drawers may include pooled compute drawers and pooled storage drawers. Optionally, the pooled system drawers may also include pooled memory drawers and pooled Input/Output (I/O) drawers. In the illustrated embodiment the pooled system drawers include an Intel® XEON® pooled compute drawer 1008, an Intel® ATOM™ pooled compute drawer 1010, a pooled storage drawer 1012, a pooled memory drawer 1014, and a pooled I/O drawer 1016. Each of the pooled system drawers is connected to ToR switch 1004 via a high-speed link 1018, such as a 40 Gigabit/second (Gb/s) or 100 Gb/s Ethernet link or a 100+Gb/s Silicon Photonics (SiPh) optical link.
-  Multiple of the computing racks 1002 may be interconnected via their ToR switches 1004 (e.g., to a pod-level switch or data center switch), as illustrated by connections to a network 1020. In some embodiments, groups of computing racks 1002 are managed as separate pods via pod manager(s) 1006. In one embodiment, a single pod manager is used to manage all of the racks in the pod. Alternatively, distributed pod managers may be used for pod management operations.
-  Environment 1000 further includes a management interface 1022 that is used to manage various aspects of the environment. This includes managing rack configuration, with corresponding parameters stored as rack configuration data 1024.
-  FIG. 11 depicts an example network element that can be used by embodiments of the switch herein. Various embodiments of a switch can perform any operations of network interface 1100. In some examples, network interface 1100 can be implemented as a network interface controller, network interface card, host fabric interface (HFI), or host bus adapter (HBA). Network interface 1100 can be coupled to one or more servers using a bus, PCIe, CXL, or DDRx. Network interface 1100 may be embodied as part of a system-on-a-chip (SoC) that includes one or more processors, or included on a multichip package that also contains one or more processors.
-  Network interface 1100 can include transceiver 1102, processors 1104, transmit queue 1106, receive queue 1108, memory 1110, bus interface 1112, and DMA engine 1152. Transceiver 1102 can be capable of receiving and transmitting packets in conformance with the applicable protocols such as Ethernet as described in IEEE 802.3, although other protocols may be used. Transceiver 1102 can receive and transmit packets from and to a network via a network medium (not depicted). Transceiver 1102 can include PHY circuitry 1114 and media access control (MAC) circuitry 1116. PHY circuitry 1114 can include encoding and decoding circuitry (not shown) to encode and decode data packets according to applicable physical layer specifications or standards. MAC circuitry 1116 can be configured to assemble data to be transmitted into packets that include destination and source addresses along with network control information and error detection hash values. Processors 1104 can be any or a combination of: a processor, core, graphics processing unit (GPU), field programmable gate array (FPGA), application specific integrated circuit (ASIC), or other programmable hardware device that allows programming of network interface 1100. For example, processors 1104 can provide for identification of a resource to use to perform a workload and generation of a bitstream for execution on the selected resource. For example, a “smart network interface” can provide packet processing capabilities in the network interface using processors 1104.
-  Packet allocator 1124 can provide distribution of received packets for processing by multiple CPUs or cores using timeslot allocation described herein or RSS. When packet allocator 1124 uses RSS, packet allocator 1124 can calculate a hash or make another determination based on contents of a received packet to determine which CPU or core is to process a packet.
-  Interrupt coalesce 1122 can perform interrupt moderation whereby interrupt coalesce 1122 waits for multiple packets to arrive, or for a time-out to expire, before generating an interrupt to the host system to process received packet(s). Receive Segment Coalescing (RSC) can be performed by network interface 1100 whereby portions of incoming packets are combined into segments of a packet. Network interface 1100 provides this coalesced packet to an application.
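-  The interrupt moderation behavior just described can be sketched as follows: an interrupt is raised only after a batch of packets accumulates or a wait threshold elapses, rather than once per packet. The batch size, wait threshold, and class/function names are assumptions for illustration only.

```python
# Sketch of interrupt moderation (assumed thresholds): raise an interrupt
# only after a batch of packets accumulates or a timeout elapses, instead
# of once per packet.
import time

class InterruptCoalescer:
    def __init__(self, max_batch=8, max_wait_s=0.002, raise_irq=print):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.raise_irq = raise_irq
        self.pending = []
        self.first_arrival = None

    def on_packet(self, pkt):
        if not self.pending:
            self.first_arrival = time.monotonic()
        self.pending.append(pkt)
        self._maybe_fire()

    def _maybe_fire(self):
        waited = time.monotonic() - self.first_arrival
        if len(self.pending) >= self.max_batch or waited >= self.max_wait_s:
            self.raise_irq(f"IRQ for {len(self.pending)} packet(s)")
            self.pending.clear()

coalescer = InterruptCoalescer(max_batch=3)
for i in range(7):
    coalescer.on_packet(f"pkt{i}")   # IRQs fire per 3-packet batch;
                                     # a real device would also fire on
                                     # timeout for any leftover packet
```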
-  Direct memory access (DMA) engine 1152 can copy a packet header, packet payload, and/or descriptor directly from host memory to the network interface or vice versa, instead of copying the packet to an intermediate buffer at the host and then using another copy operation from the intermediate buffer to the destination buffer. In some examples, DMA engine 1152 can perform writes of data to any cache such as by using Data Direct I/O (DDIO).
-  Memory 1110 can be any type of volatile or non-volatile memory device and can store any queue or instructions used to program network interface 1100. Transmit queue 1106 can include data or references to data for transmission by the network interface. Receive queue 1108 can include data or references to data that was received by the network interface from a network. Descriptor queues 1120 can include descriptors that reference data or packets in transmit queue 1106 or receive queue 1108. Bus interface 1112 can provide an interface with a host device (not depicted). For example, bus interface 1112 can be compatible with PCI, PCI Express, PCI-x, PHY Interface for the PCI Express (PIPE), Serial ATA, and/or USB compatible interface (although other interconnection standards may be used).
-  In some examples, the network interface and other embodiments described herein can be used in connection with a base station (e.g., 3G, 4G, 5G and so forth), macro base station (e.g., 5G networks), picostation (e.g., an IEEE 802.11 compatible access point), nanostation (e.g., for Point-to-MultiPoint (PtMP) applications), on-premises data centers, off-premises data centers, edge network elements, fog network elements, and/or hybrid data centers (e.g., data centers that use virtualization, cloud and software-defined networking to deliver application workloads across physical data centers and distributed multi-cloud environments).
-  Various examples may be implemented using hardware elements, software elements, or a combination of both. In some examples, hardware elements may include devices, components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, ASICs, PLDs, DSPs, FPGAs, memory units, logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. In some examples, software elements may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, APIs, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given implementation. A processor can be one or more combination of a hardware state machine, digital control logic, central processing unit, or any hardware, firmware and/or software elements.
-  Some examples may be implemented using or as an article of manufacture or at least one computer-readable medium. A computer-readable medium may include a non-transitory storage medium to store logic. In some examples, the non-transitory storage medium may include one or more types of computer-readable storage media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. In some examples, the logic may include various software elements, such as software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, API, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof.
-  According to some examples, a computer-readable medium may include a non-transitory storage medium to store or maintain instructions that when executed by a machine, computing device or system, cause the machine, computing device or system to perform methods and/or operations in accordance with the described examples. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like. The instructions may be implemented according to a predefined computer language, manner or syntax, for instructing a machine, computing device or system to perform a certain function. The instructions may be implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
-  One or more aspects of at least one example may be implemented by representative instructions stored on at least one machine-readable medium which represents various logic within the processor, which when read by a machine, computing device or system causes the machine, computing device or system to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores,” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
-  The appearances of the phrase “one example” or “an example” are not necessarily all referring to the same example or embodiment. Any aspect described herein can be combined with any other aspect or similar aspect described herein, regardless of whether the aspects are described with respect to the same figure or element. Division, omission or inclusion of block functions depicted in the accompanying figures does not infer that the hardware components, circuits, software and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.
-  Some examples may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, descriptions using the terms “connected” and/or “coupled” may indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
-  The terms “first,” “second,” and the like, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items. The term “asserted” used herein with reference to a signal denotes a state of the signal, in which the signal is active, and which can be achieved by applying any logic level, either logic 0 or logic 1, to the signal. The terms “follow” or “after” can refer to immediately following or following after some other event or events. Other sequences of steps may also be performed according to alternative embodiments. Furthermore, additional steps may be added or removed depending on the particular applications. Any combination of changes can be used and one of ordinary skill in the art with the benefit of this disclosure would understand the many variations, modifications, and alternative embodiments thereof.
-  Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. Additionally, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, should also be understood to mean X, Y, Z, or any combination thereof, including “X, Y, and/or Z.”
-  Illustrative examples of the devices, systems, and methods disclosed herein are provided below. An embodiment of the devices, systems, and methods may include any one or more, and any combination of, the examples described below.
-  Example 1 includes a method comprising a switch device for a rack of two or more physical servers, wherein the switch device is coupled to the two or more physical servers and the switch device performs packet protocol processing termination for received packets and provides payload data from the received packets without a received packet header to a destination buffer of a destination physical server in the rack.
-  Example 2 includes any example, wherein the switch device comprises at least one central processing unit, the at least one central processing unit is to execute packet processing operations on the received packets.
-  Example 3 includes any example, wherein a physical server executes at least one virtualized execution environment (VEE) and the at least one central processing unit executes a VEE for packet processing of packets with data to be accessed by the physical server that executes the VEE.
-  Example 4 includes any example, wherein the switch device stores a mapping of memory addresses and corresponding destination devices, and based on receipt of a memory transaction from a physical server in the rack, the switch device performs the memory transaction.
-  Example 5 includes any example, wherein the switch device performs the memory transaction comprises: for a read request, the switch device retrieves data from a physical server connected to the rack or another device of a different rack based on the mapping and stores the data into a memory managed by the switch device.
-  Example 6 includes any example, wherein the switch device stores a mapping of memory addresses and corresponding destination devices, and based on receipt of a memory transaction from a physical server in the rack: based on a memory address associated with the memory transaction being associated with a destination server in another rack according to the mapping, transmitting the memory transaction to the destination server, receiving a response to the memory transaction, and storing the response in a memory of the rack.
-  Example 7 includes any example, wherein the switch device comprises at least one central processing unit, the at least one central processing unit to execute a control plane for one or more physical servers that are part of the rack and the control plane to collect telemetry data from the one or more physical servers and based on the telemetry data, perform one or more of: allocation of execution of a virtualized execution environment (VEE) to a physical server of the rack, migration of a VEE from a physical server of the rack to execution on at least one central processing unit of the switch device, migration of a VEE from a physical server of the rack to execution on another physical server of the rack, or allocation of memory of a physical server of the rack for access by a VEE executing on a physical server of the rack.
-  Example 8 includes any example, wherein the switch device comprises at least one central processing unit, the at least one central processing unit to execute a control plane for one or more physical servers that are part of the rack and the control plane distributes execution of virtualized execution environments (VEEs) among one or more physical servers of the rack and selectively terminates a VEE or migrates a VEE to execution on another physical server of the rack or on the switch device.
-  Example 9 includes any example, and includes an apparatus comprising: a switch comprising: at least one processor, wherein the at least one processor is to perform packet termination processing of a received packet and copy payload data from the received packet without an associated received packet header to a destination buffer of a destination physical server through a connection.
-  Example 10 includes any example, wherein the at least one processor is to execute a virtualized execution environment (VEE) and wherein the VEE is to perform the packet termination processing.
-  Example 11 includes any example, wherein based on receipt of a memory transaction from a physical server through the connection, the at least one processor is to perform the memory transaction based on a mapping of memory addresses and corresponding destination devices.
-  Example 12 includes any example, wherein to perform the memory transaction, the at least one processor is to: for a read request: retrieve data from a physical server connected to the at least one processor through the connection or another device of a different rack and store the data into a memory managed by the at least one processor.
-  Example 13 includes any example, wherein: based on receipt of a memory transaction from a physical server in a rack associated with the switch: based on a memory address associated with the memory transaction being associated with a destination server in another rack according to a mapping of memory addresses and corresponding destination devices, the at least one processor is to cause transmission of the memory transaction to the destination server, the at least one processor is to access a response to the memory transaction, and the at least one processor is to cause the response to be stored in a memory of the rack.
-  Example 14 includes any example, wherein: the at least one processor is to execute a control plane for one or more physical servers that are part of a rack associated with the switch and the control plane is to collect telemetry data from the one or more physical servers and, based on the telemetry data, perform one or more of: allocation of execution of a virtualized execution environment (VEE) to a physical server of the rack, migration of a VEE from a physical server of the rack to execution on the at least one processor of the switch, migration of a VEE from a physical server of the rack to execution on another physical server of the rack, or allocation of memory of a server of the rack for access by a VEE executing on a physical server of the rack.
-  Example 15 includes any example, wherein: the at least one processor is to execute a control plane for one or more physical servers that are part of a rack associated with the switch and the control plane is to distribute execution of virtualized execution environments (VEEs) among one or more physical servers of the rack and selectively terminate a VEE or migrate a VEE to execution on another physical server of the rack or on at least one processor that is part of the switch.
-  Example 16 includes any example, wherein the connection is compatible with one or more of: Peripheral Component Interconnect express (PCIe), Compute Express Link (CXL), or any type of Double Data Rate (DDR).
-  Example 17 includes any example, and includes at least one non-transitory computer-readable medium comprising instructions stored thereon, that if executed by a switch, cause the switch to: execute a control plane at the switch to collect telemetry data from one or more physical servers and based on the telemetry data, perform one or more of: allocation of execution of a virtualized execution environment (VEE) to a physical server of a rack that includes the switch, migration of a VEE from a physical server of the rack to execution on at least one central processing unit of the switch, migration of a VEE from a physical server of the rack to execution on another physical server of the rack, or allocation of memory of a server of the rack for access by a VEE executing on a physical server of the rack.
-  Example 18 includes any example, comprising instructions stored thereon, that if executed by a switch, cause the switch to: store a mapping of memory addresses and corresponding destination devices and, based on receipt of a memory transaction from a physical server through a connection and based on the mapping of memory addresses and corresponding destination devices, retrieve data from a physical server connected to the switch through the connection or from another device of a different rack and store the data into a memory managed by the switch.
-  Example 19 includes any example, comprising instructions stored thereon, that if executed by a switch, cause the switch to: store a mapping of memory addresses and corresponding destination devices and, based on receipt of a memory transaction from a server in a rack associated with the switch and based on a memory address associated with the memory transaction being associated with a destination server in another rack according to the mapping, transmit the memory transaction to the destination server, receive a response to the memory transaction, and store the response in a memory of the rack.
-  Example 20 includes any example, wherein a connection between the switch and one or more physical servers of the rack is compatible with one or more of: Peripheral Component Interconnect express (PCIe), Compute Express Link (CXL), or any type of Double Data Rate (DDR).
-  Example 21 includes any example, and includes a network device comprising: circuitry to perform network protocol termination for received packets; at least one Ethernet port; and multiple connections to be connected to different physical servers in a rack, wherein the circuitry to perform network protocol termination for received packets is to provide a payload of a received packet without an associated header to a physical server.
Claims (21)
Priority Applications (7)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| US16/905,761 US20200322287A1 (en) | 2020-06-18 | 2020-06-18 | Switch-managed resource allocation and software execution | 
| JP2022568889A JP7729016B2 (en) | 2020-06-18 | 2020-12-11 | Switch-managed resource allocation and software enforcement | 
| PCT/US2020/064670 WO2021257111A1 (en) | 2020-06-18 | 2020-12-11 | Switch-managed resource allocation and software execution | 
| CN202080101003.8A CN115668886A (en) | 2020-06-18 | 2020-12-11 | Resource allocation and software execution for switch management | 
| EP24198005.1A EP4447421A3 (en) | 2020-06-18 | 2020-12-11 | Switch-managed resource allocation and software execution | 
| EP20940730.3A EP4169216A4 (en) | 2020-06-18 | 2020-12-11 | SWITCH-MANAGED RESOURCE ALLOCATION AND SOFTWARE EXECUTION | 
| US18/768,909 US12413539B2 (en) | 2020-06-18 | 2024-07-10 | Switch-managed resource allocation and software execution | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| US16/905,761 US20200322287A1 (en) | 2020-06-18 | 2020-06-18 | Switch-managed resource allocation and software execution | 
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| US18/768,909 Continuation US12413539B2 (en) | 2020-06-18 | 2024-07-10 | Switch-managed resource allocation and software execution | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| US20200322287A1 true US20200322287A1 (en) | 2020-10-08 | 
Family
ID=72663355
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| US16/905,761 Abandoned US20200322287A1 (en) | 2020-06-18 | 2020-06-18 | Switch-managed resource allocation and software execution | 
| US18/768,909 Active US12413539B2 (en) | 2020-06-18 | 2024-07-10 | Switch-managed resource allocation and software execution | 
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| US18/768,909 Active US12413539B2 (en) | 2020-06-18 | 2024-07-10 | Switch-managed resource allocation and software execution | 
Country Status (5)
| Country | Link | 
|---|---|
| US (2) | US20200322287A1 (en) | 
| EP (2) | EP4169216A4 (en) | 
| JP (1) | JP7729016B2 (en) | 
| CN (1) | CN115668886A (en) | 
| WO (1) | WO2021257111A1 (en) | 
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20220291928A1 (en) * | 2022-05-03 | 2022-09-15 | Intel Corporation | Event controller in a device | 
| WO2024103338A1 (en) * | 2022-11-17 | 2024-05-23 | Nvidia Corporation | Application programming interface to perform asynchronous data movement | 
| CN116700887A (en) * | 2023-05-10 | 2023-09-05 | 阿里巴巴(中国)有限公司 | Cabinet and virtual server creation method | 
| CN119629097A (en) * | 2023-09-12 | 2025-03-14 | 瑞昱半导体股份有限公司 | System and method for prototyping a circuit under test | 
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN1503506B (en) | 2002-11-20 | 2010-05-12 | 株式会社日立制作所 | virtual access router | 
| JP5018890B2 (en) | 2007-10-31 | 2012-09-05 | 富士通株式会社 | COMMUNICATION METHOD, COMMUNICATION TERMINAL, DATA TRANSFER DEVICE, AND CONTROL DEVICE | 
| US9497039B2 (en) | 2009-05-28 | 2016-11-15 | Microsoft Technology Licensing, Llc | Agile data center network architecture | 
| US20130114607A1 (en) | 2011-11-09 | 2013-05-09 | Jeffrey S. McGovern | Reference Architecture For Improved Scalability Of Virtual Data Center Resources | 
| US9825884B2 (en) * | 2013-12-30 | 2017-11-21 | Cavium, Inc. | Protocol independent programmable switch (PIPS) software defined data center networks | 
| US9813302B2 (en) * | 2015-08-28 | 2017-11-07 | Tigera, Inc. | Data center networks | 
| US10263832B1 (en) | 2016-12-29 | 2019-04-16 | Juniper Networks, Inc. | Physical interface to virtual interface fault propagation | 
| EP3574679B1 (en) | 2017-01-24 | 2021-06-23 | Telefonaktiebolaget LM Ericsson (PUBL) | Lossless handover for mobility with location identifier separation protocol in 3rd generation partnership project networks | 
| US11563698B2 (en) | 2017-11-30 | 2023-01-24 | Telefonaktiebolaget Lm Ericsson (Publ) | Packet value based packet processing | 
| US10956336B2 (en) * | 2018-07-20 | 2021-03-23 | International Business Machines Corporation | Efficient silent data transmission between computer servers | 
| US20200104275A1 (en) * | 2019-12-02 | 2020-04-02 | Intel Corporation | Shared memory space among devices | 
| US20210184795A1 (en) * | 2019-12-16 | 2021-06-17 | Nvidia Corporation | Accelerated parallel processing of 5g nr signal information | 
| US20210311871A1 (en) * | 2020-04-06 | 2021-10-07 | Samsung Electronics Co., Ltd. | System and method for aggregating server memory | 
- 2020
  - 2020-06-18 US US16/905,761 patent/US20200322287A1/en not_active Abandoned
  - 2020-12-11 WO PCT/US2020/064670 patent/WO2021257111A1/en not_active Ceased
  - 2020-12-11 CN CN202080101003.8A patent/CN115668886A/en active Pending
  - 2020-12-11 EP EP20940730.3A patent/EP4169216A4/en active Pending
  - 2020-12-11 EP EP24198005.1A patent/EP4447421A3/en active Pending
  - 2020-12-11 JP JP2022568889A patent/JP7729016B2/en active Active
- 2024
  - 2024-07-10 US US18/768,909 patent/US12413539B2/en active Active
Patent Citations (16)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20080126507A1 (en) * | 2006-08-31 | 2008-05-29 | Keith Iain Wilkinson | Shared memory message switch and cache | 
| US20100064070A1 (en) * | 2008-09-01 | 2010-03-11 | Chihiro Yoshimura | Data transfer unit for computer | 
| US20140032795A1 (en) * | 2011-04-13 | 2014-01-30 | Hewlett-Packard Development Company, L.P. | Input/output processing | 
| US9233520B2 (en) * | 2012-03-28 | 2016-01-12 | W. L. Gore & Associates, Inc. | Laminated articles having discontinuous adhesive regions | 
| US20140013066A1 (en) * | 2012-07-05 | 2014-01-09 | Samsung Electronics Co., Ltd. | Memory sub-system and computing system including the same | 
| US20150222443A1 (en) * | 2014-02-03 | 2015-08-06 | International Business Machines Corporation | Computer-based flow synchronization for efficient multicast forwarding for products and services | 
| US20150227312A1 (en) * | 2014-02-12 | 2015-08-13 | Oracle International Corporation | Method for steering dma write requests to cache memory | 
| US20170264493A1 (en) * | 2015-03-09 | 2017-09-14 | Vapor IO Inc. | Autonomous distributed workload and infrastructure scheduling | 
| US10412002B1 (en) * | 2015-03-25 | 2019-09-10 | Amazon Technologies, Inc. | Processing packet data using an offload engine in a service provider environment | 
| US10001933B1 (en) * | 2015-06-23 | 2018-06-19 | Amazon Technologies, Inc. | Offload pipeline for data copying | 
| US20170052916A1 (en) * | 2015-08-17 | 2017-02-23 | Brocade Communications Systems, Inc. | PCI Express Connected Network Switch | 
| US20170090987A1 (en) * | 2015-09-26 | 2017-03-30 | Intel Corporation | Real-Time Local and Global Datacenter Network Optimizations Based on Platform Telemetry Data | 
| US20190044705A1 (en) * | 2018-03-16 | 2019-02-07 | Intel Corporation | Technologies for accelerated quic packet processing with hardware offloads | 
| US20190391835A1 (en) * | 2018-06-26 | 2019-12-26 | Dell Products L.P. | Systems and methods for migration of computing resources based on input/output device proximity | 
| US20190044893A1 (en) * | 2018-06-30 | 2019-02-07 | Intel Corporation | Technologies for buffering received network packet data | 
| US11461123B1 (en) * | 2019-11-21 | 2022-10-04 | Amazon Technologies, Inc. | Dynamic pre-copy and post-copy determination for live migration between cloud regions and edge locations | 
Cited By (132)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US12401544B2 (en) | 2013-07-10 | 2025-08-26 | VMware LLC | Connectivity in an edge-gateway multipath system | 
| US11804988B2 (en) | 2013-07-10 | 2023-10-31 | Nicira, Inc. | Method and system of overlay flow control | 
| US12425335B2 (en) | 2015-04-13 | 2025-09-23 | VMware LLC | Method and system of application-aware routing with crowdsourcing | 
| US11677720B2 (en) | 2015-04-13 | 2023-06-13 | Nicira, Inc. | Method and system of establishing a virtual private network in a cloud service for branch networking | 
| US12160408B2 (en) | 2015-04-13 | 2024-12-03 | Nicira, Inc. | Method and system of establishing a virtual private network in a cloud service for branch networking | 
| US12003349B2 (en) * | 2016-09-09 | 2024-06-04 | Tyco Fire & Security Gmbh | Smart gateway devices, systems and methods for providing communication between HVAC system networks | 
| US20230261902A1 (en) * | 2016-09-09 | 2023-08-17 | Johnson Controls Tyco IP Holdings LLP | Smart gateway devices, systems and methods for providing communication between hvac system networks | 
| US11700196B2 (en) | 2017-01-31 | 2023-07-11 | Vmware, Inc. | High performance software-defined core network | 
| US11706127B2 (en) | 2017-01-31 | 2023-07-18 | Vmware, Inc. | High performance software-defined core network | 
| US11706126B2 (en) | 2017-01-31 | 2023-07-18 | Vmware, Inc. | Method and apparatus for distributed data network traffic optimization | 
| US11606286B2 (en) | 2017-01-31 | 2023-03-14 | Vmware, Inc. | High performance software-defined core network | 
| US12034630B2 (en) | 2017-01-31 | 2024-07-09 | VMware LLC | Method and apparatus for distributed data network traffic optimization | 
| US12058030B2 (en) | 2017-01-31 | 2024-08-06 | VMware LLC | High performance software-defined core network | 
| US12047244B2 (en) | 2017-02-11 | 2024-07-23 | Nicira, Inc. | Method and system of connecting to a multipath hub in a cluster | 
| US12335131B2 (en) | 2017-06-22 | 2025-06-17 | VMware LLC | Method and system of resiliency in cloud-delivered SD-WAN | 
| US11533248B2 (en) | 2017-06-22 | 2022-12-20 | Nicira, Inc. | Method and system of resiliency in cloud-delivered SD-WAN | 
| US11516049B2 (en) | 2017-10-02 | 2022-11-29 | Vmware, Inc. | Overlay network encapsulation to forward data message flows through multiple public cloud datacenters | 
| US11606225B2 (en) | 2017-10-02 | 2023-03-14 | Vmware, Inc. | Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SAAS provider | 
| US11855805B2 (en) | 2017-10-02 | 2023-12-26 | Vmware, Inc. | Deploying firewall for virtual network defined over public cloud infrastructure | 
| US11894949B2 (en) | 2017-10-02 | 2024-02-06 | VMware LLC | Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SaaS provider | 
| US11895194B2 (en) | 2017-10-02 | 2024-02-06 | VMware LLC | Layer four optimization for a virtual network defined over public cloud | 
| US11902086B2 (en) | 2017-11-09 | 2024-02-13 | Nicira, Inc. | Method and system of a dynamic high-availability mode based on current wide area network connectivity | 
| US11816043B2 (en) | 2018-06-25 | 2023-11-14 | Alibaba Group Holding Limited | System and method for managing resources of a storage device and quantifying the cost of I/O requests | 
| US11768709B2 (en) | 2019-01-02 | 2023-09-26 | Alibaba Group Holding Limited | System and method for offloading computation to storage nodes in distributed system | 
| US11379127B2 (en) * | 2019-07-18 | 2022-07-05 | Alibaba Group Holding Limited | Method and system for enhancing a distributed storage system by decoupling computation and network tasks | 
| US12132671B2 (en) | 2019-08-27 | 2024-10-29 | VMware LLC | Providing recommendations for implementing virtual networks | 
| US11606314B2 (en) | 2019-08-27 | 2023-03-14 | Vmware, Inc. | Providing recommendations for implementing virtual networks | 
| US11831414B2 (en) | 2019-08-27 | 2023-11-28 | Vmware, Inc. | Providing recommendations for implementing virtual networks | 
| US11617282B2 (en) | 2019-10-01 | 2023-03-28 | Alibaba Group Holding Limited | System and method for reshaping power budget of cabinet to facilitate improved deployment density of servers | 
| US11611507B2 (en) | 2019-10-28 | 2023-03-21 | Vmware, Inc. | Managing forwarding elements at edge nodes connected to a virtual network | 
| US11489783B2 (en) | 2019-12-12 | 2022-11-01 | Vmware, Inc. | Performing deep packet inspection in a software defined wide area network | 
| US12177130B2 (en) | 2019-12-12 | 2024-12-24 | VMware LLC | Performing deep packet inspection in a software defined wide area network | 
| US11716286B2 (en) | 2019-12-12 | 2023-08-01 | Vmware, Inc. | Collecting and analyzing data regarding flows associated with DPI parameters | 
| US11606712B2 (en) | 2020-01-24 | 2023-03-14 | Vmware, Inc. | Dynamically assigning service classes for a QOS aware network link | 
| US11689959B2 (en) | 2020-01-24 | 2023-06-27 | Vmware, Inc. | Generating path usability state for different sub-paths offered by a network link | 
| US12041479B2 (en) | 2020-01-24 | 2024-07-16 | VMware LLC | Accurate traffic steering between links through sub-path path quality metrics | 
| US11722925B2 (en) | 2020-01-24 | 2023-08-08 | Vmware, Inc. | Performing service class aware load balancing to distribute packets of a flow among multiple network links | 
| US11507499B2 (en) | 2020-05-19 | 2022-11-22 | Alibaba Group Holding Limited | System and method for facilitating mitigation of read/write amplification in data compression | 
| US11556277B2 (en) | 2020-05-19 | 2023-01-17 | Alibaba Group Holding Limited | System and method for facilitating improved performance in ordering key-value storage with input/output stack simplification | 
| US11212219B1 (en) * | 2020-06-26 | 2021-12-28 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | In-band telemetry packet size optimization | 
| US12425347B2 (en) | 2020-07-02 | 2025-09-23 | VMware LLC | Methods and apparatus for application aware hub clustering techniques for a hyper scale SD-WAN | 
| US11709710B2 (en) | 2020-07-30 | 2023-07-25 | Vmware, Inc. | Memory allocator for I/O operations | 
| US11363124B2 (en) | 2020-07-30 | 2022-06-14 | Vmware, Inc. | Zero copy socket splicing | 
| WO2022108658A1 (en) | 2020-11-17 | 2022-05-27 | Intel Corporation | Network interface device with support for hierarchical quality of service (qos) | 
| US11575591B2 (en) | 2020-11-17 | 2023-02-07 | Vmware, Inc. | Autonomous distributed forwarding plane traceability based anomaly detection in application traffic for hyper-scale SD-WAN | 
| EP4248632A4 (en) * | 2020-11-17 | 2024-09-11 | INTEL Corporation | Network interface device with support for hierarchical quality of service (qos) | 
| US11575600B2 (en) | 2020-11-24 | 2023-02-07 | Vmware, Inc. | Tunnel-less SD-WAN | 
| US12375403B2 (en) | 2020-11-24 | 2025-07-29 | VMware LLC | Tunnel-less SD-WAN | 
| US11487465B2 (en) | 2020-12-11 | 2022-11-01 | Alibaba Group Holding Limited | Method and system for a local storage engine collaborating with a solid state drive controller | 
| US12164977B2 (en) | 2020-12-23 | 2024-12-10 | Intel Corporation | Advanced queue monitoring system | 
| US11734115B2 (en) | 2020-12-28 | 2023-08-22 | Alibaba Group Holding Limited | Method and system for facilitating write latency reduction in a queue depth of one scenario | 
| US11929903B2 (en) | 2020-12-29 | 2024-03-12 | VMware LLC | Emulating packet flows to assess network links for SD-WAN | 
| US11601356B2 (en) | 2020-12-29 | 2023-03-07 | Vmware, Inc. | Emulating packet flows to assess network links for SD-WAN | 
| US11792127B2 (en) | 2021-01-18 | 2023-10-17 | Vmware, Inc. | Network-aware load balancing | 
| US12218845B2 (en) | 2021-01-18 | 2025-02-04 | VMware LLC | Network-aware load balancing | 
| US11979325B2 (en) | 2021-01-28 | 2024-05-07 | VMware LLC | Dynamic SD-WAN hub cluster scaling with machine learning | 
| KR102842132B1 (en) * | 2021-02-05 | 2025-08-04 | 삼성전자주식회사 | Systems and methods for storage device resource management | 
| KR20220113283A (en) * | 2021-02-05 | 2022-08-12 | 삼성전자주식회사 | Systems and methods for storage device resource management | 
| US12293094B2 (en) * | 2021-02-05 | 2025-05-06 | Samsung Electronics Co., Ltd. | Systems and methods for storage device resource management | 
| US20240143203A1 (en) * | 2021-02-05 | 2024-05-02 | Samsung Electronics Co., Ltd. | Systems and methods for storage device resource management | 
| US20230385190A1 (en) * | 2021-02-10 | 2023-11-30 | Huawei Technologies Co., Ltd. | Communication method, apparatus, and system | 
| US12380018B2 (en) * | 2021-02-10 | 2025-08-05 | Huawei Technologies Co., Ltd. | Method for host access to network device-managed memory pool | 
| US11726699B2 (en) | 2021-03-30 | 2023-08-15 | Alibaba Singapore Holding Private Limited | Method and system for facilitating multi-stream sequential read performance improvement with reduced read amplification | 
| US12368676B2 (en) | 2021-04-29 | 2025-07-22 | VMware LLC | Methods for micro-segmentation in SD-WAN for virtual networks | 
| US11637768B2 (en) | 2021-05-03 | 2023-04-25 | Vmware, Inc. | On demand routing mesh for routing packets through SD-WAN edge forwarding nodes in an SD-WAN | 
| US11509571B1 (en) | 2021-05-03 | 2022-11-22 | Vmware, Inc. | Cost-based routing mesh for facilitating routing through an SD-WAN | 
| US12009987B2 (en) | 2021-05-03 | 2024-06-11 | VMware LLC | Methods to support dynamic transit paths through hub clustering across branches in SD-WAN | 
| US11582144B2 (en) | 2021-05-03 | 2023-02-14 | Vmware, Inc. | Routing mesh to provide alternate routes through SD-WAN edge forwarding nodes based on degraded operational states of SD-WAN hubs | 
| US12218800B2 (en) | 2021-05-06 | 2025-02-04 | VMware LLC | Methods for application defined virtual network service among multiple transport in sd-wan | 
| US11729065B2 (en) | 2021-05-06 | 2023-08-15 | Vmware, Inc. | Methods for application defined virtual network service among multiple transport in SD-WAN | 
| US12411767B2 (en) | 2021-05-07 | 2025-09-09 | Samsung Electronics Co., Ltd. | Coherent memory system | 
| KR102851491B1 (en) * | 2021-05-07 | 2025-08-27 | 삼성전자주식회사 | Coherent memory system | 
| KR20220152132A (en) * | 2021-05-07 | 2022-11-15 | 삼성전자주식회사 | Coherent memory system | 
| US12301690B2 (en) * | 2021-05-26 | 2025-05-13 | Western Digital Technologies, Inc. | Allocation of distributed cache | 
| US20220385732A1 (en) * | 2021-05-26 | 2022-12-01 | Western Digital Technologies, Inc. | Allocation of distributed cache | 
| US20220398207A1 (en) * | 2021-06-09 | 2022-12-15 | Enfabrica Corporation | Multi-plane, multi-protocol memory switch fabric with configurable transport | 
| US11995017B2 (en) * | 2021-06-09 | 2024-05-28 | Enfabrica Corporation | Multi-plane, multi-protocol memory switch fabric with configurable transport | 
| US20220407740A1 (en) * | 2021-06-17 | 2022-12-22 | Avago Technologies International Sales Pte. Limited | Systems and methods for inter-device networking using intra-device protocols | 
| US11863347B2 (en) * | 2021-06-17 | 2024-01-02 | Avago Technologies International Sales Pte. Limited | Systems and methods for inter-device networking using intra-device protocols | 
| US12015536B2 (en) | 2021-06-18 | 2024-06-18 | VMware LLC | Method and apparatus for deploying tenant deployable elements across public clouds based on harvested performance metrics of types of resource elements in the public clouds | 
| US12250114B2 (en) | 2021-06-18 | 2025-03-11 | VMware LLC | Method and apparatus for deploying tenant deployable elements across public clouds based on harvested performance metrics of sub-types of resource elements in the public clouds | 
| US20230017643A1 (en) * | 2021-07-18 | 2023-01-19 | Elastics.cloud, Inc. | Composable infrastructure enabled by heterogeneous architecture, delivered by cxl based cached switch soc | 
| US12259816B2 (en) * | 2021-07-18 | 2025-03-25 | Avago Technologies International Sales Pte., Limited | Composable infrastructure enabled by heterogeneous architecture, delivered by CXL based cached switch SOC | 
| US12326813B2 (en) | 2021-07-18 | 2025-06-10 | Avago Technologies International Sales Pte. Limited | Heterogeneous architecture, delivered by cxl based cached switch SOC and extensible via cxloverethernet (COE) protocols | 
| US12360937B2 (en) | 2021-07-18 | 2025-07-15 | Avago Technologies International Sales Pte. Limited | Compute express Link™ (CXL) over ethernet (COE) | 
| US12386751B2 (en) | 2021-07-18 | 2025-08-12 | Avago Technologies International Sales Pte. Limited | Composable infrastructure enabled by heterogeneous architecture, delivered by CXL based cached switch SOC and extensible via cxloverethernet (COE) protocols | 
| US12047282B2 (en) | 2021-07-22 | 2024-07-23 | VMware LLC | Methods for smart bandwidth aggregation based dynamic overlay selection among preferred exits in SD-WAN | 
| US12267364B2 (en) | 2021-07-24 | 2025-04-01 | VMware LLC | Network management services in a virtual network | 
| US11943146B2 (en) | 2021-10-01 | 2024-03-26 | VMware LLC | Traffic prioritization in SD-WAN | 
| CN114020618A (en) * | 2021-10-30 | 2022-02-08 | 江苏信而泰智能装备有限公司 | High-availability test method and system based on FPGA and DPDK | 
| US11966634B2 (en) | 2021-12-02 | 2024-04-23 | Kioxia Corporation | Information processing system and memory system | 
| EP4202679A1 (en) * | 2021-12-24 | 2023-06-28 | Intel Corporation | Platform with configurable pooled resources | 
| US11899585B2 (en) | 2021-12-24 | 2024-02-13 | Western Digital Technologies, Inc. | In-kernel caching for distributed cache | 
| US11860797B2 (en) | 2021-12-30 | 2024-01-02 | Advanced Micro Devices, Inc. | Peripheral device protocols in confidential compute architectures | 
| WO2023129405A1 (en) * | 2021-12-30 | 2023-07-06 | Advanced Micro Devices, Inc. | Peripheral device protocols in confidential compute architectures | 
| US12184557B2 (en) | 2022-01-04 | 2024-12-31 | VMware LLC | Explicit congestion notification in a virtual environment | 
| US11934663B2 (en) | 2022-01-10 | 2024-03-19 | Western Digital Technologies, Inc. | Computational acceleration for distributed cache | 
| US12425395B2 (en) | 2022-01-15 | 2025-09-23 | VMware LLC | Method and system of securely adding an edge device operating in a public network to an SD-WAN | 
| US20240385957A1 (en) * | 2022-01-30 | 2024-11-21 | Huawei Technologies Co., Ltd. | Memory management method, system, and related apparatus | 
| US11797379B2 (en) | 2022-02-04 | 2023-10-24 | Western Digital Technologies, Inc. | Error detection and data recovery for distributed cache | 
| WO2023158484A1 (en) * | 2022-02-18 | 2023-08-24 | Microsoft Technology Licensing, Llc | Edge gateways in disaggregated networks | 
| US12238066B2 (en) | 2022-02-18 | 2025-02-25 | Microsoft Technology Licensing, Llc | Edge gateways in disaggregated networks | 
| US12430057B2 (en) | 2022-03-31 | 2025-09-30 | Intel Corporation | Dynamic multilevel memory system | 
| US12182022B2 (en) | 2022-05-10 | 2024-12-31 | Western Digital Technologies, Inc. | In-kernel cache request queuing for distributed cache | 
| US11909815B2 (en) | 2022-06-06 | 2024-02-20 | VMware LLC | Routing based on geolocation costs | 
| US20230108461A1 (en) * | 2022-06-07 | 2023-04-06 | Intel Corporation | Virtual device assignment framework | 
| US12386648B2 (en) | 2022-06-09 | 2025-08-12 | Western Digital Technologies, Inc. | Resource allocation in virtualized environments | 
| US12379951B2 (en) | 2022-06-27 | 2025-08-05 | Western Digital Technologies, Inc. | Memory coherence in virtualized environments | 
| US12166661B2 (en) | 2022-07-18 | 2024-12-10 | VMware LLC | DNS-based GSLB-aware SD-WAN for low latency SaaS applications | 
| US12237990B2 (en) | 2022-07-20 | 2025-02-25 | VMware LLC | Method for modifying an SD-WAN using metric-based heat maps | 
| US12316524B2 (en) | 2022-07-20 | 2025-05-27 | VMware LLC | Modifying an SD-wan based on flow metrics | 
| US12399811B2 (en) * | 2022-08-01 | 2025-08-26 | Memverge, Inc. | Memory pooling, provisioning, and sharing | 
| US20240037026A1 (en) * | 2022-08-01 | 2024-02-01 | Memverge, Inc. | Memory pooling, provisioning, and sharing | 
| US20240283775A1 (en) * | 2023-02-16 | 2024-08-22 | Palo Alto Networks, Inc. | Inline inspection cybersecurity enforcement of multipart file transmissions | 
| US12407651B2 (en) * | 2023-02-16 | 2025-09-02 | Palo Alto Networks, Inc. | Inline inspection cybersecurity enforcement of multipart file transmissions | 
| EP4432628A1 (en) * | 2023-03-14 | 2024-09-18 | Samsung Electronics Co., Ltd. | Multi-node computing system | 
| US12057993B1 (en) | 2023-03-27 | 2024-08-06 | VMware LLC | Identifying and remediating anomalies in a self-healing network | 
| US12425332B2 (en) | 2023-03-27 | 2025-09-23 | VMware LLC | Remediating anomalies in a self-healing network | 
| US12034587B1 (en) | 2023-03-27 | 2024-07-09 | VMware LLC | Identifying and remediating anomalies in a self-healing network | 
| CN116389192A (en) * | 2023-03-31 | 2023-07-04 | 阿里巴巴(中国)有限公司 | Data transmission architecture and its method, device and storage medium | 
| US12321602B2 (en) * | 2023-05-24 | 2025-06-03 | Western Digital Technologies, Inc. | Disaggregated memory management | 
| US20240393950A1 (en) * | 2023-05-24 | 2024-11-28 | Western Digital Technologies, Inc. | Disaggregated memory management | 
| US12261777B2 (en) | 2023-08-16 | 2025-03-25 | VMware LLC | Forwarding packets in multi-regional large scale deployments with distributed gateways | 
| US12355655B2 (en) | 2023-08-16 | 2025-07-08 | VMware LLC | Forwarding packets in multi-regional large scale deployments with distributed gateways | 
| WO2025059415A1 (en) * | 2023-09-13 | 2025-03-20 | Zecurity, Llc | Apparatus and methods relying on non-flashable circuitry for improving security for a system connected to a public or private network | 
| CN117311910A (en) * | 2023-11-29 | 2023-12-29 | 中安网脉(北京)技术股份有限公司 | High-performance virtual password machine operation method | 
| US20250202841A1 (en) * | 2023-12-13 | 2025-06-19 | Unifabrix Ltd. | Switched Protocol Transformer for High-Performance Computing (HPC) and AI Workloads | 
| US12407630B2 (en) * | 2023-12-13 | 2025-09-02 | Unifabrix Ltd. | Elastic multi-directional resource augmentation in a switched CXL fabric | 
| US20250202839A1 (en) * | 2023-12-13 | 2025-06-19 | Unifabrix Ltd. | Elastic Multi-Directional Resource Augmentation in a Switched CXL Fabric | 
| US12425358B2 (en) * | 2023-12-13 | 2025-09-23 | Unifabrix Ltd. | Switched protocol transformer for high-performance computing (HPC) and AI workloads | 
| US20250199983A1 (en) * | 2023-12-13 | 2025-06-19 | Unifabrix Ltd. | Scalable Virtualization of GPUs and Compute Accelerators in a Switch Providing CXL Resource-as-a-Service of Memory, NVMe or RDMA Networking via SLD Agnostic Provisioning | 
| WO2025167136A1 (en) * | 2024-02-06 | 2025-08-14 | 苏州元脑智能科技有限公司 | Resource management method and apparatus, electronic device, and storage medium | 
Also Published As
| Publication number | Publication date | 
|---|---|
| JP2023530064A (en) | 2023-07-13 | 
| CN115668886A (en) | 2023-01-31 | 
| JP7729016B2 (en) | 2025-08-26 | 
| EP4447421A3 (en) | 2025-02-19 | 
| US20240364641A1 (en) | 2024-10-31 | 
| EP4169216A1 (en) | 2023-04-26 | 
| EP4447421A2 (en) | 2024-10-16 | 
| EP4169216A4 (en) | 2024-05-01 | 
| US12413539B2 (en) | 2025-09-09 | 
| WO2021257111A1 (en) | 2021-12-23 | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| US12413539B2 (en) | Switch-managed resource allocation and software execution | |
| US11941458B2 (en) | Maintaining storage namespace identifiers for live virtualized execution environment migration | |
| US12117956B2 (en) | Writes to multiple memory destinations | |
| US20220103530A1 (en) | Transport and cryptography offload to a network interface device | |
| US11748278B2 (en) | Multi-protocol support for transactions | |
| EP3754498B1 (en) | Architecture for offload of linked work assignments | |
| US12153962B2 (en) | Storage transactions with predictable latency | |
| US12292842B2 (en) | Network layer 7 offload to infrastructure processing unit for service mesh | |
| US20210349820A1 (en) | Memory allocation for distributed processing devices | |
| US20220261178A1 (en) | Address translation technologies | |
| EP4289108A1 (en) | 2023-12-13 | Transport and cryptography offload to a network interface device | |
| US20210359955A1 (en) | Cache allocation system | |
| CN115004164A (en) | Dynamic interrupt configuration | |
| US20220166666A1 (en) | Data plane operation in a packet processing device | |
| US20220138021A1 (en) | Communications for workloads | |
| US20230109396A1 (en) | Load balancing and networking policy performance by a packet processing pipeline | |
| US20230153174A1 (en) | Device selection for workload execution | |
| US20210149821A1 (en) | Address translation technologies | |
| US20200228467A1 (en) | Queue-to-port allocation | |
| EP4187868A1 (en) | Load balancing and networking policy performance by a packet processing pipeline | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CONNOR, PATRICK;HEARN, JAMES R.;LIEDTKE, KEVIN;AND OTHERS;SIGNING DATES FROM 20200617 TO 20200619;REEL/FRAME:053145/0962 | |
| STCT | Information on status: administrative procedure adjustment | Free format text: PROSECUTION SUSPENDED | |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION | |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION | |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED | |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER | |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED | |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |