Below we describe each major element of the CONA architecture in detail, identifying the benefits of the design choices and presenting the challenges. Figure 3 shows the architecture.
Figure 3. CONA Layer Architecture
The service layer primarily exposes service capabilities to users. Users can invoke the platform's atomic functions and services, such as computing resources and load balancing, through the orchestration layer. The service layer connects to the user's business services through a northbound interface. Users define business services in their own applications and hand the functions and algorithms those services need directly to the service layer, which returns the processed results. Users manage the functions of the service layer indirectly, through the orchestration layer, but invoke atomic functions directly through the interface with the service layer.
The service layer obtains information about computing power and network resources from the convergence layer through a southbound interface. While returning results to users, it also passes processed intermediate data and other necessary information to the convergence layer.
The orchestration layer is responsible for the monitoring, management, scheduling, allocation, and full lifecycle management of computing resources and networks. In CONA, the role of the orchestration layer is akin to a decentralized controller. It issues orchestration and scheduling instructions through interfaces between layers, obtains returned information, and then passes the information back to the user.
In terms of resource coordination, the orchestration layer maintains the current status of computing resources, network resources, and other resources. When the status of a resource changes, the orchestration layer obtains the corresponding information and updates its local view. When a user's demand for resources changes, the orchestration layer dynamically allocates resources based on the current status so that the user's computing needs continue to be met. When the underlying resources change because of faults or other reasons, the orchestration layer also adjusts allocations in real time.
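The coordination behavior described above can be sketched roughly as follows. This is a minimal illustration, not CONA's implementation: the `ResourceStatus` and `Orchestrator` names, the unit-based capacity model, and the "most free units" policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ResourceStatus:
    """Hypothetical per-node record kept by the orchestration layer."""
    node_id: str
    kind: str            # e.g. "CPU" or "GPU"
    capacity: int        # total abstract units the node offers (assumption)
    allocated: int = 0
    healthy: bool = True

    @property
    def free(self) -> int:
        # A failed node contributes no usable capacity.
        return self.capacity - self.allocated if self.healthy else 0


class Orchestrator:
    def __init__(self) -> None:
        self.resources: Dict[str, ResourceStatus] = {}

    def report(self, status: ResourceStatus) -> None:
        """A status update arrives; overwrite the local view for that node."""
        self.resources[status.node_id] = status

    def mark_failed(self, node_id: str) -> None:
        """Record a fault so the node is skipped by later allocations."""
        self.resources[node_id].healthy = False

    def allocate(self, kind: str, units: int) -> Optional[str]:
        """Pick the healthy node of the requested kind with the most free units."""
        candidates = [
            r for r in self.resources.values()
            if r.kind == kind and r.free >= units
        ]
        if not candidates:
            return None
        best = max(candidates, key=lambda r: r.free)
        best.allocated += units
        return best.node_id
```

A real orchestrator would also need to migrate workloads off a failed node; the sketch only stops new allocations from landing there.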
In terms of resource management, the orchestration layer relies on information from the infrastructure layer and the convergence layer and is responsible for the full lifecycle of resources, from creation to retirement. The upper layer can use computing and network resources only through the orchestration layer; it cannot configure them directly through traditional means such as the operating system or the command line.
In terms of process management, the orchestration layer applies a DevOps approach to managing application services, which promotes communication among IT, CT, and OT technicians. User requests for the service layer's atomic functions or services enter through the orchestration layer, and computing and network resources are provided through it as well. Monitoring and management of resources and services are likewise handled by the orchestration layer, which keeps the entire computing power network running normally. CONA will include a platform for trading computing power as a commodity. To support computing power trading and application development, the orchestration layer must therefore be able to circulate computing power and to control application deployment based on it. Blockchain, serving as a trust hub connecting users and resource contributors, plays a vital role in constructing such an open computing power trading ecosystem.
In terms of security management, the orchestration layer should be able to authenticate and authorize both users and resources. Whether a user may call capabilities of the computing power network, and whether computing or network resources may join the resource pool, must be confirmed by the orchestration layer's security checks. The orchestration layer can also assign priorities to users and resources. For example, authenticated users with VIP privileges can be given first access to computing resources, or high-priority computing resources can be reserved for a particular class of user. Users or resources that fail authentication can be barred from interacting with the computing power network or restricted to limited interactions. Emerging technologies such as Zero-Knowledge Proofs (ZKP) can work in conjunction with blockchain to accomplish these functions while preserving the privacy of network participants and the security of the network.
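As a rough illustration of authentication combined with priority handling, the sketch below rejects requests from unauthenticated users and serves VIP requests first. The `AdmissionQueue` class, the tier names, and the in-memory set of authenticated users are hypothetical; the text does not specify how CONA authenticates users, and nothing here involves blockchain or ZKP.

```python
import heapq
from itertools import count

# Hypothetical priority tiers; a lower number is served first.
PRIORITY = {"vip": 0, "standard": 1}


class AdmissionQueue:
    def __init__(self, authenticated_users):
        self._authenticated = set(authenticated_users)
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def submit(self, user, tier, request):
        """Queue a request; return False if the user is not authenticated."""
        if user not in self._authenticated:
            return False
        heapq.heappush(
            self._heap, (PRIORITY[tier], next(self._arrival), user, request)
        )
        return True

    def next_request(self):
        """Return the highest-priority (user, request) pair, or None if empty."""
        if not self._heap:
            return None
        _, _, user, request = heapq.heappop(self._heap)
        return user, request
```

Within a tier, requests are served in arrival order, so a steady stream of VIP requests could starve standard users; a production scheduler would need some fairness mechanism.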
The Convergence Layer serves as the "Thin Waist" of CONA, much like the Network Layer in a TCP/IP network. Its core functions are network control and computing power management.
The network control module uses the network control plane to associate, distribute, address, allocate, and optimize computing power information resources in the network. The network control layer plays a bridging role in the computing power network as a whole: it collects and distributes underlying resource information, provides network services to the upper layer, and delivers the latest network status and global computing power information in real time whenever the orchestration layer needs to exchange information with it.
Computing power information originates in the Infrastructure Layer and must be associated with the Convergence Layer and propagated. Network protocol packets serve as the carrier. Once computing power resources have been modeled as metric values, those values can be carried either in newly defined link-state packets (for example, in OSPF) or as TLV (type-length-value) fields added to existing protocol packets (for example, in IS-IS), which completes the association of computing power information with the Convergence Layer.
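To make the TLV option concrete, here is a minimal sketch of packing a computing power metric into a type-length-value field with a one-byte type and a one-byte length, as IS-IS TLVs use. The type code `250`, the value layout (one byte for the processor kind, four bytes for the metric), and the function names are assumptions for illustration only; they are not taken from any standard or IETF draft.

```python
import struct

# Hypothetical TLV type code; not assigned by any standard.
COMPUTE_TLV_TYPE = 250


def encode_compute_tlv(kind_code: int, metric: int) -> bytes:
    """Build one TLV: type (1 byte), length (1 byte), then the value."""
    # Assumed value layout: 1-byte processor kind + 4-byte metric, network order.
    value = struct.pack("!BI", kind_code, metric)
    return struct.pack("!BB", COMPUTE_TLV_TYPE, len(value)) + value


def decode_tlvs(data: bytes):
    """Yield (type, value) pairs from a concatenation of TLVs."""
    offset = 0
    while offset + 2 <= len(data):
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        value = data[offset:offset + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        offset += length
        yield tlv_type, value
```

Because each TLV carries its own length, a receiver that does not recognize the computing power type can skip over it, which is what makes it possible to add such fields to an existing protocol packet.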
After the association is complete, the network control module must synchronize computing power information throughout the network. Because this information is carried in network protocol packets, it can be synchronized only after protocol neighbor relationships have been established. Computing power information therefore changes not only when the underlying resources change but also when the status of network neighbors changes, and both kinds of change must be synchronized throughout the network by distributing protocol packets. Common protocols at the network layer include Interior Gateway Protocols (IGPs) such as RIP, OSPF, IS-IS, and EIGRP, and the Border Gateway Protocol (BGP). IGPs synchronize network information within an autonomous system, and BGP synchronizes it between autonomous systems. Synchronizing computing power information within and between autonomous systems therefore requires extending both the IGPs and BGP. The CONA team will closely follow the latest work in this field at the Internet Engineering Task Force (IETF) and continue to refine the design as the project advances.
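The dependence of synchronization on neighbor state can be sketched as a small state database. This is a simplification under stated assumptions, not a description of OSPF, IS-IS, or any CONA protocol extension: the `ComputeStateDB` class, the per-origin sequence numbers, and the rule of withdrawing entries when the neighbor they were learned from goes down are all hypothetical.

```python
class ComputeStateDB:
    """Toy database of computing power advertisements learned from neighbors."""

    def __init__(self):
        self.neighbors = set()
        # origin -> (sequence number, info, neighbor it was learned from)
        self.entries = {}

    def neighbor_up(self, neighbor):
        self.neighbors.add(neighbor)

    def neighbor_down(self, neighbor):
        """Drop the adjacency and everything learned through it."""
        self.neighbors.discard(neighbor)
        self.entries = {
            origin: entry for origin, entry in self.entries.items()
            if entry[2] != neighbor
        }

    def receive(self, neighbor, origin, seq, info):
        """Accept an advertisement; return True if it is new and should be re-flooded."""
        if neighbor not in self.neighbors:
            return False  # no adjacency yet, so nothing can be synchronized
        current = self.entries.get(origin)
        if current is not None and current[0] >= seq:
            return False  # stale or duplicate advertisement
        self.entries[origin] = (seq, info, neighbor)
        return True

    def view(self):
        """Current computing power information, keyed by originating node."""
        return {origin: entry[1] for origin, entry in self.entries.items()}
```

The sketch shows both triggers named in the text: a newer advertisement replaces the stored one, and losing a neighbor changes the stored information even though no resource changed.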
The association and synchronization of computing power information throughout the network ultimately serve one goal: network path selection, allocation, and optimization based on computing power. Traditional network protocols compute a shortest path tree from link costs to obtain the optimal path to the destination node. The computing power network instead takes computing power information into account when calculating paths. For example, when an AI application requires GPU computing power, path calculation is guided by the GPU computing power information in the network; a CPU resource will not be selected even if the link cost to reach it is lower. When computing power information changes, paths are recalculated as the update propagates through the network. Load balancing, if needed, can also be performed in the Convergence Layer, where it is more efficient and has lower latency than load balancing implemented at the application layer.
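The GPU example can be sketched as a two-step selection: compute link-cost distances with Dijkstra's algorithm, then choose the nearest node among those advertising the required computing power type. The graph layout, the `compute_info` mapping, and the function names are assumptions for illustration; the text does not specify how CONA combines link cost with computing power metrics.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra's algorithm: lowest total link cost from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # outdated heap entry
        for neighbor, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def select_compute_node(graph, source, compute_info, required_kind):
    """Return the closest reachable node that advertises the required kind, or None."""
    dist = shortest_costs(graph, source)
    candidates = [
        (dist[node], node)
        for node, kind in compute_info.items()
        if kind == required_kind and node in dist
    ]
    return min(candidates)[1] if candidates else None


# Hypothetical topology: the CPU node is cheaper to reach than the GPU node.
graph = {
    "user": {"r1": 1, "r2": 1},
    "r1": {"cpu-node": 1},
    "r2": {"gpu-node": 5},
}
compute_info = {"cpu-node": "CPU", "gpu-node": "GPU"}
```

With this topology, `select_compute_node(graph, "user", compute_info, "GPU")` picks `gpu-node` at total cost 6 even though `cpu-node` is reachable at cost 2, which mirrors the behavior described above.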
Computing power management covers the registration and modeling of heterogeneous computing power resources and supports upper-layer computing power trading. By chip type, heterogeneous computing power resources can be divided into CPU, GPU, NPU, TPU, and so on. Each type must be registered with the computing power management layer before it can be published through the network control layer, even if the Convergence Layer can already perceive the resources and quantify them appropriately. Different types of processor must also be scheduled sensibly so that each handles the tasks it is best suited to. This is achieved through unified modeling by the computing power management module combined with scheduling by the network control module, so that each type of heterogeneous resource is put to its best use.
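One way to picture registration plus suitability-based scheduling is the sketch below. The `SUITABILITY` table, the abstract `units` score standing in for the output of unified modeling, and the class names are all hypothetical; the text does not define CONA's actual model or scheduling policy.

```python
from dataclasses import dataclass

# Hypothetical mapping from task class to preferred processor types, best first.
SUITABILITY = {
    "training": ["GPU", "TPU", "NPU", "CPU"],
    "inference": ["NPU", "GPU", "CPU"],
    "general": ["CPU"],
}


@dataclass(frozen=True)
class ComputeResource:
    resource_id: str
    kind: str      # "CPU", "GPU", "NPU", "TPU", ...
    units: float   # abstract score assumed to come from unified modeling


class ComputeRegistry:
    def __init__(self):
        self._by_kind = {}

    def register(self, resource):
        """Only registered resources become visible for publishing and scheduling."""
        self._by_kind.setdefault(resource.kind, []).append(resource)

    def published(self):
        """All registered resources, as they would be handed to network control."""
        return [r for pool in self._by_kind.values() for r in pool]

    def best_for(self, task_class):
        """Return the largest resource of the most suitable registered kind, or None."""
        for kind in SUITABILITY.get(task_class, []):
            pool = self._by_kind.get(kind)
            if pool:
                return max(pool, key=lambda r: r.units)
        return None
```

The preference list falls back to less suitable processor types when the preferred one is not registered, so a training task still runs on an NPU or CPU if no GPU or TPU is available.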
Computing power management also needs to support the trading of computing power. Computing power services and transactions in the computing power network rely on the decentralized, low-cost, privacy-protecting, and trustworthy trading model that blockchain provides. The Orchestration Layer manages the blockchain functions, and the Convergence Layer supplies it with transaction-related resource information. When a user needs computing power, the contract is signed and billed through the computing power trading platform in the computing power management layer, recorded on the blockchain, and stored in a distributed manner.
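The idea of recording a trade so that later tampering is detectable can be illustrated with a hash-chained ledger. This is a single-process toy, not a blockchain: it has no consensus, no signatures, and no distribution, and the `TradeLedger` class and its record fields are assumptions made for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


class TradeLedger:
    def __init__(self):
        self.blocks = []

    def record_trade(self, buyer, seller, units, price):
        """Append a trade that commits to the hash of the previous block."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_HASH
        body = {
            "index": len(self.blocks),
            "buyer": buyer,
            "seller": seller,
            "units": units,
            "price": price,
            "prev_hash": prev_hash,
        }
        block = dict(body, hash=_digest(body))
        self.blocks.append(block)
        return block

    def verify(self):
        """Recompute every hash and check each link to the previous block."""
        prev = GENESIS_HASH
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["prev_hash"] != prev or block["hash"] != _digest(body):
                return False
            prev = block["hash"]
        return True
```

Changing any field of an earlier trade invalidates its stored hash, so `verify()` returns `False`; in a real deployment the copies held by other participants are what make such a change impossible to hide.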
Therefore, computing power management in the entire computing power network is a distributed deployment architecture. In the process of computing power trading, the contributor and user of computing power are separated. Through scalable blockchain technology and container technology, the scattered computing power of computing power contributors is integrated to provide economical, efficient, and decentralized computing power services for computing power users.
The Infrastructure Layer encompasses two elements: computational resources and network forwarding. It must take into account the actual state and operational efficiency of both the computational processing capacity and the forwarding capacity in the network, so that computing, storage, and network resources can be transmitted and circulated with high quality.
Computational resources include all types of heterogeneous computational resources within the network. In a narrow sense, these are processors whose primary role is computation, such as CPUs, GPUs, NPUs, and TPUs. In a broader sense, they extend to standalone or distributed storage systems and to devices virtualized by the operating system that have data processing capabilities. At the device level, computational resources include not only common computing devices such as servers and storage but also, in future Internet of Things scenarios, edge devices that can provide computational power, such as cars, mobile phones, and drones.
Network forwarding belongs to the data plane in the Software-Defined Networking (SDN) architecture and is responsible for the deployment of various network devices. Data packets are forwarded according to the forwarding table entries installed by the network control layer.
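The split between a control layer that installs entries and a data plane that only looks them up can be sketched as follows. The `ForwardingTable` class, the port names, and the use of longest-prefix matching on IP prefixes are assumptions for illustration; the text does not say what form CONA's forwarding entries take.

```python
import ipaddress


class ForwardingTable:
    """Toy data-plane table: entries are installed from outside, lookups are local."""

    def __init__(self):
        self._entries = []  # list of (network, next_hop)

    def install(self, prefix, next_hop):
        """Called on behalf of the control layer to add a forwarding entry."""
        self._entries.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, destination):
        """Return the next hop of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(destination)
        matches = [
            (net.prefixlen, hop) for net, hop in self._entries if addr in net
        ]
        return max(matches)[1] if matches else None
```

A linear scan keeps the example short; real forwarding planes use tries or hardware lookup tables for the same longest-prefix decision.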
The Infrastructure Layer itself is only responsible for the collection of computational resources and network devices, as well as for the integration of the physical architecture of various devices. However, for the management and application of resources and devices, guidance through the Convergence Layer and Orchestration Layer is necessary. Within CONA, it functions as the physical infrastructure.