
CN119781935A - A dynamic resource scheduling method to improve Hadoop data throughput in CTyunOS - Google Patents

A dynamic resource scheduling method to improve Hadoop data throughput in CTyunOS

Info

Publication number
CN119781935A
CN119781935A (application CN202411961679.0A)
Authority
CN
China
Prior art keywords
ctyunos
hadoop
resource scheduling
task
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411961679.0A
Other languages
Chinese (zh)
Inventor
王琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Sendi Computer System Co ltd
Original Assignee
Guangzhou Sendi Computer System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Sendi Computer System Co ltd filed Critical Guangzhou Sendi Computer System Co ltd
Priority to CN202411961679.0A priority Critical patent/CN119781935A/en
Publication of CN119781935A publication Critical patent/CN119781935A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a dynamic resource scheduling method for improving Hadoop data throughput in CTyunOS, comprising: interfacing Hadoop with CTyunOS; predicting the load; and performing dynamic resource scheduling according to the predicted load by combining Hadoop task scheduling logic with the CTyunOS underlying resource scheduling mechanism. By exploiting the kernel-level resource scheduling capability of CTyunOS, a dynamic resource scheduling mechanism is developed that responds to resource demand changes in real time, improves resource utilization, strengthens the interaction between Hadoop task scheduling logic and the CTyunOS underlying resource scheduler, and makes full use of the optimization features the system provides. By optimizing load distribution among nodes and improving data transmission efficiency, the method reduces latency and improves data throughput, especially in high-concurrency environments, thereby improving overall system performance.

Description

Dynamic resource scheduling method for improving Hadoop data throughput in CTyunOS
Technical Field
The invention relates to the technical field of dynamic resource allocation, in particular to a dynamic resource scheduling method for improving Hadoop data throughput in CTyunOS.
Background
CTyunOS is a cloud operating system that aims to provide efficient resource management and scheduling to support large-scale data processing and cloud computing applications. Hadoop is an open-source framework widely used in large-scale data processing scenarios; its core components include a distributed file system (HDFS) and a resource scheduling framework (YARN), through which Hadoop allocates tasks and stores data on compute nodes. However, Hadoop's performance depends mainly on efficient management of underlying resources and sensible task scheduling. In a traditional deployment, Hadoop runs on a general-purpose operating system and does not fully exploit the optimization capabilities of the cloud platform or a specific operating system. As a result, a static resource allocation strategy cannot adapt dynamically to real-time load changes, so resource utilization is low; data throughput is susceptible to uneven node load, causing overall performance fluctuations; and optimization mechanisms supported by the underlying operating system, such as memory optimization, CPU isolation, and I/O priority management, are missing.
Some existing resource scheduling optimization methods alleviate these problems to some extent. Dynamic load balancing algorithms, for example, mitigate uneven load by reassigning tasks, and predictive scheduling based on historical data can allocate resources in advance to cope with expected load. However, in complex cloud environments, especially those involving multiple tenants and dynamic resource requirements, such methods struggle to fully improve Hadoop's performance, and they rarely exploit optimization capabilities at the operating-system level. Traditional Hadoop cannot respond dynamically to fluctuations in resource demand, so nodes are overloaded at peak times and resources are wasted at off-peak times. System coupling is weak: Hadoop task scheduling logic does not interact with the resource scheduling of the underlying operating system, so the system's optimization features cannot be fully utilized. The result is high latency and low throughput: uneven load between nodes and data transmission bottlenecks degrade performance, especially in high-concurrency environments.
Disclosure of Invention
The invention aims to provide a dynamic resource scheduling method for improving Hadoop data throughput in CTyunOS. By exploiting the kernel-level resource scheduling capability of CTyunOS, it develops a dynamic resource scheduling mechanism that responds to resource demand changes in real time, improves resource utilization, strengthens the interaction between Hadoop task scheduling logic and the CTyunOS underlying resource scheduling mechanism, and makes full use of the optimization features the system provides. By optimizing load distribution among nodes and improving data transmission efficiency, it reduces latency and improves data throughput, especially in high-concurrency environments, thereby improving overall system performance.
A dynamic resource scheduling method for improving Hadoop data throughput in CTyunOS, comprising:
Interfacing Hadoop with CTyunOS;
Predicting the load;
And carrying out dynamic resource scheduling according to the predicted load result by adopting Hadoop task scheduling logic and CTyunOS bottom layer resource scheduling mechanism.
Preferably, before interfacing Hadoop with CTyunOS, the method further includes deploying the CTyunOS resource environment, specifically:
creating a containerized instance in CTyunOS, running each Hadoop node in a separate container;
Based on the CTyunOS virtualization management module, the resources of the physical server are pooled into a dynamically allocated computing resource pool, storage resource pool, and network bandwidth pool;
each independent container is allocated independent bandwidth through network namespace technology.
Preferably, interfacing Hadoop with CTyunOS includes:
Calling the CTyunOS monitoring API by extending the Hadoop YARN scheduler; the real-time information acquired comprises:
CPU load: the container's CPU time usage ratio is limited via control groups (cgroups);
Memory occupancy: the memory usage trend is monitored via the container's built-in memory statistics file;
I/O throughput: disk read/write speed is monitored through the block layer of the CTyunOS kernel;
Data processing: monitoring data is collected by Prometheus and stored in a time-series database.
Preferably, the predicting the load includes:
the node history load is predicted using a sliding-window exponential smoothing algorithm, expressed as:
S_t = α·x_t + (1 − α)·S_{t−1};
wherein α is the smoothing coefficient, x_t is the current load value, and S_t is the predicted value.
Preferably, the performing dynamic resource scheduling by using the Hadoop task scheduling logic and CTyunOS bottom layer resource scheduling mechanism according to the predicted load result includes:
Calculating task weights to be processed;
Distributing nodes according to the task weights;
And if the allocated node occupancy rate exceeds the threshold value, migrating the task to be processed to a new node by adopting a CTyunOS bottom layer resource scheduling mechanism.
Preferably, the calculating task weights to be processed includes:
setting a weight factor based on the task type, the data amount and the priority, calculating the task priority, expressed as:
Priority(Task) = f(Size, Deadline, Compute_Complexity);
tasks with high priority are assigned to low load nodes.
Preferably, if the allocated node occupancy rate exceeds the threshold, migrating the task to be processed to the new node by adopting CTyunOS underlying resource scheduling mechanism includes:
Triggering dynamic expansion when the node resource occupancy rate exceeds 85%;
Using CTyunOS live-migration technology to move part of the task to a new node; during migration, the replica closest to the target node is automatically selected to read data;
And when the node resource utilization rate is lower than 30%, actively releasing the container and the corresponding resources.
Preferably, the assigning the task with the high priority to the low load node includes:
Based on the CTyunOS bandwidth allocation strategy, network bandwidth is preferentially allocated to higher-priority tasks;
a token bucket algorithm limits the bandwidth consumption of low-priority tasks, keeping the network unblocked for high-priority tasks;
the Snappy compression algorithm is enabled during data transmission, and small files are merged into large files via Hadoop Archive.
Preferably, deploying the CTyunOS resource environment further comprises:
introducing a distributed cache system to cache hot-spot data at the CTyunOS layer;
reading HDFS blocks that need frequent access into the cache area in advance.
Preferably, after the dynamic resource scheduling is performed by adopting the Hadoop task scheduling logic and CTyunOS bottom layer resource scheduling mechanism according to the predicted load result, the method further includes:
Monitoring the node state using the CTyunOS heartbeat detection mechanism; when a container failure is detected, isolating the failed node;
marking tasks on the failed node as failed in the Hadoop scheduler and triggering a retry mechanism;
preferentially selecting the latest replica to start the retry;
recording the intermediate state of the restarted task using the CTyunOS task snapshot function;
for damage to HDFS replicas caused by node failure, CTyunOS starts an automatic replica repair process and reallocates replica storage among healthy nodes.
The invention has the following beneficial effects: 1. By exploiting CTyunOS kernel features such as CPU binding, dynamic memory adjustment, and network flow control, efficient resource allocation for Hadoop tasks is realized, significantly improving system performance. 2. Through the CTyunOS monitoring APIs, resource utilization data of Hadoop nodes is obtained and analysed in real time and the resource allocation strategy is dynamically adjusted, so the system copes flexibly with load changes and optimizes resource utilization and data throughput. 3. Dynamic bandwidth allocation and data compression strategies are applied, optimizing transmission efficiency for the data characteristics of HDFS, maintaining a high data transmission rate under high load, and reducing network bottlenecks. 4. CTyunOS heartbeat detection and task snapshot functions are combined to realize rapid isolation of failed nodes and automatic task restart, enhancing system reliability and stability and reducing the performance loss caused by faults.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a dynamic resource scheduling method for improving Hadoop data throughput in CTyunOS according to the present invention;
FIG. 2 is a schematic diagram of a dynamic resource scheduling system for improving Hadoop data throughput in CTyunOS according to the present invention;
FIG. 3 is a schematic flow diagram of a deployment CTyunOS resource environment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that all directional indicators (such as up, down, left, right, front, and rear) used in the embodiments of the present invention are merely for explaining the relative positional relationship, movement conditions, and the like between the components in a certain specific posture (as shown in the drawings); if the specific posture changes, the directional indicators change accordingly.
Furthermore, the description of "first," "second," etc. in this disclosure is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
The invention exploits CTyunOS kernel features such as CPU binding, dynamic memory adjustment, and network flow control to realize efficient resource allocation for Hadoop tasks, significantly improving system performance. Through the CTyunOS monitoring APIs, resource utilization data of Hadoop nodes is obtained and analysed in real time and the resource allocation strategy is dynamically adjusted, so the system copes flexibly with load changes and optimizes resource utilization and data throughput. Dynamic bandwidth allocation and data compression strategies are applied, optimizing transmission efficiency for the data characteristics of HDFS, maintaining a high data transmission rate under high load, and reducing network bottlenecks. CTyunOS heartbeat detection and task snapshot functions are combined to realize rapid isolation of failed nodes and automatic task restart, enhancing system reliability and stability and reducing the performance loss caused by faults.
Example 1
A dynamic resource scheduling method for improving Hadoop data throughput in CTyunOS, referring to fig. 1 and 2, comprising:
S100, interfacing Hadoop with CTyunOS;
CTyunOS is a cloud operating system that aims to provide efficient resource management and scheduling to support large-scale data processing and cloud computing applications. Virtualization and container management: CTyunOS supports efficient virtualized and containerized operating environments that enable dynamic adjustment of resource allocation, including CPU, memory, storage, and network bandwidth. Real-time resource monitoring: a built-in resource monitoring module supports real-time acquisition and analysis of resource utilization, system load, and task state. Flexible task scheduling interface: an API interface is provided, supporting deep linkage with the resource scheduling layer of Hadoop.
Hadoop is an open-source software framework for distributed storage and processing of large-scale data sets. Its core components include HDFS (Hadoop Distributed File System) and the MapReduce computation model. Hadoop performs large-scale data computation on a distributed computing framework based on the MapReduce model and relies on YARN (Yet Another Resource Negotiator) to manage cluster resources. Its I/O density is high: tasks consume large amounts of disk and network bandwidth, so performance bottlenecks arise easily. Its scheduling is simple: the default resource scheduling strategy is static allocation, lacking support for dynamic task adjustment. Combining the two enables dynamic expansion: by interfacing with the CTyunOS API, Hadoop can dynamically expand and reclaim resources, breaking the limits of static allocation. The monitoring data of CTyunOS can provide Hadoop with more accurate node load information to guide scheduling optimization, and the container migration and isolation techniques provided by CTyunOS help improve the flexibility and reliability of Hadoop task scheduling.
S200, predicting the load;
Predicting the load helps the system distribute computing tasks evenly across the nodes in the cluster, avoiding overload of certain nodes and thereby improving overall system efficiency.
S300, dynamic resource scheduling is carried out according to the predicted load result by adopting Hadoop task scheduling logic and CTyunOS bottom layer resource scheduling mechanism.
The invention ensures the efficient execution of tasks by managing and allocating computing resources such as CPU, memory, network bandwidth, etc. In the embodiment of the invention, CTyunOS core service layers are the support of the whole optimization mechanism, including resource management, real-time monitoring and network optimization capability. The resource monitoring module fuses the real-time monitoring capability of CTyunOS with the resource scheduling depth of Hadoop, and provides a basis for dynamic scheduling. The task scheduling module optimizes the resource utilization efficiency of the Hadoop task through a dynamic allocation and migration mechanism. The data transmission optimization module mainly solves the bottleneck of the HDFS transmission efficiency and improves the throughput capacity. The fault tolerance module ensures the stability and quick recovery of tasks by using the high reliability mechanism provided by CTyunOS.
Preferably, S100, before docking Hadoop with CTyunOS, further comprises deploying CTyunOS resource environment, specifically:
S001, creating a containerized instance in CTyunOS and running each Hadoop node in an independent container;
A containerized instance is created in CTyunOS, with each Hadoop node running in a separate container to isolate the resource usage of different tasks.
S002, based on the CTyunOS virtualization management module, pooling the resources of the physical server into a dynamically allocated computing resource pool, storage resource pool, and network bandwidth pool;
S003, allocating independent bandwidth to each independent container through network namespace technology.
Each container allocates independent bandwidth through a network namespace technique (Network Namespace) to avoid inter-task network resource conflicts.
Preferably, interfacing Hadoop with CTyunOS comprises:
Calling the CTyunOS monitoring API by extending the Hadoop YARN scheduler; the real-time information acquired comprises:
The YARN scheduler is a core component of the YARN (Yet Another Resource Negotiator) system and is responsible for the allocation and management of cluster resources. YARN is the resource manager of the Hadoop cluster; it coordinates the resource requirements of different jobs, optimizes resource utilization, and improves cluster utilization and job execution efficiency. YARN's resource management is organized as resource pools, with one queue per resource pool. A user may specify a queue when submitting a job; otherwise the default queue is used. Queues in YARN form a tree structure: all applications run in leaf queues, and child queues draw on the resources of their parent queues.
In practical deployment, configuring the scheduler sensibly is important to optimize cluster performance and meet the needs of different jobs. By allocating resources intelligently, the YARN scheduler can use cluster resources to the greatest extent, prevent hot-spot problems from affecting applications, coordinate the running of many applications in the cluster, and resolve problems such as resource contention.
CPU load: the container's CPU time usage ratio is limited via control groups (cgroups);
Memory occupancy: the memory usage trend is monitored via the container's built-in memory statistics file;
I/O throughput: disk read/write speed is monitored through the block layer of the CTyunOS kernel;
Data processing: monitoring data is collected by Prometheus and stored in a time-series database (e.g., InfluxDB).
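The collection step above can be sketched as follows. The real CTyunOS monitoring API is not public, so the cgroup-style counter text, the node name `hadoop-dn-01`, and the sample layout are illustrative assumptions about what a Prometheus exporter or InfluxDB-style store would ingest:

```python
import time

def parse_memory_stat(text: str) -> dict:
    """Parse cgroup memory.stat-style counter text into integer counters."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value.strip())
    return stats

def make_sample(node: str, metric: str, value: float) -> dict:
    """Shape one sample the way a time-series store might expect (assumed layout)."""
    return {"measurement": metric,
            "tags": {"node": node},
            "time": time.time(),
            "value": value}

# Example counters in cgroup memory.stat format (illustrative values)
demo = "cache 4096\nrss 1048576\nmapped_file 0"
stats = parse_memory_stat(demo)
sample = make_sample("hadoop-dn-01", "memory_rss", stats["rss"])
```

In a deployment, `demo` would instead be read from the container's memory statistics file, and the sample would be pushed to the time-series database.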
Preferably, S200, predicting the load includes:
the node history load is predicted using a sliding-window exponential smoothing algorithm, expressed as:
S_t = α·x_t + (1 − α)·S_{t−1};
wherein α is the smoothing coefficient, x_t is the current load value, and S_t is the predicted value.
The prediction result is used as a decision basis for dynamic adjustment of resources.
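The smoothing recurrence above can be sketched directly; the defaults `alpha=0.3` and `window=10` are assumptions, since the patent does not specify concrete values:

```python
def predict_load(samples, alpha=0.3, window=10):
    """Exponentially smooth the last `window` load samples:
    S_t = alpha * x_t + (1 - alpha) * S_{t-1}; the final S is the prediction."""
    recent = samples[-window:]
    s = recent[0]                 # seed the smoother with the oldest sample
    for x in recent[1:]:
        s = alpha * x + (1 - alpha) * s
    return s
```

A smaller `alpha` weights history more heavily (smoother, slower to react); a larger `alpha` tracks the most recent load more closely.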
Preferably, S300, performing dynamic resource scheduling according to the predicted load result using the Hadoop task scheduling logic and the CTyunOS underlying resource scheduling mechanism includes:
S310, calculating task weights to be processed;
S320, distributing nodes according to the task weights;
S330, if the occupancy rate of the allocated node exceeds the threshold, migrating the task to be processed to a new node using the CTyunOS underlying resource scheduling mechanism.
Preferably, S310, calculating task weights to be processed includes:
setting a weight factor based on the task type, the data amount and the priority, calculating the task priority, expressed as:
Priority(Task) = f(Size, Deadline, Compute_Complexity);
Tasks with high priority are assigned to low-load nodes.
Tasks with high priority are allocated to low-load nodes to reduce latency and resource conflicts. According to the real-time monitoring data, the node with the lowest resource utilization is selected for task scheduling.
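One possible shape of the priority function and node selection is sketched below. The patent only names the inputs of f(Size, Deadline, Compute_Complexity); the linear weights and the reciprocal-deadline urgency term are illustrative assumptions:

```python
def task_priority(size_gb: float, deadline_s: float, complexity: float,
                  weights=(0.5, 0.3, 0.2)) -> float:
    """Illustrative weighted form of Priority(Task) = f(Size, Deadline,
    Compute_Complexity). Closer deadlines yield higher urgency."""
    urgency = 1.0 / max(deadline_s, 1.0)   # avoid division by zero
    w_size, w_deadline, w_complex = weights
    return w_size * size_gb + w_deadline * urgency * 1000 + w_complex * complexity

def pick_node(node_loads: dict) -> str:
    """Select the node with the lowest monitored load for the next task."""
    return min(node_loads, key=node_loads.get)
```

For example, a large job due in one minute outranks a small job due in an hour, and `pick_node` routes it to the least-loaded node reported by the monitoring module.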
Preferably, S330, if the occupancy rate of the allocated node exceeds the threshold, migrating the task to be processed to the new node using the CTyunOS underlying resource scheduling mechanism includes:
S331, triggering dynamic expansion when the node resource occupancy rate exceeds 85%;
S332, using CTyunOS live-migration technology to move part of the task to the new node; during migration, the replica closest to the target node is automatically selected to read data;
Seamless migration of tasks from high-load to low-load nodes is achieved using the Docker container live-migration technique (CRIU, Checkpoint/Restore in Userspace). Data consistency is ensured by the HDFS replica mechanism, and the replica closest to the target node is automatically selected to read data during migration. Based on the network flow control module built into CTyunOS, the transmission path is dynamically adjusted and the link with the lowest delay is selected.
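The "closest replica" choice can be sketched with an HDFS-style topology distance; the `/rack/node` path encoding and the 0/1/2 distance levels follow HDFS rack-awareness conventions and are assumptions, not details from the patent:

```python
def rack_distance(a: str, b: str) -> int:
    """Topology distance for locations like '/rack1/node3':
    0 = same node, 1 = same rack, 2 = off-rack."""
    if a == b:
        return 0
    if a.rsplit("/", 1)[0] == b.rsplit("/", 1)[0]:
        return 1
    return 2

def nearest_replica(replica_locations, target_node):
    """Pick the replica with the smallest topology distance to the target."""
    return min(replica_locations, key=lambda loc: rack_distance(loc, target_node))
```

During a migration to `/rack1/node3`, a replica on another node in rack1 would be preferred over an off-rack replica, minimizing cross-rack traffic.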
S333, when the node resource utilization rate is lower than 30%, actively releasing the container and the corresponding resources.
When the node resource utilization rate falls below 30%, the system actively releases the container and the corresponding resources, reducing waste.
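The two thresholds from S331 and S333 reduce to a small decision function; only the 85% and 30% values come from the method, the function shape is a sketch:

```python
def scaling_decision(occupancy: float, high: float = 0.85, low: float = 0.30) -> str:
    """Apply the method's thresholds: expand above 85%, release below 30%."""
    if occupancy > high:
        return "scale-up"      # trigger dynamic expansion / task migration
    if occupancy < low:
        return "scale-down"    # release the container and its resources
    return "hold"              # occupancy is in the acceptable band
```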
Preferably, assigning tasks with high priority to low-load nodes comprises:
based on the CTyunOS bandwidth allocation strategy, preferentially allocating network bandwidth to higher-priority tasks;
using a token bucket algorithm to limit the bandwidth consumption of low-priority tasks, keeping the network unblocked for high-priority tasks;
The token bucket algorithm is a commonly used rate-limiting algorithm that limits the number of requests per unit time. The algorithm maintains a token bucket of fixed capacity into which a certain number of tokens are placed each second. When a request arrives, if there are enough tokens in the bucket, the request is allowed to pass and one token is consumed from the bucket; otherwise the request is rejected.
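The token bucket described above can be sketched as a small class; the refill-on-demand style (computing elapsed time at each check rather than running a background timer) is a common implementation choice, not something the patent prescribes:

```python
import time

class TokenBucket:
    """Token bucket rate limiter: `rate` tokens are added per second,
    up to `capacity`; a request passes only if enough tokens remain."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity            # start full
        self.last = time.monotonic()

    def allow(self, n: float = 1.0) -> bool:
        now = time.monotonic()
        # refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A low-priority transfer would be given a small `rate`, so bursts beyond its budget are rejected (or delayed) while high-priority traffic proceeds.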
enabling the Snappy compression algorithm during data transmission, and merging small files into large files via Hadoop Archive.
The Snappy compression algorithm is used for performance optimization when processing large amounts of data. Snappy's goal is to provide a reasonable compression ratio while guaranteeing compression speed, rather than pursuing maximum compression ratio or compatibility with other compression algorithms. Snappy compresses data by looking for consecutive identical byte sequences: when at least 4 consecutive identical bytes are found, they are replaced by integer tags. This replacement reduces the size of the data but increases the number of tags; to save space, Snappy packs these integers with bit operations so that the tags do not occupy too much room. Encoding and decoding are both very efficient, providing reasonable compression while guaranteeing speed. When encoding, Snappy converts the input data into a uint8 array and performs tag replacement on consecutive identical byte sequences; when decoding, Snappy reconstructs the original data from the tags. In the embodiment of the invention, the Snappy compression algorithm is enabled during data transmission; the compression ratio reaches 50%-70%, greatly reducing transmission traffic.
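The small-file merging half of this step follows the layout idea behind Hadoop Archive: one large blob plus an index for random access. The sketch below is a toy illustration of that idea, not the actual HAR on-disk format:

```python
def merge_small_files(files: dict) -> tuple:
    """Concatenate many small files into one blob plus an offset index,
    mimicking the layout idea behind Hadoop Archive (HAR)."""
    blob, index, offset = bytearray(), {}, 0
    for name, data in files.items():
        index[name] = (offset, len(data))   # where this member starts, and its size
        blob += data
        offset += len(data)
    return bytes(blob), index

def read_member(blob: bytes, index: dict, name: str) -> bytes:
    """Random access to one member file through the index."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

Storing one large blob instead of many tiny files reduces per-file metadata pressure on the NameNode, which is the motivation for archiving small files in HDFS.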
Preferably, deploying the CTyunOS resource environment further comprises:
introducing a distributed cache system to cache hot-spot data at the CTyunOS layer;
A distributed caching system stores data on multiple servers to improve application performance by reducing the number of database accesses. It is typically deployed in a separate application process, often on a different machine from the application, so data reads and writes are done over the network.
By caching frequently accessed data, the distributed cache reduces the number of database queries and thus the database load. Because cached data is held in memory, data access speed improves markedly, improving the user experience. With wide user populations and high concurrency, a distributed cache effectively spreads server pressure and improves overall system performance.
HDFS blocks that need frequent access are read into the cache area in advance.
Pre-reading frequently accessed HDFS blocks into the cache area avoids the latency of disk reads.
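The pre-read behaviour can be sketched as an LRU block cache with a warm-up call; the LRU eviction policy and the `fetch` callback are implementation assumptions, since the patent only states that hot blocks are read into the cache ahead of time:

```python
from collections import OrderedDict

class BlockCache:
    """LRU cache for HDFS blocks; prefetch() warms hot blocks so that
    later reads skip the (slow) fetch path."""
    def __init__(self, capacity: int, fetch):
        self.capacity = capacity
        self.fetch = fetch                 # callable: block_id -> bytes
        self.store = OrderedDict()

    def prefetch(self, block_ids):
        """Warm the cache with blocks expected to be accessed soon."""
        for block_id in block_ids:
            self.get(block_id)

    def get(self, block_id):
        if block_id in self.store:
            self.store.move_to_end(block_id)   # mark as recently used
            return self.store[block_id]
        data = self.fetch(block_id)            # slow path: read from disk/HDFS
        self.store[block_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)     # evict least recently used
        return data
```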
Preferably, after performing dynamic resource scheduling according to the predicted load result using the Hadoop task scheduling logic and the CTyunOS underlying resource scheduling mechanism, the method further includes:
S410, monitoring the node state using the CTyunOS heartbeat detection mechanism; when a container failure is detected, isolating the failed node;
Heartbeat detection is a method of periodically sending signals to confirm whether each node in a system is in a normal working state. Like a human heartbeat, each signal represents one "breath" of the system, confirming that it is still running. In this process, one party typically transmits the heartbeat signal and the other party receives and acknowledges it. The basic principle of heartbeat detection is to verify, through signaling at a fixed frequency, that the communication links between nodes are clear. If no heartbeat signal is received from a node within a period of time, that node can be judged as possibly failed, and corresponding handling measures are taken. A heartbeat report usually carries additional state information and metadata, such as the node's load conditions and running state, so that the management system can better understand the health of the whole distributed system. In a distributed system, a node typically reports its own status to other nodes at a fixed frequency. The advantage of this method is that it is simple to implement and effectively monitors node health. Typically, a node sends a heartbeat signal once per period (e.g., every second). If, after sending a heartbeat signal, no response is received within a predetermined time, the target node may be considered faulty; this mechanism is called timeout detection. Specifically, if a node receives no heartbeat response within a specified time (e.g., 3 seconds), the timeout mechanism is triggered. The core of the timeout mechanism is setting a reasonable timeout, which must be adjusted according to network delay and node processing power.
If the timeout is too short, a normal node may be misjudged as a failed node; if it is too long, the timeliness of fault detection suffers.
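The heartbeat-with-timeout logic described above can be sketched as follows. This is a minimal illustration only: the class name, the 3-second default, and the report/query methods are assumptions for the sketch, not part of CTyunOS.

```python
import time

class HeartbeatMonitor:
    """Tracks the last heartbeat time per node and flags timeouts."""

    def __init__(self, timeout_seconds=3.0):
        # The timeout must balance network delay against detection latency:
        # too short misjudges healthy nodes, too long delays fault handling.
        self.timeout = timeout_seconds
        self.last_seen = {}   # node_id -> timestamp of last heartbeat
        self.metadata = {}    # node_id -> load/state info carried by heartbeat

    def report(self, node_id, load=None, now=None):
        """Called when a heartbeat arrives; records time and optional state."""
        now = time.time() if now is None else now
        self.last_seen[node_id] = now
        if load is not None:
            self.metadata[node_id] = load

    def failed_nodes(self, now=None):
        """Return nodes whose last heartbeat is older than the timeout."""
        now = time.time() if now is None else now
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]
```

A node that stops reporting for longer than the timeout shows up in `failed_nodes()` and can then be isolated.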
S420, marking the task on the faulty node as failed in the Hadoop scheduler, and triggering a retry mechanism;
S430, preferentially selecting the latest copy to start the retry;
S440, recording the intermediate state of the restarted task by using the CTyunOS task snapshot function;
The CTyunOS task snapshot function is a tool for protecting and quickly recovering data. By making a complete copy of the cloud host's disk at a specific point in time (a so-called "snapshot"), it generates a duplicate of the data exactly as it was at that moment. When the cloud host fails or data is lost, the data can be quickly restored by rolling back to the snapshot state. The snapshot function can quickly recover data when loss or corruption occurs, reducing data loss and downtime and providing higher reliability and availability. By periodically creating snapshots, an effective disaster recovery plan can be established that helps cope with sudden events such as hardware failures and natural disasters. Snapshots can also provide an efficient deployment and testing environment, ensuring consistency of applications across different environments and allowing safe functional testing and performance evaluation.
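The snapshot-and-rollback semantics described above can be modeled in a few lines. This is a toy sketch of point-in-time copies, not the actual CTyunOS snapshot API; the class and method names are illustrative assumptions.

```python
import copy

class SnapshotStore:
    """Toy model of point-in-time snapshots with rollback (illustrative only)."""

    def __init__(self):
        self.state = {}       # current task/disk state
        self.snapshots = {}   # snapshot_id -> deep copy of state at that moment

    def take_snapshot(self, snapshot_id):
        # A snapshot is a complete copy of the state at this point in time.
        self.snapshots[snapshot_id] = copy.deepcopy(self.state)

    def rollback(self, snapshot_id):
        # Restoring rolls the live state back to the saved copy.
        self.state = copy.deepcopy(self.snapshots[snapshot_id])
```

Rolling back after a failure restores exactly the state captured at snapshot time, which is what lets a restarted task resume from its recorded intermediate state.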
S450, for HDFS copy damage caused by the node fault, CTyunOS starts an automatic copy repair process and reallocates copy storage among healthy nodes.
The CTyunOS automatic copy repair procedure essentially comprises the following steps. Repair initiation: during CTyunOS startup, the system automatically detects copy status; if a copy is found to be damaged or missing, the system triggers the automatic repair procedure. Copy detection: the system checks the state of all copies to determine which are corrupted or missing; this step is the basis of the repair process, ensuring that the system can accurately identify the copies that need repair. Data recovery: once the copies needing repair are determined, the system recovers the data from the other healthy copies, typically by copying data from them to the corrupted or missing copies to ensure data integrity and consistency.
Copy synchronization: after data recovery is complete, the system ensures data synchronization among all copies, updating them so that they remain in a consistent state. Copy verification: after repair, the system verifies all copies to confirm that they work normally and their data is consistent; this step is critical to ensuring the repair succeeded. Restart: after all copy repairs and verifications are completed, the system restarts, ensuring the new copy state is correctly loaded and applied.
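The detect, recover, and verify steps above can be sketched as a checksum-driven repair pass. This is an illustrative sketch, not HDFS or CTyunOS code; the function names and the use of SHA-256 as the integrity check are assumptions.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Integrity fingerprint of a block copy (SHA-256 chosen for the sketch)."""
    return hashlib.sha256(data).hexdigest()

def repair_replicas(replicas, expected_checksum):
    """Detect corrupted/missing copies and restore them from a healthy one.

    `replicas` maps node_id -> bytes (or None if the copy is missing).
    Returns the repaired mapping and the list of nodes that were repaired.
    """
    # Copy detection: a copy is healthy iff present and checksum matches.
    healthy = [d for d in replicas.values()
               if d is not None and checksum(d) == expected_checksum]
    if not healthy:
        raise RuntimeError("no healthy copy available for recovery")
    source = healthy[0]
    repaired = []
    # Data recovery: overwrite damaged/missing copies from the healthy source.
    for node, data in replicas.items():
        if data is None or checksum(data) != expected_checksum:
            replicas[node] = source
            repaired.append(node)
    # Copy verification: every copy must now match the expected checksum.
    assert all(checksum(d) == expected_checksum for d in replicas.values())
    return replicas, repaired
```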
Compared with the prior art, the invention significantly improves data throughput: dynamic resource scheduling and bottom-layer optimization effectively solve the resource contention problem and improve Hadoop cluster performance. Resource utilization is higher: real-time monitoring and intelligent scheduling make resource allocation finer and reduce waste. System stability is enhanced: when resources are tight or nodes fail, the dynamic adjustment and task migration mechanisms ensure that task execution is not interrupted.
The invention utilizes CTyunOS kernel features such as CPU binding, dynamic memory adjustment, and network flow control to achieve efficient resource allocation for Hadoop tasks, significantly improving system performance. Through the CTyunOS monitoring APIs, the invention acquires and analyzes the resource utilization data of Hadoop nodes in real time and dynamically adjusts the resource allocation strategy, so that the system can flexibly cope with load changes and optimize resource utilization and data throughput. The invention innovatively applies dynamic bandwidth allocation and data compression strategies, optimizing transmission efficiency for the data characteristics of HDFS, maintaining a high data transmission rate under heavy load, and reducing network bottlenecks. The invention combines the CTyunOS heartbeat detection and task snapshot functions to achieve rapid isolation of faulty nodes and automatic task restart, enhancing the reliability and stability of the system and reducing the performance loss caused by faults.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A dynamic resource scheduling method for improving Hadoop data throughput in CTyunOS, comprising:
Docking Hadoop with CTyunOS;
Predicting the load;
And carrying out dynamic resource scheduling according to the predicted load result by adopting Hadoop task scheduling logic and CTyunOS bottom layer resource scheduling mechanism.
2. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS according to claim 1, wherein before docking Hadoop with CTyunOS, the method further comprises deploying a CTyunOS resource environment, specifically:
creating a containerized instance in CTyunOS, running each Hadoop node in a separate container;
pooling the resources of the physical server into a dynamically allocated computing resource pool, storage resource pool, and network bandwidth pool based on the CTyunOS virtualization management module;
each independent container is allocated independent bandwidth through network namespace technology.
3. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS of claim 1, wherein said interfacing Hadoop with CTyunOS comprises:
calling the CTyunOS monitoring API by extending the Hadoop YARN scheduler, wherein the real-time information acquired comprises:
CPU load, with a control group (cgroup) limiting the container's CPU time usage ratio;
memory occupation, monitoring the memory usage trend through the memory statistics file built into the container;
IO throughput, monitoring the disk read/write speed through the block layer of the CTyunOS kernel;
data processing, wherein the monitoring data is collected by Prometheus and stored in a time-series database.
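The per-container CPU accounting in claim 3 can be illustrated by parsing cgroup-style counters. The field names below follow the Linux cgroup-v2 `cpu.stat` format, but the helper functions and the sampling arithmetic are a sketch under that assumption, not a CTyunOS or YARN API.

```python
def parse_cgroup_cpu_stat(text):
    """Parse cgroup-v2 style `cpu.stat` content into a dict of counters."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

def cpu_utilization(prev, curr, interval_usec):
    """CPU time consumed between two samples, as a fraction of the interval."""
    return (curr["usage_usec"] - prev["usage_usec"]) / interval_usec
```

A monitor would sample `cpu.stat` at a fixed interval and export the resulting utilization fraction to the time-series database.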
4. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS of claim 1, wherein said predicting load comprises:
the node history load is predicted using a sliding-window exponential smoothing algorithm, expressed as:
S_t = α·x_t + (1 − α)·S_{t−1};
wherein α is the smoothing coefficient, x_t is the current load value, and S_t is the predicted value (S_{t−1} being the previous predicted value).
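The smoothing recurrence of claim 4 can be applied over a window of load samples as follows. The function name, the seeding of the forecast with the first sample, and the default α of 0.3 are assumptions for the sketch; the patent only specifies the recurrence itself.

```python
def exponential_smoothing(loads, alpha=0.3):
    """One-step-ahead load prediction: S_t = alpha*x_t + (1-alpha)*S_{t-1}.

    `loads` is the sliding window of historical load samples; alpha in (0, 1)
    weights recent samples more heavily.
    """
    if not loads:
        raise ValueError("need at least one load sample")
    s = loads[0]              # seed the forecast with the first observation
    for x in loads[1:]:
        s = alpha * x + (1 - alpha) * s
    return s
```

With α = 0.5, a window of [10.0, 20.0] yields a prediction of 15.0, halfway between the old forecast and the new observation.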
5. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS according to claim 1, wherein said adopting Hadoop task scheduling logic and CTyunOS underlying resource scheduling mechanism to perform dynamic resource scheduling according to the predicted load result comprises:
Calculating task weights to be processed;
Distributing nodes according to the task weights;
And if the allocated node occupancy rate exceeds the threshold value, migrating the task to be processed to a new node by adopting a CTyunOS bottom layer resource scheduling mechanism.
6. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS according to claim 5, wherein said calculating task weights to be processed comprises:
setting weight factors based on the task type, data amount, and priority, and calculating the task priority, expressed as:
Priority(Task) = f(Size, Deadline, Compute_Complexity);
wherein tasks with high priority are assigned to low-load nodes.
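Claim 6 leaves the function f unspecified; one plausible instantiation is a weighted sum with an inverse-deadline urgency term, paired with a greedy pairing of high-priority tasks to low-load nodes. The weights, the urgency scaling, and both function names below are illustrative assumptions, not the patented formula.

```python
def task_priority(size_gb, deadline_s, complexity,
                  w_size=0.3, w_deadline=0.5, w_complexity=0.2):
    """One possible Priority(Task) = f(Size, Deadline, Compute_Complexity)."""
    urgency = 1.0 / max(deadline_s, 1.0)   # closer deadlines raise priority
    return (w_size * size_gb
            + w_deadline * urgency * 1000   # scale urgency into the same range
            + w_complexity * complexity)

def assign(tasks, nodes):
    """Pair higher-priority tasks with lower-load nodes (greedy matching)."""
    ranked_tasks = sorted(tasks, key=lambda t: -t["priority"])
    ranked_nodes = sorted(nodes, key=lambda n: n["load"])
    return [(t["name"], n["name"]) for t, n in zip(ranked_tasks, ranked_nodes)]
```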
7. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS according to claim 5, wherein if the allocated node occupancy exceeds the threshold, migrating the task to be processed to a new node using the CTyunOS underlying resource scheduling mechanism comprises:
triggering dynamic expansion when the node resource occupancy exceeds 85%;
using the CTyunOS live migration technology to move part of the tasks to a new node, wherein during migration the copy closest to the target node is automatically selected for data reading;
actively releasing the container and its corresponding resources when the node resource utilization falls below 30%.
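The two thresholds in claim 7 amount to a three-way decision on node occupancy. The sketch below encodes only that decision; the action labels are hypothetical names, not CTyunOS operations.

```python
def scaling_action(occupancy):
    """Decide the scaling action from node resource occupancy (claim 7 thresholds).

    Above 85% triggers dynamic expansion and live migration; below 30% releases
    the container and its resources; otherwise no action is taken.
    """
    if occupancy > 0.85:
        return "migrate"     # live-migrate part of the tasks to a new node
    if occupancy < 0.30:
        return "release"     # free the container and its resources
    return "keep"
```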
8. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS according to claim 6, wherein said assigning high-priority tasks to low-load nodes comprises:
preferentially allocating network bandwidth to higher-priority tasks based on the CTyunOS bandwidth allocation strategy;
using a token bucket algorithm to limit the bandwidth consumption of low-priority tasks, keeping the network path of high-priority tasks unblocked;
enabling the Snappy compression algorithm during data transmission, and merging small files into large files through Hadoop Archive.
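The token-bucket limiting in claim 8 can be sketched with the standard algorithm: tokens refill at the permitted rate, and a low-priority send is allowed only if enough tokens are available. The class name and the rate/capacity parameters are illustrative assumptions, not values from the claim.

```python
class TokenBucket:
    """Token-bucket rate limiter for capping low-priority task bandwidth."""

    def __init__(self, rate_bps, capacity_bytes):
        self.rate = rate_bps            # tokens (bytes) added per second
        self.capacity = capacity_bytes  # maximum burst size
        self.tokens = capacity_bytes    # start with a full bucket
        self.last = 0.0                 # timestamp of the last refill

    def allow(self, nbytes, now):
        """Consume `nbytes` tokens if available; otherwise reject the send."""
        # Refill tokens for the elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False
```

A rejected low-priority transfer waits for the bucket to refill, which is what leaves bandwidth headroom for high-priority traffic.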
9. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS according to claim 2, wherein said deploying the CTyunOS resource environment further comprises:
introducing a distributed cache system to cache hot-spot data at the CTyunOS layer;
pre-reading frequently accessed HDFS blocks into the cache area.
10. The method for dynamic resource scheduling for improving Hadoop data throughput in CTyunOS according to claim 1, wherein after performing dynamic resource scheduling by using the Hadoop task scheduling logic and the CTyunOS bottom layer resource scheduling mechanism according to the predicted load result, the method further comprises:
monitoring the node state by adopting the CTyunOS heartbeat detection mechanism, detecting container failures, and isolating the faulty node;
marking the task on the faulty node as failed in the Hadoop scheduler, and triggering a retry mechanism;
preferentially selecting the latest copy to start the retry;
recording the intermediate state of the restarted task by using the CTyunOS task snapshot function;
and for HDFS copy damage caused by the node fault, CTyunOS starts the automatic copy repair process and reallocates copy storage among healthy nodes.
CN202411961679.0A 2024-12-30 2024-12-30 A dynamic resource scheduling method to improve Hadoop data throughput in CTyunOS Pending CN119781935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411961679.0A CN119781935A (en) 2024-12-30 2024-12-30 A dynamic resource scheduling method to improve Hadoop data throughput in CTyunOS


Publications (1)

Publication Number Publication Date
CN119781935A true CN119781935A (en) 2025-04-08

Family

ID=95245697



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination