CN104426797B

CN104426797B - A kind of communication means and device based on queue

Info

Publication number: CN104426797B
Application number: CN201310378798.9A
Authority: CN
Inventors: 张广飞; 张柳航; 常轶松; 侯锐
Original assignee: Huawei Technologies Co Ltd; Institute of Computing Technology of CAS
Current assignee: Huawei Technologies Co Ltd; Institute of Computing Technology of CAS
Priority date: 2013-08-27
Filing date: 2013-08-27
Publication date: 2018-03-13
Anticipated expiration: 2033-08-27
Also published as: CN104426797A

Abstract

Embodiments of the present invention provide a queue-based communication method and device. The adapter connected to the sending node includes an arbiter, a prefetcher, a cache memory, and a sending engine. The method includes: the prefetcher receives the arbitration The ID of the queue to be processed sent by the processor, the ID of the queue to be processed is obtained by the arbitrator when obtaining the ID of the current processing queue; The context of the queue to be processed, if not saved, read the context of the queue to be processed from the memory of the sending node, and save it in the cache memory, for the sending engine to send the queue to be processed Used when queued data. In this way, the data transmission performance of nodes using QP communication can be significantly improved.

Description

Communication method and device based on queue

Technical Field

The invention relates to the technical field of communication, in particular to a queue-based communication method and device.

Background

When a node uses a queue-based communication mechanism QP (queue pair) to perform data transmission, three queues are firstly established for the QP in a memory: the sending queue sq (send queue), the receiving queue rq (receive queue), the completion queue cq (completion queue), and then the adapter connected with the node performs data transmission.

The data transmission process will be briefly described below with reference to the schematic diagram of the adapter transmission scenario shown in fig. 1.

Firstly, an arbiter determines the identity of a currently processed QP, a send engine send queue process engine judges whether the context of the QP is stored in a cache of a cache, and if the context of the QP is stored in the cache, the arbiter directly reads the context from the cache; if not, the current processing process is suspended, and cache replacement is performed first, that is, the context of the currently processed QP is obtained by accessing the memory.

It should be noted that information such as the memory locations of the three queues corresponding to the QP and the number of work Queue elements wqe (work Queue element) is stored in the context of the QP, and the memory space occupied by the QP is large (one context is about 512 bytes), so that the adapter cannot store the contexts of all QPs. In general, the context of the QP is stored in memory, but to increase the access speed of the adapter to the QP context, the QP context that is commonly used may be stored in the cache of the adapter.

Secondly, after the sending engine acquires the context of the QP, the position of the sending queue SQ of the QP in the memory can be obtained, the first WQE of the SQ queue can be read, and corresponding data are sent to the receiving node.

For the receiving node, the data receiving process may be briefly described in conjunction with the receiving scenario diagram of the adapter shown in fig. 2.

Firstly, after a receiving engine receive queue process engine receives data sent by a sending node, judging whether the context of a QP corresponding to the data is stored in cache or not, and if so, directly reading the data from the cache; if not, the current processing process is also suspended, cache replacement is carried out, and the context of the QP is obtained from the memory.

Secondly, after the receiving engine acquires the context of the QP, the position of the RQ of the QP in the memory can be known, so that the first WQE of the RQ queue can be read, the received data is stored in the buffer indicated by the WQE, and the data receiving process is completed.

As can be seen from the above description of the sending and receiving processes, because the cache only stores some contexts of common QPs, the QP context may not be found in the cache, and once the cache is missed, the cache replacement needs to be performed at a high cost, which may reduce the data transmission performance of the node using QP communication.

Disclosure of Invention

The queue-based communication method and device of the embodiment of the invention improve the data transmission performance of the node using QP communication in a preprocessing mode.

Therefore, the embodiment of the invention provides the following technical scheme:

in a first aspect, an embodiment of the present invention provides a queue-based communication method, where an adapter connected to a sending node includes an arbiter, a prefetcher, a cache memory, and a sending engine, and the method includes:

the prefetcher receives the identity of a queue to be processed sent by the arbiter, and the identity of the queue to be processed is obtained by the arbiter when the identity of the queue to be processed is obtained;

the prefetcher judges whether the context of the queue to be processed is stored in the cache memory, if not, the context of the queue to be processed is read from the memory of the sending node and is stored in the cache memory for the sending engine to use when sending the data of the queue to be processed.

In a first possible implementation manner of the first aspect, the method further includes:

after the context of the pending queue is read from the memory of the sending node,

and the prefetcher extracts the identity of the pairing queue corresponding to the queue to be processed from the context of the queue to be processed and sends the identity of the pairing queue to a receiving node through the sending engine.

In a second aspect, an embodiment of the present invention provides a queue-based communication method, where an adapter connected to a receiving node includes a prefetcher, a cache memory, and a receiving engine, the method including:

the prefetcher receives the identity of the pairing queue forwarded by the receiving engine;

the prefetcher judges whether the context of the matched queue is stored in the cache memory, if not, the context of the matched queue is read from the memory of the receiving node and stored in the cache memory for the receiving engine to use when receiving the data of the matched queue.

In a first possible implementation manner of the second aspect, if the receiving node receives identities of paired queues sent by at least two sending nodes, the receiving node sends the identities of the paired queues to the at least two sending nodes

The prefetcher receives the identity of the pairing queue forwarded by the receiving engine, and comprises the following steps:

the prefetcher receives the identities of at least two paired queues and stores the identities of the at least two paired queues in a waiting queue according to the receiving sequence.

With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner, the method further includes:

the prefetcher extracts a preset number of matched queues from the waiting queue in sequence;

the prefetcher determining whether the context of the paired queue is stored in the cache memory comprises:

the prefetcher determines whether the context of the extracted pairing queue with the preset number is stored in the cache memory.

In a third aspect, an embodiment of the present invention provides a queue-based communication apparatus, where the apparatus includes:

the device comprises a receiving unit, a judging unit and a processing unit, wherein the receiving unit is used for receiving the identity of a queue to be processed sent by an arbiter, and the identity of the queue to be processed is obtained by the arbiter when the identity of the queue to be processed is obtained;

a judging unit, configured to judge whether a context of the queue to be processed is stored in a cache memory;

and the reading unit is used for reading the context of the queue to be processed from the memory of the sending node when the context of the queue to be processed is not stored in the cache memory, and storing the context of the queue to be processed into the cache memory for the sending engine to use when sending the data of the queue to be processed.

In a first possible implementation manner of the third aspect, the apparatus further includes:

the extracting unit is used for extracting the identity of the pairing queue corresponding to the queue to be processed from the context of the queue to be processed read by the reading unit;

and the sending unit is used for sending the identity of the pairing queue extracted by the extracting unit to a receiving node through the sending engine.

In a fourth aspect, an embodiment of the present invention provides a queue-based communication apparatus, where the apparatus includes:

the receiving unit is used for receiving the identity of the pairing queue forwarded by the receiving engine;

a judging unit configured to judge whether or not a context of the pairing queue is stored in a cache memory;

a reading unit, configured to, when the context of the matching queue is not stored in the cache memory, read the context of the matching queue from a memory of a receiving node, and store the context of the matching queue in the cache memory, so that the context of the matching queue is used when the receiving engine receives data of the matching queue.

In a first possible implementation manner of the fourth aspect, if the receiving node receives identities of paired queues sent by at least two sending nodes, the receiving node sends the identities of the paired queues to the at least two sending nodes

The receiving unit is specifically configured to receive the identifiers of the at least two pairing queues, and store the identifiers of the at least two pairing queues in the waiting queue according to a receiving order.

With reference to the first possible implementation manner of the fourth aspect, in a second possible implementation manner, the apparatus further includes:

the extracting unit is used for sequentially extracting a preset number of pairing queues from the waiting queues;

the determining unit is specifically configured to determine whether or not the contexts of the extracted pairing queue of the preset number are stored in the cache memory.

According to the communication method and device based on the queue, the context of the queue with cache loss possibly occurring is prepared in advance, so that cache replacement with high cost is not needed to be carried out in real time when the queue is processed; in addition, the continuity of data transmission can be effectively ensured, and the phenomenon of pause in the transmission process is avoided, so that the data transmission performance of the node utilizing QP communication can be obviously improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1 is a diagram of a prior art adapter transmission scenario;

FIG. 2 is a diagram of a prior art receiving scenario of an adapter;

FIG. 3 is a schematic diagram of three queues built in memory when two nodes communicate using QP mechanism;

FIG. 4 is a schematic diagram of an adapter connected to a transmitting node according to the present invention;

fig. 5 is a flowchart of embodiment 1 of the queue-based communication method of the transmitting side of the present invention;

fig. 6 is a flowchart of embodiment 2 of the queue-based communication method of the transmitting side according to the present invention;

FIG. 7 is a schematic diagram of an adapter connected to a receiving node according to the present invention;

fig. 8 is a flowchart of embodiment 1 of a queue-based communication method of a receiving side according to the present invention;

fig. 9 is a flowchart of embodiment 2 of the queue-based communication method of the receiving side of the present invention;

FIG. 10 is a flow chart of the present invention for preparing a context for a preset number of paired queues;

FIG. 11a is a schematic diagram of a transmitting node and adapter in accordance with the present invention;

FIG. 11b is a schematic diagram of a receiving node and an adapter according to the present invention;

fig. 12 is a schematic diagram of a queue-based communication apparatus embodiment 1 of a transmitting side of the present invention;

fig. 13 is a diagram of a queue-based communication device embodiment 2 of a sender of the present invention;

fig. 14 is a schematic diagram of a queue-based communication device embodiment 1 of a receiving side according to the present invention;

fig. 15 is a schematic diagram of a queue-based communication device embodiment 2 of a receiving side according to the present invention;

fig. 16 is a schematic diagram of the hardware configuration of a communication apparatus on the transmission side of the present invention;

fig. 17 is a schematic diagram showing a hardware configuration of a communication apparatus on the reception side of the present invention.

Detailed Description

In order to make the technical field of the invention better understand the scheme of the invention, the following detailed description of the embodiments of the invention is provided in conjunction with the accompanying drawings and the implementation mode.

The following explains an application scenario of the present invention.

The embodiment of the invention mainly aims at a Queue communication mechanism, and for one Queue Pair, the Queue Pair corresponds to three queues, namely a sending Queue SQ, a receiving Queue RQ and a completion Queue CQ. Referring to fig. 3, a schematic diagram of three queues established in a memory when two nodes communicate using a QP mechanism is shown, and assuming that in the current communication process, node 1 is a data sending side and node 2 is a data receiving side, the data transmission process can be roughly described as follows:

1. the software layer of node 1 prepares the data to be transmitted in memory.

Correspondingly, the software layer of node 2 prepares a buffer in memory for receiving data for storage.

2. The software layer of the node 1 creates Work Queue Elements (WQEs) for transmission, inserts the WQEs into the SQ in sequence, and configures the context of the QP corresponding to the SQ (mainly configures the number of the WQEs, generally, if the number of the WQEs is not zero, data needs to be sent, and if the number of the WQEs is zero, no data is sent). It should be noted that each WQE records one or more data buffers to be sent, and specifically may include a starting virtual address and a size of each data buffer.

Correspondingly, the software layer of node 2 creates a received work queue element, WQE, and inserts the WQE into the RQ in order. It should be noted that each WQE in the RQ points to one or more receive buffers, and may specifically include a starting virtual address and a size of each receive buffer.

3. The adapter connected to node 1 processes the first WQE of the SQ queue, fetches the data from the data buffer indicated by the WQE and sends it to node 2.

Correspondingly, the adapter connected with the node 2 takes out the first WQE of the RQ queue once receiving the data, and saves the received data into the buffer area indicated by the WQE.

4. When the adapter of node 1 processes a WQE of SQ, a completion Queue element CQE (completion Queue element) is generated to indicate completion of a transmission and inserted into the CQ of node 1.

Correspondingly, when the adapter of the node 2 receives a data packet, a CQE is also generated and inserted into the CQ of the node 2, so as to indicate the completion of a reception.

As described in the background art, when data is transmitted in the above manner in the prior art, a problem of a decrease in transmission performance due to cache miss may occur.

Referring to fig. 4, there is shown a schematic diagram of the structure of an adapter connected to a sending node according to the present invention, which includes an arbiter 101, a prefetcher 102, a cache memory 103, and a sending engine 104. Wherein,

the arbiter is used for acquiring the identity of the current processing queue and the identity of the queue to be processed, sending the identity of the current processing queue to the sending engine, and sending the identity of the queue to be processed to the prefetcher;

the sending engine is used for reading the context of the current processing queue from the cache memory, reading currently sent data from the memory of the sending node according to the context, and sending the currently sent data to the receiving node;

the prefetcher is used for judging whether the context of the queue to be processed is stored in the cache memory, if not, the context of the queue to be processed is read from the memory of the sending node and is stored in the cache memory for the sending engine to use when sending the data of the queue to be processed;

the prefetcher is also used for extracting the identity of the matched queue corresponding to the queue to be processed from the context of the queue to be processed and sending the identity of the matched queue to the sending engine;

the sending engine is further configured to send the identity of the pairing queue to the receiving node.

In the following, the communication process of the embodiment of the present invention is explained with reference to the flowchart of embodiment 1 shown in fig. 5, where the method may include:

step 201, the prefetcher receives the identity of the queue to be processed sent by the arbiter, and the identity of the queue to be processed is obtained by the arbiter when the identity of the queue to be processed is obtained.

In order to solve the influence of cache miss on data transmission performance, when the arbiter sends the identity of the current processing queue to the sending engine, the arbiter can also obtain the identity of the next queue to be processed in advance, and send the identity to the prefetcher by using the identity as the queue to be processed. Therefore, when the sending engine normally processes and sends the data of the current processing queue, the prefetcher can also prepare the context of the queue to be processed for the sending engine in advance, so that after the sending engine finishes processing the data of the current processing queue, the prepared context of the queue to be processed can be directly used for processing and sending the data of the queue to be processed, the condition that cache is not hit when the sending engine obtains the context of the queue to be processed is avoided, and the problem of transmission performance reduction caused by cache miss can be solved.

Typically, after the software layer at the sending node has added a WQE to the SQ and configured the context, the context will inform the adapter that it currently has data to send so the arbiter knows which QP it is handling.

Taking the sending node with 3 QPs as an example, and the identity identifiers are QP1, QP2, and QP3, if the software layer inserts a WQE into the SQ corresponding to QP2 and configures the number of WQEs in the context to add 1, the context of QP2 will send an instruction to the adapter; then, the software layer inserts a WQE into the SQ corresponding to QP1, and adds 1 to the WQE number in the configuration context, at this time, the context of QP1 also sends an instruction to the adapter; finally, the software layer inserts a WQE into the SQ corresponding to QP3, and configures the number of WQEs in the context plus 1, and the context of QP3 also sends an instruction to the adapter. According to the sequential processing principle, the adapter knows that the processing sequence of the 3 QPs of the sending node is QP2, QP1 and QP3, so that the arbiter can clearly know that: if the current queue is QP2, the pending queue is QP 1; if the current queue is QP1, the pending queue is QP 3; if the current queue is QP3, the pending queue is QP 2.

It should be noted that, the obtaining of the identity of the queue to be processed by the arbiter in this step is performed when obtaining the identity of the queue to be processed by the arbiter, which is to say, the arbiter does not obtain the identities of the queue to be processed and the queue to be processed in an absolute sense, in comparison with the prior art that the arbiter obtains the identity of the queue to be processed and waits for the sending engine to send data of the queue to be processed, and then obtains the identities of the queue to be processed.

Step 202, the prefetcher determines whether the context of the queue to be processed is saved in the cache memory, if not, step 203 is executed; otherwise, the process is not carried out, and the flow is ended.

In step 203, the prefetcher reads the context of the queue to be processed from the memory of the sending node, and stores the context in the cache memory for use when the sending engine sends the data of the queue to be processed.

Because the cache stores the context of part of common QPs, and the contexts of all QPs are stored in the memory of the sending node, the context of the lookup queue in the cache has the following two results:

firstly, the context of the queue to be processed is stored in the cache and can be obtained by accessing the cache, which is called cache hit, and corresponding to the result, the adapter prepares the context of the queue to be processed for the sending engine at the moment and does not need the prefetcher to execute any action;

secondly, the context of the queue to be processed is not stored in the cache, and cannot be obtained by accessing the cache, which is called "cache miss" or "cache miss", and the prefetcher of the embodiment of the invention is required to prepare the context of the queue to be processed for the sending engine in advance by a cache replacement mode corresponding to the result.

Thus, after receiving the id of the queue to be processed in step 201, the prefetcher may first search the context of the queue with the id from the cache, and determine which of the two cases the search result is.

If the first type, the prefetcher does not need to do anything. After the sending engine receives the identity of the queue to be processed sent by the arbiter, the sending engine can directly access the cache to obtain the context of the queue to be processed, and send data according to the context.

If the type is the second type, the prefetcher needs to access the memory of the sending node for cache replacement, that is, the context of the queue with the identity of the queue to be processed is searched from the memory of the sending node and is stored in the cache for the sending engine to use when needed.

The embodiment of the invention prepares the context of the queue with cache loss in advance by the preprocessing or predicting mode, so that on one hand, a sending engine does not need to replace the cache with high cost in real time when processing the queue; on the other hand, the continuity of data transmission can be effectively ensured, and the phenomenon of pause in the transmission process is avoided, so that the data transmission performance of the sending node of the embodiment of the invention is obviously improved.

Referring to fig. 6, a flowchart of embodiment 2 of the communication method according to the embodiment of the present invention is shown, which may include:

step 301, the prefetcher receives the identity of the queue to be processed sent by the arbiter, and the identity of the queue to be processed is obtained by the arbiter when the identity of the queue to be processed is obtained.

Step 302, the prefetcher determines whether the context of the queue to be processed is saved in the cache memory, and if not, step 303 is executed; otherwise, the process is not carried out, and the flow is ended.

Step 303, the prefetcher reads the context of the queue to be processed from the memory of the sending node, and stores the context into the cache memory for use when the sending engine sends the data of the queue to be processed.

Steps 301 to 303 are the same as steps 201 to 203, and are not described herein again.

Step 304, the prefetcher extracts the identity of the paired queue corresponding to the pending queue from the context of the pending queue, and sends the identity of the paired queue to the receiving node through the sending engine.

For a receiving node, only after receiving data from a sending node, the receiving node knows which queue context is to be read from the cache, and if cache miss occurs at the moment, the receiving node stops receiving operation, so that the data transmission performance is reduced more obviously. In order to solve the problem, the prefetcher can also extract the identity of the paired queue from the context of the queue to be processed, send the identity to the sending engine, and the sending engine sends the identity of the paired queue to the receiving node through the control channel with higher priority, so that an adapter connected with the receiving node prepares the context of the paired queue for the receiving engine in advance in a preprocessing mode. By the scheme, the sending node can prepare the context of the queue to be processed in advance for the sending engine to use when sending the data of the queue to be processed; the receiving node may also be made to prepare the context of the paired queue in advance for use by the receiving engine in receiving data from the paired queue (if from the perspective of the sending node, it is understood herein that the receiving engine is used in receiving data from the pending queue sent by the sending node).

It should be noted that the QP may be paired at the sending node and the receiving node, and the identities of the two may be the same, for example, the identity of a queue at the sending node is QP1, and the identity of the queue at the receiving node is also QP 1. Alternatively, the identities of the two may not be the same, such as the identity of a queue at the sending node being QP4 and the identity of its counterpart queue at the receiving node being QP 8. In order to enable the adapter connected to the receiving node to accurately find the receiving buffer of the QP data, the prefetcher preferably further extracts the identity of the pairing queue of the QP from the context of the QP, and sends the identity to the adapter connected to the receiving node.

In summary, the context of the queue obtained in this embodiment is mainly for the following two purposes:

first, the memory location of the three queues holding the QP is stored in context. Therefore, the sending engine can access the three queues accordingly, and for the sending node, the sending engine mainly accesses the sending queue SQ, reads data to be sent according to the first WQE of the SQ and sends the data. Therefore, the problem of transmission performance reduction of the sending node due to cache loss can be solved.

Second, the identity of the paired queue of QPs is stored in context. In order to enable the adapter connected with the receiving node to accurately find the receiving buffer of the QP data, the identity of the paired queue of the QP is also sent to the adapter connected with the receiving node, and then a prefetcher in the adapter prepares the context of the paired queue in advance for the receiving engine to use when needed. Therefore, the problem of transmission performance reduction of the receiving node due to cache loss can be solved.

In addition, it should be noted that, if it is determined in step 302 that the context of the queue to be processed is stored in the cache, the prefetcher does not need to perform cache replacement, but at this time, the context of the paired queue may be stored in the cache of the adapter connected to the receiving node, or may not be stored, so that, when it is determined that the context of the queue to be processed is stored in the cache of the adapter connected to the sending node, the prefetcher may also extract the identity of the paired queue from the identity of the paired queue, and send the identity to the receiving node, so that the prefetcher in the adapter connected to the receiving node prepares the context of the paired queue in advance.

Referring to fig. 7, there is shown a schematic diagram of the structure of an adapter connected to a receiving node according to the present invention, which includes a prefetcher 401, a cache memory 402 and a receiving engine 403. Wherein,

the receiving engine is used for receiving data sent by a sending node, reading the context of the queue corresponding to the data from the cache memory, and storing the data to the memory of the receiving node according to the context;

the receiving engine is further configured to receive an identity of a matching queue sent by the sending node, and forward the identity of the matching queue to the prefetcher;

the prefetcher is configured to receive an identity of a matching queue forwarded by the receiving engine, determine whether a context of the matching queue is stored in the cache memory, and if not, read the context of the matching queue from the memory of the receiving node and store the context of the matching queue in the cache memory for use when the receiving engine receives data of the matching queue.

In the following, the communication process of the embodiment of the present invention is explained with reference to the flowchart of embodiment 1 shown in fig. 8, where the method may include:

step 501, the prefetcher receives the identity of the pairing queue forwarded by the receiving engine.

In order to solve the problem of transmission performance reduction of a receiving node due to cache miss, when an adapter connected with a sending node processes data sending of a current queue, the adapter also obtains an identity of a next queue to be processed (namely the identity of a queue to be processed) in advance, finds the identity of a matched queue of the queue to be processed at the receiving node, and sends the identity of the matched queue to the adapter connected with the receiving node, so that the adapter can prepare the context of the matched queue in advance. Therefore, when the adapter needs to process the pairing queue, the cache miss phenomenon can not occur when the context of the queue is stored in the cache.

This step is the process of the adapter receiving the pairing queue identity. The sending engine of the adapter connected with the sending node sends the identity of the pairing queue to the adapter of the receiving node through the control channel, and the adapter is received by the receiving engine of the adapter and then forwarded to the prefetcher, so that the prefetcher prepares the context of the pairing queue with the identity in advance.

It should be noted that, a receiving node may only communicate with one sending node at present, and the prefetcher receives the identity of the pairing queue sent by one sending node; alternatively, the receiving node may also communicate with at least two sending nodes simultaneously, and the prefetcher receives the identities of at least two of the paired queues sent by the at least two sending nodes. In response to this situation, in order to ensure normal communication between the receiving node and each sending node, the prefetcher will sequentially store the received identifiers of the paired queues into a first-in first-out queue according to the sequential processing principle, so as to prepare the context of each paired queue one by one.

Step 502, the prefetcher determines whether the context of the paired queue is saved in the cache memory, and if not, step 503 is executed; otherwise, the process is not carried out, and the flow is ended.

In step 503, the prefetcher reads the context of the paired queue from the memory of the receiving node, and stores the context of the paired queue in the cache for the receiving engine to use when receiving the data of the paired queue.

As described above at steps 202, 203 of the embodiment shown in fig. 5. All QP contexts are stored in the memory of the receiving node, and only part of the commonly used QP contexts are stored in the cache of the adapter, so when the prefetcher judges whether the context of the matching queue is stored in the cache, the following two judgment results also exist:

first, the cache keeps the context of the pairing queue, that is, the adapter has prepared the context of the pairing queue for the receiving engine, and the prefetcher is not required to execute any action;

secondly, the context of the paired queue is not maintained in the cache, that is, the cache miss phenomenon occurs, and at this time, the context of the paired queue needs to be prepared for the receiving engine in advance by the prefetcher of the embodiment of the present invention through a cache replacement mode.

In the embodiment of the invention, the context of the queue with cache loss possibly occurring is prepared in advance through the preprocessing or predicting mode, so that on one hand, a receiving engine does not need to access the cache in real time after receiving the data of the paired queue and carry out cache replacement with high cost; on the other hand, the continuity of data transmission can be effectively ensured, and the phenomenon of stopping in the transmission process is avoided, so that the data transmission performance of the receiving node of the embodiment of the invention is obviously improved.

Referring to fig. 9, a flowchart of embodiment 2 of the communication method according to the embodiment of the present invention is shown, which may include:

step 601, the prefetcher receives the identifiers of the at least two paired queues and stores the identifiers of the at least two paired queues in a waiting queue according to the receiving sequence.

This step is the same as step 501, but mainly aims at the case where one receiving node communicates with at least two sending nodes, and is not described here again.

Step 602, the prefetcher sequentially extracts a preset number of matching queues from the wait queue.

After the sending node sends the identity identifiers of the paired queues to the receiving node according to a certain sequence, the data of the paired queues are not necessarily sent in the sequence of sending the identity identifiers strictly in the subsequent sending process. Thus, the prefetcher is required to extract a predetermined number of the paired queues from the wait queue for processing, such as extracting the first 3 paired queues of the processed wait queue.

This step is illustrated below with reference to specific examples. If the receiving node currently communicates with 5 sending nodes, and the order in which the receiving node receives the identifiers of the pairing queues sent by the 5 sending nodes is as follows:

the pair queue TQP1 sent by the sending node 1, the pair queue TQP2 sent by the sending node 2, the pair queue TQP3 sent by the sending node 3, the pair queue TQP4 sent by the sending node 4, and the pair queue TQP5 sent by the sending node 5.

Under the influence of the actual communication process, the sequence of sending the data of the pairing queue by the sending node is assumed as follows:

data of the pair queue TQP3 transmitted by the transmitting node 3, data of the pair queue TQP2 transmitted by the transmitting node 2, data of the pair queue TQP4 transmitted by the transmitting node 4, data of the pair queue TQP1 transmitted by the transmitting node 1, and data of the pair queue TQP5 transmitted by the transmitting node 5.

At this point, if the prefetcher still prepares the contexts of the paired queues in the queue one by one in order, cache miss may still occur for some queues. For example, in the order of receiving the pair queue id, the prefetcher should prepare TQP1 context first and prepare TQP2 context when the receiving engine processes TQP1 data; however, referring to the receiving order of the paired queue data, the receiving engine receives TQP3 data first, and the prefetcher is not ready for TQP3 context, so that when the receiving engine accesses the cache, the cache miss still occurs.

To solve the above problem and further improve the data transmission performance, the prefetcher may prepare a predetermined number of contexts of the paired queues in batch for the receiving engine to use. If the preset number is 3, the paired queues extracted from the waiting queue by the prefetcher are TQP1, TQP2 and TQP3, and the contexts of the 3 queues can be prepared in advance correspondingly, so that after the receiving engine receives TQP3 data, the receiving engine can access the cache to obtain TQP3 contexts and perform normal and continuous data receiving.

Step 603, the prefetcher determines whether the cache memory stores the extracted contexts of the pairing queues with preset number, if not, step 604 is executed; otherwise, the process is not carried out, and the flow is ended.

In step 604, the prefetcher reads the context of the paired queue not stored in the cache memory from the memory of the receiving node, and stores the context into the cache memory for use when the receiving engine receives the data of the paired queue.

Since the contexts of the pairing queues of the preset number are prepared in advance in this embodiment, step 603 determines whether the contexts of the extracted pairing queues of the preset number are stored in the cache, which means that when the contexts of all the extracted pairing queues are stored in the cache, no action is required to be executed by the prefetcher; if the cache does not store part or all of the extracted contexts of the pairing queue, the prefetcher is required to replace the cache, and the contexts which are not stored by the cache are prepared in advance.

As an implementation manner of preparing a context of a preset number of paired queues in advance in this embodiment, reference may be made to a flowchart shown in fig. 10, where the flowchart includes:

step 701, the prefetcher selects one of the pairing queues with the preset number as a current pairing queue;

in step 702, the prefetcher determines whether the context of the current pairing queue is stored in the cache memory, if not, step 703 is executed, otherwise, the step 701 is executed.

Step 703, the prefetcher reads the context of the current matching queue from the memory of the receiving node, and stores the context into the cache memory for use when the receiving engine receives the data of the current matching queue; and then, returning to execute the step 701 until all the extracted pairing queues are judged.

When one of the paired queues with the preset number is selected as the current paired queue, the paired queue may be selected according to the sequence of the subsequent queues, in the above example, TQP1 may be selected as the current processing queue first, then TQP2 is selected as the current processing queue in the next round of judgment, and finally TQP3 is selected as the current processing queue. In addition, the selection may be performed in other manners, such as processing odd-numbered bit and then processing even-numbered bit paired queues TQP1, TQP3, TQP2, or randomly selecting the current paired queue TQP3, TQP1, TQP2, which is not limited in the present invention.

The above describes the specific implementation process of the scheme of the present invention from both the receiving node and the sending node, and the communication process of the embodiment of the present invention is explained with reference to fig. 11a and 11 b.

Referring to fig. 11a, a schematic diagram of a sending node (mainly embodied in the figure is a CPU and a memory MEM of the node, where context of QP is stored in MEM) and an adapter is shown.

1. The arbiter arbiters send to the sending engine, in addition to the identity of the QP that needs to be processed for the current processing cycle (taking QP4 as an example), the identity of the QP to be processed (which is not called next _ QP, taking next _ QP = QP3 as an example) to the prefetcher.

2. After receiving the next _ QP, the prefetcher first queries whether a context of the next _ QP (i.e., QP context 3) is stored in the cache, and if so, does not process the context, and waits for receiving an identity of a new queue to be processed sent by the arbiter (in combination with the example shown in fig. 11a, the new queue to be processed is QP 1); if not, the prefetcher needs to prefetch qp context3 from memory into the cache.

3. The prefetcher finds from qp context3 the destination node (i.e., the receiving node of the data transmission process) to which the next _ qp corresponds, and the identity of the pairing queue that is paired with the next _ qp in the destination node, which is referred to as next _ tqp (next _ tqp = TQP7 in connection with the example shown in fig. 11a, and passes the next _ tqp to the transmission engine.

4. After receiving the next _ tqp sent by the prefetcher, the sending engine passes the next _ tqp to the destination node through a control information channel (a channel dedicated to passing the highest priority control frame, generally QP0, i.e., QP0 is generally only used to send the highest priority control frame, and does not serve any other purpose).

5. After the sending engine finishes processing the data of the current processing queue QP4, the arbiter sends QP3 to the sending engine as the current processing queue, the sending engine can access the cache, read the context QP context3 of QP3 from the cache (which may be originally stored by the cache or prepared by the prefetcher according to the above steps), know where the sending queue SQ of QP3 is in the memory according to the content stored in QP context3, further obtain the currently sent WQE from the cache (the signal goes to a direction not shown in the figure), and send the WQE to the destination node through the data channel.

Referring to fig. 11b, there is shown a schematic structural diagram of a receiving node (mainly embodied in the figure is a CPU and a memory MEM of the node, where context of QP is stored in MEM) and an adapter.

6. As the adapter of the receiving party, after receiving the information next _ tqp transmitted by the special control channel, the next _ tqp is transmitted to the prefetcher.

Since the prefetcher may receive multiple next _ tqp, a first-in-first-out wait queue is provided in the prefetcher for holding the received next _ tqp.

7. The prefetcher takes out the previous n items of next _ tqp from the waiting queue, searches the context corresponding to the next _ tqp in the cache, and if all the contexts are found, the prefetcher does not process; if the found object exists, the prefetcher needs to prefetch the found next _ tqp context to the cache from the memory.

8. When the receiving engine receives data of next _ tqp sent by the sending node, if data of TQP7 is received, the cache is accessed to read TQP7 context from the received data (TQP is QP7 at the receiving node relative to QP3 of the sending node, and conversely, QP3 at the sending node is defined as TQP 3) QP context7 (which may be originally stored by the cache or prepared by the prefetcher according to the above steps), and knows where in the memory the receive queue RQ of QP7 is stored according to the content stored in QP context7, and further obtains WQE for specifying a data storage buffer from the obtained rqe (this signal is not shown in the figure), and stores the received data into the specified buffer, thereby completing data reception.

In summary, for the sender, after it finishes processing the current QP, when it needs to process the next QP, the context of the QP is already stored in the cache, and then the sending engine can directly access the cache to obtain the QP context and perform data transfer. For the receiving side, when receiving new data, since the QP context corresponding to the new data is already stored in the cache, the receiving engine may also directly access the cache to obtain the QP context and receive the data.

Correspondingly, the embodiment of the invention also provides a queue-based communication device, namely the prefetcher mentioned above.

Referring to fig. 12, a schematic diagram of an embodiment 1 of a queue-based communication apparatus on a sending side according to an embodiment of the present invention is shown, where the apparatus includes:

a receiving unit 801, configured to receive an identity of a queue to be processed sent by an arbiter, where the identity of the queue to be processed is obtained by the arbiter when obtaining the identity of a current queue to be processed;

a determining unit 802, configured to determine whether a context of the queue to be processed is stored in a cache memory;

a reading unit 803, configured to, when the context of the queue to be processed is not stored in the cache, read the context of the queue to be processed from the memory of the sending node, and store the context of the queue to be processed in the cache for use when the sending engine sends the data of the queue to be processed.

Referring to fig. 13, a schematic diagram of an embodiment 2 of a queue-based communication apparatus of a sending side according to an embodiment of the present invention is shown, where the apparatus includes:

a receiving unit 901, configured to receive an identity identifier of a to-be-processed queue sent by an arbiter, where the identity identifier of the to-be-processed queue is obtained by the arbiter when obtaining an identity identifier of a current processing queue;

a determining unit 902, configured to determine whether a context of the queue to be processed is stored in a cache;

a reading unit 903, configured to read the context of the queue to be processed from the memory of the sending node when the context of the queue to be processed is not stored in the cache memory, and store the context of the queue to be processed in the cache memory, so that the context of the queue to be processed is used when the sending engine sends data of the queue to be processed;

an extracting unit 904, configured to extract an identity of a paired queue corresponding to the queue to be processed from the context of the queue to be processed read by the reading unit;

a sending unit 905, configured to send the identity of the pairing queue extracted by the extracting unit to a receiving node through the sending engine.

Referring to fig. 14, a schematic diagram of an embodiment 1 of a queue-based communication device of a receiving side according to an embodiment of the present invention is shown, where the device includes:

a receiving unit 1001, configured to receive an identity of a pairing queue forwarded by a receiving engine;

a judging unit 1002, configured to judge whether a context of the paired queues is stored in a cache memory;

a reading unit 1003, configured to, when the context of the paired queue is not stored in the cache, read the context of the paired queue from a memory of a receiving node, and store the context of the paired queue in the cache for use when the receiving engine receives data of the paired queue.

If the receiving node receives the identities of the paired queues sent by at least two sending nodes, referring to fig. 15, a schematic diagram of an embodiment 2 of a queue-based communication apparatus of a receiving side according to an embodiment of the present invention is shown, where the apparatus includes:

a receiving unit 1101, specifically configured to receive the identifiers of the at least two pairing queues, and store the identifiers of the at least two pairing queues in a waiting queue according to a receiving order;

an extracting unit 1102, configured to sequentially extract a preset number of paired queues from the waiting queue;

specifically, the determining unit 1103 determines whether or not the contexts of the extracted predetermined number of paired queues are stored in the cache memory.

A reading unit 1104, configured to, when the extracted preset number of contexts of the paired queues are not stored in the cache, read the context of the paired queue that is not stored from the memory of the receiving node, and store the context of the paired queue into the cache for use when the receiving engine receives data of the paired queue.

Furthermore, the embodiment of the invention also provides a hardware structure of the communication device. May include at least one processor (e.g., CPU), at least one network interface or other communication interface, memory, and at least one communication bus for enabling communications among the devices. The processor is used to execute executable modules, such as computer programs, stored in the memory. The Memory may comprise a Random Access Memory (RAM) and may also include a non-volatile Memory, such as at least one disk Memory. The communication connection between the system gateway and at least one other network element is realized through at least one network interface (which can be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network and the like can be used.

Referring to fig. 16, in some embodiments, a memory stores program instructions, and the program instructions may be executed by a processor, where the program instructions include a receiving unit 801, a determining unit 802, and a reading unit 803, and specific implementations of each unit may refer to corresponding units disclosed in fig. 12.

Referring to fig. 17, in some embodiments, a memory stores program instructions, and the program instructions may be executed by a processor, where the program instructions include a receiving unit 1001, a determining unit 1002, and a reading unit 1003, and specific implementations of each unit may refer to corresponding units disclosed in fig. 14.

Aspects of the invention may be described in the general context of computer-executable instructions, such as program elements, being executed by a computer. Generally, program elements include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The inventive arrangements may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program elements may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above detailed description of the embodiments of the present invention, and the detailed description of the embodiments of the present invention used herein, is merely intended to facilitate the understanding of the methods and apparatuses of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A queue-based communication method, characterized in that, the adapter connected to the sending node includes an arbiter, a prefetcher, a cache memory and a sending engine, and the method includes:

The prefetcher receives the ID of the queue to be processed sent by the arbiter, and the ID of the queue to be processed is obtained by the arbitrator when acquiring the ID of the current processing queue;

The prefetcher judges whether the context of the queue to be processed is saved in the cache according to the received identity of the queue to be processed, and if not saved, reads the context from the memory of the sending node The context of the queue to be processed is stored in the cache memory for use when the sending engine sends the data of the queue to be processed.

2. The method according to claim 1, characterized in that the method further comprises:

After reading the context of the queue to be processed from the memory of the sending node,

The prefetcher extracts the ID of the paired queue corresponding to the queue to be processed from the context of the queue to be processed, and sends the ID of the paired queue to the receiving node through the sending engine.

3. A communication method based on a queue, characterized in that the adapter connected to the receiving node comprises a prefetcher, a cache memory and a receiving engine, and the method comprises:

The prefetcher receives the ID of the paired queue of the waiting queue at the receiving node forwarded by the receiving engine;

When the receiving engine is processing the current processing queue, the prefetcher judges whether the context of the pairing queue is saved in the cache memory according to the received identity of the pairing queue, and if not saved, from The context of the pairing queue is read from the memory of the receiving node and stored in the cache memory for use by the receiving engine when receiving the data of the pairing queue.

4. The method according to claim 3, wherein if the receiving node receives the identifications of the paired queues sent by at least two sending nodes, then

The prefetcher receives the identity of the paired queue forwarded by the receiving engine, including:

The prefetcher receives the identities of at least two paired queues, and saves the identities of the at least two paired queues in the waiting queue in order of receiving.

5. method according to claim 4, is characterized in that, described method also comprises:

The prefetcher sequentially extracts a preset number of matching queues from the waiting queue;

The prefetcher judges whether the context of the pairing queue is saved in the cache memory according to the received identity of the pairing queue, including:

The prefetcher judges whether the extracted contexts of the preset number of pairing queues are stored in the cache according to the received identifiers of the pairing queues.

6. A queue-based communication device, characterized in that the device comprises:

The receiving unit is configured to receive the identity mark of the queue to be processed sent by the arbitrator, and the identity mark of the queue to be processed is obtained by the arbitrator when obtaining the identity mark of the current processing queue;

A judging unit, configured to judge whether the context of the queue to be processed is saved in the cache memory according to the received identity of the queue to be processed;

a reading unit, configured to read the context of the queue to be processed from the memory of the sending node when the context of the queue to be processed is not saved in the cache memory, and save it in the cache memory, It is used when the sending engine sends the data in the queue to be processed.

7. The device according to claim 6, further comprising:

An extracting unit, configured to extract the identity of the paired queue corresponding to the queue to be processed from the context of the queue to be processed read by the reading unit;

The sending unit is configured to send the identity of the paired queue extracted by the extracting unit to the receiving node through the sending engine.

8. A queue-based communication device, characterized in that the device comprises:

The receiving unit is used to receive the identification of the paired queue of the waiting queue at the receiving node forwarded by the receiving engine;

A judging unit, configured to judge whether the context of the pairing queue is saved in the cache memory according to the received identity of the pairing queue when the receiving engine processes the current processing queue;

a reading unit, configured to read the context of the paired queue from the memory of the receiving node when the context of the paired queue is not saved in the cache memory, and save it in the cache memory, It is used when the receiving engine receives the data of the pairing queue.

9. The device according to claim 8, wherein if the receiving node receives the identifiers of the paired queues sent by at least two sending nodes, then

The receiving unit is specifically configured to receive the identities of at least two paired queues, and store the identities of the at least two paired queues in the waiting queue in order of receiving.

10. The device according to claim 9, further comprising:

An extracting unit, configured to sequentially extract a preset number of matching queues from the waiting queue;

The judging unit is specifically configured to judge whether the extracted contexts of the preset number of paired queues are stored in the cache according to the received identifiers of the paired queues.