CN119166037B - Methods to improve storage performance - Google Patents
Methods to improve storage performance
- Publication number
- CN119166037B (application CN202311377841.XA)
- Authority
- CN
- China
- Prior art keywords
- logical
- cluster
- access request
- memory
- clusters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method of improving memory performance and belongs to the field of data storage. The method identifies whether an access request to a memory falls within a single logical cluster. If the access request does not span different logical clusters, the operation is performed on the memory according to the original access request. Otherwise, if the access request spans different logical clusters, the logical clusters involved in the request are combined into an elastic logical cluster whose logical position is variable, and the content in the request that is repeated for the different logical clusters is consolidated into a single homogeneous request, so that the operation is performed on the memory according to the reformed access request. A typical advantage is the ability to compress the data stream generated by a storage operation during the construction, transmission and parsing of the request, saving bandwidth and significantly improving the data storage performance of the memory.
Description
Technical Field
The invention relates to the technical field of data storage, and in particular to a method for improving storage performance in a storage system that uses mechanical hard disks or solid-state drives as its storage media.
Background
As storage systems have evolved, hardware interface technologies and communication protocols have been updated. Early mechanical hard disks and solid-state disks did not expose the limitations of low-bandwidth interfaces, because the intrinsic properties of the internal physical storage medium, such as small capacity and the narrow data width used by a single read or write, were themselves the bottleneck. As storage media have improved, hard-disk input/output (IO) bandwidth has grown by factors of several to several hundred. Storage operations designed for low-bandwidth links (for example, fragmented instructions and partially repeated payload content) now severely restrict the improvement of storage performance in high-bandwidth scenarios.
Put plainly, even when storage operations are optimized, traditional optimizations such as prioritization neither show a clear benefit nor offer an adaptation path that matches the new interfaces. Typically, the limited bandwidth still forces the read/write data, the corresponding optimization operations and other channel information to compete for resources, so the performance gain is negligible. For example, in the single-queue mode used for data interaction between a host and a mechanical or solid-state disk, even though data can be read concurrently from multiple different locations of the storage medium, the single queue itself becomes the bottleneck for concurrency.
In the communication between a host and a hard disk in a storage system, how to compress unnecessary communication data and reduce the various kinds of interaction data, so as to relieve interface pressure, reduce communication latency and keep software execution smooth, while still writing the data to be stored into the memory accurately and reading data from it accurately, is an open problem for storage systems. Solving it would bring a large performance improvement. The development of communication protocols leaves room for such optimization of storage operations: the IO bandwidth of protocols such as PCIe/NVMe has increased at an unprecedented rate, so there is considerable room for improving the storage operations between a host and a hard disk.
Disclosure of Invention
The present application relates to a method for improving storage performance. The method mainly comprises: identifying whether an access request to a memory falls within a single logical cluster; if the access request does not span different logical clusters, performing the operation on the memory according to the original access request; otherwise, if the access request spans different logical clusters, combining the different logical clusters involved in the access request into an elastic logical cluster whose logical position is variable, and consolidating the content in the access request that is repeated for the different logical clusters into a single homogeneous request, so that the operation is performed on the memory according to the reformed access request.
In the above method, the operation class of the access request to the memory includes at least a data read operation, a data write operation or a data erase operation.
In the above method, the memory includes a flash-based solid-state disk, which maps the logical address (address information) carried by the original access request or by the reformed access request to a physical address inside the memory.
In the above method, the number of logical clusters contained in the elastic logical cluster is a positive integer multiple of a single logical cluster, and the change of logical position always covers one or more whole logical clusters at a time.
In the above method, for a logical cluster that is covered by the elastic logical cluster but not referred to by the current access request, the same operation as the current access request is performed on the information corresponding to its logical address, and the execution time of that operation is kept synchronized with the execution time of the current access request.
In the above method, if the access request spans different logical clusters, the homogeneous request for the different logical clusters includes at least a read request, a write request or an erase request (for example, the homogeneous request is the one produced by the reforming step).
In the above method, within the elastic logical cluster, the addresses of the different logical clusters involved in the access request are either contiguous or separated by logical clusters not involved in the access request.
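As a non-authoritative illustration of the decision these paragraphs describe, the following Python sketch shows one way the identification and reforming step could be organized; the function and field names (handle_access_request, BLOCKS_PER_CLUSTER, and so on) are assumptions made for the example and are not taken from the patent.

```python
# Non-authoritative sketch of the overall decision described by the method:
# a request confined to one logical cluster is executed as-is; a request that
# spans clusters is executed through an elastic cluster and a reformed,
# single-content request. The helper names are illustrative only.

BLOCKS_PER_CLUSTER = 8

def cluster_of(block: int) -> int:
    return block // BLOCKS_PER_CLUSTER

def handle_access_request(op: str, first_block: int, last_block: int):
    first_cluster = cluster_of(first_block)
    last_cluster = cluster_of(last_block)

    if first_cluster == last_cluster:
        # Falls within a single logical cluster: operate per the original request.
        return {"mode": "original", "op": op, "blocks": (first_block, last_block)}

    # Spans different logical clusters: combine the whole clusters involved into
    # an elastic logical cluster and keep the repeated request content only once.
    elastic = list(range(first_cluster, last_cluster + 1))
    return {
        "mode": "reformed",
        "op": op,                          # repeated operation type kept once
        "elastic_clusters": elastic,       # whole clusters, variable position
        "blocks": (first_block, last_block),
    }

print(handle_access_request("read", 0, 7))    # single cluster -> original request
print(handle_access_request("read", 4, 11))   # spans two clusters -> reformed request
```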
In the above method, when the memory receives a plurality of consecutive access requests, it first determines whether the target minimum command execution units of those access requests are the same and determines the operation class of each request;
a series of access requests with the same minimum command execution unit and the same operation class are then executed synchronously, in one pass, according to the physical addresses to which their respective logical addresses map and according to their operation class, replacing sequential execution of the multiple access requests by the memory.
In the above method, when the memory receives a plurality of consecutive access requests, it first determines whether the target minimum command execution units of those access requests are the same and determines the operation class of each request;
a series of access requests with the same minimum command execution unit but different operation classes are executed sequentially, according to the physical addresses to which their respective logical addresses map and according to their operation classes; or
a series of access requests with different minimum command execution units, whether their operation classes are the same or different, are executed synchronously according to the physical addresses to which their respective logical addresses map and according to their operation classes.
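The scheduling rules above can likewise be illustrated with a small, hedged sketch. It assumes each queued request already records its target minimum command execution unit (for example a LUN), its operation class and the physical address its logical address maps to; execute_batch and execute_one are placeholders for the memory's real dispatch paths.

```python
# Hedged sketch of the scheduling rules stated above. Only the grouping logic
# is illustrated; the request fields and dispatch helpers are assumptions.

from itertools import groupby

def execute_batch(requests):
    print("one synchronous pass:", [(r["op"], hex(r["phys_addr"])) for r in requests])

def execute_one(request):
    print("sequential step:", (request["op"], hex(request["phys_addr"])))

def schedule(consecutive_requests):
    # Group consecutive requests by their target minimum command execution unit;
    # requests aimed at different units can proceed synchronously on their own units.
    for unit, group in groupby(consecutive_requests, key=lambda r: r["unit"]):
        group = list(group)
        if len({r["op"] for r in group}) == 1:
            execute_batch(group)        # same unit, same op class: one pass
        else:
            for r in group:             # same unit, different op classes: in order
                execute_one(r)

queue = [
    {"unit": "LUN0", "op": "read",  "phys_addr": 0x1000},
    {"unit": "LUN0", "op": "read",  "phys_addr": 0x1040},
    {"unit": "LUN1", "op": "write", "phys_addr": 0x8000},
    {"unit": "LUN1", "op": "erase", "phys_addr": 0x8000},
]
schedule(queue)
```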
The present application relates to a method for improving storage performance, characterized by comprising the following steps:
identifying whether the access request falls within a single logical cluster of the memory; if the access request spans different logical clusters, combining the different logical clusters involved in the access request into an elastic logical cluster whose logical position is variable, and consolidating the content in the access request that is repeated for the different logical clusters into a single homogeneous request, so that the operation is performed on the memory according to the reformed access request;
when the memory receives a series of consecutive access requests, executing synchronously, in one pass, a series of access requests with the same minimum command execution unit and the same operation class, according to the physical addresses to which their respective logical addresses map and according to their operation class, replacing sequential execution of the multiple access requests by the memory.
The present application also relates to a method for improving storage performance, characterized by comprising the following steps:
identifying whether the access request falls within a single logical cluster of the memory; if the access request spans different logical clusters, combining the different logical clusters involved in the access request into an elastic logical cluster whose logical position is variable, and consolidating the content in the access request that is repeated for the different logical clusters into a single homogeneous request, so that the operation is performed on the memory according to the reformed access request;
when the memory receives a plurality of consecutive access requests, first determining whether the target minimum command execution units of those access requests are the same and determining the operation class of each request;
executing sequentially a series of access requests with the same minimum command execution unit but different operation classes, according to the physical addresses to which their respective logical addresses map and according to their operation classes; or
executing synchronously a series of access requests with different minimum command execution units, whether their operation classes are the same or different, according to the physical addresses to which their respective logical addresses map and according to their operation classes.
The present application also relates to a method for improving storage performance, characterized by comprising the following steps:
identifying whether an access request to the memory falls within a single logical cluster, wherein if the access request does not span different logical clusters, the operation is performed on the memory according to the original access request;
when the memory receives a plurality of consecutive access requests, first determining whether the target minimum command execution units of those access requests are the same and determining the operation class of each request;
executing synchronously, in one pass, a series of access requests with the same minimum command execution unit and the same operation class, according to the physical addresses to which their respective logical addresses map and according to their operation class, replacing sequential execution of the multiple access requests by the memory.
The present application also relates to a method for improving storage performance, characterized by comprising the following steps:
identifying whether an access request to the memory falls within a single logical cluster, wherein if the access request does not span different logical clusters, the operation is performed on the memory according to the original access request;
when the memory receives a plurality of consecutive access requests, first determining whether the target minimum command execution units of those access requests are the same and determining the operation class of each request;
executing sequentially a series of access requests with the same minimum command execution unit but different operation classes, according to the physical addresses to which their respective logical addresses map and according to their operation classes; or
executing synchronously a series of access requests with different minimum command execution units, whether their operation classes are the same or different, according to the physical addresses to which their respective logical addresses map and according to their operation classes.
In electronic products, a conventional mechanical hard disk (HDD) contains mechanical components: a head must be moved to a target position over a rapidly rotating platter to write and read, so a great deal of time is spent on inefficient mechanical motion. A solid-state disk (SSD), whose read speed far exceeds that of a mechanical hard disk, does not rely on mechanical motion and is more resistant to shock and dropping, because it is built from an array of solid-state memory chips and replaces the rotating platter with integrated circuits.
Although solid-state disks have many advantages, such as extremely high storage density, that density also means the memory granules or memory chips need more word lines and bit lines to drive the transistors that store the bit data, and high-density transistor arrays complicate memory management. Typically, the communication instructions between the host and the hard disk contain a large amount of duplicate content, and how to deal with that duplication is one of the troublesome difficulties. If unnecessary operation data can be compressed and the interaction data simplified, the interface pressure is reduced, communication latency drops, and data access in the storage system becomes smooth and fast; this is one of the objects achievable by the scheme described herein. Working with today's mainstream high-speed communication protocols, memory performance (for example, storage operation performance) is greatly improved. A method of improving memory performance may therefore also be referred to herein as a method of improving the data storage operation performance of a memory.
An advantage of the method is that during storage operations, even when some requests are repetitive and the construction, transmission and parsing of the storage operation requests overlap, unnecessary storage-operation information can be reduced to the greatest extent and bandwidth saved, while the safety and reliability of the data are guaranteed.
Drawings
So that the manner in which the above recited objects, features and advantages of the present application can be understood in detail, a more particular description of the application, briefly summarized above, may be had by reference to the appended drawings.
FIG. 1 is a diagram of a host and a solid-state disk, organized into physical blocks, pages and sectors, exchanging data.
FIG. 2 is a diagram illustrating how it is identified whether an access request to the memory falls within a single logical cluster.
FIG. 3 is a diagram showing that a large number of repeated data segments exist in the interaction between the host and the hard disk and cause blocking.
FIG. 4 is a diagram of a storage operation that handles data construction, transfer and parsing and generates redundant data.
FIG. 5 is a diagram of an access request that spans different logical clusters, which are combined into one elastic logical cluster.
FIG. 6 is a diagram of access requests that may or may not span different logical clusters.
FIG. 7 is a diagram showing that the storage operation produces little or no redundant data and takes up almost no bandwidth.
Detailed Description
The technical solutions disclosed in the present application will be described clearly and completely in connection with specific embodiments. The described embodiments are only examples used to illustrate the present application and are not all possible embodiments; on the basis of these embodiments, any solution obtained by those skilled in the art without inventive effort falls within the scope of protection of the present application.
Referring to FIG. 1, one key parameter of an SSD is its read speed, and a conventional SSD is typically designed to communicate with a HOST device through a Serial Advanced Technology Attachment (SATA) interface. As the access speed of the flash memory chips inside solid-state drives keeps increasing, thanks to improved chips and hardware, the interface technology becomes the lever for further increasing the data transfer speed between the solid-state drive and a host or similar device. This is critical to the improvement of data storage performance referred to herein.
Referring to FIG. 1, the SATA interface is defined by the Serial ATA Working Group. Its main function is data transfer between the mainboard and data storage devices (such as hard disks and optical drives); it is named for its serial transmission of data, has a simple structure, supports hot plugging, and belongs to the family of computer buses. The clock signal embedded in the SATA bus gives it stronger error-correction capability than older interfaces: it can check transmitted instructions and data and automatically correct errors when they are found, which mainly improves the reliability of data transmission.
Referring to FIG. 1, the PCIe interface is a high-speed serial computer expansion bus standard, designed primarily to replace the conventional PCI bus. PCI-Express (Peripheral Component Interconnect Express) uses individual serial links in a point-to-point topology to connect each device to the root (host) system. The PCI bus, by contrast, is based on a shared-bus topology: it can be arbitrated among multiple hosts but is limited to one host at a time, and its clock is limited to the slowest peripheral on the bus. PCIe links support full-duplex communication between any two endpoints and concurrent access across multiple endpoints without such inherent limitations.
Referring to FIG. 1, the NVMe interface specification, NVM Express (NVMe), is the Non-Volatile Memory host controller interface specification. It is a bus transfer protocol specification defined at the device logical-interface level (corresponding, for example, to the application layer of a communication protocol) for accessing non-volatile storage media such as flash memory attached over a PCI-Express (PCIe) bus, although in theory it does not necessarily require the PCIe bus protocol. The NVMe specification provides a low-latency, internally concurrent native interface for flash-based storage devices, and native storage-concurrency support for mainstream processors, computer platforms and applications, so that host hardware and software can fully exploit the parallel storage capability of solid-state storage devices. Relying on the PCIe bus, NVMe devices can be mounted in various physical slots or on related hardware that supports PCIe.
Referring to FIG. 1, on a computer or electronic device, data is stored in the memory in the form of files, and the target data is typically encoded as ASCII or binary. A driver function matched to the storage medium, such as a flash memory chip, makes it convenient to read and write data on the memory: the target content can be converted into ASCII codes, held in an array, written to a specified address of the flash memory chip, read back from that address when needed, and then interpreted in ASCII format.
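As a rough illustration of the ASCII store-and-read flow just described, the sketch below simulates the medium with a dictionary; flash_write and flash_read are hypothetical driver hooks standing in for the medium-specific driver functions the text mentions.

```python
# Minimal sketch of the ASCII store/load flow described above.
# flash_write() and flash_read() are hypothetical driver hooks.

FLASH = {}  # stand-in for the flash medium: address -> byte value

def flash_write(addr: int, data: bytes) -> None:
    """Write a byte sequence starting at a flash address (simulated)."""
    for i, b in enumerate(data):
        FLASH[addr + i] = b

def flash_read(addr: int, length: int) -> bytes:
    """Read `length` bytes back from a flash address (simulated)."""
    return bytes(FLASH[addr + i] for i in range(length))

def store_text(addr: int, text: str) -> int:
    """Convert target content to ASCII codes, keep them in an array, write them."""
    codes = text.encode("ascii")          # ASCII representation of the content
    flash_write(addr, codes)
    return len(codes)

def load_text(addr: int, length: int) -> str:
    """Read the data back and interpret it in ASCII format."""
    return flash_read(addr, length).decode("ascii")

n = store_text(0x1000, "hello, ssd")
assert load_text(0x1000, n) == "hello, ssd"
```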
Referring to FIG. 1, the evolution of interface technology from SATA to PCIe and NVMe has improved data access speed by several times, or even by tens of times or more.
Referring to FIG. 1, the typical theoretical transfer speed of a SATA interface is a few hundred megabytes per second (MB/s).
Referring to FIG. 1, by comparison, the theoretical transfer speed of a PCIe/NVMe interface is several gigabytes per second (GB/s).
Referring to FIG. 1, in an alternative embodiment, unnecessary communication data and the various kinds of interaction data should be compressed to relieve interface pressure, reduce intermediate communication latency, keep software execution smooth, and so on. Preferably, the improvement should work for both high- and low-bandwidth interfaces.
Referring to FIG. 1, a solid-state disk (SSD) typically contains one or more memory granules or memory chips (LUNs); a LUN is essentially the smallest unit that executes a command, and different LUNs may execute different command sequences.
Referring to FIG. 1, a single memory granule or memory chip of an SSD has multiple planes, each equipped with independent data registers and cache registers to optimize flash access speed.
Referring to FIG. 1, a single plane of an SSD contains one or more blocks, and a block typically contains many pages; a manufacturer may, for example, specify that the number of blocks be a multiple of 32.
Referring to FIG. 1, taking one block (Block_0) and another block (Block_63) as examples, a single plane contains far more blocks than illustrated in practical applications.
Referring to FIG. 1, a page of a solid-state disk contains a number of bytes. The size of a page is typically a power of two, not counting the capacity of the spare area. A single page contains several sectors.
Referring to FIG. 1, the spare area and the data area (user data) are common concepts in memories.
Referring to FIG. 1, the interior of the solid-state disk can be explained in more detail as follows. The space occupied by a single memory block generally includes a data area (user data) and a spare area. Taking a single memory block with 64 pages as an example, the data area of the block can be divided into 64 parts, each storing the written data or information of one page, and the spare area (redundancy space) of the block can likewise be divided into 64 parts, each storing the redundant or spare data of one page. The space occupied by a single page likewise includes a data area and a spare area. For example, Block_0 includes pages P0-P63, and so on.
Referring to FIG. 1, taking a single page containing 8 sectors as an example, the data area of the page can also be divided into 8 parts, each storing the data of one sector, and the spare area of the page can be divided into 8 parts, each storing the redundant or spare data of one sector. The information layout of a page can also be adapted; for example, a single page may set aside a dedicated spare or redundancy space that uniformly stores the redundant or spare data of all sectors in the page. Some single pages, such as P3 or P4, are shown evenly divided into 8 sectors Sec1, Sec2, Sec3, ..., Sec8. The solid-state disk maintains the mapping between logical block addresses and physical storage block addresses through a mapping table.
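The block/page/sector layout and the mapping table described above can be pictured with the following sketch; the 64 pages per block and 8 sectors per page come from the example in the text, while the 512-byte sector payload and the sample mapping entries are assumptions for illustration.

```python
# Illustrative sketch of the geometry described above (64 pages per block,
# 8 sectors per page) and of a simple logical-to-physical mapping table.
# Sizes and names other than those quoted in the text are assumptions.

from dataclasses import dataclass, field

PAGES_PER_BLOCK = 64     # Block_0 holds pages P0-P63 in the example
SECTORS_PER_PAGE = 8     # Sec1..Sec8 in the example
SECTOR_BYTES = 512       # assumed sector payload size

@dataclass
class Page:
    # data area split per sector, plus a spare area entry per sector
    data:  list = field(default_factory=lambda: [bytes(SECTOR_BYTES)] * SECTORS_PER_PAGE)
    spare: list = field(default_factory=lambda: [b""] * SECTORS_PER_PAGE)

@dataclass
class Block:
    pages: list = field(default_factory=lambda: [Page() for _ in range(PAGES_PER_BLOCK)])

# Mapping table maintained by the SSD: logical block address -> physical block address.
mapping_table = {0: 63, 1: 0}   # e.g. logical block 0 lives in physical Block_63

def physical_block(lba: int) -> int:
    """Resolve a logical block address through the mapping table."""
    return mapping_table[lba]

print(physical_block(0))  # -> 63
```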
Referring to FIG. 1, in an alternative embodiment, the SSD essentially integrates a controller chip with the memory granules; the controller chip may be called the SSD controller or logic control unit (command & control logic), and different memory manufacturers may name it slightly differently. The logic control unit resides in the memory and includes a mode register and a command decoder. The external HOST inputs a command to the logic control unit via the command bus and the address bus; the command is decoded by the command decoder, and the control parameters are stored in the mode register, from which the logic control unit controls the logical operation. The external HOST also inputs address information through the address bus. This address information assists the logic control unit in its logic control; in addition, the multiplexed address bus works together with internal devices such as the control logic and row address multiplexer, the column address counting latch and the column address decoder, so that the memory cell corresponding to the row and column address in the memory array is selected accurately and the data access operation is performed. The functions of the logic control unit may also vary between storage manufacturers; this is only an example and not a limitation.
Referring to FIG. 1, storing data directly on the storage medium causes great inconvenience: it is hard to record where the valid data is, hard to determine the remaining space on the medium, and unclear in what format the data should be interpreted. If a large, uniform storage space is filled with all kinds of content at random, finding the required document becomes difficult. Imagine operating a computer that only writes content into the storage space without managing it; when the computer later tries to read a certain document, it has to search everywhere through every mounted space.
As can be seen from FIG. 1, such direct storage of data is still acceptable for small-capacity storage media such as EEPROM, but large-capacity devices such as flash memory chips, SD cards or SSDs need an efficient way to manage their stored content; otherwise a large amount of address information, complex and changing with the amount of written data, would have to be memorized alongside the data. Considered from an enterprise-level commercial perspective, this application takes large-capacity SSDs designed for unified or distributed storage as the main research object and storage medium, so massive storage capacity, data reliability and data safety are preconditions that must be considered.
Referring to FIG. 1, the conventional way to manage the contents of a storage medium is a file system, an organizational structure established on the storage medium for storing and managing data; it includes, for example, the operating system boot area, directories and files.
Referring to FIG. 1, file system formats include FAT32, NTFS, exFAT and so on; these are the common file systems under Windows operating systems. A storage medium is formatted before the file system is used: the original content is erased, and a file allocation table, directories and the like are created on the medium, so that the file system can record the physical addresses of stored data, the remaining space and other information.
Referring to FIG. 1, when a file system is used, data is basically stored in the form of files.
Referring to FIG. 1, when a new file is written, a file index is created in the directory; it records the physical address where the file is stored, and the data is then stored at that address.
Referring to FIG. 1, when data needs to be read, the index of the file is found in the directory, and the data is read out from the corresponding address according to the index. This further involves a series of auxiliary structures and processes such as logical addresses, cluster sizes and non-contiguous storage.
Referring to FIG. 1, from the file system's participation in managing the stored contents, it is clear that with a file system, accessing data no longer means reading and writing the physical address of the medium directly; instead, the read/write format of the file system must be followed. For example, through logical translation, a complete file may be stored in multiple segments at discrete physical addresses, with a directory or linked list recording the location of the next segment. If the file is large and needs a lot of storage space, the participation of the file system is undoubtedly necessary.
Referring to FIG. 2, in an alternative embodiment, hard disk capacities keep growing, and the early definition of 512 bytes per sector is no longer reasonable; some manufacturers have raised the sector size from 512 bytes to 4096 bytes. As NTFS and similar file systems have become the standard for hard disks, their default allocation unit (cluster) size may also be 4096 bytes. Because clusters correspond to sectors, and the physical partitions of the hard disk should be aligned with the logical partitions used by the computer, it is necessary to design optimizations for the interaction between clusters and the hard disk in order to guarantee read/write efficiency and speed.
Referring to FIG. 2, in an alternative embodiment, if an access request spans different logical clusters, the file system may need to write two physical storage space units when operating on one cluster, even though a single physical storage space unit is in principle sufficient for the file system's data operations and storage needs. If the logical clusters are matched to the physical storage space (such as sectors), the write speed of the hard disk increases and its service life is extended. Conversely, if the logical clusters operated on by the computer are mismatched with the physical storage space, the hard disk slows down, its read/write regions wear unevenly so its service life ends early, and part of the space easily turns into bad areas. A small alignment check is sketched below.
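The following hedged sketch checks cluster/sector alignment in the spirit of this discussion; the 4096-byte sector and cluster sizes are the example values quoted above, and the function is illustrative rather than part of the patent.

```python
# Hedged sketch: checking whether a partition's start offset and a file
# system's cluster size line up with the physical sector size. The constants
# are the example values quoted in the text.

PHYSICAL_SECTOR_BYTES = 4096   # enlarged sector size from the text
CLUSTER_BYTES = 4096           # default NTFS-style allocation unit from the text

def is_aligned(partition_start_byte: int,
               cluster_bytes: int = CLUSTER_BYTES,
               sector_bytes: int = PHYSICAL_SECTOR_BYTES) -> bool:
    """True when every cluster boundary also falls on a physical sector boundary."""
    return (partition_start_byte % sector_bytes == 0
            and cluster_bytes % sector_bytes == 0)

print(is_aligned(1048576))   # 1 MiB-aligned partition -> True
print(is_aligned(512))       # 512-byte offset on 4096-byte sectors -> False
```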
Referring to FIG. 2, in a specific embodiment, one logical cluster corresponds to 8 logical blocks, each defined to hold 512 B of data; the 8 logical blocks are Log1, Log2, ..., Log8, and so on. Note that the listed data sizes and numbers of logical blocks are examples given for convenience of explanation; a real computer is not limited to them.
Referring to FIG. 2, in a specific embodiment, an access request such as read request REAL_D spans different logical clusters, such as logical cluster CLU_1 and logical cluster CLU_2. In this embodiment, logical cluster CLU_1 starts at cluster logical block C0 and ends at cluster logical block C7, 8 blocks in total; logical cluster CLU_2 starts at cluster logical block C8 and ends at cluster logical block C15, 8 blocks in total. The access request therefore spans two physical storage space units, and the hard disk may have to write both units when the file system writes a cluster, or read both units when the file system reads a cluster. A sketch that derives which clusters a request touches follows.
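The sketch below decides whether a block-addressed request stays within a single logical cluster or spans several, using the example geometry above (8 blocks of 512 B per cluster); the helper names are illustrative only.

```python
# Minimal sketch: deciding whether a request falls within a single logical
# cluster or spans several, using the example geometry from the text.

BLOCK_BYTES = 512
BLOCKS_PER_CLUSTER = 8
CLUSTER_BYTES = BLOCK_BYTES * BLOCKS_PER_CLUSTER   # 4096 B per logical cluster

def clusters_touched(start_block: int, block_count: int) -> range:
    """Return the range of logical cluster indices touched by a request
    addressed in cluster logical blocks (C0, C1, ...)."""
    first = start_block // BLOCKS_PER_CLUSTER
    last = (start_block + block_count - 1) // BLOCKS_PER_CLUSTER
    return range(first, last + 1)

def spans_multiple_clusters(start_block: int, block_count: int) -> bool:
    return len(clusters_touched(start_block, block_count)) > 1

# A read over blocks C4..C11 touches CLU_1 and CLU_2:
print(list(clusters_touched(4, 8)))            # -> [0, 1]
print(spans_multiple_clusters(4, 8))           # -> True
# A request confined to C0..C7 stays inside one cluster:
print(spans_multiple_clusters(0, 8))           # -> False
```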
Referring to FIG. 2, in particular embodiments, it is not impermissible for an access request such as read request REAL_D to span different logical clusters; the problem is that during the interaction a control node must be allocated for each physical storage unit, and among control nodes belonging to the same instruction most of the information or digital segments are highly repetitive, such as the operation type of the access request, the logical address information and the physical address information it typically carries. This causes significant waste of computer resources.
Referring to FIG. 2, in a specific embodiment, the first request content of read request REAL_D aimed at logical cluster CLU_1 and the second request content of the same read request aimed at logical cluster CLU_2 have identical parts (there may of course also be parts that differ). The information carried by these two sub-requests thus contains repeated content, which constitutes a redundant amount. In general, the information carried by an access request contains both the object of the operation, i.e., the data itself, and auxiliary access information; the operation object is, for example, the data to be read, the data to be stored, the data to be erased, or other relevant data.
Referring to FIG. 2, in a specific embodiment, the information with which read request REAL_D accesses logical cluster CLU_1 and the information with which the same request accesses logical cluster CLU_2 overlap. As mentioned above, this information generally includes the operation type (read, write, erase, etc.); instruction information such as the master/slave interaction mode, the unidirectional/bidirectional communication mode, and hardware- or software-controlled options; address information including the storage granule (LUN) address and other logical or physical addresses; data validity records; and clock information such as clock polarity, phase or frequency-division factor. Naturally, not every field can be enumerated here. Besides the operation object, i.e., the data itself, an access request therefore carries a large amount of access-auxiliary information. The repeated portions of the information with which REAL_D accesses CLU_1 and CLU_2 include, for example, the operation type, the instruction information, the address information, the identification information, the clock information and other necessary conventions such as the agreed MSB/LSB ordering, the CRC check expression and the data frame length. This auxiliary access information, although not the operation object itself, is an essential component of the access request. An illustrative data structure for such a request is sketched below.
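To make the redundancy concrete, the following sketch models an access request as an operation object plus auxiliary fields and shows that, once the request is split per logical cluster, the auxiliary fields repeat verbatim. The field names are illustrative and are not the patent's on-wire format.

```python
# Illustrative sketch (not the patent's actual format): an access request
# carries the operation object (data) plus auxiliary fields, and when one
# request is split per logical cluster most auxiliary fields repeat.

from dataclasses import dataclass

@dataclass(frozen=True)
class AuxInfo:
    op_type: str          # "read" / "write" / "erase"
    lun_addr: int         # storage granule (LUN) address
    master_slave: str     # interaction mode
    frame_len: int        # data frame length
    crc_poly: int         # CRC check expression
    msb_first: bool       # MSB/LSB ordering convention

@dataclass
class SubRequest:
    cluster: str          # e.g. "CLU_1"
    start_block: int      # e.g. C4
    block_count: int
    aux: AuxInfo          # auxiliary access information

aux = AuxInfo("read", lun_addr=3, master_slave="host-master",
              frame_len=4096, crc_poly=0x1021, msb_first=True)

# REAL_D split across two clusters: the AuxInfo is repeated verbatim.
real_d = [SubRequest("CLU_1", 4, 4, aux), SubRequest("CLU_2", 8, 4, aux)]
print(real_d[0].aux == real_d[1].aux)   # -> True: this is the redundant amount
```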
Referring to FIG. 2, in a particular embodiment, analyzed from the host's perspective, the host processor has to process duplicated access data and duplicated access information because read request REAL_D spans logical clusters CLU_1 and CLU_2, even though the processing done for CLU_1 and the processing done for CLU_2 are filled with essentially the same things. In other words, the computer's memory, such as SRAM or DRAM, holds the same content throughout, which greatly harms the computing and storage performance of the computer system and severely squeezes the interface bandwidth of both. Under the low bandwidth of a SATA interface, the bandwidth occupied by repeatedly transferred content is easy to perceive; under the high bandwidth of a PCIe/NVMe interface, the relation between repeated content and interface bandwidth is often not obvious, yet a large part of the PCIe/NVMe bandwidth is spent carrying the auxiliary information of the access requests.
Referring to FIG. 2, in a particular embodiment, analyzed from the perspective of memory MMRY, the controller or logic control unit (command & control logic) likewise has to process the duplicated storage operations and duplicated access information; in this respect the memory MMRY, though different from the host, is in a similar position.
Referring to FIG. 2, the logic control unit of memory MMRY normally has to decode both the first request content of read request REAL_D aimed at logical cluster CLU_1 and the second request content aimed at logical cluster CLU_2, so the identical parts of the two sub-requests are decoded and translated repeatedly on the memory side. The processing on the memory side is thus also filled with the same contents, which constrains the memory's performance, and the memory side must spend considerable hardware and software resources handling the redundancy hidden inside the access request. Note that the notion of redundancy is an observation made from a human point of view; from the point of view of the host and the memory MMRY, everything, duplicate or not, is simply executed in order according to the predetermined instructions.
Referring to FIG. 3, ideally the read request REAL_D should take a reasonable time REAL_T while a device such as a computer or memory executes it. The TIME axis represents the time taken to execute an event (e.g., an access request such as a data read, data write or data erase operation). Logical cluster CLU_1 starts at cluster logical block C0 and ends at C7 and forms one partition block; logical cluster CLU_2 starts at C8 and ends at C15 and forms another partition block. Because of the partitioning of CLU_1 and CLU_2, the first sub-request of REAL_D in the first physical storage unit takes a certain time, marked WAIT_T1 in the figure, and the second sub-request in the second physical storage unit takes another time, marked WAIT_T2. If an access request such as REAL_D spans more physical storage space units, it takes even more time.
Referring to FIG. 3, the ideal time REAL_T for the host to execute read request REAL_D, and the total time of the execution event, are severely delayed, because the main content of the information must go through data construction, transfer and parsing sequentially when the event is executed. The processor and the memory are made to handle the same things repeatedly, creating a blocking situation, even though the computer or memory is faithfully executing its own program. Seen from the task side there appears to be no blocking, and seen from the computer or memory executing the task there is also no blocking: the program meets no resistance and no hardware obstacle during execution and never stalls. The troublesome problem is precisely that the blocking is hidden.
Referring to FIG. 4, inside the memory SRAM or DRAM there is a data stream STREAM1 representing the throughput of executing a first event (assume STREAM1 is the request information that must be set up for a first independent logical cluster whose data is being operated on), a data stream STREAM2 representing the throughput of executing a second event (assume STREAM2 is the request information for a second independent logical cluster), and a data stream STREAM3 representing the throughput of executing a third event (assume STREAM3 is the request information for a third independent logical cluster). Further data streams are not shown in the figure.
Referring to FIG. 4, the first, second and third independent logical clusters (or more of them) are grouped into different physical storage space units, so when the file system reads or writes a certain cluster, the hard disk may correspondingly read or write three or more physical storage space units.
Referring to FIG. 4, the data streams STREAM1-STREAM3 carry normal data, that is, the read or stored data itself, the object of the operation, which produces the normal caches Buff0-Buff2. STREAM1-STREAM3 also contain a large amount of duplicated content, which produces the redundant buffers OV1/OV2 and so on. Assume OV1 is the duplicated content shared by STREAM1 and STREAM2, such as the same operation type (a read or a write), and OV2 is the duplicated content shared by STREAM2 and STREAM3, such as the same storage granule (LUN) address or the same instruction information. Even simple operations thus incur a large memory overhead, which is intolerable in many storage scenarios; typically, fields such as artificial neural networks involve massive matrix operations, where the memory overhead directly degrades the running speed of the whole system.
Referring to FIG. 5, in the illustrated example, logical cluster CLU_1 starts at cluster logical block C0 and ends at C7, 8 blocks in total, and logical cluster CLU_2 starts at C8 and ends at C15, 8 blocks in total.
Referring to FIG. 5, in the illustrated example, it is identified whether an access request, e.g. REAL_D0, to the memory falls within a single logical cluster: REAL_D0 falls into cluster logical blocks C4-C7 of logical cluster CLU_1 and is also found to fall into cluster logical blocks C8-C11 of logical cluster CLU_2. The access request therefore spans the different logical clusters CLU_1 and CLU_2 simultaneously.
Referring to FIG. 5, in an alternative embodiment, if an access request spans different logical clusters, the different logical clusters involved in the access request are combined into one elastic logical cluster whose logical position is variable. The request REAL_D0 spans the different logical clusters CLU_1 and CLU_2, so the logical clusters involved in REAL_D0 must be combined into one elastic logical cluster CLU_PR0 with a variable logical position, which includes logical cluster CLU_1, at positions C4-C7, and logical cluster CLU_2, at positions C8-C15. Note that the cluster logical blocks C4-C7 and C8-C11 may not be combined into the elastic logical cluster CLU_PR0 individually; only the whole logical clusters (e.g., CLU_1 and CLU_2) that those block ranges represent can be incorporated into CLU_PR0.
Referring to FIG. 5, in an alternative embodiment, the content inside access request REAL_D0 that is aimed at different logical clusters (e.g., CLU_1 and CLU_2) but repeated is reformed into a single homogeneous request, so that the operation is performed on the memory according to the reformed access request.
Referring to FIG. 5, in an alternative embodiment, if the repeated content inside access request REAL_D0 for the different logical clusters (e.g., CLU_1 and CLU_2) involves a read operation, where the data structure of the access request originally recorded a first piece of read-operation information based on CLU_1 and a second piece of read-operation information based on CLU_2, the first and second read-operation information are merged and reformed into a single-content read request. If the repeated content involves a write operation, where the data structure originally recorded a first piece of write-operation information based on CLU_1 and a second piece of write-operation information based on CLU_2, the first and second write-operation information are merged and reformed into a single-content write request. Likewise, if the repeated content contains a LUN address, where the data structure originally recorded first LUN information based on CLU_1 and second LUN information based on CLU_2, the first and second address information are merged and reformed into a single-content request for the same LUN address.
Referring to FIG. 5, regarding the elastic logical cluster CLU_PR0 with a variable logical position: if the request REAL_D0 spans the different logical clusters CLU_1 and CLU_2, for example falling into cluster logical blocks C0-C3 and C4-C7 of logical cluster CLU_1 and at the same time into cluster logical blocks C8-C11 of logical cluster CLU_2, then the different logical clusters involved in REAL_D0 must be combined into one elastic logical cluster CLU_PR0 with a variable logical position, which includes logical cluster CLU_1 with its position covering C0-C7 and logical cluster CLU_2 at its previous position C8-C15. Note again that the cluster logical blocks C0-C7 and C8-C11 may not be combined into the elastic logical cluster CLU_PR0 individually; only the whole logical clusters (e.g., CLU_1 and CLU_2) represented by those blocks can be incorporated into CLU_PR0. A sketch of this combine-and-reform step follows.
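One possible shape of this combine-and-reform step is sketched below: whole clusters are merged into an elastic cluster and the per-cluster fields that repeat are kept only once. The dictionaries and keys are assumptions made for the example, not the patent's data structures.

```python
# Hedged sketch of the combine-and-reform step: whole logical clusters touched
# by a request are merged into one "elastic" cluster, and the auxiliary fields
# that repeat per cluster are collapsed into a single homogeneous request.

BLOCKS_PER_CLUSTER = 8

def reform_request(sub_requests):
    """sub_requests: list of per-cluster dicts, e.g.
       {"cluster": 0, "blocks": (4, 7), "op": "read", "lun": 3, "frame_len": 4096}"""
    clusters = sorted({r["cluster"] for r in sub_requests})

    # Elastic cluster: always whole clusters, so it covers a multiple of 8 blocks.
    elastic = {
        "clusters": clusters,
        "first_block": clusters[0] * BLOCKS_PER_CLUSTER,
        "last_block": clusters[-1] * BLOCKS_PER_CLUSTER + BLOCKS_PER_CLUSTER - 1,
    }

    # Repeated fields (operation type, LUN address, frame length, ...) are kept once.
    shared_keys = ("op", "lun", "frame_len")
    reformed = {k: sub_requests[0][k] for k in shared_keys
                if all(r[k] == sub_requests[0][k] for r in sub_requests)}
    # Per-cluster block ranges stay, since they genuinely differ.
    reformed["ranges"] = [r["blocks"] for r in sub_requests]
    return elastic, reformed

real_d0 = [
    {"cluster": 0, "blocks": (4, 7),  "op": "read", "lun": 3, "frame_len": 4096},
    {"cluster": 1, "blocks": (8, 11), "op": "read", "lun": 3, "frame_len": 4096},
]
print(reform_request(real_d0))
# elastic cluster covers blocks C0-C15; op/lun/frame_len each appear only once.
```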
Referring to FIG. 6, in the illustrated example, it is identified whether an access request, e.g. REAL_D1, to the memory falls within a single logical cluster: REAL_D1 falls into cluster logical blocks C0-C7 of logical cluster CLU_1 and does not fall into any cluster logical block of another logical cluster. The access request therefore does not span different logical clusters, so the operation is performed on the memory according to the original access request REAL_D1. The operation class of the original access request includes at least a data read, data write or data erase operation. Put differently, since REAL_D1 does not cross different logical clusters, the logical cluster involved in the request can be regarded as a fixed logical cluster CLU_PR1; that is, the current fixed logical cluster CLU_PR1 is simply the logical cluster involved in REAL_D1, i.e., the cluster itself.
Referring to FIG. 6, in the illustrated example, access request REAL_D1 and access request REAL_D2 coexist, meaning there are multiple accesses between the computer and the memory; the request contents of REAL_D1 and REAL_D2 may be the same or different, e.g., one a read operation and the other a write operation, or vice versa.
Referring to FIG. 6, in the illustrated example, logical cluster CLU_2 starts at cluster logical block C8 and ends at C15, 8 blocks in total, and logical cluster CLU_3 starts at C16 and ends at C23, 8 blocks in total.
Referring to FIG. 6, in the illustrated example, it is identified whether an access request, e.g. REAL_D2, to the memory falls within a single logical cluster: REAL_D2 falls into cluster logical blocks C8-C11 of logical cluster CLU_2 and also falls into cluster logical blocks C16-C23 of logical cluster CLU_3. The access request therefore spans the different logical clusters CLU_2 and CLU_3 simultaneously, but does not involve CLU_1.
Referring to FIG. 6, in an alternative embodiment, if an access request spans different logical clusters, the different logical clusters involved in the access request are combined into one elastic logical cluster with a variable logical position. The request REAL_D2 spans the different logical clusters CLU_2 and CLU_3, so the logical clusters involved in REAL_D2 must be combined into one elastic logical cluster CLU_PR2 with a variable logical position, which includes logical cluster CLU_2, at positions C8-C11, and logical cluster CLU_3, at positions C16-C23. The cluster logical blocks C8-C11 and C16-C23 cannot be combined into the elastic logical cluster CLU_PR2 individually; the whole logical clusters (e.g., CLU_2 and CLU_3) represented by those blocks must be incorporated into CLU_PR2 as a whole.
Referring to FIG. 6, regarding the elastic logical cluster CLU_PR2 with a variable logical position: if the request REAL_D2 spans the different logical clusters CLU_2 and CLU_3, for example falling into cluster logical blocks C8-C11 and C12-C15 of logical cluster CLU_2 and at the same time into cluster logical blocks C16-C23 of logical cluster CLU_3, then the different logical clusters involved in REAL_D2 must be combined into one elastic logical cluster CLU_PR2 with a variable logical position, which includes logical cluster CLU_2 with its position covering C8-C15 and logical cluster CLU_3 at its previous position C16-C23.
Referring to FIG. 6, in an alternative embodiment, the content inside access request REAL_D2 that is aimed at different logical clusters (e.g., CLU_2 and CLU_3) but repeated is reformed into a single homogeneous request, so that the operation is performed on the memory according to the reformed access request.
Referring to FIG. 6, in an alternative embodiment, if the repeated content inside access request REAL_D2 for the different logical clusters (e.g., CLU_2 and CLU_3) includes, for example, a mode instruction, where the data structure of the access request originally recorded a first master/slave mode based on CLU_2 and a second master/slave mode based on CLU_3, the first and second master/slave information are merged and reformed into a single-content master/slave request. Similarly, if the repeated content includes, for example, the data frame length, where the data structure originally recorded a first data frame length based on CLU_2 and a second data frame length based on CLU_3, the first and second data-format information are merged and reformed into a single-content data-frame-length request. A sketch of merging such repeated fields follows.
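A minimal sketch of merging one such repeated field is given below; it simply verifies that the per-cluster recordings really are identical before collapsing them, since only genuinely repeated content may be reformed into a single-content field. The function name is illustrative.

```python
# Hedged sketch: merging one repeated auxiliary field (e.g. the master/slave
# mode or the data frame length) recorded once per logical cluster into a
# single value for the reformed request.

def merge_repeated_field(values, field_name):
    """values: the per-cluster recordings of one field inside the request."""
    first = values[0]
    if all(v == first for v in values):
        return first                      # single-content homogeneous field
    raise ValueError(f"{field_name} differs between clusters; cannot merge")

# REAL_D2 recorded the same mode and frame length for CLU_2 and CLU_3:
print(merge_repeated_field(["host-master", "host-master"], "master/slave mode"))
print(merge_repeated_field([4096, 4096], "data frame length"))
```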
Referring to fig. 7, the memory SRAM or DRAM internally would have the same content expressed as having a data STREAM1 representing the throughput when executing the first event (assuming that STREAM1 is the information of one request that needs to be set by the first independent logical cluster being data-operated) and the same content expressed as having a data STREAM2 representing the throughput when executing the second event (assuming that STREAM2 is the information of one request that needs to be set by the second independent logical cluster being data-operated).
Referring to fig. 4, the data streams STREAM1-STREAM3 carry normal data, such as the data actually read or stored, which is the object of the operation and which produces the normal buffers Buff0-Buff2. STREAM1-STREAM3 also contain a large amount of duplicate content, which produces the redundant buffers OV1, OV2, and so on. Assume that OV1 is the duplicate content shared by STREAM1 and STREAM2, such as the same operation type (a read operation or a write operation), and that OV2 is the duplicate content shared by STREAM2 and STREAM3, such as the same storage grain (LUN) address or the same instruction information. In fig. 7, the redundant buffers are merged and reformed into homogeneous requests with a single copy of the content, and OV1/OV2 appear in the data stream essentially only once.
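The merging of the redundant buffers can likewise be sketched under the assumption that each stream carries a payload plus a small record of request information; the stream contents and the way shared items are detected below are illustrative and do not describe the actual buffer layout:

```python
# Illustrative sketch: each stream's request information is a mapping of field
# name to value; buffer roles (normal vs. OV-style overlap) follow fig. 4/7 loosely.
def build_buffers(streams):
    """Keep each stream's payload in its own normal buffer, but record any
    piece of request information shared by two or more streams only once."""
    normal = {name: s["payload"] for name, s in streams.items()}   # Buff0-Buff2
    seen, overlap = {}, {}
    for name, s in streams.items():
        for field, value in s["info"].items():
            if (field, value) in seen and seen[(field, value)] != name:
                overlap[(field, value)] = "shared"   # single OV-style copy
            else:
                seen[(field, value)] = name
    return normal, overlap

streams = {
    "STREAM1": {"payload": b"...", "info": {"op": "read", "lun": 0}},
    "STREAM2": {"payload": b"...", "info": {"op": "read", "lun": 1}},
    "STREAM3": {"payload": b"...", "info": {"op": "write", "lun": 1}},
}
normal, overlap = build_buffers(streams)
print(overlap)   # {('op', 'read'): 'shared', ('lun', 1): 'shared'} -> OV1, OV2
```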
Referring to FIG. 7, it can be seen that no memory operation significantly increases memory overhead, and the memory saved, wherever it resides, benefits high-speed operation of the whole computer. The internal content that the various hardware components of the computer, including the interface module and the memory module, would otherwise handle under a conventional scheme is substantially compressed and reduced, which greatly improves the computing and storage performance of the computer system and frees the interface bandwidth that would otherwise be preempted by the amount of redundant information. The interface, whether SATA or PCIe/NVMe, no longer consumes significant resources to handle the additional redundant information in these access requests. The controller or logic control unit (command & control logic) associated with the memory MMRY likewise no longer needs to handle repeated memory operations and repeated volumes of access information, thereby significantly improving storage performance. The related data operations of the computer and the memory referred to herein may be considered one possible implementation of in-memory computing.
Referring to FIG. 6, in an alternative embodiment, the types of operation that an access request such as real_D1 or real_D2 performs on the memory include at least a data read operation, a data write operation, or a data erase operation. The memory is, for example, the solid state disk of fig. 1, in particular a flash-based solid state disk. The memory MMRY maps the logical addresses carried by access requests, either in their original form or in their reformed form, to its own internal physical addresses. In the original case, if the access request does not span different logical clusters, the operation is performed on the memory according to the original access request. In the reformed case, if the access request spans different logical clusters, the different logical clusters involved in the access request are combined into an elastic logical cluster with a variable logical position, and the content inside the access request that is repeated for the different logical clusters is reformed into a homogeneous request with a single copy of the content (i.e., the reformed access request), so that the operation is performed on the memory according to the reformed access request.
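A minimal sketch of this dispatch, under the assumption of an 8-block cluster and simple dictionary-shaped requests (both assumptions for illustration only), is:

```python
# Illustrative sketch of the original/reformed decision; all names are assumed.
BLOCKS_PER_CLUSTER = 8

def spans_multiple_clusters(blocks):
    return len({b // BLOCKS_PER_CLUSTER for b in blocks}) > 1

def reform(request):
    """Round the touched clusters up to whole units (the elastic logical
    cluster) and keep only one copy of the per-cluster info, assuming here
    that the per-cluster copies are duplicates of each other."""
    touched = sorted({b // BLOCKS_PER_CLUSTER for b in request["blocks"]})
    elastic = [b for c in touched
               for b in range(c * BLOCKS_PER_CLUSTER, (c + 1) * BLOCKS_PER_CLUSTER)]
    return {"op": request["op"], "blocks": elastic, "info": request["info"][0]}

def handle_access_request(request, execute):
    if spans_multiple_clusters(request["blocks"]):
        request = reform(request)   # operate per the reformed access request
    return execute(request)         # read / write / erase on the memory

# Assuming real_d1 stays inside clu_1 (C0-C7), it runs unchanged,
# while real_d2 (parts of clu_2 plus clu_3) is reformed first.
real_d1 = {"op": "read", "blocks": list(range(0, 8)), "info": [{"lun": 0}]}
real_d2 = {"op": "read",
           "blocks": list(range(8, 12)) + list(range(16, 24)),
           "info": [{"lun": 0}, {"lun": 0}]}
handle_access_request(real_d1, print)
handle_access_request(real_d2, print)
```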
Referring to fig. 6, in an alternative embodiment, for a logical cluster that is covered by an elastic logical cluster but not involved by the current access request, the information corresponding to its logical address undergoes the same operation as the current access request, and the execution time of that operation is kept synchronized with the execution time of the current access request.
Referring to fig. 5, in a specific embodiment, consider a logical cluster such as clu_1 (more precisely, the cluster logic blocks C0-C3 of logical cluster clu_1) that is covered by an elastic logical cluster such as clu_pr0 but is not involved by the current access request such as real_d0. The information corresponding to the logical address of such a logical cluster undergoes the same operation (e.g., read/write) as the current access request real_d0, and the execution time of that operation is kept synchronized with the execution time of the current access request real_d0.
Referring to fig. 5, in a more detailed embodiment, consider the cluster logic blocks C0-C3 that are covered by an elastic logical cluster such as clu_pr0 but are not involved by the current access request such as real_d0 (in detail, cluster logic blocks of logical cluster clu_1 that are covered by the elastic logical cluster but not involved by the current access request). The information corresponding to the logical addresses of the cluster logic blocks C0-C3 of clu_1 can undergo the same operation (e.g., read/write) as the current access request real_d0, and the execution time of that operation is kept synchronized with the execution time of the current access request real_d0. For example, the information corresponding to the cluster logic blocks C0-C3 and the information corresponding to the cluster logic blocks C4-C7 and C8-C11, which are covered by the elastic logical cluster clu_pr0 and are involved by the current access request real_d0, undergo the same operation, with the execution time kept synchronized with the execution of the current access request real_d0 on C4-C7 and C8-C11. The advantage is that the adaptation relation between clusters and sectors can be followed, the physical space partitioning stays aligned with the logical partitioning of the computer, storage space is not wasted, and high-speed data read/write operation is ensured and balanced against the mainstream PCIe/NVMe interface speed. Data operations are spread evenly over all the cluster logic blocks of the logical clusters, rather than always favoring continued data operations on a minority of cluster logic blocks, which would prematurely end the service life of the whole logical clusters in which that minority of cluster logic blocks reside.
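A short sketch of this synchronized execution, with clu_pr0 covering C0-C15 and real_d0 touching C4-C11 as in fig. 5, and with the operation reduced to a per-block callback purely for illustration, is:

```python
# Illustrative sketch: one synchronized pass over every block covered by the
# elastic logical cluster, whether or not the current request touches it.
def execute_on_elastic_cluster(elastic_blocks, requested_blocks, op):
    """Apply the same operation to every cluster logic block covered by the
    elastic logical cluster in one pass; blocks the current request does not
    touch (e.g. C0-C3) are operated together with the requested blocks."""
    requested = set(requested_blocks)
    results = {}
    for block in elastic_blocks:                    # one pass, same timing
        results[block] = (op(block),
                          "requested" if block in requested else "covered-only")
    return results

clu_pr0 = list(range(0, 16))   # clu_1 (C0-C7) + clu_2 (C8-C15) form clu_pr0
real_d0 = list(range(4, 12))   # C4-C11 are the blocks real_d0 actually touches
for block, outcome in execute_on_elastic_cluster(clu_pr0, real_d0,
                                                 lambda b: f"read@C{b}").items():
    print(block, outcome)
```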
Referring to fig. 5, in an alternative embodiment, the total number of logical clusters contained in an elastic logical cluster such as clu_pr0 is a positive integer multiple of a single logical cluster (e.g., 2 times in fig. 5, or 1 to 2 times in fig. 6), and any change in the logical position of the elastic logical cluster is implemented in a manner that covers one or more complete logical clusters at a time. A change in the logical position of the elastic logical cluster is not allowed to cover only half of a logical cluster, nor is it allowed to cover, within a single logical cluster, only some other number of cluster logic blocks (e.g., 1-7) lower than the total number of its cluster logic blocks (e.g., 8).
Referring to fig. 5, in an alternative embodiment, the logical position of the elastic logical cluster clu_pr0 is implemented so as to cover two complete logical clusters, e.g., logical clusters clu_1 and clu_2. A change in the logical position of the elastic logical cluster is not allowed to cover, within the single logical cluster clu_1, only some other number of cluster logic blocks (e.g., the 4 cluster logic blocks C4-C7 of clu_1) lower than the total number of its cluster logic blocks (e.g., 8); likewise, it is not allowed to cover, within the single logical cluster clu_2, only some other number of cluster logic blocks (e.g., the 4 cluster logic blocks C8-C11 of clu_2) lower than the total number of its cluster logic blocks (e.g., 8).
Referring to fig. 5, in an alternative example, if an access request crosses different logical clusters, the different logical clusters involved in the access request are combined into one elastic logical cluster with a variable logical position, the content inside the access request that is repeated for the different logical clusters is reformed into a homogeneous request with a single copy of the content (the homogeneous request for the different logical clusters includes at least a read request, a write request, or an erase request), and the operation is performed on the memory according to the reformed access request.
Referring to fig. 5, in an alternative embodiment, within an elastic logical cluster, the addresses of the different logical clusters involved in an access request are either consecutive or separated by logical clusters that the access request does not involve.
Referring to fig. 5, in an alternative example, the addresses of the different logical clusters (e.g., clu_1 and clu_2) involved in an access request such as real_d0 are consecutive within the elastic logical cluster clu_pr0. In fig. 6, assume that the request real_d2 does not include logical cluster clu_2 but only logical cluster clu_3, and that the requests real_d1 and real_d2 belong to the same operation; then the addresses of clu_1 (involved by real_d1) and clu_3 (involved by real_d2) are separated by the logical cluster clu_2, which is not involved by the access requests real_d1 and real_d2. If the requests real_d1 and real_d2 are the same kind of operation, e.g., both belong to the same read operation or the same write operation, combining them into a single operation is one alternative (the access request may then be denoted real_d1/D2), and keeping them as two separate operations is another alternative (the access requests are then denoted real_d1 and real_d2).
Referring to fig. 5, in an alternative embodiment, within the elastic logical cluster, the addresses of the cluster logic blocks of the different logical clusters involved in the access request are consecutive, or, alternatively, they are separated by logical clusters or cluster logic blocks that the access request does not involve. This embodiment shows that the method for improving storage performance described herein adapts well: the scheme applies both when the addresses of the accessed logical clusters (or of their cluster logic blocks) are consecutive and when they are intermittent. This consideration rests mainly on the fact that the data in a storage environment may, at random, be distributed continuously or discontinuously in the storage space.
Referring to fig. 6, in an alternative example, assume that the request real_d2 includes both logical cluster clu_2 and logical cluster clu_3, and that the requests real_d1 and real_d2 belong to the same operation; then the addresses of the different logical clusters involved in the access request, namely the cluster logic blocks C0-C7, C8-C15, and C16-C23 of clu_1 to clu_3, are consecutive rather than intermittent. This is an alternative example, and the access request in this case is denoted real_d1/D2.
Referring to fig. 6, in an alternative example, assume that the request real_d2 does not include logical cluster clu_2 but only logical cluster clu_3, and that the requests real_d1 and real_d2 belong to the same operation; then the addresses of the cluster logic blocks C0-C7 and C16-C23 of the different logical clusters involved (clu_1 and clu_3) are discontinuous and are separated by a logical cluster, or by cluster logic blocks such as C8-C15, that the access request does not involve. This is also an alternative example, and the access request in this case is denoted real_d1/D2.
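Whether the logical clusters touched by real_d1 and real_d2 are contiguous or separated, and whether the two requests may be treated as the single access request real_d1/D2, can be sketched as follows; the 8-block cluster size and the merge criterion (same operation class) are assumptions for illustration only:

```python
# Illustrative sketch: contiguity test and optional merge of two requests.
BLOCKS_PER_CLUSTER = 8

def touched(blocks):
    return sorted({b // BLOCKS_PER_CLUSTER for b in blocks})

def contiguous(clusters):
    return all(b - a == 1 for a, b in zip(clusters, clusters[1:]))

def maybe_merge(req_a, req_b):
    """Requests of the same operation class may be treated as one access
    request (real_d1/D2); otherwise they stay separate."""
    if req_a["op"] != req_b["op"]:
        return None
    return {"op": req_a["op"], "blocks": sorted(req_a["blocks"] + req_b["blocks"])}

real_d1 = {"op": "read", "blocks": list(range(0, 8))}     # clu_1
real_d2 = {"op": "read", "blocks": list(range(16, 24))}   # clu_3 only
merged = maybe_merge(real_d1, real_d2)
clusters = touched(merged["blocks"])
print(clusters, contiguous(clusters))
# [0, 2] False -> clu_1 and clu_3, separated by the uninvolved clu_2
```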
Referring to FIG. 6, in an alternative embodiment, upon receiving a succession of access requests, the memory MMRY may first determine whether the minimum command execution units, e.g., LUNs, of the access requests (e.g., real_D1 and real_D2) are the same, and determine the operation category of each access request. A series of access requests (such as real_d1 and real_d2) with the same minimum command execution unit and the same operation category are executed synchronously in a single pass, according to the physical addresses to which their respective logical addresses are mapped and according to their operation category; this replaces the memory executing the multiple access requests sequentially (real_d1 and real_d2 are no longer executed one after the other). In this example real_d1 and real_d2 are assumed to access the same LUN and to be of the same category, e.g., both are read operations or both are write operations. real_d1 and real_d2 then perform one operation (e.g., a read operation) synchronously, using the physical addresses to which their respective logical addresses are mapped and their operation category, so the requests are executed synchronously rather than sequentially as in the conventional scheme. The main advantages are stronger robustness of data operations, lower data operation latency, and better balance across the whole storage space of the memory.
Referring to FIG. 6, in an alternative embodiment, upon receiving a succession of access requests, the memory MMRY may first determine whether the minimum command execution units, e.g., LUNs, of the access requests (e.g., real_D1 and real_D2) are the same, and determine the operation category of each access request. A series of access requests (such as real_D1 and real_D2) with the same minimum command execution unit but different operation categories are executed sequentially, according to the physical addresses to which their respective logical addresses are mapped and according to their operation categories. In this example real_d1 and real_d2 are assumed to access the same LUN and to be of different categories, e.g., one is a read operation and the other is a write operation. real_d1 and real_d2 are then performed sequentially by the physical addresses to which their respective logical addresses are mapped and by their operation categories (e.g., the read operation and the write operation, respectively); for example, real_d1 is performed first and real_d2 afterwards. The main advantages are stronger robustness of data operations, lower data operation latency, and better balance across the whole storage space of the memory.
Referring to FIG. 6, in an alternative embodiment, upon receiving a succession of access requests, the memory MMRY may first determine whether the minimum command execution units, e.g., LUNs, of the access requests (e.g., real_D1 and real_D2) are the same, and determine the operation category of each access request. Serial access requests (e.g., real_d1 and real_d2) with different minimum command execution units, whether of the same or of different operation categories, are executed synchronously according to the physical addresses to which their respective logical addresses are mapped and according to their operation categories. In one example, real_d1 and real_d2 access different LUNs (LUN1 and LUN2) and are of the same category, e.g., both are read operations or both are write operations; they then perform the operation (e.g., a read) synchronously using the physical addresses to which their respective logical addresses are mapped and their operation category, so the requests are executed synchronously rather than sequentially as in the conventional scheme. In another example, real_d1 and real_d2 access different LUNs (LUN1 and LUN2) and are of different categories, e.g., one is a write operation and the other a read operation; they still perform their operations (e.g., the write operation and the read operation, respectively) synchronously using the physical addresses to which their respective logical addresses are mapped and their operation categories, so the requests are again executed synchronously rather than sequentially as in the conventional scheme. The main advantages are stronger robustness of data operations, lower data operation latency, and better balance across the whole storage space of the memory.
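The three embodiments above amount to a simple scheduling rule, sketched below on dictionary-shaped requests; the LUN numbers, request fields, and executor callbacks are illustrative assumptions rather than part of the described memory:

```python
# Illustrative sketch of the LUN/category scheduling rule described above.
def schedule(requests, run_sync, run_sequential):
    """Same LUN + same category -> one synchronized pass; same LUN + different
    categories -> sequential; different LUNs -> synchronized regardless."""
    same_lun = len({r["lun"] for r in requests}) == 1
    same_class = len({r["op"] for r in requests}) == 1
    if same_lun and not same_class:
        return run_sequential(requests)   # e.g. real_d1 first, real_d2 after
    return run_sync(requests)             # executed in one synchronized pass

real_d1 = {"lun": 1, "op": "read", "blocks": list(range(0, 8))}
real_d2 = {"lun": 1, "op": "read", "blocks": list(range(16, 24))}
schedule([real_d1, real_d2],
         run_sync=lambda rs: print("sync:", [r["op"] for r in rs]),
         run_sequential=lambda rs: [print("next:", r["op"]) for r in rs])
```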
Referring to fig. 6 in conjunction with fig. 5, a method for improving storage performance mainly includes, for example: identifying at the computer side or the memory side whether an access request falls within the scope of a single logical cluster of the memory; if the access request spans different logical clusters, combining the different logical clusters involved in the access request into one elastic logical cluster with a variable logical position and reforming the content inside the access request that is repeated for the different logical clusters into a homogeneous request with a single copy of the content; and then performing the operation on the memory according to the reformed access request. As described above, when the memory receives a plurality of consecutive access requests, a series of access requests having the same operation category and the same minimum command execution unit perform the corresponding operation synchronously, according to the physical addresses to which their respective logical addresses are mapped and according to their operation category, instead of the memory executing the plurality of access requests sequentially.
Referring to fig. 5, in an alternative embodiment, if an access request spans different logical clusters, the different logical clusters involved in the access request are combined into one elastic logical cluster with a variable logical position, and the content inside the access request that is repeated for the different logical clusters is reformed into a homogeneous request with a single copy of the content, so that the operation is performed on the memory according to the reformed access request. Within the elastic logical cluster, the address ordering of the cluster logic blocks of the different logical clusters is scrambled, and the cluster logic blocks that execute the operation (i.e., that perform the operation on the memory according to the reformed access request) are required to execute the operation sequentially following the scrambled addresses. If, while the operation is being carried out, the data information of any cluster logic block fails the preset verification against the operation, the operation is canceled within the elastic logical cluster until all the cluster logic blocks that were to execute the operation have returned to their original state. Canceling the operation midway is essentially equivalent to the operation failing; the file system then needs to cancel the current operation automatically, and the data is restored to its original state. This also applies in conjunction with fig. 6.
Referring to fig. 5, suppose for example that the access request real_d0 spans the logical clusters clu_1 and clu_2; the logical clusters clu_1 and clu_2 involved in the access request are then combined into an elastic logical cluster clu_pr0 with a variable logical position, the content inside the access request that is repeated for the different logical clusters is reformed into a homogeneous request with a single copy of the content, and the operation is performed on the memory according to the reformed access request. In an alternative embodiment, for the elastic logical cluster clu_pr0, the original address ordering of the cluster logic blocks of the different logical clusters (e.g., clu_1 and clu_2) is first scrambled. Taking a small number of cluster logic blocks, say C0-C9, as an example (the actual cluster logic blocks are not limited to this), the cluster logic blocks that must execute the operation do so sequentially following the scrambled addresses. The scrambled address ordering is, for example, C0, C3, C6, C9, C1, C4, C7, C2, C5, C8, whereas the unscrambled relative ordering would be C0, C1, C2, C3, C4, C5, C6, C7, C8, C9. Note that the scrambling rule is not limited here, but it should avoid a situation in which the same logical cluster, such as clu_1 or clu_2, executes the operation outright with no room left for data recovery. If, while the operation (such as a write operation, an erase operation, or a data overwrite) is being carried out, the data information of any cluster logic block among C0-C9 fails the preset verification against the operation, the operation is canceled within the elastic logical cluster until all cluster logic blocks that were to execute the operation, such as C0, C3, C6, C9, have returned to their original state; that is, the cluster logic blocks that have already executed the operation must still return to their original state, and the cluster logic blocks that have not yet executed it do not proceed. Scrambling the address ordering effectively prevents the sequentially arranged cluster logic blocks of one logical cluster from all executing the operation and losing all of their original information. As for the preset verification, a preset check (for example, requiring an XOR or another check between the digital quantities A and B) is performed between the given operation (e.g., the digital quantity A) and the data information that is the object of the operation (e.g., the digital quantity B); if the data information of any cluster logic block fails the preset verification against the operation, all operations on the cluster logic blocks C0-C9 are canceled. Using an XOR check for the exemplary verification here is only an option, not a limitation.
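A minimal sketch of the scrambled-order execution with preset verification and rollback follows; the stride-based scrambling rule and the XOR check are only examples, consistent with the statement above that neither is limited to a particular form:

```python
# Illustrative sketch: scrambled execution order, per-block check, rollback.
def scrambled_order(blocks, stride=3):
    """For C0-C9 and stride 3 this yields C0, C3, C6, C9, C1, C4, C7, C2, C5, C8."""
    return [blocks[i] for start in range(stride)
                      for i in range(start, len(blocks), stride)]

def execute_with_rollback(blocks, op_value, data, check):
    """Run the operation block by block in scrambled order; if the preset
    verification fails for any block, restore every block already touched."""
    originals = {}
    for b in scrambled_order(blocks):
        if not check(op_value, data[b]):          # preset verification failed
            for prev, old in originals.items():   # cancel: back to original state
                data[prev] = old
            return False
        originals[b] = data[b]
        data[b] = op_value                        # the operation itself (e.g. write)
    return True

blocks = list(range(10))                          # C0-C9 as in the example above
data = {b: b for b in blocks}
ok = execute_with_rollback(blocks, op_value=7, data=data,
                           check=lambda a, b: (a ^ b) != 0)  # illustrative XOR check
print(scrambled_order(blocks))   # [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
print(ok, data)                  # False, and every touched block is restored
```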
Referring to fig. 5, as stated above, if during execution of the operation the verification between the data of any cluster logic block and the operation instruction fails, the operation is canceled. Canceling the operation midway is equivalent to the operation failing; the file system is set to cancel the current operation automatically, and the data is restored to its original state. This is similar to data overwriting: if an overwrite exits midway, it is allowed to be actively canceled, and the overwrite operation is actively guided to cancel itself automatically. The reason is that current technology cannot guarantee that the storage operations between the computer and the memory are always one hundred percent accurate. The operation instructions themselves, the combination of operation instructions and the data they carry, or the intermediate processes of constructing, transferring, and parsing any information (such as operation instructions and operation objects) between different devices can all introduce errors into the storage operations. In the present application, the different logical clusters involved in the access request are combined into an elastic logical cluster with a variable logical position; here a data error in any one cluster logic block of any one logical cluster can corrupt all the data of the whole elastic logical cluster. For example, an error (such as an erroneous erase or write) in the fine cluster logic block Log1 of cluster logic block C7 in fig. 5 can, within a single access request operation, corrupt all the data of the whole elastic logical cluster clu_pr0, and the damage is not limited to C7 itself. This situation typically leaves all data within the memory MMRY with irreversible data-mishandling consequences. Combining the different logical clusters involved in the access request into an elastic logical cluster with a variable logical position, and reforming the repeated content of the access request into a homogeneous request with a single copy of the content so that the operation is performed on the memory according to the reformed access request, obviously brings advantages but also disadvantages. As for the main cause of the disadvantages, compare fig. 2: such disadvantages could be avoided if the access request were not a reformed access request, for example if a first access request operated on clu_1 first and a second access request then operated on clu_2 after the first completed. If clu_1 and clu_2 are not combined into one elastic logical cluster with a variable logical position, the consequence of slice-wide data errors inside the memory does not arise: an error in clu_1 is merely its own error and does not affect clu_2, and an error in clu_2 does not affect clu_1. By contrast, when the operation is performed on the memory according to the reformed access request, an error in clu_1 can affect the larger space from clu_1 to clu_4096 and even the elastic logical cluster clu_pr0, and an error in clu_2 can do the same. Avoiding these disadvantages might forfeit the aforementioned advantages, so a troublesome problem evidently arises here, and how to eliminate it is a question that needs to be solved. The scrambling scheme described above is a viable solution.
Referring to fig. 5, as described above, if the access request spans different logical clusters and the different logical clusters involved in the access request are combined into one elastic logical cluster with a variable logical position, certain disadvantages follow. Reforming the content inside the access request that is repeated for the different logical clusters into a homogeneous request with a single copy of the content, and then performing the operation on the memory according to the reformed access request, carries related disadvantages as well. The adverse event is that when any logical cluster in the elastic logical cluster errs while executing the operation, all logical clusters in the elastic logical cluster are induced to err at the same time, producing irreversible data loss or corruption. Note that treating the different single logical clusters as one combined whole, and executing the reformed access request on them synchronously, is the root cause of this adverse event. The main purpose of the foregoing measures is to prevent this adverse event from occurring.
Referring to fig. 5, according to the foregoing, the measure is equivalent to executing the operation once every few cluster logic blocks while leaving a series of reserved cluster logic blocks unexecuted, until the operation reaches the last cluster logic block, and only then executing the same operation on the reserved cluster logic blocks. It is also a data-operation verification method that disturbs the ordering rule of the cluster logic blocks: the correctness of the data operation is verified not in the original ordering of the logical clusters, but by scrambling the address ordering of the cluster logic blocks of the different logical clusters within the elastic logical cluster and requiring the cluster logic blocks that execute the operation to verify the correctness of the data operation following the scrambled addresses. The advantage is that there is still a chance to cancel a faulty operation once the fault is discovered, and the scheme herein provides multiple verifications for the operation, thereby ensuring its full correctness. This is an advantage that conventional storage operation schemes cannot attain and find difficult to attain. The multiple verifications are manifested, for example, in providing one verification opportunity for each cluster logic block and at least one opportunity for data recovery and rollback for each logical cluster. This is determined by the particular scheme under which the operation is performed on the memory.
Referring to fig. 7, those skilled in the art will understand that, for the method of improving storage performance, all or part of the steps of the above method embodiments may be implemented by hardware driven by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
The foregoing description and drawings set forth exemplary embodiments of the specific structure of the embodiments, and the foregoing invention provides presently preferred embodiments, without being limited to the precise arrangements and instrumentalities shown. Various alterations and modifications will no doubt become apparent to those skilled in the art after having read the above description. Therefore, the appended claims should be construed to cover all such variations and modifications as fall within the true spirit and scope of the invention. Any and all equivalent ranges and contents within the scope of the claims should be considered to be within the intent and scope of the present invention.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311377841.XA CN119166037B (en) | 2023-10-23 | 2023-10-23 | Methods to improve storage performance |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311377841.XA CN119166037B (en) | 2023-10-23 | 2023-10-23 | Methods to improve storage performance |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119166037A CN119166037A (en) | 2024-12-20 |
| CN119166037B true CN119166037B (en) | 2025-11-04 |
Family
ID=93877643
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311377841.XA Active CN119166037B (en) | 2023-10-23 | 2023-10-23 | Methods to improve storage performance |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119166037B (en) |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7395404B2 (en) * | 2004-12-16 | 2008-07-01 | Sandisk Corporation | Cluster auto-alignment for storing addressable data packets in a non-volatile memory array |
| TW201111986A (en) * | 2009-09-29 | 2011-04-01 | Silicon Motion Inc | Memory apparatus and data access method for memories |
| CN103617136B (en) * | 2013-12-04 | 2017-02-01 | 华为技术有限公司 | SCSI drive side and I/O request control method |
| US9772775B2 (en) * | 2015-08-21 | 2017-09-26 | International Business Machines Corporation | Scalable and efficient access to and management of data and resources in a tiered data storage system |
| US10209897B2 (en) * | 2016-12-01 | 2019-02-19 | Toshiba Memory Corporation | Storage device and control method of the same |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119166037A (en) | 2024-12-20 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |