WO2021173874A1 - Système et procédé de compression de données - Google Patents
Système et procédé de compression de données Download PDFInfo
- Publication number
- WO2021173874A1 WO2021173874A1 PCT/US2021/019734 US2021019734W WO2021173874A1 WO 2021173874 A1 WO2021173874 A1 WO 2021173874A1 US 2021019734 W US2021019734 W US 2021019734W WO 2021173874 A1 WO2021173874 A1 WO 2021173874A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- search
- run
- length
- buffer
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/46—Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
- H03M7/4031—Fixed length to variable length coding
- H03M7/4037—Prefix coding
- H03M7/4043—Adaptive prefix coding
- H03M7/4062—Coding table adaptation
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6017—Methods or arrangements to increase the throughput
- H03M7/6029—Pipelining
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/93—Run-length coding
Definitions
- Present embodiments relate to lossless hardware compression and decompression of generic data.
- the present embodiments in one aspect a data compression method comprising: identifying a search buffer size according to a hardware requirement and a desired compression ratio, and breaking up a search buffer to independent searchable blocks such that they are routed on silicon and meet a cycle-time requirement.
- a search controi and collation module comprising a pattern search module, a precode module in communication with the pattern search module, an encode module in communication with the precode module, and encode module in communication with the recode module, and an output buffer in communication with the recode module.
- a ia RFC 1951, a byte stream is replaced by a run-length stream.
- An unencoded byte may be replaced by an encoded literal-byte or a sequence of unencoded bytes may be replaced by a run(length, distance).
- the unencoded byte-stream 1234123 can be replaced by literal-bytes 1, 2, 3, 4 and run ⁇ 3,4), where run(3,4) means "copy the 3 bytes from the unencoded stream starting 4 bytes back"
- Run- lengths generally turn out to be quite common and can replace a significant sequence of unencoded bytes with a single run(iength, distance) symbol, This alone can offer significant data compression, in compression systems, there is a finite set of allowed runflength, distance) symbols.
- Huffman compression recodes the run-length encoding so that more frequent symbols have a shorter Huffman code.
- ASCI! "e” is an 8-bit binary symbol but may be recoded as a 4-bit symbol because of its prevalence, "z” may be recoded as a 12-bit symbol. This exceeds its original 8 bits but the rarity of “z” makes its longer recoded length less relevant and the prevalence of "e” makes its recoded shorter length more relevant.
- RFC's describe sophisticated methods for generating run-length codes, and for forming Huffman codes that can be either static or dynamic, in that the Huffman code can be changed on the fly in response to the changing character of the unencoded data.
- These schemes offer great variability for software implementations where the degree of compression can be improved by allowing more computing power to be expended. While great for software implementation, this is impractical for hardware designs.
- Present embodiments present novel Implementations of some of the principles of the RFCs with various novel methods to create a new system for compressing and decompressing data that is amenable to an efficient hardware implementation.
- the RFCs allow run-length distances to be as high as 32 kbytes, or a quarter of a Mbit. However, it is not practical for a high-speed digital circuit to rapidly search a quarter of a Mbit of data for a run-length match. Accordingly, some of the present embodiments effectively compress data even with as little as 256 bytes of data in the run-iength search.
- Figure 1 shows some of the steps taken in the present embodiments and are based on the following:
- searching data for any possible run-length match at any possible distance within the search data resuits in so many wires the design cannot be reduced to working silicon.
- step 121 break the searches into independently searchable blocks where the number of wires and their length are controllable to ensure they can be routed on silicon and meet the cycle time requirements. How these blocks find run-lengths is one aspect of the present embodiments.
- a novel run-length encoding scheme makes it possible to easily extend the capability of a hardware implementation without any fundamental redesign of any system components.
- a Huffman scheme different from the RFCs allows new features to be added to the run-length encoding so as to improve upon the resulting compression.
- Such features are novel and therefore not compatible with the RFCs and represent an improvement in compression.
- the decoding of run-iengths includes a recoding to simplify the decoding process at step 124 'without compromising the compression rate.
- a novel manner is envisaged according to which ail the above components are pieced together to form a coercive system.
- uncompressed data is shifted into a match buffer, conceptually one byte at a time.
- the width of the match buffer (in bytes) determines the largest run-length allowed in a particular implementation. The remainder of the process does not start until the match buffer is full.
- the search buffer is logically a single large shift register whose length is a tradeoff between compression results and practicality for hardware implementation, For example, in some embodiments 512 bytes is a reasonable length and in some embodiments 2048 bytes will lead to a larger design with better compression.
- the search buffer is partitioned into a power-of-2 number of groups, for example 8, called a search group,
- the search group is also partitioned into a power-of-2 number of blocks, for example 8, called search blocks.
- the number of bytes in the search blocks represent the largest match length obtainable for a block. For example, if a particular design is chosen to support a maximum run-length of 8 then the match buffer and search blocks would all be 8 bytes.
- 64 search blocks of 8 bytes represent 512 bytes of search buffer.
- each search group chooses the best run-lengh of all the search blocks in its group then the best run-length between ail the groups is determined. This is a divide-and-conquer approach - if there are 64 search blocks, it's not practical for hardware to choose the best run-length between ail 64 search blocks. Instead, each search group only has to choose between 8 search blocks then the system chooses the best of the 8 search groups.
- the last search group it is preferable for the last search group to not implement its last search block since this may lead to a more efficient huff man encoding scheme, in the given example, this would mean there are only 63 search blocks, or 504 bytes of search history.
- each search block or search group may contain its awn local copy of the match buffer and all the match buffers are identical as seen in step 313.
- a byte of data to be compressed is received, it is sent to all the match buffers.
- the reason for this is that a single match buffer cannot be connected to the logic of all the search blocks without creating problems for a hardware implementation. If each search group maintains a single or multiple local copy of the match buffer, the length of wires and the number of loads on each wire is kept to within reasonable hardware design tolerances.
- the incoming uncompressed data stream is fanned out to all the match buffers.
- the fan-out tree does not need to be limited to a single dock cycle - adding cycle points for physical design reasons is only a latency impact, not a data bandwidth impact nor a compressibility impact. Most significant bytes are shifted out at step 315 to the next search block.
- a search is performed accordingly.
- the search is shown in step 314.
- the proximal match buffer is compared with all the search blocks within the group. If the largest allowed run-iength is 8 then the searches would simultaneously seek matches of 0, 1, 2, 3, 4, 5, 6, 7, and 8 consecutive bytes at every possible starting position within each search block. For a given run-length, r, being sought, only the most significant r bytes of the match buffer are involved in the searches. This is because the most significant bytes of the match buffer are the next to be moved out and, hence, are the next to be run-length encoded.
- each search selects the largest run-length detected and transmits that result, along with the byte index in the search block that identifies either the start or end of the run, to the search group. If there are multiple largest matches (for example, search block bytes 0-2 and 4-6 both match and have a run-iength of 3 ⁇ then it's not functionally important which result is transmitted. However, one result may be favored over another result due to a simplified hardware design, which includes the decoding of these symbols when the data is decompressed. There is also the question of whether the starting index or the ending index of the search buffer, called the offset of the run-length, is transmitted. In some embodiments, there are some optional compression improvements discussed later that are simplified by transmitting the starting index. For example, if the 0-2 run-length is chosen over the 4-6 run-length, that starting index is 0,
- a search group receives the search results from each of its associated search blocks. From those results, it selects the largest run-length and transmits it along with the offset and a block id to the ⁇ top ofthe search tree.
- the transmitted offset is a concatenation of two identifiers, a modified block id and the offset within the winning search-block.
- the search blocks are numbered 0 through b, where a lower number is given to a search block closer to the start ofthe search chain. This number, from the wi nning search block, forms the least significant bits of the modified block id.
- the search groups are numbered, 0 though g, where a lower number is given to the search group closer to the start of the search chain. This forms the most significant bits of the modified block id.
- the modified block id and offset identify the "distance" component of the run(iength, distance) symbol.
- the top of the search tree receives the search results from each of its associated search groups. From those results, it selects the largest run-length and associated offset and block id. These represent the winning run(length, distance), although it is yet to be encoded into a symbol and it may not represent the wining symbol that then goes onto the huffman encoding. Nevertheless, this winning information is passed along to the run-length encoder then onto the Huffman encoder.
- an optional feature of the search group is to analyze ways for combining mniiength, distance ⁇ symbols from two adjacent search blocks.
- the last two bytes from one search block combined with the first two bytes from the next search block may provide a run-length of 4 rather than 2 adjacent run-lengths of 2.
- This optional feature is another reason that it is better to indicate the starting index of a ma tch because the ending Index may be in a different search block.
- the search group can combine two runflength, distance) symbols by just adjusting the length and not modifying the starting index or associated block-id. This method can increase the compression ratio at the expense of logic that is only trivially more complex:.
- the second search block was also required to look for matches at the start of its buffer against bytes in the match buffer that were not the most significant - in this case, it would have detected a 2-length match of bytes 0 and 1 of its search buffer with bytes 2 and 3 of the match buffer. This can then be combined with a 2-length match that the previous search block was already doing when it compared the upper two bytes of its search buffer with bytes 0 and 1 of the match buffer. By combining these two 2-iength matches, it's as if the first matching search buffer had a copy of the first two bytes of the next search block and could directly detect the 4-length run.
- step 411 the search starts and the buffer is filled in step 412.
- step 412 the match buffer is full there is a point in time when there are no valid bytes of data in the search buffer and the system checks for valid bytes at step 413.
- step 414 the system checks whether there are valid bytes. As the clock cycles go by, the search buffer has more and more valid bytes until it is full at step 415. At that point, data shifted out at step 416 of the search buffer is simply discarded at step 416,
- the search does not have to operate in one single cycle as seen in figure 5, It can be a pipelined operation.
- ⁇ Cycle 1 step 511 Byte 1 moves into the distributed match buffers, creating match-buffer state 1.
- ⁇ Cycle 2 step 512 The search blocks analyze match-buffer state 1.
- Byte 2 moves into the match buffers, creating match-buffer state 2.
- ⁇ Cycle 3 step 513 The search groups analyze search block state 1, the search blocks analyze match buffer state 2, the next byte creates match buffer state 3, ⁇ Cycle 4 step 514; The search top analyzes search group state 1, the search groups analyze search block state 2, the search blocks analyze match buffer state 3, the next byte creates match buffer state 4.
- the pipeline does not degrade processing bandwidth, it only adds the latency of a small number of clock cycles. But it allows a faster clock cycle and a far easier place-and-route of the FPGA or ASIC.
- search-1 is a smaller design than search-2.
- Search-1 can have a reduced compression ratio compared to search-2, but when a small design is important, search-1 can generate impressive results when used within the present embodiments.
- search-1 may not scale well as the search blocks carry more than 8 byte or the run-lengths exceed 4 bytes. This is due to the geometrically increasing number of comparators required. At some point this leads to a lower maximum clock frequency and then to a design that can't be successfully placed and routed on silicon,
- Search-2 is for designs seeking higher compression ratios via larger search-blocks and/or longer run-lengths.
- the buffer is right shifted on every cycle a new byte of uncompressed data arrives.
- Two bytes from two different sources are shifted into different parts of a search block's buffer and 2 bytes are shifted out from two different parts of the buffer, in particular a newly arriving uncompressed byte is shifted into bufferfO] of all the search blocks and the data right-shifted out of buffer[2] (the run-length - 1 position) is discarded.
- Daisy chain data from the prior search-block buffer byte, buffer)?] is shifted out and it becomes shifted in data for buffer[3] (the run length position) of the next sequential search block, thus creating a daisy chain linking all the search block buffer[3..7]'s together.
- its shift-in data for buffer[3] is the shift-out data of buffer(2j. How this shift scheme operates will become more evident as we continue.
- c2 (cycle 2) is the first cycle the full run-length has entered the lower 3 bytes of the buffer.
- caS! that subset of the buffer the match buffer.
- C arrived in cl D arrived, and in c2 E arrived.
- the lower case letters of the search buffer are from different parts of the incoming data stream than the upper case letters of the match buffer.
- the upper case Setters of the match buffer are the most recently arriving data and the lower case letters are older data, even older data for the next search block in the daisy chain.
- Cycle 2 has a full run-!ength of arriving data so our example will start from there.
- the objective of cycle 2 is to find all the possible run-length of 2 or more that include E, the most recent byte, anywhere in the search buffer, which holds whatever bytes cdefg represent.
- Eg of Ce means E is compared to g, D is compared to f, and C is compared to e. If they all match then the run-length is 3. If only "Eg of” match then the run-length is 2. The diagram does not show a "Eg” only case since a run-fength of 1 is not usable.
- E[gfeti]" means E is compared with f, g, e, and d, etc. This Is the same Information as in the previous diagram for cycle 2 except it only shows what comparisons are done and not how the run-lengths are formed.
- c2 has all the match data required for run-length analysis by only having the first set of comparators.
- the physical match buffer is actually only 1 element. Removing the unneeded elements saves a iot of flip flops, especially since the match buffer started out being as many bytes as the longest run-length and this is required for every search block. For 64 search blocks and a run-length of 16 bytes, this reduces the flip flop count from 8192 to 512,
- search-2 For a run-length of 16 and a block-size of 32, this is 4216 flop flops per search block. That's the reason search -2 is larger than search-1. However, search-2 only has b-1 comparisons whereas search-1 would have in the order of 500 comparisons whose results would need to be registered but each comparison is S bits and that leads to too many wires for a single clock cycle - in the order of 4,000).
- the longest run-length for the search block is selected and that length, along with the search clock byte number, is sent to the search group as part of the run(iength, distance) determination.
- the search group for search-2 is the same as that for search- 1 except the match buffer has been eliminated from the search-Ts search group.
- the search top for search-2 is the same as that for search-1.
- the run-length encoder 614 receives the winning runflength, distance) 612 from the search top 611, where the search top contains the search groups and search blocks, which is what the uncompressed data runs through, and it also receives a copy of the uncompressed data stream 613 as it arrives byte-by-byte and maintains its version of the match buffer, which may be wider or narrower than those in the search machine.
- Their width is a function of the largest run-length sought.
- the run-length encoder does not do run-length matching and therefore can have few bytes. It conceptually needs at least as many bytes to cover the search machine latency so that the most significant match byte corresponds to resuit currently arriving from the search block.
- the run-length encoder will generate one of the following rcode symbols and pass it onto the Huffman encoder 615;
- the run-length encoder will disqualify a runflength, distance) whose length is 0 or 1, In this case, the most significant byte of the match buffer has to be encoded as one of the other options above, A run-length of 0 is obviously not a run-length match at all. A run-length of 1 is effectively encoded as an rcode for a text symbol and therefore not required as a run-length rcode.
- the run-length encoder analyzes for repeat-bytes in the match buffer. If the most significant byte of the match buffer Is consecutively repeated, the length of that sequence can be encoded as a rbyte symbol.
- the match buffer in the run-length encoder can be wider than what is minimally required so as to accommodate a chosen maximum rbyt length.
- This feature is optional and may increase compressibility with minimal hardware impact. if the length of the r!en exceeds that of the rbyt, it is chosen in preference to the rbyt. If neither the rien or rbyt qualify then the text symbol is chosen instead.
- the winning rcode symbol encodes a certain number of bytes of the uncompressed data stream. This number is called the skip count and is determined as follows:
- the skip count determines how many subsequent bytes of uncompressed data egress the match data are discarded, along with any associated run(tength, distance) and rbyt symbols since they all represent bytes of the uncompressed data that have already been processed into a pending run-length symbol.
- a null symbol takes the place of the discarded symbols and it indicates no new valid symbol for the current cycle.
- the pending rcode symbol is then compared against the prior emitted rcode symbol. If it is a duplicate symbol it is withheld and may be replaced by a repeat-command, rcmd, rcode symbol instead.
- the rcmd symbol encodes how many times the same pending code would have been emitted. When the first non-duplicate code occurs, the rcode symbol emitted is an rcmd symbol along with the count. This feature is optional and may increase .compressibility with minima! hardware impact.
- the rcode is yet to be encoded into its final form.
- the null symbol has no encoding.
- the remaining codes (text, rlen, rbyt, rcmd, spc!] are encoded with a unique static width identifier, called an rcode, and possible additional extra bits.
- the rcode is what will be Huffman encoded and therefore its width is mostly irrelevant.
- the eof symbol is emitted after the final byte of the uncompressed data stream has been processed.
- the special rcode symbol can be used to indicate special actions, such as:
- the rcode is intended to simplify the process of the Huffman encoding.
- An example 10-bit encode scheme for rcodes is as follows.
- Codes 256-263 special symbols, 256 means eof, others have optional meanings.
- Codes 264-273 rbyt symbols
- Codes 274-283 rcmd symbols
- Codes 284-613 r!en symbols
- Codes 614-1023 unused
- the rbyt and rcmd symbols have an associated count.
- the count Is broken into 10 ranges and the range id, 0 through 9, is added to the starting static identifier for the symbol, 264 and 274 for the rbyt and rcmd respectively.
- the bits of the count from rbyt and rcmd with the most significant bit removed set bit comprise the extra bits that will accompany the rcode symbol when transmitted to the huffman encoder.
- the range table is:
- an "rcmd 34" has a range id of 5 so its run-length symbol is 279. Bits 3:0 of 34 is 2. So the fuii run-length symbol is "279 and 4 extra bits with the value 2", A run-length code of 279 means rcmd ⁇ n> where n - 32 + ⁇ the next 4 bits>, which is rcmd 32+2, or rcmd 34.
- each range id will require a new huffman code.
- a given implementation may choose to support only range id's 0-3 and limit rcmds to 15 or less.
- rien symbols have an associated length, block id and offset.
- the block id is encoded similarly but not exactly the same as the count for rbyt and rcmd symbols.
- the table is:
- the extra bits for the offset is the 3-bit offset va!ue 2, or 010 in binary. So rien 5 block-id 8 offset 2 is encoded as rcode 337 and extra bits 001 and 010 for the ' block-id and offset respectively.
- an rcode is 328 with extra bits 11 for the block and 100 for the offset. 326-284 - 42.
- the code table is an example of how 1024 entries can be partitioned. There are many ways this can be done that give variations in the maximum rien lengths that can be encoded, or the maximum number of search blocks that can be encoded, etc.
- the run-length encoder can be pipelined and the search block pipeline can be thought of as running into the run-length pipeline to form a longer pipeline. Similarly for the Huffman encoder. Thus the entire compression is a pipeline of pipelined sub-units, thereby having a high throughout rate and a fast cycle time.
- the result of the run-length encoder is a 10-bit rcode derived from its .symbol table plus one or two optional extra-bit fields.
- the Huffman encoder will recode the rcode into a shorter width code and append the extra bits. The number of extra bits is directly related to, and can be deduced from, the rcode (and therefore also from the rcode's associated Huffman code).
- the run-length encoder may use any element from its symbol table and every one of these elements must be processed by the Huffman encoder.
- the rcode set-size can be varied depending on the objectives of a particular implementation. Ail implementations will support the 256 literal symbols and these plus an eof symbol is all that is required. (In this case, the compression consists of just converting the literal symbols of all possible 8-bit text bytes into a set of huffman encoded symbols that generally have a smaller average number of bits per byte.) But most implementations will include additional rcode symbols, such as various run-!engths, repeat bytes etc. How many of these additional symbols will be included depends on the objectives of the implementation.
- the huffman encoder must be able to process all the possible rcode symbols that will be generated by the run-length encoder but no more - it does not need to support the entire 1024 rcode-symbol-space of the run-length encoder.
- the symbols in that subset have the same value as that same symbol in any other subset.
- the value of the rcode symbols that have to be processed by the Huffman encoder are not changed - there's just some that are no longer used. From the perspective of the Huffman encoder, nothing actually has to change -the Huffman encoder design can remain as is.
- the Huffman encoder converts the rcode with the appended bits from the run-length encoder into an hcode with the appended bits, where the hcode is a Huffman encoding of the rcode.
- Huffman codes from an alphabet of symbols and their associated frequencies is well documented in the art, as are variations on the theme. In some embodiments, the following variations lend themselves to best mode but are not required:
- hcodes have a range for the number of bits in each code, with shorter lengths being assigned to more frequent codes. But too many widths make high-speed hardware encoding and decoding difficult. It is advantageous to bundle groups of low frequency codes and assign them a common frequency so that their assigned hcode lengths are the same. Since these codes are low frequency, the effect on compression ratios can be negligent. 2.
- the values of the assigned Huffman codes can be re-ordered or re-assigned among codes that are decoded together (maybe because they have a common length ⁇ . For example, all 5-bit hcodes may have two particular bits always set to 1 and only the other 3 bits will vary from one another. Furthermore, these same two bits would not be set for hcodes of lengths other than 5 bits.
- the h codes along with any appended extra bits, are generated they are concatenated into a contiguous bit stream that is then segmented into unit size chunks according to the width of the output interface of the compression circuit.
- the output may be a byte-oriented interface, and it may include protocol signals that allow the external logic to control the data-rate from the compression circuit, etc. The need for buffering and control is well understood in the art.
- the compressed data stream can take on any bit length and this is problematic in a system that deals with data in units of bytes, or words, etc. Therefore, the end of the compressed stream may need to be padded with appended zeros to make it an integral length the desired data-unit size.
- the purpose of an eof rcode is to indicate that the padded data that follows should be discarded and not decoded as if they are a Huffman code.
- the compressed stream may also include an appended CRC of the uncompressed data and used as an integrity check for the decompressed data.
- Some embodiments address what hcode values are assigned to each of the rcode symbols. Methods for doing this are well understood in the art and their genesis is the pinnacle paper by David Huffman. It is a method whereby codes of varying bit-lengths are assigned to symbols so that the recoded data stream is smaller than the original data stream by the greatest possible amount by assigning shorter codes to more frequently occurring symbols. The frequency of symbols is clearly dependent on the particular data stream. However, as understood in the art, good compression can generally be achieved across a broad range of data by using the frequency of symbols in a broad range of data. It may not be the optimal Huffman encoding for a particular segment of data but on average will work weii over many types of data sets.
- Some of the present embodiments generate the frequencies of each rcode in a given rcode set from these corpora and a Huffman encode is generated for each of the rcodes. It should be noted that even if an rcode has a frequency of zero, it must still be assigned an hcode mapping - just because it may not have occurred in the reference corpora doesn't mean it won't occur in some other data set that is to be compressed.
- decompression follows a similar flow to compression but in reverse.
- the input data is a sequence of Huffman codes whose values and lengths are decoded and associated extra bits extracted. They are converted into their equivalent rcode and passed onto a run-length decoder.
- the run-length decoder executes the instruction of the rcode. For example, an rcode value less than 256 is simply the literal value of the next byte to be output. An rbyt code indicates the value of a byte and how many times it is to be replicated in the output stream. An rcrnd code tells the run-length decoder how many times to repeat the last command it executed. An eof code may tell the decoder to treat the next incoming 4 bytes as a reference CRC value that should match the CRC regenerated from the output data and then terminate the decompression and discard any padding data that follows.
- An den code is a little more difficult in that it refers to replicating data that was output some time back, in some embodiments, this requires a history ram to cache the output data. But the history ram does not need to maintain any data further back than the number of bytes in the compression units search buffer since that defines the largest distance in the rlen lookback. This is made more complex by the fact that some rcodes can specify long run lengths that may be cumbersome and inefficient to implement or may have a significant adverse effect on the achievable dock frequency and slow the design. Recoding is a solution to this problem. For example, a runflength, distance) symbols of run(32,100) can be recoded into 4 smaller rcodes which are then executed in sequence.
- the next rcode is run(4,3), meaning copy the 4 bytes starting from the 3 rd byte ago.
- the 0 th byte ago is 0 so this instruction will result in the uncompressed data stream becoming:
- run(4,3) can be recoded as run(2,5) run(2,5), which results in:
- run(ien,dist) can be decomposed into a sequence of n copies of: run ( len/n, dist+len-(len/n) ⁇
- run(32,100) can be recoded as 6 lots of run(32/8),10G+32-(32/8)), or 8 * run(8,128).
- run(8,2) transforms the data as follows:
- run(8,2) can be recoded as a sequence of 4 run(2,8), which gives the following identical transform;
- the recoding can have zero effect on performance. If the outgoing data-path is 4 bytes wide, nothing is gained by decoding run(32,l-00 ⁇ versus of a sequence of (run(4,128 ⁇ _ Yet the difference in data-path complexity and maximum cycle-time is dramatic.
- run(7,2) can be recoded as a sequence of 3 run(2,7) and a run(l,8), which gives the following identical transform:
- the mathematics of the recoding can be mostly statically predetermined and hard-coded into a table.
- the table identifies for every run length how long the new sequence is, and what the new length of each sequence element is, and what the addition term for the distance of each element of the sequence is.
- This table can be constructed so that all elements of the sequence are identical except possibly the last, as shown in the previous example. For the example of run(7,2) the 7 th table entry, because It's a run(7,x), would indicate:
- the recoding parameters can be determined on the fly as a mathematical transform.
- Some embodiments have been structured to allow each functional operation to be pipelined, thereby allowing high throughput bandwidth and a high clock cycle frequency.
- the Huffman decode, the rcode recode, and the rcode decode can alt be part of a single pipeline and each of these subunits can themselves be pipelined.
- Figure 7 is an illustration of the compression scheme according to some embodiments.
- the Input Buffer 711 indicates when it is ready to receive new data using the "ready" signal. This allows a new byte of uncompressed data to be written into Input Buffer 711 and sent to the CRC Generation 721 where It is used to generate a CRC checksum over the uncompressed data, as well understood in the art.
- Input Buffer 711 decouples any paths between the logic outside Figure 7 from paths inside Figure 7 and this makes it easier to meet cycle-time constraints.
- input buffer 711 sends the received data to Search Control and Collation 712 which coordinates the run-length search operation between all the Pattern Search blocks, 712, receives all the results from the Pattern Search blocks, 722, and determines the best run-length that has been found.
- Pattern Search 722 contains the Search Blocks that have been described earlier and the Search Groups that may encapsulate groups of Search Blocks, as has been described earlier. The best run-length found by the search is sent to the block of Precode 7
- Precode 713 determines what symbol will be huffman encoded by Encode 714.
- the symbol to be encoded may be determined from the best run-length found by search control and collection 712, a repeat byte symbol (along with the length of the repeated byte sequence), a literal byte, or a special code, or a null symbol, as described earlier.
- the symbol to be encoded, as determined by precede 713, is sent to Encode 714,
- Encode block 714 translates the symbol from Precode 713 into a huffman encoded symbol, as described earlier.
- the huffman encoded symbol is sent to block Recode 715.
- Recode 715 may provide further symbol reduction by compressing a duplicated sequence of symbols from 714 into fewer symbols. For example, if encode 714 generates the same Huffman encoded symbol 5 consecutive times, this may be reduced to the first symbol from encode 714 and the subsequent 4 symbols from ncode 714 are recoded as one symbol that indicates "replicate the prior symbol 4 times.”
- the resulting symbol of recode 715 is sent to Output Buffer 716,
- Output Buffer 716 indicates when it is ready to send out compressed data using the "ready" signal. This allows logic external to the compression circuit to consume the compressed data at its own rate and also decouples the timing in paths internal to the compression circuit with those outside the compression circuit. Output Buffer 716 may also perform any required padding at the end of the compressed data stream to make it an integral number of bytes and append the CRC checksum from CRC generation 721.
- Figure 8 illustrates the decompression scheme according to some embodiments.
- the input Buffer 811 indicates when it is ready to receive new data using the "ready" signal. This allows a new byte of compressed data to be written into Input Buffer 811.
- input Buffer 811 decouples any paths between the logic outside Figure 8 from paths inside Figure 8 and this makes it easier to meet cycle-time constraints, input Buffer 811 sends the compressed data to Precode 812.
- Precode 812 separates the compressed bit stream into a sequence of Huffman symbols.
- the Huffman symbols are sent to Decode 813.
- Decode 813 decodes the Huffman symbols into their non-huffman encoded counterparts. This reverses the operations performed by Encode 714 of Figure 7. The non-huffman symbols, compression symbols, are sent to Recode 814,
- Recode 814 optionally performs simplification of the compression symbols. For example, a run-length-8 symbol may be broken into 2 equivalent-function run-length-4 symbols so as to simplify the inflate function of inflate 815. The resulting compression symbols are sent to Inflate 815.
- Inflate 815 performs the operation of the compression symbol to regenerate the original uncompressed data stream. For example, a repeat-byte-16 command will be inflated to 16 duplicated bytes specified by the repeat-byte command.
- the inflating of a run-length command requires a copy of the decompressed data and this may be maintained in a ram. Run-length commands are limited in how far they can look back into the decompressed because that limit is enforced by the size of the search buffer contained in the blocks of Pattern Search 722 of Figure 7. This limits the size of the ram inside inflate 815 and it's size is not related to the total amount of decompressed data.
- a run-length code will indicate, for example, copy the 8 bytes starting from 55 bytes back into the next 8 bytes of the decompressed data stream. Depending on the structure of the ram, this will be converted into the ram location or locations that will be read to extract the data and these types of processes are well understood in the art.
- the resulting decompressed data is sent to Output Buffer 816.
- Integrity Check 821 receives inputs from precode 812, decode 813, and inflate 815 and sends integrity status information to Output Buffer 816.
- Data from Precode 812 indicates whether the received compressed data stream can be broken into valid Huffman codes.
- Decode 813 indicates whether the compression symbols can be correctly decoded. For example, if a run-length code specifies to copy 4 bytes of data from 300 bytes back but only 200 bytes have been presently decompressed then compression symbol is the result of corruption in the compressed data.
- Inflate 815 provides a copy of the decompressed data to integrity Check 821 so it can generate the CRC of the decompressed data and compare it to the CRC of the original uncompressed data that was appended to the end of the compressed data by Output Suffer 716 of Figure 7. If integrity Check 821 detects any condition that indicates the decompressed data may not match the original uncompressed data, it informs Output Buffer 816,
- Output Buffer 816 indicates when it is ready to send out decompressed data using the "ready" signal. This allows logic external to the compression circuit to consume the compressed data at its own rate and also decouples the timing in paths internal to the compression circuit with those outside the compression circuit. Output Buffer 816 also indicates when the last byte of decompressed data is being sent out and if any integrity errors were detected by the decompression process.
- the following code illustrates the compression and decompression schemes according to some of the present embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Un système et un procédé de compression et de décompression de matériel consistant à identifier une taille de tampon de recherche selon une exigence matérielle et un taux de compression souhaité, et à décomposer un tampon de recherche en blocs interrogeables indépendants de telle sorte qu'ils sont acheminés sur du silicium et satisfont une exigence de temps de cycle. Le système utilise un module de commande de recherche et de collation comprenant un module de recherche de motif, un module de pré codage en communication avec le module de recherche de motif, un module de codage en communication avec le module de pré codage, et un module de codage en communication avec le module de pré codage, et un tampon de sortie en communication avec le module de pré codage.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202062981535P | 2020-02-26 | 2020-02-26 | |
| US62/981,535 | 2020-02-26 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021173874A1 true WO2021173874A1 (fr) | 2021-09-02 |
Family
ID=77492017
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2021/019734 Ceased WO2021173874A1 (fr) | 2020-02-26 | 2021-02-25 | Système et procédé de compression de données |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2021173874A1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115882867A (zh) * | 2023-03-01 | 2023-03-31 | 山东水发紫光大数据有限责任公司 | 一种基于大数据的数据压缩存储方法 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030138158A1 (en) * | 1994-09-21 | 2003-07-24 | Schwartz Edward L. | Multiple coder technique |
| US20100223237A1 (en) * | 2007-11-05 | 2010-09-02 | University Of Florida Research Foundation, Inc. | Lossless data compression and real-time decompression |
| US20140132429A1 (en) * | 2012-11-10 | 2014-05-15 | John Conant Scoville | Method for data compression and inference |
| US10224957B1 (en) * | 2017-11-27 | 2019-03-05 | Intel Corporation | Hash-based data matching enhanced with backward matching for data compression |
-
2021
- 2021-02-25 WO PCT/US2021/019734 patent/WO2021173874A1/fr not_active Ceased
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030138158A1 (en) * | 1994-09-21 | 2003-07-24 | Schwartz Edward L. | Multiple coder technique |
| US20100223237A1 (en) * | 2007-11-05 | 2010-09-02 | University Of Florida Research Foundation, Inc. | Lossless data compression and real-time decompression |
| US20140132429A1 (en) * | 2012-11-10 | 2014-05-15 | John Conant Scoville | Method for data compression and inference |
| US10224957B1 (en) * | 2017-11-27 | 2019-03-05 | Intel Corporation | Hash-based data matching enhanced with backward matching for data compression |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115882867A (zh) * | 2023-03-01 | 2023-03-31 | 山东水发紫光大数据有限责任公司 | 一种基于大数据的数据压缩存储方法 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN108768403B (zh) | 基于lzw的无损数据压缩、解压方法及lzw编码器、解码器 | |
| US9054729B2 (en) | System and method of compression and decompression | |
| US6320522B1 (en) | Encoding and decoding apparatus with matching length detection means for symbol strings | |
| Franaszek et al. | Parallel compression with cooperative dictionary construction | |
| US5608396A (en) | Efficient Ziv-Lempel LZI data compression system using variable code fields | |
| JP3009727B2 (ja) | 改良形データ圧縮装置 | |
| CN107836083B (zh) | 用于语义值数据压缩和解压缩的方法、设备和系统 | |
| US7538696B2 (en) | System and method for Huffman decoding within a compression engine | |
| CN111294053B (zh) | 硬件友好的数据压缩方法、系统及装置 | |
| CN108702160B (zh) | 用于压缩和解压缩数据的方法、设备和系统 | |
| US5877711A (en) | Method and apparatus for performing adaptive data compression | |
| WO1994022072A1 (fr) | Traitement de l'information utilisant une analyse syntaxique insensible au contexte | |
| US11677416B2 (en) | Hardware implementable data compression/decompression algorithm | |
| WO2021173874A1 (fr) | Système et procédé de compression de données | |
| CN1848692B (zh) | 编码设备、解码设备、编码方法和解码方法 | |
| EP0438954A1 (fr) | Méthode pour décoder des données comprimées | |
| US5960115A (en) | Method for decompressing multiple codes in a single clock cycle | |
| US5949909A (en) | Apparatus for decompressing multiple codes in a single clock cycle | |
| JPH08223055A (ja) | 可変長コードデコーダ | |
| CN1656688B (zh) | 在压缩之前处理数字数据 | |
| JP2005521324A (ja) | 損失のないデータの圧縮および圧縮解除方法および装置 | |
| JPH05241776A (ja) | データ圧縮方式 | |
| US20080001790A1 (en) | Method and system for enhancing data compression | |
| Huang et al. | Windowed Huffman coding algorithm with size adaptation | |
| Kadhim et al. | Enhancing lossless compression algorithms using combining LZW and arithmetic coding |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21761554 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 21761554 Country of ref document: EP Kind code of ref document: A1 |