[go: up one dir, main page]

WO2004080080A1 - Apparatus and method for compression and decompression of digital data - Google Patents

Apparatus and method for compression and decompression of digital data Download PDF

Info

Publication number
WO2004080080A1
WO2004080080A1 PCT/EP2004/001839 EP2004001839W WO2004080080A1 WO 2004080080 A1 WO2004080080 A1 WO 2004080080A1 EP 2004001839 W EP2004001839 W EP 2004001839W WO 2004080080 A1 WO2004080080 A1 WO 2004080080A1
Authority
WO
WIPO (PCT)
Prior art keywords
string
bits
compression
compression algorithm
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2004/001839
Other languages
French (fr)
Inventor
Enzo Criscione
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ALCOM ALGORITMI COMPRESSIONE DATI Srl
Original Assignee
ALCOM ALGORITMI COMPRESSIONE DATI Srl
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ALCOM ALGORITMI COMPRESSIONE DATI Srl filed Critical ALCOM ALGORITMI COMPRESSIONE DATI Srl
Publication of WO2004080080A1 publication Critical patent/WO2004080080A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • the present invention is in the field of digital data compression and 5 decompression, with particular reference to compression and decompression of digital data without loss of information. Background Art
  • Compression methods which often vary according to the field of application and to the type of information to be managed, substantially comprise two different compression techniques: lossless data compression and lossy data compression.
  • Lossless compression is a compression scheme in which the data, once decompressed or reconstructed, exactly match the original data.
  • Arithmetic coding is a method that maps the source sequences to intervals between a pair of numbers.
  • Entropy coding implies a fixed- to variable-length coding of statistically independent source symbols.
  • the schemes of Huffman coding map source symbols to binary codewords of integer length, so that no codeword is a prefix of a longer codeword.
  • Predictive coding is a form of coding in which a prediction for the current event is made on the basis of previous events.
  • Universal coding is a form of coding that is designed without knowing the source statistics but converges to an optimum code as the length of the source sequence increases.
  • Ziv- Lempel coding is a family of schemes based on dictionary codes that encode strings of symbols by sending information regarding their position in a dictionary. For example, the LZ77 family uses a part of the string that has already been encoded as dictionary, whereas the LZ78 family builds a dictionary of strings encountered in the data stream.
  • Schemes based on LZ77 are used commercially to create the well-known ZIP-type files, whereas schemes based on LZ78 are widely used for example in GIF-type compression of image data.
  • the compression ratio of a digital data compression method is defined as the ratio between the size of the original data and the size of the compressed data.
  • the aim of the present invention is to provide a lossless data compression scheme with improved compression ratios with respect to the background art, with particular reference, albeit not exclusively or limitatively, to the field of digital imaging and photography.
  • codec for lossless data compression and/or decompression that can be implemented as software or hardware and can be used to facilitate data compression on a large scale for applications in which it is necessary to process photographic information or complex images in general.
  • a method for processing a digital file comprising a string of bits A, which comprises the steps of: compressing the string of bits A into a string of bits B by applying a lossy compression method; decompressing the string of bits B into a string of bits A' by applying the decompression algorithm that is the dual of the lossy compression method used for compression from A to B; subtracting the string A' from the string A, obtaining a string of bits C; compressing the string of bits C into a string of bits D by applying a lossless compression method; and storing the sequence of strings B, D.
  • a digital data encoding device that comprises: means for compressing a string of bits A into a string of bits B according to a lossy compression algorithm; means for decompressing the string of bits B into a string of bits A' according to the algorithm that is the dual of the lossy compression algorithm used for compression from A to B; means for calculating by subtraction between said string A and said string A a string of bits C; and means for compressing the string of bits C into a string of bits D according to a lossless compression method.
  • the lossy compression method is preferably chosen from the group that comprises the JPEG compression algorithm, the JPEG2000 compression algorithm and the DIVX compression algorithm
  • the lossless compression method is preferably chosen from the group that comprises predictive coding compression algorithms, Huffman coding compression algorithms, Ziv-Lempel coding compression algorithms, numeric coding compression algorithms, and universal coding compression algorithms.
  • Figure 1 is a flowchart of the compression method according to the present invention
  • Figure 2 is a flowchart of the corresponding decompression method
  • Figure 3 is a block diagram of an example of format for storing the data obtained by means of the inventive method according to the present invention
  • Figure 4 is a block diagram, illustrating in depth the compression method according to the present invention according to a preferred embodiment.
  • the compression routine begins in step 110, when the file A to be compressed is read.
  • the file comprises a string of bits that identifies a positive integer of any size.
  • the string of bits A is compressed by means of a known lossy data compression algorithm, for example the JPEG compression algorithm or the JPEG2000 algorithm.
  • step 115 Before performing compression, in step 115, it is possible to set the parameters of the chosen compression algorithm: for example, when using the JPEG algorithm, the quantization coefficient, which varies from 0 to 99 and affects the degree of loss of information with respect to the original image.
  • the result of the lossy compression step 120 is a string of bits B that is smaller than the initial string A.
  • step 130 the decompression algorithm that is the dual of the compression algorithm used in step 120 is applied to the string B.
  • the result is a string of bits A', whose dimensions are typically equal to those of the initial string A.
  • the data contained in the string A do not coincide with the data contained in A', since as a consequence of the lossy compression performed in step 120 part of the information has been eliminated and therefore it is not possible to reconstruct exactly the initial string A.
  • step 140 an arithmetic difference is computed between the initial string A and the string A' as reconstructed (following lossy compression and decompression of the initial string A), producing a remainder string C.
  • step 150 the remainder string C is compressed with a lossless compression algorithm, producing a string D.
  • step 145 it is also possible to set the parameters of the chosen compression algorithm.
  • the two resulting strings B and D are then saved in step 160. If the two strings are concatenated, as is convenient, in a single file or sequence of bits A COM P, sufficient information for correct reconstruction of the original file, for example information related to the size of one of the two strings B or D, is introduced in said final file, so as to be able to distinguish said two previously obtained data groups.
  • Figure 3 shows the header 310 of the file and the data portion 320, which comprises the string B 321 and the string D 322.
  • the header 310 is constituted by five bytes, the first of which comprises three fields 311, 312 and 313 of two, three and three bits respectively.
  • the first two bits, designated by the reference numeral 311, are used to identify the type of data being processed. For example, "00” can designate a generic or unclassified file, "01” can designate an image file, "10” can designate a sound file, and "11” can designate a text file.
  • the next three bits of the header are used to identify the lossy compression algorithm that is used: for example, "001" can identify the JPEG algorithm, "010” can identify the JPEG2000 algorithm, and so forth.
  • the next three bits can be used to identify the lossless compression algorithm; for example, "001" can identify a Ziv-Lempel algorithm, "010” can identify a predictive coding algorithm, for example the PMD algorithm, "011” can designate a Huffman coding algorithm, and so forth.
  • the next four bytes identify the size of the string B, designated by the reference numeral 321, which is obtained by lossy compression.
  • the size of the final compressed file A C OMP the size of the header 310
  • Figure 2 illustrates the process for decompressing the compressed file A COMP and the reconstruction of the original file A, which occurs dually with respect to the compression process described above.
  • step 210 the header 310 of the compressed file A COMP is read so as to identify the type of file (generic, image, sound, text) that has been compressed and which algorithm has been used for lossy compression of the source file A and lossless compression of the remainder string, as well as the size of the string B.
  • step 220 the data block 321 related to the string B is decompressed by applying the dual of the algorithm used during compression in step 120, accordingly obtaining the string A.
  • the starting point of the string D 322 is calculated by adding to the size of the header 310 the size of the string B 321 : the remaining portion of the file ACOMP is in practice the string B 322.
  • step 230 the string D is decompressed by using the dual of the lossless compression algorithm applied in step 150, so as to regenerate the remainder string C.
  • step 250 the resulting strings A' and C thus obtained are added dually with respect to what was performed in step 140. The result of this sum is identically equal to the source string A.
  • step 410 the string of bits A, i.e., the source file, is loaded into the memory of a computer and the values of the variables are initialized:
  • DBEST designates the best compression obtained up to then on the source file during the iterative process and is initially set to the value DO, which indicates the original size of the string A to be compressed and is therefore a value that indicates a nil degree of compression;
  • IBEST indicates the value of the coefficient that, when applied in the lossy compression step 415, generated the best final value DBEST;
  • FINE is a flag that indicates whether the. iterative process has ended or not and is initially set to the value "FALSE".
  • step 415 the quantization coefficient of the JPEG lossy compression algorithm is set to an initial value that is at one end of the possible range, for example the value "99". This value is assigned to the index I, which then assumes gradually all the possible values within the chosen range (from “99" to "0" in the example of Figure 4).
  • step 415 the string A is compressed into B.
  • the subsequent steps exactly reproduce the corresponding steps described with reference to Figure 1.
  • the string B is decompressed by using the algorithm that is the dual of the compression algorithm used in step 415, accordingly obtaining the string A; in step 425, subtraction is performed between the initial string A and the reconstructed string A', obtaining the remainder string C; in step 430, a lossless compression of the remainder string C is performed, obtaining a string D, until the step 435 is reached. If the FINE flag is set to the value TRUE, the strings B and D are stored in the file A COMP together with the information required for their correct identification; otherwise, the iteration proceeds as described hereafter.
  • Step 445 checks whether the size DIM of the file compressed in the last cycle is smaller or not than the best final size DBEST obtained up to then. If it is, the value of DBEST is modified and set equal to DIM and the index IBEST is set equal to the current value of I. In both cases, the algorithm moves on to the block 455, where the index I, used as a coefficient for the lossless compression algorithm, is decremented by one unit and the compression process is reiterated starting from the block 415. If the test performed in step 460 indicates that the value of I is lower than the lower end of the iteration interval ("0" in the example), the FINE flag is set to TRUE in step 465 and the value of I is reset to IBEST, which retains the index that allowed to achieve the best result. The compression process is then performed one last time by using as a parameter for the lossy compression algorithm the stored IBEST coefficient, and the conditional control of the block 435 ends the cycle by storing the final string A COMP -
  • This can be done for example by adding to the header of each block related to the compression of a portion of the original file the size of the string D obtained by lossless compression of the string C A-A, in addition to the size of the string B obtained by lossy compression of the source string A.
  • the embodiment described above can be implemented in many different manners and the present invention is not limited to a particular implementation.
  • the compression and/or decompression method can also be implemented with other known hardware, such as for example a digital signal processor (DSP), a field programmable gate array (FPGA), an application-specific integrated circuit chip (ASIC), a microcontroller with coded logic, or the like.
  • DSP digital signal processor
  • FPGA field programmable gate array
  • ASIC application-specific integrated circuit chip
  • microcontroller with coded logic or the like.
  • DSP digital signal processor
  • FPGA field programmable gate array
  • ASIC application-specific integrated circuit chip
  • microcontroller with coded logic or the like.
  • the application of the invention in existing codecs is particularly simplified by the fact that the method according to the invention integrates and evolves conventional compression methods that have already been used in various fields, allowing new levels of compression without data, loss.
  • one extremely advantageous use of the invention is in application to the field of digital photography.
  • Most digital still cameras in fact use a lossy JPEG coding for storing the data that represent an image.
  • the quality of the image although being deemed acceptable, does not achieve the physical limit determined by the lens of the camera, and therefore in order to obtain optimum pictures it is currently necessary to disable, where possible, JPEG compression and store an uncompressed image in memory, with obvious inconveniences due to the impossibility to keep in memory a satisfactory number of photographs and to the increase in the time required to transfer the image on processing devices.
  • the following table 1 lists values obtained by applying the algorithm of Figure 4 to a plurality of photographs, compared with the respective values obtained by applying the best currently known lossless compression algorithm, i.e., the JPEG 2000 algorithm used in lossless mode.
  • the JASPERTM implementation was used for the tests.
  • the first column of the table indicates the initial size of the file to be compressed, expressed in bytes.
  • the second column lists, as an absolute value and as a percentage, the results of the compression performed by using an implementation of the present invention.
  • the JPEG compression algorithm was used for step 415, whereas a Huffman coding algorithm was used for step 430.
  • the third column lists the same data obtained by using the JASPER implementation of the JPEG2000 algorithm.
  • the last column lists the percentage improvement achieved by the implementation of the invention used here with respect to JASPER.
  • the present invention achieves the intended aim and objects.
  • the described method and system allow to achieve lossless data compression with higher compression ratios than the background art.
  • the invention is practical to provide, since its implementation is an inventive integration of known lossy and lossless compression algorithms.
  • numerous modifications are evident and can be promptly performed by the person skilled in the art without abandoning the scope of the protection of the present invention.
  • inventive concept on which the present invention is based is independent of the actual implementation of the software modules, which can be provided in any language and on any hardware platform and also as firmware that can be applied to modern electronic devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

A method for processing a digital file comprising a string of bits A, comprising the steps of: compressing the string of bits A into a string of bits B by applying a lossy compression method; decompressing the string of bits B into a string of bits A' by applying the decompression algorithm that is the dual of the lossy compression method; subtracting the string A' from the string A, obtaining a string of bits C; compressing the string of bits C into a string of bits D by applying a lossless compression method; storing the sequence of strings B, D.

Description

APPARATUS AND METHOD FOR COMPRESSION AND DECOMPRESSION OF DIGITAL DATA
Technical Field
The present invention is in the field of digital data compression and 5 decompression, with particular reference to compression and decompression of digital data without loss of information. Background Art
Over the last decades there has been a gradual diffusion of data processing and data communications devices that are used in most electronic instruments in day-to-day use, such as personal computers, cellular telephones, still cameras, video cameras, multimedia players and so forth.
The possibility to manage data electronically provides considerable advantages even in fields where paper or analog media were traditionally used to store information. Thus, for example, computer technology has imposed itself in the fields of image, text and sound processing, especially because of the practicality afforded by computer technology in reprocessing and storing information and because of the quality of digital data, which is not subject to deterioration. This ubiquitous diffusion, combined with the need to provide increasingly higher quality, entails however the need to manage an enormous amount of data, which leads to a growth in the storage capacities of hardware devices which, especially for cost-related reasons and due to the limited modularity of certain products, is not sufficient to solve the problem of managing a constantly growing mass of data. For this reason, in the fields of storage, archiving and processing of digital data it has always been necessary to resort to compression techniques in order to reduce the physical space occupation of the data stored on a hardware medium of any kind (mass storage media, memories, tapes, optical disks, magnetic disks and so forth). Compression methods, which often vary according to the field of application and to the type of information to be managed, substantially comprise two different compression techniques: lossless data compression and lossy data compression. Lossless compression, as the name implies, is a compression scheme in which the data, once decompressed or reconstructed, exactly match the original data. These compression methods include well-known methods that have been available for considerable time, such as arithmetic coding, entropy coding, Huffman coding, predictive coding, universal coding and Ziv-Lempel coding. Arithmetic coding is a method that maps the source sequences to intervals between a pair of numbers. Entropy coding implies a fixed- to variable-length coding of statistically independent source symbols. The schemes of Huffman coding map source symbols to binary codewords of integer length, so that no codeword is a prefix of a longer codeword. Predictive coding is a form of coding in which a prediction for the current event is made on the basis of previous events. Universal coding is a form of coding that is designed without knowing the source statistics but converges to an optimum code as the length of the source sequence increases. Ziv- Lempel coding is a family of schemes based on dictionary codes that encode strings of symbols by sending information regarding their position in a dictionary. For example, the LZ77 family uses a part of the string that has already been encoded as dictionary, whereas the LZ78 family builds a dictionary of strings encountered in the data stream. Schemes based on LZ77 are used commercially to create the well-known ZIP-type files, whereas schemes based on LZ78 are widely used for example in GIF-type compression of image data.
The compression ratio of a digital data compression method is defined as the ratio between the size of the original data and the size of the compressed data. These compression techniques, which are all well-known in the art, have a performance that when measured in terms of compression ratio is surprisingly limited, especially as regards the compression of multimedia data such as sound and images.
Current data compression algorithms in fact are capable of achieving a lossless compression of 10:1 in total fidelity scenarios, in which none of the original data can be sacrificed during size reduction, but only in the case of data of a particular type, such as for example data that represent a text or data of "scattered" files, i.e., files with particularly large quantities of repetitive and unnecessary or empty data, such as pages of fax transmissions.
In the case of audio or video data, instead, in order to achieve valid compression ratios it is necessary to sacrifice part of the information and therefore adopt lossy compression techniques, eliminating or modifying some of the original data while preserving the sensory functional elements of the data. The best-known examples are found in compression technologies of the JPEG family for still pictures, the MPEG family for moving pictures, and the MP3 family for audio.
Although these lossy compression techniques allow to reconstruct the original picture or sound in a sufficiently faithful manner,. in practice they constitute a necessary alternative solution, since up to now it has not been possible to achieve similar compression ratios without data loss.
Therefore, there is a strongly felt need in the art to provide new compression and decompression schemes that allow compression ratios without data loss, particularly in processing multimedia data, on the same order of magnitude as the compression ratios currently known in which data loss occurs. Disclosure of the Invention
The aim of the present invention is to provide a lossless data compression scheme with improved compression ratios with respect to the background art, with particular reference, albeit not exclusively or limitatively, to the field of digital imaging and photography.
An object of the present invention is to provide a new compression/decompression method that is simple to apply and provide and can be easily integrated with existing hardware devices. Another object of the invention is to provide an encoder/decoder
(codec) for lossless data compression and/or decompression that can be implemented as software or hardware and can be used to facilitate data compression on a large scale for applications in which it is necessary to process photographic information or complex images in general. This aim and these and other objects that will become better apparent hereinafter from the description of the present invention are achieved by a method for processing a digital file comprising a string of bits A, which comprises the steps of: compressing the string of bits A into a string of bits B by applying a lossy compression method; decompressing the string of bits B into a string of bits A' by applying the decompression algorithm that is the dual of the lossy compression method used for compression from A to B; subtracting the string A' from the string A, obtaining a string of bits C; compressing the string of bits C into a string of bits D by applying a lossless compression method; and storing the sequence of strings B, D. This aim and these objects are further achieved by a digital data encoding device that comprises: means for compressing a string of bits A into a string of bits B according to a lossy compression algorithm; means for decompressing the string of bits B into a string of bits A' according to the algorithm that is the dual of the lossy compression algorithm used for compression from A to B; means for calculating by subtraction between said string A and said string A a string of bits C; and means for compressing the string of bits C into a string of bits D according to a lossless compression method.
Advantageously, the lossy compression method is preferably chosen from the group that comprises the JPEG compression algorithm, the JPEG2000 compression algorithm and the DIVX compression algorithm, while the lossless compression method is preferably chosen from the group that comprises predictive coding compression algorithms, Huffman coding compression algorithms, Ziv-Lempel coding compression algorithms, numeric coding compression algorithms, and universal coding compression algorithms.
The intended aim and objects are achieved also with the aid of the corresponding decompression method and device, which operate dually. Brief description of the Drawings Further characteristics and advantages of the present invention will become better apparent from the following detailed description of a preferred but not exclusive embodiment of the method for processing a digital file, illustrated by way of non-limiting example in the accompanying drawings, wherein: Figure 1 is a flowchart of the compression method according to the present invention;
Figure 2 is a flowchart of the corresponding decompression method;
Figure 3 is a block diagram of an example of format for storing the data obtained by means of the inventive method according to the present invention;
Figure 4 is a block diagram, illustrating in depth the compression method according to the present invention according to a preferred embodiment.
Ways of carrying out the Invention With reference to Figure 1, a lossless data compression method according to the present invention is now described in detail.
The compression routine begins in step 110, when the file A to be compressed is read.
Regardless of its size, the file comprises a string of bits that identifies a positive integer of any size. In step 120, the string of bits A is compressed by means of a known lossy data compression algorithm, for example the JPEG compression algorithm or the JPEG2000 algorithm.
Before performing compression, in step 115, it is possible to set the parameters of the chosen compression algorithm: for example, when using the JPEG algorithm, the quantization coefficient, which varies from 0 to 99 and affects the degree of loss of information with respect to the original image.
The result of the lossy compression step 120 is a string of bits B that is smaller than the initial string A.
In step 130, the decompression algorithm that is the dual of the compression algorithm used in step 120 is applied to the string B. The result is a string of bits A', whose dimensions are typically equal to those of the initial string A. However, the data contained in the string A do not coincide with the data contained in A', since as a consequence of the lossy compression performed in step 120 part of the information has been eliminated and therefore it is not possible to reconstruct exactly the initial string A.
In step 140, an arithmetic difference is computed between the initial string A and the string A' as reconstructed (following lossy compression and decompression of the initial string A), producing a remainder string C.
The difference C=A-A is advantageously calculated by performing a subtraction by groups of eight bits in modulo 28=256, or by groups of sixteen bits in modulo 216, or by groups of thirty-two bits in modulo 232, or by groups of sixty-four bits in modulo 264 and so forth, depending on the capabilities of the microprocessor, optionally providing a few filler bytes for the strings A and A' in order to achieve a size that is a multiple of the chosen modulus.
In step 150, the remainder string C is compressed with a lossless compression algorithm, producing a string D. Before performing the lossless compression of the string C, in step 145 it is also possible to set the parameters of the chosen compression algorithm.
The two resulting strings B and D are then saved in step 160. If the two strings are concatenated, as is convenient, in a single file or sequence of bits ACOMP, sufficient information for correct reconstruction of the original file, for example information related to the size of one of the two strings B or D, is introduced in said final file, so as to be able to distinguish said two previously obtained data groups.
A non-limiting example of storage structure is shown in Figure 3, which shows the header 310 of the file and the data portion 320, which comprises the string B 321 and the string D 322.
In the illustrated case, the header 310 is constituted by five bytes, the first of which comprises three fields 311, 312 and 313 of two, three and three bits respectively. The first two bits, designated by the reference numeral 311, are used to identify the type of data being processed. For example, "00" can designate a generic or unclassified file, "01" can designate an image file, "10" can designate a sound file, and "11" can designate a text file.
The next three bits of the header, designated by the reference numeral 312, are used to identify the lossy compression algorithm that is used: for example, "001" can identify the JPEG algorithm, "010" can identify the JPEG2000 algorithm, and so forth.
The next three bits, designated by the reference numeral 313, can be used to identify the lossless compression algorithm; for example, "001" can identify a Ziv-Lempel algorithm, "010" can identify a predictive coding algorithm, for example the PMD algorithm, "011" can designate a Huffman coding algorithm, and so forth.
Naturally, the number of bits dedicated to this information can be changed at will for each of the fields if it is necessary to extend the range of alternatives. The next four bytes, designated by the reference numeral 314, identify the size of the string B, designated by the reference numeral 321, which is obtained by lossy compression.
The starting point of the data of the string D 322 and the size of the string, obtained by lossless compression of the remainder string C=A-A, can be obtained by subtraction by knowing the size of the final compressed file ACOMP, the size of the header 310 and the size of the string B 321. Naturally, if one expects to work on data with a maximum size of less than 216 or less than 232, it becomes obvious to provide for a smaller number of bytes (two or three, respectively) to store the size of the remainder string.
Figure 2 illustrates the process for decompressing the compressed file ACOMP and the reconstruction of the original file A, which occurs dually with respect to the compression process described above.
Again with reference to the non-limiting example structure of Figure 3, in step 210 the header 310 of the compressed file ACOMP is read so as to identify the type of file (generic, image, sound, text) that has been compressed and which algorithm has been used for lossy compression of the source file A and lossless compression of the remainder string, as well as the size of the string B. In step 220, the data block 321 related to the string B is decompressed by applying the dual of the algorithm used during compression in step 120, accordingly obtaining the string A.
In parallel, either virtually or really in the case of a multiprocessor device, the starting point of the string D 322 is calculated by adding to the size of the header 310 the size of the string B 321 : the remaining portion of the file ACOMP is in practice the string B 322.
In step 230, the string D is decompressed by using the dual of the lossless compression algorithm applied in step 150, so as to regenerate the remainder string C. In step 250, the resulting strings A' and C thus obtained are added dually with respect to what was performed in step 140. The result of this sum is identically equal to the source string A.
As an integration of the process described above, it is possible to provide an iterative algorithm that seeks the best quantization coefficient to be applied if the JPEG algorithm is used, and it is likewise possible to perform the same operation, by iterating on the possible configuration parameters, both when using a different algorithm for the compression step with data loss and in the configuration of the parameters of the algorithm used for the lossless compression step. A preferred embodiment, in this case also shown merely by way of non-limiting example, is shown in Figure 4.
In step 410, the string of bits A, i.e., the source file, is loaded into the memory of a computer and the values of the variables are initialized: DBEST designates the best compression obtained up to then on the source file during the iterative process and is initially set to the value DO, which indicates the original size of the string A to be compressed and is therefore a value that indicates a nil degree of compression; IBEST indicates the value of the coefficient that, when applied in the lossy compression step 415, generated the best final value DBEST; FINE is a flag that indicates whether the. iterative process has ended or not and is initially set to the value "FALSE".
In step 415, the quantization coefficient of the JPEG lossy compression algorithm is set to an initial value that is at one end of the possible range, for example the value "99". This value is assigned to the index I, which then assumes gradually all the possible values within the chosen range (from "99" to "0" in the example of Figure 4).
In step 415, the string A is compressed into B. The subsequent steps exactly reproduce the corresponding steps described with reference to Figure 1. In particular, in step 420 the string B is decompressed by using the algorithm that is the dual of the compression algorithm used in step 415, accordingly obtaining the string A; in step 425, subtraction is performed between the initial string A and the reconstructed string A', obtaining the remainder string C; in step 430, a lossless compression of the remainder string C is performed, obtaining a string D, until the step 435 is reached. If the FINE flag is set to the value TRUE, the strings B and D are stored in the file ACOMP together with the information required for their correct identification; otherwise, the iteration proceeds as described hereafter.
Step 445 checks whether the size DIM of the file compressed in the last cycle is smaller or not than the best final size DBEST obtained up to then. If it is, the value of DBEST is modified and set equal to DIM and the index IBEST is set equal to the current value of I. In both cases, the algorithm moves on to the block 455, where the index I, used as a coefficient for the lossless compression algorithm, is decremented by one unit and the compression process is reiterated starting from the block 415. If the test performed in step 460 indicates that the value of I is lower than the lower end of the iteration interval ("0" in the example), the FINE flag is set to TRUE in step 465 and the value of I is reset to IBEST, which retains the index that allowed to achieve the best result. The compression process is then performed one last time by using as a parameter for the lossy compression algorithm the stored IBEST coefficient, and the conditional control of the block 435 ends the cycle by storing the final string ACOMP-
The person skilled in the art can easily understand that the process thus described can also be applied by iterating the compression process on portions of file having a predefined maximum size, for example on blocks that are N kilobytes long, and then concatenating the results. In this manner, compression and decompression can be more efficient, since if a multiprocessor is used it is possible to work in parallel on multiple file portions simultaneously. In this case it is trivial for the person skilled in the art to provide extra information bits in order to identify the starting position of a data block. This can be done for example by adding to the header of each block related to the compression of a portion of the original file the size of the string D obtained by lossless compression of the string C=A-A, in addition to the size of the string B obtained by lossy compression of the source string A. The embodiment described above can be implemented in many different manners and the present invention is not limited to a particular implementation.
Moreover, although the various methods described are conveniently implemented in a generic computer that is selectively activated or configured by means of a software executable, a person skilled in the art can also realize that these methods can be performed by means of hardware, in firmware, or in more specialized devices built to perform the steps required by the method. A preferred form of implementation is a dedicated device, for example a chip of an integrated circuit, which includes its own processor on a board, a system clock or memory. In this case, the compression and decompression routines are stored in firmware or the like.
The compression and/or decompression method can also be implemented with other known hardware, such as for example a digital signal processor (DSP), a field programmable gate array (FPGA), an application-specific integrated circuit chip (ASIC), a microcontroller with coded logic, or the like. The application of the invention in existing codecs is particularly simplified by the fact that the method according to the invention integrates and evolves conventional compression methods that have already been used in various fields, allowing new levels of compression without data, loss. In particular, one extremely advantageous use of the invention is in application to the field of digital photography. Most digital still cameras in fact use a lossy JPEG coding for storing the data that represent an image. The quality of the image, although being deemed acceptable, does not achieve the physical limit determined by the lens of the camera, and therefore in order to obtain optimum pictures it is currently necessary to disable, where possible, JPEG compression and store an uncompressed image in memory, with obvious inconveniences due to the impossibility to keep in memory a satisfactory number of photographs and to the increase in the time required to transfer the image on processing devices. By reprogramming current codecs with the illustrated algorithm according to the invention, it now becomes possible to maintain the total quality of the image without renouncing the advantages of compression.
The following table 1 lists values obtained by applying the algorithm of Figure 4 to a plurality of photographs, compared with the respective values obtained by applying the best currently known lossless compression algorithm, i.e., the JPEG 2000 algorithm used in lossless mode. In particular, the JASPER™ implementation was used for the tests.
The first column of the table indicates the initial size of the file to be compressed, expressed in bytes. The second column lists, as an absolute value and as a percentage, the results of the compression performed by using an implementation of the present invention. In particular, the JPEG compression algorithm was used for step 415, whereas a Huffman coding algorithm was used for step 430. The third column lists the same data obtained by using the JASPER implementation of the JPEG2000 algorithm. Finally, the last column lists the percentage improvement achieved by the implementation of the invention used here with respect to JASPER.
Figure imgf000014_0001
Figure imgf000015_0001
Figure imgf000016_0001
Table 1
It can be seen that on average the inventive method described above has allowed to compress without loss the original files to less than one third of the original (32.54%), improving the yield of JASPER by more than 30%.
It has thus been shown that the present invention achieves the intended aim and objects. In particular, it has been shown that the described method and system allow to achieve lossless data compression with higher compression ratios than the background art. It has also been shown that the invention is practical to provide, since its implementation is an inventive integration of known lossy and lossless compression algorithms. Clearly, numerous modifications are evident and can be promptly performed by the person skilled in the art without abandoning the scope of the protection of the present invention. For example, it is obvious for the person skilled in the art to replace the known compression algorithms used in some steps of the process with other known algorithms according to requirements, according to the technology made available commercially and according to the type of file to be processed, and it is also evident that the physical formats of the compressed file may be any according to requirements.
It is also evident that the inventive concept on which the present invention is based is independent of the actual implementation of the software modules, which can be provided in any language and on any hardware platform and also as firmware that can be applied to modern electronic devices.
Accordingly, the scope of the protection of the claims must not be limited by the illustrations or preferred embodiments shown in the description by way of example, but rather the claims must comprise all the patentable novelty characteristics that reside in the present invention, including all the characteristics that would be treated as equivalent by the person skilled in the art.
The disclosures in Italian Patent Application No. MO2003A000053 from which this application claims priority are incorporated herein by reference.

Claims

1. A method for processing a digital file comprising a string of bits A, comprising the steps of: a) compressing said string of bits A into a string of bits B by applying a lossy compression method; b) decompressing said string of bits B into a string of bits A' by applying the decompression algorithm that is the dual of said lossy compression method; c) subtracting the string A from the string A, obtaining a string of bits C; d) compressing the string of bits C into a string of bits D by applying a lossless compression method; e) storing the sequence of strings B, D.
2. The method according to claim 1, characterized in that said lossy compression method is chosen from the group that comprises:
~ the JPEG compression algorithm;
— the JPEG2000 compression algorithm.
3. The method according to claims 1 or 2, characterized in that said lossless compression method is selected from the group that comprises: — predictive coding compression algorithm;
~ Huffman coding compression algorithm;
~ Ziv-Lempel coding compression algorithm;
~ numeric coding compression algorithm;
~ universal coding compression algorithm.
4. A digital data coding device, comprising:
~ means for compressing a string of bits A into a string of bits B according to a lossy compression algorithm;
~ means for decompressing said string B of bits into a string of bits A' according to the algorithm that is the dual of said lossy compression algorithm; ~ means for calculating, by performing a subtraction between said string A and said string A', a string of bits C;
— means for compressing said string of bits C into a string of bits D according to a lossless compression method.
5. The device according to claim 4, characterized in that said lossy compression algorithm is chosen from the group that comprises:
— the JPEG compression algorithm;
~ the JPEG 2000 compression algorithm.
6. The device according to claims 4 or 5, characterized in that said lossless compression algorithm is chosen from the group that comprises:
~ predictive coding compression algorithm; ~ Huffman coding compression algorithm; ~ Ziv-Lempel coding compression algorithm; ~ numeric coding compression algorithm; ~ universal coding compression algorithm.
7. A sequence of bits, characterized in that it comprises:
~ information that identifies a lossy compression algorithm used to compress a string of bits A into a string of bits B;
~ information identifying a lossless compression algorithm used to compress losslessly a string of bits C into a string of bits D, said string C being obtained by subtracting from said string A a string
A that is obtained by decompressing said string B, applying an algorithm that is the dual of said lossy compression algorithm;
- said string B; ~ said string D;
« information that identifies the starting and end points of said strings B and D within said bit sequence.
8. The bit sequence according to claim 7, characterized in that said information that identifies the starting and end points of said strings B and D comprises at least one of the following data: a) the size of the string B; b) the size of the string D; c) the position of the first byte of the string B; d) the position of the first byte of the string D.
9. The bit sequence according to claim 8, characterized in that it comprises information that identifies the type of file that is compressed:
PCT/EP2004/001839 2003-03-04 2004-02-25 Apparatus and method for compression and decompression of digital data Ceased WO2004080080A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IT000053A ITMO20030053A1 (en) 2003-03-04 2003-03-04 DEVICE AND CODING AND DECODING METHOD FOR THE COMPRESSION AND DECOMPRESSION OF DIGITAL DATA.
ITMO2003A000053 2003-03-04

Publications (1)

Publication Number Publication Date
WO2004080080A1 true WO2004080080A1 (en) 2004-09-16

Family

ID=27677358

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2004/001839 Ceased WO2004080080A1 (en) 2003-03-04 2004-02-25 Apparatus and method for compression and decompression of digital data

Country Status (2)

Country Link
IT (1) ITMO20030053A1 (en)
WO (1) WO2004080080A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007084301A3 (en) * 2006-01-12 2008-12-04 Gary Demos Efficient bit-exact lossless image coding residual system
WO2011053178A1 (en) * 2009-10-27 2011-05-05 Intel Corporation Scalable compression using jpeg-ls
EP4203474A4 (en) * 2020-09-17 2024-01-24 Huawei Technologies Co., Ltd. Image processing method and apparatus, device, and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4903317A (en) * 1986-06-24 1990-02-20 Kabushiki Kaisha Toshiba Image processing apparatus
EP0971544A2 (en) * 1998-07-03 2000-01-12 Canon Kabushiki Kaisha An image coding method and apparatus for localised decoding at multiple resolutions
US6021224A (en) * 1997-03-28 2000-02-01 International Business Machines Corporation Multiresolution lossless/lossy compression and storage of data for efficient processing thereof
US6483875B1 (en) * 1997-06-19 2002-11-19 Sony Corporation Picture signal processing apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4903317A (en) * 1986-06-24 1990-02-20 Kabushiki Kaisha Toshiba Image processing apparatus
US6021224A (en) * 1997-03-28 2000-02-01 International Business Machines Corporation Multiresolution lossless/lossy compression and storage of data for efficient processing thereof
US6483875B1 (en) * 1997-06-19 2002-11-19 Sony Corporation Picture signal processing apparatus
EP0971544A2 (en) * 1998-07-03 2000-01-12 Canon Kabushiki Kaisha An image coding method and apparatus for localised decoding at multiple resolutions

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CARRETO-CASTRO M F ET AL: "Comparison of lossless compression techniques", CIRCUITS AND SYSTEMS, 1993., PROCEEDINGS OF THE 36TH MIDWEST SYMPOSIUM ON DETROIT, MI, USA 16-18 AUG. 1993, NEW YORK, NY, USA,IEEE, 16 August 1993 (1993-08-16), pages 1268 - 1270, XP010119958, ISBN: 0-7803-1760-2 *
HSU W H ET AL: "AUTOMATIC SYNTHESIS OF COMPRESSION TECHNIQUES FOR HETEROGENEOUS FILES", SOFTWARE PRACTICE & EXPERIENCE, JOHN WILEY & SONS LTD. CHICHESTER, GB, vol. 25, no. 10, 1 October 1995 (1995-10-01), pages 1097 - 1116, XP000655539, ISSN: 0038-0644 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007084301A3 (en) * 2006-01-12 2008-12-04 Gary Demos Efficient bit-exact lossless image coding residual system
US7769239B2 (en) 2006-01-12 2010-08-03 Gary Demos Efficient bit-exact lossless image coding residual system
WO2011053178A1 (en) * 2009-10-27 2011-05-05 Intel Corporation Scalable compression using jpeg-ls
US9105074B2 (en) 2009-10-27 2015-08-11 Intel Corporation Scalable compression using JPEG-LS
EP4203474A4 (en) * 2020-09-17 2024-01-24 Huawei Technologies Co., Ltd. Image processing method and apparatus, device, and computer readable storage medium

Also Published As

Publication number Publication date
ITMO20030053A0 (en) 2003-03-04
ITMO20030053A1 (en) 2004-09-05

Similar Documents

Publication Publication Date Title
US5818877A (en) Method for reducing storage requirements for grouped data values
JP6560393B2 (en) Context initialization in entropy coding
US5825830A (en) Method and apparatus for the compression of audio, video or other data
JP6025923B2 (en) System and method for compressing a stream of integer data
JP6526099B2 (en) Modified Coding for Transform-Skipped Blocks for CABAC in HEVC
JPH0746142A (en) Data compression system
US5874908A (en) Method and apparatus for encoding Lempel-Ziv 1 variants
KR20110069001A (en) A method of lossless compression of prefix-suffix codes, a method of decompressing a bit sequence representing integers or symbols encoded with compressed prefix-suffix codes, and a storage medium or signal carrying the compressed prefix-suffix codes.
JP2006126810A (en) Lossless adaptive golomb-rice encoding, and decoding of integer data using backward-adaptive rule
JP7477178B2 (en) Method and apparatus for image compression - Patents.com
KR102400514B1 (en) Method and device for digital data compression
KR100733949B1 (en) Lossless Adaptive Encoding of Finite Alphabet Data
JP2006093958A (en) Progressive JPEG decoding system
EP0635807B1 (en) Coding apparatus for image compression
WO2001057804A2 (en) Method and apparatus for compression and decompression of digital images
JP2006129467A (en) Lossless adaptive encoding/decoding of integer data
WO2004080080A1 (en) Apparatus and method for compression and decompression of digital data
WO2001005039A1 (en) Signal processing method and device
Shaikh et al. Huffman coding technique for image compression
US20060214822A1 (en) Digital data decompression implemented in a field programmable array device
EP0874465A1 (en) Error resilient variable length code
JP3866539B2 (en) Encoding method, decoding method, encoding device, decoding device, encoding program, decoding program, and program recording medium thereof
Jeromel et al. Comparison of entropy coders for lossless grayscale image compression
TW202418807A (en) Noniterative entropy coding
JPH0799453A (en) Variable length coding method and apparatus using less memory

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC (EPO FORM 1205A) DATED 16.12.2005

122 Ep: pct application non-entry in european phase