WO2002060067A2 - Procede de compression de donnees - Google Patents
Procede de compression de donnees Download PDFInfo
- Publication number
- WO2002060067A2 WO2002060067A2 PCT/GB2002/000307 GB0200307W WO02060067A2 WO 2002060067 A2 WO2002060067 A2 WO 2002060067A2 GB 0200307 W GB0200307 W GB 0200307W WO 02060067 A2 WO02060067 A2 WO 02060067A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- array
- pixel
- sub
- data
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/06—Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/04—Protocols for data compression, e.g. ROHC
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
- H04L69/161—Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
- H04L69/166—IP fragmentation; TCP segmentation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/22—Parsing or analysis of headers
Definitions
- THIS INVENTION RELATES to a method of data compression and more particularly to a method of compressing hypertext markup language (HTML) and graphics for transmission over a wireless network.
- HTML hypertext markup language
- Huffman compression or more correctly, encoding
- encoding provides a method of encoding data based on the frequency of appearance of that data.
- Each element to be encoded has a predetermined frequency or probability of appearance.
- a tree is constructed and the two elements with the least frequencies are joined by a new root which is assigned the sum of their frequencies. This process is repeated for all the elements until the tree is completed.
- a single code bit represents each of the probabilities of occurrence of an element. Starting from the top of the tree, each element is encoded as a series of code bits taken from the top of the tree down to the element to be encoded.
- one aspect of the present invention provides a method of compressing a page of HTML comprising the steps of compressing text content, tags and valve attributes independently of one another using Huffman encoding and separately encoding data attributes not using a Huffman encoding technique.
- the data attributes are each encoded as a bit pattern shorter in length than the data attribute it represents, unique to the HTML page.
- a look-up table is provided for each HTML page to link the content of a data attribute to a respective short number.
- the short numbers are 16 bits in length.
- a data attribute comprises a URL.
- Another aspect of the present invention provides a method of compressing a graphical image comprising an array of pixels divisible into sub-arrays, the method comprising the steps of: preparing a data set representing the image; and Huffman encoding the data set, wherein the data set comprises a first data layer representing one pixel in the array and a subsequent data layer representing the difference between the one pixel and another pixel from each sub-array.
- each sub-array is then treated as an array divisible into sub-arrays and the data set further comprises: a further data layer representing the difference between one pixel from each array and another pixel from each of its sub-arrays.
- the data set is expanded until each sub-array comprises a single pixel.
- the one pixel is located in the corner of each array or sub- array.
- the one pixel is located in the middle of each array or sub-array.
- the method of rendering an image compressed in accordance with the method above comprises the steps of: a) Huffman decoding the first data layer; b) rendering the one pixel across the image; c) Huffman decoding the subsequent data layer; d) identifying the pixels representing each sub-array of the image; e) rendering each identified pixel across its respective sub-array; and f) repeating steps c) to e) until each sub-array comprises a single pixel.
- Figure 1 is an image to be compressed according to a method embodying the present invention
- Figure 2 is a top layer of the image of Figure 1 sent by the method embodying the present invention
- Figure 3 is a second layer of the image of Figure 1 sent by the method embodying the present invention.
- Figure 4 is a fixed layer of Figure 1 sent by the method embodying the present invention.
- Hypertext markup language is a scripting language which tells a web browser how to display the content of a web page.
- the content for a page is marked up using tags which turn on and off formats for the content of the page. Attributes assign values to tags to define colours, font sizes and the like.
- the content itself may be simple text or a link to a universal resource locator (URL).
- URL universal resource locator
- HTML and other similar markup-languages
- the tags and the content making up the HTML are encoded using Huffman compression techniques but independently of one another.
- the attributes and values are bit- encoded.
- the ability to treat content (text), tags, attributes and values separately in the encoding of the page is possible in all cases. This is part of the definition of what is a tag, attribute or value within HTML page. It should be noted that there are only a limited number of HTML tags, a few of which are used very frequently and some of which are rarely used. The distribution of the tags is, therefore, non linear and very well suited to Huffman compression. Indeed, compression of HTML tags alone is around 80% (5:1).
- HTML attributes are typically alphanumeric text, and compresses well using Huffinan.
- the tags e.g. ⁇ table>, ⁇ p>, ⁇ body>
- the attributes are either optional flags, (e.g. "checked"), selectors (e.g. "top”, “middle”, “bottom”), small integers or URLs.
- the optional attributes that can follow a tag are a very small set for each tag.
- Text appearing in HTML has a normal Gaussian distribution and therefore is encoded at a compression of around 33% (1.5:1).
- HTML attributes appear immediately after a tag (and are divided into two classes). Attributes are readily identifiable and distinguishable from text because of their proximity to tags.
- a tag need not always be further defined by an attribute and could simply be ⁇ B> comprising an "opening" tag to indicate that the following text should be in bold.
- the "closing" tag ⁇ B> is identical to the opening tag except containing a / and is used to "turn off the format, in this case the emboldenment of text.
- Such defined tags compress by around 75% (4:1).
- Data attributes are not encoded, i.e. are omitted from the Huffman Tree but are instead each represented by a short number unique to the page they appear on without specifying their content.
- URL identifiers are limited in length to 16 bits (thereby constraining a single web page to include only 65,535 images or links). The reduction of the URLs mentioned on any one page to 16 bits results in approximately 75% compression. This is based on the fact that almost all URLs are eight characters or more and will need to be sent as two characters In other words, since URLs do not compress well as they contain limited repetition and are frequently long.
- This example of the present invention "compresses" URLs by not transmsitting them and substituting a 16-bit integer identifier which is maintained in a look-up table by a proxy which can be referred to by reference to the identifier if a URL link or image is required.
- the 16-bit integer is short in terms of being shorter than the original URL. It does not really matter whether it is an 8-bit, 16-bit or any other length (except that it changes the overall compression ratio), as long as the number is representative of a longer quantity held at the proxy server.
- the compression will be far greater resulting in an average compression ratio in the region of 75 to 80%.
- the URL "eight.htm” is nine characters long (72 bits) and would be transmitted as a 16-bit integer from the proxy look-up table. This example is 78% compressed (being 100- (100*16)/72).
- the ratio is frequently higher as many URLs are considerably longer than 8 or 9 characters, 50 or more characters is not uncommon, compressing the URL by 96%.
- This example of the present invention encodes HTML as: tags ( ⁇ 8 bits after compression) followed by; a tag specific bit pattern indicating which attributes are present; and ordered attributes as individual groups of bits.
- HTML fragment For example, the HTML fragment:
- tags White table
- attributes are: of three defined length of the "span" Span (number of tags attributes are attribute is defined until the ⁇ /font>) present. by a previous tag, Size (encoded as 4 bit and limited to the integer -7 to +7) smallest number of Colour (encoded as 8 bits that can bit RRRGGGBB) represent the largest span contained within the page.
- embodiments of the invention use a layered approach to the compression technique.
- the data representing an image is sent in a plurality of layers of increasing resolution.
- a single absolute colour value comprises the top layer of most coarse resolution, being one pixel derived from a "key" pixel in the image.
- the "key" pixel would be the top left hand corner pixel.
- This representation of the graphic image is still sent as a pixel in the first top layer of data.
- the second layer of data which is of better resolution than the original top layer comprises data indicating the differences between the present layer and the top previous layer. For many images, one pixel in the image is very likely to be the same as the next one to it.
- the system sends the differences from "A”, "C”, “I” or “K” respectively, as appropriate, as shown in Figure 4, but again with twice the resolution as in Figure 3.
- the final individual pixels are filled in.
- the number of layers required is 3, (l+log 2 4) and the actual sequence transmitted is top layer: A; second layer: A-C,A-I,A-K; and third layer: A-B,A-E,A-F 5 C-D,C-G,C-H,I-J,I-M,I-N,K-L,K-0,K-P
- a difference function "-" or an XOR function increases the non- linearity of the data distribution thereby enhancing the Huffman compression.
- the likelihood of the final differences e.g. A-B through to K-P
- being zero is high, which will Huffman encode to a very short bit pattern.
- the "key” or “start” pixel (from which the difference is initially calculated between layers) is the central pixel in a 3 x 3 grid.
- the 3 x 3 version delivers a marginally better compression ratio for large (e.g. >297 pixels square) images.
- a further advantage can be obtained when the method is used with systems having more basic colour depths, for example in an eight bit system rather than a twenty-four bit system. Compression ratios in the region of 66% are achievable for an eight bit system.
- the total reduction using the above method is to about 15% of an uncompressed true colour image.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2002226550A AU2002226550A1 (en) | 2001-01-26 | 2002-01-25 | A method of data compression |
| EP02716158A EP1360772A2 (fr) | 2001-01-26 | 2002-01-25 | Procede de compression de donnees |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0102113A GB0102113D0 (en) | 2001-01-26 | 2001-01-26 | Improvements in or relating to wireless communication systems |
| GB0102114A GB0102114D0 (en) | 2001-01-26 | 2001-01-26 | A method of data compression |
| GB0102114.6 | 2001-01-26 | ||
| GB0102113.8 | 2001-01-26 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2002060067A2 true WO2002060067A2 (fr) | 2002-08-01 |
| WO2002060067A3 WO2002060067A3 (fr) | 2003-09-18 |
Family
ID=26245622
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/GB2002/000322 Ceased WO2002060152A2 (fr) | 2001-01-26 | 2002-01-25 | Ameliorations se rapportant a des systemes de communication sans fil |
| PCT/GB2002/000307 Ceased WO2002060067A2 (fr) | 2001-01-26 | 2002-01-25 | Procede de compression de donnees |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/GB2002/000322 Ceased WO2002060152A2 (fr) | 2001-01-26 | 2002-01-25 | Ameliorations se rapportant a des systemes de communication sans fil |
Country Status (3)
| Country | Link |
|---|---|
| EP (2) | EP1360772A2 (fr) |
| AU (2) | AU2002228168A1 (fr) |
| WO (2) | WO2002060152A2 (fr) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2004110027A1 (fr) * | 2003-06-06 | 2004-12-16 | Computer Associates Think, Inc. | Systeme et procede de compression de parametres de demande d'url |
| WO2005067153A1 (fr) * | 2003-12-30 | 2005-07-21 | Koninklijke Philips Electronics N.V. | Format de compression de donnees de consultation rapide pour fichiers xml |
| EP1629618A4 (fr) * | 2003-06-04 | 2011-06-15 | Qualcomm Inc | Procede et appareil permettant de traduire des noms de ressources dans un environnement sans fil |
| US8321326B2 (en) | 2009-09-15 | 2012-11-27 | Auerbach Group Llc | Method and system for enhancing the efficiency of a digitally communicated data exchange |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100146112A1 (en) * | 2008-12-04 | 2010-06-10 | Real Dice Inc. | Efficient communication techniques |
| EP2825978B1 (fr) | 2012-03-13 | 2021-06-30 | Google LLC | Système et procédé pour produire une représentation binaire d'une page web |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1997034240A1 (fr) * | 1996-03-15 | 1997-09-18 | University Of Massachusetts | Arbre compact pour le stockage et l'extraction de documents structures hypermedia |
| US5815516A (en) * | 1996-04-05 | 1998-09-29 | International Business Machines Corporation | Method and apparatus for producing transmission control protocol checksums using internet protocol fragmentation |
| DE69942202D1 (de) * | 1998-05-29 | 2010-05-12 | Access Systems Americas Inc | Verfahren und gerät zum drahtlosen interzugriff |
| EP1106008A1 (fr) * | 1998-08-20 | 2001-06-13 | Nokia Corporation | Procede et appareil assurant un multiplexage utilisateur dans un protocole en temps reel - |
| GB9819136D0 (en) * | 1998-09-02 | 1998-10-28 | Lucent Tech Network Sys Gmbh | Mobile terminal and base station in a packet radio services network |
| FI19992746A7 (fi) * | 1998-12-28 | 2000-06-29 | Spyglass Inc | Menetelmä ja järjestelmä elektronisen datasisällön muuntamiseksi langattomille laitteille |
| US6594276B1 (en) * | 1999-04-01 | 2003-07-15 | Nokia Corporation | Apparatus and associated method for communicating multimedia information upon a communication link |
| CA2384687A1 (fr) * | 1999-09-10 | 2001-03-15 | General Instrument Corporation | Procede et appareil pour comprimer un contenu en langage script |
-
2002
- 2002-01-25 EP EP02716158A patent/EP1360772A2/fr not_active Withdrawn
- 2002-01-25 EP EP02710113A patent/EP1358751A2/fr not_active Withdrawn
- 2002-01-25 WO PCT/GB2002/000322 patent/WO2002060152A2/fr not_active Ceased
- 2002-01-25 AU AU2002228168A patent/AU2002228168A1/en not_active Abandoned
- 2002-01-25 WO PCT/GB2002/000307 patent/WO2002060067A2/fr not_active Ceased
- 2002-01-25 AU AU2002226550A patent/AU2002226550A1/en not_active Abandoned
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1629618A4 (fr) * | 2003-06-04 | 2011-06-15 | Qualcomm Inc | Procede et appareil permettant de traduire des noms de ressources dans un environnement sans fil |
| WO2004110027A1 (fr) * | 2003-06-06 | 2004-12-16 | Computer Associates Think, Inc. | Systeme et procede de compression de parametres de demande d'url |
| WO2005067153A1 (fr) * | 2003-12-30 | 2005-07-21 | Koninklijke Philips Electronics N.V. | Format de compression de donnees de consultation rapide pour fichiers xml |
| US8321326B2 (en) | 2009-09-15 | 2012-11-27 | Auerbach Group Llc | Method and system for enhancing the efficiency of a digitally communicated data exchange |
| US8538861B2 (en) | 2009-09-15 | 2013-09-17 | Auerbach Group Llc | Use of adaptive and/or customized compression to enhance the efficiency of digital financial data exchanges |
| US8756149B2 (en) | 2009-09-15 | 2014-06-17 | Auerbach Group Llc | Use of adaptive and/or customized compression to enhance the efficiency of digital data exchanges |
Also Published As
| Publication number | Publication date |
|---|---|
| EP1358751A2 (fr) | 2003-11-05 |
| WO2002060152A2 (fr) | 2002-08-01 |
| AU2002226550A1 (en) | 2002-08-06 |
| WO2002060152A3 (fr) | 2003-06-05 |
| AU2002228168A1 (en) | 2002-08-06 |
| EP1360772A2 (fr) | 2003-11-12 |
| WO2002060067A3 (fr) | 2003-09-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP0813167B1 (fr) | Méthode et dispositif pour la compression et décompression de famille de caractères | |
| US8700803B2 (en) | Web page optimization | |
| US5341440A (en) | Method and apparatus for increasing information compressibility | |
| US20030167334A1 (en) | Provision of content to a client device | |
| JP2000501593A (ja) | 加速されたテキスト文字及び線画形成を備えた画像グラフィックスのダウンロード | |
| WO2002060067A2 (fr) | Procede de compression de donnees | |
| CN108573069B (zh) | 一种加速压缩流量正则表达式匹配的Twins方法 | |
| CN102379087B (zh) | 压缩方法、解压缩方法、压缩单元、解压缩单元以及压缩文档 | |
| US8488894B2 (en) | Method and system for dot-matrix font data compression and decompression | |
| US20020138518A1 (en) | Method and system for code processing of document data | |
| Kim et al. | Image compression using chain coding for electronic shelf labels (ESL) systems | |
| KR100327924B1 (ko) | 표의 문자 압축 | |
| Qin et al. | Reversible data embedding for vector quantization compressed images using search‐order coding and index parity matching | |
| JP3768959B2 (ja) | ファイル形式の互換性を持たせるための方法 | |
| US7613319B2 (en) | Electronic watermark generating apparatus, method, and program for generating electronic watermark | |
| JP4122759B2 (ja) | 文書データの符号処理方法及びシステム | |
| Rauschenbach | Compression of palettized images with progressive coding of the color information | |
| DE60103379T2 (de) | Darstellung des allgemeinen technischen gebiets und des stands der technik | |
| US6433708B1 (en) | Method of encoding binary data | |
| Meinel et al. | Multimedia data and its encoding | |
| JP2000250718A (ja) | 画像圧縮伸長装置 | |
| CN117744096A (zh) | 一种图像处理方法、装置、设备及可读存储介质 | |
| Yulianti | Application of hybrid fractal image compression method for aerial photographs | |
| Meyyappan et al. | Lossless Digital Image Compression Method For Bitmap Images | |
| KR20050112089A (ko) | 초경량 브라우저 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| WWE | Wipo information: entry into national phase |
Ref document number: 2002716158 Country of ref document: EP |
|
| WWP | Wipo information: published in national office |
Ref document number: 2002716158 Country of ref document: EP |
|
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| WWW | Wipo information: withdrawn in national office |
Ref document number: 2002716158 Country of ref document: EP |
|
| NENP | Non-entry into the national phase |
Ref country code: JP |
|
| WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |