[go: up one dir, main page]

WO2002060067A2 - Procede de compression de donnees - Google Patents

Procede de compression de donnees Download PDF

Info

Publication number
WO2002060067A2
WO2002060067A2 PCT/GB2002/000307 GB0200307W WO02060067A2 WO 2002060067 A2 WO2002060067 A2 WO 2002060067A2 GB 0200307 W GB0200307 W GB 0200307W WO 02060067 A2 WO02060067 A2 WO 02060067A2
Authority
WO
WIPO (PCT)
Prior art keywords
array
pixel
sub
data
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2002/000307
Other languages
English (en)
Other versions
WO2002060067A3 (fr
Inventor
Matthew Woolf
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
POGO MOBILE SOLUTIONS Ltd
Original Assignee
POGO MOBILE SOLUTIONS Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0102113A external-priority patent/GB0102113D0/en
Priority claimed from GB0102114A external-priority patent/GB0102114D0/en
Application filed by POGO MOBILE SOLUTIONS Ltd filed Critical POGO MOBILE SOLUTIONS Ltd
Priority to AU2002226550A priority Critical patent/AU2002226550A1/en
Priority to EP02716158A priority patent/EP1360772A2/fr
Publication of WO2002060067A2 publication Critical patent/WO2002060067A2/fr
Anticipated expiration legal-status Critical
Publication of WO2002060067A3 publication Critical patent/WO2002060067A3/fr
Ceased legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/06Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04Protocols for data compression, e.g. ROHC
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/161Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/166IP fragmentation; TCP segmentation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22Parsing or analysis of headers

Definitions

  • THIS INVENTION RELATES to a method of data compression and more particularly to a method of compressing hypertext markup language (HTML) and graphics for transmission over a wireless network.
  • HTML hypertext markup language
  • Huffman compression or more correctly, encoding
  • encoding provides a method of encoding data based on the frequency of appearance of that data.
  • Each element to be encoded has a predetermined frequency or probability of appearance.
  • a tree is constructed and the two elements with the least frequencies are joined by a new root which is assigned the sum of their frequencies. This process is repeated for all the elements until the tree is completed.
  • a single code bit represents each of the probabilities of occurrence of an element. Starting from the top of the tree, each element is encoded as a series of code bits taken from the top of the tree down to the element to be encoded.
  • one aspect of the present invention provides a method of compressing a page of HTML comprising the steps of compressing text content, tags and valve attributes independently of one another using Huffman encoding and separately encoding data attributes not using a Huffman encoding technique.
  • the data attributes are each encoded as a bit pattern shorter in length than the data attribute it represents, unique to the HTML page.
  • a look-up table is provided for each HTML page to link the content of a data attribute to a respective short number.
  • the short numbers are 16 bits in length.
  • a data attribute comprises a URL.
  • Another aspect of the present invention provides a method of compressing a graphical image comprising an array of pixels divisible into sub-arrays, the method comprising the steps of: preparing a data set representing the image; and Huffman encoding the data set, wherein the data set comprises a first data layer representing one pixel in the array and a subsequent data layer representing the difference between the one pixel and another pixel from each sub-array.
  • each sub-array is then treated as an array divisible into sub-arrays and the data set further comprises: a further data layer representing the difference between one pixel from each array and another pixel from each of its sub-arrays.
  • the data set is expanded until each sub-array comprises a single pixel.
  • the one pixel is located in the corner of each array or sub- array.
  • the one pixel is located in the middle of each array or sub-array.
  • the method of rendering an image compressed in accordance with the method above comprises the steps of: a) Huffman decoding the first data layer; b) rendering the one pixel across the image; c) Huffman decoding the subsequent data layer; d) identifying the pixels representing each sub-array of the image; e) rendering each identified pixel across its respective sub-array; and f) repeating steps c) to e) until each sub-array comprises a single pixel.
  • Figure 1 is an image to be compressed according to a method embodying the present invention
  • Figure 2 is a top layer of the image of Figure 1 sent by the method embodying the present invention
  • Figure 3 is a second layer of the image of Figure 1 sent by the method embodying the present invention.
  • Figure 4 is a fixed layer of Figure 1 sent by the method embodying the present invention.
  • Hypertext markup language is a scripting language which tells a web browser how to display the content of a web page.
  • the content for a page is marked up using tags which turn on and off formats for the content of the page. Attributes assign values to tags to define colours, font sizes and the like.
  • the content itself may be simple text or a link to a universal resource locator (URL).
  • URL universal resource locator
  • HTML and other similar markup-languages
  • the tags and the content making up the HTML are encoded using Huffman compression techniques but independently of one another.
  • the attributes and values are bit- encoded.
  • the ability to treat content (text), tags, attributes and values separately in the encoding of the page is possible in all cases. This is part of the definition of what is a tag, attribute or value within HTML page. It should be noted that there are only a limited number of HTML tags, a few of which are used very frequently and some of which are rarely used. The distribution of the tags is, therefore, non linear and very well suited to Huffman compression. Indeed, compression of HTML tags alone is around 80% (5:1).
  • HTML attributes are typically alphanumeric text, and compresses well using Huffinan.
  • the tags e.g. ⁇ table>, ⁇ p>, ⁇ body>
  • the attributes are either optional flags, (e.g. "checked"), selectors (e.g. "top”, “middle”, “bottom”), small integers or URLs.
  • the optional attributes that can follow a tag are a very small set for each tag.
  • Text appearing in HTML has a normal Gaussian distribution and therefore is encoded at a compression of around 33% (1.5:1).
  • HTML attributes appear immediately after a tag (and are divided into two classes). Attributes are readily identifiable and distinguishable from text because of their proximity to tags.
  • a tag need not always be further defined by an attribute and could simply be ⁇ B> comprising an "opening" tag to indicate that the following text should be in bold.
  • the "closing" tag ⁇ B> is identical to the opening tag except containing a / and is used to "turn off the format, in this case the emboldenment of text.
  • Such defined tags compress by around 75% (4:1).
  • Data attributes are not encoded, i.e. are omitted from the Huffman Tree but are instead each represented by a short number unique to the page they appear on without specifying their content.
  • URL identifiers are limited in length to 16 bits (thereby constraining a single web page to include only 65,535 images or links). The reduction of the URLs mentioned on any one page to 16 bits results in approximately 75% compression. This is based on the fact that almost all URLs are eight characters or more and will need to be sent as two characters In other words, since URLs do not compress well as they contain limited repetition and are frequently long.
  • This example of the present invention "compresses" URLs by not transmsitting them and substituting a 16-bit integer identifier which is maintained in a look-up table by a proxy which can be referred to by reference to the identifier if a URL link or image is required.
  • the 16-bit integer is short in terms of being shorter than the original URL. It does not really matter whether it is an 8-bit, 16-bit or any other length (except that it changes the overall compression ratio), as long as the number is representative of a longer quantity held at the proxy server.
  • the compression will be far greater resulting in an average compression ratio in the region of 75 to 80%.
  • the URL "eight.htm” is nine characters long (72 bits) and would be transmitted as a 16-bit integer from the proxy look-up table. This example is 78% compressed (being 100- (100*16)/72).
  • the ratio is frequently higher as many URLs are considerably longer than 8 or 9 characters, 50 or more characters is not uncommon, compressing the URL by 96%.
  • This example of the present invention encodes HTML as: tags ( ⁇ 8 bits after compression) followed by; a tag specific bit pattern indicating which attributes are present; and ordered attributes as individual groups of bits.
  • HTML fragment For example, the HTML fragment:
  • tags White table
  • attributes are: of three defined length of the "span" Span (number of tags attributes are attribute is defined until the ⁇ /font>) present. by a previous tag, Size (encoded as 4 bit and limited to the integer -7 to +7) smallest number of Colour (encoded as 8 bits that can bit RRRGGGBB) represent the largest span contained within the page.
  • embodiments of the invention use a layered approach to the compression technique.
  • the data representing an image is sent in a plurality of layers of increasing resolution.
  • a single absolute colour value comprises the top layer of most coarse resolution, being one pixel derived from a "key" pixel in the image.
  • the "key" pixel would be the top left hand corner pixel.
  • This representation of the graphic image is still sent as a pixel in the first top layer of data.
  • the second layer of data which is of better resolution than the original top layer comprises data indicating the differences between the present layer and the top previous layer. For many images, one pixel in the image is very likely to be the same as the next one to it.
  • the system sends the differences from "A”, "C”, “I” or “K” respectively, as appropriate, as shown in Figure 4, but again with twice the resolution as in Figure 3.
  • the final individual pixels are filled in.
  • the number of layers required is 3, (l+log 2 4) and the actual sequence transmitted is top layer: A; second layer: A-C,A-I,A-K; and third layer: A-B,A-E,A-F 5 C-D,C-G,C-H,I-J,I-M,I-N,K-L,K-0,K-P
  • a difference function "-" or an XOR function increases the non- linearity of the data distribution thereby enhancing the Huffman compression.
  • the likelihood of the final differences e.g. A-B through to K-P
  • being zero is high, which will Huffman encode to a very short bit pattern.
  • the "key” or “start” pixel (from which the difference is initially calculated between layers) is the central pixel in a 3 x 3 grid.
  • the 3 x 3 version delivers a marginally better compression ratio for large (e.g. >297 pixels square) images.
  • a further advantage can be obtained when the method is used with systems having more basic colour depths, for example in an eight bit system rather than a twenty-four bit system. Compression ratios in the region of 66% are achievable for an eight bit system.
  • the total reduction using the above method is to about 15% of an uncompressed true colour image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

L'invention concerne un procédé de compression d'une page HTML consistant: à analyser le contenu, les étiquettes, les attributs de valeur et les attributs de données d'une page HTML, à comprimer le contenu, les étiquettes et les attributs de valeur de façon indépendante au moyen du codage de Huffman, et à coder séparément les attributs de données sans utiliser une technique de codage de Huffman.
PCT/GB2002/000307 2001-01-26 2002-01-25 Procede de compression de donnees Ceased WO2002060067A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2002226550A AU2002226550A1 (en) 2001-01-26 2002-01-25 A method of data compression
EP02716158A EP1360772A2 (fr) 2001-01-26 2002-01-25 Procede de compression de donnees

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0102113A GB0102113D0 (en) 2001-01-26 2001-01-26 Improvements in or relating to wireless communication systems
GB0102114A GB0102114D0 (en) 2001-01-26 2001-01-26 A method of data compression
GB0102114.6 2001-01-26
GB0102113.8 2001-01-26

Publications (2)

Publication Number Publication Date
WO2002060067A2 true WO2002060067A2 (fr) 2002-08-01
WO2002060067A3 WO2002060067A3 (fr) 2003-09-18

Family

ID=26245622

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/GB2002/000322 Ceased WO2002060152A2 (fr) 2001-01-26 2002-01-25 Ameliorations se rapportant a des systemes de communication sans fil
PCT/GB2002/000307 Ceased WO2002060067A2 (fr) 2001-01-26 2002-01-25 Procede de compression de donnees

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/GB2002/000322 Ceased WO2002060152A2 (fr) 2001-01-26 2002-01-25 Ameliorations se rapportant a des systemes de communication sans fil

Country Status (3)

Country Link
EP (2) EP1360772A2 (fr)
AU (2) AU2002228168A1 (fr)
WO (2) WO2002060152A2 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004110027A1 (fr) * 2003-06-06 2004-12-16 Computer Associates Think, Inc. Systeme et procede de compression de parametres de demande d'url
WO2005067153A1 (fr) * 2003-12-30 2005-07-21 Koninklijke Philips Electronics N.V. Format de compression de donnees de consultation rapide pour fichiers xml
EP1629618A4 (fr) * 2003-06-04 2011-06-15 Qualcomm Inc Procede et appareil permettant de traduire des noms de ressources dans un environnement sans fil
US8321326B2 (en) 2009-09-15 2012-11-27 Auerbach Group Llc Method and system for enhancing the efficiency of a digitally communicated data exchange

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100146112A1 (en) * 2008-12-04 2010-06-10 Real Dice Inc. Efficient communication techniques
EP2825978B1 (fr) 2012-03-13 2021-06-30 Google LLC Système et procédé pour produire une représentation binaire d'une page web

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997034240A1 (fr) * 1996-03-15 1997-09-18 University Of Massachusetts Arbre compact pour le stockage et l'extraction de documents structures hypermedia
US5815516A (en) * 1996-04-05 1998-09-29 International Business Machines Corporation Method and apparatus for producing transmission control protocol checksums using internet protocol fragmentation
DE69942202D1 (de) * 1998-05-29 2010-05-12 Access Systems Americas Inc Verfahren und gerät zum drahtlosen interzugriff
EP1106008A1 (fr) * 1998-08-20 2001-06-13 Nokia Corporation Procede et appareil assurant un multiplexage utilisateur dans un protocole en temps reel -
GB9819136D0 (en) * 1998-09-02 1998-10-28 Lucent Tech Network Sys Gmbh Mobile terminal and base station in a packet radio services network
FI19992746A7 (fi) * 1998-12-28 2000-06-29 Spyglass Inc Menetelmä ja järjestelmä elektronisen datasisällön muuntamiseksi langattomille laitteille
US6594276B1 (en) * 1999-04-01 2003-07-15 Nokia Corporation Apparatus and associated method for communicating multimedia information upon a communication link
CA2384687A1 (fr) * 1999-09-10 2001-03-15 General Instrument Corporation Procede et appareil pour comprimer un contenu en langage script

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1629618A4 (fr) * 2003-06-04 2011-06-15 Qualcomm Inc Procede et appareil permettant de traduire des noms de ressources dans un environnement sans fil
WO2004110027A1 (fr) * 2003-06-06 2004-12-16 Computer Associates Think, Inc. Systeme et procede de compression de parametres de demande d'url
WO2005067153A1 (fr) * 2003-12-30 2005-07-21 Koninklijke Philips Electronics N.V. Format de compression de donnees de consultation rapide pour fichiers xml
US8321326B2 (en) 2009-09-15 2012-11-27 Auerbach Group Llc Method and system for enhancing the efficiency of a digitally communicated data exchange
US8538861B2 (en) 2009-09-15 2013-09-17 Auerbach Group Llc Use of adaptive and/or customized compression to enhance the efficiency of digital financial data exchanges
US8756149B2 (en) 2009-09-15 2014-06-17 Auerbach Group Llc Use of adaptive and/or customized compression to enhance the efficiency of digital data exchanges

Also Published As

Publication number Publication date
EP1358751A2 (fr) 2003-11-05
WO2002060152A2 (fr) 2002-08-01
AU2002226550A1 (en) 2002-08-06
WO2002060152A3 (fr) 2003-06-05
AU2002228168A1 (en) 2002-08-06
EP1360772A2 (fr) 2003-11-12
WO2002060067A3 (fr) 2003-09-18

Similar Documents

Publication Publication Date Title
EP0813167B1 (fr) Méthode et dispositif pour la compression et décompression de famille de caractères
US8700803B2 (en) Web page optimization
US5341440A (en) Method and apparatus for increasing information compressibility
US20030167334A1 (en) Provision of content to a client device
JP2000501593A (ja) 加速されたテキスト文字及び線画形成を備えた画像グラフィックスのダウンロード
WO2002060067A2 (fr) Procede de compression de donnees
CN108573069B (zh) 一种加速压缩流量正则表达式匹配的Twins方法
CN102379087B (zh) 压缩方法、解压缩方法、压缩单元、解压缩单元以及压缩文档
US8488894B2 (en) Method and system for dot-matrix font data compression and decompression
US20020138518A1 (en) Method and system for code processing of document data
Kim et al. Image compression using chain coding for electronic shelf labels (ESL) systems
KR100327924B1 (ko) 표의 문자 압축
Qin et al. Reversible data embedding for vector quantization compressed images using search‐order coding and index parity matching
JP3768959B2 (ja) ファイル形式の互換性を持たせるための方法
US7613319B2 (en) Electronic watermark generating apparatus, method, and program for generating electronic watermark
JP4122759B2 (ja) 文書データの符号処理方法及びシステム
Rauschenbach Compression of palettized images with progressive coding of the color information
DE60103379T2 (de) Darstellung des allgemeinen technischen gebiets und des stands der technik
US6433708B1 (en) Method of encoding binary data
Meinel et al. Multimedia data and its encoding
JP2000250718A (ja) 画像圧縮伸長装置
CN117744096A (zh) 一种图像处理方法、装置、设备及可读存储介质
Yulianti Application of hybrid fractal image compression method for aerial photographs
Meyyappan et al. Lossless Digital Image Compression Method For Bitmap Images
KR20050112089A (ko) 초경량 브라우저

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2002716158

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2002716158

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 2002716158

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP