RU2431918C1

RU2431918C1 - Method of compressing information

Info

Publication number: RU2431918C1
Application number: RU2010111201/09A
Authority: RU
Inventors: Пётр Петрович Кувырков (RU); Пётр Петрович Кувырков; Елена Юрьевна Доронькина (RU); Елена Юрьевна Доронькина; Денис Михайлович Мошкин (RU); Денис Михайлович Мошкин
Priority date: 2010-03-23
Filing date: 2010-03-23
Publication date: 2011-10-20

Abstract

FIELD: information technology. ^ SUBSTANCE: encoded alphabet symbols of messages are put into two units of a matrix; the symbols are encoded using a code representation of index numbers of rows and columns of the matrix at whose intersection lie symbols of the message. If the message includes groups of several series-arranged symbols lying in one row or one column of the matrix, said group of symbols forms a common code consisting of a common row code and column codes or a common column code and row codes. Common code elements of the group of symbols lie in series with separation of row code from the column codes or column code from row codes based on any of the following features: change in polarity, amplitude, frequency and phase of electrical signals. ^ EFFECT: high efficiency of lossless information compression. ^ 4 dwg

Description

Предлагаемое изобретение относится к области информационных технологий, информационной техники, телемеханики и др.The present invention relates to the field of information technology, information technology, telemechanics, etc.

Известны следующие способы сжатия информации: дельта-кодирование [1], алгоритм Лемпеля-Зива-Велча [2], кодирование методом Хаффмана [1], арифметическое кодирование [3] и др. При дельта-кодировании используются разницы между последовательными данными вместо самих данных. Это помогает в случае дальнейшей компрессии этих данных, в которых часто встречаются повторяющиеся значения, и не очень полезно в случае, когда используется само по себе. Алгоритм Лемпеля-Зива-Велча при сжатии динамически создает таблицу преобразования строк. Для каждого конкретного случая она индивидуальна. При кодировании методом Хаффмана и арифметическом кодировании необходимо проводить анализ текста с целью определения частот появления символов в данном тексте с последующим определением вероятности их появления. Во-первых, это создает дополнительные трудности. Во-вторых, в разных текстах одни и те же символы будут закодированы по-разному. В-третьих, закодированный текст передается вместе с кодовой таблицей данных символов, что увеличивает объем передаваемой информации.The following methods of information compression are known: delta coding [1], the Lempel-Ziv-Welch algorithm [2], Huffman coding [1], arithmetic coding [3], etc. In delta coding, differences between serial data are used instead of the data itself . This helps in the case of further compression of this data, in which duplicate values are often found, and is not very useful in the case when it is used by itself. The Lempel-Ziv-Welch algorithm during compression dynamically creates a string conversion table. For each case, it is individual. When coding by the Huffman method and arithmetic coding, it is necessary to analyze the text in order to determine the frequency of occurrence of characters in this text with the subsequent determination of the probability of their occurrence. Firstly, this creates additional difficulties. Secondly, in different texts the same characters will be encoded differently. Thirdly, the encoded text is transmitted along with the code table of these symbols, which increases the amount of information transmitted.

Ни один из перечисленных способов сжатия информации не обладает универсальностью, непосредственностью и возможностью сжатия не только по числу двоичных элементов кода, но и числа символов букв в сообщении.None of the listed methods of information compression has universality, spontaneity and the ability to compress not only by the number of binary code elements, but also by the number of characters of letters in the message.

Технический результат предполагаемого изобретения заключается в обеспечении повышения эффективности, универсальной сущности сжатия информации без потери.The technical result of the proposed invention is to provide increased efficiency, the universal essence of information compression without loss.

Поставленная цель достигается тем, что кодируемые символы (элементы алфавита сообщений) размещают в узлах матрицы. Для кодирования символов используют кодовое представление порядковых номеров строк и столбцов матрицы, на пересечении которых расположены символы сообщения. Пример расположения кодируемых символов представлен на фиг.1 и 2.This goal is achieved by the fact that the encoded characters (elements of the message alphabet) are placed in the nodes of the matrix. To encode characters, use the code representation of the row and column numbers of the matrix, at the intersection of which are the message characters. An example of the location of the encoded characters is presented in figures 1 and 2.

Способ осуществляется следующим образом. Кодируемые символы размещают в узлах матрицы размерностью m×n (фиг.1). Максимальное число кодируемых символов N=m·n. Порядковые номера строк и столбцов матрицы представляют в двоичном виде: 000, 001, 010… и т.д. Совокупность кода строки и кода столбца, на пересечении которых расположен символ, есть кодовое представление данного символа. Например, код символа z₂₃ будет представлять собой комбинацию 010011. При кодировании какого-либо сообщения каждый элемент данного сообщения представляют соответствующим кодом. Например, кодовая комбинация символов сообщения z₂₁z₂₀z₂₃ будет иметь вид 010001010000010011. При наличии в сообщении группы нескольких последовательно расположенных символов, находящихся на одной строке (одном столбце) матрицы, данная группа символов образует общий код, состоящий из кода общей строки и кодов столбцов (из кода общего столбца и кодов строк). Причем элементы общего кода группы символов располагают последовательно с выделением кода строки от кодов столбцов (кода столбца от кодов строк) по тому или иному признаку: изменение полярности, амплитуды, частоты, фазы электрических сигналов. Например, кодовая комбинация символов сообщения z₂₁z₂₀z₂₃ будет иметь вид 010001010000010011 в промежуточном варианте и окончательно 010001000011. Это позволяет более компактно сжать необходимую информацию.The method is as follows. The encoded characters are placed in the nodes of the matrix dimension m × n (figure 1). The maximum number of encoded characters is N = m · n. The serial numbers of the rows and columns of the matrix are represented in binary form: 000, 001, 010 ... etc. The combination of row code and column code, at the intersection of which the symbol is located, is the code representation of this symbol. For example, the character code z ₂₃ will be a combination of 010 011. When encoding a message, each element of this message is represented by a corresponding code. For example, the code combination of characters of the message z ₂₁ z ₂₀ z ₂₃ will have the form 010 001 010 000 010 011. If there are several consecutive characters in the message on the same row (one column) of the matrix, this group of characters forms a common code consisting of from the common row code and column codes (from the common column code and row codes). Moreover, the elements of a common code of a group of symbols are arranged sequentially with the allocation of a row code from column codes (a column code from row codes) according to one or another sign: a change in polarity, amplitude, frequency, phase of electrical signals. For example, the code combination of the message symbols z ₂₁ z ₂₀ z ₂₃ will look like 010 001 010 000 010 011 in the intermediate version and finally 010 001000011. This allows for more compact compression of the necessary information.

Пример 1. Разместим в узлах матрицы размерностью 4×4 буквы русского алфавита (фиг.2). Далее закодируем предлагаемым способом слово «зима». Получим промежуточный результат 1001100001010100 и окончательно 100100010100. В данном случае кодовая комбинация сократилась с 16 до 12 знаков.Example 1. We place in the nodes of the matrix dimension 4 × 4 letters of the Russian alphabet (figure 2). Next, we encode the word "winter" in the proposed way. We get the intermediate result 10 01 10 00 01 01 01 00 and finally 10 0100 01 0100. In this case, the code combination was reduced from 16 to 12 characters.

Составленную матрицу можно использовать для кодирования любой информации, содержащей символы, расположенные в узлах данной матрицы. Матрицу можно расширить до необходимой размерности m×n, чтобы поместить в ее узлы все символы, которые используются для передачи соответствующей информации.The compiled matrix can be used to encode any information containing characters located in the nodes of this matrix. The matrix can be expanded to the required dimension m × n to place in its nodes all the symbols that are used to transmit the corresponding information.

Таким образом предлагаемый способ обеспечивает универсальное и непосредственное сжатие информации без потерь с возможностью сжатия не только по числу двоичных элементов кода, но и числа символов букв в сообщении.Thus, the proposed method provides universal and direct compression of lossless information with the ability to compress not only the number of binary code elements, but also the number of characters of letters in the message.

Возможна следующая техническая реализация данного способа сжатия информации (фиг.3). Через каждый узел матрицы проходит 3 разновидности шин: общая шина сообщений и координатные шины - шины x и шины y. В каждом узле матрицы расположен регистр кода (память) сообщения и элемент «И», на один вход которого подается код сжимаемого сообщения S и на другой вход подается код сообщения, хранимый в регистре данного узла (фиг.4). При совпадении этих двух кодов на выходе элемента «И» формируется единица, которая поступает на координатные шины x и y, и осуществляет через элементы «И» включение триггеров и регистра данной шины x и регистра данной шины y. С одного выхода триггера поступает сигнал на второй вход элемента «И» данного триггера, а с другого входа снимается код адреса данной шины. При поступлении следующего сообщения, код которого находится в регистре узла, связанного с этой же шиной, вызывающего появление единицы, переключение триггера уже не происходит. При поступлении нескольких единиц последовательно на данный триггер с выхода мы снимаем не совокупность кодов адреса триггера, а всего лишь один код (эффект сжатия). При поступлении нового сообщения, код которого отличается от предыдущего и единицы будут расположены в других шинах, происходит включение других триггеров и сброс ранее включенных триггеров. На выходе регистра шины сообщения происходит запись кодовой комбинации S' из кодов адресов x и кодов адресов y. На приемной стороне принятая сжатая комбинация вновь воспроизводится в исходную.The following technical implementation of this method of information compression is possible (figure 3). Through each node of the matrix 3 types of buses pass: a common message bus and coordinate buses - x buses and y buses. At each node of the matrix, there is a message code register (memory) and an “AND” element, on one input of which a compressible message code S is supplied and a message code stored in the register of this node is supplied to the other input (Fig. 4). If these two codes coincide, the unit is formed at the output of the "And" element, which is fed to the coordinate buses x and y, and through the "And" elements it turns on the triggers and register of this bus x and the register of this bus y. From one trigger output, a signal is sent to the second input of the “And” element of this trigger, and the address code of this bus is removed from the other input. When the next message arrives, the code of which is in the register of the node associated with the same bus, causing the appearance of a unit, trigger switching does not occur anymore. When several units arrive sequentially on a given trigger from the output, we remove not a set of trigger address codes, but only one code (compression effect). When a new message arrives, the code of which is different from the previous one and the units will be located on other buses, other triggers are turned on and the triggers previously turned on are reset. At the output of the message bus register, a code combination S 'is written from address codes x and address codes y. On the receiving side, the received compressed combination is again reproduced in the original.

Источники информацииInformation sources

1. Методы сжатия данных. Устройство архиваторов, сжатие изображений и видео. Ватолин Д., Ратушняк А., Смирнов М., Юкин В. - М.: ДИАЛОГ-МИФИ, 2003. - 384 с.1. Data compression methods. The device archivers, image and video compression. Vatolin D., Ratushnyak A., Smirnov M., Yukin V. - M .: DIALOG-MEPhI, 2003 .-- 384 p.

2. Основы кодирования. Вернер М. - М.: Техносфера, 2004. - 288 с.2. The basics of coding. Werner, M. - M .: Technosphere, 2004 .-- 288 p.

3. Теория информации. Кодирование дискретных вероятностных источников. Учебное пособие. В.Н.Потапов. Новосибирск. 1999. - 71 с.3. Theory of information. Coding of discrete probabilistic sources. Tutorial. V.N. Potapov. Novosibirsk 1999 .-- 71 p.

Claims

A method of compressing information containing the alphabet of the message and a code representation of its elements, characterized in that if the message contains a group of several consecutive characters located on the same line or on the same column of the matrix, this group of characters forms a common code consisting of a common line code and column codes or from a common column code and row codes, the elements of a common character group code being arranged sequentially with the allocation of the row code from the column codes or the column code from the row codes have a particular feature: the change of polarity, amplitude, frequency, phase, electrical signals.