CN101751656B - A watermark embedding and extraction method and device - Google Patents
- Publication number
- CN101751656B, CN2008102404837A, CN200810240483A
- Authority
- CN
- China
- Prior art keywords
- punctuate
- available
- zone
- embedded
- punctuation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Editing Of Facsimile Originals (AREA)
Abstract
The invention discloses a watermark embedding and extraction method and device for use in the copyright management of digital content. The embedding method comprises: determining, according to a set area determination rule, the available punctuation areas in a text document into which information is to be embedded; determining, according to a set position determination rule, the original position of the punctuation mark in each available punctuation area; and, for each available punctuation area, adjusting the position of its punctuation mark according to that original position and the corresponding code to be embedded, thereby embedding the code in the area. When the watermark is extracted, the corresponding rules are used to extract the embedded code from each available punctuation area. The method is simple to operate, the embedded watermark information is well concealed and highly stable, and a good visual result is obtained.
Description
Technical Field
The invention relates to the field of digital rights management, and in particular to a punctuation-based text digital watermark embedding and extraction method and device for digital text.
Background
As the worldwide digitization of information accelerates, text materials are appearing in large quantities. Personal files, medical records, academic certificates, patent certificates, handwritten signatures, library holdings, confidential documents and the like are all common forms of text, and their importance is self-evident. In addition, as e-commerce and e-government become increasingly widespread, it is increasingly urgent to trace piracy of electronic publications distributed online and to verify the authenticity and integrity of electronic letters, official documents and faxes. Embedding watermarks (that is, additional information) in text materials to protect property rights and maintain information security is therefore particularly important.
The text materials mentioned above have two common carriers: paper and electronic documents. There are likewise two ways to embed a watermark: explicitly and implicitly. An explicit watermark is plainly visible to the human eye, for example a semi-transparent image watermark, an organization name printed on the background, or an added barcode. An implicit watermark is hard for the human eye to perceive, for example the various invisible image watermarks, and can be recognized only with specific instruments and corresponding software. Among the combinations of these cases, implicitly embedding a watermark in a paper document is relatively difficult, and embedding in a black-and-white binary monochrome paper document (hereinafter "binary paper") is the most difficult. Information is most commonly embedded in paper documents explicitly, for example by adding a specific background image to the paper, by writing specific text (sample documents that a bank shows its customers carry the word "sample" in some areas), or by using special paper as the carrier. Implicitly embedding information in binary paper, such as ordinary "black text on white paper", is very difficult.
Some techniques for embedding digital watermarks in binary paper do exist in the prior art.
For example, the technical paper "Brassil J, Low S, Maxemchuk N F. Copyright Protection for the Electronic Distribution of Text Documents. Proceedings of the IEEE, 1999, 87(7): 1181-1196" discloses embedding information by modifying line spacing and word spacing to adjust the document's layout. The principle is simple and easy to implement. Its main problem is that the conflict between the stability of the embedded information and the visual result is hard to resolve: if the change in line or word spacing is too small, the information is hard to extract accurately; if extraction accuracy is to be guaranteed, a larger change is needed, which readers notice easily, so the "implicit" effect is lost.
The patent application "Method for Encrypting and Authenticating Electronic Official Documents or Documents" (publication number CN1588351) discloses slightly deforming ordinary Chinese characters in a way that the human eye does not readily notice but that can be recognized manually or by OCR, thereby embedding information implicitly. The method embeds a relatively large amount of information and is highly stable, but it involves font libraries, OCR, print output, a specific electronic text format (for printing), data training and other technologies; its procedure is cumbersome, its workload heavy, and its production cost very high.
The patent application "Text Digital Watermarking Technology Based on Character Topology" (publication number CN1684115) discloses changing the topology of characters by changing whether the strokes that make up a character (or string) are joined or broken, so that the number of connected components formed by the strokes of the variant character changes, thereby embedding information. The main drawbacks are, first, high production cost and, second, unsatisfactory robustness at detection: slight soiling of the paper can join strokes that were separate, and photocopying easily breaks strokes that were joined, so the connected components change and detection based on their number becomes very unstable.
The patent application "A Digital Watermark Embedding and Extraction Method and Device" (publication number CN1945622A) discloses dividing each character of the text into regions and flipping pixels within each region so that the number of black pixels in the region changes, thereby embedding information. Producing the watermark this way is costly, and, because of visual constraints, pixels can be flipped only at the edges of character strokes, which in effect changes stroke thickness, so the visual result is also unsatisfactory.
Thus the prior-art methods for encrypting documents (embedding watermarks) suffer from a large volume of embedded information, a heavy workload, cumbersome production and high production cost; their stability is also relatively poor and is hard to preserve through copying; and the changes are easily noticed by readers, so both concealment and visual quality are poor.
Summary of the Invention
Embodiments of the invention provide a watermark embedding and extraction method and device that address the poor stability and poor concealment of prior-art techniques for embedding watermark information in text documents.
A watermark embedding method, comprising:
determining, according to a set area determination rule, the available punctuation areas in a text document into which information is to be embedded;
determining, according to a set position determination rule, the original position of the punctuation mark in each of the available punctuation areas;
for each available punctuation area, adjusting the position of its punctuation mark according to the original position of that punctuation mark and the corresponding code to be embedded, thereby embedding the code to be embedded in the corresponding available punctuation area.
A watermark extraction method, comprising:
determining, according to the same area determination rule used when the watermark information was embedded, the available punctuation areas in a text document in which information has been embedded;
determining, according to the same position determination rule used when the watermark information was embedded, the position of the punctuation mark in each of the available punctuation areas;
determining the embedded code in each available punctuation area from the position of its punctuation mark.
A watermark embedding device, comprising:
an area determination module, configured to determine, according to a set area determination rule, the available punctuation areas in a text document into which information is to be embedded;
a position determination module, configured to determine, according to a set position determination rule, the original position of the punctuation mark in each of the available punctuation areas;
an information embedding module, configured to adjust, for each available punctuation area, the position of its punctuation mark according to the original position of that punctuation mark and the corresponding code to be embedded, thereby embedding the code to be embedded in the corresponding available punctuation area.
A watermark extraction device, comprising:
an area determination module, configured to determine, according to the same area determination rule used when the watermark information was embedded, the available punctuation areas in a text document in which information has been embedded;
a position determination module, configured to determine, according to the same position determination rule used when the watermark information was embedded, the position of the punctuation mark in each of the available punctuation areas;
a code extraction module, configured to determine the embedded code in each available punctuation area from the position of its punctuation mark.
The watermark embedding and extraction method and device provided by the embodiments of the invention select available punctuation areas and adjust the position of the punctuation mark in each of them, thereby embedding the code to be embedded in the corresponding available punctuation area. When the watermark is extracted, the embedded code in each available punctuation area is extracted from the adjusted punctuation position using the corresponding rules. The method is simple to operate. Because the human eye is far less sensitive to changes in the position of punctuation than to changes in the position of characters, relatively large adjustments can be made, so the embedded watermark information is highly stable and well concealed while a good visual result is preserved.
Brief Description of the Drawings
Fig. 1 is a flowchart of the watermark embedding method in an embodiment of the invention;
Fig. 2 is an example of the available punctuation areas determined in a document fragment in an embodiment of the invention;
Fig. 3 is a schematic diagram of dividing the determined available punctuation areas into frequency bands in an embodiment of the invention;
Fig. 4 is an example of a text fragment after information has been embedded in its available punctuation areas in an embodiment of the invention;
Fig. 5 is a flowchart of the watermark extraction method in an embodiment of the invention;
Fig. 6 is a schematic structural diagram of the watermark embedding device in an embodiment of the invention;
Fig. 7 is a schematic structural diagram of the watermark extraction device in an embodiment of the invention.
Detailed Description
In the watermark embedding and extraction method provided by the embodiments of the invention, available punctuation areas are selected, according to a set area selection rule, in the text document in which watermark information is to be embedded; the position of the punctuation mark in each available punctuation area is determined; and the watermark information is then embedded by adjusting the position of the punctuation mark in each area. At extraction, the same rules are used to determine the available punctuation areas in the watermarked text document and the position of the punctuation mark in each of them, and the embedded watermark information is obtained from those positions.
The watermark embedding method provided by the embodiments of the invention embeds watermark information by adjusting the positions of punctuation marks in the text document. Its flowchart is shown in Fig. 1, and the steps are as follows:
S101: According to the set area selection rule, find and determine the available punctuation areas in the text document into which information is to be embedded.
First, OCR is used to perform layout recognition and analysis on the text document in which information is to be embedded; non-text features such as borders, table lines, images and decorative borders are removed to obtain the pure text area.
Character segmentation and punctuation analysis are then performed on the pure text area to find all available punctuation marks and determine the available punctuation areas.
An available punctuation mark is one that has at least one other character both before and after it; that is, its position satisfies the relationship "other character, punctuation mark, other character". Punctuation marks that do not satisfy this condition are discarded. For example, a punctuation mark at the end of a text line has no other character after it, so it does not qualify and is discarded. Here "other characters" means all symbols other than punctuation, including Chinese characters, digits and letters.
The start boundary and end boundary are defined from the available punctuation mark and the two other characters adjacent to it, which gives the available punctuation area. The start boundary may be the left boundary, right boundary, center of gravity or center of the preceding character, among others; the end boundary may be the left boundary, right boundary, center of gravity or center of the following character, among others.
For example, in the text document shown in Fig. 2, seven available punctuation marks are identified, including the "," between "示" and "是", the "。" between "图" and "首", and the "," between "先" and "文". The start boundary of each available punctuation area is taken as the left boundary of the preceding character and the end boundary as the right boundary of the following character, which gives the seven available punctuation areas shown in Fig. 2, such as "示,是", "图。首" and "先,文".
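The "other character, punctuation mark, other character" test can be sketched on already-recognized text. This is a minimal illustration only: the patent works on OCR output from a page image, whereas the sketch assumes each text line is available as a string. The function names `is_punct` and `available_punctuation`, the use of Unicode category `P*` to define punctuation, and the exclusion of whitespace neighbours are all assumptions made for the example.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    """True if ch is a punctuation mark (Unicode general category P*)."""
    return unicodedata.category(ch).startswith("P")


def available_punctuation(line: str) -> list[int]:
    """Indices of punctuation marks in one text line that have a
    non-punctuation, non-space character immediately before and after them.

    A mark at the start or end of the line never qualifies, because one
    of its neighbours is missing.
    """
    hits = []
    for i in range(1, len(line) - 1):
        prev_ch, ch, next_ch = line[i - 1], line[i], line[i + 1]
        neighbours_ok = all(
            not is_punct(c) and not c.isspace() for c in (prev_ch, next_ch)
        )
        if is_punct(ch) and neighbours_ok:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical line; the final full stop is discarded (nothing follows it).
    print(available_punctuation("示,是图。首先,文。"))
```

In a real pipeline each index would then be mapped back to the bounding boxes of the three glyphs so that the area's start and end boundaries can be measured in pixels.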
S102: According to the set position determination rule, determine the original position of the punctuation mark in each available punctuation area. Specifically:
(1) Divide each available punctuation area into several frequency bands.
According to the set boundary rule, calculate the length of each available punctuation area, that is, the distance from its start boundary to its end boundary. Using that distance, divide each available punctuation area into several equal parts, each of which is one frequency band. If the area is divided into k parts, for example, the frequency band indexes from the first band to the last are 0, 1, 2, ..., k-1.
Continuing the example above, each of the seven available punctuation areas determined in Fig. 2 is divided into frequency bands, for example 16 bands per area, as shown in Fig. 3.
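The equal division into bands is simple arithmetic, sketched below. The function name `band_edges` and the pixel coordinates in the usage line are hypothetical; only the equal split into k parts, with k = 16 as one example, comes from the text.

```python
def band_edges(start: float, end: float, k: int = 16) -> list[float]:
    """Split the area [start, end] into k equal bands.

    Returns the k + 1 edge coordinates; band i (index 0 .. k-1) covers
    the interval from edges[i] to edges[i + 1].
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if end <= start:
        raise ValueError("end boundary must lie after start boundary")
    width = (end - start) / k
    return [start + i * width for i in range(k + 1)]


if __name__ == "__main__":
    # Hypothetical area spanning pixel columns 0..160, split into 16 bands.
    edges = band_edges(0, 160)
    print(edges[6], edges[7])  # edges of the band with index 6
```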
(2) From the coordinate position of the punctuation mark in each available punctuation area, determine the frequency band in which the punctuation mark lies and the corresponding frequency band index.
The position of a punctuation mark may be represented by a position parameter such as its center of gravity, center, left boundary or right boundary. The frequency band in which the punctuation mark lies, and the corresponding frequency band index, are determined from the location of that parameter.
When the frequency band is determined from the punctuation mark's center of gravity, the coordinate of the center of gravity must be calculated first. The formula for the center-of-gravity coordinate is:
$$x_C = \frac{1}{S}\sum_i x_i\,\Delta S_i, \qquad S = \sum_i \Delta S_i$$
where $\Delta S_i$ is the number of black pixels in the punctuation mark whose horizontal coordinate is $x_i$;
$x_i$ is any horizontal coordinate value;
$S$ is the total number of black pixels in the punctuation mark;
$x_C$ is the center-of-gravity coordinate of the punctuation mark.
The frequency band in which the punctuation mark lies is then obtained from the band containing the calculated coordinate. That is, using the coordinate range of each frequency band, determine which band the center-of-gravity coordinate $x_C$ falls into and the corresponding frequency band index.
For example, for the first area in Fig. 3, "示,是", once the horizontal center-of-gravity coordinate of the comma has been calculated, the comma is found to lie in the frequency band with index 6.
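The two calculations in this step can be sketched as follows, assuming the center of gravity is the pixel-count-weighted mean of the column coordinates, which is what the symbol definitions for ΔS_i, x_i, S and x_C imply. The function names, the dictionary representation of a glyph (column coordinate to black-pixel count) and all numbers are hypothetical.

```python
def horizontal_centroid(column_counts: dict[int, int]) -> float:
    """x_C = sum(x_i * dS_i) / S.

    column_counts maps a column coordinate x_i to dS_i, the number of
    black pixels of the punctuation mark in that column.
    """
    total = sum(column_counts.values())  # S
    if total == 0:
        raise ValueError("glyph has no black pixels")
    return sum(x * n for x, n in column_counts.items()) / total


def band_index(x: float, start: float, end: float, k: int = 16) -> int:
    """Index (0 .. k-1) of the equal-width band of [start, end] containing x."""
    if not start <= x <= end:
        raise ValueError("coordinate lies outside the punctuation area")
    idx = int((x - start) * k / (end - start))
    return min(idx, k - 1)  # x == end belongs to the last band


if __name__ == "__main__":
    # Hypothetical comma occupying columns 64..66 of an area spanning 0..160.
    comma = {64: 2, 65: 4, 66: 2}
    x_c = horizontal_centroid(comma)
    print(x_c, band_index(x_c, 0, 160))
```

For vertically set text the same functions apply with row coordinates in place of column coordinates.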
Note that the horizontal center of gravity is calculated for horizontally set text, whereas the vertical center of gravity must be calculated for vertically set text; the formula for the vertical center of gravity follows by analogy from the horizontal one.
Corresponding formulas also exist for calculations based on the center; they are not listed here.
It should be noted that many punctuation marks have a tail; a comma, for example, ends in a fairly sharp point. When paper is scanned into an image with a scanner or similar device, this sharp tail tends to vanish or break off, so the tip is lost. If the center or the left or right boundary is then used to represent the position of the punctuation mark, errors result. If the center of gravity is used instead, the loss or addition of a few pixels changes the position of the whole mark's center of gravity very little, and in practice the center-of-gravity coordinate is essentially unchanged after rounding. The center of gravity is therefore the best choice for representing the position of a punctuation mark and gives better stability.
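A small numerical experiment illustrates the stability argument. The pixel counts below are invented for the example (a comma whose one-pixel tail tip sits in column 9) and the helper names are hypothetical; the sketch only compares how far three position measures move when that tip is lost in scanning.

```python
def centroid(cols: dict[int, int]) -> float:
    """Pixel-count-weighted mean of the column coordinates."""
    return sum(x * n for x, n in cols.items()) / sum(cols.values())


def bbox_center(cols: dict[int, int]) -> float:
    """Midpoint between the leftmost and rightmost occupied columns."""
    return (min(cols) + max(cols)) / 2


# Hypothetical comma: column -> number of black pixels; column 9 is the tail tip.
comma = {9: 1, 10: 2, 11: 3, 12: 3}
damaged = {x: n for x, n in comma.items() if x != 9}  # tip lost in scanning

centroid_shift = abs(centroid(comma) - centroid(damaged))
center_shift = abs(bbox_center(comma) - bbox_center(damaged))
left_edge_shift = min(damaged) - min(comma)

if __name__ == "__main__":
    print(f"centroid moved    {centroid_shift:.3f} px")
    print(f"bbox center moved {center_shift:.3f} px")
    print(f"left edge moved   {left_edge_shift} px")
```

With these numbers the centroid moves by less than a quarter of a pixel and rounds to the same column before and after the damage, while the bounding-box center and the left edge move by half a pixel and a full pixel respectively.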
S103: According to the set information embedding rule, determine the code to be embedded in each available punctuation area. The information embedding rule may be set and chosen freely. Specifically:
First, determine the binary number to be embedded from the information to be embedded in the text document; the number of digits of this binary number must be less than or equal to the number of available punctuation areas.
If the information to be embedded is itself a binary number whose number of digits is less than or equal to the number of available punctuation areas, it is used directly as the binary number to be embedded.
If the information to be embedded is itself a binary number but has more digits than there are available punctuation areas, a binary number whose number of digits is less than or equal to the number of available punctuation areas is chosen as the binary number to be embedded, a correspondence between the chosen binary number and the information to be embedded is established, and the information to be embedded and its correspondence with the chosen binary number are saved, for example in a database.
If the information to be embedded is not itself a binary number but can be converted to one by base conversion, and the resulting binary number has no more digits than there are available punctuation areas, the converted binary number is used as the binary number to be embedded.
If the information to be embedded is not itself a binary number but can be converted to one by base conversion, and the resulting binary number has more digits than there are available punctuation areas, a binary number whose number of digits is less than or equal to the number of available punctuation areas is chosen as the binary number to be embedded, a correspondence between the chosen binary number and the information to be embedded is established, and the information to be embedded and its correspondence with the chosen binary number are recorded, for example in a database.
If the information to be embedded is not a binary number and cannot be converted to one, a binary number whose number of digits is less than or equal to the number of available punctuation areas is chosen as the binary number to be embedded, a correspondence between the chosen binary number and the information to be embedded is established, and the information to be embedded and its correspondence with the chosen binary number are recorded, for example in a database.
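The five cases above reduce to two outcomes: either the information fits as a binary number and is embedded directly, or a substitute binary number is chosen and the mapping is recorded. The sketch below shows one way to express that. The function name `choose_payload_bits`, the use of a plain dictionary in place of the database, and the choice of a random full-width substitute are assumptions for illustration; the text only requires that the substitute have no more digits than there are available punctuation areas.

```python
import secrets


def choose_payload_bits(info: object, capacity: int, registry: dict[str, object]) -> str:
    """Return the bit string to embed, at most `capacity` bits long.

    A non-negative integer whose binary form fits is embedded directly.
    Anything else gets a random, unused `capacity`-bit string, and the
    mapping bit string -> info is stored in `registry` (a stand-in for
    the database mentioned in the text).
    """
    if capacity <= 0:
        raise ValueError("no available punctuation areas")
    if isinstance(info, int) and not isinstance(info, bool) and info >= 0:
        bits = format(info, "b")
        if len(bits) <= capacity:
            return bits
    if len(registry) >= 2 ** capacity:
        raise RuntimeError("no unused bit strings left for this capacity")
    while True:
        bits = format(secrets.randbits(capacity), f"0{capacity}b")
        if bits not in registry:
            registry[bits] = info
            return bits


if __name__ == "__main__":
    db: dict[str, object] = {}
    print(choose_payload_bits(123, 7, db))  # fits in 7 bits, embedded directly
    print(choose_payload_bits(345, 7, db))  # needs 9 bits, so a substitute is recorded
    print(db)
```

At extraction time, a recovered bit string that appears in the registry is looked up to obtain the original information.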
Then, from the binary number to be embedded and the set information embedding rule, determine the code to be embedded in each available punctuation area. Specifically, the code for each available punctuation area is determined from the number of digits of the binary number to be embedded and the number of available punctuation areas, following the set information embedding rule.
If the number of digits of the binary number to be embedded equals the number of available punctuation areas, the digits are allocated directly: each binary digit of the number is assigned to one available punctuation area as that area's code to be embedded.
If the binary number to be embedded has fewer digits than there are available punctuation areas, a redundancy algorithm assigns each available punctuation area one of the binary digits of the number as its code to be embedded. Let the number of available punctuation areas be M and the number of digits of the binary number be N, with M > N. Compute M/N to obtain a quotient and a remainder. The available punctuation areas corresponding to the remainder are discarded, and codes are allocated to the remaining areas. If the quotient is 3, for example, the first to third available punctuation areas are assigned the first binary digit of the number, the fourth to sixth areas are assigned the second binary digit, and so on.
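The redundancy allocation can be sketched as a repetition scheme. The function name `allocate_bits` is hypothetical, and the sketch assumes that the areas "corresponding to the remainder" are the trailing ones, which the text does not state explicitly.

```python
def allocate_bits(bits: str, num_areas: int) -> list[str]:
    """Assign one binary digit to each usable punctuation area.

    With N = len(bits) digits and M = num_areas areas (M >= N), every
    digit is repeated M // N times in consecutive areas. The trailing
    M % N areas receive no code and are simply not used (an assumption).
    """
    n = len(bits)
    if n == 0 or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a non-empty string of 0s and 1s")
    if num_areas < n:
        raise ValueError("more digits than available punctuation areas")
    repeat = num_areas // n
    return [bit for bit in bits for _ in range(repeat)]


if __name__ == "__main__":
    # 3 digits over 10 areas: quotient 3, remainder 1, so 9 areas are used.
    print(allocate_bits("101", 10))
    # Equal counts degenerate to direct one-to-one allocation.
    print(allocate_bits("1111011", 7))
```

Repeating each digit lets an extractor take a majority vote over the group, which is one common reason to add this kind of redundancy; the text itself does not specify how the repeated digits are combined at extraction.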
沿用上边的例子,如果将123这个数作为待嵌入的信息嵌入文本文档,则将123转化为二进制数:1111011,由于该二进制数的位数为7,而确定出的可用标点区域也为7个,因此直接将1111011分配给图2中所示的7个可用标点区域,其中区域“示,是”分配该二进制数的第一位二进制编码1、区域“图。首”分配该二进制数的第二位二进制编码1、……、等等。Following the above example, if the number 123 is embedded in a text document as the information to be embedded, then convert 123 into a binary number: 1111011. Since the binary number has 7 digits, the determined available punctuation area is also 7 , so directly assign 1111011 to the 7 available punctuation areas shown in Figure 2, wherein the area "shown, yes" assigns the first binary code 1 of the binary number, and the area "Figure. The first" assigns the first bit of the binary number Two-bit binary code 1, ..., and so on.
For example, if the number 345 is the information to be embedded, 345 is converted to the binary number 101011001. This binary number has 9 bits, but only 7 available punctuation areas were determined, so a binary number whose bit count is less than or equal to the number of available punctuation areas is chosen at random. Here either a 7-bit binary number or one with fewer than 7 bits may be chosen. A correspondence between the chosen binary number and the information 345 is established, and the information 345 is stored together with its correspondence to the chosen binary number.
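The substitution step for oversized payloads can be sketched as follows. This is only an illustration: the name `choose_surrogate`, the dictionary used as the stored correspondence table, and the rule of retrying until an unused binary number is found are all assumptions not stated in the text.

```python
import random


def choose_surrogate(info, num_areas, table, rng=random):
    """Return the binary string to embed for the integer `info`.

    If the binary form of `info` fits in `num_areas` bits it is used as is.
    Otherwise a random binary string of at most `num_areas` bits is chosen
    and the correspondence surrogate -> info is recorded in `table`.
    """
    bits = format(info, "b")
    if len(bits) <= num_areas:
        return bits
    while True:
        # Assumption: retry until an unused surrogate is found. A table that
        # already holds every possible surrogate is not handled here.
        width = rng.randint(1, num_areas)
        surrogate = format(rng.getrandbits(width), "0{}b".format(width))
        if surrogate not in table:
            table[surrogate] = info
            return surrogate


table = {}
code = choose_surrogate(345, 7, table)   # 345 needs 9 bits, only 7 areas
print(code, table[code])
```

At extraction time the same table would be consulted to map the recovered binary number back to 345.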
In particular, when the code to be embedded for each available punctuation area is already known or has been set at random, step S103 may be omitted, and step S104 is entered directly after step S102.
S104: Adjust the position of the punctuation mark in each available punctuation area according to its original position and the corresponding code to be embedded. Adjusting the position of the punctuation mark embeds the code assigned to each available punctuation area into that area.
The code to be embedded for each available punctuation area is 1 or 0. The position of the punctuation mark is adjusted according to the parity of the code to be embedded and the parity of the band index of the frequency band in which the punctuation mark lies (index parity for short), so that the change in position carries the embedded information. Specifically:
(i) If the band index corresponding to the original position of the punctuation mark in an available punctuation area is odd and the code to be embedded for that area is 0, the punctuation mark is moved to a frequency band whose band index is even, that is, it is moved so that its band index becomes even.
(ii) If the band index corresponding to the original position of the punctuation mark in an available punctuation area is odd and the code to be embedded for that area is 1, the frequency band of the punctuation mark is not changed, that is, the parity of its band index stays the same. To make the embedded information more stable, the position of the punctuation mark is still adjusted as circumstances require; see the description below.
(iii) If the band index corresponding to the original position of the punctuation mark in an available punctuation area is even and the code to be embedded for that area is 0, the frequency band of the punctuation mark is not changed, that is, the parity of its band index stays the same. To make the embedded information more stable, the position of the punctuation mark is still adjusted as circumstances require; see the description below.
(iv) If the band index corresponding to the original position of the punctuation mark in an available punctuation area is even and the code to be embedded for that area is 1, the punctuation mark is moved to a frequency band whose band index is odd, that is, it is moved so that its band index becomes odd.
Once every available punctuation area that requires a code has been processed in this way, embedding of the watermark information in the text document is complete.
Continuing the example above, when the information to be embedded is the number 123, the code to be embedded for the first available punctuation area "示,是" is 1, and the band index of the punctuation mark in that area, 6, is even. Under rule (iv), the punctuation mark must be moved to a frequency band with an odd band index, so it is moved to the adjacent band with index 5 or 7; in this embodiment it is moved to the band with index 5. The punctuation marks in the other 6 areas are moved according to the same rules, and the resulting positions of the punctuation marks in each available punctuation area are shown in Figure 4.
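Rules (i) to (iv) reduce to a single comparison between the parity of the band index and the code to be embedded. The sketch below shows this; the function name `target_band_index`, the `prefer_lower` switch, and the assumption that band indices run from 0 to `num_bands - 1` are illustrative choices, not details from the text.

```python
def target_band_index(index, bit, num_bands, prefer_lower=True):
    """Return the band index the punctuation mark should end up in.

    index     -- band index of the mark's original position
    bit       -- code to be embedded (0 -> even index, 1 -> odd index)
    num_bands -- number of bands in the area (indices assumed 0..num_bands-1)
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    if index % 2 == bit:
        # Rules (ii) and (iii): parity already matches, keep the band.
        return index
    # Rules (i) and (iv): move to an adjacent band to flip the parity.
    lower, upper = index - 1, index + 1
    candidates = [lower, upper] if prefer_lower else [upper, lower]
    for candidate in candidates:
        if 0 <= candidate < num_bands:
            return candidate
    raise ValueError("no adjacent band available")


# The area in the example: band index 6, code 1, moved to band 5.
print(target_band_index(6, 1, num_bands=10))
```

Passing `prefer_lower=False` would select band 7 instead, the other option the example mentions.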
In cases (ii) and (iii) above, the parity of the frequency band containing the punctuation mark does not need to change. However, if the punctuation mark lies close to the edge of its band, its position is unstable, and interference (for example, printing or scanning) can easily flip the parity of its band index. For example, suppose the punctuation mark in an available punctuation area lies in band 3 at position 3.01. If interference shifts it to 2.99, its band index changes from the odd value 3 to the even value 2, and an error occurs at detection. Therefore, a punctuation mark whose band index parity does not need to flip is moved to the middle of its own band, which greatly increases stability. In the example above, the punctuation mark at position 3.01 in band 3 can be moved to 3.5.
Likewise, a punctuation mark whose band index parity must change should be moved to the middle of its target band, so that its new position is stable. For example, when the band index of the available punctuation area "示,是" above is changed from 6 to 5, the punctuation mark is placed at position 5.5.
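The centring step can be combined with the parity rule in a few lines. In this sketch positions are measured in units of one band width, so the band index is the integer part of the position, as in the 3.01 and 5.5 examples. The name `adjusted_position` and the preference for the lower adjacent band are assumptions, and the upper limit of the area is not checked.

```python
import math


def adjusted_position(position, bit):
    """Return the new position of a punctuation mark, in band-width units.

    The mark is placed at the centre of a band whose index parity equals
    `bit`; the band is kept when its parity already matches.
    """
    index = math.floor(position)
    if index % 2 != bit:
        # Assumption: prefer the lower adjacent band, except at band 0.
        index = index - 1 if index >= 1 else index + 1
    return index + 0.5


print(adjusted_position(3.01, 1))   # stays in band 3, moved to its centre
print(adjusted_position(6.20, 1))   # band 6 -> band 5, placed at its centre

# A small disturbance no longer changes the band index after centring.
assert math.floor(adjusted_position(3.01, 1) - 0.02) == 3
```

Centring gives the mark a margin of half a band width in either direction before interference can flip the parity.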
The watermark extraction method provided by the embodiment of the present invention extracts watermark information embedded with the watermark embedding method above. Its flowchart is shown in Figure 5, and its steps are as follows:
S201: Using the same area selection rule that was used when the watermark information was embedded, find and determine the available punctuation areas in the text document in which information has been embedded.
The process of determining the available punctuation areas in this step is the same as in step S101 and is not repeated here. The difference is that in step S101 the area selection rule can be chosen freely, that is, the start boundary and end boundary of an available punctuation area can be defined as desired, whereas this step must use the boundary selection rule that was used when the watermark information was embedded in the text document, applying the same start and end boundaries to the available punctuation areas.
Continuing the example above, the 7 available punctuation areas in the text fragment shown in Figure 4 can be determined.
S202: Using the same position determination rule that was used when the watermark information was embedded, determine the position of the punctuation mark in each available punctuation area.
The process of determining the position of the punctuation mark in an available punctuation area is the same as in step S102.
The difference is that the position of the punctuation mark must be represented by the same position parameter (the center of gravity, center, left boundary, or right boundary of the punctuation mark, and so on) that was used at embedding. For example, if the center of gravity represented the position of the punctuation mark at embedding, the center of gravity must also be used in this step. This ensures that the frequency band determined for the punctuation mark, and the corresponding band index, are accurate.
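As an illustration of one of the position parameters named above, the horizontal center of gravity of a punctuation mark can be computed from a binarised image of the mark. This is a sketch under assumptions: the bitmap representation (rows of 0/1 pixels), the use of the column index as the pixel coordinate, and the function name are hypothetical, and the text does not prescribe how the center of gravity is computed.

```python
def center_of_gravity_x(bitmap):
    """Horizontal center of gravity of the ink pixels in a 0/1 bitmap.

    bitmap -- list of rows, each a list of 0 (background) or 1 (ink)
    Returns the mean column index of all ink pixels.
    """
    count = 0
    weighted = 0
    for row in bitmap:
        for x, pixel in enumerate(row):
            if pixel:
                count += 1
                weighted += x
    if count == 0:
        raise ValueError("bitmap contains no ink pixels")
    return weighted / count


comma = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
]
print(center_of_gravity_x(comma))
```

Whichever parameter is used, the same one has to be applied at embedding and at extraction, as the text requires.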
Continuing the example above, for the first available punctuation area "示,是", the frequency band in which the punctuation mark lies is determined to be band 5 or band 7.
S203: Determine the embedded code in each available punctuation area according to the position of the punctuation mark in that area. Specifically:
The embedded code in an available punctuation area is determined by the parity of the band index of the frequency band in which the punctuation mark lies.
If the band index of the frequency band in which the punctuation mark lies is even, the embedded code in that available punctuation area is determined to be 0.
If the band index of the frequency band in which the punctuation mark lies is odd, the embedded code in that available punctuation area is determined to be 1.
Continuing the example above, since the punctuation mark in the first available punctuation area "示,是" lies in band 5 or band 7, the embedded code in that area is 1.
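Step S203 amounts to reading off the parity of the band index. A minimal sketch, assuming positions are expressed in units of one band width so that the band index is the integer part of the position (the function name is hypothetical):

```python
import math


def extract_bit(position):
    """Embedded code for a mark at `position` (in band-width units).

    An even band index decodes to 0 and an odd band index decodes to 1.
    """
    return math.floor(position) % 2


# Bands 5 and 7 both decode to 1, as in the example; band 6 decodes to 0.
print(extract_bit(5.5), extract_bit(7.4), extract_bit(6.5))
```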
The watermark extraction method above may further include the following step:
S204: Obtain the information embedded in the text document according to the same information embedding rule that was used when the watermark information was embedded and the embedded codes determined for the available punctuation areas. Specifically:
Using the same information embedding rule that was used at embedding and the embedded codes determined for the available punctuation areas, obtain the binary number embedded in the text document.
From this binary number, obtain the embedded information corresponding to the text document. If the binary number is itself the embedded information, the embedded information is obtained directly. If it is not, the stored correspondence between embedded binary numbers and embedded information is looked up to obtain the embedded information.
Continuing the example above, from the embedded codes in the available punctuation areas and the allocation rule used at embedding, the binary number embedded in the text fragment is 1111011, from which the information embedded in the text document is recovered as 123.
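Step S204 can be sketched as the inverse of the allocation rule. The sketch assumes that the extractor knows the number of embedded bits, that redundant copies occupy consecutive areas, and that disagreeing copies are resolved by majority vote (ties decode to 0). The majority vote and the function name `recover_info` are assumptions; the text only says that the same embedding rule is applied.

```python
def recover_info(codes, num_bits, table=None):
    """Rebuild the embedded information from per-area codes.

    codes    -- list of 0/1 codes extracted from the areas, in order
    num_bits -- number of bits that were embedded (N)
    table    -- optional stored mapping from binary string to information
    """
    repeat = len(codes) // num_bits  # same quotient as at embedding
    if repeat == 0:
        raise ValueError("fewer areas than embedded bits")
    bits = []
    for i in range(num_bits):
        group = codes[i * repeat:(i + 1) * repeat]
        # Assumption: majority vote over the redundant copies of one bit.
        bits.append("1" if 2 * sum(group) > len(group) else "0")
    binary = "".join(bits)
    if table is not None and binary in table:
        return table[binary]      # binary number is a stored surrogate
    return int(binary, 2)         # binary number is the information itself


print(recover_info([1, 1, 1, 1, 0, 1, 1], 7))
```

Areas beyond `num_bits * repeat` correspond to the discarded remainder and are ignored.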
In the watermark embedding and extraction methods above, the frequency bands used at embedding can be divided by computer software that analyzes a scanned image of the text, or manually, directly on paper, with a tool such as a ruler. Correspondingly, extraction can also be performed by computer software or manually to obtain the embedded code for each available punctuation area, from which the embedded binary number is recovered.
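Based on the watermark embedding method above, a watermark embedding device can be constructed. As shown in Figure 6, it includes an area determination module 101, a position determination module 102, and an information embedding module 103.
The area determination module 101 determines the available punctuation areas in the text document in which information is to be embedded, according to a set area determination rule.
Preferably, the area determination module 101 may further include an acquisition unit 1011, a punctuation determination unit 1012, and an area determination unit 1013.
The acquisition unit 1011 acquires the plain-text areas of the text document.
The punctuation determination unit 1012 performs character segmentation and punctuation analysis on the plain-text areas acquired by the acquisition unit 1011 and determines the available punctuation marks they contain, where an available punctuation mark has at least one adjacent other character both before and after it.
The area determination unit 1013 defines a start boundary and an end boundary according to each available punctuation mark determined by the punctuation determination unit 1012 and the two other characters adjacent to it, one before and one after, to obtain an available punctuation area.
The position determination module 102 determines the original position of the punctuation mark in each available punctuation area according to a set position determination rule.
Preferably, the position determination module 102 may further include a frequency band division unit 1021 and a position determination unit 1022.
The frequency band division unit 1021 calculates the distance from the start boundary to the end boundary of each available punctuation area and divides each available punctuation area into several frequency bands according to that distance.
The position determination unit 1022 determines, from the coordinate position of the punctuation mark in each available punctuation area, the frequency band in which the punctuation mark lies and the corresponding band index.
The information embedding module 103 adjusts, for each available punctuation area, the position of the punctuation mark according to its original position and the corresponding code to be embedded, thereby embedding that code into the area.
The watermark embedding device further includes an encoding allocation module 104, which determines the binary number to be embedded according to the information to be embedded for the text document, where the number of bits in the binary number is less than or equal to the number of available punctuation areas determined, and which determines the code to be embedded for each available punctuation area according to the binary number and a set information embedding rule.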
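The work of the frequency band division unit and the position determination unit described above can be sketched as a single function. The number of bands per area, the equal band widths, and zero-based band indexing are assumptions here (the text says only that each area is divided into "several" bands), and the function and parameter names are hypothetical.

```python
import math


def band_index(x, start, end, num_bands):
    """Band index of coordinate `x` inside the area [start, end).

    start, end -- coordinates of the area's start and end boundaries
    num_bands  -- number of equal-width bands the area is divided into
    """
    if num_bands <= 0 or end <= start:
        raise ValueError("invalid area or band count")
    if not start <= x < end:
        raise ValueError("coordinate lies outside the area")
    band_width = (end - start) / num_bands
    index = math.floor((x - start) / band_width)
    # Guard against floating-point rounding at the far edge of the area.
    return min(index, num_bands - 1)


# An area 100 units wide split into 10 bands; a mark at 65 lies in band 6.
print(band_index(65, start=0, end=100, num_bands=10))
```

The same function serves the extraction device, provided it is called with the boundaries and band count that were used at embedding.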
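Based on the watermark extraction method above, a watermark extraction device can be constructed. As shown in Figure 7, it includes an area determination module 201, a position determination module 202, and a code extraction module 203.
The area determination module 201 determines the available punctuation areas in the text document in which information has been embedded, according to the same area determination rule that was used when the watermark information was embedded.
Preferably, the area determination module 201 may further include an acquisition unit 2011, a punctuation determination unit 2012, and an area determination unit 2013.
The acquisition unit 2011 acquires the plain-text areas of the text document.
The punctuation determination unit 2012 performs character segmentation and punctuation analysis on the plain-text areas acquired by the acquisition unit 2011 and determines the available punctuation marks they contain, where an available punctuation mark has at least one adjacent other character both before and after it.
The area determination unit 2013 determines each available punctuation area according to the available punctuation mark determined by the punctuation determination unit 2012, the two other characters adjacent to it, one before and one after, and the start boundary and end boundary defined when the watermark information was embedded.
The position determination module 202 determines the position of the punctuation mark in each available punctuation area according to the same position determination rule that was used when the watermark information was embedded.
Preferably, the position determination module 202 may further include a frequency band division unit 2021 and a position determination unit 2022.
The frequency band division unit 2021 calculates the distance from the start boundary to the end boundary of each available punctuation area and divides each available punctuation area into several frequency bands according to that distance.
The position determination unit 2022 determines, from the coordinate position of the punctuation mark in each available punctuation area, the frequency band in which the punctuation mark lies and the corresponding band index.
The code extraction module 203 determines the embedded code in each available punctuation area according to the position of the punctuation mark determined by the position determination module 202.
The watermark extraction device further includes an information recovery module 204, which obtains the binary number embedded in the text document according to the same information embedding rule that was used at embedding and the embedded codes determined for the available punctuation areas, and which obtains the embedded information corresponding to the text document from that binary number.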
The watermark embedding and extraction methods and devices provided by the embodiments of the present invention select available punctuation areas and adjust the position of the punctuation mark in each one, thereby embedding each code into its corresponding available punctuation area. At extraction, the embedded code in each available punctuation area is extracted from the adjusted punctuation position using the corresponding rule. The method is simple to operate. Because the human eye is far less sensitive to changes in punctuation position than to changes in character position, relatively large adjustments can be made, so the embedded watermark information is highly stable and well concealed while a good visual effect is preserved.
In addition, the binary number to be embedded can be determined from the information to be embedded, codes can then be assigned to the available punctuation areas through a set information embedding rule, and the position of each punctuation mark can be adjusted and moved according to the value of its code and its original position (the frequency band in which it lies) to complete the embedding. Information embedded in this way is secure and reliable, and extraction is comparatively accurate. In particular, when the watermark information is embedded by changing the center of gravity of the punctuation mark, embedding, detection, and extraction are more stable and more accurate.
With the method and device provided by the embodiments of the present invention, the embedded watermark information is well concealed and highly robust, and can withstand repeated printing, photocopying, and scaling attacks. Watermark extraction supports blind detection and is computationally fast, making the approach suitable for applications with demanding visual and robustness requirements. It solves the problem that the prior art is difficult to use in practice because its implementation cost is too high or because the conflict between stability and visual effect cannot be resolved.
The above are only preferred specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. Any change, substitution, or application to other similar devices that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2008102404837A CN101751656B (en) | 2008-12-22 | 2008-12-22 | A watermark embedding and extraction method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101751656A (en) | 2010-06-23 |
| CN101751656B (en) | 2012-03-28 |
Family
ID=42478602
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN2008102404837A | Expired - Fee Related | CN101751656B (en) | 2008-12-22 | 2008-12-22 | A watermark embedding and extraction method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101751656B (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|---|
| CN102737204B (en) * | 2011-04-01 | 2015-08-05 | 北京利云技术开发公司 | The method and apparatus of a kind of security document and generation and this security document of detection |
| CN102682248B (en) * | 2012-05-15 | 2015-01-07 | 西北大学 | Watermark embedding and extracting method for ultrashort Chinese text |
| CN106780280B (en) * | 2016-11-30 | 2019-02-01 | 深圳Tcl数字技术有限公司 | Digital watermarking encryption method and device |
| CN108174051B (en) * | 2017-12-08 | 2019-11-08 | 新华三技术有限公司 | A kind of vector watermark decoding method, device and electronic equipment |
| CN110457873B (en) * | 2018-05-08 | 2021-04-27 | 中移(苏州)软件技术有限公司 | A watermark embedding and detection method and device |
| SG11202002013SA (en) * | 2019-05-20 | 2020-04-29 | Alibaba Group Holding Ltd | Identifying copyrighted material using embedded copyright information |
| CN111985208B (en) * | 2020-08-18 | 2024-03-26 | 沈阳东软智能医疗科技研究院有限公司 | Method, device and equipment for realizing punctuation mark filling |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|---|
| CN1694399A (en) * | 2005-05-18 | 2005-11-09 | 上海龙方信息技术有限公司 | Method for digital signature locking localization |
| US7106884B2 (en) * | 2002-02-01 | 2006-09-12 | Canon Kabushiki Kaisha | Digital watermark embedding apparatus for document, digital watermark extraction apparatus for document, and their control method |
| CN101093574A (en) * | 2007-07-23 | 2007-12-26 | 中国人民解放军信息工程大学 | Watermark method of vectorial geographical spatial data based on integral wavelet transforms |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| TR01 | Transfer of patent right | ||
| TR01 | Transfer of patent right |
Effective date of registration: 20220914 Address after: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District Patentee after: Peking University Patentee after: New founder holdings development Co.,Ltd. Patentee after: BEIJING FOUNDER ELECTRONICS CHIEF INFORMATION TECHNOLOGY Co.,Ltd. Address before: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District Patentee before: Peking University Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd. Patentee before: BEIJING FOUNDER ELECTRONICS CHIEF INFORMATION TECHNOLOGY Co.,Ltd. |
|
| CF01 | Termination of patent right due to non-payment of annual fee | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20120328 |