CN119066705A - Text data protection method and device, computing device and storage medium - Google Patents

Info

Publication number
CN119066705A
Authority
CN
China
Prior art keywords
object model
document object
text data
shadow document
programming interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411120957.XA
Other languages
Chinese (zh)
Inventor
杨阳
刘涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China International Financial Ltd By Share Ltd
Original Assignee
China International Financial Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China International Financial Ltd By Share Ltd
Priority to CN202411120957.XA
Publication of CN119066705A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57: Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present application proposes a text data protection method. First, an anti-crawler mechanism is obtained. Then, the anti-crawler mechanism is executed to implement the following steps: obtaining at least part of the text data to be protected in a hypertext markup language file and a host element for the protected text data; determining whether the code of the application programming interface method for creating a shadow document object model is native code; and, in response to that code being native code, using the application programming interface method to create the at least part of the text data to be protected into a shadow document object model in a closed mode, and mounting the closed-mode shadow document object model on the host element specified by the user. Finally, the host element and the shadow document object model are rendered.

Description

Text data protection method and device, computing equipment and storage medium
Technical Field
The present disclosure relates to the field of computers, and in particular, to a method and apparatus for protecting text data, a computing device, and a storage medium.
Background
With the continuous development of computer technology, accessing web pages has become an important way for people to acquire information. In this environment, crawler software is widely used to crawl important text data from web pages, so protecting text data on web pages is increasingly important. Anti-crawler techniques are countermeasures against web crawlers: they prevent or interfere with normal crawling so that website resources are protected from being scraped and problems such as data leakage are avoided. At present, anti-crawler strategies mainly work by placing obstacles in the crawling path so as to prevent or disturb a crawler from acquiring the real data. Taking the strategy of disguising text as pictures as an example, pictures containing characters are mixed with normal text so that a crawler cannot easily obtain the complete text content. However, with the development of optical character recognition (OCR) technology, characters in an image are very easy to identify and extract. Such a strategy is therefore easy to crack, and because the cracking process cannot be effectively detected or intercepted, the crawler is difficult to detect and catch.
Disclosure of Invention
In view of the above, the present disclosure provides text data protection methods and apparatus, computing devices, and storage media, which desirably overcome some or all of the above-referenced shortcomings, as well as other possible shortcomings.
According to a first aspect of the present disclosure, a text data protection method is provided. The method first obtains an anti-crawler mechanism. The method then executes the anti-crawler mechanism to implement the following steps: obtaining at least part of the text data to be protected in a hypertext markup language file and a host element for the protected text data; determining whether the code of an application programming interface method for creating a shadow document object model is native code; and, in response to the code of the application programming interface method for creating the shadow document object model being native code, using the application programming interface method to create the at least part of the text data to be protected in the hypertext markup language file into a shadow document object model in a closed mode, and mounting the shadow document object model in the closed mode on the host element specified by a user. Finally, the method renders the host element and the shadow document object model.
In some embodiments, the obtaining the anti-crawler mechanism comprises sending a request for accessing a target webpage, and obtaining a hypertext markup language file and a corresponding JavaScript file for the target webpage, wherein the JavaScript file comprises the anti-crawler mechanism.
In some embodiments, the JavaScript file further includes the at least part of the text data to be protected in the hypertext markup language file and the host element for the protected text data, and obtaining the at least part of the text data to be protected and the host element for the protected text data includes obtaining them from the JavaScript file.
In some embodiments, the method further includes collecting identity information and host element information of the current browser in response to the code of the application programming interface method used to create the shadow document object model not being native code.
In some embodiments, determining whether the code of the application programming interface method for creating the shadow document object model is native code includes parsing the code of the application programming interface method for creating the shadow document object model into a string of characters to obtain parsed content, determining that the code of the application programming interface method for creating the shadow document object model is native code in response to the parsed content being the same as the native code, and determining that the code of the application programming interface method for creating the shadow document object model is not native code in response to the parsed content being different from the native code.
In some embodiments, parsing the code of the application programming interface method for creating a shadow document object model into a character string to obtain parsed content includes: obtaining the application programming interface method by which the host element creates the shadow document object model; checking whether the parsing method for the application programming interface method has been tampered with; and, in response to the parsing method not having been tampered with, using the parsing method to parse the code of the application programming interface method for creating the shadow document object model into a character string to obtain the parsed content.
In some embodiments, the method further comprises determining a host element of the shadow document object model, a mode of the shadow document object model, and content of the shadow document object model based on the rendering result obtained by rendering the host element and the shadow document object model, and deleting the rendering result in response to the host element of the shadow document object model not being the host element of the text data to be protected, the mode of the shadow document object model not being the closed mode, or the content of the shadow document object model not being the at least part of the text data to be protected in the hypertext markup language file.
According to a second aspect of the present disclosure, there is provided a text data protection apparatus including an acquisition module, an execution module, and a rendering module. The acquisition module is configured to acquire an anti-crawler mechanism. The execution module is configured to execute the anti-crawler mechanism to implement the following steps: obtaining at least part of the text data to be protected in a hypertext markup language file and a host element for the protected text data; determining whether the code of an application programming interface method for creating a shadow document object model is native code; and, in response to the code of the application programming interface method for creating the shadow document object model being native code, using the application programming interface method to create the at least part of the text data to be protected in the hypertext markup language file into a shadow document object model in a closed mode, and mounting the shadow document object model in the closed mode on the host element specified by a user. The rendering module is configured to render the host element and the shadow document object model.
According to a third aspect of the present disclosure there is provided a computing device comprising a memory configured to store computer executable instructions, a processor configured to perform any of the methods described above when the computer executable instructions are executed by the processor.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium storing computer executable instructions that, when executed, perform any of the methods described above.
According to a fifth aspect of the present disclosure, there is provided a computer program product, characterized in that the computer program product comprises computer executable instructions which, when executed, implement any of the methods described above.
In the text data protection method and device claimed in the present disclosure, an anti-crawler mechanism is first acquired, then an execution module acquires at least part of text data to be protected in a hypertext markup language file and host elements of the text data after protection processing as input of the anti-crawler mechanism, and when determining that codes of an application programming interface method for creating a shadow document object model are native codes, the anti-crawler mechanism creates the at least part of text data to be protected in the hypertext markup language file into a shadow document object model in a closed mode by using the application programming interface method, and mounts the shadow document object model in the closed mode on host elements designated by a user for rendering. In this way, whether the code of the application programming interface method for creating the shadow document object model is a native code is monitored in real time, so that the falsification of the application programming interface method by an external crawler is prevented, the safety is enhanced, and when the code of the application programming interface method for creating the shadow document object model is a native code, at least part of text data to be protected is dynamically created into a shadow document object model in a closed mode, thereby preventing a crawler from acquiring real text data, and the shadow document object model in the closed mode is mounted on a host element designated by a user for rendering, so that a target to be cracked by the crawler can be accurately acquired when the crawler exists.
These and other advantages of the present disclosure will become apparent from and elucidated with reference to the embodiments described hereinafter.
Drawings
Embodiments of the present disclosure will now be described in more detail and with reference to the accompanying drawings, in which:
FIG. 1 illustrates an exemplary application scenario in which a technical solution according to an embodiment of the present disclosure may be implemented;
FIG. 2 illustrates a schematic flow diagram of a text data protection method according to one embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a method of determining whether code of an application programming interface method for creating a shadow document object model is native code, according to one embodiment of the present disclosure;
FIG. 4 illustrates an exemplary block diagram of a text data protection device according to one embodiment of the present disclosure;
FIG. 5 illustrates an example system including an example computing device that represents one or more systems and/or devices that can implement the various techniques described herein.
Detailed Description
The following description provides specific details of various embodiments of the disclosure so that those skilled in the art may fully understand and practice the various embodiments of the disclosure. It should be understood that the technical solutions of the present disclosure may be practiced without some of these details. In some instances, well-known structures or functions have not been shown or described in detail to avoid obscuring the description of embodiments of the present disclosure with such unnecessary description. The terminology used in the present disclosure should be understood in its broadest reasonable manner, even though it is being used in conjunction with a particular embodiment of the present disclosure.
First, some terms related to the embodiments of the present application will be described so as to be easily understood by those skilled in the art.
HTML: Hypertext markup language (English: HyperText Markup Language, abbreviated HTML) is a standard markup language for creating web pages. HTML is parsed and rendered by a browser. It includes a series of tags that unify the format of documents on the network, allowing distributed internet resources to be connected as one logical whole. HTML text is descriptive text composed of HTML commands that can specify text, graphics, animations, sounds, tables, links, and so on.
JavaScript: JavaScript, abbreviated JS, is a lightweight, interpreted or just-in-time compiled programming language with first-class functions. Its interpreter is called a JavaScript engine and is part of the browser. JavaScript is widely used as a client-side scripting language and is usually used on HTML pages to add dynamic functions to them.
Document object model: The document object model (Document Object Model, DOM) is a platform- and language-independent model that can be used to represent HTML documents. It defines the logical structure of a document and the manner in which a program accesses and manipulates the document. When a web page is loaded, the browser automatically creates a DOM for the current page. In the DOM, all parts of the document (e.g., elements, attributes, text) are organized into a logical tree structure (similar to a family tree). Each point in the tree, down to the end of each branch, is called a node, and each node is an object. Through the DOM, JavaScript can access, modify, delete, or add any content in the HTML document.
Shadow document object model: The shadow document object model is also known as the Shadow DOM. The Shadow DOM is a web standard that allows a subtree of DOM elements to be attached when the document is rendered, without that subtree being part of the main DOM tree. It allows browser developers to encapsulate their own HTML tags, CSS styles, and specific JavaScript code, and it allows developers to create custom tags that behave like native tags such as <input>, <video>, and <audio>. It is used to encapsulate a DOM tree and isolate the internal structure of a component from external code. With the Shadow DOM, an independent DOM subtree can be created whose style and behavior do not affect the external DOM and are not affected by external style and behavior. When creating a Shadow DOM, one of two modes, open or closed, can be selected. The open mode allows external JavaScript to access the Shadow DOM. The closed mode does not allow external JavaScript to access the Shadow DOM.
Host element: The host element is the container element of a Shadow DOM. It is an element in the ordinary DOM. The host element may be a custom Web component, such as a custom tag, a video tag, or any other custom element.
Fig. 1 illustrates an exemplary application scenario 100 in which a technical solution according to an embodiment of the present disclosure may be implemented. As shown in fig. 1, the application scenario includes a terminal 110 and a server 120, the terminal 110 being communicatively coupled with the server 120 via a network 130.
Terminal 110 may be an intelligent terminal device with web page access capabilities. By way of example, a browser application 140 may be running on terminal 110 to access a web page. The terminal 110 and browser application 140 support HTML and JS scripts.
The browser application 140 may send a request to the server 120 to access the target web page based on the web address entered by the user. The server 120 stores various web page resources corresponding to the web address, such as a hypertext markup language file for the target web page and a corresponding JavaScript file, etc., provided by a web page or a content provider. The server 120 may return the hypertext markup language file for the target web page and the corresponding JavaScript file to the browser application based on the request. The browser application can acquire the information of the webpage corresponding to the website based on the hypertext markup language file and render the information, so that the content (such as text content; multimedia content such as icons and pictures) of the webpage can be displayed to a user, and corresponding JavaScript files (such as corresponding JavaScript scripts) can be executed.
The JavaScript file may include, for example, an anti-crawler mechanism by which the browser application 140 may obtain at least part of text data in the hypertext markup language file to be protected and host elements of the protected text data, determine whether code of an application programming interface method for creating a shadow document object model is native code, create the at least part of text data in the hypertext markup language file to be protected into a shadow document object model in a closed mode by using the application programming interface method in response to the code of the application programming interface method for creating the shadow document object model being native code, and mount the shadow document object model on the host elements specified by a user. The host element and shadow document object model may then be rendered for presentation to a user.
It should be noted that the terminal 110 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The server 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein. The network 130 may be, for example, a Wide Area Network (WAN), a Local Area Network (LAN), a wireless network, a public telephone network, an intranet, and any other type of network known to those skilled in the art.
The scenario described above is merely one example in which embodiments of the present disclosure may be implemented and is not limiting.
Fig. 2 illustrates a schematic flow diagram of a text data protection method 200 according to one embodiment of the present disclosure. The text data protection method may be implemented, for example, at the terminal 110 as shown in fig. 1, and specifically may be implemented, for example, at a browser on the terminal 110. As shown in fig. 2, the method 200 includes the following steps.
At step 210, an anti-crawler mechanism is obtained. The anti-crawler mechanism may be, for example, a predefined anti-crawler policy, which may be expressed as an anti-crawler policy function. In some embodiments, the anti-crawler mechanism may be obtained from a server of the web page or content provider. As an example, a request to access a target web page may first be sent, for example, to a server. Then, a hypertext markup language file for the target web page and a corresponding JavaScript file are obtained, e.g., from the server, wherein the JavaScript file includes the anti-crawler mechanism.
At step 220, the anti-crawler mechanism is executed. In particular, executing the anti-crawler mechanism may implement the following steps 2201-2203.
At step 2201, at least part of the text data to be protected in the hypertext markup language file and a host element for the protected text data are obtained. In the case that the anti-crawler mechanism is an anti-crawler policy function, the at least part of the text data to be protected and the host element for the protected text data may serve as the two parameters of the anti-crawler policy function. The at least part of the text data to be protected may be determined by the provider of the web page, indicating that crawler software is prohibited from obtaining that text data. The text data to be protected may be part or all of the text data in the hypertext markup language file, which is not limited herein.
In some embodiments, at least a portion of the text data to be protected for processing and a host element of the text data after protection processing in the hypertext markup language file may be included in a JavaScript file that is transmitted by a server to a browser. In this case, at least part of the text data to be protected and the host element of the text data after the protection processing in the hypertext markup language file can be directly obtained from the JavaScript file on the browser side, which can effectively reduce the complexity of communication and save network resources, which is not limitative of course.
At step 2202, it is determined whether code of an application programming interface method used to create the shadow document object model is native code. The determination of whether the code of the application programming interface method used to create the shadow document object model is native code may be made in any suitable manner.
FIG. 3 illustrates a schematic diagram of a method 300 of determining whether code of an application programming interface method for creating a shadow document object model is native code. As shown in fig. 3, the method 300 includes steps 310-340. In step 310, the code of the application programming interface method for creating a shadow document object model is parsed into a character string to obtain parsed content. As an example, the toString method may be used to parse the code of the application programming interface method used to create the shadow document object model, and similarly to parse the native code described below. In JavaScript, a string is a common data type, and toString is a method for converting other data types into the string type. The method is widely used and can convert numbers, arrays, objects, functions, and the like into strings. In step 320, it is determined whether the parsed content is identical to the content of the native code. In response to the parsed content being identical to the content of the native code, the code of the application programming interface method used to create the shadow document object model is determined to be native code at step 330. In response to the parsed content being different from the content of the native code, it is determined at step 340 that the code of the application programming interface method used to create the shadow document object model is not native code.
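The comparison in steps 310-340 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name isNativeCode and its parameters are hypothetical, and the sketch assumes that the engine reports built-in functions in the form "function name() { [native code] }", whose exact whitespace differs between browsers and is therefore stripped before comparing.

```javascript
// Sketch only: isNativeCode and its parameters are hypothetical names.
// Capture the parser once so that a later reassignment of
// Function.prototype.toString does not affect this check.
const nativeToString = Function.prototype.toString;

// Returns true when the source text of `fn`, with all whitespace removed,
// equals the text an engine reports for a built-in function called `name`.
function isNativeCode(fn, name) {
  if (typeof fn !== "function") return false;
  // Step 310: parse the function into a character string.
  const parsed = nativeToString.call(fn).replace(/\s+/g, "");
  // Steps 320-340: compare with the expected native-code string.
  return parsed === `function${name}(){[nativecode]}`;
}

// In a browser, the check described in the text would look like:
//   isNativeCode(Element.prototype.attachShadow, "attachShadow")
```

A replacement written in JavaScript exposes its own source text through toString, so it fails the comparison; a function that is not named as expected fails it as well.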
In some embodiments, when parsing the code of the application programming interface method for creating a shadow document object model into a character string, the application programming interface method by which the host element creates the shadow document object model may first be obtained. For example, this method may be obtained using attachShadow. In JavaScript, attachShadow is a method for creating a shadow DOM for a specified element (the host element), and valueOf is a method that returns the primitive value of the specified object. Then, it is checked whether the parsing method for the application programming interface method has been tampered with. For example, when the toString method is used for parsing, it is checked whether the toString method has been tampered with. If the parsing method has not been tampered with, it is used to parse the code of the application programming interface method for creating the shadow document object model into a character string so as to obtain the parsed content. For example, if the toString method passes the check, i.e., it has not been tampered with, it is applied to the application programming interface method to obtain that method's content as a character string, and spaces and line feeds are removed from the string to obtain the parsed content. If the parsed content is the same as the character-string content of the native code, it is determined that the application programming interface method has not been tampered with. Otherwise, the application programming interface method has been tampered with, rendering is stopped, and a user behavior log is reported, for example, to a server or an administrator. This can enhance the security of the text data and of the anti-crawler mechanism.
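The two-stage check described above (first the parsing method, then the application programming interface method) might be sketched as below. All names (checkApiMethod, report, the reason strings) are hypothetical, and the method name is passed in as a parameter so that the sketch can be exercised outside a browser. Applying the parsing method to itself is only one simple way to check it, chosen here as an assumption; a sufficiently careful forgery could still pass this check.

```javascript
// Sketch only: every name below is hypothetical, not taken from the source.
function stripWhitespace(text) {
  return text.replace(/\s+/g, "");
}

// host:       element-like object expected to carry the API method
// methodName: name of the API method, e.g. "attachShadow" in a browser
// report:     callback that receives a reason string when a check fails
// toStringFn: parsing method to verify (defaults to the engine's own)
function checkApiMethod(host, methodName, report,
                        toStringFn = Function.prototype.toString) {
  // 1. Obtain the method that the host would use to create a shadow DOM.
  const method = host[methodName];
  if (typeof method !== "function") {
    report("api-missing");
    return false;
  }
  // 2. Check the parsing method by parsing it with itself.
  const parserText = stripWhitespace(String(toStringFn.call(toStringFn)));
  if (parserText !== "functiontoString(){[nativecode]}") {
    report("parser-tampered");
    return false;
  }
  // 3. Parse the API method, strip spaces and line feeds, and compare.
  const parsed = stripWhitespace(toStringFn.call(method));
  if (parsed !== `function${methodName}(){[nativecode]}`) {
    report("api-tampered");
    return false;
  }
  return true;
}
```

In a browser, a call such as checkApiMethod(hostElement, "attachShadow", sendLog) would correspond to the flow in the text, where hostElement and sendLog are placeholders for the host element and for whatever reporting channel is used.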
In step 2203, in response to the code of the application programming interface method used to create the shadow document object model being native code, at least a portion of the text data in the hypertext markup language file to be protected is created into a shadow document object model in a closed mode using the application programming interface method, and the shadow document object model in the closed mode is mounted to a host element specified by a user. The shadow document object model in the closed mode does not allow external JavaScript access, and text data to be protected can be effectively protected. The mounting of the shadow document object model in the closed mode on the host element appointed by the user is the basis for rendering text data by a subsequent browser. In some embodiments, the identity information and the host element information of the current browser are collected in response to the code of the application programming interface method used to create the shadow document object model not being native code. The crawler information may be reported, which can enhance the security of the text data.
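Step 2203 can be illustrated with the following sketch. It assumes that the result of the native-code check is passed in as a boolean, and that "identity information of the current browser" can be approximated by the user agent string; both are assumptions made for illustration, and the names protectText and collectBrowserInfo are hypothetical.

```javascript
// Sketch only: protectText, collectBrowserInfo and the shape of the
// returned objects are hypothetical names introduced for illustration.
function collectBrowserInfo(host, nav) {
  return {
    // Assumption: the user agent stands in for browser identity information.
    userAgent: nav && nav.userAgent ? nav.userAgent : "unknown",
    hostTag: host.tagName || "unknown",
    hostId: host.id || "",
  };
}

// host:        the host element specified by the page author
// text:        the text data to be protected
// apiIsNative: result of the native-code check on attachShadow
// nav:         navigator-like object (window.navigator in a browser)
function protectText(host, text, apiIsNative, nav) {
  if (!apiIsNative) {
    // Non-native API: mount nothing and return data for reporting.
    return { mounted: false, report: collectBrowserInfo(host, nav) };
  }
  // Closed mode: host.shadowRoot stays null, so outside scripts cannot
  // reach the protected text through the host element.
  const shadowRoot = host.attachShadow({ mode: "closed" });
  shadowRoot.textContent = text;
  // The mechanism keeps the only reference to the closed shadow root.
  return { mounted: true, shadowRoot };
}
```

In a browser this could be invoked as protectText(document.querySelector("#price"), "42.00", true, navigator), where the selector and the text are placeholders.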
At step 230, the host element and the shadow document object model are rendered. The rendering result obtained by rendering the host element and the shadow document object model can be presented in a browser. In some embodiments, the host element, the mode, and the content of the shadow document object model may be determined from the rendering result. If the host element of the shadow document object model is not the host element of the text data to be protected, the mode of the shadow document object model is not the closed mode, or the content of the shadow document object model is not the at least part of the text data to be protected in the hypertext markup language file, the rendering result is deleted and is not presented. If the host element of the shadow document object model is the host element of the text data to be protected, the mode of the shadow document object model is the closed mode, and the content of the shadow document object model is the at least part of the text data to be protected in the hypertext markup language file, the rendering result is presented. Alternatively or additionally, if any one of these three conditions is not satisfied, a crawler may be considered to be cracking the anti-crawler mechanism or policy, and the creation and mounting of the shadow document object model may be terminated. If the shadow document object model has already been mounted, the content in the host element is deleted immediately so that the crawler cannot acquire any content. Then, the identity information and the host element information of the current browser are collected, and the crawler information is reported. This can enhance the security of the text data.
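The three conditions checked after rendering can be sketched as a single validation function. The sketch assumes that the anti-crawler mechanism has kept the shadow root reference returned by attachShadow (in closed mode the root is not reachable through the host element), and it relies on the standard host, mode, and textContent properties of a shadow root. The name validateShadow and its parameters are hypothetical.

```javascript
// Sketch only: validateShadow and its argument names are hypothetical.
// shadowRoot:   reference returned by attachShadow and kept by the mechanism
// expectedHost: the host element designated for the protected text
// expectedText: the text data that was supposed to be protected
function validateShadow(shadowRoot, expectedHost, expectedText) {
  const valid =
    shadowRoot.host === expectedHost &&      // mounted on the right host
    shadowRoot.mode === "closed" &&          // still in closed mode
    shadowRoot.textContent === expectedText; // content not altered
  if (!valid) {
    // Delete whatever was mounted so that nothing is left to read.
    shadowRoot.textContent = "";
  }
  return valid;
}
```

A caller would present the rendering result only when the function returns true, and would otherwise proceed to collect and report the browser and host element information as described in the text.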
In the text data protection method claimed in the present disclosure, an anti-crawler mechanism is first acquired. At least part of the text data to be protected in a hypertext markup language file, together with the host element of the text data after protection processing, is then acquired as input to the anti-crawler mechanism. When the anti-crawler mechanism determines that the code of the application programming interface method for creating a shadow document object model is native code, it uses the application programming interface method to create the at least part of the text data to be protected in the hypertext markup language file as a shadow document object model in a closed mode, and mounts the shadow document object model in the closed mode on the host element specified by a user for rendering. In this way, whether the code of the application programming interface method for creating the shadow document object model is native code is monitored in real time, which prevents an external crawler from tampering with the application programming interface method and enhances security. When the code of the application programming interface method for creating the shadow document object model is native code, the at least part of the text data to be protected is dynamically created as a shadow document object model in the closed mode, which prevents a crawler from acquiring the real text data. Because the shadow document object model in the closed mode is mounted on the host element specified by the user for rendering, the target that a crawler is attempting to crack can be accurately identified when a crawler is present.
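One way to approximate the native-code check described above is to convert the function to a string and compare it with the form that engines print for built-in functions. The sketch below assumes that form is `function name() { [native code] }`; the regular expression and the name `looksNative` are illustrative assumptions, not part of the disclosure. It also checks first that the stringifying method itself still looks native.

```javascript
// Assumed printed form of a built-in function: function name() { [native code] }
const NATIVE_SOURCE = /^function\s*[\w$]*\s*\(\s*\)\s*\{\s*\[native code\]\s*\}$/;

// Hypothetical check: true when fn prints like a built-in function.
function looksNative(fn) {
  if (typeof fn !== "function") return false;
  // Use the prototype method directly so a per-function toString override is ignored.
  const stringify = Function.prototype.toString;
  // Check the stringifier itself before trusting its output for fn.
  // Limitation: a replaced stringifier that fakes its own output defeats this.
  if (!NATIVE_SOURCE.test(stringify.call(stringify))) return false;
  return NATIVE_SOURCE.test(stringify.call(fn));
}
```

In a browser this could be called as `looksNative(Element.prototype.attachShadow)`. A script that runs before the check and replaces `Function.prototype.toString` with a convincing fake would still pass, so the sketch is a heuristic rather than a guarantee.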
Fig. 4 illustrates an exemplary block diagram of a text data protection device 400 according to one embodiment of the present disclosure. As shown in fig. 4, the text data protection apparatus includes an acquisition module 410, an execution module 420, and a rendering module 430.
The acquisition module 410 is configured to acquire an anti-crawler mechanism. The anti-crawler mechanism may be, for example, a predefined anti-crawler policy, which may be expressed as an anti-crawler policy function. In some embodiments, the anti-crawler mechanism may be obtained from a web page or a server of the content provider. As an example, a request to access a target web page may first be sent, for example, to a server. Then, a hypertext markup language file for the target web page and a corresponding JavaScript file are obtained, for example from the server, wherein the JavaScript file includes the anti-crawler mechanism.
The execution module 420 is configured to execute the anti-crawler mechanism to implement the following steps: obtaining at least a portion of the text data to be protected in a hypertext markup language file and the host element of the text data after protection processing; determining whether the code of an application programming interface method for creating a shadow document object model is native code; and, in response to the code of the application programming interface method for creating the shadow document object model being native code, using the application programming interface method to create the at least a portion of the text data to be protected in the hypertext markup language file as a shadow document object model in a closed mode, and mounting the shadow document object model in the closed mode on the host element specified by a user.
In some embodiments, the at least a portion of the text data to be protected in the hypertext markup language file and the host element of the text data after protection processing may be included in the JavaScript file that the server transmits to the browser. In this case, they can be obtained directly from the JavaScript file on the browser side, which can reduce the complexity of communication and save network resources. This is, of course, not limiting.
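To illustrate how the protected text and its host element could travel inside the JavaScript file, the sketch below embeds them as a small data structure and resolves each host on the browser side. The payload shape, the selectors, the sample values, and the names `PROTECTED_ITEMS` and `resolveProtectedItems` are all invented for illustration; the disclosure does not specify a format.

```javascript
// Hypothetical payload embedded by the server in the delivered JavaScript file.
// Selectors and values are made-up sample data.
const PROTECTED_ITEMS = [
  { hostSelector: "#price", text: "128.50" },
  { hostSelector: "#rating", text: "AA+" },
];

// Pairs each protected text with its host element, skipping hosts that the
// lookup function cannot find (it is expected to return null in that case).
function resolveProtectedItems(items, findHost) {
  return items
    .map((item) => ({ host: findHost(item.hostSelector), text: item.text }))
    .filter((entry) => entry.host !== null);
}
```

In a browser, the lookup could be `(selector) => document.querySelector(selector)`, so no further request to the server is needed to learn what to protect or where to mount it.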
The rendering module 430 is configured to render the host element and the shadow document object model. The rendering result obtained by rendering the host element and the shadow document object model can be presented in a browser. In some embodiments, the host element of the shadow document object model, the mode of the shadow document object model, and the content of the shadow document object model may be determined from the rendering result. Then, if the host element of the shadow document object model is not the host element of the text data to be protected, the mode of the shadow document object model is not the closed mode, or the content of the shadow document object model is not the at least part of the text data to be protected in the hypertext markup language file, the rendering result is deleted and is not presented.
In the text data protection device claimed in the present disclosure, an anti-crawler mechanism is first acquired. The execution module then acquires at least part of the text data to be protected in a hypertext markup language file, together with the host element of the text data after protection processing, as input to the anti-crawler mechanism. When the anti-crawler mechanism determines that the code of the application programming interface method for creating a shadow document object model is native code, it uses the application programming interface method to create the at least part of the text data to be protected in the hypertext markup language file as a shadow document object model in a closed mode, and mounts the shadow document object model in the closed mode on the host element specified by a user for rendering. In this way, whether the code of the application programming interface method for creating the shadow document object model is native code is monitored in real time, which prevents an external crawler from tampering with the application programming interface method and enhances security. When the code of the application programming interface method for creating the shadow document object model is native code, the at least part of the text data to be protected is dynamically created as a shadow document object model in the closed mode, which prevents a crawler from acquiring the real text data. Because the shadow document object model in the closed mode is mounted on the host element specified by the user for rendering, the target that a crawler is attempting to crack can be accurately identified when a crawler is present.
FIG. 5 illustrates an example system 500 that includes an example computing device 510 that represents one or more systems and/or devices that can implement the various techniques described herein. Computing device 510 may be, for example, a server of a service provider, a device associated with a server, a system-on-chip, and/or any other suitable computing device or computing system. The text data protection device 400 described above with reference to FIG. 4 may take the form of a computing device 510. Alternatively, the text data protection device 400 may be implemented as a computer program in the form of an application 516.
The example computing device 510 as illustrated includes a processing system 511, one or more computer-readable media 512, and one or more I/O interfaces 513 communicatively coupled to each other. Although not shown, computing device 510 may also include a system bus or other data and command transfer system that couples the various components to one another. A system bus may include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Various other examples are also contemplated, such as control and data lines.
Processing system 511 represents functionality that performs one or more operations using hardware. Thus, the processing system 511 is illustrated as including hardware elements 514 that may be configured as processors, functional blocks, and the like. This may include implementation in hardware as application-specific integrated circuits or other logic devices formed using one or more semiconductors. The hardware element 514 is not limited by the material from which it is formed or the processing mechanism employed therein. For example, the processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)). In such a context, the processor-executable instructions may be electronically-executable instructions.
Computer-readable medium 512 is illustrated as including memory/storage 515. Memory/storage 515 represents memory/storage capacity associated with one or more computer-readable media. Memory/storage 515 may include volatile media (such as Random Access Memory (RAM)) and/or nonvolatile media (such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth). Memory/storage 515 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) and removable media (e.g., flash memory, a removable hard drive, an optical disk, and so forth). The computer-readable medium 512 may be configured in a variety of other ways as described further below.
One or more I/O interfaces 513 represent functionality that allows a user to input commands and information to the computing device 510 using various input devices, and optionally also allows information to be presented to the user and/or other components or devices using various output devices. Examples of input devices include keyboards, cursor control devices (e.g., mice), microphones (e.g., for voice input), scanners, touch functions (e.g., capacitive or other sensors configured to detect physical touches), cameras (e.g., motion that does not involve touches may be detected as gestures using visible or invisible wavelengths such as infrared frequencies), and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, a haptic response device, and so forth. Accordingly, the computing device 510 may be configured in a variety of ways to support user interaction, as described further below.
Computing device 510 also includes application 516. Application 516 may be, for example, a software instance of the text data protection device 400 and may implement the techniques described herein in combination with other elements in computing device 510.
Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and "component" as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.
An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer-readable media can include a variety of media that are accessible by computing device 510. By way of example, and not limitation, computer readable media may comprise "computer readable storage media" and "computer readable signal media".
"Computer-readable storage medium" refers to a medium and/or device that can permanently store information and/or a tangible storage device, as opposed to a mere signal transmission, carrier wave, or signal itself. Thus, computer-readable storage media refers to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in methods or techniques suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits or other data. Examples of a computer-readable storage medium may include, but are not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical storage, hard disk, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or articles of manufacture adapted to store the desired information and which may be accessed by a computer.
"Computer-readable signal medium" refers to a signal bearing medium configured to transmit instructions to hardware of computing device 510, such as via a network. Signal media may typically be embodied in computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, data signal, or other transport mechanism. Signal media also include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
As previously described, the hardware elements 514 and computer-readable media 512 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware that may be used in some embodiments to implement at least some aspects of the techniques described herein. The hardware elements may include integrated circuits or components of a system on a chip, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and other implementations in silicon or other hardware devices. In this context, the hardware elements may be implemented as processing devices that perform program tasks defined by instructions, modules, and/or logic embodied by the hardware elements, as well as hardware devices that store instructions for execution, such as the previously described computer-readable storage media.
Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Accordingly, software, hardware, or program modules, and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage medium and/or by one or more hardware elements 514. Computing device 510 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, for example, by using the computer-readable storage medium of the processing system and/or the hardware element 514, a module may be implemented at least in part in hardware as a module executable by the computing device 510 as software. The instructions and/or functions may be executable/operable by one or more articles of manufacture (e.g., one or more computing devices 510 and/or processing systems 511) to implement the techniques, modules, and examples described herein.
In various implementations, the computing device 510 may take on a variety of different configurations. For example, computing device 510 may be implemented as a computer-like device including a personal computer, desktop computer, multi-screen computer, laptop computer, netbook, and the like. Computing device 510 may also be implemented as a mobile appliance-like device that includes mobile devices such as mobile phones, portable music players, portable gaming devices, tablet computers, multi-screen computers, and the like. Computing device 510 may also be implemented as a television-like device that includes devices having or connected to generally larger screens in casual viewing environments. Such devices include televisions, set-top boxes, gaming machines, and the like.
The techniques described herein may be supported by these various configurations of computing device 510 and are not limited to the specific examples of techniques described herein. The functionality may also be implemented in whole or in part on the "cloud" 520 through the use of a distributed system, such as through the platform 522 as described below.
Cloud 520 includes and/or represents a platform 522 for resources 524. Platform 522 abstracts underlying functionality of hardware (e.g., servers) and software resources of cloud 520. The resources 524 may include applications and/or data that may be used when executing computer processing on a server remote from the computing device 510. The resources 524 may also include services provided over the internet and/or over subscriber networks such as cellular or Wi-Fi networks.
Platform 522 may abstract resources and functionality to connect computing device 510 with other computing devices. Platform 522 may also be used to abstract the scaling of resources, so as to provide a level of scale corresponding to the demand encountered for the resources 524 implemented via platform 522. Thus, in an interconnected device embodiment, implementation of the functionality described herein may be distributed throughout system 500. For example, the functionality may be implemented in part on computing device 510 and in part through platform 522, which abstracts the functionality of cloud 520.
The present disclosure provides a computer readable storage medium having stored thereon computer readable instructions that when executed implement any of the methods described above.
The present disclosure provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computing device, and executed by the processor, cause the computing device to perform any of the methods provided in the various alternative implementations described above.
It should be understood that for clarity, embodiments of the present disclosure have been described with reference to different functional units. However, it will be apparent that the functionality of each functional unit may be implemented in a single unit, in a plurality of units or as part of other functional units without departing from the present disclosure. For example, functionality illustrated to be performed by a single unit may be performed by multiple different units. Thus, references to specific functional units are only to be seen as references to suitable units for providing the described functionality rather than indicative of a strict logical or physical structure or organization. Thus, the present disclosure may be implemented in a single unit or may be physically and functionally distributed between different units and circuits.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various devices, elements, components or sections, these devices, elements, components or sections should not be limited by these terms. These terms are only used to distinguish one device, element, component, or section from another device, element, component, or section.
Although the present disclosure has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present disclosure is limited only by the appended claims. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. The order of features in the claims does not imply any specific order in which the features must be performed. Furthermore, in the claims, the word "comprising" does not exclude other elements, and the term "a" or "an" does not exclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

Claims (11)

1. A text data protection method, comprising: acquiring an anti-crawler mechanism; executing the anti-crawler mechanism to implement the following steps: acquiring at least a portion of text data to be protected in a hypertext markup language file and a host element of the text data after protection processing; determining whether code of an application programming interface method for creating a shadow document object model is native code; and, in response to the code of the application programming interface method for creating the shadow document object model being native code, using the application programming interface method to create the at least a portion of the text data to be protected in the hypertext markup language file as a closed-mode shadow document object model, and mounting the closed-mode shadow document object model on a host element specified by a user; and rendering the host element and the shadow document object model.
2. The method according to claim 1, wherein acquiring the anti-crawler mechanism comprises: sending a request to access a target web page; and obtaining a hypertext markup language file for the target web page and a corresponding JavaScript file, wherein the JavaScript file includes the anti-crawler mechanism.
3. The method according to claim 2, wherein the JavaScript file further comprises the at least a portion of the text data to be protected in the hypertext markup language file and the host element of the text data after protection processing; and wherein acquiring the at least a portion of the text data to be protected in the hypertext markup language file and the host element of the text data after protection processing comprises: obtaining, from the JavaScript file, the at least a portion of the text data to be protected in the hypertext markup language file and the host element of the text data after protection processing.
4. The method according to claim 1, further comprising: in response to the code of the application programming interface method for creating the shadow document object model not being native code, collecting identity information of the current browser and host element information.
5. The method according to claim 1, wherein determining whether the code of the application programming interface method for creating the shadow document object model is native code comprises: parsing the code of the application programming interface method for creating the shadow document object model into a string to obtain parsed content; in response to the parsed content being identical to the content of the native code, determining that the code of the application programming interface method for creating the shadow document object model is native code; and, in response to the parsed content being different from the content of the native code, determining that the code of the application programming interface method for creating the shadow document object model is not native code.
6. The method according to claim 5, wherein parsing the code of the application programming interface method for creating the shadow document object model into a string to obtain the parsed content comprises: obtaining the application programming interface method by which the host element creates a shadow document object model; checking whether the parsing method for the application programming interface method has been tampered with; and, in response to the parsing method not having been tampered with, using the parsing method to parse the code of the application programming interface method for creating the shadow document object model into a string to obtain the parsed content.
7. The method according to claim 1, further comprising: determining, according to a rendering result obtained by rendering the host element and the shadow document object model, the host element of the shadow document object model, the mode of the shadow document object model, and the content of the shadow document object model; and, in response to the host element of the shadow document object model not being the host element of the text data to be protected, the mode of the shadow document object model not being the closed mode, or the content of the shadow document object model not being the at least a portion of the text data to be protected in the hypertext markup language file, deleting the rendering result.
8. A text data protection device, comprising: an acquisition module configured to acquire an anti-crawler mechanism; an execution module configured to execute the anti-crawler mechanism to implement the following steps: acquiring at least a portion of text data to be protected in a hypertext markup language file and a host element of the text data after protection processing; determining whether code of an application programming interface method for creating a shadow document object model is native code; and, in response to the code of the application programming interface method for creating the shadow document object model being native code, using the application programming interface method to create the at least a portion of the text data to be protected in the hypertext markup language file as a closed-mode shadow document object model, and mounting the closed-mode shadow document object model on a host element specified by a user; and a rendering module configured to render the host element and the shadow document object model.
9. A computing device, comprising: a memory configured to store computer-executable instructions; and a processor configured to perform the method according to any one of claims 1 to 7 when the computer-executable instructions are executed by the processor.
10. A computer-readable storage medium storing computer-executable instructions, wherein, when the computer-executable instructions are executed, the method according to any one of claims 1 to 7 is performed.
11. A computer program product, characterized in that the computer program product comprises computer-executable instructions which, when executed, implement the method according to any one of claims 1 to 7.
CN202411120957.XA 2024-08-15 2024-08-15 Text data protection method and device, computing device and storage medium Pending CN119066705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411120957.XA CN119066705A (en) 2024-08-15 2024-08-15 Text data protection method and device, computing device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411120957.XA CN119066705A (en) 2024-08-15 2024-08-15 Text data protection method and device, computing device and storage medium

Publications (1)

Publication Number Publication Date
CN119066705A true CN119066705A (en) 2024-12-03

Family

ID=93634487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411120957.XA Pending CN119066705A (en) 2024-08-15 2024-08-15 Text data protection method and device, computing device and storage medium

Country Status (1)

Country Link
CN (1) CN119066705A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination