CN101779436B - Tracking the origins of data and controlling data transmission

Info

Publication number: CN101779436B
Application number: CN200880102951.2A
Authority: CN
Inventors: 朱利安·L·弗雷德曼; 彼得·沃顿
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2007-08-15
Filing date: 2008-08-05
Publication date: 2014-01-15
Anticipated expiration: 2028-08-05
Also published as: CA2689216C; WO2009021881A2; WO2009021881A3; KR101201003B1; JP2010536107A; US8181260B2; US20090049557A1; CN101779436A; KR20100052472A; JP5306348B2; EP2188969A2; CA2689216A1; EP2188969B1

Abstract

Provided are methods, apparatus and computer programs for tracking the origins of data and controlling transmission of the data. In one embodiment, transmission of sensitive data by script operations is limited, to prevent transmission to any network location other than to the source of that sensitive data, by a new function within a scripting engine of an HTTP client that is responsive to origin tags placed within the data. Origin tags that are associated with data inputs are propagated to any output data items, so that transmission of derived information can also be controlled.

Description

A method and system for controlling the transmission of sensitive data

Technical field

The present invention relates to methods and data processing systems for tracking the origins of data and controlling the transmission of sensitive data in scriptable clients such as Web browsers.

Background

A Web browser is a computer program that runs on a user's data processing apparatus and provides access to information on the World Wide Web, using the Hypertext Transfer Protocol (HTTP) to submit requests to Web server computers in order to retrieve Web pages and to browse or interact with their content. Some Web browsers and similar HTTP clients can interpret scripting languages. Typically JavaScript™ is used, although some Web browsers understand VBScript (Visual Basic Scripting), and the mechanism is extensible to other languages. By including scripting language instructions in the text of a page, an author can cause the page to exhibit arbitrarily complex behavior, as well as or instead of appearing as a static document, when it is viewed with a suitably capable browser. Such instructions may be included directly, or by reference to a separate file that contains them.

Script instructions embedded in a Web page are interpreted by a subsystem within the browser, the "scripting engine". The engine is itself written in a programming language (typically a more structured language such as C++ or Java™). The engine can perform a number of operations; each scripting language instruction is literally an instruction to the engine to perform a specific one of its available operations.

The scripting engine also has access to the data structures that represent the Web page itself within the browser. Some of the operations the engine can perform involve reading from or writing to these data structures, effectively editing the page as viewed within the browser. Other sources of data that can be used in script operations include script variables, which can be set with initial values or populated from any other source, and other data downloaded separately using the technique known as "XMLHttpRequest".

As well as instructions that manipulate information strictly within the scripting engine and exchange it with the internal representation of the Web page, there is a class of script instructions that cause the browser to interact with other systems via its network connection, or to perform other actions normally commanded by the human user. The selection of instructions available in this group is typically restricted, to reduce the effect that a malicious script could have.

It is important to note that, because scripts appear as part of a Web page, and because a script may itself contain information that is later inserted into the page, scripts are themselves a form of data. All considerations that apply to "data elements" in this invention apply equally to the list of instructions that makes up a script.

As an example of how the above facilities may be used, consider a Web-based document editing application delivered as a Web page using AJAX (Asynchronous JavaScript and XML) techniques. A graphical Web browser initially downloads a page that carries various display elements (such as rulers, buttons and a document editing area) and that also refers to a file of script instructions. These instructions direct the scripting engine in how to respond to the user's actions. For example, if the user activates a "bold" button, the scripting engine may be instructed first to read the page in order to determine which words in the text area the user has selected. That information is then used by another instruction, which modifies the page data structure so as to mark the text as bold. Finally, another part of the browser, called the "rendering engine", reads those marks and, as a result, causes the marked words to appear on the page in bold.

Scriptable HTTP clients, such as JavaScript-enabled Web browsers, traditionally load data from only one source location at a time. In a Web services environment, however, it is desirable to be able to combine scripts and data from several source locations in one scripting environment. Returning to the example of the Web-based document editing application, users typically have to expose and entrust their documents to the provider of the application, for example by uploading the documents to the provider's server.

Although a large number of applications are now becoming available over the World Wide Web in the form of Web services that exploit scripts running within the Web browser and communicating with a Web server, many organizations and individuals are unwilling to accept the inherent exposure of their confidential documents. This limits the use of the available applications.

An application of this kind, written in a scripting language and running within the client, could operate on data obtained from one or more different locations. However, once the script and the data have been loaded into the scripting environment, there is currently no suitable mechanism for ensuring that the script does not transmit the data back to its own server, whether maliciously or in the course of providing a service such as spell checking. Known solutions involve preventing unauthorized scripts from accessing certain files and objects, but this can be overly restrictive.

For example, US Patent No. 6,986,062 describes controlling the ability of a script to access objects based on the origin of the script and on defined permissions. Entries in a client's access control data structure include a source identifier field and a permission identifier field associated with an object. The source of a script is recorded and subsequently checked, and unauthorized scripts are prevented from accessing certain objects.

US Patent No. 6,505,300 describes a restricted execution context for untrusted content such as scripts. When a process attempts to access a resource, a token associated with the process is compared with the resource's security information to determine whether the access is allowed. The source of a script can determine how far it is trusted and which processes may be performed on particular resources.

US Patent Application No. 2006/0230452 discloses obtaining a file from an external location and adding tag information about the origin of the obtained file. The origin of the obtained file can be used in subsequent security policy decisions, such as whether to allow or block execution or rendering of the content.

Summary of the invention

A first aspect of the present invention provides a method, performed within a scripting environment, for controlling the transmission of sensitive data, the method comprising the steps of: associating an indication of origin with a first data element; propagating the indication of origin to data elements generated from the first data element; and restricting transmission of the first data element and the generated data elements to permitted destinations only, wherein the permitted destinations are identified by reference to the indication of origin.
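The three steps of the first aspect can be illustrated with a minimal Python sketch. The names `Tagged`, `associate`, `derive_upper` and `send` are hypothetical and chosen only for illustration, as is the use of a bare host name as the indication of origin; in the embodiments this logic resides inside the scripting engine rather than in code visible to scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A data element with its indication of origin (hypothetical type)."""
    value: str
    origin: str  # assumed here to be the host the data was received from


def associate(value: str, origin: str) -> Tagged:
    """Step 1: associate an indication of origin with a data element."""
    return Tagged(value, origin)


def derive_upper(element: Tagged) -> Tagged:
    """Step 2: a generated element inherits the origin of its input."""
    return Tagged(element.value.upper(), element.origin)


def send(element: Tagged, destination: str) -> bool:
    """Step 3: transmit only to a destination permitted by the origin."""
    if destination != element.origin:
        return False  # transmission blocked
    # A real client would perform the network transfer here.
    return True


secret = associate("quarterly figures", "docs.example.com")
summary = derive_upper(secret)
print(send(summary, "docs.example.com"))   # True: back to the origin
print(send(summary, "other.example.net"))  # False: blocked
```

The derived element `summary` never received an origin directly; it is restricted only because the indication of origin was propagated from `secret`.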

The first data element may be part of a set of data elements, potentially a very large set, with which the indication of origin is associated. The generated data element may likewise be one of a set of generated data elements.

In one embodiment of the invention, the first data element and any data element generated from it are prevented from being transmitted to any location other than the origin of the first data element. This can prevent script operations from transmitting sensitive data across the network to any network node other than the source of the data, while allowing scripts to cooperate within the constraints of the rules restricting data transmission.

In one embodiment, if a new data element is generated within the scripting environment from a plurality of data elements, each having an associated indication of origin, the new data element has an associated indication of origin derived from those of the plurality of data elements. If multiple sensitive input data elements have different origins, the step of restricting transmission may prevent the output data from being transmitted to any destination, whereas output derived from multiple input data elements that share a common origin may be transmitted back to that common origin while being prevented from being transmitted to any other destination.
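The rule for outputs derived from several inputs can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the indication of origin is modeled as a set of origin strings, the function names `combine` and `may_send` are hypothetical, and treating an empty set as "untagged, hence unrestricted" is an assumption of the sketch.

```python
def combine(*origin_sets: frozenset) -> frozenset:
    """Origin annotation of an output: the union of its inputs' origins."""
    return frozenset().union(*origin_sets)


def may_send(origins: frozenset, destination: str) -> bool:
    """Decide whether data carrying these origins may go to destination.

    Assumption of this sketch: untagged data (empty set) is unrestricted.
    Data with one common origin may only go back to that origin, and data
    mixing different origins may not be transmitted anywhere.
    """
    if not origins:
        return True
    return origins == frozenset({destination})


a = frozenset({"a.example.com"})
b = frozenset({"b.example.com"})

same = combine(a, a)   # both inputs share one origin
mixed = combine(a, b)  # inputs come from different origins

print(may_send(same, "a.example.com"))   # True
print(may_send(same, "b.example.com"))   # False
print(may_send(mixed, "a.example.com"))  # False
print(may_send(mixed, "b.example.com"))  # False
```

Modeling the annotation as a set keeps the propagation step a plain union, so the restriction can be decided at transmission time without knowing how the output was computed.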

The step of restricting transmission is advantageous in a scripting environment that runs "untrusted" scripts. An "untrusted" script in this context is any script for which the security controls or trustworthiness of the script or its provider have not been verified (the term "untrusted" therefore does not imply that the script or its provider has been identified as having any malicious intent).

The invention protects sensitive data while it is operated on by scripts, ensuring that a script does not send the data to any server that does not already have the information, without any need to verify the trustworthiness of the script or its provider. The steps of associating and propagating the indication of origin can be implemented by a scripting engine within a Web browser, which responds to security tags within the data by preventing any script from sending tagged data to a location other than the source location of the data. A "scripting engine" according to one embodiment includes a script interpreter, a tag generator and a tag propagator.

A second aspect of the present invention provides a data processing system comprising: at least one data processing unit; at least one data storage unit; a script interpreter; means for associating an indication of origin with a first data element; means for propagating the indication of origin to new data elements generated from the first data element; and means for restricting transmission of the first data element and the generated data elements from the data processing system to permitted destinations only, wherein the permitted destinations are identified by reference to the indication of origin.

In one embodiment of this second aspect, the script interpreter, the means for associating, the means for propagating and the means for restricting transmission are all implemented within a scriptable HTTP client computer program such as a Web browser. In one embodiment, the means for associating and the means for propagating are provided as features of an improved scripting engine within the HTTP client program, and the scripting engine may be made available as a program product comprising computer-readable program code on a recording medium.

Brief description of the drawings

Embodiments of the invention are described below in more detail, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a schematic representation of a network in which the invention may be implemented;

FIG. 2 shows an example client data processing system in which the invention may be implemented;

FIG. 3 shows the components of a scripting engine according to an embodiment of the invention;

FIG. 4 shows a sequence of interactions between a client and a server according to an embodiment of the invention, in which origin tags are generated and stored, subsequently propagated to new data elements, and then used to prevent sensitive data from being transmitted to destinations other than the origin of the data; and

FIG. 5 shows a sequence of operations for propagating annotations according to an embodiment of the invention.

Detailed description

1. Network environment and example client system

The distributed data processing network shown schematically in FIG. 1 includes any number of client data processing systems 10, 20, 30 and server data processing systems 40, 50 and 60, which communicate with each other via HTTP. Although the invention is not limited to a particular type of data processing system, in terms of either system hardware or operating system environment, a typical client system 10, as shown schematically in FIG. 2, may be a laptop or desktop data processing system comprising at least one data processing unit 100, at least one data storage unit, an internal communication unit 130, input/output components and a network interface 190. The at least one data storage unit typically includes volatile system memory 110 and a non-volatile storage component 120 such as disk storage, and the input/output components include device drivers 140 and connection interfaces 150 for a mouse 160, a keyboard 170 and a monitor 180. A typical client system used in an implementation of the invention has a number of program code components installed on it, including operating system software 200, a Web browser 210 and a number of application programs 220, 230. The Web browser 210 is adapted to interact with remote Web servers via network communications using the request-response HTTP model. Each of the Web servers 40, 50 and 60 may include one or more HTTP servers 70 and one or more application servers 80, 90, running on a single server data processing system or on a cluster of servers that cooperate to provide high availability and throughput, but the invention does not require any particular Web server architecture.

In a client system for use with the invention, the Web browser 210 includes a script interpreter, which enables scripts within downloaded Web pages to be executed and which forms the major part of a scripting engine 240. Various components and features of the invention are described below, with reference to FIGS. 3 to 6, in the context of embodiments of the invention in which several of the components are implemented as program code components of a novel scripting engine. As shown in FIG. 3, the scripting engine 240 includes a script interpreter 250, an origin tag generator 260 and a tag propagator 270. Those skilled in the art will appreciate that various components of the invention, such as the means for preventing transmission described below, could equally be implemented as hardware components, such as application-specific integrated circuits (ASICs).

As described below, the generated and propagated tags are used by the Web browser or other HTTP client to control the transmission of sensitive data elements.

2. Generation and storage of tags indicating origin

As described above, there are several kinds of data element that can be used in scripting engine operations. All of these elements are stored in data structures associated with the scripting engine 240 itself or with the Web browser 210 (or, in other implementations, with whatever other program the scripting engine is part of). Although script operations read and modify the data within these data structures as part of their internal working, there is no script operation by which a script can inspect or alter the data structures themselves for its own purposes.

In existing scripting engines, these data structures typically contain the current value of the data element together with some additional information for use by the scripting engine; one example is information used for automatic garbage collection. The present invention involves associating an indication of origin with a data element, and the "annotations" on data elements that the invention requires can be stored in the scripting engine's data structures in the same way as the other information provided for the engine's use. This means that only small changes to current Web browsers and other scriptable HTTP client programs are needed to implement the invention, and it ensures that no mechanism is available by which a script could interfere with its own annotations.

FIG. 4 shows the sequence of steps involved in the initial generation of an origin-indicating tag, or "annotation", for a set of data elements. All data that is considered sensitive must be assigned an origin tag before it is first used in a script operation. There are many ways in which data can be brought into a modern Web browser, but most of them involve the HTTP protocol (specified in the W3C Network Working Group's Request for Comments 2616, "Hypertext Transfer Protocol -- HTTP/1.1", June 1999). For such methods, a Web browser or other HTTP client adapted to implement the invention can use a special HTTP header in the request and another special HTTP header in the response. Note that the HTTP specification allows the use of additional headers; systems that do not implement the relevant functionality ignore the headers concerned.

Thus, as shown in FIG. 4, the client-server interaction sequence according to an embodiment of the invention begins with a scriptable client, such as a Web browser, sending 300 an HTTP request to a Web server. Not all HTTP clients will implement the invention, but a client that does will typically include, among the many headers that accompany each request for data, a new header indicating that fact.

Currently available Web browsers already send a great deal of information about the functionality they support, and even about the kind of content their users want, such as which (human) languages are acceptable. For example, a request might look like this:

GET / HTTP/1.1
Host: ibm.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

A new request header can be added to identify whether the requester can provide protection for "secure" content. This kind of content, "secure", is distinct from the existing content categories (it is not a MIME type, a human language, an encoding type or a character set). In the first embodiment, the new request header has the field name "Accept-Security", but this is only an illustrative example. Whatever the field name, the new header field can be a simple indication of whether the requesting client provides content protection using origin tags.

The request header "Accept-Security: origin-tag" implies that the browser can protect secure data if the server wishes to send it, but does not require secure data. The additional header can be included in every HTTP request, although this is not essential. A new header of this kind allows for future extension with different kinds of optionally implemented security. For example, the "Accept-Security:" request header and the restriction of content transmission based on origin tags could be implemented alongside another security option.
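A client that announces its capability in this way might build its request as in the following minimal sketch. The function name `build_request` is hypothetical, the field name "Accept-Security" is the illustrative name used in the first embodiment, and the exact spelling of the value `origin-tag` is an assumption of this sketch.

```python
def build_request(host: str, path: str = "/",
                  protects_sensitive_data: bool = True) -> str:
    """Serialize a minimal HTTP/1.1 GET request as text."""
    headers = {
        "Host": host,
        "Accept-Language": "en-us,en;q=0.5",
        "Connection": "keep-alive",
    }
    if protects_sensitive_data:
        # New header: the client announces that it protects tagged data.
        # The value "origin-tag" is an assumed spelling of the token.
        headers["Accept-Security"] = "origin-tag"
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Header lines end with CRLF; an empty line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_request("ibm.com"))
```

Because the header is purely additive, a server that does not recognize it can ignore it and respond as it would to any other request.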

The server accesses 310 the requested page in preparation for downloading it to the client, but the Web server may be configured to check for the presence of the new request header before transmitting any sensitive data. The check may be performed for all HTTP requests when responding to them or, if the Web server is provided with a mechanism for detecting an indication of data sensitivity, only in response to the Web server identifying sensitive data within the accessed page. In one embodiment, the indication of data sensitivity may be inserted when the data is created or stored on a content server.

The particular order of the checks, that is, whether client capability is checked before or after data sensitivity, is not critical to the invention. If the Web server identifies sensitive information within the requested page and determines that the requesting client has not undertaken to protect that data (that is, the new header was not included in the request), a server according to one embodiment of the invention replies 320 to the client either by rejecting the request or by providing the requested page without the sensitive data.

Web servers can be programmed and configured to behave in different ways (and some Web servers may not implement any new functionality to support the invention), but the new request header provided by the invention makes it possible for a server to make an informed decision about whether to provide sensitive data. A server can then refuse to provide its sensitive data to any client that has not informed the server of the client's capability and undertaking to protect sensitive data.

Note that the embodiment described above does not include any mechanism for stopping an HTTP client that claims to protect sensitive data but does not in fact do so. In this first embodiment, the HTTP client is regarded as the agent of a human user, and the scope of the invention extends only to data that the user is trusted with. Known user authentication solutions should be used to ensure that only authorized users can obtain sensitive data.

If the Web server identifies sensitive data in the requested page, and inspection of the fields in the request headers determines that the requesting client provides protection for sensitive data, the Web server transmits 330 the requested page and a "sensitive data" flag to the requesting client. Although it is not part of the invention, it is generally desirable to protect any transmission of sensitive data with a security mechanism such as Secure Sockets Layer, to prevent its theft in transit.
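The server-side decision described in steps 310 to 330 can be sketched as a single function. This is an illustrative sketch only: the function name `respond`, the response header name `X-Sensitive-Data`, the request value `origin-tag`, the `[withheld]` placeholder and the use of status 403 for a rejected request are all assumptions, since the text specifies only the behavior and not these details.

```python
def respond(request_headers: dict, page: str, sensitive_parts: list,
            reject_unprotected: bool = False) -> tuple:
    """Return (status, body, response_headers) for a requested page.

    Header names are assumed to be already normalized to the spellings
    used here; real HTTP header names are case-insensitive.
    """
    client_protects = "origin-tag" in request_headers.get("Accept-Security", "")
    if not sensitive_parts:
        # Nothing sensitive in the page: respond as usual.
        return 200, page, {}
    if client_protects:
        # Step 330: send the page together with a "sensitive data" flag.
        return 200, page, {"X-Sensitive-Data": "true"}
    if reject_unprotected:
        # Step 320, first option: reject the request.
        return 403, "", {}
    # Step 320, second option: serve the page without the sensitive data.
    redacted = page
    for part in sensitive_parts:
        redacted = redacted.replace(part, "[withheld]")
    return 200, redacted, {}


page = "Budget: 42M. Contact: helpdesk."
print(respond({"Accept-Security": "origin-tag"}, page, ["42M"]))
print(respond({}, page, ["42M"]))
print(respond({}, page, ["42M"], reject_unprotected=True))
```

Whether an unprotected client is refused or served a redacted page is a server configuration choice, which the sketch exposes as the `reject_unprotected` parameter.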

In this embodiment, therefore, the determination of which data is sensitive is made by the server that provides the data. According to the invention, when sensitive data is provided 330 in response to an HTTP request, the response carries a header indicating that its content is to be treated as sensitive data. The "sensitive data" header may include an explicit instruction from the Web server about the specific origin information to include in the origin tag, in which case the origin tag generator 260 simply acts on that instruction. If the client that receives 340 the data is scriptable, and the data is to be placed in an area of storage accessible to script operations, the client must attach 350 an origin tag to the data when the data is saved and before any script is executed. Thereafter, origin tags are propagated 360 from the received data elements to any new data elements generated within the client (as described below), and the origin tags are used to control transmission 370 of the tagged data elements (as described below).
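Steps 340 and 350 on the client side can be illustrated as follows. The sketch is hedged throughout: the header names `X-Sensitive-Data` and `X-Origin-Tag`, the `StoredElement` type and the fallback to the host of the request URL when the server gives no explicit origin are assumptions made for illustration, not details taken from the text.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class StoredElement:
    """A value as held in engine storage, with its origin annotation."""
    value: str
    origin: Optional[str] = None  # None means "not tagged as sensitive"


def store_response(request_url: str, response_headers: dict,
                   body: str) -> StoredElement:
    """Attach an origin tag while saving data, before any script runs."""
    if response_headers.get("X-Sensitive-Data") != "true":
        return StoredElement(body)
    # Prefer an explicit instruction from the server; otherwise fall back
    # (an assumption of this sketch) to the host the data was fetched from.
    explicit = response_headers.get("X-Origin-Tag")
    origin = explicit if explicit else urlsplit(request_url).hostname
    return StoredElement(body, origin)


url = "https://secretdocstore.r-and-d.company.com/plan.html"
print(store_response(url, {}, "public text"))
print(store_response(url, {"X-Sensitive-Data": "true"}, "secret plan"))
print(store_response(url, {"X-Sensitive-Data": "true",
                           "X-Origin-Tag": "company.com"}, "secret plan"))
```

Tagging at the moment of storage means there is no interval in which a script could read the data before the annotation exists.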

The server can determine how precisely the origin is to be described in the tag. For example, a small company might serve all documents on its intranet with a header that causes the browser to apply an origin tag of "company.com" to the data. Scripts could then send the data to any system within the company, but nowhere outside it. A larger company might restrict some of its data more tightly, perhaps with headers that cause the browser to tag it "secretdocstore.r-and-d.company.com" or "webapps.finance.company.com". A Web application on a shared server, unable to trust other applications that may also reside there, could serve data so that it is tagged with the domain "apps.sharedhostingcompany.com", the port number "8080" and the file path "/customers/account42/secretdocs". This gives an indication of origin based on location information that is under the exclusive control of the particular Web application, whereas other locations on the same server could be specified less precisely. A location under the exclusive control of the Web application could thus be specified as follows:

http://apps.sharedhostingcompany.com:8080/customers/account42/secretdocs/

The tagging scheme described above is therefore precise enough to handle shared servers; where no shared server is involved, the host name alone may suffice.
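The two levels of precision described above can be illustrated with a minimal sketch. The `OriginTag` class and the `tag_from_location` helper are hypothetical names introduced here for illustration; the patent does not prescribe a concrete representation of an origin tag.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class OriginTag:
    """Hypothetical origin tag: host is mandatory, port and path are optional."""
    host: str
    port: Optional[int] = None   # None: the tag does not constrain the port
    path: Optional[str] = None   # None: the tag does not constrain the path


def tag_from_location(location: str) -> OriginTag:
    """Build a tag from a full URL, or from a bare host name such as 'company.com'."""
    if "://" not in location:
        return OriginTag(host=location.lower())
    parts = urlsplit(location)
    path = parts.path.rstrip("/") or None
    return OriginTag(host=parts.hostname or "", port=parts.port, path=path)


# Coarse tag for a small company's intranet.
coarse = tag_from_location("company.com")

# Precise tag for an application on a shared server.
precise = tag_from_location(
    "http://apps.sharedhostingcompany.com:8080/customers/account42/secretdocs/"
)
```

The optional fields reflect the observation that a host name may be enough when the server is not shared.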

Information entered into script dialog boxes and text boxes on a page, and data loaded from local files, has no origin server to which the question of sensitivity can be delegated. This can be handled by tagging information that is obtained from the user, or otherwise obtained from or generated on the computer on which the scripting engine is running, with a special origin tag that means "local". This tag is broadly similar to other origin tags, but may be given special treatment as described below.

Most browsers can also access data via protocols other than HTTP, of which the File Transfer Protocol (FTP) is a widely used example. In some cases these protocols can also be extended with an additional header as described above for HTTP; in other cases it is more appropriate to treat the data in the same way as local files, using the "local" origin tag described above (for example, where the browser implements the SMB or NFS shared file system protocols). Even if some esoteric protocols are left without an origin tagging system, the invention remains useful and effective, because those protocols are unlikely to be used by the kinds of web application that would benefit from the invention.

In one embodiment of the invention, the technique described above for generating and storing annotations is enhanced to reduce the storage required for annotation propagation and to allow it to be performed more quickly. In a typical application only a small number of annotation values (perhaps only one) will be in use at any time, but those values may apply to a large number of individual data elements. The enhancement exploits this property by using a central table of origin values. Each annotation then only needs to point to the appropriate row of the table, and does not need to carry its own copy of the value.
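A minimal sketch of such a central table follows. The `OriginTable` class and its method names are assumptions made for illustration; the patent describes only the idea of annotations that point to a table row.

```python
class OriginTable:
    """Hypothetical central table of distinct origin values.

    Each annotation stores only a small integer row index rather than
    its own copy of the origin string.
    """

    def __init__(self):
        self._rows = []    # row index -> origin value
        self._index = {}   # origin value -> row index

    def intern(self, origin):
        """Return the row index for an origin, adding a row on first use."""
        row = self._index.get(origin)
        if row is None:
            row = len(self._rows)
            self._rows.append(origin)
            self._index[origin] = row
        return row

    def lookup(self, row):
        """Return the origin value stored in the given row."""
        return self._rows[row]


table = OriginTable()
# A thousand data elements from the same origin share a single table row.
annotations = [table.intern("example.com") for _ in range(1000)]
```

Comparing two annotations then reduces to comparing two integers, which is one way the enhancement could speed up propagation.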

3. Propagation of tags

Once a data item has an origin tag, the same tag is propagated 360 to any other data derived from it. More specifically, for every script operation that has both inputs and outputs, any origin tags present in the data structures of the inputs are transferred to the corresponding data structures of the outputs by the tag propagator component 270 of the scripting engine 240. For example, the JavaScript String.substring() method is an operation that takes a text string as input and produces a shorter text string as output. If this method is used in a scripting engine 240 according to the invention to extract some text from a page securely obtained from "example.com", the extracted text will also be tagged with the origin "example.com". For this embodiment, any scripting operation performed on an object should be regarded as taking that object as an input.
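The substring example can be sketched as follows. This is an illustrative model in Python, not the engine's actual code: `TaggedStr` and its `origin` field are hypothetical, and a real engine would attach the tag inside its own string representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedStr:
    """Hypothetical string value carrying an optional origin tag."""
    value: str
    origin: Optional[str] = None   # None means the value is untagged

    def substring(self, start: int, end: int) -> "TaggedStr":
        # Single-input operation: the output inherits the input's tag.
        return TaggedStr(self.value[start:end], self.origin)


page_text = TaggedStr("Account balance: 1234.56", origin="example.com")
extract = page_text.substring(17, 24)
```

The script that calls `substring` is unchanged; the tag travels with the value without any action by the script author.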

As stated in the Background section above, scripting engines are typically written in a structured language and provide certain operations. These operations are defined by program code written in that structured language. In this embodiment, the additional behavior described in this section is implemented by adding tag propagation functionality to the program code within the scripting engine. Scripts continue to request the same series of operations as they would in a scripting engine that does not implement the invention, but origin tags, once generated 350, are propagated 360 transparently to the user and the script programmer.

When a script operation acts on more than one input item, and at least two of the input items have origin tags that do not all come from the same origin, the output data acquires a special tag. Because tagged data cannot be sent anywhere other than to its origin, and the new data derives from multiple origins, there is no location to which it can be sent under the rules for permitted transmission. The special tag therefore indicates "nowhere", and data tagged in this way cannot leave the scripting environment.

In this particular embodiment, the propagation described above makes use of four increasingly restrictive origin tag types:

None: The item has no origin tag, because the HTTP session or other means by which it was obtained was not designated as secure.

Local: The item was entered by the user or loaded from a local file.

Origin: A standard origin tag indicating the location from which the data was obtained. Many different origins may be in use simultaneously in a scripting environment.

Nowhere: The path by which the data item was obtained involves data tagged with more than one origin.

Table 1 shows how these kinds of tags interact in the case of a two-input scripting operation:

Table 1: output tag type for each combination of the tag type on input one and the tag type on input two

As for an operation with two inputs, a scripting operation with three or more input elements assigns to its output(s) the most restrictive tag type found among its inputs, except that if the most restrictive type is "origin" and more than one origin location is present, "nowhere" is used for the output as described above. In the case of a single-input operation, all outputs simply inherit the tag of the input item.
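The combination rule just stated can be sketched for any number of inputs. The names `TagType`, `Tag` and `combine` are hypothetical, and the numeric ranking simply encodes the increasing restrictiveness of the four tag types listed earlier.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Optional


class TagType(IntEnum):
    # Ordered from least to most restrictive.
    NONE = 0
    LOCAL = 1
    ORIGIN = 2
    NOWHERE = 3


@dataclass(frozen=True)
class Tag:
    kind: TagType
    origin: Optional[str] = None   # set only when kind is ORIGIN


def combine(inputs: Iterable[Tag]) -> Tag:
    """Return the tag to attach to the outputs of an operation on `inputs`."""
    tags = list(inputs)
    if not tags:
        return Tag(TagType.NONE)
    strictest = max(tag.kind for tag in tags)
    if strictest == TagType.ORIGIN:
        origins = {tag.origin for tag in tags if tag.kind == TagType.ORIGIN}
        if len(origins) > 1:
            # Data derived from several origins may not be sent anywhere.
            return Tag(TagType.NOWHERE)
        return Tag(TagType.ORIGIN, origins.pop())
    return Tag(strictest)


mixed = combine([Tag(TagType.ORIGIN, "a.example"), Tag(TagType.ORIGIN, "b.example")])
```

With a single input, `combine` returns that input's tag unchanged, which matches the single-input case described above.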

Fig. 5 shows a series of steps for determining the tags to associate with the outputs of a scripting operation, although the specific sequence of operations shown in Fig. 5 relates to only one illustrative embodiment of the invention. According to this embodiment, when the HTTP client retrieves 400 input data (such as a retrieved web page) from its local storage to begin processing, it processes the input data in the conventional manner until a script is identified. The HTTP client then invokes its script interpreter 250. Before executing each script instruction, the script interpreter extracts 410 the set of tags associated with the input data elements and passes it to the tag propagator 270. The tag propagator then applies a set of rules to determine the appropriate tags for the outputs of the scripting operation. A shortcut through this processing is possible if a check 420 for the presence of any origin tags on the input data determines that no tags are associated with the input data elements. This shortcut is shown by the arrow between step 420 and step 500 in Fig. 5.

If the inputs have no tags, as determined in step 420, the outputs will also have no associated tags. However, if new data elements are entered by the user or loaded from a local file, these inputs are treated as having the tag type "local", in which case the outputs of the scripting engine's processing will also have the associated tag type "local" (or a more restrictive tag type, if combined with other tagged inputs). This is described in more detail below.

If the determination at step 420 is positive, because relevant input tags are present, the tag propagation processing according to this embodiment is carried out in a sequence that begins with the most restrictive tag type. That is, the first step performed by the tag propagator 270 comprises determining 430 whether any input has the tag type "nowhere". If so, the tag type "nowhere" is associated 440 with the output data element or elements of the scripting engine.

How tags are used to control data transmission is described in more detail in section 4 below, following the description of this exemplary propagation sequence. An example of that control is nevertheless given here by way of illustration. When a data element has the associated tag type "nowhere", the script interpreter is able to execute 500 script instructions, except that the operations of the HTTP client in response to those script instructions are subject to the restrictions imposed by the associated "nowhere" tag. Specifically, data elements with the "nowhere" tag are prevented from being transmitted outside the scripting environment.

Returning to the propagation sequence of Fig. 5, if no input element has the tag type "nowhere", the propagator determines 450 whether the particular script instruction has multiple inputs carrying different "origin" tags. If so, the tag type "nowhere" is associated 440 with the output data element or elements.

If the determination at step 450 is negative, the propagator determines 460 whether the input data elements of the current script operation carry a single "origin" type tag (that is, there is only one sensitive data element, or all sensitive data elements have the same origin). If so, the relevant "origin" type tag is associated 470 with the output data elements of the script instruction.

If the determination at step 460 is negative, the propagator determines 480 whether any input to the current script instruction has an associated "local" tag type (including data elements entered by the user or loaded from a local file, which are regarded as having the "local" tag, as described above). If so, the "local" tag is associated 490 with the output data elements of the script operation.

Having determined the tags to associate with the outputs of the scripting operation, the scripting engine within the HTTP client then carries out its processing 500, subject to the restrictions on data transmission imposed by the tags as described below. Steps 410 to 500 are repeated for each script instruction within the script, and the scripting engine propagates tags as it executes.

In addition to scripting operations, there may be other operations that act on the scripting domain but are not themselves script operations. For example, a user typing in a text box is creating "local" data, but the user may combine that "local" data with "origin" data. In this case, the tag propagator component (or an equivalent component running elsewhere in the HTTP client) is expected to apply the rules to determine the appropriate tag for the resulting data. In this embodiment, the HTTP client invokes the tag propagator to determine the appropriate tag for any such combination of tagged data. In other embodiments, the tag propagator may be implemented as a component separate from the scripting engine. In the example of a user typing into a text box, the scripting engine is typically invoked for each keystroke so that it can activate any part of a script that has registered to be notified of changes, so the tag propagator component of the scripting engine is easily invoked to determine the appropriate tag for the new data.

4. Using the indication of origin to control data transmission

Certain script operations involve the transmission of data from the web browser to a server anywhere on the network. For example, an HTML form (as defined in RFC 1866) may be submitted under script control, XMLHttpRequest (described above) may be used, or the browser may be instructed to visit a URL whose "path", "query" or "fragment" part has been specially constructed as a means of carrying data. The invention defines additional processing that must be performed before any data is allowed to leave the browser. This behavior is implemented in the browser's structured-language program code and requires no changes to the scripts that run within it.

Any scripting operation identified as capable of transmitting data is made dependent on the tags of its input data. The input data of one script instruction may be the output of an earlier script instruction, in which case the tag propagation described above will already have determined the relevant tags for controlling data transmission. Transmission control may be implemented as a component of the scripting engine 240, or may be implemented separately within the browser. If the data may not be transmitted under the rules defined below, the transmission operation must not complete. The operation preferably also reports an error (not necessarily shown to the user).

Some transmission operations may take multiple inputs, which may have different tags. In this case the tags should be combined, as described above with reference to the tag propagator 270, before the transmission rules are applied.

If the data to be transmitted has no origin tag of any type, it can be sent freely. This is the case for browsers that do not implement the invention (unless an alternative data protection mechanism is provided), and for browsers handling data that is not sensitive.

If the data carries the special "nowhere" tag, it cannot be transmitted outside the local scripting environment.

If the data carries an origin tag of type "origin", it may be transmitted, but only to a location compatible with its tag. A compatible location is one that:

● has the same DNS domain or is a subdomain of it (for example, data whose origin is "dev.company.com" may be sent to "docstore.dev.company.com"); and

● resides on the Internet Protocol port corresponding to the port number specified in the tag (if the tag specifies a port number); and

● if the tag specifies a path, also lies on that path or one of its descendants (for example, data whose origin includes the path "/apps/expensetool" may be sent to "/apps/expensetool/submitform").
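The three compatibility conditions can be sketched as a single check. `OriginTag`, `is_compatible` and the `DEFAULT_PORTS` mapping are hypothetical; in particular, falling back to the scheme's default port when the destination URL omits a port is an assumption of this sketch, not something the patent specifies.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class OriginTag:
    host: str
    port: Optional[int] = None   # None: port is not constrained
    path: Optional[str] = None   # None: path is not constrained


# Assumption: an omitted port means the scheme's well-known default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def is_compatible(tag: OriginTag, destination_url: str) -> bool:
    """Return True if data carrying `tag` may be sent to `destination_url`."""
    dest = urlsplit(destination_url)
    host = dest.hostname or ""
    # Same DNS domain, or a subdomain of it (note the leading dot).
    if host != tag.host and not host.endswith("." + tag.host):
        return False
    # Port must match when the tag specifies one.
    if tag.port is not None:
        port = dest.port or DEFAULT_PORTS.get(dest.scheme)
        if port != tag.port:
            return False
    # Path must be the tagged path or one of its descendants.
    if tag.path is not None:
        base = tag.path.rstrip("/")
        if dest.path != base and not dest.path.startswith(base + "/"):
            return False
    return True


dev_tag = OriginTag("dev.company.com")
tool_tag = OriginTag("company.com", path="/apps/expensetool")
```

Matching on "." plus the host, and on the path plus "/", keeps look-alike destinations such as "evildev.company.com" or "/apps/expensetoolbox" from being treated as compatible.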

If the data carries the special "local" tag, the browser may implement the following rules:

1. If no data with an "origin" or "nowhere" tag exists within the scripting engine, the browser can assume that no sensitive data is involved and allow "local" data to be sent anywhere. This has the advantage of not changing the behavior of existing web pages and applications, but alternative behavior for data with the "local" tag may be preferred, to avoid a potential security exposure when a new data element is first created.

2. If some data in the engine is sensitive, the browser may be configured to block the transmission of "local" data, or to ask the user before transmitting "local" information. If the decision is delegated to the user, however, it must be recognized that users sometimes accept security exposures that they should not accept. In many cases, data entered by the user will already have been combined with origin-tagged data by normal processing before it is sent, so the rules for "origin" tags, rather than those for the "local" tag, will apply.

3. When all the origin tags in the browser are the same, many cases can be handled without resorting to user control by allowing "local" data to be transmitted to that origin (while still questioning its distribution elsewhere). In common cases, such as a secure document editing application, this solution balances security and usability.
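One way to read the three rules together is sketched below. The `Decision` enum, the function name and its parameters are hypothetical. Treating the presence of any "nowhere" data as disqualifying rule 3 is a conservative assumption of this sketch, and the destination check is reduced to a host comparison for brevity.

```python
from enum import Enum
from typing import Set


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ASK_USER = "ask_user"


def decide_local_transmission(
    origins_in_engine: Set[str],
    nowhere_present: bool,
    destination_host: str,
    ask_user: bool = True,
) -> Decision:
    """Decide whether data tagged "local" may be sent to `destination_host`."""
    # Rule 1: no sensitive data in the engine, so "local" data may go anywhere.
    if not origins_in_engine and not nowhere_present:
        return Decision.ALLOW
    # Rule 3: exactly one origin in use; allow transmission to that origin.
    if len(origins_in_engine) == 1 and not nowhere_present:
        (origin,) = origins_in_engine
        if destination_host == origin or destination_host.endswith("." + origin):
            return Decision.ALLOW
    # Rule 2: otherwise block, or ask the user, as configured.
    return Decision.ASK_USER if ask_user else Decision.BLOCK


editor_case = decide_local_transmission({"docs.company.com"}, False, "docs.company.com")
```

The `ask_user` flag stands in for the browser configuration mentioned in rule 2.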

The set of rules described above strikes a balance between data security and disruption of the user experience, while recognizing that the browser may have no explicit knowledge of the sensitivity of such information.

5. Extension to encrypted data

The features described above provide a complete and advantageous system for controlling the transmission of sensitive information. However, these facilities can be extended to provide further useful capabilities by including the concept of encrypted data. Note that this is entirely separate from any SSL encryption of the data stream in transit.

This facility requires scripts to be specially written to take advantage of it, unlike the basic invention, which can be implemented with unmodified scripts and pages. To support the extension, new operations are made available in the scripting engine that can convert data to and from an encrypted form using a suitable public-key cipher.

A fifth type of origin tag is added: "encrypted". The rules for encrypted data are as follows:

Creation: Data may arrive already encrypted, in which case this is indicated by an appropriate HTTP header in the transmission that carries it. When the scripting engine's explicit encryption support is used to perform a scripting operation that encrypts existing data, the result is also tagged "encrypted", regardless of any previous tag it may have carried.

Propagation: In general, combining encrypted data with other information is unlikely to be useful. It is nevertheless possible, in which case the output acquires the tag of the input element that is not encrypted (or the most restrictive such tag if there is more than one), as for the normal propagation described above.

Decryption: When encrypted data is decrypted using the scripting engine's facilities, the output is given an origin tag of "nowhere" and cannot be transmitted.

Transmission: Data tagged "encrypted" can be transmitted anywhere, because it is useless to any entity that does not hold the key.
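The creation, decryption and transmission rules can be modeled as tag transitions. This sketch deliberately performs no cryptography; it tracks only how the tag of a value changes. The names are hypothetical, and the propagation rule for mixed inputs is left out.

```python
from enum import Enum


class TagType(Enum):
    NONE = "none"
    LOCAL = "local"
    ORIGIN = "origin"
    NOWHERE = "nowhere"
    ENCRYPTED = "encrypted"


def tag_after_encrypt(previous: TagType) -> TagType:
    """Creation: the encrypted result is tagged "encrypted" whatever it carried before."""
    return TagType.ENCRYPTED


def tag_after_decrypt(previous: TagType) -> TagType:
    """Decryption: the plaintext output is tagged "nowhere"."""
    if previous is not TagType.ENCRYPTED:
        raise ValueError("only data tagged 'encrypted' can be decrypted")
    return TagType.NOWHERE


def may_be_sent_anywhere(tag: TagType) -> bool:
    """Transmission: untagged and encrypted data are unrestricted."""
    return tag in (TagType.NONE, TagType.ENCRYPTED)


# A "nowhere" value becomes transmissible once it has been encrypted.
sealed = tag_after_encrypt(TagType.NOWHERE)
opened = tag_after_decrypt(sealed)
```

Because decryption always yields "nowhere", a script cannot launder sensitive data by encrypting it and decrypting it again inside the browser.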

Thus, the invention associates an indication of origin with sensitive data (whether by tagging individual data elements or by tagging web pages, files or databases that contain sensitive data), and then controls the onward transmission of that data according to the tag. Sensitive data may be entered by the user into a client data processing system, or may be held on a secure content server, for example within a stored web page, but the invention is applicable to any data judged sensitive enough to justify controlling its onward transmission. In the embodiments described above, a great deal of data will not have its onward transmission constrained in this way, because much of the data available via the Internet is intended to be publicly available, and because the person or organization that created or compiled a particular piece of information may have judged that it needs no protection. However, there is also a great deal of private information that is accessible via data processing networks and may be exposed by "untrusted" scripts, and this information still needs to be protected if users are to make safe and effective use of the available web services and scripts without exposing their information to others.

Those skilled in the art will appreciate that the specific embodiments of the invention described above may be modified or extended in various ways within the scope of the invention, and the above detailed description of embodiments should therefore be regarded as illustrative rather than as limiting the invention.

For example, the new request header described above represents only one possible implementation of the invention, and alternative embodiments of the invention do not require any new request header. In one alternative, the security constraints on the use of requests of type XMLHttpRequest are relaxed and replaced by modifying the scripting engine according to an embodiment of the invention and tagging data as sensitive whenever it is obtained from a third-party website. Requests of type XMLHttpRequest currently cannot be used by a script (at least not directly) to obtain data from a server other than the one from which the script was downloaded. This prevents attacks in which malicious JavaScript exploits the user's trust, or the location of the user's data processing system behind a firewall, to read data from a third-party site into the web browser and then transmit that data back to the site from which the script originated. This is known as a "cross-site scripting vulnerability". Using origin tags within a scripting engine according to an embodiment of the invention, and identifying downloaded data as sensitive, can prevent such attacks without relying on the known restrictions on XMLHttpRequest. Data can therefore be permitted to be downloaded from third-party sites for useful processing, while being tagged as sensitive to maintain security.

Secondly, different browsers may implement different mechanisms for inserting origin tags in response to receiving data that includes a "sensitive data" flag, and the categories and specific representation of origin tags within the relevant data structures may also vary between embodiments of the invention. Having read the above description of the desired behavior, those skilled in the art will recognize various implementation options within the scope of the invention as defined in this specification.

Claims

1. A method, carried out in a scripting environment, for controlling the transmission of sensitive data, comprising the steps of:

associating an indication of origin with a first data element;

propagating the indication of origin to a data element generated from the first data element, wherein the step of propagating the indication of origin comprises:

identifying a set of one or more inputs to a scripting operation, the scripting operation generating one or more outputs;

identifying any indication of origin associated with the one or more inputs;

deriving a secondary indication of origin from the indications of origin of the inputs; and

saving the secondary indication of origin in association with the one or more generated outputs; and

restricting transmission of the first data element and the generated data element to permitted destinations only, wherein the permitted destinations are identified by reference to the indication of origin;

wherein the step of associating an indication of origin with the first data element, and the step of restricting transmission of the first data element and the generated data element to permitted destinations only, are both carried out in response to determining that the first data element comprises sensitive data.

2. The method of claim 1, wherein the first data element is obtained from a source data processing system remote from the scripting environment, and wherein the step of restricting transmission comprises preventing the first data element from being transmitted to any destination other than the source data processing system.

3. The method of claim 1, wherein the step of associating an indication of origin with the first data element is carried out in response to determining that the first data element is to be processed by a scripting operation.

4. The method of claim 1, wherein the steps of associating, propagating and restricting are carried out on a client data processing apparatus, the client data processing apparatus comprising a network connection interface for communicating with a remote server data processing apparatus,

wherein the method further comprises:

sending a request via the network connection interface to the remote server data processing apparatus, to retrieve the first data element from the remote server data processing apparatus; and

receiving data from the remote server data processing apparatus.

5. The method of claim 4, wherein the step of sending a request comprises specifying in the request that the client data processing apparatus comprises means for carrying out the step of restricting transmission.

6. The method of claim 4, wherein the step of associating an indication of origin with the first data element is carried out in response to receiving data from the remote server data processing apparatus, the data comprising an indication that the data comprises sensitive data.

7. The method of claim 4, wherein the steps of sending and receiving are carried out by an HTTP client program, and wherein the first data element is obtained by the HTTP client program in response to an HTTP request, the HTTP request comprising an indication that the HTTP client program can provide protection for sensitive data.

8. The method of any one of claims 4 to 7, wherein the steps of associating and propagating are implemented by a scripting engine running in the client data processing apparatus.

9. The method of claim 1, wherein the first data element is retrieved by a scripting operation from a data processing system remote from the scripting environment.

10. A data processing system, operating in a scripting environment, for controlling the transmission of sensitive data, comprising:

means for associating an indication of origin with a first data element;

means for propagating the indication of origin to a data element generated from the first data element, wherein the means for propagating comprises:

means for identifying a set of one or more inputs to a scripting operation, the scripting operation generating one or more outputs;

means for identifying any indication of origin associated with the one or more inputs;

means for deriving a secondary indication of origin from the indications of origin of the inputs; and

means for saving the secondary indication of origin in association with the one or more generated outputs; and

means for restricting transmission of the first data element and the generated data element from the data processing system to permitted destinations only, wherein the permitted destinations are identified by reference to the indication of origin;

wherein the means for associating an indication of origin with the first data element, and the means for restricting transmission of the first data element and the generated data element from the data processing system to permitted destinations only, both operate in response to determining that the first data element comprises sensitive data.

11. The data processing system of claim 10, wherein the first data element is obtained from a source data processing system remote from the scripting environment, and wherein the means for restricting transmission of the first data element and the generated data element from the data processing system to permitted destinations only comprises means for preventing the first data element from being transmitted to any destination other than the source data processing system.

12. The data processing system of claim 10, wherein the means for associating an indication of origin with the first data element operates in response to determining that the first data element is to be processed by a scripting operation.

13. The data processing system of claim 10, wherein the first data element is retrieved by a scripting operation from a data processing system remote from the scripting environment.