CN100350406C

CN100350406C - Method, system and computer software for providing genomic web portal

Info

Publication number: CN100350406C
Application number: CNB018041396A
Authority: CN
Inventors: 大卫M·克拉福德; 弗农A·诺维尔
Original assignee: Affymetrix Inc
Current assignee: Affymetrix Inc
Priority date: 2000-01-25
Filing date: 2001-01-24
Publication date: 2007-11-21
Anticipated expiration: 2021-01-24
Also published as: JP2003521057A; CA2398382A1; AU2001237965A1; EP1252513A2; WO2001056216A2; WO2001056216A3; EP1252513A4; WO2001056216A9; CN1426534A

Abstract

Systems, methods, and computer program products are described that process inquiries or orders regarding the purchase of biological devices, substances, or related reagents. In some implementations, a user selects probe set identifiers that identify microarray probe sets capable of detecting biological molecules. The corresponding genes or ESTs are identified and correlated with related product data, and the product data is provided to the user. The user may then select products for purchase based on the product data, in which case the user's account may be adjusted based on the purchase order. In the same or other implementations, a local genomic database is periodically updated. In response to a user selection of probe set identifiers, data related to the corresponding genes or ESTs is provided to the user from the local genomic database.

Description

Method and system for providing GeneNet portal

Related Application

This application claims priority to U.S. Provisional Patent Application Serial No. 60/178,077, entitled "Method, System, and Computer Software for Providing GeneNet Portal," filed January 25, 2000, which is hereby incorporated by reference in its entirety for all purposes.

Background

The present invention relates to the field of bioinformatics and, in particular, to computer systems, methods, and products for providing genomic information over a network such as the Internet.

Research in molecular biology, biochemistry, and many health-related fields requires the organization and analysis of large amounts of complex data generated by new experimental techniques. These tasks are addressed by the rapidly developing field of bioinformatics. See, e.g., H. Rashidi and K. Buehler, Bioinformatics Basics: Applications in Biological Science and Medicine (CRC Press, London, 2000); Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (B.F. Ouellette and A.D. Baxevanis, eds., Wiley & Sons, Inc., 1998), both of which are hereby incorporated by reference in their entireties. Broadly speaking, one area of bioinformatics applies computational techniques to large genomic databases, which are often distributed over and accessed through networks such as the Internet, in order to elucidate relationships among gene structure and/or location, protein function, and metabolic processes.

Summary of the Invention

The expanding use of microarray technology is one of the forces driving the development of bioinformatics. In particular, microarrays and associated instrumentation and computer systems have been developed for the rapid, large-scale collection of data about the expression of genes or expressed sequence tags (ESTs) in tissue samples. Among other uses, the data may be used to study genetic characteristics and to detect mutations relevant to genetic and other diseases or conditions. More specifically, data obtained through microarray experiments are valuable to researchers because, among other reasons, many disease states can be characterized by differences in the expression levels of various genes, whether through changes in the copy number of the genetic DNA or through changes in the levels of transcription of specific genes (e.g., through control of initiation, provision of RNA precursors, or RNA processing). Thus, for example, researchers use microarrays to answer questions such as: Which genes are expressed in cells of a malignant tumor but not in healthy tissue, or not in tissue treated according to a particular regimen? Which genes or ESTs are expressed in particular tissues but not in others? Which genes or ESTs are expressed in particular species but not in others? Data collection, however, is only a first step in answering these and other questions. Extracting biologically meaningful information from the large amounts of data generated by microarray technology, and designing improved experimental equipment, is a major challenge for researchers. What is needed are advanced tools and information that help researchers perform these tasks.

Systems, methods, and computer program products are described herein to address these and other needs. In some implementations, a web portal processes inquiries or orders regarding the purchase of biological devices or substances, or related reagents. A user selects "probe set identifiers" (a broad term described below), which may be associated with probe sets of one or more probes. These probes are capable of detecting biological molecules, which include, but are not limited to, nucleic acids such as expressed DNA or mRNA transcripts of the corresponding genes (for convenience, such nucleic acids are hereafter referred to simply as "mRNA transcripts"). The corresponding genes or ESTs are identified and correlated with related data, which is provided to the user. In some aspects, the user may select products for purchase based on the data. If the user decides to make a purchase, the user's account is adjusted according to the purchase order.

One advantage of these implementations is that a user can be presented with product suggestions tailored to an experiment, based on the results of an initial experiment. These preliminary results are conveyed by the user's selection of probe set identifiers, e.g., by specifying those probe set identifiers that correspond to probes showing a relatively high degree of differential expression between control and experimental samples.

In the same or other implementations, a local genomic database is periodically updated. In some aspects, the updates may be made from remote databases. In response to a user selection of probe set identifiers, data related to the corresponding genes or ESTs is provided to the user from the local genomic database. In other aspects, data related to genes or ESTs is provided to the user from the local genomic database in response to a user selection of gene and/or EST identifiers.

Advantages of these implementations include the ability of a user to initiate a data request based on the results of an experiment. As just one example, the user may indicate those results by selecting probe set identifiers that correspond to relatively high differential gene expression. These implementations may also be advantageous because the genomic data is locally available at the time of the user's request, so that responding to the request generally does not require querying a remote database. Instead, the remote databases are queried periodically, e.g., weekly. Thus, even if the user's selection includes a large number of probe set identifiers, indicating the expression or differential expression of a large number of genes and ESTs, a response can be provided to the user quickly from the local genomic database. The significant delays that would result from multiple or batched queries of remote databases are generally avoided.
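The trade-off described above (periodic remote queries, purely local answers at request time) can be illustrated with a minimal Python sketch. The class name `LocalGenomicDatabase`, the `fetch_remote` callable, the identifiers such as `PS-001`, and the record layout are all hypothetical; the specification does not prescribe any particular data structure or schedule beyond "periodically, e.g., weekly."

```python
from datetime import datetime, timedelta


class LocalGenomicDatabase:
    """Local store of gene/EST records, refreshed on a schedule rather than per request."""

    def __init__(self, fetch_remote, refresh_interval=timedelta(weeks=1)):
        # fetch_remote is a hypothetical callable returning {probe_set_id: record}
        self._fetch_remote = fetch_remote
        self._refresh_interval = refresh_interval
        self._records = {}
        self._last_refresh = None

    def refresh_if_due(self, now):
        """Query the remote source only when the refresh interval has elapsed."""
        due = (self._last_refresh is None
               or now - self._last_refresh >= self._refresh_interval)
        if due:
            self._records = dict(self._fetch_remote())
            self._last_refresh = now
        return due

    def lookup(self, probe_set_ids):
        """Answer a (possibly large) selection from local data only."""
        return {pid: self._records[pid] for pid in probe_set_ids if pid in self._records}


remote_calls = []


def fake_remote():
    # Stand-in for a remote database; counts how often it is queried.
    remote_calls.append(1)
    return {"PS-001": {"gene": "GENE_A"}, "PS-002": {"gene": "GENE_B"}}


db = LocalGenomicDatabase(fake_remote)
start = datetime(2001, 1, 24)
db.refresh_if_due(start)                       # first call populates the local store
db.refresh_if_due(start + timedelta(days=3))   # not yet due: no remote query
result = db.lookup(["PS-002", "PS-404"])       # served locally; unknown IDs are skipped
```

In this sketch the user-facing `lookup` never touches the remote source, so its cost does not grow with remote latency even when the selection is large.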

In addition, in the foregoing and other implementations, a method is described by which a user places a computer-implemented inquiry or order regarding the purchase of one or more products. The user selects a first set of probe set identifiers, and the selection is sent over the Internet to a portal system that is capable of correlating data with one or more genes or ESTs corresponding to the selected probe set identifiers. The user receives the correlated data from the portal system. The user may select some or all of the data, or otherwise indicate a desire to purchase products related to the data. If the user chooses to purchase a product, the user's account is adjusted accordingly.

In some implementations, a system is described for providing data related to one or more genes or ESTs, each of which has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biological molecule. The biological molecule may be a nucleic acid or an mRNA transcript of the corresponding gene. As noted above, the one or more probe set identifiers may include a gene or EST identifier, such as an accession number. The system includes an input manager that receives a user selection of a first set of probe set identifiers; a gene determiner that identifies the genes or ESTs corresponding to the probe sets identified by the first set of probe set identifiers; a correlator that correlates the genes or ESTs with data; and an output manager that provides the data to the user. In these implementations, the input and output managers may be coupled to the user over the Internet.
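The four components named above can be read as stages of a simple pipeline. The following Python sketch is one possible arrangement under stated assumptions: the function names mirror the component names in the text, but the lookup tables, identifiers, and output format are hypothetical and stand in for whatever storage and presentation a real portal would use.

```python
# Hypothetical lookup tables; a real portal would hold these in a database.
PROBE_SET_TO_GENE = {"PS-001": "GENE_A", "PS-002": "GENE_B", "PS-003": "GENE_A"}
GENE_DATA = {
    "GENE_A": {"annotation": "example kinase"},
    "GENE_B": {"annotation": "example EST"},
}


def input_manager(raw_selection):
    """Receive the user's selection and reduce it to unique, trimmed identifiers."""
    seen, cleaned = set(), []
    for item in raw_selection:
        pid = item.strip()
        if pid and pid not in seen:
            seen.add(pid)
            cleaned.append(pid)
    return cleaned


def gene_determiner(probe_set_ids):
    """Identify the gene or EST for each known probe set identifier."""
    return {pid: PROBE_SET_TO_GENE[pid] for pid in probe_set_ids if pid in PROBE_SET_TO_GENE}


def correlator(genes_by_probe_set):
    """Associate each identified gene or EST with its data."""
    genes = sorted(set(genes_by_probe_set.values()))
    return {gene: GENE_DATA.get(gene, {}) for gene in genes}


def output_manager(data_by_gene):
    """Format the correlated data for return to the user."""
    return [f"{gene}: {data.get('annotation', 'no data')}" for gene, data in data_by_gene.items()]


def handle_request(raw_selection):
    """Run the four stages in order for one user selection."""
    return output_manager(correlator(gene_determiner(input_manager(raw_selection))))


report = handle_request([" PS-001", "PS-003", "PS-002", "PS-001"])
```

Two probe sets may map to the same gene (here `PS-001` and `PS-003`), which is why the correlator works on the set of distinct genes rather than on the raw selection.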

The first set of probe set identifiers may be a subset of a second set of probe set identifiers of probe sets capable of detecting the expression or differential expression of the corresponding genes or ESTs. The user may select the subset, for example, through a graphical user interface provided by a probe array software application. The selection may be made, for example, by drawing a loop around outliers in a scatter plot representing the probe sets, where the outliers represent probe sets with a relatively high degree of differential expression. As one of many other possible examples, the user may select the subset by highlighting entries of probe set identifiers in an ordered table.
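The text describes the subset being chosen interactively. As a rough programmatic analogue, the Python sketch below picks out probe sets whose experiment-to-control ratio exceeds a threshold. The log2 fold-change criterion, the threshold value, and the data layout are assumptions made for illustration; the specification does not define how "relatively high" differential expression is measured.

```python
import math


def select_outliers(expression, min_abs_log2_fold=1.0):
    """Return probe set identifiers whose experiment/control ratio is at least
    2**min_abs_log2_fold in either direction."""
    selected = []
    for probe_set_id, (control, experiment) in expression.items():
        if control <= 0 or experiment <= 0:
            continue  # a log ratio is undefined for non-positive intensities
        if abs(math.log2(experiment / control)) >= min_abs_log2_fold:
            selected.append(probe_set_id)
    return selected


# Hypothetical (control, experiment) intensities for four probe sets.
expression = {
    "PS-001": (100.0, 110.0),   # nearly unchanged
    "PS-002": (50.0, 400.0),    # 8-fold up
    "PS-003": (800.0, 100.0),   # 8-fold down
    "PS-004": (0.0, 30.0),      # skipped: zero control intensity
}
subset = select_outliers(expression, min_abs_log2_fold=2.0)
```

The resulting list plays the role of the "first set" of probe set identifiers that would be sent to the portal.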

The probe sets are typically disposed on one or more probe arrays which, as noted, may be any of various types of microarrays, such as synthesized arrays made using VLSIPS™ technology (described below) or spotted arrays. Accordingly, the term "probe set" is to be understood broadly to include not only a set of synthesized probes, e.g., according to VLSIPS™ technology, but also one or more spots deposited according to any of various spotted array technologies (also described below). These spots may be, for example, oligonucleotides, cDNA clones, or PCR products generated from those clones. The data may include product data regarding the availability, price, composition, suitability, or ordering of various products, including biological devices or substances, or reagents that may be used with the biological devices or substances. The data may also include additional information, such as nucleotide or protein sequence information or positional or functional annotation information. As some examples, the device may be a probe array or a microscope slide, and the substance may be a clone, oligonucleotide, antibody, or protein.

Other implementations are directed to a method for providing data related to one or more genes or ESTs, each of which has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biological molecule. The biological molecule may be a nucleic acid or an mRNA transcript of the corresponding gene. The method includes the steps of: receiving a user selection of a first set of probe set identifiers; identifying the genes or ESTs corresponding to the probe sets identified by the first set of probe set identifiers; correlating the genes or ESTs with data; and providing the data to the user. Still other implementations are directed to a computer program product that carries out the foregoing method.

Further implementations are directed to a method for placing a computer-implemented inquiry or order regarding the purchase of one or more products. The method includes the steps of: receiving, at a user computer, a user selection of a first set of one or more probe set identifiers, each of which identifies a probe set capable of detecting the expression of a corresponding gene; providing the user selection over the Internet to a portal system capable of correlating data with one or more genes or ESTs corresponding to the probe sets identified by the first set of probe set identifiers; and receiving the correlated data from the portal system. In addition, the user may select product data for purchase.

Another implementation is directed to a system for providing data related to one or more genes or ESTs, each of which has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biological molecule. The biological molecule may be a nucleic acid or an mRNA transcript of the corresponding gene. The system includes a database manager that periodically updates a local genomic database containing data related to the genes or ESTs; an input manager that receives a user selection of probe set identifiers; a user service manager that assembles, from the local genomic database, the data related to the genes or ESTs corresponding to the probe set identifiers; and an output manager that provides the data to the user.

In the foregoing implementations, the database manager may periodically update the local genomic database, e.g., weekly, with sequence data, exon structure or location data, splice variant data, marker structure or location data, polymorphism data, homology data, protein family classification data, pathway data, alternative gene naming data, literature citation data, annotation data, other genomic or proteomic data, or any combination thereof. The updates may be accomplished through periodic communication with remote databases, possibly over the Internet. The remote databases may include any of hundreds of public or proprietary databases, such as GenBank, GenBankNew, SwissProt, GenPept, dbEST, UniGene, PIR, Prosite, PFAM, ProDom, Blocks, PDB, PDBfinder, EC Enzyme, KEGG Pathway, KEGG Ligand, OMIM, OMIM Map, OMIM Allele, dbSNP, and/or PubMed. Whereas the database manager communicates with the remote databases periodically, and typically (but not necessarily) not in response to a user request, the input manager typically (but not necessarily) receives the user's selection of probe set identifiers dynamically. The word "dynamically" as used here means in real time, in response to a user's inquiry.

In another implementation, a system is described for providing product data, which may include biological product data. The system has an input manager that receives gene, EST, and/or probe set identifiers from a user. For example, the user may specify one or more gene accession numbers. The system also has a user service manager that correlates or associates the gene, EST, and/or probe set identifiers with one or more items of product data. The user service manager may optionally cooperate with a database manager to obtain the product data from one or more local and/or remote databases, or from other local or remote data sources such as a web page. The system further includes an output manager that provides the product data to the user. In some aspects, a user account may be adjusted based on a purchase or, for users who depend on a vendor, a vendor account may be adjusted. Information may be received from and provided to the user over a network such as the Internet. In another aspect, a method is described for providing product data, e.g., biological product data. The method includes the steps of: receiving gene, EST, and/or probe set identifiers from a user; correlating the gene, EST, and/or probe set identifiers with one or more items of product data; obtaining the product data from local and/or remote databases or other local and/or remote data sources; and providing the product data to the user. The method optionally includes adjusting a user account based on a purchase or, for users who depend on a vendor, adjusting a vendor account.
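The product-data flow above (correlate identifiers with products, offer them, then adjust an account on purchase) can be sketched in a few lines of Python. Everything concrete here is hypothetical: the catalog contents, SKU names, prices, and the dictionary-based account are illustrative assumptions, and the sketch covers only the user-account case, not the vendor-account variant.

```python
from decimal import Decimal

# Hypothetical catalog keyed by gene/EST identifier.
CATALOG = {
    "GENE_A": [{"sku": "CLONE-1", "kind": "clone", "price": Decimal("120.00")}],
    "GENE_B": [
        {"sku": "OLIGO-7", "kind": "oligonucleotide", "price": Decimal("35.50")},
        {"sku": "AB-3", "kind": "antibody", "price": Decimal("210.00")},
    ],
}


def products_for(identifiers):
    """Correlate each identifier with product data; unknown identifiers yield nothing."""
    return [product for ident in identifiers for product in CATALOG.get(ident, [])]


def place_order(account, offered, chosen_skus):
    """Adjust the account balance for the products the user chose from those offered."""
    by_sku = {p["sku"]: p for p in offered}
    unknown = [sku for sku in chosen_skus if sku not in by_sku]
    if unknown:
        raise ValueError(f"not offered: {unknown}")
    total = sum((by_sku[sku]["price"] for sku in chosen_skus), Decimal("0"))
    account["balance"] -= total
    return total


account = {"user": "lab-42", "balance": Decimal("500.00")}
offered = products_for(["GENE_B", "GENE_X"])          # GENE_X has no catalog entry
total = place_order(account, offered, ["OLIGO-7", "AB-3"])
```

`Decimal` is used for prices so that the account adjustment is exact; the order is rejected before any balance change if a requested SKU was not among the products offered.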

Another aspect is a system for providing product data related to one or more genes or ESTs, each of which has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biological molecule. The system includes an input manager that receives one or more probe set identifiers; a correlator that correlates a first set of one or more items of product data with the probe set identifiers; and an output manager that provides the first set of data to a user. A further aspect is a system for providing product data related to one or more genes or ESTs. The system includes an input manager that receives one or more gene and/or EST identifiers; a correlator that correlates a first set of one or more items of product data with the identifiers; and an output manager that provides the first set of data to a user.

An additional aspect is a method for providing product data related to one or more genes or ESTs, each of which has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biological molecule. The method includes the steps of: receiving one or more probe set identifiers; correlating a first set of one or more items of product data with the probe set identifiers; and providing the first set of data to a user. A further aspect is a method for providing product data related to one or more genes or ESTs. The method includes the steps of: receiving one or more gene and/or EST identifiers; correlating a first set of one or more items of product data with the identifiers; and providing the first set of data to a user.

According to another aspect of the invention, a system is described for providing product data related to one or more genes or ESTs. The system includes receiving means for receiving one or more gene or EST identifiers over the Internet; correlating means for correlating one or more items of product data with the gene or EST identifiers; and providing means for providing the product data to a user.

According to another aspect of the invention, a system is described for providing product data related to one or more genes or ESTs, each of which has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biological molecule. The system includes: receiving means for receiving from a user a selection of a first set of one or more probe set identifiers; correlating means for correlating a first set of one or more items of product data with the first set of probe set identifiers; and providing means for providing the first set of data to the user.

In an additional aspect, a system is described for providing data related to one or more genes or ESTs, each of which has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biological molecule. The system includes updating means for periodically updating a local genomic database containing data related to the genes or ESTs; input management means for receiving from a user a selection of a first set of one or more probe set identifiers; data management means for periodically updating, from the local genomic database, a first set of data related to the genes or ESTs corresponding to the first set of probe set identifiers; and providing means for providing the first set of data to the user.

The foregoing implementations are not necessarily inclusive or exclusive of one another and may be combined in any manner that is non-conflicting and otherwise possible, whether they are presented in association with the same or different aspects or implementations. The description of one implementation is not intended to be limiting with respect to other implementations. In addition, any one or more functions, steps, operations, or techniques described elsewhere in this specification may, in alternative implementations, be combined with any one or more functions, steps, operations, or techniques described in this summary. The foregoing implementations are therefore illustrative rather than limiting.

Brief Description of the Drawings

The above and other advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, like reference numerals indicate like structures or method steps, and the leftmost one or two digits of a reference numeral indicate the number of the figure in which the referenced element first appears (e.g., element 180 first appears in Figure 1 and element 1020 first appears in Figure 10). In functional block diagrams, rectangles generally indicate functional elements, parallelograms generally indicate data, rectangles with curved sides generally indicate stored data, rectangles with a pair of double borders generally indicate predefined functional elements, and trapezoids generally indicate manual operations. In method flow charts, rectangles generally indicate method steps and diamonds generally indicate decision elements. All of these conventions, however, are intended to be typical or illustrative rather than limiting.
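The numbering convention can be stated as a one-line rule. The Python sketch below assumes that the last two digits of every reference numeral are the element's serial number within its figure, which is consistent with the two examples given (180 and 1020) but is not stated explicitly in the text; the function name `figure_of` is hypothetical.

```python
def figure_of(reference_numeral):
    """Figure in which an element first appears, assuming the last two digits
    of the numeral are the element's serial number within that figure."""
    if reference_numeral < 100:
        raise ValueError("expected a reference numeral of at least three digits")
    return reference_numeral // 100


figures = {n: figure_of(n) for n in (180, 1020, 400, 103)}
```

Under this assumption, probe array 103 belongs to Figure 1 and portal 400 to Figure 4, matching how those elements are introduced elsewhere in the description.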

Figure 1 is a functional block diagram of a probe array analysis system including a scanner and a computer system on which computer applications may be executed for providing probe set identifiers and for receiving and processing a user selection of probe set identifiers;

Figure 2 is a functional block diagram of one embodiment of a probe array analysis application, shown as stored in the system memory of the computer system of Figure 1;

Figure 3 is a functional block diagram of a conventional system for obtaining genomic information over the Internet;

Figure 4 is a functional block diagram of one embodiment of a genomic portal coupled over the Internet to remote databases and web pages and to clients, including a network that has the user computer system of Figure 1;

Figure 5 is a functional block diagram of one embodiment of the genomic portal of Figure 4, including illustrative embodiments of a database server, a portal application computer system, and a portal-side Internet server;

Figure 6 is a simplified diagram of one embodiment of a computer application platform for implementing the genomic portal of Figures 4 and 5 in conjunction with clients such as those shown in Figure 4;

Figure 7 is a flow chart of one embodiment of a method for providing a user with gene product information related to the results of gene expression, or differential expression, experiments;

Figure 8 is a functional block diagram of one embodiment of a user service manager application that may be executed on the portal application computer system of Figure 5;

Figure 9 is a simplified diagram of one embodiment of a gene or probe set identifier database, such as may be used by the user service manager of Figure 8 in connection with the method of Figure 7;

Figure 10 is one embodiment of a graphical user interface that may be generated by the probe array analysis application of Figure 2; and

Figure 11 is another embodiment of a graphical user interface that may be generated by the probe array analysis application of Figure 2.

Detailed Description

Systems, methods, and computer products are now described with reference to an illustrative embodiment of genomic portal 400. Portal 400 is shown in an Internet environment in Figure 4 and is illustrated in greater detail in Figures 5-11.

In a typical implementation, portal 400 may be used to provide a user with information related to the results of experiments with probe arrays. Such experiments often involve the use of scanning equipment to detect hybridization of probe-target pairs, and the analysis of the detected hybridization by various software applications, as now described with reference to Figures 1 and 2.

Probe Array 103

Various techniques and technologies may be used to deposit or synthesize dense arrays of biological materials on a substrate or support. For example, Affymetrix® GeneChip® arrays, manufactured by Affymetrix, Inc. of Santa Clara, California, are synthesized according to techniques sometimes referred to as VLSIPS™ (Very Large Scale Immobilized Polymer Synthesis) technology. Some aspects of VLSIPS™ technology are described in the following U.S. patents: 5,143,854 (Pirrung, et al.); 5,445,934 (Fodor, et al.); 5,744,305 (Fodor, et al.); 6,022,963 (McGall, et al.); and 6,083,697 (Beecher, et al.). The entire contents of these patents are hereby incorporated by reference. The probes of these arrays consist of oligonucleotides that are synthesized by methods that include the steps of activating regions of a substrate and then contacting the substrate with a selected monomer solution. The regions are activated by exposure to a light source through a mask, in the manner of the photolithography techniques used in the fabrication of integrated circuits. Other regions of the substrate remain inactive because the mask blocks them from illumination. By repeatedly activating different sets of regions and contacting the substrate with different monomer solutions, a diverse array of polymers is produced on the substrate. Various other steps, such as washing unreacted monomer solution from the substrate, are used in various implementations of these methods.
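The repeated activate-then-couple cycle can be modeled abstractly. The Python sketch below is a toy simulation, not a description of the actual manufacturing process: it assumes a one-dimensional row of sites, represents each mask as the set of site indices exposed to light, and treats coupling as perfectly efficient. The function name `synthesize` and the cycle data are hypothetical.

```python
def synthesize(num_sites, cycles):
    """Simulate mask-directed synthesis on a row of sites.

    cycles is a sequence of (mask, monomer) pairs, where mask is the set of
    site indices exposed to light (activated) in that cycle.
    """
    probes = [""] * num_sites
    for mask, monomer in cycles:
        bad = [site for site in mask if not 0 <= site < num_sites]
        if bad:
            raise IndexError(f"mask refers to sites outside the substrate: {sorted(bad)}")
        for site in mask:
            probes[site] += monomer  # only activated sites couple the monomer
    return probes


# Four cycles, each activating a different set of sites before one monomer is applied.
cycles = [
    ({0, 1}, "A"),
    ({1, 2}, "C"),
    ({0, 2, 3}, "G"),
    ({3}, "T"),
]
probes = synthesize(4, cycles)
```

Because each site's sequence is determined by which masks exposed it, four cycles are enough here to give every site a different two-base probe.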

These probes are typically used in conjunction with labeled biological samples, such as cells, proteins, genes or ESTs, other DNA sequences, or other biological elements. These samples, referred to herein as "targets," are processed so that they become spatially associated with particular probes in the probe array. For example, one or more chemically labeled biological samples, i.e., targets, are distributed over the probe array. Some targets hybridize with at least partially complementary probes and remain at the probe locations, while non-hybridized targets are washed away. These hybridized targets, with their "labels" or "tags," are thus spatially associated with their complementary probes. A hybridized probe and target may sometimes be referred to as a "probe-target pair." Detection of these pairs can serve a variety of purposes, such as determining whether a target nucleic acid has a nucleotide sequence that is identical to or different from a specific reference sequence. See, e.g., U.S. Patent No. 5,837,832, referred to and incorporated above. Other uses include gene expression monitoring and evaluation (see, e.g., U.S. Patent No. 5,800,992 (Fodor, et al.); U.S. Patent No. 6,040,138 (Lockhart, et al.); and International Application No. PCT/US98/15151, published as WO99/05323 (Balaban, et al.)), genotyping (U.S. Patent No. 5,856,092 (Dale, et al.)), or other detection of nucleic acids. The '992, '138, and '092 patents, and publication WO99/05323, are hereby incorporated by reference in their entirety for all purposes.

Other techniques exist for depositing probes on a substrate or support. For example, "spotted arrays" are commercially fabricated on microscope slides. These arrays consist of liquid spots containing biological material of potentially varying composition and concentration. For example, a spot in the array may include a few strands of short oligonucleotides in an aqueous solution, or it may include a high concentration of long strands of synthetic proteins. The Affymetrix® 417™ Arrayer is a device that deposits densely packed arrays of biological material on a microscope slide according to the techniques and methods described in PCT Application No. PCT/US99/00730 (International Publication No. WO99/36760), the entire contents of which are hereby incorporated by reference. Other techniques for producing spotted arrays also exist. For example, U.S. Patent No. 6,040,193 (Winkler, et al.) is directed to processes for dispensing drops to produce spotted arrays. The '193 patent, and U.S. Patent No. 5,885,837 (Winkler), also describe the use of micro-channels or micro-grooves on a substrate, or on a block placed on a substrate, to synthesize arrays of biological materials. These patents further describe separating the reactive regions of a substrate from each other by inert regions and spotting on the reactive regions. The entire contents of the '193 and '837 patents are hereby incorporated by reference. Other techniques are based on ejecting jets of biological material to form a spotted array. Other implementations of the jetting technique may use devices such as syringes or piezoelectric pumps to propel the biological material. Various other techniques currently exist for synthesizing, depositing, or positioning biological materials on or within a substrate.

To ensure proper interpretation of the term "probe" as used herein, it should be noted that contradictory conventions appear in the relevant literature. In some contexts the word "probe" is used to refer not to the biological material that is synthesized on a substrate or deposited on a slide, as described above, but to what is referred to herein as the "target." To avoid confusion, the term "probe" is used herein to refer to probes such as those synthesized according to the VLSIPS™ technology; to the biological materials deposited so as to create spotted arrays; and to materials synthesized, deposited, or positioned to form arrays according to other current or future technologies. Thus, for convenience, microarrays formed according to any of these technologies may hereafter be referred to generally and collectively as "probe arrays." Moreover, the term "probe" is not limited to probes immobilized in an array format. Rather, the functions and methods described for providing genomic information and intelligent e-commerce are also useful with other parallel assay devices. For example, these functions and methods may be applied to probe set identifiers that identify probes immobilized on or in beads, optical fibers, or other substrates or media.

Probes typically are able to detect the expression of a corresponding gene or EST by detecting the presence or abundance of mRNA transcripts present in the target. This detection may, in turn, be accomplished by detecting labeled cRNA that is derived from cDNA derived from the mRNA in the target. In general, a probe set contains sub-sequences in unique regions of the transcript and does not correspond to a full gene sequence. The word "set" as generally used herein refers to one or more; for example, a probe set may consist of one or more probes, and a set of probe set identifiers may consist of one or more probe set identifiers.

Scanner 190

FIG. 1 is a functional block diagram of a system that is suitable for, among other things, analyzing probe arrays that have been hybridized with labeled targets. Representative hybridized probe array 103 of FIG. 1 may include probe arrays of any type, as noted above. Labeled targets in hybridized probe array 103 may be detected using various commercial devices, referred to hereafter for convenience as "scanners." An illustrative device is shown in FIG. 1 as scanner 190. Scanners image the targets by detecting fluorescent or other radiation emitted from the labels, or by detecting transmitted, reflected, or scattered radiation. For convenience, these processes are hereafter referred to generally and collectively simply as the detection of "radiation." Various detection schemes are used, depending on the type of radiation and other factors. A typical scheme employs optical and other elements to provide excitation light and to selectively collect the radiation. Also generally included are various light-detector systems that use photodiodes, charge-coupled devices, photomultiplier tubes, or similar devices to register the collected radiation. For example, a scanning system for use with fluorescent labels is described in U.S. Patent No. 5,143,854, incorporated by reference above. Other scanners or scanning systems are described in U.S. Patent Nos. 5,578,832; 5,631,734; 5,834,758; 5,981,956; and 6,025,601, and in PCT Application No. PCT/US99/06097 (published as WO99/47964), each of which is hereby incorporated by reference in its entirety for all purposes.

Scanner 190 provides data representing the intensity (and possibly other characteristics, such as color) of the detected radiation, as well as the locations on the substrate where the radiation was detected. The data typically are stored in a memory device, such as system memory 120 of user computer 100, in the form of a data file. One type of data file, such as image data file 212 shown in FIG. 2, typically includes intensity and location information corresponding to elemental sub-areas of the scanned substrate. The term "elemental" in this context means that the intensity, and/or other characteristics, of the radiation from the sub-area are each represented by a single value. When displayed as an image for viewing or processing, elemental picture elements, or pixels, typically represent this information. Thus, for example, a pixel may have a single value representing the intensity of the radiation scanned from an elemental sub-area of the substrate. The pixel may also have additional values representing other characteristics, such as color. For example, a scanned elemental sub-area in which high-intensity radiation is detected may be represented by a pixel having high luminance (hereafter, a "bright" pixel), and low-intensity radiation may be represented by a pixel of low luminance (a "dim" pixel). Alternatively, the chromatic value of a pixel may be made to represent the intensity, color, or other characteristic of the detected radiation. Thus, an area of high-intensity radiation may be displayed as a red pixel and an area of low-intensity radiation as a blue pixel. As another example, detected radiation of one wavelength at a particular sub-area of the substrate may be represented as a red pixel, and radiation of a second wavelength detected at another sub-area may be represented by an adjacent blue pixel. Many other display schemes are known.
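One of the display schemes described above, in which high-intensity sub-areas are shown in red and low-intensity sub-areas in blue, can be sketched as follows. This is only an illustrative sketch: the function name `intensity_to_rgb`, the linear blue-to-red ramp, and the 8-bit color channels are assumptions made for the example and are not taken from the description.

```python
def intensity_to_rgb(intensity, max_intensity):
    """Map one elemental intensity value to an (R, G, B) pixel.

    Assumed scheme (hypothetical): a linear ramp in which the lowest
    intensity is pure blue and the highest intensity is pure red.
    """
    if max_intensity <= 0:
        raise ValueError("max_intensity must be positive")
    # Clamp the normalized intensity into the range [0, 1].
    fraction = min(max(intensity / max_intensity, 0.0), 1.0)
    red = round(255 * fraction)
    blue = 255 - red
    return (red, 0, blue)


if __name__ == "__main__":
    # A dim sub-area and a bright sub-area of a scanned substrate.
    print(intensity_to_rgb(0, 4095))
    print(intensity_to_rgb(4095, 4095))
```

A scheme that encodes two wavelengths would instead assign each wavelength its own color channel; the single ramp here covers only the single-wavelength case.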

Probe Array Analysis Applications 199

Generally, a person may inspect a printed or displayed image constructed from the data in an image file and may identify those cells that are bright or dim, or that are otherwise distinguished by a pixel characteristic (such as color). However, it is frequently desirable to provide this information in an automated, quantifiable, and repeatable way that is compatible with various image processing and/or analysis techniques. For example, the information may be provided for processing by a computer application that associates the locations where hybridized targets were detected with known locations where probes of known identity were synthesized or deposited. Information such as the nucleotide or monomer sequence of the target DNA or RNA may then be deduced. Techniques for making these deductions are described, for example, in U.S. Patent No. 5,733,729 (Lipshutz) and in U.S. Patent No. 5,837,832, the entire contents of which are hereby incorporated by reference for all purposes.

A variety of computer software applications are commercially available for controlling scanners (and other instruments related to the hybridization process, such as hybridization chambers), and for acquiring and processing the image files provided by the scanners. Examples are the Jaguar™ application from Affymetrix, aspects of which are described in U.S. Provisional Patent Application Serial No. 60/226,999, filed August 22, 2000, and the microarray software application from Affymetrix, aspects of which are described in U.S. Provisional Patent Application Serial No. 60/220,587, filed July 25, 2000. The processed image files produced by these applications are often further processed to extract additional data. In particular, data mining software applications are often used to help identify and analyze patterns of biological interest or the degree of hybridization of probe sets. The Affymetrix® Data Mining Tool is an example of a software application of this type. In addition, software applications are used to store and manage the large amounts of data typically generated by probe array experiments and by the image processing and data mining software noted above. The Affymetrix® Laboratory Information Management System (LIMS) is an example of such a data management application, aspects of which are described in U.S. Provisional Patent Application Serial No. 60/220,645, filed July 25, 2000. In addition, various proprietary databases accessed by database management software, such as the Affymetrix® EASI (Expression Analysis Sequence Information) database and database software, provide researchers with the relationships between probe sets and gene or EST identifiers. All of the patent applications mentioned in this paragraph are hereby incorporated by reference in their entirety.

For convenience of reference, these types of computer software applications (i.e., applications for acquiring and processing image files, data mining, data management, various databases, and other applications related to probe array analysis) are represented generally and collectively in FIG. 1 as probe array analysis applications 199. FIG. 2 is a functional block diagram of probe array analysis applications 199 as illustratively stored for execution (as executable code 199A corresponding to applications 199) in system memory 120 of user computer 100 of FIG. 1.

As will be appreciated by those skilled in the art, it is not necessary that applications 199 be stored on and/or executed from computer 100; rather, some or all of applications 199 may be stored on and/or executed from an application server or other computer platform connected to computer 100 in a network. For example, it may be particularly advantageous for applications involving large-scale database operations, such as the Affymetrix® LIMS or the Affymetrix® Data Mining Tool (DMT), to be executed from a database server such as user database server 412 of FIG. 4. Alternatively, LIMS, DMT, and/or other applications may be executed from computer 100, while some or all of the databases on which those applications operate may be stored for shared access on server 412 (possibly together with a database management program, such as the Oracle® 8.0.5 database management system from Oracle Corporation). Such network arrangements may be implemented in accordance with known techniques using commercially available hardware and software, such as those available for a local area network or a wide area network. FIG. 4 shows a local area network that connects user computer 100 to user database server 412 (and to user-side Internet client 410, which may be the same computer) via network cable 480. Similarly, scanner 190 (or multiple scanners) may be made available on the user's network via cable 480 for the purposes of controlling scanner 190 and receiving data from it.

Referring again to FIG. 2, executable applications 199A generate various types of data in various formats; those shown are merely examples. For convenience, the term "file" is used herein to refer to data generated or used by executable applications 199A, but any of a variety of alternative techniques known in the relevant art for storing, conveying, and/or manipulating data may be employed. In the illustrated example, data analysis program 210 receives image data file 212 from scanner 190 and generates cell intensity file 216 from it. File 216 of this example contains, for each probe scanned by scanner 190, a single value representing the pixel intensities measured by scanner 190 for that probe. This value is thus a measure of the abundance of labeled mRNA present in the target that hybridized to the corresponding probe. Many such mRNA molecules may be present at each probe, because a probe may include, for example, millions of oligonucleotides designed to detect the mRNA.
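The reduction from an image data file to a cell intensity file, in which each probe is summarized by a single value, can be sketched as follows. The description does not state how the pixels belonging to one probe are combined, so the arithmetic mean used here is an assumption, as are the function name `cell_intensities`, the nested-list image layout, and the rectangular cell boundaries.

```python
from statistics import mean


def cell_intensities(image, cells):
    """Reduce pixel intensities to one value per probe cell.

    image: 2D list of pixel intensities, indexed as image[row][col].
    cells: dict mapping a probe label to (row0, col0, row1, col1),
           a half-open rectangle of pixels assumed to cover that probe.
    Returns a dict mapping each probe label to the mean pixel intensity
    (the choice of the mean is an assumption of this sketch).
    """
    result = {}
    for probe, (row0, col0, row1, col1) in cells.items():
        pixels = [
            image[r][c]
            for r in range(row0, row1)
            for c in range(col0, col1)
        ]
        result[probe] = mean(pixels)
    return result


if __name__ == "__main__":
    # Hypothetical 2x4 image holding two 2x2 probe cells side by side.
    demo_image = [
        [1, 2, 10, 20],
        [3, 4, 30, 40],
    ]
    demo_cells = {"probe_a": (0, 0, 2, 2), "probe_b": (0, 2, 2, 4)}
    print(cell_intensities(demo_image, demo_cells))
```

Other summary statistics, such as a percentile that is less sensitive to edge pixels, could be substituted for the mean without changing the structure of the sketch.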

In the illustrated example, probe array data analysis program 210 also generates an experiment information file 213 containing information about the experiment, the sample, and the probe array, typically as entered by user 101. A primary function of data analysis program 210 of this example is to analyze file 216 and/or file 212, possibly together with information from file 213 and from internal library files (not shown) that specify details of the sequences and locations of the probes and controls. The purpose of a program such as data analysis program 210 of this example is generally to provide information such as the degree of hybridization, absolute and/or differential (over two or more experiments) expression, genotype comparisons, detection of polymorphisms and mutations, and the results of other analyses. In this example, file 215 represents the analysis output of data analysis program 210. Data analysis program 210 may process file 215 to generate report file 214, which may respond to requests from user 101 regarding form and content. As will be clear to those of ordinary skill in the art, the files, reports, and data representations described above and below as generated by illustrative data analysis program 210 are merely examples; the described data and other data may be processed, combined, arranged, and/or presented in many other ways.

In addition, data analysis program 210 generates various types of plots, graphs, tables, and other tabular and/or graphical representations of analysis data such as that contained in file 215. An example is shown in FIG. 10, which shows a graphical user interface (GUI) 1000 having a scatter plot window 1010 and a tabular window 1020. In scatter plot window 1010, line 1011 provides a reference for the levels of differential expression measured by probe sets in different experiments. The position of each point, each point representing a probe set from one or more microarrays, specifies along one axis the level of expression of the probe set in one experiment or set of experiments (e.g., experiments measuring a control sample) and, along the other axis, the level of expression in another experiment or set of experiments (e.g., experiments measuring a diseased sample).

In FIG. 10, user 101 has drawn a boundary line 1014 around a cluster of points 1016 (using known techniques). In tabular window 1020, each probe set corresponding to a point in window 1010 is identified and described in a separate row. In this example, the row entries include a measure of the expression level in a particular experiment (as in column 1032) and, as in column 1034, an indication of whether expression was absent (A) or present (P) in the experiment. The rows corresponding to the points, i.e., probe sets, enclosed by loop 1014 are highlighted in window 1020 so that user 101 can readily identify information about the selected probe sets. In addition, as in column 1036, each row in window 1020 includes a probe set identifier.
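Selecting the probe sets whose points fall inside a user-drawn boundary can be sketched with a standard ray-casting point-in-polygon test. The description says only that known techniques are used, so this particular test is a stand-in chosen for illustration. The function names, the representation of the boundary as a list of vertices, and the identifier `HYPOTHETICAL_at` are all assumptions of the example.

```python
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside the polygon (ray-casting test).

    polygon: list of (x, y) vertices in order; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_probe_sets(points, boundary):
    """Return identifiers of the probe sets enclosed by the boundary.

    points: dict mapping probe set identifier -> (x, y), where x and y
    are the expression levels plotted on the two axes.
    """
    return [
        probe_set_id
        for probe_set_id, (x, y) in points.items()
        if point_in_polygon(x, y, boundary)
    ]


if __name__ == "__main__":
    # Plot coordinates are made up for the demonstration.
    demo_points = {
        "M13903_at": (5.0, 5.0),
        "M14091_at": (2.0, 8.0),
        "HYPOTHETICAL_at": (20.0, 5.0),
    }
    demo_boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(select_probe_sets(demo_points, demo_boundary))
```

The returned identifiers are the ones whose rows a tabular view would then highlight.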

For example, the probe sets corresponding to rows 1021 and 1022 are highlighted to show that their corresponding points have been enclosed in window 1010. The entries in column 1036 for these rows, i.e., "M13903_at" and "M14091_at," respectively, are the probe set identifiers for the corresponding probe sets. FIG. 10 thus illustrates some of the many techniques by which user 101 may select probe set identifiers. In particular, user 101 in the present example makes these selections by enclosing points in window 1010 (in which case the selected probe set identifiers comprise the enclosed points) and/or by picking rows in window 1020 (in which case the selected probe set identifiers comprise the names in column 1036). As shown in FIG. 2, probe set identifiers 222 represent these or other probe set identifiers that may be offered to user 101 for selection by applications such as data analysis program 210. In addition, the convention used by data analysis program 210 of this example for naming probe sets sometimes indicates the accession number of the gene or EST corresponding to the probe set. For example, the probe set identifier "M13903_at" in row 1021 indicates that M13903 is the accession number of the gene or EST corresponding to the probe set of that row. In other examples, the corresponding accession number may be displayed directly. The availability of these accession numbers for selection by user 101 is represented by accession numbers 224 in FIG. 2. Although, as noted, an accession number may serve as a kind of probe set identifier (so that accession numbers 224 may be regarded as a subset of probe set identifiers 222), they are shown separately in FIG. 2 for convenience of illustration and discussion.
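The naming convention described above, in which a probe set identifier such as "M13903_at" sometimes embeds an accession number, can be sketched as a small parser. The pattern below is an assumption made for this example: it accepts only a prefix of one or two uppercase letters followed by five or six digits and ending at the first underscore, which covers the two identifiers quoted in the text. Real identifiers may follow other conventions, so the function returns `None` when no such prefix is present.

```python
import re

# Assumed accession-number shape: 1-2 uppercase letters plus 5-6 digits,
# terminated by an underscore (e.g. "M13903_at", "AF058789_at").
_ACCESSION_PREFIX = re.compile(r"^([A-Z]{1,2}\d{5,6})_")


def accession_from_probe_set_id(probe_set_id):
    """Return the embedded accession number, or None if none is found."""
    match = _ACCESSION_PREFIX.match(probe_set_id)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ("M13903_at", "AF058789_at", "no_accession_here"):
        print(name, "->", accession_from_probe_set_id(name))
```

Returning `None` rather than guessing reflects the word "sometimes" in the description: an identifier is not guaranteed to carry an accession number.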

Others of executable applications 199A, such as Data Mining Tool 220, may also provide probe set identifiers 222 (optionally including accession numbers 224) to user 101. Another example is database application 230, an illustrative GUI of which is shown in FIG. 11. Database application 230 is an application that correlates probe sets, generally identified by probe set identifiers such as names, numbers, and/or symbols, with the corresponding genes or ESTs. An example of database application 230 is the EASI database application from Affymetrix noted above. In the example of FIG. 11, GUI 1100 includes a query window 1110 and a results window 1120. As shown in FIG. 11, user 101 has in effect generated a query, in accordance with known techniques, by selecting a particular probe array 1112 and a portion of descriptive text 1114 associated with array 1112 or with any probe set associated with array 1112. Application 230 performs a search of its database (not shown) and displays the results of the query in window 1120. As described below with respect to FIG. 5, the functions of database application 230 and its associated database may also, or alternatively, be included in portal 400, so that the user's query is satisfied by database manager 512 querying local library database 516. In either case, the results of the user query generally include identification of the probe arrays that satisfy the query, such as array 1122, and probe set identifiers, such as identifiers 1124 and 1126. As in the previous example, the name "AF058789_at" given to identifier 1124 may indicate the accession number of the gene or EST corresponding to the probe set it identifies. User 101 may highlight a probe set identifier, such as identifier 1126, as shown in FIG. 11. The familiar tree structure of window 1120 indicates that the probe set identified by identifier 1126 is disposed on array 1122. Descriptive information related to the probe set identified by identifier 1126 is also highlighted and is displayed on the same row of the tree structure as identifier 1126.
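A query of the kind described above, which selects a probe array and a fragment of descriptive text and returns the matching probe set identifiers, can be sketched with an in-memory SQLite table. This is not the schema of the EASI database, which the description does not disclose: the table layout, the function names, the array names, and every row of data below are hypothetical placeholders.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE probe_sets ("
        "array_name TEXT, probe_set_id TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO probe_sets VALUES (?, ?, ?)",
        [
            ("DemoArrayA", "DEMO0001_at", "placeholder kinase description"),
            ("DemoArrayA", "DEMO0002_at", "placeholder receptor description"),
            ("DemoArrayB", "DEMO0003_at", "placeholder kinase description"),
        ],
    )
    return conn


def query_probe_sets(conn, array_name, keyword):
    """Return (probe_set_id, description) rows on one array whose
    description contains the keyword."""
    cursor = conn.execute(
        "SELECT probe_set_id, description FROM probe_sets "
        "WHERE array_name = ? AND description LIKE ? "
        "ORDER BY probe_set_id",
        (array_name, f"%{keyword}%"),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for row in query_probe_sets(connection, "DemoArrayA", "kinase"):
        print(row)
    connection.close()
```

Parameter binding with `?` placeholders keeps the user's text out of the SQL string itself. The same query function could run against a local application database or a database held by a portal, which mirrors the two arrangements described above.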

LIMS application 225, shown in FIG. 2, is a further illustrative example of executable analysis applications 199A. Application 225 may manage the files used or generated by data analysis program 210 (e.g., files 212-216), as well as files or data generated or used by DMT 220 and other types of probe array analysis applications. LIMS 225 may store, maintain, process, and display these and other data generated by one or more experimenters over time, in order to simplify the management and planning of experiments and the reporting of their results. Based on a library database (not shown), LIMS 225 may also provide the SIF information represented in FIG. 2 by file 217 (described below). As noted above with respect to application 230, file 217 may alternatively or additionally be stored and maintained by portal 400. For example, the SIF information may be stored in local library database 516 and managed by database manager 512, which may include a LIMS such as LIMS 225 or may incorporate some or all of its functions.

User Computer 100

User computer 100, shown in FIG. 1, may be a computing device specially designed and configured to support and execute some or all of the functions of probe array applications 199. Computer 100 may also be any of a variety of types of general-purpose computer now available or later developed, such as a personal computer, network server, workstation, or other computer platform. Computer 100 typically includes known components such as processor 105, operating system 110, graphical user interface (GUI) controller 115, system memory 120, memory storage device 125, and input-output controller 130. Those skilled in the relevant art will appreciate that there are many possible configurations of the components of computer 100 and that some components not shown, such as cache memory, a data backup unit, and many other devices, may typically be included in computer 100. Processor 105 may be a commercially available processor, such as a Pentium® processor made by Intel Corporation or a SPARC® processor made by Sun Microsystems, or it may be one of the other processors that are or will become available. Processor 105 executes operating system 110, which may be, for example, a Windows®-type operating system from Microsoft Corporation (such as Windows NT® 4.0 with SP6a); a Unix®- or Linux-type operating system available from many vendors; another or a future operating system; or some combination thereof. Operating system 110 interfaces with program packages and hardware in a well-known manner and helps processor 105 coordinate and execute the functions of various computer programs written in a variety of programming languages. Operating system 110, typically in cooperation with processor 105, coordinates and executes the functions of the other components of computer 100. Operating system 110 also provides scheduling, input-output control, file and data management, memory management, and communication control and related services, all in accordance with known techniques.

System memory 120 may be any of a variety of known or future memory storage devices. Examples include any commonly available random access memory (RAM), a magnetic medium such as a resident hard disk or tape, an optical medium such as a read-and-write compact disc, or another memory storage device. Memory storage device 125 may be any of a variety of known or future devices, including a compact disk drive, a tape drive, a removable hard disk drive, or a diskette drive. Such types of memory storage device 125 typically read from, and/or write to, a program storage medium (not shown) such as, respectively, a compact disk, magnetic tape, removable hard disk, or floppy diskette. Any of these program storage media, or others now in use or later developed, may be considered a computer program product. As will be appreciated, these program storage media typically store computer software programs and/or data. Computer software programs, also called computer control logic, are typically stored in system memory 120 and/or in the program storage medium used in conjunction with memory storage device 125.

In some embodiments, the described computer program product comprises a computer-usable medium having control logic (a computer software program, including program code) stored on it. When executed by processor 105, the control logic causes processor 105 to perform the functions described herein. In other embodiments, some functions are implemented primarily in hardware using, for example, a hardware state machine. Implementing a hardware state machine so as to perform the functions described herein will be apparent to those skilled in the relevant arts.

Input-output controllers 130 may include any of a variety of known devices for accepting and processing information from a user, whether a human or a machine, and whether local or remote. Such devices include, for example, modem cards, network interface cards, sound cards, or other types of controllers for any of a variety of known input devices 102. The output controllers of input-output controllers 130 may include controllers for any of a variety of known display devices 180 for presenting information to a user, whether a human or a machine, and whether local or remote. If one of display devices 180 provides visual information, this information typically may be logically and/or physically organized as an array of picture elements, often called pixels. Graphical user interface (GUI) controller 115 may comprise any of a variety of known or future software programs for providing graphical input and output interfaces between computer 100 and user 101, and for processing user inputs. In the illustrated embodiment, the functional elements of computer 100 communicate with each other via system bus 104. In other embodiments, some of these communications may be carried out over a network or by other types of remote communication.

As will be evident to those skilled in the relevant art, application 199, if implemented in software, may be loaded into system memory 120 and/or memory storage device 125 through one of input devices 102. All or portions of application 199 may also reside in a read-only memory or similar device of memory storage device 125; such devices do not require that application 199 first be loaded through input devices 102. Those skilled in the relevant art will understand that application 199, or portions of it, may be loaded by processor 105 in a known manner into system memory 120, cache memory (not shown), or both, as advantageous for execution.

Conventional Techniques for Obtaining Genomic Data

Several conventional techniques are available for obtaining genomic data over the Internet, some of which are described in the book edited by Ouellette and Baxevanis, incorporated by reference above. FIG. 3 is a functional block diagram representing one simplified example. As shown in FIG. 3, user 101 may consult any of a number of public or other sources to obtain accession numbers 224′. As indicated by manual operation 312, user 101 initiates request 312 by using any web browser to access the Internet web site of the National Center for Biotechnology Information (NCBI) of the National Library of Medicine and the National Institutes of Health (accessible as of January 2001 at http://www.ncbi.nlm.nih.gov). In particular, user 101 may access the Entrez search and retrieval system, which provides information from various databases at NCBI. These databases provide information on nucleotide sequences, protein sequences, macromolecular structures, whole genomes, and related published data. It is illustratively assumed that user 101 accesses NCBI Entrez nucleotide database 314 in this manner and receives information including gene or EST sequences 316. If accession numbers 224′ represent a large number (for example, one hundred) of ESTs or genes of interest, as may easily be the case following analysis of probe array experiments, the tasks described so far may take a great deal of time, perhaps several hours.

User 101 typically copies sequence information from sequences 316 and pastes it into an HTML document accessible through NCBI's BLAST web page 324 (accessible as of January 2001 at http://www.ncbi.nlm.nih.gov/BLAST/). This user-initiated operation, represented by batch BLAST request 322 of FIG. 3, may also be time-consuming and tedious if many sequences are involved. BLAST, an acronym for Basic Local Alignment Search Tool, is well known in the art and consists of similarity search programs that use a heuristic algorithm to search protein and DNA sequence databases for local alignments. For example, user 101 may run a BLAST search against a nucleotide sequence database using the "blastn" program. The results of this batch BLAST search, represented by similar nucleotide and/or protein sequence data 326, may not be available to user 101 for many hours. User 101 may then initiate comparisons and evaluations 332, either manually or using various software tools. User 101 may subsequently prepare report 334 to explain the findings and alignment strategy of the search and the requirements for follow-up experiments.

Input from User 101 to Genomic Portal 400

FIG. 4 is a functional block diagram illustrating one configuration by which user 101 may connect to genomic web portal 400. It will be understood that FIG. 4 is simplified and illustrative only, and that many implementations and variations of the network and Internet connections shown in FIG. 4 will be apparent to those of ordinary skill in the relevant art.

User 101 uses user computer 100 and analysis application 199 as described above, including generating and/or accessing some or all of files 212-217. As shown in FIG. 4, files 212-217 are maintained in this example on user database server 412, to which user computer 100 is coupled via network cable 480. Computers 100′ and 100″, and the computers of other users on a local area network or a wide area network (including an intranet, the Internet, or any other network), may also be coupled to server 412 via cable 480.

It will be understood that cable 480 is merely representative of any type of network connectivity, which may involve cables, transmitters, relay stations, network servers, and many other components that are not shown but will be apparent to those of ordinary skill in the relevant art. Through user computer 100, user 101 may operate a web browser served by user-side Internet client 410 to communicate with portal 400 via Internet 499. Portal 400 may similarly communicate via Internet 499 with other users and/or networks of users, as represented by Internet clients 410′ and 410″.

As noted above, the information that user 101 provides to portal 400 typically includes one or more "probe set identifiers." These probe set identifiers typically come to the attention of user 101 as a result of experiments conducted on probe arrays. For example, user 101 may select probe set identifiers that identify microarray probe sets capable of enabling detection of the expression of mRNA transcripts from corresponding genes or ESTs of particular interest. As is well known in the relevant art, an EST is a fragment of a gene sequence that may not be fully characterized, whereas a gene sequence generally is complete and fully characterized. The word "gene" is used generally herein to refer both to full-size genes of known sequence and to computationally predicted genes. In some implementations, the specific sequences detected by the arrays that represent these genes or ESTs may be referred to as "sequence information fragments (SIFs)" and may be recorded in a "SIF file," as noted above with respect to the operations of LIMS 225. In particular implementations, a SIF is a portion of a consensus sequence that has been deemed to best represent the mRNA transcript from a given gene or EST. The consensus sequence may have been derived by comparing and clustering ESTs, and possibly also by comparing the ESTs to genomic sequence information. A SIF is the portion of the consensus sequence for which probes on the array are specifically designed. With respect to the operations of web portal 400, it is assumed that some microarray probe sets may be designed to detect the expression of genes based on the sequences of ESTs.

As noted above, the term "probe set" refers broadly to one or more probes from an array of probes on a microarray. For example, in an Affymetrix® GeneChip® probe array, in which probes are synthesized on a substrate, a probe set may consist of 30 or 40 probes, half of which typically are controls. These probes, collectively or in various combinations of some or all of them, are deemed to be indicative of the expression of a gene or EST. In a spotted probe array, one or more spots may similarly constitute a "probe set."

The term "probe set identifier" is used broadly herein, in that many types of such identifiers are possible and are intended to be included within the meaning of the term. One type of probe set identifier is a name, number, or other symbol assigned for the purpose of identifying a probe set. This name, number, or symbol may, for example, be assigned arbitrarily to the probe set by the manufacturer of the probe array. A user may select this type of probe set identifier by, for example, highlighting or typing the name. Another type of probe set identifier intended herein is a graphical representation of a probe set. For example, dots may be displayed on a scatter plot or other diagram in which each dot represents a probe set.

Typically, a dot's position on the plot represents the intensity of the signal from hybridized, labeled targets (described in more detail below) in one or more experiments. In these cases, a user may select a probe set identifier by clicking on, drawing a loop around, or otherwise selecting one or more of the dots. Examples of such selections are provided above in connection with the operations of data analysis program 210 and, more specifically, with respect to user 101 drawing loop 1014 around dots on the scatter plot and/or selecting names or accession numbers associated with highlighted rows 1021 or 1022. Other examples are provided above with respect to rows 1126, selected by user 101 in a database that relates probe sets to accession numbers and other genomic information.
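Selecting probe sets by drawing a loop around dots amounts to a point-in-polygon test on the plotted coordinates. The following is a minimal sketch of that idea using the standard ray-casting test; it is not the method of data analysis program 210, and the probe set names, coordinates, and function names are hypothetical.

```python
def point_in_loop(x, y, loop):
    """Ray-casting test: True if (x, y) lies inside the closed polygon `loop`.

    `loop` is a list of (x, y) vertices; the last vertex connects to the first.
    """
    inside = False
    n = len(loop)
    for i in range(n):
        x1, y1 = loop[i]
        x2, y2 = loop[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_probe_sets(points, loop):
    """Return the names of all probe sets whose dots fall inside the loop."""
    return sorted(
        name for name, (x, y) in points.items() if point_in_loop(x, y, loop)
    )


# Hypothetical dots: probe set name -> (intensity in experiment 1, experiment 2).
POINTS = {"PS-1": (1.0, 1.0), "PS-2": (3.0, 3.0), "PS-3": (5.0, 1.0)}
# Hypothetical user-drawn loop, simplified here to a square.
LOOP = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

print(select_probe_sets(POINTS, LOOP))
```

A freehand loop would simply contribute more vertices; the test itself does not change.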

Another type of probe set identifier, as the term is used herein, is a nucleotide sequence. For example, it is illustratively assumed that a particular SIF is a unique sequence of 500 bases that is a portion of a consensus sequence or exemplar sequence gleaned from EST and/or genomic sequence information. It is further assumed that one or more probe sets are designed to represent the SIF. A user who specifies all or part of the 500-base sequence may therefore be considered to have specified all or some of the corresponding probe sets. As a further example, a user may specify a portion of the 500-base sequence that is unique to the SIF, or that also identifies another SIF, EST, cluster of ESTs, consensus sequence, and/or gene. In that case, the user has specified a probe set identifier for one or more genes or ESTs. In another variation, it is illustratively assumed that a particular SIF is a portion of a particular consensus sequence. It is further assumed that the user specifies a portion of the consensus sequence that is not included in the SIF but that is unique to the consensus sequence, or to the gene or EST that the consensus sequence is intended to represent. In that case, the sequence specified by the user is a probe set identifier that identifies the probe set corresponding to the SIF, even though the user-specified sequence is not included in the SIF. As those skilled in the relevant art will now appreciate, parallel cases are possible with respect to user specification of partial sequences of genes or ESTs.
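The sequence-as-identifier cases above can be illustrated with a small lookup. This is a sketch under stated assumptions, not the portal's actual matching logic: it uses exact substring matching (a real system would more likely use alignment), and the record layout, toy sequences, probe set names, and function names are all hypothetical.

```python
# Hypothetical records: each consensus sequence carries one SIF, given as a
# (start, end) slice of the consensus, and one probe set designed for that SIF.
RECORDS = [
    {"probe_set": "PS-A", "consensus": "ATGGCGTACGTTAGCCTAGGA", "sif": (9, 21)},
    {"probe_set": "PS-B", "consensus": "TTGACCGGTACGTTAACCGTA", "sif": (0, 12)},
]


def resolve_sequence(query, records):
    """Return probe sets whose consensus sequence contains the query exactly."""
    query = query.upper()
    return sorted(r["probe_set"] for r in records if query in r["consensus"])


def inside_sif(query, record):
    """True if the first occurrence of the query lies wholly within the SIF."""
    start = record["consensus"].find(query.upper())
    if start < 0:
        return False
    sif_start, sif_end = record["sif"]
    return sif_start <= start and start + len(query) <= sif_end


# A query outside the SIF but unique to one consensus still identifies PS-A.
print(resolve_sequence("ATGGCG", RECORDS), inside_sif("ATGGCG", RECORDS[0]))
# A query shared by both consensus sequences identifies both probe sets.
print(resolve_sequence("GTACGTTA", RECORDS))
```

The first query shows the variation in which the user-specified sequence is not part of the SIF yet still resolves to the probe set for that SIF.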

Another example of a probe set identifier is an accession number of a gene or EST. Gene and EST accession numbers are publicly available. A probe set may thus be identified by the accession number or numbers of one or more ESTs and/or genes corresponding to the probe set. The correspondence between probe sets and ESTs or genes may be maintained in a suitable database, such as that accessed by database application 230 or local library database 516, from which the correspondence may be provided to the user. Similarly, gene fragments or sequences other than ESTs may be mapped (for example, by reference to a suitable database) to corresponding genes or ESTs for the purpose of using their publicly available accession numbers as probe set identifiers. For example, a user may be interested in product or genomic information related to a particular SIF that is derived from EST-1 and EST-2. The user may be provided with the correspondence between that SIF (or part or all of the SIF's sequence) and EST-1, EST-2, or both. To obtain product or genomic data related to the SIF, or to a partial sequence of it, the user may select the accession numbers of EST-1, EST-2, or both.
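The correspondence between probe sets and accession numbers can be pictured as a two-way index. The sketch below assumes a flat list of (probe set, accession number) pairs; the schema of database 516 is not given in the text, so the row layout, identifiers, and function names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical correspondence rows: (probe_set_id, accession_number).
# One probe set may correspond to several ESTs or genes.
CORRESPONDENCE = [
    ("PS-0001", "ACC-EST-1"),
    ("PS-0001", "ACC-EST-2"),
    ("PS-0002", "ACC-GENE-9"),
]


def build_index(rows):
    """Index the correspondence in both directions."""
    by_probe_set = defaultdict(set)
    by_accession = defaultdict(set)
    for probe_set_id, accession in rows:
        by_probe_set[probe_set_id].add(accession)
        by_accession[accession].add(probe_set_id)
    return by_probe_set, by_accession


def probe_sets_for(accessions, by_accession):
    """Return the probe sets identified by any of the given accession numbers."""
    found = set()
    for accession in accessions:
        found |= by_accession.get(accession, set())
    return sorted(found)


by_probe_set, by_accession = build_index(CORRESPONDENCE)
# Selecting EST-1, EST-2, or both leads to the same probe set.
print(probe_sets_for(["ACC-EST-1", "ACC-EST-2"], by_accession))
```

Keeping both directions indexed lets the same table answer "which ESTs does this probe set represent" and "which probe sets does this accession number identify."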

Genomic Web Portal 400

Genomic web portal 400 provides user 101 with data related to one or more genes or ESTs. Each gene or EST has at least one corresponding probe set that is identified by a probe set identifier. As noted, and as illustrative and non-limiting examples, the probe set identifier may be a number, name, accession number, symbol, graphical representation (such as a dot or a highlighted list entry), or nucleotide sequence. The corresponding probe set is capable of enabling detection of the expression of its corresponding gene. In response to the user's selection of one or more probe set identifiers, portal 400 provides user 101 with genomic information and/or information about biological products. This information may help user 101 interpret the results of experiments and design or carry out follow-up experiments.

FIG. 5 is a functional block diagram of one of many possible embodiments of portal 400. In this example, portal 400 has hardware components comprising three computer platforms: database server 510, Internet server 530, and application server 520. The various functional elements of portal 400, such as database manager 512, input and output managers 532 and 534, and user-service manager 522, carry out their operations on these computer platforms. That is, in a typical implementation, the functions of managers 512, 532, 534, and 522 are carried out by the execution of software applications on the computer platforms represented by servers 510, 530, and 520. Portal 400 is described first with respect to its computer platforms and then with respect to its functional elements.

Each of servers 510, 520, and 530 may be any type of computer platform now known or developed in the future, although they typically will be of the class of computer commonly referred to as servers. However, they may also be mainframes, workstations, or other types of computer. They may be connected by any known or future type of cabling or other communication system, whether networked or not. They may be co-located or they may be physically separated. Various operating systems may be used on any of the computer platforms, depending on the type and/or make of platform chosen. Suitable operating systems include Windows NT®, Sun Solaris, Linux, OS/400, Compaq Tru64 Unix, SGI IRIX, Siemens Reliant Unix, and others.

There may be significant advantages to carrying out the functions of portal 400 on multiple computer platforms in this manner, such as lower deployment costs, easier database switching or changes to enterprise applications, and/or more effective firewalls. Other configurations, however, are possible. For example, as is well known to those of ordinary skill in the relevant art, so-called two-tier or N-tier architectures are possible instead of the three-tier server-side architecture represented by FIG. 5. See, for example, E. Roman, Mastering Enterprise JavaBeans™ and the Java™ 2 Platform (John Wiley & Sons, Inc., NY, 1999), and J. Schneider and R. Arora, Using Enterprise Java™ (Que Corporation, Indianapolis, 1997), both of which are hereby incorporated by reference in their entireties for all purposes.

It will be understood that many hardware components, and associated software or firmware components, that may be implemented in a server-side architecture for Internet commerce are not shown in FIG. 5. Components that implement one or more firewalls to protect data and applications, uninterruptible power supplies, LAN switches, web-server routing software, and many other components are not shown. Similarly, a variety of computer components customarily included in server-class computing platforms, as well as in other types of computers, will be understood to be included but are not shown. These components include, for example, processors, memory units, input/output devices, buses, and the other components noted above with respect to user computer 103. Those of ordinary skill in the art will readily appreciate how these and other conventional components may be implemented.

The functional elements of portal 400 may also be implemented in accordance with a variety of software facilitators and platforms (although it is not precluded that some or all of the functions of portal 400 may also be implemented in hardware or firmware). Among the various commercial products available for implementing e-commerce web portals is BEA WebLogic from BEA Systems, which is a so-called "middleware" application. This and other middleware applications are sometimes referred to as "application servers," but they are not to be confused with application server 520, which is a computer. The function of these middleware applications generally is to help other software components (such as managers 512, 522, or 532) share resources and coordinate activities. The goals include making it easier to write, maintain, and change the software components; avoiding data bottlenecks; and preventing or recovering from system failures. Thus, these middleware applications may provide load balancing, fail-over, and fault tolerance, all of which features will be appreciated by those of ordinary skill in the relevant art.

Other development products, such as the Java™ 2 platform from Sun Microsystems, Inc., may be employed in portal 400 to provide suites of application programming interfaces (APIs) that, among other things, ease the implementation of scalable and secure components. The platform known as J2EE (Java™ 2, Enterprise Edition) from Sun Microsystems is configured for use with Enterprise JavaBeans. Enterprise JavaBeans generally simplifies the construction of server-side components using distributed object applications written in the Java language. Thus, in one implementation, the functional elements of portal 400 may be written in Java and implemented using J2EE and Enterprise JavaBeans. As those of ordinary skill in the art will appreciate, a variety of other software development approaches or architectures may be used to implement the functional elements of portal 400 and their interconnection.

One implementation of these platforms and components is shown in FIG. 6. FIG. 6 is a simplified diagram illustrating the interaction between user-side Internet client 410 on the user side and input and output managers 532 and 534 of Internet server 530 on the portal side, as well as the communications among the three tiers (servers 510, 520, and 530) of portal 400. Browser 605 on client 410 sends HTML documents 620 to, and receives them from, server 530. HTML document 625 includes applet 627. Browser 605, running on user computer 103, provides a run-time container for applet 627. The functions of managers 532 and 534 on server 530, such as the implementation of GUI operations, may be carried out on the Java™ platform by servlets and/or JSPs 640. A servlet engine executing on server 530 provides a run-time container for servlets 640. JSP (Java Server Pages) from Sun Microsystems is a script-like environment for GUI operations; an alternative is ASP (Active Server Pages) from Microsoft Corporation. App server 650 is the product referred to above as middleware, and it executes on application server 520. EJB (Enterprise JavaBeans™) is a standard that defines an architecture for enterprise beans, which are application components. Similarly, CORBA (Common Object Request Broker Architecture) is a standard for distributed object systems; the CORBA standard is implemented by CORBA-compliant products such as Java™ IDL. An example of an EJB-compliant product is WebLogic, referred to above. Further details of the implementation of the standards, platforms, components, and other elements for an Internet portal and its communication with clients are well known to those skilled in the relevant art.

As noted above, one functional element of portal 400 is input manager 532. Manager 532 receives a set of one or more probe set identifiers from user 101 over Internet 499. Manager 532 processes this information and forwards it to user-service manager 522. These functions are performed in accordance with known techniques common to the operation of Internet servers, which in similar contexts are also commonly called presentation servers. Another functional element of portal 400 is output manager 534. Also in accordance with known techniques, manager 534 provides the information assembled by user-service manager 522 to user 101 over Internet 499, one aspect of which is described above with respect to FIG. 6. The information assembled by manager 522 is represented in FIG. 5 as data 524, labeled "Integrated genomic and/or product web pages responsive to user request." The data is integrated in the sense, among others, that it is based at least in part on the probe set identifiers specified by user 101 and therefore shares a relationship to the genes and/or ESTs corresponding to those identifiers. Data 524 presented by manager 534 may be implemented in accordance with a variety of known techniques. As some examples, data 524 may include HTML or XML documents, e-mail or other files, or data in other forms. The data may include Internet URL addresses so that user 101 may retrieve additional HTML, XML, or other documents or data from remote sources.
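The division of labor among the input manager, the user-service manager, and the output manager can be sketched as three small functions chained together. This is only an illustration of the data flow described above, assuming a comma-separated request format and in-memory lookup tables; the function names, request format, and sample data are hypothetical and do not reflect the servlet/JSP and EJB implementation discussed with respect to FIG. 6.

```python
from html import escape

# Hypothetical local data keyed by probe set identifier.
GENOMIC_DATA = {"PS-0001": "example gene annotation"}
PRODUCT_DATA = {"PS-0001": ["example clone", "example antibody"]}


def input_manager(raw_request):
    """Parse a comma-separated identifier list, dropping blanks and duplicates."""
    seen, identifiers = set(), []
    for token in raw_request.split(","):
        token = token.strip()
        if token and token not in seen:
            seen.add(token)
            identifiers.append(token)
    return identifiers


def user_service_manager(identifiers):
    """Assemble genomic and product data for each requested identifier."""
    return [
        {
            "id": identifier,
            "genomic": GENOMIC_DATA.get(identifier),
            "products": PRODUCT_DATA.get(identifier, []),
        }
        for identifier in identifiers
    ]


def output_manager(records):
    """Render the assembled records as a single HTML table."""
    rows = []
    for record in records:
        genomic = escape(record["genomic"] or "no data")
        products = escape(", ".join(record["products"]) or "none")
        rows.append(
            f"<tr><td>{escape(record['id'])}</td>"
            f"<td>{genomic}</td><td>{products}</td></tr>"
        )
    return "<table>" + "".join(rows) + "</table>"


page = output_manager(user_service_manager(input_manager("PS-0001, PS-0002")))
print(page)
```

Because every row is derived from the identifiers in the request, the resulting page is "integrated" in the sense used above: its contents share a relationship to the genes or ESTs behind those identifiers.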

Portal 400 further includes database manager 512. In the illustrated embodiment, database manager 512 coordinates the storage, maintenance, supplementation, and other transfers of data from or to any of local databases 511, 513, 514, 516, and 518. Manager 512 may carry out these functions in cooperation with an appropriate database application, such as the Oracle 8.0.5 database management system.

In some implementations, manager 512 periodically updates local genomic database 518. The data updated in database 518 includes data related to the genes or ESTs that correspond to one or more probe sets. The probe sets may be those used, or planned for use, on any microarray product, and/or those expected or planned for use in the microarray products of any manufacturer or researcher. For example, the probe sets may include all probe sets synthesized on the GeneChip® probe arrays in the catalog of Affymetrix, Inc., including its Arabidopsis Genome Array, CYP450 Array, Drosophila Genome Array, E. coli Genome Array, GenFlex™ Tag Array, HIV PRT Plus Array, HuGeneFL Array, Human Genome U95 Set, HuSNP Probe Array, Murine Genome U74 Set, P53 Probe Array, Rat Genome U34 Set, Rat Neurobiology U34 Set, Rat Toxicology U34 Array, or Yeast Genome S98 Array. The probe sets may also include those synthesized on custom arrays for user 101 or others. However, the data updated in database 518 need not be so limited. Rather, it may relate to any number of genes or ESTs. The types of data that may be stored in database 518 are described below with respect to the operations of manager 522 in periodically gathering such data directly from remote sources and in providing users with the data held locally in database 518.
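The periodic update of a local genomic store can be sketched as a cache that re-fetches records once they reach a maximum age, so that user requests are served locally rather than from the remote source. This is a minimal sketch, assuming an injected fetch function and a fixed refresh interval; the class, method names, and interval are hypothetical, and no real remote API is called.

```python
import time


class LocalGenomeCache:
    """Holds remotely sourced records locally and refreshes them on a schedule."""

    def __init__(self, fetch_remote, max_age_seconds, clock=time.time):
        self._fetch = fetch_remote        # callable: identifier -> data
        self._max_age = max_age_seconds   # refresh interval (assumed policy)
        self._clock = clock               # injectable for testing
        self._records = {}                # identifier -> (data, fetched_at)

    def refresh(self, identifiers):
        """Periodic job: re-fetch records that are missing or too old."""
        now = self._clock()
        updated = []
        for identifier in identifiers:
            entry = self._records.get(identifier)
            if entry is None or now - entry[1] >= self._max_age:
                self._records[identifier] = (self._fetch(identifier), now)
                updated.append(identifier)
        return updated

    def lookup(self, identifier):
        """Serve a user request from local data only; no remote call is made."""
        entry = self._records.get(identifier)
        return entry[0] if entry else None


def stub_fetch(identifier):
    # Stand-in for a remote source; returns placeholder data.
    return f"record for {identifier}"


cache = LocalGenomeCache(stub_fetch, max_age_seconds=24 * 60 * 60)
cache.refresh(["ACC-EST-1", "ACC-GENE-9"])
print(cache.lookup("ACC-EST-1"))
```

Separating `refresh` from `lookup` mirrors the point of the passage: the slow remote gathering happens on a schedule, while user requests are answered from the locally held copy.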

Database 516 includes the types of data referred to above in relation to database application 230, that is, data relating probe sets and their identifiers to the corresponding genes or ESTs. Database 516 also includes SIF and other library data. From time to time, user service manager 522 provides database manager 512 with update information for the library and other data. In some cases this update information is provided by the owner or manager of proprietary information, although it may also be made publicly available, for example for download from a web site.

The information stored by manager 512 in local product database 514 may likewise be provided by vendors, distributors, or agents, or obtained from public sources such as web sites. A wide variety of product-related information may be included in database 514, including, for example, availability, price, composition, suitability, or ordering data. The information may relate to a wide variety of products, including any type of biological device or substance, or any type of reagent that may be used with a biological device or substance. To give only a few examples, the device, substance, or reagent may be an oligonucleotide, probe array, clone, antibody, or protein. The data stored in database 514 may also include links, such as Internet URLs, to remote locations where product data are available, such as vendor web sites.

Database 511 includes information relating probe set identifiers to the sequences of the probes. This information may be provided by the manufacturer of the probes, by researchers who design probes for spotted arrays or other custom arrays, or by others. Moreover, the use of portal 400 is not limited to probes arranged in arrays. As noted, probes may be immobilized on or in beads, optical fibers, or other substrates or media. Database 511 may therefore also include information about the sequences of such probes.

Database 519 includes information about users and the accounts they use to conduct business with or through portal 400. Any of a variety of account information may be obtained from users, such as current orders, past orders, and so on, all as will be readily apparent to those of ordinary skill in the art. In addition, in accordance with known methods used in electronic commerce, information about users may be developed by recording and/or analyzing their interactions with portal 400. For example, user service manager 522 may note the genomic regions of interest to users, their purchasing or product inquiry behavior, the frequency with which they access various services, and so on, and provide this information to database manager 512 for storage or updating in database 519.

Another functional element of portal 400 is user service manager 522. Manager 522 may periodically cause database manager 512 to update local genome database 518 from various sources, such as remote databases 402. For example, on any chosen schedule (e.g., daily or weekly), manager 522 may, in accordance with known methods, initiate searches of remote databases 402 by formulating appropriate queries, addressing the URLs of the various databases 402, or using other conventional techniques for searching for and/or retrieving data or documents over the Internet. These search queries and the corresponding addresses may be provided in a known manner to output manager 534 for presentation to databases 402. Input manager 532 receives the replies to the queries and provides them to manager 522, which in turn provides them to database manager 512 for updating database 518, all in accordance with any of a variety of known methods for managing the flow of information to, from, and within an Internet site.

Portal application manager 526 manages the administrative aspects of portal 400, possibly with the assistance of middleware products such as an application server. One such administrative task may be to issue periodic instructions to manager 522 to initiate the periodic updating of database 518. Alternatively, manager 522 may initiate this task on its own. Not all data in database 518 need be updated on the same schedule. Rather, different types of data, and/or data from different sources, are typically updated on different schedules. These schedules may also change and need not be regular: a particular item of data may be updated after one day, updated again two days later, and thereafter at intervals that continue to vary. Many factors may influence the decision by manager 526 or manager 522 to maintain or change these intervals, such as the response times of the various remote databases 402, the value and/or timeliness of the information in those databases, cost or licensing considerations associated with accessing them, and the amount of information that must be accessed.
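The per-source scheduling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `sources_due`, the interval table, and the chosen intervals are all hypothetical, and the source names are used only as examples of remote databases.

```python
from datetime import datetime, timedelta

# Hypothetical refresh intervals, one per remote source. The intervals are
# invented for illustration; in the text they may be changed over time.
REFRESH_INTERVALS = {
    "GenBank": timedelta(days=1),
    "SwissProt": timedelta(days=7),
    "PubMed": timedelta(days=2),
}


def sources_due(last_updated, now, intervals=REFRESH_INTERVALS):
    """Return, in sorted order, the sources whose refresh interval has elapsed.

    last_updated maps a source name to the datetime of its last refresh.
    A source with no recorded refresh is always treated as due.
    """
    due = []
    for name, interval in intervals.items():
        last = last_updated.get(name)
        if last is None or now - last >= interval:
            due.append(name)
    return sorted(due)
```

Because the intervals live in a plain table, a manager could lengthen or shorten an individual entry (for instance after slow responses from one source) without touching the others.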

In some implementations, manager 522 assembles, from the data in local genome database 518, a set of data related to the genes or ESTs that correspond to the probe set identifiers selected by user 101. The user's selection may be forwarded by input manager 532 to manager 522 in accordance with known methods. Also in accordance with known methods, manager 522 obtains the data from database 518 by formulating an appropriate query based on the selection, for example in a version of SQL, and forwards the query to database manager 512 for execution against database 518.
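As a rough sketch of such a query, the following uses Python's standard `sqlite3` module as a stand-in for the database management system. The table names, column names, and sample rows are assumptions made for illustration; the text does not specify a schema for database 518.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the local genome database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE probe_set_gene (probe_set_id TEXT, accession TEXT);
        CREATE TABLE gene_data (accession TEXT PRIMARY KEY, annotation TEXT);
        INSERT INTO probe_set_gene VALUES
            ('PS_1', 'ACC_A'), ('PS_1', 'ACC_B'), ('PS_2', 'ACC_C');
        INSERT INTO gene_data VALUES
            ('ACC_A', 'placeholder annotation A'),
            ('ACC_B', 'placeholder annotation B'),
            ('ACC_C', 'placeholder annotation C');
        """
    )
    return conn


def fetch_gene_data(conn, probe_set_ids):
    """Return (probe_set_id, accession, annotation) rows for the selected probe sets."""
    ids = list(probe_set_ids)
    if not ids:
        return []
    # One "?" placeholder per identifier keeps user input out of the SQL text.
    placeholders = ",".join("?" for _ in ids)
    sql = (
        "SELECT p.probe_set_id, g.accession, g.annotation "
        "FROM probe_set_gene AS p "
        "JOIN gene_data AS g ON g.accession = p.accession "
        f"WHERE p.probe_set_id IN ({placeholders}) "
        "ORDER BY p.probe_set_id, g.accession"
    )
    return conn.execute(sql, ids).fetchall()
```

The join reflects the point that one selected probe set identifier may correspond to more than one gene or EST.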

As noted, various types of data may be accessed from remote databases 402 in this manner and maintained in local genome database 518. Examples include sequence data, exonic structure or location data, splice variant data, marker structure or location data, polymorphism data, homology data, protein family classification data, pathway data, alternative gene name data, literature citation data, and annotation data. Many other examples are possible. Likewise, genomic data that are not currently available but become available in the future may be accessed and maintained locally as described herein. Examples of remote databases 402 currently suitable for access in the manner described include GenBank, GenBank New, SwissProt, GenPept, DB EST, Unigene, PIR, Prosite, Pfam, Prodom, Blocks, PDB, PDBfinder, EC Enzyme, Kegg Pathway, Kegg Ligand, OMIM, OMIM Map, OMIM Allele, DB SNP, and PubMed. Hundreds of other suitable databases currently exist, so this list is merely illustrative.

In addition, local genome database 518 may be supplemented with data obtained from, or derived (by user service manager 522) from, other local databases served by database manager 512. In particular, although local product database 514 is shown as separate from database 518 for convenience of illustration, the two may be one and the same database. Alternatively, all or part of the data in database 514 may be copied into, or made accessible from, database 518.

More specific examples are now provided of how user service manager 522 receives and responds to requests from user 101 for genomic information and for product information and/or ordering. These examples are described with reference to FIGS. 7, 8, and 9.

FIG. 7 is a flowchart of an illustrative method by which the illustrated embodiment of portal 400 may respond to a user's request for genomic or product information. In step 710 of this example, input manager 532 receives user 101's request for data from client 410 via Internet 499. The request may include, for example, an HTML or XML document containing user 101's selection of a probe set identifier. As noted, the probe set identifier may be, as non-limiting examples, a number, name, accession number, symbol, graphical representation, or nucleotide or other sequence. In some cases, user 101 may make this selection by using one or more analysis applications 199A to select a probe set identifier (e.g., by drawing a loop around a spot, as described above) and then activating communication with portal 400 by any of various known techniques, such as a right click of the mouse. In accordance with various known methods, the request may also specify whether user 101 is interested in genomic data, product data, or both, and give details of the type of data desired. For example, user 101 may select a product category, vendor, or product name from a drop-down menu. As noted above, manager 532 provides user 101's request to user service manager 522.

In step 720, user service manager 522 initiates the identification of user 101. FIG. 8 is a block diagram showing the functional elements of manager 522 in greater detail, including account ID determiner 810, which performs the task of identifying user 101 in this illustrative embodiment. Determiner 810 may use any known technique to obtain this information, such as cookies, or extraction of a user-entered identification number from the user's request. Via database manager 512, determiner 810 may compare the user's identification with entries in user account database 513 to further identify user 101. In other embodiments, the identity of user 101 need not be obtained, although statistics or other information related to user 101's request may be recorded, as noted above.

In step 725, user service manager 522 formulates an appropriate query (e.g., in a version of SQL) for correlating the probe set identifier with the corresponding gene or EST. Gene or EST determiner 820 is the functional element of manager 522 that illustratively performs this task. Determiner 820 forwards the query to database manager 512. If the probe set identifier provided by user 101 includes sequence information, the query may seek, from database 511 and/or from the SIF information in database 516, the identity of one or more probe sets having corresponding (e.g., similar in a biologically meaningful sense) sequences. If the probe set identifier includes a name or number (e.g., an accession number), the query may seek the identity of the probe set from database 516, which, as noted, includes data relating names, numbers, and other probe set identifiers to the corresponding genes or ESTs. Alternatively, user 101 may use database application 230 locally to obtain this information and include it in the information request in accordance with known methods, in which case step 725 need not be performed.
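The branch in step 725 between sequence-type and name-type identifiers might be sketched as below. The classification rule (a run of nucleotide letters of some minimum length) and the 15-character threshold are assumptions introduced here for illustration; the text does not say how the two kinds of identifier are told apart, and the sample identifiers in the usage note are illustrative only.

```python
import re

# Hypothetical rule: treat the identifier as a sequence if it consists only of
# nucleotide letters (A, C, G, T, U, or N for "unknown") and is long enough.
_NUCLEOTIDES = re.compile(r"[ACGTUN]+", re.IGNORECASE)


def choose_lookup(identifier, min_sequence_length=15):
    """Return which kind of lookup a probe set identifier calls for.

    "sequence"       -> match against probe sequences (database 511 or SIF data)
    "name_or_number" -> look up names, numbers, or accession numbers (database 516)
    """
    text = identifier.strip()
    if len(text) >= min_sequence_length and _NUCLEOTIDES.fullmatch(text):
        return "sequence"
    return "name_or_number"
```

With this rule, `choose_lookup("ACGTACGTACGTACGTACGT")` selects the sequence path, while a short name-like identifier such as `"1000_at"` selects the name-or-number path. A real system would more likely receive the identifier type explicitly in the request.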

As indicated in step 730, user service manager 522 then correlates the identified genes and/or ESTs with genomic information and/or product information. In the illustrated example this task is performed by correlator 830. In one of many possible implementations, correlator 830 formulates a query, via database manager 512, to database 513 in order to obtain links to the appropriate information in local product database 514 and/or local genome database 518. FIG. 9 is a simplified graphical representation of database 513. Those of ordinary skill in the art will appreciate that this representation is provided for clarity of illustration and that many other implementations are possible. In one aspect of an appropriate query to database 513, which is assumed for purposes of illustration to be a relational database, a gene or EST accession number 902 is associated by a link 904 with a probe set ID 912. As shown in FIG. 9, multiple genes and/or ESTs may be associated with the same probe set ID by associating two IDs, 902A and 902B, with the same link 904N. The information used to establish this association is similar to the information provided in database 516, as described above, and the link may therefore be predetermined using database 516 or determined dynamically.

In other embodiments, correlator 830 simply correlates one or more gene or EST identifiers, such as accession numbers, with products, such as biological products. These embodiments are represented in FIG. 8 by the arrow leading from determiner 810 (which is optional) directly to correlator 830. The correlation may be accomplished by any of a variety of conventional techniques, such as by submitting a query to local product database 514, to remote web pages 404, and/or to remote databases 402. These queries may be indexed or keyed by product category, type, name, or vendor, as may be appropriate for a lookup table, relational database, or other data structure. In addition, in accordance with methods known to those of ordinary skill in the relevant art, the query may search products, product web pages, or other sources of product data that are logically or syntactically related to the gene or EST identifier. The results of the query may then be provided to user 101 by output manager 534, for example to client 410 via Internet 499.

With the appropriate link 904 to probe set ID 912, one or more links 916 to related product and/or genomic data may be obtained. For example, link 904N may lead to probe set ID 912C, which is associated with link 916C to related product and/or genomic data. The information used to establish this association may be predetermined on the basis of expert input and/or computer-implemented analysis of the nature of user queries (e.g., statistical analysis and/or analysis by an adaptive system such as a neural network). For example, it may be observed or anticipated (by a human or a computer, as noted) that a user whose gene expression experiments lead to the identification of a certain gene may wish to follow up with protein-level experiments using an antibody for that gene. The relationship between a gene and the appropriate antibodies may be stored in a suitable database, such as database 516. Link 916C may therefore include links to product or genomic data identifiers: a link identifying data about an appropriate antibody (e.g., a link to product/genomic ID 922A), a link identifying a general antibody catalog (e.g., ID 922B), or a link identifying a probe array specifically designed to detect alternative splice forms of the gene of interest (e.g., ID 922C). For purposes of illustration, it is assumed in this example that link 916C leads to ID 922C. Information about the availability of the splice variant probe array may be predetermined in the content of link 926. For example, link 926D (associated, as shown, with ID 922C) may store an Internet URL and/or database query leading to a vendor's web page, local product database 514, and/or local genome database 518. The content of link 926D may also be determined dynamically from database 514 or 518 or from a remote source such as database 402 or web page 404. These and similar processes are represented by step 735 of FIG. 7.
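The chain of associations in FIG. 9 (accession number 902, link 904, probe set ID 912, link 916, product/genomic data ID 922, link 926) can be sketched with plain lookup tables. Every identifier and URL below is an invented placeholder, and the three-table layout is only one of the many possible arrangements the text allows.

```python
# Placeholder tables keyed to the reference numerals of FIG. 9.
# 902 -> (link 904) -> 912: two accession numbers share one probe set ID.
ACCESSION_TO_PROBE_SET = {
    "ACC_A": "PS_C",
    "ACC_B": "PS_C",
}
# 912 -> (link 916) -> 922: a probe set ID leads to one or more data IDs.
PROBE_SET_TO_DATA_IDS = {
    "PS_C": ["DATA_SPLICE_ARRAY"],
}
# 922 -> (link 926): a data ID leads to a stored location, such as a URL.
DATA_ID_TO_LOCATION = {
    "DATA_SPLICE_ARRAY": "https://vendor.example/splice-array",
}


def resolve_links(accession):
    """Follow the chain from an accession number to (data ID, location) pairs."""
    probe_set = ACCESSION_TO_PROBE_SET.get(accession)
    if probe_set is None:
        return []
    results = []
    for data_id in PROBE_SET_TO_DATA_IDS.get(probe_set, []):
        location = DATA_ID_TO_LOCATION.get(data_id)
        if location is not None:
            results.append((data_id, location))
    return results
```

Here the locations are predetermined; a variant that determines them dynamically would replace the last table with a query against a product or genome database at lookup time.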

As those of ordinary skill in the art will appreciate, many modifications and variations of this illustrative arrangement of database 513 are possible. For example, probe set identification data may be linked to an array identifier (such as array ID 914), which may in turn be associated with link 916. As another of many possible examples, a gene or EST accession number may be linked directly to a product and/or genomic data ID 922, or even directly to a link 926. Implementations such as this example give the user the opportunity to obtain a broad range of associations from a narrow query. For example, the user may select only one probe set identifier, but that identifier may be linked to data for multiple genes and/or ESTs, which may in turn be linked to multiple items of product or genomic data. In another example, link 926D may include a link to local genome database 518. Based on the probe set identifier, gene or EST accession number, sequence information, or other data provided by or derived from user 101's query, database 518 may be searched for relevant data in accordance with known query and/or retrieval techniques.

Returning now to FIG. 7, and in particular to step 740, the data returned in response to the query by correlator 830 are provided to product data processor 842, genomic data processor 844, or both, as appropriate to the nature of the returned data. The functions of processors 842 and 844 are shown separately for ease of illustration, but they need not be separate. Processors 842 and 844 apply any of various known presentation or data transfer techniques to prepare graphical user interfaces, files for transfer, and data in other forms. The data thus processed are then provided to output manager 534 for transmission to client 410.

In some embodiments, user 101 may respond to the data thus sent by indicating a desire to purchase a product or to receive further information. Requests for further information may be processed in a manner similar to that described above with respect to FIG. 7. If user 101 indicates a desire to purchase a product (see decision element 745), the indicated product may be prepared for shipment or otherwise processed, and the user's account may be adjusted, in accordance with known methods for conducting electronic commerce. As one of many alternative embodiments, user service manager 522 may notify the product's vendor of user 101's order, and the vendor may ship the product or arrange for its shipment. In one aspect of this embodiment, manager 522 then records that a fee is to be charged to the vendor for the referral.

In some embodiments of portal 400, user 101 may provide portal 400 (e.g., via client 410, Internet 499, and input manager 532) with one or more gene or EST accession numbers or other gene or EST identifiers. Alternatively, or in addition, user 101 may provide portal 400 with one or more probe set identifiers. User 101 may obtain the gene, EST, and/or probe set identifiers from public sources, from results that user 101 has noted when conducting experiments with probe arrays, from a list of the genes or ESTs that have corresponding probes on a probe array, or from any other source or in any other manner. Input manager 532 receives the one or more gene, EST, or probe set identifiers and provides them to user service manager 522, which formulates a query to database manager 512. In accordance with known query methods and formats, the query seeks, from local product database 514, product information associated with the gene, EST, and/or probe set identifiers. For this purpose, the products in local product database 514 may be indexed or searchable by, or keyed on, any one or more gene, EST, and/or probe set identifiers. In accordance with known methods, some embodiments may include similarity matching of gene, EST, or probe set identifiers, for example where all or part of the sequence of a gene, EST, or SIF (corresponding to a probe set identifier) is submitted. Likewise, a name association function may be implemented in accordance with known methods such as table lookup, so that alternative names or forms of a gene, EST, or probe set identifier can be found and used in the product data query. In addition, in some embodiments, manager 522 may, in accordance with known Internet search techniques, initiate remote searches of remote databases 402 and/or remote vendor web pages 404 to obtain product information from remote sources. These searches may be based, for example, on product categories or vendors that are associated in local product database 514 with the gene, EST, or probe set identifiers provided by user 101. Manager 522 may obtain the product data corresponding to the gene, EST, and/or probe set identifiers from local product database 514 and/or from remote pages or databases 404 or 402, and provide those product data to user 101 via output manager 534. The product data may, for example, be included in web page 524.

In some embodiments, portal 400 thus provides a system for providing product data, typically biological product data. The system comprises: input manager 532, which receives one or more gene, EST, and/or probe set identifiers from user 101; user service manager 522, which correlates the gene, EST, and/or probe set identifiers with one or more items of product data and causes (e.g., via database manager 512) the product data to be obtained locally, for example from local product database 514, or, in some embodiments, remotely, for example from web page 404 or database 402; and output manager 534, which provides the product data to user 101.

Similarly, a method of providing biological product data is provided, the method comprising the steps of: receiving one or more gene, EST, and/or probe set identifiers from user 101; correlating the gene, EST, and/or probe set identifiers with one or more items of product data; causing the product data to be obtained locally (e.g., from database 514) or remotely (e.g., from web page 404 or database 402); and providing the product data to user 101.
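The receive, correlate, and provide steps of this method, together with the table-lookup handling of alternative names, can be sketched as follows. The alias table, product index, gene name, and product descriptions are all invented placeholders, and only the local lookup path is shown; the remote path is omitted.

```python
# Hypothetical synonym table: each alternative spelling (lower-cased) maps to
# one canonical identifier used as the key of the product index.
ALIASES = {
    "genex": "GENE_X",
    "gene-x": "GENE_X",
    "gx1": "GENE_X",
}

# Hypothetical local product index keyed on canonical gene identifiers.
PRODUCTS_BY_GENE = {
    "GENE_X": ["anti-GENE_X antibody (vendor A)", "GENE_X clone (vendor B)"],
}


def canonical_id(identifier):
    """Map an alternative name to its canonical form, or upper-case it if unknown."""
    text = identifier.strip()
    return ALIASES.get(text.lower(), text.upper())


def products_for(identifiers):
    """Return a dict from each submitted identifier to its list of product data."""
    found = {}
    for identifier in identifiers:
        canon = canonical_id(identifier)
        # Copy the list so callers cannot modify the index by accident.
        found[identifier] = list(PRODUCTS_BY_GENE.get(canon, []))
    return found
```

Keying the result on the identifier exactly as submitted lets the response be presented in the user's own terms, while the canonical form stays internal to the lookup.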

As noted above, the functional elements of portal 400 may be implemented in hardware, software, firmware, or any combination thereof. In the embodiments described above, it is generally assumed for convenience that the functions of portal 400 are implemented in software. That is, the functional elements of the illustrated embodiment comprise sets of software instructions that cause the described functions to be performed. These software instructions may be written in any programming language, such as Java, Perl, C++, another high-level programming language, a low-level language, or any combination thereof. The functional elements of portal 400 may thus be referred to as "a set of genomic web portal instructions", and its functional elements may similarly be described as sets of genomic web portal instructions for execution by servers 510, 520, and 530.

In some embodiments, a computer program product is described that comprises a computer-usable medium having control logic (a computer software program, including program code) stored thereon. The control logic, when executed by a processor, causes the processor to perform the functions of portal 400 as described herein. In other embodiments, some of these functions are implemented primarily in hardware using, for example, a hardware state machine. Implementation of a hardware state machine so as to perform the functions described herein will be apparent to those skilled in the relevant art.

Having described various embodiments and implementations, it should be apparent to those skilled in the relevant art that the foregoing is illustrative only and not limiting, having been presented by way of example only. Many other schemes for distributing functions among the various functional elements of the illustrated embodiment are possible. The functions of any element may be carried out in various ways in alternative embodiments. Likewise, the functions of several elements may, in alternative embodiments, be carried out by fewer elements or by a single element.

For example, for purposes of clarity, the functions of user service manager 522 are described as being implemented by the functional elements shown in FIG. 8. Manager 522 need not, however, be divided into these or any other distinct functional elements. Similarly, operations of particular functional elements that are described separately for convenience need not be carried out separately. For example, some or all of the functions of product data processor 842 may be performed by genomic data processor 844, and vice versa. Similarly, in some embodiments, any functional element may perform fewer or different operations than those described with respect to the illustrated embodiment. Also, functional elements shown as distinct for clarity of illustration may, in a particular implementation, be incorporated within other functional elements.

For example, the functions of processors 842 and 844 may be combined in a single functional element. Similarly, some or all of the functions of database manager 512 may be performed by user service manager 522 and/or by input manager 532.

Also, the sequence of functions, or of portions of functions, may generally be altered. For example, the function of account ID determiner 810 may be performed after that of user data processor 840. The data and control flow shown in FIG. 8 is thus merely illustrative in this regard. Similarly, the method steps shown in FIG. 7 need not always be performed in the order suggested by the illustrative example of that figure. For example, method step 720 of identifying the user may be performed after step 725, 730, or 735.

Certain functional units, files, data structures, and so on are described in the illustrated embodiments as located in system memory 120 of computer 100, or generally in server 510, 520, or 530. In other embodiments, however, they may be located on, or distributed across, computer systems or other platforms that are co-located and/or remote from each other. For example, one or more of the data files or data structures 511, 513, 514, 516, or 518, shown in FIG. 5 as co-located on and "local to" server 510, may instead be located in a computer system or systems remote from server 510. In these cases, the operations of database manager 512 with respect to these data files or data structures may be carried out over a network or by any of numerous other known means for transferring data and/or control to or from a remote location.

Furthermore, those skilled in the relevant art will appreciate that the control and data flows between and among the functional units and the various data structures may vary in many ways from those described above. In particular, intermediary functional units (not shown) may direct control or data flow, and the functions of the various units may be combined, divided, or reordered to allow parallel processing or for other reasons. Likewise, intermediate data structures or files may be used, and the various data structures or files described may be combined or rearranged. Accordingly, many other embodiments, and modifications thereof, fall within the scope of the invention as defined by the appended claims and their equivalents.

Claims

1. A system for providing data about one or more genes or expressed sequence tags, wherein each gene or expressed sequence tag has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biomolecule, the system comprising:

an input manager constructed and arranged to receive from a user a selection of a first set of one or more probe set identifiers;

a gene determiner constructed and arranged to identify a first set of one or more genes or expressed sequence tags corresponding to the probe sets identified by the first set of probe set identifiers;

a correlator constructed and arranged to correlate the first set of genes or expressed sequence tags with a first set of one or more data; and

an output manager constructed and arranged to provide the first set of data to the user.

2. The system of claim 1, wherein:

The first set of probe set identifiers identifies a set of probe sets capable of detecting biomolecules including nucleic acids.

3. The system of claim 1, wherein:

The first set of probe set identifiers identifies a set of probe sets capable of detecting biomolecules comprising mRNA transcripts of corresponding genes.

4. The system of claim 1, wherein:

The first set of probe set identifiers includes probe set identifiers of a second set of one or more probe set identifiers, which identify probe sets that have been able to detect all or part of the expression or differential expression of their corresponding genes or expressed sequence tags.

5. The system of claim 4, wherein:

The probe sets identified by the second set of probe set identifiers are disposed on one or more probe arrays.

6. The system of claim 5, wherein:

The probe sets identified by the second set of probe set identifiers include oligonucleotides synthesized in situ.

7. The system of claim 6, wherein:

The probe array includes a probe array comprising oligonucleotide probes.

8. The system of claim 5, wherein:

At least one probe set identified by the second set of probe set identifiers consists of a single spot on a spotted probe array.

9. The system of claim 5, wherein:

The probe array includes a spotted array.

10. The system of claim 9, wherein:

At least one spot of the spotted array comprises an oligonucleotide.

11. The system of claim 1, wherein:

The user is a remote user, and

the input manager receives the selection from the remote user over a network.

12. The system of claim 11, wherein:

The network includes the Internet.

13. The system of claim 1, wherein:

At least a first probe set identifier of the first set of probe set identifiers includes a gene identifier for a gene corresponding to the first probe set identifier.

14. The system of claim 13, wherein:

The gene identifier includes an accession number.

15. The system of claim 1, wherein:

The user selects the first set of probe set identifiers based at least in part on an indication of a level of expression or differential expression of the genes or expressed sequence tags corresponding to the probe sets identified by the first set of probe set identifiers.

16. The system of claim 1, wherein:

The first set of one or more data includes one or any combination of product data regarding availability, price, composition, suitability, or order.

17. The system of claim 16, wherein:

The first set of one or more data includes product data about a biological device or material, or about a reagent that may be used with a biological device or material.

18. The system of claim 17, wherein:

The device, material, or reagent includes one or any combination of oligonucleotides, probe arrays, nucleotide clones, antibodies, or proteins.

19. The system of claim 1, wherein:

The first set of one or more data includes, at least in part, data stored in a local product database.

20. The system of claim 19, wherein:

The first set of one or more data includes at least one link to remote data representing a seller of a biological product.

21. A system for providing product data about one or more genes or expressed sequence tags, the system comprising:

an input manager constructed and arranged to receive one or more gene or expressed sequence tag identifiers;

a correlator constructed and arranged to correlate the gene or expressed sequence tag identifiers with one or more product data; and

an output manager constructed and arranged to provide the product data to a user.

22. The system of claim 21, wherein: said product data is biological product data.

23. The system of claim 21, wherein:

The gene or expressed sequence tag identifier includes a gene or expressed sequence tag accession number.

24. A method for providing data about one or more genes or expressed sequence tags, wherein each gene or expressed sequence tag has at least one corresponding probe set that is identified by a probe set identifier and is capable of detecting a biomolecule, the method comprising the steps of:

receiving, by an input manager, a selection from a user of a first set of one or more probe set identifiers;

identifying, by a gene determiner, a first set of one or more genes or expressed sequence tags corresponding to the probe sets identified by the first set of probe set identifiers;

correlating, by a correlator, a first set of one or more data with the first set of genes or expressed sequence tags; and

providing, by an output manager, the first set of data to the user.

25. The method of claim 24, wherein

26. The method of claim 24, wherein:

27. A method for providing product data about one or more genes or expressed sequence tags, the product data being provided by a system comprising an input manager, a correlator and an output manager, the method comprising:

receiving, by the input manager, one or more gene or expressed sequence tag identifiers;

correlating, by the correlator, one or more product data with the gene or expressed sequence tag identifiers; and

providing, by the output manager, the product data to a user.

28. The method of claim 27, adapted to process requests or commands received from users over a network.

29. The method of claim 28, further comprising identifying a first set of one or more genes or expressed sequence tags corresponding to probe sets that are identified by probe set identifiers and are capable of detecting biomolecules.

30. The method of claim 29, wherein the user selects a first set of one or more probe set identifiers, and a first set of product data is provided to the user via one or more web pages.

31. The method of claim 30, wherein:

The probe set identifiers of the first set identify probe sets capable of detecting biomolecules including nucleic acids.

32. The method of claim 30, wherein:

The probe set identifiers of the first set identify probe sets capable of detecting biomolecules comprising mRNA transcripts of the corresponding genes.

33. The method of claim 30, further comprising the step of:

receiving over the network a second user selection of one or more products to purchase, the second user selection being based on at least a portion of the first set of product data provided to the user.

34. The method of claim 33, further comprising the steps of:

identifying an account corresponding to the user, and

adjusting the account based at least in part on price data corresponding to the one or more products of the second user selection.

35. The method of claim 33, further comprising the step of:

selling the one or more products to the user based on the second user selection.

36. The method of claim 30, wherein the user performs at least one gene expression experiment on a probe array in order to select said probe set identifiers.

37. The method of claim 30, wherein the user selects the first set of probe set identifiers based on an indication of the extent of expression of genes or expressed sequence tags corresponding to the probe sets identified by the first set of probe set identifiers.

38. The method of claim 30, wherein:

At least one probe set identified by the first set of probe set identifiers is disposed on one or more probe arrays.

39. The method of claim 30, wherein:

Said product data is selected from the group comprising product data about availability, composition, suitability and order.

40. The method of claim 30, wherein:

The product data is selected from the group comprising product data pertaining to a biological device or material, or to a reagent that may be used with a biological device or material.

41. The method of claim 30, wherein:

The product data is selected from the group consisting of product data related to oligonucleotides, probe arrays, nucleotide clones, antibodies, or proteins.

42. The method of claim 30, wherein:

The product data relates to PCR primers and/or PCR probes.

43. The method of claim 30, wherein the product data includes at least one link to remote data representing a seller of a biological product.