WO2006026702A2 - Methods and systems for semantic identification in data systems - Google Patents
Methods and systems for semantic identification in data systems
- Publication number
- WO2006026702A2 (PCT/US2005/031097)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- item
- identifier
- semantic
- semantic identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
Definitions
- This invention relates to the field of information technology, and more particularly to the field of data integration systems.
- EAI efforts encounter many challenges, ranging from the need to handle different protocols, the need to address ever-increasing volumes of data and numbers of transactions, and an ever-increasing appetite for faster integration of data.
- Various approaches to EAI have been taken, including least-common-denominator approaches, atomic approaches, and bridge-type approaches.
- EAI is based upon communication between individual applications.
- the complexity of EAI solutions grows geometrically in response to linear additions of platforms and applications.
- While data integration systems provided useful tools for addressing the needs of an enterprise, such systems are typically deployed as custom solutions. They have a lengthy development cycle, and may require sophisticated technical training to accommodate changes in business structure and information requirements.
- data integration system tools that permit use, reuse, and modification of functionality in a changing business environment.
- One such tool is a semantic identifier that may allow for the unique identification of an item based on its relationship with other items without the need to store additional data.
- a translation engine is another such tool that may translate data, metadata, semantic identifiers and other items from one format, language and/or data model to another.
- a level of abstraction property of a hub or database may allow for the differentiation of multiple instances or forms of an item.
- a semantic identifier may exist for an item.
- the item may be an object, data item, datum, column, row, table, database, instance, attribute, metadata, concept, topic, subject, semantic identifier, other identifier, RFID tag, vendor, supplier, customer, person, team, organization, user, network, system, device, family, store, product, product line, product feature, product specification, product attribute, price, cost, bill of materials, shipping data, tax data, course, educational program, location, map, division, organization, organism, process, rule, law, rating system, good, service and service offering or other item or concept.
- An item may be related to a data integration job and/or data integration platform.
- a semantic identifier may identify the item based on the item's relationship with one or more other items.
- a relationship may be the absence of a relationship.
- a relationship may be based on semantics.
- a relationship may involve the position of the item in a relational hierarchy.
- a semantic identifier may be a unique identifier for an item. It is possible that a unique semantic identifier for an item takes into account less than all the relationships of that item with other items. It may be advantageous to create a semantic identifier that is based on the minimum number of relationships to ensure uniqueness. The number of relationships required to create a unique semantic identifier for an item may vary based on context. A semantic identifier may be context-dependent. A semantic identifier may be dynamic.
- a semantic identifier may be stored, maintained, recorded, processed and/or interpreted in a syntax that may be stored, maintained, recorded, processed and/or interpreted in a string structure or format.
- the syntax and/or string structure or format may be parseable.
- the syntax and/or string structure or format may be truncated, modified, shortened, parsed or re-ordered. It may be possible to truncate, modify, shorten or re-order a syntax and/or string and still maintain the unique identifier.
- a shorter syntax and/or string may be useful in certain contexts and may increase performance.
- a semantic identifier may be associated with a semantic context such as a step in an enterprise method, a datum in a database, a datum in a row or column, a row or column in a table, a row or column in a database, a datum in a table, a table in a database, metadata in a database, an item in a hub or repository, an item in a database, an item in a table, an item in a column, an item in a row, a person in an organization, a sender or recipient of a communication, a user on a network, a system on a network, a device on a network, a person in a family, an item in a store, a dish on a menu, a product in a product line, a product in a product offering, a course or step in an educational or training program, a location on a map, a location of an item, a division of an organization, a person on a team, a rule in a
- a database may have a table with a column.
- the unique semantic identifier for that column may be "column name of table name of database name."
- This unique semantic identifier may be stored, maintained, recorded, processed and/or interpreted using the following syntax: column name::table name::database name.
- the syntax and/or any associated string may be parsed and unnecessary elements may be removed. For example, if only one database existed the following syntax may still generate a unique identifier for the column: column name::table name.
- the database relationship is not required to create a unique semantic identifier.
- the database may have only one table, so that the following syntax may be a unique identifier for the column: column name::database name.
- the table relationship is not required to create a unique identifier. Use of a shorter syntax and/or string may decrease processing times and increase efficiency.
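The shortening described above can be sketched in Python. This is a minimal illustration rather than the patent's implementation: the `::` separator comes from the text, while `parse_identifier`, `shorten`, and the sample catalog are hypothetical names and data chosen for the example. The sketch keeps the item's own name and drops enclosing elements as long as the result still matches exactly one known identifier.

```python
from itertools import combinations

SEP = "::"  # separator taken from the syntax in the text


def parse_identifier(identifier):
    """Split 'column::table::database' into its elements."""
    return [part.strip() for part in identifier.split(SEP)]


def _is_subsequence(candidate, parts):
    """True if candidate's elements appear in parts in the same order."""
    remaining = iter(parts)
    return all(element in remaining for element in candidate)


def shorten(identifier, catalog):
    """Return the shortest ordered subset of elements that still matches
    exactly one identifier in catalog (hypothetical helper)."""
    head, *rest = parse_identifier(identifier)
    parsed = [parse_identifier(other) for other in catalog]
    for size in range(len(rest) + 1):
        for kept in combinations(rest, size):
            candidate = [head, *kept]
            matches = [p for p in parsed
                       if p[0] == head and _is_subsequence(candidate, p)]
            if len(matches) == 1:
                return SEP.join(candidate)
    return identifier  # nothing shorter is unique


# Two tables in one database share a column name, so the table element
# is needed to stay unique but the database element is not.
catalog = ["id::employee::hr", "id::payroll::hr", "name::employee::hr"]
short_id = shorten("id::employee::hr", catalog)
```

Trying shorter candidates first mirrors the stated preference for the minimum number of relationships; a production system would likely index identifiers instead of scanning the whole catalog.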
- a translation engine may perform translation operations on one or more semantic identifiers, databases, databases including semantic identifiers, systems of information, systems of information including semantic identifiers or other items.
- the translation operation may translate or otherwise modify the format, language and/or data model of a semantic identifier.
- a translation operation may involve a translation or mapping to or from one or more data tools, languages, formats and/or data models to or from at least one other data tool, language, format and/or data model.
- a translation operation may involve a translation or mapping to or from DataStage 7, QualityStage, Business Objects, IBM - DB2 Cube Views, UML 1.1, UML 1.3, ERStudio, ProfileStage, PowerDesigner (with added support for Packages and Extended Attributes) and/or MicroStrategy.
- a translation engine and/or translation operation may be embodied in a metabroker.
- a translation engine, a mapping of a translation operation or a translation operation can trace data that is translated in the execution of the operation backward and forward between an original semantic context and a translated semantic context.
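One way to picture backward and forward tracing is a pair of lookup tables filled in as each translation is performed. The sketch below is an assumption made for illustration: `TranslationTrace`, `to_dotted`, and the dotted `database.table.column` target format are hypothetical and do not come from the source.

```python
class TranslationTrace:
    """Records each translation so it can be followed in both directions."""

    def __init__(self):
        self._forward = {}   # original -> translated
        self._backward = {}  # translated -> original

    def record(self, original, translated):
        self._forward[original] = translated
        self._backward[translated] = original

    def forward(self, original):
        return self._forward.get(original)

    def backward(self, translated):
        return self._backward.get(translated)


def to_dotted(identifier, trace):
    """Translate 'column::table::database' into a hypothetical
    'database.table.column' format and record the mapping."""
    translated = ".".join(reversed(identifier.split("::")))
    trace.record(identifier, translated)
    return translated


trace = TranslationTrace()
dotted = to_dotted("salary::employee::hr", trace)
original = trace.backward(dotted)
```

Recording the mapping at translation time keeps the trace independent of whether the translation itself is reversible.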
- a translation operation may be performed, executed and/or conducted in batch, real-time and/or on a continuous basis.
- a translation operation may be provided or made available as a service, for example, as part of a service oriented architecture.
- a semantic identifier, database, database including one or more semantic identifiers, system of information, system of information including one or more semantic identifiers or other item can be translated to or from, mapped to, linked to, used with or associated with any other semantic identifier, database, database including one or more semantic identifiers, system of information, system of information including one or more semantic identifiers or other item sharing at least one translation operation.
- An item may exist in multiple forms or instances, such as a physical modeling activity and/or logical modeling activity.
- An item, including any associated data or metadata may exist in multiple forms or instances in a database and/or hub.
- any differentiating characteristic may be used, such as a level of abstraction, a position in a hierarchy, a relationship to another item, one or more distinguishing attributes of the item, the context in which the item is found, the physical location in which the item is found, or the like.
- a table named "employee" may be brought into the hub.
- the hub collector may have two forms or instances of "employee" in the hub; one corresponding to the physical modeling activity and another corresponding to the logical modeling activity.
- the level of abstraction property of hub data collection allows for the differentiation between the physical model and logical model instances or forms.
- When performing a translation operation, which may be in response to a query, a translation engine may grab, load or obtain all of the items from a hub or database. It may then filter, select, store, translate, modify, or otherwise operate on the items based on a distinguishing characteristic, such as abstraction level, position in a hierarchy, a relationship to another item, an attribute of the items, a physical location or the like.
- when performing a translation operation, which may be in response to a query, a translation engine may filter, select, store, translate, modify or otherwise operate on items, including any data and/or metadata, at the hub or database and grab, load or obtain only those items of the relevant level of abstraction or having the relevant attributes, positions, relationships, locations or the like.
- the filtering, selection, storage, translation, modification or other operation may be performed at runtime or design time and may be conducted in batch, real-time or on a continuous basis.
- the filtering, selection, storage, translation, modification or other operation may be based on information or inputs obtained by the translation engine and/or system at development-time, design-time or run-time, such as a data model, a mapping of a data model, a differentiating characteristic of the syntax of an identifier, or the like.
- the information may be updated in a dynamic fashion in real-time.
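The two strategies described above (obtain everything and filter in the translation engine, or filter at the hub and obtain only the relevant items) can be contrasted with a small Python sketch. Everything here is assumed for illustration: `Hub` is an in-memory stand-in, `translate` is a placeholder, and the `items_sent` counter exists only to show how many items leave the hub under each strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    abstraction: str  # e.g. "physical" or "logical"


class Hub:
    """Hypothetical in-memory stand-in for a hub or database."""

    def __init__(self, items):
        self._items = list(items)
        self.items_sent = 0  # how many items have left the hub

    def fetch_all(self):
        self.items_sent += len(self._items)
        return list(self._items)

    def fetch_where(self, predicate):
        selected = [item for item in self._items if predicate(item)]
        self.items_sent += len(selected)
        return selected


def translate(item):
    # Placeholder translation: render the item as a string.
    return f"{item.name} ({item.abstraction})"


def grab_all_then_filter(hub, level):
    """Load everything, then filter inside the translation engine."""
    return [translate(i) for i in hub.fetch_all() if i.abstraction == level]


def filter_at_hub(hub, level):
    """Ask the hub for only the items at the relevant level."""
    return [translate(i)
            for i in hub.fetch_where(lambda i: i.abstraction == level)]


items = [Item("employee", "physical"), Item("employee", "logical"),
         Item("department", "physical")]
hub_a, hub_b = Hub(items), Hub(items)
result_a = grab_all_then_filter(hub_a, "logical")
result_b = filter_at_hub(hub_b, "logical")
```

Both functions return the same translated items; they differ only in where the filtering work happens and how much data is moved.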
- a system may refine a select command for selecting data from a database based on a known mapping of the database, such as to select logical items and omit physical items, or vice versa.
- a query may be a message or operation.
- the translation engine may perform a translation operation on the query itself resulting in a revised query or select command which may be sent directly to the hub or database.
- the revised query or select command may be in a format directly compatible with the hub or database.
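Translating the query itself can be sketched with Python's standard `sqlite3` module. The `items` table, its `name` and `abstraction` columns, and the dictionary form of the incoming query are all assumptions made for this example; the source does not specify a schema or query format.

```python
import sqlite3


def translate_query(query):
    """Turn an abstract query (a dict, an assumed format) into a
    parameterized SELECT against a hypothetical 'items' table."""
    sql = "SELECT name, abstraction FROM items WHERE name = ?"
    params = [query["item"]]
    level = query.get("level")
    if level is not None:
        # Refine the select so only one abstraction level is returned.
        sql += " AND abstraction = ?"
        params.append(level)
    return sql, tuple(params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, abstraction TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("employee", "physical"), ("employee", "logical"),
     ("department", "logical")],
)

sql, params = translate_query({"item": "employee", "level": "logical"})
rows = conn.execute(sql, params).fetchall()
```

Because the revised command is sent directly to the database, the physical "employee" row is never loaded by the caller.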
- a computer program product may include a computer useable medium including computer readable program code, wherein the computer readable program code when executed on one or more computers causes the one or more computers to perform any one or more of the methods above.
- "data source" or "data target" are intended to have the broadest possible meaning consistent with these terms, and shall include a database, a plurality of databases, a repository information manager, a queue, a message service, a repository, a data facility, a data storage facility, a data provider, a website, a server, a computer, a computer storage facility, a CD, a DVD, a mobile storage facility, a central storage facility, a hard disk, multiple coordinating data storage facilities, RAM, ROM, flash memory, a memory card, a temporary memory facility, a permanent memory facility, magnetic tape, a locally connected computing facility, a remotely connected computing facility, a wireless facility, a wired facility, a mobile facility, a central facility, a web browser, a client, a laptop, a personal digital assistant ("PDA"), a telephone, a cellular phone, a mobile phone, an information platform, an analysis facility, a processing facility, a business enterprise system or other facility where data is handled
- EJBs support rapid and simplified development of distributed, transactional, secure and portable Java applications.
- EJBs support a container architecture that allows concurrent consumption of messages and provide support for distributed transactions, so that database updates, message processing, and connections to enterprise systems using the J2EE architecture can participate in the same transaction context.
- JMS shall mean the Java Message Service, which is an enterprise message service for the Java-based
- J2EE enterprise architecture shall mean the J2EE Connector Architecture of the J2EE platform described more particularly below. It should be appreciated that, while EJB, JMS, and JCA are commonly used software tools in contemporary distributed transaction environments, any platform, system, or architecture providing similar functionality may be employed with the data integration systems described herein.
- Real time shall include periods of time that approximate the duration of a business transaction or business and shall include processes or services that occur during a business operation or business process, as opposed to occurring off-line, such as in a nightly batch processing operation. Depending on the duration of the business process, real time might include seconds, fractions of seconds, minutes, hours, or even days.
- Business process shall include any methods, service, operations, processes or transactions that can be performed by a business, including, without limitation, sales, marketing, fulfillment, inventory management, pricing, product design, professional services, financial services, administration, finance, underwriting, analysis, contracting, information technology services, data storage, data mining, delivery of information, routing of goods, scheduling, communications, investments, transactions, offerings, promotions, advertisements, offers, engineering, manufacturing, supply chain management, human resources management, data processing, data integration, work flow administration, software production, hardware production, development of new products, research, development, strategy functions, quality control and assurance, packaging, logistics, customer relationship management, handling rebates and returns, customer support, product maintenance, telemarketing, corporate communications, investor relations, and many others.
- Service oriented architecture shall include services that form part of the infrastructure of a business enterprise.
- services can become building blocks for application development and deployment, allowing rapid application development and avoiding redundant code.
- Each service may embody a set of business logic or business rules that can be bound to the surrounding environment, such as the source of the data inputs for the service or the targets for the data outputs of the service.
- SOA Service oriented architecture
- Metadata shall include data that brings context to the data being processed, data about the data, information pertaining to the context of related information, information pertaining to the origin of data, information pertaining to the location of data, information pertaining to the meaning of data, information pertaining to the age of data, information pertaining to the heading of data, information pertaining to the units of data, information pertaining to the field of data and/or information pertaining to any other information relating to the context of the data.
- WSDL Web Services Description Language
- WSDL includes an XML format for describing network services (often web services) as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. The operations and messages are described abstractly, and then bound to a concrete network protocol and message format to define an endpoint. Related concrete endpoints are combined into abstract endpoints (services). WSDL is extensible to allow description of endpoints and their messages regardless of what message formats or network protocols are used to communicate.
- Metabroker shall include systems or methods that may involve a translation engine or other means for performing translation operations or other operations on data or metadata.
- the translation operations or other operations may involve the translation of data or metadata from one or more formats, languages and/or data models to one or more formats, languages and/or data models.
- Fig. 1 is a schematic diagram of a business enterprise with a plurality of business processes, each of which may include a plurality of different computer applications and data sources.
- Fig. 2 is a schematic diagram showing data integration across a plurality of business processes of a business enterprise.
- Fig. 3 is a schematic diagram showing an architecture for providing data integration for a plurality of data sources for a business enterprise.
- Fig. 4 shows an item in relation to other items.
- Fig. 5 shows an item in relation to other items.
- Fig. 6A shows an item in a certain context.
- Fig. 6B shows an item in a certain context.
- Fig. 7 shows certain strings.
- Fig. 8 shows an item and a corresponding string.
- Fig. 9 shows a string and certain of its variations.
- Fig. 10 shows a translation engine acting on certain strings.
- Fig. 11 shows an item that may exist in multiple forms or instances.
- Fig. 12 shows an item that may exist in multiple forms or instances in a hub or database.
- Fig. 13 shows an item in a hub at various levels of abstraction.
- Fig. 14 shows a translation process in which all items are grabbed at the database or hub.
- Fig. 15 shows a translation process in which items are filtered at the database or hub.
- Fig. 16 shows a translation process in which the query is translated.
- the invention(s) disclosed herein can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
- the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- the invention(s) can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system
- a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can
- I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
- Fig. 1 represents a platform 100 for facilitating integration of various data of a business enterprise.
- the platform includes a plurality of business processes, each of which may include a plurality of different computer applications and data sources.
- the platform may include several data sources 102, which may be data sources such as those described above. These data sources may include a wide variety of data types from a wide variety of physical locations.
- the data source may include systems from providers such as Sybase, Microsoft, Informix, Oracle, Inlomover, EMC, Trillium, First Logic, Siebel, PeopleSoft, IBM, Apache, or Netscape.
- the data sources 102 may include systems using database products or standards such as IMS, DB2, ADABAS, VSAM, MD Series, UDB,
- Data targets are discussed later in this description. In general, these data targets may be any of the data sources 102 noted above. This difference in nomenclature typically denotes whether a data system provides data or receives data in a data integration process. However, it should be appreciated that this distinction is not intended to convey any difference in capability between data sources and data targets (unless specifically stated otherwise), since in a conventional data integration system, data sources may receive data and data targets may provide data.
- the platform illustrated in Fig. 1 also includes a data integration system 104.
- the data integration system may, for example, facilitate the collection of data from the data sources 102 as the result of a query or retrieval command the data integration system 104 receives.
- the data integration system 104 may send commands to one or more of the data sources 102 such that the data source(s) provides data to the data integration system 104. Since the data received may be in multiple formats including varying metadata, the data integration system may reconfigure the received data such that it can be later combined for integrated processing.
- the functions that may be performed by the data integration system 104 are described in more detail below.
- the platform 100 also includes several retrieval systems 108.
- the retrieval systems 108 may include databases or processing platforms used to further manipulate the data communicated from the data integration system 104.
- the data integration system 104 may cleanse, combine, transform or otherwise manipulate the data it receives from the data sources 102 such that a retrieval system 108 can use the processed data to produce reports 110 useful to the business.
- the reports 110 may be used to report data associations, answer complex queries, answer simple queries, or form other reports useful to the business or user, and may include raw data, tables, charts, graphs, and any other representations of data from the retrieval systems 108.
- the platform 100 may also include a database or data base management system 112.
- the database 112 may be used to store information temporally, temporarily, or for permanent or long-term storage.
- the data integration system 104 may collect data from one or more data sources 102 and transform the data into forms that are compatible with one another or compatible to be combined with one another. Once the data is transformed, the data integration system 104 may store the data in the database 112 in a decomposed form, combined form or other form for later retrieval.
- Fig. 2 is a schematic diagram showing data integration across a plurality of entities and business processes of a business enterprise.
- the data integration system 104 facilitates the information flowing between user interface systems 202 and data sources 102.
- the data integration system 104 may receive queries from the interface systems 202, where the queries necessitate the extraction and possibly transformation of data residing in one or more of the data sources 102.
- the interface systems 202 may include any device or program for communicating with the data integration system 104, such as a web browser operating on a laptop or desktop computer, a cell phone, a personal digital assistant ("PDA"), a networked platform and devices attached thereto, or any other device or system that might interface with the data integration system 104.
- a user may be operating a PDA and make a request for information to the data integration system 104 over a WiFi or Wireless Application Protocol/Wireless Markup Language ("WAP/WML") interface.
- the data integration system 104 may receive the request and generate any required queries to access information from a website or other data source 102 such as an FTP file site.
- the data from the data sources 102 may be extracted and transformed into a format compatible with the requesting interface system 202 (a PDA in this example) and then communicated to the interface system 202 for user viewing and manipulation.
- the data may have previously been extracted from the data sources and stored in a separate database 112, which may be a data warehouse or other data facility used by the data integration system 104.
- the data may have been stored in the database 112 in a transformed condition or in its original state.
- the data may be stored in a transformed condition such that the data from a number of data sources 102 can be combined in another transformation process.
- a query from the PDA may be transmitted to the data integration system 104 and the data integration system 104 may extract the information from the database 112. Following the extraction, the data integration system 104 may transform the data into a combined format compatible with the PDA before transmission to the PDA.
- Fig. 3 is a schematic diagram showing an architecture for providing data integration for a plurality of data sources 102 for a business enterprise.
- An embodiment of a data integration system 104 may include a discover data stage 302 to perform, possibly among other processes, extraction of data from a data source and analysis of column values and table structures for source data.
- a discover data stage 302 may also generate recommendations about table structure, relationships, and keys for a data target. More sophisticated profiling and auditing functions may include date range validation, accuracy of computations, accuracy of if-then evaluations, and so forth.
- the discover data stage 302 may normalize data, such as by eliminating redundant dependencies and other anomalies in the source data.
- the discover data stage 302 may provide additional functions, such as drill down to exceptions within a data source 102 for further analysis, or enabling direct profiling of mainframe data.
- a non-limiting example of a commercial embodiment of a discover data stage 302 may be found in IBM's WebSphere ProfileStage product.
- the data integration system 104 may also include a data preparation stage 304 where the data is prepared, standardized, matched, or otherwise manipulated to produce quality data to be later transformed.
- the data preparation stage 304 may perform generic data quality functions, such as reconciling inconsistencies or checking for correct matches (including one-to-one matches, one-to-many matches, and deduplication) within data.
- the data preparation stage 304 may also provide specific data enhancement functions. For example, the data preparation stage 304 may ensure that addresses conform to multinational postal references for improved international communication.
- the data preparation stage 304 may conform location data to multinational geocoding standards for spatial information management.
- the data preparation stage may modify or add to addresses to ensure that address information qualifies for U.S. Postal Service mail rate discounts under Government Certified U.S. Address Correction.
- the data integration system may also include a data transformation stage 308 to transform, enrich and deliver transformed data.
- the data transformation stage 308 may perform transitional services such as reorganization and reformatting of data, and perform calculations based on business rules and algorithms of the system user.
- the data transformation stage 308 may also organize target data into subsets known as datamarts or cubes for more highly tuned processing of data in certain analytical contexts.
- the data transformation stage 308 may employ bridges, translators, or other interfaces (as discussed generally below) to span various software and hardware architectures of various data sources and data targets used by the data integration system 104.
- the data transformation stage 308 may include a graphical user interface, a command line interface, or some combination of these, to design data integration jobs across the platform 100.
- a non-limiting example of a commercial embodiment of a data transformation stage 308 may be found in IBM's WebSphere DataStage product.
- the stages 302, 304, 308 of the data integration system 104 may be executed using a parallel execution system 310 or in a serial or combination manner to optimize the performance of the system 104.
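The choice between serial and parallel execution of the stages can be illustrated with Python's standard `concurrent.futures` module. This sketch reads "parallel" as partitioning records across workers, which is only one of several possible interpretations; the bodies of `discover`, `prepare`, and `transform`, and the `country` field, are hypothetical placeholders for stages 302, 304, and 308.

```python
from concurrent.futures import ThreadPoolExecutor


def discover(record):
    # Profile the record: note which fields are empty.
    empty = sorted(k for k, v in record.items() if not str(v).strip())
    return {**record, "_empty_fields": empty}


def prepare(record):
    # Standardize string values by trimming whitespace.
    return {k: v.strip() if isinstance(v, str) else v
            for k, v in record.items()}


def transform(record):
    # Reformat: upper-case the (hypothetical) 'country' field.
    out = dict(record)
    if isinstance(out.get("country"), str):
        out["country"] = out["country"].upper()
    return out


def run_stages(record):
    return transform(prepare(discover(record)))


def run_serial(records):
    return [run_stages(r) for r in records]


def run_parallel(records, workers=4):
    # map() returns results in input order even though work overlaps.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_stages, records))


records = [{"name": " Ann ", "country": "us"},
           {"name": "", "country": " fr "}]
serial_out = run_serial(records)
parallel_out = run_parallel(records)
```

Because each stage is a pure function of its input record, the serial and parallel runs produce identical output, which is what makes the execution mode a tuning decision rather than a design change.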
- the data integration system 104 may also include a metadata management system 312 for managing metadata associated with data sources 102.
- the metadata management system 312 may provide for interchange, integration, management, and analysis of metadata across all of the tools in a data integration environment.
- a metadata management system 312 may provide common, universally accessible views of data in disparate sources, such as IBM's WebSphere ODBC MetaBroker, CA ERwin, IBM's WebSphere ProfileStage, IBM's WebSphere DataStage, IBM's WebSphere QualityStage, IBM DB2 Cube Views, and Cognos Impromptu.
- the metadata management system 312 may also provide analysis tools for data lineage and impact analysis for changes to data structures.
- the metadata management system 312 may further be used to prepare a business data glossary of data definitions, algorithms, and business contexts for data within the data integration system 104, which glossary may be published for use throughout an enterprise.
- a non-limiting example of a commercial embodiment of a metadata management system 312 may be found in IBM's WebSphere MetaStage product.
- Fig. 4 depicts a semantic identifier for an item.
- the item may be an object, class, attribute, data item, data model, metadata model, model, definition, identity, structure, language, mapping, relationship, instance or other item or concept, including another semantic identifier.
- the semantic identifier may identify the item based on the item's attributes, the item's physical location, the relationship of the item with one or more other items, such as in a hierarchy, or the like. In some cases a relationship may be defined as the absence of some particular relationship.
- a relationship may be based on semantics.
- a relationship may involve the position of the item in a relational hierarchy. For example, in Fig. 4 item 31097
- Item 1 5202 may be identified based on its relationship with the other items to which it is related. Item 1 5202 may be identified as being directly related to item 2 5204, item 3 5208 and item 4 5210, indirectly related to item 5 5212 and indirectly related to item 6 5214 through item 5 5212 and item 4 5210. Item 1 may also be identified as being directly related to item 2 5204, item 3 5208 and item 4 5210. In embodiments, the indirect relationships between item 1 5202 and item 5 5212 and item 6 5214 may be captured in the relationship of item 5202 1 to item 4 5210. This concatenation or recursive type of identification may permit dynamic, in addition to static, identifiers.
- the semantic identifier for item 1 5202, which incorporates item 2 5204, item 3 5208 and item 4 5210, would incorporate a change in item 6 5214 through its incorporation of item 4 5210 and would not need to be updated to account for that change, as it would if item 6 5214 were directly included in the semantic identifier.
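By way of non-limiting illustration, the recursive identification described above might be sketched as follows. The graph contents, the item names and the functions `semantic_identifier` and `resolve` are assumptions made for this sketch and do not appear in the specification; the point shown is only that an identifier built from direct relationships need not change when a deeper item changes.

```python
# Hypothetical relationship graph loosely modelled on Fig. 4:
# each item maps to the items it is directly related to.
DIRECT = {
    "item1": ["item2", "item3", "item4"],
    "item4": ["item5"],
    "item5": ["item6"],
}


def semantic_identifier(item, graph):
    """Identify an item by its direct relationships only."""
    return tuple(sorted(graph.get(item, [])))


def resolve(item, graph, seen=None):
    """Expand an identifier recursively through each related item's own identifier."""
    seen = set() if seen is None else seen
    if item in seen:  # guard against cycles in the graph
        return {}
    seen.add(item)
    return {rel: resolve(rel, graph, seen) for rel in semantic_identifier(item, graph)}


before = semantic_identifier("item1", DIRECT)
DIRECT["item5"] = ["item6", "item8"]  # a change deep in the hierarchy (item8 is invented)
after = semantic_identifier("item1", DIRECT)
expanded = resolve("item1", DIRECT)
```

In this sketch the stored identifier for `item1` is the same before and after the change, while the expanded form picks up the change through `item4`.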
- Figure 5 presents a more concrete example of a semantic identifier.
- Jim may be identified as Jim, residing at 111 Anyroad, Anytown, Anystate USA, with phone number 555-555-5555 and social security number 013-65-8067.
- Jim may be identified in terms of his relationships with others.
- Jim may be identified as the son of Betty, brother of Larry and Jeff, father of Jessica and nephew of Frank.
- the semantic identifier may be a unique identifier for an item.
- this semantic identifier would be a unique identifier for Jim.
- a unique semantic identifier for an item may take into account fewer than all of the relationships of that item with other items.
- the existence of these relationships alone would be enough to create a unique semantic identifier.
- Jim's relationships with Jeff and Frank would not need to be considered.
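As a hedged illustration of using fewer than all relationships, the sketch below searches for the smallest set of an individual's relationships that no other individual shares. The population, the relationship tuples recorded for Larry and Jeff, and the function name `minimal_unique_identifier` are assumptions for this example only; which subset suffices depends entirely on the population considered.

```python
from itertools import combinations

# Hypothetical population; only Jim's relationships come from the Figure 5 example.
people = {
    "Jim": {("son", "Betty"), ("brother", "Larry"), ("brother", "Jeff"),
            ("father", "Jessica"), ("nephew", "Frank")},
    "Larry": {("son", "Betty"), ("brother", "Jim"), ("brother", "Jeff"),
              ("nephew", "Frank")},
    "Jeff": {("son", "Betty"), ("brother", "Larry"), ("brother", "Jim"),
             ("nephew", "Frank")},
}


def minimal_unique_identifier(name, population):
    """Return the smallest subset of relationships held by `name` and by nobody else."""
    own = sorted(population[name])
    others = [rels for other, rels in population.items() if other != name]
    for size in range(1, len(own) + 1):
        for subset in combinations(own, size):
            # The subset is unique if no other individual has all of its members.
            if not any(set(subset) <= rels for rels in others):
                return set(subset)
    return None  # no subset distinguishes this individual


jim_identifier = minimal_unique_identifier("Jim", people)
```

In this invented population a single relationship already distinguishes Jim; a larger or more similar population would require more relationships.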
- Figure 6A depicts two items of interest: item 1 5402 and item 7 5404.
- item 1 5402 may be distinguished from item 7 5404 by the relationship of item 1 5402 with item 5 5410 and item 6 5412. That is, in context A, the unique semantic identifier for item 1 5402 may be that it is directly related to items 2, 3 and 4, indirectly related to item 5 5410 through item 4 and indirectly related to item 6 5412 through item 5 5410 and item 4.
- the unique semantic identifier for item 7 5404 may be that it is directly related to only items 2 and 3.
- Figure 6B presents item 1 5402 in a different context, context B 5414.
- any one or more of the direct relationship of item 1 5402 with item 4, the absence of a direct relationship with item 6 or the indirect relationship with item 5 may be taken into account.
- item 1 5402 may be uniquely semantically identified as directly related to items 2 and 3, but not directly related to item 6.
- the unique identifier for item 1 differs between context A 5408 and context B 5414.
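The dependence of the identifier on context might be sketched as below. This is a simplified illustration that considers direct relationships only; the contents of the two contexts, the item `item8` and the function `unique_identifier` are assumptions and are not taken from Figures 6A and 6B.

```python
# Direct relationships per item in each context (hypothetical data).
context_a = {
    "item1": {"item2", "item3", "item4"},
    "item7": {"item2", "item3"},
}
context_b = {
    "item1": {"item2", "item3", "item4"},
    "item8": {"item2", "item3", "item4", "item6"},  # invented neighbour
}


def unique_identifier(item, context):
    """Return (present, absent) relationships that separate `item` from all others."""
    own = context[item]
    present, absent = set(), set()
    for other, rels in context.items():
        if other == item:
            continue
        if own - rels:
            present.add(min(own - rels))   # a relationship only `item` has
        elif rels - own:
            absent.add(min(rels - own))    # a relationship `item` lacks
        else:
            raise ValueError(f"{item} and {other} cannot be distinguished")
    return present, absent


id_in_a = unique_identifier("item1", context_a)
id_in_b = unique_identifier("item1", context_b)
```

Here the same item is identified by the presence of a relationship in one context and by the absence of a relationship in the other.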
- a semantic identifier for an item, such as an item related to a data integration job or a data integration platform, may be provided with a context-dependent identifier for the item.
- a context-dependent identifier may be stored in an atomic format, such as in a data repository.
- contexts A 5408 and B 5414 may be two different imports, mappings, run versions, models, metabroker models, instances, tools, views, objects, classes, items, relationships, attributes, or any combination of any of the foregoing.
- a matching or comparison facility may compare the syntax of the identity of an item in different imports, run versions, models, metabroker models, instances, tools and/or items and determine or assist with the determination of what action to take or refrain from taking based on the comparison.
- a matching engine may compare the model used by import instance A to the model used by metabroker B. Based on this comparison it may be decided that metabroker B can access the data and metadata of import instance A without transformation or modification, and the comparison facility may direct the metabroker B to proceed.
- tool A 5408 may be compared to tool B 5414, and it may be determined to perform a cross-tool object merge, wherein each tool can access and use the objects of the other tool.
- the comparison facility may trigger a translation facility to assist the cross
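A minimal sketch of such a comparison facility is given below, assuming that each model can be represented as a mapping from element names to type names. That representation, the sample models and the functions `model_differences` and `decide_action` are hypothetical and serve only to illustrate deciding between proceeding and triggering a translation.

```python
def model_differences(source_model, target_model):
    """Return the names of model elements whose definitions differ or are missing."""
    names = source_model.keys() | target_model.keys()
    return {n for n in names if source_model.get(n) != target_model.get(n)}


def decide_action(source_model, target_model):
    """Proceed when the models match; otherwise request a translation."""
    differences = model_differences(source_model, target_model)
    if not differences:
        return ("proceed", differences)
    return ("translate", differences)


# Hypothetical models for illustration.
import_instance_a = {"employee.age": "integer", "employee.name": "string"}
metabroker_b = {"employee.age": "integer", "employee.name": "string"}
tool_c = {"employee.age": "string", "employee.name": "string"}

same_model = decide_action(import_instance_a, metabroker_b)
different_model = decide_action(import_instance_a, tool_c)
```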
- a semantic identifier may be stored, maintained, recorded, processed and/or interpreted in a syntax, which in turn may be stored, maintained, recorded, processed and/or interpreted in a string structure or format.
- Figure 7 depicts an example of a syntax and a corresponding string composed in that syntax.
- the syntax 5502 may be column name table name database name. This syntax may be related, for example, to a semantic identifier that identifies a column of a table in a database.
- a string composed in this syntax 5504 may be age employee employee database.
- This string may be related, for example, to a semantic identifier that identifies the age of an employee in a particular employee database.
- the string corresponding to the semantic identifier for item 1 5402 in context B 5414 may be direct relation to item 2 direct relation to item 3 direct relation to item 4.
- the semantic identifier and corresponding string may also incorporate the lack of a direct relationship between item 1 5402 and item 6.
- the semantic identifier in string format for item 9 5602 may be direct to item 2 direct to item 3 direct to item 4 indirect to item 5 5604.
- a string may be capable of being parsed.
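For illustration only, composing and parsing a string in the column name, table name, database name syntax might look as follows. The delimiter `|` is an assumption, since the text as reproduced here does not show how the elements are separated, and the function names `compose` and `parse` are likewise hypothetical.

```python
SYNTAX = ("column name", "table name", "database name")
DELIMITER = "|"  # assumed separator; not specified in the text


def compose(values, syntax=SYNTAX):
    """Build an identifier string by joining the values in syntax order."""
    return DELIMITER.join(values[element] for element in syntax)


def parse(string, syntax=SYNTAX):
    """Split an identifier string back into its named elements."""
    parts = string.split(DELIMITER)
    if len(parts) != len(syntax):
        raise ValueError("string does not match the syntax")
    return dict(zip(syntax, parts))


identifier = compose({
    "column name": "age",
    "table name": "employee",
    "database name": "employee database",
})
parsed = parse(identifier)
```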
- a syntax and/or string may be truncated or modified, and/or the elements of a syntax and/or string may be re-ordered.
- string 5702 is a truncation of string 5604.
- string 5704 is a truncation and modification and/or re-ordering of string 5604.
- string 5708 is a modification and/or re-ordering of string 5606.
- the truncation, modification and/or re-ordering may be performed by a translation engine. It may be useful to truncate a syntax and/or string when all of the relationships included in the syntax and/
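Truncation and re-ordering of an identifier might be sketched as below. The list representation of the identifier and the helper names `truncate` and `reorder` are assumptions for this example; the element text follows the string given above for item 9.

```python
# Identifier elements, held as a list so they can be dropped or rearranged.
full = [
    "direct to item 2",
    "direct to item 3",
    "direct to item 4",
    "indirect to item 5",
]


def truncate(elements, keep):
    """Keep only the first `keep` elements of an identifier."""
    return elements[:keep]


def reorder(elements, order):
    """Return the elements rearranged according to a list of positions."""
    return [elements[i] for i in order]


truncated = truncate(full, 3)
truncated_and_reordered = reorder(truncated, [2, 0, 1])
```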
- a translation engine may perform translation operations with respect to one or more semantic identifiers, databases 112, databases 112 including semantic identifiers, systems of information, systems of information including semantic identifiers or other items.
- Figure 10 depicts a translation engine 5802 acting on a semantic identifier.
- the translation operation may translate or otherwise modify the format, language and/or data model of a semantic identifier.
- a translation operation may involve a translation or mapping to or from one or more data tools, languages, formats and/or data models to or from at least one other data tool, language, format and/or data model.
- a translation operation may involve a translation or mapping to, from or between known data integration tools, such as WebSphere DataStage 7 from IBM, WebSphere QualityStage from IBM, Business Objects tools, IBM DB2 Cube Views, UML 1.1, UML 1.3, ERStudio, IBM's WebSphere ProfileStage, PowerDesigner (with added support for Packages and Extended Attributes) and/or MicroStrategy tools.
- a translation engine and/or translation operation may optionally be embodied in a metabroker.
- a translation operation may be performed, executed and/or conducted in batch, real-time and/or on a continuous basis.
- a translation operation may be provided or made available as a service, for example, as part of a service oriented architecture.
- the SOA can be part of the infrastructure of an enterprise computing system of a business enterprise.
- services become building blocks for application development and deployment, allowing rapid application development and avoiding redundant code.
- Each service embodies a set of business logic or business rules that can be blind to the surrounding environment, such as the source of the data inputs for the service or the targets for the data outputs of the service.
- services can be reused in connection with a variety of applications, provided that appropriate inputs and outputs are established between the service and the applications.
- the service-oriented architecture allows the service to be protected against environmental changes, so that the architecture functions even if the surrounding computer environment is changed. As a result, services may not need to be recoded as a result of infrastructure changes, which may result in savings of time and effort.
- An SOA may be for a web service and may involve three entities: a service provider, a service requester and a service registry.
- the registry may be public or private.
- the service requester may search a registry for an appropriate service. Once an appropriate service is discovered, the service requester may receive code, such as Web Services Description Language ("WSDL") code, that is necessary to invoke the service.
- the service requester may then interface with the service provider, such as through messages in appropriate formats (such as the Simple Object Access Protocol (“SOAP”) format for web service messages), to invoke the service.
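The interaction among provider, registry and requester might be sketched in memory as follows. This is not a web service implementation; the class `ServiceRegistry`, the dictionary standing in for a service description and the provider function `translate_service` are all hypothetical names introduced for this sketch.

```python
class ServiceRegistry:
    """A toy registry mapping service names to a description and a provider."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, description, provider):
        # The service provider registers its description and callable.
        self._entries[name] = (description, provider)

    def find(self, name):
        # The service requester looks up what is needed to invoke the service.
        return self._entries[name]


def translate_service(identifier):
    """Hypothetical provider: reverses the element order of an identifier."""
    return list(reversed(identifier))


registry = ServiceRegistry()
registry.publish(
    "translate",
    {"input": "list of str", "output": "list of str"},
    translate_service,
)

description, provider = registry.find("translate")
result = provider(["age", "employee", "employee database"])
```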
- SOAP protocol is a preferred protocol for transferring data in web services.
- the SOAP protocol defines the exchange format for messages between a web services client and a web services server.
- the SOAP protocol uses an Extensible Markup Language ("XML") schema, XML being a generic language specification commonly used in web services for tagging data, although other markup languages may be used.
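As a non-authoritative sketch, a SOAP 1.1 request carrying a semantic identifier could be assembled with Python's standard `xml.etree.ElementTree` module. The application namespace `urn:example:translation`, the operation name `TranslateIdentifier` and the `identifier` element are invented for this example; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:translation"                      # hypothetical application namespace


def build_request(identifier):
    """Wrap an identifier string in a SOAP envelope and return it as XML text."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{APP_NS}}}TranslateIdentifier")
    ET.SubElement(operation, f"{{{APP_NS}}}identifier").text = identifier
    return ET.tostring(envelope, encoding="unicode")


message = build_request("age|employee|employee database")
root = ET.fromstring(message)
```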
- mapping of a translation operation can, among other things, trace data that is translated in the execution of the operation backward and forward between an original semantic context and a translated semantic context.
- the appropriate identifier for the data item may vary, such as by varying or truncating a syntax and/or string to enable more efficient storage or faster processing, or by varying the relationships used to form a unique identifier.
- a dynamic identifier may combine the benefits of retraceable translation with the benefits of rapid processing, efficient data processing and effective operation in various contexts in which a data item is used.
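One way to keep a translation retraceable is to record the mapping in both directions, as in the sketch below. The class `TraceableTranslation` and the sample identifiers are assumptions made for illustration; the sketch assumes a one-to-one mapping between the original and translated contexts.

```python
class TraceableTranslation:
    """Records a one-to-one mapping so translated items can be traced back."""

    def __init__(self, mapping):
        self._forward = dict(mapping)
        self._backward = {v: k for k, v in self._forward.items()}
        if len(self._backward) != len(self._forward):
            raise ValueError("mapping must be one-to-one to be retraceable")

    def translate(self, original):
        """Original semantic context to translated semantic context."""
        return self._forward[original]

    def trace_back(self, translated):
        """Translated semantic context back to the original semantic context."""
        return self._backward[translated]


# Hypothetical identifiers in two contexts.
translation = TraceableTranslation({
    ("age", "employee", "employee database"): ("EMP", "AGE_YEARS"),
})
translated = translation.translate(("age", "employee", "employee database"))
original = translation.trace_back(translated)
```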
- a given item, such as an item that has an identity in a model, may exist in multiple forms or instances, such as a physical instance and a logical modeling instance.
- Figure 11 depicts an item, namely, a table of employee information 5902.
- the concept or entity "employees" can exist in a number of different forms within an enterprise.
- the employee table 5902 may exist as a physical table that stores values related to employees in a physical data storage facility.
- the entity employee may also be represented as a logical entity, such as an icon or text that represents employees in a logical modeling activity 5908, or in various other forms or instances.
- Figure 12 depicts the employee table 5902 in one form or a single instance in a database 6002 and/or more than one form or instance in a database 6004 or hub 6008.
- any differentiating characteristic may be used, such as a level of abstraction, a physical property of an item, a location of the item within a hierarchy, a location of an item in a database, a context in which an item is found, a syntax of an item, a relationship of an item to other items, an attribute of an item, the class of an item, or other characteristic.
- the items, or individuals in this case, may be distinguished based on age, gender, hair color, IQ, political affiliation and/or number of trips to the doctor in the past three months.
- the employee table may exist in multiple forms or instances in the hub 6102, such as a physical employee table 5904, such as used to store values in a database that relate to data that pertains to employees, and a logical employee model 5908, such as to be used in a view of process that relates to employees.
- an item such as a table named "employee"
- a hub collector may have two forms or instances of "employee" in the hub; one corresponding to the physical database instance and another corresponding to the logical modeling activity.
- a differentiating characteristic such as a property of the item attributed to the item in the hub allows for the differentiation between the physical instances and the logical model instances or forms. In embodiments that differentiating characteristic can be called a level of abstraction, such as to distinguish between logical and physical levels of abstraction.
- the hub may associate other characteristics with items, such as different forms of identifiers, relationships, classes, attributes, physical locations, logical positions, models and the like.
- a system such as a translation engine 6204, may grab, load or obtain all of the items from a hub 6208 or database 6210. It may select or filter 6204 the items based on any differentiating characteristic.
- the methods and systems described herein provide for selective handling of instances of the same item or entity based on any differentiating characteristic.
- a translation engine 6204 may filter or select items, including any data and/or metadata, at the hub 6208 or database 6210 and grab, load or obtain only those items of the relevant level of abstraction. For example, it may filter or select out those instances or forms with a logical level of abstraction, keeping only those with a physical level of abstraction.
- the filtering or selection may be performed at runtime or design time and may be conducted in batch, real-time or on a continuous basis. In embodiments such a method of filtering or selection may be provided as an RTI service in a services oriented architecture.
- the filtering or selection may be based on information, such as a mapping of a data model, a mapping of a metadata model, a differentiating characteristic, a relationship of an item to another item, an attribute of an item, or the syntax of an identifier, that is obtained by the translation engine and/or system at development-time, design-time or run-time.
- information may be updated in a dynamic fashion in real-time.
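Selection of hub items by a differentiating characteristic might be sketched as follows. The `HubItem` record, its field names, the sample source values and the `select` helper are hypothetical; the sketch only shows two instances of the same named item being told apart by a level of abstraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubItem:
    name: str
    level_of_abstraction: str  # "physical" or "logical"
    source: str


# Two instances of the same entity held in a hypothetical hub.
hub = [
    HubItem("employee", "physical", "database"),
    HubItem("employee", "logical", "modeling tool"),
]


def select(items, **criteria):
    """Keep the items whose attributes match every given criterion."""
    return [
        item for item in items
        if all(getattr(item, key) == value for key, value in criteria.items())
    ]


physical_only = select(hub, name="employee", level_of_abstraction="physical")
```

Any other attribute recorded for an item could be passed as a criterion in the same way.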
- the translation engine 6204 may perform a translation operation on the query 6202 itself, resulting in a revised query 6402, which may be sent for further processing, such as directly to the hub 6208 or database 6210.
- the revised query 6402 may be rendered in a format that is directly compatible with the native format of the hub 6208 or database 6210. For example, by rendering the query in the native format of the database 6210, the system may increase processing efficiency for the query.
- the query 6402 may be filtered or a command such as a select command may be generated to keep a logical modeling entity rather than a physical entity, in which case the query 6402 may be rendered in a format suitable for a logical modeling activity (such as a graphical user interface), rather than for the database.
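The rendering of a revised query for different targets might be sketched as below. The query representation, the function `revise_query` and both output formats are assumptions for illustration; the SQL text is built naively and does not quote or validate identifiers.

```python
def revise_query(query, level):
    """Render a query for a physical database or for a logical modeling view."""
    column = query["column"]
    table = query["table"]
    database = query["database"]
    if level == "physical":
        # Assumed native format: a SQL select statement.
        return f"SELECT {column} FROM {database}.{table}"
    if level == "logical":
        # Assumed format for a modeling tool or graphical interface.
        return {"entity": table, "attribute": column}
    raise ValueError(f"unknown level of abstraction: {level}")


query = {"column": "age", "table": "employee", "database": "employee_db"}
for_database = revise_query(query, "physical")
for_modeling = revise_query(query, "logical")
```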
- the methods and systems described herein can be used to capture semantic contexts and to handle data integration tasks with respect to a wide range of items related to an enterprise, such as an object, data item, datum, column, row, table, database, instance, attribute, metadata, concept, topic, subject, semantic identifier, other identifier, RFID tag, vendor, supplier, customer, person, team, organization, user, network, system, device, family, store, product, product line, product feature, product specification, product attribute, price, cost, bill of materials, shipping data, tax data, course, educational program, location, map, division, organization, organism, process, rule, law, rating system, good, service and/or service offering.
- the methods and systems described herein can be used in a variety of semantic contexts, such as a step in an enterprise method, a datum in a database, a datum in a row or column, a row or column in a table, a row or column in a database, a datum in a table, a table in a database, metadata in a database, an item in a hub or repository, an item in a database, an item in a table, an item in a column, an item in a row, a person in an organization, a sender or recipient of a communication, a user on a network, a system on a network, a device on a network, a person in a family, an item in a store, a dish on a menu, a product in a product line, a product in a product offering, a course or step in an educational or training program, a location on a map, a location of an item, a division of an organization, a person on a team,
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2007530351A JP2008511936A (en) | 2004-08-31 | 2005-08-31 | Method and system for semantic identification in a data system |
| EP05794064A EP1815349A4 (en) | 2004-08-31 | 2005-08-31 | Methods and systems for semantic identification in data systems |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US60640704P | 2004-08-31 | 2004-08-31 | |
| US60/606,407 | 2004-08-31 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2006026702A2 (en) | 2006-03-09 |
| WO2006026702A3 (en) | 2006-04-27 |
Family
ID=36000723
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2005/031097 Ceased WO2006026702A2 (en) | 2004-08-31 | 2005-08-31 | Methods and systems for semantic identification in data systems |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP1815349A4 (en) |
| JP (1) | JP2008511936A (en) |
| CN (1) | CN101044472A (en) |
| WO (1) | WO2006026702A2 (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2009110245A (en) * | 2007-10-30 | 2009-05-21 | Yamatake Corp | Information linkage window system and program |
| EP2112593A1 (en) | 2008-04-25 | 2009-10-28 | Facton GmbH | Domain model concept for developing computer applications |
| US8635173B2 (en) | 2010-03-12 | 2014-01-21 | Microsoft Corporation | Semantics update and adaptive interfaces in connection with information as a service |
| WO2014074908A3 (en) * | 2012-11-08 | 2014-08-14 | Microsoft Corporation | Intermediary model to handle web vocabulary conflicts |
| US9076152B2 (en) | 2010-10-20 | 2015-07-07 | Microsoft Technology Licensing, Llc | Semantic analysis of information |
| JP2015133144A (en) * | 2006-08-31 | 2015-07-23 | スウィーニー,ピーター | System and method for information architecture of consumer definition and computer program |
| US9183275B2 (en) | 2007-01-17 | 2015-11-10 | International Business Machines Corporation | Data profiling method and system |
| WO2018013310A1 (en) * | 2016-07-11 | 2018-01-18 | Investcloud Inc | Data exchange common interface configuration |
| JP2020077419A (en) * | 2018-11-09 | 2020-05-21 | フェニックス コンタクト ゲーエムベーハー ウント コムパニー カーゲー | Device and method for generating neutral data of product specification |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8428984B2 (en) * | 2009-08-31 | 2013-04-23 | Sap Ag | Transforming service oriented architecture models to service oriented infrastructure models |
| CN102402507B (en) * | 2010-09-07 | 2014-07-09 | 重庆邮电大学 | Heterogeneous data integration system for service-oriented architecture (SOA) multi-message mechanism |
| CN102541861A (en) * | 2010-12-14 | 2012-07-04 | 金蝶软件(中国)有限公司 | Method, device and system for establishing mapping relation in system integration |
| CN104461494B (en) * | 2014-10-29 | 2018-10-26 | 中国建设银行股份有限公司 | A kind of method and device for the data packet generating data processing tools |
| EP3676701B1 (en) * | 2017-10-12 | 2023-11-29 | Hewlett-Packard Development Company, L.P. | Schema syntax |
| KR102150335B1 (en) * | 2019-01-17 | 2020-09-01 | 주식회사 쓰리데이즈 | Database management system |
| CN119884058B (en) * | 2024-11-22 | 2025-10-21 | 中国人民解放军国防科技大学 | Multi-domain metadata management method for high performance computing scenarios |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5692184A (en) * | 1995-05-09 | 1997-11-25 | Intergraph Corporation | Object relationship management system |
| US6044374A (en) * | 1997-11-14 | 2000-03-28 | Informatica Corporation | Method and apparatus for sharing metadata between multiple data marts through object references |
| US7533107B2 (en) * | 2000-09-08 | 2009-05-12 | The Regents Of The University Of California | Data source integration system and method |
| US6937983B2 (en) * | 2000-12-20 | 2005-08-30 | International Business Machines Corporation | Method and system for semantic speech recognition |
-
2005
- 2005-08-31 CN CNA2005800290342A patent/CN101044472A/en active Pending
- 2005-08-31 JP JP2007530351A patent/JP2008511936A/en active Pending
- 2005-08-31 WO PCT/US2005/031097 patent/WO2006026702A2/en not_active Ceased
- 2005-08-31 EP EP05794064A patent/EP1815349A4/en not_active Withdrawn
Non-Patent Citations (1)
| Title |
|---|
| See references of EP1815349A4 * |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2015133144A (en) * | 2006-08-31 | 2015-07-23 | スウィーニー,ピーター | System and method for information architecture of consumer definition and computer program |
| US9183275B2 (en) | 2007-01-17 | 2015-11-10 | International Business Machines Corporation | Data profiling method and system |
| JP2009110245A (en) * | 2007-10-30 | 2009-05-21 | Yamatake Corp | Information linkage window system and program |
| EP2112593A1 (en) | 2008-04-25 | 2009-10-28 | Facton GmbH | Domain model concept for developing computer applications |
| US8635173B2 (en) | 2010-03-12 | 2014-01-21 | Microsoft Corporation | Semantics update and adaptive interfaces in connection with information as a service |
| US9076152B2 (en) | 2010-10-20 | 2015-07-07 | Microsoft Technology Licensing, Llc | Semantic analysis of information |
| US11301523B2 (en) | 2010-10-20 | 2022-04-12 | Microsoft Technology Licensing, Llc | Semantic analysis of information |
| WO2014074908A3 (en) * | 2012-11-08 | 2014-08-14 | Microsoft Corporation | Intermediary model to handle web vocabulary conflicts |
| WO2018013310A1 (en) * | 2016-07-11 | 2018-01-18 | Investcloud Inc | Data exchange common interface configuration |
| JP2020077419A (en) * | 2018-11-09 | 2020-05-21 | フェニックス コンタクト ゲーエムベーハー ウント コムパニー カーゲー | Device and method for generating neutral data of product specification |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2008511936A (en) | 2008-04-17 |
| EP1815349A2 (en) | 2007-08-08 |
| EP1815349A4 (en) | 2008-12-10 |
| CN101044472A (en) | 2007-09-26 |
| WO2006026702A3 (en) | 2006-04-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US8060553B2 (en) | Service oriented architecture for a transformation function in a data integration platform | |
| US7814142B2 (en) | User interface service for a services oriented architecture in a data integration platform | |
| US8041760B2 (en) | Service oriented architecture for a loading function in a data integration platform | |
| US7814470B2 (en) | Multiple service bindings for a real time data integration service | |
| US8352478B2 (en) | Master data framework | |
| US7761406B2 (en) | Regenerating data integration functions for transfer from a data integration platform | |
| US8307109B2 (en) | Methods and systems for real time integration services | |
| JP4594306B2 (en) | Self-describing business object | |
| US20050262193A1 (en) | Logging service for a services oriented architecture in a data integration platform | |
| US20050223109A1 (en) | Data integration through a services oriented architecture | |
| US20060010195A1 (en) | Service oriented architecture for a message broker in a data integration platform | |
| US20050262189A1 (en) | Server-side application programming interface for a real time data integration service | |
| US20050228808A1 (en) | Real time data integration services for health care information data integration | |
| US20050235274A1 (en) | Real time data integration for inventory management | |
| US20050234969A1 (en) | Services oriented architecture for handling metadata in a data integration platform | |
| US20050240354A1 (en) | Service oriented architecture for an extract function in a data integration platform | |
| US20060069717A1 (en) | Security service for a services oriented architecture in a data integration platform | |
| US20050262190A1 (en) | Client side interface for real time data integration jobs | |
| US20050240592A1 (en) | Real time data integration for supply chain management | |
| US20050222931A1 (en) | Real time data integration services for financial information data integration | |
| US20050232046A1 (en) | Location-based real time data integration services | |
| JP2008511928A (en) | Metadata management | |
| US20050243604A1 (en) | Migrating integration processes among data integration platforms | |
| US20050251533A1 (en) | Migrating data integration processes through use of externalized metadata representations | |
| CN109033113B (en) | Data warehouse and data mart management method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 200580029034.2. Country of ref document: CN |
| | WWE | Wipo information: entry into national phase | Ref document number: 2007530351. Country of ref document: JP |
| | NENP | Non-entry into the national phase in: | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2005794064. Country of ref document: EP |
| | WWP | Wipo information: published in national office | Ref document number: 2005794064. Country of ref document: EP |