US20160357758A1 - Metadata search based on semantics - Google Patents
- Publication number
- US20160357758A1
- Authority
- US
- United States
- Prior art keywords
- entity
- metadata
- relationship
- repository
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G06F17/3053—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/907—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G06F17/30312—
-
- G06F17/30477—
-
- G06F17/30554—
Definitions
- a search engine will index data elements and only those elements which have the words “Quarterly”, “Revenue”, or any combinations of the above will show up on the search result.
- This approach doesn't consider the fact that in technical systems an element name may differ from the business terminology.
- the corresponding database table that stores the “Quarterly Revenue” may be called “QTR_SALES_REV” and thus in a conventional index based search, the database table QTR_SALES_REV that stores the “Quarterly Revenue” will not be returned.
- FIG. 1 illustrates a method according to some embodiments.
- FIG. 2 illustrates a system according to some embodiments.
- FIG. 3 illustrates a repository according to some embodiments.
- FIG. 4 illustrates an apparatus according to some embodiments.
- FIG. 5 illustrates a weight table according to some embodiments.
- Metadata may comprise data that characterizes other data and may exist in many different places within an enterprise.
- Metadata may comprise metadata semantics.
- metadata semantic may be defined as inherent rules and metadata relationships.
- metadata relationship may be defined as the relationships between metadata objects which can be explicit or implicitly derived from a system. Metadata semantics may be added to search indexes and thus, a search may be performed on not only keyword matching but also on following a plurality of metadata paths (e.g., a graph) of object relationships to reach as many relevant objects as possible.
- Each object in the path may be scored and the score for each object may determine an object's relevance (e.g., a relevance of the object to be included in search results). The score may be based on keyword matching, object relationships and the relationship depth in the path.
- employees' tax identities may be stored in multiple places in a database and may be used for multiple purposes under different names like SSN (social security number in US), SIN (social insurance number in Canada), TAX_ID, etc.
- a user may desire to discover each field where tax identities are stored, and what impact might occur if a change is made to these fields. If a search is performed based on only keyword matching, the user will need to investigate which database tables or views store tax identities by searching different keywords like SSN, SIN and TAX_ID, and then manually trace those keywords to other metadata like reports and business terms that have relationships with these tables and views and would be impacted by a change to the searched fields.
- the present embodiments using enriched searches with metadata semantics, may perform a single search (e.g., the term “tax identity”) and the search result may contain all relevant database tables, views, reports and business terms that comprise a high enough relevance score to be included in search results where the relevance score is based on types of objects, relationships, and their depth.
- FIG. 1 is a flow chart that illustrates a method 100 that may be performed according to some embodiments.
- the flow chart in FIG. 1 does not imply a fixed order to the steps, and embodiments of the present invention can be practiced in any order that is practicable.
- the methods may be performed by any of the devices described herein.
- the method shown in FIG. 1 may be performed, for example, by the system 200 of FIG. 2 and the apparatus 400 of FIG. 4.
- the method 100 may be embodied on a non-transitory computer-readable medium.
- a plurality of metadata associated with an entity is received.
- the plurality of metadata may be transmitted by a metadata engine, such as, but not limited to SAP's Metadata Management module in SAP Information Steward.
- System 200 may comprise a metadata engine 240 and a user device 230 in communication with a server 250 .
- the metadata engine 240 transmits collected metadata to the server 250 .
- the metadata engine 240 may have collected a plurality of metadata semantics associated with a system such as database or business application (not shown).
- the plurality of metadata is stored in a repository.
- a repository may comprise a relational database, a flat file, an in-memory database, etc.
- the metadata engine may consolidate metadata from various data sources and store the metadata into a central repository for metadata management.
- the repository may include metadata from various data sources.
- the metadata engine may consolidate metadata from a database system or business application and transmit that data to the server 250 where the plurality of metadata is stored in a repository, such as database 220.
- the server 250 may comprise the metadata engine 240.
- a search index 260, which comprises metadata in the database 220, is built by the metadata engine.
- the search index 260 is used by the processor 210 to return metadata search results to the user device.
- a search request associated with a data object is received at 130 , referring back to FIG. 1 .
- a search request for the term “Quarterly Revenue” may be received from the user device 230 .
- the search request may be received at the server 250 .
- a search request may contain one or more keywords for a metadata search.
- search results that comprise a portion of the plurality of metadata stored in the repository are determined.
- the determination may be based on a search index that has been enhanced with metadata semantics.
- a search index may be enriched and augmented with metadata semantics, metadata relationships and business glossary terms.
- each search index may be (1) augmented with consolidated metadata which includes the definition of that element in various contexts, such as how that element is defined in various enterprise systems, (2) augmented with metadata associated with a parent or child of each entity contained in the search index, (3) augmented with various relationships, which are discovered through metadata analysis along with other objects in the enterprise systems, and (4) provided with a relationship distance based on object weighting to determine the relevance of an object in a given context.
- the repository 300 may illustrate an example of data entities that may be stored in a repository 300 .
- the repository 300 illustrated as a table, may list metadata objects which are related to a respective data entity.
- the repository 300 defines fields 310 , 320 , 330 and 340 .
- Field 310 relates to a data entity name and field 320 relates to an entity type for an associated entity name.
- an entity name may be REG_SALES_WEBI.RPT which has an entity type of report.
- Other examples illustrated in the repository 300 comprise an entity name of REVENUE which is a type of report field, REV_AMOUNT which is a type database column, and QTR_SALES_REV which is a type database table.
- Field 330 may relate to one or more target entities which comprise metadata objects associated with a respective data entity as listed in field 310 .
- a data entity REG_SALES_WEBI.RPT is related to metadata objects such as QTR_SALES_REV, PRODUCT_SALES, REGIONAL SALES REPORT, REGION, COUNTRY, YEAR, QUARTER, REVENUE, SALES by a relationship type which is contained in field 340 .
- QTR_SALES_REV may be related to REG_SALES_WEBI.RPT through lineage relationship
- PRODUCT_SALES is related to REG_SALES_WEBI.RPT through 2 levels of lineage relationship
- REGIONAL SALES REPORT is business glossary definition associated with REG_SALES_WEBI.RPT
- REGION, COUNTRY, YEAR, QUARTER, REVENUE are each type report field.
- business glossary may be defined as business terms, terminology and concepts that are defined by a business user. Typically, a business user or a data steward may create a business glossary and associate terms in the glossary to various metadata entities to convey the meanings, relationships and other aspects.
- impact and lineage may be defined as a relationship between a source and target entity. The target entity may be affected when a change is made to the source entity. For example, if it is known that a first object impacts a second object (Obj1→Obj2), then Obj1 has an impact relationship to Obj2, while Obj2 has lineage relationship to Obj1. There may be many levels of impact and lineage relationship between two objects. In the case of Obj1→Obj2→Obj3, Obj3 has a “level 2 lineage” relationship to Obj1. It's also possible that the objects in the relationships may reside in separate systems.
- a search request for “Quarterly Revenue” may be received at a processor, such as processor 210. The processor examines all the search indexes and finds the REG_SALES_WEBI.RPT report because the search index contains a report field relationship to QUARTER and REVENUE. Based on the lineage relationship between the REVENUE report field and the REV_AMOUNT column, and the parent container relationship between the REV_AMOUNT column and the QTR_SALES_REV table, the QTR_SALES_REV table is returned in the search.
- the search may also return various other elements from the repository 300 that relate to the REG_SALES_WEBI.RPT such as report fields, business glossary definitions, other entities contained in a parent container/folder, other entities that would be impacted (e.g., type impact) from the user device 230 .
- the search request may be received at the server 250 .
- a semantic and relationship enriched search may find the report field “REVENUE” by simple keyword matching, and a processor may expand the search results along the semantics and relationships described in the repository 300 to find the related metadata objects such as the database column REV_AMOUNT, the report REG_SALES_WEBI.RPT, and the table “QTR_SALES_REV”. After that, the search may continue based on object relationships and find a list of business terms. Finally, the processor may combine all these different objects and sort them based on the relevance score.
- the relevance score may be defined as follows:
- a search (e.g., a query q) in a document d, which means metadata, may be scored using the following formula:
- TF may comprise a frequency of the term t.
- IDF may comprise inverse document frequency.
- the score of query q for document d may be calculated on TF-IDF, relationship frequency, depth of an object in a relationship graph, and its related parent objects.
- TF-IDF may comprise a numerical statistic which may reflect how important a word is to a document in a collection.
- Relationship frequency may comprise another measurement that describes how many hidden relationships exist in a related (indirect) object.
- tf(t in d) may relate to a frequency of a term t in a document d.
- tf(t in d) may be normalized to (frequency of a term t in a document d / total number of terms in the document)^(1/2).
- idf(t, D) may relate to term t's inverse document frequency that is based on number of documents containing the term within a collection of document D. It may be calculated as (1+LOG (numDocs/(docFreq+1))) where numDocs is the total number of the documents and docFreq is the number of documents containing the term.
- relationshipf (t in d) may relate to a relationship of a document. It may be based on a relationship found related to the document d given by a term t within a collection of relationships of document D.
- the formula is 1+LOG(relationshipWeight*numberOccurs/(totalRelationshipsWeight+1)) where relationshipWeight is the weight of a relationship type, and totalRelationshipsWeight is the sum of number of relationships weight to the document.
- depth(nth) may relate to the level of depth of the object to the top object.
- FIG. 5 illustrates a weight table 500 according to some embodiments.
- FIG. 5 defines fields 510 and 520 .
- Field 510 relates to a type of relationship and field 520 relates to a weight given to a respective relationship.
- a relationship type of “same as” may relate to two objects that are the same by looking at rules to determine that, even if they have different names, the two objects are the same.
- a relationship type of “parent-child” may relate to a parent-child relationship of objects such that a parent may have multiple children but a child may only have a single parent.
- a relationship type of “association” may relate to objects that have some association with each other but do not have a parent-child relationship. For example, two objects may work in conjunction with each other or may comprise a friendship relationship (e.g., social networks).
- a relationship type of “source-target” may relate to two objects where one is a source and the other object is a target of the source object.
- a relationship type of “business glossary” may relate to business names or user defined relationships.
- Metadata semantics may be derived from metadata relationships.
- An enhanced search index (e.g., keywords as well as metadata) may be based on a metadata object's name, description and other attributes.
- the enhanced search index may comprise metadata semantics and relationships and business terms and thus the search index may be based on relationships which are linked to other related objects. For each type of relationship, the weight used in the score calculation can be different and configurable. Search results may be limited to an arbitrary number (e.g., 10) and the search results may then be transmitted to a user device.
- the apparatus 400 may be associated with a server that receives a search request such as server 200 .
- the apparatus 400 may comprise a storage device 401 , a medium 402 , a processor 403 , and a memory 404 .
- the apparatus 400 may further comprise a digital display port, such as a port adapted to be coupled to a digital computer monitor, television, portable display screen, or the like.
- the medium 402 may comprise any computer-readable medium that may store processor-executable instructions to be executed by the processor 403 .
- the medium 402 may comprise a non-transitory tangible medium such as, but not limited to, a compact disk, a digital video disk, flash memory, optical storage, random access memory, read only memory, or magnetic media.
- a program may be stored on the medium 402 in a compressed, uncompiled and/or encrypted format.
- the program may furthermore include other program elements, such as an operating system, a database management system, and/or device drivers used by the processor 403 to interface with peripheral devices.
- the processor 403 may include or otherwise be associated with dedicated registers, stacks, queues, etc. that are used to execute program code and/or one or more of these elements may be shared there between.
- the processor 403 may comprise an integrated circuit.
- the processor 403 may comprise circuitry to perform a method such as, but not limited to, the method described with respect to FIG. 1 .
- the processor 403 communicates with the storage device 401 .
- the storage device 401 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, flash drives, and/or semiconductor memory devices.
- the storage device 401 stores a program for controlling the processor 403 .
- the processor 403 performs instructions of the program, and thereby operates in accordance with any of the embodiments described herein.
- the main memory 404 may comprise any type of memory for storing data, such as, but not limited to, a flash drive, a Secure Digital (SD) card, a micro SD card, a Single Data Rate Random Access Memory (SDR-RAM), a Double Data Rate Random Access Memory (DDR-RAM), or a Programmable Read Only Memory (PROM).
- the main memory 404 may comprise a plurality of memory modules.
- information may be “received” by or “transmitted” to, for example: (i) the apparatus 400 from another device; or (ii) a software application or module within the apparatus 400 from another software application, module, or any other source.
- the storage device 401 stores a database (e.g., including information associated with metadata semantics and metadata relationships).
- the database described herein is only an example, and additional and/or different information may be stored therein.
- various databases might be split or combined in accordance with any of the embodiments described herein.
- an external database may be used.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
According to some embodiments, a method and an apparatus of enriching search results with metadata are provided to receive a plurality of metadata associated with an entity and storing the plurality of metadata in a repository. A search request associated with the entity is received and search results that comprise a portion of the plurality of metadata stored in the repository are determined.
Description
- Traditional search mechanisms are based on keyword matching by creating indexes on various text elements. Thus, a user can only perform searches based on keywords that match data elements contained in an index. For example, a user may search for “Quarterly Revenue”.
- In a conventional index based search, a search engine will index data elements, and only those elements that contain the words “Quarterly”, “Revenue”, or a combination of the two will show up in the search result. This approach doesn't consider the fact that in technical systems an element name may differ from the business terminology. In other words, the corresponding database table that stores the “Quarterly Revenue” may be called “QTR_SALES_REV” and thus in a conventional index based search, the database table QTR_SALES_REV that stores the “Quarterly Revenue” will not be returned.
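The limitation described above can be illustrated with a minimal sketch of a naive keyword index. This is an illustration only, not the search engine of any embodiment: the element names other than QTR_SALES_REV, the whitespace tokenizer, and the helper names `tokenize`, `build_index` and `keyword_search` are all assumptions made for the example.

```python
# Hypothetical element names; only QTR_SALES_REV comes from the text above.
elements = ["QTR_SALES_REV", "Quarterly Revenue Summary", "CUSTOMER_ADDRESS"]


def tokenize(text):
    """Lowercase and split on whitespace, as a naive keyword indexer might."""
    return text.lower().split()


def build_index(names):
    """Map each token to the set of element names that contain it."""
    index = {}
    for name in names:
        for token in tokenize(name):
            index.setdefault(token, set()).add(name)
    return index


def keyword_search(index, query):
    """Return every element that shares at least one token with the query."""
    hits = set()
    for token in tokenize(query):
        hits |= index.get(token, set())
    return hits


index = build_index(elements)
results = keyword_search(index, "Quarterly Revenue")
# The table QTR_SALES_REV is missed because its technical name shares no
# token with the business term used in the query.
```

Under these assumptions the query matches only the element whose name literally contains the query words, which is the gap the embodiments aim to close.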
- Therefore, it is desirable to have a system and method to expand a conventional index based search to return greater amounts of relevant data.
- FIG. 1 illustrates a method according to some embodiments.
- FIG. 2 illustrates a system according to some embodiments.
- FIG. 3 illustrates a repository according to some embodiments.
- FIG. 4 illustrates an apparatus according to some embodiments.
- FIG. 5 illustrates a weight table according to some embodiments.
- The present embodiments relate to a method, apparatus and system to enrich searches with metadata from a metadata repository. Metadata may comprise data that characterizes other data and may exist in many different places within an enterprise. Metadata may comprise metadata semantics. The term “metadata semantic” may be defined as inherent rules and metadata relationships. The term “metadata relationship” may be defined as the relationships between metadata objects which can be explicit or implicitly derived from a system. Metadata semantics may be added to search indexes and thus, a search may be performed on not only keyword matching but also on following a plurality of metadata paths (e.g., a graph) of object relationships to reach as many relevant objects as possible. Each object in the path may be scored and the score for each object may determine an object's relevance (e.g., a relevance of the object to be included in search results). The score may be based on keyword matching, object relationships and the relationship depth in the path.
- For example, employees' tax identities may be stored in multiple places in a database and may be used for multiple purposes under different names like SSN (social security number in US), SIN (social insurance number in Canada), TAX_ID, etc. For auditing purposes, a user may desire to discover each field where tax identities are stored, and what impact might occur if a change is made to these fields. If a search is performed based on only keyword matching, the user will need to investigate which database tables or views store tax identities by searching different keywords like SSN, SIN and TAX_ID, and then manually trace those keywords to other metadata like reports and business terms that have relationships with these tables and views and would be impacted by a change to the searched fields. The present embodiments, using enriched searches with metadata semantics, may perform a single search (e.g., the term “tax identity”) and the search result may contain all relevant database tables, views, reports and business terms that comprise a high enough relevance score to be included in search results where the relevance score is based on types of objects, relationships, and their depth.
- Turning now in detail to the drawings, FIG. 1 is a flow chart that illustrates a method 100 that may be performed according to some embodiments. The flow chart in FIG. 1 does not imply a fixed order to the steps, and embodiments of the present invention can be practiced in any order that is practicable. Moreover, the methods may be performed by any of the devices described herein. The method shown in FIG. 1 may be performed, for example, by the system 200 of FIG. 2 and the apparatus 400 of FIG. 4. The method 100 may be embodied on a non-transitory computer-readable medium.
- At 110, a plurality of metadata associated with an entity is received. The plurality of metadata may be transmitted by a metadata engine, such as, but not limited to, SAP's Metadata Management module in SAP Information Steward.
- For illustrative purposes, and to aid in understanding features of the specification, an example will be introduced. This example is not intended to limit the scope of the claims.
- Now referring to FIG. 2, an embodiment of a system 200 is illustrated. System 200 may comprise a metadata engine 240 and a user device 230 in communication with a server 250. The metadata engine 240 transmits collected metadata to the server 250. In the present example, the metadata engine 240 may have collected a plurality of metadata semantics associated with a system such as a database or business application (not shown).
- Referring back to FIG. 1, at 120, the plurality of metadata is stored in a repository. A repository may comprise a relational database, a flat file, an in-memory database, etc. The metadata engine may consolidate metadata from various data sources and store the metadata into a central repository for metadata management. Thus, the repository may include metadata from various data sources. Continuing with the above example, the metadata engine may consolidate metadata from a database system or business application and transmit that data to the server 250 where the plurality of metadata is stored in a repository, such as database 220. In some embodiments, the server 250 may comprise the metadata engine 240. A search index 260, which comprises metadata in the database 220, is built by the metadata engine. The search index 260 is used by the processor 210 to return metadata search results to the user device.
- A search request associated with a data object (e.g., an entity) is received at 130, referring back to FIG. 1. Continuing with the above example, a search request for the term “Quarterly Revenue” may be received from the user device 230. The search request may be received at the server 250. A search request may contain one or more keywords for a metadata search.
- At 140, search results that comprise a portion of the plurality of metadata stored in the repository are determined. The determination may be based on a search index that has been enhanced with metadata semantics. In the present embodiments, a search index may be enriched and augmented with metadata semantics, metadata relationships and business glossary terms. In some embodiments, semantic knowledge may be added to a search index process or, in other words, each search index may be (1) augmented with consolidated metadata which includes the definition of that element in various contexts, such as how that element is defined in various enterprise systems, (2) augmented with metadata associated with a parent or child of each entity contained in the search index, (3) augmented with various relationships, which are discovered through metadata analysis along with other objects in the enterprise systems, and (4) provided with a relationship distance based on object weighting to determine the relevance of an object in a given context.
- Now referring to FIG. 3, an embodiment of a repository 300 is illustrated. The repository 300 may illustrate an example of data entities that may be stored in a repository. The repository 300, illustrated as a table, may list metadata objects which are related to a respective data entity. The repository 300 defines fields 310, 320, 330 and 340. Field 310 relates to a data entity name and field 320 relates to an entity type for an associated entity name. For example, and as illustrated in repository 300, an entity name may be REG_SALES_WEBI.RPT which has an entity type of report. Other examples illustrated in the repository 300 comprise an entity name of REVENUE which is of type report field, REV_AMOUNT which is of type database column, and QTR_SALES_REV which is of type database table.
- Field 330 may relate to one or more target entities which comprise metadata objects associated with a respective data entity as listed in field 310. For example, a data entity REG_SALES_WEBI.RPT is related to metadata objects such as QTR_SALES_REV, PRODUCT_SALES, REGIONAL SALES REPORT, REGION, COUNTRY, YEAR, QUARTER, REVENUE, SALES by a relationship type which is contained in field 340. As illustrated, QTR_SALES_REV may be related to REG_SALES_WEBI.RPT through a lineage relationship, PRODUCT_SALES is related to REG_SALES_WEBI.RPT through 2 levels of lineage relationship, REGIONAL SALES REPORT is a business glossary definition associated with REG_SALES_WEBI.RPT, and REGION, COUNTRY, YEAR, QUARTER, REVENUE are each of type report field.
- The term “business glossary” may be defined as business terms, terminology and concepts that are defined by a business user. Typically, a business user or a data steward may create a business glossary and associate terms in the glossary to various metadata entities to convey the meanings, relationships and other aspects. The terms “impact” and “lineage” may be defined as a relationship between a source and target entity. The target entity may be affected when a change is made to the source entity. For example, if it is known that a first object impacts a second object (Obj1→Obj2), then Obj1 has an impact relationship to Obj2, while Obj2 has lineage relationship to Obj1. There may be many levels of impact and lineage relationship between two objects. In the case of Obj1→Obj2→Obj3, Obj3 has a “level 2 lineage” relationship to Obj1. It's also possible that the objects in the relationships may reside in separate systems.
- Continuing with the above example, a search request for “Quarterly Revenue” may be received at a processor, such as processor 210. The processor examines all the search indexes and finds the REG_SALES_WEBI.RPT report because the search index contains a report field relationship to QUARTER and REVENUE. Based on the lineage relationship between the REVENUE report field and the REV_AMOUNT column, and the parent container relationship between the REV_AMOUNT column and the QTR_SALES_REV table, the QTR_SALES_REV table is returned in the search. The search may also return various other elements from the repository 300 that relate to the REG_SALES_WEBI.RPT such as report fields, business glossary definitions, other entities contained in a parent container/folder, other entities that would be impacted (e.g., type impact) from the user device 230. The search request may be received at the server 250.
- A semantic and relationship enriched search may find the report field “REVENUE” by simple keyword matching, and a processor may expand the search results along the semantics and relationships described in the repository 300 to find the related metadata objects such as the database column REV_AMOUNT, the report REG_SALES_WEBI.RPT, and the table “QTR_SALES_REV”. After that, the search may continue based on object relationships and find a list of business terms. Finally, the processor may combine all these different objects and sort them based on the relevance score.
- The relevance score may be defined as follows:
- A search (e.g., a query q) in a document d, which means metadata, may be scored using the following formula:
- [Scoring formula: image not reproduced in this text]
- TF may comprise a frequency of the term t. IDF may comprise inverse document frequency. The score of query q for document d may be calculated based on TF-IDF, relationship frequency, the depth of an object in a relationship graph, and its related parent objects. TF-IDF may comprise a numerical statistic which may reflect how important a word is to a document in a collection. Relationship frequency may comprise another measurement that describes how many hidden relationships exist in a related (indirect) object.
- tf(t in d) may relate to a frequency of a term t in a document d. In order to avoid bias toward large documents, tf(t in d) may be normalized to (frequency of a term t in a document d / total number of terms in the document)^(1/2), i.e., the square root of that ratio.
- idf(t, D) may relate to term t's inverse document frequency, which is based on the number of documents containing the term within a collection of documents D. It may be calculated as 1 + LOG(numDocs/(docFreq+1)), where numDocs is the total number of documents and docFreq is the number of documents containing the term.
- relationshipf(t in d) may relate to the relationships of a document. It may be based on the relationships found for the document d, given a term t, within the collection of relationships of the document. The formula is 1 + LOG(relationshipWeight*numberOccurs/(totalRelationshipsWeight+1)), where relationshipWeight is the weight of a relationship type, numberOccurs is the number of relationships of that type to the object, and totalRelationshipsWeight is the total weight of the relationships to the document.
- depth(nth) may relate to the depth level of the object relative to the top object.
- Since metadata objects may come from various data sources, the types of relationships between them may be different. The term relationshipWeight is denoted as the weight of a type of relationship used in the score calculation.
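The factor definitions above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the described embodiments: the function names are hypothetical, LOG is assumed to be the natural logarithm (the text does not state the base), depth is assumed to be numbered from 1 at the top object, and the zero-occurrence guard in `relationshipf` is an added assumption. The sketch stops at the individual factors because the way they are combined is given by the scoring formula, which is not reproduced in this text.

```python
import math


def tf(term_count: int, total_terms: int) -> float:
    """Normalized term frequency: (term count / total terms) ** (1/2)."""
    if total_terms <= 0:
        return 0.0
    return math.sqrt(term_count / total_terms)


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: 1 + LOG(numDocs / (docFreq + 1))."""
    if num_docs <= 0:
        raise ValueError("num_docs must be positive")
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def relationshipf(relationship_weight: float, number_occurs: int,
                  total_relationships_weight: float) -> float:
    """1 + LOG(relationshipWeight * numberOccurs / (totalRelationshipsWeight + 1))."""
    numerator = relationship_weight * number_occurs
    if numerator <= 0:
        # Assumption: a relationship type that never occurs contributes nothing.
        return 0.0
    return 1.0 + math.log(numerator / (total_relationships_weight + 1))


def depth_factor(depth: int) -> float:
    """1 / depth, with depth assumed to start at 1 for the top object."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 1.0 / depth


# A term that occurs 2 times in a document of 8 terms.
print(tf(2, 8))
# A term found in 9 of 100 documents.
print(idf(100, 9))
```

Because of the square root, a term that makes up a quarter of a document receives a tf of one half rather than one quarter, which dampens the effect of raw repetition.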
- FIG. 5 illustrates a weight table 500 according to some embodiments. FIG. 5 defines fields 510 and 520. Field 510 relates to a type of relationship and field 520 relates to a weight given to a respective relationship.
- A relationship type of “same as” may relate to two objects that are determined by rules to be the same, even if they have different names. A relationship type of “parent-child” may relate to a parent-child relationship of objects such that a parent may have multiple children but a child may only have a single parent. A relationship type of “association” may relate to objects that have some association with each other but do not have a parent-child relationship. For example, two objects may work in conjunction with each other or may comprise a friendship relationship (e.g., social networks). A relationship type of “source-target” may relate to two objects where one is a source and the other object is a target of the source object. A relationship type of “business glossary” may relate to business names or user defined relationships.
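As an illustration of a configurable weight table in the spirit of FIG. 5, the sketch below keeps one weight per relationship type and derives the numberOccurs and totalRelationshipsWeight inputs for a single object. The numeric weights are placeholders invented for this example (the values of FIG. 5 are not reproduced in this text), the names `RELATIONSHIP_WEIGHTS` and `relationship_inputs` are hypothetical, and treating the total weight as the sum of weight times occurrence count is an assumed reading of "the total weight of relationships to this object".

```python
from collections import Counter

# Hypothetical weights; the actual FIG. 5 values are not given in the text.
RELATIONSHIP_WEIGHTS = {
    "same as": 1.0,
    "parent-child": 0.8,
    "association": 0.5,
    "source-target": 0.7,
    "business glossary": 0.9,
}


def relationship_inputs(relationship_types, weights=RELATIONSHIP_WEIGHTS):
    """Return (numberOccurs per type, totalRelationshipsWeight) for one object.

    relationship_types: list of relationship type names attached to the object.
    Raises KeyError if a type has no configured weight.
    """
    occurrences = Counter(relationship_types)
    total_weight = sum(weights[rel] * count for rel, count in occurrences.items())
    return dict(occurrences), total_weight


occurs, total = relationship_inputs(
    ["parent-child", "source-target", "source-target", "business glossary"]
)
print(occurs)
print(total)
```

Keeping the weights in a plain mapping means they can be changed per deployment without touching the scoring code.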
- Some factors used for scoring comprise the following:
| Factor | Definition |
|---|---|
| tf(t in d) | (Frequency of a term t in a document d / total number of terms in a document)^(1/2) |
| idf(t) | 1 + LOG(numDocs / (docFreq + 1)) |
| relationshipf(t, d) | 1 + LOG(relationshipWeight * numberOccurs / (totalRelationshipsWeight + 1)) |
| numDocs | The number of all documents |
| numberOccurs | The number of relationships of this kind to this object |
| totalRelationshipsWeight | The total weight of relationships to this object |
| docFreq | The number of documents which have the term |
| depth(nth) | 1 / the number of the depth to this object |
| score(p) | The score of the parent |
- As described above, for a given metadata entity, a search using metadata combines search indexes from keyword matching and metadata semantic matching. Metadata semantics may be derived from metadata relationships. An enhanced search index (e.g., keywords as well as metadata) may be based on a metadata object's name, description and other attributes. The enhanced search index may comprise metadata semantics, relationships and business terms, and thus the search index may be based on relationships which are linked to other related objects. For each type of relationship, the weight used in the score calculation can be different and configurable. Search results may be limited to an arbitrary number (e.g., 10) and the search results may then be transmitted to a user device.
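The overall flow described above (keyword matching, expansion along stored relationships, scoring, and truncating the result list) can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: the repository rows are a hand-written subset of the example entities, the relationship weights are invented, the function name `search` is hypothetical, and the scoring rule (each hop multiplies the score by the relationship weight and by one over the depth of the newly reached object) is a stand-in for the full formula.

```python
from collections import deque

# (entity, target entity, relationship type): hypothetical rows modeled on the example.
REPOSITORY = [
    ("REG_SALES_WEBI.RPT", "REVENUE", "report field"),
    ("REG_SALES_WEBI.RPT", "QUARTER", "report field"),
    ("REVENUE", "REV_AMOUNT", "lineage"),
    ("REV_AMOUNT", "QTR_SALES_REV", "parent container"),
    ("REG_SALES_WEBI.RPT", "REGIONAL SALES REPORT", "business glossary"),
]

# Hypothetical, configurable weight per relationship type.
WEIGHTS = {
    "report field": 0.9,
    "lineage": 0.7,
    "parent container": 0.8,
    "business glossary": 0.9,
}


def search(query, repository=REPOSITORY, weights=WEIGHTS, limit=10):
    """Keyword-match entity names, then expand along relationships in both directions."""
    neighbors = {}
    for source, target, rel in repository:
        neighbors.setdefault(source, []).append((target, rel))
        neighbors.setdefault(target, []).append((source, rel))

    terms = query.upper().split()
    scores = {}
    queue = deque()
    for name in neighbors:
        # Seed score: number of query terms that occur in the entity name.
        hits = sum(1 for term in terms if term in name)
        if hits:
            scores[name] = float(hits)
            queue.append((name, 1))

    # Breadth-first expansion: the first (shallowest) path to an object sets its score.
    while queue:
        name, depth = queue.popleft()
        for other, rel in neighbors[name]:
            if other not in scores:
                scores[other] = scores[name] * weights[rel] / (depth + 1)
                queue.append((other, depth + 1))

    # Highest score first; ties broken by name so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


for name, score in search("revenue"):
    print(f"{name}: {score:.3f}")
```

With the query "revenue", only the REVENUE report field matches by keyword; the report, the REV_AMOUNT column and the QTR_SALES_REV table are reached purely through the stored relationships, which mirrors the expansion described for the repository 300.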
- Now referring to FIG. 4, an embodiment of an apparatus 400 is illustrated. In some embodiments, the apparatus 400 may be associated with a server that receives a search request, such as server 200.
- The apparatus 400 may comprise a storage device 401, a medium 402, a processor 403, and a memory 404. According to some embodiments, the apparatus 400 may further comprise a digital display port, such as a port adapted to be coupled to a digital computer monitor, television, portable display screen, or the like.
- The medium 402 may comprise any computer-readable medium that may store processor-executable instructions to be executed by the processor 403. For example, the medium 402 may comprise a non-transitory tangible medium such as, but not limited to, a compact disk, a digital video disk, flash memory, optical storage, random access memory, read only memory, or magnetic media.
- A program may be stored on the medium 402 in a compressed, uncompiled and/or encrypted format. The program may furthermore include other program elements, such as an operating system, a database management system, and/or device drivers used by the processor 403 to interface with peripheral devices.
- The processor 403 may include or otherwise be associated with dedicated registers, stacks, queues, etc. that are used to execute program code and/or one or more of these elements may be shared therebetween. In some embodiments, the processor 403 may comprise an integrated circuit. In some embodiments, the processor 403 may comprise circuitry to perform a method such as, but not limited to, the method described with respect to FIG. 1.
- The processor 403 communicates with the storage device 401. The storage device 401 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., a hard disk drive), optical storage devices, flash drives, and/or semiconductor memory devices. The storage device 401 stores a program for controlling the processor 403. The processor 403 performs instructions of the program, and thereby operates in accordance with any of the embodiments described herein.
- The main memory 404 may comprise any type of memory for storing data, such as, but not limited to, a flash drive, a Secure Digital (SD) card, a micro SD card, a Single Data Rate Random Access Memory (SDR-RAM), a Double Data Rate Random Access Memory (DDR-RAM), or a Programmable Read Only Memory (PROM). The main memory 404 may comprise a plurality of memory modules.
- As used herein, information may be “received” by or “transmitted” to, for example: (i) the apparatus 400 from another device; or (ii) a software application or module within the apparatus 400 from another software application, module, or any other source.
- In some embodiments, the storage device 401 stores a database (e.g., including information associated with metadata semantics and metadata relationships). Note that the database described herein is only an example, and additional and/or different information may be stored therein. Moreover, various databases might be split or combined in accordance with any of the embodiments described herein. In some embodiments, an external database may be used.
- Embodiments have been described herein solely for the purpose of illustration. Persons skilled in the art will recognize from this description that embodiments are not limited to those described, but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims.
Claims (21)
1-20. (canceled)
21. A method to enrich search results with metadata, the method comprising:
receiving a plurality of metadata associated with an entity wherein the entity is associated with a plurality of relationships and relationship types;
storing the plurality of metadata in a repository, the plurality of metadata comprising (i) an entity name and (ii) a plurality of target entities associated with the entity name wherein each respective target entity is associated with a relationship type field that indicates a type of relationship between the entity and the respective target entity;
receiving a search request associated with the entity; and
determining search results that comprise a portion of the plurality of metadata stored in the repository.
22. The method of claim 21, wherein the repository further comprises a relationship weight field that indicates a weight for each type of relationship and the portion of the plurality of metadata is based on a score calculation associated with the weight of each relationship type.
23. The method of claim 21, wherein the repository further comprises an entity type field.
24. The method of claim 21, wherein the plurality of metadata is received from a metadata engine.
25. The method of claim 21, wherein the plurality of metadata comprises entities such as a report, a report field, a database column, and a database table.
26. The method of claim 21, where the portion of the plurality of metadata is transmitted based on a score calculation.
27. The method of claim 21, wherein the entity is a target entity and the metadata is associated with the entity being affected when a change is made to a source entity.
28. The method of claim 27, wherein the target entity resides in a first system and the source entity resides in a second system.
29. The method of claim 21, wherein the entity is a source entity and the metadata is associated with a change to the entity that affects a target entity.
30. The method of claim 29, wherein the target entity resides in a first system and the source entity resides in a second system.
31. An apparatus comprising:
a processor; and
a non-transitory computer-readable medium comprising instructions that when executed by a processor perform a method to enrich search results with metadata, the method comprising:
receiving a plurality of metadata associated with an entity wherein the entity is associated with a plurality of relationships and relationship types;
storing the plurality of metadata in a repository, the plurality of metadata comprising (i) an entity name and (ii) a plurality of target entities associated with the entity name wherein each respective target entity is associated with a relationship type field that indicates a type of relationship between the entity and the respective target entity;
receiving a search request associated with the entity; and
determining search results that comprise a portion of the plurality of metadata stored in the repository.
32. The apparatus of claim 31, wherein the repository further comprises a relationship weight field that indicates a weight for each type of relationship and the portion of the plurality of metadata is based on a score calculation associated with the weight of each relationship type.
33. The apparatus of claim 31, wherein the repository further comprises an entity type field.
34. The apparatus of claim 31, wherein the plurality of metadata is received from a metadata engine.
35. The apparatus of claim 31, wherein the plurality of metadata comprises entities such as a report, a report field, a database column, and a database table.
36. The apparatus of claim 31, where the portion of the plurality of metadata is transmitted based on a score calculation.
37. The apparatus of claim 31, wherein the entity is a target entity and the metadata is associated with the entity being affected when a change is made to a source entity.
38. A non-transitory computer-readable medium comprising instructions that when executed by a processor perform a method to enrich search results with metadata, the method comprising:
receiving a plurality of metadata associated with an entity wherein the entity is associated with a plurality of relationships and relationship types;
storing the plurality of metadata in a repository, the plurality of metadata comprising (i) an entity name and (ii) a plurality of target entities associated with the entity name wherein each respective target entity is associated with a relationship type field that indicates a type of relationship between the entity and the respective target entity;
receiving a search request associated with the entity; and
determining search results that comprise a portion of the plurality of metadata stored in the repository.
39. The medium of claim 38, wherein the repository further comprises a relationship weight field that indicates a weight for each type of relationship and the portion of the plurality of metadata is based on a score calculation associated with the weight of each relationship type.
40. The medium of claim 38, wherein the repository further comprises an entity type field.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/243,342 US20160357758A1 (en) | 2014-01-29 | 2016-08-22 | Metadata search based on semantics |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/167,424 US9449117B2 (en) | 2014-01-29 | 2014-01-29 | Metadata search based on semantics |
| US15/243,342 US20160357758A1 (en) | 2014-01-29 | 2016-08-22 | Metadata search based on semantics |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/167,424 Continuation US9449117B2 (en) | 2014-01-29 | 2014-01-29 | Metadata search based on semantics |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20160357758A1 (en) | 2016-12-08 |
Family
ID=53679221
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/167,424 Active 2034-08-09 US9449117B2 (en) | 2014-01-29 | 2014-01-29 | Metadata search based on semantics |
| US15/243,342 Abandoned US20160357758A1 (en) | 2014-01-29 | 2016-08-22 | Metadata search based on semantics |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US14/167,424 Active 2034-08-09 US9449117B2 (en) | 2014-01-29 | 2014-01-29 | Metadata search based on semantics |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US9449117B2 (en) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10475043B2 (en) | 2015-01-28 | 2019-11-12 | Intuit Inc. | Method and system for pro-active detection and correction of low quality questions in a question and answer based customer support system |
| US10755294B1 (en) | 2015-04-28 | 2020-08-25 | Intuit Inc. | Method and system for increasing use of mobile devices to provide answer content in a question and answer based customer support system |
| US10475044B1 (en) | 2015-07-29 | 2019-11-12 | Intuit Inc. | Method and system for question prioritization based on analysis of the question content and predicted asker engagement before answer content is generated |
| US10380200B2 (en) | 2016-05-31 | 2019-08-13 | At&T Intellectual Property I, L.P. | Method and apparatus for enriching metadata via a network |
| US10572954B2 (en) * | 2016-10-14 | 2020-02-25 | Intuit Inc. | Method and system for searching for and navigating to user content and other user experience pages in a financial management system with a customer self-service system for the financial management system |
| US10733677B2 (en) | 2016-10-18 | 2020-08-04 | Intuit Inc. | Method and system for providing domain-specific and dynamic type ahead suggestions for search query terms with a customer self-service system for a tax return preparation system |
| US10552843B1 (en) | 2016-12-05 | 2020-02-04 | Intuit Inc. | Method and system for improving search results by recency boosting customer support content for a customer self-help system associated with one or more financial management systems |
| US10748157B1 (en) | 2017-01-12 | 2020-08-18 | Intuit Inc. | Method and system for determining levels of search sophistication for users of a customer self-help system to personalize a content search user experience provided to the users and to increase a likelihood of user satisfaction with the search experience |
| US10922367B2 (en) | 2017-07-14 | 2021-02-16 | Intuit Inc. | Method and system for providing real time search preview personalization in data management systems |
| US11093951B1 (en) | 2017-09-25 | 2021-08-17 | Intuit Inc. | System and method for responding to search queries using customer self-help systems associated with a plurality of data management systems |
| US11436642B1 (en) | 2018-01-29 | 2022-09-06 | Intuit Inc. | Method and system for generating real-time personalized advertisements in data management self-help systems |
| US11269665B1 (en) | 2018-03-28 | 2022-03-08 | Intuit Inc. | Method and system for user experience personalization in data management systems using machine learning |
| CN112783836A (en) * | 2019-11-04 | 2021-05-11 | 电科云(北京)科技有限公司 | Information exchange method, device and computer storage medium |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7236966B1 (en) * | 2002-03-08 | 2007-06-26 | Cisco Technology | Method and system for providing a user-customized electronic book |
| US20100106729A1 (en) * | 2008-09-27 | 2010-04-29 | International Business Machines Corporation | System and method for metadata search |
| US20110252065A1 (en) * | 2010-04-12 | 2011-10-13 | Sung-Ho Ryu | Apparatus and method for semantic-based search and semantic metadata providing server and method of operating the same |
| US20130166573A1 (en) * | 2011-12-27 | 2013-06-27 | Business Objects Software Ltd. | Managing Business Objects Data Sources |
| US20130173547A1 (en) * | 2011-12-30 | 2013-07-04 | Bmc Software, Inc. | Systems and methods for migrating database data |
| US20130218898A1 (en) * | 2012-02-16 | 2013-08-22 | Oracle International Corporation | Mechanisms for metadata search in enterprise applications |
| US20150234813A1 (en) * | 2013-11-04 | 2015-08-20 | Michael R. Knapp | Systems and Methods for Categorizing and Accessing Information Databases and for Displaying Query Results |
| US20160004720A1 (en) * | 2013-02-21 | 2016-01-07 | Hitachi Data Systems Engineering UK Limited | Object-Level Replication of Cloned Objects in a Data Storage System |
Family Cites Families (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7734566B2 (en) | 2004-11-01 | 2010-06-08 | Sap Ag | Information retrieval method with efficient similarity search capability |
| US7805432B2 (en) | 2006-06-15 | 2010-09-28 | University College Dublin National University Of Ireland, Dublin | Meta search engine |
| US7987176B2 (en) | 2007-06-25 | 2011-07-26 | Sap Ag | Mixed initiative semantic search |
| US20090063522A1 (en) | 2007-08-17 | 2009-03-05 | Oracle International Corporation | System and method for managing ontologies as service metadata assets in a metadata repository |
| US9424339B2 (en) | 2008-08-15 | 2016-08-23 | Athena A. Smyros | Systems and methods utilizing a search engine |
| US9037567B2 (en) | 2009-04-15 | 2015-05-19 | Vcvc Iii Llc | Generating user-customized search results and building a semantics-enhanced search engine |
| US9092515B2 (en) | 2010-07-02 | 2015-07-28 | M-Files Oy | Method, a computer system and a computer readable medium for querying objects by means of metadata |
| US8762384B2 (en) | 2010-08-19 | 2014-06-24 | Sap Aktiengesellschaft | Method and system for search structured data from a natural language search request |
| US20120150792A1 (en) | 2010-12-09 | 2012-06-14 | Sap Portals Israel Ltd. | Data extraction framework |
| US8527451B2 (en) | 2011-03-17 | 2013-09-03 | Sap Ag | Business semantic network build |
| US8935230B2 (en) | 2011-08-25 | 2015-01-13 | Sap Se | Self-learning semantic search engine |
| US8886639B2 (en) | 2012-04-19 | 2014-11-11 | Sap Ag | Semantically enriched search of services |
| US9177289B2 (en) | 2012-05-03 | 2015-11-03 | Sap Se | Enhancing enterprise service design knowledge using ontology-based clustering |
| US20130325757A1 (en) | 2012-06-05 | 2013-12-05 | Sap Ag | Cascading learning system as semantic search |
Also Published As
| Publication number | Publication date |
|---|---|
| US20150213021A1 (en) | 2015-07-30 |
| US9449117B2 (en) | 2016-09-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9449117B2 (en) | Metadata search based on semantics | |
| US20230205828A1 (en) | Related entities | |
| Kononenko et al. | Mining modern repositories with elasticsearch | |
| US8380750B2 (en) | Searching and displaying data objects residing in data management systems | |
| US9727628B2 (en) | System and method of applying globally unique identifiers to relate distributed data sources | |
| US8972387B2 (en) | Smarter search | |
| US9152697B2 (en) | Real-time search of vertically partitioned, inverted indexes | |
| US11120057B1 (en) | Metadata indexing | |
| US9218396B2 (en) | Insight determination and explanation in multi-dimensional data sets | |
| US9779172B2 (en) | Personalized search result summary | |
| WO2012129149A2 (en) | Aggregating search results based on associating data instances with knowledge base entities | |
| US9959326B2 (en) | Annotating schema elements based on associating data instances with knowledge base entities | |
| US9792341B2 (en) | Database query processing using horizontal data record alignment of multi-column range summaries | |
| US10783195B2 (en) | System and method for constructing search results | |
| US20160196355A1 (en) | Searching method, searching apparatus and device | |
| US10430394B2 (en) | Data masking name data | |
| US20180341709A1 (en) | Unstructured search query generation from a set of structured data terms | |
| Xiao et al. | Probabilistic top-k range query processing for uncertain databases | |
| US10176230B2 (en) | Search-independent ranking and arranging data | |
| US10191942B2 (en) | Reducing comparisons for token-based entity resolution | |
| US11010391B2 (en) | Domain agnostic similarity detection | |
| Bach et al. | Hybrid column/row-oriented DBMS | |
| US20160019269A1 (en) | System and method for variable presentation semantics of search results in a search environment | |
| Li | On contextual ranking queries in databases | |
| HK1179368B (en) | Presenting search results based upon subject-versions |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |