WO2009113266A1 - Content search device and content search method - Google Patents
Content search device and content search method
- Publication number
- WO2009113266A1 (PCT/JP2009/000926)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- keyword
- keywords
- section
- database
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/912—Applications of a database
- Y10S707/913—Multimedia
- Y10S707/914—Video
Definitions
- the present invention relates to a content search apparatus for searching content that a user wants to use from among a large amount of stored content.
- the related keyword means a word (keyword) related to the word (keyword) specified by the user.
- the degree of association representing the degree of association between keywords related to each other is calculated based on the number of co-occurrence between keywords, the appearance frequency, and the like.
- Such a search device updates the association between keywords for presenting related keywords at the same time as updating data in the content database in which the content is stored (see, for example, Patent Document 1). Therefore, this search device presents the related keywords based on the latest content stored in the content database to the user.
- the related keywords that the user recalls from specific keywords differ from user to user. For example, a user who has watched only a recently broadcast drama in which “Actor A” has appeared recalls “Actress B” who has appeared in that drama as a related keyword. On the other hand, a user who has watched only a broadcast drama one year ago when “Actor A” appeared recalls “Actress C” who appeared in that drama as a related keyword. Thus, when the user's individual knowledge is different, the related keyword that the user recalls for “actor A” can be a different keyword. That is, when the search device presents only related keywords generated based on the latest content, a related keyword that cannot be recalled by some users is presented. As a result, there is a problem that the user cannot select a keyword and cannot narrow down the content.
- Patent Document 2 describes a method of classifying all the content stored in the content database according to a fixed time period.
- the search device using the method of Patent Literature 2 can construct an association between keywords for each time interval.
- this search device can present the related keywords generated based on the association of different time intervals to the user for each of a plurality of time intervals.
- For the keyword “Actor A”, this search device can simultaneously present to the user “Actress B”, which is strongly related in recent content, and “Actress C”, which is strongly related in content from one year ago.
- the search device presents related keywords in a plurality of time intervals, so that the user can select related keywords that are suitable for his / her knowledge. That is, the user can effectively narrow down the content by repeatedly selecting related keywords.
- the search device cannot present related keywords suited to how frequently the keyword composition of each attribute changes. For example, suppose that in “news genre” content, where the keyword composition changes greatly, the keyword most relevant to the keyword “topic” changed in the order “National Diet” → “Soccer” → “Typhoon” during a specific period.
- the conventional search device determines a fixed time such that the specific period is included in one time interval in accordance with the content of the “drama genre” that is less frequently changed in keyword configuration.
- a keyword having the highest degree of relevance to “topic” is presented as a related keyword from among “National Diet”, “Soccer” and “Typhoon”. That is, there may be a case where “National Diet” or “Soccer” is presented instead of “Typhoon” which is a related keyword most suitable for current topics with respect to “Topic”.
- Conversely, when the search device determines the fixed time so as to match content whose keyword composition changes frequently, it presents the same related keyword in multiple time intervals. For example, when a conventional search device determines the fixed time to suit “news genre” content, whose keyword composition changes frequently, it presents the same related keyword in multiple time intervals for “drama genre” content, whose keyword composition does not change much. Because the number of keywords the search device can present to the user at one time is limited, presenting the same keyword several times narrows the user's range of options. As a result, the user is likely to need extra search steps when selecting a keyword. That is, with the conventional search device, the user cannot search for content efficiently.
- the present invention solves the above-described problems, and provides a content search apparatus that can efficiently present related keywords suitable for current affairs to the user.
- a content search device according to the present invention searches for predetermined content, using related keywords related to a keyword indicating the substance of the content, from a content database in which content is stored for each content attribute indicating a content classification. The content search device includes the following.
- a dictionary database that stores, for each content attribute and for each related section representing a predetermined time section, the degree of association between a plurality of keywords indicating the substance of content that is included in the related section and belongs to the classification indicated by the content attribute.
- Related section calculation means for calculating, for each content attribute, a related section determined so that the first content and the second content are more likely to be included in the same time section the smaller the degree of difference is between a plurality of first keywords indicating the substance of first content stored in the content database and a plurality of second keywords indicating the substance of second content stored in the content database.
- Dictionary update means for calculating the degree of association between keywords for each content attribute and each calculated related section, and for updating the degree of association stored in the dictionary database.
- Output generation means for generating output information for outputting, for each related section, related keywords related to the keyword input by the user according to the degree of association stored in the dictionary database.
- With this configuration, the content search device updates the dictionary database based on the related section calculated for each content attribute, and can therefore efficiently present to the user related keywords suited to current affairs.
- For a content attribute whose keyword composition changes frequently, the content search device updates the dictionary database so that the related section is shorter than for other attributes, and can therefore present keywords suited to current affairs to the user.
- For a content attribute whose keyword composition changes infrequently, the content search device updates the dictionary database so that the related section is longer than for other attributes. It can therefore present keywords efficiently, without presenting the same keyword in multiple related sections.
- In other words, the content search device updates the dictionary database based on the related section calculated according to changes in the keyword composition of the content, and can therefore efficiently present to the user related keywords that match current affairs.
- When the keyword composition changes greatly, the content search device updates the dictionary database so that the related section is shorter than otherwise, so keywords suited to current affairs can be presented to the user.
- When the keyword composition changes little, the content search apparatus updates the dictionary database so that the related section is longer than otherwise, so the same keyword is not presented in multiple related sections. In other words, keywords can be presented efficiently.
- the related section calculating means may calculate a related section using the content included in the latest related section stored in the dictionary database as the second content.
- the related section calculation means may calculate a related section based on whether the degree of difference between a predetermined number of keywords having a high appearance frequency among the first keywords and a predetermined number of keywords having a high appearance frequency among the second keywords satisfies a predetermined reference value.
- the degree of difference can then be calculated regardless of any difference between the number of keywords included in the new time interval and the number of keywords included in the related section calculated at the previous content update.
- the related section calculation means may calculate a related section using, as the second content, content included in a time interval of predetermined length within the time interval corresponding to the content that was newly added to the content database the previous time.
- In that case, regardless of the related section calculated at the previous content update, the content search device can calculate the degree of difference in keyword composition between the latest content stored in the content database and the newly added content. Therefore, the content search apparatus can present keywords more suitable for current events.
- the content search apparatus may further include attribute acquisition means for acquiring a content attribute related to the keyword input by the user, and related keyword acquisition means for acquiring, for each related section and by referring to the dictionary database, related keywords corresponding to the keyword input by the user and the content attribute acquired by the attribute acquisition means, wherein the output generation means generates output information for outputting the related keywords acquired by the related keyword acquisition means.
- the content search apparatus can present related keywords suitable for the user's input.
- the related keyword acquisition means may generate related keywords for each of a plurality of content attributes when the attribute acquisition means acquires a plurality of content attributes, and the output generation means may generate output information for outputting the related keywords generated for each content attribute, arranged by content attribute and by related section.
- FIG. 1 is a block diagram showing a functional configuration of a content search apparatus according to an embodiment of the present invention.
- FIG. 2 is a diagram illustrating an example of a content database.
- FIG. 3 is a conceptual diagram of a related word dictionary.
- FIG. 4 is a diagram illustrating an example of a related word dictionary.
- FIG. 5 is a flowchart showing the flow of processing of the content search apparatus.
- FIG. 6 is a flowchart showing a flow of processing (step S106 shown in FIG. 5) related to related section calculation by the related section calculation unit.
- FIG. 7 is a flowchart showing a flow of processing (step S204 shown in FIG. 6) regarding the calculation of the change rate by the related section calculation unit.
- FIG. 8 is a diagram illustrating an example of the initial search screen.
- FIG. 9 is a diagram illustrating an example of a search screen.
- FIG. 10 is a diagram illustrating an example of a search screen.
- FIG. 11A, FIG. 11B, and FIG. 11C are diagrams for explaining a related interval calculation method.
- FIG. 12A and FIG. 12B are diagrams for explaining the related interval calculation method.
- FIG. 13 is an example of a related keyword presentation screen for a plurality of attributes.
- FIG. 14A is an example of a related keyword presentation screen output by the content search apparatus according to the related art.
- FIG. 14B is an example of a related keyword presentation screen output by the content search apparatus according to the embodiment of the present invention.
- FIG. 15 is a conceptual diagram of processing for generating content attributes by the content database update unit according to the first modification of the present invention.
- FIG. 16 is a diagram for explaining processing for generating content attributes by the content database update unit according to Modification 1 of the present invention.
- FIG. 17 is a flowchart showing a flow of processing (step S106 shown in FIG. 5) relating to the related section calculation by the related section calculating unit according to the second modification of the present invention.
- FIG. 18 is a diagram showing an example of a document matrix in the second modification of the present invention.
- FIG. 19 is a diagram for explaining a process of acquiring a document matrix in the second modification of the present invention.
- FIG. 1 is a configuration diagram showing a content search apparatus 100 according to an embodiment of the present invention.
- the content search apparatus 100 includes a content database 101, a dictionary database 102, an input unit 103, an input selection unit 104, a content database update unit 105, a related section calculation unit 106, a dictionary update unit 107, an attribute acquisition unit 108, a related keyword acquisition unit 109, an output generation unit 110, and an output unit 111.
- the content database 101 is a database that stores content such as moving images, images, music, and texts to be searched, and content-attached information that indicates the content of the content.
- the content ancillary information refers to information indicating content details such as keywords and content attributes.
- the content attribute means category information for classifying content. For example, in the case of content related to a television program, “genre” described in EPG (Electronic Program Guide) is the content attribute.
- Fig. 2 shows an example of content ancillary information stored in the content database.
- the content database 101 stores content ancillary information including a content ID 20, a content attribute 21, a title 22, a broadcast date 23, a keyword 24, and a summary 25 as shown in FIG.
- the broadcast date 23 is an example of time information indicating information related to the time of content.
- the time information is information indicating the time related to the content. Note that the time information does not need to be a broadcast date, and may be the date and time when the content is registered in the content database 101.
- the keyword 24 indicates a word (keyword) indicating the content.
- the keyword 24 stores a keyword attached to the EPG in advance.
- the keyword 24 may store a keyword extracted by executing morphological analysis on the title 22 or the outline 25.
- the dictionary database 102 is a database that stores the degree of association between keywords stored in the content database 101.
- the dictionary database 102 stores a related word dictionary 102a that describes, for each content attribute for classifying content and for each related section representing a predetermined time section, the degree of association between a plurality of keywords corresponding to content whose time information falls within the related section and which belongs to the classification indicated by the content attribute.
- the related word dictionary 102a can store the degree of relevance between keywords using related sections having different lengths of time within content attributes and between content attributes.
- the related section refers to a time section for calculating the degree of association between keywords.
- FIG. 3 shows a conceptual diagram of the related word dictionary 102a in the case where the content database 101 includes contents having four content attributes of “news”, “sports”, “variety”, and “hobbies / education”.
- the related word dictionary 102a is classified into the four content attributes.
- For each content attribute, section dictionaries whose related sections differ in length, such as section dictionary 31 (N1) and section dictionary 32 (N2), are stored.
- each related section has a different length of time for each content attribute (“news”, “sports”, “variety”, and “hobby / education”).
- FIG. 4 shows an example of the related word dictionary 102a stored in the dictionary database 102.
- the related word dictionary 102 a includes a content attribute 41, a related section 42, a keyword 43, a related keyword 44, and a relevance degree 45.
- For example, when the content attribute 41 is “news” and the related section 42 is “2007/9/10 to 2007/9/12”, the relevance degree 45 of the related keyword 44 “autumn” for the keyword 43 “news” is “0.94”.
- the related keyword acquiring unit 109 can acquire the related keyword for the keyword selected by the user.
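As a rough illustration of the related word dictionary layout in FIG. 4, the sketch below stores one row per combination of content attribute, related section, keyword, and related keyword, and looks up related keywords for a selected keyword. The class name, function name, and the second example row (including its relevance value) are hypothetical; only the “news” / “autumn” / 0.94 row comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    content_attribute: str  # field 41, e.g. "news"
    related_section: tuple  # field 42, (start, end) date strings
    keyword: str            # field 43
    related_keyword: str    # field 44
    relevance: float        # field 45


def lookup_related(entries, content_attribute, keyword):
    """Group related keywords by related section, highest relevance first."""
    by_section = {}
    for e in entries:
        if e.content_attribute == content_attribute and e.keyword == keyword:
            by_section.setdefault(e.related_section, []).append(
                (e.related_keyword, e.relevance))
    for pairs in by_section.values():
        pairs.sort(key=lambda p: p[1], reverse=True)
    return by_section


entries = [
    DictionaryEntry("news", ("2007/9/10", "2007/9/12"), "news", "autumn", 0.94),
    # Hypothetical extra row, not taken from the source.
    DictionaryEntry("news", ("2007/9/10", "2007/9/12"), "news", "typhoon", 0.50),
]
result = lookup_related(entries, "news", "news")
```

Keying the result by related section mirrors the way related keywords are later presented per related section on the search screen.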
- the input unit 103 shown in FIG. 1 receives information related to user operation input and content database update, and notifies the input selection unit 104 of the received information.
- the input selection unit 104 selects whether the information received from the input unit 103 is information indicating “keyword selection”, “content selection”, or “content database update”. The selection method will be described later.
- the content database update unit 105 updates the content stored in the content database 101 and the content attached information when the input selection unit 104 selects the information received from the input unit 103 as “content database update”.
- the content database update unit 105 copies all acquisition target data distributed by the content server to the content database 101. That is, all the data held before the update is once deleted and overwritten newly.
- When the present invention is used in a television broadcast viewing reservation application, only data after the update date and time is stored in the content database 101 on the device side, owing to the nature of television broadcast data (there is no data from before the broadcast date).
- storage-type content such as VOD (Video on Demand)
- the related section calculation unit 106 calculates a new related section for each content attribute by referring to the content database 101 and the dictionary database 102 when the content database update unit 105 updates the content database 101. Specifically, for each content attribute, the related section calculation unit 106 calculates the degree of difference between a plurality of keywords (first keywords) indicating the contents of the content newly stored in the content database 101 (first content) and a plurality of keywords (second keywords) indicating the contents of the content already stored in the content database 101 (second content). The related section calculation unit 106 then calculates a new related section based on whether the calculated degree of difference between the first keywords and the second keywords satisfies a predetermined reference value. That is, the smaller the degree of difference between the first keywords and the second keywords, the more the new related section is calculated so that the first content and the second content fall within the same time section. A detailed calculation method of the related section will be described later.
- the dictionary update unit 107 calculates the degree of association between keywords in the content included in the new related section calculated by the related section calculation unit 106. Then, the dictionary updating unit 107 registers the calculated degree of association of the new related section in the related word dictionary 102a together with the keyword and the related keyword. Note that the degree of association between keywords is calculated based on the co-occurrence of words (the degree to which two words appear in the same content). Accordingly, the value of the degree of association increases as the combination of keywords appearing more frequently in the same content.
- For example, the degree of association is calculated using the method described in Non-Patent Document 1, “Metal Space generation method for associative search based on relevance between words appearing in a document” (Hidenori Honma et al., 16th Data Engineering Workshop (DEWS2005), 6A-o2, The Institute of Electronics, Information and Communication Engineers, 2005), or a similar method.
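The text states only that the degree of association grows with how often two keywords appear in the same content; it gives no formula here. The following is a minimal sketch under an assumed Jaccard-style measure (number of contents containing both keywords divided by the number containing either). It is a stand-in for illustration, not the method of Non-Patent Document 1, and the function name and sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_relevance(contents):
    """contents: list of keyword collections, one per content item.

    Returns {(kw_a, kw_b): score} with kw_a < kw_b, where score is the
    assumed Jaccard-style ratio |both| / |either|.
    """
    appear = Counter()    # number of contents containing each keyword
    together = Counter()  # number of contents containing both keywords
    for keywords in contents:
        unique = sorted(set(keywords))
        appear.update(unique)
        together.update(combinations(unique, 2))
    return {
        pair: n / (appear[pair[0]] + appear[pair[1]] - n)
        for pair, n in together.items()
    }


scores = cooccurrence_relevance([
    ["news", "typhoon"],
    ["news", "typhoon", "weather"],
    ["news", "soccer"],
])
```

In this sample, “news” and “typhoon” co-occur in two of the three contents and so score higher than “news” and “soccer”, which co-occur only once.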
- the attribute acquisition unit 108 acquires the content attribute of the keyword selected by the user in the input unit 103 when the input selection unit 104 selects the information received from the input unit 103 as “keyword selection”. A content attribute determination method will be described later.
- the related keyword acquisition unit 109 refers to the related word dictionary 102a to acquire the related keywords and degrees of association that correspond to the content attribute acquired by the attribute acquisition unit 108 and the keyword selected by the user using the input unit 103.
- the output generation unit 110 generates output information for displaying the related keywords acquired by the related keyword acquisition unit 109, for each related section, according to their degree of association. For example, the output generation unit 110 generates output information for displaying related keywords in descending order of relevance. The output generation unit 110 may also generate output information that displays a related keyword with a higher degree of relevance in larger characters. Further, when the input selection unit 104 selects the information received from the input unit 103 as “content selection”, the output generation unit 110 generates output information for displaying content, such as a program, corresponding to the information input by the user through the input unit 103.
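The two display options mentioned above (ordering by relevance and larger characters for higher relevance) can be sketched as follows. This assumes relevance values lie between 0 and 1 and maps them linearly to a font size; the function name, the point sizes, and the sample relevance values are all hypothetical.

```python
def build_output(related_by_section, min_pt=12, max_pt=24):
    """related_by_section: {section_label: [(keyword, relevance), ...]}.

    Returns {section_label: [(keyword, font_size_pt), ...]}, ordered from
    the most to the least relevant keyword. Point sizes are assumed values.
    """
    output = {}
    for section, pairs in related_by_section.items():
        ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
        output[section] = [
            (kw, round(min_pt + (max_pt - min_pt) * rel))
            for kw, rel in ranked
        ]
    return output


screen = build_output(
    {"2007/9/10 to 2007/9/12": [("soccer", 0.5), ("typhoon", 1.0)]}
)
```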
- the output unit 111 outputs the output information generated by the output generation unit 110 to an output medium.
- As the output medium, for example, a monitor such as a television is used.
- FIG. 5 is a flowchart showing the overall processing flow executed by the content search apparatus 100 of FIG.
- the input unit 103 receives an operation input from the user and notifies the input selection unit 104 of the received information (step S101).
- the input selection unit 104 sorts the information notified from the input unit 103 according to which process it indicates: “keyword selection”, “content selection”, or “content database update” (step S102).
- When the information is sorted as “keyword selection”, the attribute acquisition unit 108 acquires the keyword selected by the user in the input unit 103 and the content attribute of that keyword (step S108).
- the related keyword acquisition unit 109 acquires a related keyword based on the acquired content attribute and the related word dictionary 102a (step S109).
- the output generation unit 110 generates output information for displaying the acquired related keywords.
- the output unit 111 outputs the output information generated by the output generation unit 110 to the output medium (step S111), and ends the process.
- When the information is not “keyword selection”, it is determined whether the input selection unit 104 has selected the information received from the input unit 103 as “database update” (step S104).
- In step S104, when the input selection unit 104 has selected the information received from the input unit 103 as “database update” (Yes in step S104), the content database update unit 105 updates the content database 101 (step S105). Subsequently, the related section calculation unit 106 calculates a related section to be set in the related word dictionary 102a (step S106). The detailed processing flow of step S106 will be described later. Then, based on the calculated related section, the dictionary update unit 107 updates the related word dictionary 102a (step S107) and ends the process.
- In step S104, when the input selection unit 104 has not selected the information received from the input unit 103 as “database update” (No in step S104), that is, when it has selected the information as “content selection”, the output generation unit 110 generates output information for displaying a program corresponding to the information input by the user through the input unit 103 (step S110).
- the output unit 111 outputs the output information generated by the output generation unit 110 to the output medium (step S111), and ends the process.
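The overall flow of FIG. 5 amounts to a three-way dispatch on the kind of input. A minimal sketch, assuming each unit is represented by a callable in a dictionary; the `units` keys and the function name are hypothetical stand-ins for the units described above, and the step numbers in comments are those given in the text:

```python
def handle_input(kind, payload, units):
    """Route one input to the matching processing path (steps S102 to S111).

    kind: "keyword_selection", "content_selection", or "database_update".
    units: dict of hypothetical callables standing in for the units.
    """
    if kind == "keyword_selection":
        attribute = units["acquire_attribute"](payload)         # step S108
        related = units["acquire_related"](payload, attribute)  # step S109
        info = units["generate_keyword_view"](related)
        return units["output"](info)                            # step S111
    if kind == "database_update":
        units["update_content_db"](payload)                     # step S105
        sections = units["calc_related_sections"]()             # step S106
        units["update_dictionary"](sections)                    # step S107
        return None
    if kind == "content_selection":
        info = units["generate_content_view"](payload)          # step S110
        return units["output"](info)                            # step S111
    raise ValueError(f"unknown input kind: {kind!r}")
```

Passing the units in as callables keeps the dispatch logic separate from the database and display details, which the text describes unit by unit.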
- FIG. 6 is a flowchart showing a flow of processing (step S106 shown in FIG. 5) related to related section calculation by the related section calculation unit 106.
- the related section calculation unit 106 acquires the related section updated last time in the related word dictionary 102a (hereinafter referred to as the last updated section) for each content attribute (step S201).
- the related section calculation unit 106 creates a keyword list, which is a list of keywords of the acquired previous update section, for each content attribute (step S202).
- the keyword in the previous update section corresponds to a plurality of second keywords indicating the content of the second content stored in the content database 101.
- the related section calculation unit 106 acquires a keyword list that is a list of keywords of content newly added to the content database 101 for each content attribute (step S203).
- the newly added content keyword corresponds to a plurality of first keywords indicating the contents of the first content stored in the content database 101.
- the related interval calculation unit 106 compares the keyword list created in step S202 with the keyword list created in step S203, and calculates the change rate of the keyword configuration (step S204).
- the change rate of the keyword configuration is an example of the degree of difference.
- When the calculated change rate indicates, relative to the predetermined reference value, that the keyword composition has changed substantially, the related section calculation unit 106 calculates the time interval corresponding to the content newly added to the content database 101 as a new related section (step S206). That is, the related section calculation unit 106 calculates the time interval corresponding to the first content as a new related section.
- the time interval corresponding to the content indicates a time interval including the time indicated by the time information of the content. For example, when content broadcast on September 10th and 11th, 2007 is newly added to the content database 101, the time interval corresponding to the content is September 10th to 11th, 2007.
- Otherwise, the related section calculation unit 106 calculates, as a new related section, the time interval obtained by combining the previous update section and the time interval corresponding to the content newly added to the content database 101 (step S207). That is, the related section calculation unit 106 calculates a time interval that includes both the time interval corresponding to the first content and the time interval corresponding to the second content as a new related section. After the related section calculation unit 106 has calculated the related section in this way, the process of step S107 shown in FIG. 5 is executed.
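The choice between steps S206 and S207 can be sketched as a single comparison. The threshold value and the direction of the comparison (`>=`) are assumptions, since the text only refers to a predetermined reference value; the function name and the previous update section dates are hypothetical, while the added section uses the September 10 to 11, 2007 example from the text.

```python
from datetime import date


def new_related_section(previous_section, added_section, change_rate,
                        threshold=0.5):
    """Return the new related section as a (start, end) pair of dates.

    threshold is an assumed reference value; the source does not give one.
    """
    if change_rate >= threshold:
        # Keyword composition changed a lot: use only the interval of the
        # newly added content (step S206).
        return added_section
    # Keyword composition is similar: combine the previous update section
    # with the interval of the newly added content (step S207).
    return (min(previous_section[0], added_section[0]),
            max(previous_section[1], added_section[1]))


previous = (date(2007, 9, 7), date(2007, 9, 9))   # hypothetical
added = (date(2007, 9, 10), date(2007, 9, 11))    # example from the text
fresh = new_related_section(previous, added, change_rate=0.8)
merged = new_related_section(previous, added, change_rate=0.1)
```

Under these assumptions a fast-changing attribute such as news keeps producing short sections, while a slow-changing attribute accumulates one long section.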
- FIG. 7 is a flowchart showing a flow of processing (step S204 shown in FIG. 6) relating to the change rate calculation by the related interval calculation unit 106.
- the related section calculation unit 106 acquires, from the keyword list of the additional content created in step S203, a keyword (new keyword candidate) for which the following processing (steps S302 to S308) has not yet been executed (step S301). The related section calculation unit 106 then acquires, from the keyword list of the previous update section created in step S202, a keyword (comparison keyword) for which the following processing (steps S303 to S305) has not yet been executed (step S302).
- the related section calculation unit 106 determines whether the new keyword candidate and the comparison keyword acquired in step S301 and step S302 partially match (step S303). Note that partial matching means that 80% or more of the characters match in a keyword of 4 or more characters.
- In step S303, when the new keyword candidate and the comparison keyword partially match (Yes in step S303), the related section calculation unit 106 determines that the new keyword candidate is not a new keyword (step S308). On the other hand, when the new keyword candidate and the comparison keyword do not partially match (No in step S303), the related section calculation unit 106 determines whether the new keyword candidate and the comparison keyword match as synonyms (step S304). Synonym matching means that a synonym of the new keyword candidate matches the comparison keyword.
- When the new keyword candidate and the comparison keyword match as synonyms (Yes in step S304), the related section calculation unit 106 determines that the new keyword candidate is not a new keyword (step S308). On the other hand, when they do not match as synonyms (No in step S304), the related section calculation unit 106 determines whether the new keyword candidate and the comparison keyword match under notation fluctuation (step S305).
- A notation fluctuation match means that a keyword obtained by rewriting the new keyword candidate in hiragana, katakana, kanji, or romaji matches the comparison keyword.
- In step S305, if the new keyword candidate and the comparison keyword match under notation fluctuation (Yes in step S305), the related section calculation unit 106 determines that the new keyword candidate is not a new keyword (step S308). On the other hand, when they do not match under notation fluctuation (No in step S305), the related section calculation unit 106 determines whether all keywords included in the keyword list of the previous update section have been acquired (step S306).
- In step S306, when not all the keywords included in the keyword list of the previous update section have been acquired (No in step S306), the process is repeated from the keyword acquisition in step S302.
- When all of those keywords have been acquired (Yes in step S306), the related section calculation unit 106 determines that the new keyword candidate is a new keyword (step S307).
- the related section calculation unit 106 then determines whether all keywords included in the keyword list of the additional content have been acquired (step S309).
- When not all of them have been acquired (No in step S309), the process is repeated from the keyword acquisition in step S301.
- When all of them have been acquired (Yes in step S309), the related section calculation unit 106 calculates the change rate by dividing the number of keywords determined to be new keywords in step S307 by the number of keywords included in the keyword list of the previous update section (step S310).
- step S107 shown in FIG. 5 is executed.
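The new-keyword determination in steps S301 to S310 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the tables `SYNONYMS` and `NOTATION_VARIANTS` are hypothetical stand-ins for a synonym dictionary and a notation-variation table, and "partial match" is assumed here to mean that one keyword contains the other.

```python
from typing import Iterable

# Hypothetical lookup tables (assumptions for illustration only).
SYNONYMS = {"premier": {"prime minister"}}
NOTATION_VARIANTS = {"tokyo": {"東京", "とうきょう"}}


def partial_match(candidate: str, comparison: str) -> bool:
    """Step S303 (assumed meaning): one keyword contains the other."""
    return candidate in comparison or comparison in candidate


def synonym_match(candidate: str, comparison: str) -> bool:
    """Step S304: a synonym of the candidate equals the comparison keyword."""
    return comparison in SYNONYMS.get(candidate, set())


def notation_match(candidate: str, comparison: str) -> bool:
    """Step S305: a rewritten form of the candidate equals the comparison keyword."""
    return comparison in NOTATION_VARIANTS.get(candidate, set())


def is_new_keyword(candidate: str, previous_keywords: Iterable[str]) -> bool:
    """Steps S302 to S308: new only if no comparison keyword matches."""
    for comparison in previous_keywords:
        if (partial_match(candidate, comparison)
                or synonym_match(candidate, comparison)
                or notation_match(candidate, comparison)):
            return False  # step S308: not a new keyword
    return True  # step S307: every comparison keyword was checked


def change_rate(added_keywords: Iterable[str], previous_keywords: Iterable[str]) -> float:
    """Step S310: number of new keywords divided by the size of the previous list."""
    previous = list(previous_keywords)
    if not previous:
        raise ValueError("the previous keyword list must not be empty")
    new_count = sum(is_new_keyword(k, previous) for k in added_keywords)
    return new_count / len(previous)
```

With a previous list of four keywords and one genuinely new keyword among the added ones, `change_rate` returns 0.25.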
- FIGS. 8 to 10 are diagrams showing the transition of screens output by the content search when a user wants to watch a program related to a news program from among the TV programs that can be viewed.
- when starting a search, the content search device 100 presents the initial search screen shown in FIG. 8 to the user.
- the initial search keyword presented in the initial search screen is a keyword indicating a genre such as “sports” or “documentary”, for example.
- the user selects “news” from the initial search keywords.
- FIG. 9 shows a search screen presented by the content search device 100 after the user selects an initial search keyword.
- a content list 70 and a related keyword list 71 related to the selected keyword (news) are presented.
- in the related keyword list 71, related keywords are presented for each related section in descending order of relevance.
- the user selects content from the content list 70 when the search screen shows content that the user wants to view.
- otherwise, the user selects a keyword related to the content to be viewed from the related keyword list 71.
- when content is selected, the content search device 100 displays the selected content and ends the search process.
- when a keyword is selected, the content search apparatus 100 again presents a screen displaying a content list and related keywords based on the selected keyword. For example, when the user selects “National Diet” 72 from the related keyword list 71 on the search screen shown in FIG. 9, a content list and the keywords to be presented for each related section are displayed as shown in FIG. 10.
- the user searches for the desired content by repeatedly selecting related keywords presented by the system.
- in step S101 in FIG. 5, the input unit 103 receives information input to the system by the user. Specifically, keywords such as “news” selected by the user on the initial search screen of FIG. 8 and “National Diet” selected on the search screen of FIG. 9 are input information. Content selected by the user from the content list 70 shown in FIG. 9 is also input information. Furthermore, although not shown, when the user selects a content database update, that selection is also input information.
- the content search apparatus 100 updates the content database 101 when there is such a user input.
- the content search apparatus 100 may also update the content database 101 at an arbitrary time.
- for example, the content search device 100 may update the content database 101 when new content is input to it. In such a case, the input of new content to the content database 101 becomes input information.
- in step S102 of FIG. 5, the input selection unit 104 sorts the input information received in step S101 into one of “keyword selection”, “content selection”, and “content database update”. Specifically, the input is sorted as “keyword selection” when, for example, the user selects a keyword from the related keyword list 71 on the search screen of FIG. 9. It is sorted as “content selection” when, for example, the user selects content from the content list 70 on the search screen illustrated in FIG. 9. It is sorted as “content database update” when, for example, the user chooses to update the content database (not shown).
- in step S103 of FIG. 5, the input selection unit 104 determines whether the result of the sorting in step S102 is “keyword selection”. If so, the input selection unit 104 passes the selected keyword to the attribute acquisition unit 108, and the process proceeds to step S108. Otherwise, the process proceeds to step S104. Specifically, for example, both the case where the user selects the keyword “news” on the initial search screen of FIG. 8 and the case where the user selects the keyword “National Diet” on the search screen of FIG. 9 are sorted as “keyword selection”. The input selection unit 104 then passes the keyword “news” or “National Diet” to the attribute acquisition unit 108, and the process proceeds to step S108.
- in step S104, the input selection unit 104 determines whether or not the result of the sorting in step S102 is “content database update”. If the input selection unit 104 determines that the information received from the input unit 103 is “content database update”, the process proceeds to step S105. On the other hand, if it determines that the information is not “content database update”, that is, if the information was sorted as “content selection” in step S102, the input selection unit 104 acquires the content ID corresponding to the content selected by the user from the content database 101 and passes the acquired content ID to the output generation unit 110. Thereafter, the process proceeds to step S110.
- for example, the input selection unit 104 acquires the content ID corresponding to the selected program from the content database 101 and passes the acquired content ID to the output generation unit 110. Thereafter, the process proceeds to step S110.
- when the information is “content database update”, the input selection unit 104 passes the update data to the content database update unit 105. Then, the process proceeds to step S105.
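The sorting and branching performed by the input selection unit 104 can be pictured as a small dispatcher. The sketch below is illustrative only; the names `InputKind`, `InputInfo`, and `dispatch` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from enum import Enum


class InputKind(Enum):
    KEYWORD_SELECTION = "keyword selection"
    CONTENT_SELECTION = "content selection"
    CONTENT_DATABASE_UPDATE = "content database update"


@dataclass
class InputInfo:
    kind: InputKind   # result of the sorting in step S102
    payload: object   # selected keyword, selected content, or update data


def dispatch(info: InputInfo) -> str:
    """Return the step that handles the input next (branches of steps S103 and S104)."""
    if info.kind is InputKind.KEYWORD_SELECTION:
        return "S108"  # keyword is passed to the attribute acquisition unit 108
    if info.kind is InputKind.CONTENT_DATABASE_UPDATE:
        return "S105"  # update data is passed to the content database update unit 105
    return "S110"      # content ID is passed to the output generation unit 110
```

For instance, `dispatch(InputInfo(InputKind.KEYWORD_SELECTION, "news"))` yields `"S108"`.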
- in step S105, the content database update unit 105 adds the update data acquired by the process in step S104 to the content database 101.
- the related section calculation unit 106 calculates the related section based on the update data newly added to the content database 101 by the content database update unit 105 in step S105.
- step S106 will be described in detail below.
- the related interval calculation unit 106 acquires the previously updated related interval included in the related word dictionary 102a for each content attribute such as “news” and “sports”. Specifically, the related section calculation unit 106 acquires the latest related section 42 for each content attribute from the data stored in the related word dictionary 102a illustrated in FIG. As shown in FIG. 11A, the related section acquired here is the last update section 1001 (tn-2 to tn-1).
- the related section calculation unit 106 creates a keyword list of contents included in the related section acquired in step S201 for each content attribute. Specifically, for example, the related section calculation unit 106 may acquire the keyword 43 corresponding to the previous update section 1001 for each content attribute with reference to the related word dictionary 102a illustrated in FIG.
- in step S203 in FIG. 6, the related section calculation unit 106 creates a keyword list corresponding to the update data acquired in step S104 in FIG. 5 for each content attribute.
- the keyword list in the previous update section and the keyword list corresponding to the newly updated content are created for each content attribute by the processing in steps S201 to S203 in FIG.
- in step S204, the related section calculation unit 106 compares the keyword list created in step S202 with the keyword list created in step S203, and calculates the change rate of the keyword configuration.
- the change rate of the keyword configuration is an example of the degree of difference.
- the change rate is a value calculated with the number of keywords in the keyword list created in step S203 that are not included in the keyword list of the previous update section 1001 as the numerator, and the number of keywords included in the keyword list of the previous update section 1001 as the denominator.
- for example, suppose that, in the keyword list created in step S203 from the content newly updated this time, the number of new keywords not included in the keyword list of the previous update section 1001 is 40, and that the keyword list of the previous update section 1001 contains 200 keywords.
- in this case, the change rate of the keyword configuration is 40 / 200 = 0.2.
- the number of new keywords is calculated based on the processing shown in FIG.
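Written as a formula, the change rate described above takes the following form. The symbols are assumptions introduced here for illustration: K_add is the keyword list obtained from the newly updated content, K_prev is the keyword list of the previous update section 1001, and membership is judged by the new-keyword determination rather than by exact string equality.

```latex
r = \frac{\left|\{\, k \in K_{\mathrm{add}} : k \notin K_{\mathrm{prev}} \,\}\right|}{\left|K_{\mathrm{prev}}\right|}
```

With the 40 new keywords of the example, a change rate of r = 0.2 corresponds to a previous list of 40 / 0.2 = 200 keywords.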
- in step S205 in FIG. 6, the related section calculation unit 106 determines whether or not the change rate of the keyword configuration acquired in step S204 is equal to or greater than a predetermined threshold. If the change rate is equal to or greater than the predetermined threshold, in step S206 in FIG. 6, the related section calculation unit 106 calculates only the time section corresponding to the newly added content as a new related section.
- on the other hand, if the change rate is below the predetermined threshold, in step S207 in FIG. 6, the related section calculation unit 106 calculates, as a new related section, the time section obtained by adding together the previous update section and the time section corresponding to the content newly updated this time.
- that is, when the related section calculation unit 106 determines that the change rate of the keyword configuration is equal to or greater than the threshold, a new related section 1003 (tn-1 to tn) identical to the additional section 1002 is calculated as the new related section; when the change rate falls below the threshold, a new related section 1004 (tn-2 to tn) obtained by adding together the previous update section 1001 and the additional section 1002 is calculated as the new related section.
- in order for the content search apparatus 100 to present keywords that reflect current affairs, it is preferable to generate related keywords using the related word dictionary 102a in which the related section changes according to changes in the keyword configuration.
- since the related section calculation unit 106 can calculate the related section based on the change rate of the keyword configuration for each content attribute, the content search apparatus 100 can present keywords that reflect current affairs.
- in addition, the related section calculation unit 106 can calculate the related section of the related word dictionary 102a corresponding to the content newly added to the content database 101.
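The decision in steps S205 to S207 reduces to a comparison against the threshold followed by either replacing or extending the section. The following is a minimal sketch under stated assumptions: sections are represented as hypothetical `(start, end)` date pairs, and the comparison uses "equal to or greater than", as in the description of step S205.

```python
from datetime import date
from typing import Tuple

Section = Tuple[date, date]  # (start, end) of a time section; representation is an assumption


def new_related_section(previous: Section, added: Section,
                        change_rate: float, threshold: float) -> Section:
    """Steps S205 to S207 in FIG. 6."""
    if change_rate >= threshold:
        # Step S206: keyword configuration changed, so the additional
        # section alone becomes the new related section.
        return added
    # Step S207: little change, so the previous update section and the
    # additional section are joined into one related section.
    return (previous[0], added[1])
```

With a previous section of September 1 to 9 and an additional section of September 10 to 12 (hypothetical dates), a change rate at or above the threshold keeps only September 10 to 12, while a lower change rate yields September 1 to 12.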
- the number of keywords included in the two keyword lists to be compared is not particularly limited.
- the change rate may be calculated only for a predetermined number of keywords having a high appearance frequency.
- in this case, the related section calculation unit 106 compares the top n keywords by appearance frequency in the keyword list created in step S202 with the top n keywords by appearance frequency in the keyword list created in step S203 to calculate the change rate of the keyword configuration. For example, if 40 of the 100 most frequent keywords in the keyword list created in step S203 are new keywords not included in the 100 most frequent keywords in the keyword list of the previous update section, the change rate of the keyword configuration is 0.4.
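The top-n variant can be sketched with a frequency counter. This is an illustration, not the patent's procedure: keywords are compared by exact string equality for brevity (the partial, synonym, and notation-variation checks are omitted), and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def top_n(keywords: Iterable[str], n: int) -> List[str]:
    """Return the n most frequent keywords (ties keep first-seen order)."""
    return [k for k, _ in Counter(keywords).most_common(n)]


def top_n_change_rate(added: Iterable[str], previous: Iterable[str], n: int) -> float:
    """Change rate restricted to the n most frequent keywords of each list."""
    prev_top = set(top_n(previous, n))
    if not prev_top:
        raise ValueError("the previous keyword list must not be empty")
    new_count = sum(1 for k in top_n(added, n) if k not in prev_top)
    return new_count / len(prev_top)
```

In the example from the text, 40 new keywords among the top 100 give 40 / 100 = 0.4.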
- in the above description, the related section calculation unit 106 acquires the latest related section of the related word dictionary 102a as the previous update section. Alternatively, a time section of a predetermined length included in the time section corresponding to the content updated last time in the content database 101 may be set as the previous update section.
- for example, the related section calculation unit 106 acquires the keyword comparison section 1103 (tn-2 to tn-1) shown in FIG. 12B.
- specifically, the related section calculation unit 106 acquires, from the previous content update section, a time section corresponding to the minimum time unit on the side closer to the additional section 1102.
- comparing change rates in the minimum time unit makes it possible to set related sections that follow minute changes in keyword relevance.
- the content search device can always present a new related keyword to the user.
- in step S107 of FIG. 5, the dictionary update unit 107 updates the related word dictionary based on the related section calculated in step S106.
- the dictionary creation method is as described in FIG.
- in step S108 of FIG. 5, the attribute acquisition unit 108 determines the content attribute of the keyword acquired in step S103. Then, the attribute acquisition unit 108 passes the determined content attribute to the related keyword acquisition unit 109. Thereafter, the process proceeds to step S109.
- as for content attribute determination, when the keywords presented on the initial search screen in FIG. 8 are shared with the content attributes, the attribute acquisition unit 108 may simply determine the keyword selected by the user on the initial search screen to be the content attribute.
- in this case, the content attribute of a related keyword selected by the user on the search screen shown in FIG. 9 is determined to be “news”. This narrows the search within the content attribute selected first, and is effective for a narrowing search.
- in step S109, the related keyword acquisition unit 109 refers to the dictionary database 102 and acquires the related keywords corresponding to the keyword acquired in step S103 and the keyword attribute acquired in step S108. Then, the related keyword acquisition unit 109 passes the acquired related keywords to the output generation unit 110. Thereafter, the process proceeds to step S110. Specifically, for example, when the user selects “news” on the initial search screen of FIG. 8 and subsequently selects “National Diet” on the search screen of FIG. 9, the attribute acquisition unit 108 determines the keyword attribute to be “news”. Then, the related keyword acquisition unit 109 refers to the related word dictionary 102a illustrated in FIG.
- as a result, the related keyword acquisition unit 109 acquires the keywords “primary speech”, “politics”, and “pension” in the related section from September 10 to 12, 2007.
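The lookup in step S109 can be illustrated with an in-memory table keyed by content attribute and keyword. This is a sketch under assumptions: the table layout is hypothetical and the relevance scores are invented for illustration; only the keywords and the related section come from the example in the text.

```python
from datetime import date

# Hypothetical stand-in for the related word dictionary 102a.
# Key: (content attribute, keyword).
# Value: list of (related section, [(related keyword, relevance), ...]).
RELATED_WORD_DICTIONARY = {
    ("news", "National Diet"): [
        ((date(2007, 9, 10), date(2007, 9, 12)),
         [("politics", 0.7), ("primary speech", 0.9), ("pension", 0.5)]),  # scores are made up
    ],
}


def related_keywords(attribute: str, keyword: str):
    """Return, per related section, the related keywords in descending order of relevance."""
    result = []
    for section, scored in RELATED_WORD_DICTIONARY.get((attribute, keyword), []):
        ranked = sorted(scored, key=lambda item: item[1], reverse=True)
        result.append((section, [word for word, _ in ranked]))
    return result
```

Calling `related_keywords("news", "National Diet")` returns one entry for the section from September 10 to 12, 2007; an attribute with no entry for the keyword returns an empty list.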
- in step S110 of FIG. 5, when related keywords have been acquired in step S109, the output generation unit 110 uses the acquired related keywords and the content database 101 to generate output information for outputting a search screen such as that shown in FIG. 9. On the other hand, when a content ID has been acquired in step S104, the output generation unit 110 generates output information for displaying the content using the acquired content ID and the content database 101.
- in step S111 of FIG. 5, the output unit 111 outputs the output information generated in step S110 to a monitor or the like.
- the content search apparatus 100 can refer to the related word dictionary 102a, which has different related sections for each content attribute, and can therefore present to the user related keywords adapted to current affairs that differ for each content attribute.
- the attribute acquisition unit 108 may acquire content attributes using a method different from the method described above.
- the attribute acquisition unit 108 may acquire, from among the content attributes that contain the acquired keyword, a plurality of content attributes in which the keyword has a high appearance frequency. For example, suppose that the keyword “National Diet” is present in the two content attributes “News” and “Variety”, and that the appearance-frequency ranking of the keyword “National Diet” in each of these content attributes is equal to or higher than a predetermined threshold.
- in this case, both content attributes may be acquired as keyword attributes.
- An example of the screen output in this case is shown in FIG. 13. As illustrated in FIG. 13, the output unit 111 outputs related keywords for the keyword “National Diet” for each of the content attributes “News” and “Variety”.
- the user can thus select a related keyword for each content attribute. Therefore, the content search apparatus 100 can avoid presenting related keywords that the user does not intend (for example, presenting related keywords for “News” when the user wants related keywords for “Variety”). As a result, the content search device 100 can reduce the number of search steps required of the user.
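Acquiring several content attributes for one keyword can be sketched as a rank test per attribute. The sketch assumes that "ranking equal to or higher than a predetermined threshold" means the keyword falls within the top `max_rank` keywords by appearance frequency; the function name and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def attributes_for_keyword(keyword: str,
                           keywords_by_attribute: Dict[str, List[str]],
                           max_rank: int) -> List[str]:
    """Return the attributes in which the keyword ranks within the top max_rank."""
    selected = []
    for attribute, keywords in keywords_by_attribute.items():
        # Keywords ordered from most to least frequent within this attribute
        ranking = [k for k, _ in Counter(keywords).most_common()]
        if keyword in ranking and ranking.index(keyword) < max_rank:
            selected.append(attribute)
    return selected
```

If “National Diet” ranks first in “News” and second in “Variety” but only third in another attribute, a `max_rank` of 2 selects “News” and “Variety”, and related keywords can then be looked up for each of them.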
- FIG. 14 shows an output example in which the related keywords output by the content search apparatus of the present embodiment and the related keywords output using the fixed sections described in Patent Document 2 are displayed side by side.
- FIG. 14A is an example of a related keyword presentation screen output by a content search device according to the prior art. As shown in the figure, related keywords are generated for each of the time intervals 121, 122, and 123, which are obtained by dividing the data from August 13 to September 12, 2007 into 10-day intervals.
- FIG. 14B is an example of a related keyword presentation screen output by the content search apparatus 100 according to the present embodiment. As shown in the figure, each related keyword is generated in time intervals 124, 125, and 126 calculated based on the change rate of the keyword configuration for each attribute.
- when the time interval for creating related keywords is short relative to the frequency of changes in the data content (for example, when the keyword configuration does not change for 20 days), the content search apparatus outputs the same keyword in a plurality of time intervals, like “election” in the time interval 121 and the time interval 122 in FIG. 14A.
- outputting the same keyword multiple times on the same screen narrows the range of choices available to the user. As a result, if the user wants to select another keyword, extra search steps are likely to be required.
- the content search device 100 determines a time interval in which a related keyword is generated in response to a change in keyword configuration for each content attribute. Therefore, the content search apparatus 100 can reduce the possibility of outputting a plurality of the same keyword in different time intervals. That is, as shown in a time interval 124 in FIG. 14B, a time interval in which the change in keyword configuration is small becomes one time interval 124. As a result, the keyword presented in the time interval 124 is different from the keyword presented in the time interval 125 adjacent to the time interval 124.
- when the time interval for creating related keywords is long relative to the frequency of changes in the data content (for example, when the keyword configuration changes every 5 days), keywords that were highly relevant before the keyword configuration changed are presented preferentially. Therefore, the content search apparatus cannot present keywords that match current affairs. That is, as shown in the time interval 123 of FIG. 14A, the keyword “America”, which had a higher degree of relevance before the change in keyword configuration, is ranked above the keyword “prime address”, which matches the situation after the change. In this case as well, when the user wants to select another keyword, the number of search steps is likely to increase.
- the content search device 100 changes the time interval for generating the related keyword for each content attribute according to the change rate of the keyword configuration. Therefore, the content search apparatus 100 can present related keywords that are suitable for current affairs. That is, as shown in the time interval 125 and the time interval 126 in FIG. 14B, the content search apparatus 100 can change the time interval around September 10, 2007 when the keyword configuration has changed greatly. In the time interval 126, it becomes possible to present the keyword “prime address” that is suitable for current events.
- the content search device updates the related word dictionary used for generating related keywords based on a related section determined in accordance with the degree of difference in keyword configuration between newly added content and already stored content, so related keywords suited to current affairs can be presented to the user efficiently. As a result, when searching for content in a content database that includes content with a plurality of content attributes whose current affairs differ, the user can narrow down the content interactively by repeatedly selecting related keywords.
- the content search device according to this modification differs from the content search device 100 according to the first embodiment shown in FIG. 1 in that the content database update unit 105 generates the content attribute 21.
- the content database update unit 105 generates a “cluster label” as the content attribute 21 by clustering the content stored in the content database 101. Then, the content database update unit 105 registers the generated content attribute 21 in the content database 101. Because the content database update unit 105 performs clustering in this way, the content search device can classify content with similar program substance, such as a cluster (content set) of sports programs or a cluster of movie programs, into the same group. That is, the “cluster label” is information equivalent to the “genre” of the EPG, and is an example of the content attribute 21.
- Non-Patent Document 2 Information Search and Language Processing
- FIG. 15 is a conceptual diagram of processing in which the content database update unit 105 generates a cluster label as the content attribute 21.
- the content database update unit 105 generates a plurality of clusters by performing clustering using keywords or the like included in the content ancillary information stored in the content database 101. Then, the content database update unit 105 generates a cluster label corresponding to each generated cluster. For example, the content database update unit 105 gives randomly generated cluster labels (CL1, CL2, CL3, and CL4) to the clusters. As a result, one of the cluster labels is assigned to every content ID stored in the content database 101. Then, the content database update unit 105 registers the generated cluster label as the content attribute 21 in the content database 101.
- the content database update unit 105 can automatically register the content attribute 21 corresponding to the “genre” of the EPG. Therefore, the content search apparatus according to this modification can output related keywords even in the content database 101 in which the content attributes are not registered in advance.
- the content database update unit 105 generates a cluster label for update data stored in the content database 101 each time the database is updated.
- in this case, the cluster labels (CL21, CL22, and CL23) for the update data and the cluster labels (CL11, CL12, and CL13) for the pre-update data already stored in the content database 101 do not necessarily correspond to each other.
- for example, the cluster label “CL11” is assigned to the sports cluster before the update, but “CL22” is assigned to the sports cluster of the update data. That is, the same label is not assigned to clusters having the same substance.
- therefore, the content database update unit 105 first calculates the degree of similarity between each cluster of the update data and each cluster of the pre-update data. Subsequently, the content database update unit 105 pairs up clusters having high similarity. Then, for each pair, the content database update unit 105 gives the pre-update cluster label to the cluster of the update data.
- the similarity may be calculated by, for example, a method using the cosine measure or the inner product between clusters (Non-Patent Document 3: “Information Retrieval Algorithm”, Kenji Kita et al., Kyoritsu Publishing, pp. 60-63, 2002).
- in this way, the content database update unit 105 can assign the same cluster label to clusters with similar substance before and after the update, so that, for example, the cluster label “CL22” in the above example is converted into “CL11”.
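The label carry-over can be sketched with cosine similarity between keyword-weight vectors of clusters. This is a simplified illustration and not the method of Non-Patent Document 3: each cluster is assumed to be summarized by a hypothetical keyword-to-weight mapping, the cutoff `min_similarity` is an assumed parameter, and the greedy best-match used here could map two new clusters to the same old label.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # keyword -> weight summarizing one cluster (assumed representation)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def relabel(old: Dict[str, Vector], new: Dict[str, Vector],
            min_similarity: float) -> Dict[str, str]:
    """Map each update-data label to the most similar pre-update label, if similar enough."""
    mapping = {}
    for new_label, new_vec in new.items():
        best_label, best_sim = new_label, 0.0
        for old_label, old_vec in old.items():
            sim = cosine(new_vec, old_vec)
            if sim > best_sim:
                best_label, best_sim = old_label, sim
        # Keep the new label when no pre-update cluster is similar enough
        mapping[new_label] = best_label if best_sim >= min_similarity else new_label
    return mapping
```

With a sports-like cluster labeled “CL11” before the update and “CL22” in the update data, the mapping converts “CL22” into “CL11”, while a cluster with no counterpart keeps its own label.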
- the content search device according to this modification differs from the content search device 100 according to the first embodiment in the details of the process executed by the related section calculation unit 106.
- specifically, in step S106 shown in FIG. 5, the related section calculation unit 106 calculates the related section using the document-space similarity between the pre-update content already stored in the content database 101 and the additional content newly added to the content database 101.
- FIG. 17 is a flowchart showing a flow of processing (step S106 shown in FIG. 5) related to related section calculation by the related section calculation unit 106.
- in the processing of step S106 shown in FIG. 5 according to this modification, steps that are the same as those in FIG. 6 are denoted by the same reference numerals, and their detailed description is omitted.
- the related interval calculation unit 106 acquires the previous update interval for each content attribute (step S201).
- the related interval calculation unit 106 creates a document matrix of the acquired content of the previous update interval for each content attribute (step S1701). That is, the related section calculation unit 106 creates a document matrix for each content attribute using a plurality of second keywords indicating the contents of the second content already stored in the content database.
- the created document matrix is referred to as a document matrix group A.
- the document matrix is a matrix that represents keyword frequency information (appearance frequency, tf-idf, etc.) in each content, as shown in FIG.
- the related section calculation unit 106 creates a document matrix of content newly added to the content database 101 for each content attribute (step S1702). That is, the related interval calculation unit 106 creates a document matrix for each content attribute using a plurality of first keywords indicating the contents of the first content newly stored in the content database.
- the created document matrix is referred to as a document matrix group B.
- the related section calculation unit 106 acquires document matrices having the same content attribute from the document matrix groups A and B (step S1703).
- the document matrices acquired from each of the document matrix groups A and B are referred to as document matrices A1 and B1.
- FIG. 19 shows an example of the document matrices A1 and B1 acquired in step S1703.
- the document matrix groups A and B include document matrices having content attributes of “sports” and “movie”, respectively.
- the document matrices A1 and B1 corresponding to the content attribute “sports” are acquired by selecting the document matrix having the content attribute “sports” from the document matrix groups A and B, respectively.
- the related interval calculation unit 106 calculates the similarity of each document matrix using the document matrices A1 and B1 (step S1704).
- the related section calculation unit 106 calculates, for example, the cosine distance of each document vector of the document matrix B1 with respect to the document matrix A1, and calculates, as the similarity, the ratio of the number of documents whose cosine distance is equal to or greater than a threshold to the total number of documents in the document matrix B1.
- if the calculated similarity is smaller than a predetermined threshold (Yes in step S1705), the related section calculation unit 106 calculates the time section corresponding to the content (first content) newly added to the content database 101 as a new related section (step S206). On the other hand, if the calculated similarity is equal to or greater than the predetermined threshold (No in step S1705), the related section calculation unit 106 calculates, as a new related section, the time section obtained by adding together the time section corresponding to the content newly added to the content database 101 and the previous update section (step S207). That is, the related section calculation unit 106 calculates the time section obtained by adding together the time section corresponding to the first content and the time section corresponding to the second content as a new related section.
- the related section calculation unit 106 then determines whether or not all content attributes stored in the content database 101 have been selected in step S1703 (step S1706). If not all content attributes have been selected (No in step S1706), the process returns to step S1703. On the other hand, when all content attributes have been selected (Yes in step S1706), the process in step S107 shown in FIG. 5 is executed.
- the related section calculation unit 106 can calculate a related section according to the similarity in content units. Thereby, even when the related section is calculated in the content database 101 in which the number of keywords greatly differs among the contents, the influence of the content having a large number of keywords can be reduced in calculating the related section. It is possible to prevent erroneous calculation of related sections.
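The similarity of steps S1704 and S1705 can be sketched as follows. Several points are assumptions rather than statements from the text: each row of a document matrix is one content item and each column one keyword, both matrices share the same columns, the "cosine distance with respect to the document matrix A1" is taken against the centroid of A1, and larger values mean more similar (that is, the value behaves like a cosine similarity).

```python
import math
from typing import List, Sequence

Matrix = List[Sequence[float]]  # rows: content items, columns: keyword frequencies


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(matrix: Matrix) -> List[float]:
    """Column-wise mean of a document matrix."""
    return [sum(column) / len(matrix) for column in zip(*matrix)]


def matrix_similarity(a1: Matrix, b1: Matrix, doc_threshold: float) -> float:
    """Step S1704: share of B1 documents whose cosine to A1's centroid meets the threshold."""
    center = centroid(a1)
    close = sum(1 for row in b1 if cosine(row, center) >= doc_threshold)
    return close / len(b1)


def merge_with_previous(similarity: float, section_threshold: float) -> bool:
    """Step S1705: join the sections unless the similarity is below the threshold."""
    return similarity >= section_threshold
```

Because every document in B1 contributes one vote regardless of how many keywords it has, a single keyword-heavy content item cannot dominate the result, which is the benefit described above.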
- the related section calculation unit 106 may calculate the related section in step S106 based on whether or not the degree of difference between the titles of the pre-update content and the additional content satisfies a predetermined reference value. Specifically, the related section calculation unit 106 calculates the matching rate between the titles of the content in the previous update section and the titles of the newly added content. When the matching rate is equal to or higher than a threshold, the related section calculation unit 106 calculates, as a new related section, the time section obtained by adding together the previous update section and the time section corresponding to the content newly added to the content database 101.
- on the other hand, when the matching rate is lower than the threshold, the related section calculation unit 106 calculates the time section corresponding to the content newly added to the content database 101 as a new related section.
- in this way, the related section calculation unit 106 can greatly reduce the processing time for calculating the related section.
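A title-based check needs only set membership, which is why it is cheap. The sketch below assumes that the matching rate is the fraction of newly added titles that also appear, as exact strings, among the titles of the previous update section; the text does not define the rate more precisely, and the function name and sample titles are hypothetical.

```python
from typing import Iterable


def title_match_rate(previous_titles: Iterable[str], added_titles: Iterable[str]) -> float:
    """Fraction of added titles that also occur in the previous update section."""
    previous = set(previous_titles)
    added = list(added_titles)
    if not added:
        raise ValueError("there must be at least one added title")
    matched = sum(1 for title in added if title in previous)
    return matched / len(added)
```

A rate at or above the threshold leads to joining the previous update section with the additional section; a lower rate makes the additional section the new related section on its own.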
- the content search device has been described based on the embodiment and its modifications.
- the present invention is not limited to the above embodiment and its modifications. Forms obtained by applying various modifications conceived by those skilled in the art to the above embodiment and its modifications, and forms constructed by combining components of different embodiments and modifications, are also included within the scope of the present invention, as long as they do not depart from the spirit of the present invention.
- the content database is provided in the content search device, but the content database may be provided in another device different from the content search device.
- the content search device and other devices are connected via a network or the like.
- the content search device of the above embodiment searches for a desired TV program from a content database in which TV programs are stored. However, content may also be searched from a content database storing audiovisual content such as movies and music, or text content such as books and papers. That is, the content searched by the content search apparatus according to the present invention may be any content having character information.
- the present invention can be realized not only as a content search apparatus as described above, but also as a content search method having, as steps, the operations of the characteristic components included in the content search apparatus, and as a program for causing a computer to execute the characteristic steps included in the content search method. Such a program can be distributed via a recording medium such as a CD-ROM (Compact Disc-Read Only Memory) or a communication network such as the Internet.
- the present invention can be used as a content search device for searching for content that a user wants to use from among a large amount of content, for example, as a device for searching for a program that a user wants to watch from a database in which a large number of TV programs are stored.
Abstract
Description
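21 Content attribute
22 Title
23 Broadcast date
24 Keyword
25 Summary
31, 32 Section dictionary
41 Content attribute
42 Related section
43 Keyword
44 Related keyword
45 Degree of relevance
70 Content list
71 Related keyword list
100 Content search device
101 Content database
102 Dictionary database
102a Related word dictionary
103 Input unit
104 Input sorting unit
105 Content database update unit
106 Related section calculation unit
107 Dictionary update unit
108 Attribute acquisition unit
109 Related keyword acquisition unit
110 Output generation unit
111 Output unit
121, 122, 123, 124, 125, 126 Time section
1001 Previous update section
1002, 1102 Additional section
1003, 1004 New related section
1101 Previous content update section
1103 Keyword comparison section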
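FIG. 1 is a configuration diagram showing a content search device 100 according to an embodiment of the present invention. As shown in FIG. 1, the content search device 100 includes a content database 101, a dictionary database 102, an input unit 103, an input sorting unit 104, a content database update unit 105, a related section calculation unit 106, a dictionary update unit 107, an attribute acquisition unit 108, a related keyword acquisition unit 109, an output generation unit 110, and an output unit 111.
Next, Modification 1 of the above embodiment will be described with reference to the drawings.
Next, Modification 2 of the above embodiment will be described with reference to the drawings.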
Claims (10)
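- A content search device that searches for predetermined content in a content database in which content is stored for each content attribute indicating a classification of the content, using a related keyword that is related to a keyword indicating the details of the content, the content search device comprising:
a dictionary database that stores, for each related section representing a predetermined time section, degrees of relevance between a plurality of keywords indicating the details of content that is included in the related section and belongs to the classification indicated by the content attribute;
related section calculation means for calculating, for each content attribute, a related section defined such that first content and second content are included in the same time section, based on whether a degree of dissimilarity, calculated for each content attribute from a plurality of first keywords indicating the details of the first content stored in the content database and a plurality of second keywords indicating the details of the second content stored in the content database, satisfies a predetermined reference value;
dictionary update means for updating the degrees of relevance stored in the dictionary database, using the related section and the degrees of relevance between keywords calculated for each content attribute for the content included in the related section calculated by the related section calculation means; and
output generation means for generating output information for outputting, for each related section, a related keyword related to a keyword input by a user, in accordance with the degrees of relevance stored in the dictionary database.
- The content search device according to claim 1, wherein the related section calculation means calculates a related section defined by the time section corresponding to the first content when the degree of dissimilarity exceeds the predetermined reference value, and calculates a related section defined by a time section obtained by adding together the latest related section stored in the dictionary database and the time section corresponding to the first content when the degree of dissimilarity is equal to or less than the predetermined reference value.
- The content search device according to claim 2, wherein the related section calculation means calculates the related section using, as the second content, content included in the latest related section stored in the dictionary database.
- The content search device according to claim 3, wherein the related section calculation means calculates the related section based on whether the degree of dissimilarity between a predetermined number of keywords having high appearance frequencies among the first keywords and a predetermined number of keywords having high appearance frequencies among the second keywords satisfies the predetermined reference value.
- The content search device according to claim 2, wherein the related section calculation means calculates the related section using, as the second content, content included in a time section of a predetermined length within the time section corresponding to the content that was newly added to the content database the previous time.
- The content search device according to claim 1, further comprising:
attribute acquisition means for acquiring a content attribute related to the keyword input by the user; and
related keyword acquisition means for acquiring, for each related section, a related keyword corresponding to the keyword input by the user and to the content attribute acquired by the attribute acquisition means, by referring to the dictionary database,
wherein the output generation means generates the output information for outputting the related keyword acquired by the related keyword acquisition means.
- The content search device according to claim 6, wherein, when a plurality of content attributes are acquired by the attribute acquisition means, the related keyword acquisition means generates a related keyword for each of the plurality of content attributes, and
the output generation means generates output information for outputting the related keywords generated for each of the plurality of content attributes, for each content attribute and for each related section.
- The content search device according to claim 1, wherein the related section calculation means calculates the related section based on whether the degree of dissimilarity, which is a value obtained by dividing the number of first keywords that do not overlap with the second keywords by the number of second keywords, satisfies the predetermined reference value.
- A content search method in which a computer searches for predetermined content in a content database in which content is stored for each content attribute indicating a classification of the content, using a related keyword that is related to a keyword indicating the details of the content,
wherein the computer includes
a dictionary database that stores, for each related section representing a predetermined time section, degrees of relevance between a plurality of keywords indicating the details of content that is included in the related section and belongs to the classification indicated by the content attribute,
the content search method comprising:
a related section calculation step in which the computer calculates, for each content attribute, a related section defined such that first content and second content are included in the same time section, based on whether a degree of dissimilarity, calculated for each content attribute from a plurality of first keywords indicating the details of the first content stored in the content database and a plurality of second keywords indicating the details of the second content stored in the content database, satisfies a predetermined reference value;
a dictionary update step in which the computer updates the degrees of relevance stored in the dictionary database, using the related section and the degrees of relevance between keywords calculated for each content attribute for the content included in the related section calculated in the related section calculation step; and
an output generation step in which the computer generates output information for outputting, for each related section, a related keyword related to a keyword input by a user, in accordance with the degrees of relevance stored in the dictionary database.
- A computer-executable program for searching for predetermined content in a content database in which content is stored for each content attribute indicating a classification of the content, using a related keyword that is related to a keyword indicating the details of the content,
wherein the computer includes
a dictionary database that stores, for each related section representing a predetermined time section, degrees of relevance between a plurality of keywords indicating the details of content that is included in the related section and belongs to the classification indicated by the content attribute,
the program causing the computer to execute:
a related section calculation step of calculating, for each content attribute, a related section defined such that first content and second content are included in the same time section, based on whether a degree of dissimilarity, calculated for each content attribute from a plurality of first keywords indicating the details of the first content stored in the content database and a plurality of second keywords indicating the details of the second content stored in the content database, satisfies a predetermined reference value;
a dictionary update step of updating the degrees of relevance stored in the dictionary database, using the related section and the degrees of relevance between keywords calculated for each content attribute for the content included in the related section calculated in the related section calculation step; and
an output generation step of generating output information for outputting, for each related section, a related keyword related to a keyword input by a user, in accordance with the degrees of relevance stored in the dictionary database.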
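As an illustration only, the dissimilarity measure of the eighth claim and the section rule of the second claim (counting the claims in list order) can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `Section`, `dissimilarity`, and `related_section` are hypothetical, the sample keywords and thresholds are invented, and "adding together" two time sections is assumed here to mean taking the span that covers both.

```python
from dataclasses import dataclass
from datetime import date
from typing import AbstractSet


@dataclass(frozen=True)
class Section:
    """A time section from start to end, inclusive (illustrative type)."""
    start: date
    end: date


def dissimilarity(first_keywords: AbstractSet[str],
                  second_keywords: AbstractSet[str]) -> float:
    """Number of first keywords that do not appear among the second
    keywords, divided by the number of second keywords."""
    if not second_keywords:
        raise ValueError("second_keywords must not be empty")
    return len(first_keywords - second_keywords) / len(second_keywords)


def related_section(first_section: Section,
                    latest_related: Section,
                    first_keywords: AbstractSet[str],
                    second_keywords: AbstractSet[str],
                    threshold: float) -> Section:
    """Start a new related section when the keywords have drifted past the
    threshold; otherwise extend the latest related section.

    Assumption: extending is modelled as the span covering both sections.
    """
    if dissimilarity(first_keywords, second_keywords) > threshold:
        return first_section
    return Section(min(latest_related.start, first_section.start),
                   max(latest_related.end, first_section.end))


# Hypothetical data: keywords of newly added programs versus keywords of
# programs in the latest related section, for one content attribute.
new_kw = {"election", "debate", "poll", "candidate"}
old_kw = {"election", "poll", "budget", "tax"}
previous = Section(date(2008, 3, 1), date(2008, 3, 7))
added = Section(date(2008, 3, 8), date(2008, 3, 14))

print(dissimilarity(new_kw, old_kw))                         # 0.5
print(related_section(added, previous, new_kw, old_kw, 0.6))  # extended section
print(related_section(added, previous, new_kw, old_kw, 0.4))  # new section
```

With a threshold of 0.6 the two keyword sets count as similar, so the related section grows to cover both weeks; with 0.4 the new week becomes its own related section. In the claimed device this decision is made separately for each content attribute.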
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/672,085 US8073851B2 (en) | 2008-03-10 | 2009-03-02 | Content searching device and content searching method |
| JP2009528429A JP4388137B2 (ja) | 2008-03-10 | 2009-03-02 | Content search device and content search method |
| CN2009801012516A CN101889281B (zh) | 2008-03-10 | 2009-03-02 | Content search device and content search method |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2008059914 | 2008-03-10 | | |
| JP2008-059914 | 2008-03-10 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2009113266A1 (ja) | 2009-09-17 |
Family
ID=41064940
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2009/000926 WO2009113266A1 (ja) | Content search device and content search method | 2008-03-10 | 2009-03-02 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US8073851B2 (ja) |
| JP (1) | JP4388137B2 (ja) |
| CN (1) | CN101889281B (ja) |
| WO (1) | WO2009113266A1 (ja) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101916268A (zh) * | 2010-08-04 | 2010-12-15 | 哈尔滨工业大学深圳研究生院 | Method for building and updating a Chinese phrase library |
| US20120163772A1 (en) * | 2009-10-22 | 2012-06-28 | Shinji Nabeshima | Reproducing device, reproducing method, program and recording medium |
| JP2020119254A (ja) * | 2019-01-23 | 2020-08-06 | 株式会社日立製作所 | Text data collection device and method |
| KR20200098381A (ko) * | 2019-02-11 | 2020-08-20 | 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. | Method, apparatus, device, and storage medium for searching content |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8768930B2 (en) * | 2009-10-10 | 2014-07-01 | Oracle International Corporation | Product classification in procurement systems |
| US8385723B2 (en) * | 2010-06-18 | 2013-02-26 | Microsoft Corporation | Recording of sports related television programming |
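| KR101196935B1 (ko) | 2010-07-05 | 2012-11-05 | 엔에이치엔(주) | Method and system for providing a representative phrase for a real-time popular keyword |
| KR101196989B1 (ko) * | 2010-07-06 | 2012-11-02 | 엔에이치엔(주) | Method and system for providing a representative phrase for a real-time popular keyword |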
| US8719207B2 (en) | 2010-07-27 | 2014-05-06 | Oracle International Corporation | Method and system for providing decision making based on sense and respond |
| US9348941B2 (en) * | 2011-06-16 | 2016-05-24 | Microsoft Technology Licensing, Llc | Specification of database table relationships for calculation |
| US20130066632A1 (en) * | 2011-09-14 | 2013-03-14 | At&T Intellectual Property I, L.P. | System and method for enriching text-to-speech synthesis with automatic dialog act tags |
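| CN103744897A (zh) * | 2013-12-24 | 2014-04-23 | 华为技术有限公司 | Associated search method and system for fault information, and network management system |
| CN104331434A (zh) * | 2014-10-22 | 2015-02-04 | 乐视网信息技术(北京)股份有限公司 | Method and device for generating a search suggestion word service |
| CN105912645B (zh) * | 2016-04-08 | 2019-03-05 | 上海智臻智能网络科技股份有限公司 | Intelligent question answering method and device |
| CN110574102B (zh) * | 2017-05-11 | 2023-05-16 | 株式会社村田制作所 | Information processing system, information processing device, recording medium, and method for updating a dictionary database |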
| WO2020128936A2 (en) * | 2018-12-20 | 2020-06-25 | Germishuys Dennis Mark | Association determination |
| JP7642335B2 (ja) * | 2020-09-11 | 2025-03-10 | 株式会社東芝 | Information processing device, method, and program |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
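| JPH05307569A (ja) * | 1992-05-01 | 1993-11-19 | Nippon Telegr & Teleph Corp <Ntt> | Method for storing and retrieving information corresponding to time-varying information |
| JPH07192009A (ja) * | 1992-03-23 | 1995-07-28 | Nippon Telegr & Teleph Corp <Ntt> | Method for storing, retrieving, and removing information |
| JPH11175530A (ja) * | 1997-12-08 | 1999-07-02 | Nippon Telegr & Teleph Corp <Ntt> | Information trend presentation method and device, and recording medium storing an information trend presentation program |
| JP2000242652A (ja) * | 1999-02-18 | 2000-09-08 | Nippon Telegr & Teleph Corp <Ntt> | Information trend search method and device, and recording medium storing an information trend search program |
| WO2005066837A1 (ja) * | 2003-12-26 | 2005-07-21 | Matsushita Electric Industrial Co., Ltd. | Dictionary creation device and dictionary creation method |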
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4034374B2 (ja) | 1997-02-18 | 2008-01-16 | 株式会社ニューズウオッチ | Information retrieval system and information retrieval method |
| JP2001216311A (ja) | 2000-02-01 | 2001-08-10 | Just Syst Corp | Event analysis device and program device storing an event analysis program |
| JP2002183175A (ja) | 2000-12-08 | 2002-06-28 | Hitachi Ltd | Text mining method |
| GB0307148D0 (en) * | 2003-03-27 | 2003-04-30 | British Telecomm | Data retrieval system |
| JP2004318723A (ja) | 2003-04-18 | 2004-11-11 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for creating a related-information provision schedule |
| US20050120391A1 (en) * | 2003-12-02 | 2005-06-02 | Quadrock Communications, Inc. | System and method for generation of interactive TV content |
| JP4366249B2 (ja) * | 2004-06-02 | 2009-11-18 | パイオニア株式会社 | Information processing device, method therefor, program therefor, recording medium storing the program, and information acquisition device |
| WO2006046390A1 (ja) * | 2004-10-29 | 2006-05-04 | Matsushita Electric Industrial Co., Ltd. | Information retrieval device |
| JP2007188225A (ja) | 2006-01-12 | 2007-07-26 | Yafoo Japan Corp | Summary sentence extraction system |
2009
- 2009-03-02 US: application US12/672,085, patent US8073851B2 (en), not active (Expired - Fee Related)
- 2009-03-02 JP: application JP2009528429A, patent JP4388137B2 (ja), not active (Expired - Fee Related)
- 2009-03-02 CN: application CN2009801012516A, patent CN101889281B (zh), not active (Expired - Fee Related)
- 2009-03-02 WO: application PCT/JP2009/000926, publication WO2009113266A1 (ja), active (Application Filing)
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120163772A1 (en) * | 2009-10-22 | 2012-06-28 | Shinji Nabeshima | Reproducing device, reproducing method, program and recording medium |
| US8538235B2 (en) * | 2009-10-22 | 2013-09-17 | Panasonic Corporation | Reproducing device, reproducing method, program and recording medium |
| CN101916268A (zh) * | 2010-08-04 | 2010-12-15 | 哈尔滨工业大学深圳研究生院 | Method for building and updating a Chinese phrase library |
| JP2020119254A (ja) * | 2019-01-23 | 2020-08-06 | 株式会社日立製作所 | Text data collection device and method |
| JP7085499B2 | 2019-01-23 | 2022-06-16 | 株式会社日立製作所 | Text data collection device and method |
| JP2022116312A (ja) * | 2019-01-23 | 2022-08-09 | 株式会社日立製作所 | Text data collection device and method |
| JP7425827B2 | 2019-01-23 | 2024-01-31 | 株式会社日立製作所 | Text data collection device and method |
| KR20200098381A (ko) * | 2019-02-11 | 2020-08-20 | 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. | Method, apparatus, device, and storage medium for searching content |
| KR102345401B1 (ko) * | 2019-02-11 | 2021-12-30 | 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. | Method, apparatus, device, and storage medium for searching content |
Also Published As
| Publication number | Publication date |
|---|---|
| CN101889281B (zh) | 2012-10-17 |
| JP4388137B2 (ja) | 2009-12-24 |
| US8073851B2 (en) | 2011-12-06 |
| CN101889281A (zh) | 2010-11-17 |
| JPWO2009113266A1 (ja) | 2011-07-21 |
| US20100293169A1 (en) | 2010-11-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP4388137B2 (ja) | | Content search device and content search method |
| US20250231951A1 (en) | | Contextualizing knowledge panels |
| US8005826B1 (en) | | Identifying media content in queries |
| US9654834B2 (en) | | Computing similarity between media programs |
| KR100921078B1 (ko) | | Information processing apparatus and method |
| KR101061234B1 (ko) | | Information processing apparatus and method, and recording medium |
| US20060167859A1 (en) | | System and method for personalized searching of television content using a reduced keypad |
| US20090077056A1 (en) | | Customization of search results |
| US8452760B2 (en) | | Relevancy presentation apparatus, method, and program |
| US20120036139A1 (en) | | Content recommendation device, method of recommending content, and computer program product |
| JP2011529600A (ja) | | Method and apparatus for relating datasets by using semantic vectors and keyword analysis |
| JP2010067175A (ja) | | Hybrid content recommendation server, recommendation system, and recommendation method |
| CN103069825B (zh) | | System and method for a television search assistant |
| US20070074254A1 (en) | | Locating content in a television environment |
| CN103984740A (zh) | | Method and system for displaying search pages based on combined labels |
| US20180067935A1 (en) | | Systems and methods for digital media content search and recommendation |
| CN104854588A (zh) | | System and method for searching tagged, primarily non-text items |
| US8838616B2 (en) | | Server device for creating list of general words to be excluded from search result |
| JP2012065054A (ja) | | Electronic program guide generation system, broadcast station, television receiver, server, and electronic program guide generation method |
| KR102072723B1 (ko) | | Method for providing content recommendation words and content providing device therefor |
| JP5545883B2 (ja) | | Recommendation data shaping method, recommendation data shaping device, and recommendation data shaping program |
| JP5415369B2 (ja) | | Program search device and program search program |
| Feng et al. | | A novel user behavioral aggregation method based on synonym groups in online video systems |
| Kumar | | Mining user interests from web history |
| Li et al. | | Personalized event-based news video retrieval with dynamic user-log |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 200980101251.6; Country of ref document: CN |
| | ENP | Entry into the national phase | Ref document number: 2009528429; Country of ref document: JP; Kind code of ref document: A |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09720267; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 12672085; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 09720267; Country of ref document: EP; Kind code of ref document: A1 |