WO2018115878A1 - A method and system for digital linear media retrieval
- Publication number: WO2018115878A1
- Application: PCT/GB2017/053848
- Authority: WIPO (PCT)
- Prior art keywords: content, linear media, segment, segments, semantically
- Legal status: Ceased
Classifications
- G—PHYSICS
  - G06—COMPUTING OR CALCULATING; COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
          - G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
          - G06F16/43—Querying
Definitions
- the present invention is in the field of digital media searching and retrieval. More particularly, but not exclusively, the present invention relates to the semantic retrieval of content for linear media.
- Traditional video platforms such as YouTube, iTunes, Netflix, Vimeo, and BBC iPlayer, use a system that searches content hosted by each of the respective platforms.
- the user searches for content using search terms which are matched with various properties describing each media file.
- the metadata of each media file is searched for matching keywords.
- the results pages list the most relevant media files based on the search term supplied. Each media file is ranked on relevance to the search term, with the most relevant appearing first.
- a method for retrieving semantically related content for a linear media comprised of a plurality of segments including:
- the linear media may comprise a plurality of tracks, each track split into the plurality of segments.
- the plurality of tracks may include a video track and/or an audio track.
- the method may further include the steps of: displaying a link to each of the one or more related content; and receiving input from a user to actuate one of the links.
- the one or more related content associated with the one link may be displayed only after actuation of the one link.
- the at least one of the one or more related content may be displayed automatically at the end of playback of the currently playing segment. In one embodiment, only the most relevant of the one or more related content is displayed automatically.
- Each of the one or more content may be a linear media.
- Each of the one or more content may be a segment within a linear media.
- One of the one or more content may be played-back and during playback of the one content: searching for further one or more content semantically related to a currently playing segment of the content and displaying at least one of the further one or more related content.
- the current segment of the plurality of segments may be semantically analysed.
- the current segment may be semantically analysed by extracting a textual transcription.
- the textual transcription may be extracted from the audio track.
- the method may further include, for each of the one or more content and prior to playback of the digital media, the steps of: segmenting an audio track and a video track for each content; and semantically analysing each segment of the audio track to build an index for the content.
- the linear media may be partitioned into segments based upon one or more selected from the set of video changes, audio changes, and transcription analysis.
- the one or more content may be semantically related to the currently playing segment using a natural language processing method.
- a method for searching within a plurality of linear media each comprising one or more tracks including:
- searching the index for each linear media to retrieve one or more matching segments using the query string.
- Each of the one or more tracks includes an audio track and/or a video track.
- the method may further include the step of: playing back at least one of the segments which match the query string.
- the at least one segments may be played back in a sequence constructed based upon the segments with the closest match to the query string.
- the method may further include the step of: displaying a plurality of segments for linear media which match the query string.
- the method may further include the step of: receiving input from a user who provided the query string to select at least one of the plurality of segments, wherein the selected segment is played back. Each matching segment may be played back in sequence.
- the matching segments may be those which match the query string within a defined threshold.
- the defined threshold may be predefined or the defined threshold may be defined by a user who provided the query string.
- the size of the segments may be defined by the user. Semantically analysing each segment may include extracting a textual transcription from the audio track.
- Semantically analysing each segment may include retrieving textual information corresponding to the audio track from a database.
- One or more tracks of each of the linear media may be partitioned into segments based upon one or more selected from the set of video changes, audio changes, and transcription analysis.
- a sequence of user interface elements may be displayed within a user interface, each element may correspond to one content segment in a chain of semantically related content.
- Each user interface element, when actuated by a user, may play-back the corresponding content segment.
- during play-back of the corresponding content segment, the corresponding related content may be displayed.
- a system for retrieving content for a linear media comprising a plurality of segments, comprising:
- a processor configured to, during playback of the linear media on a playback device, search for one or more content semantically related to a currently playing segment of the plurality of segments;
- a memory configured to store the one or more content
- a playback device configured to display at least one of the one or more related content.
- a system for searching within a plurality of linear media comprising one or more tracks, comprising:
- One or more processors configured to, for each of the plurality of linear media: segment the one or more tracks for the linear media and semantically analyse each segment of at least one of the tracks to build an index for the linear media, to receive a query string, and to search the index for each linear media to retrieve one or more matching segments using the query string;
- a memory configured to store the index for each of the plurality of linear media.
- Figure 1 shows a system for retrieving content for linear media in accordance with an embodiment of the invention
- Figure 2 shows a system for search within linear media in accordance with an embodiment of the invention
- Figure 3 shows a method for retrieving content for linear media in accordance with an embodiment of the invention
- Figure 4 shows a method for search within linear media in accordance with an embodiment of the invention
- Figure 5 shows a user interface for displaying linear media in accordance with an embodiment of the invention
- Figure 6 shows a user interface for displaying linear media in accordance with an embodiment of the invention
- Figure 7 shows a user interface for displaying linear media in accordance with an embodiment of the invention
- Figure 8 shows a user interface for displaying linear media in accordance with an embodiment of the invention.
- the present invention provides a linear media searching and/or retrieval method and system.
- the inventors have discovered that content for linear media, such as audio or video, can be retrieved based upon a currently playing segment of the linear media. In this way, content of greater relevance may be able to be displayed to a user. Notably the inventors have noticed that content that is semantically related to the currently playing segment may be of greater relevance to a user.
- linear media can be partitioned into segments and semantically analysed to build an index which may be searched by a query string. In this way, segments of the linear media that are of particular relevance to a query string may be retrieved.
- FIG. 1 a system 100 for retrieving content for linear media in accordance with an embodiment of the invention is shown.
- the system 100 includes a processor 101 .
- the processor 101 may be configured to search for content semantically related to a segment of a linear media.
- the processor 101 may be further configured to segment the linear media into a plurality of segments.
- the processor 101 may search for the content during playback of the segment of the linear media on a playback device 102.
- the linear media may comprise one or more tracks, and the tracks may include an audio track and/or a video track.
- the content may be one or more segments from one or more other linear media.
- the processor 101 may comprise a plurality of physical processors which may perform one or more of the functions described, and the physical processors may be located proximally to one another or distributed across a computing system.
- the system 100 may include a memory 103.
- the memory 103 may be connected to the playback device 102 and configured to store content.
- the memory 103 may also be configured to store the linear media. It will be appreciated that the memory 103 may comprise one or more physical memory modules which may be located proximally to one another or distributed across a computing system.
- the system 100 may include a playback device 102.
- the playback device 102 may be a user device, such as a desktop or laptop computer, a tablet, smartphone, or smart television.
- the playback device 102 may include an output 104 for reproducing content.
- the output 104 may also be configured for reproducing the linear media.
- the output 104 may be, for example, an audio output, an electronic display, or a combination of both.
- the playback device 102 may include a user input, such as a touch input (e.g. a touchpad or a touch-screen), a pointer device, and/or a keyboard.
- the content reproduced may be that located by the processor 101 during its search.
- the processor 101 , the memory 103, and the playback device 102 may exist within a single apparatus or may be distributed across a plurality of apparatuses.
- the plurality of apparatuses may be connected via one or more communications systems such as a network or a plurality of networks such as the Internet.
- a system 200 for searching within a plurality of linear media in accordance with an embodiment of the invention is shown.
- the system 200 includes one or more processors 201 .
- One of the processors 201 may be configured to segment track(s) of the linear media into a plurality of segments for each linear media.
- One of the processors 201 may be configured to semantically analyse each segment of at least one of the tracks for the linear media to build an index.
- One of the processors 201 may be configured to receive a query string and search the index for the linear media to retrieve one or more matching segments using the query string.
- the tracks may include an audio track and/or a video track.
- the system 200 may include a communication module 202.
- the communications module 202 may be configured for receiving a request comprising a query string from another apparatus and providing the query string to one of the processors 201 to retrieve the matching segments.
- the apparatus may be a user device.
- the communications module 202 may be further configured to forward one or more of the matching segments retrieved by one of the processors 201 to the requesting apparatus.
- the system 200 may include a memory 203.
- the memory 203 may be configured for storing the index for each linear media, and may be configured for also storing the linear media.
- the processor(s) 201 , the memory 203, and the communications module 202 may exist within a single apparatus or may be distributed across a plurality of apparatuses.
- the plurality of apparatuses may be connected via one or more communications systems such as a network or a plurality of networks such as the Internet.
- in step 301, during playback of linear media (e.g. on playback device 102), one or more content may be searched for (e.g. by processor 101) which is semantically related to a currently playing segment of the linear media.
- the segments may be semantically related via a natural language processing method.
- the linear media may comprise one or more tracks.
- the linear media may comprise an audio track and/or a video track.
- the linear media may be segmented by one of the following methods:
- an audio track of the linear media may be analysed to detect audio changes. When the changes exceed a threshold, the linear media may be segmented at this change in the audio track. In another embodiment, the audio track may be processed to detect meaningful segments based upon the identification of point in time (PIT) events.
- speaker diarisation may be used to determine different speakers within the linear media. Point in time events may then be generated for each speaker.
- Speaker diarisation is a process which allows the system to understand how many people are speaking and attempts to distinguish when each individual starts and finishes speaking. At this point there is no meaning derived, it's simply a set of events recorded at different time points in the media when a speaker was talking.
- a monologue such as a non-interactive lecture, containing a single speaker
- a video track of the linear media may be analysed to detect visual changes. When the visual changes exceed a threshold, the linear media may be segmented at the change.
- the video track is scanned frame-by-frame for scene changes.
- object recognition techniques may be used to scan the video and determine further meaning.
- Scene changes are calculated by compiling a histogram of each frame.
- the frames are checked for large changes in the histogram.
- a histogram records the colour profile of a frame as a graph of values.
- point in time events may be created for large histogram changes, i.e. where the histogram of a frame differs by more than 30% from that of the previous frame. Segments may be created originating and ending at these point in time events.
- a textual transcript of an audio track for the linear media may be analysed to detect an appropriate segment point, for example, where there are gaps in the speech.
- the transcript is processed using NLP techniques to aid in chunking up the text into segments of related speech.
- First the entire text is broken up into a set of sentences.
- the sentences are then each run individually through an NLP parser, like a network grammar parser, which gives a parse tree detailing the structure (nouns, pronouns, adjectives, verbs, etc.) of a sentence and the grammatical relations between the words in the sentence. This allows the system to determine, for each sentence, what the meaning is of a sentence.
- the root determines what the subject (I) is doing.
- the object (man) was seen with glasses.
- Segments may be defined based upon a specific time period, for example, every 30 seconds.
- Segments may be defined based upon a specific data size for each segment within the linear media, for example, every 6000 KB.
- the point in time audio and video events may be used to segment the parsed sentences.
- Video cues relating to changes in the video are not particularly useful in this instance; there are very few and they turn out not to relate to the conversation very much in this segment.
- the system can analyse the video cues as being related because a picture is shown at one point (around 7:27) which is a cutaway from the main shot of the two men speaking to a picture of a building.
- the histogram is significantly different at this point against the average over the past few frames, and so the system knows that at this point something has happened in the conversation.
- the system can mix the PIT events, which identifies: 1. There are 2 participants in the conversation, so the system makes a determination that this is an interview.
- the heuristics of an interview may enable the system to conclude that one of the participants is making statements/asking questions and that the other is answering those questions and elaborating on them.
- the system may use the PIT events and attempt to relate them with some small degree of variance to the parsed transcript - i.e. the change in the video to cutaway to a photo at 7:27 relates to the question being asked which starts some two seconds previously, so the system has some tolerance there where PIT events don't have to exactly relate to changes in speech or text.
- the first sentence is simple to determine, because the video doesn't vary and the audio PIT tags start and end with the sentence:
- the system knows this is an interview, the second sentence then becomes an answer to that question, and the system can determine that this is an answer to the question.
- the system can then use the audio PIT events to determine that a single speaker is stringing a set of sentences together, and that they are related in their meaning.
- a simple segmentation approach would be to take these two sets of sentences and create a single segment from them. The meaning then becomes about the "New York project". If a currently playing segment is about "architecture projects in New York" or "Frank Gehry's New York projects", both of these will semantically relate to these sentences and this segment will be surfaced as one related content. Each (or at least some) of the content may be a segment from another linear media. During play-back of this segment, content related to this currently played-back segment may be semantically searched for and displayed. This process may be repeated. In this way, for example, a sequence of content segments, each segment relevant to each previously displayed segment, may be played-back to a user.
- the linear media segment may be displayed within a user interface (such as described in relation to Figure 5).
- start and stop time markers of the linear media segment may be determined and the resulting portion of the segment may be saved to a favourites list, or only that portion may be used to determine semantically related segments.
- the start and stop time markers may be defined by the user during play-back (for example, via a slider user interface control) or may be detected automatically based upon how long the user views the segment.
- in step 302, at least one of the related content that is located (e.g. by processor 101) could be displayed (e.g. on the playback device 102).
- the related content that is selected to be displayed could be selected based upon user selection (e.g. at the playback device) from a plurality of related content options displayed to the user (e.g. at the playback device 102), or it could be automatically selected. Automatic selection may be based upon the most relevant content identified.
- the related content may be a segment of linear media and may also be played-back with the user interface (as described in Figure 5). Start and stop markers may be defined by a user for the related linear media content that is played-back. These may refine the segment size that is delivered for future semantic matches.
- the content options may be displayed as a plurality of links, each link corresponding to a related content. Actuation of the link by the user within a user interface (such as described in relation to Figure 5) at the playback device 102 may trigger playback of the corresponding related content.
- the semantic relationship between content segments within a sequence may be displayed within a user interface (such as that described in relation to Figure 7) as a series of bookmarks or "breadcrumb trails". In one embodiment, a user may "back-track" to a previous segment by selecting the bookmark. The segments semantically related to that previous segment may then be displayed again.
- Each of the linear media may comprise one or more tracks, such as an audio track and/or a video track.
- each track of each linear media is segmented (e.g. at processor 201 ).
- the tracks may be segmented in any of ways described in relation to Figure 3, for example, by audio, video, transcript, time, size, or any combination.
- the tracks may be segmented by a time period provided by a user who provides the subsequent query string.
- each segment of at least one of the tracks is semantically analysed to build an index for the linear media (e.g. at processor 201 ).
- the segments may be semantically analysed using a natural language processing method to extract meaning from the segments.
- the semantic analysis includes extraction of a transcription from the linear media.
- an audio track for the linear media may be transcribed or a transcription for the audio track may be retrieved from the database, and the transcription may be used to extract text transcribed from speech occurring within the segment.
- a query string is received (e.g. at processor 201 ).
- the query string may be received from another apparatus such as a user apparatus via, for example, communications module 202.
- the query string may be provided by a user at the user apparatus.
- the index for the linear media is searched to retrieve one or more matching segments using the query string.
- the matching segments may be those segments which match the query string within a defined threshold.
- the threshold for matching may be predefined or provided by the user who provided the query string.
- the method described in relation to Figure 3 may then be used.
- one of the matching segments above may be played-back and content segment semantically related to the one matching segment may be retrieved in accordance with method 300.
- the following search method is used to locate matching segments using a query string:
- User Search
- a user may search using a natural language search or search phrases.
- To distinguish between a natural language search and simple search phrases within the query string, the system makes the following determination:
- a statistical sentence detector is used to determine if there are one or more sentences. If there are not, the search falls back to using the search phrases approach.
- a network grammar parser is run over the sentence(s) produced to determine if there is a consistent sentence structure and that the sentence makes sense. If this is the case, the natural language search is used. If the network grammar parser fails to find a coherent sentence, the search falls back to using the search phrases approach.
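A minimal sketch of this determination is given below, with spaCy standing in for both the statistical sentence detector and the network grammar parser named above; the coherence test used here (a verbal root with a subject) is an illustrative assumption, not the patent's own criterion.

```python
# Minimal sketch: treat the query as a natural language search only if it
# parses as at least one well-formed sentence; otherwise fall back to the
# search phrases approach. spaCy is an assumed stand-in for the statistical
# sentence detector and network grammar parser described in the text.
import spacy

nlp = spacy.load("en_core_web_sm")

def choose_search_mode(query):
    doc = nlp(query)
    sentences = list(doc.sents)
    if not sentences:
        return "phrases"
    for sent in sentences:
        # "Coherent" here: each sentence has a verbal root with a subject.
        has_subject = any(t.dep_ in ("nsubj", "nsubjpass") for t in sent)
        if sent.root.pos_ not in ("VERB", "AUX") or not has_subject:
            return "phrases"
    return "natural_language"

print(choose_search_mode("gene therapy depression"))          # -> phrases
print(choose_search_mode("What are the latest treatments?"))  # -> natural_language
```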
- a simple statistical parser is used to pull out the following which are then used as separate search phrases (OR'd and weighted) for the search method.
- a statistical parser is used here because the syntax of the text is not well-formed enough for a network grammar parser to coherently extract the information. This approach then is a method for constructing a search derived from simple search phrases which are not complete sentences. The statistical parser attempts to break down the text into parts-of-speech.
- weightings given to parts-of-speech from this statistical parsing could be the following:
- this OR'd expression weights documents which match both phrases more highly. Additionally, the query may be modified with weightings as described above so that proper nouns are given priority.
- This search provides a good-enough fit where a user has entered simple search phrases.
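For illustration, an OR'd and weighted phrase search of this kind could be expressed in Elasticsearch's query DSL as below; the index and field names ("segments", "transcript"), the example phrases, and the boost values are assumptions, not part of the patent.

```python
# Minimal sketch: an OR'd, weighted phrase query in Elasticsearch's query
# DSL. Boosts implement the parts-of-speech weighting, with proper nouns
# given priority; "should" clauses are OR'd, so matching more phrases
# ranks a document higher.
weighted_phrases = [
    ("Frank Gehry", 3.0),        # proper noun: highest weight
    ("New York project", 2.0),
    ("architecture", 1.0),       # common noun: baseline weight
]

query = {
    "query": {
        "bool": {
            "should": [
                {"match_phrase": {"transcript": {"query": p, "boost": w}}}
                for p, w in weighted_phrases
            ]
        }
    }
}
# es.search(index="segments", body=query)  # with an elasticsearch client
```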
- a natural language search such as "what are the latest gene therapy treatments for treating depression?" will be treated differently.
- the network grammar parser is applied to this sentence so that the meaning of the sentence can be derived.
- Temporal components are pulled from the sentence and provide a time-frame for the search.
- the temporal modifier is "latest", which is not matched against the documents, as it is a modifier for the search.
- the search method can understand this to mean many things based on the corpus of results it has, and so this is applied as a filter after the search has completed and can mean:
- the users' previous searches are also taken into account for providing temporal filters. For example, for a user who has recently searched for, or clicked through to, media released in the past 3 months, it can be derived that this user wishes for searches to be filtered in this way.
- a decision tree is trained to determine how these temporal filters will work. The decision tree is retrained and refined over time to include analytics (and therefore user interactions) along with temporal modifiers in the search (e.g. "latest") to provide a best fit for the user's search.
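A toy sketch of such a decision tree, using scikit-learn, is shown below; the feature layout, labels, and training rows are invented for illustration only, whereas a real system would train on logged user interactions.

```python
# Minimal sketch: a decision tree choosing a temporal filter window from
# query and behaviour features. All feature names and training data here
# are hypothetical.
from sklearn.tree import DecisionTreeClassifier

# Features: [query_has_temporal_modifier, days_since_user_clicked_recent_media]
X = [[1, 5], [1, 200], [0, 5], [0, 200]]
y = ["past_3_months", "past_year", "past_3_months", "no_filter"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree.predict([[1, 10]]))  # -> ['past_3_months']
```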
- the search method will use the extracted and stored meanings for the chunked sentences to match against the non-temporal elements of the parsed sentence structure.
- the main parts of the search phrases are now taken into account.
- the search phrases for "gene therapy” and “treating depression” are now pulled from the sentence and OR'd, similar to the search phrase search. However, this now runs with a search method where the parsed constructs have been stored and not against the chunked text itself.
- Historical user behaviour may be used to improve the search results.
- the historical user behaviour may include:
- Historical user behaviour may be used to improve search results for all users or for that specific user.
- a specific user may be identified via the use of a user login, a stored cookie, matching device identifiers, or another identifier, or via the use of biometrics such as facial recognition or retinal scans.
- a user interface 500 for use with embodiments of the invention will be described with reference to Figure 5.
- the user interface 500 includes a first visual portion 501.
- the first portion 501 is configured for displaying the video track for linear media.
- the user interface 500 may include a visual control portion 502 for the linear media which may contain user-actionable controls such as standard controls: play, rewind, fast-forward, pause, and stop, and special controls: play next related segment now, and play previous segment now.
- the user interface 500 may include a second visual portion 503.
- the second portion 503 is configured for displaying links (504a to 504d) for one or more content related to the segment currently being played within the first portion 501 .
- the links (504a to 504d) may be visual representations of the one or more content such as thumbnails or icons, or textual representations of the one or more content such as titles or summaries, or a combination of both.
- the links (504a to 504d) may be user-actionable to select the next content to display within the first portion 501.
- the content may be a segment within a linear media.
- the user interface 500 may include further elements as shown in Figures 6, 7, and 8.
- jumping from related segment to related segment may make it difficult for the user to track their search results. For example, suppose the system 100 or 200 indexes a corpus of videos from the technology category of TED Talks and the search phrase "Will robots take our jobs?" is submitted.
- the first result returned may be a segment within the video by David Autor: "Will automation take away all our jobs?".
- the sentence "Automation, employment, robots and jobs - it'll all take care of itself?" within the video is most relevant to the search phrase, so the user starts watching the video at this point.
- the search engine then analyses the segment being watched to surface similar content. The most relevant segment is within the video by Radhika Nagpal: What intelligent machines can learn from a school of fish. This jump to a result seemingly unrelated to the original search term can make it difficult for users to keep track of the search path which got them to any segment search result.
- a search path is a sequence of content segments retrieved during play-back as described in relation to Figure 3 above.
- while playback is paused, the dynamic semantic search is also paused and the results will not change.
- the user can then actuate another control "favourites” to add the results at that point within the video to a later retrievable list of favourites.
- the favourites may also be organisable by the user via a user interface into folders.
- the user is provided with further controls within the control portion 502 to "clip the video segment" to focus on the parts of the video that are of most interest to the user. This "clipped" segment can then be saved to the favourites list.
- details in relation to the search path may be recorded and displayed within a user interface.
- the user may then select, within the user interface, one or more of the search phrases or ancestor videos which surfaced from a previously typed search within this search path to return to the search results displayed at that point in the search path.
- links between favourite related segments can be displayed using a relationship diagram such as a spider diagram within a user interface. A user can then select a node and the segment associated with that node will be played-back.
- Potential advantages of some embodiments of the present invention include: an improved index can be created for a linear media to enable the retrieval of relevant content, and an improved retrieval of relevant content relating to linear media is possible.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method for retrieving semantically related content for a linear media comprised of a plurality of segments. The method includes, during playback of the linear media, the steps of: searching for one or more content semantically related to a currently playing segment of the plurality of segments; and displaying at least one of the one or more related content. A method for searching within a plurality of linear media each comprising one or more tracks is also disclosed. This method includes, for each of the linear media, the steps of: segmenting the one or more tracks for the linear media; and semantically analysing each segment of at least one of the tracks to build an index for the linear media; then receiving a query string and searching the index for each linear media to retrieve one or more matching segments using the query string. Systems and software are further disclosed.
Description
A Method And System for Digital Linear Media Retrieval

Field of Invention

The present invention is in the field of digital media searching and retrieval. More particularly, but not exclusively, the present invention relates to the semantic retrieval of content for linear media.
Background
Traditional video platforms, such as YouTube, iTunes, Netflix, Vimeo, and BBC iPlayer, use a system that searches content hosted by each of the respective platforms. The user searches for content using search terms which are matched with various properties describing each media file. In essence, the metadata of each media file is searched for matching keywords. The results pages list the most relevant media files based on the search term supplied. Each media file is ranked on relevance to the search term, with the most relevant appearing first.
The behaviour and activity of the user is often used to recommend other similar media files of interest to the user. Some media content, such as linear media, is dynamic in nature; therefore, the content properties may be different at different times during playback of the media content. However, traditional media platforms do not provide for associated properties which differ within the media content. Traditional media platforms display a static description for the entire media object or file.
There is a desire to provide for searching or retrieval of linear media content, such as audio or video, which provides content which is of greater relevance to the user than traditional media platforms. US 9,230,547 describes a method for transcribing metadata, such as text from speech, from a source media, such as audio/video. This method enables searching within a source media based upon a specific word or phrase within the transcription. It is an object of the present invention to provide a linear media searching and/or retrieval method and system which overcomes the disadvantages of the prior art, or at least provides a useful alternative.
Summary of Invention
According to a first aspect of the invention there is provided a method for retrieving semantically related content for a linear media comprised of a plurality of segments, including:
during playback of the linear media:
searching for one or more content semantically related to a currently playing segment of the plurality of segments; and
displaying at least one of the one or more related content.
The linear media may comprise a plurality of tracks, each track split into the plurality of segments. The plurality of tracks may include a video track and/or an audio track.
The method may further include the steps of: displaying a link to each of the one or more related content; and receiving input from a user to actuate one of the links. The one or more related content associated with the one link may be displayed only after actuation of the one link.
The at least one of the one or more related content may be displayed automatically at the end of playback of the currently playing segment. In one embodiment, only the most relevant of the one or more related content is displayed automatically.
Each of the one or more content may be a linear media. Each of the one or more content may be a segment within a linear media. One of the one or more content may be played-back and during playback of the one content: searching for further one or more content semantically related to a currently playing segment of the content and displaying at least one of the further one or more related content.
The current segment of the plurality of segments may be semantically analysed. The current segment may be semantically analysed by extracting a textual transcription.
The textual transcription may be extracted from the audio track.
The method may further include, for each of the one or more content and prior to playback of the digital media, the steps of: segmenting an audio track and a video track for each content; and semantically analysing each segment of the audio track to build an index for the content.
The linear media may be partitioned into segments based upon one or more selected from the set of video changes, audio changes, and transcription analysis.
The one or more content may be semantically related to the currently playing segment using a natural language processing method.
According to a further aspect of the invention there is provided a method for searching within a plurality of linear media each comprising one or more tracks, including:
for each of the linear media:
segmenting the one or more tracks for the linear media; and
semantically analysing each segment of at least one of the tracks to build an index for the linear media;
receiving a query string; and
searching the index for each linear media to retrieve one or more matching segments using the query string.
Each of the one or more tracks includes an audio track and/or a video track.
The method may further include the step of: playing back at least one of the segments which match the query string. The at least one segments may be played back in a sequence constructed based upon the segments with the closest match to the query string. The method may further include the step of: displaying a plurality of segments for linear media which match the query string.
The method may further include the step of: receiving input from a user who provided the query string to select at least one of the plurality of segments, wherein the selected segment is played back. Each matching segment may be played back in sequence.
The matching segments may be those which match the query string within a defined threshold. The defined threshold may be predefined or the defined threshold may be defined by a user who provided the query string.
The size of the segments may be defined by the user.
Semantically analysing each segment may include extracting a textual transcription from the audio track.
Semantically analysing each segment may include retrieving textual information corresponding to the audio track from a database.
One or more tracks of each of the linear media may be partitioned into segments based upon one or more selected from the set of video changes, audio changes, and transcription analysis.
A sequence of user interface elements may be displayed within a user interface, each element may correspond to one content segment in a chain of semantically related content. Each user interface element, when actuated by a user, may play-back the corresponding content segment. During play-back of the corresponding content segment, the corresponding related content may be displayed.
According to a further aspect of the invention there is provided a system for retrieving content for a linear media comprising a plurality of segments, comprising:
A processor configured to, during playback of the linear media on a playback device, search for one or more content semantically related to a currently playing segment of the plurality of segments;
A memory configured to store the one or more content; and
A playback device configured to display at least one of the one or more related content.
According to a further aspect of the invention there is provided a system for searching within a plurality of linear media comprising one or more tracks, comprising:
One or more processors configured to, for each of the plurality of linear media: segment the one or more tracks for the linear media and semantically analyse
each segment of at least one of the tracks to build an index for the linear media, to receive a query string, and to search the index for each linear media to retrieve one or more matching segments using the query string; and
A memory configured to store the index for each of the plurality of linear media.
Other aspects of the invention are described within the claims.

Brief Description of the Drawings
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

Figure 1: shows a system for retrieving content for linear media in accordance with an embodiment of the invention;
Figure 2: shows a system for search within linear media in accordance with an embodiment of the invention;
Figure 3: shows a method for retrieving content for linear media in accordance with an embodiment of the invention;
Figure 4: shows a method for search within linear media in accordance with an embodiment of the invention;
Figure 5: shows a user interface for displaying linear media in accordance with an embodiment of the invention;
Figure 6: shows a user interface for displaying linear media in accordance with an embodiment of the invention;
Figure 7: shows a user interface for displaying linear media in accordance with an embodiment of the invention; and
Figure 8: shows a user interface for displaying linear media in accordance with an embodiment of the invention.
Detailed Description of Preferred Embodiments
The present invention provides a linear media searching and/or retrieval method and system.
For one embodiment, the inventors have discovered that content for linear media, such as audio or video, can be retrieved based upon a currently playing segment of the linear media. In this way, content of greater relevance may be able to be displayed to a user. Notably the inventors have noticed that content that is semantically related to the currently playing segment may be of greater relevance to a user.
In another embodiment, the inventors have discovered that linear media can be partitioned into segments and semantically analysed to build an index which may be searched by a query string. In this way, segments of the linear media that are of particular relevance to a query string may be retrieved.
In Figure 1 , a system 100 for retrieving content for linear media in accordance with an embodiment of the invention is shown.
The system 100 includes a processor 101 . The processor 101 may be configured to search for content semantically related to a segment of a linear media. The processor 101 may be further configured to segment the linear media into a plurality of segments. The processor 101 may search for the content during playback of the segment of the linear media on a playback device 102. The linear media may comprise one or more tracks, and the
tracks may include an audio track and/or a video track. The content may be one or more segments from one or more other linear media.
It will be appreciated that the processor 101 may comprise a plurality of physical processors which may perform one or more of the functions described, and that the physical processors may be located proximally to one another or distributed across a computing system.
The system 100 may include a memory 103. The memory 103 may be connected to the playback device 102 and configured to store content. The memory 103 may also be configured to store the linear media. It will be appreciated that the memory 103 may comprise one or more physical memory modules which may be located proximally to one another or distributed across a computing system.
The system 100 may include a playback device 102. The playback device 102 may be a user device, such as a desktop or laptop computer, a tablet, smartphone, or smart television. The playback device 102 may include an output 104 for reproducing content. The output 104 may also be configured for reproducing the linear media. The output 104 may be, for example, an audio output, an electronic display, or a combination of both. The playback device 102 may include a user input, such as a touch input (e.g. a touchpad or a touch-screen), a pointer device, and/or a keyboard. The content reproduced may be that located by the processor 101 during its search.
It will be appreciated that the processor 101 , the memory 103, and the playback device 102 may exist within a single apparatus or may be distributed across a plurality of apparatuses. The plurality of apparatuses may be connected via one or more communications systems such as a network or a plurality of networks such as the Internet.
In Figure 2, a system 200 for searching within a plurality of linear media in accordance with an embodiment of the invention is shown.
The system 200 includes one or more processors 201 . One of the processors 201 may be configured to segment track(s) of the linear media into a plurality of segments for each linear media. One of the processors 201 may be configured to semantically analyse each segment of at least one of the tracks for the linear media to build an index. One of the processors 201 may be configured to receive a query string and search the index for the linear media to retrieve one or more matching segments using the query string. The tracks may include an audio track and/or a video track.
The system 200 may include a communication module 202. The communications module 202 may be configured for receiving a request comprising a query string from another apparatus and providing the query string to one of the processors 201 to retrieve the matching segments. The apparatus may be a user device. The communications module 202 may be further configured to forward one or more of the matching segments retrieved by one of the processors 201 to the requesting apparatus.
The system 200 may include a memory 203. The memory 203 may be configured for storing the index for each linear media, and may be configured for also storing the linear media. It will be appreciated that the processor(s) 201 , the memory 203, and the communications module 202 may exist within a single apparatus or may be distributed across a plurality of apparatuses. The plurality of apparatuses may be connected via one or more communications systems such as a network or a plurality of networks such as the Internet.
Referring to Figure 3, a method 300 for retrieving content for linear media in accordance with an embodiment of the invention will be described.
In step 301, during playback of linear media (e.g. on playback device 102), one or more content may be searched for (e.g. by processor 101) which is semantically related to a currently playing segment of the linear media.
It may be determined that the segments may be semantically related via a natural language processing method.
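The patent does not prescribe a particular NLP method, so as a minimal sketch, one simple relatedness measure is TF-IDF cosine similarity between segment transcripts, shown below using scikit-learn; the helper name related_segments and the example texts are illustrative assumptions only.

```python
# Minimal sketch: scoring candidate segments against the currently playing
# segment with TF-IDF cosine similarity. TF-IDF here stands in for whatever
# semantic analysis an implementation actually uses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def related_segments(current_text, candidates, top_k=5):
    """Rank candidate segment transcripts by similarity to current_text."""
    corpus = [current_text] + candidates
    matrix = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return sorted(zip(candidates, scores), key=lambda p: -p[1])[:top_k]

# Example: the currently playing segment is about Gehry's New York project.
current = "What's the status of the New York project?"
pool = [
    "Frank Gehry discusses his New York architecture projects.",
    "A recipe for sourdough bread.",
]
print(related_segments(current, pool, top_k=1))
```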
The linear media may comprise one or more tracks. For example, the linear media may comprise an audio track and/or a video track.
The linear media may be segmented by one of the following methods:
a) By Audio

In one embodiment, an audio track of the linear media may be analysed to detect audio changes. When the changes exceed a threshold, the linear media may be segmented at this change in the audio track. In another embodiment, the audio track may be processed to detect meaningful segments based upon the identification of point in time (PIT) events.
For example, speaker diarisation may be used to determine different speakers within the linear media. Point in time events may then be generated for each speaker.
By way of illustration, in a recording of a broadcast political interview it is usual for an interviewer to introduce a guest, set up the conversation, and then to start asking pertinent questions. The guest will then answer the questions, and the interviewer may then ask follow up questions.
To help in determining meaningful segments, speaker diarisation may be used to determine at what PIT the interviewer is speaking and then the guest interviewee. Those PIT events are recorded as time points in the media.
A concrete example might be an interview with Frank Gehry, the architect, from TED Talks. The transcript is available here: http://www.ted.com/talks/frank_gehry_asks_then_what/transcript?language=en
At 6:28 the interviewer asks: "What's the status of the New York project?". Frank Gehry then responds for nearly a minute. The interviewer has a follow-up question based on Gehry's answer at 7:25: "The picture on the screen, is that Disney?". Gehry responds with a one-word answer and another follow-up question is asked at 7:30 "How much further along is it than that, and when will that be finished?". The output from this is a set of PIT events related to the stream which has:
6:28 Speaker A
6:31 Speaker B
7:25 Speaker A
7:29 Speaker B
7:30 Speaker A
And so on...
Speaker diarisation is a process which allows the system to understand how many people are speaking and attempts to distinguish when each individual starts and finishes speaking. At this point there is no meaning derived, it's simply a set of events recorded at different time points in the media when a speaker was talking.
This is a very simple example with two speakers in an interview setting. In fact the simplest example is when a single person is speaking on the audio track. A more complex example might be where multiple speakers from an audience for example are asking questions of a
lecturer, in which case one person (the lecturer) would be speaking throughout the audio track and many members of the audience might ask the lecturer questions.
Meaning can be identified from these states to broadly classify a type of audio track, which may help the system in understanding the media:
A monologue, such as a non-interactive lecture, containing a single speaker
An interview or discussion with two speakers on the track
A Q&A session (like a lecture or a White House press briefing with the president) where one person speaks at different points throughout the track to communicate their point of view and multiple other people ask questions at different points in the track.
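A minimal sketch of this broad classification follows. It assumes diarisation output is already available as (seconds, speaker label) tuples; the heuristics and the function name classify_track are illustrative, not the patent's own rules.

```python
# Minimal sketch: broadly classifying an audio track from speaker
# diarisation output, given point-in-time (PIT) events as
# (seconds, speaker_label) tuples.
from collections import Counter

def classify_track(pit_events):
    """Classify as 'monologue', 'interview', 'q&a', or 'discussion'."""
    turns = Counter(speaker for _, speaker in pit_events)
    if len(turns) == 1:
        return "monologue"
    if len(turns) == 2:
        return "interview"
    # Many speakers, one of whom dominates: a lecture / Q&A session.
    dominant_share = max(turns.values()) / sum(turns.values())
    return "q&a" if dominant_share > 0.5 else "discussion"

# PIT events from the Gehry example (6:28, 6:31, 7:25, 7:29, 7:30).
events = [(388, "A"), (391, "B"), (445, "A"), (449, "B"), (450, "A")]
print(classify_track(events))  # -> 'interview'
```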
b) By Video

A video track of the linear media may be analysed to detect visual changes. When the visual changes exceed a threshold, the linear media may be segmented at the change.
For example, the video track is scanned frame-by-frame for scene changes. In alternative embodiments, object recognition techniques may be used to scan the video and determine further meaning.
Scene changes are calculated by compiling a histogram of each frame. The frames are checked for large changes in the histogram. As scenes change, the histogram profile between two frames will change significantly. A histogram records the colour profile of a frame as a graph of values.
For example, consider a news broadcast. During a typical news broadcast, an anchor will speak to introduce a report. At some point the
anchor will stop and the report will start. The point at which that switch occurs from the anchor to the report will have a significant histogram change.
Using this technique, point in time events may be created for large histogram changes, i.e. where the histogram of a frame differs by more than 30% from that of the previous frame. Segments may be created originating and ending at these point in time events.
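As a sketch, the following assumes OpenCV (cv2) for frame capture and histogram computation, and uses the Bhattacharyya distance between successive frame histograms as the "more than 30%" difference test; a fuller implementation would compare against an average over the past few frames, as in the interview example later in this document.

```python
# Minimal sketch: flagging scene changes as point-in-time events where a
# frame's colour histogram differs by more than 30% from the previous
# frame's. OpenCV is an assumed dependency.
import cv2

def scene_change_events(video_path, threshold=0.30):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    events, prev_hist, frame_no = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [8, 8, 8], [0, 256] * 3)
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical, 1 = maximally different.
            diff = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if diff > threshold:
                events.append(frame_no / fps)  # PIT event in seconds
        prev_hist, frame_no = hist, frame_no + 1
    cap.release()
    return events
```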
c) By Transcript

A textual transcript of an audio track for the linear media may be analysed to detect an appropriate segment point, for example, where there are gaps in the speech.
In one example, the transcript is processed using NLP techniques to aid in chunking up the text into segments of related speech. First the entire text is broken up into a set of sentences. The sentences are then each run individually through an NLP parser, like a network grammar parser, which gives a parse tree detailing the structure (nouns, pronouns, adjectives, verbs, etc.) of a sentence and the grammatical relations between the words in the sentence. This allows the system to determine, for each sentence, what the meaning is of a sentence.
For example, the phrase "I saw the man with glasses" is broken down as follows:
The root (saw) determines what the subject (I) is doing. The object (man) was seen with glasses.
This helps the system create meaning segments because, from a single sentence, intent and meaning have now been identified.
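For illustration, spaCy's dependency parser can play the role of the network grammar parser for the example sentence; spaCy and the en_core_web_sm model are assumptions here, not the patent's named tooling.

```python
# Minimal sketch: extracting the root, subject, and object of a sentence
# with spaCy's dependency parser.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I saw the man with glasses")

for token in doc:
    print(f"{token.text:<8} {token.dep_:<8} head={token.head.text}")
# 'saw' is the ROOT, 'I' its nsubj (subject), 'man' its dobj (object),
# and 'with glasses' attaches as a prepositional modifier.
```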
d) By Time

Segments may be defined based upon a specific time period, for example, every 30 seconds.
e) By Size

Segments may be defined based upon a specific data size for each segment within the linear media, for example, every 6000 KB.
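Both fixed-interval schemes reduce to generating evenly spaced boundaries; a minimal sketch, using the 30-second and 6000 KB figures from the text, follows.

```python
# Minimal sketch: fixed-interval segment boundaries by time or by size.
def boundaries_by_time(duration_s, interval_s=30):
    return list(range(0, int(duration_s), interval_s))

def boundaries_by_size(total_bytes, chunk_kb=6000):
    return list(range(0, total_bytes, chunk_kb * 1024))

print(boundaries_by_time(95))              # [0, 30, 60, 90]
print(boundaries_by_size(20_000_000)[:3])  # [0, 6144000, 12288000]
```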
f) A Combination of Two or More of the Above

In one embodiment, the point in time audio and video events may be used to segment the parsed sentences. Using the example transcript from the Frank Gehry interview as given earlier:
6:28 RSW: What's the status of the New York project?
6:31 FG: I don't really know. Tom Krens came to me with Bilbao and explained it all to me, and I thought he was nuts. I didn't think he knew what he was doing, and he
pulled it off. So, I think he's Icarus and Phoenix all in one guy. (Laughter) He gets up there and then he ... comes back up. They're still talking about it. September 11 generated some interest in moving it over to Ground Zero, and I'm totally against that. I just feel uncomfortable talking about or building anything on Ground Zero I think for a long time.
7:25 RSW: The picture on the screen, is that Disney?
7:29 FG: Yeah.
7:30 RSW: How much further along is it than that, and when will that be finished?
7:33 FG: That will be finished in 2003 -- September, October -- and I'm hoping Kyu, and Herbie, and Yo-Yo and all those guys come play with us at that place. Luckily, today most of the people I'm working with are people I really like. Richard Koshalek is probably one of the main reasons that Disney Hall came to me. He's been a cheerleader for quite a long time. There aren't many people around that are really involved with architecture as clients. If you think about the world, and even just in this audience, most of us are involved with buildings. Nothing that you would call architecture, right? And so to find one, a guy like that, you hang on to him. He's become the head of Art Center, and there's a building by Craig Ellwood there. I knew Craig and respected him. They want to add to it and it's hard to add to a building like that -- it's a beautiful, minimalist, black steel building -- and Richard wants to add a library and more student stuff and it's a lot of acreage. I convinced him to let me bring in another architect from Portugal: Alvaro Siza.
The system knows three things at this point:
1. Audio cues which tell us that there are two participants and the PIT that each starts speaking
2. Video cues relating to changes in the video. This is not particularly useful in this instance; there are very few and they turn out not to relate to the conversation very much in this segment.
3. A parse tree for each of the sentences.
Putting all of this together, the system can analyse the video cues as being related because a picture is shown at one point (around 7:27) which is a cutaway from the main shot of the two men speaking to a picture of a building. The histogram is significantly different at this point against the average over the past few frames, and so the system knows that at this point something has happened in the conversation.
In this example the system can mix the PIT events which identifies:
1. There are 2 participants in the conversation, so the system makes a determination that this is an interview.
2. When each participant is speaking.
3. The heuristics of an interview may enable the system to conclude that one of the participants is making statements/asking questions and that the other is answering those questions and elaborating on them.
4. When changes in the video occur, such as the cutaway shots.

To segment the text, the system may use the PIT events and attempt to relate them, with some small degree of variance, to the parsed transcript - i.e. the change in the video to cut away to a photo at 7:27 relates to the question being asked, which starts some two seconds previously, so the system has some tolerance there: PIT events don't have to exactly relate to changes in speech or text.
1. The first sentence is simple to determine, because the video doesn't vary and the audio PIT tags start and end with the sentence:
6:28 RSW: What's the status of the New York project?
2. Because the system knows this is an interview, the system can determine that the second sentence is an answer to that question. The system can then use the audio PIT events to determine that a single speaker is stringing a set of sentences together, and that they are related in their meaning.
6:31 FG: I don't really know. Tom Krens came to me with Bilbao and explained it all to me, and I thought he was nuts. I didn't think he knew what he was doing, and he pulled it off. So, I think he's Icarus and Phoenix all in one guy. (Laughter) He gets up there and then he ... comes back up. They're still talking about it. September 11 generated some interest in moving it over to Ground Zero, and I'm totally against that. I just feel uncomfortable talking about or building anything on Ground Zero I think for a long time.
A simple segmentation approach would be to take these two sets of sentences and create a single segment from them. The meaning then becomes about the "New York project". If a currently playing segment
is about "architecture projects in New York" or "Frank Gehry's New York projects" both of these will semantically relate to these sentences and this segment will be surfaced as one related content. Each (or at least some) of the content may be a segment from another linear media. During play-back of this segment, content related to this currently played-back segment may be semantically searched for and displayed. This process may be repeated. In this way, for example, a sequence of content segments, each segment relevant to each previously displayed segment may be played-back to a user.
The linear media segment may be displayed within a user interface (such as that described in relation to Figure 5). Start and stop time markers of the linear media segment may be determined, and the resulting portion of the segment may be saved to a favourites list, or only that portion may be used to determine semantically related segments. The start and stop time markers may be defined by the user during play-back (for example, via a slider user interface control) or may be detected automatically based upon how long the user views the segment.
In step 302, at least one of the related content that is located (e.g. by processor 101) may be displayed (e.g. on the playback device 102). The related content that is displayed may be selected by the user (e.g. at the playback device 102) from a plurality of related content options, or it may be selected automatically. Automatic selection may be based upon the most relevant content identified. The related content may be a segment of linear media and may also be played back within the user interface (as described in relation to Figure 5).
Start and stop markers may be defined by a user for the related linear media content that is played-back. These may refine the segment size that is delivered for future semantic matches. The content options may be displayed as a plurality of links, each link corresponding to a related content. Actuation of the link by the user within a user interface (such as described in relation to Figure 5) at the playback device 102 may trigger playback of the corresponding related content. The semantic relationship between content segments within a sequence may be displayed within a user interface (such as that described in relation to Figure 7) as a series of bookmarks or "breadcrumb trails". In one embodiment, a user may "back-track" to a previous segment by selecting the bookmark. The segments semantically related to that previous segment may then be displayed again.
Referring to Figure 4, a method 400 for searching within a plurality of linear media in accordance with an embodiment of the invention will be described. Each of the linear media may comprise one or more tracks, such as an audio track and/or a video track.
In step 401, each track of each linear media is segmented (e.g. at processor 201).
The tracks may be segmented in any of the ways described in relation to Figure 3, for example, by audio, video, transcript, time, size, or any combination. In one embodiment, the tracks may be segmented by a time period provided by the user who provides the subsequent query string.
In step 402, each segment of at least one of the tracks is semantically analysed to build an index for the linear media (e.g. at processor 201).
The segments may be semantically analysed using a natural language processing method to extract meaning from the segments. In one embodiment, the semantic analysis includes extraction of a transcription from the linear media. For example, an audio track for the linear media may be transcribed, or a transcription for the audio track may be retrieved from a database, and the transcription may be used to extract text transcribed from speech occurring within the segment.

In step 403, a query string is received (e.g. at processor 201). The query string may be received from another apparatus, such as a user apparatus, via, for example, communications module 202. The query string may be provided by a user at the user apparatus.

In step 404, the index for the linear media is searched to retrieve one or more matching segments using the query string.
The matching segments may be those segments which match the query string within a defined threshold. The threshold for matching may be predefined or provided by the user who provided the query string.
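One possible realisation of steps 402 to 404 uses sentence embeddings. The document does not prescribe a semantic model, so the sentence-transformers library, the model name, and the similarity threshold below are illustrative assumptions.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # an assumed, generic model

def build_index(segment_transcripts):
    """Step 402: one embedding per segment transcript acts as the index."""
    return model.encode(segment_transcripts, convert_to_tensor=True)

def search(query_string, index, segment_transcripts, threshold=0.4):
    """Steps 403-404: return (segment_text, score) pairs for segments whose
    similarity to the query string exceeds the defined threshold."""
    query = model.encode(query_string, convert_to_tensor=True)
    scores = util.cos_sim(query, index)[0]
    matches = [
        (segment_transcripts[i], float(score))
        for i, score in enumerate(scores)
        if float(score) >= threshold
    ]
    return sorted(matches, key=lambda match: match[1], reverse=True)
```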
In one embodiment, the method described in relation to Figure 3 may then be used. For example, one of the matching segments above may be played back, and content segments semantically related to that matching segment may be retrieved in accordance with method 300.
In one embodiment, the following search method is used to locate matching segments using a query string.

User Search
A user may search using a natural language search or search phrases.
To distinguish between a natural language search and simple search phrases within the query string, the system makes the following determination (a code sketch follows the list):
· A statistical sentence detector is used to determine whether there are one or more sentences. If there are not, the search falls back to the search-phrases approach.
· A network grammar parser is run over the sentence(s) produced to determine whether there is a consistent sentence structure and the sentence makes sense. If this is the case, the natural language search is used.
· If the network grammar parser fails to find a coherent sentence, the search falls back to the search-phrases approach.
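The dispatch can be sketched as follows. NLTK's punkt tokenizer stands in for the statistical sentence detector, and the network grammar parser is left as a pluggable callable returning None on failure, since the document does not name a specific implementation.

```python
import nltk  # requires: nltk.download("punkt")

def choose_search_mode(query_string, grammar_parse):
    """Return "natural_language" or "search_phrases" for a query string.
    `grammar_parse` is any parser returning None for incoherent input."""
    sentences = nltk.sent_tokenize(query_string)
    if not sentences:
        return "search_phrases"
    # Use natural language search only if every sentence parses coherently;
    # otherwise fall back to the search-phrases approach.
    if all(grammar_parse(sentence) is not None for sentence in sentences):
        return "natural_language"
    return "search_phrases"
```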
Search Phrase Search

A simple statistical parser is used to pull out the following, which are then used as separate search phrases (OR'd and weighted) for the search method. A statistical parser is used here because the syntax of the text is not well-formed enough for a network grammar parser to coherently extract the information. This approach is therefore a method for constructing a search derived from simple search phrases which are not complete sentences. The statistical parser attempts to break the text down into parts of speech.
An example of the weightings given to parts-of-speech from this statistical parsing could be the following:
· Proper Nouns (High weighting)
· Noun Phrases (Medium weighting)
· Nouns (Medium-Low weighting)
· Verb Phrases (Low weighting)

The search phrase "gene therapy" could be entered into the search and will return results from the search method based on the transcript chunks. Multiple search phrases are OR'd so that the search method can find documents to which they are related:
"gene therapy" OR "depression"
With the mechanics of the search method, this OR'd expression weights documents that match both phrases more highly. Additionally, the query may be modified with weightings as described above, so that proper nouns are given priority.
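A sketch of the weighted OR construction follows. NLTK's part-of-speech tagger and a simple noun-phrase chunker stand in for the statistical parser; the numeric weights mirror the list above (verb phrases are omitted by this simple chunker), and the Lucene-style "^boost" syntax is an assumption rather than something this document specifies.

```python
import nltk  # requires: punkt and averaged_perceptron_tagger downloads

WEIGHTS = {"NNP": 3.0, "NP": 2.0, "NN": 1.5}  # high, medium, medium-low

def weighted_or_query(text):
    """Build an OR'd, weighted query from simple search phrases."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    # Chunk simple noun phrases: optional determiner/adjectives, then nouns.
    tree = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}").parse(tagged)
    clauses = []
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
        words = [word for word, _ in subtree.leaves()]
        tags = [tag for _, tag in subtree.leaves()]
        if all(tag.startswith("NNP") for tag in tags):
            weight = WEIGHTS["NNP"]  # proper noun: high weighting
        elif len(words) > 1:
            weight = WEIGHTS["NP"]   # noun phrase: medium weighting
        else:
            weight = WEIGHTS["NN"]   # bare noun: medium-low weighting
        clauses.append('"%s"^%.1f' % (" ".join(words), weight))
    return " OR ".join(clauses)
```

Called on text such as "gene therapy, depression", this yields '"gene therapy"^2.0 OR "depression"^1.5', so documents matching both phrases score more highly.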
This search provides a good-enough fit where a user has entered simple search phrases.
Natural Language Search
A natural language search such as "what are the latest gene therapy treatments for treating depression?" will be treated differently. The network grammar parser is applied to this sentence so that the meaning of the sentence can be derived.
The following method may then be applied:
· Temporal components are pulled from the sentence and provide a time-frame for the search. For example, the temporal modifier here is "latest", which is not matched against the documents, as it is a modifier for the search itself. The search method can understand this to mean many things based on the corpus of results it has, so it is applied as a filter after the search has completed and can mean:
o Given 1,000 results spanning 15 years, take the past year's worth of results
o Given 10 results spanning 12 months, take the past 3 months' worth of results
· The user's previous searches are also taken into account when providing temporal filters. For example, for a user who has recently been searching for, or has clicked through to, media released in the past 3 months, it can be derived that this user wishes searches to be filtered in this way. A decision tree is trained to determine how these temporal filters will work. The decision tree is retrained and refined over time to include analytics (and therefore user interactions) along with temporal modifiers in the search (e.g. "latest") to provide a best fit for the user's search. A simple hard-coded version of the "latest" filter is sketched below.
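The corpus-relative behaviour of a modifier like "latest" can be expressed as a post-search filter. The two branches below hard-code the worked examples above; a deployment would instead train the decision tree described, so every cut-off here is an illustrative assumption.

```python
import datetime

def filter_latest(results, now=None):
    """Post-search filter: `results` is a list of (document, release_date)
    pairs; keep only those recent enough relative to the corpus spread."""
    if not results:
        return results
    now = now or datetime.date.today()
    dates = [date for _, date in results]
    span_days = (max(dates) - min(dates)).days
    if len(results) >= 1000 and span_days >= 15 * 365:
        cutoff = now - datetime.timedelta(days=365)  # past year's worth
    elif len(results) <= 10 and span_days <= 365:
        cutoff = now - datetime.timedelta(days=90)   # past 3 months' worth
    else:
        # Interpolated fallback: a fixed fraction of the corpus span.
        cutoff = now - datetime.timedelta(days=max(span_days // 10, 30))
    return [(doc, date) for doc, date in results if date >= cutoff]
```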
The search method will use the extracted and stored meanings for the chunked sentences to match against the non-temporal elements of the parsed sentence structure. The main parts of the search phrases are now taken into account: the search phrases "gene therapy" and "treating depression" are pulled from the sentence and OR'd, similarly to the search phrase search. However, the search now runs against the stored parsed constructs rather than against the chunked text itself.
Historical user behaviour may be used to improve the search results. For example, the historical user behaviour may include:
1. How long the user has watched a segment;
2. Any pauses, rewinds, or increases in volume while watching a segment, which will improve the ranking for that segment; and
3. Any fast-forwards, video closes or early ends, or volume reductions, which will reduce the ranking for that segment.
Historical user behaviour may be used to improve search results for all users or for a specific user. A specific user may be identified via a user login, a stored cookie, matching device identifiers or another identifier, or via the use of biometrics such as facial recognition or retinal scans. A sketch of the behaviour-based ranking adjustment follows.
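The three behavioural signals can be folded into a segment's ranking as a simple score adjustment. The event names and the size of each boost or penalty are illustrative assumptions; the document fixes only the direction of each adjustment.

```python
# Positive events improve a segment's ranking; negative ones reduce it.
ENGAGEMENT = {
    "pause": +0.10, "rewind": +0.15, "volume_up": +0.05,
    "fast_forward": -0.10, "close": -0.20, "volume_down": -0.05,
}

def adjust_score(base_score, watch_fraction, events):
    """watch_fraction is how much of the segment was watched (0.0-1.0)."""
    score = base_score * (0.5 + 0.5 * watch_fraction)
    for event in events:
        score += ENGAGEMENT.get(event, 0.0)
    return max(score, 0.0)
```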
A user interface 500 for use with embodiments of the invention will be described with reference to Figure 5.
The user interface 500 includes a first visual portion 501. The first portion 501 is configured for displaying the video track for linear media.
The user interface 500 may include a visual control portion 502 for the linear media, which may contain user-actionable controls such as the standard controls play, rewind, fast-forward, pause, and stop, and the special controls play next related segment now and play previous segment now.
The user interface 500 may include a second visual portion 503. The second portion 503 is configured for displaying links (504a to 504d) for one or more content related to the segment currently being played within the first portion 501. The links (504a to 504d) may be visual representations of the one or more content, such as thumbnails or icons, or textual representations, such as titles or summaries, or a combination of both. The links (504a to 504d) may be user-actionable to select the next content to display within the first portion 501. The content may be a segment within a linear media.
The user interface 500 may include further elements as shown in Figures 6, 7, and 8.
The inventors have discovered that jumping from related segment to related segment may make it difficult for the user to track their search results. For example, suppose the system 100 or 200 indexes a corpus of videos from the technology category of TED Talks and the search phrase "Will robots take our jobs?" is submitted.
The first result returned may be a segment within the video by David Autor: Will automation take away. The sentence "Automation, employment, robots and jobs - it'll all take care of itself?" within the video is most relevant to the search phrase.
Once the user starts watching the video at this point, the search engine analyses the segment being watched to surface similar content. The most relevant segment is within the video by Radhika Nagpal: What intelligent machines can learn from a school of fish. This jump to a result seemingly unrelated to the original search term can make it difficult for users to keep track of the search path which led them to any given segment result.
The inventors have devised several elements which may improve the user interface 500 to assist users in tracking search results and/or their search "path". A search path is a sequence of content segments retrieved during play-back as described in relation to Figure 3 above.
In one embodiment, when the user selects pause within 502, the dynamic semantic search is also paused and the results will not change. The user can then actuate another control, "favourites", to add the results at that point within the video to a later-retrievable list of favourites. The favourites may also be organisable by the user, via a user interface, into folders.

In one embodiment illustrated in Figure 6, the user is provided with further controls within 502 to "clip the video segment" to focus on the parts of the video that are of most interest. This "clipped" segment can then be saved to the favourites list.

In one embodiment illustrated in Figure 7, details relating to the search path may be recorded and displayed within a user interface. The user may then select, within the user interface, one or more of the search phrases or ancestor videos surfaced from a previously typed search within this search path, to return to the search results displayed at that point in the search path. One possible record behind such a display is sketched below.
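A simple stack of search-path steps could underlie the Figure 7 display; the representation below is an assumption, since the document describes the behaviour rather than a data structure.

```python
class SearchPath:
    """Records search phrases and ancestor videos so the user can return
    to the results displayed at any earlier point in the path."""

    def __init__(self):
        self._steps = []  # (label, results shown at that point)

    def visit(self, label, results):
        self._steps.append((label, results))

    def breadcrumbs(self):
        return [label for label, _ in self._steps]

    def back_track(self, index):
        """Return to an earlier step; later steps are discarded so the
        path once again ends at the step the user selected."""
        label, results = self._steps[index]
        del self._steps[index + 1:]
        return label, results
```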
In one embodiment illustrated in Figure 8, links between favourite related segments can be displayed using a relationship diagram, such as a spider diagram, within a user interface. A user can then select a node and the segment associated with that node will be played back.
Potential advantages of some embodiments of the present invention include: an improved index can be created for a linear media to enable the retrieval of relevant content, and improved retrieval of relevant content relating to linear media is possible.
While the present invention has been illustrated by the description of the embodiments thereof, and while the embodiments have been described in considerable detail, it is not the intention of the applicant to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details, representative apparatus and method, and illustrative examples shown and described. Accordingly, departures may be made from such details without departure from the spirit or scope of applicant's general inventive concept.
Claims
1. A method for retrieving semantically related content for a linear media comprised of a plurality of segments, including:
during playback of the linear media:
searching for one or more content semantically related to a currently playing segment of the plurality of segments; and displaying at least one of the one or more related content.
2. A method as claimed in claim 1, wherein the linear media comprises a plurality of tracks, each track split into the plurality of segments.
3. A method as claimed in claim 2, wherein the plurality of tracks include a video track.
4. A method as claimed in any one of claims 2 to 3, wherein the plurality of tracks include an audio track.
5. A method as claimed in any one of the preceding claims, further including:
displaying a link to each of the one or more related content; and receiving input from a user to actuate one of the links;
wherein the one or more related content associated with the one link is displayed only after actuation of the one link.
6. A method as claimed in any one of the preceding claims, wherein the at least one of the one or more related content is displayed automatically at the end of playback of the currently playing segment.
7. A method as claimed in claim 6, wherein only the most relevant of the one or more related content is displayed automatically.
8. A method as claimed in any one of the preceding claims, wherein each of the one or more content is a linear media.
9. A method as claimed in claim 8, wherein each of the one or more content is a segment within a linear media.
10. A method as claimed in any one of claims 8 to 9, wherein one of the one or more content is played-back and during playback of the one content:
searching for further one or more content semantically related to a currently playing segment of the content; and
displaying at least one of the further one or more related content.
11. A method as claimed in any one of the preceding claims, wherein the current segment of the plurality of segments is semantically analysed.
12. A method as claimed in claim 11, wherein the current segment is semantically analysed by extracting a textual transcription.
13. A method as claimed in claim 12, when dependent on claim 4, wherein the textual transcription is extracted from the audio track.
14. A method as claimed in any one of the preceding claims, further including:
for each of the one or more content and prior to playback of the digital media:
segmenting an audio track and a video track for each content; and
semantically analysing each segment of the audio track to build an index for the content.
15. A method as claimed in any one of the preceding claims, wherein the linear media is partitioned into segments based upon one or more selected from the set of video changes, audio changes, and transcription analysis.
16. A method as claimed in any one of the preceding claims, wherein the one or more content is semantically related to the currently playing segment using a natural language processing method.
17. A method as claimed in any one of the preceding claims, when dependent on claim 10, wherein a sequence of user interface elements is displayed within a user interface, each element corresponding to one content segment in a chain of semantically related content.
18. A method as claimed in claim 17, wherein each user interface element, when actuated by a user, plays-back the corresponding content segment.
19. A method as claimed in claim 18, wherein during play-back of the corresponding content segment, the corresponding at least one of the further one or more related content is displayed.
20. A method for searching within a plurality of linear media each comprising one or more tracks, including:
a) for each of the linear media:
segmenting the one or more tracks for the linear media; and semantically analysing each segment of at least one of the tracks to build an index for the linear media;
b) receiving a query string; and
c) searching the index for each linear media to retrieve one or more matching segments using the query string.
21. A method as claimed in claim 20, wherein the one or more tracks include an audio track.
22. A method as claimed in any one of claims 20 to 21, wherein the one or more tracks include a video track.
23. A method as claimed in any one of claims 20 to 22, further including playing back at least one of the segments which match the query string.
24. A method as claimed in claim 23, wherein the at least one segment is played back in a sequence constructed based upon the segments with the closest match to the query string.
25. A method as claimed in claim 23, further including displaying a plurality of segments for linear media which match the query string.
26. A method as claimed in any one of claims 20 to 25, further including receiving input from a user who provided the query string to select at least one of the plurality of segments, wherein the selected segment is played back.
27. A method as claimed in claim 26, wherein each matching segment is played back in sequence.
28. A method as claimed in any one of claims 20 to 27, wherein the matching segments are those which match the query string within a defined threshold.
29. A method as claimed in claim 28, wherein the defined threshold is predefined.
30. A method as claimed in claim 28, wherein the defined threshold is defined by a user who provided the query string.
31. A method as claimed in any one of claims 20 to 30, wherein the size of the segments is defined by the user.
32. A method as claimed in any one of claims 20 to 31, when dependent on claim 21, wherein semantically analysing each segment includes extracting a textual transcription from the audio track.
33. A method as claimed in any one of claims 20 to 32, when dependent on claim 21, wherein semantically analysing each segment includes retrieving textual information corresponding to the audio track from a database.
34. A method as claimed in any one of claims 20 to 33, wherein the one or more tracks of each of the linear media are partitioned into segments based upon one or more selected from the set of video changes, audio changes, and transcription analysis.
35. A system for retrieving content for a linear media comprising a plurality of segments, comprising:
a processor configured to, during playback of the linear media on a playback device, search for one or more content semantically related to a currently playing segment of the plurality of segments;
a memory configured to store the one or more content; and
a playback device configured to display at least one of the one or more related content.
36. A system for searching within a plurality of linear media comprising one or more tracks, comprising:
one or more processors configured to: for each of the plurality of linear media, segment the one or more tracks for the linear media and semantically analyse each segment of at least one of the tracks to build an index for the linear media; receive a query string; and search the index for each linear media to retrieve one or more matching segments using the query string; and
a memory configured to store the index for each of the plurality of linear media.
37. A computer program configured to perform the method of any one of claims 1 to 35.
38. A computer readable medium configured to store the computer program of claim 37.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GBGB1621768.9A GB201621768D0 (en) | 2016-12-20 | 2016-12-20 | A method and system for digital linear media retrieval |
| GB1621768.9 | 2016-12-20 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018115878A1 true WO2018115878A1 (en) | 2018-06-28 |
Family
ID=58284607
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/GB2017/053848 Ceased WO2018115878A1 (en) | 2016-12-20 | 2017-12-20 | A method and system for digital linear media retrieval |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB201621768D0 (en) |
| WO (1) | WO2018115878A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022188862A1 (en) * | 2021-03-12 | 2022-09-15 | 北京字节跳动网络技术有限公司 | Media content favoriting method and apparatus, electronic device, and computer readable storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080175556A1 (en) * | 2005-08-24 | 2008-07-24 | Chitra Dorai | System and method for semantic video segmentation based on joint audiovisual and text analysis |
| US20130259399A1 (en) * | 2012-03-30 | 2013-10-03 | Cheng-Yuan Ho | Video recommendation system and method thereof |
| US8737817B1 (en) * | 2011-02-08 | 2014-05-27 | Google Inc. | Music soundtrack recommendation engine for videos |
| US9230547B2 (en) | 2013-07-10 | 2016-01-05 | Datascription Llc | Metadata extraction of non-transcribed video and audio streams |
| US20160142787A1 (en) * | 2013-11-19 | 2016-05-19 | Sap Se | Apparatus and Method for Context-based Storage and Retrieval of Multimedia Content |
2016
- 2016-12-20 GB GBGB1621768.9A patent/GB201621768D0/en not_active Ceased

2017
- 2017-12-20 WO PCT/GB2017/053848 patent/WO2018115878A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| GB201621768D0 (en) | 2017-02-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Wang et al. | Toward automatic audio description generation for accessible videos | |
| US11107465B2 (en) | Natural conversation storytelling system | |
| US20210369042A1 (en) | Natural conversation storytelling system | |
| Cafaro et al. | The NoXi database: multimodal recordings of mediated novice-expert interactions | |
| US20190043500A1 (en) | Voice based realtime event logging | |
| US10614829B2 (en) | Method and apparatus to determine and use audience affinity and aptitude | |
| Sundaram et al. | A utility framework for the automatic generation of audio-visual skims | |
| US7043433B2 (en) | Method and apparatus to determine and use audience affinity and aptitude | |
| JP2025512697A (en) | Structured Video Documentation | |
| CN116361510B (en) | Method and device for automatically extracting and retrieving scenario segment video established by utilizing film and television works and scenario | |
| WO2015030962A1 (en) | Providing an electronic summary of source content | |
| US9563704B1 (en) | Methods, systems, and media for presenting suggestions of related media content | |
| CN113392273A (en) | Video playing method and device, computer equipment and storage medium | |
| Roy et al. | Tvd: a reproducible and multiply aligned tv series dataset | |
| US20250310608A1 (en) | Navigating content by relevance | |
| WO2021167238A1 (en) | Method and system for automatically creating table of contents of video on basis of content | |
| EP4508545A1 (en) | Dynamic chapter generation for a communication session | |
| Haubold et al. | Vast mm: multimedia browser for presentation video | |
| US11445244B1 (en) | Context-aware question-answer for interactive media experiences | |
| WO2018115878A1 (en) | A method and system for digital linear media retrieval | |
| US10657202B2 (en) | Cognitive presentation system and method | |
| Dalla Torre et al. | Deep learning-based lexical character identification in TV series | |
| Baraldi et al. | NeuralStory: An interactive multimedia system for video indexing and re-use | |
| KR101783872B1 (en) | Video Search System and Method thereof | |
| Janin et al. | Joke-o-Mat HD: browsing sitcoms with human derived transcripts |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17825282; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17825282; Country of ref document: EP; Kind code of ref document: A1 |