WO2018208192A1

WO2018208192A1 - A system and methods to provide recommendation for items of content

Info

Publication number: WO2018208192A1
Application number: PCT/SE2017/050458
Authority: WO
Inventors: Rafia Inam; Erlendur Karlsson; Lackis ELEFTHERIADIS
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2017-05-08
Filing date: 2017-05-08
Publication date: 2018-11-15
Anticipated expiration: 2019-11-08

Abstract

Method performed by a first communication device (101), the method being for handling a recommendation to a second communication device (102). The first communication device (101) and the second communication device (102) operate in a telecommunications network (100). The first communication device (101) determines (202) a recommendation for an item of content to be provided by a third communication device (103) operating in the telecommunications network (100). The determination of the recommendation is based on signals collected by a receiving device (130) located in a space where one or more users of the third communication device (103) are located, during a time period. The signals are at least one of: audio signals and video signals. The first communication device (101) initiates (203) sending a first indication of the determined recommendation to the second communication device (102).

Description

A system and methods to provide recommendation for items of content

TECHNICAL FIELD

The present disclosure relates generally to a first communication device and methods performed thereby for handling a recommendation to a second communication device. The present disclosure also relates generally to the second communication device, and methods performed thereby for handling the recommendation from the first communication device. The present disclosure additionally relates generally to a fourth communication device, and methods performed thereby for handling a characterization of a context of a third communication device. The present disclosure further relates generally to a computer program product, comprising instructions to carry out the actions described herein, as performed by the first communication device, the second

communication device, or the fourth communication device. The computer program product may be stored on a computer-readable storage medium.

BACKGROUND

Wireless devices within a telecommunications network may be e.g., stations (STAs), User Equipments (UEs), mobile terminals, wireless terminals, terminals, and/or Mobile Stations (MS). Wireless devices are enabled to communicate wirelessly in a cellular communications network or wireless communication network, sometimes also referred to as a cellular radio system, cellular system, or cellular network. The communication may be performed e.g. between two wireless devices, between a wireless device and a regular telephone, and/or between a wireless device and a server via a Radio Access Network (RAN) , and possibly one or more core networks, comprised within the

telecommunications network. Wireless devices may further be referred to as mobile telephones, cellular telephones, laptops, or tablets with wireless capability, just to mention some further examples. The wireless devices in the present context may be, for example, portable, pocket-storable, hand-held, computer-comprised, or vehicle-mounted mobile devices, enabled to communicate voice and/or data, via the RAN, with another entity, such as another terminal or a server.

The telecommunications network covers a geographical area which may be divided into cell areas, each cell area being served by a network node or Transmission Point (TP), for example, an access node such as a Base Station (BS), e.g. a Radio Base Station (RBS), which sometimes may be referred to as e.g., evolved Node B ("eNB"), "eNodeB", "NodeB", "B node", or BTS (Base Transceiver Station), depending on the technology and terminology used. The base stations may be of different classes such as e.g. Wide Area Base Stations, Medium Range Base Stations, Local Area Base Stations and Home Base Stations, based on transmission power and thereby also cell size. A cell is the

geographical area where radio coverage is provided by the base station at a base station site. One base station, situated on the base station site, may serve one or several cells. Further, each base station may support one or several communication technologies. The telecommunications network may also be a non-cellular system, comprising network nodes which may serve receiving nodes, such as wireless devices, with serving beams.

In the context of this disclosure, the expression Downlink (DL) is used for the transmission path from the base station to the wireless device. The expression Uplink (UL) is used for the transmission path in the opposite direction i.e., from the wireless device to the base station.

In 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE), base stations, which may be referred to as eNodeBs or even eNBs, may be directly connected to one or more core networks.

3GPP LTE radio access standard has been written in order to support high bitrates and low latency both for uplink and downlink traffic. All data transmission in LTE is controlled by the radio base station.

A Recommender System may be understood as a system that may recommend items of potential interest to a user within a particular area or scope. Today, there exist recommender systems that may recommend books, financial services, garments, movies, music, news, research articles, restaurants, romantic partners -online dating-, social tags, and products in general. The Recommender systems of today may typically produce a list of recommendations in one of two ways: through collaborative and content-based filtering. Collaborative filtering approaches may build a model from a user's past behavior -such as items previously purchased or selected and/or numerical ratings given to those items-, as well as similar decisions made by other users with similar taste and preferences to those of the user. This model may then be used to predict items, or ratings for items, that the user may have an interest in. Content-based filtering approaches may be understood to utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties. These approaches may often be combined in so called Hybrid Recommender Systems. These existing Recommender Systems use a relatively static sample set of preferences, likes and dislikes of a user, and may result in

recommendations that do not match the preferences of the user, which may result in turn in a waste of system resources, and low user satisfaction.

SUMMARY

It is an object of embodiments herein to improve the handling a recommendation for an item of content in a telecommunications network.

According to a first aspect of embodiments herein, the object is achieved by a method performed by a first communication device. The method is for handling a recommendation to a second communication device. The first communication device and the second communication device operate in a telecommunications network. The first communication device determines a recommendation for an item of content to be provided by a third communication device operating in the telecommunications network. The determination of the recommendation is based on signals collected by a receiving device located in a space where one or more users of the third communication device are located, during a time period. The signals are at least one of: audio signals and video signals. The first communication device also initiates sending a first indication of the determined recommendation to the second communication device.

According to a second aspect of embodiments herein, the object is achieved by a method performed by a fourth communication device. The method for handling a characterization of a context of the third communication device having the one or more users. The first communication device and the third communication device may operate in the telecommunications network. The fourth communication device obtains signals collected by the receiving device located in the space where the one or more users of the third communication device are located, during the time period. The signals are at least one of: audio signals and video signals. The fourth communication device determines the characterization of the context of the third communication device by determining one or more factors. The one or more factors are obtained from an analysis of the obtained signals. The one or more factors comprise at least one of: a) one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; b) a first mood derived from a tone of one or more voices detected in the audio signals; c) a second mood derived from a first semantic analysis of a language used by one or more voices detected in the audio signals; d) a topic of discussion derived from a second semantic analysis of a language used by one or more voices detected in the audio signals, and e) a third mood derived from at least one of: a body movement and a gesture detected in each of the one or more users. The fourth communication device then initiates sending a second indication of the determined characterization of the context to the first communication device operating in the telecommunications network or another communication device operating in the telecommunications network.

According to a third aspect of embodiments herein, the object is achieved by a method performed by the second communication device. The method is for handling the recommendation from the first communication device. The first communication device and the second communication device operate in the telecommunications network. The second communication device receives the first indication for the recommendation for the item of content to be provided by the third communication device operating in the telecommunications network. The recommendation is based on the signals collected by the receiving device located in a space where one or more users of the third

communication device are located, during a time period, the signals are at least one of: audio signals and video signals. The second communication device also initiates providing, to the one or more users, a third indication of the received recommendation on an interface of the second communication device.

According to a fourth aspect of embodiments herein, the object is achieved by the first communication device for handling the recommendation to the second

communication device. The first communication device and the second communication device are configured to operate in the telecommunications network. The first communication device is further configured to determine the recommendation for the item of content to be provided by the third communication device configured to operate in the telecommunications network. The determination of the recommendation is configured to be based on signals configured to be collected by the receiving device located in the space where the one or more users of the third communication device are located during the time period. The signals are configured to be at least one of: audio signals and video signals. The first network node is also configured to initiate sending the first indication of the recommendation configured to be determined to the second communication device.

According to a fifth aspect of embodiments herein, the object is achieved by the fourth communication device for handling the characterization of the context of a third communication device configured to have the one or more users. The fourth

communication device and the third communication device are configured to operate in the telecommunications network. The fourth communication device is further configured to obtain signals configured to be collected by the receiving device located in the space where the one or more users of the third communication device are located, during the time period. The signals are configured to be at least one of: audio signals and video signals. The fourth communication device is also configured to determine the

characterization of the context of the third communication device by determining the one or more factors configured to be obtained from the analysis of the signals configured to be obtained. The one or more factors comprise at least one of: a) the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; b) the first mood configured to be derived from the tone of one or more voices configured to be detected in the audio signals; c) the second mood configured to be derived from the first semantic analysis of the language used by one or more voices configured to be detected in the audio signals; d) the topic of discussion configured to be derived from a second semantic analysis of the language used by the one or more voices configured to be detected in the audio signals, and e) the third mood configured to be derived from at least one of: the body movement and the gesture configured to be detected in each of the one or more users. The fourth communication device is also configured to initiate sending the second indication of the characterization of the context configured to be determined to the first communication device configured to operate in the telecommunications network or another communication device configured to operate in the telecommunications network.

According to a sixth aspect of embodiments herein, the object is achieved by the second communication device for handling the recommendation from the first

communication device. The first communication device and the second communication device are configured to operate in the telecommunications network. The second communication device is further configured to receive the first indication for the recommendation for the item of content to be provided by the third communication device configured to operate in the telecommunications network. The recommendation is based on signals collected by the receiving device configured to be located in the space where the one or more users of the third communication device are located, during the time period, the signals being configured to be at least one of: audio signals and video signals. The second communication device is also configured to initiate providing, to the one or more users, the third indication of the recommendation configured to be received on the interface of the second communication device.

According to a seventh aspect of embodiments herein, the object is achieved by a computer program. The computer program comprises instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the first communication device. According to an eighth aspect of embodiments herein, the object is achieved by computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the first communication device.

According to a ninth aspect of embodiments herein, the object is achieved by a computer program. The computer program comprises instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the fourth communication device.

According to a tenth aspect of embodiments herein, the object is achieved by computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the fourth communication device.

According to an eleventh aspect of embodiments herein, the object is achieved by a computer program. The computer program comprises instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the second communication device.

According to a twelfth aspect of embodiments herein, the object is achieved by computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method performed by the second communication device.

By the first communication device determining the recommendation for the item of content to be provided by the third communication device, based on the signals collected in the space wherein one or more users of the third communication device are located during the time period, the first communication device enables the second communication device to provide a recommendation to the one or more users which is optimally adapted to the current context and mood of the one or more users, and thus provide the one or more users with much more relevant recommendations than those based on collaborative and content-based filtering methods. Therefore, a process of selection of an item of content by the one or more users is shortened, taking less capacity and processing resources from the network, reducing power consumption of the devices involved, e.g., battery consumption in wireless devices, and overall enhancing the satisfaction of the one or more users. The determination of the recommendation by the first communication device may be enabled by the fourth communication device determining the characterization of the context of the third communication device.

BRIEF DESCRIPTION OF THE DRAWINGS

Examples of embodiments herein are described in more detail with reference to the accompanying drawings, and according to the following description.

Figure 1 is a schematic diagram illustrating embodiments of a telecommunications

network, according to embodiments herein.

Figure 2 is a flowchart depicting embodiments of a method in a first communication

device, according to embodiments herein.

Figure 3 is a flowchart depicting embodiments of a method in a fourth communication device, according to embodiments herein.

Figure 4 is a flowchart depicting embodiments of a method in a second communication device, according to embodiments herein.

Figure 5 is a schematic diagram illustrating an example of the different components of the telecommunications network and their interactions, according to embodiments herein.

Figure 6 is a schematic diagram illustrating another example of the different components of the telecommunications network and their interactions, according to embodiments herein.

Figure 7 is a schematic diagram illustrating another example of the different components of the telecommunications network and their interactions, according to embodiments herein.

Figure 8 is a schematic diagram illustrating an example of a method performed by

components of a telecommunications network, according to embodiments herein.

Figure 9 is a schematic diagram illustrating another example of a method performed by components of a telecommunications network, according to embodiments herein.

Figure 10 is a schematic block diagram illustrating embodiments of a first communication device, according to embodiments herein.

Figure 1 1 is a schematic block diagram illustrating embodiments of a fourth

communication device, according to embodiments herein.

Figure 12 is a schematic block diagram illustrating embodiments of a second

communication device, according to embodiments herein. DETAILED DESCRIPTION

As part of the development of embodiments herein, a problem will first be identified and discussed.

As stated earlier, the existing Recommender Systems use a relatively static sample set of preferences, likes and dislikes of a user, and do not adapt to the contextual setting of the user, which may be characterized by time, type of place, the company the user is in, the mood of the user and the mood of the company the user is in, etc.. Attempts have been made to incorporate contextual information into the recommendation function.

However, so far the contextual information used has been rather simple such as the season, week day, time of day, etc. These systems have been shown to provide slightly better recommendations than those not making use of contextual information.

Nevertheless, the contextual information being used in current recommender systems is far from capturing those contextual factors that profoundly influence a user's preference in any given place and time.

For an example, in the case of a TV recommender system, a group of young rowdy men in the living room watching TV is much more likely to be susceptible for fast paced action than a romantic comedy, and if they are discussing soccer, a live broadcast from the Champion's League game might be just be right thing.

These kinds of contexts play a substantial role in the choice of content a user may make, but are not captured by the current recommendation systems. Therefore, existing recommendation systems result in a process of selection of an item of content by a user that is unnecessarily lengthened, taking unnecessary capacity and processing resources from the network, increasing power consumption of the devices involved, e.g., battery consumption in wireless devices, and overall low satisfaction of the users.

In order to address this problem, several embodiments are comprised herein, which may be understood to relate to a context-based recommender system. Contextual factors that may influence the preference of a user at any given place and time may be factors such as: the type of group the user may be in, such as rowdy men, young women, parents with their young children, parents with their teenage children, a romantic couple, etc, the mood and topic of a discussion the group the user may be in may be having, or the mood and tone of the voices of the people in the group.

In the case of a TV recommender system, for example, if a group of parents and their young children are discussing particular cartoon characters, they may be much more likely to be susceptible for animated movies with those characters. On the other hand, if the user context is a group of children with their parents their susceptibility for animated films or family movies will most likely be very high.

These examples illustrate how useful it may be for a recommender system to have access to this kind of user context information and to be able to use it in its

recommendations.

Embodiments herein relate to a recommender system that may take the

characterization of these types of current user context factors into consideration by utilizing an Audio Scene Analysis (ASA) and/or a Video scene Analysis (VSA) system to obtain a characterization of the current context factors described above, and providing a recommendation based on the aggregated information of the user profile and the current context information provided by the ASA and/or the VSA system.

Embodiments herein may be applicable, but not limited to, areas such as films, TV series, sports programs, music, news, books, online computer games and even placement of advertisements.

Embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which examples are shown. In this section, the embodiments herein will be illustrated in more detail by a number of exemplary embodiments. It should be noted that the exemplary embodiments herein are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments.

Figure 1 depicts a non-limiting example of a telecommunications network 100, sometimes also referred to as a cellular radio system, cellular network or wireless communications system, in which embodiments herein may be implemented. The telecommunications network 100 may for example be a network such as a Long-Term Evolution (LTE), e.g. LTE Frequency Division Duplex (FDD), LTE Time Division Duplex (TDD), LTE Half-Duplex Frequency Division Duplex (HD-FDD), LTE operating in an unlicensed band, WCDMA, Universal Terrestrial Radio Access (UTRA) TDD, GSM network, GERAN network, Ultra-Mobile Broadband (UMB), EDGE network, network comprising of any combination of Radio Access Technologies (RATs) such as e.g. Multi- Standard Radio (MSR) base stations, multi-RAT base stations etc., any 3rd Generation Partnership Project (3GPP) cellular network, Wireless Local Area Network/s (WLAN) or WiFi network/s, Worldwide Interoperability for Microwave Access (WMax), 5G system or any cellular network or system. In some examples, the telecommunications network 100 may support Information-Centric Networking (ICN). In some examples, connectivity between the different entities in the telecommunication network 100 of Figure 1 , may be enabled as an over-the-top (OTT) connection, using an access network, one or more core networks, and/or one or more intermediate networks, which are not depicted in the Figure, to simplify it. The OTT connection may be transparent in the sense that the participating communication devices through which the OTT connection may pass may be unaware of routing of uplink and downlink communications.

The telecommunications network 100 comprises a plurality of communication devices, whereof a first communication device 101 , a second communication device 102, a third communication device 103 and a fourth communication device 104 are depicted in Figure 1.

The first communication device 101 may be understood as a first computer system, which may be implemented as a standalone server in e.g., a host computer in the cloud, as depicted in the non-limiting example of Figure 1. The first communication device 101 may in some examples be a distributed node or distributed server, with some of its functions being implemented locally, e.g., by a client manager on a TV set-top-box, and some of its functions implemented in the cloud, by e.g., a server manager. The first communication device 101 may also be implemented as processing resources in a server farm. The first communication device 101 may be under the ownership or control of a service provider, or may be operated by the service provider or on behalf of the service provider.

The second communication device 102 may be understood as a second computer system, e.g., a client computer, in the telecommunications network 100. The second communication device 102 may be for example, as depicted in the example of Figure 1 , a wireless device as described below, e.g., a UE. In other examples which are not depicted in Figure 1 , the second communication device 102 may be locally located, e.g., on a TV set-top-box. The second communication device 102 may also be a network node in the telecommunications network 100, as described below, or a computer, e.g., at a content provider of e.g., media content, in the telecommunications network 100. The second communication device 102 may even be the same device as the third communication device 103. The second communication device may have an interface 110, such as e.g., a touch-screen, a button, a remote control, etc...

The third communication device 103 may also be understood as a third computer system, which may be implemented as a standalone client computer in e.g., a TV set-top- box in the telecommunications network 100, as depicted in the non-limiting example of Figure 1 , a smart TV, or a wireless device as described below, such as a tablet or a smartphone. The third communication device 103 may in some examples be a distributed node or distributed server, with some of its functions being implemented locally, e.g., by a client manager on a TV set-top-box, and some of its functions implemented in the cloud, by e.g., a server manager on e.g., a media server. The third communication device 103 may be located in or co-located with a device to reproduce media for the one or more users, or reproducing device 120, such as a TV or a media player.

The fourth communication device 104 may be understood as a fourth computer system, which may be implemented as a standalone client computer in e.g., a TV set-top- box in the telecommunications network 100, as depicted in the non-limiting example of Figure 1 , a smart TV, or a wireless device as described below, such as a tablet or a smartphone. The fourth communication device 104 may in some examples be a distributed node or distributed server, with some of its functions being implemented locally, e.g., by a client manager on a TV set-top-box, and some of its functions implemented in the cloud, by e.g., a server manager on e.g., a media server. In the example of Figure 1 , the fourth communication device 104 is co-located with the third communication device 103, on a TV set-top-box, as a client. The fourth communication device 104 may also be located in or co-located with the reproducing device 120. The fourth communication device 104 may be understood as a communication device managing or controlling audio or video signals, e.g., an Audio Scene Analyzer (ASA) and/or a Video Scene Analyzer (VSA).

As stated above, in some embodiments, some of the first communication device 101 , the second communication device 102, the third communication device 103 and the fourth communication device 104, may be co-located or be the same device. Particularly, any of the second communication device 102, the third communication device 103 and the fourth communication device 104 may be the same communication device, or may be co-located. In the example of Figure 1 , the third communication device 103 and the fourth communication device 104 are co-located. All the possible combinations are not depicted in Figure 1 to simplify the Figure. Additional non-limiting examples of the telecommunications network 100 are presented below, in Figures 5, 6 and 7.

Any of network nodes comprised in the telecommunications network 100, may be a radio network node, that is, a transmission point such as a radio base station, for example an eNB, an eNodeB, or an Home Node B, an Home eNode B or any other network node capable to serve a wireless device, such as a user equipment or a machine type communication device in the telecommunications network 100. A network node may also be a Remote Radio Unit (RRU), a Remote Radio Head (RRH), a multi-standard BS (MSR BS), or a core network node, e.g., a Mobility Management Entity (MME), Self-Organizing Network (SON) node, a coordinating node, positioning node, Minimization of Driving Test (MDT) node, etc...

The telecommunications network 100 covers a geographical area which, which in 5 some embodiments may be divided into cell areas, wherein each cell area may be served by a radio network node, although, one radio network node may serve one or several cells. Any of the radio network nodes that may be comprised in the telecommunications network 100 may be of different classes, such as, e.g., macro eNodeB, home eNodeB or pico base station, based on transmission power and thereby also cell size. In some

10 examples wherein the telecommunications network 100 may be a non-cellular system, any of the radio network nodes comprised in the telecommunications network 100 may serve receiving nodes with serving beams. Any of the radio network nodes that may be comprised in the telecommunications network 100 may support one or several communication technologies, and its name may depend on the technology and

15 terminology used. In 3GPP LTE, any of the radio network nodes that may be comprised in the telecommunications network 100 may be directly connected to one or more core networks.

A plurality of wireless devices may be located in the wireless communication network 100. Any of the wireless devices comprised in the telecommunications network

20 100 may be a wireless communication device such as a UE which may also be known as e.g., mobile terminal, wireless terminal and/or mobile station, a mobile telephone, cellular telephone, or laptop with wireless capability, just to mention some further examples. Any of the wireless devices comprised in the telecommunications network 100 may be, for example, portable, pocket-storable, hand-held, computer-comprised, or a vehicle-

25 mounted mobile device, enabled to communicate voice and/or data, via the RAN, with another entity, such as a server, a laptop, a Personal Digital Assistant (PDA), or a tablet computer, sometimes referred to as a surf plate with wireless capability, Machine-to- Machine (M2M) device, device equipped with a wireless interface, such as a printer or a file storage device, modem, or any other radio network unit capable of communicating

30 over a wired or radio link in a communications system. Any of the wireless devices

comprised in the telecommunications network 100 is enabled to communicate wirelessly in the telecommunications network 100. The communication may be performed e.g., via a RAN and possibly one or more core networks, comprised within the telecommunications network 100. The telecommunications network also comprises a receiving device 130. The receiving device 130 may be understood as a device capable of detecting and collecting audio signals, such as a microphone, or video signals, such as a camera. The receiving device may be typically collocated with either one of the second communication device 5 102 or the third communication device 103. In the non-limiting example depicted in Figure 1 , the receiving device 130 is co-located with the third communication device 103.

The first communication device 101 is configured to communicate within the telecommunications network 100 with the second communication device 102 over a first link 141 , e.g., a radio link or a wired link. The first communication device 101 is

0 configured to communicate within the telecommunications network 100 with the fourth communication device 104 over a second link 142, e.g., another radio link or another wired link. The second communication device 102 may be further configured to communicate within the telecommunications network 100 with the third communication device 103 over a third link 143, e.g., another radio link, a wired link, an infrared link,5 etc... In other examples than that depicted in Figure 1 , each of the first communication device 101 , the second communication device 102, the third communication device 103 and the fourth communication device 104, when implemented as separate devices, may communicate with each other with a respective link, which may be a wired or a wireless link.

0 Any of the first link 141 , the second link 142 and the third link 143 may be a direct link or it may go via one or more core networks in the telecommunications network 100, which are not depicted in Figure 1 , or it may go via an optional intermediate network. The intermediate network may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network, if any, may be a backbone network5 or the Internet; in particular, the intermediate network may comprise two or more subnetworks (not shown).

In general, the usage of "first", "second", and/or "third", "fourth" and "fifth" herein may be understood to be an arbitrary way to denote different elements or entities, and may be0 understood to not confer a cumulative or chronological character to the nouns they modify.

Embodiments of a method performed by the first communication device 101 , the method being for handling a recommendation to the second communication device 102, will now be described with reference to the flowchart depicted in Figure 2. The first communication device 101 and the second communication device 102 operate in the telecommunications network 100.

The method may comprise one or more of the following actions. In some embodiments all the actions may be performed. In some embodiments, one or more actions may be performed. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. In Figure 2, optional actions are indicated with dashed lines.

Action 201

The first communication device 101 may be understood as a recommender system, performing a recommendation function. That is, as a device determining a

recommendation for providing an item of content to a user of the third communication device 103. This may be performed when a recommendation may be requested by the one or more users of the third communication device 103. In some embodiments, the content may be a media content, such as a movie or a song, and the third communication device 103 may be a media streaming device, e.g., a TV set-top-box, as depicted in Figure 1. According to embodiments herein, the first communication device 101 may determine the recommendation based on contextual factors that may influence a preference of one or more users of the third communication device 103 in any given place and time. The contextual factors may be derived from audio signals and/or video signals. Accordingly, the determination of the recommendation may be based on audio signals and/or video signals collected in a space where the one or more users of the third communication device 103 are located during a time period. The time period may be understood as a certain period of time preceding when the one or more users of the third communication device 103 are to be provided with the recommendation for content, e.g., a few minutes before a family is about to receive a recommendation to watch a movie. The audio signals and/or video signals may be collected by the receiving device 130 located in the space. The space may be defined by e.g., an operator of the

telecommunications network 100, and may cover a certain three dimensional zone around the receiving device 130, which may cover for example a room where the third

communication device 103 may be placed.

According to the foregoing, the first communication device 101 may, in this Action, obtain a characterization of a context of the third communication device 103. The characterization may be understood as a series of characteristics defining the particular context for the one or more users of the third communication device 103 in the time period, as described above. Obtaining may be understood as comprising determining, calculating, or receiving the obtained characterization from another communication device in the telecommunications network 100, such as from the fourth communication device 104, e.g., via the second link 142.

The obtaining of the characterization is summarized here, but it is described in further detail for the fourth communication device 104, in relation to Figure 3. The description provided for the fourth communication device 104 should be understood to apply to the first communication device 101 as well, for the examples wherein the first communication device 101 may determine the characterization itself.

The characterization may be obtained by determining, based on a processing of the signals collected, at least one of: one or more first factors, and one or more second factors, as follows.

In some embodiments, the signals may be audio signals. In such embodiments, the recommendation may be further based on one or more first factors obtained from a first analysis of the audio signals collected. The one or more first factors may comprise at least one of: a) one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; b) a first mood derived from a tone of one or more voices detected in the audio signals, wherein the first mood may be understood to correspond to that of at least one of the one or more users; c) a second mood derived from a first semantic analysis of a language used by the one or more voices detected in the audio signals, wherein the second mood may be understood to

correspond to that of at least one of the one or more users; and d) a topic of discussion derived from a second semantic analysis of a language used by the one or more voices detected in the audio signals.

In some particular embodiments, the one or more factors may be obtained by at least one of the following options. The one or more characteristics may be derived by segmenting the audio signals collected during the time period, into single speaker segments. The first mood may be derived based on a natural language processing of a transcript of the single speaker segments, obtained by Automatic Speech Recognition. The first semantic analysis may be based on one or more first language models. The one or more first language models may be, for example, bag-of-words models or Vector Space Models of sentences, which may enable mood classification through different weighting schemes and similarity measures, where the models may have been trained on mood-labelled training data. The second semantic analysis may be based on one or more second language models. The one or more second language models may be, for example, bag-of-words models or Vector Space Models of sentences that may enable topic classification through different weighting schemes and similarity measures, where the models may have been trained on topic-labelled training data.

In some embodiments, the signals may be video signals. In such embodiments, the recommendation may be further based on one or more second factors obtained from a second analysis of the video signals collected, wherein the one or more second factors comprise at least one of: a) the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; and b) a third mood derived from at least one of: a body movement and a gesture detected in each of the one or more users distinguished in the video signals.

Action 202

In this Action, the first communication device 101 determines the recommendation for the item of content to be provided by the third communication device 103 operating in the telecommunications network 100. As stated earlier, the determination of the recommendation is based on the signals collected by the receiving device 130 located in the space where the one or more users of the third communication device 103 are located, during the time period, the signals being at least one of: audio signals and video signals.

The determining 202 of the recommendation may be further based on the obtained characterization of the context in Action 201. The characterization of the context used by the first communication device 101 may be e.g., the latest one available in a Client User Database, if it is not too old. If no up to date characterization is available, first

communication device 101 may request from the user, via e.g., a user interface, mood tips, or use an average characterization available in the Client User Database.

The first communication device 101 may base the determination of the

recommendation on the obtained context characterization in Action 201 , along with a user identification. The first communication device 101 may have compiled a list of

recommended content items from the available content in a Content Database based on the context characterization and the preference profile and content access profile of the user. The preference profile of the user may comprise information on the preferences of the user, likes and dislikes, and previous choices in different contexts, a list of other users with similar taste and preferences, and other related information, while the content access profile of the user may have information about what content the user has access to, based on the type of subscription the user may have. Given the user context, a

recommendation selection may be made from the content available to the user, as stipulated by e.g., the subscription of the user, which may best correlate with the preferences of the user in that context. This may be achieved through different means. Collaborative and content-based filtering may be used on the contextual conditioned user preferences. As more data may become available, another approach may be to compute posterior probabilities for each available content item, conditioned on the profile information of the user in that context.

Action 203

Once the first communication device 101 may have determined the

recommendation for the item of content, in this Action, first communication device 101 initiates sending a first indication of the determined recommendation to the second communication device 102. The first indication may be, for example, the list of recommended content mentioned in the previous Action, e.g., a list of recommended movies, or a marking, e.g., an asterisk, next to an already existing list of movies, pointing to those that are recommended, or a streamed preview of one or more recommended movies, for example.

The first indication may be presented to the user on the screen through the user interface 110.

The sending may be performed via the first link 141. In some embodiments, the second communication device 102 may be the third communication device 103. That is, the recommendation may be directly provided on the device where the one or more users may eventually obtain the content, e.g., watch a movie, whether or not it may be the one recommended by the first communication device 101.

Embodiments of a method performed by the fourth communication device 104, the method being for handling the characterization of the context of the third communication device 103 having the one or more users will now be described with reference to the flowchart depicted in Figure 3. As stated earlier, the fourth communication device 104 and the third communication device 103 operate in the telecommunications network 100.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first communication device 101 , and will thus not be repeated here. For example, the content may be a media content, and the third communication device 103 may be a media streaming device.

The method may comprise one or more of the following actions. In some embodiments all the actions may be performed. In some embodiments, one or more actions may be performed. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. In Figure 3, an optional action is indicated with dashed lines.

Action 301

The fourth communication device 104 may be understood as an Audio Scene Analysis (ASA) System or audio signal analyser, and/or as a Video Scene Analyzer (VSA) system, which may determine the characterization of the context of the third

communication device 103. In order to be able to determine the characterization, in this Action 301 , the fourth communication device 104 may initially obtain the signals collected by the receiving device 130 located in the space where the one or more users of the third communication device 103 are located, during the time period. As stated earlier, the signals may be at least one of: audio signals and video signals.

Obtaining may be understood as comprising collecting, in examples wherein the receiving device 130 may be co-located with the fourth communication device 104, or as receiving from device in the telecommunications network 100, such as from the receiving device 130, e.g., via a wired or wireless link.

Action 302

As explained earlier, the one or more factors may be obtained from an analysis of the obtained signals. The one or more factors may comprise at least one of: a) the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; b) the first mood derived from a tone of one or more voices detected in the audio signals; c) the second mood derived from the first semantic analysis of a language used by one or more voices detected in the audio signals; d) the topic of discussion derived from the second semantic analysis of the language used by the one or more voices detected in the audio signals; and e) the third mood derived from at least one of: the body movement and the gesture detected in each of the one or more users.

The analysis of the obtained signals may require a general set of language models, gender models and age models, speaker and acoustic models for the enrolled speakers, visual recognition models, such as spatial gesture models, which may be three dimensional (3D) volumetric models or two-dimensional (2D) appearance based models, and facial expression models which may be implemented with convolutional neural nets in a similar way that a general image object classification may be performed.

In this Action, the fourth communication device 104 may obtain a model for each one of the one or more factors, based on repeatedly performing the obtaining 301 of the signals and the determining of the one or more factors over a plurality of time periods. In examples wherein the fourth communication device 104 may calculate the model itself, obtaining may be understood in this Action 302 as comprising determining, calculating, or building, with for example, machine learning methods. Obtaining may also comprise receiving from another communication device in the telecommunications network 100, e.g., via a wired or wireless link or as retrieving, e.g., from a database, in examples wherein the model may have previously been calculated by the fourth communication device 104 or another communication device in the telecommunications network 100. The fourth communication device 104 may be a distributed node and some of its functionality may be performed on, e.g., a Media Server, and some of its

functionality being performed by a client computer. The Media Server may have, in some examples, e.g., a Server Database with ASA and/or VSA models for all users in the system, and the client may always check with the server if its models are up to date and do an update when needed. The speaker, acoustic, and visual models for each enrolled speaker may be continually updated as more training data may become available for training those models. In some examples, the training may be performed in the media server.

Action 303

In this Action 303, the fourth communication device 104 determines the

characterization of the context of the third communication device 103 by determining the one or more factors obtained from the analysis of the obtained signals. The one or more factors may comprise any of the one or more first factors, and the one or more second factors, as described earlier.

As stated earlier, in some embodiments, the signals may be audio signals. In examples wherein the fourth communication device 104 is an Audio Scene Analysis (ASA) System, the fourth communication device 104 may use state of the art speech analysis and recognition tools to segment the signals collected by the receiving device 130, for example, to segment an audio signal into single speaker segments which may be classified by language, gender, age, speaker identifier, and speech mood and tone. From the different single speaker segments, an estimate may be made of the number of people, gender mix, age mix, which enrolled speakers are present, and the overall speech mood and tone. Automatic Speech Recognition (ASR) may also be used on these segments to obtain a transcript of what may be being said. Natural language processing may be used on those transcripts to estimate the sentiment in the language and a characterization of the discussion. In embodiments wherein the signals may be audio signals, at least one of the following may apply: a) the one or more characteristics may be derived by segmenting the audio signals collected during the time period, into single speaker segments; b) the first mood may be derived based on a natural language processing of a transcript of the single speaker segments, obtained by Automatic Speech Recognition; c) the first semantic analysis may be based on the one or more first language models; and d) the second semantic analysis may be based the on one or more second language models.

In embodiments, the signals may be video signals. In examples wherein the fourth communication device 104 is a Video Scene Analysis (VSA) System, the fourth communication device 104 may use, for example, a) facial detection algorithms to identify the people at the location, which may be used to identify the people in the group and thus classify which group it may be; b) facial expression classification algorithms to estimate the facial-mood of the people in the group; and c) movement and gesture analysis methods to estimate the motion-mood of the people in the group.

In embodiments wherein the signals may be video signals, the one or more factors, which in these embodiments may be understood to correspond to the one or more second factors, may comprise at least one of : a) the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; and b) the third mood derived from at least one of: the body movement and the gesture detected in each of the one or more users.

The time period over which the scene analysis may be performed, may typically be a few minutes, and the context characterization for that time period may be sent to a client controller or client Control Function (CLIENT-CTRL) in the fourth communication device 104, e.g., in the set-top-box, which may time-stamp and store the results in a Client User Database.

Action 304

In this Action 303, the fourth communication device 104 initiates sending a second indication of the determined characterization of the context to the first communication device 101 operating in the telecommunications network 100 or another communication device 102, 103, 104 operating in the telecommunications network 100. Similarly to what was described earlier, to initiate sending may comprise the fourth communication device 104 sending itself, e.g., via the second link 142, or triggering another network node to send. The second indication may be, for example, a message comprising a code for the determined characterization, or a comprehensive list of the determined one or more factors.

In some embodiments, the second indication may further indicate the obtained model for each one of the one or more factors.

Embodiments of a method performed by a second communication device 102, the method being for handling a recommendation from a first communication device 101 will now be described with reference to the flowchart depicted in Figure 4. As stated earlier, the first communication device 101 and the second communication device 102 operate in a telecommunications network 100.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first communication device 101 , and will thus not be repeated here. For example the content may be a media content, and the third communication device 103 may be a media streaming device.

The method may comprise one or more of the following actions. In some embodiments all the actions may be performed. In some embodiments, one or more actions may be performed. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Note that in some embodiments, the order of the Actions may be changed. In Figure 4 an optional action is indicated with dashed lines.

Action 401

The second communication device 102 may be understood as a communication device receiving the recommendation determined by the first communication device 101.

In this Action 701 , the second communication device 102 receives the first indication for the recommendation for the item of content to be provided by the third communication device 103 operating in the telecommunications network 100. The recommendation is based on signals collected by a receiving device 130 located in a space where one or more users of the third communication device 103 are located, during the time period. The signals are at least one of: audio signals and video signals. The receiving may be performed, e.g., via the first link 141.

Action 402

In this Action, the second communication device 102 initiates providing, to the one or more users, a third indication of the received recommendation on an interface 1 10 of the second communication device 102. The third indication, similarly to the first indication may be, for example, a list of recommended movies, or a marking, e.g., an asterisk, next to an already existing list of movies, pointing to those that are

recommended, or a streamed preview of one or more recommended movies.

Action 403

In some embodiments, the second communication device 102 may, in this Action, initiate providing the item of content on the third communication device 103, based on a selection received from the one or more users on the interface 110, the selection being further based on the provided third indication. Similarly to what was described earlier, initiate providing may comprise the second communication device 102 providing itself, e.g., via the third link 143, or triggering another network node to send.

To exemplify some of the foregoing in other words, particular examples herein may relate to providing a recommendation based on a current context characterization, e.g., number of people, gender mix, age mix, which persons are comprised in the one or more users, speech tone and mood, language mix, discussion mood and topic characterization, etc... and applying machine-learning techniques to train one or more models for context characterization.

One advantage of embodiments herein is that the methods described enable providing recommendations that may be optimally adapted to the current context and mood of the one or more users, and thus provide the one or more users with much more relevant recommendations than those based on collaborative and content-based filtering methods.

A few non-limiting particular examples of embodiments herein are depicted in the next Figures. Figure 5 is a schematic block diagram illustrating an interaction between the fourth communication device 104, which in this example is an Audio Scene Analyzer, and the first communication device 101 , referred to in the Figure as a Recommender System. The fourth communication device 104, the Audio Scene Analyzer, may receive according to Action 301 , audio signals from the receiving device 130, e.g., microphones, in the space where the one of more users are located. Next, according to Action 303, it may use speech and natural language analysis tools to characterize current context factors in the space over the time period, such as: number of people, gender mix, age mix, which known persons, mood and tone of voices, language mix, and discussion mood and topic of the discussion. The second indication of the determined characterization of the context may then be sent to the first communication device 101 , according to Action 304. The first communication device 101 obtains the second indication according to Action 201. Having the context characterization, the Recommender System 202 may retrieve the context dependent preferences of a user that may be available in the user preference profile in e.g., a user database 501 , which may then be used to arrive at a context dependent recommendation of the assets that may be available to the user, as stipulated by e.g., a subscription of the user, and which may be available at e.g., an asset database 502. An asset may be understood as an item of content or content item, such as a movie, a TV series, etc... The preference profile of a user may comprise information relating to the preferences of the user, such as likes and dislikes, and previous choices in different contexts, a list of other users with similar taste and preferences and other related information. The Recommender System here may therefore be understood as being a context and user profile based Recommender System.

In some example implementations herein, the first communication device 101 may be co-located, and integrated with the fourth communication device 104 in a same system. One example of such a system may be referred to as a recommender system, which may be a TV recommender system. This recommender system may typically be a part of an Internet Protocol Television (IPTV) service solution with some of the functions implemented locally in a set-top-box, and others on a media server in the cloud. A simplified schematic block diagram of such a system is illustrated in the non-limiting example of Figure 6, depicting function blocks and databases at the client and server side. In this particular example, the fourth communication device 104 is an Audio Scene Analysis system. Audio signals may be obtained, according to Action 301 , from microphones in the space and over the time period. Directional microphones

implemented with microphone arrays may be beneficial here to spatially filter out the sound from the TV and amplify the sound from the one or more users. In this non-limiting example, the Audio Scene Analysis is performed by the fourth communication device 104 in the client to provide a characterization of the audio scene according to Action 303, which is then used as part of the input to the first communication device 101 , the recommendation system, on the server. Upon request, the Recommendation System may, according to Action 203, send the first indication as a list of recommended content items to the second communication device 102 on the client, which is presented to the user, according to Action 402, on the screen of the reproducing device 120, here a TV, through the user interface 110. The user may then be able to inspect the recommended content items through the user interface 110, and choose a content item to watch. The user input may be provided through a remote control interface 601 implemented on a smart phone or a remote controller device. A video renderer 602 may be used to, according to Action 402, initiate providing the item of content on the third communication device 103, based on the selection received from the one or more users. The selected content may be retrieved from a server content database 603. As illustrated in the Figure, the fourth communication device 104 may have access to a Client User Database (Client User DB) 604 and a Client Audio Scene Analysis Database (Client ASA DB) 605. The ASA models mentioned in Action 302 may be stored locally in the Client ASA DB 605. The Client ASA DB 605 in, for example, Figure 6, may contain all language-, speaker-, gender- and acoustic-models that may be required by the fourth communication device 104 to perform its audio scene analysis, as well as the short term storage of audio data that may be used to train and update these models as more audio may become available. The fourth communication device 104 may have an Application Program Interface (API) for managing the retrieval, training and updating of these models on or from the server. On the Media server, the fourth communication device 104 may have access to the Server ASA Database (DB) 502 with ASA models for all users in the system, and the client may always check with the server if its models are up to date and do an update when needed, as described in Action 302. The user database on the client, that is the Client User DB 604, may contain only user profiles of enrolled users of a particular account, while on the server side, the Server User DB 501 may contain profiles of all users in all accounts with information on the subscribed services and packages and other relevant account information. The context characterization along with the user identification may be sent by the fourth communication device 104 from the client, via a client controller (Client CTRL) 606 to a server controller (Server CTRL) 607 in the media server. The Client CTRL 606 and the Server CTRL 607 are described in relation to Figure 7. The first communication device 101 , through the server, may compile the list of recommended content items from the available content in the Server Content Database (Server Content DB) 603, based on the context characterization and the preference profile of the user. This list may then sent to the client and presented to the user on the screen of the reproducing device 120 through the user interface 110.

Figure 7 gives a more detailed description of the different control functions in a recommender system according to a non-limiting example herein, wherein functions of the first communication device 101 , the second communication device 102, the third communication device 103, and the fourth communication device 104 are integrated into a same system, with some functions implemented locally on a Media Client, and some of the functions implemented on a Media Server, e.g., in the cloud. Figure 7 depicts some of the main function managers and databases at the client and server side, as well as the interfaces between the paired control managers in each control management layer on the device and the server for each one of the communication devices. In this particular example, the fourth communication device 104 is also an Audio Scene Analysis system.

The fourth communication device 104 may comprise a Client ASA Manager 701 and a Server (Serv) ASA Manager 702, which may manage the Audio Scene Analysis functionality of the system, as explained in the section above for the fourth communication device 104 in relation to Figure 3.

The first communication device 101 may comprise a Client Recommender (Recm) Manager 703 and a Serv Recm Manager 704, which may manage the recommendation functionality of the system, as explained in the section above for the first communication device 101 in relation to Figure 2.

The third communication device 103 may comprise a Client Streaming (Strm) Manager 705 and a Serv Strm Manager 706, which may manage the streaming functionality of the system, which would control the streaming of the chosen content item from the server to the client. This may include functionality such as continuously optimizing the user experience for the available bandwidth and capabilities of rendering devices in the client.

The second communication device 102 may comprise a Client User (Usr) Manager 707 and a Serv Usr Manager 708, which may manage the user accounts functionality as has been explained in the sections above. This entails authenticating the users, verifying what functionality they may be entitled to through their subscription account, enabling that functionality in the system, and keeping all account and user information secure and updated. One function in particular that may be managed by these managers may be the User Enrolment into the system, which may be understood to have the purpose of initializing a) some of the user profile information that may be used to enable personalized recommendations and b) provide audio data that may be used to adapt ASA models to the user. Each account may have several users, and each user may need to be enrolled into the system by providing, for example, a) an initial set of TV programs and films they like, and, b) one or more speech samples of a set of predefined sentences.

The data from a) may be used to initialize the preference profile of the user and the data from b) may be used to adapt the Speaker Models (SM) and Acoustic Models (AM) that may be used in the Audio Scene Analyzer.

Security may be a concern for the user. The user may not feel comfortable allowing the system to store the audio captured in their environment on remote servers, even if it is only temporary until enough data has been stored to perform updates for the user of an ASA model and or a VSA model. In this case, two alternative solutions may be provided. In a first solution, there may be no storage of audio on a remote server and no training of ASA and/or VSA models, where the captured speech and/or video signals may only be used to generate the current context in the ASA and/or VSA module on the client, and then deleted. Since this solution prohibits full training of ASA and/or VSA models on the server with machine learning algorithms, the user specific ASA and/or VSA models may not be optimal, resulting in poorer recommendation results.

In a second solution, temporary storage of audio and/or video data on the server may be allowed and used for the training of ASA and/or VSA models, where the captured speech and/or video may be used to generate the current context, and then sent to the cloud for temporary storage. There it may be used to train the ASA and/or VSA models for better recommendations in the future. This speech and/or video data may only be used for the purpose of training ASA and/or VSA models of speech and/or video, and a user- agreement may cover this use of the data.

The second communication device 102 may comprise the interface 1 10, a Client Usr Interface (Intf), which may manage the interface between the one or more users and the system, where user input may be acquired through a remote control interface implemented on a smart phone or a remote control device 601 and graphical objects displayed on the TV screen of a reproducing device 120 with the assistance of the video renderer 602.

All modules within the Media Server exemplified in Figures 6 and 7 may be implemented in the cloud.

Figure 8 depicts a non-limiting example of a process of context characterization generation, for an integrated system according to embodiments herein. In this particular example, the fourth communication device 104 is also an Audio Scene Analysis system. In particular, Figure 8 describes a sequence diagram of the execution of a scene analysis cycle to perform online speech analysis to generate a current Context Characterisation. Please note that for both sequence diagrams of Figure 8 and 9, for the purpose of simplicity, components blocks are depicted at a higher level of abstraction than those presented in Figure 7 and the details are considered implicitly. For example,

communication between Client_CTRL 606 and Server_CTRL 607 is depicted, while the internal components of both of these controls are not depicted.

The steps of Figure 8 are explained below, in relation to the Actions described above: I . A user starts a session and turns on the microphone (Mic_Audio) to provide the audio input for Context Characterization, which the fourth communication device 104 may obtain in Action 301.

2. The fourth communication device 104 in the system automatically checks for the updated Models to be used for the Context Characterization process in Action 302. Action 302 may comprise steps 2-4 and 6-7 of Figure 8. Note that this checking and consequently downloading of the models to the Client_ASA database 605 may be performed at different occasions: for example models may be downloaded when the user starts a session, as shown in this figure, or they may be downloaded after a particular time, e.g., once a day or once a week, based on the user settings that the user may set during the user enrolment process.

Thus to start this process, the Client_CRTL 606 asks the Server_CTRL 607 for the new Models.

3. The Server_CTRL 607checks for the new Models in the Server_ASA database 502 that stores the Models.

4. If the new Models are available, they are downloaded from the

Server_ASA database 502 to the Client_ASA database 605.

5. To generate the Context Characterization, a continuous scanning is done in Action 301 , and audio is sent to the Client_CTRL 606 for the time period. The time period may be based on a user setting and may vary.

6. The Client_CTRL 606 asks for the Models from Client_ASA database 605.

7. The Client_ASA database 605 sends Models back.

8. Action 303 may start when the Client_CTRL 606 of the fourth

communication device 104 asks the Audio Scene Analyser (ASA) MGR 701 of the fourth communication device 104 to perform the analysis. Action 303 may comprise steps 8-10 of Figure 8.

9. The ASA MGR 701 performs the analysis.

10. And generates a Context based on the current input.

I I . Action 304 may start when the ASA 104 sends this Context

Characterization to the Client_CTRL 606. Action 304 may comprise steps 1 1-14 of Figure 8.

12. The Client_CTRL 606 stores this Characterization in the Client_User database 604 to be used further

13. The Client_CTRL 606 forwards this Characterization along with User identification to the Server_CTRL 607, according to Action 303. 14. The Server_CTRL 607 stores this Characterization along with User identification in the Server_User database 501 to be used further.

Figure 9 illustrates a non-limiting example of a sequence diagram describing the execution of the recommendation function of a recommendation system such as the first 5 communication device 101 , according to embodiments herein. In particular, Figure 9 describes a sequence diagram to perform an online recommendation based on the current Context Characterisation, as provided for example by the fourth communication device 104.

The steps of Figure 9 are explained below, in relation to the Actions described 10 above:

1. A user of the one or more users asks for the recommendation.

2. The Client_CTRL 606 of the first communication device 101 forwards the request to the Server_CTRL 607.

3. The Server_CTRL 607 sends the User ID to the Server_User database

15 501.

4. The Server_User database 501 sends back the preferences of the user. Note that these preferences are made by the user in the initial user enrolment and are also based on the details of the user's subscription.

5. According to Action 202, the Server_CTRL 607, which may have obtained 20 the characterization of the context as described in step 11-12 of Figure 8, asks the

Server_Content database 603 to provide a list of recommendations. Action 202 may comprise steps 5-7 of Figure 9.

6. The Server_Content database 603 sends back an initial list of Recommendations.

25 7. The Server_CTRL 607 chooses from the initial list based on the user's preferences and creates a Recommendation list.

8. Action 203 may start when the Server_CTRL 607 sends the

Recommendation list to the Client_CTRL 606.

9. The Client_CTRL 606 displays the list on TV/screen of the reproducing 30 device 120 using the User Interface modules.

10. The user selects an item from the displayed list.

1 1. The selected item is sent to the Client_CTRL 606.

12. The selected item is sent to the Server_CTRL 607.

13. The selected item is sent to the Server_Content database 603. 14. The rendering starts from the Server_Content database 603 on the TV/Screen of the reproducing device 120, represented as TV through User Interface (TVthrouUI) in the Figure.

15. The latest data to train the Models is sent from the Server_CTRL 607 to the Server_ASA database 502 whenever needed/set by the user.

The particular examples illustrated herein have described only examples in the area of TV recommender systems. Similar examples for other types of recommender systems and even systems for placement of advertisements may be easily derived by a person skilled in the art based on the embodiments described herein.

To perform the method actions described above in relation to Figures 2, 5-7 and 9, the first communication device 101 for handling the recommendation to the second communication device 102 may comprise the following arrangement depicted in Figure 10. As stated earlier, the first communication device 101 and the second communication device 102 are configured to operate in the telecommunications network 100.

The first communication device 101 is further configured to, e.g., by means of an determining module 1001 configured to, determine the recommendation for the item of content to be provided by the third communication device 103 configured to operate in the telecommunications network 100; the determination of the recommendation is configured to be based on signals configured to be collected by the receiving device 130 located in the space where the one or more users of the third communication device 103 are located, during the time period; the signals are configured to be at least one of: the audio signals and the video signals.

In some embodiments wherein the signals may be configured to be audio signals, the recommendation may be further configured to be based on one or more first factors configured to be obtained, e.g., by means of an determining module 1001 further configured to, from a first analysis of the audio signals configured to be collected, wherein the one or more first factors may comprise at least one of:

a. one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; b. a first mood configured to be derived from a tone of one or more voices configured to be detected in the audio signals;

c. a second mood configured to be derived from a first semantic analysis of a language used by the one or more voices configured to be detected in the audio signals; and

d. a topic of discussion configured to be derived from a second semantic analysis of a language used by the one or more voices configured to be detected in the audio signals.

In some embodiments, e.g., by means of the determining module 1001 further configured to, at least one of:

a. the one or more characteristics may be configured to be derived by segmenting the audio signals which may be configured to be collected during the time period, into single speaker segments;

b. the first mood may be configured to be derived based on a natural language processing of a transcript of the single speaker segments, which may be configured to be obtained by Automatic Speech Recognition;

c. the first semantic analysis may be configured to be based on one or more first language models;

d. the second semantic analysis may be configured to be based on one or more second language models.

In some embodiments, wherein the signals may be configured to be video signals, the recommendation may be further based on one or more second factors configured to be obtained, e.g., by means of the determining module 1001 further configured to, from a second analysis of the video signals configured to be collected, wherein the one or more second factors may comprise at least one of:

a. the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; and

b. a third mood configured to be derived from at least one of: a body movement and a gesture configured to be detected in each of the one or more users configured to be distinguished in the video signals.

The first communication device 101 is further configured to, e.g., by means of an initiating module 1002 configured to, initiate sending the first indication of the

recommendation configured to be determined, to the second communication device 102.

The first communication device 101 may be further configured to, e.g., by means of an obtaining module 1003 configured to, obtain the characterization of the context of the third communication device 103 by determining, based on the processing of the signals configured to be collected, at least one of: the one or more first factors, and the one or more second factors. The determining of the recommendation may be configured to be further based on the characterization of the context configured to be obtained.

In some embodiments, the second communication device 102 may be the third communication device 103.

The embodiments herein may be implemented through one or more processors, such as a processor 1004 in the first communication device 101 depicted in Figure 10, together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the in the first communication device 101. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the first communication device 101.

The first communication device 101 may further comprise a memory 1005

comprising one or more memory units. The memory 1005 is arranged to be used to store obtained information, store data, configurations, and applications etc. to perform the methods herein when being executed in the first communication device 101.

In some embodiments, the first communication device 101 may receive information from the second communication device 102, the third communication device 103, the fourth communication device 104, and/or any of the pertinent databases described above, through a receiving port 1006. In some embodiments, the receiving port 1006 may be, for example, connected to one or more antennas in first communication device 101. In other embodiments, the first communication device 101 may receive information from another structure in the telecommunications network 100 through the receiving port 1006. Since the receiving port 1006 may be in communication with the processor 1004, the receiving port 1006 may then send the received information to the processor 1004. The receiving port 1006 may also be configured to receive other information from other communication devices or structures in the telecommunications network 100.

The processor 1004 in the first communication device 101 may be further configured to transmit or send information to e.g., the second communication device 102, the third communication device 103, the fourth communication device 104, and/or any of the pertinent databases described above, through a sending port 1007, which may be in communication with the processor 1004, and the memory 1005. Those skilled in the art will also appreciate that the determining module 1001 , the initiating module 1002, and the obtaining module 1003 described above may refer to a combination of analog and digital modules, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors, such as the processor 1004, perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC). In one example the processor 1004 may be implemented as several processors, such as the Client CTRL 606, or more specifically, the Client RECM MGR 703, and the SERV CTRL 607, or more specifically, the SERV RECM MGR 704, distributed among several separate components, such as the Media Client and the Media Server.

Also, in some embodiments, the different modules 1001-1003 described above may be implemented as one or more applications running on one or more processors such as the processor 1004.

Thus, the methods according to the embodiments described herein for the first communication device 101 may be respectively implemented by means of a computer program 1008 product, comprising instructions, i.e., software code portions, which, when executed on at least one processor 1004, cause the at least one processor 1004 to carry out the action described herein, as performed by the first communication device 101. The computer program 1008 product may be stored on a computer-readable storage medium 1009. The computer-readable storage medium 1009, having stored thereon the computer program 1008, may comprise instructions which, when executed on at least one processor 1004, cause the at least one processor 1004 to carry out the action described herein, as performed by the first communication device 101. In some embodiments, the computer-readable storage medium 1009 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, or a memory stick. In other embodiments, the computer program 1008 product may be stored on a carrier containing the computer program 1008 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1009, as described above.

To perform the method actions described above in relation to Figures 3 and 5-8, the fourth communication device 104 for handling the characterization of the context of the third communication device 103 configured to have one or more users may comprise the following arrangement depicted in Figure 11. As stated earlier, the fourth communication device 104 and the third communication device 103 are configured to operate in the telecommunications network 100.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the fourth

communication device 104, and will thus not be repeated here. For example, in some embodiments, the content may be a media content, and the third communication device 103 may be a media streaming device.

The fourth communication device 104 is configured to, e.g., by means of an

obtaining module 1101 configured to, obtain the signals configured to be collected by the receiving device 130 located in the space where the one or more users of the third communication device 103 are located, during the time period. The signals are configured to be at least one of: audio signals and video signals.

The fourth communication device 104 is further configured to, e.g., by means of a determining module 1102 configured to, determine the characterization of the context of the third communication device 103 by determining one or more factors configured to be obtained from an analysis of the signals configured to be obtained. The one or more factors may comprise at least one of: a) the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; b) the first mood configured to be derived from the tone of one or more voices configured to be detected in the audio signals; c) the second mood configured to be derived from the first semantic analysis of a language used by one or more voices configured to be detected in the audio signals; d) the topic of discussion configured to be derived from the second semantic analysis of a language used by the one or more voices configured to be detected in the audio signals, e) the third mood configured to be derived from at least one of: the body movement and the gesture configured to be detected in each of the one or more users.

In some embodiments, the signals may be configured to be audio signals, and, e.g., by means of the determining module 1102 further configured to, at least one of: a) the one or more characteristics may be configured to be derived by segmenting the audio signals configured to be collected during the time period, into single speaker segments; b) the first mood may be configured to be derived based on the natural language processing of the transcript of the single speaker segments, configured to be obtained by Automatic Speech Recognition; c) the first semantic analysis may be configured to be based on the one or more first language models; and d) the second semantic analysis may be configured to be based on the one or more second language models. In some embodiments, the signals are configured to be video signals and, the one or more factors may comprise at least one of: a) the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; and b) the third mood configured to be derived from at least one of: the body movement and 5 the gesture configured to be detected in each of the one or more users.

The fourth communication device 104 is further configured to, e.g., by means of an initiating module 1103 configured to, initiate sending the second indication of the characterization of the context configured to be determined to a first communication device 101 configured to operate in the telecommunications network 100 or another 10 communication device 102, 103, 104 configured to operate in the telecommunications network 100.

The fourth communication device 104 may be further configured to, e.g., by means of an obtaining module 1104 configured to, obtain the model for each one of the one or more factors, based on repeatedly performing the obtaining of the signals and the

15 determining of the one or more factors over a plurality of time periods. The second

indication may be configured to further indicate the obtained model for each one of the one or more factors.

The embodiments herein may be implemented through one or more processors, such as a processor 1105 in the fourth communication device 104 depicted in Figure 11 ,

20 together with computer program code for performing the functions and actions of the

embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the in the fourth communication device 104. One such carrier may be in the form of a CD ROM disc.

25 It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the fourth communication device 104.

The fourth communication device 104 may further comprise a memory 1106 comprising one or more memory units. The memory 1 106 is arranged to be used to store

30 obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the fourth communication device 104.

In some embodiments, the fourth communication device 104 may receive information from the first communication device 101 , the second communication device 35 102, the third communication device 103, and/or any of the pertinent databases described above, through a receiving port 1107. In some embodiments, the receiving port 1107 may be, for example, connected to one or more antennas in fourth communication device 104. In other embodiments, the fourth communication device 104 may receive

information from another structure in the telecommunications network 100 through the 5 receiving port 1 107. Since the receiving port 1 107 may be in communication with the processor 1 105, the receiving port 1 107 may then send the received information to the processor 1 105. The receiving port 1 107 may also be configured to receive other information.

The processor 1 105 in the fourth communication device 104 may be further

10 configured to transmit or send information to e.g., the first communication device 101 , the second communication device 102, the third communication device 103, and/or any of the pertinent databases described above, through a sending port 1108, which may be in communication with the processor 1105, and the memory 1106.

Those skilled in the art will also appreciate that the obtaining module 1 101 , the

15 determining module 1102, the initiating module 1 103 and the obtaining module 1104

described above may refer to a combination of analog and digital modules, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processor 1 105, perform as described above. One or more of these processors, as well as the other digital hardware,

20 may be included in a single Application-Specific Integrated Circuit (ASIC), or several

processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC). In one example the processor 1 105 may be implemented as several processors, such as the Client CTRL 606, or more specifically, the Client ASA MGR 701 , and the SERV CTRL

25 607, or more specifically, the SERV ASA MGR 702, distributed among several separate components, such as the Media Client and the Media Server.

Also, in some embodiments, the different modules 1 101-1 104 described above may be implemented as one or more applications running on one or more processors such as the processor 1105.

30 Thus, the methods according to the embodiments described herein for the fourth communication device 104 may be respectively implemented by means of a computer program 1109 product, comprising instructions, i.e., software code portions, which, when executed on at least one processor 1105, cause the at least one processor 1105 to carry out the action described herein, as performed by the fourth communication device 104.

35 The computer program 1 109 product may be stored on a computer-readable storage medium 1 110. The computer-readable storage medium 11 10, having stored thereon the computer program 1109, may comprise instructions which, when executed on at least one processor 1 105, cause the at least one processor 1 105 to carry out the action described herein, as performed by the fourth communication device 104. In some embodiments, the computer-readable storage medium 1 110 may be a non-transitory computer-readable storage medium 11 10, such as a CD ROM disc, or a memory stick. In other

embodiments, the computer program 1109 product may be stored on a carrier containing the computer program 1 109 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1 110, as described above.

To perform the method actions described above in relation to Figures 4, 5-7 and 9, the second communication device 102 for handling the recommendation from the first communication device 101 may comprise the following arrangement depicted in Figure 12. As stated earlier, the first communication device 101 and the second communication device 102 may be configured to operate in the telecommunications network 100.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the second

communication device 102, and will thus not be repeated here. For example, the content may be a media content, and the third communication device 103 may be a media streaming device.

The second communication device 102 is configured to, e.g., by means of a receiving module 1201 configured to, receive the first indication for the recommendation for the item of content to be provided by the third communication device 103 configured to operate in the telecommunications network 100. The recommendation is based on the signals collected by the receiving device 130 configured to be located in the space where one or more users of the third communication device 103 are located, during the time period. The signals are configured to be at least one of: audio signals and video signals.

The second communication device 102 is further configured to, e.g., by means of an initiating module 1202 configured to, initiate providing, to the one or more users, the third indication of the recommendation configured to be received on the interface 110 of the second communication device 102.

The second communication device 102 may be further configured to, e.g., by means of the initiating module 1202 configured to, initiate providing the item of content on the third communication device 103, based on a selection configured to be received from the one or more users on the interface 110. The selection may be further configured to be based on the third indication configured to be provided.

The embodiments herein may be implemented through one or more processors, such as a processor 1203 in the second communication device 102 depicted in Figure 12, together with computer program code for performing the functions and actions of the embodiments herein. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the in the second communication device 102. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the second communication device 102.

The second communication device 102 may further comprise a memory 1204 comprising one or more memory units. The memory 1204 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the second communication device

102.

In some embodiments, the second communication device 102 may receive information from the first communication device 101 , the third communication device 103, and/or the fourth communication device 104, through a receiving port 1205. In some embodiments, the receiving port 1205 may be, for example, connected to one or more antennas in second communication device 102. In other embodiments, the second communication device 102 may receive information from another structure in the telecommunications network 100 through the receiving port 1205. Since the receiving port 1205 may be in communication with the processor 1203, the receiving port 1205 may then send the received information to the processor 1203. The receiving port 1205 may also be configured to receive other information.

The processor 1203 in the second communication device 102 may be further configured to transmit or send information to e.g., the first communication device 101 , the third communication device 103, and/or the fourth communication device 104, through a sending port 1206, which may be in communication with the processor 1203, and the memory 1204.

Those skilled in the art will also appreciate that the receiving module 1201 , and the initiating module 1202 described above may refer to a combination of analog and digital modules, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processor 1203, perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC). In one example the processor 1203 may be implemented as several processors, distributed among several separate components, such as the Media Client and the Media Server.

Also, in some embodiments, the different modules 1201-1202 described above may be implemented as one or more applications running on one or more processors such as the processor 1203.

Thus, the methods according to the embodiments described herein for the second communication device 102 may be respectively implemented by means of a computer program 1207 product, comprising instructions, i.e., software code portions, which, when executed on at least one processor 1203, cause the at least one processor 1203 to carry out the action described herein, as performed by the second communication device 102. The computer program 1207 product may be stored on a computer-readable storage medium 1208. The computer-readable storage medium 1208, having stored thereon the computer program 1207, may comprise instructions which, when executed on at least one processor 1203, cause the at least one processor 1203 to carry out the actions described herein, as performed by the second communication device 102. In some embodiments, the computer-readable storage medium 1208 may be a non-transitory computer-readable storage medium 1208, such as a CD ROM disc, or a memory stick. In other

embodiments, the computer program 1207 product may be stored on a carrier containing the computer program 1207 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1208, as described above.

Claims

1. A method performed by a first communication device (101), the method being for handling a recommendation to a second communication device (102), the first communication device (101) and the second communication device (102) operating in a telecommunications network (100), the method comprising:

- determining (202) a recommendation for an item of content to be provided by a third communication device (103) operating in the telecommunications network (100), the determination of the recommendation being based on signals collected by a receiving device (130) located in a space where one or more users of the third communication device (103) are located, during a time period, the signals being at least one of: audio signals and video signals, and

- initiating (203) sending a first indication of the determined recommendation to the second communication device (102).

The method according to claim 1 , wherein the signals are audio signals, and wherein the recommendation is further based on one or more first factors obtained from a first analysis of the audio signals collected, wherein the one or more first factors comprise at least one of:

a. one or more characteristics of the one or more users from: number,

gender, age, and identity of the one or more users;

b. a first mood derived from a tone of one or more voices detected in the audio signals;

c. a second mood derived from a first semantic analysis of a language used by the one or more voices detected in the audio signals;

d. a topic of discussion derived from a second semantic analysis of a

language used by the one or more voices detected in the audio signals.

The method according to claim 2, wherein at least one of:

a. the one or more characteristics are derived by segmenting the audio

signals collected during the time period, into single speaker segments; b. the first mood is derived based on a natural language processing of a transcript of the single speaker segments, obtained by Automatic Speech Recognition;

c. the first semantic analysis is based on one or more first language models; d. the second semantic analysis is based on one or more second language models.

The method according to any of claims 1-3, wherein the signals are video signals, and wherein the recommendation is further based on one or more second factors obtained from a second analysis of the video signals collected, wherein the one or more second factors comprise at least one of:

b. a third mood derived from at least one of: a body movement and a gesture detected in each of the one or more users distinguished in the video signals.

The method according to any of claims 2-4, further comprising:

- obtaining (201) a characterization of a context of the third communication device (103) by determining, based on a processing of the signals collected, at least one of: the one or more first factors, and the one or more second factors,

and wherein the determining (202) of the recommendation is further based on the obtained characterization of the context.

The method according to any of claims 1-5, wherein the content is a media content, and wherein the third communication device (103) is a media streaming device.

The method according to any of claims 1-6, wherein the second communication device (102) is the third communication device (103).

A computer program (1008), comprising instructions which, when executed on at least one processor (1004), cause the at least one processor (1004) to carry out the method according to any one of claims 1 to 7.

A computer-readable storage medium (1009), having stored thereon a computer program (1008), comprising instructions which, when executed on at least one processor (1004), cause the at least one processor (1004) to carry out the method according to any one of claims 1 to 7.

10. A method performed by a fourth communication device (104), the method being for handling a characterization of a context of a third communication device (103) having one or more users, the fourth communication device (104) and the third communication device (103) operating in a telecommunications network (100), the method comprising:

- obtaining (301) signals collected by a receiving device (130) located in a space where the one or more users of the third communication device (103) are located, during a time period, the signals being at least one of: audio signals and video signals,

- determining (303) the characterization of the context of the third

communication device (103) by determining one or more factors obtained from an analysis of the obtained signals, wherein the one or more factors comprise at least one of:

a) one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; b) a first mood derived from a tone of one or more voices detected in the audio signals;

c) a second mood derived from a first semantic analysis of a language used by one or more voices detected in the audio signals;

d) a topic of discussion derived from a second semantic analysis of a language used by one or more voices detected in the audio signals, e) a third mood derived from at least one of: a body movement and a gesture detected in each of the one or more users, and

- initiating (304) sending a second indication of the determined characterization of the context to a first communication device (101) operating in the telecommunications network (100) or another communication device (102, 103, 104) operating in the telecommunications network (100).

1 1. The method according to claim 10, wherein the signals are audio signals, and

wherein at least one of:

a. the one or more characteristics are derived by segmenting the audio

signals collected during the time period, into single speaker segments; b. the first mood is derived based on a natural language processing of a

transcript of the single speaker segments, obtained by Automatic Speech Recognition; c. the first semantic analysis is based on one or more first language models; d. the second semantic analysis is based on one or more second language models. 12. The method according to any of claims 10-1 1 , wherein the signals are video

signals, and wherein the one or more factors comprise at least one of:

b. the third mood derived from at least one of: the body movement and the gesture detected in each of the one or more users.

13. The method according to claim 1 1 , further comprising:

- obtaining (302) a model for each one of the one or more factors, based on repeatedly performing the obtaining (301) of the signals and the determining of the one or more factors over a plurality of time periods, and

wherein the second indication further indicates the obtained model for each one of the one or more factors.

14. The method according to any of claims 10-13, wherein the content is a media

content, and wherein the third communication device (103) is a media streaming device.

15. A computer program (1109), comprising instructions which, when executed on at least one processor (1 105), cause the at least one processor (1105) to carry out the method according to any one of claims 10 to 14.

16. A computer-readable storage medium (11 10), having stored thereon a computer program (1109), comprising instructions which, when executed on at least one processor (1 105), cause the at least one processor (1105) to carry out the method according to any one of claims 10 to 14.

17. A method performed by a second communication device (102), the method being for handling a recommendation from a first communication device (101), the first communication device (101) and the second communication device (102) operating in a telecommunications network (100), the method comprising:

- receiving (401) a first indication for a recommendation for an item of content to be provided by a third communication device (103) operating in the telecommunications network (100), the recommendation being based on signals collected by a receiving device (130) located in a space where one or more users of the third communication device (103) are located, during a time period, the signals being at least one of: audio signals and video signals, and

- initiating (402) providing, to the one or more users, a third indication of the received recommendation on an interface (110) of the second communication device (102).

The method according to claim 13, further comprising:

- initiating (403) providing the item of content on the third communication device (103), based on a selection received from the one or more users on the interface (1 10), the selection being further based on the provided third indication.

A computer program (1207), comprising instructions which, when executed on at least one processor (1203), cause the at least one processor (1203) to carry out the method according to any one of claims 17 to 18.

A computer-readable storage medium (1208), having stored thereon a computer program (1207), comprising instructions which, when executed on at least one processor (1203), cause the at least one processor (1203) to carry out the method according to any one of claims 17 to 18.

A first communication device (101) for handling a recommendation to a second communication device (102), the first communication device (101) and the second communication device (102) being configured to operate in a telecommunications network (100), the first communication device (101) being further configured to:

- determine a recommendation for an item of content to be provided by a third communication device (103) configured to operate in the telecommunications network (100), the determination of the recommendation being configured to be based on signals configured to be collected by a receiving device (130) located in a space where one or more users of the third communication device (103) are located, during a time period, the signals being configured to be at least one of: audio signals and video signals, and

- initiate sending a first indication of the recommendation configured to be determined to the second communication device (102).

22. The first communication device (101) according to claim 21 , wherein the signals are configured to be audio signals, and wherein the recommendation is further configured to be based on one or more first factors configured to be obtained from a first analysis of the audio signals configured to be collected, wherein the one or more first factors comprise at least one of:

a. one or more characteristics of the one or more users from: number,

gender, age, and identity of the one or more users;

b. a first mood configured to be derived from a tone of one or more voices configured to be detected in the audio signals;

c. a second mood configured to be derived from a first semantic analysis of a language used by the one or more voices configured to be detected in the audio signals;

23. The first communication device (101) according to claim 21 , wherein at least one of:

a. the one or more characteristics are configured to be derived by segmenting the audio signals configured to be collected during the time period, into single speaker segments;

b. the first mood is configured to be derived based on a natural language processing of a transcript of the single speaker segments, configured to be obtained by Automatic Speech Recognition;

c. the first semantic analysis is configured to be based on one or more first language models;

d. the second semantic analysis is configured to be based on one or more second language models.

24. The first communication device (101) according to any of claims 21-23, wherein the signals are configured to be video signals, and wherein the recommendation is further based on one or more second factors configured to be obtained from a second analysis of the video signals configured to be collected, wherein the one or more second factors comprise at least one of: a. the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; and

b. a third mood configured to be derived from at least one of: a body

movement and a gesture configured to be detected in each of the one or more users configured to be distinguished in the video signals. The first communication device (101) according to any of claims 21-24, further configured to:

- Obtain a characterization of a context of the third communication device (103) by determining, based on a processing of the signals configured to be collected, at least one of: the one or more first factors, and the one or more second factors,

and wherein the determining of the recommendation is configured to be further based on the characterization of the context configured to be obtained. The first communication device (101) according to any of claims 21-25, wherein the content is a media content, and wherein the third communication device (103) is a media streaming device. The first communication device (101) according to any of claims 21-26, wherein the second communication device (102) is the third communication device (103). A fourth communication device (104) for handling a characterization of a context of a third communication device (103) configured to have one or more users, the fourth communication device (104) and the third communication device (103) being configured to operate in a telecommunications network (100), the fourth communication device (104) being further configured to:

- obtain signals configured to be collected by a receiving device (130) located in a space where the one or more users of the third communication device (103) are located, during a time period, the signals being configured to be at least one of: audio signals and video signals,

- determine the characterization of the context of the third communication device (103) by determining one or more factors configured to be obtained from an analysis of the signals configured to be obtained, wherein the one or more factors comprise at least one of: a. one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; b. a first mood configured to be derived from a tone of one or more voices configured to be detected in the audio signals;

c. a second mood configured to be derived from a first semantic analysis of a language used by one or more voices configured to be detected in the audio signals;

d. a topic of discussion configured to be derived from a second semantic analysis of a language used by one or more voices configured to be detected in the audio signals, e. a third mood configured to be derived from at least one of: a body movement and a gesture configured to be detected in each of the one or more users, and

- initiate sending a second indication of the characterization of the context configured to be determined to a first communication device (101) configured to operate in the telecommunications network (100) or another communication device (102, 103, 104) configured to operate in the telecommunications network (100).

29. The fourth communication device (104) according to claim 28, wherein the signals are configured to be audio signals, and wherein at least one of:

30. The fourth communication device (104) according to any of claims 28-29, wherein the signals are configured to be video signals, and wherein the one or more factors comprise at least one of: a. the one or more characteristics of the one or more users from: number, gender, age, and identity of the one or more users; and

b. the third mood configured to be derived from at least one of: the body

movement and the gesture configured to be detected in each of the one or more users.

31. The fourth communication device (104) according to claim 29, further configured to:

- obtain a model for each one of the one or more factors, based on repeatedly performing the obtaining of the signals and the determining of the one or more factors over a plurality of time periods, and

wherein the second indication is configured to further indicate the obtained model for each one of the one or more factors.

32. The fourth communication device (104) according to any of claims 28-31 , wherein the content is a media content, and wherein the third communication device (103) is a media streaming device.

33. A second communication device (102) for handling a recommendation from a first communication device (101), the first communication device (101) and the second communication device (102) being configured to operate in a telecommunications network (100), the second communication device (102) being further configured to:

- receive a first indication for a recommendation for an item of content to be provided by a third communication device (103) configured to operate in the telecommunications network (100), the recommendation being based on signals collected by a receiving device (130) configured to be located in a space where one or more users of the third communication device (103) are located, during a time period, the signals being configured to be at least one of: audio signals and video signals, and

- initiate providing, to the one or more users, a third indication of the

recommendation configured to be received on an interface (110) of the second communication device (102).

34. The second communication device (102) according to claim 33, further configured to:

- initiate providing the item of content on the third communication device (103), based on a selection configured to be received from the one or more users on the interface (1 10), the selection being further configured to be based on the third indication configured to be provided.