DE102024107071A1

DE102024107071A1 - Media system and method for context-related recognition of an emotional state in a motor vehicle by means of a media system comprising a voice-controlled human-machine interface (MMS)

Info

Publication number: DE102024107071A1
Application number: DE102024107071.4A
Authority: DE
Inventors: Manya Sahakyan; Doreen Engelhardt; Lena Rittger; Norbert Pfleger
Original assignee: Audi AG
Current assignee: Audi AG
Priority date: 2024-03-12
Filing date: 2024-03-12
Publication date: 2025-09-18
Also published as: WO2025190545A1

Abstract

The invention relates to a media system and a method for context-related recognition of an emotional state in a motor vehicle by means of a media system comprising a voice-controlled human-machine interface (MMS). For this purpose, an emotional state (9) of at least one user is detected by means of sensor-based emotion monitoring (6). In addition, an appraisal component (7) is used, which predicts (1) an expected emotional state (9) of the at least one user on the basis of context data. Each of the two thus generates observation data. Next, the observation data from the emotion monitoring (6) and the appraisal component (7) are used in a plausibility check (8), which validates (4) the emotional state (9) of the at least one user. Subsequently, at least one component, e.g., lighting, is controlled by the media system according to the validated (4) emotional state (9).

Description

The invention relates to a media system and a method for context-related recognition of an emotional state in a motor vehicle by means of a media system comprising a voice-controlled human-machine interface (MMS).

It is known to perform emotion recognition in a motor vehicle as a contribution to increasing driving safety. The goal is to capture emotions, or an affective or emotional state of a user, and to regulate them by means of a vehicle system, such as a media system, e.g., by adjusting an interior parameter such as the lighting or by playing music. Technically, emotions can be captured using sensor devices, e.g., by video and/or audio analysis and/or text analysis of speech utterances in the interior on the basis of video and/or audio signals. In addition, vehicle and body sensors can be used to infer an emotional state from driving behavior or from physiological parameters. However, capturing or recognizing a user's emotional state through external observation alone is challenging, since an emotional state is not always expressed in a clearly recognizable or expressive manner.

GB2607086A discloses a digital assistant for a user of a vehicle. The digital assistant can assume various roles, e.g., coach, passenger, or friend. Emotion detection is performed on the basis of user observation.

However, emotion recognition is desired above all for empathy-driven interaction between the user and the digital assistant, which the prior art cannot provide.

DE 10 2005 058 227 A1 discloses an emotion-based software robot for a vehicle. Depending on detected behaviors of a driver, in conjunction with the recognition of specific contexts or situations, an emotion of the driver is inferred and a measure responding to that emotion is initiated.

The invention is based on the object of detecting an emotional state of a user and providing the user with an empathetically designed response that is tailored to the detected emotional state.

The object is achieved by the subject matter of the independent claims. Advantageous developments of the invention are described by the dependent claims, the following description, and the figures.

The invention relates to a media system and a method for context-related recognition of an emotional state in a motor vehicle by means of a media system comprising a voice-controlled human-machine interface (MMS). The media system, comprising the voice-controlled human-machine interface, is to be understood as a software-based and/or hardware-based component through which, for example, media content can be output and/or lighting can be adjusted and/or an ergonomic setting, for example of a seat, can be made and/or an interior temperature, for example of a motor vehicle, can be regulated. The media system and/or the human-machine interface can interact with various vehicle sensors via CAN bus interfaces in order to control, e.g., a seat massage or ambient lighting. The following steps are carried out by the media system.

In a step a), an emotional state of at least one user is detected by means of sensor-based emotion monitoring. This is done by capturing physical characteristics, such as the voice and/or facial expressions of the at least one user, and interpreting them as an emotional state. In other words, the emotional state of the at least one user is recognized and/or detected "from the outside" by means of at least one sensor device. This step serves as an indicator or hint of an emotional state of the at least one user.

In a step b), an appraisal component (evaluation component) is used, which predicts an expected emotional state of the at least one user on the basis of context data. In other words, an (expected) emotional state is to be captured "from the inside" or implicitly, i.e., an internal process of the at least one user that allows a resulting emotional state to be inferred (e.g., stress due to a conflict of goals). This is done solely on the basis of context data, which can, for example, draw on information from one or more calendars of the at least one user and/or from a global navigation system and derive from it whether the at least one user will, for example, be on time for an appointment. For example, an information extractor can be implemented which automatically extracts and/or stores appointments, such as upcoming flights and/or video conferences, e.g., from cloud services. Additional information, including user data such as the user's name, date of birth, contact details and/or professional details, can also be retrieved and/or extracted. The integration of user data can help improve the quality of speech processing for the voice assistant, for example by personalizing the dialogue with the user. In one embodiment, step b) can thus be performed without sensors in the sense that a predicted emotional state is inferred without taking the user's external appearance into account.

The goal of the appraisal component is to develop an expectation of which emotion the user might have, given a situational context. The situational context is thus used to find an explanation for the emotion or emotional state that the user may be in, which can be referred to as appraisal. A simple, illustrative example is the expectation that drivers do not like being stuck in traffic, so that a user who is currently in a traffic jam is probably in a bad mood. Not every event or piece of context information is equally relevant for every use case. For example, when information about arrival and travel time is requested, the context information "traffic jam" is important, whereas usability is much more relevant when the user wants to perform certain actions with the voice assistant, e.g., call a contact from the address book. The appraisal component can be implemented using various artificial intelligence (AI) methods.

If the at least one user is, for example, not on time, e.g., half an hour or an hour late (i.e., in general, a conflict of goals exists), the expected emotional state "anger" can be automatically assigned to the at least one user. The user's punctuality for an appointment can be calculated by the information extractor, e.g., using the global navigation system, by calculating a time difference between the expected arrival time and the actual arrival at the user's destination. A potential emotional state of the at least one user is thus to be inferred from external circumstances. If, for example, the emotional state "anger" is confirmed, the voice assistant can try to calm the user and/or, e.g., ask whether a massage program should be started for them and/or, e.g., output: "Don't worry, you will reach your destination as planned at around 2 p.m.! Shall I activate the relaxation program?"
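The punctuality-based appraisal of step b) could be sketched as follows. This is only an illustration under stated assumptions: the function name `expected_emotion`, the 30-minute lateness threshold, the comparison of an estimated arrival time against the appointment time, and the fallback to "neutral" are hypothetical choices and are not taken from the patent text.

```python
from datetime import datetime, timedelta


def expected_emotion(appointment: datetime,
                     eta: datetime,
                     traffic_jam: bool = False,
                     late_threshold: timedelta = timedelta(minutes=30)) -> str:
    """Predict an expected emotional state from context data only (no sensors).

    All thresholds and labels are illustrative assumptions.
    """
    delay = eta - appointment  # positive: arrival after the appointment starts
    if delay >= late_threshold:
        return "anger"    # conflict of goals: the user will be clearly late
    if delay <= timedelta(0) and not traffic_jam:
        return "joy"      # on time and no traffic jam detected
    return "neutral"      # assumption: no clear expectation otherwise


appointment = datetime(2024, 3, 12, 14, 0)
on_time = expected_emotion(appointment, datetime(2024, 3, 12, 13, 50))
late = expected_emotion(appointment, datetime(2024, 3, 12, 14, 45))
```

In a real system the appointment time would come from the calendar extractor and the arrival estimate from the navigation system; here both are passed in directly.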

This is particularly useful if a neutral emotional state, i.e., the state "Neutral", is detected in step a). This can occur if the at least one user does not express themselves expressively, so that the underlying emotion of their emotional state, such as joy or anger, is not obviously shown or revealed.

Steps a) and b) can be performed independently of each other, each generating observation data.

In a step c), the observation data from the emotion monitoring and from the appraisal component are combined in a plausibility check, which validates and/or predicts the emotional state of the at least one user. For this purpose, it can be configured that step a) and step b) are evaluated differently, i.e., that the observation data from a) and b) each carry a different weighting. In other words, the plausibility check means either that the observation data generated in a) and b) confirm each other, or that an uncertainty in the detection of the emotional state is concluded because a) and b) detected or predicted different emotional states. If such an uncertainty is found, the weighting of the appraisal component can be increased, in particular doubled or tripled, in order to reinterpret, e.g., an emotional state "Neutral" detected in step a), which may "mask" the emotional state "anger" or "joy" because the user has not expressed themselves expressively, as the actual emotional state. An emotional state of the user can thus be determined in a targeted and/or precise manner, and a component of the media system can then be controlled according to the detected emotional state in order to maintain the emotional state, e.g., "joy", or to mitigate it in the case of "anger".
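The weighting idea in step c) could be sketched as follows. The text does not specify a fusion formula, so the linear mixing, the function name `fuse`, the base weight of 0.2, and the way disagreement is detected are all assumptions made for illustration; only the doubling of the appraisal weight on disagreement follows the description.

```python
def fuse(monitoring: dict[str, float],
         appraisal: str,
         appraisal_weight: float = 0.2,
         boost: float = 2.0) -> dict[str, float]:
    """Late fusion of sensor probabilities with an appraisal prediction.

    If the most probable observed emotion disagrees with the appraisal,
    the appraisal weight is multiplied by `boost` (e.g., doubled).
    """
    observed = max(monitoring, key=monitoring.get)
    weight = appraisal_weight
    if observed != appraisal:
        # Disagreement is treated as uncertainty: trust the context more.
        weight = min(1.0, appraisal_weight * boost)
    fused = {emotion: (1.0 - weight) * p for emotion, p in monitoring.items()}
    fused[appraisal] = fused.get(appraisal, 0.0) + weight
    return fused


monitoring = {"anger": 0.08, "neutral": 0.67, "joy": 0.25}
fused = fuse(monitoring, appraisal="joy")
validated = max(fused, key=fused.get)
```

With these example numbers, the observed "neutral" disagrees with the predicted "joy", so the appraisal weight is doubled to 0.4 and "joy" becomes the most probable fused state; without the boost, "neutral" would remain on top.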

A concrete example of the plausibility check that can be realized with the invention is the following (the numerical values express a proportion from 0 to 1 and can therefore also be read as a percentage, i.e., 0.08 = 8%; other value intervals can likewise be used by the person skilled in the art):

For example, the emotion monitoring in step a) yielded the following probabilities:
- "Anger": 0.08
- "Neutral": 0.67
- "Joy": 0.25

In step b), the appraisal component predicts the emotional state "joy" because, e.g., the user is on time and no traffic jam was detected.

Then the plausibility check in step c) may include calculating the following:

Probability("Joy") - Probability("Anger") > 0.1 -> True. The difference between the probabilities of "Joy" and "Anger" should be greater than 0.1, which is the case here (0.25 - 0.08 = 0.17). At least one distance requirement or salience (conspicuousness) requirement can therefore be provided.

Probability("Joy") > Probability("Neutral") -> False. The probability of "Joy" should be greater than that of "Neutral", which is not the case here (0.25 versus 0.67).

Probability("Neutral") < 0.9 -> True. The probability of "Neutral" should be less than 0.9, which is the case here (0.67). At least one threshold can therefore be provided.

Although the second calculation returns "False", the validated emotional state here would be "joy", since the appraisal component predicts this emotional state and the probabilities from the emotion monitoring do not exceed a predefined threshold, e.g., for a specific emotion (e.g., a probability of "anger" above 0.7, i.e., 0.7 to 1, which would indicate that the emotional state "anger" is very likely present in the user).
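The worked example can be reproduced with a few lines of code. This is a sketch only: the function names, the parameter names, and the decision to let a confident non-neutral sensor reading override the appraisal are assumptions; the text does not spell out how the three checks are combined, so the sketch merely reports them.

```python
def plausibility_checks(p: dict[str, float],
                        margin: float = 0.1,
                        neutral_cap: float = 0.9) -> tuple[bool, bool, bool]:
    """Evaluate the three example rules for a predicted state of "joy"."""
    return (
        p["joy"] - p["anger"] > margin,  # distance / salience requirement
        p["joy"] > p["neutral"],         # joy should outrank neutral
        p["neutral"] < neutral_cap,      # neutral must not be near-certain
    )


def validate(p: dict[str, float], predicted: str, override: float = 0.7) -> str:
    """Return the validated state.

    Assumption: the appraisal prediction wins unless the sensors report a
    non-neutral emotion with a probability above `override`.
    """
    strongest = max(p, key=p.get)
    if strongest != "neutral" and p[strongest] > override:
        return strongest
    return predicted


monitoring = {"anger": 0.08, "neutral": 0.67, "joy": 0.25}
checks = plausibility_checks(monitoring)
validated = validate(monitoring, predicted="joy")
```

For the probabilities from the example, `checks` evaluates to `(True, False, True)` and `validated` is `"joy"`, matching the outcome described above.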

Such rules can be defined in advance and do not necessarily have to match those mentioned above; they can be adapted by the person skilled in the art to the respective implementation.

Steps a), b) and c) can also be summarized as "late fusion".

At least one component can then be controlled by the media system according to the validated emotional state. If the validated emotional state is, for example, "joy", a voice assistant of the media system can adapt its interaction style to the at least one user by asking the at least one user questions and/or suggesting conversation topics that maintain the positive state of mind of the at least one user. It can also be provided that media content associated with the emotional state "joy" is output. For example, cheerful pieces of music and/or positive news and/or entertaining stories could be suggested. It is provided, e.g., that speech output is implemented by means of so-called Emotional Text-To-Speech (TTS), e.g., from third-party software. The software makes it possible to control the speech output using different styles. For example, the "Cheerful Style" could express a positive and happy tone, while the "Sad Style" produces a sad tone. This control of styles makes it possible to adapt the emotional nuances of the speech output as needed, i.e., according to the recognized emotional state of the user.

Furthermore, the invention provides that the voice assistant has a machine learning model, in particular an artificial neural network. The learning model can adapt to the at least one user in that an emotional state is linked to at least one control constellation of the media system. For example, the emotional state "joy" can be linked, according to a completeness criterion, to the output of loud music, e.g., above 70 decibels or 75 to 80 decibels, and/or energetic music, e.g., rock, pop and/or electronic music, so that at a later point in time, when the emotional state "joy" is validated, the voice assistant asks the user whether they would like this music to be played (loudly). "Completeness criterion" means that the user, e.g., manually stores and/or creates such a link, e.g., via a display device of the media system, and/or that the respective emotional state has been detected together with at least one control constellation, such as the output of loud music, for a predetermined number of iterations, e.g., 2 to 8. In particular, the main use case is focused on the empathetic reactions of voice assistants.
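The completeness criterion alone can be illustrated with a simple counter. Note that this is deliberately not a machine learning model or neural network as named in the text; it only sketches when a link between an emotional state and a control constellation would be considered established. The class name `PreferenceLinker`, the setting label `"loud_music"`, and the default of 3 iterations are hypothetical.

```python
from collections import Counter


class PreferenceLinker:
    """Links emotional states to settings once a completeness criterion is met."""

    def __init__(self, required_iterations: int = 3):
        self.required = required_iterations
        self.counts: Counter = Counter()       # (emotion, setting) -> co-occurrences
        self.links: dict[str, set[str]] = {}   # emotion -> linked settings

    def link_manually(self, emotion: str, setting: str) -> None:
        """The user stores the link explicitly, e.g., via the display."""
        self.links.setdefault(emotion, set()).add(setting)

    def observe(self, emotion: str, setting: str) -> None:
        """Record that a setting was active while an emotion was validated."""
        self.counts[(emotion, setting)] += 1
        if self.counts[(emotion, setting)] >= self.required:
            self.links.setdefault(emotion, set()).add(setting)

    def suggestions(self, emotion: str) -> list[str]:
        """Settings the assistant may offer when this emotion is validated."""
        return sorted(self.links.get(emotion, set()))


linker = PreferenceLinker(required_iterations=3)
for _ in range(3):
    linker.observe("joy", "loud_music")
offers = linker.suggestions("joy")
```

After the third joint observation, the assistant could ask the user whether loud music should be played the next time "joy" is validated.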

This can, on the one hand, strengthen the user's subjective feeling of "being understood" and/or, on the other hand, increase the efficiency of the dialogue. Furthermore, empathy and/or perceived empathy of the voice assistant towards the at least one user can thereby be realized and/or increased. Through steps a) and b), the performance and/or reliability of the emotion recognition can be improved by reference to a situation that triggers the emotional state. Furthermore, the user experience can be improved because the behavior of the voice assistant becomes easier to follow, in particular through step b), i.e., on the basis of the context data. Overall, the aim is to enable empathetic system behavior of the voice assistant through an understanding of a situation and its emotional implications.

The invention also includes developments which provide additional advantages.

One development provides that step a) includes fusing or integrating incoming emotion signals in predefined time windows or time intervals into a common, smoothed emotion value that quantifies the emotional state of the user. It is thus provided that several and/or different emotion signals are captured within a defined time interval and are then combined or calculated into one emotional state of the user. In other words, a short-term change in the emotional state of the user is captured, with the several modalities, i.e., vehicle sensors (e.g., camera and/or microphone) on the one hand and the contextual analysis on the other, being fused into one result by the late fusion component. For example, the voice assistant can output an empathetically colored response, or a response tailored to the emotional state, to a given functional request from the user. For example, the user asks: "Hey Audi, when am I finally going to arrive?", and the voice assistant then outputs the empathetically colored response: "Don't worry, you will arrive in time for your appointment at 2:00 p.m."

By combining different input signals, comprehensive information about the user's emotional state can be obtained. Using multiple input signals can help minimize misinterpretation. Since emotions are nuanced, the different modalities, including, for example, video analysis for visual emotion recognition, and/or audio analysis for acoustic emotion recognition, and/or text analysis, e.g., speech recognition followed by sentiment analysis, can help capture subtle differences and/or nuances in the way the user expresses an emotional state.
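The windowed fusion of several modalities described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the disclosure: it assumes that each modality delivers timestamped scores over the three categories joy, anger, and neutral, and that all signals are weighted equally. The names EmotionSignal and fuse_window are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the three categories named elsewhere in this description.
CATEGORIES = ("joy", "anger", "neutral")


@dataclass
class EmotionSignal:
    timestamp: float  # seconds since some reference time
    modality: str     # e.g. "video", "audio", "text"
    scores: dict      # category -> score in [0, 1]


def fuse_window(signals, start, duration):
    """Average all signals with start <= timestamp < start + duration.

    Returns one score per category, or None if the window is empty.
    Equal weighting of all signals is an assumption of this sketch.
    """
    in_window = [s for s in signals if start <= s.timestamp < start + duration]
    if not in_window:
        return None
    return {
        c: sum(s.scores.get(c, 0.0) for s in in_window) / len(in_window)
        for c in CATEGORIES
    }


signals = [
    EmotionSignal(0.1, "video", {"joy": 0.8, "anger": 0.0, "neutral": 0.2}),
    EmotionSignal(0.5, "audio", {"joy": 0.4, "anger": 0.2, "neutral": 0.4}),
    EmotionSignal(2.5, "text", {"joy": 0.0, "anger": 1.0, "neutral": 0.0}),
]
fused = fuse_window(signals, start=0.0, duration=1.0)
```

Only the first two signals fall into the one-second window, so the text signal at 2.5 s does not influence the fused value.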

A further development provides that emotion signals are recorded from multiple users in predetermined time intervals and are then integrated into a smoothed average emotion value that quantifies the emotional state of the multiple users as a group. An average emotional state of multiple users is thus calculated, so that, for example, a mood and/or atmosphere within this group is quantified. The media system can then react to the collective mood of the group and output corresponding media content and/or adapt the voice assistant's interaction with the group accordingly.
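The group averaging can be illustrated with a short sketch. It assumes that a smoothed score per category is already available for each user and that all users count equally. The function names group_emotion and dominant are hypothetical.

```python
def group_emotion(per_user):
    """Average per-category scores over all users.

    per_user maps a user id to a dict of category -> smoothed score.
    Returns None for an empty group. Equal weighting of users is an
    assumption of this sketch.
    """
    if not per_user:
        return None
    categories = set().union(*(scores.keys() for scores in per_user.values()))
    n = len(per_user)
    return {
        c: sum(scores.get(c, 0.0) for scores in per_user.values()) / n
        for c in sorted(categories)
    }


def dominant(scores):
    """Return the category with the highest score."""
    return max(scores, key=scores.get)


group = group_emotion({
    "driver": {"joy": 0.2, "anger": 0.6, "neutral": 0.2},
    "passenger": {"joy": 0.8, "anger": 0.0, "neutral": 0.2},
})
mood = dominant(group)
```

In this example the angry driver is outweighed by the cheerful passenger, which also shows a limitation of a plain average: individual states are hidden in the group value.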

A further development provides that the emotion monitoring includes capturing the emotional state of the at least one user by means of video analysis and/or audio analysis and/or text analysis and/or sentiment analysis. This can help reduce ambiguities in capturing the emotional state. For example, a sentence from the user that appears neutral at first glance can be recognized as emotionally charged ("anger") by analyzing the tone of voice and/or the facial expression of the at least one user. Contradictions in capturing the emotional state are thus more likely to be resolved.

A further development provides that step b) includes starting or opening a predetermined time window and/or time interval with a predetermined latency, e.g., 100 to 400 ms, and/or a predetermined duration, e.g., one to ten seconds or one to three minutes, upon the occurrence of an event that, according to the appraisal component, indicates an expected emotional state. If a further emotional state and/or a change in state of the same user is detected within, or overlaps with, this time window, the emotional state is linked to the preceding event as a consequence of that event, and/or at least one component of the media system is controlled. This has the advantage that an event, such as the output of media content, in particular a piece of music, is associated with the original emotional state on the basis of a change in the emotional state. A time-based association of an event with an emotional state can thus be realized. Similar to the completeness criterion mentioned above, it can be provided that the step is performed for a predetermined number of iterations in order to store and/or merge a link between the emotional state and the event. From this, one or at least one control constellation or so-called use-case-specific system behavior can be controlled, e.g., a subsequent voice dialog and/or the output of loud music. In this respect, it is possible to handle various emotion-based behaviors via a common mechanism. Overall, recognized emotion intervals, i.e., an emotional state lasting for a predetermined or specific duration, can thus be interpreted as reactions to events.
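The time-window mechanism described above can be sketched as follows. The sketch assumes a latency of 0.2 s and a duration of 5 s, both within the example ranges given in the text, and treats a link as confirmed once it has been observed a given number of times. The names Event, StateChange, link_states_to_events, and confirmed_links, as well as the minimum count of three, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    time: float  # seconds


@dataclass
class StateChange:
    user: str
    state: str   # e.g. "joy", "anger", "neutral"
    time: float  # seconds


def link_states_to_events(events, changes, latency=0.2, duration=5.0):
    """Link each state change to every event whose window contains it.

    The window opens `latency` seconds after the event and stays open
    for `duration` seconds (both values are assumptions of this sketch).
    """
    pairs = []
    for event in events:
        start = event.time + latency
        end = start + duration
        for change in changes:
            if start <= change.time <= end:
                pairs.append((event.name, change.user, change.state))
    return pairs


def confirmed_links(pairs, min_count=3):
    """Keep only links observed at least `min_count` times."""
    counts = Counter(pairs)
    return {pair for pair, n in counts.items() if n >= min_count}


events = [Event("song_started", 10.0)]
changes = [
    StateChange("driver", "joy", 10.1),    # before the window opens
    StateChange("driver", "joy", 12.0),    # inside the window
    StateChange("driver", "anger", 30.0),  # after the window closed
]
pairs = link_states_to_events(events, changes)
```

Only the state change at 12.0 s is linked to the event, because the window spans 10.2 s to 15.2 s.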

According to the invention, the voice assistant thus adapts the content, e.g., the manner in which the information that a traffic jam is looming is conveyed, and/or the mood of its voice output according to a validated emotional state. As a result, for example, the vehicle interior and/or one or at least one component of the motor vehicle, such as a comfort function, in particular a seat adjustment with a massage function, can adapt to the validated emotional state. Either through manual adjustment and/or through training, e.g., using the machine learning model, a current emotional state, e.g., "joy", is to be supported or, e.g., the emotional state "anger" is to be alleviated. For example, it can be provided that a change in state, e.g., from "neutral" to the emotional state "joy", is detected when evaluating the user's facial expressions. When the change in state occurs, all events within a predetermined period of time, e.g., 1 second to 1 or 3 minutes, that may have triggered the change in state can be considered retrospectively, so that a potential reason for the change in the emotional state can be determined.

A further development provides that the context data of the appraisal component include a situational context and/or at least one user goal of the at least one user.

A further development provides that the situational context includes, as at least one event, an interaction history and/or calendar information and/or travel information. “Interaction history” refers to a history between the user and the voice assistant that shows requests or commands the user has made to the voice assistant, and/or information the user has retrieved via the voice assistant, and/or settings or adjustments the user has made, and/or corrections the user has made, for example when the voice assistant had difficulty understanding a request. “Calendar information” refers in particular to an appointment and/or an upcoming event, such as an upcoming flight. “Travel information” refers, for example, to a current or upcoming detected traffic jam in the immediate vicinity or on the way to a destination of the user. It can also refer to the weather and/or a weather forecast.

A further development provides that the at least one user goal includes, as at least one event, a punctual arrival at a destination and/or a stress-free journey and/or a simple operating action with the media system. In technical terms, "simple" means that the user can interact with the voice assistant in an uncomplicated manner, in particular by using natural language or a minimal number of user inputs, e.g., one to three. The voice assistant can therefore be designed to respond to voice commands without the user having to navigate through complex menu structures. In principle, "the user" refers to one or more users. In contrast to the events of the situational context, these events are provided as fixed, non-variable quantities. It can therefore automatically be assumed that the user wishes to meet at least one of these user goals and/or identifies with it.

A further development provides that an emotional state to be validated is assigned to one of three categories, namely joy, anger, or neutral, and that one or more control constellations for controlling the at least one component are assigned to each category. As already mentioned, this can be realized in that, when an emotional state is detected, the user is first asked which specific control constellation is to be output and/or the control constellation is output automatically. In this way, for example, a positive state of mind of the user, e.g., joy, can be maintained and/or a negative emotional state, such as anger, can at least be alleviated.
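The assignment of control constellations to the three categories can be pictured as a simple lookup table. The sketch below is illustrative only: the action names are invented, the component names are assumed, and falling back to the neutral constellations for an unknown state is an assumption. The optional ask_user callback stands in for asking the user which constellation to apply.

```python
# Hypothetical table: component and action names are invented for illustration.
CONTROL_CONSTELLATIONS = {
    "joy": [
        {"component": "audio_system", "action": "keep_current_playlist"},
        {"component": "lighting", "action": "warm_bright"},
    ],
    "anger": [
        {"component": "audio_system", "action": "play_calm_music"},
        {"component": "lighting", "action": "dim_soft"},
        {"component": "seat", "action": "start_massage"},
    ],
    "neutral": [
        {"component": "lighting", "action": "default"},
    ],
}


def constellations_for(state, ask_user=None):
    """Return the control constellations for a validated emotional state.

    Unknown states fall back to the "neutral" entry (an assumption).
    If ask_user is given, it receives the state and the options and may
    return one chosen constellation, or None to apply all automatically.
    """
    options = CONTROL_CONSTELLATIONS.get(state, CONTROL_CONSTELLATIONS["neutral"])
    if ask_user is not None:
        chosen = ask_user(state, options)
        if chosen is not None:
            return [chosen]
    return list(options)


automatic = constellations_for("anger")
picked = constellations_for("anger", ask_user=lambda state, options: options[0])
```

Keeping the table as data rather than code would allow new constellations to be added per category without changing the selection logic.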

By means of the media system, different components can be controlled to express an empathetic reaction. Examples are given below. A further development provides that the at least one component is configured as an audio system and/or a lighting device and/or a seat. At least one of these components can thus be controlled accordingly by the media system when a validated emotional state is detected.

For use cases or application situations that may arise in the method and that are not explicitly described here, it may be provided that, in accordance with the method, an error message and/or a request to enter user feedback is output and/or a default setting and/or a predetermined initial state is set.

The invention also includes the media system. The media system can comprise a data processing device or a processor device configured to carry out an embodiment of the method according to the invention. For this purpose, the processor device can comprise at least one microprocessor and/or at least one microcontroller and/or at least one FPGA (Field Programmable Gate Array) and/or at least one DSP (Digital Signal Processor). In particular, a CPU (Central Processing Unit), a GPU (Graphical Processing Unit), or an NPU (Neural Processing Unit) can be used as the microprocessor. Furthermore, the processor device can comprise program code configured to carry out the embodiment of the method according to the invention when executed by the processor device. The program code can be stored in a data memory of the processor device. The processor device can be based, for example, on at least one circuit board and/or on at least one SoC (System on Chip).

The invention also includes further developments of the media system according to the invention that have features already described in connection with the further developments of the method. For this reason, the corresponding further developments are not described again here.

The media system can be included in a motor vehicle. The motor vehicle according to the invention is preferably configured as an automobile, in particular as a passenger car or truck, or as a passenger bus or motorcycle.

As a further solution, the invention also encompasses a computer-readable storage medium comprising program code which, when executed by a computer or computer network, causes the computer or computer network to carry out an embodiment of the method according to the invention. The storage medium can be provided at least partially as a non-volatile data memory (e.g., as flash memory and/or as an SSD, i.e., a solid state drive) and/or at least partially as a volatile data memory (e.g., as RAM, i.e., random access memory). The storage medium can be arranged in the computer or computer network. However, the storage medium can also be operated, for example, as a so-called app store server and/or cloud server on the Internet. The computer or computer network can provide a processor circuit with, for example, at least one microprocessor. The program code can be provided as binary code and/or as assembly code and/or as source code of a programming language (e.g., C) and/or as a program script (e.g., Python).

The invention also encompasses combinations of the features of the described embodiments. The invention therefore also encompasses implementations that each comprise a combination of the features of several of the described embodiments, unless the embodiments have been described as mutually exclusive.

Exemplary embodiments of the invention are described below. The figures show:

Fig. 1 a schematic representation of an embodiment according to the invention;
Fig. 2 a further schematic representation of an embodiment according to the invention; and
Fig. 3 a technical implementation for carrying out the embodiments according to the invention.

The exemplary embodiments explained below are preferred embodiments of the invention. In the exemplary embodiments, the described components of the embodiments each represent individual features of the invention that are to be considered independently of one another and that each also develop the invention independently of one another. The disclosure is therefore intended to encompass combinations of the features of the embodiments other than those shown. Furthermore, the described embodiments can also be supplemented by further features of the invention already described.

In the figures, the same reference signs designate elements with the same function.

Fig. 1 shows a schematic representation of an embodiment according to the invention. Shown is the aforementioned Late Fusion 2, which comprises the emotion monitoring 6, the appraisal component 7, and the plausibility check 8. The emotion monitoring 6 can detect an emotional state 9 of at least one user in a sensor-based manner, i.e., on the basis of physical characteristics of the at least one user. From the appraisal component 7, an expected emotional state 9 of the at least one user can be predicted 1 on the basis of context data originating from one or more events 3, wherein observation data can be generated in each case. The observation data from the emotion monitoring 6 and the appraisal component 7 can be combined in a plausibility check 8, wherein the plausibility check 8 validates 4 the emotional state 9 of the at least one user. The validated 4 emotional state 9 can comprise the state "neutral", "joy", or "anger". The Late Fusion 2 can be connected to a voice assistant that is designed to conduct a dialog with at least one user. Furthermore, the Late Fusion 2 can extract information and/or context parameters, such as appointments, from one or more cloud services via a semantic blackboard.

Fig. 2 shows a further schematic representation of an embodiment according to the invention. Here, the appraisal component 7 is illustrated in more detail. The appraisal component 7 comprises a situational context 5, which has, as event 3, an interaction history and/or calendar information and/or travel information. Furthermore, the appraisal component 7 contains at least one user goal 11, which has, as event 3, a punctual arrival at a destination and/or a stress-free journey and/or a simple operating action with the media system. For example, the interaction history can contain previously recorded emotional states 9 based on interactions of the user with the voice assistant. Travel information can, for example, describe a potential traffic jam and/or an expected arrival time. As already described, the appraisal component 7 has the function of confirming an emotional state 9 detected by the emotion monitoring 6. If the emotion monitoring 6 returns an uncertain or vague result, for example anger 30%, joy 30%, neutral 40%, it can be provided that the weighting of the appraisal component 7 is increased, e.g., doubled, so that a reliable result is obtained, such as anger 0%, joy 60%, neutral 40%.
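The re-weighting of the appraisal component for a vague monitoring result can be sketched as a weighted combination. The text does not specify the combination rule, so the additive rule with renormalization, the uncertainty margin of 0.15, and the appraisal scores used below are all assumptions, and the sketch does not reproduce the percentages given in the example above. The names is_uncertain and plausibilize are hypothetical.

```python
def is_uncertain(scores, margin=0.15):
    """Treat a result as vague if the top two scores are closer than margin."""
    top, second = sorted(scores.values(), reverse=True)[:2]
    return top - second < margin


def plausibilize(monitoring, appraisal, base_weight=1.0, boost=2.0, margin=0.15):
    """Combine monitoring and appraisal scores into normalized scores.

    The appraisal weight is multiplied by `boost` (e.g. doubled) when the
    monitoring result is uncertain. The additive rule is an assumption.
    """
    weight = base_weight * boost if is_uncertain(monitoring, margin) else base_weight
    combined = {c: monitoring[c] + weight * appraisal.get(c, 0.0) for c in monitoring}
    total = sum(combined.values())
    if total == 0:
        return dict(combined)
    return {c: v / total for c, v in combined.items()}


monitoring = {"anger": 0.3, "joy": 0.3, "neutral": 0.4}  # vague sensor result
appraisal = {"anger": 0.0, "joy": 0.8, "neutral": 0.2}   # assumed context prediction
validated = plausibilize(monitoring, appraisal)
state = max(validated, key=validated.get)
```

With the doubled appraisal weight, the context prediction breaks the near-tie in the monitoring result, and "joy" becomes the clearly dominant category.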

Fig. 3 shows a technical implementation for carrying out the method according to the invention. It shows how the emotion monitoring 6 can be realized from a video analysis 13 and/or audio analysis 14 and/or text analysis 15. In an intermediate step, i.e., before the emotion signals 22 are forwarded to the emotion monitoring 6, an early fusion can be carried out. The diagram further shows that, when an event 3 occurs that, according to the appraisal component 7, indicates an expected emotional state 9, a predetermined time window 18 with a predetermined latency 17 and/or a predetermined duration starts. If a further emotional state 9 and/or a change in state of the same user overlaps with this time window 18, the emotional state 9 can be linked to the event 3 as a consequence of that event, and/or at least one component can be controlled.

For example, it can be provided that music is played, the user then smiles, thus expressing the emotional state 9 "joy", and, upon detection of this emotional state 9, the time window 18 is started in order to link the event 3 that triggered the emotional state 9 with the detected emotional reaction, i.e., the emotional state 9. The voice assistant can then ask the user whether they like the music and/or the song being played, so that the user can confirm the link. The link can also be referred to as a stimulus-affect pair.

For example, a specific embodiment in the motor vehicle provides that the user is in their motor vehicle and is informed by the voice assistant that they will shortly encounter a traffic jam. The user then asks the voice assistant when they would arrive at the destination. The voice assistant detects the emotional state 9 "anger" in the user and then responds empathetically to the user's anger and/or adapts the interaction accordingly in order to mitigate and/or improve the emotional state 9 of the user.

Metrics such as precision, recall, accuracy and/or F1 can be used accordingly for detecting the emotional state 9.
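
How such metrics could be computed is sketched below, assuming that detected labels are compared against reference labels for one emotion category at a time. The description does not spell out how the metrics are applied, so the function name `classification_metrics` and the example labels are purely illustrative.

```python
def classification_metrics(true_labels, predicted_labels, positive):
    """Compute precision, recall, F1 for one category, plus overall accuracy."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a category never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical reference and detected labels.
reference = ["anger", "joy", "neutral", "anger", "joy", "neutral"]
detected = ["anger", "joy", "anger", "neutral", "joy", "neutral"]
metrics = classification_metrics(reference, detected, positive="anger")
```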

In the emotion monitoring 6, all incoming emotion signals 22 can thus be integrated in fixed time windows 18 to form a common, smoothed emotion value. A change from one emotion to another can give rise to emotion intervals, which can be linked to one or more respective events 3. The emotion monitoring 6 can be carried out with configurable parameters such as "time window" 18, "threshold", "rise behavior" and/or "decay parameter". Additionally or alternatively, the parameters "uncertain prediction" and "stable detection" can be used, for example when the emotion signals 22 of the respective modalities 13, 14, 15 lie above or below a predetermined threshold. Overall, any temporal progression of the detected emotion, i.e. of the emotional state 9, within a user utterance that expresses, for example, an emotional state 9 can therefore already be averaged into a single emotion value by upstream processing.
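
One way such windowed smoothing could look is sketched below. The description names the parameters but not the formula, so the asymmetric exponential smoothing used here, the function names, and all default values are assumptions.

```python
def window_means(samples, window_s):
    """Average (time_s, score) samples per fixed time window 18, in time order.

    Windows without any sample are simply skipped in this sketch.
    """
    buckets = {}
    for time_s, score in samples:
        buckets.setdefault(int(time_s // window_s), []).append(score)
    return [sum(scores) / len(scores) for _, scores in sorted(buckets.items())]


def smooth(values, rise=0.5, decay=0.1):
    """Follow rising values quickly (rise) and falling values slowly (decay)."""
    smoothed, current = [], 0.0
    for value in values:
        rate = rise if value > current else decay
        current += rate * (value - current)
        smoothed.append(current)
    return smoothed


def classify(value, threshold=0.5):
    """Map a smoothed emotion value to the two labels named in the description."""
    return "stable detection" if value >= threshold else "uncertain prediction"


# Hypothetical emotion signals 22 for one emotion, as (time in s, score in [0, 1]).
signals = [(0.0, 1.0), (1.0, 1.0), (2.5, 1.0), (4.1, 0.0)]
per_window = window_means(signals, window_s=2.0)
emotion_values = smooth(per_window)
```

With these settings the smoothed value rises over the first two windows and drops only slightly in the third, which illustrates how a slow decay keeps a detected emotion from disappearing after a single low reading.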

The appraisal component 7 makes it possible to understand the cause of an emotion or of an emotional state 9. An emotion expectation can thus be developed from the situational context 5 or situational environment and from the (potential) user goals 11 of the user. In this way, an attempt can be made to find an explanation for the possibly present emotional state 9 on the basis of the situational context 5 and/or the user goals 11. A simple, illustrative example is the expectation that drivers do not like being stuck in traffic, so that a user who is stuck in a traffic jam is probably in a bad mood and/or exhibits the emotional state 9 "anger". Which events 3 of the situational context 5 and/or of the user goals 11 matter for predicting 1 an expected emotional state 9 can differ from case to case. For example, the information "traffic jam" is important when the user asks for information about the arrival and/or travel time. Ease of use, however, is much more relevant when the user wants to perform certain actions with the voice assistant, e.g. call a contact from the address book. The appraisal component 7 can be implemented with various methods of artificial intelligence, in particular by means of one or more language models. In principle, the idea can also provide for the use of a mechanism based on inference rules.
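
A mechanism based on inference rules could, in a very reduced form, look like the following sketch. The rule table, the identifiers for events 3 and user goals 11, and the fallback to "neutral" are hypothetical and serve only to illustrate how context and goal together yield an expected emotional state 9.

```python
# Each rule: (event 3 in the situational context 5, user goal 11, expected state 9).
# All entries are illustrative assumptions, not rules taken from the description.
RULES = [
    ("traffic_jam", "punctual_arrival", "anger"),
    ("favourite_music_playing", "stress_free_journey", "joy"),
    ("voice_command_failed", "simple_operation", "anger"),
]


def predict_expected_emotion(events, goals):
    """Return the expected emotional state for the first matching rule."""
    for event, goal, emotion in RULES:
        if event in events and goal in goals:
            return emotion
    return "neutral"  # assumed default when no rule applies


# A traffic jam matters when the user cares about arriving on time ...
expected_a = predict_expected_emotion({"traffic_jam"}, {"punctual_arrival"})
# ... but not, in this rule set, when the user only wants to place a call.
expected_b = predict_expected_emotion({"traffic_jam"}, {"simple_operation"})
```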

The plausibility check 8 can determine and/or validate 4 the emotional state 9 of the user on the basis of the detected emotional state 9 from the emotion monitoring 6 and the predicted 1 emotional state 9 from the appraisal component 7; the result can then be routed to the voice assistant.
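
The combination of both observations in the plausibility check 8 might be sketched as follows. The decision rule, the confidence threshold, and the fallback to "neutral" are assumptions, since the description does not specify how the two sources are weighed against each other.

```python
def plausibility_check(observed, confidence, expected, threshold=0.7):
    """Combine the detected state (emotion monitoring 6) with the expected
    state (appraisal component 7) and return (validated state, reason)."""
    if observed == expected:
        # Sensor observation and context expectation agree.
        return observed, "confirmed by context"
    if confidence >= threshold:
        # No agreement, but the sensor evidence alone is strong (assumed rule).
        return observed, "sensor evidence only"
    # Weak, unexplained observation: fall back to neutral (assumed rule).
    return "neutral", "not validated"


# Hypothetical cases: weak "anger" signal during a traffic jam, and a weak
# "joy" signal that the context does not explain.
case_agree = plausibility_check("anger", 0.4, "anger")
case_weak = plausibility_check("joy", 0.4, "anger")
```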

Within the scope of, for example, a defined use case or use-case-specific system behavior, the interaction of the voice assistant with the user can be adapted based on the detected emotional state 9. For example, the voice assistant can adapt the content and/or the mood of voice outputs to the emotional state 9. Furthermore, the vehicle interior and the vehicle comfort functions can adapt. In doing so, the current emotion is either supported or counteracted, according to a setting or to learned behavior. The voice assistant can comprise a machine learning model, in particular an artificial neural network, and can learn in a cloud-based manner, i.e. from a cloud service, and/or from an overall population of users and/or from the individual driver or user which situations or contexts, i.e. which events 3, can potentially trigger emotions. Furthermore, the voice assistant can learn from the individual user which modalities 13, 14, 15 are particularly informative for detecting their emotional state 9. This can then be adapted accordingly in the emotion monitoring 6, so that, for example, the weighting of a modality 13, 14, 15, such as the audio analysis 14, is increased, e.g. doubled, or decreased, e.g. halved. The voice assistant can recognize and/or predict 1 an intensity (arousal) of an emotional expression, i.e. of an emotional state 9, based both on the strength of the emotional expression and on the "criticality/intensity" of the event 3. The classification of the intensity can support an appropriate system response, i.e. trigger a corresponding control of at least one component.
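
The effect of doubling or halving the weighting of a modality can be illustrated with the following sketch. It assumes that each modality delivers a score per emotion and that the scores are combined as a weighted average; both the combination rule and the names `fuse` and `adjust_weight` are assumptions made for the example.

```python
def fuse(scores, weights):
    """Weighted average of per-modality emotion scores.

    scores:  {modality: {emotion: score}}
    weights: {modality: weight}
    """
    total = sum(weights[modality] for modality in scores)
    fused = {}
    for modality, per_emotion in scores.items():
        for emotion, score in per_emotion.items():
            fused[emotion] = fused.get(emotion, 0.0) + weights[modality] * score / total
    return fused


def adjust_weight(weights, modality, factor):
    """Return a copy of the weights with one modality scaled, e.g. by 2 or 0.5."""
    updated = dict(weights)
    updated[modality] = weights[modality] * factor
    return updated


# Hypothetical scores: video analysis 13 suggests joy, audio analysis 14 anger.
scores = {
    "video": {"joy": 0.8, "anger": 0.2},
    "audio": {"joy": 0.2, "anger": 0.8},
}
equal = {"video": 1.0, "audio": 1.0}
before = fuse(scores, equal)                              # both emotions tie
after = fuse(scores, adjust_weight(equal, "audio", 2.0))  # audio counts double
dominant = max(after, key=after.get)
```

With equal weights the two emotions tie; after doubling the audio weight, the fused result leans towards the emotion reported by the audio analysis.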

Overall, the examples show how multimodal emotion recognition can be provided with a late fusion 2 or a contextual emotion model.

List of reference signs

1: predicted
2: late fusion
3: event
4: validated
5: situational context
6: emotion monitoring
7: appraisal component
8: plausibility check
9: emotional state
11: user goals
13: video analysis
14: audio analysis
15: text analysis
17: latency
18: time window
19: early fusion
22: emotion signals

DOCUMENTS CITED IN THE DESCRIPTION

This list of documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

GB 2607086A [0003]
DE 10 2005 058 227 A1 [0005]

Claims

A method for the context-related detection of an emotional state in a motor vehicle using a media system, comprising the steps performed by the media system: a) detecting an emotional state (9) of at least one user using sensor-based emotion monitoring (6), b) using an appraisal component (7) from which an expected emotional state (9) of the at least one user is predicted (1) based on context data, wherein observation data is generated in each of a) and b), c) combining the observation data from the emotion monitoring (6) and the appraisal component (7) in a plausibility check (8), wherein the plausibility check (8) validates (4) the emotional state (9) of the at least one user, d) controlling at least one component by the media system according to the validated (4) emotional state (9).

Method according to claim 1, wherein step a) includes merging incoming emotion signals (22) in predetermined time windows (18) to form a common, smoothed emotion value which quantifies the emotional state (9) of the user.

Method according to one of the preceding claims, wherein emotion signals (22) are detected in predetermined time windows (18) from a plurality of users, which are then integrated to form a smoothed emotion average value which quantifies the emotional state (9) of the plurality of users as a group.

Method according to one of the preceding claims, wherein the emotion monitoring (6) includes detecting the emotional state (9) of the at least one user by means of a video analysis (13) and/or an audio analysis (14) and/or a text analysis (15) and/or a sentiment analysis.

Method according to one of the preceding claims, wherein step b) includes: upon the occurrence of an event (3) that, according to the appraisal component (7), indicates an expected emotional state (9), starting the predefined time window (18) with a predefined latency (17) and/or a predefined duration, and, upon overlapping of a further emotional state (9) and/or a state change of the same user with this time window (18), linking the emotional state (9) as a consequence of the event (3) that occurred before the corresponding emotional state (9) and/or controlling at least one component.

Method according to one of the preceding claims, wherein the context data of the appraisal component (7) comprise a situational context (5) and/or at least one user goal (11) of the at least one user.

Method according to claim 6, wherein the situational context (5) comprises, as at least one event (3), an interaction history and/or calendar information and/or travel information.

Method according to claim 6 or 7, wherein the at least one user goal (11) comprises, as at least one event (3), a punctual arrival at a destination and/or a stress-free journey and/or a simple operating action with the media system.

Method according to one of the preceding claims, wherein an emotional state (9) to be validated (4) is assigned to one of three categories: anger, joy or neutral and each category is assigned one or more control constellations for controlling the at least one component.

Method according to one of the preceding claims, wherein the at least one component is designed as an audio system and/or a lighting device and/or a seat.

A media system, the media system comprising a processor device having program instructions which, when executed by the processor device, cause the processor device to perform a method according to any one of the preceding method claims.

Motor vehicle, comprising a media system according to claim 11.