WO2020206178A1 - Dialogue timing control in health screening dialogues for improved response speech modeling
- Publication number
- WO2020206178A1
- Application
- PCT/US2020/026472
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- patient
- logic
- words
- question
- response
- Prior art date
- Legal status (assumed status; not a legal conclusion)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/0002—Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network
- A61B5/0015—Remote monitoring of patients using telemetry, e.g. transmission of vital signals via a communication network characterised by features of the telemetry system
- A61B5/0022—Monitoring a patient using a global network, e.g. telephone networks, internet
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
- A61B5/165—Evaluating the state of mind, e.g. depression, anxiety
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
Definitions
- the present invention relates generally to health screening systems and, more particularly, to a computer-implemented health screening tool with significantly improved accuracy and efficacy.
- questions asked of a human patient in screening for one or more health states are selected according to the quality of the questions.
- the quality of a question may be a measure of the informativeness of responses to the question.
- the selected questions may be posed to the patient and an audiovisual signal of the patient’s spoken response may be captured and analyzed by a number of linguistic, acoustic, and/or visual models for indications of the health state. Dishonest or unemotional responses may not reveal much in these models. Such is also true for short and repetitive responses.
- the quality of a spoken response in terms of its informativeness of the patient’s health state, is not evenly distributed along the entire length of the response. Instead, early portions of the response may be of relatively low quality as are later portions. The best quality of a response tends to be in the portion corresponding to words 80-120 of the response.
- Health screening logic may pose the questions to the patient and capture the spoken responses.
- the health screening logic can control the flow of the dialogue to maximize informativeness of the patient’s responses overall. To do that, the health screening logic can gently and politely interrupt a patient’s response when allowing the patient’s current response to continue is not as informative as proceeding to ask the patient another question. Since the earlier portions of a spoken response may tend to be less informative, the health screening logic does not interrupt the patient during those early portions so as to allow the patient to continue speaking into portions that are most informative. In addition, the health screening logic can interrupt the patient’s response when the most informative portions of the response have passed and been captured and the informativeness of the response degrades.
- audiovisual refers to audio, video, or both. Accordingly, an audiovisual signal can be a signal that is audio only, video only, or both audio and video.
- the present disclosure provides a method for screening a subject for a health state, comprising: (a) presenting a question to the subject in an audiovisual format, wherein the question is configured to elicit a response from the subject, wherein the response is indicative of the health state of the subject; (b) receiving audiovisual data representing the response by the subject at least until the response has a predetermined minimum duration; (c) during the receiving, determining that the response has reached a duration that exceeds a predetermined maximum threshold; and (d) in response to (c), presenting an additional question to the subject in an audiovisual format notwithstanding continuation of the receiving of the audiovisual data.
- the predetermined minimum threshold represents a number of words spoken by the subject. In some embodiments, the number of words is in the range of 60-100 words. In some embodiments, the number of words is about 80 words.
- the predetermined maximum threshold represents a number of words spoken by the human subject. In some embodiments, the number of words is in the range of 100-150 words. In some embodiments, the number of words is about 120 words.
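The timing control above lends itself to a simple word-count gate. Below is a minimal, hypothetical Python sketch (function and constant names are not from the disclosure) of how dialogue logic might decide, from a streaming word count, whether to keep listening or to present the next question, using the roughly 80-word minimum and 120-word maximum noted above.

```python
# Hypothetical sketch of word-count-based dialogue timing control.
# The 80/120 defaults echo the minimum/maximum thresholds described above;
# all names here are illustrative, not from the disclosure.

def next_dialogue_action(word_count: int,
                         min_words: int = 80,
                         max_words: int = 120) -> str:
    """Decide, from the running word count of the patient's response,
    whether to keep listening or to move on to the next question."""
    if word_count < min_words:
        # Early portions tend to be less informative; never interrupt here.
        return "keep_listening"
    if word_count >= max_words:
        # The most informative portion has been captured; gently move on,
        # even though audio capture of the ongoing response may continue.
        return "present_next_question"
    # Between the thresholds: listen, but an interruption is permissible
    # if another question would now be more informative.
    return "listen_or_interrupt"
```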
- Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto.
- the computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
- Figure 1 shows a health screening system in which a health screening server and a clinical data server computer system cooperate to estimate a health state of a patient in accordance with the present invention
- Figure 2 is a block diagram of the health screening server of Figure 1 in greater detail
- Figure 3 is a block diagram of interactive health screening logic of the health screening server of Figure 2 in greater detail
- Figure 4 is a block diagram of interactive screening server logic of the interactive health screening logic of Figure 3 in greater detail;
- Figure 5 is a block diagram of generalized dialogue flow logic of the interactive screening server logic of Figure 4 in greater detail;
- Figure 6 is a logic flow diagram illustrating the control of an interactive spoken conversation with the patient by the generalized dialogue flow logic in accordance with the present invention
- Figure 7 is a block diagram of a question and adaptive action bank of the generalized dialogue flow logic of Figure 5 in greater detail;
- Figure 8 is a logic flow diagram of a step of Figure 6 in greater detail;
- Figure 9 is a block diagram of question management logic of the question and adaptive action bank of Figure 7 in greater detail
- Figure 10 is a logic flow diagram of determination of the quality of a question in accordance with the present invention.
- Figure 11 is a logic flow diagram of determination of the equivalence of two questions in accordance with the present invention.
- Figure 12 is a block diagram of runtime model server logic of the interactive health screening logic of Figure 3 in greater detail;
- Figure 13 is a block diagram of model training logic of the interactive health screening logic of Figure 3 in greater detail
- Figure 14 is a block diagram of a screening system data store of the interactive health screening logic of Figure 3 in greater detail;
- Figure 15 shows a health screening system 100B in which a health screening server estimates a health state of a patient by passively listening to ambient speech in accordance with the present invention
- Figure 16 is a logic flow diagram illustrating the estimation of a health state of a patient by passively listening to ambient speech in accordance with the present invention
- Figure 17 is a logic flow diagram illustrating the estimation of a health state of a patient by passively listening to ambient speech in accordance with the present invention
- Figure 18 is a block diagram of health care management logic of the health screening server of Figure 2 in greater detail
- Figures 19 and 20 are respective block diagrams of component conditions and actions of workflows of the health care management logic of Figure 18;
- Figure 21 is a logic flow diagram of the automatic formulation of a workflow of the health care management logic of Figure 18 in accordance with the present invention.
- Figure 22 is a block diagram of the health screening server of Figure 1 in greater detail
- Figure 23 is a logic flow diagram illustrating a portion of a step of the logic flow diagram of Figure 6 in greater detail.
- Figure 24 is a diagram illustrating relative quality of a patient’s response over time.
- Figure 25 is an illustration of word lengths collected from speech samples.
- Figure 26 is an illustration of speaking data showing a relationship between speaking rate and length of response.
- Figure 27 is a graph showing areas under curve (AUC) for speech recordings that include responses and concatenations of responses.
- Figure 28 is a graph showing how accumulating responses affects performance data of a natural language processing algorithm.
- Figure 29 is a graph showing a set of AUCs for responses of particular word lengths.
- Figure 30 is a graph showing within session length orderings for sessions with four responses.
- Figure 31 is a graph showing a set of AUC curves for shortest versus longest responses within sessions.
- a server computer system can apply a health state screening test to a human patient using a client device (patient device 112), by engaging the patient in an interactive spoken conversation and applying a composite model, which can combine language, acoustic, and visual models, to a captured audiovisual signal of the patient engaged in the dialogue.
- the composite model can analyze, in real time, the audiovisual signal of the patient (i) to make the conversation more engaging for the patient and (ii) to estimate the patient’s health.
- Appendix A illustrates an exemplary implementation that includes Calendaring, SMS, Dialog, Calling and User
- Health screening server 102 may encourage honesty of the patient in a number of ways.
- the spoken conversation may give the patient less time to compose a disingenuous response to a question than to simply respond honestly to it.
- the conversation may feel, to the patient, more spontaneous and personal and may be less annoying to the patient than a generic questionnaire. Accordingly, the spoken conversation may not induce or exacerbate resentment in the patient for having to answer a questionnaire for the benefit of a doctor or other clinician.
- the spoken conversation may be dynamic to be responsive to the patient, reducing the patient’s annoyance with the screening test and, in some situations, shortening the screening test.
- the screening test as administered by health screening server 102 additionally may rely on non-verbal aspects of the conversation in addition to the verbal content of the conversation to assess depression in the patient. In effect, what is said may not be as reliably accurate in assessing depression as also considering how it is said or how the person appears when saying it.
- health screening system 100 may include a health screening server 102, a call center system 104, a clinical data server 106, a social data server 108, a patient device 112, and a clinician device 114 that are connected to one another through a wide area network (WAN) 110, which is the Internet in this illustrative embodiment.
- patient device 112 may also be reachable by call center system 104 through a public-switched telephone network (PSTN) 120.
- Health screening server 102 may be a server computer system that administers the health screening test with the patient through patient device 112 and combines a number of language, acoustic, and visual models to produce results 1220 ( Figure 12), using clinical data retrieved from clinical data server 106, social data retrieved from social data server 108, and patient data collected from past screenings to train the models of runtime model server 304 ( Figure 12).
- Clinical data server 106 ( Figure 1) may be a server computer system that makes clinical data of the patient, including diagnoses, medication information, etc., available, e.g., to health screening server 102, in a manner that is compliant with HIPAA (Health Insurance Portability and Accountability Act of 1996).
- Social data server 108 may be a server computer system that makes social data of the patient, including social media posts, online purchases, searches, etc., available, e.g., to health screening server 102.
- Clinician device 114 may be a client device that receives data representing results of the screening regarding the patient’s health from health screening server 102.
- Health screening server 102 is shown in greater detail in Figure 2 and in even greater detail below in Figure 22. As shown in Figure 2, health screening server 102 may include an interactive health screening logic 202 and health care management logic 208. In addition, health screening server 102 may include screening system data store 210 and model repository 216.
- interactive health screening logic 202 can conduct an interactive conversation with the subject patient and estimate one or more health states of the patient by application of the models of runtime model server 304 ( Figure 3, Figure 12) to audiovisual signals representing responses by the patient.
- Health care management logic 208 can make expert recommendations in response to health state estimations of interactive health screening logic 202.
- Screening system data store 210 can store and maintain all user and patient data needed for, and collected by, screening in the manner described herein.
- health screening server 102 can be distributed across multiple computer systems.
- the real-time, interactive behavior of health screening server 102 (e.g., interactive screening server logic 302 and runtime model server logic 304 described below) can be implemented in one or more servers configured to handle large amounts of traffic through WAN 110 ( Figure 1), while the computationally intensive behavior of health screening server 102 (e.g., health care management logic 208 and model training logic 306 described below) can be implemented in other servers.
- the various loads carried by health screening server 102 can be distributed among multiple computer systems using conventional techniques.
- Interactive health screening logic 202 is shown in greater detail in Figure 3.
- Interactive health screening logic 202 may include interactive screening server logic 302, runtime model server logic 304, and model training logic 306.
- Interactive screening server logic 302 can conduct an interactive screening conversation with the human patient;
- runtime model server logic 304 can use and adjust a number of machine learning models to concurrently evaluate responsive audiovisual signals of the patient;
- model training logic 306 can train models of runtime model server logic 304 before such models are used during inference.
- Interactive screening server logic 302 is shown in greater detail in Figure 4 and can include generalized dialogue flow logic 402 and input/output (I/O) logic 404.
- I/O logic 404 can send audiovisual signals to, and receive audiovisual signals from, patient device 112.
- I/O logic 404 can receive data from generalized dialogue flow logic 402 that specifies questions to be asked of the patient and send audiovisual data representing those questions to patient device 112.
- I/O logic 404 (i) can send an audiovisual signal to patient device 112 via a human operator in the call center 104 (or alternatively by sending data to a backend automated dialog system destined for patients) and (ii) receive an audiovisual signal from patient device 112 via the human operator in the call center 104.
- I/O logic 404 can also send at least portions of the received audiovisual signal of the interactive screening conversation to runtime model server logic 304 ( Figure 12) and model training logic 306 ( Figure 13).
- I/O logic 404 can also receive results 1220 ( Figure 12) from runtime model server logic 304 that represent evaluation of the audiovisual signal.
- Generalized dialogue flow logic 402 can conduct the interactive screening conversation with the human patient.
- Generalized dialogue flow logic 402 can determine what questions I/O logic 404 should ask of the patient and monitor the reaction of the patient as represented in results 1220.
- generalized dialogue flow logic 402 can determine when to politely conclude the interactive screening conversation.
- Generalized dialogue flow logic 402 is shown in greater detail in Figure 5.
- Generalized dialogue flow logic 402 may include interaction control logic generator 502.
- Interaction control logic generator 502 can manage the interactive screening conversation with the patient by sending data representing dialogue actions to I/O logic 404 ( Figure 4) that direct the behavior of I/O logic 404 in carrying out the interactive screening conversation.
- Examples of dialogue actions include asking a question of the patient, repeating the question, instructing the patient, politely concluding the conversation, changing aspects of a display of patient device 112, and modifying characteristics of the speech presented to the patient by I/O logic 404, i.e., pace, volume, apparent gender of the voice, etc.
- Interaction control logic generator 502 can customize the dialogue actions for the patient.
- Interaction control logic generator 502 can receive data from screening data store 210 that represents subjective preferences of the patient and a clinical and social history of the patient.
- the subjective preferences are explicitly specified by the patient, generally prior to any interactive screening conversation, and include such things as the particular voice to be presented to the patient through I/O logic 404, default volume and pace of the speech generated by I/O logic 404, and display schemes to be used within patient device 112.
- the clinical and social history of the patient in combination with identified interests of the patient, can indicate that questions related to certain topics should be asked of the patient.
- Interaction control logic generator 502 can use the patient’s preferences and medical history to set attributes of the questions to ask the patient.
- Interaction control logic generator 502 can also receive data from runtime model server logic 304 that represents analytical results of responses of the patient in the current screening conversation.
- interaction control logic generator 502 can receive data representing analytical results of responses, i.e., results 1220 ( Figure 12) of runtime model server logic 304 and patient and results metadata from descriptive model and analytics 1212 that facilitate proper interpretation of the analytical results.
- Interaction control logic generator 502 can interpret the analytical results in the context of the results metadata to determine the patient’s current status.
- History and state machine 520 can track the progress of the screening conversation, i.e., which questions have been asked and which questions are yet to be asked.
- Question and dialogue action bank 510 may be a data store that stores all dialogue actions that can be taken by interaction control logic generator 502, including all questions that can be asked of the patient.
- history and state machine 520 can inform question and dialogue action bank 510 as to which question is to be asked next in the screening conversation.
- Interaction control logic generator 502 can also receive data representing the current state of the conversation and what questions are queued to be asked from history and state machine 520. Interaction control logic generator 502 can process the received data to determine the next action to be taken by interactive screening server logic 302 in furtherance of the screening conversation. Once the next action is determined, interaction control logic generator 502 can retrieve data representing the action from question and dialogue action bank 510 and send a request to I/O logic 404 to perform the next action.
- generalized dialogue flow logic 402 can select a question or other dialogue action to initiate the conversation with the patient.
- Interaction control logic generator 502 can receive data from history and state machine 520 that indicates that the current screening conversation is in its initial state.
- Interaction control logic generator 502 can also receive data that indicates (i) subjective preferences of the patient and (ii) topics of relatively high pertinence to the patient. Given that information, interaction control logic generator 502 can select the initial dialogue action with which to initiate the screening conversation.
- Examples of the initial dialogue action include (i) asking a common conversation-starting question such as "can you hear me?" or "are you ready to begin?"; (ii) asking a question from a predetermined script used for all patients; (iii) reminding the patient of a topic discussed in a previous screening conversation with the patient and asking the patient a follow-up question on that topic; or (iv) presenting the patient with a number of topics from which to select using a user-interface on patient device 112.
- interaction control logic generator 502 can cause I/O logic 404 to carry out the initial dialogue action.
- Loop operation 604 and next operation 616 define a loop in which generalized dialogue flow logic 402 can conduct the screening conversation according to steps 606-614 until generalized dialogue flow logic 402 determines that the screening conversation is completed.
- interaction control logic generator 502 can cause I/O logic 404 to carry out the selected dialogue action.
- the dialogue action is selected in operation 602.
- the dialogue action is selected in operation 614 as described below.
- generalized dialogue flow logic 402 can receive an audiovisual signal of the patient’s response to the question. While processing according to logic flow diagram 600 is shown in a manner that suggests synchronous processing, generalized dialogue flow logic 402 can perform operation 608 effectively continuously during performance of operations 602-616 and process the conversation asynchronously. The same is true for operations 610-614.
- I/O logic 404 can send the audiovisual signal received in operation 608 to runtime model server logic 304, which can process the audiovisual signal in a manner described below.
- I/O logic 404 of generalized dialogue flow logic 402 can receive multiplexed data from runtime model server logic 304 and produce therefrom an intermediate score for the screening conversation so far.
- the results data may include analytical results data and results metadata.
- I/O logic 404 (i) can determine to what degree the screening conversation has completed screening for the target health state(s) of the patient, (ii) identify any topics in the patient’s response that warrant follow-up questions, and (iii) identify any explicit instructions from the patient for modifying the screening conversation. Examples of the latter include patient statements such as "can you speak louder?", "can you repeat that?" or "what?", and "please speak more slowly."
- generalized dialogue flow logic 402 can select the next question to ask the subject, along with other dialogue actions to be performed by I/O logic 404, in the next iteration of operation 606.
- interaction control logic generator 502 (i) can receive dialogue state data from history and state machine 520 regarding the question to be asked next, (ii) receive intermediate results data from I/O logic 404 representing evaluation of the patient’s health state so far, and (iii) receive patient preferences and pertinent topics.
- Generalized dialogue flow logic 402 can repeat the loop of operations 604-616 until interaction control logic generator 502 determines that the screening conversation is complete, at which point generalized dialogue flow logic 402 can politely terminate the screening conversation.
- the screening conversation may be complete when (i) all mandatory questions have been asked and answered by the patient and (ii) the measure of confidence in the score resulting from the screening, determined in operation 612, is at least a predetermined threshold. It should be noted that confidence in the screening is not symmetrical. The screening conversation seeks to detect specific health states in the patient, e.g., depression and anxiety. If such states are detected early, the detection stands; however, failing to detect them early does not assure their absence. Thus, generalized dialogue flow logic 402 may find confidence in early detection but not in early failure to detect.
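A minimal sketch of the completion test just described, assuming a confidence value in [0, 1] and a hypothetical threshold; neither the threshold value nor these names come from the disclosure.

```python
def conversation_complete(mandatory_questions_remaining: int,
                          score_confidence: float,
                          confidence_threshold: float = 0.9) -> bool:
    """Screening ends only when every mandatory question has been asked and
    answered AND confidence in the intermediate score reaches the threshold.

    Note the asymmetry described above: confidence can rise quickly when a
    target state (e.g., depression, anxiety) is detected, but failing to
    detect it early does not by itself make its absence confident.
    """
    return (mandatory_questions_remaining == 0
            and score_confidence >= confidence_threshold)
```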
- health screening server 102 ( Figure 2) can estimate the current health state, e.g., mood, of the patient using a spoken conversation with the patient through patient device 112.
- Interactive screening server logic 302 can send data representing the resulting screening of the patient to the patient’s doctor or other clinicians by sending the data to clinician device 114.
- interactive screening server logic 302 can record the resulting screening in screening system data store 210.
- Generalized dialogue flow logic 402 may elicit speech from the patient that is highly informative with respect to the health state attributes for which health screening server 102 screens the patient.
- health screening server 102 can screen patients for depression and anxiety.
- the analysis performed by runtime model server logic 304 may be most accurate when presented with patient speech of a particular quality.
- high quality speech may be genuine and sincere, while poor quality speech may be from a patient who is not engaged in the conversation or being intentionally dishonest. For example, if the patient does not care about the accuracy of the screening but instead wants to answer all questions as quickly as possible to end the screening as quickly as possible, the screening may be unlikely to reveal much about the patient’s true health. Similarly, if the patient intends to control the outcome of the screening by giving false responses, not only are the responses linguistically false but the emotional components of the speech may be distorted or missing due to the disingenuous participation by the patient.
- generalized dialogue flow logic 402 can increase the likelihood that the patient’s responses are relatively highly informative.
- generalized dialogue flow logic 402 can invite the patient to engage interactive screening server logic 302 as an audio diary whenever the patient is so inclined. Voluntary speech by the patient whenever motivated may tend to be genuine and sincere and therefore highly informative.
- Generalized dialogue flow logic 402 can also select topics that are pertinent to the patient. These topics can include topics specific to clinical and social records of the patient and topics specific to interests of the patient.
- topic of interest to the patient can have the negative effect of influencing the patient’s mood. For example, asking the patient about her favorite sports team can cause the patient’s mood to rise or fall with the most recent news of the team. Accordingly, generalized dialogue flow logic 402 can distinguish health-relevant topics of interest to the patient from health-irrelevant topics of interest to the patient. For example, questions related to an estranged relative of the patient can be health-relevant while questions related to the patient’s favorite television series are typically not.
- when patient device 112 displays a video representation of a speaker, i.e., an avatar, to the patient, patient preferences may include, in addition to the preferred voice, physical attributes of the appearance of the avatar.
- generalized dialogue flow logic 402 can use a synthetic voice and avatar chosen for the first screening conversation and, in subsequent screening conversations, change the synthetic voice and avatar and compare the degree of informativeness of the patient’s responses to determine which voice and avatar elicit the most informative responses.
- the voice and avatar chosen for the initial screening conversation can be chosen according to which voice and avatar tends to elicit the most informative speech among the general population or among portions of the general population sharing one or more phenotypes with the patient. The manner in which the informativeness of responses elicited by a question is determined is described below.
- generalized dialogue flow logic 402 can insert a synthetic backchannel in the conversation. For example, generalized dialogue flow logic 402 can utter "uh-huh" during short pauses in the patient’s speech to indicate that generalized dialogue flow logic 402 is listening and interested in what the patient has to say. Similarly, generalized dialogue flow logic 402 can cause the video avatar to exhibit non-verbal behavior (sometimes referred to as "body language") to indicate attentiveness and interest in the patient.
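As a rough sketch of the backchannel timing, the fragment below emits a short acknowledgement only during brief pauses in the patient's speech; the pause bounds and function name are assumptions, not values from the disclosure.

```python
def should_backchannel(silence_ms: float,
                       min_pause_ms: float = 400.0,
                       max_pause_ms: float = 1200.0) -> bool:
    """Return True when the current pause is long enough to acknowledge
    ("uh-huh") but short enough that the patient likely still holds the floor."""
    return min_pause_ms <= silence_ms <= max_pause_ms
```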
- Generalized dialogue flow logic 402 can also select questions that are of high quality.
- the informativeness of a response to a question may be indicative of the quality of that question.
- generalized dialogue flow logic 402 can avoid repetition of identical questions in subsequent screening conversations, substituting equivalent questions when possible. The manner in which questions are determined to be equivalent to one another is described more completely below.
- question and adaptive action bank 510 ( Figure 5) is a data store that stores all dialogue actions that can be taken by interaction control logic generator 502, including all questions that can be asked of the patient.
- Question and adaptive action bank 510 is shown in greater detail in Figure 7.
- Question and adaptive action bank 510 is shown in greater detail in Figure 7 and may include a number of question records 702 and a dialogue 712.
- Each of question records 702 may include data representing a single question that can be asked of a patient.
- Dialogue 712 may be a series of questions to ask a patient in a spoken conversation with the patient.
- Each of question records 702 may include a question body 704, a topic 706, a quality 708, and an equivalence 710.
- Question body 704 may include data specifying the substantive content of the question, i.e., the sequence of words to be spoken to the patient to effect asking of the question.
- Topic 706 may include data specifying a hierarchical topic category to which the question belongs. Categories can correlate to (i) specific health diagnoses such as depression, anxiety, etc.; (ii) specific symptoms such as insomnia, lethargy, general disinterest, etc.; and/or (iii) aspects of a patient’s treatment such as medication, exercise, etc.
- Quality 708 may include data representing the quality of the question.
- Equivalence 710 may be data identifying one or more other questions in question records 702 that are equivalent to the question represented by this particular one of question records 702. In this illustrative embodiment, only questions of the same topic 706 can be considered equivalent. In an alternative embodiment, any questions can be considered equivalent regardless of classification.
- Dialogue 712 may include an ordered sequence of questions 714A-N, each of which identifies a respective one of question records 702 to ask in a spoken conversation with the patient.
- the spoken conversation begins with twenty (20) preselected questions and can include additional questions as necessary to produce the threshold degree of confidence to conclude the conversation of logic flow diagram 600 ( Figure 6).
- the preselected questions may include, in order, five (5) open-ended questions of high quality, the eight (8) questions of the standard and known PHQ-8 screening tool for depression, and the seven (7) questions of the standard and known GAD-7 screening tool for anxiety. Dialogue 712 specifies these twenty (20) questions in this illustrative embodiment.
- interaction control logic generator 502 can determine the next question to ask the patient in operation 614.
- One embodiment of operation 614 is shown as logic flow diagram 614 ( Figure 8).
- interaction control logic generator 502 can dequeue a question from dialogue 712, treating the ordered sequence of questions 714A-N as a queue. History and state machine 520 can keep track of which of questions 714A-N is next. If the screening conversation is not complete according to the intermediate score and all of questions 714A-N have been processed in previous performances of operation 802 in the same spoken conversation, i.e., if the question queue is empty, interaction control logic generator 502 can select questions from those of question records 702 with the highest quality 708 and pertaining to topics selected for the patient. If interaction control logic generator 502 selects multiple questions, interaction control logic generator 502 can select one as the dequeued question randomly with each question weighted by its quality 708 and its closeness to suggested topics.
- interaction control logic generator 502 can collect all equivalent questions identified by equivalence 710 ( Figure 7) for the question dequeued in operation 802.
- interaction control logic generator 502 can select a question from the collection of equivalent questions collected in operation 804, including the question dequeued in operation 802 itself.
- Interaction control logic generator 502 can select one of the equivalent questions randomly or using information about prior interactions with the patient, e.g., to select the one of the equivalent questions least recently asked of the patient.
- Interaction control logic generator 502 can process the selected question as the next question in the next iteration of the loop of operations 604-616 ( Figure 6).
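A condensed, hypothetical sketch of operations 802-806 follows. The data shapes (dictionaries with "id", "topic", "quality", and "equivalence" keys, plus a last-asked timestamp map) are assumptions made for illustration, and weighting by closeness to suggested topics is simplified to topic membership.

```python
import random
from typing import Dict, List, Set


def select_next_question(question_queue: List[dict],
                         question_bank: Dict[str, dict],
                         last_asked: Dict[str, float],
                         suggested_topics: Set[str]) -> dict:
    """Dequeue the next scripted question (operation 802), fall back to a
    quality-weighted random choice among on-topic questions when the queue
    is empty, gather its equivalents (operation 804), and prefer the
    equivalent least recently asked of this patient (operation 806)."""
    if question_queue:
        question = question_queue.pop(0)
    else:
        candidates = [q for q in question_bank.values()
                      if q["topic"] in suggested_topics] or list(question_bank.values())
        weights = [q["quality"] for q in candidates]
        question = random.choices(candidates, weights=weights, k=1)[0]

    equivalents = [question] + [question_bank[qid]
                                for qid in question.get("equivalence", [])]
    return min(equivalents, key=lambda q: last_asked.get(q["id"], 0.0))
```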
- the use of equivalent questions may be important.
- the quality of a question, i.e., the degree to which responses the question elicits are informative in runtime model server logic 304, decreases for a given patient over time. In other words, if a given question is asked of a given patient repeatedly, each successive response by the patient is less informative than it was in all prior askings of the question. In a sense, questions become stale over time.
- a given question can be replaced with an equivalent, but different, question in a subsequent conversation.
- Question management logic 716 may include question quality logic 902, which measures a question’s quality, and question equivalence logic 904, which determines whether two (2) questions are equivalent in the context of health screening server 102.
- Question quality logic 902 may include a number of metric records 906 and metric aggregation logic 912. To measure the quality of a question, i.e., measure the informativeness of a response elicited by the question, question quality logic 902 can use a number of metrics to be applied to a question, each of which may result in a numeric quality score for the question and each of which is represented by one of metric records 906.
- Each of metric records 906 may represent a single metric for measuring question quality and includes metric metadata 908 and quantification logic 910.
- Metric metadata 908 may represent information about the metric of metric record 906.
- Quantification logic 910 may define the behavior of question quality logic 902 in evaluating a question’s quality according to the metric of metric record 906.
- quantification logic 910 can retrieve all responses to a given question from screening system data store 210 ( Figure 2) and use associated results data from screening system data store 210 to determine the number of words in each of the responses. Quantification logic 910 can quantify the quality of the question as a statistical measure of the number of words in the responses, e.g., a statistical mean thereof.
- the duration of elicited responses can be measured in a number of ways. In one, the duration of the elicited response is simply the elapsed duration, i.e., the entire duration of the response as recorded in screening system data store 210. In another, the duration of the elicited response is the elapsed duration less pauses in speech. In yet another, the duration of the elicited response is the elapsed duration less any pause in speech at the end of the response.
- quantification logic 910 can retrieve all responses to a given question from screening system data store 210 ( Figure 2) and determines the duration of those responses. Quantification logic 910 ( Figure 9) can quantify the quality of the question as a statistical measure of the duration of the responses, e.g., a statistical mean thereof.
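The two response-length metrics just described reduce to simple statistics over stored responses. A minimal sketch, assuming responses are available as transcripts and as durations in seconds (the function and parameter names are hypothetical):

```python
import statistics
from typing import List, Optional


def word_count_quality(transcripts: List[str]) -> float:
    """Question quality as the mean number of words across elicited responses."""
    return statistics.mean(len(t.split()) for t in transcripts)


def duration_quality(durations_sec: List[float],
                     pauses_sec: Optional[List[float]] = None) -> float:
    """Question quality as the mean response duration, optionally net of pauses."""
    if pauses_sec is None:
        return statistics.mean(durations_sec)
    return statistics.mean(d - p for d, p in zip(durations_sec, pauses_sec))
```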
- semantic models of NLP model 1206 can estimate a patient’s health state from positive and/or negative content of the patient’s speech.
- the semantic models can correlate individual words and phrases to specific health states the semantic models are designed to detect.
- quantification logic 910 can retrieve all responses to a given question from screening system data store 210 ( Figure 2) and use the semantic models to determine the correlation of each word of each response to one or more health states.
- An individual response’s weighted word score is the statistical mean of the correlations of the words in that response.
- Quantification logic 910 can quantify the quality of the question as a statistical measure of the weighted word scores of the responses, e.g., a statistical mean thereof.
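A sketch of the weighted-word-score metric, under the simplifying assumption that the semantic model can be queried as a word-to-correlation lookup; the lookup name and the zero default for unknown words are assumptions made for illustration.

```python
import statistics
from typing import Dict, List


def weighted_word_score_quality(transcripts: List[str],
                                word_correlation: Dict[str, float]) -> float:
    """Mean, over responses, of each response's mean per-word correlation
    to the target health state(s)."""
    per_response = []
    for transcript in transcripts:
        scores = [word_correlation.get(w.lower(), 0.0) for w in transcript.split()]
        per_response.append(statistics.mean(scores) if scores else 0.0)
    return statistics.mean(per_response) if per_response else 0.0
```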
- runtime model server logic 304 ( Figure 12) can estimate a patient’s health state from pitch and energy of the patient’s speech as described below. How informative speech is to the various models of runtime model server logic 304 may be related to how emotional the speech is.
- quantification logic 910 can retrieve all responses to a given question from screening system data store 210 ( Figure 2) and use response data from runtime model server logic 304 to determine an amount of energy present in each response.
- Quantification logic 910 can quantify the quality of the question as a statistical measure of the measured acoustic energy of the responses, e.g., a statistical mean thereof.
- the quality of a question is a measure of how similar responses to the question are to responses recognized by runtime models 1202 ( Figure 12) as highly indicative of a health state that runtime models 1202 are trained to recognize.
- quantification logic 910 can determine how similar deep learning machine features for all responses to a given question are to deep learning machine features for health screening server 102 as a whole.
- consider, as an illustration, a manually configured learning machine for recognizing cats in images: an initial layer is manually configured to recognize edges in the image data; a subsequent layer receives data representing the recognized edges and is manually configured to recognize edges that join together to define shapes.
- a final layer receives data representing shapes and is manually configured to recognize a symmetrical grouping of triangles (cat’s ears) and dark regions (eyes and nose). Other layers can be used between those mentioned here.
- the data received as input to any operation in the computation are called features.
- the results of the learning machine are called labels.
- the labels are "cat" and "no cat".
- This manually configured learning machine may work reasonably well but can have significant shortcomings. For example, recognizing the symmetrical grouping of shapes might not recognize an image in which a cat is represented in profile. In a deep learning machine, the machine is trained to recognize cats without manually specifying what groups of shapes represent a cat. The deep learning machine can utilize manually configured features to recognize edges, shapes, and groups of shapes, however these are not a required component of a deep learning system. Features in a deep learning system can be learned entirely automatically by the algorithm based on the labeled training data alone.
- Training a deep learning machine to recognize cats in image data can, for example, involve presenting the deep learning machine with numerous, preferably many millions of, images and associated knowledge as to whether each image includes a cat, i.e., associated labels of "cat" or "no cat".
- the last, automatically configured layer of the deep learning machine receives data representing numerous groupings of shapes and the associated label of "cat" or "no cat".
- the deep learning machine determines statistical weights to be given each type of shape grouping, i.e., each feature, in determining whether a previously unseen image includes a cat.
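To make the feature/label/weight vocabulary concrete, the toy example below fits per-feature statistical weights from labeled examples using a single logistic unit. It is far simpler than the deep learning machines described here and is offered only as an illustration of weights being learned from labeled data, not as the disclosed architecture.

```python
import math
from typing import List, Tuple


def train_feature_weights(examples: List[List[float]],
                          labels: List[int],
                          epochs: int = 100,
                          lr: float = 0.1) -> Tuple[List[float], float]:
    """Fit weights for a single logistic unit: labels are 1 ("cat") or 0 ("no cat")."""
    n_features = len(examples[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))          # predicted probability of "cat"
            for i in range(n_features):
                weights[i] += lr * (y - p) * x[i]   # gradient step on log-likelihood
            bias += lr * (y - p)
    return weights, bias
```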
- the features of the constituent models of runtime model server logic 304 ( Figure 12) specify precisely the type of responses that indicate a health state that the constituent models of runtime model server logic 304 are configured to recognize.
- these features represent an exemplary feature set.
- quantification logic 910 can retrieve all responses to the question from screening system data store 210, along with data representing the diagnoses associated with those responses, and can train runtime models 1202 and model repository 216 using those responses and associated data.
- the deep learning machine can develop a set of features specific to the question being measured and the determinations to be made by the trained models.
- Quantification logic 910 can measure similarity between the feature set specific to the question and the exemplary feature set in a manner described below with respect to question equivalence logic 904.
- interaction control logic generator 502 can use quality 708 (Figure 7) of various questions in determining which question(s) to ask a particular patient.
- quality 708 ( Figure 7) can be determined by metric aggregation logic 912 ( Figure 9) in the manner illustrated by logic flow diagram 1000 ( Figure 10).
- Loop operation 1002 and next operation 1010 define a loop in which metric aggregation logic 912 processes each of metric records 906 according to operations 1004-1008.
- the particular one of metric records 906 processed in an iteration of the loop of operations 1002-1010 is sometimes referred to as "the subject metric record", and the metric represented by the subject metric record is sometimes referred to as "the subject metric."
- metric aggregation logic 912 can evaluate the subject metric, using quantification logic 910 of the subject metric record and all responses in screening system data store 210 ( Figure 2) to the subject question.
- metric aggregation logic 912 can determine whether screening system data store 210 includes a statistically significant sample of responses to the subject question by the subject patient. If so, metric aggregation logic 912 can evaluate the subject metric using quantification logic 910 and only data corresponding to the subject patient in screening system data store 210 in operation 1008. Conversely, if screening system data store 210 does not include a statistically significant sample of responses to the subject question by the subject patient, metric aggregation logic 912 can skip operation 1008. Thus, metric aggregation logic 912 can evaluate the quality of a question in the context of the subject to the extent screening system data store 210 contains sufficient data corresponding to the subject patient.
- metric metadata 908 can store data specifying how metric aggregation logic 912 is to include the associated metric in the aggregate measure in step 1012.
- metric metadata 908 can specify a weight to be attributed to the associated metric relative to other metrics.
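A compact sketch of the aggregation in logic flow diagram 1000, assuming each metric record exposes a quantification callable and a metadata weight; the equal blending of population-wide and patient-specific scores and the sample-size cutoff are assumptions, not values from the disclosure.

```python
from typing import Callable, Dict, List


def aggregate_question_quality(metric_records: List[Dict],
                               responses_all: List[str],
                               responses_patient: List[str],
                               min_patient_samples: int = 30) -> float:
    """Weighted combination of per-metric quality scores (operations 1004-1012)."""
    total, weight_sum = 0.0, 0.0
    for record in metric_records:
        quantify: Callable[[List[str]], float] = record["quantify"]
        score = quantify(responses_all)                      # operation 1004
        if len(responses_patient) >= min_patient_samples:    # operation 1006
            # operation 1008: re-evaluate in the context of this patient
            score = 0.5 * score + 0.5 * quantify(responses_patient)
        weight = record.get("metadata", {}).get("weight", 1.0)
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```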
- equivalence 710 for a given question can identify one or more other questions in question records 702 that are equivalent to the given question. Whether two questions are equivalent can be determined by question equivalence logic 904 ( Figure 9) by comparing similarity between the two questions to a predetermined threshold. The similarity here is not how similar the words and phrasing of the sentences are but instead how similarly the models of runtime model server 304 and model repository 216 see them. The predetermined threshold can be determined empirically.
- Question equivalence logic 904 can measure the similarity between two questions in a manner illustrated by logic flow diagram 1100 ( Figure 11).
- Operation 1102 and operation 1106 define a loop in which question equivalence logic 904 can process each of metric records 906 according to operation 1104.
- the particular one of metric records 906 processed in an iteration of operations 1102-1106 is sometimes referred to as "the subject metric record", and the metric represented by the subject metric record is sometimes referred to as "the subject metric."
- question equivalence logic 904 can evaluate the subject metric for each of the two questions. Once all metrics have been processed according to operations 1102-1106, processing by question equivalence logic 904 transfers to step 1108.
- question equivalence logic 904 can combine the evaluated metrics for each question into a respective multi-dimensional vector for each question.
- question equivalence logic 904 can normalize both vectors to have a length of 1.0.
- question equivalence logic 904 can determine an angle between the two normalized vectors.
- the cosine of the angle can be determined by question equivalence logic 904 to be the measured similarity between the two questions.
- the similarity between two questions ranges from -1.0 to 1.0, 1.0 being perfectly equivalent.
- the predetermined threshold is 0.98, such that two questions that have a measured similarity of at least 0.98 are considered equivalent and are so represented in equivalence 710 ( Figure 7) for both questions.
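The equivalence test reduces to cosine similarity over the questions' metric vectors. A minimal sketch using the 0.98 threshold stated above (function and parameter names are hypothetical):

```python
import math
from typing import List


def questions_equivalent(metrics_a: List[float],
                         metrics_b: List[float],
                         threshold: float = 0.98) -> bool:
    """Normalize both metric vectors to unit length and compare the cosine
    of the angle between them to the equivalence threshold."""
    norm_a = math.sqrt(sum(x * x for x in metrics_a))
    norm_b = math.sqrt(sum(x * x for x in metrics_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return False
    cosine = sum(a * b for a, b in zip(metrics_a, metrics_b)) / (norm_a * norm_b)
    return cosine >= threshold
```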
- Runtime model server logic 304 can process audiovisual signals representing the patient’s responses in the interactive screening conversation and, while the conversation is ongoing, estimate the current health of the patient from the audiovisual signals.
- ASR logic 1204 may be logic that processes speech represented in the audiovisual data from I/O logic 404 ( Figure 4) to identify words spoken in the audiovisual signal.
- the results of ASR logic 1204 ( Figure 12) can be sent to runtime models 1202.
- Runtime models 1202 can also receive the audiovisual signals directly from I/O logic 404.
- runtime models 1202 can combine language, acoustic, and visual models to produce results 1220 from the received audiovisual signal.
- interactive screening server logic 302 can use results 1220 in real time as described above to estimate the current state of the subject patient and to accordingly make the spoken conversation responsive to the subject patient as described above.
- ASR logic 1204 can also identify where in the audiovisual signal each word appears and a degree of confidence in the accuracy of each identified word in this illustrative embodiment.
- ASR logic 1204 can also identify non-verbal content of the audiovisual signals, such as laughter and fillers for example, along with location and confidence information. ASR logic 1204 can make such information available to runtime models 1202.
- Runtime models 1202 may include descriptive model and analytics 1212, natural language processing (NLP) model 1206, acoustic model 1208, and visual model 1210.
- NLP model 1206 may include a number of text-based machine learning models to (i) predict depression, anxiety, and other health states directly from the words spoken by the patient and (ii) model factors that correlate with such health states. Examples of machine learning that models health states directly include sentiment analysis, semantic analysis, language modeling, word/document embeddings and clustering, topic modeling, discourse analysis, syntactic analysis, and dialogue analysis. NLP model 1206 can store text metadata and modeling dynamics and share that data with acoustic model 1208, visual model 1210, and descriptive model and analytics 1212. Text data can be received directly from ASR logic 1204 as described above.
- Text metadata can include, for example, data identifying, for each word or phrase, parts of speech (syntactic analysis), sentiment analysis, semantic analysis, topic analysis, etc.
- Modeling dynamics may include data representing components of constituent models of NLP model 1206. Such components include machine learning features of NLP model 1206 and other components such as long short-term memory (LSTM) units, gated recurrent units (GRUs), hidden Markov model (HMM), and sequence-to-sequence (seq2seq) translation information.
- the NLP metadata allows acoustic model 1208, visual model 1210, and descriptive model and analytics 1212 to correlate syntactic, sentimental, semantic, and topic information to corresponding portions of the audiovisual signal. Accordingly, acoustic model 1208, visual model 1210, and descriptive model and analytics 1212 can more accurately model the audiovisual signal.
- Runtime models 1202 may also include acoustic model 1208, which can analyze the audio portion of the audiovisual signal to find patterns associated with various health states, e.g., depression. Associations between acoustic patterns in speech and many health conditions tend to be independent of the particular language spoken. Accordingly, acoustic model 1208 can analyze the audiovisual signal in a language-agnostic fashion.
- acoustic model 1208 can use machine learning approaches such as convolutional neural networks (CNN), long short-term memory (LSTM) units, hidden Markov models (HMM), etc. for learning high-level representations and for modeling the temporal dynamics of the audiovisual signals.
- Acoustic model 1208 can store data representing attributes of the audiovisual signal and machine learning features of acoustic model 1208 as acoustic model metadata and share that data with NLP model 1206, visual model 1210, and descriptive model and analytics 1212.
- the acoustic model metadata can include, for example, data representing a spectrogram of the audiovisual signal of the patient’s response.
- the acoustic model metadata can include both basic features and high-level feature representations of machine learning features. More basic features can include Mel-frequency cepstral coefficients (MFCCs), perceptual linear prediction (PLP) features, and filter banks, for example, of acoustic model 1208.
- High-level feature representations can include, for example, feature representations generated by machine learning components of acoustic model 1208, such as the CNNs and LSTM units noted above.
- the acoustic model metadata allows NLP model 1206 to, for example, use acoustic analysis of the audiovisual signal to improve sentiment analysis of words and phrases.
- the acoustic model metadata may allow visual model 1210 and descriptive model and analytics 1212 to, for example, use acoustic analysis of the audiovisual signal to more accurately model the audiovisual signal.
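The disclosure does not prescribe a particular toolkit, but the kinds of basic acoustic features named above (spectrogram, MFCCs, filter banks) can be extracted with an off-the-shelf library such as librosa; the sample rate and parameter values below are common defaults, not values from the disclosure.

```python
import librosa
import numpy as np


def basic_acoustic_features(wav_path: str) -> dict:
    """Extract a magnitude spectrogram, MFCCs, and mel filter-bank energies
    from a recorded response; parameter choices are illustrative defaults."""
    signal, sr = librosa.load(wav_path, sr=16000)
    return {
        "spectrogram": np.abs(librosa.stft(signal)),
        "mfcc": librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13),
        "filter_banks": librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=40),
    }
```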
- Runtime model server logic 304 may also include visual model 1210, which can infer various health states of the patient from face and gaze behaviors.
- Visual model 1210 can include facial cue modeling, eye/gaze modeling, pose tracking and modeling, etc. These are merely examples.
- Visual model 1210 can store data representing attributes of the audiovisual signal and machine learning features of visual model 1210 as visual model metadata and share that data with NLP model 1206, acoustic model 1208, and descriptive model and analytics 1212.
- the visual model metadata can include data representing face locations, pose tracking information, and gaze tracking information of the audiovisual signal of the patient’s response.
- the visual model metadata can include both basic features and high-level feature representations of machine learning features. More basic features can include image processing features of visual model 1210. High-level feature representations can include, for example, feature representations generated by CNNs, autoencoders, variational autoencoders, deep neural networks, and support vector machines of visual model 1210.
- the visual model metadata may allow descriptive model and analytics 1212 to, for example, use video analysis of the audiovisual signal to improve sentiment analysis of words and phrases. Descriptive model and analytics 1212 can even use the visual model metadata in combination with the acoustic model metadata to estimate the veracity of the patient in speaking words and phrases for more accurate sentiment analysis.
- the visual model metadata may allow acoustic model 1208 to, for example, use video analysis of the audiovisual signal to better interpret acoustic signals associated with various gazes, poses, and gestures represented in the video portion of the audiovisual signal.
- Descriptive model and analytics 1212 may include machine learning models that generate analytics and labels for numerous health states, not just depression. Examples of such labels include emotion, anxiety, how engaged the patient is, patient energy, sentiment, speech rate, and dialogue topics.
- descriptive model and analytics 1212 can apply these labels to each word of the patient’s response and determine how significant each word is in the patient’s response. While the significance of any given word in a spoken response can be inferred from the part of speech, e.g., articles and filler words as relatively insignificant, descriptive model and analytics 1212 can infer a word’s significance from additional qualities of the word, such as emotion in the manner in which the word is spoken as indicated by acoustic model 1208.
- Descriptive model and analytics 1212 can also analyze trends over time and use such trends, at least in part, to normalize analysis of the patient’s responses. For example, a given patient may typically speak with less energy than others. Normalizing analysis for this patient may set a lower level of energy as "normal" than would be used for the general population. In addition, a given patient may use certain words more frequently than the general population and use of such words by this patient might not be as notable as such use would be by a different patient. Descriptive model and analytics 1212 can analyze trends in real-time, i.e., while a screening conversation is ongoing, and in non-real-time contexts.
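As a sketch of the per-patient normalization just described, the helper below expresses a response-level measurement (e.g., acoustic energy) relative to the patient's own baseline when enough history exists, falling back to population statistics otherwise; the z-score form and the history-length cutoff are assumptions made for illustration.

```python
import statistics
from typing import List


def normalize_measurement(value: float,
                          patient_history: List[float],
                          population_mean: float,
                          population_std: float,
                          min_history: int = 10) -> float:
    """Express a measurement relative to the patient's baseline when
    sufficient history is available, otherwise relative to the population."""
    if len(patient_history) >= min_history:
        baseline = statistics.mean(patient_history)
        spread = statistics.pstdev(patient_history) or 1.0
        return (value - baseline) / spread
    return (value - population_mean) / (population_std or 1.0)
```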
- Descriptive model and analytics 1212 can store data representing the speech analysis and trend analysis described above, as well as metadata of constituent models of descriptive model and analytics 1212, as descriptive model metadata and share that data with NLP model 1206, acoustic model 1208, and visual model 1210.
- the descriptive model metadata may allow NLP model 1206, acoustic model 1208, and visual model 1210 to more accurately model the audiovisual signal.
- runtime model server logic 304 can estimate a health state of a patient using what the patient says, how the patient says it, and contemporaneous facial expressions, eye expressions, and poses in combination and store resulting data representing such estimation as results 1220. This may provide a particularly accurate and effective tool for estimating the patient’s health state.
- Runtime model server logic 304 can send results 1220 to I/O logic 404 (Figure 4) to enable interactive screening server logic 302 to respond to the patient's responses, thereby making the screening dialogue interactive in the manner described above.
- Runtime model server logic 304 ( Figure 12) can also send results 1220 to screening system data store 210 to be included in the history of the subject patient.
- Model training logic 306, shown in greater detail in Figure 13, can train the models used by runtime model server logic 304 (Figure 12).
- Model training logic 306 may include runtime models 1202 and ASR logic 1204 and train runtime models 1202. Model training logic 306 can send the trained models to model repository 216 to make runtime models 1202, as trained, available to runtime model server logic 304.
- screening system data store 210 can store and maintain all user and patient data needed for, and collected by, screening in the manner described herein.
- Screening system data store 210 may include data store logic 1402, label estimation logic 1404, and user and patient databases 1406.
- Data store logic 1402 controls access to user and patient databases 1406.
- data store logic 1402 can store audiovisual signals of patients' responses and provide patient clinical history data upon request. If the requested patient clinical history data is not available in user and patient databases 1406, data store logic 1402 can retrieve the patient clinical history data from clinical data server 106.
- data store logic 1402 can retrieve the patient social history data from social data server 108.
- Social data server 108 can include a wide variety of patient/subject data, including but not limited to retail purchasing records, legal records (including criminal records), and income history, as these can provide valuable insights into a person's health. In many instances, these social determinants of disease contribute more to a person's morbidity than medical care does.
- Appendix B depicts a“Health Policy Brief: The Relative Contributions of Multiple Determinants to Health Outcomes”.
- Label estimation logic 1404 may include logic that specifies labels for which the various learning machines of health screening server 102 screen. Label estimation logic 1404 may include a user interface through which human operators of health screening server 102 can configure and tune such labels.
- Label estimation logic 1404 can also control quality of model training by, inter alia, determining whether data stored in user and patient databases 1406 is of adequate quality for model training.
- Label estimation logic 1404 may include logic for automatically identifying or modifying labels. In particular, if model training reveals a significant data point that is not already identified as a label, label estimation logic 1404 can look for correlations between the data point and patient records, system predictions, and clinical insights to automatically assign a label to the data point.
- While interactive screening server logic 302 is described as conducting an interactive, spoken conversation with the patient to assess the health state of the patient, interactive screening server logic 302 can also act in a passive listening mode. In this passive listening mode, interactive screening server logic 302 passively listens to the patient speaking without directing questions to be asked of the patient.
- Passive listening mode, in this illustrative embodiment, has two (2) variants.
- In the "conversational" variant, the patient is engaged in a conversation with another person whose part of the conversation is not controlled by interactive screening server logic 302.
- Examples of conversational passive listening include a patient speaking with a clinician and a patient speaking during a telephone call reminding the patient of an appointment with a clinician or discussing medication with a pharmacist.
- In the "fly-on-the-wall" (FOTW) or "ambient" variant, the patient is speaking alone or in a public, or semi-public, place.
- Examples of ambient passive listening include people speaking in a public space or a hospital emergency room and a person speaking alone, e.g., in an audio diary or leaving a telephone message.
- One potentially useful scenario for screening a person speaking alone involves interactive screening server logic 302 screening calls to police emergency services (i.e.,“9-1-1”). Analysis of emergency service callers can distinguish truly urgent callers from less urgent callers.
- Practicing the techniques described herein should comply with legal requirements and limitations that may vary from jurisdiction to jurisdiction, including federal statutes, state laws, and/or local ordinances. For example, some jurisdictions may require explicit notice and/or consent of involved person(s) prior to capturing their speech. In addition, acquisition, storage, and retrieval of clinical records should be practiced in a manner that is in compliance with applicable jurisdictional requirement(s).
- Patient screening system 100B illustrates a passive listening variation of patient screening system 100 ( Figure 1).
- Patient screening system 100B may include health screening server 102, a clinical data server 106, and a social data server 108, which are as described above and, also as described above, connected to one another through WAN 110.
- listening devices 1512 and 1514 are smart speakers, such as the HomePod™ smart speaker available from Apple Computer of Cupertino, California, the Google Home™ smart speaker available from Google LLC of Mountain View, California, and the Amazon Echo™ available from Amazon.com, Inc. of Seattle, Washington.
- listening devices 1512 and 1514 can be other types of listening devices such as microphones coupled to clinician device 114B, for example.
- a single listening device 1514 is used and screening server 102 distinguishes between the patient and the clinician using conventional voice recognition techniques. Accuracy of such voice recognition can be improved by training screening server 102 to recognize the clinician’s voice prior to any session with a patient. While the following description refers to a clinician as speaking to the patient, it should be appreciated that the clinician can be replaced with another. For example, in a telephone call made to the patient by a health care office administrator, e.g., support staff for a clinician, the administrator takes on the clinician’s role as described in the context of conversational passive listening.
- Appendix C depicts an exemplary Question Rank for some of the embodiments in accordance with the present invention.
- Processing by interactive health screening logic 202, particularly generalized dialogue flow logic 402 (Figure 5), in conversational passive listening is illustrated by logic flow diagram 1600 (Figure 16).
- Operation 1602 and operation 1616 define a loop in which generalized dialogue flow logic 402 can process audiovisual signals of the conversation between the patient and the clinician according to operations 1604-1614. While operations 1604-1614 are shown as discrete, sequential operations, they can be performed concurrently with one another on an ongoing basis by generalized dialogue flow logic 402.
- the loop of operations 1602 -1616 can be initiated and terminated by the clinician using user interface techniques, e.g., using clinician device 114B ( Figure 15) or listening device 1514.
- generalized dialogue flow logic 402 can recognize a question to the patient posed by the clinician and send the question to runtime model server logic 304 for processing and analysis.
- Generalized dialogue flow logic 402 can receive results 1220 for the audiovisual signal of the clinician's question, and results 1220 (Figure 12) include a textual representation of the clinician's question from ASR logic 1204 along with additional information from descriptive model and analytics 1212. This additional information may include identification of the various parts of speech of the words in the clinician's question.
- generalized dialogue flow logic 402 can identify the most similar question in question and dialogue action bank 510 ( Figure 5). If the question recognized in operation 1604 is not identical to any questions stored in question and dialogue action bank 510, generalized dialogue flow logic 402 can identify the nearest question in the manner described above with respect to question equivalence logic 904 ( Figure 9) or can identify the question in question and dialogue action bank 510 ( Figure 5) that is most similar linguistically.
- generalized dialogue flow logic 402 can retrieve the quality of the nearest question from question and dialogue action bank 510, i.e., quality 708 ( Figure 7).
- generalized dialogue flow logic 402 can recognize an audiovisual signal representing the patient’s response to the question recognized in operation 1604.
- the patient’s response is recognized as a response of the patient immediately following the recognized question.
- the response can be recognized as the patient’s by (i) determining that the voice is captured more loudly by listening device 1512 than by listening device 1514 or (ii) determining that the voice is distinct from a voice previously established and recognized as the clinician’s.
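- A minimal sketch of criterion (i), comparing the loudness of the same utterance as captured by the two listening devices, is shown below. The frame inputs are assumed to be synchronized NumPy arrays of audio samples, and the 3 dB margin is an illustrative assumption rather than a value taken from the described system.

```python
import numpy as np

def rms(signal):
    """Root-mean-square amplitude of an audio frame (a NumPy array of samples)."""
    return float(np.sqrt(np.mean(np.square(signal.astype(np.float64)))))

def likely_patient_speech(frame_device_1512, frame_device_1514, margin_db=3.0):
    """Return True when the utterance is captured noticeably louder by the
    device nearest the patient (listening device 1512) than by the device
    nearest the clinician (listening device 1514)."""
    ratio_db = 20.0 * np.log10(
        max(rms(frame_device_1512), 1e-12) / max(rms(frame_device_1514), 1e-12)
    )
    return ratio_db >= margin_db
```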
- generalized dialogue flow logic 402 can send the patient’s response, along with the context of the clinician’s corresponding question, to runtime model server logic 304 for analysis and evaluation.
- the context of the clinician's question is important, particularly if the semantics of the patient's response are unclear in isolation. For example, consider that the patient's answer is simply "Yes." That response is analyzed and evaluated very differently in response to the question "Were you able to find parking?" versus in response to the question "Do you have thoughts of hurting yourself?"
- generalized dialogue flow logic 402 can report intermediate analysis received from results 1220 to the clinician.
- the report can be in the form of animated gauges indicating intermediate scores related to a number of health states. Examples of animated gauges include steam gauges, i.e., round dial gauges with a moving needle, and dynamic histograms such as those seen on audio equalizers in sound systems.
- health screening server 102 can screen patients for any of a number of health states passively during a conversation the patient would engage in regardless without requiring a separate, explicit screening interview of the patient.
- health screening server 102 can listen to and process ambient speech according to logic flow diagram 1700 (Figure 17), which illustrates processing by interactive health screening logic 202, particularly generalized dialogue flow logic 402 (Figure 5), in ambient passive listening. Operation 1702 and operation 1714 define a loop in which generalized dialogue flow logic 402 can process audiovisual signals of ambient speech according to operations 1704-1712. While operations 1704-1714 are shown as discrete, sequential operations, they can be performed concurrently with one another on an ongoing basis by generalized dialogue flow logic 402. The loop of operations 1702-1714 is initiated and terminated by a human operator of the listening device(s) involved, e.g., listening device 1514.
- generalized dialogue flow logic 402 can capture ambient speech.
- interactive screening server logic 302 can determine whether the speech captured in operation 1704 is spoken by a voice that is to be analyzed.
- many people likely to speak in such areas can be registered with health screening server 102 such that their voices can be recognized.
- In schools, students can have their voices registered with health screening server 102 at admission.
- the people whose voices are to be analyzed are admitted students that are recognized by generalized dialogue flow logic 402.
- In hospitals, hospital personnel can have their voices registered with health screening server 102 at hiring.
- patients in hospitals can register their voices at first contact, e.g., at an information desk or by hospital personnel in an emergency room.
- hospital personnel are excluded from analysis when recognized as the speaker by generalized dialogue flow logic 402.
- Health screening server 102 can also determine approximate positions of unknown speakers in environments with multiple listening devices, e.g., by triangulation using different relative amplitudes and/or relative timing of arrival of the captured speech at multiple listening devices.
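- A minimal sketch of the relative-timing part of such position estimation is shown below: cross-correlating synchronized, equal-length captures from two listening devices yields an arrival-time offset, whose magnitude converts to a path-length difference usable as one triangulation constraint. The function names and the assumption of sample-synchronized captures are illustrative, not details of the described system.

```python
import numpy as np

def arrival_offset_seconds(sig_a, sig_b, sample_rate):
    """Estimate the relative arrival-time offset between two synchronized,
    equal-length captures of the same utterance via cross-correlation.
    The magnitude is the time difference of arrival; the sign convention
    depends on which capture leads."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / float(sample_rate)

def path_difference_metres(sig_a, sig_b, sample_rate, speed_of_sound=343.0):
    """Convert the arrival-time difference into a path-length difference,
    one constraint usable for triangulating an unknown speaker's position."""
    return abs(arrival_offset_seconds(sig_a, sig_b, sample_rate)) * speed_of_sound
```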
- the speaker can be asked to identify herself.
- the identity of the speaker can be inferred or is not especially important.
- the speaker can be authenticated by the device or can be assumed to be used by the device’s owner.
- the identity of the caller may not be as important as the location of the speaker and qualities of the speaker’s voice such as emotion, energy, and the substantive content of the speaker’s speech.
- generalized dialogue flow logic 402 may always determine that the speaker is to be analyzed.
- generalized dialogue flow logic 402 can send the captured ambient speech to runtime model server logic 304 for processing and analysis for context.
- Generalized dialogue flow logic 402 can receive results 1220 for the audiovisual signal of the captured speech, and results 1220 (Figure 12) may include a textual representation of the captured speech from ASR logic 1204 along with additional information from descriptive model and analytics 1212. This additional information may include identification of the various parts of speech of the words in the captured speech.
- Generalized dialogue flow logic 402 can process results 1220 for the captured speech to establish a context.
- processing can transfer through operation 1714 to operation 1702 and passive listening according to the loop of operations 1702-1714 continues.
- If interactive screening server logic 302 determines that the speech captured in operation 1704 is spoken by a voice that is to be analyzed, processing can transfer to operation 1710.
- generalized dialogue flow logic 402 can send the captured speech, along with any context determined in prior yet contemporary performances of operation 1708 or operation 1710, to runtime model server logic 304 for analysis and evaluation.
- generalized dialogue flow logic 402 can process any alerts triggered by the resulting analysis from runtime model server logic 304 according to predetermined alert rules. These predetermined alert rules may be analogous to work-flows 1810 described below.
- these predetermined alert rules may be in the form of if-then-else logic that specifies logical states and corresponding actions to take in such states.
- Illustrative examples of alert rules that can be implemented by interactive screening server logic 302 follow.
- In a police emergency system call in which the caller, speaking initially to an automated triage system, is determined to be highly emotional and anxious and to semantically describe a highly urgent situation, e.g., a car accident with severe injuries, a very high priority can be assigned to the call so that it is taken ahead of less urgent callers.
- In a school hallway in which interactive screening server logic 302 recognizes frantic speech and screaming and semantic content describing the presence of a weapon and/or blatant acts of violence, interactive screening server logic 302 can trigger immediate notification of law enforcement and school personnel.
- interactive screening server logic 302 can record the analysis in the patient’s clinical records such that the patient’s behavioral health care provider can discuss the diary entry when the patient is next seen. In situations in which the triggering condition of the captured speech is particularly serious and urgent, interactive screening server logic 302 can report the location of the speaker if it can be determined.
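- A minimal sketch of how such if-then-else alert rules could be represented and processed is shown below. The result field names ("anxiety", "emotion_intensity", "topics", "location") and the dispatch action are hypothetical placeholders for whatever analysis fields and downstream systems are actually available; they are assumptions, not details taken from the described system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class AlertRule:
    """A predetermined alert rule: a boolean condition over analysis results
    and the action to take when the condition holds."""
    description: str
    condition: Callable[[Dict[str, Any]], bool]
    action: Callable[[Dict[str, Any]], None]

def triage_emergency_call(results: Dict[str, Any]) -> None:
    # Hypothetical action: raise the call's priority in a dispatch queue.
    print("Assigning high priority to call at", results.get("location"))

rules: List[AlertRule] = [
    AlertRule(
        description="Highly emotional, anxious caller describing an urgent situation",
        condition=lambda r: r.get("anxiety", 0.0) > 0.8
        and r.get("emotion_intensity", 0.0) > 0.8
        and "severe injury" in r.get("topics", []),
        action=triage_emergency_call,
    ),
]

def process_alerts(results: Dict[str, Any], rules: List[AlertRule]) -> None:
    """If-then-else processing: fire the action of every rule whose condition holds."""
    for rule in rules:
        if rule.condition(results):
            rule.action(results)
```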
- health screening server 102 can screen patients for any of a number of health states passively outside the confines of a one-to-one conversation with a health care professional.
- health care management logic 208 can make expert recommendations in response to health state analysis of interactive health screening logic 202. Health care management logic 208 is shown in greater detail in Figure 18.
- Health care management logic 208 may include manual work-flow management logic 1802, automatic work-flow generation logic 1804, work-flow execution logic 1806, and work-flow configuration 1808.
- Manual work-flow management logic 1802 implements a user interface through which a human administrator can create, modify, and delete work-flows 1810 of work-flow configuration 1808 by physical manipulation of one or more user input devices of a computer system used by the administrator.
- Automatic work-flow generation logic 1804 can perform statistical analysis of patient data stored within screening system data store 210 to identify work-flows that achieve predetermined goals. Examples of such goals include minimizing predicted costs for the next two (2) years of a patient's care and minimizing the cost of an initial referral while also maximizing a reduction in Hemoglobin A1c in one year.
- Work-flow execution logic 1806 can process work-flows 1810 of work-flow configuration 1808.
- Work-flow execution logic 1806 processes work-flows 1810 in response to receipt of final results of any screening according to logic flow diagram 600 (Figure 6), using those results in processing conditions of the work-flows.
- Work-flow configuration 1808 may include data representing a number of work-flows 1810.
- Each work-flow 1810 may include work-flow metadata 1812 and data representing a number of work-flow elements 1820.
- Work-flow metadata 1812 may be metadata of work-flow 1810 and may include data representing a description 1814, an author 1816, and a schedule 1818. Description 1814 may be information intended to inform any human operator of the nature of work-flow 1810.
- Author 1816 can identify the entity that created work-flow 1810, whether a human administrator or automatic work-flow generation logic 1804.
- Schedule 1818 can specify dates and times and/or conditions in which work-flow execution logic 1806 is to process work-flow 1810.
- Work-flow elements 1820 collectively define the behavior of work-flow execution logic 1806 in processing the work-flow.
- work-flow elements may each be one of two types: conditions, such as condition 1900 (Figure 19), and actions such as action 2000 ( Figure 20).
- condition 1900 may specify a boolean test that includes an operand 1902, an operator 1904, and another operand 1906.
- Operands 1902 and 1906 can each be results 1220 ( Figure 12) or any portion thereof, a constant, or null.
- any results of a given screening, e.g., results 1220, any information about a given patient stored in screening system data store 210, and any combination thereof can be either of operands 1902 and 1906.
- Next work-flow element(s) 1908 specify one or more work-flow elements to process if the test of operands 1902 and 1906 and operator 1904 evaluates to a Boolean value of true.
- next work-flow element(s) 1910 specify one or more work-flow elements to process if the test of operands 1902 and 1906 and operator 1904 evaluates to a Boolean value of false.
- next work-flow element(s) 1908 and 1910 can each be any of a condition, an action, or null.
- condition 1900 can include more operands and operators combined with AND, OR, and NOT operations.
- condition 1900 can test for the mere presence or absence of an occurrence in the patient's data. For example, to determine whether a patient has ever had a Hemoglobin A1c blood test, condition 1900 can determine whether the most recent Hemoglobin A1c test result evaluates to null. If it does, the patient has not had any such test.
- Action 2000 may include action logic 2002 and one or more next work-flow element(s) 2004.
- Action logic 2002 may represent the substantive action to be taken by work-flow execution logic 1806 and can typically make or recommend a particular course of action in the care of the patient that can range from specific treatment protocols to more holistic paradigms. Examples include referring the patient to a care provider, enrolling the patient in a particular program of care, and recording recommendations to the patient's file such that the patient's clinician sees the recommendation at the next visit. Examples of referring a patient to a care provider include referring the patient to a psychiatrist, a medication management coach, a physical therapist, a nutritionist, a fitness coach, a dietitian, a social worker, etc. Examples of enrolling the patient in a program include telepsychiatry programs, group therapy programs, etc.
- Examples of recommendations recorded to the patient’s file include recommended changes to medication, whether a change in the particular drug prescribed or merely in dosage of the drug already prescribed to the patient, and other treatments.
- referrals and enrollment can be effected by recommendations for referrals and enrollment in the patient’s file, allowing a clinician to make the final decision regarding the patient’s care.
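- The sketch below illustrates one way work-flow elements analogous to condition 1900 and action 2000 could be represented and walked by execution logic. The operand-as-callable representation, the field names, and the HbA1c example are illustrative assumptions; the described system is not limited to this form.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

WorkFlowElement = Union["Condition", "Action", None]

@dataclass
class Condition:
    """Analogue of condition 1900: a Boolean test over two operands, plus the
    next element to process for the true and false outcomes."""
    operand_a: Callable[[dict], Any]      # extracts a value from screening results / patient data
    operator: Callable[[Any, Any], bool]
    operand_b: Callable[[dict], Any]      # may also return a constant or None
    if_true: WorkFlowElement = None
    if_false: WorkFlowElement = None

@dataclass
class Action:
    """Analogue of action 2000: substantive action logic plus the next element."""
    action_logic: Callable[[dict], None]
    next_element: WorkFlowElement = None

def execute(element: WorkFlowElement, patient_record: dict) -> None:
    """Walk a work-flow by evaluating conditions and performing actions."""
    while element is not None:
        if isinstance(element, Condition):
            outcome = element.operator(element.operand_a(patient_record),
                                       element.operand_b(patient_record))
            element = element.if_true if outcome else element.if_false
        else:
            element.action_logic(patient_record)
            element = element.next_element

# Illustrative usage (field names are assumptions): recommend an HbA1c test
# when the patient has no recorded result.
recommend_test = Action(lambda rec: rec.setdefault("recommendations", []).append("order HbA1c test"))
workflow = Condition(
    operand_a=lambda rec: rec.get("latest_hba1c"),
    operator=lambda a, b: a is b,
    operand_b=lambda rec: None,
    if_true=recommend_test,
)
record = {"latest_hba1c": None}
execute(workflow, record)
print(record["recommendations"])  # -> ['order HbA1c test']
```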
- automatic work-flow generation logic 1804 can perform statistical analysis of patient data stored within screening system data store 210 to identify work-flows to achieve predetermined goals. Examples of such goals given above include minimizing predicted costs for the next two (2) years of a patient's care and minimizing the cost of an initial referral while also maximizing a reduction in Hemoglobin A1c in one year. Automatic work-flow generation logic 1804 is described in the illustrative context of the first, namely, minimizing predicted costs for the next two (2) years of a patient's care.
- Automatic work-flow generation logic 1804 can include deep learning machine logic.
- human computer engineers configure this deep learning machine logic of automatic work-flow generation logic 1804 to analyze patient data from screening system data store 210 in the context of labels specified by users, e.g., labels related to costs of the care of each patient over a 2-year period in this illustrative example.
- Users of health screening server 102 who are not merely patients are typically either health care providers or health care payors.
- Information regarding events in a given patient's health care history is available and is included in automatic work-flow generation logic 1804 by the human engineers such that automatic work-flow generation logic 1804 can track costs of a patient's care from the patient's medical records.
- the human engineers can use all relevant data of screening system data store 210 to train the deep learning machine logic of automatic work-flow generation logic 1804.
- the deep learning machine logic of automatic work-flow generation logic 1804 may include an extremely complex decision tree that predicts the costs of each patient over a 2-year period.
- automatic work-flow generation logic 1804 can determine which events in a patient’s medical history have the most influence over the cost of the patient’s care in a 2-year period for statistically significant portions of the patient population.
- automatic work-flow generation logic 1804 identifies deep learning machine (DLM) nodes of the decision tree that have the most influence over the predetermined goals, e.g., costs of the care of a patient over a 2-year period.
- Loop operation 2106 and next operation 2112 define a loop in which automatic work-flow generation logic 1804 processes each of the influential nodes identified in operation 2104.
- the particular node processed by automatic work-flow generation logic 1804 is sometimes referred to as the subject node.
- automatic work-flow generation logic 1804 forms a condition, e.g., condition 1900 ( Figure 19), from the internal logic of the subject node.
- the internal logic of the subject node receives data representing one or more events in a patient’s history and/or one or more phenotypes of the patient and makes a decision that represents one or more branches to other nodes.
- automatic work-flow generation logic 1804 generalizes the data received by the subject node and the internal logic of the subject node that maps the received data to a decision.
- automatic work-flow generation logic 1804 forms an action, e.g., action 2000 ( Figure 20), according to the branch from the subject node that ultimately leads to the best outcome related to the predetermined goal, e.g., to the lowest cost over a 2-year period.
- the condition formed in operation 2108 (Figure 21) and the action formed in operation 2110 collectively form a work-flow generated by automatic work-flow generation logic 1804.
- the automatically generated work-flows are subject to human ratification prior to actual deployment within health care management logic 208.
- health care management logic 208 automatically deploys work-flows generated automatically by automatic work-flow generation logic 1804 but limits actions to only recommendations to health care professionals. It is technically feasible to fully automate work-flow generation and changes to a patient's care without any human supervision.
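- The sketch below illustrates, under simplifying assumptions, how a condition and action of the kind formed in operations 2108 and 2110 might be derived from the most influential split of a cost-predicting decision tree. The use of scikit-learn, the synthetic data, and the feature names are illustrative assumptions; the deep learning machine logic described above is not limited to this form.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def most_influential_node(tree):
    """Score each internal node by its weighted impurity decrease and
    return the index of the highest-scoring one."""
    t = tree.tree_
    n_total = t.weighted_n_node_samples[0]
    best_node, best_score = None, -np.inf
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:          # leaf node: no split to evaluate
            continue
        score = (
            t.weighted_n_node_samples[node] * t.impurity[node]
            - t.weighted_n_node_samples[left] * t.impurity[left]
            - t.weighted_n_node_samples[right] * t.impurity[right]
        ) / n_total
        if score > best_score:
            best_node, best_score = node, score
    return best_node

def node_to_workflow_rule(tree, feature_names):
    """Turn the most influential split into a human-readable condition and an
    action recommending the branch with the lower predicted 2-year cost."""
    t = tree.tree_
    node = most_influential_node(tree)
    feature = feature_names[t.feature[node]]
    threshold = t.threshold[node]
    left, right = t.children_left[node], t.children_right[node]
    left_cost = t.value[left][0][0]
    right_cost = t.value[right][0][0]
    cheaper = "<=" if left_cost <= right_cost else ">"
    condition = f"{feature} <= {threshold:.2f}"
    action = f"steer care toward the '{feature} {cheaper} {threshold:.2f}' branch (lower predicted cost)"
    return condition, action

# Illustrative usage with synthetic data (features and costs are assumptions).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 1000 + 5000 * X[:, 0] + rng.normal(0, 100, 200)   # 2-year cost driven mostly by feature 0
model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(node_to_workflow_rule(model, ["missed_appointments", "phq8_score", "age"]))
```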
- Health screening server 102 is shown in greater detail in Figure 22. As noted above, it should be appreciated that the behavior of health screening server 102 described herein can be distributed across multiple computer systems using conventional distributed processing techniques. Health screening server 102 includes one or more microprocessors 2202 (collectively, CPU 2202) that retrieve data and/or instructions from memory 2204 and execute retrieved instructions in a conventional manner.
- Memory 2204 can include generally any computer-readable medium including, for example, persistent memory such as magnetic and/or optical disks, ROM, and PROM and volatile memory such as RAM.
- CPU 2202 and memory 2204 are connected to one another through a conventional interconnect 2206, which is a bus in this illustrative embodiment and which connects CPU 2202 and memory 2204 to one or more input devices 2208, output devices 2210, and network access circuitry 2212.
- Input devices 2208 can include, for example, a keyboard, a keypad, a touch- sensitive screen, a mouse, a microphone, and one or more cameras.
- Output devices 2210 can include, for example, a display - such as a liquid crystal display (LCD) - and one or more loudspeakers.
- Network access circuitry 2212 sends and receives data through computer networks such as WAN 110 (Figure 1). Server computer systems often exclude input and output devices, relying instead exclusively on human user interaction through network access circuitry. Accordingly, in some embodiments, health screening server 102 does not include input devices 2208 and output devices 2210.
- a number of components of health screening server 102 are stored in memory 2204.
- interactive health screening logic 202 and health care management logic 208 are each all or part of one or more computer processes executing within CPU 2202 from memory 2204.
- “logic” refers to (i) logic implemented as computer instructions and/or data within one or more computer processes and/or (ii) logic implemented in electronic circuitry.
- Screening system data store 210 and model repository 216 are each data stored persistently in memory 2204 and can be implemented as all or part of one or more databases. Screening system data store 210 also includes logic as described above.
- the distinction between servers and clients is largely an arbitrary one made to facilitate human understanding of the purpose of a given computer.
- "server" and "client" are primarily labels to assist human categorization and understanding.
- interaction control logic generator 504 ( Figure 5) of generalized dialogue flow logic 402 selects the next question to ask the subject patient, along with other dialogue actions to be performed by I/O logic 404, in operation 614 ( Figure 6).
- a portion of this illustrative embodiment of operation 614 is shown as logic flow diagram 614A ( Figure 23).
- Logic flow diagram 614A improves detection of a health state in a patient by exploiting a phenomenon.
- Chart 2400 (Figure 24) shows a curve 2402 of incremental quality of a voice signal 2404 over time. "Quality" here refers to how closely modeling described above correlates to a particular health state, e.g., depression in this illustrative example.
- Voice signal 2404 is an audio signal received from the patient in step 606 ( Figure 6). As curve 2402 shows, quality improves during the initial portion of voice signal 2404 and eventually declines. It seems that the most genuine and candid speech is not at the beginning of voice signal 2404 but rather some time into it. In addition, at some point in voice signal 2404, the additional information gleaned from allowing the patient to continue to speak is worth relatively little. Processing by interaction control logic generator 504 ( Figure 5) according to logic flow diagram 614A ( Figure 23) exploits this.
- interaction control logic generator 504 determines the duration of the audio signal, e.g., voice signal 2404 ( Figure 24), received so far in response to the question most recently presented to the patient.
- the duration can be measured in seconds or in the number of words spoken by the patient.
- interaction control logic generator 504 determines whether the duration determined in step 2302 is at least a predetermined short threshold.
- the short threshold specifies a duration during which the patient should not be interrupted.
- the short threshold is set such that the patient is allowed to continue speaking until voice signal 2404 (Figure 24) is expected to have reached and passed through the portion of curve 2402 representing the highest quality. It has been observed that the quality of the patient's response is typically better in the second 30-45 words than the first 30-45 words.
- the short threshold is 80 words to allow the patient’s response to get past the first 30-45 words and well into the portion of the response most likely to be informative. In alternative embodiments, the short threshold is in the range of 60-100 words.
- If interaction control logic generator 504 determines, in test step 2304, that the duration is less than the predetermined short threshold, processing transfers to step 2306.
- In step 2306, interaction control logic generator 504 determines that the next dialogue action to be taken will not be taken until the patient has been permitted to speak longer, and the remainder of step 614 (Figure 6) is processed by interaction control logic generator 504.
- If interaction control logic generator 504 determines, in test step 2304, that the duration is at least the predetermined short threshold, processing transfers to test step 2308.
- interaction control logic generator 504 determines whether the duration is less than a predetermined long threshold.
- the long threshold specifies a duration during which allowing the patient to continue to speak can be beneficial. Beyond the long threshold, the additional benefit of allowing the patient to continue speaking is outweighed by the benefit of asking the patient a different question. It has been observed that the quality of the patient’s response is typically better in the first 75-100 words than in the second 75-100 words. Thus, the informativeness of the patient’s response can often degrade significantly as its duration approaches 150-200 words.
- the long threshold is 120 words. In alternative embodiments, the long threshold is in the range 100-150 words.
- the short and long thresholds are one and the same, e.g., in the range of 90-110 words.
- the duration and thresholds can be measured in time, e.g., seconds.
- the short threshold can be in the range of 30-50 seconds
- the long threshold can be in the range of 50-70 seconds
- the two thresholds can be one and the same in the range of 45-55 seconds.
- the particular thresholds that yield the best results can vary from data corpus to data corpus.
- Each data corpus can be empirically analyzed to identify where the most informative portion of spoken responses tends to begin in the data corpus and where incremental informativeness in the spoken responses does not justify further capture of a response to a given question.
- the short and long thresholds can be set according to such empirical analysis.
- the various rates at which individual patients typical speak can be represented in patient data and used to adjust the thresholds accordingly.
- the various data corpora for which useful thresholds can be determined empirically can include sets of patients (e.g., of similar phenotypes) or, with sufficient data, even individual patients.
- If interaction control logic generator 504 determines, in test step 2308, that the duration is at least the predetermined long threshold, processing transfers to step 2310.
- interaction control logic generator 504 selects, as a next dialogue action, to interrupt the patient’s spoken response at the next lull in the patient’s speech.
- the interruption may be diplomatic and respectful, e.g., acknowledging that the patient’s response is informative and perhaps even thanking the patient for the response before asking the next question.
- processing according to logic flow diagram 614A completes.
- If interaction control logic generator 504 determines, in test step 2308, that the duration is less than the predetermined long threshold, processing transfers to test step 2312.
- interaction control logic generator 504 determines whether the most recent portion of the patient's spoken response continues to be of high quality. In particular, interaction control logic generator 504 compares confidence in runtime models 1202 (Figure 12) reflected in results 1220 to, in one embodiment, a predetermined threshold or, in an alternative embodiment, confidence in runtime models 1202 associated with earlier portions of the patient's current spoken response.
- Interaction control logic generator 504 can also use information related to the captured spoken responses in the aggregate in the current health screening test. For example, in one data corpus, it has been determined that spoken responses in a given screening test continue to be informative for at least eight (8) minutes or about 540 words. Accordingly, interaction control logic generator 504 can base its determination of whether the most recent portion of the patient's spoken response continues to be of high quality at least in part on how much more spoken response is needed to reach an aggregate of at least eight minutes or 500 or so words, in the context of how many questions in the current screening test are yet to be asked. In short, interaction control logic generator 504 controls the dialogue to elicit as much informative speech as can be useful.
- If interaction control logic generator 504 determines that the most recent portion of the patient's spoken response continues to be of high quality, processing transfers to step 2306, in which interaction control logic generator 504 does not yet interrupt the patient as described above. Conversely, if interaction control logic generator 504 determines that the most recent portion of the patient's spoken response does not continue to be of high quality, processing transfers to step 2310, in which interaction control logic generator 504 decides to interrupt the patient at the next lull as described above.
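- A minimal sketch of the decision logic of logic flow diagram 614A follows, using the illustrative word-count thresholds of 80 and 120 words described above; the quality inputs stand in for model confidence drawn from results 1220 and are assumptions about how that signal would be exposed.

```python
def next_dialogue_action(words_so_far, recent_quality, prior_quality,
                         short_threshold=80, long_threshold=120):
    """Decide whether to keep listening or interrupt at the next lull,
    following the short/long word-count thresholds described above.

    `recent_quality` and `prior_quality` stand in for model confidence on
    the most recent and earlier portions of the current response."""
    if words_so_far < short_threshold:
        return "keep_listening"              # never interrupt before the short threshold
    if words_so_far >= long_threshold:
        return "interrupt_at_next_lull"      # past the long threshold, move to a new question
    # Between the thresholds: continue only while the response stays informative.
    if recent_quality >= prior_quality:
        return "keep_listening"
    return "interrupt_at_next_lull"

# Illustrative usage: a 95-word response whose recent portion is declining in quality.
print(next_dialogue_action(95, recent_quality=0.55, prior_quality=0.70))
```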
- interaction control logic generator 504 provides the best opportunity to collect the highest quality speech from the patient.
- Machine learning and deep learning logic is very resource-intensive, both in training and testing. It is not uncommon for training and testing of models to overwhelm available processing resources. While more data is generally better for training and testing models, significant improvements in efficiency can be gained using the thresholds described above to filter training and/or testing data. For example, spoken responses shorter than the short threshold can be excluded from training and testing models. Similarly, spoken responses longer than the long threshold can be cropped at the long threshold to reduce the amount of data in particularly long responses that must be processed in training and/or testing.
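- A minimal sketch of that filtering and cropping step is shown below, reusing the illustrative thresholds of 80 and 120 words; responses are assumed to be available as lists of word tokens, which is an assumption about the data representation rather than a detail of the described system.

```python
def filter_and_crop_responses(responses, short_threshold=80, long_threshold=120):
    """Prepare spoken-response transcripts for model training or testing:
    drop responses shorter than the short threshold and crop responses
    longer than the long threshold, as described above."""
    prepared = []
    for words in responses:
        if len(words) < short_threshold:
            continue                              # too short to be informative
        prepared.append(words[:long_threshold])   # keep at most long_threshold words
    return prepared

# Illustrative usage with token counts only.
responses = [["w"] * 40, ["w"] * 100, ["w"] * 300]
print([len(r) for r in filter_and_crop_responses(responses)])  # -> [100, 120]
```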
- Machine learning models for speech-based depression classification offer promise for health care applications. Despite growing work on depression classification, little is understood about how the length of speech input impacts model performance. It is possible to analyze results for speaker-independent depression classification using a corpus of over 1400 hours of speech from a human-machine health screening application. It is possible to examine two systems that differ in overall performance.
- Results for both systems show that performance depends on natural length, elapsed length, and ordering of the response within a session.
- Systems may share a minimum length threshold, but differ in a response saturation threshold, with the latter higher for the better system. At saturation it may be better to pose a new question to the speaker, than to continue the current response.
- this experiment suggests that concatenating samples composed of short responses may yield better performance and, thus, predictive ability.
- the improved system may give clinicians better information on how to structure questions to elicit responses that are above the threshold length, but below the saturation length.
- the improved system may also give clinicians insight into an optimal length of an assessment, as there is a benefit to obtaining longer responses in some cases and shorter responses in other cases.
- Depression is a prevalent disabling condition and a major global public health concern.
- Mobile AI technology may play an important role in expanding screening for depression, especially as an aid to providers who could follow up with appropriate care.
- Speech technology offers promise because speaking is natural, can be used at a distance, requires no special training, and carries information about a speaker’s state.
- a growing line of AI research has shown that depression can be detected from speech signals using natural language processing (NLP), acoustic models, and multimodal models.
- Common evaluations with shared data sets, features, and tools have recently led to progress, especially in modeling methods.
- the data comprise American English spontaneous speech, with users allowed to talk freely in response to questions within a session. Users may range in age from 18 to over 65, with a mean of roughly 30. They interacted with a software application that presented questions on different topics, such as "work" or "home life". Responses average about 125 words, longer than some reports of turn lengths in conversation (see Figure 1). Users responded to 4-6 (mean 4.52) different questions per session, and then completed a PHQ-9 with the suicidality question removed. The resulting session-level PHQ-8 score served as the gold standard for both the session and the responses within it. Scores were mapped to a binary classification task, with scores at or above 10 mapped to depressed (+dep) and scores below 10 to nondepressed (-dep), following prior work.
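- The label mapping described above reduces, in code, to a single comparison against the cutoff of 10; the short sketch below is a direct restatement of that rule.

```python
def phq8_to_class(phq8_score: int) -> str:
    """Map a session-level PHQ-8 score to the binary depression label used
    as the gold standard: scores of 10 or above -> +dep, below 10 -> -dep."""
    return "+dep" if phq8_score >= 10 else "-dep"

print(phq8_to_class(12), phq8_to_class(7))  # -> +dep -dep
```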
- ULMFiT is a deep learning model trained on large publicly available corpora (e.g. Wikipedia data).
- the trained network serves as a multipurpose RNN-LSTM language model, which is fine-tuned for classification purposes.
- the model was used to predict depression class.
- the number of tokens in System 1 was 7,000; in System 2 this number increased to 30,000.
- the method may use language model fine-tuning.
- the language model may be pre-trained.
- the language model may be a neural network model, such as a long short-term memory (LSTM) model.
- the model may then be fine-tuned. Fine-tuning may be performed using a slanted triangular learning rate (STLR). With an STLR, the learning rate may be increased linearly and then linearly decreased. The rates of linear increase and decrease may be different rates.
- each layer of the neural network may be fine-tuned with a different learning rate.
- the classifier itself may be fine-tuned, using, for example, a gradual unfreezing method. Using this method, each layer of the classifier may be individually unfrozen and fine-tuned, rather than fine-tuning the entire model at once.
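- The sketch below shows one way an STLR schedule and a gradual unfreezing loop could look. The specific schedule formula and hyperparameter values follow the published ULMFiT recipe and are assumptions here, not values taken from this document.

```python
import math

def slanted_triangular_lr(step, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate that increases linearly for a short warm-up fraction of
    training and then decreases linearly for the remainder."""
    cut = math.floor(total_steps * cut_frac)
    if step < cut:
        p = step / max(cut, 1)
    else:
        p = 1 - (step - cut) / max(cut * (1 / cut_frac - 1), 1)
    return lr_max * (1 + p * (ratio - 1)) / ratio

def gradual_unfreezing_schedule(layer_groups, epochs_per_stage=1):
    """Yield, per stage, the layer groups that are trainable, unfreezing the
    last (task-specific) group first and one additional group per stage."""
    for stage in range(1, len(layer_groups) + 1):
        trainable = layer_groups[-stage:]
        for _ in range(epochs_per_stage):
            yield trainable

lrs = [round(slanted_triangular_lr(t, 100), 5) for t in (0, 5, 10, 50, 99)]
print(lrs)  # rises until step 10, then decays
```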
- Figure 25 shows a distribution of lengths in the smaller corpus. As shown in Figure 25, data partitions are well matched, including nearly identical CDFs (black lines overlap) and similar distributions for class lengths both within and across partitions. Depression priors (i.e. +dep) differ slightly, at 28% (smaller 26%) for train vs. 22% for test data.
- Figure 26 shows speaking rate by class and length. There is a decline for all four curves across speaking rate, corresponding to a slight slowing for longer responses in general across classes. Thus, the longer a response is naturally, the fewer words per second it generally contains. Here the effect is on the order of about 3 or 4 words per minute. Because these effects are small, a single aggregate rate over all data may be computed, of 2.39 words/second (143.4 words/minute), with which to estimate time bins for all future analyses that use only word information. The value is indicated by the dark line, or "global estimate for rate", and used in later figures to convert words to seconds.
- Length is presented using a gating measure, showing how much information is present “so far” at any point.
- the metric may be defined as follows: cumulative gated length is the value of x at which all data in the condition are accumulated, and at which the value of y is computed after removing any additional length for data samples longer than x.
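- A minimal sketch of that gating computation is below: every response is truncated to at most x words before the metric is evaluated, so the curve reflects only the information available "so far". The toy metric and the token-list representation are illustrative assumptions.

```python
def gate_at_length(samples, x):
    """Truncate every response to at most x words, so that a metric computed
    on the gated set reflects only the information available 'so far'."""
    return [words[:x] for words in samples]

def gated_metric_curve(samples, metric, gate_values):
    """Compute the metric (e.g., AUC of a classifier over the gated text)
    at each cumulative gated length."""
    return {x: metric(gate_at_length(samples, x)) for x in gate_values}

# Illustrative usage: the 'metric' here simply reports mean gated length.
samples = [["w"] * n for n in (40, 120, 260)]
mean_len = lambda gated: sum(len(g) for g in gated) / len(gated)
print(gated_metric_curve(samples, mean_len, [50, 100, 250]))
```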
- Figure 27 shows an AUC for sessions and responses, using Systems 1 and 2. As shown, and as expected, System 2 outperforms System 1, and sessions (which concatenate all response data) outperform individual responses. (Curves stop where there is not enough data to evaluate potential additional gain.) Important observations for length include: both systems show a sharp decline below 30 to 50 words; responses saturate in AUC at about 250 words; and sessions appear to saturate at closer to 1000 words.
- Figure 28 shows session-level AUC as a function of progressive addition of responses.
- Point 4 reflects that even given multiple responses, NLP requires at least 30 to 50 words in order to perform. Additional responses add progressively less value as the magnitude of the base increases; this is expected mathematically. Observation 6 notes it is better to compose a session of multiple responses than of fewer, longer ones. The largest benefit in moving to a new response is about 4% in AUC, right after the 1st response (points 5 and 6 combined). Point 8 suggests that once a response reaches saturation length, it is better to move on to a new question.
- Figure 29 shows ROC results for sample gating values.
- Figure 29 shows System 2 session-level performance for the combined model as a function of gated session length in words. Both System 1 and System 2 outperform unaided primary care physicians, reported in the literature at 87% specificity/54% sensitivity.
- Figure 29, along with Figure 27 and Figure 28 suggests that session performance continues to improve beyond 800+ words.
- System 2 session length saturation is likely to be at about 1000 words, or just over 8 minutes.
- Figure 30 displays a set of bars for each of these ordering slots, 1st through 4th. Each category of bars sums to 1, e.g., "shortest". The highest bar in each set indicates the most frequent length for that ordering slot. There is a clear predominance of shortest responses in the first position. In each subsequent position slot, the most frequent length matches the order of the position slot. It may be concluded from this pattern that speakers tend to increase their response lengths as they progress through a session.
- Figure 31 shows an AUC for shortest versus longest responses within sessions.
- Figure 31 shows data for a selection of the shortest and the longest response (in words). It may be expected that similar lengths would provide similar value across longer and shorter responses.
- A further analysis examined where, within a response, performance begins to saturate. This may be performed using all data from a frequently-occurring natural length bin of 150 to 200 words. Each response in the bin was cut at various lengths, based on word count. Performance was then compared within a response, between the early part and the later part.
- Results showed that the first part of responses (60% of the total length) is less valuable than the second part by 6% on the AUC scale for System 2.
- the same analysis was performed for bins of different natural lengths. In the case of a bin from 60 to 90 words, the effect was consistent; i.e. the second half was more valuable than the first.
- Natural length values of the transition in behavior from better-performing second halves to better-performing first halves were searched for empirically. Results can be summarized as follows. Long responses perform better than short ones, eventually; this was consistent for both systems. Short responses perform better than long ones, initially. There is a threshold length below which one should not cut off a current response. This length is about 80 words for System 1, and 150 words for System 2.
- results compared two systems that differ in performance overall, to test for similarity in patterns and difference in absolute thresholds.
- Analyses using AUC indicate that responses for both systems should be at least 30 to 50 words long (about 20 seconds). Within a single response, there is a threshold below which one should keep waiting, and one at which it is better to move to a new question. These values depend on the system itself, with the better system making better use of additional words (80 and 120 words for System 1, respectively) versus 150 and 200 words for System 2.
- concatenating a larger number of shorter responses is better than using a smaller number of longer responses, as long as all responses exceed the minimum length. Moving to a new response provides maximum relative gain when early in a session. Using the better system, session length saturation appears to occur after about 8 minutes of total speech.
- Patient has an appointment scheduled.
- the Ellipsis Calendaring Service is responsible for recording a requested appointment time and initiating a screening session between a user and our dialog service.
- the SMS Service is responsible for texting users reminders as scheduled by the calendaring service.
- the dialog service is responsible for asking questions and responding to users; this is currently a service provided by a third party.
- This service creates a conference call between our dialog service and a patient.
- the call is recorded for future analysis.
- PHQ #1 Over the past 2 weeks, how often have you had little interest or pleasure in doing things? Not at all, several days, more than half of the days, or all of the days
- PHQ #2 Over the past 2 weeks, how often have you felt down, depressed, or hopeless? The choices are not at all, several days, more than half of the days, or nearly every day
- PHQ #3 Over the past 2 weeks, how often have you had trouble falling asleep, staying asleep, or sleeping too much? The choices are not at all, several days, more than half of the days, or nearly every day
- PHQ #6 Over the past 2 weeks, how often have you felt bad about yourself - or that you're a failure and have let yourself or your family down? The choices are not at all, several days, more than half of the days, or nearly every day
- PHQ #7 Over the past 2 weeks, how often have you had trouble concentrating on things, such as reading the newspaper or watching TV? The choices are not at all, several days, more than half of the days, or nearly every day
- PHQ #8 Over the past 2 weeks, how often have you been moving or speaking so slowly that other people could have noticed? Or the opposite - being so fidgety or restless that you have been moving around a lot more than usual? The choices are not at all, several days, more than half of the days, or nearly every day
- GAD #1 Over the past 2 weeks, how often have you felt nervous, anxious, or on edge? Not at all, several days, more than half of the days, or nearly every day
- GAD #2 Over the past 2 weeks, how often have you not been able to stop or control worrying? Not at all, several days, more than half of the days, or nearly every day
- GAD #3 Over the past 2 weeks, how often have you been worrying too much about different things? Not at all, several days, more than half of the days, or nearly every day
- GAD #4 Over the past 2 weeks, how often have you had trouble relaxing? Not at all, several days, more than half of the days, or nearly every day
- GAD #5 Over the past 2 weeks, how often have you been so restless that it’s hard to sit still? Not at all, several days, more than half of the days, or nearly every day
- GAD #7 Over the past 2 weeks, how often have you felt afraid as if something awful might happen? Not at all, several days, more than half of the days, or nearly every day
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
A server computer system screens for a health state in a human patient by applying a composite model, which combines linguistic, acoustic, and visual models, to a captured audiovisual signal of the patient's speech. Screening accuracy is improved by measuring the quality of a number of candidate questions and presenting those with the highest quality to the patient. Repeated screenings are kept both current and consistent by substitution of equivalent questions. Multidimensional vectors representing the patient and a screening score are used to automatically adapt the patient's health care to achieve goals.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962829582P | 2019-04-04 | 2019-04-04 | |
| US62/829,582 | 2019-04-04 | ||
| US201962873169P | 2019-07-11 | 2019-07-11 | |
| US62/873,169 | 2019-07-11 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020206178A1 (fr) | 2020-10-08 |
Family
ID=72667075
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2020/026472 Ceased WO2020206178A1 (fr) | Dialogue timing control in health screening dialogues for improved modeling of response speech | 2019-04-04 | 2020-04-02 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2020206178A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022189139A1 (fr) * | 2021-03-12 | 2022-09-15 | Biotronik Se & Co. Kg | Medical voice assistant |
| US11942194B2 (en) | 2018-06-19 | 2024-03-26 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030180698A1 (en) * | 2002-03-22 | 2003-09-25 | Alen Salerian | Mental disorder screening tool and method of screening subjects for mental disorders |
| US20120265024A1 (en) * | 2010-10-05 | 2012-10-18 | University Of Florida Research Foundation, Incorporated | Systems and methods of screening for medical states using speech and other vocal behaviors |
| US20170119302A1 (en) * | 2012-10-16 | 2017-05-04 | University Of Florida Research Foundation, Incorporated | Screening for neurological disease using speech articulation characteristics |
| US20170251985A1 (en) * | 2016-02-12 | 2017-09-07 | Newton Howard | Detection Of Disease Conditions And Comorbidities |
- 2020-04-02: WO PCT/US2020/026472 patent/WO2020206178A1 (fr), not active (Ceased)
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030180698A1 (en) * | 2002-03-22 | 2003-09-25 | Alen Salerian | Mental disorder screening tool and method of screening subjects for mental disorders |
| US20120265024A1 (en) * | 2010-10-05 | 2012-10-18 | University Of Florida Research Foundation, Incorporated | Systems and methods of screening for medical states using speech and other vocal behaviors |
| US20170119302A1 (en) * | 2012-10-16 | 2017-05-04 | University Of Florida Research Foundation, Incorporated | Screening for neurological disease using speech articulation characteristics |
| US20170251985A1 (en) * | 2016-02-12 | 2017-09-07 | Newton Howard | Detection Of Disease Conditions And Comorbidities |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11942194B2 (en) | 2018-06-19 | 2024-03-26 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
| US12230369B2 (en) | 2018-06-19 | 2025-02-18 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
| WO2022189139A1 (fr) * | 2021-03-12 | 2022-09-15 | Biotronik Se & Co. Kg | Medical voice assistant |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12230369B2 (en) | Systems and methods for mental health assessment | |
| US11120895B2 (en) | Systems and methods for mental health assessment | |
| US11545173B2 (en) | Automatic speech-based longitudinal emotion and mood recognition for mental health treatment | |
| CA3155809A1 (fr) | Modeles de traitement de langage acoustique et naturel pour la selection par commande vocale et la surveillance de conditions de sante comportementale | |
| US10885278B2 (en) | Auto tele-interview solution | |
| Thati et al. | A novel multi-modal depression detection approach based on mobile crowd sensing and task-based mechanisms | |
| US12383179B2 (en) | Data processing system for detecting health risks and causing treatment responsive to the detection | |
| US10402924B2 (en) | System and method for remote management and detection of client complications | |
| US11710576B2 (en) | Method and system for computer-aided escalation in a digital health platform | |
| US20210012065A1 (en) | Methods Circuits Devices Systems and Functionally Associated Machine Executable Code for Generating a Scene Guidance Instruction | |
| US20250329326A1 (en) | Voice analyzer for interactive care system | |
| Papangelis et al. | An adaptive dialogue system for assessing post traumatic stress disorder | |
| WO2020206178A1 (fr) | Commande de synchronisation de dialogue dans des dialogues de dépistage sanitaire pour une modélisation améliorée du discours de réponse | |
| Levitan | Deception in spoken dialogue: Classification and individual differences | |
| WO2024259032A2 (fr) | Systèmes et procédés de prédiction d'états de santé mentale sur la base du traitement de parole conversationnelle/texte conversationnel et du langage | |
| US20250226066A1 (en) | Systems and methods for mental health assessment | |
| US12505854B2 (en) | Acoustic and natural language processing models for speech-based screening and monitoring of behavioral health conditions | |
| US20250124063A1 (en) | Computer-implemented method for generating an acknowledgment in an automated conversational healthcare pipeline | |
| US20250087343A1 (en) | Artificial intelligence appointment scheduling system | |
| Branson | Sounding Guilty: Criminality and Black Racialized Speech | |
| Thaler et al. | Using NLP to analyze whether customer statements comply with their inner belief | |
| CN121034310A (zh) | 语音分析方法、装置、计算机设备及可读存储介质 | |
| CN120600015A (zh) | 语音识别方法、装置、计算机设备及存储介质 | |
| Griol Barres et al. | Modeling the user state for context-aware spoken interaction in ambient assisted living |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20784270 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/02/2022) |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 20784270 Country of ref document: EP Kind code of ref document: A1 |