WO2014052921A2 - Patient medical record similarity measure - Google Patents
- Publication number
- WO2014052921A2 (PCT/US2013/062460)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequential
- time
- patient
- records
- record
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- the subject technology relates to methods of identifying effective treatments for patients using biological sequence analysis techniques.
- a patient's electronic medical record contains data that can be used by a clinician to evaluate and treat a patient.
- a collection of patient electronic medical records may be vast in amount, especially when clinicians provide long-term care to patients or when clinicians provide care to many different patients. Processing this large amount of medical information can therefore be difficult.
- a method of identifying treatment for a patient comprising: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.
- the method includes outputting, to an output device, the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention.
- the respective intervention annotated with the respective indicator of time is identified using a natural language processing technique.
- a dynamic programming algorithm is used to obtain the cohort of similar sequential records.
- the algorithm comprises
- the most effective intervention is selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures.
- the other sequential records comprise over one million sequential records.
- the method includes a step of prioritizing, by a clinician, the significance of the respective intervention.
- the step of identifying a cohort includes identifying, by a processor, healthcare interventions that were effective in the cohort.
- the interventions that are annotated are selected from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms.
- the instructions further comprise code for outputting, to an output device, the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention.
- the instructions include code for annotating the files using a natural language processing technique.
- the instructions comprise code for using a dynamic programming algorithm to obtain the cohort of similar sequential records.
- the instructions comprise code for using the following algorithm
- the interventions are selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures.
- the instructions further comprise code for accessing over one million sequential records.
- the instructions further comprise code for inputting, by a clinician, prioritizing data of the significance of the respective intervention.
- the instructions further comprise code for identifying, by a processor, healthcare interventions that were effective in the cohort.
- the interventions that are annotated are selected from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms.
- the instructions further comprise code for processing by distributed computers.
- the instructions further comprise code for processing patient files in an electronic medical record.
- a computing machine comprising the machine-readable medium encoded with a computer program comprising instructions executable by a processor for: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.
- a system for identifying treatment for a patient comprising: a patient data file input module configured to receive, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; and a processing module, wherein the processing module is configured to: annotate each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, create a first time-sequential record of the patient, comprising each patient session; compare the first sequential record to other time-sequential records, of other patients; identify a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identify, by a processor, at least one health care intervention that was most effective for the cohort.
- the system comprises an output module configured to output the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention.
- the processor is configured to annotate the files using a natural language processing technique.
- the processor is configured to use a dynamic programming algorithm to obtain the cohort of similar sequential records.
- the dynamic programming algorithm comprises,
- the processor is configured to annotate an intervention selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures.
- the processor is configured to access data files comprising over one million sequential records.
- the processor is configured to receive data from a clinician prioritizing the significance of the respective intervention.
- the processor is configured to identify health care interventions that were effective in the cohort of patients having similar sequential records for patients.
- the processor is configured to annotate the terms from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms.
- the system comprises a plurality of distributed computers.
- the processor is configured to process patient files in an electronic medical record.
- a method of identifying cancer treatments for a patient comprising: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other cancer patients; identifying a cohort of cancer patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one cancer treatment that was most effective for the cohort.
- the method includes outputting, to an output device, the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention.
- terms annotated in the files are annotated using a natural language processing technique.
- a dynamic programming algorithm is used to obtain the cohort of similar sequential records in the step of identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record.
- the algorithm comprises
- the most effective intervention is selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures.
- the other patient records comprise over one million sequential records.
- the method includes a step of prioritizing, by a clinician, the significance of the respective intervention.
- the terms annotated are selected from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms.
- a non-transitory computer-readable medium encoded with a computer program comprising instructions executable by a processor to perform a method for identifying a cancer treatment for a patient, the instructions comprising code for: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating terms in each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.
- the instructions further comprise code for outputting, to an output device, the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention.
- the instructions include code for annotating the files using a natural language processing technique.
- the instructions comprise code for using a dynamic programming algorithm to obtain the cohort of similar sequential records.
- the instructions comprise code for using the following algorithm
- the instructions further comprise code for annotating an intervention selected from the group consisting of: drug therapy, inpatient procedures, and outpatient procedures.
- the instructions further comprise code for accessing over one million sequential records.
- the instructions further comprise code for prioritizing, by a clinician, the significance of the respective intervention.
- the instructions further comprise code for identifying, by a processor, cancer treatments that were effective in the cohort of patients having similar sequential records.
- the instructions further comprise code for annotating terms selected from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms.
- the instructions further comprise code for processing by distributed computers.
- the instructions further comprise code for processing patient files in an electronic medical record.
- a computing machine comprising the machine-readable medium encoded with a computer program comprising instructions executable by a processor for: receiving, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; annotating each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, creating a first time-sequential record of the patient, comprising each patient session; comparing the first sequential record to other time-sequential records, of other patients; identifying a cohort of patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identifying, by a processor, at least one health care intervention that was most effective for the cohort.
- a system for identifying cancer treatments for a patient comprising: a patient data file input module configured to receive, by a processor, data files, each of the files representing an encounter between the patient and a respective health care intervention; and a processing module, wherein the processing module is configured to: annotate each of the files with a respective indicator of a time associated with the respective intervention, to create a respective patient session; based on the indicators, create a first time-sequential record of the patient, comprising each patient session; compare the first sequential record to other time-sequential records, of other patients; identify a cohort of cancer patients having similar sequential records by determining which of the other sequential records have a degree of similarity to the first sequential record; and identify, by a processor, at least one cancer treatment that was most effective for the cohort.
- the system comprises an output module configured to output the identified at least one intervention with an indication of a degree of effectiveness of the at least one intervention.
- the processor is configured to annotate the files using a natural language processing technique.
- the processor is configured to use a dynamic programming algorithm to obtain the cohort of similar sequential records.
- the dynamic programming algorithm comprises,
- $H(i, j) = \max\{\ldots\}, \quad 1 \le i \le m, \; 1 \le j \le n$
- the processor is configured to annotate an intervention selected from the group consisting of: radiation therapy, and drug therapy.
- the processor is configured to access data files comprising over one million sequential records.
- the processor is configured to receive data from a clinician prioritizing the significance of the respective intervention.
- the processor is configured to identify health care interventions that were effective in the cohort of patients having similar sequential records for patients.
- the processor is configured to annotate the terms from the group consisting of clinical terms, biological terms, genomic terms, and laboratory testing terms.
- the system comprises a plurality of distributed computers.
- the processor is configured to process patient files in an electronic medical record.
- one or more of the other time-sequential records includes (a) normal scores corresponding to a first patient not having a disease state across a time period; (b) unresponsive scores corresponding to a second patient having the disease state across the time period; (c) improving scores corresponding to a third patient recovering from the disease state across the time period; and (d) degrading scores corresponding to a fourth patient succumbing to the disease state across the time period.
- the penalties may include (1) a first penalty for starting a gap between the first sequential record and one of the other time-sequential records and (2) a second penalty for continuing a gap between the first sequential record and the one of the other time-sequential records.
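The two-part gap penalty described above resembles an affine gap cost. The following is a minimal sketch under that assumption; the function name and the default values for `gap_open` and `gap_extend` are hypothetical and are not taken from the source.

```python
def affine_gap_penalty(gap_length, gap_open=2.0, gap_extend=0.5):
    """Return the cost of a gap spanning `gap_length` consecutive positions.

    gap_open   -- hypothetical penalty charged once, when the gap starts
    gap_extend -- hypothetical penalty charged for each position that
                  continues an already open gap
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    # One opening charge, then one extension charge per additional position.
    return gap_open + gap_extend * (gap_length - 1)
```

With these illustrative defaults, one long gap costs less than several short gaps of the same total length, which favors alignments where missing sessions are contiguous.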
- the first time-sequential record may include (1) a sequential indicator of the patient and (2) a non-sequential indicator of the patient.
- the methods and systems of the subject technology may determine a predictive feature of each of the indicators with respect to the degree of similarity between the first sequential record and the other time-sequential records.
- the indicators may be ranked according to the predictive features.
- the predictive feature may be one of a positive predictive value, a negative predictive value, a sensitivity, and a specificity.
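The four predictive features named above can be computed from a confusion matrix, and indicators can then be ranked by any one of them. This is a sketch only: the source does not say how the counts are obtained, and the function names and the `(tp, fp, tn, fn)` tuple layout are assumptions made for illustration.

```python
def predictive_features(tp, fp, tn, fn):
    """Compute PPV, NPV, sensitivity and specificity from confusion counts."""
    def ratio(num, den):
        # Return None when the denominator is zero (feature undefined).
        return num / den if den else None

    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def rank_indicators(counts_by_indicator, feature="ppv"):
    """Rank indicator names by one predictive feature, best first.

    counts_by_indicator maps an indicator name to a (tp, fp, tn, fn) tuple.
    Indicators whose feature is undefined are left out of the ranking.
    """
    scored = {
        name: predictive_features(*counts)[feature]
        for name, counts in counts_by_indicator.items()
    }
    defined = [name for name, value in scored.items() if value is not None]
    return sorted(defined, key=lambda name: scored[name], reverse=True)
```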
- a method of identifying an event leading to a target selection by a user, comprising: (a) receiving, by a processor, data files, each of the files representing an encounter between the user and a user interface; (b) annotating each of the files with a respective indicator of a time associated with the encounter, to create a respective user session; (c) based on the indicators, creating a user time-sequential record of the user, comprising each user session; (d) comparing the user sequential record to other time-sequential records, of other users; (e) identifying a cohort of users having similar sequential records by determining which of the other sequential records have a degree of similarity to the user sequential record; and (f) identifying, by a processor, an event that most frequently precedes a target selection by the cohort.
- the method may include displaying the event to the user via the user interface.
- the event may include a display provided to the user via the user interface.
- the event may include an input provided by the user to the user interface.
- the target selection may include a purchase executed by the user via the user interface.
- the user interface may be a website.
- a system for identifying an event leading to a target selection by a user comprising: a user data file input module configured to receive, by a processor, data files, each of the files representing an encounter between the user and a user interface; and a processing module, wherein the processing module is configured to annotate each of the files with a respective indicator of a time associated with the encounter, to create a respective user session; based on the indicators, create a user time-sequential record of the user, comprising each user session; compare the user sequential record to other time-sequential records, of other users; identify a cohort of users having similar sequential records by determining which of the other sequential records have a degree of similarity to the user sequential record; and identify an event that most frequently precedes a target selection by the cohort.
- the system may include a display module configured to display the event to the user via the user interface.
- the event may include an input provided by the user to the user interface.
- the target selection may include a purchase executed by the user via the user interface.
- the user interface may be a website.
- the processing module may be further configured to apply a substitution matrix to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records.
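The last step of the user-event embodiments, identifying the event that most frequently precedes a target selection within a cohort, can be sketched as a simple count. The sketch assumes that each cohort member's record is already a time-ordered list of event names and that "precedes" means "immediately precedes"; both are assumptions, as is the function name.

```python
from collections import Counter


def most_frequent_preceding_event(sessions, target):
    """Return the event that most often comes directly before `target`.

    sessions -- list of time-ordered event-name lists, one per cohort member
    target   -- name of the target selection (for example, a purchase)
    Returns None when the target never follows another event.
    """
    counts = Counter()
    for events in sessions:
        # Walk consecutive (previous, current) pairs in time order.
        for previous, current in zip(events, events[1:]):
            if current == target:
                counts[previous] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```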
- a method of identifying a published event leading to a target market event comprising: (a) receiving, by a processor, data files, each of the files representing a sequence comprising a first published event and a first market event, occurring after the first event; (b) annotating each of the files with a respective indicator of a time associated with the sequence, to create a respective event session; (c) based on the indicators, creating a first time-sequential record of the user, comprising each event session; (d) comparing the first sequential record to second time-sequential records, of other sequences comprising second published events and second market events, each occurring after a respective one of the second events; (e) identifying a cohort of sequences having similar sequential records by determining which of the second sequential records have a degree of similarity to the first sequential record; and (f) identifying, by a processor, an identified published event that most frequently precedes a target market event.
- the method may include outputting, to an output device, the identified published event.
- the published event may include publication of a news article.
- the system may include a display module configured to display the identified published event.
- the published event may include publication of a news article.
- the target market event may include a change of a value of an asset.
- the processing module may be further configured to apply a substitution matrix to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records.
- FIG. 1 shows a flowchart of a method of identifying treatment for a patient, according to some embodiments of the subject technology.
- FIG. 2 shows a flowchart of a method of identifying treatment for a patient, according to some embodiments of the subject technology.
- FIG. 3 illustrates a simplified diagram of a system, in accordance with various embodiments of the subject technology.
- FIG. 4 illustrates a simplified block diagram of a server, in accordance with various embodiments of the subject technology.
- FIG. 5 is a conceptual block diagram illustrating an example of a system, in accordance with various embodiments of the subject technology.
- a method of improving patient outcomes is provided by identifying best practice treatment for cohorts of patients and applying them to new patients that are identified as similar.
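As an illustration of this cohort-based approach, the sketch below selects a cohort by thresholding precomputed similarity scores and then picks the intervention with the highest success rate in that cohort. The threshold, the use of a success rate as the measure of effectiveness, and all names are assumptions; the source does not define how effectiveness is scored.

```python
from collections import defaultdict


def select_cohort(similarity_scores, threshold):
    """Return ids of patients whose similarity score meets the threshold."""
    return [pid for pid, score in similarity_scores.items() if score >= threshold]


def most_effective_intervention(cohort, outcomes):
    """Return the intervention with the highest success rate in the cohort.

    outcomes maps a patient id to an (intervention, succeeded) pair.
    Returns None for an empty cohort.
    """
    totals = defaultdict(lambda: [0, 0])  # intervention -> [successes, uses]
    for pid in cohort:
        intervention, succeeded = outcomes[pid]
        totals[intervention][0] += int(succeeded)
        totals[intervention][1] += 1
    if not totals:
        return None
    return max(totals, key=lambda name: totals[name][0] / totals[name][1])
```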
- a method of predicting laboratory test results in the near-term for patients is provided by identifying patients that have a statistically significant probability of going out of a predetermined range based on patterns or similar cohorts of patients.
- the term "significant probability" means having a statistically significant probability as viewed by a clinician, for example with a p value of less than 0.05.
- the term "predetermined range" means per clinical guidelines or other guidelines.
- the term "test result" means the outcome of a diagnostic test and the term "future test result" means a test result obtained in the future.
- Optimal patient treatment can be achieved by identifying best treatment practices for similar patients. It has been discovered that identifying cohorts of patients who are similar to the patient and applying the best treatment practices found for the cohort may achieve optimal treatment for the patient for whom treatment is sought. Examples of illnesses or conditions for which such a method of applying the best treatment practices found for a similar cohort may be used include cancers, autoimmune diseases, and neurodegenerative diseases. Current cancer treatments include radiation and chemotherapy, which have many serious negative side effects. It is therefore beneficial to determine a treatment or treatments that may be most effective in a particular patient, prior to commencing any treatment with such negative side effects.
- test results may include: blood diagnostic tests (pressure, cholesterol levels, glucose levels, protein levels), urine analysis, blood platelet levels, tissue biopsies, heart rate, and other tests.
- Biological sequence analysis techniques have been used to process DNA, RNA and peptide sequences in order to better elucidate their structure, function, features and transformation.
- Such biological sequence analysis involves use of biological databases populated by the results of high-throughput production of gene and protein sequences. Comparing new sequences to those with known functions as stored in databases has increased understanding of the biology of an organism from which the new sequence comes. Sequence analysis has also been used to assign function to genes and proteins by the study of the similarities between the compared sequences.
- Two main types of sequence alignment currently exist: pair-wise sequence alignment, which only compares two sequences at a time, and multiple sequence alignment, which compares many sequences at one time.
- Algorithms may be used to align pairs of sequences. Examples of such algorithms include the Needleman-Wunsch algorithm and the Smith-Waterman algorithm.
- Repeat matching alignment, in which repeating subsequence motifs are identified in the sequence, may also be used, as may overlapping alignments, in which overhanging ends are not penalized.
- Hybrid alignment techniques may also be used. These hybrid techniques modify the dynamic programming formula to favor specific structures in the sequences. Complex insertion and deletion penalties that are dependent on the initiation and length of the gap or use an affine gap cost structure may also be used.
- heuristic alignment algorithms such as Basic Local Alignment Search Tool (BLAST) (Altschul et al. 1990) and alternate versions of BLAST and FASTA (Pearson & Lipman 1988) may also be used.
- BLAST uses highly matched short seed sequences from which to extend out the alignment.
- FASTA is a multistep approach that starts with exact matches, extends to ungapped matches and then identifies gapped alignments.
- the Needleman-Wunsch algorithm (also referred to as the optimal matching algorithm) performs a global alignment on two sequences and may be used to align protein or nucleotide sequences.
- the Needleman-Wunsch algorithm is an example of dynamic programming, which simplifies a complicated problem by breaking it down into simpler sub-problems in a recursive manner.
- scores for aligned characters are specified by a similarity matrix, which is a matrix of scores which express the similarity between two data points. Higher scores are given to more-similar characters, and lower or negative scores for dissimilar characters.
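The global alignment described above can be sketched as a dynamic programming table that is filled row by row. This is a minimal illustration and not the patent's implementation: it uses a flat match/mismatch score in place of a full similarity matrix, a linear gap penalty, and hypothetical default values.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    a, b may be strings or lists (for example, lists of intervention codes).
    match, mismatch and gap are illustrative scores, not values from the source.
    """
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
    return score[m][n]
```

Each cell depends only on its three neighbors, which is the recursive breakdown into sub-problems that the text refers to.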
- the Smith-Waterman algorithm is also an example of dynamic programming and has been used for performing local sequence alignment in order to determine similar regions between two nucleotide or protein sequences.
- the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.
- the Smith-Waterman algorithm finds the optimal local alignment with respect to the scoring system being used.
- the scoring system may include the substitution matrix scheme and the gap-scoring scheme.
- a substitution matrix describes the rate at which one character in a sequence changes to other character states over time.
- Substitution matrices have been used in the context of amino acid or DNA sequence alignments, where the similarity between sequences depends on their divergence time and the substitution rates as represented in the matrix.
- the primary difference between the Smith-Waterman and the Needleman-Wunsch algorithm is that negative scoring matrix cells are set to zero, which renders the (thus positively scoring) local alignments visible.
- Backtracking starts at the highest scoring matrix cell and proceeds until a cell with score zero is encountered, yielding the highest scoring local alignment.
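The local alignment and backtracking steps described above can be sketched as follows. The sketch assumes a flat match/mismatch score and a linear gap penalty rather than a substitution matrix or affine gaps, and the default values are hypothetical; it illustrates the technique and is not the patent's algorithm.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    a, b may be strings or lists. Gaps appear as None in the aligned lists.
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Negative scores are clamped to zero, unlike global alignment.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Backtrack from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append(None)
            i -= 1
        else:
            aligned_a.append(None)
            aligned_b.append(b[j - 1])
            j -= 1
    return best, aligned_a[::-1], aligned_b[::-1]
```

For two patient records, `a` and `b` would be time-ordered lists of annotated interventions, and the returned segment is the most similar stretch of care that the two records share.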
- the application technology uses the aforementioned sequence analysis techniques to identify best practice treatments for cohorts of patients, to apply them to new patients that are identified as similar by physicians, and to identify patients that have a high probability of going out of range based on patterns or similar cohorts of patients. Contrary to previous research (see, for example, Lee et al., "Local Alignment Tool for Clinical History: Temporal Semantic Search of Clinical Databases" AMIA 2010 Symposium Proceedings p.
- substitution matrix has been found by Applicant to be successful in identifying best practice treatment for cohorts of patients and applying them to new patients that are identified as similar by physicians and identifying patients that have high probability of going out of range based on patterns or similar cohorts of patients.
- the substitution matrix is initialized with the tunable parameters MatchWeight (the value set for identical matches along the diagonal (i,i) positions of the matrix) and MisMatchWeight (the value set for mismatches at positions (i,j), where i is not equal to j).
- the sequences are aligned with the initialized matrix. The predictive utility of the aligned sequences is evaluated with cross validation.
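The initialization of the substitution matrix from the two tunable weights can be sketched as below. The dictionary representation, the function name, and the default weights are assumptions for illustration; the source only states that diagonal positions receive the match weight and off-diagonal positions the mismatch weight.

```python
def init_substitution_matrix(indicators, match_weight=1.0, mismatch_weight=-1.0):
    """Build a substitution matrix keyed by (indicator_i, indicator_j).

    Diagonal entries (i, i) get match_weight; all other entries (i, j)
    get mismatch_weight. Both weights are tunable.
    """
    return {
        (x, y): (match_weight if x == y else mismatch_weight)
        for x in indicators
        for y in indicators
    }
```

A tuning loop could then rebuild the matrix for each candidate pair of weights, align the sequences, and keep the pair that scores best under cross validation.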
- a similarity matching method 10 may include accessing the electronic medical records (EMR) 20 of patients as stored in a non-transitory computer-readable form, such as on a computer hard drive.
- This EMR may be a systematic collection, database, or log of electronic medical information about individual patients in digital format that can be shared across different health care settings, i.e. accessed by different physicians at different healthcare facilities over a network (as shown in FIG. 3).
- the Veterans Affairs Informatics and Computing Infrastructure is an example of such a database.
- the EMR may be accessed via a network connection and may include a range of data, including medical history (e.g.
- the term "health care intervention" includes any of, and any combination of, lab tests, imaging (x-rays, CT, MRI, ultrasound), surgeries, inpatient and outpatient medical procedures, and physical, psychological and other interactions with any health care worker (doctor, nurse, pharmacist, therapist, etc.).
- the EMR data may be retrieved and annotated with an indicator of time, thereby converting the EMR data into annotated sequences of health care interventions 25, and creating a sequential record made up of each health care intervention based on the time indicator.
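As a rough sketch of this step, interventions that carry a time indicator can be ordered by that indicator to form the sequential record. The function name `to_sequential_record` and the sample interventions are hypothetical and are not taken from the source.

```python
from datetime import date


def to_sequential_record(interventions):
    """Order (time, intervention) pairs by their time indicator."""
    return sorted(interventions, key=lambda pair: pair[0])


# Hypothetical, unordered EMR entries for one patient.
record = to_sequential_record([
    (date(2012, 6, 1), "medication_B"),
    (date(2011, 9, 1), "elevated_bp"),
    (date(2011, 9, 2), "medication_A"),
])
sequence = [name for _, name in record]
print(sequence)
```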
- the term "annotate" includes taking note, annotating, or otherwise supplying an indication.
- time as used herein includes any of, and any combination of: day, date, week, month, year, minute, hour, second, or shorter or longer period of time.
- the data may be annotated with the indicator of time by identifying an intervention using a natural language processing technique.
- Natural language processing techniques may use machine learning to identify an intervention in EMR data and annotate these events with an indicator.
- the natural language processing technique may identify and annotate clinical terms, biological terms, genomic terms, and laboratory testing terms.
- the annotated term may have a value (discrete or continuous) and a time.
- Exemplary machine learning techniques may include Weka (Waikato Environment for Knowledge Analysis) and ML-Flex.
- the annotates and annotated sequences of events for the patient may then be converted into a system of annotating, such as a markup language, and stored on a computer readable medium 30.
- An example of such a markup language is Extensible Markup Language (XML).
- This may be repeated for multiple patients to create a database of annotated patient sequences.
- the XML annotation or tag thus may have a tagged term, a value (discrete or continuous) and a time.
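One possible shape for such a tag, sketched with Python's standard `xml.etree.ElementTree` module. The element names (`patient_sequence`, `annotate`), the attribute names, and the sample values are assumptions; the source does not specify a schema.

```python
import xml.etree.ElementTree as ET


def annotate(term, value, time):
    """Create one tag holding a tagged term, a value, and a time."""
    element = ET.Element("annotate", {"term": term, "time": time})
    element.text = str(value)
    return element


# Hypothetical patient sequence with one continuous and one discrete value.
record = ET.Element("patient_sequence", {"patient_id": "example-001"})
record.append(annotate("systolic_bp", 162, "2012-03-01"))
record.append(annotate("medication", "A", "2012-03-02"))
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```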
- a processor may be used to process the annotated sequences in the patient database.
- the processor may use statistical and machine learning techniques to rank the predictive utility of individual annotations at predicting outcome of a clinical question 35.
- Distributed computers may process the data using various software frameworks, such as Apache Hadoop, HBase and Accumulo to store and retrieve the sequential records.
- Feature selection may be performed on the XML tagged values in the record using subset selection techniques including but not limited to wrappers and filters that search through the space of possible features.
- Predictive utility rankings may be evaluated using methods including predictive classifiers and feature selection methods such as ReliefF to get a ranking of how well the features separate among the outcomes of the clinical question.
- ReliefF uses a nearest neighbor approach to numerically rank how well features distinguish between different outcomes.
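To make the nearest neighbor idea concrete, here is a sketch of basic Relief, the simpler two-class ancestor of ReliefF. It is not ReliefF itself (it uses a single nearest hit and miss and does not handle missing values or multiple classes), and the function name and toy data are assumptions.

```python
def relief_weights(rows, labels):
    """Basic Relief for two classes; features are assumed scaled to [0, 1].

    Each class must contain at least two rows so that a nearest hit exists.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features

    def distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (row, label) in enumerate(zip(rows, labels)):
        others = [j for j in range(len(rows)) if j != i]
        hits = [j for j in others if labels[j] == label]
        misses = [j for j in others if labels[j] != label]
        nearest_hit = rows[min(hits, key=lambda j: distance(row, rows[j]))]
        nearest_miss = rows[min(misses, key=lambda j: distance(row, rows[j]))]
        for f in range(n_features):
            # Reward features that differ from the nearest miss,
            # penalize features that differ from the nearest hit.
            delta = abs(row[f] - nearest_miss[f]) - abs(row[f] - nearest_hit[f])
            weights[f] += delta / len(rows)
    return weights


# Toy data: feature 0 separates the outcomes, feature 1 does not.
rows = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
labels = [0, 0, 1, 1]
weights = relief_weights(rows, labels)
print(weights)
```

A higher weight suggests a feature that separates the outcomes better, which is the ranking the text relies on.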
- N annotates are selected based on the threshold of the predictive ability starting with annotates ranked with the highest predictive utility 40.
- a substitution matrix is then set 45.
- the substitution matrix may be composed of N x N cells that represent the substitutability of two annotates in a sequence.
- the sequences may then be aligned 60 using DNA sequence alignment algorithms based on dynamic programming, an example of which is the Smith-Waterman algorithm.
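A minimal sketch of Smith-Waterman local alignment scoring over annotate sequences follows. It assumes a linear gap penalty and returns only the best score, without traceback; the one-letter symbols (H for heart attack, S for stroke, M for medication), the weights, and the function name are illustrative assumptions.

```python
def smith_waterman(seq_a, seq_b, substitution, gap_penalty):
    """Return the best local alignment score using a linear gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + substitution[seq_a[i - 1]][seq_b[j - 1]]
            up = score[i - 1][j] + gap_penalty
            left = score[i][j - 1] + gap_penalty
            # Local alignment never lets a cell drop below zero.
            score[i][j] = max(0, diagonal, up, left)
            best = max(best, score[i][j])
    return best


# Hypothetical alphabet and weights: match +2, mismatch -1, gap -2.
symbols = "HSM"
substitution = {a: {b: (2 if a == b else -1) for b in symbols} for a in symbols}
print(smith_waterman("MHSM", "HS", substitution, gap_penalty=-2))
```

Because the score is local, a short shared subsequence such as a heart attack followed by a stroke is found even when the surrounding events differ.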
- New features may be constructed from identified subsequences with high coverage and predictive ability for clinical outcome of interest 65. Machine learning techniques may then be performed with cross validation to predict outcomes to clinical questions of interest 70.
- the predictive performance of learned models may be assessed and predictive alignments are used to incrementally improve substitution matrix 75.
- the threshold for improvement of substitution matrix predictive model performance over previous model may be set 50 and used to set the substitution matrix 45.
- the machine calculated substitution matrix, expert assessed substitution matrix, constructed subsequence features, predictive models and model parameters are stored on a non-transitory computer readable medium 55. In this manner, a cohort of similar sequential records may be obtained by determining which patient records are similar to or relevant to predicting the outcome of a clinical question.
- the sequences are aligned using DNA sequence alignment algorithms 80 and options and predicted outcomes are displayed 85 via an output device.
- an output device includes any one of, or a combination of, displays, storage, print-outs, etc.
- Healthcare interventions that were most effective for the similar patient cohort and most predictive of future test results for a new patient may be outputted (e.g. on a display, or printout) by retrieving the new patient's EMR 90, converting the EMR data into an annotated sequence in order to answer clinical questions 95, and then aligning the sequences with DNA sequence alignment algorithms using a substitution matrix 80.
- the most effective health care intervention options and predicted test results for the new patient may be outputted 85.
- the time at which the test results are predicted to go out of a predetermined range is also outputted.
- the EMR for a new patient may be retrieved as data files of the patient's encounters with various physicians.
- Terms in the EMR may be identified using a natural language processing technique and annotated 95 with a time indicator to define a patient session.
- the respective patient session is an intervention annotated with an indication of time.
- a sequential record may be created which includes each patient intervention based on the time indicators 95.
- the patient's sequential record may then be compared with other patients' sequential records that are similar to the patient's sequential record by aligning the sequences using DNA sequence algorithms using a substitution matrix 80.
- a cohort of similar sequences may be obtained by determining which of the other patients' EMRs are similar to the patient's sequential record and then identifying healthcare interventions (e.g. drug therapy, physical therapy, radiation therapy) that were most effective for patients in the cohort of similar sequential records.
- a cohort of similar sequences may be obtained by determining which of the other patients' EMRs are similar to the patient's sequential record and then predicting outcomes based on patients in the cohort of similar sequential records.
- a similarity matching method with expert input 100 may include accessing a clinical guideline 110 and converting it into sequences of annotated events 115.
- the annotated sequences may be represented as XML computer readable code 120.
- An expert, such as a physician, may input data ranking the importance and relevance of clinical events to annotate in clinical care sequences.
- the clinical expert aligns a subset of patient sequences with archetype sequences 125.
- the annotates are stored, and sequences for patients and XML annotate sequences are annotated as archetypes for clinical care practices 130.
- Patient annotated sequences, archetype sequences and sequences for expert analysis, assessment and incremental improvement are displayed 135.
- a substitution matrix composed of N x N cells that represent the substitutability of two annotates in a sequence is set 140.
- the sequences are aligned with DNA sequence alignment algorithms 160 as in 60 of FIG. 1, using a substitution matrix.
- New features are constructed from identified subsequences with high coverage and predictive ability for clinical outcome of interest 165.
- a machine learning technique is performed with cross validation to predict outcomes to a clinical question of interest 170.
- the predictive performance of learned models may be assessed and predictive alignments may be used to incrementally improve the substitution matrix 175.
- the learned models and constructed subsequence features and alignments may be displayed 200.
- Clinical experts may then assess predictive models, select features of relevance, and evaluate alignments for improving predictive models 205.
- the threshold for improvement of substitution matrix and predictive model performance over previous model may be set 150.
- the machine calculated substitution matrix, expert assessed substitution matrix, constructed subsequence features, predictive models and model parameters are stored on a computer readable memory 155.
- the sequences may then be aligned with DNA sequence alignment algorithms 180 using a substitution matrix.
- the treatment options and predicted outcomes may then be displayed 185.
- Treatment options and predicted outcomes for a new patient may be displayed by retrieving the new patient's EMR 190, converting the EMR data into an annotated sequence in order to answer clinical questions 195, and then aligning the sequences with DNA sequence alignment algorithms using a substitution matrix 180. In this manner, the treatment options and predicted outcome for the new patient may be displayed 185.
- Example 1 A physician wanting to identify the best treatment for lowering the blood pressure of a patient may retrieve the EMR of the patient and submit the EMR for processing by a computer readable program executable by a processor, such as in a computer.
- the program may convert the EMR data into annotated sequences of events for the patient by annotating each event with an indicator.
- the annotated sequence may be that the patient first had elevated blood pressure, a day later the patient was prescribed blood pressure medication A, three months later the patient then suffered a heart attack, six months later a different blood pressure medication B was prescribed, two years later the patient then suffered a stroke, and the patient's blood pressure remains elevated.
- the annotated sequences may then be converted to XML annotates.
- the patient's sequence may then be compared to a cohort of patients having similar sequences in order to determine which treatment was successful for those other patients.
- the step of obtaining a cohort of patients having similar sequences may be achieved by converting EMR data of a large database of patients into annotated sequences of events for each patient by annotating each event with an indicator of time.
- Statistical and machine learning techniques, as implemented by one or many distributed computer processors, may be used to rank the predictive utility of individual annotates at predicting the outcome of treating the patient's high blood pressure. For example, the processors may identify a heart attack followed by stroke as the top two predictors in sequence having utility in the clinical question (how to lower the patient's blood pressure).
- the executable program may then select the annotates heart disease, stroke, and current use of medication B in a substitution matrix in order to determine those patients with a similar subsequence to the patient's (i.e. identify those patients who suffered a sequence in which a heart attack was followed by a stroke and who are currently taking medication B).
- these subsequences of patients determined to be similar are used to construct new features having high coverage and predictive ability for lowering blood pressure.
- the substitution matrix is saved in a database and the predictive clinical treatment is evaluated for success in the patient. Based on the evaluation, the predictive performance is assessed and incrementally improved.
- Example 2 is the same as Example 1, except the relevance of clinical events to annotate in clinical care sequences may be ranked by an expert, such as a physician.
- the clinical expert may assign a subset of patient sequences with archetype sequences.
- Example 3 A physician wanting to predict a patient's laboratory test value may retrieve the EMR of the patient and submit the EMR for processing by a computer readable program executable by a processor, such as in a computer.
- the program may convert the EMR data into annotated sequences of laboratory test results for the patient by annotating each lab test with an indicator of time.
- the annotated sequence may be that the patient first had high cholesterol levels, followed by high blood pressure, and the physician would like to predict if and when the patient will have high blood glucose levels indicative of diabetes.
- the annotated lab results sequences may then be converted to XML annotates.
- the patient's sequence may then be compared to a cohort of patients having similar lab test results followed by high glucose levels.
- the step of obtaining a cohort of patients having similar sequences may be achieved by converting EMR data of a large database of patients into annotated sequences of events for each patient by annotating each lab test result event with an indicator of time.
- Statistical and machine learning techniques, as implemented by one or many distributed computer processors, may be used to rank the predictive utility of individual annotates at predicting if and when the patient may have high blood glucose levels.
- the processors may identify high blood pressure followed by high cholesterol levels as the top two lab test value predictors in sequence having utility in the clinical question - (if and when the patient may have high glucose levels).
- the executable program may then select the annotates high blood pressure and high cholesterol levels in a substitution matrix in order to determine those patients with a similar subsequence to the patient's (i.e. identify those patients who suffered a sequence in which high blood pressure was followed by high cholesterol levels).
- these subsequences of patients determined to be similar are used to construct new features having high coverage and predictive ability for high glucose levels.
- the substitution matrix is saved in a database and the predictive test is evaluated for success in the patient. Based on the evaluation, the predictive performance is assessed and incrementally improved.
- a Patient Health Record Similarity Measure (PHRSM) framework can use sequences of similar medical events to identify signal in patients' healthcare event data.
- the framework may be applied to data to predict, for example, intensive care unit (ICU) patient mortality.
- the EMR data can form inputs for algorithms to create models that predict in-hospital survival of patients.
- the challenge data set can include EMR with multiple variables for a plurality of anonymous patients with a death rate just under a given percentage.
- challenge data from the PhysioNet 2012 challenge set included EMR with 37 variables for 4,000 anonymous patients with a death rate just under 14%.
- Although the challenge data set is limited by its size, it provides an event sequence for multiple patients in ICUs and allows us to compare the performance of the framework to other prediction methodologies such as Simplified Acute Physiology Scores (SAPS).
- SAPS are used widely to assess effectiveness of clinical care, medications and treatment in the context of severity of illness within hospitals.
- the SAPS I scoring system was used to predict in-hospital survival for ICU data.
- machine learning methods are used to reduce the number of variables.
- a predictive model for severity of illness scale may be applied.
- Such a model may use a variety of physiologic measurements, such as elective surgery, age, and prior length of stay.
- An example of such a model is the Oxford Acute Severity of Illness Score ("OASIS") (Johnson AE, Kramer AA, Clifford GD.
- the scale can use a reduced set of a number of clinical variables (e.g., ten clinical variables) that result in a cumulative score that ranges between a lower bound (e.g., 0) and an upper bound (e.g., 75).
- the maximum contributions of the variables are not identical. Some variables can contribute a maximum of ten severity points in OASIS, while other variables can contribute a maximum of four severity points. Other contributions are contemplated. For example, a variable may contribute a maximum of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more severity points. Different variables can contribute different maximum severity points.
- the data was used to demonstrate that sequence-based variables are informative for predicting patient mortality in four types of ICUs.
- the PHRSM approach may be based upon the flowchart of FIG. 1.
- the EMR data can be converted into tagged event sequences, which are used to create new sequential variables that are evaluated for their utility in predicting in-hospital survival.
- the specific method for constructing the sequence variables involves using a distance measure related to the alignment of the patient's sequence to archetypal sequences.
- the PHRSM framework can be used with a large range of machine learning algorithms.
- the PHRSM framework can be used with a rule generating classifier (e.g., partial decision tree classification or "PART").
- One benefit of a rule generating classifier is that the rules can be examined for clinical relevance.
- data were obtained from the PhysioNet 2012 challenge.
- the data set consisted of 4000 cases with 554 (13.9%) being in-hospital deaths and the remaining 3446 being survivors.
- the data were composed of results from multiple medical tests for each patient over a period of time (e.g., 48 hours).
- the data set included patient SAPS I scores.
- the SAPS I score uses maximum values for a number of variables (e.g., 14 variables) over a period of time (e.g., 24 hours) and translates them into a severity score within a range (e.g., 0 to 4), with higher values indicating more severe events. These values are added to produce a cumulative value within a range (e.g., 0 to 56) with cumulative scores greater than or equal to a threshold (e.g., 20) used to predict death.
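The cumulative step of this scoring can be sketched as below. The mapping from raw measurements to per-variable severity points is not given in the text and is not implemented here; the function name and the sample point values are assumptions, while the 0 to 4 range, the 14 variables, and the threshold of 20 follow the examples above.

```python
def cumulative_severity(points, threshold=20):
    """Sum per-variable severity points (each 0 to 4) and compare to a threshold."""
    if any(p < 0 or p > 4 for p in points):
        raise ValueError("each severity score must lie between 0 and 4")
    total = sum(points)
    return total, total >= threshold


# Hypothetical severity points for 14 variables over a 24-hour window.
total, predicts_death = cumulative_severity([4, 4, 3, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0])
print(total, predicts_death)
```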
- variable selection algorithms are used to reduce the clinical variables (e.g., from 42 to 5).
- a small number of clinical variables e.g., four clinical variables
- These variables can be or include Age, Urine, Glasgow Coma Scale (GCS), and Mechanical Ventilator.
- Age and Urine are non-sequential variables and are used as informative variables in the classifier. While a non-sequential variable may vary over time (e.g., age increasing over time), the non-sequential variables remain constant across a sample time period for purposes of the comparison.
- An additional variable, ICU, can be used.
- ICU can include (1) Coronary Care Unit ("ICU1"), (2) Cardiac Surgery Recovery Unit ("ICU2"), (3) Medical ICU ("ICU3"), and (4) Surgical ICU ("ICU4").
- the clinical Age variable can be used to create a new discrete variable that indicates whether the patient is below or above an age threshold (e.g., younger than 79, or 79 or older).
- GCS and Mechanical Ventilator can be used to create new sequential variables using the PHRSM framework.
- the GCS and Mechanical Ventilator variables have multiple measurements over a period of time (e.g., 48 hours) that can be converted into sequences. To convert these into a sequence, the measurements are grouped and made into discrete events. For example, each measurement may be categorized into one of a plurality of levels. For example, a measurement may determine the presence or absence of a condition, therefore indicating a categorization into one of two levels. By further example, a measurement may determine the presence or absence of each of a plurality of conditions, the plurality of conditions being combinable to indicate a categorization into one of more than two levels (e.g., contiguous ranges). A similar method can be used for both variables. The below description focuses on the method of GCS sequence construction.
- the GCS scores are composed of a scale of consciousness measurements based on eye response, verbal response and motor response.
- the GCS can be divided into four levels, for example, with assigned letter labels: 3-7 (D), 8-13 (R), 14 (N) and 15 (A).
- a score of 3 indicates no eye, verbal, or motor response to stimulus.
- a score of fifteen means all three areas are functioning normally.
- Each level contributes a different number of severity points to the overall score with D contributing 10 severity points with the patient being in a coma state, R contributing 4, N contributing 3, and A contributing 0 with the patient being fully responsive.
- these letters may be treated like amino acids in a protein sequence.
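The level boundaries, letters, and severity points described above can be captured in a small lookup. This is only a sketch; the names `GCS_LEVELS` and `gcs_to_letter` and the sample scores are assumptions.

```python
# (lowest score, highest score, letter, severity points), as described in the text.
GCS_LEVELS = [
    (3, 7, "D", 10),
    (8, 13, "R", 4),
    (14, 14, "N", 3),
    (15, 15, "A", 0),
]


def gcs_to_letter(score):
    """Map a total GCS score (3 to 15) to its level letter."""
    for low, high, letter, _points in GCS_LEVELS:
        if low <= score <= high:
            return letter
    raise ValueError("GCS score must be between 3 and 15")


# Hypothetical run of four measurements, from fully responsive to coma.
letters = "".join(gcs_to_letter(s) for s in [15, 14, 9, 6])
print(letters)
```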
- Assigning letters to the hourly GCS scores constructs a new PHRSM sequential variable over the period of time (e.g., 48 hours) for each patient. For missing hourly measures, the most recent previous GCS measure can be used. This results in one score per hour (e.g., a total of 48 scores). For example, a patient with GCS scores in the 3-7 range over an entire 48-hour time period would have a sequence of 48 letter D's (i.e.
- the substitution matrix can be constructed based on the differences between severity scores for GCS with a substitution from D to A being a -10 point penalty.
- Table 1, below, provides the substitution matrix for GCS sequences. The penalty for starting a gap was -10 and the penalty for continuing a gap was -1, where a gap is a stretch of the sequence that does not align with the comparison sequence.
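Since the table itself is not reproduced in this text, the following is only a sketch of how such a matrix might be derived from the severity points. The off-diagonal rule (negative absolute difference of severity points) matches the stated -10 penalty for D to A; the diagonal match score of 10 and the treatment of the gap-opening penalty as covering the first gap position are assumptions.

```python
SEVERITY = {"D": 10, "R": 4, "N": 3, "A": 0}
MATCH_SCORE = 10  # assumption: score awarded when two letters are identical


def build_matrix(severity, match_score):
    """Penalize a substitution by the difference in severity points."""
    return {
        a: {
            b: (match_score if a == b else -abs(severity[a] - severity[b]))
            for b in severity
        }
        for a in severity
    }


def gap_cost(length, gap_open=-10, gap_extend=-1):
    """Affine gap cost: one opening penalty plus an extension penalty per extra position."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


matrix = build_matrix(SEVERITY, MATCH_SCORE)
print(matrix["D"]["A"], matrix["R"]["N"], gap_cost(3))
```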
- the sequence for Mechanical Ventilator can be generated in the same fashion with patients being on or off the ventilator over the period of time (e.g., 48 hours).
- a different substitution matrix was used with an alphabet of two letters: D for on ventilator and A for off ventilator.
- Table 2 shows the substitution matrix for the Ventilator sequence.
- the gap penalties can be the same as for GCS sequences.
- each patient's sequence can be compared to four archetypal sequences:
- a sequence alignment system may be utilized, for example, the open source system UGENE (Okonechnikov K, Golosova O, and Fursov M. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 28:1166-1167. 2012). Matches of 5% or lower can be assigned zero. Since sequences can have multiple subsequence alignments, the best match for an archetype can be selected and then stored as a sequence-based variable.
- sequence alignment results in sequence-based variables that are within a range (e.g., between 0 and 480), with the upper bound of the range (e.g., 480) representing a match on all of a number of measurements (e.g., 48 measurements) in the sequence.
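A sketch of turning alignment scores into a sequence-based variable is given below. The per-match score of 10 follows from the example range (480 for 48 matching measurements). Reading "matches of 5% or lower" as scores at or below 5% of the maximum is an interpretation, not something the text states, and the function names are hypothetical.

```python
def sequence_variable(alignment_score, n_measurements=48, match_score=10,
                      floor_fraction=0.05):
    """Clamp an alignment score to [0, max] and zero out very weak matches."""
    max_score = n_measurements * match_score
    if alignment_score <= floor_fraction * max_score:
        return 0
    return min(alignment_score, max_score)


def best_match(alignment_scores):
    """Keep the best of several subsequence alignments against one archetype."""
    return sequence_variable(max(alignment_scores))


print(best_match([12, 310, 95]))
```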
- a classifier may be used to execute machine learning algorithms.
- One exemplary classifier is a partial decision tree rule generator named PART that is available through Weka, an open source machine-learning environment. PART uses decision trees to construct a hierarchical set of rules called a PART Decision List. The rules are applied in order with the first rule applying to the full set and the last rule applying to the remaining set. Default parameters can be used when executing the machine learning algorithms.
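The ordered application of rules can be illustrated with a hand-written decision list. In practice PART learns its rules from data inside Weka; the rules, thresholds, and variable names below (`coma_match`, `on_ventilator_match`, `age`) are invented for the example and are not the rules reported for any ICU.

```python
def classify(patient, rules, default):
    """Apply rules in order; the first rule whose condition holds decides the outcome."""
    for condition, outcome in rules:
        if condition(patient):
            return outcome
    return default


# Hypothetical rules: "L" predicts life, "D" predicts death.
rules = [
    (lambda p: p["coma_match"] < 100 and p["age"] < 79, "L"),
    (lambda p: p["on_ventilator_match"] > 400, "D"),
]

young = {"coma_match": 0, "age": 60, "on_ventilator_match": 480}
older = {"coma_match": 300, "age": 85, "on_ventilator_match": 480}
print(classify(young, rules, default="L"), classify(older, rules, default="L"))
```

Each later rule effectively sees only the patients that earlier rules did not cover, which is what makes the list readable as a checklist.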
- a separate classifier can be constructed for each ICU using nine variables: four GCS sequence-based variables, one mechanical ventilator sequence-based variable, three clinical variables (ICU, Age, and Urine), and a new discrete informative variable that indicates whether the patient is younger than 79 or is 79 or older.
- the predictive ability of each classifier can be assessed using cross validation and a hold out data set to estimate the generalization performance (Kohavi, Ron. "A study of cross-validation and bootstrap for accuracy estimation and model selection." IJCAI. Vol. 14. No. 2. 1995).
- a subsample selection method can be used so that the classifiers are cross-validated on subsets with equal numbers of patients that survived or died in-hospital. The remaining patients can be used as hold out test sets (3210 out of 4000 patients).
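One way to sketch such a subsample selection is shown below. The figure of 395 patients per class is inferred from the numbers above (4000 minus 3210 held out leaves 790, split equally) and assumes a single split over the whole data set; the function name and the fixed random seed are also assumptions.

```python
import random


def balanced_split(labels, per_class, seed=0):
    """Pick per_class indices from each class; everything else is held out."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    selected = []
    for indices in by_class.values():
        selected.extend(rng.sample(indices, per_class))
    selected_set = set(selected)
    holdout = [i for i in range(len(labels)) if i not in selected_set]
    return sorted(selected), holdout


# "D" marks in-hospital death and "L" survival, using the counts given earlier.
labels = ["D"] * 554 + ["L"] * 3446
balanced, holdout = balanced_split(labels, per_class=395)
print(len(balanced), len(holdout))
```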
- the generated classification models can be stored along with the substitution matrices for the sequence variables, constructed variables and model parameters.
- classifiers were evaluated for performance using only two of the original clinical variables along with the five new sequential variables. Classifiers were compared with SAPS I, which uses fourteen of the clinical variables. In addition, it was determined how informative the new sequence based variables were.
- Table 3 shows confusion matrices for SAPS I, the sequence feature classifier (SFC) and all four ICUs using SFCs. Since the PhysioNet 2012 challenge involved examining how well algorithms predict mortality, the ability to predict in-hospital death or survival was evaluated. Looking at the SAPS I columns, SAPS I correctly predicted that 176 patients would die and 2868 patients would live. SAPS I incorrectly predicted 578 patients who actually lived as ones who would die. Similarly, SAPS I predicted 378 patients who actually died as ones who would live.
- the area under the ROC curve (AUC) (Cantor SB and Kattan MW. Determining the area under the ROC curve for a binary diagnostic test. Med Decis Making. 20(4):468-70. 2000) can be computed from Table 3, with SAPS I having an AUC of 0.58 and SFC having an AUC of 0.67.
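For a binary test, the single-point AUC equals the average of sensitivity and specificity. The sketch below applies that to the SAPS I counts quoted above (176 correctly predicted deaths, 378 missed deaths, 578 false alarms, 2868 correctly predicted survivors); the function and key names are assumptions. With these counts the AUC comes out near 0.575, close to the reported 0.58.

```python
def binary_test_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, positive predictivity, and single-point AUC."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    positive_predictivity = tp / (tp + fp)
    auc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_predictivity": positive_predictivity,
        "auc": auc,
    }


# "Positive" here means a prediction of in-hospital death.
saps = binary_test_metrics(tp=176, fn=378, fp=578, tn=2868)
print({name: round(value, 3) for name, value in saps.items()})
```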
- Table 4 also provides a summary of the sensitivity, positive predictivity and AUC for SAPS I and the overall SFC on the PhysioNet 2012 challenge data, along with the measures for each ICU.
- Table 4 shows Number of Observed Deaths, Number of Observed Survival, Sensitivity Score, Positive Predictivity (Pos) Score, Minimum (Min) Value and Area Under the ROC (AUC) for SAPS I, the overall SFC, and the SFC for the four Intensive Care Units (ICUs).
- ReliefF uses a nearest neighbor algorithm to calculate how well single variables discriminate between classes. The sequence variables rank highly across the four ICUs with GCS sequence variables consistently scoring the highest.
- Table 5 shows ranked attribute (ReliefF scores) for the four intensive care units (ICUs). All the variables, except Urine, Age, and Age_Discrete, are newly created sequential variables.
- Each ICU classifier is described with respect to its rule set used to classify patients' in-hospital survival or death.
- Table 6, below, provides a list of the decision rules for each ICU.
- the high-ranking sequence variables are included in conjunctive rules that are ranked as the highest discriminating rules by the PART Decision List algorithm.
- the rules in the classifiers can be interpreted medically.
- Table 6 shows PART Decision Lists for predicting life (L) and death (D) across the four Intensive Care Units (ICUs). All the variables, except Urine, Age, and Age_Discrete, are newly created sequential variables.
- Rule 1 for ICU1 can be read as: patients who are not consistently in a coma and are younger are predicted to live.
- Rule 2 can be read as, patients that do not have a consistent sequence of normal GCS scores, are degrading to lower GCS scores, are older and can be on a ventilator are predicted to die in-hospital.
- sequence variables across the classifiers demonstrates their utility in predicting survival in-hospitals.
- sequence-based variables are informative and occurred as part of the first rule in all four classifiers. They provide trending and time series information that is lacking in other models of mortality, which use minimum or maximum values over a range and thus lose information. Sequence-based variables supported the generation of compact sets of rules, composed of eight rules or fewer for the different units. This exemplary approach supports the use of medically meaningful checklists that clinicians can understand in relationship to their patients. Other classifiers such as support vector machines, decision trees, Bayesian classifiers and others could also use the sequence-based variables for predictive analytics.
- the SFC outperformed SAPS I on the PhysioNet Challenge data with SFC having an AUC of 0.67 versus 0.58 for SAPS I.
- the AUC values were higher than SAPS I for each of the four ICUs.
- the sequence variables in the SFC were highly ranked as informative by the ReliefF algorithm and were used prominently across the PART Decision List classifiers. Therefore, sequence variables created through the PHRSM framework can be used on medical data to predict outcomes and identify medically meaningful signal in patients' EMR data.
- analytical methods and systems of the subject technology may be applied to predict events, selections, trends, and behaviors relating to a user on a user interface.
- An exemplary method 600 is shown in FIG. 6.
- a user may, during a user session, have an encounter with a user interface.
- the user interface may include or be connected to a computer-implemented system, such as a personal computer, an electronic device, a website, a server, combinations thereof, and the like, as further disclosed herein.
- one or more events may occur and be recorded in a data file (operation 602).
- a user may make selections or otherwise provide inputs to the user interface.
- the user interface may provide displays or outputs to the user.
- Each of these events may be recorded with an indicator of a time associated with the encounter (operation 604).
- Events may be recorded using tracking techniques.
- a user may have a user account associated with a user interface, such that information provided by the user is recorded in a data file associated with the user account.
- a unique identifier associated with the user e.g., IP address
- cookies may be used to store data associated with the user and/or events of a user session.
- exemplary implementations of tracking techniques include web beacons, tracking bugs, tags, page tags, web bugs, tracking pixels, pixel tags, 1 x 1 gifs, clear gifs, and JavaScript tags.
- Such implementations may record or facilitate recording of events during a user session, such as selections made by a user, websites visited before an event, websites visited after an event, advertisements selected, purchases executed, and the like. Such information is stored, for example, as clickstream data.
- the data is stored with time indicators to create a time-sequential record of the user (operation 606).
- the time-sequential records may span or include one or more user sessions.
- the time-sequential records may include one or more sequential indicators of the user and one or more non-sequential indicators of the user.
- a target selection by the user is determined.
- the target selection may be a purchase executed by the user, display of a website, or any other selection made by the user and/or input to the user interface.
- the target selection may be a desired outcome according to an operator (e.g., of the user interface).
- a user sequential record is compared to other time-sequential records of other users (operation 608).
- the other time-sequential records may include an indicator of whether or not the other users achieved the target selection.
- the comparison may include techniques disclosed herein. For example, a substitution matrix may be applied to assign penalties for substituting an indicator of the user sequential record for an indicator of one of the other time-sequential records, as disclosed herein. User penalties may be applied for starting a gap between the user sequential record and one of the other time-sequential records and/or continuing a gap between the user sequential record and the one of the other time-sequential records.
- cohorts of users may be evaluated according to a degree of similarity with respect to the user sequential record (operation 610).
- a cohort of users having similar sequential records is identified by determining which of the other sequential records have a degree of similarity to the user sequential record. Accordingly, one or more events associated with the sequential records of the identified cohort is determined to be applicable to the user.
- An event preceding the occurrence of the target selection among the identified cohort is determined as being applicable to the user to potentially lead to the user making the target selection (operation 612). For example, an earlier purchase, website viewing, or other online activity by the identified cohort is determined as preceding a target purchase. Accordingly, the same or a similar event is provided or facilitated by the user interface with respect to the user.
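As a simple sketch of this step, the event that most often comes immediately before the target selection can be counted across the cohort's records. Looking only at the immediately preceding event is a simplifying assumption, and the function name and clickstream events are hypothetical.

```python
from collections import Counter


def most_common_preceding_event(cohort_records, target):
    """Find the event that most often occurs directly before the target selection."""
    counts = Counter()
    for record in cohort_records:
        if target in record:
            position = record.index(target)
            if position > 0:
                counts[record[position - 1]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical time-ordered clickstream records for a cohort of similar users.
cohort = [
    ["home", "ad_17", "purchase"],
    ["search", "ad_17", "purchase"],
    ["home", "review", "purchase"],
    ["home", "exit"],
]
print(most_common_preceding_event(cohort, "purchase"))
```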
- an advertisement displayed to the cohort prior to a target purchase is identified and/or displayed to the user to facilitate the same target purchase by the user.
- events leading away from a target selection or not leading to a target selection can be identified in a cohort.
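As a rough sketch of operations 610 and 612, a cohort can be selected by thresholding similarity scores, and the events that occurred before the target selection in the cohort's records can then be tallied. The names `identify_cohort` and `events_preceding_target`, the threshold rule, and the representation of a record as a plain list of event labels are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def identify_cohort(scores: Dict[str, float], min_score: float) -> List[str]:
    """Return ids of records whose similarity to the user record meets the threshold."""
    return [record_id for record_id, score in scores.items() if score >= min_score]


def events_preceding_target(
    records: Dict[str, List[str]], cohort: List[str], target: str
) -> List[Tuple[str, int]]:
    """Count, per event, how many cohort records contain it before the target."""
    counts: Counter = Counter()
    for record_id in cohort:
        events = records[record_id]
        if target not in events:
            continue  # this cohort member never reached the target selection
        cut = events.index(target)
        # Use a set so each event counts at most once per record.
        counts.update(set(events[:cut]))
    return counts.most_common()
```

An event that appears before the target in most of the cohort's records (for example, a particular advertisement) is a candidate to present to the user. The same tally run over cohort records that did not reach the target would highlight events leading away from it.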
- one or more indicators may be evaluated on the basis of one or more predictive features thereof.
- Predictive features include positive predictive value, negative predictive value, sensitivity, and specificity.
- multiple indicators with respect to the user records and/or the other records may be separately evaluated and subsequently compared.
- Each indicator, or combinations thereof, may be separately correlated with the degree of similarity between the user sequential record and the other time-sequential records.
- each indicator, or combinations thereof, may be separately correlated with the occurrence of the target selection according to the user records and/or the other records.
- the indicators may be ranked according to the predictive feature(s).
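The predictive features named above follow from the standard confusion-matrix definitions. The sketch below computes them for each indicator and ranks the indicators by one chosen feature. The function names and the boolean-list data layout (one entry per record) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def predictive_features(
    has_indicator: Sequence[bool], reached_target: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Compute PPV, NPV, sensitivity and specificity for one indicator."""
    pairs = list(zip(has_indicator, reached_target))
    tp = sum(1 for h, r in pairs if h and r)
    fp = sum(1 for h, r in pairs if h and not r)
    fn = sum(1 for h, r in pairs if not h and r)
    tn = sum(1 for h, r in pairs if not h and not r)

    def ratio(num: int, den: int) -> Optional[float]:
        return num / den if den else None  # undefined when the denominator is zero

    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def rank_indicators(
    indicator_flags: Dict[str, Sequence[bool]],
    reached_target: Sequence[bool],
    feature: str = "ppv",
) -> List[str]:
    """Order indicator names from highest to lowest value of the chosen feature."""
    scored = {
        name: predictive_features(flags, reached_target)[feature]
        for name, flags in indicator_flags.items()
    }
    # Indicators with an undefined feature value are placed last.
    return sorted(scored, key=lambda n: (scored[n] is None, -(scored[n] or 0.0)))
```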
- analytical methods and systems of the subject technology may be applied to predict events, selections, trends, and behaviors relating to a financial asset of interest.
- An exemplary method 700 is shown in FIG. 7.
- the asset of interest may be any property or financial instrument having economic value, including but not limited to stocks, bonds, options, precious metals, equity, contractual rights, real estate, cash, combinations thereof and the like.
- one or more events may occur, including changes in value of a financial asset, published reports of events (including market and non-market related events), broadcasts, social media entries, combinations thereof, and the like (operation 702).
- one or more events may occur and be recorded in a data file associated with an asset of interest (operation 704).
- an asset may increase or decrease in value.
- a market related event may be reported in a publication.
- a non-market related event may be reported in a publication, etc.
- Each of these events may be recorded with an indicator of a time associated with the event.
- Events may be monitored, collected, and/or recorded using data aggregators, news aggregators, social network aggregators, internet search engines, combinations thereof, and the like.
- reports, publications, and broadcasts may be retrieved and analyzed for information provided therein.
- Information may be extracted from sources such as news websites, blogs, podcasts, video blogs, social media websites (e.g., Twitter), and the like.
- the information may directly relate to or reference the asset of interest, or the information may be indirectly related to the asset of interest.
- the information is stored as data with time indicators to create a time-sequential record of the events session (operation 706).
- the time-sequential records may span or include one or more events sessions.
- the time-sequential records may include one or more sequential indicators of the asset of interest and one or more non-sequential indicators of the asset of interest.
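One way to picture the record built in operation 706 is as a time-ordered list of events plus a separate map of non-sequential indicators. The sketch below is illustrative only: the class and function names are hypothetical, and splitting a record into events sessions by a maximum time gap is an assumption, since the text does not say how sessions are delimited.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional


@dataclass
class Event:
    time: datetime   # time indicator recorded with the event
    indicator: str   # e.g. "price_up", "news_article" (hypothetical labels)


@dataclass
class TimeSequentialRecord:
    asset: str
    sequential: List[Event] = field(default_factory=list)         # ordered by time
    non_sequential: Dict[str, str] = field(default_factory=dict)  # e.g. asset class


def build_record(
    asset: str,
    events: Iterable[Event],
    non_sequential: Optional[Dict[str, str]] = None,
) -> TimeSequentialRecord:
    """Order the collected events by their time indicators."""
    ordered = sorted(events, key=lambda e: e.time)
    return TimeSequentialRecord(asset, ordered, dict(non_sequential or {}))


def split_sessions(record: TimeSequentialRecord, max_gap: timedelta) -> List[List[Event]]:
    """Group consecutive events into sessions; a gap above max_gap starts a new one."""
    sessions: List[List[Event]] = []
    for event in record.sequential:
        if sessions and event.time - sessions[-1][-1].time <= max_gap:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions
```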
- an action to be taken with respect to the asset of interest may be identified.
- the action may be the purchase, sale, transfer, or trade of the asset of interest.
- a target event related to the asset of interest may be identified.
- the target event may be a desired event or an event in response to which an action is desirable.
- the target event may be an increase or decrease in value of the asset of interest.
- an asset sequential record is compared to other time-sequential records of the same or other assets across the same or different time periods (operation 708).
- the other time-sequential records may include an indicator of whether or not the same or other assets achieved a target event.
- the comparison may include techniques disclosed herein. For example, a substitution matrix may be applied to assign penalties for substituting an indicator of the asset sequential record for an indicator of one of the other time-sequential records, as disclosed herein. User penalties may be applied for starting a gap between the asset sequential record and one of the other time-sequential records and/or continuing a gap between the asset sequential record and the one of the other time-sequential records.
- cohorts of assets and/or event sessions may be evaluated according to a degree of similarity with respect to the asset sequential record (operation 710).
- a cohort of assets and/or event sessions having similar sequential records is identified by determining which of the other sequential records have a degree of similarity to the asset sequential record. Accordingly, one or more events associated with the sequential records of the identified cohort is determined to be applicable to the asset.
- An event preceding the occurrence of the target event among the identified cohort is determined as being applicable to the asset to potentially lead to the target event (operation 712). For example, an increase or decrease in value of the asset of interest or another asset, publication of a news article containing particular information, and/or posting of information on a social networking platform is determined as preceding a target event.
- the same or a similar event is predicted or forecast with respect to the asset of interest.
- an action to be taken with respect to the asset of interest is identified and/or executed to achieve a result identical or similar to a result of the identified cohort.
- events leading away from a target event or not leading to a target event can be identified in a cohort.
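The following sketch illustrates comparing an asset's recent record against earlier time periods of a historical record and forecasting the next event from the most similar periods. It deliberately swaps in Python's `difflib.SequenceMatcher` ratio as a stand-in similarity measure rather than the substitution-matrix and gap-penalty comparison described above. The function names, the sliding-window scheme, and the majority vote are assumptions made for the example.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def windows_with_next(history: List[str], length: int) -> List[Tuple[List[str], str]]:
    """Slide a window over the history, pairing each window with the event after it."""
    return [
        (history[i : i + length], history[i + length])
        for i in range(len(history) - length)
    ]


def forecast_next(current: List[str], history: List[str], top_k: int = 3) -> Optional[str]:
    """Forecast the next event by voting among the top_k most similar past windows."""
    candidates = windows_with_next(history, len(current))
    # Stand-in similarity: ratio of matching elements between the two sequences.
    ranked = sorted(
        candidates,
        key=lambda c: SequenceMatcher(None, current, c[0]).ratio(),
        reverse=True,
    )
    votes = Counter(next_event for _, next_event in ranked[:top_k])
    return votes.most_common(1)[0][0] if votes else None
```

Here the cohort is the set of `top_k` most similar historical windows, and the event that most often followed them is the forecast for the asset of interest.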
- one or more indicators may be evaluated on the basis of one or more predictive features thereof.
- Predictive features include positive predictive value, negative predictive value, sensitivity, and specificity.
- multiple indicators with respect to the asset records and/or the other records may be separately evaluated and subsequently compared.
- Each indicator, or combinations thereof, may be separately correlated with the degree of similarity between the asset sequential record and the other time-sequential records.
- each indicator, or combinations thereof, may be separately correlated with the occurrence of the target event according to the asset records and/or the other records.
- the indicators may be ranked according to the predictive feature(s).
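Correlating an indicator with the degree of similarity can be illustrated with a Pearson correlation between a 0/1 indicator flag and the similarity score of each compared record. This is only one possible choice of correlation measure. The function names and the example indicator labels used in the usage note are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(
    indicator_flags: Dict[str, Sequence[bool]], similarity: Sequence[float]
) -> List[Tuple[str, float]]:
    """Rank indicators by the absolute correlation of their presence with similarity."""
    corr = {
        name: pearson([1.0 if f else 0.0 for f in flags], similarity)
        for name, flags in indicator_flags.items()
    }
    return sorted(corr.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

With similarity scores `[0.9, 0.8, 0.2, 0.1]`, an indicator present only in the two most similar records (say, `news_spike`) ranks above one spread evenly across all records.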
- FIG. 3 illustrates a simplified diagram of a system 101, in accordance with various embodiments of the subject technology.
- the system 101 may include one or more remote client devices 102 (e.g., client devices 102a, 102b, 102c, and 102d) in communication with a server computing device 106 (server) via a network 104.
- the server 106 is configured to run applications that may be accessed and controlled at the client devices 102.
- a user at a client device 102 may use a web browser to access and control an application running on the server 106 over the network 104.
- the server 106 is configured to allow remote sessions (e.g., remote desktop sessions) wherein users can access applications and files on the server 106 by logging onto the server 106 from a client device 102.
- Such a connection may be established using any of several well-known techniques such as the Remote Desktop Protocol (RDP) on a Windows-based server.
- a server application is executed (or runs) at a server 106. While a remote client device 102 may receive and display a view of the server application on a display local to the remote client device 102, the remote client device 102 does not execute (or run) the server application at the remote client device 102. Stated in another way from a perspective of the client side (treating a server as remote device and treating a client device as a local device), a remote application is executed (or runs) at a remote server 106.
- a client device 102 can represent a computer, a mobile phone, a laptop computer, a thin client device, a personal digital assistant (PDA), a portable computing device, or a suitable device with a processor.
- a client device 102 is a smartphone (e.g., iPhone, Android phone, Blackberry, etc.).
- a client device 102 can represent an audio player, a game console, a camera, a camcorder, an audio device, a video device, a multimedia device, or a device capable of supporting a connection to a remote server.
- a client device 102 can be mobile.
- a client device 102 can be stationary.
- a client device 102 may be a device having at least a processor and memory, where the total amount of memory of the client device 102 could be less than the total amount of memory in a server 106.
- a client device 102 does not have a hard disk.
- a client device 102 has a display smaller than a display supported by a server 106.
- a client device may include one or more client devices.
- a server 106 may represent a computer, a laptop computer, a computing device, a virtual machine (e.g., VMware® Virtual Machine), a desktop session (e.g., Microsoft Terminal Server), a published application (e.g., Microsoft Terminal Server) or a suitable device with a processor.
- a server 106 can be stationary.
- a server 106 can be mobile.
- a server 106 may be any device that can represent a client device.
- a server 106 may include one or more servers.
- a first device is remote to a second device when the first device is not directly connected to the second device.
- a first remote device may be connected to a second device over a communication network such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or other network.
- a client device 102 may connect to a server 106 over a network 104, for example, via a modem connection, a LAN connection including Ethernet, a broadband WAN connection including DSL, Cable, T1, T3, Fiber Optics, or Wi-Fi, a mobile network connection including GSM, GPRS, 3G, or WiMax, or another network connection.
- a network 104 can be a LAN network, a WAN network, a wireless network, the Internet, an intranet or other network.
- a network 104 may include one or more routers for routing data between client devices and/or servers.
- a remote device (e.g., client device, server) on a network can be addressed by a corresponding network address, such as, but not limited to, an Internet protocol (IP) address, an Internet name, a Windows Internet name service (WINS) name, a domain name or other system name.
- the terms “server” and “remote server” are generally used synonymously in relation to a client device, and the word “remote” may indicate that a server is in communication with other device(s), for example, over a network connection(s).
- the terms “client device” and “remote client device” are generally used synonymously in relation to a server, and the word “remote” may indicate that a client device is in communication with a server(s), for example, over a network connection(s).
- a “client device” may be sometimes referred to as a client or vice versa.
- a “server” may be sometimes referred to as a server device or vice versa.
- a client device may be referred to as a local client device or a remote client device, depending on whether a client device is described from a client side or from a server side, respectively.
- a server may be referred to as a local server or a remote server, depending on whether a server is described from a server side or from a client side, respectively.
- an application running on a server may be referred to as a local application, if described from a server side, and may be referred to as a remote application, if described from a client side.
- devices placed on a client side may be referred to as local devices with respect to a client device and remote devices with respect to a server.
- devices placed on a server side may be referred to as local devices with respect to a server and remote devices with respect to a client device.
- FIG. 4 illustrates a simplified block diagram of a server 106, in accordance with various embodiments of the subject technology.
- the server 106 comprises a first display module 202, a user input module 204, a second display module 206, a patient input module 208, and an adjustment module 210.
- the server 106 is communicatively coupled with the network 104 via a network interface.
- the modules can be implemented in software, hardware and/or a combination of both. Features and functions of these modules according to various aspects are further described in the subject technology.
- FIG. 5 is a conceptual block diagram illustrating an example of a system, in accordance with various embodiments of the subject technology.
- a system 601 may be, for example, a client device (e.g., client device 102) or a server (e.g., server 106).
- the system 601 may include a processing system 602.
- the processing system 602 is capable of communication with a receiver 606 and a transmitter 609 through a bus 604 or other structures or devices. It should be understood that communication means other than busses can be utilized with the disclosed configurations.
- the processing system 602 can generate audio, video, multimedia, and/or other types of data to be provided to the transmitter 609 for communication.
- audio, video, multimedia, and/or other types of data can be received at the receiver 606, and processed by the processing system 602.
- the processing system 602 may include a processor for executing instructions and may further include a machine-readable medium 619, such as a volatile or non-volatile memory, for storing data and/or instructions for software programs.
- the instructions, which may be stored in a machine-readable medium 610 and/or 619, may be executed by the processing system 602 to control and manage access to the various networks, as well as provide other communication and processing functions.
- the instructions may also include instructions executed by the processing system 602 for various user interface devices, such as a display 612 and a keypad 614.
- the processing system 602 may include an input port 622 and an output port 624. Each of the input port 622 and the output port 624 may include one or more ports.
- the input port 622 and the output port 624 may be the same port (e.g., a bi-directional port) or may be different ports.
- the processing system 602 may be implemented using software, hardware, or a combination of both.
- the processing system 602 may be implemented with one or more processors.
- a processor may be a general-purpose microprocessor, a microcontroller, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a state machine, gated logic, discrete hardware components, or any other suitable device that can perform calculations or other manipulations of information.
- a machine-readable medium can be one or more machine-readable media.
- Software shall be construed broadly to mean instructions, data, or any combination thereof, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable format of code).
- Machine-readable media may include storage integrated into a processing system, such as might be the case with an ASIC.
- Machine-readable media may also include storage external to a processing system, such as a Random Access Memory (RAM), a flash memory, a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable PROM (EPROM), registers, a hard disk, a removable disk, a CD-ROM, a DVD, or any other suitable storage device.
- a machine-readable medium is a computer-readable medium encoded or stored with instructions and is a computing element, which defines structural and functional interrelationships between the instructions and the rest of the system, which permit the instructions' functionality to be realized.
- a machine-readable medium is a non-transitory machine-readable medium, a machine-readable storage medium, or a non-transitory machine-readable storage medium.
- a computer-readable medium is a non-transitory computer-readable medium, a computer-readable storage medium, or a non-transitory computer-readable storage medium.
- Instructions may be executable, for example, by a client device or server or by a processing system of a client device or server. Instructions can be, for example, a computer program including code.
- An interface 616 may be any type of interface and may reside between any of the components shown in FIG. 6.
- An interface 616 may also be, for example, an interface to the outside world (e.g., an Internet network interface).
- a transceiver block 607 may represent one or more transceivers, and each transceiver may include a receiver 606 and a transmitter 609.
- a functionality implemented in a processing system 602 may be implemented in a portion of a receiver 606, a portion of a transmitter 609, a portion of a machine-readable medium 610, a portion of a display 612, a portion of a keypad 614, or a portion of an interface 616, and vice versa.
- the word “module” refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, C++.
- a software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpretive language such as BASIC. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts.
- Software instructions may be embedded in firmware, such as an EPROM or EEPROM.
- hardware modules may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.
- the modules described herein are preferably implemented as software modules, but may be represented in hardware or firmware.
- the modules may be integrated into a fewer number of modules.
- One module may also be separated into multiple modules.
- the described modules may be implemented as hardware, software, firmware or any combination thereof. Additionally, the described modules may reside at different locations connected through a wired or wireless network, or the Internet.
- the processors can include, by way of example, computers, program logic, or other substrate configurations representing data and instructions, which operate as described herein.
- the processors can include controller circuitry, processor circuitry, processors, general purpose single-chip or multi-chip microprocessors, digital signal processors, embedded microprocessors, microcontrollers and the like.
- the program logic may advantageously be implemented as one or more components.
- the components may advantageously be configured to execute on one or more processors.
- the components include, but are not limited to, software or hardware components, modules such as software modules, object-oriented software components, class components and task components, processes, methods, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
- a term such as “top” should be understood as referring to an arbitrary frame of reference, rather than to the ordinary gravitational frame of reference.
- a top surface, a bottom surface, a front surface, and a rear surface may extend upwardly, downwardly, diagonally, or horizontally in a gravitational frame of reference.
- a phrase such as “an aspect” does not imply that such aspect is essential to the subject technology or that such aspect applies to all configurations of the subject technology.
- a disclosure relating to an aspect may apply to all configurations, or one or more configurations.
- An aspect may provide one or more examples of the disclosure.
- a phrase such as “an aspect” may refer to one or more aspects and vice versa.
- a phrase such as “an embodiment” does not imply that such embodiment is essential to the subject technology or that such embodiment applies to all configurations of the subject technology.
- a disclosure relating to an embodiment may apply to all embodiments, or one or more embodiments.
- An embodiment may provide one or more examples of the disclosure.
- a phrase such as “an embodiment” may refer to one or more embodiments and vice versa.
- a phrase such as “a configuration” does not imply that such configuration is essential to the subject technology or that such configuration applies to all configurations of the subject technology.
- a disclosure relating to a configuration may apply to all configurations, or one or more configurations.
- a configuration may provide one or more examples of the disclosure.
- a phrase such as “a configuration” may refer to one or more configurations and vice versa.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Business, Economics & Management (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- General Health & Medical Sciences (AREA)
- Epidemiology (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Operations Research (AREA)
- Theoretical Computer Science (AREA)
- General Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/629,473 US20140089004A1 (en) | 2012-09-27 | 2012-09-27 | Patient cohort laboratory result prediction |
| US13/629,465 US20140089003A1 (en) | 2012-09-27 | 2012-09-27 | Patient health record similarity measure |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2014052921A2 (fr) | 2014-04-03 |
| WO2014052921A3 (fr) | 2014-07-31 |
Family
ID=50389159
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2013/062460 (Ceased), WO2014052921A2 (fr) | 2012-09-27 | 2013-09-27 | Patient health record similarity measure |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2014052921A2 (fr) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2016034996A1 (fr) * | 2014-09-03 | 2016-03-10 | Optum, Inc. | Healthcare similarity engine and dashboard |
| CN109074344A (zh) * | 2016-03-25 | 2018-12-21 | 摄取技术有限公司 | Computer systems and methods for creating asset-related tasks based on predictive models |
| US10431343B2 (en) | 2016-05-23 | 2019-10-01 | Koninklijke Philips N.V. | System and method for interpreting patient risk score using the risk scores and medical events from existing and matching patients |
| US20190341153A1 (en) * | 2018-05-01 | 2019-11-07 | International Business Machines Corporation | Generating personalized treatment options using precision cohorts and data driven models |
| CN113707255A (zh) * | 2021-08-31 | 2021-11-26 | 平安科技(深圳)有限公司 | Health guidance method, apparatus, computer device and medium based on similar patients |
| EP3751473A4 (fr) * | 2018-02-09 | 2021-12-08 | Axion Research Inc. | System that estimates the state of a complex system to be inspected |
| US11373758B2 (en) * | 2018-09-10 | 2022-06-28 | International Business Machines Corporation | Cognitive assistant for aiding expert decision |
| WO2022245836A1 (fr) * | 2021-05-17 | 2022-11-24 | Dexcom, Inc. | Systems for determining similarity of sequences of glucose values |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120010867A1 (en) * | 2002-12-10 | 2012-01-12 | Jeffrey Scott Eder | Personalized Medicine System |
| US8200775B2 (en) * | 2005-02-01 | 2012-06-12 | Newsilike Media Group, Inc | Enhanced syndication |
| WO2007006408A2 (fr) * | 2005-07-08 | 2007-01-18 | Bayer Healthcare Ag | Methods and kits for predicting and monitoring direct response to cancer therapy |
| US9997260B2 (en) * | 2007-12-28 | 2018-06-12 | Koninklijke Philips N.V. | Retrieval of similar patient cases based on disease probability vectors |
| CA3149767A1 (en) * | 2009-07-16 | 2011-01-20 | Bluefin Labs, Inc. | Estimating and displaying social interest in time-based media |
| US8706521B2 (en) * | 2010-07-16 | 2014-04-22 | Naresh Ramarajan | Treatment related quantitative decision engine |
- 2013-09-27: WO application PCT/US2013/062460, patent WO2014052921A2 (fr), not active (Ceased)
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10127359B2 (en) | 2014-09-03 | 2018-11-13 | Optum, Inc. | Healthcare similarity engine |
| US10180777B2 (en) | 2014-09-03 | 2019-01-15 | Optum, Inc. | Healthcare similarity engine dashboard |
| WO2016034996A1 (fr) * | 2014-09-03 | 2016-03-10 | Optum, Inc. | Healthcare similarity engine and dashboard |
| CN109074344A (zh) * | 2016-03-25 | 2018-12-21 | 摄取技术有限公司 | Computer systems and methods for creating asset-related tasks based on predictive models |
| EP3433758A4 (fr) * | 2016-03-25 | 2019-09-11 | Uptake Technologies, Inc. | Computer systems and methods for creating asset-related tasks based on predictive models |
| US10796235B2 (en) | 2016-03-25 | 2020-10-06 | Uptake Technologies, Inc. | Computer systems and methods for providing a visualization of asset event and signal data |
| US11017302B2 (en) | 2016-03-25 | 2021-05-25 | Uptake Technologies, Inc. | Computer systems and methods for creating asset-related tasks based on predictive models |
| US10431343B2 (en) | 2016-05-23 | 2019-10-01 | Koninklijke Philips N.V. | System and method for interpreting patient risk score using the risk scores and medical events from existing and matching patients |
| EP3751473A4 (fr) * | 2018-02-09 | 2021-12-08 | Axion Research Inc. | System that estimates the state of a complex system to be inspected |
| US20190341153A1 (en) * | 2018-05-01 | 2019-11-07 | International Business Machines Corporation | Generating personalized treatment options using precision cohorts and data driven models |
| US11610688B2 (en) * | 2018-05-01 | 2023-03-21 | Merative Us L.P. | Generating personalized treatment options using precision cohorts and data driven models |
| US11373758B2 (en) * | 2018-09-10 | 2022-06-28 | International Business Machines Corporation | Cognitive assistant for aiding expert decision |
| WO2022245836A1 (fr) * | 2021-05-17 | 2022-11-24 | Dexcom, Inc. | Systems for determining similarity of sequences of glucose values |
| CN113707255A (zh) * | 2021-08-31 | 2021-11-26 | 平安科技(深圳)有限公司 | Health guidance method, apparatus, computer device and medium based on similar patients |
| CN113707255B (zh) * | 2021-08-31 | 2023-09-26 | 平安科技(深圳)有限公司 | Health guidance method, apparatus, computer device and medium based on similar patients |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2014052921A3 (fr) | 2014-07-31 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13840413; Country of ref document: EP; Kind code of ref document: A2 |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 13840413; Country of ref document: EP; Kind code of ref document: A2 |