WO2025111262A1 - Systems and methods for analyzing a corpus of documents using large language machine learning models - Google Patents
- Publication number
- WO2025111262A1 (PCT/US2024/056510)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- documents
- processors
- data
- machine learning
- computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Definitions
- the present disclosure is generally directed to methods and systems for using machine learning models to identify, review, analyze, and catalogue a corpus of documents, and more particularly, to techniques for training and operating one or more large language machine learning models to map information amongst such a corpus.
- a computing system for analyzing a corpus of documents using one or more large language machine learning models includes one or more processors, and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: (i) identify, via the one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; (ii) analyze, via the one or more processors, the one or more documents to generate extracted data for the at least one user from the one or more documents; (iii) generate, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and (iv) cause, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
- a non-transitory computer-readable medium includes instructions that when executed, cause a computer to: (i) identify, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; (ii) analyze, via the one or more processors, the one or more documents to generate extracted data for the at least one user from the one or more documents; (iii) generate, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and (iv) cause, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
- a computer-implemented method for analyzing a corpus of documents using one or more large language machine learning models includes: (i) identifying, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; (ii) analyzing, via the one or more processors, the one or more documents to generate extracted data for the at least one user from the one or more documents; (iii) generating, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and (iv) causing, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
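- As a non-limiting illustration of the four recited operations (identify, extract, map, display), the following Python sketch wires them together as plain functions; the function names, the `llm` callable, the dictionary record format, and the keyword-style extraction are hypothetical placeholders rather than a required implementation.

```python
from typing import Callable

def identify_documents(corpus: list[dict], user_id: str) -> list[dict]:
    """Step (i): select documents in the corpus associated with the at least one user."""
    return [doc for doc in corpus if doc.get("user_id") == user_id]

def extract_data(documents: list[dict]) -> list[dict]:
    """Step (ii): generate extracted data (here, type, date, and text) for the user."""
    return [
        {"doc_id": d["doc_id"], "doc_type": d.get("doc_type"), "date": d.get("date"), "text": d["text"]}
        for d in documents
    ]

def map_extracted_data(extracted: list[dict], llm: Callable[[str], str]) -> dict:
    """Step (iii): use a trained LLM to produce a mapping of the extracted data across the corpus."""
    prompt = "Summarize the data types and relevant dates in these records:\n"
    prompt += "\n".join(f"- {e['doc_type']} ({e['date']}): {e['text'][:200]}" for e in extracted)
    return {"user_summary": llm(prompt), "entries": extracted}

def display_mapping(mapping: dict) -> None:
    """Step (iv): cause the mapping to be displayed via an output device (stdout here)."""
    print(mapping["user_summary"])
    for entry in mapping["entries"]:
        print(f"{entry['date']}: {entry['doc_type']} (doc {entry['doc_id']})")

if __name__ == "__main__":
    corpus = [
        {"doc_id": 1, "user_id": "patient-42", "doc_type": "mammogram report",
         "date": "2023-05-01", "text": "Screening mammogram, BI-RADS 2."},
        {"doc_id": 2, "user_id": "patient-07", "doc_type": "clinical note",
         "date": "2023-06-10", "text": "Follow-up visit."},
    ]
    fake_llm = lambda p: "1 mammogram report dated 2023-05-01; no biopsy report found."
    display_mapping(map_extracted_data(extract_data(identify_documents(corpus, "patient-42")), fake_llm))
```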
- FIG. 1 depicts an exemplary computing environment in which the techniques disclosed herein may be implemented, according to some implementations.
- FIG. 2 depicts a combined block and logic diagram in which exemplary computer-implemented methods and systems for training a large language model are implemented, according to some implementations.
- FIG. 3 depicts an exemplary network including a user device, cloud platform, and data sources, which may be implemented in the exemplary computing environment of FIG. 1, according to some implementations.
- FIG. 4 depicts an exemplary user interface for presenting mapped data generated by a machine learning model, which may be implemented in the exemplary computing environment of FIG. 1 , according to some implementations.
- FIG. 5 depicts an exemplary computer-implemented method for analyzing a corpus of documents using one or more large language machine learning models, according to some implementations.
- records may include scanned documents and various application and/or database specific (e.g., CareEverywhere) documents.
- a clinician and/or other individual may utilize the documents to perform visit triage and clinical care.
- existing processes for such tasks do not make optimal use of artificial intelligence and large language models, which, applied correctly, could yield large time savings. These time savings would serve to reduce administrative burden and burnout, while improving patient satisfaction with the referral process, as well as patient outcomes in cases where lengthy triage delays can be averted.
- Some clinics, in particular, manage their own triage process(es) as opposed to relying on the Enterprise Office of Access Management (EOAM).
- the clinics self-manage triage processes due to the complex nature of the patients referred to them. Referred patients, for example, may need a combination of imaging and pathology results available prior to the initial consultation. For patients who have not completed this testing elsewhere, it may be performed at such a clinic.
- the current processes for cataloging this information per patient are highly manual and time consuming, and present opportunities for introducing efficiencies through automation.
- the use of an application may be introduced in conjunction with LLMs in order to increase both time savings and clinician satisfaction with outside medical records review.
- the techniques described herein are capable of handling heavier loads while remaining scalable.
- the techniques described herein utilize a microservices architecture to decouple parts of the application so they can scale independently.
- the techniques described herein implement autoscaling of compute and storage resources when appropriate and/or leverage load balancing services to manage and distribute incoming application traffic.
- the components are then packaged following continuous integration/continuous delivery (CI/CD) paradigms, which enable seamless and repeatable processes for adding new features, building, and deploying the application continuously. Consequently, in addition to the benefits provided by the methods described herein, the instant systems discussed herein further enable review of an ever-increasing volume and variety of documents.
- the systems may follow rules and/or guidelines from various best-practices and/or regulatory bodies (e.g., MCC, AIF, and CAF, among others).
- the instant techniques may provide for comprehensive load and performance testing (e.g., each time a new department or site and/or feature is added).
- the systems discussed herein may provide health monitoring and alerts (e.g., by monitoring compute and storage components in real time for any sudden spikes or sustained stress to the system) and/or send automated alerts to personnel (e.g., system engineers).
- the instant techniques may provide for data and code backups (e.g., of databases in the cloud as well as code repositories).
- the instant techniques provide for user-centric and/or post-deployment support.
- the instant techniques may achieve greater accuracy in document splitting, classification, and date extraction (e.g., 80-95% accuracy). Additionally, the instant techniques may reduce processing time and resource usage by reducing overall computational time spent analyzing records (e.g., reducing time spent by at least 12 minutes).
- use cases may include usages such as record intake, triage, and clinical care of patients who are seen as “New Consults/New Visits” at a given institution. For such visits, whether at large referral centers or community clinics, visit protocols often begin with a medical records request.
- An intake institution may request documents for a patient, which must be sorted. Sorting is conventionally manual, but is automated according to the instant techniques. Depending on the implementation, such sorting may include identifying a start and/or end of a document, classifying each document into one of several clinical categories, and identifying the relevant date of a document.
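- A minimal sketch of such automated sorting is shown below; the page-break heuristic, the category keyword list, and the date pattern are hypothetical stand-ins for the trained models described herein.

```python
import re
from datetime import datetime

CATEGORIES = {"mammogram": "imaging", "biopsy": "pathology", "operative": "surgical", "note": "clinical note"}
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def split_documents(pages: list[str]) -> list[list[str]]:
    """Identify document start/end: here, a new document begins on any page containing a report header."""
    documents, current = [], []
    for page in pages:
        if "REPORT" in page.upper() and current:
            documents.append(current)
            current = []
        current.append(page)
    if current:
        documents.append(current)
    return documents

def classify_document(text: str) -> str:
    """Assign one of several clinical categories based on keywords (an LLM could replace this heuristic)."""
    lowered = text.lower()
    for keyword, category in CATEGORIES.items():
        if keyword in lowered:
            return category
    return "other"

def extract_relevant_date(text: str) -> datetime | None:
    """Pick the first date mentioned as the document's relevant clinical date."""
    match = DATE_PATTERN.search(text)
    if match:
        month, day, year = map(int, match.groups())
        return datetime(year, month, day)
    return None

pages = ["RADIOLOGY REPORT 05/01/2023 screening mammogram ...",
         "page 2 of mammogram ...",
         "PATHOLOGY REPORT 06/12/2023 core biopsy ..."]
for doc_pages in split_documents(pages):
    text = " ".join(doc_pages)
    print(classify_document(text), extract_relevant_date(text))
```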
- Additional use cases may include legal document analysis (e.g., patent document analysis, discovery document analysis, etc.), financial document analysis (e.g., investment records for a user, tax documents, etc.), property document analysis, and/or other such use cases as described herein.
- FIG. 1 depicts an exemplary computing environment and system 100 in which the techniques disclosed herein may be implemented, according to some implementations.
- the system 100 may include computing resources for training and/or operating machine learning models to collate and analyze documents in a clinical context, in some implementations.
- the system 100 may include a client computing device 102, a server computing device 104, an electronic network 106, a context electronic database 110, and a model electronic database 112.
- the computing environment may further include one or more cloud application programming interfaces (APIs) 114.
- the components of the system 100 may be communicatively connected to one another via the electronic network 106, in some implementations.
- the client computing device 102 may implement, inter alia, operation of one or more applications for facilitating analysis of a corpus of documents using one or more machine learning models (e.g., large language machine learning models).
- the client computing device 102 may be implemented as one or more computing devices (e.g., one or more servers, one or more laptops, one or more mobile computing devices, one or more tablets, one or more wearable devices, one or more cloud-computing virtual instances, etc.).
- a plurality of client computing devices may be part of the system 100 - for example, a first user may access a client computing device 102 that is a laptop, while a second user accesses a client computing device 102 that is a smart phone, and a third user accesses a client computing device 102 that is a wearable device.
- the client computing device 102 may include one or more processors 120, one or more network interface controllers 122, one or more memories 124, an input device 126, an output device 128 and a client API 130.
- the one or more memories 124 may have stored thereon one or more modules 140 (e.g., one or more sets of instructions).
- the one or more processors 120 may include one or more central processing units, one or more graphics processing units, one or more field-programmable gate arrays, one or more application-specific integrated circuits, one or more tensor processing units, one or more digital signal processors, one or more neural processing units, one or more RISC-V processors, one or more coprocessors, one or more specialized processors/accelerators for artificial intelligence or machine learning-specific applications, one or more microcontrollers, etc.
- the client computing device 102 may include one or more network interface controllers 122, such as Ethernet network interface controllers, wireless network interface controllers, etc.
- the network interface controllers 122 may include advanced features, in some implementations, such as hardware acceleration, specialized networking protocols, etc.
- the memories 124 of the client computing device 102 may include volatile and/or nonvolatile storage media.
- the memories 124 may include one or more random access memories, one or more read-only memories, one or more cache memories, one or more hard disk drives, one or more solid-state drives, one or more non-volatile memory express, one or more optical drives, one or more universal serial bus flash drives, one or more external hard drives, one or more network-attached storage devices, one or more cloud storage instances, one or more tape drives, etc.
- the memories 124 may have stored thereon one or more modules 140, for example, as one or more sets of computer-executable instructions.
- the memories 124 may further include additional software, such as one or more operating systems (e.g., Microsoft Windows, GNU/Linux, Mac OS X, etc.).
- the operating systems may be configured to run the modules 140 during operation of the client computing device 102 - for example, the modules 140 may include additional modules and/or services for receiving and processing data from one or more other components of the system 100 such as the one or more cloud APIs 114 or the server computing device 104.
- the modules 140 may be implemented using any suitable computer programming language(s) (e.g., Python, JavaScript, C, C++, Rust, C#, Swift, Java, Go, LISP, Ruby, Fortran, etc.).
- the modules 140 may include a model configuration module 142, an API module 144, an input processing module 146, an authentication/security module 148, a context module 150 and a cataloguing module 152, in some implementations. In some implementations, more or fewer modules 140 may be included.
- the modules 140 may be configured to communicate with one another (e.g., via inter-process communication, via a bus, via sockets, pipes, message queues, etc.).
- the model configuration module 142 may include one or more sets of computer-executable instructions (i.e., software, code, etc.) for performing the functionalities described herein.
- the model configuration module 142 may enable one or more machine learning models (e.g., large language machine learning models, image analysis models, etc.) to be stored, for example in the memory 124 or in the context electronic database 110.
- the model configuration module 142 may be omitted from the modules 140, or its access may be restricted to administrative users only.
- one or more of the modules 140 may be packaged into a downloadable application (e.g., a smart phone app available from an app store) that enables registered but non-privileged (i.e., non-administrative) users to access the system 100 using their consumer client computing device 102.
- one or more of the client computing device(s) 102 may be locked down, such that the client computing device 102 is controlled hardware, accessible only to those who have physical access to certain areas.
- the API module 144 may include one or more sets of computer executable instructions for accessing one or more remote APIs, and/or for enabling one or more other components within the system 100 to access functionality of the client computing device 102.
- the API module 144 may enable other client applications (i.e., not applications facilitated by the modules 140) to connect to the client computing device 102, for example, to send queries or prompts, and to receive responses from the client computing device 102.
- the API module 144 may include instructions for authentication, rate limiting and error handling.
- the client computing device 102 may enable one or more users to access one or more trained models by providing input prompts that are processed by one or more trained models.
- the input processing module 146 may perform pre-processing of user prompts prior to being input into one or more models, and/or post-processing of outputs generated by one or more models.
- the input processing module 146 may process data input into one or more input fields, voice inputs or other input methods (e.g., file attachments) depending upon the application.
- the input processing module 146 may receive inputs directly via the input device 126, in some implementations.
- the input processing module 146 may perform post-processing of outputs received from one or more trained models.
- the input processing module 146 may include instructions for handling errors and for displaying errors to users (e.g., via the output device 128).
- the input processing module 146 may cause one or more graphical user interfaces to be displayed, for example to enable the user to enter information directly via a text field.
- the authentication/security module 148 may include one or more sets of computer-executable instructions for implementing access control mechanisms for one or more trained models, ensuring that the model can only be accessed by those who are authorized to do so, and that the access of those users is private and secure.
- trained models require state information in order to meaningfully carry on a dialogue with a user or with another trained model. For example, if a user prompts a trained model with a question such as “What is the weather in Chicago today?” followed by a second prompt “And how about tomorrow?” the model should understand that, in context, the second query relates to the first query, insofar as the user is asking about the weather tomorrow in the same location (Chicago).
- many systems add statefulness to models using context information. This may be implemented using sliding context windows, wherein a predetermined number of tokens (e.g., 4096 maximum tokens in the case of GPT 3.5, equivalent to about 3000 words) may be “remembered” by the LLM and can be used to enrich multiple sequential prompts input into the LLM (for example, when the LLM is used in a chat mode).
- the context module 150 may include one or more sets of computer-executable instructions for maintaining state of the type found in this example, and other types of state information.
- the context module 150 may implement sliding window context, in some implementations. In other implementations, the context module 150 may perform other types of state maintaining strategies. For example, the context module 150 may implement a strategy in which information from the immediately preceding prompt is part of the window, regardless of the size of that prior prompt.
- the context module 150 may implement a strategy in which one or more prior prompts are included in each current prompt.
- This prompt stuffing, or prompt concatenation, technique may be limited by prompt size constraints: once the total size of the prompt exceeds the prompt limit, the model immediately loses state information related to the parts of the prompt that are truncated.
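- The following sketch illustrates one way a sliding token window of the kind described above might be maintained; the token budget, the whitespace tokenizer, and the truncation policy are simplifying assumptions rather than a prescribed design.

```python
from collections import deque

class SlidingContext:
    """Keep only the most recent dialogue turns, up to a fixed token budget."""

    def __init__(self, max_tokens: int = 4096):
        self.max_tokens = max_tokens
        self.turns: deque[str] = deque()

    def add_turn(self, text: str) -> None:
        self.turns.append(text)
        # Drop the oldest turns once the total token count exceeds the budget;
        # state tied to the dropped turns is lost, as noted above.
        while self._token_count() > self.max_tokens and len(self.turns) > 1:
            self.turns.popleft()

    def _token_count(self) -> int:
        # Whitespace tokenization is a stand-in for a real subword tokenizer.
        return sum(len(turn.split()) for turn in self.turns)

    def build_prompt(self, new_prompt: str) -> str:
        """Return the enriched prompt containing the remembered turns plus the new prompt."""
        self.add_turn(new_prompt)
        return "\n".join(self.turns)

context = SlidingContext(max_tokens=50)
print(context.build_prompt("What is the weather in Chicago today?"))
print(context.build_prompt("And how about tomorrow?"))  # earlier turn is still inside the window
```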
- the cataloguing module 152 may include one or more sets of computer-executable instructions for identifying, retrieving, analyzing, and/or cataloguing documents for a particular user (e.g., a patient). Depending on the implementation, the cataloguing module 152 may perform outside records cataloguing by developing a shared mapping of available data across scanned documents, databases (e.g., a CareEverywhere database), radiology images, and/or other such documents. As further examples, the documents may include particular database and/or application-specific documents (e.g., OnBase documents and CareEverywhere documents) updated at least every 24 hours.
- the cataloguing module 152 accesses databases at least once every 24 hours for obtaining patient visit data to preschedule AI document processing. In some implementations, the cataloguing module accesses and/or retrieves documents for thousands of patients weekly. In further implementations, the cataloguing module 152 interfaces and/or otherwise integrates with one or more databases and/or other document repositories (e.g., Epic, OnBase (e.g., directly through the OnBase API or through a Longitudinal Patient Record API), CareEverywhere, Clarity through Denodo, Mayo Clinic Cloud (Cloud App Factory & AI Factory 2.0), etc.).
- the mapping may include the data types present and their relevant dates.
- the cataloguing module 152 focuses on the specific data types used by a particular entity (e.g., a breast clinic, a pulmonary clinic, etc.) for a particular purpose (e.g., new consult referrals), such as radiological or medical imaging studies (e.g., screening mammograms, diagnostic mammograms, computed tomography scans, x-ray studies, ultrasound reports, echocardiograph reports, MRI reports, clinical notes, microbiology reports, operative reports, diagnostic test reports, biopsy procedure reports, biopsy pathology reports, and post-biopsy imaging (e.g., an indication that a patient underwent mammogram on a given date and an indication of the associated report, an indication that the patient underwent biopsy on a different date and an indication of the report, an indication that an ultrasound report is absent, etc.), etc.).
- the cataloguing module 152 implements techniques for reliably finding these documents among scanned records and extracting their relevant clinical date, as discussed below with regard to FIG. 5.
- the cataloguing module 152 includes and/or utilizes new data pipelines to access the data (e.g., including metadata) from a database (e.g., a radiology database).
- the cataloguing module 152 additionally performs outside records summarization. As such, the cataloguing module 152 enables users to generate customized, traceable records summaries to further streamline workflows. Similarly, the cataloguing module 152 may additionally perform outside records clean-up for removal of duplicated scanned documents, de-rotation of scanned pages (e.g., orientation matching), and/or AI-assisted translation of medical records in another language (e.g., Mandarin, Spanish, Arabic, etc.).
- these records summarizations generated by the cataloguing module 152 may be tailored for a multidisciplinary care team (e.g., a care team that includes surgical and medical oncology, radiology, radiation oncology, allied health staff, etc.). By performing records summarization on multimodal data types, a more comprehensive synthesis and summary of records data may be performed without assistance from particular care team groups.
- the cataloguing module 152 is communicatively coupled to and/or includes an exposed microservice for internal use (e.g., via an API) for connection to other applications, programs, and/or other such projects that use outside medical records.
- at least some of the processes of the cataloguing module 152 may be implemented through an exposed microservice in external electronic health records computing systems, such as Epic (e.g., using a SMART on FHIR app).
- the cataloguing module 152 performs performance evaluation of the functionalities described herein.
- evaluation may focus on at least three dimensions: (1) appropriate data science metrics for the given task, (2) clinical usability and utility, and/or (3) system evaluation and testing.
- the server computing device 104 may include one or more processors 160, one or more network interface controllers 162, one or more memories 164, an input device (not depicted), an output device (not depicted) and a server API 166.
- the one or more memories 164 may have stored thereon one or more modules 170 (e.g., one or more sets of instructions).
- the one or more processors 160 may include one or more central processing units, one or more graphics processing units, one or more field-programmable gate arrays, one or more application-specific integrated circuits, one or more tensor processing units, one or more digital signal processors, one or more neural processing units, one or more RISC-V processors, one or more coprocessors, one or more specialized processors/accelerators for artificial intelligence or machine learning-specific applications, one or more microcontrollers, etc.
- the server computing device 104 may include one or more network interface controllers 162, such as Ethernet network interface controllers, wireless network interface controllers, etc.
- the network interface controllers 162 may include advanced features, in some implementations, such as hardware acceleration, specialized networking protocols, etc.
- the memories 164 of the server computing device 104 may include volatile and/or non-volatile storage media.
- the memories 164 may include one or more random access memories, one or more read-only memories, one or more cache memories, one or more hard disk drives, one or more solid-state drives, one or more non-volatile memory express, one or more optical drives, one or more universal serial bus flash drives, one or more external hard drives, one or more network-attached storage devices, one or more cloud storage instances, one or more tape drives, etc.
- the memories 164 may have stored thereon one or more modules 170, for example, as one or more sets of computer-executable instructions.
- the memories 164 may further include additional software, such as one or more operating systems (e.g., Microsoft Windows, GNU/Linux, Mac OS X, etc.).
- the operating systems may be configured to run the modules 170 during operation of the server computing device 104 - for example, the modules 170 may include additional modules and/or services for receiving and processing data from one or more other components of the system 100 such as the one or more cloud APIs 114 or the client computing device 102.
- the modules 170 may be implemented using any suitable computer programming language(s) (e.g., Python, JavaScript, C, C++, Rust, C#, Swift, Java, Go, LISP, Ruby, Fortran, etc.).
- the modules 170 may include a data collection module 172, a data pre-processing module 174, a model pretraining module 176, a fine-tuning module 178, a model training module 180, a checkpointing module 182, a hyperparameter tuning module 184, a validation and testing module 186, an auto-prompting module 188, a model operation module 190 and an ethics and bias module 192.
- more or fewer modules 170 may be included.
- the modules 170 may be configured to communicate with one another (e.g., via inter-process communication, via a bus, via sockets, pipes, message queues, etc.).
- the modules 170 may respond to network requests (e.g., via the API 166) or other requests received via the network 106 (e.g., via the client computing device 102 or other components of the system 100).
- the data collection module 172 may be configured to collect information used to train one or more models. In general, the information collected may be any suitable information used for training a language model.
- the data collection module 172 may collect data via web scraping, via API calls/access, via database extract-transform-load (ETL) processes, etc.
- Sources accessed by the data collection module 172 include social media websites, books, websites, academic publications, web forums/interest sites (e.g., social media pages, community pages, bulletin boards, etc.), etc.
- the data collection module 172 may access data sources by active means (e.g., scraping or other retrieval) or may access existing corpuses.
- the data collection module 172 may include sets of instructions for performing data collection in parallel, in some implementations.
- the data collection module 172 may store collected data in one or more electronic databases, such as a database accessible via the cloud APIs 114 or via a local electronic database (not depicted).
- the data may be stored in a structured and/or unstructured format.
- the data collection module 172 may store large data volumes used for training one or more models (i.e., training data). For example, the data collection module 172 may store terabytes, petabytes, exabytes or more of training data.
- the data collection module 172 may retrieve data from one or more databases as described herein. For example, the data collection module 172 may process the retrieved/received data and sort the data into multiple subsets based on information included within the database. For example, the data collection module 172 may receive one or more sets of unstructured text (e.g., transcripts of one or more historical meeting minutes of a clinical review board). The data collection module 172 may segment the data according to time (e.g., hourly, daily, quarterly, etc.).
- the data pre-processing module 174 may include instructions for pre-processing data collected by the data collection module 172.
- the data pre-processing module 174 may perform text extraction and/or cleaning operations on data collected by the data collection module 172.
- the data pre-processing module 174 may perform pre-processing operations, such as lexical parsing, tokenizing, case conversions and other string splitting/munging.
- the data collection module 172 may perform data deduplication, filtering, annotation, compliance, version control, validation, quality control, etc.
- one or more human reviewers may be looped into the pre-processing of the collected data by the data pre-processing module 174. For example, a distributed work queue may be used to transmit batch jobs and receive human-computed responses from one or more human workers.
- the data pre-processing module 174 may store copies and/or modified copies of the training data in an electronic database.
- the data pre-processing module 174 may include instructions for parsing the unstructured text received by the data collection module 172 to structure the text. For example, when the text relates to clinician notes regarding a patient, the data pre-processing module 174 may generate a time series data structure in which each set of notes is represented by one or more timestamps, and at each timestamp, text associated with various speakers and determinations (e.g., treatment, recommendation, etc.) is labelled. The data pre-processing module 174 may also label the data according to the identity of one or more speakers and/or one or more topics. For example, the time series data may be labeled according to one or more speakers associated with textual speech, in a language transcript form.
- the time series may include one or more keywords associated with the transcript.
- the present techniques may use a separate trained text summarization module to generate keywords used for this purpose.
- the data pre-processing module 174 may generate structured data corresponding to unstructured meeting minutes, such that the structured data is enriched with information about the meeting that is suitable for training. This structured data may be processed by downstream processes/modules.
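- One simple way to structure such transcripts is sketched below; the "HH:MM Speaker: text" line format and the keyword list are hypothetical assumptions about the raw minutes, and a trained summarization model could replace the keyword step.

```python
import re

LINE_PATTERN = re.compile(r"^(?P<time>\d{2}:\d{2})\s+(?P<speaker>[A-Za-z .]+):\s+(?P<text>.+)$")
KEYWORDS = ("treatment", "recommendation", "biopsy", "referral")

def structure_minutes(raw_lines: list[str]) -> list[dict]:
    """Convert free-text minutes into a time series labeled by timestamp, speaker, and keywords."""
    series = []
    for line in raw_lines:
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # unparseable lines could be routed to a human reviewer
        text = match["text"]
        series.append({
            "timestamp": match["time"],
            "speaker": match["speaker"].strip(),
            "text": text,
            "keywords": [k for k in KEYWORDS if k in text.lower()],
        })
    return series

raw = [
    "09:05 Dr. Lee: Recommendation is to proceed with biopsy before the consult.",
    "09:07 Dr. Park: Agreed, and imaging should be repeated.",
]
for entry in structure_minutes(raw):
    print(entry)
```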
- the present techniques may train one or more models to perform language generation tasks that include token generation. Both training inputs and model outputs may be tokenized.
- tokenization refers to the process by which text used for training is divided into units such as words, subwords or characters. Tokenization may break a single word into multiple subwords (e.g., “LLM” may be tokenized as “L” and “LM”).
- the present techniques may train one or more models using a set of tokens (e.g., a vocabulary) that includes a multitude (e.g., thousands or more) of tokens. These tokens may be embedded into a vector. This vector of tokens or “embeddings” may include numerical representations of the individual tokens in the vocabulary in high-dimensional vector space.
- the modules 170 may access and modify the embeddings during training to learn relationships between tokens. These relationships effectively represent semantic language meaning.
- the embeddings may be stored in a specialized database (e.g., a vector store, a graph database, etc.).
- Embedding databases may include specialized features, such as efficient retrieval, similarity search, and scalability.
- the server computing device 104 may include a local electronic embedding database (not depicted).
- a remote embedding database service may be used (e.g., via the cloud APIs 114). Such a remote embedding database service may be based on an open source or proprietary model (e.g., Milvus, Pinecone, Redis, Postgres, MongoDB, Facebook Al Similarity Search (FAISS), etc.).
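- A minimal illustration of storing embeddings and running a similarity search is given below; the hash-style embedding function is a toy stand-in for learned embeddings, and the in-memory list is a stand-in for the embedding databases named above.

```python
import math

def embed(text: str, dims: int = 16) -> list[float]:
    """Toy embedding: bucket character codes into a fixed-size, normalized vector
    (a learned embedding model would be used in practice)."""
    vec = [0.0] * dims
    for i, ch in enumerate(text.lower()):
        vec[i % dims] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two normalized vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))

class InMemoryVectorStore:
    """Stand-in for a vector database offering insertion and nearest-neighbour search."""
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = InMemoryVectorStore()
for doc in ["diagnostic mammogram report", "biopsy pathology report", "operative note"]:
    store.add(doc)
print(store.search("mammogram"))
```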
- the server computing device 104 may include instructions (e.g., in the data collection module 172) for adding training data to one or more specialized databases, and for accessing it to train models.
- the present techniques may include language modeling, wherein one or more deep learning models are trained by processing token sequences using a large language model architecture.
- a transformer architecture may be used to process a sequence of tokens.
- Such a transformer model may include a plurality of layers including self-attention and feedforward neural networks. This architecture may enable the model to learn contextual relationships between the tokens, and to predict the next token in a sequence, based upon the preceding tokens. During training, the model is provided with the sequence of tokens and it learns to predict a probability distribution over the next token in the sequence.
- This training process may include updating one or more model parameters (e.g., weights or biases) using an objective function that minimizes the difference between the predicted distribution and a true next token in the training data.
- in addition to the transformer architecture, other modeling architectures may be used, including recurrent neural networks, long short-term memory networks, gated recurrent networks, convolutional neural networks, and recursive neural networks.
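- The next-token training loop described above can be sketched, for example, with PyTorch (one of the frameworks named later in this disclosure); the tiny vocabulary, the embedding-plus-linear model, and the toy corpus are simplifying assumptions standing in for a full transformer and real training data.

```python
import torch
import torch.nn as nn

# Toy corpus and vocabulary.
tokens = "the patient underwent a screening mammogram and a biopsy".split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = torch.tensor([vocab[t] for t in tokens])

class TinyLM(nn.Module):
    """Minimal next-token model: embedding followed by a linear projection to vocabulary logits."""
    def __init__(self, vocab_size: int, dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(x))

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

inputs, targets = ids[:-1], ids[1:]  # predict each next token from the preceding token
for step in range(200):
    optimizer.zero_grad()
    logits = model(inputs)
    loss = loss_fn(logits, targets)  # minimizes the gap between predicted distribution and true next token
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.3f}")
```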
- the modules 170 may include instructions for performing pretraining of a language model (e.g., an LLM), for example, in a pretraining module 176.
- the pretraining module 176 may include one or more sets of instructions for performing pretraining, which as used herein, generally refers to a process that may span pre-processing of training data via the data pre-processing module 174 and initialization of an as-yet untrained language model.
- a pretrained model, as used herein, is one that has not yet been trained on specific tasks.
- the model pretraining module 176 may include instructions that initialize one or more model weights.
- model pretraining module 176 may initialize the weights to have random values.
- the model pretraining module 176 may train one or more models using unsupervised learning, wherein the one or more models process one or more tokens (e.g., pre-processed data output by the data pre-processing module 174) to learn to predict one or more elements (e.g., tokens).
- the model pretraining module 176 may include one or more optimizing objective functions that the model pretraining module 176 applies to the one or more models, to cause the one or more models to predict one or more most-likely next tokens, based on the likelihood of tokens in the training data.
- the model pretraining module 176 causes the one or more models to learn linguistic features such as grammar and syntax.
- the pretraining module 176 may include additional steps, including training, data batching, hyperparameter tuning and/or model checkpointing.
- the model pretraining module 176 may include instructions for generating a model that is pretrained for a general purpose, such as general text processing/understanding.
- This model may be known as a “base model” in some implementations.
- the base model may be further trained by downstream training process(es), for example, those training processes described with respect to the fine-tuning module 178.
- the model pretraining module 176 generally trains foundational models that have general understanding of language and/or knowledge. Pretraining may be a distinct stage of model training in which training data of a general and diverse nature (i.e., not specific to any particular task or subset of knowledge) is used to train the one or more models.
- a single model may be trained and copied. Copies of this model may serve as respective base models for a plurality of finetuned models.
- base models may be trained to have specific levels of knowledge common to more advanced agents.
- the model pretraining module 176 may train a medical student base model that may be subsequently used to fine tune an internist model, a surgeon model, a resident model, etc. In this way, the base model can start from a relatively advanced stage, without requiring pretraining of each more advanced model individually.
- This strategy represents an advantageous improvement, because pretraining can take a long time (many days), and pretraining the common base model only requires that pretraining process to be performed once.
- the modules 170 may include a fine-tuning module 178.
- the fine-tuning module 178 may include instructions that train the one or more models further to perform specific tasks.
- the fine-tuning module 178 may include instructions that train one or more models to generate respective language outputs (e.g., text generation), summarization, question answering or translation activities based on characteristics of a user and/or corpus of documents.
- the fine-tuning module 178 may include sets of instructions for retrieving one or more structured data sets, such as time series generated by the data preprocessing module 174. These structured data sets may be sorted by time, date, type, and/or any other similar metric to train one or more machine learning models (e.g., one or more language models) that may be used within the system 100 to collate and analyze documents.
- the fine-tuning module 178 may include instructions for configuring an objective function for performing a specific task, such as generating text that is similar to text found within the corpus of training data associated with a particular individual by role.
- the fine-tuning module 178 may include instructions for fine-tuning a pathologist model, based on a base language model. These fine-tuning instructions may select statements of pathologists from one or more databases including a corpus of data (or another data source). A medical resident model may be fine-tuned by the fine-tuning module 178, wherein the base model is the same as that used to fine-tune the pathologist model. The fine-tuning module 178 may train many (e.g., hundreds or more) additional models.
- the fine-tuning module 178 may include user-selectable parameters that affect the fine-tuning of the one or more models.
- a “caution” bias parameter may be included that represents medical conservativeness. This bias parameter may be adjusted to affect the cautiousness with which the resulting trained model (i.e., agent) approaches medical decision-making. Additional models may be trained, for additional personas/tasks, as discussed below.
- one or more open source frameworks may be used.
- Example frameworks include TensorFlow, Keras, MXNet, Caffe, scikit-learn, and PyTorch.
- frameworks such as OpenLLM and LangChain may be used, in some implementations.
- the fine-tuning module 178 may use an algorithm such as stochastic gradient descent or another optimization technique to adjust weights of the pretrained model.
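- Continuing the PyTorch illustration, fine-tuning a shared base model for different roles might look like the sketch below; the vocabulary size, the embedding-plus-linear stand-in for a pretrained base model, the random specialty corpora, and the learning rate are hypothetical placeholders for the base models and specialty data discussed above.

```python
import copy
import torch
import torch.nn as nn

VOCAB_SIZE = 64  # hypothetical vocabulary size shared by the base and fine-tuned models

def make_base_model() -> nn.Module:
    """Stand-in for a pretrained base model; real weights would be loaded from a pretraining checkpoint."""
    return nn.Sequential(nn.Embedding(VOCAB_SIZE, 16), nn.Linear(16, VOCAB_SIZE))

def fine_tune(base: nn.Module, corpus_ids: torch.Tensor, steps: int = 100) -> nn.Module:
    """Copy the base model and adjust its weights on role-specific text using stochastic gradient descent."""
    model = copy.deepcopy(base)          # each specialty agent starts from the same base weights
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    inputs, targets = corpus_ids[:-1], corpus_ids[1:]
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)  # next-token objective on the specialty corpus
        loss.backward()
        optimizer.step()
    return model

base_model = make_base_model()
pathology_ids = torch.randint(0, VOCAB_SIZE, (64,))   # stand-in for tokenized pathologist statements
resident_ids = torch.randint(0, VOCAB_SIZE, (64,))    # stand-in for tokenized resident statements
pathologist_model = fine_tune(base_model, pathology_ids)
resident_model = fine_tune(base_model, resident_ids)  # same base model, different specialty corpus
```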
- Fine-tuning may be an optional operation, in some implementations.
- training may be performed by the training module 180 after pretraining by the model pretraining module 176.
- the model training module 180 may perform task-specific training like the fine-tuning module 178, on a smaller scale or with a more tailored objective.
- the model training module 180 may further train the model to learn knowledge of a plastic surgeon, an orthopedic surgeon, etc.
- the model may be trained as and/or using proprietary models and/or techniques (e.g., the PaLM 2/text-bison foundation model, the Med-PaLM model, and/or other such LLMs).
- the training module 180 may include one or more submodules, including the checkpointing module 182, the hyperparameter tuning module 184, the validation and testing module 186, and the auto-prompting module 188.
- the checkpointing module 182 may perform checkpointing, which is saving of a model’s parameters.
- the checkpointing module 182 may store checkpoints during training and at the conclusion of training, for example, in the model electronic database 112. In this way, the model may be run (e.g., for testing and validation) at multiple stages with its training parameters loaded, and may also be retrained from a checkpoint. As a result, the model can be run and trained forward without being re-trained from the beginning, which may save significant time (e.g., days of computation).
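- Checkpointing of this kind might be implemented as sketched below, shown here with PyTorch's state_dict save/load utilities; the file path, step counter, and the small placeholder model and optimizer are hypothetical.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                      # stand-in for a model under training
optimizer = torch.optim.Adam(model.parameters())

def save_checkpoint(path: str, step: int) -> None:
    """Persist the model parameters and optimizer state so training can resume later."""
    torch.save({"step": step,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, path)

def load_checkpoint(path: str) -> int:
    """Restore parameters and optimizer state; returns the step at which to resume training."""
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["step"]

save_checkpoint("checkpoint_step_1000.pt", step=1000)
resume_step = load_checkpoint("checkpoint_step_1000.pt")  # continue training without starting over
```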
- the hyperparameter tuning module 184 may adjust hyperparameters such as batch size, model size, learning rate, etc., to influence model training.
- the hyperparameter tuning module 184 may include instructions for tuning hyperparameters by successive evaluation.
- the validation and testing module 186 may include sets of instructions for validating and testing one or more machine learning models, including those generated by the model pretraining module 176, the fine-tuning module 178 and the model training module 180.
- the auto-prompting module 188 may include sets of instructions for performing auto-prompting of one or more models. Specifically, the auto-prompting module 188 may enrich a prompt with additional information.
- the auto-prompting module 188 may include additional information in a prompt, so that the model receiving the prompt has additional context or directions that it can use. This may allow the auto-prompting module 188 to fine-tune a base model using one-shot or few-shot learning, in some implementations.
- the auto-prompting module 188 may also be used to focus the output of the one or more models.
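- A minimal sketch of such prompt enrichment is shown below; the instruction template, the few-shot examples, and the `llm` callable are hypothetical illustrations of adding context or directions to a prompt before it reaches a model.

```python
from typing import Callable

FEW_SHOT_EXAMPLES = [
    ("Document: 'Screening mammogram performed 05/01/2023.'", "Category: imaging, Date: 2023-05-01"),
    ("Document: 'Core needle biopsy, specimen received 06/12/2023.'", "Category: pathology, Date: 2023-06-12"),
]

def auto_prompt(user_prompt: str, directions: str) -> str:
    """Enrich a raw prompt with directions and a few worked examples before sending it to the model."""
    lines = [directions, ""]
    for document, answer in FEW_SHOT_EXAMPLES:
        lines += [document, answer, ""]
    lines.append(user_prompt)
    return "\n".join(lines)

def ask(llm: Callable[[str], str], user_prompt: str) -> str:
    enriched = auto_prompt(user_prompt, directions="Classify the document and extract its clinical date.")
    return llm(enriched)

fake_llm = lambda p: "Category: imaging, Date: 2023-07-04"
print(ask(fake_llm, "Document: 'Chest x-ray obtained 07/04/2023.'"))
```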
- the training module 180 may include instructions for training one or more additional machine learning models, such as supervised or unsupervised machine learning models.
- the present techniques may include processing imaging data to enrich records regarding a patient’s care.
- the patient’s imaging data may be processed by a model (e.g., a convolutional neural network) and the results processed further (e.g., by a language model) and/or provided to the client computing device 102.
- the training module 180 may train such a supervised model separately from training one or more language models.
- the server computing device 104 may select one or more trained models at runtime based on data about a specific patient, based upon data contained in a prompt or based on other conditions that may be preprogrammed into the server computing device 104.
- the training module 180 may train multi-modal models.
- the training module 180 may train a plurality of models each capable of drawing from multimodal data types such as written text, imaging data, laboratory data, real-time monitoring data, pathology images, etc.
- the training module 180 may train a single model capable of processing the multimodal data types.
- a trained multimodal model may be used in conjunction with another model (e.g., a large language model) to provide non-text data interactions with users. Non-text data may be analyzed and integrated into the functions discussed herein. Further details regarding training the models are discussed below with regard to FIG. 2.
- the model operation module 190 may operate one or more trained models.
- the model operation module 190 may initialize one or more trained models, load parameters into the model(s), and provide the model(s) with inference data (e.g., prompt inputs).
- the model operation module 190 may deploy one or more trained models (e.g., a pretrained model, a fine-tuned model, and/or a trained model) onto a cloud computing device (e.g., via the API 166).
- the model operation module 190 may receive one or more inputs, for example from the client computing device 102, and provide those inputs (e.g., one or more prompts) to the trained model.
- the API 166 may include elements for receiving requests to the model, and for generating outputs based on model outputs.
- the API 166 may include a RESTful API that receives a GET or POST request including a prompt parameter.
- the model operation module 190 may receive the request from the API 166, pass the prompt parameter into the trained model, and receive a corresponding output.
- the prompt parameter may be “What is the smallest bone in the human body?”.
- the prompt output may be “The stapes bone of the inner ear is the smallest bone in the human body.”
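- As a hedged illustration, a RESTful prompt endpoint of this kind could be exposed with a web framework such as Flask (not required by this disclosure); the route name, the "prompt" parameter handling, and the `run_model` stand-in are hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_model(prompt: str) -> str:
    """Stand-in for the model operation module passing the prompt to a trained model."""
    return "The stapes bone of the inner ear is the smallest bone in the human body."

@app.route("/v1/prompt", methods=["GET", "POST"])
def prompt_endpoint():
    # Accept the prompt parameter from either a query string (GET) or a JSON body (POST).
    if request.method == "POST":
        prompt = (request.get_json(silent=True) or {}).get("prompt", "")
    else:
        prompt = request.args.get("prompt", "")
    if not prompt:
        return jsonify({"error": "missing 'prompt' parameter"}), 400
    return jsonify({"prompt": prompt, "output": run_model(prompt)})

if __name__ == "__main__":
    app.run(port=8080)
```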
- the model operation module 190 may operate models in different modes. For example, in a first mode, the model operation module 190 may receive a prompt input via the client computing device 102, and provide that input to each of a plurality of agents for processing. The output of each agent may be collected and transmitted back to the client computing device 102 for display. The outputs may be labeled according to an identifier of each model (e.g., “pathologist,” “surgeon,” “medical student,” etc.). In the first mode, the model operation module 190 may receive additional inputs from the client computing device 102 that enable the user to interact with the one or more trained models in a question-answer format.
- the user may ask follow up questions. For example, the user may enter a prompt such as “Tell me about patients who were similarly situated. How did they respond to SBRT? To surgery?”
- the server computing device 104 may have access to historical patient data, and the server computing device 104 (e.g., the data pre-processing module 174) may include instructions for retrieving additional data from knowledge databases regarding patients to provide additional context to the one or more language models.
- knowledge databases may include the electronic healthcare records database discussed above, as well as external sources such as academic papers, case studies, transcripts, etc.
- the language models may be trained using this additional data ahead of time, and may not retrieve the data at runtime.
- the training data for the liver cancer example may include the KRAS mutation status of the patient, their chemotherapy records, and outcomes for surgery, radiation, and other approaches.
- multi-modal modeling may be used.
- the data pre-processing module may, for example, process and understand image data, audio data, video data, etc.
- the server computing device 104 may interpret and respond to queries that involve understanding content from these different modalities.
- the server computing device 104 may include an image processing module (not depicted) including instructions for performing image analysis on images provided by users, or images retrieved from patient EHR data.
- the server computing device 104 may generate outputs in modalities other than text.
- the server computing device 104 may generate an audio response, an image, etc.
- Combining multi-modal data may enable the present models to perform more comprehensive analysis of patient conditions, based on information processed in multiple different modes simultaneously.
- the operating module 190 may include a set of computer-executable instructions that when executed by one or more processors (e.g., the processors 160) cause a computer (e.g., the server computing device 104) to perform retrieval-augmented generation.
- the operating module 190 may perform retrieval-augmented generation based upon inputs or queries received from the user. This allows the operating module 190 to tailor responses of a model based on the specific input and context, such as the medical issue under discussion.
- one or more models may be pre-trained, fine-tuned and/or trained as discussed above. During that training, the model may learn to generate tokens based on general language understanding as well as application-specific training. Such a model at that point may be static, insofar as it cannot access further information when presented with an input query.
- the operating module 190 may perform retrieval operations, such as searching or selecting information from a document, a database, or another source.
- the operating module 190 may include instructions for processing user input and for performing a keyword search, a regular expression search, a similarity search, etc. based upon that user input.
- the operating module 190 may input the results of that search, along with the user input, into the trained model.
- the trained model may process this additional retrieved information to augment, or contextualize, the generation of tokens that represent responses to the user’s query.
- retrieval-augmented generation applied in this manner allows the model to dynamically generate outputs that are more relevant to the user’s input query at runtime.
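- The retrieval-augmented flow described above is sketched below; the keyword retriever, the in-memory record list, and the `llm` callable are simplifying assumptions (a similarity search over embeddings could be substituted for the keyword match).

```python
from typing import Callable

RECORDS = [
    "2023-05-01 screening mammogram: BI-RADS 2, no suspicious findings.",
    "2023-06-12 biopsy pathology: benign fibroadenoma.",
    "2023-06-20 clinical note: follow-up recommended in 12 months.",
]

def retrieve(query: str, records: list[str], k: int = 2) -> list[str]:
    """Keyword retrieval: rank records by how many query terms they contain."""
    terms = {t.lower() for t in query.split()}
    scored = sorted(records, key=lambda r: sum(t in r.lower() for t in terms), reverse=True)
    return scored[:k]

def retrieval_augmented_generate(query: str, llm: Callable[[str], str]) -> str:
    """Augment the user query with retrieved context before token generation."""
    context = "\n".join(retrieve(query, RECORDS))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."
    return llm(prompt)

fake_llm = lambda p: "The biopsy on 2023-06-12 showed a benign fibroadenoma."
print(retrieval_augmented_generate("What did the biopsy show?", fake_llm))
```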
- Information that may be retrieved may include data corresponding to a patient (e.g., patient demographic information, medical history, clinical notes, diagnoses, medications, allergies, immunizations, laboratory results, oncology information, radiation and imaging information, vitals, etc.) and additional training information, such as medical journals, notes or speech transcripts from symposia or other meetings/conferences, etc.
- the present techniques may trigger retrieval-augmented generation by processing a prompt, in some implementations.
- a prompt may be processed by the input processing module 146 of the client computing device 102, prior to processing the prompt by the one or more generative models.
- the input processing module 146 may trigger retrieval- augmented generation based on the presence of certain inputs, such as patient information, or a request for specific information, in the form of keywords.
- the input processing module 146 may perform entity recognition or other natural language processing functions to determine whether the prompt should be processed using retrieval-augmented generation prior to being provided to the trained model.
- prompts may be received via the input processing module 146 of the client computing device 102 and transmitted to the server computing device 104 via the electronic network 106.
- the output of the model may be modulated prior to being transmitted, output, or otherwise displayed to a user.
- the ethics and bias module 192 may process the prompt input prior to providing the prompt input to the trained model, to avoid passing objectionable content into the trained model.
- the ethics and bias module 192 may process the output of the trained model, also to avoid providing objectionable output. It should be appreciated that trained language models may be unpredictable, and thus, processing outputs for ethical and bias concerns (especially in a medical context) may be important.
- the present techniques may be used to augment and solidify human decision making, rather than as a substitute for such deliberate thinking.
- the client computing device 102 and the server computing device 104 may communicate with one another via the network 106.
- the client computing device 102 and/or the server computing device 104 may offload some or all of their respective functionality to the one or more cloud APIs 114.
- the one or more cloud APIs 114 may include one or more public clouds, one or more private clouds and/or one or more hybrid clouds.
- the one or more cloud APIs 114 may include one or more resources provided under one or more service models, such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and Function as a Service (FaaS).
- the one or more cloud APIs 114 may include one or more cloud computing resources, such as computing instances, electronic databases, operating systems, email resources, etc.
- the one or more cloud APIs 114 may include distributed computing resources that enable, for example, the model pretraining module 176 and/or other of the modules 170 to distribute parallel model training jobs across many processors.
- the one or more cloud APIs 114 may include one or more language operation APIs, such as OpenAI, Bing, Claude.ai, etc.
- the one or more cloud APIs 114 may include an API configured to operate one or more open source models, such as Llama 2.
- the electronic network 106 may be a collection of interconnected devices, and may include one or more local area networks, wide area networks, subnets, and/or the Internet.
- the network 106 may include one or more networking devices such as routers, switches, etc. Each device within the network 106 may be assigned a unique identifier, such as an IP address, to facilitate communication.
- the network 106 may include wired (e.g., Ethernet cables) and wireless (e.g., Wi-Fi) connections.
- the network 106 may include a topology such as a star topology (devices connected to a central hub), a bus topology (devices connected along a single cable), a ring topology (devices connected in a circular fashion), and/or a mesh topology (devices connected to multiple other devices).
- the electronic network 106 may facilitate communication via one or more networking protocols, such as packet protocols (e.g., Internet Protocol (IP)) and/or application-layer protocols (e.g., HTTP, SMTP, SSH, etc.).
- the network 106 may perform routing and/or switching operations using routers and switches.
- the network 106 may include one or more firewalls, file servers and/or storage devices.
- the network 106 may include one or more subnetworks such as a virtual LAN (VLAN).
- the system 100 may include one or more electronic databases, such as a relational database that uses structured query language (SQL) and/or a NoSQL database or other schema-less database suited for the storage of unstructured or semi-structured data.
- the present techniques may store training data, training parameters and/or trained models in an electronic database such as the database 112.
- one or more trained machine learning models may be serialized and stored in a database (e.g., as a binary, a JSON object, etc.). Such a model can later be retrieved, deserialized and loaded into memory and then used for predictive purposes.
- the one or more trained models and their respective training parameters may also be stored as blob objects.
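- As a purely illustrative sketch (not part of the disclosed embodiments), a trained model could be serialized, stored as a database blob, and later deserialized for predictive use as follows; the table and column names (trained_models, model_blob) and the use of pickle with SQLite are assumptions.

```python
import pickle
import sqlite3

# Hypothetical example: persist a trained model as a blob and reload it later.
# The table/column names ("trained_models", "model_blob") are illustrative only.
def save_model(db_path: str, model_name: str, model) -> None:
    blob = pickle.dumps(model)  # serialize the in-memory model object to bytes
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS trained_models (name TEXT PRIMARY KEY, model_blob BLOB)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO trained_models (name, model_blob) VALUES (?, ?)",
            (model_name, blob),
        )

def load_model(db_path: str, model_name: str):
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT model_blob FROM trained_models WHERE name = ?", (model_name,)
        ).fetchone()
    return pickle.loads(row[0]) if row else None  # deserialize back into memory
```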
- Cloud computing APIs may also be used to store trained models, via the cloud APIs 114. Examples of such services include AWS SageMaker, Google AI Platform, and Azure Machine Learning.
- a user may access a prompt graphical user interface via the client computing device 102.
- the prompt graphical user interface may be configured by the model configuration module 142, and generated and displayed by the input processing module 146 via the output device 128.
- the model configuration module 142 may be configured to have one or more digital panel objects each comprising one or more trained models as digital agent objects.
- the model configuration module 142 may configure the graphical user interface to accept prompts and display corresponding prompt outputs generated by one or more models processing the accepted prompts.
- the input processing module 146 may be configured to transmit the prompts input by the user via the electronic network 106 to the API 166 of the server computing device 104.
- the API 166 may process the user inputs via one or more trained models.
- one or more models may already be trained, including pretraining and fine-tuning. These trained models may be selectively loaded into the one or more agent objects based on configuration parameters, and/or based upon the content of the user’s input prompts.
- the system 100 may include a single device including the described module (e.g., the server computing device 104) that is accessed for remote computing and/or other use cases by one or more client devices (e.g., the client computing device 102, output device 128, etc.).
- FIG. 2 depicts a combined block and logic diagram 200 for training a machine learning model, in which the techniques described herein may be implemented, according to some embodiments.
- Some of the blocks in FIG. 2 may represent hardware and/or software components, other blocks may represent data structures or memory storing these data structures, registers, or state variables (e.g., supervised training dataset 212), and other blocks may represent output data (e.g., scalar reward 225).
- Input and/or output signals may be represented by arrows labeled with corresponding signal names and/or other identifiers.
- the methods and systems may include one or more servers 202, 204, 206, such as the server computing device 104 or an external computing device.
- the server 202 may fine-tune a pretrained language model 210.
- the pretrained language model 210 may be obtained by the server 202 and be stored in a memory, such as memory 124.
- the pretrained language model 210 may be loaded into a machine learning training module, such as the training module 180, by the server 202 for retraining/fine-tuning.
- a supervised training dataset 212 may be used to fine-tune the pretrained language model 210 wherein each data input prompt to the pretrained language model 210 may have a known output response for the pretrained language model 210 to learn from.
- the supervised training dataset 212 may be stored in a memory of the server 202 (e.g., the memory 124) or a separate training database.
- the data labelers may create the supervised training dataset 212 prompts and appropriate responses.
- the pretrained language model 210 may be fine-tuned using the supervised training dataset 212 resulting in the SFT machine learning model 215, which may provide appropriate responses to user prompts once trained.
- the trained SFT machine learning model 215 may be stored in a memory of the server 202 (e.g., memory 124).
- the server 202 may fine-tune the pretrained language model 210 using a set of vectors associated with a set of training data.
- the set of training data may include prompts associated with questions and documents, and responses associated with the prompts.
- Creating the set of vectors may include (1) splitting the text of the prompts, associated questions and/or associated documents into semantic clusters, and (2) encoding the semantic clusters as the set of vectors.
- the semantic clusters may be one or more words, a portion of a word, or a character.
- a distance between the vectors (e.g., a cosine distance, a Euclidean distance) may depend on a relevance between the semantic clusters corresponding to the vectors.
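- A minimal sketch of encoding semantic clusters as vectors and measuring a cosine distance between them is shown below; the word-window chunking rule, the sentence-transformers library, and the all-MiniLM-L6-v2 model are illustrative assumptions rather than part of the disclosure.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding library

def embed_clusters(text: str, max_words: int = 50) -> tuple[list[str], np.ndarray]:
    """Split text into crude word-window 'semantic clusters' and encode them as vectors."""
    words = text.split()
    clusters = [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    return clusters, model.encode(clusters)

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # A smaller distance implies the two clusters are more semantically relevant to each other.
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```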
- training the machine learning model 250 may include the server 204 training a reward model 220 to provide as an output a scalar value/reward 225.
- the reward model 220 may be required to leverage Reinforcement Learning with Human Feedback (RLHF), in which a model (e.g., the machine learning model 250) learns to produce outputs that maximize its reward 225, and in doing so may provide responses that are better aligned with user prompts.
- Training the reward model 220 may include the server 204 providing a single prompt 222 to the SFT machine learning model 215 as an input.
- the input prompt 222 may be provided via an input device (e.g., a keyboard) via the I/O module of the server, such as input processing module 146.
- the prompt 222 may be previously unknown to the SFT machine learning model 215, e.g., the labelers may generate new prompt data, the prompt 222 may include testing data stored on a training database, and/or any other suitable prompt data.
- the SFT machine learning model 215 may generate multiple, different output responses 224A, 224B, 224C, 224D to the single prompt 222.
- the server 204 may output the responses 224A, 224B, 224C, 224D via an I/O module (e.g., input processing module 146) to a user interface device, such as a display (e.g., as text responses), a speaker (e.g., as audio/voice responses), and/or any other suitable manner of output of the responses 224A, 224B, 224C, 224D for review by the data labelers.
- the data labelers may provide feedback via the server 204 on the responses 224A, 224B, 224C, 224D when ranking 226 them from best to worst based upon the prompt-response pairs.
- the data labelers may rank 226 the responses 224A, 224B, 224C, 224D by labeling the associated data.
- the ranked prompt-response pairs 228 may be used to train the reward model 220.
- the server 204 may load the reward model 220 via the machine learning module (e.g., the machine learning module 140) and train the reward model 220 using the ranked response pairs 228 as input.
- the reward model 220 may provide as an output the scalar reward 225.
- the scalar reward 225 may include a value numerically representing a human preference for the best and/or most expected response to a prompt (i.e., a higher scalar reward value may indicate the user is more likely to prefer that response, and a lower scalar reward may indicate that the user is less likely to prefer that response).
- inputting the “winning” prompt-response (i.e., input-output) pair data to the reward model 220 may generate a winning reward.
- Inputting a “losing” prompt-response pair data to the same reward model 220 may generate a losing reward.
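- One common way such ranked pairs could train the reward model 220 is a pairwise (Bradley-Terry-style) loss that pushes the winning reward above the losing reward; the PyTorch sketch below is an assumed implementation detail, not the disclosed method.

```python
import torch

def pairwise_reward_loss(winning_reward: torch.Tensor,
                         losing_reward: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry-style loss: push the reward of the preferred ("winning")
    response above the reward of the less preferred ("losing") response."""
    return -torch.nn.functional.logsigmoid(winning_reward - losing_reward).mean()

# Illustrative usage with dummy scalar rewards for a batch of ranked pairs:
r_win = torch.tensor([1.2, 0.7, 0.9])
r_lose = torch.tensor([0.3, 0.8, -0.1])
loss = pairwise_reward_loss(r_win, r_lose)  # backpropagated into the reward model
```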
- the reward model 220 and/or scalar reward 225 may be updated based upon labelers ranking 226 additional prompt-response pairs generated in response to additional prompts 222.
- a data labeler may provide to the SFT machine learning model 215 as an input prompt 222, “Describe the sky.”
- the input may be provided by the labeler via the client computing device 102 over network 106 to the server 204 running a chatbot application utilizing the SFT machine learning model 215.
- the SFT machine learning model 215 may provide as output responses to the labeler via the client computing device 102: (i) “the sky is above” 224A; (ii) “the sky includes the atmosphere and may be considered a place between the ground and outer space” 224B; and (iii) “the sky is heavenly” 224C.
- the data labeler may rank 226, via labeling the prompt-response pairs, prompt-response pair 222/224B as the most preferred answer; prompt-response pair 222/224A as a less preferred answer; and prompt-response pair 222/224C as the least preferred answer.
- the labeler may rank 226 the prompt-response pair data in any suitable manner.
- the ranked prompt-response pairs 228 may be provided to the reward model 220 to generate the scalar reward 225.
- while the reward model 220 may provide the scalar reward 225 as an output, the reward model 220 may not generate a response (e.g., text). Rather, the scalar reward 225 may be used by a version of the SFT machine learning model 215 to generate more accurate responses to prompts, i.e., the SFT model 215 may generate a response such as text to the prompt, and the reward model 220 may receive the response to generate a scalar reward 225 of how well humans perceive it. Reinforcement learning may optimize the SFT model 215 with respect to the reward model 220, which may realize the configured machine learning model 250.
- the server 206 may train the machine learning model 250 (e.g., via the machine learning module 140) to generate a response 234 to a random, new and/or previously unknown user prompt 232.
- the machine learning model 250 may use a policy 235 (e.g., algorithm) which it learns during training of the reward model 220, and in doing so may advance from the SFT model 215 to the machine learning model 250.
- the policy 235 may represent a strategy that the machine learning model 250 learns to maximize the reward 225.
- a human labeler may continuously provide feedback to assist in determining how well the machine learning model’s 250 responses match expected responses to determine the rewards 225.
- the rewards 225 may feed back into the machine learning model 250 to evolve the policy 235.
- the policy 235 may adjust the parameters of the machine learning model 250 based upon the rewards 225 it receives for generating good responses.
- the policy 235 may update as the machine learning model 250 provides responses 234 to additional prompts 232.
- the response 234 of the machine learning model 250 using the policy 235 based upon the reward 225 may be compared using a cost function 238 to the SFT machine learning model 215 (which may refrain from using a policy) response 236 of the same prompt 232.
- the cost function 238 may be trained in a similar manner and/or contemporaneous with the reward model 220.
- the server 206 may compute a cost 240 based upon the cost function 238 of the responses 234, 236.
- the cost 240 may reduce the distance between the responses 234, 236 (i.e., a statistical distance measuring how one probability distribution is different from a second).
- Using the cost 240 to reduce the distance between the responses 234, 236 may avoid a server over-optimizing the reward model 220 and deviating too drastically from the human-intended/preferred response. Without the cost 240, the machine learning model 250 optimizations may result in generating responses 234 which are unreasonable but may still result in the reward model 220 outputting a high reward 225.
- the responses 234 of the machine learning model 250 using the current policy 235 may be passed by the server 206 to the reward model 220, which may return the scalar reward 225.
- the machine learning model 250 response 234 may be compared via the cost function 238 to the SFT machine learning model 215 response 236 by the server 206 to compute the cost 240.
- the server 206 may generate a final reward 242 which may include the scalar reward 225 offset and/or restricted by the cost 240.
- the final reward 242 may be provided by the server 206 to the machine learning model 250 and may update the policy 235, which in turn may improve the functionality of the machine learning model 250.
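- As an illustrative possibility, the cost 240 could be realized as a KL-style divergence between the policy's and the SFT model's token log-probabilities, with the final reward 242 being the scalar reward 225 offset by that cost; the coefficient beta below is hypothetical.

```python
import torch

def final_reward(scalar_reward: torch.Tensor,
                 policy_logprobs: torch.Tensor,
                 sft_logprobs: torch.Tensor,
                 beta: float = 0.02) -> torch.Tensor:
    """Offset the reward model's scalar reward by a KL-style cost between the
    policy's response distribution and the SFT model's response distribution."""
    cost = (policy_logprobs - sft_logprobs).sum(dim=-1)  # approximate per-response divergence
    return scalar_reward - beta * cost  # large drift from the SFT model is penalized
```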
- RLHF via the human labeler feedback may continue ranking 226 responses of the machine learning model 250 versus outputs of earlier/other versions of the SFT machine learning model 215, i.e., providing positive or negative rewards 225.
- the RLHF may allow the servers (e.g., servers 204, 206) to continue iteratively updating the reward model 220 and/or the policy 235.
- the machine learning model 250 may be retrained and/or fine-tuned based upon the human feedback via the RLHF process, and throughout continuing conversations may become increasingly efficient.
- although servers 202, 204, 206 are depicted in the exemplary block and logic diagram 200, each providing one of the three steps of the overall machine learning model 250 training, fewer and/or additional servers may be utilized and/or may provide the one or more steps of the machine learning model 250 training. In some implementations, one server may provide the entire machine learning model 250 training.
- FIG. 3 depicts an exemplary network 300 including a user device 310, cloud platform 320, and data sources 350.
- the exemplary network 300 includes a plurality of modules configured to implement the instant techniques as described herein.
- the cloud platform 320 includes, is partially stored on, or is completely stored on the server computing device 104 and/or the client computing device 102 of FIG. 1.
- the user device 310 may be or include the client computing device 102 of FIG. 1 and/or another computing device.
- the cloud platform 320 is communicatively coupled to one or more user-associated devices including user device 310.
- the user device 310 includes a user interface (UI) such as user-side UI 304 that may include a frontend UI and/or an authentication module configured to interface with a security module 312 of the cloud platform 320 to authenticate and/or verify information from a user utilizing the user device 310.
- the user device 310 additionally includes an application client 308 including an API gateway communicatively coupled to a module of the cloud platform 320.
- the API gateway in the application client 308 may include an authentication module managed by a third party (e.g., not managed by the security module 312) or managed by cloud platform 320 (e.g., managed by the security module 312).
- the user device 310 may additionally include a user profile module 306 configured to pull user profile data from one or more databases.
- the cloud platform 320 includes a security module 312 configured to manage security for the cloud platform 320 and/or user devices (e.g., user device 310) interfacing with the cloud platform 320.
- the security module 312 may include functionality for configuring and/or controlling an identity-aware proxy (IAP) functionality, defensive security functionality for defending against web attacks (e.g., denial of service attacks, virtual machine attacks, workload attacks, etc.), and/or a load balancing functionality to distribute network traffic across devices communicatively coupled to the cloud platform 320.
- the security module 312 is communicatively coupled to a client UI module 318.
- the client UI module 318 may be communicatively coupled to one or more client devices (not shown).
- a user device 310 may be a device associated with data for a user and/or a device configured to provide user data to the cloud platform 320.
- a client device (not shown) may be a client computing device 102 and/or other endpoint device communicatively coupling to the cloud platform 320 to utilize the models as described herein.
- the security module 312 may balance a load using a load balancing functionality between the client device, the user device 310, and the cloud platform 320.
- the cloud platform 320 may additionally or alternatively include a separate balance module 314 to perform further load balancing within the cloud platform 320.
- the balance module 314 may function in conjunction with and/or separately from the security module 312.
- the cloud platform 320 additionally includes an application server module 322 configured to serve backend functionalities of the UI (e.g., the client UI module 318 and/or user-side UI 304).
- the application server module 322 is communicatively coupled to the record module 324 and configured to call the record module 324 (e.g., via an API) to facilitate functionality of the techniques as described herein.
- the record module 324 may perform and/or be communicatively coupled to components that perform functionalities as described herein.
- the record module 324 may analyze a corpus of documents (e.g., via a communicatively coupled and/or stored AI model 316) as described herein.
- the record module 324 may call and/or interact with the enrichment module 326 to enrich one or more documents in the corpus of documents (e.g., with metadata, with additional data, by combining records, etc.).
- the record module 324 may call the enrichment module 326 to enrich records for training the AI model 316, for analyzing the records, to enrich records with an analysis of the records (e.g., prior to display and/or storage), etc.
- the cloud platform 320 includes a persistence layer 330 communicatively coupled to the record module 324 and/or enrichment module 326.
- the persistence layer 330 may include one or more databases (e.g., an SQL database, a memory storage database, a training data database, etc.) configured to store data associated with the techniques described herein.
- the persistence layer 330 may store one or more analyzed documents, one or more summaries generated based on documents, one or more categories for a corpus of documents, etc.
- the cloud platform 320 may additionally include a communication module 332, including a virtual private cloud (VPC) network and/or a network address translation (NAT) module configured to enable and/or facilitate communications with one or more external databases, devices, services, etc.
- the communication module 332 may be communicatively coupled to the user profile module 306 of the user device 310, one or more data sources 350, and/or one or more additional devices (e.g., a client device communicatively coupled to the client UI module 318) (not shown).
- the record module 324 and/or enrichment module 326 may call one or more APIs associated with such other devices via the communication module 332 to retrieve and/or store data, to display data, to train and/or access the AI model 316 (e.g., when the AI model is stored at another device), etc.
- the enrichment module 326 is communicatively coupled to one or more functionality modules (e.g., a document analysis module 334, an asynchronous processing module 336, a cloud scheduler module 338, etc.).
- the document analysis module 334 performs one or more functionalities for analysis of a document and/or pre-processing a document for analysis.
- the document analysis module 334 may perform one or more optical character recognition (OCR) operations on an unstructured data file as pre-processing to ready the document for further analysis by the AI model 316.
- the document analysis module 334 functions as part of and/or in concert with the AI model 316 to analyze the documents.
- the asynchronous processing module 336 functions in conjunction with the enrichment module 326 and/or record module 324 to enable asynchronous load spreading (e.g., for analysis of the corpus of documents) across a plurality of compute instances (e.g., for parallel processing).
- the cloud scheduler module 338 may function in conjunction with the enrichment module 326 and/or record module 324 to enable batch processing of the corpus of documents.
- an API module 345 in one or more devices associated with one or more data sources 350 interfaces with the cloud platform 320 (e.g., via the communication module 332) to gather and provide data from one or more databases.
- the API module 345 may be communicatively coupled to a historical user database 352, a device report database 354, a subject matter database 356 (e.g., a healthcare database, a patent database, a financial record database, etc.), and/or any other such database as described herein.
- the database(s) may include one or more servers/devices storing the documents and/or a view for summarizing, selecting, and/or interacting with documents.
- FIG. 4 depicts an exemplary UI 400 for displaying mapped documents to a user.
- the UI 400 is or is associated with a user-side UI 304, client UI module 318, and/or other such UI module of FIG. 3.
- the UI 400 may be displayed to a user via a client computing device 102 of FIG. 1 and/or another such computing device.
- the UI 400 includes a document list 436 including relevant documents from a corpus of documents for display to a user.
- the documents in the document list 436 may be or include any documents that a model (e.g., AI model 316 of FIG. 3) determines to be associated with a subject (e.g., a patient, an inventor, an investor, etc.).
- the document list 436 includes one or more segments/categories of documents determined by the model.
- the document list 436 additionally displays and/or sorts the documents in the document list 436 by an extracted date (e.g., extracted as described herein).
- the document list 436 includes a source type for the documents in the document list 436.
- the UI 400 includes a summary window 410.
- the summary window 410 includes multiple summaries of the documents in the document list 436.
- the summary window 410 displays a summary for a selected document from the document list 436.
- the summary window 410 additionally displays summaries for documents referenced in, associated with, and/or in a similar category to the selected document.
- when a referenced or associated document is missing from the corpus, the summary window 410 may indicate such (e.g., by listing the document and labeling said document as “missing”).
- the summary window 410 displays documents along with particular labels (e.g., a date, a category/classification, a recommendation, a comparison, etc.).
- the UI 400 includes a viewing pane 415 to display documents (e.g., from the document list 436 and/or the summary window 410).
- the viewing pane 415 displays and/or automatically scrolls to a relevant portion of the document (e.g., responsive to a click on a portion of the document list 436 and/or summary window 410).
- the viewing pane 415 automatically highlights one or more keywords in the document (e.g., based on the keyword list 422).
- the UI 400 may additionally include a keyword list 422.
- a computing device displaying the Ul 400 may save (e.g., automatically and/or responsive to an indication from a user) frequently searched keywords.
- the keyword list 422 may include an indication of a number of instances within a currently displayed document (e.g., within the viewing pane 415), within one or more documents in the summary window 410, within one or more documents in the document list 436, etc.
- the keyword list 422 may be part of the summary window 410 and/or another portion of the UI 400.
- the UI 400 may include a duplicate record view 424.
- a computing device implementing the UI 400, and/or communicatively coupled to a computing device implementing the UI 400, may detect when one or more documents in the corpus of documents are duplicate copies and hide the duplicate copies from view.
- the duplicate record view 424 indicates the duplicate copies and/or displays a document designated to be a duplicate copy when selected by a user.
- the UI 400 may include an annotation pane 432.
- users may annotate outside records (e.g., draw shapes, write notes, label portions, etc.) and save the annotations.
- the annotation pane 432 may indicate the existence of such annotations, indicate an annotator, and/or allow a user to view the annotations and/or add additional annotations.
- the UI 400 may include a document checklist 434.
- the checklist may be a list of expected documents (e.g., as generated by a model based on past operations and/or historical data, as input by a user, as pre-recorded for a predetermined operation, as generated based on referenced documents, etc.).
- the document checklist 434 may be or include a checklist of associated image files referenced in and/or associated with the documents in the document list 436 (e.g., a database listing of corresponding images, a database listing of corresponding logs, a database listing of corresponding files, etc.).
- FIG. 5 depicts an exemplary computer-implemented method 500 for analyzing a corpus of documents using one or more large language machine learning models, according to one or more implementations.
- method 500 may be implemented by a system 100, one or more components of the system 100 (e.g., the client computing device 102, server computing device 104, a third party computing device (not shown), etc.), one or more components outside of the system 100 (not shown), one or more alternative systems, etc.
- the system 100 identifies one or more documents associated with a user.
- the one or more documents are part of a larger corpus of documents (e.g., medical documents) associated with a plurality of users including the user in question.
- the one or more documents may be scanned documents, clinical document architecture (CDA) documents or other structured text documents, portable document files (PDF), images, videos, etc.
- the one or more documents may be or include different document types.
- the one or more documents may include a structured data file (e.g., a text file, one or more fillable forms, one or more SQL database files, etc.), a semi-structured data file (e.g., a text file including images, metadata associated with a file, NoSQL database files, etc.), and/or an unstructured data file (e.g., emails, images, scanned documents, videos, etc.).
- unstructured data and/or semi-structured data is initially processed by a separate machine learning model trained to process such data, to enable another model to analyze the data in conjunction with structured data that is able to be analyzed without such pre-processing.
- the system 100 sorts the data based on whether the data file includes structured, semi-structured, or unstructured data. Depending on the implementation, the system 100 performs the sort based on a file extension, based on a file name, based on one or more detected data types, etc. In further implementations, the system 100 additionally or alternatively attempts to parse and/or analyze semi-structured and/or unstructured data using a model for analyzing structured data and calls the pre-processing model responsive to determining that the model fails and/or partially fails.
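- A minimal sketch of such an extension-based sort is shown below; the extension-to-category mapping is an illustrative assumption, and a deployment could also inspect file contents or metadata.

```python
from pathlib import Path

# Illustrative mapping only; a deployment could also inspect content or metadata.
STRUCTURED = {".csv", ".sql", ".xml"}
SEMI_STRUCTURED = {".json", ".html", ".hl7"}

def classify_by_extension(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext in STRUCTURED:
        return "structured"
    if ext in SEMI_STRUCTURED:
        return "semi-structured"
    return "unstructured"  # e.g., scanned PDFs, images, emails, videos

def sort_corpus(paths: list[str]) -> dict[str, list[str]]:
    buckets: dict[str, list[str]] = {"structured": [], "semi-structured": [], "unstructured": []}
    for p in paths:
        buckets[classify_by_extension(p)].append(p)
    return buckets
```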
- the system 100 identifies the one or more documents based on natural language processing (NLP) techniques to detect a user identity (e.g., a patient name, account number, phone number, email address, etc.).
- the system 100 splits and classifies data in documents into one or more categories (as described below with regard to block 503) including a name, identity, date, etc., and the system 100 identifies the one or more documents as associated with the user based on the categorized data.
- the one or more documents are or include structured text data documents (e.g., Health Level 7 (HL7) Consolidated Clinical Document Architecture (C-CDA) documents) from one or more health information exchange networks.
- the system 100 may use NLP and/or structured document semantic techniques to process the one or more documents prior to rendering and/or incorporating the one or more documents with other documents for analysis and/or display (e.g., as described below with regard to blocks 503, 504, 506, and/or 508).
- the system 100 additionally or alternatively identifies dates associated with the one or more documents.
- the system 100 may extract the dates from categorized data (e.g., as described in more detail below with regard to block 503).
- the system 100 may parse the dates associated with the one or more documents based on metadata and/or other such assigned and/or collated data. Such extraction of dates may be particularly useful, as conventional machine learning and NLP tools are generally unable to differentiate between different dates present on some records (e.g., scanned date, signed date, visit date, publication date, filing date, etc.). As such, the use of a particularly trained machine learning model as described herein allows for easier and less resource-intensive automatic extraction of dates.
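- One way such targeted date extraction could be prompted is sketched below; the call_llm helper and the JSON schema are hypothetical stand-ins for whichever trained model and output format a deployment exposes.

```python
import json

DATE_PROMPT = (
    "For the document text below, return JSON with the keys "
    '"visit_date", "signed_date", and "scanned_date" (use null when absent).\n\n{text}'
)

def extract_dates(text: str, call_llm) -> dict:
    """call_llm is a hypothetical callable wrapping the deployed language model."""
    raw = call_llm(DATE_PROMPT.format(text=text))
    try:
        return json.loads(raw)  # targeted, structured output rather than a free-form answer
    except json.JSONDecodeError:
        return {"visit_date": None, "signed_date": None, "scanned_date": None}
```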
- the system 100 collects at least some of the one or more documents (e.g., via the data collection module 172 of FIG. 1; the record module 324, the communication module 332, etc. of FIG. 3; and/or any other such component as described herein).
- the system 100 may determine to collect the one or more documents responsive to a determination (e.g., by one or more machine learning models as described below).
- the determination may be or include a determination that one or more documents are missing (e.g., based on the identification as described above), a detection of a reference to another environment (e.g., another hospital) that may include additional documentation, a determination that a user is requesting additional documents, etc.
- the system 100 collects the documents by generating automated phone calls, fax requests, emails, etc. In further implementations, the system 100 may automatically generate collated data classification(s) and/or analyze the documents collected as described in blocks 503 and/or 504 below.
- the system 100 and/or the trained large language machine learning models of the system 100 include chatbot functionalities (e.g., the chat mode described above).
- the system 100 may utilize the chatbot functionalities to communicate with a user to guide the user through requesting documents from other environments rather than automatically generating the requests.
- the system 100 may utilize the chatbot functionality to generate one or more questions for the user, for the outside environment, for another user (e.g., a physician, nurse, administrator, etc.) regarding the outside records and/or to obtain the outside records.
- the system 100 may utilize the chatbot functionality to adjust summarization, output, and/or other such functions (e.g., as described below with regard to blocks 506 and/or 508).
- the system 100 generates, using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents (e.g., using a classification tree based on the data included in the one or more documents, using neural network focused techniques, using vector space embedding values, etc.).
- the system 100 is able to reduce, remove, and/or mitigate hallucinations in the model.
- the machine learning model is able to use specific and targeted queries to analyze the data.
- the machine learning model is able to more particularly analyze the categorized data, and is less likely to provide broad or false answers (e.g., due to mischaracterizing data, due to incorrectly connecting data to a broad term, etc.). As such, hallucination and error rates may be reduced by generating and analyzing the collated data classifications.
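- A minimal sketch of collating per-category answers from narrow, targeted queries is shown below; the category list and the call_llm helper are hypothetical.

```python
# Hypothetical category list; a deployment would derive these from its classification tree.
CATEGORIES = ["header", "date", "name", "diagnosis", "recommendation", "methods"]

def collate_classifications(document_text: str, call_llm) -> dict[str, str]:
    """Issue one narrow query per category and collate the answers, rather than
    asking a single broad question that is more prone to hallucinated answers."""
    collated = {}
    for category in CATEGORIES:
        prompt = (
            f"Quote only the text from the document that belongs to the "
            f"'{category}' section. Reply 'NONE' if absent.\n\n{document_text}"
        )
        collated[category] = call_llm(prompt)
    return collated
```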
- the machine learning model is a multi-step machine learning model.
- the same machine-learning model may perform each step as described herein, and may be fine-tuned at each step (e.g., as described in detail above).
- the machine learning model may be trained with particular documents and/or data at each step to better perform the individual step and/or may otherwise receive a plurality of targeted prompts to retrieve specific information from each document.
- the machine learning model may be trained to generate the collated data classifications using a number of documents from a broad range of categories with appropriate labeling.
- the machine learning model may be trained to analyze the documents with individual documents and subcategories, with appropriate labeling.
- the machine learning model may be trained to perform each individual step rather than broadly trained to output a generalized answer, allowing for overall improved performance and scaling without need for normalization, which may not be possible at the broad ranges of data which the machine learning model must analyze.
- the multi-step machine learning model may be consistently accurate, straightforward to evaluate, tailored to user needs, and capable of automatically correcting when errors arise.
- the system 100 splits and classifies the data into categories based on the document(s) being analyzed. For example, the system 100 may split documents into a header section, a date section, a name section, a diagnosis section, a recommendation section, a methods section, etc. In further implementations, the system 100 generates summaries for the document(s) based on the categorized data. In still further implementations, the system 100 generates a summary for at least some of the categories (e.g., a summary for the diagnosis sections, a summary for the recommendation sections, a summary for the methods sections, etc.).
- the system 100 generates a summary for at least some of the categories for each document separately (e.g., a summary for each diagnosis section of each document, a summary for each recommendation section for each document, a summary for each methods section for each document, etc.).
- the system 100 generates the summaries based on and/or to include an F1 score (e.g., the harmonic mean of the precision and recall of the classification model) and/or accuracy score indicative of the reliability of the model.
- the system 100 may generate the summaries to indicate missing information in the documents, incorrect information in the documents, inconsistent information in the documents, and/or abnormal information in the documents.
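- The F1 score mentioned above is the harmonic mean of precision and recall; a minimal computation from classification counts is shown below.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for a classification step."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# e.g., 90 correct labels, 10 spurious labels, 5 missed labels -> F1 ~= 0.923
print(round(f1_score(90, 10, 5), 3))
```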
- the system 100 splits the data in the documents by detecting individual sub-documents within a larger document.
- a single document may include multiple sub-documents, each indicative of a different portion of a subject’s file (e.g., for a medical file, each sub-document may represent a different procedure that was performed; for a patent file, each sub-document may be a separate patent owned in a portfolio; etc.).
- the system 100 analyzes the one or more documents (e.g., in the collated data classifications) to extract data (e.g., clinical data) for at least one user (e.g., the user(s) associated with the one or more documents).
- the extracted data may include data types for the one or more documents and/or relevant dates associated with the data types.
- the data types may include radiological or medical imaging studies (e.g., such as (i) screening mammogram, (ii) diagnostic mammogram, (iii) biopsy procedure report, (iv) biopsy pathology report, (v) post-biopsy imaging, (vi) computed tomography scans, (vii) x-ray studies, (viii) ultrasound reports, (ix) echocardiograph reports, (x) MRI reports, (xi) clinical notes, (xii) microbiology reports, (xiii) operative reports, (xiv) diagnostic test reports, and/or (xv) any other such indication).
- analyzing the one or more documents includes analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data.
- the one or more documents may be or include medical documents including images (e.g., radiology images), and the system 100 may analyze the documents to determine a type of image.
- the system 100 uses the image analysis models to detect text on the image(s) (e.g., via optical character recognition (OCR) techniques) and determines the extracted data (e.g., document type and/or associated dates) based on such.
- the system 100 analyzes at least some of the documents using the one or more large language machine learning models.
- the system 100 may generate the extracted data using the large language machine learning model(s).
- the system 100 may use both image analysis models and the large language models.
- the image analysis models may be separate from the large language models and/or a functionality of such.
- the system 100 may interpret the extracted data in the one or more documents (e.g., using a machine learning model through the analysis process of block 504) to determine one or more procedures, pre-visit testing orders, appointment types, etc. for a user and/or patient associated with the one or more documents.
- the system 100 automatically orders, schedules, and/or otherwise organizes the recommended procedures, appointments, etc. based on the above.
- the system 100 displays an indication to a user and/or patient for the recommendations generated above, and orders, schedules, and/or otherwise organizes such responsive to an acceptance and/or other such indication from the user and/or patient.
- the system 100 generates, using one or more trained large language machine learning models, a mapping of the data.
- the mapping is for a single user across the one or more documents and/or the corpus of documents.
- the mapping is for multiple users (e.g., users that are related, users that are associated with a particular overseer (e.g., a clinician, doctor, lawyer, etc.), users that are demographically similar (e.g., similar ages, genders, builds, etc.), and/or users that have a similarly relevant connection).
- the mapping of the data includes an indication that the extracted data is available and/or an indication of missing data not included in the extracted data.
- the mapping includes one or more suggestions for additional actions to perform to populate fields associated with the missing data (e.g., performing one or more tests, requesting information from a user, providing one or more forms to a user, etc.).
- the system 100 stitches categorized data back into a single whole. By putting the extracted information back together, the system 100 is able to verify the elements that go into the mapping, individually ensuring each element and further preventing and/or mitigating hallucinations or errors.
- the system 100 compares the mapped data to a checklist (e.g., predetermined by a user, preprogrammed into the system 100, generated by a model in the system 100 based on one or more other referenced documents, etc.). As such, the system 100 may determine whether any documents are missing from the corpus of documents. In some such implementations, the system 100 may automatically alert a user that one or more documents are missing and/or prompt the user to provide the missing document(s). In further implementations, the system 100 may automatically query one or more additional databases to search for the missing document(s) and/or transmit the results to a verification module for additional analysis and/or verification. In some implementations, the system 100 saves resources by comparing the results to a checklist.
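- A minimal sketch of the checklist comparison, assuming document types have already been extracted into the mapping, is shown below; the set names and example entries are illustrative.

```python
def find_missing_documents(mapped_types: set[str], checklist: set[str]) -> set[str]:
    """Return checklist entries with no corresponding document in the mapping."""
    return checklist - mapped_types

# Illustrative usage:
checklist = {"screening mammogram", "biopsy pathology report", "post-biopsy imaging"}
mapped = {"screening mammogram", "post-biopsy imaging"}
missing = find_missing_documents(mapped, checklist)  # {"biopsy pathology report"}
```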
- the system 100 may prevent unneeded repetitions of searches when a document is missing (e.g., searching and reanalyzing until a timeout occurs) and/or prevents unnecessary analysis (e.g., stopping when the checklist is complete rather than searching all documents), depending on the implementation.
- the system 100 evaluates performance of the model and/or of other modules in the system 100 by comparing the mapped data to manually labeled outside medical records to quantify the accuracy of the output. In further implementations, the system 100 evaluates performance of the model and/or other modules at each step of the process (e.g., to fine-tune the model(s) at each step as described in more detail below). In still further implementations, the performance may be evaluated for accuracy errors, omission errors, readability, etc. (e.g., using labeled data, using manual review, using a teacher model, etc.).
- the system 100 additionally generates a customizable summary of the data for the user.
- the system 100 receives one or more customizable metrics from a client device (e.g., associated with a doctor, lawyer, accountant, etc.) indicative of information to include in the summary.
- the system 100 uses one or more machine learning models (e.g., the large language machine learning model described above) to generate the summary.
- the system 100 may determine the one or more customizable metrics based on information previously received from and/or used by the client device.
- the system 100 generates the customizable summary to include elements of and/or the entirety of the summaries for various categories as described above.
- the system 100 fine-tunes the one or more machine learning models for each particular step of the process.
- the system 100 may fine-tune one or more models generating a summary by training the one or more models using labeled summary data and input documents related to the particular summary (e.g., a model to generate summaries for catheterization lab reports is trained using summaries of catheterization lab reports, a model to generate summaries of patent applications is trained using summaries of patent applications, etc.).
- the system 100 may utilize access to a diverse set of records and/or data to effectively train the one or more models.
- the various models and/or steps are fine-tuned (e.g., based on categorization, date, relevant rules, etc. as described above). As such, the overall resource usage is reduced and improved while allowing for analysis of the corpus of documents.
- the system 100 additionally detects one or more duplicate versions of at least some of the one or more documents (e.g., in the corpus of documents). Depending on the implementation, the system 100 may remove the one or more duplicate versions from the one or more documents and/or the mapping. In further implementations, the system 100 may determine that the one or more duplicate versions include one or more updated versions and replace outdated documents with the updated versions. Similarly, when initially identifying the one or more documents that are associated with a user (e.g., at block 502), the system 100 may identify the duplicate documents and remove such by refraining from including the documents in the one or more documents to be used for generating the mapping.
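- One simple way to flag exact duplicate copies is to hash normalized document text, as sketched below; fuzzier matching (e.g., embedding similarity) is equally possible, and the hashing choice is only illustrative.

```python
import hashlib

def normalized_hash(text: str) -> str:
    """Hash of lowercased, whitespace-collapsed text; identical hashes flag exact duplicates."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(documents: dict[str, str]) -> dict[str, str]:
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for doc_id, text in documents.items():
        digest = normalized_hash(text)
        if digest not in seen:  # keep the first copy, drop later duplicates
            seen.add(digest)
            kept[doc_id] = text
    return kept
```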
- the system 100 detects that some of the documents have been rotated. In some such implementations, the system 100 specifically detects that a first orientation of at least some of the one or more documents does not match a second orientation of a remainder of the one or more documents. The system 100 then modifies the first orientation to match the second orientation. In some such implementations, the system 100 modifies the orientation(s) prior to analyzing the one or more documents. In further implementations, the system 100 modifies the orientation(s) prior to causing the output device to display the mapping.
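- As one possible realization (not specified by the disclosure), page orientation could be detected with an OCR engine's orientation/script detection and corrected by rotating the image; the pytesseract and Pillow libraries below are assumptions, and the sign of the correction angle may need to be verified for a given OCR version.

```python
import re
from PIL import Image
import pytesseract  # assumed OCR engine binding; not specified by the disclosure

def correct_orientation(image_path: str) -> Image.Image:
    """Detect the rotation of a scanned page and rotate it back toward upright."""
    image = Image.open(image_path)
    osd = pytesseract.image_to_osd(image)  # orientation/script detection report
    match = re.search(r"Rotate: (\d+)", osd)
    angle = int(match.group(1)) if match else 0
    # PIL rotates counter-clockwise; verify the sign convention for the OCR engine in use.
    return image.rotate(-angle, expand=True) if angle else image
```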
- the system 100 detects that at least some of the documents are in a language other than a preferred language (e.g., for an individual reviewing the documents). For example, the preferred language may be English while some of the documents are in Spanish, Arabic, Mandarin Chinese, etc.
- the system 100 may translate the documents from the document language to the preferred language.
- the system 100 uses one or more trained translation machine learning models (e.g., some of the large language models) to translate the documents.
- the system 100 may integrate with other applications (e.g., via an API) to perform the functionality described herein.
- the system 100 may include an API (e.g., API 166) to retrieve and/or request access to the corpus of documents.
- the system 100 may use the API to interface with other services and/or platforms (e.g., Substitutable Medical Applications and Reusable Technologies (SMART), Fast Healthcare Interoperability Resources (FHIR), etc.).
- the system 100 may evaluate the performance of the functionalities described herein. For example, the system 100 may evaluate the performance based on (i) appropriate data science metrics for the given task, (ii) clinical usability and utility, and (iii) system evaluation and testing. For example, the system 100 may use each of the following: (i) ROUGE: Used to evaluate summarization; (ii) BLEU: Measures translation quality; (iii) ROC/AUC: Duplication evaluation, etc.
- the duplication task may be treated as a binary classification task where the model outputs a true/false for duplicate or not duplicate. Therefore, ROC/AUC may be used to measure task performance.
- the ROUGE and BLEU metrics may be generally accepted quantifications of these tasks and may be used to quantify performance.
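- A hedged sketch of computing these metrics with common open-source packages is shown below; the rouge-score, NLTK, and scikit-learn packages are assumptions, as the disclosure only names the metrics themselves.

```python
from rouge_score import rouge_scorer                 # pip install rouge-score
from nltk.translate.bleu_score import sentence_bleu  # pip install nltk
from sklearn.metrics import roc_auc_score            # pip install scikit-learn

# ROUGE for summarization quality.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score("the reference summary", "the generated summary")

# BLEU for translation quality (token lists; a smoothing function may be desirable
# for short sentences).
bleu = sentence_bleu([["the", "reference", "translation"]],
                     ["the", "generated", "translation"])

# ROC/AUC for the duplicate-or-not binary classification task.
auc = roc_auc_score([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4])
```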
- the summarization and/or translation assessment criteria may include Clinical Quality/Clinical Correctness: (i) Factual Error Count (e.g., how many factual errors per response; put another way, whether the content provided is accurate and not a hallucination); (ii) Omission (e.g., whether there is any notable missing information that would be considered clinically relevant); (iii) Relevance (e.g., how relevant the content provided is to the question/prompt); (iv) Timeliness (e.g., whether the content provided is appropriate for the particular point in time); (v) Actionable (e.g., whether the content is clinically actionable, meaning it could be acted upon confidently); and/or (vi) other such criteria.
- Further criteria may be and/or include output structural quality: (i) Voice (Yes/No) (e.g., whether the content is written for the correct clinical audience); (ii) Detail/Content Length (Yes/No) (e.g., whether the output is too verbose (for narrative output) or overly concise and/or terse); (iii) Variability (Rating 1 to 5) (e.g., whether the responses vary when asking the same question multiple times); (iv) Content Structure/Format (X/X - # of sections correctly matched out of total sections) (e.g., whether content matches to the desired structure (such as paragraphs, list formatting, etc.)); and/or (v) other such criteria.
- Unit, integration, and system level testing may be conducted on the system before deployment. Deployments may be contingent on successful automated testing, which may be integrated into the overall development and release process.
- Perplexity is an entropy-based method for measuring how well linguistic patterns match or “fit” into some pre-determined model.
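- Perplexity can be computed as the exponential of the average negative log-likelihood of the observed tokens; a minimal sketch is shown below.

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """exp of the average negative log-likelihood; lower values indicate a better fit."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# e.g., four tokens with natural-log probabilities:
print(round(perplexity([-0.1, -0.5, -0.2, -0.4]), 3))  # ~= 1.35
```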
- the system 100 may determine that incoming outside records have changed (e.g., new document types, new formats, new reporting systems or templates, etc.).
- Example 1 A computing system for analyzing a corpus of documents using one or more large language machine learning models, the computing system comprising: one or more processors; and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: identify, via the one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; generate, via the one or more processors and using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents; analyze, via the one or more processors, the one or more collated data classifications to generate extracted data for the at least one user from the one or more documents; generate, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and cause, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
- Example 2 The computing system of example 1 , wherein the mapping of the extracted data includes: an indication that the extracted data is available; and an indication of missing data not included in the extracted data.
- Example 3 The computing system of example 1 , the one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: generate, via the one or more processors and using the one or more machine learning models, a customizable summary of the extracted data for the at least one user.
- Example 4 The computing system of example 1 , wherein analyzing the one or more documents includes: analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data.
- Example 5 The computing system of example 1 , wherein analyzing the one or more documents includes: analyzing, via the one or more processors, each document of the one or more documents using the one or more large language machine learning models.
- Example 6 The computing system of example 1 , the one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: detect, via the one or more processors, one or more duplicate versions of at least some of the one or more documents; and remove, via the one or more processors, the one or more duplicate versions from the one or more documents or the mapping.
- Example 7 The computing system of example 1 , the one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: detect, via the one or more processors, that a first orientation of at least some of the one or more documents does not match a second orientation of a remainder of the one or more documents; and modify, via the one or more processors, the first orientation to match the second orientation.
- Example 8 The computing system of example 1 , the one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: determine, via the one or more processors, that a document language of at least some of the one or more documents does not match a preferred language; and translate, via the one or more processors and using one or more trained translation machine learning models, the at least some of the one or more documents to the preferred language.
- Example 9 The computing system of example 1 , wherein the extracted data includes data types present in the one or more documents and relevant dates associated with the data types.
- Example 10 The computing system of example 9, wherein the data types include radiological or medical imaging studies.
- Example 10B The computing system of example 9, wherein the radiological or medical imaging studies include at least one of: (i) screening mammogram, (ii) diagnostic mammogram, (iii) biopsy procedure report, (iv) biopsy pathology report, or (v) post-biopsy imaging.
- Example 11 A non-transitory computer-readable medium, having stored thereon instructions that when executed, cause a computer to: identify, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; generate, via the one or more processors and using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents; analyze, via the one or more processors, the one or more collated data classifications to generate extracted data for the at least one user from the one or more documents; generate, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and cause, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
- Example 12 The non-transitory computer-readable medium of example 11, wherein the mapping of the extracted data includes: an indication that the extracted data is available; and an indication of missing data not included in the extracted data.
- Example 13 The non-transitory computer-readable medium of example 11, having stored thereon instructions that when executed, cause a computer to: generate, via the one or more processors and using the one or more machine learning models, a customizable summary of the extracted data for the at least one user.
- Example 14 The non-transitory computer-readable medium of example 11, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data.
- Example 15 The non-transitory computer-readable medium of example 11, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, each document of the one or more documents using the one or more large language machine learning models.
- Example 16 The non-transitory computer-readable medium of example 11, having stored thereon instructions that when executed, cause a computer to: detect, via the one or more processors, one or more duplicate versions of at least some of the one or more documents; and remove, via the one or more processors, the one or more duplicate versions from the one or more documents or the mapping.
- Example 17 The non-transitory computer-readable medium of example 11, having stored thereon instructions that when executed, cause a computer to: detect, via the one or more processors, that a first orientation of at least some of the one or more documents does not match a second orientation of a remainder of the one or more documents; and modify, via the one or more processors, the first orientation to match the second orientation.
- Example 18 The non-transitory computer-readable medium of example 11, having stored thereon instructions that when executed, cause a computer to: determine, via the one or more processors, that a document language of at least some of the one or more documents does not match a preferred language; and translate, via the one or more processors and using one or more trained translation machine learning models, the at least some of the one or more documents to the preferred language.
- Example 19 The non-transitory computer-readable medium of example 11, wherein the extracted data includes data types present in the one or more documents and relevant dates associated with the data types.
- Example 20 The non-transitory computer-readable medium of example 19, wherein the data types include radiological or medical imaging studies.
- Example 20B The non-transitory computer-readable medium of example 20, wherein the radiological or medical imaging studies include at least one of: (i) screening mammogram, (ii) diagnostic mammogram, (iii) biopsy procedure report, (iv) biopsy pathology report, or (v) post-biopsy imaging.
- Example 21 A computer-implemented method for analyzing a corpus of documents using one or more large language machine learning models, the computer-implemented method comprising: identifying, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; generating, via the one or more processors and using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents; analyzing, via the one or more processors, the one or more collated data classifications to generate extracted data for the at least one user from the one or more documents; generating, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and causing, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
- Example 22 The computer-implemented method of example 21, wherein the mapping of the extracted data includes: an indication that the extracted data is available; and an indication of missing data not included in the extracted data.
- Example 23 The computer-implemented method of example 21, further comprising: generating, via the one or more processors and using the one or more machine learning models, a customizable summary of the extracted data for the at least one user.
- Example 24 The computer-implemented method of example 21, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data.
- Example 25 The computer-implemented method of example 21, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, each document of the one or more documents using the one or more large language machine learning models.
- Example 26 The computer-implemented method of example 21, further comprising: detecting, via the one or more processors, one or more duplicate versions of at least some of the one or more documents; and removing, via the one or more processors, the one or more duplicate versions from the one or more documents or the mapping.
- Example 27 The computer-implemented method of example 21, further comprising: detecting, via the one or more processors, that a first orientation of at least some of the one or more documents does not match a second orientation of a remainder of the one or more documents; and modifying, via the one or more processors, the first orientation to match the second orientation.
- Example 28 The computer-implemented method of example 21, further comprising: determining, via the one or more processors, that a document language of at least some of the one or more documents does not match a preferred language; and translating, via the one or more processors and using one or more trained translation machine learning models, the at least some of the one or more documents to the preferred language.
- Example 29 The computer-implemented method of example 21, wherein the extracted data includes data types present in the one or more documents and relevant dates associated with the data types.
- Example 30 The computer-implemented method of example 29, wherein the data types include radiological or medical imaging studies.
- Example 30B The computer-implemented method of example 30, wherein the radiological or medical imaging studies include at least one of: (i) screening mammogram, (ii) diagnostic mammogram, (iii) biopsy procedure report, (iv) biopsy pathology report, or (v) post-biopsy imaging.
- any reference to "some implementations” or “an implementation” means that a particular element, feature, structure, or characteristic described in connection with the implementation is included in at least some implementations.
- the appearances of the phrase “in some implementations” in various places in the specification are not necessarily all referring to the same implementation.
- the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion.
- a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
- “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Abstract
Systems and methods for analyzing a corpus of documents using one or more machine learning models are provided. An exemplary system includes a processor and a memory having stored thereon computer-executable instructions that, when executed by the processor, cause the computing system to: (i) identify one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; (ii) generate, using one or more trained large language machine learning models, one or more collated data classifications; (iii) analyze the one or more collated data classifications to generate extracted data for the at least one user; (iv) generate, using one or more trained large language machine learning models, a mapping of the extracted data; and (v) cause the mapping of the extracted data to be displayed via an output device.
Description
SYSTEMS AND METHODS FOR ANALYZING A CORPUS OF DOCUMENTS USING LARGE LANGUAGE MACHINE LEARNING MODELS
TECHNICAL FIELD
[0001] The present disclosure is generally directed to methods and systems for using machine learning models to identify, review, analyze, and catalogue a corpus of documents, and more particularly, to techniques for training and operating one or more large language machine learning models to map information amongst such a corpus.
BACKGROUND
[0002] Document analysis often requires review of large quantities of documents. For example, caring for complex patients often requires review of large corpuses of outside documents with minimal consistency in the format and/or readability of the records. Various groups (e.g., clinics such as a breast clinic) often include multidisciplinary teams (e.g., surgical and medical oncology, radiology, radiation oncology, allied health staff, etc.) to handle a complex new referral process without additional assistance from other groups (e.g., the Enterprise Office of Access Management (EOAM)). For each patient referred as a new consult, such teams check to ensure that the patient has the appropriate imaging and pathology studies done prior to the visit. For patients who do not, additional testing must be completed at the clinic or prior to arrival. Conventionally, this process is complex and highly manual.
[0003] Some clinics receive over 12 million pages of scanned patient medical records annually, in addition to extensive structured and unstructured data through various databases and record management applications. Despite the proliferation of electronic health records (EHRs), the volume of scanned records imported continues to grow. Records in both formats tend to be cumbersome and time-consuming to review and are subject to multiple layers of manual review in the current status quo. Specifically, records may be reviewed by requesting care teams to ensure completeness, then reviewed by Health Information Management Services (HIMS) for indexing, then reviewed by a pre-visit clinician for triage, and again reviewed by the clinician directly caring for the patient.
[0004] Similarly, most conventional techniques for outside records review focus on managing documents, sharing discrete data elements between institutions, a specific type of medical record information (e.g., laboratory results), or requesting/retrieving records. However, such techniques fail to provide functionality for locating, synthesizing, and presenting the right information to the right clinician.
BRIEF SUMMARY
[0005] In addressing the above concerns, large language models (LLMs) hold promise for record navigation and summarization. However, any such use cases require that LLM outputs are sufficiently accurate and formatted to clinicians' needs. Additionally, LLMs have demonstrated state-of-the-art performance in language translation tasks, which may assist in document review.
[0006] In one implementation, a computing system for analyzing a corpus of documents using one or more large language machine learning models includes one or more processors, and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: (i) identify, via the one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; (ii) analyze, via the one or more processors, the one or more documents to generate extracted data for the at least one user from the one or more documents; (iii) generate, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and (iv) cause, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
[0007] In another implementation, a non-transitory computer-readable medium includes instructions that when executed, cause a computer to: (i) identify, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; (ii) analyze, via the one or more processors, the one or more documents to generate extracted data for the at least one user from the one or more documents; (iii) generate, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and (iv) cause, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
[0008] In yet another implementation, a computer-implemented method for analyzing a corpus of documents using one or more large language machine learning models includes: (i) identifying, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; (ii) analyzing, via the one or more processors, the one or more documents to generate extracted data for the at least one user from the one or more documents; (iii) generating, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and (iv) causing, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
BRIEF DESCRIPTION OF THE FIGURES
[0009] The figures described below depict various implementations of the systems and methods disclosed herein. It should be understood that each figure depicts one example of a particular implementation of the disclosed systems and methods, and that each of the figures is intended to accord with a possible implementation thereof. Further, wherever possible, the following description refers to the reference numerals included in the following figures, in which features depicted in multiple figures are designated with consistent reference numerals.
[0010] FIG. 1 depicts an exemplary computing environment in which the techniques disclosed herein may be implemented, according to some implementations.
[0011] FIG. 2 depicts a combined block and logic diagram in which exemplary computer- implemented methods and systems for training a large language model are implemented according to some implementations.
[0012] FIG. 3 depicts an exemplary network including a user device, cloud platform, and data sources, which may be implemented in the exemplary computing environment of FIG. 1, according to some implementations.
[0013] FIG. 4 depicts an exemplary user interface for presenting mapped data generated by a machine learning model, which may be implemented in the exemplary computing environment of FIG. 1, according to some implementations.
[0014] FIG. 5 depicts an exemplary computer-implemented method for analyzing a corpus of documents using one or more large language machine learning models, according to some implementations.
[0015] The figures depict preferred implementations for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative implementations of the systems and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION
[0016] The instant disclosure details techniques to enhance review functionality for large corpuses of documents by incorporating large language models (LLMs), ultimately yielding greater time and cost savings for those who review records such as outside medical records.
[0017] Depending on the implementation, records may include scanned documents and various application- and/or database-specific (e.g., CareEverywhere) documents. A clinician and/or other individual may utilize the documents to perform visit triage and clinical care. However, existing processes for such tasks do not make optimal use of artificial intelligence and large language models, which, applied correctly, could yield large time savings. These time savings would serve to reduce administrative burden and burnout, while improving patient satisfaction with the referral process, as well as patient outcomes in cases where lengthy triage delays can be averted.
[0018] Some clinics, in particular, manage their own triage process(es) as opposed to relying on the Enterprise Office of Access Management (EOAM). In some such implementations, the clinics self-manage triage processes due to the complex nature of the patients referred to them. Patients referred, for example, may need a combination of imaging and pathology results available prior to the initial consultation. For patients who have not completed this testing elsewhere, it may be performed at the clinic. However, the current processes for cataloging this information per patient are highly manual and time consuming, and present opportunities for introducing efficiencies through automation. In particular, an application may be introduced in conjunction with LLMs in order to both increase time savings and clinician satisfaction with outside medical records review.
[0019] In particular, when a clinic sees high volumes of new consults, preparing for each new consult requires a highly manual process of cataloging outside results, and in some cases, obtaining additional testing, prior to the consult visit. This process could be made substantially more efficient by leveraging AI to help teams save time when reviewing outside records. Further, incorporating LLMs to facilitate outside records clean up, cataloging, and summarization may provide benefits including at least: (i) enhanced time savings and satisfaction for care teams when reviewing outside medical records; (ii) improved triage wait times for patients requesting new consult referrals; and/or (iii) synergy and enablement of other outside medical records projects by utilizing an API directed to facilitating the capabilities as described herein.
[0020] Moreover, the techniques described herein are capable of handling heavier loads while remaining scalable. Specifically, in some implementations, the techniques described herein utilize a microservices architecture to decouple parts of the application so they can scale independently. In further implementations, the techniques described herein implement autoscaling of compute and storage resources when appropriate and/or leverage load balancing services to manage and distribute incoming application traffic. In still further implementations, the components are then packaged following continuous integration/continuous delivery (CI/CD) paradigms, which enable seamless and repeatable processes for adding new features, building, and deploying the application continuously. Consequently, in addition to the benefits provided by the methods described herein, the instant systems discussed herein further enable review of an ever-increasing volume and variety of documents. Moreover, the systems may follow rules and/or guidelines from various best-practices and/or regulatory bodies (e.g., MCC, AIF, and CAF, among others).
[0021] In further implementations, the instant techniques may provide for comprehensive load and performance testing (e.g., each time a new department or site and/or feature is added). In still further implementations, the systems discussed herein may provide health monitoring and alerts (e.g., by monitoring compute and storage components in real time for any sudden spikes or sustained stress to the system) and/or send automated alerts to personnel (e.g., system engineers). Further, the instant techniques may provide for data and code backups (e.g., of databases in the cloud as well as code repositories). Moreover, the instant techniques provide for user-centric and/or post-deployment support.
[0022] Depending on the implementation, the instant techniques may achieve greater accuracy in document splitting, classification, and date extraction (e.g., 80-95% accuracy). Additionally, the instant techniques may reduce processing time and resource usage by reducing overall computational time spent analyzing records (e.g., reducing time spent by at least 12 minutes).
[0023] Depending on the implementation, a plurality of use cases for the instant techniques may be envisioned. For example, use cases may include usages such as record intake, triage, and clinical care of patients who are seen as “New Consults/New Visits” at a given institution. For such visits, whether at large referral centers or community clinics, visit protocols often begin with a medical records request. An intake institution may request documents for a patient, which must be sorted. Sorting is conventionally manual, but is automated according to the instant techniques. Depending on the implementation, such sorting may include identifying a start and/or end of a document, classifying each document into one of several clinical categories, and identifying the relevant date of a document. After confirming that the correct documents have been received, institutions will then follow some triage protocols where the institutions attempt to identify specific pieces of information from outside records that would help match an incoming patient with the correct services. Conventional systems require a manual reviewer to review each document. The instant techniques, however, provide search and summarization options as described herein. Finally, once a patient is matched with a set of services, those services are scheduled. The nurses, physicians, and/or NPPAs conducting the visit may then have additional need for search or summarization to identify key data to propose a diagnosis and treatment plan. The instant techniques enable both search and summarization, and also allow for the annotation of outside records to save time and reduce errors related to missing buried clinical data.
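By way of a non-limiting illustration only, the following Python sketch shows one possible form of the sorting step described above (document splitting, clinical classification, and relevant-date extraction). The category list, the classify_page() heuristic, and the date pattern are hypothetical stand-ins for the trained large language model calls and are not drawn from the disclosure.

```python
# Illustrative sketch of intake sorting: split a scanned record into documents,
# assign a clinical category, and extract a relevant date per document.
import re
from dataclasses import dataclass
from typing import Optional

CATEGORIES = {  # hypothetical clinical categories and trigger keywords
    "screening mammogram": ["screening mammogram"],
    "diagnostic mammogram": ["diagnostic mammogram"],
    "biopsy pathology report": ["pathology", "biopsy"],
    "operative report": ["operative report"],
}

DATE_PATTERN = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")

@dataclass
class SortedDocument:
    category: str
    relevant_date: Optional[str]
    pages: list

def classify_page(text: str) -> str:
    """Stand-in classifier; a production system would prompt a trained LLM instead."""
    lowered = text.lower()
    for category, keywords in CATEGORIES.items():
        if all(keyword in lowered for keyword in keywords):
            return category
    return "unclassified"

def sort_scanned_record(pages: list) -> list:
    """Group consecutive pages of the same category into one document."""
    documents = []
    for page in pages:
        category = classify_page(page)
        match = DATE_PATTERN.search(page)
        date = match.group(1) if match else None
        if documents and documents[-1].category == category:
            documents[-1].pages.append(page)  # same document continues
            documents[-1].relevant_date = documents[-1].relevant_date or date
        else:  # a category change marks the start of a new document
            documents.append(SortedDocument(category, date, [page]))
    return documents
```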
[0024] Additional use cases may include legal document analysis (e.g., patent document analysis, discovery document analysis, etc.), financial document analysis (e.g., investment records for a user, tax documents, etc.), property document analysis, and/or other such use cases as described herein.
Exemplary Computing Environment
[0025] FIG. 1 depicts an exemplary computing environment and system 100 in which the techniques disclosed herein may be implemented, according to some implementations. The system 100 may include computing resources for training and/or operating machine learning models to collate and analyze documents in a clinical context, in some implementations.
[0026] The system 100 may include a client computing device 102, a server computing device 104, an electronic network 106, a context electronic database 110, and a model electronic database 112. The computing environment may further include one or more cloud application programming interfaces (APIs) 114. The components of the system 100 may be communicatively connected to one another via the electronic network 106, in some implementations.
[0027] The client computing device 102 may implement, inter alia, operation of one or more applications for facilitating analysis of a corpus of documents using one or more machine learning models (e.g., large language machine learning models). In some implementations, the client computing device 102 may be implemented as one or more computing devices (e.g., one or more servers, one or more laptops, one or more mobile computing devices, one or more tablets, one or more wearable devices, one or more cloud-computing virtual instances, etc.). In some implementations, a plurality of client computing devices may be part of the system 100 - for example, a first user may access a client computing device 102 that is a laptop, while a second user accesses the client computing device 102 that is a smart phone, while yet a third user accesses a client computing device 102 that is a wearable device.
[0028] The client computing device 102 may include one or more processors 120, one or more network interface controllers 122, one or more memories 124, an input device 126, an output device 128 and a client API 130. The one or more memories 124 may have stored thereon one or more modules 140 (e.g., one or more sets of instructions).
[0029] In some implementations, the one or more processors 120 may include one or more central processing units, one or more graphics processing units, one or more field- programmable gate arrays, one or more application-specific integrated circuits, one or more tensor processing units, one or more digital signal processors, one or more neural processing units, one or more RISC-V processors, one or more coprocessors, one or more specialized processors/accelerators for artificial intelligence or machine learning-specific applications, one or more microcontrollers, etc.
[0030] The client computing device 102 may include one or more network interface controllers 122, such as Ethernet network interface controllers, wireless network interface controllers, etc. The network interface controllers 122 may include advanced features, in some implementations, such as hardware acceleration, specialized networking protocols, etc.
[0031] The memories 124 of the client computing device 102 may include volatile and/or nonvolatile storage media. For example, the memories 124 may include one or more random access memories, one or more read-only memories, one or more cache memories, one or more hard disk drives, one or more solid-state drives, one or more non-volatile memory express, one or more optical drives, one or more universal serial bus flash drives, one or more external hard drives, one or more network-attached storage devices, one or more cloud storage instances, one or more tape drives, etc.
[0032] As noted, the memories 124 may have stored thereon one or more modules 140, for example, as one or more sets of computer-executable instructions. In some implementations, the modules 140 may include additional storage, such as one or more operating systems (e.g., Microsoft Windows, GNU/Linux, Mac OSX, etc.). The operating systems may be configured to run the modules 140 during operation of the client computing device 102 - for example, the modules 140 may include additional modules and/or services for receiving and processing data from one or more other components of the system 100 such as the one or more cloud APIs 114 or the server computing device 104. The modules 140 may be implemented using any suitable computer programming language(s) (e.g., Python, JavaScript, C, C++, Rust, C#, Swift, Java, Go, LISP, Ruby, Fortran, etc.).
[0033] The modules 140 may include a model configuration module 142, an API module 144, an input processing module 146, an authentication/security module 148, a context module 150 and a cataloguing module 152, in some implementations. In some implementations, more or fewer modules 140 may be included. The modules 140 may be configured to communicate with one another (e.g., via inter-process communication, via a bus, via sockets, pipes, message queues, etc.).
[0034] The model configuration module 142 may include one or more sets of computer-executable instructions (i.e., software, code, etc.) for performing the functionalities described herein. The model configuration module 142 may enable one or more machine learning models (e.g., large language machine learning models, image analysis models, etc.) to be stored, for example in the memory 124 or in the context electronic database 110.
[0035] In some implementations, the model configuration module 142 may be omitted from the modules 140, or its access may be restricted to administrative users only. For example, in some implementations, one or more of the modules 140 may be packaged into a downloadable application (e.g., a smart phone app available from an app store) that enables registered but non-privileged (i.e., non-administrative) users to access the system 100 using their consumer client computing device 102. In other implementations, one or more of the client computing device(s) 102 may be locked down, such that the client computing device 102 is controlled hardware, accessible only to those who have physical access to certain areas.
[0036] The API module 144 may include one or more sets of computer-executable instructions for accessing one or more remote APIs, and/or for enabling one or more other components within the system 100 to access functionality of the client computing device 102. In some implementations, the API module 144 may enable other client applications (i.e., not applications facilitated by the modules 140) to connect to the client computing device 102, for example, to send queries or prompts, and to receive responses from the client computing device 102. The API module 144 may include instructions for authentication, rate limiting and error handling.
[0037] As noted, the client computing device 102 may enable one or more users to access one or more trained models by providing input prompts that are processed by one or more trained models. The input processing module 146 may perform pre-processing of user prompts prior to being input into one or more models, and/or post-processing of outputs output by one or more models. For example, the input processing module 146 may process data input into one or more input fields, voice inputs or other input methods (e.g., file attachments) depending upon the application. The input processing module 146 may receive inputs directly via the input device 126, in some implementations.
[0038] In some implementations, the input processing module 146 may perform post-processing of output received from one or more trained models. In some implementations, post-processing (and/or pre-processing) may include implementing content moderation mechanisms to prevent misuses of trained models or inappropriate content generation. The input processing module 146 may include instructions for handling errors and for displaying errors to users (e.g., via the output device 128). The input processing module 146 may cause one or more graphical user interfaces to be displayed, for example to enable the user to enter information directly via a text field.
[0039] The authentication/security module 148 may include one or more sets of computer-executable instructions for implementing access control mechanisms for one or more trained models, ensuring that the model can only be accessed by those who are authorized to do so, and that the access of those users is private and secure.
[0040] Generally, trained models require state information in order to meaningfully carry on a dialogue with a user or with another trained model. For example, if a user prompts a trained model with a question such as “What is the weather in Chicago today?” followed by a second prompt “And how about tomorrow?” the model should understand that, in context, the second query relates to the first query, insofar as the user is asking about the weather tomorrow in the same location (Chicago).
[0041] However, language models (e.g., large language models (LLMs)) are generally stateless, meaning that after they process a prompt, they have no internal record or memory of the information that was input, or the information that was generated as part of the language model’s processing. Thus, many systems add statefulness to models using context information. This may be implemented using sliding context windows, wherein a predetermined number of tokens (e.g., 4096 maximum tokens in the case of GPT 3.5, equivalent to about 3000 words) may be “remembered” by the LLM and can be used to enrich multiple sequential prompts input into the LLM (for example, when the LLM is used in a chat mode).
[0042] The context module 150 may include one or more sets of computer-executable instructions for maintaining state of the type found in this example, and other types of state information. The context module 150 may implement sliding window context, in some implementations. In other implementations, the context module 150 may perform other types of state maintaining strategies. For example, the context module 150 may implement a strategy in which information from the immediately preceding prompt is part of the window, regardless of the size of that prior prompt.
[0043] In some implementations, the context module 150 may implement a strategy in which one or more prior prompts are included in each current prompt. This prompt stuffing technique, or prompt concatenation, may be limited by prompt size constraints: once the total size of the prompt exceeds the prompt limit, the model immediately loses state information related to the parts truncated from the prompt.
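As a minimal, non-limiting sketch of the sliding-window and prompt-stuffing strategies described above, consider the following Python class. Token counts are approximated here by whitespace splitting; an actual deployment would use the model's own tokenizer, and the 4096-token budget is only an example drawn from the text above.

```python
# Sliding-window context: retain recent turns up to a token budget and
# concatenate them ahead of each new prompt (prompt stuffing).
from collections import deque

class SlidingWindowContext:
    def __init__(self, max_tokens: int = 4096):
        self.max_tokens = max_tokens
        self.turns = deque()

    @staticmethod
    def _count_tokens(text: str) -> int:
        return len(text.split())  # crude approximation of tokenization

    def add_turn(self, text: str) -> None:
        """Add a prompt or response, evicting the oldest turns when over budget."""
        self.turns.append(text)
        while sum(self._count_tokens(t) for t in self.turns) > self.max_tokens:
            self.turns.popleft()  # state carried by evicted turns is lost

    def build_prompt(self, new_prompt: str) -> str:
        """Prompt stuffing: concatenate retained turns ahead of the new prompt."""
        return "\n".join([*self.turns, new_prompt])
```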
[0044] The cataloguing module 152 may include one or more sets of computer-executable instructions for identifying, retrieving, analyzing, and/or cataloguing documents for a particular user (e.g., a patient). Depending on the implementation, the cataloguing module 152 may perform outside records cataloguing by developing a shared mapping of available data across scanned documents, databases (e.g., a CareEverywhere database), radiology images, and/or other such documents. As further examples, the documents may include particular database and/or application-specific documents (e.g., OnBase documents and CareEverywhere documents) updated at least every 24 hours. In further implementations, the cataloguing module 152 accesses databases at least once every 24 hours for obtaining patient visit data to preschedule AI document processing. In some implementations, the cataloguing module 152 accesses and/or retrieves documents for thousands of patients weekly. In further implementations, the cataloguing module 152 interfaces and/or otherwise integrates with one or more databases and/or other document repositories (e.g., Epic, OnBase (e.g., directly through the OnBase API or through a Longitudinal Patient Record API), CareEverywhere, Clarity through Denodo, Mayo Clinic Cloud (Cloud App Factory & AI Factory 2.0), etc.).
[0045] In further implementations, the mapping may include the data types present and their relevant dates. In some implementations, the cataloguing module 152 focuses on the specific data types used by a particular entity (e.g., a breast clinic, a pulmonary clinic, etc.) for a particular purpose (e.g., new consult referrals), such as radiological or medical imaging studies (e.g., screening mammograms, diagnostic mammograms, computed tomography scans, x-ray studies, ultrasound reports, echocardiograph reports, MRI reports, clinical notes, microbiology reports, operative reports, diagnostic test reports, biopsy procedure reports, biopsy pathology reports, and post-biopsy imaging (e.g., an indication that a patient underwent mammogram on a given date and an indication of the associated report, an indication that the patient underwent biopsy on a different date and an indication of the report, an indication that an ultrasound report is absent, etc.), etc.). In further implementations, the cataloguing module 152 implements techniques for reliably finding these documents among scanned records and extracting their relevant clinical date, as discussed below with regard to FIG. 5. In some implementations, the cataloguing module 152 includes and/or utilizes new data pipelines to access the data (e.g., including metadata) from a database (e.g., a radiology database).
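One possible shape for such a mapping is sketched below in Python, purely for illustration: each expected data type is paired with the documents found and their relevant dates, and types with no documents are flagged as missing. The expected-type list and the build_mapping() helper are hypothetical and not taken from the disclosure.

```python
# Sketch of a shared mapping of available vs. missing data types per patient.
from collections import defaultdict

EXPECTED_TYPES = [
    "screening mammogram",
    "diagnostic mammogram",
    "biopsy procedure report",
    "biopsy pathology report",
    "post-biopsy imaging",
]

def build_mapping(extracted: list) -> dict:
    """extracted: items like {"type": ..., "date": ..., "source": ...}."""
    by_type = defaultdict(list)
    for item in extracted:
        by_type[item["type"]].append(
            {"date": item.get("date"), "source": item.get("source")}
        )
    return {
        "available": {t: by_type[t] for t in EXPECTED_TYPES if by_type[t]},
        "missing": [t for t in EXPECTED_TYPES if not by_type[t]],
    }

# Example: a mammogram report is present, but no biopsy documents were found,
# so the mapping surfaces both what exists and what is absent.
mapping = build_mapping(
    [{"type": "screening mammogram", "date": "2023-04-12", "source": "scan_017.pdf"}]
)
```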
[0046] In further implementations, the cataloguing module 152 additionally performs outside records summarization. As such, the cataloguing module 152 enables users to generate customized, traceable records summaries to further streamline workflows. Similarly, the cataloguing module 152 may additionally perform outside records clean-up for removal of duplicated scanned documents, de-rotation of scanned pages (e.g., orientation matching), and/or AI-assisted translation of medical records in another language (e.g., Mandarin, Spanish, Arabic, etc.). In various implementations, these records summarizations generated by the cataloguing module 152 may be tailored for a multidisciplinary care team (e.g., a care team that includes surgical and medical oncology, radiology, radiation oncology, allied health staff, etc.). By performing records summarization on multimodal data types, a more comprehensive synthesis and summary of records data may be performed without assistance from particular care team groups.
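The following Python sketch illustrates one possible clean-up pass of the kind described above: duplicate removal by content hash, plus placeholder hooks for orientation detection and translation. The detect_rotation() and translate() stubs are hypothetical points at which an image model and a trained translation model would be invoked; they are assumptions for illustration, not the disclosed implementation.

```python
# Clean-up sketch: deduplicate scanned documents, then normalize language.
import hashlib

def remove_duplicates(documents: list) -> list:
    """Drop documents whose normalized text hashes to an already-seen value."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(doc["text"].strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def detect_rotation(page_image: bytes) -> int:
    """Placeholder: an OCR or vision model would return 0, 90, 180, or 270."""
    return 0

def translate(text: str, target_language: str = "en") -> str:
    """Placeholder: a trained translation model would be called here."""
    return text

def clean_up(documents: list, preferred_language: str = "en") -> list:
    cleaned = remove_duplicates(documents)
    for doc in cleaned:
        if doc.get("language", preferred_language) != preferred_language:
            doc["text"] = translate(doc["text"], preferred_language)
    return cleaned
```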
[0047] In further implementations, the cataloguing module 152 is communicatively coupled to and/or includes an exposed microservice for internal use (e.g., via an API) for connection to other applications, programs, and/or other such projects that use outside medical records. In further implementations, at least some of the processes of the cataloguing module 152 may be implemented through an exposed microservice in external electronic health records computing systems, such as EPIC (e.g., using a SMART on FHIR app).
[0048] In still further implementations, the cataloguing module 152 performs performance evaluation of the functionalities described herein. In some implementations, evaluation may focus on at least three dimensions: (1) appropriate data science metrics for the given task, (2) clinical usability and utility, and/or (3) system evaluation and testing.
[0049] The server computing device 104 may include one or more processors 160, one or more network interface controllers 162, one or more memories 164, an input device (not depicted), an output device (not depicted) and a server API 166. The one or more memories 164 may have stored thereon one or more modules 170 (e.g., one or more sets of instructions).
[0050] In some implementations, the one or more processors 160 may include one or more central processing units, one or more graphics processing units, one or more field- programmable gate arrays, one or more application-specific integrated circuits, one or more tensor processing units, one or more digital signal processors, one or more neural processing units, one or more RISC-V processors, one or more coprocessors, one or more specialized processors/accelerators for artificial intelligence or machine learning-specific applications, one or more microcontrollers, etc.
[0051] The server computing device 104 may include one or more network interface controllers 162, such as Ethernet network interface controllers, wireless network interface controllers, etc. The network interface controllers 162 may include advanced features, in some implementations, such as hardware acceleration, specialized networking protocols, etc.
[0052] The memories 164 of the server computing device 104 may include volatile and/or non-volatile storage media. For example, the memories 164 may include one or more random access memories, one or more read-only memories, one or more cache memories, one or more hard disk drives, one or more solid-state drives, one or more non-volatile memory express, one or more optical drives, one or more universal serial bus flash drives, one or more external hard drives, one or more network-attached storage devices, one or more cloud storage instances, one or more tape drives, etc.
[0053] As noted, the memories 164 may have stored thereon one or more modules 170, for example, as one or more sets of computer-executable instructions. In some implementations, the modules 170 may include additional storage, such as one or more operating systems (e.g., Microsoft Windows, GNU/Linux, Mac OSX, etc.). The operating systems may be configured to run the modules 170 during operation of the server computing device 104 - for example, the modules 170 may include additional modules and/or services for receiving and processing data from one or more other components of the system 100 such as the one or more cloud APIs 114 or the client computing device 102. The modules 170 may be implemented using any suitable computer programming language(s) (e.g., Python, JavaScript, C, C++, Rust, C#, Swift, Java, Go, LISP, Ruby, Fortran, etc.).
[0054] In some implementations, the modules 170 may include a data collection module 172, a data pre-processing module 174, a model pretraining module 176, a fine-tuning module 178, a model training module 180, a checkpointing module 182, a hyperparameter tuning module 184, a validation and testing module 186, an auto-prompting module 188, a model operation module 190 and an ethics and bias module 192. In some implementations, more or fewer modules 170 may be included. The modules 170 may be configured to communicate with one another (e.g., via inter-process communication, via a bus, via sockets, pipes, message queues, etc.). The modules 170 may respond to network requests (e.g., via the API 166) or other requests received via the network 106 (e.g., via the client computing device 102 or other components of the system 100).
[0055] The data collection module 172 may be configured to collect information used to train one or more models. In general, the information collected may be any suitable information used for training a language model. The data collection module 172 may collect data via web scraping, via API calls/access, via database extract-transform-load (ETL) processes, etc. Sources accessed by the data collection module 172 include social media websites, books, websites, academic publications, web forums/interest sites (e.g., social media pages, community pages, bulletin boards, etc.), etc. The data collection module 172 may access data sources by active means (e.g., scraping or other retrieval) or may access existing corpuses. The data collection module 172 may include sets of instructions for performing data collection in parallel, in some implementations. The data collection module 172 may store collected data in one or more electronic databases, such as a database accessible via the cloud APIs 114 or via a local electronic database (not depicted). The data may be stored in a structured and/or unstructured format. In some implementations, the data collection module 172 may store large data volumes used for training one or more models (i.e., training data). For example, the data collection module 172 may store terabytes, petabytes, exabytes or more of training data.
[0056] In some implementations, the data collection module 172 may retrieve data from one or more databases as described herein. For example, the data collection module 172 may process the retrieved/received data and sort the data into multiple subsets based on information included within the database. For example, the data collection module 172 may receive one or more sets of unstructured text (e.g., transcripts of one or more historical meeting minutes of a clinical review board). The data collection module 172 may segment the data according to time (e.g., hourly, daily, quarterly, etc.).
[0057] The data pre-processing module 174 may include instructions for pre-processing data collected by the data collection module 172. In particular, the data pre-processing module 174 may perform text extraction and/or cleaning operations on data collected by the data collection module 172. The data pre-processing module 174 may perform pre-processing operations, such as lexical parsing, tokenizing, case conversions and other string splitting/munging. In some implementations, the data pre-processing module 174 may perform data deduplication, filtering, annotation, compliance, version control, validation, quality control, etc. In some implementations, one or more human reviewers may be looped into the process of pre-processing the data collected by the data collection module 172. For example, a distributed work queue may be used to transmit batch jobs and receive human-computed responses from one or more human workers. Once pre-processed, the data pre-processing module 174 may store copied and/or modified copies of the training data in an electronic database.
[0058] In some implementations, the data pre-processing module 174 may include instructions for parsing the unstructured text received by the data collection module 172 to structure the text. For example, when the text relates to clinician notes regarding a patient, the data pre-processing module 174 may generate a time series data structure in which each set of notes is represented by one or more timestamps, and at each timestamp, text associated with various speakers and determinations (e.g., treatment, recommendation, etc.) is labelled. The data pre-processing module 174 may also label the data according to the identity of one or more speakers and/or one or more topics. For example, the time series data may be labeled according to one or more speakers associated with textual speech, in a language transcript form. The time series may include one or more keywords associated with the transcript. In some implementations, the present techniques may use a separate trained text summarization module to generate keywords used for this purpose. In this way, the data pre-processing module 174 may generate structured data corresponding to unstructured meeting minutes, such that the structured data is enriched with information about the meeting that is suitable for training. This structured data may be processed by downstream processes/modules.
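A minimal Python sketch of this structuring step is shown below, assuming a hypothetical "timestamp | speaker | text" line format purely for illustration; real transcripts would more likely be parsed by the pre-processing pipeline or a trained model rather than a fixed regular expression.

```python
# Sketch: group unstructured note lines into a time series labelled by speaker.
import re
from collections import defaultdict

LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2})\s*\|\s*(?P<speaker>[^|]+)\|\s*(?P<text>.+)$"
)

def to_time_series(raw_minutes: str) -> dict:
    """Group utterances by timestamp, labelling each with its speaker."""
    series = defaultdict(list)
    for line in raw_minutes.splitlines():
        match = LINE.match(line.strip())
        if match:
            series[match["ts"]].append(
                {"speaker": match["speaker"].strip(), "text": match["text"].strip()}
            )
    return dict(series)

example = to_time_series(
    "2024-01-15 09:00 | Radiologist | Diagnostic mammogram reviewed.\n"
    "2024-01-15 09:05 | Surgeon | Recommendation: proceed to biopsy."
)
```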
[0059] Generally, the present techniques may train one or more models to perform language generation tasks that include token generation. Both training inputs and model outputs may be tokenized. Herein, tokenization refers to the process by which text used for training is divided into units such as words, subwords or characters. Tokenization may break a single word into multiple subwords (e.g., “LLM” may be tokenized as “L” and “LM”). The present techniques may train one or more models using a set of tokens (e.g., a vocabulary) that includes a multitude (e.g., thousands or more) of tokens. These tokens may be embedded into a vector. This vector of tokens or “embeddings” may include numerical representations of the individual tokens in the vocabulary in high-dimensional vector space. The modules 170 may access and modify the embeddings during training to learn relationships between tokens. These relationships effectively represent semantic language meaning.
[0060] In some implementations, a specialized database (e.g., a vector store, a graph database, etc.) may be used to store and query the embeddings. Embedding databases may include specialized features, such as efficient retrieval, similarity search, and scalability. For example, the server computing device 104 may include a local electronic embedding database (not depicted). In some implementations, a remote embedding database service may be used (e.g., via the cloud APIs 114). Such a remote embedding database service may be based on an open source or proprietary model (e.g., Milvus, Pinecone, Redis, Postgres, MongoDB, Facebook AI Similarity Search (FAISS), etc.). The server computing device 104 may include instructions (e.g., in the data collection module 172) for adding training data to one or more specialized databases, and for accessing it to train models.
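For illustration only, the following Python sketch shows an in-memory embedding store with cosine-similarity search, standing in for the specialized vector databases named above (e.g., FAISS or Milvus). The embed() function is a hypothetical placeholder for a real embedding model; only its interface matters here.

```python
# Minimal embedding store: add texts, retrieve the most similar ones to a query.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Placeholder embedding: deterministic pseudo-random vector per text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

class EmbeddingStore:
    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 3) -> list:
        """Return the k stored texts most similar (by cosine) to the query."""
        q = embed(query)
        scores = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
            for v in self.vectors
        ]
        ranked = sorted(zip(self.texts, scores), key=lambda p: p[1], reverse=True)
        return ranked[:k]
```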
[0061] The present techniques may include language modeling, wherein one or more deep learning models are trained by processing token sequences using a large language model architecture. For example, in some implementations, a transformer architecture may be used to process a sequence of tokens. Such a transformer model may include a plurality of layers including self-attention and feedforward neural networks. This architecture may enable the model to learn contextual relationships between the tokens, and to predict the next token in a sequence, based upon the preceding tokens. During training, the model is provided with the sequence of tokens and it learns to predict a probability distribution over the next token in the sequence. This training process may include updating one or more model parameters (e.g., weights or biases) using an objective function that minimizes the difference between the predicted distribution and a true next token in the training data. Particular techniques for training are discussed in more detail below with regard to FIG. 2.
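A toy PyTorch sketch of this next-token objective is shown below: a small transformer predicts a distribution over the next token, and the loss is the cross-entropy against the true next token in the sequence. The vocabulary size, dimensions, and random batch are arbitrary placeholders, not parameters of the disclosed models.

```python
# Toy next-token training step for a small causal transformer language model.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, SEQ_LEN = 1000, 64, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.encoder(self.embed(tokens), mask=mask)  # causal self-attention
        return self.head(hidden)  # logits over the next token at each position

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

batch = torch.randint(0, VOCAB, (8, SEQ_LEN + 1))  # placeholder token ids
inputs, targets = batch[:, :-1], batch[:, 1:]       # targets shifted by one position
logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
loss.backward()                                      # update weights and biases
optimizer.step()
```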
[0062] Alternatives to the transformer architecture may include recurrent neural networks, long short-term memory networks, gated recurrent networks, convolutional neural networks, recursive neural networks, and other modeling architectures.
[0063] In some implementations, the modules 170 may include instructions for performing pretraining of a language model (e.g., an LLM), for example, in a pretraining module 176. The pretraining module 176 may include one or more sets of instructions for performing pretraining, which as used herein, generally refers to a process that may span pre-processing of training data via the data pre-processing module 174 and initialization of an as-yet untrained language model. In general, a pre-trained model is one that has not yet been trained on specific tasks. For example, the model pretraining module 176 may include instructions that initialize one or more model weights. In some implementations, the model pretraining module 176 may initialize the weights to have random values. The model pretraining module 176 may train one or more models using unsupervised learning, wherein the one or more models process one or more tokens (e.g., pre-processed data output by the data pre-processing module 174) to learn to predict one or more elements (e.g., tokens). The model pretraining module 176 may include one or more optimizing objective functions that the model pretraining module 176 applies to the one or more models, to cause the one or more models to predict one or more most-likely next tokens, based on the likelihood of tokens in the training data. In general, the model pretraining module 176 causes the one or more models to learn linguistic features such as grammar and syntax. The pretraining module 176 may include additional steps, including training, data batching, hyperparameter tuning and/or model checkpointing.
[0064] The model pretraining module 176 may include instructions for generating a model that is pretrained for a general purpose, such as general text processing/understanding. This model may be known as a “base model” in some implementations. The base model may be further trained by downstream training process(es), for example, those training processes described with respect to the fine-tuning module 178. The model pretraining module 176 generally trains foundational models that have general understanding of language and/or knowledge. Pretraining may be a distinct stage of model training in which training data of a general and diverse nature (i.e., not specific to any particular task or subset of knowledge) is used to train the one or more models. In some implementations, a single model may be trained and copied. Copies of this model may serve as respective base models for a plurality of fine-tuned models.
[0065] In some implementations, base models may be trained to have specific levels of knowledge common to more advanced agents. For example, the model pretraining module 176 may train a medical student base model that may be subsequently used to fine tune an internist model, a surgeon model, a resident model, etc. In this way, the base model can start from a relatively advanced stage, without requiring pretraining of each more advanced model individually. This strategy represents an advantageous improvement, because pretraining can take a long time (many days), and pretraining the common base model only requires that the pretraining process be performed once.
[0066] The modules 170 may include a fine-tuning module 178. The fine-tuning module 178 may include instructions that train the one or more models further to perform specific tasks. Specifically, the fine-tuning module 178 may include instructions that train one or more models to generate respective language outputs (e.g., text generation), summarization, question answering or translation activities based on characteristics of a user and/or corpus of documents.
[0067] Continuing the example, the fine-tuning module 178 may include sets of instructions for retrieving one or more structured data sets, such as time series generated by the data pre-processing module 174. These structured data sets may be sorted by time, date, type, and/or any other similar metric to train one or more machine learning models (e.g., one or more language models) that may be used within the system 100 to collate and analyze documents. For example, the fine-tuning module 178 may include instructions for configuring an objective function for performing a specific task, such as generating text that is similar to text found within the corpus of training data associated with a particular individual by role. For example, the fine-tuning module 178 may include instructions for fine-tuning a pathologist model, based on a base language model. These fine-tuning instructions may select statements of pathologists from one or more databases including a corpus of data (or another data source). A medical resident model may be fine-tuned by the fine-tuning module 178, wherein the base model is the same one used to fine-tune the pathologist model. The fine-tuning module 178 may train many (e.g., hundreds or more) additional models.
[0068] In some implementations, the fine-tuning module 178 may include user-selectable parameters that affect the fine-tuning of the one or more models. For example, a “caution” bias parameter may be included that represents medical conservativeness. This bias parameter may be adjusted to affect the cautiousness with which the resulting trained model (i.e., agent) approaches medical decision-making. Additional models may be trained, for additional personas/tasks, as discussed below.
[0069] In some implementations, to manage complexity of fine-tuning and other machine learning operations of the server computing device 104, one or more open source frameworks may be used. Example frameworks include TensorFlow, Keras, MXNet, Caffe, scikit-learn, and PyTorch. Specifically for training and operating language models, frameworks such as OpenLLM and LangChain may be used, in some implementations. The fine-tuning module 178 may use an algorithm such as stochastic gradient descent or another optimization technique to adjust weights of the pretrained model.
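The following sketch illustrates, in PyTorch (one of the frameworks named above), how pretrained weights might be further adjusted with stochastic gradient descent on a smaller, role-specific corpus. The data loader interface and tensor shapes are assumptions made for illustration only.

```python
# Fine-tuning sketch: continue training a pretrained base model on task data.
import torch
import torch.nn.functional as F

def fine_tune(base_model: torch.nn.Module,
              data_loader,          # assumed to yield (input_ids, target_ids) batches
              epochs: int = 1,
              lr: float = 1e-5) -> torch.nn.Module:
    """Adjust pretrained weights on role-specific text with a small learning rate."""
    optimizer = torch.optim.SGD(base_model.parameters(), lr=lr)
    base_model.train()
    for _ in range(epochs):
        for input_ids, target_ids in data_loader:
            logits = base_model(input_ids)            # (batch, seq, vocab)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),  # per-token next-token loss
                target_ids.reshape(-1),
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return base_model

# A copy of the same base model could be fine-tuned separately per persona,
# e.g. fine_tune(copy.deepcopy(base), pathologist_loader) and
# fine_tune(copy.deepcopy(base), resident_loader).
```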
[0070] Fine-tuning may be an optional operation, in some implementations. In some implementations, training may be performed by the training module 180 after pretraining by the model pretraining module 176. In some implementations, the model training module 180 may perform task-specific training like the fine-tuning module 178, on a smaller scale or with a more tailored objective. For example, whereas the fine-tuning module 178 may fine tune a model to learn knowledge corresponding to a surgeon, the model training module 180 may further train the model to learn knowledge of a plastic surgeon, an orthopedic surgeon, etc. Depending on the implementation, the model may be trained using and/or based upon proprietary models and/or techniques (e.g., the PALM2/text-bison foundation model, the MedPALM model, and/or other such LLMs).
[0071] The training module 180 may include one or more submodules, including the checkpointing module 182, the hyperparameter tuning module 184, the validation and testing module 186, and the auto-prompting module 188. The checkpointing module 182 may perform checkpointing, which is the saving of a model’s parameters. The checkpointing module 182 may store checkpoints during training and at the conclusion of training, for example, in the model electronic database 112. In this way, the model may be run (e.g., for testing and validation) at multiple stages and its training parameters loaded, and also retrained from a checkpoint. In this way, the model can be run and trained forward without being re-trained from the beginning, which may save significant time (e.g., days of computation). The hyperparameter tuning module 184 may tune hyperparameters such as batch size, model size, learning rate, etc. These hyperparameters may be adjusted to influence model training. The hyperparameter tuning module 184 may include instructions for tuning hyperparameters by successive evaluation. The validation and testing module 186 may include sets of instructions for validating and testing one or more machine learning models, including those generated by the model pretraining module 176, the fine-tuning module 178 and the model training module 180. The auto-prompting module 188 may include sets of instructions for performing auto-prompting of one or more models. Specifically, the auto-prompting module 188 may enrich a prompt with additional information. The auto-prompting module 188 may include additional information in a prompt, so that the model receiving the prompt has additional context or directions that it can use. This may allow the auto-prompting module 188 to fine-tune a base model using one-shot or few-shot learning, in some implementations. The auto-prompting module 188 may also be used to focus the output of the one or more models.
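As a minimal sketch of the checkpointing behaviour described above, the following Python functions save and restore model and optimizer state so that training can resume from the latest checkpoint rather than restarting; the file path is a placeholder and the use of PyTorch is an assumption for illustration.

```python
# Checkpointing sketch: persist and reload model parameters and optimizer state.
import torch

def save_checkpoint(model: torch.nn.Module,
                    optimizer: torch.optim.Optimizer,
                    step: int,
                    path: str = "checkpoint.pt") -> None:
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
        path,
    )

def resume_from_checkpoint(model: torch.nn.Module,
                           optimizer: torch.optim.Optimizer,
                           path: str = "checkpoint.pt") -> int:
    """Reload parameters and return the step at which training should continue."""
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]
```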
[0072] In some implementations, the training module 180 may include instructions for training one or more additional machine learning models, such as supervised or unsupervised machine learning models. For example, as discussed below, in some implementations, the present techniques may include processing imaging data to enrich records regarding a patient’s care. In that case, the patient’s imaging data may be processed by a model (e.g., a convolutional neural network) and the results processed further (e.g., by a language model) and/or provided to the client computing device 102. The training module 180 may train such a supervised model separately from training one or more language models. Further, the server computing device 104 may select one or more trained models at runtime based on data about a specific patient, based upon data contained in a prompt or based on other conditions that may be preprogrammed into the server computing device 104.
[0073] In some implementations, the training module 180 may train multi-modal models. For example, the training module 180 may train a plurality of models each capable of drawing from multimodal data types such as written text, imaging data, laboratory data, real-time monitoring data, pathology images, etc. In some cases, the training module 180 may train a single model capable of processing the multimodal data types. In some implementations, a trained multimodal model may be used in conjunction with another model (e.g., a large language model) to provide non-text data interactions with users. Non-text data may be analyzed and integrated into the functions discussed herein. Further details regarding training the models are discussed below with regard to FIG. 2.
[0074] The model operation module 190 may operate one or more trained models. Specifically, the model operation module 190 may initialize one or more trained models, load parameters into the model(s), and provide the model(s) with inference data (e.g., prompt inputs). In some implementations, the model operation module 190 may deploy one or more trained models (e.g., a pretrained model, a fine-tuned model and/or a trained model) onto a cloud computing device (e.g., via the API 166). The model operation module 190 may receive one or more inputs, for example from the client computing device 102, and provide those inputs (e.g., one or more prompts) to the trained model. In some implementations, the API 166 may include elements for receiving requests to the model, and for generating outputs based on model outputs. For example, the API 166 may include a RESTful API that receives a GET or POST request including a prompt parameter. The model operation module 190 may receive the request from the API 166, pass the prompt parameter into the trained model, and receive a corresponding output. For example, the prompt parameter may be “What is the smallest bone in the human body?”. The prompt output may be “The stapes bone of the inner ear is the smallest bone in the human body.”
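The following non-limiting sketch illustrates a RESTful endpoint of the kind described for the API 166, which accepts a POST request containing a prompt parameter and returns the model's output. Flask and the placeholder run_trained_model() function are assumptions made for the example; the disclosure does not prescribe a particular web framework.

```python
# Illustrative sketch of a RESTful prompt endpoint. The run_trained_model()
# stub stands in for the model operation module invoking a trained model.
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_trained_model(prompt: str) -> str:
    # Placeholder for passing the prompt into a trained model and collecting output.
    return "The stapes bone of the inner ear is the smallest bone in the human body."

@app.route("/prompt", methods=["POST"])
def handle_prompt():
    prompt = request.json.get("prompt", "")
    output = run_trained_model(prompt)          # pass the prompt parameter to the model
    return jsonify({"prompt": prompt, "output": output})

if __name__ == "__main__":
    app.run(port=5000)

# Example request:
#   curl -X POST -H "Content-Type: application/json" \
#        -d '{"prompt": "What is the smallest bone in the human body?"}' \
#        http://localhost:5000/prompt
```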
[0075] The model operation module 190 may operate models in different modes. For example, in a first mode, the model operation module 190 may receive a prompt input via the client computing device 102, and provide that input to each of a plurality of agents for processing. The output of each agent may be collected and transmitted back to the client computing device 102 for display. The outputs may be labeled according to an identifier of each model (e.g., “pathologist,” “surgeon,” “medical student,” etc.). In the first mode, the model operation module 190 may receive additional inputs from the client computing device 102 that enable the user to interact with the one or more trained models in a question-answer format.
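As a non-limiting illustration of the first mode, the following sketch provides a single prompt to several persona agents and collects their labeled outputs for transmission back to the client. The stub agents are placeholders standing in for trained, persona-specific models.

```python
# Illustrative sketch of fanning a prompt out to multiple persona "agents" and
# collecting labeled outputs. Each stub agent is a placeholder for a trained model.
from typing import Callable, Dict

def make_stub_agent(persona: str) -> Callable[[str], str]:
    # Placeholder for a trained, persona-specific model.
    return lambda prompt: f"[{persona}] response to: {prompt}"

agents: Dict[str, Callable[[str], str]] = {
    "pathologist": make_stub_agent("pathologist"),
    "surgeon": make_stub_agent("surgeon"),
    "medical student": make_stub_agent("medical student"),
}

def query_all_agents(prompt: str) -> Dict[str, str]:
    # Collect one labeled output per agent for display at the client device.
    return {label: agent(prompt) for label, agent in agents.items()}

outputs = query_all_agents("What follow-up imaging is appropriate?")
for label, text in outputs.items():
    print(label, "->", text)
```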
[0076] In a question-answer implementation, the user may ask follow up questions. For example, the user may enter a prompt such as “Tell me about patients who were similarly situated. How did they respond to SBRT? To surgery?” As noted, the server computing device 104 may have access to historical patient data, and the server computing device 104 (e.g., the data pre-processing module 174) may include instructions for retrieving additional data from knowledge databases regarding patients to provide additional context to the one or more language models. These knowledge databases may include the electronic healthcare records database discussed above, as well as external sources such as academic papers, case studies, transcripts, etc. In some implementations, the language models may be trained using this additional data ahead of time, and may not retrieve the data at runtime. For example, the
training data for the liver cancer example may include the KRAS mutation status of the patient, their chemotherapy records, and outcomes for surgery, radiation, and other approaches.
[0077] As discussed, in some implementations, multi-modal modeling may be used. The data pre-processing module may, for example, process and understand image data, audio data, video data, etc. The server computing device 104 may interpret and respond to queries that involve understanding content from these different modalities. For example, the server computing device 104 may include an image processing module (not depicted) including instructions for performing image analysis on images provided by users, or images retrieved from patient EHR data. In some implementations, the server computing device 104 may generate outputs in modalities other than text. For example, the server computing device 104 may generate an audio response, an image, etc. Combining multi-modal data may enable the present models to perform more comprehensive analysis of patient conditions, based on information processed in multiple different modes simultaneously.
[0078] The operating module 190 may include a set of computer-executable instructions that when executed by one or more processors (e.g., the processors 160) cause a computer (e.g., the server computing device 104) to perform retrieval-augmented generation. Specifically, the operating module 190 may perform retrieval-augmented generation based upon inputs or queries received from the user. This allows the operating module 190 to tailor responses of a model based on the specific input and context, such as the medical issue under discussion. For example, one or more models may be pre-trained, fine-tuned and/or trained as discussed above. During that training, the model may learn to generate tokens based on general language understanding as well as application-specific training. Such a model at that point may be static, insofar as it cannot access further information when presented with an input query.
[0079] When the model is used at runtime, however, such as when deployed in the system 100, the operating module 190 may perform retrieval operations, such as searching or selecting information from a document, a database, or another source. The operating module 190 may include instructions for processing user input and for performing a keyword search, a regular expression search, a similarity search, etc. based upon that user input. The operating module 190 may input the results of that search, along with the user input, into the trained model. Thus, the trained model may process this additional retrieved information to augment, or contextualize, the generation of tokens that represent responses to the user’s query. In sum, retrieval-augmented generation applied in this manner allows the model to dynamically generate outputs that are more relevant to the user’s input query at runtime. Information that may be retrieved may include data corresponding to a patient (e.g., patient demographic information,
medical history, clinical notes, diagnoses, medications, allergies, immunizations, laboratory results, oncology information, radiation and imaging information, vitals, etc.) and additional training information, such as medical journals, notes or speech transcripts from symposia or other meetings/conferences, etc.
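The following non-limiting sketch illustrates retrieval-augmented generation of the kind described above: a simple keyword search selects passages from a knowledge store, and the retrieved passages are combined with the user's query before the trained model generates its response. The in-memory document list and the generate() stub are placeholders assumed for the example.

```python
# Illustrative sketch of retrieval-augmented generation: retrieve relevant
# passages, then condition generation on them. The store and generate() are stubs.
knowledge_store = [
    "Patient demographics and medication lists are stored in the EHR.",
    "SBRT outcomes for similarly situated patients are summarized in oncology notes.",
    "Surgical outcomes are recorded in operative reports.",
]

def retrieve(query: str, top_k: int = 2):
    # Keyword-overlap retrieval; a regular-expression or vector similarity
    # search could be substituted here.
    query_terms = set(query.lower().split())
    scored = sorted(knowledge_store,
                    key=lambda doc: len(query_terms & set(doc.lower().split())),
                    reverse=True)
    return scored[:top_k]

def generate(prompt: str) -> str:
    # Placeholder for token generation by the trained language model.
    return "Generated response conditioned on: " + prompt[:80] + "..."

user_query = "How did similarly situated patients respond to SBRT and to surgery?"
context = "\n".join(retrieve(user_query))
augmented_prompt = f"Context:\n{context}\n\nQuestion: {user_query}"
print(generate(augmented_prompt))
```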
[0080] The present techniques may trigger retrieval-augmented generation by processing a prompt, in some implementations. For example, a prompt may be processed by the input processing module 146 of the client computing device 102, prior to processing the prompt by the one or more generative models. The input processing module 146 may trigger retrieval- augmented generation based on the presence of certain inputs, such as patient information, or a request for specific information, in the form of keywords. The input processing module 146 may perform entity recognition or other natural language processing functions to determine whether the prompt should be processed using retrieval-augmented generation prior to being provided to the trained model.
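As a non-limiting illustration, the following sketch shows one way the input processing module 146 might decide, based on keywords in a prompt, whether to trigger retrieval-augmented generation. The trigger keywords are illustrative assumptions, not a prescribed list.

```python
# Illustrative sketch of a keyword-based trigger for retrieval-augmented
# generation; entity recognition or other NLP could be used instead.
TRIGGER_KEYWORDS = {"patient", "history", "records", "lab", "imaging", "similar"}

def should_use_rag(prompt: str) -> bool:
    text = prompt.lower()
    return any(keyword in text for keyword in TRIGGER_KEYWORDS)

print(should_use_rag("Tell me about patients who were similarly situated."))  # True
print(should_use_rag("What is the smallest bone in the human body?"))          # False
```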
[0081] As discussed above, prompts may be received via the input processing module 146 of the client computing device 102 and transmitted to the server computing device 104 via the electronic network 106. In some implementations, the output of the model may be modulated prior to being transmitted, output, or otherwise displayed to a user.
[0082] For example, the ethics and bias module 192 may process the prompt input prior to providing the prompt input to the trained model, to avoid passing objectionable content into the trained model. In some implementations, the ethics and bias module 192 may process the output of the trained model, also to avoid providing objectionable output. It should be appreciated that trained language models may be unpredictable, and thus, processing outputs for ethical and bias concerns (especially in a medical context) may be important. Ultimately, the present techniques may be used to augment and solidify human decision making, rather than as a substitute for such deliberate thinking.
[0083] The client computing device 102 and the server computing device 104 may communicate with one another via the network 106. In some implementations, the client computing device 102 and/or the server computing device 104 may offload some or all of their respective functionality to the one or more cloud APIs 114. In some implementations, the one or more cloud APIs 114 may include one or more public clouds, one or more private clouds and/or one or more hybrid clouds. The one or more cloud APIs 114 may include one or more resources provided under one or more service models, such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and Function as a Service (FaaS). For example, the one or more cloud APIs 114 may include one or more cloud
computing resources, such as computing instances, electronic databases, operating systems, email resources, etc. The one or more cloud APIs 114 may include distributed computing resources that enable, for example, the model pretraining module 176 and/or other of the modules 170 to distribute parallel model training jobs across many processors.
[0084] In some implementations, the one or more cloud APIs 114 may include one or more language operation APIs, such as OpenAI, Bing, Claude.ai, etc. In other implementations, the one or more cloud APIs 114 may include an API configured to operate one or more open source models, such as Llama 2.
[0085] The electronic network 106 may be a collection of interconnected devices, and may include one or more local area networks, wide area networks, subnets, and/or the Internet. The network 106 may include one or more networking devices such as routers, switches, etc. Each device within the network 106 may be assigned a unique identifier, such as an IP address, to facilitate communication. The network 106 may include wired (e.g., Ethernet cables) and wireless (e.g., Wi-Fi) connections. The network 106 may include a topology such as a star topology (devices connected to a central hub), a bus topology (devices connected along a single cable), a ring topology (devices connected in a circular fashion), and/or a mesh topology (devices connected to multiple other devices). The electronic network 106 may facilitate communication via one or more networking protocols, such as packet protocols (e.g., Internet Protocol (IP)) and/or application-layer protocols (e.g., HTTP, SMTP, SSH, etc.). The network 106 may perform routing and/or switching operations using routers and switches. The network 106 may include one or more firewalls, file servers and/or storage devices. The network 106 may include one or more subnetworks such as a virtual LAN (VLAN).
[0086] The system 100 may include one or more electronic databases, such as a relational database that uses structured query language (SQL) and/or a NoSQL database or other schema-less database suited for the storage of unstructured or semi-structured data.
[0087] The present techniques may store training data, training parameters and/or trained models in an electronic database such as the database 112. Specifically, one or more trained machine learning models may be serialized and stored in a database (e.g., as a binary, a JSON object, etc.). Such a model can later be retrieved, deserialized and loaded into memory and then used for predictive purposes. The one or more trained models and their respective training parameters (e.g., weights) may also be stored as blob objects. Cloud computing APIs may also be used to store trained models, via the cloud APIs 114. Examples of these services include AWS SageMaker, Google AI Platform, and Azure Machine Learning.
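The following non-limiting sketch illustrates serializing a trained model into a blob object for storage in an electronic database and later deserializing it for predictive use, as described above. SQLite and the small scikit-learn model are placeholder choices assumed for the example.

```python
# Illustrative sketch of storing a serialized model as a database blob and
# restoring it for inference. The model and database choice are placeholders.
import pickle
import sqlite3
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])   # stand-in trained model

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, blob BLOB)")

# Serialize the trained model and store it as a blob object.
conn.execute("INSERT INTO models VALUES (?, ?)", ("demo_model", pickle.dumps(model)))

# Later: retrieve, deserialize, and use the model for prediction.
blob = conn.execute("SELECT blob FROM models WHERE name = ?", ("demo_model",)).fetchone()[0]
restored = pickle.loads(blob)
print(restored.predict([[0.9]]))
```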
[0088] In operation, a user may access a prompt graphical user interface via the client computing device 102. The prompt graphical user interface may be configured by the model configuration module 142, and generated and displayed by the input processing module 146 via the output device 128. Specifically, the model configuration module 142 may be configured to have one or more digital panel objects each comprising one or more trained models as digital agent objects. The model configuration module 142 may configure the graphical user interface to accept prompts and display corresponding prompt outputs generated by one or more models processing the accepted prompts. The input processing module 146 may be configured to transmit the prompts input by the user via the electronic network 106 to the API 166 of the server computing device 104. The API 166 may process the user inputs via one or more trained models.
[0089] At the time the user accesses the prompt graphical user interface, one or more models may already be trained, including pretraining and fine-tuning. These trained models may be selectively loaded into the one or more agent objects based on configuration parameters, and/or based upon the content of the user’s input prompts.
[0090] It will be understood that, although the various modules are depicted as belonging to one of the server computing device 104 or the client computing device 102, the system 100 may include a single device including the described module (e.g., the server computing device 104) that is accessed for remote computing and/or other use cases by one or more client devices (e.g., the client computing device 102, output device 128, etc.).
Exemplary Training of a Machine-Learning Model
[0091] FIG. 2 depicts a combined block and logic diagram 200 for training a machine learning model, in which the techniques described herein may be implemented, according to some embodiments. Some of the blocks in FIG. 2 may represent hardware and/or software components, other blocks may represent data structures or memory storing these data structures, registers, or state variables (e.g., supervised training dataset 212), and other blocks may represent output data (e.g., scalar reward 225). Input and/or output signals may be represented by arrows labeled with corresponding signal names and/or other identifiers. The methods and systems may include one or more servers 202, 204, 206, such as the server computing device 104 or an external computing device.
[0092] In some implementations, the server 202 may fine-tune a pretrained language model 210. The pretrained language model 210 may be obtained by the server 202 and be stored in a memory, such as memory 124. The pretrained language model 210 may be loaded into a machine learning training module, such as the training module 180, by the server 202 for
retraining/fine-tuning. A supervised training dataset 212 may be used to fine-tune the pretrained language model 210, wherein each data input prompt to the pretrained language model 210 may have a known output response for the pretrained language model 210 to learn from. The supervised training dataset 212 may be stored in a memory of the server 202 (e.g., the memory 124) or a separate training database. In some implementations, data labelers may create the prompts and appropriate responses of the supervised training dataset 212. The pretrained language model 210 may be fine-tuned using the supervised training dataset 212, resulting in the SFT machine learning model 215, which may provide appropriate responses to user prompts once trained. The trained SFT machine learning model 215 may be stored in a memory of the server 202 (e.g., memory 124).
[0093] In some embodiments, the server 202 may fine-tune the pretrained language model 210 using a set of vectors associated with a set of training data. In some instances, the set of training data may include prompts associated with questions and documents, and responses associated with the prompts. Creating the set of vectors may include (1) splitting the text of the prompts, associated questions and/or associated documents into semantic clusters, and (2) encoding the semantic clusters as the set of vectors. The semantic clusters may be one or more words, a portion of a word, or a character. A distance between the vectors (e.g., a cosine distance, a Euclidean distance) may depend on a relevance between the semantic clusters corresponding to the vectors.
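As a non-limiting illustration, the following sketch encodes semantic clusters (here, whole sentences) as vectors and measures their relevance with cosine distance. The TF-IDF encoder is a placeholder assumed for the example; any suitable embedding may be substituted.

```python
# Illustrative sketch of encoding semantic clusters as vectors and comparing
# them by cosine distance; smaller distance indicates greater relevance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

clusters = [
    "What is the recommended follow-up after a diagnostic mammogram?",
    "Post-biopsy imaging is recommended following a diagnostic mammogram.",
    "The network uses a star topology with a central hub.",
]

vectors = TfidfVectorizer().fit_transform(clusters)   # encode clusters as vectors
distances = cosine_distances(vectors)

print(distances[0, 1])   # related medical sentences: relatively small distance
print(distances[0, 2])   # unrelated networking sentence: larger distance
```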
[0094] In some implementations, training the machine learning model 250 may include the server 204 training a reward model 220 to provide as an output a scalar value/reward 225. The reward model 220 may be required in order to leverage Reinforcement Learning with Human Feedback (RLHF), in which a model (e.g., machine learning model 250) learns to produce outputs which maximize its reward 225, and in doing so may provide responses which are better aligned to user prompts.
[0095] Training the reward model 220 may include the server 204 providing a single prompt 222 to the SFT machine learning model 215 as an input. The input prompt 222 may be provided via an input device (e.g., a keyboard) via the I/O module of the server, such as input processing module 146. The prompt 222 may be previously unknown to the SFT machine learning model 215, e.g., the labelers may generate new prompt data, the prompt 222 may include testing data stored on a training database, and/or any other suitable prompt data. The SFT machine learning model 215 may generate multiple, different output responses 224A, 224B, 224C, 224D to the single prompt 222. The server 204 may output the responses 224A, 224B, 224C, 224D via an I/O module (e.g., input processing module 146) to a user interface
device, such as a display (e.g., as text responses), a speaker (e.g., as audio/voice responses), and/or any other suitable manner of output of the responses 224A, 224B, 224C, 224D for review by the data labelers.
[0096] The data labelers may provide feedback via the server 204 on the responses 224A, 224B, 224C, 224D when ranking 226 them from best to worst based upon the prompt-response pairs. The data labelers may rank 226 the responses 224A, 224B, 224C, 224D by labeling the associated data. The ranked prompt-response pairs 228 may be used to train the reward model 220. In some implementations, the server 204 may load the reward model 220 via the machine learning module (e.g., the machine learning module 140) and train the reward model 220 using the ranked response pairs 228 as input. The reward model 220 may provide as an output the scalar reward 225.
[0097] In some implementations, the scalar reward 225 may include a value numerically representing a human preference for the best and/or most expected response to a prompt (i.e., a higher scalar reward value may indicate the user is more likely to prefer that response, and a lower scalar reward may indicate that the user is less likely to prefer that response). For example, inputting the “winning” prompt-response (i.e., input-output) pair data to the reward model 220 may generate a winning reward. Inputting “losing” prompt-response pair data to the same reward model 220 may generate a losing reward. The reward model 220 and/or scalar reward 225 may be updated based upon labelers ranking 226 additional prompt-response pairs generated in response to additional prompts 222.
[0098] In one example, a data labeler may provide to the SFT machine learning model 215 as an input prompt 222, “Describe the sky.” The input may be provided by the labeler via the client computing device 102 over network 106 to the server 204 running a chatbot application utilizing the SFT machine learning model 215. The SFT machine learning model 215 may provide as output responses to the labeler via the client computing device 102: (i) “the sky is above” 224A; (ii) “the sky includes the atmosphere and may be considered a place between the ground and outer space” 224B; and (iii) “the sky is heavenly” 224C. The data labeler may rank 226, via labeling the prompt-response pairs, prompt-response pair 222/224B as the most preferred answer; prompt-response pair 222/224A as a less preferred answer; and prompt-response pair 222/224C as the least preferred answer. The labeler may rank 226 the prompt-response pair data in any suitable manner. The ranked prompt-response pairs 228 may be provided to the reward model 220 to generate the scalar reward 225.
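The following non-limiting sketch illustrates training a reward model from ranked prompt-response pairs so that preferred responses receive higher scalar rewards. The random feature vectors and small network are placeholders standing in for real prompt-response encodings.

```python
# Illustrative sketch of a reward model learned from ranked pairs: the model is
# pushed to assign higher scalar rewards to the responses that labelers preferred.
import torch
import torch.nn as nn
import torch.nn.functional as F

reward_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Placeholder encodings of "winning" and "losing" prompt-response pairs.
winning_pair = torch.randn(4, 8)
losing_pair = torch.randn(4, 8)

for step in range(5):
    optimizer.zero_grad()
    winning_reward = reward_model(winning_pair)   # scalar reward for preferred responses
    losing_reward = reward_model(losing_pair)     # scalar reward for less preferred responses
    # Pairwise ranking objective: push the preferred reward above the other.
    loss = -F.logsigmoid(winning_reward - losing_reward).mean()
    loss.backward()
    optimizer.step()
```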
[0099] While the reward model 220 may provide the scalar reward 225 as an output, the reward model 220 may not generate a response (e.g., text). Rather, the scalar reward 225 may
be used by a version of the SFT machine learning model 215 to generate more accurate responses to prompts, i.e., the SFT model 215 may generate a response (e.g., text) to the prompt, and the reward model 220 may receive the response and generate a scalar reward 225 indicating how well humans are expected to perceive it. Reinforcement learning may optimize the SFT model 215 with respect to the reward model 220, which may realize the configured machine learning model 250.
[00100] In some implementations, the server 206 may train the machine learning model 250 (e.g., via the machine learning module 140) to generate a response 234 to a random, new and/or previously unknown user prompt 232. To generate the response 234, the machine learning model 250 may use a policy 235 (e.g., algorithm) which it learns during training of the reward model 220, and in doing so may advance from the SFT model 215 to the machine learning model 250. The policy 235 may represent a strategy that the machine learning model 250 learns to maximize the reward 225. As discussed herein, based upon prompt-response pairs, a human labeler may continuously provide feedback to assist in determining how well the machine learning model’s 250 responses match expected responses to determine the rewards 225. The rewards 225 may feed back into the machine learning model 250 to evolve the policy 235. Thus, the policy 235 may adjust the parameters of the machine learning model 250 based upon the rewards 225 it receives for generating good responses. The policy 235 may update as the machine learning model 250 provides responses 234 to additional prompts 232.
[00101] In some implementations, the response 234 generated by the machine learning model 250 using the policy 235 based upon the reward 225 may be compared, using a cost function 238, to the response 236 generated by the SFT machine learning model 215 (which may refrain from using a policy) for the same prompt 232. The cost function 238 may be trained in a similar manner as, and/or contemporaneously with, the reward model 220. The server 206 may compute a cost 240 based upon the cost function 238 of the responses 234, 236. The cost 240 may measure, and be used to reduce, the distance between the responses 234, 236 (i.e., a statistical distance measuring how one probability distribution differs from a second), for example by penalizing divergence of the response 234 of the machine learning model 250 from the response 236 of the SFT model 215. Using the cost 240 to reduce the distance between the responses 234, 236 may prevent the server from over-optimizing the reward model 220 and deviating too drastically from the human-intended/preferred response. Without the cost 240, the machine learning model 250 optimizations may result in generating responses 234 which are unreasonable but may still result in the reward model 220 outputting a high reward 225.
[00102] In some implementations, the responses 234 of the machine learning model 250 using the current policy 235 may be passed by the server 206 to the reward model 220, which
may return the scalar reward 225. The machine learning model 250 response 234 may be compared via the cost function 238 to the SFT machine learning model 215 response 236 by the server 206 to compute the cost 240. The server 206 may generate a final reward 242 which may include the scalar reward 225 offset and/or restricted by the cost 240. The final reward 242 may be provided by the server 206 to the machine learning model 250 and may update the policy 235, which in turn may improve the functionality of the machine learning model 250.
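As a non-limiting illustration, the following sketch combines a scalar reward with a divergence cost into a final reward, penalizing responses that drift too far from the SFT model's response and thereby discouraging over-optimization of the reward model. The numeric values and penalty weight are illustrative assumptions.

```python
# Illustrative sketch of offsetting the scalar reward by the divergence cost to
# form a final reward used to update the policy. Values are synthetic examples.
def final_reward(scalar_reward: float, cost: float, cost_weight: float = 0.1) -> float:
    # Offset/restrict the scalar reward by the cost between the policy response
    # and the SFT model's response.
    return scalar_reward - cost_weight * cost

scalar_reward = 2.4    # reward model's score for the policy model's response
cost = 5.0             # divergence between the policy response and the SFT response
print(final_reward(scalar_reward, cost))   # 1.9: fed back to update the policy
```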
[00103] To optimize the machine learning model 250 over time, RLHF via the human labeler feedback may continue ranking 226 responses of the machine learning model 250 versus outputs of earlier/other versions of the SFT machine learning model 215, i.e., providing positive or negative rewards 225. The RLHF may allow the servers (e.g., servers 204, 206) to continue iteratively updating the reward model 220 and/or the policy 235. As a result, the machine learning model 250 may be retrained and/or fine-tuned based upon the human feedback via the RLHF process, and throughout continuing conversations may become increasingly efficient.
[0104] Although multiple servers 202, 204, 206 are depicted in the exemplary block and logic diagram 200, each providing one of the three steps of the overall machine learning model 250 training, fewer and/or additional servers may be utilized and/or may provide the one or more steps of the machine learning model 250 training. In some implementations, one server may provide the entire machine learning model 250 training.
Exemplary Computer System Architecture
[0105] FIG. 3 depicts an exemplary network 300 including a user device 310, cloud platform 320, and data sources 350. Depending on the implementation, the exemplary network 300 includes a plurality of modules configured to implement the instant techniques as described herein. In some implementations, the cloud platform 320 includes, is partially stored on, or is completely stored on the server computing device 104 and/or the client computing device 102 of FIG. 1. Similarly, depending on the implementation, the user device 310 may be or include the client computing device 102 of FIG. 1 and/or another computing device.
[0106] In some implementations, the cloud platform 320 is communicatively coupled to one or more user-associated devices including user device 310. In some such implementations, the user device 310 includes a user interface (UI) such as user-side UI 304 that may include a frontend UI and/or an authentication module configured to interface with a security module 312 of the cloud platform 320 to authenticate and/or verify information from a user utilizing the user device 310. In further implementations, the user device 310 additionally includes an application client 308 including an API gateway communicatively coupled to a module of the cloud platform 320. Depending on the implementation, the API gateway in the application client 308 may
include an authentication module managed by a third party (e.g., not managed by the security module 312) or managed by cloud platform 320 (e.g., managed by the security module 312). In further implementations, the user device 310 may additionally include a user profile module 306 configured to pull user profile data from one or more databases.
[0107] In some implementations, the cloud platform 320 includes a security module 312 configured to manage security for the cloud platform 320 and/or user devices (e.g., user device 310) interfacing with the cloud platform 320. For example, the security module 312 may include functionality for configuring and/or controlling an identity-aware proxy (IAP) functionality, defensive security functionality for defending against web attacks (e.g., denial of service attacks, virtual machine attacks, workload attacks, etc.), and/or a load balancing functionality to distribute network traffic across devices communicatively coupled to the cloud platform 320. In some implementations, the security module 312 is communicatively coupled to a client UI module 318. The client UI module 318 may be communicatively coupled to one or more client devices (not shown). Depending on the implementation, a user device 310 may be a user device and/or associated with data for a user and/or a device to provide user data to the cloud platform, while a client device (not shown) may be a client computing device 102 and/or other endpoint device communicatively coupling to the cloud platform 320 to utilize the models as described herein. In some such implementations, the security module 312 may balance a load using a load balancing functionality between the client device, the user device 310, and the cloud platform 320.
[0108] In further implementations, the cloud platform 320 may additionally or alternatively include a separate balance module 314 to perform further load balancing within the cloud platform 320. Depending on the implementation, the balance module 314 may function in conjunction with and/or separately from the security module 312.
[0109] In some implementations, the cloud platform 320 additionally includes an application server module 322 configured to serve backend functionalities of the UI (e.g., the client UI module 318 and/or user-side UI 304). In further implementations, the application server module 322 is communicatively coupled to the record module 324 and configured to call the record module 324 (e.g., via an API) to facilitate functionality of the techniques as described herein. The record module 324 may perform and/or be communicatively coupled to components that perform functionalities as described herein. In particular, the record module 324 may analyze a corpus of documents (e.g., via a communicatively coupled and/or stored AI model 316) as described herein. Similarly, depending on the implementation, the record module 324 may call and/or interact with the enrichment module 326 to enrich one or more documents in the corpus
of documents (e.g., with metadata, with additional data, by combining records, etc.). Depending on the implementation, the record module 324 may call the enrichment module 326 to enrich records for training the AI model 316, for analyzing the records, to enrich records with an analysis of the records (e.g., prior to display and/or storage), etc.
[0110] In further implementations, the cloud platform 320 includes a persistence layer 330 communicatively coupled to the record module 324 and/or enrichment module 326. The persistence layer 330 may include one or more databases (e.g., an SQL database, a memory storage database, a training data database, etc.) configured to store data associated with the techniques described herein. For example, the persistence layer 330 may store one or more analyzed documents, one or more summaries generated based on documents, one or more categories for a corpus of documents, etc.
[0111] The cloud platform 320 may additionally include a communication module 332, including a virtual private cloud (VPC) network and/or a network address translation (NAT) module configured to enable and/or facilitate communications with one or more external databases, devices, services, etc. Depending on the implementation, the communication module 332 may be communicatively coupled to the user profile module 306 of the user device 310, one or more data sources 350, and/or one or more additional devices (e.g., a client device communicatively coupled to the client UI module 318) (not shown). Depending on the implementation, the record module 324 and/or enrichment module 326 may call one or more APIs associated with such other devices via the communication module 332 to retrieve and/or store data, to display data, to train and/or access the AI model 316 (e.g., when the AI model is stored at another device), etc.
[0112] In further implementations, the enrichment module 326 is communicatively coupled to one or more functionality modules (e.g., a document analysis module 334, an asynchronous processing module 336, a cloud scheduler module 338, etc.). In some implementations, the document analysis module 334 performs one or more functionalities for analysis of a document and/or pre-processing a document for analysis. For example, the document analysis module 334 may perform one or more optical character recognition (OCR) operations on an unstructured data file as pre-processing to ready the document for further analysis by the AI model 316. In further implementations, the document analysis module 334 functions as part of and/or in concert with the AI model 316 to analyze the documents. In some implementations, the asynchronous processing module 336 functions in conjunction with the enrichment module 326 and/or record module 324 to enable asynchronous load spreading (e.g., for analysis of the corpus of documents) across a plurality of compute instances (e.g., for parallel processing).
Similarly, the cloud scheduler module 338 may function in conjunction with the enrichment module 326 and/or record module 324 to enable batch processing of the corpus of documents.
[0113] In some implementations, an API module 345 in one or more devices associated with one or more data sources 350 interfaces with the cloud platform 320 (e.g., via the communication module 332) to gather and provide data from one or more databases. For example, the API module 345 may be communicatively coupled to a historical user database 352, a device report database 354, a subject matter database 356 (e.g., a healthcare database, a patent database, a financial record database, etc.), and/or any other such database as described herein. Depending on the implementation, the database(s) may include one or more servers/devices storing the documents and/or a view for summarizing, selecting, and/or interacting with documents.
Exemplary User Interface
[0114] FIG. 4 depicts an exemplary UI 400 for displaying mapped documents to a user. In some implementations, the UI 400 is or is associated with a user-side UI 304, client UI module 318, and/or other such UI module of FIG. 3. Similarly, in further implementations, the UI 400 may be displayed to a user via a client computing device 102 of FIG. 1 and/or another such computing device.
[0115] In some implementations, the UI 400 includes a document list 436 including relevant documents from a corpus of documents for display to a user. Depending on the implementation, the documents in the document list 436 may be or include any documents that a model (e.g., AI model 316 of FIG. 3) determines to be associated with a subject (e.g., a patient, an inventor, an investor, etc.). In further implementations, the document list 436 includes one or more segments/categories of documents determined by the model. Depending on the implementation, the document list 436 additionally displays and/or sorts the documents in the document list 436 by an extracted date (e.g., extracted as described herein). Similarly, in further implementations, the document list 436 includes a source type for the documents in the document list 436.
[0116] In some implementations, the UI 400 includes a summary window 410. Depending on the implementation, the summary window 410 includes multiple summaries of the documents in the document list 436. Depending on the implementation, the summary window 410 displays a summary for a selected document from the document list 436. In further implementations, the summary window 410 additionally displays summaries for documents referenced in, associated with, and/or in a similar category to the selected document. In implementations in which one or more referenced documents are missing, the summary window 410 may indicate such (e.g., by
listing a document and labeling said document as “missing”). In some implementations, the summary window 410 displays documents along with particular labels (e.g., a date, a category/classification, a recommendation, a comparison, etc.).
[0117] In further implementations, the UI 400 includes a viewing pane 415 to display documents (e.g., from the document list 436 and/or the summary window 410). In some implementations, the viewing pane 415 displays and/or automatically scrolls to a relevant portion of the document (e.g., responsive to a click on a portion of the document list 436 and/or summary window 410). Similarly, in further implementations, the viewing pane 415 automatically highlights one or more keywords in the document (e.g., based on the keyword list 422).
[0118] The UI 400 may additionally include a keyword list 422. Depending on the implementation, a computing device displaying the UI 400 may save (e.g., automatically and/or responsive to an indication from a user) frequently searched keywords. Depending on the implementation, the keyword list 422 may include an indication of a number of instances within a currently displayed document (e.g., within the viewing pane 415), within one or more documents in the summary window 410, within one or more documents in the document list 436, etc. Depending on the implementation, the keyword list 422 may be part of the summary window 410 and/or another portion of the UI 400.
[0119] In some implementations, the UI 400 may include a duplicate record view 424. In some such implementations, a computing device implementing the UI 400 and/or communicatively coupled to a computing device implementing the UI 400 may detect when one or more documents in the corpus of documents are duplicate copies and hide the duplicate copies from view. In some such implementations, the duplicate record view 424 indicates the duplicate copies and/or displays a document designated to be a duplicate copy when selected by a user.
[0120] In further implementations, the UI 400 may include an annotation pane 432. In some such implementations, users may annotate outside records (e.g., draw shapes, write notes, label portions, etc.) and save the annotations. The annotation pane 432 may indicate the existence of such annotations, indicate an annotator, and/or allow a user to view the annotations and/or add additional annotations.
[0121] In further implementations, the UI 400 may include a document checklist 434. Depending on the implementation, the checklist may be a list of expected documents (e.g., as generated by a model based on past operations and/or historical data, as input by a user, as pre-recorded for a predetermined operation, as generated based on referenced documents,
etc.). In further implementations, the document checklist 434 may be or include a checklist of associated image files referenced in and/or associated with the documents in the document list 436 (e.g., a database listing of corresponding images, a database listing of corresponding logs, a database listing of corresponding files, etc.).
Exemplary Computer-Implemented Methods
[0122] FIG. 5 depicts an exemplary computer-implemented method 500 for analyzing a corpus of documents using one or more large language machine learning models, according to one or more implementations. Depending on the implementation, method 500 may be implemented by a system 100, one or more components of the system 100 (e.g., the client computing device 102, server computing device 104, a third party computing device (not shown), etc.), one or more components outside of the system 100 (not shown), one or more alternative systems, etc.
[0123] At block 502, the system 100 identifies one or more documents associated with a user. In some implementations, the one or more documents are part of a larger corpus of documents (e.g., medical documents) associated with a plurality of users including the user in question. Depending on the implementation, the one or more documents may be scanned documents, clinical document architecture (CDA) documents or other structured text documents, portable document format (PDF) files, images, videos, etc. As such, in some implementations, the one or more documents may be or include different document types. For example, the one or more documents may include a structured data file (e.g., a text file, one or more fillable forms, one or more SQL database files, etc.), a semi-structured data file (e.g., a text file including images, metadata associated with a file, NoSQL database files, etc.), and/or an unstructured data file (e.g., emails, images, scanned documents, videos, etc.). In some such implementations, unstructured and/or semi-structured data is initially processed by a separate machine learning model trained to process such data, enabling another model to analyze the data in conjunction with structured data that the other model is able to process. In further implementations, the machine learning model is a multimodal machine learning model configured to analyze multiple types of modal data.
[0124] In some implementations, the system 100 sorts the data based on whether the data file includes structured, semi-structured, or unstructured data. Depending on the implementation, the system 100 performs the sort based on a file extension, based on a file name, based on one or more detected data types, etc. In further implementations, the system 100 additionally or alternatively attempts to parse and/or analyze semi-structured and/or
unstructured data using a model for analyzing structured data and calls the pre-processing model responsive to determining that the model fails and/or partially fails.
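The following non-limiting sketch illustrates sorting incoming files into structured, semi-structured, and unstructured groups based on file extension, one of the sort criteria mentioned above. The extension-to-category mapping is an assumption made for the example.

```python
# Illustrative sketch of sorting data files by structure based on file extension.
# The extension groupings below are example assumptions, not a prescribed mapping.
from pathlib import Path

STRUCTURED = {".csv", ".sql", ".xml"}
SEMI_STRUCTURED = {".json", ".html", ".eml"}

def sort_by_structure(filenames):
    groups = {"structured": [], "semi_structured": [], "unstructured": []}
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in STRUCTURED:
            groups["structured"].append(name)
        elif ext in SEMI_STRUCTURED:
            groups["semi_structured"].append(name)
        else:
            groups["unstructured"].append(name)   # route to the pre-processing model
    return groups

print(sort_by_structure(["labs.csv", "visit_notes.pdf", "referral.eml", "scan.tiff"]))
```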
[0125] In some implementations, the system 100 identifies the one or more documents based on natural language processing (NLP) techniques to detect a user identity (e.g., a patient name, account number, phone number, email address, etc.). In further implementations, the system 100 splits and classifies data in documents into one or more categories (as described below with regard to block 503) including a name, identity, date, etc., and the system 100 identifies the one or more documents as associated with the user based on the categorized data.
[0126] In further implementations, the one or more documents are or include structured text data documents (e.g., Health Level 7 (HL7) Consolidated Clinical Document Architecture (C-CDA) documents) from one or more health information exchange networks. In such implementations, the system 100 may use NLP and/or structured document semantic techniques to process the one or more documents prior to rendering and/or incorporating the one or more documents with other documents for analysis and/or display (e.g., as described below with regard to blocks 503, 504, 506, and/or 508).
[0127] In some implementations, the system 100 additionally or alternatively identifies dates associated with the one or more documents. Depending on the implementation, the system 100 may extract the dates from categorized data (e.g., as described in more detail below with regard to block 503). In further implementations, the system 100 may parse the dates associated with the one or more documents based on metadata and/or other such assigned and/or collated data. Such extraction of dates may be particularly useful, as conventional machine learning and NLP tools are generally unable to differentiate between different dates present on some records (e.g., scanned date, signed date, visit date, publication date, filing date, etc.). As such, the use of a particularly trained machine learning model as described herein allows for easier and less resource-intensive automatic extraction of dates.
[0128] In some implementations, the system 100 collects at least some of the one or more documents (e.g., via the data collection module 172 of FIG. 1 ; the record module 324, the communication module 332, etc. of FIG. 3; and/or any other such component as described herein). In some such implementations, the system 100 may determine to collect the one or more documents responsive to a determination (e.g., by one or more machine learning models as described below). Depending on the implementation, the determination may be or include a determination that one or more documents are missing (e.g., based on the identification as described above), a detection of a reference to another environment (e.g., another hospital) that may include additional documentation, a determination that a user is requesting additional
documents, etc. In some such implementations, the system 100 collects the documents by generating automated phone calls, fax requests, emails, etc. In further implementations, the system 100 may automatically generate collated data classification(s) and/or analyze the documents collected as described in blocks 503 and/or 504 below.
[0129] In some implementations, the system 100 and/or the trained large language machine learning models of the system 100 include chatbot functionalities (e.g., the chat mode described above). In some implementations (e.g., responsive to a user request), the system 100 may utilize the chatbot functionalities to communicate with a user to guide the user through requesting documents from other environments rather than automatically generating the requests. In further implementations, the system 100 may utilize the chatbot functionality to generate one or more questions for the user, for the outside environment, for another user (e.g., a physician, nurse, administrator, etc.) regarding the outside records and/or to obtain the outside records. Similarly, the system 100 may utilize the chatbot functionality to adjust summarization, output, and/or other such functions (e.g., as described below with regard to blocks 506 and/or 508).
[0130] At block 503, the system 100 generates, using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents (e.g., using a classification tree based on the data included in the one or more documents, using neural network focused techniques, using vector space embedding values, etc.). In some implementations, by generating collated data classifications, the system 100 is able to reduce, remove, and/or mitigate hallucinations in the model. In particular, when the system 100 analyzes the one or more documents piecewise and classifies the data accordingly, the machine learning model is able to use specific and targeted queries to analyze the data. By using more targeted and specific queries, the machine learning model is able to more particularly analyze the categorized data, and is less likely to provide broad or false answers (e.g., due to mischaracterizing data, due to incorrectly connecting data to a broad term, etc.). As such, hallucination and error rates may be reduced by generating and analyzing the collated data classifications.
[0131] In further implementations, the machine learning model is a multi-step machine learning model. In particular, the same machine-learning model may perform each step as described herein, and may be fine-tuned at each step (e.g., as described in detail above). Specifically, the machine learning model may be trained with particular documents and/or data at each step to better perform the individual step and/or may otherwise receive a plurality of targeted prompts to retrieve specific information from each document. For example, the machine learning model
may be trained to generate the collated data classifications using a number of documents from a broad range of categories with appropriate labeling. Similarly, the machine learning model may be trained to analyze the documents with individual documents and subcategories, with appropriate labeling. As such, the machine learning model may be trained to perform each individual step rather than broadly trained to output a generalized answer, allowing for overall improved performance and scaling without the need for normalization, which may not be possible at the broad ranges of data which the machine learning model must analyze. As such, the multi-step machine learning model may be consistently accurate, straightforward to evaluate, tailored to user needs, and capable of automatically correcting when errors arise.
[0132] In some implementations, the system 100 splits and classifies the data into categories based on the document(s) being analyzed. For example, the system 100 may split documents into a header section, a date section, a name section, a diagnosis section, a recommendation section, a methods section, etc. In further implementations, the system 100 generates summaries for the document(s) based on the categorized data. In still further implementations, the system 100 generates a summary for at least some of the categories (e.g., a summary for the diagnosis sections, a summary for the recommendation sections, a summary for the methods sections, etc.). In yet further implementations, the system 100 generates a summary for at least some of the categories for each document separately (e.g., a summary for each diagnosis section of each document, a summary for each recommendation section for each document, a summary for each methods section for each document, etc.). Depending on the implementation, the system 100 generates the summaries based on and/or to include an F1 score (e.g., the harmonic mean of the precision and recall of the classification model) and/or accuracy score indicative of the reliability of the model. In further implementations, the system 100 may generate the summaries to indicate missing information in the documents, incorrect information in the documents, inconsistent information in the documents, and/or abnormal information in the documents.
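As a non-limiting illustration, the following sketch computes the F1 score (the harmonic mean of precision and recall) for a small set of synthetic category labels, of the kind a generated summary may report as an indicator of the classification model's reliability.

```python
# Illustrative sketch of computing precision, recall, and the F1 score for
# document-section classifications. The labels below are synthetic examples.
from sklearn.metrics import f1_score, precision_score, recall_score

true_categories = ["diagnosis", "diagnosis", "recommendation", "methods", "methods"]
predicted       = ["diagnosis", "methods",   "recommendation", "methods", "methods"]

precision = precision_score(true_categories, predicted, average="macro", zero_division=0)
recall = recall_score(true_categories, predicted, average="macro", zero_division=0)
f1 = f1_score(true_categories, predicted, average="macro", zero_division=0)

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```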
[0133] In some implementations, the system 100 splits the data in the documents by detecting individual sub-documents within a larger document. For example, a single document may include multiple sub-documents, each indicative of a different portion of a subject’s file (e.g., for a medical file, each sub-document may represent a different procedure that was performed; for a patent file, each sub-document may be a separate patent owned in a portfolio; etc.).
[0134] At block 504, the system 100 analyzes the one or more documents (e.g., in the collated data classifications) to extract data (e.g., clinical data) for at least one user (e.g., the
user(s) associated with the one or more documents). Depending on the implementation, the extracted data may include data types for the one or more documents and/or relevant dates associated with the data types. In some implementations (e.g., when the documents are medical documents), the data types may include radiological or medical imaging studies (e.g., (i) screening mammogram, (ii) diagnostic mammogram, (iii) biopsy procedure report, (iv) biopsy pathology report, (v) post-biopsy imaging, (vi) computed tomography scans, (vii) x-ray studies, (viii) ultrasound reports, (ix) echocardiograph reports, (x) MRI reports, (xi) clinical notes, (xii) microbiology reports, (xiii) operative reports, (xiv) diagnostic test reports and/or (xv) any other such indication).
[0135] In some implementations, analyzing the one or more documents includes analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data. For example, the one or more documents may be or include medical documents including images (e.g., radiology images), and the system 100 may analyze the documents to determine a type of image. In further implementations, the system 100 uses the image analysis models to detect text on the image(s) (e.g., via optical character recognition (OCR) techniques) and determines the extracted data (e.g., document type and/or associated dates) based on such. In further implementations, the system 100 analyzes at least some of the documents using the one or more large language machine learning models. As such, the system 100 may generate the extracted data using the large language machine learning model(s). In some implementations, the system 100 may use both image analysis models and the large language models. In some such implementations, the image analysis models may be separate from the large language models and/or a functionality of such.
[0136] In further implementations, the system 100 may interpret the extracted data in the one or more documents (e.g., using a machine learning model through the analysis process of block 504) to determine one or more procedures, pre-visit testing orders, appointment types, etc. for a user and/or patient associated with the one or more documents. In some such implementations, the system 100 automatically orders, schedules, and/or otherwise organizes the recommended procedures, appointments, etc. based on the above. In further such implementations, the system 100 displays an indication to a user and/or patient for the recommendations generated above, and orders, schedules, and/or otherwise organizes such responsive to an acceptance and/or other such indication from the user and/or patient.
[0137] At block 506, the system 100 generates, using one or more trained large language machine learning models, a mapping of the data. In some implementations, the mapping is for
a single user across the one or more documents and/or the corpus of documents. In further implementations, the mapping is for multiple users (e.g., users that are related, users that are associated with a particular overseer (e.g., a clinician, doctor, lawyer, etc.), users that are demographically similar (e.g., similar ages, genders, builds, etc.), and/or users that have a similarly relevant connection). In some implementations, the mapping of the data includes an indication that the extracted data is available and/or an indication of missing data not included in the extracted data. In further implementations, the mapping includes one or more suggestions for additional actions to perform to populate fields associated with the missing data (e.g., performing one or more tests, requesting information from a user, providing one or more forms to a user, etc.).
[0138] Further, in generating the mapping, the system 100 stitches the categorized data back into a single whole. By putting the extracted information back together, the system 100 is able to verify the elements that go into the mapping, individually checking each element and further preventing and/or mitigating hallucinations or errors.
[0139] In some implementations, the system 100 compares the mapped data to a checklist (e.g., predetermined by a user, preprogrammed into the system 100, generated by a model in the system 100 based on one or more other referenced documents, etc.). As such, the system 100 may determine whether any documents are missing from the corpus of documents. In some such implementations, the system 100 may automatically alert a user that one or more documents are missing and/or prompt the user to provide the missing document(s). In further implementations, the system 100 may automatically query one or more additional databases to search for the missing document(s) and/or transmit the results to a verification module for additional analysis and/or verification. In some implementations, the system 100 saves resources by comparing the results to a checklist. Notably, the system 100 may prevent unneeded repetitions of searches when a document is missing (e.g., searching and reanalyzing until a timeout occurs) and/or prevent unnecessary analysis (e.g., stopping when the checklist is complete rather than searching all documents), depending on the implementation.
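The following non-limiting sketch compares mapped document types against a checklist of expected documents and flags missing items, as described above. The checklist entries are illustrative examples drawn from the data types listed earlier.

```python
# Illustrative sketch of comparing mapped document types to an expected checklist
# to flag missing documents. Entries are example values, not a prescribed list.
expected_checklist = {
    "screening mammogram",
    "diagnostic mammogram",
    "biopsy procedure report",
    "biopsy pathology report",
    "post-biopsy imaging",
}

mapped_document_types = {
    "screening mammogram",
    "diagnostic mammogram",
    "biopsy pathology report",
}

missing = expected_checklist - mapped_document_types
if missing:
    # The system may alert the user and/or query additional databases for these.
    print("Missing documents:", sorted(missing))
else:
    print("Checklist complete; no further searching required.")
```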
[0140] In some implementations, the system 100 evaluates performance of the model and/or of other modules in the system 100 by comparing the mapped data to manually labeled outside medical records to quantify the accuracy of the output. In further implementations, the system 100 evaluates performance of the model and/or other modules at each step of the process (e.g., to fine-tune the model(s) at each step as described in more detail below). In still further implementations, the performance may be evaluated for accuracy errors, omission errors, readability, etc. (e.g., using labeled data, using manual review, using a teacher model, etc.).
[0141] At block 508, the system 100 causes the mapping of the clinical data to be displayed via an output device (e.g., output device 128 and/or another device (not shown) communicatively coupled to network 106, client computing device 102, server computing device 104, and/or another component of system 100). In some implementations, the system 100 displays the mapping by displaying results of the mapping (e.g., a listing of information that is missing and/or present), by displaying a detailed mapping (e.g., displaying a list of every document including each piece of information), by displaying documents associated with the mapping, etc.
[0142] In some implementations, the system 100 additionally generates a customizable summary of the data for the user. In some such implementations, the system 100 receives one or more customizable metrics from a client device (e.g., associated with a doctor, lawyer, accountant, etc.) indicative of information to include in the summary. In further implementations, the system 100 uses one or more machine learning models (e.g., the large language machine learning model described above) to generate the summary. In some such implementations, the system 100 may determine the one or more customizable metrics based on information previously received from and/or used by the client device. In further implementations, the system 100 generates the customizable summary to include elements of and/or the entirety of the summaries for various categories as described above.
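One minimal way to realize such a customizable summary is to restrict the prompt to the client-selected metrics before calling the model. In the sketch below, `llm_complete` is again a placeholder for the deployment's model-serving call, and the metric names and toy data are illustrative only.

```python
def llm_complete(prompt: str) -> str:
    """Placeholder for the deployment's LLM call."""
    raise NotImplementedError

def build_summary_prompt(extracted: dict, metrics: list[str]) -> str:
    """Compose a summary prompt restricted to the client-selected metrics."""
    selected = {k: v for k, v in extracted.items() if k in metrics}
    return (
        "Summarize the following extracted clinical data for a reviewing "
        f"clinician. Include only these items: {', '.join(metrics)}.\n\n{selected}"
    )

# Usage with illustrative data and a single client-selected metric.
extracted = {"biopsy_pathology_report": {"date": "2023-05-02"},
             "screening_mammogram": {"date": "2023-04-01"}}
summary = llm_complete(build_summary_prompt(extracted, ["biopsy_pathology_report"]))
```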
[0143] In some implementations, the system 100 fine-tunes the one or more machine learning models for each particular step of the process. For example, the system 100 may fine-tune one or more models that generate a summary by training the one or more models using labeled summary data and input documents related to the particular summary (e.g., a model to generate summaries for catheterization lab reports is trained using summaries of catheterization lab reports, a model to generate summaries of patent applications is trained using summaries of patent applications, etc.). As such, the system 100 may utilize access to a diverse set of records and/or data to effectively train the one or more models. In particular, because it would be too resource-intensive to scale normalization to apply to all documents being analyzed, the various models and/or steps are fine-tuned (e.g., based on categorization, date, relevant rules, etc. as described above). As such, the overall resource usage is reduced while still allowing for analysis of the corpus of documents.
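A per-step fine-tuning pass of this kind could, for example, be run with the Hugging Face `transformers` and `datasets` libraries. The base checkpoint, file name, and hyperparameters below are placeholders rather than the models or data actually used by the disclosure.

```python
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# Hypothetical JSONL file of {"document": ..., "summary": ...} pairs for one
# category (e.g., catheterization lab reports); "t5-small" is a placeholder.
dataset = load_dataset("json", data_files="cath_lab_summaries.jsonl")["train"]
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def preprocess(batch):
    # Tokenize the source documents and the reference summaries as labels.
    inputs = tokenizer(batch["document"], truncation=True, max_length=1024)
    labels = tokenizer(text_target=batch["summary"], truncation=True, max_length=256)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="cath-lab-summarizer", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

A separate model (or adapter) would be fine-tuned in the same way for each other step or category, consistent with the per-step fine-tuning described above.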
[0144] In further implementations, the system 100 additionally detects one or more duplicate versions of at least some of the one or more documents (e.g., in the corpus of documents). Depending on the implementation, the system 100 may remove the one or more duplicate versions from the one or more documents and/or the mapping. In further implementations, the
system 100 may determine that the one or more duplicate versions include one or more updated versions and replace outdated documents with the updated versions. Similarly, when initially identifying the one or more documents that are associated with a user (e.g., at block 502), the system 100 may identify the duplicate documents and remove them by refraining from including the duplicates in the one or more documents used to generate the mapping.
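As a simple illustration, exact duplicates can be detected by hashing normalized document text and keeping only the most recent version per fingerprint; near-duplicate detection (e.g., embedding similarity) would require a different approach. The field names below are assumptions.

```python
import hashlib
import re

def content_fingerprint(text: str) -> str:
    """Hash of whitespace/case-normalized text; identical fingerprints are
    treated as duplicate versions of the same document."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(documents: list[dict]) -> list[dict]:
    """Keep only the most recent document per fingerprint.

    Each document is assumed to carry `text` and an ISO-8601 `date` string, so
    a lexicographic comparison keeps the newest version and drops outdated ones.
    """
    latest: dict[str, dict] = {}
    for doc in documents:
        key = content_fingerprint(doc["text"])
        if key not in latest or doc["date"] > latest[key]["date"]:
            latest[key] = doc
    return list(latest.values())
```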
[0145] In still further implementations, the system 100 detects that some of the documents have been rotated. In some such implementations, the system 100 specifically detects that a first orientation of at least some of the one or more documents does not match a second orientation of a remainder of the one or more documents. The system 100 then modifies the first orientation to match the second orientation. In some such implementations, the system 100 modifies the orientation(s) prior to analyzing the one or more documents. In further implementations, the system 100 modifies the orientation(s) prior to causing the output device to display the mapping.
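One way to implement the orientation check is with Tesseract's orientation and script detection via `pytesseract`. The sign convention of the returned rotation angle should be verified against the deployment's imaging pipeline, so the sketch below is illustrative only.

```python
import pytesseract
from PIL import Image

def normalize_orientation(path: str) -> Image.Image:
    """Rotate a scanned page so its text orientation matches the rest of the
    batch (assumed upright), using Tesseract orientation/script detection."""
    image = Image.open(path)
    osd = pytesseract.image_to_osd(image, output_type=pytesseract.Output.DICT)
    rotation = osd.get("rotate", 0)  # degrees reported as needed to upright the text
    if rotation:
        # Negative angle rotates clockwise in PIL; verify sign for your pipeline.
        image = image.rotate(-rotation, expand=True)
    return image
```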
[0146] In yet further implementations, the system 100 detects that at least some of the documents are in a language other than a preferred language (e.g., for an individual reviewing the documents). For example, if the preferred language is English, some of the documents may be in Spanish, Arabic, Mandarin Chinese, etc. The system 100 may translate the documents from the document language to the preferred language. In some such implementations, the system 100 uses one or more trained translation machine learning models (e.g., some of the large language models) to translate the documents.
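For instance, a per-language translation model could be selected and applied as follows; the Spanish-to-English checkpoint and the naive character-based chunking are assumptions for illustration, not the disclosure's translation pipeline.

```python
from transformers import pipeline

# Placeholder model; one translation model per detected source language
# (here Spanish-to-English) would be selected at runtime.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

def translate_to_preferred(text: str) -> str:
    """Translate document text to the reviewer's preferred language (English here)."""
    chunks = [text[i:i + 400] for i in range(0, len(text), 400)]  # naive chunking
    return " ".join(t["translation_text"]
                    for chunk in chunks for t in translator(chunk))
```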
[0147] In some implementations, the system 100 may integrate with other applications (e.g., via an API) to perform the functionality described herein. For example, the system 100 may include an API (e.g., API 166) to retrieve and/or request access to the corpus of documents. Similarly, the system 100 may use the API to interface with other services and/or platforms (e.g., Substitutable Medical Applications and Reusable Technologies (SMART), Fast Healthcare Interoperability Resources (FHIR), etc.).
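As a hedged sketch of such an integration, documents might be pulled from a FHIR server as `DocumentReference` resources; the endpoint, authorization flow, and error handling below are assumptions and would differ per deployment.

```python
import requests

FHIR_BASE = "https://fhir.example.org"  # assumed FHIR server endpoint

def fetch_document_references(patient_id: str, token: str) -> list[dict]:
    """Retrieve DocumentReference resources for a patient via a FHIR search.

    Assumes SMART-on-FHIR style bearer-token authorization has already been
    completed; pagination links are followed until exhausted.
    """
    url = f"{FHIR_BASE}/DocumentReference?patient={patient_id}"
    headers = {"Authorization": f"Bearer {token}",
               "Accept": "application/fhir+json"}
    entries = []
    while url:
        bundle = requests.get(url, headers=headers, timeout=30).json()
        entries.extend(e["resource"] for e in bundle.get("entry", []))
        url = next((link["url"] for link in bundle.get("link", [])
                    if link.get("relation") == "next"), None)
    return entries
```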
[0148] In further implementations, the system 100 may evaluate the performance of the functionalities described herein. For example, the system 100 may evaluate the performance based on (i) appropriate data science metrics for the given task, (ii) clinical usability and utility, and (iii) system evaluation and testing. For example, the system 100 may use each of the following: (i) ROUGE: Used to evaluate summarization; (ii) BLEU: Measures translation quality; (iii) ROC/AUC: Duplication evaluation, etc. The duplication task may be treated as a binary classification task where the model outputs a true/false for duplicate or not duplicate. Therefore, ROC/AUC may be used to measure task performance.
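These metrics can be computed with widely used open-source packages; the snippet below is a generic illustration with toy inputs rather than the evaluation harness of the disclosure.

```python
from rouge_score import rouge_scorer          # summarization quality
from sacrebleu import corpus_bleu             # translation quality
from sklearn.metrics import roc_auc_score     # duplicate classification

# ROUGE: compare a generated summary against a reference summary.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score("reference summary text", "generated summary text")

# BLEU: compare system translations against reference translations.
bleu = corpus_bleu(["the generated translation"], [["the reference translation"]])

# ROC/AUC: duplicate detection treated as binary classification.
y_true = [0, 1, 1, 0]           # manually labeled duplicate / not duplicate
y_score = [0.1, 0.8, 0.7, 0.3]  # model confidence that a pair is a duplicate
auc = roc_auc_score(y_true, y_score)
```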
[0149] For summarization and translation, the ROUGE and BLEU metrics are generally accepted quantifications of these tasks and may be used to quantify performance. Depending on the implementation, the summarization and/or translation assessment criteria may include Clinical Quality/Clinical Correctness: (i) Factual Error Count (e.g., how many factual errors per response; put another way, whether the content provided is accurate and not a hallucination);
(ii) Omission Count (e.g., whether there is any notable missing information that would be considered clinically relevant); (iii) Relevance (e.g., how relevant the content provided is to the question/prompt); (iv) Timeliness (e.g., whether the content provided is appropriate for the particular point in time); (v) Actionable (e.g., whether the content is clinically actionable, meaning it could be acted upon confidently); and/or (vi) other such criteria.
[0150] Further criteria may be and/or include output structural quality: (i) Voice (Yes/No) (e.g., whether the content is written for the correct clinical audience); (ii) Detail/Content Length (Yes/No) (e.g., whether the output is too verbose (for narrative output) or overly concise and/or terse); (iii) Variability (Rating 1 to 5) (e.g., whether the responses vary when asking the same question multiple times); (iv) Content Structure/Format (X/X - # of sections correctly matched out of total sections) (e.g., whether content matches to the desired structure (such as paragraphs, list formatting, etc.)); and/or (v) other such criteria.
[0151] Unit, integration, and system level testing may be conducted on the system before deployment. Deployments may be contingent on successful automated testing, which may be integrated into the overall development and release process.
[0152] To monitor the runtime system for drift, linguistic perplexity may be monitored. Perplexity is an entropy-based method for measuring how well linguistic patterns match or “fit” into some pre-determined model. In some implementations, if the system 100 detects significant or trending increases in perplexity, the system may determine that incoming outside records have changed (e.g., new document types, new formats, new reporting systems or templates, etc.).
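For example, perplexity can be computed for each incoming document under a fixed reference language model and tracked over time; the GPT-2 checkpoint below is a placeholder for whatever reference model a deployment would actually fit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder reference model; in practice a model fit to the expected
# distribution of incoming records would be used.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the reference model; a sustained rise across
    incoming documents suggests drift in record formats or content."""
    encodings = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(**encodings, labels=encodings["input_ids"]).loss
    return float(torch.exp(loss))
```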
[0153] The following list of examples reflects a variety of the embodiments explicitly contemplated by the present disclosure:
[0154] Example 1. A computing system for analyzing a corpus of documents using one or more large language machine learning models, the computing system comprising: one or more processors; and one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: identify, via the one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; generate, via the
one or more processors and using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents; analyze, via the one or more processors, the one or more collated data classifications to generate extracted data for the at least one user from the one or more documents; generate, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and cause, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
[0155] Example 2. The computing system of example 1, wherein the mapping of the extracted data includes: an indication that the extracted data is available; and an indication of missing data not included in the extracted data.
[0156] Example 3. The computing system of example 1, the one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: generate, via the one or more processors and using the one or more machine learning models, a customizable summary of the extracted data for the at least one user.
[0157] Example 4. The computing system of example 1, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data.
[0158] Example 5. The computing system of example 1, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, each document of the one or more documents using the one or more large language machine learning models.
[0159] Example 6. The computing system of example 1, the one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: detect, via the one or more processors, one or more duplicate versions of at least some of the one or more documents; and remove, via the one or more processors, the one or more duplicate versions from the one or more documents or the mapping.
[0160] Example 7. The computing system of example 1, the one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: detect, via the one or more processors, that a first orientation of at least some of the one or more documents does not match a second orientation
of a remainder of the one or more documents; and modify, via the one or more processors, the first orientation to match the second orientation.
[0161] Example 8. The computing system of example 1, the one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to: determine, via the one or more processors, that a document language of at least some of the one or more documents does not match a preferred language; and translate, via the one or more processors and using one or more trained translation machine learning models, the at least some of the one or more documents to the preferred language.
[0162] Example 9. The computing system of example 1, wherein the extracted data includes data types present in the one or more documents and relevant dates associated with the data types.
[0163] Example 10. The computing system of example 9, wherein the data types include radiological or medical imaging studies.
[0164] Example 10B. The computing system of example 10, wherein the radiological or medical imaging studies include at least one of: (i) screening mammogram, (ii) diagnostic mammogram, (iii) biopsy procedure report, (iv) biopsy pathology report, or (v) post-biopsy imaging.
[0165] Example 11. A non-transitory computer-readable medium, having stored thereon instructions that when executed, cause a computer to: identify, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; generate, via the one or more processors and using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents; analyze, via the one or more processors, the one or more collated data classifications to generate extracted data for the at least one user from the one or more documents; generate, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and cause, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
[0166] Example 12. The non-transitory computer-readable medium of example 11, wherein the mapping of the extracted data includes: an indication that the extracted data is available; and an indication of missing data not included in the extracted data.
[0167] Example 13. The non-transitory computer-readable medium of example 11, having stored thereon instructions that when executed, cause a computer to: generate, via the one or more processors and using the one or more machine learning models, a customizable summary of the extracted data for the at least one user.
[0168] Example 14. The non-transitory computer-readable medium of example 11, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data.
[0169] Example 15. The non-transitory computer-readable medium of example 11, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, each document of the one or more documents using the one or more large language machine learning models.
[0170] Example 16. The non-transitory computer-readable medium of example 11, having stored thereon instructions that when executed, cause a computer to: detect, via the one or more processors, one or more duplicate versions of at least some of the one or more documents; and remove, via the one or more processors, the one or more duplicate versions from the one or more documents or the mapping.
[0171] Example 17. The non-transitory computer-readable medium of example 11, having stored thereon instructions that when executed, cause a computer to: detect, via the one or more processors, that a first orientation of at least some of the one or more documents does not match a second orientation of a remainder of the one or more documents; and modify, via the one or more processors, the first orientation to match the second orientation.
[0172] Example 18. The non-transitory computer-readable medium of example 11, having stored thereon instructions that when executed, cause a computer to: determine, via the one or more processors, that a document language of at least some of the one or more documents does not match a preferred language; and translate, via the one or more processors and using one or more trained translation machine learning models, the at least some of the one or more documents to the preferred language.
[0173] Example 19. The non-transitory computer-readable medium of example 11, wherein the extracted data includes data types present in the one or more documents and relevant dates associated with the data types.
[0174] Example 20. The non-transitory computer-readable medium of example 19, wherein the data types include radiological or medical imaging studies.
[0175] Example 20B. The non-transitory computer-readable medium of example 20, wherein the radiological or medical imaging studies include at least one of: (i) screening mammogram, (ii) diagnostic mammogram, (iii) biopsy procedure report, (iv) biopsy pathology report, or (v) post-biopsy imaging.
[0176] Example 21. A computer-implemented method for analyzing a corpus of documents using one or more large language machine learning models, the computer-implemented method comprising: identifying, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the user; generating, via the one or more processors and using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents; analyzing, via the one or more processors, the one or more collated data classifications to generate extracted data for the at least one user from the one or more documents; generating, via the one or more processors and using one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and causing, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
[0177] Example 22. The computer-implemented method of example 21, wherein the mapping of the extracted data includes: an indication that the extracted data is available; and an indication of missing data not included in the extracted data.
[0178] Example 23. The computer-implemented method of example 21, further comprising: generating, via the one or more processors and using the one or more machine learning models, a customizable summary of the extracted data for the at least one user.
[0179] Example 24. The computer-implemented method of example 21, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data.
[0180] Example 25. The computer-implemented method of example 21, wherein analyzing the one or more documents includes: analyzing, via the one or more processors, each document of the one or more documents using the one or more large language machine learning models.
[0181] Example 26. The computer-implemented method of example 21, further comprising: detecting, via the one or more processors, one or more duplicate versions of at least some of
the one or more documents; and removing, via the one or more processors, the one or more duplicate versions from the one or more documents or the mapping.
[0182] Example 27. The computer-implemented method of example 21, further comprising: detecting, via the one or more processors, that a first orientation of at least some of the one or more documents does not match a second orientation of a remainder of the one or more documents; and modifying, via the one or more processors, the first orientation to match the second orientation.
[0183] Example 28. The computer-implemented method of example 21, further comprising: determining, via the one or more processors, that a document language of at least some of the one or more documents does not match a preferred language; and translating, via the one or more processors and using one or more trained translation machine learning models, the at least some of the one or more documents to the preferred language.
[0184] Example 29. The computer-implemented method of example 21, wherein the extracted data includes data types present in the one or more documents and relevant dates associated with the data types.
[0185] Example 30. The computer-implemented method of example 29, wherein the data types include radiological or medical imaging studies.
[0186] Example 30B. The computer-implemented method of example 30, wherein the radiological or medical imaging studies include at least one of: (i) screening mammogram, (ii) diagnostic mammogram, (iii) biopsy procedure report, (iv) biopsy pathology report, or (v) post-biopsy imaging.
[0187] The various embodiments described above can be combined to provide further embodiments. All U.S. patents, U.S. patent application publications, U.S. patent application, foreign patents, foreign patent application and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their respective entireties, for all purposes. Implementations of the embodiments can be modified if necessary to employ concepts of the various patents, applications, and publications to provide yet further embodiments.
[0188] These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
Additional Considerations
[0189] The following considerations also apply to the foregoing discussion. Throughout this specification, plural instances may implement operations or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
[0190] It should also be understood that, unless a term is expressly defined in this patent using the sentence "As used herein, the term '___' is hereby defined to mean . . ." or a similar sentence, there is no intent to limit the meaning of that term, either expressly or by implication, beyond its plain or ordinary meaning, and such term should not be interpreted to be limited in scope based on any statement made in any section of this patent (other than the language of the claims). To the extent that any term recited in the claims at the end of this patent is referred to in this patent in a manner consistent with a single meaning, that is done for sake of clarity only so as to not confuse the reader, and it is not intended that such claim term be limited, by implication or otherwise, to that single meaning. Finally, unless a claim element is defined by reciting the word "means" and a function without the recital of any structure, it is not intended that the scope of any claim element be interpreted based on the application of 35 U.S.C. § 112(f).
[0191] Unless specifically stated otherwise, discussions herein using words such as "processing," "computing," "calculating," "determining," "presenting," "displaying," or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.
[0192] As used herein any reference to "some implementations" or "an implementation" means that a particular element, feature, structure, or characteristic described in connection with the implementation is included in at least some implementations. The appearances of the phrase "in some implementations" in various places in the specification are not necessarily all referring to the same implementation.
[0193] As used herein, the terms "comprises," "comprising," "includes," "including," "has," "having" or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not
necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, "or" refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
[0194] In addition, use of "a" or "an" is employed to describe elements and components of the implementations herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one and the singular also includes the plural unless it is obvious that it is meant otherwise.
[0195] Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for implementing the concepts disclosed herein, through the principles disclosed herein. Thus, while particular implementations and applications have been illustrated and described, it is to be understood that the disclosed implementations are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
Claims
1. A computer-implemented method for analyzing a corpus of documents using one or more large language machine learning models, the computer-implemented method comprising: identifying, via one or more processors, one or more documents associated with at least one user from a corpus of documents associated with a plurality of users including the at least one user; generating, via the one or more processors and using one or more trained large language machine learning models, one or more collated data classifications based on the one or more documents; analyzing, via the one or more processors, the one or more collated data classifications to generate extracted data for the at least one user from the one or more documents; generating, via the one or more processors and using the one or more trained large language machine learning models, a mapping of the extracted data for the at least one user across the corpus of documents; and causing, via the one or more processors, the mapping of the extracted data to be displayed via an output device.
2. The computer-implemented method of claim 1, wherein the mapping of the extracted data includes: an indication that the extracted data is available; and an indication of missing data not included in the extracted data.
3. The computer-implemented method of either one of claims 1 or 2, further comprising: generating, via the one or more processors and using the one or more trained large language machine learning models, a customizable summary of the extracted data for the at least one user.
4. The computer-implemented method of any one of the preceding claims, wherein analyzing the collated data classifications includes: analyzing, via the one or more processors, at least some of the one or more documents using one or more image analysis models to generate at least some of the extracted data.
5. The computer-implemented method of any one of the preceding claims, wherein analyzing the collated data classifications includes: analyzing, via the one or more processors, each document of the one or more documents using the one or more large language machine learning models.
6. The computer-implemented method of any one of the preceding claims, further comprising: detecting, via the one or more processors, one or more duplicate versions of at least some of the one or more documents; and removing, via the one or more processors, the one or more duplicate versions from the one or more documents or the mapping.
7. The computer-implemented method of any one of the preceding claims, further comprising: detecting, via the one or more processors, that a first orientation of at least some of the one or more documents does not match a second orientation of a remainder of the one or more documents; and modifying, via the one or more processors, the first orientation to match the second orientation.
8. The computer-implemented method of any one of the preceding claims, further comprising: determining, via the one or more processors, that a document language of at least some of the one or more documents does not match a preferred language; and translating, via the one or more processors and using one or more trained translation machine learning models, the at least some of the one or more documents to the preferred language.
9. The computer-implemented method of any one of the preceding claims, wherein the extracted data includes data types present in the one or more documents and relevant dates associated with the data types.
10. The computer-implemented method of claim 9, wherein the data types include radiological or medical imaging studies.
11. The computer-implemented method of any one of the preceding claims, further comprising: training, via the one or more processors, the one or more trained large language machine learning models using a plurality of training documents.
12. The computer-implemented method of claim 11, wherein the training of the one or more trained large language machine learning models includes: fine-tuning, via the one or more processors and using a first subset of the plurality of training documents, the one or more trained large language machine learning models to generate the one or more collated data classifications; and fine-tuning, via the one or more processors and using a second subset of the plurality of training documents, the one or more trained large language machine learning models to generate the mapping.
13. The computer-implemented method of any one of the preceding claims, further comprising: generating, via the one or more processors, a category summary for each of the one or more collated data classifications.
14. The computer-implemented method of claim 13, wherein the generating of the mapping includes: combining, via the one or more processors, the category summary for each of the one or more collated data classifications.
15. The computer-implemented method of either of claims 13 or 14, further comprising: receiving, via the one or more processors and a trained chatbot machine learning model, one or more adjustments for parameters associated with the category summary; and adjusting, via the one or more processors, the parameters associated with the category summary.
16. The computer-implemented method of any one of the preceding claims, further comprising: interpreting, via the one or more processors, the extracted data to generate one or more procedure recommendations associated with the extracted data;
wherein the causing the mapping of the extracted data to be displayed via the output device includes causing the one or more procedure recommendations to be displayed via the output device.
17. The computer-implemented method of claim 16, further comprising: automatically scheduling, via the one or more processors, procedures associated with the one or more procedure recommendations.
18. The computer-implemented method of any one of the preceding claims, further comprising: retrieving, via the one or more processors, the one or more documents from at least one third party environment.
19. The computer-implemented method of claim 18, wherein the retrieving includes: generating, via the one or more processors, an automated request for the one or more documents.
20. The computer-implemented method of any one of the preceding claims, wherein the retrieving includes: generating, via the one or more processors, at least one prompt associated with the one or more documents using a trained chatbot machine learning model; and causing, via the one or more processors, the at least one prompt to be displayed via the output device.
21. The computer-implemented method of claim 20, wherein the at least one prompt includes a prompt for retrieving the one or more documents.
22. The computer-implemented method of any one of the preceding claims, wherein the one or more documents include structured text documents and the analyzing includes: analyzing, via the one or more processors, the structured text documents using at least one of: (i) natural language processing (NLP) or (ii) structured document semantics.
23. A computing system for analyzing a corpus of documents using one or more large language machine learning models, the computing system comprising: one or more processors; and
one or more memories having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to perform the methods of any one of claims 1-22.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363602505P | 2023-11-24 | 2023-11-24 | |
| US63/602,505 | 2023-11-24 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025111262A1 true WO2025111262A1 (en) | 2025-05-30 |
Family
ID=93853414
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2024/056510 (WO2025111262A1, pending) | Systems and methods for analyzing a corpus of documents using large language machine learning models | 2023-11-24 | 2024-11-19 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025111262A1 (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24821667; Country of ref document: EP; Kind code of ref document: A1 |