
WO2025160535A1 - Hypothesis-driven AI - Google Patents

Hypothesis-driven AI

Info

Publication number
WO2025160535A1
Authority
WO
WIPO (PCT)
Prior art keywords
hypothesis
model
analysis
disease
hypotheses
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/013168
Other languages
English (en)
Inventor
Cristina Miranda De Araujo CORREIA
Choong UNG
Hu Li
Cheng Zhang
Zilin XIANYU
Shizhen ZHU
Daniel D. BILLADEAU
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mayo Foundation for Medical Education and Research
Mayo Clinic in Florida
Original Assignee
Mayo Foundation for Medical Education and Research
Mayo Clinic in Florida
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mayo Foundation for Medical Education and Research, Mayo Clinic in Florida

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • AI: artificial intelligence
  • AI technologies have applications in many social areas, and advancements in various scientific and medical fields have been made possible by the increasing scale, dimensionality, and modality of available data.
  • Such examples may include: the generation of particle collision data at the terabyte scale per second from experiments like the Large Hadron Collider; real-time weather recording; live monitoring of physiological parameters from healthcare wearables; or surveillance of data generated from colossal online activities.
  • AI has also spurred advancements of scientific discoveries. For instance, AI can detect latent relationships between data attributes embedded in published texts that may be undetected by users, including, e.g., attributes in clinical modalities, material sciences, or the like. AI can also be employed to identify feasible reaction routes from a large set of possible chemical reactions to facilitate large-scale chemical synthesis. Moreover, the forms and capability of AI models have advanced and flourished at unprecedented speed in the past decade, especially with the rapid advancements of generative AI, which can generate human-like texts, images, videos, novel protein sequences, including novel antibody design, etc. More recently, there has been a movement towards generalist AI that functions as “one-size-fits-all”, capable of simultaneously performing tasks, such as, e.g., image recognition, text generation, chat, game, etc.
  • HD-AI: hypothesis-driven AI
  • HD-AI may enable users (e.g., scientists) to perform thought experiments and assess the validity of scientific hypotheses.
  • a thought experiment may be a hypothetical scenario created to explore implications of, e.g., ideas, theories, principles, etc.
  • the learning models of HD-AI may be designed based on scientific hypotheses.
  • the technology described herein provides techniques for effectively incorporating scientific hypotheses into the design of HD-AI models.
  • HD-AI may promote or catalyze emergence of new scientific discoveries, as described in greater detail herein.
  • HD-AI may open a new domain of AI tasks, investigative tasks. Accordingly, HD-AI is a new class of AI, as its underlying philosophy, algorithmic design and applications are distinct from conventional AI, particularly due to the investigative capabilities facilitated by HD-AI, as described in greater detail herein.
  • the HD-AI techniques described herein may be implemented with respect to broad medical hypotheses, including, e.g., the concept of oncology as well as other diseases.
  • the etiology of complex human diseases, such as cancer, is highly intricate and involves the deregulation of a myriad of cellular systems beyond genetic aberrations and, as such, involves sophisticated computational approaches and high-dimensional data for optimal interpretation.
  • while AI models may excel in some predictive aspects, such AI models often lack interpretability.
  • HD-AI provides a new emerging class of AI, which, in some instances, may be implemented as a new approach to uncover the complex etiology in human diseases from big omics data.
  • the technology disclosed herein may be applied in various areas of science and technology in addition to its applicability in oncology, which may include, e.g., tumor classification, patient stratification, cancer gene discovery, drug response prediction, tumor spatial organization, and the like.
  • the technology disclosed herein may incorporate domain knowledge and new hypotheses in AI model design. For instance, using cancer as an illustrative example, the technology disclosed herein may provide for novel cancer discoveries, which may be overlooked by some AI methods. Accordingly, in the context of cancer, the HD-AI described herein holds great promise to discover new mechanistic and functional insights that explain the complexity of cancer etiology and potentially improve treatment regimens for individual patients.
  • the method may include receiving, with one or more electronic processors, a request to perform a disease analysis.
  • the method may include retrieving, with the one or more electronic processors, a HD-AI model for performing the disease analysis, where the HD-AI model is a machine learning model trained using domain knowledge and a set of hypotheses.
  • the method may include executing, with the one or more electronic processors, using the HD-AI model, the disease analysis.
  • the method may include transmitting, with the one or more electronic processors, a report indicating a result of the disease analysis.
  • retrieving the HD-AI model may include retrieving a HD-AI model specifically trained to execute the disease analysis.
  • receiving the request to perform the disease analysis may include receiving a request to perform at least one of: a disease detection analysis; a drug response prediction analysis; a gene-gene non-linear association analysis; a phenotype prediction analysis; a phenotype state determination analysis; a disease microenvironment analysis; or a genome analysis.
  • receiving the request to perform the disease analysis may include receiving a request to perform hypothesis testing with respect to a hypothesis identified in the request.
  • transmitting the report may include transmitting a report that indicates a validity of the hypothesis identified in the request.
  • transmitting the report may include transmitting a report that includes information describing associations between data attributes associated with the execution of the disease analysis.
  • transmitting the report may include transmitting a report that includes a testable hypothesis.
  • transmitting the report may include transmitting a report that includes a mechanistic interpretation of the result of the disease analysis.
  • the method may include training, using machine learning, the HD-AI model using the domain knowledge and the set of hypotheses.
  • retrieving the HD-AI model may include retrieving a HD-AI model that is a deep neural network.
  • the method may include: generating a training process to be implemented with respect to training of the HD-AI model; training the HD-AI model using the training process; executing, with the HD-AI model, an AI task; assessing execution of the AI task with the HD-AI model; and, in response to the assessment, modifying the training process for the HD-AI model.
  • assessing execution of the AI task may include determining improvement with respect to at least one of: learning performance; generalization; or training speed.
  • the AI task may include executing an AI task to investigate data associations or data relationships within the domain knowledge.
  • generating the training process may include generating a training process that includes implementation of at least one of: hypothesis-directed feature engineering; a hypothesis-oriented output; or a hypothesis-crafted mapping engine.
  • generating the training process may include generating a training process that includes hypothesis-directed feature engineering that applies a mathematical function to preprocess data such that performance of the HD-AI model is improved.
  • generating the training process may include generating a training process that includes performance of a hypothesis-oriented output approach that includes setting a predicted outcome as an output associated with the training of the HD-AI model, wherein the HD-AI model is evaluated based on a deviation between the output and a corresponding observed outcome.
  • generating the training process may include generating a training process that includes performance of the hypothesis-crafted mapping engine approach that includes incorporating domain knowledge into the HD-AI model to contextualize the HD-AI model.
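The hypothesis-directed feature engineering and hypothesis-oriented output approaches described above can be illustrated with a minimal sketch. The toy dose-response data, the logarithmic hypothesis, and the helper names `fit_line` and `mean_squared_deviation` are all assumptions made for illustration and do not come from the disclosure. The sketch preprocesses the input with a hypothesized mathematical function and scores each model by the deviation between its predicted output and the observed outcome.

```python
import math


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_squared_deviation(xs, ys, a, b):
    """Mean squared deviation between predicted and observed outcomes."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: the observed response follows 2 * ln(dose) + 1.
doses = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
observed = [2.0 * math.log(d) + 1.0 for d in doses]

# Baseline: fit on the raw feature, with no hypothesis applied.
a_raw, b_raw = fit_line(doses, observed)
mse_raw = mean_squared_deviation(doses, observed, a_raw, b_raw)

# Hypothesis-directed feature engineering: the (assumed) hypothesis states
# that the response is linear in log(dose), so the input is log-transformed.
log_doses = [math.log(d) for d in doses]
a_log, b_log = fit_line(log_doses, observed)
mse_log = mean_squared_deviation(log_doses, observed, a_log, b_log)

print(f"raw feature MSE: {mse_raw:.4f}")
print(f"log feature MSE: {mse_log:.2e}")
```

In this toy setting the transformed feature reproduces the observed outcomes almost exactly, which is the kind of improvement in learning performance that the assessment step would look for.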
  • the system may include one or more electronic processors configured to: receive a request to perform a medical analysis; retrieve a HD-AI model for performing the analysis, wherein the HD-AI model is a machine learning model trained using domain knowledge and a set of hypotheses; execute, using the HD-AI model, the medical analysis; and transmit a report indicating a result of the medical analysis.
  • the medical analysis may be a disease analysis.
  • the disease analysis may include at least one of a tumor classification analysis, a patient stratification analysis, a gene discovery analysis, a drug response prediction analysis, or a tumor spatial organization analysis.
  • a non-transitory computer-readable medium may store instructions that, when executed by an electronic processor, may cause the electronic processor to perform operations comprising: receiving a request to perform a medical analysis; retrieving a hypothesis-driven artificial intelligence (HD-AI) model for performing the analysis, where the HD-AI model is a machine learning model trained using domain knowledge and a set of hypotheses; executing, using the HD-AI model, the medical analysis; and transmitting a report indicating a result of the medical analysis.
  • the non-transitory computer-readable medium may store instructions that, when executed by an electronic processor, may cause the electronic processor to perform operations including: training, using machine learning, the HD-AI model using the domain knowledge and the set of hypotheses.
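The receive, retrieve, execute, and transmit operations recited above can be sketched as a small request handler. This is only an illustrative outline: `AnalysisRequest`, `Report`, `MODEL_REGISTRY`, and the toy threshold "model" are hypothetical names and stand-ins, not components named in the disclosure, and a real HD-AI model would be a trained machine learning model rather than a fixed rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AnalysisRequest:
    analysis_type: str          # e.g., "disease_detection"
    samples: List[List[float]]  # one feature vector per sample


@dataclass
class Report:
    analysis_type: str
    results: List[str]


def _toy_detector(sample: List[float]) -> str:
    """Stand-in for a trained HD-AI model (assumption: a simple mean threshold)."""
    return "positive" if sum(sample) / len(sample) > 0.5 else "negative"


# Hypothetical registry mapping each analysis type to its dedicated model.
MODEL_REGISTRY: Dict[str, Callable[[List[float]], str]] = {
    "disease_detection": _toy_detector,
}


def handle_request(request: AnalysisRequest) -> Report:
    """Retrieve the model for the requested analysis, execute it, build a report."""
    model = MODEL_REGISTRY.get(request.analysis_type)
    if model is None:
        raise KeyError(f"no model registered for {request.analysis_type!r}")
    results = [model(sample) for sample in request.samples]
    return Report(analysis_type=request.analysis_type, results=results)


report = handle_request(
    AnalysisRequest("disease_detection", [[0.9, 0.8], [0.1, 0.2]])
)
print(report)
```

Keeping one registry entry per analysis type mirrors the statement that a model may be specifically trained to execute a given disease analysis.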
  • FIG. 1 illustrates an example overview of AI and scientific hypotheses formulation according to some configurations.
  • FIG. 2 illustrates an example comparison of a conventional AI training process and a hypothesis formulation process according to some configurations.
  • FIG. 3 illustrates an example comparison between conventional AI and hypothesis-driven AI according to some configurations.
  • FIG. 4A is an example diagram illustrating an example hypothesis-directed feature engineering approach according to some configurations.
  • FIG. 4B is an example diagram illustrating an example hypothesis-oriented output approach according to some configurations.
  • FIG. 4C is an example diagram illustrating an example hypothesis-crafted mapping engine approach according to some configurations.
  • FIG. 5 illustrates an example design and learning process for hypothesis-driven artificial intelligence models according to some configurations.
  • FIG. 6 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to tumor classification according to some configurations.
  • FIG. 7 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to patient stratification, including cancer state determination, according to some configurations.
  • FIG. 8 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to deciphering cancer genome(s) according to some configurations.
  • FIG. 9 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to drug response according to some configurations.
  • FIG. 10 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to gene-gene non-linear associations according to some configurations.
  • FIG. 11 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to phenotype prediction according to some configurations.
  • FIG. 12 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to organization of cells in the tumor microenvironment according to some configurations.
  • FIG. 13 illustrates components and areas of application in the design of the hypothesis-driven AI technology according to some configurations.
  • FIG. 14 schematically illustrates a system for implementing hypothesis-driven AI according to some configurations.
  • FIG. 15 schematically illustrates a server of the system of FIG. 14 in accordance with some configurations.
  • FIG. 16 is a flowchart of a method of implementing hypothesis-driven AI in accordance with some configurations.
  • FIG. 17 is an example diagram related to AI-powered hypothesis testing and scientific discoveries according to some configurations.
  • FIG. 18 illustrates an example of implementing a hypothesis in a learning model according to some configurations.
  • FIG. 19 illustrates an example diagram of implementing AI solutions to validate working scientific or medical hypotheses according to some configurations.
  • non-transitory computer-readable medium comprises all computer-readable media but does not consist of a transitory, propagating signal. Accordingly, non-transitory computer-readable medium may include, for example, one or more hard disks, CD-ROMs, optical storage devices, magnetic storage devices, ROMs (Read Only Memory), RAMs (Random Access Memory), register memory, processor caches, or any combination thereof.
  • phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.
  • the use of “including,” “containing,” “comprising,” “having,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.
  • the terms “connected” and “coupled” are used broadly and encompass both direct and indirect connecting and coupling. Further, “connected” and “coupled” are not restricted to physical or mechanical connections or couplings and can include electrical connections or couplings, whether direct or indirect.
  • electronic communications and notifications may be performed using wired connections, wireless connections, or a combination thereof and may be transmitted directly or through one or more intermediary devices over various types of networks, communication channels, and connections.
  • relational terms such as first and second, top and bottom, and the like may be used herein solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
  • the technology disclosed herein may provide systems and methods of implementing HD-AI techniques. While the technology disclosed herein is described with respect to the concept of complex diseases (e.g., cancer), which may include, e.g., disease classification, patient stratification, disease gene discovery, drug response prediction, tumor spatial organization, and the like, the technology disclosed herein may also be applied in various areas of science, medicine, and technology in addition or alternatively to its applicability in oncology and medical analysis. Such additional areas may include, without limitation, drug discovery, chemical synthesis, chemical reaction modeling, geological modeling and development, atmospheric and oceanic modeling, and semiconductor development. Cancer is used herein as an illustrative example when describing the technology disclosed herein.
  • Cancer is a complex disease with a wide array of factors contributing to its etiology. Understanding underlying mechanisms of cancer may involve sophisticated computational approaches to uncover not only genetic permutations contributing to uncontrolled cell growth, but also cellular and systemic factors that contribute to cancer development and response to therapy. Years of research have shown that cancer is not just a disease of genes but rather a disease of systems, including, e.g., epigenetic modifications, alterations in signaling pathways, tumor microenvironment interactions, immune system responses, lifestyle factors, etc. This may indicate that the etiology of cancer cannot be merely attributed to aberrations of a number of genes but rather may involve a broad array of biological factors, including, e.g., the microenvironment and microbiome.
  • AI technologies, including deep learning, have emerged as a transformative force to revolutionize the way cancer is studied, including, e.g., target identification, drug response prediction, and the like.
  • these conventional methods are powerful for classifying cancer types, stratifying patients, predicting outcomes, etc.
  • meaningful biological signals learned by AI models are usually obscure to the users.
  • feature selection methods, such as, e.g., recursive feature elimination and information gain, have been devised to identify data attributes, such as, e.g., genetic mutations, that contribute to the performance of AI models.
  • feature selection is external to the learning processes of AI models, and what useful “knowledge” AI models have learned may still be inaccessible.
  • AI can be fed with larger data volumes, higher data dimensionality and modalities, and, ultimately, can perform various AI tasks, such as, e.g., object classification or recognition, text generation, voice conversation, video making, and game production, at once. Nonetheless, these AI techniques may still be considered conventional AI (regardless of how sophisticated the tasks learned or performed).
  • the present disclosure will discuss how hypothesis-driven AI differs from conventional AI, provide examples of hypothesis-driven AI, and discuss the utilities and potential challenges of hypothesis-driven AI.
  • the present disclosure will also discuss the future of hypothesis-driven AI in oncology research and discuss how these new AI technologies can revolutionize individualized and precision medicine.
  • conventional AI, supervised or unsupervised, utilizes existing generic learning processes without incorporation of prior knowledge in the training processes.
  • data may be labeled with known classes for classification models or labeled with quantitative measures, such as responsive drug dosages, for regression models.
  • in unsupervised learning, such as hierarchical clustering, k-nearest neighbor, and self-organizing maps, the similarity of samples is the basic assumption, where distance metrics, such as Euclidean distance, are employed.
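As a small illustration of the similarity assumption mentioned above, the following sketch computes Euclidean distances between samples and picks the most similar one. The sample names and coordinates are made up for illustration; the source does not specify any data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical samples with two features each.
samples = {
    "s1": [0.0, 0.0],
    "s2": [3.0, 4.0],
    "s3": [0.5, 0.5],
}


def most_similar(name):
    """Return the other sample with the smallest distance to `name`."""
    others = (other for other in samples if other != name)
    return min(others, key=lambda other: euclidean(samples[name], samples[other]))


print(euclidean(samples["s1"], samples["s2"]))
print(most_similar("s1"))
```

Distance-based methods group or match samples purely on such metrics, which is why no mechanistic hypothesis enters the learning process.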
  • Feature selection methods may be used to identify data attributes (also called features) that contribute to the performance of AI models. In some instances, inherent associations of selected features in explaining properties of data may often be obscured.
  • the present disclosure utilizes the term “hypothesis-driven AI” (or HD-AI) to refer to a new emerging class of AI, which may be designed based on novel hypotheses.
  • the HD-AI disclosed herein provides a new avenue for the use of AI technologies, such as, e.g., in oncology research.
  • hypotheses act as simplified representations of a real-world phenomenon.
  • the value provided by hypotheses generally relates to their ability to generate predictions that can be tested against real-world observations.
  • This concept is different from the “null hypothesis” commonly used in statistics, which aims to capture all possible combinations of events to assess statistical significance (e.g., all possible molecular conformations in a chemical space, such as those explored in drug discovery).
  • each of these configurations may be deemed as one hypothesis.
  • all of these different configurations or combinations are simply possibilities, and, in the strict sense, these “hypotheses” do not provide real scientific insight into how nature operates. Often, inference of function or behavior based on similarity is portrayed as a type of hypothesis.
  • gene X is hypothesized to play a similar role as gene Y since they share similar amino acid sequences.
  • inference based on homology may be viewed as an assumption, instead of scientific hypothesis.
  • a scientific hypothesis may be a testable but falsifiable mechanistic framework, or a set of corollaries that elucidate how a system operates.
  • a hypothesis can be tested through experimentation or observation and can be proven false based on the results (as illustrated in FIG. 1).
  • Scientific hypotheses may serve as the foundational bricks that underpin scientific systems.
  • a scientific hypothesis is not knowledge itself, as knowledge refers to established facts whereas a scientific hypothesis is a working model that can be tested and potentially disproven. Accordingly, scientific hypotheses may be frameworks that offer explanations and predictions within certain contexts and may be useful to solve problems in daily life.
  • Scientific hypotheses can be represented in various forms, based on, e.g., a scientific context, a nature of research, a type of inquiry being conducted, etc.
  • Some examples of representing hypotheses may include: (i) mathematical equations; (ii) conditional Boolean equations; (iii) graph representations; (iv) rule-based pictorial representations; and (v) verbal statements.
  • these different forms of representing hypotheses may be interchangeable based on, e.g., the preference of scientists (or user).
  • Mathematical equations may be a common representation form among, e.g., mathematicians, physicists, geoscientists, engineers, chemists, biologists, etc. Mathematical equations may describe how different attributes, which may be represented as variables, relate to each other.
  • One advantage of representing hypotheses in mathematical equation form may be that scientists can perform thought experiments via computation and inference based purely on mathematical rules, without determining whether the mathematical objects involved exist in nature.
  • mathematical equation representations may provide cleanliness and conciseness when describing scientific hypotheses. In some cases, mathematical equations may enable scientists to generate quantitative predictions that can be validated through experimental measurements.
  • Conditional Boolean equations may be a representation form that may be depicted as conditions, such as, e.g., “true-false”, switch-like “on-off” or “activate-inhibit”, or “yes-no” expressed in terms of Boolean functions.
  • Boolean algebra may provide a flexible framework to describe hypotheses (e.g., hypotheses that involve complex systems). Boolean algebra may be well-suited to describe interactions and regulation mechanisms, such as, e.g., gene regulation in biological systems.
  • One advantage of conditional Boolean equation representations may include the flexibility offered in formulating Boolean algebra, which may not only describe conditional relations between components in a system but may also elucidate how these components govern behavior of the system. In some instances, conditional Boolean equations can predict one or more activity states of a system.
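To make the idea of conditional Boolean equations concrete, here is a minimal sketch of a toy regulatory network. The genes A, B, C, and D and their rules are invented for illustration (A activates C, B inhibits C, C activates D) and are not taken from the disclosure. Iterating the rules until nothing changes predicts a steady activity state of the system.

```python
def step(state):
    """Apply every Boolean rule once, synchronously, to produce the next state."""
    return {
        "A": state["A"],                     # external input, held fixed
        "B": state["B"],                     # external input, held fixed
        "C": state["A"] and not state["B"],  # activated by A, inhibited by B
        "D": state["C"],                     # activated by C
    }


def steady_state(state, max_steps=10):
    """Iterate the rules until the state stops changing."""
    for _ in range(max_steps):
        next_state = step(state)
        if next_state == state:
            return state
        state = next_state
    raise RuntimeError("no steady state reached within max_steps")


# Activator present, inhibitor absent: C and D are predicted to switch on.
print(steady_state({"A": True, "B": False, "C": False, "D": False}))

# Inhibitor present: C and D are predicted to switch off.
print(steady_state({"A": True, "B": True, "C": True, "D": True}))
```

Each rule is itself a small hypothesis about regulation, so changing one rule and comparing the predicted states with measurements is one way to test it.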
  • Graph representations may capture intricate and interwoven relations between observables and factors that drive event occurrences.
  • Probabilistic graph models, such as, e.g., Bayesian and causal directed acyclic graphs, may be used to represent causal relationships or to infer an effect of an intervention. Simulation of such hypothetical causal graphs may allow for the detection of causal pathways that may lead to or determine an occurrence of events.
  • Graphs may also be used to describe relationships between, e.g., individuals in society, genes and pathways in biology, atomic connectivity in molecular structures, etc. In general, scientific hypotheses that deal with multiple (sequential or not) and complex events may be suited for graph representations.
  • Graph representations may be used to perform perturbation studies by altering attribute relationships and then determining to which extent a graph model agrees with experimental data.
  • Rule-based pictorial representations may refer to visual depictions of relationships. Rule-based pictorial representations may allow simple pictures to be used to formulate how nature behaves.
  • One advantage of rule-based pictorial representations may include an ability to capture irreducible complexity (e.g., a property of complex systems where the system evolution cannot be predicted a priori but may explicitly be simulated to observe the consequences). This contrasts with most mathematical equations (including, e.g., Newton's law) that are agnostic to time (e.g., regardless of time step of computation as long as initial conditions are defined) and cause (e.g., equations do not tell which terms are driving causes that affect the behavior of other terms).
  • Another advantage of rule-based pictorial representations may be that scientists can create their own in silico “universe” by setting up simple rules (here “rules” refer to hypotheses) and monitoring their outcomes.
  • the simplified rules used in rule-based pictorial representations may reflect connections with the real world, such as, e.g., Conway’s Game of Life.
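Conway’s Game of Life, mentioned above, is a compact example of an in silico “universe” driven by simple rules. The sketch below uses the standard rules (a live cell survives with two or three live neighbors; a dead cell becomes live with exactly three); the set-based representation and the function name `life_step` are implementation choices made here for illustration.

```python
from collections import Counter


def life_step(live):
    """Advance one generation; `live` is a set of (x, y) coordinates of live cells."""
    # Count, for every cell adjacent to a live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly 3 neighbors; survival with 2 or 3 neighbors.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }


# A horizontal "blinker" becomes vertical after one step and returns after two.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(life_step(blinker)))
print(life_step(life_step(blinker)) == blinker)
```

The outcome cannot be read off the rules directly and has to be simulated, which is the property the passage above describes as irreducible complexity.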
  • Verbal representations may be a straightforward and rudimentary form of hypotheses formulation. For instance, in most cases, scientists first organize their thoughts verbally to clarify their thinking before translating them into a more concise and abstract representation. Hence, regardless of the complexity and sophistication of a scientific hypothesis, the scientific hypothesis can be uttered verbally. Verbal representations may serve as a starting point for scientists to formulate and refine their hypotheses before translating them into, e.g., mathematical equation representations or graph representations.
  • AI may involve constructing a mathematical model and iteratively finding the right set of parameters that best fit the data.
  • AI may involve a mapping process that transforms inputs (e.g., data) into outputs using a mathematical framework of choice (e.g., types of learning processes or models).
  • the learning process, also called training, may include iteratively adjusting parameters in a chosen mathematical framework until the computed outputs closely match the real observations and minimize a predefined error metric.
  • AI can be framed as a parameter fitting problem, as illustrated in FIG. 2.
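The parameter fitting view of training described above can be sketched with plain gradient descent on a linear model. The model form, toy data, learning rate, and step count are assumptions chosen for illustration; the disclosure does not prescribe a particular optimizer or error metric.

```python
def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Iteratively adjust the parameters to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical observations generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = fit(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Here the mathematical framework (a straight line) is fixed in advance and only its parameters are adjusted, in contrast to hypothesis formulation, where the framework itself is what is being proposed and tested.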
  • FIG. 2 illustrates a comparison of a conventional AI training process 105 and a hypothesis formulation process 110.
  • a learning model may include the following components: (i) one or more features (e.g., inputs from data, which may involve the selection of specific features); (ii) a mapping engine (e.g., the type of learning model, such as, e.g., random forest, support vector machine, artificial neural networks, deep neural networks, etc.); (iii) one or more learning strategies (e.g., supervised, self-supervised, semi-supervised, unsupervised learning, etc.); or (iv) one or more outputs (e.g., a classification, a recognition, a data generation, etc.).
  • Scientific hypotheses and AI share several commonalities as both facilitate the understanding and representation of underlying problems.
  • AI training can be formulated as a parameter fitting problem while a scientific hypothesis may be a model fitting problem (as illustrated in FIG. 2).
  • scientists may conceptualize a framework to explain a natural phenomenon while capturing one or more properties of the observed data.
  • scientists may observe natural behavior by taking measurements without any intervening or altering of a system (e.g., such as in cosmology and geosciences, where experimental interventions are almost impossible) or conduct “skillful interrogation of nature” by modifying specific steps or processes through experimentation (e.g., via genetic editing to produce novel phenotypes in biological systems).
  • the ultimate goal may be to develop a functional mechanistic framework that explains the behavior of a natural system. From a data standpoint, one primary objective may be to develop a mechanistic model that explains how observed data is generated by the underlying processes of nature. A valuable scientific hypothesis may generate accurate (or plausible) predictions that align with observations or measured data within a certain scenario. Further, when comparing the workflows of AI and scientific hypothesis formulation (FIG. 2), similarity in how both processes operate may be observed. Since the outputs of scientific hypotheses are inherently predictions, such outputs can be converted into AI tasks to, e.g., drive AI-powered discoveries. An understanding of the differences between conventional AI and HD-AI can assist in the design of thought experiments to validate scientific hypotheses.
  • FIG. 3 illustrates an example comparison between conventional AI 305 and HD-AI 310.
  • conventional AI 305 may utilize generic learning algorithms (e.g., support vector machine, decision tree, artificial neural network, etc.) to identify patterns embedded in data and make predictions without seeking a deeper understanding of the mechanisms underlying data generation.
  • Learning models of HD-AI 310 may be crafted based on scientific hypotheses together with domain knowledge.
  • a goal of HD-AI may be to evaluate how a designed learning algorithm aligns with the real mechanisms that generate the data.
  • HD-AI 310 may focus on improving the learning performance, learning speed, and generalizability under a hypothesis framework (represented in FIG. 3 by reference numeral 320).
  • the HD-AI 310 may involve assessing Al task(s) (represented in FIG. 3 by reference numeral 320), which may include, e.g., assessing learning performance, generalization, training speed, etc.
  • an assessment of the Al task(s) may be utilized to evaluate a hypothesis. For example, as illustrated in FIG. 3, when an assessment of Al tasks indicates improvement with respect to learning performance, generalization, training speed, etc., a hypothesis may be accepted (represented in FIG. 3 by reference numeral 325). When an assessment of Al tasks does not indicate improvement with respect to learning performance, generalization, training speed, etc., a hypothesis may be rejected, and, in some instances, the hypothesis may be revised or replaced (represented in FIG.
  • the hypothesis may be revised or replaced based on domain knowledge or data, such as, e.g., observable measured attributes (represented in FIG. 3 by reference numeral 335).
  • the revised or replaced hypothesis may be implemented with respect to one or more Al tasks (represented in FIG. 3 by reference numeral 340), where the one or more Al tasks may be assessed (e.g., to determine whether learning performance, generalization, training speed, etc. improved) (represented in FIG. 3 by reference numeral 320).
  • the HD-AI 310 may include various iterations (or loops), which, in some cases, may involve revising or replacing a hypothesis until, e.g., a hypothesis is accepted, as represented in FIG. 3.
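The accept, reject, and revise loop described for FIG. 3 can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function name `run_hd_ai_loop`, the idea of a single scalar score per hypothesis, the fixed `baseline_score` threshold, and the toy scores are all hypothetical and are not taken from the disclosure.

```python
from typing import Callable, List, Optional, Tuple

Record = Tuple[str, float, bool]  # (hypothesis, score, accepted?)


def run_hd_ai_loop(
    candidates: List[str],
    assess_ai_task: Callable[[str], float],
    baseline_score: float,
) -> Tuple[Optional[str], List[Record]]:
    """Assess candidate hypotheses in order until one is accepted.

    Assumption: "improvement" is reduced to one score (e.g., accuracy)
    that must exceed a baseline obtained without the hypothesis.
    """
    history: List[Record] = []
    for hypothesis in candidates:
        score = assess_ai_task(hypothesis)   # assess the AI task(s)
        accepted = score > baseline_score    # improvement observed?
        history.append((hypothesis, score, accepted))
        if accepted:
            return hypothesis, history       # hypothesis accepted
        # Otherwise the hypothesis is rejected and the next (revised or
        # replaced) candidate is tried on the following iteration.
    return None, history


# Toy assessment scores (hypothetical values, for illustration only).
toy_scores = {
    "H0: genes act independently": 0.61,
    "H1: gene pairs interact": 0.78,
}
accepted, history = run_hd_ai_loop(
    candidates=list(toy_scores),
    assess_ai_task=lambda h: toy_scores[h],
    baseline_score=0.70,
)
print(accepted)
```

In practice the assessment step would train and evaluate a hypothesis-crafted model, and the candidate list would be produced by revising a rejected hypothesis using domain knowledge or data rather than being fixed in advance.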
  • conventional AI 305 may focus on identifying associative or probabilistic patterns embedded in data and making predictions without seeking a deeper understanding of the mechanisms underlying data generation, while the objective of HD-AI 310 may be to implement scientific hypotheses into the learning models and to leverage AI tasks, such as, e.g., classification, to (simultaneously) perform hypothesis testing.
  • the motivation and workflows of HD-AI 310 may differ from those of conventional AI 305 in that HD-AI 310 may use the learning models and the iterative learning process to assess hypothetical scenarios and evaluate mechanistic insights on how the data is generated.
  • HD-AI may focus on a new task domain referred to herein as investigative tasks, aiming to improve the learning performance, learning speed, and generalizability under a hypothesis framework (e.g., as illustrated in FIG. 3).
  • An underlying assumption may be that when a mechanistic framework offers a more accurate explanation of how nature generates the data used in the learning process, the AI tasks will train faster and show better performance, and the model will generalize to other datasets generated under the same mechanism of nature. Under this scenario, the comparison with other competing mechanistic models may be part of the HD-AI design.
  • HD-AI may also be distinct from explainable AI (XAI).
  • XAI plays a role in helping researchers understand how ML models make decisions (e.g., how AI fits the data) and extract meaningful insights from the data.
  • Although XAI can open the AI “black box”, XAI is not concerned with deciphering which underlying mechanisms contribute to data generation. For this reason, XAI may fall under the conventional AI category.
  • HD-AI may assess the validity of a scientific hypothesis using hypothesis-crafted learning models (regardless of whether researchers understand how AI learns).
  • Integrating a scientific hypothesis into an AI model may be used to realize AI-assisted thought experiments.
  • a learning model may encompass three components or stages: features (data); a mapping engine; and outputs.
  • Scientific hypotheses can be incorporated in each of these stages, as illustrated in FIGS. 4A-4C. Therefore, the technology described herein may incorporate scientific hypotheses in the design of HD-AI by implementing one or more of the following approaches: (i) hypothesis-directed feature engineering approach(es); (ii) hypothesis-oriented output approach(es); or (iii) hypothesis-crafted mapping engine approach(es).
  • a hypothesis-directed feature engineering approach and a hypothesis-oriented output approach may preserve a core mapping engine of AI, and, thus, allow for easier implementation and integration of new features or improvements without fundamentally altering the underlying architecture.
  • FIG. 4A is an example diagram 400 illustrating an example hypothesis-directed feature engineering approach.
  • a hypothesis-directed feature engineering approach may apply a mathematical function that encapsulates the hypothetical model.
  • the AI learning and testing processes may be used to assess the validity of a defined hypothesis.
  • Feature engineering may be a technique used in AI to enhance training and performance of models.
  • Feature engineering may include performing a mathematical transformation to preprocess data prior to training, making the data more suitable for the learning model.
  • feature engineering may not be exploited as a component to test scientific hypotheses within AI.
  • gene-gene expression values (represented in FIG. 4A by reference numeral 405) may be transformed into a single-output value (represented in FIG. 4A by reference numeral 410) by applying a mathematical function (represented in FIG. 4A by reference numeral 415) that captures the relationship between gene pairs (represented in FIG. 4A by reference numeral 420).
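A hypothesis-directed feature transformation of this kind can be sketched as follows. This is a minimal illustration, assuming the hypothesized gene-pair relationship is a simple difference of (log-scale) expression values; the function name `pair_features`, the placeholder gene names, and the choice of relation are assumptions for illustration and are not the function used in FIG. 4A.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

Relation = Callable[[float, float], float]


def pair_features(
    sample: Dict[str, float], relation: Relation
) -> Dict[Tuple[str, str], float]:
    """Map every gene pair in one sample to a single value.

    `relation` encodes the hypothesis about how two genes relate.
    """
    return {
        (gene_a, gene_b): relation(sample[gene_a], sample[gene_b])
        for gene_a, gene_b in combinations(sorted(sample), 2)
    }


def difference(x: float, y: float) -> float:
    """Hypothetical relation: difference of log-expression values."""
    return x - y


# Placeholder expression values for one sample (not real data).
sample = {"GENE_A": 5.0, "GENE_B": 3.0, "GENE_C": 4.0}
features = pair_features(sample, difference)
print(features)
```

The resulting pair-level values would then replace the raw per-gene values as inputs to an unchanged learning model, so that the model's performance reflects how well the assumed relation captures the data.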
  • MALANI machine learning-assisted network inference
  • SVM support vector machines
  • FIG. 4B is an example diagram 430 illustrating an example hypothesis-oriented output approach.
  • predicted outcomes may be set as the outputs of AI training. The validity of scientific hypotheses may be evaluated based on how much AI outputs deviate from actual observations. A hypothesis predicts hypothetical outputs.
  • Outputs from a system can be represented in different forms, such as, e.g., temporal, spatial, phenotypic, transformation, generative, etc. (as illustrated in FIG. 4B). Such output forms may be leveraged to design Al tasks that investigate data associations effectively while also exploring diverse relationships within the data.
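Evaluating hypotheses by how far their predicted outputs deviate from observations can be illustrated with a short sketch. The deviation measure (root-mean-square error), the hypothesis names, and all numeric values below are assumptions chosen for illustration; the disclosure does not prescribe a specific metric.

```python
import math
from typing import Dict, List


def rmse(predicted: List[float], observed: List[float]) -> float:
    """Root-mean-square deviation between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(observed))


# Hypothetical observations and the outputs predicted under two
# competing hypotheses (toy values only).
observed = [1.0, 2.0, 3.0, 4.0]
predictions: Dict[str, List[float]] = {
    "H_linear": [1.1, 1.9, 3.2, 3.8],
    "H_constant": [2.5, 2.5, 2.5, 2.5],
}

deviation = {name: rmse(pred, observed) for name, pred in predictions.items()}
best = min(deviation, key=deviation.get)
print(best, deviation)
```

A smaller deviation only indicates that a hypothesis is more consistent with the observed outputs in this scenario; it does not by itself establish the underlying mechanism.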
  • SPIN-AI spatially informed Al
  • SPIN-AI may identify genes whose expression informs how cells are organized in space.
  • SPIN-AI is a deep learning model designed to assess a hypothesis that certain genes, referred to as spatially predictive genes (SPGs), can govern and inform on how cells are organized within a microenvironment niche.
  • SPIN-AI may employ an unbiased deep learning approach that uses spatial gene expression as an input and is trained to predict the x and y coordinates on a spatial transcriptomic slide.
  • SPIN-AI is an example of hypothesis-oriented output approaches within HD-AI.
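The idea of using spatial coordinates as the training output can be illustrated with a deliberately simple stand-in. The sketch below does not implement SPIN-AI or any deep learning model; it assumes a univariate least-squares fit per gene and scores each gene by the average coefficient of determination for predicting x and y. The gene names, coordinates, and expression values are hypothetical.

```python
from typing import Dict, List


def r_squared(feature: List[float], target: List[float]) -> float:
    """R^2 of a univariate least-squares fit of target on feature."""
    n = len(feature)
    mean_f = sum(feature) / n
    mean_t = sum(target) / n
    sxy = sum((f - mean_f) * (t - mean_t) for f, t in zip(feature, target))
    sxx = sum((f - mean_f) ** 2 for f in feature)
    syy = sum((t - mean_t) ** 2 for t in target)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def spatial_score(expr: List[float], xs: List[float], ys: List[float]) -> float:
    """Average predictive value of one gene for the x and y coordinates."""
    return 0.5 * (r_squared(expr, xs) + r_squared(expr, ys))


# Six hypothetical spots on a 3 x 2 grid.
xs = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
expression: Dict[str, List[float]] = {
    "G_x": [0.0, 2.0, 4.0, 0.0, 2.0, 4.0],     # tracks the x coordinate
    "G_flat": [1.0, 3.0, 1.0, 3.0, 3.0, 1.0],  # weakly related to position
}

scores = {gene: spatial_score(v, xs, ys) for gene, v in expression.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

A deep model trained on all genes jointly could capture nonlinear and combinatorial effects that this per-gene linear score cannot; the sketch only shows how coordinates can serve as the prediction target.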
  • FIG. 4C is an example diagram 450 illustrating an example hypothesis-crafted mapping engine approach.
  • a hypothesis-crafted mapping engine approach may implement learning models (e.g., artificial neural networks (ANNs)) that may be crafted based on mathematical functions that describe processes of interest (e.g., broken symmetry in particle physics) or defined relationships between data attributes, such as, e.g., interactions between genes in biological systems.
  • Such a hypothesis-crafted learning model may be subjected to “perturbations” (e.g., slight modifications of the structures of the learning model) to assess how changes impact learning performance and evaluate how well the model fits in explaining nature's mechanisms.
  • Incorporating domain knowledge stemming from, e.g., observational, empirical, mathematical, physical, and biological understanding into AI learning models has the potential to provide several benefits, including, e.g., constraining the parameter space, reducing the number of parameters, reducing data size used for training, etc.
  • a hypothesis-crafted mapping engine approach using informative priors may also steer toward more interpretable AI models with increased generalizability.
  • Some examples may include: Physics-Informed Neural Networks (PINNs), Biologically Informed AI, or Visible Machine Learning.
  • Such knowledge-informed AI models fall into the ANNs class and may encompass several types of deep learning models, such as, e.g., autoencoder, convolutional neural network (CNN), generative adversarial network (GAN), generalist AI, etc.
  • the learning models e.g., the mapping engines
  • hypothesis-crafted mapping engine approaches may be largely applicable to ANNs.
  • FIG. 5 illustrates an example design and learning process 500 for hypothesis-driven Al algorithms according to some configurations.
  • domain knowledge represented in FIG. 5 by reference numeral 505
  • hypotheses represented in FIG. 5 by reference numeral 510
  • the incorporation of domain knowledge 505 and hypotheses 510 into the design and learning process 500 may indicate that algorithms of hypothesis-driven Al may be more flexible and open to manipulation based on specific hypothesis settings.
  • the design of hypothesis-driven Al models involves ingenuity of designers while offering flexible frameworks.
  • hypothesis-driven Al can be designed based on any existing Al major domains, including, e.g..
  • the design and learning process 500 for hypothesis-driven Al algorithms may include feature selection (represented in FIG. 5 by reference numeral 520). However, in some instances, the design and learning process 500 for hypothesis-driven Al algorithms may not include feature selection.
  • the resulting trained hypothesis-driven AI may indicate inherent structures of meaningful data attributes that explain the behavior of systems, enabling the formulation of testable mechanistic models. Accordingly, in some instances, the technology disclosed herein may facilitate the discovery of novel associations between attributes that can explain data properties by building knowledge or hypotheses into the design of learning algorithms (e.g., as illustrated in FIG. 5).
  • Table 1 (below) further summarizes differences between conventional Al technologies and the hypothesis-driven Al.
  • hypothesis-driven Al methodologies may include, e.g., broad areas of impact, direct experimental design, integration of domain-specific knowledge, efficient resource utilization, interpretable and explainable, targeted investigation, etc.
  • hypothesis-driven Al technology disclosed herein offers a targeted and informed approach to address many of the challenges mentioned above.
  • the hypothesis-driven Al technology disclosed herein may perform focused investigations by centering on specific hypotheses or research questions and thus uses prior knowledge to guide its exploration.
  • the hypothesis-driven Al technology disclosed herein can generate more interpretable and explainable results as compared to conventional Al tools, since the underlying hypotheses provide a mechanistic framework for understanding the logic behind certain predictions or outcomes.
  • the hypothesis-driven Al technology disclosed herein uses resources more efficiently. Since the hypothesis-driven Al technology disclosed herein allows researchers to concentrate their computation on areas of particular interest, the hypothesis-driven Al technology disclosed herein can reduce the need for extensive data and computational resources.
  • the hypothesis-driven AI technology disclosed herein also encourages the integration of domain-specific knowledge to generate meaningful insights within a specific context. Moreover, the hypothesis-driven AI technology disclosed herein allows researchers to test hypotheses and validate those hypotheses via AI-based guided experiments. For example, in some configurations, it is the AI that performs thought experiments on hypotheses raised by researchers to confirm causal relations or associations between data attributes hidden in the data that explain the behavior of studied systems.
  • the hypothesis-driven Al technology disclosed herein can provide a valid experimental design framework to ensure that experiments are aligned with specific goals.
  • Al algorithms guided by hypotheses can transform complex data into patterns indicative of early-stage cancers, potentially revolutionizing early cancer detection methods.
  • the hypothesis-driven Al technology disclosed herein can play a pivotal role in prioritizing actionable genomic alterations for further investigation.
  • the Al models disclosed herein can sift through genomic datasets to identify alterations with the highest potential for clinical impact, which could also mitigate the limitations caused by the complexity of druggable alterations and the challenges in interpreting clinical outcomes.
  • the hypothesis-driven Al technology disclosed herein can aid in understanding the complex dynamics of tumor heterogeneity and acquired resistance. By modeling and simulating the evolutionary processes within tumors, the hypothesis-driven Al technology disclosed herein can help validate or even generate hypotheses about key drivers of resistance and potential therapeutic strategies to counteract them.
  • FIGS. 6-12 illustrate examples of hypothesis-driven AI methodologies with application in different aspects of oncology according to some configurations.
  • TUMOR CLASSIFICATION.
  • FIG. 6 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to tumor classification.
  • the underlying hypothesis (H) is: “Accurate identification of the latent primary site of CUP tumors can inform on site-specific therapies” (represented in FIG. 6 by reference numeral 605); and the algorithm tool (T) is: OncoNPC (XGBoost-based classifier) (represented in FIG. 6 by reference numeral 610).
  • CUP cancers of unknown primary
  • Pathology assessment, which plays a role in determining primary cancer types, is often lacking for these tumors and can be a challenging task for highly metastatic or poorly differentiated tumors.
  • established targeted therapies are lacking for CUP. Therefore, there is a need to develop a tool to guide classification of CUPs and help deliver more accurate primary cancer-type predictions for patients.
  • Oncology NGS-based Primary Cancer-Type Classifier (2023), an XGBoost-based classifier.
  • the authors' key hypothesis is that genomic signatures, age, and sex of patients encode information needed for accurate classification of CUPs.
  • NGS next-generation sequencing
  • CUP cancer types and unknown primary tumor cancer types
  • FIG. 7 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to patient stratification (e.g., cancer state determination).
  • the underlying hypothesis (H) is: “evolutionarily conserved biological interactions and hierarchical structures capture information flows that can be modulated by deep neural nets” (represented in FIG. 7 by reference numeral 705); and the algorithm tool (T) is: P-Net (Deep informed Neural Net) (represented in FIG. 7 by reference numeral 710).
  • P-Net a biologically informed deep neural network
  • the design of the P-Net algorithm is built upon the hypothesis that evolutionarily conserved biological interactions and hierarchical structures and information flows can be recapitulated by deep neural network architectures.
  • In P-Net, the multi-omic status of genes (mutations, copy number, methylation, and expression) was used as input.
  • the connection of input nodes is determined by the involvement of a gene (input node) in each pathway (a node in first hidden layer).
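The gene-to-pathway wiring described above can be sketched as a binary connectivity mask applied to a layer's weights. This is a minimal sketch under stated assumptions, not the P-Net implementation: the gene and pathway names are placeholders, the membership is invented for illustration, and the layer is a plain weighted sum without bias or activation.

```python
from typing import Dict, List, Set


def build_mask(genes: List[str], pathways: Dict[str, Set[str]]) -> List[List[float]]:
    """mask[i][j] is 1.0 if gene i belongs to pathway j, else 0.0."""
    return [
        [1.0 if gene in members else 0.0 for members in pathways.values()]
        for gene in genes
    ]


def masked_layer(
    x: List[float], weights: List[List[float]], mask: List[List[float]]
) -> List[float]:
    """Weighted sum per pathway node, using only permitted connections."""
    n_out = len(mask[0])
    return [
        sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
        for j in range(n_out)
    ]


# Placeholder genes and pathway membership (hypothetical, not curated).
genes = ["gene_1", "gene_2", "gene_3"]
pathways = {"pathway_A": {"gene_1", "gene_3"}, "pathway_B": {"gene_2"}}

mask = build_mask(genes, pathways)
weights = [[1.0, 1.0] for _ in genes]   # all-ones weights for clarity
activations = masked_layer([1.0, 2.0, 3.0], weights, mask)
print(mask, activations)
```

Because masked connections contribute nothing regardless of their weight, the trained weights that remain can be read against known gene-pathway relationships, which is one reason such architectures may be easier to interpret.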
  • P-Net additionally offers an interpretable computational platform for researchers to identify critical biological pathways and associated gene expression profiles contributing to disease classification from the AI trained models.
  • the neural network architecture of P-Net may leave out a number of important genes or pathways that contribute to disease stages.
  • FIG. 8 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to deciphering a new class of cancer genes (e.g., cancer genome).
  • the underlying hypothesis (H) is: “‘Dark’ genes that are not differentially expressed or mutated can play key roles in linking oncogenic signals” (represented in FIG. 8 by reference numeral 805); and the algorithm tool (T) is: MALANI (SVM) (represented in FIG. 8 by reference numeral 810).
  • Cancer-driving genes are not limited to mutated oncogenes but can involve a myriad of genes whose functions do not involve tumorigenesis but are needed for tumor maintenance. This includes “never mutated” genes.
  • a previous study by Gatenby et al. used a computational modeling approach to reveal the clinical benefits of targeting these genes. Building upon these observations, the technology disclosed herein provides a new class of cancer genes, referred to as “dark cancer genes” or Class II cancer genes, which act as “signal linkers” to coordinate oncogenic signals between mutated and differentially expressed genes and can serve as novel targetable candidates. These cancer-relevant genes are often missed by traditional statistical methods because they are neither mutated nor differentially expressed.
  • MALANI Machine Learning-Assisted Network Inference
  • the MALANI algorithm can be devised to incorporate other omics layers, such as epigenetics and proteomics. Also, it remains to be seen how cancer progression affects the type and number of Class II cancer genes. Future work may include exploring how targeting Class II cancer genes can better benefit cancer patients.
  • FINDING CHEMICAL FINGERPRINTS THAT ASSOCIATE WITH DRUG RESPONSE.
  • FIG. 9 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to finding chemical fingerprints that associate with drug response (e.g., drug response).
  • the underlying hypothesis (H) is: “the right combination of molecular features encodes rules that govern drug responses” (represented in FIG. 9 by reference numeral 905); and the algorithm tool (T) is: FIGP (Symbolic regression and genetic programming) (represented in FIG. 9 by reference numeral 910).
  • Predicting how cancer cells respond to a given drug is another area in oncology.
  • drug prediction and drug matching play a role in personalized medicine and are key to devising effective therapeutic interventions. Due to the intricate nature of pharmacokinetics (e.g., how bodies modulate drug actions) and pharmacodynamics (e.g., how drugs interact with bodies), specific treatments for individual patients are becoming increasingly important for a patient’s pharmacological outcome. Identifying the right combination of biological features underlying personal cancer etiology is helpful for predicting how a patient may respond to a specific drug. Precision in drug prediction thus not only enhances treatment efficacy but can also minimize potential side effects.
  • Symbolic regression (SR), built upon a genetic algorithm, is a regression method guided by the hypothesis that the right combinations of symbolic features (e.g., mathematical operators) encode information to govern the behavior of a system, such as cancer etiology and drug response.
  • SR has been successfully used to discover physical laws that govern the properties of physical systems, by searching the mathematical space to find the best model fitting the data.
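The notion of searching a space of mathematical expressions for the best fit can be shown with a toy example. The sketch below is an assumption-laden simplification: it enumerates every expression of the form "variable operator variable" exhaustively instead of using genetic programming, and the variable names, operator set, and data are invented for illustration. It is not FIGP, TuringBot, or AI Feynman.

```python
import itertools
import operator
from typing import List, Sequence, Tuple

# Small operator vocabulary (hypothetical choice).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def search_expressions(
    rows: Sequence[Sequence[float]], targets: Sequence[float], names: List[str]
) -> Tuple[float, str]:
    """Return (squared error, expression) for the best two-variable formula."""
    best = None
    for (i, a), (j, b) in itertools.permutations(enumerate(names), 2):
        for symbol, fn in OPS.items():
            error = sum((fn(r[i], r[j]) - t) ** 2 for r, t in zip(rows, targets))
            candidate = (error, f"{a} {symbol} {b}")
            if best is None or candidate < best:
                best = candidate
    return best


# Toy data generated from the rule target = x * y.
rows = [(1, 2), (2, 3), (3, 5), (4, 1)]
targets = [2, 6, 15, 4]

error, expression = search_expressions(rows, targets, ["x", "y"])
print(expression, error)
```

Exhaustive enumeration is only feasible for tiny expression spaces; genetic programming replaces it with an evolutionary search (mutation, crossover, selection) over much larger expression trees.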
  • Other examples are the TuringBot software and the AI Feynman algorithm. Taken together, this approach offers a promising avenue for distilling complex biological relationships into interpretable simple rules, bridging the gap between intricate molecular interactions and actionable insights for therapeutic interventions and drug response.
  • FIGP is an extension of SR combined with Genetic Programming (GP).
  • FIGP will be particularly useful for generating interpretable quantitative structure-activity relationship/quantitative structure-property relationship (QSAR/QSPR) models.
  • FIG. 10 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to gene-gene non-linear associations.
  • the underlying hypothesis (H) is: “ANNs encode meaningful biological information in the form of interneural weights learned from data” (represented in FIG. 10 by reference numeral 1005); and the algorithm tool (T) is: ANNE (Artificial Neural Net) (represented in FIG. 10 by reference numeral 1010).
  • Functional associations between genes are important to understand cancer etiology in addition to physical interactions between proteins and other gene products, such as RNAs.
  • functional associations between genes are inferred from statistical-based correlative approaches, such as co-expression of genes using Pearson’s correlation and mutual information.
  • conventional statistical methods often fail to capture nonlinear functional associations of genes that explain the properties of high-dimensional cancer sequencing data.
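The limitation of linear correlation noted above can be seen in a small worked example. The data below are synthetic and chosen for illustration: one variable is an exact quadratic function of the other, yet the Pearson coefficient is zero, while an exact linear relationship yields a coefficient of one.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


x = [-2, -1, 0, 1, 2]
y_quadratic = [v * v for v in x]      # fully determined by x, but nonlinear
y_linear = [2 * v + 1 for v in x]     # fully determined by x, and linear

r_quadratic = pearson(x, y_quadratic)
r_linear = pearson(x, y_linear)
print(r_quadratic, r_linear)
```

Although the quadratic variable is completely determined by x, a correlation-based screen would report no association, which is the kind of nonlinear dependence that learning-based approaches aim to recover.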
  • An Artificial Neural Network Encoder (2022) was recently developed as an Artificial Neural Network (ANN) implemented with a novel weight engineering algorithm to reverse engineer gene-gene interactions from gene expression data.
  • the weight engineering algorithm was inspired by how the human brain learns and stores knowledge in the form of sparse spatial representations. One way the brain encodes information is by leveraging connections between neurons, which are fundamentally plastic, and by selecting information for storage based on its importance. This process implies that changes such as “pruning” occur during the learning process, i.e., redundant information is discarded.
  • This process allows the learned information to be sparsely represented and distributed as “weights” in inter-neuronal connections throughout the neocortex.
  • the underlying hypothesis is that inter-neuronal weights resulting from the learning process represent the knowledge the brain learns from observation or data processing. In principle, mathematical manipulation of these inter-neuronal weights might be able to recover the learned knowledge.
  • This idea was tested using an autoencoder, a commonly used deep learning model to reduce data dimensionality via reconstructing input data.
  • the autoencoder represents a “little brain” and the trained inter-neuronal weights represent the “knowledge” learned.
  • the weight engineering algorithm implemented in this example shows that trained ANNs indeed encode learned “knowledge” represented as associations between data attributes (e.g., gene expression) in inter-neuronal weights. These associations between data attributes can serve as knowledge discovery to uncover novel functional associations between genes that underpin cancer etiology. Yet, there is still plenty of room to improve knowledge discovery via weight engineering, for instance, by incorporating multi-omics data to decipher how genes are regulated at multi-omics levels.
  • DCell By incorporating an extensive collection of domain knowledge and data about cellular sub-systems and their hierarchical structure, it is possible to create an interpretable deep neural network that can accurately simulate the function of a eukaryotic cell.
  • This methodology overcomes the current limitation for ANNs by incorporating extensive knowledge of cell biology, making it a visible neural network (VNN) with a more interpretable inner structure.
  • DCell (2020), which aims to predict drug response phenotypes and drug synergy in human cancer cell lines, revealed the power of this algorithm in constructing interpretable models for cancer treatment.
  • DCell was integrated with three embeddings as inputs: drug response, genotype, and drug structure.
  • the authors envisioned that future work may integrate mutations with additional levels of molecular information such as epigenetic states, gene expression, or microenvironmental influences.
  • Because current DCell structures are built upon both annotated Gene Ontology (GO) and curated literature, which can be biased toward well-studied genes, the algorithm can be enhanced by integrating resources from novel gene-gene associations or gene-function assignments from computational models with robust results.
  • FIG. 12 illustrates an example of an AI method whose design is, to a certain extent, aligned with hypothesis-driven AI methodologies with respect to organization of cells in the tumor microenvironment (e.g., tumor microenvironment).
  • the underlying hypothesis (H) is: “Activities of spatial predictive genes can modulate cells’ spatial distribution” (represented in FIG. 12 by reference numeral 1205); and the algorithm tool (T) is: SPIN-AI (Deep Neural Net) (represented in FIG. 12 by reference numeral 1210).
  • SPIN-AI Spatially Informed Al
  • SPGs can be distinct from SVGs, and their activities can help dictate how cells are spatially arranged in cellular niches. Further, this study proposes that SPGs can be viewed as new actionable targets in cancer treatment. However, there are still a number of areas not yet explored, for example, how the genetic heterogeneity of cancer cells affects the number and identities of SPGs. Another area to explore is whether SPGs also inform drug response phenotypes.
  • FIG. 13 illustrates key components and areas of application in the design of the hypothesis-driven AI technology disclosed herein according to some configurations.
  • Manifold epigenetics may refer to the study of the totality of molecular, cellular, and environmental systems-based mechanisms that confer body-wide phenotypic memories without altering DNA sequences. Guided by this concept, it is possible to devise hypothesis-driven AI algorithms analogous to P-Net, DCell, and ANNE by taking multi-layered epigenetic regulation into consideration.
  • GUM Gene Utility Model
  • PPI protein-protein interaction
  • Cancer can also be described as a type of chronic disease, like metabolic syndromes and neurodegenerative diseases, in the sense that many patients require long-term and complex care.
  • Because chronic diseases often involve multiple organs and have a high comorbidity rate with other diseases, it is important to expand the perspective of cancer etiology by dissecting its modulation by multi-organ functions.
  • kidney injury has been linked to cachexia, and dysfunctional interorgan crosstalk can also induce pathological systemic niches and contribute to disease progression, including cancer metastasis.
  • The Locked-State Model (LoSM) states that positive feedback loops sustaining inter-organ communication can help build memory-like properties that “lock” healthy and disease states.
  • LoSM provides a new conceptual framework to describe how memory-like inter-organ communication can contribute to disease etiology and how therapeutic intervention on pathological organ crosstalk can provide pharmacological benefits to patients.
  • the concept of LoSM therefore invites the design of AI algorithms driven by hypotheses that incorporate not only cell-cell but also organ-organ communication.
  • Drugs are often not ideal “magic bullets” that only target a single molecule. Rather, drugs are promiscuous and act on multiple targets, albeit with different binding specificities, and often lead to off-target effects. Hence, the actions of drugs are often multifaceted and multidimensional, involving gene products that participate in diverse biological pathways and even different organs. For instance, altering the microbiome can affect the sensitivity to immune checkpoint inhibitors. Therefore, these factors may be taken into account in devising hypothesis-driven AI for drug discovery pipelines.
  • Manifold Medicine is built upon five body-wide vectorial axes (genetic, molecular network, internal environment, neural-immune-endocrine, microbiota), outlines the manifold nature of the mode-of-action of drugs in terms of target modes (subject), regimen modes (predicate), and patient modes (modifier), and illustrates how a manifold treatment, combining drugs with different modes of action, can counteract the vectorial tendencies of diseases.
  • This conceptual framework can be incorporated in formulating hypotheses in AI algorithm design to uncover how different types of body axes and the mode-of-action of a drug interact.
  • Generative AI (GAI) also presents a cutting-edge opportunity to revolutionize cancer research.
  • Generative Adversarial Networks can be employed to simulate realistic biological data including genomic and imaging data, which is particularly valuable when access to large, diverse datasets is limited.
  • generative AI can be used to generate synthetic patient cohorts. By combining these synthetic cohorts with real-world data, the robustness and generalizability of current predictive AI models can be increased.
  • GANs can be trained on existing genomic and imaging data to generate synthetic samples that mimic the characteristics of real cancer data.
  • generative AI can be utilized to simulate the evolution of cancer over time, offering insights into dynamic changes in tumor biology and aiding in the development of more adaptive and personalized treatment strategies. Gene-gene pairs that flip in their activities can modulate drug response phenotypes.
  • GAI has the power to encapsulate such relationships and advance the understanding of how phenotypic behaviors are modulated in cancer cells.
  • a future AI tool can be designed as software that uses not only multi-omics data, but also medical imaging data, liquid biopsy omics, physiological data, and patient self-reported outcomes to track patient outcomes and thus inform on patients’ responses to future treatment. With more comprehensive data being integrated, AI can be more accurate in predicting the next step in tumor progression across different patients.
  • an AI platform that integrates knowledge from diverse sources, including scientific literature, clinical trial data, and molecular databases, to identify novel drug repurposing opportunities for specific cancer types may be developed. By constructing a dynamic knowledge graph to represent intricate relationships among drugs, diseases, molecular targets, and biological pathways, this new AI model can utilize pattern recognition to uncover hidden associations between drugs and signaling pathways.
  • the platform should also incorporate real-world patient data from electronic health records for validation and personalized patient stratification. Through this platform, tumor targeting might be more efficient and allow for advancements in drug repurposing.
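A knowledge graph linking drugs, targets, pathways, and diseases can be sketched as a simple adjacency mapping with a path search. This is an illustrative sketch only: the node names are placeholders, the edges are invented, and a breadth-first search stands in for whatever pattern recognition such a platform would actually use.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical directed edges: drug -> target -> pathway -> disease.
graph: Dict[str, List[str]] = {
    "drug_X": ["target_1"],
    "target_1": ["pathway_P"],
    "pathway_P": ["cancer_type_C"],
    "drug_Y": ["target_2"],
    "target_2": [],
}


def find_path(
    graph: Dict[str, List[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of relations."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of relations connects start to goal


candidate = find_path(graph, "drug_X", "cancer_type_C")
no_link = find_path(graph, "drug_Y", "cancer_type_C")
print(candidate, no_link)
```

A connecting path is only a candidate association to be examined; as the text notes, validation against real-world patient data would still be needed.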
  • hypothesis-driven AI provides a new class of AI algorithms that may overcome a variety of challenges in current oncology research, as illustrated through the examples described herein.
  • the design of learning algorithms of hypothesis-driven AI involves ingenuity, creativity, innovation, and domain knowledge from researchers.
  • Such special features enable researchers to formulate hypothetical modes of gene-gene associations that underpin cancer etiology and to develop learning algorithms that perform “AI-based” thought experiments to validate the proposed hypotheses. This allows novel modes of gene-gene associations overlooked by conventional AI to be discovered.
  • hypothesis-driven AI offers a targeted and informed way to address issues ranging from tumor detection to drug targeting. Since the design of the AI algorithms disclosed herein is driven by hypotheses, researchers can make use of the synergy between domain knowledge, computational models, and experimental validation to obtain a more profound understanding of cancer biology and to develop more effective cancer treatments in the future.
  • the hypothesis-driven AI technology disclosed herein describes a framework in which the scientific or medical hypotheses are central to algorithm development for the proposed and new kind of AI.
  • scientific hypothesis is a core component of the learning algorithm, such that the hypothesis may be embedded within the algorithm (as opposed to being an external component of the algorithm itself).
  • the scientific hypotheses may be formulated or generated by a user (e.g., a scientist).
  • the hypothesis-driven AI framework disclosed herein may include training procedures on crafted deep neural network architectures that are specifically tailored to incorporate a hypothesis component.
  • the technology disclosed herein provides expert domain knowledge in combination with hypothesis-driven AI architecture that may be utilized to address a key and novel hypothesis.
  • generative AI can aid data generation as part of in silico AI hypothesis testing.
  • the technology disclosed herein may facilitate the extraction of novel knowledge from previously described AI methods, which continue to be seen as “black box” models.
  • “hypothesis” (or “hypotheses”) generally refers to a scientific hypothesis or hypotheses. Accordingly, as used herein, a hypothesis (or hypotheses) refers to a proposition that provides a mechanistic or operational framework to explain how a system (e.g., disease development or a drug response phenotype) behaves.
  • hypothesis testing used in other fields, such as, e.g., the field of statistics (e.g., concerning testing possible combinations).
  • Hypothesis testing utilized in other fields can be categorized into statistics, symbolic regression, and latent knowledge embedded in text, which methodologies have been previously described and themselves fail to include testing of an unknown and yet novel hypothesis.
  • all possible combinations of data attributes represent the search space for the null hypothesis and testing for statistical significance.
  • AI-aided screening to find new types of candidates rather than finding novel hypotheses unknown to researchers (as provided by the technology disclosed herein).
  • FIG. 14 schematically illustrates a system 1400 for implementing hypothesis-driven AI according to some configurations.
  • the system 1400 includes a server 1405, a database 1415, and a user device 1420.
  • the system 1400 includes fewer, additional, or different components than illustrated in FIG. 14.
  • the system 1400 may include multiple servers 1405, multiple databases 1415, multiple user devices 1420, or a combination thereof.
  • the database 1415 may be included in the server 1405 and one or both of the database 1415 and the server 1405 may be distributed among multiple databases or servers.
  • the database 1415, the server 1405, and the user device 1420 may be combined into a single device.
  • the server 1405, the database 1415, and the user device 1420 communicate over one or more wired or wireless communication networks 1430. Portions of the communication networks 1430 may be implemented using a wide area network, such as the Internet, a local area network, such as a Bluetooth™ network or Wi-Fi, and combinations or derivatives thereof. In some embodiments, additional communication networks may be used to allow one or more components of the system 1400 to communicate. Also, in some embodiments, components of the system 1400 may communicate directly rather than through a communication network 1430 and, in some embodiments, the components of the system 1400 may communicate through one or more intermediary devices not shown in FIG. 14.
  • the server 1405 may include a computing device. As illustrated in FIG. 15, the server 1405 includes an electronic processor 1500, a memory 1505, and a communication interface 1515. The electronic processor 1500, the memory 1505, and the communication interface 1515 communicate through wired connections and/or wirelessly, over one or more communication lines or buses, or a combination thereof.
  • the server 1405 may include components in addition to those illustrated in FIG. 15 in various configurations.
  • the server 1405 may also include one or more human machine interfaces, such as a keyboard, keypad, mouse, joystick, touchscreen, display device, printer, microphone, speaker, and the like, that receive input from a user, provide output to a user, or a combination thereof.
  • the server 1405 may also perform additional functionality other than the functionality described herein.
  • the functionality (or a portion thereof) described herein as being performed by the server 1405 may be distributed among multiple servers or devices (for example, as part of a cloud service or cloud-computing environment), may be performed by one or more user devices 1420, or a combination thereof.
  • the communication interface 1515 allows the server 1405 to communicate with devices external to the server 1405.
  • the server 1405 may communicate with the database 1415, the user device 1420, or a combination thereof through the communication interface 1515.
  • the communication interface 1515 may include a port for receiving a wired connection to an external device (for example, a universal serial bus (“USB”) cable and the like), a transceiver for establishing a wireless connection to an external device (for example, over one or more communication networks 1430, such as the Internet, a local area network (“LAN”), a wide area network (“WAN”), and the like), or a combination thereof.
  • the electronic processor 1500 is configured to access and execute computer-readable instructions (“software”) stored in the memory 1505.
  • the software may include firmware, one or more applications, program data, filters, rules, one or more program modules, and other executable instructions.
  • the software may include instructions and associated data for performing a set of functions, including the methods described herein.
  • the memory 1505 may store a learning engine 1525 and a model database 1530.
  • the learning engine 1525 develops one or more models (e.g., the hypothesis-driven AI models described herein) using one or more machine learning functions.
  • Machine learning functions are generally functions that allow a computer application to learn without being explicitly programmed.
  • the learning engine 1525 is configured to develop an algorithm or model based on training data.
  • the training data includes example inputs and corresponding desired (for example, actual) outputs, and the learning engine progressively develops a model (for example, a classification model) that maps inputs to the outputs included in the training data.
  • Machine learning performed by the learning engine 1525 may be performed using various types of methods and mechanisms including, but not limited to, decision tree learning, association rule learning, artificial neural networks, inductive logic programming, support vector machines, clustering, Bayesian networks, reinforcement learning, representation learning, similarity and metric learning, sparse dictionary learning, and genetic algorithms. These approaches allow the learning engine 1525 to ingest, parse, and understand data and progressively refine models for data analytics.
  • the models generated by the learning engine 1525 may include one or more of the hypothesis-driven AI models described in greater detail herein.
  • the hypothesis-driven AI model(s) generated by the learning engine 1525 may be designed (or specifically trained) to perform various detection, classification, prediction, etc. functionality.
  • the hypothesis-driven AI model(s) generated by the learning engine 1525 may perform (or otherwise provide) tumor detection, cancer state determination, cancer genome detection or classification, drug response prediction, gene-gene non-linear associations, phenotype prediction, tumor microenvironment detection or classification, hypothesis testing, etc.
  • the learning engine 1525 may generate the hypothesis-driven AI model(s) using one or more hypotheses 1550.
  • the hypotheses 1550 may be scientific or medical hypotheses.
  • the hypotheses 1550 may originate from or be generated by a user (e.g., a scientist).
  • the hypotheses 1550 may be transmitted to the server 1405 from the user device 1420 over the communication network 1430.
  • the learning engine 1525 may utilize one or more of the hypotheses 1550 as training data for the hypothesis-driven AI model(s).
  • one or more of the hypotheses 1550 may be embedded in the hypothesis-driven AI model(s) (or algorithm(s) thereof). While FIG. 15 illustrates the hypotheses 1550 being stored locally to the server 1405 (e.g., in the memory 1505), the hypotheses 1550 (or a portion thereof) may be stored via another device, such as, e.g., the user device 1420, the database 1415, etc., such that the hypotheses 1550 may be accessible to the server 1405 (e.g., the learning engine 1525).
  • the learning engine 1525 may generate the hypothesis-driven AI model(s) using training data, which may be stored in the database 1415.
  • the database 1415 may store hypothesis(es) 1550, omics data 1555, domain knowledge 1560, and clinical data 1565, each of which is described in greater detail herein.
  • the training data utilized by the learning engine 1525 may be specifically tailored such that the training dataset reflects the hypothesis testing.
  • the training data may include specialized domain knowledge data that is selected such that hypothesis testing is enabled.
  • the hypothesis-driven AI model(s) generated by the learning engine 1525 may have an architecture of (or otherwise may be) a deep neural network.
  • the hypothesis-driven AI model(s) described herein may be implemented such that specialized domain knowledge data is carefully selected to allow for hypothesis testing, as described in greater detail herein.
  • the hypothesis-driven AI models generated by the learning engine 1525 are stored in the model database 1530.
  • the model database 1530 is included in the memory 1505 of the server 1405. However, in some embodiments, the model database 1530 is included in a separate device accessible by the server 1405 (included in the server 1405 or external to the server 1405).
  • the memory 1505 includes a hypothesis-driven AI application 1590 (referred to herein as “the application 1590”).
  • the application 1590 is a software application executable by the electronic processor 1500.
  • the electronic processor 1500 executes the application 1590 to perform hypothesis-driven AI analyses, including, e.g., tumor detection, cancer state determination, cancer genome detection or classification, drug response prediction, gene-gene non-linear associations, phenotype prediction, tumor microenvironment detection or classification, hypothesis testing, etc., as described in greater detail herein.
  • the application 1590 may be utilized as part of the design and training process for the hypothesis-driven AI technology disclosed herein.
  • the application 1590 may interact with the training data (e.g., the hypotheses 1550, the omics data 1555, the domain knowledge 1560, the clinical data 1565, etc.), the learning engine 1525, or a combination thereof as part of generating the hypothesis-driven AI model(s) stored in the model database 1530.
  • the database 1415 may include a computing device, such as a server, a database, or the like. As illustrated in FIG. 14, and as noted above, the database 1415 may store the omics data 1555, the domain knowledge 1560, the clinical data 1565, or a combination thereof. In some configurations, the database 1415 may store additional, fewer, or different components or data than illustrated in FIG. 14.
  • the user device 1420 may also include a computing device, such as a desktop computer, a laptop computer, a tablet computer, a terminal, a smart telephone, a smart television, a smart wearable, or another suitable computing device that interfaces with a user.
  • the user device 1420 may include components similar to those of the server 1405, such as an electronic processor (e.g., a microprocessor, an application-specific integrated circuit (ASIC), or another suitable electronic device), a memory (e.g., a non-transitory, computer-readable storage medium), a communication interface, such as a transceiver, for communicating over the communication network 1430 and, optionally, one or more additional communication networks or connections, and one or more human machine interfaces.
  • the user device 1420 may store a browser application or a dedicated software application executable by an electronic processor.
  • the user device 1420 includes additional, fewer, or different components than the server 1405.
  • the user device 1420 includes a display device, such as a screen, monitor, or touchscreen.
  • the user device 1420 also includes an input mechanism, such as a keyboard or keypad, one or more buttons, a microphone, or the like.
  • the touchscreen may function as an input device but, in some embodiments, the user device 1420 also includes one or more additional input devices.
  • the system 1400 is described herein as providing a hypothesis-driven AI service through the server 1405.
  • the functionality described herein as being performed by the server 1405 may be locally performed by the user device 1420.
  • the user device 1420 may store the learning engine 1525, the model database 1530, the application 1590, the hypotheses 1550, another component of the system 1400, or a combination thereof.
  • the user device 1420 may be used by an end user to perform hypothesis-driven AI analysis related to oncology research and treatment, including, e.g., tumor detection, cancer state determination, cancer genome detection or classification, drug response prediction, gene-gene non-linear associations, phenotype prediction, tumor microenvironment detection or classification, hypothesis testing, etc., as described in greater detail herein.
  • the user device 1420 may be used by an end user to design and train hypothesis-driven AI model(s), as described in greater detail herein.
  • FIG. 16 is a flowchart illustrating a method 1600 for providing hypothesis-driven AI for disease (e.g., oncology) research and treatment by the system 1400 according to some embodiments.
  • the method 1600 is described as being performed by the server 1405 and, in particular, the application 1590 as executed by the electronic processor 1500.
  • the functionality described with respect to the method 1600 may be performed by other devices, such as the user device 1420, or distributed among a plurality of devices, such as a plurality of servers included in a cloud service.
  • the method 1600 may be described with respect to oncology (or cancer) as one example.
  • the technology disclosed herein including, e.g., the method 1600
  • the method 1600 includes receiving a request to perform a disease analysis (at block 1605).
  • the electronic processor 1500 may receive the request from the user device 1420 via the communication network 1430.
  • a user may interact with the user device 1420 by submitting the request to perform a disease analysis.
  • the disease analysis may be an oncology analysis.
  • a disease analysis may include, e.g., a disease detection analysis (e.g., a tumor detection analysis), a disease classification analysis (e.g., a tumor classification analysis), a patient stratification analysis, a disease gene discovery analysis (e.g., a cancer gene discovery analysis), a spatial organization analysis (e.g., a tumor spatial organization analysis), a phenotype state determination analysis, a genome analysis (e.g., a cancer genome analysis), a drug response prediction analysis, a gene-gene non-linear associations analysis, a phenotype prediction analysis, a disease microenvironment analysis (e.g., a tumor microenvironment analysis), etc., as described in greater detail herein.
  • the disease analysis may include hypothesis testing (or validation).
  • the request may identify one or more hypotheses to be tested (e.g., one or more scientific or medical hypotheses).
  • the one or more hypotheses may be provided by a user utilizing the user device 1420 (e.g., may be drafted by or originate from a user, such as a scientist).
  • the request to perform the disease analysis may be a request to determine a validity of a (proposed) hypothesis (e.g., determine whether a hypothesis is valid or not valid).
  • the electronic processor 1500 may retrieve a hypothesis-driven AI model for performing the disease analysis (at block 1610).
  • the electronic processor 1500 may retrieve the hypothesis-driven AI model from the model database 1530.
  • the electronic processor 1500 may retrieve the hypothesis-driven AI model from another storage location or device.
  • the electronic processor 1500 may identify the hypothesis-driven AI model based on the request, including, e.g., the disease analysis being requested.
  • the hypothesis-driven AI model(s) may be specifically trained to perform a specific type of analysis, perform an analysis based on a specific set of domain knowledge, etc., as described in greater detail herein.
  • the electronic processor 1500 may then execute the disease analysis using the hypothesis-driven AI model (at block 1615).
  • the electronic processor 1500 may execute the disease analysis as described in greater detail herein.
  • the electronic processor 1500 may access additional data or information as part of executing the disease analysis using the hypothesis-driven AI model. For instance, in some configurations, the electronic processor 1500 may access the contents of the database 1415, including, e.g., the hypothesis(es) 1550, the omics data 1555, the domain knowledge 1560, the clinical data 1565, etc. The electronic processor 1500 may provide the additional data or information as input data to the hypothesis-driven AI model. Alternatively, or in addition, in some configurations, the electronic processor 1500 may provide a proposed hypothesis as input data to the hypothesis-driven AI model (e.g., as part of a hypothesis testing analysis).
  • the electronic processor 1500 may access patient specific data, where the disease analysis is specific to a particular patient.
  • the patient specific data may be accessible from the database 1415 (e.g., included as part of the clinical data 1565, etc.) and/or provided by a user of the user device 1420 as part of the request (e.g., received at block 1605).
  • the electronic processor 1500 may generate and transmit a report indicating a result of the disease analysis (at block 1620).
  • the report may indicate a validity of a hypothesis identified in the request.
  • the report may include information describing associations between data attributes utilized in the execution of the disease analysis. For instance, the report may indicate how data attributes (or features) associate with each other, what their biological meaning is, etc.
  • the report may include (or otherwise provide) insights into how new knowledge may be generated to guide decisions, such as, e.g., clinical decisions.
  • the report may include a mechanistic explanation or interpretation of the results of the disease analysis.
  • the report may include a testable hypothesis.
  • the output of the hypothesis-driven AI model(s) described herein may include any of the outputs or information as described herein, and the contents of the reports generated as part of the method 1600 are merely examples and are not limiting.
  • the method 1600 may include the design and training process of the hypothesis-driven AI model, as described in greater detail herein.
  • the electronic processor 1500 may utilize the model database 1530, the learning engine 1525, etc. as part of the design and training process of the hypothesis-driven AI technology disclosed herein (e.g., as described herein with respect to at least FIGS. 3-5).
  • the method 1600 may include generating a training process (or a hypothesis-driven AI training plan) (e.g., as described herein with respect to at least FIGS. 3-5).
  • the HD-AI model may be trained (as described herein) in accordance with the training process.
  • the trained HD-AI model may then execute one or more AI tasks, as described in greater detail herein.
  • performance of the HD-AI model may be assessed (e.g., an assessment of the execution of the AI task(s) may be performed).
  • the training process for the HD-AI model may be modified, as described in greater detail herein (e.g., with respect to FIG. 3).
  • This paradigm shift in AI can be a driving force for scientists to creatively incorporate scientific hypotheses into the design of AI components.
  • Adopting open science practices in HD-AI research may help make research data and analysis pipelines freely available to the scientific community, including, e.g., the final research findings, the raw data, and the code used to analyze the data, such that the data may be shared and can be tested openly.
  • researchers enable others to replicate their experiments, verify their results, and build upon their work, which may promote transparency and accountability in scientific research.
  • By verifying the reproducibility of HD-AI findings, researchers can build confidence in the reliability of AI models as scientific tools.
  • HD-AI as a framework for generating and testing has the potential to usher in a new era of “AI for Science” and drive new discoveries.
  • Large language models (LLMs) and generalist AI can be integrated into HD-AI to enhance interactions between human scientists and machines.
  • HD-AI may be deployed to generate a summary of the status of hypothesis testing process and provide suggestions related to making hypothesis adjustments before proceeding with experimental validation.
  • the technology disclosed herein may be implemented as multi-modal generalist HD-AI to enable parallel hypothesis testing for multiple scientific hypotheses that can explain the working of multiscale multimodal systems, such as, e.g., earth climate or multi-organ interactions in the human body.
  • FIG. 17 is an example diagram 1700 illustrating a workflow related to AI-powered hypothesis testing and scientific discoveries.
  • Existing domain knowledge and discoveries, enhanced by various forms of AI such as large language models (LLMs), generalist AI, etc., and scientific hypotheses can be integrated to craft AI learning models in HD-AI.
  • This workflow fosters human-machine interactions that encourage creativity from human scientists, ultimately driving the emergence of new scientific fields.
  • the technology described herein relates to implementing HD-AI such that a hypothesis is integrated or implemented in a learning model, as illustrated in FIG. 18.
  • the architecture of deep neural networks in HD-AI may be highly modular, which may further distinguish HD-AI from conventional AI, such as, e.g., in the context of deep learning.
  • HD-AI layers can be developed and designed based on the incorporated domain knowledge and the selected mathematical structure that effectively captures the scientific or medical hypotheses to be tested.
  • a number of nodes and hidden layers of the HD-AI can be adjusted to test the validity of specific components within the selected hypotheses.
  • an output layer of HD-AI can be flexibly designed to recapitulate the anticipated output formats (e.g., as specified by a user), which may include, e.g., a class of diseases, spatial cells distribution, predicted values of genes or pathway activities, etc.
  • FIG. 19 illustrates an example diagram 1900 of implementing AI solutions to validate working scientific or medical hypotheses as described herein.
  • knowledge learned by AI may be extracted (e.g., as domain knowledge) (represented in FIG. 19 by reference numeral 1905).
  • the domain knowledge may be provided to a deep neural network (represented in FIG. 19 by reference numeral 1910).
  • Weight engineering may be applied to extract association(s) between attributes (represented in FIG. 19 by reference numeral 1915), which may be utilized to generate a reverse engineered graph (represented in FIG. 19 by reference numeral 1920).
  • Working hypothesis from pre-trained AI systems may be incorporated (represented in FIG. 19 by reference numeral 1925). In silico surgery neuron engineering may then be performed to re-train learned concepts (represented in FIG. 19 by reference numeral 1930).
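
As a minimal sketch of the weight-engineering step of FIG. 19 (reference numerals 1915 and 1920), the Python below thresholds the weights of one layer to list attribute-to-node associations and then collects them into a simple graph. The function names, the toy weight values, the gene and pathway labels, and the 0.5 cutoff are illustrative assumptions; the disclosure does not specify how associations are extracted from the weights.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]


def extract_associations(
    weights: List[List[float]],
    attributes: List[str],
    nodes: List[str],
    threshold: float = 0.5,  # assumed cutoff, chosen only for illustration
) -> List[Edge]:
    """Return (attribute, node, weight) edges whose absolute weight meets the threshold."""
    edges: List[Edge] = []
    for i, attribute in enumerate(attributes):
        for j, node in enumerate(nodes):
            weight = weights[i][j]
            if abs(weight) >= threshold:
                edges.append((attribute, node, weight))
    return edges


def to_graph(edges: List[Edge]) -> Dict[str, List[str]]:
    """Group the retained attributes under the node they are associated with."""
    graph: Dict[str, List[str]] = {}
    for attribute, node, _ in edges:
        graph.setdefault(node, []).append(attribute)
    return graph


# Hypothetical toy weights: rows are input attributes, columns are hidden nodes.
attributes = ["gene_a", "gene_b", "gene_c"]
nodes = ["pathway_1", "pathway_2"]
weights = [
    [0.9, 0.1],
    [-0.7, 0.0],
    [0.2, 0.6],
]

edges = extract_associations(weights, attributes, nodes)
graph = to_graph(edges)
print(graph)
```

Thresholding on the absolute value keeps strong negative associations as well as positive ones; other selection rules (for example, keeping the top few weights per node) would fit the same outline.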

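The request-handling flow of the method 1600 (blocks 1605 through 1620) can be outlined in Python as follows. This is an illustrative sketch only: the class names, the dictionary standing in for the model database 1530, the `marker_x` field, and the toy model are all assumptions introduced for the example and are not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class AnalysisRequest:
    """Block 1605: a request naming the disease analysis to perform."""
    analysis_type: str
    hypothesis: Optional[str] = None  # optional hypothesis to be tested
    patient_data: Dict[str, float] = field(default_factory=dict)


@dataclass
class Report:
    """Block 1620: the result returned to the requester."""
    analysis_type: str
    result: str
    explanation: str


Model = Callable[[AnalysisRequest], Report]


def toy_drug_response_model(request: AnalysisRequest) -> Report:
    """Placeholder for a trained model; the rule below is purely hypothetical."""
    score = request.patient_data.get("marker_x", 0.0)
    result = "responder" if score > 0.5 else "non-responder"
    return Report(request.analysis_type, result, f"marker_x={score}")


# Stand-in for the model database 1530, keyed by analysis type (assumed layout).
MODEL_DATABASE: Dict[str, Model] = {
    "drug_response_prediction": toy_drug_response_model,
}


def handle_request(request: AnalysisRequest, models: Dict[str, Model]) -> Report:
    # Block 1610: identify and retrieve the model based on the requested analysis.
    model = models.get(request.analysis_type)
    if model is None:
        raise KeyError(f"no model registered for {request.analysis_type!r}")
    # Block 1615: execute the analysis; block 1620: return the generated report.
    return model(request)


report = handle_request(
    AnalysisRequest("drug_response_prediction", patient_data={"marker_x": 0.8}),
    MODEL_DATABASE,
)
print(report.result)
```
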
Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

Systems and methods are disclosed for providing hypothesis-driven artificial intelligence (HD-AI), for example for oncology research. A method may include receiving a request to perform a disease analysis. The method may include performing the requested disease analysis using an HD-AI model trained using domain knowledge and one or more hypotheses. The method may include generating and transmitting a report based on the disease analysis, where the report may include a mechanistic explanation or an interpretation of the result of the disease analysis.
PCT/US2025/013168 2024-01-26 2025-01-27 Hypothesis-driven AI Pending WO2025160535A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463625637P 2024-01-26 2024-01-26
US63/625,637 2024-01-26

Publications (1)

Publication Number Publication Date
WO2025160535A1 true WO2025160535A1 (fr) 2025-07-31

Family

ID=94768735

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/013168 Pending WO2025160535A1 (fr) Hypothesis-driven AI

Country Status (1)

Country Link
WO (1) WO2025160535A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200411199A1 (en) * 2018-01-22 2020-12-31 Cancer Commons Platforms for conducting virtual trials
US20220261668A1 (en) * 2021-02-12 2022-08-18 Tempus Labs, Inc. Artificial intelligence engine for directed hypothesis generation and ranking
US20230070131A1 (en) * 2021-09-08 2023-03-09 Georgetown University Generating and testing hypotheses and updating a predictive model of pandemic infections

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ELMARAKEBY ET AL.: "recently developed a biologically informed deep neural network", CALLED P-NET, 2021
MA ET AL.: "devised an ingenious deep learning algorithm", CALLED DCELL, 2018
MOON ET AL., ONCOLOGY NGS-BASED PRIMARY CANCER-TYPE CLASSIFIER (ONCONPC, 2023
SATHISH KUMAR L. ET AL: "Artificial intelligence based health indicator extraction and disease symptoms identification using medical hypothesis models", CLUSTER COMPUTING, vol. 26, no. 4, 23 August 2022 (2022-08-23), NL, pages 2325 - 2337, XP093272946, ISSN: 1386-7857, Retrieved from the Internet <URL:https://link.springer.com/article/10.1007/s10586-022-03697-x/fulltext.html> [retrieved on 20250425], DOI: 10.1007/s10586-022-03697-x *
XIANYU ZILIN ET AL: "The Rise of Hypothesis-Driven Artificial Intelligence in Oncology", CANCERS, vol. 16, no. 4, 18 February 2024 (2024-02-18), CH, pages 822, XP093272948, ISSN: 2072-6694, DOI: 10.3390/cancers16040822 *

Similar Documents

Publication Publication Date Title
Amiri et al. The deep learning applications in IoT-based bio-and medical informatics: a systematic literature review
Muzio et al. Biological network analysis with deep learning
Kleinstreuer et al. Artificial intelligence (AI)—it’s the end of the tox as we know it (and I feel fine)
Sethi et al. Long short-term memory-deep belief network-based gene expression data analysis for prostate cancer detection and classification
Yousef et al. Deep learning in bioinformatics
US6768982B1 (en) Method and system for creating and using knowledge patterns
Sinha et al. A review on the recent applications of deep learning in predictive drug toxicological studies
Bianchini et al. Deep learning in science
Houssein et al. Soft computing techniques for biomedical data analysis: open issues and challenges
Sarvepalli et al. Role of artificial intelligence in cancer drug discovery and development
Khan et al. Artificial intelligence for intelligent systems: Fundamentals, challenges, and applications
Chandak et al. Trends and Advancements of AI and XAI in Drug Discovery
Flöther et al. How quantum computing can enhance biomarker discovery
Kaur et al. Precision medicine with data-driven approaches: A framework for clinical translation
Li et al. Tuberculous pleural effusion prediction using ant colony optimizer with grade-based search assisted support vector machine
Ghosh et al. Machine Learning in Biological Sciences
Levy et al. Artificial intelligence, bioinformatics, and pathology: emerging trends part I—an introduction to machine learning technologies
Diaa et al. Machine Learning and Traditional Statistics Integrative Approaches for Bioinformatics
Hamadani et al. A Biologist's Guide to Artificial Intelligence: Building the Foundations of Artificial Intelligence and Machine Learning for Achieving Advancements in Life Sciences
WO2025160535A1 (fr) Hypothesis-driven AI
Shukla et al. Optimized breast cancer diagnosis using self-adaptive quantum metaheuristic feature selection
Flöther et al. How quantum computing can enhance biomarker discovery for multi-factorial diseases
Salem et al. Wrapper-based modified binary particle swarm optimization for dimensionality reduction in big gene expression data analytics
Tian et al. Explainable quantum neural networks: example-based and feature-based methods
Kannagi et al. Data mining techniques in bioinformatics enhanced by HPC and AI convergence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25707844

Country of ref document: EP

Kind code of ref document: A1