
WO2025173014A1 - A method for selecting ovarian cancer patients for PARP inhibitor treatment - Google Patents

A method for selecting ovarian cancer patients for PARP inhibitor treatment

Info

Publication number
WO2025173014A1
WO2025173014A1 PCT/IN2024/050436 IN2024050436W WO2025173014A1 WO 2025173014 A1 WO2025173014 A1 WO 2025173014A1 IN 2024050436 W IN2024050436 W IN 2024050436W WO 2025173014 A1 WO2025173014 A1 WO 2025173014A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
readable medium
computer readable
whole slide
parp inhibitor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/IN2024/050436
Other languages
French (fr)
Inventor
Mohan UTTARWAR
Gowhar SHAFI
P M Shivamurthy
Anand Raj ULLE
Chongtham Cha CHINGLEMBA
Jayant Jagannath Khandare
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Onecell Diagnostics India Private Ltd
Onecell Diagnostics Inc
Original Assignee
Onecell Diagnostics India Private Ltd
Onecell Diagnostics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Onecell Diagnostics India Private Ltd, Onecell Diagnostics Inc

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/20 ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • the present disclosure relates to a method for the selection of ovarian cancer patients for treatment with PARP inhibitors.
  • the method uses artificial intelligence and machine learning aided analysis of histopathology images of ovarian cancer patients. More particularly, it relates to (i) extracting the morphological and architectural features from a hematoxylin and eosin (H and E) stained histopathological image of the tissue of the ovarian cancer patient; (ii) relating the extracted features to a database of morphological and architectural features of H and E stained histopathological images of tissues of ovarian cancer patients in which the cancerous tissue showed homologous recombination deficiency (HRD) or homologous recombination proficiency (HRP), as evidenced by next generation sequencing; and (iii) quantifying a probability score to select the patient for PARP inhibitor treatment.
  • the PARP inhibitor treatment is administered to the selected patient.
  • PARP Poly (ADP-ribose) polymerase
  • CNNs Convolutional Neural Networks
  • CNNs have achieved great success in various computer vision tasks such as image classification, object detection, and segmentation.
  • CNNs automatically learn discriminative features from raw pixels directly without relying on manual feature engineering.
  • CNNs employ convolutional filters that are inspired by the visual cortex of animals to extract features at multiple levels of abstraction.
  • CNNs have revolutionized several applications including face recognition, autonomous driving, and medical image analysis.
  • The adoption of CNNs is primarily driven by their weight-sharing feature, a key advantage that reduces the number of trainable parameters.
  • the concurrent learning of feature extraction layers and classification layers in CNNs contributes to a highly organized model output, fostering effective learning by relying on extracted features.
  • ResNet was introduced to facilitate training deeper neural networks by redefining layers as learning residual functions relative to their inputs. Empirical evidence demonstrates the improved optimization and accuracy of these residual networks, which use residual connections to reach depths of up to 152 layers on ImageNet with lower complexity than previous models. Residual connections facilitate smoother flow of gradients during back-propagation, enabling effective training of deeper networks. Moreover, residual connections aid in mitigating the risk of the exploding gradient problem. In situations where gradients become extremely large during training, the network may experience instability. The shortcut connections provide an alternative path for gradient flow, preventing its explosion and contributing to the overall stability of the training process. Additionally, residual connections promote the learning of residual mappings, capturing the fine-grained details or nuances in the data.
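The residual idea described above can be illustrated with a minimal sketch. It is not the disclosed implementation: `residual_block`, `zero_transform` and the plain-list vectors are hypothetical names chosen for illustration, and the "layers" are replaced by an arbitrary function. The point is only that the block outputs the learned residual plus the unchanged input, so the shortcut path always carries the input forward.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return transform(x) + x: the learned residual plus the shortcut input."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("the shortcut needs matching input and output sizes")
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_transform(x: Vector) -> Vector:
    # Hypothetical stand-in for layers whose weights are all zero.
    return [0.0 for _ in x]


if __name__ == "__main__":
    # With a zero residual the block reduces to the identity mapping.
    print(residual_block([1.0, -2.0, 3.0], zero_transform))
```

Because the identity path is always present, a block whose layers contribute nothing still passes its input through unchanged, which is why stacking many such blocks does not by itself degrade the signal.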
  • the model is trained on the H and E-stained whole slide image (WSI) selected from a database, wherein there is a balance of HRD and HRP WSIs.
  • designing the pre-trained convolutional neural network involves designing multiple layers of neurons, each of which computes the dot product of input pixels with predefined weights.
  • the pre-trained convolutional neural network (CNN) model is selected from ResNet-34, ResNet-50 and ResNet-101.
  • the weights of the CNN architecture are further optimized by tuning several hyper-parameters like learning rate, batch size, loss function and drop-outs to perform the initial training.
  • the CNN based architecture is retrained on the tiles generated with the labels obtained from the Next-Generation Sequencing (NGS) as ground truth using the weights obtained from initial training.
  • NGS Next-Generation Sequencing
  • the training process is continued until the loss value is in the range 0.0001-0.0002.
  • the model is trained to recognize the higher order features specific to the tissue microenvironments like nuclei, stroma, tumor cells etc.
  • the model so trained is validated on the H and E-stained whole slide image (WSI) selected from a database, wherein there is a balance of HRD and HRP WSIs.
  • probabilities are computed at each tile level indicating the class to which a tile belongs viz., HRD or HRP.
  • the tile level probabilities are aggregated to select the patient for PARP inhibitor treatment.
  • the selection of patient for PARP inhibitor treatment is based on patient level probability.
  • a final prediction report is generated along with a visualization of the probable indication of the predicted biomarker, overlaid on the WSI.
  • this disclosure provides a method of predicting a cancer patient response to PARP inhibitor treatment.
  • the system for selecting an ovarian cancer patient for PARP inhibitor treatment comprises a computer readable medium for data storage.
  • the system for selecting a cancer patient for PARP inhibitor treatment comprises a processor.
  • the processor has a) a dual microprocessor and b) multi-processor architectures.
  • the processor is a CPU (Central processing unit).
  • the processor is a CPU and a GPU (Graphic processing unit). According to an embodiment of the disclosure, the processor is a CPU and a TPU (Tensor processing unit).
  • the image acquisition system comprises a whole slide digital scanner, connected to a processor and a computer readable medium, to acquire the digitized formats of the whole glass slides of the H and E-stained tissue samples and store them in an image data format in a computer readable medium.
  • the disclosure provides a) a system comprising a computer readable medium for data storage, b) a processor for conducting the quality check on a whole slide image and generating tiles from the whole slide image passing the quality check, c) a device to visualize the results of the method for selecting the said patient for PARP inhibitor treatment, and d) an interface to execute transfer of the data between the processor, the computer readable medium and the device specified in c).
  • this disclosure provides a system that comprises a scanner that creates whole slide image from the tissue biopsy samples containing cancer cells, core needle biopsy samples containing cancer cells taken using fine needle aspirates as well as samples obtained by using relevant techniques.
  • the digitized H and E-stained whole slide image is stored in a pyramidal structure, in which each level represents the slide at a different zoom level.
  • the various parts of the image acquisition apparatus are connected over the internet, which comprises communication networks such as WAN / LAN, devices such as gateways / routers / switches / bridges and communication protocols such as TCP/IP.
  • this disclosure provides a system that comprises an output module configured to display a recommendation for the use of the PARP inhibitor treatment regimen.
  • the system for selecting a cancer patient for PARP inhibitor treatment comprises the processor (a CPU and a GPU), a computer readable medium to store the digitized H and E-stained whole slide image, and a set of circuits for communication and for processing the stored set of instructions.
  • the digitized H and E-stained whole slide images, which are stored as records in a computer readable medium, are generated by an apparatus for scanning the whole glass slides of the H and E-stained tissue samples; further, the apparatus is connected to a computer readable medium, where the digitized H and E-stained whole slide images are stored.
  • Computer-readable medium refers to a medium that stores instructions and/or data.
  • a computer-readable medium may take forms, including, but not limited to, non-volatile media, and volatile media.
  • Non-volatile media may include, for example, optical disks, magnetic disks, tapes, and other media.
  • Volatile media may include, for example, semiconductor memories, dynamic memory, and other media.
  • examples of a computer-readable medium include, but are not limited to, a floppy disk, a flexible disk, a hard disk, a magnetic tape, other magnetic medium, an application specific integrated circuit (ASIC), a compact disk (CD), other optical medium, a random-access memory (RAM), a read only memory (ROM), a memory chip or card, a memory stick, and other media from which a computer, a processor or other electronic device can read.
  • ASIC application specific integrated circuit
  • CD compact disk
  • RAM random-access memory
  • ROM read only memory
  • Figure 1: AI model-based patient selection for PARP inhibitor treatment
  • Figure 2: Tissue slide preparation
  • Figure 3: Pyramidal structure of the WSI
  • Figure 11: AI-based model development
  • Figure 12: Basic architecture of a CNN
  • Figure 14: Patient selection from tile level prediction
  • Figure 16: Block diagram of a typical computer system to select a patient for PARP inhibitor treatment.
  • the present disclosure relates to a method of selecting an ovarian cancer patient for PARP inhibitor treatment based on the AI-assisted analysis of the Hematoxylin and Eosin-stained (H and E) image of the tissue sample of the said cancer patient and the system for implementing the same.
  • the method comprises training an AI model to recognize the morphological and architectural features of the whole slide image of the tissue of the ovarian cancer patient and relate them to the morphological and architectural features of the whole slide image labeled with the NGS characterization of the tissue of the ovarian cancer patient.
  • the said morphological and architectural features are extracted by the deep learning model, starting from the primitive pixel-level features and moving towards higher, object-level features comprising, but not limited to, shape, size, color and texture.
  • the ResNet-50 model architecture has a depth of 50 layers and an innovative design, incorporating both residual blocks and a bottleneck structure.
  • the residual block is a fundamental building block that introduces a shortcut connection, allowing the gradient to flow more efficiently during backpropagation.
  • What distinguishes ResNet-50 is the integration of a bottleneck design within each residual block. This design involves a sequence of three convolutional layers: a 1x1 convolution for dimension reduction, a 3x3 convolution for feature extraction, and another 1x1 convolution for dimension restoration.
  • This bottleneck architecture effectively balances computational efficiency and model depth, enhancing the network's ability to capture intricate features while mitigating the computational load.
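To make the computational saving of the 1x1, 3x3, 1x1 bottleneck concrete, the following sketch counts convolution weights for a bottleneck block and for a plain block of two 3x3 convolutions at the same width. The channel widths (256 reduced to 64) are an illustrative assumption, not values stated in the disclosure, and biases and normalization parameters are ignored.

```python
def conv_weights(in_channels: int, out_channels: int, kernel: int) -> int:
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return in_channels * out_channels * kernel * kernel


def bottleneck_weights(channels: int, reduced: int) -> int:
    """1x1 reduction, 3x3 feature extraction, 1x1 restoration."""
    return (
        conv_weights(channels, reduced, 1)
        + conv_weights(reduced, reduced, 3)
        + conv_weights(reduced, channels, 1)
    )


def plain_block_weights(channels: int) -> int:
    """Two 3x3 convolutions at full width, for comparison."""
    return 2 * conv_weights(channels, channels, 3)


if __name__ == "__main__":
    # Assumed widths for illustration only: 256 channels reduced to 64.
    print(bottleneck_weights(256, 64), plain_block_weights(256))
```

Under these assumed widths the bottleneck needs 69,632 weights against 1,179,648 for the plain block, which is the sense in which the design trades a narrow 3x3 stage for depth.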
  • the network is composed of stacked residual blocks, with an initial layer consisting of standard convolutional and pooling layers to downsample the input.
  • GAP Global Average Pooling
  • the architecture concludes with a final fully connected layer with two neurons for classification, employing softmax activation for outputting class probabilities.
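As a small illustration of the final classification step, the sketch below applies softmax to two output values (logits), one per class. The function name and the example logits are hypothetical; only the softmax definition itself is standard.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical logits for the two output neurons (HRD, HRP).
    p_hrd, p_hrp = softmax([2.0, -1.0])
    print(p_hrd, p_hrp)
```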
  • training the pretrained model starts with selecting the training parameters, such as the learning rate, optimizer, batch size and loss function, and other hyperparameters such as dropout and regularization.
  • the optimization process involves selecting an algorithm to minimize the chosen loss and iteratively update the model's parameters.
  • Adam, an adaptive optimization algorithm that combines ideas from momentum and RMSprop, is frequently favored for model training due to its effectiveness in handling sparse gradients and noisy data.
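For readers unfamiliar with Adam, a single-parameter version of its update rule can be sketched as follows. This is the textbook form of the algorithm, not code from the disclosure; the `beta1`, `beta2` and `eps` values are the commonly used defaults and are assumptions here, and the default `lr` simply reuses the 0.0001 figure quoted in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamState:
    m: float = 0.0  # running mean of gradients (momentum term)
    v: float = 0.0  # running mean of squared gradients (RMSprop-like term)
    t: int = 0      # number of steps taken so far


def adam_step(param: float, grad: float, state: AdamState,
              lr: float = 1e-4, beta1: float = 0.9,
              beta2: float = 0.999, eps: float = 1e-8) -> float:
    """Return the updated parameter after one Adam step; mutates state."""
    state.t += 1
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    m_hat = state.m / (1 - beta1 ** state.t)  # bias correction
    v_hat = state.v / (1 - beta2 ** state.t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


if __name__ == "__main__":
    state = AdamState()
    print(adam_step(1.0, 4.0, state))
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.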
  • the learning rate is a critical hyperparameter in the training process of the ResNet-50 model. It represents the step size at which the optimizer adjusts the model parameters during each iteration of training. Choosing an appropriate learning rate is crucial for achieving efficient convergence and optimal performance. Setting the learning rate too high may lead to overshooting the minimum of the loss function, causing the optimization process to oscillate or diverge. Conversely, setting it too low can result in slow convergence, where the model takes small steps toward the optimal parameters, prolonging the training time. In the present case, a learning rate of 0.0001 was used.
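The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on the one-dimensional loss f(w) = w², which is an illustrative stand-in and not the model's loss; the specific rates 1.1 and 0.4 are assumptions chosen to show divergence and fast convergence, while 0.0001 is the rate quoted in the text.

```python
def gradient_descent(w0: float, lr: float, steps: int) -> float:
    """Minimise f(w) = w**2 with fixed-step gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w -= lr * grad
    return w


if __name__ == "__main__":
    for lr in (1.1, 0.4, 0.0001):
        # Too high diverges, moderate converges quickly, very low barely moves.
        print(lr, gradient_descent(1.0, lr, 20))
```

Each step multiplies w by (1 - 2·lr), so any rate above 1 makes the iterate grow in magnitude on this toy loss, while a very small rate leaves it almost unchanged after 20 steps.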
  • the batch size (1024 tiles) is a crucial hyperparameter in the training process. It defines the number of training samples utilized in one iteration of gradient descent, impacting both the efficiency of the optimization process and the model's ability to generalize to unseen data.
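A batch size of 1024 tiles determines how many optimizer iterations make up one pass over the training tiles. The helper names below are hypothetical and the tile count in the usage lines is an arbitrary example, not a figure from the disclosure.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 1024) -> Iterator[List[T]]:
    """Yield consecutive batches; the last one may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def iterations_per_epoch(n_items: int, batch_size: int = 1024) -> int:
    """Number of gradient-descent iterations needed to see every item once."""
    return math.ceil(n_items / batch_size)


if __name__ == "__main__":
    tiles = list(range(2500))  # arbitrary example: 2500 tile indices
    print([len(batch) for batch in iter_batches(tiles)])
    print(iterations_per_epoch(len(tiles)))
```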
  • the method of selecting an ovarian cancer patient for PARP inhibitor treatment comprises three aspects: 1) preparing whole slide images of the tissues, 2) training and validating the AI model and 3) using the AI model to select the ovarian cancer patient for PARP inhibitor treatment.
  • the method is described below and illustrated suitably with the figures. The description and the figures do not limit the scope of the invention.
  • the tissue is then treated with a solvent such as xylene, which renders the tissue transparent.
  • the slides are then stained with Hematoxylin and Eosin stains that highlight specific components of the tissue. Finally, a coverslip is placed over the stained tissue section (see figure 2).
  • the whole slide image, which is scanned through a scanner at a given magnification, is formatted in a pyramidal structure as shown in figure 3, where each level of the pyramid shows a different magnification level. This is formatted into a single file such as .svs, .tiff, or .ndpi and transferred to a computer readable medium (e.g., computer readable medium (101) depicted in figure 1) through a communication network.
  • the whole slide scanning is depicted in figure 4.
  • the whole slide image is stored in formats representing multiple zoom levels compressed into a single file.
  • Typical image formats include .svs, .tiff, and .ndpi.
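The pyramidal, multi-zoom storage described above can be pictured by listing the pixel dimensions of each level. The downsample factors and the base dimensions in this sketch are assumptions for illustration; actual factors depend on the scanner and file format.

```python
from typing import List, Sequence, Tuple


def pyramid_levels(width: int, height: int,
                   downsamples: Sequence[int] = (1, 4, 16, 64)) -> List[Tuple[int, int]]:
    """Return (width, height) of each pyramid level, full resolution first."""
    return [(width // d, height // d) for d in downsamples]


if __name__ == "__main__":
    # Hypothetical base-level size of a scanned slide, in pixels.
    for level, size in enumerate(pyramid_levels(40000, 30000)):
        print(level, size)
```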
  • the ROI mask is stored in .png format.
  • the metadata, which is text data, is stored in either .xlsx or .csv format.
  • Computer readable medium (601) is used to store all the digitized whole slide images along with the ground truth data obtained from traditional NGS routine and multi-omics data.
  • the publicly available QC routine (built and modified from "HistoQC", Janowczyk A., Zuo R., Gilmore H., Feldman M., Madabhushi A., JCO Clinical Cancer Informatics, (2019)) is used to perform the quality check on the WSI and is integrated into the main pipeline as a software subroutine.
  • the protocol is summarized in figure 6.
  • the output of the quality check (QC) system (607), in the form of ROI (region of interest) masks, and the output of the tile generation system (609), in the form of small patches of 256x256 pixels extracted from the image at a specified magnification of 40x, are stored in the computer readable medium.
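Tile generation from an ROI mask can be sketched as a grid walk that keeps only tiles sufficiently covered by the mask. This is an illustrative assumption about how the two outputs relate, not the disclosed routine: the mask is taken to be a binary grid at the same resolution as the image, and `min_roi_fraction` is a hypothetical threshold.

```python
from typing import List, Sequence, Tuple


def tile_origins(mask: Sequence[Sequence[int]], tile: int = 256,
                 min_roi_fraction: float = 0.5) -> List[Tuple[int, int]]:
    """Return (x, y) top-left corners of non-overlapping tiles inside the ROI.

    mask[row][col] is 1 inside the region of interest and 0 elsewhere.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    origins = []
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            roi_pixels = sum(
                mask[r][c] for r in range(y, y + tile) for c in range(x, x + tile)
            )
            if roi_pixels / (tile * tile) >= min_roi_fraction:
                origins.append((x, y))
    return origins


if __name__ == "__main__":
    # Tiny 4x4 mask with 2x2 tiles so the result is easy to inspect by hand.
    demo_mask = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(tile_origins(demo_mask, tile=2))
```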
  • Computer readable medium is a virtual representation of multiple non-stationary computer readable memory storage devices distributed over the internet in the form of cloud storage. This enables the user to access the AI-based prediction system (e.g., trained model 610) at their own convenience and from any location.
  • the above hardware is controlled by operating systems such as Linux / Windows / MacOS, which execute the operations based on the logical timings. The operating system controls the CPU/GPU and the storage in a multiprocessing environment.
  • a typical computer system 1600 is depicted in figure 16, discussed further herein.
  • This figure also illustrates the overall process of developing a deep learning-based AI model by training a pretrained deep neural network architecture (ResNet-50) using a data set whose labels are known prior to the analysis.
  • the stages in this process include collection of tissue blocks / pre-scanned whole slide images along with previously known class labels as ground truth. These ground truths are generated from the traditional next generation sequencing (NGS) system.
  • all the images which are uploaded into the computer readable medium are classified into HRD / HRP.
  • the whole slide images are classified into two classes, HRD / HRP, based on the ground truth obtained from an NGS-based HRD score, which is similar to the Myriad MyChoice HRD scoring panel and the panel followed by Ambry Genetics.
  • the ground truth could also be obtained from other panels.
  • the whole slide images along with the labels which are classified into HRD/HRP classes are identified in separate sets named as training and testing sets (e.g., training data 1117). These sets are balanced to ensure that the training process is not biased to one single class.
  • the class-specific samples are split into training and test sets according to a randomized 80-20% rule, respectively.
  • the randomization process takes care of generalizability of all the samples across the entire set of classes which trains the model to achieve better precision.
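A randomized, class-wise 80-20 split of the kind described can be sketched as follows. The function name, the slide identifiers, and the use of a fixed seed are assumptions for illustration; the disclosure does not specify how the randomization is implemented.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(labels: Dict[str, str], train_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split slide IDs into train and test sets, class by class."""
    by_class = defaultdict(list)
    for slide_id, label in sorted(labels.items()):
        by_class[label].append(slide_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test


if __name__ == "__main__":
    # Hypothetical balanced cohort: 10 HRD and 10 HRP slides.
    demo = {f"slide_{i:02d}": ("HRD" if i < 10 else "HRP") for i in range(20)}
    train_ids, test_ids = stratified_split(demo)
    print(len(train_ids), len(test_ids))
```

Splitting within each class keeps both sets balanced between HRD and HRP, in line with the stated aim of not biasing training toward a single class.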
  • the model (e.g., AI model 1121) now picks up more generic features from the whole slide image and can make more accurate predictions.
  • the images are split into a set for training and testing followed by the Quality check system (1107) which identifies the samples eligible for training and generates ROI-mask.
  • the process of training a deep neural network architecture is also shown in figure 11.
  • tiles generated from the whole slide images are retrieved from the computer readable medium.
  • each tile is categorized into either HRD or HRP based on the NGS data.
  • the class-specific tiles are fed as a vector of pixels into the various layers of the pre-trained deep neural network architecture, where each layer generates the abstraction of these pixel values based on the activation function and the weights obtained from the pre-trained neural network system.
  • the training process repeats itself over multiple epochs (iterations) by adjusting the weights at each epoch to generate more accurate abstractions/generalizations of the features specific to each class and minimize the error of generalizability.
  • the training is done until the model (e.g., AI model 1121) has reached the predefined level of accuracy, which in the present case was 99.3%. Once the model (e.g., AI model 1121) meets this criterion, the training process is concluded, indicating that the model (e.g., AI model 1121) has attained the desired level of learning.
  • the accuracy of the model performance is validated using 16% of the WSIs from the tissue database.
  • the various stages in validation are the training process involving tile generation, forward propagation followed by the estimation of a loss function which reflects the difference between training and validation accuracy.
  • WSIs reserved for testing were subjected to tile generation, where each WSI was divided into smaller tiles, which were then subjected to tile-specific prediction using the trained model (e.g., AI model 1121).
  • the predictions from the tile level prediction software 1602 (d) (the tile-specific predictor is a part of the AI model) for each tile are aggregated using a prediction aggregator algorithm developed herein. This aggregator combines the individual predictions to generate an overall prediction for the WSI.
  • the predicted HRD status is compared with the original HRD status of the WSI.
  • Figure 13 shows the tile level predictions for the patient for whom the selection for the PARP inhibitor treatment is to be made. It houses the trained deep neural network model (e.g., AI model (1321)) along with various supporting routines comprising tile generation (1312) and tile level prediction (1322). The tile level prediction is carried out by the tile level prediction software 1602 (d). The entire system runs on a set of CPUs and optionally GPUs, where the input to these processors (e.g., processors (1608) depicted in figure 16) is delivered through a computer readable medium (1601).
  • the entire system may be a standalone system, a cloud-based system or an intranet-based system.
  • Figure 14 illustrates the protocol for selection of the patient for PARP inhibitor treatment from tile level predictions.
  • The tile level prediction of whether each tile extracted from the ROI mask of the high-resolution whole slide image belongs to the HRD or the HRP class is shown (1407, 1408).
  • the patient level probabilities are arrived at by aggregating the tile level probabilities to generate the ratios and picking up the highest ratio value to decide if the patient is selected for PARP inhibitor treatment.
  • the process of arriving at the selection of the patient for PARP inhibitor treatment from this data is handled by the aggregation software 1602 (e).
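One plausible reading of the aggregation step is sketched below: each tile is assigned to the class with the higher probability, the share of tiles per class gives the ratios, and the class with the highest ratio is the patient-level call. This interpretation, the function name, and the tie-breaking rule are assumptions; the disclosed aggregation software 1602 (e) may differ.

```python
from typing import Dict, Sequence, Tuple


def aggregate_tiles(tile_probs: Sequence[Tuple[float, float]]) -> Tuple[str, Dict[str, float]]:
    """Aggregate per-tile (p_HRD, p_HRP) pairs into a patient-level call.

    Returns the predicted class and the fraction of tiles assigned to each class.
    """
    if not tile_probs:
        raise ValueError("at least one tile prediction is required")
    counts = {"HRD": 0, "HRP": 0}
    for p_hrd, p_hrp in tile_probs:
        # Assumed rule: a tile votes for its more probable class (ties go to HRD).
        counts["HRD" if p_hrd >= p_hrp else "HRP"] += 1
    total = len(tile_probs)
    ratios = {label: n / total for label, n in counts.items()}
    predicted = max(ratios, key=ratios.get)  # highest ratio wins
    return predicted, ratios


if __name__ == "__main__":
    # Hypothetical tile-level outputs for one slide.
    label, ratios = aggregate_tiles([(0.9, 0.1), (0.8, 0.2), (0.3, 0.7)])
    print(label, ratios)
```

Other aggregation rules, such as averaging the tile probabilities, would be equally consistent with the wording above; the vote-ratio form is shown only because the text mentions ratios explicitly.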
  • The WSI of the patient sample that has passed the quality check (1518), the tile level predictions (1522) and the decision regarding the selection of the patient for PARP inhibitor treatment (1511), extracted from the computer readable medium (1501), are used to generate the heatmap. The heatmap is presented on a heat map visualizer as an overlay on the regions of the whole slide image, visually representing the explainability of the final prediction.
  • Higher intensity regions of heatmap visualizations indicate higher biomarker positivity of the whole slide image (figure 15).
  • The method and system for AI-based patient selection for PARP inhibitor treatment are illustrated with reference to figure 1. The method comprises a) training an AI model (e.g., as trained in 110) to recognize the morphological features of the whole slide image of the tissue of the ovarian cancer patient and relate them to the NGS data for the tissue of the ovarian cancer patient, b) validating the model by making the prediction for selecting a patient for PARP inhibitor treatment based on the morphological analysis of the WSI images for which the NGS data are available and comparing the predictions with those arrived at based on NGS data, and c) using the AI model so trained to analyze morphological and architectural features of the WSI of the tissue of the ovarian cancer patient to select the patient for PARP inhibitor treatment.
  • a set of pre-scanned WSI images along with the NGS data, or WSI images of tissue of ovarian cancer patients (module A), are first stored in the computer readable medium (101). These are then subjected to the quality check (module B), and the WSIs which pass the QC check are stored in the computer readable medium (101). These images along with the corresponding NGS data are used to train the AI model (module C). The trained AI model is stored in the computer readable medium (101). For the validation of the AI model, another set of pre-scanned WSI images or WSI images of tissue of ovarian cancer patients (module A) are first stored in the computer readable medium.
  • the scanned WSI image or WSI image of tissue of the ovarian cancer patient (module D) is first stored in the computer readable medium. It is then subjected to the QC check protocol and, if it passes the protocol, the WSI image is stored in the computer readable medium. This is then subjected to the biomarker prediction protocol (module E), and the AI model selects / rejects the ovarian cancer patient for the PARP inhibitor treatment based on the aggregated probability and the heatmaps generated (module F).
  • The ResNet-50 model was trained using 58 tissue slides obtained from the combined cohorts of the TCGA and Ambry Genetics databases.
  • HRD status for each was extracted based on the NGS analysis carried out for 20 genes viz., ARID1A, ATM, ATRX, BAP1, BARD1, BLM, BRCA1, BRCA2, BRIP1, CHEK1, CHEK2, FANCA, FANCC, FANCD2, MRE11, NBN, PALB2, RAD50, RAD51 and RAD51B, and for samples from Ambry Genetics an 11-gene status viz., ATM, BARD1, BRCA1, BRCA2, BRIP1, CHEK2, MRE11A, NBN, PALB2, RAD51C and RAD51D was adapted.
  • the learning rate was set at 0.001
  • the batch size was set at 1024 tiles
  • the cross-entropy loss between the predicted output of the model and the original label of every individual tile was calculated.
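The per-tile cross-entropy described here is the negative log of the probability the model assigns to the tile's true class. The sketch below uses the standard definition; the function names, the class-index convention (0 for HRD, 1 for HRP), and the clamping constant `eps` are assumptions for illustration.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], true_index: int, eps: float = 1e-12) -> float:
    """Cross-entropy loss for one tile given predicted class probabilities."""
    p = min(max(probs[true_index], eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p)


def mean_cross_entropy(batch_probs: Sequence[Sequence[float]],
                       batch_labels: Sequence[int]) -> float:
    """Average the per-tile losses over a batch."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, batch_labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical predictions for two tiles; index 0 = HRD, index 1 = HRP.
    print(mean_cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1]))
```

A confident correct prediction contributes a loss near zero, while a confident wrong one contributes a large loss, which is what drives the weight updates during training.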
  • the model (e.g., AI model 1121) was trained as described earlier.
  • the value of loss function decreased with increasing iterations as summarized below.
  • the computer system 1600 has one or more processing units comprising CPU(s) 1608a1, 1608a2, 1608a3, etc. (collectively or generically referred to as processor(s) 1608). It may optionally comprise GPU(s) 1608b1, 1608b2, 1608b3, etc.
  • the processors 1608 can be a single-core processor, multi-core processor, computing cluster, or any number of other configurations.
  • the processors 1608, also referred to as processing circuits, are coupled via a system bus 1611 to a system memory 1605 and various other components.
  • the system memory 1605 can include a read only memory (ROM) 1607 and a random-access memory (RAM) 1606.
  • the ROM 1607 is coupled to the system bus 1611 and may include a basic input/output system (BIOS) or its successors like Unified Extensible Firmware Interface (UEFI), which controls certain basic functions of the computer system 1600.
  • BIOS basic input/output system
  • UEFI Unified Extensible Firmware Interface
  • the RAM is read-write memory coupled to the system bus 1611 for use by the processors 1608.
  • the system memory 1605 provides temporary memory space for operations of said instructions during operation.
  • the system memory 1605 can include random access memory (RAM), read only memory, flash memory, or any other suitable memory systems.
  • the computer system 1600 comprises an input/output (I/O) adapter 1604 and a communications adapter 1609 coupled to the system bus 1611.
  • the I/O adapter 1604 may be a small computer system interface (SCSI) adapter that communicates with a hard disk 1603 and/or any other similar component.
  • SCSI small computer system interface
  • the I/O adapter 1604 and the hard disk 1603 are collectively referred to herein as computer readable medium 1601.
  • Software 1602 for execution on the computer system 1600 may be stored in the computer readable medium 1601. This is an example of a tangible storage medium readable by the processors 1608, where the software 1602 is stored as instructions for execution by the processors 1608 to cause the computer system 1600 to operate, such as is described herein with respect to the various figures. Examples of computer program product and the execution of such instruction is discussed herein in more detail.
  • the communications adapter 1609 interconnects the system bus 1611 with a network 1610, which may be an outside network, enabling the computer system 1600 to communicate with other such systems.
  • a portion of the system memory 1605 and the computer readable medium 1601 collectively store an operating system, which may be any appropriate operating system to coordinate the functions of the various components shown in figure 16.
  • Additional input/output devices are shown as connected to the system bus 1611 via a display adapter 1618 and an interface adapter 1612.
  • the adapters 1604, 1609, 1618, and 1612 may be connected to one or more I/O buses that are connected to the system bus 1611 via an intermediate bus bridge (not shown).
  • a display 1617 e.g., a screen or a display monitor
  • the display adapter 1618 which may include a graphics controller to improve the performance of graphics intensive applications and a video controller.
  • a keyboard 1613, a mouse 1614, a speaker 1615, a microphone 1616, etc. can be interconnected to the system bus 1611 via the interface adapter 1612, which may include, for example, a Super I/O chip integrating multiple device adapters into a single integrated circuit.
  • Suitable I/O buses for connecting peripheral devices such as hard disk controllers, network adapters, and graphics adapters typically include common protocols, such as the Peripheral Component Interconnect (PCI) and the Peripheral Component Interconnect Express (PCIe).
  • the computer system 1600 includes processing capability in the form of the processors 1608, storage capability including the system memory 1605 and computer readable medium 1601, input means such as the keyboard 1613, the mouse 1614, and the microphone 1616, and output capability including the speaker 1615 and the display 1617.
  • the communications adapter 1609 can transmit data using any suitable interface or protocol, such as the internet small computer system interface, among others.
  • the network 1610 may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, among others.
  • An external computing device may connect to the computer system 1600 through the network 1610.
  • an external computing device may be an external webserver or a cloud computing node.
  • The block diagram of figure 16 is not intended to indicate that the computer system 1600 is to include all of the components shown in figure 16. Rather, the computer system 1600 can include any appropriate fewer or additional components not illustrated in figure 16 (e.g., additional memory components, embedded controllers, modules, additional network interfaces, etc.). Further, the embodiments described herein with respect to computer system 1600 may be implemented with any appropriate logic, wherein the logic, as referred to herein, can include any suitable hardware (e.g., a processor, an embedded controller, or an application specific integrated circuit, among others), software (e.g., an application, among others), firmware, or any suitable combination of hardware, software, and firmware, in various embodiments.
  • One or more embodiments described herein can utilize machine learning techniques to perform tasks, such as classifying a feature of interest. More specifically, one or more embodiments described herein can incorporate and utilize rules-based decision making and artificial intelligence (AI) reasoning to accomplish the various operations described herein, namely classifying a feature of interest.
  • the phrase “machine learning” broadly describes a function of electronic systems that learn from data.
  • a machine learning system, engine, or module can include a trainable machine learning algorithm that can be trained, such as in an external cloud environment, to learn functional relationships between inputs and outputs, and the resulting model (sometimes referred to as a “trained neural network,” “trained model,” “a trained classifier,” and/or “trained machine learning model”) can be used for classifying a feature of interest.
  • machine learning functionality can be implemented using an Artificial Neural Network (ANN) having the capability to be trained to perform a function.
  • ANNs are a family of statistical learning models inspired by the biological neural networks of animals, and in particular the brain. ANNs can be used to estimate or approximate systems and functions that depend on a large number of inputs.
  • Convolutional Neural Networks are a class of deep, feed forward ANNs that are particularly useful at tasks such as, but not limited to analyzing visual imagery and natural language processing (NLP).
  • RNN Recurrent Neural Networks
  • Other types of neural networks are also known and can be used in accordance with one or more embodiments described herein.

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present disclosure relates to a method for selection of ovarian cancer patients for treatment with PARP inhibitors. This involves 1) preparing whole slide images of tissues of the ovarian cancer patient to be treated; 2) training and validating an AI model built on ResNet-50 to recognize morphological features from tiles of hematoxylin and eosin stained whole slide images of tissues featuring the presence of homologous recombination deficiency as annotated by next generation sequencing; 3) using the model to extract morphological features from the tiles of hematoxylin and eosin stained whole slide images of tissues of ovarian cancer patients, generating tile level predictions, aggregating tile level predictions to generate a probability score and selecting the patient for treatment with PARP inhibitors; and 4) administering PARP inhibitor treatment to the patient selected based on the probability score reaching a predetermined threshold. Computer systems to implement the method are also disclosed.

Description

“A METHOD FOR SELECTING OVARIAN CANCER PATIENTS FOR PARP INHIBITOR TREATMENT”
FIELD OF THE INVENTION:
The present disclosure relates to a method for the selection of ovarian cancer patients for treatment with PARP inhibitors. The method uses artificial intelligence and machine learning aided analysis of histopathology images of ovarian cancer patients. More particularly, it relates to extracting the morphological and architectural features from hematoxylin and eosin (H and E) stained histopathological image of the tissue of the ovarian cancer patient, relating the said extracted morphological and architectural features to the images in the database consisting of morphological and architectural features of H and E stained histopathological images of the tissues of the ovarian cancer patients wherein cancerous tissue showed the presence of homologous recombination deficiency / homologous recombination proficiency as evidenced by next generation sequencing and quantifying a probability score to select the patient for PARP inhibitor treatment. In response to the probability score meeting a threshold, the PARP inhibitor treatment is administered to the selected patient.
BACKGROUND OF THE INVENTION:
Poly (ADP-ribose) polymerase (PARP) nuclear enzyme was discovered about 60 years ago. Its role in DNA damage repair was recognized about 40 years ago. First clinical trial of the PARP inhibitor was conducted a decade ago (OncoTargets and Therapy 15, 165-180 (2022)).
Poly (ADP-ribose) polymerase (PARP) is a family of enzymes which participate in various cellular processes by covalently adding poly (ADP-ribose) chains onto target molecules (J Clin Med. 8 (4), 435 (2019)). The PARP family contains 17 enzymes that combine several (poly) units of ADP-ribose in a chain and transfer them to the target proteins. PARP inhibitors are being extensively used for the treatment of various types of cancers such as ovarian cancer, breast cancer, pancreatic cancer and prostate cancer. Amongst the PARP enzymes investigated, PARP1 is the most extensively studied. It catalyzes the synthesis of poly (ADP-ribose) (PAR), which links covalently to histones, other proteins, or PARP itself in response to a variety of signals, especially DNA damage. PARP inhibitor design is based on exploiting synthetic lethality (SL) interaction.
Two publications in Nature showed that human cancer cells with BRCA mutations are vulnerable to PARP inhibitors (Nature 434, 913-917 (2005)). It was later shown that PARP inhibitors killed BRCA2-deficient cells at doses that were nontoxic to normal cells (Annals of Oncology 22, 268-279 (2011), Nature 434, 917-921 (2005)) and that BRCA2-deficient cells were 90 times more sensitive to PARP inhibition than wild-type cells (Chin J Cancer. 30, (7), 463-471 (2011)). PARP inhibition was thrice as potent as cisplatin cytotoxicity in BRCA-deficient cells. PARP inhibitor Ku0058684 inhibited tumor formation in mice injected with BRCA2-deficient cells but not normal cells (Nature 434, 917-921 (2005)).
PARP inhibitors have been approved in BRCA1/2-mutant cancers such as ovarian, breast, pancreatic, and prostate cancers. Olaparib, rucaparib, niraparib, and talazoparib are approved by the USFDA. More than 35 PARP inhibitor drugs are in clinical trials. PARP inhibitors are also effective in other BRCA1/2-related cancers such as breast, prostate and pancreatic cancers.
PARP inhibitor market is currently estimated at $ 5.5 bn and expected to grow to $ 8.43 bn by 2026. This is attributed to the increase in cancer incidences, increasing demand for better treatment options, and focus on targeted treatment. (Exp Hematol Oncol 8, 29 (2019))
While HRD is known to be caused by BRCA1/2 mutations, not all patients with mutations in BRCA1/2 responded to PARP inhibitors. This was attributed to the development of resistance and could result from secondary mutations in BRCA1/2, and depletion of HR compensatory repair pathways such as the non-homologous end joining pathway (Cancers 12, 1607 (2020); Frontiers in Oncology 12, 1-28 (2022)). HR restoration leads to resistance to PARP inhibitors even in cancers with BRCA1/2 mutations or hypermethylation (Cancers 14 (6), 1420 (2022)). The mechanism of resistance could differ depending on the type of “BRCAness” gene involved. These are discussed by Bunting, et al. (Cell 141, (2), 243-54 (2010)); Patch et al. (Nature 521, 489-494 (2015)); Bouwman, et al. (Nat Struct Mol Biol 17, 688-695 (2010)).
Initially it was found that PARP inhibitors could effectively kill BRCA1/2-mutated tumor cells (N Engl J Med. 361, 123-34 (2009)). Later, it was noticed that some non-BRCA1/2-mutated HRD tumors were also sensitive to PARP inhibitor treatment (Cancer Res. 72, 5675-82 (2012); Nature 434, 917-21 (2005)), indicating that PARP inhibitor treatment would benefit a broader category of patients.
Talazoparib has been recently shown to shrink the tumors of breast cancer patients with mutations in the PALB2 gene indicating a potential treatment option. (Nature Cancer 3, 1181-1191 (2022)). PTEN mutated cells / loss has also been shown to induce sensitivity to PARP inhibitors. (EMBO Mol Med. 1, 315-322 (2009), Nat Rev Clin Oncol. 8, (5), 302-306 (2011))
The concept of “BRCAness” was introduced to describe the clinical and biological features, which include similar histomorphological features as well as similar immune phenotypic profiles, in breast cancers as well as ovarian cancers (Nature Reviews Cancer. 4, 814-819 (2004), Prostate 74, 70-89 (2014)).
To explore new avenues for PARP inhibitors, newer methods to detect HRD and sensitivity to PARP inhibitors need to be developed (Clin Cancer Res 22, (23) 5651-60 (2016), J Pathol 244, (5) 586-97 (2018), J Natl Cancer Inst 110, (7) 704-13 (2018)).
Next generation sequencing (NGS) allows a large number of DNA strands to be sequenced simultaneously and enables high-throughput testing (Genome Biology 10 R32 (2009)). NGS has been practiced by oncologists to personalize treatments for cancer patients.
The PARP Inhibitor testing market is segmented on the basis of product, application, and end user. This includes kits, instruments, and assays. The applications include breast cancer, ovarian cancer, pancreatic cancer, prostate cancer and melanoma. A wide range of assays, referred to as ‘HRD tests’, have been developed to identify which cancers, apart from those resulting from BRCA mutation, are likely to be associated with HRD. The tests fall into three categories: (i) HRR pathway related genes that identify specific causes of HRD, (ii) genomic ‘scars’ or mutational signatures that measure the patterns of somatic mutations that accumulate in HRD cancers irrespective of the underlying defect, and (iii) functional assays that have the potential to provide a real time read out of HRD or HRP (Annals of oncology 31, (12) 1606-1622 (2020)).
Myriad Genetics test uses sequencing to find BRCA mutations and produces a 'genomic instability' score related to DNA damage. However, the threshold at which a score is indicative of HRD is not unequivocally defined. The tests also falsely identify many patients as having no mutations, and hence cannot precisely predict if a patient would benefit from treatment with PARP inhibitors.
The MyChoice HRD test from Myriad and Foundation Focus CDxBRCA by Foundation Medicine are based on genomic scar assays. The tests also detect pathogenic variants in HRR genes including BRCA1 and BRCA2. Identification of patients who will respond to PARP inhibitor therapy is still uncertain due to the lack of a unifying morphological phenotype (Endocrine-Related Cancer 23, R267-R285 (2016); US 9388472 B2, US 10400287 B2 and WO 2021119311 A1).
Myriad Genetics’ BRACAnalysis CDx is for treatment in patients with gBRCAm ovarian cancer and metastatic breast cancer. BRACAnalysis CDx is a diagnostic test which utilizes sequencing and deletion / duplication analysis to identify germline mutated BRCA1/2 genes (Annals of Oncology 32, (12) 1582-1589 (2021)).
Foundation Medicine’s T5 NGS assay was used to assess genomic LOH when predicting response of patients with recurrent ovarian carcinoma to rucaparib (Expert Review of Molecular Diagnostics 20, (3), 285-292 (2020)). Foundation One CDx, is a comprehensive genomic profiling test for all solid tumors, capable of detecting protein-coding mutations, copy number alterations, selected promoter mutations and structural rearrangements in 324 cancer-associated genes (PLoS ONE 17(3): e0264138 (2022)).
MyChoice™ HR deficiency (HRD) companion diagnostic test (Myriad®) measures three modes of HRD, including loss of heterozygosity, telomeric allelic imbalance, and large-scale state transitions in cancer cells. The HRD score is used to indicate the inability of cancer cells to repair DNA damage and may reflect tumor sensitivity to PARP inhibitors.
While currently available HRD tests are useful for predicting the likely magnitude of benefit from PARP inhibitors, better biomarkers are needed to identify current homologous recombination proficiency status and stratify high-grade serous carcinoma (HGSC) management (Annals of oncology 31, (12) 1606-1622 (2020)). Most HRD detection methods are based on genome wide enumeration of scarring events and require deep genome sequence profiles (> 30x). Genomic integrity index (GII), a convolutional neural network that leverages features from low pass (1x) whole genome sequencing data to distinguish between HRD positive and negative samples, was introduced by Andre et al (doi.org/10.1101/2022.07.06.498851). This used the deep learning approach for improved detection of homologous recombination deficiency from shallow genomic profiles in the case of ovarian and breast cancer patients.
In spite of the above-mentioned advantages, the usage of the NGS technique has certain limitations. Invasive tumor tissue biopsies are difficult to perform on patients with advanced cancers. Also, they do not reflect molecular heterogeneity across metastatic lesions and may have to be supplemented by methods such as multiplex ligation-dependent probe amplification (MLPA) to ensure that the full spectrum of genetic aberrations is accounted for (Nature 521, 489-494 (2015)). NGS platforms require substantial bioinformatic input for the analysis and interpretation of sequencing data (British Journal of Cancer 113, Suppl 1, S17-S21 (2015); Endocrine-Related Cancer 23, R267-R285 (2016)).
Current tests are not only expensive but also involve significant uncertainty.
With the increasing use of PARP inhibitors in clinical practice and the emergence of resistance to these agents, there is a growing need to identify predictive markers of response to PARP inhibitors. However, the currently available biomarkers to infer the presence of HRD, including multigene panel testing, genomic scar and functional assays, are not able to identify with certainty the patients who will respond to therapy. This is because of the lack of a unifying morphological phenotype, the varied components of the repair pathways, varied mechanisms of drug resistance for each gene and the fact that not all genomic alterations would lead to HRD phenotypes. The development of predictive biomarkers and diagnostic assays that will enable reliable patient selection remains an important area of research. At present, there is no reliable method based on analysis of morphological phenotypes to select patients suitable for PARP inhibitor therapy.
There is also a need to identify and validate new biomarkers so that more patients who are likely to respond to PARP inhibitor therapies may be identified. Any such biomarker needs to be analytically validated. Also, there is a clinical need for a biomarker to select patients for the PARP therapy.
The haematoxylin and eosin (H and E) stain has been a cornerstone of cancer diagnosis for over a century. H and E staining offers a simple yet effective method for visualizing cellular morphology and tissue architecture. The distinctive blue nuclei and pink cytoplasm provide pathologists with crucial clues for distinguishing cancerous from healthy cells. Despite its apparent simplicity, H and E staining remains a powerful tool, and advancements in imaging technologies are continuously expanding its diagnostic potential. The advancements in computational power and image analysis have enabled robust computer-assisted approaches in radiology and histopathology using digitized slides. Computer-assisted diagnosis (CAD) algorithms, akin to those in radiology, are emerging in histopathology to complement pathologists' opinions. Morphological features like cell size, density, and nuclear shape, which aid in cancer detection and grading, have been quantified for the purpose. This objective data complements the pathologist's subjective assessment, leading to more precise diagnoses (Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B. Histopathological image analysis: a review. IEEE Rev Biomed Eng. 2, 147-71 (2009)).
CNNs (Convolutional Neural Networks) have achieved great success in various computer vision tasks such as image classification, object detection, and segmentation. CNNs automatically learn discriminative features from raw pixels directly without relying on manual feature engineering. CNNs employ convolutional filters that are inspired by the visual cortex of animals to extract features at multiple levels of abstraction. CNNs have revolutionized several applications including face recognition, autonomous driving, and medical image analysis. The adoption of CNNs is primarily driven by their weight-sharing feature, a key advantage that reduces the number of trainable parameters. The concurrent learning of feature extraction layers and classification layers in CNNs contributes to a highly organized model output, fostering effective learning by relying on extracted features. Additionally, CNNs facilitate the implementation of large-scale networks more seamlessly than other neural networks, underscoring their scalability and practicality for handling complex tasks. Over the past decade, several CNN architectures have been introduced. The architecture of a model plays a pivotal role in enhancing the performance of various applications. Noteworthy modifications have been implemented in CNN architecture from 1989 to the present, encompassing structural reformulation, regularization, and parameter optimizations, among others. It is essential to recognize that the significant improvement in CNN performance primarily stems from the reorganization of processing units and the innovation of novel blocks. A key focus in advancing CNN architectures has been on enhancing network depth. (Alzubaidi, L., Zhang, J., Humaidi, A.J. et al. J Big Data 8, 53 (2021); Khan A, Sohail A, Zahoora U, Qureshi A.S. Artif Intell Rev. 53 (8) 5455-516 (2020)).
When working with deep convolutional neural networks to solve a problem related to computer vision, machine learning experts engage in stacking more layers. These additional layers help solve complex problems more efficiently as the different layers could be trained for varying tasks to get highly accurate results. While the number of stacked layers can enrich the features of the model, a deeper network can result in the issue of degradation. In other words, as the number of layers of the neural network increases, the accuracy levels may get saturated and slowly degrade after a point. This degradation is not a result of over-fitting but may be due to other reasons such as the vanishing gradient problem, exploding gradients, optimization difficulties, etc. Traditional deep networks often face difficulties in propagating gradients through numerous layers during back-propagation, which is known as the vanishing gradient problem. In contrast to the vanishing gradient problem, the exploding gradient problem occurs when gradients become too large during back-propagation, leading to numerical instability and challenges in convergence (M. Liu, L. Chen, X. Du, L. Jin and M. Shang, IEEE Transactions on Neural Networks and Learning Systems, 34, (4) 2156-2168 (2023)).
ResNet was introduced to facilitate training deeper neural networks, redefining layers as learning residual functions relative to their inputs. Empirical evidence demonstrates the improved optimization and accuracy of these residual networks, reaching depths of up to 152 layers on ImageNet with lower complexity than previous models using residual connections. Residual connections facilitate smoother flow of gradients during back-propagation, enabling effective training of deeper networks. Moreover, residual connections aid in mitigating the risk of the exploding gradient problem. In situations where gradients become extremely large during training, the network may experience instability. The short-cut connections provide an alternative path for gradient flow, preventing its explosion and contributing to the overall stability of the training process. Additionally, residual connections promote the learning of residual mappings, capturing the fine-grained details or nuances in the data. Instead of requiring each layer to learn the complete mapping, residual connections allow layers to focus on learning the residual or the difference between the input and the desired output. This eases the optimization process and enhances the network's ability to capture intricate features, especially in more complex tasks (Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) DOI: 10.1109/CVPR.2016.90 (2016)).
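The residual formulation described above can be written compactly. The following is a sketch in notation chosen here for illustration (following the cited ResNet paper), where x is the input to a block, F(x; W) is the residual function learned by the block's stacked layers with weights W, y is the block output, and I is the identity:

$$
y = F(x; W) + x, \qquad \frac{\partial y}{\partial x} = \frac{\partial F(x; W)}{\partial x} + I
$$

The identity term I gives the gradient a direct path through the short-cut connection, so the back-propagated signal does not depend solely on the product of the layer-wise derivatives of F.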
The ResNet-50 model has been modified in the past for various applications. Pneumonia, an inflammatory lung disease primarily caused by bacteria, requires early and accurate diagnosis to reduce morbidity and mortality. Lung x-rays are commonly used for diagnosis, but nonspecific images and a lack of qualified personnel may lead to errors. A new model based on ResNet-50 achieved a maximum accuracy of 97.22%. This outperformed other architectures, showcasing the potential for early and accurate pneumonia diagnosis. Such models could significantly improve patient treatment management by swiftly determining the appropriate course of action (Çınar, A., Yıldırım, M., Eroğlu, Y. Traitement du Signal, 38, 1, 165-173 (2021)).
Diabetic retinopathy (DR) poses a threat to vision, requiring early detection for timely intervention. The predictive accuracy of DR was enhanced using the ResNet-50 model with visualization and preprocessing techniques. The ResNet-50 model was compared with other common CNN models, revealing an overfitting phenomenon in the latter. The revised ResNet-50 demonstrated superior performance, mitigating over-fitting issues, reducing loss values, and minimizing fluctuations. The study proposed a diabetic retinopathy grading system involving a standard operation procedure for image preprocessing and a modified ResNet-50 structure with adaptive learning rates, weight adjustments, and regularization (Lin, CL., Wu, KC. BMC Bioinformatics 24, 157 (2023)).
Breast cancer, a significant threat to women's health, necessitates early detection for effective treatment. The challenge was addressed through a convolutional neural network (CNN)-based model, utilizing transfer learning with the ResNet-50 architecture for automatic classification of histopathology images into malignant and benign tumours. The model achieved state-of-the-art performance with training, validation, and test accuracies of 99.70%, 99.24%, and 99.24%, respectively. Notably, this demonstrated a 0.66% and 0.2% improvement compared to similar recent studies. Average precision, F1 score, and the receiver operating characteristic area also exhibited enhancements, culminating in a reliable, accurate, and consistent CNN model for breast cancer detection based on the ResNet-50 architecture (Behar, N., Shrivastava, M. CMES-Computer Modelling in Engineering and Sciences, 130 (2) 823-839 (2022)).
SUMMARY OF THE INVENTION:
It has now been surprisingly found that an ovarian cancer patient can be selected for PARP inhibitor treatment by analysing cellular features of tumor cells through AI-assisted imaging analysis of a hematoxylin and eosin-stained slide of the said patient. The present invention discloses an AI-based image analysis method for selecting patients who would benefit from PARP inhibitor treatment and administering PARP inhibitor treatment to the selected patients. The AI model has been trained to identify key genomic signatures for a 20 gene panel along with key biomarkers such as genomes with loss of heterozygosity (LOH), telomeric-allelic imbalance (TAI) and large scale transitions (LST) from hematoxylin and eosin-stained whole slide images to identify patients who are likely to benefit from these tests.
This involves 1) preparing whole slide images of the tissues of the ovarian cancer patient, 2) training and validating the AI model using whole slide images annotated with NGS characterization of the tissues of the ovarian cancer patient, 3) using the AI model to select the ovarian cancer patient for PARP inhibitor treatment, and 4) administering PARP inhibitor treatment to the selected ovarian cancer patient.
According to an embodiment, this disclosure relates to a method for selecting the patients suffering from ovarian cancer, for PARP inhibitor treatment.
According to an embodiment of this disclosure, the patient sample for analysis is selected from biopsy samples containing cancer cells and core needle biopsy samples containing cancer cells taken using fine needle aspirates.
According to an embodiment of this disclosure, the samples are formalin fixed and paraffin embedded (FFPE).
According to an embodiment of this disclosure, a thin slice of the tissue from the FFPE block is stained with hematoxylin and eosin stains over a glass slide.
According to an embodiment of this disclosure, a whole slide image (WSI) is created from the above slide using a whole slide scanner.
According to an embodiment of the disclosure, whole slide scanners supplied by Philips, Hamamatsu, Morphle, etc. are suitable for the purpose.
According to an embodiment of the disclosure, the digitized H and E-stained whole slide image is analysed using the AI-assisted image analysis protocol developed herein to select the patient for PARP inhibitor treatment.
According to an embodiment, this disclosure provides a Quality check (QC) that accepts a set of whole slide images. According to an embodiment, this disclosure provides a Quality check that selects the Region of Interest (ROI) pertaining to the tissue regions in the form of a mask.
According to an embodiment, this disclosure provides a QC that bifurcates images based on the specific magnification of 40x to facilitate extraction of regions at higher resolution.
According to an embodiment, this disclosure provides a QC that performs the segmentation which extracts the ROI.
According to an embodiment, this disclosure provides a QC that identifies the specific tissue regions as ROI over the entire digitized whole slide image.
According to an embodiment of the disclosure, the preprocessing module comprises a quality check, which is an automated program, comprising a set of instructions, to read, process and output the target regions of interest, which include tissue objects like nuclei, stroma, tumor regions etc., from the digitized H and E-stained whole slide image stored on a computer readable medium.
According to an embodiment of the disclosure, the QC extracts the real tissue regions of interest by excluding artefacts, staining/sectioning errors, blurry regions, and unwanted biological substances such as fatty tissue, small objects, and white background fatty regions.
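The exclusion of white background and small objects described above can be illustrated with a minimal sketch. The disclosure does not state how the QC is implemented; the brightness threshold, the 4-connectivity rule, and the function names `tissue_mask` and `remove_small_objects` are assumptions made here for illustration only.

```python
from collections import deque

def tissue_mask(rgb, white_threshold=220):
    """Mark a pixel as tissue when it is not near-white background.

    rgb is a 2-D list of (R, G, B) tuples; white_threshold is an assumed cutoff.
    """
    return [[not all(c >= white_threshold for c in px) for px in row] for row in rgb]

def remove_small_objects(mask, min_size):
    """Drop 4-connected tissue components smaller than min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep only components that are large enough to be real tissue.
            if len(component) >= min_size:
                for cy, cx in component:
                    out[cy][cx] = True
    return out
```

Detection of blur, staining errors, or fatty tissue would need additional criteria that this sketch does not attempt.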
According to an embodiment of the disclosure, patches of image in a fixed size (256x256) from the specific magnification levels of 40x are extracted from the pyramidal structure of the image levels in the digitized H and E-stained whole slide image.
According to an embodiment of the disclosure, the region of interest identified is split into 256x256 pixel tiles to enable training and prediction at a level which can identify individual cells and tissue micro environments.
According to an embodiment of the disclosure, the tiles are extracted at the highest magnification level of microns per pixel (MPP) equal to 0.25.
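The tiling step above can be sketched as follows. The tile size (256x256) and the resolution (0.25 microns per pixel) come from the text; the non-overlapping grid, the discarding of partial edge tiles, and the function names are assumptions for illustration.

```python
TILE = 256   # tile side in pixels, as stated in the text
MPP = 0.25   # microns per pixel at the 40x level, as stated in the text

def tile_origins(width, height, tile=TILE):
    """Top-left corners of non-overlapping tiles that fit fully inside the region."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def tile_side_microns(tile=TILE, mpp=MPP):
    """Physical side length of one tile in microns."""
    return tile * mpp
```

Under these figures each tile covers 256 x 0.25 = 64 microns per side, a field of view small enough to resolve individual cells and their immediate surroundings.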
According to an embodiment, this disclosure provides the image analysis method which has a training stage.
According to an embodiment, this disclosure provides the image analysis method which has a pattern recognition stage. According to an embodiment of this disclosure, the image analysis method incorporates a supervised training model which enables the artificial intelligence to perform pattern recognition step autonomously.
According to an embodiment of the disclosure, the WSI is subjected to image analysis using a trained AI model.
According to an embodiment of the disclosure, the trained AI model is developed to handle prediction/computation of biomarkers and probabilities of existence, such as HRD or HRP.
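The abstract describes aggregating tile-level predictions into a probability score and selecting the patient when that score reaches a predetermined threshold. A minimal sketch is given below; the use of the arithmetic mean as the aggregation rule, the default threshold of 0.5, and the function names are assumptions, since the disclosure specifies neither the aggregation function nor the threshold value.

```python
from statistics import mean

def slide_probability(tile_probs):
    """Aggregate per-tile HRD probabilities into one slide-level score (assumed: mean)."""
    if not tile_probs:
        raise ValueError("at least one tile prediction is required")
    return mean(tile_probs)

def select_for_parp_inhibitor(tile_probs, threshold=0.5):
    """Return True when the slide-level score reaches the (assumed) threshold."""
    return slide_probability(tile_probs) >= threshold
```

Other aggregation rules, such as a median or the fraction of tiles above a cutoff, would fit the same interface.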
According to an embodiment of the disclosure, the Al model for the above biomarkers is developed by training a pretrained Al model using digitized H and E- stained whole slide images, by stratifying the samples based on the Next- Generation Sequencing (NGS) / pathologist annotations as a ground truth.
According to the embodiment of the disclosure, the model is trained on the H and E-stained whole slide image (WSI) selected from a database, wherein there is a balance of HRD and HRP WSIs.
According to an embodiment of the disclosure, designing of pre-trained convolutional neural network (CNN) involves designing the multiple layers of the neurons, which represent the dot product of input pixels with the predefined weights.
According to an embodiment of the disclosure, the pre-trained convolutional neural network (CNN) model is selected from ResNet-34, ResNet-50 and ResNet-101.
According to an embodiment of the disclosure, the predefined weights, which are refined, in plurality, are obtained from a ResNet-50 pre-trained model.
According to the embodiment of the disclosure, the first two layers of the neural network are frozen, and the rest n-2 layers are trainable.
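The freezing scheme of this embodiment can be sketched as follows. This is a minimal, framework-free illustration only: the `Layer` class, its `trainable` flag, the layer names and the choice of n = 5 are hypothetical and are not taken from the disclosure. In a deep learning framework the same effect is usually obtained by disabling gradient updates for the frozen layers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for one layer of the network."""
    name: str
    trainable: bool = True


def freeze_first_layers(layers, n_frozen=2):
    """Mark the first n_frozen layers as frozen and the remaining layers as trainable."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= n_frozen
    return layers


# Hypothetical network with n = 5 layers: 2 frozen, n - 2 = 3 trainable.
layers = [Layer(f"layer_{i}") for i in range(1, 6)]
freeze_first_layers(layers)
trainable_names = [layer.name for layer in layers if layer.trainable]
print(trainable_names)
```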
According to the embodiment of the disclosure, the weights of the CNN architecture are further optimized by tuning several hyper-parameters like learning rate, batch size, loss function and drop-outs to perform the initial training.
According to the embodiment of the disclosure, the CNN based architecture is retrained on the tiles generated with the labels obtained from the Next-Generation Sequencing (NGS) as ground truth using the weights obtained from initial training. According to the embodiment of the disclosure, the training process is continued until the loss value is in the range 0.0001-0.0002.
According to the embodiment of the disclosure, the model is trained to recognize the higher order features specific to the tissue microenvironments like nuclei, stroma, tumor cells etc.
According to the embodiment of the disclosure, the model so trained is validated on the H and E-stained whole slide image (WSI) selected from a database, wherein there is a balance of HRD and HRP WSIs.
According to the embodiment of the disclosure, probabilities are computed at each tile level indicating the class to which a tile belongs viz., HRD or HRP.
According to the embodiment of the disclosure, the tile level probabilities are aggregated to select the patient for PARP inhibitor treatment.
According to the embodiment of the disclosure, the selection of patient for PARP inhibitor treatment is based on patient level probability.
According to the embodiment of the disclosure, a final prediction report along with a visualization of probable indication of the predicted biomarker is overlayed on the WSI.
According to an embodiment, this disclosure provides a method of predicting a cancer patient response to PARP inhibitor treatment.
According to an embodiment of the disclosure, the system for selecting an ovarian cancer patient for PARP inhibitor treatment comprises a computer readable medium for data storage.
According to an embodiment of the disclosure, the system for selecting a cancer patient for PARP inhibitor treatment comprises a processor.
According to an embodiment of the disclosure, the processor has a) a dual microprocessor and b) multi-processor architectures.
According to an embodiment of the disclosure, the processor is a CPU (Central processing unit).
According to an embodiment of the disclosure, the processor is a CPU and a GPU (Graphic processing unit). According to an embodiment of the disclosure, the processor is a CPU and a TPU (Tensor processing unit).
According to an embodiment of the disclosure, image acquisition system comprises a whole slide digital scanner, connected to a processor and a computer readable medium to acquire the digitized formats of the whole glass slides of the H and E- stained tissue samples and store them into an image data format in a computer readable medium.
According to an embodiment, the disclosure provides a) a system comprising a computer readable medium for data storage, b) a processor for conducting the quality check on a whole slide image, generating tiles from the whole slide image passing the quality check , c) a device to visualize the results of the method for selecting the said patient for PARP inhibitor treatment, and d) an interface to execute transfer of the data between the processor, the computer readable medium and the device specified in c).
According to an embodiment, this disclosure provides a system that comprises a scanner that creates whole slide image from the tissue biopsy samples containing cancer cells, core needle biopsy samples containing cancer cells taken using fine needle aspirates as well as samples obtained by using relevant techniques.
According to an embodiment of the disclosure, digitized H and E-stained whole slide image is represented in a pyramidal structure of different zoom levels, representing the slide at different zoom levels.
According to an embodiment of the disclosure, the various parts of the image acquisition apparatus are connected over internet, which comprises of communication networks such as WAN / LAN, devices such as gateways / routers / switches / bridges and communication protocols such as TCP/IP.
According to an embodiment, this disclosure provides a system that comprises an output module configured to display a recommendation for the use of the PARP inhibitor treatment regimen.
According to an embodiment of the disclosure, the system for selecting a cancer patient for PARP inhibitor treatment comprises the processor, a CPU and GPU; a computer readable medium to store digitized H and E-stained whole slide image and a set of circuits for communication and processing set of instructions stored. According to an embodiment of the disclosure, digitized H and E-stained whole slide image, which are stored as a record in a computer readable medium, are generated from an apparatus for scanning the whole glass slides of the H and E- stained tissue samples.
According to an embodiment of the disclosure, the digitized H and E-stained whole slide image, which are stored as a record in a computer readable medium, are generated from an apparatus for scanning the whole glass slides of the H and E- stained tissue samples; further, the apparatus is connected to a computer readable medium, where the digitized H and E-stained whole slide images are stored.
Computer-readable medium, as used herein, refers to a medium that stores instructions and /or data. A computer-readable medium may take forms, including, but not limited to, non-volatile media, and volatile media. Non-volatile media may include, for example, optical disks, magnetic disks, tapes, and other media. Volatile media may include, for example, semiconductor memories, dynamic memory, and other media. Common forms of a computer -readable medium may include, but are not limited to, a floppy disk, a flexible disk, a hard disk, a magnetic tape, other magnetic medium, an application specific integrated circuit (ASIC), a compact disk (CD), other optical medium, a random-access memory (RAM), a read only memory (ROM), a memory chip or card, a memory stick, and other media from which a computer, a processor or other electronic device can read.
BRIEF DESCRIPTION OF DRAWINGS:
The objectives and advantages of the present invention will become apparent from the following description read in accordance with the accompanying drawings wherein,
Figure 1: AI model-based patient selection for PARP inhibitor treatment;
Figure 2: Tissue slide preparation;
Figure 3: Pyramidal structure of the WSI;
Figure 4: WSI scanning process;
Figure 5: Collection of multi-omics data;
Figure 6: Training the AI model: dataflow;
Figure 7: Using the AI model for prediction: dataflow;
Figure 8: Quality check system;
Figure 9: Sample ROI mask;
Figure 10: Tile generation: Model development;
Figure 11: AI-based model development;
Figure 12: Basic architecture of a CNN;
Figure 13: Tile level prediction;
Figure 14: Patient selection from tile level prediction;
Figure 15: Final output system; and
Figure 16: Block diagram of a typical computer system to select a patient for PARP inhibitor treatment.
DESCRIPTION OF THE INVENTION:
The present disclosure relates to a method of selecting an ovarian cancer patient for PARP inhibitor treatment based on the AI-assisted analysis of the Hematoxylin and Eosin-stained (H and E) image of the tissue sample of the said cancer patient, and to the system for implementing the same. The method comprises training an AI model to recognize the morphological and architectural features of the whole slide image of the tissue of the ovarian cancer patient and relate them to the morphological and architectural features of the whole slide image labeled with the NGS characterization of the tissue of the ovarian cancer patient. The said morphological and architectural features are extracted by the deep learning model starting from the primitive pixel-level features and moving towards higher-level object features comprising, but not limited to, shape, size, color and texture.
A brief overview of the ResNet-50 model is given below, which is followed by the modification carried out in this work.
The ResNet-50 model architecture has a depth of 50 layers and innovative design, incorporating both residual blocks and a bottleneck structure. At the core of ResNet- 50 lies the residual block, a fundamental building block that introduces a shortcut connection, allowing the gradient to flow more efficiently during backpropagation. What distinguishes ResNet-50 is the integration of a bottleneck design within each residual block. This design involves a sequence of three convolutional layers: a 1x1 convolution for dimension reduction, a 3x3 convolution for feature extraction, and another 1x1 convolution for dimension restoration. This bottleneck architecture effectively balances computational efficiency and model depth, enhancing the network's ability to capture intricate features while mitigating the computational load. The network is composed of stacked residual blocks, with an initial layer consisting of standard convolutional and pooling layers to downsample the input. A noteworthy departure from conventional architectures is the utilization of a Global Average Pooling (GAP) layer instead of fully connected layers, which computes the average value of each feature map, leading to a more compact and expressive representation. The architecture concludes with a final fully connected layer with two neurons for classification, employing softmax activation for outputting class probabilities.
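The saving obtained from the bottleneck design can be illustrated with simple arithmetic. The sketch below counts convolution weights (biases ignored) for a 1x1, 3x3, 1x1 bottleneck and compares it with two plain 3x3 convolutions at the same width; it also shows what a Global Average Pooling layer computes. The channel counts (256 and 64) and the function names are assumptions chosen for illustration and are not taken from the disclosure.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a 2D convolution, ignoring biases."""
    return in_channels * out_channels * kernel_size * kernel_size


def bottleneck_weights(channels, reduced):
    """1x1 reduction, 3x3 feature extraction, 1x1 restoration."""
    return (conv_weights(channels, reduced, 1)
            + conv_weights(reduced, reduced, 3)
            + conv_weights(reduced, channels, 1))


def plain_block_weights(channels):
    """Two 3x3 convolutions operating at full width."""
    return 2 * conv_weights(channels, channels, 3)


def global_average_pool(feature_maps):
    """Reduce each 2D feature map to its mean value."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


bottleneck = bottleneck_weights(channels=256, reduced=64)
plain = plain_block_weights(channels=256)
pooled = global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]])
print(bottleneck, plain, pooled)
```

With these illustrative widths the bottleneck uses roughly one seventeenth of the weights of the plain block, which is the trade-off between depth and computational load described above.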
The training of pretrained model (Resnet-50) starts with selecting the training parameters such as the learning rate, optimization, batch size and loss function and other hyperparameters such as dropouts and regularization.
The choice of an appropriate loss function is a pivotal decision in training the ResNet-50 model, especially for image classification tasks where accuracy is a primary concern. Categorical cross-entropy stands out as a widely adopted loss function for classification, effectively measuring the dissimilarity between predicted class probabilities and actual class labels. Its formulation ensures that the model optimally adjusts its weights to minimize the divergence between predicted and ground truth distributions.
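As an illustration of how categorical cross-entropy measures the dissimilarity between predicted probabilities and labels, a minimal sketch in plain Python is given below. The one-hot class order [HRD, HRP], the example probabilities and the clipping constant `eps` are assumptions made for this example only.

```python
import math


def categorical_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean cross-entropy over a batch.

    labels: one-hot lists, one per sample.
    probabilities: predicted class distributions, one per sample.
    eps: hypothetical clipping constant to avoid log(0).
    """
    total = 0.0
    for y, p in zip(labels, probabilities):
        total -= sum(y_c * math.log(max(p_c, eps)) for y_c, p_c in zip(y, p))
    return total / len(labels)


labels = [[1, 0], [0, 1]]                  # assumed order: [HRD, HRP]
confident = [[0.99, 0.01], [0.02, 0.98]]   # predictions close to the labels
uncertain = [[0.5, 0.5], [0.5, 0.5]]       # uninformative predictions

loss_confident = categorical_cross_entropy(labels, confident)
loss_uncertain = categorical_cross_entropy(labels, uncertain)
print(loss_confident, loss_uncertain)
```

The loss is small when the predicted distribution agrees with the label and equals ln 2 for a two-class prediction that carries no information.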
The optimization process involves selecting an algorithm to minimize the chosen loss and iteratively update the model's parameters. Adam, an adaptive optimization algorithm that combines ideas from momentum and RMSprop, is frequently favored as a part of the model training, due to its effectiveness in handling sparse gradients and noisy data.
The learning rate is a critical hyperparameter in the training process of the ResNet- 50 model. It represents the step size at which the optimizer adjusts the model parameters during each iteration of training. Choosing an appropriate learning rate is crucial for achieving efficient convergence and optimal performance. Setting the learning rate too high may lead to overshooting the minimum of the loss function, causing the optimization process to oscillate or diverge. Conversely, setting it too low can result in slow convergence, where the model takes small steps toward the optimal parameters, prolonging the training time. In our case we have used the learning rate as 0.0001.
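The effect of the step size can be seen on a toy problem. The sketch below applies plain gradient descent to f(x) = x², whose gradient is 2x; the function, the starting point and the number of steps are assumptions chosen for illustration and have no connection to the ResNet-50 training itself.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(x) = x**2 (gradient 2x) with a fixed learning rate."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x


too_high = gradient_descent(1.1)     # each step overshoots; |x| grows
moderate = gradient_descent(0.1)     # converges towards the minimum at 0
too_low = gradient_descent(0.0001)   # barely moves in 50 steps
print(too_high, moderate, too_low)
```

That a rate of 0.0001 is slow on this toy function says nothing about its suitability for fine-tuning a deep network, where small rates are commonly used to avoid disturbing the pretrained weights.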
The batch size (1024 tiles) is a crucial hyperparameter in the training process. It defines the number of training samples utilized in one iteration of gradient descent, impacting both the efficiency of the optimization process and the model's ability to generalize to unseen data.
Other hyperparameters such as dropout and regularization are techniques that reduce overfitting and thereby improve accuracy on unseen data.
The model development process involves training a deep learning-based CNN architecture (ResNet-50) which is pretrained on over a million natural images and is publicly available in a specific architecture. To train the AI model to extract the morphological and architectural features of the tissue regions over the digitized glass slides, the pretrained model (ResNet-50) is configured to extract the specific morphological and architectural features exhibiting the mutational aspects particular to HRD and HRP. As training data (117), the samples are segregated based on the classes, i.e., HRD/HRP, to which they belong, and on the scoring obtained from a traditional NGS panel. The AI model can then pick up the morphological and architectural features specific to HRD / HRP based on the NGS scoring.
The method of selecting an ovarian cancer patient for PARP inhibitor treatment comprises three aspects: 1) preparing whole slide images of the tissues, 2) training and validating the AI model and 3) using the AI model to select the ovarian cancer patient for PARP inhibitor treatment. The method is described below and illustrated suitably with the figures. The description and the figures do not limit the scope of the invention.
Figure 1 depicts aspects of the AI model-based patient selection for PARP inhibitor treatment. It shows the system comprising computer readable medium (101), collection of multi-omics data (102), quality check system (107), AI model building system (108), whole slide image preparation system (103), biomarker prediction system (111) and final output system (114). Each of these systems accepts training data (117) and the test sample (118) and returns its output to the computer readable medium (101). Each of these is described below.
Preparing whole slide images of the tissues
The tissue sample is collected in the hospital (204) either in the form of surgical open biopsy (SOB) (203) or fine needle aspiration cytology (FNAC) / core needle biopsy (CNB) (205). This is followed by the preparation of the H and E-stained glass slide (206). The FFPE block (202), obtained from a biopsy or surgical specimen, is trimmed to a suitable size using a microtome. The trimmed tissue sections are then placed onto cleaned and labeled glass slides. Once the tissue sections are positioned on the slides, they are immersed in a fixative solution, typically formalin, to preserve the cellular structure. After fixation, the slides are subjected to a series of dehydration steps using aqueous ethyl alcohol solutions containing increasing concentrations of ethyl alcohol. This removes water from the tissue. The tissue is then treated with a solvent such as xylene which renders the tissue transparent. The slides are then stained with Hematoxylin and Eosin stains that highlight specific components of the tissue. Finally, a coverslip is placed over the stained tissue section (see figure 2).
The tissue glass slide prepared as above is subjected to scanning for digitizing. Whole slide image (WSI) scanning is a technique used in digital pathology to convert glass slides into high-resolution digital images. These digital images can be viewed, analyzed, and shared electronically, enabling remote access, collaboration, and computer-aided analysis in pathology workflows. The WSI scanning process typically involves the following steps:
Slide scanning: The prepared tissue slides are placed in a whole slide scanner. The scanner captures images of the entire slide at various magnifications and focal planes, and the images are stored as a pyramidal image in a format selected from .svs, .tiff, .vms, .ndpi and .mrxs. Whole slide scanners such as 3DHistech, Grundium Ocus, Philips Intellisite, Hamamatsu Nanozoomer, Huron Tissuescope, Leica Aperio, Motic Easyscan, Olympus Life science, Optrascan and Morphle Optimus 6 X manufactured by Morphle Inc are suitable for whole slide scanning.
Slide loading: The glass slide is loaded into the scanner, either manually or using an automated slide loader. The slide's barcode or unique identifier may be scanned to link the digital image to patient information and the corresponding glass slide.
Image acquisition: The scanner captures multiple images of the slide by scanning it at various magnifications; these are overlaid to form a pyramidal structure and stored in a format selected from .svs, .tiff, .mrxs and .ndpi. The scanning process may involve moving the slide in an X-Y plane or using a motorized stage to capture images of different regions of interest on the slide.
Autofocus and image stitching: During the scanning process, the scanner's autofocus system is used to adjust the focal plane at each image capture point to ensure optimal focus across the entire slide. Additionally, specialized algorithms stitch together the individual captured images to create a seamless, high-resolution digital representation of the whole slide. These algorithms are a part of the scanner's software. Each scanner manufacturer has proprietary software which enables the operator to achieve the desired result without knowledge of the software.
The whole slide image which is scanned through a scanner at a given magnification is formatted in a pyramidal structure as shown in figure 3, where each level of the pyramid represents a different magnification level. This is formatted into a single file such as .svs, .tiff or .ndpi and transferred to a computer readable medium (e.g., computer readable medium (101) depicted in figure 1) through a communication network. The whole slide scanning is depicted in figure 4.
The whole slide image (506), genomic data (504), transcriptome and clinical data related to the whole slide image obtained from the databases are stored in the computer readable medium. The training data so stored (figure 5) is used to train the deep neural network architecture. The multi-omics data serves as the ground truth to generate the trained deep learning-based AI mathematical model.
The whole slide image is stored in a format that holds multiple zoom levels compressed in a single file. Typical image formats include .svs, .tiff and .ndpi. The ROI mask is stored in .png format. The metadata, which is text data, is stored in either .xlsx or .csv format.
Computer readable medium (601) is used to store all the digitized whole slide images along with the ground truth data obtained from the traditional NGS routine and multi-omics data. The publicly available QC routine ("built and modified from HistoQC", Janowczyk A., Zuo R., Gilmore H., Feldman M., Madabhushi A., JCO Clinical Cancer Informatics, (2019)) is used to perform the quality check on the WSI and is integrated into the main pipeline as a software subroutine. The protocol is summarized in figure 6. The output of the quality check (QC) system (607), in the form of ROI (region of interest) masks, and the output of the tile generation system (609), in the form of small patches of 256x256 pixels extracted from the image at a specified magnification of 40x, are stored in the computer readable medium. The computer readable medium is a virtual representation of multiple non-stationary computer readable memory storage devices distributed over the internet in the form of cloud storage. This enables the user to access the AI-based prediction system (e.g., trained model 610) at their own convenience and from any location.
Computer readable medium (701) also delivers the output of the prediction probabilities for the patient to be selected for the PARP inhibitor treatment, to the heatmap visualizer (717) along with final sample report (figure 7).
The above hardware is controlled by operating systems such as Linux / Windows / MacOS which execute the operation based on the logical timings. The operating system controls the CPU/GPU and the storage in a multiprocessing environment. A typical computer system 1600 is depicted in figure 16, discussed further herein.
The quality check (707) which is also used for identifying the expected magnification of the whole slide image being examined is shown in figure 7.
Since the whole slide image is formatted in a pyramidal structure (figure 3) where each level of pyramid represents different zoom levels, the highest zoom level is tested across the expected magnification level viz., 40 x wherein all the tissue objects of interest are clearly visible for pathological observations. If the whole slide image is at a magnification other than 40 x, it fails to qualify for further processing.
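The magnification check can be sketched as a comparison of the resolution reported for the highest zoom level against the expected value. The sketch assumes, consistent with the embodiments above, that 40x corresponds to about 0.25 microns per pixel (MPP); the tolerance value and the function name are hypothetical and are not specified in the disclosure.

```python
EXPECTED_MPP = 0.25    # microns per pixel at 40x, as stated for the highest level
MPP_TOLERANCE = 0.02   # hypothetical tolerance, not from the disclosure


def passes_magnification_check(mpp, expected=EXPECTED_MPP, tolerance=MPP_TOLERANCE):
    """Return True when the highest-level resolution matches the expected 40x scan."""
    return abs(mpp - expected) <= tolerance


results = {
    "scanned_at_40x": passes_magnification_check(0.2521),
    "scanned_at_20x": passes_magnification_check(0.5),
}
print(results)
```

A slide scanned at 20x (about 0.5 MPP) is rejected by this check and would not proceed to artefact detection.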
The whole slide image which undergoes the magnification check (802) and passes the quality check routine is subjected to detection of artefacts such as pen markings, coverslip, blurry regions and non-tissue regions such as fatty tissue (803). After deletion of the artefacts, the ROI mask is generated (823) using the modified HistoQC software 1602 (a), which is adapted to be compatible with the system of the present invention, and is stored in the form of a .png image. The file is uploaded back to the computer readable medium and stored alongside the original whole slide images grouped into HRD and HRP classes. Figure 9 shows a typical ROI mask.
Figure 10 illustrates the process of extracting the tiles from a whole slide image of the patient to be selected for PARP inhibitor treatment as described in the previous paragraph. The tiles so generated are uploaded into the computer readable medium for further analysis and prediction.
Training the AI model
Figure 11 illustrates generation of tiles for all the class-specific samples identified in both the training and test set. The technique involves getting the whole slide image (1105) from the computer readable medium (1101) at the magnification level 40x and splitting the whole slide image at 40x for the regions indicated by the ROI- masks into a 256x256 pixel resolution tile. These tiles represent the tissue microenvironments such as cells, nuclei, stroma, necrosis, etc. which are uploaded to the computer readable medium (1101) from the tiles generated (1109) using the tile generation software 1602 (b) to facilitate the training process in subsequent stages. The tile size chosen enables the training process to pick up the tissue morphology.
Each tile identified by the ROI mask is extracted and converted into an image format, specifically saved as a .png file, and subsequently stored in the computer readable medium. Each extracted patch is assigned a class designation, either HRD or HRP, based on the NGS label of the whole slide image (WSI) from which the patches are derived. This assignment ensures that the patches inherit the corresponding class from their parent WSI.
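Tile generation and label inheritance can be sketched as follows, assuming for simplicity that the ROI mask is a binary grid at the same resolution as the image level being tiled (in practice the mask is typically stored at a lower resolution and scaled). The `min_tissue_fraction` threshold and the function names are hypothetical and are not taken from the disclosure.

```python
TILE_SIZE = 256  # tile edge in pixels, as stated in the disclosure


def tile_origins(mask, tile_size=TILE_SIZE, min_tissue_fraction=0.5):
    """Return (x, y) origins of tiles whose ROI coverage meets the threshold.

    mask: list of rows containing 0 (background) or 1 (tissue).
    min_tissue_fraction: hypothetical cut-off for keeping a tile.
    """
    height, width = len(mask), len(mask[0])
    origins = []
    for y in range(0, height - tile_size + 1, tile_size):
        for x in range(0, width - tile_size + 1, tile_size):
            covered = sum(sum(row[x:x + tile_size]) for row in mask[y:y + tile_size])
            if covered / (tile_size * tile_size) >= min_tissue_fraction:
                origins.append((x, y))
    return origins


def label_tiles(origins, slide_label):
    """Every tile inherits the NGS-derived class of its parent WSI."""
    return [{"x": x, "y": y, "label": slide_label} for x, y in origins]


# Hypothetical 512x512 mask whose left half is tissue.
mask = [[1 if x < 256 else 0 for x in range(512)] for _ in range(512)]
origins = tile_origins(mask)
tiles = label_tiles(origins, "HRD")
print(tiles)
```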
This figure also illustrates the overall process of developing a deep learning-based AI model by training a pretrained deep neural network architecture (ResNet-50) using a data set whose labels are known prior to the analysis. The stages in this process include collection of tissue blocks / pre-scanned whole slide images along with previously known class labels as ground truth. These ground truths are generated from the traditional next generation sequencing (NGS) system. Following this, all the images which are uploaded into the computer readable medium are classified into two classes, HRD / HRP, based on the ground truth obtained from an NGS-based HRD score which is similar to the Myriad MyChoice HRD scoring panel and the panel followed by Ambry Genetics. The ground truth could also be obtained from other panels. The whole slide images, along with the labels which classify them into HRD/HRP classes, are divided into separate training and testing sets (e.g., training data 1117). These sets are balanced to ensure that the training process is not biased towards one single class. The class-specific samples are split into training and test sets by a randomized 80-20% rule respectively. The randomization process takes care of generalizability of all the samples across the entire set of classes, which trains the model to achieve better precision. The model (e.g., AI model 1121) now picks up more generic features from the whole slide image and can make more accurate predictions. After the images are split into training and testing sets, the quality check system (1107) identifies the samples eligible for training and generates the ROI mask. The ROI mask is used to generate the required set of tiles under each class of the training set and the test set.
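A randomized, class-balanced 80-20 split at the slide level can be sketched as below. The slide identifiers, the fixed seed and the function name are hypothetical; the sketch splits each class separately so that both sets retain the HRD/HRP balance described above.

```python
import random


def stratified_split(slides_by_class, train_fraction=0.8, seed=0):
    """Shuffle each class independently and split it into training and test parts."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    train, test = [], []
    for label, slide_ids in slides_by_class.items():
        shuffled = list(slide_ids)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(slide_id, label) for slide_id in shuffled[:cut]]
        test += [(slide_id, label) for slide_id in shuffled[cut:]]
    return train, test


# Hypothetical balanced cohort of 10 HRD and 10 HRP slides.
slides = {
    "HRD": [f"hrd_{i}" for i in range(10)],
    "HRP": [f"hrp_{i}" for i in range(10)],
}
train_set, test_set = stratified_split(slides)
print(len(train_set), len(test_set))
```

Splitting by slide rather than by tile keeps all tiles of one patient in the same set, which avoids leakage between training and testing.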
These tiles are further subjected to training using the pretrained AI model (ResNet-50) with training parameters such as the learning rate, optimization and loss function and other hyperparameters such as dropout and regularization. The learning rate controls how fast the algorithm gets trained. The batch size refers to the number of tiles which can be fed into the model during each iteration step. Optimization and loss functions relate to how the weights and errors are adjusted in the backward propagation step during learning iterations. Other hyperparameters such as dropout and regularization are techniques that reduce overfitting and thereby improve accuracy. The trained model (e.g., AI model 1121) so generated recognizes the morphological and architectural patterns in the H and E-stained images of the tissues of the ovarian cancer patients to be selected for PARP inhibitor treatment with an accuracy of 99.3% on the data set used for validation of the model.
The development of an AI-based deep learning model (e.g., AI model 1121) through the process of training a deep neural network architecture is also shown in figure 11. As a first step, tiles generated from the whole slide images are retrieved from the computer readable medium. Next, each tile is categorized as either HRD or HRP based on the NGS data. The class-specific tiles are fed as a vector of pixels into the various layers of the pre-trained deep neural network architecture, where each layer generates an abstraction of these pixel values based on the activation function and the weights obtained from the pre-trained neural network system. The training process repeats itself over multiple epochs (iterations) by adjusting the weights at each epoch to generate more accurate abstractions/generalizations of the features specific to each class and minimize the error of generalizability.
The deep learning-based model (e.g., AI model 1121) chosen for performing the training is a deep neural network architecture based on the philosophy of CNN as shown in figure 12.
The ResNet-50 model is a pre-trained convolutional neural network (CNN) architecture. The pre-trained weights of the ResNet-50 model are typically obtained from a large dataset like ImageNet (a publicly available natural image database, image-net.org), which allows the model to capture meaningful visual features. These pretrained weights are computed through several forward (convolution and pooling) and backward (weight adjustment) propagations over the deep neural network. For the purpose of training on the WSIs, the adjusted weights in the last few layers of the ResNet-50 model are subjected to further adaptation for a more precise prediction of HRD and HRP.
a) Forward pass: During training, the network is fed a batch of input data, such as images. This data is passed through the network, one layer at a time, performing calculations at each layer to generate predictions. This process is called the forward pass.
b) Loss calculation: After making predictions, the same are compared with the actual target labels to calculate how well the network performed. This is done using a metric called the loss function. The loss function gives the difference between the predicted and actual values; the formula is as given below:
$$L = -\frac{1}{N}\sum_{t=1}^{N} y_t \log(p_t)$$
where $y_t$ is the original label, $p_t$ is the predicted output from the model, and $N$ is the batch size.
c) Gradient: The gradient represents the direction and magnitude of the steepest change in the loss function. It indicates the extent to which the network's parameters need to be adjusted to improve its predictions. The gradient is like a guide that helps take steps in the right direction to minimize the difference between predictions and actual values.
d) Backpropagation: Backpropagation is an algorithm that helps calculate the gradient. It works by going backward through the network, from the output layer to the input layer. As it goes backward, it computes the gradients of the loss function with respect to each parameter in the network. This helps understand how changing each parameter affects the overall loss.
e) Parameter updates: Once the gradients are obtained, an optimization algorithm is used to update the network's parameters. Adam, an optimization algorithm, adjusts the parameters so as to minimize the difference between the prediction and the ground truth. By iteratively updating the parameters, the network gradually learns to make better predictions.
f) Iteration: To train the network effectively, the steps 'a' to 'e' are repeated with different batches of data (e.g., training data 1117 / test sample 1118). Each repetition over a batch is an iteration, and one full pass over the training data is called an epoch. Going through multiple epochs helps the network learn from a variety of examples and improve its predictions over time.
The steps a-f are executed using the training software 1602 (c).
To improve the model's performance, the Adam optimization algorithm is employed. This algorithm adjusts the model's parameters in a way that minimizes the discrepancy between the ground truth values and the model's predictions. Through iterative parameter updates, the model gradually refines its ability to make more accurate predictions, aligning them more closely with the ground truth. By repeating this process over multiple training iterations, the model (e.g., AI model 1121) learns to optimize its parameters and gain a better understanding of the underlying patterns and features within the data. As a result, it becomes increasingly proficient at classifying images, leading to improved performance and more reliable predictions.
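The parameter update performed by Adam can be sketched on a one-dimensional toy problem. The sketch below follows the standard Adam update (exponential moving averages of the gradient and of its square, with bias correction); the objective (x - 3)², the hyperparameter values and the function name are assumptions for illustration and are not the settings used in the disclosure.

```python
import math


def adam_minimize(grad, x0, learning_rate=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimize a 1-D function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # momentum-like average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # RMSprop-like average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)**2 with gradient 2 * (x - 3); minimum at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
print(x_min)
```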
The training is done until the model (e.g., AI model 1121) has reached the predefined level of accuracy, which in the present case was 99.3%. Once the model (e.g., AI model 1121) meets this criterion, the training process is concluded, indicating that the model (e.g., AI model 1121) has attained the desired level of learning.
Validation
During the training process, at the end of the forward pass of every epoch, the accuracy of the model performance is validated using 16% of the WSIs from the tissue database. The stages in validation mirror those of the training process: tile generation and forward propagation, followed by the estimation of a loss function which reflects the difference between training and validation accuracy.
Testing the model
WSIs reserved for testing were subjected to tile generation, where the WSI was divided into smaller tiles, which were then subjected to tile specific prediction using the trained model (e.g. Al model 1121. The predictions from the tile level prediction software 1602 (d) (tile-specific predictor is a part of Al Model) for each tile are aggregated using a prediction aggregator algorithm developed herein. This aggregator combines the individual predictions to generate an overall prediction for the WSI. The predicted HRD status is compared with the original HRD status of the WSI.
This process is repeated for all the remaining samples in the test subset, which consists of samples belonging to both the HRD and HRP categories. By comparing the predicted HRD status with the NGS-labelled HRD status, an assessment of the trained model's performance can be obtained. This evaluation provides insights into how effectively the model can classify WSIs and predict their HRD status. A quantitative analysis of the performance of the model is conducted using various performance metrics such as true positive rate (sensitivity), true negative rate (specificity), precision (positive predictive value), the receiver operating characteristic (ROC) curve and the F1 score.
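The threshold-based metrics listed above follow directly from the confusion counts, as the following sketch shows. The function name `classification_metrics` and the choice to return NaN for an undefined ratio are assumptions; the ROC curve is omitted because it requires the underlying probability scores rather than counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, precision and F1 from confusion counts."""
    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),        # true positive rate
        "specificity": ratio(tn, tn + fp),        # true negative rate
        "precision": ratio(tp, tp + fp),          # positive predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),    # harmonic mean of precision and sensitivity
    }
```

Here HRD is treated as the positive class and HRP as the negative class.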
By analysing the performance across multiple WSIs, an understanding is gained of the parameters which need to be refined so as to improve the predictive capability of the model (e.g., AI model 1121). This also allows for the identification of any potential discrepancies or areas where the model (e.g., AI model 1121) may need further improvement. Through this iterative process, the model's overall performance and its ability to accurately predict the HRD status of the WSIs in the dataset can be quantified.
Selecting the patient for PARP inhibitor treatment
Figure 13 shows the tile level predictions for the patient for whom the selection for the PARP inhibitor treatment is to be made. It houses the trained deep neural network model (e.g., AI model 1321) along with various supporting routines comprising tile generation (1312) and tile level prediction (1322). The tile level prediction is carried out by the tile level prediction software 1602 (d). The entire system runs on a set of CPUs and optionally GPUs, where the input to these processors (e.g., processors 1608 depicted in figure 16) is delivered through a computer readable medium (1601).
The entire system may be a standalone system, a cloud-based system or an intranet-based system.
Figure 14 illustrates the protocol for selection of the patient for PARP inhibitor treatment from tile level predictions. Tile level prediction that each tile extracted from the ROI-mask of the high-resolution whole slide image belongs to the particular class viz. HRD or HRP is shown (1407,1408). The patient level probabilities are arrived at by aggregating the tile level probabilities to generate the ratios and picking up the highest ratio value to decide if the patient is selected for PARP inhibitor treatment. The process of arriving at the selection of the patient for PARP inhibitor treatment from this data is handled by the aggregation software 1602 (e).
The WSI of the patient sample which has passed the quality check (1518), the tile level predictions (1522) and the decision regarding the selection of the patient for PARP inhibitor treatment (1511), extracted from the computer readable medium (1501), are used to generate the heatmap. The heatmap is presented as an overlay on the regions of the whole slide image on a heat map visualizer, visually representing the explainability of the final prediction. Higher intensity regions of the heatmap visualization indicate higher biomarker positivity of the whole slide image (figure 15).
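One way to arrange tile level predictions into a heatmap is to place each tile's probability at its grid position, as in the sketch below. The function name `heatmap_grid`, the dictionary input keyed by tile origin, and the use of `None` for regions without a prediction are assumptions for illustration; rendering the grid as a coloured overlay on the WSI is left to a visualization library.

```python
def heatmap_grid(tile_probs, tile=256):
    """Arrange tile-level HRD probabilities into a 2D grid (rows of columns).

    tile_probs maps the top-left pixel coordinate (x, y) of each tile to its
    predicted HRD probability. Cells without a prediction are left as None.
    """
    if not tile_probs:
        return []
    cols = max(x for x, _ in tile_probs) // tile + 1
    rows = max(y for _, y in tile_probs) // tile + 1
    grid = [[None] * cols for _ in range(rows)]
    for (x, y), prob in tile_probs.items():
        grid[y // tile][x // tile] = prob      # higher value = higher heatmap intensity
    return grid
```

Cells holding `None` correspond to background or excluded regions and would be left transparent in the overlay.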
A method and system for AI-based patient selection for PARP inhibitor treatment is illustrated with reference to figure 1. It comprises a) training an AI model (e.g., as trained in 110) to recognize the morphological features of the whole slide image of the tissue of the ovarian cancer patient and relate them to the NGS data for the tissue of the ovarian cancer patient, b) validating the model by making the prediction for selecting a patient for PARP inhibitor treatment based on the morphological analysis of the WSI images for which the NGS data are available and comparing the predictions with those arrived at based on NGS data, and c) using the AI model so trained to analyze morphological and architectural features of the WSI of the tissue of the ovarian cancer patient to select the patient for PARP inhibitor treatment.
During training of the AI model, a set of pre-scanned WSI images along with the NGS data, or WSI images of tissue of ovarian cancer patients (module A), are first stored in the computer readable medium (101). These are then subjected to the quality check (module B) and the WSIs which pass the QC check are stored in the computer readable medium (101). These images along with the corresponding NGS data are used to train the AI model (module C). The trained AI model is stored in the computer readable medium (101). For the validation of the AI model, another set of pre-scanned WSI images or WSI images of tissue of ovarian cancer patients (module A) are first stored in the computer readable medium. These images are then subjected to the QC check (module B) and the images which pass the QC check are stored in the computer readable medium. These images are now passed through the biomarker prediction system (module E) and the predictions of the AI model are compared with the ground truth to assess the accuracy of the model. The AI model is validated by comparing the prediction with the results based on the corresponding NGS data for the respective WSIs.
To select an ovarian cancer patient for the PARP inhibitor treatment, the scanned WSI image or WSI image of tissue of the ovarian cancer patient (module D) is first stored in the computer readable medium. It is then subjected to the QC check protocol and the WSI image, if it passes the protocol, is stored in the computer readable medium. This is then subjected to the biomarker prediction protocol (module E) and the AI model selects / rejects the ovarian cancer patient for the PARP inhibitor treatment based on the aggregated probability and the heatmaps generated (module F).
Training ResNet-50 model for recognizing whole slide images of ovarian cancer patients.
The ResNet-50 model was trained using 58 tissue slides obtained from the combined cohorts of the TCGA and Ambry Genetics databases. The AI model (e.g., AI model 1121) was trained to identify key genomic signatures for a twenty-gene panel along with key biomarkers such as genome-wide LOH, TAI, and LST from Hematoxylin and Eosin-stained whole slide images to select patients who would benefit from this test. All slides passed the quality checks. Of these, 30 were known to belong to the class of HRD and 28 to the class of HRP based on next generation sequencing. For the tissue samples from TCGA, the HRD status of each was extracted based on the NGS analysis carried out for 20 genes viz., ARID1A, ATM, ATRX, BAP1, BARD1, BLM, BRCA1, BRCA2, BRIP1, CHEK1, CHEK2, FANCA, FANCC, FANCD2, MRE11, NBN, PALB2, RAD50, RAD51 and RAD51B, and for samples from Ambry Genetics the 11-gene status viz., ATM, BARD1, BRCA1, BRCA2, BRIP1, CHEK2, MRE11A, NBN, PALB2, RAD51C and RAD51D was adopted.
During training (e.g., using training data 117), the learning rate was set at 0.001, the batch size at 1024 tiles and the cross-entropy loss between the predicted output of the model and the original label of every individual tile was calculated.
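The cross-entropy loss between the model output and the label of an individual tile can be sketched as follows for the two-class case. The function names, the class index convention (0 for HRP, 1 for HRD) and the averaging of the loss over a batch are assumptions for illustration; the description states only that cross-entropy is computed per tile.

```python
import math

def softmax(logits):
    """Convert raw output scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross-entropy for one tile; label is the true class index (assumed 0 = HRP, 1 = HRD)."""
    return -math.log(softmax(logits)[label])

def batch_loss(batch_logits, labels):
    """Mean cross-entropy over a batch of tiles (mean reduction is an assumption)."""
    losses = [cross_entropy(logits, y) for logits, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)
```

A tile whose two output scores are equal contributes a loss of ln 2 (about 0.693), while a confident correct prediction contributes a loss close to zero.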
The model (e.g., AI model 1121) was trained as described earlier. The value of the loss function decreased with increasing iterations, as summarized below.
Table 1
Testing:
After training, the model (e.g., AI model 1121) was tested with 13 TCGA samples and 10 Ambry Genetics samples, totaling 23 samples. Out of this total set, 12 were identified as HRD and 11 as HRP based on next generation sequencing. All slides passed the quality checks and were further subjected to tiling. The tiles specific to each WSI were subjected to analysis using the trained model, where the model produces a probability for an input tile; if the probability is in the range [0,0.5] the tile is classified as HRP, and if the probability is in the range [0.5,1] it is classified as HRD. Finally, by aggregating the tile probabilities at whole slide level, the final prediction accuracy across all the 23 test samples was 100%.
The accuracy is calculated as:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where:
• TP: True Positives (correctly predicted positive instances)
• TN: True Negatives (correctly predicted negative instances)
• FP: False Positives (incorrectly predicted positive instances)
• FN: False Negatives (incorrectly predicted negative instances).
For each sample, as shown in table 2, accuracy increases after every iteration, and at the same time the loss decreases. This indicates that the model is generalizing better at each iteration.
Table 2
To get the WSI level predictions based on tile predictions, the total number of tiles in each class was counted. For N_HRD > N_HRP, the WSI was predicted as HRD and for N_HRD < N_HRP, the WSI was predicted as HRP.
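The tile counting rule above can be sketched as follows. Two points are not specified in the description and are handled here by assumption: a tile probability of exactly 0.5 is assigned to HRD, and a tie between the two tile counts is reported as "undetermined". The function names are likewise illustrative.

```python
def classify_tile(p_hrd, threshold=0.5):
    """Label one tile from its HRD probability (assumed: exactly 0.5 counts as HRD)."""
    return "HRD" if p_hrd >= threshold else "HRP"

def aggregate_slide(tile_probs, threshold=0.5):
    """Return the WSI-level call and the fraction of tiles in each class."""
    labels = [classify_tile(p, threshold) for p in tile_probs]
    n_hrd = labels.count("HRD")
    n_hrp = labels.count("HRP")
    if n_hrd > n_hrp:
        call = "HRD"
    elif n_hrd < n_hrp:
        call = "HRP"
    else:
        call = "undetermined"      # assumed handling of a tie, not stated in the text
    total = len(labels)
    ratios = {"HRD": n_hrd / total, "HRP": n_hrp / total} if total else {}
    return call, ratios
```

For instance, a slide with tile probabilities 0.9, 0.8 and 0.2 has two HRD tiles and one HRP tile, so it is called HRD.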
Table 3
According to NGS analysis, tissue samples 8, 9, 10, 11 and 13 were HRP. AI model assisted morphological analysis predicted that these patients not be selected for PARP inhibitor treatment and thus corroborated the predictions based on NGS analysis.
According to NGS analysis, tissue samples 4 and 7 were HRD. AI model assisted morphological analysis predicted that these patients be selected for PARP inhibitor treatment and thus corroborated the predictions based on NGS analysis.
In both these cases each tissue sample had to be examined using AI model assisted morphological analysis only once. The analysis is completed in less than 15 minutes. In contrast, NGS has to be carried out for all twenty genes individually. This, under the best conditions, takes at least one week. The AI model assisted morphological analysis thus offers tremendous savings in terms of effort, cost and time.
According to NGS analysis, tissue samples 1, 2, 3, 5 and 6 were HRP. However, AI model assisted morphological analysis predicted that the patients be selected for PARP inhibitor treatment. This implies that the HRD in these cases was a result of mutations of genes for which NGS was not carried out. This result is consistent with the observations in the past that many patients who were not diagnosed HRD positive based on NGS actually benefited from PARP inhibitor treatment (Hodgson, D.R., Dougherty, B.A., Lai, Z., Fielding, A., Grinsted, L., Spencer, S., O’Connor, M.J., Ho, T.W., Robertson, J.D., Lanchbury, J.S. and Timms, K.M., British journal of cancer, 119 (11), 1401-1409 (2018)). These tissue predictions may be outcomes of HRD beyond the BRCA1/2 genes. AI model (e.g., AI model 1121) assisted morphological analysis thus selects patients for PARP inhibitor treatment who would not have been selected based on NGS, widening the scope of PARP inhibitor treatment and benefiting the patients.
NGS diagnosed tissue sample 12 as HRD positive. Yet AI model (e.g., AI model 1121) assisted morphological and architectural analysis indicated that the patient not be selected for the PARP inhibitor treatment. This result is consistent with the findings in the past that many patients diagnosed HRD positive based on NGS did not benefit from PARP inhibitor treatment. This was attributed to the fact that not all genomic alterations of HRR genes lead to the HRD phenotype, and to the resistance developed as a result of secondary mutations (Collot, T., Niogret, J., Carnet, M., Chevrier, S., Humblin, E., Favier, L., Bengrine-Lefevre, L., Desmoulins, I., Arnould, L. and Boidot, R., Molecular medicine reports, 23 (1), 1-8 (2021)). Mutations in vital cellular genes like TP53 may confer resistance to Olaparib treatment despite the presence of BRCA mutations. In fact, the presence of TP53 mutations may contribute to disease progression in addition to Olaparib resistance, thus acting as a secondary mechanism to treatment predictors. But this would not be captured by the NGS based diagnosis as it is limited only to the genes which are part of the selected gene panel. Without limiting to any theories, the AI model (e.g., AI model 1121) assisted morphological and architectural analysis provides a more reliable selection of patients for PARP inhibitor treatment.
For samples from the Ambry Genetics cohort, according to NGS analysis tissue samples 1, 2, 5, 7 and 8 were HRD and the rest were HRP. The result was corroborated by AI model assisted morphological analysis.
For samples from the TCGA cohort, according to NGS analysis tissue samples 1 to 7 were HRD and the rest were HRP. The result was corroborated by AI model assisted morphological analysis.
In both these cases, each tissue sample had to be examined using AI model assisted morphological and architectural analysis only once. The analysis is completed in less than 15 minutes. In contrast, NGS has to be carried out individually for all the genes specified by the respective panel, which under the best conditions takes at least one week. The AI model assisted morphological and architectural analysis thus offers tremendous savings in terms of effort, cost and time.
According to NGS analysis, tissue samples 1, 2, 3, 5 and 7 were HRP. However, AI model assisted morphological and architectural analysis predicted that the patients be selected for PARP inhibitor treatment. This implies that the HRD in these cases was a result of mutations of genes for which NGS was not carried out. This result is consistent with the observations in the past that many patients who were not diagnosed as HRD based on NGS actually benefitted from PARP inhibitor treatment (Hodgson, D.R., Dougherty, B.A., Lai, Z., Fielding, A., Grinsted, L., Spencer, S., O’Connor, M.J., Ho, T.W., Robertson, J.D., Lanchbury, J.S. and Timms, K.M., British journal of cancer, 119 (11), 1401-1409 (2018); van Wijk, L.M., Nilas, A.B., Vrieling, H. and Vreeswijk, M.P., Expert review of molecular diagnostics, 22 (2), 185-199 (2022)). AI model assisted morphological and architectural analysis thus selects patients for PARP inhibitor treatment who would not have been selected based on NGS, widening the scope of PARP inhibitor treatment and benefiting the patients.
NGS diagnosed tissue sample 12 as HRD. Yet AI model assisted morphological and architectural analysis did not select the same for the PARP inhibitor treatment. This result is consistent with the findings in the past that many patients diagnosed HRD positive based on NGS did not benefit from PARP inhibitor treatment. This was attributed to the resistance developed as a result of secondary mutations (Collot, T., Niogret, J., Carnet, M., Chevrier, S., Humblin, E., Favier, L., Bengrine-Lefevre, L., Desmoulins, I., Arnould, L. and Boidot, R., Molecular medicine reports, 23 (1), 1-8 (2021)). But this would not be captured by the NGS based diagnosis as it is limited only to the genes for which the model was trained. Without limiting to any theories, the AI model assisted morphological and architectural analysis provides a more reliable selection of patients for PARP inhibitor treatment.
Figure 16 shows a computer system 1600 in accordance with one or more embodiments of the invention. The computer system 1600 can be an electronic, computer framework comprising and/or employing any number and combination of computing devices and networks utilizing various communication technologies, as described herein. The computer system 1600 is easily scalable, extensible, and modular, with the ability to change to different services or reconfigure some features independently of others. The computer system 1600 may be, for example, a server, desktop computer, laptop computer, tablet computer, or smartphone. In some examples, computer system 1600 may be a cloud computing node. Computer system 1600 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system 1600 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
As shown in figure 16, the computer system 1600 has one or more processing units comprising CPU(s) 1608a1, 1608a2, 1608a3, etc. (collectively or generically referred to as processor(s) 1608). It may optionally comprise GPU(s) 1608b1, 1608b2, 1608b3, etc. The processors 1608 can be a single-core processor, multi-core processor, computing cluster, or any number of other configurations. The processors 1608, also referred to as processing circuits, are coupled via a system bus 1611 to a system memory 1605 and various other components. The system memory 1605 can include a read only memory (ROM) 1607 and a random-access memory (RAM) 1606. The ROM 1607 is coupled to the system bus 1611 and may include a basic input/output system (BIOS) or its successors like Unified Extensible Firmware Interface (UEFI), which controls certain basic functions of the computer system 1600. The RAM is read-write memory coupled to the system bus 1611 for use by the processors 1608. The system memory 1605 provides temporary memory space for operations of said instructions during operation. The system memory 1605 can include random access memory (RAM), read only memory, flash memory, or any other suitable memory systems.
The computer system 1600 comprises an input/output (I/O) adapter 1604 and a communications adapter 1609 coupled to the system bus 1611. The I/O adapter 1604 may be a small computer system interface (SCSI) adapter that communicates with a hard disk 1603 and/or any other similar component. The I/O adapter 1604 and the hard disk 1603 are collectively referred to herein as computer readable medium 1601.
Software 1602 for execution on the computer system 1600 may be stored in the computer readable medium 1601. This is an example of a tangible storage medium readable by the processors 1608, where the software 1602 is stored as instructions for execution by the processors 1608 to cause the computer system 1600 to operate, such as is described herein with respect to the various figures. Examples of computer program product and the execution of such instruction is discussed herein in more detail. The communications adapter 1609 interconnects the system bus 1611 with a network 1610, which may be an outside network, enabling the computer system 1600 to communicate with other such systems. In one embodiment, a portion of the system memory 1605 and the computer readable medium 1601 collectively store an operating system, which may be any appropriate operating system to coordinate the functions of the various components shown in figure 16.
Additional input/output devices are shown as connected to the system bus 1611 via a display adapter 1618 and an interface adapter 1612. In one embodiment, the adapters 1604, 1609, 1618, and 1612 may be connected to one or more I/O buses that are connected to the system bus 1611 via an intermediate bus bridge (not shown). A display 1617 (e.g., a screen or a display monitor) is connected to the system bus 1611 by the display adapter 1618, which may include a graphics controller to improve the performance of graphics intensive applications and a video controller. A keyboard 1613, a mouse 1614, a speaker 1615, a microphone 1616, etc., can be interconnected to the system bus 1611 via the interface adapter 1612, which may include, for example, a Super I/O chip integrating multiple device adapters into a single integrated circuit. Suitable I/O buses for connecting peripheral devices such as hard disk controllers, network adapters, and graphics adapters typically include common protocols, such as the Peripheral Component Interconnect (PCI) and the Peripheral Component Interconnect Express (PCIe). Thus, as configured in figure 16, the computer system 1600 includes processing capability in the form of the processors 1608, storage capability including the system memory 1605 and computer readable medium 1601, input means such as the keyboard 1613, the mouse 1614, and the microphone 1616, and output capability including the speaker 1615 and the display 1617.
In some embodiments, the communications adapter 1609 can transmit data using any suitable interface or protocol, such as the internet small computer system interface, among others. The network 1610 may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, among others. An external computing device may connect to the computer system 1600 through the network 1610. In some examples, an external computing device may be an external webserver or a cloud computing node.
It is to be understood that the block diagram of figure 16 is not intended to indicate that the computer system 1600 is to include all of the components shown in figure 16. Rather, the computer system 1600 can include any appropriate fewer or additional components not illustrated in figure 16 (e.g., additional memory components, embedded controllers, modules, additional network interfaces, etc.). Further, the embodiments described herein with respect to computer system 1600 may be implemented with any appropriate logic, wherein the logic, as referred to herein, can include any suitable hardware (e.g., a processor, an embedded controller, or an application specific integrated circuit, among others), software (e.g., an application, among others), firmware, or any suitable combination of hardware, software, and firmware, in various embodiments.
One or more embodiments described herein can utilize machine learning techniques to perform tasks, such as classifying a feature of interest. More specifically, one or more embodiments described herein can incorporate and utilize rules-based decision making and artificial intelligence (AI) reasoning to accomplish the various operations described herein, namely classifying a feature of interest. The phrase “machine learning” broadly describes a function of electronic systems that learn from data. A machine learning system, engine, or module can include a trainable machine learning algorithm that can be trained, such as in an external cloud environment, to learn functional relationships between inputs and outputs, and the resulting model (sometimes referred to as a “trained neural network,” “trained model,” “a trained classifier,” and/or “trained machine learning model”) can be used for classifying a feature of interest. In one or more embodiments, machine learning functionality can be implemented using an Artificial Neural Network (ANN) having the capability to be trained to perform a function.
In machine learning and cognitive science, ANNs are a family of statistical learning models inspired by the biological neural networks of animals, and in particular the brain. ANNs can be used to estimate or approximate systems and functions that depend on a large number of inputs. Convolutional Neural Networks (CNN) are a class of deep, feed forward ANNs that are particularly useful at tasks such as, but not limited to, analyzing visual imagery and natural language processing (NLP). Recurrent Neural Networks (RNN) are another class of deep ANNs which, unlike feed forward networks, contain recurrent connections, and are particularly useful at tasks such as, but not limited to, unsegmented connected handwriting recognition and speech recognition. Other types of neural networks are also known and can be used in accordance with one or more embodiments described herein.
The embodiments were chosen and described in order to best explain the principles of the present invention and its practical application, to thereby enable others skilled in the art to best utilize the present invention and various embodiments with various modifications as are suited to the particular use contemplated. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but such are intended to cover the application or implementation without departing from the scope of the present invention.

Claims

1) A method for selecting an ovarian cancer patient for PARP inhibitor treatment comprising:
(a) extracting the morphological and architectural features from the whole slide images of the tissues of the ovarian cancer patients, selected from a tissue database, wherein the said whole slide images are labelled by next generation sequencing and belong to a class selected from 1) homologous recombination deficient (HRD), and 2) homologous recombination proficient (HRP), pass the quality check routine, and storing the extracted morphological features in a computer readable medium,
(b) developing a trained AI model based on pretrained convolutional neural network (CNN) model, wherein the trained AI model relates the extracted morphological features of the whole slide image stored in the computer readable medium as in (a) and generalizes the pattern in the whole slide image into a class selected from 1) homologous recombination deficient (HRD) and 2) homologous recombination proficient (HRP), and storing the trained model in a computer readable medium,
(c) extracting the morphological and architectural features from the whole slide image generated from the tissue of an ovarian cancer patient to be selected for PARP inhibitor treatment, wherein the said whole slide image passes the quality check routine, and storing the extracted morphological features in a computer readable medium, and
(d) relating the extracted morphological features from the said whole slide image as described in (c) with the morphological features from the whole slide images described in (a) using an AI model as trained in (b) and quantifying a probability score to select the ovarian cancer patient for PARP inhibitor treatment.
2) The method of extracting the morphological and architectural features from the whole slide image as claimed in claim 1 comprising a) subjecting the said whole slide image to a quality check routine b) selecting the whole slide images which pass the quality check routine at 40 X magnification and generating the regions of interest (ROI), pertaining to the tissue regions in the form of mask, and c) storing the selected whole slide images which pass the quality check routine at 40 X magnification and the regions of interest pertaining to the tissue regions in the form of mask in a computer readable medium.
3) The method of generating the regions of interest (ROI) pertaining to the tissue regions as claimed in claim 2 using the quality check routine, comprising a set of instructions, to read, process and output the target regions of interest comprising tissue objects like nuclei, stroma, tumor regions etc. from the digitized H and E-stained whole slide image, stored in a computer readable medium.
4) The method of generating the regions of interest pertaining to the tissue regions as claimed in claim 2 using the quality check routine, by excluding the artefacts, staining / sectioning errors, blurry and biological substances such as fatty tissue, small objects, and white background fatty regions.
5) The method of generating the regions of interest pertaining to the tissue regions as claimed in claim 2, wherein the regions of interest are extracted by segmenting the entire digitized whole slide image, using the quality check routine.
6) The method of storing the extracted morphological and architectural features in a computer readable medium as claimed in claim 1 comprising, using a tile generation algorithm to split the region of interest into 256 x 256 pixels sized tiles and storing the tiles in a computer readable medium.
7) The method of developing a trained AI model based on a pretrained convolutional neural network (CNN) model as claimed in claim 1 comprising 1) training the AI model 2) validating the AI model and 3) testing the AI model.
8) The method of developing a trained AI model based on a pretrained convolutional neural network (CNN) model as claimed in claim 7 wherein the pretrained convolutional neural network (CNN) model is selected from a ResNet based CNN architecture like a) ResNet-34, b) ResNet-50 and c) ResNet-101.
9) The method of developing a trained AI model based on a pretrained convolutional neural network (CNN) model as claimed in claim 7, wherein the whole slide images of the tissues of the ovarian cancer patients selected from the tissue data base are grouped into three lots, 1) lot A containing 64% of the WSI from the tissue data base for training the model, 2) lot B containing 16% of the WSI from the tissue data base for validating the model and 3) lot C containing 20% of the WSI from the tissue data base for testing the model.
10) The method of training an AI model starting from ResNet-50 model as claimed in claim 7 comprising 1) customizing the fully connected layer to contain 2 output units, 2) analysing the tiles generated from the whole slide images from lot A as in claim 9, 3) computing the loss function for each tile 4) adjusting the weights to minimize the training loss value 5) repeating the steps 2-4 until the training loss value is in the range 0.001 - 0.002.
11) The method of validating the AI model as trained in claim 10 comprising 1) customizing the fully connected layer to contain 2 output units, 2) analysing the tiles generated from the whole slide images from lot B as in claim 9, 3) computing the loss function for each tile 4) adjusting the weights to minimize validation loss value 5) repeating the steps 2-4 until the validation loss value is in the range 0.001 - 0.002.
12) The method of testing the AI model validated in claim 11, comprising 1) customizing the fully connected layer to contain 2 output units, 2) analysing the tiles generated from the whole slide images from lot C as in claim 9, 3) computing the loss function for each tile 4) and generating the probability of predicting the tiles into the class selected from 1) HRD and 2) HRP.
13) The method of relating the extracted morphological and architectural features from the said whole slide image as claimed in claim 1, comprising 1) subjecting each tile to the trained AI model 2) computing probabilities that each tile belongs to the class selected from a) HRD and b) HRP, 3) aggregating the tile level probabilities to determine the whole slide level probability that the said ovarian cancer patient belongs to the group selected from 1) suitable for PARP inhibitor treatment 2) not suitable for PARP inhibitor treatment.
14) A computer system for implementing a method for selecting an ovarian cancer patient for PARP inhibitor treatment comprising a) a computer readable medium b) a processor c) a device to visualize the results of the said method and d) an interface to execute transfer of the data and algorithm between the processor, the computer readable medium and the device c).
15) The computer readable medium as claimed in claim 1 wherein the computer-readable medium is selected from a) non-volatile medium and b) volatile medium.
16) The computer readable medium as claimed in claim 1 which stores data.
17) The computer readable medium which stores data as claimed in claim 1, wherein the data is selected from a) whole slide images in the image format, b) whole slide images which qualify the quality check in the image format c) tiles generated as images in the image format d) region of interest (ROI) masks from the quality check in the image format e) trained model in the machine readable format selected from h5 and pkl format f) tile level probabilities in the JSON format g) scores in the text format and h) heat maps in the image format.
18) The computer readable medium as claimed in claim 1 which stores algorithms.
19) The computer readable medium which stores algorithms as claimed in claim 5, wherein the algorithm is selected from a) a quality check algorithm b) a tile generation algorithm c) a training algorithm d) a prediction algorithm e) an aggregation algorithm.
20) The computer readable medium as claimed in claim 1, wherein the location of the computer readable medium is selected from a) a cloud b) an intranet c) a LAN and d) a standalone computer.
21) The non-volatile medium as claimed in claim 2, wherein the non-volatile medium is selected from optical disks and magnetic disks.
22) The volatile medium as claimed in claim 2 wherein the volatile medium is selected from semiconductor memories and dynamic memory.
23) The computer-readable storage medium as claimed in claim 1 wherein the computer-readable storage medium is selected from a) a hard disk, b) a compact disk, c) a random-access memory (RAM), d) a read only memory (ROM), e) a memory chip, f) a memory card and g) a memory stick.
24) The processor as claimed in claim 1 wherein the processor is selected from a) dual microprocessor and b) multi-processor architectures.
25) The processor as claimed in claim 1 wherein the processor is selected from a) a CPU (Central processing unit), b) a CPU and a GPU (Graphics processing unit), and c) a CPU and a TPU (Tensor processing unit).
26) The processor as claimed in claim 1 that accepts the whole slide images and the Quality check algorithm through the interface, identifies the whole slide images that meet the quality check, generates the region of interest (ROI) mask and returns the whole slide images that meet the quality check and the region of interest (ROI) mask to the computer readable medium.
27) The processor as claimed in claim 1 that accepts the tile generation algorithm and the whole slide images that meet the quality check from the computer readable storage medium through the interface and returns the tiles generated and the tile generation algorithm to the computer readable medium.
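The tile generation step can be sketched as follows, assuming a non-overlapping grid and a binary ROI mask at the same resolution as the image. The claims do not specify the tile size, the stride, or how the ROI mask constrains tiling, so `generate_tile_coordinates`, `tile_size` and `min_tissue_fraction` are hypothetical names and parameters.

```python
def generate_tile_coordinates(roi_mask, tile_size, min_tissue_fraction=0.5):
    """Return (left, top) corners of grid tiles that lie mostly inside the ROI.

    roi_mask is a list of rows of 0/1 values (1 = region of interest).
    Non-overlapping tiling and the tissue-fraction filter are assumptions.
    """
    height = len(roi_mask)
    width = len(roi_mask[0]) if height else 0
    coords = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            # Count ROI pixels inside the candidate tile.
            tissue = sum(
                roi_mask[r][c]
                for r in range(top, top + tile_size)
                for c in range(left, left + tile_size)
            )
            if tissue / (tile_size * tile_size) >= min_tissue_fraction:
                coords.append((left, top))
    return coords


if __name__ == "__main__":
    # Toy 4x4 mask standing in for a real ROI mask.
    mask = [
        [1, 1, 0, 1],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(generate_tile_coordinates(mask, tile_size=2))
```

In practice the returned coordinates would be used to crop image tiles from the whole slide image; the cropping itself depends on the slide-reading library and is left out of this sketch.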
28) The processor as claimed in claim 1 that accepts the tiles generated as images and prediction algorithm from the computer readable medium through the interface and returns through the interface tile level probabilities generated to the computer readable storage medium.
29) The processor as claimed in claim 1 that accepts the tile level probabilities and aggregation algorithm from the computer readable medium and returns through the interface the decision regarding selecting an ovarian cancer patient for PARP inhibitor treatment to the computer readable medium.
30) The processor as claimed in claim 1 that accepts the tile level probabilities and aggregation algorithm from the computer readable medium and returns through the interface the heat map to the computer readable medium.
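One way a heat map could be derived from tile level probabilities is sketched below: each tile's probability is placed in a grid cell matching the tile's position on the slide. The claims do not describe how the heat map is built or rendered, so the grid layout, the `build_heat_map` name and the use of `None` for untiled regions are assumptions.

```python
def build_heat_map(tile_probs, tile_size, background=None):
    """Arrange {(left, top): p_hrd} into a row-major grid of probabilities.

    Cells with no tile (for example, regions outside the ROI) keep the
    background value. This layout is an illustrative assumption.
    """
    if not tile_probs:
        return []
    cols = max(x for x, _ in tile_probs) // tile_size + 1
    rows = max(y for _, y in tile_probs) // tile_size + 1
    grid = [[background] * cols for _ in range(rows)]
    for (x, y), p in tile_probs.items():
        grid[y // tile_size][x // tile_size] = p
    return grid


if __name__ == "__main__":
    # Hypothetical tile probabilities keyed by top-left pixel coordinates.
    for row in build_heat_map({(0, 0): 0.9, (512, 512): 0.1}, tile_size=512):
        print(row)
```

The resulting grid could then be colour-mapped and saved in an image format with any plotting or imaging library.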
31) The device to visualize the results of the method for selecting the said patient for PARP inhibitor treatment as claimed in claim 1 that accepts the decision regarding selecting an ovarian cancer patient for PARP inhibitor treatment from the computer readable medium through the interface and provides an output selected from a) visual display, b) print out.
32) The device to visualize the results of the method for selecting the said patient for PARP inhibitor treatment as claimed in claim 1 that accepts the heat map from the computer readable medium and produces the output in a format selected from a) visual display, b) print out.
33) The interface to execute transfer of the data and the algorithm between the processor, the computer readable medium and the device c) as claimed in claim 1 wherein the interface is a bus.
34) The bus as claimed in claim 20 wherein the bus includes a) a data bus, b) an address bus, and c) a control bus.
35) The interface as claimed in claim 1, which accepts whole slide images which qualify the quality check in the image format as claimed in claim 4 and the tile generation algorithm as claimed in claim 5 from the computer-readable medium to the processor and returns the tiles generated and the tile generation algorithm to the computer readable medium.
36) The interface as claimed in claim 1, which accepts tiles generated as images in the image format as claimed in claim 4 and the training algorithm as claimed in claim 5 from the computer-readable medium to the processor and returns the training algorithm to the computer readable medium.
37) The interface as claimed in claim 1 which accepts the prediction algorithm as claimed in claim 5 from the computer-readable storage medium to the processor and returns the decision for selecting an ovarian cancer patient for PARP inhibitor treatment and the prediction algorithm to the computer readable medium.
38) The interface as claimed in claim 1 which accepts the aggregation algorithm from the computer-readable medium to the processor and the decision regarding selecting an ovarian cancer patient for PARP inhibitor treatment and returns the aggregation algorithm to the computer readable medium.
39) The interface as claimed in claim 1 which accepts the decision regarding selecting an ovarian cancer patient for PARP inhibitor treatment from the computer readable storage medium and returns the same to the device to visualize the decision.
PCT/IN2024/050436 2024-02-16 2024-04-24 A method for selecting ovarian cancer patients for parp inhibitor treatment Pending WO2025173014A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202421010878 2024-02-16
IN202421010878 2024-02-16

Publications (1)

Publication Number Publication Date
WO2025173014A1 (en) 2025-08-21

Family

ID=91302601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2024/050436 Pending WO2025173014A1 (en) 2024-02-16 2024-04-24 A method for selecting ovarian cancer patients for parp inhibitor treatment

Country Status (1)

Country Link
WO (1) WO2025173014A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9388472B2 (en) 2011-12-21 2016-07-12 Myriad Genetics, Inc. Methods and materials for assessing loss of heterozygosity
US10400287B2 (en) 2014-08-15 2019-09-03 Myriad Genetics, Inc. Methods and materials for assessing homologous recombination deficiency
CA3133826A1 (en) * 2019-03-26 2020-10-01 Tempus Labs, Inc. Determining biomarkers from histopathology slide images
WO2021019311A1 (en) 2020-03-19 2021-02-04 Sharifrazi Danial Hybrid recommender system equipped with facial expression recognition and machine learning
US20220319704A1 (en) * 2019-05-06 2022-10-06 Tesaro, Inc. Methods for characterizing and treating a cancer type using cancer images

Non-Patent Citations (40)

* Cited by examiner, † Cited by third party
Title
ANNALS OF ONCOLOGY, vol. 22, 2011, pages 268 - 279
ANNALS OF ONCOLOGY, vol. 31, no. 12, 2020, pages 1606 - 1622
ANNALS OF ONCOLOGY, vol. 32, no. 12, 2021, pages 1582 - 1589
BEHAR, N.; SHRIVASTAVA, M., CMES-COMPUTER MODELLING IN ENGINEERING AND SCIENCES, vol. 130, no. 2, 2022, pages 823 - 839
BOUWMAN ET AL., NAT STRUCT MOL BIOL, vol. 17, 2010, pages 688 - 695
BRITISH JOURNAL OF CANCER, vol. 113, 2015, pages S17 - S21
BUNTING ET AL., CELL, vol. 141, no. 2, 2010, pages 243 - 54
CANCER RES., vol. 72, 2012, pages 5675 - 82
CANCERS, vol. 12, 2020, pages 1607
CANCERS, vol. 14, no. 6, 2022, pages 1420
CHIN J CANCER., vol. 30, no. 7, 2011, pages 463 - 471
CINAR, A.; YILDINM, M.; EROGLU, Y., TRAITEMENT DU SIGNAL, vol. 38, no. 1, 2021, pages 165 - 173
CLIN CANCER RES, vol. 22, no. 23, 2016, pages 5651 - 60
COLLOT, T.; NIOGRET, J.; CARNET, M.; CHEVRIER, S.; HUMBLIN, E.; FAVIER, L.; BENGRINE-LEFEVRE, L.; DESMOULINS, I.; ARNOULD, L.; BOIDOT, R., MOLECULAR MEDICINE REPORTS, vol. 23, no. 1, 2021, pages 1 - 8
EMBO MOL MED., vol. 1, 2009, pages 315 - 322
ENDOCRINE-RELATED CANCER, vol. 23, 2016, pages R267 - R285
EXP HEMATOL ONCOL, vol. 8, 2019, pages 29
EXPERT REVIEW OF MOLECULAR DIAGNOSTICS, vol. 20, no. 3, 2020, pages 285 - 292
FRONTIERS IN ONCOLOGY, vol. 12, 2022, pages 1 - 28
GENOME BIOLOGY, vol. 10, 2009, pages R32
GURCAN MN, BOUCHERON LE, CAN A, MADABHUSHI A, RAJPOOT NM, YENER B: "Histopathological image analysis: a review", IEEE REV BIOMED ENG., vol. 2, 2009, pages 147 - 71, XP011507549, DOI: 10.1109/RBME.2009.2034865
HODGSON, D.R.; DOUGHERTY, B.A.; LAI, Z.; FIELDING, A.; GRINSTED, L.; SPENCER, S.; O'CONNOR, M.J.; HO, T.W.; ROBERTSON, J.D.; LANCHBURY, J.S., BRITISH JOURNAL OF CANCER, vol. 119, no. 11, 2018, pages 1401 - 1409
J NATL CANCER INST, vol. 110, no. 7, 2018, pages 704 - 13
J PATHOL, vol. 244, no. 5, 2018, pages 586 - 97
KAIMING HE, XIANGYU ZHANG, SHAOQING REN, JIAN SUN, IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016
KHAN A, SOHAIL A, ZAHOORA U, QURESHI A.S., ARTIF INTELL REV., vol. 53, no. 8, 2020, pages 5455 - 516
LIN, CL; WU, KC., BMC BIOINFORMATICS, vol. 24, 2023, pages 157
LZUBAIDI, L.; ZHANG, J.; HUMAIDI, A.J. ET AL., J BIG DATA, vol. 8, 2021, pages 53
M. LIU, L. CHEN, X. DU, L. JIN, M. SHANG, IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, vol. 34, no. 4, 2023, pages 2156 - 2168
MERCAN EZGI ET AL: "Localization of Diagnostically Relevant Regions of Interest in Whole Slide Images: a Comparative Study", JOURNAL OF DIGITAL IMAGING, SPRINGER INTERNATIONAL PUBLISHING, CHAM, vol. 29, no. 4, 9 March 2016 (2016-03-09), pages 496 - 506, XP036002537, ISSN: 0897-1889, [retrieved on 20160309], DOI: 10.1007/S10278-016-9873-1 *
N ENGL J MED., vol. 361, 2009, pages 123 - 34
NAT REV CLIN ONCOL., vol. 8, no. 5, 2011, pages 302 - 306
NATURE CANCER, vol. 3, 2022, pages 1181 - 1191
NATURE REVIEWS CANCER., vol. 4, 2004, pages 814 - 819
NATURE, vol. 434, 2005, pages 917 - 921
ONCOTARGETS AND THERAPY, vol. 15, 2022, pages 165 - 180
PATCH ET AL., NATURE, vol. 521, 2015, pages 489 - 494
PLOS ONE, vol. 17, no. 3, 2022, pages e0264138
PROSTATE, vol. 74, 2014, pages 70 - 89
VAN WIJK, L.M.; NILAS, A.B.; VRIELING, H.; VREESWIJK, M.P., EXPERT REVIEW OF MOLECULAR DIAGNOSTICS, vol. 22, no. 2, 2022, pages 185 - 199

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24729408

Country of ref document: EP

Kind code of ref document: A1