
WO2025251043A1 - Automated decomposition of projection images to detect target structures of interest - Google Patents

Automated decomposition of projection images to detect target structures of interest

Info

Publication number
WO2025251043A1
Authority
WO
WIPO (PCT)
Prior art keywords
subject
soi
image
target
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/031801
Other languages
English (en)
Inventor
Pengpeng Zhang
Xiang Li
Tianfang LI
Yabo FU
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Memorial Sloan Kettering Cancer Center
Original Assignee
Memorial Sloan Kettering Cancer Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Memorial Sloan Kettering Cancer Center filed Critical Memorial Sloan Kettering Cancer Center
Publication of WO2025251043A1 publication Critical patent/WO2025251043A1/fr
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Definitions

  • One or more processors coupled with memory can obtain a first dataset for a first subject.
  • the first dataset can include (i) a first projection image acquired via a first x-ray scan of a first volume having a first structure of interest (SOI) moving within the first subject and (ii) first information to be used to define motion of the first SOI in the first subject.
  • the one or more processors can apply a machine learning (ML) model to the first dataset.
  • the ML model is established for at least the first subject using training data identifying a plurality of examples.
  • Each of the plurality of examples may include (i) a respective second dataset comprising (a) a second projection image corresponding to a second x-ray scan of a second volume having a second SOI moving within a second subject and (b) second information to be used to define motion of the second SOI in the second subject, and (ii) a respective target defining the second SOI.
  • the one or more processors can identify, based on applying the ML model to the first dataset, a target defining the first SOI within the first projection image.
  • the one or more processors can store, using one or more data structures, an association between the first subject and the target.
  • the one or more processors can generate a third projection image identifying the first SOI, using the first projection image and the target. In some embodiments, the one or more processors can provide an output including the third projection image for presentation. In some embodiments, the one or more processors can determine, based on applying an encoder of the ML model to the first dataset, (i) a first feature corresponding to the first projection image and (ii) a second feature corresponding to the first information. In some embodiments, the one or more processors can generate, based on applying a cross-attention integrator of the ML model to the first feature and the second feature, a third feature.
  • the one or more processors can identify, based on applying a decoder of the ML model to the third feature, the target defining the SOI.
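By way of illustration, a minimal sketch of this encoder, cross-attention integrator, and decoder flow is shown below. The layer sizes, the shared encoder, and the use of the Keras MultiHeadAttention layer are illustrative assumptions rather than the disclosed architecture.

```python
import tensorflow as tf

# Illustrative components only; not the patented design.
encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
])
integrator = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)
decoder = tf.keras.Sequential([
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same",
                                    activation="relu"),
    tf.keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same"),
])

def identify_target(projection, prior_info):
    f_proj = encoder(projection)    # first feature (projection image)
    f_prior = encoder(prior_info)   # second feature (motion-defining prior)
    h, w, c = f_proj.shape[1:]
    q = tf.reshape(f_proj, (-1, h * w, c))    # flatten spatial grid to tokens
    kv = tf.reshape(f_prior, (-1, h * w, c))
    fused = integrator(query=q, value=kv, key=kv)   # third feature
    fused = tf.reshape(fused, (-1, h, w, c))
    return decoder(fused)           # target defining the SOI

target = identify_target(tf.random.normal((1, 64, 64, 1)),
                         tf.random.normal((1, 64, 64, 1)))
```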
  • the one or more processors can receive a plurality of tomographic images of the first volume in the first subject, prior to administration of a radiotherapy on the first subject and acquisition of the first projection image.
  • the one or more processors can generate, using the plurality of tomographic images, a first digitally reconstructed radiograph (DRR) to use as the first information for the first subject.
  • the one or more processors can generate a command signal indicating an action for an apparatus upon which the subject is, based on identifying the target defining the first SOI in motion in the first projection image.
  • the one or more processors can provide, to an actuator mechanically coupled with the apparatus, the command signal to perform the action.
  • the action can include at least one of a translation, rotation, or tilt relative to a beam emitter configured to administer radiotherapy on the subject.
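For illustration, a hypothetical command-signal payload and a helper that converts a detected SOI displacement into a corrective couch action might look as follows; the field names, units, and sign convention are assumptions, not the disclosure's format.

```python
from dataclasses import dataclass

@dataclass
class CouchCommand:                # hypothetical payload; fields are assumptions
    dx_mm: float = 0.0             # lateral translation
    dy_mm: float = 0.0             # longitudinal translation
    dz_mm: float = 0.0             # vertical translation
    pitch_deg: float = 0.0         # tilt relative to the beam emitter
    yaw_deg: float = 0.0           # rotation relative to the beam emitter

def command_from_offset(offset_mm):
    """Convert a detected SOI displacement into a corrective couch shift."""
    # Move the apparatus opposite to the detected target motion.
    return CouchCommand(dx_mm=-offset_mm[0],
                        dy_mm=-offset_mm[1],
                        dz_mm=-offset_mm[2])
```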
  • the one or more processors can acquire the first x-ray scan of the first volume within the first subject while the subject is on an apparatus.
  • the first dataset may include a first plurality of projection images acquired via the first x-ray scan of a first volume having the first structure of interest (SOI) moving within the first subject.
  • SOI structure of interest
  • the one or more processors can identify, based on applying the ML model to the first plurality of projection images, a plurality of decomposed images defining the first SOI across the first plurality of projection images. Each of the plurality of decomposed images may define a respective target.
  • the one or more processors may identify, from the plurality of decomposed images, a reference image to compare against a remainder of the plurality of decomposed images.
  • the one or more processors may generate a motion trace based on the target in the reference image relative to the target across the remainder of the plurality of decomposed images.
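A minimal sketch of building such a motion trace follows, assuming the first decomposed image serves as the reference and using phase correlation as one possible matcher (the disclosure does not prescribe a specific matcher):

```python
import numpy as np
from skimage.registration import phase_cross_correlation

def motion_trace(decomposed_frames):
    """decomposed_frames: list of 2D arrays; frame 0 is taken as the reference."""
    reference = decomposed_frames[0]
    trace = []
    for frame in decomposed_frames[1:]:
        # Offset of the target in this frame relative to the reference image.
        shift, _, _ = phase_cross_correlation(reference, frame)
        trace.append(shift)
    return np.array(trace)          # one (row, col) offset per remaining frame
```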
  • the one or more processors can receive the first projection image from an imaging device performing the first x-ray scan, at least in partial concurrence with an administration of a radiotherapy on the first subject.
  • the first subject is to be administered a radiotherapy for cancer.
  • the radiotherapy can include at least one of an intensity-modulated radiation therapy (IMRT), stereotactic body radiation therapy (SBRT), image-guided radiation therapy (IGRT), or brachytherapy.
  • IMRT intensity-modulated radiation therapy
  • SBRT stereotactic body radiation therapy
  • IGRT image-guided radiation therapy
  • the ML model may be trained for at least one of (i) the first subject same as the second subject or (ii) a plurality of subjects including the first subject and the second subject.
  • the first projection image for the first dataset may include at least one of: (i) a single projection image or (ii) a projection video comprising a plurality of projection image frames.
  • the ML model may include at least one of (i) a convolutional neural network, (ii) a transformer network, or (iii) a diffusion model.
  • Other aspects of the present disclosure relate to systems and methods of training models to identify targets in x-ray projection images.
  • One or more processors can obtain training data identifying (i) a dataset for a subject comprising (a) a projection image acquired via an x-ray scan of a volume having a structure of interest (SOI) moving within the subject and (b) information to be used to define motion of the SOI in the subject, and (ii) a first target defining the SOI.
  • SOI structure of interest
  • the one or more processors can apply a machine learning (ML) model comprising a plurality of parameters for identifying SOIs to the dataset in at least the subject.
  • the one or more processors can identify, based on applying the ML model to the dataset, a second target defining the SOI within the projection image.
  • the one or more processors can determine at least one loss metric based on at least one of the first target or the second target.
  • the one or more processors can update at least one of the plurality of parameters of the ML model using the at least one loss metric.
  • the one or more processors may determine, based on applying an encoder of the ML model to the dataset, (i) a first feature corresponding to the projection image and (ii) a second feature corresponding to the information.
  • the one or more processors may generate, based on applying a cross-attention integrator of the ML model to the first feature and the second feature, a third feature.
  • the one or more processors may apply a decoder of the ML model to the third feature to identify the second target defining the SOI.
  • the one or more processors may receive a plurality of tomographic images of the volume in the subject, prior to administration of a radiotherapy on the subject and acquisition of the projection image.
  • the one or more processors may generate, using the plurality of tomographic images, a first digitally reconstructed radiograph (DRR) to use as the information for the subject.
  • the one or more processors may generate, using a discriminator, a classification indicating at least one of (i) the second target or (ii) a feature to derive the second target using the ML model as one of real or fake.
  • the one or more processors may determine the at least one loss metric based on the classification.
  • the one or more processors may compare the first target with the second target to determine the at least one loss metric identifying a degree of deviation between the first target and the second target.
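For illustration, the two ingredients described above, a deviation metric between the first and second targets and an adversarial term derived from the discriminator's real/fake classification, could be combined as in the following sketch; the L1 deviation and the weighting are assumptions:

```python
import tensorflow as tf

l1 = tf.keras.losses.MeanAbsoluteError()
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(first_target, second_target, fake_score, adv_weight=0.01):
    deviation = l1(first_target, second_target)       # degree of deviation
    # Adversarial term: push the discriminator to classify the output as real.
    adversarial = bce(tf.ones_like(fake_score), fake_score)
    return deviation + adv_weight * adversarial
```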
  • the training data further comprises a plurality of examples to train the ML model to identify SOIs across a plurality of subjects.
  • Each of the plurality of examples may include (i) a dataset for a respective subject of the plurality of subjects comprising (a) a respective projection image acquired via an x-ray scan of a volume having a respective SOI moving within the subject and (b) respective information to be used to define motion of the respective SOI in the respective subject, and (ii) a respective target defining the respective SOI.
  • the SOI may correspond to at least one organ in the subject to be administered the radiotherapy for cancer.
  • the at least one organ may include a spine, a lung, a breast, a gastrointestinal tract, a pelvis, a bone, a tissue, or a lymph.
  • Aspects of the present disclosure are directed to systems and methods of generating command signals for targets in x-ray projection images.
  • One or more processors coupled with memory may obtain a first dataset for a first subject.
  • the first dataset may include: (i) a first plurality of projection images corresponding to a first x-ray scan of a first volume having a first structure of interest (SOI) moving within the first subject and (ii) first information to be used to define motion of the first SOI in the first subject.
  • SOI structure of interest
  • the one or more processors may apply a machine learning (ML) model to the first dataset.
  • the ML model may be established for at least the first subject using training data identifying a plurality of examples.
  • Each of the plurality of examples may include: (i) a respective second dataset comprising (a) a second plurality of projection images corresponding to a second x-ray scan of a second volume having a second SOI moving within a second subject and (b) second information to be used to define motion of the second SOI in the second subject, and (ii) a respective plurality of decomposed images defining the second SOI across the second plurality of projection images.
  • the one or more processors may identify, based on applying the ML model to the first dataset, a plurality of decomposed images defining the first SOI across the first plurality of projection images.
  • the one or more processors may generate a motion trace based on the target in a reference image identified from the plurality of decomposed images relative to the target across the remainder of the plurality of decomposed images.
  • the one or more processors may generate a command signal indicating an action for an apparatus upon which the subject is, based on the motion trace.
  • the one or more processors may provide, to an actuator mechanically coupled with the apparatus, the command signal to perform the action.
  • the action may include at least one of a translation, rotation, or tilt relative to a beam emitter configured to administer radiotherapy on the subject.
  • the one or more processors may acquire the first x-ray scan of the first volume within the first subject while the subject is on an apparatus.
  • the one or more processors may receive the first plurality of projection images from an imaging device performing the first x-ray scan, at least in partial concurrence with an administration of a radiotherapy on the first subject.
  • the first SOI may correspond to at least one organ in the subject to be administered with radiotherapy for cancer.
  • the at least one organ may include a spine, a lung, a breast, a gastrointestinal tract, a pelvis, a bone, a tissue, or a lymph, wherein the first subject is same as or different from the second subject.
  • the first subject may be administered a radiotherapy for cancer, wherein the radiotherapy comprises at least one of an intensity-modulated radiation therapy (IMRT), stereotactic body radiation therapy (SBRT), image-guided radiation therapy (IGRT), or brachytherapy.
  • at least one of the second plurality of projection images may include at least one of a simulation of the second x-ray scan or an acquisition of the second x-ray scan.
  • FIG. 2 The workflow of the proposed method.
  • a Pix2Pix patient-specific model was trained using the TS-DRR and DRR image pairs.
  • a synthetic TS-DRR can be generated from the real-time kV projection images to enhance the target visibility.
  • the synthetic TS-DRR was registered to the tumor template TS-DRR generated from the planning CT for intrafraction tumor motion monitoring.
  • FIG. 4 The LUNGMAN phantom with 12 mm diameter tumor. Lung and tumor were deformed by manually pushing the abdomen block in.
  • FIG. 5 CBCTs showing that the tumor moved superiorly.
  • (A) Tumor is at isocenter.
  • (B–D) Tumor was deformed superiorly by 1.8 mm, 5.8 mm, and 9 mm, respectively.
  • FIG. 6 Top row: IMR images, Bottom row: respective sTS-DRR. Same landmarks per projection angle were annotated to show the anatomical correspondence between the IMR and its respective sTS-DRR.
  • FIG. 8 Top row: IMR images, Bottom row: respective sTS-DRR. Same landmarks per projection angle were annotated to show the anatomical correspondence between the IMR and its respective sTS-DRR.
  • FIG. 9 (A–C): Tumor moved 1.8 mm, 5.8 mm, and 9 mm superiorly. Six projection angles were selected at ~50° intervals. Each triplet shows the projection image, the sTS-DRR, and the ground truth (GT). The projection angle is shown in red. The red cross landmark shows the tumor center identified in the GT.
  • FIG. 10 (A–C): Image correlation and SSIM for a tumor motion of 1.8 mm (A), 5.8 mm (B), and 9 mm (C) superiorly. Top row: Image correlation comparison between CBCT projection and synthetic TS-DRR. Bottom row: SSIM comparison between CBCT projection and synthetic TS-DRR.
  • FIG. 11 (A–C): Registration error and histogram for a tumor motion of 1.8 mm (A), 5.8 mm (B) and 9 mm (C) superiorly.
  • Top TS-DRR versus sTS-DRR registration.
  • Bottom TS-DRR versus CBCT projection images registration.
  • FIG. 12 (A) Lung tumor was vaguely visible in projection images; FIG. 12 (B) Lung tumor was hardly visible in projection images.
  • Projection CBCT projection images
  • sTS-DRR sTS-DRR based on projection images
  • 50% TS-DRR TS-DRR generated based on 50% phase of 4DCT.
  • FIG. 13 (A,B) Tumor motion trajectory in 2D obtained by template matching the GTV in the x and y directions.
  • FIG. 13 (C–E) The tumor motion trajectory in 3D calculated using sequential triangulation.
  • FIG. 14 The schematic of PCAT network structure. Left panel: the data flow of the dual branch generator (DBG). Middle panel: the architecture of the building blocks of DBG, from top to bottom, encoder, MHCA, and decoder, respectively. Right panel: dual function discriminator (DFD) network structure and the annotations for the basic layers and operations.
  • DBG dual branch generator
  • Middle panel the architecture of the building blocks of DBG, from top to bottom, encoder, MHCA, and decoder, respectively.
  • Right panel dual function discriminator (DFD) network structure and the annotations for the basic layers and operations.
  • FIG. 16A Side-by-side comparison of the 2D kV images (first column), the decomposed spine images using ResNetGAN (second column), the reference spine DRR (third column), and the decomposed spine images using the proposed PCAT (fourth column). Random examples are shown at x-ray beam angles 260°, 235°, 160°, and 145°. The lines on the images show the location where the line profiles were performed for FIG. 16B.
  • FIG. 16B The line profiles of the 2D kV images (blue), the decomposed spine images using ResNetGAN (pink), the reference spine DRR (green), and the decomposed spine images using the proposed PCAT (red) of examples shown in FIG. 16A.
  • the location where the line profiles are performed is displayed using the lines on the images in FIG. 16A.
  • FIG. 17 Comparison of quantified SSIM and PSNR between the kV image versus reference spine DRR and the decomposed spine image to the reference spine DRR.
  • FIG. 18 Violin plot for comparison of the motion tracking accuracy in the anteroposterior (AP) and lateral (LAT) x-ray beam angle groups for the testing patients from the kV image, the ResNetGAN, and the proposed PCAT decomposed spine image to reference spine DRR rigid registration.
  • AP 0°–45°, 135°–225°, and 315°–360°.
  • LAT 45°–135° and 225°–315°.
  • the violin plot shows the median (thick red line), mean (thick black line), and individual density distribution curve on the sides of the vertical center line.
  • FIG. 19 depicts a block diagram of an example network architecture to create high-quality X-ray imaging by decomposing the regions of interest from the X-ray projection image using a Deep-Learning method.
  • FIG. 20 depicts a block diagram of an example network architecture to use the target decomposed X-ray image to match the reference image template for motion monitoring in Radiotherapy.
  • FIG. 21 depicts a block diagram of an example network architecture to implement the target decomposition technique with patient-specific model training.
  • FIG. 22 depicts a block diagram of an example network architecture to implement the target decomposition technique with a population-based model training and then refine it with the new patient information for a particular patient treatment.
  • FIG. 23 depicts a block diagram of an example network architecture to implement the target decomposition technique with a patient-specific prior population-based model, which includes two inputs: the X-ray image and the patient-relevant prior information.
  • FIG. 24 depicts a block diagram of an input for target decomposition as a single X-ray image or an image sequence.
  • FIGs. 25-27 depict block diagrams of an example of machine learning architectures for target decomposition based on a convolutional neural network, vision transformer architecture, or diffusion model.
  • FIG. 28 depicts a block diagram of a system for identifying targets in x-ray projection images, in accordance with an illustrative embodiment.
  • FIG. 29 depicts a block diagram of a process for training a machine learning (ML) model to identify a target defining a structure of interest (SOI) within a projection image, in accordance with an illustrative embodiment
  • FIG. 30 depicts a block diagram of a process for applying the ML model to a dataset to identify the target defining the structure of interest, in accordance with an illustrative embodiment
  • FIG. 31 depicts a block diagram of a process for generating a signal command to control an actuator on a table based on a movement of the target, in accordance with an illustrative embodiment
  • FIG. 32 depicts a flow diagram of a method of training the ML model, in accordance with an illustrative embodiment
  • FIG. 34 depicts a block diagram of a server system and a client computer system, in accordance with one or more implementations.

DETAILED DESCRIPTION

Following below are more detailed descriptions of various concepts related to, and embodiments of, systems and methods for identifying targets in x-ray projection images. It should be appreciated that the various concepts introduced above and discussed in greater detail below may be implemented in any of numerous ways, as the disclosed concepts are not limited to any particular manner of implementation.
  • Section A describes enhancing the target visibility with synthetic target specific digitally reconstructed radiograph for intrafraction motion monitoring.
  • Section B describes patient specific prior cross attention for kV decomposition in paraspinal motion tracking.
  • Section C describes intrafractional markerless lung tumor tracking.
  • Section D describes network architectures for identifying targets in x-ray projection images.
  • Section E describes a system and method for identifying targets in x-ray projection images.
  • Section F describes a network environment and computing environment which may be useful for practicing various computing related embodiments described herein.
  • IGRT image-guided radiation therapy
  • EBRT external beam radiotherapy
  • CBCT Cone Beam CT
  • ITV internal target volume
  • MR-LINAC systems, for example the ViewRay MRIdian and Elekta Unity, have been developed to combine MR imaging with a LINAC to leverage the superior soft tissue contrast of MRI for radiotherapy.
  • BrainLab ExacTrac and Vero Gimbal systems use stereoscopic kV imaging with external breathing signals for patient motion monitoring during treatment.
  • Other motion tracking systems include electromagnetic transponders (for example, Varian Calypso), surface imaging (for example, AlignRT), and ultrasound imaging. In this study, the focus is on kV x-ray imaging-based intrafraction motion monitoring using an onboard imager that is equipped on most modern LINACs.
  • fiducial marker implantation is an invasive and costly procedure that is usually associated with medical risks and is therefore not universally available to patients.
  • the small number of implanted fiducial markers is a sparse point representation of the 3D solid tumor, which could change shape and volume over the course of treatment. Markers can also migrate, yielding positional errors in the kV images. Markerless target monitoring using kV imaging is highly desired.
  • markerless lung tumor motion tracking methods were systematically investigated and benchmarked.
  • One common challenge for markerless tumor tracking is the low tumor visibility of the onboard kV projection images.
  • the tumor target is obscured by overlapping structures along the x-ray projection path, resulting in low target visibility.
  • One approach performed MV/kV imaging-based lung tumor tracking and reported that one major obstacle is low tumor visibility.
  • Another approach performed kV imaging-based markerless lung tumor tracking and showed that the low tumor visibility was one of the major causes for unsuccessful tracking.
  • yet another approach investigated the use of dual energy x-ray imaging to enhance tumor visibility by removing overlapping bony structures, such as the ribs.
  • Another approach showed the feasibility of optimizing dual-energy x-ray parameters to enhance soft-tissue imaging.
  • Another approach proposed a thoracic bone suppression algorithm to enhance the sensitivity and specificity of the detection and localization of lung nodules.
  • Another approach proposed a deep learning model to decompose the spine from x-ray projection images, which can be used to improve paraspinal tumor tracking.
  • Another approach developed a deep learning-based patient-specific model to reconstruct volumetric CT images from a single projection image or a few projection images. The model was tested on one upper-abdomen patient, one head-and-neck patient, and one lung patient. This method demonstrated the feasibility of using a patient-specific model with prior CT knowledge to reconstruct volumetric CT images.
  • Another approach proposed reconstructing volumetric CT from a single digitally reconstructed radiograph (DRR) for lung patients.
  • DRR digitally reconstructed radiograph
  • This method was trained on phases of lung 4DCT images and tested on the remaining phase to demonstrate the potential use for tumor tracking. Based on tests on 20 patients, the method was able to reconstruct the volumetric CT with the tumor center-of-mass positional accuracy within 2.6 mm.
  • the above-mentioned methods have demonstrated the great potential of using patient-specific deep learning models for sparse-view CT reconstruction.
  • one common limitation is that their models were tested only on DRRs rather than on the real onboard projection images.
  • CBCT is often acquired after the kV/kV or kV/MV setup pair to further improve the target positioning accuracy before beam delivery.
  • the target visibility of the projection images is affected by the kV energy used, detector efficiency, soft tissue attenuation coefficient and so on. Nevertheless, one of the major contributing factors to low target visibility is the overlapping structures with the target along the x-ray projection path.
  • a new imaging modality is introduced: the target-specific DRR, or TS-DRR for short.
  • the reconstruction process of TS-DRR is shown in FIG. 1. Considering one thoracic vertebra as the planning target volume (PTV), the CT volume is truncated to include only the section that overlaps with the PTV for TS-DRR generation.
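A minimal parallel-beam sketch of this truncate-and-project step is shown below; actual DRR generation uses the divergent treatment-beam geometry at each gantry angle, which is simplified away here:

```python
import numpy as np

def ts_drr(ct_volume, ptv_mask, axis=0):
    """ct_volume, ptv_mask: 3D arrays; axis: the (parallel) ray direction."""
    # Keep only the CT section that overlaps the PTV along the ray direction.
    other_axes = tuple(i for i in range(3) if i != axis)
    keep = ptv_mask.any(axis=other_axes)
    truncated = np.compress(keep, ct_volume, axis=axis)
    # Line-integral projection of the truncated slab yields the TS-DRR.
    return truncated.sum(axis=axis)
```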
  • sTS-DRR synthetic target-specific DRR
  • the proposed workflow of using sTS-DRR for treatment is shown in FIG. 2. During treatment, the patient was first positioned on the couch using kV setup pairs and CBCT.
  • intrafraction kV projection images were taken at a pre-determined frequency.
  • the real-time kV projection images such as the Intrafraction Motion Review images (IMR) were fed to the trained model to generate sTS-DRR with enhanced target visibility.
  • the IMR is a real-time, 2D motion management tool on the Varian TrueBeam™ system featuring triggered imaging acquired with the OBI during beam delivery.
  • Tumor TS-DRR template was generated from the planning CT.
  • the tumor in the sTS-DRR generated from the real-time kV projection images was registered to the tumor template in the TS-DRR for motion tracking. Based on the calculated tumor motion, motion management techniques such as beam gating or multileaf collimator tumor tracking could be used during beam delivery.
  • Such a patch-level discriminator architecture has fewer parameters than a full-image discriminator.
  • the Pix2Pix network can be trained to create the patient-specific model for sTS-DRR generation based on projection images.
  • 2.3. Training and testing
  • The original Pix2Pix paper focused on image style transfer without an explicit emphasis on rigorous pixel-to-pixel correspondence. Minor training image pair misalignments can be tolerated with the method.
  • the training image pairs were generated by cropping the roughly aligned source-and-target domain images at different pixel locations that are within a certain distance.
  • the DRR and TS-DRR image pairs are prepared to have exact pixel-to-pixel correspondence. Therefore, the training image pairs were cropped at the same pixel location for both the DRR and TS-DRR.
  • translation-invariance is a desirable feature for the neural network, meaning the network prediction is robust to minor image shift.
  • the translation-invariance feature is undesirable since any input image shifts caused by intrafraction patient motion need to be preserved in the output sTS-DRR.
  • image augmentation was used to train the network to be equivariant to translation, meaning the output image will be shifted equally when the input image is shifted.
  • the CT was randomly shifted in the lateral, longitudinal, and vertical directions in the range of ±1 cm around the isocenter with uniform distribution. This is analogous to rigidly shifting the treatment couch around the treatment isocenter during DRR generation.
  • speckle noise with variance of 0.001 was added to the DRR during training.
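The augmentation just described might be sketched as follows, where generate_drr is a hypothetical helper that produces a DRR/TS-DRR pair for a given isocenter shift and projection angle:

```python
import numpy as np

rng = np.random.default_rng(0)

def augmented_pairs(ct_volume, angle_deg, generate_drr, n_shifts=5):
    """generate_drr: hypothetical helper returning a (DRR, TS-DRR) pair
    for a given isocenter shift (lat/long/vert, in cm) and projection angle."""
    pairs = []
    for _ in range(n_shifts):
        shift_cm = rng.uniform(-1.0, 1.0, size=3)   # uniform within +/-1 cm
        drr, ts = generate_drr(ct_volume, angle_deg, shift_cm)
        drr = drr + drr * rng.normal(0.0, np.sqrt(0.001), drr.shape)  # speckle
        pairs.append((drr, ts))
    return pairs
```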
  • LSGAN least-square GAN
  • the network was trained on an Nvidia RTX A6000 graphics card with 48 GB of memory. The training took around 4–8 h depending on the treatment site and the number of training image pairs used.
  • the number of training image pairs depends on the CT, for example whether a 4DCT or a single CT was used, and on the degree of augmentation, for example the number of random shifts performed per projection angle.
  • the IMR and raw CBCT projection images were cropped from 1024 × 768 to 512 × 512.
  • a simulation CT with sub-millimeter slice thickness should be used for model training because it allows adequate spatial resolution and time for network training before the patient starts treatment. Nevertheless, a CBCT dataset could be used as a surrogate when a high-resolution CT is not available.
  • CT usually has higher image quality but lower spatial resolution than the CBCT.
  • DRRs generated from the same-day CBCT usually have higher anatomy consistency with the IMR images that are acquired shortly after.
  • same-day CBCT is not available until the treatment session begins, which gives limited time for network training.
  • CBCT from previous fractions could be used if the spatial resolution is of absolute importance, for example paraspinal SBRT.
  • 4DCT is often acquired to assess the tumor motion amplitude to decide the appropriate motion management technique, such as free breathing or breath hold. The 4DCT should be used for model training since it contains the tumor motion at 10 different phases within a full respiratory cycle.
  • FIG. 3 shows the proposed pre-processing steps that can effectively mitigate this problem.
  • the raw CBCT projection data were first corrected for the bowtie filter attenuation.
  • the histogram of the projection images was matched to the training DRR with the same projection angle.
  • IMR images were used for patient motion monitoring. Since no bowtie filter is used for IMR, the histogram of the IMR was directly matched to the DRR in FIG. 3(B).
  • the preprocessed projection images are then used to generate the sTS-DRR.
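A minimal sketch of this pre-processing chain is shown below; the bowtie_gain calibration map is an assumed stand-in for the bowtie-filter attenuation correction, and it is skipped for IMR images, which are acquired without a bowtie filter:

```python
from skimage.exposure import match_histograms

def preprocess(projection, training_drr, bowtie_gain=None):
    """bowtie_gain: assumed calibration map for the bowtie-filter attenuation;
    pass None for IMR images, which are acquired without a bowtie filter."""
    if bowtie_gain is not None:
        projection = projection / bowtie_gain   # undo bowtie attenuation
    # Match the projection histogram to the training DRR at the same angle.
    return match_histograms(projection, training_drr)
```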
  • Spine tumor
  • 2.4.1. Phantom study
  • For paraspinal SBRT, the target is usually very close to the spinal cord. In the clinic, IMR images were acquired to monitor patient intrafraction motion during beam delivery to ensure target coverage and spinal cord sparing. In this study, nine-field fixed gantry angle IMRT was used for treatment planning.
  • the nine projection angles of the IMR are [10°, 30°, 50°, 70°, 90°, 110°, 130°, 150°, 170°].
  • An IMR image was acquired every 200 MU in IMRT during beam delivery.
  • the IMR acquired during treatment can be registered to the planning DRR around the PTV.
  • the robustness of the registration is sometimes impaired by the target’s low visibility.
  • the sTS-DRR aims to improve the target visibility and could potentially increase the tumor tracking accuracy and robustness.
  • high-resolution CBCT (0.45 × 0.45 × 0.45 mm³) was used as a surrogate to generate training images.
  • the treatment isocenter for the phantom was at the 7th cervical vertebra.
  • the phantom was manually shifted by 1 mm, 2 mm, 3 mm, and 4 mm in all the lateral, longitudinal, and vertical directions. For each shift, nine IMR images from nine different gantry angles were acquired during beam delivery for analysis. sTS-DRRs were generated based on these IMR images for testing.
  • the trained model is equivariant to translation if the output sTS-DRR is equally shifted when its input IMR is shifted. Since a high-resolution simulation CT with slice thickness ≤ 1 mm was not available for paraspinal patients in the clinic, CBCT was used as a surrogate to generate training images.
  • a patient-specific model was trained for a patient who underwent paraspinal SBRT using nine-field fixed gantry angle IMRT with IMR intrafraction monitoring. The projection angles were the same as for the phantom study.
  • the IMR images were intentionally shifted in the x and y directions by 1 mm, 2 mm, 3 mm and 4 mm.
  • sTS-DRRs were generated based on the shifted IMR for testing.
  • Translation-only image registration was performed using MATLAB between the sTS-DRRs with and without the shift.
  • the MATLAB built-in OnePlusOneEvolutionary optimizer was used with MattesMutualInformation as the image similarity metric.
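As a rough Python analog of this MATLAB workflow, SimpleITK exposes the same optimizer and metric pair; the parameter values below are illustrative rather than the study's exact settings:

```python
import SimpleITK as sitk

def register_translation(fixed_arr, moving_arr):
    fixed = sitk.GetImageFromArray(fixed_arr.astype("float32"))
    moving = sitk.GetImageFromArray(moving_arr.astype("float32"))
    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsOnePlusOneEvolutionary(numberOfIterations=1000,
                                             epsilon=1.5e-4,
                                             initialRadius=1e-3,
                                             growthFactor=1.5)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(sitk.TranslationTransform(fixed.GetDimension()))
    transform = reg.Execute(fixed, moving)
    return transform.GetParameters()    # (x, y) shift in physical units
```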
  • FIG. 4 shows the experimental setup.
  • the abdomen block of the phantom was initially pulled out by ~2 cm to simulate end-inhalation.
  • one CBCT was acquired to align the treatment isocenter to the center of the tumor.
  • Subsequent CBCTs were taken after the abdomen block was pushed in by hand, little by little.
  • the three subsequent CBCTs revealed that the lung was deformed by the abdomen block, and the tumor moved superiorly by 1.8 mm, 5.8 mm, and 9 mm, respectively, as shown in FIG. 5.
  • the tumor was moved by ~1 mm laterally and vertically while being pushed in by the abdomen block.
  • 4DCT or 4DCBCT is preferred over a single static CT or CBCT when generating training image pairs since 4DCT or 4DCBCT captures the tumor motion at different phases throughout a respiratory cycle.
  • the training image pairs were generated using only the CBCT with the tumor at the isocenter (FIG. 5A).
  • simulation CT could be used to train the network.
  • a total of 9000 training image pairs were generated from the 10 phases of CT images by projecting the DRR and TS-DRR at every 2° angle and augmenting with five random shifts within ±1 cm around the isocenter in the lateral, vertical, and longitudinal directions.
  • the same training and testing configurations as for the spine were used for the lung patient.
  • the network was trained for 50 epochs, which took around 8 h. Since real-time kV projection images during beam-on were not available, the CBCT projection images were used for testing to generate the sTS-DRR.
  • PTV, ITV and CTV contours were shown to indicate the location of the tumors.
  • the tumor was manually contoured on the phase 50% CT. The centroid of the tumor was identified based on the binary tumor mask. Template matching with the GTV as the ROI and SSIM as the image similarity metric was performed between the sTS-DRR and the phase 50% TS-DRR to track the tumor.
  • 3. RESULTS
  • 3.1. Spine
  • 3.1.1. Phantom study
  • The IMR images were pre-processed to match the histogram of the training DRR prior to testing, as shown in FIG. 3. FIG. 6 shows the IMR and its respective sTS-DRR at four projection angles. Around 10 landmarks were manually selected for each projection angle.
  • the landmarks were identified as the bony edges or vertebral body corners/edges that can be clearly observed on either the projection image, or the sTS-DRR or on both.
  • the same landmarks were plotted for the IMR and its respective sTS-DRR to allow visual assessment of the anatomical correspondence between the two.
  • FIG. 6 shows that the sTS-DRR greatly improved the target visibility.
  • the anatomy corresponds well between the IMR and sTS-DRR. Because the shoulder obscured the target for certain projection angles such as 70°, 90°, and 110°, the target visibility was very low for these angles.
  • FIG. 6 (70°) shows that the network was able to reconstruct the target structures even when the IMR had very low image contrast.
  • the registration was performed manually by shifting one image in the x and y directions to match the other image.
  • Image fusion and image toggle tools were used to assess the image alignment.
  • the sTS-DRR performed slightly better than the IMR in terms of tracking accuracy.
  • the absolute mean errors were 0.13 ± 0.06 mm in the x direction and 0.31 ± 0.05 mm in the y direction.
  • the absolute mean errors were 0.11 ± 0.05 mm in the x direction and 0.25 ± 0.08 mm in the y direction.
  • FIG. 8 shows the IMR images and their respective sTS-DRR images at four different projection angles.
  • the absolute mean errors of the registration between sTS-DRR with and without 1 mm, 2 mm, 3 mm, and 4 mm shifts are 0.18 ± 0.17 mm and 0.04 ± 0.03 mm in the x and y directions, respectively. Due to the high contrast of the vertebra in the longitudinal direction, the registration error in the longitudinal direction is almost negligible. Compared to the longitudinal direction, the x direction has relatively larger errors, which were mainly due to the less visible vertebral body edge in the IMR.
  • the mean absolute error of 0.18 mm in the x direction is at the sub-pixel level. Therefore, the trained model is considered to be equivariant to translations with sub-pixel accuracy, which is a desired feature for intrafraction motion monitoring.
  • 3.2. Lung
  • 3.2.1. Phantom study
  • The results are shown in FIG. 9, including the CBCT projection images, the generated sTS-DRR, and the ground truth TS-DRR for six projection angles at 50° intervals. FIG. 9 shows that the tumor visibility was greatly improved. The red cross in FIG. 9 marks the tumor center identified in the ground truth.
  • the registration error shows the same trend with slightly larger errors for projection angles around 140° and 320°.
  • the sTS-DRR has increased the image correlation with the ground truth by around 83%, and increased the structural similarity index measure with the ground truth by around 75%.
  • the sTS-DRR and the CBCT projection images were separately registered to the ground truth using the same translation-only registration as the spine case described previously.
  • the registration errors and their respective histograms are shown in FIG. 11.
  • the absolute mean registration errors were 0.1 ± 0.3 mm in both the x and y directions for all target motions.
  • the registration errors are less than 1 mm for 98% of the angles and less than 0.5 mm for 90% of the angles.
  • the absolute mean registration errors were 0.4 ± 0.5 mm in the x direction and 0.4 ± 0.3 mm in the y direction.
  • the registration errors are less than 1 mm for 95% of the angles and less than 0.5 mm for 81% of the angles. Therefore, the sTS-DRR outperformed the CBCT projection images in terms of tumor tracking accuracy.
  • FIG. 12(A) shows the results for projection angles when the lung tumor was vaguely visible by the human eye in the CBCT projection images.
  • FIG. 12(B) shows the results for projection angles when the tumor was hardly visible to human eye in the CBCT projection images.
  • the network was able to reconstruct the tumor location and shape from the image features learned from the training data.
  • the tumor trajectory of the 2D template matching is shown in FIG. 13(A,B). Sequential triangulation was implemented to calculate 3D tumor coordinates from the 2D match results.
  • FIG. 13(C-E) shows the trajectory of the calculated tumor centroid location in the vertical, lateral, and longitudinal directions.
  • the magnitude of the tumor motion was consistent with the tumor motion measured from 4DCT, which was approximately 3.4 mm sup-inf, 2.2 mm lateral, and 2.2 mm vertical.
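A heavily simplified, parallel-beam sketch of the triangulation idea follows: 2D template-match positions (u, v) from several projection angles are combined in a least-squares solve for the 3D centroid. The real algorithm accounts for the divergent cone-beam geometry and weights projections at roughly the same respiratory phase, which this sketch omits:

```python
import numpy as np

def triangulate(angles_deg, u_positions, v_positions):
    """angles_deg: recent gantry angles; (u, v): template-match positions,
    with u in the transverse plane and v along the sup-inf axis."""
    theta = np.radians(angles_deg)
    # Parallel-beam model: u = x*cos(theta) + y*sin(theta).
    A = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    xy, *_ = np.linalg.lstsq(A, np.asarray(u_positions), rcond=None)
    z = float(np.mean(v_positions))   # sup-inf read directly from the imager
    return np.array([xy[0], xy[1], z])
```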
  • the sTS-DRR generated by the patient-specific model can greatly enhance the target visibility in the onboard projection images.
  • the anatomical correspondence demonstrated in the spine and lung studies shows that the network learned to reconstruct the target based on the image features of the input projection images, rather than simply ‘memorizing’ the absolute image coordinates of the training datasets.
  • the lung phantom study shows that the network trained using CBCT with the tumor at the isocenter (FIG. 5A) can be applied to projection images after the tumor has moved to different locations (FIG. 5B–D).
  • the patient studies show that the proposed method can greatly enhance tumor visibility even when the tumor is only vaguely visible in the projection images (FIG. 12A).
  • the sTS-DRR could still reconstruct the tumor, albeit with a blurred tumor shape.
  • the impact of the projection image quality on tumor localization accuracy with the proposed method needs further thorough investigation and is beyond the scope of this study.
  • the sTS-DRR outperformed the IMR for the spine tumor case and the CBCT projection images for the lung tumor case in terms of tumor tracking accuracy.
  • the improvement was due to the enhanced tumor visibility across all projection angles.
  • Low target visibility can be caused by overlapping structures along the X-ray projection line, including soft tissues and bony structures.
  • although the network's performance was slightly impaired when bony structures obscured the target, improved target visibility with increased image correlation and SSIM was still observed for these angles.
  • Sub-millimeter tracking accuracy was achieved using sTS-DRR for the spine tumor, even for angles where the IMR failed to track the tumor due to low target visibility.
  • the mean tracking error was reduced by 75% when using sTS-DRR, as compared to using the CBCT projections.
  • the average time needed to generate one sTS-DRR of size 512 × 512 is 35 ms.
  • the latency is approximately 50 ms.
  • the time interval between two consecutive CBCT projections is approximately 70 ms.
  • the tumor tracking for the current CBCT projection could be finished before the next CBCT projection image is available.
  • the latency could result in delayed beam hold after the tumor has moved out of the PTV.
  • a tighter margin could be used to mitigate the latency effect.
  • the sequential triangulation algorithm relies on multiple previous projections at roughly the same respiratory phase to estimate the tumor location with the least mean square distance. Therefore, if the patient’s breathing pattern is highly irregular, it would result in a large estimation error of the tumor location.
  • translation-only data augmentation is used in this work, for its simplicity, by randomly shifting the CT around the isocenter within ±1 cm. This was shown to be sufficient by the phantom and patient studies. For each projection angle, at least five training image pairs should be generated by randomly shifting the CT around the isocenter.
  • This data augmentation step is crucial since it encourages image feature-based learning and prevents the network from simply ‘memorizing’ the image feature location. Nevertheless, rotations and deformation could be used to further augment the training datasets, which could improve the performance at the cost of increased training time. Furthermore, to incorporate a variety of tumor motion scenarios, artificial lung motion can be generated using principal component analysis of the patient-specific lung deformation. The augmentation could potentially increase the performance by allowing the network to learn from a wide range of tumor motion scenarios.
  • the proposed histogram matching before network testing is an effective way to account for the differences between the training DRR and projection images due to Compton scatter, different image filters, x-ray source energy, mAs, and detectors.
  • the generalizability of a patient-specific model refers to whether the model trained on DRRs generated from the simulation CT/4DCT can be applied to the real-time kV images.
  • data augmentation is introduced for model training, and histogram-matching image pre-processing for model testing.
  • the results show that the patient-specific models generalized well on the IMR for the spine tumor and on the CBCT projection images for the lung tumor.
  • only one case was investigated for the spine and lung tumors, respectively.
  • Each patient is unique in terms of tumor size, location, and tumor occlusion by other structures in the kV projection images.
  • the proposed method needs to be evaluated on more patients with a variety of clinical scenarios to demonstrate its potential in intrafraction motion monitoring.
  • same-day CBCT is preferred if the CBCT has good image quality around the target. This is because (1) same-day CBCT shows greater anatomy consistency with the real-time projection images as compared to simulation CT due to daily anatomy variations, and (2) CBCT has higher spatial resolution than the simulation CT, which helps to mitigate the blurring effect of DRR generation.
  • the challenges of using same-day CBCT for training are two-fold: (1) the CBCT generally has lower image quality than the simulation CT due to patient motion blur and scattering, which could be mitigated by using 4DCBCT for targets with motion or a fast CBCT acquisition; and (2) the same-day CBCT is not available until the treatment session begins, which leaves limited time for network training.
  • the proposed method could be used to improve the markerless tumor tracking accuracy for external beam treatment.
  • B. Patient-Specific Prior Cross Attention for kV Decomposition in Paraspinal Motion Tracking
  • 1. INTRODUCTION AND PURPOSE
  • Online kV imaging has been used for motion monitoring during external beam radiotherapy. However, kV-based markerless motion tracking is limited by the reduced contrast of the target, mainly due to the superimposition of other structures along the x-ray beam path.
  • An image processing technique that can separate a particular object from an x-ray image, referred to as image decomposition, would give the best contrast of the object and could lead to far-reaching applications for online kV images.
  • a deep learning-based medical image decomposition approach may be developed using the ResNetGAN.
  • deep-learning-based image generative models tend to generate artifacts when the training data distribution is limited, and downstream applications are susceptible to adversarial attacks. Consequently, this reduces confidence in applying deep-learning models in medicine.
  • a new input of patient-specific prior may be introduced to improve the generalizability of the deep-learning model.
  • Prior information has been widely used to enhance performance in solving medical imaging problems, for example, in the fields of image registration, image reconstruction, image segmentation, denoising, image artifact removal, and synthetic image generation.
  • prior knowledge is often incorporated into a model with an explicit form.
  • an edge-preserving prior took the form of a quadratic penalty or potential function.
  • the total variation was employed as the L1 norm.
  • the prior image-constrained compressed sensing used mean square errors to measure the similarity between corresponding pixels in the image reconstructed and the prior image.
  • the regularization in the latent space with the patient-specific prior is learnable rather than pre-defined.
  • the multi-head cross attention (MHCA) may be employed to calculate the attention map for fine recovery in the decoder by distilling the correlated features between the x-ray projection image and the patient-specific prior.
  • MHCA has the mechanism to jointly allow the deep learning model to attend to cross-branch information from different representation subspaces.
  • the patient-specific cross attention (PCAT) network structure has the MHCA embedded in a Wasserstein Generative Adversarial Network (WGAN).
  • Motion tracking may be used in paraspinal SBRT to demonstrate the x-ray projection enhancement with deep-learning-based image decomposition.
  • in spine SBRT, 2D kV projections are acquired and compared with spine digitally reconstructed radiographs (DRRs) for patient motion monitoring.
  • DRR spine digitally reconstructed radiograph
  • the kV-to-DRR matching accuracy is limited, owing to the spine visibility being reduced by the superimposition of other structures.
  • the image quality of the decomposed spine image and the motion detection accuracy may be assessed for spine tracking in paraspinal SBRT. The results are thoroughly compared with the ResNetGAN, and it is shown that the proposed PCAT approach has improved performance in both image quality and motion detection accuracy for spine tracking in paraspinal SBRT.
  • the Varian TrueBeam system can acquire kV images during radiation delivery for patient motion monitoring. These images, also known as intrafraction motion review (IMR) images, are triggered based on gantry angles, time, or delivered monitor units (MUs) during treatment.
  • IMR intrafraction motion review
  • MUs delivered monitor units
  • the quality of IMR images can vary and may not be sufficient for motion tracking in certain patients due to significant attenuation at specific gantry angles. This necessitates enhancing the image quality.
  • although a deep-learning network model can be trained using these IMR images, they are limited in availability per patient. Therefore, a different source of kV images may be used to demonstrate the effectiveness of the proposed deep-learning approach.
  • the kV images to be decomposed were obtained from the patient’s CBCT scan before radiation treatment (RT), specifically the prior-RT CBCT projections.
  • RT radiation treatment
  • the 3D volume from the same CBCT scan was used to generate the ground-truth spine image, which is the digitally reconstructed radiograph (DRR) of the spine only.
  • the CBCT raw projection data for each patient were sampled at x-ray beam angles 5° apart to limit anatomy overlap, resulting in 72 kV images per patient. Twenty-four paraspinal SBRT patients may be included in this study under an ongoing IRB-approved clinical protocol.
  • the patients’ CBCT projection raw data were acquired with the full trajectory half-fan scan with a kV source voltage of 125 kV and a current of 15 mA. Nineteen patients were randomly selected for training, while the remaining five patients were used for testing. It is important to note that the proposed model was tested and trained at the per-image level rather than per-patient to ensure the model’s generalizability. Furthermore, to assess the robustness of the approach, the testing images were obtained from patients who were entirely different from those in the training group.
  • the raw projection image and the prior image may be defined as IkV and Ip, respectively.
  • the forward projection of the spine may be defined using CBCT as Ispine, the digitally reconstructed radiograph (DRR) of all anatomy as DRRAll, and the deep learning model decomposed spine image as spinepred.
  • DRR digitally reconstructed radiograph
  • Ip = T(Ispine), where T(Ispine) denotes a random shift and rotation, simulating the rigid spine motion relative to the spine component of IkV.
  • the Ip, or spine-only DRR with added random motions, serves as the prior knowledge for the neural network model.
  • random motions may be injected in the range of shift (∈ [-32, 32] pixels (∈ [-8, 8] mm)) and rotation (∈ [-2°, 2°]).
  • This process ensures that the prior knowledge is patient-specific and accounts for the potential motion of the spine during treatment.
  • the simulated motion in the study may not precisely replicate a clinical setting, as 2D motion simulations were chosen instead of 3D simulations in consideration of computational complexity.
  • the new scheme has two inputs, including (1) the on-treatment kV projection image, which is the image to be decomposed, and (2) the randomly shifted and rotated DRR from pre-treatment CBCT as prior.
  • the same encoder performs the encoding process for the kV image and the prior in parallel.
  • the latent features IkVfeature and Ipfeature are integrated through the MHCA and then fed to the decoder for the prediction of spinepred. Subsequently, the soft tissue component of the kV image IkV is computed as the complement of the spine component.
  • the DFD is the discriminator, with the functions of scoring the realness of the spinepred and extracting features for image feature matching.
  • the DFD (right panel of FIG. 14) can also be used only for feature extraction.
  • Cross attention mechanism
  • The proposed PCAT network employs an encoder-decoder architecture.
  • IkVfeature and Ipfeature, the outputs of the encoder, may be defined as the latent features of the kV projection image IkV and the prior image Ip, respectively.
  • the decomposition scheme is IkVfeature → spinepred.
  • the major technological advancements of this study include the Ipfeature and the patient-specific cross attention mechanism for the incorporation of the prior.
  • Latent feature extraction may be performed using the same encoder for IkV and Ip.
  • the MHCA (in Equation (1)) may be used as a layer gating the information flow, such that the IkVfeature with high relevance to the features of the prior, Ipfeature, gets higher weights in the decoding process in generating the spinepred.
  • the IkVfeature and the Ipfeature are scale-dot-producted and then SoftMax-ed to obtain the attention probabilities, noted as α(IkVfeature, Ipfeature), as summarized in Equation (2).
  • the "scale" part of the aforementioned scale-dot-product divides the dot product (specifically, Ipfeature · I′kVfeature) by √dk, where dk is the feature dimension.
  • I′kVfeature represents the transpose of IkVfeature.
  • the MHCA is embedded between the encoder and decoder. With the depth of the encoder (FIG. 14), dk is considerably large. Thus, the dot product grows large in magnitude and diminishes the gradients after SoftMax. Further, the MHCA projects the IkVfeature and Ipfeature, so the cross-attention heads calculate attention maps in parallel and then recombine into the final attention probabilities. In this work, two parallel attention heads may be employed. The IkVfeature is then weighted-summed with the attention probabilities α(·), and finally concatenated, as shown in the center panel of FIG. 14. The PCAT uses the tensorflow_addons 0.9.1 implementation of MHCA as the embedded layer between the encoder and decoder of the generator.
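A single-head sketch of the scaled-dot-product cross attention just described (Equation (2)) is shown below; in PCAT two parallel heads with learned projections are used and recombined, which this sketch omits:

```python
import tensorflow as tf

def cross_attention(ikv_feature, ip_feature):
    """ikv_feature, ip_feature: (batch, tokens, d_k) latent features."""
    d_k = tf.cast(tf.shape(ikv_feature)[-1], tf.float32)
    # alpha = SoftMax(Ip_feature . IkV_feature^T / sqrt(d_k))  -- Equation (2)
    scores = tf.matmul(ip_feature, ikv_feature, transpose_b=True) / tf.sqrt(d_k)
    alpha = tf.nn.softmax(scores, axis=-1)     # attention probabilities
    return tf.matmul(alpha, ikv_feature)       # weighted sum of IkV features
```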
  • the critic from the WGAN may be adapted and modified as the discriminator.
  • the discriminator provides a score on the realness or fakeness of a given image, which is the output of the final Dense layer of size 1.
  • the network structure is identical to the discriminator of ResNetGAN, which is a PatchGAN.
  • the discriminator is trained to distinguish the real and fake Ispine.
  • the discriminator loss is implemented as the Wasserstein loss in the form of an average discriminator score.
  • the discriminator generates additional output, which is the concatenated features of the discriminator’s second, fourth, and sixth layers.
  • the spinepred and the Ispine may be fed to the discriminator to extract the latent features, namely spinepredfeature and Ispinefeature.
• the perceptual loss is computed as mse(spinepredfeature, Ispinefeature).
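A hedged sketch of this perceptual (feature-matching) loss follows; `discriminator_features` is a hypothetical callable standing in for the DFD's feature-extraction output described above.

```python
import tensorflow as tf

def feature_matching_loss(spine_pred, i_spine, discriminator_features):
    """Sketch of the perceptual loss mse(spine_predfeature, I_spinefeature).

    discriminator_features is assumed to return the concatenated features
    of the discriminator's second, fourth, and sixth layers.
    """
    pred_feat = discriminator_features(spine_pred)   # spine_predfeature
    true_feat = discriminator_features(i_spine)      # I_spinefeature
    return tf.reduce_mean(tf.square(pred_feat - true_feat))
```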
• the feature-matching loss provides supervision of the spinepred in the feature domain, in comparison with the ground-truth spine component Ispine.
2.4.3. Compound loss
  • the DRR of all anatomy may be defined as DRRAll.
  • Line profile analysis may be performed to assess the image quality of the decomposed images.
  • the images are normalized to [0, 1] first.
  • the lines are then chosen in the horizontal (x-) and vertical (y-) directions (in the imager coordinates), going across a vertebra, or multiple bony structures, respectively.
• the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are quantified to measure the predicted images' quality.
• the PSNR is defined as the ratio of maximum signal power to noise power, while the SSIM measures the overall similarity between two images.
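For illustration, the two metrics might be computed with scikit-image as follows, assuming both images are already normalized to [0, 1]; the function name and inputs are illustrative assumptions.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(pred, ref):
    """Quantify PSNR and SSIM of a decomposed image against the reference
    spine DRR; inputs are 2D arrays normalized to [0, 1]."""
    psnr = peak_signal_noise_ratio(ref, pred, data_range=1.0)
    ssim = structural_similarity(ref, pred, data_range=1.0)
    return psnr, ssim
```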
  • the motion detection accuracy and the performance robustness may be assessed regarding the matching error dependency on beam angle.
  • the decomposed spine image and kV image were matched to the reference spine DRR via mutual information-based rigid registration using the Registration Estimator app auto-generated code on MATLAB R2020b.
• the image intensities may be normalized to the range [0, 1], and an ROI of (128:384, 240:496) may then be applied.
  • the ROI is a box containing the middle part of the imaged spine (size of 6.25 cm ⁇ 6.25 cm).
• the histogram bin number calculation is performed using the adapted Freedman-Diaconis method via adaptive probability density estimation.
• the OnePlusOneEvolutionary optimizer and the MattesMutualInformation metric may be used. Further, the GrowthFactor, Epsilon, InitialRadius, MaximumIterations, and number of spatial samples are 1.5, 1.5e-4, 4.25e-5, 1e3, and 64, respectively. Quantitative comparison of the 2D vector magnitudes of the matching errors was conducted in two x-ray beam angle groups: the lateral (LAT) for 45°-135° and 225°-315°, and the anteroposterior (AP) for 0°-45°, 135°-225°, and 315°-360°.
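The study used MATLAB R2020b's Registration Estimator auto-generated code; as a non-authoritative illustration, a roughly equivalent setup in Python with SimpleITK might look like the sketch below. The library mapping is an assumption (for example, SimpleITK samples by percentage rather than a fixed 64-sample count).

```python
import SimpleITK as sitk

def match_to_reference(decomposed_np, reference_drr_np):
    """Register a decomposed spine image to the reference spine DRR.

    Inputs are assumed to be 2D float32 numpy arrays normalized to [0, 1].
    """
    fixed = sitk.GetImageFromArray(reference_drr_np.astype("float32"))
    moving = sitk.GetImageFromArray(decomposed_np.astype("float32"))

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation()          # MattesMutualInformation
    reg.SetOptimizerAsOnePlusOneEvolutionary(
        numberOfIterations=1000,                      # MaximumIterations = 1e3
        epsilon=1.5e-4,                               # Epsilon
        initialRadius=4.25e-5,                        # InitialRadius
        growthFactor=1.5,                             # GrowthFactor
    )
    reg.SetInitialTransform(sitk.Euler2DTransform())  # rigid 2D matching
    reg.SetInterpolator(sitk.sitkLinear)
    return reg.Execute(fixed, moving)                 # optimized rigid transform
```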
2.6. Model comparison study
• To elaborate on the viability and effectiveness of the patient-specific prior in kV image decomposition, the model performance of the proposed PCAT may be compared with that of the recently published ResNetGAN. For comparison, the same structure may be kept for the encoder, decoder, and DFD in both PCAT and ResNetGAN.
• the PCAT can degenerate to ResNetGAN. Specifically, the ResNetGAN generator takes the path IkV → encoder → IkVfeature → decoder → spinepred → DFD → spinepredfeature, which is the PCAT with the prior and MHCA removed.
• the Nesterov-accelerated Adaptive Moment Estimation (Nadam) optimizer, which has been shown to outperform its parent Adam optimizer, may be used.
• the learning rate is 1e-4, with β1 = 0.9, β2 = 0.99, and a schedule decay of 4e-4.
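For illustration only, these settings might be instantiated as below. tf.keras is an assumed framework; recent tf.keras Nadam versions do not expose a schedule-decay argument, so the reported 4e-4 schedule decay is noted but omitted.

```python
import tensorflow as tf

# Hedged sketch of the reported Nadam settings (schedule decay of 4e-4
# from the text is not available as a tf.keras Nadam argument).
optimizer = tf.keras.optimizers.Nadam(
    learning_rate=1e-4,  # reported learning rate
    beta_1=0.9,          # β1
    beta_2=0.99,         # β2
)
```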
• the inference times for ResNetGAN and PCAT are both about 0.15 s per frame.
• the paired two-tailed t-test may be used to compare (1) the image quality of the kV image and the decomposed spine images, and (2) the matching errors detected by mutual information-based rigid registration for IkV to Ispine versus spinepred to Ispine, for the proposed PCAT and the ResNetGAN decomposed spinepred.
  • FIG. 15 showcases the kV images and the PCAT-decomposed spine images generated by the proposed model for all patients.
  • the performance of the proposed patient-specific prior approach using PCAT may be evaluated, and may be compared with ResNetGAN.
  • the side-by-side comparison is shown, with visualization of the images in FIG. 16A and corresponding line profiles in FIG. 16B.
• Four examples are randomly chosen from varied x-ray beam angles of the testing patients and are shown from top to bottom in FIG. 16A.
  • the PCAT approach effectively improves the model performance in retaining and preserving the spine structural information while removing soft tissues. Examples are shown with the superior vertebral body (FIG. 16A, panels a4, a3, and a2) and inferior vertebral body (FIG. 16A, panels d4, d3, and d2).
• the ResNetGAN prediction shows corruption of the spine structure in the inferior region of the first-row example (FIG. 16A, panel a2), while the spinepred images generated by the proposed PCAT approach better preserve the spine structure.
• the line profiles in FIG. 16B further demonstrate the observations mentioned above, showing (1) effectiveness in removing soft tissues and (2) agreement with the reference DRR regarding both peak locations and intensity. For further evaluation of the image quality, the SSIM and PSNR are quantified.
  • the metrics are illustrated using pie plots to show the correlation with x-ray beam angles.
  • the SSIM (FIG. 4, left panel) indicates that the proposed PCAT and ResNetGAN decomposed images closely resemble the reference spine DRR.
  • the proposed PCAT and ResNetGAN have comparably high SSIM compared to the kV image to reference spine DRR.
  • the kV image to reference spine DRR SSIM also has (I) a larger standard deviation and (II) a more significant dependence on the x-ray beam angle.
  • the kV image to reference spine DRR SSIM is relatively higher in the anteroposterior angles and lower in the lateral beam angles.
• the PSNR (FIG. 4) likewise indicates that the proposed PCAT and ResNetGAN decomposed images closely resemble the reference spine DRR.
• the PCAT has a mean ([10th, 90th] percentile) PSNR of 67.34 ([63.21, 70.77]), in comparison to 64.81 ([61.12, 67.94]) for ResNetGAN. Further, the two-tailed paired t-test has a p-value < 0.001 for both SSIM and PSNR.
• the motion tracking accuracy assessment is performed using mutual information-based rigid registration to the reference spine DRR Ispine.
• the matching accuracy detected by the IkV to Ispine registration may be compared with that of the spinepred to Ispine registration, for the proposed patient-specific prior approach using PCAT versus that of ResNetGAN.
  • the detected matching errors for all five testing patients are summarized in FIG. 18 and Table 2.
  • the kV images at lateral beam angles normally have lower contrast and higher noise because the x-ray beam typically goes through a larger thickness.
  • the reduced image quality due to superimposition along the beam path may impact matching accuracy more at lateral angles.
  • the motion tracking accuracy may be assessed on the lateral and anteroposterior x-ray beam angles separately.
• the spinepred to reference DRR matching has a mean ([5th, 95th] percentile) of 0.105 mm ([0.013 mm, 0.275 mm]) for ResNetGAN and 0.098 mm ([0.015 mm, 0.243 mm]) for PCAT, in comparison to 0.131 mm ([0.016 mm, 0.366 mm]) for IkV to Ispine.
  • the paired two-tailed t-test failed to reject the null hypothesis of equal mean matching accuracy between the proposed PCAT and ResNetGAN, with a p-value of 0.32.
• the ROI of the mutual information-based rigid registration is the middle part (in the superior–inferior direction) of the spine, where both PCAT and ResNetGAN have relatively better performance compared to more superior or inferior spine regions.
  • PCAT has a further reduced standard deviation and maximum shift error (FIG. 18 and Table 2) compared to ResNetGAN, especially at the lateral beam angles where kV images tend to have worse contrast.
  • the quantitative comparison of the motion tracking accuracy at the lateral and anteroposterior beam angles is tabulated below (Table 2).
  • the paired two-tailed t-test showed significant improvement (p ⁇ 0.05) for both ResNetGAN and PCAT decomposed spine image to reference DRR matching, compared to the kV image to reference DRR matching.
• the PCAT had significantly higher accuracy than the kV image to reference DRR matching.
• ResNetGAN had p > 0.05 for the paired two-tailed t-test of the kV image to DRR matching versus the ResNetGAN decomposed image to DRR matching.
• a priori knowledge may be exploited in the explicit form of constraints in medical image analysis tasks.
• the a priori knowledge in this study is embodied as the CNN-extracted latent features.
  • the PCAT incorporated the patient-specific prior knowledge by selectively amplifying the transmission of the projection image features that correlate with features of the object prior.
  • the PCAT approach may be benchmarked with the ResNetGAN.
• the ResNetGAN has a network structure similar to the PCAT, except that it does not incorporate the latent features of the patient-specific prior.
  • the model performance evaluation showed that the trained PCAT model had improved performance in predicting high-quality spine images and led to reduced motion detection error when matching with the reference spine DRR.
• a deep disentangled generative network was proposed to synthesize the disease residue map of the chest x-ray, assuming the diseased part was superimposed upon the normal chest x-ray. It is an unsupervised learning model with no prior incorporated.
  • another chest x-ray study used a decomposition generative adversarial network in generating the modulated (bone-suppressed, specifically) x-ray.
• the DRRs are used as training labels rather than explicitly incorporating the patient prior, similar to the previous ResNetGAN model. It is recognized that the simulated motion in the study may not exactly replicate a clinical setting, as 2D motion simulations may be used instead of 3D simulations because of the computational complexity of the latter.
  • the approach enables accurate restoration of blocked targets in x- ray images by utilizing the patient-specific prior and MHCA.
  • the patient-specific cross attention mechanism facilitates the selective transmission of features in the latent space, and builds a general relationship between the image to be processed and the prior.
  • the relationship between the input and the prior could be more explicit.
• for the prior image (i.e., the spine-only DRR), the relationship can be a rigid registration, which can be specifically designed with a registration network module.
  • an implicit model could be more tangible in the implementation of other applications.
• a potential application could be CBCT reconstruction using a CT prior; the implicit model could be used to learn the CBCT versus planning CT relationship for further regularization, in contrast to fixed-form constraints using pixel-to-pixel differences.
5. CONCLUSION
• This study presents the incorporation of patient-specific prior knowledge into the deep learning algorithm, achieving significantly improved x-ray image contrast through kV projection image decomposition and leading to submillimeter motion tracking accuracy in paraspinal SBRT.
C. Intrafractional Markerless Lung Tumor Tracking
Purpose
• Disclosed herein is the first clinical experience of intrafractional markerless lung tumor tracking enabled by an AI-empowered target decomposition technique.
  • the primary objectives are to characterize intrafractional lung tumor motion and assess the feasibility of reducing PTV margins in deep inspiration breath hold (DIBH) lung SBRT patients.
Methods
• Fifteen lung SBRT patients were enrolled and treated on a Varian Truebeam platform under DIBH conditions, receiving 3–5 fractions, with respiratory motion managed using a Varian RPM system with a 3 mm gating window.
  • a patient-specific deep learning model was trained on simulation CT scans to enhance tumor contrast in kV projection images using a target decomposition technique.
  • the model was subsequently integrated into various forms of clinical software, which employs template matching to track tumor motion on intrafraction motion review (IMR) images triggered every 200 monitor units during beam delivery.
• Tumor motion was quantified by calculating the mean and standard deviation of the maximum displacement from the isocenter in both the longitudinal and in-plane left-right (IPLR) directions. The percentage of treatment time during which tumor displacement remained below thresholds of 5, 4, 3, and 2 mm was calculated (see the sketch below).
Results
• Tumor tracking was successfully performed on 1222 of the 1269 IMRs collected across 56 treatment sessions, indicating a tracking rate of 96.3%.
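A minimal sketch of the motion statistics described above, with a hypothetical per-IMR displacement array as input:

```python
import numpy as np

def summarize_tumor_motion(displacements_mm, thresholds_mm=(5, 4, 3, 2)):
    """Mean/std of maximum displacement and percent of time below thresholds.

    displacements_mm: per-IMR maximum displacement from the isocenter (mm)
    for one direction (longitudinal or IPLR); an illustrative input.
    """
    d = np.asarray(displacements_mm, dtype=float)
    stats = {"mean_mm": d.mean(), "std_mm": d.std()}
    # Percentage of tracked time points with displacement below each threshold
    for t in thresholds_mm:
        stats[f"below_{t}mm_pct"] = 100.0 * np.mean(np.abs(d) < t)
    return stats
```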
• Referring now to FIG. 19, depicted is a block diagram of an example neural network architecture to create high-quality X-ray images.
  • the network architecture can achieve image enhancement for x-ray image decomposition.
• Each mathematically decomposed x-ray image Xi may correspond to a particular object or a particular region, and thus can improve the image quality.
• the neural network may be used to perform this decomposition. Referring now to FIG. 20, depicted is a block diagram of an example network architecture to use the target-decomposed X-ray image to match the reference image template for motion monitoring in radiotherapy.
• the input x-ray image can be fed to the neural network model to generate the decomposed target image.
  • the decomposed image and the given reference image may be fed into a motion tracker to calculate the motion trace.
• Referring now to FIG. 21, depicted is a block diagram of an example network architecture to implement the target decomposition technique with patient-specific model training.
• the patient's previous CT image can be transformed, with rigid translation and rotation or with deformation, to account for day-to-day variation in the patient's anatomy.
• the augmented CT images can be used to simulate the x-ray images and the corresponding decomposed target images, all under the same imaging geometry. These paired images can be used to train the neural network model.
• Referring now to FIG. 22, depicted is a block diagram of an example network architecture to implement the target decomposition technique with population-based model training, refined with new patient information for a particular patient's treatment.
  • the network architecture may be trained using a population of patient images.
• the network architecture herein can use a specific patient's data to fine-tune the model, following the process outlined in FIG. 21.
  • the fine-tuning step adapts the model to the individual characteristics of the new subject before using the network architecture for image decomposition for that patient.
• Referring now to FIG. 23, depicted is a block diagram of an example network architecture for the target decomposition technique with a patient-specific-prior population-based model, which includes two inputs: the X-ray image and the patient-relevant prior information.
  • the model training is based on a population of patient data.
  • the model takes two inputs, one is the simulated x-ray image, and the other is patient relevant prior information.
  • the prior information can be previous CT images, MR images, or direct/indirect patient anatomy information from any other imaging modality.
  • FIG. 24 depicts a block diagram of an input for target decomposition as a single X-ray image or an image sequence.
• the image sequence can be a collection of images or a video of the subject.
  • FIG. 25 depicts a block diagram of a machine learning model with an encoder and decoder structure (e.g., U-net structure), in which concatenation occurs between layers of the encoder and corresponding layers in the decoder.
• the model may be based on a convolutional neural network, vision transformer architecture, or diffusion model, among others.
  • FIG. 26 depicts a block diagram of an example of the generative adversarial network for target decomposition training.
  • FIG. 27 depicts a block diagram of an example of a conditional diffusion model network used to create images of targets.
• the conditional diffusion model may use random noise along with the input x-ray image and decomposed images for training.
• the system 100 can include at least one image processing system 105, at least one imaging device 110, at least one display 115, and at least one database 150, communicatively coupled with one another via at least one network 120.
  • the image processing system 105 can include at least one dataset indexer 125, at least one model trainer 130, at least one model applier 135, at least one output handler 140, and at least one image decomposition model 145 (generally referred to as “IDM 145” or “machine learning (ML) model” herein), among others.
  • Each of the components in the system 100 as detailed herein may be implemented using hardware (e.g., one or more processors coupled with memory), or a combination of hardware and software as detailed herein in Section D.
• the image processing system 105 (sometimes herein generally referred to as a computing system or a server) may be any computing device, including one or more processors coupled with memory and software, capable of performing the various processes and tasks described herein.
  • the image processing system 105 can be in communication with the imaging device 110, display 115, the database 150, and other devices, via the network 120.
  • the image processing system 105 may be situated, located, or otherwise associated with at least one server group.
  • the server group may correspond to a data center, a branch office, or a site at which one or more servers corresponding to the image processing system 105 is situated.
  • the image processing system 105 can perform or implement any of the functionalities detailed herein in conjunction with Sections A–B.
• the dataset indexer 125 can identify images and a text report for a subject with a condition.
  • the model trainer 130 can train the IDM 145 using datasets within the database 150.
  • the model applier 135 can apply the IDM 145 to the x-ray image to decompose the x-ray image.
  • the output handler 140 may transmit an output of the decomposed image to the display 115.
• the IDM 145 can be any type of machine learning (ML) algorithm or model to identify features (e.g., the spinal column) corresponding to a structure of interest (SOI) from a projection image.
  • the IDM 145 can be maintained on the image processing system 105.
  • the IDM 145 can be, for example, a deep learning artificial neural network (ANN), such as an encoder-decoder model with a convolution neural network architecture, a transformer architecture, or a diffusion model (e.g., as depicted in Figs. 1, 2, 19–24, 29, or 30), among others.
  • the IDM 145 can have a projection image in any modality from a subject as an input, an identification of the SOI within the projection image as an output, and a set of weights relating the input to the output, among others.
  • the IDM 145 may have been initialized, trained, and established using a training dataset in accordance with learning techniques (e.g., supervised or semi-supervised).
  • the training dataset can include or identify a set of examples.
  • Each example can include a respective projection image and an annotation defining the SOI within the respective projection image.
  • the IDM 145 may be trained for a specific subject. In some embodiments, the IDM 145 may be trained for a group of subjects and then refined or fine-tuned for a particular subject.
  • the imaging device 110 (sometimes herein generally referred to as an imaging device or an image acquirer) may be any device to acquire projection images of subjects.
• the projection image can be a tomogram acquired in accordance with a tomographic imaging technique, using, for example, a magnetic resonance imaging (MRI) scanner, a nuclear magnetic resonance (NMR) scanner, a high-energy electromagnetic radiation (X-ray) computed tomography (CT) scanner, an ultrasound imaging scanner, a positron emission tomography (PET) scanner, or a photoacoustic spectroscopy scanner, among others.
• the imaging device 110 can be in communication with the image processing system 105 and the display 115 to exchange data.
• the imaging device 110 may capture a set of projection images represented as a video. Each projection image may correspond to an image frame within the video. The set of projection images may be acquired at a sampling rate ranging from 10 to 90 frames per second.
  • the display 115 (sometimes herein referred to as an operator device or a clinician device) can be any computing device comprising one or more processors coupled with memory and software and capable of providing an output projection image. The display 115 can be associated with an entity (e.g., a clinician) examining the subject or biomedical images from the subject. The display 115 can be in communication with the image processing system 105 and the imaging device 110 to exchange data.
  • the display 115 can display projection images acquired from the imaging device 110.
• the display 115 can be used to enter input to create text reports for the projection images.
• Referring now to FIG. 29, depicted is a block diagram of a process 200 for training the IDM 145 to identify a target defining the SOI within the projection image in the system 100.
  • the process 200 can include or correspond to operations performed in the system 100 to identify the target with the SOI.
  • the model trainer 130 can retrieve, obtain, or otherwise identify training data 205 from the database 150.
  • the training data 205 can include a set of examples.
  • Each example of the training dataset may include at least one dataset 210 and at least one sample decomposed image 215 for at least one subject 225.
  • the subject 225 may be a human or animal subject, among others.
  • the subject 225 may have, may be at risk of, or may be afflicted with at least one condition.
  • the condition may include, for example, breast cancer, lung cancer, prostate cancer, colorectal cancer, skin cancer, bladder cancer, pancreatic cancer, liver cancer, ovarian cancer, cervical cancer, and the like.
  • the subject 225 may have been administered with a radiotherapy to treat the cancer.
  • the radiotherapy can include, for example, external beam radiation therapy (EBRT), brachytherapy, proton therapy, radioimmunotherapy, radiosurgery, among others.
  • the subject 225 can include at least one volume 230.
• the volume 230 may correspond to a section, a portion, or an area of the subject 225.
  • the volume 230 may include, for example, a head, a neck, a torso, a back, a pelvis, arms, or legs.
  • the volume 230 can include the SOI 235 within the subject 225.
  • the volume 230 may correspond to an area of the subject 225 in which cancer is present.
  • the SOI 235 can be at least one organ to be administered for radiotherapy for cancer within the corresponding volume 230 of the subject 225.
  • the organ can be any organ within the subject 225, such as the spine, a lung, a breast, a gastrointestinal tract, the pelvis, a bone, a tissue, a lymph, among others.
• for example, if the volume 230 is the head, the SOI 235 can be the skull or facial bones. In another example, if the volume 230 is the pelvis, the SOI 235 can be the ilium, the ischium, or the pubis. In yet another example, if the volume 230 is the back, the SOI 235 can be the cervical vertebrae, the thoracic vertebrae, the lumbar vertebrae, the sacrum, or the coccyx.
  • the dataset 210 can include a collection of examples that include projection images 220 and subject information 240 fed into the IDM 145. The dataset 210 may use the projection image 220 and the subject information 240 as input features for the IDM 145.
  • the sample decomposed image 215 can be the projection image 220 as the output variable or response variable of the IDM 145.
  • the sample decomposed image 215 can include a target 245.
  • the target 245 can represent a desired outcome or prediction that the model is trained to classify, identify, or predict to implement supervised learning.
  • the dataset 210 may not include the sample decomposed image 215 to allow the IDM 145 to learn patterns or structures of the SOI 235 and the target 245 within the projection image 220.
  • the projection image 220 can be an image captured by the imaging device 110 corresponding to the subject 225 (e.g., according to X-ray scanning).
  • the imaging device 110 can store each captured image for the subject 225 within the dataset 210.
• the imaging device 110 can capture an image of the subject 225 and store the captured image in the dataset 210 as an example of a projection image 220.
• the projection image 220 can be a set of image frames forming a video for the subject 225.
  • the imaging device 110 can capture or acquire the set of image frames of the subject 225 and store the collection of images in the dataset 210 as an example of a moving projection image 220.
• the projection image 220 can include the SOI 235; however, the projection image 220 can include a significant amount of noise, distortions, and defects, as shown in FIG. 15 and FIG. 16A.
  • the SOI 235 can be the stomach of the subject 225, which may move (i.e., churn) during the image capture process, causing the projection image 220 to include distortions.
  • the system 100 may use the IDM 145 to remove the distortions.
  • the projection image 220 can be a synthesized image.
  • the projection image 220 can correspond to a sample frame from a simulation of a breathing pattern of a given subject 225.
• the projection image 220 may also be derived or generated from computed tomography (CT) images.
  • the subject information 240 can include information about the subject 225.
  • the information 240 can indicate the one or more conditions of the subject.
  • the subject information 240 can define a motion of the SOI 235 within the subject.
  • the motion can include, for example, orientation, speed of movement, rate of change in size, direction of change, among others.
  • the subject information 240 can indicate that the subject 225 suffers from lung cancer.
  • the subject information 240 can define the movement of the lungs when the subject 225 suffers from lung cancer to help the IDM 145 to remove the distortions of the projection image 220.
  • the subject information 240 can include radiotherapy administered to the subject 225.
  • the radiotherapy may include, for example, radioactive iodine (I-131) therapy, high-dose rate brachytherapy, stereotactic body radiation therapy (SBRT), among others.
  • the subject information 240 may identify or include digitally reconstructed radiographs (DRR).
• the DRR may be a simulated two-dimensional X-ray image generated from a CT scan or MRI scan of the subject 225.
• the image processing system 105 may mathematically project the CT or MRI data onto a virtual X-ray detector grid to simulate the capture of X-ray images from the imaging device 110.
• the DRR may allow for accurate localization of the volume 230.
  • the DRR may facilitate precise patient positioning and alignment during radiotherapy to protect healthy organs and tissues.
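As a rough illustration of DRR generation, the following toy sketch uses a parallel-projection ray sum; this is an assumption for brevity, since clinical DRRs use divergent-beam ray casting through the full imaging geometry.

```python
import numpy as np

def simple_drr(ct_volume_hu, axis=0):
    """Toy parallel-projection DRR from a 3D CT volume in Hounsfield units.

    Returns a 2D image derived from integrated attenuation along `axis`;
    the attenuation scaling factor is an illustrative choice.
    """
    # Convert HU to relative linear attenuation: mu/mu_water = 1 + HU/1000
    mu = np.clip(1.0 + ct_volume_hu / 1000.0, 0.0, None)
    line_integral = mu.sum(axis=axis)        # ray sum along the chosen axis
    drr = np.exp(-0.01 * line_integral)      # toy exponential attenuation
    # Normalize to [0, 1] and invert so dense bone appears bright
    drr = (drr - drr.min()) / (np.ptp(drr) + 1e-8)
    return 1.0 - drr
```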
  • the sample decomposed image 215 may be a projection image 220 with the target 245.
  • the target 245 may include the SOI 235 without any noise, distortions, or irregularities.
  • the sample decomposed image 215 may be a ground truth for the IDM 145.
  • the sample decomposed image 215 may represent a proper classification, analysis, and removal of the noise in the projection image 220.
  • the sample decomposed image 215 may include labels, annotations, or classifications for the SOI 235 to guide the IDM 145 to correctly identify the target 245 and the SOI 235, when processing the projection image 220.
  • the sample decomposed image 215 can be established or generated based on at least one of a simulation of the x-ray scan or an acquisition of the x-ray scan.
  • the sample decomposed image 215 can be a synthesized image.
  • the sample decomposed image 215 can correspond to an image frame from a simulation of a breathing pattern of a given subject 225.
  • the training data 205 can include a plurality of datasets 210. Each dataset 210 can be directed to or identify a different subject 225. For example, a first dataset 210 may identify a first subject 225 and a second dataset 210 may identify a second subject 225.
  • the model trainer 130 may train the IDM 145 to be specific to each subject 225 within the database 150.
  • the first subject 225 may have a breast cancer and the first dataset 210 can include a first projection image 220 of the breast as the SOI 235.
  • the dataset indexer 125 executing on the image processing system 105 can retrieve, identify, or otherwise obtain the at least one dataset 210 from the training data 205 for the at least one subject 225.
  • the dataset indexer 125 can further obtain a report of the subject information 240.
  • the text of the subject information can be unstructured (e.g., free-form text) or structured (e.g., populated in field-value format in accordance with a template).
  • the dataset indexer 125 can index, sort, or otherwise organize each dataset 210 to correspond to each subject 225 in the plurality of subjects 225.
• the image processing system 105 can identify a first subject 225, a second subject 225, and a third subject 225, each with their own respective dataset 210.
  • the dataset indexer 125 may organize the datasets 210, such that a first dataset 210 corresponds to the first subject 225, a second dataset 210 corresponds to the second subject 225, and a third dataset 210 corresponds to the third subject 225.
  • the dataset indexer 125 may provide the subjects 225 with the respective datasets 210 to the model trainer 130 and the model applier 135.
  • the model applier 135 may retrieve, receive, or otherwise obtain the dataset 210 from each example of the training dataset 205.
  • the model applier 135 can apply, execute, or otherwise employ the IDM 145 to the one or more datasets 210 for the one or more subjects 225.
  • the IDM 145 can include at least one encoder 155, at least one integrator 160, at least one decoder 165, and at least one discriminator 170, among others.
  • the IDM 145 can include a plurality of parameters for identifying the SOIs 235 to the one or more datasets 210.
  • the plurality of parameters can include a plurality of hyperparameters to establish the configuration of the IDM 145 to identify the SOIs 235.
  • the IDM 145 may lack the discriminator 170 (e.g., as in the model architecture depicted in FIGS. 23–25 and 27).
  • the plurality of hyperparameters can be arranged across the encoder 155, the integrator 160, the decoder 165, and the discriminator 170.
  • the plurality of hyperparameters can include a learning rate, number of epochs, a batch size, a model architecture (e.g., multimodal multi-head convolutional neural network), and regularization parameters.
  • the plurality of parameters can include a plurality of model parameters.
  • the plurality of model parameters can be variables learned from the training data 205 (i.e., one or more datasets 210) that establish, dictate, or otherwise define a link between the inputs (e.g., projection image 220 and subject information 240) and the outputs (e.g., target 245 of the sample decomposed image 215).
• the model parameters can be continuously optimized during training to minimize a loss function of the IDM 145.
• the plurality of model parameters can include target 245 weights or biases (i.e., influencing the output of the IDM 145 based on accurate targets 245) and a loss function (e.g., compound loss), among others.
  • the model applier 135 can feed, input, or otherwise supply the one or more datasets 210 to the IDM 145.
  • the model applier 135 can add noise to the projection image 220 prior to feeding the input to the IDM 145 (e.g., when implemented as a diffusion model).
  • the noise may include, for example, Gaussian noise, uniform noise, or Poisson noise, among others.
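A hedged sketch of such noise injection follows; the noise types track the text, while the noise levels and count scaling are illustrative assumptions.

```python
import numpy as np

def add_training_noise(image, kind="gaussian", rng=None):
    """Add noise to a projection image before feeding a diffusion-style IDM."""
    rng = rng or np.random.default_rng()
    if kind == "gaussian":
        return image + rng.normal(0.0, 0.05, image.shape)
    if kind == "uniform":
        return image + rng.uniform(-0.05, 0.05, image.shape)
    if kind == "poisson":
        # Poisson (quantum) noise scales with simulated photon counts
        counts = 1e4
        return rng.poisson(np.clip(image, 0, None) * counts) / counts
    raise ValueError(f"unknown noise kind: {kind}")
```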
  • the model applier 135 can apply the projection image 220 and the subject information 240 to the encoder 155 of the IDM 145.
• the encoder 155 can execute dimensionality reduction, feature extraction, or sequence modeling to generate, produce, or determine at least one feature 250A using the projection image 220 and at least one feature 250B using the subject information 240.
  • the encoder 155 can include a plurality of layers to extract, retrieve, or otherwise obtain the projection image 220 and the subject information 240 from the dataset 210 as a first feature 250A and a second feature 250B, respectively.
  • the plurality of layers can include an input layer.
  • the input layer can receive the one or more datasets 210 from the model applier 135 or the model trainer 130. In some implementations, the input layer can receive the one or more datasets 210 as a collection of text strings or images.
  • the plurality of layers can include one or more hidden layer to transform the datasets 210 to extract or retrieve one or more representations of the projection image 220.
  • the one or more representations may include the plurality of features 250A-N.
  • the plurality of layers can include a bottleneck layer to generate or create the first feature 250A (corresponding to the projection image 220) and the second feature 250B (corresponding to the subject information 240) by compressing the dataset 210 to extract the first feature 250A and the second feature 250B.
  • the encoder 155 can include activation functions to enable the IDM 145 to learn mappings between the projection image 220 and the sample decomposed image 215.
  • the model applier 135 can apply the features 250A and 250B to the integrator 160 of the IDM 145.
  • the integrator 160 can receive, retrieve, or otherwise obtain the first feature 250A and the second feature 250B from latent space of the encoder 155.
• the integrator 160 can be at least one of a self-attention integrator, a cross-attention integrator, a layer normalization integrator, or a feedforward integrator, among others.
• in some embodiments, the integrator 160 is a cross-attention integrator, which may project the first feature 250A and the second feature 250B into query, key, and value vectors using one or more linear transformations (e.g., scaling, translation, projection). The first feature 250A can be the query, and the second feature 250B can be the key.
  • the integrator 160 can use the query and key to interact with a plurality of aspects of the projection image 220 (e.g., SOI 235, noise, distortions, target 245) and the subject information 240 (e.g., condition of the subject 225).
  • the integrator 160 can calculate one or more attention scores for each query and key vector.
• the attention scores can indicate a level of relevance between the key and the query.
  • the level of relevance can increase as the subject information 240 aligns with a SOI 235 or a target 245 of the projection image 220.
  • the integrator can use a SoftMax function to obtain attention weights from the attention scores.
• the attention weights may indicate an importance of the subject information 240 to the projection image 220.
  • the subject information 240 may indicate the movement of the lungs of a subject 225.
  • the lungs of the subject can be the SOI 235 or the target 245 of the projection image 220.
  • the integrator 160 can determine that the movement of the lungs is important to generate an output feature 250’.
• the integrator 160 can use the attention weights to calculate a weighted sum over the value vectors to generate the output feature 250'.
  • the integrator can aggregate each query and key based on the relevance and importance to one another to produce the weighted sum.
  • the integrator 160 may execute or apply activation functions or multi-layer perceptron on the weighted sum to generate the output feature 250’.
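A minimal sketch of such a cross-attention integrator using the stock tf.keras layer; the library choice, head count, key dimension, and the use of the subject-information feature as both key and value are assumptions for illustration.

```python
import tensorflow as tf

# Hedged stand-in for integrator 160: query = image feature 250A,
# key/value = subject-information feature 250B.
integrator = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)

def integrate(feature_image, feature_info):
    """feature_image, feature_info: (batch, tokens, channels) tensors.

    Attention weights gate how strongly each image feature is passed on,
    based on its relevance to the subject-information feature.
    """
    return integrator(query=feature_image, value=feature_info, key=feature_info)
```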
  • the model applier 135 can apply the decoder 165 of the IDM 145 to the output feature 250’.
  • the decoder 165 may use the output feature 250’ and generate, determine, or otherwise predict a decomposed image 215’ with an identified target 245’.
  • the decoder 165 can include an initial state based on the output feature 250’. The initial state can be a starting point to generate the decomposed image 215’.
• the decoder 165 can generate a plurality of tokens or pixels corresponding to the decomposed image 215' including the target 245'. Each token in the plurality of tokens can be sampled from a probability distribution using SoftMax-based sampling.
  • the decoder 165 can include a recurrent layer to model the decomposed image 215’. For each token in the plurality of tokens, the recurrent layer can update the internal state of the decoder 165. Updating the internal state of the decoder 165 may remove noise and distortion of the projection image 220 at each step of decoding.
  • the decoder 165 can include an output projection layer to map an internal representation of the decomposed image 215’ to the probability distribution over the plurality of tokens or pixels.
  • the probability distribution can indicate the likelihood of the target 245’ within the decomposed image 215’.
  • the decoder 165 may output the decomposed image 215’ including the predicted target 245’.
  • the decomposed image 215’ can define the SOI 235 within the target 245’.
  • the model trainer 130 can feed, supply, or otherwise apply one of the decomposed image 215’ or the sample decomposed image 215 to the discriminator 170.
  • the discriminator 170 can include an input layer that receives the sample decomposed image 215 to transmit the sample decomposed image 215 to the subsequent layers of the discriminator 170.
  • the discriminator 170 can include a plurality of hidden layers that execute one or more transformations of the sample decomposed image 215 and the decomposed image 215’.
  • the hidden layers can extract features from the transformed decomposed image 215’ to determine a classification 255.
  • the classification 255 can identify the target 245’ within the decomposed image 215’ and a feature to derive the target 245’ as real or fake. If the target 245’ is fake (e.g., incorrect), the IDM 145 may have removed the SOI 235 when attempting to remove the noise from the projection image 220.
  • a first dataset 210 can include a projection image 220 depicting the heart of the subject 225 as the SOI 235.
  • the projection image 220 can include high amounts of noise, which may prevent the heart from being visible in the projection image 220.
• the model applier 135 may apply the IDM 145 to the first dataset 210.
  • the IDM 145 can identify the heart within the decomposed image 215’.
  • the IDM 145 can remove the heavy amounts of noise to generate the decomposed image 215’ depicting the heart without any noise.
  • the model trainer 130 can initialize, train, or otherwise establish the IDM 145 for each subject 225 interacting with the imaging device 110 by receiving, retrieving, or otherwise obtaining the training data 205 (e.g., datasets 210 and sample decomposed images 215) from the dataset indexer 125.
  • the model trainer 130 may feed the training data 205 to the IDM 145 to generate the decomposed image 215’ and the classification 255.
• the model trainer 130 can train the IDM 145 for each subject 225 who needs radiography, allowing a single machine learning model to execute on a per-subject 225 or per-population (e.g., with multiple subjects) basis.
  • the model trainer 130 can trigger the model applier 135 to apply each layer of the IDM 145 to the input dataset 210.
  • the model trainer 130 can initialize, train, or otherwise establish the IDM 145 for a set of subjects 225.
  • the set of subjects may correspond to a cohort or a population of subjects with common characteristics (e.g., same cancer afflicting same organs, age, gender, race, location, or co-morbidities, among others).
• the model trainer 130 may determine, calculate, or otherwise generate at least one loss metric 260 based on the target 245 and the target 245'.
• the model trainer 130 may generate the loss metric 260 based on the classification 255.
• the discriminator 170 can be used to generate the loss metric 260.
  • the loss metric 260 can measure, rate, or identify how well the discriminator 170 can distinguish between the real target 245 of the sample decomposed image 215 and the noise within the projection image 220.
  • the loss metric 260 from the discriminator 170 can be a Wasserstein loss to measure discrepancy between the distributions of real targets 245 and generated targets 245’.
• when the discriminator 170 is presented with the real target 245, the discriminator 170 can learn to assign high probability scores to the real targets 245 and calculate the real target 245 loss.
• when presented with fake targets 245 (e.g., predominantly noise or otherwise incorrect), the discriminator 170 can learn to assign low probability scores to the fake targets 245 and calculate the fake target 245 loss.
  • the loss metric 260 may correspond to a loss function to quantify the difference between the generated target 245’ and the target 245 within the sample decomposed image 215 to optimize the IDM 145 during training.
  • the model trainer 130 may compare the generated target 245’ and the target 245 within the sample decomposed image 215. The comparison may be performed when the IDM 145 is implemented without the discriminator 170.
• the loss function can be at least one of Mean Squared Error, Binary Cross-Entropy Loss, Categorical Cross-Entropy Loss, Hinge Loss, or Wasserstein Loss.
• by using the Wasserstein loss for the discriminator 170, the IDM 145 can avoid issues of mode collapse and vanishing gradients, allowing the model trainer 130 to provide more stable training for the IDM 145.
  • the Wasserstein loss can be less sensitive to hyperparameters and architecture choices to enable easier training and robust changes to the model. Therefore, the system 100 can effectively adapt to each subject 225 with relative ease, low computer resources, and stable training for the IDM 145.
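A minimal sketch of the Wasserstein loss expressed as average discriminator scores, consistent with the description above; variable names are illustrative.

```python
import tensorflow as tf

def wasserstein_losses(real_scores, fake_scores):
    """Critic and generator losses from raw discriminator scores.

    real_scores / fake_scores: discriminator outputs for real targets 245
    and generated targets 245', respectively.
    """
    # Critic: widen the score gap between real and fake targets
    d_loss = tf.reduce_mean(fake_scores) - tf.reduce_mean(real_scores)
    # Generator: fool the critic by raising the fake scores
    g_loss = -tf.reduce_mean(fake_scores)
    return d_loss, g_loss
```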
• the model trainer 130 can determine the loss metric 260 by comparing the target 245 with the generated target 245' using a plurality of methods, such as line profile analysis, motion detection, pixel-wise comparison, structural similarity index, mean squared error, peak signal-to-noise ratio, or convolutional neural networks (e.g., ResNet or Inception), among others.
  • the decomposed images 215’ are normalized to [0, 1] first. The lines are then chosen in the horizontal (x-) and vertical (y-) directions (in the image coordinates), going across a vertebra, or multiple bony structures, respectively.
  • the peak signal- to-noise ratio (PSNR) and structural similarity index measure (SSIM) are quantified to measure the decomposed image 215’ quality.
  • PSNR is defined as the maximum signal power to noise power ratio
  • SSIM measures the overall similarity between the sample decomposed image 215 and the decomposed image 215’.
• higher PSNR and SSIM values indicate a better prediction.
  • the model trainer 130 can update, change, or otherwise adjust the plurality of parameters of the IDM 145 using the loss metric 260.
• the model trainer 130 may adjust the model parameters of the IDM 145 to generate more effective decomposed images 215'. For instance, the model trainer 130 can use the loss metric 260 to compute gradients for the direction and magnitude of parameter adjustments that minimize the loss metric 260. In another instance, the model trainer 130 can use optimization algorithms, such as stochastic gradient descent or RMSProp, to update the model parameters iteratively. In some implementations, the direction and step size of the parameter update are determined according to the gradients and the optimization algorithm. The model trainer 130 can update the model parameters based on the training data 205 until convergence or until a stopping threshold is satisfied. The model trainer 130 can continuously apply the training data 205 to gradually minimize the loss metric 260 and improve the removal of noise associated with the SOI 235.
  • the model trainer 130 can evaluate, verify, or validate the IDM 145 performance using the loss metric 260 and according to a test dataset within the database 150.
  • the first subject 225 may have a breast cancer and the first dataset 210 can include a first projection image 220 of the breast as the SOI 235.
  • the model trainer 130 can train the IDM 145 to remove any distortions within the first projection image 220 to clearly identify the SOI 235.
  • the second subject 225 can have liver cancer and the second dataset 210 can include a second projection image 220 of the liver as the SOI 235.
  • the model trainer 130 can train the IDM 145 to remove any distortions within the second projection image 220 to clearly identify the liver.
  • the system 100 can be patient specific and can manage a plurality of subjects 225 while maintaining a single IDM 145 to reduce latency and save resources within a computing device.
• Referring now to FIG. 30, depicted is a block diagram of a process 300 for applying the IDM 145 to a dataset 310 to identify a target 345 defining a structure of interest (SOI) 335.
• the process 300 can include or correspond to operations performed in the system 100 to identify the target with the SOI.
  • the dataset indexer 125 can obtain, retrieve, or receive the projection image 320 from the imaging device 110.
  • the subject 325 may be administered with radiotherapy for cancer.
• the dataset indexer 125 can receive a plurality of images of the brain from the imaging device 110 when the subject 325 is administered image guided radiation therapy (IGRT).
  • the imaging device 110 can transmit each image of the brain captured during the IGRT to the dataset 310 as the projection image 320.
  • the images for the dataset 310 can be acquired, via the imaging device 110, as a scan of the volume 330 within the subject 325 while the subject is on an apparatus (e.g., a patient table or couch).
  • the image processing system 105 may at least partially concurrently process a plurality of projection images 320 within the dataset 310.
  • the radiotherapy can include at least one of an intensity-modulated radiation therapy (IMRT), stereotactic body radiation therapy (SBRT), image-guided radiation therapy (IGRT), or brachytherapy.
  • the subject 325 and the dataset 310 can be the same as the subject 225 and the dataset 210, respectively, described above.
  • the dataset 310 can include a collection of examples that include projection images 320 fed into the IDM 145.
  • the dataset 310 may use the projection image 320 as an input feature for the IDM 145.
  • the projection image 320 can be an image captured by the imaging device 110 corresponding to the subject 325.
  • the imaging device 110 can store each captured image for the subject 325 within the dataset 310.
• the imaging device 110 can capture an image of the subject 325 and store the captured image in the dataset 310 as an example of a projection image 320.
• the projection image 320 can include the SOI 335; however, the projection image 320 can include a significant amount of noise, distortions, and defects, as shown in FIG. 15 and FIG. 16A, because of movement by the SOI 335.
  • the SOI 335 can be the stomach of the subject 325, which may move (i.e., churn) during the image capture process, causing the projection image 320 to include distortions.
  • the system 100 may use the IDM 145 to remove the distortions.
  • the subject information 340 can include information about the subject 325.
  • the subject information 340 can indicate the one or more conditions of the subject.
• the subject information 340 can define a motion of the SOI 335 within the subject.
  • the motion can include, for example, orientation, speed of movement, rate of change in size, direction of change, among others.
  • the subject information 340 can indicate that the subject 325 suffers from lung cancer.
  • the subject information 340 can define the movement of the lungs when the subject 325 suffers from lung cancer to help the IDM 145 to remove the distortions of the projection image 320.
  • the subject information 340 can include radiotherapy administered to the subject 325.
  • the radiotherapy may include, for example, radioactive iodine (I-131) therapy, high-dose rate brachytherapy, stereotactic body radiation therapy (SBRT), among others.
  • the subject information 340 may be inputted or entered by a user of the imaging device 110 or through a computing device communicatively coupled with the imaging device 110 or the image processing system 105. For example, a clinician examining the subject 325 may enter data forming the subject information 340 at a terminal connected with the imaging device 110.
• prior to the administration of the radiotherapy and the acquisition of the projection image 320, the image processing system 105 can receive, obtain, or otherwise retrieve a plurality of tomographic images of the volume 330 in the subject 325.
• the plurality of tomographic images can be acquired via at least one of X-ray computed tomography, magnetic resonance imaging, ultrasound tomography, or positron emission tomography, among others.
  • the imaging device 110 may capture an X-ray Computed Tomography of the volume 330.
  • the imaging device 110 can capture a Magnetic Resonance Image of the volume 330.
  • the image processing system 105 can generate, determine, or otherwise create a digitally reconstructed radiograph (DRR) to use as the first information for the subject 325.
  • the DRR can be a computer generated 2D X-ray image to simulate the appearance of an X-ray image based on the plurality of tomographic images.
• the DRR generation may use the CT images or MRI images that contain information about the SOI 335 and the density distributions within the volume 330.
• virtual X-ray projections can be generated by casting rays through the CT images or MRI images, simulating the absorption and scattering of X-rays as they pass through the tissues and the SOI 335 within the subject 325.
  • the dataset indexer 125 executing on the image processing system 105 can retrieve, identify, or otherwise obtain the at least one dataset 310 for the at least one subject 325. The dataset indexer 125 can further obtain a report of the subject information 340.
  • the text of the subject information 340 can be unstructured (e.g., free-form text) or structured (e.g., populated in field-value format in accordance with a template).
  • the dataset indexer 125 can index, sort, or otherwise organize each dataset 310 to correspond to each subject 325 in the plurality of subjects 325. For instance, the image processing system 105 can identify a first subject 325, a second subject 325, and a third subject 325, each with their own respective dataset 310.
  • the dataset indexer 125 may organize the datasets 310 such that a first dataset 310 corresponds to the first subject 325, a second dataset 310 corresponds to the second subject 325, and a third dataset 310 corresponds to the third subject 325. From here, the dataset indexer 125 may provide the subjects 325 with the respective datasets 310 to the model applier 135.
  • the model applier 135 may retrieve, receive, or otherwise obtain the dataset 310.
  • the model applier 135 can apply, execute, or otherwise employ the IDM 145 to the one or more datasets 310 for the one or more subjects 325.
  • the IDM 145 can include at least one encoder 155, at least one integrator 160, and at least one decoder 165, among others.
  • the IDM 145 can include a plurality of parameters for identifying the SOIs 335 to the one or more datasets 310.
  • the plurality of parameters can include a plurality of hyperparameters to establish the configuration of the IDM 145 to identify the SOIs 335.
  • the plurality of hyperparameters can be arranged across the encoder 155, the integrator 160, and the decoder 165.
  • the plurality of hyperparameters can include a learning rate, number of epochs, a batch size, a model architecture (e.g., multimodal multi-head convolutional neural network), and regularization parameters.
  • the plurality of parameters can include a plurality of model parameters.
• the plurality of model parameters can include target weights or biases (i.e., influencing the output of the IDM 145 based on accurate targets 345) and a loss function (e.g., compound loss), among others.
  • the model applier 135 can feed, input, or otherwise supply the one or more datasets 310 to the IDM 145. In feeding, the model applier 135 can apply the projection image 320 and the subject information 340 to the encoder 155 of the IDM 145.
• the encoder 155 can execute dimensionality reduction, feature extraction, or sequence modeling to generate, produce, or determine at least one feature 350A using the projection image 320 and at least one feature 350B using the subject information 340.
  • the encoder 155 can include a plurality of layers to extract, retrieve, or otherwise obtain the projection image 320 and the subject information 340 from the dataset 310 as a first feature 350A and a second feature 350B, respectively.
  • the plurality of layers can include an input layer.
  • the input layer can receive the one or more datasets 310 from the model applier 135. In some implementations, the input layer can receive the one or more datasets 310 as a collection of text strings or images.
  • the plurality of layers can include one or more hidden layers to transform the datasets 310 to extract or retrieve one or more representations of the projection image 320.
  • the one or more representations may include the plurality of features 350A-N.
  • the plurality of layers can include a bottleneck layer to generate or create the first feature 350A (corresponding to the projection image 320) and the second feature 350B (corresponding to the subject information 340) by compressing the dataset 310 to extract the first feature 350A and the second feature 350B.
  • the model applier 135 can apply the features 350A and 350B to the integrator 160 of the IDM 145.
  • the integrator 160 can receive, retrieve, or otherwise obtain the first feature 350A and the second feature 350B from latent space of the encoder 155.
• the integrator 160 can be at least one of a self-attention integrator, a cross-attention integrator, a layer normalization integrator, or a feedforward integrator, among others.
• in some embodiments, the integrator 160 is a cross-attention integrator, which may project the first feature 350A and the second feature 350B into query, key, and value vectors using one or more linear transformations (e.g., scaling, translation, projection).
• the first feature vector 350A can be the query, and the second feature vector 350B can be the key.
• the integrator 160 can use the query and key to interact with a plurality of aspects of the projection image 320 (e.g., SOI 335, noise, distortions, target 345) and the subject information 340 (e.g., condition of the subject 325).
  • the integrator 160 can calculate one or more attention scores for each query and key vector.
  • the attention scores can indicate a level of relevance of the key to the query. For example, the level of relevance can increase as the subject information 340 aligns with a SOI 335 of the projection image 320.
  • the integrator can use a SoftMax function to obtain attention weights from the attention scores.
  • the attention weights may indicate an importance of the subject information 340 to the projection image 320.
  • the subject information 340 may indicate the movement of the lungs of a subject 325.
  • the lungs of the subject can be the SOI 335 of the projection image 320.
  • the integrator 160 can determine that the movement of the lungs is important to generate an output feature 350.
  • the integrator 160 can use the attention weights to calculate a weighted sum of the value vectors to generate the output feature 350.
  • the integrator can aggregate the queries and keys based on their relevance and importance to one another to produce the weighted sum.
  • the integrator 160 may execute or apply activation functions or multi-layer perceptron on the weighted sum to generate the output feature 350.
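A minimal sketch of such a cross-attention integrator follows, assuming PyTorch; the projection dimensions and module names are illustrative assumptions, and the single-head form shown here stands in for whatever multi-head arrangement an implementation might use:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionIntegrator(nn.Module):
    """Sketch of a single-head cross-attention integrator: feature 350A
    (image) forms the query; feature 350B (subject information) forms the
    key and value. Dimensions are illustrative assumptions."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.q_proj = nn.Linear(feat_dim, feat_dim)  # query from 350A
        self.k_proj = nn.Linear(feat_dim, feat_dim)  # key from 350B
        self.v_proj = nn.Linear(feat_dim, feat_dim)  # value from 350B
        self.mlp = nn.Sequential(                    # post-attention MLP
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        q, k, v = self.q_proj(feat_a), self.k_proj(feat_b), self.v_proj(feat_b)
        # Attention scores: relevance of each key (subject information)
        # to each query (image feature), scaled by sqrt(d).
        scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
        weights = F.softmax(scores, dim=-1)  # attention weights
        attended = weights @ v               # weighted sum of the values
        return self.mlp(attended)            # output feature 350

# Usage with (batch, sequence, dim)-shaped features.
integ = CrossAttentionIntegrator()
out_350 = integ(torch.randn(1, 1, 128), torch.randn(1, 1, 128))
```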
  • the model applier 135 can apply the decoder 165 of the IDM 145 to the output feature 350.
  • the decoder 165 may use the output feature 350 and generate, determine, or otherwise predict a decomposed image 315 with an identified target 345.
  • the decoder 165 can include an initial state based on the output feature 350. The initial state can be a starting point to generate the decomposed image 315.
  • the decoder 165 can generate a plurality of tokens or pixels corresponding to the decomposed image 315 including the target 345. Each token in the plurality of tokens can be sampled from a probability distribution using SoftMax-based sampling.
  • the decoder 165 can include a recurrent layer to model the decomposed image 315. For each token in the plurality of tokens, the recurrent layer can update the internal state of the decoder 165.
  • the decoder 165 may remove noise and distortion of the projection image 320 at each step of decoding.
  • the decoder 165 can include an output projection layer to map an internal representation of the decomposed image 315 to the probability distribution over the plurality of tokens or pixels.
  • the probability distribution can indicate the likelihood of the target 345 within the decomposed image 315.
  • the decoder 165 may output the decomposed image 315 including the predicted target 345.
  • the decomposed image 315 can define the SOI 335 within the target 345.
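One minimal sketch of such a recurrent, token-by-token decoder, again assuming PyTorch; the GRU cell, the 256-value pixel vocabulary, and the sampling loop are illustrative assumptions rather than the disclosed architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentDecoder(nn.Module):
    """Sketch of a recurrent decoder that emits the decomposed image one
    token (here, a quantized pixel value) at a time. The GRU cell, the
    256-value vocabulary, and the sampling loop are assumptions."""

    def __init__(self, feat_dim: int = 128, vocab: int = 256):
        super().__init__()
        self.gru = nn.GRUCell(vocab, feat_dim)  # updates the internal state
        self.out = nn.Linear(feat_dim, vocab)   # output projection layer
        self.vocab = vocab

    def forward(self, output_feature: torch.Tensor, num_pixels: int):
        h = output_feature                         # initial state: feature 350
        tok = torch.zeros(h.shape[0], self.vocab)  # blank start token
        pixels = []
        for _ in range(num_pixels):
            h = self.gru(tok, h)                    # state update per token
            probs = F.softmax(self.out(h), dim=-1)  # distribution over tokens
            idx = torch.multinomial(probs, 1)       # SoftMax-based sampling
            tok = F.one_hot(idx.squeeze(1), self.vocab).float()
            pixels.append(idx)
        return torch.cat(pixels, dim=1)             # flattened decomposed image

dec = RecurrentDecoder()
tokens = dec(torch.randn(1, 128), num_pixels=16)  # toy-sized output
```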
  • the output handler 140 can obtain the decomposed image 315 and store an association between the subject 325 and the decomposed image 315 in the database 150 using one or more data structures.
  • the association can be a link, a map, or a connection between the subject 325 and the decomposed image 315.
  • the one or more data structures can include an array, a linked list, a stack, a tree, a hash table, among others.
  • the data structure can be a hash table where the subject 325 is the key to the hash table and the decomposed image 315 is the value of the hash table.
  • the hash table can include a plurality of keys (i.e., for each subject 325) mapped to a plurality of values (i.e., decomposed images 315 for a plurality of SOIs 335 for the subject 325).
  • the data structure can be a plurality of linked lists.
  • a first linked list can correspond to a first subject 325, with each node in the first linked list corresponding to a decomposed image 315 of the first subject 325.
  • a second linked list can correspond to a second subject 325, with each node in the second linked list corresponding to a decomposed image 315 of the second subject 325.
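For the hash-table variant, a minimal sketch in Python (a dict is a hash table); the subject identifiers and image references are hypothetical:

```python
# Minimal sketch of the subject-to-image association as a hash table:
# a Python dict keyed by subject, whose values are lists of decomposed
# images for that subject. Identifiers and entries are hypothetical.
associations: dict[str, list[str]] = {}

def store(subject_id: str, decomposed_image_ref: str) -> None:
    """Append a decomposed-image reference under the subject's key."""
    associations.setdefault(subject_id, []).append(decomposed_image_ref)

store("subject-001", "decomposed_315A")
store("subject-001", "decomposed_315B")
store("subject-002", "decomposed_315C")
# associations["subject-001"] -> ["decomposed_315A", "decomposed_315B"]
```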
  • the output handler 140 can store an association between the projection image 320 and the decomposed image 315.
  • the output handler 140 may calculate, determine, or otherwise generate at least one motion trace 360 using the set of decomposed images 315 generated by the IDM 145 from the set of projection images 320 (e.g., forming a video).
  • the motion trace 360 may correspond to or identify a degree of movement (e.g., orientation and translation) of features over time among the set of projection images 320.
  • the motion trace 360 can define or indicate a speed of the movement over time among the set of projection images 320.
  • the motion trace 360 can be calculated by the output handler 140 using the mean and standard deviation of a maximum displacement from an isocenter of the features or target 345 in both the longitudinal and in-plane left-right (IPLR) directions.
  • the isocenter can be the central point within the target 345 or in some cases, the volume 330.
  • the isocenter can indicate one or more machine axes for radiotherapy (e.g., gantry, collimator, and couch).
  • the output handler 140 can generate a motion trace 360 to analyze movement of the target 345 or the volume 330.
  • the motion trace 360 can be represented as a sinusoidal movement relative to the isocenter to define motion in a 3D space across the time window.
  • the output handler 140 may identify, set, or otherwise use at least one decomposed image 315 as a reference image.
  • the reference image may be used to compare against other, remaining decomposed images 315 from the IDM 145.
  • the output handler 140 may use the first decomposed image 315 as the reference image.
  • the output handler 140 may compare the target 345 in the decomposed image 315 with the target 345 of the reference image (e.g., another decomposed image 315). Based on the comparison, the output handler 140 may calculate or determine the motion trace 360 for the set of decomposed images 315.
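A minimal numeric sketch of such a trace computation, using NumPy and hypothetical target-center coordinates; over many breathing cycles, the mean and standard deviation of per-cycle maxima would summarize the trace as described above:

```python
import numpy as np

# Hypothetical target-center positions extracted from a time-ordered set
# of decomposed images; the first row is the reference image. Columns are
# (longitudinal, in-plane left-right) offsets in millimeters.
positions = np.array([
    [0.0, 0.0],   # reference image
    [2.1, 0.4],
    [4.8, 0.9],
    [2.3, 0.5],
    [0.2, 0.1],
])
isocenter = np.array([0.0, 0.0])       # central point of the target

disp = positions - isocenter           # per-frame displacement from isocenter
max_disp = np.abs(disp).max(axis=0)    # max absolute displacement per direction
print("max longitudinal / IPLR displacement (mm):", max_disp)
```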
  • the output handler 140 can provide, transmit, or otherwise send an output 355 to the display 115.
  • the output 355 can include the decomposed image 315 for presentation to the subject 325, an administrator, a lab technician, or a physician.
  • the decomposed image 315 can clearly indicate the target 345 without the noise, distortions, or the defects of the original projection image 320.
  • the decomposed image 315 can clearly indicate the SOI 335 without the noise, distortions, or the defects of the original projection image 320.
  • the lab technician may find it easier to view the output 355 as the decomposed image 315 may not include any noise.
  • the physician may attempt to view the pancreas of a subject 325 with pancreatic cancer. However, it may be difficult to view the pancreas when a volume 330 or tumor is attached to it.
  • the output 355 may provide the physician with an image without the volume.
  • the output 355 may also include or identify the motion trace 360 across the set of projection images 320 or the corresponding decomposed images 315.
  • the output 355 may reduce computer resources necessary to analyze the target 345 or the SOI 335.
  • the volume 330 may distort the SOI 335, causing a computing device to utilize a substantial amount of computer resources to bypass the volume 330 and generate an image for the SOI 335.
  • the process 400 can include or correspond to operations performed in the system 100 to generate the command signal 405.
  • the system 100 can include an apparatus 410 and at least one beam emitter 415, among others.
  • the apparatus 410 can include, for example, at least one of a treatment couch, a radiotherapy couch, a positioning table, among others.
  • the apparatus 410 can be used to secure, prop, or support the subject 325 (e.g., in a prone or supine position), while the subject 325 is being imaged via the imaging device 110 or receiving radiotherapy via the beam emitter 415.
  • the subject 325 can be situated, arranged, or otherwise positioned on at least one surface (e.g., couch top or tabletop) of the apparatus 410.
  • the apparatus 410 can be positioned, situated, or disposed relative to the imaging device 110 or the beam emitter 415 (or both) to facilitate imaging of the subject 325 and delivery of the radiotherapy to the subject 325.
  • the apparatus 410 can include at least one actuator 412.
  • the actuator 412 can be a mechanical device that receives electrical signals and converts the electrical signals into a movement for the apparatus 410.
  • the actuator 412 can be structured to cause the apparatus 410 (e.g., the couch or tabletop upon which the subject 325 is situated) to move, lift, rotate, or tilt according to the action.
  • the actuator 412 can include, for example, at least one of a circular actuator (e.g., rotary), a linear actuator (e.g., longitudinal axis, lateral axis, vertical axis), or a tilt actuator, among others.
  • the actuator 412 of the apparatus 410 can be controlled using a command signal (e.g., generated by the image processing system 105 or another computing device).
  • the command signal can be communicated via a wired connection (e.g., general-purpose input/output, TTL/CMOS, universal serial bus, or controller area network bus) or a wireless connection (e.g., Wi-Fi, Bluetooth).
  • the beam emitter 415 (e.g., a linear accelerator) can deliver or administer radiotherapy to a subject.
  • the beam emitter 415 may include a particle source to provide ions (e.g., electrons); an accelerating waveguide (e.g., resonating cavities) to accelerate beams to high energy; one or more bending magnets to direct ions toward a target; a target to convert the ions to the radiation beam; and a collimation system to shape and modulate the radiation beam used for the radiotherapy, among others.
  • the beam emitter 415 may also be integrated or included in the imaging device 110 or another device.
  • the beam emitter 415 may be on another portion of the imaging device 110.
  • the radiotherapy provided by the beam emitter 415 may include, for example, at least one of a stereotactic body radiation therapy (SBRT), an intensity-modulated radiation therapy (IMRT), a volumetric modulated arc therapy (VMAT), a conformal radiation therapy (CRT), or a proton beam therapy (PBT), among others.
  • the output handler 140 can receive, retrieve, or otherwise identify the motion trace 360 and a plurality of decomposed images 315A–N (referred to as decomposed images 315 herein) generated over a time window.
  • the time window may range from 30 seconds to 1 hour.
  • each decomposed image 315 can correspond to an acquisition of a respective projection image 320.
  • Each decomposed image 315 can include a respective position for the target 345.
  • decomposed image 315A can include a first position of the target 345A
  • decomposed image 315B can include a second position of the target 345B.
  • the first position of the target 345A and the second position of the target 345B can be the same.
  • the first position of the target 345A and the second position of the target 345B can be different.
  • the movement of the target 345 can be a shift in the target 345 from a first position to a second position.
  • the output handler 140 can determine the motion trace 360 identifying movement of the target 345 within the respective decomposed images 315.
  • the output handler 140 can determine a type for the movement of the target 345 by comparing the position of each target 345 for each respective decomposed image 315 across time.
  • the motion trace 360 can identify or indicate a type of motion for the target 345 across the decomposed images 315, such as a rotation (e.g., in orientation) or a translation (e.g., along the x, y, or z-axis) of the target 345.
  • the target 345A of a first decomposed image 315 can be at an angle of 90 degrees relative to the isocenter in one time instance, whereas the target 345 of a second decomposed image 315 can be at an angle of 98 degrees relative to the isocenter.
  • the output handler 140 can generate at least one command signal 405 indicating an action for the apparatus 410 upon which the subject 325 is positioned.
  • the output handler 140 can generate the signal 405 in accordance with the type of movement of the target 345.
  • the signal 405 can indicate or identify an action to control, modify or adjust the apparatus 410.
  • the signal 405 may be a command for the apparatus 410 to perform the action, which can be at least one of a translation, a rotation, or a tilt, among other actions, relative to the beam emitter 415.
  • the signal 405 may also define or indicate a speed at which the action is to be performed.
  • the output handler 140 can generate the command signal 405 (e.g., in a continuous manner), as the subject 325 is imaged by the imaging device 110 or the beam emitter 415 is delivering the radiotherapy to the subject 325. With the generation of the signal 405, the output handler 140 can transmit, send, or otherwise provide the signal 405 to the actuator 412 of the apparatus 410. Upon reception of the signal 405, the actuator 412 can perform the action in accordance with the signal 405.
  • the actuator 412 can move, tilt, or rotate the apparatus 410 (e.g., the couch or tabletop) relative to the beam emitter 415.
  • the position of the subject 325 relative to the beam emitter 415 may change to account for the movement of organs due to breathing.
  • the beam emitter 415 in turn can continue to administer or deliver the radiotherapy to the target 345 within the volume 330 of the subject 325.
  • the radiotherapy may be delivered to the target 345, without irradiating areas of the volume 330 (e.g., organ tissue) outside, or substantially (e.g., 90–95%) outside, the target 345.
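A minimal sketch of how a detected target shift might be turned into such a command signal; the dataclass fields, deadband threshold, and speed value are illustrative assumptions, not the disclosed control protocol:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommandSignal:
    action: str        # "translate", "rotate", or "tilt"
    axis: str          # e.g., "longitudinal"
    amount_mm: float   # magnitude (and sign) of the correction
    speed_mm_s: float  # speed at which the actuator performs the action

def command_from_shift(shift_mm: float, axis: str = "longitudinal",
                       deadband_mm: float = 0.5) -> Optional[CommandSignal]:
    """Translate the couch opposite to the detected target shift; ignore
    sub-deadband motion so noise does not drive the actuator."""
    if abs(shift_mm) < deadband_mm:
        return None                           # no correction needed
    return CommandSignal("translate", axis, -shift_mm, speed_mm_s=5.0)

sig = command_from_shift(2.3)                 # target drifted 2.3 mm
# -> CommandSignal(action="translate", axis="longitudinal",
#                  amount_mm=-2.3, speed_mm_s=5.0)
```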
  • the image processing system 105 may provide or generate a clearer decomposed image of a structure of interest (e.g., an organ with a tumor) in motion in various orientations or rates behind layers of tissue and muscle within the subject, among others.
  • the IDM 145 may be able to achieve higher clarity, accuracy, and performance in real-time (or near-real-time) by using subject- or patient-specific information defining motion of the structure of interest within the subject, in conjunction with the projection image. Furthermore, since the IDM 145 may be able to achieve higher performance in real-time, this may allow the subject to undergo radiotherapy concurrently with the acquisition of the projection images.
  • the image processing system 105 may permit a clinician examining the subject to make more accurate assessment about the subject and any conditions (e.g., tumor) on a structure of interest (e.g., organ) in the subject.
  • the utility of the image processing system 105 and the imaging device 110 in acquiring projection images and providing radiotherapy to the subject may thus be improved, relative to other approaches that do not factor in the projection image as well as the subject-specific information.
  • the motion traces 360 generated by the image processing system 105 can be used to control the movement of the apparatus 410 to allow for continuous administration of the radiotherapy to the subject 325, even as the subject 325 breathes through the time window.
  • FIG. 32 depicts a flow diagram of a method 500 of identifying targets in x-ray projection images, using a machine learning (ML) model.
  • the method 500 can be implemented by any components detailed herein, such as the system 100.
  • a computing system can obtain a first dataset (505).
  • the computing system can apply the ML model to the first dataset (510).
  • the computing system can identify a target defining a structure of interest (515).
  • the computing system can provide information about the target (520).
  • FIG. 33 depicts a method 600 of training the ML model on a dataset.
  • the method 600 can be implemented by any components detailed herein, such as the system 100.
  • a computing system can obtain a first dataset from training data (605).
  • the computing system can apply the ML model to the first dataset (610).
  • the computing system can identify a target defining a structure of interest (615).
  • the computing system can determine a loss metric to update the ML model (620).
  • the computing system may update the ML model based on the loss metric (625).
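A minimal sketch of this training loop, assuming PyTorch, a differentiable model that maps (image, info) to a predicted target map, and a hypothetical iterable training_data of (image, info, target) triples; the MSE loss stands in for whatever compound loss an implementation uses:

```python
import torch

def train(model, training_data, epochs: int = 10, lr: float = 1e-4):
    """Sketch of the loop of method 600. `model` is assumed to map
    (image, info) to a predicted target map differentiably."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()           # stand-in for a compound loss
    for _ in range(epochs):
        for image, info, target in training_data:
            pred = model(image, info)      # apply the ML model (610, 615)
            loss = loss_fn(pred, target)   # determine the loss metric (620)
            opt.zero_grad()
            loss.backward()                # update the ML model (625)
            opt.step()
```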
  • FIG. 34 shows a simplified block diagram of a representative server system 700, client computing system 714, and network 726 usable to implement certain embodiments of the present disclosure.
  • server system 700 or similar systems can implement services or servers described herein or portions thereof.
  • Client computing system 714 or similar systems can implement clients described herein.
  • the system 100 described herein can be similar to the server system 700.
  • Server system 700 can have a modular design that incorporates a number of modules 702 (e.g., blades in a blade server embodiment); while two modules 702 are shown, any number can be provided.
  • Each module 702 can include processing unit(s) 704 and local storage 706.
  • Processing unit(s) 704 can include a single processor, which can have one or more cores, or multiple processors.
  • processing unit(s) 704 can include a general-purpose primary processor as well as one or more special-purpose co-processors such as graphics processors, digital signal processors, or the like.
  • processing unit(s) 704 can be implemented using customized circuits, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In other embodiments, processing unit(s) 704 can execute instructions stored in local storage 706. Any type of processors in any combination can be included in processing unit(s) 704. Local storage 706 can include volatile storage media (e.g., DRAM, SRAM, SDRAM, or the like) and/or non-volatile storage media (e.g., magnetic or optical disk, flash memory, or the like). Storage media incorporated in local storage 706 can be fixed, removable, or upgradeable as desired.
  • Local storage 706 can be physically or logically divided into various subunits such as a system memory, a read-only memory (ROM), and a permanent storage device.
  • the system memory can be a read-and-write memory device or a volatile read-and-write memory, such as dynamic random-access memory.
  • the system memory can store some or all of the instructions and data that processing unit(s) 704 need at runtime.
  • the ROM can store static data and instructions that are needed by processing unit(s) 704.
  • the permanent storage device can be a non-volatile read-and-write memory device that can store instructions and data even when module 702 is powered down.
  • local storage 706 can store one or more software programs to be executed by processing unit(s) 704, such as an operating system and/or programs implementing various server functions, such as functions of the system 100 or any other system described herein, or any other server(s) associated with the system 100 or any other system described herein.
  • Software refers generally to sequences of instructions that, when executed by processing unit(s) 704, cause server system 700 (or portions thereof) to perform various operations, thus defining one or more specific machine embodiments that execute and perform the operations of the software programs.
  • the instructions can be stored as firmware residing in read-only memory and/or program code stored in non-volatile storage media that can be read into volatile working memory for execution by processing unit(s) 704.
  • Software can be implemented as a single program or a collection of separate programs or program modules that interact as desired. From local storage 706 (or non-local storage described below), processing unit(s) 704 can retrieve program instructions to execute and data to process in order to execute various operations described above.
  • multiple modules 702 can be interconnected via a bus or other interconnect 708, forming a local area network that supports communication between modules 702 and other components of server system 700.
  • Interconnect 708 can be implemented using various technologies including server racks, hubs, routers, etc.
  • a wide area network (WAN) interface 710 can provide data communication capability between the local area network (interconnect 708) and the network 726, such as the Internet. Various technologies can be used, including wired (e.g., Ethernet, IEEE 802.3 standards) and/or wireless technologies (e.g., Wi-Fi, IEEE 802.11 standards).
  • local storage 706 is intended to provide working memory for processing unit(s) 704, providing fast access to programs and/or data to be processed while reducing traffic on interconnect 708.
  • Storage for larger quantities of data can be provided on the local area network by one or more mass storage subsystems 712 that can be connected to interconnect 708.
  • Mass storage subsystem 712 can be based on magnetic, optical, semiconductor, or other data storage media. Direct attached storage, storage area networks, network-attached storage, and the like can be used. Any data stores or other collections of data described herein as being produced, consumed, or maintained by a service or server can be stored in mass storage subsystem 712.
  • additional data storage resources may be accessible via WAN interface 710 (potentially with increased latency).
  • Server system 700 can operate in response to requests received via WAN interface 710.
  • modules 702 can implement a supervisory function and assign discrete tasks to other modules 702 in response to received requests.
  • Work allocation techniques can be used.
  • results can be returned to the requester via WAN interface 710.
  • WAN interface 710 can connect multiple server systems 700 to each other, providing scalable systems capable of managing high volumes of activity.
  • Other techniques for managing server systems and server farms can be used, including dynamic resource allocation and reallocation.
  • Server system 700 can interact with various user-owned or user-operated devices via a wide-area network such as the Internet.
  • An example of a user-operated device is shown in FIG. 34 as client computing system 714.
  • Client computing system 714 can be implemented, for example, as a consumer device such as a smartphone, other mobile phone, tablet computer, wearable computing device (e.g., smart watch, eyeglasses), desktop computer, laptop computer, and so on.
  • client computing system 714 can communicate via WAN interface 710.
  • Client computing system 714 can include computer components such as processing unit(s) 716, storage device 718, network interface 720, user input device 722, and user output device 724.
  • Client computing system 714 can be a computing device implemented in a variety of form factors, such as a desktop computer, laptop computer, tablet computer, smartphone, other mobile computing device, wearable computing device, or the like.
  • Processing unit(s) 716 and storage device 718 can be similar to processing unit(s) 704 and local storage 706 described above. Suitable devices can be selected based on the demands to be placed on client computing system 714; for example, client computing system 714 can be implemented as a “thin” client with limited processing capability or as a high- powered computing device.
  • Client computing system 714 can be provisioned with program code executable by processing unit(s) 716 to enable various interactions with server system 700.
  • Network interface 720 can provide a connection to the network 726, such as a wide area network (e.g., the Internet) to which WAN interface 710 of server system 700 is also connected.
  • network interface 720 can include a wired interface (e.g., Ethernet) and/or a wireless interface implementing various RF data communication standards such as Wi-Fi, Bluetooth, or cellular data network standards (e.g., 3G, 4G, LTE, etc.).
  • User input device 722 can include any device (or devices) via which a user can provide signals to client computing system 714; client computing system 714 can interpret the signals as indicative of particular user requests or information.
  • input device 722 can include any or all of a keyboard, touch pad, touch screen, mouse or other pointing device, scroll wheel, click wheel, dial, button, switch, keypad, microphone, and so on.
  • User output device 724 can include any device via which client computing system 714 can provide information to a user.
  • user output device 724 can include a display to present images generated by or delivered to client computing system 714.
  • the display can incorporate various image generation technologies, e.g., a liquid crystal display (LCD), light- emitting diode (LED) including organic light-emitting diodes (OLED), projection system, cathode ray tube (CRT), or the like, together with supporting electronics (e.g., digital-to-analog or analog-to-digital converters, signal processors, or the like).
  • Some embodiments can include a device such as a touchscreen that functions as both an input and an output device.
  • other user output devices 724 can be provided in addition to or instead of a display. Examples include indicator lights, speakers, tactile “display” devices, printers, and so on.
  • Some embodiments include electronic components, such as microprocessors, storage, and memory, that store computer program instructions in a computer-readable storage medium. Many of the features described in this specification can be implemented as processes that are specified as a set of program instructions encoded on a computer-readable storage medium. When these program instructions are executed by one or more processing units, they cause the processing unit(s) to perform various operations indicated in the program instructions. Examples of program instructions or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
  • processing unit(s) 704 and 716 can provide various functionality for server system 700 and client computing system 714, including any of the functionality described herein as being performed by a server or client, or other functionality. It will be appreciated that server system 700 and client computing system 714 are illustrative and that variations and modifications are possible. Computer systems used in connection with embodiments of the present disclosure can have other capabilities not specifically described here. Further, while server system 700 and client computing system 714 are described with reference to particular blocks, it is to be understood that these blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts.
  • blocks can be but need not be located in the same facility, in the same server rack, or on the same motherboard. Further, the blocks need not correspond to physically distinct components. Blocks can be configured to perform various operations, e.g., by programming a processor or providing appropriate control circuitry, and various blocks might or might not be reconfigurable depending on how the initial configuration is obtained. Embodiments of the present disclosure can be realized in a variety of apparatus including electronic devices implemented using any combination of circuitry and software. While the disclosure has been described with respect to specific embodiments, one skilled in the art will recognize that numerous modifications are possible. Embodiments of the disclosure can be realized using a variety of computer systems and communication technologies including but not limited to the specific examples described herein.
  • Embodiments of the present disclosure can be realized using any combination of dedicated components and/or programmable processors and/or other programmable devices.
  • the various processes described herein can be implemented on the same processor or different processors in any combination.
  • components are described as being configured to perform certain operations, such configuration can be accomplished, e.g., by designing electronic circuits to perform the operation, by programming programmable electronic circuits (such as microprocessors) to perform the operation, or any combination thereof.
  • Computer programs incorporating various features of the present disclosure may be encoded and stored on various computer-readable storage media; suitable media include magnetic disk or tape, optical storage media such as compact disk (CD) or DVD (digital versatile disk), flash memory, and other non-transitory media.
  • Computer-readable media encoded with the program code may be packaged with a compatible electronic device, or the program code may be provided separately from electronic devices (e.g., via Internet download or as a separately packaged computer-readable storage medium).

Landscapes

  • Apparatus For Radiation Diagnosis (AREA)

Abstract

Systems and methods for identifying targets in x-ray projection images are disclosed. A computing system can obtain a first dataset for a first subject. The first dataset can include (i) a first projection image acquired via a first x-ray scan of a first volume having a first structure of interest (SOI) moving within the first subject and (ii) first information to be used to define motion of the first SOI in the first subject. The computing system can apply a machine learning (ML) model to the first dataset. The ML model can be established for at least the first subject using training data. The computing system can identify, based on applying the ML model, a target defining the first SOI within the first projection image. The computing system can store, using one or more data structures, an association between the first subject and the target.
PCT/US2025/031801 2024-05-31 2025-05-30 Décomposition automatisée d'images de projection pour détecter des structures cibles d'intérêt Pending WO2025251043A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463654632P 2024-05-31 2024-05-31
US63/654,632 2024-05-31

Publications (1)

Publication Number Publication Date
WO2025251043A1 (fr) 2025-12-04

Family

ID=97871611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2025/031801 Pending WO2025251043A1 (fr) 2024-05-31 2025-05-30 Décomposition automatisée d'images de projection pour détecter des structures cibles d'intérêt

Country Status (1)

Country Link
WO (1) WO2025251043A1 (fr)
