
WO2025050906A1 - Apparatus and system for determining a shape of an object in an image - Google Patents

Apparatus and system for determining a shape of an object in an image

Info

Publication number
WO2025050906A1
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
images
determining
graph
shape
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/110006
Other languages
English (en)
Inventor
Xiaomeng LI
Jiewen YANG
Xinpeng DING
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hong Kong University of Science and Technology
Original Assignee
Hong Kong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hong Kong University of Science and Technology
Publication of WO2025050906A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G06T 7/162 Segmentation; Edge detection involving graph-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30048 Heart; Cardiac

Definitions

  • the present disclosure relates generally to an apparatus and system for determining a shape of an object in an image.
  • Echocardiography is a non-invasive diagnostic tool that enables the observation of all the structures of the heart. It can capture dynamic information on cardiac motion and function, making it a safe and cost-effective option for cardiac morphological and functional analysis.
  • Accurate segmentation of cardiac structures such as the left ventricle (LV), right ventricle (RV), left atrium (LA), and right atrium (RA) is crucial for determining essential cardiac functional parameters, such as ejection fraction and myocardial strain. These parameters can assist physicians in identifying heart diseases, planning treatments, and monitoring progress. Therefore, the development of an automated structure segmentation method for echocardiogram videos is of great significance.
  • New methods, apparatus, systems that assist in advancing technological needs and industrial applications in this area are desirable.
  • a method comprises generating, for each of a first plurality of images, a first plurality of nodes based on a first plurality of features extracted from each of the first plurality of images, the first plurality of features associated with a shape of an object in the first plurality of images; determining, for each of the first plurality of nodes, one or more other nodes based on historical data indicating a previous position and a time associated with the previous position of each of the first plurality of nodes; and determining the shape of the object for each of the first plurality of images based on the one or more other nodes.
  • Figure 1 shows an exemplary illustration of a method for determining a shape of an object in an image according to certain embodiments of the present disclosure.
  • Figure 2 shows an exemplary illustration of a temporal-wise cycle consistency (TCC) module and a spatial-wise cross-domain graph matching (SCGM) module according to certain embodiments of the present disclosure.
  • Figure 3 shows an exemplary illustration of a workflow of a Recursive Graph Convolutional Cell (RGCC) according to certain embodiments of the present disclosure.
  • Figure 4 shows exemplary results of left ventricle (LV) segmentation on several datasets according to certain embodiments of the present disclosure.
  • Figure 5 shows exemplary results for how SCGM and TCC affect an averaged Dice score according to certain embodiments of the present disclosure.
  • Figure 6 shows exemplary results for how a classification loss and a graph matching loss of the SCGM affect an averaged Dice score according to certain embodiments of the present disclosure.
  • Figure 7 shows exemplary results for how a temporal consistency loss and a global domain-adversarial loss of the TCC affect an averaged Dice score according to certain embodiments of the present disclosure.
  • Figure 8 shows exemplary segmentation results from three echocardiogram images according to an embodiment of the present disclosure.
  • Figure 9 shows an exemplary analysis of how different attentions (e.g., cross-domain attention and internal domain attention) can affect performance of segmentation results according to an embodiment of the present disclosure.
  • Figure 10 shows an exemplary illustration of Dice scores of segmentation results for each frame in an echocardiogram video according to an embodiment of the present disclosure.
  • Figure 11 shows a schematic diagram of an exemplary computing device suitable for use in determining a shape of an object in an image.
  • the present specification also discloses apparatus for performing the operations of the methods.
  • Such apparatus may be specially constructed for the required purposes, or may comprise a computer or other device selectively activated or reconfigured by a computer program stored in the computer.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • Various machines may be used with programs in accordance with the teachings herein.
  • the construction of more specialized apparatus to perform the required method steps may be appropriate.
  • the structure of a computer will appear from the description below.
  • the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code.
  • the computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein.
  • the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the disclosure.
  • Such a computer program may be stored on any computer readable medium.
  • the computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a computer.
  • the computer readable medium may also include a hard-wired medium such as exemplified in the Internet system, or wireless medium such as exemplified in the GSM mobile telephone system.
  • the computer program when loaded and executed on such a computer effectively results in an apparatus that implements the steps of the preferred method.
  • Various embodiments of the present disclosure relate to a method and system for determining a shape of an object in an image.
  • an image refers to a visual representation as commonly known in the art.
  • the image may be a photograph, a video frame, for example a frame from a video (e.g., a video comprising a plurality of video frames) , or other similar media.
  • An image may be from a source domain (e.g., a dataset with labels and ground truth information) or from a target domain (e.g., a dataset from which a shape of an object is to be determined) , and may be used as input in a segmentation network for determining a shape of an object in an image.
  • While echocardiogram (e.g., an ultrasound test that checks the structure and function of a heart) images and videos are referred to herein, it will be appreciated that other similar types of images and videos may also be used.
  • An object refers to an entity captured in the image.
  • a shape of the object is to be determined (e.g., a determination of a visual form of the object) .
  • the object may comprise one or more parts or structures within it that may require segmentation in order to determine a shape of each segment.
  • the object may be a heart shown in an echocardiogram image, and the object may be segmented into segments such as left ventricle (LV) , right ventricle (RV) , left atrium (LA) , and right atrium (RA) of the heart.
  • a shape for the heart and for each segment of the heart (e.g., LV, RV, LA and RA) may then be determined.
  • the object may also be one or more of the LV, RV, LA and RA of a heart.
  • While a heart is referred to as the object in the present disclosure, it will be appreciated that the object may refer to other entities depending on the images and videos used.
  • a feature refers to an attribute or variable associated with the object that may be extracted (e.g., by a feature extractor of a segmentation network) from the image for use in determining a shape of the object.
  • the extracted feature may be used to generate a plurality of nodes in which each node represents a pixel associated with the feature.
  • For each node, one or more other nodes may be determined based on a K-nearest-neighbour search on a hidden state of the node.
  • a hidden state refers to historical data indicating a previous position of each node and a time associated with the previous position.
  • the shape of the object may then be determined based on the one or more other nodes, advantageously taking into account cyclical consistency of a heart and thus improving accuracy of the determination. Further, a plurality of edges for connecting the one or more other nodes with each of the plurality of nodes may be determined, and a global representation may be generated for the image.
  • the global representation refers to a representation of the image (or a plurality of images, either in the source domain or the target domain) which is connected to all other nodes and edges (e.g., all other nodes and edges generated from other videos or pluralities of images that are also used as input in the segmentation network) to facilitate passing of information and messages throughout the network.
  • a first global representation may be generated based on an image or a plurality of images from the target domain and a second global representation may be generated based on an image or a plurality of images from the source domain, and these global representations may be used for determining a total temporal consistency loss which may advantageously be used for improving accuracy of determining the shape of the object.
  • an extracted feature and its corresponding pseudo label may be used to generate a graph for modelling a corresponding image.
  • a first graph may be generated for an image (or a first plurality of images) of the target domain and a second graph may be generated for an image (or a second plurality of images) of the source domain.
  • An alignment between the first (target domain) graph and the second (source domain) graph may be performed based on an adjacency matrix (e.g., an N x M matrix in which N and M refer to the total number of nodes in the source domain and the target domain respectively, and in which each element of the matrix represents the existence of an edge connecting a pair of nodes between the graphs) to reduce the difference (domain gap) between both domains, thereby improving the accuracy of the determination of the shape of the object.
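  • As a non-limiting illustration, a minimal sketch of how such a cross-domain adjacency matrix might be computed is shown below; the use of cosine similarity and the threshold value are assumptions made for illustration, not requirements of the present disclosure.

```python
# Illustrative sketch only: one way to build an N x M cross-domain adjacency
# matrix from node features, where a 1 denotes an assumed edge between a
# source-domain node and a target-domain node.
import torch
import torch.nn.functional as F

def cross_domain_adjacency(v_s: torch.Tensor, v_t: torch.Tensor,
                           threshold: float = 0.5) -> torch.Tensor:
    """v_s: (N, C) source node features; v_t: (M, C) target node features."""
    v_s = F.normalize(v_s, dim=1)          # unit-normalise node features
    v_t = F.normalize(v_t, dim=1)
    sim = v_s @ v_t.T                      # (N, M) cosine similarities
    return (sim > threshold).float()       # 1 where an edge is assumed to exist
```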
  • a classification loss may be determined based on the aligned first and second graphs.
  • a transport cost matrix (e.g., a distance matrix that is calculated for each feature between the source domain and the target domain) may also be determined based on the aligned first and second graphs (e.g., utilizing a Sinkhorn algorithm (an iterative method also known as the Sinkhorn-Knopp algorithm, used to solve optimal transport problems and compute a Sinkhorn distance between two probability distributions) or other similar algorithm) , and a graph matching loss may be determined based on the transport cost matrix.
  • The classification loss and the graph matching loss may be used to optimise the network and eliminate the influence of domain shift (e.g., from the source domain to the target domain).
  • Unsupervised domain adaptation (UDA) segmentation for echocardiogram videos has not yet been explored, and the most intuitive approach is to adapt existing UDA methods designed for natural image segmentation and medical image segmentation.
  • existing methods can be grouped into 1) the image-level alignment methods that focus on aligning the style difference to minimize the domain gaps, such as Probabilistic Latent Component Analysis (PLCA) , Pix-Match and Fourier-based UDA; and 2) feature-level alignment methods that use global class-wise alignment to reduce the discrepancy between source and target domains.
  • the present disclosure refers to a novel graph-driven UDA method for echocardiogram video segmentation.
  • the proposed method consists of two novel designs: (1) Spatial-wise Cross-domain Graph Matching (SCGM) module and (2) Temporal Cycle Consistency (TCC) module.
  • SCGM is motivated by the fact that the structures/positions of the different cardiac structures are similar across different patients and domains. For example, the left ventricle typically appears visually similar across different patients.
  • the SCGM approach reframes domain alignment as a fine-grained graph-matching process that aligns both class-specific representations (local information) and the relationships between different classes (global information) . By doing so, it is possible to simultaneously improve intra-class coherence and inter-class distinctiveness.
  • The proposed TCC module is inspired by the observation that recorded echocardiogram videos exhibit cyclical consistency. Specifically, the TCC module utilizes a series of recursive graph convolutional cells to model the temporal relationships between graphs across frames, generating a global temporal graph representation for each patient. A contrastive objective is utilized that brings together representations from the same video while pushing away those from different videos, thereby enhancing temporal discrimination.
  • the proposed method can leverage prior knowledge in echocardiogram videos to enhance inter-class differences and intra-class similarities across source and target domains while preserving temporal cyclical consistency, leading to a better UDA segmentation result, for example as shown in results 804 of the proposed method in comparison with ground truth 806 in illustration 800 of Figure 8.
  • UDA segmentation methods are typically utilized for segmenting natural and medical images.
  • Adversarial-based domain adaptation methods and self-training methods (both single-stage and multi-stage) are the most commonly used training methods.
  • the adversarial method aims to align the distributions and reduce the discrepancy of source and target domains through the Generative Adversarial Networks (GAN) framework.
  • Self-training methods generate and update pseudo labels online during training, for example by applying data augmentation or domain mix-up.
  • the UDA segmentation methods can be classified into image-level methods that use GANs and different types of data augmentation to transfer source domain data to the target domain, and feature-level methods, such as feature alignment methods that aim to learn domain-invariant features across domains.
  • Cardiac segmentation techniques can be referenced from documents such as: A novel unsupervised domain adaptation framework based on graph convolutional network and multi-level feature alignment for inter-subject ECG classification (Volume 221, 2023, 119711, ISSN 0957-4174); Automated cardiac segmentation of cross-modal medical images using unsupervised multi-domain adaptation and spatial neural attention structure (Medical Image Analysis, Volume 72, August 2021, 102135); Characterizing Spatio-temporal Patterns for Disease Discrimination in Cardiac Echo Videos (Medical Image Computing and Computer-Assisted Intervention, MICCAI 2007, 10th International Conference, Brisbane, Australia, October 29 to November 2, 2007, Proceedings, Part I, DOI: 10.1007/978-3-540-75757-3_32); China patent no. CN 111476805 B; Coronary heart disease prediction method fusing domain-adaptive transfer learning with graph convolutional networks (Lin, H., Chen, K., Xue, Y. et al., Sci Rep 13, 14276 (2023)); Unsupervised Domain Adaptation for Cardiac Segmentation: Towards Structure Mutual Information Maximization (Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 2588-2597); and GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation (Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 11878-11887).
  • Graph neural networks have the ability to construct graphical representations to describe irregular objects of data. Also, graphs can iteratively aggregate the knowledge based on the broadcasting of their neighbouring nodes in the graph, which is more flexible for constructing the relationship among different components.
  • the learned graph representations can be used in various downstream tasks, such as classification, object detection, vision-language, etc.
  • ViG models an image as a graph and uses GNN to extract high-level features for image classification.
  • A graphical representation, instead of the feature space, may be applied to explore multiple long-range contextual patterns at different scales for more accurate object detection.
  • GOT leverages the graphs to conduct the vision and language alignment for image-text retrieval.
  • the proposed method learns both local class-wise and temporal-wise graph representations, which can advantageously reduce the domain gap in a fine-grained approach and enhance temporal consistency, leading to an enhanced result.
  • The source (e.g., second plurality of images) and target domain data (e.g., first plurality of images) may be denoted as {𝒳^s, 𝒴^s} and 𝒳^t respectively, where 𝒳^s is the video set in the source domain and 𝒴^s is its corresponding label set. Note that the video set in the target domain, 𝒳^t, is unlabelled.
  • An image or video frame with its label {x^s, y^s} (e.g., see y^s 120) may be sampled from an example {X^s, Y^s} of the source domain data, where X^s ∈ 𝒳^s is a video from 𝒳^s and Y^s ∈ 𝒴^s is its corresponding label.
  • An image or a video frame may also be sampled from the target domain, e.g., x^t.
  • the basic segmentation network of illustration 100 may consist of the feature extractor 106 and the decoder 110.
  • The x^s (e.g., second plurality of images) or x^t (e.g., first plurality of images) may be input to the feature extractor 106 to obtain a plurality of features 108, e.g., f^s (a second plurality of features) or f^t (a first plurality of features) respectively, followed by the decoder 110 that maps the features f^s or f^t to a corresponding prediction mask, e.g., ŷ^s (see 118) or ŷ^t.
  • A segmentation loss 116, e.g., L_seg, may be determined on the labelled source domain, e.g., as L_seg = L_bce(ŷ^s, y^s) + L_dice(ŷ^s, y^s), where L_bce and L_dice are the binary cross-entropy loss and the Dice loss respectively.
  • Binary Cross-Entropy (BCE) loss refers to a loss function used in machine learning, particularly in binary classification problems where the goal is to predict whether an input belongs to one of two classes (e.g., “yes” or “no”, “true” or “false”, “spam” or “not spam”).
  • A Dice score, also known as the Dice coefficient, is a metric used to evaluate the similarity or overlap between two sets of data; the Dice loss is typically defined as one minus the Dice score. It is commonly used in the field of image segmentation, where it measures the similarity between a predicted segmentation and the ground truth segmentation.
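  • For illustration, the combination L_seg = L_bce + L_dice described above may be sketched as follows in PyTorch; the smoothing constant eps is an illustrative assumption rather than a value specified in the present disclosure.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred: torch.Tensor, target: torch.Tensor,
              eps: float = 1e-6) -> torch.Tensor:
    """pred: predicted mask probabilities in [0, 1]; target: binary mask."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def segmentation_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Sketch of L_seg = L_bce + L_dice on labelled source-domain data."""
    return F.binary_cross_entropy(pred, target) + dice_loss(pred, target)
```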
  • a spatial-wise cross-domain graph matching (SCGM) module 114 may be utilized to align both class-wise representations and their relations across the source and target domains.
  • a graph may be used to model each echocardiogram frame, where a plurality of nodes in each graph represent the different chambers (e.g., LV, RV, LA and RA of a heart in the echocardiogram) and a plurality of edges in each graph illustrate the relations between them.
  • In this way, the graph can explicitly construct the relations among the different classes.
  • The features of the source and target domains, e.g., f^s and f^t, may be converted to the corresponding graph representations, which are defined as g^s 122 and g^t 124 respectively.
  • a graph matching method may be utilized to align the generated graph to reduce the domain gap.
  • Edge connections e^s may be defined as a learned matrix.
  • The constructed semantic graph for the source domain (e.g., generating, for each of a second plurality of images, a second graph based on a second plurality of features extracted from each of the second plurality of images and based on a second plurality of pseudo labels associated with the second plurality of images) may be defined as g^s = {v^s, e^s} (e.g., see g^s 122).
  • A self-attention technique (e.g., training a model to focus on the most relevant parts of an input sequence when generating an output) on the nodes v^s and v^t may be implemented, which can be formulated as applying self-attention to concat(v^s, v^t), where concat indicates a concatenation (e.g., combining two or more features into one feature).
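  • A minimal sketch of this concatenate-then-attend step is shown below using a standard multi-head self-attention layer; the embedding size, head count and node counts are illustrative assumptions only.

```python
import torch
import torch.nn as nn

# Sketch: self-attention over the concatenation of source- and target-domain
# graph nodes, so every node can attend to nodes from both domains.
embed_dim, num_heads = 64, 4                 # illustrative hyperparameters
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

v_s = torch.randn(1, 4, embed_dim)           # e.g., 4 source nodes (LV, RV, LA, RA)
v_t = torch.randn(1, 4, embed_dim)           # e.g., 4 target nodes
v = torch.cat([v_s, v_t], dim=1)             # concat(v^s, v^t): (1, 8, embed_dim)
v_attended, _ = attn(v, v, v)                # cross-domain self-attention
```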
  • A classification loss L_cls may be determined based on a comparison between the first and the second graphs (e.g., a comparison between a first plurality of nodes associated with the first graph and a second plurality of nodes associated with the second graph), e.g., as a softmax cross-entropy over the node representations.
  • Softmax is a function used in the final layer of a neural network for multi-class classification problems; it converts a vector of raw scores into a probability distribution over classes.
  • Graph matching may be implemented by maximising the similarity between graphs (including the nodes and edges in the graphs) belonging to the same class but from two different domains.
  • An adjacency matrix A may be obtained from g^s 122 and g^t 124 to represent the relations among the graph nodes (e.g., performing an alignment of the first and second graphs based on an adjacency matrix).
  • The maximisation process may then be recast as optimising a transport distance over the adjacency matrix A.
  • Transport distance also known as the Wasserstein distance or the Earth Mover's distance, is a metric used to measure the similarity or dissimilarity between two probability distributions.
  • A Sinkhorn algorithm or other similar algorithm may be utilized to obtain the transport cost matrix of the plurality of graphs among the chambers, e.g., the LV, RV, LA and RA of a heart in the echocardiogram (e.g., determining a transport cost matrix based on the aligned first and second graphs).
  • An optimization target, the graph matching loss L_mat, can then be formulated over the transport cost matrix (Equation (3)). Equation (3) aims to minimize the distance between samples of the same class across different domains while increasing the distance between samples of different classes across domains, thus eliminating the influence of domain shift.
  • L_SCGM = L_cls + L_mat is the overall loss of the SCGM module 114.
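  • As a non-limiting illustration of the graph matching described above, a minimal Sinkhorn-Knopp sketch follows; uniform marginals, the regularisation weight and the iteration count are assumptions made for illustration.

```python
import torch

def sinkhorn(cost: torch.Tensor, reg: float = 0.1,
             n_iters: int = 50) -> torch.Tensor:
    """Sinkhorn-Knopp iterations for an (N, M) cost matrix, assuming
    uniform marginals; returns an approximate optimal-transport plan."""
    n, m = cost.shape
    K = torch.exp(-cost / reg)                 # Gibbs kernel
    u = torch.full((n,), 1.0 / n)
    v = torch.full((m,), 1.0 / m)
    for _ in range(n_iters):                   # alternate marginal rescaling
        u = (1.0 / n) / (K @ v)
        v = (1.0 / m) / (K.T @ u)
    return torch.diag(u) @ K @ torch.diag(v)   # transport plan

def graph_matching_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """Sketch of a graph matching loss: Sinkhorn transport distance between
    source node features f_s (N, C) and target node features f_t (M, C)."""
    cost = torch.cdist(f_s, f_t)               # pairwise distance (cost) matrix
    plan = sinkhorn(cost)
    return (plan * cost).sum()                 # transport distance
```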
  • a Temporal Cycle Consistency (TCC) module 112 may be utilized to enhance the temporal graphic representation learning across the plurality of images or frames, by leveraging the temporal morphology of echocardiograms, e.g., the discriminative heart cycle pattern across different patients.
  • the proposed TCC may consist of three parts: a temporal graph node construction to generate a sequence of temporal graph nodes for each video; a recursive graph convolutional cell to learn the global graph representations for each video; a temporal consistency loss to enhance the intra-video similarity and reduce the inter-video similarity.
  • the TCC may be applied to both source and target domains. In the following, the TCC is explained based on the source domain for clarity.
  • A plurality of features for the plurality of images or frames (e.g., a second plurality of images) may be defined as {f^s_1, ..., f^s_N}, where f^s_i is the feature of the i-th image or frame and N is the number of images or frames in X^s.
  • Each compressed feature f^s_i may be flattened (e.g., as shown in 302 of illustration 300) and its plurality of pixels may be treated as a plurality of graphical nodes, e.g., v^s_i. Thus, a plurality of temporal graph nodes for the video X^s may be defined as the sequence {v^s_1, ..., v^s_N}.
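  • A short sketch of this temporal graph node construction is given below: each per-frame feature map is flattened so that its pixels become node vectors; the tensor layout is an illustrative assumption.

```python
import torch

def frames_to_temporal_nodes(features: torch.Tensor) -> torch.Tensor:
    """features: (N, C, H, W) per-frame feature maps of one video.
    Returns (N, H*W, C): for each frame, every pixel of the compressed
    feature map becomes one temporal graph node."""
    n, c, h, w = features.shape
    return features.flatten(2).transpose(1, 2)   # (N, C, H*W) -> (N, H*W, C)
```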
  • a recursive graph convolutional cell 204 may be utilized to aggregate the semantics of the temporal graph nodes (e.g., see 304 of illustration 300) for obtaining the global temporal representation of each video (e.g., see 306 of illustration 300) .
  • For the p-th node of the (i+1)-th graph, its K nearest neighbours N(p) may be found on the hidden state h_i, where N(p) ⊆ h_i (e.g., determining, for each of the plurality of nodes, one or more other nodes based on historical data indicating a previous position and a time associated with the previous position of each of the second plurality of nodes).
  • A directed edge may then be added from h_i(q) to the p-th node of the (i+1)-th graph for all h_i(q) ∈ N(p).
  • The message broadcast from the i-th graph to the (i+1)-th graph can then be defined, e.g., as h_{i+1}(p) = σ(w_gcn · Σ_{q ∈ N(p)} h_i(q) + b_gcn), where σ indicates the activation function, and w_gcn and b_gcn are the graph convolution weight and bias, respectively.
  • This message broadcast may be conducted across all frames of the video to obtain a final hidden state h_N.
  • the final hidden state refers to a hidden state of the network after it has processed the entire input sequence e.g., the first and second plurality of images.
  • The resulting global temporal representation of a source video may be written as o^s = RGCC(X^s).
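  • The recursion above may be sketched as follows, assuming the K-nearest-neighbour lookup on the hidden state and the generic graph-convolution update given earlier; the residual connection, mean aggregation and dimensions are illustrative choices, not limitations of the present disclosure.

```python
import torch
import torch.nn as nn

class RGCCSketch(nn.Module):
    """Illustrative recursive graph convolutional cell: for every node of the
    next frame, gather its K nearest neighbours N(p) from the current hidden
    state and apply a learned update (w_gcn, b_gcn) with an activation."""
    def __init__(self, dim: int, k: int = 4):
        super().__init__()
        self.k = k
        self.w_gcn = nn.Linear(dim, dim)   # graph convolution weight and bias
        self.act = nn.ReLU()               # activation function sigma

    def forward(self, node_seq: torch.Tensor) -> torch.Tensor:
        """node_seq: (N, P, C) temporal graph nodes; returns the final
        hidden state h_N of shape (P, C)."""
        h = node_seq[0]                                   # initial hidden state
        for i in range(1, node_seq.shape[0]):
            v = node_seq[i]                               # nodes of next graph
            d = torch.cdist(v, h)                         # (P, P) distances
            idx = d.topk(self.k, largest=False).indices   # K nearest neighbours
            msg = h[idx].mean(dim=1)                      # aggregate neighbours
            h = self.act(self.w_gcn(msg) + v)             # recursive update
        return h

# A global temporal representation o^s could then be pooled from h_N, e.g.:
#   o_s = RGCCSketch(dim=64)(nodes).mean(dim=0)
```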
  • a temporal representation refers to one or more features learned by the network about cardiac motion patterns in the first plurality of images (e.g., echocardiogram videos) .
  • A temporal consistency loss may be leveraged to make features from the same video similar and features from different videos dissimilar.
  • Contrastive learning, a mainstream method that pulls positive pairs close and pushes negative ones away, may be used to achieve this goal.
  • Other similar methods may also be utilized. For example, two consecutive clips may be randomly sampled from a video X^s as a positive pair. These positive clips are input to the recursive graph convolutional cell to obtain their global representations, e.g., o^s_a and o^s_b. For negative pairs, a memory bank B consisting of representations of clips sampled from different videos may be maintained. The temporal consistency loss for the source domain (e.g., obtaining temporal consistency loss for the source domain based on global representation(s) for the source domain), denoted L^s_tc, may then be defined in a contrastive form over the positive pair and the memory bank.
  • The temporal consistency loss for the target domain, e.g., L^t_tc (obtaining temporal consistency loss for the target domain based on global representation(s) for the target domain), may be defined analogously.
  • The total temporal consistency loss (e.g., obtaining total temporal consistency loss based on a first global representation for the target domain and a second global representation for the source domain) may then be obtained, e.g., as L_tc = L^s_tc + L^t_tc.
  • Since L_tc is applied to the two domains independently, a gap between the source and target domains still exists for the learned global representations, e.g., o^s or o^t. Hence, adversarial methods may be utilized to eliminate the gap between o^s and o^t, which can be formulated as a global domain-adversarial loss L_adv.
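  • A minimal InfoNCE-style sketch of the temporal consistency loss is given below; cosine similarity, the temperature value and the representation of the memory bank as a plain tensor are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(o_a: torch.Tensor, o_b: torch.Tensor,
                              bank: torch.Tensor,
                              tau: float = 0.07) -> torch.Tensor:
    """o_a, o_b: (C,) global representations of two clips of the same video
    (positive pair); bank: (B, C) memory bank of clips from other videos
    (negatives). Pulls the positive pair together, pushes negatives apart."""
    o_a = F.normalize(o_a, dim=0)
    o_b = F.normalize(o_b, dim=0)
    bank = F.normalize(bank, dim=1)
    pos = torch.exp(o_a @ o_b / tau)            # positive-pair similarity
    neg = torch.exp(o_a @ bank.T / tau).sum()   # similarity to negatives
    return -torch.log(pos / (pos + neg))
```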
  • CAMUS consists of 500 echocardiogram videos with pixel-level annotations for the left ventricle, myocardium, and left atrium. To save the annotation cost, only 2 frames (end diastole and end systole) are labelled in each video.
  • the dataset was randomly split into 8 : 1 : 1 for training, validation, and testing.
  • Echonet Dynamic is the largest echocardiogram video dataset, including 10,030 videos with human expert annotations. Similarly, these videos were split 8 : 1 : 1 for training, validation, and testing, respectively.
  • Table 400 of Figure 4 shows the results of the UDA methods on the three datasets (CardiacUDA, CAMUS, and Echonet) under six settings. As only LV segmentation labels were provided in these three datasets, only the Dice scores of LV segmentation are reported in the table. “EDV” and “ESV” refer to the Dice scores of LV segmentation results at end-diastole and end-systole frames, respectively. All results are reported in Dice score (%). In Table 400, ‘a → b’ indicates that a is the source domain and b is the target domain. As can be seen in Table 400, the proposed method achieves excellent performance under all six settings.
  • The proposed method can achieve 87.6% and 82.4% on Dice (e.g., a metric used to evaluate the similarity or overlap between two sets of segmented regions) for EDV and ESV, respectively, which are very close to the upper bound (see row 404 of Table 400) of this setting.
  • the proposed method was also compared with state-of-the-art methods on different settings as shown in the remaining rows 406 of Table 400, which shows the proposed method outperforming all other methods with significant improvements.
  • Table 500 shows the effectiveness of the proposed SCGM and TCC.
  • “Base” indicates the basic segmentation network. The results show that adopting SCGM can largely improve the base model from 48.5% to 74.3% under setting G → R. However, only applying TCC shows limited improvements over the base model. This is mainly because the TCC is designed to jointly train unlabelled data and construct a better graphical representation in a temporal manner, and does not include any operation that focuses on narrowing the domain discrepancy, leading to limited adaptation results. Thus, a combination of both SCGM and TCC in the proposed method achieves the best performance.
  • Illustration 1000 of Figure 10 illustrates that the segmentation result generated by a framework with the TCC module is able to present more consistent performance (e.g., marked by the line 1002) in a video.
  • The results without the TCC module (e.g., marked by the line 1004) or with domain adaptation disabled (e.g., marked by the line 1006) perform worse in terms of segmentation consistency.
  • illustration 1000 of Figure 10 also shows the Dice score for each frame in a video (e.g., a plurality of images) example.
  • The proposed method (e.g., marked by the line 1002) produces better results with enhanced temporal consistency, showing the effectiveness of the TCC module in learning temporal information.
  • Figure 11 shows a schematic diagram of an exemplary computing device suitable for use in determining a shape of an object in an image.
  • Figure 11 depicts an exemplary computing device 1100, hereinafter interchangeably referred to as a computer system 1100, where one or more such computing devices 1100 may be used as a system for determining a shape of an object in an image and execute the processes and calculations as depicted in at least Figures 1 to 10.
  • the following description of the computing device 1100 is provided by way of example only and is not intended to be limiting.
  • the example computing device 1100 includes a processor 1104 for executing software routines. Although a single processor is shown for the sake of clarity, the computing device 1100 may also include a multi-processor system.
  • the processor 1104 is connected to a communication infrastructure 1106 for communication with other components of the computing device 1100.
  • the communication infrastructure 1106 may include, for example, a communications bus, cross-bar, or network.
  • the computing device 1100 further includes a main memory 1108, such as a random access memory (RAM) , and a secondary memory 1110.
  • the secondary memory 1110 may include, for example, a storage drive 1112, which may be a hard disk drive, a solid state drive or a hybrid drive and/or a removable storage drive 1114, which may include a magnetic tape drive, an optical disk drive, a solid state storage drive (such as a USB flash drive, a flash memory device, a solid state drive or a memory card) , or the like.
  • the removable storage drive 1114 reads from and/or writes to a removable storage medium 1118 in a well-known manner.
  • the removable storage medium 1118 may include magnetic tape, optical disk, non-volatile memory storage medium, or the like, which is read by and written to by removable storage drive 1114.
  • the removable storage medium 1118 includes a computer readable storage medium having stored therein computer executable program code instructions and/or data.
  • the secondary memory 1110 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into the computing device 1100.
  • Such means can include, for example, a removable storage unit 1122 and an interface 1120.
  • a removable storage unit 1122 and interface 1120 include a program cartridge and cartridge interface (such as that found in video game console devices) , a removable memory chip (such as an EPROM or PROM) and associated socket, a removable solid state storage drive (such as a USB flash drive, a flash memory device, a solid state drive or a memory card) , and other removable storage units 1122 and interfaces 1120 which allow software and data to be transferred from the removable storage unit 1122 to the computer system 1100.
  • the computing device 1100 also includes at least one communication interface 1124.
  • the communication interface 1124 allows software and data to be transferred between computing device 1100 and external devices via a communication path 1126.
  • the communication interface 1124 permits data to be transferred between the computing device 1100 and a data communication network, such as a public data or private data communication network.
  • The communication interface 1124 may be used to exchange data between different computing devices 1100 where such computing devices 1100 form part of an interconnected computer network. Examples of a communication interface 1124 can include a modem, a network interface (such as an Ethernet card), a communication port (such as a serial, parallel, printer, GPIB, IEEE 1394, RJ45, or USB port), an antenna with associated circuitry, and the like.
  • the communication interface 1124 may be wired or may be wireless.
  • Software and data transferred via the communication interface 1124 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communication interface 1124. These signals are provided to the communication interface via the communication path 1126.
  • the computing device 1100 further includes a display interface 1102 which performs operations for rendering images or videos to an associated display 1130 and an audio interface 1132 for performing operations for playing audio content via associated speaker (s) 1134.
  • Computer program product may refer, in part, to removable storage medium 1118, removable storage unit 1122, a hard disk installed in storage drive 1112, or a carrier wave carrying software over communication path 1126 (wireless link or cable) to communication interface 1124.
  • Computer readable storage media refers to any non-transitory, non-volatile tangible storage medium that provides recorded instructions and/or data to the computing device 1100 for execution and/or processing.
  • Examples of such storage media include magnetic tape, CD-ROM, DVD, Blu-ray Disc, a hard disk drive, a ROM or integrated circuit, a solid state storage drive (such as a USB flash drive, a flash memory device, a solid state drive or a memory card) , a hybrid drive, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computing device 1100.
  • Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computing device 1100 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
  • the computer programs are stored in main memory 1108 and/or secondary memory 1110. Computer programs can also be received via the communication interface 1124. Such computer programs, when executed, enable the computing device 1100 to perform one or more features of embodiments discussed herein. In various embodiments, the computer programs, when executed, enable the processor 1104 to perform features of the above-described embodiments. Accordingly, such computer programs represent controllers of the computer system 1100.
  • Software may be stored in a computer program product and loaded into the computing device 1100 using the removable storage drive 1114, the storage drive 1112, or the interface 1120.
  • the computer program product may be a non-transitory computer readable medium. Alternatively, the computer program product may be downloaded to the computer system 1100 over the communications path 1126.
  • The software, when executed by the processor 1104, causes the computing device 1100 to perform, as a system for determining a shape of an object in an image, the necessary operations to execute the processes, perform the calculations, and other similar computations as shown in Figures 1 to 10.
  • Figure 11 is presented merely by way of example to explain the operation and structure of a system for determining a shape of an object in an image. Therefore, in some embodiments one or more features of the computing device 1100 may be omitted. Also, in some embodiments, one or more features of the computing device 1100 may be combined together. Additionally, in some embodiments, one or more features of the computing device 1100 may be split into one or more component parts.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method for determining a shape of an object in an image is disclosed, the method comprising: generating, for each of a first plurality of images, a first plurality of nodes based on a first plurality of features extracted from each of the first plurality of images, the first plurality of features being associated with a shape of an object in the first plurality of images; determining, for each of the first plurality of nodes, one or more other nodes based on historical data indicating a previous position and a time associated with the previous position of each of the first plurality of nodes; and determining the shape of the object for each of the first plurality of images based on the one or more other nodes.
PCT/CN2024/110006 2023-09-06 2024-08-06 Apparatus and system for determining a shape of an object in an image Pending WO2025050906A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363580717P 2023-09-06 2023-09-06
US63/580717 2023-09-06

Publications (1)

Publication Number Publication Date
WO2025050906A1 (fr) 2025-03-13

Family

ID=94922925

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/110006 Pending WO2025050906A1 (fr) Apparatus and system for determining a shape of an object in an image

Country Status (1)

Country Link
WO (1) WO2025050906A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190325605A1 (en) * 2016-12-29 2019-10-24 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
US20190378278A1 (en) * 2018-06-09 2019-12-12 Uih-Rt Us Llc Systems and methods for generating augmented segmented image set
US20210125037A1 (en) * 2019-10-28 2021-04-29 Ai4Medimaging - Medical Solutions, S.A. Artificial intelligence based cardiac motion classification
US20230005140A1 (en) * 2020-03-13 2023-01-05 Genentech, Inc. Automated detection of tumors based on image processing
CN115082493A (zh) * 2022-06-02 2022-09-20 陕西科技大学 3D atrial image segmentation method and system based on shape-guided dual consistency

Similar Documents

Publication Publication Date Title
Wang et al. Quantification of full left ventricular metrics via deep regression learning with contour-guidance
Biffi et al. Explainable anatomical shape analysis through deep hierarchical generative models
US11701066B2 (en) Device and method for detecting clinically important objects in medical images with distance-based decision stratification
Lan et al. Deep convolutional neural networks for WCE abnormality detection: CNN architecture, region proposal and transfer learning
CN113592769B (zh) Abnormal image detection, model training method, apparatus, device and medium
Gao et al. Transformer based multiple instance learning for WSI breast cancer classification
Zhang et al. Cascaded feature warping network for unsupervised medical image registration
CN111276240A (zh) Multi-label multi-modal holographic pulse condition recognition method based on graph convolutional network
Huang et al. Prototype-guided graph reasoning network for few-shot medical image segmentation
Burmeister et al. Less is more: A comparison of active learning strategies for 3d medical image segmentation
Arora et al. Deep Learning Approaches for Enhanced Kidney Segmentation: Evaluating U-Net and Attention U-Net with Cross-Entropy and Focal Loss Functions
US20240144469A1 (en) Systems and methods for automatic cardiac image analysis
Xiao et al. Rcga-net: An improved multi-hybrid attention mechanism network in biomedical image segmentation
Zhang et al. CTransNet: Convolutional neural network combined with transformer for medical image segmentation
WO2025050906A1 (fr) Apparatus and system for determining a shape of an object in an image
Peña et al. Cardiac disease representation conditioned by spatio-temporal priors in cine-MRI sequences using generative embedding vectors
Zhou et al. Balancing High-Performance and Lightweight: HL-UNet for 3D Cardiac Medical Image Segmentation
Susanto et al. Data augmentation using spatial transformation for brain tumor segmentation improvement
Song et al. Abdominal multi-organ segmentation using multi-scale and context-aware neural networks
Zhou et al. Learning deep feature representations for multi-modal MR brain tumor segmentation
Shetty et al. Self-Sequential Attention Layer based DenseNet for Thoracic Diseases Detection.
CN115578360A (zh) Multi-target semantic segmentation method for echocardiographic images
Rajaraman et al. Ensembled YOLO for multiorgan detection in chest x-rays
Loukil A Hybrid Approach to Intelligent Prediction of Medical Conditions A Framework for Advancing Medical Diagnostics through Novel Hybrid Deep Learning Models DenCeption and HyBoost for Enhanced Feature Extraction and Predictive Accuracy in Medical Image Analysis
Yang et al. Not all areas are equal: Detecting thoracic disease with chestwnet

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24861751

Country of ref document: EP

Kind code of ref document: A1