WO2023114364A1 - Probability cube to probability cube enhancement for deep learning artificial intelligence - Google Patents
- Publication number: WO2023114364A1 (PCT/US2022/052955)
- Authority: WIPO (PCT)
- Prior art keywords
- deep learning
- learning algorithm
- planar orientation
- training
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7753—Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Definitions
- AI: Artificial Intelligence
- AI is a term of art in the interdisciplinary fields of computer science, mathematics, data science, and statistics that generally refers to hardware systems and software algorithms that perform tasks, not intuitively programmatic, that are thought of as requiring some aspect of human intelligence such as, for example, visual perception, object detection, pattern recognition, natural language processing, speech recognition, translation, and decision making.
- AI seeks to develop algorithms that mimic human intelligence to perform tasks and to improve in the performance of those tasks over time.
- The promise of these techniques is a more efficient alternative for capturing knowledge in data and gradually improving the performance of predictive models, thereby enabling data-driven decision making. While AI remains an exceptionally broad area of research, recent advancements in machine learning and, more specifically, deep learning have led to a paradigm shift in the creation of predictive models that have widespread application across industries.
- A method of probability cube to probability cube enhancement for deep learning includes generating a second probability cube of data by inputting a first probability cube comprising a plurality of predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube of data into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more features of interest in the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation, and training a model of a deep learning algorithm on the plurality of labeled orthogonal 2-dimensional slices of data having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
- A computer-implemented method of probability cube to probability cube enhancement for deep learning includes generating a second probability cube of data by inputting a first probability cube comprising a plurality of predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube of data into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more features of interest in the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation, and training a model of a deep learning algorithm on the plurality of labeled orthogonal 2-dimensional slices of data having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
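The claimed enhancement loop can be sketched in a few lines of Python. This is an illustrative stand-in only: the cube is modeled as a nested list indexed [x][y][z], `label_slice` (a simple threshold) stands in for manual labeling of features of interest, and `predict_slice` stands in for inference by the retrained model; none of these names come from the patent.

```python
def partition_xz(cube):
    """Partition a [x][y][z] cube into 2-D slices with an XZ planar orientation (one slice per y)."""
    nx, ny, nz = len(cube), len(cube[0]), len(cube[0][0])
    return [[[cube[x][y][z] for z in range(nz)] for x in range(nx)] for y in range(ny)]

def partition_yz(cube):
    """Partition the same cube into orthogonal 2-D slices with a YZ orientation (one slice per x)."""
    return list(cube)  # cube[x] is already a [y][z] slice

def label_slice(slice2d, threshold=0.5):
    """Stand-in for labeling features of interest: mark cells whose probability exceeds a threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in slice2d]

def predict_slice(labeled2d):
    """Stand-in for inference by a model trained on the labeled orthogonal slices."""
    return [[float(v) for v in row] for row in labeled2d]

def enhance(first_cube):
    """Assemble the second probability cube from predicted orthogonal slices."""
    return [predict_slice(label_slice(s)) for s in partition_yz(first_cube)]
```

The essential point the sketch captures is geometric: the first cube was assembled from predictions in one planar orientation, and the enhancement re-partitions that same cube along an orthogonal orientation before labeling and retraining.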
- FIG. 1 shows a conventional deep learning algorithm used as part of a deep learning application of machine learning AI.
- FIG. 2 shows a conventional process of generating a trained model of a deep learning algorithm used as part of a deep learning application of machine learning AI.
- FIG. 3 shows a conventional process of labeling 2-dimensional training data to produce a labeled training dataset in advance of training a model of a deep learning algorithm used as part of a deep learning application of machine learning AI.
- FIG. 4A shows a feature in a 3-dimensional volume of data with a cutout highlighting its 3-dimensional nature.
- FIG. 4B shows another view of the feature in the 3-dimensional volume of data with a cutout highlighting the planar faces of the feature.
- FIG. 4C shows a process of partitioning the 3-dimensional volume of data into a plurality of unlabeled 2-dimensional slices having a first planar orientation.
- FIG. 4D shows the process of labeling an unlabeled 2-dimensional slice having the first planar orientation and training on the labeled 2-dimensional slice having the first planar orientation to produce a predicted 2-dimensional slice having the first planar orientation.
- FIG. 4E shows a first probability cube created by a plurality of predicted 2-dimensional slices having the first planar orientation.
- FIG. 5A shows the creation of an orthogonal 2-dimensional slice having a second planar orientation from the first probability cube in accordance with one or more embodiments of the present invention.
- FIG. 5B shows the orthogonal 2-dimensional slice having the second planar orientation taken from the first probability cube showing jagged edges, discontinuities, artifacts, and noise in accordance with one or more embodiments of the present invention.
- FIG. 6A shows a process of partitioning a first probability cube into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation in accordance with one or more embodiments of the present invention.
- FIG. 6B shows a process of labeling an unlabeled orthogonal 2-dimensional slice having the second planar orientation and training on the labeled orthogonal 2-dimensional slice having the second planar orientation to produce a predicted orthogonal 2-dimensional slice having a second planar orientation in accordance with one or more embodiments of the present invention.
- FIG. 6C shows a second probability cube created by a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation in accordance with one or more embodiments of the present invention.
- FIG. 7 shows a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI in accordance with one or more embodiments of the present invention.
- FIG. 8 shows a computer for performing at least part of a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI in accordance with one or more embodiments of the present invention.
- Machine learning is an application of AI that generally refers to hardware systems and software algorithms that are said to learn from the experience of processing data. In essence, machine learning algorithms learn to make predictions from data, without requiring explicit programming to do so.
- Machine learning algorithms are broadly categorized as reinforcement learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms, and supervised learning algorithms.
- Reinforcement learning algorithms are goal-oriented algorithms that seek to optimize a mathematical objective without any external input.
- an agent is said to interact with an environment in accordance with a policy, take action in accordance with the policy, and adjust the policy based on a reward function of the prior action.
- reinforcement learning algorithms search the solution space using feedback to advance toward a goal.
- Reinforcement learning algorithms are sometimes referred to as self-contained algorithms because they do not require labeled training data, training on labeled training data, or human intervention.
- Unsupervised learning algorithms are pattern identification algorithms that are used to gain insight into large datasets by categorizing data without any external input. Unsupervised learning algorithms are said to self-discover broad patterns in large datasets and are typically used in clustering, association, and dimensionality reduction applications. Unsupervised learning algorithms are also self-contained algorithms that do not require labeled training data, training on labeled training data, or human intervention. While computationally complex, and less accurate than other machine learning techniques, reinforcement and unsupervised learning algorithms learn from data while processing it in real time.
- Semi-supervised learning algorithms are a hybrid of unsupervised and supervised learning algorithms.
- In semi-supervised learning algorithms, a small sample of training data, taken from a larger dataset, is manually labeled. The small sample of manually labeled data is used to train a model, which is then used to label the remaining data in the training dataset prior to presentment of the entire labeled training dataset to the model for extensive training. As such, only a small portion of the training dataset is manually labeled, and the remaining data is labeled by the model.
- A significant drawback to this approach is that the quality of the labeling effort is not known until the entire training dataset has been labeled, whether manually or by the model, and then fully trained on the model.
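The semi-supervised workflow described above can be sketched with a deliberately tiny stand-in "model": a one-dimensional threshold classifier fit on a small manually labeled sample, then used to pseudo-label the rest of the training set before full training. All names and values here are illustrative, not from the patent.

```python
def fit_threshold(samples, labels):
    """Fit a decision threshold midway between the two class means of the labeled sample."""
    pos = [x for x, y in zip(samples, labels) if y == 1]
    neg = [x for x, y in zip(samples, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def pseudo_label(unlabeled, threshold):
    """Label the remaining data with the model trained on the small manual sample."""
    return [1 if x > threshold else 0 for x in unlabeled]

# Small manually labeled sample drawn from a larger dataset.
sample, sample_labels = [0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]
threshold = fit_threshold(sample, sample_labels)
full_labels = sample_labels + pseudo_label([0.3, 0.7], threshold)
```

Note how the sketch also exposes the drawback stated above: any error in `fit_threshold` silently propagates into every pseudo-label, and that is only discovered after the full dataset has been labeled and trained on.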
- Supervised learning algorithms differ from reinforcement, unsupervised, and semi-supervised learning algorithms in that supervised learning algorithms typically require a significant amount of training data to be accurately labeled in advance of extensive training on a model of a deep learning algorithm.
- the training data is typically labeled in a labor-intensive process that may take days, weeks, or months to complete depending on the complexity of the application.
- the model must be extensively trained on the labeled training data in a process that may take further days, weeks, or months to complete.
- the quality of the labeling effort may be evaluated only after having fully labeled the training data and having fully trained on the labeled training data. As such, it often requires several iterations of labeling, training, and evaluating to generate a suitably trained model, in a complicated, time consuming, and costly process.
- Deep learning is a supervised learning application of machine learning AI that uses multi-layered, or deep, learning algorithms, such as, for example, artificial neural networks, and often a type of artificial neural network referred to as a convolutional neural network.
- Deep learning algorithms require extensive training on an accurately labeled training dataset to generate a trained model for operative use.
- Each instance of training data in the training dataset must be meticulously and accurately labeled to identify one or more aspects or features of interest.
- the deep learning algorithm must then be extensively trained on the labeled training data to generate a trained model, if one can be generated at all.
- When presented with a large amount of accurately labeled training data and extensive subsequent training, trained models have proven highly effective in complex applications. As such, labeling a training dataset remains one of the most important tasks in generating a trained model in deep learning applications.
- Labeling is the process of qualitatively annotating, marking up, or identifying aspects or features of interest in a training dataset, typically performed in advance of training under conventional labeling processes.
- the labeled training data is used to train a model of a deep learning algorithm to identify or predict aspects or features of interest in a manner consistent with the labeled aspects or features of interest in the labeled training data.
- Training is the process of parameterizing a model of a deep learning algorithm based on a labeled training dataset in an effort to generate a trained model for operative use.
- the training process attempts to determine a set of model parameters, such as, for example, a set of model weights, that map inputs to outputs in a manner consistent with what was learned from the labeled training data.
- the resulting trained model is a deep learning algorithm that is parameterized through training.
- the trained model may then be presented with data to predict one or more features of interest consistent with what was learned from the labeled training dataset. Consequently, the quality and effectiveness of the trained model, assuming one can be generated, is highly dependent on the quality of the labeling effort that is performed in advance of extensive training. For these reasons and others, labeling is widely considered the most important task in the development of a trained model in deep learning applications.
- the deep learning algorithm is an artificial neural network inspired by the biological neural network of the human brain, and often a type of artificial neural network referred to as a convolutional neural network.
- Artificial neural networks typically include an input layer corresponding to inputs to the algorithm, one or more intermediate layers, and an output layer corresponding to outputs of the algorithm.
- the configuration and capability of an artificial neural network is sometimes described by the network’s depth, width, size, capacity, and architecture.
- Depth refers to the number of layers, sometimes referred to as node containers, other than the input layer
- width refers to the number of nodes in a given layer and may vary from layer to layer
- size refers to the total number of nodes in the artificial neural network, where nodes are the fundamental units of the artificial neural network.
- the capacity typically refers to the type or structure of functions that can be learned and is sometimes referred to as the representational capacity of the artificial neural network.
- the architecture broadly refers to the arrangement of layers and nodes in the artificial neural network.
- Each node, or artificial neuron, in an artificial neural network is connected to one or more other nodes by way of an artificial synapse.
- each node in the given layer is connected to each node of the layer immediately following it by an artificial synapse.
- each node in a given layer is connected to each node of the layer immediately preceding it by an artificial synapse.
- the group of artificial synapses that connect each node of the layer immediately preceding the given node are considered input to the given node.
- each node may be characterized by a plurality of model weights applied to its inputs, an activation function, and a threshold function that governs the output to the nodes in the layer immediately following it.
- a model weight may be applied to each artificial synapse that is input to a given node and the weighted inputs may be summed. If the sum of weighted inputs exceeds that node’s threshold, the node is said to be activated and may output a value corresponding to a typically non-linear function of the weighted inputs, sometimes referred to as the activation function, to each node of the layer immediately following it. Conversely, if the sum of weighted inputs does not exceed the node’s threshold, the node is said to be deactivated and does not output to the nodes of the layer immediately following it.
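The node behavior just described can be sketched as a single artificial neuron: model weights applied to the input synapses, the weighted inputs summed, and an activation emitted only if the sum exceeds the node's threshold. The sigmoid is one common choice of non-linear activation function; this is an illustrative sketch, not code from the patent.

```python
import math

def neuron_output(inputs, weights, threshold):
    """One artificial neuron: weighted sum, threshold check, then sigmoid activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    if weighted_sum <= threshold:
        return None  # node is deactivated; nothing propagates to the next layer
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # sigmoid activation function
```

For example, inputs `[1.0, 1.0]` with weights `[0.5, 0.5]` sum to 1.0; against a threshold of 0.9 the node activates and outputs the sigmoid of 1.0, while weights `[0.2, 0.2]` leave the sum at 0.4 and the node deactivated.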
- a trained model of an artificial neural network functionally maps one or more inputs to one or more outputs in a manner consistent with what was learned from the labeled training data.
- labeled training data is used to train the model of the deep learning algorithm, such as, for example, an artificial neural network, to produce a trained model that identifies or predicts aspects or features of interest in data.
- the training process may be thought of as an optimization problem that attempts to determine the model parameters, such as a set of model weights, and potentially other parameters, for the artificial neural network that effectively maps inputs to outputs based on the labeled training data.
- FIG. 1 shows a conventional deep learning algorithm, such as, for example, an artificial neural network 100, used as part of a deep learning application of machine learning AI.
- artificial neural network 100 may include an input layer (e.g., 110), two or more intermediate layers (e.g., 120, 130, and 140), sometimes referred to as hidden layers because they are not directly observable from the system of inputs and outputs, and an output layer (e.g., 150).
- Each layer may include a plurality of nodes, sometimes referred to as artificial neurons (e.g., nodes 112a-112e for layer 110, nodes 122a-122e for layer 120, nodes 132a-132e for layer 130, nodes 142a-142e for layer 140, and nodes 152a-152c for layer 150).
- the number of nodes per layer may vary and may not be the same from layer to layer.
- Each node 112a-112e of input layer 110 may be connected to each node 122a-122e of intermediate layer 120 via an artificial synapse.
- node 112a may be connected to node 122a via artificial synapse S112a-122a
- node 112a may be connected to node 122b via artificial synapse S112a-122b
- node 112a may be connected to node 122c via artificial synapse S112a-122c
- node 112a may be connected to node 122d via artificial synapse S112a-122d
- node 112a may be connected to node 122e via artificial synapse S112a-122e.
- Each of the remaining nodes 112b-112e of input layer 110 may be connected to each node 122a-122e of intermediate layer 120 immediately following it in the same manner.
- each node 122a-122e of intermediate layer 120 may be connected to each node 132a-132e of intermediate layer 130 via a plurality of artificial synapses (not labeled for clarity)
- each node 132a-132e of intermediate layer 130 may be connected to each node 142a-142e of intermediate layer 140 via a plurality of artificial synapses (not labeled for clarity)
- each node 142a-142e of intermediate layer 140 may be connected to each node 152a-152c of output layer 150 via a plurality of artificial synapses (not labeled for clarity) in the same manner.
- each node is said to input an artificial synapse from each node in the layer immediately preceding it, and each node outputs an artificial synapse to each node in the layer immediately following it.
- Each node may be characterized by its weighted inputs, activation function, and outputs.
- a model weight may be applied to each artificial synapse that is input to a given node.
- a model weight W112a-122a (not shown) may be applied to artificial synapse S112a-122a originating from node 112a
- a model weight W112b-122a (not shown) may be applied to artificial synapse S112b-122a originating from node 112b
- a model weight W112c-122a (not shown) may be applied to artificial synapse S112c-122a originating from node 112c
- a model weight W112d-122a (not shown) may be applied to artificial synapse S112d-122a originating from node 112d
- a model weight W112e-122a (not shown) may be applied to artificial synapse S112e-122a originating from node 112e
- a model weight may be applied to each artificial synapse (not labeled for clarity) input to each of the remaining nodes 122b-122e of intermediate layer 120 in the same manner.
- a model weight may be applied to each artificial synapse (not labeled for clarity) input to each node 132a-132e of intermediate layer 130
- a model weight may be applied to each artificial synapse (not labeled for clarity) input to each node 142a-142e of intermediate layer 140
- a model weight may be applied to each artificial synapse (not labeled for clarity) input to each node 152a-152c of output layer 150.
- each artificial synapse input to a given node may have a different level of influence as to whether that node activates the next nodes in the layer immediately following it.
- the model weights are typically determined during the training process.
- Each node 112a-112e, 122a-122e, 132a-132e, 142a-142e, and 152a-152c may include an activation function corresponding to a typically non-linear function of the sum of weighted inputs.
- node 122a may include an activation function corresponding to a non-linear function of the sum of: a weighted value W112a-122a of input artificial synapse S112a-122a, a weighted value W112b-122a of input artificial synapse S112b-122a, a weighted value W112c-122a of input artificial synapse S112c-122a, a weighted value W112d-122a of input artificial synapse S112d-122a, and a weighted value W112e-122a of input artificial synapse S112e-122a.
- Each of the remaining nodes 122b-122e of intermediate layer 120 may include an activation function in the same manner.
- each node 132a-132e of intermediate layer 130, each node 142a-142e of intermediate layer 140, and each node 152a-152c of output layer 150 may each include an activation function in the same manner.
- an activation function governs the output of that node to each node in the layer immediately following it. If the weighted sum of the inputs to the given node falls below the node’s threshold value, the node does not output to the nodes in the layer immediately following it.
- artificial neural network 100 may be thought of as a function that maps data from input nodes 112a-112e of input layer 110 to output nodes 152a-152c of output layer 150 by way of intermediate layers 120, 130, and 140.
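This layered view of the network as a function can be sketched as a minimal forward pass: data flows from the input layer through each intermediate layer to the output layer, each layer applying its weights and activation. Layer sizes and weight values below are arbitrary illustrations, not values from FIG. 1.

```python
import math

def layer(inputs, weight_rows):
    """One fully connected layer: each row of weights feeds one node in the next layer."""
    return [
        1.0 / (1.0 + math.exp(-sum(x * w for x, w in zip(inputs, row))))  # sigmoid of weighted sum
        for row in weight_rows
    ]

def forward(inputs, layers):
    """Map input-layer values to output-layer values through every layer in turn."""
    for weight_rows in layers:
        inputs = layer(inputs, weight_rows)
    return inputs
```

A trivial single-weight network with weight 0.0 maps the input 1.0 to sigmoid(0) = 0.5, illustrating that the trained mapping is entirely determined by the model weights found during training.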
- the activation function is typically specified in advance and may vary from application to application.
- the example described with reference to FIG. 1 is merely exemplary of an artificial neural network, and one of ordinary skill in the art will recognize that other types or kinds of deep learning algorithms, with different parameterizations, may be used in deep learning applications of machine learning AI.
- the model parameters for the deep learning algorithm are determined including, for example, one or more of model weights applied to inputs, the activation function or, if the activation function is specified in advance, parameters to the activation function, and potentially threshold values for each node.
- the activation function and threshold values may be specified in advance and the only model parameters determined during training may be the model weights applied to inputs at each node.
- the model parameters are typically determined via an empirical optimization procedure, such as, for example, the stochastic gradient descent procedure. Notwithstanding, one of ordinary skill in the art will appreciate that other optimization processes may be used.
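As a hedged illustration of such an empirical optimization procedure, the following sketch applies gradient-descent updates to a single weight of a linear node minimizing a squared error. Real deep learning systems compute gradients over many parameters via backpropagation; all names and values here are illustrative.

```python
def sgd_step(w, x, target, learning_rate):
    """One gradient-descent update on a single weight of a linear node y = w * x."""
    prediction = w * x
    gradient = 2 * (prediction - target) * x  # d/dw of (w*x - target)^2
    return w - learning_rate * gradient

w = 0.0
for _ in range(100):  # repeated passes ("epochs") over the same example
    w = sgd_step(w, x=1.0, target=0.5, learning_rate=0.1)
```

In this toy, a learning rate of 0.1 shrinks the error each step and the weight converges to 0.5; a rate above 1.0 makes the update factor exceed one in magnitude, so the weight oscillates and diverges, consistent with the oscillating performance attributed to overly large learning rates.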
- the optimization problem presented by deep artificial neural networks can be challenging and the solution space may include local optima that make it difficult to converge on a solution. As such, the training process typically requires several passes through the labeled training data, where each pass through is referred to as an epoch.
- the amount of change to the model parameters during each epoch is sometimes referred to as the learning rate, which corresponds to the rate at which the model is said to learn.
- the learning rate may best be described as controlling the amount of apportioned error that the model weights are updated with, each time they are updated in an epoch.
- Given an ideal learning rate, the model will learn to approximate the function for a given number of layers, nodes per layer, and training epochs.
- a learning rate that is too large will result in updates to model weights that are too large, and the performance of the model will tend to oscillate over epochs. Oscillating performance is typically caused by model weights that diverge during optimization and fail to reach solution.
- the learning rate is important to ensuring that the empirical optimization procedure converges on a solution of model parameter values, resulting in a trained model. In ideal situations, the empirical optimization procedure will converge on a set of model weights that effectively map inputs to outputs consistent with what was learned from the labeled training data.
- It is important that the artificial neural network be provided with a sufficient amount of accurately labeled training data such that the artificial neural network can extensively train on the labeled training data and arrive at a set of model parameters that enable the trained model to effectively and accurately map each of the inputs to outputs in a manner consistent with what was learned from the labeled training data.
- this achieves one of the most important advances in deep learning applications of machine learning Al, namely, transfer learning through trained models.
- the underlying deep learning algorithm used to generate a trained model may be, for example, a convolutional neural network, a recurrent neural network, a long short-term memory network, a radial basis function network, a self-organizing map, a deep belief network, a restricted Boltzmann machine, an autoencoder, any variation or combination thereof, or any other type or kind of deep learning algorithm that requires a significant amount of labeled training data to generate a trained model, as is well known in the art.
- FIG. 2 shows a conventional process 200 of generating a trained model of a deep learning algorithm, such as, for example, an artificial neural network (e.g., 100 of FIG. 1), as part of a deep learning application of machine learning AI.
- the conventional process 200 requires the creation of a labeled 210 training dataset on the front-end of the process. Specifically, each instance 205 of training data in the training dataset must be meticulously and accurately labeled 210 to identify aspects or features of interest before the model of the deep learning algorithm can be extensively trained 220 on the labeled training data.
- Once training process 220 successfully converges on a candidate model, comprising a set of model parameters for the underlying deep learning algorithm, the candidate model must be evaluated 230 to determine whether it accurately maps inputs to outputs in a manner consistent with what was learned from the labeled training dataset and is suitable for use as a trained model 250. This is typically a qualitative judgment. If training process 220 does not converge on a candidate model or the evaluation 230 of the model does not meet requirements, the processes of labeling 210, training 220, and evaluating 230 must be repeated 240. Each iteration 240 of training process 220 may require several passes, or epochs, through the labeled training data. As such, each iteration 240 of this process may take days, weeks, or months to complete.
- the quality of the labeling effort is a measure of how accurately the applied labels identify aspects or features of interest in the training data and predicted labels.
- An instance of training data is poorly labeled if it fails to identify, in whole or in part, aspects or features of interest or if it identifies aspects or features of interest where none exist, either of which can frustrate the training process 220.
- training 220 is an empirical optimization process that typically requires several passes through the entire labeled training dataset, often requiring days, weeks, or months of time to complete. Only after having fully labeled 210 and extensively trained 220 may the candidate model be evaluated 230 for suitability in a process that itself may take further days, weeks, or months of time. As such, accurately labeling 210 the training data is critically important to the process and well labeled training data may expedite process 200.
- FIG. 3 shows a conventional process of labeling 210 2-dimensional training data 205 to produce a labeled training dataset 310 in advance of training 220 a model of a deep learning algorithm used as part of a deep learning application of machine learning AI.
- the goal of the labeling 210 process is to accurately label aspects or features of interest in each instance 205 of training data to produce a labeled training dataset 310 that may be used to train 220 a model of the deep learning algorithm, such as, for example, an artificial neural network (e.g., 100 of FIG. 1), and generate a trained model (e.g., 250 of FIG. 2) capable of identifying or predicting similar aspects or features of interest in data in a manner consistent with what was learned from the labeled training dataset 310.
- the training data consists of a plurality of 2-dimensional graphical images (e.g., set of images 205). Each instance of training data 205 must be meticulously and accurately labeled to identify aspects or features of interest, resulting in an instance of labeled training data 305. This process is repeated for each instance of training data 205, resulting in a labeled training dataset 310. It is important to note that the labeling 210 process is typically completed in advance of starting the training process 220.
- Each instance of training data 205 may be representative of a single image from a set of graphical images constituting the training dataset (e.g., set of images 205).
- Each instance of training data 205 may include one or more aspects or features 315a, 315b of interest.
- the aspects or features of interest are, in essence, the things we want the trained model to be able to identify in data. As such, the aspects or features of interest may vary based on the application. For the purpose of this example, the aspects or features of interest 315a, 315b are what we want the trained model to be able to detect in new data of first impression.
- aspects or features of interest 315a, 315b are merely exemplary and may vary in size, shape, number, nature, and orientation.
- one or more labels 325a, 325b may be applied to each aspect or feature of interest 315a, 315b, producing a labeled image 305.
- human operators often fail to recognize every aspect or feature of interest in training data (e.g., image 205) and sometimes misidentify aspects or features as being of interest when in fact they are not.
- Putting aside the issue of the quality of the labeling effort (e.g., 210 of FIG. 2), the conventional labeling process 210 must be repeated for each and every image 205 in the training dataset (e.g., set of images 205), which may include tens of thousands, hundreds of thousands, or even millions of images, prior to training 220 on the labeled training dataset 310. Only after each and every image 205 in the training dataset has been labeled to identify the aspects or features of interest is the labeled training dataset 310 presented to the model of the deep learning algorithm, such as, for example, the artificial neural network (e.g., 100 of FIG. 1), for extensive training 220.
- the goal may be to generate a trained model that can recognize, identify, or predict features of interest (e.g., 405a of FIG. 4A) in data.
- a trained model of a deep learning algorithm may be used to recognize, identify, or predict faults, salts, stratigraphy, or channels in seismic data volumes.
- a trained model of a deep learning algorithm may be used to recognize, identify, or predict cancerous tumors in computed tomography (“CT”) or magnetic resonance imaging (“MRI”) scans.
- FIG. 4A shows feature 405a in a 3-dimensional volume of data 400 with a cutout 410 highlighting its 3-dimensional nature.
- data may be represented as a 3-dimensional volume 400 in three axes (e.g., X-axis, Y-axis, and Z-axis).
- One or more features 405a (only one shown) may be disposed anywhere within 3-dimensional volume 400.
- the one or more features 405a may be any identifiable aspect of interest and may vary in size, shape, number, nature, and orientation.
- Cutout 410 shows that feature 405a may have aspects that extend in each of the three axes of the 3-dimensional volume 400.
- FIG. 4B shows another view of feature 405a in 3-dimensional volume of data 400 with a cutout 415 highlighting the planar faces of feature 405a.
- the source data constituting volume 400 is, by nature of how it is obtained, a plurality of unlabeled 2-dimensional slices (not shown) in a first planar orientation.
- volume 400 corresponds to a stack of the plurality of unlabeled 2-dimensional slices (not shown) having the first planar orientation, and no partitioning is required.
- partitioning is required to convert the volume 400 into a plurality of unlabeled 2-dimensional slices (not shown) in a first planar orientation.
- FIG. 4C shows a process of partitioning 3-dimensional volume of data 400 into a plurality of unlabeled 2-dimensional slices (e.g., 420a, 425a, and 430a) having a first planar orientation.
- a volume of data 400 is often formed by a stacked arrangement of a plurality of unlabeled 2-dimensional slices (not shown) in a first planar orientation and may not require partitioning.
- a volume 400 may be partitioned into a plurality of unlabeled 2-dimensional slices (e.g., 420a, 425a, and 430a) in any two dimensions (in this example, the XZ plane).
- unlabeled 2-dimensional slice 420a may include aspect view 422a of feature 405a
- unlabeled 2-dimensional slice 425a may include aspect view 427a of feature 405a
- unlabeled 2-dimensional slice 430a may include aspect view 432a of feature 405a, where aspect views 422a, 427a, and 432a may not be the same and may vary in size, shape, nature, and orientation.
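The partitioning described above can be sketched in Python with NumPy. The array shape and the (X, Y, Z) axis convention here are illustrative assumptions, not taken from the disclosure:

```python
import numpy as np

# Hypothetical 3-dimensional volume of data with axes ordered (X, Y, Z);
# the shape is an illustrative assumption.
volume = np.zeros((64, 32, 48), dtype=np.float32)  # X=64, Y=32, Z=48

# Partition the volume into unlabeled 2-dimensional slices having a first
# planar orientation (the XZ plane): one slice per index along the Y axis.
xz_slices = [volume[:, y, :] for y in range(volume.shape[1])]

print(len(xz_slices))        # 32 slices
print(xz_slices[0].shape)    # each slice is 64 x 48 (X by Z)
```

Each element of `xz_slices` would then be labeled and used for training as described above.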
- the aspect or view of any given feature may vary from slice to slice.
- FIG. 4D shows the process of labeling (e.g., 210 of FIG. 2) unlabeled 2-dimensional slice 430a having the first planar orientation and training on labeled 2-dimensional slice 430b having the first planar orientation to produce predicted 2-dimensional slice 430c having the first planar orientation.
- An unlabeled 2-dimensional slice 430a may represent any slice obtained from volume 400 that has not been satisfactorily labeled. For the purposes of illustration, we will assume that the goal is to train a model of a deep learning algorithm to identify features represented by aspect view 432a in unlabeled 2-dimensional slice 430a.
- unlabeled 2-dimensional slice 430a having the first planar orientation may include one or more aspects 432a (only one aspect shown) of one or more features (e.g., 405a of FIG. 4C) (only one feature shown) that may be difficult to discern.
- the one or more aspects 432a of the one or more features of interest (e.g., 405a of FIG. 4C) may be labeled, or marked, to identify them for training.
- Labeled 2-dimensional slice 430b may include applied labels, for example, labeled aspect 432b corresponding to unlabeled aspect 432a of unlabeled 2-dimensional slice 430a.
- Labeled 2-dimensional slice 430b may be submitted to train the model of the deep learning algorithm to learn to predict similar features of interest.
- the dataset comprising predictions of features of interest may be used to produce a plurality of predicted 2-dimensional slices (e.g., only 430c shown) having the first planar orientation that include the model-predicted aspects (e.g., 432c) of the features of interest (e.g., 405b of FIG. 4E) that ideally correspond to the labeled aspects (e.g., 432b) and the unlabeled aspects (e.g., 432a) of the features of interest (e.g., 405a of FIG. 4C).
- FIG. 4E shows a first probability cube 490 created by a plurality of predicted 2-dimensional slices (e.g., 430c of FIG. 4D as an example of one such slice) having the first planar orientation.
- a plurality of predicted 2-dimensional slices (e.g., 430c of FIG. 4D as an example of one such slice) having the first planar orientation may be produced corresponding to the unlabeled (e.g., 430a of FIG. 4D) and the labeled (e.g., 430b of FIG. 4D) pluralities of 2-dimensional slices having the first planar orientation.
- the plurality of predicted 2-dimensional slices (e.g., 430c of FIG. 4D as an example of one such slice) having the first planar orientation may be reassembled to constitute a volume of data now referred to as first probability cube 490, because it includes predicted features of interest from each constituent slice (e.g., 430c).
- First probability cube 490 more clearly shows the one or more features of interest (e.g., 405b) than the initial volume of data (e.g., 400). This evidences that there was some level of success in labeling and training the model of the deep learning algorithm, because it is able to more clearly identify the one or more features of interest (e.g., 405b).
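Reassembling predicted slices into a probability cube amounts to stacking them back along the axis from which they were taken. A minimal NumPy sketch, using the same assumed (X, Y, Z) axis convention and illustrative shapes:

```python
import numpy as np

# Hypothetical model output: each predicted 2-dimensional slice holds
# per-pixel probabilities in [0, 1] that a feature of interest is present.
rng = np.random.default_rng(0)
num_slices, nx, nz = 32, 64, 48
predicted_slices = [rng.random((nx, nz)) for _ in range(num_slices)]

# Stack the predicted XZ slices back along the Y axis so the resulting
# probability cube has the same axis order (X, Y, Z) as the source volume.
first_probability_cube = np.stack(predicted_slices, axis=1)

print(first_probability_cube.shape)  # (64, 32, 48)
```

The name "probability cube" reflects that each voxel of the stacked volume holds a predicted probability rather than raw data.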
- a first probability cube 490 is a representation of a plurality of predicted 2-dimensional slices having a first planar orientation.
- a first probability cube (e.g., 490) may be input to a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI.
- the first probability cube (e.g., 490) may be generated as part of the method of probability cube to probability cube enhancement, both of which are discussed in more detail herein.
- a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI may be used with manually labeled 2-dimensional slices, model-assisted labeled 2-dimensional slices, AI-assisted labeled 2-dimensional slices, or any combination thereof. Regardless of how the 2-dimensional slices are labeled, the method enhances the quality of the predictions.
- a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI enhances the quality of predictions by inputting a first probability cube comprising predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more unlabeled orthogonal 2-dimensional slices having the second planar orientation, and training on one or more labeled orthogonal 2-dimensional slices having the second planar orientation to produce a second probability cube.
- the second planar orientation may be orthogonal to the first planar orientation (or at least at an offset angle that shows the most jagged edges, discontinuities, artifacts, or noise).
- the second probability cube comprises substantially enhanced predictions with cleaner edges, fewer artifacts, and less noise.
- FIG. 5A shows the creation of an orthogonal 2-dimensional slice 520a having a second planar orientation from first probability cube 490 in accordance with one or more embodiments of the present invention.
- a first probability cube 490 comprises a plurality of predicted 2-dimensional slices having a first planar orientation, which may be treated as input.
- first probability cube 490 comprises a plurality of 2-dimensional slices having a first planar orientation in the XZ plane, but the first planar orientation could be the XY or YZ plane.
- first probability cube 490 may be used as input to a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI, which includes partitioning, labeling, and training to enhance the resolution of the labels and resulting predictions.
- a 2-dimensional slice may be taken having a second planar orientation that is orthogonal to the first planar orientation used to create first probability cube 490.
- the resulting 2-dimensional slice 520a is referred to as an orthogonal 2-dimensional slice because it has the second planar orientation that is orthogonal to the first planar orientation.
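Extracting an orthogonal slice is a matter of fixing an index on a different axis. A sketch under the same assumed (X, Y, Z) convention, where the first planar orientation is the XZ plane:

```python
import numpy as np

# Hypothetical first probability cube with axes (X, Y, Z), built from
# predicted slices in the XZ plane (the first planar orientation).
first_probability_cube = np.zeros((64, 32, 48), dtype=np.float32)

# A slice in the second planar orientation (the XY plane) is orthogonal
# to the XZ plane: fix a Z index instead of a Y index.
z_index = 10
orthogonal_slice = first_probability_cube[:, :, z_index]

print(orthogonal_slice.shape)  # (64, 32): X by Y
```

Viewing the cube along this axis is what exposes the slice-to-slice discontinuities discussed below.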
- FIG. 5B shows orthogonal 2-dimensional slice 520a having the second planar orientation taken from the first probability cube 490 showing jagged edges, discontinuities, artifacts, and noise in accordance with one or more embodiments of the present invention.
- Orthogonal 2-dimensional slice 520a may, by virtue of its different planar orientation, show undesirable artifacts and noise, where the edges are not clean, in aspect 522a, which corresponds to a different view of predicted feature 405b of first probability cube 490. Because first probability cube 490 is created from predicted 2-dimensional slices that are oriented in the first planar orientation, the predicted aspects or features (e.g., 432c of FIG. 4D) are well defined in the first planar orientation.
- when first probability cube 490 is viewed with 2-dimensional slices oriented in the second planar orientation (e.g., 520a), discontinuities in the edges of aspects (e.g., 522a) between adjacent slices in the first planar orientation are revealed.
- first probability cube 490 may be used as input to a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI that operates in a second planar orientation that is orthogonal to the first planar orientation to enhance the resolution of the labels and resulting predictions of aspects or features, resulting in a newly created and enhanced second probability cube (e.g., 690 of FIG. 6C) that is of a higher resolution than the first probability cube (e.g., 490 of FIG. 4E).
- FIG. 6A shows a process of partitioning first probability cube 490 into a plurality of unlabeled orthogonal 2-dimensional slices (e.g., 610a, 620a, 630a, and 640a) having a second planar orientation in accordance with one or more embodiments of the present invention.
- First probability cube 490 may be partitioned into a plurality of unlabeled orthogonal 2-dimensional slices (e.g., 610a, 620a, 630a, and 640a) having a second planar orientation (in this instance, the XY plane) that is orthogonal to the first planar orientation (in this instance, the XZ plane) of the predicted 2-dimensional slices (e.g., 430c of FIG. 4D) used to form first probability cube 490. While only four slices are shown in the figure to simplify the presentation and enhance understanding, one of ordinary skill in the art will understand that the number of slices may vary based on the application and is typically much larger.
- unlabeled orthogonal 2-dimensional slice 610a may include an aspect view of feature 405b
- unlabeled orthogonal 2-dimensional slice 620a may include an aspect view of feature 405b
- unlabeled orthogonal 2-dimensional slice 630a may include an aspect view of feature 405b
- unlabeled orthogonal 2-dimensional slice 640a may include an aspect view of feature 405b, where the aspect views of feature 405b may not be the same and may vary in size, shape, nature, and orientation from slice to slice.
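Partitioning the whole cube into the second planar orientation can be sketched with `np.moveaxis`; the axis convention and shapes are again illustrative assumptions:

```python
import numpy as np

# Hypothetical first probability cube, axes (X, Y, Z).
first_probability_cube = np.zeros((64, 32, 48), dtype=np.float32)

# Move the Z axis to the front so that iterating over the first axis
# yields XY-plane slices, one per Z index: the second planar orientation,
# orthogonal to the XZ slices used to build the cube.
xy_slices = list(np.moveaxis(first_probability_cube, 2, 0))

print(len(xy_slices))       # 48 slices
print(xy_slices[0].shape)   # each slice is 64 x 32 (X by Y)
```

These orthogonal slices are the inputs to the second round of labeling and training described next.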
- FIG. 6B shows a process of labeling an unlabeled orthogonal 2- dimensional slice 610a having the second planar orientation and training on the labeled orthogonal 2-dimensional slice 610b having the second planar orientation to produce a predicted orthogonal 2-dimensional slice 610c having a second planar orientation in accordance with one or more embodiments of the present invention.
- An unlabeled orthogonal 2-dimensional slice 610a may represent any slice obtained from the partitioning of the first probability cube (e.g., 490 of FIG. 6A) that has not been satisfactorily labeled.
- the goal is to train a model of a deep learning algorithm to identify features like aspect 612a in unlabeled orthogonal 2-dimensional slice 610a.
- unlabeled orthogonal 2-dimensional slice 610a having the second planar orientation may include one or more aspects 612a (only one aspect shown) of one or more features (e.g., 405b of FIG. 6A) (only one feature shown) that may include jagged edges, discontinuities, artifacts, or noise when viewed in this planar orientation.
- the one or more aspects 612a of the one or more features of interest may be labeled, or marked, to identify them for training.
- Labeled orthogonal 2-dimensional slice 610b having the second planar orientation may include applied labels, for example, aspect 612b corresponding to unlabeled aspect 612a of unlabeled orthogonal 2- dimensional slice 610a having the second planar orientation.
- Labeled orthogonal 2-dimensional slice 610b having the second planar orientation may be submitted to train a model of a deep learning algorithm to identify similar features of interest.
- the model and deep learning algorithm may or may not be the same ones used to create the first probability cube (e.g., 490 of FIG. 4E).
- the trained model may be used to produce a plurality of predicted orthogonal 2-dimensional slices (e.g., only 610c shown) having the second planar orientation that include the model-predicted aspects (e.g., 612c) of the features of interest (e.g., 405b of FIG. 6A) that ideally correspond to the labeled (e.g., 612b) and the unlabeled (e.g., 612a) aspects of the features of interest (e.g., 405b of FIG. 6A).
- FIG. 6C shows a second probability cube 690 created by a plurality of predicted orthogonal 2-dimensional slices (e.g., 610c of FIG. 6B as an example of one such slice) having the second planar orientation in accordance with one or more embodiments of the present invention.
- a plurality of predicted orthogonal 2-dimensional slices (e.g., 610c of FIG. 6B as an example of one such slice) having the second planar orientation may be produced corresponding to the plurality of unlabeled orthogonal 2-dimensional slices (e.g., 610a of FIG. 6B) having the second planar orientation and to the plurality of labeled orthogonal 2-dimensional slices (e.g., 610b of FIG. 6B) having the second planar orientation.
- second probability cube 690 comprises a plurality of predicted 2-dimensional slices (e.g., 610c of FIG. 6B) having a second planar orientation that is orthogonal to, or at least offset from, the first planar orientation.
- Second probability cube 690 more clearly shows the one or more features of interest (e.g., 405c) than the initial volume of data (e.g., 400) and the first probability cube (e.g., 490 of FIG. 6A). In this way, second probability cube 690 enhances and provides higher resolution predictions over those of the first probability cube (e.g., 490 of FIG. 6A).
- FIG. 7 shows a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI in accordance with one or more embodiments of the present invention.
- a first probability cube may serve as input to the method of probability cube to probability cube enhancement for deep learning applications of machine learning AI (e.g., inputting a first probability cube 705b).
- a plurality of unlabeled 2-dimensional slices having a first planar orientation may be used to create the first probability cube (e.g., requiring labeling 710a and training 720a) before substantive steps of the method are undertaken.
- an unpartitioned volume of data may be used to create the first probability cube (e.g., requiring inputting and partitioning 705a, labeling 710a, and training 720a) before substantive steps of the method are undertaken.
- a 3-dimensional volume of data may be input into a deep learning software application of a computer.
- the 3-dimensional volume may comprise a plurality of unlabeled 2-dimensional slices having a first planar orientation or a volume of data that must be partitioned into a plurality of unlabeled 2-dimensional slices having a first planar orientation.
- one or more features of interest may be manually labeled in each of the plurality of unlabeled 2-dimensional slices having the first planar orientation or in at least a few of the plurality of unlabeled 2-dimensional slices having the first planar orientation in model- or AI-assisted labeling processes, to produce a plurality of labeled 2-dimensional slices having the first planar orientation.
- a model of a deep learning algorithm may be trained on the plurality of labeled 2-dimensional slices having the first planar orientation to produce a plurality of predicted 2-dimensional slices having the first planar orientation.
- the deep learning algorithm may be an artificial neural network.
- the deep learning algorithm may be a convolutional neural network.
- the deep learning algorithm may be a radial basis function network, a recurrent neural network, a long short-term memory network, a self-organizing map, an autoencoder, or a deep belief network.
- training the model may include training a new model of a deep learning algorithm.
- training the model may include training with a pretrained, or “canned”, model of a deep learning algorithm.
- the trained model may be evaluated by determining how well one or more of the predicted 2-dimensional slices having the first planar orientation recognizes or predicts features of interest. If it is determined that the trained model does not accurately predict features of interest, the process may return to labeling step 710a to correct one or more labels in the labeled 2-dimensional slices, or in the case of model- or Al-assisted labeling, potentially labeling more 2-dimensional slices having the first planar orientation. However, if it is determined that the trained model accurately predicts features of interest, the plurality of predicted 2-dimensional slices having the first planar orientation may be assembled to form the first probability cube.
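The label-train-evaluate loop described above can be sketched as follows. Here `train_model`, `evaluate`, and `refine_labels` are hypothetical stand-ins for the deep learning framework, the evaluation step, and the labeling tool, and the accuracy threshold is an assumed parameter, not one specified in the disclosure:

```python
import numpy as np

def build_probability_cube(labeled_slices, train_model, evaluate,
                           refine_labels, accuracy_threshold=0.9,
                           max_iterations=5):
    """Iterate labeling and training until predictions are acceptable,
    then stack the predicted slices into a probability cube."""
    for _ in range(max_iterations):
        model = train_model(labeled_slices)             # train on current labels
        predicted = [model(s) for s in labeled_slices]  # predict each slice
        if evaluate(predicted) >= accuracy_threshold:   # accurate enough?
            return np.stack(predicted, axis=1)          # assemble the cube
        labeled_slices = refine_labels(labeled_slices)  # correct or add labels
    raise RuntimeError("model did not reach the accuracy threshold")

# Toy stand-ins: an "identity" model and an evaluator that always passes,
# just to show the data flow and the resulting cube shape.
toy_slices = [np.zeros((4, 3)) for _ in range(2)]
cube = build_probability_cube(
    toy_slices,
    train_model=lambda slices: (lambda s: s),
    evaluate=lambda preds: 1.0,
    refine_labels=lambda slices: slices,
)
print(cube.shape)  # (4, 2, 3)
```

The same loop applies unchanged to the second planar orientation; only the input slices and the stacking axis differ.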
- the first probability cube may be input into a deep learning software application of the computer.
- the first probability cube may be partitioned into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation.
- the second planar orientation may be orthogonal to the first planar orientation.
- the two axes orthogonal to the first planar orientation and showing the most discontinuities, artifacts, and noise would be used.
- the second planar orientation may be offset from the first planar orientation at an angle that shows the most discontinuities, artifacts, and noise.
- one or more features of interest may be manually labeled in each of the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation, or at least a few of the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation may be manually labeled with the remaining being labeled in model- or AI-assisted labeling processes, to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation.
- a model of a deep learning algorithm may be trained on a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
- the trained model may be evaluated by determining how well one or more of the predicted orthogonal 2-dimensional slices having the second planar orientation recognize or predict features of interest. If it is determined that the trained model does not accurately predict features of interest, the process may return to labeling step 710b to correct the one or more labels in the labeled orthogonal 2-dimensional slices having the second planar orientation, or in the case of Al-assisted labeling, potentially labeling more orthogonal 2-dimensional slices having the second planar orientation. However, if it is determined that the trained model accurately predicts features of interest, the plurality of predicted orthogonal 2-dimensional slices having the second planar orientation may be assembled to form the second probability cube. The second probability cube comprises a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
- the predicted features of interest in the second probability cube have enhanced resolution with little to no jagged edges, discontinuities, artifacts, or noise.
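Putting the two passes together, the end-to-end data flow can be sketched as below. `label_and_train` is a hypothetical placeholder for the labeling, training, and prediction steps (here it merely passes probabilities through), so only the axis bookkeeping of the method is illustrated:

```python
import numpy as np

def label_and_train(slices):
    # Placeholder for labeling + training + prediction: in a real pipeline
    # each slice would be labeled and a trained model would predict
    # per-pixel probabilities for it.
    return [np.clip(s, 0.0, 1.0) for s in slices]

def enhance(volume):
    # Pass 1: partition along Y into XZ slices (first planar orientation),
    # predict, and stack the predictions into the first probability cube.
    xz = [volume[:, y, :] for y in range(volume.shape[1])]
    first_cube = np.stack(label_and_train(xz), axis=1)

    # Pass 2: partition the first cube along Z into orthogonal XY slices
    # (second planar orientation), predict, and stack into the second,
    # enhanced probability cube.
    xy = [first_cube[:, :, z] for z in range(first_cube.shape[2])]
    return np.stack(label_and_train(xy), axis=2)

vol = np.random.default_rng(1).random((8, 6, 5))
second_cube = enhance(vol)
print(second_cube.shape)  # (8, 6, 5): same geometry as the input volume
```

Because the second pass trains on views that cut across the first pass's slices, it can learn to smooth the slice-to-slice discontinuities that the first pass cannot see.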
- any of the above-noted methods may be implemented at least in part by a computer (e.g., 800 of FIG. 8) in accordance with one or more embodiments of the present invention.
- a non-transitory computer readable medium comprising software instructions that, when executed by a processor, perform any of the above-noted methods in accordance with one or more embodiments of the present invention.
- FIG. 8 shows a computer 800 for performing at least part of a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI in accordance with one or more embodiments of the present invention. While computer 800 is merely exemplary of an Intel® x86 instruction set architecture computing system, one of ordinary skill in the art will appreciate that computer 800 may be any other type or kind of computer capable of executing software instructions that perform at least part of the above-noted method.
- Computer 800 may include one or more processors, sometimes referred to as central processing units (“CPUs”) 805, host bridge 810, input/output (“IO”) bridge 815, graphics processing units (“GPUs”) 825, and/or application-specific integrated circuits (“ASICs”) (not shown) disposed on one or more printed circuit boards (not shown) that perform computational operations in accordance with an instruction set architecture (“ISA”).
- Each of the one or more CPUs 805, GPUs 825, or ASICs (not shown) may be a single-core (not shown) device or a multi-core (not shown) device.
- Multi-core devices typically include a plurality of cores (not shown) disposed on the same physical die (not shown) or a plurality of cores (not shown) disposed on multiple die (not shown) that are collectively disposed within the same mechanical package (not shown).
- CPU 805 may be a general-purpose computational device typically configured to execute software instructions for a specific instruction set architecture.
- CPU 805 may include an interface 808 to host bridge 810, an interface 818 to system memory 820, and an interface 823 to one or more IO devices, such as, for example, one or more GPUs 825.
- GPU 825 may be a specialized computational device typically configured to perform graphics functions related to frame buffer manipulation. However, one of ordinary skill in the art will recognize that GPU 825 may be used to perform computationally intensive mathematical functions, including training a deep learning algorithm.
- GPU 825 may interface 823 directly with CPU 805 (and interface 818 with system memory 820 through CPU 805).
- GPU 825 may interface 821 with host bridge 810 (and interface 816 with system memory 820 through host bridge 810 or interface 818 with system memory 820 through CPU 805 depending on the application or design). In still other embodiments, GPU 825 may interface 833 with IO bridge 815 (and interface 816 with system memory 820 through host bridge 810 or interface 818 with system memory 820 through CPU 805 depending on the application or design).
- Host bridge 810 may be an interface device that interfaces between the one or more computational devices (e.g, CPUs 805, GPUs 825, ASICs) and IO bridge 815 and, in some embodiments, system memory 820.
- Host bridge 810 may include an interface 808 to CPU 805, an interface 813 to IO bridge 815, for embodiments where CPU 805 does not include an interface 818 to system memory 820, an interface 816 to system memory 820, and for embodiments where CPU 805 does not include an integrated GPU 825 or an interface 823 to GPU 825, an interface 821 to GPU 825.
- One of ordinary skill in the art will appreciate that the functionality of host bridge 810 may be integrated, in whole or in part, with CPU 805.
- IO bridge 815 may be an interface device that interfaces between the one or more computational devices (e.g, CPUs 805, GPUs 825, ASICs) and various IO devices (e.g, 840, 845) and IO expansion, or add-on, devices (not independently illustrated).
- IO bridge 815 may include an interface 813 to host bridge 810, one or more interfaces 833 to one or more IO expansion devices 835, an interface 838 to keyboard 840, an interface 843 to mouse 845, an interface 848 to one or more local storage devices 850, and an interface 853 to one or more network interface devices 855.
- One of ordinary skill in the art will appreciate that the functionality of IO bridge 815 may be integrated, in whole or in part, with CPU 805 and/or host bridge 810.
- Each local storage device 850 may be a solid-state memory device, a solid-state memory device array, a hard disk drive, a hard disk drive array, or any other non-transitory computer readable medium.
- Network interface device 855 may provide one or more network interfaces including any network protocol suitable to facilitate networked communications.
- Computer 800 may include one or more network-attached storage devices 860 in addition to, or instead of, one or more local storage devices 850.
- Each network-attached storage device 860, if any, may be a solid-state memory device, a solid-state memory device array, a hard disk drive, a hard disk drive array, or any other non-transitory computer readable medium.
- Network-attached storage device 860 may or may not be collocated with computing system 800 and may be accessible to computing system 800 via one or more network interfaces provided by one or more network interface devices 855.
- computer 800 may be a conventional computing system or an application-specific computing system (not shown).
- an application-specific computing system may include one or more ASICs (not shown) that perform one or more specialized functions in a more efficient manner.
- the one or more ASICs may interface directly with CPU 805, host bridge 810, or GPU 825 or interface through IO bridge 815.
- an application-specific computing system may be reduced to only those components necessary to perform a desired function in an effort to reduce one or more of chip count, printed circuit board footprint, thermal design power, and power consumption.
- the one or more ASICs may be used instead of one or more of CPU 805, host bridge 810, IO bridge 815, or GPU 825. In such systems, the one or more ASICs may incorporate sufficient functionality to perform certain network and computational functions in a minimal footprint with fewer component devices.
- CPU 805, host bridge 810, IO bridge 815, GPU 825, or ASIC (not shown) or a subset, superset, or combination of functions or features thereof, may be integrated, distributed, or excluded, in whole or in part, based on an application, design, or form factor in accordance with one or more embodiments of the present invention.
- the description of computer 800 is merely exemplary and not intended to limit the type, kind, or configuration of component devices that constitute a computer 800 suitable for executing software instructions in accordance with one or more embodiments of the present invention.
- computer 800 may be a standalone, laptop, desktop, industrial, server, blade, or rack mountable system and may vary based on an application or design.
- Advantages of one or more embodiments of the present invention may include one or more of the following:
- a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI substantially improves the quality of predictions and, in model- or AI-assisted labeling processes, does so with minimal labeling effort.
- a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI substantially improves the ability of the deep learning algorithm to correct jagged edges and discontinuities and to eliminate noise and artifacts in features of interest present in the first probability cube.
- a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI reduces the amount of time required to train on the labeled training dataset.
- a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI reduces or eliminates the need for subsequent iterations of labeling, training, and evaluating.
- a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI reduces the computational complexity of training.
- a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI reduces the costs associated with training.
Abstract
Probability cube to probability cube enhancement for deep learning applications of artificial intelligence enhances the quality of predictions. A first probability cube having a plurality of predicted 2-dimensional slices having a first planar orientation is partitioned into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation. The unlabeled orthogonal 2-dimensional slices having the second planar orientation are labeled to produce labeled orthogonal 2-dimensional slices having the second planar orientation. A model of a deep learning algorithm is trained on the labeled orthogonal 2-dimensional slices having the second planar orientation to produce predicted orthogonal 2-dimensional slices having the second planar orientation. The second planar orientation may be orthogonal, or at least at an offset angle, to the first planar orientation. The predicted orthogonal 2-dimensional slices having the second planar orientation form a second probability cube having substantially enhanced predictions with less noise and cleaner edges.
Description
PROBABILITY CUBE TO PROBABILITY CUBE ENHANCEMENT FOR DEEP LEARNING ARTIFICIAL INTELLIGENCE
BACKGROUND OF THE INVENTION
[0001] Artificial Intelligence (“AI”) is a term of art in the interdisciplinary fields of computer science, mathematics, data science, and statistics that generally refers to hardware systems and software algorithms that perform tasks, not intuitively programmatic, that are thought of as requiring some aspect of human intelligence such as, for example, visual perception, object detection, pattern recognition, natural language processing, speech recognition, translation, and decision making. In essence, AI seeks to develop algorithms that mimic human intelligence to perform tasks and improve in the performance of those tasks over time. The promise of these techniques is a more efficient alternative to capture knowledge in data and gradually improve the performance of predictive models, thereby enabling data-driven decision making. While AI remains an exceptionally broad area of research, recent advancements in machine learning and, more specifically, deep learning have led to a paradigm shift in the creation of predictive models that have widespread application across industries.
BRIEF SUMMARY OF THE INVENTION
[0002] According to one aspect of one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning includes generating a second probability cube of data by inputting a first probability cube comprising a plurality of predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube of data into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more features of interest in the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation, and training a model of a deep learning algorithm on the plurality of labeled orthogonal 2-dimensional slices of data having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
[0003] According to one aspect of one or more embodiments of the present invention, a computer-implemented method of probability cube to probability cube enhancement for deep learning includes generating a second probability cube of data by inputting a first probability cube comprising a plurality of predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube of data into a plurality of
unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more features of interest in the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation, and training a model of a deep learning algorithm on the plurality of labeled orthogonal 2-dimensional slices of data having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
[0004] According to one aspect of one or more embodiments of the present invention, a non-transitory computer-readable medium comprising software instructions that, when executed by a processor, performs a method of probability cube to probability cube enhancement for deep learning includes generating a second probability cube of data by inputting a first probability cube comprising a plurality of predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube of data into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more features of interest in the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation, and training a model of a deep learning algorithm on the plurality of labeled orthogonal 2-dimensional slices of data having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
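The partition-and-restack flow recited above can be sketched in a few lines. This is a minimal illustration, assuming NumPy; the cube dimensions, axis conventions, and variable names (e.g., first_cube) are arbitrary choices for illustration and not part of the claimed method:

```python
import numpy as np

# Hypothetical first probability cube: predicted 2-D slices stacked along
# axis 0 (the first planar orientation), probabilities in [0, 1].
rng = np.random.default_rng(0)
first_cube = rng.random((64, 128, 128))

# Partition the cube into unlabeled slices with an orthogonal (second)
# planar orientation by slicing along axis 1 instead of axis 0.
orthogonal_slices = [first_cube[:, j, :] for j in range(first_cube.shape[1])]
assert len(orthogonal_slices) == 128
assert orthogonal_slices[0].shape == (64, 128)

# After labeling these slices and training a model on them, the model's
# predicted orthogonal slices would be restacked along the same axis to
# form the second probability cube (an identity restack is shown here).
second_cube = np.stack(orthogonal_slices, axis=1)
assert second_cube.shape == first_cube.shape
```

In practice the slices passed to np.stack would be the model's predicted orthogonal slices rather than the raw partitions; the restack simply reverses the partitioning.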
[0005] Other aspects of the present invention will be apparent from the following description and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] FIG. 1 shows a conventional deep learning algorithm used as part of a deep learning application of machine learning AI.
[0007] FIG. 2 shows a conventional process of generating a trained model of a deep learning algorithm used as part of a deep learning application of machine learning AI.
[0008] FIG. 3 shows a conventional process of labeling 2-dimensional training data to produce a labeled training dataset in advance of training a model of a deep learning algorithm used as part of a deep learning application of machine learning AI.
[0009] FIG. 4A shows a feature in a 3-dimensional volume of data with a cutout highlighting its 3-dimensional nature.
[0010] FIG. 4B shows another view of the feature in the 3-dimensional volume of data with a cutout highlighting the planar faces of the feature.
[0011] FIG. 4C shows a process of partitioning the 3-dimensional volume of data into a plurality of unlabeled 2-dimensional slices having a first planar orientation.
[0012] FIG. 4D shows the process of labeling an unlabeled 2-dimensional slice having the first planar orientation and training on the labeled 2-dimensional slice having the first planar orientation to produce a predicted 2-dimensional slice having the first planar orientation.
[0013] FIG. 4E shows a first probability cube created by a plurality of predicted 2-dimensional slices having the first planar orientation.
[0014] FIG. 5A shows the creation of an orthogonal 2-dimensional slice having a second planar orientation from the first probability cube in accordance with one or more embodiments of the present invention.
[0015] FIG. 5B shows the orthogonal 2-dimensional slice having the second planar orientation taken from the first probability cube showing jagged edges, discontinuities, artifacts, and noise in accordance with one or more embodiments of the present invention.
[0016] FIG. 6A shows a process of partitioning a first probability cube into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation in accordance with one or more embodiments of the present invention.
[0017] FIG. 6B shows a process of labeling an unlabeled orthogonal 2-dimensional slice having the second planar orientation and training on the labeled orthogonal 2-dimensional slice having the second planar orientation to produce a predicted orthogonal 2-dimensional slice having a second planar orientation in accordance with one or more embodiments of the present invention.
[0018] FIG. 6C shows a second probability cube created by a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation in accordance with one or more embodiments of the present invention.
[0019] FIG. 7 shows a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI in accordance with one or more embodiments of the present invention.
[0020] FIG. 8 shows a computer for performing at least part of a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI in accordance with one or more embodiments of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0021] One or more embodiments of the present invention are described in detail with reference to the accompanying figures. For consistency, like elements in the various figures are denoted by like reference numerals. In the following detailed description of the present invention, specific details are described to provide a thorough understanding of the present invention. In other instances, aspects that are well-known to those of ordinary skill in the art are not described to avoid obscuring the description of the present invention.
[0022] Machine learning is an application of AI that generally refers to hardware systems and software algorithms that are said to learn from the experience of processing data. In essence, machine learning algorithms learn to make predictions from data, without requiring explicit programming to do so. Machine learning algorithms are broadly categorized as reinforcement learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms, and supervised learning algorithms.
[0023] Reinforcement learning algorithms are goal-oriented algorithms that seek to optimize a mathematical objective without any external input. In reinforcement learning nomenclature, an agent is said to interact with an environment in accordance with a policy, take action in accordance with the policy, and adjust the policy based on a reward function of the prior action. In this way, reinforcement learning algorithms search the solution space using feedback to advance toward a goal. Reinforcement learning algorithms are sometimes referred to as self-contained algorithms because they do not require labeled training data, training on labeled training data, or human intervention.
[0024] Unsupervised learning algorithms are pattern identification algorithms that are used to gain insight into large datasets by categorizing data without any external input. Unsupervised learning algorithms are said to self-discover broad patterns in large datasets and are typically used in clustering, association, and dimensionality reduction applications. Unsupervised learning algorithms are also self-contained algorithms that do not require labeled training data, training on labeled training data, or human intervention. While computationally complex, and less accurate than other machine learning techniques, reinforcement and unsupervised learning algorithms learn from data while processing it in real time.
[0025] Semi-supervised learning algorithms are a hybrid of unsupervised and supervised learning algorithms. In semi-supervised learning algorithms, a small sample of training
data, taken from a larger dataset, is manually labeled. The small sample of manually labeled data is then used to train a model that is then used to label the remaining data in the training dataset prior to presentment of the entire labeled training dataset to the model for extensive training. As such, only a small portion of the training dataset is manually labeled, and the remaining data is labeled by the model. A significant drawback to this approach is that the quality of the labeling effort is not known until the entire training dataset has been labeled, whether manually or by the model, and then fully trained on the model. The small amount of labeled training data used to train the model typically results in low-quality labels on the model-labeled instances of data in the training dataset, thereby frustrating efforts to train the model and resulting in a trained model that performs poorly, if a trained model can be generated at all. As such, semi-supervised learning typically only finds application in cases where there is not enough labeled data to produce a trained model with a supervised learning algorithm.
[0026] Supervised learning algorithms differ from reinforcement, unsupervised, and semi-supervised learning algorithms in that supervised learning algorithms typically require a significant amount of training data to be accurately labeled in advance of extensive training on a model of a deep learning algorithm. The training data is typically labeled in a labor-intensive process that may take days, weeks, or months to complete depending on the complexity of the application. After the entire training dataset is labeled, the model must be extensively trained on the labeled training data in a process that may take further days, weeks, or months to complete. In conventional processes, the quality of the labeling effort may be evaluated only after having fully labeled the training data and having fully trained on the labeled training data. As such, it often requires several iterations of labeling, training, and evaluating to generate a suitably trained model, in a complicated, time-consuming, and costly process.
[0026] Notwithstanding these challenges, supervised learning algorithms are at the forefront of deep learning and show great promise for future applications.
[0027] Deep learning is a supervised learning application of machine learning AI that uses multi-layered, or deep, learning algorithms, such as, for example, artificial neural networks, and often a type of artificial neural network referred to as a convolutional neural network. Deep learning algorithms require extensive training on an accurately labeled training dataset to generate a trained model for operative use. Each instance of training data in the training dataset must be meticulously and accurately labeled to identify one or more aspects or features of interest. The deep learning algorithm must
then be extensively trained on the labeled training data to generate a trained model, if one can be generated at all. When presented with a large amount of accurately labeled training data and extensive subsequent training, trained models have proven highly effective in complex applications. As such, labeling a training dataset remains one of the most important tasks in generating a trained model in deep learning applications.
[0028] Labeling is the process of qualitatively annotating, marking up, or identifying aspects or features of interest in a training dataset, typically performed in advance of training under conventional labeling processes. During training, the labeled training data is used to train a model of a deep learning algorithm to identify or predict aspects or features of interest in a manner consistent with the labeled aspects or features of interest in the labeled training data. Training is the process of parameterizing a model of a deep learning algorithm based on a labeled training dataset in an effort to generate a trained model for operative use. Specifically, the training process attempts to determine a set of model parameters, such as, for example, a set of model weights, that map inputs to outputs in a manner consistent with what was learned from the labeled training data. For the training process to succeed, a large and accurately labeled training dataset must be presented to the model of the deep learning algorithm for extensive training. If the training process succeeds, a trained model is generated. The resulting trained model is a deep learning algorithm that is parameterized through training. In operative use, the trained model may then be presented with data to predict one or more features of interest consistent with what was learned from the labeled training dataset. Consequently, the quality and effectiveness of the trained model, assuming one can be generated, is highly dependent on the quality of the labeling effort that is performed in advance of extensive training. For these reasons and others, labeling is widely considered the most important task in the development of a trained model in deep learning applications.
[0029] In many applications, the deep learning algorithm is an artificial neural network inspired by the biological neural network of the human brain, and often a type of artificial neural network referred to as a convolutional neural network. Artificial neural networks typically include an input layer corresponding to inputs to the algorithm, one or more intermediate layers, and an output layer corresponding to outputs of the algorithm. The configuration and capability of an artificial neural network is sometimes described by the network’s depth, width, size, capacity, and architecture. Depth refers to the number of layers, sometimes referred to as node containers, other than the input
layer, width refers to the number of nodes in a given layer and may vary from layer to layer, and size refers to the total number of nodes in the artificial neural network, where nodes are the fundamental units of the artificial neural network. The capacity typically refers to the type or structure of functions that can be learned and is sometimes referred to as the representational capacity of the artificial neural network. The architecture broadly refers to the arrangement of layers and nodes in the artificial neural network.
[0030] Each node, or artificial neuron, in an artificial neural network is connected to one or more other nodes by way of an artificial synapse. In a feed-forward fully connected architecture, each node in a given layer is connected to each node of the layer immediately following it by an artificial synapse. Similarly, each node in a given layer is connected to each node of the layer immediately preceding it by an artificial synapse. For a given node, the group of artificial synapses that connect each node of the layer immediately preceding it to the given node are considered inputs to the given node. Similarly, for a given node, the group of artificial synapses that connect the given node to each node of the layer immediately following it are considered outputs from the given node. In addition, each node may be characterized by a plurality of model weights applied to its inputs, an activation function, and a threshold function that governs the output to the nodes in the layer immediately following it.
[0031] Specifically, a model weight may be applied to each artificial synapse that is input to a given node and the weighted inputs may be summed. If the sum of weighted inputs exceeds that node’s threshold, the node is said to be activated and may output a value corresponding to a typically non-linear function of the weighted inputs, sometimes referred to as the activation function, to each node of the layer immediately following it. Conversely, if the sum of weighted inputs does not exceed the node’s threshold, the node is said to be deactivated and does not output to the nodes of the layer immediately following it. In this way, a trained model of an artificial neural network, where the model parameters have been determined through the training process, functionally maps one or more inputs to one or more outputs in a manner consistent with what was learned from the labeled training data. In typical applications, labeled training data is used to train the model of the deep learning algorithm, such as, for example, an artificial neural network, to produce a trained model that identifies or predicts aspects or features of interest in data. Put another way, the training process may be thought of as an optimization problem that attempts to determine the model parameters, such as a set of
model weights, and potentially other parameters, for the artificial neural network that effectively maps inputs to outputs based on the labeled training data.
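The weighted-sum, threshold, and activation behavior described above can be illustrated with a toy single-node example. The sigmoid activation, weight values, and zero threshold below are arbitrary assumptions for illustration, not values from any trained model:

```python
import math

def neuron_output(inputs, weights, threshold=0.0):
    """Toy artificial neuron: sum the weighted inputs, gate on a
    threshold, and apply a sigmoid activation if the node activates."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    if weighted_sum <= threshold:
        return 0.0  # deactivated: no output to the following layer
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # activation function

# Five input synapses feeding one node, echoing FIG. 1's layer width.
out = neuron_output(inputs=[0.5, -1.0, 0.25, 0.8, 0.1],
                    weights=[0.4, 0.3, -0.2, 0.9, 0.05])
assert 0.0 < out < 1.0  # activated node emits a bounded output value
```

Training adjusts the weights so that the mapping from inputs to outputs becomes consistent with the labeled training data; the activation function itself is typically fixed in advance.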
[0032] For purposes of illustration only, FIG. 1 shows a conventional deep learning algorithm, such as, for example, an artificial neural network 100, used as part of a deep learning application of machine learning AI. From an architectural standpoint, artificial neural network 100 may include an input layer (e.g., 110), two or more intermediate layers (e.g., 120, 130, and 140), sometimes referred to as hidden layers because they are not directly observable from the system of inputs and outputs, and an output layer (e.g., 150). Each layer (e.g., 110, 120, 130, 140, and 150) may include a plurality of nodes, sometimes referred to as artificial neurons (e.g., nodes 112a-112e for layer 110, nodes 122a-122e for layer 120, nodes 132a-132e for layer 130, nodes 142a-142e for layer 140, and nodes 152a-152c for layer 150). The number of nodes per layer may vary and may not be the same from layer to layer.
[0033] Each node 112a-112e of input layer 110 may be connected to each node 122a-122e of intermediate layer 120 via an artificial synapse. For example, node 112a may be connected to node 122a via artificial synapse S112a-122a, node 112a may be connected to node 122b via artificial synapse S112a-122b, node 112a may be connected to node 122c via artificial synapse S112a-122c, node 112a may be connected to node 122d via artificial synapse S112a-122d, and node 112a may be connected to node 122e via artificial synapse S112a-122e. Each of the remaining nodes 112b-112e of input layer 110 may be connected to each node 122a-122e of intermediate layer 120 immediately following it in the same manner. Similarly, each node 122a-122e of intermediate layer 120 may be connected to each node 132a-132e of intermediate layer 130 via a plurality of artificial synapses (not labeled for clarity), each node 132a-132e of intermediate layer 130 may be connected to each node 142a-142e of intermediate layer 140 via a plurality of artificial synapses (not labeled for clarity), and each node 142a-142e of intermediate layer 140 may be connected to each node 152a-152c of output layer 150 via a plurality of artificial synapses (not labeled for clarity) in the same manner. In this way, each node is said to input an artificial synapse from each node in the layer immediately preceding it, and each node outputs an artificial synapse to each node in the layer immediately following it. Each node may be characterized by its weighted inputs, activation function, and outputs.
[0034] Similar to synaptic weights applied to the synapses input to a biological neuron, a model weight may be applied to each artificial synapse that is input to a given node. For
example, with respect to the inputs to node 122a of intermediate layer 120, a model weight W112a-122a (not shown) may be applied to artificial synapse S112a-122a originating from node 112a, a model weight W112b-122a (not shown) may be applied to artificial synapse S112b-122a originating from node 112b, a model weight W112c-122a (not shown) may be applied to artificial synapse S112c-122a originating from node 112c, a model weight W112d-122a (not shown) may be applied to artificial synapse S112d-122a originating from node 112d, and a model weight W112e-122a (not shown) may be applied to artificial synapse S112e-122a originating from node 112e. A model weight may be applied to each artificial synapse (not labeled for clarity) input to each of the remaining nodes 122b-122e of intermediate layer 120 in the same manner. Similarly, a model weight may be applied to each artificial synapse (not labeled for clarity) input to each node 132a-132e of intermediate layer 130, a model weight may be applied to each artificial synapse (not labeled for clarity) input to each node 142a-142e of intermediate layer 140, and a model weight may be applied to each artificial synapse (not labeled for clarity) input to each node 152a-152c of output layer 150. In this way, each artificial synapse input to a given node may have a different level of influence as to whether that node activates the next nodes in the layer immediately following it. The model weights are typically determined during the training process.
[0035] Each node 112a-112e, 122a-122e, 132a-132e, 142a-142e, and 152a-152c may include an activation function corresponding to a typically non-linear function of the sum of weighted inputs. For example, node 122a may include an activation function corresponding to a non-linear function of the sum of: a weighted value W112a-122a of input artificial synapse S112a-122a, a weighted value W112b-122a of input artificial synapse S112b-122a, a weighted value W112c-122a of input artificial synapse S112c-122a, a weighted value W112d-122a of input artificial synapse S112d-122a, and a weighted value W112e-122a of input artificial synapse S112e-122a. Each of the remaining nodes 122b-122e of intermediate layer 120 may include an activation function in the same manner. Similarly, each node 132a-132e of intermediate layer 130, each node 142a-142e of intermediate layer 140, and each node 152a-152c of output layer 150 may each include an activation function in the same manner. In operation, if the weighted sum of the inputs to a given node exceeds the node’s threshold value, an activation function governs the output of that node to each node in the layer immediately following it. If the weighted sum of the inputs to the given node falls below the node’s threshold value, the node does not output to the nodes in the layer immediately following it. In this way, artificial neural network 100 may be thought of as a function that maps data from input nodes 112a-112e of input layer 110 to output nodes 152a-152c of output layer 150 by way of intermediate layers 120, 130, and 140. The activation function is typically specified in advance and may vary from application to application. The example described with reference to FIG. 1 is merely exemplary of an artificial neural network, and one of ordinary skill in the art will recognize that other types or kinds of deep learning algorithms, with different parameterizations, may be used in deep learning applications of machine learning AI.
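The layer-to-layer mapping from input nodes to output nodes can be sketched as a forward pass. The 5-5-5-5-3 layer widths mirror FIG. 1, while the random weights and ReLU activation are assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, w):
    """One fully connected layer: weighted sums of the values from the
    preceding layer, followed by a ReLU activation/threshold gate."""
    return np.maximum(0.0, w @ x)

# Layer widths mirroring FIG. 1: input 110, hidden 120/130/140, output 150.
widths = [5, 5, 5, 5, 3]
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(widths[:-1], widths[1:])]

x = rng.random(5)       # values presented at input nodes 112a-112e
for w in weights:
    x = layer(x, w)     # propagate through each following layer in turn
assert x.shape == (3,)  # values emitted at output nodes 152a-152c
```

The loop makes the "function from inputs to outputs" view concrete: the network is just a composition of weighted sums and activations, and training only changes the entries of the weight matrices.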
[0036] During the training process, the model parameters for the deep learning algorithm are determined including, for example, one or more of model weights applied to inputs, the activation function or, if the activation function is specified in advance, parameters to the activation function, and potentially threshold values for each node. In some applications, the activation function and threshold values may be specified in advance and the only model parameters determined during training may be the model weights applied to inputs at each node. The model parameters are typically determined via an empirical optimization procedure, such as, for example, the stochastic gradient descent procedure. Notwithstanding, one of ordinary skill in the art will appreciate that other optimization processes may be used. The optimization problem presented by deep artificial neural networks can be challenging and the solution space may include local optima that make it difficult to converge on a solution. As such, the training process typically requires several passes through the labeled training data, where each pass through is referred to as an epoch.
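A bare-bones sketch of the stochastic gradient descent procedure mentioned above, fitting a toy two-parameter linear model to synthetic labeled data; the data, learning rate, and epoch count are arbitrary assumptions for illustration:

```python
import random

random.seed(0)
# Synthetic labeled training data drawn from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]

w, b, lr = 0.0, 0.0, 0.1       # model parameters and learning rate
for epoch in range(200):       # each full pass through the data is an epoch
    random.shuffle(data)       # "stochastic": visit examples in random order
    for x, y in data:
        err = (w * x + b) - y  # prediction error on one labeled example
        w -= lr * err * x      # gradient of the squared error w.r.t. w
        b -= lr * err          # gradient of the squared error w.r.t. b

# The procedure converges on parameters close to the generating function.
assert abs(w - 2.0) < 0.1 and abs(b - 1.0) < 0.1
```

The same update rule, applied to millions of weights via backpropagation, is what makes several epochs through a large labeled training dataset necessary in deep learning applications.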
[0037] The amount of change to the model parameters during each epoch is sometimes referred to as the learning rate, which corresponds to the rate at which the model is said to learn. However, the learning rate may best be described as controlling the amount of apportioned error with which the model weights are updated each time they are updated in an epoch. Given an ideal learning rate, the model will learn to approximate the function for a given number of layers, nodes per layer, and training epochs. However, at the extremes, a learning rate that is too large will result in updates to model weights that are too large, and the performance of the model will tend to oscillate over epochs. Oscillating performance is typically caused by model weights that diverge during optimization and fail to reach a solution. At the other extreme, a learning rate that is too small may never converge or may get stuck on a suboptimal solution, such as a local optimum. As such, the learning rate is important to ensuring that the empirical
optimization procedure converges on a solution of model parameter values, resulting in a trained model. In ideal situations, the empirical optimization procedure will converge on a set of model weights that effectively map inputs to outputs consistent with what was learned from the labeled training data.
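The learning-rate extremes described above can be demonstrated on the simplest possible objective, f(w) = w squared; the step counts and rates below are arbitrary illustrative choices, not recommendations:

```python
def descend(lr, steps=50, w0=1.0):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

good = descend(lr=0.1)   # converges toward the minimum at w = 0
tiny = descend(lr=1e-4)  # too small: barely moves in 50 steps
huge = descend(lr=1.1)   # too large: each update overshoots the minimum

assert abs(good) < 1e-3
assert abs(tiny) > 0.9   # effectively stuck near the starting point
assert abs(huge) > 1.0   # oscillating and growing in magnitude
```

With lr=1.1 each step multiplies w by (1 - 2.2) = -1.2, so the iterate flips sign and grows every step, which is exactly the oscillating, diverging behavior the paragraph describes.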
[0038] For the reasons stated above as well as others, it is critically important that the artificial neural network be provided with a sufficient amount of accurately labeled training data such that the artificial neural network can extensively train on the labeled training data and arrive at a set of model parameters that enable the trained model to effectively and accurately map each of the inputs to outputs in a manner consistent with what was learned from the labeled training data. When labeling and training are done properly, this achieves one of the most important advances in deep learning applications of machine learning AI, namely, transfer learning through trained models.
[0039] While artificial neural networks are commonly used in deep learning applications of machine learning AI, the underlying deep learning algorithm used to generate a trained model may be, for example, a convolutional neural network, a recurrent neural network, a long short-term memory network, a radial basis function network, a self-organizing map, a deep belief network, a restricted Boltzmann machine, an autoencoder, any variation or combination thereof, or any other type or kind of deep learning algorithm that requires a significant amount of labeled training data to generate a trained model, as is well known in the art.
[0040] FIG. 2 shows a conventional process 200 of generating a trained model of a deep learning algorithm, such as, for example, an artificial neural network (e.g., 100 of FIG. 1), as part of a deep learning application of machine learning AI. The conventional process 200 requires the creation of a labeled 210 training dataset on the front-end of the process. Specifically, each instance 205 of training data in the training dataset must be meticulously and accurately labeled 210 to identify aspects or features of interest before the model of the deep learning algorithm can be extensively trained 220 on the labeled training data. If training process 220 successfully converges on a candidate model, comprising a set of model parameters for the underlying deep learning algorithm, the candidate model must be evaluated 230 to determine whether it accurately maps inputs to outputs in a manner consistent with what was learned from the labeled training dataset and is suitable for use as a trained model 250. This is typically a qualitative judgment. If training process 220 does not converge on a candidate model or the evaluation 230 of the model does not meet requirements, the
processes of labeling 210, training 220, and evaluating 230 must be repeated 240. Each iteration 240 of training process 220 may require several passes, or epochs, through the labeled training data. As such, each iteration 240 of this process may take days, weeks, or months to complete.
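The label 210, train 220, evaluate 230, repeat 240 loop can be sketched schematically. Here label_fn, train_model, and evaluate_model are hypothetical stand-ins for the labor-intensive steps of the conventional process, not calls from any real library:

```python
def generate_trained_model(dataset, label_fn, train_model, evaluate_model,
                           threshold=0.9, max_iterations=5):
    """Schematic control flow of conventional process 200 (FIG. 2)."""
    for _ in range(max_iterations):                 # iterations 240
        labeled = [label_fn(x) for x in dataset]    # label 210
        candidate = train_model(labeled)            # train 220
        if evaluate_model(candidate) >= threshold:  # evaluate 230
            return candidate                        # trained model 250
    raise RuntimeError("no suitable trained model after max_iterations")

# Trivial stand-ins just to exercise the control flow.
model = generate_trained_model(
    dataset=[1, 2, 3],
    label_fn=lambda x: (x, x % 2),
    train_model=dict,
    evaluate_model=lambda m: 1.0,
)
assert model == {1: 1, 2: 0, 3: 1}
```

The structure makes the cost visible: each failed evaluation re-enters the labeling and training steps, and each such iteration can take days, weeks, or months.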
[0041] While it has always been the case in the field of computer science that garbage-in results in garbage-out, it is especially true in the case of labeling 210. The inherent problem with the conventional process 200 is that training 220 is frustrated by poorly labeled 210 training data. Despite best efforts, training 220 on poorly labeled training data may not converge on a trained model 250, or may produce a trained model 250 that fails to meet requirements when presented with new data of first impression. Worse yet, there may not be any feedback on the quality of the labeling 210 effort until the training dataset has been fully labeled 210 and the model of the deep learning algorithm has been extensively trained 220 on the labeled training data.
[0042] The quality of the labeling effort is a measure of how accurately the applied labels identify aspects or features of interest in the training data. An instance of training data is poorly labeled if it fails to identify, in whole or in part, aspects or features of interest or if it identifies aspects or features of interest where none exist, either of which can frustrate the training process 220. As previously discussed, training 220 is an empirical optimization process that typically requires several passes through the entire labeled training dataset, often requiring days, weeks, or months to complete. Only after having fully labeled 210 and extensively trained 220 may the candidate model be evaluated 230 for suitability, in a process that itself may take further days, weeks, or months. As such, accurately labeling 210 the training data is critically important to the process, and well labeled training data may expedite process 200.
[0043] FIG. 3 shows a conventional process of labeling 210 2-dimensional training data 205 to produce a labeled training dataset 310 in advance of training 220 a model of a deep learning algorithm used as part of a deep learning application of machine learning AI. The goal of the labeling 210 process is to accurately label aspects or features of interest in each instance 205 of training data to produce a labeled training dataset 310 that may be used to train 220 a model of the deep learning algorithm, such as, for example, an artificial neural network (e.g., 100 of FIG. 1), and generate a trained model (e.g., 250 of FIG. 2) capable of identifying or predicting similar aspects or features of
interest in data in a manner consistent with what was learned from the labeled training dataset 310.
[0044] For the purpose of this example, the training data consists of a plurality of 2-dimensional graphical images (e.g., set of images 205). Each instance of training data 205 must be meticulously and accurately labeled to identify aspects or features of interest resulting in an instance of labeled training data 305. This process is repeated for each instance of training data 205 resulting in a labeled training dataset 310. It is important to note that the labeling 210 process is typically completed in advance of starting the training process 220.
[0045] Each instance of training data 205 may be representative of a single image from a set of graphical images constituting the training dataset (e.g., set of images 205). Each instance of training data 205 may include one or more aspects or features 315a, 315b of interest. The aspects or features of interest are, in essence, the things we want the trained model to be able to identify in data. As such, the aspects or features of interest may vary based on the application. For the purpose of this example, the aspects or features of interest 315a, 315b are what we want the trained model to be able to detect in new data of first impression. One of ordinary skill in the art will recognize that aspects or features of interest 315a, 315b are merely exemplary and may vary in size, shape, number, nature, and orientation. For each image 205, one or more labels 325a, 325b may be applied to each aspect or feature of interest 315a, 315b, producing a labeled image 305. However, it should be noted that human operators often fail to recognize every aspect or feature of interest in training data (e.g., image 205) and sometimes misidentify aspects or features as being of interest when in fact they are not of interest. Putting aside the issue of the quality of the labeling effort (e.g., 210 of FIG. 2) for the time being, the conventional labeling process 210 must be repeated for each and every image 205 in the training dataset (e.g., set of images 205), which may include tens of thousands, hundreds of thousands, or even millions of images, prior to training 220 on the labeled training dataset 310. Only after each and every image 205 in the training dataset has been labeled to identify the aspects or features of interest is the labeled training dataset 310 presented to the model of the deep learning algorithm, such as, for example, the artificial neural network (e.g., 100 of FIG. 1), for extensive training 220.
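The pairing of an image with its applied labels can be sketched as follows. This is a simplified illustration, not part of the disclosed method: the representation of a label as a binary mask, the function name `label_image`, and the box format are all assumptions made for the sake of the example.

```python
import numpy as np

# Illustrative sketch (assumption, not the patent's representation): one
# common way to represent applied labels (e.g., 325a, 325b) is a binary
# mask aligned with the training image, with 1s marking pixels that belong
# to an aspect or feature of interest (e.g., 315a, 315b).
def label_image(image, feature_boxes):
    """Build a binary label mask from (row0, row1, col0, col1) boxes."""
    mask = np.zeros(image.shape, dtype=np.uint8)
    for r0, r1, c0, c1 in feature_boxes:
        mask[r0:r1, c0:c1] = 1  # mark the labeled aspect or feature
    return mask

# One instance of training data (205) and its labeled counterpart (305):
image = np.random.rand(64, 64)
mask = label_image(image, [(10, 20, 30, 40)])  # one 10x10 feature of interest
labeled_dataset = [(image, mask)]  # repeated for every image in the dataset
```

In practice this pairing is repeated for every image in the training dataset before training begins, which is precisely the labor cost the conventional process incurs.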
[0046] In many applications of deep learning, the goal may be to generate a trained model that can recognize, identify, or predict features of interest (e.g., 405a of FIG.
4A) in data. For example, a trained model of a deep learning algorithm may be used to recognize, identify, or predict faults, salts, stratigraphy, or channels in seismic data volumes. Similarly, in another application, a trained model of a deep learning algorithm may be used to recognize, identify, or predict cancerous tumors in computed tomography (“CT”) or magnetic resonance imaging (“MRI”) scans. One of ordinary skill in the art having the benefit of this disclosure will recognize that a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI may be used in any application seeking to identify features of interest in data that can be represented as 3-dimensional volumes of data in accordance with one or more embodiments of the present invention.
[0047] FIG. 4A shows feature 405a in a 3-dimensional volume of data 400 with a cutout 410 highlighting its 3-dimensional nature. As shown in the figure, data may be represented as a 3-dimensional volume 400 in three axes (e.g., X-axis, Y-axis, and Z-axis). One or more features 405a (only one shown) may be disposed anywhere within 3-dimensional volume 400. The one or more features 405a may be any identifiable aspect of interest and may vary in size, shape, number, nature, and orientation. Cutout 410 shows that feature 405a may have aspects that extend in each of the three axes of the 3-dimensional volume 400.
[0048] Continuing, FIG. 4B shows another view of feature 405a in 3-dimensional volume of data 400 with a cutout 415 highlighting the planar faces of feature 405a. When viewed in any two dimensions, such as, for example, either the XY, XZ, or YZ plane, different aspects, or views, of feature 405a may be obtained, depending on which planar face is being observed. In many applications, the source data constituting volume 400 is, by nature of how it is obtained, a plurality of unlabeled 2-dimensional slices (not shown) in a first planar orientation. In such cases, volume 400 corresponds to a stack of the plurality of unlabeled 2-dimensional slices (not shown) having the first planar orientation, and no partitioning is required. However, if the 3-dimensional volume of data 400 does not already correspond to such a stack, partitioning is required to convert the volume 400 into a plurality of unlabeled 2-dimensional slices (not shown) in a first planar orientation.
[0049] Continuing, FIG. 4C shows a process of partitioning 3-dimensional volume of data 400 into a plurality of unlabeled 2-dimensional slices (e.g., 420a, 425a, and 430a) having a first planar orientation. As previously discussed, a volume of data 400 is often formed by a stacked arrangement of a plurality of unlabeled 2-dimensional slices (not shown) in a first planar orientation and may not require partitioning. However, in the
event a volume 400 requires partitioning, it may be partitioned into a plurality of unlabeled 2-dimensional slices (e.g., 420a, 425a, and 430a) in any two dimensions (in this example, the XZ plane). While only three slices are shown in the figure to simplify the presentation and enhance understanding, one of ordinary skill in the art will understand that the number of slices may vary based on the application and is typically much larger in number. Because of the way in which volume 400 is partitioned into slices, the aspect or view of the one or more features 405a obtained therefrom may vary from slice to slice. For example, unlabeled 2-dimensional slice 420a may include aspect view 422a of feature 405a, unlabeled 2-dimensional slice 425a may include aspect view 427a of feature 405a, and unlabeled 2-dimensional slice 430a may include aspect view 432a of feature 405a, where aspect views 422a, 427a, and 432a may not be the same and may vary in size, shape, nature, and orientation. Put another way, the aspect or view of any given feature may vary from slice to slice.
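The partitioning step can be sketched in a few lines of array indexing. The axis convention below (axis 0 = X, axis 1 = Y, axis 2 = Z) and the volume dimensions are assumptions for illustration; real survey or scan data may order and size its axes differently.

```python
import numpy as np

# Minimal sketch of partitioning a 3-D volume of data (e.g., 400) into a
# plurality of unlabeled 2-D slices in a first planar orientation.
# Assumed axis convention: axis 0 = X, axis 1 = Y, axis 2 = Z.
volume = np.random.rand(8, 16, 8)  # stand-in for volume 400, shape (X, Y, Z)

# XZ-plane slices: one unlabeled 2-D slice per index along the Y axis,
# analogous to slices 420a, 425a, and 430a in FIG. 4C.
xz_slices = [volume[:, y, :] for y in range(volume.shape[1])]
```

Each entry of `xz_slices` is a view spanning the X and Z axes, so the number of slices equals the extent of the volume along Y.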
[0050] Continuing, FIG. 4D shows the process of labeling (e.g., 210 of FIG. 2) unlabeled 2-dimensional slice 430a having the first planar orientation and training on labeled 2-dimensional slice 430b having the first planar orientation to produce predicted 2-dimensional slice 430c having the first planar orientation. An unlabeled 2-dimensional slice 430a may represent any slice obtained from volume 400 that has not been satisfactorily labeled. For the purposes of illustration, we will assume that the goal is to train a model of a deep learning algorithm to identify features represented by aspect view 432a in unlabeled 2-dimensional slice 430a.
[0051] Upon inspection, unlabeled 2-dimensional slice 430a having the first planar orientation may include one or more aspects 432a (only one aspect shown) of one or more features (e.g., 405a of FIG. 4C) (only one feature shown) that may be difficult to discern. During the labeling process (e.g., 210 of FIG. 2), one or more aspects 432a of the one or more features of interest (e.g., 405a of FIG. 4C) may be labeled, or marked, to identify them for training. Labeled 2-dimensional slice 430b may include applied labels, for example, labeled aspect 432b corresponding to unlabeled aspect 432a of unlabeled 2-dimensional slice 430a. Labeled 2-dimensional slice 430b may be submitted to train the model of the deep learning algorithm to learn to predict similar features of interest. After training, the trained model may be used to produce a plurality of predicted 2-dimensional slices (e.g., only 430c shown) having the first planar orientation that include the model predicted aspects (e.g., 432c) of the features of interest (e.g., 405b of FIG. 4E) that ideally correspond to the labeled aspects (e.g., 432b) and the unlabeled aspects (e.g., 432a) of the features of interest (e.g., 405a of FIG. 4C).
[0052] Continuing, FIG. 4E shows a first probability cube 490 created by a plurality of predicted 2-dimensional slices (e.g., 430c of FIG. 4D as an example of one such slice) having the first planar orientation. After labeling and training on the data, a plurality of predicted 2-dimensional slices (e.g., 430c of FIG. 4D as an example of one such slice) having the first planar orientation may be produced corresponding to the unlabeled (e.g., 430a of FIG. 4D) and the labeled (e.g., 430b of FIG. 4D) plurality of 2-dimensional slices having the first planar orientation. The plurality of predicted 2-dimensional slices (e.g., 430c of FIG. 4D as an example of one such slice) having the first planar orientation may be reassembled to constitute a volume of data now referred to as first probability cube 490, because it includes predicted features of interest from each constituent slice (e.g., 430c). First probability cube 490 more clearly shows the one or more features of interest (e.g., 405b) as opposed to the initial volume of data (e.g., 400). This evidences that there was some level of success in labeling and training the model of the deep learning algorithm, because it is able to more clearly identify the one or more features of interest (e.g., 405b).
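The reassembly step above can be sketched as a stacking operation. The axis convention and dimensions are the same assumptions as in the partitioning sketch; each predicted slice is assumed to hold per-pixel feature probabilities in [0, 1].

```python
import numpy as np

# Sketch of reassembling predicted 2-D slices (e.g., 430c) back into a
# volume -- the first probability cube (e.g., 490). Stand-in predictions
# are used here in place of real model output.
predicted_xz_slices = [np.random.rand(8, 8) for _ in range(16)]  # one per Y index

# Stacking along the Y axis restores the original (X, Y, Z) volume geometry.
first_probability_cube = np.stack(predicted_xz_slices, axis=1)
```

Because the stack axis matches the axis along which the volume was partitioned, the probability cube has exactly the same shape as the input volume.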
[0053] For the purpose of this disclosure, a first probability cube 490 is a representation of a plurality of predicted 2-dimensional slices having a first planar orientation. In one or more embodiments of the present invention, a first probability cube (e.g., 490) may be input to a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI. However, in other embodiments, the first probability cube (e.g., 490) may be generated as part of the method of probability cube to probability cube enhancement, both of which are discussed in more detail herein.
[0054] Conventional applications that seek to identify features of interest typically require domain experts to label each and every feature in each and every unlabeled 2-dimensional slice in a 3-dimensional volume of data. However, in some applications that utilize model- or AI-assisted labeling processes, only a small portion, often less than 1%, of the unlabeled 2-dimensional slices are manually labeled, with the remaining slices being labeled by the model or AI. However, model- and AI-assisted labeling processes, for various reasons, tend to result in lower quality predictions as compared to manual labeling processes. As such, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI may be used to enhance the quality of predictions obtained from model- or AI-assisted labeling
processes. In such cases, the model- or AI-assisted labeling reduces the amount of time required to label a dataset and the method of probability cube to probability cube enhancement improves the quality of those predictions. Notwithstanding, to be clear, in one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI may be used with manually labeled 2-dimensional slices, model-assisted and labeled 2-dimensional slices, AI-assisted and labeled 2-dimensional slices, or any combination thereof. Regardless of how the 2-dimensional slices are labeled, the method enhances the quality of the predictions.
[0055] Accordingly, in one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI enhances the quality of predictions by inputting a first probability cube comprising predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more unlabeled orthogonal 2-dimensional slices having the second planar orientation, and training on one or more predicted orthogonal 2-dimensional slices having the second planar orientation to produce a second probability cube. The second planar orientation may be orthogonal to the first planar orientation (or at least at an offset angle that shows the most jagged edges, discontinuities, artifacts, or noise). Advantageously, the second probability cube provides substantially enhanced predictions with cleaner edges and fewer artifacts and less noise.
[0056] FIG. 5A shows the creation of an orthogonal 2-dimensional slice 520a having a second planar orientation from first probability cube 490 in accordance with one or more embodiments of the present invention. In one or more embodiments of the present invention, a first probability cube 490 comprises a plurality of predicted 2-dimensional slices having a first planar orientation, which may be treated as input. In this example, first probability cube 490 comprises a plurality of 2-dimensional slices having a first planar orientation in the XZ plane, but the first planar orientation could equally be the XY or YZ plane.
[0057] In one or more embodiments of the present invention, first probability cube 490 may be used as input to a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI, which includes partitioning, labeling, and training to enhance the resolution of the labels and resulting predictions. For example, a 2-dimensional slice may be taken having a second planar orientation
that is orthogonal to the first planar orientation used to create first probability cube 490. The resulting 2-dimensional slice 520a is referred to as an orthogonal 2-dimensional slice because it has the second planar orientation that is orthogonal to the first planar orientation.
[0058] Continuing, FIG. 5B shows orthogonal 2-dimensional slice 520a having the second planar orientation taken from the first probability cube 490 showing jagged edges, discontinuities, artifacts, and noise in accordance with one or more embodiments of the present invention. Orthogonal 2-dimensional slice 520a may, by virtue of its different planar orientation, show undesirable artifacts and noise, where the edges are not clean, in aspect 522a, which corresponds to a different view of predicted feature 405b of first probability cube 490. Because first probability cube 490 is created by predicted 2-dimensional slices that are oriented in the first planar orientation, the predicted aspects or features (e.g., 432c of FIG. 4D) are well defined in the first planar orientation. However, when first probability cube 490 is viewed with 2-dimensional slices oriented in the second planar orientation (e.g., 520a), discontinuities in the edges of aspects (e.g., 522a) between adjacent slices in the first planar orientation are revealed.
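The source of these artifacts can be made concrete with a small sketch: because each XZ-plane slice is predicted independently, the predictions are not constrained to agree across the Y axis, and differencing adjacent slices along that axis exposes the inter-slice jumps that appear as jagged edges in any orthogonal view. The shapes and axis convention below are the same illustrative assumptions as in the earlier sketches.

```python
import numpy as np

# Sketch of why orthogonal views expose artifacts: slice-by-slice
# predictions in the XZ plane are independent along the Y axis, so
# differences between adjacent Y slices show up as discontinuities
# (jagged edges) when the cube is viewed in the XY or YZ plane.
cube = np.random.rand(8, 16, 8)  # stand-in for first probability cube 490

# Absolute change between each pair of adjacent XZ-plane slices:
inter_slice_jump = np.abs(np.diff(cube, axis=1))
```

Large values in `inter_slice_jump` correspond to the discontinuities the second, orthogonal labeling and training pass is intended to clean up.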
[0059] Accordingly, in one or more embodiments of the present invention, first probability cube 490 may be used as input to a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI that operates in a second planar orientation that is orthogonal to the first planar orientation to enhance the resolution of the labels and resulting predictions of aspects or features, resulting in a newly created and enhanced second probability cube (e.g., 690 of FIG. 6C) that is of a higher resolution than the first probability cube (e.g., 490 of FIG. 4E). While a second planar orientation orthogonal to the first planar orientation tends to show the most artifacts and noise, one of ordinary skill in the art, having the benefit of this disclosure, will appreciate that any second planar orientation offset from the first planar orientation that shows the most artifacts and noise may be used in accordance with one or more embodiments of the present invention.
[0060] FIG. 6A shows a process of partitioning first probability cube 490 into a plurality of unlabeled orthogonal 2-dimensional slices (e.g., 610a, 620a, 630a, and 640a) having a second planar orientation in accordance with one or more embodiments of the present invention. First probability cube 490 may be partitioned into a plurality of unlabeled orthogonal 2-dimensional slices (e.g., 610a, 620a, 630a, and 640a) having a second
planar orientation (in this instance, the XY plane) that is orthogonal to the first planar orientation (in this instance, the XZ plane) of the predicted 2-dimensional slices (e.g., 430c of FIG. 4D) used to form first probability cube 490. While only four slices are shown in the figure to simplify the presentation and enhance understanding, one of ordinary skill in the art will understand that the number of slices may vary based on the application and is typically much larger in number. Because of the way in which first probability cube 490 is partitioned into slices, the aspect or view of the one or more features 405b obtained therefrom may vary from slice to slice. For example, each of unlabeled orthogonal 2-dimensional slices 610a, 620a, 630a, and 640a may include an aspect view of feature 405b, where the aspect views of feature 405b may not be the same and may vary in size, shape, nature, and orientation from slice to slice.
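The orthogonal re-partitioning mirrors the first partitioning step, but indexes along a different axis. As before, the (X, Y, Z) axis convention and dimensions are assumptions for illustration.

```python
import numpy as np

# Sketch of re-partitioning the first probability cube (e.g., 490) in a
# second, orthogonal planar orientation: a cube built from XZ-plane slices
# is cut into XY-plane slices (e.g., 610a-640a) by indexing along the Z axis.
first_probability_cube = np.random.rand(8, 16, 8)  # (X, Y, Z)

xy_slices = [first_probability_cube[:, :, z]
             for z in range(first_probability_cube.shape[2])]
```

Each orthogonal slice spans the X and Y axes, cutting across every one of the original XZ-plane predictions, which is what lets the second pass see (and correct) the inter-slice discontinuities.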
[0061] Continuing, FIG. 6B shows a process of labeling an unlabeled orthogonal 2-dimensional slice 610a having the second planar orientation and training on the labeled orthogonal 2-dimensional slice 610b having the second planar orientation to produce a predicted orthogonal 2-dimensional slice 610c having the second planar orientation in accordance with one or more embodiments of the present invention. An unlabeled orthogonal 2-dimensional slice 610a may represent any slice obtained from the partitioning of the first probability cube (e.g., 490 of FIG. 6A) that has not been satisfactorily labeled. For the purposes of illustration, we will assume that the goal is to train a model of a deep learning algorithm to identify features like aspect 612a in unlabeled orthogonal 2-dimensional slice 610a.
[0062] Upon inspection, unlabeled orthogonal 2-dimensional slice 610a having the second planar orientation may include one or more aspects 612a (only one aspect shown) of one or more features (e.g., 405b of FIG. 6A) (only one feature shown) that may include jagged edges, discontinuities, artifacts, or noise when viewed in this planar orientation. During the labeling process (e.g., 210 of FIG. 2), one or more aspects 612a of the one or more features of interest (e.g., 405b of FIG. 6A) may be labeled, or marked, to identify them for training. Labeled orthogonal 2-dimensional slice 610b having the second planar orientation may include applied labels, for example, aspect 612b corresponding to unlabeled aspect 612a of unlabeled orthogonal 2-dimensional slice 610a having the second planar orientation. Labeled orthogonal 2-dimensional slice 610b having the second planar orientation may be submitted to train a model of a deep learning algorithm to identify similar features of interest. The model and deep learning algorithm may or may not be the same ones used to create the first probability cube (e.g., 490 of FIG. 4E). After training, the trained model may be used to produce a plurality of predicted orthogonal 2-dimensional slices (e.g., only 610c shown) having the second planar orientation that include the model predicted aspects (e.g., 612c) of the features of interest (e.g., 405b of FIG. 6A) that ideally correspond to the labeled (e.g., 612b) and the unlabeled (e.g., 612a) aspects of the features of interest (e.g., 405b of FIG. 6A).
[0063] Continuing, FIG. 6C shows a second probability cube 690 created by a plurality of predicted orthogonal 2-dimensional slices (e.g., 610c of FIG. 6B as an example of one such slice) having the second planar orientation in accordance with one or more embodiments of the present invention. After labeling and training on the data, a plurality of predicted orthogonal 2-dimensional slices (e.g., 610c of FIG. 6B as an example of one such slice) having the second planar orientation may be produced corresponding to the plurality of unlabeled orthogonal 2-dimensional slices (e.g., 610a of FIG. 6B) having the second planar orientation and to the plurality of labeled orthogonal 2-dimensional slices (e.g., 610b of FIG. 6B) having the second planar orientation. The predicted orthogonal 2-dimensional slices (e.g., 610c of FIG. 6B) having the second planar orientation may be assembled to constitute a volume of data now referred to as second probability cube 690, because it includes predicted features of interest from each constituent slice (e.g., 610c). To be clear, second probability cube 690 comprises a plurality of predicted 2-dimensional slices (e.g., 610c of FIG. 6B) having a second planar orientation that is orthogonal to, or at least offset from, the first planar orientation. Second probability cube 690 more clearly shows the one or more features of interest (e.g., 405c) as opposed to the initial volume of data (e.g., 400) and the first probability cube (e.g., 490 of FIG. 6A). In this way, second probability cube 690 enhances and provides higher resolution predictions over those of the first probability cube (e.g., 490 of FIG. 6A).
[0064] FIG. 7 shows a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI in accordance with one or more embodiments of the present invention.
[0065] In certain embodiments, a first probability cube may serve as input to the method of probability cube to probability cube enhancement for deep learning applications of
machine learning AI (e.g., inputting a first probability cube 705b). In other embodiments, a plurality of unlabeled 2-dimensional slices having a first planar orientation may be used to create the first probability cube (e.g., requiring labeling 710a and training 720a) before the substantive steps of the method are undertaken. In still other embodiments, an unpartitioned volume of data may be used to create the first probability cube (e.g., requiring inputting and partitioning 705a, labeling 710a, and training 720a) before the substantive steps of the method are undertaken.
[0066] In step 705a, a 3-dimensional volume of data may be input into a deep learning software application of a computer. The 3-dimensional volume may comprise a plurality of unlabeled 2-dimensional slices having a first planar orientation or a volume of data that must be partitioned into a plurality of unlabeled 2-dimensional slices having a first planar orientation.
[0067] In step 710a, one or more features of interest may be manually labeled in each of the plurality of unlabeled 2-dimensional slices having the first planar orientation or, in model- or AI-assisted labeling processes, in at least a few of the plurality of unlabeled 2-dimensional slices having the first planar orientation, to produce a plurality of labeled 2-dimensional slices having the first planar orientation.
[0068] In step 720a, a model of a deep learning algorithm may be trained on the plurality of labeled 2-dimensional slices having the first planar orientation to produce a plurality of predicted 2-dimensional slices having the first planar orientation. In certain embodiments, the deep learning algorithm may be an artificial neural network. In other embodiments, the deep learning algorithm may be a convolutional neural network. In still other embodiments, the deep learning algorithm may be a radial basis function network, a recurrent neural network, a long short-term memory network, a self-organizing map, an autoencoder, or a deep belief network. One of ordinary skill in the art will recognize that any other type or kind of deep learning algorithm may be used in accordance with one or more embodiments of the present invention. In certain embodiments, training the model may include training a new model of a deep learning algorithm. In other embodiments, training the model may include training with a pretrained, or “canned”, model of a deep learning algorithm.
[0069] In step 730a, the trained model may be evaluated by determining how well one or more of the predicted 2-dimensional slices having the first planar orientation recognize or predict features of interest. If it is determined that the trained model does not accurately predict features of interest, the process may return to labeling step 710a
to correct one or more labels in the labeled 2-dimensional slices, or, in the case of model- or AI-assisted labeling, to label more 2-dimensional slices having the first planar orientation. However, if it is determined that the trained model accurately predicts features of interest, the plurality of predicted 2-dimensional slices having the first planar orientation may be assembled to form the first probability cube.
[0070] In step 705b, the first probability cube may be input into a deep learning software application of the computer. The first probability cube may be partitioned into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation. In certain embodiments, the second planar orientation may be orthogonal to the first planar orientation. Typically, the orthogonal planar orientation showing the most discontinuities, artifacts, and noise would be used. In other embodiments, the second planar orientation may be offset from the first planar orientation at an angle that shows the most discontinuities, artifacts, and noise.
[0071] In step 710b, one or more features of interest may be manually labeled in each of the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation, or at least a few of the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation may be manually labeled with the remainder labeled in model- or AI-assisted labeling processes, to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation.
[0072] In step 720b, a model of a deep learning algorithm may be trained on the plurality of labeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
[0073] In step 730b, the trained model may be evaluated by determining how well one or more of the predicted orthogonal 2-dimensional slices having the second planar orientation recognize or predict features of interest. If it is determined that the trained model does not accurately predict features of interest, the process may return to labeling step 710b to correct the one or more labels in the labeled orthogonal 2-dimensional slices having the second planar orientation, or, in the case of AI-assisted labeling, to label more orthogonal 2-dimensional slices having the second planar orientation. However, if it is determined that the trained model accurately predicts features of interest, the plurality of predicted orthogonal 2-dimensional slices having the second planar orientation may be assembled to form the second probability cube. The
second probability cube comprises a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.
[0074] Because the data is labeled and trained in orthogonal planes, the predicted features of interest in the second probability cube have enhanced resolution with little to no jagged edges, discontinuities, artifacts, or noise.
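The overall two-pass method of FIG. 7 can be summarized in a short skeleton. The labeling, training, and evaluation stages are collapsed here into a single placeholder `predict()` function, which is an assumption for illustration; in practice each pass involves the full label/train/evaluate loop with a real deep learning model.

```python
import numpy as np

# End-to-end skeleton of the two-pass method of FIG. 7. The predict()
# stub stands in for a trained deep learning model's per-slice inference
# (an assumption -- not the patent's model).
def predict(slice_2d):
    """Stand-in for model inference on one 2-D slice."""
    return np.clip(slice_2d, 0.0, 1.0)  # per-pixel probabilities in [0, 1]

volume = np.random.rand(8, 16, 8)  # input 3-D volume (X, Y, Z), step 705a

# Pass 1: partition in the first planar orientation (XZ), predict slice by
# slice, and stack the predictions into the first probability cube (490).
cube1 = np.stack([predict(volume[:, y, :])
                  for y in range(volume.shape[1])], axis=1)

# Pass 2: partition cube1 in the orthogonal orientation (XY), predict, and
# stack into the second probability cube (690).
cube2 = np.stack([predict(cube1[:, :, z])
                  for z in range(cube1.shape[2])], axis=2)
```

Note that both probability cubes retain the geometry of the input volume; only the orientation in which predictions are made changes between the two passes.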
[0075] In one or more embodiments of the present invention, any of the above-noted methods may be implemented at least in part by a computer (e.g., 800 of FIG. 8) in accordance with one or more embodiments of the present invention. In addition, in one or more embodiments of the present invention, a non-transitory computer readable medium may comprise software instructions that, when executed by a processor, perform any of the above-noted methods in accordance with one or more embodiments of the present invention.
[0076] FIG. 8 shows a computer 800 for performing at least part of a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI in accordance with one or more embodiments of the present invention. While computer 800 is merely exemplary of an Intel® x86 instruction set architecture computing system, one of ordinary skill in the art will appreciate that computer 800 may be any other type or kind of computer capable of executing software instructions that perform at least part of the above-noted method.
[0077] Computer 800 may include one or more processors, sometimes referred to as central processing units (“CPUs”) 805, host bridge 810, input/output (“IO”) bridge 815, graphics processing units (“GPUs”) 825, and/or application-specific integrated circuits (“ASICs”) (not shown) disposed on one or more printed circuit boards (not shown) that perform computational operations in accordance with an instruction set architecture (“ISA”). Each of the one or more CPUs 805, GPUs 825, or ASICs (not shown) may be a single-core (not shown) device or a multi-core (not shown) device. Multi-core devices typically include a plurality of cores (not shown) disposed on the same physical die (not shown) or a plurality of cores (not shown) disposed on multiple die (not shown) that are collectively disposed within the same mechanical package (not shown).
[0078] CPU 805 may be a general-purpose computational device typically configured to execute software instructions for a specific instruction set architecture. CPU 805 may include an interface 808 to host bridge 810, an interface 818 to system memory 820, and an interface 823 to one or more IO devices, such as, for example, one or more GPUs 825. GPU 825 may be a specialized computational device typically configured to
perform graphics functions related to frame buffer manipulation. However, one of ordinary skill in the art will recognize that GPU 825 may be used to perform computationally intensive mathematical functions, including training a deep learning algorithm. In certain embodiments, GPU 825 may interface 823 directly with CPU 805 (and interface 818 with system memory 820 through CPU 805). In other embodiments, GPU 825 may interface 821 with host bridge 810 (and interface 816 with system memory 820 through host bridge 810 or interface 818 with system memory 820 through CPU 805 depending on the application or design). In still other embodiments, GPU 825 may interface 833 with IO bridge 815 (and interface 816 with system memory 820 through host bridge 810 or interface 818 with system memory 820 through CPU 805 depending on the application or design). One of ordinary skill in the art will appreciate that the functionality of GPU 825 may be integrated, in whole or in part, with CPU 805.
[0079] Host bridge 810 may be an interface device that interfaces between the one or more computational devices (e.g., CPUs 805, GPUs 825, ASICs) and IO bridge 815 and, in some embodiments, system memory 820. Host bridge 810 may include an interface 808 to CPU 805; an interface 813 to IO bridge 815; for embodiments where CPU 805 does not include an interface 818 to system memory 820, an interface 816 to system memory 820; and, for embodiments where CPU 805 does not include an integrated GPU 825 or an interface 823 to GPU 825, an interface 821 to GPU 825. One of ordinary skill in the art will appreciate that the functionality of host bridge 810 may be integrated, in whole or in part, with CPU 805. IO bridge 815 may be an interface device that interfaces between the one or more computational devices (e.g., CPUs 805, GPUs 825, ASICs) and various IO devices (e.g., 840, 845) and IO expansion, or add-on, devices (not independently illustrated). IO bridge 815 may include an interface 813 to host bridge 810, one or more interfaces 833 to one or more IO expansion devices 835, an interface 838 to keyboard 840, an interface 843 to mouse 845, an interface 848 to one or more local storage devices 850, and an interface 853 to one or more network interface devices 855. One of ordinary skill in the art will appreciate that the functionality of IO bridge 815 may be integrated, in whole or in part, with CPU 805 and/or host bridge 810. Each local storage device 850, if any, may be a solid-state memory device, a solid-state memory device array, a hard disk drive, a hard disk drive array, or any other non-transitory computer readable medium. Network interface device 855 may provide one or more network interfaces including any network protocol suitable to facilitate networked communications.
[0080] Computer 800 may include one or more network-attached storage devices 860 in addition to, or instead of, one or more local storage devices 850. Each network-attached storage device 860, if any, may be a solid-state memory device, a solid-state memory device array, a hard disk drive, a hard disk drive array, or any other non-transitory computer readable medium. Network-attached storage device 860 may or may not be collocated with computing system 800 and may be accessible to computing system 800 via one or more network interfaces provided by one or more network interface devices 855.
[0081] One of ordinary skill in the art will recognize that computer 800 may be a conventional computing system or an application-specific computing system (not shown). In certain embodiments, an application-specific computing system (not shown) may include one or more ASICs (not shown) that perform one or more specialized functions in a more efficient manner. The one or more ASICs (not shown) may interface directly with CPU 805, host bridge 810, or GPU 825 or interface through IO bridge 815. Alternatively, in other embodiments, an application-specific computing system (not shown) may be reduced to only those components necessary to perform a desired function in an effort to reduce one or more of chip count, printed circuit board footprint, thermal design power, and power consumption. The one or more ASICs (not shown) may be used instead of one or more of CPU 805, host bridge 810, IO bridge 815, or GPU 825. In such systems, the one or more ASICs may incorporate sufficient functionality to perform certain network and computational functions in a minimal footprint with fewer component devices.
[0082] As such, one of ordinary skill in the art will recognize that CPU 805, host bridge 810, IO bridge 815, GPU 825, or ASIC (not shown) or a subset, superset, or combination of functions or features thereof, may be integrated, distributed, or excluded, in whole or in part, based on an application, design, or form factor in accordance with one or more embodiments of the present invention. Thus, the description of computer 800 is merely exemplary and not intended to limit the type, kind, or configuration of component devices that constitute a computer 800 suitable for executing software instructions in accordance with one or more embodiments of the present invention. Notwithstanding the above, one of ordinary skill in the art will recognize that computer 800 may be a standalone, laptop, desktop, industrial, server, blade, or rack mountable system and may vary based on an application or design.
[0083] Advantages of one or more embodiments of the present invention may include one or more of the following:
[0084] In one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI substantially improves the quality of predictions in model- or AI-assisted labeling processes, with minimal labeling effort.
[0085] In one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI substantially improves the ability of the deep learning algorithm to correct jagged edges and discontinuities and to eliminate noise and artifacts in features of interest present in the first probability cube.
[0086] In one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI reduces the amount of time required to train a model on the labeled training dataset.
[0087] In one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI reduces or eliminates the need for subsequent iterations of labeling, training, and evaluating.
[0088] In one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI reduces the computational complexity of training.
[0089] In one or more embodiments of the present invention, a method of probability cube to probability cube enhancement for deep learning applications of machine learning AI reduces the costs associated with training.
[0090] While the present invention has been described with respect to the above-noted embodiments, those skilled in the art, having the benefit of this disclosure, will recognize that other embodiments may be devised that are within the scope of the invention as disclosed herein. Accordingly, the scope of the invention should only be limited by the appended claims.
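For illustration only, the two-pass workflow described above — predicting slice-by-slice in a first planar orientation, assembling a first probability cube, then predicting on orthogonal slices of that cube to produce a second probability cube — may be sketched as follows. The per-slice "models" here are hypothetical stand-in functions, not the trained deep learning algorithms of the invention, and the shapes and thresholds are assumptions of this sketch:

```python
import numpy as np

def predict_slices(cube, axis, model):
    """Apply a per-slice model along the given planar orientation and reassemble."""
    slices = [model(np.take(cube, i, axis=axis)) for i in range(cube.shape[axis])]
    return np.stack(slices, axis=axis)

rng = np.random.default_rng(1)
volume = rng.random((8, 8, 8))  # toy 3-D volume of data

# Placeholder per-slice predictors standing in for trained models.
first_model = lambda s: np.clip(s, 0.0, 1.0)
second_model = lambda s: (s > 0.5).astype(float)

# First pass: predictions in the first planar orientation form the first cube.
first_cube = predict_slices(volume, axis=0, model=first_model)

# Second pass: predictions on orthogonal slices form the second cube.
second_cube = predict_slices(first_cube, axis=1, model=second_model)

print(second_cube.shape)  # (8, 8, 8)
```

The second pass sees each feature of interest from an orthogonal viewpoint, which is what allows it to smooth discontinuities that are only visible across, rather than within, the first set of slices.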
Claims
What is claimed is:

1. A method of probability cube to probability cube enhancement for deep learning comprising: generating a second probability cube of data by: inputting a first probability cube comprising a plurality of predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube of data into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more features of interest in the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation, and training a model of a deep learning algorithm on the plurality of labeled orthogonal 2-dimensional slices of data having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.

2. The method of claim 1, further comprising: generating the first probability cube of data by: labeling one or more features of interest in a plurality of unlabeled 2-dimensional slices having a first planar orientation to produce a plurality of labeled 2-dimensional slices having the first planar orientation, and training a model of a deep learning algorithm on the plurality of labeled 2-dimensional slices having the first planar orientation to produce a plurality of predicted 2-dimensional slices having the first planar orientation.

3. The method of claim 2, further comprising: assembling the plurality of predicted 2-dimensional slices having the first planar orientation to form the first probability cube.

4. The method of claim 2, further comprising: inputting a 3-dimensional volume of data, and partitioning the 3-dimensional volume of data into the plurality of unlabeled 2-dimensional slices having the first planar orientation.

5. The method of claim 1, further comprising: assembling the plurality of predicted orthogonal 2-dimensional slices having the second planar orientation to form the second probability cube.

6. The method of claim 1, wherein the second planar orientation is orthogonal to the first planar orientation.

7. The method of claim 1, wherein the deep learning algorithm is an artificial neural network.

8. The method of claim 1, wherein the deep learning algorithm is a convolutional neural network.

9. The method of claim 1, wherein the deep learning algorithm is a radial basis function network, a recurrent neural network, a long short-term memory network, a self-organizing map, an autoencoder, or a deep belief network.

10. The method of claim 1, wherein training the model of the deep learning algorithm includes training with a new model of a deep learning algorithm.

11. The method of claim 1, wherein training the model of the deep learning algorithm includes training with a pre-trained model of a deep learning algorithm.

12. The method of claim 2, wherein the deep learning algorithm is an artificial neural network.

13. The method of claim 2, wherein the deep learning algorithm is a convolutional neural network.

14. The method of claim 2, wherein the deep learning algorithm is a radial basis function network, a recurrent neural network, a long short-term memory network, a self-organizing map, an autoencoder, or a deep belief network.

15. The method of claim 2, wherein training the model of the deep learning algorithm includes training with a new model of a deep learning algorithm.

16. The method of claim 2, wherein training the model of the deep learning algorithm includes training with a pre-trained model of a deep learning algorithm.

17. The method of claim 2, wherein the model of the deep learning algorithm used for training as part of generating the first probability cube is not the same as the model of the deep learning algorithm used for training as part of generating the second probability cube.

18. The method of claim 2, wherein the deep learning algorithm is used for training as part of generating the first probability cube and then the second probability cube.

19. The method of claim 2, wherein the deep learning algorithm used for training as part of generating the first probability cube is not the same as the deep learning algorithm used for training as part of generating the second probability cube.

20. A computer-implemented method of probability cube to probability cube enhancement for deep learning comprising: inputting a first probability cube comprising a plurality of predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube of data into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more features of interest in the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation, and training a model of a deep learning algorithm on the plurality of labeled orthogonal 2-dimensional slices of data having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.

21. The computer-implemented method of claim 20, further comprising: generating the first probability cube of data by: labeling one or more features of interest in a plurality of unlabeled 2-dimensional slices having a first planar orientation to produce a plurality of labeled 2-dimensional slices having the first planar orientation, and training a model of a deep learning algorithm on the plurality of labeled 2-dimensional slices having the first planar orientation to produce a plurality of predicted 2-dimensional slices having the first planar orientation.

22. The computer-implemented method of claim 21, further comprising: assembling the plurality of predicted 2-dimensional slices having the first planar orientation to form the first probability cube.

23. The computer-implemented method of claim 21, further comprising: inputting a 3-dimensional volume of data, and partitioning the 3-dimensional volume of data into the plurality of unlabeled 2-dimensional slices having the first planar orientation.

24. The computer-implemented method of claim 20, further comprising: assembling the plurality of predicted orthogonal 2-dimensional slices having the second planar orientation to form the second probability cube.

25. The computer-implemented method of claim 20, wherein the second planar orientation is orthogonal to the first planar orientation.

26. The computer-implemented method of claim 20, wherein the deep learning algorithm is an artificial neural network.

27. The computer-implemented method of claim 20, wherein the deep learning algorithm is a convolutional neural network.

28. The computer-implemented method of claim 20, wherein the deep learning algorithm is a radial basis function network, a recurrent neural network, a long short-term memory network, a self-organizing map, an autoencoder, or a deep belief network.

29. The computer-implemented method of claim 20, wherein training the model of the deep learning algorithm includes training with a new model of a deep learning algorithm.

30. The computer-implemented method of claim 20, wherein training the model of the deep learning algorithm includes training with a pre-trained model of a deep learning algorithm.

31. The computer-implemented method of claim 21, wherein the deep learning algorithm is an artificial neural network.

32. The computer-implemented method of claim 21, wherein the deep learning algorithm is a convolutional neural network.

33. The computer-implemented method of claim 21, wherein the deep learning algorithm is a radial basis function network, a recurrent neural network, a long short-term memory network, a self-organizing map, an autoencoder, or a deep belief network.

34. The computer-implemented method of claim 21, wherein training the model of the deep learning algorithm includes training with a new model of a deep learning algorithm.

35. The computer-implemented method of claim 21, wherein training the model of the deep learning algorithm includes training with a pre-trained model of a deep learning algorithm.

36. The computer-implemented method of claim 21, wherein the model of the deep learning algorithm used for training as part of generating the first probability cube is not the same as the model of the deep learning algorithm used for training as part of generating the second probability cube.

37. The computer-implemented method of claim 21, wherein the deep learning algorithm is used for training as part of generating the first probability cube and then the second probability cube.

38. The computer-implemented method of claim 21, wherein the deep learning algorithm used for training as part of generating the first probability cube is not the same as the deep learning algorithm used for training as part of generating the second probability cube.

39. A non-transitory computer-readable medium comprising software instructions that, when executed by a processor, perform a method of probability cube to probability cube enhancement for deep learning comprising: generating a second probability cube of data by: inputting a first probability cube comprising a plurality of predicted 2-dimensional slices having a first planar orientation, partitioning the first probability cube of data into a plurality of unlabeled orthogonal 2-dimensional slices having a second planar orientation, labeling one or more features of interest in the plurality of unlabeled orthogonal 2-dimensional slices having the second planar orientation to produce a plurality of labeled orthogonal 2-dimensional slices having the second planar orientation, and training a model of a deep learning algorithm on the plurality of labeled orthogonal 2-dimensional slices of data having the second planar orientation to produce a plurality of predicted orthogonal 2-dimensional slices having the second planar orientation.

40. The non-transitory computer-readable medium of claim 39, further comprising: generating the first probability cube of data by: labeling one or more features of interest in a plurality of unlabeled 2-dimensional slices having a first planar orientation to produce a plurality of labeled 2-dimensional slices having the first planar orientation, and training a model of a deep learning algorithm on the plurality of labeled 2-dimensional slices having the first planar orientation to produce a plurality of predicted 2-dimensional slices having the first planar orientation.

41. The non-transitory computer-readable medium of claim 40, further comprising: assembling the plurality of predicted 2-dimensional slices having the first planar orientation to form the first probability cube.

42. The non-transitory computer-readable medium of claim 40, further comprising: inputting a 3-dimensional volume of data, and partitioning the 3-dimensional volume of data into the plurality of unlabeled 2-dimensional slices having the first planar orientation.

43. The non-transitory computer-readable medium of claim 39, further comprising: assembling the plurality of predicted orthogonal 2-dimensional slices having the second planar orientation to form the second probability cube.

44. The non-transitory computer-readable medium of claim 39, wherein the second planar orientation is orthogonal to the first planar orientation.

45. The non-transitory computer-readable medium of claim 39, wherein the deep learning algorithm is an artificial neural network.

46. The non-transitory computer-readable medium of claim 39, wherein the deep learning algorithm is a convolutional neural network.

47. The non-transitory computer-readable medium of claim 39, wherein the deep learning algorithm is a radial basis function network, a recurrent neural network, a long short-term memory network, a self-organizing map, an autoencoder, or a deep belief network.

48. The non-transitory computer-readable medium of claim 39, wherein training the model of the deep learning algorithm includes training with a new model of a deep learning algorithm.

49. The non-transitory computer-readable medium of claim 39, wherein training the model of the deep learning algorithm includes training with a pre-trained model of a deep learning algorithm.

50. The non-transitory computer-readable medium of claim 40, wherein the deep learning algorithm is an artificial neural network.

51. The non-transitory computer-readable medium of claim 40, wherein the deep learning algorithm is a convolutional neural network.

52. The non-transitory computer-readable medium of claim 40, wherein the deep learning algorithm is a radial basis function network, a recurrent neural network, a long short-term memory network, a self-organizing map, an autoencoder, or a deep belief network.

53. The non-transitory computer-readable medium of claim 40, wherein training the model of the deep learning algorithm includes training with a new model of a deep learning algorithm.

54. The non-transitory computer-readable medium of claim 40, wherein training the model of the deep learning algorithm includes training with a pre-trained model of a deep learning algorithm.

55. The non-transitory computer-readable medium of claim 40, wherein the model of the deep learning algorithm used for training as part of generating the first probability cube is not the same as the model of the deep learning algorithm used for training as part of generating the second probability cube.

56. The non-transitory computer-readable medium of claim 40, wherein the deep learning algorithm is used for training as part of generating the first probability cube and then the second probability cube.

57. The non-transitory computer-readable medium of claim 40, wherein the deep learning algorithm used for training as part of generating the first probability cube is not the same as the deep learning algorithm used for training as part of generating the second probability cube.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163291070P | 2021-12-17 | 2021-12-17 | |
| US63/291,070 | 2021-12-17 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023114364A1 true WO2023114364A1 (en) | 2023-06-22 |
Family
ID=86773445
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2022/052955 (WO2023114364A1, Ceased) | Probability cube to probability cube enhancement for deep learning artificial intelligence | 2021-12-17 | 2022-12-15 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023114364A1 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190365341A1 (en) * | 2018-05-31 | 2019-12-05 | Canon Medical Systems Corporation | Apparatus and method for medical image reconstruction using deep learning to improve image quality in positron emission tomography (PET) |
| WO2021126370A1 (en) * | 2019-12-20 | 2021-06-24 | Genentech, Inc. | Automated tumor identification and segmentation with medical images |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22908422; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22908422; Country of ref document: EP; Kind code of ref document: A1 |