WO2025056614A1 - Data processing device, medical observation apparatus and method - Google Patents
Data processing device, medical observation apparatus and method
- Publication number
- WO2025056614A1 (PCT/EP2024/075382)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- stereoscopic
- image
- processing device
- data processing
- stereoscopic images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/128—Adjusting depth or disparity
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00163—Optical arrangements
- A61B1/00193—Optical arrangements adapted for stereoscopic vision
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
Definitions
- the present invention relates to a data processing device for a medical observation apparatus, such as an endoscope, microscope or any other type of medical imaging device.
- the invention also relates to a medical observation apparatus comprising such a data processing device.
- the invention relates to a computer-implemented method as well as a computer readable medium and a computer program product.
- three-dimensional scenes containing various types of objects need to be captured and viewed.
- a multitude of vessels, tissue, tumors, organs, organisms, materials and/or substances may be spread all over a three-dimensional space while partially or completely overlapping with each other.
- Stereoscopic visualization methods present a left-right pair of two-dimensional images with a certain perspective difference to a viewer.
- the left image is presented to the left eye and the right image is presented to the right eye.
- the human brain perceives the images as a single three-dimensional view, giving the viewer the perception of three-dimensional depth. It is the perspective difference between the images seen through the left and right eyes of the viewer, the so-called binocular disparity, and the viewer’s accommodation through focusing that completes the three-dimensional view.
- the separate presentation of the images - one for the left eye and one for the right eye - is generally incorporated through the use of specialized glasses and/or displays.
- There are two categories of three-dimensional viewing technology: active and passive. Active viewing utilizes electronics, e.g. specialized glasses, which interact with specialized displays. Passive viewing filters constant streams of binocular input to the appropriate eye, e.g. with specialized displays that split the images directionally into the viewer's eyes.
- the pair of two-dimensional images - required for stereoscopic visualization - needs to be captured using a specialized apparatus compatible with stereoscopic imaging.
- stereoscopic visualization is not an option, if a stereoscopic imaging apparatus is or was not available when capturing the scene.
- the pair of two-dimensional images takes up twice the amount of bandwidth during transmission and twice the amount of memory capacity for storage, compared to a single image of the same resolution.
- an object of the present invention is to provide means which facilitate the visualization of complex multidimensional data in general, and which make possible stereoscopic visualization of such data in particular.
- a data processing device for a medical observation apparatus such as an endoscope or microscope
- the data processing device being configured to obtain input image data, the input image data representing a scene acquired by the medical observation apparatus, to analyze the input image data in order to determine different categories in the scene, to generate a plurality of stereoscopic images from the input image data, each one of the stereoscopic images representing a different category determined in the scene, to assign, based on the determined categories, a different disparity to each of the plurality of stereoscopic images in order to produce a plurality of processed stereoscopic images, and to combine the plurality of processed stereoscopic images in order to generate a combined stereoscopic image.
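The processing chain described in the preceding bullet can be outlined in code. This is only an illustrative sketch; the helper names (determine_categories, make_stereo_pair_for_category, assign_disparities, apply_disparity_shift, overlay_processed_images) are assumptions, not part of the disclosure, and some of them are sketched further below.

```python
def build_combined_stereoscopic_image(input_image):
    """Outline of the claimed chain: analyze -> one stereoscopic image per
    category -> different disparity per image -> combine by overlaying."""
    category_masks = determine_categories(input_image)            # hypothetical analysis step
    stereo_images = [make_stereo_pair_for_category(input_image, m)
                     for m in category_masks]                      # identical left/right pairs
    disparities = assign_disparities(category_masks)               # a different disparity each
    processed = [apply_disparity_shift(left, right, d)
                 for (left, right), d in zip(stereo_images, disparities)]
    return overlay_processed_images(processed, disparities)        # combined stereoscopic image
```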
- the categories determined by the data processing device serve to group the input image data based on their content and based on commonalities of the content.
- parts of the content that share commonalities with other parts of the content are determined as belonging to the same category.
- for example, all parts of the content that show the same object (e.g. a vessel, tissue, a tumor, an organ, an organism, a material or a substance) are determined as belonging to the same category.
- the categories are mapped to the plurality of stereoscopic images, which then each represent the determined categories.
- the categories are mapped bijectively to the stereoscopic images, i.e. there is a one-to-one assignment between the categories and the stereoscopic images.
- a category is uniquely assigned to one of the stereoscopic images and, likewise, each stereoscopic image uniquely belongs to one of the categories.
- a stereoscopic image is made up of two single images, each one representing a different viewing channel, that is, a different viewing direction.
- each one of the stereoscopic images comprises a pair of digital images, wherein a left digital image of the pair of digital images represents an image to be presented to the left eye of the viewer, and a right digital image of the pair represents an image to be presented to the right eye of the viewer.
- a processed stereoscopic image may denote a stereoscopic image which was processed according to the disparity assigned to it.
- the data processing device is advantageous for the following reasons:
- the data processing device does not necessarily require the input image data to derive from an apparatus compatible with stereoscopic imaging.
- the amount of bandwidth and capacity used up by the input image data is comparatively low.
- the input image data may be an RGB image, a color image with at least two color layers, a monochrome image, a hyperspectral imaging data cube, a multispectral imaging data cube, or the like.
- the input image data may already be a pair of digital images representing a stereoscopic image, i.e. an input stereoscopic image. However, this is no requirement for the data processing device to be capable of generating the combined stereoscopic image.
- the data processing device enables stereoscopic visualization and consequently solves the above object.
- a medical observation apparatus comprising the above data processing device, an optical instrument being configured to capture the input image data and provide the input image data to the data processing device, a user interface being configured to receive a user input and/or a user selection input, and a display device being configured to receive the combined stereoscopic image.
- since the medical observation apparatus contains the data processing device according to the present invention, it benefits from the above-described functions and advantages of the data processing device. Hence, the medical observation apparatus also achieves the object of the present invention. Further, the medical observation apparatus is ready to be used for imaging in the field of medical and biomedical observation.
- a computer-implemented method for processing input image data from a medical observation apparatus comprising the steps of obtaining the input image data representing a scene imaged by the medical observation apparatus, analyzing the input image data to determine different categories in the imaged scene, generating a plurality of stereoscopic images from the input image data, each one of the stereoscopic images representing a different category determined in the imaged scene, assigning, based on the determined categories, a different disparity to each of the plurality of stereoscopic images resulting in a plurality of processed stereoscopic images, and combining the plurality of processed stereoscopic images with one another resulting in a combined stereoscopic image.
- the computer-implemented method achieves the above object, since it yields the stereoscopic image from the input image data, which do not have to be stereoscopic themselves.
- a computer-readable medium as well as a computer program product each comprising instructions, which, when executed by a computer, cause the computer to carry out the method.
- the computer-readable medium and computer program product allow the method of the present invention to be implemented on a general-purpose computer, such as a PC.
- the data processing device, medical observation apparatus and method may be improved further by adding one or more of the features described in the following.
- Each of these features may be added to the method and/or the data processing device independently of the other features.
- a person skilled in the art - with knowledge of the inventive data processing device - is capable of configuring the inventive method such that it is capable of operating the inventive data processing device.
- each feature has its own advantageous technical effect, as will be explained hereinafter.
- generating the plurality of stereoscopic images that represent the different categories may comprise the step of generating pairs of identical digital images, the digital images having non-zero pixel values in all pixels that represent a category determined in the scene. Pixels in the digital image having zero pixel values do not represent one of the categories represented by the non-zero valued pixels in the same digital image.
- the pair of identical digital images can be used as a starting point for the generation of the left and right digital image. This will be described in further detail below.
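A minimal sketch of such a pair generation, assuming a single-channel raster image and a boolean category mask (function and variable names are illustrative assumptions):

```python
import numpy as np

def make_stereo_pair_for_category(image, category_mask):
    """Create a pair of identical digital images in which only pixels belonging
    to one category keep their (non-zero) values; all other pixels are zero and
    are later treated as transparent when the layers are overlaid."""
    layer = np.where(category_mask, image, 0)
    return layer.copy(), layer.copy()          # left digital image, right digital image
```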
- the data processing device is further configured to apply the assigned disparity to the plurality of stereoscopic images in order to arrive at the plurality of processed stereoscopic images.
- the data processing device may be configured to produce the processed stereoscopic images by applying the disparity assigned to a stereoscopic image.
- the computer-implemented method may further comprise the step of producing the plurality of processed stereoscopic images by applying the assigned disparity to the stereoscopic images.
- assigning a disparity may be used as a synonym for applying the disparity to a stereoscopic image.
- a disparity transform i.e. a disparity shift may be applied according to the assigned disparity.
- the resulting processed stereoscopic image differs from the (original) stereoscopic image in that the single digital images (making up each stereoscopic image) now have a disparity according to the assigned disparity.
- the data processing device may be configured to apply the respective disparity assigned to a stereoscopic image of the plurality of stereoscopic images by shifting (i.e. moving) the pixels in the left digital image of the stereoscopic image by an amount representing the assigned disparity, and by shifting (i.e. moving) the pixels in the right digital image of the stereoscopic image in an opposite direction by the amount representing the assigned disparity.
- when shifted, the pixels may be moved along a horizontal axis representing the horizontal distance direction of the viewer’s eyes. Moreover, shifting a pixel denotes the process of moving the position of a pixel within an otherwise unchanged raster image. Generally, the pixels of the left digital image and the right digital image are shifted towards each other, if the stereoscopic image is to appear closer to the viewer. Conversely, shifting the pixels of the left digital image and the right digital image away from each other will make the stereoscopic image appear further away from the viewer.
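A horizontal disparity shift of this kind could be sketched as follows (whole-pixel shifts and zero padding at the image border are assumptions):

```python
import numpy as np

def apply_disparity_shift(left, right, disparity):
    """Shift the left image by +disparity and the right image by -disparity
    along the horizontal axis (axis 1), i.e. towards each other for a positive
    disparity, so that the layer appears closer to the viewer."""
    def shift(img, dx):
        out = np.zeros_like(img)               # pixels shifted in from the border stay zero
        if dx > 0:
            out[:, dx:] = img[:, :-dx]
        elif dx < 0:
            out[:, :dx] = img[:, -dx:]
        else:
            out[:] = img
        return out
    return shift(left, disparity), shift(right, -disparity)
```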
- the data processing device may be configured to assign the disparities to the plurality of stereoscopic images based on a user selection input. That is, a user may choose which disparity is to be assigned to which stereoscopic image via an input device (e.g. the user interface of the medical observation apparatus). Additionally or alternatively, the data processing device may be configured to assign the disparities to the plurality of stereoscopic images automatically.
- the smaller the number of non-zero pixels in a stereoscopic image that represent the category assigned to it, the larger the disparity assigned to that stereoscopic image of the plurality of stereoscopic images.
- a category represented by a stereoscopic image with a large number of pixels will be assigned a smaller disparity
- a category represented by a stereoscopic image with fewer pixels will be assigned a larger disparity.
- Another rule of thumb may be the following: The larger the extent of a segment in the stereoscopic image is, the smaller the disparity assigned to that stereoscopic image.
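The two rules of thumb above could, for example, be turned into a simple rank-based assignment (the concrete mapping and the maximum disparity value are assumptions):

```python
import numpy as np

def assign_disparities(category_masks, max_disparity=12):
    """Heuristic: the layer with the most non-zero pixels (largest extent) gets
    the smallest disparity, the layer with the fewest pixels the largest."""
    counts = [int(np.count_nonzero(mask)) for mask in category_masks]
    order = np.argsort(counts)[::-1]                    # largest pixel count first
    disparities = [0] * len(category_masks)
    for rank, idx in enumerate(order):
        disparities[idx] = round(rank * max_disparity / max(len(order) - 1, 1))
    return disparities
```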
- the segment represents an area/pixels (in particular non-zero pixels) in the stereoscopic image illustrating the assigned category of the stereoscopic image.
- the segment in the stereoscopic image defines an ensemble or group of pixels within the stereoscopic image.
- the segment may provide location information and/or shape information about the position of an object in the stereoscopic image.
- the segment may be a binary mask in which non-zero pixels, that is, pixels with nonzero value, are distributed according to the location and shape of the segment and zero valued pixels are considered as transparent pixels that do not belong to the segment.
- the binary mask produces a separation between pixels in the digital image that fall within a region/area included by the binary mask, and pixels in the digital image, which are excluded by the binary mask.
- the single images of each stereoscopic image may be such binary masks in that nonzero pixels represent content, such as part of the determined category assigned to the stereoscopic image, and zero valued pixels in the single images are considered as transparent pixels that do not add up to a visible image contribution when the single image is overlaid with another single image from another stereoscopic image.
- the plurality of processed stereoscopic images may be overlaid on one another when generating the combined stereoscopic image.
- the plurality of processed stereoscopic images may be overlaid on one another along a central axis standing perpendicular to the image plane of each one of the processed stereoscopic images and passing through the center of each one of the processed stereoscopic images.
- the plurality of processed stereoscopic images may be overlaid in an order determined by the disparity assigned to the processed stereoscopic images. For example, the processed stereoscopic image with the highest assigned disparity may come on top, followed by the remaining processed stereoscopic images in decreasing order of disparity.
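Overlaying in the order of the assigned disparities could be sketched per viewing channel as follows (zero-valued pixels are treated as transparent; all names are assumptions):

```python
import numpy as np

def overlay_processed_images(processed, disparities):
    """Overlay the processed stereoscopic images (pairs of left/right images)
    into one combined stereoscopic image; the layer with the highest assigned
    disparity is drawn last and therefore ends up on top."""
    def overlay(channel_images):
        combined = np.zeros_like(channel_images[0])
        for img, _ in sorted(zip(channel_images, disparities), key=lambda p: p[1]):
            combined = np.where(img != 0, img, combined)   # non-zero pixels overwrite
        return combined
    lefts = [left for left, _ in processed]
    rights = [right for _, right in processed]
    return overlay(lefts), overlay(rights)
```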
- the (binocular) disparity refers to the difference in image location of an object seen by the left and right eyes, resulting from the eyes’ horizontal separation (parallax).
- the brain uses binocular disparity to extract depth information from the two-dimensional retinal images in stereopsis.
- the disparity represents a depth or distance at which the viewer will perceive the content of the stereoscopic image when viewing the combined stereoscopic image. The larger the disparity of an object, the closer the object appears to be located to the viewer, hence the highest assigned disparity has the top location in the overlay.
- the data processing device is configured to output the combined stereoscopic image to the display device of the medical observation apparatus.
- the computer-implemented method may further comprise the step of outputting the combined stereoscopic image to the display device.
- the data processing device may be configured to modify a size scale of a stereoscopic image of the plurality of stereoscopic images based on the disparity assigned to the stereoscopic image.
- the size scale represents the angular extent or the lateral extent of the processed stereoscopic image.
- the size scale represents a maximal angular diameter which the stereoscopic image will cover on the display device after the plurality of processed stereoscopic images are overlaid to generate the combined stereoscopic image.
- the data processing device may be configured to magnify the processed stereoscopic image of the plurality of processed stereoscopic images based on the disparity assigned to the processed stereoscopic image.
- magnifying denotes scaling the stereoscopic images by a scaling factor.
- the scaling factor may be greater than or smaller than 1.
- the data processing device is configured to modify the size scale prior to producing the processed stereoscopic images from the stereoscopic images.
- the data processing device may be configured to apply the disparity transform after the size scale of a stereoscopic image has been modified.
- the modification of the size scale will not create an unwanted additional shift after the disparity transform.
- the smaller the disparity assigned to the processed stereoscopic image, the smaller the size scale/scaling factor.
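Such a disparity-dependent size-scale modification, applied before the disparity transform, might look as follows (the linear disparity-to-scale mapping, nearest-neighbour resampling and centred placement are assumptions):

```python
import numpy as np

def scale_layer(layer, disparity, max_disparity, min_scale=0.85):
    """Scale a layer so that small-disparity (far) layers become smaller.
    The result is pasted centred into a transparent canvas of the original
    size, so that the scaling introduces no additional shift."""
    scale = min_scale + (1.0 - min_scale) * (disparity / max(max_disparity, 1))
    h, w = layer.shape
    new_h, new_w = max(int(round(h * scale)), 1), max(int(round(w * scale)), 1)
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = layer[np.ix_(rows, cols)]                 # nearest-neighbour resample
    out = np.zeros_like(layer)
    r0, c0 = (h - new_h) // 2, (w - new_w) // 2
    out[r0:r0 + new_h, c0:c0 + new_w] = resized
    return out
```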
- the data processing device may be configured to receive a user input that represents a viewing position of a user with respect to a display device. Based on the received user input, the data processing device may be configured to adjust the assigned disparities. The adjustment of the assigned disparities may take place when the received user input indicates a change in the viewing angle and/or the viewing distance of the user with respect to the display device.
- the user input may be face detection data representing the user’s eyes/gaze.
- the data processing device may be configured to derive, from the face detection data, a value representing the viewing angle and/or another value representing the viewing distance of the user with respect to the display device. Further, the data processing device may be configured to determine the change in the viewing angle and/or the viewing distance when the value or the other value exceeds a threshold. Said threshold may be derived from previously received user input or face detection data.
- the data processing device may be configured to determine the change in the viewing angle and/or the viewing distance when the value or the other value lies outside a predetermined range.
- Said predetermined range may be a neighborhood derived from previously received user input, from the value, or from the other value.
- the neighborhood in turn, may be defined based on previously received user input, on previous viewing angles or on previous viewing distances.
- the above-mentioned adjustment of the assigned disparities may comprise determining an updated disparity to be assigned to the plurality of stereoscopic images.
- the updated disparity reflects the changed viewing angle and/or viewing distance and can be used to update the combined stereoscopic image.
- the disparity assigned to the plurality of stereoscopic images is updated for each one of the stereoscopic images to reflect the change in the user’s viewing angle and/or viewing distance. That is, the data processing device is configured to dynamically compensate for the user’s change in their viewing angle and/or viewing distance.
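A very simple version of this dynamic compensation could rescale all assigned disparities when the tracked viewing distance changes noticeably; the threshold and the inverse-proportional rescaling are assumptions:

```python
def adjust_disparities(disparities, viewing_distance, reference_distance, threshold=0.05):
    """Return updated disparities when the viewing distance has changed by more
    than the threshold relative to the reference distance; otherwise keep them."""
    relative_change = abs(viewing_distance - reference_distance) / reference_distance
    if relative_change <= threshold:
        return list(disparities)                       # no significant change
    factor = reference_distance / viewing_distance     # closer viewer -> larger disparities
    return [d * factor for d in disparities]
```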
- the data processing device may be configured to receive a user selection input. As such, the data processing device may be configured to activate or deactivate updating of the combined stereoscopic image based on the received user selection input.
- according to another possible embodiment, the data processing device may be configured to apply, prior to combining the plurality of processed stereoscopic images, a post-processing transform to at least a subset of the plurality of processed stereoscopic images. Such post-processing transform may be at least one of a transparency adjustment, a brightness adjustment, a depth-of-focus adjustment, a perspective scaling, and a thickening surfaces effect.
- the transparency adjustment increases a transparency level of a stereoscopic image when overlaying the processed stereoscopic images.
- the increased transparency allows seeing through a stereoscopic image and observing underlying stereoscopic images.
- the brightness adjustment increases a brightness level of a stereoscopic image when overlaying the processed stereoscopic images.
- the increased brightness allows highlighting certain stereoscopic images, while fading out other stereoscopic images.
- the depth-of-focus adjustment increases a blur level of a stereoscopic image when overlaying the processed stereoscopic images.
- a blur filter, such as a Gaussian filter with a kernel size proportional to the required blur level of the processed stereoscopic image, may be applied to the stereoscopic image.
- the same filter is applied to the left image and the right image of the stereoscopic image.
- the perspective scaling involves shrinking or magnifying the size/dimensions of the stereoscopic image before overlaying the processed stereoscopic images, whilst maintaining the aspect-ratio of the stereoscopic image. Thereby, smaller objects appear further away than larger objects.
- the thickening surfaces effect may comprise cloning pixel values from a midpoint location to an end location of a disparity adjustment.
- the midpoint location resides between a start location and the end location of the disparity adjustment.
- the same post-processing transform is applied to each single digital image of the pair of digital images that make up a stereoscopic image or a processed stereoscopic image.
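As an example of such a post-processing transform, a depth-of-focus adjustment could apply the same Gaussian blur to both single images of a pair, with a strength that grows as the assigned disparity shrinks (the proportionality and the use of SciPy are assumptions):

```python
from scipy.ndimage import gaussian_filter

def depth_of_focus_adjustment(left, right, disparity, max_disparity, max_sigma=3.0):
    """Blur a processed stereoscopic image; layers with a small disparity (far
    away) receive a stronger blur. The identical filter is applied to the left
    and the right image so that the binocular disparity is not disturbed."""
    sigma = max_sigma * (1.0 - disparity / max(max_disparity, 1))
    return gaussian_filter(left, sigma=sigma), gaussian_filter(right, sigma=sigma)
```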
- the above-mentioned user selection input may also be used to activate or deactivate applying the post-processing transform. Further, the user selection input may be used to choose which postprocessing transform is to be applied.
- the post-processing transform may be applied based on the assigned disparity to the processed stereoscopic image.
- the data processing device may be configured to designate a stereoscopic image from the plurality of processed stereoscopic images as a background, wherein the disparity assigned to the background is a global minimum or maximum among the disparities assigned to the plurality of stereoscopic images.
- the background may serve as a reference for the viewer.
- the data processing device may be configured to increase or decrease a weight of the post-processing transform, when applying the post-processing transform to the processed stereoscopic image, depending on how close the stereoscopic image is to the background, i.e. based on the distance between the background and the stereoscopic image.
- a difference between the disparities assigned to two stereoscopic images of the plurality of stereoscopic images may define a distance between the two stereoscopic images.
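Using that disparity difference as a distance, a weight for the post-processing transform could be derived, for instance, as follows (the direction of the weighting and the linear fall-off are assumptions):

```python
def transform_weight(disparity, background_disparity, max_distance):
    """Weight of a post-processing transform as a function of the distance (in
    disparity units) between a layer and the background: layers close to the
    background are affected more strongly, distant layers less."""
    distance = abs(disparity - background_disparity)
    return max(0.0, 1.0 - distance / max(max_distance, 1e-9))
```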
- the disparity applied to each one of the plurality of stereoscopic images may be different from all the other assigned disparities. That is, each stereoscopic image of the plurality of stereoscopic images may have assigned thereto a unique disparity.
- stereoscopic images representing the same ensemble of different categories may have the same disparity assigned thereto.
- two or more stereoscopic images may be designated as the background and, as such, share the same disparity.
- the optical instrument of the medical observation apparatus may be a stereoscopic digital camera, but it may also be a digital camera, a multispectral camera, a hyperspectral camera or a time-of-flight camera. More particularly, the digital camera may be a digital RGB camera and/or a digital reflectance camera.
- the multispectral camera may comprise a digital fluorescence-light camera and optionally further digital fluorescence-light cameras. The responsivity spectra of the digital fluorescence-light cameras are preferably non-overlapping and in particular complementary to each other.
- the input image data provided by the optical instrument may contain raster images compiled of pixels.
- the input image data may contain an RGB image representing the scene and consisting of three layers of equally sized raster images. Each raster image layer represents the color intensity of the scene in one of the respective color channels red, green and blue.
- the input image data may also contain a multispectral or hyperspectral imaging data cube with layers of equal-sized raster images in one of the respective multispectral or hyperspectral channels.
- the input image data may contain a digital fluorescence-light input image representing the fluorescence of one or more fluorophores present in the scene.
- the digital fluorescence-light input image may contain a fluorescence signal representative of the fluorescence of a fluorophore artificially added to the scene (e.g. the fluorophore may be PpIX obtained by administration of 5-ALA for revealing tumors).
- the digital fluorescence-light input image may also contain an auto-fluorescence signal representative of the fluorescence of a fluorophore naturally occurring in the scene (e.g. human tissue) and a reflectance signal representative of light reflected off the objects in the scene (e.g. fluorophore excitation light).
- the input image data may contain a three-dimensional image acquired by the time-of-flight camera.
- the three-dimensional image differs from the RGB images in that the three-dimensional image further includes a range image.
- the range image provides depth information to each pixel.
- the three-dimensional image includes one layer containing the depth information and at least another layer containing the pixel intensities that represent the optical signal received from the scene.
- the data processing device may be configured to decompose, by spectral unmixing, a mixed pixel in the input image data into a set of endmembers and fractions, wherein each one of the plurality of stereoscopic images represents a different endmember in the set of endmembers obtained from the spectral unmixing.
- the above-mentioned categories may be one of spectral bands, spectral narrow bands, and spectral broad bands from the multispectral or hyperspectral data.
- Spectral unmixing is an analysis method known from satellite imaging.
- the result of spectral unmixing is a data set containing a collection of endmembers, or constituent spectra, and a set of corresponding fractions, or abundances, that indicate the proportion of each endmember present in the analyzed pixel. In other words, one can tell from the data set what substances are present in the location represented by the analyzed pixel and in what quantity relative to the other present substances.
- each mixed pixel in the input image data may be a mixture of the spectra of several materials in the scene.
- the purpose of spectral unmixing is to identify the constituent spectra in the mixed pixels and to calculate the proportion of each constituent spectrum in the mixed pixels in order to quantitatively decompose or “unmix” them.
- the above-mentioned mixed pixels are a mixture of more than one distinct substance appearing in a single pixel, and they exist for one of two reasons. First, if the spatial resolution of a sensor is low enough that disparate materials can jointly occupy a single pixel, the resulting spectral measurement will be some composite of the individual spectra. Second, mixed pixels can result when distinct materials are combined into a homogeneous mixture. This circumstance can occur independently of the spatial resolution of the sensor.
- the endmembers normally correspond to familiar macroscopic or microscopic objects, such as blood vessels, skin tissue, cancerous tissue etc.
- known reference spectra may be used for the spectral unmixing. Consequently, as a result of the spectral unmixing, constituent spectra corresponding to macroscopic or microscopic objects in the scene are obtained. Further, fractions assigned to the constituent spectra, and which are associated with pixels, are obtained. Each of the fractions of a pixel indicates the proportion of a constituent spectrum contributing to the data value of the pixel.
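With known reference spectra, linear spectral unmixing of a single mixed pixel can be sketched as a non-negative least-squares problem (the use of SciPy's nnls and the normalisation are assumptions):

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(pixel_spectrum, endmember_spectra):
    """Decompose one mixed pixel into fractions (abundances) of the given
    endmembers. endmember_spectra has shape (bands, endmembers); the returned
    fractions indicate the proportion of each endmember in the pixel."""
    fractions, _residual = nnls(endmember_spectra, pixel_spectrum)
    total = fractions.sum()
    return fractions / total if total > 0 else fractions

# One stereoscopic image per endmember can then be built from the per-pixel
# fraction of that endmember (e.g. by thresholding the fractions into a mask).
```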
- the data processing device may be configured to decompose the digital fluorescence-light input image into the fluorescence signal, the auto-fluorescence signal and the reflectance signal.
- the data processing device may be configured to identify each of the fluorescence signal, the auto-fluorescence signal and the reflectance signal as a separate category.
- the data processing device may be configured to assign to each of the fluorescence signal, the auto-fluorescence signal and the reflectance signal a different disparity.
- the data processing device may be configured to decompose the input image data into objects by image segmentation, and to assign an object to a stereoscopic image of the plurality of the stereoscopic images.
- each object has a category associated with it.
- an object class may be the associated category.
- the categories may be semantic labels of objects identified and/or located in the scene.
- each object also has a location in the input image data associated with it.
- pixels in the input image data may be associated to the objects.
- the image segmentation may be a semantic image segmentation.
- the data processing device may be configured to generate the plurality of stereoscopic images from a result of the semantic image segmentation of the input image data. That is, the data processing device may be configured to generate a new stereoscopic image for each object instance determined in the scene. Alternatively, the data processing device may be configured to join each object instance of the same category in one stereoscopic image.
- the step of analyzing the input image data may further comprise the steps of performing spectral unmixing of the input image data and/or semantic image segmentation on the input image data.
- the computer-implemented method may comprise the steps of obtaining, as a result of the semantic image segmentation, labels defining object categories identified in the scene, and further obtaining segments (including pixel positions) associated with the labels that identify a location of the object in the scene.
- the data processing device may be configured to generate image layers, e.g., by means of spectral unmixing and/or semantic image segmentation. All image layers have the same size/dimension and each image layer represents a different category/object determined in the scene. Further, each image layer may represent a binary mask with pixels having zero value that do not represent content in the scene associated with the category/object of the layer.
- the data processing device may be configured to generate the plurality of stereoscopic images from the image layers, preferably by duplicating each one of the image layers producing pairs of identical image layers, wherein one image layer from the pair of image layers represents the left image and the other image layer from the pair of identical image layers represents the right image.
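Starting from a semantic segmentation label map, the image layers and their identical left/right pairs could be produced as follows (label value 0 marking unlabelled content is an assumption):

```python
import numpy as np

def layers_from_label_map(image, label_map):
    """Build one image layer per category from a segmentation result and
    duplicate each layer into an identical left/right pair. Pixels outside the
    category are zero-valued and therefore transparent when overlaid."""
    stereo_layers = {}
    for label in np.unique(label_map):
        if label == 0:                                  # assumed: 0 = no category
            continue
        layer = np.where(label_map == label, image, 0)
        stereo_layers[int(label)] = (layer.copy(), layer.copy())
    return stereo_layers
```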
- the pair of digital input images may be used as the left image and the right image.
- the data processing device may be configured to generate from the left image a plurality of left-channel image layers, and to generate from the right image a plurality of right-channel image layers, each image layer representing a different category/object determined in the scene.
- the spectral unmixing and/or semantic image segmentation may be applied to the left image and the right image individually, if a stereoscopic image already exists in the input image data.
- the data processing device may be configured to compute a projection of the three-dimensional image to a plane.
- a two-dimensional input image is obtained and handled by the data processing device in an analogous manner as for the above-described single input image.
- the data processing device may be configured to compute two projections to a plane from which a stereoscopic image may be extracted.
- the stereoscopic image in turn comprises a pair of digital input images.
- the data processing device is configured to proceed with the extracted stereoscopic image in an analogous manner as in the above-described case when the input image data is a stereoscopic image.
- the data processing device may be configured to receive another user input indicating a disparity value. Further, the data processing device may be configured to update the combined stereoscopic image based on the disparity value, wherein processed stereoscopic images having assigned a disparity exceeding the disparity value are omitted from being overlaid when generating the combined stereoscopic image. This way, the user can exclude certain stereoscopic images from the top, in order to view stereoscopic images lying underneath.
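Omitting layers above a user-selected disparity value could be as simple as filtering before the overlay (function and variable names are assumptions):

```python
def select_layers_below(processed, disparities, disparity_value):
    """Keep only the processed stereoscopic images whose assigned disparity does
    not exceed the user-selected value, so that the top-most layers can be
    excluded and the layers underneath become visible."""
    kept = [(img, d) for img, d in zip(processed, disparities) if d <= disparity_value]
    return [img for img, _ in kept], [d for _, d in kept]
```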
- the input image data may also contain multiple input images that jointly represent a series of images.
- the data processing device may be configured to obtain and analyze these multiple input images individually in a sequential manner or combinedly in parallel.
- Fig. 1 shows a schematic representation of a flowchart for a computer-implemented method according to an exemplary embodiment of the present invention
- Fig. 2 shows a schematic representation of a detail of Fig. 1;
- Fig. 3 shows a schematic representation of a medical observation apparatus according to an exemplary embodiment of the present invention
- Fig. 4 shows a schematic representation of a data processing device according to an exemplary embodiment of the present invention
- Fig. 5 shows a schematic representation of the data processing device according to another exemplary embodiment of the present invention.
- Fig. 6 shows a schematic representation of the data processing device according to another exemplary embodiment of the present invention.
- Fig. 7 shows a schematic representation of enhancement of depth perception by applying disparity
- Fig. 8 shows a schematic representation of enhancement of depth perception by perspective scaling
- Fig. 9 shows a schematic representation of enhancement of depth perception by thickening surfaces
- Fig. 10 shows a schematic representation of enhancement of depth perception by depth-of-focus adjustment
- Fig. 11 shows a schematic representation of enhancement of depth perception by brightness adjustment
- Fig. 12 shows a schematic representation of enhancement of depth perception by transparency adjustment
- Fig. 13 shows a schematic representation of a system comprising a microscope.
- a computer-implemented method 100 is explained with reference to Figs. 1 and 2. Subsequently, the structure and functionality of a data processing device 300 and a medical observation apparatus 304, such as a microscope 310 or an endoscope, is explained with reference to Figs. 3 to 13.
- the computer-implemented method 100 is used for processing input image data 115 from the medical observation apparatus 304 in order to enable stereoscopic visualization of the input image data 115.
- the method 100 comprises the step 110 of obtaining the input image data 115 representing a scene 200 imaged by the medical observation apparatus 304.
- the input image data 115 may contain raster images 116 compiled of pixels.
- the input image data 115 may contain a multispectral or hyperspectral imaging data cube 117 with layers of equal-sized raster images in one of the respective multispectral or hyperspectral channels.
- the input image data 115 may contain a digital fluorescence-light input image 118 representing the fluorescence of one or more fluorophores 306 present in the scene 200.
- the input image data 115 may contain an RGB image 119 representing the scene 200 and consisting of three layers of equally sized raster images. Each raster image layer represents the color intensity of the scene in one of the respective color channels red, green and blue.
- the method 100 comprises the step 120 of analyzing the input image data 115 to determine different categories 125 in the imaged scene 200.
- the input image data 115 may be decomposed into objects 225 by image segmentation 126.
- each object 225 has a category 125 associated with it.
- an object class 127 may be the associated category 125.
- the image segmentation 126 may be a semantic image segmentation 126.
- the categories 125 may be semantic labels 129 of objects 225 identified and/or located in the scene 200.
- each object 225 also has a location in the input image data 115 associated with it.
- pixels in the input image data 115 may be associated to the objects 225.
- the objects 225 may be macroscopic or microscopic.
- the objects 225 may be blood vessels, skin tissue, cancerous tissue etc. found in the scene 200.
- the objects 225 are symbolized by a cross symbol 202, a smiley symbol 204 and a flash symbol 206 in Fig. 2.
- the objects 225 may also mutually overlap.
- the objects 225 are a circle 332, a triangle 334 and a striped pattern 336, each with a different color.
- the above-mentioned analysis of the input image data 115 may be done by spectral unmixing 128. This is especially useful, if the objects 225 overlap as is the case in Fig. 3. Consequently, the categories 125 may be spectral bands from the multispectral or hyperspectral data. In the example of Fig. 3, each spectral band belongs to one of the circle 332, triangle 334 or striped pattern 336.
- the image segmentation 126 in particular the semantic image segmentation 126 may also be utilized.
- the spectral layers/channels of the multispectral or hyperspectral imaging data cube 117 may be segmented individually or in combination with each other.
- the method 100 also comprises the step 130 of generating a plurality of stereoscopic images 135 from the input image data 115.
- Each one of the stereoscopic images 135 represents a different category 125 determined in the imaged scene 200. If the image segmentation 126 was applied, then an object 225 may be assigned to each stereoscopic image 135. If spectral unmixing 128 was used, a spectral band may be assigned to each stereoscopic image 135. That is, a new stereoscopic image 135 may be generated for each object 225/spectral band determined in the scene 200.
- image layers 326 may be generated, e.g., by means of spectral unmixing 128 and/or semantic image segmentation 126. All image layers 326 have the same size/dimension and each image layer 326 represents a different category 125/object 225 determined in the scene 200.
- the step 130 of generating the plurality of stereoscopic images 135 that represent the different categories 125 may comprise the step 136 of generating pairs of identical digital images (left digital image L and right digital image R), the digital images L, R having non-zero pixel values in all pixels 312 that represent a category 125 determined in the scene 200. Pixels 314 in the digital image L, R having zero pixel values do not represent one of the categories 125 represented by the non-zero valued pixels 312 in the same digital image L, R (see Fig. 4).
- the digital images L, R may be binary masks 316 in that non-zero pixels 312 represent content, such as part of the determined category 125 assigned to the stereoscopic image 135, and zero valued pixels 314 in the digital images L, R are considered as transparent pixels 318 that do not add up to a visible image contribution when the digital images L, R are overlaid with other images from other stereoscopic images.
- Generating the pairs of identical digital images L, R may be done by duplicating each one of the image layers 326.
- the left digital image L is presented to the left eye and the right digital image R is presented to the right eye of a viewer 400.
- the human brain is supposed to perceive the images L, R as a single three-dimensional view 404, giving the viewer 400 the perception of three-dimensional depth.
- the separate presentation of the images L, R - one for the left eye and one for the right eye - may be incorporated through the use of specialized glasses 406 and/or a specialized display 408.
- the images L, R need to have a certain perspective difference. After all, it is this perspective difference between the images L, R seen through the left and right eyes of the viewer 400, the so-called binocular disparity, and the viewer’s accommodation through focusing that creates the three-dimensional view 404.
- the method 100 comprises the step 140 of assigning, based on the determined categories 125, a different disparity A, B, C to each of the plurality of stereoscopic images 135 resulting in a plurality of processed stereoscopic images 145.
- to determine the disparity A, B, C which is to be assigned to a stereoscopic image 135, the following heuristic rule of thumb may be used: the smaller the number of non-zero pixels 312 representing the category 125 assigned to a stereoscopic image 135, the larger the disparity A, B, C assigned to that stereoscopic image 135.
- a category 125 represented by a stereoscopic image 135 with a large number of non-zero pixels 312 will be assigned a smaller disparity A, B, C
- a category 125 represented by a stereoscopic image 135 with fewer non-zero pixels 312 will be assigned a larger disparity A, B, C.
- the circle 332 comprises the least non-zero pixels 312, while the striped pattern 336 has the most non-zero pixels 312. Therefore, the circle 332 is assigned a larger disparity A and the striped pattern 336 is assigned a smaller disparity C.
- the triangle 334 of Fig. 4 is assigned a disparity B between A and C due to its median number of non-zero pixels 312.
- the disparities A, B, C may be assigned to the plurality of stereoscopic images 135 based on a user selection input 195. That is, a user 402, in particular the viewer 400, may choose via an input device 328, which disparity A, B, C is to be assigned to which stereoscopic image 135.
- the input device 328 may be connected to a user interface 330 of the medical observation apparatus 304 being configured to receive the user selection input 195.
- the step 140 of assigning the disparities A, B, C preferably comprises the step 190 of applying the assigned disparities A, B, C to the stereoscopic images 135.
- the processed stereoscopic images 145 may be obtained by applying the disparities A, B, C to the stereoscopic images 135.
- a disparity transform 410 i.e. a disparity shift 412 may be applied according to the assigned disparity A, B, C.
- the resulting processed stereoscopic image 145 differs from the original stereoscopic image 135 in that the single digital images L, R (making up each stereoscopic image 135, 145) now have a disparity according to the assigned disparity A, B, C.
- the disparity shift 412 may be applied by shifting the pixels in the left digital image L of the stereoscopic image 135 by an amount representing the assigned disparity A, B, C and by shifting the pixels in the right digital image R of the stereoscopic image 135 in an opposite direction by the amount representing the assigned disparity A, B, C (see Fig. 4). Shifting a pixel denotes the process of moving the position of a pixel within the otherwise unchanged raster image 116.
- when shifted, the pixels may be moved along a horizontal axis 414 representing the horizontal distance direction of the viewer’s eyes.
- the pixels of the left digital image L and the right digital image R are shifted towards each other, if the stereoscopic image 135 is to appear closer to the viewer 400.
- shifting the pixels of the left digital image L and the right digital image R away from each other will make the processed stereoscopic image 145 appear further away from the viewer 400. This is depicted in Fig. 4 on the right side, where the three-dimensional view 404 perceived by the viewer 400 is shown.
- the method 100 may comprise the step 150 of combining the plurality of processed stereoscopic images 145 with one another resulting in a combined stereoscopic image 155.
- the plurality of processed stereoscopic images 145 may be overlaid on one another when generating the combined stereoscopic image 155.
- the plurality of processed stereoscopic images 145 are overlaid on one another along a central axis standing perpendicular to the image plane of each one of the processed stereoscopic images 145.
- the central axis passes through the center of each one of the processed stereoscopic images 145.
- the plurality of processed stereoscopic images 145 may be overlaid in an order determined by the disparity A, B, C assigned to them.
- the processed stereoscopic image 145 with the highest assigned disparity A may come on top, followed by the remaining processed stereoscopic images 145 in decreasing order of disparity B, C.
- the circle 332 with the highest disparity A is overlaid over the triangle 334 and the striped pattern 336.
- the triangle 334 is overlaid over the striped pattern 336, given that its disparity B is higher than the disparity C of the striped pattern 336.
- the combined stereoscopic image 155 may be used as output on the above-mentioned specialized display 408.
- the viewer 400 may have to wear the above- mentioned specialized glasses 406 in order to properly view the combined stereoscopic image 155.
- the specialized glasses 406 and display 408 may be part of a display device 320 of the medical observation apparatus 304.
- the medical observation apparatus 304 may further comprise an optical instrument 322 configured to capture the input image data 115.
- the optical instrument 322 may be a stereoscopic digital camera, but it may also be a digital camera, a multispectral camera, a hyperspectral camera or a time-of-flight camera. More particularly, the digital camera may be a digital RGB camera and/or a digital reflectance camera.
- the multispectral camera may comprise a digital fluorescence-light camera and optionally further digital fluorescence-light cameras.
- the data processing device 300 may also be part of the medical observation apparatus 304.
- the data processing device 300 may be integrated in the microscope 310 as an embedded processor 302 or as part of such embedded processor 302.
- the viewer 400 may be given an immersive experience, when the display device 320 utilizes a so-called active viewing system 500.
- the data processing device 300 may be configured to receive a user input 175 via the input device 328 and the user interface 330.
- the user input 175 may represent a viewing position of the viewer with respect to the display device 320.
- the data processing device 300 may be configured to adjust the assigned disparities A, B, C.
- the adjustment 180 of the assigned disparities A, B, C may take place when the received user input 175 indicates a change in a viewing angle and/or a viewing distance of the viewer 400 with respect to the display device 320. This change is indicated with the arrows 502 in Figs. 5 and 6.
- the user input 175 may be face detection data 504 received from a face-tracking camera 506 that analyses the viewer’s eyes/gaze.
- the data processing device 300 and/or the face-tracking camera 506 may be configured to derive, from the face detection data 504, a value 508 representing the viewing angle and/or another value 510 representing the viewing distance of the viewer 400 with respect to the display device 320.
- the data processing device 300 and/or the face-tracking camera 506 may be configured to determine the change in the viewing angle and/or the viewing distance when the value 508 or the other value 510 exceeds a threshold. Said threshold may be derived from previously received user input 175 or face detection data 504.
- the adjustment 180 of the assigned disparities A, B, C may comprise determining an updated disparity A’, B’, C’ to be assigned to the plurality of stereoscopic images 135.
- the updated disparity A’, B’, C’ reflects the changed viewing angle and/or viewing distance and can be used to update the combined stereoscopic image 155.
- the disparity A, B, C assigned to the plurality of stereoscopic images 135 is updated for each one of the stereoscopic images 135 to reflect the change in the viewing angle and/or viewing distance. That is, the data processing device 300 is configured to dynamically compensate for the change in the viewing angle and/or viewing distance.
- the assigned disparities A, A’, B, B’, C, C’ may differ between the left digital image L and the right digital image R.
- the disparity A, A’, B, B’, C, C’ assigned to the right digital image R may be larger than the disparity A, A’, B, B’, C, C’ assigned to the left digital image L.
- the updated disparity A’, B’, C’ may be larger, the larger the original disparity A, B, C was. This is shown in Fig. 5 and allows the viewer to “peek behind” the top layer by moving the head sideways.
- the method 100 may comprise the step 170 of receiving another user input 196 indicating a disparity value.
- the combined stereoscopic image 155 may then be updated based on the disparity value, wherein processed stereoscopic images 145 having assigned a disparity 230 exceeding the disparity value are omitted from being overlaid when generating the combined stereoscopic image 155 (see Fig. 6).
- data processing device 300 may be configured to modify a size scale of the processed stereoscopic images 145 based on the respective assigned disparities A, B, C.
- the size scale represents the angular extent or the lateral extent of the processed stereoscopic images 145.
- the size scale represents a maximal angular diameter which the stereoscopic image will cover on the display device 320 after the plurality of processed stereoscopic images 145 are overlaid to generate the combined stereoscopic image 155.
- the data processing device 300 may be configured to magnify the processed stereoscopic images 145 based on their assigned disparity A, B, C.
- magnifying denotes scaling the processed stereoscopic images 145 by a scaling factor.
- the scaling factor may be greater or smaller than 1.
- the smaller the disparity A, B, C assigned to the processed stereoscopic image 145, the smaller the size scale/scaling factor.
- this amplifies the impression that an object 225 shown in a processed stereoscopic image 145 with a small disparity is further away than an object 225 shown in a processed stereoscopic image with a large disparity.
- the striped pattern 336 has the smallest scaling factor applied to it, since its disparity C is also the smallest.
- the triangle 334 with the second smallest disparity B receives the second smallest scaling factor.
- the circle 332 with the largest disparity A may remain unchanged or may even be magnified by a scaling factor greater than 1.
- the size scale/scaling factor may also be updated based on the face detection data 504, in particular the value 510 representing the viewing distance of the viewer 400 with respect to the display device 320. This is shown in Fig. 6.
- Figs. 8 to 12 show embodiments where the data processing device is configured to apply, prior to combining the plurality of processed stereoscopic images 145, a post-processing transform 160 to at least a subset of the plurality of processed stereoscopic images 145.
- Such post-processing transform 160 may be a perspective scaling 800 (see Fig. 8), a thickening surfaces effect 900 (see Fig. 9), a depth-of-focus adjustment 1000 (see Fig. 10), a brightness adjustment 1100 (see Fig. 11) and a transparency adjustment 1200 (see Fig. 12).
- the perspective scaling 800 involves shrinking or magnifying the size/dimensions of the stereoscopic image 135 before overlaying the processed stereoscopic images 145, whilst maintaining the aspect-ratio of the stereoscopic image 135. Thereby, smaller objects appear further away than larger objects.
- the thickening surfaces effect 900 may comprise cloning pixel values from a midpoint location 902 to an end location 904 of a disparity adjustment 700.
- the midpoint location 902 resides between a start location 906 and the end location 904 of the disparity adjustment 700.
- the depth-of-focus adjustment 1000 increases a blur level of a stereoscopic image 135 when overlaying the processed stereoscopic images 145.
- a blur filter, such as a Gaussian filter with a kernel size proportional to the required blur level of the processed stereoscopic image 145, may be applied to the stereoscopic image 135. The stronger the applied blur, the further away an object appears to be.
- the brightness adjustment 1100 increases a brightness level of a stereoscopic image 135 when overlaying the processed stereoscopic images 145.
- the increased brightness allows highlighting certain stereoscopic images in the front, while fading out other stereoscopic images in the back.
- the transparency adjustment 1200 increases a transparency level of a stereoscopic image 135 when overlaying the processed stereoscopic images 145.
- the increased transparency allows seeing through a stereoscopic image and observing underlying stereoscopic images.
- the same post-processing transform 160 is applied to each single digital image 324 of the pair of digital images L, R that make up a stereoscopic image 135 or a processed stereoscopic image 145.
- the same blur filter is applied to the left digital image L and the right digital image R.
- the post-processing transform 160 may be applied based on the assigned disparity to the processed stereoscopic image 145.
- the data processing device 300 may be configured to designate a stereoscopic image 135 from the plurality of processed stereoscopic images 135 as a background 908, wherein a disparity D assigned to the background 908 is a global minimum or maximum among the disparities A, B, C, D assigned to the plurality of stereoscopic images 135.
- the background 908 may serve as a reference for the viewer 400.
- the data processing device 300 may be configured to increase or decrease a weight of the post-processing transform 160, when applying the post-processing transform 160 to the stereoscopic images 135, depending on how close the stereoscopic image 135 is to the background 908, i.e. based on the distance between the background 908 and the stereoscopic image 135.
- a difference between the disparities A, B, C, D assigned to two stereoscopic images 135 of the plurality of stereoscopic images 135 may define a distance between these two stereoscopic images 135.
- the disparity applied to each one of the plurality of stereoscopic images 135 may be different from all the other assigned disparities. That is, each stereoscopic image 135 may have assigned thereto a unique disparity. Therefore, the circle 332, the triangle 334, the striped pattern 336 and the background 908 all appear in different depths of the three-dimensional view 404.
- stereoscopic images representing the same ensemble of different categories may have the same disparity assigned thereto.
- two or more stereoscopic images may be designated as the background 908 and, as such, share the same disparity.
- the background 908 and the striped pattern 336 have the same disparity and thus appear in the same depth of the three-dimensional view 404.
- a microscope comprising a system as described in connection with one or more of the Figs. 1 to 12.
- a microscope may be part of or connected to a system as described in connection with one or more of the Figs. 1 to 12.
- Fig. 13 shows a schematic illustration of a system 1300 configured to perform a method described herein.
- the system 1300 comprises a microscope 310 and a computer system 1320.
- the microscope 310 is configured to take images and is connected to the computer system 1320.
- the computer system 1320 is configured to execute at least a part of a method described herein.
- the computer system 1320 may be configured to execute a machine learning algorithm.
- the computer system 1320 and microscope 310 may be separate entities but can also be integrated together in one common housing.
- the computer system 1320 may be part of a central processing system of the microscope 310 and/or the computer system 1320 may be part of a subcomponent of the microscope 310, such as a sensor, an actuator, a camera or an illumination unit, etc. of the microscope 310.
- the computer system 1320 may be a local computer device (e.g. personal computer, laptop, tablet computer or mobile phone) with one or more processors and one or more storage devices or may be a distributed computer system (e.g. a cloud computing system with one or more processors and one or more storage devices distributed at various locations, for example, at a local client and/or one or more remote server farms and/or data centers).
- the computer system 1320 may comprise any circuit or combination of circuits.
- the computer system 1320 may include one or more processors which can be of any type.
- the term “processor” may mean any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a graphics processor, a digital signal processor (DSP), a multi-core processor, a field programmable gate array (FPGA), for example, of a microscope or a microscope component (e.g. camera), or any other type of processor or processing circuit.
- other circuits that may be included in the computer system 1320 are a custom circuit, an application-specific integrated circuit (ASIC), or the like, such as, for example, one or more circuits (such as a communication circuit) for use in wireless devices like mobile telephones, tablet computers, laptop computers, two-way radios, and similar electronic systems.
- the computer system 1320 may include one or more storage devices, which may include one or more memory elements suitable to the particular application, such as a main memory in the form of random access memory (RAM), one or more hard drives, and/or one or more drives that handle removable media such as compact disks (CD), flash memory cards, digital video disk (DVD), and the like.
- the computer system 1320 may also include a display device, one or more speakers, and a keyboard and/or controller, which can include a mouse, trackball, touch screen, voice-recognition device, or any other device that permits a system user to input information into and receive information from the computer system 1320.
- Some or all of the method steps may be executed by (or using) a hardware apparatus, such as a processor, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
- embodiments of the invention can be implemented in hardware or in software.
- the implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
- Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
- embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
- the program code may, for example, be stored on a machine readable carrier.
- other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
- an embodiment of the present invention is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
- a further embodiment of the present invention is, therefore, a storage medium (or a data carrier, or a computer-readable medium) comprising, stored thereon, the computer program for performing one of the methods described herein when it is performed by a processor.
- the data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
- a further embodiment of the present invention is an apparatus as described herein comprising a processor and the storage medium.
- a further embodiment of the invention is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
- the data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
- a further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
- a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
- the methods are preferably performed by any hardware apparatus.
Abstract
The present invention relates to a data processing device (300) for a medical observation apparatus, such as an endoscope, microscope or any other type of medical imaging device. The data processing device is configured to obtain (110) input image data (115), the input image data representing a scene acquired by the medical observation apparatus, to analyze (120) the input image data to determine different categories (125) in the scene, to generate (130) a plurality of stereoscopic images (135) from the input image data (115), each one of the stereoscopic images representing a different category (125) determined in the scene, to assign (140), based on the determined categories, a different disparity to each of the plurality of stereoscopic images to produce a plurality of processed stereoscopic images (145) and to combine (150) the plurality of processed stereoscopic images (145) to generate a combined stereoscopic image (155). Advantageously, the data processing device (300) enables stereoscopic visualization, even though it does not necessarily require the input image data (115) to derive from an apparatus compatible with stereoscopic imaging. The invention also relates to a medical observation apparatus (304) with such a data processing device (300). Further, the invention relates to a computer-implemented method (100) as well as a computer-readable medium and a computer program product.
Description
Data Processing Device, Medical Observation Apparatus and Method
The present invention relates to a data processing device for a medical observation apparatus, such as an endoscope, microscope or any other type of medical imaging device. The invention also relates to a medical observation apparatus comprising such a data processing device. Further, the invention relates to a computer-implemented method as well as a computer readable medium and a computer program product.
In the field of medical and biomedical observation, three-dimensional scenes containing various types of objects need to be captured and viewed. For example, in such a scene, a multitude of vessels, tissue, tumors, organs, organisms, materials and/or substances may be spread all over a three-dimensional space while partially or completely overlapping with each other.
Given this, the visualization of such a scene can be a demanding process, since an accurate representation of complex multidimensional data, which reflect the manifold compositions and locations of the objects, is required. For human vision to accurately perceive and comprehend the visualization of such a scene, the compositions can be conveyed through colors and the locations are often conveyed by stereoscopic visualization.
Stereoscopic visualization methods present a left-right pair of two-dimensional images with a certain perspective difference to a viewer. The left image is presented to the left eye and the right image is presented to the right eye. When viewed, the human brain perceives the images as a single three-dimensional view, giving the viewer the perception of three-dimensional depth. It is the perspective difference between the images seen through the left and right eyes of the viewer, the so-called binocular disparity, and the viewer’s accommodation through focusing that completes the three-dimensional view.
The separate presentation of the images - one for the left eye and one for the right eye - is generally incorporated through the use of specialized glasses and/or displays. There are two categories of three-dimensional viewing technology: active and passive. Active viewing utilizes electronics, e.g., specialized glasses, which interact with specialized displays. Passive viewing filters constant streams of binocular input to the appropriate eye e.g. with specialized displays that split the images directionally into the viewer's eyes.
Conventionally, the pair of two-dimensional images - required for stereoscopic visualization - needs to be captured using a specialized apparatus compatible with stereoscopic imaging. Thus,
so far, stereoscopic visualization is not an option, if a stereoscopic imaging apparatus is or was not available when capturing the scene.
In addition, limited bandwidth and/or memory capacity might also prohibit the use of stereoscopic visualization for certain applications. After all, the pair of two-dimensional images takes up twice the amount of bandwidth during transmission and twice the amount of memory capacity for storage, compared to a single image of the same resolution.
Consequently, there is a need for improving the applicability of stereoscopic visualization in the field of medicine and other scientific disciplines.
Thus, an object of the present invention is to provide means which facilitate the visualization of complex multidimensional data in general, and which make possible stereoscopic visualization of such data in particular.
This object is achieved by a data processing device for a medical observation apparatus, such as an endoscope or microscope, the data processing device being configured to obtain input image data, the input image data representing a scene acquired by the medical observation apparatus, to analyze the input image data in order to determine different categories in the scene, to generate a plurality of stereoscopic images from the input image data, each one of the stereoscopic images representing a different category determined in the scene, to assign, based on the determined categories, a different disparity to each of the plurality of stereoscopic images in order to produce a plurality of processed stereoscopic images, and to combine the plurality of processed stereoscopic images in order to generate a combined stereoscopic image.
As will be described in detail further below, the categories determined by the data processing device serve to group the input image data based on their content and based on commonalities of the content. In other words, parts of the content that share commonalities with other parts of the content are determined as belonging to the same category. For example, all parts of the content that show the same object (e.g. vessel, tissue, tumor, organ, organism, material, substance) may belong to the same category.
Once determined, the categories are mapped to the plurality of stereoscopic images, which then each represent the determined categories. The categories are mapped bijectively to the stereoscopic images, i.e. there is a one-to-one assignment between the categories and the stereoscopic images. A category is uniquely assigned to one of the stereoscopic images and, likewise, each stereoscopic image uniquely belongs to one of the categories.
Herein, a stereoscopic image is made up of two single images, each one representing a different viewing channel, that is, a different viewing direction. For example, each one of the stereoscopic images comprises a pair of digital images, wherein a left digital image of the pair of digital images represents an image to be presented to the left eye of the viewer, and a right digital image of the pair represents an image to be presented to the right eye of the viewer. A processed stereoscopic image may denote a stereoscopic image which was processed according to the disparity assigned to it.
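For illustration only, such a pair of digital images could be represented as sketched below. Python/NumPy is used throughout these sketches; the class name StereoImage is a hypothetical choice made for the example and is not part of the disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StereoImage:
    """A stereoscopic image: one digital image per viewing channel."""
    left: np.ndarray   # image to be presented to the viewer's left eye
    right: np.ndarray  # image to be presented to the viewer's right eye

    @classmethod
    def from_single_layer(cls, layer: np.ndarray) -> "StereoImage":
        # A stereoscopic image may start out as a pair of identical digital
        # images; a disparity can then be applied to the pair later on.
        return cls(left=layer.copy(), right=layer.copy())
```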
In achieving the above object of the present invention, the data processing device according to the present invention is advantageous for the following reasons:
Mainly, the data processing device does not necessarily require the input image data to derive from an apparatus compatible with stereoscopic imaging. As such, the amount of bandwidth and capacity used up by the input image data is comparatively low. For example, the input image data may be an RGB image, a color image with at least two color layers, a monochrome image, a hyperspectral imaging data cube, a multispectral imaging data cube, or the like. Of course, the input image data may already be a pair of digital images representing a stereoscopic image, i.e. an input stereoscopic image. However, this is no requirement for the data processing device to be capable of generating the combined stereoscopic image.
Thus, the data processing device according to the present invention enables stereoscopic visualization and consequently solves the above object.
The above object is further achieved by a medical observation apparatus comprising the above data processing device, an optical instrument being configured to capture the input image data and provide the input image data to the data processing device, a user interface being configured to receive a user input and/or a user selection input, and a display device being configured to receive the combined stereoscopic image.
Since the medical observation apparatus contains the data processing device according to the present invention, it benefits from the above-described functions and advantages of the data processing device. Hence, the medical observation apparatus also achieves the object of the present invention. Further, the medical observation apparatus is ready to be used for imaging in the field of medical and biomedical observation.
Moreover, the above object is solved by a computer-implemented method for processing input image data from a medical observation apparatus, the method comprising the steps of obtaining
the input image data representing a scene imaged by the medical observation apparatus, analyzing the input image data to determine different categories in the imaged scene, generating a plurality of stereoscopic images from the input image data, each one of the stereoscopic images representing a different category determined in the imaged scene, assigning, based on the determined categories, a different disparity to each of the plurality of stereoscopic images resulting in a plurality of processed stereoscopic images, and combining the plurality of processed stereoscopic images with one another resulting in a combined stereoscopic image.
The computer-implemented method achieves the above object, since it yields the stereoscopic image from the input image data, which do not have to be stereoscopic themselves.
Lastly, the above object is also solved by a computer-readable medium as well as a computer program product each comprising instructions, which, when executed by a computer, cause the computer to carry out the method. In particular, the computer-readable medium and computer program product allow the method of the present invention to be implemented on a general-purpose computer, such as a PC.
The data processing device, medical observation apparatus and method may be improved further by adding one or more of the features described in the following. Each of these features may be added to the method and/or the data processing device independently of the other features. In particular, a person skilled in the art - with knowledge of the inventive data processing device - is capable of configuring the inventive method such that the inventive method is capable of operating the inventive data processing device. Moreover, each feature has its own advantageous technical effect, as will be explained hereinafter.
According to one possible embodiment, generating the plurality of stereoscopic images that represent the different categories may comprise the step of generating pairs of identical digital images, the digital images having non-zero pixel values in all pixels that represent a category determined in the scene. Pixels in the digital image having zero pixel values do not represent one of the categories represented by the non-zero valued pixels in the same digital image. The pair of identical digital images can be used as a starting point for the generation of the left and right digital image. This will be described in further detail below.
Preferably, the data processing device is further configured to apply the assigned disparity to the plurality of stereoscopic images in order to arrive at the plurality of processed stereoscopic images. In other words, the data processing device may be configured to produce the processed stereoscopic images by applying the disparity assigned to a stereoscopic image. Accordingly, the
computer-implemented method may further comprise the step of producing the plurality of processed stereoscopic images by applying the assigned disparity to the stereoscopic images. In the context of the present disclosure, assigning a disparity may be used as a synonym for applying the disparity to a stereoscopic image.
For example, to each stereoscopic image a disparity transform, i.e. a disparity shift, may be applied according to the assigned disparity. The resulting processed stereoscopic image differs from the (original) stereoscopic image in that the single digital images (making up each stereoscopic image) now have a disparity according to the assigned disparity. In particular, the data processing device may be configured to apply the respective disparity assigned to a stereoscopic image of the plurality of stereoscopic images by shifting (i.e. moving) the pixels in the left digital image of the stereoscopic image by an amount representing the assigned disparity, and by shifting (i.e. moving) the pixels in the right digital image of the stereoscopic image in an opposite direction by the amount representing the assigned disparity.
When shifted, the pixels may be moved along a horizontal axis representing the horizontal distance direction of the viewer’s eyes. Moreover, shifting a pixel denotes the process of moving the position of a pixel within an otherwise unchanged raster image. Generally, the pixels of the left digital image and the right digital image are shifted towards each other, if the stereoscopic image is to appear closer to the viewer. Conversely, shifting the pixels of the left digital image and the right digital image away from each other will make the stereoscopic image appear further away from the viewer.
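A minimal sketch of this opposite-direction pixel shift, assuming integer disparities and zero-filled (i.e. transparent) vacated columns, is given below; the helper names are hypothetical and the sign convention is an assumption of the example.

```python
import numpy as np

def shift_horizontally(image: np.ndarray, pixels: int) -> np.ndarray:
    """Move all pixels of a raster image along the horizontal axis,
    filling the vacated columns with zeros (transparent pixels)."""
    shifted = np.zeros_like(image)
    if pixels > 0:
        shifted[:, pixels:] = image[:, :-pixels]
    elif pixels < 0:
        shifted[:, :pixels] = image[:, -pixels:]
    else:
        shifted = image.copy()
    return shifted

def apply_disparity(left: np.ndarray, right: np.ndarray, disparity: int):
    """Shift the left and right digital images by the same amount in
    opposite directions; shifting them towards each other makes the
    content appear closer to the viewer."""
    return (shift_horizontally(left, +disparity),
            shift_horizontally(right, -disparity))
```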
Preferably, the data processing device may be configured to assign the disparities to the plurality of stereoscopic images based on a user selection input. That is, a user may choose which disparity is to be assigned to which stereoscopic image via an input device (e.g. the user interface of the medical observation apparatus). Additionally or alternatively, the data processing device may be configured to assign the disparities to the plurality of stereoscopic images automatically.
In order to automatically determine the disparity that is to be assigned to a specific stereoscopic image, the following heuristic rule of thumb may be used: The disparity assigned to a stereoscopic image of the plurality of stereoscopic images is larger, the smaller a sum of non-zero pixels representing the category assigned to the stereoscopic image is in the stereoscopic image.
Thus, a category represented by a stereoscopic image with a large number of pixels will be assigned a smaller disparity, while a category represented by a stereoscopic image with fewer pixels will be assigned a larger disparity.
Another rule of thumb may be the following: The larger the extent of a segment in the stereoscopic image is, the smaller the disparity assigned to that stereoscopic image. Herein, the segment represents an area/pixels (in particular non-zero pixels) in the stereoscopic image illustrating the assigned category of the stereoscopic image. In other words, the segment in the stereoscopic image defines an ensemble or group of pixels within the stereoscopic image. Further, the segment may provide location information and/or shape information about the position of an object in the stereoscopic image.
In particular, the segment may be a binary mask in which non-zero pixels, that is, pixels with nonzero value, are distributed according to the location and shape of the segment and zero valued pixels are considered as transparent pixels that do not belong to the segment. When overlaid on a digital image, the binary mask produces a separation between pixels in the digital image that fall within a region/area included by the binary mask, and pixels in the digital image, which are excluded by the binary mask.
Optionally, the single images of each stereoscopic image may be such binary masks in that nonzero pixels represent content, such as part of the determined category assigned to the stereoscopic image, and zero valued pixels in the single images are considered as transparent pixels that do not add up to a visible image contribution when the single image is overlaid with another single image from another stereoscopic image.
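Taking the binary-mask representation together with the pixel-count rule of thumb above, the automatic disparity assignment might be sketched as follows; the linear spacing of disparity values between a minimum and a maximum is an assumption chosen for the example, not a requirement of the device.

```python
import numpy as np

def disparities_from_masks(masks: list[np.ndarray],
                           min_disp: int = 1,
                           max_disp: int = 10) -> list[int]:
    """Assign a larger disparity to categories covering fewer non-zero
    pixels, so that small structures appear closer to the viewer."""
    counts = np.array([np.count_nonzero(m) for m in masks], dtype=float)
    # Rank categories by area: the largest segment gets the smallest disparity.
    order = counts.argsort()[::-1]                  # indices from largest to smallest area
    steps = np.linspace(min_disp, max_disp, num=len(masks))
    disparities = np.empty(len(masks), dtype=int)
    disparities[order] = np.round(steps).astype(int)
    return disparities.tolist()
```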
According to another embodiment, the plurality of processed stereoscopic images may be overlaid on one another when generating the combined stereoscopic image. In particular, the plurality of processed stereoscopic images may be overlaid on one another along a central axis standing perpendicularly on the image plane of each one of the processed stereoscopic images, and which passes through the center of each one of the processed stereoscopic images. Further, the plurality of processed stereoscopic images may be overlaid in an order determined by the disparity assigned to the processed stereoscopic images. For example, the processed stereoscopic image with the highest assigned disparity may come on top, followed by the remaining processed stereoscopic images in decreasing order of disparity.
Placing the processed stereoscopic image with the highest assigned disparity on top enhances the three-dimensional depth perceived by the viewer. As already briefly mentioned above, the (binocular) disparity refers to the difference in image location of an object seen by the left and right eyes, resulting from the eyes’ horizontal separation (parallax). The brain uses binocular disparity to extract depth information from the two-dimensional retinal images in stereopsis. In other
words, the disparity represents a depth or distance at which the viewer will perceive the content of the stereoscopic image when viewing the combined stereoscopic image. The larger the disparity of an object, the closer the object appears to be located to the viewer, hence the highest assigned disparity has the top location in the overlay.
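A rough compositing sketch for this disparity-ordered overlay, treating zero-valued pixels as transparent as described above, could look like this (illustrative only, assuming single-channel image layers):

```python
import numpy as np

def overlay_by_disparity(images: list[np.ndarray],
                         disparities: list[int]) -> np.ndarray:
    """Overlay processed images so that the image with the highest
    assigned disparity ends up on top; zero-valued pixels are
    transparent and let underlying images show through."""
    combined = np.zeros_like(images[0])
    # Paint from the smallest to the largest disparity, so the image with
    # the largest disparity is drawn last, i.e. ends up on top.
    for disparity, image in sorted(zip(disparities, images), key=lambda t: t[0]):
        opaque = image != 0
        combined[opaque] = image[opaque]
    return combined
```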
Preferably, the data processing device is configured to output the combined stereoscopic image to the display device of the medical observation apparatus. Accordingly, the computer-implemented method may further comprise the step of outputting the combined stereoscopic image to the display device.
In order to further enhance the three-dimensional depth perceived by the observer, the data processing device may be configured to modify a size scale of a stereoscopic image of the plurality of stereoscopic images based on the disparity assigned to the stereoscopic image. Herein, the size scale represents the angular extent or the lateral extent of the processed stereoscopic image. In particular, the size scale represents a maximal angular diameter which the stereoscopic image will cover on the display device after the plurality of processed stereoscopic images are overlaid to generate the combined stereoscopic image.
For example, the data processing device may be configured to magnify the processed stereoscopic image of the plurality of processed stereoscopic images based on the disparity assigned to the processed stereoscopic image. Herein, magnifying denotes scaling the stereoscopic image by a scaling factor. The scaling factor may be greater or smaller than 1. When magnifying or, more generally, modifying the size scale of a stereoscopic image, the aspect ratio is maintained. That is, the scaled processed stereoscopic image has the same aspect ratio as before scaling was applied.
Preferably, the data processing device is configured to modify the size scale prior to producing the processed stereoscopic images from the stereoscopic images. In other words, the data processing device may be configured to apply the disparity transform after the size scale of a stereoscopic image has been modified. Thus, the modification of the size scale will not create an unwanted additional shift after the disparity transform.
Preferably, the size scale/scaling factor is the smaller, the smaller the disparity assigned to the processed stereoscopic image is. For the viewer, this amplifies the impression that an object shown in a processed stereoscopic image with a small disparity is further away than an object shown in a processed stereoscopic image with a large disparity.
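A possible, purely illustrative way to realize such aspect-ratio-preserving scaling with NumPy is sketched below; the nearest-neighbour resampling and the choice to paste the result centred onto a transparent (zero-valued) canvas of the original size are assumptions of this sketch. A disparity-dependent scaling factor could then be chosen so that it decreases with decreasing disparity, and the scaling would be applied before the disparity shift.

```python
import numpy as np

def scale_preserving_aspect(image: np.ndarray, factor: float) -> np.ndarray:
    """Nearest-neighbour rescaling of a raster image by a single factor,
    so the aspect ratio is unchanged; the result is returned on a zero
    (transparent) canvas with the original raster size."""
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    rows = (np.arange(new_h) / factor).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / factor).astype(int).clip(0, w - 1)
    scaled = image[rows][:, cols]

    canvas = np.zeros_like(image)
    top, left = (h - new_h) // 2, (w - new_w) // 2
    if factor <= 1.0:
        # Shrinking: paste the smaller result centred onto the canvas.
        canvas[top:top + new_h, left:left + new_w] = scaled
    else:
        # Magnifying: centre-crop the larger result back to the original size.
        canvas[:] = scaled[-top:h - top, -left:w - left]
    return canvas
```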
Especially for applications utilizing an active viewing system, the data processing device may be configured to receive a user input that represents a viewing position of a user with respect to a display device. Based on the received user input, the data processing device may be configured to adjust the assigned disparities. The adjustment of the assigned disparities may take place when the received user input indicates a change in the viewing angle and/or the viewing distance of the user with respect to the display device.
For example, the user input may be face detection data representing the user’s eyes/gaze. The data processing device may be configured to derive, from the face detection data, a value representing the viewing angle and/or another value representing the viewing distance of the user with respect to the display device. Further, the data processing device may be configured to determine the change in the viewing angle and/or the viewing distance when the value or the other value exceeds a threshold. Said threshold may be derived from previously received user input or face detection data.
Additionally, the data processing device may be configured to determine the change in the viewing angle and/or the viewing distance when the value or the other value lies outside a predetermined range. Said predetermined range may be a neighborhood derived from previously received user input, from the value, or from the other value. The neighborhood, in turn, may be defined based on previously received user input, on previous viewing angles or on previous viewing distances.
The above-mentioned adjustment of the assigned disparities may comprise determining an updated disparity to be assigned to the plurality of stereoscopic images. The updated disparity reflects the changed viewing angle and/or viewing distance and can be used to update the combined stereoscopic image. In other words, the disparity assigned to the plurality of stereoscopic images is updated for each one of the stereoscopic images to reflect the change in the user’s viewing angle and/or viewing distance. That is, the data processing device is configured to dynamically compensate for the user’s change in their viewing angle and/or viewing distance.
In order to activate and deactivate this adjustment of the assigned disparities, the data processing device may be configured to receive a user selection input. As such, the data processing device may be configured to activate or deactivate updating of the combined stereoscopic image based on the received user selection input.
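The threshold-based update logic can be illustrated roughly as follows; all numerical values, the use of the mean of previously received distances as the baseline, and the inverse-distance compensation are assumptions made for this sketch, not the claimed behaviour.

```python
import numpy as np

def adjust_disparities(disparities: list[int],
                       viewing_distance: float,
                       previous_distances: list[float],
                       tolerance: float = 0.05,
                       reference_distance: float = 0.6) -> list[int]:
    """Update assigned disparities when the viewing distance derived from
    face detection data leaves a neighbourhood of the previous values."""
    baseline = float(np.mean(previous_distances)) if previous_distances else reference_distance
    if abs(viewing_distance - baseline) <= tolerance * baseline:
        return disparities                    # no significant change detected
    # One possible compensation: disparities shrink as the viewer moves away.
    factor = baseline / viewing_distance
    return [int(round(d * factor)) for d in disparities]
```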
According to another possible embodiment, the data processing device may be configured to apply, prior to combining the plurality of processed stereoscopic images, a post-processing transform to at least a subset of the plurality of processed stereoscopic images. Such post-processing transform may be at least one of a transparency adjustment, a brightness adjustment, a depth-of-focus adjustment, a perspective scaling, and a thickening surfaces effect.
The transparency adjustment increases a transparency level of a stereoscopic image when overlaying the processed stereoscopic images. The increased transparency allows seeing through a stereoscopic image and observing underlying stereoscopic images.
The brightness adjustment increases a brightness level of a stereoscopic image when overlaying the processed stereoscopic images. The increased brightness allows highlighting certain stereoscopic images, while fading out other stereoscopic images.
The depth-of-focus adjustment increases a blur level of a stereoscopic image when overlaying the processed stereoscopic images. For example, a blur filter, such as a Gaussian filter with a kernel size proportional to the required blur level of the processed stereoscopic image, may be applied to the stereoscopic image. In particular, the same filter is applied to the left image and the right image of the stereoscopic image.
The perspective scaling involves shrinking or magnifying the size/dimensions of the stereoscopic image before overlaying the processed stereoscopic images, whilst maintaining the aspect-ratio of the stereoscopic image. Thereby, smaller objects appear further away than larger objects.
The thickening surfaces effect may comprise cloning pixel values from a midpoint location to an end location of a disparity adjustment. The midpoint location resides between a start location and the end location of the disparity adjustment.
Preferably, the same post-processing transform is applied to each single digital image of the pair of digital images that make up a stereoscopic image or a processed stereoscopic image. The above-mentioned user selection input may also be used to activate or deactivate applying the post-processing transform. Further, the user selection input may be used to choose which post-processing transform is to be applied.
In particular, the post-processing transform may be applied based on the disparity assigned to the processed stereoscopic image. Further, the data processing device may be configured to
designate a stereoscopic image from the plurality of processed stereoscopic images as a background, wherein the disparity assigned to the background is a global minimum or maximum among the disparities assigned to the plurality of stereoscopic images. The background may serve as a reference for the viewer.
Optionally, the data processing device may be configured to increase or decrease a weight of the post-processing transform, when applying the post-processing transform to the processed stereoscopic image, the closer the stereoscopic image is to the background, based on the distance between the background and the stereoscopic image. Herein, a difference between the disparities assigned to two stereoscopic images of the plurality of stereoscopic images may define a distance between the two stereoscopic images.
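The post-processing transforms and the distance-based weighting might be sketched as follows. SciPy's gaussian_filter stands in for the Gauss filter mentioned above; the concrete weighting, the parameter values and the assumption of single-channel image layers are choices made only for this example, and the same call would be made for the left and the right digital image of a pair.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def post_process(image: np.ndarray, disparity: int, background_disparity: int,
                 max_distance: int, transform: str = "blur") -> np.ndarray:
    """Apply one post-processing transform to a single-channel digital image,
    weighted by how close its stereoscopic image lies to the background."""
    # The distance between two stereoscopic images is the difference of the
    # disparities assigned to them.
    distance = abs(disparity - background_disparity)
    weight = float(np.clip(1.0 - distance / max(max_distance, 1), 0.0, 1.0))
    if transform == "blur":
        # Kernel width grows towards the background, so far-away content looks defocused.
        return gaussian_filter(image.astype(float), sigma=max(3.0 * weight, 1e-3))
    if transform == "brightness":
        # Content far from the background (i.e. in front) is highlighted.
        return np.clip(image * (1.0 + 0.5 * (1.0 - weight)), 0, 255)
    if transform == "transparency":
        # A crude transparency: attenuate the contribution before overlaying.
        return image * (1.0 - 0.7 * weight)
    return image

# Example: left_out = post_process(left, d, d_bg, d_max)
#          right_out = post_process(right, d, d_bg, d_max)
```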
According to one possible embodiment, the disparity applied to each one of the plurality of stereoscopic images may be different from all the other assigned disparities. That is, each stereoscopic image of the plurality of stereoscopic images may have assigned thereto a unique disparity.
Alternatively, stereoscopic images representing the same ensemble of different categories may have the same disparity assigned thereto. For example, two or more stereoscopic images may be designated as the background and, as such, share the same disparity.
The optical instrument of the medical observation apparatus may be a stereoscopic digital camera, but it may also be a digital camera, a multispectral camera, a hyperspectral camera or a time-of-flight camera. More particularly, the digital camera may be a digital RGB camera and/or a digital reflectance camera. The multispectral camera may comprise a digital fluorescence-light camera and optionally further digital fluorescence-light cameras. The responsivity spectra of the digital fluorescence-light cameras are preferably non-overlapping and in particular complementary to each other.
Regardless of the camera type, the input image data provided by the optical instrument may contain raster images compiled of pixels.
For example, the input image data may contain an RGB image representing the scene and consisting of three layers of equally sized raster images. Each raster image layer represents the color intensity of the scene in one of the respective color channels red, green and blue.
Similarly, the input image data may also contain a multispectral or hyperspectral imaging data cube with layers of equal-sized raster images in one of the respective multispectral or hyperspectral channels. For example, the input image data may contain a digital fluorescence-light input image representing the fluorescence of one or more fluorophores present in the scene. In particular, the digital fluorescence-light input image may contain a fluorescence signal representative of the fluorescence of a fluorophore artificially added to the scene (e.g. the fluorophore may be PpIX obtained by administration of 5-ALA for revealing tumors). Further, the digital fluorescence-light input image may also contain an auto-fluorescence signal representative of the fluorescence of a fluorophore naturally occurring in the scene (e.g. human tissue) and a reflectance signal representative of light reflected off the objects in the scene (e.g. fluorophore excitation light).
Likewise, the input image data may contain a three-dimensional image acquired by the time-of-flight camera. The three-dimensional image differs from the RGB images in that the three-dimensional image further includes a range image. The range image provides depth information to each pixel. Thus, the three-dimensional image includes one layer containing the depth information and at least another layer containing the pixel intensities that represent the optical signal received from the scene.
For applications where the input image data contain the multispectral or hyperspectral imaging data cube, the data processing device may be configured to decompose, by spectral unmixing, a mixed pixel in the input image data into a set of endmembers and fractions, wherein each one of the plurality of stereoscopic images represents a different endmember in the set of endmembers obtained from the spectral unmixing. The above-mentioned categories may be one of spectral bands, spectral narrow bands, and spectral broad bands from the multispectral or hyperspectral data.
Spectral unmixing is an analysis method known from satellite imaging. The result of spectral unmixing is a data set containing a collection of endmembers, or constituent spectra, and a set of corresponding fractions, or abundances, that indicate the proportion of each endmember present in the analyzed pixel. In other words, one can tell from the data set what substances are present in the location represented by the analyzed pixel and in what quantity relative to the other present substances.
In other words, each mixed pixel in the input image data may be a mixture of the spectra of several materials in a scene. The purpose of spectral unmixing is to identify the constituent spectra from the mixed pixels and to calculate the proportion of each constituent spectrum in the mixed pixels in order to quantitatively decompose or “unmix” them.
The above-mentioned mixed pixels are a mixture of more than one distinct substance appearing in a single pixel, and they exist for one of two reasons. First, if the spatial resolution of a sensor is low enough that disparate materials can jointly occupy a single pixel, the resulting spectral measurement will be some composite of the individual spectra. Second, mixed pixels can result when distinct materials are combined into a homogeneous mixture. This circumstance can occur independently of the spatial resolution of the sensor.
The endmembers normally correspond to familiar macroscopic or microscopic objects, such as blood vessels, skin tissue, cancerous tissue etc. For the identification of these endmembers, known reference spectra may be used for the spectral unmixing. Consequently, as a result of the spectral unmixing, constituent spectra corresponding to macroscopic or microscopic objects in the scene are obtained. Further, fractions assigned to the constituent spectra, and which are associated with pixels, are obtained. Each of the fractions of a pixel indicates the proportion of a constituent spectrum contributing to the data value of the pixel.
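Under a linear mixing model, the decomposition into endmembers and fractions can be sketched with a non-negative least-squares fit. The use of SciPy's nnls, the per-pixel loop and the normalization of the fractions are illustrative choices made for this sketch, not the claimed implementation.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(pixel_spectrum: np.ndarray, endmembers: np.ndarray) -> np.ndarray:
    """Estimate the fractions (abundances) of known endmember spectra in one
    mixed pixel under a linear mixing model: pixel ~= endmembers @ fractions."""
    fractions, _ = nnls(endmembers, pixel_spectrum)   # non-negativity enforced
    total = fractions.sum()
    return fractions / total if total > 0 else fractions

def unmix_cube(cube: np.ndarray, endmembers: np.ndarray) -> np.ndarray:
    """Unmix every pixel of a (height, width, bands) data cube; the result has
    one abundance layer per endmember, each of which can feed one stereoscopic image."""
    h, w, bands = cube.shape
    abundances = np.zeros((h, w, endmembers.shape[1]))
    for y in range(h):
        for x in range(w):
            abundances[y, x] = unmix_pixel(cube[y, x], endmembers)
    return abundances
```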
Spectral unmixing, when applied to a fluorescence image, makes it possible to distinguish multiple fluorophore signatures originating from the same source location (in case of a digital image, from the same pixel) in the scene. Thus, the data processing device may be configured to decompose the digital fluorescence-light input image into the fluorescence signal, the auto-fluorescence signal and the reflectance signal. In particular, the data processing device may be configured to identify each of the fluorescence signal, the auto-fluorescence signal and the reflectance signal as a separate category. Moreover, the data processing device may be configured to assign to each of the fluorescence signal, the auto-fluorescence signal and the reflectance signal a different disparity.
Additionally or alternatively, the data processing device may be configured to decompose the input image data into objects by image segmentation, and to assign an object to a stereoscopic image of the plurality of the stereoscopic images. Herein, each object has a category associated with it. For example, an object class may be the associated category. Likewise, the categories may be semantic labels of objects identified and/or located in the scene. Thus, each object also has a location in the input image data associated with it. In particular, pixels in the input image data may be associated to the objects.
Preferably, the image segmentation may be a semantic image segmentation. Further, the data processing device may be configured to generate the plurality of stereoscopic images from a
result of the semantic image segmentation of the input image data. That is, the data processing device may be configured to generate a new stereoscopic image for each object instance determined in the scene. Alternatively, the data processing device may be configured to join each object instance of the same category in one stereoscopic image.
Accordingly, in the computer-implemented method, the step of analyzing the input image data may further comprise the steps of performing spectral unmixing of the input image data and/or semantic image segmentation on the input image data. Further, the computer-implemented method may comprise the steps of obtaining, as a result of the semantic image segmentation, labels defining object categories identified in the scene, and further obtaining segments (including pixel positions) associated with the labels that identify a location of the object in the scene.
For applications where the input image data is a single two-dimensional input image, the data processing device may be configured to generate image layers, e.g., by means of spectral unmixing and/or semantic image segmentation. All image layers have the same size/dimension and each image layer represents a different category/object determined in the scene. Further, each image layer may represent a binary mask with pixels having zero value that do not represent content in the scene associated with the category/object of the layer. In this case, the data processing device may be configured to generate the plurality of stereoscopic images from the image layers, preferably by duplicating each one of the image layers producing pairs of identical image layers, wherein one image layer from the pair of image layers represents the left image and the other image layer from the pair of identical image layers represents the right image.
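For the single two-dimensional input image case, the layer generation and the duplication into identical left and right images might look roughly like this. The label-map input and the convention that label 0 means "no category" are assumptions of the sketch.

```python
import numpy as np

def layers_from_labels(image: np.ndarray, labels: np.ndarray) -> dict[int, np.ndarray]:
    """Split a single two-dimensional input image into one layer per category,
    using a label map from semantic segmentation; pixels outside a category
    are set to zero (transparent)."""
    layers = {}
    for category in np.unique(labels):
        if category == 0:                 # 0 = unlabelled in this example
            continue
        layers[int(category)] = np.where(labels == category, image, 0)
    return layers

def stereo_pairs_from_layers(layers: dict[int, np.ndarray]):
    """Duplicate each layer into a pair of identical digital images, one for
    the left channel and one for the right channel."""
    return {cat: (layer.copy(), layer.copy()) for cat, layer in layers.items()}
```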
In applications where the input image data is a stereoscopic image already comprising a pair of digital input images, the pair of digital input images may be used as the left image and the right image. In this case, the data processing device may be configured to generate from the left image a plurality of left-channel image layers, and to generate from the right image a plurality of right-channel image layers, each image layer representing a different category/object determined in the scene. In other words, the spectral unmixing and/or semantic image segmentation may be applied to the left image and the right image individually, if a stereoscopic image already exists in the input image data.
When the input image data is a three-dimensional image, the data processing device may be configured to compute a projection of the three-dimensional image to a plane. Thus, a two-dimensional input image is obtained and handled by the data processing device in an analogous manner as for the above-described single input image.
Alternatively, when the input image data is a three-dimensional image, the data processing device may be configured to compute two projections to a plane from which a stereoscopic image may be extracted. The stereoscopic image in turn comprises a pair of digital input images. The data processing device is configured to proceed with the extracted stereoscopic image in an analogous manner as in the above-described case when the input image data is a stereoscopic image.
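One simple, purely illustrative way to obtain two such projections from a depth layer plus an intensity layer is a per-pixel horizontal shift that grows with nearness. The linear shift model, the parameter max_shift and the nearest-neighbour handling of collisions are assumptions of this sketch and only one of many possible projection schemes.

```python
import numpy as np

def stereo_from_depth(intensity: np.ndarray, depth: np.ndarray,
                      max_shift: int = 8) -> tuple[np.ndarray, np.ndarray]:
    """Derive a left/right image pair from a single intensity layer plus a
    range (depth) layer: nearer pixels are shifted more strongly, in
    opposite directions for the two viewing channels."""
    h, w = intensity.shape
    near, far = depth.min(), depth.max()
    # Normalised "nearness" in [0, 1]; guards against a flat depth map.
    nearness = (far - depth) / (far - near) if far > near else np.zeros_like(depth)
    shifts = np.round(nearness * max_shift).astype(int)

    left = np.zeros_like(intensity)
    right = np.zeros_like(intensity)
    cols = np.arange(w)
    for y in range(h):
        lcols = np.clip(cols + shifts[y], 0, w - 1)
        rcols = np.clip(cols - shifts[y], 0, w - 1)
        left[y, lcols] = intensity[y]
        right[y, rcols] = intensity[y]
    return left, right
```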
According to another possible embodiment, the data processing device may be configured to receive another user input indicating a disparity value. Further, the data processing device may be configured to update the combined stereoscopic image based on the disparity value, wherein processed stereoscopic images having assigned a disparity exceeding the disparity value are omitted from being overlaid when generating the combined stereoscopic image. This way, the user can exclude certain stereoscopic images from the top, in order to view stereoscopic images lying underneath.
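Omitting images whose assigned disparity exceeds the user-provided value could be as simple as the following filter; the helper is hypothetical and shown only to make the behaviour concrete.

```python
def filter_by_disparity(images, disparities, max_allowed):
    """Keep only processed stereoscopic images whose assigned disparity does
    not exceed the user-selected value, so images on top can be peeled away."""
    kept = [(img, d) for img, d in zip(images, disparities) if d <= max_allowed]
    return [img for img, _ in kept], [d for _, d in kept]
```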
The input image data may also contain multiple input images that jointly represent a series of images. The data processing device may be configured to obtain and analyze these multiple input images individually in a sequential manner or combinedly in parallel.
As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items and may be abbreviated as “/”.
The invention will now be described by way of example using sample embodiments, which are shown in the drawings. In the drawings, the same reference numerals are used for features which correspond to each other with respect to at least function and/or design.
The combinations of features shown in the enclosed embodiments are for explanatory purposes only and can be modified. For example, a feature of an embodiment having a technical effect that is not needed for a specific application may be omitted. Likewise, a feature which is not shown to be part of an embodiment may be added if the technical effect associated with this feature is needed for a particular application.
Fig. 1: shows a schematic representation of a flowchart for a computer-implemented method according to an exemplary embodiment of the present invention;
Fig. 2: shows a schematic representation of a detail of Fig. 1;
Fig. 3: shows a schematic representation of a medical observation apparatus according to an exemplary embodiment of the present invention;
Fig. 4: shows a schematic representation of a data processing device according to an exemplary embodiment of the present invention;
Fig. 5: shows a schematic representation of the data processing device according to another exemplary embodiment of the present invention;
Fig. 6: shows a schematic representation of the data processing device according to another exemplary embodiment of the present invention;
Fig. 7: shows a schematic representation of enhancement of depth perception by applying disparity;
Fig. 8: shows a schematic representation of enhancement of depth perception by perspective scaling;
Fig. 9: shows a schematic representation of enhancement of depth perception by thickening surfaces;
Fig. 10: shows a schematic representation of enhancement of depth perception by depth-of-focus adjustment;
Fig. 11: shows a schematic representation of enhancement of depth perception by brightness adjustment;
Fig. 12: shows a schematic representation of enhancement of depth perception by transparency adjustment; and
Fig. 13: shows a schematic representation of a system comprising a microscope.
First, a computer-implemented method 100 is explained with reference to Figs. 1 and 2. Subsequently, the structure and functionality of a data processing device 300 and a medical observation apparatus 304, such as a microscope 310 or an endoscope, is explained with reference to Figs. 3 to 13.
The computer-implemented method 100 is used for processing input image data 115 from the medical observation apparatus 304 in order to enable stereoscopic visualization of the input image data 115. As can be seen in Fig. 1, the method 100 comprises the step 110 of obtaining the input image data 115 representing a scene 200 imaged by the medical observation apparatus 304. Herein, the input image data 115 may contain raster images 116 compiled of pixels.
For example, the input image data 115 may contain a multispectral or hyperspectral imaging data cube 117 with layers of equal-sized raster images in one of the respective multispectral or hyperspectral channels. In particular, the input image data 115 may contain a digital fluorescence-light input image 118 representing the fluorescence of one or more fluorophores 306 present in the scene 200.
Alternatively, the input image data 115 may contain an RGB image 119 representing the scene 200 and consisting of three layers of equally sized raster images. Each raster image layer represents the color intensity of the scene in one of the respective color channels red, green and blue.
Further, the method 100 comprises the step 120 of analyzing the input image data 115 to determine different categories 125 in the imaged scene 200. As part of this analysis, the input image data 115 may be decomposed into objects 225 by image segmentation 126. Herein, each object 225 has a category 125 associated with it. For example, an object class 127 may be the associated category 125.
Preferably, the image segmentation 126 may be a semantic image segmentation 126. Hence, the categories 125 may be semantic labels 129 of objects 225 identified and/or located in the scene 200. Thereby, each object 225 also has a location in the input image data 115 associated with it. In particular, pixels in the input image data 115 may be associated to the objects 225.
The objects 225 may be macroscopic or microscopic. For example, in medical applications, the objects 225 may be blood vessels, skin tissue, cancerous tissue etc. found in the scene 200. For ease of understanding, the objects 225 are symbolized by a cross symbol 202, a smiley symbol 204 and a flash symbol 206 in Fig. 2. As can be seen in Fig. 3, the objects 225 may also mutually overlap. Herein, the objects 225 are a circle 332, a triangle 334 and a striped pattern 336, each with a different color.
For applications where the input image data 115 contain the multispectral or hyperspectral imaging data cube 117, the above-mentioned analysis of the input image data 115 may be done by spectral unmixing 128. This is especially useful, if the objects 225 overlap as is the case in Fig. 3. Consequently, the categories 125 may be spectral bands from the multispectral or hyperspectral data. In the example of Fig. 3, each spectral band belongs to one of the circle 332, triangle 334 or striped pattern 336.
When the input image data 115 contain the multispectral or hyperspectral imaging data cube 117, the image segmentation 126, in particular the semantic image segmentation 126 may also be
utilized. In such a case, the spectral layers/channels of the multispectral or hyperspectral imaging data cube 117 may be segmented individually or in combination with each other.
As can further be seen in Fig. 1 , the method 100 also comprises the step 130 of generating a plurality of stereoscopic images 135 from the input image data 115. Each one of the stereoscopic images 135 represents a different category 125 determined in the imaged scene 200. If the image segmentation 126 was applied, then an object 225 may be assigned to each stereoscopic image 135. If spectral unmixing 128 was used, a spectral band may be assigned to each stereoscopic image 135. That is, a new stereoscopic image 135 may be generated for each object 225/spectral band determined in the scene 200.
For applications where the input image data 115 is a two-dimensional input image 324, image layers 326 may be generated, e.g., by means of spectral unmixing 128 and/or semantic image segmentation 126. All image layers 326 have the same size/dimension and each image layer 326 represents a different category 125/object 225 determined in the scene 200.
Moreover, the step 130 of generating the plurality of stereoscopic images 135 that represent the different categories 125 may comprise the step 136 of generating pairs of identical digital images (left digital image L and right digital image R), the digital images L, R having non-zero pixel values in all pixels 312 that represent a category 125 determined in the scene 200. Pixels 314 in the digital image L, R having zero pixel values do not represent one of the categories 125 represented by the non-zero valued pixels 312 in the same digital image L, R (see Fig. 4). In other words, the digital images L, R may be binary masks 316 in that non-zero pixels 312 represent content, such as part of the determined category 125 assigned to the stereoscopic image 135, and zero valued pixels 314 in the digital images L, R are considered as transparent pixels 318 that do not add up to a visible image contribution when the digital images L, R are overlaid with other images from other stereoscopic images. Generating the pairs of identical digital images L, R may be done by duplicating each one of the image layers 326.
The left digital image L is presented to the left eye and the right digital image R is presented to the right eye of a viewer 400. When viewed, the human brain is supposed to perceive the images L, R as a single three-dimensional view 404, giving the viewer 400 the perception of three-dimensional depth. The separate presentation of the images L, R - one for the left eye and one for the right eye - may be incorporated through the use of specialized glasses 406 and/or a specialized display 408.
In order to complete this stereoscopic visualization, the images L, R need to have a certain perspective difference. After all, it is this perspective difference between the images L, R seen through the left and right eyes of the viewer 400, the so-called binocular disparity, and the viewer’s accommodation through focusing that creates the three-dimensional view 404.
For this purpose, the method 100 comprises the step 140 of assigning, based on the determined categories 125, a different disparity A, B, C to each of the plurality of stereoscopic images 135 resulting in a plurality of processed stereoscopic images 145. In order to automatically determine the disparity A, B, C which is to be assigned to a stereoscopic image 135, the following heuristic rule of thumb may be used: the disparity A, B, C assigned to a stereoscopic image 135 is larger, the smaller a sum of non-zero pixels 312, which represent the category 125 assigned to the stereoscopic image 135, is in the stereoscopic image 135. Thus, a category 125 represented by a stereoscopic image 135 with a large number of non-zero pixels 312 will be assigned a smaller disparity A, B, C, while a category 125 represented by a stereoscopic image 135 with fewer nonzero pixels 312 will be assigned a larger disparity A, B, C.
In the example shown in Fig. 4, the circle 332 comprises the least non-zero pixels 312, while the striped pattern 336 has the most non-zero pixels 312. Therefore, the circle 332 is assigned a larger disparity A and the striped pattern 336 is assigned a smaller disparity C. The triangle 334 of Fig. 4 is assigned a disparity B between A and C due to its median number of non-zero pixels 312.
Alternatively, the disparities A, B, C may be assigned to the plurality of stereoscopic images 135 based on a user selection input 195. That is, a user 402, in particular the viewer 400, may choose via an input device 328, which disparity A, B, C is to be assigned to which stereoscopic image 135. The input device 328 may be connected to a user interface 330 of the medical observation apparatus 304 being configured to receive the user selection input 195.
The step 140 of assigning the disparities A, B, C, preferably comprises the step 190 of applying the assigned disparities A, B, C to the stereoscopic images 135. In particular, the processed stereoscopic images 145 may be obtained by applying the disparities A, B, C to the stereoscopic images 135.
For example, to each stereoscopic image 135 a disparity transform 410, i.e. a disparity shift 412 may be applied according to the assigned disparity A, B, C. The resulting processed stereoscopic image 145 differs from the original stereoscopic image 135 in that the single digital images L, R
(making up each stereoscopic image 135, 145) now have a disparity according to the assigned disparity A, B, C.
The disparity shift 412 may be applied by shifting the pixels in the left digital image L of the stereoscopic image 135 by an amount representing the assigned disparity A, B, C and by shifting the pixels in the right digital image R of the stereoscopic image 135 in an opposite direction by the amount representing the assigned disparity A, B, C (see Fig. 4). Shifting a pixel denotes the process of moving the position of a pixel within the otherwise unchanged raster image 116.
When shifted, the pixels may be moved along a horizontal axis 414 representing the horizontal distance direction of the viewer’s eyes. Generally, the pixels of the left digital image L and the right digital image R are shifted towards each other, if the stereoscopic image 135 is to appear closer to the viewer 400. Conversely, shifting the pixels of the left digital image L and the right digital image R away from each other will make the processed stereoscopic image 145 appear further away from the viewer 400. This is depicted in Fig. 4 on the right side, where the three- dimensional view 404 perceived by the viewer 400 is shown.
For presentation to the viewer 400, the method 100 may comprise the step 150 of combining the plurality of processed stereoscopic images 145 with one another, resulting in a combined stereoscopic image 155. In particular, the plurality of processed stereoscopic images 145 may be overlaid on one another when generating the combined stereoscopic image 155. Preferably, the plurality of processed stereoscopic images 145 are overlaid on one another along a central axis perpendicular to the image plane of each one of the processed stereoscopic images 145. Moreover, the central axis passes through the center of each one of the processed stereoscopic images 145. Further, the plurality of processed stereoscopic images 145 may be overlaid in an order determined by the disparities A, B, C assigned to them.
For example, the processed stereoscopic image 145 with the highest assigned disparity A may come on top, followed by the remaining processed stereoscopic images 145 in decreasing order of disparity B, C. In the example of Fig. 4, the circle 332 with the highest disparity A is overlaid over the triangle 334 and the striped pattern 336. The triangle 334 is overlaid over the striped pattern 336, given that its disparity B is higher than the disparity C of the striped pattern 336.
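A sketch of this overlaying, assuming same-sized single-channel numpy images in which zero-valued pixels are treated as transparent, could look as follows; the data structures are the same hypothetical dictionaries as in the earlier sketch.

```python
import numpy as np

def combine_stereoscopic_images(processed, disparities):
    """Overlay processed stereoscopic images; the layer with the highest disparity ends on top."""
    order = sorted(disparities, key=disparities.get)  # lowest disparity first, i.e. bottom layer
    first_left, first_right = next(iter(processed.values()))
    combined_left = np.zeros_like(first_left)
    combined_right = np.zeros_like(first_right)
    for label in order:
        left, right = processed[label]
        # Non-zero pixels of the current layer overwrite whatever lies below.
        combined_left = np.where(left != 0, left, combined_left)
        combined_right = np.where(right != 0, right, combined_right)
    return combined_left, combined_right
```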
The combined stereoscopic image 155 may be used as output on the above-mentioned specialized display 408. Depending on the type of display, the viewer 400 may have to wear the above-mentioned specialized glasses 406 in order to properly view the combined stereoscopic image 155. The specialized glasses 406 and display 408 may be part of a display device 320 of the medical observation apparatus 304.
The medical observation apparatus 304 may further comprise an optical instrument 322 configured to capture the input image data 115. The optical instrument 322 may be a stereoscopic digital camera, but it may also be a digital camera, a multispectral camera, a hyperspectral camera or a time-of-flight camera. More particularly, the digital camera may be a digital RGB camera and/or a digital reflectance camera. The multispectral camera may comprise a digital fluorescence-light camera and optionally further digital fluorescence-light cameras.
The data processing device 300 may also be part of the medical observation apparatus 304. In particular, the data processing device 300 may be integrated in the microscope 310 as an embedded processor 302 or as part of such embedded processor 302.
The viewer 400 may be given an immersive experience, when the display device 320 utilizes a so-called active viewing system 500. In this case, the data processing device 300 may be configured to receive a user input 175 via the input device 328 and the user interface 330. The user input 175 may represent a viewing position of the viewer with respect to the display device 320. Based on the received user input 175, the data processing device 300 may be configured to adjust the assigned disparities A, B, C. The adjustment 180 of the assigned disparities A, B, C may take place when the received user input 175 indicates a change in a viewing angle and/or a viewing distance of the viewer 400 with respect to the display device 320. This change is indicated with the arrows 502 in Figs. 5 and 6.
For example, the user input 175 may be face detection data 504 received from a face-tracking camera 506 that analyses the viewer’s eyes/gaze. The data processing device 300 and/or the face-tracking camera 506 may be configured to derive, from the face detection data 504, a value 508 representing the viewing angle and/or another value 510 representing the viewing distance of the viewer 400 with respect to the display device 320. Further, the data processing device 300 and/or the face-tracking camera 506 may be configured to determine the change in the viewing angle and/or the viewing distance when the value 508 or the other value 510 exceeds a threshold. Said threshold may be derived from previously received user input 175 or face detection data 504.
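A sketch of such a threshold test, with the threshold derived from previously received values as described above, could be as simple as the following; the running-mean threshold and the factor 1.5 are illustrative assumptions.

```python
def viewing_change_detected(value, previous_values, factor=1.5):
    """Return True when a new viewing-angle or viewing-distance value exceeds a
    threshold derived from earlier face detection data (here: a scaled mean)."""
    if not previous_values:
        return False
    threshold = factor * sum(previous_values) / len(previous_values)
    return abs(value) > threshold
```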
The adjustment 180 of the assigned disparities A, B, C may comprise determining an updated disparity A’, B’, C’ to be assigned to the plurality of stereoscopic images 135. The updated disparity A’, B’, C’ reflects the changed viewing angle and/or viewing distance and can be used to
update the combined stereoscopic image 155. In other words, the disparity A, B, C assigned to the plurality of stereoscopic images 135 is updated for each one of the stereoscopic images 135 to reflect the change in the viewing angle and/or viewing distance. That is, the data processing device 300 is configured to dynamically compensate for the change in the viewing angle and/or viewing distance.
Optionally, the assigned disparities A, A’, B, B’, C, C’ may differ between the left digital image L and the right digital image R. For example, if the face-tracking camera 506 registers a movement of the viewer to the right side, the disparity A, A’, B, B’, C, C’ assigned to the right digital image R may be larger than the disparity A, A’, B, B’, C, C’ assigned to the left digital image L. Additionally, the larger the original disparity A, B, C was, the larger the updated disparity A’, B’, C’ may be. This is shown in Fig. 5 and allows the viewer to “peek behind” the top layer by moving the head sideways.
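One way to sketch this behaviour, assuming a signed head_offset obtained from the face-tracking camera (positive for a movement to the right), a hypothetical gain parameter and per-eye disparity tuples, is the following; the specific mapping is an assumption, not the application's prescribed formula.

```python
def update_disparities(disparities, head_offset, gain=0.1):
    """Per-eye disparity update: the image for the eye towards which the viewer moves
    receives the larger disparity, and larger original disparities grow more."""
    scale = 1.0 + gain * abs(head_offset)
    updated = {}
    for label, d in disparities.items():
        d_left = d * scale if head_offset < 0 else d
        d_right = d * scale if head_offset > 0 else d
        updated[label] = (d_left, d_right)
    return updated
```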
Moreover, the method 100 may comprise the step 170 of receiving another user input 196 indicating a disparity value. The combined stereoscopic image 155 may then be updated based on the disparity value, wherein processed stereoscopic images 145 having assigned a disparity 230 exceeding the disparity value are omitted from being overlaid when generating the combined stereoscopic image 155 (see Fig. 6).
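A sketch of this filtering step, reusing the hypothetical dictionaries from the earlier sketches, could be:

```python
def filter_by_disparity(processed, disparities, disparity_value):
    """Omit processed stereoscopic images whose assigned disparity exceeds the
    user-provided disparity value before they are overlaid."""
    return {label: pair for label, pair in processed.items()
            if disparities[label] <= disparity_value}
```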
In order to enhance the three-dimensional depth perceived by the viewer 400, the data processing device 300 may be configured to modify a size scale of the processed stereoscopic images 145 based on the respective assigned disparities A, B, C. Herein, the size scale represents the angular extent or the lateral extent of the processed stereoscopic images 145. In particular, the size scale represents a maximal angular diameter which the stereoscopic image will cover on the display device 320 after the plurality of processed stereoscopic images 145 are overlaid to generate the combined stereoscopic image 155.
For example, the data processing device 300 may be configured to magnify the processed stereoscopic images 145 based on their assigned disparity A, B, C. Herein, magnifying denotes scaling the processed stereoscopic images 145 by a scaling factor. The scaling factor may be greater or smaller than 1. When magnifying, or more generally modifying the size scale of, a stereoscopic image, the aspect ratio is maintained. That is, the scaled processed stereoscopic image has the same aspect ratio as before the scaling was applied to it.
Preferably, the smaller the disparity A, B, C assigned to the processed stereoscopic image 145, the smaller the size scale/scaling factor. For the viewer, this amplifies the impression that an object 225 shown in a processed stereoscopic image 145 with a small disparity is further away than an object 225 shown in a processed stereoscopic image with a large disparity. In the example of Fig. 6, the striped pattern 336 has the smallest scaling factor applied to it, since its disparity C is also the smallest. The triangle 334 with the second smallest disparity B receives the second smallest scaling factor. The circle 332 with the largest disparity A may remain unchanged or may even be magnified by a scaling factor greater than 1.
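A minimal sketch of such disparity-dependent scaling, using nearest-neighbour resampling so that the aspect ratio is preserved, might look as follows; the linear mapping from disparity to scaling factor is an assumption made for the example.

```python
import numpy as np

def scaling_factor(disparity, max_disparity):
    """Smaller assigned disparity -> smaller scaling factor; the top layer stays near 1."""
    return 0.5 + 0.5 * (disparity / max_disparity)

def rescale(image, factor):
    """Nearest-neighbour rescale with the same factor on both axes (aspect ratio kept)."""
    h, w = image.shape[:2]
    rows = np.clip((np.arange(int(h * factor)) / factor).astype(int), 0, h - 1)
    cols = np.clip((np.arange(int(w * factor)) / factor).astype(int), 0, w - 1)
    return image[np.ix_(rows, cols)]
```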
Optionally, the size scale/scaling factor may also be updated based on the face detection data 504, in particular the value 510 representing the viewing distance of the viewer 400 with respect to the display device 320. This is shown in Fig. 6.
Figs. 8 to 12 show embodiments where the data processing device is configured to apply, prior to combining the plurality of processed stereoscopic images 145, a post-processing transform 160 to at least a subset of the plurality of processed stereoscopic images 145. Such a post-processing transform 160 may be a perspective scaling 800 (see Fig. 8), a thickening surfaces effect 900 (see Fig. 9), a depth-of-focus adjustment 1000 (see Fig. 10), a brightness adjustment 1100 (see Fig. 11) and a transparency adjustment 1200 (see Fig. 12).
The perspective scaling 800 involves shrinking or magnifying the size/dimensions of the stereoscopic image 135 before overlaying the processed stereoscopic images 145, whilst maintaining the aspect ratio of the stereoscopic image 135. Thereby, smaller objects appear further away than larger objects.
The thickening surfaces effect 900 may comprise cloning pixel values from a midpoint location 902 to an end location 904 of a disparity adjustment 700. The midpoint location 902 resides between a start location 906 and the end location 904 of the disparity adjustment 700.
The depth-of-focus adjustment 1000 increases a blur level of a stereoscopic image 135 when overlaying the processed stereoscopic images 145. For example, a blur filter, such as a Gaussian filter with a kernel size proportional to the required blur level of the processed stereoscopic image 145, may be applied to the stereoscopic image 135. The stronger the applied blur, the further away an object appears to be.
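For a single-channel image pair, a hedged sketch of this adjustment could use SciPy's Gaussian filter, with a blur strength that grows as the assigned disparity shrinks; the linear mapping and the max_sigma parameter are assumptions for illustration.

```python
from scipy.ndimage import gaussian_filter

def depth_of_focus(left, right, disparity, max_disparity, max_sigma=5.0):
    """Blur both digital images of a stereoscopic pair; layers with a smaller
    assigned disparity receive a stronger blur and thus appear further away."""
    sigma = max_sigma * (1.0 - disparity / max_disparity)
    return gaussian_filter(left, sigma=sigma), gaussian_filter(right, sigma=sigma)
```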
The brightness adjustment 1100 increases a brightness level of a stereoscopic image 135 when overlaying the processed stereoscopic images 145. The increased brightness allows highlighting certain stereoscopic images in the front, while fading out other stereoscopic images in the back.
The transparency adjustment 1200 increases a transparency level of a stereoscopic image 135 when overlaying the processed stereoscopic images 145. The increased transparency allows seeing through a stereoscopic image and observing underlying stereoscopic images.
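Both the brightness and the transparency adjustment can be sketched as simple per-pixel operations on 8-bit images; the gain and alpha parameters are illustrative assumptions and would in practice be chosen depending on the assigned disparities.

```python
import numpy as np

def adjust_brightness(image, gain):
    """Brightness adjustment: scale pixel values and clip to the valid 8-bit range."""
    return np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def blend_transparent(top, underlying, alpha):
    """Transparency adjustment: alpha-blend a layer over the underlying layer(s)."""
    blended = alpha * top.astype(np.float32) + (1.0 - alpha) * underlying.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)
```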
Preferably, the same post-processing transform 160 is applied to each single digital image 324 of the pair of digital images L, R that make up a stereoscopic image 135 or a processed stereoscopic image 145. For example, the same blur filter is applied to the left digital image L and the right digital image R.
In particular, the post-processing transform 160 may be applied based on the disparity assigned to the processed stereoscopic image 145. Further, the data processing device 300 may be configured to designate a stereoscopic image 135 from the plurality of processed stereoscopic images 145 as a background 908, wherein a disparity D assigned to the background 908 is a global minimum or maximum among the disparities A, B, C, D assigned to the plurality of stereoscopic images 135. The background 908 may serve as a reference for the viewer 400.
Optionally, the data processing device 300 may be configured to increase or decrease a weight of the post-processing transform 160, when applying the post-processing transform 160 to a stereoscopic image 135, depending on how close that stereoscopic image 135 is to the background 908, i.e. based on the distance between the background 908 and the stereoscopic image 135. Herein, a difference between the disparities A, B, C, D assigned to two stereoscopic images 135 of the plurality of stereoscopic images 135 may define a distance between these two stereoscopic images 135.
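Expressed as a sketch, the distance between two layers is simply the difference of their assigned disparities, and the transform weight is a function of the distance to the background; the linear mapping and its direction are assumptions, since the weight may be increased or decreased.

```python
def transform_weight(disparity, background_disparity, max_distance, base_weight=1.0):
    """Weight of the post-processing transform as a function of the distance to the
    background layer, where distance = difference of the assigned disparities."""
    distance = abs(disparity - background_disparity)
    # Closer to the background -> larger weight (the opposite mapping is equally possible).
    return base_weight * (1.0 - distance / max_distance)
```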
As can be seen in Fig. 9, the disparity applied to each one of the plurality of stereoscopic images 135 may be different from all the other assigned disparities. That is, each stereoscopic image 135 may have assigned thereto a unique disparity. Therefore, the circle 332, the triangle 334, the striped pattern 336 and the background 908 all appear in different depths of the three-dimensional view 404.
Alternatively, stereoscopic images representing the same ensemble of different categories may have the same disparity assigned thereto. For example, two or more stereoscopic images may be designated as the background 908 and, as such, share the same disparity. In Fig. 7, for example, the background 908 and the striped pattern 336 have the same disparity and thus appear in the same depth of the three-dimensional view 404.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Some embodiments relate to a microscope comprising a system as described in connection with one or more of the Figs. 1 to 12. Alternatively, a microscope may be part of or connected to a system as described in connection with one or more of the Figs. 1 to 12. Fig. 13 shows a schematic illustration of a system 1300 configured to perform a method described herein. The system 1300 comprises a microscope 310 and a computer system 1320. The microscope 310 is configured to take images and is connected to the computer system 1320. The computer system 1320 is configured to execute at least a part of a method described herein. The computer system 1320 may be configured to execute a machine learning algorithm. The computer system 1320 and microscope 310 may be separate entities but can also be integrated together in one common housing. The computer system 1320 may be part of a central processing system of the microscope 310 and/or the computer system 1320 may be part of a subcomponent of the microscope 310, such as a sensor, an actuator, a camera or an illumination unit, etc. of the microscope 310.
The computer system 1320 may be a local computer device (e.g. personal computer, laptop, tablet computer or mobile phone) with one or more processors and one or more storage devices or may be a distributed computer system (e.g. a cloud computing system with one or more processors and one or more storage devices distributed at various locations, for example, at a local client and/or one or more remote server farms and/or data centers). The computer system 1320 may comprise any circuit or combination of circuits. In one embodiment, the computer system 1320 may include one or more processors which can be of any type. As used herein, processor may mean any type of computational circuit, such as but not limited to a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a graphics processor, a digital signal processor (DSP), multiple core processor, a field programmable gate array (FPGA), for example, of a microscope or a microscope component (e.g. camera) or any other type of processor or processing circuit. Other types of circuits that may be included in the computer system 1320 may be a custom circuit, an application-specific integrated circuit (ASIC), or the like, such as, for example, one or more circuits (such as a communication circuit) for use in wireless devices like mobile telephones, tablet computers, laptop computers, two-way
radios, and similar electronic systems. The computer system 1320 may include one or more storage devices, which may include one or more memory elements suitable to the particular application, such as a main memory in the form of random access memory (RAM), one or more hard drives, and/or one or more drives that handle removable media such as compact disks (CD), flash memory cards, digital video disk (DVD), and the like. The computer system 1320 may also include a display device, one or more speakers, and a keyboard and/or controller, which can include a mouse, trackball, touch screen, voice-recognition device, or any other device that permits a system user to input information into and receive information from the computer system 1320.
Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a processor, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the present invention is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the present invention is, therefore, a storage medium (or a data carrier, or a computer-readable medium) comprising, stored thereon, the computer program for performing one of the methods described herein when it is performed by a processor. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory. A further embodiment of the present invention is an apparatus as described herein comprising a processor and the storage medium.
A further embodiment of the invention is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
REFERENCE NUMERALS
100 method
110 step
115 input image data
116 raster image
117 data cube
118 digital fluorescence-light input image
119 RGB image
120 step
125 category
126 image segmentation
127 object class
128 spectral unmixing
129 semantic label
130 step
135 stereoscopic image
136 step
140 step
145 processed stereoscopic image
150 step
155 combined stereoscopic image
160 post-processing transform
170 step
175 user input
180 adjustment
190 step
195 user selection input
196 user input
200 scene
202 cross symbol
204 smiley symbol
206 flash symbol
225 object
230 disparity
300 data processing device
302 embedded processor
304 medical observation apparatus
306 fluorophore
310 microscope
312 non-zero pixel
314 zero pixel
316 binary mask
318 transparent pixel
320 display device
322 optical instrument
324 two-dimensional input image
326 image layer
328 input device
330 user interface
332 circle
334 triangle
336 striped pattern
400 viewer
402 user
404 three-dimensional view
406 specialized glasses
408 specialized display
410 disparity transform
412 disparity shift
414 horizontal axis
500 active viewing system
502 arrow
504 face detection data
506 face-tracking camera
508 value
510 value
700 disparity adjustment
800 perspective scaling
900 thickening surfaces effect
902 midpoint location
904 end location
906 start location
908 background
1000 depth-of-focus adjustment
1100 brightness adjustment
1200 transparency adjustment
1300 system
1320 computer system
A, B, C disparity
A’, B’, C’ updated disparity
D disparity assigned to the background
L left digital image
R right digital image
Claims
1. A data processing device (300) for a medical observation apparatus, such as an endoscope or microscope, the data processing device being configured to:
- obtain (110) input image data (115), the input image data representing a scene acquired by the medical observation apparatus;
- analyze (120) the input image data to determine different categories (125) in the scene;
- generate (130) a plurality of stereoscopic images (135) from the input image data (115), each one of the stereoscopic images representing a different category (125) determined in the scene;
- assign (140), based on the determined categories, a different disparity to each of the plurality of stereoscopic images to produce a plurality of processed stereoscopic images (145); and
- combine (150) the plurality of processed stereoscopic images (145) to generate a combined stereoscopic image (155).
2. The data processing device (300) of claim 1, further being configured to modify (160) a size scale of a stereoscopic image of the plurality of stereoscopic images (135) based on the assigned disparity to the stereoscopic image.
3. The data processing device (300) of claim 1 or 2, further being configured to
- receive (170) a user input (175) that represents a viewing position of a user with respect to a display device, and
- adjust (180) the assigned disparities based on the received user input (175).
4. The data processing device (300) according to any one of claims 1 to 3, further being configured to:
- apply (160), prior to the combining (150), a post-processing transform to at least a subset of the plurality of processed stereoscopic images (145).
5. The data processing device (300) according to claim 4, wherein the post-processing transform (160) is at least one of: a transparency adjustment, a brightness adjustment, a depth-of-focus adjustment, a perspective scaling, and a thickening surfaces effect.
6. The data processing device (300) according to claim 4 or 5, wherein the post-processing transform (160) is applied based on the assigned disparity to the processed stereoscopic image (145), and the data processing device is configured to
- designate a stereoscopic image from the plurality of processed stereoscopic images (145) as a background, wherein the disparity assigned to the background is a global minimum or maximum among the disparities assigned (230) to the plurality of stereoscopic images (135), wherein a difference between the disparities (230) assigned to two stereoscopic images of the plurality of stereoscopic images (135) defines a distance between the two stereoscopic images; and
- increase or decrease a weight of the post-processing transform (160), when applying the post-processing transform to the processed stereoscopic image, the closer the stereoscopic image is to the background, based on the distance between the background and the stereoscopic image.
7. The data processing device (300) according to any one of claims 1 to 6, further being configured to decompose (130), by spectral unmixing, a mixed pixel in the input image data (115) into a set of endmembers and fractions, wherein each one of the plurality of stereoscopic images (135) represents a different endmember in the set of endmembers obtained from the spectral unmixing.
8. The data processing device (300) according to any one of claims 1 to 7, further being configured to decompose (130) the input image data (115) into objects by image segmentation, and assign an object to a stereoscopic image of the plurality of the stereoscopic images (135).
9. The data processing device (300) according to claim 8, wherein the image segmentation is a semantic image segmentation.
10. The data processing device (300) according to any one of claims 3 to 9, further being configured to
- receive (170) a user selection input (195); and
- when the data processing device further is at least dependent on claim 3, deactivate adjusting (180) of the combined stereoscopic image (155) based on the received user selection input (195); and/or
- when the data processing device further is at least dependent on claim 4, deactivate applying the post-processing transform (160) based on the received user selection input (195).
11. The data processing device (300) according to any one of claims 1 to 10, further being configured to
- receive (170) another user input (196) indicating a disparity value; and
- update (180) the combined stereoscopic image (155) based on the disparity value (196), wherein processed stereoscopic images (145) having assigned a disparity (A, B, C) exceeding the disparity value are omitted from being overlaid (150) when generating the combined stereoscopic image (155).
12. A medical observation apparatus (304) comprising a data processing device according to any one of claims 1 to 11, further comprising:
an optical instrument being configured to
- capture the input image data (115); and
- provide the input image data (115) to the data processing device;
a user interface being configured to receive a user input (175, 196) and/or a user selection input (195); and
a display device being configured to receive the combined stereoscopic image (155).
13. A computer-implemented method (100) for processing input image data from a medical observation apparatus, such as a microscope or endoscope, the method comprising the steps of:
- obtaining (110) the input image data (115) representing a scene imaged by the medical observation apparatus;
- analyzing (120) the input image data (115) to determine different categories (125) in the imaged scene;
- generating (130) a plurality of stereoscopic images (135) from the input image data (115), each one of the stereoscopic images representing a different category (125) determined in the imaged scene;
- assigning (140), based on the determined categories, a different disparity (A-C) to each of the plurality of stereoscopic images (135) resulting in a plurality of processed stereoscopic images (145); and
- combining (150) the plurality of processed stereoscopic images with one another resulting in a combined stereoscopic image (155).
14. The computer-implemented method (100) of claim 13, wherein the analyzing of the input image data further comprises: performing semantic image segmentation on the input image data (115); and/or spectral unmixing of the input image data (115).
15. A computer-readable medium comprising instructions, which, when executed by a computer, cause the computer to carry out the method of claim 13 or 14.
16. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of claim 13 or 14.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE102023124610 | 2023-09-12 | | |
| DE102023124610.0 | 2023-09-12 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025056614A1 (en) | 2025-03-20 |
Family
ID=92762174
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2024/075382 (WO2025056614A1, pending) | Data processing device, medical observation apparatus and method | 2023-09-12 | 2024-09-11 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025056614A1 (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2013049347A1 (en) * | 2011-09-27 | 2013-04-04 | California Institute Of Technology | Programmable spectral source and design tool for 3d imaging using complementary bandpass filters |
| WO2015023990A1 (en) * | 2013-08-15 | 2015-02-19 | The Trustees Of Dartmouth College | Method and apparatus for quantitative and depth resolved hyperspectral fluorescence and reflectance imaging for surgical guidance |
| US20190045170A1 (en) * | 2016-02-24 | 2019-02-07 | Sony Corporation | Medical image processing device, system, method, and program |
| US20200193580A1 (en) * | 2018-12-14 | 2020-06-18 | Spectral Md, Inc. | System and method for high precision multi-aperture spectral imaging |
| US20220094901A1 (en) * | 2020-09-23 | 2022-03-24 | Proprio, Inc. | Endoscopic imaging systems for generating three dimensional images, and associated systems and methods |
| WO2022272002A1 (en) * | 2021-06-25 | 2022-12-29 | Activ Surgical, Inc. | Systems and methods for time of flight imaging |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| TW201243763A (en) | Method for 3D video content generation | |
| KR20110016896A (en) | Multidimensional image generation method and system | |
| JP2015156607A (en) | Image processing apparatus, image processing apparatus, and electronic apparatus | |
| JP7105990B2 (en) | Deconvolution Apparatus and Deconvolution Method Using Local Signal-to-Noise Ratio | |
| US20160180514A1 (en) | Image processing method and electronic device thereof | |
| WO2023273111A1 (en) | Image processing method and apparatus, and computer device and storage medium | |
| CN115035004A (en) | Image processing method, apparatus, device, readable storage medium and program product | |
| EP1323135A1 (en) | Method for automated two-dimensional and three-dimensional conversion | |
| US6252982B1 (en) | Image processing system for handling depth information | |
| US20250350700A1 (en) | Image processor and computer-implemented method for a medical observation device, using a location-dependent color conversion function | |
| Wang et al. | Salience guided depth calibration for perceptually optimized compressive light field 3D display | |
| Paudyal et al. | Characterization and selection of light field content for perceptual assessment | |
| US10298914B2 (en) | Light field perception enhancement for integral display applications | |
| WO2025056614A1 (en) | Data processing device, medical observation apparatus and method | |
| Rolff et al. | Interactive VRS-NeRF: Lightning fast neural radiance field rendering for virtual reality | |
| KR101849696B1 (en) | Method and apparatus for obtaining informaiton of lighting and material in image modeling system | |
| US20250143652A1 (en) | Method, processor, and medical observation device using two color images and color cameras for fluorescence and white-light | |
| Rößing et al. | Real‐Time Disparity Map‐Based Pictorial Depth Cue Enhancement | |
| JP5545059B2 (en) | Moving image processing method, moving image processing apparatus, and moving image processing program | |
| JP5515864B2 (en) | Image processing method, image processing apparatus, and image processing program | |
| JP6666298B2 (en) | Video generation apparatus, video presentation system, method, and program | |
| Schemali et al. | ChromoStereoscopic rendering for trichromatic displays | |
| JP5711634B2 (en) | Image processing apparatus, image processing method, and image processing program | |
| JP2025517201A | Data processing apparatus and computer-implemented method for combining two images and overlay colors using a uniform color space | |
| WO2025056578A1 (en) | Data processing device and computer-implemented method for displaying blood oxygenation and concentration values in a medical observation device and medical observation device and method for its use |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24769353; Country of ref document: EP; Kind code of ref document: A1 |