CN111813532A - Image management method and device based on multitask machine learning model
- Publication number: CN111813532A
- Application number: CN202010923050.2A
- Authority: CN (China)
- Prior art keywords: task, network, target, machine learning, image
- Legal status: Granted
Classifications
- G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06N 20/00: Machine learning
- G06T 1/00: General purpose image data processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
The application discloses an image management method and device based on a multitask machine learning model, relating to artificial-intelligence machine learning technology. The method acquires image data; inputs each target image into a shared feature expression network in the multitask machine learning model to obtain task output features; inputs the task output features into a plurality of subtask networks, respectively, to obtain recognition results; and processes the image data based on the recognition results. This realizes a machine-learning-based image management process: because the shared feature expression network extracts the features for executing several tasks at once, and different subtask networks each execute their corresponding task, the whole image management task can be carried out by a single multitask machine learning model, which improves image management efficiency.
Description
Technical Field
The application relates to the technical field of computers, in particular to an image management method and device based on a multitask machine learning model.
Background
With the popularization of smartphones and the ever-growing number of photos on them, more and more users need their phone photos organized in real time. These tasks have several notable characteristics. First, the tasks are strongly correlated: the scene type of a photo is closely related to the main objects in it, and quality indicators such as blur and exposure are also related to the scene. Moreover, photo scene recognition, object detection and photo quality evaluation all depend on the texture, contour and lighting characteristics of the photos. Second, the processing imposes tight requirements on model size and computational complexity: because the model must be deployed on the mobile terminal, its storage footprint and computation time are strictly limited.
Conventionally, a separate machine learning model is built for each single task involved in the image organization process to perform recognition or detection.
However, image organization involves many tasks; designing a dedicated model for each one occupies a large amount of system resources, and the preprocessing time and workflow of the multiple models are complex to design, which degrades the efficiency of image management.
Disclosure of Invention
In view of this, the present application provides an image management method based on machine learning, which can effectively improve the efficiency of image management.
A first aspect of the present application provides an image management method based on machine learning, which can be applied to a system or a program that includes an image management function based on machine learning in a terminal device, and specifically includes:
acquiring image data, wherein the image data comprises a plurality of target images;
inputting the target image into a shared feature expression network in a multi-task machine learning model to obtain task output features, wherein the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating sub-task features under different resolutions, and the task output features are obtained by processing based on the sub-task features;
inputting the task output features into a plurality of subtask networks in the multitask machine learning model respectively to obtain recognition results, wherein the subtask networks are associated with the shared feature expression network and correspond to the sub network layers;
and processing the image data based on the identification result.
Optionally, in some possible implementations of the present application, the respectively inputting the task output features into a plurality of subtask networks in the multitask machine learning model to obtain a recognition result includes:
determining a target task corresponding to the subtask network, wherein the target task comprises a target detection task, a scene recognition task or an image quality evaluation task;
and extracting characteristic parameters in the task output characteristics based on the target task to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the extracting, based on the target task, feature parameters in the task output features to obtain the recognition result includes:
if the target task is a target detection task, analyzing a feature map indicated in the task output features based on the target detection task to obtain a resolution and a detection frame corresponding to the feature map;
assigning the detection frames to the feature maps at the corresponding resolutions to obtain labeled feature maps;
and constructing a detection frame regression network and a detection frame classification network according to the marked feature map so as to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the constructing a detection frame regression network and a detection frame classification network according to the labeled feature map to obtain the identification result includes:
constructing a detection frame regression network according to the marked feature map to obtain a target position of a detection frame;
inputting the target position of the detection frame into a detection frame classification network to obtain the image category;
and dividing the image data based on the image category to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the extracting, based on the target task, feature parameters in the task output features to obtain the recognition result includes:
if the target task is a scene recognition task or an image quality evaluation task, analyzing the feature map indicated in the task output features based on the target task;
and respectively inputting the feature maps into a classification network corresponding to the subtask network to obtain the identification result, wherein the classification network is set based on the scene identification task or the image quality evaluation task.
Optionally, in some possible implementation manners of the present application, the respectively inputting the feature maps into the classification networks corresponding to the subtask networks to obtain the recognition result includes:
respectively inputting the feature maps into classification networks corresponding to the subtask networks to obtain a classification result set;
and smoothing the prediction probability value indicated in the classification result set to obtain the identification result.
Optionally, in some possible implementations of the present application, the method further includes:
acquiring a convolution tensor corresponding to the feature expression layer;
decomposing a convolutional layer contained in the feature expression layer based on the convolution tensor to obtain a low-rank representation of the feature expression layer;
updating the feature expression layer according to the low rank representation.
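To make this concrete, below is a minimal sketch, assuming a PyTorch setting, of how a convolutional layer's kernel could be given a low-rank representation via truncated SVD of its matricization; the function name, the matricization scheme and the fixed rank are illustrative assumptions, not the patent's prescribed decomposition.

```python
import torch

def low_rank_conv_weights(weight: torch.Tensor, rank: int):
    """Approximate a conv kernel of shape (out_c, in_c, kh, kw) by a
    rank-`rank` factorization of its (out_c) x (in_c*kh*kw) matricization."""
    out_c, in_c, kh, kw = weight.shape
    mat = weight.reshape(out_c, in_c * kh * kw)
    u, s, vh = torch.linalg.svd(mat, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]   # (out_c, rank), singular values folded in
    v_r = vh[:rank, :]             # (rank, in_c*kh*kw)
    # u_r @ v_r is the low-rank representation that replaces the original
    # feature expression layer weights.
    return u_r, v_r.reshape(rank, in_c, kh, kw)
```

In practice the two factors can be realized as a rank-channel k x k convolution followed by a 1 x 1 convolution, reducing both parameters and computation.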
Optionally, in some possible implementations of the present application, the method further includes:
acquiring an image training set associated with the target photo album;
calling a target loss function based on the training set to train the multitask machine learning model so as to update the multitask machine learning model, wherein the target loss function comprises a sub-loss function corresponding to the sub-network layer, and the sub-loss function is used for indicating the composition of the target loss function.
Optionally, in some possible implementation manners of the present application, the invoking a target loss function based on the training set to train the multitask machine learning model so as to update the multitask machine learning model includes:
calling a corresponding sub-loss function according to the type of the target task;
performing a weighted calculation based on the sub-loss functions to obtain a target loss function corresponding to the sub-network layer;
and calling a target loss function based on the training set to train the multi-task machine learning model so as to update the multi-task machine learning model.
Optionally, in some possible implementations of the present application, the method further includes:
traversing a corresponding pre-training detection model based on the shared feature expression network;
and calling model parameters of the pre-training detection model to update the parameters of the shared feature expression network.
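As an illustration of calling the model parameters of the pre-training detection model, a minimal sketch follows, assuming PyTorch state dictionaries and name-plus-shape matching (both assumptions made here for the example):

```python
import torch

def init_from_pretrained(shared_net: torch.nn.Module,
                         pretrained_detector: torch.nn.Module) -> None:
    """Copy every parameter whose name and shape match from the
    pre-trained detection model into the shared feature expression network."""
    own = shared_net.state_dict()
    pre = pretrained_detector.state_dict()
    own.update({k: v for k, v in pre.items()
                if k in own and v.shape == own[k].shape})
    shared_net.load_state_dict(own)
```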
Optionally, in some possible implementation manners of the present application, the traversing the corresponding pre-training detection model based on the shared feature expression network includes:
acquiring label information corresponding to the target image input into the shared feature expression network;
and calling the pre-training detection model according to the label information.
Optionally, in some possible implementation manners of the present application, the image data is album data, the feature expression layer is a residual network, the sub-network layer is a branch of a feature pyramid network, and processing the image data based on the identification result includes:
and performing image classification, image search or image association on the album data based on the identification result.
A second aspect of the present application provides an image management apparatus based on machine learning, including: an acquisition unit configured to acquire image data including a plurality of target images;
the input unit is used for inputting the target image into a shared feature expression network in a multitask machine learning model so as to obtain task output features, the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating sub-task features under different resolutions, and the task output features are obtained by processing based on the sub-task features;
the recognition unit is used for respectively inputting the task output characteristics into a plurality of subtask networks in the multitask machine learning model to obtain recognition results, the subtask networks are associated with the shared characteristic expression network, and the subtask networks correspond to the sub network layers;
and the management unit is used for processing the image data based on the identification result.
Optionally, in some possible implementation manners of the present application, the identifying unit is specifically configured to determine a target task corresponding to the subtask network, where the target task includes a target detection task, a scene identification task, or an image quality evaluation task;
the identification unit is specifically configured to extract feature parameters in the task output features based on the target task to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the identifying unit is specifically configured to, if the target task is a target detection task, analyze a feature map indicated in the task output feature based on the target detection task to obtain a resolution and a detection frame corresponding to the feature map;
the identification unit is specifically configured to assign the detection frames to the feature maps at the corresponding resolutions to obtain labeled feature maps;
the identification unit is specifically configured to construct a detection frame regression network and a detection frame classification network according to the labeled feature map to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the identification unit is specifically configured to construct a detection frame regression network according to the labeled feature map, so as to obtain a target position of a detection frame;
the identification unit is specifically used for inputting the target position of the detection frame into a detection frame classification network to obtain the image category;
the identification unit is specifically configured to divide the image data based on the image category to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the identifying unit is specifically configured to, if the target task is a scene recognition task or an image quality assessment task, analyze the feature map indicated in the task output features based on the target task;
the recognition unit is specifically configured to input the feature maps into classification networks corresponding to the subtask networks, respectively, to obtain the recognition results, where the classification networks are set based on the scene recognition task or the image quality assessment task.
Optionally, in some possible implementation manners of the present application, the identifying unit is specifically configured to input the feature maps into classification networks corresponding to the subtask networks, respectively, so as to obtain a classification result set;
the identification unit is specifically configured to perform smoothing processing on the prediction probability value indicated in the classification result set to obtain the identification result.
Optionally, in some possible implementations of the present application, the management unit is specifically configured to obtain a convolution tensor corresponding to the feature expression layer;
the management unit is specifically configured to decompose a convolutional layer included in the feature expression layer based on the convolution tensor to obtain a low-rank representation of the feature expression layer;
the management unit is specifically configured to update the feature expression layer according to the low-rank representation.
Optionally, in some possible implementation manners of the present application, the management unit is specifically configured to obtain an image training set associated with the target album;
the management unit is specifically configured to invoke a target loss function based on the training set to train the multitask machine learning model so as to update the multitask machine learning model, where the target loss function includes a sub-loss function corresponding to the sub-network layer, and the sub-loss function is used to indicate a composition of the target loss function.
Optionally, in some possible implementation manners of the present application, the management unit is specifically configured to call a corresponding sub-loss function according to the type of the target task;
the management unit is specifically configured to perform weighting calculation based on the sub-loss functions to obtain target loss functions corresponding to the sub-network layers;
the management unit is specifically configured to invoke a target loss function based on the training set to train the multi-task machine learning model, so as to update the multi-task machine learning model.
Optionally, in some possible implementation manners of the present application, the management unit is specifically configured to traverse a corresponding pre-training detection model based on the shared feature expression network;
the management unit is specifically configured to invoke the model parameters of the pre-training detection model to update the parameters of the shared feature expression network.
Optionally, in some possible implementation manners of the present application, the management unit is specifically configured to acquire tag information corresponding to the target image input to the shared feature expression network;
the management unit is specifically configured to call the pre-training detection model according to the tag information.
Optionally, in some possible implementation manners of the present application, the image data is album data, the feature expression layer is a residual network, the sub-network layer is a branch of a feature pyramid network, and the management unit is specifically configured to perform image classification, image search, or image association on the album data based on the identification result.
A third aspect of the present application provides a computer device comprising a memory, a processor and a bus system; the memory is used for storing program code, and the processor is configured to execute the machine-learning-based image management method according to the first aspect or any implementation of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to execute the machine-learning-based image management method of the first aspect or any implementation of the first aspect.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. A processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the machine learning-based image management method provided in the first aspect or the various alternative implementations of the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
acquiring image data of a target album, wherein the image data comprises a plurality of target images; then inputting the target image into a shared feature expression network in the multi-task machine learning model, or calling the shared feature expression network in the multi-task machine learning model to process the target image so as to obtain task output features, wherein the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating sub-task features under different resolutions, the task output features are obtained by processing the sub-task features, and the multi-task machine learning model comprises a shared feature expression network and a plurality of sub-task networks which are associated with each other; further, the task output characteristics are respectively input into a plurality of subtask networks in the multi-task machine learning model to obtain an identification result, the subtask networks are associated with the shared characteristic expression network, and the subtask networks correspond to the sub network layers; and then processing the image data based on the identification result to classify the images in the target album. The image management process based on machine learning is realized, and the task features for executing a plurality of tasks are extracted through the shared feature expression network, and the corresponding tasks are executed by adopting different subtask networks respectively, so that the execution process of the image management tasks can be realized through the multi-task machine learning model, and the image management efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a diagram of a network architecture in which an image management system operates;
fig. 2 is a flowchart of image management based on machine learning according to an embodiment of the present disclosure;
fig. 3 is a flowchart of an image management method based on machine learning according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a machine learning model according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of another machine learning model provided in an embodiment of the present application;
FIG. 6 is a flowchart of another image management method based on machine learning according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of another image management method based on machine learning according to an embodiment of the present disclosure;
fig. 8 is a scene schematic diagram of an image management method based on machine learning according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image management apparatus based on machine learning according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides an image management method based on machine learning and a related device, which can be applied to a system or program in terminal equipment containing a machine-learning-based image management function. Image data of a target album is acquired, the image data comprising a plurality of target images; the target image is then input into the shared feature expression network in the multitask machine learning model, or the shared feature expression network in the multitask machine learning model is called to process the target image, so as to obtain task output features, wherein the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating subtask features at different resolutions, the task output features are obtained by processing the subtask features, and the multitask machine learning model comprises the shared feature expression network and a plurality of mutually associated subtask networks; further, the task output features are respectively input into the plurality of subtask networks in the multitask machine learning model to obtain recognition results, where the subtask networks are associated with the shared feature expression network and correspond to the sub-network layers; the image data is then processed based on the recognition results to classify the images in the target album. This realizes the machine-learning-based image management process: the shared feature expression network extracts the task features for executing several tasks, and different subtask networks each execute their corresponding task, so the image management task can be carried out by the multitask machine learning model, improving image management efficiency.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms that may appear in the embodiments of the present application are explained.
Multitask machine learning model: a machine learning method that learns across multiple related tasks based on a shared representation.
Representation learning (feature learning): a form of learning in which the original data is automatically converted into data usable by machine learning, avoiding manual feature extraction.
Residual network (ResNet): a convolutional network structure that connects different feature layers through skip connections, forming residual blocks that learn residual features.
Feature pyramid network (FPN): a network structure designed on the feature pyramid concept, fusing multi-layer features to improve convolutional feature extraction.
Tensor low-rank analysis: approximating the original tensor with a low-rank tensor by minimizing the approximation error.
It should be understood that the machine-learning-based image management method provided by the present application may be applied to a system or program in a terminal device that includes a machine-learning-based image management function, such as an album manager. Specifically, the image management system may operate in the network architecture shown in fig. 1, a diagram of the network architecture in which the image management system runs. As the figure shows, the image management system can provide image management for multiple information sources: it receives the image data sent by a terminal, inputs it into the machine learning model for recognition, classification and related processing, and returns the processed result to the terminal. It is understood that although fig. 1 shows several kinds of terminal devices, in an actual scene more or fewer types of terminal devices may participate in the machine-learning-based image management process; the specific number and types depend on the actual scene and are not limited here.
In this embodiment, the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
It is understood that the image management system described above may run on a personal mobile terminal, for example as an album manager; it may also run on a server, or on a third-party device, to provide machine-learning-based image management and obtain the corresponding processing results for an information source. The image management system may run on such a device in the form of a program, as a system component, or as one of several cloud service programs; the specific operation mode depends on the actual scene and is not limited here.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and formal education learning.
With the popularization of smartphones and the ever-growing number of photos on them, more and more users need their phone photos organized in real time. These tasks have several notable characteristics. First, the tasks are strongly correlated: the scene type of a photo is closely related to the main objects in it, and quality indicators such as blur and exposure are also related to the scene. Moreover, photo scene recognition, object detection and photo quality evaluation all depend on the texture, contour and lighting characteristics of the photos. Second, the processing imposes tight requirements on model size and computational complexity: because the model must be deployed on the mobile terminal, its storage footprint and computation time are strictly limited.
Conventionally, a separate machine learning model is built for each single task involved in the image organization process to perform recognition or detection.
However, image organization involves many tasks; designing a dedicated model for each one occupies a large amount of system resources, and the preprocessing time and workflow of the multiple models are complex to design, which degrades the efficiency of image management.
To solve the above problems, the present application proposes an image management method based on machine learning, applied within the machine-learning-based image management flow framework shown in fig. 2, provided in an embodiment of the present application. A user selects the relevant images through the interface layer; if the images need to be classified automatically, the multitask machine learning model at the application layer can be called to run the multitask recognition process on the images, with the image features shared among the tasks, thereby improving the efficiency of image management.
It can be understood that the method provided by the present application may be implemented as a program written as processing logic in a hardware system, or as an image management apparatus based on machine learning that realizes this processing logic in an integrated or external manner. The machine-learning-based image management apparatus acquires image data of a target album, wherein the image data comprises a plurality of target images; then inputs the target image into the shared feature expression network in the multitask machine learning model, or calls the shared feature expression network in the multitask machine learning model to process the target image, so as to obtain task output features, wherein the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating subtask features at different resolutions, the task output features are obtained by processing the subtask features, and the multitask machine learning model comprises the shared feature expression network and a plurality of mutually associated subtask networks; further, the task output features are respectively input into the plurality of subtask networks in the multitask machine learning model to obtain recognition results, where the subtask networks are associated with the shared feature expression network and correspond to the sub-network layers; the image data is then processed based on the recognition results to classify the images in the target album. This realizes the machine-learning-based image management process: the shared feature expression network extracts the task features for executing several tasks, and different subtask networks each execute their corresponding task, so the image management task can be carried out by the multitask machine learning model, improving image management efficiency.
The scheme provided by the embodiment of the application relates to an artificial intelligence machine learning technology, and is specifically explained by the following embodiment:
With reference to the above flow architecture, the image management method based on machine learning in the present application is described below. Please refer to fig. 3, a flowchart of an image management method based on machine learning according to an embodiment of the present application. The management method may be executed by a terminal device, by a server, or by both; the following takes execution by the terminal device as an example. The embodiment of the present application includes at least the following steps:
301. Acquire image data of the target album.
In this embodiment, the image data includes a plurality of target images. Specifically, the image data is a data set containing several images; it may be a pure image data set, or it may contain data in other content forms, for example image data being processed within a news feed. The specific image form depends on the actual scene.
302. Input the target image into the shared feature expression network in the multitask machine learning model, or call the shared feature expression network in the multitask machine learning model to process the target image, to obtain task output features.
In this embodiment, the shared feature expression network includes a feature expression layer and a plurality of sequentially associated sub-network layers; the sub-network layers generate subtask features at different resolutions, and the task output features are obtained by processing these subtask features. The feature expression layer is a convolutional neural network used for feature extraction from images, and parameters are shared among the sub-network layers, so image features can be extracted accurately and efficiently. In one possible scenario, the plurality of sequentially associated sub-network layers form a feature pyramid network structure: each associated branch of the feature pyramid network is a sub-network layer that generates subtask features at a different resolution, improving image processing efficiency.
Specifically, there are two ways to apply the multitask machine learning model. The target image can be input into the model directly, in which case the model acts as a management plug-in within an album application; or the model can be called once the target image is determined, in which case it acts as a model inside third-party album processing software. The specific mode depends on the actual scene.
It can be understood that the feature expression layer and the sub-network layers together form the backbone network on which the subtask networks are built; once features have been extracted from the target image through this backbone, multiple different tasks can be executed conveniently.
Optionally, the plurality of different tasks may be set as related subtasks, such as an image recognition task and an image classification task. The target task may therefore be obtained by merging similar subtasks, or by clustering multiple subtasks and applying the multitask processing of this embodiment to the subtasks of the same category, thereby improving task processing efficiency.
303. Input the task output features into the plurality of subtask networks in the multitask machine learning model, respectively, to obtain recognition results.
In this embodiment, the subtask network is associated with the shared feature expression network, and the subtask network corresponds to the sub network layer; for the association relationship of the subtask network, refer to fig. 4, where fig. 4 is a schematic structural diagram of a machine learning model provided in the embodiment of the present application; the figure shows that the sub-network layer is a component of a shared feature expression network, the sub-network layers are mutually related so as to share parameters, and the input of the sub-network layer is a feature map of an extracted target image of the feature expression layer; after feature extraction is carried out through the shared feature expression network, the feature extraction can be input into different subtask networks with similarity, and therefore corresponding recognition results are obtained.
Specifically, in one possible scenario the shared feature expression network consists of a residual network (the feature expression layer) and a feature pyramid network (the combination of several sub-network layers), as shown in fig. 5, a schematic structural diagram of another machine learning model provided in the embodiment of the present application. The figure shows a feature expression layer containing three convolutional layers with sizes 32 x 416, 64 x 208 and 128 x 104, and a three-layer feature pyramid network, i.e. three mutually associated sub-network layers with sizes 256 x 52, 256 x 26 and 512 x 13. The task output features are obtained through this shared feature expression network, and the different subtasks are executed based on them, as sketched below.
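As an illustration only, the following is a minimal sketch of a shared feature expression network with this shape, assuming a PyTorch implementation and an 832 x 832 input; plain strided convolutions stand in for the residual blocks, and every layer detail beyond the channel/size figures above is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedFeatureNetwork(nn.Module):
    """Feature expression layer plus a three-branch feature pyramid."""
    def __init__(self):
        super().__init__()
        # Feature expression layer: downsampling convolutions giving
        # 32x416, 64x208 and 128x104 maps for an 832x832 input.
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        # Pyramid stages giving 256x52, 256x26 and 512x13 maps.
        self.p1 = nn.Conv2d(128, 256, 3, stride=2, padding=1)
        self.p2 = nn.Conv2d(256, 256, 3, stride=2, padding=1)
        self.p3 = nn.Conv2d(256, 512, 3, stride=2, padding=1)
        self.lateral = nn.Conv2d(512, 256, 1)  # channel match for fusion

    def forward(self, x):  # x: (B, 3, 832, 832) in this sketch
        f = F.relu(self.conv3(F.relu(self.conv2(F.relu(self.conv1(x))))))
        c1 = F.relu(self.p1(f))    # 256 x 52 x 52
        c2 = F.relu(self.p2(c1))   # 256 x 26 x 26
        c3 = F.relu(self.p3(c2))   # 512 x 13 x 13
        # Top-down fusion in the feature pyramid style: upsample and add.
        c2 = c2 + F.interpolate(self.lateral(c3), size=c2.shape[-2:])
        c1 = c1 + F.interpolate(c2, size=c1.shape[-2:])
        return [c1, c2, c3]        # task output features per resolution
```

The three returned maps are the multi-resolution task output features that the subtask networks consume.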
Specifically, in the image management process of an album application, the target task may include a target detection task, a scene recognition task or an image quality evaluation task, so the determination of the recognition result also corresponds to the type of the target task: first determine the target task corresponding to the subtask network, then extract the feature parameters from the task output features based on the target task to obtain the recognition result. This ensures the accuracy of the subtask networks' outputs.
Optionally, when the target task is the target detection task, the feature map indicated in the task output features may be analyzed based on the target detection task to obtain the resolution and detection frames corresponding to the feature map; the detection frames are then assigned to the feature maps at the corresponding resolutions to obtain labeled feature maps; and a detection frame regression network and a detection frame classification network are constructed from the labeled feature maps to obtain the recognition result. The detection frame regression network reduces the disturbance range of the detection frame, ensuring its accuracy; the detection frame classification network identifies the objects contained in the detection frame and classifies the images based on them, ensuring the accuracy of image recognition.
Specifically, to use the detection frame regression network and the detection frame classification network, the regression network is first constructed from the labeled feature maps to obtain the target position of each detection frame; the target position is then input into the detection frame classification network to obtain the image category; the image data is finally divided based on the image category to obtain the recognition result, thereby ensuring the accuracy of image recognition.
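For illustration, a per-level detection branch could be sketched as follows, assuming PyTorch; the anchor count and class count are placeholders, since the patent does not fix them.

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    """Detection frame regression and classification networks for one
    pyramid level; frames assigned to this level's resolution are
    predicted here."""
    def __init__(self, channels: int, num_anchors: int = 3,
                 num_classes: int = 20):
        super().__init__()
        # Regression branch: 4 position offsets per anchor frame.
        self.regress = nn.Conv2d(channels, num_anchors * 4, 3, padding=1)
        # Classification branch: one score per class per anchor frame.
        self.classify = nn.Conv2d(channels, num_anchors * num_classes,
                                  3, padding=1)

    def forward(self, feat):
        return self.regress(feat), self.classify(feat)
```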
In the scenario shown in fig. 5, each task sub-network layer corresponding to each layer of feature map in the feature pyramid has an influence on the final prediction result. Therefore, for the target detection task, the finally output detection frame is the set of the sub-network prediction results corresponding to all the layer feature maps.
Optionally, when the target task is a scene recognition task or an image quality evaluation task, analyzing a feature map indicated in the task output features based on the target task; and then, respectively inputting the feature maps into the classification networks corresponding to the subtask networks to obtain an identification result. The classification network is set based on a scene recognition task or an image quality evaluation task, namely, the classification network outputs the probability that a target image belongs to a target scene or meets the preset image quality.
In addition, for the output of the classification network, because the prediction results corresponding to different sub-network layers may be different, the feature maps output by different sub-network layers may be respectively input into the classification networks corresponding to the sub-task networks to obtain a classification result set; and then smoothing the prediction probability value indicated in the classification result set to obtain an identification result.
Specifically, for the scene classification task and the picture quality evaluation task, the final output probability value may be the smoothed result of the sub-network prediction probability values corresponding to all the sub-network layer feature maps, ensuring the accuracy of the probability prediction. Reconstructed from the surrounding definitions, the formula has the form:

$$P_n = \mathrm{ensemble}_{i}\left(P_n^{i}\right)$$

where $n$ indexes the prediction tasks, $i$ indexes the sub-network layers, and $P_n^{i}$ is the prediction probability value of the $i$-th sub-network. The ensemble function may be the average, or in some possible scenarios another statistic, for example the median, which is not limited here.

In one possible scenario, when the ensemble function takes the average, the finally output probability value is computed as:

$$P_n = \frac{1}{N} \sum_{i=1}^{N} P_n^{i}$$

where $N$ is the total number of layers of the feature pyramid network (for example $N = 3$) and $i$ is the index of the feature pyramid layer.
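A one-function sketch of this smoothing step, assuming the per-level outputs are PyTorch tensors of class probabilities and the average is used as the ensemble function:

```python
import torch

def smooth_predictions(per_level_probs):
    """Average the sub-network prediction probability values over all
    N feature pyramid levels: P_n = (1/N) * sum_i P_n^i."""
    return torch.stack(per_level_probs, dim=0).mean(dim=0)
```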
With the above embodiments, based on the network structure shown in fig. 5, the different tasks share the network and parameters of the feature expression part, and each branch handles its detection or classification task. The feature expression network comprises multi-layer downsampling convolutions (ResNet) and a feature pyramid network structure, and each embedded subtask is decomposed based on the output features of each pyramid layer. When the subtask is a target detection task, the frames to be detected are distributed uniformly over the feature maps at the corresponding resolutions, according to the feature resolutions of the different layers and the sizes of the frames, and the distributed feature maps are used to construct the detection frame regression and classification networks. When the subtasks are scene recognition and picture quality evaluation, a corresponding classification network is constructed for each layer of the feature pyramid.
Therefore, a plurality of tasks are processed by one network simultaneously, a plurality of models are prevented from being deployed at a mobile terminal, the total calculation complexity of the models is reduced, and the real-time performance of task processing is improved.
304. Process the image data based on the recognition results to classify the images in the target album.
In this embodiment, the recognition result may include labels such as scene recognition, object detection, and picture quality score of the image, so the corresponding processing manner may be operations related to classification and retrieval. For example, in the application of photo album managers, the processing mode comprises the services of photo automatic classification, photo story generation, photo search and the like; specifically, the photo album manager can classify photos according to scenes and important objects of the photos, and support functions such as story generation and tag search.
With reference to the above embodiments, by acquiring image data of a target album, the image data includes a plurality of target images; then inputting the target image into a shared feature expression network in the multi-task machine learning model, or calling the shared feature expression network in the multi-task machine learning model to process the target image so as to obtain task output features, wherein the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating sub-task features under different resolutions, the task output features are obtained by processing the sub-task features, and the multi-task machine learning model comprises a shared feature expression network and a plurality of sub-task networks which are associated with each other; further, the task output characteristics are respectively input into a plurality of subtask networks in the multi-task machine learning model to obtain an identification result, the subtask networks are associated with the shared characteristic expression network, and the subtask networks correspond to the sub network layers; and then processing the image data based on the identification result to classify the images in the target album. The image management process based on machine learning is realized, and the task features for executing a plurality of tasks are extracted through the shared feature expression network, and the corresponding tasks are executed by adopting different subtask networks respectively, so that the execution process of the image management tasks can be realized through the multi-task machine learning model, and the image management efficiency is improved.
The above embodiment introduced the application process of the multitask machine learning model. To ensure the accuracy of its recognition, the multitask machine learning model can be trained; this scenario is described below. Please refer to fig. 6, a flowchart of another image management method based on machine learning according to an embodiment of the present application, which includes at least the following steps:
601. Construct the multitask machine learning model.
In this embodiment, the multitask machine learning model includes a shared feature expression network and a plurality of subtask networks.
Specifically, since the multitask machine learning model performs operations for multiple tasks, the influence of the different tasks' operations must be considered during training. For the design of the target loss function, the corresponding sub-loss function can be called according to the type of the target task; a weighted calculation is then performed over the sub-loss functions to obtain the target loss function corresponding to the sub-network layers; and the target loss function is called on the training set to train and update the multitask machine learning model.
The following describes task setting in image scene management, and a specific task form depends on an actual scene, which is not limited herein.
602. Obtain an image training set associated with the target album.
In this embodiment, the training set includes labeled images, for example images annotated with a target object and an object detection frame, where the target object carries label information; that is, the data in the training set is labeled like validation data, carrying the label information required for executing the target task.
It is understood that different image training sets may be set for different target albums, for example, for a landscape album, the images in the image training set are set as landscape photos, and the specific image training set composition depends on the actual scene.
603. Determine the target detection task loss based on the detection frame settings.
In this embodiment, the target detection task loss arises mainly from the introduction of the detection frames, so it may include the loss from detection frame classification and the loss from detection frame position.
In one possible scenario, a photo album management scenario may include a target detection task, i.e., identifying the target object in an image. In the target detection task, the loss function includes two parts: the detection frame classification loss is a cross-entropy loss function, and the detection frame position loss is the L2 distance between each predicted value and the label position after relative position transformation. Reconstructed from the surrounding definitions, the loss has the form:

$$L_{det} = -\frac{1}{N_{cls}} \sum_{i} \log p_{i,c_i} + \lambda \, \frac{1}{N_{pos}} \sum_{i} \left\lVert t_i - t_i^{*} \right\rVert_2^2$$

where $N_{cls}$ is the total number of positive and negative sample boxes during training and $N_{pos}$ is the total number of positive sample boxes; $c_i$ is the label category of the $i$-th sample box, $p_{i,c_i}$ is the probability that the $i$-th sample box is predicted as its label class, $t_i$ is the normalized predicted position of the sample box, $t_i^{*}$ is the normalized label position of the sample box, and $\lambda$ is the loss weight parameter for the detection frame position.
It is understood that, in scenarios with few detection frame classes or a small detection frame position disturbance range, the target detection task loss may also use only part of the above loss function, i.e., only the cross-entropy term or only the L2 distance, which is not limited here.
604. Determine the scene recognition task loss based on the image labels.
In this embodiment, the scene recognition task loss is mainly the loss produced when matching the image label against the target label, and is set based on the loss arising in that matching process.
In one possible scenario, the loss function for the scene recognition task may be set as a cross-entropy loss function, reconstructed as:

$$L_{scene} = -\log p_c$$

where $c$ is the scene category of the picture label and $p_c$ is the probability of predicting the picture as that label category.
605. Determine the image quality evaluation task loss based on the image attributes.
In this embodiment, the image quality evaluation task loss is the loss produced by prediction scoring based on the image's attributes. In a typical quality evaluation process, the image attributes may include pixel attributes and aesthetic scoring attributes, so the loss can be designed from the scoring losses of these two aspects.
In one possible scenario, the loss function of the picture quality evaluation task may be divided into two parts: the loss corresponding to the quality attributes is a binary cross-entropy, and the corresponding aesthetic score loss is the L2 distance. Reconstructed from the surrounding definitions:

$$L_{quality} = -\sum_{j} \left[ y_j \log p_j + (1 - y_j) \log(1 - p_j) \right] + \left\lVert s - s^{*} \right\rVert_2^2$$

where $y_j$ is the label of the $j$-th picture composition element, $p_j$ is the probability of predicting the picture as containing that composition element, $s$ is the picture aesthetic prediction score, and $s^{*}$ is the picture aesthetic label score.
606. Integrate the multiple losses to determine the target loss function.
In this embodiment, the pairThe loss functions involved in steps 603-605 are integrated as a whole, so as to fully consider the loss conditions of different dimensions and ensure the accuracy of the target loss function. Specifically, the target loss function may be obtained by weighting a plurality of sub-loss functions, and the weighting for the weightingDuring training, adjustment needs to be performed according to the convergence condition of each subtask, that is, the degree of influence of the subtask on the whole model is reflected.
In one possible scenario, the model is learned for a multitasking machineThe layer profile is defined as follows for the objective loss function of the sub-network:
wherein,the loss introduced by the target detection box in the size range which is responsible for supervision of the sub-network is corresponding to the feature map of the current layer,the loss introduced by the sub-network relative to the picture scene class label for the current layer feature map,the penalty introduced for the current layer profile corresponding to the sub-network relative picture quality attribute labels and the aesthetic score labels.
It should be noted that it is preferable that,the weight of the sub-loss is the hyper-parameter of the model; during training, adjustment, such as value taking, is required according to the convergence condition of each subtaskAnd is not limited herein.
It is understood that the value range of $i$ is set based on the number of sub-network layers, and the specific value depends on the actual situation, which is not limited herein.
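As an illustrative sketch only, the weighted integration of the sub-losses described above might be wired up as follows; the per-layer weight lists and loss values are assumptions, not part of this embodiment:

```python
def target_loss(det_losses, scene_losses, quality_losses, alphas, betas, gammas):
    """Each argument is a list with one scalar entry per pyramid sub-network layer;
    the weights are hyper-parameters tuned against each subtask's convergence."""
    total = 0.0
    for l_det, l_scene, l_qual, a, b, g in zip(
            det_losses, scene_losses, quality_losses, alphas, betas, gammas):
        total = total + a * l_det + b * l_scene + g * l_qual
    return total
```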
607. The multi-tasking machine learning model is trained based on the objective loss function.
In this embodiment, based on the setting of the target loss function, a gradient descent algorithm is used to perform calculation, so as to adjust parameters in the multi-task machine learning model, thereby completing training of the multi-task machine learning model.
By combining the above embodiments, it can be seen that the supervision information (loss information) of different tasks is fully utilized and the different tasks influence one another, so that the final effect of each individual task can be improved; moreover, because the different tasks jointly influence feature learning, the final generalization performance of the model is also improved.
Next, a description is given with reference to a specific photo album management scenario, covering the design and usage steps of the multitask machine learning model. As shown in fig. 7, fig. 7 is a flowchart of another image management method based on machine learning according to an embodiment of the present application; the process at least comprises the following steps:
701. In response to a target operation, a residual network structure and a pyramid network structure are called to construct a shared feature expression network.

In this embodiment, the residual network structure may also be replaced by another convolutional neural network, such as a lightweight neural network (MobileNet) and its related series, e.g., MobileNetV1, MobileNetV2, or MobileNetV3; the corresponding pyramid network structure may have three or more layers, which is not limited herein.

In addition, the call to the shared feature expression network may be made in response to a target operation. The target operation may be an editing process of third-party album management software; for example, a terminal device newly installs album management software that needs to perform an initialization process of the shared feature expression network based on the existing images of the terminal device, so as to adapt to the operation of the terminal device and provide a targeted service. The target operation may also be an operation performed by the terminal device during a system update, that is, the shared feature expression network is inserted into the album storage mode, so as to perform classified management of images in the new system version.
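For illustration only, a shared feature expression network of this kind (a MobileNetV2 backbone feeding a small three-level feature pyramid) could be sketched as follows; the layer split points, channel counts and output convention are assumptions rather than the exact structure of this embodiment:

```python
import torch.nn as nn
from torchvision.models import mobilenet_v2

class SharedFeatureNet(nn.Module):
    """MobileNetV2 backbone + minimal 3-level top-down feature pyramid.
    Assumes input height/width divisible by 32."""
    def __init__(self, out_ch=64):
        super().__init__()
        feats = mobilenet_v2(weights=None).features
        self.stage1 = feats[:7]     # stride-8 features, 32 channels
        self.stage2 = feats[7:14]   # stride-16 features, 96 channels
        self.stage3 = feats[14:]    # stride-32 features, 1280 channels
        self.lat1 = nn.Conv2d(32, out_ch, 1)     # lateral 1x1 convolutions
        self.lat2 = nn.Conv2d(96, out_ch, 1)
        self.lat3 = nn.Conv2d(1280, out_ch, 1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x):
        c1 = self.stage1(x)
        c2 = self.stage2(c1)
        c3 = self.stage3(c2)
        p3 = self.lat3(c3)
        p2 = self.lat2(c2) + self.up(p3)   # top-down pathway
        p1 = self.lat1(c1) + self.up(p2)
        return p1, p2, p3                  # sub-task features at three resolutions
```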
702. A subtask network for performing processing of different dimensions on the image data is constructed based on the sub-network layers of the pyramid network structure.

In this embodiment, since the sub-network layers of the pyramid network structure are associated with each other and can share parameters, constructing the subtask networks based on these sub-network layers can improve the execution efficiency of the subtasks.

It should be understood that the subtask networks are used for performing processing of different dimensions on the image data, that is, multi-task processing, for example, an image recognition process, an image tag setting process, an image scene recognition process, an image scoring process, or an image retrieval process; the specific subtask networks are set according to the specific processing requirements, which are not limited herein.
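A hedged sketch of such subtask networks, attaching detection, scene-recognition and quality-assessment heads to a pyramid feature map, might look like the following; the class and attribute counts are placeholders:

```python
import torch.nn as nn

class TaskHeads(nn.Module):
    """Per-level detection head plus pooled scene / quality classification heads."""
    def __init__(self, in_ch=64, num_det_cls=10, num_scenes=8, num_attrs=5):
        super().__init__()
        self.det_cls = nn.Conv2d(in_ch, num_det_cls, 3, padding=1)  # box classification
        self.det_box = nn.Conv2d(in_ch, 4, 3, padding=1)            # box regression
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.scene = nn.Linear(in_ch, num_scenes)                   # scene recognition
        self.quality = nn.Linear(in_ch, num_attrs + 1)              # attributes + score

    def forward(self, p):
        pooled = self.pool(p).flatten(1)
        return (self.det_cls(p), self.det_box(p),
                self.scene(pooled), self.quality(pooled))
```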
703. A loss function is configured for each subtask network that performs a different dimension of processing on the image data.

In this embodiment, the loss function may be configured based on a historical setting record, for example, a usage record of loss functions in previous album management processes, and then called.

It will be appreciated that different losses may be introduced for the different tasks to be performed, which may affect the functional performance of the shared feature expression network. Therefore, a corresponding sub-loss function is set for each processing process of a different dimension, i.e., each different subtask, which facilitates the integration of the sub-loss functions and improves the accuracy of the target loss function.

For example, the sub-loss function is set as a cross entropy loss function for an image recognition task, and as a binary cross entropy for an image quality evaluation task; the specific setting is determined by the actual scenario.
704. Whether a pre-trained detection model exists for the image data is traversed based on the settings of the shared feature expression network.
In this embodiment, image data arrives continuously; that is, in a scenario in which the image data is the target album data, photos are input one by one, and the photo sets corresponding to different times share a certain number of overlapping photos. The album management process is therefore linked: the album is updated based on the preceding albums, and the photos in the albums are related, i.e., their shooting styles may be similar. Consequently, whether a pre-training detection model, i.e., a model used for identifying the preceding albums, exists can be traversed.

Optionally, the determination of the pre-training detection model may also be performed based on the labels of the album; that is, the label information corresponding to the target images input into the shared feature expression network is first obtained, and the pre-training detection model is then called according to that label information. For example, if the label information corresponding to the album is landscape, whether a pre-training detection model for landscape recognition exists is traversed, thereby improving the efficiency of image recognition.
705. The shared feature expression network is initialized using a pre-trained detection model.
In this embodiment, if there is a corresponding pre-training detection model, the model parameters of the pre-training detection model are called to update the parameters of the shared feature expression network.
In a possible scenario, if the image overlap rate between the preceding album and the current album reaches an overlap threshold, the shared feature expression network may be initialized using the pre-training detection model; for example, the overlap threshold is set to 80%, with the specific value determined by the actual scenario.
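As an illustrative sketch, initializing the shared feature expression network from a pre-training detection model might transfer only the parameters whose names and shapes match, leaving the rest as initialized; the checkpoint path and state-dict format are assumptions:

```python
import torch

def init_from_pretrained(shared_net, ckpt_path="pretrained_detector.pth"):
    """Load a pre-training detection model's state dict (assumed format) and copy
    over every tensor whose name and shape match the shared feature network."""
    pretrained = torch.load(ckpt_path, map_location="cpu")
    own = shared_net.state_dict()
    matched = {k: v for k, v in pretrained.items()
               if k in own and v.shape == own[k].shape}
    own.update(matched)
    shared_net.load_state_dict(own)
    return len(matched)   # number of parameter tensors actually transferred
```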
706. The pre-training detection model is randomly initialized.

In this embodiment, if there is no pre-training detection model, the detection model needs to be randomly initialized, and the corresponding training process is then performed, so as to ensure that training is not affected by leftover parameters and that the training process is accurate.
707. The parameters of each subtask network are randomly initialized.

In this embodiment, the initialization of the pre-training detection model also needs to set the subtask network parameters contained in it.
708. A training set of images associated with the image data is acquired.
In this embodiment, the image training set consists of labeled image information, such as a landscape image package, a portrait image package, or a daily-life image package; the specific training set may include one or more of these image packages.

It can be understood that, because image data exhibit similarities, that is, the feature parameters of images of similar styles are close, corresponding image selection may be performed when choosing the image training set, for example, selecting images of the same style as the image training set.
709. The model parameters are updated using a stochastic gradient descent algorithm.

In this embodiment, updating the model parameters by the stochastic gradient descent algorithm is the process of computing and minimizing the target loss function.
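A minimal sketch of this update step with PyTorch's stochastic gradient descent optimizer follows; `model`, `train_loader` and `compute_target_loss` are assumed to be defined elsewhere, as in the sketches above:

```python
import torch

def train(model, train_loader, compute_target_loss, epochs=10):
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):
        for images, labels in train_loader:   # batches from the image training set
            opt.zero_grad()
            loss = compute_target_loss(model, images, labels)  # weighted sub-losses
            loss.backward()                   # gradients of the target loss
            opt.step()                        # stochastic gradient descent update
```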
710. Redundancy analysis is performed on the trained model parameters in response to the end of the training process.

In this embodiment, since the trained model parameters are high-dimensional representations, data redundancy is likely to occur and the processing efficiency of the data is affected; therefore, after the training process ends, the high-dimensional features can be extracted and subjected to dimensionality-reduction decomposition.
711. The convolution layers in the shared feature expression network are called, decomposed, and loaded with low-rank approximate parameter tensors.

In this embodiment, the dimensionality-reduction decomposition represents a convolution layer with a low-rank approximate parameter tensor. Specifically, the convolution layers in the shared feature expression network are first called to obtain the convolution tensor corresponding to the feature expression layer; the convolution layers contained in the feature expression layer are then decomposed based on the convolution tensor to obtain a low-rank representation of the feature expression layer; and the feature expression layer is updated according to the low-rank representation.
Specifically, based on the prior knowledge that the parameter tensor of a convolutional network model contains inherent redundant information, tensor decomposition can be attempted on the trained multitask convolutional network. For example, suppose a layer of the network takes an input $X \in \mathbb{R}^{H \times W \times C}$ and the corresponding convolution kernel tensor is $W \in \mathbb{R}^{d \times d \times C \times N}$. It is now approximated as a two-layer convolutional network: the convolution kernel of the first layer is $W_1 \in \mathbb{R}^{d \times d \times C \times K}$, and the second layer is a $1 \times 1$ convolution with kernel $W_2 \in \mathbb{R}^{1 \times 1 \times K \times N}$. The new network parameters are obtained by solving the following formula with a conjugate gradient method:

$$obj = \min_{W_1, W_2} \left\| W - W_2 \ast W_1 \right\|_F^2$$

where $obj$ is the objective for the updated convolutional layer parameters; $C$ is the number of input channels, $d$ is the size of the convolution kernel, $N$ is the number of output channels, and $K$ is the number of channels of the intermediate layer after the convolution kernel decomposition.
After the trained multi-task convolutional network is obtained, the computation complexity of the model is further reduced by using a low-rank approximation method through the redundancy analysis of the convolutional kernel parameters, retraining is not needed, and the image recognition efficiency is improved.
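For illustration, the channel-wise decomposition described above can be sketched as follows; note that this sketch initializes the two factors with a truncated SVD in closed form instead of the conjugate-gradient solve used in this embodiment, and `k` must not exceed the rank of the reshaped kernel matrix:

```python
import torch
import torch.nn as nn

def decompose_conv(conv: nn.Conv2d, k: int) -> nn.Sequential:
    """Approximate a d x d convolution (C inputs, N outputs) by a d x d
    convolution to K intermediate channels followed by a 1 x 1 convolution."""
    n, c, d, _ = conv.weight.shape
    w = conv.weight.detach().reshape(n, c * d * d)
    u, s, vh = torch.linalg.svd(w, full_matrices=False)   # closed-form factors
    first = nn.Conv2d(c, k, d, stride=conv.stride, padding=conv.padding, bias=False)
    second = nn.Conv2d(k, n, 1, bias=conv.bias is not None)
    first.weight.data = (s[:k].sqrt().unsqueeze(1) * vh[:k]).reshape(k, c, d, d)
    second.weight.data = (u[:, :k] * s[:k].sqrt()).reshape(n, k, 1, 1)
    if conv.bias is not None:
        second.bias.data = conv.bias.detach().clone()
    return nn.Sequential(first, second)       # drop-in low-rank replacement
```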
712. Image data in the target album is acquired.
In this embodiment, the target album is an album that needs to be identified, and may be specifically downloaded over a network or updated locally, for example, one thousand photos that have been taken recently.
In a possible scenario, the target album may be the set of all image-format information in the current terminal device, such as images in the album, images in system files, or images in applications. On one hand, all the images can be classified; on the other hand, images with different attributes can be identified separately. For example, an image in a system file generally has a single element and is not classified into the same class as an image in the album, thereby avoiding album redundancy caused by system-file images.
713. The multi-task machine learning model is constructed.

In this embodiment, the shared feature expression network and the subtask networks designed in the above embodiments are used to train the model, and the convolution layers in the shared feature expression network are decomposed into low-rank representations, thereby obtaining the multi-task machine learning model.

Specifically, the multi-task machine learning model is used to handle multiple classification and detection tasks on mobile phone photos simultaneously. While multi-task representation learning is fully exploited to improve the prediction results of each task, the overall size of the model can be reduced and the overall prediction time of the tasks can be optimized.
714. A scene recognition classification task and a quality evaluation prediction task are executed for the target album based on the subtask networks in the multi-task machine learning model.

In this embodiment, the scene recognition classification task may identify whether the target image is a night scene, a morning scene, or the like, and the quality evaluation prediction task may judge whether the target image is clear.
715. Smoothing is performed on the output results of the multiple parallel tasks to obtain a prediction result.

In this embodiment, the output results of the multiple parallel tasks produced by the classification networks that execute the scene recognition classification task and the quality evaluation prediction task are smoothed, for example by taking the average of the probabilities, so that the prediction result is obtained from the smoothed probability. This ensures the accuracy of the prediction result and prevents a few extreme recognition cases from affecting the overall recognition result.
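A small sketch of this smoothing step, averaging the class-probability vectors produced by parallel branches over the same label space (names and shapes assumed), might be:

```python
import torch

def smooth_predictions(prob_list):
    """prob_list: per-branch (num_classes,) probability tensors for one image."""
    probs = torch.stack(prob_list).mean(dim=0)   # average the parallel outputs
    return probs.argmax().item(), probs          # predicted class + smoothed probs
```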
716. A detection frame prediction task is executed for the target album based on the subtask network of the multitask machine learning model.
In this embodiment, the detection frame prediction task is a target detection task, and may identify a target object in a target image and set a corresponding tag.
717. Images of the same category are merged in response to completion of the detection frame prediction task.

In this embodiment, images belonging to the same category (label) are merged, that is, placed in the same folder. Specifically, as shown in fig. 8, fig. 8 is a scene schematic diagram of an image management method based on machine learning according to an embodiment of the present application. The figure shows an album management interface of the terminal device. Clicking the organize button A1 in the album management interface calls the multitask machine learning model of this embodiment, and the photos in the album are input as image data to obtain corresponding recognition results; for example, the scene division shown in the figure is obtained, that is, the detection frame is first located and the category corresponding to the object contained in the detection frame is then determined, so that the photos are classified. Because the recognition process is performed by multiple subtasks that share features, the convenience of album management for the user is improved, and the user experience is enhanced.
It is understood that after the images in the target album are recognized, labels can be set for the images to support image retrieval. In addition, after the images are classified, the image labels can be set according to the classification labels, which improves the efficiency of label setting compared with setting labels for each image individually.

By combining the above embodiments, it can be seen that, by exploiting the correlation among the multiple tasks in photo organization, the different tasks share feature expressions, which reduces the complexity of the model, and the optimization of the different tasks is mutually promoted.
In order to better implement the above-mentioned aspects of the embodiments of the present application, the following also provides related apparatuses for implementing the above-mentioned aspects. Referring to fig. 9, fig. 9 is a schematic structural diagram of an image management apparatus based on machine learning according to an embodiment of the present application, where the image management apparatus 900 includes:
an acquisition unit 901 configured to acquire image data including a plurality of target images;
an input unit 902, configured to input the target image into a shared feature expression network in a multitask machine learning model to obtain a task output feature, where the shared feature expression network includes a feature expression layer and multiple sequentially associated sub-network layers, the sub-network layers are used to generate sub-task features at different resolutions, and the task output feature is obtained by processing based on the sub-task features;
a recognition unit 903, configured to input the task output features into a plurality of subtask networks in the multitask machine learning model respectively to obtain recognition results, where the subtask networks are associated with the shared feature expression network, and the subtask networks correspond to the sub network layers;
a management unit 904 for processing the image data based on the recognition result.
Optionally, in some possible implementation manners of the present application, the identifying unit 903 is specifically configured to determine a target task corresponding to the subtask network, where the target task includes a target detection task, a scene identification task, or an image quality evaluation task;
the identifying unit 903 is specifically configured to extract a feature parameter in the task output feature based on the target task to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the identifying unit 903 is specifically configured to, if the target task is a target detection task, analyze a feature map indicated in the task output feature based on the target detection task to obtain a resolution and a detection frame corresponding to the feature map;
the identifying unit 903 is specifically configured to assign the detection frame to the feature maps at the corresponding resolutions to obtain labeled feature maps;
the identifying unit 903 is specifically configured to construct a detection box regression network and a detection box classification network according to the labeled feature map to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the identifying unit 903 is specifically configured to construct a detection box regression network according to the labeled feature map to obtain a target position of a detection box;
the identifying unit 903 is specifically configured to input the target position of the detection box into a detection box classification network to obtain an image category;
the identifying unit 903 is specifically configured to divide the image data based on the image category to obtain the identification result.
Optionally, in some possible implementation manners of the present application, the identifying unit 903 is specifically configured to, if the target task is a scene identification task or an image quality assessment task, analyze the feature map indicated in the task output feature based on the target task;
the identifying unit 903 is specifically configured to input the feature maps into a classification network corresponding to the subtask network, respectively, so as to obtain the identification result, where the classification network is set based on the scene identification task or the image quality assessment task.
Optionally, in some possible implementation manners of the present application, the identifying unit 903 is specifically configured to input the feature maps into classification networks corresponding to the subtask networks, respectively, so as to obtain a classification result set;
the identifying unit 903 is specifically configured to perform smoothing processing on the prediction probability value indicated in the classification result set to obtain the identification result.
Optionally, in some possible implementations of the present application, the management unit 904 is specifically configured to obtain a convolution tensor corresponding to the feature expression layer;
the management unit 904 is specifically configured to decompose a convolutional layer included in the feature expression layer based on the convolution tensor to obtain a low-rank representation of the feature expression layer;
the management unit 904 is specifically configured to update the feature expression layer according to the low rank representation.
Optionally, in some possible implementations of the present application, the management unit 904 is specifically configured to obtain an image training set associated with the target album;
the management unit 904 is specifically configured to invoke a target loss function based on the training set to train the multitask machine learning model so as to update the multitask machine learning model, where the target loss function includes a sub-loss function corresponding to the sub-network layer, and the sub-loss function is used to indicate a composition of the target loss function.
Optionally, in some possible implementation manners of the present application, the management unit 904 is specifically configured to call a corresponding sub-loss function according to the type of the target task;
the management unit 904 is specifically configured to perform weighting calculation based on the sub-loss functions to obtain target loss functions corresponding to the sub-network layers;
the management unit 904 is specifically configured to invoke a target loss function to train the multi-tasking machine learning model based on the training set, so as to update the multi-tasking machine learning model.
Optionally, in some possible implementation manners of the present application, the management unit 904 is specifically configured to traverse a corresponding pre-training detection model based on the shared feature expression network;
the management unit 904 is specifically configured to invoke the model parameters of the pre-training detection model to update the parameters of the shared feature expression network.
Optionally, in some possible implementation manners of the present application, the management unit 904 is specifically configured to obtain tag information corresponding to the target image input into the shared feature expression network;
the management unit 904 is specifically configured to invoke the pre-training detection model according to the tag information.
Optionally, in some possible implementation manners of the present application, the image data is album data, the feature expression layer is a residual network, the sub-network layer is a branch of a feature pyramid network, and the management unit 904 is specifically configured to perform image classification, image search, or image association on the album data based on the identification result.
Acquiring image data of a target album, wherein the image data comprises a plurality of target images; then inputting the target image into a shared feature expression network in the multi-task machine learning model, or calling the shared feature expression network in the multi-task machine learning model to process the target image so as to obtain task output features, wherein the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating sub-task features under different resolutions, the task output features are obtained by processing the sub-task features, and the multi-task machine learning model comprises a shared feature expression network and a plurality of sub-task networks which are associated with each other; further, the task output characteristics are respectively input into a plurality of subtask networks in the multi-task machine learning model to obtain an identification result, the subtask networks are associated with the shared characteristic expression network, and the subtask networks correspond to the sub network layers; and then processing the image data based on the identification result to classify the images in the target album. The image management process based on machine learning is realized, and the task features for executing a plurality of tasks are extracted through the shared feature expression network, and the corresponding tasks are executed by adopting different subtask networks respectively, so that the execution process of the image management tasks can be realized through the multi-task machine learning model, and the image management efficiency is improved.
An embodiment of the present application further provides a terminal device. As shown in fig. 10, which is a schematic structural diagram of another terminal device provided in an embodiment of the present application, for convenience of description only the portion related to the embodiment of the present application is shown; for undisclosed technical details, please refer to the method portion of the embodiments of the present application. The terminal may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sale (POS) terminal, a vehicle-mounted computer, and the like; the following takes a mobile phone as an example:
fig. 10 is a block diagram illustrating a partial structure of a mobile phone related to a terminal provided in an embodiment of the present application. Referring to fig. 10, the cellular phone includes: radio Frequency (RF) circuitry 1010, memory 1020, input unit 1030, display unit 1040, sensor 1050, audio circuitry 1060, wireless fidelity (WiFi) module 1070, processor 1080, and power source 1090. Those skilled in the art will appreciate that the handset configuration shown in fig. 10 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 10:
The memory 1020 can be used for storing software programs and modules, and the processor 1080 executes various functional applications and data processing of the mobile phone by operating the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 1020 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 1030 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations by a user (e.g., operations by a user on or near the touch panel 1031 using any suitable object or accessory such as a finger, a stylus, etc., and spaced touch operations within a certain range on the touch panel 1031) and drive corresponding connection devices according to a preset program. Alternatively, the touch panel 1031 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 1080, and can receive and execute commands sent by the processor 1080. In addition, the touch panel 1031 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 1030 may include other input devices 1032 in addition to the touch panel 1031. In particular, other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a track ball, a mouse, a joystick, or the like.
The display unit 1040 may be used to display information input by a user or information provided to the user and various menus of the cellular phone. The display unit 1040 may include a display panel 1041, and optionally, the display panel 1041 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 1031 can cover the display panel 1041, and when the touch panel 1031 detects a touch operation on or near the touch panel 1031, the touch operation is transmitted to the processor 1080 to determine the type of the touch event, and then the processor 1080 provides a corresponding visual output on the display panel 1041 according to the type of the touch event. Although in fig. 10, the touch panel 1031 and the display panel 1041 are two separate components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 1031 and the display panel 1041 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 1050, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 1041 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 1041 and/or the backlight when the mobile phone moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, can be used for applications of recognizing the gesture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and tapping) and the like, and can also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor and the like, which are not described herein again.
WiFi belongs to short-distance wireless transmission technology, and the mobile phone can help the user to send and receive e-mail, browse web pages, access streaming media, etc. through the WiFi module 1070, which provides wireless broadband internet access for the user. Although fig. 10 shows the WiFi module 1070, it is understood that it does not belong to the essential constitution of the handset, and may be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 1080 is a control center of the mobile phone, connects various parts of the whole mobile phone by using various interfaces and lines, and executes various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 1020 and calling data stored in the memory 1020, thereby integrally monitoring the mobile phone. Optionally, processor 1080 may include one or more processing units; optionally, processor 1080 may integrate an application processor, which primarily handles operating systems, user interfaces, application programs, etc., and a modem processor, which primarily handles wireless communications. It is to be appreciated that the modem processor described above may not be integrated into processor 1080.
The handset also includes a power source 1090 (e.g., a battery) for powering the various components, which may optionally be logically coupled to the processor 1080 via a power management system to manage charging, discharging, and power consumption via the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
In the embodiment of the present application, the processor 1080 included in the terminal further has the function of executing the steps of the image management method described above.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a server provided in an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1122 (e.g., one or more processors), a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing an application program 1142 or data 1144. The memory 1132 and the storage medium 1130 may provide transient or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processor 1122 may be configured to communicate with the storage medium 1130 to execute the series of instruction operations in the storage medium 1130 on the server 1100.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The steps performed by the management apparatus in the above-described embodiment may be based on the server configuration shown in fig. 11.
Also provided in an embodiment of the present application is a computer-readable storage medium storing machine-learning-based image management instructions which, when run on a computer, cause the computer to perform the steps performed by the machine-learning-based image management apparatus in the methods described in the foregoing embodiments shown in fig. 2 to 8.
Also provided in embodiments of the present application is a computer program product comprising instructions for managing images based on machine learning, which when run on a computer, causes the computer to perform the steps performed by the image management apparatus based on machine learning in the methods described in the embodiments of fig. 2 to 8.
The embodiment of the present application further provides an image management system, which may include the image management apparatus based on machine learning in the embodiment described in fig. 9, or the terminal device in the embodiment described in fig. 10, or the server described in fig. 11.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a machine learning based image management apparatus, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (15)
1. An image management method based on a multitask machine learning model is characterized by comprising the following steps:
acquiring image data of a target album, wherein the image data comprises a plurality of target images;
inputting the target image into a shared feature expression network in a multitask machine learning model, or calling the shared feature expression network in the multitask machine learning model to process the target image so as to obtain task output features, wherein the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating sub-task features under different resolutions, the task output features are obtained by processing based on the sub-task features, and the multitask machine learning model comprises the shared feature expression network and the sub-task networks which are associated with each other;
inputting the task output features into a plurality of subtask networks in the multitask machine learning model respectively to obtain recognition results, wherein the subtask networks are associated with the shared feature expression network and correspond to the sub network layers;
and processing the image data based on the identification result so as to classify the images in the target album.
2. The method of claim 1, wherein the inputting the task output features into a plurality of subtask networks in the multitask machine learning model respectively to obtain a recognition result comprises:
determining a target task corresponding to the subtask network in the multi-task machine learning model, wherein the target task comprises a target detection task, a scene recognition task or an image quality evaluation task;
and extracting characteristic parameters in the task output characteristics based on the target task to obtain the identification result.
3. The method according to claim 2, wherein the extracting feature parameters in the task output features based on the target task to obtain the recognition result comprises:
if the target task is a target detection task, analyzing a feature map indicated in the task output features based on the target detection task to obtain a resolution and a detection frame corresponding to the feature map;
assigning the detection frame to the feature map at the corresponding resolution to obtain a labeled feature map;
and constructing a detection frame regression network and a detection frame classification network according to the marked feature map so as to obtain the identification result.
4. The method according to claim 3, wherein constructing a detection frame regression network and a detection frame classification network according to the labeled feature map to obtain the recognition result comprises:
constructing a detection frame regression network according to the marked feature map to obtain a target position of a detection frame;
inputting the target position of the detection frame into a detection frame classification network to obtain the image category;
and dividing the image data based on the image category to obtain the identification result.
5. The method according to claim 2, wherein the extracting feature parameters in the task output features based on the target task to obtain the recognition result comprises:
if the target task is a scene recognition task or an image quality evaluation task, analyzing a feature map indicated in the task output features based on the target task;
and respectively inputting the feature maps into a classification network corresponding to the subtask network to obtain the identification result, wherein the classification network is set based on the scene identification task or the image quality evaluation task.
6. The method according to claim 5, wherein the inputting the feature maps into classification networks corresponding to the subtask networks respectively to obtain the recognition results comprises:
respectively inputting the feature maps into classification networks corresponding to the subtask networks to obtain a classification result set;
and smoothing the prediction probability value indicated in the classification result set to obtain the identification result.
7. The method according to any one of claims 1-6, further comprising:
acquiring a convolution tensor corresponding to the feature expression layer;
decomposing a convolutional layer contained in the feature expression layer based on the convolution tensor to obtain a low-rank representation of the feature expression layer;
updating the feature expression layer according to the low rank representation.
8. The method of any one of claims 2, 3 or 5, further comprising:
acquiring an image training set associated with the target photo album;
calling a target loss function based on the training set to train the multitask machine learning model so as to update the multitask machine learning model, wherein the target loss function comprises a sub-loss function corresponding to the sub-network layer, and the sub-loss function is used for indicating the composition of the target loss function.
9. The method of claim 8, wherein the training the multitask machine learning model based on the training set calling a target loss function to update the multitask machine learning model comprises:
calling a corresponding sub-loss function according to the type of the target task;
performing a weighted calculation based on the sub-loss functions to obtain a target loss function corresponding to the sub-network layer;
and calling a target loss function based on the training set to train the multi-task machine learning model so as to update the multi-task machine learning model.
10. The method of claim 8, further comprising:
traversing a corresponding pre-training detection model based on the shared feature expression network;
and calling model parameters of the pre-training detection model to update the parameters of the shared feature expression network.
11. The method of claim 10, wherein traversing the corresponding pre-trained detection model based on the shared feature expression network comprises:
acquiring label information corresponding to the target image input into the shared feature expression network;
and calling the pre-training detection model according to the label information.
12. The method of claim 1, wherein the image data is cell phone album data, the feature representation layer is a residual network, and the sub-network layer is a branch of a feature pyramid network, the method further comprising:
and performing image marking, image searching or image association on the album data based on the identification result.
13. An image management apparatus based on machine learning, comprising:
an acquisition unit configured to acquire image data including a plurality of target images;
the input unit is used for inputting the target image into a shared feature expression network in a multitask machine learning model so as to obtain task output features, the shared feature expression network comprises a feature expression layer and a plurality of sequentially associated sub-network layers, the sub-network layers are used for generating sub-task features under different resolutions, and the task output features are obtained by processing based on the sub-task features;
the recognition unit is used for respectively inputting the task output characteristics into a plurality of subtask networks in the multitask machine learning model to obtain recognition results, the subtask networks are associated with the shared characteristic expression network, and the subtask networks correspond to the sub network layers;
and the management unit is used for processing the image data based on the identification result.
14. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes; the processor is configured to execute the multitask machine learning model based image management method according to any one of claims 1 to 12 according to instructions in the program code.
15. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to execute the multitask machine learning model based image management method according to any one of claims 1 to 12.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010923050.2A CN111813532B (en) | 2020-09-04 | 2020-09-04 | Image management method and device based on multitask machine learning model |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111813532A true CN111813532A (en) | 2020-10-23 |
| CN111813532B CN111813532B (en) | 2020-12-18 |
Family
ID=72859952
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010923050.2A Active CN111813532B (en) | 2020-09-04 | 2020-09-04 | Image management method and device based on multitask machine learning model |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111813532B (en) |
Citations (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140129605A1 (en) * | 2012-11-06 | 2014-05-08 | Wei-Hsin Huang | Social network platform based on electronic maps |
| US20160371566A1 (en) * | 2015-03-26 | 2016-12-22 | Beijing Kuangshi Technology Co., Ltd. | Picture management method and device, picture synchronization method and device |
| CN110069709A (en) * | 2019-04-10 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Intension recognizing method, device, computer-readable medium and electronic equipment |
| CN110414433A (en) * | 2019-07-29 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Image processing method, device, storage medium and computer equipment |
| CN110427981A (en) * | 2019-07-11 | 2019-11-08 | 四川大学 | SAR ship detecting system and method based on deep neural network |
| CN110569920A (en) * | 2019-09-17 | 2019-12-13 | 国家电网有限公司 | A prediction method for multi-task machine learning |
| CN110781744A (en) * | 2019-09-23 | 2020-02-11 | 杭州电子科技大学 | A small-scale pedestrian detection method based on multi-level feature fusion |
| CN110913207A (en) * | 2019-12-03 | 2020-03-24 | 华南理工大学 | Video transmission quality evaluation method based on multitask deep learning |
| CN111191642A (en) * | 2020-04-08 | 2020-05-22 | 腾讯科技(深圳)有限公司 | Fingerprint anti-counterfeiting identification method and device based on multi-task classification and electronic equipment |
| CN111340039A (en) * | 2020-02-12 | 2020-06-26 | 杰创智能科技股份有限公司 | Target detection method based on feature selection |
| CN111488918A (en) * | 2020-03-20 | 2020-08-04 | 天津大学 | Transformer substation infrared image equipment detection method based on convolutional neural network |
| CN111539452A (en) * | 2020-03-26 | 2020-08-14 | 深圳云天励飞技术有限公司 | Image recognition method and device for multitask attributes, electronic equipment and storage medium |
| CN111598030A (en) * | 2020-05-21 | 2020-08-28 | 山东大学 | Method and system for detecting and segmenting vehicle in aerial image |
Cited By (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112256765A (en) * | 2020-10-29 | 2021-01-22 | 浙江大华技术股份有限公司 | Data mining method, system and computer readable storage medium |
| CN112561056A (en) * | 2020-12-07 | 2021-03-26 | 北京百度网讯科技有限公司 | Neural network model training method and device, electronic equipment and storage medium |
| CN112801203A (en) * | 2021-02-07 | 2021-05-14 | 新疆爱华盈通信息技术有限公司 | Data distribution training method and system based on multi-task learning |
| CN113204010A (en) * | 2021-03-15 | 2021-08-03 | 锋睿领创(珠海)科技有限公司 | Non-visual field object detection method, device and storage medium |
| CN113591573A (en) * | 2021-06-28 | 2021-11-02 | 北京百度网讯科技有限公司 | Training and target detection method and device for multi-task learning deep network model |
| CN113763336A (en) * | 2021-08-24 | 2021-12-07 | 北京鹰瞳科技发展股份有限公司 | Image multi-task identification method and electronic equipment |
| CN113989916A (en) * | 2021-10-09 | 2022-01-28 | 北京鹰瞳科技发展股份有限公司 | Device and method for classifying images based on meta-learning and related products |
| CN114140033A (en) * | 2022-01-29 | 2022-03-04 | 北京新唐思创教育科技有限公司 | A service personnel distribution method, device, electronic device and storage medium |
| CN114140033B (en) * | 2022-01-29 | 2022-04-12 | 北京新唐思创教育科技有限公司 | A service personnel distribution method, device, electronic device and storage medium |
| CN116863324A (en) * | 2022-03-25 | 2023-10-10 | 追觅创新科技(苏州)有限公司 | Event determination method and device, storage medium, electronic device |
| CN115115024A (en) * | 2022-04-13 | 2022-09-27 | 腾讯科技(深圳)有限公司 | A multi-objective learning method, system, storage medium and terminal device |
| CN117315269A (en) * | 2022-06-17 | 2023-12-29 | 北京嘀嘀无限科技发展有限公司 | Multi-task image detection method and system |
| CN115424218A (en) * | 2022-09-06 | 2022-12-02 | 广州广电运通金融电子股份有限公司 | Non-motor vehicle attribute identification method, device, medium and equipment based on deep learning |
| CN115690544A (en) * | 2022-11-11 | 2023-02-03 | 北京百度网讯科技有限公司 | Multitask learning method and device, electronic equipment and medium |
| CN115690544B (en) * | 2022-11-11 | 2024-03-01 | 北京百度网讯科技有限公司 | Multi-task learning method and device, electronic equipment and medium |
| CN116229432A (en) * | 2023-02-24 | 2023-06-06 | 中山大学·深圳 | A driver state estimation method based on multi-task learning |
| CN116229432B (en) * | 2023-02-24 | 2025-11-11 | 中山大学·深圳 | A driver state estimation method based on multi-task learning |
| CN116580222A (en) * | 2023-03-14 | 2023-08-11 | 上海云骥跃动智能科技发展有限公司 | An image data marking device, its training method, and an image data marking method |
| WO2024212750A1 (en) * | 2023-04-12 | 2024-10-17 | 深圳市中兴微电子技术有限公司 | Image signal processing method and apparatus, device, and computer-readable storage medium |
| CN116452650A (en) * | 2023-04-20 | 2023-07-18 | 中国科学院云南天文台 | A sunspot area prediction method based on multi-task machine learning |
| CN116597230A (en) * | 2023-05-31 | 2023-08-15 | 平安科技(深圳)有限公司 | Multi-task image classification method, device, equipment and medium |
| WO2025001878A1 (en) * | 2023-06-28 | 2025-01-02 | 北京极智嘉科技股份有限公司 | Image recognition method and apparatus, and computing device and storage medium |
| WO2025086677A1 (en) * | 2023-10-26 | 2025-05-01 | 北京百度网讯科技有限公司 | Target detection method based on multi-task ai large model, and model training method based on multi-task ai large model |
| CN117392483A (en) * | 2023-12-06 | 2024-01-12 | 山东大学 | Photo album classification model training acceleration method, system and medium based on reinforcement learning |
| CN117392483B (en) * | 2023-12-06 | 2024-02-23 | 山东大学 | Photo album classification model training acceleration method, system and medium based on reinforcement learning |
| CN119693505A (en) * | 2025-02-25 | 2025-03-25 | 杭州电子科技大学 | A command-driven personalized fashion image editing method |
| CN119693505B (en) * | 2025-02-25 | 2025-05-30 | 杭州电子科技大学 | Command-driven personalized fashion image editing method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111813532B (en) | 2020-12-18 |
Similar Documents
| Publication | Title |
|---|---|
| CN111813532B (en) | Image management method and device based on multitask machine learning model |
| CN111914113B (en) | Image retrieval method and related device |
| CN112990390B (en) | Training method of image recognition model, and image recognition method and device |
| CN111816159B (en) | Language identification method and related device |
| CN111797288B (en) | Data screening method, device, storage medium and electronic device |
| CN113284142A (en) | Image detection method, image detection device, computer-readable storage medium and computer equipment |
| CN113723378B (en) | Model training method and device, computer equipment and storage medium |
| CN111265881B (en) | A model training method, content generation method and related device |
| CN110516113B (en) | Video classification method, video classification model training method and device |
| CN116450808B (en) | Data processing method and device and storage medium |
| CN112862021B (en) | Content labeling method and related device |
| CN115392405A (en) | Model training method, related device and storage medium |
| CN112270238B (en) | A video content recognition method and related device |
| CN116958734A (en) | Classification model training method and device and related products |
| CN116308815A (en) | Risk assessment method, risk assessment device, electronic equipment and storage medium |
| CN116259083A (en) | Image quality recognition model determining method and related device |
| CN117115596B (en) | Training method, device, equipment and medium of object action classification model |
| CN115147754B (en) | Video frame processing method, apparatus, electronic device, storage medium, and program product |
| CN113569082B (en) | Image information output method and related device |
| CN114528994B (en) | Identification model determining method and related device |
| HK40030690A (en) | Method and apparatus for managing images based on multi-task machine learning model |
| HK40030690B (en) | Method and apparatus for managing images based on multi-task machine learning model |
| CN111797298A (en) | Data collection method, device, storage medium and electronic device |
| CN117725289A (en) | Content searching method, device, electronic equipment and storage medium |
| CN112163164A (en) | User tag determination method and related device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40030690; Country of ref document: HK |