
WO2023185787A1 - Item matching method and related device (Procédé d'appariement d'articles et dispositif associé) - Google Patents

Item matching method and related device

Info

Publication number
WO2023185787A1
Authority
WO
WIPO (PCT)
Prior art keywords
items
image
item
candidate
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/084241
Other languages
English (en)
Chinese (zh)
Inventor
邓一萌
杨坚鑫
李继忠
曹朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of WO2023185787A1


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G06F16/55 Clustering; Classification
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/5838 Retrieval characterised by using metadata automatically derived from the content, using colour
    • G06F16/5862 Retrieval characterised by using metadata automatically derived from the content, using texture

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method of matching items and related equipment.
  • AI: artificial intelligence
  • Common item search solutions in the industry include photo search. Specifically, users can take photos of the items they want to search for, and then search for similar items based on the input pictures.
  • the embodiments of the present application provide an item matching method and related equipment.
  • According to this solution, even when the user inputs a complex image to be processed (that is, an image including at least two items), a target category of items that has a matching relationship with the image can still be obtained, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology to the field of item search.
  • the method includes: the client device obtains an image to be processed input by the user, where the image to be processed contains a background and at least two items; the server or the client device obtains, through the first neural network and based on the feature information of the image to be processed and the feature information of the at least two items in it, a target category of items that has a matching relationship with the image to be processed; and the client device displays items of the aforementioned target category to the user.
  • In this solution, the user can provide an image of the scene in which the item to be searched will be used (that is, the above-mentioned image to be processed); a target category that has a matching relationship with the entire image to be processed is then obtained through the first neural network, and the target items corresponding to that target category are displayed to the user. Through the above solution, the user can not only search for items to match by providing the image to be processed, but even when the user inputs a complex image to be processed (that is, an image including at least two items), a target category of items that has a matching relationship with the entire image can still be obtained, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness. In addition, the target category is determined based on both the feature information of the entire image to be processed and the feature information of the items in it; that is, not only the information of the entire image to be processed is considered, but also the information of each item in it, which is beneficial to improving the accuracy of the determined target category.
  • In a possible implementation, the method further includes: the server or the client device inputs the image to be processed into a third neural network, so as to perform feature extraction on the image to be processed through the third neural network and obtain the target feature information corresponding to the image to be processed.
  • Target feature information includes feature information of at least two items in the image to be processed and feature information of the image to be processed.
  • The feature information of the image to be processed includes the feature information of the whole composed of the background and the at least two items; that is, the feature information of the image to be processed is obtained by treating the image to be processed as a whole and performing feature extraction on it.
  • The feature information of the image to be processed may include texture information, color information, contour information, style information, scene information, or other types of feature information; the feature information of the at least two items in the image to be processed can also be called the semantic label set of the image to be processed.
  • The feature information of the at least two items in the image to be processed can include the attribute information of each item. The attribute information of each item includes any one or more of the following: the category of the item, the color of the item, and the position information of the item in the image to be processed; optionally, it can also include the style of the item, the material of the item, the pattern of the item, or other feature information.
  • In the embodiments of this application, the feature information of the image to be processed refers to the feature information obtained by treating the image to be processed as a whole and performing feature extraction on it, while the feature information of the at least two items can include the attribute information of each item. This further refines the two concepts and helps distinguish the feature information of the image from the feature information of the items more clearly. Moreover, the feature information of each item includes information such as the item's category, color, style, material, or pattern, so the information of the items in the image to be processed is fully considered, which is beneficial to improving the accuracy of the determined target category.
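  • To make the distinction above concrete, the following is a minimal Python sketch of one way the two kinds of feature information could be represented; the class and field names (ItemAttributes, TargetFeatureInfo, and so on) are illustrative assumptions and are not defined by this application.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ItemAttributes:
    """Attribute information of one item in the image to be processed:
    its category, color, and position, optionally plus style, material,
    and pattern (the item's "semantic label")."""
    category: str
    color: str
    position: Tuple[int, int, int, int]  # bounding box (x, y, w, h); an assumed encoding
    style: Optional[str] = None
    material: Optional[str] = None
    pattern: Optional[str] = None

@dataclass
class TargetFeatureInfo:
    """Target feature information: features of the image treated as a whole
    (background plus items: texture, color, contour, style, scene, ...)
    together with the attribute information of each detected item."""
    whole_image_features: List[float]  # holistic embedding of the image
    item_attributes: List[ItemAttributes] = field(default_factory=list)
```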
  • In a possible implementation, the server or the client device obtaining, through the first neural network, a target category that has a matching relationship with the image to be processed based on the feature information of the image and the feature information of the at least two items includes: the server or the client device generates M candidate intentions corresponding to the image to be processed through the first neural network, where M is an integer greater than or equal to 2 and each candidate intention indicates a category of items that has a matching relationship with the image to be processed; the client device displays the M candidate intentions to the user to obtain the feedback operations corresponding to the M candidate intentions; and the client device determines, based on the feedback operations for the M candidate intentions, a target category that has a matching relationship with the image to be processed.
  • the "feedback operation” may be a selection operation on one of the M candidate intentions, or the “feedback operation” may also be a user manually inputting a new search intention, etc.
  • In this implementation, M candidate intentions are first generated through the first neural network, and a target category that has a matching relationship with the image to be processed is then determined based on the feedback operation input by the user for the M candidate intentions; that is, an interactive method is used to guide the user's search intention, which is conducive to improving the accuracy of the determined target category.
  • In a possible implementation, the method further includes: the client device obtains target text information input by the user, where the target text information is used to indicate the user's search intention; the server or the client device inputs the text information into a fourth neural network, so as to perform feature extraction on the text information through the fourth neural network and obtain the feature information of the text information. In this case, obtaining the target category through the first neural network includes: inputting the feature information of the image to be processed, the feature information of the at least two items, and the feature information of the text information into the first neural network, so as to obtain a target category that has a matching relationship with the image to be processed through the first neural network.
  • In this implementation, the target text information input by the user can also be obtained. The target text information is used to indicate the user's search intention, and its feature information is input into the first neural network together with the target feature information; that is, in the process of obtaining the target category that has a matching relationship with the image to be processed, not only is the information in the image fully obtained, but the text information used to indicate the user's search intention can also be combined, to further improve the accuracy of the determined candidate intentions.
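  • As a rough illustration of this combination, the sketch below simply concatenates the whole-image features, the per-item features, and the optional text features before the first neural network consumes them; concatenation is an assumption made here for illustration, since the application does not specify the fusion operator.

```python
from typing import List, Optional

def build_first_network_input(whole_image_features: List[float],
                              per_item_features: List[List[float]],
                              text_features: Optional[List[float]] = None) -> List[float]:
    """Fuse the three feature sources into one input vector for the
    first neural network; text features are only present when the user
    supplied target text information."""
    fused = list(whole_image_features)
    for item_features in per_item_features:
        fused.extend(item_features)
    if text_features is not None:  # target text information is optional
        fused.extend(text_features)
    return fused
```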
  • In a possible implementation, obtaining items of a target category that has a matching relationship with the image to be processed through the first neural network includes: the server obtains, through the first neural network, N candidate items that have a matching relationship with the image to be processed, where each candidate item belongs to the target category and N is an integer greater than 1; the server generates target scores corresponding to the N candidate items through the second neural network, where a target score indicates the matching degree between a candidate item and the image to be processed, that is, an aesthetic score of the matching rendering of the candidate item and the image to be processed; and the server selects K target items from the N candidate items based on the target scores corresponding to the N candidate items, where K is an integer greater than or equal to 1. Displaying items of the target category on the client device includes: displaying the K target items on the client device.
  • In this implementation, scores corresponding to the N candidate items are generated through a neural network, where the scores indicate the matching degree between the candidate items and the image to be processed, and the target items finally displayed to the user are selected from the N candidate items based on that matching degree. That is to say, how well a candidate item matches the image to be processed is quantitatively scored, and the aesthetics of the matching rendering are taken into consideration when selecting the target items, so that the matching renderings of the target items and the image to be processed that are provided to the user look better, which helps improve the user stickiness of this solution.
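  • The selection step can be pictured with the short sketch below, which ranks the N candidate items by their target scores and keeps the top K; the function name and the use of a plain sort are illustrative assumptions, with the example scores taken from Figure 8.

```python
from typing import List, Sequence

def select_target_items(candidates: Sequence[str],
                        target_scores: Sequence[float],
                        k: int) -> List[str]:
    """Keep the K candidate items whose matching renderings with the
    image to be processed score highest (1 <= K <= N)."""
    if not 1 <= k <= len(candidates):
        raise ValueError("K must satisfy 1 <= K <= N")
    ranked = sorted(zip(candidates, target_scores),
                    key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:k]]

# Example with the two sofa scores from Figure 8:
# select_target_items(["sofa one", "sofa two"], [0.956, 0.425], k=1)
# -> ["sofa one"]
```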
  • In a possible implementation, generating the target scores corresponding to the N candidate items through the second neural network includes: inputting the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels corresponding to the items in the image to be processed into the second neural network, and obtaining the target score corresponding to each candidate item output by the second neural network.
  • the semantic labels of the items in the image to be processed can also be called the feature information of the items in the image to be processed.
  • the semantic label of the candidate item may include at least one attribute information of the candidate item.
  • The semantic label of the candidate item may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item.
  • the client device displays items of a target category to the user, including: the client device displays to the user a rendering of a combination of the items of the target category and the image to be processed.
  • the aforementioned matching renderings can be in pure image format, renderings after VR modeling, renderings after AR modeling, or other formats, etc.
  • The client device can also display to the user any one or more of the following information about the items of each target category: access links, names, prices, target scores, or other types of information; there is no limit here.
  • In this implementation, the user is shown the matching rendering of the items of each target category and the image to be processed, so that the user can more intuitively experience the effect of applying the items of the target category to the image to be processed, which is conducive to improving the user stickiness of this solution.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology in the field of item search.
  • the method includes: the client device obtains an image to be processed input by the user, where the image to be processed contains a background and at least two items; receives items of a target category sent by the server that has a matching relationship with the image to be processed, where the items of the target category are obtained by the server based on the feature information of the image to be processed and the feature information of the at least two items; and displays the items of the target category.
  • In a possible implementation, the feature information of the image includes the feature information of the whole composed of the background and the at least two items, and the feature information of the at least two items includes the attribute information of each item. The attribute information of each item includes any one or more of the following information: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • In a possible implementation, the client device receives M candidate intentions corresponding to the image to be processed sent by the server and displays the M candidate intentions to the user, where M is an integer greater than or equal to 2 and each candidate intention indicates a category of items that has a matching relationship with the image to be processed; the client device obtains the feedback operations corresponding to the M candidate intentions, determines, based on the feedback operations for the M candidate intentions, a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • the client device can also be used to perform the steps performed by the client device in the first aspect and each possible implementation manner of the first aspect.
  • For the specific implementation methods of the steps and the meanings of the terms in each possible implementation manner of the second aspect, please refer to the first aspect; they are not repeated here.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology to the field of item search.
  • the method includes: the server obtains, through the first neural network and based on the feature information of the image to be processed and the feature information of the at least two items, a target category that has a matching relationship with the image to be processed, where the image to be processed contains a background and at least two items; and the server sends information about items of the target category to the client device.
  • the server obtains a target category that has a matching relationship with the image to be processed through the first neural network based on the characteristic information of the image to be processed and the characteristic information of at least two items, including:
  • the server generates M candidate intentions corresponding to the image to be processed through the first neural network, M is an integer greater than or equal to 2, and each candidate intention indicates a category of items that has a collocation relationship with the image to be processed;
  • the server sends the M candidate intentions to the client device, where the M candidate intentions are used by the client device to determine a target category that has a matching relationship with the image to be processed; and the server receives the target category sent by the client device.
  • the server can also be used to execute the steps performed by the server in the first aspect and each possible implementation of the first aspect.
  • embodiments of the present application provide an item matching device that can apply artificial intelligence technology to the field of item search.
  • the item matching device is applied to client equipment in an item matching system.
  • the item matching system also includes a server.
  • the item matching device includes: an acquisition module, used to obtain an image input by the user, in which there is a background and at least two items; a receiving module, used to receive a target category of items sent by the server that has a matching relationship with the image, The items of the target category are obtained by the server based on the feature information of the image and the feature information of at least two items; the display module is used to display the items of the target category.
  • the item matching device can also be used to perform the steps performed by the client device in the second aspect and each possible implementation manner of the second aspect.
  • For the specific implementation methods of the steps, the meanings of the terms, and the beneficial effects in each possible implementation manner of the fourth aspect, please refer to the second aspect; they are not repeated here.
  • embodiments of the present application provide an item matching device that can apply artificial intelligence technology to the field of item search.
  • the item matching device is applied to a server in an item matching system.
  • the item matching system also includes a client device.
  • the item matching device includes: an acquisition module, configured to obtain, through the first neural network and based on the feature information of the image and the feature information of at least two items, a target category of items that has a matching relationship with the image, where the image contains a background and at least two items; and a sending module, used to send information about items of the target category to the client device.
  • the item matching device can also be used to perform the steps performed by the server in the third aspect and each possible implementation of the third aspect.
  • embodiments of the present application provide a computer program product.
  • the computer program product includes a program.
  • When the program is run on a computer, it causes the computer to execute the item matching method described in the second aspect or the third aspect.
  • embodiments of the present application provide a computer-readable storage medium.
  • a computer program is stored in the computer-readable storage medium; when the program is run on a computer, it causes the computer to execute the item matching method of the second aspect or the third aspect.
  • embodiments of the present application provide a client device, including a processor and a memory.
  • the processor is coupled to the memory.
  • the memory is used to store programs; the processor is used to execute the program in the memory, so that the client device executes the methods performed by the client device in the above aspects.
  • embodiments of the present application provide a server, including a processor and a memory.
  • the processor is coupled to the memory.
  • the memory is used to store programs; the processor is used to execute the program in the memory, so that the server executes the methods performed by the server in the above aspects.
  • the present application provides a chip system, which includes a processor for supporting a terminal device or communication device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system also includes a memory, which is used to store necessary program instructions and data for the terminal device or communication device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Figure 1a is a schematic structural diagram of the artificial intelligence main framework provided by the embodiment of the present application.
  • Figure 1b is an application scenario diagram of the item matching method provided by the embodiment of the present application.
  • Figure 2a is a system architecture diagram of the item matching system provided by the embodiment of the present application.
  • Figure 2b is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 3 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of an interface for obtaining the image to be processed and target text information in the item matching method provided by the embodiment of the present application;
  • Figure 5 is a schematic diagram of the first feature extraction network in the item matching method provided by the embodiment of the present application.
  • Figure 6 is a schematic diagram showing M candidate intentions in the item matching method provided by the embodiment of the present application.
  • Figure 7 is a schematic flowchart of obtaining a target category in the item matching method provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of the target score in the item matching method provided by the embodiment of the present application.
  • Figure 9 is a schematic diagram of the second neural network in the item matching method provided by the embodiment of the present application.
  • Figure 10 is a schematic diagram of the matching effect diagram of the target item and the image to be processed in the item matching method provided by the embodiment of the present application;
  • Figure 11 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 12 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 13 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a client device provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • Figure 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1a shows a structural schematic diagram of the artificial intelligence main framework.
  • The above artificial intelligence framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • the "IT value chain” reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of human intelligence and information (providing and processing technology implementation) to the systematic industrial ecological process.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • computing power is provided by smart chips, which may specifically be hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA);
  • the basic platform includes related platform guarantees and support such as distributed computing frameworks and networks, and can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • The layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • After the above data processing, some general capabilities can be formed based on the results, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate overall artificial intelligence solutions, productize intelligent information decision-making, and realize practical applications. The main application fields include intelligent terminals, intelligent manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, etc.
  • Figure 1b is an application scenario diagram of the item matching method provided by the embodiment of the present application. As shown in Figure 1b, when the user is using a shopping application and clicks the corresponding icon, the user can input an image to be processed to search for and purchase items of a category that has a matching relationship with the image to be processed.
  • As another example, when using a decoration design application, the user can input an image to be processed to search for items of a category that has a matching relationship with the image. It should be understood that the embodiments of the present application can also be applied in other scenarios of obtaining items that have a matching relationship with an image to be processed; other application scenarios are not listed one by one here.
  • Figure 2a is a system architecture diagram of the item matching system provided by the embodiment of the present application.
  • the item matching system 200 includes a training device 210, a database 220, an execution device 230, a data storage system 240, and a client device 250.
  • the execution device 230 includes a computing module 231.
  • A first training data set is stored in the database 220. The training device 210 generates the first model/rule 201 and uses the first training data set in the database to iteratively train the first model/rule 201, to obtain a mature first model/rule 201.
  • the first model/rule 201 may be embodied as a model in the form of a first neural network or a non-neural network.
  • In the embodiments of this application, the first model/rule 201 being a first neural network is taken as an example for description.
  • the execution device 230 can call data, codes, etc. in the data storage system 240, and can also store data, instructions, etc. in the data storage system 240.
  • the data storage system 240 may be placed in the execution device 230 , or the data storage system 240 may be an external memory relative to the execution device 230 .
  • the trained first model/rule 201 obtained by the training device 210 may be deployed in the execution device 230, and the execution device 230 may appear as a server corresponding to the application program deployed on the client device 250.
  • the computing module 231 of the execution device 230 may obtain, through the first model/rule 201, a target category that has a matching relationship with the image to be processed, where the image to be processed is obtained through the client device 250, and the target category indicates that the image to be processed has a matching relationship with items of that category.
  • the client device 250 can be represented by various forms of terminal devices, such as mobile phones, tablets, laptops, virtual reality (VR) devices or augmented reality (AR) devices, etc.
  • the execution device 230 and the client device 250 may be independent devices.
  • the execution device 230 is configured with an input/output (I/O) interface for data interaction with the client device 250.
  • the "user" can input the image to be processed to the I/O interface through the client device 250, and the execution device 230 returns the items of the target category that have a matching relationship with the image to be processed to the client device 250 through the I/O interface, and provides them to the user.
  • It should be noted that Figure 2a is only a schematic architecture diagram of the item matching system provided by an embodiment of the present application, and the positional relationships between the devices, components, modules, etc. shown in the figure do not constitute any limitation.
  • the execution device 230 and the client device 250 can also be integrated into the same device, which is not limited here.
  • Figure 2b is a schematic flow chart of the item matching method provided by an embodiment of the present application.
  • S1. Obtain the image to be processed input by the user. There is a background and at least two items in the image to be processed.
  • S2. Based on the characteristic information of the image to be processed and the characteristic information of the at least two items, obtain an item of a target category that has a matching relationship with the image to be processed through the first neural network.
  • S3. Display items of the target category.
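  • The three steps S1 to S3 can be summarized as the sketch below; the helper callables (extract_features, first_neural_network, display) stand in for the components described in the rest of this document and are illustrative assumptions, not interfaces defined by this application.

```python
def match_items(image, extract_features, first_neural_network, display):
    """End-to-end sketch of the S1-S3 flow."""
    # S1: the image to be processed, input by the user, contains a
    # background and at least two items.
    whole_image_features, per_item_features = extract_features(image)
    # S2: the first neural network consumes both the whole-image features
    # and the per-item features to infer the target category.
    target_category = first_neural_network(whole_image_features, per_item_features)
    # S3: display items of the target category to the user.
    display(target_category)
    return target_category
```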
  • In the embodiments of this application, the user can not only search for items to match by providing an image to be processed, but even when the user inputs a complex image to be processed (that is, an image including at least two items), a target category of items that has a matching relationship with the entire image can still be obtained, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness.
  • the item matching system may include a client device and a server.
  • In some embodiments of this application, the process of "obtaining a target category that has a matching relationship with the image to be processed" may include two parts: feature extraction on the image to be processed, and determination of the target category based on the extracted features.
  • In one implementation, the operations of the aforementioned two parts can be performed entirely by the server, that is, the execution device of the first neural network and the client device are separate; in another implementation, the operations of the aforementioned two parts can be performed entirely by the client device, that is, the execution device of the first neural network and the client device are integrated in the same device; in yet another implementation, the feature extraction operation can be performed on the client device while the server performs the operation of determining the target category, in which case the execution device of the first neural network and the client device are likewise separate. Since the specific implementation processes of these three implementations differ, they are described separately below.
  • Figure 3 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • 301. The client device obtains the image to be processed input by the user.
  • the user can input the image to be processed through the client device.
  • the client device obtains the image to be processed input by the user to search for items that have a matching relationship with the image to be processed.
  • the image to be processed can be an image selected by the user from images stored locally on the client device, an image captured by the user using the camera on the client device, or an image downloaded by the user using a browser, etc. , no limitation is made here.
  • 302. The client device obtains the target text information input by the user, where the target text information is used to indicate the user's search intention.
  • the client device can also obtain target text information input by the user, and the target text information is used to indicate the user's search intention. Further, the item indicated by the target text information may be an item in the image to be processed, or may not be an item in the image to be processed.
  • Figure 4 is a schematic diagram of an interface for obtaining the image to be processed and the target text information in the item matching method provided by the embodiment of the present application.
  • Figure 4 includes two sub-schematic diagrams (a) and (b).
  • After the user inputs the image to be processed, entry into sub-diagram (b) of Figure 4 can be triggered, that is, the user is prompted through sub-diagram (b) of Figure 4 to input the target text information.
  • the schematic diagram can be flexibly set according to the actual product form, and is not limited here.
  • 303. The server inputs the image to be processed into the third neural network, so as to perform feature extraction on the image to be processed through the third neural network and obtain the target feature information corresponding to the image to be processed, where the target feature information includes the feature information of the items in the image to be processed and the feature information of the image to be processed.
  • Specifically, the client device can send the image to be processed to the server, and the server can input the received image into the third neural network, so as to perform feature extraction on the entire image through the third neural network and obtain the feature information of the image to be processed, which includes the overall feature information composed of the background of the image and the at least two items. The server also uses the third neural network to identify each item region in the image to be processed and perform feature extraction on the items, to obtain the feature information of the at least two items in the image, where the feature information of the at least two items includes the attribute information of each item.
  • the target feature information includes feature information of at least two items in the image to be processed and feature information of the image to be processed.
  • The aforementioned feature information of the image to be processed refers to the feature information obtained by treating the image to be processed as a whole (that is, the background of the image together with the at least two items) and performing feature extraction on it; as an example, the feature information of the image to be processed may include texture information, color information, contour information, style information, scene information, or other types of feature information.
  • the characteristic information of at least two items in the image to be processed can also be called the set of semantic tags corresponding to the image to be processed.
  • The feature information of the at least two items can include the attribute information of each item, and the attribute information of each item includes any one or more of the following types of information: the position information of the item in the image to be processed, the category information of the item, and the color information of the item; optionally, it can also include the style of the item, the material of the item, the pattern of the item, or other feature information.
  • the characteristic information of items of different categories may include different information.
  • For example, for an item of the bed category, the feature information of the bed may include the position information of the bed in the image to be processed, the category information of the bed, the color of the bed, and the style of the bed; for an item of the top (garment) category, the feature information of the top may include the position information of the top in the image to be processed, the category information of the top, the color of the top, the shape of the top, and the material of the top. It should be understood that the examples here are only used to facilitate understanding of this solution and are not used to limit this solution.
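  • As a concrete illustration of the bed and top examples above, the attribute information could look like the following; all field values are made up for illustration, and the dictionary encoding is an assumption rather than a format defined by the application.

```python
# Hypothetical semantic labels for the two example items.
bed_attributes = {
    "position": (40, 120, 300, 180),  # assumed bounding box in the image
    "category": "bed",
    "color": "walnut brown",
    "style": "Nordic",
}
top_attributes = {
    "position": (10, 30, 90, 140),    # assumed bounding box in the image
    "category": "top",
    "color": "white",
    "shape": "slim fit",
    "material": "cotton",
}
```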
  • the third neural network can specifically be embodied as a convolutional neural network or other neural networks used for feature extraction. Further, the third neural network may include two different feature extraction networks: a first feature extraction network and a second feature extraction network.
  • the first feature extraction network is used to generate feature information of at least two items in the image to be processed, and the second feature extraction network is used to generate feature information of the entire image to be processed.
  • The first feature extraction network can be obtained as part of a neural network used for target recognition on images; that is, the training device can use training data to iteratively train the neural network used for target recognition on images until the convergence conditions are met, and after the trained neural network is obtained, the trained first feature extraction network is taken from it.
  • For example, a neural network used for object recognition in an image can identify coffee tables, sideboards, storage cabinets, shoe cabinets, and flower racks in the image; that is, the first feature extraction network in the embodiments of the present application can be used for feature extraction at a finer granularity.
  • Figure 5 is a schematic diagram of the first feature extraction network in the item matching method provided by the embodiment of the present application.
  • The first feature extraction network can identify the three item regions in the image to be processed and generate the feature information of the items in the image. It should be understood that the example in Figure 5 is only for convenience of understanding this solution and is not used to limit this solution.
  • The second feature extraction network can be obtained as part of a neural network used to classify the entire image; that is, the training device can use training data to iteratively train the neural network used to classify the entire image until the convergence conditions are met, and after the trained neural network is obtained, the trained second feature extraction network is taken from it.
  • In the embodiments of this application, the feature information of the image to be processed refers to the feature information obtained by treating the image to be processed as a whole and performing feature extraction on it, while the feature information of the at least two items in the image may include the attribute information of each item. This further refines the two concepts and helps distinguish the feature information of the image from the feature information of the items more clearly. Moreover, the feature information of each item includes information such as the item's category, color, style, material, or pattern, so the information of the items in the image to be processed is fully considered, which is beneficial to improving the accuracy of the determined target categories.
  • 304. The server inputs the text information into the fourth neural network, so as to perform feature extraction on the text information through the fourth neural network and obtain the feature information of the text information.
  • the server can also input text information into a fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • Step 302 is an optional step. If step 302 is executed, the text information input into the fourth neural network refers to the target text information obtained in step 302; if step 302 is not executed, the text information input into the fourth neural network may be the feature information of the items in the image to be processed obtained in step 303, that is, the text information input into the fourth neural network may be the semantic label set of the image to be processed.
  • the fourth neural network is a neural network that extracts features from text information. It can be embodied as a recurrent neural network or other types of neural networks, etc., and is not exhaustive here.
  • step 304 is also an optional step. If step 304 is not executed, step 302 does not need to be executed. After step 303 is executed, step 305 can be executed directly.
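  • The relationship between steps 302, 303, and 304 can be expressed as the small sketch below; the function name and signature are illustrative assumptions.

```python
from typing import List, Optional

def choose_fourth_network_input(target_text: Optional[str],
                                item_semantic_labels: Optional[List[str]]) -> Optional[str]:
    """Pick the text fed to the fourth neural network: the user's target
    text information when step 302 was executed, otherwise the semantic
    label set produced by step 303; None means step 304 is skipped."""
    if target_text is not None:       # step 302 executed
        return target_text
    if item_semantic_labels:          # step 302 skipped, step 303 output reused
        return " ".join(item_semantic_labels)
    return None                       # step 304 not executed
```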
  • 305. Based on the feature information of the image to be processed and the feature information of the at least two items, the server obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • The server may obtain, through the first neural network, a target category that has a matching relationship with the image to be processed based on the feature information of the image and the feature information of the at least two items. Specifically, in one implementation, if steps 303 and 304 are executed, the server can input the target feature information and the feature information of the text information into the first neural network, so that the first neural network generates M candidate intentions corresponding to the image to be processed, where each candidate intention indicates a category of items that has a matching relationship with the image to be processed and M is an integer greater than or equal to 1; further, when there are at least two items in the image to be processed, M is an integer greater than or equal to 2.
  • the first neural network can also output M first scores that correspond one-to-one to the M candidate intentions, and each first score is used to indicate the probability that a candidate intention is consistent with the user's search intention.
  • For different images to be processed, the number of candidate intentions output by the first neural network may be the same or different; that is, the first neural network can determine the number of candidate intentions to output according to the actual situation.
  • the server sends the M candidate intentions to the client device to present the M candidate intentions to the user through the display interface of the client device; wherein the client device can present the M candidate intentions to the user in text, images, or other forms.
  • The server can also send the M first scores to the client device, and the client device can sort the M candidate intentions according to the first score corresponding to each candidate intention: the higher the first score, the higher the ranking position.
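  • A minimal sketch of this client-side sorting, using the candidate intentions of Figure 6 with made-up first scores (the application gives no numeric examples here):

```python
from typing import List, Sequence

def rank_candidate_intentions(intentions: Sequence[str],
                              first_scores: Sequence[float]) -> List[str]:
    """Order the M candidate intentions for display: the higher the first
    score (probability of matching the user's search intention), the
    higher the ranking position."""
    ranked = sorted(zip(intentions, first_scores),
                    key=lambda pair: pair[1], reverse=True)
    return [intention for intention, _ in ranked]

# rank_candidate_intentions(["pendants", "decorative paintings", "lighting"],
#                           [0.21, 0.55, 0.24])
# -> ["decorative paintings", "lighting", "pendants"]
```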
  • FIG. 6 is a schematic diagram showing M candidate intentions in the item matching method provided by the embodiment of the present application.
  • If the image to be processed contains three main regions, namely a bed, a wardrobe, and a wall, and the text information is "wall decoration", then the target feature information can include the feature information of the bed, the feature information of the wardrobe, the feature information of the wall, and the feature information of the entire image to be processed, and the M candidate intentions may include the decorative paintings, pendants, and lighting shown in Figure 6. It should be understood that the examples in Figure 6 are only for convenience of understanding this solution and are not used to limit this solution.
  • After the client device displays the M candidate intentions to the user, in one case, if the client device obtains the feedback operations corresponding to the M candidate intentions, it can determine, based on the feedback operations for the M candidate intentions, a target category that has a matching relationship with the image to be processed, and send that target category to the server. Correspondingly, if the server obtains the aforementioned target category sent by the client device within a target time period, it can determine the target category corresponding to the image to be processed.
  • the "feedback operation” can be a selection operation for one of the M candidate intentions, or the “feedback operation” can also be the user manually inputting a new search intention, etc.
  • The specific implementation forms of the "feedback operation" are not exhaustively listed here.
  • the target category may be one of the M candidate intentions, or may be other search intentions other than the M candidate intentions.
  • Figure 7 is a schematic flowchart of obtaining a target category in the item matching method provided by an embodiment of the present application.
  • E1. The server inputs the target feature information and the feature information of the text information into the first neural network, and the first neural network generates M candidate intentions corresponding to the image to be processed.
  • E2. The server sends M candidate intentions to the client device.
  • E3. The client device displays the M candidate intentions to the user.
  • E4. The client device determines a target category based on the feedback operation input by the user for the M candidate intentions.
  • E5. The client device sends the target category to the server, and accordingly, the server receives the target category.
  • the example in Figure 7 is only for convenience of understanding this solution and is not used to limit this solution.
  • In another case, if the client device does not obtain the user's feedback operation, the client device may not send any feedback information to the server, or the client device may send first feedback information to the server, where the first feedback information is used to inform the server that the feedback operation input by the user has not been received.
  • If the server does not receive the target category from the client device within the target time period, or receives the first feedback information sent by the client device, the candidate intention with the highest first score among the M candidate intentions can be determined as the target category.
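  • Both cases (feedback received, or timeout/first feedback information) can be sketched as follows; the function shape and the encoding of the feedback operation are illustrative assumptions.

```python
from typing import Optional, Sequence, Union

def resolve_target_category(candidate_intentions: Sequence[str],
                            first_scores: Sequence[float],
                            feedback: Optional[Union[int, str]] = None) -> str:
    """Server-side resolution: use the user's feedback when it arrives
    within the target time period, otherwise fall back to the candidate
    intention with the highest first score."""
    if isinstance(feedback, int):    # selection of one of the M candidates
        return candidate_intentions[feedback]
    if isinstance(feedback, str):    # manually input new search intention
        return feedback
    # No feedback within the target time period (or first feedback
    # information received): pick the highest-scoring candidate intention.
    best = max(range(len(candidate_intentions)), key=lambda i: first_scores[i])
    return candidate_intentions[best]
```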
  • In another implementation, if step 303 is executed but step 304 is not executed, the server can input the target feature information into the first neural network, so that the first neural network generates M candidate intentions corresponding to the image to be processed.
  • The server sends the M candidate intentions to the client device, so as to display them to the user through the display interface of the client device and obtain the feedback operations corresponding to the M candidate intentions; the client device determines, based on the feedback operations, a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • In the embodiments of this application, when performing feature extraction on the image to be processed, not only can the feature information of the entire image be obtained, but also the feature information of the items in the image; then, based on the feature information of the entire image and the feature information of the items in it, M categories of items that have a matching relationship with the entire image are generated. That is, not only is the information of the entire image to be processed considered, but each item in the image is also fully considered, which is beneficial to improving the accuracy of the determined candidate intentions.
  • Further, the target text information input by the user can also be obtained. The target text information is used to indicate the user's search intention, and its feature information is input into the first neural network together with the target feature information; that is, in the process of obtaining the target category that has a matching relationship with the image to be processed, not only is the information in the image fully obtained, but the text information used to indicate the user's search intention can also be combined, to further improve the accuracy of the determined candidate intentions.
  • In another implementation, if neither step 303 nor step 304 is executed, the server can input the image to be processed into the first neural network, perform feature extraction on the image through the first neural network to obtain the feature information of the entire image, and, based on the feature information of the entire image, generate M candidate intentions corresponding to the image to be processed through the first neural network.
  • The server sends the M candidate intentions to the client device, so as to display them to the user through the display interface of the client device and obtain the feedback operations corresponding to the M candidate intentions; the client device determines, based on the feedback operations, a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • In the embodiments of this application, M candidate intentions are first generated through the first neural network, and a target category that has a matching relationship with the image to be processed is then determined based on the feedback operation input by the user for the M candidate intentions; that is, an interactive method is used to guide the user's search intention, which is conducive to improving the accuracy of the determined target category.
  • the server can also input the target feature information and the feature information of the text information into the first neural network to obtain a target category, generated by the first neural network, that has a matching relationship with the image to be processed.
  • if step 303 is executed but step 304 is not, the server can also input the target feature information alone into the first neural network to obtain a target category, generated by the first neural network, that has a matching relationship with the image to be processed.
  • the server can input the image to be processed into the first neural network and perform feature extraction on it through the first neural network to obtain the feature information of the entire image to be processed; according to this feature information, a target category that has a matching relationship with the image is generated through the first neural network.
  • the server obtains N candidate items, each of which is of the target category.
  • after the server determines a target category that has a matching relationship with the image to be processed, it can obtain N candidate items of the target category from the item library stored on the server, where N is an integer greater than 1; a sketch of this retrieval follows.
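  • purely as an illustration, the following sketch retrieves N candidate items of the target category from an item library; the Item structure and its field names are hypothetical and not specified by this application.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    category: str
    image_path: str

def get_candidate_items(item_library, target_category, n):
    """Return up to n items from the library whose category is the target category."""
    candidates = [it for it in item_library if it.category == target_category]
    return candidates[:n]

library = [Item("a1", "sofa", "a1.jpg"), Item("b2", "lamp", "b2.jpg"),
           Item("c3", "sofa", "c3.jpg")]
print(get_candidate_items(library, "sofa", n=2))  # the two sofa candidates
```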
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • the target scores indicate the matching degree between the candidate items and the image to be processed.
  • the server can generate a target score for each of the N candidate items through a second neural network, where a target score indicates the matching degree between a candidate item and the image to be processed, that is, it is an aesthetic score of the matching effect of the candidate item with the image to be processed.
  • Figure 8 is a schematic diagram of the target score in the item matching method provided by the embodiment of the present application.
  • Figure 8 includes three sub-schematic diagrams (a), (b) and (c).
  • sub-diagram (a) of Figure 8 shows the three items in the image to be processed; sub-diagram (b) shows candidate item sofa one, whose matching effect diagram with the image to be processed scores 0.956; sub-diagram (c) shows candidate item sofa two, whose matching effect diagram with the image to be processed scores 0.425. This means the matching degree between sofa one and the image to be processed is higher than that between sofa two and the image to be processed. It should be understood that the example in Figure 8 is only for ease of understanding and is not intended to limit this solution.
  • the server can input the feature information of each candidate item together with the target feature information into the second neural network to obtain the target score corresponding to that candidate item output by the second neural network; by performing this operation on each of the N candidate items, the server can generate a target score corresponding to each of the N candidate items.
  • the server can also input the image of each candidate item together with the image to be processed into the second neural network to obtain the target score corresponding to that candidate item output by the second neural network; by performing this operation on each of the N candidate items, the server can generate the target score corresponding to each candidate item.
  • the server can also input the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels of the items in the image to be processed into the second neural network to obtain the target score corresponding to each candidate item output by the second neural network.
  • the second neural network may be a convolutional neural network or other types of neural networks.
  • the semantic labels of the items in the image to be processed can also be called the feature information of the items in the image to be processed.
  • the semantic label of the candidate item may include at least one attribute information of the candidate item.
  • the semantic label of a candidate item may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item, which are not exhaustively listed here.
  • Figure 9 is a schematic diagram of the second neural network in the item matching method provided by the embodiment of the present application.
  • after the server inputs the image of a candidate item and the semantic label of that candidate item into the second neural network, the network performs feature extraction on the image of the candidate item to obtain the feature information of that image, and performs feature extraction on the semantic label to obtain the feature information of that label; the server then fuses, through the second neural network, the feature information of the candidate item's image with the feature information of its semantic label, and convolves the fused feature information to obtain the feature information corresponding to the candidate item.
  • after the server inputs the image to be processed and the semantic labels of the items in it into the second neural network, the network performs feature extraction on the image to be processed to obtain its feature information, and performs feature extraction on the semantic labels of the items to obtain the feature information of those labels; the server then fuses, through the second neural network, the feature information of the image to be processed with the feature information of the semantic labels, and convolves the fused feature information to obtain the feature information corresponding to the image to be processed.
  • the server then performs the above-mentioned multiplication, fusion and related operations on the two sets of feature information through the second neural network, and outputs a target score for the matching effect of the candidate item with the image to be processed. It should be understood that the example in Figure 9 is only for ease of understanding and is not intended to limit this solution.
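  • the following is a minimal sketch, in the spirit of Figure 9, of a second neural network with two identically shaped branches, each fusing image features with semantic-label features, and a score head applied to their element-wise product; every layer shape, the 32-dimensional label embedding, and the sigmoid output are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class MatchScorer(nn.Module):
    """Illustrative stand-in for the second neural network."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.img_encoder = nn.Sequential(                 # image branch
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim),
        )
        self.label_encoder = nn.Linear(32, feat_dim)      # label branch
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)     # fusion step
        self.head = nn.Linear(feat_dim, 1)                # score head

    def encode(self, image, label_emb):
        # Fuse the image features with the semantic-label features.
        img_feat = self.img_encoder(image)
        lbl_feat = self.label_encoder(label_emb)
        return self.fuse(torch.cat([img_feat, lbl_feat], dim=-1))

    def forward(self, cand_img, cand_label, scene_img, scene_label):
        cand_feat = self.encode(cand_img, cand_label)
        scene_feat = self.encode(scene_img, scene_label)
        # Element-wise multiplication stands in for the "multiplication" step.
        return torch.sigmoid(self.head(cand_feat * scene_feat)).squeeze(-1)

scorer = MatchScorer()
score = scorer(torch.randn(1, 3, 64, 64), torch.randn(1, 32),
               torch.randn(1, 3, 64, 64), torch.randn(1, 32))  # score in [0, 1]
```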
  • a training data set may be stored on the training device, and each piece of training data may include an image to be processed, the feature information of the items in the image to be processed, the images of at least two candidate items, and the semantic label corresponding to each candidate item.
  • the expected result corresponding to the training data is the one of the aforementioned at least two candidate items that is most suitable for the image to be processed.
  • the training device can form one set of target data by combining the image to be processed, the feature information of the items in the image to be processed, the image of one candidate item, and the semantic label corresponding to that candidate item; the training device can thus obtain at least two sets of target data in one-to-one correspondence with the at least two candidate items.
  • the training device inputs each set of target data into the second neural network to obtain a target score output by the second neural network; by performing the foregoing operation on each of the at least two sets of target data, it obtains at least two target scores in one-to-one correspondence with the at least two sets of target data, that is, in one-to-one correspondence with the at least two candidate items.
  • according to the at least two target scores, the training device selects from the at least two candidate items the item most suitable for matching with the image to be processed, and uses the selected item as the prediction result corresponding to the training data.
  • the training device generates the value of the loss function based on the prediction result and the expected result corresponding to the training data, and reversely updates the weight parameters of the second neural network, thereby completing one training iteration of the second neural network. A minimal sketch of this training step is shown below.
  • the training device uses multiple data in the training data set to iteratively train the second neural network until the convergence condition is met, and the trained second neural network is obtained.
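  • the following sketch illustrates one such training step, assuming the MatchScorer above and samples carrying the scene image, its label embedding, the candidate images and labels, and the index of the candidate annotated as most suitable; the cross-entropy loss over candidate scores is one plausible choice, not a requirement of this application.

```python
import torch
import torch.nn.functional as F

def train_step(scorer, optimizer, scene_img, scene_label,
               cand_imgs, cand_labels, best_idx):
    # Score every candidate against the same scene, one forward pass each.
    scores = torch.stack([
        scorer(img.unsqueeze(0), lbl.unsqueeze(0),
               scene_img.unsqueeze(0), scene_label.unsqueeze(0))
        for img, lbl in zip(cand_imgs, cand_labels)
    ], dim=1)                                  # shape: (1, num_candidates)
    # Treating the annotated best candidate as the target class pushes its
    # score above the others.
    loss = F.cross_entropy(scores, torch.tensor([best_idx]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                           # reverse update of the weights
    return loss.item()
```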
  • the server obtains K target items corresponding to the target category, each of which is of the target category.
  • steps 306 and 307 are both optional. If steps 306 and 307 are executed, step 308 may include: the server selects K target items from the N candidate items based on the target scores corresponding to the N candidate items, where K is an integer greater than or equal to 1; a candidate item with a higher target score has a greater probability of being selected, as in the sketch below.
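  • as an illustration of this selection, the sketch below picks K target items from the N scored candidates; sampling in proportion to the scores reflects the statement that higher-scoring candidates are more likely to be selected, while torch.topk gives the simpler deterministic alternative.

```python
import torch

def select_target_items(scores: torch.Tensor, k: int, sample: bool = False):
    """Return the indices of k target items chosen from the scored candidates."""
    if sample:
        # Probability of selection grows with the target score.
        return torch.multinomial(scores, num_samples=k, replacement=False)
    return torch.topk(scores, k=k).indices

scores = torch.tensor([0.956, 0.425, 0.731])   # scores as in Figure 8
print(select_target_items(scores, k=2))        # indices of the two best items
```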
  • target scores corresponding to the N candidate items are generated through a neural network, where a target score indicates the matching degree between a candidate item and the image to be processed; based on the matching degree between each candidate item and the image to be processed, the target items finally displayed to the user are selected from the N candidate items. That is, the aesthetic quality of the match between a candidate item and the image to be processed is quantitatively scored and taken into consideration when selecting the target items, so that the matching renderings of the target items with the image to be processed provided to the user look better, which helps improve the user stickiness of this solution.
  • the server can also directly obtain K target items corresponding to the target category from the item library, the category of each target item being the target category.
  • the server sends the information of the target item to the client device.
  • the server may acquire the information of each target item among the K target items, and send the information of each target item to the client device.
  • the information of each target item may include the image corresponding to the target item; optionally, it may also include any one or more of the following: an access link for the target item, its name, its price, a target rating, or other types of information about the item, which are not limited here.
  • the image corresponding to the target item may be the image of the target item itself; it may also be a matching rendering of the target item and the image to be processed generated by the server using a neural network.
  • the aforementioned matching renderings can be in pure image format, renderings after VR modeling, renderings after AR modeling, or other formats, etc., and are not limited here.
  • Figure 10 is a schematic diagram of the matching effect diagram of the target item and the image to be processed in the item matching method provided by the embodiment of the present application.
  • the sub-diagram on the left shows the image to be processed, and the two sub-diagrams on the right show the matching renderings of two different target items with the image to be processed. It should be understood that the example in Figure 10 is only for ease of understanding and is not intended to limit this solution.
  • the client device displays K target items corresponding to one target category to the user.
  • after acquiring the information of each of the K target items sent by the server, the client device displays the K target items corresponding to the target category to the user.
  • the client device can show the user the image corresponding to each target item; that image can be an image of the target item itself, or a matching effect diagram of the target item with the image to be processed.
  • the client device can display to the user the matching effect diagram of each target item with the image to be processed, so that the user can more intuitively experience the effect of applying items of the target category to the image to be processed, which helps improve the user stickiness of this solution.
  • the client device can also display to the user any one or more of the following for each target item: an access link, the name, the price, a target rating, or other types of information about the item, which are not limited here.
  • FIG. 11 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • the client device displays three candidate intentions to the user, namely the decorative painting, pendant and lighting shown in Figure 11; based on the user's selection of the candidate intention "decorative painting", the client device sends feedback information to the server, and this feedback information indicates to the server that the target category is "decorative painting".
  • based on the target category "decorative painting", the server sends information about two different decorative paintings (that is, target items) to the client device.
  • the information of each decorative painting includes the matching renderings of the decorative painting and the image to be processed, the name of the decorative painting, the price of the decorative painting, and the size of the decorative painting.
  • the example in Figure 11 shows the implementation process of the item matching method from the perspective of the client device; it is only for ease of understanding and is not intended to limit this solution.
  • Figure 12 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • the client device obtains the image to be processed input by the user.
  • the client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
  • the client device inputs the image to be processed into the third neural network to perform feature extraction on the image to be processed through the third neural network to obtain target feature information corresponding to the image to be processed.
  • the target feature information includes at least the feature information of the items in the image to be processed and the feature information of the image to be processed.
  • the client device inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • based on the feature information of the image to be processed and the feature information of the at least two items, the client device obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • for the specific implementation of steps 1201 to 1205, refer to the description of steps 301 to 305 in the embodiment corresponding to Figure 3; the difference is that in the embodiment corresponding to Figure 3 steps 303 to 305 are executed by the server, whereas in the embodiment corresponding to Figure 12 steps 1203 to 1205 are executed by the client device, as in the sketch below. Details are not repeated here.
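  • the following is a minimal end-to-end sketch of steps 1201 to 1205 on the client device; the callables third_nn, fourth_nn and first_nn stand for the third, fourth and first neural networks, and all names here are hypothetical.

```python
def client_side_intent(image, text, third_nn, fourth_nn, first_nn):
    target_feat = third_nn(image)      # step 1203: image feature extraction
    text_feat = fourth_nn(text)        # step 1204: text feature extraction
    logits = first_nn(target_feat, text_feat)  # step 1205: category logits
    # Assuming logits is a tensor, the best-matching category is its argmax;
    # step 1206 would then send this target category to the server.
    return logits.argmax(dim=-1)
```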
  • the client device sends the target category to the server.
  • the server obtains N candidate items, each of which is of the target category.
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • the target scores indicate the matching degree between the candidate items and the image to be processed.
  • the server obtains K target items corresponding to the target category, each of which is of the target category.
  • the server sends the information of the target item to the client device.
  • the client device displays K target items corresponding to one target category to the user.
  • Figure 13 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • the client device obtains the image to be processed input by the user.
  • the client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
  • the client device inputs the image to be processed into the third neural network to perform feature extraction on the image to be processed through the third neural network to obtain target feature information corresponding to the image to be processed.
  • the target feature information includes the feature information of the items in the image to be processed and the feature information of the image to be processed.
  • the client device inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • for the specific implementation of steps 1301 to 1304, refer to the description of steps 301 to 304 in the embodiment corresponding to Figure 3; the difference is that in the embodiment corresponding to Figure 3 steps 303 and 304 are executed by the server, whereas in the embodiment corresponding to Figure 13 steps 1303 and 1304 are executed by the client device. Details are not repeated here.
  • the client device may send the target feature information to the server; optionally, the client device sends the target feature information and the feature information of the text information to the server.
  • based on the feature information of the image to be processed and the feature information of the at least two items, the server obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • the server obtains N candidate items corresponding to the target category, each of which is of the target category.
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • the target scores indicate the matching degree between the candidate items and the image to be processed.
  • the server obtains K target items corresponding to the target category, each of which is of the target category.
  • the server sends the information of the target item to the client device.
  • the client device displays K target items corresponding to one target category to the user.
  • for the specific implementation of steps 1305 to 1310, refer to the description of steps 305 to 310 in the embodiment corresponding to Figure 3; details are not repeated here.
  • the user can provide an image of the scene in which the item to be searched for will be used (that is, the above-mentioned image to be processed), a target category that has a matching relationship with the entire image can then be obtained through the first neural network, and items of the target category are displayed to the user. With this solution, the user can not only search for items to match by providing an image, but even when the user inputs a complex image to be processed (that is, an image containing at least two items), a target category of items that has a matching relationship with the entire image can still be obtained, which greatly expands the application scenarios of this solution and helps improve its user stickiness. In addition, the target category is determined based on both the feature information of the entire image to be processed and the feature information of the items in it; that is, not only the information of the entire image is considered, but each item in the image is also fully considered, which is beneficial to improving the accuracy of the determined target category.
  • FIG 14 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • the item matching device 1400 is applied to the client device in the item matching system.
  • the item matching system also includes a server.
  • the item matching device 1400 includes: an acquisition module 1401, used to acquire an image input by a user, in which there is a background and at least two items; a receiving module 1402, used to receive a target category of items, sent by the server, that has a matching relationship with the image, where the target category is obtained by the server based on the feature information of the image and the feature information of the at least two items; and a display module 1403, used to display items of the target category.
  • the characteristic information of the image includes the overall characteristic information composed of the background and at least two items.
  • the characteristic information of the at least two items includes attribute information of each item.
  • the attribute information of each item includes any one or more of the following types of information: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • the receiving module 1402 is also used to receive M candidate intentions corresponding to the image sent by the server.
  • M is an integer greater than or equal to 2.
  • each candidate intention indicates a category of items that has a collocation relationship with the image; the display module 1403 is also used to display the M candidate intentions; the acquisition module 1401 is also used to obtain the feedback operations corresponding to the M candidate intentions and, based on those feedback operations, determine a target category that has a collocation relationship with the image.
  • the display module 1403 is specifically used to display the matching renderings of the target-category items with the image.
  • Figure 15 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • the item matching device 1500 is applied to the server in the item matching system.
  • the item matching system also includes client equipment.
  • the matching device 1500 includes: an acquisition module 1501, configured to acquire a target category of items that has a matching relationship with the image through a first neural network based on the feature information of the image and the feature information of at least two items, where there is a background in the image and at least two items; a sending module 1502 configured to send items of the target category to the client device.
  • the characteristic information of the image includes the overall characteristic information composed of the background and at least two items.
  • the characteristic information of the at least two items includes attribute information of each item.
  • the attribute information of each item includes any one or more of the following types of information: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • the acquisition module 1501 is specifically used for:
  • M candidate intentions corresponding to the image are generated through the first neural network, where M is an integer greater than or equal to 2 and each candidate intention indicates a category of items that has a matching relationship with the image; the M candidate intentions are sent to the client device, which uses them to obtain a target category that has a matching relationship with the image; the target category sent by the client device is then received.
  • the acquisition module 1501 is specifically used for:
  • N candidate items that have a matching relationship with the image are obtained, where each candidate item is of the target category and N is an integer greater than 1;
  • scores corresponding to the N candidate items are generated, where a score indicates the matching degree between a candidate item and the image; based on these scores, K target items are selected from the N candidate items;
  • the sending module is specifically used to send K target items to the client device.
  • FIG. 16 is a schematic structural diagram of a client device provided by an embodiment of the present application.
  • the client device 1600 may be embodied as a mobile phone, a tablet, a notebook computer, a smart wearable device, a smart robot, a smart home device, etc., which is not limited here.
  • the client device 1600 includes: a receiver 1601, a transmitter 1602, a processor 1603 and a memory 1604 (the number of processors 1603 in the client device 1600 can be one or more; one processor is taken as an example in Figure 16), wherein the processor 1603 may include an application processor 16031 and a communication processor 16032.
  • the receiver 1601, the transmitter 1602, the processor 1603, and the memory 1604 may be connected by a bus or other means.
  • Memory 1604 may include read-only memory and random access memory and provides instructions and data to processor 1603 .
  • a portion of memory 1604 may also include non-volatile random access memory (NVRAM).
  • the memory 1604 stores operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • Processor 1603 controls the operation of the client device.
  • the various components of the client device are coupled together through a bus system; besides a data bus, the bus system may also include a power bus, a control bus, a status signal bus, etc.; for clarity, the various buses are referred to as the bus system in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1603 or implemented by the processor 1603.
  • the processor 1603 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated hardware logic circuit in the processor 1603 or by instructions in the form of software.
  • the above-mentioned processor 1603 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and can further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • the processor 1603 can implement or execute each method, step and logical block diagram disclosed in the embodiment of this application.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium mature in this field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1604.
  • the processor 1603 reads the information in the memory 1604 and completes the steps of the above method in combination with its hardware.
  • the receiver 1601 may be used to receive input numeric or character information and generate signal inputs related to relevant settings and functional controls of the client device.
  • the transmitter 1602 can be used to output numeric or character information through the first interface; the transmitter 1602 can also be used to send instructions to a disk group through the first interface to modify the data in the disk group; the transmitter 1602 can also include display devices such as a display screen.
  • the processor 1603 is used to execute the item matching method executed by the client device in the corresponding embodiment of FIG. 2b to FIG. 13 .
  • the application processor 16031 is used to: obtain an image input by the user, in which there is a background and at least two items; receive a target category of items, sent by the server, that has a matching relationship with the image, where the target category is obtained by the server based on the feature information of the image and the feature information of the at least two items; and display items of the target category.
  • FIG. 17 is a schematic structural diagram of the server provided by the embodiment of the present application.
  • the server 1700 is implemented by one or more servers.
  • the server 1700 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1722 (for example, one or more processors), memory 1732, and one or more storage media 1730 (for example, one or more mass storage devices) storing application programs 1742 or data 1744.
  • the memory 1732 and the storage medium 1730 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 1722 may be configured to communicate with the storage medium 1730 and execute a series of instruction operations in the storage medium 1730 on the server 1700 .
  • Server 1700 may also include one or more power supplies 1726, one or more wired or wireless network interfaces 1750, one or more input and output interfaces 1758, and/or one or more operating systems 1741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
  • the central processing unit 1722 is used to execute the item matching method executed by the server in the corresponding embodiment of FIGS. 2b to 13 .
  • the central processor 1722 is configured to: obtain, through the first neural network and based on the feature information of the image and the feature information of at least two items, a target category of items that has a collocation relationship with the image, where there is a background and at least two items in the image; and send items of the target category to the client device.
  • Embodiments of the present application also provide a computer program product.
  • the computer program product includes a program.
  • when the program is run on a computer, it causes the computer to perform the methods performed by the client device in the embodiments shown in Figures 2b to 13, or causes the computer to perform the steps performed by the server in those embodiments.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program.
  • when the program is run on a computer, it causes the computer to perform the steps performed by the client device in the methods described in the embodiments shown in Figures 2b to 13, or causes the computer to perform the steps performed by the server in those methods.
  • the client device, server or item matching device provided by the embodiment of the present application may specifically be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins or circuits.
  • the processing unit can execute computer execution instructions stored in the storage unit, so that the chip executes the matching method of items described in the embodiments shown in FIGS. 2b to 13 .
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • Figure 18 is a structural schematic diagram of a chip provided by an embodiment of the present application.
  • the chip can be represented as a neural network processor NPU 180.
  • the NPU 180 serves as a coprocessor mounted on the host CPU (Host CPU), and tasks are allocated by the Host CPU.
  • the core part of the NPU is the arithmetic circuit 1803.
  • the arithmetic circuit 1803 is controlled by the controller 1804 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1803 includes multiple processing units (Process Engine, PE).
  • in some implementations, the arithmetic circuit 1803 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1803 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1802 and caches it on each PE in the arithmetic circuit.
  • the operation circuit takes matrix A data from the input memory 1801 and performs matrix operations with matrix B; the partial or final results of the matrix are stored in the accumulator 1808.
  • the unified memory 1806 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1802 through the direct memory access controller (DMAC) 1805.
  • Input data is also transferred to unified memory 1806 via DMAC.
  • the BIU is the bus interface unit 1810, which is used for the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1809.
  • the bus interface unit 1810 (BIU) is used by the instruction fetch buffer 1809 to obtain instructions from the external memory, and is also used by the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1806, to transfer the weight data to the weight memory 1802, or to transfer the input data to the input memory 1801.
  • the vector calculation unit 1807 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • vector calculation unit 1807 can store the processed output vectors to unified memory 1806 .
  • the vector calculation unit 1807 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1803, such as linear interpolation on the feature plane extracted by the convolution layer, or a vector of accumulated values, to generate an activation value.
  • vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1803, such as for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804;
  • the unified memory 1806, the input memory 1801, the weight memory 1802 and the instruction fetch memory 1809 are all on-chip memories; the external memory is private to the NPU hardware architecture.
  • each layer in the first, second, third and fourth neural networks shown in the method embodiments corresponding to Figures 2b to 13 can be implemented by the operation circuit 1803 or the vector calculation unit 1807.
  • the processor mentioned in any of the above places may be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the method of the first aspect.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware; of course, it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits or dedicated circuits. For this application, however, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk or an optical disk, and includes several instructions to cause a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device or data center to another website, computer, training device or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or by wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a training device or a data center integrated with one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk (Solid State Disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an item matching method and a related device, by means of which artificial intelligence technology can be applied to the technical field of item searching. The method comprises: acquiring an image input by a user, wherein a background and at least two items are present in the image; on the basis of attribute information of the image and attribute information of the at least two items, acquiring, by means of a first neural network, a target category having a matching relationship with the image; and displaying a target item of the target category. Not only can an item to be matched be searched for by presenting an image, but when a complex image is input, an item of a target category having a matching relationship with the entire image can still be acquired, thereby greatly expanding the application scenarios of the solution of the present invention and facilitating an improvement in its user stickiness.
PCT/CN2023/084241 2022-03-31 2023-03-28 Procédé d'appariement d'articles et dispositif associé Ceased WO2023185787A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210333006.5A CN116932804A (zh) 2022-03-31 2022-03-31 一种物品的搭配方法以及相关设备
CN202210333006.5 2022-03-31

Publications (1)

Publication Number Publication Date
WO2023185787A1 true WO2023185787A1 (fr) 2023-10-05

Family

ID=88199121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084241 Ceased WO2023185787A1 (fr) 2022-03-31 2023-03-28 Procédé d'appariement d'articles et dispositif associé

Country Status (2)

Country Link
CN (1) CN116932804A (fr)
WO (1) WO2023185787A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119205279A (zh) * 2024-11-26 2024-12-27 贝壳找房(北京)科技有限公司 图像处理方法、设备、介质及程序产品

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095362A (zh) * 2015-06-25 2015-11-25 深圳码隆科技有限公司 一种基于目标对象的图像显示方法和装置
CN109583514A (zh) * 2018-12-19 2019-04-05 成都西纬科技有限公司 一种图像处理方法、装置及计算机存储介质
CN110909746A (zh) * 2018-09-18 2020-03-24 深圳云天励飞技术有限公司 一种服饰推荐方法、相关装置和设备
CN111401306A (zh) * 2020-04-08 2020-07-10 青岛海尔智能技术研发有限公司 用于衣物穿搭推荐的方法及装置、设备
US20210303914A1 (en) * 2020-11-11 2021-09-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Clothing collocation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426462A (zh) * 2015-11-13 2016-03-23 深圳码隆科技有限公司 一种基于图像元素的图像搜索方法和装置
CN109903103B (zh) * 2017-12-07 2021-08-20 华为技术有限公司 一种推荐物品的方法和装置
CN108829764B (zh) * 2018-05-28 2021-11-09 腾讯科技(深圳)有限公司 推荐信息获取方法、装置、系统、服务器及存储介质
CN111400525B (zh) * 2020-03-20 2023-06-16 中国科学技术大学 基于视觉组合关系学习的时尚服装智能搭配与推荐方法
CN113961833A (zh) * 2021-10-20 2022-01-21 维沃移动通信有限公司 信息搜索方法、装置及电子设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095362A (zh) * 2015-06-25 2015-11-25 深圳码隆科技有限公司 一种基于目标对象的图像显示方法和装置
CN110909746A (zh) * 2018-09-18 2020-03-24 深圳云天励飞技术有限公司 一种服饰推荐方法、相关装置和设备
CN109583514A (zh) * 2018-12-19 2019-04-05 成都西纬科技有限公司 一种图像处理方法、装置及计算机存储介质
CN111401306A (zh) * 2020-04-08 2020-07-10 青岛海尔智能技术研发有限公司 用于衣物穿搭推荐的方法及装置、设备
US20210303914A1 (en) * 2020-11-11 2021-09-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Clothing collocation

Also Published As

Publication number Publication date
CN116932804A (zh) 2023-10-24

Similar Documents

Publication Publication Date Title
CN113095475B (zh) 一种神经网络的训练方法、图像处理方法以及相关设备
US10346893B1 (en) Virtual dressing room
US9875258B1 (en) Generating search strings and refinements from an image
US10032072B1 (en) Text recognition and localization with deep learning
US9607010B1 (en) Techniques for shape-based search of content
US20200104633A1 (en) Methods and apparatus for recommending collocating dress, electronic devices, and storage media
CN112905889B (zh) 服饰搜索方法及装置、电子设备和介质
US9830534B1 (en) Object recognition
WO2018118803A1 (fr) Représentation de catégorie visuelle avec classement divers
CN109409994A (zh) 模拟用户穿戴服装饰品的方法、装置和系统
CN111414915A (zh) 一种文字识别方法以及相关设备
CN108229559A (zh) 服饰检测方法、装置、电子设备、程序和介质
CN111950702A (zh) 一种神经网络结构确定方法及其装置
CN113159315A (zh) 一种神经网络的训练方法、数据处理方法以及相关设备
WO2024002167A1 (fr) Procédé de prédiction d'opération et appareil associé
WO2024041483A1 (fr) Procédé de recommandation et dispositif associé
CN114821096A (zh) 一种图像处理方法、神经网络的训练方法以及相关设备
WO2023231753A1 (fr) Procédé d'apprentissage de réseau neuronal, procédé de traitement de données et dispositif
Magassouba et al. Predicting and attending to damaging collisions for placing everyday objects in photo-realistic simulations
WO2023246735A1 (fr) Procédé de recommandation d'article et dispositif connexe associé
CN113627421B (zh) 一种图像处理方法、模型的训练方法以及相关设备
US20240203143A1 (en) Prompt tuning for zero-shot compositional learning in machine learning systems
WO2023185787A1 (fr) Procédé d'appariement d'articles et dispositif associé
CN116910201A (zh) 一种对话数据生成方法及其相关设备
CN115618950A (zh) 一种数据处理方法及相关装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23778152

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23778152

Country of ref document: EP

Kind code of ref document: A1