WO2025067005A1 - Method for picture retrieval and related apparatus - Google Patents
- Publication number
- WO2025067005A1 (PCT/CN2024/119642)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- image
- picture
- cloud
- target image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/532—Query formulation, e.g. graphical querying
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
Definitions
- the present application relates to the field of cloud computing technology, and in particular to a method and related device for image retrieval.
- Retrieval is also called search.
- On a search engine or shopping platform, after the user enters keywords in the search box, the platform returns the corresponding search results.
- Image retrieval searches based on an image input by the user and returns images similar to the input image; it is also called image search.
- Although image retrieval can narrow the search scope compared with text retrieval, it still returns a large number of images. Users need to browse the returned images one by one to find the desired ones, so accurate retrieval cannot be achieved.
- the present application provides a method for image retrieval, which is used to improve the accuracy of image retrieval.
- the present application also provides a corresponding device, system, computer-readable storage medium, and computer program product.
- the first aspect of the present application provides a method for image retrieval, including: generating multiple preview images based on a first image and a first text from a client, wherein the first text is used to adjust the first image; sending the multiple preview images to the client; retrieving a second image associated with a target image based on operation information from the client, wherein the target image is included in the multiple preview images; and sending the second image to the client.
- the cloud can be software or services of a cloud platform, or software or services deployed on a node in a network such as an edge node.
- the cloud can run on an independent physical machine or on virtualized resources.
- the client can be a terminal device or an application, for example, the application runs on the terminal device for the user to use.
- the application can be a search engine, an answering application, or a shopping application.
- the user can input the first image and the first text on the client, and use the first text to guide the cloud to adjust the first image, thereby obtaining multiple preview images.
- the user can select a suitable target image from the preview images, and the cloud searches based on that target image, which can narrow the search scope and improve the accuracy of image retrieval.
- the above step of generating multiple preview images based on the first image and the first text from the client includes: expanding the first text to obtain multiple extended texts; and generating, for each extended text, a corresponding preview image based on that extended text and the first image.
- the cloud can expand around the content of the first text. For example, if the first text asks for a flower pattern to be added to the first picture, the text can be expanded around the type of flower, the position of the flower on the first picture, etc.
- the resulting preview pictures meet the user's requirements in the first text from multiple angles, thereby increasing the richness of the preview pictures within the scope of the user's requirements.
- the multiple extended texts are texts in a text design database whose similarity with the first text satisfies a first condition.
- the text design database can be pre-built and continuously updated, and the text in the text design database can be collected based on historical text retrieval records.
- texts with high similarity to the first text can be found by querying the text design database with the first text.
- the first condition can be a threshold value. Taking the value of the similarity in the interval of 0 to 1 as an example, the threshold value can be 0.5 or other numerical values. Taking 0.5 as an example, texts with a similarity greater than 0.5 can be understood as texts whose similarity to the first text meets the first condition.
- the first condition can also be a restriction condition on the number of texts, such as: the top N texts with the highest similarity to the first text are texts whose similarity to the first text meets the first condition. It can be seen from this possible implementation that a better extended text can be obtained through the text design database, thereby improving the user satisfaction of the preview image.
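The two forms of the first condition described above (a similarity threshold, or a limit on the number of texts) can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `select_extended_texts`, the example texts, and the similarity scores are all hypothetical, and the similarity values are assumed to have been computed elsewhere.

```python
from typing import Dict, List, Optional


def select_extended_texts(
    similarities: Dict[str, float],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> List[str]:
    """Return candidate texts that satisfy the first condition.

    `similarities` maps each text in a (hypothetical) text design database
    to its similarity with the first text, in the range 0 to 1.
    """
    # Sort by similarity, highest first; ties are broken alphabetically.
    ranked = sorted(similarities.items(), key=lambda kv: (-kv[1], kv[0]))
    if threshold is not None:
        # Threshold form: keep texts whose similarity exceeds the threshold.
        ranked = [(text, score) for text, score in ranked if score > threshold]
    if top_n is not None:
        # Count form: keep only the N most similar texts.
        ranked = ranked[:top_n]
    return [text for text, _ in ranked]


# Made-up similarity scores for illustration only.
scores = {
    "small roses scattered evenly": 0.82,
    "one large sunflower on the chest": 0.67,
    "striped collar": 0.31,
}
by_threshold = select_extended_texts(scores, threshold=0.5)
by_count = select_extended_texts(scores, top_n=1)
```

With a threshold of 0.5 the first two texts are kept; with N = 1 only the most similar text is kept.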
- the multiple extended texts are texts in a text design database whose similarity to the first text and whose design popularity together meet a second condition.
- the first text can be used to search the text design database for texts with high similarity to the first text. The popularity of these highly similar texts can then be determined, and the extended texts can be determined by combining similarity and popularity.
- the popularity of each text in the text design database can be counted in advance by the number of times it is used.
- the value of popularity can be a value between 0 and 1.
- the second condition can include one threshold or two thresholds. When it includes one threshold, the similarity and popularity of a text can be weighted to obtain a weighted value (the weight of the similarity is usually greater than that of the popularity). If the weighted value is greater than this threshold, the text meets the second condition and is an extended text of the first text.
- when it includes two thresholds, one can be a similarity threshold, such as 0.5, and the other a threshold on the number of extended texts, such as N, where N is a positive integer. The texts whose similarity to the first text is greater than 0.5 can be sorted by popularity, and the N texts with the highest popularity are then selected as the final extended texts. In this application, the accuracy of the extended text can be improved by combining similarity and popularity.
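Both variants of the second condition can be sketched in a few lines. The weights 0.7 and 0.3, the `Candidate` type, the function names, and the sample data below are assumptions chosen for illustration; the text only states that the similarity weight is usually larger than the popularity weight.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str
    similarity: float  # similarity to the first text, 0 to 1
    popularity: float  # usage-based popularity, 0 to 1


def select_weighted(cands: List[Candidate], threshold: float,
                    w_sim: float = 0.7, w_pop: float = 0.3) -> List[str]:
    """One-threshold form: keep texts whose weighted value exceeds the threshold."""
    return [c.text for c in cands
            if w_sim * c.similarity + w_pop * c.popularity > threshold]


def select_two_stage(cands: List[Candidate], sim_threshold: float, n: int) -> List[str]:
    """Two-threshold form: filter by similarity, then keep the N most popular."""
    similar = [c for c in cands if c.similarity > sim_threshold]
    similar.sort(key=lambda c: c.popularity, reverse=True)
    return [c.text for c in similar[:n]]


# Made-up candidates for illustration only.
candidates = [
    Candidate("A", similarity=0.8, popularity=0.2),
    Candidate("B", similarity=0.6, popularity=0.9),
    Candidate("C", similarity=0.4, popularity=1.0),
]
weighted = select_weighted(candidates, threshold=0.6)
two_stage = select_two_stage(candidates, sim_threshold=0.5, n=1)
```

In this data, text C is very popular but not similar enough, so neither variant selects it; the two-stage variant prefers B over A because B is more popular among the sufficiently similar texts.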
- the second image is an image retrieved from an image vector database.
- the picture vector database can be pre-built and continuously updated, and the pictures in it can be collected from the Internet.
- the second picture can also be searched for on the Internet. In this application, retrieving the second picture from the picture vector database can improve the speed of retrieving the second picture.
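As a rough illustration of retrieval from a picture vector database, the sketch below ranks stored picture vectors by cosine similarity to the target picture's vector. The application does not specify the embedding model, the similarity measure, or the index structure, so cosine similarity, the brute-force scan, the picture identifiers, and the three-dimensional vectors are all assumptions; a production vector database would typically use an approximate nearest-neighbour index instead.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k stored pictures most similar to the query vector."""
    ranked = sorted(index, key=lambda pid: cosine(query, index[pid]), reverse=True)
    return ranked[:k]


# Hypothetical picture ids and toy 3-dimensional embeddings.
picture_index = {
    "pic_104": [0.9, 0.1, 0.0],
    "pic_105": [0.8, 0.2, 0.1],
    "pic_200": [0.0, 0.1, 0.9],
}
target_vector = [1.0, 0.0, 0.0]
top_two = retrieve(target_vector, picture_index, k=2)
```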
- the method is applied to an answering system based on a large language model (LLM); the operation information includes a target image, or the operation information includes a target image and a second text.
- image retrieval can be performed based on the target image.
- retrieval can also be performed based on the target image and the extended text corresponding to the target image.
- the target image can be used as the input image for a second round of image editing, and the second text can guide the cloud to edit the target image and provide the user with a second wave of preview images, until the user selects a satisfactory image.
- the above step: retrieving a second image associated with the target image includes: calling an image editing retrieval model through LLM, and retrieving the associated second image based on the extended text corresponding to the target image and the target image.
- the image editing retrieval model retrieves the second image based on the extended text corresponding to the target image and the target image, which can improve the accuracy of the retrieved second image.
- retrieving the second image associated with the target image includes: calling the image editing retrieval model through LLM, and retrieving the associated second image based on the extended text corresponding to the target image, the second text and the target image.
- the image editing retrieval model retrieves the second image based on the extended text corresponding to the target image, the second text and the target image, which can improve the accuracy of the retrieved second image.
- the above step: retrieving the second image associated with the target image includes: calling the picture editing generation model through LLM to generate multiple preview images associated with the target image and the second text.
- based on the target image and the multiple extended texts obtained by expanding the second text, the picture editing generation model can obtain multiple preview images associated with the target image and the second text, which are used to determine the second image.
- the image editing generation model refers to a model that can use the second text and the target image to further generate the next wave of preview images.
- the generation process can be understood by referring to the process of generating preview images from the first text and the first image. In this way, multiple rounds of generating preview images can enable users to obtain satisfactory preview images, thereby improving the accuracy of the retrieved second image.
- the LLM-based answering system also includes a text analysis model, and the text analysis model is used to analyze the second text to determine whether the target image meets the user's expectations.
- the second text may instruct the LLM to call the picture editing retrieval model or the picture editing generation model.
- the text analysis model may learn whether the user is satisfied with the target picture by analyzing the second text. If the user is not satisfied, the text analysis model may instruct the LLM to call the picture editing generation model to generate the next wave of preview pictures. If the user is satisfied, the text analysis model may instruct the LLM to call the picture editing retrieval model to retrieve the second picture. This can improve the efficiency of communication with users.
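The routing decision described above can be sketched as a small dispatcher. The keyword check below is only a placeholder standing in for the text analysis model, whose actual behaviour is not specified; the hint words, function names, and returned model labels are hypothetical.

```python
from typing import Optional

# Hypothetical words suggesting the user still wants changes.
CHANGE_HINTS = ("but", "instead", "change", "more", "less", "not")


def user_is_satisfied(second_text: Optional[str]) -> bool:
    """Placeholder for the text analysis model: a naive keyword check."""
    if not second_text:
        # No second text: the operation information is only the target picture.
        return True
    words = second_text.lower().split()
    return not any(hint in words for hint in CHANGE_HINTS)


def choose_model(second_text: Optional[str]) -> str:
    """Pick which model the LLM would call next."""
    if user_is_satisfied(second_text):
        return "picture_editing_retrieval"   # retrieve the second picture
    return "picture_editing_generation"      # generate another wave of previews


route_no_text = choose_model(None)
route_happy = choose_model("this one looks great")
route_unhappy = choose_model("make the flowers smaller and change the color")
```

A real system would replace `user_is_satisfied` with a learned classifier or an LLM prompt; only the two-way branch reflects the description.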
- when the second picture is received from the cloud, the method further includes: receiving link information associated with the second picture from the cloud.
- the link information associated with the second picture may be description information of the product in the second picture and a purchase link, etc. In this way, the communication efficiency with the user can be improved.
- a second aspect of the present application provides a method for image retrieval, including: sending a first image and a first text input by a user to a cloud, wherein the first text is used to adjust the first image; receiving multiple preview images from the cloud, wherein the multiple preview images are generated based on the first image and the first text; sending operation information input by the user to the cloud, wherein the operation information is used to retrieve a second image associated with a target image, wherein the target image is included in the multiple preview images; and receiving the second image from the cloud.
- the user can input the first image and the first text on the client, and use the first text to guide the cloud to adjust the first image, thereby obtaining multiple preview images.
- the user can select a suitable target image from the preview images, and the cloud searches based on that target image, which can narrow the search scope and improve the accuracy of image retrieval.
- the method is applied to an answering system based on a large language model (LLM); the operation information includes a target image, or the operation information includes a target image and a second text.
- image retrieval can be performed based on the target image.
- retrieval can also be performed based on the target image and the extended text corresponding to the target image.
- the target image can be used as the input image for a second round of image editing, and the second text can guide the cloud to edit the target image and provide the user with a second wave of preview images, until the user selects a satisfactory image.
- the second image is retrieved based on the extended text of the first text and the target image by calling the image editing retrieval model through the LLM.
- the image editing retrieval model retrieves the second image based on the extended text corresponding to the target image and the target image, which can improve the accuracy of the retrieved second image.
- the second image is retrieved by calling the image editing retrieval model through LLM based on the extended text corresponding to the target image, the second text and the target image.
- the image editing retrieval model retrieves the second image based on the extended text corresponding to the target image, the second text and the target image, which can improve the accuracy of the retrieved second image.
- the method also includes: receiving multiple preview images associated with the target image and the second text from the cloud, the multiple preview images associated with the target image and the second text are obtained by LLM calling a picture editing generation model, the picture editing generation model obtains the multiple preview images associated with the target image and the second text based on multiple extended texts of the second text and the target image; sending a third image to the cloud, or a third image and a third text; the third image, or the third image and the third text are used to determine the second image, and the third image is included in the multiple preview images associated with the target image and the second text.
- the image editing generation model refers to a model that can use the extended text of the second text and the target image to further generate the next wave of preview images.
- the generation process can be understood by referring to the process of generating preview images from the first text and the first image. In this way, multiple rounds of generating preview images can enable users to obtain satisfactory preview images, thereby improving the accuracy of the retrieved second image.
- when the second picture is received from the cloud, the method further includes: receiving link information associated with the second picture from the cloud.
- the link information associated with the second picture may be description information of the product in the second picture and a purchase link, etc. In this way, the communication efficiency with the user can be improved.
- a third aspect of the present application provides a cloud device for executing the method in the first aspect or any possible implementation of the first aspect.
- the cloud device includes a module or unit for executing the method in the first aspect or any possible implementation of the first aspect, such as a processing unit and a sending unit.
- a fourth aspect of the present application provides a client for executing the method in the second aspect or any possible implementation of the second aspect.
- the client includes modules or units for executing the method in the second aspect or any possible implementation of the second aspect, such as a sending unit and a receiving unit.
- the fifth aspect of the present application provides a cloud device, including a transceiver, a processor and a memory, wherein the transceiver and the processor are coupled to the memory, and the memory is used to store programs or instructions.
- when the processor executes the programs or instructions, the cloud device executes the method in the aforementioned first aspect or any possible implementation of the first aspect.
- a sixth aspect of the present application provides a client, which may include at least one processor, a memory, and a communication interface.
- the processor is coupled to the memory and the communication interface.
- the memory is used to store instructions.
- the processor is used to execute the instructions.
- the communication interface is used to communicate with other network elements under the control of the processor.
- the seventh aspect of the present application provides a chip system, which includes one or more interface circuits and one or more processors; the interface circuit and the processor are interconnected by lines; the interface circuit is used to receive signals from the memory of the cloud device and send signals to the processor, and the signals include computer instructions stored in the memory; when the processor executes the computer instructions, the cloud device executes the method in the aforementioned first aspect or any possible implementation of the first aspect.
- the eighth aspect of the present application provides a chip system, which includes one or more interface circuits and one or more processors; the interface circuits and the processors are interconnected through lines; the interface circuits are used to receive signals from a client's memory and send signals to the processor, the signals including computer instructions stored in the memory; when the processor executes the computer instructions, the client executes the method in the aforementioned second aspect or any possible implementation of the second aspect.
- the ninth aspect of the present application provides a computer-readable storage medium on which a computer program or instruction is stored.
- when the computer program or instruction is executed on a computer device, the computer device executes the method in the aforementioned first aspect or any possible implementation of the first aspect.
- the tenth aspect of the present application provides a computer-readable storage medium on which a computer program or instruction is stored.
- when the computer program or instruction is executed on a computer device, the computer device executes the method in the aforementioned second aspect or any possible implementation of the second aspect.
- the eleventh aspect of the present application provides a computer device program product, which includes computer device program code.
- when the computer device program code is executed on a computer device, the computer device executes the method in the aforementioned first aspect or any possible implementation of the first aspect.
- the twelfth aspect of the present application provides a computer device program product, which includes a computer device program code.
- when the computer device program code is executed on a computer device, the computer device executes the method in the aforementioned second aspect or any possible implementation of the second aspect.
- the thirteenth aspect of the present application provides a computer device cluster, comprising at least one computer device, each computer device comprising a processor and a memory; the processor of at least one computer device is used to execute instructions stored in the memory of at least one computer device, so that the computer device cluster executes the method in the aforementioned first aspect or any possible implementation of the first aspect.
- the fourteenth aspect of the present application provides a retrieval system, which includes a client and a cloud device; the cloud device is used to execute the method in the aforementioned first aspect or any possible implementation of the first aspect, and the client is used to execute the method in the aforementioned second aspect or any possible implementation of the second aspect.
- the technical effects brought about by the third to fourteenth aspects or any possible implementation methods thereof can refer to the technical effects brought about by the first aspect or different possible implementation methods of the first aspect, and will not be repeated here.
- FIG. 1A is a schematic diagram of an architecture of a retrieval system provided in an embodiment of the present application.
- FIG. 1B is a schematic diagram of image retrieval in a shopping scenario provided by an embodiment of the present application.
- FIG. 1C is a schematic diagram of an architecture of a cloud system provided in an embodiment of the present application.
- FIG. 2A is a schematic diagram of a structure of a terminal device provided in an embodiment of the present application.
- FIG. 2B is a schematic diagram of a structure of a cloud device provided in an embodiment of the present application.
- FIG. 3 is a schematic diagram of an embodiment of a method for image retrieval provided in an embodiment of the present application.
- FIG. 4 is a schematic diagram of an example of an input interface of a client in an embodiment of the present application.
- FIG. 5 is a schematic diagram of an example scenario of generating a preview image in an embodiment of the present application.
- FIG. 6 is a schematic diagram of the relationship between a large language model and a text analysis model in an embodiment of the present application.
- FIG. 7 is a schematic diagram of another embodiment of image retrieval in the embodiment of the present application.
- FIG. 8 is a schematic diagram of an architecture of a server provided in an embodiment of the present application.
- FIG. 9 is a schematic diagram of an architecture of a retrieval system in an embodiment of the present application.
- FIG. 10 is a schematic diagram of a structure of a cloud device provided in an embodiment of the present application.
- FIG. 11 is a schematic diagram of the structure of a client provided in an embodiment of the present application.
- the embodiment of the present application provides a method for image retrieval, which is used to improve the accuracy of image retrieval.
- the present application also provides corresponding devices, systems, computer-readable storage media, and computer program products, etc. The following are detailed descriptions.
- Text retrieval: the text entered by the user in the search box is used to select the most relevant document candidate set or image candidate set from a large-scale web document library (which can contain hundreds of billions of documents).
- Image retrieval: searching based on an image input by the user and returning images similar to the input image; also known as image search, which can be understood as searching for images by image.
- Image editing retrieval: allows users to input an image and text at the same time. The text can be used to edit and modify local information in the image. The images returned by the retrieval satisfy the content of the text and are as similar to the input image as possible.
- Image editing generation: allows users to input an image and text at the same time. The text can be used to edit and modify local information in the image, and a new image is generated based on the image and text. The new image meets the text requirements, and its other information is consistent with the input image.
- Answering system: generally refers to a system that searches for relevant documents and then understands the document information to answer user questions.
- Conversational search and answering system: a system that adds multi-round conversation capabilities to the search and answering system and can understand context, thereby realizing multi-round continuous answering.
- Large language model (LLM): after training, an LLM internalizes knowledge into its model parameters and can directly generate answers to user queries. An LLM can call an image retrieval model, an image editing retrieval model, or an image editing generation model to achieve different functions.
- FIG. 1A is a schematic diagram of an architecture of a retrieval system provided in an embodiment of the present application.
- the retrieval system provided in the embodiment of the present application includes a cloud and multiple clients, and the cloud can communicate with multiple clients through a network.
- the cloud can be software or services of a cloud platform, or software or services deployed on nodes in a network such as edge nodes.
- the cloud can run on an independent physical machine or on virtualized resources.
- the client can be a terminal device or an application, for example, the application runs on a terminal device for use by a user.
- the application can be a search engine, an answering application, or a shopping application, etc.
- the retrieval system may be a browser search system, a search response system, or a shopping platform system, etc.
- the client can send a query request to the cloud, the query request includes a first text and a first picture, and the first text is used to adjust the first picture.
- the cloud-side device on the cloud can generate multiple preview pictures based on the first picture and the first text from the client, and then send the multiple preview pictures to the client; then, based on the operation information from the client, retrieve a second picture associated with the target picture, the target picture is included in the multiple preview pictures; and then send the second picture to the client.
- the first picture input by the user may be a picture 101 of a "black short-sleeved shirt", and the first text may be "It would be better if there is a flower pattern" 102.
- the cloud can generate the four preview pictures shown in Figure 1B based on the above-mentioned "black short-sleeved shirt" picture 101 and the text "It would be better if there is a flower pattern" 102.
- after the client receives the four preview pictures, the user can select one of them as the target picture. For example, in Figure 1B, the user selects the "black short-sleeved shirt picture with evenly distributed flowers" 103 as the target picture.
- the cloud can retrieve a suitable second picture based on the "black short-sleeved shirt picture with evenly distributed flowers" 103. As shown in Figure 1B, the cloud returns two second pictures to the client, namely pictures 104 and 105.
- the flower distribution on the short-sleeved shirt in pictures 104 and 105 is highly similar to that in picture 103.
- picture 104 shows a short-sleeved shirt with evenly distributed black flowers, and picture 105 shows a short-sleeved shirt with evenly distributed white flowers. Users can choose their favorite products to buy by comparing colors.
- the cloud can also provide users with more color options. These short-sleeved shirts with flowers in different colors can be presented in one picture.
- the user can input the first image and the first text on the client, and use the first text to guide the cloud to adjust the first image, thereby obtaining multiple preview images.
- the user can select a suitable target image from the preview images, and the cloud searches based on that target image, which can narrow the search scope and improve the accuracy of image retrieval.
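The two-step exchange between the client and the cloud (query request, preview pictures, operation information, second picture) can be sketched as follows. Everything here is a stand-in: pictures are represented by strings, the `Cloud` class and its methods are hypothetical names, and the expansion, generation, and retrieval steps are stubs rather than real models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueryRequest:
    first_picture: str  # stand-in for image data
    first_text: str     # text used to adjust the first picture


@dataclass
class OperationInfo:
    target_picture: str
    second_text: Optional[str] = None


class Cloud:
    def expand(self, text: str) -> List[str]:
        # Stub expansion; a real system might consult a text design database.
        return [f"{text} ({variant})" for variant in ("evenly distributed", "on the chest")]

    def handle_query(self, request: QueryRequest) -> List[str]:
        # One preview picture per extended text.
        return [f"preview[{request.first_picture} + {t}]"
                for t in self.expand(request.first_text)]

    def handle_operation(self, op: OperationInfo, previews: List[str]) -> List[str]:
        # The target picture must be one of the previews sent earlier.
        if op.target_picture not in previews:
            raise ValueError("target picture must be one of the preview pictures")
        # Stub retrieval of second pictures associated with the target picture.
        return [f"result similar to {op.target_picture}"]


cloud = Cloud()
previews = cloud.handle_query(QueryRequest("black short-sleeved shirt", "flower pattern"))
second_pictures = cloud.handle_operation(OperationInfo(previews[0]), previews)
```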
- the cloud-side device in the cloud in Figure 1A can be a working node or a scheduling node in the cloud system. After the scheduling node receives a query request from the client, the scheduling node can execute the corresponding retrieval process. The scheduling node can also assign the query request to one or more working nodes in the cloud system, and the corresponding retrieval process will be executed by one or more working nodes.
- the function of the scheduling node can be implemented by software or hardware.
- the scheduling node may include code running on a computing instance.
- the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, there may be one or more computing instances.
- the scheduling node may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Furthermore, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs, each AZ including one data center or multiple data centers with similar geographical locations. Generally, a region may include multiple AZs.
- multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs.
- a VPC is set up in one region.
- a communication gateway needs to be set up in each VPC to achieve interconnection between VPCs through the communication gateway.
- the scheduling node may include at least one computing device, such as a server, etc.
- the scheduling node may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
- the PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) or any combination thereof.
- the multiple computing devices included in the scheduling node can be distributed in the same region or in different regions.
- the multiple computing devices included in the scheduling node can be distributed in the same AZ or in different AZs.
- the multiple computing devices included in the scheduling node can be distributed in the same VPC or in multiple VPCs.
- the multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
- a working node can be a physical machine, or a computing instance such as a virtual machine (VM) or a container.
- a working node can include one or more central processing units (CPUs) and graphics processing units (GPUs), etc.
- a working node can also be a CPU or a GPU.
- the client may be a terminal device.
- the terminal device, also known as user equipment (UE), mobile station (MS), mobile terminal (MT), etc., is a device with a wireless communication function (providing voice/data connectivity to users), for example, a handheld device with a wireless connection function.
- some examples of terminal devices are: mobile phones, tablet computers, laptops, PDAs, wireless routers, mobile internet devices (MID), wearable devices, virtual reality (VR) devices, augmented reality (AR) devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in the Internet of Vehicles, wireless terminals in remote medical surgery, wireless terminals in smart grids, wireless terminals in transportation safety, wireless terminals in smart cities, or wireless terminals in smart homes, etc.
- the wireless terminal in the Internet of Vehicles may be a vehicle-mounted device, a vehicle-mounted module, a vehicle, etc.
- the wireless terminal in industrial control may be a robot, etc.
- the wireless terminal in self-driving may be a drone.
- the terminal device may be a device running an Android system, an iOS system, a Windows system, or other systems.
- the terminal device may run an application that needs to render an application scene to obtain a two-dimensional image, such as a game application, a lock screen application, or a map application.
- the cloud in FIG. 1A above may be located in a cloud system, and the architecture of the cloud system may be understood by referring to FIG. 1C.
- the cloud system includes a cloud platform and basic resources.
- the cloud platform includes a cloud platform manager, and the scheduling node described above may be the cloud platform manager in FIG. 1C.
- the basic resources may include multiple servers, each of which may be a working node, or each server may include multiple working nodes.
- the working node in FIG. 1C may be a computing device card or a virtual machine (VM).
- the computing device card may be at least one of a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processor (NPU).
- the cloud platform manager will maintain or regularly collect information about each work node in the basic resources, such as the resource usage on each work node (resource usage rate or resource idle rate), etc. This information can be used as auxiliary decision-making information when allocating query requests.
- the cloud platform manager can receive query requests from the client, and then the cloud platform manager can execute the corresponding retrieval process.
- the cloud platform manager can also assign the query request to one or more working nodes in the cloud system, and the one or more working nodes can execute the corresponding retrieval process.
- the cloud platform manager can return the preview image and the second image to the client.
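The allocation step described above can be sketched as follows. This is a minimal illustration, assuming the cloud platform manager simply picks the working node with the lowest resource usage rate; the `WorkingNode` type, the `pick_working_node` function, and the `max_usage` cutoff are hypothetical names introduced here and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class WorkingNode:
    name: str
    usage_rate: float  # fraction of resources in use: 0.0 (idle) to 1.0 (full)


def pick_working_node(nodes, max_usage=0.9):
    """Return the least-loaded node, or None if every node exceeds max_usage.

    max_usage is an assumed cutoff used only for this illustration.
    """
    candidates = [n for n in nodes if n.usage_rate <= max_usage]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.usage_rate)


nodes = [
    WorkingNode("node-a", 0.72),
    WorkingNode("node-b", 0.35),
    WorkingNode("node-c", 0.95),
]
chosen = pick_working_node(nodes)
print(chosen.name)
```

A real scheduler would likely weigh more signals than a single usage rate; the sketch only shows how collected node information can serve as auxiliary decision-making input.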
- the above-mentioned client takes a terminal device as an example.
- the structure of the terminal device provided in the embodiment of the present application can be understood by referring to FIG. 2A below, and the structure of the cloud device in the cloud can be understood by referring to FIG. 2B below.
- the terminal device may include a processor 101, a transceiver 102, a memory 103, a display 104 and a bus 105.
- the processor 101, the transceiver 102, the memory 103 and the display 104 are interconnected via the bus 105.
- the processor 101 is used to control and manage the actions of the terminal device 10, for example, the processor 101 is used to respond to the process of the user inputting a query request.
- the transceiver 102 is used to support the communication of the terminal device 10, for example: the transceiver 102 can execute the steps of sending a query request and receiving a preview image and a second image.
- the memory 103 is used to store the program code and data of the terminal device 10, and the display 104 is used to display the preview image and the second image.
- the processor 101 can be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute various exemplary logic blocks, modules and circuits described in conjunction with the disclosure of this application.
- the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
- the bus 105 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc.
- the bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in FIG. 2A, but it does not mean that there is only one bus or one type of bus.
- FIG. 2A above introduces the structure of the terminal device, and the structure of the cloud device is introduced below in conjunction with FIG. 2B .
- FIG. 2B is a possible logical structure diagram of a cloud device provided in an embodiment of the present application.
- the cloud device 20 provided in an embodiment of the present application includes: a processor 201, a communication interface 202, a memory 203, and a bus 204.
- the processor 201, the communication interface 202, and the memory 203 are interconnected via the bus 204.
- the processor 201 is used to control and manage the actions of the cloud device 20.
- the processor 201 is used to generate multiple preview images based on a first image and a first text from a client.
- the communication interface 202 is used to support the cloud device 20 to communicate.
- the communication interface 202 can execute the steps of receiving the first image and the first text, and sending the preview image and the second image.
- the memory 203 is used to store the program code and data of the cloud device 20.
- the processor 201 can be a central processing unit, a general processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute various exemplary logic blocks, modules and circuits described in conjunction with the disclosure of this application.
- the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
- the bus 204 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc.
- the bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in FIG. 2B, but it does not mean that there is only one bus or one type of bus.
- the content involved in the cloud execution in the method can be executed by the cloud
- the content related to client execution can be executed by the client or by a component of the client (such as a processor, chip, or chip system).
- FIG. 3 is a schematic diagram of an embodiment of a method for image retrieval provided in an embodiment of the present application.
- the client sends the first picture and the first text input by the user to the cloud.
- the cloud receives the first picture and the first text from the client.
- the first image and the first text may be sent through one query request or may be sent separately.
- the first text is used to adjust the first picture.
- the interface for inputting the first picture and the first text on the client can be understood by referring to Figure 4.
- the first text can be entered in the text input box 401, the first picture can be selected through the picture option box 402, and then the first picture and the first text can be sent by clicking the send button 403.
- the cloud generates multiple preview images according to the first image and the first text from the client.
- the process of step 302 may be: expanding the first text in the cloud to obtain a plurality of extended texts; and then generating a preview image corresponding to each extended text according to each extended text and the first image.
- the cloud can expand around the content in the first text.
- the first text is a flower pattern configured on the first picture
- the text can be expanded around the type of flower, the position of the flower on the first picture, etc.
- the extended text may be generated by the cloud based on the first text, or may be a text found by the cloud from a text design database whose similarity to the first text meets the first condition.
- the extended text may be a text found by the cloud from a text design database whose similarity to the first text and design popularity meet the second condition.
- the text design database may be pre-built and continuously updated.
- the texts in the text design database may be collected based on historical text retrieval records.
- the texts in the text design database may also be obtained through other means.
- the first text can be used to search for texts with high similarity to the first text in the text design database.
- the first condition can be a threshold value. Taking the similarity value in the range of 0 to 1 as an example, the threshold value can be 0.5 or other values. Taking 0.5 as an example, texts with a similarity greater than 0.5 can be understood as texts whose similarity to the first text meets the first condition.
- the first condition can also be a restriction condition on the number of texts, such as: the top N texts with the highest similarity to the first text are texts whose similarity to the first text meets the first condition. Wherein, N is an integer greater than 1.
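The two forms of the first condition can be sketched as follows. This is a minimal illustration, assuming similarity scores in the range 0 to 1 have already been computed for each candidate text; the function names and the sample scores are hypothetical and are not taken from the application.

```python
def select_by_threshold(scored_texts, threshold=0.5):
    """Keep texts whose similarity to the first text exceeds the threshold."""
    return [text for text, similarity in scored_texts if similarity > threshold]


def select_top_n(scored_texts, n):
    """Keep the N texts with the highest similarity to the first text."""
    ranked = sorted(scored_texts, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:n]]


# Hypothetical (text, similarity) pairs returned by a database lookup.
scored = [
    ("multiple small flowers in the center", 0.82),
    ("blue vertical stripes", 0.21),
    ("a red rose in the center", 0.74),
    ("small flower pattern in the upper left corner", 0.48),
]

print(select_by_threshold(scored, threshold=0.5))
print(select_top_n(scored, n=3))
```

The threshold form returns a variable number of texts, while the top-N form always returns at most N, which may be preferable when the number of preview images has to be bounded.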
- alternatively, the first text can be used to search the text design database for texts with high similarity to the first text; the popularity of these texts is then determined, and the extended text is selected by combining similarity and popularity.
- the popularity of each text in the text design database can be computed in advance from the number of times the text has been used.
- the value of popularity can be a value between 0 and 1.
- the second condition can include a threshold value or two threshold values. When it is a threshold value, the similarity and popularity of the text can be weighted to obtain a weighted value (the weight of the similarity is usually greater than the popularity). If the weighted value is greater than this threshold value, it means that the text meets the second condition and belongs to the extended text of the first text.
- when the second condition includes two threshold values, one can be a similarity threshold, such as 0.5, and the other can be a limit on the number of extended texts, such as N, where N is a positive integer. The texts found whose similarity to the first text is greater than 0.5 are sorted by popularity, and the N texts with the highest popularity are selected as the final extended texts.
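Both variants of the second condition can be sketched as follows. This is a minimal illustration under stated assumptions: the weights 0.7 and 0.3 are example values chosen only so that similarity weighs more than popularity, and the function names and sample data are hypothetical.

```python
def select_by_weighted_score(candidates, threshold, w_sim=0.7, w_pop=0.3):
    """Single-threshold variant: keep texts whose weighted score exceeds threshold.

    candidates is a list of (text, similarity, popularity) tuples, with both
    scores in the range 0 to 1. The weights are assumed example values.
    """
    return [
        text
        for text, similarity, popularity in candidates
        if w_sim * similarity + w_pop * popularity > threshold
    ]


def select_by_two_thresholds(candidates, sim_threshold=0.5, n=2):
    """Two-threshold variant: filter by similarity, then keep the N most popular."""
    passing = [c for c in candidates if c[1] > sim_threshold]
    passing.sort(key=lambda c: c[2], reverse=True)
    return [text for text, _, _ in passing[:n]]


# Hypothetical (text, similarity, popularity) tuples.
candidates = [
    ("multiple small flowers in the center", 0.80, 0.60),
    ("a red rose in the center", 0.60, 0.90),
    ("blue stripes", 0.30, 0.95),
    ("small flower pattern in the upper left corner", 0.75, 0.20),
]

print(select_by_weighted_score(candidates, threshold=0.6))
print(select_by_two_thresholds(candidates, sim_threshold=0.5, n=2))
```

In the two-threshold variant, similarity acts only as a gate, so a highly popular but dissimilar text such as "blue stripes" is never selected.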
- the cloud can compare the similarity of the first text "It would be better with a flower pattern" with the texts in the text design database. If the first condition requires that the similarity between the extended text and the first text reaches 0.7, the cloud can match four extended texts from the text design database, such as those shown in Figure 5, namely "multiple small flowers in the center", "small flower pattern in the upper left corner", "a red rose in the center", and "multiple blooming flowers".
- a picture editing generation model can be used according to each extended text and the first picture.
- the first picture and each extended text can be input into the picture editing generation model to generate different design schemes, and then a preview picture can be generated according to the different design schemes.
- the preview image can be generated by inputting the extended text and the first image into the image editing generation model.
- the preview images obtained can provide preview images that meet the user's requirements in the first text from multiple angles, thereby increasing the richness of the preview images within the scope of the user's requirements.
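The generation step can be sketched as a two-stage pipeline: expand the first text, then produce one preview per extended text. This is a minimal sketch in which the text expansion and the picture editing generation model are passed in as plain callables; `expand_text` and `edit_image` are hypothetical stand-ins, not real model APIs.

```python
def generate_previews(first_image, first_text, expand_text, edit_image):
    """Return one (extended_text, preview) pair per extended text.

    expand_text and edit_image are placeholders for the text expansion step
    and the picture editing generation model, respectively.
    """
    extended_texts = expand_text(first_text)
    return [(text, edit_image(first_image, text)) for text in extended_texts]


# Stand-in implementations used only to make the sketch runnable.
def fake_expand(text):
    return [f"{text}: multiple small flowers", f"{text}: a red rose"]


def fake_edit(image, text):
    return f"preview({image} | {text})"


previews = generate_previews("shirt.png", "flower pattern", fake_expand, fake_edit)
for extended_text, preview in previews:
    print(extended_text, "->", preview)
```

Keeping each preview paired with its extended text matters later, because retrieval uses the target image together with the extended text that produced it.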
- the cloud sends multiple preview images to the client.
- the client receives multiple preview images from the cloud.
- the client sends the operation information input by the user to the cloud.
- the cloud receives the operation information from the client.
- the operation information is used to retrieve a second image associated with a target image, and the target image is included in a plurality of preview images.
- the operation information includes a target image, or the operation information includes a target image and a second text.
- the LLM can call a picture editing retrieval model to retrieve the second image based on the target image and the extended text corresponding to the target image.
- the answering system may also include a text analysis model.
- the LLM may call the text analysis model to analyze the second text. Then, based on the analysis result of the second text, the LLM may decide to call the image editing retrieval model or the image editing generation model.
- the text analysis model can include a sentiment analysis model and an entity extraction model; wherein the sentiment analysis model can analyze the tone and feelings of the user in the text, and the entity extraction model can analyze the entities in the text, and the entities can be objects in the text.
- the analysis result can be returned to the LLM.
- the cloud retrieves a second image associated with the target image according to the operation information from the client, where the target image is included in the plurality of preview images.
- the cloud can call the image editing retrieval model through LLM to retrieve the associated second image based on the extended text corresponding to the target image and the target image.
- the text analysis model 701 analyzes the second text to determine whether the target image meets the user's expectations. If the second text is a sentence that indicates user satisfaction, such as "pretty good", "not bad", or "it's what I want", the target image meets the user's expectations. The LLM 702 then calls the picture editing retrieval model 703, which retrieves the second image according to the target image and the extended text corresponding to the target image.
- if the second text indicates that the target image does not meet the user's expectations, the LLM 702 calls the picture editing generation model 704, which generates multiple preview images associated with the target image and the second text.
- the generation process of multiple preview images associated with the target image and the second text can be understood by referring to the process of generating preview images from the first text and the first image. The cloud can then send this new batch of preview images to the client. If the user selects a satisfactory target image from this batch, the cloud can retrieve the second image based on that target image. If the user is still not satisfied, the above process of generating preview images can be repeated until a target image that the user is satisfied with is obtained.
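The decision made by the LLM can be sketched as a small routing function. This is a deliberately simplified illustration: a keyword check stands in for the text analysis model, which in the application is described as a sentiment analysis and entity extraction model, and the names `user_is_satisfied` and `route` are hypothetical.

```python
# Example phrases indicating satisfaction; a real text analysis model would
# not rely on a fixed phrase list.
SATISFIED_PHRASES = ("pretty good", "not bad", "it's what i want")


def user_is_satisfied(second_text):
    """Crude stand-in for the text analysis model."""
    lowered = second_text.lower()
    return any(phrase in lowered for phrase in SATISFIED_PHRASES)


def route(target_image, second_text=None):
    """Decide which model to call for the given operation information.

    Returns "retrieve" when only a target image is given or the user is
    satisfied, and "generate" when the second text asks for further changes.
    """
    if second_text is None or user_is_satisfied(second_text):
        return "retrieve"
    return "generate"


print(route("preview_2.png"))
print(route("preview_2.png", "Pretty good"))
print(route("preview_2.png", "make the flowers smaller"))
```

When the result is "generate", the target image and the second text take the place of the first image and the first text in another round of preview generation.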
- the image editing retrieval model can retrieve the second image from the image vector database.
- the image vector database can be pre-built and continuously updated.
- the images in the image vector database can be collected from the Internet and can be continuously updated.
- the second picture may also be searched from the Internet. In the present application, by retrieving the second picture from the picture vector database, the speed of retrieving the second picture can be improved.
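Retrieval from a picture vector database can be sketched as a nearest-neighbor search. This is a minimal sketch, assuming cosine similarity over precomputed feature vectors and a plain in-memory dictionary as the database; the application does not specify the similarity measure or the index structure, so both are assumptions, and the vectors shown are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, database, k=1):
    """Return the ids of the k database images most similar to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [image_id for image_id, _ in ranked[:k]]


# Hypothetical image ids and feature vectors.
database = {
    "shirt_rose": [0.9, 0.1, 0.0],
    "shirt_plain": [0.1, 0.9, 0.0],
    "mug_rose": [0.7, 0.0, 0.7],
}

print(retrieve([1.0, 0.0, 0.1], database, k=2))
```

A linear scan like this is only practical for small collections; a production vector database would typically use an approximate nearest-neighbor index, which is where the speed advantage over an Internet search comes from.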
- the cloud sends the second picture to the client.
- the client receives the second picture from the cloud.
- when the cloud sends the second picture, it can also send the link information associated with the second picture.
- when the client receives the second picture, it also receives the link information associated with the second picture from the cloud.
- the link information associated with the second picture may be description information of the product in the second picture and a purchase link, etc. In this way, the communication efficiency with the user can be improved.
- the user can input the first image and the first text on the client, and use the first text to guide the cloud to adjust the first image, thereby obtaining multiple preview images.
- the user can select a suitable target image from the preview images, and the cloud will perform searches based on the target images, which can narrow the search scope and improve the accuracy of image retrieval.
- the execution process in the cloud can be configured in the platform software of machine learning and deep learning, and can be a program code deployed on the server hardware.
- the program code in the cloud of the present application can exist in the runtime engine, memory management and communication management modules of the above-mentioned platform software, as well as outside the existing modules.
- the program code of the present application can run in the host memory and/or graphics processing unit (GPU) memory of the server.
- the architecture shown in Figure 8 is the implementation form in the server and platform software in the embodiment of the present application, wherein the hardware layer includes GPU and memory, and in addition, the memory will store a text design database and a picture vector database.
- the software layer includes LLM, picture editing generation model and picture editing retrieval model, and of course, the software layer can also include a text analysis model.
- the calling relationship between LLM, picture editing generation model, picture editing retrieval model and text analysis model can be understood by referring to the previous introduction.
- when the server executes the process of the present application, it calls the text design database in the memory through the GPU to generate extended text, and calls the pictures in the picture vector database to obtain the preview picture. These processes can be understood by referring to the previous introduction.
- the text design database and the image vector database in the embodiment of the present application can be obtained through offline training and can be updated online.
- the retrieval system is roughly divided into an offline system, an online system and a storage module.
- the main task of the offline system is to construct design texts and product image feature vectors based on Internet product images and texts, and to screen the design texts and product images.
- the process may include: deduplication of product images, and generating image representation vectors based on the image pre-training model, constructing a product image index, and constructing a picture vector database.
- the product attributes are classified and the specific content is extracted; after completion, deduplication and text cleaning are performed to construct a text design database.
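The offline construction of the picture vector database can be sketched as deduplication followed by embedding. This is a minimal sketch, assuming exact-duplicate detection by content hash and an `embed` callable standing in for the image pre-training model; the application does not state how deduplication is performed, so the hashing approach is an assumption.

```python
import hashlib


def deduplicate(images):
    """Drop images whose raw bytes are identical to an earlier image.

    images is a list of (image_id, image_bytes) pairs. Only exact duplicates
    are detected; near-duplicates would need a perceptual comparison.
    """
    seen = set()
    unique = []
    for image_id, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((image_id, data))
    return unique


def build_vector_database(images, embed):
    """Map each unique image id to the representation vector from embed."""
    return {image_id: embed(data) for image_id, data in deduplicate(images)}


# Stand-in data and embedding function used only to make the sketch runnable.
images = [
    ("img-1", b"rose shirt"),
    ("img-2", b"plain shirt"),
    ("img-3", b"rose shirt"),  # exact duplicate of img-1
]
vector_db = build_vector_database(images, embed=lambda data: [float(len(data))])
print(sorted(vector_db))
```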
- the offline system can also retrain the image editing generation model and the image editing retrieval model based on the product images and texts.
- the main task of the online system is to generate preview images and retrieve products based on the image editing generation model and image editing retrieval model according to user requests.
- the storage module is mainly used to store design text, product image representation vectors, LLM large model, image editing retrieval model and image editing generation model for offline and online system calls.
- after receiving the user's input request, the online system passes it to the LLM for decision-making; the LLM analyzes the user input and then decides whether to start the image editing retrieval process. If started, the image editing generation module is called.
- the picture editing generation module expands the text input by the user to generate multiple extended texts, and generates corresponding preview pictures based on the image editing generation model and returns them to the LLM.
- LLM responds to the user's request based on the generated preview image; the user provides feedback on the generated preview image.
- LLM passes user feedback to the text analysis model, which obtains key information from the user feedback and returns the analysis results to LLM.
- the LLM decides whether to continue calling the image editing generation module or to call the image editing retrieval module; if the user is satisfied, the LLM calls the image editing retrieval module; if the user is not satisfied, the LLM calls the image editing generation module to generate the next batch of preview images.
- the image editing retrieval model performs image editing retrieval with the target image selected by the user and the corresponding text as input, and returns the retrieved image to the LLM.
- LLM combines the retrieved images to reply to the user.
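The online interaction described above can be sketched as a loop that alternates generation and feedback until the user is satisfied. This is a minimal sketch in which the models and the user are represented by callables; the `max_rounds` limit is an assumption added so that the loop always terminates, and all names are hypothetical.

```python
def run_session(first_image, first_text, generate, get_feedback, retrieve, max_rounds=5):
    """Generate previews until the user is satisfied, then retrieve.

    generate(image, text) returns a list of previews.
    get_feedback(previews) returns (target_image, satisfied, new_text).
    retrieve(target_image) returns the retrieved result.
    Returns None if max_rounds (an assumed limit) is reached first.
    """
    image, text = first_image, first_text
    for _ in range(max_rounds):
        previews = generate(image, text)
        target, satisfied, new_text = get_feedback(previews)
        if satisfied:
            return retrieve(target)
        # Not satisfied: the chosen preview and the new text seed the next round.
        image, text = target, new_text
    return None


# Stand-ins: the simulated user is satisfied on the second round.
def generate(image, text):
    return [f"{image}+{text}#{i}" for i in range(2)]


feedback_script = iter([(False, "smaller flowers"), (True, None)])


def get_feedback(previews):
    satisfied, new_text = next(feedback_script)
    return previews[0], satisfied, new_text


def retrieve(target):
    return f"product-for:{target}"


result = run_session("shirt", "flower pattern", generate, get_feedback, retrieve)
print(result)
```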
- the program code of the embodiment of the present application can provide external services through a remote link, such as: a user downloads an application (APP), and the user can interact with the cloud through the APP to obtain a satisfactory picture.
- the solution provided by the embodiment of the present application combines the generation of preview images with product retrieval: based on the images and texts input by the user, a preview is generated through image editing generation technology, so that the user can get satisfactory images.
- the present application recommends expansion solutions to users through text expansion, realizes diversified recommendations, and helps to collect the user's editing intentions, solving the problem that the user inputs a short text and has a large search range.
- retrieval can be performed based on images that the user is satisfied with, realizing accurate retrieval of images.
- a structure of a cloud device 100 provided in an embodiment of the present application includes:
- the processing unit 1001 is used to generate a plurality of preview images according to a first image and a first text from a client, wherein the first text is used to adjust the first image.
- the sending unit 1002 is used to send multiple preview pictures to the client.
- the processing unit 1001 is further configured to retrieve a second image associated with the target image according to operation information from the client, where the target image is included in the plurality of preview images.
- the sending unit 1002 is further configured to send the second image to the client.
- a user can input a first image and a first text on the client, and use the first text to guide the cloud to adjust the first image, thereby obtaining multiple preview images.
- the user can select a suitable target image from the preview images, and the cloud will perform a search based on the target image, which can narrow the search scope and improve the accuracy of image retrieval.
- the processing unit 1001 is configured to expand the first text to obtain multiple extended texts; and generate a preview image corresponding to each extended text according to each extended text and the first image.
- the multiple extended texts are texts in a text design database whose similarity with the first text satisfies a first condition.
- the multiple extended texts are texts in a text design database whose similarity and design popularity with the first text meet the second condition.
- the second picture is a picture retrieved from a picture vector database.
- the cloud device 100 is applied to a response system based on a large language model (LLM); the operation information includes a target image, or the operation information includes a target image and a second text.
- the processing unit 1001 is configured to call the image editing retrieval model through the LLM, and retrieve an associated second image based on the extended text corresponding to the target image and the target image.
- the processing unit 1001 is used to call the image editing retrieval model through the LLM to retrieve the associated second image based on the extended text corresponding to the target image, the second text and the target image.
- the processing unit 1001 is used to call the picture editing generation model through LLM to generate multiple preview images associated with the target image and the second text.
- the picture editing generation model is used to obtain multiple preview images associated with the target image and the second text based on multiple extended texts of the second text and the target image to determine the second image.
- the LLM-based answering system further includes a text analysis model, and the text analysis model is used to analyze the second text to determine whether the target image meets user expectations.
- the sending unit 1002 is further configured to send link information associated with the second image to the client.
- the embodiment of the present application further provides a structure of a client 110 including:
- the sending unit 1101 is used to send the first picture and the first text input by the user to the cloud, where the first text is used to adjust the first picture.
- the receiving unit 1102 is used to receive multiple preview images from the cloud, where the multiple preview images are generated based on the first image and the first text.
- the sending unit 1101 is used to send the operation information input by the user to the cloud, where the operation information is used to retrieve a second image associated with the target image, where the target image is included in the multiple preview images.
- the receiving unit 1102 is configured to receive a second image from the cloud.
- the client 110 is applied to a response system based on a large language model (LLM); the operation information includes a target image, or the operation information includes a target image and a second text.
- the second image is retrieved based on the extended text of the target image and the target image by calling the image editing retrieval model through the LLM.
- the second image is retrieved based on the extended text of the target image, the second text and the target image by calling the image editing retrieval model through LLM.
- the receiving unit 1102 is further used to receive multiple preview images associated with the target image and the second text from the cloud, and the multiple preview images associated with the target image and the second text are obtained by LLM calling the picture editing generation model, and the picture editing generation model is used to obtain multiple preview images associated with the target image and the second text based on multiple extended texts of the second text and the target image.
- the sending unit 1101 is also used to send a third picture, or a third picture and a third text to the cloud.
- the third picture, or the third picture and the third text are used to determine the second picture, and the third picture is included in multiple preview pictures associated with the target picture and the second text.
- the receiving unit 1102 is further configured to receive link information associated with the second image from the cloud.
- each unit in the client 110 is similar to those described in the embodiments shown in the aforementioned Figures 3 to 9, and will not be repeated here.
- a computer-readable storage medium in which computer execution instructions are stored.
- the cloud device executes the steps executed by the cloud in Figures 3 to 9 above.
- a computer-readable storage medium in which computer-executable instructions are stored.
- the processor of the client executes the computer-executable instructions
- the client executes the steps executed by the client in Figures 3 to 9 above.
- a computer program product is also provided.
- the computer program product includes a computer program code.
- when the computer program code is executed on a computer, the computer executes the steps executed by the cloud or client in Figures 3 to 9 above.
- a chip system which includes one or more interface circuits and one or more processors; the interface circuit and the processor are interconnected by lines; the interface circuit is used to receive signals from the memory of the terminal and send signals to the processor, and the signals include computer instructions stored in the memory; when the processor executes the computer instructions, the terminal executes the steps performed by the cloud device or the client in the above-mentioned Figures 3 to 9.
- the chip system may also include a memory, which is used to store program instructions and data necessary for the control device.
- the chip system may be composed of chips, or may include chips and other discrete devices.
- the disclosed systems, devices and methods can be implemented in other ways.
- the device embodiments described above are only schematic.
- the division of units is only a logical function division. There may be other division methods in actual implementation.
- multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or units, which can be electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated units may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- when the integrated unit is implemented using software, it can be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the process or function described in the embodiment of the present application is generated in whole or in part.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions may be transmitted from a website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode to another website, computer, server or data center.
- the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Processing Or Creating Images (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
This application claims priority to Chinese patent application No. 202311287307.X, filed with the China National Intellectual Property Administration on September 27, 2023 and entitled "A method for image retrieval and related devices", which is incorporated herein by reference in its entirety.
The present application relates to the field of cloud computing technology, and in particular to a method and related apparatus for image retrieval.
Retrieval is also called search. On a search engine or shopping platform, after a user enters keywords in the search box, the platform returns the corresponding search results.
Text search is currently the more common form, and some search platforms also provide image retrieval. Image retrieval, also called image search, searches based on an image input by the user and returns images similar to the input image.
Although image retrieval can narrow the search scope compared with text retrieval, it still returns many images. Users have to browse the returned images one by one to find the one they want, so precise retrieval is not achieved.
Summary of the invention
The present application provides a method for image retrieval, which is used to improve the precision of image retrieval. The present application also provides a corresponding apparatus, system, computer-readable storage medium, and computer program product.
A first aspect of the present application provides a method for image retrieval, including: generating multiple preview images based on a first image and a first text from a client, where the first text is used to adjust the first image; sending the multiple preview images to the client; retrieving, based on operation information from the client, a second image associated with a target image, where the target image is one of the multiple preview images; and sending the second image to the client.
In this application, the cloud may be software or a service of a cloud platform, or software or a service deployed on a network node such as an edge node. The cloud may run on an independent physical machine or on virtualized resources.
In this application, the client may be a terminal device or an application, for example an application that runs on a terminal device for the user to use. The application may be a search engine, an answering application, a shopping application, or the like.
In this application, the user can input the first image and the first text on the client and use the first text to guide the cloud in adjusting the first image, thereby obtaining multiple preview images. The user can select a suitable target image from the preview images, and the cloud then performs retrieval based on that target image. This narrows the search scope and improves the precision of image retrieval.
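The cloud-side steps of the first aspect can be sketched as a small two-call session. This is only an illustrative outline: the class name `CloudSession`, the injected `edit` and `search` callables, and the string stand-ins for images are all assumptions, not part of the application.

```python
from typing import Callable, List


class CloudSession:
    """Hypothetical cloud-side session covering the four steps of the first aspect."""

    def __init__(self,
                 edit: Callable[[str, str], List[str]],
                 search: Callable[[str], List[str]]) -> None:
        self._edit = edit        # assumed generator: (first image, first text) -> previews
        self._search = search    # assumed retriever: target image -> second images
        self._previews: List[str] = []

    def on_query(self, first_image: str, first_text: str) -> List[str]:
        # Steps 1-2: generate preview images and return them to the client.
        self._previews = self._edit(first_image, first_text)
        return self._previews

    def on_operation(self, target_image: str) -> List[str]:
        # Steps 3-4: the target must be one of the previews sent earlier.
        if target_image not in self._previews:
            raise ValueError("target image is not one of the preview images")
        return self._search(target_image)


# Toy stand-ins so the flow can be exercised end to end.
def fake_edit(image: str, text: str) -> List[str]:
    return [f"{image}|{text}|v{i}" for i in range(3)]


def fake_search(target: str) -> List[str]:
    return [f"catalog-item-like({target})"]


session = CloudSession(fake_edit, fake_search)
previews = session.on_query("shirt.png", "add a flower pattern")
results = session.on_operation(previews[0])
```

Checking that the target is among the previews mirrors the requirement that the target image is included in the multiple preview images.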
In a possible implementation, the step of generating multiple preview images based on the first image and the first text from the client includes: expanding the first text to obtain multiple extended texts; and generating, based on each extended text and the first image, a preview image corresponding to that extended text.
In this possible implementation, the cloud can expand around the content of the first text. For example, if the first text asks for a flower pattern to be placed on the first image, the text can be expanded around the type of flower, the position of the flower on the first image, and so on. The preview images obtained from the extended texts meet the user's requirements in the first text from multiple angles, which increases the variety of preview images within the scope of those requirements.
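A minimal sketch of "expand, then generate one preview per extended text" follows, using the flower example above. The expansion rule (a cross product of flower kinds and positions) and the `edit_image` callable are assumptions made for illustration; the application does not specify how the expansion or the generation is implemented.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def expand_text(first_text: str,
                kinds: Sequence[str],
                positions: Sequence[str]) -> List[str]:
    """Build one extended text per (kind, position) combination."""
    return [f"{first_text} ({kind}, on the {position})"
            for kind, position in product(kinds, positions)]


def generate_previews(first_image: str,
                      extended_texts: Sequence[str],
                      edit_image: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    """Return (extended text, preview image) pairs, one per extended text."""
    return [(text, edit_image(first_image, text)) for text in extended_texts]


def fake_editor(image: str, instruction: str) -> str:
    # Stand-in for a real image-editing model: just records the instruction.
    return f"{image} <- {instruction}"


texts = expand_text("add a flower pattern", ["rose", "daisy"], ["chest", "sleeve"])
previews = generate_previews("shirt.png", texts, fake_editor)
```

Keeping each preview paired with its extended text is convenient later, because retrieval may use the extended text that corresponds to the chosen target image.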
In a possible implementation, the multiple extended texts are texts in a text design database whose similarity to the first text satisfies a first condition.
In this possible implementation, the text design database may be built in advance and continuously updated, and its texts may be collected from historical text retrieval records. In this application, the first text can be used to look up texts in the text design database that are highly similar to it. The first condition may be a threshold. Taking a similarity in the range 0 to 1 as an example, the threshold may be 0.5 or another value; with a threshold of 0.5, every text whose similarity is greater than 0.5 satisfies the first condition. The first condition may also be a limit on the number of texts, for example: the top N texts with the highest similarity to the first text are the texts that satisfy the first condition. With this implementation, good extended texts can be obtained from the text design database, which in turn improves user satisfaction with the preview images.
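The two forms of the first condition described above (a similarity threshold, or a top-N cut) can be sketched as follows. The similarity scores are assumed to be precomputed in the range 0 to 1; the function names and the sample data are hypothetical.

```python
from typing import Dict, List


def select_by_threshold(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep every candidate whose similarity is strictly greater than the threshold."""
    return [text for text, sim in scores.items() if sim > threshold]


def select_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Keep the n candidates with the highest similarity, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [text for text, _ in ranked[:n]]


# Hypothetical similarities between the first text and database entries.
scores = {
    "rose pattern on the chest": 0.82,
    "small daisy on the sleeve": 0.64,
    "striped collar": 0.31,
}

above_half = select_by_threshold(scores, threshold=0.5)
best_one = select_top_n(scores, n=1)
```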
In a possible implementation, the multiple extended texts are texts in the text design database whose similarity to the first text and whose design popularity satisfy a second condition.
In this possible implementation, the first text can be used to look up texts in the text design database that are highly similar to it, and the popularity of those texts is then determined. The extended texts are chosen by combining similarity and popularity. The popularity of each text in the text design database can be computed in advance from statistics such as the number of times it has been used, and may take a value between 0 and 1. The second condition may include one threshold or two thresholds. With one threshold, the similarity and popularity of a text are weighted to obtain a weighted value (the weight of similarity is usually greater than that of popularity); if the weighted value is greater than the threshold, the text satisfies the second condition and is an extended text of the first text. With two thresholds, one may be a similarity threshold, for example 0.5, and the other a threshold on the number of extended texts, for example N, where N is a positive integer. The texts whose similarity to the first text is greater than 0.5 are sorted by popularity, and the N most popular of them are selected as the final extended texts. In this application, combining similarity and popularity can improve the precision of the extended texts.
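Both variants of the second condition can be sketched in a few lines. The weights 0.7 and 0.3 are illustrative assumptions (the text only says the similarity weight is usually the larger one), and the `Candidate` type and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    similarity: float  # similarity to the first text, in [0, 1]
    popularity: float  # precomputed design popularity, in [0, 1]


def select_weighted(cands: List[Candidate], threshold: float,
                    w_sim: float = 0.7, w_pop: float = 0.3) -> List[str]:
    """One-threshold variant: keep texts whose weighted score exceeds the threshold."""
    return [c.text for c in cands
            if w_sim * c.similarity + w_pop * c.popularity > threshold]


def select_two_stage(cands: List[Candidate], sim_threshold: float, n: int) -> List[str]:
    """Two-threshold variant: filter by similarity, then keep the n most popular."""
    similar = [c for c in cands if c.similarity > sim_threshold]
    similar.sort(key=lambda c: c.popularity, reverse=True)
    return [c.text for c in similar[:n]]


cands = [
    Candidate("A", similarity=0.9, popularity=0.2),
    Candidate("B", similarity=0.6, popularity=0.8),
    Candidate("C", similarity=0.4, popularity=1.0),
]

weighted = select_weighted(cands, threshold=0.6)
two_stage = select_two_stage(cands, sim_threshold=0.5, n=1)
```

In the two-stage variant, candidate C is dropped despite being the most popular, because it fails the similarity filter first.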
In a possible implementation, the second image is an image retrieved from an image vector database.
In this possible implementation, the image vector database may be built in advance, and its images may be collected from the network and continuously updated. The second image may of course also be found by searching the network. In this application, retrieving the second image from the image vector database can increase the speed of retrieval.
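To make the idea of an image vector database concrete, the sketch below stores one embedding per image and ranks images by cosine similarity to a query embedding. It is a brute-force toy under stated assumptions: the embeddings are made up, the identifiers are hypothetical, and a production system would typically use an approximate nearest-neighbor index rather than scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: Dict[str, Sequence[float]],
           query: Sequence[float],
           k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored images most similar to the query vector, best first."""
    scored = [(image_id, cosine(vec, query)) for image_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical precomputed embeddings of catalog images.
index = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}

# In the method described above, the query would be an embedding of the target image.
hits = search(index, query=[1.0, 0.0, 0.0], k=2)
```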
In a possible implementation, the method is applied in an answering system based on a large language model (LLM); the operation information includes the target image, or the operation information includes the target image and a second text.
In this possible implementation, if the preview images include a target image that the user is satisfied with, image retrieval can be performed based on that target image, or based on the target image together with its corresponding extended text. If the preview images do not include an image that the user is satisfied with, the target image can serve as the input image for a second round of image editing: the second text guides the cloud in editing the target image, and the user is given a second batch of preview images, and so on until the user selects a satisfactory image. With this implementation, the answering system can use the LLM to call different models for different functions such as image retrieval or image editing, which enriches the functions of the answering system.
In a possible implementation, if the operation information includes the target image, the step of retrieving the second image associated with the target image includes: calling an image-editing retrieval model through the LLM, and retrieving the associated second image based on the target image and the extended text corresponding to the target image.
In this possible implementation, the image-editing retrieval model retrieves the second image based on the target image and its corresponding extended text, which can improve the accuracy of the retrieved second image.
In a possible implementation, if the second text indicates that the target image meets the user's expectations, the step of retrieving the second image associated with the target image includes: calling the image-editing retrieval model through the LLM, and retrieving the associated second image based on the extended text corresponding to the target image, the second text, and the target image.
In this possible implementation, the image-editing retrieval model retrieves the second image based on the extended text corresponding to the target image, the second text, and the target image, which can improve the accuracy of the retrieved second image.
In a possible implementation, if the second text indicates that the target image does not meet the user's expectations, the step of retrieving the second image associated with the target image includes: calling an image-editing generation model through the LLM to generate multiple preview images associated with the target image and the second text, so as to determine the second image. The image-editing generation model may obtain these preview images based on the target image and multiple extended texts obtained by expanding the second text.
In this possible implementation, the image-editing generation model is a model that can use the second text and the target image to generate the next batch of preview images. The generation process can be understood by reference to the earlier process of generating preview images from the first text and the first image. Generating preview images over multiple rounds in this way lets the user arrive at a satisfactory preview image, which in turn improves the accuracy of the retrieved second image.
In a possible implementation, the LLM-based answering system further includes a text analysis model, which is used to analyze the second text to determine whether the target image meets the user's expectations.
In this possible implementation, the second text can direct the LLM to call either the image-editing retrieval model or the image-editing generation model. The text analysis model analyzes the second text to learn whether the user is satisfied with the target image. If the user is not satisfied, it instructs the LLM to call the image-editing generation model to generate the next batch of preview images; if the user is satisfied, it instructs the LLM to call the image-editing retrieval model to retrieve the second image. This can improve the efficiency of communication with the user.
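The routing described above can be sketched as a small dispatcher. The keyword rule in `is_satisfied` is only a toy stand-in for the text analysis model, and `retrieve` and `generate` are hypothetical callables standing in for the image-editing retrieval model and the image-editing generation model; none of these names come from the application.

```python
from typing import Callable, List, Optional, Tuple

# Toy cue words suggesting the user wants further edits (an assumption, not a real model).
NEGATIVE_HINTS = {"not", "but", "change", "instead", "replace"}


def is_satisfied(second_text: str) -> bool:
    """Stand-in for the text analysis model: satisfied unless a cue word appears."""
    words = {w.strip(".,!?") for w in second_text.lower().split()}
    return not (words & NEGATIVE_HINTS)


def handle_operation(target_image: str,
                     second_text: Optional[str],
                     retrieve: Callable[[str, Optional[str]], List[str]],
                     generate: Callable[[str, str], List[str]]) -> Tuple[str, List[str]]:
    """Route to retrieval when the target is acceptable, otherwise to another edit round."""
    if second_text is None or is_satisfied(second_text):
        return "retrieve", retrieve(target_image, second_text)
    return "generate", generate(target_image, second_text)


def fake_retrieve(image: str, text: Optional[str]) -> List[str]:
    return [f"hit-for-{image}"]


def fake_generate(image: str, text: str) -> List[str]:
    return [f"{image}-v{i}" for i in range(2)]


only_image = handle_operation("p2.png", None, fake_retrieve, fake_generate)
happy = handle_operation("p2.png", "This one looks good.", fake_retrieve, fake_generate)
unhappy = handle_operation("p2.png", "Change the rose to a daisy.", fake_retrieve, fake_generate)
```

Returning the chosen route alongside the images lets the caller tell a final result apart from a new batch of previews that needs another user selection.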
In a possible implementation, when the second image is received from the cloud, the method further includes: receiving link information associated with the second image from the cloud.
In this possible implementation, the link information associated with the second image may be a description of the product shown in the second image, a purchase link, and the like. This can improve the efficiency of communication with the user.
A second aspect of the present application provides a method for image retrieval, including: sending a first image and a first text input by a user to the cloud, where the first text is used to adjust the first image; receiving multiple preview images from the cloud, where the multiple preview images are generated based on the first image and the first text; sending operation information input by the user to the cloud, where the operation information is used to retrieve a second image associated with a target image, and the target image is one of the multiple preview images; and receiving the second image from the cloud.
In this application, the user can input the first image and the first text on the client and use the first text to guide the cloud in adjusting the first image, thereby obtaining multiple preview images. The user can select a suitable target image from the preview images, and the cloud then performs retrieval based on that target image. This narrows the search scope and improves the precision of image retrieval.
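Seen from the client, the second aspect is two request/response exchanges with a user selection in between. The sketch below assumes a hypothetical `Cloud` interface with two methods and uses an in-memory fake in place of a real network call; the method names and string stand-ins for images are illustrative only.

```python
from typing import Callable, List, Optional, Protocol


class Cloud(Protocol):
    """Assumed client-facing interface of the cloud (names are hypothetical)."""

    def generate_previews(self, image: str, text: str) -> List[str]: ...

    def retrieve(self, target: str, second_text: Optional[str]) -> List[str]: ...


def client_round_trip(cloud: Cloud,
                      first_image: str,
                      first_text: str,
                      choose: Callable[[List[str]], str]) -> List[str]:
    """Send the query, let the user pick a target preview, then fetch the second images."""
    previews = cloud.generate_previews(first_image, first_text)  # send query, receive previews
    target = choose(previews)                                    # user selects the target image
    return cloud.retrieve(target, None)                          # send operation info, receive results


class FakeCloud:
    """In-memory stand-in so the exchange can be run without a network."""

    def generate_previews(self, image: str, text: str) -> List[str]:
        return [f"{image}+{text}#{i}" for i in range(3)]

    def retrieve(self, target: str, second_text: Optional[str]) -> List[str]:
        return [f"match-for-{target}"]


second_images = client_round_trip(
    FakeCloud(), "shirt.png", "add a flower", choose=lambda previews: previews[1]
)
```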
In a possible implementation, the method is applied in an answering system based on a large language model (LLM); the operation information includes the target image, or the operation information includes the target image and a second text.
In this possible implementation, if the preview images include a target image that the user is satisfied with, image retrieval can be performed based on that target image, or based on the target image together with its corresponding extended text. If the preview images do not include an image that the user is satisfied with, the target image can serve as the input image for a second round of image editing: the second text guides the cloud in editing the target image, and the user is given a second batch of preview images, and so on until the user selects a satisfactory image. With this implementation, the answering system can use the LLM to call different models for different functions such as image retrieval or image editing, which enriches the functions of the answering system.
In a possible implementation, if the operation information includes the target image, the second image is retrieved by calling the image-editing retrieval model through the LLM, based on the extended text of the first text and the target image.
In this possible implementation, the image-editing retrieval model retrieves the second image based on the target image and the extended text corresponding to the target image, which can improve the accuracy of the retrieved second image.
In a possible implementation, if the second text indicates that the target image meets the user's expectations, the second image is retrieved by calling the image-editing retrieval model through the LLM, based on the extended text corresponding to the target image, the second text, and the target image.
In this possible implementation, the image-editing retrieval model retrieves the second image based on the extended text corresponding to the target image, the second text, and the target image, which can improve the accuracy of the retrieved second image.
In a possible implementation, if the second text indicates that the target image does not meet the user's expectations, the method further includes: receiving, from the cloud, multiple preview images associated with the target image and the second text, where these preview images are obtained by calling the image-editing generation model through the LLM, and the image-editing generation model obtains them based on the target image and multiple extended texts of the second text; and sending to the cloud a third image, or a third image and a third text. The third image, or the third image and the third text, are used to determine the second image, and the third image is one of the multiple preview images associated with the target image and the second text.
In this possible implementation, the image-editing generation model is a model that can use the extended texts of the second text and the target image to generate the next batch of preview images. The generation process can be understood by reference to the earlier process of generating preview images from the first text and the first image. Generating preview images over multiple rounds in this way lets the user arrive at a satisfactory preview image, which in turn improves the accuracy of the retrieved second image.
In a possible implementation, when the second image is received from the cloud, the method further includes: receiving link information associated with the second image from the cloud.
In this possible implementation, the link information associated with the second image may be a description of the product shown in the second image, a purchase link, and the like. This can improve the efficiency of communication with the user.
A third aspect of the present application provides a cloud device for executing the method in the first aspect or any possible implementation of the first aspect. Specifically, the cloud device includes modules or units for executing the method in the first aspect or any possible implementation of the first aspect, such as a processing unit and a sending unit.
A fourth aspect of the present application provides a client for executing the method in the second aspect or any possible implementation of the second aspect. Specifically, the client includes modules or units for executing the method in the second aspect or any possible implementation of the second aspect, such as a sending unit and a receiving unit.
A fifth aspect of the present application provides a cloud device, including a transceiver, a processor, and a memory. The transceiver and the processor are coupled to the memory, and the memory is used to store programs or instructions. When the programs or instructions are executed by the processor, the cloud device executes the method in the first aspect or any possible implementation of the first aspect.
A sixth aspect of the present application provides a client. The client may include at least one processor, a memory, and a communication interface. The processor is coupled to the memory and the communication interface. The memory is used to store instructions, the processor is used to execute the instructions, and the communication interface is used to communicate with other network elements under the control of the processor. When executed by the processor, the instructions cause the processor to execute the method in the second aspect or any possible implementation of the second aspect.
A seventh aspect of the present application provides a chip system, which includes one or more interface circuits and one or more processors. The interface circuits and the processors are interconnected by lines. The interface circuits are used to receive signals from the memory of a cloud device and send the signals to the processors, where the signals include computer instructions stored in the memory. When the processors execute the computer instructions, the cloud device executes the method in the first aspect or any possible implementation of the first aspect.
An eighth aspect of the present application provides a chip system, which includes one or more interface circuits and one or more processors. The interface circuits and the processors are interconnected by lines. The interface circuits are used to receive signals from the memory of a client and send the signals to the processors, where the signals include computer instructions stored in the memory. When the processors execute the computer instructions, the client executes the method in the second aspect or any possible implementation of the second aspect.
A ninth aspect of the present application provides a computer-readable storage medium storing a computer program or instructions. When the computer program or instructions run on a computer device, the computer device executes the method in the first aspect or any possible implementation of the first aspect.
A tenth aspect of the present application provides a computer-readable storage medium storing a computer program or instructions. When the computer program or instructions run on a computer device, the computer device executes the method in the second aspect or any possible implementation of the second aspect.
An eleventh aspect of the present application provides a computer program product, which includes computer program code. When the computer program code is executed on a computer device, the computer device executes the method in the first aspect or any possible implementation of the first aspect.
A twelfth aspect of the present application provides a computer program product, which includes computer program code. When the computer program code is executed on a computer device, the computer device executes the method in the second aspect or any possible implementation of the second aspect.
A thirteenth aspect of the present application provides a computer device cluster, including at least one computer device, where each computer device includes a processor and a memory. The processor of the at least one computer device is used to execute instructions stored in the memory of the at least one computer device, so that the computer device cluster executes the method in the first aspect or any possible implementation of the first aspect.
A fourteenth aspect of the present application provides a retrieval system, which includes a client and a cloud device. The cloud device is used to execute the method in the first aspect or any possible implementation of the first aspect, and the client is used to execute the method in the second aspect or any possible implementation of the second aspect.
For the technical effects of the third to fourteenth aspects, or of any possible implementation thereof, refer to the technical effects of the first aspect or of its possible implementations; they are not repeated here.
FIG. 1A is a schematic architecture diagram of a retrieval system provided in an embodiment of the present application;
FIG. 1B is a schematic diagram of image retrieval in a shopping scenario provided in an embodiment of the present application;
FIG. 1C is a schematic architecture diagram of a cloud system provided in an embodiment of the present application;
FIG. 2A is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
FIG. 2B is a schematic structural diagram of a cloud device provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of the method for image retrieval provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of an example input interface of a client in an embodiment of the present application;
FIG. 5 is a schematic diagram of an example scenario of generating preview images in an embodiment of the present application;
FIG. 6 is a schematic diagram of the relationship between the large language model and the text analysis model in an embodiment of the present application;
FIG. 7 is a schematic diagram of another embodiment of image retrieval in an embodiment of the present application;
FIG. 8 is a schematic architecture diagram of a server provided in an embodiment of the present application;
FIG. 9 is a schematic architecture diagram of a retrieval system in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a cloud device provided in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a client provided in an embodiment of the present application.
The following describes the embodiments of the present application with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Those of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The embodiments of the present application provide a method for picture retrieval, which is used to improve the accuracy of picture retrieval. The present application also provides a corresponding apparatus, system, computer-readable storage medium, and computer program product. These are described in detail below.
For ease of understanding, the technical terms involved in the embodiments of the present application are briefly introduced below:
1. Text retrieval (Query): Based on the text that the user enters in the search box, the document candidate set or picture candidate set most relevant to the text is selected from a large-scale web document library (the library may contain hundreds of billions of documents).
2. Picture retrieval: Retrieval is performed based on a picture input by the user, and pictures similar to the input picture are returned. This is also called picture search and can be understood as searching for pictures with a picture.
3. Picture editing retrieval: Picture editing retrieval allows the user to input a picture and text at the same time. The text can be used to edit and modify local information in the picture. The pictures returned by the retrieval satisfy the content of the text while being as similar as possible to the input picture.
4. Picture editing generation: Picture editing generation allows the user to input a picture and text at the same time. The text can be used to edit and modify local information in the picture. A new picture is generated from the picture and the text; the new picture satisfies the requirements of the text, and its other information remains consistent with the input picture.
5. Answering system: Generally refers to a system that searches for relevant documents and then, by understanding the information in those documents, answers the user's questions.
6. Conversational search answering system: A system that adds multi-round dialogue capability to a search answering system, so that it can understand context and provide multi-round continuous answering.
7. Large language model (LLM): Through training, an LLM internalizes knowledge into its model parameters and can directly generate the answer corresponding to a user query. The LLM can call a picture retrieval model, a picture editing retrieval model, or a picture editing generation model to implement different functions.
FIG. 1A is a schematic diagram of a retrieval architecture provided in an embodiment of the present application.
As shown in FIG. 1A, the retrieval system provided in the embodiment of the present application includes a cloud and multiple clients, and the cloud can communicate with the multiple clients through a network. The cloud may be software or a service of a cloud platform, or software or a service deployed on a node in the network, such as an edge node. The cloud may run on an independent physical machine or on virtualized resources. The client may be a terminal device or an application, for example, an application that runs on a terminal device for use by a user. The application may be a search engine, an answering application, a shopping application, or the like.
The retrieval system may be a browser search system, a search answering system, a shopping platform system, or the like.
The client can send a query request to the cloud. The query request includes a first text and a first picture, and the first text is used to adjust the first picture. The cloud-side device in the cloud can generate multiple preview pictures according to the first picture and the first text from the client and send the multiple preview pictures to the client. Then, according to operation information from the client, it retrieves a second picture associated with a target picture, where the target picture is one of the multiple preview pictures, and sends the second picture to the client.
Taking a shopping scenario as an example of the above process, and referring to FIG. 1B, the first picture input by the user may be a picture 101 of a "black short-sleeved shirt", and the first text may be "It would be better with a flower pattern" 102. The cloud can generate the four preview pictures shown in FIG. 1B according to the picture 101 and the text 102. After the client receives the four preview pictures, the user can select one of them as the target picture. For example, in FIG. 1B the user selects the "black short-sleeved shirt with evenly distributed flowers" picture 103 as the target picture, and the cloud can retrieve suitable second pictures according to picture 103. In FIG. 1B, the cloud returns two second pictures to the client, namely picture 104 and picture 105. The distribution of flowers on the shirts in pictures 104 and 105 is highly similar to that in picture 103: picture 104 shows a short-sleeved shirt with evenly distributed black flowers, and picture 105 shows a short-sleeved shirt with evenly distributed white flowers. The user can compare the colors and choose a preferred product to purchase. The cloud can also offer the user more color options, and these flowered short-sleeved shirts in different colors can be presented in a single picture.
In the present application, the user can input a first picture and a first text on the client, and the first text guides the cloud in adjusting the first picture to obtain multiple preview pictures. The user can select a suitable target picture from the preview pictures, and the cloud then retrieves based on the target picture. This narrows the retrieval scope and improves the accuracy of picture retrieval.
The cloud-side device in the cloud in FIG. 1A may be a working node or a scheduling node in a cloud system. After the scheduling node receives a query request from the client, the scheduling node may execute the corresponding retrieval process itself, or it may assign the query request to one or more working nodes in the cloud system, which then execute the corresponding retrieval process.
The functions of the scheduling node can be implemented by software or by hardware.
As an example of a software functional unit, the scheduling node may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, there may be one or more such computing instances. For example, the scheduling node may include code running on multiple hosts, virtual machines, or containers. It should be noted that the multiple hosts, virtual machines, or containers used to run the code may be distributed in the same region or in different regions. Further, they may be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. A region usually includes multiple AZs.
Similarly, the multiple hosts, virtual machines, or containers used to run the code may be distributed in the same virtual private cloud (VPC) or in multiple VPCs. A VPC is usually set up within one region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway must be set up in each VPC, and the VPCs are interconnected through the communication gateways.
As an example of a hardware functional unit, the scheduling node may include at least one computing device, such as a server. Alternatively, the scheduling node may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the scheduling node may be distributed in the same region or in different regions, and in the same AZ or in different AZs. Similarly, they may be distributed in the same VPC or in multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
A working node may be a physical machine, or a computing instance such as a virtual machine (VM) or a container. A working node may include one or more central processing units (CPUs), graphics processing units (GPUs), and the like. A working node may also itself be a CPU or a GPU.
The client may be a terminal device. A terminal device, also called user equipment (UE), a mobile station (MS), a mobile terminal (MT), or the like, is a device with a wireless communication function (providing voice and/or data connectivity to the user), for example, a handheld device with a wireless connection function. Some current examples of terminal devices are: mobile phones, tablet computers, laptop computers, palmtop computers, wireless routers, mobile internet devices (MIDs), wearable devices, virtual reality (VR) devices, augmented reality (AR) devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in the Internet of Vehicles, wireless terminals in remote medical surgery, wireless terminals in a smart grid, wireless terminals in transportation safety, wireless terminals in a smart city, and wireless terminals in a smart home. For example, a wireless terminal in the Internet of Vehicles may be an in-vehicle device, a whole-vehicle device, an in-vehicle module, or a vehicle. A wireless terminal in industrial control may be a robot. A wireless terminal in self-driving may be, for example, a drone. The terminal device may run the Android system, the iOS system, the Windows system, or another system. The terminal device may run an application that needs to render an application scene to obtain a two-dimensional image, such as a game application, a lock-screen application, or a map application.
The cloud in FIG. 1A may be located in a cloud system, whose architecture can be understood with reference to FIG. 1C. As shown in FIG. 1C, the cloud system includes a cloud platform and basic resources. The cloud platform includes a cloud platform manager, and the scheduling node described above may be the cloud platform manager in FIG. 1C. The basic resources may include multiple servers. Each server may be one working node, or each server may include multiple working nodes.
A working node in FIG. 1C may be a computing device card or a virtual machine (VM). The computing device card may be at least one of a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processing unit (NPU).
The cloud platform manager maintains or periodically collects information about each working node in the basic resources, such as the resource usage on each working node (the resource utilization rate or the resource idle rate). This information can serve as auxiliary decision-making information when query requests are assigned.
The cloud platform manager can receive a query request from the client and then either execute the corresponding retrieval process itself or assign the query request to one or more working nodes in the cloud system, which then execute the corresponding retrieval process.
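As one possible illustration of how the collected idle-rate information could assist assignment, the following Python sketch picks the working nodes with the highest idle rate. The names `WorkerInfo` and `assign_query`, and the "most idle first" policy itself, are assumptions made for illustration; the application does not specify a particular assignment policy.

```python
from dataclasses import dataclass


@dataclass
class WorkerInfo:
    """Hypothetical record the cloud platform manager keeps per working node."""
    name: str
    idle_rate: float  # fraction of the node's resources currently free, 0.0 to 1.0


def assign_query(workers, needed=1):
    """Return the names of the `needed` working nodes with the highest idle rate.

    This "most idle first" rule is only an assumed policy for illustration.
    """
    if needed < 1:
        raise ValueError("needed must be at least 1")
    ranked = sorted(workers, key=lambda w: w.idle_rate, reverse=True)
    return [w.name for w in ranked[:needed]]


workers = [
    WorkerInfo("node-a", 0.20),
    WorkerInfo("node-b", 0.75),
    WorkerInfo("node-c", 0.50),
]
print(assign_query(workers, needed=2))  # ['node-b', 'node-c']
```

A real manager would likely weigh further signals (for example, accelerator type or queue length); the sketch only shows where the periodically collected usage information enters the decision.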
After the working node completes the retrieval, the cloud platform manager can return the preview pictures and the second picture to the client.
Taking a terminal device as an example of the client, the structure of the terminal device provided in the embodiment of the present application can be understood with reference to FIG. 2A below, and the structure of the cloud device in the cloud can be understood with reference to FIG. 2B below.
Refer to FIG. 2A, which is a schematic structural diagram of a terminal device provided in an embodiment of the present application. As shown in FIG. 2A, the terminal device may include a processor 101, a transceiver 102, a memory 103, a display 104, and a bus 105. The processor 101, the transceiver 102, the memory 103, and the display 104 are interconnected through the bus 105. In the embodiment of the present application, the processor 101 is used to control and manage the actions of the terminal device 10; for example, the processor 101 responds when the user inputs a query request. The transceiver 102 is used to support communication of the terminal device 10; for example, the transceiver 102 can perform the steps of sending the query request and receiving the preview pictures and the second picture. The memory 103 is used to store the program code and data of the terminal device 10, and the display 104 is used to display the preview pictures and the second picture.
The processor 101 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the disclosure of the present application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 105 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in FIG. 2A, but this does not mean that there is only one bus or only one type of bus.
FIG. 2A above describes the structure of the terminal device. The structure of the cloud device is described below with reference to FIG. 2B.
FIG. 2B is a schematic diagram of a possible logical structure of a cloud device provided in an embodiment of the present application. As shown in FIG. 2B, the cloud device 20 provided in the embodiment of the present application includes a processor 201, a communication interface 202, a memory 203, and a bus 204. The processor 201, the communication interface 202, and the memory 203 are interconnected through the bus 204. In the embodiment of the present application, the processor 201 is used to control and manage the actions of the cloud device 20; for example, the processor 201 generates multiple preview pictures according to the first picture and the first text from the client. The communication interface 202 is used to support communication of the cloud device 20; for example, the communication interface 202 can perform the steps of receiving the first picture and the first text and of sending the preview pictures and the second picture. The memory 203 is used to store the program code and data of the cloud device 20.
The processor 201 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logic blocks, modules, and circuits described in connection with the disclosure of the present application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 204 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in FIG. 2B, but this does not mean that there is only one bus or only one type of bus.
The method for picture retrieval provided in the embodiments of the present application is described below. In this method, content performed by the cloud may be executed by the cloud itself or by a component of the cloud (for example, a processor, a chip, or a chip system). Content performed by the client may be executed by the client itself or by a component of the client (for example, a processor, a chip, or a chip system).
FIG. 3 is a schematic diagram of an embodiment of the method for picture retrieval provided in an embodiment of the present application.
301. The client sends the first picture and the first text input by the user to the cloud. Correspondingly, the cloud receives the first picture and the first text from the client.
In the present application, the first picture and the first text may be sent in a single query request or sent separately.
The first text is used to adjust the first picture.
The interface for inputting the first picture and the first text on the client can be understood with reference to FIG. 4. As shown in FIG. 4, the first text can be entered in the text input box 401, the first picture can be selected through the picture option box 402, and the first picture and the first text can then be sent by clicking the send button 403.
302. The cloud generates multiple preview pictures according to the first picture and the first text from the client.
The process of step 302 may be as follows: the cloud expands the first text to obtain multiple extended texts, and then, for each extended text, generates the corresponding preview picture according to that extended text and the first picture.
The cloud can expand around the content of the first text. For example, if the first text asks for a flower pattern to be placed on the first picture, the text can be expanded around the type of flower, the position of the flower on the first picture, and so on.
The extended text may be generated by the cloud according to the first text, or it may be a text that the cloud finds in a text design database and whose similarity to the first text satisfies a first condition. Alternatively, it may be a text that the cloud finds in the text design database and whose similarity to the first text and design popularity satisfy a second condition.
The text design database may be built in advance and continuously updated. The texts in the text design database may be collected from historical text retrieval records, and they may of course also be obtained in other ways.
In the present application, the first text can be used to search the text design database for texts that are highly similar to the first text. The first condition may be a threshold. Taking a similarity value in the range of 0 to 1 as an example, the threshold may be 0.5 or another value. With a threshold of 0.5, every text whose similarity is greater than 0.5 is regarded as a text whose similarity to the first text satisfies the first condition. The first condition may also be a limit on the number of texts; for example, the N texts with the highest similarity to the first text are the texts whose similarity to the first text satisfies the first condition, where N is an integer greater than 1.
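The two forms of the first condition described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names `select_by_threshold` and `select_top_n` are hypothetical, the similarity scores are made-up illustration values, and how similarity is computed (for example, by a text embedding model) is not specified here.

```python
def select_by_threshold(scored, threshold=0.5):
    """First condition as a threshold: keep texts whose similarity exceeds it."""
    return [text for text, similarity in scored if similarity > threshold]


def select_top_n(scored, n):
    """First condition as a count limit: keep the n most similar texts."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:n]]


# (candidate text, similarity to the first text); values are illustrative only.
scored = [
    ("candidate A", 0.82),
    ("candidate B", 0.45),
    ("candidate C", 0.67),
    ("candidate D", 0.50),
]

print(select_by_threshold(scored, threshold=0.5))  # ['candidate A', 'candidate C']
print(select_top_n(scored, n=2))                   # ['candidate A', 'candidate C']
```

Note that "candidate D" is excluded by the threshold form because the description requires the similarity to be greater than, not equal to, the threshold.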
In the present application, the first text can also be used to search the text design database for texts that are highly similar to the first text, after which the popularity of these highly similar texts is determined, and the extended texts are then determined by combining similarity and popularity. The popularity of each text in the text design database can be computed in advance from statistics such as the number of times the text has been used, and may take a value between 0 and 1. The second condition may include one threshold or two thresholds. With one threshold, the similarity and popularity of a text are weighted to obtain a weighted value (the weight of the similarity is usually greater than the weight of the popularity); if the weighted value is greater than the threshold, the text satisfies the second condition and is an extended text of the first text. With two thresholds, one is a similarity threshold, for example 0.5, and the other is a threshold on the number of extended texts, for example N, where N is a positive integer. The texts whose similarity to the first text is greater than 0.5 are sorted by popularity, and the N most popular of them are selected as the final extended texts.
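Both variants of the second condition can be sketched in Python. The names `Candidate`, `select_weighted`, and `select_two_stage`, the weights 0.7 and 0.3, and all scores are assumptions chosen for illustration; the description only states that the similarity weight is usually the larger one.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str
    similarity: float  # similarity to the first text, 0 to 1
    popularity: float  # precomputed design popularity, 0 to 1


def select_weighted(candidates, threshold, w_sim=0.7, w_pop=0.3):
    """One-threshold variant: keep texts whose weighted score exceeds the threshold."""
    return [
        c.text
        for c in candidates
        if w_sim * c.similarity + w_pop * c.popularity > threshold
    ]


def select_two_stage(candidates, sim_threshold, n):
    """Two-threshold variant: filter by similarity, then keep the n most popular."""
    similar = [c for c in candidates if c.similarity > sim_threshold]
    ranked = sorted(similar, key=lambda c: c.popularity, reverse=True)
    return [c.text for c in ranked[:n]]


# Illustrative values only.
candidates = [
    Candidate("candidate A", 0.9, 0.2),
    Candidate("candidate B", 0.6, 0.9),
    Candidate("candidate C", 0.4, 1.0),
    Candidate("candidate D", 0.8, 0.5),
]

print(select_weighted(candidates, threshold=0.65))
# ['candidate A', 'candidate B', 'candidate D']
print(select_two_stage(candidates, sim_threshold=0.5, n=2))
# ['candidate B', 'candidate D']
```

In the two-stage variant, "candidate C" is the most popular text but is dropped because its similarity does not pass the first stage, which matches the order of operations in the description.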
The above process of generating preview pictures by means of the text design database can also be understood with reference to FIG. 5.
As shown in FIG. 5, the cloud can compare the first text "It would be better with a flower pattern" with the texts in the text design database for similarity. If the first condition requires that the similarity between an extended text and the first text reach 0.7, the cloud can match, for example, the four extended texts shown in FIG. 5 from the text design database: "multiple small flowers in the center", "small flower pattern in the upper left corner", "one red rose in the center", and "multiple flowers in bloom".
The similarity between each of these four extended texts and the first text can be understood with reference to Table 1:
Table 1: Similarity between the extended texts and the first text
After the cloud determines the extended texts, it can input the first picture and each extended text into a picture editing generation model to produce different design schemes, and then generate the preview pictures according to the different design schemes.
In addition, the cloud can determine the extended texts according to both the similarity and the popularity of the texts. As in FIG. 5, multiple texts that are highly similar to the first text "It would be better with a flower pattern" can be found in the text design database according to the similarity requirement, and the extended texts can then be determined by further considering the popularity of each of these texts. For example, the texts whose similarity is higher than 0.7 are sorted in descending order of popularity, and the top N (N=4) texts are selected as the extended texts, which yields the four extended texts in Table 2.
Table 2: Similarity between the extended texts and the first text, and popularity of the extended texts
After the extended texts are obtained, the preview pictures can be generated by inputting the extended texts and the first picture into the picture editing generation model.
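The per-text generation in step 302 can be sketched as a simple loop in Python. The function `generate_previews` and the stand-in `stub_edit_model` are hypothetical; a real picture editing generation model would take and return image data rather than strings.

```python
def generate_previews(first_picture, extended_texts, edit_model):
    """Generate one preview per extended text and remember which text produced it.

    `edit_model` is any callable (picture, text) -> preview picture. It stands in
    for the picture editing generation model and is an assumption of this sketch.
    """
    return {text: edit_model(first_picture, text) for text in extended_texts}


def stub_edit_model(picture, text):
    """Placeholder model: describes the edit instead of producing an image."""
    return f"{picture} edited with '{text}'"


previews = generate_previews(
    "black short-sleeved shirt",
    ["multiple small flowers in the center", "small flower pattern in the upper left corner"],
    stub_edit_model,
)
for text, preview in previews.items():
    print(text, "->", preview)
```

Keeping the mapping from each extended text to its preview is a deliberate choice here, because the later retrieval step uses the extended text corresponding to the target picture selected by the user.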
In the embodiment of the present application, the preview pictures obtained by expanding the text offer, from multiple angles, pictures that meet the requirement the user expressed in the first text, which increases the richness of the preview pictures within the scope of the user's requirement.
303. The cloud sends the multiple preview pictures to the client. Correspondingly, the client receives the multiple preview pictures from the cloud.
304. The client sends the operation information input by the user to the cloud. Correspondingly, the cloud receives the operation information from the client.
The operation information is used to retrieve a second picture associated with the target picture, and the target picture is one of the multiple preview pictures.
In the embodiment of the present application, the operation information includes the target picture, or the operation information includes the target picture and a second text.
In a response system based on a large language model (LLM), if the operation information includes the target image but not the second text, the LLM can call the image-editing retrieval model to retrieve the second image based on the target image and the extended text corresponding to the target image.
The response system may further include a text analysis model. The LLM can call the text analysis model to analyze the second text and then, based on the analysis result, decide whether to call the image-editing retrieval model or the image-editing generation model.
The structure of the text analysis model and its relationship with the LLM are shown in FIG. 6. The text analysis model may include a sentiment analysis model and an entity extraction model. The sentiment analysis model analyzes the user's tone and sentiment in the text, and the entity extraction model identifies the entities in the text, where an entity may be an object mentioned in the text. After analyzing the second text, the text analysis model returns the analysis result to the LLM.
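To make the shape of the analysis result concrete, the sketch below stands in for the two sub-models with simple keyword matching. This is an illustration under stated assumptions, not the models described in the application: the cue phrases, the entity vocabulary, and the names `AnalysisResult` and `analyze_text` are all hypothetical, and a real system would use trained sentiment and entity extraction models.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cue phrases and entity vocabulary, used only for illustration.
POSITIVE_CUES = ("pretty good", "not bad", "what i want")
NEGATIVE_CUES = ("not what i want", "nothing i want", "too scattered")
KNOWN_ENTITIES = ("flower", "sleeve", "collar", "pocket")


@dataclass
class AnalysisResult:
    satisfied: bool      # stand-in for the sentiment analysis output
    entities: List[str]  # stand-in for the entity extraction output


def analyze_text(second_text: str) -> AnalysisResult:
    """Toy replacement for the text analysis model."""
    text = second_text.lower()
    positive = any(cue in text for cue in POSITIVE_CUES)
    negative = any(cue in text for cue in NEGATIVE_CUES)
    entities = [entity for entity in KNOWN_ENTITIES if entity in text]
    # A negative cue overrides a positive one, so that a phrase such as
    # "not what I want" is not mistaken for approval.
    return AnalysisResult(satisfied=positive and not negative, entities=entities)
```

The extracted entities indicate which part of the image the feedback refers to, which is the kind of key information the LLM can pass on when requesting a new round of previews.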
305. The cloud retrieves, according to the operation information from the client, the second image associated with the target image, where the target image is one of the multiple preview images.
In step 305, if the operation information does not include the second text, the cloud can call the image-editing retrieval model through the LLM and retrieve the associated second image based on the target image and the extended text corresponding to the target image.
If the operation information includes the second text, the process is as shown in FIG. 7. The text analysis model 701 analyzes the second text to determine whether the target image meets the user's expectations. If the second text is a statement that expresses satisfaction, such as "pretty good", "not bad", or "it's what I want", the target image meets the user's expectations. The LLM 702 then calls the image-editing retrieval model 703, which retrieves the second image based on the target image and the extended text corresponding to the target image.
If the second text is a statement that expresses dissatisfaction, such as "there is nothing I want" or "the flowers are too scattered", the target image does not meet the user's expectations. The LLM 702 then calls the image-editing generation model 704, which generates multiple preview images associated with the target image and the second text. This generation process is analogous to the process, described above, of generating preview images from the first text and the first image. The cloud can then send this new round of preview images to the client. If the user selects a satisfactory target image from this round, the cloud can retrieve the second image based on that target image. If the user is still not satisfied, the generation process can be repeated until a target image that satisfies the user is obtained.
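The branching in step 305 amounts to a loop that alternates between generation and feedback until the user is satisfied. The sketch below shows only that control flow. The models are passed in as plain callables, and the names `run_feedback_loop`, `get_feedback`, `is_satisfied`, `generate`, and `retrieve`, as well as the `max_rounds` cap, are assumptions of this sketch; the application itself describes repeating until the user is satisfied.

```python
from typing import Callable, List, Optional, Tuple

# (chosen target image, optional second text)
Feedback = Tuple[str, Optional[str]]


def run_feedback_loop(
    first_previews: List[str],
    get_feedback: Callable[[List[str]], Feedback],
    is_satisfied: Callable[[str], bool],
    generate: Callable[[str, str], List[str]],
    retrieve: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Return retrieved images once the user accepts a target image."""
    previews = first_previews
    for _ in range(max_rounds):
        target, second_text = get_feedback(previews)
        # No second text, or a second text that signals satisfaction:
        # go straight to retrieval with the chosen target image.
        if second_text is None or is_satisfied(second_text):
            return retrieve(target)
        # Otherwise generate a new round of previews from the target
        # image and the user's feedback, and ask again.
        previews = generate(target, second_text)
    return []  # the cap was reached without an accepted image
```

Keeping the models behind callables mirrors the description, in which the LLM only decides which model to call and does not perform generation or retrieval itself.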
The image-editing retrieval model can retrieve the second image from an image vector database. The image vector database can be built in advance and continuously updated, and the images in it can be collected from the network. The second image can, of course, also be found by searching the network. In the present application, retrieving the second image from the image vector database can increase the speed of retrieval.
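Retrieval from a vector database generally reduces to a nearest-neighbor search over precomputed vectors. The following sketch assumes that the target image and its extended text have already been encoded into a single query vector and that cosine similarity is the ranking measure; neither detail is stated in the application, and the function names are hypothetical. A production system would use an approximate nearest-neighbor index rather than the linear scan shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k indexed images most similar to the query vector."""
    scored = [
        (image_id, cosine_similarity(query, vector))
        for image_id, vector in index.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Because the vectors are computed ahead of time, only the query has to be encoded at request time, which is consistent with the speed benefit the description attributes to the image vector database.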
306. The cloud sends the second image to the client. Correspondingly, the client receives the second image from the cloud.
In this embodiment of the present application, when sending the second image, the cloud can also send link information associated with the second image. Correspondingly, when receiving the second image, the client also receives the link information associated with the second image from the cloud.
The link information associated with the second image may be, for example, description information of the product shown in the second image and a purchase link. This can improve the efficiency of communication with the user.
In the present application, the user can input the first image and the first text on the client, and the first text guides the cloud in adjusting the first image to obtain multiple preview images. The user can select a suitable target image from the preview images, and the cloud performs retrieval based on the target image. This narrows the retrieval scope and can improve the accuracy of image retrieval.
In this embodiment of the present application, the execution flow of the cloud can be configured in machine learning or deep learning platform software and can be program code deployed on server hardware. Taking the architecture shown in FIG. 8 as an example, the cloud program code of the present application can reside inside the runtime engine, memory management, and communication management modules of the platform software, as well as outside the existing modules. At run time, the program code can run in the host memory and/or the graphics processing unit (GPU) memory of the server.
The architecture shown in FIG. 8 is the implementation form of the server and platform software in this embodiment. The hardware layer includes a GPU and a memory, and the memory stores a text design database and an image vector database. The software layer includes the LLM, the image-editing generation model, and the image-editing retrieval model, and may also include the text analysis model. The calling relationships among these models are as described above. When executing the flow of the present application, the server uses the GPU to call the text design database in the memory to generate the extended texts, and to call the images in the image vector database to obtain the preview images. These processes are as described above.
The text design database and the image vector database in this embodiment can be obtained through offline training and can be updated online. As shown in FIG. 9, the retrieval system is roughly divided into an offline system, an online system, and a storage module. The main task of the offline system is to construct design texts and product image feature vectors from product images and texts on the Internet, and to screen the design texts and product images. This process may include: deduplicating the product images, generating image representation vectors with a pre-trained image model, and building a product image index, so as to construct the image vector database; and, based on the product texts, classifying the product attributes and extracting their specific content, then deduplicating and cleaning the text to construct the text design database.
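The image side of the offline pipeline (deduplicate, embed, index) can be sketched as follows. The sketch makes two simplifying assumptions that are not from the application: duplicates are detected by exact byte hash, whereas a real pipeline would more likely use perceptual or embedding-based near-duplicate detection, and the pre-trained image model is represented by an `embed` callable supplied by the caller. The function names are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Sequence


def deduplicate(images: Iterable[bytes]) -> List[bytes]:
    """Drop byte-identical images, keeping the first occurrence."""
    seen = set()
    unique = []
    for data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique


def build_image_index(
    images: Iterable[bytes],
    embed: Callable[[bytes], Sequence[float]],
) -> Dict[str, Sequence[float]]:
    """Map a content-derived image ID to its representation vector."""
    index = {}
    for data in deduplicate(images):
        # A shortened content hash serves as a stable, hypothetical image ID.
        image_id = hashlib.sha256(data).hexdigest()[:12]
        index[image_id] = embed(data)
    return index
```

The resulting mapping from image ID to vector is the minimal form of what the description calls the product image index.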
In this embodiment of the present application, the offline system can also retrain the image-editing generation model and the image-editing retrieval model based on the product images and texts.
The main task of the online system is to generate preview images and retrieve products in response to user requests, using the image-editing generation model and the image-editing retrieval model. The storage module mainly stores the design texts, the product image representation vectors, the LLM, the image-editing retrieval model, and the image-editing generation model, for the offline system and the online system to call.
After receiving a request input by the user, the online system passes it to the LLM for decision-making. The LLM analyzes the user input and decides whether to start the image-editing retrieval flow. If so, it calls the image-editing generation module.
The image-editing generation module expands the text input by the user into multiple extended texts, generates the corresponding preview images with the image-editing generation model, and returns them to the LLM.
The LLM replies to the user's request with the generated preview images, and the user gives feedback on them.
The LLM passes the user feedback to the text analysis model, which extracts the key information from the feedback and returns the analysis result to the LLM.
Based on the analysis result returned by the text analysis model, the LLM decides whether to call the image-editing generation module again or to call the image-editing retrieval module. If the user is satisfied, the LLM calls the image-editing retrieval module. If the user is not satisfied, the LLM calls the image-editing generation module to generate the next round of preview images.
Once the user is satisfied with a generated preview image, the image-editing retrieval model takes the target image selected by the user and the corresponding text as input, performs image-editing retrieval, and returns the retrieved images to the LLM.
The LLM replies to the user with the retrieved images.
After the program code of this embodiment is deployed in the cloud, it can provide services externally through remote links. For example, a user downloads an application (APP) and interacts with the cloud through the APP to obtain a satisfactory image.
The solution provided by this embodiment combines preview image generation with product retrieval. Based on the image and text input by the user, previews are generated with image-editing generation technology so that the user can obtain a satisfactory image. In addition, by expanding the text, the present application recommends extended options to the user, which diversifies the recommendations and helps capture the user's editing intent, addressing the problem that short user input leads to a large retrieval scope. Furthermore, retrieval can be performed based on the image the user is satisfied with, which enables precise image retrieval.
The image retrieval system and the image retrieval method have been described above. The related apparatuses are described below with reference to the accompanying drawings.
As shown in FIG. 10, one structure of the cloud device 100 provided in this embodiment of the present application includes:
A processing unit 1001, configured to generate multiple preview images according to a first image and a first text from the client, where the first text is used to adjust the first image.
A sending unit 1002, configured to send the multiple preview images to the client.
The processing unit 1001 is further configured to retrieve, according to operation information from the client, a second image associated with a target image, where the target image is one of the multiple preview images.
The sending unit 1002 is further configured to send the second image to the client.
In this embodiment of the present application, the user can input the first image and the first text on the client, and the first text guides the cloud in adjusting the first image to obtain multiple preview images. The user can select a suitable target image from the preview images, and the cloud performs retrieval based on the target image. This narrows the retrieval scope and can improve the accuracy of image retrieval.
Optionally, the processing unit 1001 is configured to expand the first text to obtain multiple extended texts, and to generate, according to each extended text and the first image, a preview image corresponding to that extended text.
Optionally, the multiple extended texts are texts in the text design database whose similarity to the first text satisfies a first condition.
Optionally, the multiple extended texts are texts in the text design database whose similarity to the first text and whose design popularity satisfy a second condition.
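One way to read the first and second conditions is as a similarity threshold followed by a ranking that also weighs design popularity. The sketch below illustrates that reading only. The token-overlap (Jaccard) similarity, the threshold `min_similarity`, the additive score with `popularity_weight`, and the representation of the text design database as a list of `(text, popularity)` pairs are all assumptions; this passage does not define the conditions.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a stand-in for a learned text similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_extended_texts(
    first_text: str,
    design_db: List[Tuple[str, float]],  # (design text, popularity in [0, 1])
    top_n: int = 3,
    min_similarity: float = 0.2,
    popularity_weight: float = 0.5,
) -> List[str]:
    """Pick design texts that are similar to the first text and popular."""
    candidates = []
    for text, popularity in design_db:
        similarity = jaccard(first_text, text)
        if similarity >= min_similarity:  # similarity filter
            score = similarity + popularity_weight * popularity
            candidates.append((score, text))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in candidates[:top_n]]
```

With `popularity_weight` set to zero the selection depends on similarity alone, which corresponds to the simpler of the two optional variants.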
Optionally, the second image is an image retrieved from the image vector database.
Optionally, the cloud device 100 is applied in a response system based on a large language model (LLM); the operation information includes the target image, or the operation information includes the target image and a second text.
Optionally, if the operation information includes the target image, the processing unit 1001 is configured to call the image-editing retrieval model through the LLM and retrieve the associated second image based on the target image and the extended text corresponding to the target image.
Optionally, if the second text indicates that the target image meets the user's expectations, the processing unit 1001 is configured to call the image-editing retrieval model through the LLM and retrieve the associated second image based on the target image, the extended text corresponding to the target image, and the second text.
Optionally, if the second text indicates that the target image does not meet the user's expectations, the processing unit 1001 is configured to call the image-editing generation model through the LLM to generate multiple preview images associated with the target image and the second text, so as to determine the second image. The image-editing generation model obtains these preview images based on the target image and multiple extended texts of the second text.
Optionally, the LLM-based response system further includes a text analysis model, which is used to analyze the second text to determine whether the target image meets the user's expectations.
Optionally, the sending unit 1002 is further configured to send link information associated with the second image to the client.
In this embodiment of the present application, the operations performed by the units of the cloud device 100 are similar to those described in the embodiments shown in FIG. 3 to FIG. 9 and are not repeated here.
As shown in FIG. 11, an embodiment of the present application further provides a client 110, one structure of which includes:
A sending unit 1101, configured to send a first image and a first text input by the user to the cloud, where the first text is used to adjust the first image.
A receiving unit 1102, configured to receive multiple preview images from the cloud, where the multiple preview images are generated based on the first image and the first text.
The sending unit 1101 is further configured to send operation information input by the user to the cloud, where the operation information is used to retrieve a second image associated with a target image, and the target image is one of the multiple preview images.
The receiving unit 1102 is further configured to receive the second image from the cloud.
Optionally, the client 110 is applied in a response system based on a large language model (LLM); the operation information includes the target image, or the operation information includes the target image and a second text.
Optionally, if the operation information includes the target image, the second image is retrieved by calling the image-editing retrieval model through the LLM, based on the target image and the extended text of the target image.
Optionally, if the second text indicates that the target image meets the user's expectations, the second image is retrieved by calling the image-editing retrieval model through the LLM, based on the target image, the extended text of the target image, and the second text.
Optionally, if the second text indicates that the target image does not meet the user's expectations, the receiving unit 1102 is further configured to receive, from the cloud, multiple preview images associated with the target image and the second text. These preview images are obtained by calling the image-editing generation model through the LLM, and the image-editing generation model obtains them based on the target image and multiple extended texts of the second text.
The sending unit 1101 is further configured to send to the cloud a third image, or a third image and a third text, which are used to determine the second image, where the third image is one of the multiple preview images associated with the target image and the second text.
Optionally, the receiving unit 1102 is further configured to receive, from the cloud, link information associated with the second image.
In this embodiment of the present application, the operations performed by the units of the client 110 are similar to those described in the embodiments shown in FIG. 3 to FIG. 9 and are not repeated here.
Another embodiment of the present application further provides a computer-readable storage medium that stores computer-executable instructions. When a processor of the cloud device executes the computer-executable instructions, the cloud device performs the steps performed by the cloud in FIG. 3 to FIG. 9 above.
Another embodiment of the present application further provides a computer-readable storage medium that stores computer-executable instructions. When a processor of the client executes the computer-executable instructions, the client performs the steps performed by the client in FIG. 3 to FIG. 9 above.
Another embodiment of the present application further provides a computer program product that includes computer program code. When the computer program code is executed on a computer, the computer device performs the steps performed by the cloud or the client in FIG. 3 to FIG. 9 above.
Another embodiment of the present application further provides a chip system that includes one or more interface circuits and one or more processors, where the interface circuits and the processors are interconnected through lines. The interface circuit is configured to receive a signal from a memory of a terminal and send the signal to the processor, where the signal includes computer instructions stored in the memory. When the processor executes the computer instructions, the terminal performs the steps performed by the cloud device or the client in FIG. 3 to FIG. 9 above. In one possible design, the chip system may further include a memory configured to store the program instructions and data necessary for the control device. The chip system may consist of chips, or may include chips and other discrete devices.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
When the integrated unit is implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Claims (26)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311287307.XA CN119719396A (en) | 2023-09-27 | 2023-09-27 | Picture retrieval method and related device |
| CN202311287307.X | 2023-09-27 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025067005A1 (en) | 2025-04-03 |
Family
ID=95075711
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2024/119642 Pending WO2025067005A1 (en) | 2023-09-27 | 2024-09-19 | Method for picture retrieval and related apparatus |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN119719396A (en) |
| WO (1) | WO2025067005A1 (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103946838A (en) * | 2011-11-24 | 2014-07-23 | Microsoft Corporation | Interactive Multimodal Image Search |
| US20170098152A1 (en) * | 2015-10-02 | 2017-04-06 | Adobe Systems Incorporated | Modifying at least one attribute of an image with at least one attribute extracted from another image |
| CN107748779A (en) * | 2017-10-20 | 2018-03-02 | Baidu Online Network Technology (Beijing) Co., Ltd. | Information generating method and device |
| CN113795834A (en) * | 2019-05-09 | 2021-12-14 | Microsoft Technology Licensing, LLC | Techniques for modifying a query image |
| CN114840700A (en) * | 2022-05-30 | 2022-08-02 | Laiye Technology (Beijing) Co., Ltd. | Image retrieval method and device for realizing IA by combining RPA and AI, and electronic equipment |
| CN116127111A (en) * | 2023-01-03 | 2023-05-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Image search method, device, electronic device and computer-readable storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119719396A (en) | 2025-03-28 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24870549; Country of ref document: EP; Kind code of ref document: A1 |