WO2025086118A1 - Providing a structured image collection - Google Patents

Providing a structured image collection

Info

Publication number
WO2025086118A1
WO2025086118A1 PCT/CN2023/126349 CN2023126349W WO2025086118A1 WO 2025086118 A1 WO2025086118 A1 WO 2025086118A1 CN 2023126349 W CN2023126349 W CN 2023126349W WO 2025086118 A1 WO2025086118 A1 WO 2025086118A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
intent
images
user
image collection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2023/126349
Other languages
French (fr)
Inventor
Lin SU
Dehua Cui
Ke Chen
Taroon BHARTI
Dongmei Zhang
Kun Wu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to PCT/CN2023/126349
Publication of WO2025086118A1
Legal status: Pending
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53: Querying
    • G06F16/535: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53: Querying
    • G06F16/538: Presentation of query results

Definitions

  • the image search service may receive a search query from a user, and provide a search result including images relevant to the search query.
  • the image recommendation service may obtain user data of a user, such as browsing history, preference settings, etc., and accordingly push a recommendation result including images that might interest the user.
  • Embodiments of the present disclosure propose methods, apparatuses and non-transitory computer-readable medium for providing a structured image collection.
  • User data may be obtained.
  • a user intent may be determined based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent.
  • a target query may be determined based on the user data.
  • the structured image collection may be generated based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
  • FIG. 1 illustrates an exemplary process of providing a structured image collection according to an embodiment.
  • FIG. 2 illustrates an exemplary process of user intent determination according to an embodiment.
  • FIG. 3 illustrates an exemplary process of structured image collection generation for a sequential result intent according to an embodiment.
  • FIG. 4 illustrates an exemplary process of structured image collection generation for a grouped result intent according to an embodiment.
  • FIG. 5 illustrates another exemplary process of structured image collection generation for a sequential result intent according to an embodiment.
  • FIG. 6 illustrates another exemplary process of structured image collection generation for a grouped result intent according to an embodiment.
  • FIG. 7 illustrates another exemplary process of structured image collection generation for a grouped result intent according to an embodiment.
  • FIG. 8A and FIG. 8B illustrate an exemplary presentation of a structured image collection for a sequential result intent according to an embodiment, respectively.
  • FIG. 9 illustrates another exemplary presentation of a structured image collection for a sequential result intent according to an embodiment.
  • FIG. 10A and FIG. 10B illustrate an exemplary presentation of a structured image collection for a grouped result intent according to an embodiment, respectively.
  • FIG. 11 illustrates another exemplary presentation of a structured image collection for a grouped result intent according to an embodiment.
  • FIG. 12 illustrates a flowchart of an exemplary method for providing a structured image collection according to an embodiment.
  • FIG. 13 illustrates an exemplary apparatus for providing a structured image collection according to an embodiment.
  • FIG. 14 illustrates an exemplary apparatus for providing a structured image collection according to an embodiment.
  • Image services may provide image-related results to users.
  • the image search service may provide a search result
  • the image recommendation service may push a recommendation result, etc.
  • Images in either the search result or the recommendation result are separately arranged instead of being structurally organized.
  • a search result including multiple images each showing a best-sold car in 2023 may be generated.
  • the images in the search result are separately arranged from each other, instead of being structurally organized as, e.g., an ordered image list which explicitly and directly shows cars ranked from 1 to 10 according to their selling data.
  • the user has to manually identify and extract useful images from the search result to obtain a series of images ranked or ordered from 1 to 10. A similar situation exists in the scenario of the image recommendation service.
  • Embodiments of the present disclosure propose to provide a structured image collection in image services.
  • the present disclosure may effectively identify an intent of viewing a structured image collection, e.g., a user may intend to browse a series of images that are structurally related to each other, and thus provide the structured image collection that satisfies the intent. Accordingly, the performance of image services, such as, the image search service, the image recommendation service, etc., may be improved so as to better satisfy user requirements and improve user experience.
  • a structured image collection which may also be termed as a structured image gallery, includes a plurality of images that are structurally related.
  • images that are “structurally related” are not arranged separately from each other, but are arranged or organized in a predetermined structural relationship with each other.
  • a plurality of images in a structured image collection may be sequentially related and thus constitute a sequential image collection, may be organized into multiple groups and thus constitute a grouped image collection, etc.
  • a structured image collection may further include a title for the collection as well as brief text description for each image, so as to facilitate providing more text information about contents of the images.
  • the present disclosure may identify whether a user has an intent to view a structured image collection in a result returned by an image service, such as, in an image search result or an image recommendation result.
  • the identifying of a user intent may be performed based on user data.
  • the user data may include a search query from a user.
  • the user data may include user profile.
  • the user intent may be determined based on the user data through an intent classifier.
  • the user intent may include, e.g., a sequential result intent which indicates an intent of viewing image results in a manner of sequential image collection, a grouped result intent which indicates an intent of viewing image results in a manner of grouped image collection, etc.
  • the present disclosure may determine a target query based on the user data.
  • the target query refers to a brief summary for user-interested image contents.
  • the user data includes a search query, and the search query may be determined as the target query, since the search query usually briefly reflects what image contents the user is interested in.
  • the user data includes a user profile, and an interested topic that briefly reflects what image contents the user is interested in may be extracted from the user profile, and may be determined as the target query.
  • the present disclosure may automatically generate a structured image collection based on the target query and the user intent.
  • the structured image collection may be generated based on the target query and the user intent through image collection automatic generation.
  • the image collection automatic generation may adopt an AI model.
  • the structured image collection may be generated based on the target query and the user intent through search result aggregation.
  • the search result aggregation may utilize an image search engine.
  • the present disclosure may combine the search result aggregation approach with the image collection automatic generation approach.
  • the embodiments of the present disclosure for providing a structured image collection may be implemented in a real-time or online approach, such that a structured image collection may be generated in response to user data in real-time.
  • the embodiments of the present disclosure for providing a structured image collection may be implemented in a pre-storing approach, as a process for preparing or enriching an image library.
  • the present disclosure may combine the pre-storing approach and the real-time approach.
  • embodiments of the present disclosure may be applied for various image services, including but not limited to the image search service, the image recommendation service, etc. Moreover, the embodiments of the present disclosure may be implemented at, e.g., a server or cloud side.
  • FIG. 1 illustrates an exemplary process 100 of providing a structured image collection according to an embodiment.
  • the process 100 may be performed for various image services, e.g., the image search service, the image recommendation service, etc.
  • user data obtaining is performed for obtaining user data.
  • the user data may include a search query from a user, such as “top 10 best-selling cars in 2023”, “how to make a banana pie”, “views of the Great Wall in four seasons”, and so on.
  • the user data may include a user profile.
  • the user profile may include various types of user information, e.g., at least one of browsing history of a user, search history of the user, topics specified by the user, common topics from other users, etc.
  • user intent determination is performed for determining a user intent based on the user data.
  • the user intent may also be termed as an image result intent, a result type intent, etc.
  • the user intent may indicate whether the user intends to view an image-related result, such as an image search result, an image recommendation result, etc., in a manner of structured image collection, such as a sequential image collection, a grouped image collection, etc.
  • the user intent may be determined through an intent classifier.
  • the intent classifier may take the user data as an input and output a classification result indicating the user intent, e.g., a sequential result intent, a grouped result intent, etc.
  • the intent classifier may determine at least an intent of viewing image results in a manner of sequential image collection, an intent of viewing image results in a manner of grouped image collection, etc. It should be understood that the embodiments of the present disclosure are not limited to adopting the intent classifier for determining the user intent, but may also adopt any other approach for determining the user intent.
  • target query determination operation is performed for determining a target query based on the user data.
  • the target query refers to a brief summary for user-interested image contents
  • the user data includes a search query
  • the search query may be determined as the target query. This is because the search query usually briefly reflects what image contents the user is interested in. As an example, a search query such as “top 10 best-selling cars in 2023”, “how to make a banana pie”, “views of the Great Wall in four seasons”, etc. may be determined as the target query directly.
  • the user data includes a user profile
  • an interested topic extracted from the user profile may be determined as the target query. This is because the extracted interested topic usually briefly reflects what image contents the user is interested in.
  • the present disclosure may adopt any approach for extracting an interested topic from the user profile. For example, if the user profile indicates that the user has browsed or searched webpages or images associated with, e.g., selling data of cars, etc., an interested topic “top 10 best-selling cars in 2023” may be extracted from the user profile. For example, if the user profile indicates that the user followed a topic or specified a topic, such as “how to make a banana pie”, this topic may be extracted from the user profile as an interested topic for the user.
  • the user profile comprises a common topic among some other users associated with the user, e.g., family, friends, contacts, etc., such as when at least two other users follow the same topic
  • this common topic may be extracted from the user profile as an interested topic for the user.
  • various information from the user profile, e.g., browsing history, search history, specified topics, common topics, and other information not detailed above, may be taken into account in combination, so as to extract an interested topic for the user.
  • structured image collection generation is performed for generating a structured image collection based on the target query and the user intent.
  • the structured image collection includes a plurality of images that are structurally related.
  • the present disclosure may adopt the search result aggregation approach and/or the image collection automatic generation approach for generating the structured image collection, which will be discussed in detail in connection with FIG. 3 to FIG. 7 later.
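
The steps of the process 100 described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `UserData`, `determine_target_query` and `provide_structured_collection` are hypothetical, the intent classifier and the collection generator are passed in as plain callables, returning `None` for "no structured-collection intent" is an assumption, and taking the first profile topic is only a placeholder for real interested-topic extraction.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

SEQUENTIAL = "sequential result intent"
GROUPED = "grouped result intent"


@dataclass
class UserData:
    """Hypothetical container: a search query and/or topics from a user profile."""
    search_query: Optional[str] = None
    profile_topics: list[str] = field(default_factory=list)


def determine_target_query(user_data: UserData) -> Optional[str]:
    # A search query, when present, is used directly as the target query.
    if user_data.search_query:
        return user_data.search_query
    # Otherwise use an interested topic from the profile. Taking the first
    # topic is a placeholder assumption, not the disclosed extraction method.
    if user_data.profile_topics:
        return user_data.profile_topics[0]
    return None


def provide_structured_collection(
    user_data: UserData,
    classify_intent: Callable[[UserData], Optional[str]],
    generate_collection: Callable[[str, str], object],
) -> Optional[object]:
    """Determine intent and target query, then generate the collection."""
    intent = classify_intent(user_data)
    target_query = determine_target_query(user_data)
    if intent is None or target_query is None:
        # No structured-collection intent or nothing to search for:
        # the caller would fall back to an ordinary image result.
        return None
    return generate_collection(target_query, intent)
```

Passing the classifier and generator as arguments keeps the sketch independent of which generation approach is used for the final step.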
  • the process 100 may be implemented in a real-time or online approach, and thus applied for a purpose of providing a structured image collection in real-time.
  • the structured image collection generated according to the process 100 may be further transmitted to a device client, such that the structured image collection may be presented, as an image search result or an image recommendation result, on a user interface of the image search service or the image recommendation service.
  • the process 100 may be implemented in a pre-storing approach, and thus applied for a purpose of pre-storing structured image collections in an image library, such as preparing or enriching the image library.
  • the user data obtaining at 110 may include obtaining history user data, e.g., a history search query or a history user profile.
  • the history user data refers to user data that is obtained when preparing the image library. In other words, the history user data may be obtained at an earlier time period as compared to current user data in response to which the generated structured image collection may be provided directly.
  • the history search query may be obtained by collecting any search query that a certain user once provided. In an implementation, the history search query may also be collected through crawling webpages on the internet.
  • any web crawling tool may be employed to analyze the webpages and extract potential key words, statements, etc. from them to determine a search query as the history search query.
  • the history user profile may be obtained by collecting the user profile that is maintained when preparing the image library.
  • the history user data is not limited to a single user, but may cover multiple users, such that the image library may include enough pre-stored structured image collections.
  • the user intent determination at 120 may include determining a history user intent
  • the target query determination at 130 may include determining a history target query.
  • the structured image collection generation at 140 may include generating a structured image collection based on the history target query and the history user intent. Then, the generated structured image collection may be pre-stored together with a corresponding search index in the image library.
  • the image library may be then adopted by an image service for returning a structured image collection as an image-related result.
  • the structured image collection may be quickly retrieved from the image library and presented to a user if the structured image collection matches with a target query and a user intent derived from user data of the user.
  • the user data obtaining at 110 may include obtaining current user data for the current user, e.g., a current search query or a current user profile.
  • the user intent determination at 120 may include determining a current user intent
  • the target query determination at 130 may include determining a current target query.
  • the structured image collection generation at 140 may include retrieving the structured image collection from the image library based on the current user intent and the current target query.
  • a search index corresponding to the current user data may be selected from a plurality of search indexes in the image library based on the current user intent and the current target query. Then, a pre-stored structured image collection corresponding to the search index may be retrieved from the image library. The retrieved structured image collection may be presented on a user interface by the image search service or the image recommendation service, such that the retrieved structured image collection may be viewed by the user.
  • the structured image collection generated in the real-time approach may also be added into the image library so as to further enrich the image library.
  • the steps in the process 100 are not limited to any specific order, e.g., although the user intent determination at 120 is shown as performed before the target query determination at 130, the order of these two steps may also be interchanged.
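
The combination of the pre-storing approach and the real-time approach can be sketched as a lookup that falls back to generation. This is only an illustration under stated assumptions: `ImageLibrary` and `get_collection` are hypothetical names, the library is an in-memory dictionary, and the search index is assumed to be the pair of user intent and whitespace- and case-normalized target query; the disclosure does not specify the index format.

```python
from typing import Callable, Optional


class ImageLibrary:
    """In-memory stand-in for an image library with a search index."""

    def __init__(self) -> None:
        self._collections: dict[tuple[str, str], list[str]] = {}

    @staticmethod
    def _index_key(intent: str, target_query: str) -> tuple[str, str]:
        # Assumed index: intent plus the lower-cased, whitespace-collapsed query.
        return intent, " ".join(target_query.lower().split())

    def pre_store(self, intent: str, target_query: str, collection: list[str]) -> None:
        self._collections[self._index_key(intent, target_query)] = collection

    def retrieve(self, intent: str, target_query: str) -> Optional[list[str]]:
        return self._collections.get(self._index_key(intent, target_query))


def get_collection(
    library: ImageLibrary,
    intent: str,
    target_query: str,
    generate: Callable[[str, str], list[str]],
) -> list[str]:
    """Return a pre-stored collection if one matches, else generate and store it."""
    stored = library.retrieve(intent, target_query)
    if stored is not None:
        return stored
    collection = generate(target_query, intent)
    # A collection generated in real time also enriches the library.
    library.pre_store(intent, target_query, collection)
    return collection
```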
  • FIG. 2 illustrates an exemplary process 200 of user intent determination according to an embodiment.
  • the process 200 may be performed for determining a user intent by an intent classifier 220.
  • the intent classifier 220 takes user data 210 as an input, wherein the user data 210 may be obtained by, e.g., the user data obtaining at 110 in FIG. 1.
  • the intent classifier 220 outputs a classification result indicating a user intent corresponding to the user data 210.
  • the intent classifier 220 may classify the user data 210 to a specific class of intent among various possible classes of intent including, e.g., a sequential result intent 230, a grouped result intent 240, a non-structured result intent 250, etc.
  • the sequential result intent 230 may indicate an intent of viewing image results in a manner of sequential image collection.
  • a sequential image collection may comprise a plurality of images ordered by image contents.
  • the plurality of images may be ordered in view of at least one of the following exemplary aspects: a time point when a scene shown by an image occurs, a geographical location to which a scene shown by an image corresponds, attributes of one or more entities in an image such as selling data, price, size, salary, etc., a performing or operational order such as when an image relates to a step of a process, etc., a logical order such as when an image relates to a story or narrative, etc., an evolutionary order such as when an image relates to a product, species, concept, etc., or a cause and effect order, etc.
  • the user intent may be determined as a sequential result intent through the intent classifier 220.
  • the user data includes a user profile indicating that the user is interested in “how to make banana pie” and thus the user data indicates that the user intends to view a recommendation result illustrating ordered steps for making a banana pie
  • the user intent may be determined as a sequential result intent through the intent classifier 220.
  • the grouped result intent 240 may indicate an intent of viewing image results in a manner of grouped image collection.
  • a grouped image collection may comprise multiple groups of images that are divided by image contents.
  • the multiple groups of images may be divided in view of at least one of the following exemplary aspects: a time point when a scene shown by an image occurs, a geographical location where a scene shown by an image corresponds, attributes of one or more entities in an image such as types or categories, color, size, technique or medium used, popularity or rating, etc., attributes of an image itself such as size, resolution, aspect ratio, style, color, author, format, etc., tags attached to an image, topics involved in an image, etc.
  • the user intent may be determined as a grouped result intent through the intent classifier 220.
  • the non-structured result intent 250 may indicate an intent of viewing image results in a manner other than the manner of sequential image collection or the manner of grouped image collection.
  • the non-structured result intent 250 may indicate an intent of viewing separate images that are not structurally organized as done with the existing image searching and recommending mechanisms. It should be understood that if the user data 210 is determined as having the non-structured result intent 250, any existing image searching and recommending mechanisms may be employed to provide an image result including separate images.
  • the intent classifier 220 may be a multi-class classification model for conducting a multi-class classification task to output one of multiple intent classes (e.g., a sequential result intent, a grouped result intent, a non-structured result intent, etc.) based on the input user data.
  • the intent classifier 220 may be implemented with a neural network (NN) model (e.g., Transformer, BERT, etc.), a support vector machine (SVM) model, a decision tree (DT) model, etc., that may be used to perform the multi-class classification task.
  • Training data for the intent classifier 220 may be in a format of a data pair consisting of user data and a corresponding intent class.
  • a training data pair may be in a format of <search query, intent class>, e.g., <search query: “top 10 best-selling cars in 2023”, intent class: “sequential result intent”>, <search query: “views of the Great Wall in four seasons”, intent class: “grouped result intent”>, and so on.
  • a training data pair may be in a format of <user profile, intent class>, which is similar to the format used for the image search service except that the search query part in the data pair is replaced by information from the user profile.
  • the exemplary process 200 may be performed for accurately identifying the user intent, and thus facilitates providing a structured image collection satisfying the user intent.
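
The training-pair format and the multi-class output of the intent classifier can be illustrated with a deliberately tiny stand-in. The sketch below is a 1-nearest-neighbour classifier over word overlap (Jaccard similarity), not a neural network, SVM or decision tree model; it is chosen only because it runs without dependencies. The first three pairs come from the examples in the text, the last pair is a made-up example for the non-structured class, and the function names are hypothetical.

```python
# Training data as <search query, intent class> pairs.
TRAINING_PAIRS = [
    ("top 10 best-selling cars in 2023", "sequential result intent"),
    ("how to make a banana pie", "sequential result intent"),
    ("views of the Great Wall in four seasons", "grouped result intent"),
    ("sunset wallpaper", "non-structured result intent"),  # hypothetical pair
]


def tokens(text: str) -> set[str]:
    """Lower-case the text and split it on whitespace."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words common to both sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_intent(query: str, pairs=TRAINING_PAIRS) -> str:
    """Return the intent class of the most similar training query."""
    query_tokens = tokens(query)
    _, best_class = max(pairs, key=lambda pair: jaccard(query_tokens, tokens(pair[0])))
    return best_class
```

With these pairs, a query such as "how to make an apple pie" shares most of its words with the banana pie example and is therefore assigned the sequential result intent. A trained model would replace the similarity lookup while keeping the same input and output.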
  • FIG. 3 illustrates an exemplary process 300 of structured image collection generation for a sequential result intent according to an embodiment.
  • the process 300 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1.
  • the process 300 adopts the image collection automatic generation approach for automatically generating a structured image collection based on a sequential result intent 310 and a target query 320.
  • the structured image collection may be automatically generated by means of an AI model.
  • the sequential result intent 310 and the target query 320 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
  • a list of image descriptions may be generated based on the target query 320.
  • Each image description may be a text description for an image in a structured image collection that is to be generated. Taking an exemplary target query “how to make a banana pie” as an example, the generated list of image descriptions may include multiple text descriptions, wherein each text description describes a specific operation/step in the process of making the banana pie.
  • the list of image descriptions may be generated through an image description generating model.
  • the image description generating model may be a generative model, which may generate a text output based on a text input.
  • the target query may be provided or fed to the image description generating model, so as to trigger the image description generating model to output the list of image descriptions.
  • the image description generating model may be implemented by any universal AI model, e.g., a large language model (LLM).
  • An LLM is a type of large-scale language model having the ability of effectively achieving general-purpose language understanding and generation. The LLM may acquire this ability by using massive amounts of data to learn a large number of parameters during training.
  • the image description generating model may also be implemented by any specialized AI model that is specially developed and trained for generating the list of image descriptions based on the input target query.
  • the specialized AI model may employ any suitable model architecture, e.g., an Encoder-Decoder architecture that is implemented by an RNN model and its variants such as LSTM variants or GRU variants, a Transformer model and its variants such as GPT variants or T5 variants, etc.
  • Training data for the image description generating model may be in a format of a data pair consisting of a target query and a corresponding list of image descriptions, wherein the list of image descriptions may be artificially labeled.
  • the list of image descriptions generated through the image description generating model may define the same attributes of image entities and/or the same image style.
  • An image entity refers to an object included in an image, and attributes of the image entity may comprise, e.g., color of the image entity, size of the image entity, location of the image entity in the image, as well as any other suitable attributes.
  • Image style may refer to the approach of artistic expression, e.g., romanticism, realism, etc.
  • the same attributes of image entities and/or the same image style help generate multiple images in a unified form, and thus the multiple images are more suitable to be organized as an image collection.
  • certain rules may be applied on the training data used during the training stage for the image description generating model.
  • the training data may be prepared in a way that, for each image description of the list of image descriptions for the target query, the attributes of image entities and/or the image style are defined to be unified.
  • multiple images may be generated based on the list of image descriptions respectively.
  • the multiple images may be generated through a text-to-image generating model.
  • the text-to-image generating model may be a generative model, which may generate an image output based on a text input.
  • each image description in the list of image descriptions may be provided to the text-to-image generating model, so as to trigger the text-to-image generating model to output a corresponding image.
  • the text-to-image generating model may be implemented by any model for generating an image based on a text description, e.g., DALL-E, Midjourney, etc.
  • the multiple images generated at 340 may be arranged as a sequential image collection according to the sequential result intent 310.
  • the sequential result intent 310 indicates an intent of viewing image results in a manner of sequential image collection
  • the arranging at 350 may comprise ordering the multiple images to form a sequential image collection in response to the sequential result intent.
  • a title for the sequential image collection and/or a brief text description for each image in the sequential image collection may be generated based on the multiple images and the target query through, e.g., a title and description generating model.
  • the title and description generating model may be a multi-modal generative model that generates a text output based on an image and a text input.
  • the title and description generating model may be implemented by any NN model, such as a Transformer model (e.g., BERT, GPT variants), etc.
  • the title for the sequential image collection and/or the brief text description for each image may also be included in the generated sequential image collection for facilitating providing more text information about contents of the images.
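
The flow of the process 300 can be sketched as a chain of three model calls. The sketch is an assumption-laden outline, not the disclosed implementation: `SequentialImageCollection` and `generate_sequential_collection` are hypothetical names, the three models are injected as callables so that no particular LLM or text-to-image API is implied, and the generated image descriptions are reused as the per-image captions for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SequentialImageCollection:
    title: str
    images: list[bytes]   # one image per step, in order
    captions: list[str]   # brief text description for each image


def generate_sequential_collection(
    target_query: str,
    describe: Callable[[str], list[str]],            # image description generating model
    text_to_image: Callable[[str], bytes],           # text-to-image generating model
    make_title: Callable[[str, list[bytes]], str],   # title and description generating model
) -> SequentialImageCollection:
    # Generate an ordered list of text descriptions from the target query.
    descriptions = describe(target_query)
    # Generate one image per description (operation 340).
    images = [text_to_image(description) for description in descriptions]
    # Arrange the images as a sequence (operation 350): here the order of the
    # descriptions is kept, which assumes the model lists steps in order.
    title = make_title(target_query, images)
    return SequentialImageCollection(title=title, images=images, captions=descriptions)
```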
  • FIG. 4 illustrates an exemplary process 400 of structured image collection generation for a grouped result intent according to an embodiment.
  • the process 400 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1.
  • the process 400 adopts the image collection automatic generation approach for automatically generating a structured image collection based on a grouped result intent 410 and a target query 420.
  • the structured image collection may be automatically generated by means of an AI model.
  • the grouped result intent 410 and the target query 420 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
  • the target query may be broken into multiple sub-queries.
  • the breaking of the target query refers to semantic segmentation, rather than simple splitting of the target query into several terms or phrases.
  • this target query may be semantically broken into four sub-queries, e.g., “views of the Great Wall in spring”, “views of the Great Wall in summer”, “views of the Great Wall in autumn”, and “views of the Great Wall in winter”.
  • the breaking of the target query into multiple sub-queries may be performed through a query breaking model.
  • the query breaking model may be implemented by any universal AI model for conducting a text semantic segmentation task, e.g., an LLM.
  • the query breaking model may also be implemented by any specialized AI model that is specially developed and trained for semantically segmenting the target query into multiple sub-queries.
  • Training data for the query breaking model may be in a format of a data pair consisting of a target query and corresponding multiple sub-queries, e.g., an exemplary training data pair ⁇ target query: “views of the Great Wall in four seasons” , sub-queries: “views of the Great Wall in spring” , “views of the Great Wall in summer” , “views of the Great Wall in autumn” , and “views of the Great Wall in winter” > .
  • a list of image descriptions may be generated based on the sub-query and then a group of images may be generated based on the list of image descriptions respectively.
  • the operation of generating the list of image descriptions may be the same as the operation 330 in FIG. 3, except that the target query in 330 is replaced by the sub-query, and thus this operation will not be detailed herein for simplicity.
  • the operation of generating the group of images may be the same as the operation 340 in FIG. 3 and thus will not be detailed herein for simplicity.
  • multiple groups of images that correspond to the multiple sub-queries respectively may be generated.
  • the multiple groups of generated images may be arranged as a grouped image collection according to the grouped result intent 410.
  • the arranging at 450 may comprise combining the multiple groups of images to form a grouped image collection in response to the grouped result intent.
  • a title for the grouped image collection, a title for each group, and/or a brief text description for each image in the grouped image collection may be generated, to provide more textual information about the contents of the images.
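The flow of the process 400 (break the target query, generate image descriptions per sub-query, generate images per description, then combine the groups) can be sketched as follows. This is an illustrative sketch under stated assumptions: the three callables stand in for the query breaking model, the image description generating model and the text-to-image generating model, and the toy functions below are hypothetical placeholders that only manipulate strings.

```python
from typing import Callable, Dict, List

# Hypothetical model interfaces; in practice each would wrap an AI model.
QueryBreaker = Callable[[str], List[str]]
DescriptionGenerator = Callable[[str], List[str]]
ImageGenerator = Callable[[str], str]  # image description -> image reference


def generate_grouped_collection(
    target_query: str,
    break_query: QueryBreaker,
    describe: DescriptionGenerator,
    render: ImageGenerator,
) -> Dict[str, List[str]]:
    """Return one group of generated images per sub-query, in sub-query order."""
    collection: Dict[str, List[str]] = {}
    for sub_query in break_query(target_query):
        descriptions = describe(sub_query)
        collection[sub_query] = [render(d) for d in descriptions]
    return collection


# Toy stand-ins (assumptions for illustration only).
def toy_break(query: str) -> List[str]:
    stem = query.replace("four seasons", "{}")
    return [stem.format(s) for s in ("spring", "summer", "autumn", "winter")]


def toy_describe(sub_query: str) -> List[str]:
    return [f"{sub_query}, photo {i}" for i in (1, 2)]


def toy_render(description: str) -> str:
    return f"image<{description}>"


collection = generate_grouped_collection(
    "views of the Great Wall in four seasons", toy_break, toy_describe, toy_render
)
```

Passing the models in as callables keeps the arranging step independent of which universal or specialized model is chosen for each role.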
  • FIG. 5 illustrates another exemplary process 500 of structured image collection generation for a sequential result intent according to an embodiment.
  • the process 500 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1.
  • the process 500 adopts the search result aggregation approach for automatically generating a structured image collection based on a sequential result intent 510 and a target query 520.
  • the structured image collection may be automatically generated by aggregating search results from an image search engine.
  • the sequential result intent 510 and the target query 520 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
  • a search result for the target query 520 may be obtained.
  • the search result for the target query may be obtained through the image search engine.
  • the target query may be provided to the image search engine, and the search result matching the target query may be then obtained from the image search engine.
  • the image search engine may generate the search result through any existing techniques.
  • the search result may comprise multiple images matching the target query.
  • the multiple images may be ranked by the image search engine according to freshness, authority, number of views, or any other metrics. Thus, the highest-ranked N images may be obtained in this step.
  • the search result may comprise multiple webpages matching the target query.
  • the multiple webpages may be ranked by the image search engine according to freshness, authority, number of views, or any other metrics. Thus, the highest-ranked N webpages may be obtained in this step.
  • multiple image lists may be generated based on the search result.
  • the search result comprises the highest-ranked N images
  • a webpage from which the image originates may be obtained. This is feasible as the original URL of an image is generally maintained together with the image on the internet.
  • original ranked images, if any exist, may be extracted from the webpage to form an image list.
  • one of the highest-ranked N images in the search result may originate from a webpage which lists 10 cars ranked from 1 to 10 according to their selling data, and thus images of the 10 ranked cars in the webpage may be extracted. That is, the finding of the webpage may facilitate generating an ordered list of top 10 best-selling cars. In this way, by performing the same procedure for the N images respectively, multiple image lists may be generated.
  • the search result comprises the highest-ranked N webpages
  • original ranked images, if any exist, may be extracted from the webpage to form an image list. In this way, by performing the same procedure for the N webpages respectively, multiple image lists may be generated.
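Extracting ranked images from a webpage can be illustrated with Python's standard `html.parser` module. The sketch below rests on a strong assumption that is not stated in the disclosure: that a page marks its ranking with an ordered list (`<ol>`). Real pages may instead use headings, tables or numbered captions. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class OrderedListImageExtractor(HTMLParser):
    """Collect <img src=...> values that appear inside <ol> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._ol_depth = 0          # how many <ol> elements are currently open
        self.images: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self._ol_depth += 1
        elif tag == "img" and self._ol_depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "ol" and self._ol_depth > 0:
            self._ol_depth -= 1


def extract_image_list(html: str) -> List[str]:
    """Return image sources in page order, or an empty list if none are ranked."""
    parser = OrderedListImageExtractor()
    parser.feed(html)
    parser.close()
    return parser.images


def build_image_lists(pages: List[str]) -> List[List[str]]:
    """One image list per webpage; pages without ranked images are skipped."""
    lists = (extract_image_list(page) for page in pages)
    return [lst for lst in lists if lst]


page = (
    '<img src="banner.png">'
    '<ol><li><img src="car1.jpg"></li><li><img src="car2.jpg"/></li></ol>'
    '<img src="ad.png">'
)
image_lists = build_image_lists([page, "<p>no ranking here</p>"])
```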
  • an image list that is most relevant to the target query may be selected among the multiple image lists. For example, for each of the multiple image lists, a score for the image list that reflects the relevance of the image list to the target query may be generated. The score may be generated based on a plurality of images in the image list, and texts associated with the plurality of images obtained from the original webpage where the images originate. Metrics for determining the score may include, e.g., number of images in the image list, matching rate of each image with the target query, matching rate of the text for each image with the target query, or any other suitable metrics that may be used to evaluate the relevance of the image list to the target query.
  • the score may be generated through a list scoring model, which may be a multi-modal model that takes the plurality of images in the image list and texts associated with the plurality of images as inputs and outputs a score for the image list.
  • the list scoring model may be implemented by any universal model, such as an LLM.
  • the list scoring model may also be implemented by any specialized model that is customized to conduct the scoring task.
  • the specialized list scoring model may be implemented with any model architecture, such as Transformer (e.g., BERT, GPT variants, etc. ) , or implemented as a NN model, a DT model or a logistic regression model that takes image features (which may be extracted by an image encoder like CNN, ViT, CLIP, etc. ) and text features (which may be extracted by a text encoder like BERT, DSSM, word2vec, etc. ) as inputs and outputs a score for the image list.
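As one concrete instance of the logistic regression option mentioned above, the following sketch scores each image list from three of the named metrics: the number of images, the mean image matching rate and the mean text matching rate. It is only a sketch: the per-image matching rates are assumed to be computed elsewhere (e.g., from encoder features), and the weights and bias are made-up values, not trained parameters.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ImageList:
    image_match: List[float]  # per-image matching rate with the query, in [0, 1]
    text_match: List[float]   # per-image text matching rate with the query, in [0, 1]


def score_list(
    lst: ImageList,
    weights: Tuple[float, float, float] = (0.1, 2.0, 2.0),  # hypothetical values
    bias: float = -2.0,                                      # hypothetical value
) -> float:
    """Logistic-regression style relevance score in (0, 1); 0.0 for an empty list."""
    n = len(lst.image_match)
    if n == 0:
        return 0.0
    features = (n, sum(lst.image_match) / n, sum(lst.text_match) / n)
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def select_best(lists: Sequence[ImageList]) -> int:
    """Index of the image list with the highest score."""
    return max(range(len(lists)), key=lambda i: score_list(lists[i]))


candidates = [
    ImageList([0.2] * 10, [0.3] * 10),
    ImageList([0.9] * 10, [0.8] * 10),
    ImageList([], []),
]
best_index = select_best(candidates)
```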
  • the selected image list may be arranged as a sequential image collection according to the sequential result intent 510.
  • the sequential result intent 510 indicates an intent of viewing image results in a manner of sequential image collection
  • the arranging at 560 may comprise ordering multiple images in the image list to form a sequential image collection in response to the sequential result intent.
  • a title for the sequential image collection and/or a brief text description for each image in the sequential image collection may be generated, to provide more textual information about the contents of the images.
  • FIG. 6 illustrates another exemplary process 600 of structured image collection generation for a grouped result intent according to an embodiment.
  • the process 600 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1.
  • the process 600 adopts the search result aggregation approach for automatically generating a structured image collection based on a grouped result intent 610 and a target query 620.
  • the structured image collection may be automatically generated by aggregating search results from an image search engine.
  • the grouped result intent 610 and the target query 620 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
  • the target query 620 may be broken into multiple sub-queries.
  • the operation of breaking the target query may be the same as the operation 430 in FIG. 4, and thus this operation will not be detailed herein for simplicity.
  • highest-ranked N images may be obtained through an image search engine.
  • the operation of obtaining the highest-ranked N images may be similar to the operation 530 in FIG. 5 in the case that the search result includes highest-ranked N images, except that the target query as used in 530 is replaced by a sub-query, and thus this operation will not be detailed herein for simplicity.
  • the highest-ranked N images may form a group of N images corresponding to the sub-query. In this way, multiple groups of N images corresponding to the multiple sub-queries respectively may be obtained.
  • the multiple groups of N images may be arranged as a grouped image collection according to the grouped result intent 610.
  • the arranging at 650 may comprise combining the multiple groups of N images to form a grouped image collection in response to the grouped result intent.
  • a title for the grouped image collection, a title for each group, and/or a brief text description for each image in the grouped image collection may be generated, to provide more textual information about the contents of the images.
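The core of the process 600, i.e., one top-N image search per sub-query followed by combining the groups, can be sketched as below. The `ImageSearch` callable is a hypothetical stand-in for an image search engine that returns image references already ranked, and `toy_search` merely fabricates placeholder strings for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical interface: query -> ranked list of image references.
ImageSearch = Callable[[str], List[str]]


def grouped_collection_from_search(
    sub_queries: List[str], search: ImageSearch, n: int
) -> Dict[str, List[str]]:
    """Keep the highest-ranked n images for each sub-query, one group per sub-query."""
    return {sub_query: search(sub_query)[:n] for sub_query in sub_queries}


def toy_search(query: str) -> List[str]:
    """Placeholder engine returning five 'ranked' results (illustration only)."""
    return [f"{query} #{rank}" for rank in range(1, 6)]


groups = grouped_collection_from_search(
    ["views of the Great Wall in spring", "views of the Great Wall in winter"],
    toy_search,
    n=3,
)
```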
  • FIG. 7 illustrates another exemplary process 700 of structured image collection generation for a grouped result intent according to an embodiment.
  • the process 700 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1.
  • the process 700 adopts the search result aggregation approach for automatically generating a structured image collection based on a grouped result intent 710 and a target query 720.
  • the structured image collection may be automatically generated by aggregating search results from an image search engine.
  • the grouped result intent 710 and the target query 720 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
  • highest-ranked N images for the target query 720 may be obtained.
  • the operation of obtaining the highest-ranked N images may be similar to the operation 530 in FIG. 5 in the case that the search result includes highest-ranked N images, and thus this operation will not be detailed herein for simplicity.
  • the target query 720 may be broken into multiple sub-queries.
  • the operation of breaking the target query may be the same as the operation 430 in FIG. 4, and thus this operation will not be detailed herein for simplicity.
  • the highest-ranked N images may be divided into multiple groups of images.
  • the dividing of the highest-ranked N images may be implemented by mapping the highest-ranked N images to the multiple sub-queries. For example, a distance between a feature vector for each image and a feature vector for each sub-query may be calculated. Then, an image may be mapped to the sub-query whose feature vector is nearest to that of the image. Accordingly, the highest-ranked N images may be divided into multiple groups of images, and each group of images corresponds to one of the multiple sub-queries.
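The nearest-vector mapping described above can be sketched in a few lines. Two assumptions are made here that the text leaves open: cosine distance is used as the distance measure, and image and sub-query feature vectors are assumed to live in a shared embedding space (for instance, one produced by a joint image-text encoder). The toy two-dimensional vectors are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def group_by_nearest_sub_query(
    image_vectors: Dict[str, Vector],
    sub_query_vectors: Dict[str, Vector],
) -> Dict[str, List[str]]:
    """Assign each image to the sub-query whose vector is nearest to the image's."""
    groups: Dict[str, List[str]] = {sq: [] for sq in sub_query_vectors}
    for image_id, vec in image_vectors.items():
        nearest = min(
            sub_query_vectors,
            key=lambda sq: cosine_distance(vec, sub_query_vectors[sq]),
        )
        groups[nearest].append(image_id)
    return groups


sub_queries = {"spring": (1.0, 0.0), "winter": (0.0, 1.0)}
images = {"a": (0.9, 0.1), "b": (0.2, 0.8), "c": (1.0, 0.0)}
groups = group_by_nearest_sub_query(images, sub_queries)
```

With this scheme a sub-query may end up with an empty group if no image is nearest to it, which a caller may want to handle before presenting the collection.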
  • the multiple groups of images, that correspond to the multiple sub-queries respectively, may be arranged as a grouped image collection according to the grouped result intent 710.
  • the arranging at 760 may comprise combining the multiple groups of images to form a grouped image collection in response to the grouped result intent.
  • a title for the grouped image collection, a title for each group, and/or a brief text description for each image in the grouped image collection may be generated, to provide more textual information about the contents of the images.
  • the search result aggregation approach discussed in combination with FIG. 5, FIG. 6 and FIG. 7 may be combined with the image collection automatic generation approach discussed in combination with FIG. 3 and FIG. 4 in the present disclosure.
  • the present disclosure may first attempt the search result aggregation approach, e.g., to aggregate search results from the image search engine to generate the structured image collection. If no search result is available in the search result aggregation approach, the present disclosure may then adopt the image collection automatic generation approach, e.g., to automatically generate the structured image collection.
  • Such combining of the two approaches for generating a structured image collection may help to guarantee the authenticity of images in the structured image collection.
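The combined strategy (try search result aggregation first, and fall back to automatic generation only when no search result is available) amounts to a simple fallback. In this sketch the two callables are hypothetical stand-ins for the two approaches, and an empty or missing result is taken to mean "no search result is available".

```python
from typing import Callable, List, Optional

Collection = List[str]


def build_collection(
    target_query: str,
    aggregate: Callable[[str], Optional[Collection]],
    generate: Callable[[str], Collection],
) -> Collection:
    """Prefer aggregated search results; generate images only as a fallback."""
    found = aggregate(target_query)
    if found:  # a non-empty aggregated result is used as-is
        return found
    return generate(target_query)


# Toy stand-ins (assumptions for illustration only).
def no_results(query: str) -> Optional[Collection]:
    return None


def some_results(query: str) -> Optional[Collection]:
    return [f"searched image for {query}"]


def toy_generate(query: str) -> Collection:
    return [f"generated image for {query}"]


from_search = build_collection("top 10 cars", some_results, toy_generate)
from_model = build_collection("top 10 cars", no_results, toy_generate)
```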
  • the providing of a structured image collection as discussed in the embodiments of the present disclosure may be implemented in a real-time or online approach.
  • the operations for generating a structured image collection that are discussed in combination with FIG. 3 to FIG. 7 may be performed in response to user data in real-time, and then the generated structured image collection may be transmitted by an image service to a device client for presenting on a user interface, such that the structured image collection may be viewed by a user.
  • the providing of a structured image collection as discussed in the embodiments of the present disclosure may be implemented in a pre-storing approach.
  • structured image collections generated according to the operations that are discussed in combination with FIG. 3 to FIG. 7 may be pre-stored together with corresponding search indexes in an image library for future use, such that a structured image collection may be quickly retrieved from the image library and presented to a user if the structured image collection matches with a target query and a user intent derived from user data of the user.
  • the present disclosure may be implemented for preparing or enriching the image library.
  • the pre-storing approach and the real-time approach may be combined in the present disclosure. Since the pre-storing approach may involve less delay as compared to the real-time approach, the combining of the two approaches may help achieve a quick response for the current user data. For example, the present disclosure may first try to retrieve a structured image collection from the image library. If no matched structured image collection is found in the image library, the present disclosure may further perform the real-time approach.
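The combination of the pre-storing approach and the real-time approach resembles a lookup with a fallback. The sketch below is a simplification under stated assumptions: the search index is modeled as an exact-match key built from a normalized target query and the user intent, whereas an actual image library may use any indexing and matching scheme. The `ImageLibrary` class and `provide_collection` function are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

IndexKey = Tuple[str, str]  # (normalized target query, user intent)


class ImageLibrary:
    """In-memory stand-in for an image library with search indexes."""

    def __init__(self) -> None:
        self._store: Dict[IndexKey, List[str]] = {}

    @staticmethod
    def make_key(target_query: str, intent: str) -> IndexKey:
        # Lowercase and collapse whitespace so trivially different queries match.
        return (" ".join(target_query.lower().split()), intent)

    def put(self, target_query: str, intent: str, collection: List[str]) -> None:
        self._store[self.make_key(target_query, intent)] = collection

    def get(self, target_query: str, intent: str) -> Optional[List[str]]:
        return self._store.get(self.make_key(target_query, intent))


def provide_collection(
    library: ImageLibrary,
    target_query: str,
    intent: str,
    generate_realtime: Callable[[str, str], List[str]],
) -> List[str]:
    """Try the library first; otherwise generate in real time and pre-store the result."""
    stored = library.get(target_query, intent)
    if stored is not None:
        return stored
    collection = generate_realtime(target_query, intent)
    library.put(target_query, intent, collection)  # enrich the library for future use
    return collection


calls: List[str] = []


def toy_realtime(query: str, intent: str) -> List[str]:
    calls.append(query)
    return [f"{intent} image for {query}"]


library = ImageLibrary()
first = provide_collection(library, "Top 10  cars", "sequential", toy_realtime)
second = provide_collection(library, "top 10 cars", "sequential", toy_realtime)
```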
  • FIG. 8A, FIG. 8B, FIG. 9, FIG. 10A, FIG. 10B and FIG. 11 relate to exemplary presentations of a structured image collection on a user interface.
  • these figures illustrate examples about how the structured image collection may be presented at the frontend of the image search service or the image recommendation service.
  • FIG. 8A and FIG. 8B illustrate an exemplary presentation of a structured image collection for a sequential result intent according to an embodiment, respectively.
  • a sequential image collection 810 may be presented on the upper part of the user interface.
  • the sequential image collection 810 may be provided for an exemplary target query “top 10 best-selling cars in 2023” .
  • the sequential image collection 810 may be shown as a thumbnail image 820 which shows, e.g., one of the top 10 best-selling cars.
  • the thumbnail image 820 may correspond to one specific image in the sequential image collection 810.
  • the sequential image collection 810 may also include a title 830 of the sequential image collection.
  • the image collection title may be generated through a title and description generating model as discussed above in combination with FIG. 3.
  • a label, such as a cross-shaped label 840, may be displayed on the thumbnail image 820 to provide an intuitive indication that 810 is an image collection, rather than a separate image as shown in the conventional mechanism.
  • one or more separate images 850-1, 850-2, 850-3 and 850-4 as well as a title of each image may also be shown in the exemplary presentation 800A.
  • the separate images 850-1, 850-2, 850-3 and 850-4 and respective titles may be obtained by any conventional mechanism for providing separate images for the target query.
  • the exemplary presentation 800A may simultaneously show more than one sequential image collection.
  • the number of the shown structured image collections may be specified, e.g., as a fixed number, or according to certain user setting, and so on.
  • an exemplary presentation 800B may be presented on the user interface, to display more specific information in the sequential image collection 810, as shown in FIG. 8B.
  • the exemplary presentation 800B may include a plurality of ordered images, such as the images 820-1 to 820-6 as shown in FIG. 8B.
  • a progress bar control may be displayed on the user interface for enabling a user to drag it to view all the images in the sequential image collection 810.
  • the order of each image may also be presented through a label on the ordered images.
  • the image 820-1 may show the car ranking first according to selling data in 2023
  • the image 820-2 may show the car ranking second according to selling data in 2023, and so on.
  • a main image 860 may be displayed on the upper-left part of the exemplary presentation 800B.
  • the initially-presented main image 860 may be the first-ordered image in the sequential image collection 810, and in response to a further user action, the main image 860 may be switched to any one of the images 820-1 to 820-6.
  • the main image 860 may be switched to the image 820-2.
  • Blocks 870 and 880 may also be presented on the exemplary presentation 800B, to show the title of the sequential image collection and a text description for the current main image 860.
  • FIG. 9 illustrates another exemplary presentation of a structured image collection for a sequential result intent according to an embodiment.
  • the exemplary presentation 900 shows a sequential image collection 910. Unlike the exemplary presentation 800A, multiple thumbnail images 920-1 to 920-6, each corresponding to an ordered image in the sequential image collection 910, may be presented in a row on the upper part of the user interface. Similar to FIG. 8A, the sequential image collection 910 may also include a title 930 of the sequential image collection. Besides, also similar to FIG. 8A, optionally, one or more separate images 940-1, 940-2, 940-3 and 940-4 as well as a title of each image may also be shown in the exemplary presentation 900.
  • a new exemplary presentation (not shown) which is the same as the exemplary presentation 800B may be displayed, to show more specific information in the sequential image collection 910. It should be understood that if the clicking operation is performed, in the presentation 900, on a thumbnail image other than the thumbnail image 920-1, e.g., the thumbnail image 920-2, the displayed main image in the presentation 800B would be the ordered image corresponding to the clicked thumbnail image 920-2.
  • FIG. 10A and FIG. 10B illustrate an exemplary presentation of a structured image collection for a grouped result intent according to an embodiment, respectively.
  • a grouped image collection 1010 may be presented on the upper part of the user interface.
  • the grouped image collection 1010 may be provided for an exemplary target query “views of the Great Wall in four seasons” .
  • the grouped image collection 1010 may include four groups 1020A, 1020B, 1020C and 1020D.
  • the group 1020A may include images for views of the Great Wall in spring
  • the group 1020B may include images for views of the Great Wall in summer
  • the group 1020C may include images for views of the Great Wall in autumn
  • the group 1020D may include images for views of the Great Wall in winter.
  • the structure of the groups 1020B, 1020C and 1020D is the same as 1020A, and reference numbers in the groups 1020B, 1020C and 1020D are omitted for simplification.
  • a thumbnail image 1030 corresponding to an image for views of the Great Wall in spring, a title 1040 of the group, and a label 1050 indicating that the group 1020A is an image collection may be displayed.
  • one or more separate images 1060-1, 1060-2, 1060-3 and 1060-4 as well as a title of each image may also be shown in the exemplary presentation 1000A.
  • the exemplary presentation 1000A may simultaneously show more than one grouped image collection.
  • an exemplary presentation 1000B may be presented on the user interface, as shown in FIG. 10B.
  • the user action may be performed, in the presentation 1000A, on the group 1020A.
  • the exemplary presentation 1000B may include a plurality of images 1030-1 to 1030-6 in the group 1020A, as shown in FIG. 10B.
  • a main image 1070 may be displayed on the upper-left part of the exemplary presentation 1000B.
  • the initially-presented main image 1070 may be the first image 1030-1 in the group 1020A, and in response to a further user action, the main image 1070 may be switched to any one of the images 1030-1 to 1030-6.
  • Blocks 1080 and 1090 may also be presented on the exemplary presentation 1000B, to show the title of the group 1020A and a text description for the current main image 1070.
  • FIG. 11 illustrates another exemplary presentation of a structured image collection for a grouped result intent according to an embodiment.
  • the exemplary presentation 1100 shows a grouped image collection 1110, which includes four groups 1120A, 1120B, 1120C and 1120D.
  • the group 1120A may include images for views of the Great Wall in spring
  • the group 1120B may include images for views of the Great Wall in summer
  • the group 1120C may include images for views of the Great Wall in autumn
  • the group 1120D may include images for views of the Great Wall in winter.
  • the structure of the groups 1120B, 1120C and 1120D is the same as the group 1120A, and reference numbers in the groups 1120B, 1120C and 1120D are omitted for simplification.
  • thumbnail images each corresponding to an image in the group, may be presented in a row on the upper part of the user interface.
  • in the group 1120A, multiple thumbnail images 1130-1 to 1130-5 are presented in a row.
  • the group 1120A may also include a title 1140 of the group.
  • one or more separate images 1150-1, 1150-2, 1150-3 and 1150-4 as well as a title of each image may also be shown in the exemplary presentation 1100.
  • a new exemplary presentation (not shown) which is the same as the exemplary presentation 1000B may be displayed, to show more specific information in the group 1120A.
  • the displayed main image may be the first image 1130-1 in the group 1120A.
  • the elements in the presentations 800A, 800B, 900, 1000A, 1000B and 1100 are exemplary, and are not intended to limit the structured image collection to being presented in any specific manner. Rather, the structured image collection may be presented in any suitable manner.
  • the text description for each image may be directly presented on top of the image, instead of being presented in a text block separated from the image, or may be presented in any other manner.
  • FIG. 12 illustrates a flowchart of an exemplary method 1200 for providing a structured image collection according to an embodiment.
  • user data may be obtained.
  • a user intent may be determined based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent.
  • a target query may be determined based on the user data.
  • the structured image collection may be generated based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
  • the sequential result intent may indicate an intent of viewing image results in a manner of sequential image collection
  • the grouped result intent may indicate an intent of viewing image results in a manner of grouped image collection
  • the structured image collection in response to determining the user intent as the sequential result intent, may be generated as a sequential image collection in which the plurality of images are ordered by image contents.
  • the structured image collection in response to determining the user intent as the grouped result intent, may be generated as a grouped image collection in which the plurality of images are multiple groups of images that are divided by image contents.
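The dispatch on user intent in the method 1200 can be sketched as follows. The keyword-based `toy_intent_classifier` is only a stand-in for the intent classifier and is an assumption for illustration; likewise, the two builder callables are hypothetical placeholders for any of the sequential or grouped generation processes. For simplicity the user data is taken to be a search query that is used directly as the target query.

```python
from enum import Enum
from typing import Callable, List, Tuple, Union


class UserIntent(Enum):
    SEQUENTIAL = "sequential result intent"
    GROUPED = "grouped result intent"


def toy_intent_classifier(user_query: str) -> UserIntent:
    """Keyword stand-in for the intent classifier (illustration only)."""
    text = user_query.lower()
    if any(cue in text for cue in ("top ", "best-selling", "ranked")):
        return UserIntent.SEQUENTIAL
    return UserIntent.GROUPED


Sequential = List[str]      # images ordered by image contents
Grouped = List[List[str]]   # groups of images divided by image contents


def provide_structured_collection(
    user_query: str,
    classify: Callable[[str], UserIntent],
    build_sequential: Callable[[str], Sequential],
    build_grouped: Callable[[str], Grouped],
) -> Tuple[UserIntent, Union[Sequential, Grouped]]:
    """Determine the intent and target query, then build the matching collection."""
    intent = classify(user_query)
    target_query = user_query  # the search query serves as the target query here
    if intent is UserIntent.SEQUENTIAL:
        return intent, build_sequential(target_query)
    return intent, build_grouped(target_query)


seq_intent, seq_collection = provide_structured_collection(
    "top 10 best-selling cars in 2023",
    toy_intent_classifier,
    lambda q: ["car 1", "car 2"],
    lambda q: [["a"], ["b"]],
)
grp_intent, grp_collection = provide_structured_collection(
    "views of the Great Wall in four seasons",
    toy_intent_classifier,
    lambda q: ["car 1", "car 2"],
    lambda q: [["spring"], ["summer"], ["autumn"], ["winter"]],
)
```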
  • the user data may comprise a search query and the determining a target query may comprise determining the search query as the target query.
  • the user data may comprise a user profile and the determining a target query may comprise determining a topic of interest extracted from the user profile as the target query, wherein the user profile may comprise at least one of browsing history of a user, search history of the user, topics specified by the user, and common topics from other users.
  • the generating the structured image collection may comprise: generating a list of image descriptions based on the target query through an image description generating model; generating multiple images based on the list of image descriptions through a text-to-image generating model, respectively; and arranging the multiple images, that correspond to the list of image descriptions respectively, as a sequential image collection according to the sequential result intent.
  • the generating the structured image collection may comprise: breaking the target query into multiple sub-queries through a query breaking model; for each of the multiple sub-queries, generating a list of image descriptions based on the sub-query through an image description generating model, and generating a group of images based on the list of image descriptions through a text-to-image generating model, respectively; and arranging multiple groups of images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
  • the list of image descriptions may define the same attributes of image entities and/or the same image style.
  • the generating the structured image collection may comprise: obtaining a search result for the target query through an image search engine; generating multiple image lists based on the search result; selecting an image list, among the multiple image lists, that is most relevant to the target query; and arranging the image list as a sequential image collection according to the sequential result intent.
  • the search result may comprise highest-ranked N images
  • the generating multiple image lists may comprise, for each of the highest-ranked N images, obtaining a webpage from which the image originates and extracting original ranked images from the webpage to form an image list.
  • the search result may comprise highest-ranked N webpages
  • the generating multiple image lists may comprise, for each of the highest-ranked N webpages, extracting original ranked images from the webpage to form an image list.
  • the selecting an image list may comprise: for each of the multiple image lists, generating, through a list scoring model, a score for the image list based on a plurality of images in the image list and texts associated with the plurality of images; and selecting the image list with the highest score based on multiple scores corresponding to the multiple image lists respectively.
  • the generating the structured image collection may comprise: breaking the target query into multiple sub-queries through a query breaking model; for each of the multiple sub-queries, obtaining highest-ranked N images through an image search engine; and arranging multiple groups of N images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
  • the generating the structured image collection may comprise: for the target query, obtaining highest-ranked N images through an image search engine; breaking the target query into multiple sub-queries through a query breaking model; dividing the highest-ranked N images into multiple groups of images by mapping the highest-ranked N images to the multiple sub-queries; and arranging the multiple groups of images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
  • the user data may comprise current user data
  • the structured image collection is to be presented on a user interface, and/or pre-stored, together with a corresponding search index, in an image library.
  • the user data may comprise history user data
  • the structured image collection and a corresponding search index may be pre-stored in an image library.
  • the method 1200 may further comprise: obtaining current user data; determining a current user intent based on the current user data through the intent classifier; determining a current target query based on the current user data; searching for a search index corresponding to the current user data from a plurality of search indexes in the image library based on the current target query and the current user intent; and retrieving a structured image collection corresponding to the search index from the image library.
  • the method 1200 may further comprise any steps/processes for providing a structured image collection according to the embodiments of the present disclosure as mentioned above.
  • FIG. 13 illustrates an exemplary apparatus 1300 for providing a structured image collection according to an embodiment.
  • the apparatus 1300 may comprise: a user data obtaining module 1310, for obtaining user data; a user intent determining module 1320, for determining a user intent based on the user data through an intent classifier; a target query determining module 1330, for determining a target query based on the user data; and a structured image collection generating module 1340, for generating the structured image collection based on the target query and the user intent.
  • the apparatus 1300 may also comprise any other modules configured for performing any steps and operations of the methods for providing a structured image collection according to the embodiments of the present disclosure as mentioned above.
  • FIG. 14 illustrates an exemplary apparatus 1400 for providing a structured image collection according to an embodiment.
  • the apparatus 1400 may comprise at least one processor 1410 and a memory 1420 storing computer-executable instructions.
  • the at least one processor 1410 may: obtain user data; determine a user intent based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent; determine a target query based on the user data; and generate the structured image collection based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
  • the at least one processor 1410 may be further configured for performing any operations of the methods for providing a structured image collection according to the embodiments of the present disclosure as mentioned above.
  • the embodiments of the present disclosure propose a computer program product for providing a structured image collection.
  • the computer program product may comprise a computer program that is executed by at least one processor for: obtaining user data; determining a user intent based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent; determining a target query based on the user data; and generating the structured image collection based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
  • the computer program in the computer program product may be further executed by the at least one processor for performing any other operations of the methods for providing a structured image collection according to the embodiments of the present disclosure as mentioned above.
  • the embodiments of the present disclosure may be embodied in a non-transitory computer-readable medium.
  • the non-transitory computer readable medium may include instructions that, when executed, cause one or more processors to perform any steps/processes of the method for providing a structured image collection according to embodiments of the disclosure described above.
  • modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
  • processors have been described in connection with various apparatuses and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether such processors are implemented as hardware or software will depend upon the particular application and overall design constraints imposed on the system.
  • a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with a micro-processor, micro-controller, digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuits, and other suitable processing components configured to perform the various functions described in the present disclosure.
  • the functionality of a processor, any portion of a processor, or any combination of processors presented in the present disclosure
  • a computer-readable medium may include, by way of example, memory such as a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk, a smart card, a flash memory device, random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, or a removable disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides methods, apparatuses and non-transitory computer-readable medium for providing a structured image collection. User data may be obtained. A user intent may be determined based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent. A target query may be determined based on the user data. The structured image collection may be generated based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.

Description

PROVIDING A STRUCTURED IMAGE COLLECTION
BACKGROUND
Nowadays, with the development of computer techniques, rich image data is available on the internet, and people may browse images of interest through various image services, such as an image search service, an image recommendation service, etc. For example, the image search service may receive a search query from a user, and provide a search result including images relevant to the search query. The image recommendation service may obtain user data of a user, such as browsing history, preference settings, etc., and accordingly push a recommendation result including images that may interest the user.
SUMMARY
This Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Embodiments of the present disclosure propose methods, apparatuses and non-transitory computer-readable medium for providing a structured image collection. User data may be obtained. A user intent may be determined based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent. A target query may be determined based on the user data. The structured image collection may be generated based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
It should be noted that the above one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are only indicative of the various ways in which the principles of various aspects may be implemented, and this disclosure is intended to include all such aspects and their equivalents.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosed aspects will hereinafter be described in conjunction with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
FIG. 1 illustrates an exemplary process of providing a structured image collection according to an embodiment.
FIG. 2 illustrates an exemplary process of user intent determination according to an embodiment.
FIG. 3 illustrates an exemplary process of structured image collection generation for a sequential result intent according to an embodiment.
FIG. 4 illustrates an exemplary process of structured image collection generation for a grouped result intent according to an embodiment.
FIG. 5 illustrates another exemplary process of structured image collection generation for a sequential result intent according to an embodiment.
FIG. 6 illustrates another exemplary process of structured image collection generation for a grouped result intent according to an embodiment.
FIG. 7 illustrates another exemplary process of structured image collection generation for a grouped result intent according to an embodiment.
FIG. 8A and FIG. 8B illustrate an exemplary presentation of a structured image collection for a sequential result intent according to an embodiment, respectively.
FIG. 9 illustrates another exemplary presentation of a structured image collection for a sequential result intent according to an embodiment.
FIG. 10A and FIG. 10B illustrate an exemplary presentation of a structured image collection for a grouped result intent according to an embodiment, respectively.
FIG. 11 illustrates another exemplary presentation of a structured image collection for a grouped result intent according to an embodiment.
FIG. 12 illustrates a flowchart of an exemplary method for providing a structured image collection according to an embodiment.
FIG. 13 illustrates an exemplary apparatus for providing a structured image collection according to an embodiment.
FIG. 14 illustrates an exemplary apparatus for providing a structured image collection according to an embodiment.
DETAILED DESCRIPTION
The present disclosure will now be discussed with reference to several exemplary implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
Image services may provide image-related results to users. For example, the image search service may provide a search result, the image recommendation service may push a recommendation result, etc. Images in either the search result or the recommendation result are separately arranged instead of being structurally organized. Taking the scenario of the image search service as an example, for an exemplary search query “top 10 best-selling cars in 2023”, which indicates that a user intends to browse images of the 10 best-selling cars in 2023, a search result including multiple images each showing a best-selling car in 2023 may be generated. The images in the search result are arranged separately from each other, instead of being structurally organized as, e.g., an ordered image list which explicitly and directly shows cars ranked from 1 to 10 according to their selling data. In this case, the user has to manually identify and extract useful images from the search result to obtain a series of images ranked or ordered from 1 to 10. A similar situation also exists in the scenario of the image recommendation service.
Embodiments of the present disclosure propose to provide a structured image collection in image services. The present disclosure may effectively identify an intent of viewing a structured image collection, e.g., a user may intend to browse a series of images that are structurally related to each other, and thus provide the structured image collection that satisfies the intent. Accordingly, the performance of image services, such as, the image search service, the image recommendation service, etc., may be improved so as to better satisfy user requirements and improve user experience.
A structured image collection, which may also be termed a structured image gallery, includes a plurality of images that are structurally related. Herein, images that are “structurally related” are images that are not arranged separately from each other, but are arranged or organized in a predetermined structural relationship with each other. For example, a plurality of images in a structured image collection may be sequentially related and thus constitute a sequential image collection, may be organized into multiple groups and thus constitute a grouped image collection, etc. Moreover, in addition to images, a structured image collection may further include a title for the collection as well as a brief text description for each image, so as to provide more text information about the contents of the images.
In an aspect, the present disclosure may identify whether a user has an intent to view a structured image collection in a result returned by an image service, such as in an image search result or an image recommendation result. The identifying of a user intent may be performed based on user data. For example, for the image search service, the user data may include a search query from a user. For example, for the image recommendation service, the user data may include a user profile. In an implementation, the user intent may be determined based on the user data through an intent classifier. The user intent may include, e.g., a sequential result intent which indicates an intent of viewing image results in a manner of sequential image collection, a grouped result intent which indicates an intent of viewing image results in a manner of grouped image collection, etc.
In an aspect, the present disclosure may determine a target query based on the user data. The target query refers to a brief summary of user-interested image contents. For example, for the image search service, the user data includes a search query, and the search query may be determined as the target query, since the search query usually briefly reflects what image contents the user is interested in. For example, for the image recommendation service, the user data includes a user profile, and an interested topic that briefly reflects what image contents the user is interested in may be extracted from the user profile, and may be determined as the target query.
In an aspect, the present disclosure may automatically generate a structured image collection based on the target query and the user intent. In an implementation, the structured image collection may be generated based on the target query and the user intent through image collection automatic generation. For example, the image collection automatic generation may adopt an AI model. In an implementation, the structured image collection may be generated based on the target query and the user intent through search result aggregation. For example, the search result aggregation may utilize an image search engine.
In an aspect, in order to guarantee the authenticity of images in a structured image collection, the present disclosure may combine the search result aggregation approach with the image collection automatic generation approach.
In an aspect, the embodiments of the present disclosure for providing a structured image collection may be implemented in a real-time or online approach, such that a structured image collection may be generated in response to user data in real-time.
In an aspect, the embodiments of the present disclosure for providing a structured image collection may be implemented in a pre-storing approach, as a process for preparing or enriching an image library.
In an aspect, in order to achieve a quick response for the current user data, the present disclosure may combine the pre-storing approach and the real-time approach.
It should be understood that the embodiments of the present disclosure may be applied for various image services, including but not limited to the image search service, the image recommendation service, etc. Moreover, the embodiments of the present disclosure may be implemented at, e.g., a server or cloud side.
FIG. 1 illustrates an exemplary process 100 of providing a structured image collection according to an embodiment. The process 100 may be performed for various image services, e.g., the image search service, the image recommendation service, etc.
At 110, user data obtaining is performed for obtaining user data. In an implementation, for the image search service, the user data may include a search query from a user, such as “top 10 best-selling cars in 2023”, “how to make a banana pie”, “views of the Great Wall in four seasons”, and so on. In an implementation, for the image recommendation service, the user data may include a user profile. The user profile may include various types of user information, e.g., at least one of browsing history of a user, search history of the user, topics specified by the user, common topics from other users, etc.
At 120, user intent determination is performed for determining a user intent based on the user data. The user intent may also be termed an image result intent, a result type intent, etc. The user intent may indicate whether the user intends to view an image-related result, such as an image search result, an image recommendation result, etc., in a manner of structured image collection, such as a sequential image collection, a grouped image collection, etc. In an implementation, the user intent may be determined through an intent classifier. The intent classifier may take the user data as an input and output a classification result indicating the user intent, e.g., a sequential result intent, a grouped result intent, etc. In other words, the intent classifier may determine at least an intent of viewing image results in a manner of sequential image collection, an intent of viewing image results in a manner of grouped image collection, etc. It should be understood that the embodiments of the present disclosure are not limited to adopting the intent classifier for determining the user intent, but may also adopt any other approaches for determining the user intent.
At 130, target query determination is performed for determining a target query based on the user data. The target query refers to a brief summary of user-interested image contents.
In an implementation, for the image search service, the user data includes a search query, and the search query may be determined as the target query. This is because the search query usually briefly reflects what image contents the user is interested in. As an example, a search query such as “top 10 best-selling cars in 2023”, “how to make a banana pie”, “views of the Great Wall in four seasons”, etc. may be determined as the target query directly.
In an implementation, for the image recommendation service, the user data includes a user profile, and an interested topic extracted from the user profile may be determined as the target query. This is because the extracted interested topic usually briefly reflects what image contents the user is interested in. The present disclosure may adopt any approaches for extracting an interested topic from the user profile. For example, if the user profile indicates that the user has browsed or searched webpages or images associated with, e.g., selling data of cars, etc., an interested topic “top 10 best-selling cars in 2023” may be extracted from the user profile. For example, if the user profile indicates that the user followed a topic or specified a topic, such as “how to make a banana pie”, this topic may be extracted from the user profile as an interested topic for the user. For example, if the user profile comprises a common topic among some other users associated with the user, e.g., family, friends, contacts, etc., such as when at least two other users follow the same topic, this common topic may be extracted from the user profile as an interested topic for the user. In an implementation, various information from the user profile, e.g., browsing history, search history, specified topics, common topics, and other information not detailed above, may be taken into account in combination, so as to extract an interested topic for the user.
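The followed-topic and common-topic rules described above may be sketched as follows. This is a minimal illustration only: the profile layout (`followed_topics`, `contacts_topics`) and the function name are assumptions made for the example and are not specified in the present disclosure; the threshold of two other users is taken from the example above.

```python
from collections import Counter

def extract_interested_topics(profile, min_common_users=2):
    """Return candidate interested topics from a (hypothetical) user profile dict."""
    # Topics the user explicitly followed or specified are taken as-is.
    topics = list(profile.get("followed_topics", []))
    # Count how many associated users (family, friends, contacts) follow each topic.
    counts = Counter()
    for other_topics in profile.get("contacts_topics", {}).values():
        counts.update(set(other_topics))  # set(): count each user at most once
    # A topic followed by at least `min_common_users` other users is a common topic.
    for topic, n in sorted(counts.items()):
        if n >= min_common_users and topic not in topics:
            topics.append(topic)
    return topics

# Hypothetical profile used only for illustration.
profile = {
    "followed_topics": ["how to make a banana pie"],
    "contacts_topics": {
        "friend_a": ["views of the Great Wall in four seasons", "cats"],
        "friend_b": ["views of the Great Wall in four seasons"],
    },
}
topics = extract_interested_topics(profile)
```

Any of the returned topics could then serve as the target query; how browsing and search history are combined with these signals is left open here, as in the description.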
At 140, structured image collection generation is performed for generating a structured image collection based on the target query and the user intent. The structured image collection includes a plurality of images that are structurally related. The present disclosure may adopt the search result aggregation approach and/or the image collection automatic generation approach for generating the structured image collection, which will be discussed in detail in connection with FIG. 3 to FIG. 7 later.
In an aspect, the process 100 may be implemented in a real-time or online approach, and thus applied for a purpose of providing a structured image collection in real-time. In this case, the structured image collection generated according to the process 100 may be further transmitted to a device client, such that the structured image collection may be presented, as an image search result or an image recommendation result, on a user interface of the image search service or the image recommendation service.
In an aspect, the process 100 may be implemented in a pre-storing approach, and thus applied for a purpose of pre-storing structured image collections in an image library, such as preparing or enriching the image library. In this case, the user data obtaining at 110 may include obtaining history user data, e.g., a history search query or a history user profile. The history user data refers to user data that is obtained when preparing the image library. In other words, the history user data may be obtained at an earlier time period than the current user data, in response to which the generated structured image collection may be provided directly. The history search query may be obtained by collecting any search query that a certain user once provided. In an implementation, the history search query may also be collected through crawling webpages on the internet. For example, any web crawling tools may be employed to analyze the webpages and extract potential key words, statements, etc. therefrom to determine a search query as the history search query. The history user profile may be obtained by collecting the user profile that is maintained when preparing the image library. The history user data is not limited to targeting a certain user, but may target multiple users, such that the image library may include enough pre-stored structured image collections. The user intent determination at 120 may include determining a history user intent, and the target query determination at 130 may include determining a history target query. The structured image collection generation at 140 may include generating a structured image collection based on the history target query and the history user intent. Then, the generated structured image collection may be pre-stored together with a corresponding search index in the image library.
The image library may then be adopted by an image service for returning a structured image collection as an image-related result. For example, the structured image collection may be quickly retrieved from the image library and presented to a user if the structured image collection matches a target query and a user intent derived from user data of the user. In this case, the user data obtaining at 110 may include obtaining current user data of the current user, e.g., a current search query or a current user profile. The user intent determination at 120 may include determining a current user intent, and the target query determination at 130 may include determining a current target query. The structured image collection generation at 140 may include retrieving the structured image collection from the image library based on the current user intent and the current target query. For example, a search index corresponding to the current user data may be selected from a plurality of search indexes in the image library based on the current user intent and the current target query. Then, a pre-stored structured image collection corresponding to the search index may be retrieved from the image library. The retrieved structured image collection may be presented on a user interface by the image search service or the image recommendation service, such that the retrieved structured image collection may be viewed by the user.
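The pre-storing and retrieval steps may be illustrated with a toy in-memory library. This is a sketch under stated assumptions: the class `ImageLibrary`, its methods, and the form of the search index (the user intent paired with a case- and whitespace-normalized target query) are hypothetical choices for the example; the disclosure does not specify how a search index is constructed or matched.

```python
class ImageLibrary:
    """Toy in-memory image library keyed by a search index (hypothetical layout)."""

    def __init__(self):
        self._collections = {}

    @staticmethod
    def make_index(user_intent, target_query):
        # Assumed index form: intent plus a whitespace/case-normalized query.
        return (user_intent, " ".join(target_query.lower().split()))

    def store(self, user_intent, target_query, collection):
        # Pre-storing step: keep the collection together with its search index.
        self._collections[self.make_index(user_intent, target_query)] = collection

    def retrieve(self, user_intent, target_query):
        # Retrieval step: select the index matching the current intent and query.
        return self._collections.get(self.make_index(user_intent, target_query))


library = ImageLibrary()
# File names below are placeholders, not real data.
library.store("grouped", "views of the Great Wall in four seasons",
              {"spring": ["s1.jpg"], "summer": ["u1.jpg"]})
hit = library.retrieve("grouped", "Views of the  Great Wall in four seasons")
miss = library.retrieve("sequential", "views of the Great Wall in four seasons")
```

Here `hit` is the stored collection and `miss` is `None`; on a miss, a service could fall back to generating the collection in real time and adding it to the library.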
It should be understood that the structured image collection generated in the real-time approach may also be added into the image library so as to further enrich the image library. Moreover, the steps in the process 100 are not limited to any specific order, e.g., although the user intent determination at 120 is shown as performed before the target query determination at 130, the order of these two steps may also be interchanged.
FIG. 2 illustrates an exemplary process 200 of user intent determination according to an embodiment. The process 200 may be performed for determining a user intent by an intent classifier 220.
The intent classifier 220 takes user data 210 as an input, wherein the user data 210 may be obtained by, e.g., the user data obtaining at 110 in FIG. 1. The intent classifier 220 outputs a classification result indicating a user intent corresponding to the user data 210. For example, the intent classifier 220 may classify the user data 210 to a specific class of intent among various possible classes of intent including, e.g., a sequential result intent 230, a grouped result intent 240, a non-structured result intent 250, etc.
The sequential result intent 230 may indicate an intent of viewing image results in a manner of sequential image collection. A sequential image collection may comprise a plurality of images ordered by image contents. For example, the plurality of images may be ordered in view of at least one of the following exemplary aspects: a time point when a scene shown by an image occurs, a geographical location to which a scene shown by an image corresponds, attributes of one or more entities in an image such as selling data, price, size, salary, etc., a performing or operational order such as when an image relates to a step of a process, etc., a logical order such as when an image relates to a story or narrative, etc., an evolutionary order such as when an image relates to a product, species, concept, etc., or a cause and effect order, etc. For example, if the user data includes a search query “top 10 best-selling cars in 2023” indicating that the user intends to view a search result illustrating cars ranked from 1 to 10 according to their selling data, the user intent may be determined as a sequential result intent through the intent classifier 220. Similarly, if the user data includes a user profile indicating that the user is interested in “how to make a banana pie” and thus the user data indicates that the user intends to view a recommendation result illustrating ordered steps for making a banana pie, the user intent may be determined as a sequential result intent through the intent classifier 220.
The grouped result intent 240 may indicate an intent of viewing image results in a manner of grouped image collection. A grouped image collection may comprise multiple groups of images that are divided by image contents. For example, the multiple groups of images may be divided in view of at least one of the following exemplary aspects: a time point when a scene shown by an image occurs, a geographical location to which a scene shown by an image corresponds, attributes of one or more entities in an image such as types or categories, color, size, technique or medium used, popularity or rating, etc., attributes of an image itself such as size, resolution, aspect ratio, style, color, author, format, etc., tags attached to an image, topics involved in an image, etc. For example, if the user data includes a search query “views of the Great Wall in four seasons”, or similarly the user data includes a user profile indicating that the user is interested in views of the Great Wall in four seasons, and thus the user data indicates that the user intends to view a search result or a recommendation result including four groups of images corresponding to the four seasons of spring, summer, autumn and winter, the user intent may be determined as a grouped result intent through the intent classifier 220.
The non-structured result intent 250 may indicate an intent of viewing image results in a manner other than the manner of sequential image collection or the manner of grouped image collection. For example, the non-structured result intent 250 may indicate an intent of viewing separate images that are not structurally organized as done with the existing image searching and recommending mechanisms. It should be understood that if the user data 210 is determined as having the non-structured result intent 250, any existing image searching and recommending mechanisms may be employed to provide an image result including separate images.
In an implementation, the intent classifier 220 may be a multi-class classification model for conducting a multi-class classification task to output one of multiple intent classes (e.g., a sequential result intent, a grouped result intent, a non-structured result intent, etc.) based on the input user data. For example, the intent classifier 220 may be implemented with a neural network (NN) model (e.g., Transformer, BERT, etc.), a support vector machine (SVM) model, a decision tree (DT) model, etc., that may be used to perform the multi-class classification task.
Training data for the intent classifier 220 may be in a format of a data pair consisting of user data and a corresponding intent class. For example, for the image search service, a training data pair may be in a format of <search query, intent class>, e.g., <search query: “top 10 best-selling cars in 2023”, intent class: “sequential result intent”>, <search query: “views of the Great Wall in four seasons”, intent class: “grouped result intent”>, and so on. For example, for the image recommendation service, a training data pair may be in a format of <user profile, intent class>, which is similar to that of the image search service except that the search query part in the data pair is replaced by information from the user profile.
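The <search query, intent class> pair format and the multi-class classification task may be illustrated with a deliberately simple stand-in. The sketch below is a 1-nearest-neighbour classifier over word-set (Jaccard) similarity; it is not one of the NN, SVM or DT models named above, and it is used here only because it fits in a few lines. The function names and the fourth training pair are assumptions added for illustration.

```python
# Training pairs in the <search query, intent class> format.
# The last pair is a made-up example of a non-structured result intent.
TRAINING_PAIRS = [
    ("top 10 best-selling cars in 2023", "sequential result intent"),
    ("how to make a banana pie", "sequential result intent"),
    ("views of the Great Wall in four seasons", "grouped result intent"),
    ("cute cat pictures", "non-structured result intent"),
]

def tokens(text):
    # Lower-cased set of whitespace-separated words.
    return set(text.lower().split())

def classify_intent(query, pairs=TRAINING_PAIRS):
    """Return the intent class of the training query most similar to `query`."""
    q = tokens(query)

    def jaccard(pair):
        t = tokens(pair[0])
        return len(q & t) / len(q | t)

    # 1-nearest-neighbour: pick the label of the best-matching training query.
    return max(pairs, key=jaccard)[1]

intent_a = classify_intent("how to make an apple pie")
intent_b = classify_intent("views of the Great Wall in winter")
```

With these four pairs, `intent_a` resolves to the sequential result intent and `intent_b` to the grouped result intent. A practical classifier would be trained on far more pairs and would generalize beyond word overlap.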
The exemplary process 200 may be performed for accurately identifying the user intent, and thus facilitates providing a structured image collection that satisfies the user intent.
FIG. 3 illustrates an exemplary process 300 of structured image collection generation for a sequential result intent according to an embodiment. The process 300 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1. The process 300 adopts the image collection automatic generation approach for automatically generating a structured image collection based on a sequential result intent 310 and a target query 320. For example, the structured image collection may be automatically generated by means of an AI model.
The sequential result intent 310 and the target query 320 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
At 330, a list of image descriptions may be generated based on the target query 320. Each image description may be a text description for an image in a structured image collection that is to be generated. Taking an exemplary target query “how to make a banana pie” as an example, the generated list of image descriptions may include multiple text descriptions, wherein each text description describes a specific operation/step in the process of making the banana pie.
In an implementation, the list of image descriptions may be generated through an image description generating model. The image description generating model may be a generative model, which may generate a text output based on a text input. For example, the target query may be provided or fed to the image description generating model, so as to trigger the image description generating model to output the list of image descriptions. The image description generating model may be implemented by any universal AI model, e.g., a large language model (LLM). The LLM is a type of large-scale language model having the ability of effectively achieving general-purpose language understanding and generation. The LLM may acquire this ability by using massive amounts of data to learn a large number of parameters during training. The image description generating model may also be implemented by any specialized AI model that is specially developed and trained for generating the list of image descriptions based on the input target query. The specialized AI model may employ any suitable model architecture, e.g., an Encoder-Decoder architecture that is implemented by an RNN model and its variants such as LSTM variants or GRU variants, a Transformer model and its variants such as GPT variants or T5 variants, etc. Training data for the image description generating model may be in a format of a data pair consisting of a target query and a corresponding list of image descriptions, wherein the list of image descriptions may be manually labeled.
In an aspect, the list of image descriptions generated through the image description generating model may define the same attributes of image entities and/or the same image style. An image entity refers to an object included in an image, and attributes of the image entity may comprise, e.g., color of the image entity, size of the image entity, location of the image entity in the image, as well as any other suitable attributes. Image style may refer to the approach of artistic expression, e.g., romanticism, realism, etc. The same attributes of image entities and/or the same image style facilitate generating multiple images in a unified form, and thus the multiple images are more suitable to be organized as an image collection.
In order for the list of image descriptions generated through the image description generating model to define the same attributes of image entities and/or the same image style, certain rules may be applied to the training data used during the training stage for the image description generating model. For example, the training data may be prepared in a way that, for each image description of the list of image descriptions for the target query, the attributes of image entities and/or the image style are defined to be unified.
At 340, multiple images may be generated based on the list of image descriptions respectively. In an implementation, the multiple images may be generated through a text-to-image generating model. The text-to-image generating model may be a generative model, which may generate an image output based on a text input. For example, each image description in the list may be provided to the text-to-image generating model, so as to trigger the text-to-image generating model to output a corresponding image. The text-to-image generating model may be implemented by any model for generating an image based on a text description, e.g., DALL-E, Midjourney, etc.
At 350, the multiple images generated at 340 may be arranged as a sequential image collection according to the sequential result intent 310. For example, since the sequential result intent 310 indicates an intent of viewing image results in a manner of sequential image collection, the arranging at 350 may comprise ordering the multiple images to form a sequential image collection in response to the sequential result intent.
Moreover, although not shown in FIG. 3, a title for the sequential image collection and/or a brief text description for each image in the sequential image collection may be generated based on the multiple images and the target query through, e.g., a title and description generating model. The title and description generating model may be a multi-modal generative model that generates a text output based on an image and a text input. For example, the title and description generating model may be implemented by any NN model, such as a Transformer model (e.g., BERT, GPT variants), etc. The title for the sequential image collection and/or the brief text description for each image may also be included in the generated sequential image collection, so as to provide more text information about contents of the images.
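The flow of operations 330 to 350 may be sketched as follows. This is only a minimal illustration under stated assumptions: the names `build_sequential_collection`, `describe` and `render` are hypothetical, the two models are replaced by toy stand-in functions, and the target query is reused as a placeholder title rather than being produced by a title and description generating model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SequentialCollection:
    title: str                 # placeholder title (assumption: the target query itself)
    images: List[str]          # image handles, kept in display order
    descriptions: List[str]    # one image description per image


def build_sequential_collection(
    target_query: str,
    describe: Callable[[str], List[str]],  # stand-in for the image description generating model
    render: Callable[[str], str],          # stand-in for the text-to-image generating model
) -> SequentialCollection:
    descriptions = describe(target_query)        # operation 330
    images = [render(d) for d in descriptions]   # operation 340, one image per description
    # Operation 350: the order of the descriptions is kept as the order of the images.
    return SequentialCollection(target_query, images, descriptions)


# Toy stand-ins, for illustration only.
def fake_describe(query: str) -> List[str]:
    return [f"{query}, stage {i}" for i in range(1, 4)]


def fake_render(description: str) -> str:
    return f"<image of {description}>"


collection = build_sequential_collection(
    "growth of a sunflower", fake_describe, fake_render
)
```

Passing the models in as callables keeps the sketch independent of any particular LLM or text-to-image service.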
FIG. 4 illustrates an exemplary process 400 of structured image collection generation for a grouped result intent according to an embodiment. The process 400 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1. The process 400 adopts the image collection automatic generation approach for automatically generating a structured image collection based on a grouped result intent 410 and a target query 420. For example, the structured image collection may be automatically generated by means of an AI model.
The grouped result intent 410 and the target query 420 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
At 430, the target query may be broken into multiple sub-queries. The breaking of the target query refers to semantic segmentation, rather than simple splitting of the target query into several terms or phrases. Taking an exemplary target query “views of the Great Wall in four seasons” as an example, this target query may be semantically broken into four sub-queries, e.g., “views of the Great Wall in spring”, “views of the Great Wall in summer”, “views of the Great Wall in autumn”, and “views of the Great Wall in winter”.
In an implementation, the breaking of the target query into multiple sub-queries may be performed through a query breaking model. The query breaking model may be implemented by any universal AI model for conducting a text semantic segmentation task, e.g., an LLM. The query breaking model may also be implemented by any specialized AI model that is specially developed and trained for semantically segmenting the target query into multiple sub-queries. Training data for the query breaking model may be in a format of a data pair consisting of a target query and corresponding multiple sub-queries, e.g., an exemplary training data pair <target query: “views of the Great Wall in four seasons”, sub-queries: “views of the Great Wall in spring”, “views of the Great Wall in summer”, “views of the Great Wall in autumn”, and “views of the Great Wall in winter”>.
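As an illustration of the training data format described above, the exemplary training data pair may be represented as a simple record and serialized to one JSON line. The class name `QueryBreakingExample`, its field names and the JSON encoding are assumptions made for this sketch only; the present disclosure does not prescribe a storage format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class QueryBreakingExample:
    """One training data pair: a target query and its sub-queries."""
    target_query: str
    sub_queries: List[str]


example = QueryBreakingExample(
    target_query="views of the Great Wall in four seasons",
    sub_queries=[
        "views of the Great Wall in spring",
        "views of the Great Wall in summer",
        "views of the Great Wall in autumn",
        "views of the Great Wall in winter",
    ],
)

# One JSON object per line is a common layout for training data (an assumption here).
json_line = json.dumps(asdict(example), ensure_ascii=False)
restored = QueryBreakingExample(**json.loads(json_line))
```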
At 440, for each of the multiple sub-queries, a list of image descriptions may be generated based on the sub-query and then a group of images may be generated based on the list of image descriptions respectively.
The operation of generating the list of image descriptions may be the same as the operation 330 in FIG. 3 except that the target query in 330 is replaced by the sub-query, and thus this operation will not be detailed herein for simplicity.
The operation of generating the group of images may be the same as the operation 340 in FIG. 3 and thus will not be detailed herein for simplicity.
Through performing the operation 440 for each of the multiple sub-queries, multiple groups of images that correspond to the multiple sub-queries respectively may be generated.
At 450, the multiple groups of generated images may be arranged as a grouped image collection according to the grouped result intent 410. For example, since the grouped result intent 410 indicates an intent of viewing image results in a manner of grouped image collection, the arranging at 450 may comprise combining the multiple groups of images to form a grouped image collection in response to the grouped result intent.
Moreover, similar to what is discussed in combination with FIG. 3, a title for the grouped image collection, a title for each group, and/or a brief text description for each image in the grouped image collection may be generated, so as to provide more text information about contents of the images.
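The operations 430 to 450 may be sketched in the same spirit. The function `build_grouped_collection` and the three callables are hypothetical names, and each model is replaced by a toy stand-in; the sketch only shows how one group of images is produced per sub-query and how the groups are combined.

```python
from typing import Callable, Dict, List


def build_grouped_collection(
    target_query: str,
    break_query: Callable[[str], List[str]],  # stand-in for the query breaking model
    describe: Callable[[str], List[str]],     # stand-in for the image description generating model
    render: Callable[[str], str],             # stand-in for the text-to-image generating model
) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = {}
    for sub_query in break_query(target_query):             # operation 430
        descriptions = describe(sub_query)                   # operation 440, descriptions
        groups[sub_query] = [render(d) for d in descriptions]  # operation 440, images
    return groups                                            # operation 450, groups combined


# Toy stand-ins, for illustration only.
def fake_break(query: str) -> List[str]:
    seasons = ["spring", "summer", "autumn", "winter"]
    return [query.replace("four seasons", season) for season in seasons]


def fake_describe(sub_query: str) -> List[str]:
    return [f"{sub_query}, view {i}" for i in (1, 2)]


def fake_render(description: str) -> str:
    return f"<image of {description}>"


grouped = build_grouped_collection(
    "views of the Great Wall in four seasons", fake_break, fake_describe, fake_render
)
```

Here each sub-query doubles as the key of its group, which is one simple way to keep the correspondence between groups and sub-queries.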
FIG. 5 illustrates another exemplary process 500 of structured image collection generation for a sequential result intent according to an embodiment. The process 500 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1. The process 500 adopts the search result aggregation  approach for automatically generating a structured image collection based on a sequential result intent 510 and a target query 520. For example, the structured image collection may be automatically generated by aggregating search results from an image search engine.
The sequential result intent 510 and the target query 520 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
At 530, a search result for the target query 520 may be obtained. In an implementation, the search result for the target query may be obtained through the image search engine. For example, the target query may be provided to the image search engine, and the search result matching the target query may be then obtained from the image search engine. The image search engine may generate the search result through any existing techniques.
In an aspect, the search result may comprise multiple images matching the target query. The multiple images may be ranked by the image search engine according to freshness, authority, number of views, or any other metrics. Thus, the highest-ranked N images may be obtained in this step.
In an aspect, the search result may comprise multiple webpages matching the target query. The multiple webpages may be ranked by the image search engine according to freshness, authority, number of views, or any other metrics. Thus, the highest-ranked N webpages may be obtained in this step.
At 540, multiple image lists may be generated based on the search result.
In an aspect, if the search result comprises the highest-ranked N images, for each of the highest-ranked N images, a webpage from which the image originates may be obtained. This is feasible as the original URL of an image is generally maintained together with the image on the internet. Then, original ranked images, if any, may be extracted from the webpage to form an image list. For example, for an exemplary target query “top 10 best-selling cars in 2023”, one of the highest-ranked N images in the search result may originate from a webpage which lists 10 cars ranked from 1 to 10 according to their selling data, and thus images of the 10 ranked cars in the webpage may be extracted. That is, the finding of the webpage may facilitate generating an ordered list of top 10 best-selling cars. In this way, by performing the same procedure for the N images respectively, multiple image lists may be generated.
In another aspect, if the search result comprises the highest-ranked N webpages, for each of the highest-ranked N webpages, original ranked images, if any, may be extracted from the webpage to form an image list. In this way, by performing the same procedure for the N webpages respectively, multiple image lists may be generated.
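The extraction of original ranked images from a webpage may be illustrated with a small sketch based on Python's standard `html.parser` module. It assumes, purely for illustration, that a webpage expresses its ranking as an ordered list (`<ol>`) and that each ranked item contains an `<img>` element; real webpages vary widely, and the present disclosure does not limit how ranked images are identified. The names `RankedImageExtractor` and `extract_ranked_images` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class RankedImageExtractor(HTMLParser):
    """Collects the src of every <img> found inside an <ol>, in page order."""

    def __init__(self) -> None:
        super().__init__()
        self._ol_depth = 0           # how many <ol> elements are currently open
        self.images: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self._ol_depth += 1
        elif tag == "img" and self._ol_depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "ol" and self._ol_depth > 0:
            self._ol_depth -= 1


def extract_ranked_images(html: str) -> List[str]:
    """Returns the image list for one webpage; empty if no ranked images exist."""
    parser = RankedImageExtractor()
    parser.feed(html)
    parser.close()
    return parser.images


page = (
    "<img src='logo.png'>"
    "<ol><li><img src='car1.jpg'>Car A</li><li><img src='car2.jpg'>Car B</li></ol>"
)
image_list = extract_ranked_images(page)
```

Images outside the ordered list, such as the logo in the toy page, are ignored, so a webpage without ranked images simply yields an empty image list.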
At 550, an image list that is most relevant to the target query may be selected among the multiple image lists. For example, for each of the multiple image lists, a score for the image list that reflects the relevance of the image list to the target query may be generated. The score may be generated based on a plurality of images in the image list, and texts associated with the plurality of images obtained from the original webpage where the images originate. Metrics for determining the score may include, e.g., number of images in the image list, matching rate of each image with the target query, matching rate of the text for each image with the target query, or any other suitable metrics that may be used to evaluate the relevance of the image list to the target query.
In an implementation, the score may be generated through a list scoring model, which may be a multi-modal model that takes the plurality of images in the image list and texts associated with the plurality of images as inputs and outputs a score for the image list. The list scoring model may be implemented by any universal model, such as an LLM. The list scoring model may also be implemented by any specialized model that is customized to conduct the scoring task. The specialized list scoring model may be implemented with any model architecture, such as Transformer (e.g., BERT, GPT variants, etc.), or implemented as an NN model, a DT model or a logistic regression model that takes image features (which may be extracted by an image encoder like CNN, ViT, CLIP, etc.) and text features (which may be extracted by a text encoder like BERT, DSSM, word2vec, etc.) as inputs and outputs a score for the image list.
In this way, multiple scores corresponding to the multiple image lists respectively may be generated. Then the image list with the highest score may be selected.
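The scoring and selecting at 550 may be sketched as follows. The list scoring model is replaced here by a deliberately simple text-only heuristic (the average share of query terms found in the text associated with each image); this heuristic, as well as the names `ImageList`, `token_overlap`, `score_list` and `select_best`, is an assumption for illustration and is not the multi-modal model described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageList:
    source_url: str        # webpage from which the images originate
    captions: List[str]    # text associated with each image on that webpage


def token_overlap(text: str, query: str) -> float:
    """Share of query terms that also occur in the text (0.0 to 1.0)."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0


def score_list(image_list: ImageList, query: str) -> float:
    """Toy stand-in for the list scoring model: mean caption/query overlap."""
    if not image_list.captions:
        return 0.0
    overlaps = [token_overlap(caption, query) for caption in image_list.captions]
    return sum(overlaps) / len(overlaps)


def select_best(image_lists: List[ImageList], query: str) -> ImageList:
    """Selects the image list with the highest score."""
    return max(image_lists, key=lambda image_list: score_list(image_list, query))


query = "best-selling cars 2023"
ranked_page = ImageList(
    "https://example.com/ranking",
    ["best-selling cars 2023 rank 1", "best-selling cars 2023 rank 2"],
)
unrelated_page = ImageList(
    "https://example.com/misc",
    ["cute cats", "cars in movies"],
)
best = select_best([unrelated_page, ranked_page], query)
```

A production scorer would also use image features and could weigh the number of images in the list; the selection step itself stays the same regardless of how the score is computed.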
At 560, the selected image list may be arranged as a sequential image collection according to the sequential result intent 510. For example, since the sequential result intent 510 indicates an intent of viewing image results in a manner of  sequential image collection, the arranging at 560 may comprise ordering multiple images in the image list to form a sequential image collection in response to the sequential result intent.
Moreover, similar to what is discussed in combination with FIG. 3, a title for the sequential image collection and/or a brief text description for each image in the sequential image collection may be generated, so as to provide more text information about contents of the images.
FIG. 6 illustrates another exemplary process 600 of structured image collection generation for a grouped result intent according to an embodiment. The process 600 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1. The process 600 adopts the search result aggregation approach for automatically generating a structured image collection based on a grouped result intent 610 and a target query 620. For example, the structured image collection may be automatically generated by aggregating search results from an image search engine.
The grouped result intent 610 and the target query 620 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
At 630, the target query 620 may be broken into multiple sub-queries. The operation of breaking the target query may be the same as the operation 430 in FIG. 4, and thus this operation will not be detailed herein for simplicity.
At 640, for each of the multiple sub-queries, the highest-ranked N images may be obtained through an image search engine. The operation of obtaining the highest-ranked N images may be similar to the operation 530 in FIG. 5 in the case that the search result includes the highest-ranked N images, except that the target query as used in 530 is replaced by a sub-query, and thus this operation will not be detailed herein for simplicity. The highest-ranked N images may form a group of N images corresponding to the sub-query. In this way, multiple groups of N images corresponding to the multiple sub-queries respectively may be obtained.
At 650, the multiple groups of N images may be arranged as a grouped image collection according to the grouped result intent 610. For example, since the grouped result intent 610 indicates an intent of viewing image results in a manner of grouped image collection, the arranging at 650 may comprise combining the multiple groups of N images to form a grouped image collection in response to the grouped  result intent.
Similar to what is discussed in combination with FIG. 3, a title for the grouped image collection, a title for each group, and/or a brief text description for each image in the grouped image collection may be generated, so as to provide more text information about contents of the images.
FIG. 7 illustrates another exemplary process 700 of structured image collection generation for a grouped result intent according to an embodiment. The process 700 is an exemplary implementation of the structured image collection generation at 140 in FIG. 1. The process 700 adopts the search result aggregation approach for automatically generating a structured image collection based on a grouped result intent 710 and a target query 720. For example, the structured image collection may be automatically generated by aggregating search results from an image search engine.
The grouped result intent 710 and the target query 720 may be determined according to the process 100 in FIG. 1 and/or the process 200 in FIG. 2.
At 730, the highest-ranked N images for the target query 720 may be obtained. The operation of obtaining the highest-ranked N images may be similar to the operation 530 in FIG. 5 in the case that the search result includes the highest-ranked N images, and thus this operation will not be detailed herein for simplicity.
Unlike the process 600 in FIG. 6, which requires performing multiple searching operations through the image search engine, the process 700 in FIG. 7 requires only one searching operation. This is especially beneficial when performing a searching operation through the image search engine is inconvenient, consumes a significant amount of computing resources, or causes a long delay.
At 740, the target query 720 may be broken into multiple sub-queries. The operation of breaking the target query may be the same as the operation 430 in FIG. 4, and thus this operation will not be detailed herein for simplicity.
At 750, the highest-ranked N images may be divided into multiple groups of images. In an implementation, the dividing of the highest-ranked N images may be implemented by mapping the highest-ranked N images to the multiple sub-queries. For example, a distance between a feature vector for each image and a feature vector for each sub-query may be calculated. Then, each image may be mapped to the sub-query whose feature vector has the smallest distance from the feature vector of the image. Accordingly, the highest-ranked N images may be divided into multiple groups of images, and each group of images corresponds to one of the multiple sub-queries.
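The mapping at 750 may be sketched as a nearest-neighbor assignment. The sketch assumes that feature vectors for the images and the sub-queries have already been extracted into a shared embedding space, and it uses cosine distance; both the choice of distance and the names `cosine_distance` and `group_images` are assumptions for illustration, as the present disclosure does not fix a particular distance metric.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def group_images(
    image_vectors: Dict[str, Sequence[float]],
    sub_query_vectors: Dict[str, Sequence[float]],
) -> Dict[str, List[str]]:
    """Maps each image to the sub-query with the smallest vector distance."""
    groups: Dict[str, List[str]] = {sub_query: [] for sub_query in sub_query_vectors}
    for image_id, vector in image_vectors.items():
        nearest = min(
            sub_query_vectors,
            key=lambda sub_query: cosine_distance(vector, sub_query_vectors[sub_query]),
        )
        groups[nearest].append(image_id)
    return groups


# Toy two-dimensional feature vectors, for illustration only.
sub_queries = {
    "views of the Great Wall in spring": [1.0, 0.0],
    "views of the Great Wall in winter": [0.0, 1.0],
}
images = {"img_a": [0.9, 0.1], "img_b": [0.2, 0.8], "img_c": [1.0, 0.2]}
groups = group_images(images, sub_queries)
```

Note that with this assignment a sub-query may end up with an empty group if no image is nearest to it; how such a case is handled is left open here.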
At 760, the multiple groups of images, that correspond to the multiple sub-queries respectively, may be arranged as a grouped image collection according to the grouped result intent 710. For example, since the grouped result intent 710 indicates an intent of viewing image results in a manner of grouped image collection, the arranging at 760 may comprise combining the multiple groups of images to form a grouped image collection in response to the grouped result intent.
Similar to what is discussed in combination with FIG. 3, a title for the grouped image collection, a title for each group, and/or a brief text description for each image in the grouped image collection may be generated, so as to provide more text information about contents of the images.
The image collection automatic generation approach has been discussed in combination with FIG. 3 and FIG. 4, and the search result aggregation approach has been discussed in combination with FIG. 5, FIG. 6 and FIG. 7.
In an aspect, the search result aggregation approach discussed in combination with FIG. 5, FIG. 6 and FIG. 7 may be combined with the image collection automatic generation approach discussed in combination with FIG. 3 and FIG. 4 in the present disclosure. For example, the present disclosure may first attempt the search result aggregation approach, e.g., to aggregate search results from the image search engine to generate the structured image collection. If no search result is available in the search result aggregation approach, the present disclosure may then adopt the image collection automatic generation approach, e.g., to automatically generate the structured image collection. Combining the two approaches in this way may help ensure the authenticity of images in the structured image collection. Through the above process, an image result satisfying the user intent may be provided even if no search result is obtained from the image search engine.
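The combination of the two approaches may be sketched as a simple fallback. The function `provide_collection` and its two callables are hypothetical; each callable stands in for one of the approaches and is assumed to return a list of image handles, with an empty list meaning that no search result is available.

```python
from typing import Callable, List, Tuple


def provide_collection(
    target_query: str,
    aggregate: Callable[[str], List[str]],  # stand-in for the search result aggregation approach
    generate: Callable[[str], List[str]],   # stand-in for the automatic generation approach
) -> Tuple[List[str], str]:
    """Tries aggregation first and falls back to generation if nothing is found."""
    images = aggregate(target_query)
    if images:
        return images, "aggregated"
    return generate(target_query), "generated"


# Toy stand-ins, for illustration only.
def fake_aggregate(query: str) -> List[str]:
    known = {"top 10 best-selling cars in 2023": ["car1.jpg", "car2.jpg"]}
    return known.get(query, [])


def fake_generate(query: str) -> List[str]:
    return [f"<generated image for {query}>"]


found = provide_collection("top 10 best-selling cars in 2023", fake_aggregate, fake_generate)
fallback = provide_collection("life cycle of a dragon", fake_aggregate, fake_generate)
```

Returning the origin alongside the images is a design choice of this sketch; it would let a caller indicate whether the images come from search results or from a generative model.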
In an aspect, the providing of a structured image collection as discussed in the embodiments of the present disclosure may be implemented in a real-time or online approach. For example, the operations for generating a structured image collection that are discussed in combination with FIG. 3 to FIG. 7 may be performed in  response to user data in real-time, and then the generated structured image collection may be transmitted by an image service to a device client for presenting on a user interface, such that the structured image collection may be viewed by a user.
In an aspect, the providing of a structured image collection as discussed in the embodiments of the present disclosure may be implemented in a pre-storing approach. For example, structured image collections generated according to the operations that are discussed in combination with FIG. 3 to FIG. 7 may be pre-stored together with corresponding search indexes in an image library for future use, such that a structured image collection may be quickly retrieved from the image library and presented to a user if the structured image collection matches with a target query and a user intent derived from user data of the user. In this case, the present disclosure may be implemented for preparing or enriching the image library.
In an aspect, the pre-storing approach and the real-time approach may be combined in the present disclosure. Since the pre-storing approach may involve less delay as compared to the real-time approach, the combining of the two approaches may help achieve a quick response to the current user data. For example, the present disclosure may first try to retrieve a structured image collection from the image library. If no matched structured image collection is found in the image library, the present disclosure may further perform the real-time approach.
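The combination of the pre-storing approach and the real-time approach may be sketched as a lookup followed by a fallback. The class `ImageLibrary`, the function `serve`, and the choice of a (normalized target query, user intent) pair as the search index are all assumptions for illustration; an actual image library may match collections to queries in a far more flexible way.

```python
from typing import Callable, Dict, List, Optional, Tuple

Collection = List[str]  # a structured image collection, reduced to image handles


class ImageLibrary:
    """Toy image library keyed by (normalized target query, user intent)."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], Collection] = {}

    @staticmethod
    def _key(target_query: str, intent: str) -> Tuple[str, str]:
        # Assumption: matching ignores letter case and surrounding whitespace.
        return (target_query.strip().lower(), intent)

    def store(self, target_query: str, intent: str, collection: Collection) -> None:
        self._index[self._key(target_query, intent)] = collection

    def lookup(self, target_query: str, intent: str) -> Optional[Collection]:
        return self._index.get(self._key(target_query, intent))


def serve(
    target_query: str,
    intent: str,
    library: ImageLibrary,
    generate_realtime: Callable[[str, str], Collection],
) -> Collection:
    """Retrieves a pre-stored collection if one matches, else generates in real time."""
    cached = library.lookup(target_query, intent)
    if cached is not None:
        return cached
    return generate_realtime(target_query, intent)


library = ImageLibrary()
library.store("Top 10 best-selling cars in 2023", "sequential", ["car1.jpg", "car2.jpg"])

realtime_calls: List[str] = []


def fake_realtime(query: str, intent: str) -> Collection:
    realtime_calls.append(query)
    return [f"generated:{query}"]


hit = serve("top 10 best-selling cars in 2023", "sequential", library, fake_realtime)
miss = serve("views of the Great Wall in four seasons", "grouped", library, fake_realtime)
```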
FIG. 8A, FIG. 8B, FIG. 9, FIG. 10A, FIG. 10B and FIG. 11 relate to exemplary presentations of a structured image collection on a user interface. In other words, these figures illustrate examples about how the structured image collection may be presented at the frontend of the image search service or the image recommendation service.
FIG. 8A and FIG. 8B illustrate an exemplary presentation of a structured image collection for a sequential result intent according to an embodiment, respectively.
As shown in FIG. 8A, in an exemplary presentation 800A, for a sequential result intent, a sequential image collection 810 may be presented on the upper part of the user interface. As an example, the sequential image collection 810 may be provided for an exemplary target query “top 10 best-selling cars in 2023”. The sequential image collection 810 may be shown as a thumbnail image 820 which shows, e.g., one of the top 10 best-selling cars. In other words, the thumbnail image 820 may correspond to one specific image in the sequential image collection 810. The sequential image collection 810 may also include a title 830 of the sequential image collection. The image collection title may be generated through a title and description generating model as discussed above in combination with FIG. 3. In an aspect, a label, such as a cross-shaped label 840, may be displayed on the thumbnail image 820 to provide an intuitive indication that 810 is an image collection, rather than a separate image as shown in the conventional mechanism.
As shown in FIG. 8A, optionally, one or more separate images 850-1, 850-2, 850-3 and 850-4 as well as a title of each image may also be shown in the exemplary presentation 800A. The separate images 850-1, 850-2, 850-3 and 850-4 and respective titles may be obtained by any conventional mechanism for providing separate images for the target query.
Although only one sequential image collection 810 is shown in the exemplary presentation 800A, the exemplary presentation 800A may simultaneously show more than one sequential image collection. In an aspect, the number of the shown structured image collections may be specified, e.g., as a fixed number, or according to certain user setting, and so on.
In response to a user action on the sequential image collection 810, such as a clicking operation on any part of the sequential image collection 810, an exemplary presentation 800B may be presented on the user interface, to display more specific information in the sequential image collection 810, as shown in FIG. 8B. The exemplary presentation 800B may include a plurality of ordered images, such as the images 820-1 to 820-6 as shown in FIG. 8B. A progress bar control may be displayed on the user interface for enabling a user to drag it to view all the images in the sequential image collection 810. Optionally, the order of each image may also be presented through a label on the ordered images. For example, the image 820-1 may show the car ranking first according to selling data in 2023, the image 820-2 may show the car ranking second according to selling data in 2023, and so on.
A main image 860 may be displayed on the upper-left part of the exemplary presentation 800B. In response to the user action on the sequential image collection 810 in the presentation 800A, the initially-presented main image 860 may be the first-ordered image in the sequential image collection 810, and in response to a further user action, the main image 860 may be switched to any one of the images 820-1 to 820-6. For example, in response to a user action on the image 820-2, the main image 860 may be switched to the image 820-2. Blocks 870 and 880 may also be presented on the exemplary presentation 800B, to show the title of the sequential image collection and a text description for the current main image 860.
FIG. 9 illustrates another exemplary presentation of a structured image collection for a sequential result intent according to an embodiment.
The exemplary presentation 900 shows a sequential image collection 910. Unlike the exemplary presentation 800A, multiple thumbnail images 920-1 to 920-6, each corresponding to an ordered image in the sequential image collection 910, may be presented in a row on the upper part of the user interface. Similar to FIG. 8A, the sequential image collection 910 may also include a title 930 of the sequential image collection. Also similar to FIG. 8A, optionally, one or more separate images 940-1, 940-2, 940-3 and 940-4 as well as a title of each image may be shown in the exemplary presentation 900.
In response to a user action on the sequential image collection 910, such as a clicking operation on the thumbnail image 920-1, a new exemplary presentation (not shown) which is the same as the exemplary presentation 800B may be displayed, to show more specific information in the sequential image collection 910. It should be understood that if the clicking operation is performed, in the presentation 900, on a thumbnail image other than the thumbnail image 920-1, e.g., the thumbnail image 920-2, the displayed main image in the presentation 800B would be the ordered image corresponding to the clicked thumbnail image 920-2.
FIG. 10A and FIG. 10B illustrate an exemplary presentation of a structured image collection for a grouped result intent according to an embodiment, respectively.
As shown in FIG. 10A, in the exemplary presentation 1000A, for a grouped result intent, a grouped image collection 1010 may be presented on the upper part of the user interface. As an example, the grouped image collection 1010 may be provided for an exemplary target query “views of the Great Wall in four seasons” . The grouped image collection 1010 may include four groups 1020A, 1020B, 1020C and 1020D. The group 1020A may include images for views of the Great Wall in spring, the group 1020B may include images for views of the Great Wall in summer, the group 1020C may include images for views of the Great Wall in autumn, and the group 1020D may include images for views of the Great Wall in winter. The structure  of the groups 1020B, 1020C and 1020D is the same as 1020A, and reference numbers in the groups 1020B, 1020C and 1020D are omitted for simplification.
For the group 1020A, similar to 810 in FIG. 8A, a thumbnail image 1030 corresponding to an image for views of the Great Wall in spring, a title of the group 1040 and a label 1050 indicating that the group 1020A is an image collection may be displayed.
Also similar to FIG. 8A, optionally, one or more separate images 1060-1, 1060-2, 1060-3 and 1060-4 as well as a title of each image may be shown in the exemplary presentation 1000A.
Although only one grouped image collection 1010 is shown in the exemplary presentation 1000A, the exemplary presentation 1000A may simultaneously show more than one grouped image collection.
Similar to FIG. 8B, in response to a user action, such as a clicking operation, on one of the four groups 1020A, 1020B, 1020C and 1020D, an exemplary presentation 1000B may be presented on the user interface, as shown in FIG. 10B. As an example, the user action may be performed, in the presentation 1000A, on the group 1020A. Thus, the exemplary presentation 1000B may include a plurality of images 1030-1 to 1030-6 in the group 1020A, as shown in FIG. 10B.
A main image 1070 may be displayed on the upper-left part of the exemplary presentation 1000B. In response to the user action on the group 1020A, the initially-presented main image 1070 may be the first image 1030-1 in the group 1020A, and in response to a further user action, the main image 1070 may be switched to any one of the images 1030-1 to 1030-6. Blocks 1080 and 1090 may also be presented on the exemplary presentation 1000B, to show the title of the group 1020A and a text description for the current main image 1070.
FIG. 11 illustrates another exemplary presentation of a structured image collection for a grouped result intent according to an embodiment.
The exemplary presentation 1100 shows a grouped image collection 1110, which includes four groups 1120A, 1120B, 1120C and 1120D. The group 1120A may include images for views of the Great Wall in spring, the group 1120B may include images for views of the Great Wall in summer, the group 1120C may include images for views of the Great Wall in autumn, and the group 1120D may include images for views of the Great Wall in winter. The structure of the groups 1120B, 1120C and  1120D is the same as the group 1120A, and reference numbers in the groups 1120B, 1120C and 1120D are omitted for simplification.
Unlike the exemplary presentation 1000A in FIG. 10A, for each group, multiple thumbnail images, each corresponding to an image in the group, may be presented in a row on the upper part of the user interface. For example, for the group 1120A, multiple thumbnail images 1130-1 to 1130-5 are presented in a row. Similar to FIG. 10A, the group 1120A may also include a title 1140 of the group. Also similar to FIG. 10A, optionally, one or more separate images 1150-1, 1150-2, 1150-3 and 1150-4 as well as a title of each image may be shown in the exemplary presentation 1100.
In response to a user action on one of the groups, such as a user action on the group 1120A, a new exemplary presentation (not shown) which is the same as the exemplary presentation 1000B may be displayed, to show more specific information in the group 1120A. The displayed main image may be the first image 1130-1 in the group 1120A.
It should be understood that all the elements in the presentations 800A, 800B, 900, 1000A, 1000B and 1100 are exemplary, and are not intended to limit the structured image collection to being presented in any specific manner. Rather, the structured image collection may be presented in any suitable manner. For example, the text description for each image may be directly presented on top of the image, instead of being presented in a text block separated from the image, or may be presented in any other manner.
FIG. 12 illustrates a flowchart of an exemplary method 1200 for providing a structured image collection according to an embodiment.
At 1210, user data may be obtained.
At 1220, a user intent may be determined based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent.
At 1230, a target query may be determined based on the user data.
At 1240, the structured image collection may be generated based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
In an implementation, the sequential result intent may indicate an intent of viewing image results in a manner of sequential image collection; and the grouped result intent may indicate an intent of viewing image results in a manner of grouped image collection.
In an implementation, in response to determining the user intent as the sequential result intent, the structured image collection may be generated as a sequential image collection in which the plurality of images are ordered by image contents. In response to determining the user intent as the grouped result intent, the structured image collection may be generated as a grouped image collection in which the plurality of images are multiple groups of images that are divided by image contents.
The user data may comprise a search query and the determining a target query may comprise determining the search query as the target query. Alternatively, the user data may comprise a user profile and the determining a target query may comprise determining an interested topic extracted from the user profile as the target query, wherein the user profile may comprise at least one of browsing history of a user, search history of the user, topics specified by the user, and common topics from other users.
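The overall flow of the method 1200 may be sketched as follows. All names (`UserIntent`, `UserData`, `determine_target_query`, `provide_structured_collection`) are hypothetical, the intent classifier and the per-intent generators are injected as callables, and taking the first interested topic from the user profile is an assumption made only to keep the sketch short.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class UserIntent(Enum):
    SEQUENTIAL = "sequential result intent"
    GROUPED = "grouped result intent"


@dataclass
class UserData:
    search_query: Optional[str] = None
    interested_topics: List[str] = field(default_factory=list)  # from the user profile


def determine_target_query(user_data: UserData) -> str:
    """Operation 1230: a search query, if present, else an interested topic."""
    if user_data.search_query:
        return user_data.search_query
    if user_data.interested_topics:
        return user_data.interested_topics[0]  # assumption: take the first topic
    raise ValueError("user data contains neither a search query nor a topic")


def provide_structured_collection(
    user_data: UserData,                                       # operation 1210
    classify_intent: Callable[[UserData], UserIntent],         # stand-in for the intent classifier
    generators: Dict[UserIntent, Callable[[str], List[str]]],  # one generator per user intent
) -> List[str]:
    intent = classify_intent(user_data)                 # operation 1220
    target_query = determine_target_query(user_data)    # operation 1230
    return generators[intent](target_query)             # operation 1240


# Toy stand-ins, for illustration only.
def fake_classifier(user_data: UserData) -> UserIntent:
    query = determine_target_query(user_data)
    return UserIntent.SEQUENTIAL if query.startswith("top") else UserIntent.GROUPED


generators = {
    UserIntent.SEQUENTIAL: lambda q: [f"sequential:{q}"],
    UserIntent.GROUPED: lambda q: [f"grouped:{q}"],
}

from_query = provide_structured_collection(
    UserData(search_query="top 10 best-selling cars in 2023"), fake_classifier, generators
)
from_profile = provide_structured_collection(
    UserData(interested_topics=["views of the Great Wall in four seasons"]),
    fake_classifier,
    generators,
)
```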
In an implementation, in response to determining the user intent as the sequential result intent, the generating the structured image collection may comprise: generating a list of image descriptions based on the target query through an image description generating model; generating multiple images based on the list of image descriptions through a text-to-image generating model, respectively; and arranging the multiple images, that correspond to the list of image descriptions respectively, as a sequential image collection according to the sequential result intent.
In an implementation, in response to determining the user intent as the grouped result intent, the generating the structured image collection may comprise: breaking the target query into multiple sub-queries through a query breaking model; for each of the multiple sub-queries, generating a list of image descriptions based on the sub-query through an image description generating model, and generating a group of images based on the list of image descriptions through a text-to-image generating model, respectively; and arranging multiple groups of images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
The list of image descriptions may define the same attributes of image entities and/or the same image style.
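The generation-based grouped flow, together with the shared image style mentioned above, can be sketched as follows. All three models are hypothetical stubs, and appending a style phrase to each description is only one assumed way of keeping the style consistent across a list.

```python
def generate_grouped_collection(target_query, break_query, describe,
                                text_to_image, style="watercolor"):
    """Return a mapping from each sub-query to its generated group of images."""
    groups = {}
    for sub_query in break_query(target_query):
        # Assumption: a common style suffix keeps the descriptions consistent.
        descriptions = [f"{d}, in {style} style" for d in describe(sub_query)]
        groups[sub_query] = [text_to_image(d) for d in descriptions]
    return groups


def stub_break_query(query):
    # Placeholder for a query breaking model.
    return [part.strip() for part in query.split(" and ")]


def stub_describe(sub_query):
    # Placeholder for an image description generating model.
    return [f"a photo of {sub_query}", f"a close-up of {sub_query}"]


def stub_text_to_image(description):
    # Placeholder for a text-to-image generating model.
    return f"<image: {description}>"


groups = generate_grouped_collection(
    "cats and dogs", stub_break_query, stub_describe, stub_text_to_image
)
```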
In an implementation, in response to determining the user intent as the sequential result intent, the generating the structured image collection may comprise: obtaining a search result for the target query through an image search engine; generating multiple image lists based on the search result; selecting an image list, among the multiple image lists, that is most relevant to the target query; and arranging the image list as a sequential image collection according to the sequential result intent. The search result may comprise highest-ranked N images, and the generating multiple image lists may comprise, for each of the highest-ranked N images, obtaining a webpage from which the image originates and extracting original ranked images from the webpage to form an image list. Alternatively, the search result may comprise highest-ranked N webpages, and the generating multiple image lists may comprise, for each of the highest-ranked N webpages, extracting original ranked images from the webpage to form an image list. The selecting an image list may comprise: for each of the multiple image lists, generating, through a list scoring model, a score for the image list based on a plurality of images in the image list and texts associated with the plurality of images; and selecting the image list with the highest score based on multiple scores corresponding to the multiple image lists respectively.
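The search-based sequential flow can be sketched in two steps: forming one image list per source webpage, then scoring the lists and keeping the best one. The token-overlap score below is a deliberately simple stand-in for the list scoring model, which the disclosure does not specify; it also uses only the associated texts, not the image pixels. Skipping a webpage that has already been seen is likewise an assumption.

```python
def build_image_lists(top_images, pages):
    """For each top-ranked image, collect the ordered images of its source page."""
    seen, image_lists = set(), []
    for image in top_images:
        page = image["page"]
        if page in pages and page not in seen:  # assumption: one list per page
            seen.add(page)
            image_lists.append(pages[page])
    return image_lists


def score_image_list(query, image_list):
    """Stand-in scorer: mean fraction of query tokens found in each image text."""
    query_tokens = set(query.lower().split())
    if not query_tokens or not image_list:
        return 0.0
    overlaps = [
        len(query_tokens & set(text.lower().split())) / len(query_tokens)
        for _, text in image_list
    ]
    return sum(overlaps) / len(overlaps)


def select_best_image_list(query, image_lists):
    """Pick the image list with the highest score."""
    return max(image_lists, key=lambda lst: score_image_list(query, lst))


# Hypothetical data: page id -> (image, associated text) in original page order.
pages = {
    "p1": [("a1.png", "cross the tie"), ("a2.png", "loop a tie")],
    "p2": [("b1.png", "blue sky")],
}
top_images = [
    {"url": "a2.png", "page": "p1"},
    {"url": "b1.png", "page": "p2"},
    {"url": "a1.png", "page": "p1"},
]
image_lists = build_image_lists(top_images, pages)
best = select_best_image_list("tie a tie", image_lists)
```

The selected list keeps the order in which the images appeared on the source webpage, which is what makes it usable as a sequential image collection.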
In an implementation, in response to determining the user intent as the grouped result intent, the generating the structured image collection may comprise: breaking the target query into multiple sub-queries through a query breaking model; for each of the multiple sub-queries, obtaining highest-ranked N images through an image search engine; and arranging multiple groups of N images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
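As a rough sketch of this variant, each sub-query is searched separately and its top N results form one group. The in-memory `index` and the `image_search` callable are hypothetical stand-ins for an image search engine, and the query breaking stub simply splits on " and ".

```python
def grouped_by_sub_query_search(target_query, break_query, image_search, n=2):
    """Search each sub-query separately and keep its N highest-ranked images."""
    return {
        sub_query: image_search(sub_query)[:n]
        for sub_query in break_query(target_query)
    }


# Hypothetical ranked results per query, best first.
index = {"cats": ["c1.png", "c2.png", "c3.png"], "dogs": ["d1.png"]}


def image_search(query):
    # Placeholder for an image search engine.
    return index.get(query, [])


def break_query(query):
    # Placeholder for a query breaking model.
    return [part.strip() for part in query.split(" and ")]


result = grouped_by_sub_query_search("cats and dogs", break_query, image_search)
```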
In an implementation, in response to determining the user intent as the grouped result intent, the generating the structured image collection may comprise: for the target query, obtaining highest-ranked N images through an image search engine; breaking the target query into multiple sub-queries through a query breaking model; dividing the highest-ranked N images into multiple groups of images by mapping the highest-ranked N images to the multiple sub-queries; and arranging the multiple groups of images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
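The mapping step in this variant can be sketched as assigning each retrieved image to the sub-query it matches best. Matching by token overlap between an image caption and the sub-query is an assumption made for illustration; the disclosure does not state how the mapping is computed.

```python
def divide_images_by_sub_query(images, sub_queries):
    """Assign each (url, caption) pair to the sub-query with most shared tokens."""
    groups = {sub_query: [] for sub_query in sub_queries}
    for url, caption in images:
        caption_tokens = set(caption.lower().split())
        # Ties resolve to the earliest sub-query in the list.
        best = max(
            sub_queries,
            key=lambda sq: len(caption_tokens & set(sq.lower().split())),
        )
        groups[best].append(url)
    return groups


# Hypothetical top-N images for the target query, with captions.
images = [
    ("1.png", "red sports car"),
    ("2.png", "blue family car"),
    ("3.png", "red racing car"),
]
groups = divide_images_by_sub_query(images, ["red car", "blue car"])
```

Unlike the per-sub-query search variant, this approach issues a single search and partitions its results, so the groups may differ in size.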
In an implementation, the user data may comprise current user data, and the structured image collection is to be presented on a user interface, and/or pre-stored, together with a corresponding search index, in an image library.
In an implementation, the user data may comprise historical user data, and the structured image collection and a corresponding search index may be pre-stored in an image library. The method 1200 may further comprise: obtaining current user data; determining a current user intent based on the current user data through the intent classifier; determining a current target query based on the current user data; searching for a search index corresponding to the current user data from a plurality of search indexes in the image library based on the current target query and the current user intent; and retrieving a structured image collection corresponding to the search index from the image library.
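The pre-store and retrieve flow can be sketched with a small in-memory library. The class name `ImageLibrary` and the choice of a normalized (target query, intent) pair as the search index are assumptions; the disclosure leaves the form of the search index open.

```python
class ImageLibrary:
    """In-memory stand-in for an image library keyed by a search index."""

    def __init__(self):
        self._collections = {}

    @staticmethod
    def make_index(target_query, intent):
        # Assumption: the index is the normalized query paired with the intent.
        return (" ".join(target_query.lower().split()), intent)

    def store(self, target_query, intent, collection):
        """Pre-store a collection generated from historical user data."""
        self._collections[self.make_index(target_query, intent)] = collection

    def retrieve(self, target_query, intent):
        """Look up a collection for the current target query and intent."""
        return self._collections.get(self.make_index(target_query, intent))


library = ImageLibrary()
library.store("How to tie a tie", "sequential", ["s1.png", "s2.png"])
hit = library.retrieve("how to  tie a tie", "sequential")
miss = library.retrieve("how to tie a tie", "grouped")
```

Including the intent in the index means the same query can map to different pre-stored collections depending on whether a sequential or grouped result is wanted.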
It should be appreciated that the method 1200 may further comprise any steps/processes for providing a structured image collection according to the embodiments of the present disclosure as mentioned above.
FIG. 13 illustrates an exemplary apparatus 1300 for providing a structured image collection according to an embodiment.
The apparatus 1300 may comprise: a user data obtaining module 1310, for obtaining user data; a user intent determining module 1320, for determining a user intent based on the user data through an intent classifier; a target query determining module 1330, for determining a target query based on the user data; and a structured image collection generating module 1340, for generating the structured image collection based on the target query and the user intent. Moreover, the apparatus 1300 may also comprise any other modules configured for performing any steps and operations of the methods for providing a structured image collection according to the embodiments of the present disclosure as mentioned above.
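The four modules of the apparatus can be pictured as four pluggable callables wired in sequence. The class name, the constructor parameters, and the keyword rule used in place of a trained intent classifier are all hypothetical.

```python
class StructuredImageCollectionApparatus:
    """Wires the four modules described above into one pipeline."""

    def __init__(self, obtain_user_data, classify_intent,
                 determine_query, generate_collection):
        self.obtain_user_data = obtain_user_data        # cf. module 1310
        self.classify_intent = classify_intent          # cf. module 1320
        self.determine_query = determine_query          # cf. module 1330
        self.generate_collection = generate_collection  # cf. module 1340

    def run(self):
        user_data = self.obtain_user_data()
        intent = self.classify_intent(user_data)
        query = self.determine_query(user_data)
        return self.generate_collection(query, intent)


def keyword_intent_classifier(user_data):
    # Assumption: a keyword rule stands in for a trained intent classifier.
    query = user_data.get("search_query", "").lower()
    return "sequential" if query.startswith("how to") else "grouped"


apparatus = StructuredImageCollectionApparatus(
    obtain_user_data=lambda: {"search_query": "how to tie a tie"},
    classify_intent=keyword_intent_classifier,
    determine_query=lambda data: data["search_query"],
    generate_collection=lambda query, intent: {
        "query": query, "intent": intent, "images": []
    },
)
result = apparatus.run()
```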
FIG. 14 illustrates an exemplary apparatus 1400 for providing a structured image collection according to an embodiment.
The apparatus 1400 may comprise at least one processor 1410 and a memory 1420 storing computer-executable instructions. When the computer-executable instructions are executed, the at least one processor 1410 may: obtain user data; determine a user intent based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent; determine a target query based on the user data; and generate the structured image collection based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related. The at least one processor 1410 may be further configured for performing any operations of the methods for providing a structured image collection according to the embodiments of the present disclosure as mentioned above.
The embodiments of the present disclosure propose a computer program product for providing a structured image collection. The computer program product may comprise a computer program that is executed by at least one processor for: obtaining user data; determining a user intent based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent; determining a target query based on the user data; and generating the structured image collection based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related. Moreover, the computer program in the computer program product may be further executed by the at least one processor for performing any other operations of the methods for providing a structured image collection according to the embodiments of the present disclosure as mentioned above.
The embodiments of the present disclosure may be embodied in a non-transitory computer-readable medium. The non-transitory computer readable medium may include instructions that, when executed, cause one or more processors to perform any steps/processes of the method for providing a structured image collection according to embodiments of the disclosure described above.
It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or orders of these operations, and should cover all other equivalents under the same or similar concepts.
It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
Processors have been described in connection with various apparatuses and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether such processors are implemented as hardware or software will depend upon the particular application and overall design constraints imposed on the system. By way of example, a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with a microprocessor, microcontroller, digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuits, and other suitable processing components configured to perform the various functions described in the present disclosure. The functionality of a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with software being executed by a microprocessor, microcontroller, DSP, or other suitable platform.
Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, threads of execution, procedures, functions, etc. The software may reside on a computer-readable medium. A computer-readable medium may include, by way of example, memory such as a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk, a smart card, a flash memory device, random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, or a removable disk. Although a memory is shown as being separate from the processor in various aspects presented in this disclosure, the memory may also be internal to the processor (e.g., a cache or a register).
Moreover, the articles “a” and “an” as used in this specification and the appended claims should generally be construed to mean “one” or “one or more” unless specified otherwise or clear from the context to be directed to a singular form.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are intended to be encompassed by the claims.

Claims (20)

  1. A method for providing a structured image collection, comprising:
    obtaining user data;
    determining a user intent based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent;
    determining a target query based on the user data; and
    generating the structured image collection based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
  2. The method of claim 1, wherein
    the sequential result intent indicates an intent of viewing image results in a manner of sequential image collection; and
    the grouped result intent indicates an intent of viewing image results in a manner of grouped image collection.
  3. The method of claim 1, wherein
    in response to determining the user intent as the sequential result intent, the structured image collection is generated as a sequential image collection in which the plurality of images are ordered by image contents, and
    in response to determining the user intent as the grouped result intent, the structured image collection is generated as a grouped image collection in which the plurality of images are multiple groups of images that are divided by image contents.
  4. The method of claim 1, wherein
    the user data comprises a search query and the determining a target query comprises determining the search query as the target query; or
    the user data comprises a user profile and the determining a target query comprises determining an interested topic extracted from the user profile as the target query, wherein the user profile comprises at least one of browsing history of a user, search history of the user, topics specified by the user, and common topics from other users.
  5. The method of claim 1, wherein in response to determining the user intent as the sequential result intent, the generating the structured image collection comprises:
    generating a list of image descriptions based on the target query through an image description generating model;
    generating multiple images based on the list of image descriptions through a text-to-image generating model, respectively; and
    arranging the multiple images, that correspond to the list of image descriptions respectively, as a sequential image collection according to the sequential result intent.
  6. The method of claim 1, wherein in response to determining the user intent as the grouped result intent, the generating the structured image collection comprises:
    breaking the target query into multiple sub-queries through a query breaking model;
    for each of the multiple sub-queries:
      generating a list of image descriptions based on the sub-query through an image description generating model; and
      generating a group of images based on the list of image descriptions through a text-to-image generating model, respectively; and
    arranging multiple groups of images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
  7. The method of claim 5 or 6, wherein the list of image descriptions defines the same attributes of image entities and/or the same image style.
  8. The method of claim 1, wherein in response to determining the user intent as the sequential result intent, the generating the structured image collection comprises:
    obtaining a search result for the target query through an image search engine;
    generating multiple image lists based on the search result;
    selecting an image list, among the multiple image lists, that is most relevant to the target query; and
    arranging the image list as a sequential image collection according to the sequential result intent.
  9. The method of claim 8, wherein
    the search result comprises highest-ranked N images, and
    the generating multiple image lists comprises, for each of the highest-ranked N images:
      obtaining a webpage from which the image originates; and
      extracting original ranked images from the webpage to form an image list.
  10. The method of claim 8, wherein
    the search result comprises highest-ranked N webpages, and
    the generating multiple image lists comprises, for each of the highest-ranked N webpages:
      extracting original ranked images from the webpage to form an image list.
  11. The method of claim 8, wherein the selecting an image list comprises:
    for each of the multiple image lists, generating, through a list scoring model, a score for the image list based on a plurality of images in the image list and texts associated with the plurality of images; and
    selecting the image list with the highest score based on multiple scores corresponding to the multiple image lists respectively.
  12. The method of claim 1, wherein in response to determining the user intent as the grouped result intent, the generating the structured image collection comprises:
    breaking the target query into multiple sub-queries through a query breaking model;
    for each of the multiple sub-queries, obtaining highest-ranked N images through an image search engine; and
    arranging multiple groups of N images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
  13. The method of claim 1, wherein in response to determining the user intent as the grouped result intent, the generating the structured image collection comprises:
    for the target query, obtaining highest-ranked N images through an image search engine;
    breaking the target query into multiple sub-queries through a query breaking model;
    dividing the highest-ranked N images into multiple groups of images by mapping the highest-ranked N images to the multiple sub-queries; and
    arranging the multiple groups of images, that correspond to the multiple sub-queries respectively, as a grouped image collection according to the grouped result intent.
  14. The method of claim 1, wherein
    the user data comprises current user data, and
    the structured image collection is to be presented on a user interface, and/or pre-stored, together with a corresponding search index, in an image library.
  15. The method of claim 1, wherein
    the user data comprises historical user data, and
    the structured image collection and a corresponding search index are pre-stored in an image library.
  16. The method of claim 15, further comprising:
    obtaining current user data;
    determining a current user intent based on the current user data through the intent classifier;
    determining a current target query based on the current user data;
    searching for a search index corresponding to the current user data from a plurality of search indexes in the image library based on the current target query and the current user intent; and
    retrieving a structured image collection corresponding to the search index from the image library.
  17. An apparatus for providing a structured image collection, comprising:
    at least one processor; and
    a memory storing computer-executable instructions that, when executed, cause the at least one processor to:
      obtain user data;
      determine a user intent based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent;
      determine a target query based on the user data; and
      generate the structured image collection based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
  18. The apparatus of claim 17, wherein
    the sequential result intent indicates an intent of viewing image results in a manner of sequential image collection; and
    the grouped result intent indicates an intent of viewing image results in a manner of grouped image collection.
  19. The apparatus of claim 17, wherein
    in response to determining the user intent as the sequential result intent, the structured image collection is generated as a sequential image collection in which the plurality of images are ordered by image contents, and
    in response to determining the user intent as the grouped result intent, the structured image collection is generated as a grouped image collection in which the plurality of images are multiple groups of images that are divided by image contents.
  20. A non-transitory computer-readable medium, comprising instructions that, when executed, cause at least one processor to:
    obtain user data;
    determine a user intent based on the user data through an intent classifier, wherein the user intent comprises at least a sequential result intent or a grouped result intent;
    determine a target query based on the user data; and
    generate the structured image collection based on the target query and the user intent, wherein the structured image collection comprises a plurality of images that are structurally related.
PCT/CN2023/126349 2023-10-25 2023-10-25 Providing a structured image collection Pending WO2025086118A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/126349 WO2025086118A1 (en) 2023-10-25 2023-10-25 Providing a structured image collection

Publications (1)

Publication Number Publication Date
WO2025086118A1 (en) 2025-05-01

Family

ID=88965020

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/126349 Pending WO2025086118A1 (en) 2023-10-25 2023-10-25 Providing a structured image collection

Country Status (1)

Country Link
WO (1) WO2025086118A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150169635A1 (en) * 2009-09-03 2015-06-18 Google Inc. Grouping of image search results

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23812845

Country of ref document: EP

Kind code of ref document: A1