CN117853600A - Image generation method, device, electronic equipment and storage medium

Info

Publication number: CN117853600A
Application number: CN202410023820.6A
Authority: CN
Inventors: 关天梦; 陈璇; 朱宏; 辛永正; 魏锦; 张久金; 苏文嗣; 佘俏俏; 刘红星
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2024-01-05
Filing date: 2024-01-05
Publication date: 2024-04-09

Abstract

The disclosure provides an image generation method and device, an electronic device, and a storage medium. It relates to the technical field of artificial intelligence, and in particular to fields such as natural language processing, computer vision, and deep learning. The image generation method comprises: acquiring first description data input by a user in a current round of dialogue, wherein the first description data describes a first image to be generated; generating the first image based on the first description data; and outputting the first image and at least one control for the first image as response data for the current round of dialogue, wherein the at least one control corresponds respectively to at least one action for the first image, and any one of the at least one control is configured to perform the corresponding action on the first image in response to the user's operation on that control.

Description

Image generation method, device, electronic device and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence technology, in particular to technical fields such as natural language processing, computer vision, and deep learning, and specifically to an image generation method and device, an electronic device, a computer-readable storage medium, and a computer program product.

Background

Artificial intelligence (AI) is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning). It includes both hardware-level and software-level technologies. AI hardware technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, and big data processing; AI software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technology.

A large language model (LLM, also called a large model) is a deep learning model trained on a large amount of text data, which can generate natural language text or understand its meaning. Large language models can handle a variety of natural language tasks, such as dialogue, text classification, and text generation, and are an important path toward artificial intelligence. Some large language models also have multimodal data processing capabilities, for example the ability to process text, images, and video.

The methods described in this section are not necessarily methods that have been previously conceived or employed. Unless otherwise indicated, it should not be assumed that any method described in this section is prior art merely because it is included in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered to have been recognized in any prior art.

Summary of the Invention

The present disclosure provides an image generation method and device, an electronic device, a computer-readable storage medium, and a computer program product.

According to one aspect of the present disclosure, an image generation method is provided, comprising: obtaining first description data input by a user in a current round of dialogue, wherein the first description data is used to describe a first image to be generated; generating the first image based on the first description data; and outputting the first image and at least one control for the first image as response data for the current round of dialogue, wherein the at least one control corresponds respectively to at least one action for the first image, and any one of the at least one control is configured to perform the corresponding action on the first image in response to the user's operation on that control.

According to one aspect of the present disclosure, an image generation device is provided, comprising: a first acquisition module configured to acquire first description data input by a user in a current round of dialogue, wherein the first description data is used to describe a first image to be generated; a first generation module configured to generate the first image based on the first description data; and a first output module configured to output the first image and at least one control for the first image as response data for the current round of dialogue, wherein the at least one control corresponds respectively to at least one action for the first image, and any one of the at least one control is configured to perform the corresponding action on the first image in response to the user's operation on that control.

According to one aspect of the present disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the above method.

According to one aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to execute the above method.

According to one aspect of the present disclosure, a computer program product is provided, comprising computer program instructions which, when executed by a processor, implement the above method.

According to one or more embodiments of the present disclosure, conversational image generation can be achieved and graphical-interface interaction by the user is supported, thereby reducing operation complexity and improving image generation efficiency.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

The accompanying drawings illustrate embodiments by way of example and constitute a part of the specification. Together with the text of the specification, they serve to explain exemplary implementations of the embodiments. The embodiments shown are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements.

FIG. 1 shows a schematic diagram of an exemplary system in which various methods described herein may be implemented according to an embodiment of the present disclosure;

FIG. 2 shows a flowchart of an image generation method according to an embodiment of the present disclosure;

FIGS. 3A-3P show schematic diagrams of a dialogue interface according to an embodiment of the present disclosure;

FIG. 4 shows a structural block diagram of an image generation device according to an embodiment of the present disclosure; and

FIG. 5 shows a structural block diagram of an exemplary electronic device that can be used to implement an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.

In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional, temporal, or importance relationship of these elements; such terms are only used to distinguish one element from another. In some examples, the first element and the second element may refer to the same instance of the element, while in other cases, depending on the context, they may refer to different instances.

The terms used in the description of the various examples in this disclosure are only for the purpose of describing specific examples and are not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not specifically limited, there may be one or more of that element. In addition, the term "and/or" used in this disclosure covers any one of the listed items and all possible combinations of them. "Multiple" means two or more.

In the technical solutions of the present disclosure, the acquisition, storage, and use of any user personal information involved comply with the relevant laws and regulations and do not violate public order and good morals.

Image generation tasks can be divided into two categories: image creation and image editing. Image creation refers to creating an entirely new image according to the user's requirements. Image editing refers to making specified changes to an existing image (e.g., an image that has already been created) to obtain a new image.

In the related art, traditional image generation tools, such as the drawing software CorelDRAW and Photoshop, are often used to generate images. These tools have a high barrier to use and require users to undergo special training. In addition, their operation process is complex and cumbersome, which makes image generation inefficient and costly and makes it difficult to meet user needs.

AIGC (AI Generated Content) technology has shown great potential in image generation tasks. The current mainstream AI image generation technology is text-to-image generation: a prompt text given by the user is input into a text-to-image model, and the model outputs the generated image. Although text-to-image technology is considerably more efficient than traditional image generation tools, its results depend on the quality of the prompt text given by the user. Users usually need to revise the prompt text several times to obtain a relatively satisfactory result, so the barrier to using this technology is still high. In addition, this technology can only perform a single image generation and cannot achieve continuous image generation (for example, first creating an image and then editing that image). Therefore, the image generation efficiency of text-to-image technology is still low, and it is difficult to meet user needs.

As can be seen from the above, the image generation solutions in the related art are not general-purpose: the barrier to use is high, the operation process is complicated and cumbersome, continuous image generation cannot be achieved, image generation efficiency is low, and it is difficult to meet user needs.

In view of the above problems, the embodiments of the present disclosure provide a conversational image generation method that integrates a graphical interface. A user can complete image generation simply by conversing with the AI image generation system, which greatly reduces operation complexity and improves the efficiency of image generation.

In each round of dialogue, the user's image generation requirements are understood based on the first description data input by the user, a first image is generated, and the generated first image and at least one control are returned to the user as the response data of the current round of dialogue. The user can further process the first image by operating the controls, so that continuous image generation can be completed within the same dialogue interface. This simplifies operation, improves image generation efficiency, and thus improves the user experience.

The embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

FIG. 1 shows a schematic diagram of an exemplary system 100 in which various methods and apparatuses described herein may be implemented according to an embodiment of the present disclosure. Referring to FIG. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. The client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.

In an embodiment of the present disclosure, the client devices 101, 102, 103, 104, 105, and 106 and the server 120 may run one or more services or software applications that enable execution of the image generation method.

In some embodiments, the server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example to users of the client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.

In the configuration shown in FIG. 1, the server 120 may include one or more components that implement the functions performed by the server 120. These components may include software components executable by one or more processors, hardware components, or a combination thereof. Users operating the client devices 101, 102, 103, 104, 105, and/or 106 may in turn use one or more client applications to interact with the server 120 and make use of the services provided by these components. It should be understood that a variety of system configurations are possible, which may differ from the system 100. Therefore, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.

The client devices 101, 102, 103, 104, 105, and/or 106 may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although FIG. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.

The client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general-purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, vehicle-mounted devices, game systems, thin clients, various messaging devices, and sensors or other sensing devices. These computer devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, and Linux or Linux-like operating systems, or various mobile operating systems, such as Microsoft Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular phones, smartphones, tablet computers, personal digital assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. Game systems may include various handheld game devices, Internet-enabled game devices, and the like. The client devices are capable of executing a variety of applications, such as various Internet-related applications, communication applications (such as email applications), and short message service (SMS) applications, and may use various communication protocols.

The network 110 may be any type of network known to those skilled in the art that can support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, and IPX. By way of example only, the one or more networks 110 may be a local area network (LAN), an Ethernet-based network, a token ring, a wide area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a blockchain network, a public switched telephone network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.

The server 120 may include one or more general-purpose computers, dedicated server computers (e.g., PC (personal computer) servers, UNIX servers, mid-range servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running virtual operating systems, or other computing architectures involving virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.

The computing units in the server 120 may run one or more operating systems, including any of the operating systems mentioned above and any commercially available server operating system. The server 120 may also run any of a variety of additional server applications and/or middle-tier applications, including HTTP servers, FTP servers, CGI servers, Java servers, database servers, and the like.

In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and/or 106. The server 120 may also include one or more applications to display the data feeds and/or real-time events via one or more display devices of the client devices 101, 102, 103, 104, 105, and/or 106.

In some implementations, the server 120 may be a server of a distributed system, or a server combined with a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and virtual private server (VPS) services.

The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The databases 130 may reside in various locations. For example, a database used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with the server 120 via a network-based or dedicated connection. The databases 130 may be of different types. In some embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.

In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the applications may be of different types, such as a key-value store, an object store, or a conventional store backed by a file system.

The system 100 of FIG. 1 may be configured and operated in various ways to enable application of the various methods and apparatuses described in the present disclosure.

According to some embodiments, the client devices 101-106 can execute the image generation method of the embodiments of the present disclosure to provide the user with an immersive image generation service. Specifically, the user can input the query data for each round of dialogue by operating the client devices 101-106 (for example, by operating input devices such as a mouse or a touch screen) to express his or her image generation needs. By executing the image generation method of the embodiments of the present disclosure, the client devices 101-106 generate the image desired by the user and output the image and the corresponding controls to the user (for example, through a display) as the response data of the current round of dialogue.

According to some embodiments, the server 120 may also execute the image generation method according to the embodiments of the present disclosure. Specifically, the user may input the query data for each round of dialogue by operating the client devices 101-106 (for example, by operating input devices such as a mouse or a touch screen) to express his or her image generation needs. The client devices 101-106 send the user's query data for each round of dialogue to the server 120. By executing the image generation method of the embodiments of the present disclosure, the server 120 generates the image desired by the user and outputs the image and the corresponding controls to the client devices 101-106 as the response data of the current round of dialogue. The client devices 101-106 then output the response data to the user (for example, through a display).
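By way of a non-limiting illustration, the client/server exchange described above can be sketched in Python as follows. The JSON field names ("round", "query", "image_b64", "controls"), the function names, the control descriptor, and the placeholder image bytes are all assumptions made for this sketch and do not appear in the disclosure; the network transport is replaced by a direct function call.

```python
import base64
import json
from typing import Callable


def server_handle(request_json: str) -> str:
    """Server side: parse the query, 'generate' an image, return response data."""
    request = json.loads(request_json)
    # Placeholder bytes stand in for the output of a real image generation model.
    image_bytes = f"<image for: {request['query']}>".encode("utf-8")
    response = {
        "round": request["round"],
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        # Hypothetical control descriptor; a client could render it as a button.
        "controls": [{"id": "regenerate", "label": "Regenerate"}],
    }
    return json.dumps(response)


def client_send(query: str, round_index: int,
                transport: Callable[[str], str] = server_handle) -> dict:
    """Client side: send the query data of one round and decode the response."""
    request_json = json.dumps({"round": round_index, "query": query})
    response = json.loads(transport(request_json))
    # Decode the image so the client can display it next to the controls.
    response["image"] = base64.b64decode(response.pop("image_b64"))
    return response


if __name__ == "__main__":
    reply = client_send("a big white rabbit on the grass by a stream", 1)
    print(reply["round"], [c["id"] for c in reply["controls"]])
```

In an actual deployment the "transport" argument would be replaced by a call over one of the networks 110; passing it as a parameter keeps the sketch runnable without a network.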

FIG. 2 shows a flowchart of an image generation method 200 according to an embodiment of the present disclosure. As described above, the method 200 may be executed by a client device, such as the client devices 101-106 shown in FIG. 1, or by a server, such as the server 120 shown in FIG. 1.

As shown in FIG. 2, the method 200 includes steps S210-S230.

In step S210, first description data input by the user in the current round of dialogue is obtained. The first description data is used to describe the first image to be generated.

In step S220, the first image is generated based on the first description data.

In step S230, the first image and at least one control for the first image are output as response data for the current round of dialogue. The at least one control corresponds respectively to at least one action for the first image, and any one of the at least one control is configured to perform the corresponding action on the first image in response to the user's operation on that control.
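Purely as an illustrative sketch, steps S210-S230 might be arranged in Python as shown below. The class names "Control" and "ResponseData", the function names, and the two example actions ("regenerate" and "mark_edited") are hypothetical and are not taken from the disclosure; the image generator is a placeholder that returns tagged bytes rather than a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Control:
    """A control bound to one action that can be performed on an image."""
    label: str
    action: Callable[[bytes], bytes]

    def operate(self, image: bytes) -> bytes:
        """Perform the bound action in response to the user's operation."""
        return self.action(image)


@dataclass
class ResponseData:
    """Response data of one round: the generated image plus its controls."""
    image: bytes
    controls: List[Control]


def generate_image(description: str) -> bytes:
    # Placeholder for a real image generation model.
    return f"<image for: {description}>".encode("utf-8")


def run_round(first_description: str) -> ResponseData:
    # S210: the first description data is obtained (here, passed in as text).
    # S220: generate the first image based on the first description data.
    first_image = generate_image(first_description)
    # S230: output the image together with at least one control, each
    # control corresponding to one (hypothetical) action on the image.
    controls = [
        Control("regenerate", lambda _img: generate_image(first_description)),
        Control("mark_edited", lambda img: img + b" [edited]"),
    ]
    return ResponseData(first_image, controls)
```

Binding each control to a callable keeps the one-to-one correspondence between controls and actions explicit: operating a control simply invokes its action on the image.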

According to an embodiment of the present disclosure, a conversational image generation method integrating a graphical interface is provided. A user can complete image generation by conversing with the AI image generation system and interacting with the graphical interface, which greatly reduces operation complexity and improves the efficiency of image generation.

Each step of the method 200 is described in detail below.

In step S210, the first description data input by the user in the current round of dialogue is obtained.

In the embodiments of the present disclosure, a dialogue refers to an interactive process in which the user inputs a query and the AI image generation system outputs a response. Depending on the number of interactions between the user and the AI image generation system, dialogues can be divided into single-round dialogues and multi-round dialogues. In a single-round dialogue, the user interacts with the AI image generation system only once: after the user inputs a query and receives the response output by the system, the dialogue ends. In a multi-round dialogue, the user interacts with the AI image generation system multiple times. Each interaction is called a "round" of the dialogue and includes the query input by the user and the response output by the system for that query.
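For illustration only, the notion of a dialogue made up of rounds, each pairing a query with a response, can be modeled with a small Python data structure. The names "Round", "Dialogue", "ask", and "echo_system" are assumptions of this sketch, and the responder is a stand-in for the AI image generation system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Round:
    """One round: the user's query and the system's response to it."""
    query: str
    response: str


@dataclass
class Dialogue:
    """A dialogue is an ordered list of rounds."""
    rounds: List[Round] = field(default_factory=list)

    def ask(self, query: str, system: Callable[[str], str]) -> Round:
        """Run one round: pass the query to the system and record the reply."""
        new_round = Round(query=query, response=system(query))
        self.rounds.append(new_round)
        return new_round

    @property
    def is_multi_round(self) -> bool:
        """True once the user has interacted with the system more than once."""
        return len(self.rounds) > 1


def echo_system(query: str) -> str:
    # Stand-in responder used only to make the sketch runnable.
    return f"response to: {query}"
```

Under this model the "current round" is simply whichever element of "rounds" is being processed, which matches the statement that it may be any round of the dialogue.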

The current round of dialogue may be any round in the dialogue; for example, it may be the first round, the second round, the third round, and so on.

The first description data is used to describe the first image to be generated, that is, to describe the user's image generation requirements. For example, the first description data may describe the style, color, and size of the image the user expects to generate, the elements included in the image, and so on.

The first description data may include data of any modality, such as text, images, or voice. According to some embodiments, the first description data may include only a first description text. According to some embodiments, the first description data may include both a first description text and a reference image. According to some embodiments, the first description data may include only a first voice input. Where the first description data includes voice, the voice may be converted into text by speech recognition technology, thereby simplifying subsequent data processing steps and improving computing efficiency.
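As a minimal sketch of the modality handling described above, the first description data could be held in a record with optional text, reference image, and voice fields, with voice converted to text before further processing. The names "DescriptionData", "transcribe", and "normalize" are hypothetical, and "transcribe" is only a stand-in that decodes bytes as UTF-8; a real system would call a speech recognition model instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DescriptionData:
    """First description data; any combination of modalities may be present."""
    text: Optional[str] = None
    reference_image: Optional[bytes] = None
    voice: Optional[bytes] = None


def transcribe(voice: bytes) -> str:
    # Stand-in for speech recognition: treats the bytes as UTF-8 text.
    return voice.decode("utf-8")


def normalize(data: DescriptionData) -> DescriptionData:
    """Convert any voice input to text so later steps handle text and image only."""
    if data.voice is None:
        return data
    spoken = transcribe(data.voice)
    # If text was also supplied, keep it and append the transcribed speech.
    text = spoken if data.text is None else f"{data.text} {spoken}"
    return DescriptionData(text=text, reference_image=data.reference_image)
```

Normalizing early means downstream generation code only needs to handle text with an optional reference image, which is one way to realize the simplification mentioned above.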

FIG. 3A shows a schematic diagram of a dialogue interface according to an embodiment of the present disclosure. The dialogue interface includes an input box 301. The user can enter a first description text, "Help me draw a big white rabbit on the grass by the stream, in a cartoon style", in the input box 301 and send it to the AI image generation system (also referred to as the "AI painting assistant") by clicking button 302. The dialogue interface after the first description text is sent is shown in FIG. 3B. In the dialogue interface shown in FIG. 3B, U denotes the user and AI denotes the AI image generation system.

Returning to FIG. 3A, in addition to entering the first description text "Help me draw a big white rabbit on the grass by the stream, in a cartoon style" in the input box 301, the user can also input a reference image by clicking button 303. As shown in FIG. 3A, when the user has not input a reference image, button 303 displays the words "Upload reference image" to prompt the user to do so. After the user inputs a reference image, button 303 displays a thumbnail of that image, as shown in FIG. 3C.

In the dialogue interface shown in FIG. 3C, the user can click button 302 to send the first description text and the reference image together, as the first description data, to the AI image generation system. The dialogue interface after the first description data is sent is shown in FIG. 3D. Compared with the dialogue interface shown in FIG. 3B, the dialogue interface shown in FIG. 3D additionally contains a bubble 305, which indicates that the user has input a reference image. When the user interacts with bubble 305 (for example, by clicking it or hovering the mouse pointer over it), a thumbnail of the reference image pops up above bubble 305.

In the dialogue interfaces shown in FIGS. 3A-3D, the user can end the current dialogue and start a new one by clicking button 304.

In some cases, the user may not be sure what kind of image they want to generate, that is, how the first description text should be worded. To address this, according to some embodiments, before the user enters the first description text, the AI image generation system can proactively recommend a second image to the user and output a second description text describing that second image. This provides the user with creative inspiration and guides the user to enter the first description text based on the second description text.

The second image may be determined by any recommendation algorithm. There may be one or more second images.

According to some embodiments, the second image and its second description text may be output together, without requiring any additional operation by the user, to provide the user with creative inspiration.

According to some embodiments, only the second image may be output. In the initial state, the description text of the second image either has not been acquired or has been acquired but not output; that is, it is not visible unless the user performs an additional operation. In response to the user selecting a particular second image, the second description text of that second image is output. According to this embodiment, only the second description text of the second image that the user is interested in is displayed. This avoids overwhelming the user with redundant information, which could hinder rather than help inspiration, and saves display space in the dialogue interface.

FIGS. 3E-3H show schematic diagrams of a dialogue interface that provides creative inspiration to the user according to an embodiment of the present disclosure.

As shown in FIG. 3E, when the user enters the dialogue interface for the first time or starts a new dialogue, second images 306-309 are proactively recommended to the user. At this point, the second description text of each second image is not visible. If the user is not satisfied with the currently recommended second images 306-309, they can obtain new second images by clicking the "Change a batch" label in the message.

When the user selects the second image 306 (for example, by clicking it or by hovering the mouse pointer over it for longer than a threshold time), a button 310 for obtaining the second description text of that image is displayed in the second image 306, as shown in FIG. 3F. The user can obtain the second description text of the second image 306 by clicking button 310.

After the user clicks button 310, the second description text corresponding to the second image 306, "little white rabbit, grass, streamside, clouds, tufts of grass, low-saturation atmosphere, exquisite details, fine line drawing, wide-angle lens, comic style", is obtained and filled into the input box 301, as shown in FIG. 3G. The user can click the send button 302 directly to send this second description text to the AI image generation system as the first description text describing their image generation requirement. Alternatively, the user can modify the second description text to obtain the first description text, as shown in FIG. 3H.

It should be noted that the user's input data in the current round of dialogue does not always express an image generation requirement. It may express other requirements, such as casual conversation or a question about how to use the functions of the AI image generation system. It will be understood that the input data constitutes first description data describing the first image to be generated only when it expresses an image generation requirement. If the user's input data expresses a requirement other than image generation, it is not first description data.

According to some embodiments, intent recognition may be performed on the user's input data in the current round of dialogue to determine whether the user currently has an image generation requirement, that is, whether the current input data is first description data describing a first image to be generated.

Specifically, the user's input data in the current round of dialogue and the historical dialogue data from previous rounds (including historical user inputs and historical system responses) may be obtained. The user's intention in the current round is then identified based on the input data and the historical dialogue data. In response to the intention being an image generation intention, the input data of the current round is used as the first description data.

According to the above embodiments, the user's requirement can be accurately identified and handled according to its type. For example, an image generation requirement triggers image generation processing, while any other requirement is answered according to a corresponding reply strategy. This makes image generation more targeted and accurate and avoids unnecessary image generation processing, thereby improving the user's image generation experience.

According to some embodiments, a trained large language model can be used to identify the user's intent. For example, the user's input data in the current round of dialogue and the historical dialogue data can be filled into a preset prompt template to obtain query data. The query data is then input into the large language model to obtain the intent recognition result output by the model.
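The prompt-filling step can be sketched as follows. The template wording, the two labels, and the `llm` callable are all assumptions made for illustration; the disclosure does not specify the template or the model interface.

```python
from typing import Callable, List, Tuple

# Hypothetical preset prompt template with slots for history and current input.
PROMPT_TEMPLATE = (
    "Dialogue history:\n{history}\n"
    "Current user input: {current}\n"
    "Answer with exactly one label: image_generation or other."
)


def build_query(history: List[Tuple[str, str]], current: str) -> str:
    """Fill the current input and the historical rounds into the template."""
    lines = [f"User: {q}\nSystem: {a}" for q, a in history]
    return PROMPT_TEMPLATE.format(history="\n".join(lines) or "(none)",
                                  current=current)


def recognize_intent(history: List[Tuple[str, str]], current: str,
                     llm: Callable[[str], str]) -> str:
    """Ask the model for a label; anything unexpected is treated as 'other'."""
    answer = llm(build_query(history, current)).strip().lower()
    return "image_generation" if answer == "image_generation" else "other"
```

Treating any unrecognized model output as `other` is a conservative choice in this sketch: it avoids triggering image generation when the model's answer is ambiguous.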

According to some embodiments, when the user has a requirement other than image generation, the response data for the current round of dialogue can be generated according to a corresponding reply strategy. For example, a large language model can be trained (fine-tuned) using preset question-and-answer templates, and the trained model can then be used to generate an answer to the user's question, which is returned to the user. As another example, a help document for the AI image generation system can be prepared in advance. When the user asks how to use the AI image generation system, the help document can be searched to obtain an answer to the user's question, which is returned to the user.
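As one possible illustration of the help-document strategy, the sketch below answers a question by word overlap against a small set of entries. The entries and the overlap scoring are assumptions; the disclosure only says the help document is searched and does not specify a retrieval method.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical help-document entries (question -> answer).
HELP_DOC: Dict[str, str] = {
    "How do I upload a reference image?":
        "Click the upload button next to the input box.",
    "How do I start a new dialogue?":
        "Click the new-dialogue button.",
}


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def answer_from_help(question: str,
                     help_doc: Dict[str, str] = HELP_DOC) -> Optional[str]:
    """Return the answer whose entry shares the most words with the question."""
    asked = _words(question)
    best_answer, best_score = None, 0
    for entry, answer in help_doc.items():
        score = len(asked & _words(entry))
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer  # None when nothing overlaps
```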

FIG. 3I shows a schematic diagram of a dialogue interface that replies to a user requirement other than image generation. As shown in FIG. 3I, the user's input data in the current round of dialogue is "Who are you? What can you do?". The AI image generation system uses a large language model to generate the answer "I am an AI painting assistant. I can create works for you based on what you enter, and revise or redraw them according to your feedback. What kind of work would you like to draw? Just tell me. If you don't know what to draw, I can also offer you some inspiration~", which is output to the user as the response data of the current round of dialogue.

After intent recognition determines that the user's intention in the current round of dialogue is an image generation intention, the input data of the current round can be used as the first description data describing the first image that the user expects to generate, and step S220 is executed to generate the first image based on the first description data. There may be multiple first images.

According to some embodiments, step S220 may include: determining, based on the data modality of the first description data, an image generation model for generating the first image; and inputting the first description data into that image generation model to obtain the first image output by the model.

Image generation models include text-to-image models (whose input data is text), image-to-image models (whose input data is an image), mixed text-and-image-to-image models (whose input data is text and an image), and so on. In step S220, the image generation model whose input modality matches the modality of the first description data is selected to generate the first image.

According to the above embodiments, the generation capability of the selected image generation model matches the user's requirement, which helps ensure the accuracy of image generation.
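The modality-based selection in step S220 amounts to a small dispatch, sketched below. The returned model names are placeholders chosen for this example rather than identifiers from the disclosure.

```python
from typing import Optional


def select_model(text: Optional[str], image: Optional[bytes]) -> str:
    """Pick the image generation model whose input modality matches the data."""
    if text and image:
        return "text_and_image_to_image"
    if text:
        return "text_to_image"
    if image:
        return "image_to_image"
    raise ValueError("description data contains neither text nor an image")
```

Voice input is not handled here because, as described earlier, it can first be converted to text.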

According to some embodiments, the first description data includes a first description text. Accordingly, step S220 may include steps S221 and S222.

In step S221, the first description text is rewritten to generate second description data. The second description data includes the rewritten first description text.

In step S222, the first image is generated based on the second description data.

According to the above embodiment, the user only needs to express their image generation requirement in everyday language and does not need to describe every detail of the image. The AI image generation system can intelligently rewrite the user's natural language instruction, fully understand and distill the user's image generation requirement, and generate richer and more diverse first images that meet that requirement, which improves the accuracy of image generation.

According to some embodiments, step S221 may further include steps S2211-S2213.

In step S2211, the clarity of the first description text is determined.

In step S2212, a rewriting strategy for the first description text is determined based on the clarity of the first description text.

In step S2213, the first description text is rewritten using the rewriting strategy to obtain the rewritten first description text.

According to the above embodiment, the user only needs to express their image generation requirement in everyday language and does not need to describe every detail of the image. The AI image generation system can rewrite first description texts to different, targeted degrees depending on how clear they are, accurately understand the user's image generation requirement, and generate diverse first images that meet that requirement, which improves the accuracy of image generation.

According to some embodiments, in step S2211, a trained AI model may be used to determine the clarity of the first description text. The AI model may be, for example, a binary classification model (which judges whether the first description text is clear), a regression model (which outputs the probability that the first description text is clear), a large language model, or the like.

Different degrees of clarity correspond to different rewriting strategies. A rewriting strategy can be implemented as a trained AI rewriting model, and different AI rewriting models can rewrite the first description text to different degrees. According to some embodiments, the correspondence between clarity and AI rewriting models can be configured in advance, and the AI rewriting model corresponding to the clarity of the first description text is used to rewrite it, yielding the rewritten first description text.

For example, when the first description text is clear, a first rewriting model can be used to rewrite it lightly, adjusting only style and modifiers. The first description text "Draw a cat", for instance, contains a clear subject (a cat); rewriting it with the first rewriting model may yield texts such as "a cute British Shorthair rolling on the floor", "a tabby cat standing on the table in a daze", or "a cute little white cat sound asleep on the carpet". When the first description text is unclear, a second rewriting model can be used to rewrite it deeply and more substantially, for example by adding a subject or describing a concrete scene. The first description text "Draw a painting about how a bosom friend is hard to find", for instance, describes abstract content; rewriting it deeply with the second rewriting model may yield texts such as "a bosom friend is hard to find: a scholar plays the qin alone, with only the cold moonlight and a solitary mountain in the distance before him, oil painting style" or "at the center of the picture are two ancient figures in flowing robes, sitting face to face, playing the qin and talking freely, surrounded by green mountains and clear waters, ink painting style".
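Steps S2211-S2213 can be sketched as a dispatch on a clarity score. The 0.5 threshold, the two-strategy split, and the callables `score_clarity`, `light`, and `deep` are assumptions for illustration; in the disclosure these roles are played by trained models whose interfaces are not specified.

```python
from typing import Callable

Rewriter = Callable[[str], str]


def choose_rewriter(clarity: float, light: Rewriter, deep: Rewriter,
                    threshold: float = 0.5) -> Rewriter:
    """S2212: clear texts get a light rewrite, unclear texts a deep one."""
    return light if clarity >= threshold else deep


def rewrite(text: str, score_clarity: Callable[[str], float],
            light: Rewriter, deep: Rewriter, threshold: float = 0.5) -> str:
    clarity = score_clarity(text)                                   # S2211
    rewriter = choose_rewriter(clarity, light, deep, threshold)     # S2212
    return rewriter(text)                                           # S2213
```

A finer-grained mapping, with more than two clarity bands and a rewriter per band, would follow the same pattern.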

According to some embodiments, in step S221, a trained language model may be used to rewrite the first description text. The language understanding and generation capabilities of a large language model can thus be used to rewrite the text intelligently, understand the user's image generation requirement in depth, and keep the rewritten text consistent with that requirement, improving both the efficiency and the accuracy of rewriting.

According to some embodiments, in step S222, an image generation model for generating the first image can be determined based on the data modality of the second description data. The second description data is then input into the image generation model to obtain the first image output by the model. According to this embodiment, the generation capability of the image generation model matches the user's requirement, which helps ensure the accuracy of image generation. Moreover, because the second description data has been rewritten and optimized, a first image generated from the second description data meets the user's requirement better than one generated from the first description data, which further improves the accuracy of image generation.

According to some embodiments, after the first image is generated in step S220, the first image and at least one control for the first image are output in step S230 as the response data of the current round of dialogue. The at least one control corresponds respectively to at least one action for the first image. Each control is configured to perform the corresponding action on the first image in response to the user's interaction with that control (for example, clicking, dragging, or hovering the mouse pointer). The user can thus process the first image further simply by operating a control, which greatly reduces operational complexity, improves image generation efficiency, and improves the user experience.

According to some embodiments, how often users use each function of the AI image generation system can be counted, and the more frequently used functions can be configured as controls for easy access, thereby reducing operational complexity and improving image generation efficiency.
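One simple way to turn usage counts into a control set is sketched below. The usage log format (one function name per use) and the `top_n` cutoff are assumptions; the disclosure does not say how "higher frequency" is decided.

```python
from collections import Counter
from typing import Iterable, List


def pick_controls(usage_log: Iterable[str], top_n: int = 4) -> List[str]:
    """Return the names of the top_n most frequently used functions."""
    counts = Counter(usage_log)
    return [name for name, _ in counts.most_common(top_n)]
```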

According to some embodiments, the at least one control may include at least one of the following: a first control for regenerating the first image; a second control for referencing the first image; a third control for managing the first image; or a fourth control for collecting the user's feedback data on the first image.

According to the above embodiments, providing the user with different types of controls meets the user's frequent and diverse image processing needs without requiring the user to repeatedly enter natural language instructions, which greatly reduces operational complexity, improves image generation efficiency, and improves the user experience.
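The pairing of output images with controls, each bound to an action, might be modeled as follows. All names here (`Control`, `ResponseData`, `build_response`) are hypothetical, and the actions are placeholders that only record what would be done.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Control:
    label: str
    action: Callable[[str], str]  # takes an image id, returns a result description


@dataclass
class ResponseData:
    image_ids: List[str]
    controls: Dict[str, Control] = field(default_factory=dict)

    def operate(self, control_name: str, image_id: str) -> str:
        """Run the action bound to a control on one of the output images."""
        if image_id not in self.image_ids:
            raise KeyError(f"unknown image: {image_id}")
        return self.controls[control_name].action(image_id)


def _placeholder_action(name: str) -> Callable[[str], str]:
    # Stand-in for real behavior; it only reports the action and its target.
    return lambda image_id: f"{name}:{image_id}"


def build_response(image_ids: List[str]) -> ResponseData:
    # Regenerate (first control), reference (second), manage (third), feedback (fourth).
    names = ["regenerate", "reference", "save", "download", "share", "like", "dislike"]
    controls = {n: Control(label=n, action=_placeholder_action(n)) for n in names}
    return ResponseData(image_ids=list(image_ids), controls=controls)
```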

FIGS. 3J and 3K show dialogue interfaces displaying the first images and the corresponding controls. It should be understood that the controls shown in FIGS. 3J-3M are only examples. The number, function, type (for example, button, label, or slider), and layout of the controls can be set according to actual needs.

As shown in FIG. 3J, the AI image generation system has generated four first images 311-314 for the user, and controls 315-318 for the first images 311-314 are displayed in the dialogue interface. Control 315 is a reference button, used to reference the first images 311-314. Control 316 is a regenerate button, used to regenerate the first images. Controls 317 and 318 are used to collect the user's feedback data on the first images. Specifically, control 317 is a like button, used to collect the user's positive feedback on the first images, and control 318 is a dislike button, used to collect the user's negative feedback on the first images. By clicking controls 315-318, the user can trigger the corresponding actions (reference, regenerate, like, dislike).

According to some embodiments, when the user hovers the mouse pointer over a first image, one or more controls for managing that first image may pop up on it. As shown in FIG. 3K, when the user hovers the mouse pointer over the first image 311, controls 319-321 pop up on the first image 311. Control 319 is used to save the first image 311 online, control 320 is used to download the first image 311 to local storage, and control 321 is used to share the first image 311. By clicking controls 319-321, the user can trigger the corresponding actions (save, download, share).

According to some embodiments, method 200 further includes steps S240-S270.

In step S240, in response to the user's operation on the second control, the first image is referenced in the next round of dialogue.

In step S250, a third description text input by the user in the next round of dialogue is obtained, where the third description text describes the user's editing requirement for the first image.

In step S260, the first image and the third description text are used as third description data describing a fourth image to be generated in the next round of dialogue.

In step S270, the fourth image is generated based on the third description data.

According to the above embodiment, the first image to be edited can be accurately located by means of the reference control (i.e., the second control), and the editing requirement for that image can be described by a natural language instruction (i.e., the third description text). Image creation and further editing of created images can thus be carried out in the same dialogue interface, achieving efficient and smooth image generation.

According to some embodiments, in step S270, the third description text may be rewritten to generate fourth description data. The fourth description data includes the first image and the rewritten third description text. The fourth image is then generated based on the fourth description data.

The specific implementation of step S270 is similar to that of step S220 described above and is not repeated here.
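Steps S240-S270 can be sketched as follows. The `EditSession` class, the dictionary layout of the description data, and the `rewrite` and `model` callables are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class EditSession:
    referenced_image: Optional[str] = None

    def on_reference_clicked(self, image_id: str) -> None:
        # S240: the second (reference) control marks the image for the next round.
        self.referenced_image = image_id

    def build_third_description(self, third_text: str) -> Dict[str, str]:
        # S250 + S260: combine the referenced image with the user's edit request.
        if self.referenced_image is None:
            raise ValueError("no image has been referenced")
        return {"image": self.referenced_image, "text": third_text}


def generate_fourth_image(third: Dict[str, str],
                          rewrite: Callable[[str], str],
                          model: Callable[[Dict[str, str]], str]) -> str:
    # S270: rewrite the text, keep the image, then call the generation model.
    fourth = {"image": third["image"], "text": rewrite(third["text"])}
    return model(fourth)
```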

FIGS. 3L and 3M illustrate the reference-and-edit function of the AI image generation system. As shown in FIG. 3L, a reference control 315 is provided below the first images 311-314. By clicking control 315, the user can reference the first images 311-314, and the referenced content is filled into the input box 301 in the form of a bubble 322. The user can then enter a natural language instruction (i.e., the third description text), "Change the rabbit in the second picture to gray", in the input box 301. By clicking the send button, the user can send the natural language instruction and the referenced content together to the AI image generation system. As shown in FIG. 3M, the referenced content is displayed below the user's natural language instruction in the form of a bubble 323. Bubble 322 indicates that the user has referenced a previously generated image. When the user interacts with bubble 322 (for example, by clicking it or hovering the mouse pointer over it), a thumbnail of the referenced image pops up above bubble 322.

According to some embodiments, after the first image and the at least one control are output in step S230, guidance information for guiding the user into the next round of dialogue may be generated based on the first image and output. According to this embodiment, once image generation for the current round is complete, guidance information is output to suggest the user's next step, which reduces the user's operational complexity and improves image generation efficiency.

According to some embodiments, the guidance information may describe feasible operations on the first image, for example "Change to an ink painting effect" or "The first one is good, draw some more". Guidance information of this kind may, for example, be selected from multiple preset guidance information templates.

According to other embodiments, the guidance information may be a recommended description text generated from the first description text of the first image. For example, suppose the user generated the first image with the first description text "Help me draw a big white rabbit on the grass by the stream". By modifying this first description text, a new description text, "Help me draw a little rabbit, grass, streamside", can be generated. Guidance information of this kind may, for example, be generated by a large language model.

According to some embodiments, the guidance information may be output in the form of a control. By clicking the control, the user can fill the guidance information into the input box of the dialogue interface. The user may use the guidance information directly as the input data for the next round of dialogue, or modify it and use the modified data as the input data for the next round.

FIGS. 3N and 3O show schematic diagrams of guidance information according to embodiments of the present disclosure.

FIG. 3N shows four pieces of guidance information 324 in the form of bubble controls: "Change to an ink painting effect", "The first one is good, draw a few more", "Regenerate a few for me", and "Add 'detailed depiction, delicate texture'". These pieces of guidance information describe feasible operations on the first images 311-314. By clicking the guidance information "Regenerate a few for me", the user can fill it into the input box 301 as the input data for the next round of dialogue. By clicking the "Change a batch" label in the dialogue interface, the user can obtain new guidance information.

FIG. 3O shows three pieces of guidance information 325 in the form of bubble controls: "Help me draw a little rabbit, grass, streamside", "Help me draw a little rabbit, streamside, exquisite details", and "Help me draw a little rabbit, grass, streamside, exquisite details, fine line drawing, wide-angle lens". These pieces of guidance information are description texts recommended by the system. By clicking the "Change a batch" label in the dialogue interface, the user can obtain new guidance information.

According to some embodiments, after the first image is output, behavior data of the user may be obtained, and the user's satisfaction with the first image is determined based on that behavior data. In response to the user being dissatisfied with the first image, a third image to be recommended to the user is determined based on the first description data and output through the dialogue interface. In this way, when the user exhibits negative behavior, relevant images can be proactively recommended to provide creative inspiration, thereby improving the efficiency of image generation.

According to some embodiments, the behavior data of the user includes operation data of the user on the above at least one control and/or input data of the user in the next round of dialogue. The behavior data can reflect the user's satisfaction with the first image. For example, the user clicking a button such as like, save, download, or share, or entering positive feedback such as "Great, thanks~" in the next round of dialogue, indicates that the user is satisfied with the first image. The user clicking the dislike button, or entering negative feedback such as "The result is not good, I am not satisfied" in the next round of dialogue, indicates that the user is dissatisfied with the first image.

According to some embodiments, multiple kinds of behavior data of the user may be obtained and represented as a feature vector. The feature vector is input into a trained satisfaction analysis model to obtain the satisfaction output by the model. The satisfaction may, for example, take values in the range of 0 to 1.
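The mapping from behavior data to a feature vector and then to a score in the range 0 to 1 can be sketched as follows. This is a minimal illustration only: the feature names, the weights, and the use of a logistic function are assumptions standing in for the trained satisfaction analysis model, whose form the disclosure does not specify.

```python
import math

# Hypothetical behavior signals, loosely based on the examples in the text
# (like/save/download/share, dislike, positive or negative feedback text).
FEATURES = ["like", "save", "download", "share", "dislike",
            "positive_text", "negative_text"]

# Illustrative weights standing in for a trained model's parameters.
WEIGHTS = [1.5, 1.0, 1.0, 1.0, -2.0, 1.5, -2.0]
BIAS = 0.0


def to_feature_vector(behaviors):
    """Encode a set of observed behaviors as a 0/1 feature vector."""
    return [1.0 if name in behaviors else 0.0 for name in FEATURES]


def satisfaction(behaviors):
    """Return a score in (0, 1) using a logistic function over the features."""
    x = to_feature_vector(behaviors)
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))
```

With these assumed weights, a like click pushes the score above 0.5 and a dislike click pushes it below 0.5; a real system would learn the parameters from labeled behavior data.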

According to some embodiments, when the user is dissatisfied with the first image, an image that matches the first description data input by the user, that is, an image that meets the user's needs, may be retrieved from an image library as the third image to be recommended to the user.
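One possible way to perform such a lookup is nearest-neighbor search over embeddings. The sketch below assumes that the first description data and every library image have already been embedded into a common vector space; the disclosure does not state how similarity is computed, so the cosine measure and the function names here are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_third_image(query_vec, library, top_k=1):
    """Return the ids of the top_k library images most similar to the query.

    `library` maps a hypothetical image id to its precomputed embedding.
    """
    ranked = sorted(library.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:top_k]]
```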

According to some embodiments, after the user's satisfaction with the first image is determined, the image generation model that generated the first image may be optimized based on the first image and the satisfaction, thereby improving the quality of the images the model generates.

For example, first images with higher satisfaction may be used as positive samples and first images with lower satisfaction as negative samples to fine-tune the image generation model. This continuously improves the model's text/image understanding and image generation capabilities, so that the generated images are consistent with the user's needs, thereby improving the accuracy of image generation.
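Selecting positive and negative samples from scored images can be sketched as a simple threshold split. The thresholds 0.7 and 0.3 and the record layout are assumptions chosen for illustration; the disclosure only says "higher" and "lower" satisfaction, and the fine-tuning step itself is not shown.

```python
def split_samples(records, high=0.7, low=0.3):
    """Partition scored images into positive and negative fine-tuning samples.

    Each record is a dict with hypothetical keys "image_id" and "satisfaction".
    Records whose score falls between the two thresholds are left out.
    """
    positives = [r for r in records if r["satisfaction"] >= high]
    negatives = [r for r in records if r["satisfaction"] <= low]
    return positives, negatives
```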

According to some embodiments, as shown in FIG. 3P, the dialogue interface between the user and the AI image generation system may be divided into a historical dialogue record area 326 and a dialogue area 328. The historical dialogue record area 326 is used to display historical dialogues and to manage them (for example, delete them). The historical dialogue record area 326 may include a control 327 for starting a new dialogue. By clicking the control 327, the user can end the current dialogue and start a new one. A historical dialogue may, for example, be named after the data the user first input in that dialogue. Multiple historical dialogues may be sorted by the time at which the user first sent data: the closer that time is to the current time, the higher the dialogue is ranked.
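The naming and ordering rule for historical dialogues described above can be sketched as follows. The `Dialogue` class and its field names are hypothetical; only the rule itself (title taken from the first input, most recent first-send time ranked first) comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Dialogue:
    first_input: str         # data the user first entered in this dialogue
    first_sent_at: datetime  # time the user first sent data

    @property
    def title(self):
        """A historical dialogue is named after the user's first input."""
        return self.first_input


def order_history(dialogues):
    """Sort dialogues so that the most recent first-send time comes first."""
    return sorted(dialogues, key=lambda d: d.first_sent_at, reverse=True)
```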

According to embodiments of the present disclosure, users are provided with a conversational AI painting assistant that offers an immersive image generation service within a single dialogue interface, understands plain-language input, clarifies user needs over multiple rounds of dialogue, and generates and modifies images for the user. The embodiments combine language interaction with a graphical interface: alongside dialogue-based instructions, high-frequency user needs such as regenerating, clicking to view details, and giving feedback (like, dislike) are provided as one-click buttons. This greatly simplifies user operations and helps users create immersively with a low barrier to entry, edit smoothly, and efficiently generate satisfactory content. In addition, the embodiments can proactively provide users with creative inspiration and guidance on creative capabilities. They also support automatically accumulating feedback data from user dialogue and click behavior, so that the feedback can quickly inform iteration of the algorithms and models.

The embodiments of the present disclosure address the high barrier to use and the long creation workflow of existing painting products, and provide a convenient solution for users with no experience of AI image generation products. The embodiments mainly have the following advantages:

1. Lower barrier to creation: the system can understand plain-language content input by users and clarify user needs through multiple rounds of dialogue. Users can generate satisfactory works without repeatedly modifying prompts and without complicated parameter settings or function selections.

2. Shorter creation workflow: using text-controlled editing technology (in which natural language instructions serve as control instructions for modifying image content), the system integrates a variety of editing functions, including image segmentation, local image editing (inpainting), image expansion (outpainting), style conversion (for example, comic style, oil painting style, or ink painting style), size modification, and adding, modifying, or deleting elements at a specified position or in a specified area. Continuous editing can be completed on the same page according to the user's instructions, without switching between multiple products and functions. The system can also intelligently recommend works the user may be interested in based on user feedback, and provide creation guidance to help users create more efficiently.

3. More efficient feedback collection: feedback data is accumulated automatically from feedback expressions in user dialogue and from clicks on shortcut buttons, without prompting users to actively provide feedback. The feedback collection capability can also be invoked intelligently based on user behavior, which can greatly increase the volume of feedback data without disturbing the user experience.

The embodiments of the present disclosure can be applied to fields such as AI painting, graphic design, film and television post-production, image editing, and video editing.

According to an embodiment of the present disclosure, an image generation apparatus is also provided. FIG. 4 shows a structural block diagram of an image generation apparatus 400 according to an embodiment of the present disclosure. As shown in FIG. 4, the apparatus 400 includes a first acquisition module 410, a first generation module 420, and a first output module 430.

The first acquisition module 410 is configured to acquire first description data input by a user in a current round of dialogue, wherein the first description data is used to describe a first image to be generated;

The first generation module 420 is configured to generate the first image based on the first description data; and

The first output module 430 is configured to output the first image and at least one control for the first image as response data for the current round of dialogue, wherein the at least one control corresponds respectively to at least one action for the first image, and any one of the at least one control is configured to perform the corresponding action on the first image in response to the user's operation on that control.

According to an embodiment of the present disclosure, a conversational image generation solution that integrates a graphical interface is provided. The user can complete image generation simply by conversing with the AI image generation system and interacting with its graphical interface, which greatly reduces operational complexity and improves the efficiency of image generation.

According to some embodiments, the first acquisition module includes: an acquisition unit configured to acquire input data of the user in the current round of dialogue and historical dialogue data from historical rounds of dialogue; a recognition unit configured to recognize the user's intent in the current round of dialogue based on the input data and the historical dialogue data; and a first determination unit configured to use the input data as the first description data in response to the intent being an image generation intent.
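The three units above can be sketched as a small pipeline. The keyword rule below is a deliberately simple stand-in for the recognition unit, which in practice would likely be a trained model; the keywords, the follow-up prefixes, and the history record layout are all assumptions made for illustration.

```python
# Hypothetical keywords suggesting an image generation intent.
GENERATION_KEYWORDS = ("draw", "paint", "generate")
# Hypothetical follow-up phrases that only make sense given the history.
FOLLOW_UP_PREFIXES = ("another", "again", "more")


def recognize_intent(input_text, history):
    """Classify the current round using the input and the dialogue history.

    `history` is a list of dicts with a hypothetical "intent" key,
    one per earlier round, oldest first.
    """
    text = input_text.lower().strip()
    if any(keyword in text for keyword in GENERATION_KEYWORDS):
        return "image_generation"
    previous_was_generation = bool(history) and \
        history[-1].get("intent") == "image_generation"
    if previous_was_generation and text.startswith(FOLLOW_UP_PREFIXES):
        return "image_generation"
    return "other"


def first_description_data(input_text, history):
    """Return the input as first description data only for generation intent."""
    if recognize_intent(input_text, history) == "image_generation":
        return input_text
    return None
```

The second branch shows why the historical dialogue data matters: an input such as "another one" carries no generation keyword by itself and is only interpretable in light of the previous round.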

According to some embodiments, the first generation module includes: a second determination unit configured to determine, based on the data modality of the first description data, an image generation model for generating the first image; and a first input unit configured to input the first description data into the image generation model to obtain the first image output by the image generation model.
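Choosing a model by data modality amounts to a dispatch on which kinds of data the description contains. In the sketch below the description is assumed to be a dict with optional "text" and "image" entries, and the model names are placeholders rather than names taken from the disclosure.

```python
# Placeholder model names, one per supported modality.
MODEL_BY_MODALITY = {
    "text": "text_to_image_model",
    "image": "image_to_image_model",
    "text+image": "text_guided_image_model",
}


def detect_modality(description):
    """Return "text", "image", or "text+image" for a description dict."""
    has_text = bool(description.get("text"))
    has_image = description.get("image") is not None
    if has_text and has_image:
        return "text+image"
    if has_image:
        return "image"
    if has_text:
        return "text"
    raise ValueError("description data contains neither text nor image")


def select_model(description):
    """Pick the image generation model that matches the data modality."""
    return MODEL_BY_MODALITY[detect_modality(description)]
```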

According to some embodiments, the first description data includes a first description text, and the first generation module includes: a rewriting unit configured to rewrite the first description text to generate second description data, wherein the second description data includes the rewritten first description text; and a generation unit configured to generate the first image based on the second description data.

According to some embodiments, the rewriting unit includes: a first determination subunit configured to determine the clarity of the first description text; a second determination subunit configured to determine a rewriting strategy for the first description text based on the clarity; and a rewriting subunit configured to rewrite the first description text using the rewriting strategy.
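The three subunits can be sketched as clarity scoring, strategy selection, and rewriting. Everything concrete below is an assumption for illustration: the word-count proxy for clarity, the 0.5 threshold, the two strategy names, and the appended detail phrase. An actual embodiment may instead rely on a trained language model, as the next paragraph notes.

```python
def clarity(text):
    """Hypothetical clarity proxy: longer descriptions score higher, capped at 1."""
    return min(len(text.split()) / 10.0, 1.0)


def choose_strategy(score, threshold=0.5):
    """Vague text is expanded with detail; clear text is only tidied up."""
    return "expand" if score < threshold else "polish"


def rewrite(text):
    """Rewrite a description text according to the strategy chosen for it."""
    strategy = choose_strategy(clarity(text))
    if strategy == "expand":
        # Illustrative enrichment; a real system would generate this detail.
        return f"{text.strip()}, highly detailed, fine texture"
    return " ".join(text.split())  # normalize whitespace only
```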

According to some embodiments, the rewriting unit is further configured to rewrite the first description text using a trained language model.

According to some embodiments, the generation unit includes: a third determination subunit configured to determine, based on the data modality of the second description data, an image generation model for generating the first image; and an input subunit configured to input the second description data into the image generation model to obtain the first image output by the image generation model.

According to some embodiments, the first description data includes a first description text, and the apparatus further includes: a recommendation module configured to recommend a second image to the user; and a second output module configured to output a second description text describing the second image, so as to guide the user in inputting the first description text.

According to some embodiments, the second output module is further configured to output the second description text in response to the user's selection of the second image.

According to some embodiments, the apparatus 400 further includes: a second generation module configured to generate and output, based on the first image, guidance information for guiding the user into the next round of dialogue.

According to some embodiments, the apparatus 400 further includes: a first determination module configured to determine the user's satisfaction with the first image based on behavior data of the user; a second determination module configured to determine, in response to the user being dissatisfied with the first image, a third image to be recommended to the user based on the first description data; and a third output module configured to output the third image.

According to some embodiments, the apparatus 400 further includes: a first determination module configured to determine the user's satisfaction with the first image based on behavior data of the user; and an optimization module configured to optimize, based on the first image and the satisfaction, the image generation model that generated the first image.

According to some embodiments, the behavior data includes operation data of the user on the at least one control and/or input data of the user in the next round of dialogue.

According to some embodiments, the at least one control includes at least one of the following: a first control for regenerating the first image; a second control for referencing the first image; a third control for managing the first image; or a fourth control for collecting the user's feedback data on the first image.
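The pairing of an output image with controls that each trigger one action can be sketched as below. The class names, control labels, and action identifiers are hypothetical; only the four control roles (regenerate, reference, manage, collect feedback) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    name: str    # label of the control shown with the image
    action: str  # action performed on the image when the control is operated


@dataclass(frozen=True)
class Response:
    image_id: str
    controls: tuple


def build_response(image_id):
    """Bundle the generated image with its controls as one round's response."""
    controls = (
        Control("regenerate", "regenerate_image"),  # first control
        Control("reference", "reference_image"),    # second control
        Control("manage", "manage_image"),          # third control
        Control("feedback", "collect_feedback"),    # fourth control
    )
    return Response(image_id, controls)


def on_control_operated(response, control_name):
    """Resolve a user operation to the (action, image) pair to execute."""
    for control in response.controls:
        if control.name == control_name:
            return control.action, response.image_id
    raise KeyError(control_name)
```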

According to some embodiments, the apparatus 400 further includes: a reference module configured to reference the first image in the next round of dialogue in response to the user's operation on the second control; a second acquisition module configured to acquire a third description text input by the user in the next round of dialogue, wherein the third description text describes the user's editing requirements for the first image; a third determination module configured to use the first image and the third description text as third description data describing a fourth image to be generated in the next round of dialogue; and a third generation module configured to generate the fourth image based on the third description data.
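The reference-then-edit flow above can be sketched as a small piece of dialogue state. The class and method names are hypothetical, and the actual generation of the fourth image is left out; the sketch only shows how the referenced first image and the third description text are combined into the third description data.

```python
class DialogueState:
    """Tracks whether the next round should carry a referenced image."""

    def __init__(self):
        self.referenced_image = None

    def on_reference_control(self, image_id):
        """Called when the user operates the second (reference) control."""
        self.referenced_image = image_id

    def next_round(self, text):
        """Build the description data for the next round of dialogue.

        If an image was referenced, it is combined with the user's text
        (the editing requirement) and the reference is then cleared.
        """
        if self.referenced_image is None:
            return {"text": text}
        data = {"image": self.referenced_image, "text": text}
        self.referenced_image = None
        return data
```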

It should be understood that the modules and units of the apparatus 400 shown in FIG. 4 may correspond to the steps of the method 200 described with reference to FIG. 2. Accordingly, the operations, features, and advantages described above for the method 200 apply equally to the apparatus 400 and to the modules and units it includes. For brevity, some operations, features, and advantages are not repeated here.

Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of each module discussed herein may be divided among multiple modules, and/or at least some functions of multiple modules may be combined into a single module.

It should also be understood that various techniques may be described herein in the general context of software and hardware elements or program modules. The units described above with respect to FIG. 4 may be implemented in hardware or in hardware combined with software and/or firmware. For example, the units may be implemented as computer program code/instructions configured to be executed by one or more processors and stored in a computer-readable storage medium. Alternatively, the units may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the modules 410-430 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip (which includes one or more components of a processor (for example, a Central Processing Unit (CPU), a microcontroller, a microprocessor, or a Digital Signal Processor (DSP)), a memory, one or more communication interfaces, and/or other circuits), and may optionally execute received program code and/or include embedded firmware to perform functions.

According to an embodiment of the present disclosure, an electronic device is also provided, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the image generation method of the embodiments of the present disclosure.

According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is also provided, wherein the computer instructions are used to cause a computer to perform the image generation method of the embodiments of the present disclosure.

According to an embodiment of the present disclosure, a computer program product is also provided, including computer program instructions that, when executed by a processor, implement the image generation method of the embodiments of the present disclosure.

With reference to FIG. 5, a structural block diagram of an electronic device 500 that can serve as a server or a client of the present disclosure will now be described; it is an example of a hardware device to which aspects of the present disclosure can be applied. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the electronic device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Multiple components of the electronic device 500 are connected to the I/O interface 505, including an input unit 506, an output unit 507, a storage unit 508, and a communication unit 509. The input unit 506 may be any type of device capable of inputting information to the electronic device 500. It may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 507 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 508 may include, but is not limited to, a magnetic disk and an optical disc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth device, an 802.11 device, a Wi-Fi device, a WiMAX device, a cellular communication device, and/or the like.

The computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 501 performs the methods and processing described above, such as the method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method 200 described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method 200 in any other suitable manner (for example, by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor, and may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user. For example, the feedback provided to the user may be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, voice, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, and a blockchain network.

A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions disclosed herein can be achieved; no limitation is imposed in this regard.

Although the embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it should be understood that the above-mentioned methods, systems, and devices are merely exemplary embodiments or examples, and that the scope of the present disclosure is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced by equivalent elements. In addition, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims

1. A method for generating an image, comprising:

acquiring first description data input by a user in a current round of dialogue, wherein the first description data is used to describe a first image to be generated;

generating the first image based on the first description data; and

outputting the first image and at least one control for the first image as response data for the current round of dialogue, wherein the at least one control corresponds respectively to at least one action for the first image, and any one of the at least one control is configured to perform the corresponding action on the first image in response to the user's operation on that control.
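
The three steps of claim 1 can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the `Control` and `Response` types, the control labels, and the placeholder `generate_image` function (which returns an identifier instead of calling a real model) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Control:
    """A control bound to one action on a generated image (hypothetical type)."""
    label: str
    action: Callable[[str], str]  # takes an image id, returns a status string


@dataclass
class Response:
    """Response data for one dialogue round: the image plus its controls."""
    image_id: str
    controls: List[Control] = field(default_factory=list)


def generate_image(description: str) -> str:
    """Placeholder generator (assumption): builds an id, calls no real model."""
    return "img:" + description.strip().lower().replace(" ", "-")


def respond(description: str) -> Response:
    """Generate the image and attach one control per available action."""
    image_id = generate_image(description)
    controls = [
        Control("regenerate", lambda i: f"regenerating {i}"),
        Control("feedback", lambda i: f"collecting feedback on {i}"),
    ]
    return Response(image_id, controls)


def operate(response: Response, label: str) -> str:
    """Execute the action of the control that the user operated."""
    for control in response.controls:
        if control.label == label:
            return control.action(response.image_id)
    raise KeyError(label)
```

Calling `respond("A red fox")` and then `operate(response, "regenerate")` runs the action bound to the regenerate control on the image returned in the same round.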

2. The method according to claim 1, wherein acquiring the first description data input by the user in the current round of dialogue comprises:

acquiring input data of the user in the current round of dialogue and historical dialogue data in historical rounds of dialogue;

identifying the user's intention in the current round of dialogue based on the input data and the historical dialogue data; and

using the input data as the first description data in response to the intention being an image generation intention.
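
As a rough illustration of claim 2, the sketch below identifies the intention from the current input together with the dialogue history. The claim does not say how the intention is identified; the keyword lists and the rule that a follow-up phrase inherits the intention of the previous round are assumptions standing in for whatever classifier is actually used.

```python
from typing import List, Optional

# Hypothetical cue lists; a real system would likely use a trained model.
IMAGE_KEYWORDS = ("draw", "paint", "generate an image", "picture of")
FOLLOW_UP_CUES = ("another one", "one more", "again")


def identify_intent(input_text: str, history: List[str]) -> str:
    """Return "image_generation" or "other" for the current round."""
    text = input_text.lower()
    if any(keyword in text for keyword in IMAGE_KEYWORDS):
        return "image_generation"
    # A follow-up without keywords inherits the intent of the previous round.
    if history and any(cue in text for cue in FOLLOW_UP_CUES):
        return identify_intent(history[-1], history[:-1])
    return "other"


def first_description_data(input_text: str, history: List[str]) -> Optional[str]:
    """Use the input as description data only for an image generation intent."""
    if identify_intent(input_text, history) == "image_generation":
        return input_text
    return None
```

With this sketch, "another one, but blue" is treated as description data only when the preceding round asked for an image, which is why the history is passed in alongside the current input.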

3. The method according to claim 1 or 2, wherein generating the first image based on the first description data comprises:

determining, based on the data modality of the first description data, an image generation model for generating the first image; and

inputting the first description data into the image generation model to obtain the first image output by the image generation model.
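
Model selection by data modality, as recited in claim 3, could look like the following sketch. The three modalities, the `Description` type, and the stub model functions are assumptions; the claim only requires that the model be chosen according to the modality of the description data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Description:
    """Description data that may carry text, an image, or both (hypothetical)."""
    text: Optional[str] = None
    image: Optional[bytes] = None

    @property
    def modality(self) -> str:
        if self.text is not None and self.image is not None:
            return "text+image"
        if self.image is not None:
            return "image"
        if self.text is not None:
            return "text"
        raise ValueError("empty description")


# Stub models (assumptions): each returns a string instead of real pixels.
def text_to_image(d: Description) -> str:
    return f"t2i({d.text})"


def image_to_image(d: Description) -> str:
    return f"i2i({len(d.image)} bytes)"


def edit_image(d: Description) -> str:
    return f"edit({len(d.image)} bytes, {d.text})"


MODELS: Dict[str, Callable[[Description], str]] = {
    "text": text_to_image,
    "image": image_to_image,
    "text+image": edit_image,
}


def generate(d: Description) -> str:
    """Pick the model by modality, then feed the description into it."""
    model = MODELS[d.modality]
    return model(d)
```

A lookup table keeps the dispatch rule in one place, so supporting a further modality only means adding one entry.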

4. The method according to claim 1 or 2, wherein the first description data comprises a first description text, and wherein the generating the first image based on the first description data comprises:

rewriting the first description text to generate second description data, wherein the second description data includes the rewritten first description text; and

generating the first image based on the second description data.

5. The method according to claim 4, wherein rewriting the first description text comprises:

determining the clarity of the first description text;

determining a rewriting strategy for the first description text based on the clarity; and

rewriting the first description text using the rewriting strategy.
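
The clarity-driven rewriting of claim 5 can be sketched as below. The claim does not define how clarity is measured or which strategies exist; the word-count score, the 0.5 threshold, and the two strategies ("expand" and "polish") are assumptions chosen only to make the control flow concrete.

```python
def clarity(text: str) -> float:
    """Toy clarity score in [0, 1]: longer descriptions count as clearer."""
    words = text.split()
    return min(len(words) / 8.0, 1.0)


def choose_strategy(score: float) -> str:
    """Map the clarity score to a rewriting strategy (hypothetical threshold)."""
    if score < 0.5:
        return "expand"
    return "polish"


def rewrite(text: str) -> str:
    """Rewrite the description text with the strategy chosen from its clarity."""
    strategy = choose_strategy(clarity(text))
    if strategy == "expand":
        # Vague prompt: append generic detail so the model has more to work with.
        return f"{text.strip()}, highly detailed, natural lighting"
    # Clear prompt: only normalize whitespace and leave the wording alone.
    return " ".join(text.split())
```

Under these assumptions a two-word prompt such as "a cat" is expanded, while a nine-word prompt is only normalized.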

6. The method according to claim 4 or 5, wherein rewriting the first description text comprises:

rewriting the first description text using a trained language model.

7. The method according to any one of claims 4 to 6, wherein generating the first image based on the second description data comprises:

determining an image generation model for generating the first image based on the data modality of the second description data; and

inputting the second description data into the image generation model to obtain the first image output by the image generation model.

8. The method according to any one of claims 1 to 7, wherein the first description data comprises a first description text, and the method further comprises:

recommending a second image to the user; and

outputting a second description text for describing the second image, to guide the user to input the first description text.

9. The method according to claim 8, wherein the outputting a second description text for describing the second image comprises:

outputting the second description text in response to the user's selection operation on the second image.

10. The method according to any one of claims 1 to 9, further comprising:

generating and outputting, based on the first image, guidance information for guiding the user to conduct a next round of dialogue.

11. The method according to any one of claims 1 to 10, further comprising:

determining, based on the user's behavior data, the user's satisfaction with the first image;

determining, in response to the user being dissatisfied with the first image, a third image to be recommended to the user based on the first description data; and

outputting the third image.

12. The method according to any one of claims 1 to 10, further comprising:

determining the user's satisfaction with the first image based on the user's behavior data; and

optimizing an image generation model for generating the first image based on the first image and the satisfaction.

13. The method according to claim 11 or 12, wherein the behavior data includes the user's operation data on the at least one control and/or the user's input data in the next round of dialogue.
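
Claims 11 to 13 derive satisfaction from control operations and next-round input, and recommend a third image when the user is dissatisfied. The sketch below is one possible reading: the operation names, the negative phrases, the scoring rule, and the tag-overlap recommender are all assumptions, since the claims do not prescribe any of them.

```python
from typing import Dict, List, Optional

# Hypothetical signals; the claims only say behavior data covers control
# operations and/or the user's input in the next round of dialogue.
NEGATIVE_OPERATIONS = {"regenerate", "dislike"}
POSITIVE_OPERATIONS = {"download", "like", "reference"}
NEGATIVE_PHRASES = ("not what i", "wrong", "try again")


def is_satisfied(operations: List[str], next_input: Optional[str] = None) -> bool:
    """Score behavior data; a non-negative score counts as satisfied."""
    score = 0
    for op in operations:
        if op in POSITIVE_OPERATIONS:
            score += 1
        elif op in NEGATIVE_OPERATIONS:
            score -= 1
    if next_input and any(p in next_input.lower() for p in NEGATIVE_PHRASES):
        score -= 1
    return score >= 0


def recommend_third_image(description: str, library: Dict[str, List[str]]) -> Optional[str]:
    """Pick the stored image whose tags overlap the description the most."""
    words = set(description.lower().split())
    best, best_overlap = None, 0
    for image_id, tags in library.items():
        overlap = len(words & set(tags))
        if overlap > best_overlap:
            best, best_overlap = image_id, overlap
    return best
```

The same satisfaction signal could also be logged together with the first image as feedback for optimizing the model (claim 12); that step is omitted here.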

14. The method according to any one of claims 1 to 13, wherein the at least one control comprises at least one of the following:

a first control for regenerating the first image;

a second control for referencing the first image;

a third control for managing the first image; or

a fourth control for collecting feedback data of the user on the first image.

15. The method according to claim 14, further comprising:

referencing the first image in a next round of dialogue in response to the user's operation on the second control;

acquiring a third description text input by the user in the next round of dialogue, wherein the third description text is used to describe the user's editing requirement for the first image;

using the first image and the third description text as third description data for describing a fourth image to be generated in the next round of dialogue; and

generating the fourth image based on the third description data.
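
The reference-then-edit flow of claim 15 can be sketched as a small piece of dialogue state. The `DialogueState` type, the dictionary layout of the third description data, the stub generator, and the choice to clear the reference after one round are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DialogueState:
    """Holds the image referenced for the next round (hypothetical type)."""
    referenced_image: Optional[str] = None


def on_reference_control(state: DialogueState, image_id: str) -> None:
    """Called when the user operates the second (reference) control."""
    state.referenced_image = image_id


def build_third_description(state: DialogueState, third_text: str) -> Dict[str, str]:
    """Combine the referenced image with the editing text of the next round."""
    if state.referenced_image is None:
        return {"text": third_text}
    data = {"image": state.referenced_image, "text": third_text}
    state.referenced_image = None  # assumption: a reference lasts one round
    return data


def generate_fourth_image(data: Dict[str, str]) -> str:
    """Stub generator (assumption): returns a string instead of an image."""
    if "image" in data:
        return f"edit({data['image']}, {data['text']})"
    return f"t2i({data['text']})"
```

After `on_reference_control(state, "img-1")`, the text "make the sky pink" entered in the next round is paired with `img-1`, so the generator receives both the image and the editing requirement.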

16. An image generation apparatus, comprising:

a first acquisition module configured to acquire first description data input by a user in a current round of dialogue, wherein the first description data is used to describe a first image to be generated;

a first generating module configured to generate the first image based on the first description data; and

a first output module configured to output the first image and at least one control for the first image as response data for the current round of dialogue, wherein the at least one control corresponds respectively to at least one action for the first image, and any one of the at least one control is configured to perform the corresponding action on the first image in response to the user's operation on that control.

17. The apparatus according to claim 16, wherein the first acquisition module comprises:

an acquisition unit configured to acquire input data of the user in the current round of dialogue and historical dialogue data in historical rounds of dialogue;

an identification unit configured to identify the user's intention in the current round of dialogue based on the input data and the historical dialogue data; and

a first determining unit configured to use the input data as the first description data in response to the intention being an image generation intention.

18. The apparatus according to claim 16 or 17, wherein the first generating module comprises:

a second determining unit configured to determine an image generation model for generating the first image based on a data modality of the first description data; and

a first input unit configured to input the first description data into the image generation model to obtain the first image output by the image generation model.

19. The apparatus according to claim 16 or 17, wherein the first description data comprises a first description text, and wherein the first generating module comprises:

a rewriting unit configured to rewrite the first description text to generate second description data, wherein the second description data includes the rewritten first description text; and

a generating unit configured to generate the first image based on the second description data.

20. The apparatus according to claim 19, wherein the rewriting unit comprises:

a first determining subunit configured to determine the clarity of the first description text;

a second determining subunit configured to determine a rewriting strategy for the first description text based on the clarity; and

a rewriting subunit configured to rewrite the first description text using the rewriting strategy.

21. The apparatus according to claim 19 or 20, wherein the rewriting unit is further configured to:

rewrite the first description text using a trained language model.

22. The apparatus according to any one of claims 19 to 21, wherein the generating unit comprises:

a third determining subunit, configured to determine an image generation model for generating the first image based on the data modality of the second description data; and

an input subunit configured to input the second description data into the image generation model to obtain the first image output by the image generation model.

23. The apparatus according to any one of claims 16 to 22, wherein the first description data comprises a first description text, and the apparatus further comprises:

a recommendation module configured to recommend a second image to the user; and

a second output module configured to output a second description text for describing the second image, to guide the user to input the first description text.

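24. The apparatus according to claim 23, wherein the second output module is further configured to:

output the second description text in response to the user's selection operation on the second image.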

25. The apparatus according to any one of claims 16 to 24, further comprising:

a second generating module configured to generate and output, based on the first image, guidance information for guiding the user to conduct a next round of dialogue.

26. The apparatus according to any one of claims 16 to 25, further comprising:

a first determining module configured to determine the user's satisfaction with the first image based on the user's behavior data;

a second determining module configured to determine, in response to the user being dissatisfied with the first image, a third image to be recommended to the user based on the first description data; and

a third output module configured to output the third image.

27. The apparatus according to any one of claims 16 to 25, further comprising:

a first determining module configured to determine the user's satisfaction with the first image based on the user's behavior data; and

an optimization module configured to optimize an image generation model for generating the first image based on the first image and the satisfaction.

28. The apparatus according to claim 26 or 27, wherein the behavior data includes the user's operation data on the at least one control and/or the user's input data in the next round of dialogue.

29. The apparatus of any one of claims 16 to 28, wherein the at least one control comprises at least one of the following:

a first control for regenerating the first image;

a second control for referencing the first image;

a third control for managing the first image; or

a fourth control for collecting feedback data of the user on the first image.

30. The apparatus of claim 29, further comprising:

a reference module, configured to reference the first image in a next round of dialogue in response to the user's operation on the second control;

a second acquisition module, configured to acquire a third description text input by the user in the next round of dialogue, wherein the third description text is used to describe the user's editing requirements for the first image;

a third determining module, configured to use the first image and the third description text as third description data for describing a fourth image to be generated in the next round of dialogue; and

a third generating module configured to generate the fourth image based on the third description data.

31. An electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of claims 1 to 15.

32. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method according to any one of claims 1 to 15.

33. A computer program product comprising computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 15.