
WO2025048288A1 - Apparatus and method for sequential semantic generation communication in communication system - Google Patents


Info

Publication number
WO2025048288A1
WO2025048288A1 (PCT application PCT/KR2024/011208)
Authority
WO
WIPO (PCT)
Prior art keywords
word
image
terminal
model
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/KR2024/011208
Other languages
French (fr)
Korean (ko)
Inventor
김성륜
남혜린
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University Industry Foundation UIF of Yonsei University
Original Assignee
University Industry Foundation UIF of Yonsei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020240084186A external-priority patent/KR20250033935A/en
Application filed by University Industry Foundation UIF of Yonsei University filed Critical University Industry Foundation UIF of Yonsei University
Publication of WO2025048288A1 publication Critical patent/WO2025048288A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/18Information format or content conversion, e.g. adaptation by the network of the transmitted or received information for the purpose of wireless delivery to users or terminals

Definitions

  • the present disclosure relates generally to communication systems, and more specifically to devices and methods for sequential semantic generative communication in communication systems.
  • 3GPP communication standards are also preparing for the next-generation 6G communication beyond 5G by applying deep learning technology to communication systems.
  • the advancement of these technologies requires rapid data exchange and information conversion in various industrial fields such as autonomous vehicles, mobile edge computing, and robotics.
  • generative models such as DALL-E, CLIP, and BLIP have recently achieved great results in the task of converting texts and images.
  • These multimodal generative models can understand and generate common meanings between two different data domains, namely texts and images. This plays an important role in utilizing the ability to interpret and explain data of various modalities in communication systems and to generate interactive cognitive meanings.
  • the present disclosure provides a device and method for sequential semantic generation communication in a communication system.
  • the present disclosure provides a device and method for maximizing communication efficiency through conversion between text and images by utilizing a multimodal generative model in a communication system.
  • the present disclosure provides a device and method for reducing communication load by conveying the meaning of image data using text in a communication system.
  • the present disclosure provides a device and method for fast and efficient communication by sequentially conveying the meaning of an image using text in a communication system.
  • the present disclosure provides a device and method for optimizing a transmission order according to the information importance of each word of a text in a communication system.
  • a method of operating a terminal in a communication system may include a process of receiving a first image, a process of generating a text prompt based on a first model from the first image, a process of evaluating the importance of a word included in the text prompt, and a process of transmitting the word to another terminal based on the importance of the word.
  • a terminal in a communication system, includes a transceiver and a control unit operably connected to the transceiver, wherein the control unit can receive a first image, generate a text prompt based on a first model from the first image, evaluate the importance of a word included in the text prompt, and transmit the word to another terminal based on the importance of the word.
  • Devices and methods according to various embodiments of the present disclosure enable fast and efficient data transmission while maintaining high similarity to the original image by reducing communication load and minimizing information loss through efficient conversion between text and images using a multimodal generative model.
  • the devices and methods according to various embodiments of the present disclosure can significantly improve data transmission efficiency and quality in various environments and conditions by optimizing the transmission order according to the information importance of each word through a sequential meaning transmission method, thereby reaching the target similarity level with a minimum number of transmission steps.
  • FIG. 1 illustrates a schematic diagram for sequential semantic communication according to one embodiment of the present disclosure.
  • FIG. 2 illustrates an example of a word transmission method in sequential semantic communication according to one embodiment of the present disclosure.
  • FIG. 3 illustrates simulation results according to transmission methods according to various embodiments of the present disclosure.
  • FIG. 4 is a diagram showing a device configuration according to various embodiments of the present disclosure.
  • FIG. 5 illustrates an example applied to vehicle-to-vehicle communication according to one embodiment of the present disclosure.
  • “at least one of A, B, and C” can mean “only A,” “only B,” “only C,” or “any combination of A, B, and C.” Additionally, “at least one of A, B, or C” or “at least one of A, B, and/or C” can mean “at least one of A, B, and C.”
  • the present disclosure relates to a device and method for sequential semantic generative communication in a communication system. Specifically, the present disclosure describes a technique for reducing communication load and minimizing information loss through efficient conversion between text and images using a multimodal generative model in a communication system, thereby quickly and efficiently transmitting data while maintaining high similarity to the original image.
  • FIG. 1 illustrates a schematic diagram for sequential semantic communication according to one embodiment of the present disclosure.
  • a transmitter for sequential semantic communication can convert an image into a text prompt (101).
  • the transmitter (e.g., Alice) can convert an original image into a text prompt using an image-to-text model ('Img2Txt Gen' model).
  • the transmitter can take an image of a cat running through a field of flowers as input and generate the text prompt 'A white cat running through a field of flowers'.
  • the operation (101) can be an operation to convert a main visual element of an image into text, thereby reducing the size of the transmission data and compressing the meaning.
  • the image-to-text model can be an image-based text decoder using a transformer-based text prompt generation technique.
  • the transmitter can prioritize words for sequential transmission (103). Specifically, words can be prioritized in order to sequentially transmit the words in the generated text prompt. For example, the transmitter can use the 'Txt2Img Gen' model and the LPIPS (learned perceptual image patch similarity) metric to evaluate how important each word is for image reconstruction.
  • the LPIPS metric measures the visual similarity between the original image and the reconstructed image, so that words yielding high similarity (i.e., low LPIPS values) can be transmitted first. For example, in the sentence 'A white cat running through a field of flowers', 'cat' is evaluated as the most important word and can be transmitted first.
  • the transmitter can analyze the correlation between words using a language model (LM) and determine the order in which words are transmitted sequentially, starting from the word carrying the largest amount of information.
  • the LPIPS metric takes values from 0 to 1 and expresses visual similarity, with lower values indicating higher similarity.
  • the transmitter can transmit words with high importance to the receiver according to the order of the determined words (105). For example, the word "cat" can be transmitted as the first word.
  • each word is transmitted independently, which reduces the communication load and allows the image to be reconstructed incrementally with each transmission.
  • the present disclosure enables efficient communication between a transmitter and a receiver, and can reconstruct a high-quality image similar to an original image by compressively transmitting meaningful information through text. In addition, it can realize efficient data transmission while reducing the communication load and maintaining high similarity.
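As a non-authoritative illustration of the flow above (convert the image to a text prompt (101), prioritize its words (103), and send them one by one (105)), the following sketch uses hypothetical placeholders: `img2txt` stands in for an image-to-text model such as BLIP, and the importance scores in `rank_words` are invented for illustration.

```python
def img2txt(image):
    # Hypothetical stand-in for the 'Img2Txt Gen' model; a real system
    # would run a captioning model (e.g., BLIP) on the input image.
    return "a white cat running through a field of flowers"

def rank_words(words):
    # Placeholder importance scores standing in for operation 103; a real
    # system would derive them from LPIPS or attention values.
    scores = {"cat": 0.9, "white": 0.6, "running": 0.5, "flowers": 0.4}
    return sorted(set(words), key=lambda w: scores.get(w, 0.1), reverse=True)

def transmit(image, channel):
    prompt = img2txt(image)                   # operation 101: image -> text
    for word in rank_words(prompt.split()):   # operation 103: prioritize
        channel.append(word)                  # operation 105: one word per step

channel = []
transmit(None, channel)
print(channel[0])  # the highest-scoring word is sent first
```

The receiver would regenerate an image from the words accumulated so far after each step, so reconstruction quality improves monotonically with the number of words received.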
  • FIG. 2 illustrates an example of a word transmission method in sequential semantic communication according to an embodiment of the present disclosure.
  • FIG. 2 presents, as non-limiting examples, a lowest LPIPS transmission (201), a most attentive transmission (203), and a least attentive transmission (205).
  • the transmitter can be assumed to know the receiver's text-to-image conversion process. Assuming that both the transmitter and the receiver cache the necessary models, the transmitter can also cache the receiver's model. By doing so, the transmitter can predict in advance the image that the receiver (e.g., Bob) will generate from each transmitted word.
  • the transmitter may store only a certain amount of additional model parameters in addition to the required image-to-text transformation model.
  • the transmitter may cache or store all or only some of the receiver's parameters. Depending on these cases, the way the transmitter selects the order of words to transmit may vary.
  • when the transmitter caches (stores) the receiver's entire model, the transmitter can fully utilize all the capabilities of the receiver.
  • the transmitter can predict the receiver's generated image from the text it has generated. Before transmitting a word, the transmitter can first predict the receiver's expected result for all the words of the generated text prompt. That is, one embodiment of the present disclosure can be performed according to the lowest LPIPS transmission (201) method of FIG. 2.
  • when the transmitter caches (stores) only part of the receiver's model, it can store only the language model (LM) and the attention module to determine the importance of words and the order in which they are transmitted.
  • the text-to-image model of the receiver may include text embedding (language model (LM)), image generation (U-Net), and post-processing (VAE) processes.
  • the transmitter may cache only the LM among the models of the receiver to evaluate the importance of words, and the attention module may interpret the text and evaluate the association between words. That is, by using this method, an embodiment of the present disclosure may be performed according to at least one of the most attentive transmission (203) or the least attentive transmission (205) of FIG. 2.
  • the attention value is calculated as the product of the attention weight and the value, and the attention weight can be expressed in the form of a matrix representing the relationship between each word.
  • the target image (205) can be represented by the generated text prompt shown in box (207) ('a white cat running through a field of flowers').
  • the transmitter can simulate the expected image reconstruction results when each word is transmitted using the entire model of the receiver.
  • the process of simulating the image reconstruction results can be a process of performing simulations for all candidate words.
  • the word that generates the image most similar to the original image can be selected and transmitted sequentially.
  • minimum LPIPS transmission can be transmitted by selecting the word that minimizes the LPIPS value.
  • the transmitter transmits the words one by one, and the receiver reconstructs the image incrementally with each received word to obtain the final image.
  • the minimum LPIPS transmission can ensure that the reconstructed image becomes progressively more similar to the original image as each word is transmitted.
  • a minimal LPIPS transmission could send the words 'cat', 'white', 'running', 'flower' in sequence. That is, by transmitting 'cat' as the first word, the main elements of the cat can be reconstructed. Then, by sequentially transmitting 'white', 'running', and 'flower', the color, motion, and background elements of the cat can be progressively added.
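The lowest-LPIPS selection described above can be sketched as a greedy loop. This is a toy model, not the disclosed implementation: `txt2img` and `lpips` here are trivial stand-ins (the real metric runs a trained perceptual network on actually generated images), and the per-word weights are invented for illustration.

```python
# Hypothetical per-word perceptual weights (in a real system these effects
# would emerge from the LPIPS network); higher weight = larger visual impact.
WEIGHTS = {"cat": 0.40, "white": 0.18, "running": 0.16, "flowers": 0.12,
           "field": 0.08, "through": 0.03, "a": 0.02, "of": 0.01}

def txt2img(words):
    # Stand-in for the receiver's text-to-image model: the "image" is just
    # the set of concepts the received words convey.
    return frozenset(words)

def lpips(original, reconstructed):
    # Toy LPIPS stand-in in [0, 1]: total weight of concepts still missing
    # from the reconstruction (0 = perceptually identical).
    return sum(WEIGHTS.get(w, 0.0) for w in original - reconstructed)

def lowest_lpips_order(prompt, target=0.1):
    # Greedy selection: at each step simulate the receiver's reconstruction
    # for every candidate word and send the one that minimizes LPIPS.
    original = txt2img(prompt)
    sent, remaining = [], list(prompt)
    while remaining and lpips(original, txt2img(sent)) > target:
        best = min(remaining, key=lambda w: lpips(original, txt2img(sent + [w])))
        sent.append(best)
        remaining.remove(best)
    return sent

order = lowest_lpips_order("a white cat running through a field of flowers".split())
print(order)
```

Under these assumed weights the greedy loop reproduces the behavior described above: the visually dominant word 'cat' goes first, followed by color, motion, and background words, and transmission stops once the target similarity is reached.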
  • Most attentive transmission (203) corresponds to the case where the transmitter can only cache a part of the receiver's text-to-image generation model, and in this case, LM is mainly used to analyze the association between words. Specifically, most attentive transmission can calculate the attention value between each word and select and transmit the word with the highest relevance. This can be a method to quickly convey the meaning by preferentially transmitting the word with the most important meaning in the sentence.
  • most attentive transmission (203) can calculate the attention value between words using a language model, select the most important word first and transmit it, and then sequentially transmit the word most related to the transmitted word.
  • the most attentive transmission (203) can transmit words in the order of 'cat', 'white', 'running', 'through', 'field', 'flowers', 'of', 'a'. That is, the most important word 'cat' is transmitted first to reconstruct the main elements of the cat, and then the words 'white' and 'running', which are most related to 'cat', are transmitted to add the color and movement of the cat. After that, the remaining words are sequentially transmitted in the order of their high relevance to the transmitted words so that background elements can be added.
  • Least attentive transmission (205) corresponds to the case where the transmitter can cache only a part of the receiver's text-to-image generation model; in this case, the LM is mainly used to analyze the association between words. Specifically, least attentive transmission calculates the attention value between each pair of words, transmits the most important word first, and then selects and transmits the word with the least correlation to the transmitted word. After the first word, this method front-loads less important information by preferentially transmitting words with weaker associations.
  • least attentive transmission (205) can calculate the attention value between words using a language model, select the most important word first and transmit it, and then sequentially transmit the word with the least correlation to the transmitted word.
  • the least attentive transmission can transmit words in the order of 'cat', 'of', 'white', 'running', 'through', 'a', 'field', 'flower'. Specifically, by transmitting the most important word 'cat' as the first word, the main elements of the cat can be reconstructed, and then less important words such as 'of' and 'white' can be transmitted to slowly reconstruct the image.
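The two attention-based orderings can be illustrated with a toy pairwise attention table (the values below are invented; a real system would read attention weights from the cached language model's attention module). Both schemes send the word receiving the most total attention first; they differ only in whether the next word maximizes or minimizes attention with the last transmitted word.

```python
words = ["cat", "white", "running", "field", "flowers"]

# Hypothetical symmetric attention weights between word pairs; in a real
# system these would come from the LM's attention-weight matrix.
ATT = {
    ("cat", "white"): 0.30, ("cat", "running"): 0.25,
    ("cat", "field"): 0.05, ("cat", "flowers"): 0.10,
    ("white", "running"): 0.08, ("white", "field"): 0.02,
    ("white", "flowers"): 0.03, ("running", "field"): 0.06,
    ("running", "flowers"): 0.04, ("field", "flowers"): 0.20,
}

def att(a, b):
    return ATT.get((a, b), ATT.get((b, a), 0.0))

def order(most_attentive=True):
    # Both schemes start with the word receiving the most total attention.
    remaining = list(words)
    first = max(remaining, key=lambda w: sum(att(w, v) for v in words if v != w))
    sent = [first]
    remaining.remove(first)
    while remaining:
        # Next word: strongest (most attentive) or weakest (least attentive)
        # attention link to the last transmitted word.
        pick = (max if most_attentive else min)(
            remaining, key=lambda w: att(sent[-1], w))
        sent.append(pick)
        remaining.remove(pick)
    return sent

print(order(True))   # most attentive: strongly related words follow 'cat'
print(order(False))  # least attentive: weakly related words follow 'cat'
```

With these assumed weights, the most attentive scheme keeps semantically coupled words adjacent, while the least attentive scheme defers the informative modifiers, matching the qualitative behavior described for (203) and (205).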
  • FIG. 3 illustrates simulation results according to transmission methods according to various embodiments of the present disclosure.
  • the lowest LPIPS transmission (301) has the lowest LPIPS value and can reach the target similarity level the fastest.
  • the most attentive transmission (303) has a low LPIPS value, but may have a value slightly higher than the lowest LPIPS transmission method.
  • the least attentive transmission (305) has the highest LPIPS value and may require many communication processes.
  • FIG. 4 is a diagram showing a device configuration according to various embodiments of the present disclosure.
  • the device of the present disclosure may include at least one processor (410), a memory (420), and a communication device (430) that is connected to a network and performs communication.
  • the device (400) may further include an input interface device (440), an output interface device (450), a storage device (460), etc.
  • Each component included in the device (400) may be connected by a bus (470) and communicate with each other.
  • each component included in the device (400) may be connected through an individual interface or individual bus centered around the processor (410), rather than a common bus (470).
  • the processor (410) may be connected to at least one of a memory (420), a communication device (430), an input interface device (440), an output interface device (450), and a storage device (460) through a dedicated interface.
  • the processor (410) can execute a program command stored in at least one of the memory (420) and the storage device (460).
  • the processor (410) may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which methods according to embodiments of the present disclosure are performed.
  • Each of the memory (420) and the storage device (460) may be configured with at least one of a volatile storage medium and a nonvolatile storage medium.
  • the memory (420) may be configured with at least one of a read only memory (ROM) and a random access memory (RAM).
  • FIG. 5 illustrates an example applied to vehicle-to-vehicle communication according to one embodiment of the present disclosure.
  • vehicles (20, 10, 30) can transmit real-time data through communication with each other.
  • an image describing a specific road condition (e.g., an accident, road construction, etc.) can be converted into text and transmitted, for example as the text "A car accident at the intersection of Main St. and 1st Ave".
  • important information in inter-vehicle communication needs to be transmitted with priority.
  • the most important information can be transmitted first, and secondary information can be transmitted later.
  • “Accident Ahead” can be transmitted as the first text, and then "At the intersection of Main St. and 1st Ave" and "Two vehicles involved" can be transmitted as secondary information.
  • each vehicle may be of a different make and model, and may use different sensors and data formats.
  • using a common text-based semantic representation can improve interoperability.
  • the present disclosure can be applied to vehicle-to-vehicle communication, and can provide great advantages, especially in terms of data transmission efficiency, real-time information sharing, and improved interoperability. Applying this technology to vehicle-to-vehicle communication will greatly improve the safety and efficiency of autonomous vehicles.
  • a computer-readable storage medium storing one or more programs (software modules) may be provided.
  • the one or more programs stored in the computer-readable storage medium are configured for execution by one or more processors in an electronic device.
  • the one or more programs include instructions that cause the electronic device to execute methods according to the embodiments described in the claims or specification of the present disclosure.
  • These programs may be stored in a random access memory (RAM), a non-volatile memory including flash memory, a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a magnetic disc storage device, a compact disc ROM (CD-ROM), digital versatile discs (DVDs) or other forms of optical storage devices, or a magnetic cassette, or in a memory composed of a combination of some or all of these. In addition, multiple such memories may be included.
  • the program may be stored in an attachable storage device that is accessible via a communications network, such as the Internet, an Intranet, a local area network (LAN), a wide area network (WAN), or a storage area network (SAN), or a combination thereof.
  • the storage device may be connected to a device performing an embodiment of the present disclosure via an external port. Additionally, a separate storage device on the communications network may be connected to a device performing an embodiment of the present disclosure.
  • the components included in the disclosure are expressed in the singular or plural form depending on the specific embodiment presented.
  • the singular or plural expressions are selected to suit the presented situation for convenience of explanation; the present disclosure is not limited to singular or plural components, and a component expressed in the plural may be implemented in the singular, and a component expressed in the singular may be implemented in the plural.


Abstract

The present disclosure relates generally to a communication system and, more particularly, to an apparatus and a method for sequential semantic generation communication in a communication system. An operation method of a terminal in a communication system may comprise the steps of: receiving a first image; on the basis of a first model, generating a text prompt from the first image; evaluating the importance of a word included in the text prompt; and on the basis of the importance of the word, transmitting the word to another terminal.

Description

통신 시스템에서 순차적 시맨틱 생성 통신을 위한 장치 및 방법Device and method for sequential semantic generation communication in a communication system

본 개시(disclosure)는 일반적으로 통신 시스템에 관한 것으로, 보다 구체적으로 통신 시스템에서 순차적 시맨틱 생성 통신을 위한 장치 및 방법에 관한 것이다.The present disclosure relates generally to communication systems, and more specifically to devices and methods for sequential semantic generative communication in communication systems.

최근 인공지능(artificial intelligence, AI) 기술의 발전은 통신 시스템에 큰 영향을 미치고 있습니다. 특히, 제한된 대역폭, 다양한 채널 조건, 다양한 사용자 요구 사항 등과 같은 문제를 해결하기 위해 AI 기술이 적극적으로 활용되고 있다. The recent development of artificial intelligence (AI) technology has had a significant impact on communication systems. In particular, AI technology is being actively utilized to solve problems such as limited bandwidth, various channel conditions, and various user requirements.

3GPP 통신 표준에서도 딥러닝 기술을 통신 시스템에 적용하여 5G를 넘어 차세대 6G 통신을 준비하고 있습니다. 이러한 기술의 발전은 자율 주행 차량, 모바일 엣지 컴퓨팅, 로봇 공학 등 다양한 산업 분야에서 빠른 데이터 교환과 정보 변환을 요구한다.3GPP communication standards are also preparing for the next-generation 6G communication beyond 5G by applying deep learning technology to communication systems. The advancement of these technologies requires rapid data exchange and information conversion in various industrial fields such as autonomous vehicles, mobile edge computing, and robotics.

특히, 최근에는 DALL-E, CLIP, BLIP와 같은 생성 모델이 텍스트와 이미지 간의 변환 작업에서 큰 성과를 거두고 있다. 이러한 멀티모달 생성 모델은 두 가지 다른 데이터 도메인, 즉 텍스트와 이미지 간의 공통 의미를 이해하고 생성할 수 있습니다. 이는 통신 시스템에서 다양한 모달리티의 데이터를 해석하고 설명하며, 상호작용적인 인지적 의미를 생성하는 능력을 활용하는 데 중요한 역할을 한다.In particular, generative models such as DALL-E, CLIP, and BLIP have recently achieved great results in the task of converting texts and images. These multimodal generative models can understand and generate common meanings between two different data domains, namely texts and images. This plays an important role in utilizing the ability to interpret and explain data of various modalities in communication systems and to generate interactive cognitive meanings.

상술한 바와 같은 논의를 바탕으로, 본 개시(disclosure)는, 통신 시스템에서 순차적 시맨틱 생성 통신을 위한 장치 및 방법을 제공한다.Based on the above discussion, the present disclosure provides a device and method for sequential semantic generation communication in a communication system.

또한, 본 개시는, 통신 시스템에서 멀티모달 생성 모델을 활용하여 텍스트와 이미지 간의 변환을 통해 통신 효율성을 극대화하기 위한 장치 및 방법을 제공한다.In addition, the present disclosure provides a device and method for maximizing communication efficiency through conversion between text and images by utilizing a multimodal generative model in a communication system.

또한, 본 개시는, 통신 시스템에서 텍스트를 이용하여 이미지 데이터의 의미를 전달함으로써 통신 부하를 줄이기 위한 장치 및 방법을 제공한다.In addition, the present disclosure provides a device and method for reducing communication load by conveying the meaning of image data using text in a communication system.

또한, 본 개시는, 통신 시스템에서 텍스트를 이용하여 이미지의 의미를 순차적으로 전달하여 빠르고 효율적인 통신을 위한 장치 및 방법을 제공한다.In addition, the present disclosure provides a device and method for fast and efficient communication by sequentially conveying the meaning of an image using text in a communication system.

또한, 본 개시는, 통신 시스템에서 텍스트의 각 단어의 정보 중요도에 따라 전송 순서를 최적화하기 위한 장치 및 방법을 제공한다.In addition, the present disclosure provides a device and method for optimizing a transmission order according to the information importance of each word of a text in a communication system.

본 개시의 다양한 실시 예들에 따르면, 통신 시스템에서 단말의 동작 방법은 제1 이미지를 수신하는 과정과, 제1 이미지로부터 제1 모델을 기반으로 텍스트 프롬프트를 생성하는 과정과, 텍스트 프롬프트에 포함된 단어의 중요성을 평가하는 과정과, 단어의 중요성에 기반하여 다른 단말로 단어를 송신하는 과정을 포함할 수 있다.According to various embodiments of the present disclosure, a method of operating a terminal in a communication system may include a process of receiving a first image, a process of generating a text prompt based on a first model from the first image, a process of evaluating the importance of a word included in the text prompt, and a process of transmitting the word to another terminal based on the importance of the word.

본 개시의 다양한 실시 예들에 따르면, 통신 시스템에서 단말은 송수신부와, 송수신부와 동작 가능하게 연결된 제어부를 포함하고, 제어부는, 제1 이미지를 수신하고, 제1 이미지로부터 제1 모델을 기반으로 텍스트 프롬프트를 생성하고, 텍스트 프롬프트에 포함된 단어의 중요성을 평가하고, 단어의 중요성에 기반하여 다른 단말로 단어를 송신할 수 있다.According to various embodiments of the present disclosure, in a communication system, a terminal includes a transceiver and a control unit operably connected to the transceiver, wherein the control unit can receive a first image, generate a text prompt based on a first model from the first image, evaluate the importance of a word included in the text prompt, and transmit the word to another terminal based on the importance of the word.

본 개시의 다양한 실시 예들에 따른 장치 및 방법은, 멀티모달 생성 모델을 활용한 텍스트와 이미지 간의 효율적 변환을 통해 통신 부하를 줄이고, 정보의 손실을 최소화함으로써, 원본 이미지와 높은 유사성을 유지하면서 빠르고 효율적인 데이터 전송을 할 수 있게 한다.Devices and methods according to various embodiments of the present disclosure enable fast and efficient data transmission while maintaining high similarity to the original image by reducing communication load and minimizing information loss through efficient conversion between text and images using a multimodal generative model.

또한, 본 개시의 다양한 실시 예들에 따른 장치 및 방법은, 순차적 의미 전송 방법을 통해 각 단어의 정보 중요도에 따라 전송 순서를 최적화함으로써 최소한의 전송 단계로 목표 유사성 수준에 도달할 수 있게 하여 다양한 환경과 조건에서 데이터 전송 효율성과 품질을 크게 향상시킬 수 있다.In addition, the devices and methods according to various embodiments of the present disclosure can significantly improve data transmission efficiency and quality in various environments and conditions by optimizing the transmission order according to the information importance of each word through a sequential meaning transmission method, thereby reaching the target similarity level with a minimum number of transmission steps.

본 개시에서 얻을 수 있는 효과는 이상에서 언급한 효과들로 제한되지 않으며, 언급하지 않은 또 다른 효과들은 아래의 기재로부터 본 개시가 속하는 기술 분야에서 통상의 지식을 가진 자에게 명확하게 이해될 수 있을 것이다.The effects obtainable from the present disclosure are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by a person skilled in the art to which the present disclosure belongs from the description below.

도 1은 본 개시의 일 실시 예에 따라, 순차적 시맨틱 통신에 대한 개략도를 도시한다.FIG. 1 illustrates a schematic diagram for sequential semantic communication according to one embodiment of the present disclosure.

도 2는 본 개시의 일 실시 예에 따라, 순차적 시맨틱 통신에서 단어 전송 방식의 일 예를 도시한다.FIG. 2 illustrates an example of a word transmission method in sequential semantic communication according to one embodiment of the present disclosure.

도 3은 본 개시의 다양한 실시 예에 따라, 전송 방식에 따른 시뮬레이션 결과를 도시한다.FIG. 3 illustrates simulation results according to transmission methods according to various embodiments of the present disclosure.

도 4는 본 개시의 다양한 실시 예에 따른 장치 구성을 나타낸 도면이다.FIG. 4 is a diagram showing a device configuration according to various embodiments of the present disclosure.

도 5는 본 개시의 일 실시 예에 따라, 차량 간 통신에 적용된 일 예를 도시한다. FIG. 5 illustrates an example applied to vehicle-to-vehicle communication according to one embodiment of the present disclosure.

본 개시에서 사용되는 용어들은 단지 특정한 실시 예를 설명하기 위해 사용된 것으로, 다른 실시 예의 범위를 한정하려는 의도가 아닐 수 있다. 단수의 표현은 문맥상 명백하게 다르게 뜻하지 않는 한, 복수의 표현을 포함할 수 있다. 기술적이거나 과학적인 용어를 포함해서 여기서 사용되는 용어들은 본 개시에 기재된 기술 분야에서 통상의 지식을 가진 자에 의해 일반적으로 이해되는 것과 동일한 의미를 가질 수 있다. 본 개시에 사용된 용어들 중 일반적인 사전에 정의된 용어들은, 관련 기술의 문맥상 가지는 의미와 동일 또는 유사한 의미로 해석될 수 있으며, 본 개시에서 명백하게 정의되지 않는 한, 이상적이거나 과도하게 형식적인 의미로 해석되지 않는다. 경우에 따라서, 본 개시에서 정의된 용어일지라도 본 개시의 실시 예들을 배제하도록 해석될 수 없다.The terms used in this disclosure are only used to describe specific embodiments and may not be intended to limit the scope of other embodiments. The singular expression may include the plural expression unless the context clearly indicates otherwise. The terms used herein, including technical or scientific terms, may have the same meaning as commonly understood by a person having ordinary skill in the art described in this disclosure. Among the terms used in this disclosure, terms defined in general dictionaries may be interpreted as having the same or similar meaning as the meaning they have in the context of the related technology, and shall not be interpreted in an ideal or excessively formal meaning unless explicitly defined in this disclosure. In some cases, even if a term is defined in this disclosure, it cannot be interpreted to exclude embodiments of the present disclosure.


이하에서 설명되는 본 개시의 다양한 실시 예들에서는 하드웨어적인 접근 방법을 예시로서 설명한다. 하지만, 본 개시의 다양한 실시 예들에서는 하드웨어와 소프트웨어를 모두 사용하는 기술을 포함하고 있으므로, 본 개시의 다양한 실시 예들이 소프트웨어 기반의 접근 방법을 제외하는 것은 아니다.In the various embodiments of the present disclosure described below, a hardware-based approach is described as an example. However, since the various embodiments of the present disclosure include techniques using both hardware and software, the various embodiments of the present disclosure do not exclude a software-based approach.

또한, 본 개시의 상세한 설명 및 청구항에서 “적어도 하나의 A, B 및 C(at least one of A, B, and C)”는 “오직 A”, “오직 B”, “오직 C” 또는 “A, B 및 C의 임의의 모든 조합(any combination of A, B, and C)”를 의미할 수 있다. 또한, “적어도 하나의 A, B 또는 C(at least one of A, B, or C)”나 “적어도 하나의 A, B, 및/또는 C(at least one of A, B, and/or C)”는 “적어도 하나의 A, B, 및 C(at least one of A, B, and C)”를 의미할 수 있다.Additionally, in the detailed description and claims of this disclosure, “at least one of A, B, and C” can mean “only A,” “only B,” “only C,” or “any combination of A, B, and C.” Additionally, “at least one of A, B, or C” or “at least one of A, B, and/or C” can mean “at least one of A, B, and C.”

이하 본 개시는 통신 시스템에서 순차적 시맨틱 생성 통신을 위한 장치 및 방법에 관한 것이다. 구체적으로, 본 개시는 통신 시스템에서 멀티모달 생성 모델을 활용한 텍스트와 이미지 간의 효율적 변환을 통해 통신 부하를 줄이고, 정보의 손실을 최소화함으로써, 원본 이미지와 높은 유사성을 유지하면서 빠르고 효율적으로 데이터를 전송하기 위한 기술을 설명한다.The present disclosure relates to a device and method for sequential semantic generative communication in a communication system. Specifically, the present disclosure describes a technique for reducing communication load and minimizing information loss through efficient conversion between text and images using a multimodal generative model in a communication system, thereby transmitting data quickly and efficiently while maintaining high similarity to the original image.

이하 설명에서 사용되는 신호를 지칭하는 용어, 채널을 지칭하는 용어, 제어 정보를 지칭하는 용어, 네트워크 객체(network entity)들을 지칭하는 용어, 장치의 구성 요소를 지칭하는 용어 등은 설명의 편의를 위해 예시된 것이다. 따라서, 본 개시가 후술되는 용어들에 한정되는 것은 아니며, 동등한 기술적 의미를 가지는 다른 용어가 사용될 수 있다.The terms referring to signals, channels, control information, network entities, and components of devices used in the following description are examples for convenience of explanation. Therefore, the present disclosure is not limited to the terms described below, and other terms having equivalent technical meanings may be used.

또한, 본 개시는, 일부 통신 규격(예: 3GPP(3rd Generation Partnership Project))에서 사용되는 용어들을 이용하여 다양한 실시 예들을 설명하지만, 이는 설명을 위한 예시일 뿐이다. 본 개시의 다양한 실시 예들은, 다른 통신 시스템에서도, 용이하게 변형되어 적용될 수 있다.In addition, although the present disclosure describes various embodiments using terms used in some communication standards (e.g., 3rd Generation Partnership Project (3GPP)), this is only an example for explanation. The various embodiments of the present disclosure can be easily modified and applied to other communication systems.

도 1은 본 개시의 일 실시 예에 따라, 순차적 시맨틱 통신에 대한 개략도를 도시한다.FIG. 1 illustrates a schematic diagram for sequential semantic communication according to one embodiment of the present disclosure.

도 1을 참조하면, 순차적 시맨틱 통신을 위한 송신기(이하, 송신기)는 이미지를 텍스트 프롬프트로 변환할 수 있다(101). 구체적으로, 송신기(예: Alice)는 이미지-텍스트 모델('Img2Txt Gen' 모델)을 이용하여 원본 이미지를 텍스트 프롬프트로 변환할 수 있다. 예를 들어, 송신기는 '고양이가 꽃밭을 뛰어다니는 이미지'를 입력받아 'A white cat running through a field of flowers'라는 텍스트 프롬프트를 생성할 수 있다. 동작(101)은 이미지의 주요 시각적 요소를 텍스트로 변환하여, 전송 데이터의 크기를 줄이고 의미를 압축하는 동작일 수 있다.Referring to FIG. 1, a transmitter for sequential semantic communication (hereinafter, transmitter) can convert an image into a text prompt (101). Specifically, the transmitter (e.g., Alice) can convert an original image into a text prompt using an image-to-text model (the 'Img2Txt Gen' model). For example, the transmitter can receive an 'image of a cat running through a field of flowers' as input and generate the text prompt 'A white cat running through a field of flowers'. The operation (101) can be an operation of converting the main visual elements of an image into text, thereby reducing the size of the transmission data and compressing the meaning.

일 실시 예에 따라, 이미지-텍스트 모델은 트랜스포머 기반 텍스트 프롬프트 생성 기술을 사용한 이미지 기반 텍스트 디코더일 수 있다.In one embodiment, the image-to-text model can be an image-based text decoder using a transformer-based text prompt generation technique.

송신기는 순차적 전송을 위한 우선 단어를 정렬할 수 있다(103). 구체적으로, 생성된 텍스트 프롬프트에서 단어들을 순차적으로 전송하기 위하여 단어를 중요도에 따라 정렬할 수 있다. 예를 들어, 송신기는 'Txt2Img Gen' 모델과 LPIPS(learned perceptual image patch similarity) 지표를 사용하여 각 단어가 이미지 재구성에 얼마나 중요한지 평가할 수 있다.The transmitter can prioritize words for sequential transmission (103). Specifically, words can be prioritized in order to sequentially transmit the words in the generated text prompt. For example, the transmitter can use the 'Txt2Img Gen' model and the LPIPS (learned perceptual image patch similarity) metric to evaluate how important each word is for image reconstruction.

일 실시 예에 따라, LPIPS 지표는 원본 이미지와 재구성된 이미지 간의 시각적 유사성을 측정하여, 유사성이 높은 단어를 우선 전송할 수 있게 한다. 예를 들어, 'A white cat running through a field of flowers'라는 문장에서 'cat'이 가장 중요한 단어로 평가되어 우선적으로 전송할 수 있다. 동작(103)에서, 송신기는 'LM(language model)'을 활용하여 단어 간의 연관성을 분석하고, 가장 정보량이 많은 단어부터 순차적으로 전송할 단어의 순서를 결정할 수 있다.In one embodiment, the LPIPS metric measures the visual similarity between the original image and the reconstructed image, so that words with high similarity can be transmitted first. For example, in the sentence 'A white cat running through a field of flowers', 'cat' is evaluated as the most important word and can be transmitted first. In operation (103), the transmitter can analyze the correlation between words using 'LM (language model)' and determine the order of words to be transmitted sequentially, starting from the word with the largest amount of information.

일 실시 예에 따라, LPIPS 지표는 0에서 1까지 값을 나타내고, 시각적 유사성을 표현할 수 있다.In one embodiment, the LPIPS metric represents values from 0 to 1 and can express visual similarity.
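The LPIPS-driven word ordering described above can be sketched as a greedy loop. The sketch below is a minimal illustration, not the disclosed implementation: `toy_lpips` and the `importance` table are hypothetical stand-ins for generating an image from each candidate prompt with the 'Txt2Img Gen' model and scoring it against the original image with the LPIPS network (0 = visually identical, 1 = very different).

```python
# Hypothetical stand-in for: render the candidate prompt with a Txt2Img model,
# then measure the LPIPS distance between the rendered and the original image.
def toy_lpips(sent_words, importance):
    total = sum(importance.values())
    covered = sum(importance[w] for w in sent_words)
    return 1.0 - covered / total  # distance falls as important words are covered

def order_words_lowest_lpips(prompt_words, importance):
    """Greedily pick, at each step, the word whose transmission would make
    the receiver's reconstruction most similar to the original image."""
    remaining = list(prompt_words)
    sent = []
    while remaining:
        best = min(remaining, key=lambda w: toy_lpips(sent + [w], importance))
        sent.append(best)
        remaining.remove(best)
    return sent

importance = {"cat": 0.5, "white": 0.2, "running": 0.2, "flowers": 0.1}
print(order_words_lowest_lpips(["white", "cat", "running", "flowers"], importance))
# prints ['cat', 'white', 'running', 'flowers']
```

Under these toy weights, 'cat' reduces the distance most and is sent first, matching the example transmission order in the disclosure.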

송신기는 결정된 단어의 순서에 따라 중요도가 높은 단어를 수신기에게 송신할 수 있다(105). 예를 들어, 첫 번째 단어로, "cat"이라는 단어가 전송될 수 있다. The transmitter can transmit words with high importance to the receiver according to the order of the determined words (105). For example, the word "cat" can be transmitted as the first word.

일 실시 예에 따라, 각 단어는 독립적으로 전송되며, 이는 통신 부하를 줄이고 각 동작별로 이미지를 점진적으로 재구성할 수 있게 한다.In one embodiment, each word is transmitted independently, which reduces the communication load and allows the image to be reconstructed incrementally for each action.

순차적 시맨틱 통신을 위한 수신기(이하, 수신기)는 수신된 단어를 기반으로 이미지를 재구성할 수 있다(107). 구체적으로, 수신기는 'Txt2Img Gen' 모델을 사용하여 수신된 단어를 기반으로 이미지를 생성할 수 있다. 예를 들어, 'cat'이라는 단어를 수신한 후, 수신기는 해당 단어를 기반으로 고양이 이미지를 재구성할 수 있다. 동작(107)에서 수신기는 전송된 단어를 사용하여 원본 이미지와 시각적으로 유사한 이미지를 생성할 수 있다.A receiver for sequential semantic communication (hereinafter, receiver) can reconstruct an image based on the received words (107). Specifically, the receiver can generate an image based on the received words using the 'Txt2Img Gen' model. For example, after receiving the word 'cat', the receiver can reconstruct an image of a cat based on the word. In operation (107), the receiver can generate an image visually similar to the original image using the transmitted words.

도 1의 실시 예에 의하여, 본 개시는 송신기와 수신기 간의 효율적인 통신을 가능하게 하며, 텍스트를 통해 의미있는 정보를 압축적으로 전달하여 원본 이미지와 유사한 고품질 이미지를 재구성할 수 있다. 또한, 통신 부하를 줄이고, 높은 유사성을 유지하면서도 효율적인 데이터 전송을 실현할 수 있다.According to the embodiment of Fig. 1, the present disclosure enables efficient communication between a transmitter and a receiver, and can reconstruct a high-quality image similar to an original image by compressively transmitting meaningful information through text. In addition, it can realize efficient data transmission while reducing the communication load and maintaining high similarity.

도 2는 본 개시의 일 실시 예에 따라, 순차적 시맨틱 통신에서 단어 전송 방식의 일 예를 도시한다. 도 2에서는 최소 LPIPS 전송(lowest LPIPS transmission)(201), 가장 주의하는 전송(most attentive transmission)(203), 가장 적게 주의하는 전송(least attentive transmission)(205)을 예로 제시하나, 이에 국한된 것은 아니다.FIG. 2 illustrates an example of a word transmission method in sequential semantic communication according to an embodiment of the present disclosure. FIG. 2 presents, but is not limited to, a lowest LPIPS transmission (201), a most attentive transmission (203), and a least attentive transmission (205) as examples.

초기 통신 단계에서 수신기가 원본과 최대한 유사한 이미지를 생성할 수 있도록, 송신기는 수신기의 텍스트-이미지 변환 프로세스에 대한 정보를 알고 있다고 가정할 수 있다. 따라서, 송신기와 수신기가 모두 필요한 모델을 캐시하고 저장한다는 가정하에서 송신기는 수신기의 모델을 저장하고 캐시할 수 있다. 이렇게 함으로써 송신기는 전송된 단어를 기반으로 수신기(예: Bob)가 생성할 이미지를 미리 예측할 수 있다.So that the receiver can generate an image as similar as possible to the original during the initial communication phase, the transmitter can be assumed to know information about the receiver's text-to-image conversion process. Therefore, assuming that both the transmitter and the receiver cache and store the necessary models, the transmitter can store and cache the receiver's model. By doing so, the transmitter can predict in advance the image that the receiver (e.g., Bob) will generate based on the transmitted words.

일 실시 예에 따라, 송신기는 필요한 이미지-텍스트 변환 모델 외에 일정량의 추가 모델 매개변수만 저장하는 것이 적절할 수도 있다. 예를 들어, 송신기는 수신기의 모델을 모두 캐시 또는 저장하거나, 일부만 캐시 또는 저장할 수 있다. 이러한 경우에 따라, 송신기가 전송할 단어의 순서를 선택하는 방법이 달라질 수 있다.In some embodiments, it may be appropriate for the transmitter to store only a certain amount of additional model parameters in addition to the required image-to-text conversion model. For example, the transmitter may cache or store all of the receiver's model, or only part of it. Depending on these cases, the way the transmitter selects the order of the words to transmit may vary.

송신기가 수신기의 전체 모델을 캐시(저장)하는 경우, 송신기는 수신기의 모든 기능을 완전히 활용할 수 있다. 송신기는 자신이 생성한 텍스트로 수신기가 생성할 이미지를 예측할 수 있다. 송신기가 단어를 전송하기 전, 송신기는 먼저 생성된 텍스트 프롬프트의 모든 단어로 수신기(예: Bob)의 예상 결과를 예측할 수 있다. 즉, 도 2의 최소 LPIPS 전송(201) 방식에 따라, 본 개시의 일 실시 예가 수행될 수 있다.If the transmitter caches (stores) the entire model of the receiver, the transmitter can fully utilize all the capabilities of the receiver. The transmitter can predict the image the receiver will generate from the text it has generated. Before the transmitter transmits a word, the transmitter can first predict the expected result of the receiver (e.g., Bob) for all the words of the generated text prompt. That is, according to the lowest LPIPS transmission (201) method of FIG. 2, one embodiment of the present disclosure can be performed.

송신기가 수신기의 전체 모델의 일부를 캐시(저장)하는 경우, 송신기는 주로 언어 모델(LM)과 주의(attention) 모듈만을 저장하여 단어의 중요도와 전송 순서를 결정할 수 있다.When the transmitter caches (stores) part of the receiver's entire model, the transmitter can store only the language model (LM) and the attention module to determine the importance of words and the order in which they are transmitted.

구체적으로, 수신기의 텍스트-이미지 모델은 텍스트 임베딩(언어 모델(LM)), 이미지 생성(U-Net), 및 후처리(VAE) 과정을 포함할 수 있다. 송신기는 수신기의 모델 중 LM만 캐시하여 단어의 중요도를 평가하고, 주의 모듈은 텍스트를 해석하고 단어 간의 연관성을 평가할 수 있다. 즉, 이러한 방식을 이용하여 도 2의 가장 주의하는 전송(203), 또는 가장 적게 주의하는 전송(205) 중 적어도 하나의 방식에 따라, 본 개시의 일 실시 예가 수행될 수 있다.Specifically, the text-to-image model of the receiver may include text embedding (language model (LM)), image generation (U-Net), and post-processing (VAE) processes. The transmitter may cache only the LM among the models of the receiver to evaluate the importance of words, and the attention module may interpret the text and evaluate the association between words. That is, by using this method, an embodiment of the present disclosure may be performed according to at least one of the most attentive transmission (203) or the least attentive transmission (205) of FIG. 2.

일 실시 예에 따라, 주의 값은 주의 가중치와 값의 곱으로 계산되고, 주의 가중치는 각 단어 간의 관계를 나타내는 행렬 형태로 표현될 수 있다.According to one embodiment, the attention value is calculated as the product of the attention weight and the value, and the attention weight can be expressed in the form of a matrix representing the relationship between each word.
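The attention computation just described (a word-by-word weight matrix multiplied by value vectors) can be illustrated with a plain scaled dot-product attention. This is a generic sketch with toy low-dimensional vectors, not the exact attention module of any particular language model; a trained LM would derive queries, keys, and values from its own projection matrices.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: returns the word-by-word weight matrix
    and the attention values (weights multiplied by the value vectors)."""
    d = len(keys[0])
    weights, outputs = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)  # one row of the attention weight matrix
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return weights, outputs
```

Each row of the returned weight matrix sums to 1 and expresses how strongly one word attends to every other word, which is what the transmitter uses to rank relevance.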

도 2를 참조하면, 목표 이미지(objective image)는 이미지(205)이고, 이를 생성된 텍스트 프롬프트로 표현하면 박스(207)('a white cat running through a field of flowers')로 표현할 수 있다.Referring to FIG. 2, the objective image is an image (205), which can be expressed as a generated text prompt as a box (207) ('a white cat running through a field of flowers').

최소 LPIPS 전송(lowest LPIPS transmission)(201)의 경우, 송신기는 수신기의 전체 모델을 사용하여 각 단어가 전송될 때 예상되는 이미지 재구성 결과를 시뮬레이션할 수 있다. 일 실시 예에 따라, 이미지 재구성 결과를 시뮬레이션하는 과정은 모든 후보 단어에 대해 시뮬레이션을 수행하는 과정일 수 있다.For lowest LPIPS transmission (201), the transmitter can simulate the expected image reconstruction result when each word is transmitted, using the entire model of the receiver. In one embodiment, the process of simulating the image reconstruction results can be a process of performing simulations for all candidate words.

이후, 모든 단어를 시뮬레이션한 후, 원본 이미지와 가장 유사한 이미지를 생성하는 단어를 선택하여 순차적으로 전송할 수 있다. 즉, 최소 LPIPS 전송은 LPIPS 값을 최소화하는 단어를 선택하여 전송할 수 있다.Afterwards, after simulating all words, the word that generates the image most similar to the original image can be selected and transmitted sequentially. In other words, minimum LPIPS transmission can be transmitted by selecting the word that minimizes the LPIPS value.

이후, 송신기는 반복적으로 단어를 하나씩 전송하고, 수신기는 수신한 단어를 하나씩 반복적으로 반영하여 최종 이미지를 재구성할 수 있다.Afterwards, the transmitter repeatedly transmits the words one by one, and the receiver repeatedly incorporates the received words one by one to reconstruct the final image.

이를 통해, 최소 LPIPS 전송에서는 각 단어가 전송될 때마다 재구성된 이미지가 점진적으로 원본 이미지와 유사해질 수 있다.With this, in lowest LPIPS transmission, the reconstructed image can become progressively more similar to the original image as each word is transmitted.

예를 들어, 최소 LPIPS 전송에서는 'cat', 'white', 'running', 'flower' 순으로 단어가 송신될 수 있다. 즉, 첫 번째 단어로 'cat'을 전송하여 고양이의 주요 요소가 재구성될 수 있다. 이후, 'white', 'running', 'flower' 단어를 순차적으로 전송하여 고양이의 색상, 동작, 배경 요소가 점진적으로 추가될 수 있다.For example, in lowest LPIPS transmission, the words can be transmitted in the order of 'cat', 'white', 'running', 'flower'. That is, by transmitting 'cat' as the first word, the main elements of the cat can be reconstructed. Then, by sequentially transmitting the words 'white', 'running', and 'flower', the color, motion, and background elements of the cat can be progressively added.

가장 주의하는 전송(most attentive transmission)(203)은 송신기가 수신기의 텍스트-이미지 생성 모델 중 일부만 캐시할 수 있는 경우에 대응되고, 이러한 경우 주로 LM을 사용하여 단어 간의 연관성을 분석한다. 구체적으로, 가장 주의하는 전송(most attentive transmission)은 각 단어 간의 주의(attention) 값을 계산하여, 가장 관련성이 높은 단어를 선택하여 전송할 수 있다. 이는 문장에서 가장 중요한 의미를 가진 단어를 우선적으로 전송하여, 빠르게 의미를 전달하는 방식일 수 있다.Most attentive transmission (203) corresponds to the case where the transmitter can only cache a part of the receiver's text-to-image generation model, and in this case, LM is mainly used to analyze the association between words. Specifically, most attentive transmission can calculate the attention value between each word and select and transmit the word with the highest relevance. This can be a method to quickly convey the meaning by preferentially transmitting the word with the most important meaning in the sentence.

따라서, 가장 주의하는 전송(most attentive transmission)(203)은 언어 모델을 이용하여 단어 간의 주의 값을 계산하고, 첫 번째로 가장 중요한 단어를 선택하여 송신하고, 이후 송신된 단어와 가장 관련성이 높은 단어를 순차적으로 전송할 수 있다.Therefore, most attentive transmission (203) can calculate the attention value between words using a language model, select the most important word first and transmit it, and then sequentially transmit the word most related to the transmitted word.

예를 들어, 가장 주의하는 전송(most attentive transmission)(203)은 'cat', 'white', 'running', 'through', 'field', 'flowers', 'of', 'a'의 순서로 단어를 전송할 수 있다. 즉, 첫 번째로 가장 중요한 단어인 'cat' 단어를 전송하여 고양이의 주요 요소를 재구성한 후, 'cat'과 가장 관련도가 높은 'white', 'running' 단어를 전송하여 고양이의 색상과 동작을 추가할 수 있다. 이후, 나머지 단어들이 송신된 단어와 연관성이 높은 순으로 순차적으로 전송되어 배경요소들이 추가될 수 있다.For example, the most attentive transmission (203) can transmit words in the order of 'cat', 'white', 'running', 'through', 'field', 'flowers', 'of', 'a'. That is, the most important word 'cat' is transmitted first to reconstruct the main elements of the cat, and then the words 'white' and 'running', which are most related to 'cat', are transmitted to add the color and movement of the cat. After that, the remaining words are sequentially transmitted in the order of their high relevance to the transmitted words so that background elements can be added.
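The attention-driven ordering above can be sketched as follows, assuming the transmitter has already obtained a word-by-word attention weight matrix from the cached language model. The matrix values below are hypothetical placeholders chosen only to reproduce the shape of the example; a real system would read them from the LM's attention module.

```python
def order_by_attention(words, attn, most_first=True):
    # attn[i][j]: (hypothetical) attention weight between words[i] and words[j].
    pick = max if most_first else min
    # First word: the one with the largest total attention (most important).
    first = max(range(len(words)), key=lambda i: sum(attn[i]))
    sent = [first]
    while len(sent) < len(words):
        rest = [i for i in range(len(words)) if i not in sent]
        # Next word: highest (or lowest) total attention w.r.t. words already sent.
        nxt = pick(rest, key=lambda i: sum(attn[i][j] + attn[j][i] for j in sent))
        sent.append(nxt)
    return [words[i] for i in sent]

words = ["a", "white", "cat"]
attn = [[0.1, 0.1, 0.2],
        [0.1, 0.2, 0.5],
        [0.2, 0.5, 0.6]]
print(order_by_attention(words, attn))                    # prints ['cat', 'white', 'a']
print(order_by_attention(words, attn, most_first=False))  # prints ['cat', 'a', 'white']
```

With `most_first=True` the selection follows the most attentive transmission (words most related to what was already sent go first); `most_first=False` corresponds to the least attentive transmission, which still sends the single most important word first but then prefers the least related words.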

가장 적게 주의하는 전송(least attentive transmission)(205)은 송신기가 수신기의 텍스트-이미지 생성 모델 중 일부만 캐시할 수 있는 경우에 대응되고, 이러한 경우 주로 LM을 사용하여 단어 간의 연관성을 분석한다. 구체적으로, 가장 적게 주의하는 전송(least attentive transmission)은 각 단어 간의 주의(attention) 값을 계산하여, 첫 번째 단어는 가장 중요한 단어를 전송하고 이후 첫 번째 단어와 가장 관련성이 낮은 단어를 선택하여 전송할 수 있다. 이는 문장에서 덜 중요한 의미를 가진 단어를 우선적으로 전송하여, 전송 초기에 중요하지 않은 정보가 전달되는 방식일 수 있다.Least attentive transmission (205) corresponds to the case where the transmitter can only cache a part of the receiver's text-to-image generation model, and in this case, the LM is mainly used to analyze the association between words. Specifically, least attentive transmission calculates the attention value between each word, transmits the most important word first, and then selects and transmits the words with the least relevance to the first word. This can be a method in which words with less important meanings in the sentence are transmitted preferentially, so that unimportant information is conveyed early in the transmission.

따라서, 가장 적게 주의하는 전송(least attentive transmission)(205)은 언어 모델을 이용하여 단어 간의 주의 값을 계산하고, 첫 번째로 가장 중요한 단어를 선택하여 송신하고, 이후 송신된 단어와 가장 관련성이 낮은 단어를 순차적으로 전송할 수 있다.Therefore, least attentive transmission (205) can calculate the attention value between words using a language model, select and transmit the most important word first, and then sequentially transmit the words with the least relevance to the transmitted words.

예를 들어, 가장 적게 주의하는 전송은 'cat', 'of', 'white', 'running', 'through', 'a', 'field', 'flower' 순으로 단어를 전송할 수 있다. 구체적으로, 첫번째 단어로 가장 중요한 'cat' 단어를 전송하여 고양이의 주요 요소를 재구성하고, 이후, 'of', 'white' 등의 덜 중요한 단어를 전송하여 이미지 재구성을 느리게 진행할 수 있다.For example, the least careful transmission can transmit words in the order of 'cat', 'of', 'white', 'running', 'through', 'a', 'field', 'flower'. Specifically, by transmitting the most important word 'cat' as the first word, the main elements of the cat can be reconstructed, and then less important words such as 'of', 'white', etc. can be transmitted to slowly reconstruct the image.

도 3은 본 개시의 다양한 실시 예에 따라, 전송 방식에 따른 시뮬레이션 결과를 도시한다.FIG. 3 illustrates simulation results according to transmission methods according to various embodiments of the present disclosure.

도 3을 참조하면, 최소 LPIPS 전송(lowest LPIPS Transmission)(301)은 LPIPS 값이 가장 낮으며, 가장 빠르게 목표 유사성 수준에 도달할 수 있다. 가장 주의하는 전송(Most Attentive Transmission)(303)은 LPIPS 값이 낮지만, 최소 LPIPS 전송 방식보다는 다소 높은 값을 가질 수 있다. 가장 적게 주의하는 전송(Least Attentive Transmission)(305)은 LPIPS 값이 가장 높으며, 많은 통신 과정이 필요할 수 있다.Referring to Fig. 3, the lowest LPIPS transmission (301) has the lowest LPIPS value and can reach the target similarity level the fastest. The most attentive transmission (303) has a low LPIPS value, but may have a value slightly higher than the lowest LPIPS transmission method. The least attentive transmission (305) has the highest LPIPS value and may require many communication processes.

도 4는 본 개시의 다양한 실시 예에 따른 장치 구성을 나타낸 도면이다. 도 4를 참조하면, 본 개시의 장치는 적어도 하나의 프로세서(410), 메모리(420) 및 네트워크와 연결되어 통신을 수행하는 통신 장치(430)를 포함할 수 있다. 또한, 장치(400)는 입력 인터페이스 장치(440), 출력 인터페이스 장치(450), 저장 장치(460) 등을 더 포함할 수 있다. 장치(400)에 포함된 각각의 구성 요소들은 버스(bus)(470)에 의해 연결되어 서로 통신을 수행할 수 있다.FIG. 4 is a diagram showing a device configuration according to various embodiments of the present disclosure. Referring to FIG. 4, the device of the present disclosure may include at least one processor (410), a memory (420), and a communication device (430) that is connected to a network and performs communication. In addition, the device (400) may further include an input interface device (440), an output interface device (450), a storage device (460), etc. Each component included in the device (400) may be connected by a bus (470) and communicate with each other.

다만, 장치(400)에 포함된 각각의 구성요소들은 공통 버스(470)가 아니라, 프로세서(410)를 중심으로 개별 인터페이스 또는 개별 버스를 통하여 연결될 수도 있다. 예를 들어, 프로세서(410)는 메모리(420), 통신 장치(430), 입력 인터페이스 장치(440), 출력 인터페이스 장치(450) 및 저장 장치(460) 중에서 적어도 하나와 전용 인터페이스를 통하여 연결될 수도 있다.However, each component included in the device (400) may be connected through an individual interface or individual bus centered around the processor (410), rather than a common bus (470). For example, the processor (410) may be connected to at least one of a memory (420), a communication device (430), an input interface device (440), an output interface device (450), and a storage device (460) through a dedicated interface.

프로세서(410)는 메모리(420) 및 저장 장치(460) 중에서 적어도 하나에 저장된 프로그램 명령(program command)을 실행할 수 있다. 프로세서(410)는 중앙 처리 장치(central processing unit, CPU), 그래픽 처리 장치(graphics processing unit, GPU), 또는 본 발명의 실시예들에 따른 방법들이 수행되는 전용의 프로세서를 의미할 수 있다. 메모리(420) 및 저장 장치(460) 각각은 휘발성 저장 매체 및 비휘발성 저장매체 중에서 적어도 하나로 구성될 수 있다. 예를 들어, 메모리(420)는 읽기 전용 메모리(read only memory, ROM) 및 랜덤 액세스 메모리(random access memory, RAM) 중에서 적어도 하나로 구성될 수 있다.The processor (410) can execute a program command stored in at least one of the memory (420) and the storage device (460). The processor (410) may mean a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor in which methods according to embodiments of the present invention are performed. Each of the memory (420) and the storage device (460) may be configured with at least one of a volatile storage medium and a nonvolatile storage medium. For example, the memory (420) may be configured with at least one of a read only memory (ROM) and a random access memory (RAM).

도 5는 본 개시의 일 실시 예에 따라, 차량 간 통신에 적용된 일 예를 도시한다. FIG. 5 illustrates an example applied to vehicle-to-vehicle communication according to one embodiment of the present disclosure.

도 5를 참조하면, 차량(20, 10, 30)은 서로 통신을 통하여 실시간 데이터를 전송할 수 있다. Referring to FIG. 5, vehicles (20, 10, 30) can transmit real-time data through communication with each other.

본 개시에 따르면, 자율 주행 차량의 경우, 도로 상황, 교통 신호, 장애물 등의 정보를 신속하게 주고받을 필요가 있다. 이 때, 이미지 데이터를 텍스트로 변환하여 전송하는 방식은 데이터 전송량을 줄이고, 통신 효율성을 높이는 데 유용할 수 있다. 예를 들어, 도로의 특정 상황(예: 사고 발생, 도로 공사 등)을 설명하는 이미지를 텍스트로 변환하여 전송할 수 있다. 즉, 이미지를 텍스트로 변환하여 "A car accident at the intersection of Main St. and 1st Ave"와 같은 텍스트를 전송할 수 있다.According to the present disclosure, autonomous vehicles need to rapidly exchange information on road conditions, traffic signals, obstacles, etc. In this case, a method of converting image data into text and transmitting it can be useful for reducing the amount of transmitted data and increasing communication efficiency. For example, an image describing a specific road condition (e.g., an accident, road construction, etc.) can be converted into text and transmitted. That is, by converting the image into text, a text such as "A car accident at the intersection of Main St. and 1st Ave" can be transmitted.

또한, 차량 간 통신에서 중요한 정보는 우선적으로 전송되어야 할 필요가 있다. 본 개시의 실시 예에 따르면, 가장 중요한 정보부터 전송하고, 부차적인 정보는 이후에 전송할 수 있다. 예를 들어, 첫 번째 텍스트로 "Accident Ahead", 그 다음 부차적인 정보로서 "At the intersection of Main St. and 1st Ave", "Two vehicles involved"를 송신할 수 있다.In addition, important information in vehicle-to-vehicle communication needs to be transmitted with priority. According to an embodiment of the present disclosure, the most important information can be transmitted first, and secondary information can be transmitted later. For example, "Accident Ahead" can be transmitted as the first text, and then "At the intersection of Main St. and 1st Ave" and "Two vehicles involved" can be transmitted as secondary information.

또한, 차량 간 통신에서 각 차량은 서로 다른 제조사와 모델일 수 있으며, 다양한 센서와 데이터 포맷을 사용할 수 있다. 이러한 경우, 공통의 텍스트 기반 의미 표현을 사용하면 상호 운용성을 높일 수 있다.Additionally, in vehicle-to-vehicle communication, each vehicle may be of a different make and model, and may use different sensors and data formats. In such cases, using a common text-based semantic representation can improve interoperability.

정리하면, 본 개시는 차량 간 통신에도 적용될 수 있으며, 특히 데이터 전송 효율성, 실시간 정보 공유, 상호 운용성 향상 측면에서 큰 이점을 제공할 수 있다. 이러한 기술을 차량 간 통신에 적용하면 자율 주행 차량의 안전성과 효율성을 크게 향상시킬 수 있다.In summary, the present disclosure can also be applied to vehicle-to-vehicle communication, and can provide great advantages, especially in terms of data transmission efficiency, real-time information sharing, and improved interoperability. Applying this technology to vehicle-to-vehicle communication can greatly improve the safety and efficiency of autonomous vehicles.

본 개시의 청구항 또는 명세서에 기재된 실시 예들에 따른 방법들은 하드웨어, 소프트웨어, 또는 하드웨어와 소프트웨어의 조합의 형태로 구현될(implemented) 수 있다. The methods according to the embodiments described in the claims or specification of the present disclosure may be implemented in the form of hardware, software, or a combination of hardware and software.

소프트웨어로 구현하는 경우, 하나 이상의 프로그램(소프트웨어 모듈)을 저장하는 컴퓨터 판독 가능 저장 매체가 제공될 수 있다. 컴퓨터 판독 가능 저장 매체에 저장되는 하나 이상의 프로그램은, 전자 장치(device) 내의 하나 이상의 프로세서에 의해 실행 가능하도록 구성된다(configured for execution). 하나 이상의 프로그램은, 전자 장치로 하여금 본 개시의 청구항 또는 명세서에 기재된 실시 예들에 따른 방법들을 실행하게 하는 명령어(instructions)를 포함한다. In the case of software implementation, a computer-readable storage medium storing one or more programs (software modules) may be provided. The one or more programs stored in the computer-readable storage medium are configured for execution by one or more processors in an electronic device. The one or more programs include instructions that cause the electronic device to execute methods according to the embodiments described in the claims or specification of the present disclosure.

이러한 프로그램(소프트웨어 모듈, 소프트웨어)은 랜덤 액세스 메모리 (random access memory), 플래시(flash) 메모리를 포함하는 불휘발성(non-volatile) 메모리, 롬(read only memory, ROM), 전기적 삭제가능 프로그램가능 롬(electrically erasable programmable read only memory, EEPROM), 자기 디스크 저장 장치(magnetic disc storage device), 컴팩트 디스크 롬(compact disc-ROM, CD-ROM), 디지털 다목적 디스크(digital versatile discs, DVDs) 또는 다른 형태의 광학 저장 장치, 마그네틱 카세트(magnetic cassette)에 저장될 수 있다. 또는, 이들의 일부 또는 전부의 조합으로 구성된 메모리에 저장될 수 있다. 또한, 각각의 구성 메모리는 다수 개 포함될 수도 있다. These programs (software modules, software) may be stored in a random access memory, a non-volatile memory including flash memory, a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a magnetic disc storage device, a compact disc-ROM (CD-ROM), digital versatile discs (DVDs) or other forms of optical storage devices, a magnetic cassette. Or, they may be stored in a memory composed of a combination of some or all of these. In addition, each configuration memory may be included in multiple numbers.

또한, 프로그램은 인터넷(Internet), 인트라넷(Intranet), LAN(local area network), WAN(wide area network), 또는 SAN(storage area network)과 같은 통신 네트워크, 또는 이들의 조합으로 구성된 통신 네트워크를 통하여 접근(access)할 수 있는 부착 가능한(attachable) 저장 장치(storage device)에 저장될 수 있다. 이러한 저장 장치는 외부 포트를 통하여 본 개시의 실시 예를 수행하는 장치에 접속할 수 있다. 또한, 통신 네트워크상의 별도의 저장장치가 본 개시의 실시 예를 수행하는 장치에 접속할 수도 있다.Additionally, the program may be stored in an attachable storage device that is accessible via a communications network, such as the Internet, an Intranet, a local area network (LAN), a wide area network (WAN), or a storage area network (SAN), or a combination thereof. The storage device may be connected to a device performing an embodiment of the present disclosure via an external port. Additionally, a separate storage device on the communications network may be connected to a device performing an embodiment of the present disclosure.

상술한 본 개시의 구체적인 실시 예들에서, 개시에 포함되는 구성 요소는 제시된 구체적인 실시 예에 따라 단수 또는 복수로 표현되었다. 그러나, 단수 또는 복수의 표현은 설명의 편의를 위해 제시한 상황에 적합하게 선택된 것으로서, 본 개시가 단수 또는 복수의 구성 요소에 제한되는 것은 아니며, 복수로 표현된 구성 요소라 하더라도 단수로 구성되거나, 단수로 표현된 구성 요소라 하더라도 복수로 구성될 수 있다.In the specific embodiments of the present disclosure described above, the components included in the disclosure are expressed in the singular or plural form depending on the specific embodiment presented. However, the singular or plural expressions are selected to suit the presented situation for the convenience of explanation, and the present disclosure is not limited to the singular or plural components, and even if a component is expressed in the plural form, it may be composed of the singular form, or even if a component is expressed in the singular form, it may be composed of the plural form.

한편 본 개시의 상세한 설명에서는 구체적인 실시 예에 관해 설명하였으나, 본 개시의 범위에서 벗어나지 않는 한도 내에서 여러 가지 변형이 가능함은 물론이다. 그러므로 본 개시의 범위는 설명된 실시 예에 국한되어 정해져서는 아니 되며 후술하는 특허청구의 범위뿐만 아니라 이 특허청구의 범위와 균등한 것들에 의해 정해져야 한다.Meanwhile, although the detailed description of the present disclosure has described specific embodiments, it is obvious that various modifications are possible within the scope of the present disclosure. Therefore, the scope of the present disclosure should not be limited to the described embodiments, but should be determined not only by the scope of the claims described below, but also by equivalents of the scope of the claims.

Claims (10)

통신 시스템에서 단말의 동작 방법에 있어서,In the method of operating a terminal in a communication system, 제1 이미지를 수신하는 과정과,The process of receiving the first image, 상기 제1 이미지로부터 제1 모델을 기반으로 텍스트 프롬프트를 생성하는 과정과,A process of generating a text prompt based on the first model from the first image above, 상기 텍스트 프롬프트에 포함된 단어의 중요성을 평가하는 과정과,The process of evaluating the importance of words included in the above text prompt, 상기 단어의 중요성에 기반하여 다른 단말로 상기 단어를 송신하는 과정을 포함하는, 방법A method comprising the process of transmitting said word to another terminal based on the importance of said word. 청구항 1에 있어서, 제2 모델은 상기 다른 단말이 상기 단어에 의한 제1 이미지를 예측하기 위한 모델이고,In claim 1, the second model is a model for predicting the first image by the word by the other terminal, 상기 제2 모델은 상기 단말에 일부, 또는 전부 저장되어 있는, 방법The above second model is partially or completely stored in the terminal, 청구항 2에 있어서,In claim 2, 상기 단말에 저장되어 있는 상기 제2 모델에 기반하여, 상기 단어에 의하여 제2 이미지를 예측하는 과정과,A process of predicting a second image based on the second model stored in the terminal, and 상기 제1 이미지와 상기 제2 이미지의 유사성을 식별하는 과정과,A process for identifying the similarity between the first image and the second image, 상기 유사성이 가장 높은 단어를 선택하는 과정과,The process of selecting the word with the highest similarity, 상기 가장 높은 단어를 순차적으로 상기 다른 단말에 송신하는 과정을 포함하는, 방법A method comprising a process of sequentially transmitting the highest word to the other terminal. 
청구항 2에 있어서,In claim 2, 상기 단말에 저장되어 있는 상기 제2 모델에 기반하여, 상기 단어에 의하여 제2 이미지를 예측하는 과정은,The process of predicting a second image by the word based on the second model stored in the terminal is as follows: 상기 제1 이미지에서 가장 중요한 단어를 선택하고 송신하는 과정과,The process of selecting and transmitting the most important word from the first image above, 상기 가장 중요한 단어와 관련성이 높은 순서대로 순차적으로 송신하는 과정을 포함하는, 방법A method comprising the process of sequentially transmitting the most important words in order of high relevance to the above 청구항 2에 있어서,In claim 2, 상기 단말에 저장되어 있는 상기 제2 모델에 기반하여, 상기 단어에 의하여 제2 이미지를 예측하는 과정은,The process of predicting a second image by the word based on the second model stored in the terminal is as follows: 상기 제1 이미지에서 가장 중요한 단어를 선택하고 송신하는 과정과,The process of selecting and transmitting the most important word from the first image above, 상기 가장 중요한 단어와 관련성이 낮은 순서대로 순차적으로 송신하는 과정을 포함하는, 방법A method comprising the process of sequentially transmitting the most important words in order of decreasing relevance to the above words. 통신 시스템에서 단말에 있어서,In a terminal in a communication system, 송수신부와,Transmitter and receiver, 상기 송수신부와 동작 가능하게 연결된 제어부를 포함하고,A control unit operably connected to the above transmitter and receiver, 상기 제어부는,The above control unit, 제1 이미지를 수신하고,Receive the first image, 상기 제1 이미지로부터 제1 모델을 기반으로 텍스트 프롬프트를 생성하고,Generate a text prompt based on the first model from the first image above, 상기 텍스트 프롬프트에 포함된 단어의 중요성을 평가하고,Evaluate the importance of words included in the above text prompt, 상기 단어의 중요성에 기반하여 다른 단말로 상기 단어를 송신하는, 장치A device that transmits said word to another terminal based on the importance of said word. 청구항 6에 있어서, 제2 모델은 상기 다른 단말이 상기 단어에 의한 제1 이미지를 예측하기 위한 모델이고,In claim 6, the second model is a model for predicting the first image by the word by the other terminal, 상기 제2 모델은 상기 단말에 일부, 또는 전부 저장되어 있는, 장치The above second model is a device, partly or wholly stored in the terminal. 
8. The terminal of claim 7, wherein, to predict a second image from the words based on the second model stored in the terminal, the controller is configured to:
identify a similarity between the first image and the second image;
select a word with the highest similarity; and
sequentially transmit the selected word to the other terminal.

9. The terminal of claim 7, wherein, to predict the second image from the words based on the second model stored in the terminal, the controller is configured to:
select and transmit a most important word for the first image; and
sequentially transmit remaining words in descending order of relevance to the most important word.

10. The terminal of claim 7, wherein, to predict the second image from the words based on the second model stored in the terminal, the controller is configured to:
select and transmit a most important word for the first image; and
sequentially transmit remaining words in ascending order of relevance to the most important word.
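The selection loop of claims 3 and 8 can be sketched as a greedy procedure: at each round, the transmitter tentatively appends each remaining candidate word to the prompt, predicts the image the receiver's generative model (the second model) would produce, and transmits the word whose predicted image is most similar to the source image. The sketch below is illustrative only, not the patented implementation: the first model (image captioning) and second model (text-to-image generation) are stood in for by toy deterministic feature vectors so the control flow is runnable, and `predict_image` and `similarity` are hypothetical placeholders for real model inference and an image-similarity metric.

```python
from typing import List, Tuple

def predict_image(prompt: Tuple[str, ...]) -> List[float]:
    # Toy stand-in for the second (text-to-image) model: each word
    # contributes a fixed feature vector to the "predicted image".
    features = {
        "dog": [1.0, 0.0, 0.0],
        "park": [0.0, 1.0, 0.0],
        "ball": [0.0, 0.0, 1.0],
    }
    acc = [0.0, 0.0, 0.0]
    for w in prompt:
        for i, v in enumerate(features.get(w, [0.0, 0.0, 0.0])):
            acc[i] += v
    return acc

def similarity(a: List[float], b: List[float]) -> float:
    # Cosine similarity between two image feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def select_transmission_order(source_feature: List[float],
                              candidate_words: List[str]) -> List[str]:
    """Order candidate words so that each transmitted word maximally
    improves the receiver-side reconstruction of the source image."""
    sent: List[str] = []
    remaining = list(candidate_words)
    while remaining:
        # Pick the word whose addition yields the prediction most
        # similar to the source image.
        best = max(
            remaining,
            key=lambda w: similarity(
                source_feature, predict_image(tuple(sent + [w]))
            ),
        )
        sent.append(best)  # the terminal would transmit `best` here
        remaining.remove(best)
    return sent

# Source image is mostly "dog", somewhat "park": "dog" is sent first.
order = select_transmission_order([1.0, 0.4, 0.1], ["park", "dog", "ball"])
print(order)
```

In a real system the loop would stop early once the similarity exceeds a target threshold, which is what makes the transmission "sequential" and importance-ordered rather than sending the whole caption.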
PCT/KR2024/011208 2023-08-29 2024-07-31 Apparatus and method for sequential semantic generation communication in communication system Pending WO2025048288A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2023-0113485 2023-08-29
KR20230113485 2023-08-29
KR1020240084186A KR20250033935A (en) 2023-08-29 2024-06-27 Apparatus and method for sequential semantic generative communication in wireless communication system
KR10-2024-0084186 2024-06-27

Publications (1)

Publication Number Publication Date
WO2025048288A1 true WO2025048288A1 (en) 2025-03-06

Family

ID=94820029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2024/011208 Pending WO2025048288A1 (en) 2023-08-29 2024-07-31 Apparatus and method for sequential semantic generation communication in communication system

Country Status (1)

Country Link
WO (1) WO2025048288A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190049886A * 2017-01-25 2019-05-09 Google LLC Automatic suggestion responses to images received in messages using the language model
KR102148331B1 * 2019-02-11 2020-08-26 Lee Chan-hee System and method for providing contents for the blind and recording medium storing program to implement the method
KR20210130980A * 2020-04-23 2021-11-02 Korea Advanced Institute of Science and Technology Apparatus and method for automatically generating domain specific image caption using semantic ontology
KR20230062430A * 2021-10-29 2023-05-09 Seoul National University R&DB Foundation Method, apparatus and system for determining story-based image sequence
KR20230072454A * 2021-11-17 2023-05-24 LG Management Development Institute Co., Ltd. Apparatus, method and program for bidirectional generation between image and text


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NAM HYELIN, KIM JINHYUK, PARK JIHONG, KIM SEONG-LYUN: "Semantic Transmission based Semantic Communication through Image-Text Multimodal Transformation", 2023 KOREAN INSTITUTE OF COMMUNICATION SCIENCES SUMMER CONFERENCE, KOREA INSTITUTE OF COMMUNICATION SCIENCES, KOREA, vol. 2023, 21 June 2023 (2023-06-21) - 24 June 2023 (2023-06-24), Korea, pages 0712 - 0713, XP093284574 *

Similar Documents

Publication Publication Date Title
CN111368993B (en) Data processing method and related equipment
WO2020027540A1 (en) Apparatus and method for personalized natural language understanding
CN112487182A (en) Training method of text processing model, and text processing method and device
CN111461226A (en) Adversarial sample generation method, device, terminal and readable storage medium
WO2022068627A1 (en) Data processing method and related device
JP2022103149A (en) Image processing methods and computing devices
US20230099117A1 (en) Spiking neural network-based data processing method, computing core circuit, and chip
CN113486665A (en) Privacy protection text named entity recognition method, device, equipment and storage medium
CN115118675B (en) Data stream transmission acceleration method and system based on intelligent network card equipment
CN107766319B (en) Sequence conversion method and device
WO2022146080A1 (en) Algorithm and method for dynamically changing quantization precision of deep-learning network
WO2020192523A1 (en) Translation quality detection method and apparatus, machine translation system, and storage medium
WO2025060878A1 (en) Information processing method, electronic device and computer readable storage medium
CN116821307B (en) Content interaction method, device, electronic equipment and storage medium
CN114333790A (en) Data processing method, device, equipment, storage medium and program product
WO2023033194A1 (en) Knowledge distillation method and system specialized for pruning-based deep neural network lightening
WO2025048288A1 (en) Apparatus and method for sequential semantic generation communication in communication system
WO2019107625A1 (en) Machine translation method and apparatus therefor
KR102465680B1 (en) Method for, device for, and system for tracking a dialogue state
WO2023171886A1 (en) Deep learning-based molecule design method, and device and computer program for performing same
CN114510911A (en) Text processing method and device, computer equipment and storage medium
KR20250033935A (en) Apparatus and method for sequential semantic generative communication in wireless communication system
CN114360500B (en) Speech recognition method and device, electronic equipment and storage medium
CN114169295B (en) Model training and text generation method, device, electronic device and storage medium
WO2022107951A1 (en) Method for training ultra-lightweight deep learning network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24860209

Country of ref document: EP

Kind code of ref document: A1