
WO2025018681A1 - Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method using the same - Google Patents

Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method using the same

Info

Publication number
WO2025018681A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
short form
short
markup language
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/KR2024/009799
Other languages
English (en)
Korean (ko)
Inventor
이재호
박기웅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nigc Co Ltd
Original Assignee
Nigc Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nigc Co Ltd
Publication of WO2025018681A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services

Definitions

  • the present disclosure relates to an electronic device for automatically generating a short form based on data collected through artificial intelligence, and a method using the same. More specifically, the present disclosure relates to a device for automatically generating a short form related to a markup language body using data extracted from a markup language body source through an artificial intelligence learning model, and a method using the same.
  • short-form video data with short video playback time is attracting people's attention.
  • short-form video is enough to capture the attention of users if it is a field that users are interested in. This can be the basis for the recent demand for short-form video.
  • An embodiment of the present disclosure has been proposed to solve the above-described problem, and can automatically provide content that is expected to be of high interest to users by extracting attribute information through various data from a markup language body source and generating it in short form.
  • An electronic device for automatically generating a short form based on data collected via artificial intelligence includes: a content collection module for collecting data related to the automatic generation of the short form; and a processor for controlling an operation of a short form providing module for providing data related to the automatic generation of the short form as the short form, wherein the processor may be configured to collect first data and second data from at least one markup language body via the content collection module, and to provide at least one short form related to the markup language body via the short form providing module.
  • the processor may be configured to confirm, through the content collection module, rank information corresponding to each of the first data and the second data and event information corresponding to the markup language body, and, through the short form provision module, reflect the rank information corresponding to the first data and the event information corresponding to the first markup language body or the second markup language body from which the first data is collected, thereby generating a first short form based on the first data, and, through the short form provision module, providing the first short form.
  • the processor may be configured to generate a second short form based on the second data by reflecting rank information corresponding to the second data and event information corresponding to the first markup language body or the second markup language body from which the second data is collected through the short form providing module, and provide the second short form through the short form providing module.
  • the processor may be characterized in that, when collecting third data from a markup language body other than the first markup language body or the second markup language body through the content collection module, the processor is set to provide, through the short form providing module, a third short form corresponding to the third data collected from the other markup language body based on information accumulated in a category database and a dictionary database in the process of generating the first short form and the second short form through the first data and the second data.
  • the processor may be characterized in that it is set to identify keywords based on texts constituting the first data and the second data, extract a first keyword for the first data and the second data using a preset first regular expression, extract a second keyword after extraction of the first keyword using a preset second regular expression, extract a third keyword using a preset third regular expression, and provide keywords having an extraction frequency higher than a preset frequency among the extracted third keywords as separate keyword data, and store the separate keyword data in a dictionary database.
  • the processor may be characterized in that it is set to identify keywords based on texts constituting the first data and the second data, provide keywords with an extraction frequency higher than a preset frequency among the keywords as separate keyword data, and, when the separate keyword data is stored in a dictionary database for storing the keywords, determine a regular expression for extracting the separate keyword in the first data and the second data as a fourth regular expression.
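  • As an illustration only, the staged keyword extraction described above can be sketched in Python. The disclosure does not specify the regular expressions, so the three patterns below (strip markup tags, strip punctuation, keep words of three or more characters) and the names `extract_keywords` and `min_frequency` are assumptions; reading the three expressions as successive filtering stages is one possible interpretation.

```python
import re
from collections import Counter

# Hypothetical patterns: the disclosure does not say what the expressions are.
FIRST_RE = re.compile(r"<[^>]+>")      # stage 1: drop markup tags
SECOND_RE = re.compile(r"[^\w\s]")     # stage 2: drop punctuation
THIRD_RE = re.compile(r"\b\w{3,}\b")   # stage 3: keep words of 3+ characters


def extract_keywords(texts, min_frequency=2):
    """Run the three stages in order and keep keywords seen more than min_frequency times."""
    counts = Counter()
    for text in texts:
        stage1 = FIRST_RE.sub(" ", text)
        stage2 = SECOND_RE.sub(" ", stage1)
        counts.update(word.lower() for word in THIRD_RE.findall(stage2))
    # Only keywords above the preset frequency become "separate keyword data".
    return {word: n for word, n in counts.items() if n > min_frequency}


# A plain dict stands in for the dictionary database.
dictionary_db = {}
dictionary_db.update(extract_keywords([
    "<p>Running shoes, running socks!</p>",
    "<b>Running</b> shoes on sale",
]))
```

  • In this sketch only "running" exceeds the assumed threshold of 2, so it is the only keyword stored in `dictionary_db`.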
  • the processor may include a data classification module for classifying data related to automatic generation of the short form, and may be characterized by mapping categories and sounds corresponding to the first data and the second data through the data classification module.
  • the processor may be configured to check the number of identical texts between the texts constituting the first data and the texts constituting the second data through the data classification module, and if the number of identical texts checked is greater than or equal to a preset number, classify the first data and the second data as similar content through the data classification module, and if the number of identical texts checked is less than a preset number, classify the first data and the second data as dissimilar content through the data classification module.
  • the processor may be configured to check the first category and the second category through the category database, and if the first data and the second data are similar contents, map the first data and the second data to the first category through the data classification module, and if the first data and the second data are dissimilar contents, map the first data to the first category while mapping the second data to the second category through the data classification module.
  • the processor may be characterized in that it is set to map the first sound to the first data according to the category to which the first data is mapped through the data classification module, and to map the second sound to the second data according to the category to which the second data is mapped through the data classification module.
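  • The similarity check, category mapping, and sound mapping described above can be sketched as follows. This is a minimal sketch under assumptions: "identical texts" is read as distinct whitespace-separated words shared by both inputs, the threshold of 3 is taken from the example given later in the description, and the names `Mapped`, `SOUND_BY_CATEGORY`, and `classify_and_map` are hypothetical.

```python
from dataclasses import dataclass

PRESET_COUNT = 3  # assumed threshold; the description gives 3 only as an example

# Hypothetical category-to-sound table standing in for the category database.
SOUND_BY_CATEGORY = {"first": "first_sound.mp3", "second": "second_sound.mp3"}


@dataclass
class Mapped:
    text: str
    category: str
    sound: str


def shared_text_count(first: str, second: str) -> int:
    """Count distinct words that appear in both inputs (case-insensitive)."""
    return len(set(first.lower().split()) & set(second.lower().split()))


def classify_and_map(first: str, second: str, preset: int = PRESET_COUNT):
    """Map both items to the first category if similar, else split them."""
    similar = shared_text_count(first, second) >= preset
    second_category = "first" if similar else "second"
    return (
        Mapped(first, "first", SOUND_BY_CATEGORY["first"]),
        Mapped(second, second_category, SOUND_BY_CATEGORY[second_category]),
    )
```

  • For example, `classify_and_map("Acme trail running shoes", "Acme trail running jacket")` places both items in the first category because they share three words, while two unrelated texts end up in different categories with different sounds.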
  • a method for automatically generating a short form based on data collected through artificial intelligence performed by an electronic device may include the steps of collecting first data and second data from at least one markup language body through a content collection module of the electronic device; and providing at least one short form related to the markup language body through a short form providing module of the electronic device.
  • the method may further include a step of confirming, through the content collection module, rank information corresponding to each of the first data and the second data, and event information corresponding to the markup language body; and, through the short form providing module, a step of generating a first short form based on the first data by reflecting the rank information corresponding to the first data, and event information corresponding to the first markup language body or the second markup language body from which the first data is collected; and, the method may further include a step of providing the first short form through the short form providing module.
  • the method may further include a step of generating a second short form based on the second data by reflecting rank information corresponding to the second data and event information corresponding to the first markup language body or the second markup language body from which the second data is collected through the short form providing module; and may further include a step of providing the second short form through the short form providing module.
  • the method may further include providing a third short form corresponding to the third data collected from the other markup language body based on information accumulated in the category database and the dictionary database in the process of generating the first short form and the second short form through the first data and the second data through the short form providing module.
  • the method may be characterized in that a keyword is identified based on texts constituting the first data and the second data through a processor of the electronic device, a first keyword for the first data and the second data is extracted using a first regular expression set through the processor, a second keyword is extracted using a second regular expression set after extraction of the first keyword through the processor, a third keyword is extracted using a third regular expression set through the processor, and, among the extracted third keywords, a keyword having an extraction frequency higher than a preset frequency is provided as separate keyword data through the processor, and the separate keyword data is stored in a dictionary database through the processor.
  • a short-form automatic generation electronic device can automatically generate a short-form video from data related to the body of a markup language source that a user accesses through a terminal, using a search engine such as a crawling engine or a machine learning processor, thereby providing immediate video information on content of interest to the user.
  • an electronic device for automatically generating short-form videos can use artificial intelligence to extract a large amount of content related to the small number of web pages a user has visited, analyze similar content, and automatically generate short-form videos from it.
  • FIG. 1 is a schematic block diagram of a system for automatically generating short forms based on data collected via artificial intelligence according to various embodiments of the present disclosure.
  • FIG. 2 is a schematic block diagram of components of a short-form automatic generation electronic device according to various embodiments of the present disclosure.
  • FIG. 3 is a schematic flowchart of a method for automatically generating a short form according to various embodiments of the present disclosure.
  • FIG. 4 is a flowchart illustrating a process for generating a short-form automatic generation process model according to various embodiments of the present disclosure.
  • FIG. 5 is a flowchart of a short form generation process reflecting additional information of a markup language body according to various embodiments of the present disclosure.
  • FIG. 6 is a detailed flowchart regarding short-form generation through a short-form automatic generation process model according to various embodiments of the present disclosure.
  • first, second, etc. are used to distinguish one component from another, and the components are not limited by the aforementioned terms.
  • the identifiers of each step are used for convenience of explanation and do not describe the order of the steps. Each step may be performed in a different order than specified unless the context clearly indicates a specific order.
  • the device according to the present disclosure includes all of the various devices that can perform computational processing and provide results to a user.
  • the device according to the present disclosure may include all of a computer, a server device, and a portable terminal, or may be in the form of any one of them.
  • the computer may include, for example, a notebook, desktop, laptop, tablet PC, slate PC, etc. equipped with a web browser.
  • the above server device is a server that processes information by communicating with an external device, and may include an application server, a computing server, a database server, a file server, a game server, a mail server, a proxy server, and a web server.
  • the above portable terminal may include, for example, all kinds of handheld-based wireless communication devices such as a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal, a smart phone, and a wearable device such as a watch, a ring, a bracelet, an anklet, a necklace, glasses, contact lenses, or a head-mounted device (HMD).
  • FIG. 1 is a schematic block diagram of a system for automatically generating short forms based on data collected via artificial intelligence according to various embodiments of the present disclosure.
  • a system for automatically generating a short form based on data collected through artificial intelligence (hereinafter referred to as 'short form automatic generation system (10)') includes an electronic device (100) and a terminal (200). Each node can exchange data with other nodes. Each node can be connected through a network.
  • An electronic device (100) is a device that automatically generates short-form content.
  • Short-form content refers to content created in a short video format.
  • the electronic device (100) can automatically generate short-form content and provide it to a user based on data collected from a specific site that the user browses or that is crawled from the web, or from content that constitutes a markup language body.
  • the electronic device (100) can transmit or receive data to or from another device (e.g., a terminal (200)) through a wired and/or wireless connection.
  • the terminal (200) may be a device that receives a short form generated from an electronic device (100).
  • the terminal (200) may provide the electronic device (100) with learning data necessary for the electronic device (100) to create a short form generation model through learning data using artificial intelligence.
  • the terminal (200) may include a device such as a mobile phone, a laptop computer, a personal computer, or a tablet carried by a user.
  • the terminal (200) may be any device carried by a user and connected to the electronic device (100) via a network so that data can be exchanged between the two.
  • FIG. 2 is a schematic block diagram of components of a short-form automatic generation electronic device according to various embodiments of the present disclosure.
  • the electronic device (100) may include, but is not limited to, a processor (110), a category database (120), and a dictionary database (130) as internal components. Each node may exchange data with other nodes. Each node may be directly electrically connected or may be wired and/or wirelessly connected through a network.
  • the electronic device (100) of the present disclosure may perform the function of the processor (110) through a separate server instead of the processor (110).
  • the electronic device (100) is a device that verifies data collected by a user through a terminal (200) or acquired through an input performed through the terminal (200). For example, the electronic device (100) can verify Internet cookie data, Internet log data, web page data, etc. that a user searches for on an Internet web page by accessing the Internet web page through the terminal (200). That is, the electronic device (100) can verify contents that the user encounters using the terminal (200) in real time, and can process data related to the contents to generate a short form that the user may be interested in.
  • the processor (110) includes, but is not limited to, a content collection module (111), a data classification module (112), and a short-form provision module (113).
  • the content collection module (111) can collect data from at least one markup language body.
  • the markup language body can mean an HTML body.
  • the content collection module (111) can collect all data in the HTML body by converting them into text, and can also collect images or videos.
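  • As a rough illustration of collecting text and images from an HTML body, the sketch below uses Python's standard `html.parser` module. The class name `BodyCollector` and the function `collect_body` are hypothetical, and skipping `<script>` and `<style>` content is an assumption made here; the disclosure does not describe how the content collection module (111) is implemented.

```python
from html.parser import HTMLParser


class BodyCollector(HTMLParser):
    """Collect visible text and image URLs found inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0   # greater than 0 while inside <script> or <style>
        self.texts = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img" and self.in_body:
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.in_body and not self.skip_depth and text:
            self.texts.append(text)


def collect_body(html: str):
    """Return (texts, image_urls) extracted from the HTML body."""
    collector = BodyCollector()
    collector.feed(html)
    collector.close()
    return collector.texts, collector.images
```

  • Video sources could be gathered the same way by also handling `<video>` and `<source>` tags; that extension is omitted here for brevity.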
  • the data classification module (112) can map categories and sounds corresponding to data of the markup language body. For example, the data classification module (112) can determine the mutual similarity between data and classify categories based on the similarity. The data classification module (112) can map sounds suitable for each category to each data in response to the category classification of data. Accordingly, the processor (110) can generate a short form of a set time (e.g., 1 minute) according to the number of data, categories, etc.
  • the short-form providing module (113) can provide at least one short form related to the markup language body to the terminal (200).
  • the short-form providing module (113) can generate a short form corresponding to the data of the markup language body using a trained model that is based on the short-form generation model for operating the data classification module (112), and provide the short form to the terminal (200). That is, the short-form providing module (113) can generate content that may be of interest to a user possessing the terminal (200) and provide it as a short form.
  • the short-form providing module (113) can generate in real time a short form related to content provided on a web page that the user is viewing in real time through the terminal (200) and provide the short form to at least a part of the web page that the user is viewing in real time.
  • the short-form provision module (113) can generate a short-form related to content provided on a web page that a user is viewing in real time through a terminal (200) and provide it in a pop-up format on at least a portion of the web page that the user is viewing in real time.
  • the processor (110) may include individual modules (111, 112, 113).
  • the individual modules (111, 112, 113) may mean functional blocks according to the functions that the processor (110) operates.
  • the individual modules (111, 112, 113) may correspond to functional blocks that are given names according to the functions of the processor (110).
  • the processor (110) may be implemented with a memory (not shown) that stores data for an algorithm for controlling the operation of components within the electronic device (100), or a program that implements the algorithm, and with at least one functional block that performs the above-described operations using the data stored in the memory.
  • the processor (110) and the memory may be implemented as separate chips.
  • the processor (110) and the memory may be implemented as a single chip.
  • the processor (110) can control one or more of the components described above in combination to implement various embodiments according to the present disclosure described in FIGS. 3 to 6 below in an electronic device (100).
  • a communication unit may include one or more components that enable communication with an external device (e.g., a terminal (200) of FIG. 1), and may include, for example, at least one of a broadcast receiving module, a wired communication module, a wireless communication module, a short-range communication module, and a location information module.
  • the wired communication module may include various wired communication modules such as a Local Area Network (LAN) module, a Wide Area Network (WAN) module, or a Value Added Network (VAN) module, as well as various cable communication modules such as a Universal Serial Bus (USB), a High Definition Multimedia Interface (HDMI), a Digital Visual Interface (DVI), RS-232 (Recommended Standard 232), power line communication, or Plain Old Telephone Service (POTS).
  • the wireless communication module may include, in addition to a WiFi module and a wireless broadband module, a wireless communication module that supports various wireless communication methods such as GSM (Global System for Mobile Communication), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), UMTS (Universal Mobile Telecommunications System), TDMA (Time Division Multiple Access), LTE (Long Term Evolution), 4G, 5G, and 6G.
  • the short-range communication module is for short-range communication and can support short-range communication using at least one of Bluetooth™, RFID (Radio Frequency Identification), IrDA (Infrared Data Association), UWB (Ultra Wideband), ZigBee, NFC (Near Field Communication), Wi-Fi (Wireless-Fidelity), Wi-Fi Direct, and Wireless USB (Wireless Universal Serial Bus) technologies.
  • the memory can store data supporting various functions of the electronic device (100) and a program for the operation of the processor (110), can store input/output data (e.g., images, videos, etc.), and can store a plurality of application programs (or applications) run on the electronic device (100), data for the operation of the electronic device (100), and commands. At least some of these application programs can be downloaded from an external server via wireless communication.
  • Such memory may include at least one type of storage medium among a flash memory type, a hard disk type, an SSD (Solid State Disk) type, an SDD (Silicon Disk Drive) type, a multimedia card micro type, a card type memory (for example, an SD or XD memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.
  • the memory may be a database that is separate from the electronic device (100) but connected by wire or wirelessly.
  • At least one component may be added or deleted in accordance with the performance of the components illustrated in FIG. 2.
  • the mutual positions of the components may be changed in accordance with the performance or structure of the device.
  • each component illustrated in FIG. 2 represents software and/or hardware components such as a Field Programmable Gate Array (FPGA) and an Application Specific Integrated Circuit (ASIC).
  • FIG. 3 is a schematic flowchart of a method for automatically generating a short form according to various embodiments of the present disclosure.
  • the processor can collect data from the markup language body.
  • the processor can collect data within a web page that a user is surfing using a terminal (e.g., the terminal (200) of FIG. 1) through a content collection module (e.g., the content collection module (111) of FIG. 2).
  • the data may include first data and second data, and the first data and the second data are examples of various data that can be extracted from the markup language body source.
  • the processor can extract the markup language body source of the original page of various web pages, such as a website for purchasing products, a news site providing news, etc. Accordingly, the processor can extract the markup language body source of the original page together and store it in the memory. For example, the processor can extract the markup language body source from a web page where a user is shopping, and check various data, such as the product type, product image, product review, product price, and product details. As another example, the processor can extract the markup language body source from a web page where a user is viewing news, and check various data, such as the image used in the news, the news content, the news comments, and the time the news was written. In this way, the processor can collect various data from the real-time markup language body source that the user is checking through the terminal, and can easily identify the content that the user is interested in in real time.
  • the processor can map categories and sounds corresponding to the data.
  • the processor can map categories and sounds corresponding to the first data and the second data through a data classification module (e.g., the data classification module (112) of FIG. 2).
  • the processor can identify the content included in the first data and the second data. For example, the processor can identify that the first data is about sports, map the first data to the sports category, and then map the first data to a sound that matches sports. As another example, the processor can identify that the second data is about popular songs, map the second data to the popular songs category, and then map the second data to a sound that matches popular songs. That is, the processor can identify the content of data collected from the markup language body source, classify each piece of data into a category, and map a sound with an appropriate mood according to the classification.
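  • One simple way to picture the category and sound mapping in the sports and popular songs example is a keyword lookup against a small table. This is only a sketch: the table `CATEGORY_DB`, its keywords and sound file names, and the function `map_category_and_sound` are hypothetical, and the disclosure describes a learned model rather than a fixed lookup.

```python
# Hypothetical category database: category -> (indicative keywords, sound file).
CATEGORY_DB = {
    "sports": ({"match", "league", "goal", "team"}, "upbeat.mp3"),
    "popular songs": ({"song", "album", "chart", "singer"}, "pop.mp3"),
}


def map_category_and_sound(text, category_db=CATEGORY_DB):
    """Return (category, sound) for the category whose keywords overlap the text most."""
    words = set(text.lower().split())
    best, best_hits = None, 0
    for category, (keywords, _sound) in category_db.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    if best is None:
        return None, None  # no category matched; a real system would need a fallback
    return best, category_db[best][1]
```

  • With this table, a text mentioning a team, a goal, and a league match maps to the sports category and its sound, while a text with no matching keywords returns no category at all.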
  • the information about the category and the information about the sound may be pre-stored in the memory, and may include information about the category and the information about the sound that are cumulatively updated and stored while the processor generates a short form through a short form generation model.
  • the processor may provide a short form related to the markup language body.
  • the processor may provide at least one short form related to at least one markup language body to the user's terminal through a short form providing module (e.g., the short form providing module (113) of FIG. 2).
  • the processor may provide the user, in real time, with a short form corresponding to content that the user is currently interested in.
  • the processor may generate a short form related to content being provided on a web page that the user is viewing in real time through the terminal and provide the short form on at least a portion of the display of the terminal in real time.
  • FIG. 4 is a flowchart of a process for generating a short-form automatic generation process model according to various embodiments of the present disclosure.
  • a data classification module (e.g., the data classification module (112) of FIG. 2) can compare the texts constituting the first data and the second data collected through a content collection module (e.g., the content collection module (111) of FIG. 2) to determine the number of identical texts included in each data.
  • the data classification module can determine whether the number of identical texts among the texts constituting the first data and the second data is equal to or greater than a preset number (e.g., 3).
  • the preset number may be stored in the memory. If the number of identical texts is equal to or greater than the preset number, the data classification module can branch to step S420 and proceed with the process. If the number of identical texts is less than the preset number, the data classification module can branch to step S440 and proceed with the process.
  • the data classification module can classify the first data and the second data into similar content.
  • the data classification module can determine whether the data is similar based on the attribute information of the collected data. For example, the data classification module can collect title text from data in a news web page and determine whether the data is similar.
  • the news web page may list various news titles.
  • the data classification module can compare title texts of various news, and if there are a preset number or more of identical texts, it can determine that the titles are news belonging to the same category.
  • the data classification module can collect product brand and product name texts from data in a shopping mall web page and determine whether the data is similar.
  • the shopping mall web page may list various products.
  • the data classification module can compare brand and product name texts of various products, and if there are a preset number or more of identical texts, it can determine that the products belong to the same category.
  • the data classification module can map the first data and the second data to the first category.
  • the data classification module can classify the first data and the second data into similar contents and map the first data and the second data to the same category.
  • the data classification module can classify the first data and the second data as dissimilar content.
  • the processor can check the first category and the second category from a category database (e.g., the category database (120) of FIG. 2).
  • the first category and the second category may be different categories, while the individual categories may be categories of the same class. Accordingly, the first category and the second category may each have a subcategory that is a subclass.
  • the data classification module may map the first data to the first category and the second data to the second category if the first data and the second data correspond to dissimilar content. That is, the data classification module may determine that the first data and the second data correspond to dissimilar content and map each data to a different category. For example, the data classification module may classify the first data into a sports news category and the second data into an economic news category based on news text on a news web page.
  • the data classification module may map the first data and the second data to the first sound and the second sound according to the respective mapped categories. This may be a process of mapping sounds that match the content that the data visualizes and expresses.
  • when the processor maps the first data to the first category, the processor may map the first sound to the first data.
  • the first sound may be a sound that matches the first category.
  • when the processor maps the second data to the second category, the processor may map the second sound to the second data.
  • the second sound may be a sound that matches the second category. That is, when generating a short form, the processor may generate the short form by matching sounds according to genre or mood, such as a product or news.
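The similarity check and the category-to-sound mapping described for FIG. 4 can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The whitespace tokenization, the function names, the category labels, and the sound file names are all assumptions made for illustration; only the threshold of 3 identical texts comes from the description.

```python
from typing import Optional

PRESET_COUNT = 3  # example threshold given in the description

# Hypothetical category -> sound lookup. In the disclosure this information
# would come from the memory / category database, not a hard-coded dict.
SOUND_BY_CATEGORY = {
    "sports news": "upbeat_theme.mp3",
    "economic news": "calm_theme.mp3",
}


def shared_text_count(first: str, second: str) -> int:
    """Count distinct whitespace-separated tokens present in both texts."""
    return len(set(first.lower().split()) & set(second.lower().split()))


def are_similar(first: str, second: str, preset: int = PRESET_COUNT) -> bool:
    """True when the number of identical texts reaches the preset number."""
    return shared_text_count(first, second) >= preset


def map_sound(category: str) -> Optional[str]:
    """Return the sound mapped to a category, or None if none is stored."""
    return SOUND_BY_CATEGORY.get(category)


if __name__ == "__main__":
    a = "Seoul FC wins league title after late goal"
    b = "Late goal gives Seoul FC the league title"
    print(are_similar(a, b), map_sound("sports news"))
```

Two titles that share at least three tokens are treated as similar content and would be mapped to the same category; titles below the threshold are mapped to different categories, each with its own sound.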
  • FIG. 5 is a flowchart of a short form generation process reflecting additional information of a markup language body according to various embodiments of the present disclosure.
  • a processor (e.g., processor (110) of FIG. 2) can check rank information corresponding to each data and event information corresponding to the markup language body.
  • a processor may verify rank information corresponding to each of the first data and the second data and event information corresponding to at least one markup language body through a content collection module (e.g., content collection module (111) of FIG. 2). For example, when generating a short form, the processor may additionally collect rank information of a web page where a product is sold or a web page of news, event information on the corresponding web page, and the like. Through this, the processor verifies rank information corresponding to the data and event information in progress in the markup language body source.
  • the processor may learn by classifying the first data and the second data into their respective categories.
  • the first data may be data related to news, may be mapped to the first category, and may not need to be frequently updated and stored.
  • the second data may be data related to products, may be mapped to the second category, and may need to be frequently updated and stored. This may vary depending on whether each data is non-volatile information or volatile information. That is, the categories of the present disclosure may be classified in various ways depending on the nature or type of data.
  • the processor may generate a short-form generation model through a process of classifying each data, and may develop the short-form generation model itself while learning according to input based on the short-form generation model.
  • the processor may cumulatively apply and learn data received from the user's terminal through the short-form generation model, update the short-form generation model itself, and then apply the updated short-form generation model to new data.
  • the processor can generate a first short form and a second short form.
  • the processor can generate, through a short form providing module (e.g., the short form providing module (113) of FIG. 2), a first short form based on the first data by reflecting rank information corresponding to the first data and event information corresponding to the first markup language body or the second markup language body from which the first data was collected.
  • the processor can generate, through the short form providing module, a second short form based on the second data by reflecting rank information corresponding to the second data and event information corresponding to the first markup language body or the second markup language body from which the second data was collected.
  • FIG. 6 is a detailed flowchart regarding short-form generation through a short-form automatic generation process model according to various embodiments of the present disclosure.
  • a processor (e.g., processor (110) of FIG. 2) can check keywords of text constituting data.
  • the processor can check keywords based on text constituting first data and second data.
  • the processor can extract a first keyword for the first data and the second data using a preset first regular expression. After extracting the first keyword, the processor can extract a second keyword using a preset second regular expression.
  • the processor can extract the nth keyword through the nth regular expression.
  • the processor can extract the nth keyword for data using a preset nth regular expression.
  • the preset regular expression can be an expression for extracting data from a markup language body source of a domain or site, based on a domain or site that a user accesses through a user's terminal (e.g., terminal (200) of FIG. 1).
  • the first regular expression may be an expression defined by a convention under which keyword information from A to D in the markup language body is interpreted as a.
  • the processor may extract essential information for generating a short form using the first regular expression.
  • the essential information may be attribute information required to generate the short form.
  • the attribute information may be a product category, a product name, an image of the product, a price of the product, etc.
  • the attribute information may be a title of the web novel, an image of the web novel, etc.
  • the attribute information may be a title of the news, an image of the news, a comment of the news, etc.
  • the second regular expression may be an expression for extracting more general data than the data extracted by the first regular expression.
  • extracting the first keyword from the first data through the first regular expression may extract information included in a unique keyword in the first markup language body.
  • the unique keyword may be an item code, an item number, a product code, etc.
  • extracting the second keyword from the first data through the second regular expression may extract a keyword that is more generally used than the data extracted by the first regular expression in the first markup language body.
  • extracting the third keyword through the third regular expression may extract a more general keyword than extracting the second keyword through the second regular expression. That is, the first keyword, the second keyword, the third keyword, and the nth keyword may correspond to general keywords as n increases.
  • when n is 3 or more, the nth regular expression for extracting the nth keyword can be classified into the else area.
  • the else area can correspond to an area for extracting keywords that users encounter very commonly.
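The ordering of regular expressions from specific to general can be sketched as a simple cascade. The sketch below is an illustration under stated assumptions, not the patented method. The two patterns, the `item-code:` marker, the `<h1>` title rule, and the choice to remove already-matched text before the next pass are all hypothetical; the description only states that earlier expressions target unique keywords (such as an item code) and that later expressions fall into a general "else" area.

```python
import re
from typing import Dict, List

# Hypothetical patterns, ordered from most specific to most general.
PATTERNS = [
    ("first", re.compile(r"item-code:\s*([A-Z0-9-]+)")),   # unique keyword
    ("second", re.compile(r"<h1[^>]*>([^<]+)</h1>")),      # e.g. a title
]
# Assumed stand-in for the "else" area: any remaining word of 4+ letters.
ELSE_PATTERN = re.compile(r"[A-Za-z]{4,}")


def extract_keywords(body: str) -> Dict[str, List[str]]:
    """Apply each pattern in order, then sweep the remainder as 'else'."""
    result: Dict[str, List[str]] = {}
    remaining = body
    for name, pattern in PATTERNS:
        result[name] = pattern.findall(remaining)
        # Assumption: matched spans are removed so later, more general
        # patterns do not extract the same text again.
        remaining = pattern.sub(" ", remaining)
    text_only = re.sub(r"<[^>]+>", " ", remaining)  # drop leftover tags
    result["else"] = ELSE_PATTERN.findall(text_only)
    return result


if __name__ == "__main__":
    page = "<h1>Trail Runner Shoes</h1><p>item-code: AB-1234 lightweight shoes</p>"
    print(extract_keywords(page))
```

Keeping the patterns in an ordered list makes it straightforward to append a fourth or later expression once new keywords have been stored in the dictionary database.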
  • the processor can determine whether the extraction frequency of the keyword is greater than or equal to a preset extraction frequency. After extracting the first keyword and the second keyword through the first regular expression and the second regular expression, the processor can determine a keyword among the third keywords extracted using the third regular expression whose extraction frequency is greater than or equal to the preset frequency.
  • the preset frequency may be, for example, 3 times.
  • the third keyword extracted at a frequency greater than or equal to the preset frequency can be provided to the user.
  • the processor may provide separate keyword data.
  • the processor may provide, as separate keyword data, those third keywords whose extraction frequency is greater than or equal to the preset frequency. In this process, content that frequently appears in the log for the third and later regular expressions (the else area) is provided to a developer (e.g., an administrator). Once the developer stores that content in a dictionary database (e.g., the dictionary database (130) of FIG. 2), or it is automatically updated and stored in the dictionary database, the content is recognized as a keyword and attribute information is extracted from the data. For example, when the developer stores log information in the dictionary database, the processor may extract data recognized as a keyword through the log information.
  • in this way, all keyword data may be stored in the dictionary database.
  • the processor can build a more accurate short-form generation model by repeatedly creating a regular expression based on the results obtained through the keyword extraction process and then extracting keywords again using the created regular expression.
  • the processor can store separate keyword data in the dictionary database.
  • the processor can store keyword data of the third keyword or more (e.g., the third keyword, the fourth keyword, etc.) in the dictionary database.
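The frequency check on else-area keywords and their storage in the dictionary database can be sketched as follows. This is a minimal illustration, assuming the log is a plain list of extracted strings and the dictionary database is an in-memory set; the function names and the case-insensitive counting are assumptions, while the example threshold of 3 comes from the description.

```python
from collections import Counter
from typing import Iterable, List, Set

PRESET_FREQUENCY = 3  # example value given in the description


def frequent_keywords(else_log: Iterable[str],
                      preset: int = PRESET_FREQUENCY) -> List[str]:
    """Return else-area keywords extracted at least `preset` times."""
    counts = Counter(k.lower() for k in else_log)
    return sorted(k for k, c in counts.items() if c >= preset)


def update_dictionary(dictionary: Set[str], else_log: Iterable[str]) -> List[str]:
    """Store newly frequent keywords and return the ones that were added."""
    added = [k for k in frequent_keywords(else_log) if k not in dictionary]
    dictionary.update(added)
    return added


if __name__ == "__main__":
    dictionary = {"shoes"}  # stand-in for the dictionary database
    log = ["waterproof", "sale", "Waterproof", "waterproof", "sale", "new"]
    print(update_dictionary(dictionary, log), dictionary)
```

The returned list corresponds to the "separate keyword data" that could be shown to a developer before, or instead of, being stored automatically.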
  • the processor can separately check whether attribute information corresponding to the extracted data corresponds to correct information. For example, in relation to the price information of a product, the processor can match the price information extracted from the markup language body source to information indicating a pre-stored price to confirm whether the extracted price information is correct price information. Through this, the processor can extract and map attribute information based on the initial matching of the price information of the product without a repetitive matching process.
  • the processor may determine the fourth regular expression. If separate keyword data is stored in the dictionary database, the processor may determine the regular expression for extracting the separate keyword as the fourth regular expression.
  • the fourth regular expression is an example of a regular expression after the third regular expression.
  • the processor may utilize the else area of the regular expression to verify attribute information of data extracted from the markup language body.
  • the processor may store information extracted at a preset frequency or higher among the information stored in the log through the else area in a dictionary database and utilize it as a keyword when verifying attribute information in the future.
  • the processor may recommend to the developer, during the process of verifying the attribute information of the data, information that is repeatedly stored or verified among the information stored in the dictionary database. This may correspond to a process of repeated learning in which the processor makes no recommendations at first, and the frequency of recommendations increases as the processor visits various sites and its degree of learning increases.
  • the processor can determine the appropriateness of matching between the sound to which data is mapped and the short form generated based on the data.
  • the processor can automatically or manually generate the sound based on the category.
  • the processor can collect the sound data based on the category keyword attribute.
  • the sound data can be sound data that does not infringe on copyright.
  • the processor can directly generate the sound data based on the category keyword attribute.
  • the processor can load the sound data stored by the developer.
  • the processor can match video data and sound data for generating a short form.
  • the processor can set a basic weight as 1/10 and a weight according to the popularity of the video data (e.g., number of likes, comments, etc.) as 4/10 to 8/10.
  • the processor can assign a weight according to the viewing time of the video data as 2/10 and assign a weight according to the choice of a developer (e.g., administrator) within 1/10 to adjust the limit of automatic matching of the video data.
  • the processor can randomly match the sound data by dividing the sound data into n sections (e.g., 10 sections) based on the total weight assigned to the short form to be generated.
  • the processor can apply sound data by randomly selecting sound data within the 10th section 10 times, by randomly selecting sound data within the 9th section 9 times, and by randomly selecting sound data within the 1st section once.
  • the processor can update the weights once a day while maintaining the number of sound data of individual sections as the same number according to the assigned weights.
  • the processor can sort and distinguish the weights by priority, such as the most recent use order and the most recent generation order, when the weights match.
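The weighting and section-based random matching described above can be sketched in Python. This is one possible reading, not the disclosed implementation. Attaching the popularity and viewing-time scores to each sound candidate, normalizing those scores to the range 0.0 to 1.0, scaling the viewing-time and developer weights by those scores, and ranking candidates into equal-sized sections are assumptions. The weight bounds (1/10 base, 4/10 to 8/10 popularity, 2/10 viewing time, up to 1/10 developer choice) and the rule that section k is drawn k times come from the description.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    popularity: float              # assumed normalized to 0.0-1.0
    viewing_time: float            # assumed normalized to 0.0-1.0
    developer_choice: float = 0.0  # assumed normalized to 0.0-1.0


def clamp(x: float) -> float:
    return max(0.0, min(1.0, x))


def total_weight(c: Candidate) -> float:
    base = 0.1                                    # basic weight: 1/10
    popularity = 0.4 + 0.4 * clamp(c.popularity)  # 4/10 to 8/10
    viewing = 0.2 * clamp(c.viewing_time)         # up to 2/10 (assumed scaling)
    developer = 0.1 * clamp(c.developer_choice)   # within 1/10
    return base + popularity + viewing + developer


def split_sections(items: List[Candidate], n: int = 10) -> List[List[Candidate]]:
    """Rank by total weight and split into n sections of (near) equal size.

    sections[0] is the 1st (lowest-weight) section, sections[n-1] the nth.
    """
    ranked = sorted(items, key=total_weight)
    sections: List[List[Candidate]] = [[] for _ in range(n)]
    for i, item in enumerate(ranked):
        sections[i * n // len(ranked)].append(item)
    return sections


def pick(sections: List[List[Candidate]], rng: random.Random) -> Candidate:
    """Draw section k with relative weight k, then a random item inside it."""
    indices = [i for i, s in enumerate(sections) if s]
    if not indices:
        raise ValueError("no candidates to pick from")
    chosen = rng.choices(indices, weights=[i + 1 for i in indices], k=1)[0]
    return rng.choice(sections[chosen])


if __name__ == "__main__":
    items = [Candidate(f"s{i}", i / 19, 0.5) for i in range(20)]
    print(pick(split_sections(items), random.Random()).name)
```

Re-running `split_sections` on a schedule (for example once a day) would refresh the sections while keeping their sizes equal. Tie-breaking by most recent use or generation order is not shown here.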
  • the processor may provide a third short form corresponding to third data collected from another markup language body, based on information accumulated in the category database and the dictionary database in the process of generating the first short form and the second short form from the first data and the second data. That is, the processor may generate the third short form by applying a method that is the same as and/or similar to the method of generating the first short form and the second short form. In other words, the processor may repeatedly train the short form generation model through machine learning and, based on the process of generating short forms on web pages of some sites, generate a short form in the same and/or a similar manner on a web page of a new site.
  • the disclosed embodiments may be implemented in the form of a recording medium storing instructions executable by a computer.
  • the instructions may be stored in the form of program codes, and when executed by a processor, may generate program modules to perform the operations of the disclosed embodiments.
  • the recording medium may be implemented as a computer-readable recording medium.
  • Computer-readable storage media include all types of storage media that store instructions that can be deciphered by a computer. Examples include ROM (Read Only Memory), RAM (Random Access Memory), magnetic tape, magnetic disk, flash memory, and optical data storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to an electronic device for automatically generating a short form based on data collected through artificial intelligence, wherein a processor is configured to collect first data and second data from at least one markup language body through a content collection module, and to provide at least one short form related to the markup language body through a short form providing module.
PCT/KR2024/009799 2023-07-14 2024-07-09 Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method of using same Pending WO2025018681A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020230091581A KR102602936B1 (ko) 2023-07-14 2023-07-14 Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method using same
KR10-2023-0091581 2023-07-14

Publications (1)

Publication Number Publication Date
WO2025018681A1 true WO2025018681A1 (fr) 2025-01-23

Family

ID=88964578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2024/009799 Pending WO2025018681A1 (fr) 2023-07-14 2024-07-09 Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method of using same

Country Status (2)

Country Link
KR (2) KR102602936B1 (fr)
WO (1) WO2025018681A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
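KR102602936B1 (ko) * 2023-07-14 2023-11-16 (주)엔아이지씨 Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method using same
KR102747528B1 (ko) * 2024-02-07 2024-12-31 (주)엔아이지씨 Method for automatically generating a video based on a script extracted from web material, and apparatus and storage medium therefor
KR102815032B1 (ko) * 2024-11-14 2025-05-30 주식회사 일만백만 Method for automatically generating related content from an original article text based on artificial intelligence
KR102827246B1 (ko) 2024-11-21 2025-07-02 주식회사 웬디미디어 Method and system for generating multilingual short-form videos using artificial intelligence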

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
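KR101968599B1 (ko) * 2017-11-14 2019-04-15 한성호 Method and apparatus for generating a story video according to input text
KR20200023013A (ko) * 2018-08-24 2020-03-04 에스케이텔레콤 주식회사 Video service apparatus supporting video content search, and method for supporting video content search
KR20200084379A (ko) * 2018-12-20 2020-07-13 김기홍 Online platform service system and method for recruitment matching based on mutual suitability using big data
KR102251612B1 (ko) * 2020-09-25 2021-05-13 김형준 Method and system for classifying and managing content
KR20220007459A (ko) * 2020-07-10 2022-01-18 박종진 Method and system for providing a relationship-building service through social media activity analysis
KR102602936B1 (ko) * 2023-07-14 2023-11-16 (주)엔아이지씨 Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method using same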

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220127714A (ko) 2021-03-11 2022-09-20 최영수 Crowdfunding brokerage method
KR20230127792A (ko) * 2022-02-25 2023-09-01 주식회사 아리모아 3D-based video content production system
KR102570134B1 (ko) * 2022-11-28 2023-08-28 앞으로아카데미 주식회사 Method and system for generating short-form clips

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
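KR101968599B1 (ko) * 2017-11-14 2019-04-15 한성호 Method and apparatus for generating a story video according to input text
KR20200023013A (ko) * 2018-08-24 2020-03-04 에스케이텔레콤 주식회사 Video service apparatus supporting video content search, and method for supporting video content search
KR20200084379A (ko) * 2018-12-20 2020-07-13 김기홍 Online platform service system and method for recruitment matching based on mutual suitability using big data
KR20220007459A (ko) * 2020-07-10 2022-01-18 박종진 Method and system for providing a relationship-building service through social media activity analysis
KR102251612B1 (ko) * 2020-09-25 2021-05-13 김형준 Method and system for classifying and managing content
KR102602936B1 (ko) * 2023-07-14 2023-11-16 (주)엔아이지씨 Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method using same
KR102677612B1 (ko) * 2023-07-14 2024-06-24 (주)엔아이지씨 Apparatus, method, and program for keyword-based automatic short-form generation from data collected through artificial intelligence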

Also Published As

Publication number Publication date
KR102677612B1 (ko) 2024-06-24
KR102602936B1 (ko) 2023-11-16

Similar Documents

Publication Publication Date Title
WO2025018681A1 (fr) Electronic device for automatically generating a short form based on data collected through artificial intelligence, and method of using same
CN111414498B (zh) Multimedia information recommendation method, apparatus, and electronic device
CN111368185B (zh) Data display method, apparatus, storage medium, and electronic device
WO2010095867A2 (fr) Personalized intelligent system for searching Internet information using symbols and icons via a mobile communication terminal and an IP-based information terminal
WO2017213396A1 (fr) Method and apparatus for providing a book recommendation service
US20090171986A1 (en) Techniques for constructing sitemap or hierarchical organization of webpages of a website using decision trees
WO2016093552A2 (fr) Terminal device and data processing method thereof
CN109872242A (zh) Information push method and apparatus
WO2014119938A1 (fr) Server for providing a user-targeted service, and service providing method thereof
WO2017121076A1 (fr) Information push method and device
US11392589B2 (en) Multi-vertical entity-based search system
EP3230902A2 (fr) Terminal device and data processing method thereof
CN109716377A (zh) Improvements to landing page generation
WO2017115994A1 (fr) Method and device for providing notes by means of artificial intelligence-based correlation calculation
WO2017160133A2 (fr) Method for configuring post ranking, and service server therefor
WO2019164119A1 (fr) Electronic device and control method thereof
WO2020190103A1 (fr) Method and system for providing personalized multimodal objects in real time
WO2017074066A1 (fr) Internet content providing server and computer-readable recording medium implementing the same method
WO2015030269A1 (fr) Server and method for generating stock evaluation information, and device for receiving evaluation information
WO2022244997A1 (fr) Method and apparatus for processing data
US20160124580A1 (en) Method and system for providing content with a user interface
CN112348614B (zh) Method and apparatus for pushing information
WO2014069754A1 (fr) System and method for providing content based on an area of interest
WO2019143161A1 (fr) Electronic device and search keyword processing method thereof
WO2017099535A1 (fr) Method and system for auto-viewing of content

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24843400

Country of ref document: EP

Kind code of ref document: A1