CN117289869B - Data processing method, device, equipment and storage medium - Google Patents

Info

Publication number
CN117289869B
Authority
CN
China
Prior art keywords
data
type
target
determining
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311278630.0A
Other languages
Chinese (zh)
Other versions
CN117289869A (en)
Inventor
陈思
胡媛
姚洁
吴天昊
杨柳
闵媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311278630.0A
Publication of CN117289869A
Application granted
Publication of CN117289869B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data processing method, device, equipment and storage medium, relating to the fields of artificial intelligence and generative models, and in particular to intelligent search. The specific implementation is as follows: an initial data set is obtained, which includes pre-extracted data, a file identifier of the original file to which the pre-extracted data belongs, and a position identifier of the pre-extracted data in the original file; target data is determined from the pre-extracted data according to a received data processing request, and style characteristics of the target data are determined; data to be processed is determined from the corresponding original file according to the position identifier of the target data; and the data to be processed is processed according to the data processing request and the style characteristics to obtain a data processing result. The scheme effectively ensures the accuracy and completeness of data processing, and incorporating the style characteristics of the target data during processing makes the data processing more intelligent.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to artificial intelligence and generative model technologies, and more specifically to a data processing method, apparatus, device, and storage medium.
Background
At present, network disk products typically store only original files, links, and the like, and cannot further process the data during storage, for example by applying a generative model or a large language model. In addition, the ways in which stored data can be used after further processing are limited.
Disclosure of Invention
The present disclosure provides a data processing method, apparatus, device, and storage medium that effectively make data processing more intelligent.
According to a first aspect of the present disclosure, there is provided a data processing method comprising:
acquiring an initial data set, wherein the initial data set comprises pre-extraction data, a file identifier of an original file to which the pre-extraction data belongs and a position identifier of the pre-extraction data in the original file;
determining target data from the pre-extracted data according to the received data processing request, and determining style characteristics of the target data;
determining data to be processed from the corresponding original file according to the position identifier of the target data;
and processing the data to be processed according to the data processing request and the style characteristics to obtain a data processing result.
According to a second aspect of the present disclosure, there is provided a data processing method comprising:
acquiring a data extraction request and candidate extraction data;
determining an original file to which the candidate extraction data belongs and a position identifier of the candidate extraction data in the original file;
processing candidate extraction data according to the data extraction request to obtain pre-extraction data;
and storing the pre-extracted data, the file identification of the original file and the position identification in a correlated manner to obtain an initial data set.
According to a third aspect of the present disclosure, there is provided a data processing apparatus comprising:
a first acquisition module configured to acquire an initial data set, wherein the initial data set comprises pre-extracted data, a file identifier of an original file to which the pre-extracted data belongs, and a position identifier of the pre-extracted data in the original file;
a first determining module configured to determine target data from the pre-extracted data according to the received data processing request, and determine style characteristics of the target data;
a second determining module configured to determine data to be processed from the corresponding original file according to the position identifier of the target data;
a first processing module configured to process the data to be processed according to the data processing request and the style characteristics to obtain a data processing result.
According to a fourth aspect of the present disclosure, there is provided a data processing apparatus comprising:
a third acquisition module configured to acquire a data extraction request and candidate extraction data;
a fourth determining module configured to determine an original file to which the candidate extraction data belongs and a position identifier of the candidate extraction data in the original file;
a second processing module configured to process the candidate extraction data according to the data extraction request to obtain pre-extracted data;
a storage module configured to store the pre-extracted data, the file identifier of the original file, and the position identifier in association with one another to obtain an initial data set.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method provided in the first aspect or the second aspect.
According to a sixth aspect of the present disclosure there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as provided in the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method provided according to the first or second aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram to which the data processing methods of the present disclosure may be applied;
FIG. 2 is a schematic diagram of a first embodiment of a data processing method according to the present disclosure;
FIG. 3 is a schematic diagram of a second embodiment of a data processing method according to the present disclosure;
FIG. 4 is a schematic diagram of a third embodiment of a data processing method according to the present disclosure;
FIGS. 5a-5c are schematic illustrations of pre-extraction of data in a data processing method according to the present disclosure;
FIG. 6 is a schematic diagram of a first embodiment of a data processing apparatus according to the present disclosure;
FIG. 7 is a schematic diagram of a second embodiment of a data processing apparatus according to the present disclosure;
FIG. 8 is a block diagram of an electronic device for implementing a data processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the data processing method of the present disclosure, target data is determined from the pre-extracted data of an initial data set according to a data processing request, and the data to be processed is obtained from the original file to which the target data belongs, which effectively guarantees the integrity and accuracy of the processed data. In addition, the style characteristics of the target data are incorporated during processing so that they carry over into the data processing result, which improves the style of the result and makes the data processing more intelligent.
Illustratively, processes such as data preprocessing and generation in the data processing method of the present disclosure may be implemented with a large language model. Large language models (LLMs), which are essentially generative models, such as ChatGPT (Chat Generative Pre-trained Transformer, a chatbot developed by OpenAI), can generate fluent, human-like responses for many downstream tasks (e.g., task-oriented dialogue and question answering).
FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of the data processing methods or data processing apparatus of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include a terminal device 101, a network 102, and a server 103. The network 102 is used to provide a communication link between the terminal device 101 and the server 103, and may include various connection types, for example, a wired communication link, a wireless communication link, or an optical fiber cable, etc.
A user can interact with the server 103 through the network 102 using the terminal device 101 to receive or transmit information or the like. Various client applications may be installed on the terminal device 101.
The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices, including but not limited to a smart phone, a tablet computer, a laptop computer, or a desktop computer, and may also be an electronic device mounted on a vehicle, such as an in-vehicle terminal. When the terminal device 101 is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No particular limitation is imposed here.
The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 103 is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module. The present invention is not particularly limited herein.
The data processing method provided by the embodiments of the present disclosure is generally performed by the server 103 or the terminal device 101, and accordingly, the data processing apparatus is generally disposed in the server 103 or the terminal device 101.
It should be noted that the numbers of the terminal device 101, the network 102, and the server 103 in fig. 1 are merely illustrative. There may be any number of terminal devices 101, networks 102, and servers 103, as desired for implementation.
FIG. 2 illustrates a flow 200 of one embodiment of a data processing method according to the present disclosure, with reference to FIG. 2, including the steps of:
Step S201, an initial data set is obtained, wherein the initial data set comprises pre-extraction data, a file identifier of an original file to which the pre-extraction data belongs and a position identifier of the pre-extraction data in the original file.
In the embodiment of the present disclosure, the execution body of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, acquires the initial data set from other terminal devices or network servers connected by communication or from a local storage space through a wired or wireless manner. The initial data set comprises pre-extraction data, an original file to which the pre-extraction data belongs and/or a file identifier of the original file, and also comprises a position identifier of the pre-extraction data in the original file to which the pre-extraction data belongs.
In some alternative implementations, the initial data set may be one that the user has stored in advance on the terminal device or server. Illustratively, the initial data set is a data set that the user has stored on a network disk through a terminal device such as a mobile phone or a computer, for example the data under a favorites or clippings folder.
In some alternative implementations, the initial data set may also include data that was not saved or clipped in advance, such as data currently being searched for or obtained from applications outside the network disk (for example WeChat), from web pages, or from other devices.
The pre-extracted data may take various forms, for example at least one of documents, audio, video, images, and text, and may also include formulas, charts, and the like. The documents, audio, video, and text may contain content in any language, such as modern Chinese, dialects, oracle bone script, minority languages such as Tibetan, and other languages such as English and Italian.
Correspondingly, the original file to which the pre-extracted data belongs may take at least one of the following forms: a document, audio, video, an image, or a web page link (in which case the original file is the content displayed at the link). Its content may include at least one of text, speech, images, video, formulas, charts, and the like.
The file identification of the original file to which the pre-extracted data belongs can comprise at least one of a file name, a file address (including a network address, a local storage address and the like), a release time, an author and the like.
In some alternative implementations, the pre-extracted data in the initial data set may include at least part of the original data in the original file, for example a prompt word, sentence, paragraph, or chapter in a document file, part of the material in a picture, a screenshot or clip from a video, or an audio clip from voice data. It may also include derived data generated from at least part of the original file, such as summary information: for example, a brief summary of the complete content of a document or video, a summary of viewpoints, or a commentary on a character or person in a document or video file.
The position identifier of the pre-extracted data in the original file to which it belongs includes, for example, a page number or a timestamp. Examples include the page number, or the paragraph and line numbers, of a sentence or paragraph in its original document; the start time (e.g., the 5 min 8 s position) or the start and end times of a sentence in its original video file; and the pixel position or orientation information of an image element in an image file.
In some optional implementations, the pre-extracted data may be the whole original file or a summary of the whole original file, where the location identifier of the corresponding pre-extracted data in the original file is the whole file or other identifier information characterizing the whole file.
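The structure of an initial data set entry described above can be sketched as follows. This is only an illustrative sketch: the class and field names (`PreExtractedRecord`, `TextPosition`, `TimePosition`, `WholeFile`) and the sample values are assumptions made here for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class TextPosition:
    """Position of a sentence or paragraph in a document (hypothetical layout)."""
    page: int
    line_start: int
    line_end: int


@dataclass(frozen=True)
class TimePosition:
    """Start time, and optionally end time, in an audio or video file."""
    start_seconds: float
    end_seconds: Optional[float] = None


@dataclass(frozen=True)
class WholeFile:
    """Marks pre-extracted data that stands for the entire original file."""


Position = Union[TextPosition, TimePosition, WholeFile]


@dataclass(frozen=True)
class PreExtractedRecord:
    """One entry: pre-extracted data plus its file and position identifiers."""
    content: str
    file_id: str        # e.g. a file name or address
    position: Position


# Illustrative entries only; the paths and contents are made up.
initial_data_set: List[PreExtractedRecord] = [
    PreExtractedRecord("Quarterly revenue grew.", "reports/q3.pdf",
                       TextPosition(page=4, line_start=10, line_end=12)),
    PreExtractedRecord("Speaker introduces the product.", "videos/demo.mp4",
                       TimePosition(start_seconds=5 * 60 + 8)),
    PreExtractedRecord("Summary of the whole document.", "notes/overview.docx",
                       WholeFile()),
]
```

Keeping the position identifier as a separate typed value, rather than folding it into the content, is one way to let a later step return to the original file for surrounding context.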
Step S202, determining target data from the pre-extracted data according to the received data processing request, and determining style characteristics of the target data.
In the embodiment of the present disclosure, the execution subject of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, determines target data from the pre-extracted data of the initial data set acquired in step S201 according to the received data processing request, and determines the style characteristics of the target data.
The data processing request may include at least one of generating a document file associated with a theme or a field, generating a video file associated with a content, generating an image file of a preset scene theme, generating a text segment, and the like.
For example, the data processing request received by the executing body may be to compose a financial research report, or generate a growing video of XXX (name of person), or adjust color matching for a photo album according to shooting time, weather, etc., or generate an experience record of "document+voice" for a product, etc.
In some alternative implementations, the data processing request may also include summarizing, or extracting part of the content of, a generated document or video file. For example, a summary or a visual word may be generated for the generated document or video, and constraints such as word count and language may be added. As another example, a particular viewpoint or knowledge point in the generated document, video, or audio may be extracted to form a piece of text.
The target data is at least part of all the pre-extraction data in the initial data set.
For example, the execution body may use identifying information in the data processing request, such as its topic, field, or keywords, to filter matching pre-extracted data from the initial data set as the target data.
In some alternative implementations of embodiments of the present disclosure, determining target data from the pre-extracted data according to the received data processing request includes: obtaining the data processing request, identifying a data content identifier in the data processing request, and determining the target data from the pre-extracted data according to the data content identifier.
In this implementation, after acquiring the data processing request, the execution body identifies the data content identifier in the request, which indicates the data content the request is aimed at, and then determines the target data from the pre-extracted data according to that identifier.
Data content identifiers include content identifiers such as data topics, data fields, and data names. For example, if the data processing request is "write a financial research report", its data content identifier is "finance", and the execution body identifies the pre-extracted data related to "finance" and determines it as the target data. As another example, if the data processing request is "write a Wugongshan autumn travelogue", its data content identifiers are "Wugongshan" and "autumn tour", and the execution body screens the pre-extracted data related to "Wugongshan" and "autumn tour" (text data, image data, video data, voice data, and so on) from the initial data set and determines it as the target data.
In the embodiment of the disclosure, the execution body screens the pre-extracted data associated with the data content identifier in the data processing request as the target data, which effectively improves the accuracy and relevance of the data processing.
In some alternative implementations, determining the target data from the pre-extracted data according to the data content identifier includes: obtaining an extracted content identifier of the pre-extracted data, and, according to the degree of association between the extracted content identifier and the data content identifier, screening from the initial data set the pre-extracted data whose degree of association is greater than or equal to a preset association threshold as the target data.
By comparing the extracted content identifier of the pre-extracted data with the data content identifier of the data processing request, the execution body ensures that the target data is relevant in content to the request. The screened target data therefore meets the data requirements of the request, which improves the accuracy and relevance of the data processing.
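The threshold-based screening described above can be sketched as follows. The disclosure does not specify how the degree of association is computed; the Jaccard overlap between identifier sets used here is an assumption chosen only to make the example concrete, and the function names, record layout, and threshold value are hypothetical.

```python
from typing import Dict, List, Set


def association_degree(extracted_ids: Set[str], request_ids: Set[str]) -> float:
    """Jaccard overlap of two identifier sets (an assumed stand-in metric)."""
    if not extracted_ids or not request_ids:
        return 0.0
    return len(extracted_ids & request_ids) / len(extracted_ids | request_ids)


def select_target_data(records: List[Dict], request_ids: Set[str],
                       threshold: float) -> List[Dict]:
    """Keep records whose association degree is >= the preset threshold."""
    return [r for r in records
            if association_degree(r["content_ids"], request_ids) >= threshold]


# Hypothetical pre-extracted records with their extracted content identifiers.
records = [
    {"name": "hiking notes", "content_ids": {"wugongshan", "autumn tour"}},
    {"name": "trail photo", "content_ids": {"wugongshan", "sunrise"}},
    {"name": "bond market memo", "content_ids": {"finance"}},
]

# Identifiers recognised in the request "write a Wugongshan autumn travelogue".
targets = select_target_data(records, {"wugongshan", "autumn tour"}, threshold=0.3)
print([r["name"] for r in targets])  # prints ['hiking notes', 'trail photo']
```

In practice the association degree could equally come from embedding similarity or a language model score; only the comparison against a preset threshold is taken from the text.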
In some alternative implementations of embodiments of the present disclosure, determining the style characteristics of the target data includes inputting the target data into a pre-trained language model to obtain the style characteristics of the target data.
In this implementation manner, the execution subject inputs the determined target data into a pre-trained language model, and obtains style characteristics of the target data. The pre-trained language model can be a neural network model trained for different data types, or a large language model in the related art.
The style characteristics of the target data are used for representing the work style of the target data, such as the language style, description style and the like of text data, the composition style, color matching style, image parameter style and the like of image data, the language region style, voice characteristics (such as Raili voice, low voice, high voice and the like) and the like of voice data, and the scene style, shooting style, scene switching style and the like of video data.
In this implementation, the execution body identifies the style characteristics of the target data and incorporates them during data processing, so that the data processing result carries the corresponding style. This makes the processing more intelligent and the result more distinctive, and brings the result closer to what the data processing request expects.
Step S203, determining the data to be processed from the corresponding original file according to the position identification of the target data.
In the embodiment of the present disclosure, the execution body of the data processing method, for example, the terminal device 101 or the server 103 shown in fig. 1, determines, according to the location identifier of the target data, the data to be processed from the original file corresponding to the target data.
Since the target data is determined from the pre-extraction data, the execution subject can determine the location identification of the pre-extraction data corresponding to the target data as the location identification of the target data.
After the execution main body determines the target data from the pre-extracted data, extracting the associated data from the original file to which the target data belongs according to the position identification of the target data as the data to be processed, so as to ensure the information integrity of the data to be processed, thereby improving the accuracy and the integrity of the data processing result.
Illustratively, the data to be processed includes at least one of: the original pre-extracted data serving as the target data, and the data associated with the target data in the original file to which it belongs.
In some optional implementations, the original pre-extracted data serving as the target data is a partial chapter in the original file or a complete image in the original image file, and after the pre-extracted data is determined to be the target data, the executing body may directly use the pre-extracted data as the data to be processed.
In some optional implementations, the original pre-extraction data serving as the target data is a summary of at least part of contents in the original file, for example, a content summary generated for the original document file or a content summary of the original video file, etc., and the execution subject may directly use the pre-extraction data as the data to be processed, or may use the pre-extraction data and the original file to which the pre-extraction data belongs together as the data to be processed.
In some alternative implementations, the original pre-extracted data serving as the target data is marked content in the original file: for example, a list of viewpoints, marked sentences, or keywords generated for an original document file, or a description of part of the plot or a record of a particular shot or scene for an original video file. In this case, the execution body obtains the data associated with the target data from the original file to which it belongs according to the position identifier of the target data (i.e., the position identifier of the original pre-extracted data). For a document, the associated data may be the original text of the viewpoint or the context surrounding the marked sentence or keyword. For a video, it may be a selected plot clip, an associated clip of preset length, a plot-related clip, a clip of a character, or the video plot associated with a particular scene. The execution body may then use the target data together with its associated data in the original file as the data to be processed.
In the embodiment of the disclosure, the execution body extracts the data to be processed from the corresponding original file according to the position identifier of the target data, which effectively ensures the integrity of the data to be processed and therefore of the data processing result.
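For a text document, retrieving the context around a marked sentence from the original file can be sketched as follows. The sketch assumes that the position identifier is a 1-based inclusive line range and that the associated data is a fixed number of surrounding lines; both are illustrative assumptions, and `data_to_process` is a hypothetical helper, not a function named in the disclosure.

```python
from typing import List


def data_to_process(original_lines: List[str], line_start: int, line_end: int,
                    context: int = 1) -> List[str]:
    """Return the marked lines plus `context` lines on each side.

    `line_start` and `line_end` are 1-based and inclusive; the window is
    clamped to the bounds of the original file.
    """
    first = max(line_start - 1 - context, 0)
    last = min(line_end + context, len(original_lines))
    return original_lines[first:last]


# Hypothetical original file content, one sentence per line.
original = [
    "Intro.",
    "Rates rose in Q3.",
    "Bond yields followed.",
    "Equities fell.",
    "Outlook.",
]

# The pre-extracted (marked) sentence sits on line 3 of the original file.
print(data_to_process(original, 3, 3, context=1))
# prints ['Rates rose in Q3.', 'Bond yields followed.', 'Equities fell.']
```

For audio or video, the same idea would apply to a time window around the stored timestamp rather than a line window.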
Step S204, the data to be processed is processed according to the data processing request and the style characteristics, and a data processing result is obtained.
In the embodiment of the present disclosure, the execution body of the data processing method, for example, the terminal device 101 or the server 103 shown in fig. 1, processes the data to be processed according to the data processing request and the style characteristics of the target data, and obtains the data processing result.
Because the target data is at least part of the pre-extracted data in the initial data set, the execution body can treat the style of the target data as a style the user approves of. The execution body processes the data to be processed according to the data processing request, which ensures that the result meets the expectation of the request. At the same time, it imitates the user-approved style by applying the style characteristics of the target data, which ensures that the result matches those characteristics, gives the result a more personal character, and makes the data processing more intelligent.
According to the data processing method provided by the embodiment of the disclosure, the target data is determined from the pre-extracted data of the initial data set according to the received data processing request, and the associated data is acquired from the original file to which the target data belongs as the data to be processed by combining the position identifier of the target data, so that the accuracy and the integrity of the processed data can be effectively ensured, meanwhile, the style of the data processing result is effectively ensured by combining the style characteristics of the target data in the data processing process, and the intelligent degree of the data processing and the unique effect of the data processing result are improved.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, etc. of the related personal information of the user all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
Fig. 3 shows a flow 300 of a second embodiment of a data processing method according to the present disclosure, with reference to fig. 3, comprising the steps of:
Step S301, an initial data set is obtained, wherein the initial data set comprises pre-extraction data, a file identifier of an original file to which the pre-extraction data belongs and a position identifier of the pre-extraction data in the original file.
In the embodiment of the present disclosure, the execution body of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, acquires the initial data set from the network device or the network server, or from the local storage space, through a wired or wireless manner.
Step S301 is substantially identical to step S201 in the embodiment shown in fig. 2, and the detailed implementation may refer to the foregoing description of step S201, which is not repeated herein.
Step S302, determining target data from the pre-extracted data according to the received data processing request, and determining style characteristics of the target data.
In the embodiment of the present disclosure, the execution subject of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, determines target data from the pre-extracted data of the initial data set acquired in step S301 according to the received data processing request, and determines the style characteristics of the target data.
Step S302 is substantially identical to step S202 in the embodiment shown in fig. 2, and the detailed implementation may refer to the foregoing description of step S202, which is not repeated herein.
Step S303, determining the data to be processed from the corresponding original file according to the position identification of the target data.
In the embodiment of the present disclosure, the execution body of the data processing method, for example, the terminal device 101 or the server 103 shown in fig. 1, determines, according to the location identifier of the target data, the data to be processed from the original file corresponding to the target data.
Step S303 is substantially identical to step S203 of the embodiment shown in fig. 2, and the detailed implementation may refer to the foregoing description of step S203, which is not repeated herein.
Step S304, a target type and an operation type in the data processing request are identified.
In the embodiment of the present disclosure, an execution subject of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, identifies a target type and an operation type in the received data processing request.
Illustratively, the target type includes a data type of the desired result corresponding to the data processing request. Such as reports, papers, videos, albums, voices, etc.
Illustratively, the operation types include processing operation types for data, such as writing, correcting, repairing, generating video, intercepting video, generating audio corresponding to different roles according to text descriptions or introductions, and the like.
In some alternative implementations of embodiments of the present disclosure, the target type includes at least one of a text type (e.g., a text segment, a document file, etc.), an image type, a video type, an audio type, a chart type (e.g., a mind map, a list, etc.), etc., and the operation type includes at least one of text editing, text generation, image processing, image generation, video editing, video generation, audio extraction, or audio generation, etc.
In some alternative implementations, the target type and the operation type may also include other achievable target types and operation types not mentioned above, which are not limited herein.
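As a rough illustration of step S304, the sketch below identifies a target type and an operation type from a plain-text request by keyword lookup. The disclosure does not specify how the identification is performed; the keyword tables and the function `identify_types` are assumptions made only for this example, and a real system might use a trained model instead.

```python
from typing import Optional, Tuple

# Hypothetical keyword tables; the type names follow the examples in the text.
TARGET_KEYWORDS = {"report": "text", "paper": "text", "video": "video",
                   "album": "image", "voice": "audio"}
OPERATION_KEYWORDS = {"write": "text generation", "correct": "text editing",
                      "clip": "video editing"}


def identify_types(request: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (target type, operation type), or None where nothing matches."""
    words = request.lower().split()
    target = next((t for k, t in TARGET_KEYWORDS.items() if k in words), None)
    operation = next((o for k, o in OPERATION_KEYWORDS.items() if k in words), None)
    return target, operation


print(identify_types("Write a report on battery recycling"))
```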
In step S305, preprocessing data is determined according to the target type and the data to be processed, and the data type of the preprocessing data is adapted to the target type.
In the embodiment of the present disclosure, the execution body of the data processing method, for example, the terminal device 101 or the server 103 shown in fig. 1, performs preprocessing on the data to be processed according to the target type in the data processing request, so as to obtain preprocessed data with a data type that is adaptive to the target type.
In some optional implementations of the disclosed embodiments, determining the pre-processed data from the target type and the to-be-processed data includes determining a candidate type that is adapted to the target type from the target type, determining the to-be-processed data as the pre-processed data in response to an initial type of the to-be-processed data being adapted to the candidate type, identifying to-be-processed content of the to-be-processed data in response to the initial type of the to-be-processed data being not adapted to the candidate type, and compiling the to-be-processed content to the pre-processed data that is adapted to the candidate type.
In this implementation manner, the process that the execution body performs preprocessing on the data to be processed according to the target type is a process of adjusting the data type of the data to be processed, so as to ensure that the adjusted data type of the preprocessed data is adapted to the target type.
The execution body determines a candidate type conforming to the target type according to the target type in the data processing request. For example, the target type is a report in a text type, and candidate types consistent with the report include text, images, and charts.
The execution body then determines an initial type of the data to be processed, and if the initial type of the data to be processed is matched with the candidate types, for example, meets at least one of the candidate types, the data to be processed can be directly determined as the pre-processed data, and if the initial type of the data to be processed is not matched with the candidate types, namely, the initial type of the data to be processed does not belong to or does not comprise any one of the candidate types, the content to be processed of the data to be processed is identified, and the content to be processed is compiled into the pre-processed data meeting the candidate types.
For example, if the candidate types include text, image, chart and formula, and the initial type of the data to be processed is video, the executing body confirms that the initial type of the data to be processed is not adapted to the candidate types. It then identifies the content to be processed of the video, including the image content and the audio content of the video, compiles the image content into image data, compiles the audio content into text data, and takes the image data and the text data generated by the compiling as the preprocessed data.
The process by which the execution body compiles the data type of the data to be processed may be performed using a pre-trained neural network model.
In the implementation mode, the execution main body performs preprocessing on the data to be processed according to the target type in the data processing request so as to ensure that the data type of the preprocessed data is adaptive to the target type, the data to be processed is adaptive to the data processing request on the data type, the accuracy and the processing efficiency of the data processing can be effectively improved, and particularly, under the condition that a plurality of pieces of data to be processed exist, the processing rate of the data content and the accuracy of the data processing result can be effectively improved by performing adaptive adjustment on the data type of the data to be processed.
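The type-adaptation logic described above can be sketched as follows. The candidate-type table reuses the report example from the text; the dictionary layout of a data item and the `compile_content` placeholder (which stands in for the pre-trained neural network model) are assumptions for illustration only.

```python
from typing import Dict, List

# Candidate types adapted to each target type (the "report" row follows the text).
CANDIDATE_TYPES = {"report": {"text", "image", "chart"}}


def compile_content(data: Dict) -> List[Dict]:
    """Placeholder for the model that compiles content into adapted types."""
    if data["type"] == "video":
        # Image content becomes image data; audio content becomes text data.
        return [{"type": "image", "content": data["frames"]},
                {"type": "text", "content": data["transcript"]}]
    raise ValueError(f"cannot compile data of type {data['type']!r}")


def determine_preprocessed(target_type: str, data: Dict) -> List[Dict]:
    """Return the data unchanged if its type is adapted, else compile it."""
    candidates = CANDIDATE_TYPES[target_type]
    if data["type"] in candidates:
        return [data]
    return compile_content(data)


note = {"type": "text", "content": "Key viewpoint."}
clip = {"type": "video", "frames": ["frame-1"], "transcript": "Spoken words."}
print(determine_preprocessed("report", note))
print([item["type"] for item in determine_preprocessed("report", clip)])
```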
Step S306, processing the preprocessed data according to the operation type to obtain a data processing result.
In the embodiment of the present disclosure, the execution subject of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, processes the pre-processed data obtained in step S305 according to the operation type identified in step S304, thereby obtaining a data processing result.
Illustratively, the processing of the pre-processed data by the execution body may include ordering, stitching, polishing, and the like. For example, a plurality of pieces of preprocessed text data, image data and the like are ordered according to the relevance between adjacent pieces of data and then stitched in sequence; corresponding transition sentences are added and polished at the stitching positions, and summary sentences are added and adaptively polished at the beginning and end of the stitched file, so that the expected data processing result, such as a report file, is obtained.
For another example, a plurality of pieces of preprocessed image data, video data, text data and/or audio data are sequenced and spliced according to the relevance of the content, spliced data are obtained, the text data and/or the audio data are adjusted according to the spliced data, and then video fusion is carried out on the text data and/or the audio data and the spliced data, so that a video file with subtitles and/or audio is obtained as a data processing result.
After the execution main body preprocesses the data to be processed, the data type of the preprocessed data is matched with the target type of the data processing request, and then the preprocessed data is processed according to the operation type in the data processing request, so that the data processing efficiency and the accuracy of the processing result can be improved.
The processing procedure of the execution body on the preprocessed data may be implemented by using a preset algorithm or a pre-trained neural network model, or may be implemented by using other technologies in the related field, which is not limited herein.
In some optional implementations of the embodiments of the present disclosure, processing the pre-processed data according to the operation type to obtain a data processing result includes determining a pre-trained data processing model according to the operation type, and inputting the pre-processed data to the pre-trained data processing model to obtain the data processing result.
In this implementation manner, the executing body determines the pre-trained data processing model according to the operation type in the data processing request, and then inputs the pre-processed data into the pre-trained data processing model to obtain a data processing result, so that the data processing efficiency can be effectively improved.
The data processing model may be a special neural network model for corresponding data operation types, such as pre-trained text editing or video generation, or a large language model in the related art, which is not limited herein.
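Selecting a model by operation type can be pictured as a lookup in a registry. In the sketch below, the registry `MODEL_REGISTRY` and the function `process` are hypothetical, and the two lambdas are trivial stand-ins for pre-trained models rather than real ones.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an operation type to a "model" callable.
MODEL_REGISTRY: Dict[str, Callable[[str], str]] = {
    "text editing": lambda text: text.strip().capitalize(),
    "text generation": lambda text: f"Summary: {text}",
}


def process(operation_type: str, preprocessed: str) -> str:
    """Pick the model registered for the operation type and apply it."""
    try:
        model = MODEL_REGISTRY[operation_type]
    except KeyError:
        raise ValueError(f"no model registered for {operation_type!r}") from None
    return model(preprocessed)


print(process("text editing", "  hello world "))
```

Keeping the mapping in one table means a new operation type only needs a new registry entry, while the calling code stays unchanged.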
In the data processing method provided by the embodiment of the disclosure, the execution main body performs preprocessing on the data to be processed according to the target type in the data processing request to obtain the preprocessed data matched with the target type, and then further processes the preprocessed data according to the operation type in the data processing request to obtain the data processing result of the target type, so that the accuracy and the processing efficiency of the data processing are effectively ensured under the condition that the data type of the data processing result meets the data processing request.
Step S307, a data output request is acquired.
In the embodiment of the present disclosure, the execution body of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, may further obtain a data output request before or after executing the steps S304 to S306, so as to output corresponding data in time for users to use, such as browsing or consuming, according to the user requirements.
The data output request may be used to request output of data in the initial data set, including the pre-extracted data and the original file to which the pre-extracted data belongs; it may also be used to request output of the data to be processed determined according to the data processing request, or of the data processing result. Therefore, the data output request may be received during any step of the data processing method, or after the data processing steps are completed, and the data at any stage of the data processing may be output according to the data output request.
Step S308, determining the target equipment according to the equipment identification in the data output request.
In the embodiment of the present disclosure, the execution subject of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, determines the target device for data output according to the device identification in the data output request acquired in step S307.
In the data output request, a device identification for outputting the data may be included. The executing body determines the target device of the data output by identifying the device identification in the data output request.
Illustratively, the device identification may include a device type, a device name, a device number, and the like. For example, the device identification may be a device name or device number in the past login history list of the network disk account, a device name or device number in the same network cluster, or a device type or device name that has not yet been used or connected.
For a device identification found in the past login history list or the history connection list, the executing body can determine the target device directly from that list. For a device identification that belongs to the same network cluster or has not yet been used or connected, the executing body first determines the target device from the device identification and then connects to the target device through the network cluster or in another manner.
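The two cases above can be sketched as a small resolution function. The function name, the set-based history and cluster lists, and the `connect_via` field are assumptions for illustration; the disclosure does not prescribe a data layout.

```python
from typing import Dict, Optional, Set


def resolve_target_device(device_id: str, login_history: Set[str],
                          cluster_devices: Set[str]) -> Dict[str, Optional[str]]:
    """Decide how the device named by `device_id` would be reached."""
    if device_id in login_history:
        # Known from the login or connection history: determined directly.
        return {"device": device_id, "connect_via": None}
    # Otherwise the device is determined first and a connection is set up.
    via = "cluster" if device_id in cluster_devices else "other"
    return {"device": device_id, "connect_via": via}


history = {"laptop-01"}
cluster = {"tv-livingroom"}
print(resolve_target_device("laptop-01", history, cluster))
print(resolve_target_device("tv-livingroom", history, cluster))
```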
Step S309, outputting the data to be processed and/or the data processing result to the target device.
In the embodiment of the present disclosure, the execution subject of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, outputs the data to be processed and/or the data processing result to the target device determined in step S308.
After determining the target device and establishing connection with the target device, the execution body outputs the requested data to the target device according to the received data output request. The data requested by the data output request may be at least one of pre-extracted data in the initial data set, an original file to which the pre-extracted data belongs, data to be processed determined according to the data processing request, a data processing result, and the like.
For example, the user may transmit a data output request for pre-extracted data in the initial data set, and the execution body determines data to be output from the initial data set and determines a target device from the data output request after receiving the data output request, and then outputs the data to be output to the target device.
For example, the user may send a data output request for a data processing result, for example, a data output request for a generated research report, the execution body obtains a device identifier in the data output request after receiving the data output request, determines a target device, for example, a notebook computer, a mobile phone, a television, or the like, and then outputs the generated research report to the target device.
The user can check the data sent by the execution body on the target device and then browse, modify or consume the data.
It should be noted that the device identifier in the data output request may include one or multiple devices at the same time, that is, the execution body may send data to multiple target devices for sharing and use.
For example, if the executing body is a network disk server, the user may receive and browse the received data through the corresponding network disk application on the target device. If the target device does not have the corresponding network disk application, the user may choose to install it, or choose another suitable application to receive and browse the data: for a document file, the user may choose WPS or a Microsoft application; for a video file, the user may choose any suitable video player; and for an image file, the user may choose an image viewing or editing application to browse or edit it.
According to the data processing method provided by the embodiment of the disclosure, a data output request can be received before or after data processing, and the corresponding data is output to the target device according to the device identifier in the data output request. This improves the mobility and interactivity of the data, makes it convenient for users to browse and consume the corresponding data on different devices, and increases the ways in which the data can be used and its usage rate.
The data processing method in the above embodiment processes the data in the initial data set and the data in the original file to which the data belongs, and the present disclosure also provides a data processing method for obtaining the initial data set so as to execute the above data processing process.
Fig. 4 illustrates a flow 400 of one embodiment of a data processing method according to the present disclosure, which is a process of processing data to obtain an initial data set. Referring to fig. 4, the data processing method includes the steps of:
Step S401, a data extraction request and candidate extraction data are acquired.
In the embodiment of the present disclosure, an execution subject of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, acquires a data extraction request and candidate extraction data corresponding to the data extraction request.
In this scheme, the candidate extraction data may be data in all terminals or applications within the data acquisition authority of the execution subject.
Taking a network disk as an example, with the executing body being a network disk server, the candidate extraction data may be data stored in the network disk; data in other applications outside the network disk, such as chat records, articles and videos in interactive applications such as WeChat; image data, video data and template data in image or video generation or editing software, such as camera, video-sharing, photo-beautification or video-clipping applications; or image data, text and/or chart data in various browser, reading or notebook applications.
Accordingly, the data extraction request may include a collection request of the user after the screenshot operation, a collection request after the video clip, a cut request after the text data or the file is selected, a cut request after the audio data is generated or clipped, and the like.
In some alternative implementations, the data extraction request may also include a compile-type data processing request. Illustratively, the data extraction request includes generating general description data, such as overall content summaries, opinion summaries, role summaries, or the like, from the selected data or file.
In some alternative implementations, the data extraction request may also include a tag-type data processing request. Illustratively, the data extraction request includes tagging keywords or key sentences in the candidate extraction data, such as key data or arrays, sentences expressing viewpoints, etc.
Step S402, determining an original file to which the candidate extraction data belongs and a position identification of the candidate extraction data in the original file.
In the embodiment of the present disclosure, after receiving the candidate extraction data, the execution body of the data processing method, for example, the terminal device 101 or the server 103 shown in fig. 1, determines the original file to which the candidate extraction data belongs and the location identifier of the candidate extraction data in the original file if the candidate extraction data is not a complete original file, for example, a partial chapter of the file, a summary of partial content, a key paragraph or a sentence, and the like.
The candidate extraction data and the original file to which the candidate extraction data belong may be the same or different in data type. For example, the candidate extraction data is image data, and the original file to which the candidate extraction data belongs can be a document file or a video file or an image file, and for another example, the candidate extraction data is audio data, and the original file to which the candidate extraction data belongs can be a video file or an audio file.
In this scheme, after receiving the candidate extraction data, the execution body determines the original file to which the candidate extraction data belongs, and may determine and mark the file identifier corresponding to the original file, for example, the file name, the file type, the link or the address of the location where the file is located, and the like.
Illustratively, the location identifier of the candidate extraction data in the original file includes a page number identifier, a page number and paragraph identifier, or a line number identifier of the candidate extraction data in the original document file, or a timestamp of the candidate extraction data in the original video file or the original audio file, such as a duration identifier of a position appearing in the video or the audio, or an azimuth identifier of the candidate extraction data in the original image data.
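One possible way to represent these kinds of location identifiers is a small set of record types, one per original-file kind. The class and field names below are hypothetical, and representing the azimuth identifier of an image as a rectangular region is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class DocumentLocation:
    page: int
    paragraph: Optional[int] = None
    line: Optional[int] = None


@dataclass(frozen=True)
class MediaLocation:
    start_seconds: float  # where the extract begins in the video or audio
    end_seconds: float


@dataclass(frozen=True)
class ImageLocation:
    x: int  # assumed: top-left corner and size of the extracted region
    y: int
    width: int
    height: int


Location = Union[DocumentLocation, MediaLocation, ImageLocation]


def describe(location: Location) -> str:
    """Render a location identifier as human-readable text."""
    if isinstance(location, DocumentLocation):
        parts = [f"page {location.page}"]
        if location.paragraph is not None:
            parts.append(f"paragraph {location.paragraph}")
        if location.line is not None:
            parts.append(f"line {location.line}")
        return ", ".join(parts)
    if isinstance(location, MediaLocation):
        return f"{location.start_seconds:.1f}s to {location.end_seconds:.1f}s"
    if isinstance(location, ImageLocation):
        return f"region at ({location.x}, {location.y}), {location.width}x{location.height}"
    raise TypeError(f"unsupported location: {location!r}")


print(describe(DocumentLocation(page=3, paragraph=2)))
print(describe(MediaLocation(65.0, 100.0)))
```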
Step S403, candidate extraction data is processed according to the data extraction request, and pre-extraction data is obtained.
In the embodiment of the present disclosure, an execution subject of the data processing method, such as the terminal device 101 or the server 103 shown in fig. 1, processes the candidate extraction data according to the data extraction request, resulting in pre-extraction data.
Illustratively, the executing body processes the candidate extracted data according to the extraction type in the received data extraction request to obtain pre-extracted data. Wherein the extraction type in the data extraction request includes a data type and a processing type. Illustratively, the data type is a data type of the pre-extraction data, and the processing type is an operation type corresponding to data processing required for obtaining the pre-extraction data based on the candidate extraction data.
In some alternative implementations, the data type in the data extraction request is the same as the data type of the candidate extraction data, e.g., both are a text type, both are a video type, or both are an audio type.
Illustratively, in this implementation, the execution body pre-processes the candidate extraction data according to the type of processing in the data extraction request, resulting in pre-extraction data. For example, the processing types are summary, mark, etc., and the execution subject performs corresponding processing on the candidate extraction data to obtain pre-extraction data.
In some alternative implementations, the data type in the data extraction request is different from the data type of the candidate extraction data, e.g., the data type in the data extraction request is a text type, the data type of the candidate extraction data is a video type or an audio type, and further e.g., the data type of the data extraction request is an image type, the data type of the candidate extraction data is a video type.
In this implementation manner, the execution body performs type conversion on the data content of the candidate extraction data to obtain conversion data with a data type consistent with that of the data extraction request, and then performs corresponding processing operation according to the processing type of the data extraction request to obtain corresponding pre-extraction data.
In some optional implementations of the embodiments of the disclosure, processing the candidate extraction data according to the data extraction request to obtain pre-extraction data includes: determining the candidate extraction data as the pre-extraction data in response to the data extraction request not including a pre-processing request; determining a processing type of the pre-processing request in response to the data extraction request including the pre-processing request; in response to the processing type being a compiling type, determining compiled data corresponding to the candidate extraction data and determining the compiled data as the pre-extraction data; and in response to the processing type being a tag type, determining tag data in the candidate extraction data and determining the tag data as the pre-extraction data.
In the implementation mode, the execution main body can directly determine the candidate extraction data as the pre-extraction data without further processing when determining that the data extraction request does not comprise the pre-processing request, and when determining that the data extraction request comprises the pre-processing request, the execution main body firstly determines the processing type of the pre-processing request and processes the candidate extraction data according to the processing type to obtain the corresponding pre-extraction data.
The execution main body preprocesses the candidate extracted data according to the preprocessing request and the processing type in the data extraction request, thereby effectively ensuring the accuracy of the extracted data, avoiding the excessive unnecessary data from occupying the storage space and saving the resources.
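The branching on the pre-processing request can be sketched as a single dispatch function. Everything here is illustrative: `to_pre_extraction`, `summarize` and `tag` are hypothetical names, and the first-sentence summary and keyword matching are deliberately simple stand-ins for whatever compiling and tagging the execution body actually performs.

```python
from typing import List, Optional, Sequence, Union


def summarize(candidate: str) -> str:
    """Stand-in for compiling: keep only the first sentence."""
    return candidate.split(". ")[0].rstrip(".") + "."


def tag(candidate: str, keywords: Sequence[str]) -> List[str]:
    """Stand-in for tagging: keep sentences containing any keyword."""
    sentences = [s.strip().rstrip(".") for s in candidate.split(". ")]
    return [s for s in sentences
            if any(k.lower() in s.lower() for k in keywords)]


def to_pre_extraction(candidate: str, preprocessing: Optional[str] = None,
                      keywords: Sequence[str] = ()) -> Union[str, List[str]]:
    """Map candidate extraction data to pre-extraction data."""
    if preprocessing is None:
        return candidate  # no pre-processing request: store as-is
    if preprocessing == "compile":
        return summarize(candidate)
    if preprocessing == "tag":
        return tag(candidate, keywords)
    raise ValueError(f"unknown processing type: {preprocessing!r}")


text = "Batteries degrade over time. Recycling recovers lithium. Costs are falling."
print(to_pre_extraction(text, "compile"))
print(to_pre_extraction(text, "tag", keywords=("lithium",)))
```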
Step S404, the pre-extraction data, the file identification and the position identification of the original file are stored in a correlated manner, and an initial data set is obtained.
In the embodiment of the present disclosure, the execution body of the data processing method, for example, the terminal device 101 or the server 103 shown in fig. 1, performs association and storage on the pre-extracted data obtained in step S403 and the file identifier of the original file to which the pre-extracted data belongs and the position identifier of the corresponding candidate extracted data in the original file, so as to obtain an initial data set.
According to the data processing method provided by the embodiment of the disclosure, the original file corresponding to the candidate extraction data and the position identification thereof in the original file are determined according to the received data extraction request and the corresponding candidate extraction data, the candidate extraction data is preprocessed according to the data extraction request to obtain pre-extraction data, and the pre-extraction data, the corresponding original file and the position identification thereof in the original file are associated and stored to obtain an initial data set. The scheme not only saves the pre-extraction data, but also correspondingly saves the original file corresponding to the pre-extraction data and the position identification thereof in the original file, and compared with the traditional scheme of full-text storage of the original file, the scheme can effectively reduce the requirement on storage space, save resources, and simultaneously facilitate the understanding of the extracted content according to the pre-extraction data, the acquisition of the original file to which the pre-extraction data belongs and the associated content in the original file, and improve the data utilization efficiency.
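The associated storage of step S404 can be pictured as records that keep the three items together. The sketch below uses an in-memory list; the class names `Record` and `InitialDataSet`, the string-valued location, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Record:
    pre_extracted: str  # the pre-extraction data itself
    file_id: str        # file identifier of the original file
    location: str       # position identifier within the original file


@dataclass
class InitialDataSet:
    records: List[Record] = field(default_factory=list)

    def add(self, pre_extracted: str, file_id: str, location: str) -> Record:
        """Store the three items in association and return the new record."""
        record = Record(pre_extracted, file_id, location)
        self.records.append(record)
        return record

    def by_file(self, file_id: str) -> List[Record]:
        """Return every record extracted from the given original file."""
        return [r for r in self.records if r.file_id == file_id]


dataset = InitialDataSet()
dataset.add("Recycling recovers lithium.", "report.docx", "page 3, paragraph 2")
dataset.add("Opening scene", "talk.mp4", "00:01:05-00:01:40")
print(dataset.by_file("talk.mp4"))
```

Only the extract and two identifiers are stored per record, which reflects the storage saving the text describes relative to keeping the full original file.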
Fig. 5a to 5c respectively show a schematic diagram of data in the data processing method according to the embodiment of the present disclosure, where fig. 5a is a schematic diagram of extracting content in a browsed web page, fig. 5b is a schematic diagram of extracting related content of a public number article in interactive software, and fig. 5c is a schematic diagram of partial data in an initial data set obtained by the data processing method according to the present disclosure.
As an implementation of the method shown in the above figures, fig. 6 shows an embodiment of a data processing apparatus according to the present disclosure. The data processing apparatus 600 corresponds to the method embodiment shown in fig. 2 and 3, and the apparatus may be applied to various electronic devices.
Referring to fig. 6, a data processing apparatus 600 provided in an embodiment of the present disclosure includes a first acquisition module 601, a first determination module 602, a second determination module 603, and a first processing module 604. The first obtaining module 601 is configured to obtain an initial data set, wherein the initial data set comprises pre-extraction data, a file identifier of an original file to which the pre-extraction data belongs and a position identifier of the pre-extraction data in the original file, the first determining module 602 is configured to determine target data from the pre-extraction data according to a received data processing request and determine style characteristics of the target data, the second determining module 603 is configured to determine data to be processed from the corresponding original file according to the position identifier of the target data, and the first processing module 604 is configured to process the data to be processed according to the data processing request and the style characteristics to obtain a data processing result.
In the data processing apparatus 600 of the embodiment of the present disclosure, the specific processes of the first acquisition module 601, the first determination module 602, the second determination module 603, and the first processing module 604 and the technical effects thereof may refer to the relevant descriptions of steps S201 to S204 in the corresponding embodiment of fig. 2, and are not repeated herein.
In some optional implementations of the embodiments of the present disclosure, the first determination module includes an acquisition unit, a first identification unit, and a first determination unit. The acquisition unit is configured to acquire the data processing request, the first identification unit is configured to identify a data content identifier in the data processing request, and the first determination unit is configured to determine target data from the pre-extracted data according to the data content identifier.
In some optional implementations of embodiments of the present disclosure, the first determination module further includes an obtaining unit configured to input the target data into a pre-trained language model to obtain the style characteristics of the target data.
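The interface of such an obtaining unit may, purely as an assumption, be sketched as follows. No real language model is invoked: `ToyStyleModel` is a rule-based stand-in, and the feature names `tone` and `register` are illustrative only.

```python
from typing import Dict, Protocol


class StyleModel(Protocol):
    """Anything that maps target data to style characteristics."""
    def predict(self, text: str) -> Dict[str, str]: ...


class ToyStyleModel:
    """Rule-based stand-in; a real system would call a pre-trained language model here."""

    def predict(self, text: str) -> Dict[str, str]:
        words = text.split()
        avg_len = sum(len(w) for w in words) / max(len(words), 1)
        return {
            "tone": "exclamatory" if text.rstrip().endswith("!") else "neutral",
            "register": "formal" if avg_len > 5 else "casual",
        }


def style_characteristics(target_data: str, model: StyleModel) -> Dict[str, str]:
    """Input the target data into the model and return its style characteristics."""
    return model.predict(target_data)


features = style_characteristics("Wow, so cool!", ToyStyleModel())
print(features)
```

Keeping the model behind a small interface allows the stand-in to be replaced by an actual pre-trained model without changing the calling code.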
In some optional implementations of embodiments of the present disclosure, the first processing module includes a second identification unit, a second determination unit, and a processing unit. The second identification unit is configured to identify a target type and an operation type in the data processing request. The second determination unit is configured to determine pre-processing data according to the target type and the data to be processed, wherein the data type of the pre-processing data is adapted to the target type. The processing unit is configured to process the pre-processing data according to the operation type to obtain a data processing result.
In the data processing apparatus of the embodiment of the present disclosure, the specific processes of the second identifying unit, the second determining unit and the processing unit and the technical effects thereof may refer to the relevant descriptions of steps S304 to S306 in the corresponding embodiment of fig. 3, and are not repeated herein.
In some alternative implementations of embodiments of the present disclosure, the target type includes at least one of a text type, an image type, a video type, or an audio type, and the operation type includes at least one of text editing, image processing, video clipping, video generation, audio extraction, or audio generation.
In some optional implementations of the disclosed embodiments, the second determining unit is configured to determine a candidate type adapted to the target type according to the target type, determine the data to be processed as pre-processed data in response to the initial type of the data to be processed being adapted to the candidate type, identify the content to be processed of the data to be processed in response to the initial type of the data to be processed being not adapted to the candidate type, and compile the content to be processed into pre-processed data conforming to the candidate type.
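The type adaptation described above may, under stated assumptions, be sketched as follows. The table `CANDIDATE_TYPES` and the converter registry are hypothetical, and the converter shown is a toy stand-in for a step such as captioning or text recognition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table: for each target type, the candidate types adapted to it,
# listed in order of preference.
CANDIDATE_TYPES: Dict[str, List[str]] = {
    "text": ["text"],
    "image": ["image"],
    "video": ["video", "image"],
    "audio": ["audio", "text"],
}

Converter = Callable[[str], str]


def determine_preprocessing_data(
    data: str,
    initial_type: str,
    target_type: str,
    converters: Dict[Tuple[str, str], Converter],
) -> Tuple[str, str]:
    """Return (pre-processing data, its type) for the given target type."""
    candidates = CANDIDATE_TYPES[target_type]
    if initial_type in candidates:
        # The initial type is already adapted: use the data unchanged.
        return data, initial_type
    for candidate in candidates:
        convert = converters.get((initial_type, candidate))
        if convert is not None:
            # Identify the content and compile it into a candidate type.
            return convert(data), candidate
    raise ValueError(f"cannot adapt {initial_type!r} data to target type {target_type!r}")


# Toy converter standing in for a real content-identification step.
converters = {("image", "text"): lambda image_ref: f"caption of {image_ref}"}

unchanged = determine_preprocessing_data("hello", "text", "audio", converters)
compiled = determine_preprocessing_data("photo.png", "image", "audio", converters)
print(unchanged, compiled)
```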
In some alternative implementations of embodiments of the present disclosure, the processing unit is configured to determine a pre-trained data processing model based on the type of operation and input pre-processed data to the pre-trained data processing model to obtain data processing results.
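Selecting a model according to the operation type may be illustrated, without limitation, by a simple registry. The operation-type keys and the lambda "models" below are assumptions made for the sketch; in practice each entry would wrap a pre-trained data processing model.

```python
from typing import Callable, Dict

Model = Callable[[str], str]

# Hypothetical registry: operation type -> stand-in for a pre-trained model.
MODEL_REGISTRY: Dict[str, Model] = {
    "text_editing": lambda data: data.strip().capitalize(),
    "audio_extraction": lambda data: f"audio-track-of({data})",
}


def process(preprocessing_data: str, operation_type: str) -> str:
    """Determine the model for the operation type and run it on the data."""
    try:
        model = MODEL_REGISTRY[operation_type]
    except KeyError:
        raise ValueError(f"no model registered for {operation_type!r}") from None
    return model(preprocessing_data)


edited = process("  hello world ", "text_editing")
print(edited)
```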
In some optional implementations of the embodiments of the present disclosure, the data processing apparatus further includes a second acquisition module, a third determination module, and an output module. The second acquisition module is configured to acquire a data output request. The third determination module is configured to determine a target device according to a device identifier in the data output request. The output module is configured to output the data to be processed and/or the data processing result to the target device.
In the data processing apparatus of the embodiment of the present disclosure, the specific processing of the second obtaining module, the third determining module, and the output module and the technical effects thereof may refer to the description related to steps S307 to S309 in the corresponding embodiment of fig. 3, which is not repeated herein.
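As a hedged sketch of the output path, the following assumes that the data output request is a dictionary carrying a `device_id` key and that devices are held in an in-memory registry; the `Device` class and its `send` method are hypothetical.

```python
from typing import Dict, List, Optional


class Device:
    """Minimal stand-in for a target device that records what it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[str] = []

    def send(self, payload: str) -> None:
        self.received.append(payload)


def output(
    request: Dict[str, str],
    devices: Dict[str, Device],
    to_process: Optional[str] = None,
    result: Optional[str] = None,
) -> Device:
    """Determine the target device from the request and send the available payloads."""
    device = devices[request["device_id"]]
    for payload in (to_process, result):
        if payload is not None:
            device.send(payload)
    return device


devices = {"tv-01": Device("living room TV"), "phone-07": Device("phone")}
target_device = output({"device_id": "tv-01"}, devices, to_process="raw clip", result="edited clip")
print(target_device.name, target_device.received)
```

Passing only one of `to_process` and `result` corresponds to the "and/or" alternative described above.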
Fig. 7 illustrates one embodiment of a data processing apparatus according to the present disclosure. The data processing apparatus 700 corresponds to the method embodiment shown in fig. 4, and the apparatus may be applied to various electronic devices.
Referring to fig. 7, a data processing apparatus 700 provided in an embodiment of the present disclosure includes a third obtaining module, a fourth determining module, a second processing module, and a storage module. The third obtaining module is configured to obtain a data extraction request and candidate extraction data. The fourth determining module is configured to determine an original file to which the candidate extraction data belongs and a position identifier of the candidate extraction data in the original file. The second processing module is configured to process the candidate extraction data according to the data extraction request to obtain pre-extraction data. The storage module is configured to store the pre-extraction data, the file identifier of the original file, and the position identifier in association with one another to obtain an initial data set.
In the data processing apparatus according to the embodiment of the present disclosure, specific processes of the third obtaining module, the fourth determining module, the second processing module, and the storage module and technical effects thereof may refer to the description of steps S401 to S404 in the corresponding embodiment of fig. 4, which is not repeated herein.
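The associated storage performed by the storage module may be illustrated, as one assumption among many possible designs, with an in-memory SQLite table. The table and column names are hypothetical, and the position identifier is again assumed to be a pair of offsets.

```python
import sqlite3
from typing import Optional, Tuple


def create_store() -> sqlite3.Connection:
    """Create an in-memory table holding the initial data set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE initial_data_set ("
        " pre_extracted TEXT NOT NULL,"
        " file_id TEXT NOT NULL,"
        " start_offset INTEGER NOT NULL,"
        " end_offset INTEGER NOT NULL)"
    )
    return conn


def save_entry(conn: sqlite3.Connection, pre_extracted: str, file_id: str, start: int, end: int) -> None:
    """Store the pre-extraction data together with its file and position identifiers."""
    conn.execute(
        "INSERT INTO initial_data_set VALUES (?, ?, ?, ?)",
        (pre_extracted, file_id, start, end),
    )
    conn.commit()


def locate(conn: sqlite3.Connection, pre_extracted: str) -> Optional[Tuple[str, int, int]]:
    """Return (file_id, start, end) for a stored item, or None if it is unknown."""
    return conn.execute(
        "SELECT file_id, start_offset, end_offset FROM initial_data_set"
        " WHERE pre_extracted = ?",
        (pre_extracted,),
    ).fetchone()


conn = create_store()
save_entry(conn, "revenue summary", "doc-1", 10, 42)
print(locate(conn, "revenue summary"))
```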
In some optional implementations of the disclosed embodiments, the second processing module is configured to determine candidate extraction data as pre-extraction data in response to the data extraction request not including the pre-processing request, determine a processing type of the pre-processing request in response to the data extraction request including the pre-processing request, determine compiled data corresponding to the candidate extraction data in response to the processing type being a compiled type, determine the compiled data as pre-extraction data, and determine tag data in the candidate extraction data in response to the processing type being a tag type, and determine the tag data as pre-extraction data.
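The branching performed by the second processing module may be sketched as follows. The disclosure does not specify here how compiled data or tag data are derived, so `compile_data` (whitespace normalization) and `extract_tags` (hashtag extraction) are placeholder assumptions, as are the type names `"compile"` and `"tag"`.

```python
import re
from typing import Optional


def compile_data(candidate: str) -> str:
    # Placeholder for "compiling": collapse all runs of whitespace.
    return " ".join(candidate.split())


def extract_tags(candidate: str) -> str:
    # Placeholder for tag extraction: keep only hashtag-like tokens.
    return " ".join(re.findall(r"#\w+", candidate))


def to_pre_extraction_data(candidate: str, preprocessing_type: Optional[str] = None) -> str:
    """Map candidate extraction data to pre-extraction data."""
    if preprocessing_type is None:
        # No pre-processing request: the candidate data is used as-is.
        return candidate
    if preprocessing_type == "compile":
        return compile_data(candidate)
    if preprocessing_type == "tag":
        return extract_tags(candidate)
    raise ValueError(f"unknown processing type: {preprocessing_type!r}")


candidate = "Launch   notes\n#release #v2"
print(to_pre_extraction_data(candidate, "compile"))
print(to_pre_extraction_data(candidate, "tag"))
```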
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in the device 800 are connected to the I/O interface 805, including an input unit 806, such as a keyboard, a mouse, etc., an output unit 807, such as various types of displays, speakers, etc., a storage unit 808, such as a magnetic disk, optical disk, etc., and a communication unit 809, such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, such as a data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special or general purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. A data processing method, comprising:
Acquiring an initial data set, wherein the initial data set comprises pre-extraction data, a file identifier of an original file to which the pre-extraction data belongs and a position identifier of the pre-extraction data in the original file;
Determining target data from the pre-extracted data according to the received data processing request, and determining style characteristics of the target data;
Determining data to be processed from the corresponding original file according to the position identification of the target data;
Identifying a target type and an operation type in the data processing request;
Determining preprocessing data according to the target type and the data to be processed, wherein the data type of the preprocessing data is adapted to the target type; and
Processing the preprocessing data according to the operation type and in combination with the style characteristics of the target data to obtain a data processing result.
2. The method of claim 1, wherein the determining target data from the pre-extracted data according to the received data processing request comprises:
Acquiring a data processing request;
Identifying a data content identifier in the data processing request;
And determining the target data from the pre-extracted data according to the data content identification.
3. The method of claim 1, wherein the determining the style characteristic of the target data comprises:
and inputting the target data into a pre-trained language model to obtain style characteristics of the target data.
4. The method of claim 1, wherein the target type comprises at least one of a text type, an image type, a video type, an audio type, and
The operation type includes at least one of text editing, image processing, video clipping, video generation, audio extraction, or audio generation.
5. The method of claim 1, wherein the determining pre-processing data from the target type and the data to be processed comprises:
determining a candidate type adapted to the target type according to the target type;
Determining the data to be processed as pre-processing data in response to the initial type of the data to be processed being matched with the candidate type;
and identifying the to-be-processed content of the to-be-processed data in response to the fact that the initial type of the to-be-processed data is not matched with the candidate type, and compiling the to-be-processed content into the pre-processed data conforming to the candidate type.
6. The method of claim 1, wherein the processing the pre-processed data according to the operation type to obtain a data processing result comprises:
Determining a pre-trained data processing model according to the operation type;
and inputting the preprocessing data into the pre-trained data processing model to obtain the data processing result.
7. The method of any of claims 1-6, further comprising:
acquiring a data output request;
determining target equipment according to the equipment identification in the data output request;
and outputting the data to be processed and/or the data processing result to the target equipment.
8. A data processing method, comprising:
acquiring a data extraction request and candidate extraction data;
Determining an original file to which the candidate extraction data belong and a position identifier of the candidate extraction data in the original file;
Processing the candidate extraction data according to the data extraction request to obtain pre-extraction data;
the pre-extraction data, the file identification of the original file and the position identification are stored in an associated mode, and an initial data set is obtained;
acquiring the initial data set;
Determining target data from the pre-extracted data according to the received data processing request, and determining style characteristics of the target data;
Determining data to be processed from the corresponding original file according to the position identification of the target data;
Identifying a target type and an operation type in the data processing request;
Determining preprocessing data according to the target type and the data to be processed, wherein the data type of the preprocessing data is adapted to the target type; and
Processing the preprocessing data according to the operation type and in combination with the style characteristics of the target data to obtain a data processing result.
9. The method of claim 8, wherein the processing the candidate extraction data according to the data extraction request to obtain pre-extraction data comprises:
determining the candidate extraction data as pre-extraction data in response to the data extraction request not including a pre-processing request;
Determining a processing type of the preprocessing request in response to the data extraction request including a preprocessing request;
Determining the compiling data corresponding to the candidate extracted data in response to the processing type being a compiling type, and determining the compiling data as pre-extracted data;
and determining the tag data in the candidate extraction data in response to the processing type being a tag type, and determining the tag data as pre-extraction data.
10. A data processing apparatus comprising:
The first acquisition module is configured to acquire an initial data set, wherein the initial data set comprises pre-extraction data, a file identifier of an original file to which the pre-extraction data belongs and a position identifier of the pre-extraction data in the original file;
a first determining module configured to determine target data from the pre-extracted data according to the received data processing request, and determine style characteristics of the target data;
the second determining module is configured to determine data to be processed from the corresponding original file according to the position identification of the target data;
a first processing module comprising:
a second identifying unit configured to identify a target type and an operation type in the data processing request;
a second determining unit configured to determine preprocessing data according to the target type and the data to be processed, the data type of the preprocessing data being adapted to the target type;
and the processing unit is configured to process the preprocessing data according to the operation type and in combination with the style characteristics of the target data to obtain a data processing result.
11. The apparatus of claim 10, wherein the first determination module comprises:
An acquisition unit configured to acquire a data processing request;
A first identifying unit configured to identify a data content identification in the data processing request;
A first determining unit configured to determine the target data from the pre-extracted data according to the data content identification.
12. The apparatus of claim 10, wherein the first determination module comprises:
And the obtaining unit is configured to input the target data into a pre-trained language model to obtain style characteristics of the target data.
13. The apparatus of claim 10, wherein the target type comprises at least one of a text type, an image type, a video type, an audio type, and
The operation type includes at least one of text editing, image processing, video clipping, video generation, audio extraction, or audio generation.
14. The apparatus of claim 10, wherein the second determination unit is configured to:
determining a candidate type adapted to the target type according to the target type;
Determining the data to be processed as pre-processing data in response to the initial type of the data to be processed being matched with the candidate type;
and identifying the to-be-processed content of the to-be-processed data in response to the fact that the initial type of the to-be-processed data is not matched with the candidate type, and compiling the to-be-processed content into the pre-processed data conforming to the candidate type.
15. The apparatus of claim 10, wherein the processing unit is configured to:
Determining a pre-trained data processing model according to the operation type;
and inputting the preprocessing data into the pre-trained data processing model to obtain the data processing result.
16. The apparatus of any of claims 10-15, further comprising:
a second acquisition module configured to acquire a data output request;
A third determining module configured to determine a target device according to the device identifier in the data output request;
And the output module is configured to output the data to be processed and/or the data processing result to the target equipment.
17. A data processing apparatus comprising:
a third acquisition module configured to acquire the data extraction request and the candidate extraction data;
A fourth determining module configured to determine an original file to which the candidate extraction data belongs and a location identifier of the candidate extraction data in the original file;
The second processing module is configured to process the candidate extraction data according to the data extraction request to obtain pre-extraction data;
The storage module is configured to store the pre-extracted data, the file identification of the original file and the position identification in an associated manner to obtain an initial data set;
A first acquisition module configured to acquire the initial data set;
a first determining module configured to determine target data from the pre-extracted data according to the received data processing request, and determine style characteristics of the target data;
the second determining module is configured to determine data to be processed from the corresponding original file according to the position identification of the target data;
a first processing module comprising:
a second identifying unit configured to identify a target type and an operation type in the data processing request;
a second determining unit configured to determine preprocessing data according to the target type and the data to be processed, the data type of the preprocessing data being adapted to the target type;
and the processing unit is configured to process the preprocessing data according to the operation type and in combination with the style characteristics of the target data to obtain a data processing result.
18. The apparatus of claim 17, wherein the second processing module is configured to:
determining the candidate extraction data as pre-extraction data in response to the data extraction request not including a pre-processing request;
Determining a processing type of the preprocessing request in response to the data extraction request including a preprocessing request;
Determining the compiling data corresponding to the candidate extracted data in response to the processing type being a compiling type, and determining the compiling data as pre-extracted data;
and determining the tag data in the candidate extraction data in response to the processing type being a tag type, and determining the tag data as pre-extraction data.
19. An electronic device, comprising:
At least one processor, and
A memory communicatively coupled to the at least one processor, wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-9.
CN202311278630.0A 2023-09-28 2023-09-28 Data processing method, device, equipment and storage medium Active CN117289869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311278630.0A CN117289869B (en) 2023-09-28 2023-09-28 Data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311278630.0A CN117289869B (en) 2023-09-28 2023-09-28 Data processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117289869A CN117289869A (en) 2023-12-26
CN117289869B true CN117289869B (en) 2025-04-08

Family

ID=89240550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311278630.0A Active CN117289869B (en) 2023-09-28 2023-09-28 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117289869B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111381830A (en) * 2020-03-10 2020-07-07 腾讯科技(深圳)有限公司 Data request processing method and device in program and computer equipment
CN116071452A (en) * 2023-03-07 2023-05-05 网易(杭州)网络有限公司 Style image generation method and device, computer equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847294B (en) * 2017-01-17 2018-11-30 百度在线网络技术(北京)有限公司 Audio-frequency processing method and device based on artificial intelligence
JP7228998B2 (en) * 2018-08-27 2023-02-27 日本放送協会 speech synthesizer and program
CN112347226B (en) * 2020-11-06 2023-05-26 平安科技(深圳)有限公司 Document knowledge extraction method, device, computer equipment and readable storage medium
US11586816B2 (en) * 2021-06-11 2023-02-21 International Business Machines Corporation Content tailoring for diverse audiences
CN117597680A (en) * 2021-08-19 2024-02-23 浙江吉利控股集团有限公司 Data indexing method, device, equipment and storage medium
CN113723294B (en) * 2021-08-31 2024-07-05 杭州海康威视数字技术股份有限公司 Data processing method and device and object recognition method and device
CN115170390B (en) * 2022-08-31 2023-01-06 广州极尚网络技术有限公司 File stylization method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN117289869A (en) 2023-12-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant