WO2018028443A1 - Data processing method, device and system - Google Patents
Data processing method, device and system
- Publication number
- WO2018028443A1 (PCT/CN2017/094790)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- keyword
- search engine
- feature
- engine database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Definitions
- the present disclosure relates to Internet technologies, and in particular, to a data processing method, device, and system.
- the above-mentioned companies generally use four kinds of data exports. A data export is a storage space in which data is stored, or a software application capable of generating data; the storage space or software application can provide a data source for a database and present the massive data stored in it.
- the four data exports are data application exports (such as Alibaba's Taobao business and Baidu's Baidu Index), report exports (such as a company's salary report), knowledge base platform exports (such as Baidu's Baidu Encyclopedia), and cluster physical table exports (such as the personal information of a company's users).
- the present disclosure provides a data processing method, device and system to improve the efficiency of finding data.
- the present disclosure provides a data processing system including: a query terminal and a search engine database;
- the query terminal is configured to receive a query request of a user, where the query request includes a search keyword; the query terminal acquires a dimension keyword, an index keyword, and a time granularity keyword from the search keyword, and transmits the dimension keyword, the index keyword, and the time granularity keyword to the search engine database;
- the search engine database pre-stores data in the data exit, and characteristic information of the data,
- the feature information includes at least one of the following: a dimensional feature, an index feature, and a time granularity feature;
- the search engine database is configured to acquire first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword, and to send the first data, the second data, and the third data to the query terminal;
- the query terminal is further configured to determine, according to the first data, the second data, and the third data, target data that is fed back to the user, and display the target data to the user.
- the present disclosure provides a data processing method, including:
- the querying terminal receives a query request of the user, where the query request includes a search keyword;
- the query terminal acquires a dimension keyword, an index keyword, and a time granularity keyword in the search keyword;
- the query terminal transmits the dimension keyword, the index keyword, and the time granularity keyword to a search engine database, so that the search engine database obtains first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword;
- the search engine database pre-stores data in the data export and feature information of the data, and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
- the querying terminal determines target data fed back to the user according to the first data, the second data, and the third data.
- the present disclosure provides a data processing method, including:
- the querying terminal receives a query request of the user, where the query request includes a search keyword;
- the query terminal acquires at least two types of keywords from the search keyword;
- the query terminal sends the at least two types of keywords to the search engine database, so that the search engine database obtains source data corresponding to each of the at least two types of keywords;
- the query terminal receives the source data sent by the search engine database;
- the query terminal determines target data fed back to the user according to the source data.
- the present disclosure provides a data processing method, including:
- the search engine database receives the dimension keyword, the index keyword, and the time granularity keyword sent by the query terminal, where the dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from a search keyword included in a query request received from a user;
- the search engine database pre-stores data in the data exit and feature information of the data, and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
- the search engine database acquires first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword;
- the present disclosure provides a data processing method, including:
- the search engine database acquires first data in the data application, and dimension features, indicator features, and time granularity features of the first data;
- the search engine database respectively acquires a second data in the report, the knowledge base platform, the cluster physical table, and the dimensional characteristics of the second data;
- the search engine database stores the first data, and dimension features, index features, and time granularity features of the first data
- the search engine database stores the second data and dimensional features of the second data.
- the disclosure provides a query terminal, including: a receiving unit, a processing unit, and a sending unit;
- the receiving unit is configured to receive a query request of a user, where the query request includes a search keyword;
- the processing unit is coupled to the receiving unit, configured to acquire a dimension keyword, an index keyword, and a time granularity keyword in the search keyword;
- the sending unit is coupled to the processing unit and configured to send the dimension keyword, the index keyword, and the time granularity keyword to a search engine database, so that the search engine database acquires first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword;
- the search engine database pre-stores data in the data export and feature information of the data; the data export includes at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table; and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
- the receiving unit is further configured to receive the first data, the second data, and the third data sent by the search engine database;
- the processing unit is further configured to determine target data that is fed back to the user according to the first data, the second data, and the third data.
- the present disclosure provides a search engine database, including: a receiver, a memory, a processor, and a transmitter;
- the receiver is configured to receive a dimension keyword, an index keyword, and a time granularity keyword sent by the query terminal, where the dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from the search keyword included in a query request received from a user;
- the memory is configured to store data in a data export and feature information of the data; the data export includes at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table; and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
- the processor is coupled to the receiver and the memory, and is configured to acquire first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword;
- the transmitter is coupled to the processor and configured to send the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, target data that is fed back to the user.
- data in a data application, a report, a knowledge base platform, and a cluster physical table are collected in advance into a search engine database, and dimension attributes, index features, and time granularity features are added to each piece of data collected.
- the search engine receives the search keyword input by the user and first splits it to obtain a dimension keyword, an index keyword, and a time granularity keyword; then, in the pre-established search engine database, it finds the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and displays the matched data to the user. The user does not need to traverse each data export to search for data and only needs to input the search keyword once; the search engine database can find the data related to the search keyword in all data exports, thereby improving the efficiency of finding data.
- FIG. 1 is a schematic diagram of an optional application scenario of the present disclosure
- FIG. 2 is a schematic structural diagram of a data processing system according to an embodiment of the present disclosure
- FIG. 3 is a flowchart of a data processing method according to Embodiment 1 of the present disclosure
- FIG. 5 is a flowchart of a data processing method according to Embodiment 3 of the present disclosure.
- FIG. 6 is a flowchart of a data processing method according to Embodiment 4 of the present disclosure.
- FIG. 7 is a flowchart of a data processing method according to Embodiment 5 of the present disclosure.
- FIG. 8 is a flowchart of a data processing method according to Embodiment 6 of the present disclosure.
- FIG. 9 is a flowchart of a data processing method according to Embodiment 7 of the present disclosure.
- FIG. 10 is a flowchart of a data processing method according to Embodiment 8 of the present disclosure.
- FIG. 11 is a flowchart of a data processing method according to Embodiment 9 of the present disclosure.
- FIG. 12 is a schematic structural diagram of a query terminal according to Embodiment 1 of the present disclosure.
- FIG. 13 is a schematic structural diagram of a query terminal according to Embodiment 2 of the present disclosure.
- FIG. 14 is a schematic structural diagram of a query terminal according to Embodiment 3 of the present disclosure.
- FIG. 15 is a schematic structural diagram of a search engine database according to an embodiment of the present disclosure.
- the present invention proposes a data processing method, and the specific process of the data processing method provided by the present invention will now be described in conjunction with FIG.
- the user 10 queries data through the query terminal 11.
- the user 10 may be a non-technical person in the company or a consumer.
- the query terminal 11 may be a terminal device in the company to which the user 10 belongs, or may be the user 10's personal computer, laptop, etc.
- a search engine is installed on the query terminal 11, and the user 10 can input the search keyword in the search box of the search engine through the keyboard of the query terminal 11. For example, the search keyword is "the most recent day home improvement category transaction amount", and the semantic recognition module 12 splits the search keyword into a dimension keyword, an index keyword, and a time granularity keyword in the big data field.
- the dimension keyword is "home improvement category", the index keyword is "transaction amount", and the time granularity keyword is "the most recent day".
- the method by which the semantic recognition module 12 splits the search keyword into a dimensional keyword, an index keyword, and a time granularity keyword will be described in detail in the following embodiments.
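- as a rough illustration of the split described above, the following Python sketch matches the query against three small vocabularies. The vocabularies, the function name `split_search_keyword`, and the substring-matching strategy are all assumptions made for illustration; the text does not state how the semantic recognition module 12 is actually implemented.

```python
# Hypothetical vocabularies; the source text does not list them.
DIMENSION_TERMS = ["home improvement category", "user type"]
INDEX_TERMS = ["transaction amount", "pv", "uv"]
TIME_TERMS = ["the most recent day", "last day", "last week"]


def split_search_keyword(search_keyword):
    """Return the dimension, index and time granularity terms found in the query."""
    text = search_keyword.lower()

    def first_match(terms):
        # Try longer terms first so a more specific phrase wins over a shorter one.
        for term in sorted(terms, key=len, reverse=True):
            if term in text:
                return term
        return None

    return {
        "dimension": first_match(DIMENSION_TERMS),
        "index": first_match(INDEX_TERMS),
        "time_granularity": first_match(TIME_TERMS),
    }


result = split_search_keyword(
    "the most recent day home improvement category transaction amount"
)
print(result)
```

A keyword type that has no vocabulary hit comes back as `None`, so a caller can tell which of the three keyword types the query actually contained.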
- the semantic recognition module 12 transmits the split dimension keyword "home improvement category", index keyword "transaction amount", and time granularity keyword "last day" to the search engine database 13. The data sources of the search engine database 13 include the data application 15, the report 16, the knowledge base platform 17, and the cluster physical table 18. The data application 15 may specifically be a data product, such as Alibaba's Taobao business or Baidu's Baidu Index; a data product is a web product in the form of a web page.
- the biggest difference between data products and ordinary web products is that data products carry a large amount of data and need to interact frequently with a background data source, which is specifically a device that stores data operable by the data application 15.
- the data in the data application 15 and the report 16 can be stored in the search engine database 13 by the syntax parser 19. Taking the data application 15 as an example, since the data application 15 is developed with a Software Development Kit (SDK), the data in the data application 15 can be collected into the syntax parser 19 via the SDK.
- the syntax parser 19 can parse the dimension feature, index feature, time granularity feature, and read table name of a piece of Structured Query Language (SQL).
- for example, a piece of SQL is as follows:
- user_type AS user type
- the syntax parser 19 can parse that the dimension feature of the segment of SQL is "user type", the indicator feature is “Pv, Uv”, the time granularity feature is "the last day”, and the read table name is "tbbi.ads_tb_log_1d”.
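- the kind of extraction attributed to the syntax parser 19 can be sketched with a few regular expressions. Everything below is an assumption for illustration: the SQL statement is constructed only so that it yields the features reported above (the original statement is not reproduced in the text), and reading the time granularity from the `_1d` table-name suffix is a guessed convention, not something the text states.

```python
import re

# Hypothetical SQL, built to be consistent with the reported features.
SQL = """
SELECT user_type AS user_type_name,
       COUNT(*) AS pv,
       COUNT(DISTINCT user_id) AS uv
FROM tbbi.ads_tb_log_1d
GROUP BY user_type
"""

# Assumed convention: the table-name suffix encodes the time granularity.
SUFFIX_TO_GRANULARITY = {"1d": "last day", "1w": "last week", "1m": "last month"}


def parse_sql_features(sql):
    """Extract table name, dimension, index and time granularity features."""
    table = re.search(r"\bFROM\s+([\w.]+)", sql, re.IGNORECASE).group(1)
    group_by = re.search(r"\bGROUP\s+BY\s+(.+)", sql, re.IGNORECASE)
    # Columns in GROUP BY are treated as dimension features.
    dimensions = [c.strip() for c in group_by.group(1).split(",")] if group_by else []
    # Aliased aggregate expressions are treated as index features.
    indexes = re.findall(
        r"\b(?:COUNT|SUM|AVG|MIN|MAX)\s*\([^)]*\)\s+AS\s+(\w+)", sql, re.IGNORECASE
    )
    suffix = table.rsplit("_", 1)[-1]
    return {
        "table": table,
        "dimensions": dimensions,
        "indexes": indexes,
        "time_granularity": SUFFIX_TO_GRANULARITY.get(suffix),
    }


features = parse_sql_features(SQL)
print(features)
```

A production parser would work on a real SQL syntax tree rather than regular expressions; the regex version is only meant to show which parts of a statement map to which feature.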
- the syntax parser 19 can parse the dimension features, index features, and time granularity features of each piece of data in the data application 15 and the report 16.
- the syntax parser 19 transmits the parsed data to the search engine database 13, which stores not only the data itself but also the dimension features, index features, and time granularity features of the data.
- the search engine database 13 can also store the data in the knowledge base platform 17 and the cluster physical table 18.
- the storage process is specifically: splitting each piece of data in the knowledge base platform 17 and the cluster physical table 18, extracting the dimension features of each piece of split data, and storing each piece of data in the knowledge base platform 17 and the cluster physical table 18, together with its dimension features, in the search engine database 13.
- each data stored in the search engine database 13 has at least a dimensional feature.
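- one way to picture the stored form is a record that always carries a dimension feature and optionally carries index and time granularity features. The class names `StoredRecord` and `SearchEngineStore` and the in-memory list are hypothetical; the text does not describe the actual storage layout of the search engine database 13.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredRecord:
    data: str                               # the data itself
    source: str                             # e.g. "data_application", "knowledge_base"
    dimension: str                          # required for every record
    index: Optional[str] = None             # present for parsed SQL-backed data
    time_granularity: Optional[str] = None  # present for parsed SQL-backed data


class SearchEngineStore:
    """Minimal in-memory stand-in for the search engine database."""

    def __init__(self) -> None:
        self.records: List[StoredRecord] = []

    def add(self, record: StoredRecord) -> None:
        # Enforce the rule that each stored item has at least a dimension feature.
        if not record.dimension:
            raise ValueError("every stored record needs a dimension feature")
        self.records.append(record)


store = SearchEngineStore()
store.add(StoredRecord("PV/UV by user type", "data_application",
                       "user type", "Pv, Uv", "last day"))
store.add(StoredRecord("home improvement encyclopedia entry", "knowledge_base",
                       "home improvement category"))
print(len(store.records))
```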
- when the search engine database 13 receives the dimension keyword "home improvement category", the index keyword "transaction amount", and the time granularity keyword "last day" sent by the semantic recognition module 12, it separately finds the data matching the dimension keyword "home improvement category", the data matching the index keyword "transaction amount", and the data matching the time granularity keyword "last day".
- the search engine database 13 sends the matching data it finds to the sequencer 14. If the search engine database 13 finds only one piece of matching data, the sequencer 14 sends that data to the query terminal 11, and the query terminal 11 displays it; if the search engine database 13 finds more than one piece of matching data, the sequencer 14 sorts the pieces of matching data according to a preset algorithm and sends the sorted data to the query terminal 11, and the query terminal 11 displays the matching data in sorted order.
- the preset algorithm by which the sequencer 14 sorts the matching data includes at least one of the following: the PageRank algorithm, the CUS-distance algorithm, the Latent Dirichlet Allocation (LDA) algorithm, the breadth-first search (BFS) algorithm, etc.
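- the single-match and multiple-match behaviour of the sequencer 14 can be sketched as follows. The scoring function here simply counts how many keyword types an item matched; it is an assumed placeholder and is not PageRank, CUS-distance, LDA, or BFS. The function names and the shape of a match (a dict with a `matched` set) are also hypothetical.

```python
def matched_feature_count(match):
    """Placeholder score: how many of the three keyword types this item matched."""
    return len(match["matched"])


def sequence_matches(matches, score=matched_feature_count):
    """Pass a single match through unchanged; sort several matches by score."""
    if len(matches) <= 1:
        return list(matches)
    # sorted() is stable, so items with equal scores keep their original order.
    return sorted(matches, key=score, reverse=True)


matches = [
    {"data": "daily report", "matched": {"time_granularity"}},
    {"data": "home improvement transactions",
     "matched": {"dimension", "index", "time_granularity"}},
    {"data": "category encyclopedia entry", "matched": {"dimension"}},
]
ordered = [m["data"] for m in sequence_matches(matches)]
print(ordered)
```

Passing the score as a parameter keeps the ordering step independent of whichever preset algorithm is chosen.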
- the data in the data application, the report, the knowledge base platform, and the cluster physical table are collected in advance into the search engine database, and the dimension feature, the index feature, and the time granularity feature are added to each collected data.
- the search engine receives the search keyword input by the user and first splits it to obtain a dimension keyword, an index keyword, and a time granularity keyword; then, in the pre-established search engine database, it finds the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and displays the matched data to the user. The user does not need to traverse each data export to search for data and only needs to input the search keyword once; the search engine database can find the data related to the search keyword in all data exports, thereby improving the efficiency of finding data.
- the data processing system includes a query terminal 1 and a search engine database 2. The query terminal 1 is configured to receive a query request of a user, where the query request includes a search keyword; the query terminal acquires a dimension keyword, an index keyword, and a time granularity keyword from the search keyword, and sends the dimension keyword, the index keyword, and the time granularity keyword to the search engine database.
- the query terminal 11 receives the query request of the user 10, and the query request may be made in various manners.
- for example, the user 10 inputs text or voice on the search engine of the query terminal 11, and the text or voice includes the search keyword of the user 10.
- the semantic recognition module 12 and the sequencer 14 may be modules belonging to the query terminal 11. The semantic recognition module 12 splits the search keyword into a dimension keyword, an index keyword, and a time granularity keyword in the big data domain; specifically, the dimension keyword is "home improvement category", the index keyword is "transaction amount", and the time granularity keyword is "last day".
- the semantic recognition module 12 also transmits the dimension keyword "home improvement category", the index keyword "transaction amount", and the time granularity keyword "last day" to the search engine database 2.
- the search engine database 2 pre-stores data in the data exit, and feature information of the data, the feature information including at least one of the following: a dimensional feature, an index feature, and a time granular feature.
- the data export includes: a data application, a report, a knowledge base platform, and a cluster physical table.
- the search engine database 13 stores the data in the data application, the report, the knowledge base platform, and the cluster physical table, together with the feature information of each piece of data: each piece of data in the data application has dimension, index, and time granularity features, while the data in the report, the knowledge base platform, and the cluster physical table all have dimension features.
- the search engine database 2 is configured to acquire first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword, and to send the first data, the second data, and the third data to the query terminal.
- when the search engine database 13 receives the dimension keyword "home improvement category", the index keyword "transaction amount", and the time granularity keyword "last day" sent by the semantic recognition module 12, it can separately find the data matching each of these keywords.
- the search engine database 13 can match the dimension keyword "home improvement category" recognized by the semantic recognition module 12 against the dimension features of the stored data, and obtain first data corresponding to the dimension feature that matches the dimension keyword. The first data may be a plurality of data, and may be derived from the data application 15, the report 16, the knowledge base platform 17, or the cluster physical table 18.
- the search engine database 13 may further match the index keyword "transaction amount" recognized by the semantic recognition module 12 against the index features of the stored data, and obtain second data corresponding to the index feature that matches the index keyword.
- the second data may be a plurality of data originating from the data application 15.
- the search engine database 13 may also match the time granularity keyword "last day" recognized by the semantic recognition module 12 against the time granularity features of the stored data, and obtain third data corresponding to the time granularity feature that matches the time granularity keyword; the third data may be a plurality of data originating from the data application 15.
- the first data, the second data, and the third data obtained by the search engine database 13 are sent to the query terminal 11, and may be sent to the sequencer 14 in the query terminal 11.
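- the three lookups that produce the first, second, and third data can be sketched as three independent filters over the stored records. The sample records, the exact-equality matching rule, and the function names are assumptions for illustration; the text does not specify how a keyword is compared with a feature.

```python
# Hypothetical stored records with their feature information.
RECORDS = [
    {"data": "home improvement transaction amount, daily",
     "dimension": "home improvement category",
     "index": "transaction amount", "time_granularity": "last day"},
    {"data": "home improvement encyclopedia entry",
     "dimension": "home improvement category",
     "index": None, "time_granularity": None},
    {"data": "apparel page views, daily",
     "dimension": "apparel category",
     "index": "pv", "time_granularity": "last day"},
]


def match_by_feature(records, feature, keyword):
    """Return the data of every record whose given feature equals the keyword."""
    return [r["data"] for r in records if r.get(feature) == keyword]


def lookup(records, dimension_kw, index_kw, time_kw):
    """Run the three lookups separately, one per keyword type."""
    first = match_by_feature(records, "dimension", dimension_kw)
    second = match_by_feature(records, "index", index_kw)
    third = match_by_feature(records, "time_granularity", time_kw)
    return first, second, third


first, second, third = lookup(
    RECORDS, "home improvement category", "transaction amount", "last day"
)
print(first, second, third)
```

Records that carry only a dimension feature can appear in the first result but never in the second or third, which is consistent with the statement that the second and third data originate from the data application.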
- the query terminal 1 is further configured to determine target data fed back to the user according to the first data, the second data, and the third data, and display the target data to the user.
- if there is only one piece of matching data, the sequencer 14 sends the matching data to the display of the query terminal 11, and the display of the query terminal 11 shows it.
- if there are multiple pieces of matching data, the sequencer 14 sorts them according to a preset algorithm and sends the sorted data to the display of the query terminal 11, and the display of the query terminal 11 shows the matching data in sorted order.
- the data in the data application, the report, the knowledge base platform, and the cluster physical table are collected in advance into the search engine database, and the dimension feature, the index feature, and the time granularity feature are added to each collected data.
- the search engine receives the search keyword input by the user and first splits it to obtain a dimension keyword, an index keyword, and a time granularity keyword; then, in the pre-established search engine database, it finds the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and displays the matched data to the user. The user does not need to traverse each data export to search for data and only needs to input the search keyword once; the search engine database can find the data related to the search keyword in all data exports, thereby improving the efficiency of finding data.
- FIG. 3 is a flowchart of a data processing method according to Embodiment 1 of the present disclosure. As shown in FIG. 3, the method includes the following steps:
- Step S201: the query terminal receives a query request of the user, where the query request includes a search keyword.
- the query terminal 11 receives the query request of the user 10, and the query request may be made in various manners.
- for example, the user 10 inputs text or voice on the search engine of the query terminal 11, and the text or voice includes the search keyword of the user 10; or, the search engine of the query terminal 11 provides a drop-down list in which keywords are pre-stored, and the user can input the keyword to be searched by selecting and clicking a keyword in the list;
- or, the user 10 previews text information on the query terminal 11, selects a keyword from the previewed text, and searches for it by dragging, sliding, or clicking a function key.
- the user 10 queries data through the query terminal 11. The user 10 may be a non-technical person in the company or a consumer, and the query terminal 11 may be a terminal device in the company to which the user 10 belongs, or a device such as the user 10's personal computer or notebook computer.
- a search engine is installed on the query terminal 11, and the user 10 can input a search keyword in the search box of the search engine through the keyboard of the query terminal 11; for example, the search keyword is "the latest day home improvement category transaction amount".
- Step S202: the query terminal acquires a dimension keyword, an index keyword, and a time granularity keyword from the search keyword.
- the semantic recognition module 12 and the sequencer 14 may be modules belonging to the query terminal 11 or modules belonging to the search engine database 13, and the query terminal 11 and the search engine database 13 may be connected directly or indirectly through other devices.
- this embodiment takes as an example the case in which the semantic recognition module 12 and the sequencer 14 are directly connected to the query terminal 11, and the query terminal 11 is directly connected to the search engine database 13.
- the semantic recognition module 12 splits the search keyword into a dimension keyword, an index keyword, and a time granularity keyword in the big data domain.
- the dimension keyword is "home improvement category".
- the index keyword is "transaction amount".
- the time granularity keyword is "the most recent day".
- Step S203: the query terminal sends the dimension keyword, the index keyword, and the time granularity keyword to a search engine database, so that the search engine database obtains first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword.
- the search engine database pre-stores data in the data export and feature information of the data; the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature.
- the data export includes at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table.
- the search engine database 13 stores data in the data application, the report, the knowledge base platform, and the cluster physical table.
- the data source of the search engine database 13 includes a data application 15, a report 16, a knowledge base platform 17, and a cluster physical table 18.
- the data application 15 may specifically be a data product, such as Alibaba's Taobao business or Baidu's Baidu Index. Data products are web products in the form of web pages. The biggest difference between data products and ordinary web products is that data products carry a large amount of data and need to interact frequently with a background data source, which is specifically a device that stores data operable by the data application 15.
- the data in the data application 15 can be stored in the search engine database 13 by the syntax parser 19, and specifically, the data in the data application 15 is collected into the syntax parser 19 by the SDK.
- the syntax parser 19 can parse the dimension feature, index feature, time granularity feature, and read table name of a piece of Structured Query Language (SQL).
- a piece of SQL is as follows:
- user_type AS user type
- the syntax parser 19 can parse that the dimension feature of the segment of SQL is "user type", the indicator feature is “Pv, Uv”, the time granularity feature is "the last day”, and the read table name is "tbbi.ads_tb_log_1d”.
- the syntax parser 19 can parse the dimensional features, index features, and time granularity features of each data in the data application 15.
- the syntax parser 19 transmits the parsed data to the search engine database 13, which stores not only the data itself but also the dimension features, index features, and time granularity features of the data.
- the search engine database 13 can also store the data in the report 16, the knowledge base platform 17, and the cluster physical table 18.
- the storage process is specifically: splitting each piece of data in the report 16, the knowledge base platform 17, and the cluster physical table 18, extracting the dimension features of each piece of split data, and storing each piece of data in the report 16, the knowledge base platform 17, and the cluster physical table 18, together with its dimension features, in the search engine database 13.
- each data stored in the search engine database 13 has at least a dimensional feature.
- when the search engine database 13 receives the dimension keyword "home improvement category", the index keyword "transaction amount", and the time granularity keyword "last day" sent by the semantic recognition module 12, it can separately find the data matching each of these keywords.
- the search engine database 13 stores data in the data application 15 and dimensional features, index features, and time granularity features of each of the data in the data application 15.
- the search engine database 13 also stores data in the report 16, the knowledge base platform 17, and the cluster physical table 18, as well as the dimensional characteristics of each of the data in the report 16, the knowledge base platform 17, and the cluster physical table 18.
- the dimensional features of the individual data in the search engine database 13 may be the same or different; likewise, the index features of the individual data may be the same or different, and the time granularity features of the individual data may be the same or different.
- the search engine database 13 in this embodiment may match the dimension keyword "home improvement category" recognized by the semantic recognition module 12 with the dimensional features of the data stored therein, and obtain the first data corresponding to the dimensional feature matching the dimension keyword; the first data may be a plurality of data, and may be data derived from the data application 15, the report 16, the knowledge base platform 17, or the cluster physical table 18.
- the search engine database 13 may further match the index keyword “transaction amount” recognized by the semantic recognition module 12 with the index feature of the stored data, and obtain second data corresponding to the indicator feature that matches the index keyword.
- the second data may be a plurality of data originating from the data application 15.
- the search engine database 13 may also match the time granularity keyword "last day" recognized by the semantic recognition module 12 with the time granularity features of the stored data to obtain the third data corresponding to the time granularity feature matching the time granularity keyword; the third data may be a plurality of data originating from the data application 15.
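- by way of illustration only, the three matching operations described above may be sketched in Python as follows. The Record class, the match function, and the sample records are hypothetical and are not part of the embodiment; exact string equality stands in for whatever matching rule the search engine database 13 actually applies.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored data item together with the features attached to it."""
    name: str
    source: str  # e.g. "data application", "report"
    dimensions: set = field(default_factory=set)
    indexes: set = field(default_factory=set)
    granularities: set = field(default_factory=set)


def match(records, dimension_kw, index_kw, granularity_kw):
    """Return (first, second, third) data, one list per keyword type."""
    first = [r for r in records if dimension_kw in r.dimensions]
    second = [r for r in records if index_kw in r.indexes]
    third = [r for r in records if granularity_kw in r.granularities]
    return first, second, third


# Hypothetical contents of the search engine database.
records = [
    Record("home_improvement_sales", "data application",
           {"home improvement category"}, {"transaction amount"}, {"last day"}),
    # Data from a report carries only dimensional features.
    Record("category_overview", "report", {"home improvement category"}),
    Record("traffic_daily", "data application",
           {"user type"}, {"pv", "uv"}, {"last day"}),
]

first, second, third = match(
    records, "home improvement category", "transaction amount", "last day")
```

- in this sketch the first data may come from any data outlet, whereas the second data and the third data can only come from records that carry index features and time granularity features, which is consistent with the description above.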
- Step S204 The query terminal receives the first data, the second data, and the third data that are sent by the search engine database.
- the first data, the second data, and the third data obtained by the search engine database 13 are sent to the query terminal 11, and may be sent to the sequencer 14 in the query terminal 11.
- Step S205 The querying terminal determines, according to the first data, the second data, and the third data, target data that is fed back to the user.
- the sequencer 14 sends the matching data to the display of the query terminal 11.
- the display of the query terminal 11 displays the matching data.
- the sequencer 14 sorts the plurality of matching data according to a preset algorithm and sends the sorted matching data to the display of the query terminal 11, and the display of the query terminal 11 displays the plurality of matching data in sorted order.
- the data in the data application, the report, the knowledge base platform, and the cluster physical table are collected in advance into the search engine database, and the dimension feature, the index feature, and the time granularity feature are added to each collected data.
- the search engine receives the search keyword input by the user and first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword; then, in the pre-established search engine database, it finds the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and the matching data is displayed to the user. The user does not need to traverse each data outlet to search for data; the user only needs to input the search keyword once, and the search engine database can find the data related to the search keyword in all data outlets, which improves the efficiency of finding data.
- the method for the query terminal to acquire the dimension keyword, the index keyword, and the time granularity keyword in the search keyword may specifically include the following steps:
- Step S301 The query terminal performs word segmentation on the search keyword to obtain a plurality of target word segments.
- the search keyword input by the user is "the most recent home improvement category transaction amount".
- the query terminal 11 can also split the search keyword input by the user by using the TF-IDF algorithm to obtain a plurality of target word segments; here the target word segments are "last day", "home improvement category", and "transaction amount".
- Step S302 The querying terminal queries a preset mapping table according to each target word segment, where the mapping table includes a dimension word segmentation, an index word segmentation, and a time granularity word segmentation.
- the query terminal 11 is pre-established with a mapping table, which includes dimension word segments, index word segments, and time granularity word segments; the dimension word segments may be a plurality of word segments having dimensional features, the index word segments may be a plurality of word segments having index features, and the time granularity word segments may be a plurality of word segments having time granularity features.
- the query terminal 11 respectively queries the mapping table, and for each target word segment, it is determined whether there is a word segment matching the target word segment in the mapping table.
- Step S303 The query terminal determines, as the dimension keyword, a target word segment that matches the dimension word segment among the plurality of target word segments.
- if "home improvement category" among the plurality of target word segments matches a dimension word segment in the mapping table, "home improvement category" is used as the dimension keyword in the search keyword.
- Step S304 The query terminal determines, as the index keyword, a target word segment that matches the indicator word segment among the plurality of target word segments.
- if "transaction amount" among the plurality of target word segments matches an index word segment in the mapping table, "transaction amount" is used as the index keyword in the search keyword.
- Step S305 The query terminal determines, as the time granularity keyword, a target word segment that matches the time granularity word segment among the plurality of target word segments.
- if "last day" among the plurality of target word segments matches a time granularity word segment in the mapping table, "last day" is used as the time granularity keyword in the search keyword.
- a plurality of target word segments are obtained by performing word segmentation on the search keyword, and the dimension keyword, the index keyword, and the time granularity keyword among the plurality of target word segments are looked up in the pre-established mapping table, thereby improving the efficiency of determining the dimension keyword, the index keyword, and the time granularity keyword in the search keyword.
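- by way of illustration only, steps S302 to S305 may be sketched in Python as follows, assuming that word segmentation (step S301) has already produced the target word segments. MAPPING_TABLE, its entries, and the classify function are hypothetical names chosen for this sketch.

```python
# Hypothetical mapping table: keyword type -> known word segments of that type.
MAPPING_TABLE = {
    "dimension": {"home improvement category", "user type"},
    "index": {"transaction amount", "pv", "uv"},
    "time_granularity": {"last day", "last 7 days"},
}


def classify(target_segments, mapping_table=MAPPING_TABLE):
    """Assign each target word segment to the keyword type it matches.

    Returns a dict with one entry per keyword type; a type that no
    segment matches stays None. The first matching segment wins.
    """
    result = {kind: None for kind in mapping_table}
    for segment in target_segments:
        for kind, known_segments in mapping_table.items():
            if segment in known_segments and result[kind] is None:
                result[kind] = segment
    return result


keywords = classify(["last day", "home improvement category", "transaction amount"])
```

- each lookup is a set membership test, so the cost of classification grows with the number of target word segments rather than with the size of the mapping table.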
- FIG. 5 is a flowchart of a data processing method according to Embodiment 3 of the present disclosure. As shown in FIG. 5, building on the foregoing embodiments, and on Embodiment 2 in particular, the specific steps of the data processing method provided by this embodiment are as follows:
- Step S401 The querying terminal receives a query request of the user, where the query request includes a search keyword.
- Step S402 The query terminal acquires a dimension keyword, an index keyword, and a time granularity keyword in the search keyword.
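- Step S403 The query terminal sends the dimension keyword, the index keyword, and the time granularity keyword to a search engine database, so that the search engine database obtains first data corresponding to the dimensional feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword.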
- Step S404 The query terminal receives the first data, the second data, and the third data that are sent by the search engine database.
- Steps S401 to S404 are consistent with steps S201-S204, respectively, and the specific method is not described herein again.
- Step S405 The querying terminal determines whether the first data, the second data, and the third data are the same data. If yes, step S406 is performed; otherwise, step S407 is performed.
- Step S406 The querying terminal determines the same data as target data fed back to the user.
- the sequencer 14 sends the matching data to the display of the query terminal 11.
- the display of the query terminal 11 displays the matching data.
- Step S407 The query terminal sorts the first data, the second data, and the third data, and determines the sorted data as target data fed back to the user.
- the sequencer 14 sorts the plurality of matching data according to a preset algorithm and sends the sorted matching data to the display of the query terminal 11, and the display of the query terminal 11 displays the plurality of matching data in sorted order.
- the method for the query terminal to sort the first data, the second data, and the third data may specifically include the following steps:
- Step S51 The query terminal calculates a weight value of each of the first data, the second data, and the third data.
- the weight value of each data can be calculated by the PageRank algorithm.
- Step S52 The query terminal calculates a similarity between each of the first data, the second data, and the third data and the search keyword.
- the CUS-distance algorithm can be used to calculate the similarity between each data and the search keyword input by the user.
- Step S53 The query terminal calculates a sort value of each data according to the weight value and the similarity of each data.
- a value obtained by adding a weight value and a similarity of each data may be used as a sort value of the data.
- Step S54 The querying terminal sorts each of the first data, the second data, and the third data according to the sorting value of each data.
- each of the first data, the second data, and the third data may be sorted in descending order according to a sort value of each data.
- the query terminal may determine, according to the sort value of each data, the data whose sort value is greater than a first threshold among the first data, the second data, and the third data, and may sort the data whose sort value is greater than the first threshold according to the size of the sort value.
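- by way of illustration only, steps S53 and S54 and the optional first threshold may be sketched in Python as follows. The rank function and the numeric weight and similarity values are hypothetical; in the embodiment the weight value would come from step S51 and the similarity from step S52, neither of which is implemented here.

```python
def rank(candidates, first_threshold=None):
    """Sort candidate data by sort value = weight value + similarity.

    candidates: iterable of (name, weight, similarity) tuples.
    first_threshold: if given, only data whose sort value is strictly
    greater than this threshold is kept before sorting.
    Returns a list of (name, sort_value) in descending order.
    """
    scored = [(name, weight + similarity)
              for name, weight, similarity in candidates]
    if first_threshold is not None:
        scored = [item for item in scored if item[1] > first_threshold]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical weight and similarity values for three matching data.
candidates = [
    ("traffic_daily", 0.125, 0.125),
    ("home_improvement_sales", 0.5, 0.25),
    ("category_overview", 0.25, 0.25),
]

ranked = rank(candidates)
filtered = rank(candidates, first_threshold=0.3)
```

- with these values, all three data are returned in descending order of sort value when no threshold is given, and the data with the lowest sort value is dropped when the first threshold is 0.3.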
- the plurality of data found by the search engine database as matching the search keyword are sorted according to the sort value of each data, where the sort value is related to the weight value of the data and to the similarity between the data and the search keyword. The larger the sort value, the stronger the correlation between the data and the search keyword. The sorted data are fed back to the user, so that the user can conveniently view the data most strongly related to the search keyword, which improves the user experience.
- FIG. 6 is a flowchart of a data processing method according to Embodiment 4 of the present disclosure. As shown in FIG. 6, building on the foregoing embodiments, and on Embodiment 2 in particular, the specific steps of the data processing method provided by this embodiment are as follows:
- Step S601 The querying terminal receives a query request of the user, where the query request includes a search keyword.
- Step S602 The query terminal acquires a dimension keyword, an index keyword, and a time granularity keyword in the search keyword.
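- Step S603 The query terminal sends the dimension keyword, the index keyword, and the time granularity keyword to a search engine database, so that the search engine database obtains first data corresponding to the dimensional feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword.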
- Step S604 The query terminal receives the first data, the second data, and the third data that are sent by the search engine database.
- Step S605 The querying terminal determines, according to the first data, the second data, and the third data, target data that is fed back to the user.
- Steps S601 to S605 are respectively consistent with steps S201 to S205, and the specific method is not described herein again.
- Step S606 The query terminal receives a click operation of the user on the target data.
- the sorted target data may be displayed on the query terminal, and the user may click to view the plurality of target data through the query terminal.
- the query terminal can receive the click operation of the target data by the user.
- Step S607 The querying terminal establishes an association relationship between the user and the target data according to the click operation.
- the association relationship includes a degree of association, the degree of association identifying a degree of association of the user with the target data.
- the query terminal establishes an association relationship between the user and the target data according to the click operation generated when the user clicks a certain target data, and may also calculate the degree of association between the user and the clicked target data according to association rules and collaborative filtering rules; the user may click multiple target data.
- Step S608 When the user does not input the search keyword, the query terminal displays the target data according to the association relationship.
- the query terminal 11 can display the target data according to the association relationship between the user and the target data that the user clicked; that is, the query terminal 11 can display to the user the target data that the user has clicked.
- the association relationship includes an association degree, where the association degree identifies a degree of association between the user and the target data.
- the querying terminal displays the target data according to the association relationship, including: the querying terminal displays the target data whose association degree is greater than a second threshold.
- the association relationship between the user and the target data that the user clicked includes the degree of association between the user and the target data, and the query terminal 11 can also display only the clicked target data whose degree of association is greater than the second threshold.
- the target data that the user clicked may be displayed according to the association relationship between the user and the target data, thereby improving the convenience of querying data for the user.
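- by way of illustration only, steps S606 to S608 may be sketched in Python as follows. The ClickHistory class and its method names are hypothetical. The degree of association is assumed here to be the share of a user's clicks that fall on a given target data; this is a simplification chosen for the sketch, and the association rules and collaborative filtering rules mentioned above are not implemented.

```python
from collections import Counter, defaultdict


class ClickHistory:
    """Keeps per-user click counts and derives a degree of association."""

    def __init__(self):
        self._clicks = defaultdict(Counter)  # user -> Counter of target data

    def record_click(self, user, target):
        """Step S607: establish or strengthen the user-target association."""
        self._clicks[user][target] += 1

    def association_degree(self, user, target):
        """Assumed metric: fraction of the user's clicks on this target."""
        total = sum(self._clicks[user].values())
        return self._clicks[user][target] / total if total else 0.0

    def recommend(self, user, second_threshold):
        """Step S608: target data whose degree exceeds the second threshold."""
        degrees = {t: self.association_degree(user, t)
                   for t in self._clicks[user]}
        kept = [(t, d) for t, d in degrees.items() if d > second_threshold]
        return [t for t, _ in sorted(kept, key=lambda p: p[1], reverse=True)]


history = ClickHistory()
for target in ["sales", "sales", "sales", "overview"]:
    history.record_click("seller_a", target)

shown = history.recommend("seller_a", second_threshold=0.5)
```

- when the user opens the query terminal without entering a search keyword, the list returned by recommend is what this sketch would display.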
- FIG. 7 is a flowchart of a data processing method according to Embodiment 5 of the present disclosure. As shown in FIG. 7, the specific steps of the data processing method provided in this embodiment are as follows:
- Step S501 The query terminal receives a query request of the user, where the query request includes a search keyword.
- the query terminal 11 receives the query request of the user 10, and the query request may be made in various manners.
- for example, the user 10 inputs text or voice into the search engine of the query terminal 11, and the text or voice includes the search keyword of the user 10.
- Step S502 The query terminal acquires at least two types of keywords in the search keyword.
- when the query terminal classifies the search keyword requested by the user, the classification need not be limited to the three types of keywords (dimension keyword, index keyword, and time granularity keyword), because not every search keyword requested by a user includes all three types. Therefore, the semantic recognition module 12 corresponding to the query terminal 11 shown in FIG. 1 can also divide the search keyword requested by the user into at least two types of keywords. For example, the user is a seller.
- the search keyword requested by the seller is "Is there a customer to evaluate my product?", from which the verb "evaluation" and the noun "product" can be separated.
- Step S503 The query terminal sends at least two types of keywords to the search engine database, so that the search engine database obtains source data corresponding to the at least two types of keywords respectively.
- the query terminal sends the verb "evaluation" and the noun "product" to the search engine database, which stores the product information of all of the seller's products and the evaluation information of each product.
- the search engine database obtains the product information of all of the seller's products according to "product", where the product information specifically includes a name, a place of origin, a material, and the like, and obtains the evaluation information of all the products according to "evaluation".
- Step S504 The query terminal receives the source data sent by the search engine database.
- the search engine database transmits the product information and the evaluation information to the query terminal. There may be multiple pieces of product information and multiple pieces of evaluation information.
- Step S505 The querying terminal determines, according to the source data, target data that is fed back to the user.
- the query terminal may, according to the number of pieces of evaluation information of each product, feed back to the user the product information of the product with the most evaluation information, and may also feed back to the user the first several pieces of evaluation information of each product.
- the specific implementation manner in which the query terminal determines the target data fed back to the user is not limited.
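- by way of illustration only, the two feedback options mentioned above may be sketched in Python as follows. The pick_feedback function, the first_n parameter, and the sample products and evaluations are hypothetical; since the embodiment does not limit how the target data is determined, this is only one possible choice.

```python
def pick_feedback(products, evaluations, first_n=2):
    """Choose target data from product and evaluation source data.

    products: list of dicts with at least a "name" key.
    evaluations: dict mapping product name -> list of evaluation texts.
    Returns (product with the most evaluations,
             first_n evaluations of every product).
    """
    best = max(products, key=lambda p: len(evaluations.get(p["name"], [])))
    previews = {p["name"]: evaluations.get(p["name"], [])[:first_n]
                for p in products}
    return best, previews


# Hypothetical source data returned by the search engine database.
products = [
    {"name": "oak table", "origin": "place A", "material": "oak"},
    {"name": "desk lamp", "origin": "place B", "material": "steel"},
]
evaluations = {
    "oak table": ["sturdy", "fast delivery", "good value"],
    "desk lamp": ["bright"],
}

best, previews = pick_feedback(products, evaluations)
```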
- the classification results are not limited to dimension keywords, index keywords, and time granularity keywords, which improves the flexibility of search keyword classification, increases the flexibility of the search, and expands the scope of the search.
- FIG. 8 is a flowchart of a data processing method according to Embodiment 6 of the present disclosure. As shown in FIG. 8, the specific steps of the data processing method provided in this embodiment are as follows:
- Step S701 The search engine database receives the dimension keyword, the index keyword, and the time granularity keyword sent by the query terminal.
- the dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from the search keyword included in the query request after the query terminal receives the user's query request.
- the search engine database pre-stores data in the data outlet and feature information of the data, and the feature information includes at least one of the following: a dimensional feature, an index feature, and a time granularity feature.
- the data outlet includes at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table.
- Step S702 The search engine database acquires first data corresponding to the dimensional feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword.
- Step S703 The search engine database sends the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, target data that is fed back to the user.
- the data in the data application, the report, the knowledge base platform, and the cluster physical table are collected in advance into the search engine database, and the dimension feature, the index feature, and the time granularity feature are added to each collected data.
- the search engine receives the search keyword input by the user and first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword; then, in the pre-established search engine database, it finds the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to traverse each data outlet to search for data; the user only needs to input the search keyword once, and the search engine database can find the data related to the search keyword in all data outlets, which improves the efficiency of finding data.
- FIG. 9 is a flowchart of a data processing method according to Embodiment 7 of the present disclosure. As shown in FIG. 9, the specific steps of the data processing method provided in this embodiment are as follows:
- Step S801 The search engine database stores data in the data application, the report, the knowledge base platform, and the cluster physical table.
- the search engine database 13 pre-stores data in the data application, the report, the knowledge base platform, and the cluster physical table before receiving the search keyword input by the user.
- the data in the data application 15 can be stored in the search engine database 13 by the syntax parser 19, and specifically, the data in the data application 15 is collected into the syntax parser 19 by the SDK.
- the syntax parser 19 can parse the dimensional feature, the index feature, the time granularity feature, and the read table name of a Structured Query Language (SQL) statement.
- a piece of SQL is as follows:
- user_type AS user type
- the syntax parser 19 can parse that the dimensional feature of this piece of SQL is "user type", the index feature is "Pv, Uv", the time granularity feature is "the last day", and the read table name is "tbbi.ads_tb_log_1d".
- the syntax parser 19 can parse the dimensional features, index features, and time granularity features of each data in the data application 15.
- the syntax parser 19 transmits the parsed data to the search engine database 13, which stores not only the data itself but also the dimensional features, index features, and time granularity features of the data.
- the search engine database 13 can also store the data in the report 16, the knowledge base platform 17, and the cluster physical table 18.
- the storage process is specifically: splitting each data in the report 16, the knowledge base platform 17, and the cluster physical table 18; extracting the dimensional features of each split data; and storing each data in the report 16, the knowledge base platform 17, and the cluster physical table 18, together with its dimensional features, in the search engine database 13.
- each data stored in the search engine database 13 has at least a dimensional feature.
- Step S802 The search engine database receives the dimension keyword, the index keyword, and the time granularity keyword sent by the query terminal, where the dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from the search keyword included in the query request after the query terminal receives the user's query request.
- the search engine database pre-stores data in the data outlet and feature information of the data, and the feature information includes at least one of the following: a dimensional feature, an index feature, and a time granularity feature.
- the data outlet includes at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table.
- Step S803 The search engine database acquires first data corresponding to the dimensional feature matching the dimension keyword, second data corresponding to the index feature matching the index keyword, and third data corresponding to the time granularity feature matching the time granularity keyword.
- Step S804 The search engine database sends the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, target data that is fed back to the user.
- the principle of the method described in steps S802 to S804 is the same as that of the method described in steps S701 to S703, and details are not described herein again.
- the data in the data application, the report, the knowledge base platform, and the cluster physical table are collected in advance into the search engine database, and the dimension feature, the index feature, and the time granularity feature are added to each collected data.
- the search engine receives the search keyword input by the user and first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword; then, in the pre-established search engine database, it finds the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to traverse each data outlet to search for data; the user only needs to input the search keyword once, and the search engine database can find the data related to the search keyword in all data outlets, which improves the efficiency of finding data.
- the storing, by the search engine database, of the data in the data application, the report, the knowledge base platform, and the cluster physical table may specifically include the following steps S901 and S902:
- Step S901 The search engine database stores data in the data application.
- Step S901 can be implemented by the following steps S11-S13:
- Step S11 The search engine database acquires access logic of the data application accessing the data source.
- the access logic includes the data in the data application, and the data source stores the output logic of the data.
- This embodiment introduces a method of storing the data in the data application in the search engine database; this method differs from the method described in the above embodiment, in which the data in the data application 15 is stored through the syntax parser 19.
- the data application may take the form of a web page and needs to interact frequently with the background data source; the background data source may be a device that stores the data operated on by the data application. Because the data application is developed with the SDK, the SDK has the highest operating authority over the data application, so the SDK can capture the first access logic of the data application accessing the background data source. The first access logic includes the time at which the data application accesses the background data source, the second access logic of the user accessing the data application, and so on. Therefore, the second access logic of the user accessing the data application can be obtained through an existing parsing method.
- the second access logic of the user to the data application includes field information such as the time when the user accesses the data application, the data in the data application currently accessed by the user, and the like. Therefore, through the existing parsing method, the data in the data application currently accessed by the user can be obtained.
- user_type AS user type
- the data in the data application currently accessed by the user is “tbbi.ads_tb_log_1d”, that is, the information after the FROM field.
- the background data source stores the output logic of each data. Therefore, in the background data source, the output logic of the data in the data application currently accessed by the user can be directly searched.
- Step S12 The search engine database determines feature information of data in the data application according to the output logic.
- the output logic includes aggregation object information of the data, indicator information of the participation operation in the aggregation process, and time information of the indicator operation.
- step S12 is specifically: the search engine database determines the aggregation object information of the data as the dimensional feature of the data; the search engine database determines the indicator information participating in the operation during aggregation as the index feature of the data; and the search engine database determines the time granularity feature of the data according to the time information of the indicator operation.
- the output logic of the data in the data application currently accessed by the user is parsed, and the aggregated object information of the current data, the index information of the participating operations in the aggregation process, and the time information of the index calculation are obtained.
- stat_date Group by user_type, stat_date
- the aggregated object information of the data in the data application currently accessed by the user is stat_date, user_type, that is, the information after the Group by field;
- the index information of the participating operation in the aggregation process is se_lpv_pc_1d_001, Se_uv_pc_1d_001, that is, the information after the count(1) and count(distinct uid) fields;
- the time information of the index operation is '20160119', that is, the partition value after the where ds field.
- the time granularity feature is 7.
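- by way of illustration only, the extraction described above may be sketched in Python as follows. The SQL statement is a hypothetical reconstruction assembled from the fragments quoted above and is not the statement of the embodiment; column aliases are written in lower case here. The parse_output_logic function is likewise hypothetical, and regular expressions stand in for whatever parsing method an implementation would actually use. How the time granularity feature is derived from the time information is not shown.

```python
import re

# Hypothetical output logic, assembled from the fragments quoted in the text.
SQL = (
    "SELECT user_type, count(1) AS se_lpv_pc_1d_001, "
    "count(distinct uid) AS se_uv_pc_1d_001, stat_date "
    "FROM tbbi.ads_tb_log_1d WHERE ds = '20160119' "
    "GROUP BY user_type, stat_date"
)


def parse_output_logic(sql):
    """Extract table name, aggregation objects, indicators and time info."""
    # Read table name: the identifier after FROM.
    table = re.search(r"\bFROM\s+([\w.]+)", sql, re.I).group(1)
    # Aggregation object information: the columns after GROUP BY.
    group_by = re.search(r"\bGROUP\s+BY\s+(.+)$", sql, re.I).group(1)
    dimensions = [column.strip() for column in group_by.split(",")]
    # Indicator information: aliases of the count(...) expressions.
    indexes = re.findall(r"count\([^)]*\)\s+AS\s+(\w+)", sql, re.I)
    # Time information: the value compared with the ds column.
    time_info = re.search(r"\bds\s*=\s*'(\d+)'", sql, re.I).group(1)
    return {
        "table": table,
        "dimensions": dimensions,
        "indexes": indexes,
        "time_info": time_info,
    }


info = parse_output_logic(SQL)
```

- a production implementation would more likely rely on a real SQL parser, since regular expressions of this kind do not handle subqueries, comments, or quoted identifiers.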
- Step S13 The search engine database stores data in the data application and feature information of the data.
- dimension features, metric features, and time granularity features are added to the data in the data application currently accessed by the user, and the data after the feature is added is stored in the search engine database.
- through the above steps, the dimensional feature, the index feature, and the time granularity feature of the data in the data application currently accessed by the user can be obtained; these features are added to that data, and the data with the features added is stored in the search engine database.
- in this way, all the data in the data application can be stored in the search engine database, and each piece of data in the search engine database has dimensional features, index features, and time granularity features.
- Step S902 The search engine database stores the data in the report, the knowledge base platform, and the cluster physical table.
- Step S902 can be implemented by the following steps S21-S23:
- Step S21 The search engine database respectively acquires data in the report, the knowledge base platform, and the cluster physical table.
- each of the data in the report, the knowledge base platform, and the cluster physical table may be split by the TF-IDF algorithm.
- Step S22 The search engine database determines, according to a preset algorithm, a dimensional feature of each data in the report, the knowledge base platform, and the cluster physical table.
- the LDA algorithm and a topic model algorithm are used to extract the features of the split data, and the extracted features are used as the dimensional features of the corresponding data.
- Step S23 The search engine database stores the data in the report, the knowledge base platform, and the cluster physical table, and the dimensional features of the data.
- Dimension features are added to each of the data in the report, the knowledge base platform, and the cluster physical table, and the data after adding the dimensional features is stored in a search engine database.
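- by way of illustration only, steps S21 to S23 may be sketched in Python as follows. The embodiment extracts features with the LDA algorithm and a topic model algorithm; this sketch deliberately substitutes a much simpler technique, namely taking the highest-scoring TF-IDF terms of each data as its dimensional features, and splits text on whitespace. The tfidf_top_terms function and the sample texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_top_terms(documents, top_k=2):
    """Return the top_k TF-IDF terms of each document as its features.

    documents: dict mapping a data name to its text.
    Ties are broken alphabetically so the result is deterministic.
    """
    tokenized = {name: text.lower().split() for name, text in documents.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized.values()
                       for term in set(tokens))
    features = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        scores = {term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                  for term, count in counts.items()}
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        features[name] = ranked[:top_k]
    return features


# Hypothetical text taken from a report, a knowledge base page and a table.
documents = {
    "report_16": "home improvement sales home improvement orders",
    "knowledge_base_17": "user type definitions user guide",
    "cluster_table_18": "sales orders log",
}

dimension_features = tfidf_top_terms(documents)
```

- the resulting terms would then be stored alongside each data in the search engine database as its dimensional features, as described in step S23.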
- all data in the data application is stored in the search engine database, and each data stored in the search engine database from the data application is associated with dimensional features, index features, and time granularity features; in addition, the search engine database stores all the data in the report, the knowledge base platform, and the cluster physical table, and each data stored in the search engine database from the report, the knowledge base platform, and the cluster physical table is associated with dimensional features.
- FIG. 11 is a flowchart of a data processing method according to Embodiment 9 of the present disclosure. As shown in FIG. 11, the data processing method provided in this embodiment may include the following steps:
- Step S1001: The search engine database acquires first data in the data application, and the dimension feature, index feature, and time granularity feature of the first data.
- Step S1001 may be implemented in either of the following two ways:
- In the first way, the search engine database receives the first data sent by the parser, together with the dimension feature, index feature, and time granularity feature of the first data, where the parser is used to collect the first data in the data application and to parse the dimension feature, index feature, and time granularity feature of the first data.
- The data in the data application 15 can be stored in the search engine database 13 by way of the syntax parser 19; specifically, the data in the data application 15 is collected into the syntax parser 19 by the SDK.
- The syntax parser 19 can parse the dimension feature, index feature, time granularity feature, and read table name out of a Structured Query Language (SQL) statement.
- For example, a piece of SQL is as follows:
- user_type AS user type
- The syntax parser 19 can determine that, for this piece of SQL, the dimension feature is "user type", the index feature is "Pv, Uv", the time granularity feature is "the last day", and the read table name is "tbbi.ads_tb_log_1d".
- In the same way, the syntax parser 19 can parse the dimension feature, index feature, and time granularity feature of each piece of data in the data application 15.
- The syntax parser 19 transmits the parsed data to the search engine database 13, which stores not only the data itself but also the dimension feature, index feature, and time granularity feature of the data.
- The second way includes the following steps S31-S32:
- Step S31: The search engine database acquires the access logic with which the data application accesses a data source, where the access logic includes the first data in the data application, and the data source stores the output logic of the first data.
- The data application may take the form of a web page and needs to interact frequently with a background data source; the background data source may be a device that stores the operating data of the data application. Because the data application is developed with the SDK, the SDK has the highest operating permission for the data application, so the first access logic of the data application to the background data source can be captured through the SDK. The first access logic includes the time at which the data application accesses the background data source, the second access logic of the user to the data application, and so on. The second access logic of the user to the data application can therefore be obtained through an existing parsing method.
- The second access logic of the user to the data application includes field information such as the time at which the user accesses the data application and the data in the data application currently accessed by the user. Through an existing parsing method, the data in the data application currently accessed by the user can therefore be obtained.
- user_type AS user type
- The data in the data application currently accessed by the user is "tbbi.ads_tb_log_1d", that is, the information after the FROM field.
- The background data source stores the output logic of each piece of data. Therefore, the output logic of the data in the data application currently accessed by the user can be looked up directly in the background data source.
- The output logic includes aggregation object information of the first data, indicator information that participates in the operation during aggregation, and time information of the indicator operation.
- The search engine database determines the aggregation object information of the first data as the dimension feature of the first data; the search engine database determines the indicator information of the first data that participates in the operation during aggregation as the index feature of the first data; and the search engine database determines the time granularity feature of the first data according to the time information of the indicator operation.
- Step S32: The search engine database determines, according to the output logic, feature information of the first data in the data application, where the feature information includes a dimension feature, an index feature, and a time granularity feature.
- stat_date Group by user_type, stat_date
- The aggregation object information of the data in the data application currently accessed by the user is stat_date, user_type, that is, the information after the Group by field;
- the indicator information that participates in the operation during aggregation is se_lpv_pc_1d_001, se_uv_pc_1d_001, that is, the information after the count(1) and count(distinct uid) fields;
- the time information of the indicator operation is '20160119', that is, the partition value after the where ds field.
- the time granularity feature is 7.
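The rules of steps S31-S32 can be sketched as follows. The SQL statement below is a hypothetical reconstruction pieced together from the fragments quoted in the text (the original statement is not preserved in full), and the regular expressions are a deliberately naive stand-in for a real SQL parser. The function name `features_from_output_logic` is an assumption. Because the text does not state how the time information is mapped to a time granularity value, the sketch stops at extracting the raw time information.

```python
import re

# Hypothetical output logic, assembled from the fragments quoted in the text.
OUTPUT_LOGIC = """
SELECT stat_date, user_type,
       count(1) AS se_lpv_pc_1d_001,
       count(distinct uid) AS se_uv_pc_1d_001
FROM tbbi.ads_tb_log_1d
WHERE ds = '20160119'
GROUP BY user_type, stat_date
"""


def features_from_output_logic(sql):
    """Derive feature information from output logic with simple regexes."""
    flat = " ".join(sql.split())  # collapse newlines and repeated spaces
    table = re.search(r"\bfrom\s+([\w.]+)", flat, re.IGNORECASE)
    group_by = re.search(r"\bgroup by (.+)$", flat, re.IGNORECASE)
    ds = re.search(r"\bds\s*=\s*'(\d+)'", flat, re.IGNORECASE)
    # Aliases that follow an aggregate call such as count(...).
    indicators = re.findall(r"count\([^)]*\)\s+as\s+(\w+)", flat, re.IGNORECASE)
    return {
        # Information after FROM: the data currently accessed.
        "read_table": table.group(1) if table else None,
        # Information after GROUP BY: aggregation objects -> dimension feature.
        "dimension": [c.strip() for c in group_by.group(1).split(",")] if group_by else [],
        # Aggregate aliases: indicators in the aggregation -> index feature.
        "index": indicators,
        # Partition value after WHERE ds: time information of the operation.
        "time_info": ds.group(1) if ds else None,
    }


feature_info = features_from_output_logic(OUTPUT_LOGIC)
```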
- Step S1002: The search engine database acquires second data in the report, the knowledge base platform, and the cluster physical table, respectively, and the dimension feature of the second data.
- The search engine database acquires the second data in the report, the knowledge base platform, and the cluster physical table, respectively; and the search engine database determines, according to a preset algorithm, the dimension feature of each piece of second data in the report, the knowledge base platform, and the cluster physical table.
- Each piece of data in the report, the knowledge base platform, and the cluster physical table may be split by a TF-IDF algorithm.
- For example, the LDA algorithm and a topic model algorithm are used to extract features from the split data, and the extracted features are used as the dimension features of the corresponding data.
- Step S1003: The search engine database stores the first data, and the dimension feature, index feature, and time granularity feature of the first data.
- Step S1004: The search engine database stores the second data, and the dimension feature of the second data.
- A dimension feature is added to each piece of data in the report, the knowledge base platform, and the cluster physical table, and the data with the dimension feature added is stored in the search engine database.
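The difference between steps S1003 and S1004 is which features accompany each stored record. A minimal sketch of one possible record layout follows; the class names, field names, and sample values are hypothetical, and an in-memory list stands in for the real storage.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One piece of data plus the feature information stored with it."""
    source: str  # data exit the record came from
    data: str
    dimension: list = field(default_factory=list)
    index: list = field(default_factory=list)
    time_granularity: list = field(default_factory=list)


class SearchEngineDatabase:
    def __init__(self):
        self.records = []

    def store_first_data(self, data, dimension, index, time_granularity):
        # Step S1003: data-application data carries all three feature types.
        self.records.append(
            Record("data_application", data, dimension, index, time_granularity)
        )

    def store_second_data(self, source, data, dimension):
        # Step S1004: report / knowledge base / physical table data
        # carries only dimension features.
        self.records.append(Record(source, data, dimension))


db = SearchEngineDatabase()
db.store_first_data("tbbi.ads_tb_log_1d", ["user type"], ["Pv", "Uv"], ["the last day"])
db.store_second_data("report", "payroll report", ["salary"])
```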
- The data in the data application, the report, the knowledge base platform, and the cluster physical table is collected in advance into the search engine database, and a dimension feature, an index feature, and a time granularity feature are added to each piece of collected data.
- When the search engine receives the search keyword entered by the user, it first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword; it then searches the pre-established search engine database for the data matching the dimension keyword, the index keyword, and the time granularity keyword, respectively, and displays the matched data to the user. The user does not need to traverse each data exit to find data; the search keyword only needs to be entered once for the search engine database to find the data related to that keyword across all data exits, which improves the efficiency of finding data.
- FIG. 12 is a schematic structural diagram of a query terminal according to Embodiment 1 of the present disclosure. As shown in FIG. 12, the query terminal includes: a receiving unit, a processing unit, and a sending unit.
- the receiving unit is configured to receive a query request of a user, where the query request includes a search keyword.
- The processing unit is coupled to the receiving unit and is configured to acquire a dimension keyword, an index keyword, and a time granularity keyword from the search keyword.
- The sending unit is coupled to the processing unit and is configured to send the dimension keyword, the index keyword, and the time granularity keyword to a search engine database, so that the search engine database acquires first data corresponding to the dimension feature matched by the dimension keyword, second data corresponding to the index feature matched by the index keyword, and third data corresponding to the time granularity feature matched by the time granularity keyword.
- The search engine database pre-stores data in the data exits and feature information of the data; the data exits include at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table; and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature.
- the receiving unit is further configured to receive the first data, the second data, and the third data that are sent by the search engine database.
- the processing unit is further configured to determine target data that is fed back to the user according to the first data, the second data, and the third data.
- The data in the data application, the report, the knowledge base platform, and the cluster physical table is collected in advance into the search engine database, and a dimension feature, an index feature, and a time granularity feature are added to each piece of collected data.
- When the search engine receives the search keyword entered by the user, it first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword; it then searches the pre-established search engine database for the data matching the dimension keyword, the index keyword, and the time granularity keyword, respectively, and displays the matched data to the user. The user does not need to traverse each data exit to find data; the search keyword only needs to be entered once for the search engine database to find the data related to that keyword across all data exits, which improves the efficiency of finding data.
- The processing unit is specifically configured to: perform word segmentation on the search keyword to obtain a plurality of target word segments; query a preset mapping table according to each target word segment, where the mapping table includes dimension word segments, index word segments, and time granularity word segments; determine a target word segment that matches a dimension word segment as the dimension keyword; determine a target word segment that matches an index word segment as the index keyword; and determine a target word segment that matches a time granularity word segment as the time granularity keyword.
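The segmentation and lookup described above might look like the following sketch. The mapping table entries are hypothetical, and longest-match segmentation over whitespace-separated words is an assumption; the text does not say which word segmentation method is used.

```python
# Hypothetical preset mapping table: word segment -> keyword type.
MAPPING_TABLE = {
    "home improvement": "dimension",
    "category": "dimension",
    "transaction amount": "index",
    "uv": "index",
    "yesterday": "time_granularity",
    "last 7 days": "time_granularity",
}


def split_search_keyword(search_keyword, mapping_table=MAPPING_TABLE):
    """Segment the search keyword by longest match against the mapping table
    and group the resulting target word segments by keyword type."""
    result = {"dimension": [], "index": [], "time_granularity": []}
    words = search_keyword.lower().split()
    max_len = max((len(entry.split()) for entry in mapping_table), default=1)
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in mapping_table:
                result[mapping_table[phrase]].append(phrase)
                i += n
                break
        else:
            i += 1  # no mapping entry starts here; skip this word
    return result


keywords = split_search_keyword("yesterday home improvement transaction amount")
```

Here `keywords` groups "home improvement" under the dimension keywords, "transaction amount" under the index keywords, and "yesterday" under the time granularity keywords.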
- The processing unit is specifically configured to determine whether the first data, the second data, and the third data are the same data; if the first data, the second data, and the third data are the same data, the processing unit determines that data as the target data fed back to the user; if the first data, the second data, and the third data are not the same data, the processing unit sorts the first data, the second data, and the third data, and determines the sorted data as the target data fed back to the user.
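A possible reading of this decision is sketched below. The text does not state the sort criterion, so ranking each item by how many of the three result lists contain it is purely an assumption, as are the function name and the sample identifiers.

```python
from collections import Counter


def determine_target_data(first, second, third):
    """Merge the three result lists into the target data fed back to the user.

    Assumption: when the lists differ, items are ranked by the number of
    result lists (keyword types) that contain them.
    """
    if set(first) == set(second) == set(third):
        # All three lookups returned the same data: feed it back directly.
        return sorted(set(first))
    hits = Counter()
    for result in (first, second, third):
        hits.update(set(result))
    # More matching keyword types first; ties broken by name for determinism.
    return sorted(hits, key=lambda item: (-hits[item], item))


same = determine_target_data(["t1"], ["t1"], ["t1"])
ranked = determine_target_data(["t1", "t2"], ["t1"], ["t1", "t3"])
```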
- A plurality of target word segments is obtained by performing word segmentation on the search keyword, and the dimension keyword, index keyword, and time granularity keyword among the target word segments are looked up in the pre-established mapping table, which improves the efficiency of determining the dimension keyword, index keyword, and time granularity keyword in the search keyword.
- FIG. 13 is a schematic structural diagram of a query terminal according to Embodiment 2 of the present disclosure. As shown in FIG. 13, the query terminal further includes: a display.
- the receiving unit is further configured to receive a click operation of the target data by the user.
- the processing unit is further configured to establish an association relationship between the user and the target data according to the click operation.
- The display is coupled to the processing unit; when the user has not entered a search keyword, the display shows the target data associated with the user through the association relationship.
- In this way, the target data that the user has clicked can be displayed according to the association relationship between the user and the target data, which makes it more convenient for the user to query data.
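The click association can be sketched as a small per-user store. The class and method names are hypothetical, and showing the most recently clicked item first is an assumption not stated in the text.

```python
from collections import defaultdict


class ClickAssociations:
    """Remembers which target data each user clicked (hypothetical helper)."""

    def __init__(self):
        self._clicked = defaultdict(list)

    def record_click(self, user_id, target_data):
        # Establish the association; keep the most recent click first.
        items = self._clicked[user_id]
        if target_data in items:
            items.remove(target_data)
        items.insert(0, target_data)

    def default_display(self, user_id, search_keyword=""):
        # Associated data is shown only when no search keyword was entered.
        if search_keyword.strip():
            return []
        return list(self._clicked.get(user_id, []))


associations = ClickAssociations()
associations.record_click("u1", "report_a")
associations.record_click("u1", "table_b")
associations.record_click("u1", "report_a")
```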
- The query terminal 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as an application.
- An application stored in memory 1932 can include one or more modules each corresponding to a set of instructions.
- the processing component 1922 is configured to execute instructions to perform the methods of steps S201-S1004 described above.
- The query terminal 1900 can also include a power supply component 1926 configured to perform power management of the query terminal 1900, a wired or wireless network interface 1950 configured to connect the query terminal 1900 to a network, and an input/output (I/O) interface 1958.
- The query terminal 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
- FIG. 15 is a schematic structural diagram of a search engine database according to an embodiment of the present disclosure.
- the search engine database includes: a receiver, a memory, a processor, and a transmitter.
- The receiver is configured to receive a dimension keyword, an index keyword, and a time granularity keyword sent by the query terminal, where the dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from the search keyword included in a query request received from the user.
- The memory is configured to store data in the data exits and feature information of the data, where the data exits include at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table, and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature.
- the processor is coupled to the receiver and the memory, and is configured to acquire first data corresponding to the dimensional feature matched by the dimension keyword, and second data corresponding to the index feature matched by the index keyword And third data corresponding to the time granularity feature matched by the time granularity keyword.
- The transmitter is coupled to the processor and is configured to send the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, the target data that is fed back to the user.
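The processor's three lookups can be illustrated as follows. This is a sketch under assumptions: records are plain dictionaries with optional feature lists, matching is exact string equality, and the function name and sample records are hypothetical.

```python
def match_keywords(records, dimension_kw, index_kw, time_kw):
    """Return (first, second, third) data for the three keyword types.

    Assumption: each record is a dict with a "data" entry and optional
    "dimension", "index", and "time_granularity" feature lists.
    """
    def lookup(feature_name, keyword):
        return [r["data"] for r in records if keyword in r.get(feature_name, [])]

    first = lookup("dimension", dimension_kw)       # matched by dimension feature
    second = lookup("index", index_kw)              # matched by index feature
    third = lookup("time_granularity", time_kw)     # matched by time granularity
    return first, second, third


records = [
    {
        "data": "tbbi.ads_tb_log_1d",
        "dimension": ["user type"],
        "index": ["Pv", "Uv"],
        "time_granularity": ["the last day"],
    },
    # Data from a report carries only a dimension feature.
    {"data": "user report", "dimension": ["user type"]},
]
first, second, third = match_keywords(records, "user type", "Uv", "the last day")
```

Because report, knowledge base, and physical table records carry only dimension features, they can appear in the first data but never in the second or third.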
- The data in the data application, the report, the knowledge base platform, and the cluster physical table is collected in advance into the search engine database, and a dimension feature, an index feature, and a time granularity feature are added to each piece of collected data.
- When the search engine receives the search keyword entered by the user, it first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword; it then searches the pre-established search engine database for the data matching the dimension keyword, the index keyword, and the time granularity keyword, respectively, and displays the matched data to the user. The user does not need to traverse each data exit to find data; the search keyword only needs to be entered once for the search engine database to find the data related to that keyword across all data exits, which improves the efficiency of finding data.
- The processor is specifically configured to: acquire the access logic with which the data application accesses a data source, where the access logic includes data in the data application, and the data source stores the output logic of the data; determine feature information of the data in the data application according to the output logic; and store the data in the data application and the feature information of the data in the memory.
- The receiver is further configured to receive data sent by a parser, together with the dimension feature, index feature, and time granularity feature of the data, where the parser is used to collect the data in the data application and to parse the dimension feature, index feature, and time granularity feature of the data.
- The processor is specifically configured to acquire the data in the report, the knowledge base platform, and the cluster physical table, respectively, and to determine, according to a preset algorithm, the dimension feature of each piece of data in the report, the knowledge base platform, and the cluster physical table.
- All data in the data application is stored in the search engine database, and each piece of data stored from the data application is associated with a dimension feature, an index feature, and a time granularity feature; in addition, the search engine database stores all the data in the report, the knowledge base platform, and the cluster physical table, and each piece of data stored from the report, the knowledge base platform, and the cluster physical table is associated with a dimension feature.
Description
This application claims priority to Chinese Patent Application No. 201610657498.8, filed on August 11, 2016 and entitled "Data Processing Method, Device and System", the entire contents of which are incorporated herein by reference.
The present disclosure relates to Internet technologies, and in particular, to a data processing method, device, and system.
With the rapid development of the Internet, data is growing explosively. At present, every company with big data assets stores a large volume of data. Such companies generally present the massive data they store to all of their employees through four kinds of data exits. A data exit is a storage space that stores data, or a software application capable of generating data, where the storage space or software application can serve as a data source for a database. The four data exits are the data application exit (for example, Alibaba's Taobao Shengyijing and Baidu's Baidu Index), the report exit (for example, a company's payroll report), the knowledge base platform exit (for example, Baidu's Baidu Baike), and the cluster physical table exit (for example, the personal information of a company's users).
A non-technical employee of such a company generally has to search the four data exits one after another to obtain the required data. For example, a non-technical employee who needs the company's "transaction amount of home improvement products on a given day" must search the company's data application exit, report exit, knowledge base platform exit, and cluster physical table exit in turn until that figure is found.
Because in practice each of these data exits presents a large volume of data, having non-technical employees search each data exit in turn inevitably makes data queries inefficient.
Summary of the Invention
The present disclosure provides a data processing method, device, and system to improve the efficiency of finding data.
In one aspect, the present disclosure provides a data processing system, including a query terminal and a search engine database;
the query terminal is configured to receive a query request from a user, where the query request includes a search keyword; the query terminal acquires a dimension keyword, an index keyword, and a time granularity keyword from the search keyword, and sends the dimension keyword, the index keyword, and the time granularity keyword to the search engine database;
the search engine database pre-stores data in the data exits and feature information of the data, where the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
the search engine database is configured to acquire first data corresponding to the dimension feature matched by the dimension keyword, second data corresponding to the index feature matched by the index keyword, and third data corresponding to the time granularity feature matched by the time granularity keyword, and to send the first data, the second data, and the third data to the query terminal;
the query terminal is further configured to determine, according to the first data, the second data, and the third data, target data that is fed back to the user, and to display the target data to the user.
In another aspect, the present disclosure provides a data processing method, including:
a query terminal receives a query request from a user, where the query request includes a search keyword;
the query terminal acquires a dimension keyword, an index keyword, and a time granularity keyword from the search keyword;
the query terminal sends the dimension keyword, the index keyword, and the time granularity keyword to a search engine database, so that the search engine database acquires first data corresponding to the dimension feature matched by the dimension keyword, second data corresponding to the index feature matched by the index keyword, and third data corresponding to the time granularity feature matched by the time granularity keyword, where the search engine database pre-stores data in the data exits and feature information of the data, and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
the query terminal receives the first data, the second data, and the third data sent by the search engine database;
the query terminal determines, according to the first data, the second data, and the third data, target data that is fed back to the user.
In another aspect, the present disclosure provides a data processing method, including:
a query terminal receives a query request from a user, where the query request includes a search keyword;
the query terminal acquires at least two types of keywords from the search keyword;
the query terminal sends the at least two types of keywords to a search engine database, so that the search engine database acquires source data corresponding to each of the at least two types of keywords;
the query terminal receives the source data sent by the search engine database;
the query terminal determines, according to the source data, target data that is fed back to the user.
In another aspect, the present disclosure provides a data processing method, including:
a search engine database receives a dimension keyword, an index keyword, and a time granularity keyword sent by a query terminal, where the dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from the search keyword included in a query request received from a user;
the search engine database pre-stores data in the data exits and feature information of the data, where the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
the search engine database acquires first data corresponding to the dimension feature matched by the dimension keyword, second data corresponding to the index feature matched by the index keyword, and third data corresponding to the time granularity feature matched by the time granularity keyword;
the search engine database sends the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, target data that is fed back to the user.
In still another aspect, the present disclosure provides a data processing method, including:
a search engine database acquires first data in a data application, and the dimension feature, index feature, and time granularity feature of the first data;
the search engine database acquires second data in a report, a knowledge base platform, and a cluster physical table, respectively, and the dimension feature of the second data;
the search engine database stores the first data, and the dimension feature, index feature, and time granularity feature of the first data;
the search engine database stores the second data and the dimension feature of the second data.
In another aspect, the present disclosure provides a query terminal, including: a receiving unit, a processing unit, and a sending unit;
the receiving unit is configured to receive a query request from a user, where the query request includes a search keyword;
the processing unit is coupled to the receiving unit and is configured to acquire a dimension keyword, an index keyword, and a time granularity keyword from the search keyword;
the sending unit is coupled to the processing unit and is configured to send the dimension keyword, the index keyword, and the time granularity keyword to a search engine database, so that the search engine database acquires first data corresponding to the dimension feature matched by the dimension keyword, second data corresponding to the index feature matched by the index keyword, and third data corresponding to the time granularity feature matched by the time granularity keyword, where the search engine database pre-stores data in the data exits and feature information of the data, the data exits include at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table, and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
the receiving unit is further configured to receive the first data, the second data, and the third data sent by the search engine database;
the processing unit is further configured to determine, according to the first data, the second data, and the third data, target data that is fed back to the user.
In yet another aspect, the present disclosure provides a search engine database, including: a receiver, a memory, a processor, and a transmitter;
the receiver is configured to receive a dimension keyword, an index keyword, and a time granularity keyword sent by a query terminal, where the dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from the search keyword included in a query request received from a user;
the memory is configured to store data in the data exits and feature information of the data, where the data exits include at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table, and the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature;
the processor is coupled to the receiver and the memory and is configured to acquire first data corresponding to the dimension feature matched by the dimension keyword, second data corresponding to the index feature matched by the index keyword, and third data corresponding to the time granularity feature matched by the time granularity keyword;
the transmitter is coupled to the processor and is configured to send the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, target data that is fed back to the user.
In the present disclosure, the data in the data application, the report, the knowledge base platform, and the cluster physical table is collected in advance into the search engine database, and at least one of a dimension feature, an index feature, and a time granularity feature is added to each piece of collected data. When the search engine receives the search keyword entered by the user, it first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword; it then searches the pre-established search engine database for the data matching the dimension keyword, the index keyword, and the time granularity keyword, respectively, and displays the matched data to the user. The user does not need to traverse each data exit to find data; the search keyword only needs to be entered once for the search engine database to find the data related to that keyword across all data exits, which improves the efficiency of finding data.
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an optional application scenario of the present disclosure;
FIG. 2 is a schematic structural diagram of the data processing system provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of the data processing method provided by Embodiment 1 of the present disclosure;
FIG. 4 is a flowchart of the data processing method provided by Embodiment 2 of the present disclosure;
FIG. 5 is a flowchart of the data processing method provided by Embodiment 3 of the present disclosure;
FIG. 6 is a flowchart of the data processing method provided by Embodiment 4 of the present disclosure;
FIG. 7 is a flowchart of the data processing method provided by Embodiment 5 of the present disclosure;
FIG. 8 is a flowchart of the data processing method provided by Embodiment 6 of the present disclosure;
FIG. 9 is a flowchart of the data processing method provided by Embodiment 7 of the present disclosure;
FIG. 10 is a flowchart of the data processing method provided by Embodiment 8 of the present disclosure;
FIG. 11 is a flowchart of the data processing method provided by Embodiment 9 of the present disclosure;
FIG. 12 is a schematic structural diagram of the query terminal provided by Embodiment 1 of the present disclosure;
FIG. 13 is a schematic structural diagram of the query terminal provided by Embodiment 2 of the present disclosure;
FIG. 14 is a schematic structural diagram of the query terminal provided by Embodiment 3 of the present disclosure;
FIG. 15 is a schematic structural diagram of the search engine database provided by an embodiment of the present disclosure.
Exemplary embodiments are described in detail here, and examples of them are shown in the drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
In the prior art, when a non-technical employee of a company needs to obtain the company's "transaction amount of home improvement products on a given day", the employee has to search the company's data application outlet, report outlet, knowledge base platform outlet, and cluster physical table outlet in turn until the figure is found, which lowers the efficiency of data lookup. To address this problem, this disclosure proposes a data processing method, whose specific process is now described with reference to FIG. 1.
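As shown in FIG. 1, a user 10 queries data through a query terminal 11. The user 10 may be a non-technical employee of a company or a consumer; the query terminal 11 may be a terminal device in the company to which the user 10 belongs, or the user's personal computer, laptop, or similar device. A search engine is installed on the query terminal 11, and the user 10 can enter a search keyword in the search box of the search engine using the keyboard of the query terminal 11. For example, the search keyword is "most recent day home improvement category transaction amount". The semantic recognition module 12 splits this search keyword into a dimension keyword, an indicator keyword, and a time granularity keyword as used in the big data field: the dimension keyword is "home improvement category", the indicator keyword is "transaction amount", and the time granularity keyword is "most recent day". How the semantic recognition module 12 performs this split is described in detail in the embodiments below.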
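The semantic recognition module 12 sends the dimension keyword "home improvement category", the indicator keyword "transaction amount", and the time granularity keyword "most recent day" to the search engine database 13. The data sources of the search engine database 13 include a data application 15, a report 16, a knowledge base platform 17, and a cluster physical table 18. The data application 15 may specifically be a data product, such as Alibaba's Taobao Shengyijing or Baidu's Baidu Index. A data product is a web product in the form of web pages; its main difference from an ordinary web product is that it carries a large amount of data and needs to interact frequently with a back-end data source, which is the device storing the data that the data application 15 can operate on. In this embodiment, the data in the data application 15 and the report 16 can be stored in the search engine database 13 via a syntax parser 19. Taking the data application 15 as an example: because it is developed with a software development kit (SDK), the SDK can be used to collect the data in the data application 15 into the syntax parser 19. The syntax parser 19 can parse out the dimension feature, indicator feature, time granularity feature, and name of the table read from a Structured Query Language (SQL) statement. For example, consider the following SQL statement: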
SELECT stat_date AS 日期 -- alias means "date"
,user_type AS 用户类型 -- alias means "user type"
,se_lpv_pc_1d_001 AS Pv
,se_uv_pc_1d_001 AS Uv
FROM tbbi.ads_tb_log_1d
WHERE ds='20151026'
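The syntax parser 19 can determine that, for this SQL statement, the dimension feature is "user type", the indicator features are "Pv" and "Uv", the time granularity feature is "most recent day", and the table read is "tbbi.ads_tb_log_1d". In this way, the syntax parser 19 can parse out the dimension feature, indicator feature, and time granularity feature of each piece of data in the data application 15 and the report 16.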
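The source does not state the rules the syntax parser 19 applies, so the following Python sketch is only an illustration under assumed conventions: a column whose name carries a tag such as `_1d_` is treated as an indicator with daily granularity, `stat_date` is treated as a plain date column, and every other column is treated as a dimension. The function name `parse_sql_features`, the constants, and the English aliases are hypothetical.

```python
import re

# Hypothetical conventions, not stated in the source: a column whose name
# contains a tag such as "_1d_" is an indicator with that time granularity,
# "stat_date" is a plain date column, and every other column is a dimension.
GRANULARITY_LABELS = {"1d": "most recent day"}
DATE_COLUMNS = {"stat_date"}


def parse_sql_features(sql: str) -> dict:
    """Return the features and table name found in a simple SELECT statement."""
    select_list = re.search(r"SELECT\s+(.*?)\s+FROM\s", sql, re.I | re.S).group(1)
    table = re.search(r"FROM\s+([\w.]+)", sql, re.I).group(1)
    dimensions, indicators, granularities = [], [], set()
    for item in select_list.split(","):
        # Each select item has the form "<column> AS <alias>".
        column, alias = re.split(r"\s+AS\s+", item.strip(), flags=re.I)
        tag = re.search(r"_(\d+[dwm])_", column)
        if tag:
            indicators.append(alias)
            granularities.add(GRANULARITY_LABELS.get(tag.group(1), tag.group(1)))
        elif column not in DATE_COLUMNS:
            dimensions.append(alias)
    return {
        "dimensions": dimensions,
        "indicators": indicators,
        "time_granularities": sorted(granularities),
        "table": table,
    }


# The example statement, with the aliases written in English for readability.
SQL = """SELECT stat_date AS date
,user_type AS user_type
,se_lpv_pc_1d_001 AS Pv
,se_uv_pc_1d_001 AS Uv
FROM tbbi.ads_tb_log_1d
WHERE ds='20151026'"""

features = parse_sql_features(SQL)
```

A regular expression is enough for this single-table statement; a real parser would need a proper SQL grammar to handle joins, subqueries, and expressions in the select list.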
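The syntax parser 19 sends the parsed data to the search engine database 13, which stores not only the data itself but also its dimension feature, indicator feature, and time granularity feature. In addition, the search engine database 13 may also store data from the knowledge base platform 17 and the cluster physical table 18. The storage procedure is as follows: each piece of data in the knowledge base platform 17 and the cluster physical table 18 is split, the dimension feature of each piece of split data is extracted, and each piece of data, together with its dimension feature, is stored in the search engine database 13. As a result, every piece of data stored in the search engine database 13 has at least a dimension feature.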
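When the search engine database 13 receives the dimension keyword "home improvement category", the indicator keyword "transaction amount", and the time granularity keyword "most recent day" from the semantic recognition module 12, it looks up the data matching each of the three keywords and sends the matching data to the sorter 14. If only one piece of matching data is found, the sorter 14 sends it to the query terminal 11, which displays it. If several pieces of matching data are found, the sorter 14 sorts them according to a preset algorithm and sends the sorted data to the query terminal 11, which displays them in sorted order. In this embodiment, the preset algorithm used by the sorter 14 includes at least one of the following: the PageRank algorithm, the CUS-distance algorithm, the Latent Dirichlet Allocation (LDA) topic model algorithm, and the breadth-first search (BFS) algorithm.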
In this embodiment, data from data applications, reports, the knowledge base platform, and cluster physical tables is collected in advance into the search engine database, and at least one of a dimension feature, an indicator feature, and a time granularity feature is added to each piece of collected data. When the search engine receives a search keyword entered by the user, it first splits the search keyword into a dimension keyword, an indicator keyword, and a time granularity keyword. It then looks up, in the pre-built search engine database, the data matching the dimension keyword, the indicator keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to go through each data outlet in turn: entering the search keyword once is enough for the search engine database to find the data related to that keyword across all data outlets, which improves the efficiency of data lookup.
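FIG. 2 is a schematic structural diagram of the data processing system provided by an embodiment of the present disclosure. As shown in FIG. 2, the data processing system includes a query terminal 1 and a search engine database 2. The query terminal 1 is configured to receive a user's query request, the query request including a search keyword; the query terminal obtains the dimension keyword, the indicator keyword, and the time granularity keyword in the search keyword, and sends them to the search engine database.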
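As shown in FIG. 1, the query terminal 11 receives a query request from the user 10. The request can take several forms; for example, the user 10 enters text or speech into the search engine on the query terminal 11, and the text or speech contains the keyword that the user 10 wants to search for. As shown in FIG. 1, the semantic recognition module 12 and the sorter 14 may be modules of the query terminal 11. The semantic recognition module 12 splits the search keyword into a dimension keyword, an indicator keyword, and a time granularity keyword as used in the big data field: the dimension keyword is "home improvement category", the indicator keyword is "transaction amount", and the time granularity keyword is "most recent day". The semantic recognition module 12 also sends these three keywords to the search engine database 2.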
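The search engine database 2 stores in advance the data from the data outlets together with feature information of that data, the feature information including at least one of the following: a dimension feature, an indicator feature, and a time granularity feature.
Optionally, in this embodiment, the data outlets include data applications, reports, the knowledge base platform, and cluster physical tables. The search engine database 13 stores the data from these outlets together with the feature information of each piece of data. Each piece of data from a data application has a dimension feature, an indicator feature, and a time granularity feature; the data from reports, the knowledge base platform, and cluster physical tables all have a dimension feature.
The search engine database 2 is configured to acquire first data corresponding to a dimension feature that matches the dimension keyword, second data corresponding to an indicator feature that matches the indicator keyword, and third data corresponding to a time granularity feature that matches the time granularity keyword, and to send the first data, the second data, and the third data to the query terminal.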
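When the search engine database 13 receives the dimension keyword "home improvement category", the indicator keyword "transaction amount", and the time granularity keyword "most recent day" from the semantic recognition module 12, it can look up the data matching each of the three keywords. The search engine database 13 can match the dimension keyword "home improvement category" against the dimension features of its stored data to obtain the first data, that is, the data whose dimension feature matches the dimension keyword. The first data may consist of several pieces of data and may originate from the data application 15, the report 16, the knowledge base platform 17, or the cluster physical table 18.
In addition, the search engine database 13 can match the indicator keyword "transaction amount" against the indicator features of its stored data to obtain the second data, that is, the data whose indicator feature matches the indicator keyword. The second data may consist of several pieces of data originating from the data application 15.
Furthermore, the search engine database 13 can match the time granularity keyword "most recent day" against the time granularity features of its stored data to obtain the third data, that is, the data whose time granularity feature matches the time granularity keyword. The third data may consist of several pieces of data originating from the data application 15.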
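The search engine database 13 sends the first data, the second data, and the third data it has obtained to the query terminal 11, specifically to the sorter 14 in the query terminal 11.
The query terminal 1 is further configured to determine, according to the first data, the second data, and the third data, the target data to be fed back to the user, and to display the target data to the user.
If the search engine database 13 finds only one piece of matching data, that is, the first data, the second data, and the third data are the same data, the sorter 14 sends that data to the display of the query terminal 11, which displays it.
If the search engine database 13 finds several pieces of matching data, that is, the first data, the second data, and the third data are not the same data, the sorter 14 sorts them according to a preset algorithm and sends the sorted data to the display of the query terminal 11, which displays them in sorted order.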
In this embodiment, data from data applications, reports, the knowledge base platform, and cluster physical tables is collected in advance into the search engine database, and at least one of a dimension feature, an indicator feature, and a time granularity feature is added to each piece of collected data. When the search engine receives a search keyword entered by the user, it first splits the search keyword into a dimension keyword, an indicator keyword, and a time granularity keyword. It then looks up, in the pre-built search engine database, the data matching the dimension keyword, the indicator keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to go through each data outlet in turn: entering the search keyword once is enough for the search engine database to find the data related to that keyword across all data outlets, which improves the efficiency of data lookup.
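FIG. 3 is a flowchart of the data processing method provided by Embodiment 1 of the present disclosure. As shown in FIG. 3, the method includes the following steps:
Step S201: the query terminal receives a user's query request, the query request including a search keyword.
As shown in FIG. 1, the query terminal 11 receives a query request from the user 10. The request can take several forms. For example, the user 10 enters text or speech into the search engine on the query terminal 11, and the text or speech contains the keyword that the user 10 wants to search for. Alternatively, the search engine on the query terminal 11 provides a drop-down list in which keywords are stored in advance, and the user enters the keyword to search for by selecting it in the list and clicking. As a further alternative, the user 10 browses text on the query terminal 11, selects a keyword from that text, and searches for it by dragging, swiping, or pressing a function key.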
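The user 10 queries data through the query terminal 11. The user 10 may be a non-technical employee of a company or a consumer; the query terminal 11 may be a terminal device in the company to which the user 10 belongs, or the user's personal computer, laptop, or similar device. A search engine is installed on the query terminal 11, and the user 10 can enter a search keyword in the search box of the search engine using the keyboard of the query terminal 11. For example, the search keyword is "most recent day home improvement category transaction amount".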
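Step S202: the query terminal obtains the dimension keyword, the indicator keyword, and the time granularity keyword in the search keyword.
As shown in FIG. 1, the semantic recognition module 12 and the sorter 14 may be modules of the query terminal 11 or modules of the search engine database 13, and the query terminal 11 and the search engine database 13 may be connected directly or indirectly through other devices. This embodiment takes as an example the case in which the semantic recognition module 12 and the sorter 14 belong to the query terminal 11 and the query terminal 11 is directly connected to the search engine database 13.
The semantic recognition module 12 splits the search keyword into a dimension keyword, an indicator keyword, and a time granularity keyword as used in the big data field: the dimension keyword is "home improvement category", the indicator keyword is "transaction amount", and the time granularity keyword is "most recent day".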
Step S203: the query terminal sends the dimension keyword, the indicator keyword, and the time granularity keyword to the search engine database, so that the search engine database acquires first data corresponding to a dimension feature that matches the dimension keyword, second data corresponding to an indicator feature that matches the indicator keyword, and third data corresponding to a time granularity feature that matches the time granularity keyword.
In this embodiment, the search engine database stores in advance the data from the data outlets together with feature information of that data, the feature information including at least one of the following: a dimension feature, an indicator feature, and a time granularity feature.
The query terminal sends the dimension keyword, the indicator keyword, and the time granularity keyword to the search engine database. The data outlets include at least one of the following: data applications, reports, the knowledge base platform, and cluster physical tables. Optionally, in this embodiment, the data outlets include all four, and the search engine database 13 stores the data from data applications, reports, the knowledge base platform, and cluster physical tables.
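As shown in FIG. 1, the data sources of the search engine database 13 include the data application 15, the report 16, the knowledge base platform 17, and the cluster physical table 18. The data application 15 may specifically be a data product, such as Alibaba's Taobao Shengyijing or Baidu's Baidu Index. A data product is a web product in the form of web pages; its main difference from an ordinary web product is that it carries a large amount of data and needs to interact frequently with a back-end data source, which is the device storing the data that the data application 15 can operate on. In this embodiment, the data in the data application 15 can be stored in the search engine database 13 via the syntax parser 19; specifically, the SDK is used to collect the data in the data application 15 into the syntax parser 19. The syntax parser 19 can parse out the dimension feature, indicator feature, time granularity feature, and name of the table read from a Structured Query Language (SQL) statement. For example, consider the following SQL statement: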
SELECT stat_date AS 日期 -- alias means "date"
,user_type AS 用户类型 -- alias means "user type"
,se_lpv_pc_1d_001 AS Pv
,se_uv_pc_1d_001 AS Uv
FROM tbbi.ads_tb_log_1d
WHERE ds='20151026'
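The syntax parser 19 can determine that, for this SQL statement, the dimension feature is "user type", the indicator features are "Pv" and "Uv", the time granularity feature is "most recent day", and the table read is "tbbi.ads_tb_log_1d". In this way, the syntax parser 19 can parse out the dimension feature, indicator feature, and time granularity feature of each piece of data in the data application 15. The syntax parser 19 sends the parsed data to the search engine database 13, which stores not only the data itself but also its dimension feature, indicator feature, and time granularity feature.
In addition, the search engine database 13 may also store data from the report 16, the knowledge base platform 17, and the cluster physical table 18. The storage procedure is as follows: each piece of data in the report 16, the knowledge base platform 17, and the cluster physical table 18 is split, the dimension feature of each piece of split data is extracted, and each piece of data, together with its dimension feature, is stored in the search engine database 13. As a result, every piece of data stored in the search engine database 13 has at least a dimension feature.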
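When the search engine database 13 receives the dimension keyword "home improvement category", the indicator keyword "transaction amount", and the time granularity keyword "most recent day" from the semantic recognition module 12, it can look up the data matching each of the three keywords.
In this embodiment, the search engine database 13 stores the data from the data application 15 together with the dimension feature, indicator feature, and time granularity feature of each piece of that data. It also stores the data from the report 16, the knowledge base platform 17, and the cluster physical table 18 together with the dimension feature of each piece of that data. The dimension features of different pieces of data in the search engine database 13 may be the same or different, and the same holds for their indicator features and their time granularity features.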
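In this embodiment, the search engine database 13 can match the dimension keyword "home improvement category" identified by the semantic recognition module 12 against the dimension features of its stored data to obtain the first data, that is, the data whose dimension feature matches the dimension keyword. The first data may consist of several pieces of data and may originate from the data application 15, the report 16, the knowledge base platform 17, or the cluster physical table 18.
In addition, the search engine database 13 can match the indicator keyword "transaction amount" against the indicator features of its stored data to obtain the second data, that is, the data whose indicator feature matches the indicator keyword. The second data may consist of several pieces of data originating from the data application 15.
Furthermore, the search engine database 13 can match the time granularity keyword "most recent day" against the time granularity features of its stored data to obtain the third data, that is, the data whose time granularity feature matches the time granularity keyword. The third data may consist of several pieces of data originating from the data application 15.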
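The three lookups can be pictured with a small in-memory sketch. It assumes exact string equality between a keyword and a stored feature, which the source does not specify; the `Record` class, the `match_features` function, and the sample records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored piece of data plus its feature information (hypothetical layout)."""
    name: str
    source: str
    dimensions: set = field(default_factory=set)
    indicators: set = field(default_factory=set)
    granularities: set = field(default_factory=set)


def match_features(records, dimension_kw, indicator_kw, granularity_kw):
    """Return (first data, second data, third data) using exact feature matches."""
    first = [r for r in records if dimension_kw in r.dimensions]
    second = [r for r in records if indicator_kw in r.indicators]
    third = [r for r in records if granularity_kw in r.granularities]
    return first, second, third


records = [
    # Data application entries carry all three kinds of feature.
    Record("daily_amount_by_category", "data application",
           {"home improvement category"}, {"transaction amount"}, {"most recent day"}),
    # Report, knowledge base, and physical table entries carry a dimension feature.
    Record("category_report", "report", {"home improvement category"}),
    Record("user_log", "cluster physical table", {"user type"}),
]

first, second, third = match_features(
    records, "home improvement category", "transaction amount", "most recent day"
)
```

Here the first data contains two records from different outlets, while the second and third data each contain only the data application record, which mirrors the observation that only data application entries carry indicator and time granularity features.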
Step S204: the query terminal receives the first data, the second data, and the third data sent by the search engine database.
The search engine database 13 sends the first data, the second data, and the third data it has obtained to the query terminal 11, specifically to the sorter 14 in the query terminal 11.
Step S205: the query terminal determines, according to the first data, the second data, and the third data, the target data to be fed back to the user.
If the search engine database 13 finds only one piece of matching data, that is, the first data, the second data, and the third data are the same data, the sorter 14 sends that data to the display of the query terminal 11, which displays it.
If the search engine database 13 finds several pieces of matching data, that is, the first data, the second data, and the third data are not the same data, the sorter 14 sorts them according to a preset algorithm and sends the sorted data to the display of the query terminal 11, which displays them in sorted order.
In this embodiment, data from data applications, reports, the knowledge base platform, and cluster physical tables is collected in advance into the search engine database, and at least one of a dimension feature, an indicator feature, and a time granularity feature is added to each piece of collected data. When the search engine receives a search keyword entered by the user, it first splits the search keyword into a dimension keyword, an indicator keyword, and a time granularity keyword. It then looks up, in the pre-built search engine database, the data matching the dimension keyword, the indicator keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to go through each data outlet in turn: entering the search keyword once is enough for the search engine database to find the data related to that keyword across all data outlets, which improves the efficiency of data lookup.
FIG. 4 is a flowchart of the data processing method provided by Embodiment 2 of the present disclosure. As shown in FIG. 4, building on the embodiment shown in FIG. 3, the method by which the query terminal obtains the dimension keyword, the indicator keyword, and the time granularity keyword in the search keyword may specifically include the following steps:
Step S301: the query terminal performs word segmentation on the search keyword to obtain a plurality of target segments.
As described in step S201, the search keyword entered by the user is "most recent day home improvement category transaction amount". The query terminal 11 may use the TF-IDF algorithm to split the search keyword entered by the user into a plurality of target segments, namely "most recent day", "home improvement category", and "transaction amount".
Step S302: the query terminal looks up each target segment in a preset mapping table, the mapping table including dimension segments, indicator segments, and time granularity segments.
In this embodiment, the query terminal 11 builds the mapping table in advance. The dimension segments may be a plurality of segments having a dimension feature, the indicator segments a plurality of segments having an indicator feature, and the time granularity segments a plurality of segments having a time granularity feature. For each target segment obtained in step S301, the query terminal 11 looks up the mapping table to determine whether it contains a segment matching that target segment.
Step S303: the query terminal determines the target segment that matches a dimension segment as the dimension keyword.
For example, if "home improvement category" among the target segments matches a dimension segment in the mapping table, "home improvement category" is taken as the dimension keyword of the search keyword.
Step S304: the query terminal determines the target segment that matches an indicator segment as the indicator keyword.
For example, if "transaction amount" among the target segments matches an indicator segment in the mapping table, "transaction amount" is taken as the indicator keyword of the search keyword.
Step S305: the query terminal determines the target segment that matches a time granularity segment as the time granularity keyword.
For example, if "most recent day" among the target segments matches a time granularity segment in the mapping table, "most recent day" is taken as the time granularity keyword of the search keyword.
In this embodiment, the search keyword is segmented into a plurality of target segments, and the dimension keyword, indicator keyword, and time granularity keyword among them are identified by looking them up in the pre-built mapping table, which improves the efficiency of determining these three keywords in the search keyword.
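Steps S301 to S305 can be sketched as follows. The source names TF-IDF for the segmentation step without giving details, so this sketch swaps in a greedy longest-match segmentation against the mapping table's own vocabulary; that substitution, the English phrases, and the names `MAPPING_TABLE` and `split_keywords` are assumptions made for illustration.

```python
# Hypothetical mapping table: each known segment is tagged with its keyword type.
MAPPING_TABLE = {
    "home improvement category": "dimension",
    "transaction amount": "indicator",
    "most recent day": "time_granularity",
}


def split_keywords(query: str, table: dict = MAPPING_TABLE) -> dict:
    """Segment the query by greedy longest match, then classify each segment."""
    words = query.lower().split()
    result = {}
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in table:
                result.setdefault(table[phrase], []).append(phrase)
                i = j
                break
        else:
            i += 1  # no known segment starts here, so skip this word
    return result


keywords = split_keywords(
    "most recent day home improvement category transaction amount"
)
```

Words that belong to no known segment are skipped rather than reported, so a query with extra filler words still yields the same three keywords.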
FIG. 5 is a flowchart of the data processing method provided by Embodiment 3 of the present disclosure. As shown in FIG. 5, this embodiment can build on any of the foregoing embodiments; taking Embodiment 2 as the basis, the specific steps of the data processing method provided by this embodiment are as follows:
Step S401: the query terminal receives a user's query request, the query request including a search keyword.
Step S402: the query terminal obtains the dimension keyword, the indicator keyword, and the time granularity keyword in the search keyword.
Step S403: the query terminal sends the dimension keyword, the indicator keyword, and the time granularity keyword to the search engine database, so that the search engine database acquires first data corresponding to a dimension feature that matches the dimension keyword, second data corresponding to an indicator feature that matches the indicator keyword, and third data corresponding to a time granularity feature that matches the time granularity keyword.
Step S404: the query terminal receives the first data, the second data, and the third data sent by the search engine database.
Steps S401 to S404 are the same as steps S201 to S204 respectively, and the details are not repeated here.
Step S405: the query terminal determines whether the first data, the second data, and the third data are the same data. If so, step S406 is performed; otherwise, step S407 is performed.
Step S406: the query terminal determines that same data as the target data to be fed back to the user.
As shown in FIG. 1, if the search engine database 13 finds only one piece of matching data, that is, the first data, the second data, and the third data are the same data, the sorter 14 sends that data to the display of the query terminal 11, which displays it.
Step S407: the query terminal sorts the first data, the second data, and the third data, and determines the sorted data as the target data to be fed back to the user.
If the search engine database 13 finds several pieces of matching data, that is, the first data, the second data, and the third data are not the same data, the sorter 14 sorts them according to a preset algorithm and sends the sorted data to the display of the query terminal 11, which displays them in sorted order.
In step S407, the method by which the query terminal sorts the first data, the second data, and the third data may specifically include the following steps:
Step S51: the query terminal calculates a weight value for each piece of data in the first data, the second data, and the third data.
Specifically, the weight value of each piece of data can be calculated with the PageRank algorithm.
Step S52: the query terminal calculates the similarity between each piece of data in the first data, the second data, and the third data and the search keyword.
Specifically, the CUS-distance algorithm can be used to calculate the similarity between each piece of data and the search keyword entered by the user.
Step S53: the query terminal calculates a ranking value for each piece of data according to its weight value and similarity.
Specifically, the sum of a piece of data's weight value and similarity can be used as its ranking value.
Step S54: the query terminal sorts each piece of data in the first data, the second data, and the third data according to its ranking value.
Specifically, the data can be sorted by ranking value in descending order.
Optionally, the query terminal determines, according to the ranking value of each piece of data, the data in the first data, the second data, and the third data whose ranking value is greater than a first threshold, and sorts only that data by ranking value.
That is, once the ranking value of each piece of data has been calculated, the data whose ranking value exceeds the first threshold can be selected and then sorted by ranking value.
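Steps S53 and S54, together with the optional first threshold, can be sketched as below. The weight values and similarities are taken as given inputs because the source only names the algorithms that produce them (PageRank and CUS-distance); the function name `rank` and the sample numbers are hypothetical.

```python
def rank(candidates, first_threshold=None):
    """Sort candidates by ranking value, where ranking value = weight + similarity.

    candidates: iterable of (name, weight, similarity) tuples.
    first_threshold: if given, keep only items whose ranking value exceeds it.
    """
    scored = [(name, weight + similarity) for name, weight, similarity in candidates]
    if first_threshold is not None:
        scored = [item for item in scored if item[1] > first_threshold]
    # Largest ranking value first, as in step S54.
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = [
    ("report_a", 0.5, 0.25),
    ("app_b", 0.25, 0.75),
    ("table_c", 0.125, 0.125),
]

full_order = rank(candidates)
filtered_order = rank(candidates, first_threshold=0.5)
```

With these inputs, `app_b` ranks first with a ranking value of 1.0, and applying a first threshold of 0.5 drops `table_c`, whose ranking value is 0.25.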
In this embodiment, the several pieces of data that the search engine database finds to match the search keyword are sorted by ranking value. The ranking value depends on each piece of data's weight value and on its similarity to the search keyword, so a larger ranking value indicates a stronger association between the data and the search keyword. Because the sorted data is fed back to the user, the user can easily see the data most strongly associated with the search keyword, which improves the user experience.
FIG. 6 is a flowchart of the data processing method provided by Embodiment 4 of the present disclosure. As shown in FIG. 6, this embodiment can build on any of the foregoing embodiments; taking Embodiment 2 as the basis, the specific steps of the data processing method provided by this embodiment are as follows:
Step S601: the query terminal receives a user's query request, the query request including a search keyword.
Step S602: the query terminal obtains the dimension keyword, the indicator keyword, and the time granularity keyword in the search keyword.
Step S603: the query terminal sends the dimension keyword, the indicator keyword, and the time granularity keyword to the search engine database, so that the search engine database acquires first data corresponding to a dimension feature that matches the dimension keyword, second data corresponding to an indicator feature that matches the indicator keyword, and third data corresponding to a time granularity feature that matches the time granularity keyword.
Step S604: the query terminal receives the first data, the second data, and the third data sent by the search engine database.
Step S605: the query terminal determines, according to the first data, the second data, and the third data, the target data to be fed back to the user.
Steps S601 to S605 are the same as steps S201 to S205 respectively, and the details are not repeated here.
Step S606: the query terminal receives the user's click operation on the target data.
After step S407, the sorted target data can be displayed on the query terminal, where the user can click to view it. When the user clicks a piece of target data, the query terminal receives the user's click operation on that target data.
Step S607: the query terminal establishes an association relationship between the user and the target data according to the click operation.
The association relationship includes a degree of association, which indicates how closely the user is associated with the target data.
In this embodiment, the query terminal establishes the association relationship between the user and the target data according to the click operation generated when the user clicks a piece of target data. The degree of association between the user and the clicked target data can also be calculated according to association rules and collaborative filtering rules. The user may have clicked more than one piece of target data.
Step S608: when the user has not entered a search keyword, the query terminal displays the target data according to the association relationship.
When the user has not entered a search keyword on the query terminal 11, the query terminal 11 can display target data according to the association relationship between the user and the target data the user has clicked; that is, the query terminal 11 can show the user the target data the user has previously clicked.
Specifically, the association relationship includes a degree of association, which indicates how closely the user is associated with the target data. Displaying the target data according to the association relationship includes: the query terminal displays the target data whose degree of association is greater than a second threshold.
Optionally, the query terminal displays the target data whose degree of association is greater than the second threshold. Because the association relationship between the user and each piece of clicked target data also includes the degree of association between them, the query terminal 11 can display only the clicked target data whose degree of association exceeds the second threshold.
In this embodiment, an association relationship is established between the user and the target data the user has clicked. When the user has not entered a search keyword, the target data the user has clicked can be displayed according to that association relationship, which makes it more convenient for the user to query data.
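Steps S606 to S608 can be sketched as a small click history. The source says the degree of association may be computed with association rules and collaborative filtering but gives no formula, so this sketch substitutes a plain click count as a stand-in; the class `ClickHistory` and its methods are hypothetical.

```python
from collections import defaultdict


class ClickHistory:
    """Tracks user-to-target associations, using click counts as the degree."""

    def __init__(self):
        # user -> target -> degree of association (here: number of clicks)
        self._degree = defaultdict(lambda: defaultdict(int))

    def record_click(self, user: str, target: str) -> None:
        """Step S607: create or strengthen the association on each click."""
        self._degree[user][target] += 1

    def targets_to_display(self, user: str, second_threshold: int) -> list:
        """Step S608: targets whose degree exceeds the second threshold."""
        items = self._degree.get(user, {})
        ranked = sorted(items.items(), key=lambda kv: kv[1], reverse=True)
        return [target for target, degree in ranked if degree > second_threshold]


history = ClickHistory()
for target in ["daily_amount", "daily_amount", "daily_amount", "user_report"]:
    history.record_click("user_10", target)

shown = history.targets_to_display("user_10", second_threshold=1)
```

A user with no recorded clicks simply gets an empty list, so the terminal has nothing to show until at least one association exists.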
FIG. 7 is a flowchart of the data processing method provided by Embodiment 5 of the present disclosure. As shown in FIG. 7, the specific steps of the data processing method provided by this embodiment are as follows:
Step S501: the query terminal receives a user's query request, the query request including a search keyword.
As shown in FIG. 1, the query terminal 11 receives a query request from the user 10. The request can take several forms; for example, the user 10 enters text or speech into the search engine on the query terminal 11, and the text or speech contains the keyword that the user 10 wants to search for.
Step S502: the query terminal obtains at least two types of keywords from the search keyword.
In this embodiment, when the query terminal classifies the search keyword of the user's query, it need not be limited to the three types of dimension keyword, indicator keyword, and time granularity keyword, because not every search keyword contains all three. The semantic recognition module 12 corresponding to the query terminal 11 shown in FIG. 1 can therefore split the search keyword into at least two types of keywords. For example, if the user is a seller whose search keyword is "Have any customers reviewed my products?", the verb "review" and the noun "product" can be extracted.
Step S503: The query terminal sends the at least two types of keywords to the search engine database, so that the search engine database acquires the source data corresponding to each of the at least two types of keywords.
The query terminal sends the verb "evaluate" and the noun "product" to the search engine database, which stores the product information of all of the seller's products as well as the evaluation information of each product. Based on "product", the search engine database obtains the product information of all of the seller's products, which specifically includes the name, place of origin, material, and so on; based on "evaluate", it obtains the evaluation information of all of the products.
Step S504: The query terminal receives the source data sent by the search engine database.
The search engine database sends the product information and the evaluation information to the query terminal. There may be multiple pieces of product information here, and likewise multiple pieces of evaluation information.
Step S505: The query terminal determines, according to the source data, the target data to be fed back to the user.
The query terminal may use the number of pieces of evaluation information for each product to feed back to the user the product information of the product with the most evaluation information; it may also feed back the first few pieces of evaluation information of each product to the user. This embodiment does not limit the specific way in which the query terminal determines the target data fed back to the user.
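As an illustration only, the two feedback strategies mentioned above might be sketched in Python as follows. The function names and the (product, evaluation) tuple layout are assumptions made for this example and are not part of the disclosure, which leaves the concrete implementation open.

```python
from collections import Counter


def product_with_most_evaluations(evaluations):
    """Return the product that has the most evaluation entries.

    `evaluations` is a hypothetical list of (product_name, evaluation_text)
    tuples standing in for the source data returned by the database.
    """
    counts = Counter(name for name, _ in evaluations)
    if not counts:
        return None
    name, _ = counts.most_common(1)[0]
    return name


def first_evaluations(evaluations, n=2):
    """Return the first `n` evaluation texts of every product."""
    grouped = {}
    for name, text in evaluations:
        bucket = grouped.setdefault(name, [])
        if len(bucket) < n:
            bucket.append(text)
    return grouped
```

Either function could serve as the step that turns source data into target data; which one is appropriate depends on what the seller is expected to want to see.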
In this embodiment, the search keywords are classified, and the classification result is not limited to dimension keywords, index keywords, and time granularity keywords. This makes the classification of search keywords more flexible, makes retrieval based on the search keywords more flexible, and also broadens the retrieval scope.
FIG. 8 is a flowchart of a data processing method according to Embodiment 6 of the present disclosure. As shown in FIG. 8, the specific steps of the data processing method provided in this embodiment are as follows:
Step S701: The search engine database receives the dimension keyword, the index keyword, and the time granularity keyword sent by the query terminal.
The dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from the search keyword included in a query request that the query terminal receives from a user.
In this embodiment, the search engine database stores in advance the data in the data exits and the feature information of the data, where the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature.
The data exits include at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table.
Step S702: The search engine database acquires first data corresponding to the dimension feature that matches the dimension keyword, second data corresponding to the index feature that matches the index keyword, and third data corresponding to the time granularity feature that matches the time granularity keyword.
Step S703: The search engine database sends the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, the target data to be fed back to the user.
The principle of the method in this embodiment is the same as that of the method in the embodiment shown in FIG. 3, and the specific process is not repeated here.
In this embodiment, data from the data application, the report, the knowledge base platform, and the cluster physical table is collected into the search engine database in advance, and at least one of a dimension feature, an index feature, and a time granularity feature is added to each piece of collected data. When the search engine receives a search keyword input by the user, it first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword. It then looks up, in the pre-established search engine database, the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to traverse each data exit to look for data; by entering the search keyword only once, the user lets the search engine database find the data related to that keyword across all data exits, which improves the efficiency of finding data.
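The lookup performed by the search engine database in steps S702 and S703 might be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Record` layout, the `match_keywords` function, and the use of exact string matching are all hypothetical, since the disclosure does not specify how a keyword is matched against a feature.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored piece of data plus its feature information (assumed layout)."""
    name: str
    data_exit: str  # e.g. "data application", "report"
    dimension_features: set = field(default_factory=set)
    index_features: set = field(default_factory=set)
    time_granularity_features: set = field(default_factory=set)


def match_keywords(records, dimension_kw, index_kw, time_kw):
    """Return (first data, second data, third data) for the three keywords.

    Exact membership is used as the matching rule purely for illustration.
    """
    first = [r for r in records if dimension_kw in r.dimension_features]
    second = [r for r in records if index_kw in r.index_features]
    third = [r for r in records if time_kw in r.time_granularity_features]
    return first, second, third
```

Because every record carries its data exit, one query covers all exits at once, which is the efficiency gain the embodiment describes.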
FIG. 9 is a flowchart of a data processing method according to Embodiment 7 of the present disclosure. As shown in FIG. 9, the specific steps of the data processing method provided in this embodiment are as follows:
Step S801: The search engine database stores the data in the data application, the report, the knowledge base platform, and the cluster physical table.
On the basis of the embodiment shown in FIG. 3, before the search keyword input by the user is received, the search engine database 13 stores in advance the data in the data application, the report, the knowledge base platform, and the cluster physical table.
Specifically, the data in the data application 15 may be stored in the search engine database 13 through the syntax parser 19: the data in the data application 15 is collected into the syntax parser 19 through an SDK. From a piece of Structured Query Language (SQL), the syntax parser 19 can parse out the dimension feature, the index feature, the time granularity feature, and the name of the table that is read. For example, consider the following SQL:
SELECT stat_date AS 日期
,user_type AS 用户类型
,se_lpv_pc_1d_001 AS Pv
,se_uv_pc_1d_001 AS Uv
FROM tbbi.ads_tb_log_1d
WHERE ds='20151026'
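The syntax parser 19 can parse out that the dimension feature of this SQL is "用户类型" (user type), the index features are "Pv" and "Uv", the time granularity feature is "the most recent day", and the table that is read is "tbbi.ads_tb_log_1d". In this way, the syntax parser 19 can parse out the dimension feature, index feature, and time granularity feature of each piece of data in the data application 15. The syntax parser 19 sends the parsed data to the search engine database 13, so that the search engine database 13 stores not only the data itself but also its dimension feature, index feature, and time granularity feature.
In addition, the search engine database 13 may also store the data in the report 16, the knowledge base platform 17, and the cluster physical table 18. The storage process is as follows: each piece of data in the report 16, the knowledge base platform 17, and the cluster physical table 18 is split; the dimension feature of each piece of split data is extracted; and each piece of data in the report 16, the knowledge base platform 17, and the cluster physical table 18 is stored in the search engine database 13 together with its dimension feature. As a result, every piece of data stored in the search engine database 13 has at least a dimension feature.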
Step S802: The search engine database receives the dimension keyword, the index keyword, and the time granularity keyword sent by the query terminal, where the dimension keyword, the index keyword, and the time granularity keyword are obtained by the query terminal from the search keyword included in a query request that the query terminal receives from a user.
In this embodiment, the search engine database stores in advance the data in the data exits and the feature information of the data, where the feature information includes at least one of the following: a dimension feature, an index feature, and a time granularity feature.
The data exits include at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table.
Step S803: The search engine database acquires first data corresponding to the dimension feature that matches the dimension keyword, second data corresponding to the index feature that matches the index keyword, and third data corresponding to the time granularity feature that matches the time granularity keyword.
Step S804: The search engine database sends the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, the target data to be fed back to the user.
The principle of the method described in steps S802 to S804 is the same as that of the method described in steps S701 to S703, and is not repeated here.
In this embodiment, data from the data application, the report, the knowledge base platform, and the cluster physical table is collected into the search engine database in advance, and at least one of a dimension feature, an index feature, and a time granularity feature is added to each piece of collected data. When the search engine receives a search keyword input by the user, it first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword. It then looks up, in the pre-established search engine database, the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to traverse each data exit to look for data; by entering the search keyword only once, the user lets the search engine database find the data related to that keyword across all data exits, which improves the efficiency of finding data.
FIG. 10 is a flowchart of a data processing method according to Embodiment 8 of the present disclosure. As shown in FIG. 10, the storing, by the search engine database, of the data in the data application, the report, the knowledge base platform, and the cluster physical table may specifically include the following steps S901 and S902:
Step S901: The search engine database stores the data in the data application.
Step S901 can be implemented by the following steps S11 to S13:
Step S11: The search engine database acquires the access logic with which the data application accesses a data source.
The access logic includes the data in the data application, and the data source stores the output logic of the data.
This embodiment describes a method of storing the data in the data application into the search engine database. This method differs from the method described in the above embodiment, in which the data in the data application 15 is stored in the search engine database 13 through the syntax parser 19.
In this embodiment, the data application may specifically take the form of a web page and needs to interact frequently with a back-end data source; the back-end data source may specifically be a device that stores the data operated on by the data application. Because the data application is developed on the basis of the SDK, the SDK has the highest operation permission over the data application, so the SDK can be used to capture the first access logic with which the data application accesses the back-end data source. The first access logic includes fields such as the time at which the data application accesses the back-end data source and the second access logic with which the user accesses the data application. The user's second access logic can therefore be obtained by parsing the first access logic with existing methods. The second access logic in turn includes field information such as the time at which the user accesses the data application and the data in the data application that the user is currently accessing. That data can therefore likewise be obtained by parsing with existing methods.
Suppose the user's second access logic for the data application is:
SELECT stat_date AS 日期
,user_type AS 用户类型
,se_lpv_pc_1d_001 AS Pv
,se_uv_pc_1d_001 AS Uv
FROM tbbi.ads_tb_log_1d
WHERE ds='20151026'
By parsing the above second access logic, it can be determined that the data in the data application currently accessed by the user is "tbbi.ads_tb_log_1d", that is, the information following the FROM keyword.
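The extraction of the information following FROM might be sketched as below. This is a simplified Python illustration, not the "existing parsing method" the disclosure refers to: the function name `table_after_from` is hypothetical, and a regular expression is used in place of a full SQL parser, so subqueries, joins, and quoted identifiers are not handled.

```python
import re


def table_after_from(access_logic):
    """Return the first table name that follows FROM, or None if absent."""
    match = re.search(r"\bFROM\s+([A-Za-z_][\w.]*)", access_logic, flags=re.IGNORECASE)
    return match.group(1) if match else None


# The second access logic from the example above.
SECOND_ACCESS_LOGIC = """SELECT stat_date AS 日期
,user_type AS 用户类型
,se_lpv_pc_1d_001 AS Pv
,se_uv_pc_1d_001 AS Uv
FROM tbbi.ads_tb_log_1d
WHERE ds='20151026'"""

accessed_data = table_after_from(SECOND_ACCESS_LOGIC)
```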
In addition, the back-end data source stores the output logic of each piece of data, so the output logic of the data in the data application currently accessed by the user can be looked up directly in the back-end data source.
Step S12: The search engine database determines, according to the output logic, the feature information of the data in the data application.
Specifically, the output logic includes the aggregation object information of the data, the index information that participates in the computation during aggregation, and the time information of the index computation.
Step S12 is specifically implemented as follows: the search engine database determines the aggregation object information of the data to be the dimension feature of the data; the search engine database determines the index information that participates in the computation during aggregation to be the index feature of the data; and the search engine database determines the time granularity feature of the data according to the time information of the index computation.
The output logic of the data in the data application currently accessed by the user is parsed to obtain the aggregation object information of the current data, the index information that participates in the computation during aggregation, and the time information of the index computation.
In this embodiment, suppose the output logic of the data in the data application currently accessed by the user is as follows:
SELECT stat_date, user_type, count(1) se_lpv_pc_1d_001, count(distinct uid) se_uv_pc_1d_001
FROM tbcdm.dwd_tb_log_1d WHERE ds='20160119'
GROUP BY user_type, stat_date
By parsing the above output logic, it can be determined that the aggregation object information of the data in the data application currently accessed by the user is stat_date and user_type, that is, the columns following GROUP BY; the index information that participates in the computation during aggregation is se_lpv_pc_1d_001 and se_uv_pc_1d_001, that is, the aliases following count(1) and count(distinct uid); and the time information of the index computation is '20160119', that is, the partition value following ds in the WHERE clause.
The aggregation object information of the current data is determined to be the dimension feature of the current data; the index information that participates in the computation during aggregation is determined to be the index feature of the current data; and the time granularity feature of the current data is determined according to the time information of the index computation.
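The mapping from output logic to feature information might be sketched as follows. This is a rough Python illustration under stated assumptions: the function name `features_from_output_logic` is hypothetical, regular expressions stand in for a real SQL parser, every aggregate call is assumed to be followed by an alias, and the time information is assumed to appear as a single `ds='YYYYMMDD'` condition.

```python
import re


def features_from_output_logic(output_logic):
    """Derive dimension, index, and time information from an aggregation query."""
    flat = " ".join(output_logic.split()).rstrip(";")

    # Columns following GROUP BY are taken as the dimension feature.
    group = re.search(
        r"\bGROUP\s+BY\s+(.+?)(?:\s+(?:ORDER\s+BY|HAVING|LIMIT)\b|$)",
        flat,
        flags=re.IGNORECASE,
    )
    dimension = [c.strip() for c in group.group(1).split(",")] if group else []

    # Aliases following aggregate calls are taken as the index feature.
    index = re.findall(
        r"\b(?:count|sum|avg|min|max)\s*\([^)]*\)\s*(?:AS\s+)?(\w+)",
        flat,
        flags=re.IGNORECASE,
    )

    # The value compared with ds is taken as the time information.
    ds = re.search(r"\bds\s*=\s*'(\d{8})'", flat)
    time_info = ds.group(1) if ds else None

    return {"dimension": dimension, "index": index, "time_info": time_info}
```

Applied to the output logic above, the sketch yields the GROUP BY columns `user_type` and `stat_date` as the dimension feature and the two count aliases as the index feature.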
In addition, the time interval represented by the time information of the index computation may be taken as the time granularity feature of the current data. For example, if the time information of the index computation of the current data, that is, the partition condition following WHERE, is "ds='20160119'", the time granularity feature of the current data is 1; if it is "ds>='20160101' and ds<='20160107'", the time granularity feature of the current data is 7.
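The two examples above are consistent with counting the days covered by the partition condition, with both endpoints included. A minimal Python sketch under that assumption follows; the function name `time_granularity` is hypothetical, and only the two condition shapes shown in the text are recognized.

```python
import re
from datetime import datetime


def time_granularity(where_condition):
    """Return the number of days covered by a ds partition condition."""
    # A single-day condition such as ds='20160119' covers one day.
    if re.search(r"\bds\s*=\s*'(\d{8})'", where_condition):
        return 1

    # A closed range such as ds>='20160101' and ds<='20160107'.
    lower = re.search(r"\bds\s*>=\s*'(\d{8})'", where_condition)
    upper = re.search(r"\bds\s*<=\s*'(\d{8})'", where_condition)
    if lower and upper:
        start = datetime.strptime(lower.group(1), "%Y%m%d")
        end = datetime.strptime(upper.group(1), "%Y%m%d")
        return (end - start).days + 1  # both endpoints are included

    return None  # condition shape not covered by this sketch
```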
Step S13: The search engine database stores the data in the data application and the feature information of the data.
Finally, the dimension feature, index feature, and time granularity feature are added to the data in the data application currently accessed by the user, and the data with the features added is stored in the search engine database.
In this embodiment, each time the user accesses the data application, the dimension feature, index feature, and time granularity feature of the data currently being accessed can be obtained and added to that data, and the data with these features added is then stored in the search engine database. Once users have accessed all of the data in the data application, all of that data will have been stored in the search engine database, and each piece of data in the search engine database will have a dimension feature, an index feature, and a time granularity feature.
Step S902: The search engine database stores the data in the report, the knowledge base platform, and the cluster physical table.
Step S902 can be implemented by the following steps S21 to S23:
Step S21: The search engine database acquires the data in the report, the knowledge base platform, and the cluster physical table respectively.
In this embodiment, each piece of data in the report, the knowledge base platform, and the cluster physical table may be split using the TF-IDF algorithm.
Step S22: The search engine database determines, according to a preset algorithm, the dimension feature of each piece of data in the report, the knowledge base platform, and the cluster physical table.
Features are extracted from the split data using the LDA algorithm and a topic model algorithm, and the extracted features are taken as the dimension features of the corresponding data.
Step S23: The search engine database stores each piece of data in the report, the knowledge base platform, and the cluster physical table, together with the dimension feature of the data.
A dimension feature is added to each piece of data in the report, the knowledge base platform, and the cluster physical table, and the data with the dimension feature added is stored in the search engine database.
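To give a feel for steps S21 to S23, the following Python sketch ranks the terms of each text by TF-IDF weight and keeps the top terms as candidate dimension features. It is a deliberately simplified stand-in and an assumption of this example: it does not implement LDA or any topic model, it tokenizes on whitespace, and the function name `tfidf_top_terms` is hypothetical.

```python
import math
from collections import Counter


def tfidf_top_terms(documents, top_n=2):
    """Return, for each document, its `top_n` terms by TF-IDF weight."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    result = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by descending score, then alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:top_n])
    return result
```

Terms that appear in every document receive a weight of zero, so the terms that survive are the ones that distinguish one report or table from another, which is the property a dimension feature needs.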
In this embodiment, the search engine database stores all of the data in the data application, and each piece of data stored from the data application into the search engine database is associated with a dimension feature, an index feature, and a time granularity feature. In addition, the search engine database stores all of the data in the report, the knowledge base platform, and the cluster physical table, and each piece of data stored from the report, the knowledge base platform, and the cluster physical table into the search engine database is associated with a dimension feature.
FIG. 11 is a flowchart of a data processing method according to Embodiment 9 of the present disclosure. As shown in FIG. 11, the data processing method provided in this embodiment may include the following steps:
Step S1001: The search engine database acquires first data in the data application, together with the dimension feature, index feature, and time granularity feature of the first data.
In this embodiment, step S1001 may be implemented in either of the following two ways:
The first way: the search engine database receives the first data sent by the syntax parser, together with the dimension feature, index feature, and time granularity feature of the first data, where the syntax parser is used to collect the first data in the data application and to parse out the dimension feature, index feature, and time granularity feature of the first data.
Specifically, the data in the data application 15 may be stored in the search engine database 13 through the syntax parser 19: the data in the data application 15 is collected into the syntax parser 19 through an SDK. From a piece of Structured Query Language (SQL), the syntax parser 19 can parse out the dimension feature, the index feature, the time granularity feature, and the name of the table that is read. For example, consider the following SQL:
SELECT stat_date AS 日期
,user_type AS 用户类型
,se_lpv_pc_1d_001 AS Pv
,se_uv_pc_1d_001 AS Uv
FROM tbbi.ads_tb_log_1d
WHERE ds='20151026'
The syntax parser 19 can parse out that the dimension feature of this SQL is "用户类型" (user type), the index features are "Pv" and "Uv", the time granularity feature is "the most recent day", and the table that is read is "tbbi.ads_tb_log_1d". In this way, the syntax parser 19 can parse out the dimension feature, index feature, and time granularity feature of each piece of data in the data application 15. The syntax parser 19 sends the parsed data to the search engine database 13, so that the search engine database 13 stores not only the data itself but also its dimension feature, index feature, and time granularity feature.
The second way includes the following steps S31 and S32:
Step S31: The search engine database acquires the access logic with which the data application accesses a data source, where the access logic includes the first data in the data application, and the data source stores the output logic of the first data.
In this embodiment, the data application may specifically take the form of a web page and needs to interact frequently with a back-end data source; the back-end data source may specifically be a device that stores the data operated on by the data application. Because the data application is developed on the basis of the SDK, the SDK has the highest operation permission over the data application, so the SDK can be used to capture the first access logic with which the data application accesses the back-end data source. The first access logic includes fields such as the time at which the data application accesses the back-end data source and the second access logic with which the user accesses the data application. The user's second access logic can therefore be obtained by parsing the first access logic with existing methods. The second access logic in turn includes field information such as the time at which the user accesses the data application and the data in the data application that the user is currently accessing. That data can therefore likewise be obtained by parsing with existing methods.
Suppose the user's second access logic for the data application is:
SELECT stat_date AS 日期
,user_type AS 用户类型
,se_lpv_pc_1d_001 AS Pv
,se_uv_pc_1d_001 AS Uv
FROM tbbi.ads_tb_log_1d
WHERE ds='20151026'
By parsing the above second access logic, it can be determined that the data in the data application currently accessed by the user is "tbbi.ads_tb_log_1d", that is, the information following the FROM keyword.
In addition, the back-end data source stores the output logic of each piece of data, so the output logic of the data in the data application currently accessed by the user can be looked up directly in the back-end data source.
The output logic includes the aggregation object information of the first data, the index information that participates in the computation during aggregation, and the time information of the index computation. Specifically, the search engine database determines the aggregation object information of the first data to be the dimension feature of the first data; the search engine database determines the index information that participates in the computation during aggregation to be the index feature of the first data; and the search engine database determines the time granularity feature of the first data according to the time information of the index computation.
Step S32: The search engine database determines, according to the output logic, the feature information of the first data in the data application, where the feature information includes a dimension feature, an index feature, and a time granularity feature.
In this embodiment, suppose the output logic of the data in the data application currently accessed by the user is as follows:
SELECT stat_date, user_type, count(1) se_lpv_pc_1d_001, count(distinct uid) se_uv_pc_1d_001
FROM tbcdm.dwd_tb_log_1d WHERE ds='20160119'
GROUP BY user_type, stat_date
By parsing the above output logic, it can be determined that the aggregation object information of the data in the data application currently accessed by the user is stat_date and user_type, that is, the columns following GROUP BY; the index information that participates in the computation during aggregation is se_lpv_pc_1d_001 and se_uv_pc_1d_001, that is, the aliases following count(1) and count(distinct uid); and the time information of the index computation is '20160119', that is, the partition value following ds in the WHERE clause.
The aggregation object information of the current data is determined to be the dimension feature of the current data; the index information that participates in the computation during aggregation is determined to be the index feature of the current data; and the time granularity feature of the current data is determined according to the time information of the index computation.
In addition, the time interval represented by the time information of the index computation may be taken as the time granularity feature of the current data. For example, if the time information of the index computation of the current data, that is, the partition condition following WHERE, is "ds='20160119'", the time granularity feature of the current data is 1; if it is "ds>='20160101' and ds<='20160107'", the time granularity feature of the current data is 7.
Step S1002: The search engine database acquires second data from the report, the knowledge base platform, and the cluster physical table respectively, together with the dimension feature of the second data.
Specifically, the search engine database acquires the second data in the report, the knowledge base platform, and the cluster physical table respectively, and determines, according to a preset algorithm, the dimension feature of each piece of second data in the report, the knowledge base platform, and the cluster physical table.
In this embodiment, each piece of data in the report, the knowledge base platform, and the cluster physical table may be split using the TF-IDF algorithm. Features are then extracted from the split data using the LDA algorithm and a topic model algorithm, and the extracted features are taken as the dimension features of the corresponding data.
Step S1003: The search engine database stores the first data, together with the dimension feature, index feature, and time granularity feature of the first data.
Step S1004: The search engine database stores the second data, together with the dimension feature of the second data.
A dimension feature is added to each piece of data in the report, the knowledge base platform, and the cluster physical table, and the data with the dimension feature added is stored in the search engine database.
In this embodiment, data from the data application, the report, the knowledge base platform, and the cluster physical table is collected into the search engine database in advance, and at least one of a dimension feature, an index feature, and a time granularity feature is added to each piece of collected data. When the search engine receives a search keyword input by the user, it first splits the search keyword to obtain a dimension keyword, an index keyword, and a time granularity keyword. It then looks up, in the pre-established search engine database, the data matching the dimension keyword, the index keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to traverse each data exit to look for data; by entering the search keyword only once, the user lets the search engine database find the data related to that keyword across all data exits, which improves the efficiency of finding data.
FIG. 12 is a schematic structural diagram of a query terminal according to Embodiment 1 of the present disclosure. As shown in FIG. 12, the query terminal includes a receiving unit, a processing unit, and a sending unit.
The receiving unit is configured to receive a query request from a user, the query request including a search keyword.
The processing unit is coupled to the receiving unit and is configured to obtain the dimension keyword, the indicator keyword, and the time granularity keyword from the search keyword.
The sending unit is coupled to the processing unit and is configured to send the dimension keyword, the indicator keyword, and the time granularity keyword to the search engine database, so that the search engine database obtains first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the indicator feature matching the indicator keyword, and third data corresponding to the time granularity feature matching the time granularity keyword. The search engine database stores in advance the data in the data outlets and the feature information of that data. The data outlets include at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table. The feature information includes at least one of the following: a dimension feature, an indicator feature, and a time granularity feature.
The receiving unit is further configured to receive the first data, the second data, and the third data sent by the search engine database.
The processing unit is further configured to determine, according to the first data, the second data, and the third data, the target data to be fed back to the user.
In this embodiment, data from the data application, the report, the knowledge base platform, and the cluster physical table is collected into the search engine database in advance, and at least one of a dimension feature, an indicator feature, and a time granularity feature is added to each collected piece of data. When the search engine receives a search keyword entered by the user, it first splits the search keyword to obtain a dimension keyword, an indicator keyword, and a time granularity keyword. It then searches the pre-built search engine database for data matching the dimension keyword, the indicator keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to go through each data outlet in turn to find data; entering the search keyword once is enough for the search engine database to find the related data across all data outlets, which improves the efficiency of data lookup.
On the basis of the embodiment shown in FIG. 12, the processing unit is specifically configured to: perform word segmentation on the search keyword to obtain a plurality of target segmented words; look up each target segmented word in a preset mapping table, the mapping table including dimension segmented words, indicator segmented words, and time granularity segmented words; determine the target segmented words that match a dimension segmented word as the dimension keyword; determine the target segmented words that match an indicator segmented word as the indicator keyword; and determine the target segmented words that match a time granularity segmented word as the time granularity keyword.
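The mapping-table lookup described above can be sketched as follows. This is only an illustration under stated assumptions: whitespace splitting stands in for a real word segmenter, and the table contents and the names `MAPPING_TABLE` and `classify_keywords` are hypothetical.

```python
# Hypothetical preset mapping table: segmented word -> keyword type.
MAPPING_TABLE = {
    "region": "dimension",
    "channel": "dimension",
    "sales": "indicator",
    "orders": "indicator",
    "daily": "time_granularity",
    "monthly": "time_granularity",
}

def classify_keywords(search_keyword, mapping_table=MAPPING_TABLE):
    """Split a search keyword and group its tokens by keyword type."""
    groups = {"dimension": [], "indicator": [], "time_granularity": []}
    # Whitespace splitting is an assumed stand-in for word segmentation.
    for token in search_keyword.lower().split():
        kind = mapping_table.get(token)
        if kind is not None:
            groups[kind].append(token)
    return groups

result = classify_keywords("Monthly sales by region")
```

Tokens that do not appear in the mapping table (such as `by`) are simply ignored in this sketch.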
Further, the processing unit is specifically configured to determine whether the first data, the second data, and the third data are the same data. If they are the same data, the processing unit determines that data as the target data to be fed back to the user. If they are not the same data, the processing unit sorts the first data, the second data, and the third data, and determines the sorted data as the target data to be fed back to the user.
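A minimal sketch of this decision follows. The text does not state the sort key, so ranking by the number of lookups that returned each item is an assumption made purely for illustration; the function name `determine_target_data` is likewise hypothetical.

```python
def determine_target_data(first, second, third):
    """Combine three result lists into the target data returned to the user."""
    if first == second == third:
        # All three lookups returned the same data: return it unchanged.
        return list(first)
    # Otherwise rank items by how many of the three lookups matched them
    # (an assumed ordering; the source does not specify the sort key).
    hits = {}
    for results in (first, second, third):
        for item in set(results):
            hits[item] = hits.get(item, 0) + 1
    return sorted(hits, key=lambda item: (-hits[item], item))

same = determine_target_data(["t1"], ["t1"], ["t1"])
mixed = determine_target_data(["a", "b"], ["b", "c"], ["b", "a"])
```

With this ordering, data that matches all three keyword types appears ahead of data that matches only one or two.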
In this embodiment, word segmentation is performed on the search keyword to obtain a plurality of target segmented words, and the dimension keyword, the indicator keyword, and the time granularity keyword among them are identified by looking them up in a pre-built mapping table. This improves the efficiency of determining the dimension keyword, the indicator keyword, and the time granularity keyword in the search keyword.
FIG. 13 is a schematic structural diagram of a query terminal according to Embodiment 2 of the present disclosure. As shown in FIG. 13, the query terminal further includes a display.
The receiving unit is further configured to receive the user's click operation on the target data.
The processing unit is further configured to establish an association relationship between the user and the target data according to the click operation.
The display is coupled to the processing unit; when the user has not entered a search keyword, the display shows the target data linked by the association relationship.
In this embodiment, an association relationship is established between the user and the target data the user has clicked. When the user has not entered a search keyword, the target data the user has clicked can be displayed according to that association relationship, which makes querying data more convenient for the user.
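The click association can be sketched as a small in-memory structure. The class name `ClickHistory`, the identifiers, and the most-recent-first ordering are assumptions for illustration; the text only requires that clicked target data be shown when no search keyword is entered.

```python
from collections import defaultdict

class ClickHistory:
    """Keeps, per user, the target data that the user has clicked."""

    def __init__(self):
        self._clicked = defaultdict(list)

    def record_click(self, user_id, data_id):
        items = self._clicked[user_id]
        # Move a repeated click to the front instead of storing it twice.
        if data_id in items:
            items.remove(data_id)
        items.insert(0, data_id)

    def suggestions(self, user_id, search_keyword=""):
        # Clicked data is shown only when no search keyword was entered.
        if search_keyword.strip():
            return []
        return list(self._clicked.get(user_id, []))

history = ClickHistory()
history.record_click("u1", "report_a")
history.record_click("u1", "table_b")
history.record_click("u1", "report_a")
```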
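FIG. 14 is a schematic structural diagram of a query terminal according to Embodiment 3 of the present disclosure. Referring to FIG. 14, the query terminal 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions so as to perform the methods of steps S201-S1004 described above.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.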
FIG. 15 is a schematic structural diagram of a search engine database according to an embodiment of the present disclosure. As shown in FIG. 15, the search engine database includes a receiver, a memory, a processor, and a transmitter.
The receiver is configured to receive the dimension keyword, the indicator keyword, and the time granularity keyword sent by the query terminal. The query terminal obtains these keywords from the search keyword included in the query request it receives from the user.
The memory is configured to store the data in the data outlets and the feature information of that data. The data outlets include at least one of the following: a data application, a report, a knowledge base platform, and a cluster physical table. The feature information includes at least one of the following: a dimension feature, an indicator feature, and a time granularity feature.
The processor is coupled to the receiver and the memory and is configured to obtain first data corresponding to the dimension feature matching the dimension keyword, second data corresponding to the indicator feature matching the indicator keyword, and third data corresponding to the time granularity feature matching the time granularity keyword.
The transmitter is coupled to the processor and is configured to send the first data, the second data, and the third data to the query terminal, so that the query terminal determines, according to the first data, the second data, and the third data, the target data to be fed back to the user.
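The database-side lookup performed by the processor can be sketched with an in-memory store. Exact set intersection is used as the matching rule, which is an assumption; the class names `Record` and `SearchEngineDatabase`, the field names, and the sample records are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    data_id: str
    source: str  # data outlet, e.g. "data_application" or "report"
    dimensions: set = field(default_factory=set)
    indicators: set = field(default_factory=set)
    time_granularities: set = field(default_factory=set)

class SearchEngineDatabase:
    def __init__(self):
        self.records = []

    def store(self, record):
        self.records.append(record)

    def _match(self, attr, keywords):
        # A record matches when it shares at least one feature with the keywords.
        wanted = set(keywords)
        return [r.data_id for r in self.records if getattr(r, attr) & wanted]

    def query(self, dimension_kw, indicator_kw, time_kw):
        first = self._match("dimensions", dimension_kw)
        second = self._match("indicators", indicator_kw)
        third = self._match("time_granularities", time_kw)
        return first, second, third

db = SearchEngineDatabase()
db.store(Record("app_sales", "data_application", {"region"}, {"sales"}, {"monthly"}))
db.store(Record("kb_regions", "knowledge_base", {"region"}))
first, second, third = db.query(["region"], ["sales"], ["monthly"])
```

Records from reports, the knowledge base platform, or cluster physical tables carry only dimension features in this sketch, so they can appear in the first result list but not in the second or third.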
In this embodiment, data from the data application, the report, the knowledge base platform, and the cluster physical table is collected into the search engine database in advance, and at least one of a dimension feature, an indicator feature, and a time granularity feature is added to each collected piece of data. When the search engine receives a search keyword entered by the user, it first splits the search keyword to obtain a dimension keyword, an indicator keyword, and a time granularity keyword. It then searches the pre-built search engine database for data matching the dimension keyword, the indicator keyword, and the time granularity keyword respectively, and displays the matching data to the user. The user does not need to go through each data outlet in turn to find data; entering the search keyword once is enough for the search engine database to find the related data across all data outlets, which improves the efficiency of data lookup.
On the basis of the embodiment shown in FIG. 15, the processor is specifically configured to: obtain the access logic by which the data application accesses a data source, where the access logic includes the data in the data application and the data source stores the output logic of that data; determine the feature information of the data in the data application according to the output logic; and store the data in the data application, together with its feature information, in the memory.
Alternatively, on the basis of the embodiment shown in FIG. 15, the receiver is further configured to receive data sent by a syntax parser together with the dimension features, indicator features, and time granularity features of that data. The syntax parser is used to collect the data in the data application and to parse the dimension features, indicator features, and time granularity features of the data. The processor is further configured to store the data in the data application, together with its dimension features, indicator features, and time granularity features, in the memory.
Alternatively, on the basis of the embodiment shown in FIG. 15, the processor is specifically configured to: obtain the data in the report, the knowledge base platform, and the cluster physical table respectively; determine, according to a preset algorithm, the dimension features of each piece of data in the report, the knowledge base platform, and the cluster physical table; and store each piece of data in the report, the knowledge base platform, and the cluster physical table, together with its dimension features, in the memory.
In this embodiment, the search engine database stores all of the data in the data application, and each piece of data stored from the data application into the search engine database is associated with dimension features, indicator features, and time granularity features. In addition, the search engine database stores all of the data in the report, the knowledge base platform, and the cluster physical table, and each piece of data stored from these sources into the search engine database is associated with dimension features.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.
Claims (29)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610657498.8 | 2016-08-11 | | |
| CN201610657498.8A CN107729336B (en) | 2016-08-11 | 2016-08-11 | Data processing method, device and system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018028443A1 (en) | 2018-02-15 |
Family ID: 61162620
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2017/094790 (WO2018028443A1, status: Ceased) | 2016-08-11 | 2017-07-28 | Data processing method, device and system |
Country Status (3)
| Country | Link |
|---|---|
| CN (1) | CN107729336B (en) |
| TW (1) | TW201805839A (en) |
| WO (1) | WO2018028443A1 (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111309729A (en) * | 2020-02-13 | 2020-06-19 | 湖南快乐阳光互动娱乐传媒有限公司 | Data query method and device |
| CN111563095A (en) * | 2020-04-30 | 2020-08-21 | 上海新炬网络信息技术股份有限公司 | Data retrieval device based on HBase |
| CN112862458A (en) * | 2021-03-02 | 2021-05-28 | 岭东核电有限公司 | Nuclear power test procedure supervision method and device, computer equipment and storage medium |
| CN115145924A (en) * | 2022-07-15 | 2022-10-04 | 中国农业银行股份有限公司 | Data processing method, device, equipment and storage medium |
| CN115934779A (en) * | 2022-12-20 | 2023-04-07 | 深圳市城市公共安全技术研究院有限公司 | CIM search method, device, electronic equipment and storage medium |
| CN116383489A (en) * | 2023-03-17 | 2023-07-04 | 北京五八赶集信息技术有限公司 | Message push method, device, electronic device and storage medium |
| CN116579729A (en) * | 2023-03-17 | 2023-08-11 | 中电金信数字科技集团有限公司 | Service data processing method and device |
| CN117093708A (en) * | 2023-10-17 | 2023-11-21 | 中电数创(北京)科技有限公司 | Method for intelligently identifying search intention of user and visually displaying search results of element |
| CN118051543A (en) * | 2024-04-16 | 2024-05-17 | 宁德时代新能源科技股份有限公司 | Battery data query method, system, electronic device and storage medium |
Families Citing this family (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108664586B (en) * | 2018-05-07 | 2022-04-15 | 北京国电通网络技术有限公司 | Information acquisition method and system |
| CN108647213A (en) * | 2018-05-21 | 2018-10-12 | 辽宁工程技术大学 | A kind of composite key semantic relevancy appraisal procedure based on coupled relation analysis |
| CN109063108B (en) * | 2018-07-27 | 2020-03-03 | 北京字节跳动网络技术有限公司 | Search ranking method and device, computer equipment and storage medium |
| CN110928903B (en) * | 2018-08-31 | 2024-03-15 | 阿里巴巴集团控股有限公司 | Data extraction method and device, equipment and storage medium |
| CN109344300A (en) * | 2018-08-31 | 2019-02-15 | 深圳壹账通智能科技有限公司 | The data query of natural language is intended to determine method, apparatus and computer equipment |
| CN111435376A (en) * | 2019-01-15 | 2020-07-21 | 北京京东尚科信息技术有限公司 | Information processing method and system, computer system, and computer-readable storage medium |
| CN111967233A (en) * | 2019-05-20 | 2020-11-20 | 北京京东尚科信息技术有限公司 | Report generation method and device, computer readable storage medium and electronic equipment |
| CN110716950B (en) * | 2019-09-20 | 2024-05-17 | 北京神州数码云科信息技术有限公司 | A method, device, equipment and computer storage medium for establishing a caliber system |
| CN110737432B (en) * | 2019-09-20 | 2023-10-20 | 黄沙沙 | Script aided design method and device based on root list |
| CN110688541A (en) * | 2019-10-08 | 2020-01-14 | 中国建设银行股份有限公司 | Report data query method and device, storage medium and electronic equipment |
| CN110807089B (en) * | 2019-10-29 | 2023-02-28 | 出门问问创新科技有限公司 | Question answering method and device and electronic equipment |
| CN110851543A (en) * | 2019-11-08 | 2020-02-28 | 深圳市彬讯科技有限公司 | Data modeling method, device, equipment and storage medium |
| CN112948414A (en) * | 2019-12-19 | 2021-06-11 | 深圳市明源云链互联网科技有限公司 | Data report generation method and device, electronic equipment and storage medium |
| CN111400556A (en) * | 2020-03-06 | 2020-07-10 | 上海数据交易中心有限公司 | Data query method and device, computer equipment and storage medium |
| CN111913984A (en) * | 2020-08-18 | 2020-11-10 | 南开大学 | A picture book information query method and system based on preschool children's cognition |
| CN113793193B (en) * | 2021-08-13 | 2024-02-02 | 唯品会(广州)软件有限公司 | Data search accuracy verification method, device, equipment and computer readable medium |
| CN115934723A (en) * | 2022-12-15 | 2023-04-07 | 北京掌行通信息技术有限公司 | Query method, device, equipment and storage medium for multi-dimensional index data |
| CN115964562A (en) * | 2022-12-19 | 2023-04-14 | 国家电网有限公司 | A data search method, device and related equipment |
| CN116257545B (en) * | 2022-12-28 | 2024-01-30 | 联通智网科技股份有限公司 | Data query method and device, electronic equipment and storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080033920A1 (en) * | 2006-08-04 | 2008-02-07 | Kaelin Lee Colclasure | Method and apparatus for searching metadata |
| KR20080058634A (en) * | 2006-12-22 | 2008-06-26 | 엔에이치엔(주) | Search system and method |
| CN101620605A (en) * | 2008-07-04 | 2010-01-06 | 华为技术有限公司 | Search method, search server and search system |
| CN102521223A (en) * | 2011-09-02 | 2012-06-27 | 天津市道本科技有限公司 | Three-word-in-one enterprise knowledge associative storing, searching and presenting method |
| WO2014127500A1 (en) * | 2013-02-19 | 2014-08-28 | Google Inc. | Natural language processing based search |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060047649A1 (en) * | 2003-12-29 | 2006-03-02 | Ping Liang | Internet and computer information retrieval and mining with intelligent conceptual filtering, visualization and automation |
| KR100776697B1 (en) * | 2006-01-05 | 2007-11-16 | 주식회사 인터파크지마켓 | Intelligent product search method and system based on customer purchase behavior analysis |
| US8156154B2 (en) * | 2007-02-05 | 2012-04-10 | Microsoft Corporation | Techniques to manage a taxonomy system for heterogeneous resource domain |
| CN101661477A (en) * | 2008-08-26 | 2010-03-03 | 华为技术有限公司 | Search method and system |
| CN102314654B (en) * | 2010-07-08 | 2017-10-17 | 阿里巴巴集团控股有限公司 | A kind of information-pushing method and Information Push Server |
| CN102385585A (en) * | 2010-08-27 | 2012-03-21 | 阿里巴巴集团控股有限公司 | Establishing method of webpage database, webpage searching method and relative device |
| CN102033910A (en) * | 2010-11-19 | 2011-04-27 | 福建富士通信息软件有限公司 | Enterprise search engine technology based on multiple data resources |
| CN102184257A (en) * | 2011-06-02 | 2011-09-14 | 广东亿迅科技有限公司 | Unified searching method, device and system |
| US20150302006A1 (en) * | 2014-04-18 | 2015-10-22 | Verizon Patent And Licensing Inc. | Advanced search for media content |
| CN104820715B (en) * | 2015-05-19 | 2019-01-29 | 杭州迅涵科技有限公司 | Based on the associated data sharing of various dimensions and analysis method and system |
| CN105279286A (en) * | 2015-11-27 | 2016-01-27 | 陕西艾特信息化工程咨询有限责任公司 | Interactive large data analysis query processing method |
- 2016-08-11: CN application CN201610657498.8A, publication CN107729336B, status Active
- 2017-06-12: TW application TW106119497A, publication TW201805839A, status Unknown
- 2017-07-28: WO application PCT/CN2017/094790, publication WO2018028443A1, status Ceased
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111309729A (en) * | 2020-02-13 | 2020-06-19 | 湖南快乐阳光互动娱乐传媒有限公司 | Data query method and device |
| CN111563095A (en) * | 2020-04-30 | 2020-08-21 | 上海新炬网络信息技术股份有限公司 | Data retrieval device based on HBase |
| CN111563095B (en) * | 2020-04-30 | 2023-05-26 | 上海新炬网络信息技术股份有限公司 | HBase-based data retrieval device |
| CN112862458A (en) * | 2021-03-02 | 2021-05-28 | 岭东核电有限公司 | Nuclear power test procedure supervision method and device, computer equipment and storage medium |
| CN115145924A (en) * | 2022-07-15 | 2022-10-04 | 中国农业银行股份有限公司 | Data processing method, device, equipment and storage medium |
| CN115934779A (en) * | 2022-12-20 | 2023-04-07 | 深圳市城市公共安全技术研究院有限公司 | CIM search method, device, electronic equipment and storage medium |
| CN116383489A (en) * | 2023-03-17 | 2023-07-04 | 北京五八赶集信息技术有限公司 | Message push method, device, electronic device and storage medium |
| CN116579729A (en) * | 2023-03-17 | 2023-08-11 | 中电金信数字科技集团有限公司 | Service data processing method and device |
| CN116579729B (en) * | 2023-03-17 | 2024-06-11 | 中电金信数字科技集团有限公司 | Service data processing method and device |
| CN117093708A (en) * | 2023-10-17 | 2023-11-21 | 中电数创(北京)科技有限公司 | Method for intelligently identifying search intention of user and visually displaying search results of element |
| CN117093708B (en) * | 2023-10-17 | 2024-02-13 | 中电数创(北京)科技有限公司 | Method for intelligently identifying search intention of user and visually displaying search results of element |
| CN118051543A (en) * | 2024-04-16 | 2024-05-17 | 宁德时代新能源科技股份有限公司 | Battery data query method, system, electronic device and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| TW201805839A (en) | 2018-02-16 |
| CN107729336A (en) | 2018-02-23 |
| CN107729336B (en) | 2021-07-27 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2018028443A1 (en) | Data processing method, device and system | |
| US11663254B2 (en) | System and engine for seeded clustering of news events | |
| CN103514183B (en) | Information search method and system based on interactive document clustering | |
| CN104166651B (en) | Method and device for data search based on integration of similar data objects | |
| TWI544351B (en) | Extended query method and system | |
| WO2019214245A1 (en) | Information pushing method and apparatus, and terminal device and storage medium | |
| US20120246154A1 (en) | Aggregating search results based on associating data instances with knowledge base entities | |
| US20100274821A1 (en) | Schema Matching Using Clicklogs | |
| US20160034514A1 (en) | Providing search results based on an identified user interest and relevance matching | |
| US9720979B2 (en) | Method and system of identifying relevant content snippets that include additional information | |
| WO2021196541A1 (en) | Method, apparatus and device used to search for content, and computer-readable storage medium | |
| CN110232126B (en) | Hot spot mining method, server and computer readable storage medium | |
| JP5057474B2 (en) | Method and system for calculating competition index between objects | |
| CA2956627C (en) | System and engine for seeded clustering of news events | |
| WO2017121272A1 (en) | Method and device for processing user behavior data | |
| WO2012129152A2 (en) | Annotating schema elements based associating data instances with knowledge base entities | |
| US9552415B2 (en) | Category classification processing device and method | |
| JP6664599B2 (en) | Ambiguity evaluation device, ambiguity evaluation method, and ambiguity evaluation program | |
| US20180089193A1 (en) | Category-based data analysis system for processing stored data-units and calculating their relevance to a subject domain with exemplary precision, and a computer-implemented method for identifying from a broad range of data sources, social entities that perform the function of Social Influencers | |
| US9400789B2 (en) | Associating resources with entities | |
| CA3051919C (en) | Machine learning (ml) based expansion of a data set | |
| CN114691845A (en) | Semantic search method and device, electronic equipment, storage medium and product | |
| KR20160120583A (en) | Knowledge Management System and method for data management based on knowledge structure | |
| Yan et al. | A multimodal retrieval and ranking method for scientific documents based on HFS and XLNet | |
| CN116910054A (en) | Data processing methods, devices, electronic equipment and computer-readable storage media |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17838573; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17838573; Country of ref document: EP; Kind code of ref document: A1 |