[go: up one dir, main page]

CN119719135A - A SQL statement completion method and system for database security management and control platform based on syntax tree - Google Patents

A SQL statement completion method and system for database security management and control platform based on syntax tree Download PDF

Info

Publication number
CN119719135A
CN119719135A CN202411789482.3A CN202411789482A CN119719135A CN 119719135 A CN119719135 A CN 119719135A CN 202411789482 A CN202411789482 A CN 202411789482A CN 119719135 A CN119719135 A CN 119719135A
Authority
CN
China
Prior art keywords
table name
syntax tree
names
completion
sql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411789482.3A
Other languages
Chinese (zh)
Inventor
王伟斌
黄彪
杨小华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinshu Technology Co ltd
Original Assignee
Beijing Xinshu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xinshu Technology Co ltd filed Critical Beijing Xinshu Technology Co ltd
Priority to CN202411789482.3A priority Critical patent/CN119719135A/en
Publication of CN119719135A publication Critical patent/CN119719135A/en
Pending legal-status Critical Current

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明提出一种基于语法树的数据库安全管控平台SQL语句补全方法、系统,根据SQL语句的结构和逻辑提供准确的补全建议,对于嵌套子查询、联合查询等复杂的SQL语句,也具有较好的处理能力,可以精准识别SQL语句中的语法错误,SQL语句格式化、代码高亮等功能也容易实现。本发明确保对于历史数据中的每一个表名,都能基于时间衰减和特定的周期性扰动计算排序分值,整体上使得较久远的数据分值较低,较新近的数据分值较高,同时考虑了数据的重复性。

The present invention proposes a SQL statement completion method and system for a database security management platform based on a syntax tree, which provides accurate completion suggestions according to the structure and logic of SQL statements, and has good processing capabilities for complex SQL statements such as nested subqueries and joint queries, and can accurately identify syntax errors in SQL statements. Functions such as SQL statement formatting and code highlighting are also easy to implement. The present invention ensures that for each table name in historical data, the ranking score can be calculated based on time decay and specific periodic disturbances, which makes the score of older data lower and the score of more recent data higher, while taking into account the repeatability of the data.

Description

SQL sentence completion method and system for database security management and control platform based on grammar tree
Technical Field
The invention relates to a method and a system for supplementing SQL sentences of a database security management and control platform based on a grammar tree, belonging to the technical field of database security management.
Background
In recent years, information technology has been rapidly developed, and the amount of data generated and stored in various industries of society has been exponentially increased. As the core of data storage, databases are of great interest. The use and management of databases is a very complex task, the more voluminous the database data volume, the more complex it is to use and manage. In this case, database security management and control becomes an important task in order to secure database assets. The database security management and control platform has a plurality of functions such as data protection, authentication and access control, data shielding and desensitization, and the like. In the use process of the database security management and control platform, a convenient SQL (Structured Query Language) editor is provided, so that a developer can design an accurate SQL sentence conveniently, and the method is one of necessary functions.
SQL is a standard language for accessing relational databases, and the advent of SQL provides great flexibility and convenience for data manipulation. However, the mastering of SQL is not a very early time, even with a certain programming basis, even though database theory has been learned previously, it is not easy for an inexperienced SQL beginner to design SQL statements. The syntax structure of SQL is complex, if complex queries are to be written, not only the SQL syntax is familiar, but also the specific structure of the database is combined. Thus, even experienced SQL developers are often very inefficient in facing complex query logic.
In order to improve the design efficiency of SQL sentences, a plurality of auxiliary tools, such as SQL sentence complementation tools, are appeared. In the process of writing the SQL sentence by the user, the SQL sentence complementation tool can automatically provide code suggestions according to the content which is input by the user and the known database structure, thereby helping the user to construct the SQL sentence more quickly. The function has the advantages that firstly, SQL sentence complementation replaces a purely manual mode to write the SQL sentence to a certain extent, the working efficiency is improved, more input errors possibly occurring in the purely manual mode are reduced, and secondly, the SQL sentence complementation provides prompts such as table names, field names and the like, so that the SQL sentence writing difficulty is reduced for users unfamiliar with a database structure.
At present, the SQL sentence completion function is realized mainly by the following method:
(1) The keyword matching-based method is used for giving a complement suggestion according to the input content of the user. For example, when the user enters a partial start content of a table name or column name, the tool will list all matching complete table names or column names for the user to select. The keyword matching-based method is the most basic implementation method, has higher efficiency, and is difficult to provide enough help for complex queries such as multi-table connection queries.
(2) Template-based methods, which are to build templates according to common SQL query patterns and generate complement suggestions according to the templates. The method has obvious advantages and disadvantages, namely good complementing effect when the SQL sentences covered by the template are adopted, and poor complementing effect when the SQL sentences not covered by the template are otherwise adopted. Therefore, the cost of the maintenance of the template is high. For less conventional or more complex queries, it is difficult to have an adapted template, so SQL complementation works poorly.
(3) A statistical or machine learning-based method includes such steps as analyzing the historical inquiry record of user, learning the inquiry habit of user, and providing the complement advice based on it, or training deep learning model, predicting the inquiry intention of user, and providing the complement advice based on the intention. Such methods rely heavily on historical data quality for training, low quality historical data or noise data, and have a relatively large negative impact on the quality of SQL completions, and in addition, the training and maintenance costs of the model are relatively high.
Drawings
FIG. 1 is a fragment of an SQL statement.
FIG. 2 is a diagram of the system modules and components of the present invention and the data flow process.
FIG. 3 is a user input SQL and SQL code for generating a syntax tree.
Fig. 4 is a usage record of table names.
FIG. 5 is a table name ordering algorithm flow chart.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a database security management and control platform SQL sentence complementation method based on a grammar tree, the method uses a parser to convert a portion of the SQL statement input by the user into a syntax tree, and then predicts the next possible content based on the syntax tree structure and context information. The method specifically comprises the following steps:
(1) The construction of a grammar tree, namely, inputting SQL sentences in an editor by a user to generate an original grammar tree;
(2) The module receives the original grammar tree generated by the grammar tree construction module, traverses the original grammar tree, adds ID and father node ID for each node, marks the table name and column name, adds type information for clause nodes and outputs the grammar tree after reinforcement;
(3) Generating a complement suggestion, namely giving the complement suggestion according to the enhanced grammar tree and the current cursor position;
(4) Database object management, namely returning the database object information according to the database object information request;
(5) Displaying the complement suggestion in the editor UI;
And when the completion display is performed, the SQL sentence completion result is displayed to the user according to the keyword, the table name and the column name, the higher ranking score is given to the newer record, the lower ranking score is given to the longer record, and the disturbance is added into the decay function.
Further, in the editor integration module, sorting is performed according to the keywords, the table names and the column names, and the method specifically comprises the following steps:
3.1 preparing a table name sequence x n,xn-1,…,x2,x1, initializing an empty set S for storing the processed table names and the corresponding sorting scores thereof, wherein n is the number of the table names in the history record, and x i represents the table name with the sequence number i, and i is more than or equal to 1 and less than or equal to n.
3.2 Calculating parametersWherein, α max and α min are respectively the maximum value and the minimum value of the preset parameter α, log () represents the base 10 logarithm, and n max is the preset maximum value of n;
3.3, calculating the sorting score for the table name with the sequence number of i, wherein i is circularly calculated from 1 to n, and the calculation process is as follows:
① Calculating a ranking score f (i) =e -αi(1+b×sin2 ((1- α) i+c)), where sin () is a sine function, b e [0,1 ], c is the phase offset;
② Inquiring whether the information of the table name x i exists in the set S, if the information of the table name x i exists in the set S, finding out the sorting score corresponding to the table name x i from the set S, and marking as At this point, a new rank score for table name x i is calculatedAnd will beUpdate to set S, order the ranking score of table name x i if the table name x i information was not previously present in set SAnd will beAdded to the set S.
③ See if there are other table names following table name x i in the table name sequence. If there are other table names, go to step ①, and if there are no other table names, the description is completed, go to step 3.4.
3.4 Rank the table names in set S from big to small in rank score.
Based on the method, the invention also provides a database security management and control platform SQL sentence completion system based on the grammar tree, which comprises the following steps:
(1) The grammar tree construction module receives SQL sentences input by a user and generates an original grammar tree;
(2) The grammar tree enhancement module is used for receiving the original grammar tree generated by the grammar tree construction module, traversing the original grammar tree, adding an ID and a father node ID for each node, marking a table name and a column name, adding type information for clause nodes, and outputting the enhanced grammar tree;
(3) The completion suggestion generation module is used for giving a completion suggestion according to the enhanced grammar tree and the current cursor position;
(4) The database object management module returns the database object information according to the database object information request;
(5) An editor integration module that displays the complement suggestion in an editor UI;
And when the completion display is performed, the SQL sentence completion result is displayed to the user according to the keyword, the table name and the column name, the higher ranking score is given to the newer record, the lower ranking score is given to the longer record, and the disturbance is added into the decay function.
Further, in the editor integration module, sorting is performed according to the keywords, the table names and the column names, and the method specifically comprises the following steps:
3.1 preparing a table name sequence x n,xn-1,…,x2,x1, initializing an empty set S for storing the processed table names and the corresponding sorting scores thereof, wherein n is the number of the table names in the history record, and x i represents the table name with the sequence number i, and i is more than or equal to 1 and less than or equal to n.
3.2 Calculating parametersWherein, α max and α min are respectively the maximum value and the minimum value of the preset parameter α, log () represents the base 10 logarithm, and n max is the preset maximum value of n;
3.3, calculating the sorting score for the table name with the sequence number of i, wherein i is circularly calculated from 1 to n, and the calculation process is as follows:
① Calculating a ranking score f (i) =e -αi(1+b×sin2 ((1- α) i+c)), where sin () is a sine function, b e [0,1 ], c is the phase offset;
② Inquiring whether the information of the table name x i exists in the set S, if the information of the table name x i exists in the set S, finding out the sorting score corresponding to the table name x i from the set S, and marking as At this point, a new rank score for table name x i is calculatedAnd will beUpdate to set S, order the ranking score of table name x i if the table name x i information was not previously present in set SAnd will beAdded to the set S.
③ See if there are other table names following table name x i in the table name sequence. If there are other table names, go to step ①, and if there are no other table names, the description is completed, go to step 3.4.
3.4 Rank the table names in set S from big to small in rank score.
The invention can provide accurate complement suggestion according to the structure and logic of SQL sentences, has better processing capability for complex SQL sentences such as nested sub-queries, joint queries and the like, can accurately identify grammar errors in the SQL sentences, and is easy to realize the functions such as SQL sentence formatting, code highlighting and the like. The invention ensures that for each table name in the historical data, the ranking score can be calculated based on the time decay and the specific periodic disturbance, so that the data score of longer time is lower and the data score of more recent time is higher as a whole, and the repeatability of the data is considered.
Detailed Description
Example 1
Based on the analysis, the invention designs an SQL sentence completion method based on a grammar tree (Abstract Syntax Tree, AST), which is named AST-SQL. The method uses a parser to convert a part of SQL sentences input by a user into a grammar tree, and predicts the next possible content according to the grammar tree structure and the context information, and takes the next possible content as a complement suggestion.
A syntax tree is a tree-like representation of the source code syntax structure, that is, the structure of a program is represented in the form of a tree, each node on the tree representing a syntax structure such as an expression, statement or type declaration, etc. In SQL, in particular, the syntax tree can represent the constituent parts of the SQL statement in a hierarchical and clear manner. The AST-SQL method can provide accurate complement suggestion according to the current context because the structure and logic of SQL sentences can be obtained through a grammar tree, and has better processing capability even for complex SQL sentences such as nested sub-queries, joint queries and the like. Meanwhile, the implementation basis of the method is a grammar tree, so that grammar errors in SQL sentences have congenital advantages in recognition capability. In addition, functions like SQL statement formatting, code highlighting and the like can be easily realized as auxiliary functions on the basis of a grammar tree.
In the AST-SQL method, each node of the syntax tree has some fixed attributes, including:
(1) ID-a unique ID per node.
(2) Type Each node has a type attribute, which indicates the type of the node, and specifically comprises statement,clause,keyword,property_access,identifier,comma,function_call,parenthesis,operator,literal.
(3) Index, the elements of the array child have index subscript attributes, so that the elements are convenient to search forward and backward.
According to the definition of the AST-SQL method for the node, taking a simple SQL statement as an example, it can be roughly split into fragments as in fig. 1.
On the basis of an AST-SQL method, the invention designs an SQL sentence completion system based on a grammar tree, and key components of the system comprise:
(1) SQLFormatterAst the component is used to generate the syntax tree of the SQL statement. In the database field, the components are relatively mature, and in general, tools for formatting SQL sentences have the core of building a grammar tree around the SQL sentences.
(2) SqlAst the method is used for analyzing the grammar tree generated by SQLFormatterAst, the meta information contained in the grammar tree is more conventional, and in order to be suitable for being used in SQL sentence completion, additional meta information such as node ID (ID field), father node ID (parentId field) and the like is added in the conventional grammar tree, and the additional meta information is helpful for establishing the hierarchical relationship among different nodes, so that a precondition is provided for improving the efficiency of the method. The component ultimately outputs an enhanced syntax tree that contains structural information about the SQL statement.
(3) SqlAdvisor determining appropriate complement suggestions based on the cursor position and the syntax tree. The component locates the node where the cursor is located, then analyzes the context of the node (e.g., clause type, node type, etc.), and generates a completion suggestion based on the analysis result. The completion suggestion is finally output, including the type of completion (table name, column name, keywords, etc.) and related context information.
(4) DatabaseObjectStore storing information about database objects (e.g., tables and columns) for providing completion options. The component loads the information of the corresponding database object according to the authority of the user and provides the complement option according to the request.
(5) SqlCompletionProvider implementing completion logic in the editor, including exposing completion suggestions. In flow, the component receives SqlAdvisor the generated completion suggestion and generates the completion term in the editor based on the completion suggestion and then presents the completion term to the user.
The key modules included in the system are:
(1) Grammar tree construction this module receives the SQL statement entered by the user and then uses SQLFormatterAst components to generate the original grammar tree.
(2) Syntax tree enhancement-the module receives the original syntax tree generated by the syntax tree construction module and then uses SqlAst components to do the following:
1) Traversing the original grammar tree, adding ID, father node ID and other information for each node,
2) Specific nodes are marked, such as table names, column names, etc.
3) Type information is added for clause nodes.
And finally, outputting the enhanced grammar tree.
(3) And generating a complement suggestion, namely giving the complement suggestion according to the enhanced grammar tree and the current cursor position. The module mainly uses SqlAdvisor components which firstly locate the node where the cursor is located, then analyze the context of the cursor node, determine the completion type and finally generate the completion suggestion, including the completion type and the context information.
(4) And (3) managing the database object, namely returning the database object information by the module according to the database object information request on the basis of the DatabaseObjectStore component.
(5) Editor integration the module uses SqlCompletionProvider components to follow up the complement suggestion and displays it specifically in the editor UI.
The SQL statement based on AST-SQL complements the complete module and component diagram of the system and the data flow process is shown in figure 2. The method mainly comprises the following implementation steps:
(1) Syntax tree construction
The user inputs SQL sentences in the editor, and the editor records the cursor position in the input process. In the AST-SQL method, the cursor position is marked with a special string __ cursor __. The SQLFormatterAst component is used to parse the SQL statement with __ cursor __ markup, generating the original syntax tree.
It should be noted that, the user may input many SQL statements in one SQL editor, and when the SQL statements are completed, only the SQL statement in which the current cursor is located need to be analyzed, instead of all the SQL statements in the whole SQL editor. The SQL sentences are divided by the marks, the method for acquiring the SQL sentences where the cursor is located is to search the nearest mark position before the cursor and the nearest mark position after the cursor, and the content between the 2 marks is the SQL sentences where the cursor is located. The semicolons used in this process are not included in either case, one in SQL notes, such as Hello, world, and the other in strings, such as 'Hello, world'.
FIG. 3 illustrates in specific contrast the user input SQL and SQL used to generate the grammar tree (where | represents the cursor).
(2) Syntax tree enhancement
In this step, the SqlAst component augments the original syntax tree that was generated, assigning each node information such as ID, type, parent and child nodes. Traversing clauses in the grammar tree, and carrying out different marking operations, such as marking table names, column names and the like, according to the clause types.
(3) Cursor position location
The SqlAdvisor component looks up the node labeled __ cursor __ and determines the node where the cursor is located. And searching the parent node of the cursor node by a recursion method until the nearest clause node is found. The completion type (table name, column name, keyword, etc.) is judged according to the type and position of the cursor node.
(4) Complement suggestion generation
A completion suggestion is generated SqlAdvisor based on the position and context of the cursor node. In short, the complement suggestion is to obtain what type of data, such as table name, column name, and common keywords, should be prompted to input by the current cursor position. Therefore, the content generated by the partial complement proposal is trivial and includes more head ends, and mainly includes the following cases:
1) Complement table name:
The table names, including view names, are completed in the system. The case of complementing table names mainly includes:
in the FROM clause, traversing forward in the grammar tree, complementing column names if ON is encountered, complementing table names if JOIN is encountered, complementing table names if comma or the first element is in front.
INSERT INTO clause, complement the table name, and may require automatic addition of aliases, etc. at full time.
INSERT IGNORE INTO clauses, complement table names.
Replacement INTO clause, complement table name.
DELETE FROM clause, complement table name.
UPDATE clause, complement table name.
DROP TABLE clause, complement TABLE name.
CREATE VIEW clauses-View map title does not require complementation.
VALUES clauses, in which the values entered by the user are not required to be completed.
DROP VIEW clause-complement VIEW map title.
The TRUNCATE TABLE clause, complement the TABLE name.
ALTER TABLE clause, complement TABLE name.
2) Complement column name:
SELECT clause, complement column names according to tables in the FROM clause.
WHERE clause, complement column name.
GROUP BY clause, complement column name.
ORDER BY clause, complement column name.
HAVING clause, complement column name.
3) Complement key:
in other cases, the keywords are complemented.
(5) Database object retrieval
The corresponding database object is retrieved from DatabaseObjectStore components according to the type of complement suggestion. If some tables or columns have not been loaded, an asynchronous load is performed and the completion list is updated.
For table name completion, if a user inputs a schema in SQL, then the table under the schema is completed, and if no schema is input, then all tables are completed.
For column name completion, if a table name is input by the user, the column of the table is completed, and if no table name is input, the completion is performed according to the table in the FROM clause.
(6) Completion presentation in an editor
The completion suggestions generated by the SqlAdvisor component are displayed to the user by the SqlCompletionProvider component. The presentation of the complement suggestion takes into account differences in different types of hinting items, such as keywords, table names, column names, etc., and supports different types of icons.
In the complement presentation, a ranking algorithm is designed, which can rank according to keywords, table names and column names, respectively. And when the SQL sentence completion result is displayed to the user, the result is sequenced according to the sequencing algorithm and then displayed. The algorithm is calculated by giving a higher ranking score to newer records and a lower ranking score to longer records, wherein the lower ranking score contains a decay function. At the same time, in order to avoid that the attenuation process is too ideal, a slight disturbance is added to the attenuation function.
The system stores the usage records of the key words, table names and column names of the user, and the usage records of the table names are shown in fig. 4.
The usage record of Table names includes Table names and the order of usage, and in the above example, the user sequentially accesses tables such as Table4, table3, table2. The first Table4 was used a long time ago, while the last Table2 was last used just recently. The Table names are numbered according to the reverse order of the actual use order, so that the corresponding serial numbers of the Table names can be obtained, the last Table2 serial number is 1, the first Table4 serial number is 10, and the other serial numbers are sequentially numbered. The method and process of keywords and column names are consistent with table names, so the algorithm will be described only by taking the table names as examples.
According to the above rule, let the sequence of table names be x n,xn-1,…,x2,x1, where n is the number of table names in the history, it should be noted that there may be duplication in these n table names. In this sequence, x i represents the table name with the sequence number i, 1.ltoreq.i.ltoreq.n.
The specific steps of the algorithm are shown in fig. 5. The specific information of each step is as follows:
1) Initialization of
The table name sequence x n,xn-1,…,x2,x1 is prepared and the empty set S is initialized for storing the processed table names and their corresponding ranking scores.
2) Parameter calculation
The parameter alpha is calculated, and the parameter is required to be used in the following steps, and the calculation method comprises the following steps:
Where α max and α min are the maximum and minimum values of the preset parameter α, respectively, log () represents the base-10 logarithm, and n max is the preset maximum value of n.
3) Historical data traversal
The calculation of the rank scores is performed on table names with the sequence number i, note that i is calculated circularly from 1 to n.
The calculation process is as follows:
① The method for calculating the sequencing score comprises the following steps:
f(i)=e-αi(1+b×sin2((1-α)i+c))
where sin () is a sine function, b e [0, 1) controls the amplitude of the periodic disturbance, c is the phase offset, and controls the starting position of the disturbance.
② Repeating the table name check:
Query set S is made as to whether there is information for table name x i already in set S.
If the inspection result shows that the information of table name x i already exists in set S, the ranking score corresponding to table name x i is found out from set S and recorded asAt this time, a new ranking score of table name x i is calculated by: And the new ranking score of the calculated table name x i Updated into the set S.
If the inspection results show that the information for table name x i was not previously present in set S, then let the table name x i rank scoreAnd the table name x i and the calculated ranking scoreAdded to the set S.
③ Traversing:
See if there are other table names following table name x i in the table name sequence. If there are other table names, the step a is skipped to continue processing, otherwise, if there are no other table names, the step a is continued after the traversal is completed.
4) Result output
The processed table names and the corresponding sorting scores thereof are stored in the set S, and the table names are sorted according to the sorting scores from big to small, so that the table name sequence for display output can be obtained.
The algorithm ensures that for each table name in the historical data, a ranking score can be calculated based on the time decay and the particular periodic disturbance, resulting in a lower data score for longer and higher data score for more recent as a whole, while taking into account the repeatability of the data.
Example 2
The following describes the use of the SQL statement completion system with specific examples.
Assume that there are 3 SQL statements:
these 3 SQL statements are not complete, and each SQL is actually in the middle of editing. In the SQL editor, there should be one SQL sentence complement suggestion for each possible input content of the SQL current position (the current input position, i.e. cursor position, is represented by "|" in the 3 SQL sentences). The specific process is as follows:
(1) Syntax tree construction
The user inputs SQL sentences in the editor, and the editor records the cursor position in the input process. In the AST-SQL method, the cursor position is marked with a special string __ cursor __. According to the method, the 3 SQL sentences are respectively converted into:
The SQLFormatterAst component then parses the SQL statement with __ cursor __ markup, generating the original syntax tree.
(2) Syntax tree enhancement
In this step, the SqlAst component augments the original syntax tree that was generated, assigning each node information such as ID, type, parent and child nodes. Traversing clauses in the grammar tree, and carrying out different marking operations, such as marking table names, column names and the like, according to the clause types. According to the method, the enhanced grammar trees corresponding to the 3 SQL sentences are respectively as follows:
(3) Cursor position location
The SqlAdvisor component looks up the node labeled __ cursor __ and determines the node where the cursor is located. And searching the parent node of the cursor node by a recursion method until the nearest clause node is found. The completion type (table name, column name, keyword, etc.) is judged according to the type and position of the cursor node.
From the generated grammar tree, the 3 SQL sentences should be complemented with column names, table names and table names, respectively.
(4) Complement suggestion generation
A completion suggestion is generated SqlAdvisor based on the position and context of the cursor node. In short, the complement suggestion is to obtain what type of data, such as table name, column name, and common keywords, should be prompted to input by the current cursor position. From the generated grammar tree, the 3 SQL sentences should be presented with column names, table names and table names, respectively.
(5) Database object retrieval
The corresponding database object is retrieved from DatabaseObjectStore components according to the type of complement suggestion. If some tables or columns have not been loaded, an asynchronous load is performed and the completion list is updated.
For table name completion, if a user inputs a schema in SQL, then the table under the schema is completed, and if no schema is input, then all tables are completed.
For column name completion, if a table name is input by the user, the column of the table is completed, and if no table name is input, the completion is performed according to the table in the FROM clause.
For the 3 SQL sentences, the 1 st SQL sentence needs to supplement column names, but no table names are provided, so that the table names cannot be prompted, and for the 2 nd and 3 rd SQL sentences, the table names need to be supplemented, and at the moment, the corresponding database tables are retrieved from the DatabaseObjectStore component, so that a table name list is obtained.
(6) Completion presentation in an editor
The completion suggestions generated by the SqlAdvisor component are displayed to the user by the SqlCompletionProvider component. The presentation of the complement suggestion takes into account differences in different types of hinting items, such as keywords, table names, column names, etc., and supports different types of icons.
For the 3 SQL sentences, the 1 st SQL sentence needs to supplement column names, but no table names are provided, so that the 3 st SQL sentence cannot be prompted, and for the 2 nd and 3 rd SQL sentences, the corresponding database tables are retrieved from the DatabaseObjectStore components to obtain a table name list, and the table name list is provided for the SqlAdvisor components. If the part of the list is found in the use record, the sorting score of the part of the list needs to be calculated and displayed in front of the prompt area, and if the part of the list is not used in the use record, the list is displayed in back of the prompt area after being sequentially arranged by letters.
The calculation of the ranking score adopts a ranking algorithm which can rank according to keywords, table names and column names respectively. And when the SQL sentence completion result is displayed to the user, the result is sequenced according to the sequencing algorithm and then displayed. The algorithm is calculated by giving a higher ranking score to newer records and a lower ranking score to longer records, wherein the lower ranking score contains a decay function. At the same time, in order to avoid that the attenuation process is too ideal, a slight disturbance is added to the attenuation function.
The specific steps of the algorithm of the ordered set of table name historical data :X={x10,x9,…,x1}={"student","scores","courses","student","departments","scores","student","scores","courses","student"}, are:
1) Initialization of
The initialization empty set S is used to store the processed table names and their corresponding ranking scores, n=10.
2) Parameter calculation
Let a max=0.5,amin=0.1,nmax =15, b=0.2, c=0.5, according to the given formula
The parameter α, α≡0.366 can be calculated.
3) Historical data traversal
The calculation of the rank scores is performed on table names with the sequence number i, note that i is calculated circularly from 1 to n.
The calculation process is as follows:
① The method for calculating the sequencing score comprises the following steps:
f(i)=e-αi(1+b×sin2((1-α)i+c))
where sin () is a sine function, b e [0, 1) controls the amplitude of the periodic disturbance, c is the phase offset, and controls the starting position of the disturbance.
② Repeating the table name check:
Query set S is made as to whether there is information for table name x i already in set S.
If the inspection result shows that the information of table name x i already exists in set S, the ranking score corresponding to table name x i is found out from set S and recorded asAt this time, a new ranking score of table name x i is calculated by: And the new ranking score of the calculated table name x i Updated into the set S.
If the inspection results show that the information for table name x i was not previously present in set S, then let the table name x i rank scoreAnd the table name x i and the calculated ranking scoreAdded to the set S.
③ Traversing:
See if there are other table names following table name x i in the table name sequence. If there are other table names, the step a is skipped to continue processing, otherwise, if there are no other table names, the step a is continued after the traversal is completed.
The specific calculation process is as follows:
when i=1, the table name x 1, i.e. the table name "student",
f(1)=e-0.366×1(1+0.2×sin2((1-0.366)×1+0.5))≈0.697
The table name "student" and its rank score of 0.697 are added to set S.
When i=2, the processing table name x 2, i.e. table name "courses",
f(2)=e-0.366×2(1+0.2×sin2((1-0.366)×2+0.5))≈0.483
The table name "courses" and its rank score 0.483 are added to set S.
When i=3, the table name x 3, i.e. the table name "score",
f(3)=e-0.366×3(1+0.2×sin2((1-0.366)×3+0.5))≈0.332
The table name "score" and its rank score 0.332 are added to set S.
When i=4, the table name x 4, i.e. the table name "student",
f(4)=e-0.366×4(1+0.2×sin2((1-0.366)×4+0.5))≈0.228
Because the table name "student" was previously present in set S, the rank score corresponding to table name x 4 was found from set S and recorded asOn the basis of which the weight is updated:
The ranking score of the table name "student" in set S is updated to 0.732.
When i=5, the table name x 5, i.e. the table name "score",
f(5)=e-0.366×5(1+0.2×sin2((1-0.366)×5+0.5))≈0.157
Because the table name "score" was previously present in set S, the rank score corresponding to table name x 5 was found from set S and recorded asOn the basis of which the weight is updated:
The ranking score of the table name "score" in set S is updated to 0.415.
And so on, the following steps are respectively:
When i=6, the table name "departments" and its ranking score 0.105 are added to the set S.
When i=7, the ranking score of the table name "student" in the set S is updated to 0.753.
When i=8, the ranking score of table name "courses" in set S is updated to 0.502.
When i=9, the ranking score of the table name "score" in the set S is updated to 0.432.
When i=10, the ranking score of the table name "student" in the set S is updated to 0.767.
4) Result output
The processed table names and the corresponding sorting scores thereof are stored in the set S, and the table names are sorted according to the sorting scores from big to small, so that the table name sequence for display output can be obtained. The results of the above example are as follows:
Finally, four table names of student, courses, score, departments may be displayed in order in the completion suggestion for presentation to the user.
The algorithm sorting process has the advantages that ① is simple and easy to implement, a complex mathematical model or a deep learning framework is not needed, understanding and implementation are easy, a large amount of computing resources are not needed, and implementation difficulty is low. ② The time sensitivity, algorithm takes into account the time decay factor, which means that more recent data has a higher score, which helps to capture the latest trend with higher accuracy. ③ The repeated data processing is carried out, the algorithm effectively processes the repeated data, and the score calculation method accurately reflects the repeatability of the data points and simultaneously adapts to the time attenuation characteristic. ④ The algorithm can calculate results quickly in response, which is very useful for application scenarios requiring real-time feedback.

Claims (4)

1.一种基于语法树的数据库安全管控平台SQL语句补全方法,该方法使用解析器将用户输入的部分SQL语句转换为语法树,然后根据语法树结构和上下文信息,预测下一个可能的内容,具体包括以下步骤:1. A SQL statement completion method for a database security management platform based on a syntax tree. The method uses a parser to convert part of the SQL statement input by the user into a syntax tree, and then predicts the next possible content based on the syntax tree structure and context information. The method specifically includes the following steps: (1)语法树构建:用户在编辑器中输入SQL语句,生成原始语法树;(1) Syntax tree construction: The user enters the SQL statement in the editor to generate the original syntax tree; (2)语法树增强:该模块接收语法树构建模块生成的原始语法树,遍历原始语法树,为每个节点添加ID、父节点ID,再对表名、列名进行标记,最后为子句节点添加类型信息,输出增强后的语法树;(2) Syntax tree enhancement: This module receives the original syntax tree generated by the syntax tree construction module, traverses the original syntax tree, adds an ID and a parent node ID to each node, then marks the table name and column name, and finally adds type information to the clause node, and outputs the enhanced syntax tree; (3)补全建议生成:根据增强后的语法树及当前光标的位置,给出补全建议;(3) Completion suggestion generation: Completion suggestions are given based on the enhanced syntax tree and the current cursor position; (4)数据库对象管理:根据数据库对象信息请求,返回数据库对象信息;(4) Database object management: Return database object information based on database object information request; (5)编辑器集成:在编辑器UI中显示补全建议;(5) Editor integration: Display completion suggestions in the editor UI; 其特征在于:补全显示时,按照关键字、表名、列名排序,将SQL语句补全结果展示给用户,给较新的记录较高的排序分数,给较久的记录较低的排序分数,并在衰减函数中加入扰动。Its characteristics are as follows: when displaying completion, the SQL statement completion results are displayed to the user in order of keywords, table names, and column names, higher ranking scores are given to newer records, lower ranking scores are given to older records, and disturbances are added to the attenuation function. 2.如权利要求1所述的一种基于语法树的数据库安全管控平台SQL语句补全方法,其特征在于:编辑器集成模块中,根据关键字、表名、列名进行排序,具体包括以下步骤:2. A method for completing SQL statements of a database security management and control platform based on a syntax tree as claimed in claim 1, characterized in that: in the editor integration module, sorting is performed according to keywords, table names, and column names, specifically comprising the following steps: 3.1准备好表名序列xn,xn-1,…,x2,x1,并初始化空集合S用于存储已处理过的表名及其对应的排序分值,n为历史记录中的表名个数,xi表示序号为i的表名,1≤i≤n。3.1 Prepare the table name sequence xn , xn -1 , ..., x2 , x1 , and initialize the empty set S to store the processed table names and their corresponding sorting scores. n is the number of table names in the historical records, xi represents the table name with sequence number i, 1≤i≤n. 3.2计算参数其中,αmax和αmin分别为预设的参数α的最大值和最小值,log()表示以10为底的对数,nmax是预设的n的最大值;3.2 Calculation parameters Wherein, α max and α min are respectively the maximum and minimum values of the preset parameter α, log() represents the logarithm with base 10, and n max is the maximum value of the preset n; 3.3对序号为i的表名计算排序分值,i从1开始,循环计算到n,计算过程为:3.3 Calculate the ranking score for the table name with serial number i, i starts from 1 and loops to n. The calculation process is: ①计算排序分值f(i)=e-αi(1+b×sin2((1-α)i+c)),其中,sin()为正弦函数,b∈[0,1),c为相位偏移;① Calculate the ranking score f(i) = e -αi (1+b×sin 2 ((1-α)i+c)), where sin() is the sine function, b∈[0,1), and c is the phase offset; ②查询集合S中是否已存在表名xi的信息,如果表名xi信息已存在于集合S中,则从集合S中找出表名xi对应的排序分值,记为此时计算表名xi新的排序分值并将更新到集合S中;如果表名xi信息此前并未存在于集合S中,则令表名xi的排序分值并将添加到集合S中。② Query whether the information of table name xi already exists in the set S. If the information of table name xi already exists in the set S, find the ranking score corresponding to table name xi from the set S, which is recorded as At this time, calculate the new ranking score of table name x i and will Update to the set S; if the table name x i information does not exist in the set S before, then let the ranking score of table name x i and will Add to the set S. ③查看表名序列中表名xi后是否还有其他表名。如果还有其他表名,则跳转到步骤①;如果没有其他表名,说明已遍历完毕,转到步骤3.4。③ Check whether there are other table names after table name x i in the table name sequence. If there are other table names, jump to step ①; if there are no other table names, it means that the traversal has been completed, and go to step 3.4. 3.4按照排序分值从大到小对集合S中的表名排序。3.4 Sort the table names in set S according to the sorting scores from large to small. 3.一种基于语法树的数据库安全管控平台SQL语句补全系统,该系统包括:3. A SQL statement completion system for a database security management and control platform based on a syntax tree, the system comprising: (1)语法树构建模块:该模块接收用户输入的SQL语句,生成原始语法树;(1) Syntax tree construction module: This module receives the SQL statement input by the user and generates the original syntax tree; (2)语法树增强模块:该模块接收语法树构建模块生成的原始语法树,遍历原始语法树,为每个节点添加ID、父节点ID,再对表名、列名进行标记,最后为子句节点添加类型信息,输出增强后的语法树;(2) Syntax tree enhancement module: This module receives the original syntax tree generated by the syntax tree construction module, traverses the original syntax tree, adds an ID and a parent node ID to each node, then marks the table name and column name, and finally adds type information to the clause node and outputs the enhanced syntax tree; (3)补全建议生成模块:该模块根据增强后的语法树及当前光标的位置,给出补全建议;(3) Completion suggestion generation module: This module gives completion suggestions based on the enhanced syntax tree and the current cursor position; (4)数据库对象管理模块:该模块根据数据库对象信息请求,返回数据库对象信息;(4) Database object management module: This module returns database object information according to the database object information request; (5)编辑器集成模块:该模块在编辑器UI中显示补全建议;(5) Editor integration module: This module displays completion suggestions in the editor UI; 其特征在于:补全显示时,按照关键字、表名、列名排序,将SQL语句补全结果展示给用户,给较新的记录较高的排序分数,给较久的记录较低的排序分数,并在衰减函数中加入扰动。Its characteristics are as follows: when displaying completion, the SQL statement completion results are displayed to the user in order of keywords, table names, and column names, higher ranking scores are given to newer records, lower ranking scores are given to older records, and disturbances are added to the attenuation function. 4.如权利要求3所述的一种基于语法树的数据库安全管控平台SQL语句补全系统,其特征在于:编辑器集成模块中,根据关键字、表名、列名进行排序,具体包括以下步骤:4. A syntax tree-based database security management and control platform SQL statement completion system as claimed in claim 3, characterized in that: in the editor integration module, sorting is performed according to keywords, table names, and column names, specifically comprising the following steps: 3.1准备好表名序列xn,xn-1,…,x2,x1,并初始化空集合S用于存储已处理过的表名及其对应的排序分值,n为历史记录中的表名个数,xi表示序号为i的表名,1≤i≤n。3.1 Prepare the table name sequence xn , xn -1 , ..., x2 , x1 , and initialize the empty set S to store the processed table names and their corresponding sorting scores. n is the number of table names in the historical records, xi represents the table name with sequence number i, 1≤i≤n. 3.2计算参数其中,αmax和αmin分别为预设的参数α的最大值和最小值,log()表示以10为底的对数,nmax是预设的n的最大值;3.2 Calculation parameters Wherein, α max and α min are respectively the maximum and minimum values of the preset parameter α, log() represents the logarithm with base 10, and n max is the maximum value of the preset n; 3.3对序号为i的表名计算排序分值,i从1开始,循环计算到n,计算过程为:3.3 Calculate the ranking score for the table name with serial number i, i starts from 1 and loops to n. The calculation process is: ①计算排序分值f(i)=e-αi(1+b×sin2((1-α)i+c)),其中,sin()为正弦函数,b∈[0,1),c为相位偏移;① Calculate the ranking score f(i) = e -αi (1+b×sin 2 ((1-α)i+c)), where sin() is the sine function, b∈[0,1), and c is the phase offset; ②查询集合S中是否已存在表名xi的信息,如果表名xi信息已存在于集合S中,则从集合S中找出表名xi对应的排序分值,记为此时计算表名xi新的排序分值并将更新到集合S中;如果表名xi信息此前并未存在于集合S中,则令表名xi的排序分值并将添加到集合S中。② Query whether the information of table name xi already exists in the set S. If the information of table name xi already exists in the set S, find the ranking score corresponding to table name xi from the set S, which is recorded as At this time, calculate the new ranking score of table name x i and will Update to the set S; if the table name x i information does not exist in the set S before, then let the ranking score of table name x i and will Add to the set S. ③查看表名序列中表名xi后是否还有其他表名。如果还有其他表名,则跳转到步骤①;如果没有其他表名,说明已遍历完毕,转到步骤3.4。③ Check whether there are other table names after table name x i in the table name sequence. If there are other table names, jump to step ①; if there are no other table names, it means that the traversal has been completed, and go to step 3.4. 3.4按照排序分值从大到小对集合S中的表名排序。3.4 Sort the table names in set S according to the sorting scores from large to small.
CN202411789482.3A 2024-12-06 2024-12-06 A SQL statement completion method and system for database security management and control platform based on syntax tree Pending CN119719135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411789482.3A CN119719135A (en) 2024-12-06 2024-12-06 A SQL statement completion method and system for database security management and control platform based on syntax tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411789482.3A CN119719135A (en) 2024-12-06 2024-12-06 A SQL statement completion method and system for database security management and control platform based on syntax tree

Publications (1)

Publication Number Publication Date
CN119719135A true CN119719135A (en) 2025-03-28

Family

ID=95092505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411789482.3A Pending CN119719135A (en) 2024-12-06 2024-12-06 A SQL statement completion method and system for database security management and control platform based on syntax tree

Country Status (1)

Country Link
CN (1) CN119719135A (en)

Similar Documents

Publication Publication Date Title
JP3639972B2 (en) Method and apparatus for database modeling and query using a structure similar to natural language
US6594669B2 (en) Method for querying a database in which a query statement is issued to a database management system for which data types can be defined
US6327593B1 (en) Automated system and method for capturing and managing user knowledge within a search system
US6985899B2 (en) Method and system for composing a query for a database and traversing the database
JP2002297605A (en) Structured document search method, structured document search device, and program
JPH10509264A (en) Computer system to create semantic object model from existing relational database schema
US20050091199A1 (en) Method and system for generating SQL joins to optimize performance
WO2006098031A1 (en) Keyword managing apparatus
CN113326286A (en) Semantic analysis method supporting dialect SQL blood margin analysis
CN119025552A (en) A natural language intelligent number asking method, device and storage medium
JP2006172446A (en) Complex data access
CN100390794C (en) A method for organizing communication equipment command sets in the form of navigation tree
CN119272753B (en) Developing intelligent document parsing methods
CN119271697B (en) Method for reducing complex SQL generated by natural language by building table schema and SQL conversion
EP3816814B1 (en) Crux detection in search definitions
CN119719135A (en) A SQL statement completion method and system for database security management and control platform based on syntax tree
CN120030045A (en) A data query and analysis method and system based on LLM and semantic model
CN114676155B (en) Method for determining code prompt information, method for determining data set, and electronic device
CN109739835A (en) Method and device for saving data version
KR101020234B1 (en) Method and apparatus for creating a database application source
US20030028370A1 (en) System and method for providing a fixed grammar to allow a user to create a relational database without programming
CN118484197B (en) Configuration-driven SQL semantic relation construction device and method
McDowell A reusable component retrieval system for prototyping.
JP2002297603A (en) Information extraction method, structured document management device, and program
CN120929486A (en) Multi-round dialogue database query method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination