
WO2019095768A1 - User information screening method, server, and computer-readable storage medium - Google Patents

User information screening method, server, and computer-readable storage medium

Info

Publication number
WO2019095768A1
Authority
WO
WIPO (PCT)
Prior art keywords
user information
probability
correct probability
correct
piece
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/102396
Other languages
English (en)
Chinese (zh)
Inventor
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Publication of WO2019095768A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Definitions

  • the present application relates to the field of data analysis and application technologies, and in particular, to a user information screening method, a server, and a computer readable storage medium.
  • the volume of entered policy data has also grown explosively.
  • the source data is mostly imported manually, so erroneous information inevitably arises during the manual import process.
  • the user information data in the information database cannot be classified by accuracy, and user information data with high accuracy cannot be reliably selected.
  • the present application proposes a user information screening method, a server, and a computer readable storage medium to quickly and accurately filter comprehensive and accurate user information data from a complex information database.
  • the present application provides a server that includes a memory, a processor, and a user information screening program stored in the memory and executable on the processor. When the user information screening program is executed by the processor, the following steps are performed: reading each piece of user information; determining a correct probability of each element in each piece of user information according to a preset determination rule; calculating the correct probability of the corresponding user information according to the correct probabilities of its elements; and selecting the user information whose correct probability is greater than a preset probability threshold for correctness classification.
  • the present application further provides a user information screening method applied to a server. The method includes: reading each piece of user information; determining a correct probability of each element in each piece of user information according to a preset determination rule; calculating the correct probability of the corresponding user information according to the correct probabilities of its elements; and selecting the user information whose correct probability is greater than a preset probability threshold for correctness classification.
  • the present application further provides a computer readable storage medium storing a user information screening program, the user information screening program being executable by at least one processor so that the at least one processor performs the steps of the user information screening method described above.
  • the user information screening method, the server, and the computer readable storage medium proposed by the present application first determine the correct probability of the elements constituting the user information and then calculate the correctness level of the corresponding user information from the correct probabilities of those elements, so that comprehensive and accurate user information data is quickly and correctly selected from a complex information database.
  • FIG. 1 is a schematic diagram of an optional hardware architecture of a server
  • FIG. 2 is a schematic diagram of a program module of a first embodiment of a user information screening program of the present application
  • FIG. 3 is a schematic diagram of a program module of a second embodiment of the user information screening program of the present application.
  • FIG. 4 is a schematic flowchart of a first embodiment of a method for screening user information according to the present application
  • FIG. 5 is a schematic flowchart of a second embodiment of a method for screening user information according to the present application.
  • referring to FIG. 1, it is a schematic diagram of an optional hardware architecture of the server 1.
  • the server 1 may be a computing device such as a rack server, a blade server, or a tower server.
  • the server 1 may be a standalone server or a server cluster composed of multiple servers.
  • the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that are communicably connected to each other through a system bus.
  • the server 1 connects to a network (not shown in FIG. 1) through the network interface 13, and acquires or transmits information, including user information data.
  • the network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephone network.
  • FIG. 1 only shows the server 1 with the components 11-13, but it should be understood that not all illustrated components are required to be implemented, and more or fewer components may be implemented instead.
  • the memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • the memory 11 may be an internal storage unit of the server 1, such as a hard disk or memory of the server 1.
  • the memory 11 may also be an external storage device of the server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card with which the server 1 is equipped.
  • the memory 11 can also include both the internal storage unit of the server 1 and its external storage device.
  • the memory 11 is generally used to store an operating system installed in the server 1 and various types of application software, such as program codes of the user information screening program 200. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • in some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • the processor 12 is typically used to control the overall operation of the server 1, such as performing data interaction or communication related control and processing, and the like.
  • the processor 12 is configured to run program code or process data stored in the memory 11, such as running the user information screening program 200 and the like.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the server 1 and other electronic devices.
  • the present application proposes a user information screening program 200.
  • referring to FIG. 2, it is a program module diagram of the first embodiment of the user information screening program 200 of the present application.
  • the user information screening program 200 includes a series of computer program instructions stored in the memory 11; when the computer program instructions are executed by the processor 12, the user information screening operations of the embodiments of the present application may be implemented.
  • the user information screening program 200 can be divided into one or more modules based on the particular operations implemented by the various portions of the computer program instructions. For example, in FIG. 2, the user information screening program 200 can be divided into a reading module 201, a determining module 202, a calculating module 203, and an output module 204. Specifically:
  • the reading module 201 is configured to read each piece of user information.
  • when the user information data is stored on other electronic devices, the user information screening program 200 may acquire it from those devices according to a user instruction; when the server 1 itself stores the user information data, the user information screening program 200 may directly acquire the user information data stored by the server 1.
  • the determining module 202 is configured to respectively determine a correct probability of an element in each piece of user information according to a preset determining rule.
  • each element may be compared with a determination rule of a correct probability corresponding to the element, thereby determining a correct probability of the element.
  • the user information in the entered policy data generally has a plurality of fields, including the name, the ID number, the mobile phone number, the mailbox, and the identification and encoding.
  • the name consists of a surname and a given name; the surname is one of the Hundred Family Surnames, and the given name is composed of 1-6 Chinese characters;
  • the ID number is composed of 18 digits: the first 6 digits are the administrative division code, digits 7 to 14 are the date-of-birth code, digits 15 to 17 are the sequence code, and the checksum of all digits is a specific value;
  • the mobile phone number is composed of 11 digits: the first 3 digits are the network identification number, and digits 4-7 are the area code;
  • the mailbox is composed of username + @ + mail server domain name; the username is composed of letters, digits, and other common characters (such as underscores and plus and minus signs), and the mail server domain name is a server domain name that can be reached in an Internet connection test.
  • therefore, the determining module 202 can use these features as the determination rules for the correct probability of elements in the user information such as the name, the ID number, the mobile phone number, and the mailbox.
  • when an element in the user information satisfies its corresponding determination rule, the determining module 202 determines that the correct probability of the element is 1.
  • when an element does not satisfy its determination rule, the determining module 202 determines that the correct probability of the element is a value less than 1. For example, for the name in the user information, the given name should not exceed six Chinese characters, and the surname should be included in the Hundred Family Surnames.
  • when the surname is a Chinese character that is not in the Hundred Family Surnames, the correct probability of the name can be determined to be 90%; when the surname includes non-Chinese characters, the correct probability of the name can be determined to be 30%; when the given name is composed of more than 6 Chinese characters, the correct probability of the name can be determined to be 80%; when the given name includes non-Chinese characters, the correct probability of the name can be determined to be 30%; when one of the above error conditions occurs in both the surname and the given name, the correct probability of the name can be determined as the product of their respective correct probabilities, for example when neither the surname nor the given name consists of Chinese characters.
  • the ID number should be 18 digits, of which the first 6 digits are the administrative division code and digits 7 to 14 are a valid birth date code, and the checksum of all digits should be a specific value.
  • … the correct probability of the ID number is 40%; when the ID number contains exactly 18 digits, the first 6 digits are the administrative division code, and digits 7 to 14 are a valid birth date code, but the checksum of all digits is not the specific value, the correct probability of the ID number can be determined to be 80%; when the ID number contains non-numeric characters, the correct probability of the ID number can be determined to be 30%.
  • the mobile phone number should be 11 digits, of which the first 3 digits are the network identification number and digits 4-7 are the area code.
  • … the correct probability of the mobile phone number can be determined to be 80%; when the mobile phone number is composed of fewer than 11 digits or contains non-numeric characters, the correct probability of the mobile phone number can be determined to be 30%.
  • the mailbox should consist of username + @ + mail server domain name, and the username should have the specified character format.
  • the determining module 202 compares the elements in each piece of user information read by the reading module 201 with the determination rule corresponding to each element, and can thereby directly determine the correct probability of the element.
  • for example, if the ID number in the user information contains 18 digits, the first 6 digits are the administrative division code, and digits 7 to 14 are a valid birth date code, but the checksum of all digits is not the specific value, the correct probability of the ID number is 80%.
  • if the mobile phone number in the user information is composed of 11 digits, the first 3 digits are the network identification number, and digits 4-7 are the area code, the correct probability of the mobile phone number is 1.
  • if the mailbox in the user information is composed of username + @ + mail server domain name and the username conforms to the preset rule, but the server domain name cannot be reached in the Internet connection test, the correct probability of the mailbox is 50%.
  • that is to say, in the read user information, the correct probability of the name is 80%, the correct probability of the ID number is 80%, the correct probability of the mobile phone number is 1, and the correct probability of the mailbox is 50%.
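The name rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the applicant's implementation: `SAMPLE_SURNAMES` is a tiny hypothetical stand-in for the full Hundred Family Surnames list, the surname is assumed to be the first character of the name (compound surnames are ignored), and an empty given name is treated like a non-Chinese one. The probabilities 0.9, 0.8, and 0.3 and the product rule come from the text.

```python
# Hypothetical sample; a real list would hold the full Hundred Family Surnames.
SAMPLE_SURNAMES = {"王", "李", "张", "刘", "陈"}


def is_chinese(text: str) -> bool:
    """True if text is non-empty and every character is a CJK unified ideograph."""
    return bool(text) and all("\u4e00" <= ch <= "\u9fff" for ch in text)


def surname_probability(surname: str) -> float:
    if not is_chinese(surname):
        return 0.3  # surname includes non-Chinese characters
    if surname not in SAMPLE_SURNAMES:
        return 0.9  # Chinese surname that is not in the surname list
    return 1.0


def given_name_probability(given: str) -> float:
    if not is_chinese(given):
        return 0.3  # non-Chinese characters (empty given name: assumption)
    if len(given) > 6:
        return 0.8  # more than 6 Chinese characters
    return 1.0


def name_probability(full_name: str) -> float:
    # Assumption: the first character is the surname, the rest is the given name.
    surname, given = full_name[:1], full_name[1:]
    # The product equals the single failing factor when only one part is wrong,
    # and the product of both factors when both parts are wrong.
    return surname_probability(surname) * given_name_probability(given)
```

Multiplying the two factors covers both the single-error cases and the "product of the probabilities" case with one expression.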
  • the calculating module 203 is configured to calculate a correct probability of the corresponding user information according to the correct probability of the element.
  • the calculating module 203 assigns a correct probability weight value to each element in the user information, and then calculates the correct probability of the user information according to the correct probability of each element determined by the determining module 202 and the corresponding correct probability weight value.
  • for example, the calculating module 203 presets the correct probability weight of the name to 0.3, that of the ID number to 0.3, that of the mobile phone number to 0.2, and that of the mailbox to 0.2.
  • the determining module 202 determines that, in the user information, the correct probability of the name is 80%, the correct probability of the ID number is 80%, the correct probability of the mobile phone number is 1, and the correct probability of the mailbox is 50%.
  • the calculating module 203 may then calculate the correct probability of the corresponding user information by combining the correct probability of each element with its preset correct probability weight value.
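As a minimal sketch of this calculation, the snippet below uses the weights and element probabilities from the example and assumes the combination is the weighted sum described for the method (each element's probability multiplied by its weight, then added). The dictionary keys and the function name are illustrative, not taken from the application.

```python
# Weights and element probabilities from the example in the text.
WEIGHTS = {"name": 0.3, "id_number": 0.3, "mobile": 0.2, "mailbox": 0.2}


def record_probability(element_probs, weights=WEIGHTS):
    """Weighted sum of element probabilities (keys are hypothetical names)."""
    return sum(weights[key] * element_probs[key] for key in weights)


example = {"name": 0.8, "id_number": 0.8, "mobile": 1.0, "mailbox": 0.5}
print(round(record_probability(example), 2))  # 0.78
```

Because the weights sum to 1, the result stays in the range 0 to 1 and can be compared directly with a probability threshold.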
  • the output module 204 is configured to select user information whose correct probability is greater than a preset probability threshold to perform correctness classification.
  • the output module 204 presets at least one probability threshold, and then compares the correct probability of each piece of user information calculated by the calculating module 203 with the at least one probability threshold, thereby obtaining the correctness level of each piece of user information.
  • when the output module 204 is set with one probability threshold, it directly outputs the user information whose correct probability is greater than or equal to the probability threshold as correct user information.
  • when the output module 204 is configured with two or more probability thresholds, it may compare the correct probability of the user information with all the probability thresholds, thereby outputting the correctness level of the user information. For example, when there are two probability thresholds, the output module 204 compares the correct probability of the user information calculated by the calculating module 203 with a preset first threshold and a preset second threshold, the first threshold being greater than the second threshold.
  • … the correctness of the user information is low; when the correct probability of the user information is less than the second threshold, it is determined that the correctness of the user information is too low and that the user information is erroneous information.
  • the correctness level of the user information is then output as a form, document, graphic, or in another form.
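The two-threshold comparison might look like the following sketch. The threshold values 0.8 and 0.6 and the level labels are hypothetical. Treating probabilities at or above the first threshold as "high" and those between the two thresholds as "low" is an assumption, since the text only states the outcome below the second threshold explicitly.

```python
def classify(probability, first_threshold=0.8, second_threshold=0.6):
    """Map a record's correct probability to a correctness level.

    Threshold values and level names are illustrative assumptions.
    """
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must be greater than second_threshold")
    if probability >= first_threshold:
        return "high"   # assumed label for the top band
    if probability >= second_threshold:
        return "low"    # assumed to be the band between the two thresholds
    return "error"      # below the second threshold: erroneous information
```

With these illustrative thresholds, the example record with a correct probability of 0.78 would fall in the "low" band.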
  • in the second embodiment, the user information screening program 200 includes the reading module 201, the determining module 202, the calculating module 203, and the output module 204 of the first embodiment, and further includes a decomposition module 205 and a setting module 206.
  • the reading module 201, the determining module 202, the calculating module 203, and the output module 204 have the same functions as the corresponding program modules in the first embodiment of the user information screening program 200 and are not described again here. Because entered user information data is sometimes not decomposed into elements saved in specific fields, after the reading module 201 reads the user information and before the determining module 202 determines the correct probability of the elements in the user information, the decomposition module 205 and the setting module 206 are also required to perform processing.
  • the decomposition module 205 is configured to decompose user information into at least one element.
  • the decomposition module 205 first decomposes the user information into elements such as a name, a mobile phone number, an ID card number, and a mailbox according to content included in the user information, such as "name", "mobile phone", "identity", and "mailbox".
  • the decomposition module 205 can directly identify the characteristic content in the user information and decompose the user information into elements according to that content: whenever a piece of user information includes characteristic content, the content is extracted as an element of the user information.
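One way to picture this decomposition is a keyword-based split of a free-text record. The sketch below assumes, purely for illustration, that each element appears as `label: value` and that elements are separated by semicolons; the application does not specify the input layout, and the sample record is made up.

```python
import re

# Characteristic labels named in the text; the "label: value; ..." layout
# of the record is an assumption made for this sketch.
PATTERN = re.compile(r"\b(name|mobile phone|identity|mailbox)\s*:\s*([^;]+)")


def decompose(record: str) -> dict:
    """Split a free-text record into elements keyed by their label."""
    return {label: value.strip() for label, value in PATTERN.findall(record)}


sample = ("name: 王小明; mobile phone: 13812345678; "
          "identity: 110105194912310021; mailbox: user@example.com")
elements = decompose(sample)
```

Each value in `elements` can then be passed to the determination rule that matches its label.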
  • the setting module 206 is configured to set a composition format of each of the at least one element, and set a determination rule of a correct probability of each element according to a composition format of each element.
  • the setting module 206 may set a composition format of each element according to the characteristics of the element, and then set the determination rule of the correct probability of each element according to its composition format. For example, after the decomposition module 205 decomposes the user information into elements such as a name, an ID number, a mobile phone number, and a mailbox, the setting module 206 first sets the composition format of each element according to the characteristics of the name, the ID number, the mobile phone number, and the mailbox.
  • for example, the name consists of a surname and a given name, the surname is one of the Hundred Family Surnames, and the given name is composed of 1-6 Chinese characters; the ID number is composed of 18 digits, of which the first 6 digits are the administrative division code, digits 7 to 14 are the date-of-birth code, and digits 15 to 17 are the sequence code, and the checksum of all digits is a specific value; the mobile phone number is composed of 11 digits, of which the first 3 digits are the network identification number, digits 4-7 are the area code, and digits 8-11 are the user number; the mailbox is composed of username + @ + mail server domain name, and the username is composed of letters, digits, and other common characters (such as underscores and plus and minus signs).
  • the mail server domain name is a server domain name that can be reached in an Internet connection test.
  • the setting module 206 also sets the determination rule of the correct probability of each element according to its composition format. For example, for the name in the user information, the given name should not exceed six Chinese characters, and the surname should be included in the Hundred Family Surnames.
  • when the surname is a Chinese character that is not in the Hundred Family Surnames, the correct probability of the name can be determined to be 90%; when the surname includes non-Chinese characters, the correct probability of the name can be determined to be 30%; when the given name is composed of more than 6 Chinese characters, the correct probability of the name can be determined to be 80%; when the given name includes non-Chinese characters, the correct probability of the name can be determined to be 30%; when one of the above error conditions occurs in both the surname and the given name, the correct probability of the name can be determined as the product of their respective correct probabilities, for example when neither the surname nor the given name consists of Chinese characters.
  • the ID number should be 18 digits, of which the first 6 digits are the administrative division code and digits 7 to 14 are a valid birth date code, and the checksum of all digits should be a specific value.
  • … the correct probability of the ID number is 40%; when the ID number contains exactly 18 digits, the first 6 digits are the administrative division code, and digits 7 to 14 are a valid birth date code, but the checksum of all digits is not the specific value, the correct probability of the ID number can be determined to be 80%; when the ID number contains non-numeric characters, the correct probability of the ID number can be determined to be 30%.
  • the mobile phone number should be 11 digits, of which the first 3 digits are the network identification number and digits 4-7 are the area code.
  • … the correct probability of the mobile phone number can be determined to be 80%; when the mobile phone number is composed of fewer than 11 digits or contains non-numeric characters, the correct probability of the mobile phone number can be determined to be 30%.
  • the mailbox should consist of username + @ + mail server domain name, and the username should have the specified character format.
  • the present application also proposes a user information screening method.
  • referring to FIG. 4, it is a schematic flowchart of the first embodiment of the user information screening method of the present application.
  • the order of execution of the steps in the flowchart shown in FIG. 4 may be changed according to different requirements, and some steps may be omitted.
  • Step S500: each piece of user information is read.
  • when the user information data is stored on other electronic devices, it may be read from those devices according to a user instruction; when the server 1 stores the user information data, the user information data stored by the server 1 can be read directly.
  • Step S502: determine, according to a preset determination rule, a correct probability of each element in each piece of user information.
  • each element may be compared with a determination rule of a correct probability corresponding to the element, thereby determining a correct probability of the element.
  • the user information in the entered policy data generally has a plurality of fields, including the name, the ID number, the mobile phone number, the mailbox, and the identification and encoding.
  • the name consists of a surname and a given name; the surname is one of the Hundred Family Surnames, and the given name is composed of 1-6 Chinese characters;
  • the ID number is composed of 18 digits: the first 6 digits are the administrative division code, digits 7 to 14 are the date-of-birth code, digits 15 to 17 are the sequence code, and the checksum of all digits is a specific value;
  • the mobile phone number is composed of 11 digits: the first 3 digits are the network identification number, and digits 4-7 are the area code;
  • the mailbox is composed of username + @ + mail server domain name; the username is composed of letters, digits, and other common characters (such as underscores and plus and minus signs), and the mail server domain name is a server domain name that can be reached in an Internet connection test.
  • therefore, these features can be used as the determination rules for the correct probability of elements in the user information such as the name, the ID number, the mobile phone number, and the mailbox.
  • when an element in the user information satisfies its corresponding determination rule, it is determined that the correct probability of the element is 1.
  • when an element does not satisfy its determination rule, the correct probability of the element is a value less than 1. For example, for the name in the user information, the given name should not exceed six Chinese characters, and the surname should be included in the Hundred Family Surnames.
  • when the surname is a Chinese character that is not in the Hundred Family Surnames, the correct probability of the name can be determined to be 90%; when the surname includes non-Chinese characters, the correct probability of the name can be determined to be 30%; when the given name is composed of more than 6 Chinese characters, the correct probability of the name can be determined to be 80%; when the given name includes non-Chinese characters, the correct probability of the name can be determined to be 30%; when one of the above error conditions occurs in both the surname and the given name, the correct probability of the name can be determined as the product of their respective correct probabilities, for example when neither the surname nor the given name consists of Chinese characters.
  • the ID number should be 18 digits, of which the first 6 digits are the administrative division code and digits 7 to 14 are a valid birth date code, and the checksum of all digits should be a specific value.
  • … the correct probability of the ID number is 40%; when the ID number contains exactly 18 digits, the first 6 digits are the administrative division code, and digits 7 to 14 are a valid birth date code, but the checksum of all digits is not the specific value, the correct probability of the ID number can be determined to be 80%; when the ID number contains non-numeric characters, the correct probability of the ID number can be determined to be 30%.
  • the mobile phone number should be 11 digits, of which the first 3 digits are the network identification number and digits 4-7 are the area code.
  • … the correct probability of the mobile phone number can be determined to be 80%; when the mobile phone number is composed of fewer than 11 digits or contains non-numeric characters, the correct probability of the mobile phone number can be determined to be 30%.
  • the mailbox should consist of username + @ + mail server domain name, and the username should have the specified character format.
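The structural part of the ID number rule can be checked mechanically. The sketch below only reports which sub-rules hold and deliberately leaves two checks out: the administrative division code would need a lookup table, and the checksum is omitted because the text says only that it equals "a specific value". The function name, the result keys, and the sample numbers in the test are made up for illustration.

```python
from datetime import datetime


def id_number_checks(id_number: str) -> dict:
    """Report which structural sub-rules an ID number satisfies.

    Not covered here (assumptions of this sketch): the administrative
    division code lookup and the checksum comparison.
    """
    has_18_chars = len(id_number) == 18
    all_digits = id_number.isascii() and id_number.isdigit()
    birth_date_valid = False
    if has_18_chars and all_digits:
        try:
            # Digits 7 to 14 (1-based) hold the birth date as YYYYMMDD.
            datetime.strptime(id_number[6:14], "%Y%m%d")
            birth_date_valid = True
        except ValueError:
            birth_date_valid = False
    return {
        "has_18_chars": has_18_chars,
        "all_digits": all_digits,
        "birth_date_valid": birth_date_valid,
    }
```

A caller could map these results to the stated probabilities, for example returning 0.3 whenever `all_digits` is false.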
  • Step S504: calculate a correct probability of the corresponding user information according to the correct probability of the element.
  • each element in the user information is respectively assigned a correct probability weight value, and then the correct probability of the user information is calculated according to the correct probability of each element and the corresponding correct probability weight value.
  • the specific calculation process is: multiplying the correct probability of each element by the corresponding correct probability weight value, and then adding, thereby obtaining the correct probability of the user information.
  • for example, suppose the correct probability weight of the name is 0.3, the correct probability weight of the ID number is 0.3, the correct probability weight of the mobile phone number is 0.2, and the correct probability weight of the mailbox is 0.2.
  • if the correct probability of the name is 80%, the correct probability of the ID number is 80%, the correct probability of the mobile phone number is 1, and the correct probability of the mailbox is 50%, the correct probability of the user information is obtained by multiplying each element's correct probability by its weight and adding the products.
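Written out for the example values, the weighted sum from step S504 works out as follows. The symbols $w_i$ (weight of element $i$), $p_i$ (correct probability of element $i$), and $P$ (correct probability of the user information) are notation introduced here for illustration, not notation from the application.

```latex
P = \sum_i w_i \, p_i
  = 0.3 \times 0.8 + 0.3 \times 0.8 + 0.2 \times 1 + 0.2 \times 0.5
  = 0.24 + 0.24 + 0.20 + 0.10
  = 0.78
```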
  • Step S506: select user information whose correct probability is greater than the preset probability threshold to perform correctness classification.
  • at least one probability threshold is preset; then the calculated correct probability of each piece of user information is compared with the at least one probability threshold, thereby obtaining the correctness level of each piece of user information.
  • when one probability threshold is set, user information whose correct probability is greater than or equal to the probability threshold is directly output as correct user information.
  • when two or more probability thresholds are set, the correct probability of the user information can be compared with all the probability thresholds, thereby outputting the correctness level of the user information. For example, when there are two probability thresholds, the correct probability of the user information is compared with a preset first threshold and a preset second threshold, the first threshold being greater than the second threshold.
  • … the correctness of the user information is low; when the correct probability of the user information is less than the second threshold, it is determined that the correctness of the user information is too low and that the user information is erroneous information.
  • the accuracy level of the user information is then output in a form, document, graphic or other form.
  • the user information screening method provided in this embodiment reads each piece of user information, determines the correct probability of each element in each piece of user information according to a preset determination rule, then calculates the correct probability of the corresponding user information according to the correct probabilities of its elements, and finally selects the user information whose correct probability is greater than the preset probability threshold for correctness classification. In this way, comprehensive and accurate user information data can be quickly and correctly filtered from a complex information database, with a reference value for its accuracy.
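Put together, steps S504 and S506 with a single threshold could be sketched as below. The sketch assumes the element probabilities from step S502 are already available for each record. The threshold of 0.7 and all names are hypothetical, and the comparison uses "greater than or equal to", as in the single-threshold description.

```python
from typing import Dict, List, Tuple

# Example weights from the text; dictionary keys are illustrative names.
WEIGHTS = {"name": 0.3, "id_number": 0.3, "mobile": 0.2, "mailbox": 0.2}


def screen(records: List[Dict[str, float]],
           threshold: float = 0.7) -> List[Tuple[Dict[str, float], float]]:
    """Return (record, probability) pairs that reach the threshold.

    Each record maps element names to already-determined correct
    probabilities; the 0.7 default threshold is a made-up value.
    """
    selected = []
    for record in records:
        probability = sum(WEIGHTS[key] * record[key] for key in WEIGHTS)
        if probability >= threshold:
            selected.append((record, probability))
    return selected


records = [
    {"name": 0.8, "id_number": 0.8, "mobile": 1.0, "mailbox": 0.5},
    {"name": 0.3, "id_number": 0.3, "mobile": 0.3, "mailbox": 0.5},
]
kept = screen(records)
```

Returning the probability alongside each kept record preserves the accuracy reference that the method is meant to provide.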
  • FIG. 5 is a schematic flowchart of a second embodiment of the user information screening method of the present application.
  • In this embodiment, steps S600-S606 of the user information screening method are similar to steps S500-S506 of the first embodiment, except that the method further includes steps S608-S610.
  • Steps S608-S610 are also required, as follows:
  • Step S608: the user information is decomposed into at least one element.
  • The user information is first decomposed into elements such as a name, a mobile phone number, an ID card number, and a mailbox, according to content included in the user information such as "name", "mobile phone", "identity", and "mailbox".
  • Because character recognition is a relatively common technical means, the characteristic content in the user information can be recognized directly and the user information decomposed into elements according to that characteristic content: whenever a piece of user information includes such characteristic content, the corresponding content is split off as an element of the user information.
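  • By way of example only, the decomposition of step S608 might be sketched as follows. The sketch assumes that each piece of user information is a single line of text in which every value follows its label and a colon, with fields separated by semicolons; this record layout, the element keys, and the function name `decompose` are hypothetical and are not specified in the description.

```python
import re

# Characteristic labels named in the description, mapped to hypothetical element keys.
LABELS = {
    "name": "name",
    "mobile phone": "mobile_number",
    "identity": "id_number",
    "mailbox": "mailbox",
}

# A label, a colon (ASCII or full-width), then the value up to the next semicolon.
_FIELD = re.compile(
    r"\b(name|mobile phone|identity|mailbox)\b\s*[:：]\s*([^;；]*)",
    re.IGNORECASE,
)


def decompose(record):
    """Split one piece of user information into its labelled elements."""
    elements = {}
    for label, value in _FIELD.findall(record):
        elements[LABELS[label.lower()]] = value.strip()
    return elements


record = "name: 张三; mobile phone: 13812345678; mailbox: zhangsan@example.com"
elements = decompose(record)
```

  • Only the elements actually present in the record are returned, so a record without an "identity" field simply yields no `id_number` element.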
  • Step S610: set a composition format for each of the at least one element, and set a judgment rule for the correct probability of each element according to its composition format.
  • A composition format may be set for each element according to the characteristics of the element, and the judgment rule for the correct probability of each element is then set according to that composition format.
  • For example, the composition format of each element may be set according to the characteristics of the name, the ID card number, the mobile phone number, and the mailbox:
  • the name is composed of a surname and a given name; the surname is drawn from the Hundred Family Surnames, and the given name is composed of 1-6 Chinese characters;
  • the ID card number is composed of 18 digits: the first 6 digits are the administrative division code, digits 7 to 14 are the birth date code, digits 15 to 17 are the sequence code, and the checksum of all digits is a specific value;
  • the mobile phone number is composed of 11 digits: the first 3 digits are the network identification number, digits 4-7 are the area code, and digits 8-11 are the user number;
  • the mailbox is composed of username + @ + mail server domain name; the username is composed of letters, digits, and other common characters (such as underscores and plus or minus signs), and the mail server domain name is the domain name of a server that can be reached in an Internet connection test.
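  • As one possible illustration, the composition formats listed above might be expressed as regular expressions that split an element into its parts. The pattern names, group names, and helper functions are assumptions of this sketch. The set of characters accepted in the mail server domain name is also an assumption, and the Internet connection test on that domain is not performed here.

```python
import re
from datetime import datetime

# 6-digit division code + 8-digit birth date + 3-digit sequence code + 1 check digit.
ID_FORMAT = re.compile(
    r"(?P<division>[0-9]{6})(?P<birth>[0-9]{8})(?P<sequence>[0-9]{3})(?P<check>[0-9])"
)
# 3-digit network identification number + 4-digit area code + 4-digit user number.
MOBILE_FORMAT = re.compile(r"(?P<network>[0-9]{3})(?P<area>[0-9]{4})(?P<user>[0-9]{4})")
# Username of letters, digits, underscore, plus and minus, then "@" and a domain name.
MAILBOX_FORMAT = re.compile(r"(?P<username>[A-Za-z0-9_+\-]+)@(?P<domain>[A-Za-z0-9.\-]+)")


def split_element(pattern, value):
    """Return the named parts of value, or None if it does not fit the format."""
    match = pattern.fullmatch(value)
    return match.groupdict() if match else None


def is_valid_birth_date(code):
    """Check that an 8-digit birth date code is a real calendar date (YYYYMMDD)."""
    try:
        datetime.strptime(code, "%Y%m%d")
        return True
    except ValueError:
        return False


mobile_parts = split_element(MOBILE_FORMAT, "13812345678")
id_parts = split_element(ID_FORMAT, "110105194912310011")
```

  • A value that does not fit its format yields `None`, which a judgment rule can then map to a reduced correct probability.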
  • According to the composition format of each element, the judgment rule for the correct probability of the element is set.
  • For the name, the given name should not exceed six Chinese characters, and the surname should be included in the Hundred Family Surnames.
  • [...] the correct probability of the name can be determined to be 90%; when the surname includes non-Chinese characters, the correct probability of the name can be determined to be 30%; when the given name is composed of more than 6 Chinese characters, the correct probability of the name can be determined to be 80%; when the given name includes non-Chinese characters, the correct probability of the name can be determined to be 30%; when one of the above error conditions occurs in both the surname and the given name, the correct probability of the name can be determined as the product of their respective correct probabilities, for example 30% × 30% = 9% when neither the surname nor the given name consists of Chinese characters.
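  • The name rule above might, for instance, be sketched as follows. The function names are hypothetical, the probability of 1.0 for a part with no error is an assumption, Chinese characters are approximated by the Unicode range U+4E00 to U+9FFF, and the 90% case is omitted because its condition is not recoverable from the text.

```python
def is_chinese(text):
    """True if text is non-empty and every character lies in U+4E00..U+9FFF."""
    return bool(text) and all("\u4e00" <= ch <= "\u9fff" for ch in text)


def surname_probability(surname):
    # Surname containing non-Chinese characters: 30%.
    return 1.0 if is_chinese(surname) else 0.3


def given_name_probability(given_name):
    if not is_chinese(given_name):
        return 0.3  # given name containing non-Chinese characters: 30%
    if len(given_name) > 6:
        return 0.8  # given name longer than six Chinese characters: 80%
    return 1.0      # assumed value when no error condition applies


def name_probability(surname, given_name):
    """Multiply the two parts, so that errors in both parts give their product."""
    return surname_probability(surname) * given_name_probability(given_name)
```

  • Because an error-free part contributes a factor of 1.0, a single error yields the stated single-error probability, and an error in both parts yields the product.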
  • The ID card number should be 18 digits, of which the first 6 digits are the administrative division code and digits 7 to 14 are a valid birth date code, and the checksum of all digits should be a specific value.
  • [...] the correct probability of the ID card number is 40%; when the ID card number contains exactly 18 digits, the first 6 digits are the administrative division code, and digits 7 to 14 are a valid birth date code, but the checksum of all digits is not the specific value, the correct probability of the ID card number can be determined to be 80%; when the ID card number contains non-numeric characters, the correct probability of the ID card number can be determined to be 30%.
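  • A minimal sketch of the ID card number rule is given below, on the assumption that the "checksum" refers to the check character defined for 18-character resident identity numbers in GB 11643-1999 (ISO 7064 MOD 11-2); the description does not name the scheme. The function names and the probability of 1.0 for a fully valid number are assumptions, the administrative division code is not looked up in any table, and cases whose rule is not recoverable from the text (including the 40% case) return `None`.

```python
from datetime import datetime

# Weights for the first 17 digits and the check character for each remainder.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def expected_check_char(first17):
    """Compute the check character for the first 17 digits."""
    total = sum(int(digit) * weight for digit, weight in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def has_valid_birth_date(id_number):
    """Check that characters 7 to 14 form a real calendar date (YYYYMMDD)."""
    try:
        datetime.strptime(id_number[6:14], "%Y%m%d")
        return True
    except ValueError:
        return False


def id_probability(id_number):
    if any(ch not in "0123456789" for ch in id_number):
        return 0.3   # contains non-numeric characters: 30%
    if len(id_number) != 18 or not has_valid_birth_date(id_number):
        return None  # rule for this case is not covered by this sketch
    if expected_check_char(id_number[:17]) != id_number[17]:
        return 0.8   # well-formed, but the checksum does not match: 80%
    return 1.0       # assumed value for a fully valid number
```

  • Under this assumed scheme the check character can be "X", so a literal reading of the non-numeric rule scores such numbers at 30%; an implementation may wish to treat a trailing "X" separately.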
  • The mobile phone number should be 11 digits, of which the first 3 digits are the network identification number and digits 4-7 are the area code.
  • [...] the correct probability of the mobile phone number can be determined to be 80%; when the mobile phone number is composed of fewer than 11 digits, or contains non-numeric characters, the correct probability of the mobile phone number can be determined to be 30%.
  • The mailbox should consist of username + @ + mail server domain name, and the username should follow the specified character format.
  • The user information screening method provided in this embodiment reads each piece of user information, determines the correct probability of each element in each piece of user information according to a preset determination rule, calculates the correct probability of the corresponding user information from the correct probabilities of its elements, and finally selects the user information whose correct probability is greater than the preset probability threshold for correctness classification. In this way, comprehensive and accurate user information can be filtered intelligently, quickly, and correctly from a complex information database, together with a reference for its correctness.
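  • Purely as an illustration, the overall flow summarized above might be sketched as follows. This passage does not state how the element probabilities are combined into the probability of a piece of user information, so the product used here is an assumption. The rule functions are simplified stand-ins, the value of 1.0 for an element with no detected error is assumed, and all names are hypothetical.

```python
from math import prod


def is_chinese(text):
    """True if text is non-empty and every character lies in U+4E00..U+9FFF."""
    return bool(text) and all("\u4e00" <= ch <= "\u9fff" for ch in text)


def name_rule(value):
    # Simplified stand-in: a name containing non-Chinese characters scores 30%.
    return 1.0 if is_chinese(value) else 0.3


def mobile_rule(value):
    # Fewer than 11 characters, or any non-numeric character: 30%.
    if len(value) < 11 or any(ch not in "0123456789" for ch in value):
        return 0.3
    return 1.0


RULES = {"name": name_rule, "mobile_number": mobile_rule}


def record_probability(record, rules):
    """Combine element probabilities; the product is an assumed combination."""
    return prod(rules[key](value) for key, value in record.items() if key in rules)


def screen(records, rules, threshold):
    """Keep the records whose correct probability exceeds the threshold."""
    scored = [(record, record_probability(record, rules)) for record in records]
    return [(record, p) for record, p in scored if p > threshold]


records = [
    {"name": "张三", "mobile_number": "13812345678"},
    {"name": "Zhang San", "mobile_number": "1381234"},
]
selected = screen(records, RULES, 0.5)
```

  • The selected records, each paired with its correct probability, can then be passed on for correctness classification.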
  • The method of the foregoing embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention relates to a user information screening method, a server, and a computer-readable storage medium. The method comprises: reading each piece of user information (S500); determining, according to a preset determination rule, the correct probability of each element in each piece of user information (S502); calculating the correct probability of the corresponding user information according to the correct probabilities of the elements (S504); and selecting the user information whose correct probability is greater than a preset probability threshold for correctness classification (S506). By means of the user information screening method, the server, and the computer-readable storage medium, comprehensive and accurate user information can be selected quickly and correctly from a complex information database.
PCT/CN2018/102396 2017-11-15 2018-08-27 User information screening method, server, and computer-readable storage medium Ceased WO2019095768A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711130640.4A CN107977404B (zh) 2017-11-15 2017-11-15 User information screening method, server, and computer-readable storage medium
CN201711130640.4 2017-11-15

Publications (1)

Publication Number Publication Date
WO2019095768A1 true WO2019095768A1 (fr) 2019-05-23

Family

ID=62013519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102396 Ceased WO2019095768A1 (fr) 2017-11-15 2018-08-27 User information screening method, server, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN107977404B (fr)
WO (1) WO2019095768A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378043A (zh) * 2021-06-03 2021-09-10 北京沃东天骏信息技术有限公司 Method and apparatus for user screening

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977404B (zh) * 2017-11-15 2020-08-28 深圳壹账通智能科技有限公司 User information screening method, server, and computer-readable storage medium
CN110705942A (zh) * 2019-10-10 2020-01-17 环旭电子股份有限公司 Method and device for screening barcode information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120166307A1 (en) * 2010-12-23 2012-06-28 Alibaba Group Holding Limited Determination of permissibility associated with e-commerce transactions
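CN103500195A (zh) * 2013-09-18 2014-01-08 小米科技有限责任公司 Classifier updating method, device, system, and equipment
CN105825367A (zh) * 2016-03-16 2016-08-03 聚相投资管理(上海)有限公司 Cloud intelligent server and its application in mail classification
CN106650783A (zh) * 2015-10-30 2017-05-10 李静涛 Method, device, and system for classifying, generating, and matching mobile terminal data
CN107977404A (zh) * 2017-11-15 2018-05-01 上海壹账通金融科技有限公司 User information screening method, server, and computer-readable storage medium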

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002952106A0 (en) * 2002-10-15 2002-10-31 Silverbrook Research Pty Ltd Methods and systems (npw008)
US20160051167A1 (en) * 2012-10-10 2016-02-25 Invensense, Inc. System and method for activity classification
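CN103888254B (zh) * 2012-12-21 2017-05-31 阿里巴巴集团控股有限公司 Method and apparatus for network verification of information
CN105589885B (zh) * 2014-10-24 2019-07-02 阿里巴巴集团控股有限公司 Method and system for data consistency verification
CN106326776A (zh) * 2015-07-02 2017-01-11 阿里巴巴集团控股有限公司 Rule-based data object verification method, device, system, and electronic equipment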

Also Published As

Publication number Publication date
CN107977404B (zh) 2020-08-28
CN107977404A (zh) 2018-05-01

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 22/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18879445

Country of ref document: EP

Kind code of ref document: A1