
CN120578695A - User identification method, system and storage medium based on distributed data processing

User identification method, system and storage medium based on distributed data processing

Info

Publication number: CN120578695A
Authority: CN (China)
Prior art keywords: user, redis, data, attribute, internal
Legal status: Pending
Application number: CN202510718267.2A
Other languages: Chinese (zh)
Inventors: 张桃龙, 彭磊, 徐晋毅
Current assignee: Wuhan Zhongbang Bank Co Ltd
Original assignee: Wuhan Zhongbang Bank Co Ltd
Application filed by Wuhan Zhongbang Bank Co Ltd
Priority to CN202510718267.2A
Publication of CN120578695A


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present invention provides a user identification method, system and storage medium based on distributed data processing, in the fields of big data analysis and user behavior tracking. It addresses the generation, management and association of unique user identifiers in multi-platform, multi-channel environments, along with the low processing efficiency and poor real-time performance that these problems cause. The main scheme comprises four parts. First, a distributed stream processing framework based on Spark and Kafka parses, processes and distributes user behavior data efficiently. Second, a multi-level user identification system generates and associates device IDs, user IDs and global IDs. Third, a Redis high-performance cache provides fast lookup of identifier mappings. Fourth, a Kafka topic distribution mechanism routes the processed data to different business modules. The scheme reduces the complexity of identifier management for large-scale user data, improves processing efficiency and real-time performance, and provides a reliable data foundation for user behavior analysis.

Description

User identification method, system and storage medium based on distributed data processing
Technical Field
The present invention relates to the technical field of big data analysis and user behavior tracking, and more particularly, to a method, a system and a storage medium for user identification based on distributed data processing.
Background
With the rapid development of the mobile internet and of applications, a user may use the same application on several platforms and devices at the same time. As a result, the user's behavior data is scattered across different channels. Identifying and correlating the behavior of the same user across devices to produce a unified user view is an important challenge in data analysis. Traditional identification methods based on cookies or device IDs struggle in such complex scenarios, and their limitations are especially clear in applications such as cross-device user behavior analysis.
Meanwhile, as the business grows, the volume of user behavior data increases exponentially. This places higher demands on the throughput and real-time performance of the data processing system. Traditional single-machine processing and simple distributed architectures often suffer from low processing efficiency and poor stability when handling massive data, and they cannot meet the business need for real-time analysis of user behavior data.
In addition, a user identification system must support evolving services and adapt flexibly to the changing requirements of different applications, which makes extensibility essential. Most existing user identification systems are designed for specific scenarios. They lack generality and extensibility, and they struggle to support diverse business requirements.
Therefore, the field of data analysis needs an efficient, reliable and extensible user identification method. Such a method must uniquely identify users, associate their behavior across devices and platforms, and provide real-time data processing.
Disclosure of Invention
To address these technical problems in the prior art, the invention provides a user identification method, system and storage medium based on distributed data processing. The invention processes and analyzes user behavior data efficiently through Redis distributed storage, a multi-level ID mapping mechanism and real-time data processing. The method and system address three problems in the prior art: the complexity of user identification management, insufficient data processing efficiency and poor system extensibility.
According to a first aspect of the present invention, there is provided a user identification method based on distributed data processing, comprising the following steps:
receiving user behavior data and performing basic verification on the data;
constructing the Redis Key of the device ID, batch querying and assigning device internal IDs, and assigning session IDs;
constructing the Redis Key of the user ID, batch querying and assigning user internal IDs, and performing global ID allocation and mapping maintenance;
processing user attributes, device attributes and event attributes to complete the standardization and completion of the data structure.
On the basis of the above technical solution, the invention may further include the following improvements.
Optionally, the user behavior data comprises a list of message objects, each message containing an app_key, raw data, and user and device information; performing basic verification on the data comprises:
performing JSON format verification and parsing on each message, marking any message that fails to parse as invalid, and recording an error log;
checking the message structure against a predefined schema and filtering out structurally invalid data;
querying a database with the app_key in the message to obtain the corresponding application ID (appId) and filling it into the message object.
Optionally, constructing the Redis Key of the device ID comprises:
traversing all valid messages, extracting the application ID and the device ID from each message, and concatenating them into the Redis Key "d:appId:{deviceId}";
collecting the Redis Keys of all unique device IDs.
Optionally, batch querying and assigning device internal IDs comprises:
batch querying Redis to obtain the device internal ID (deviceInternalId) currently mapped to the Redis Key of each device ID;
for each unassigned device ID Key, obtaining the current maximum device internal ID of the application (key id:d:{appId}), incrementing it to assign a new ID, and writing the result to a Redis Hash whose field is the device ID and whose value is the new ID, while incrementing the counter;
maintaining the Redis Keys of all device IDs and their corresponding device internal IDs in a local Map.
Optionally, constructing the Redis Key of the user ID comprises:
traversing the event attributes of all messages, extracting the user-defined ID (userId), and concatenating it into the Redis Key "u:appId:{userId}";
collecting the Redis Keys of all unique user IDs.
Optionally, batch querying and assigning user internal IDs comprises:
batch querying Redis to obtain the user internal ID (userInternalId) currently mapped to the Redis Key of each user ID;
for each unassigned user ID Redis Key, obtaining the current maximum user internal ID of the application, incrementing it to assign a new ID, and writing it to Redis;
maintaining the Redis Keys of all user IDs and their corresponding user internal IDs in a local Map.
Optionally, global ID allocation and mapping maintenance comprises:
allocating a new globalId for a user internal ID and device internal ID only when neither of them already has a mapping in Redis;
merging preferentially into an existing globalId and creating a new one only when none exists, which ensures that the same user or the same device under the same application always belongs to a single globalId;
writing the global ID into the message structure and synchronously updating the multi-level mappings in Redis.
Optionally, user attribute processing comprises:
traversing all messages and extracting custom attributes from data items of type "usr";
determining the attribute type, checking the length of the attribute name, and filtering out illegal or over-long attributes;
assigning each legal attribute a unique attribute ID and type, which are looked up or registered through a cache or database, and writing them into the message structure.
Optionally, device attribute processing comprises:
extracting custom device attributes from data items of type "pl";
determining the attribute type, checking its length, and filtering out illegal or over-long attributes;
assigning each legal device attribute a unique attribute ID and type, which are looked up or registered through a cache or database, and writing them into the message structure;
event attribute processing comprises:
extracting custom event attributes from event data items of type "evt" and similar types;
determining the attribute type, checking its length, and filtering out illegal or over-long attributes;
assigning each legal event attribute a unique attribute ID and type, which are looked up or registered through a cache or database, and writing them into the message structure.
According to a second aspect of the present invention, there is provided a user identification system based on distributed data processing, comprising:
a user data acquisition module, configured to receive user behavior data and perform basic verification on the data;
a device ID and session ID assignment module, configured to construct the Redis Key of the device ID and to batch query and assign device internal IDs;
a user ID and global ID allocation module, configured to construct the Redis Key of the user ID, batch query and assign user internal IDs, and perform global ID allocation and mapping maintenance;
an attribute processing and data structure completion module, configured to process user attributes, device attributes and event attributes and to complete the standardization and completion of the data structure.
According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of a user identification method based on distributed data processing.
The technical effects and advantages of the invention are as follows:
The invention provides a user identification method, system and storage medium based on distributed data processing. It maintains multiple mappings among device IDs, user IDs and global IDs, which addresses user identification and association in multi-platform, multi-device environments and gives user behavior analysis a complete, unified data basis. It combines a distributed computing framework based on Spark and Kafka with a Redis high-performance cache, which provides high-throughput, low-latency processing and improves the efficiency and real-time performance of the system. Its modular design and Kafka topic distribution mechanism make the system more extensible and flexible, so it can adapt to the changing requirements of different business scenarios and support ongoing business development. Finally, key-based partitioning in the data distribution stage sends all data from the same user to the same partition. This guarantees data consistency for subsequent user behavior analysis and improves the accuracy of its results.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
Fig. 1 is a flowchart of a user identification method based on distributed data processing according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that current user behavior analysis systems face the following challenges:
1. Data consistency: the mappings among device IDs, user IDs and global IDs are difficult to keep consistent in a distributed environment;
2. Real-time requirements: high-concurrency, real-time data processing must be supported;
3. Data quality: data cleaning and conversion must be accurate;
4. System extensibility: processing of large-scale user behavior data must be supported;
5. Multidimensional analysis: analysis across dimensions such as user attributes, device attributes and event attributes must be supported.
To overcome the shortcomings described in the background, an embodiment of the invention provides a user identification method based on distributed data processing. As shown in Fig. 1, the method comprises the following steps:
S1, receiving user behavior data and performing basic verification on the behavior data;
Performing basic verification on the data specifically includes:
the system receives user behavior data in batches, for example as a list of message objects, each message containing an app_key, raw data, and user and device information;
performing JSON format verification and parsing on each message, marking any message that fails to parse as invalid, and recording an error log;
checking the message structure against a predefined schema and filtering out structurally invalid data;
querying a database with the app_key in the message to obtain the corresponding application ID (appId) and filling it into the message object.
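The three verification checks can be illustrated with a minimal Python sketch. Several details are assumptions. The schema is reduced to a hypothetical REQUIRED_FIELDS table. The database lookup is replaced by a plain dict called app_key_to_app_id. Rejecting an unknown app_key is also an assumption, because the source does not say what happens in that case.

```python
import json
import logging

log = logging.getLogger("ingest")

# Hypothetical minimal schema: top-level fields every message must carry.
REQUIRED_FIELDS = {"app_key": str, "data": list}


def validate_batch(raw_messages, app_key_to_app_id):
    """Parse, schema-check and enrich a batch of raw JSON strings.

    app_key_to_app_id stands in for the database lookup (assumption).
    Returns (valid, invalid) lists.
    """
    valid, invalid = [], []
    for raw in raw_messages:
        # 1. JSON format check and parse; failures are marked invalid and logged.
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError as exc:
            log.error("JSON parse failed: %s", exc)
            invalid.append({"raw": raw, "valid": False, "reason": "json"})
            continue
        # 2. Structural check against the (hypothetical) predefined schema.
        if not isinstance(msg, dict) or any(
            not isinstance(msg.get(field), typ) for field, typ in REQUIRED_FIELDS.items()
        ):
            invalid.append({"raw": raw, "valid": False, "reason": "schema"})
            continue
        # 3. Resolve app_key -> appId and fill it into the message object.
        #    Treating an unknown app_key as invalid is an assumption.
        app_id = app_key_to_app_id.get(msg["app_key"])
        if app_id is None:
            invalid.append({"raw": raw, "valid": False, "reason": "unknown app_key"})
            continue
        msg["appId"] = app_id
        valid.append(msg)
    return valid, invalid
```

The invalid messages are kept with a reason code rather than discarded. This way the error log and any later reprocessing can see why each message was rejected.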
S2, constructing the Redis Key of the device ID, and batch querying and assigning device internal IDs;
Here, a Redis Key is a key in Redis, the key-value (KV) storage component. Constructing the Redis Key of the device ID specifically includes:
traversing all valid messages, extracting the application ID and the device identifier deviceId from each message, and concatenating them into the Redis Key "d:appId:{deviceId}";
collecting the Redis Keys of all unique device IDs to facilitate subsequent batch operations.
Batch querying and assigning device internal IDs includes:
batch querying Redis to obtain the device internal ID (deviceInternalId) currently mapped to the Redis Key of each device ID; for each unassigned key, obtaining the current maximum device internal ID of the application (key id:d:{appId}), incrementing it to assign a new ID, writing it to Redis (Hash structure; the field is deviceId and the value is the new ID), and incrementing the counter;
maintaining the Redis Keys of all device IDs and their corresponding device internal IDs in a local Map.
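Key construction, de-duplication and batch assignment can be shown together in a minimal Python sketch. FakeRedis is a hypothetical in-memory stand-in that mimics only the MGET, SET and INCR commands. The flat appId and deviceId message fields are also assumptions. For simplicity, the sketch stores each mapping as a plain key equal to the device Redis Key, while the source describes a Redis Hash whose field is deviceId and does not name the Hash itself.

```python
class FakeRedis:
    """In-memory stand-in for the Redis commands used here (MGET, SET, INCR).
    It is not a Redis client; a real deployment would use one."""

    def __init__(self):
        self.kv = {}

    def mget(self, keys):
        return [self.kv.get(k) for k in keys]

    def set(self, key, value):
        self.kv[key] = value

    def incr(self, key):
        self.kv[key] = int(self.kv.get(key, 0)) + 1
        return self.kv[key]


def build_device_keys(messages):
    """Return unique (redis_key, app_id) pairs in first-seen order.
    The flat 'appId'/'deviceId' message fields are hypothetical."""
    seen = {}
    for m in messages:
        key = f"d:{m['appId']}:{m['deviceId']}"
        seen.setdefault(key, m["appId"])
    return list(seen.items())


def assign_device_internal_ids(r, messages):
    """Batch-resolve device internal IDs, allocating new ones from the
    per-application counter id:d:{appId}; returns the local Map."""
    pairs = build_device_keys(messages)
    keys = [k for k, _ in pairs]
    local_map = {}
    # One batched lookup for the whole micro-batch instead of one per message.
    for (key, app_id), existing in zip(pairs, r.mget(keys)):
        if existing is not None:
            local_map[key] = int(existing)
            continue
        new_id = r.incr(f"id:d:{app_id}")  # INCR is atomic in real Redis
        r.set(key, new_id)
        local_map[key] = new_id
    return local_map
```

De-duplicating keys before the lookup ensures that a device appearing many times in one batch consumes only one counter value. The check-then-set step is not atomic across executors. In a real cluster it would need something like HSETNX or a Lua script so that two workers cannot assign different IDs to the same device.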
Writing back to the message structure:
all valid messages are traversed, and the device internal ID is obtained from the local Map using the key "d:appId:{deviceId}";
the device internal ID is written into the user information field of the message and, at the same time, into the attribute field of each event in the message's data list.
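The write-back step can be sketched in Python under assumed field names. The names user, data, attrs and deviceInternalId are hypothetical. local_map is the key-to-internal-ID Map built in the batch-assignment step.

```python
def write_back_device_ids(messages, local_map):
    """Copy each message's device internal ID into its user-info field and
    into the attribute field of every event in its data list.
    Field names ('user', 'data', 'attrs', 'deviceInternalId') are hypothetical."""
    for m in messages:
        internal_id = local_map[f"d:{m['appId']}:{m['deviceId']}"]
        m.setdefault("user", {})["deviceInternalId"] = internal_id
        for event in m.get("data", []):
            # Existing event attributes are preserved; only the ID is added.
            event.setdefault("attrs", {})["deviceInternalId"] = internal_id
    return messages
```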
S3, constructing the Redis Key of the user ID, batch querying and assigning user internal IDs, and performing global ID allocation and mapping maintenance;
Constructing the Redis Key of the user ID specifically includes:
traversing the event attributes of all messages, extracting the user-defined ID (userId), and concatenating it into the Redis Key "u:appId:{userId}";
collecting the Redis Keys of all unique user IDs to facilitate subsequent batch operations.
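Unlike the device ID, the user ID is read from event attributes, and anonymous traffic may not carry one. A minimal Python sketch assumes the ID is stored under a hypothetical userId attribute inside each event's attrs.

```python
def build_user_keys(messages):
    """Collect unique user Redis Keys (first-seen order) from event attributes.
    Assumes the user-defined ID sits under a hypothetical 'userId' attribute;
    events without one (anonymous traffic) contribute no key."""
    keys = {}  # dict used as an insertion-ordered set
    for m in messages:
        for event in m.get("data", []):
            user_id = event.get("attrs", {}).get("userId")
            if user_id:
                keys.setdefault(f"u:{m['appId']}:{user_id}", None)
    return list(keys)
```

The subsequent batch assignment of user internal IDs follows the same pattern as the device case, using the per-application counter id:u:{appId}.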
Batch querying and assigning user internal IDs includes:
batch querying Redis to obtain the user internal ID (userInternalId) currently mapped to the Redis Key of each user ID; for each unassigned key, obtaining the current maximum user internal ID of the application (Redis Key id:u:${appId}), incrementing it to assign a new ID, writing it to Redis (Hash structure; the field is userId and the value is the new ID), and incrementing the counter;
maintaining the Redis Keys of all user IDs and their corresponding user internal IDs in a local Map.
Writing back to the message structure:
all valid messages are traversed, and the user internal ID is written into the event attribute field.
Global ID allocation and mapping maintenance includes the following.
The following mappings are maintained in Redis:
device internal ID → global ID (dz:${appId}:${deviceInternalId} → globalId);
user internal ID → global ID (uz:${appId}:${userInternalId} → globalId);
global ID → user internal ID (zu:${appId}:${globalId} → userInternalId);
When to allocate a new globalId: a new globalId is allocated to the combination only when neither the user internal ID nor the device internal ID has a mapping in Redis.
Ownership of a new globalId: a newly allocated globalId establishes the following mappings at the same time:
device internal ID → globalId
user internal ID → globalId (if a user ID is present)
globalId → user internal ID (if a user ID is present)
Merging principle: merge preferentially into an existing globalId and create a new one only when none exists. This ensures that the same user or the same device under the same application always belongs to a single globalId. Allocation also guarantees that each merged user-device combination under the same application receives a unique, incrementing globalId.
The global ID is written into the message structure and the multi-level mappings in Redis are updated synchronously, ensuring the uniqueness and consistency of device, user and global IDs.
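The allocation and merging rules can be illustrated with a minimal Python sketch. It uses the dz/uz/zu key layout listed above and the decision order stated in the global ID generation step: the user's mapping first, then the device's, otherwise a new ID. The counter key id:{appId} follows that step with its spacing normalized. FakeRedis is a hypothetical stand-in that mimics GET, SETNX and INCR. The decision to leave existing mappings untouched is an assumption, because the source does not say how conflicting mappings are resolved.

```python
class FakeRedis:
    """In-memory stand-in mimicking Redis GET, SETNX and INCR."""

    def __init__(self):
        self.kv = {}

    def get(self, key):
        return self.kv.get(key)

    def setnx(self, key, value):
        """Set only if absent, like Redis SETNX; returns True if written."""
        if key in self.kv:
            return False
        self.kv[key] = value
        return True

    def incr(self, key):
        self.kv[key] = int(self.kv.get(key, 0)) + 1
        return self.kv[key]


def resolve_global_id(r, app_id, device_internal_id, user_internal_id=None):
    dz = f"dz:{app_id}:{device_internal_id}"
    uz = f"uz:{app_id}:{user_internal_id}" if user_internal_id is not None else None
    user_gid = r.get(uz) if uz else None
    device_gid = r.get(dz)
    # Merge into an existing globalId, preferring the user's mapping; allocate
    # a new one only when neither the user nor the device is mapped yet.
    if user_gid is not None:
        gid = user_gid
    elif device_gid is not None:
        gid = device_gid
    else:
        gid = r.incr(f"id:{app_id}")
    # Record the mappings; existing entries are left untouched (assumption).
    r.setnx(dz, gid)
    if uz:
        r.setnx(uz, gid)
        r.setnx(f"zu:{app_id}:{gid}", user_internal_id)
    return gid
```

For example, consider an anonymous device that first receives globalId 1. If a user later logs in on that device, the user inherits globalId 1. If the same user then appears on a second device, that device also maps to 1. In production, the read-decide-write sequence should run atomically, for instance inside a Lua script executed with EVAL, so that concurrent executors cannot allocate two globalIds for the same pair.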
S4, processing user attributes, device attributes and event attributes to complete the standardization and completion of the data structure.
User attribute processing includes:
traversing all messages and extracting custom attributes from data items of type "usr";
determining the attribute type (such as string, numeric or Boolean), checking the length of the attribute name, and filtering out illegal or over-long attributes;
assigning each legal attribute a unique attribute ID and type, which are looked up or registered through a cache or database, and writing them into the message structure.
Device attribute processing includes:
extracting custom device attributes from data items of type "pl";
determining the attribute type, checking its length, and filtering out illegal or over-long attributes;
assigning each legal device attribute a unique attribute ID and type, which are looked up or registered through a cache or database, and writing them into the message structure.
Event attribute processing includes:
extracting custom event attributes from event data items of type "evt" and similar types;
determining the attribute type, checking its length, and filtering out illegal or over-long attributes;
assigning each legal event attribute a unique attribute ID and type, which are looked up or registered through a cache or database, and writing them into the message structure.
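The three attribute passes share one pattern: detect the type, check the name, then look up or register an ID. A minimal Python sketch covers all three. MAX_NAME_LEN = 64 is a hypothetical limit, since the source gives no number. AttrRegistry is an in-memory stand-in for the cache/database registry. The props field name is assumed. Only "evt" is mapped for events, although the source mentions other event types as well.

```python
MAX_NAME_LEN = 64  # hypothetical limit; the source gives no number

# Data item type -> attribute scope ("evt" stands in for all event types).
SCOPES = {"usr": "user", "pl": "device", "evt": "event"}


def attr_type(value):
    # bool must be tested before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return None  # unsupported type -> attribute is filtered out


class AttrRegistry:
    """In-memory stand-in for the cache/database attribute registry."""

    def __init__(self):
        self._ids = {}

    def lookup_or_register(self, app_id, scope, name, typ):
        key = (app_id, scope, name)
        if key not in self._ids:
            self._ids[key] = (len(self._ids) + 1, typ)
        return self._ids[key]  # (attribute ID, registered type)


def process_attributes(messages, registry):
    for m in messages:
        for item in m.get("data", []):
            scope = SCOPES.get(item.get("type"))
            if scope is None:
                continue  # not an attribute-bearing item type
            resolved = {}
            for name, value in item.get("props", {}).items():
                typ = attr_type(value)
                if typ is None or not name or len(name) > MAX_NAME_LEN:
                    continue  # illegal or over-long attribute: drop it
                attr_id, reg_type = registry.lookup_or_register(m["appId"], scope, name, typ)
                resolved[name] = {"id": attr_id, "type": reg_type, "value": value}
            item["props"] = resolved
    return messages
```

The registry key is (appId, scope, name), so the same attribute name gets separate IDs under user, device and event scopes. The sketch does not handle an attribute whose type changes between messages.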
Data structure standardization and completion:
device, user, event and other information is organized into a standardized data structure, and all IDs and attribute information are filled in.
Data collection and enqueuing: the application's event-tracking SDK collects user behavior data, including event types, event attributes, device information and user information, and sends it to a Kafka message queue. The queue uses behavior_data as its topic name and stores the tracking data from every application.
Consumer initialization: the system initializes the Spark Streaming context and the Kafka consumer, sets the relevant parameters, and creates a direct stream to receive Kafka messages. The main parameters include the consumer group ID, the auto-commit configuration and the maximum message size.
Message consumption: Spark Streaming consumes Kafka messages in one-second batches, wraps each message in an object carrying its topic, partition, offset, key and value, and passes the objects to the real-time handler.
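The consumer parameters and the message wrapper can be sketched in Python. The property names are standard Kafka consumer settings. In Spark they would be passed as kafkaParams to KafkaUtils.createDirectStream in the spark-streaming-kafka-0-10 integration, which is a Scala/Java API. The chosen values are placeholders, and mapping "maximum message size" to max.partition.fetch.bytes is one plausible reading. KafkaRecord is a hypothetical stand-in for the wrapper object, mirroring the fields of a Kafka ConsumerRecord.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder values; the source names these parameters but not their settings.
KAFKA_PARAMS = {
    "bootstrap.servers": "localhost:9092",           # hypothetical broker address
    "group.id": "user-id-pipeline",                  # consumer group ID (hypothetical)
    "enable.auto.commit": False,                     # auto-commit configuration
    "max.partition.fetch.bytes": 10 * 1024 * 1024,   # one way to cap message size
    "auto.offset.reset": "latest",
}
TOPIC = "behavior_data"
BATCH_INTERVAL_SECONDS = 1  # one micro-batch per second


@dataclass
class KafkaRecord:
    """Wrapper handed to the real-time handler (hypothetical class)."""
    topic: str
    partition: int
    offset: int
    key: Optional[str]
    value: str


def wrap_batch(raw_records):
    """Wrap (topic, partition, offset, key, value) tuples from one micro-batch."""
    return [KafkaRecord(*rec) for rec in raw_records]
```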
Batch message processing: the real-time handler receives the message list and processes each message as follows.
JSON parsing: checks whether the message is valid JSON, parses its content, and stores the result in the data field of the message object.
Data verification: checks the data structure against a predefined schema and marks messages that do not meet the basic requirements as invalid.
Application identification: queries the database with the application key (app_key) in the message, obtains the corresponding application ID, and stores it in the appId field of the message object.
Device ID generation: generates a device ID for the message in the format "d:{appId}:{deviceIdentifier}", where deviceIdentifier is the device identifier carried in the message.
Session management: adds a session ID and a UUID to events of specific types (evt, ss, se, mkt, abp) for tracking user sessions.
User ID generation: adds a user ID to the data in the format "u:{appId}:{customUserId}", where customUserId is the user-defined identifier in the message.
Global ID generation: generates a global ID for the message using the following logic:
extracting the device ID and the user ID from the message;
constructing the Redis query keys for the device (appId, deviceId) and the user (appId, userId);
batch querying Redis to obtain the global ID mappings for the device ID and the user ID;
determining the global ID by the following rules:
if a user ID exists and already has a global ID, that global ID is used; if there is no user ID but the device ID already has a global ID, that global ID is used; if neither exists, a new global ID is generated by incrementing the application's counter (id:{appId}).
updating the mappings in Redis:
device ID → global ID: device:{appId}
user ID → global ID: user:{appId}
global ID → user ID: ids:{appId}
adding the global ID to the message attributes.
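The session-management step gives events of type evt, ss, se, mkt and abp a session ID and a UUID. The source does not state how a session boundary is determined. This minimal Python sketch therefore assumes a hypothetical rule: reuse the device's current session ID from active_sessions, and mint a new one with uuid4 when none exists. The sessionId and uuid field names are also assumptions.

```python
import uuid

SESSION_EVENT_TYPES = {"evt", "ss", "se", "mkt", "abp"}


def add_session_ids(messages, active_sessions):
    """Attach a session ID and a per-event UUID to session-tracked events.
    active_sessions maps a device key to its current session ID; the
    reuse-or-mint rule and the field names are hypothetical."""
    for m in messages:
        device_key = f"d:{m['appId']}:{m['deviceId']}"
        for item in m.get("data", []):
            if item.get("type") not in SESSION_EVENT_TYPES:
                continue
            if device_key not in active_sessions:
                active_sessions[device_key] = str(uuid.uuid4())  # new session
            item["sessionId"] = active_sessions[device_key]
            item["uuid"] = str(uuid.uuid4())  # unique ID for this event
    return messages
```

A production rule would more likely close sessions on inactivity timeouts or on explicit session-end events. The dict here only illustrates where such state would be kept.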
Three-level user identification system: user identification is built on the following three levels:
Device ID ($system_did): generated from the device's unique identifier in the format "d:{appId}:{deviceIdentifier}".
User ID ($system_uid): generated from the user-defined identifier in the format "u:{appId}:{customUserId}".
Global ID ($system_id): a globally unique identifier generated by the system to associate the device ID with the user ID.
User attribute processing: processes user-related custom attributes and generates a unique identifier for each user attribute.
Device attribute processing: processes device-related information and generates a unique identifier for each device attribute.
Event attribute processing: processes custom events and generates a unique identifier for each event attribute.
Data structure construction: builds the data distribution structure, including:
organizing device information into a device data structure;
organizing user information into a user data structure;
organizing event information into an event data structure;
integrating the global ID information and establishing associations between the data.
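The construction step can be illustrated with a minimal Python sketch. It splits one enriched message into device, user and event structures that share the global ID. All field names are hypothetical, and the user structure is omitted when no user internal ID is present.

```python
def build_distribution_record(msg):
    """Split one enriched message into device, user and event parts linked
    by the global ID. All field names are hypothetical."""
    gid = msg["globalId"]
    base = {"globalId": gid, "appId": msg["appId"]}
    device = {**base, "deviceInternalId": msg["deviceInternalId"], **msg.get("device", {})}
    user = None
    if msg.get("userInternalId") is not None:  # anonymous traffic has no user part
        user = {**base, "userInternalId": msg["userInternalId"], **msg.get("userProps", {})}
    events = [{**base, **e} for e in msg.get("events", [])]
    return {"globalId": gid, "device": device, "user": user, "events": events}
```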
Data distribution: the processed data is distributed to different Kafka topics:
the complete data is sent to the system_total topic;
data is sent to the system_total_random topic with appId_systemId as the key;
user data is sent to the system_user topic;
device data is sent to the system_device topic;
event data is sent to the system_event topic;
data is sent asynchronously, the Snappy compression algorithm reduces network transmission overhead, and key-based partitioning ensures that data from the same user is sent to the same partition.
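Topic fan-out and key-based partitioning can be sketched in Python. The sketch assumes that systemId in the appId_systemId key is the global ID. The source specifies that key only for system_total_random; applying it to every topic, so that one user's data co-locates everywhere, is an assumption. compression.type is a real Kafka producer property. Kafka's default partitioner hashes key bytes with murmur2, and zlib.crc32 is used here only as a stand-in with the same relevant property: equal keys always map to the same partition.

```python
import json
import zlib

# Snappy compression as described; the broker address is a placeholder.
PRODUCER_CONFIG = {"bootstrap.servers": "localhost:9092", "compression.type": "snappy"}


def partition_for(key, num_partitions):
    """Deterministic key -> partition (crc32 stand-in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(record):
    """Fan one record out as (topic, key, json_payload) tuples."""
    key = f"{record['appId']}_{record['globalId']}"  # assumes systemId == global ID
    out = [("system_total", key, record),
           ("system_total_random", key, record)]
    if record.get("user") is not None:
        out.append(("system_user", key, record["user"]))
    out.append(("system_device", key, record["device"]))
    out.extend(("system_event", key, e) for e in record.get("events", []))
    return [(topic, k, json.dumps(payload)) for topic, k, payload in out]
```

Each tuple would then be handed to an asynchronous producer send call. Because the key is derived from the global ID, all of one user's records in a given topic land on one partition, which preserves their relative order for downstream consumers.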
Through the above embodiment, the invention uniquely identifies users and associates their behavior in multi-platform, multi-device environments. It provides efficient, reliable and extensible data processing and lays a solid data foundation for user behavior analysis.
In summary, the user identification method based on distributed data processing according to the embodiments of the present invention uses distributed data processing to uniquely identify users and associate their behavior in multi-platform, multi-device environments, and it provides efficient data processing and distribution. The system builds a real-time data processing framework on Spark Streaming and Kafka and uses Redis as a high-performance cache to store the user identification mappings. It generates, manages and associates user identifiers through a multi-level identification system consisting of device IDs, user IDs and global IDs.
According to a second aspect of the present invention, there is provided a user identification system based on distributed data processing, comprising:
a user data acquisition module, configured to receive user behavior data and perform basic verification on the data;
a device ID and session ID assignment module, configured to construct the Redis Key of the device ID and to batch query and assign device internal IDs;
a user ID and global ID allocation module, configured to construct the Redis Key of the user ID, batch query and assign user internal IDs, and perform global ID allocation and mapping maintenance;
an attribute processing and data structure completion module, configured to process user attributes, device attributes and event attributes and to complete the standardization and completion of the data structure.
It may be understood that the user identification system based on distributed data processing provided by the present invention corresponds to the user identification method based on distributed data processing provided in the foregoing embodiments, and relevant technical features of the user identification system based on distributed data processing may refer to relevant technical features of the user identification method based on distributed data processing, which are not described herein again.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the user identification method based on distributed data processing described above.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It should be noted that the foregoing describes only preferred embodiments of the present invention. Although the present invention has been described in detail with reference to these embodiments, those skilled in the art may still modify the technical solutions they describe or make equivalent replacements of some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention is intended to fall within its scope of protection.

Claims (10)

1. A user identification method based on distributed data processing, characterized by comprising the following steps:
receiving user behavior data and performing basic verification on the behavior data;
constructing Redis keys for device IDs, and batch querying and allocating device internal IDs;
constructing Redis keys for user IDs, batch querying and allocating user internal IDs, and performing global ID allocation and mapping maintenance;
processing user attributes, device attributes, and event attributes to standardize and complete the data structure.

2. The user identification method based on distributed data processing according to claim 1, wherein the user behavior data comprises a list of message objects, each message containing an app_key, raw data, and user and device information; and performing basic verification on the behavior data comprises:
performing JSON format verification and parsing on each message, marking any message that fails to parse as invalid, and recording an error log;
verifying the message structure against a predefined schema and filtering out data whose structure is invalid;
querying a database with the app_key in the message to obtain the corresponding application ID, appId, and adding it to the message object.

3. The user identification method based on distributed data processing according to claim 1, wherein constructing the Redis key of the device ID comprises:
traversing all valid messages, extracting the application ID and the device ID from each message, and concatenating them into the Redis key "d:appId:{deviceId}";
collecting the Redis keys of all unique device IDs.

4. The user identification method based on distributed data processing according to claim 1, wherein batch querying and allocating device internal IDs comprises:
batch querying Redis to obtain, for the Redis key of each device ID, the device internal ID it currently maps to;
for each device ID key that has not yet been assigned, obtaining the current maximum device internal ID of the application ID from the key id:d:{appId}, allocating a new ID by auto-increment, and writing it to Redis as a Hash whose field is the device ID and whose value is the new ID, while incrementing the counter;
maintaining the Redis keys of all device IDs and their corresponding device internal IDs in a local Map.

5. The user identification method based on distributed data processing according to claim 1, wherein constructing the Redis key of the user ID comprises:
traversing the event attributes of all messages, extracting the user-defined ID, userId, and concatenating it into the Redis key "u:appId:{userId}";
collecting the Redis keys of all unique user IDs.

6. The user identification method based on distributed data processing according to claim 1, wherein batch querying and allocating user internal IDs comprises:
batch querying Redis to obtain the user internal ID (userInternalId) that the Redis key of each user ID currently maps to; for each user ID Redis key that has not yet been assigned, obtaining the current maximum user internal ID of the application ID, allocating a new ID by auto-increment, and writing it to Redis;
maintaining the Redis keys of all user IDs and their corresponding user internal IDs in a local Map.

7. The user identification method based on distributed data processing according to claim 1, wherein global ID allocation and mapping maintenance comprises:
allocating a new globalId to the user internal ID and the device internal ID only when neither of them already has a mapping in Redis;
preferentially merging into an existing globalId and creating a new one only when none exists, so that the same user or the same device under the same application always belongs to a unique globalId;
writing the global ID into the message structure and synchronously updating the multi-level mapping relationships in Redis.

8. The user identification method based on distributed data processing according to claim 1, wherein the user attribute processing comprises:
traversing all messages and extracting custom attributes from data items of type "usr";
determining the attribute type, checking the length of the attribute name, and filtering out illegal or overlong attributes;
assigning each legal attribute a unique attribute ID and type by lookup or registration through a cache or database, and writing it into the message structure;
the device attribute processing comprises:
extracting custom device attributes from data items of type "pl";
determining the attribute type, checking the length, and filtering out illegal or overlong attributes;
assigning each legal device attribute a unique attribute ID and type by lookup or registration through a cache or database, and writing it into the message structure;
the event attribute processing comprises:
extracting custom event attributes from event data items of various types, such as "evt";
determining the attribute type, checking the length, and filtering out illegal or overlong attributes;
assigning each legal event attribute a unique attribute ID and type by lookup or registration through a cache or database, and writing it into the message structure.

9. A user identification system based on distributed data processing, characterized by comprising:
a user data acquisition module, configured to receive user behavior data and perform basic verification on the data;
a device ID and session ID allocation module, configured to construct Redis keys for device IDs, and batch query and allocate device internal IDs;
a user ID and global ID allocation module, configured to construct Redis keys for user IDs, batch query and allocate user internal IDs, and perform global ID allocation and mapping maintenance;
an attribute processing and data structure completion module, configured to process user attributes, device attributes, and event attributes to standardize and complete the data structure.

10. A computer-readable storage medium, characterized in that a computer management program is stored thereon, and the computer management program, when executed by a processor, implements the user identification method based on distributed data processing according to any one of claims 1 to 8.
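The basic verification of claim 2 (JSON parsing, schema check, app_key to appId lookup) can be illustrated with the minimal sketch below. It rests on several assumptions that are not taken from the source:

- The schema fields `app_key` and `data` are hypothetical.
- A plain dict stands in for the application database.
- Messages with an unknown app_key are treated as invalid.
- "Marking as invalid" is modelled by leaving the message out of the result.

```python
import json
import logging

logger = logging.getLogger("validation")

# Hypothetical schema: top-level fields each message is assumed to carry.
REQUIRED_FIELDS = {"app_key": str, "data": list}


def validate_messages(raw_messages, app_key_to_app_id):
    """Parse, schema-check and enrich raw messages; returns only the valid ones."""
    valid = []
    for raw in raw_messages:
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Parsing failed: treat as invalid (excluded) and log the error.
            logger.error("invalid JSON message: %s", exc)
            continue
        # Structural check against the hypothetical schema.
        if not isinstance(msg, dict) or any(
            not isinstance(msg.get(name), typ) for name, typ in REQUIRED_FIELDS.items()
        ):
            logger.warning("schema mismatch, dropped: %r", raw)
            continue
        # Resolve app_key -> appId; the dict stands in for the database query.
        app_id = app_key_to_app_id.get(msg["app_key"])
        if app_id is None:
            logger.warning("unknown app_key %r, dropped", msg["app_key"])
            continue
        msg["appId"] = app_id
        valid.append(msg)
    return valid
```

For example, `validate_messages(['{"app_key": "k1", "data": []}', 'not json'], {"k1": 7})` keeps only the first message, now carrying `"appId": 7`.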
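Claims 3 to 6 combine three steps: build deduplicated Redis keys, look them all up in one batch, and allocate internal IDs from a per-application counter. A minimal sketch of that flow follows, under these assumptions:

- A small in-memory class stands in for Redis, imitating only `MGET` and `INCR`.
- The `appId` part of "d:appId:{deviceId}" is read as the message's application ID.
- `userId` is read from a top-level message field.
- The user counter prefix `id:u` is hypothetical; only `id:d:{appId}` appears in the claims.
- Mappings are stored as flat keys, not the Hash layout described in claim 4.

```python
class FakeRedis:
    """In-memory stand-in for the few Redis commands used here (assumption)."""

    def __init__(self):
        self.kv = {}

    def mget(self, keys):
        # Like Redis MGET: one batched lookup, None for missing keys.
        return [self.kv.get(k) for k in keys]

    def incr(self, key):
        # Like Redis INCR: increment and return the counter, starting from 0.
        self.kv[key] = self.kv.get(key, 0) + 1
        return self.kv[key]

    def set(self, key, value):
        self.kv[key] = value


def build_keys(messages, prefix, id_field):
    """Collect unique keys such as 'd:{appId}:{deviceId}', preserving first-seen order."""
    keys, seen = [], set()
    for msg in messages:
        key = f"{prefix}:{msg['appId']}:{msg[id_field]}"
        if key not in seen:
            seen.add(key)
            keys.append(key)
    return keys


def assign_internal_ids(store, keys, counter_prefix):
    """Batch-query internal IDs; allocate missing ones from a per-app counter."""
    local_map = {}
    for key, current in zip(keys, store.mget(keys)):
        if current is None:
            app_id = key.split(":", 2)[1]
            # Counter key modelled on 'id:d:{appId}' from claim 4.
            current = store.incr(f"{counter_prefix}:{app_id}")
            store.set(key, current)
        local_map[key] = current
    return local_map
```

Usage: `assign_internal_ids(FakeRedis(), build_keys(msgs, "d", "deviceId"), "id:d")` returns the local Map of claim 4. In a concurrent deployment, two workers could both miss the same key and allocate different IDs. Writing with `SETNX`/`HSETNX`, or doing the check-and-allocate in a Lua script, is one way to guard against that; the claims do not say how this is handled.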
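The globalId rule of claim 7 merges into an existing globalId first and allocates a new one only when neither the user nor the device is mapped. It can be sketched as below, with these assumptions:

- Plain dicts replace the "multi-level mapping" in Redis.
- The key formats `g:u:…` and `g:d:…` and the message fields `userInternalId` and `deviceInternalId` are hypothetical.
- When both mappings exist, the user's mapping wins. The claims do not settle this conflict rule.
- Existing mappings are never overwritten.

```python
def resolve_global_id(mappings, counters, app_id, user_iid, device_iid):
    """Return the globalId for a (user, device) pair, creating one only if needed."""
    device_key = f"g:d:{app_id}:{device_iid}"
    user_key = None if user_iid is None else f"g:u:{app_id}:{user_iid}"

    # Prefer an existing mapping: user first (assumption), then device.
    global_id = None if user_key is None else mappings.get(user_key)
    if global_id is None:
        global_id = mappings.get(device_key)
    if global_id is None:
        # Neither side is mapped yet: allocate a fresh globalId for this app.
        counters[app_id] = counters.get(app_id, 0) + 1
        global_id = counters[app_id]

    # Record only the mappings that are still missing.
    mappings.setdefault(device_key, global_id)
    if user_key is not None:
        mappings.setdefault(user_key, global_id)
    return global_id


def attach_global_ids(messages, mappings, counters):
    """Write the resolved globalId into each message (anonymous users allowed)."""
    for msg in messages:
        msg["globalId"] = resolve_global_id(
            mappings, counters, msg["appId"], msg.get("userInternalId"), msg["deviceInternalId"]
        )
    return messages
```

With this rule, an anonymous device that later logs in keeps its globalId. The user is attached to the device's existing ID instead of receiving a new one.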
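The three attribute-processing branches of claim 8 ("usr", "pl", "evt") share one pattern: type the value, filter illegal or overlong names, look up or register an attribute ID, and write the result back. The minimal sketch below follows that pattern, with these assumptions:

- The item layout `{"type": ..., "attrs": {...}}` is hypothetical.
- The type codes are hypothetical.
- The 64-character name limit is hypothetical; the claims give no value.
- A dict-backed registry stands in for the cache/database.
- An attribute whose type conflicts with its earlier registration is filtered out.

```python
MAX_NAME_LEN = 64  # hypothetical limit; the claims do not state a value

# Hypothetical mapping from Python value types to attribute type codes.
TYPE_CODES = {bool: "bool", int: "number", float: "number", str: "string"}


class AttributeRegistry:
    """Dict-backed stand-in for the attribute cache/database (assumption)."""

    def __init__(self):
        self._entries = {}  # (appId, itemType, name) -> (attrId, attrType)

    def lookup_or_register(self, app_id, scope, name, attr_type):
        key = (app_id, scope, name)
        if key not in self._entries:
            # First sighting: register with the next free ID and this type.
            self._entries[key] = (len(self._entries) + 1, attr_type)
        return self._entries[key]


def process_item(app_id, item, registry):
    """Keep only legal custom attributes and tag each with its attribute ID and type."""
    scope = item["type"]  # e.g. "usr", "pl", "evt"
    processed = {}
    for name, value in item.get("attrs", {}).items():
        attr_type = TYPE_CODES.get(type(value))
        # Drop unsupported value types and empty or overlong names.
        if attr_type is None or not name or len(name) > MAX_NAME_LEN:
            continue
        attr_id, registered_type = registry.lookup_or_register(app_id, scope, name, attr_type)
        if registered_type != attr_type:
            continue  # conflicts with the type registered earlier (assumption)
        processed[name] = {"id": attr_id, "type": attr_type, "value": value}
    item["attrs"] = processed
    return item
```

Because the registry key includes the item type, a "usr" attribute and a "pl" attribute with the same name receive separate IDs.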
CN202510718267.2A 2025-05-30 2025-05-30 User identification method, system and storage medium based on distributed data processing Pending CN120578695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510718267.2A CN120578695A (en) 2025-05-30 2025-05-30 User identification method, system and storage medium based on distributed data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510718267.2A CN120578695A (en) 2025-05-30 2025-05-30 User identification method, system and storage medium based on distributed data processing

Publications (1)

Publication Number Publication Date
CN120578695A true CN120578695A (en) 2025-09-02

Family

ID=96863425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510718267.2A Pending CN120578695A (en) 2025-05-30 2025-05-30 User identification method, system and storage medium based on distributed data processing

Country Status (1)

Country Link
CN (1) CN120578695A (en)

Similar Documents

Publication Publication Date Title
US11411897B2 (en) Communication method and communication apparatus for message queue telemetry transport
JP4132441B2 (en) Data management device for managed objects
US6487581B1 (en) Apparatus and method for a multi-client event server
CN108134764A (en) A kind of Distributed data share exchange method and system
US20090300181A1 (en) Methods and systems for dynamic grouping of enterprise assets
CN112256954A (en) A message push processing method and related system
CN101442558B (en) Method and system for providing index service for P2SP network
CN111427613B (en) Application program interface API management method and device
CN111026709A (en) Data processing method and device based on cluster access
CN101360345A (en) A data service management method, device and system
CN109298937A (en) File parsing method and network device
CN114443940A (en) A message subscription method, device and device
CN113761079A (en) Data access method, system and storage medium
CN107766207A (en) Distributed automatic monitoring method, system, computer-readable recording medium and terminal device
CN114969441A (en) Knowledge mining engine system based on graph database
CN107612833A (en) A kind of URI method for routing and relevant apparatus based on storage system
US12282479B2 (en) Intelligent parity service with database query optimization
JP4009591B2 (en) Domain naming system (DNS) for accessing databases
CN111984505A (en) Operation and maintenance data acquisition engine and acquisition method
CN111858617A (en) User searching method and device, computer readable storage medium and electronic equipment
WO2025025734A1 (en) Task execution method and apparatus, device and storage medium
US20190182356A1 (en) Data networking method in data-centric network system and apparatus implementing same
CN105978744A (en) Resource allocation method, device and system
CN118426937A (en) Port resource allocation method, device, equipment, storage medium and program product
CN112235367B (en) Method, system, terminal and storage medium for subscribing entity behavior relation message

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination