WO2015134019A1 - Processing primary key modifications - Google Patents
Processing primary key modifications
- Publication number
- WO2015134019A1 (PCT application PCT/US2014/020940)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- primary key
- timestamp
- update
- delete
- create
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Definitions
- Referring to Fig. 4, after SCU_1279 is uploaded the freshness timestamp of S3 is updated from TS5 to TS12. Since data updates from S1 and S2 have not been uploaded, their freshness timestamps are unchanged, so the minimum freshness across S1, S2 and S3 is still determined by the older of the S1 and S2 freshness timestamps.
- the freshness timestamp of each segment may be stored in a data structure, such as a freshness table indicated generally at 450 in Fig. 4.
- the minimum freshness timestamp changes to a newer timestamp whenever there has been an authority merge at processing pipeline 244, and updates from the oldest segment (the one which defines the minimum freshness) arrive and update the freshness of that segment to a fresher value.
- entries of the update data structure that are older than the new minimum freshness timestamp cannot be affected by outstanding updates, because no outstanding update can have a timestamp older than the minimum freshness timestamp.
- the minimum freshness also represents a time boundary up to which the system has complete knowledge of all data updates that may have happened across all segments.
- STmax is the largest timestamp present in any uploaded SCU from a given source (e.g., segment S1, S2 or S3 in Fig. 4).
- An uploaded SCU is an SCU that has been ingested, but not transformed (e.g., sorted, merged, etc.) by processing pipeline 244.
- primary key modification operation 420 should be applied by deleting source primary key DirA and creating destination primary key DirB once the minimum freshness value is greater than TS9, thereby indicating that a complete prefix of data updates up until TS9 should have been observed from all segments by processing pipeline 244.
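- As a rough Python sketch of the complete-prefix check described above (the freshness values, integer timestamps and function name are illustrative assumptions, not taken from the disclosure):

```python
# Hypothetical freshness table (450): latest observed timestamp per segment,
# treating TSn as the integer n for illustration.
freshness = {"S1": 11, "S2": 10, "S3": 12}

def complete_prefix_observed(freshness, pk_mod_timestamp):
    # A complete prefix up to the modification's timestamp has been observed
    # only once every segment has reported updates past that timestamp.
    return min(freshness.values()) > pk_mod_timestamp

# "move DirA DirB" carries timestamp TS9, so it may be applied once
# min(freshness) > 9.
print(complete_prefix_observed(freshness, 9))  # True for the values above
```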
- FIG. 5 illustrates the system in Fig. 2 and processing of a primary key modification operation to rename a directory name, according to examples of the present disclosure.
- Processing of the primary key modification operation may be implemented using upload entity 230 (e.g., AJ Scanner), server system 240 (also referred to as "second device" in Fig. 1) and client device 260 (also referred to as "first device" in Fig. 1).
- Client device 260 supports an external program or external utility capable of sending queries to, and obtaining query responses from, server system 240. Additionally, client device 260 is capable of uploading SCUs to processing pipeline 244.
- the external program may be implemented using any suitable technique, such as a programming language (e.g., C++), etc.
- Fig. 5 will be explained with reference to Fig. 6, a flowchart of a detailed process 600 for processing a primary key modification, according to examples of the present disclosure.
- Blocks 510 to 590 in Fig. 5 correspond to blocks 610 to 690 in Fig. 6, respectively.
- the destination primary key should inherit metadata or attributes of the source primary key, including non-physical inode metadata and any other custom attribute.
- upload entity 230 retrieves a journal entry associated with primary key modification operation "move DirA DirB" in AJFile_325.
- the timestamp of the primary key modification operation is also recorded in the journal entry, i.e. TS9 in the example in Fig. 4.
- upload entity 230 also converts the format of AJFile_325 into a format readable by processing pipeline 244.
- upload entity 230 generates an SCU to record the primary key modification operation in a log data structure called "primary key modification log".
- the purpose is to allow the system to keep track of all primary key modification operations from all segments of the file system.
- the log data structure stores metadata about "move DirA DirB", including source primary key DirA, destination primary key DirB and timestamp TS9.
- upload entity 230 uploads the SCU labelled SCU_1280 to processing pipeline 244.
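- By way of a hedged illustration, upload entity 230 might package the primary key modification into a log record along these lines; the dictionary format, field names and build function are assumptions for illustration only:

```python
def build_pk_mod_log_record(source_key, destination_key, timestamp):
    # One row destined for the "primary key modification log" data structure.
    return {
        "table": "primary_key_modification_log",
        "operation": "create",
        "source_primary_key": source_key,             # e.g. "DirA"
        "destination_primary_key": destination_key,   # e.g. "DirB"
        "timestamp": timestamp,                        # e.g. TS9
    }

# Batched into an SCU (here SCU_1280) for upload to ingest stage 248.
scu_1280 = {"scu_id": 1280,
            "updates": [build_pk_mod_log_record("DirA", "DirB", 9)]}
```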
- processing pipeline 244 of server system 240 processes SCU_1280 in stages 248, 250, 252 until it is in the authority 254.
- SCU_1280 is durably stored at ingest stage 248 and transformed (i.e. sorted and merged) at sorting stage 250 and merging stage 252 until log data structure 540 is accessible to process a query from client device 260 at the end of processing pipeline 244.
- client device 260 retrieves the primary key modification operation from server system 240.
- client device 260 queries log data structure 540, which returns "DirA, DirB, TS9" in response.
- client device 260 determines whether data updates from all segments in the file system up until the timestamp of the primary key modification operation have been observed. Any suitable implementation may be used, such as using freshness table 450 in the example in Fig. 4, etc. Freshness table 450 is processed by processing pipeline 244 like all other tables. When updates to the other tables reach authority 254, updates to the freshness table will also be reflected in the authority freshness table, indicating the timestamps that are represented in the rest of the tables in authority 254. In another example, coordinator 242 may store the freshness timestamps in a data structure that is updated by processing pipeline 244.
- client device 260 may determine whether the minimum freshness of data updates from all segments of the file system has exceeded timestamp TS9 as follows:
- Client 260 sends a query to processing pipeline 244 to determine the minimum freshness value of data updates from all segments.
- If the minimum freshness timestamp is greater than the timestamp of the primary key modification operation (i.e. min(freshness) > TS9), client 260 determines that data updates from all segments of the file system up until the timestamp of the primary key modification operation have been observed.
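- A minimal sketch of that determination, assuming a hypothetical query interface on server system 240 (the method names are illustrative, not the actual API):

```python
def ready_to_apply(server, pk_mod):
    # min(freshness) across all segments, maintained via the freshness table.
    min_freshness = server.query_min_freshness()
    # Apply the modification only once every segment has been observed past
    # the modification's timestamp (e.g. TS9 for "move DirA DirB").
    return min_freshness > pk_mod["timestamp"]
```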
- client device 260 applies the primary key modification operation by generating a delete operation to delete source primary key DirA and a create operation to create destination primary key DirB.
- Directories DirA and DirB may be relative or absolute paths. In one example, this may be performed as follows:
- Client 260 sends a query to processing pipeline 244 to retrieve records that are associated with source primary key DirA from authority 254. For example, data structures storing metadata associated with File Lifetime (FL), Change Log (CL), File Delete Time (FDT) and File Objects (FO), etc., may be queried.
- For each row or record that is associated with source primary key DirA, client device 260 generates a delete operation to delete source primary key DirA and a create operation to create destination primary key DirB in the record.
- client device 260 provides the delete and create operations to processing pipeline 244.
- the operations may be provided in the form of new SCU_2008 to delete DirA ("Delete DirA") and create DirB ("Create DirB").
- SCU_2008 may further include a delete operation to delete "DirA DirB TS9" from primary key modification log 540 such that client device 260 will not repeat the same process for the same primary key modification operation the next time primary key modification log 540 is queried. All the relevant updates are uploaded to processing pipeline 244 in a single SCU to maintain cross-table consistency.
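- Putting these steps together, client device 260 might assemble the single SCU roughly as follows; the record layout and the query/upload calls are assumptions for illustration, not the actual interfaces:

```python
def apply_pk_modification(server, pk_mod):
    updates = []
    # Retrieve every authority record keyed by the source primary key (DirA),
    # e.g. rows from the FL, CL, FDT and FO data structures.
    for record in server.query_records_by_key(pk_mod["source"]):
        updates.append({"operation": "delete",
                        "primary_key": pk_mod["source"], "row": record})
        updates.append({"operation": "create",
                        "primary_key": pk_mod["destination"], "row": record})
    # Also delete the processed entry from primary key modification log 540 so
    # the same modification is not applied again.
    updates.append({"operation": "delete",
                    "table": "primary_key_modification_log",
                    "row": (pk_mod["source"], pk_mod["destination"],
                            pk_mod["timestamp"])})
    server.upload_scu(updates)  # one SCU, e.g. SCU_2008, for cross-table consistency
```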
- processing pipeline 244 of server system 240 processes the SCU_2008 in stages 248, 250, 252 until it is in the authority 254.
- SCU_2008 is durably stored at ingest stage 248 and transformed (i.e. sorted and merged) at sorting stage 250 and merging stage 252 until they are accessible to process a query at the end of processing pipeline 244.
- Process 600 in Fig. 6 may be implemented using existing stages of processing pipeline 244 without requiring substantial changes. As such, process 600 benefits from any improvement in query freshness, for example through the use of queries that specify a freshness requirement. Whenever the minimum freshness value changes, data updates may be processed and sent to authority tables 254 along with delete operations to update primary key modification log 540.
- When processing pipeline 244 processes data updates from a large number of segments or nodes, client device 260 might become a processing bottleneck due to the need to retrieve primary key modification operations from, and re-ingest them to, processing pipeline 244.
- If authority table 254 is range partitioned, the partitions of the authority and/or replicated portions of the partitions may be processed in parallel to improve query scalability and throughput.
- Some data structures in authority 254 may require additional processing that goes beyond preserving original values in records associated with the source primary key. Some examples of additional processing are provided below.
- the above additional processing may be performed by client device 260, such as by generating an SCU that includes the above data updates and uploading the SCU to processing pipeline 244 for processing.
- upload entity 230 may be modified to differentiate between two types of primary key modification operation: file rename and directory rename. If the entity being renamed is a directory, upload entity 230 uploads an additional rename operation with an appropriate wildcard character (*) to processing pipeline 244 at 630 in Fig. 6.
- the additional operation ensures children of the directory are also renamed. For example, "move DirA DirB" would result in the directory rename itself, plus rename of its children, i.e. "move DirA/* DirB/*".
- the additional operation may be processed by client device 260 and processing pipeline 244 similar to the processing of "move DirA DirB".
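- A small sketch of the directory-rename expansion; the is_directory flag and the operation tuples are illustrative assumptions:

```python
def expand_rename(source, destination, is_directory):
    ops = [("move", source, destination)]
    if is_directory:
        # Extra wildcard rename so children of the directory are renamed too.
        ops.append(("move", source + "/*", destination + "/*"))
    return ops

print(expand_rename("DirA", "DirB", is_directory=True))
# [('move', 'DirA', 'DirB'), ('move', 'DirA/*', 'DirB/*')]
```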
- example device 700 is capable of acting as upload entity 230, server system 240 or client device 260.
- Device 700 includes processor 710, memory 720 and network interface device 740 that communicate with each other via bus 730.
- device 700 may include a persistent storage (e.g., disk), in case memory (e.g., dynamic random-access memory (DRAM)) does not have sufficient storage for processing.
- Processor 710 is to perform processes described herein with reference to Fig. 1 to Fig. 6.
- Memory 720 may store any necessary data 722 for facilitating processing of primary key modifications.
- Memory 720 may store machine-readable instructions 724 executable by the processor 710 to cause the processor 710 to perform processes described herein with reference to Fig. 1 to Fig. 6.
- processor may be implemented by hardware (including hardware logic circuitry), software or firmware or a combination thereof.
- the term "processor" is to be interpreted broadly to include a processing unit, application-specific integrated circuit (ASIC), logic unit, programmable gate array, etc.
- the processes, methods and functional units may all be performed by the one or more processors 710; reference in this disclosure or the claims to a "processor" should thus be interpreted to mean "one or more processors".
- the processes and methods described in this disclosure may be implemented in the form of a computer software product.
- the computer software product is stored in a storage medium and comprises a plurality of instructions for making a processor implement the methods recited in the examples of the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method for processing a primary key modification in a system comprising a first device and a second device is provided. The second device supports a processing pipeline with a plurality of processing stages to process data updates from a plurality of update sources. The primary key modification operation is from one of the plurality of update sources to modify a source primary key to a destination primary key, and it is associated with a timestamp. The first device retrieves the primary key modification operation from the second device. When data updates from all of the plurality of update sources up until the timestamp have been observed, the first device applies the primary key modification operation by generating a delete operation to delete the source primary key and a create operation to create the destination primary key. The first device provides the delete operation and the create operation to the processing pipeline of the second device to generate an output data structure that is accessible to process a query.
Description
Processing Primary Key Modifications
Background
[0001] Data mining, analysis, and search often make up a substantial portion of enterprise application workloads. Examples of data that are the subject of data mining, analysis, and search include purchase transactions, news updates, web search results, email notifications, hardware or software monitoring observations, and so forth.
Brief Description of Drawings
[0002] By way of non-limiting examples, the present disclosure will be described with reference to the following drawings, in which:
[0003] Fig. 1 is a flowchart of a process for processing a primary key modification using a processing pipeline, according to examples of the present disclosure;
[0004] Fig. 2 is a schematic diagram of a system for processing a primary key modification, according to examples of the present disclosure;
[0005] Fig. 3 is a schematic diagram of the system in Fig. 2, further illustrating data updates at various stages of a processing pipeline, according to examples of the present disclosure;
[0006] Fig. 4 is a schematic diagram illustrating arrival of a primary key modification operation and other data updates over time, according to examples of the present disclosure;
[0007] Fig. 5 is a schematic diagram of the system in Fig. 2, further illustrating processing of a primary key modification operation to rename a directory name, according to examples of the present disclosure;
[0008] Fig. 6 is a flowchart of a detailed process for processing a primary key modification, according to examples of the present disclosure; and
[0009] Fig. 7 is a schematic diagram of a structure of a device capable of acting as an update source, server system or client device, according to examples of the present disclosure.
Detailed Description
[0010] To improve the ability to locate the content of various data stored across an organization, metadata associated with such data from many sources may be uploaded to a server system (or multiple server systems) to allow users to submit queries against the server system(s) to locate data based on the metadata. Examples of metadata include metadata computed based on content of the data, including hashes (e.g., produced by applying hash functions on data), term vectors (e.g., containing terms in the data), fingerprints, feature vectors. Other examples of metadata include file system metadata, such as file owners or creators, file size and security attributes, or information associated with usage of the data, such as access frequency statistics.
[0011] The server system is designed to support data updates from multiple sources across the organization (e.g., up to hundreds of thousands or even millions for a large organization). A "data update" may generally refer to a creation, modification, and/or deletion of data. Since there may be a relatively large amount of data updates to upload to the server system, it may take a relatively long period of time before the data updates are available for access by queries submitted to the server system.
[0012] To improve efficiency, the server system may implement a "processing pipeline" with multiple processing stages for performing respective data processing. After one processing stage has completed, the processing stage
sends processed data updates to another processing stage for further processing. The processing stages are arranged to sequentially apply processing of data updates that pass through the processing pipeline, thereby allowing the process to be parallelized.
[0013] Fig. 1 is a flowchart of a process 100 for processing a primary key modification in a system that includes a first device and a second device, according to examples of the present disclosure. The second device supports a processing pipeline with a plurality of processing stages to process data updates from a plurality of update sources. The first device supports the following techniques.
[0014] At 110, the first device retrieves a primary key modification operation from the second device. The primary key modification operation is from one of the plurality of update sources to modify a source primary key to a destination primary key. The primary key modification operation is associated with a timestamp.
[0015] At 120, the first device determines whether data updates have been observed from all of the plurality of update sources up until the timestamp of the primary key modification operation.
[0016] At 130, in response to an affirmative determination, the first device applies the primary key modification operation by generating a delete operation to delete the source primary key and a create operation to create the destination primary key.
[0017] At 140, the first device provides the delete operation and the create operation to the processing pipeline of the second device, such that they are transformed into an output data structure that is accessible to process a query. As shown at 150, the delete operation and the create operation are
transformed into the output data structure at the second device.
[0018] In general, it is difficult to support data updates relating to primary key modifications using a processing pipeline having multiple processing stages. For example, the processing pipeline may process data updates in batches, each batch being self-contained. However, primary key modifications, by their very nature, do not result in self-contained updates because the new primary key may affect all other data updates in the current batch and following batches. As such, simple read-modify-write operations are not supported by the processing pipeline because of the asynchronous processing nature of the processing stages.
[0019] According to examples of the present disclosure, a "complete prefix" of data updates is observed prior to applying the primary key modification operation. The complete prefix, which includes data updates up until the timestamp of the primary key modification operation, is observed to ensure correct processing of the data updates. Once the complete prefix has been observed, delete operation and create operation are generated for the primary key modification and uploaded to the processing pipeline. Data updates in the complete prefix may include data updates associated with the source primary key, even if they are still being processed by the processing pipeline.
[0020] Any suitable system with a processing pipeline and multiple update sources may implement process 100 for processing primary key modifications, such as a distributed file system, etc. For example, in a distributed file system there is a possibility of out-of-order data update observations, i.e. data updates having an earlier timestamp being observed at a later time than the primary key modification operation. If the primary key modification operation is applied before the earlier data updates are observed, subsequent processing of the earlier data updates will be incorrect because the source primary key has been modified. Any suitable primary key modification may be performed using process 100 in Fig. 1, such as renaming of a file name or directory name, modification of a unique object ID in a catalogue of objects, modification of a social security ID in a personnel file, etc.
[0021 ] Process 100 may also be implemented in other systems, such as a distributed object-based storage system, etc. In this case, the update sources may be nodes storing the data for the objects. In another example, process 100 may be implemented in sensor data collection environments, where the update sources are sensors that generate periodic readings.
[0022] Distributed file system
[0023] Fig. 2 illustrates a distributed file system 200 in which primary key modifications may be processed, according to examples of the present disclosure. Distributed file system 200 may be implemented using any suitable file system.
[0024] Distributed file system 200 is generally scalable and includes multiple nodes 210-1 and 210-2 (collectively referred to as "nodes 210" or individually as a generic "node 210", two shown for simplicity) that monitor file system events relating to data updates to be performed. Nodes 210 may include or otherwise access a journal 220 that records data updates as journal entries.
[0025] Each node 210 may support "segments" 212, which are logically contiguous pieces of storage built out of any combination of physical storage. A "file system" is generally a concatenation of segments 212 supported by nodes 210. For example, if there are 10 nodes 210 supporting two segments 212 each, there will be a total of 20 segments in the file system. Multiple nodes 210 may also form a "cluster" in the distributed file system 200. Individual nodes 210 may be logically or physically distinct. In distributed file system 200, node 210, or more specifically segment 212 of node 210, may be an "update source" (also known as "event source") from which data updates (or events) originate.
[0026] Nodes 210 may include kernel level logic (not shown for simplicity) to detect kernel level events that correspond to intent to perform or initiation of
kernel level operations. Examples of kernel level operations include delete, read and write, etc. Kernel level events generally identify parameters that are relevant to the corresponding operation, such as file name, timestamp, number of bytes read, user, etc.
[0027] Nodes 210 may further include user level logic (also not shown for simplicity) to detect user level events that correspond to the node initiating user level operations. User level events may also identify parameters relevant to corresponding operations, such as file name, user-defined tag, user name and timestamp, etc.
[0028] Journal entries in journal 220 are outputs from the kernel of node 210. In a storage system for archive data, a journal entry is part of an "archive journal file" or "AJ file". Journal 220 may be implemented using any suitable journal, e.g. a Journaling Block Device (JBD) that acts as the journal for the Third Extended File System (Ext3), etc. Entries in the journal 220 may be metadata (instead of file content) representing a corresponding data update. Individual entries in the journal 220 may be marked "uncommitted" until confirmation is received that the contents of the operations represented by the entries have been written to non-volatile storage (e.g., hard disk) within file system 200. Different types of data updates may be recorded in journal 220. If a system crash or power failure occurs, journal 220 may be used as a
"checkpoint" to recover unsaved information.
[0029] Distributed file system 200 further includes upload entity 230 to retrieve data updates 232 recorded in journal entries 220 and upload them to processing pipeline 244. It should be noted that "upload entity" 230 may be a physical or logical entity that is on the same physical device as an update source, or separately on a different physical device. For example, upload entity 230 may be a journal scanning or aggregation component (also referred to as "Archive Journal (AJ) Scanner") that transforms journal entries 220 into a
format readable by server system 240. Upload entity 230 also orders data updates based on timestamps and aggregates them into a batch of updates.
[0030] A batch of data updates 232 may form a "self-consistent update (SCU)" that represents a single atomic unit of data updates. An SCU is not considered "durable" until all the individual updates in the SCU are written to stable
(persistent) storage. Atomic application of data updates 232 of an SCU to stable storage means that all data updates 232 of the SCU are applied, or none is applied at all. Atomicity prevents partial data updates of the SCU, which can cause data consistency problems. Processing pipeline 244 supports primary key modification operations because it is capable of applying batches of data updates atomically, thereby continuously maintaining a consistent view of the data.
[0031] Each SCU may contain insert, update and delete operations. Upload entity 230 may form an SCU until either a sufficient amount of time has passed (e.g., based on a timeout threshold) or a sufficient amount of data has been collected (e.g., based on some predefined size).
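As a hedged illustration of this batching policy, an upload entity could close an SCU once either a timeout or a size threshold is reached; the class, thresholds and field names below are assumptions for illustration rather than the actual implementation.

```python
import time

class ScuBuilder:
    """Collects data updates into an SCU until a timeout or size threshold."""

    def __init__(self, max_age_seconds=60.0, max_updates=10000):
        self.max_age_seconds = max_age_seconds
        self.max_updates = max_updates
        self.updates = []
        self.opened_at = time.monotonic()

    def add(self, update):
        self.updates.append(update)

    def should_close(self):
        too_old = time.monotonic() - self.opened_at >= self.max_age_seconds
        too_big = len(self.updates) >= self.max_updates
        return too_old or too_big

    def close(self):
        # Order the batch by timestamp before handing it to the uploader.
        scu = sorted(self.updates, key=lambda u: u["timestamp"])
        self.updates, self.opened_at = [], time.monotonic()
        return scu
```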
[0032] Processing pipeline
[0033] In the example in Fig. 2, server system 240 includes coordinator 242 and processing pipeline 244 with multiple processing stages: ingest stage 248, sorting stage 250 and merging stage 252. In general, stages 248-252 are logically distinct and may be run on the same physical device or different physical devices. If run on different physical devices, server system 240 represents a logical system that supports the stages 248-252 of processing pipeline 244.
[0034] Coordinator 242 manages or controls different stages of processing pipeline 244. Coordinator 242 also schedules each SCU to be processed by the next stage. When each stage completes processing, it sends a message to
coordinator 242. This allows coordinator 242 to track which upload entities 230 in the system hold which SCUs and what stages of processing they have completed.
[0035] Each stage 248, 250, 252 of processing pipeline 244 may be individually and independently scalable. Parallelism in each stage 248, 250, 252 may be enhanced by implementing the stage using more processors. Data updates 232 in SCUs from upload entities 230 are processed by processing pipeline 244 in stages as follows.
[0036] Ingest stage 248 durably stores incoming data updates 232 from upload entities 230 to generate at least one unsorted SCU. When stored "durably", data updates of the unsorted SCU are not lost upon some error condition or power failure. Unsorted SCUs may be stored in an update data structure associated with ingest stage 248.
[0037] Sorting stage 250 is a transformation stage that sorts unsorted SCUs to generate sorted SCUs, such as according to a primary key. Sorted SCUs may be stored in an update data structure associated with sorting stage 250.
[0038] Merging stage 252 is a further transformation stage that combines multiple sorted SCUs into a merged SCU, which generally represents a single set of authority data structure 256 that is accessible to process a query 262 from a query client device 260.
[0039] By merging sorted SCUs together, the query cost of retrieving fresher results may be reduced because there are fewer SCUs that must be examined by the query 262. Any suitable merging process may be used, such as a tree- based merging process, etc. In this case, SCUs are placed into a tree as leaf nodes and once a sufficient number of SCUs are available or a sufficient time has passed, they are merged together.
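As an illustration of the two transformation stages, the sketch below sorts one SCU by primary key and then performs a simple k-way merge of several sorted SCUs; it is a simplified stand-in, not the pipeline's actual sorting or merging logic.

```python
import heapq

def sort_scu(unsorted_scu):
    # Sorting stage: order the rows of one SCU by primary key.
    return sorted(unsorted_scu, key=lambda row: row["primary_key"])

def merge_scus(sorted_scus):
    # Merging stage: combine several sorted SCUs into one merged SCU.
    return list(heapq.merge(*sorted_scus, key=lambda row: row["primary_key"]))
```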
[0040] Throughout the present disclosure, the term "authority" 254 (also known as "authority data structure", "authority table", "base table" and "output data structure") generally refers to a data structure or set of data structures that is searched in response to data queries 262 from query client devices 260. For example, authority 254 may be organized as a table of named, typed columns, where the columns are described in the table schema and each row of the table provides a distinct instance. The authority 254 may represent the minimal amount of data that must be queried to retrieve a result. Data store 256 may store multiple authority tables 254.
[0041 ] Examples of authority data structure 254 may include the following:
[0042] Data structure to store metadata associated with File Lifetime (FL), such as create time and delete time associated with a file, etc.
[0043] Data structure to store metadata associated with Change Log (CL), such as last activity time and last activity reason, etc.
[0044] Data structure to store metadata associated with File Delete Time (FDT), such as timestamp of deletion, etc.
[0045] Data structure to store metadata associated with File Objects (FO), such as modification date, size and any custom metadata, etc. For example, a row in the data structure is created for each file in the distributed file system 200.
[0046] Server system 240 may also maintain update data structures at various stages 248, 250, 252 of processing pipeline 244. Similar to authority 254, an update data structure may be organized as a table of named, typed columns, where the columns are described in the table schema and each row of the table provides a distinct instance. An update data structure may have the same schema as the associated authority 254, as well as additional columns to
indicate the type of operation and a timestamp. Data in the update data structure is to be applied to authority 254 after being processed through processing pipeline 244.
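The relationship between an authority table and its update table might be pictured as in the following sketch; the column names are illustrative (loosely based on the File Objects example above) and not the actual schema.

```python
# Authority table: one row per entity, keyed by the primary key.
AUTHORITY_FO_COLUMNS = ["primary_key", "modification_date", "size",
                        "custom_metadata"]

# Update table: same columns as the authority, plus the type of operation
# (create / update / delete) and the timestamp of the data update.
UPDATE_FO_COLUMNS = AUTHORITY_FO_COLUMNS + ["operation_type", "timestamp"]
```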
[0047] Processing pipeline 244 provides the ability to trade query result freshness for query performance. "Query result freshness" refers to how up-to-date data should be for a response to a query. In some applications, a query client 260 may want a relatively quick response 264 to a query 262, but it may be willing to accept results that are out-of-date (e.g., out-of-date by a certain time period, such as 12 hours, one day, etc.). On the other hand, another query client 260 may want an up-to-date result, but it may be willing to accept a slower response time to a query 262. Through the use of a pipelined architecture, different stages of processing pipeline 244 may be selectively accessed depending on the user's requirements of freshness and/or query performance.
[0048] To process a query 262, server system 240 may selectively access authority data structure 254 or update data structure of any stage 248, 250, 252 of processing pipeline 244. The selection is a trade-off between query result freshness and query performance. Accessing authority data structure 254 provides better query performance, but the response 264 retrieved may not be up-to-date (since there may be various data updates being processed in the different stages of processing pipeline 244).
[0049] On the other hand, accessing stages 248, 250, 252 would increase the amount of time to process the query 262 but provide fresher (i.e. more up-to-date) results. The amount of time depends upon which stage is accessed; accessing a later stage involves less query processing time compared with an earlier one. For example, accessing content of sorted and merged update tables provided by sorting stage 250 and merging stage 252 takes less time than accessing unsorted update tables maintained by ingest stage 248.
[0050] Although not shown in Fig. 2, processing pipeline 244 may further include an ID (identifier) remapping stage between ingest stage 248 and sorting stage 250 to transform initial IDs of SCUs into global IDs to generate remapped SCUs. In particular, an ID remapping stage maps an ID in a first space to an ID in a second space that is searchable. Initial IDs used by ingest stage 248 are assigned to each entity (e.g., file names, etc.) as those entities are processed and remapped to global ID space later to achieve enhanced pipeline performance when text fields like file system pathnames are extremely long. Substituting short numeric binary fields for long strings requiring lengthy string processing reduces overhead at some steps.
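A toy sketch of such a remapping step follows; the single dictionary is a deliberate simplification of the initial-to-global ID mapping described above.

```python
class IdRemapper:
    """Replaces long strings (e.g. pathnames) with compact numeric IDs."""

    def __init__(self):
        self._ids = {}

    def remap(self, pathname):
        # Assign a new global ID the first time a pathname is seen.
        if pathname not in self._ids:
            self._ids[pathname] = len(self._ids) + 1
        return self._ids[pathname]

remapper = IdRemapper()
print(remapper.remap("/S_2/some/very/long/pathname"))  # 1
print(remapper.remap("/S_2/some/very/long/pathname"))  # 1 (same ID reused)
```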
[0051] Fig. 3 is a schematic diagram of the system in Fig. 2, further illustrating data updates at various stages 248, 250, 252 of processing pipeline 244, according to examples of the present disclosure. In this example, Node 1 and Node 2 support a total of four segments (labelled Segment 1 to Segment 4) that form File System 1. Node 1 and Node 2 output journal entries that are stored in respective archive journal files in per-segment directories in journal 220, such as AJFile_1 to AJFile_3 in directory /S_1/ for segment 1, AJFile_1 to AJFile_5 in directory /S_2/ for segment 2, and so on.
[0052] Upload entity 230 (e.g. AJ Scanner) sorts journal entries 220 according to their timestamps and converts them from journal entry format to a format readable by server system 240. The converted journal entries are uploaded to ingest stage 248 of processing pipeline 244 in the form of SCUs 232.
[0053] Uploaded SCUs 232 are durably stored at ingest stage 248, before being transformed by sorting stage 250 and merging stage 252. For example, at 310 in Fig. 3, ingest stage 248 durably stores SCU_57 to SCU_65. At 320 in Fig. 3, sorting stage 250 sorts the SCUs into sorted SCUs according to at least one key, producing sorted Table A and Table B of SCU_57, etc. At 330 in Fig. 3, tables of sorted SCUs SCU_57 to SCU_65 are merged during merging stage 252 into SCU_57_65/Table A, SCU_57_65/Table B and SCU_57_65/Table C.
[0054] Complete prefix of data updates
[0055] Referring to Fig. 1 again, a primary key modification operation is applied after it is determined that data updates have been observed by processing pipeline 244 from all update sources (e.g., segments 212 of the file system 200) up until the timestamp of the primary key modification at blocks 120 and 130. As will be explained below, this is to ensure correct processing of data updates.
[0056] Fig. 4 is a schematic diagram illustrating arrival of a primary key modification operation and other data updates at processing pipeline 244 over time, according to examples of the present disclosure. For simplicity, the file system has three segments represented by boxes labelled "Segment S1" (i.e. Segment 1), "Segment S2" (i.e. Segment 2) and "Segment S3" (i.e. Segment 3). Each data update (indicated generally at 402 for simplicity) is associated with a timestamp indicating when the data update occurs, according to the originating segment.
[0057] Timelines 410, 412 and 414 indicate various data updates from S1, S2 and S3 respectively over time. Timestamps TS6 to TS12 indicate when the data updates occur, according to the clock of the originating segment. The data updates occur in the order of their timestamp labels, for example the data update with TS6 occurs before those with TS7, TS8 and so on. In practice, timestamps from the same source may be compared directly, but timestamps from different segments are compared by taking into account the clock skew among the segments. Timestamps within that clock skew may be considered as occurring "at the same time."
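As a non-limiting illustration, the following Python sketch compares timestamps from different segments under an assumed bound on clock skew. The skew bound, function name and numeric values are hypothetical and serve only to illustrate the comparison rule described above.

```python
MAX_CLOCK_SKEW = 2  # assumed skew bound, in the same units as the timestamps

def compare_across_segments(ts_a, ts_b):
    """Illustrative comparison of timestamps taken from different segments."""
    if abs(ts_a - ts_b) <= MAX_CLOCK_SKEW:
        return "concurrent"  # within the skew bound, treated as "at the same time"
    return "before" if ts_a < ts_b else "after"

# Timestamps from the same segment may simply be compared directly (ts_a < ts_b).
print(compare_across_segments(6, 12))  # 'before'
print(compare_across_segments(9, 10))  # 'concurrent' under the assumed skew bound
```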
[0058] In this example, journal 220 includes two data updates from S1 with timestamps TS8 and TS11 respectively, three from S2 with timestamps TS7, TS9 and TS10 respectively, and three from S3 with timestamps TS6, TS7 and TS12 respectively. Data update 420 from S2 is a primary key modification operation (i.e. move DirA DirB) to rename DirA as DirB and is associated with timestamp TS9. At upload entity 230, data updates with different timestamps from a single segment are batched together, such as SCU_1281 with data updates from S1, SCU_1280 with data updates from S2 and SCU_1279 with data updates from S3.
[0059] Since system 200 in Fig. 2 is a distributed file system, data updates from various segments may arrive at processing pipeline 244 in an out-of-order manner. For example, timeline 430 indicates different arrival times of SCU_1281, SCU_1280 and SCU_1279 at processing pipeline 244, i.e. SCU_1279 is uploaded first, followed by SCU_1280 and then SCU_1281. Although the data update with timestamp TS12 from S3 occurs at a later time, it is uploaded as part of SCU_1279 before data updates with earlier timestamps (i.e. TS7 to TS11) from segments S1 and S2. Similarly, although the data update with timestamp TS8 from S1 occurs earlier, it is uploaded at a later time than the primary key modification operation 420 with timestamp TS9 from S2.
[0060] If primary key modification operation 420 with timestamp TS9 from S2 is applied before the (earlier) data update with timestamp TS8 from S1 is observed, it may adversely affect the TS8 data update because source primary key DirA will have been deleted. To determine whether a complete prefix of data updates has been observed from all of S1, S2 and S3 up until the timestamp of the primary key modification (i.e. TS9), a value called "minimum freshness" is used.
[0061] For each segment, server system 240 records a freshness timestamp representing the time of the latest data update observed for that segment. The minimum of these freshness timestamps across all segments, i.e. min(freshness), represents the segment for which the system has the oldest information. Referring to 440 in the example in Fig. 4, the freshness timestamps and minimum freshness change over time as SCUs are uploaded, i.e. at 442 (before SCU_1279 is uploaded), 444 (after SCU_1279 is uploaded but before SCU_1280), 446 (after SCU_1279 and SCU_1280 are uploaded but before SCU_1281) and 448 (after all SCUs are uploaded).
[0062] At 442 in Fig. 4, before SCU_1279 is uploaded, the freshness timestamps for S1, S2 and S3 are TS7, TS4 and TS5 respectively, in which case min(freshness) = min(TS7, TS4, TS5) = TS4. At 444 in Fig. 4, after SCU_1279 is uploaded, the freshness timestamp of S3 is updated from TS5 to TS12. Since data updates from S1 and S2 have not been uploaded, their freshness timestamps are unchanged. The minimum freshness across S1, S2 and S3 is therefore min(freshness) = min(TS7, TS4, TS12) = TS4.
[0063] At 446 in Fig. 4, after SCU_1280 is uploaded, the freshness timestamp of S2 changes from TS4 to TS10. The minimum freshness across S1, S2 and S3 is therefore min(freshness) = min(TS7, TS10, TS12) = TS7. Since the minimum freshness TS7 < TS9 (i.e. less than the timestamp TS9 of primary key modification 420), this indicates that data updates up until TS9 have not all been observed.
[0064] At 448 in Fig. 4, after SCU_1281 is uploaded, the freshness timestamp of S1 changes from TS7 to TS11. The minimum freshness across S1, S2 and S3 is therefore min(freshness) = min(TS11, TS10, TS12) = TS10. Since minimum freshness TS10 > TS9, this indicates that data updates up until TS9 have been observed.
[0065] The freshness timestamp of each segment may be stored in a data structure, such as a freshness table indicated generally at 450 in Fig. 4. The minimum freshness timestamp advances to a newer timestamp whenever there has been an authority merge at processing pipeline 244 and updates from the oldest segment (the one which defines the minimum freshness) arrive, updating the freshness of that segment to a fresher value. At that point, the update data structure cannot contain any outstanding updates with timestamps older than the new minimum freshness timestamp.
[0066] The minimum freshness also represents a time boundary at which the system has complete knowledge of all data updates that may have happened across all segments. The minimum freshness may also be defined as follows: min(freshness) = min, over all update sources, of STmax for the source's uploaded SCUs.
[0067] STmax is the largest timestamp present in any uploaded SCU from a given source (e.g., segment S1, S2 or S3 in Fig. 4). An uploaded SCU is an SCU that has been ingested, but not transformed (e.g., sorted, merged, etc.) by processing pipeline 244.
[0068] In the example in Fig. 4, primary key modification operation 420 should be applied by deleting source primary key DirA and creating destination primary key DirB once the minimum freshness value is greater than TS9, thereby indicating that a complete prefix of data updates up until TS9 should have been observed from all segments by processing pipeline 244.
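For illustration only, the following Python sketch expresses the minimum freshness check described in this section; the function names and the use of plain integers for timestamps are assumptions made for the sketch and not part of the examples above.

```python
def min_freshness(freshness_by_segment):
    """Illustrative sketch: minimum freshness across all segments."""
    return min(freshness_by_segment.values())

def may_apply(pk_modification_ts, freshness_by_segment):
    # The primary key modification is applied only once data updates from all
    # segments up until its timestamp have been observed, i.e. min(freshness)
    # has moved past that timestamp.
    return min_freshness(freshness_by_segment) > pk_modification_ts

freshness = {"S1": 7, "S2": 4, "S3": 5}            # as at 442 in Fig. 4 (TS7, TS4, TS5)
print(may_apply(9, freshness))                     # False: min(freshness) = TS4 < TS9
freshness.update({"S3": 12, "S2": 10, "S1": 11})   # after all SCUs are uploaded (448)
print(may_apply(9, freshness))                     # True: min(freshness) = TS10 > TS9
```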
[0069] Primary key modification
[0070] Fig. 5 illustrates the system in Fig. 2 and processing of a primary key modification operation to rename a directory name, according to examples of the present disclosure. Processing of the primary key modification operation may be implemented using upload entity 230 (e.g., AJ Scanner), server system 240 (also referred to as "second device" in Fig. 1) and client device 260 (also referred to as "first device" in Fig. 1).
[0071 ] Client device 260 supports an external program or external utility capable of sending queries to, and obtaining query responses from, server system 240. Additionally, client device 260 is capable of uploading SCUs to
processing pipeline 244. The external program may be implemented using any suitable technique, such as a programming language (e.g., C++), etc.
[0072] Fig. 5 will be explained with reference to Fig. 6, a flowchart of a detailed process 600 for processing a primary key modification, according to examples of the present disclosure. Blocks 510 to 590 in Fig. 5 correspond to blocks 610 to 690 in Fig. 6, respectively. When the primary key modification is to rename a file name or directory name, the destination primary key should inherit metadata or attributes of the source primary key, including non-physical inode metadata and any other custom attribute.
[0073] At 510 in Fig. 5 and 610 in Fig. 6, upload entity 230 retrieves a journal entry associated with primary key modification operation "move DirA DirB" in AJFile_325. The timestamp of the primary key modification operation is also recorded in the journal entry, i.e. TS9 in the example in Fig. 4. As previously explained with reference to Fig. 2 and Fig. 3, upload entity 230 also converts the format of AJFile_325 into a format readable by processing pipeline 244.
[0074] At 520 in Fig. 5 and 620 in Fig. 6, upload entity 230 generates an SCU to record the primary key modification operation in a log data structure called "primary key modification log". The purpose is to allow the system to keep track of all primary key modification operations from all segments of the file system. In this example, the log data structure stores metadata about "move DirA DirB", including source primary key DirA, destination primary key DirB and timestamp TS9.
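For illustration only, the following Python sketch shows one possible shape of such a log entry as constructed by an upload entity from a journal entry. The field names and the journal entry layout are hypothetical; they are intended only to make the metadata recorded in the primary key modification log concrete.

```python
def make_pk_modification_log_entry(journal_entry):
    # journal_entry is assumed to carry the operation and a timestamp, e.g.
    # {"op": "move", "source": "DirA", "destination": "DirB", "timestamp": 9}
    return {
        "source_primary_key": journal_entry["source"],
        "destination_primary_key": journal_entry["destination"],
        "timestamp": journal_entry["timestamp"],
    }

entry = make_pk_modification_log_entry(
    {"op": "move", "source": "DirA", "destination": "DirB", "timestamp": 9})
print(entry)
# {'source_primary_key': 'DirA', 'destination_primary_key': 'DirB', 'timestamp': 9}
```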
[0075] At 530 in Fig. 5 and 630 in Fig. 6, upload entity 230 uploads the SCU, labelled SCU_1280, to processing pipeline 244.
[0076] At 540 in Fig. 5 and 640 in Fig. 6, processing pipeline 244 of server system 240 processes SCU_1280 in stages 248, 250, 252 until it is in authority 254. In more detail, SCU_1280 is durably stored at ingest stage 248 and transformed (i.e. sorted and merged) at sorting stage 250 and merging stage 252 until log data structure 540 is accessible to process a query from client device 260 at the end of processing pipeline 244.
[0077] At 550 in Fig. 5 and 650 in Fig. 6 (related to 110 in Fig. 1), client device 260 retrieves the primary key modification operation from server system 240. In this example, client device 260 queries log data structure 540, which returns "DirA, DirB, TS9" in response.
[0078] At 560 in Fig. 5 and 660 in Fig. 6 (related to 120 in Fig. 1), client device 260 determines whether data updates from all segments in the file system up until the timestamp of the primary key modification operation have been observed. Any suitable implementation may be used, such as using freshness table 450 in the example in Fig. 4, etc. Freshness table 450 is processed by processing pipeline 244 like all other tables. When updates to the other tables reach authority 254, updates to the freshness table will also be reflected in the authority freshness table, indicating the timestamps that are represented in the rest of the tables in authority 254. In another example, coordinator 242 may store the freshness timestamps in a data structure that is updated by processing pipeline 244.
[0079] As explained with reference to Fig. 4, client device 260 may determine whether the minimum freshness of data updates from all segments of the file system has exceeded timestamp TS9 as follows:
[0080] Client 260 sends a query to processing pipeline 244 to determine the minimum freshness value of data updates from all segments. When the minimum freshness timestamp is greater than a timestamp of the primary key modification operation (i.e. min(freshness) > TS9), client 260 determines that data updates from all segments of the file system up until the timestamp of the primary key modification operation have been observed.
[0081] At 570 in Fig. 5 and 670 in Fig. 6 (related to 130 in Fig. 1), client device 260 applies the primary key modification operation by generating a delete operation to delete source primary key DirA and a create operation to create destination primary key DirB. Directories DirA and DirB may be relative or absolute paths. In one example, this may be performed as follows:
[0082] Client device 260 sends a query to processing pipeline 244 to retrieve records that are associated with source primary key DirA from authority 254. For example, data structures storing metadata associated with File Lifetime (FL), Change Log (CL), File Delete Time (FDT) and File Objects (FO), etc., may be queried.
[0083] For each row or record that is associated with source primary key DirA, client device 260 generates a delete operation to delete source primary key DirA and a create operation to create destination primary key DirB in the record.
[0084] At 580 and 582 in Fig. 5 and 680 in Fig. 6 (related to 140 in Fig. 1), client device 260 provides the delete and create operations to processing pipeline 244. The operations may be provided in the form of new SCU_2008 to delete DirA ("Delete DirA") and create DirB ("Create DirB"). SCU_2008 may further include a delete operation to delete "DirA DirB TS9" in primary key modification log 540 such that client device 260 will not repeat the same process for the same primary key modification operation the next time primary key modification log 540 is queried. All the relevant updates are uploaded to processing pipeline 244 in a single SCU to maintain cross-table consistency.
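For illustration only, the following Python sketch shows one way a client device might assemble such a batch: a delete and a create operation for each record associated with the source primary key, plus a delete of the corresponding log entry, all grouped into a single update. The record layout, table names and field names are assumptions made for the sketch.

```python
def build_pk_modification_scu(records, source_key, destination_key, ts):
    """Illustrative sketch: batch all updates for one primary key modification."""
    operations = []
    for record in records:  # records retrieved from the authority for source_key
        operations.append({"op": "delete", "table": record["table"],
                           "primary_key": source_key})
        # The destination primary key inherits the remaining attributes of the record.
        new_record = dict(record, primary_key=destination_key)
        operations.append({"op": "create", "table": record["table"],
                           "record": new_record})
    # Also delete the processed entry from the primary key modification log so the
    # same operation is not applied again the next time the log is queried.
    operations.append({"op": "delete", "table": "pk_modification_log",
                       "primary_key": (source_key, destination_key, ts)})
    return operations  # uploaded as a single SCU to maintain cross-table consistency

scu = build_pk_modification_scu(
    [{"table": "FL", "primary_key": "DirA", "create_time": 3}], "DirA", "DirB", 9)
```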
[0085] At 590 in Fig. 5 and 690 in Fig. 6 (related to 150 in Fig. 1), processing pipeline 244 of server system 240 processes SCU_2008 in stages 248, 250, 252 until it is in authority 254. In more detail, SCU_2008 is durably stored at ingest stage 248 and transformed (i.e. sorted and merged) at sorting stage 250 and merging stage 252 until its updates are accessible to process a query at the end of processing pipeline 244.
[0086] Process 600 in Fig. 6 may be implemented using existing stages of processing pipeline 244 with few, if any, substantial changes. As such, process 600 benefits from improvements to the freshness of queries, such as the use of queries that specify a freshness requirement. Whenever the minimum freshness value changes, data updates may be processed and sent to authority tables 254 along with delete operations to update primary key modification log 540.
[0087] If processing pipeline 244 processes data updates from a large number of segments or nodes, client device 260 might become a processing bottleneck due to the need to retrieve primary key modification operations from, and re-ingest them into, processing pipeline 244. It is possible to parallelize processing of the primary key modification operations themselves. For example, using namespace partitioning, etc., one worker could process primary key modification operations in a sub-tree X while another worker processes primary key modification operations in a sub-tree Y. When authority table 254 is range partitioned, the authority and/or replicated portions of the partitions may be processed in parallel to improve query scalability and throughput. There will also be a speedup from processing portions of a particular range in parallel.
[0088] Additional processing
[0089] Some data structures in authority 254 may require additional processing that goes beyond preserving original values in records associated with the source primary key. Some examples of additional processing are provided below.
[0090] In a data structure storing metadata associated with File Lifetime (FL), updating a delete time attribute of the source primary key to the timestamp of the primary key modification operation and setting a create time attribute of the destination primary key to the create time of the source primary key.
[0091 ] In a data structure storing metadata associated with Change Log (CL), updating a last activity time attribute of the source primary key to a timestamp of the primary key modification operation.
[0092] In a data structure storing metadata associated with File Delete Time (FDT), recording the source primary key and timestamp of the primary key modification operation.
[0093] The above additional processing may be performed by client device 260, such as by generating an SCU that includes the above data updates and uploading the SCU to processing pipeline 244 for processing.
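For illustration only, the following Python sketch expresses the additional FL, CL and FDT adjustments above as data updates that a client device might include in such an SCU. The table names match the abbreviations above, but the field names and record layout are assumptions made for the sketch.

```python
def additional_updates(fl_record, source_key, destination_key, pk_modification_ts):
    """Illustrative sketch: extra updates accompanying a primary key modification."""
    return [
        # File Lifetime (FL): close out the source key and carry its create time over.
        {"table": "FL", "primary_key": source_key,
         "delete_time": pk_modification_ts},
        {"table": "FL", "primary_key": destination_key,
         "create_time": fl_record["create_time"]},
        # Change Log (CL): record the operation as the source key's last activity.
        {"table": "CL", "primary_key": source_key,
         "last_activity_time": pk_modification_ts},
        # File Delete Time (FDT): record when the source primary key was removed.
        {"table": "FDT", "primary_key": source_key,
         "delete_time": pk_modification_ts},
    ]

updates = additional_updates({"create_time": 3}, "DirA", "DirB", 9)
```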
[0094] Further, upload entity 230 (e.g. AJ Scanner) may be modified to differentiate between two types of primary key modification operation: file rename and directory rename. If the entity being renamed is a directory, upload entity 230 uploads an additional rename operation with an appropriate wildcard character (*) to processing pipeline 244 at 630 in Fig. 6.
[0095] The additional operation ensures children of the directory are also renamed. For example, "move DirA DirB" would result in the directory rename itself, plus rename of its children, i.e. "move DirA/* DirB/*". The additional operation may be processed by client device 260 and processing pipeline 244 similar to the processing of "move DirA DirB".
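For illustration only, the following Python sketch shows how an upload entity might expand a directory rename into the additional wildcard operation described above; the function name and tuple representation are assumptions made for the sketch.

```python
def expand_rename(source, destination, is_directory):
    """Illustrative sketch: add a wildcard rename for the children of a directory."""
    operations = [("move", source, destination)]
    if is_directory:
        # The additional operation ensures children of the directory are also renamed.
        operations.append(("move", source + "/*", destination + "/*"))
    return operations

print(expand_rename("DirA", "DirB", is_directory=True))
# [('move', 'DirA', 'DirB'), ('move', 'DirA/*', 'DirB/*')]
```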
[0096] Example Devices 700
[0097] The above examples can be implemented by hardware, software or firmware or a combination thereof. Referring to Fig. 7, example device 700 is capable of acting as upload entity 230, server system 240 or client device 260. Device 700 includes processor 710, memory 720 and network interface device 740 that communicate with each other via bus 730. Although not shown, device 700 may include persistent storage (e.g., disk) in case memory (e.g., dynamic random-access memory (DRAM)) does not have sufficient storage for processing. Processor 710 is to perform processes described herein with reference to Fig. 1 to Fig. 6.
[0098] Memory 720 may store any necessary data 722 for facilitating processing of primary key modifications. Memory 720 may store machine- readable instructions 724 executable by the processor 710 to cause the processor 710 to perform processes described herein with reference to Fig. 1 to Fig. 6.
[0099] The methods, processes and units described herein may be implemented by hardware (including hardware logic circuitry), software or firmware or a combination thereof. The term "processor" is to be interpreted broadly to include a processing unit, application-specific integrated circuit (ASIC), logic unit, or programmable gate array, etc. The processes, methods and functional units may all be performed by the one or more processors 710; reference in this disclosure or the claims to a 'processor' should thus be interpreted to mean 'one or more processors'.
[00100] Further, the processes and methods described in this disclosure may be implemented in the form of a computer software product. The computer software product is stored in a storage medium and comprises a plurality of instructions for causing a processor to implement the methods recited in the examples of the present disclosure.
[00101] The figures are only illustrations of an example, wherein the units or procedures shown in the figures are not necessarily essential for implementing the present disclosure. Those skilled in the art will understand that the units in the device in the example can be arranged in the device as described in the examples, or can alternatively be located in one or more devices different from that in the examples. Any units in the examples described can be combined into one module or further divided into a plurality of sub-units.
[00102] Although the flowcharts described show a specific order of execution, the order of execution may differ from that which is depicted. For example, the order of execution of two or more blocks may be changed relative to the order shown. Also, two or more blocks shown in succession may be executed concurrently or with partial concurrence. All such variations are within the scope of the present disclosure.
[00103] Throughout the present disclosure, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
[00104] It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the above-described embodiments, without departing from the broad general scope of the present disclosure. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
Claims
1. A method for processing a primary key modification in a system comprising a first device, and a second device that supports a processing pipeline with a plurality of processing stages to process data updates from a plurality of update sources, the method comprising the first device:
retrieving a primary key modification operation from the second device, wherein the primary key modification operation is from one of the plurality of update sources to modify a source primary key to a destination primary key and the primary key modification operation is associated with a timestamp; and
in response to determination that data updates from all of the plurality of update sources up until the timestamp of the primary key modification operation have been observed,
applying the primary key modification operation by generating a delete operation to delete the source primary key and a create operation to create the destination primary key; and
providing the delete operation and create operation to the processing pipeline of the second device such that they are transformed into an output data structure that is accessible to process a query.
2. The method of claim 1, wherein determination that data updates from all of the plurality of update sources up until the timestamp of the primary key modification operation have been observed comprises:
sending a query to the second device to determine a minimum freshness timestamp of the data updates from all of the plurality of update sources;
when the minimum freshness timestamp is greater than a timestamp of the primary key modification operation, determining that data updates from all of the plurality of update sources up until the timestamp of the primary key modification operation have been observed.
3. The method of claim 2, wherein the query is to query a data structure storing a freshness timestamp of each update source, and the minimum
freshness timestamp indicates the update source having the oldest data update across all of the plurality of update sources.
4. The method of claim 1, wherein retrieving the primary key modification operation from the second device comprises:
sending a query to the second device to retrieve the primary key modification operation from a log data structure that stores primary key modification operations from all of the plurality of update sources.
5. The method of claim 1, wherein applying the primary key modification operation comprises:
sending a query to the second device to retrieve records of a data structure that are associated with the source primary key; and
generating, for each record associated with the source primary key, a delete operation to delete the source primary key and a create operation to create the destination primary key in the record.
6. The method of claim 5, wherein providing the delete operation and create operation to the second device comprises:
generating a batch of updates that include the delete operation and create operation of each record; and
providing the batch of updates to an ingest stage of the processing pipeline of the second device for transformation by a sorting stage and a merging stage of the processing pipeline into the output data structure.
7. The method of claim 5, wherein the data structure, from which records associated with the source primary key are retrieved, stores metadata associated with objects in the system.
8. The method of claim 5, further comprising providing at least one of the following data updates to the ingest stage of the processing pipeline of the second device:
data update to update, in a data structure storing metadata associated with File Lifetime (FL), a delete time attribute of the source primary key to a timestamp of the primary key modification operation and a create time attribute of the destination primary key as a create time of the source primary key;
data update to update, in a data structure storing metadata associated with Change Log (CL), a last activity time attribute of the source primary key to a timestamp of the primary key modification operation; and
data update to record, in a data structure storing metadata associated with File Delete Time (FDT), the source primary key and timestamp of the primary key modification operation.
9. The method of claim 1, wherein the plurality of update sources are segments of a distributed file system.
10. The method of claim 1, wherein, when the source primary key and destination primary key are hierarchical primary key names, the primary key modification operation further comprises an operation to rename children of the source primary key as children of the destination primary key.
11. A computer-readable medium storing instructions which, when executed by a processor of a first device, cause the processor to process a primary key modification in a system comprising the first device, and a second device that supports a processing pipeline with a plurality of processing stages to process data updates from a plurality of update sources, comprising:
retrieving a primary key modification operation from the second device, wherein the primary key modification operation is from one of the plurality of update sources to modify a source primary key to a destination primary key and the primary key modification operation is associated with a timestamp; and
in response to determination that data updates from all of the plurality of update sources up until the timestamp of the primary key modification operation have been observed,
applying the primary key modification operation by generating a delete operation to delete the source primary key and a create operation to create the destination primary key; and
providing the delete operation and create operation to the processing pipeline such that they are transformed into an output data structure that is accessible to process a query.
12. The computer-readable medium of claim 11, wherein determination that data updates from all of the plurality of update sources up until the timestamp of the primary key modification operation have been observed comprises:
sending a query to the second device to determine a minimum freshness timestamp of the data updates from all of the plurality of update sources;
when the minimum freshness timestamp is greater than a timestamp of the primary key modification operation, determining that data updates from all of the plurality of update sources up until the timestamp of the primary key modification operation have been observed.
13. The computer-readable medium of claim 11, wherein applying the primary key modification operation comprises:
sending a query to the second device to retrieve records of a data structure that are associated with the source primary key; and
generating, for each record associated with the source primary key, a delete operation to delete the source primary key and a create operation to create the destination primary key in the record.
14. The computer-readable medium of claim 13, wherein providing the delete operation and create operation to the second device comprises:
generating a batch of updates that include the delete operation and create operation of each record; and
providing the batch of updates to an ingest stage of the processing pipeline of the second device for transformation by a sorting stage and a merging stage of the processing pipeline into the output data structure.
15. A system for processing a primary key modification, the system comprising a client device and a server device that supports a processing pipeline with a plurality of processing stages to process data updates from a plurality of update sources, wherein the client device is to:
retrieve a primary key modification operation from the server device, wherein the primary key modification operation is from one of the plurality of update sources to modify a source primary key to a destination primary key and the primary key modification operation is associated with a timestamp; and
in response to determination that data updates from all of the plurality of update sources up until the timestamp of the primary key modification operation have been observed,
apply the primary key modification operation by generating a delete operation to delete the source primary key and a create operation to create the destination primary key; and
provide the delete operation and create operation to the processing pipeline of the server device such that they are transformed into an output data structure that is accessible to process a query.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2014/020940 WO2015134019A1 (en) | 2014-03-05 | 2014-03-05 | Processing primary key modifications |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015134019A1 (en) | 2015-09-11 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14884345; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 14884345; Country of ref document: EP; Kind code of ref document: A1 |