Disclosure of Invention
In view of this, embodiments of the present invention provide a front-end and back-end fused hot data multi-level active caching method and device, so as to eliminate or mitigate one or more defects in the prior art and solve the problem that the prior art faces caching pressure generated by surges in hot data access.
The invention provides a front-end and back-end fused hot data multi-level active caching method, which comprises the following steps:
The client browser queries the proxy server for the resource update state based on a negotiation caching mechanism when requesting resources, and preferentially invokes data in the browser local cache based on a strong caching mechanism when there is no update;
The proxy server receives a client access request, dispatches static resource requests to a static resource server group to query the static resource update state and read the static resource when it has been updated, and forwards dynamic resource requests to a dynamic resource server group to query the dynamic resource update state and read the dynamic resource when it has been updated;
The proxy server establishes a data access log under each service scene for the dynamic resource requests based on a cache-aside (bypass cache) policy, counts the number of accesses of each data item within a preset time period and preset time slices for each service scene as the access heat and writes it into a hot data access list, aggregates the hot data access list and writes the result into an access heat accumulation table, discards data items whose access heat is below the hot data qualification threshold, sorts the remaining items by access heat, stores a preset number of top-ranked items into a hot data identification table, and revises the access heat of each data item in the table by introducing trend prediction;
The proxy server marks data items that exist in the current time period's hot data identification table but did not exist in the previous time period's table as new hot data, and data items that existed in the previous time period's table but do not exist in the current time period's table as cold data;
The active cache scheduling server, based on heat-driven scheduling, adds the new hot data to and deletes the cold data from the browser local cache, the proxy server cache and the distributed cache, and, based on data-driven scheduling, actively updates or deletes old hot data present in the browser local cache, the proxy server cache and the distributed cache according to data changes in a database.
In some embodiments, the method combines the browser local cache, the proxy cache, and the distributed cache with the database based on a read-write-through policy.
In some embodiments, the hot data qualification threshold is one-half of the minimum access heat in the access heat accumulation table for the previous period.
In some embodiments, revising the access heat of each data item in the hot data identification table by introducing trend prediction comprises:
fitting the access heat of each time slice of a single data item in the current time period based on linear trend prediction, and estimating the predicted access heat value of each time slice in the next time period, the expression being:

$$\hat{y}_t = a + b\,t, \qquad b = \frac{\sum_{t=1}^{n}(t-\bar{t})(y_t-\bar{y})}{\sum_{t=1}^{n}(t-\bar{t})^{2}}, \qquad a = \bar{y} - b\,\bar{t}$$

wherein $y_t$ represents the actual observed value of the time slice at moment $t$ in the current time period, $a$ is the intercept of the fitted line, $b$ is the slope of the fitted line, $\bar{y}$ is the mean of $y_t$, $\bar{t}$ is the mean of the time values, and $n$ represents the number of data points;
And performing a weighted average of the observed access heat of the data item in each time slice of the hot data access list for the current time period and the predicted access heat value of each time slice of the next time period, so as to revise the access heat of each data item.
In some embodiments, revising the access heat of each data item in the hot data identification table by introducing trend prediction comprises:
Taking the number of occurrences of the data item in the corresponding time slice over a plurality of time periods as weights, and performing a weighted average of the observed access heat values of the data item in the corresponding time slices of those time periods to obtain the predicted access heat value of the corresponding time slice in the next time period, the calculation formula being:

$$\hat{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

wherein $x_i$ represents the observed access heat of the data item in the target time slice of the $i$-th time period, $w_i$ represents the weight of the $i$-th time period, $n$ represents the number of time periods, and $\hat{x}$ represents the predicted access heat value;
And performing a weighted average of the observed access heat of the data item in each time slice of the hot data access list for the current time period and the predicted access heat value of each time slice of the next time period, so as to revise the access heat of each data item.
In some embodiments, the observed access heat and the predicted access heat of each data item in the hot data identification table are weighted-averaged to revise the access heat of each data item, wherein the weight of the observed value is 1 and the weight of the predicted value is the ratio of the preset time period to the preset time slice length.
In some embodiments, the hot data access list, the access heat accumulation table and the hot data identification table are keyed by hash values of the uniform resource locators of the data.
On the other hand, the invention also provides a front-end and back-end fused hot data multi-level active caching device, which comprises:
The client is used for loading the browser, querying the proxy server for the resource update state based on a negotiation caching mechanism when requesting resources, and preferentially invoking data in the browser local cache based on a strong caching mechanism when there is no update;
the static resource server group is used for caching static resources;
a dynamic resource server group for caching dynamic resources based on the distributed storage;
The proxy server is used for receiving client access requests, dispatching static resource requests to the static resource server group to query the static resource update state and read the static resource when it has been updated, forwarding dynamic resource requests to the dynamic resource server group to query the dynamic resource update state and read the dynamic resource when it has been updated, establishing a data access log under each service scene for the dynamic resource requests based on a cache-aside (bypass cache) policy, counting the number of accesses of each data item within a preset time period and preset time slices for each service scene as the access heat and writing it into a hot data access list, aggregating the hot data access list and writing the result into an access heat accumulation table, storing the identified hot data items into a hot data identification table, and marking data items present in the current period's table but absent from the previous period's table as new hot data and data items present in the previous period's table but absent from the current period's table as cold data;
The active cache scheduling server is used for, based on heat-driven scheduling, adding the new hot data to and deleting the cold data from the browser local cache, the proxy server cache and the distributed cache, and, based on data-driven scheduling, actively updating or deleting old hot data present in the browser local cache, the proxy server cache and the distributed cache according to data changes in a database;
And the database is used for storing the original data and synchronizing with the browser local cache, the proxy server cache and the distributed cache based on a read/write-through policy.
In another aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program/instruction which when executed by a processor performs the steps of the above method.
In another aspect, the invention also provides a computer program product comprising a computer program/instruction which, when executed by a processor, implements the steps of the above method.
The invention has the advantages that:
According to the front-end and back-end fused hot data multi-level active caching method and device, the client browser invokes data based on negotiation caching and strong caching; the proxy server dispatches static resource requests to the static resource server group for querying and reading static resources and forwards dynamic resource requests to the dynamic resource server group for querying dynamic resources; meanwhile, cold and hot data are identified in the bypass through access volume statistics and trend prediction, and changes in cold/hot data as well as data changes in the original database are monitored so as to actively update, add or delete data in the multi-level cache comprising the browser local cache, the proxy server cache and the distributed cache. This improves the response speed and read efficiency of hot data access, and the effect is especially pronounced in scenarios with well-defined service scenes, clear linear trends, frequent data updates and high data consistency requirements.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present invention will be more clearly understood from the following detailed description.
Detailed Description
The present invention will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. The exemplary embodiments of the present invention and the descriptions thereof are used herein to explain the present invention, but are not intended to limit the invention.
It should be noted here that, in order to avoid obscuring the present invention due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, while other details not greatly related to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted herein that the term "coupled" may refer to not only a direct connection, but also an indirect connection in which an intermediate is present, unless otherwise specified.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar components, or the same or similar steps.
The invention addresses application scenarios of high-frequency access to hot data with fast update frequency, moderate data scale and high data consistency requirements, and provides a front-end and back-end fused multi-level cache technical architecture to meet the read and write access requirements on hot data. Conventional multi-level cache-aside architectures focus on server-side cache architecture technology, while the integration and use of front-end caches lacks systematic treatment. In common Internet application scenarios the total data volume is generally large, so a full-volume caching scheme is not considered; in this case, for the smaller volume of non-full-volume hot data in Internet applications, an active multi-level cache architecture, AMC (Active Multilevel Cache), with unified front-end and back-end cache scheduling is more suitable.
An aspect of the present invention provides a front-end and back-end fused hot data multi-level active caching method; referring to FIG. 1 and FIG. 9, the method includes the following steps S101 to S105. The operating environment of the method comprises a client, a proxy server, a static resource server group, a dynamic resource server group and a database.
Step S101, the client browser inquires the update state of the resource from the proxy server based on a negotiation caching mechanism when requesting the resource, and preferentially invokes the data in the local cache of the browser based on a strong caching mechanism under the condition of no update.
Step S102, the proxy server receives the client access request, dispatches static resource requests to the static resource server group to query the static resource update state and read the static resource when it has been updated, and forwards dynamic resource requests to the dynamic resource server group to query the dynamic resource update state and read the dynamic resource when it has been updated; the dynamic resource server group adopts a distributed cache, and the proxy server applies the negotiation caching mechanism and the strong caching mechanism based on its own proxy server cache.
Step S103, the proxy server establishes a data access log under each service scene for the dynamic resource requests based on a cache-aside (bypass cache) policy, counts the number of accesses of each data item within a preset time period and preset time slices for each service scene as the access heat and writes it into a hot data access list, aggregates the hot data access list and writes the result into an access heat accumulation table, discards data items whose access heat is below the hot data qualification threshold, sorts the remaining data items by access heat, stores a preset number of top-ranked data items into a hot data identification table, and revises the access heat of each data item in the hot data identification table by introducing trend prediction.
Step S104, the proxy server marks data items that exist in the current time period's hot data identification table but did not exist in the previous time period's table as new hot data, marks data items that existed in the previous time period's table but do not exist in the current time period's table as cold data, and pushes the new hot data and the cold data to the active cache scheduling server.
Step S105, the active cache scheduling server, based on heat-driven scheduling, adds the new hot data to and deletes the cold data from the browser local cache, the proxy server cache and the distributed cache, and, based on data-driven scheduling, actively updates or deletes old hot data already present in the browser local cache, the proxy server cache and the distributed cache according to data changes in the database.
In steps S101 to S105, the main logic is that when the client browser initiates a data request, data is read based on the negotiation caching mechanism and the strong caching mechanism. Provided the cached data is kept synchronized with the data in the database, the data in the browser local cache is used preferentially. For client data requests, the proxy server hands static resources to the static resource server group and dynamic resources to the dynamic resource server group; designing the architecture around the different characteristics of dynamic and static resources improves efficiency. The proxy server also performs cold/hot data bypass identification on the data and introduces trend prediction to revise the access heat, adds and deletes new hot data in the multi-level cache based on the cold/hot identification result, and modifies data that has changed in the database, so as to satisfy the client's requirement for fast access to hot data and improve access efficiency.
In step S101, the client is the device that interacts directly with the user and performs data access and processing by loading a browser or an application program. The negotiation caching mechanism means that the browser first sends a request to the back-end server to ask whether the resource to be accessed has been updated; the server replies, according to the update state of the resource, whether the browser should use its cache or fetch the server resource. Negotiation caching relies on the Last-Modified or ETag fields in the HTTP response header and the If-Modified-Since and If-None-Match fields in the HTTP request header. Last-Modified is an HTTP header returned by the server in a response, indicating the time the server considers the resource was last modified. When a browser requests a resource, the server tells the browser this time through Last-Modified, and the browser can use it to judge whether the resource in its cache is out of date. If-Modified-Since is an HTTP header the browser attaches to a request, carrying the Last-Modified time stored in its cache and asking the server whether the resource has been updated since then. If the resource has not changed since it was cached, the server returns a 304 Not Modified response, indicating that the client can continue to use the cached version; if the resource has changed, the server returns the new resource. ETag is a unique identifier generated by the server that represents the content or state of a resource. Unlike Last-Modified, ETag is a hash value or unique identifier based on the resource content, which allows more precise cache control: the ETag value changes whenever the resource content changes, even if the modification time does not. When the browser requests the resource again, it sends an If-None-Match request header carrying the ETag value stored in its cache, asking the server whether the resource has changed.

The strong caching mechanism means that when the browser requests a resource, it first checks whether the local cache holds a copy of the resource and whether that copy has expired. If the copy has not expired, the browser uses the local cache directly without sending a request to the server, which speeds up page loading. Strong caching relies on two fields in the HTTP response header, Expires and Cache-Control. Expires is an HTTP response header that specifies the expiration date and time of the cached resource; before this time, the browser can use the resource directly from the cache without asking the server again. Cache-Control is a response header introduced by the HTTP/1.1 standard to replace or supplement Expires; it offers more control options for managing the browser cache and uses a relative time in seconds to specify the cache duration of the resource, which is more flexible and accurate.
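As an illustration of this negotiation and strong caching handshake, the following is a minimal sketch in TypeScript (Node.js), assuming a single in-memory resource with an MD5-based ETag and a fixed Last-Modified value; it is not taken from the described system and only shows one way a server might answer conditional requests with 304.

```typescript
// Minimal sketch of the negotiation-cache handshake described above.
// The resource, its ETag and its modification time are illustrative placeholders.
import { createServer } from "http";
import { createHash } from "crypto";

const resourceBody = JSON.stringify({ id: 1, title: "hot item" });
const lastModified = new Date("2024-01-01T00:00:00Z").toUTCString();
const etag = '"' + createHash("md5").update(resourceBody).digest("hex") + '"';

createServer((req, res) => {
  // Strong cache hints: the browser may reuse its local copy without asking again.
  res.setHeader("Cache-Control", "public, max-age=60");
  res.setHeader("ETag", etag);
  res.setHeader("Last-Modified", lastModified);

  const inm = req.headers["if-none-match"];
  const ims = req.headers["if-modified-since"];
  const unchanged = (inm && inm === etag) || (!inm && ims === lastModified);

  if (unchanged) {
    // Negotiation cache hit: tell the client to keep using its cached copy.
    res.statusCode = 304;
    res.end();
  } else {
    res.statusCode = 200;
    res.setHeader("Content-Type", "application/json");
    res.end(resourceBody);
  }
}).listen(8080);
```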
Meanwhile, by constructing a front-end database, IndexedDB may be used for caching and manipulating data resources, particularly NoSQL-style data sets that may further require front-end computation such as filtering, screening and lookup. IndexedDB is a client-side database provided by the browser that can store large amounts of structured data and supports transactional operations.
The browser may also pre-cache resources in the bypass, monitor the cache and apply the resources, and so on. This is generally realized with the Service Worker scripting technology, mainly by means of Web Worker scheduling and caching with localStorage and IndexedDB. The corresponding resource requests can be intercepted directly at access time and, on a cache hit, served directly from the local cache without actually sending the request to the server. When the caching strategy takes effect, basic page access can still be provided under a poor network environment or when the network is disconnected. When browser cache scheduling is performed with the Service Worker, data consistency can be ensured through the negotiation caching mechanism.
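A minimal Service Worker sketch of this bypass pre-cache and interception idea follows, written in TypeScript; the cache name and pre-cached URL list are illustrative assumptions rather than part of the described system.

```typescript
// Minimal Service Worker sketch: pre-cache on install, cache-first on fetch.
/// <reference lib="webworker" />
declare const self: ServiceWorkerGlobalScope;

const CACHE_NAME = "amc-precache-v1";
const PRECACHE_URLS = ["/index.html", "/app.js", "/style.css"];

self.addEventListener("install", (event: ExtendableEvent) => {
  // Pre-cache selected static resources in the bypass.
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE_URLS))
  );
});

self.addEventListener("fetch", (event: FetchEvent) => {
  // Intercept the request: answer from the local cache on a hit,
  // otherwise fall through to the network and store the fresh copy.
  event.respondWith(
    caches.match(event.request).then((cached) => {
      if (cached) return cached;
      return fetch(event.request).then((response) => {
        const copy = response.clone();
        caches.open(CACHE_NAME).then((cache) => cache.put(event.request, copy));
        return response;
      });
    })
  );
});
```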
In step S102, the proxy server is a server located between the client and the origin (resource) server: to acquire content from the origin server, the client sends a request to the proxy server and designates the target origin server, and the proxy server forwards the request and returns the acquired content to the client. Proxy servers are often also used as application gateways. As a link in the chain of network resource access and acquisition, the proxy service can also support the negotiation caching and strong (forced) caching strategies of the HTTP protocol. Proxy service caching relies on two fields in the HTTP response header, Expires and Cache-Control; when the Cache-Control value is public, all content will be cached (both the client and the proxy server may cache it). Even when Cache-Control takes other values, the proxy server can override this setting and force the use of the proxy service cache. Distributed caching is a technique for caching data on multiple nodes to increase the speed of data access. It supports horizontal linear expansion, so cache capacity and performance can be improved by adding more nodes; it offers higher performance, since storing data in a decentralized manner improves concurrent access speed; and it avoids single points of failure, ensuring high availability of the cache service through multiple replicas and replica consistency.
The proxy server distinguishes dynamic and static resources for client access requests, and performs dynamic and static separation, namely, in a typical Web application architecture, static resources (static data) and dynamic resources (dynamic data) are distinguished into different system architecture designs, so that the access performance and maintainability of the whole application are further improved.
As shown in fig. 2, dynamic and static separation is mainly implemented by performing corresponding settings on the reverse proxy server. The proxy server can serve as a dynamic-static separation gateway to distribute static resource requests to a common Web server group, which is simpler and quicker, and forward the requests of dynamic resources (dynamic data) to a back-end application server group for processing. The architecture design method can improve the accessibility and maintainability of the whole service. The service of each link can be designed in a targeted manner according to different characteristics of dynamic and static resources.
The distributed cache of the dynamic resource server group shares the characteristics described above: it caches data on multiple nodes to increase access speed, supports horizontal linear expansion, improves concurrent access performance through decentralized storage, and avoids single points of failure through multiple replicas and replica consistency.
In step S103, the proxy server performs hot data bypass identification: a Hot Data Bypass Detection System (HDBDS) confirms hot data using the shared access log of the proxy server or load-balancing server. The access log is stored in fixed time slices (here an independent file is generated daily), which further improves the read and write efficiency of the access log. The hot data bypass identification application server reads the newly generated logs at regular intervals and produces statistics from them.
Access heat (AP: Access Popularity) means that each access by a user to a piece of data generates one access point for that data; the sum of all access point values within a statistical period is the access heat of the data item.
Based on the LFU (Least Frequently Used) algorithm, the data that appears most frequently in the historical record is considered hot data; the core idea is that data frequently accessed in the past is more likely to be accessed in the future. To realize this algorithm, the number of accesses of each data item is counted, the data items are sorted by access count, and when the cache capacity is insufficient, the data with the fewest accesses in the cache is evicted.
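The eviction idea can be sketched as a small LFU-style cache in TypeScript; the capacity and key/value types are illustrative assumptions, and a production implementation would use a more efficient frequency structure.

```typescript
// A minimal LFU-style sketch of the eviction idea described above.
class LfuCache<V> {
  private values = new Map<string, V>();
  private counts = new Map<string, number>();

  constructor(private capacity: number) {}

  get(key: string): V | undefined {
    const value = this.values.get(key);
    if (value !== undefined) {
      // Count every access: frequently accessed items are treated as hot.
      this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
    }
    return value;
  }

  put(key: string, value: V): void {
    if (!this.values.has(key) && this.values.size >= this.capacity) {
      // Capacity exhausted: evict the item with the fewest accesses.
      let coldest: string | undefined;
      let min = Infinity;
      for (const [k, c] of this.counts) {
        if (c < min) { min = c; coldest = k; }
      }
      if (coldest !== undefined) {
        this.values.delete(coldest);
        this.counts.delete(coldest);
      }
    }
    this.values.set(key, value);
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }
}
```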
A hot data statistics time period (TR: Time Range) is defined; hot data statistics of different dimensions may have different statistics time periods. Taking "24-hour hot news" in a content distribution application as an example, the statistical period may be 24 hours long. For "real-time hotspots", the statistical time period may be 1 hour.
A hot data statistics time slice (TS: Time Slice) is defined; hot data statistics of different dimensions likewise have different statistics time slices. Taking "24-hour hot news" in a content distribution application as an example, balancing control of the computation frequency against the need to quickly surface information whose heat is rising rapidly, a statistical time granularity of about 30 minutes is preferred. For "real-time hotspots", the shorter the statistical granularity the better; balancing the same factors, 1 minute is preferred.
An access heat accumulation table (APST: Access Popularity Statistics Table) is defined, which accumulates every data access log in the application for hot data analysis of a given dimension.
As shown in FIG. 3, this is implemented as a hot data access list (APRL: Access Popularity Record List) of length TR/TS. When system operation reaches a TS-duration trigger point, the current time TheTime is recorded, and then:
The statistics time interval of this round, ThisTS, is [TheTime − TS, TheTime], and the slice sequence number APSTSEQ in the statistics table is Mod(TheTime − TRBeginTime, TR)/TS.
The statistics application server collects all GetData actions within this round's TS interval (ThisTS) from the access log information of the current TS (see ① in FIG. 2) and accumulates them classified by DataKey (the key of the data, usually the hash value of the URL of the acquired data; the URL value itself should also be saved for later use in cache scheduling) (see ② in FIG. 2). The accumulated count of each data item is updated into the APSTSEQ entry of the corresponding APST sub-table (see ④⑤ in FIG. 2).
After all read operations in the data access log within the ThisTS range have been accumulated, the accumulated values in each APST sub-table are summed (SUM) to obtain the total access point value of the data within the TR range, that is, the current heat AP, and the AP value of the corresponding entry in the APST table is updated (see ⑥⑦ in FIG. 2).
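A possible sketch of this APRL/APST accumulation in TypeScript is shown below; the log entry shape and the way the slice sequence number is computed from TheTime and TRBeginTime are assumptions made for illustration.

```typescript
// Minimal sketch of the APRL/APST accumulation described above.
interface AccessLogEntry { dataKey: string; url: string; timestamp: number; }

const TR = 24 * 60 * 60 * 1000;   // statistics time period, e.g. 24 hours (ms)
const TS = 30 * 60 * 1000;        // statistics time slice, e.g. 30 minutes (ms)
const SLICES = TR / TS;           // APRL length

// APST: per data item, a ring of TR/TS slice counters plus the summed heat AP.
const apst = new Map<string, { slices: number[]; ap: number; url: string }>();

function accumulateSlice(logs: AccessLogEntry[], theTime: number, trBeginTime: number): void {
  const seq = Math.floor((theTime - trBeginTime) / TS) % SLICES; // slice sequence number
  for (const entry of logs) {
    // Only count accesses that fall inside this round's TS interval.
    if (entry.timestamp <= theTime - TS || entry.timestamp > theTime) continue;
    let row = apst.get(entry.dataKey);
    if (!row) {
      row = { slices: new Array(SLICES).fill(0), ap: 0, url: entry.url };
      apst.set(entry.dataKey, row);
    }
    row.slices[seq] += 1; // one access point per access
  }
  // SUM the slice counters to refresh the current heat AP of every item.
  for (const row of apst.values()) {
    row.ap = row.slices.reduce((a, b) => a + b, 0);
  }
}
```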
After the log has been analyzed in the bypass and the APST for all data items has been obtained, the APs can be sorted. Half-value discarding based on historical data can be performed during sorting to simplify the computation.
The hot data quantity (QHD: Quantity of Hot Data) is defined to determine, from the sorted access heat accumulation table, which data items are ultimately treated as hot data. This value is usually directly related to the amount of cache space the system can allocate.
A hot data qualification threshold (THD: Threshold of Hot Data) is defined, typically half the minimum AP value in the result set of the previous hot data calculation; it is the minimum requirement for a data item to qualify as hot data. It is mainly used in the half-value discarding process to reduce the amount of computation.
A hot data identification table (HDT: Hot Data Table) is defined; the final result of the hot data identification calculation is stored in this table.
As shown in FIG. 4, during hot data identification (Hot Data Detection), all data records in the APST whose AP is less than THD as calculated in this round are discarded first and do not participate in the sorting calculation (see ①② in FIG. 3).
APST entries whose AP value is above the threshold are stored in the HDT-1 table and sorted (see ③ in FIG. 3), with the result stored in the HDT-2 table. According to the service logic requirements, the top QHD data items of the sorted HDT-2 table are pre-identified as hot data and stored in the hot data identification table HDT (see ④⑤ in FIG. 3).
In a concrete implementation of this two-step calculation, discarding and sorting can be accomplished in a single bubble-sort pass, further optimizing the computation.
At this point the hot data calculation is complete; finally, the hot data qualification threshold THD is updated from the minimum AP value in the hot data result set (see ⑥⑦ in FIG. 3). In some embodiments, the hot data qualification threshold THD is one-half of the minimum access heat in the access heat accumulation table for the previous period.
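The discard/sort/select flow described above might look like the following TypeScript sketch, where QHD is an assumed constant and the row shape mirrors the APST entries; it is an illustration rather than the exact implementation.

```typescript
// Minimal sketch of the THD discard / sort / top-QHD selection above.
interface HeatRow { dataKey: string; url: string; ap: number; }

const QHD = 100; // number of hot data items, sized to the available cache space

function detectHotData(apstRows: HeatRow[], thd: number): { hdt: HeatRow[]; newThd: number } {
  // 1. Half-value discard: rows below the qualification threshold never enter the sort.
  const hdt1 = apstRows.filter((row) => row.ap >= thd);
  // 2. Sort by access heat, descending (HDT-2).
  const hdt2 = [...hdt1].sort((a, b) => b.ap - a.ap);
  // 3. Keep the top QHD rows as the hot data identification table HDT.
  const hdt = hdt2.slice(0, QHD);
  // 4. Update THD from the minimum AP in the result set (half of it, per the embodiment).
  const minAp = hdt.length > 0 ? hdt[hdt.length - 1].ap : thd;
  return { hdt, newThd: minAp / 2 };
}
```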
In a business scenario with obvious trend characteristics, the data access trend of the next time slice TS can be further predicted.
As shown in FIG. 5, half-value discarding may be performed in advance when the trend value of the next TS does not absolutely affect the AP values of the HDT in the service logic; otherwise this step should be skipped. After the half-value discard, a TAP field is added to the HDT-1 table to store the trend prediction value for the next TS period. Trend calculation can then be performed using the access volume (AP) values of all TSs within the TR period corresponding to the data in the APRL. The calculation result is the predicted AP value of the next TS period, denoted TAP here, and is stored in the TAP field. Letting Trend be the trend function, this is expressed as TAP = Trend(known_x's, known_y's).
To balance computing resources against cache scheduling optimization, a linear trend prediction algorithm with a small computational cost is used.
In some embodiments, the access heat of each data item in the hot data identification table is revised by introducing trend prediction, comprising steps S201 to S202:
Step S201, fitting the access heat of each time slice of a single data item in the current time period based on linear trend prediction, and estimating the predicted access heat value of each time slice in the next time period, the expression being:

$$\hat{y}_t = a + b\,t, \qquad b = \frac{\sum_{t=1}^{n}(t-\bar{t})(y_t-\bar{y})}{\sum_{t=1}^{n}(t-\bar{t})^{2}}, \qquad a = \bar{y} - b\,\bar{t}$$

wherein $y_t$ represents the actual observed value of the time slice at moment $t$ in the current time period, $a$ is the intercept of the fitted line, $b$ is the slope of the fitted line, $\bar{y}$ is the mean of $y_t$, $\bar{t}$ is the mean of the time values, and $n$ represents the number of data points.
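A small TypeScript sketch of this least-squares trend fit is given below; the per-slice heat series and forecast horizon are illustrative inputs.

```typescript
// Minimal sketch of the least-squares linear trend fit above;
// the input is the per-slice heat series of one data item in the current TR.
function linearTrendForecast(y: number[], horizon: number): number[] {
  const n = y.length;
  const t = Array.from({ length: n }, (_, i) => i + 1);
  const tMean = t.reduce((a, b) => a + b, 0) / n;
  const yMean = y.reduce((a, b) => a + b, 0) / n;

  // b = Σ(t - t̄)(y - ȳ) / Σ(t - t̄)²,  a = ȳ - b·t̄
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (t[i] - tMean) * (y[i] - yMean);
    den += (t[i] - tMean) ** 2;
  }
  const b = den === 0 ? 0 : num / den;
  const a = yMean - b * tMean;

  // Predicted heat for each slice of the next period: ŷ = a + b·t for t = n+1 … n+horizon.
  return Array.from({ length: horizon }, (_, i) => a + b * (n + 1 + i));
}

// Example: heat of the last 4 slices → forecast of the next 4 slices.
// linearTrendForecast([10, 14, 18, 22], 4) === [26, 30, 34, 38]
```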
The advantages of this method are its simple algorithm, low consumption of computing resources, applicability to many fields, and reasonable prediction accuracy. Its disadvantage is the assumption that the data is linear: prediction is poor for non-linear relationships, and short-term fluctuations in the data cannot be captured.
Step S202, performing a weighted average of the observed access heat of the data item in each time slice of the hot data access list for the current time period and the predicted access heat value of each time slice of the next time period, so as to revise the access heat of each data item.
In real services there are scenarios where a trend rises but has an obvious upper bound. For example, in a voting scenario within a group, the future trend value cannot exceed the total number of members, yet the linear trend prediction algorithm may predict a value far larger than that total because the initial rise is steep. To further balance the absolute impact of trend prediction on hot data identification, the influence of historical AP data must also be considered. A new algorithm is therefore introduced here.
In some embodiments, the access heat of each data item in the hot data identification table is revised by introducing trend prediction, comprising steps S301 to S302:
Step S301, taking the number of occurrences of the data item in the corresponding time slice over a plurality of time periods as weights, and performing a weighted average of the observed access heat values of the data item in the corresponding time slices of those time periods to obtain the predicted access heat value of the corresponding time slice in the next time period, the calculation formula being:

$$\hat{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

wherein $x_i$ represents the observed access heat of the data item in the target time slice of the $i$-th time period, $w_i$ represents the weight of the $i$-th time period, $n$ represents the number of time periods, and $\hat{x}$ represents the predicted access heat value.
Step S302, performing a weighted average of the observed access heat of the data item in each time slice of the hot data access list for the current time period and the predicted access heat value of each time slice of the next time period, so as to revise the access heat of each data item.
The advantages of this method are simple calculation and easy implementation, and in particular high flexibility: the weighted average assigns different weights to different data, and the weights can be adjusted to the specific service requirements, so the result better reflects the actual situation. It also accounts for the differing importance of indicators: weights can be adjusted according to actual demand and indicator importance, reflecting the data more truthfully and making the final result more objective and accurate. The influence of abnormal data can also be reduced, since outliers can be balanced by adjusting the weights to obtain a more accurate average. The main disadvantage is the large subjective factor: weight adjustment usually depends on human experience or machine learning, and differences in experience or in the sample coverage of machine learning lead to differences in weight settings that affect the result. The method also places high demands on the data distribution: it suits normally distributed data, and a skewed distribution or extreme values can affect the result of the weighted average.
In steps S202 and S302, the observed access heat and the predicted access heat of each data item in the hot data identification table are weighted-averaged to revise the access heat of each data item: the weight of the observed value is 1 and the weight of the predicted value is the ratio of the preset time period to the preset time slice length. In the weighted average, each AP value of the data item in the hot data access list APRL is assigned weight 1 and the calculated trend value TAP is assigned weight TR/TS; all AP values in the APRL table and the TAP value, together with their weights, are weighted-averaged to obtain the final AP value for the HDT table.
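As an illustration of steps S301 and S302 and of the weighting just described (observed slice values weighted 1, the trend value weighted TR/TS), here is a minimal TypeScript sketch; the function names and the final example figures are assumptions for demonstration.

```typescript
// Minimal sketch of the weighted-average prediction (S301) and the AP revision
// with weights 1 (observed) and TR/TS (trend value) described above.
function weightedForecast(observed: number[], weights: number[]): number {
  // x̂ = Σ wᵢ·xᵢ / Σ wᵢ — observed heat of the target slice over several past periods.
  const num = observed.reduce((sum, x, i) => sum + weights[i] * x, 0);
  const den = weights.reduce((sum, w) => sum + w, 0);
  return den === 0 ? 0 : num / den;
}

function reviseHeat(apSlices: number[], tap: number, trOverTs: number): number {
  // Observed slice APs carry weight 1 each; the trend value TAP carries weight TR/TS.
  const num = apSlices.reduce((sum, ap) => sum + ap, 0) + trOverTs * tap;
  const den = apSlices.length + trOverTs;
  return num / den;
}

// Example: 48 observed slices of heat 100 and a trend value of 120 with TR/TS = 48.
// reviseHeat(new Array(48).fill(100), 120, 48) === 110
```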
In step S104, in the new hot data and cold data identification process, as shown in fig. 6, after the confirmed HDT table is obtained, the new hot data and cold data can be confirmed by comparing with OHDT (Old Hot Data Table) of the previous round.
New hot data (NHDT: New Hot Data Table) is data that exists in the current HDT table but not in the OHDT table; this data requires a cache addition operation. The calculation formula is NHDT = HDT − OHDT.
Cold Data (CDT: cold Data Table) is Data that does not exist in the current HDT Table but exists in OHDT tables, and the Data needs to be subjected to a cache deletion operation, and the calculation formula is CDT= OHDT-HDT.
After the new hot data and new cold data have been identified, the results are pushed to the active cache scheduling server, which then performs the next round of active cache scheduling on the corresponding data.
Taking the access log as input, different hot data accumulation and hot data identification modules are triggered according to the service scene, finally producing the hot data table of that service scene for this TS round; after further access trend calculation the HDT is output, and after comparison with the HDT of the previous TS round the NHDT and CDT are output, forming a complete cold/hot data bypass identification subsystem serving the active cache scheduling subsystem. The functional architecture of this subsystem is shown in FIG. 7.
In step S105, active cache scheduling is executed in two cases. The first is heat-driven: after the cold and hot data have been judged for the TS, the caches in the system are changed and scheduled; active cache scheduling is triggered once the HDT, NHDT and CDT of the service scene have been determined. The second is data-driven: when the data source changes, the cache scheduling subsystem must actively update or delete the old hot data in the existing caches according to the specific situation. In this system architecture, the active cache scheduling of hot data caused by the second case, data source changes, is mainly realized by capturing the changed data and having the system actively schedule the distributed cache (Redis) and the proxy server cache (Nginx) according to the change.
As shown in FIG. 8, the data publisher is the persistent database, here taking a MySQL database as an example. Since most databases do not support active publishing services, data publishing can be implemented with Change Data Capture (CDC) software; Debezium is a typical example.
When Debezium detects changed data in the database, the corresponding data is pushed to Kafka under a corresponding topic (Kafka Topic).
The active cache scheduling module subscribing to that topic is then triggered, and corresponding scheduling is performed in turn on the distributed cache, the proxy server cache and the browser cache.
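A minimal consumer sketch using the kafkajs client is shown below; the broker address, Debezium topic name and the scheduling callback are illustrative assumptions.

```typescript
// Minimal sketch of a scheduler subscribing to a Debezium change topic via kafkajs.
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "amc-scheduler", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "active-cache-scheduling" });

async function run(onChange: (payload: unknown) => Promise<void>): Promise<void> {
  await consumer.connect();
  // Debezium publishes one topic per table, e.g. "<server>.<database>.<table>".
  await consumer.subscribe({ topics: ["mysql-server.app.hot_items"], fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      // Each message carries the changed row; hand it to the cache scheduling logic.
      await onChange(JSON.parse(message.value.toString()));
    },
  });
}
```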
When heat-driven scheduling is triggered, a data item in the NHDT table means new hot data has appeared, and the cache content keyed by the corresponding DataKey in the NHDT table must be added to the multi-level cache. A data item in the CDT table means new cold data has appeared, and the cache content keyed by the corresponding DataKey in the CDT table must be deleted.
When data-driven scheduling is triggered, the DataKey is assembled from the primary key of the received changed data item: the front-end access URL is constructed from the primary key according to the front-end service access logic, the URL is hashed to obtain the DataKey of the data, and the data is then handled according to one of three operation types.
First, an insert operation: a record for the data is added to the APST, a TR/TS structure table is created, and access heat accumulation begins for the data.
Second, an update operation: the DataKey is first looked up in the HDT; if present, the hot data has changed, and the cache content keyed by that DataKey in the multi-level cache must be updated. The Last-Modified response header field is added to the corresponding data record, its value being the actual change time of the source data, so that the server response can be output directly.
Third, a delete operation: the DataKey is first looked up in the HDT; if present, hot data has been deleted, and the cache content keyed by that DataKey in the multi-level cache must be deleted. Note that when the data source performs a delete, the scheduler must, in addition to the cache deletion scheduling, synchronously delete all records of the data in the APST access heat accumulation table and the data record in the HDT table, to prevent the deleted data from contaminating the calculation result when cold/hot data bypass confirmation is performed at the next TS time point.
Additions, deletions and modifications of the distributed cache (Redis) are usually performed directly with the SET and DEL commands.
The addition and deletion of the proxy cache (Nginx) is relatively cumbersome to implement: Nginx itself is not a programming-language environment but a lightweight HTTP and reverse proxy server, so it does not support an active cache scheduling function and cannot actively load caches. Cache scheduling can only be done by installing a third-party purge module plug-in (for example ngx_cache_purge). The purge module can delete a designated cache entry that needs to be cleaned, but cannot actively load a corresponding new cache entry. For cache data that needs designated loading, the entry can only be deleted via the purge module, after which the corresponding cache is automatically regenerated on the first client data access request, relying on the read/write-through cache mechanism.
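A minimal TypeScript sketch of this active scheduling against Redis (using the ioredis client) and of a purge request sent to the proxy cache follows; the proxy purge URL layout assumes a purge module is configured and is purely illustrative.

```typescript
// Minimal sketch of active scheduling against the distributed cache plus a proxy purge.
import Redis from "ioredis";

const redis = new Redis({ host: "localhost", port: 6379 });

async function addHotData(dataKey: string, value: string): Promise<void> {
  await redis.set(dataKey, value);              // new hot data → add to distributed cache
}

async function deleteColdData(dataKey: string, url: string): Promise<void> {
  await redis.del(dataKey);                     // cold data → remove from distributed cache
  // Ask the proxy to drop its cached copy; the next client request repopulates it
  // through the read/write-through path.
  await fetch(`http://proxy.example.com/purge${new URL(url).pathname}`, { method: "GET" });
}
```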
In some embodiments, the method combines the browser local cache, the proxy cache and the distributed cache with the database based on a read/write-through policy. A read/write-through cache policy (Read/Write Through) tightly combines a cache (generally a cache service) with a database, and the application operates through the cache layer when reading and writing data. On a read miss, the application queries the data from the database through the cache layer and writes it into the cache; on a write, the application writes the data into the cache first and the cache layer writes it through to the database. In this way the data remains consistent between the cache and the database.
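A compact sketch of such a read-through/write-through wrapper in TypeScript is given below; the Store interface standing in for the cache service and the database is an illustrative assumption.

```typescript
// Minimal sketch of the read-through / write-through policy described above.
interface Store {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

class ReadWriteThroughCache {
  constructor(private cache: Store, private db: Store) {}

  async read(key: string): Promise<string | null> {
    const hit = await this.cache.get(key);
    if (hit !== null) return hit;               // cache hit
    const value = await this.db.get(key);       // miss: load from the database…
    if (value !== null) await this.cache.set(key, value); // …and fill the cache
    return value;
  }

  async write(key: string, value: string): Promise<void> {
    await this.cache.set(key, value);           // write the cache first…
    await this.db.set(key, value);              // …then persist through to the database
  }
}
```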
In some embodiments, the hot data access list, the access hot accumulation table, and the hot data identification table are marked with hash values of uniform resource locators of the data.
In many large-scale application scenarios, acquiring and using data must become even faster and lower-latency. Front-end computing can move part of the computation from the server to the client, so that some data processing and analysis happens closer to the client, noticeably reducing the number of data transfers, their duration and the network bandwidth they occupy. The front-end persistent data storage technology supporting such front-end computing scenarios is the front-end database.
Further, in the present architecture, front-end data involved in complex computation is distinguished from ordinary display data, and the front-end cache is handled separately, matching the dynamic/static separation strategy of the back-end cache. For static resources (static data), the front end does no active scheduling and relies entirely on the browser's cache policy. For dynamic resources (dynamic data), the bypass-caching architecture still depends on the validity period of the back-end multi-level cache, and whether each access result is cached in the bypass is decided according to its header information. In this architecture, a Service Worker is used for front-end bypass cache scheduling, the cache is managed uniformly in the bypass, and data consistency is verified with an expire-and-revalidate policy; the implementation logic is shown in FIG. 9.
When using a Service Worker, situations involving the HTTP status code 304 may be encountered. The 304 status code means that the requested resource is unmodified and the cached version can continue to be used. The fetch event handler in the Service Worker can perform specific processing when it detects that a resource returns the 304 status code. First, it must ensure that the requested resource is cached correctly and that the cache is used while the resource is valid; second, when the resource has changed, the Service Worker must update the resource in the cache.
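One way such 304 handling inside a fetch handler might look is sketched below in TypeScript; the cache name and the manual If-None-Match revalidation are illustrative assumptions, and browser HTTP-cache interactions may differ in practice.

```typescript
// Minimal sketch of handling a 304 inside a Service Worker fetch handler:
// reuse the cached copy on 304, refresh the cache on 200.
/// <reference lib="webworker" />
declare const self: ServiceWorkerGlobalScope;

self.addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(
    (async () => {
      const cache = await caches.open("amc-dynamic-v1");
      const cached = await cache.match(event.request);
      const headers = new Headers(event.request.headers);
      const etag = cached?.headers.get("ETag");
      if (etag) headers.set("If-None-Match", etag); // revalidate against the server

      const response = await fetch(event.request.url, { headers });
      if (response.status === 304 && cached) {
        return cached;                              // unchanged: keep using the cache
      }
      if (response.ok) {
        await cache.put(event.request, response.clone()); // changed: refresh the cache
      }
      return response;
    })()
  );
});
```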
Accordingly, the present invention also provides an apparatus/system comprising a computer device, the computer device including a processor and a memory, the memory having computer instructions stored therein, and the processor being configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the apparatus/system implements the steps of the method described above.
The embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described above. The computer-readable storage medium may be a tangible storage medium such as random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a floppy disk, a hard disk, a removable memory disk, a CD-ROM, or any other form of storage medium known in the art.
On the other hand, the invention also provides a front-end and back-end fused hot data multi-level active caching device, which comprises:
The client is used for loading the browser, querying the proxy server for the resource update state based on a negotiation caching mechanism when requesting resources, and preferentially invoking data in the browser local cache based on a strong caching mechanism when there is no update;
the static resource server group is used for caching static resources;
a dynamic resource server group for caching dynamic resources based on the distributed storage;
The proxy server is used for receiving client access requests, dispatching static resource requests to the static resource server group to query the static resource update state and read the static resource when it has been updated, forwarding dynamic resource requests to the dynamic resource server group to query the dynamic resource update state and read the dynamic resource when it has been updated, establishing a data access log under each service scene for the dynamic resource requests based on a cache-aside (bypass cache) policy, counting the number of accesses of each data item within a preset time period and preset time slices for each service scene as the access heat and writing it into a hot data access list, aggregating the hot data access list and writing the result into an access heat accumulation table, storing the identified hot data items into a hot data identification table, and marking data items present in the current period's table but absent from the previous period's table as new hot data and data items present in the previous period's table but absent from the current period's table as cold data;
The active cache scheduling server is used for, based on heat-driven scheduling, adding the new hot data to and deleting the cold data from the browser local cache, the proxy server cache and the distributed cache, and, based on data-driven scheduling, actively updating or deleting old hot data present in the browser local cache, the proxy server cache and the distributed cache according to data changes in a database;
And the database is used for storing the original data and synchronizing with the browser local cache, the proxy server cache and the distributed cache based on a read/write-through policy.
The cache scheduling process is shown in FIG. 10. When the client's browser initiates a request, the Service Worker takes over and judges the request; when the request is for a static resource, no further processing is done and the request is completed by the browser's default mechanism. When the request is for dynamic data, the Service Worker first requests the data and loads it into the browser cache; other scripts on the page read directly from the cache when they need the data, without issuing further data requests.
A request initiated by the front end to the system first reaches the dynamic/static separation gateway. A static resource request is forwarded to the static resource load-balancing server (which also acts as the static resource proxy service cache), which either answers the request from its local cache or forwards it further to the back-end static resource server group.
A dynamic data request is forwarded to the dynamic data load-balancing server (which also acts as the dynamic data proxy service cache), which either answers the request from its local cache or forwards it further to the back-end application server group, while recording the corresponding access record in a log file on the distributed file server.
When a request penetrates the dynamic data proxy service cache and reaches the application server, the application server queries the distributed cache; on a hit, the data in the distributed cache is returned, and on a miss, a database or file system read is performed and the data, once obtained, is returned to the client.
The log files generated by user accesses are analyzed at regular intervals by the hot data bypass identification module, which incorporates trend prediction factors. Cold and hot data are identified according to the different service scenes, and the corresponding results are stored in the database. The heat-driven trigger then causes the active cache scheduling module to update the multi-level cache.
Meanwhile, if a related service on the application server performs a write operation on confirmed hot data, the change is captured by the change data capture module, and the data-driven trigger causes the active cache scheduling module to update the multi-level cache.
In summary, in the front-end and back-end fused hot data multi-level active caching method and device, the client browser invokes data based on negotiation caching and strong caching; the proxy server dispatches static resource requests to the static resource server group for querying and reading static resources and forwards dynamic resource requests to the dynamic resource server group for querying dynamic resources; meanwhile, cold and hot data are identified in the bypass through access volume statistics and trend prediction, and changes in cold/hot data as well as data changes in the original database are monitored so as to actively update, add or delete entries in the multi-level cache comprising the browser local cache, the proxy server cache and the distributed cache. This improves the response speed and read efficiency of hot data access, and the effect is especially pronounced in scenarios with well-defined service scenes, clear linear trends, frequent data updates and high data consistency requirements.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. The particular implementation is hardware or software dependent on the specific application of the solution and the design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. The method processes of the present invention are not limited to the specific steps described and shown, but various changes, modifications and additions, or the order between steps may be made by those skilled in the art after appreciating the spirit of the present invention.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.