WO2018137528A1 - Method and device for accessing resource - Google Patents
- Publication number
- WO2018137528A1, PCT/CN2018/073073
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- resource
- target resource
- target
- type
- url
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/563—Data redirection of data network streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
- H04L67/63—Routing a service request depending on the request content or context
Definitions
- the present disclosure relates to the field of the Internet, and in particular, to a resource access method and apparatus.
- the increasingly abundant resources on the Internet bring convenience to users' lives.
- users' heavy access to resources congests the backbone network of the Internet, which not only degrades the quality of user access to resources, but also puts considerable cost pressure on ISPs (Internet Service Providers) and hinders the development of the Internet.
- the resources accessed by the user can be downloaded and cached locally, so that the user can access the resource locally.
- the Internet resource service system includes a Redirection Subsystem (RSS), a Statistical Analysis Subsystem (SAS), a Dispatching Subsystem (DSS), a Cache Subsystem (CSS), and Management Subsystem (MSS).
- the RSS is used to obtain an access request sent by the user to the Internet server, and either to send the resource information carried by the access request to the CSS via the SAS and the DSS, notifying the CSS to download the resource from the source station of the resource and cache it, or to redirect the access request to the CSS so that the user can request access to the resource from the CSS.
- the CSS is used to download and cache resources from the Internet; the MSS is used to manage the various subsystems.
- the resource access process specifically includes:
- the resources of each website on the Internet are analyzed to find the resources of the website that can be cached and to identify them; a plug-in is then developed for each website and loaded into the RSS.
- the RSS obtains the access request and calls the plug-in of each website to identify the resource. If the plug-in of a website identifies the resource, the access request is parsed to obtain a unique identifier of the resource. Further, the RSS may send the unique identifier of the resource to the CSS via the SAS and the DSS, to notify the CSS to download the resource from the source station of the resource according to the unique identifier and cache it.
- the RSS analyzes the access request and, after finding that the CSS has cached the resource, sends a redirect message to the user.
- the message carries the address of the CSS, so that the user can request the resource from the CSS according to the address of the CSS, and the CSS returns the resource, thereby implementing access to the resource of the website.
- in summary, the RSS calls the plug-in of the website to identify the resource, obtains the unique identifier of the resource by parsing the access request, and sends the unique identifier to the CSS; the CSS then performs operations such as downloading, caching, and redirection according to the unique identifier, so that users can access the resource.
- to meet users' access requirements for the resources of the large number of websites on the Internet, a plug-in must be developed separately for each website, which entails a large development effort and high cost.
- the first aspect provides a resource access method, where the method includes: when acquiring an access request of a terminal for a target resource, determining a type of the target resource according to universal identification features of multiple resource types, the access request carrying a Uniform Resource Locator (URL) of the target resource, and the universal identification feature of each resource type being obtained by analyzing a plurality of resource samples; acquiring a unique identifier of the target resource according to the unique identifier acquisition rule corresponding to the type of the target resource and the URL of the target resource; querying, according to the unique identifier of the target resource, whether the target resource exists in the cache subsystem (CSS); and if the target resource exists in the CSS, sending a redirect message to the terminal, the redirect message carrying an address of the CSS, so that the terminal accesses the target resource according to the address of the CSS.
- the RSS determines the type of the target resource according to the universal identification features of the multiple resource types, and acquires the unique identifier of the target resource according to the unique identifier acquisition rule corresponding to that type and the URL of the target resource. If the RSS determines, according to the unique identifier, that the target resource exists in the CSS, it sends the redirect message to the terminal, thereby redirecting the access request to the CSS so that the terminal can access the target resource from the CSS.
- the RSS can thus identify and provide access to target resources, which solves the problem that a plug-in must be developed separately for each website, with a large development effort and high cost. Since the universal identification features of the multiple resource types are obtained by statistical analysis of multiple resource samples, the recognition rate of resources by the RSS and the efficiency of resource access by the terminal can be greatly improved.
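The request-handling flow described in the first aspect can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the class and field names (`RedirectionSubsystem`, `classify`, `id_rules`, `css_index`), the placeholder CSS address, and the choice to return `None` when no redirect is sent are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class Redirect:
    """Redirect message carrying the address of the CSS."""
    css_address: str


@dataclass
class RedirectionSubsystem:
    # Hypothetical collaborators, injected so the sketch stays self-contained.
    classify: Callable[[str], Optional[str]]          # URL -> resource type, or None
    id_rules: Dict[str, Callable[[str], str]]         # resource type -> unique-id rule
    css_index: Set[str] = field(default_factory=set)  # unique ids known to be cached
    css_address: str = "http://css.example.invalid"   # placeholder address

    def handle(self, url: str) -> Optional[Redirect]:
        # 1. Determine the type of the target resource.
        resource_type = self.classify(url)
        if resource_type is None:
            return None  # unrecognised resource: no redirect in this sketch
        # 2. Apply the unique identifier acquisition rule for that type.
        unique_id = self.id_rules[resource_type](url)
        # 3. Query whether the CSS holds the resource; redirect only if it does.
        if unique_id in self.css_index:
            return Redirect(self.css_address)
        return None
```

Injecting the classifier and the per-type rules keeps the sketch independent of how the universal identification features are actually represented.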
- determining the type of the target resource according to the universal identification features of the multiple resource types comprises: obtaining format information of the target resource from the response information returned by the source station of the target resource to the access request; determining a target universal identification feature according to the format information of the target resource and the URL of the target resource, the target universal identification feature being a universal identification feature that matches the format information of the target resource and the URL of the target resource; and determining the resource type corresponding to the target universal identification feature as the type of the target resource.
- the RSS obtains the format information of the target resource, determines the target universal identification feature that matches the format information and the URL of the target resource, and determines the resource type corresponding to the target universal identification feature as the type of the target resource, so the resource type is determined with high accuracy.
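A minimal sketch of this matching step is given below. The feature table is hypothetical: it assumes each universal identification feature can be expressed as a format-information prefix plus a regular expression over the URL path, which the text does not specify.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical universal identification features:
# (resource type, format-information prefix, regex over the URL path).
UNIVERSAL_FEATURES: List[Tuple[str, str, str]] = [
    ("picture", "image/", r".*"),
    ("video", "video/", r".*\.(mp4|flv)"),
    ("audio", "audio/", r".*\.(mp3|aac)"),
]


def determine_type(format_info: str, url: str) -> Optional[str]:
    """Return the resource type whose feature matches both the format
    information and the URL, or None if no feature matches."""
    path = urlsplit(url).path.lower()
    info = format_info.lower()
    for resource_type, prefix, pattern in UNIVERSAL_FEATURES:
        if info.startswith(prefix) and re.fullmatch(pattern, path):
            return resource_type
    return None
```

For example, `determine_type("video/mp4", "http://a.example/v/1.mp4?range=0-100")` returns `"video"` under this table, because only the path is matched against the pattern.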
- acquiring the unique identifier of the target resource according to the unique identifier acquisition rule corresponding to the type of the target resource and the URL of the target resource includes: when the type of the target resource is a picture type, a webpage text type, a download type, or an audio type, obtaining the full path of the URL of the target resource as the unique identifier of the target resource; when the type of the target resource is a video type, if the URL of the target resource is a static link, or the URL of the target resource is a dynamic link and does not include a range parameter, obtaining the full path of the URL of the target resource as the unique identifier of the target resource.
- if the URL of the target resource is a dynamic link and includes a range parameter, obtaining the absolute path of the URL of the target resource as the unique identifier of the target resource, where the range parameter is used to indicate the amount of data requested by the access request.
- the method provided by the embodiment of the present disclosure obtains the unique identifier of the target resource according to the URL of the target resource and the unique identifier acquisition rule corresponding to the type of the target resource, and the accuracy of the unique identifier acquisition is high.
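The per-type rule can be sketched as below. Several points are assumptions made only for illustration, since this passage does not define them: "full path" is taken to mean the whole URL including its query string, "absolute path" the URL with the query string removed, a "dynamic link" any URL with a query string, and the range parameter a query key named `range`.

```python
from urllib.parse import parse_qs, urlsplit

# Types whose unique identifier is always the full path of the URL.
SIMPLE_TYPES = {"picture", "webpage_text", "download", "audio"}


def unique_identifier(resource_type: str, url: str, range_param: str = "range") -> str:
    parts = urlsplit(url)
    full_path = url  # assumption: entire URL, query string included
    absolute_path = f"{parts.scheme}://{parts.netloc}{parts.path}"  # assumption: query dropped

    if resource_type in SIMPLE_TYPES:
        return full_path
    if resource_type == "video":
        is_dynamic = bool(parts.query)  # assumption: dynamic link = has a query string
        has_range = range_param in parse_qs(parts.query, keep_blank_values=True)
        # Only a dynamic link carrying a range parameter falls back to the absolute path,
        # so that requests for different ranges map to the same cached video.
        return absolute_path if (is_dynamic and has_range) else full_path
    raise ValueError(f"no unique identifier rule for type {resource_type!r}")
```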
- the acquiring process of the universal identification feature of the picture type includes: acquiring format information of a plurality of picture samples; determining, for each picture sample, the format information of the picture sample as the identification feature of the picture sample; and determining the universal identification feature of the picture type according to the identification features of the plurality of picture samples, where the universal identification feature of the picture type is an identification feature whose proportion among the identification features of the plurality of picture samples is greater than a first specified ratio.
- the method provided by the embodiment of the present disclosure determines the identification features of the plurality of picture samples according to their format information, and further determines the universal identification feature of the picture type, so that the RSS can recognize picture resources by using the universal identification feature of the picture type, which improves the recognition rate.
- the acquiring process of the universal identification feature of any one of the webpage text type, the download type, the audio type, or the video type includes: obtaining, for each resource type, format information of a plurality of target samples of the resource type and URLs of the plurality of target samples; determining identification features of the plurality of target samples of the resource type, the identification feature of each target sample being used to describe the format information of the target sample and the URL of the target sample; and determining the universal identification feature of the resource type according to the identification features of the plurality of target samples, where the universal identification feature of the resource type is an identification feature whose proportion among the identification features of the plurality of target samples is greater than a second specified ratio.
- the method provided by the embodiment of the present disclosure determines, for any resource type among the webpage text type, the download type, the audio type, and the video type, the identification features of the plurality of target samples according to the format information and URLs of the plurality of target samples of the resource type, and further determines the universal identification feature of the resource type, so that the RSS can identify resources of that type by using its universal identification feature, which improves the recognition rate.
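For these resource types the identification feature describes both the format information and the URL of a sample. The sketch below assumes, purely for illustration, that the URL is described by the file extension of its path; the sample data in the usage note and the threshold value are likewise made up.

```python
import posixpath
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

Feature = Tuple[str, str]  # (format information, URL file extension)


def identification_feature(format_info: str, url: str) -> Feature:
    """Describe one sample by its format information and its URL extension."""
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    return (format_info.lower(), extension)


def universal_features(samples: Iterable[Tuple[str, str]],
                       second_specified_ratio: float) -> List[Feature]:
    """Keep the features whose share of all samples exceeds the specified ratio."""
    features = [identification_feature(info, url) for info, url in samples]
    total = len(features)
    counts = Counter(features)
    return [feat for feat, n in counts.items() if total and n / total > second_specified_ratio]
```

With three `audio/mpeg` samples ending in `.mp3` and one `audio/aac` sample ending in `.aac`, a ratio of 0.5 keeps only `("audio/mpeg", ".mp3")`.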
- a second aspect provides a resource access device, where the device includes a plurality of function modules, and the plurality of function modules are used to execute the resource access method provided by the first aspect and any possible implementation manner thereof.
- a resource access device comprising: a processor; and a memory for storing processor-executable instructions; where the executable instructions are configured to: when acquiring an access request of a terminal for a target resource, determine the type of the target resource according to universal identification features of multiple resource types, the access request carrying a Uniform Resource Locator (URL) of the target resource, and the universal identification feature of each resource type being obtained by analyzing multiple resource samples; acquire a unique identifier of the target resource according to the unique identifier acquisition rule corresponding to the type of the target resource and the URL of the target resource; query, according to the unique identifier of the target resource, whether the target resource exists in the cache subsystem (CSS); and if the target resource exists in the CSS, send a redirect message to the terminal, where the redirect message carries an address of the CSS, so that the terminal accesses the target resource according to the address of the CSS.
- the executable instruction is configured to: obtain format information of the target resource according to the response information returned by the source station of the target resource to the access request; determine a target universal identification feature according to the format information of the target resource and the URL of the target resource, the target universal identification feature being a universal identification feature that matches the format information of the target resource and the URL of the target resource; and determine the resource type corresponding to the target universal identification feature as the type of the target resource.
- the executable instruction is configured to: when the type of the target resource is a picture type, a webpage text type, an application download type, or an audio type, obtain the full path of the URL of the target resource as the unique identifier of the target resource; when the type of the target resource is a video type, if the URL of the target resource is a static link, or the URL of the target resource is a dynamic link and does not include a range parameter, obtain the full path of the URL of the target resource as the unique identifier of the target resource, and if the URL of the target resource is a dynamic link and includes a range parameter, obtain the absolute path of the URL of the target resource as the unique identifier of the target resource, where the range parameter is used to indicate the amount of data requested by the access request.
- the executable instruction is configured to: obtain format information of a plurality of picture samples; for each picture sample, determine the format information of the picture sample as the identification feature of the picture sample; and determine the universal identification feature of the picture type according to the identification features of the plurality of picture samples, where the universal identification feature of the picture type is an identification feature whose proportion among the identification features of the plurality of picture samples is greater than the first specified ratio.
- the executable instruction is configured to: obtain, for each resource type, format information of a plurality of target samples of the resource type and URLs of the plurality of target samples; determine identification features of the plurality of target samples of the resource type, the identification feature of each target sample being used to describe the format information of the target sample and the URL of the target sample; and determine the universal identification feature of the resource type according to the identification features of the plurality of target samples, where the universal identification feature of the resource type is an identification feature whose proportion among the identification features of the plurality of target samples is greater than the second specified ratio.
- FIG. 1 is a schematic structural diagram of a resource access system according to an embodiment of the present disclosure
- FIG. 2 is a schematic structural diagram of a terminal 101 according to an embodiment of the present disclosure
- FIG. 3 is a schematic structural diagram of a resource access apparatus 300 according to an embodiment of the present disclosure.
- FIG. 4 is a flowchart of a resource access method according to an embodiment of the present disclosure.
- FIG. 5 is a schematic structural diagram of a resource access apparatus according to an embodiment of the present disclosure.
- FIG. 1 is a schematic structural diagram of a resource access system according to an embodiment of the present disclosure.
- the system architecture includes a terminal 101, an RSS 102, a SAS 103, a DSS 104, a CSS 105, and an MSS 106.
- the RSS 102 is configured to obtain an access request sent by the terminal 101 to the Internet server, and either send the access request to the CSS 105 via the SAS 103 and the DSS 104, or redirect the access request to the CSS 105 so that the CSS 105 provides the service in place of the source station.
- the SAS 103 is used to send the access request sent by the RSS 102 to the DSS 104.
- the DSS 104 is used to send the access request to the CSS 105 and is responsible for synchronizing resource indexes.
- the CSS 105 is used to download and cache resources from the Internet, such that the terminal 101 may preferentially access the required resources from the CSS 105.
- the MSS 106 is used to manage the RSS 102, the SAS 103, the DSS 104, and the CSS 105.
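The forwarding path among these subsystems can be sketched as a chain of objects. This is only a structural illustration: the method names (`notify`, `forward`, `dispatch`, `download_and_cache`) are hypothetical, the download is simulated by appending to a list, and the statistical and index-synchronization duties of the SAS 103 and DSS 104 are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CSS:
    cached: List[str] = field(default_factory=list)

    def download_and_cache(self, resource_info: str) -> None:
        # Stand-in for downloading the resource from its source station.
        self.cached.append(resource_info)


@dataclass
class DSS:
    css: CSS

    def dispatch(self, resource_info: str) -> None:
        self.css.download_and_cache(resource_info)


@dataclass
class SAS:
    dss: DSS

    def forward(self, resource_info: str) -> None:
        self.dss.dispatch(resource_info)


@dataclass
class RSS:
    sas: SAS

    def notify(self, resource_info: str) -> None:
        # The RSS reaches the CSS only via the SAS and the DSS.
        self.sas.forward(resource_info)
```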
- FIG. 2 is a schematic structural diagram of a terminal 101 according to an embodiment of the present disclosure.
- the terminal 101 may include an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 having one or more processing cores, a power supply 190, and the like. It will be understood by those skilled in the art that the terminal structure shown in FIG. 2 does not constitute a limitation on the terminal, which may include more or fewer components than those illustrated, a combination of certain components, or a different arrangement of components. Specifically:
- the RF circuit 110 can be used for receiving and transmitting signals while information is sent and received or during a call. Specifically, after receiving downlink information from the base station, it passes the downlink information to one or more processors 180 for processing; in addition, it sends uplink data to the base station.
- the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, etc.
- RF circuitry 110 can also communicate with the network and other devices via wireless communication.
- Wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
- the memory 120 can be used to store software programs and modules, and the processor 180 executes various functional applications and data processing by running software programs and modules stored in the memory 120.
- the memory 120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the terminal 101 (such as audio data, a phone book, etc.) and the like.
- memory 120 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, memory 120 may also include a memory controller to provide access to memory 120 by processor 180 and input unit 130.
- the input unit 130 can be configured to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
- input unit 130 can include touch-sensitive surface 131 as well as other input devices 132.
- the touch-sensitive surface 131, also referred to as a touch display or trackpad, can collect touch operations performed by the user on or near it (such as operations performed by the user on or near the touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program.
- the touch-sensitive surface 131 can include two portions of a touch detection device and a touch controller.
- the touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 180, and can receive and execute commands sent by the processor 180.
- the touch-sensitive surface 131 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
- the input unit 130 can also include other input devices 132.
- other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
- the display unit 140 can be used to display information input by the user or information provided to the user and various graphical user interfaces of the terminal 101, which can be composed of graphics, text, icons, video, and any combination thereof.
- the display unit 140 may include a display panel 141.
- the display panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
- the touch-sensitive surface 131 may cover the display panel 141. When the touch-sensitive surface 131 detects a touch operation on or near it, the operation is transmitted to the processor 180 to determine the type of the touch event, and the processor 180 then provides a corresponding visual output on the display panel 141 according to the type of the touch event.
- although in FIG. 2 the touch-sensitive surface 131 and the display panel 141 are implemented as two separate components to provide the input and output functions, in some embodiments the touch-sensitive surface 131 can be integrated with the display panel 141 to provide the input and output functions.
- Terminal 101 may also include at least one type of sensor 150, such as a light sensor, motion sensor, and other sensors.
- the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 141 and/or the backlight when the terminal 101 is moved to the ear.
- the gravity acceleration sensor can detect the magnitude of acceleration in all directions (usually three axes). When it is stationary, it can detect the magnitude and direction of gravity.
- the terminal 101 can also be configured with gyroscopes, barometers, hygrometers, thermometers, infrared sensors, and other sensors, which are not described here again.
- the audio circuit 160, the speaker 161, and the microphone 162 can provide an audio interface between the user and the terminal 101.
- on one hand, the audio circuit 160 can transmit the electrical signal converted from received audio data to the speaker 161, which converts it into a sound signal for output; on the other hand, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data. The audio data is output to the processor 180 for processing and then transmitted, for example, to another terminal via the RF circuit 110, or output to the memory 120 for further processing.
- the audio circuit 160 may also include an earbud jack to provide communication of the peripheral earphones with the terminal 101.
- WiFi is a short-range wireless transmission technology.
- through the WiFi module 170, the terminal 101 can help users send and receive e-mail, browse web pages, and access streaming media; the module provides users with wireless broadband Internet access.
- although FIG. 2 shows the WiFi module 170, it can be understood that the module is not an essential part of the terminal 101 and may be omitted as needed without changing the essence of the invention.
- the processor 180 is the control center of the terminal 101. It connects the various parts of the entire handset through various interfaces and lines, and performs the various functions of the terminal 101 and processes data by running or executing the software programs and/or modules stored in the memory 120 and invoking the data stored in the memory 120, thereby monitoring the handset as a whole.
- the processor 180 may include one or more processing cores; preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
- the modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 180.
- the terminal 101 also includes a power source 190 such as a battery for powering various components.
- the power source can be logically coupled to the processor 180 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
- Power supply 190 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
- the terminal 101 may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
- the display unit of the terminal is a touch screen display
- the terminal further includes a memory, and executable instructions, wherein the executable instructions are stored in the memory and configured to be executed by one or more processors.
- FIG. 3 is a schematic structural diagram of a resource access apparatus 300 according to an embodiment of the present disclosure.
- device 300 can be provided as any of RSS, SAS, DSS, CSS, or MSS.
- apparatus 300 includes a processing component 322 that further includes one or more processors, and memory resources represented by memory 332 for storing instructions executable by processing component 322, such as an application.
- An application stored in memory 332 may include one or more modules each corresponding to a set of instructions.
- processing component 322 is configured to execute instructions to perform the methods of the embodiments illustrated in FIG. 4 described below.
- Device 300 may also include a power supply component 326 configured to perform power management of device 300, a wired or wireless network interface 350 configured to connect device 300 to the network, and an input/output (I/O) interface 358.
- Device 300 can operate based on an operating system stored in memory 332, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
- FIG. 4 is a schematic flowchart of a resource access method according to an embodiment of the present disclosure. This embodiment takes the case where the resource access device is an RSS as an example. Referring to FIG. 4, the method includes:
- the RSS acquires a universal identification feature of multiple resource types and a unique identifier acquisition rule of the multiple resource types, and the universal recognition feature of each resource type is obtained by analyzing multiple resource samples.
- the plurality of resource types include a picture type, a webpage text type, a download type, an audio type, and a video type.
- the universal identification feature of each resource type may be obtained by analyzing a plurality of resource samples on the Internet. After the universal identification features of the multiple resource types are obtained, plug-ins of the multiple resource types are generated based on those features, and the RSS can acquire the universal identification features of the multiple resource types by loading the plug-ins of the multiple resource types.
- the RSS may also establish a resource type identification model based on the universal identification features of the multiple resource types to obtain a universal identification feature of the multiple resource types, which is not limited by the embodiment of the disclosure.
- the above takes the RSS obtaining the universal identification features of the multiple resource types as an example; the features may be obtained from any device, or may be obtained through analysis by the RSS or by system management personnel.
- the processes of obtaining the universal identification features of the multiple resource types are described below in step 401a and step 401b, respectively.
- the acquisition process of the universal identification feature X1 may include steps 401a1 to 401a3:
- 401a1 Get format information of multiple picture samples.
- the format information of the picture sample is used to indicate the resource type and resource format of the picture sample.
- the format information of the image sample may be image/jpg, where image indicates that the resource type of the image sample is a picture, and jpg indicates that the resource format of the image sample is a jpg format.
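The split of format information into a resource type and a resource format can be sketched as follows. This is only an illustration: the function name `parse_format_info` is hypothetical, and the handling of trailing parameters such as `; charset=utf-8` is an assumption about real traffic rather than something stated in the disclosure.

```python
from typing import Tuple


def parse_format_info(format_info: str) -> Tuple[str, str]:
    """Split format information such as 'image/jpg' into (resource type, resource format)."""
    # Assumption: parameters after ';' (for example a charset) are not part of the format.
    main = format_info.split(";", 1)[0].strip().lower()
    resource_type, _, resource_format = main.partition("/")
    return resource_type, resource_format


picture_type, picture_format = parse_format_info("image/jpg")
```

For `image/jpg`, the first element indicates a picture and the second element indicates the jpg format.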
- the format information of the plurality of picture samples may be obtained from the download record of the plurality of picture samples, and the format information of the picture sample is saved in the download record of each picture sample.
- the format information of the plurality of picture samples is counted, and the statistical results are as follows:
- jpg, gif, png, and jpeg suffixes account for about 85%
- the picture resource is generally a cacheable static picture resource. Therefore, the picture resource can be identified according to the format information, that is, the format information of the picture resource can be determined as the identification feature of the picture resource.
- 401a2 For each picture sample, determine the format information of the picture sample as an identification feature of the picture sample.
- the format information of the picture resource may be determined as the identification feature of the picture resource in step 401a1. For each picture sample, the format information of the picture sample may be determined as the identification feature of the picture sample.
- the image/jpg can be determined as the identification feature of the picture sample A.
- the identification features of the plurality of picture samples can be obtained, as shown in the statistical result in step 401a1.
- 401a3 Determine, according to the identification feature of the plurality of picture samples, a universal recognition feature of the picture type, where the universal recognition feature of the picture type is an identification feature that is greater than a first specified ratio in the identification features of the plurality of picture samples.
- the first specified ratio may be selected by the developer.
- the first specified ratio may be 0.7%.
- the identification features of the plurality of picture samples whose proportions are larger than the first specified ratio (0.7%) include image/jpeg, image/png, image/gif, image/webp, and image/jpg, with proportions of 64.0588%, 16.9975%, 15.1025%, 2.31268%, and 0.740895%, respectively. Therefore, the universal identification feature X1 of the picture type may include: the format information is image/jpeg, image/png, image/gif, image/jpg, or image/webp.
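The selection in step 401a3 can be sketched as a simple threshold filter. The proportions below are the ones quoted in the text; the names `PICTURE_FEATURE_SHARE` and `select_universal_features` are hypothetical and are used only for illustration.

```python
# Proportions (in percent) of picture-sample identification features quoted in the text.
PICTURE_FEATURE_SHARE = {
    "image/jpeg": 64.0588,
    "image/png": 16.9975,
    "image/gif": 15.1025,
    "image/webp": 2.31268,
    "image/jpg": 0.740895,
}


def select_universal_features(shares, specified_ratio):
    """Return the features whose share exceeds the specified ratio, largest share first."""
    kept = [feature for feature, share in shares.items() if share > specified_ratio]
    return sorted(kept, key=lambda feature: shares[feature], reverse=True)


# With the first specified ratio of 0.7 (percent), all five features are kept.
X1 = select_universal_features(PICTURE_FEATURE_SHARE, 0.7)
```

Raising the ratio to 1.0 in this sketch would drop image/jpg, which illustrates how the developer's choice of ratio shapes the universal identification feature.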
- compared with a recognition rate of 85%, identification based on the universal identification feature X1 of the picture type in the embodiment of the present disclosure (identifying image/jpeg, image/png, image/gif, image/webp, and image/jpg) can increase the recognition rate to 98%.
- the acquisition process of the universal identification features X2 to X5 may include steps 401b1 to 401b3:
- 401b1 For each resource type, obtain format information of a plurality of target samples of the resource type and a URL of the plurality of target samples.
- the format information of the plurality of target samples is obtained in the same way as in step 401a1. The URL of each target sample is the link corresponding to the access request for the target sample and is carried in that access request, so the URLs of the plurality of target samples can be obtained from the access requests for the target samples.
- taking the case where the resource type is the webpage text type and the target sample is a webpage text sample as an example, the format information of the plurality of webpage text samples is counted, and the statistical result is as follows:
- text/html, text/plain, text/javascript, text/css, and text/xml account for 99% of the total traffic
- the proportions of the webpage text samples in which the URLs are statically linked and dynamically linked in the plurality of webpage text samples are 57% and 42%, respectively. It may be noted that there may be a dynamic link resource that cannot be directly cached in the webpage text resource. Therefore, in order to ensure the accuracy of the identification, the webpage text resource cannot be identified only according to the format information, but the webpage text resource needs to be identified by combining the format information and the URL. That is, the identification information of the webpage text resource can be determined by combining the format information and the URL.
- taking the case where the resource type is the download type and the target sample is a download sample as an example, the format information of the plurality of download samples is counted, and the statistical result is as follows:
- the proportions of download samples whose URLs are static links and dynamic links in the plurality of download samples are 51.4511% and 48.8549%, respectively. This indicates that the download resources may include dynamic link resources that cannot be cached directly. Therefore, to ensure the accuracy of the identification, a download resource cannot be identified according to the format information alone; it needs to be identified by combining the format information and the URL. That is, the identification feature of the download resource can be determined by combining the format information and the URL.
- Traffic identification 94.0494% (MIME and URL suffix match, whether dynamic or static)
- audio/mpeg accounts for 97% of total traffic
- the proportions of audio samples whose URLs are static links and dynamic links in the plurality of audio samples are 87.2962% and 12.7038%, respectively. This indicates that the audio resources may include dynamic link resources that cannot be cached directly. Therefore, to ensure the accuracy of the identification, an audio resource cannot be identified according to the format information alone; it needs to be identified by combining the format information and the URL. That is, the identification feature of the audio resource can be determined by combining the format information and the URL.
- Traffic identification 83.0772% (MIME and URL suffix match, whether dynamic or static)
- the proportions of video samples whose URLs are static links and dynamic links in the plurality of video samples are 39.5552% and 60.4448%, respectively. This indicates that the video resources may include dynamic link resources that cannot be cached directly. Therefore, to ensure the accuracy of the identification, a video resource cannot be identified according to the format information alone; it needs to be identified by combining the format information and the URL. That is, the identification feature of the video resource can be determined by combining the format information and the URL.
- 401b2 Determine an identification feature of the plurality of target samples of the resource type, and the identification feature of each target sample is used to describe format information of the target sample and a URL of the target sample.
- since the identification feature of a webpage text resource may be determined by combining the format information and the URL, for each webpage text sample the identification feature of the webpage text sample may be determined by combining its format information and its URL.
- for example, "the format information is text/javascript and the suffix of the URL is js" can be determined as the identification feature of webpage text sample B.
- through step 401b2, the identification features of the plurality of webpage text samples can be obtained, and the statistical results of the identification features of the plurality of webpage text samples are as follows:
- the format information contains a text type that is the same as the URL suffix (for example, the format information is text/xml and the URL suffix is xml): 4.07%
- the format information is text/javascript and the suffix of the URL is js (regardless of whether the URL is a dynamic link or a static link): 16.99%
- the format information is text/html and the suffix of the URL is htm or html (regardless of whether the URL is a dynamic link or a static link): 12.56%
- the format information is text/html and the URL is a link consisting of the domain name plus "/" (similar to http://xxx.com/) or a link consisting of the domain name plus an absolute path and ending with "/" (similar to http://xxx.com/yyy/): 2.38%
- since the identification feature of a download resource may be determined by combining the format information and the URL, for each download sample the identification feature of the download sample may be determined by combining its format information and its URL.
- for example, if the format information of download sample C is application/octet-stream and the URL is http://xxx.com/yyy, then "the format information is application/octet-stream and the URL is a static link" can be determined as the identification feature of download sample C.
- through step 401b2, the identification features of the plurality of download samples can be obtained, and the statistical results of the identification features of the plurality of download samples are as follows:
- the format information is application/vnd.android.package-archive or application/zip or application/pdf and the URL is suffixed with apk or zip or pdf (regardless of whether the URL is a dynamic link or a static link): 3.38%
- the identification feature of the audio resource may be determined by combining the format information and the URL, and for each audio sample, the identification feature of the audio sample may be determined in combination with the format information and the URL.
- the identification features of the plurality of audio samples can be obtained, and the statistical results of the identification features of the plurality of audio samples are as follows:
- the format information is audio/ogg and the suffix of the URL is ogg: 1.50%
- the format information is audio/mpeg or application/octet-stream or audio/mp3 and the suffix of the URL is mp3: 96.89%
- since the identification feature of a video resource may be determined by combining the format information and the URL, for each video sample the identification feature of the video sample may be determined by combining its format information and its URL.
- the identification features of the plurality of video samples can be obtained, and the statistical results of the identification features of the plurality of video samples are as follows:
- the format information is video/MP2T or video/mp2t or video/m2ts and the suffix of the URL is ts: 8.30%
- the format information is video/x-flv or video/flv and the suffix of the URL is flv: 1.20%
- the format information is video/3gpp and the suffix of the URL is 3gp: 2.17%
- the format information is video/webm and the suffix of the URL is webm: 0.03%
- 401b3 Determine, according to the identification features of the plurality of target samples, a universal identification feature of the resource type, where the universal identification feature of the resource type is an identification feature whose proportion among the identification features of the plurality of target samples is greater than a second specified ratio.
- the second specified ratio may be selected by the developer, and the values of the second specified ratio may be different for different resource types.
- the second specified ratio may be 2.3%.
- the recognition features of the plurality of webpage text samples whose proportions are larger than the second specified ratio include X2a to X2d:
- the format information contains the same text type and URL suffix (such as the format information is text/xml, the URL suffix is xml);
- the format information is text/javascript and the suffix of the URL is js (regardless of whether the URL is a dynamic link or a static link);
- the format information is text/html and the suffix of the URL is htm or html (regardless of whether the URL is a dynamic link or a static link);
- the format information is text/html and the URL is a link consisting of the domain name plus "/" (similar to http://xxx.com/) or a link consisting of the domain name plus an absolute path and ending with "/" (similar to http://xxx.com/yyy/).
- the universal identification feature X2 of the web page text type may include the above X2a to X2d.
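A matcher for X2a to X2d might look like the following minimal sketch. The helper names `url_suffix` and `match_x2` are hypothetical. Checking the more specific rules (X2b, X2c, X2d) before the generic rule X2a is an assumption made here, since text/html with an html suffix would otherwise also satisfy X2a; the disclosure does not specify a matching order.

```python
from urllib.parse import urlsplit


def url_suffix(url):
    """Return the lowercase file suffix of the URL path, or '' if there is none."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""


def match_x2(format_info, url):
    """Return 'X2a'..'X2d' for the first matching rule, or None if no rule matches."""
    mime = format_info.split(";", 1)[0].strip().lower()
    main_type, _, sub_type = mime.partition("/")
    suffix = url_suffix(url)
    if mime == "text/javascript" and suffix == "js":
        return "X2b"  # applies to dynamic and static links alike
    if mime == "text/html" and suffix in ("htm", "html"):
        return "X2c"  # applies to dynamic and static links alike
    if mime == "text/html" and "?" not in url and url.endswith("/"):
        return "X2d"  # http://xxx.com/ or http://xxx.com/yyy/
    if main_type == "text" and sub_type and sub_type == suffix:
        return "X2a"  # e.g. text/xml with an xml suffix
    return None


example = match_x2("text/javascript", "http://xxx.com/yyy.js")
```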
- compared with a recognition rate of 20%, identification based on the universal identification feature X2 of the webpage text type in the embodiment of the present disclosure (identifying X2a to X2d) can increase the recognition rate to 36%.
- the second specified ratio may be 3.3%.
- the recognition features of the plurality of downloaded samples whose proportions are larger than the second specified ratio include X3a and X3b:
- the format information is application/octet-stream and the URL is a static link;
- the format information is application/vnd.android.package-archive or application/zip or application/pdf and the URL is suffixed with apk or zip or pdf (regardless of whether the URL is dynamic or static).
- the proportion of X3a and X3b is 51.15% and 3.38%, respectively. Therefore, the general identification feature X3 of the download type may include the above X3a and X3b.
- compared with a recognition rate of 27%, identification based on the universal identification feature X3 of the download type in the embodiment of the present disclosure (identifying X3a and X3b) can increase the recognition rate to 54%.
- the second specified ratio may be 1.5%.
- the recognition features of the plurality of audio samples that occupy a larger proportion than the second specified ratio include X4a and X4b:
- the format information is audio/ogg and the suffix of the URL is ogg.
- the format information is audio/mpeg or application/octet-stream or audio/mp3 and the suffix of the URL is mp3.
- the proportions of X4a and X4b are 1.50% and 96.89%, respectively. Therefore, the universal identification feature X4 of the audio type may include X4a and X4b described above.
- compared with a recognition rate of 87%, identification based on the universal identification feature X4 of the audio type in the embodiment of the present disclosure (identifying X4a and X4b) can increase the recognition rate to 98.39%.
- the second specified ratio may be 0.03%.
- the recognition features of the plurality of video samples whose proportions are larger than the second specified ratio include X5a to X5e:
- the format information is video/mp4 and the suffix of the URL is mp4.
- the format information is video/MP2T or video/mp2t or video/m2ts and the suffix of the URL is ts.
- the format information is video/x-flv or video/flv and the suffix of the URL is flv.
- the format information is video/3gpp and the suffix of the URL is 3gp.
- the format information is video/webm and the suffix of the URL is webm.
- the universal identification feature X5 of the video type may include the above X5a to X5e.
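Because X3b, X4a, X4b, and X5a to X5e all pair a set of format information values with a set of URL suffixes, they lend themselves to a table-driven check. The following is a minimal sketch: the names `SUFFIX_RULES`, `url_suffix`, and `match_suffix_rule` are hypothetical, and evaluating X3a last is an assumption (application/octet-stream with an mp3 suffix would otherwise be ambiguous between X3a and X4b).

```python
from urllib.parse import urlsplit

# (label, accepted format information, accepted URL suffixes), as listed in the text.
SUFFIX_RULES = [
    ("X3b", {"application/vnd.android.package-archive", "application/zip",
             "application/pdf"}, {"apk", "zip", "pdf"}),
    ("X4a", {"audio/ogg"}, {"ogg"}),
    ("X4b", {"audio/mpeg", "application/octet-stream", "audio/mp3"}, {"mp3"}),
    ("X5a", {"video/mp4"}, {"mp4"}),
    ("X5b", {"video/mp2t", "video/m2ts"}, {"ts"}),
    ("X5c", {"video/x-flv", "video/flv"}, {"flv"}),
    ("X5d", {"video/3gpp"}, {"3gp"}),
    ("X5e", {"video/webm"}, {"webm"}),
]


def url_suffix(url):
    """Return the lowercase file suffix of the URL path, or '' if there is none."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""


def match_suffix_rule(format_info, url):
    """Return the label of the first matching rule, or None if nothing matches."""
    # Lowercasing makes video/MP2T and video/mp2t equivalent.
    mime = format_info.split(";", 1)[0].strip().lower()
    suffix = url_suffix(url)
    for label, formats, suffixes in SUFFIX_RULES:
        if mime in formats and suffix in suffixes:
            return label
    # X3a: application/octet-stream with a static link (no "?" in the URL).
    if mime == "application/octet-stream" and "?" not in url:
        return "X3a"
    return None
```

A table of this kind keeps each universal identification feature in one place, so adjusting a specified ratio only changes which rows are present.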
- compared with a recognition rate of 39%, identification based on the universal identification feature X5 of the video type in the embodiment of the present disclosure (identifying X5a to X5e) can increase the recognition rate to 83%.
- the unique identifier acquisition rules of the multiple resource types may also be acquired.
- the unique identifier obtaining rules of the multiple resource types may be: when the type of the resource is a picture type, a webpage text type, an application download type, or an audio type, the full path of the URL of the resource is obtained as the unique identifier of the resource;
- when the type of the resource is a video type, if the URL of the resource is a static link, or the URL of the resource is a dynamic link and does not include a range parameter, the full path of the URL of the resource is obtained as the unique identifier of the resource; if the URL of the resource is a dynamic link and includes a range parameter, the absolute path of the URL of the resource is obtained as the unique identifier of the resource.
- the range parameter is used to indicate the amount of data requested by the access request.
- the range parameter may be range, and the value of the range parameter
- a static link means that the URL does not contain "?", similar to http://xxx.com/yyy/zzz.jpg;
- the full path of the URL refers to the entire URL; the absolute path of the URL refers to the part before the "?" in the URL.
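The unique identifier obtaining rules can be sketched as follows, using the definitions of static link, full path, and absolute path given above. The function names and the resource type strings are hypothetical; the parameter name `range` follows the text.

```python
from urllib.parse import parse_qs, urlsplit


def is_static_link(url):
    """A static link is a URL that does not contain '?'."""
    return "?" not in url


def absolute_path(url):
    """The absolute path is the part of the URL before '?'."""
    return url.split("?", 1)[0]


def has_range_parameter(url, name="range"):
    """Check whether the query string of the URL carries the range parameter."""
    return name in parse_qs(urlsplit(url).query, keep_blank_values=True)


def unique_identifier(resource_type, url):
    """Apply the per-type rule: only ranged dynamic video links use the absolute path."""
    if resource_type == "video" and not is_static_link(url) and has_range_parameter(url):
        return absolute_path(url)
    return url  # the full path is the entire URL


video_id = unique_identifier("video", "http://xxx.com/v.mp4?range=0-1023")
```

Dropping the query string for ranged video requests lets requests for different byte ranges of the same video map to a single cached resource.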
- the identification feature of the target sample is determined by combining the format information and the URL in step 401b, thereby determining the universal recognition feature of the webpage text type, the download type, the audio type, and the video type.
- the identification features of the target samples may also be determined based only on the format information, thereby determining the universal recognition features of the resource types.
- the determination process in this case is the same as the determination process of the universal identification feature of the picture type in step 401a.
- the terminal sends an access request for the target resource to the Internet server, where the access request carries the URL of the target resource.
- the URL of the target resource is a link corresponding to the access request of the target resource.
- the URL of the target resource can be http://xxx.com/yyy.js.
- the RSS obtains an access request of the terminal to the target resource, and determines a type of the target resource according to the universal identification feature of the multiple resource types.
- when the terminal sends an access request for the target resource to the Internet server, the RSS may obtain the access request.
- in the bypass networking mode, the RSS may obtain the access request by monitoring; or, in the direct-route networking mode, the RSS may act as a proxy server and obtain the access request by receiving it directly.
- the specific manner of obtaining the access request by the RSS is not limited in the embodiment of the present disclosure.
- the determining the type of the target resource according to the universal identification feature of the plurality of resource types may include steps 403a to 403c:
- the source station of the target resource may refer to an Internet server.
- the Internet server responds to the access request, for example by sending response information to the terminal, where the response information includes the format information of the target resource.
- the RSS can obtain the response information.
- the RSS can obtain the response information by monitoring, or, in the direct-route networking mode, the RSS can act as a proxy server and obtain the response information by receiving it directly, and can then obtain the format information of the target resource included in the response information.
- the format information of the target resource can be text/javascript.
- the format information of the target resource is located at a header of the response information of the access request, and is used to indicate a resource format of the target resource.
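Reading the format information from the header of the response information can be sketched as below. This sketch assumes that the response is an HTTP response and that the format information corresponds to the `Content-Type` header; the disclosure only states that the format information is located in a header. The function name `format_info_from_response` is hypothetical.

```python
def format_info_from_response(raw_response_head):
    """Return the format information from a raw HTTP response head, or None if absent."""
    lines = raw_response_head.split("\r\n")
    for line in lines[1:]:  # skip the status line
        if not line:
            break  # an empty line ends the header section
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() == "content-type":
            # Drop parameters such as '; charset=utf-8' and normalize the case.
            return value.split(";", 1)[0].strip().lower()
    return None


head = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/javascript; charset=utf-8\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
)
format_info = format_info_from_response(head)
```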
- 403b Determine, according to the format information of the target resource and the URL of the target resource, a target universal identification feature, where the target universal recognition feature is a universal recognition feature that matches format information of the target resource and a URL of the target resource.
- after the RSS obtains the format information of the target resource in step 403a, the format information and the URL of the target resource are matched against the universal identification features X1 to X5 of the multiple resource types obtained in step 401, and the target universal identification feature is determined from them. For example, if the format information of the target resource is text/javascript and the URL is http://xxx.com/yyy.js, the RSS may determine that X2b in the universal identification feature X2 matches the format information and the URL of the target resource; therefore, the RSS can determine the universal identification feature X2 as the target universal identification feature.
- the resource type corresponding to the target universal identification feature (such as the universal recognition feature X2) determined by step 403b is a webpage text type. Therefore, it can be determined that the type of the target resource is a webpage text type.
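Steps 403b and 403c amount to finding the first universal identification feature that matches and returning its resource type. A minimal sketch follows; the registry `UNIVERSAL_FEATURES` is hypothetical and, for brevity, contains only X1 and the X2b rule from the example, with the format information assumed to be already lowercase.

```python
def determine_type(format_info, url, universal_features):
    """Return the resource type of the first matching universal feature, else None."""
    for resource_type, matches in universal_features:
        if matches(format_info, url):
            return resource_type
    return None


PICTURE_FORMATS = {"image/jpeg", "image/png", "image/gif", "image/jpg", "image/webp"}

# Each entry pairs a resource type with a predicate over (format information, URL).
UNIVERSAL_FEATURES = [
    ("picture", lambda fmt, url: fmt in PICTURE_FORMATS),
    ("webpage text",
     lambda fmt, url: fmt == "text/javascript" and url.split("?", 1)[0].endswith(".js")),
]

target_type = determine_type("text/javascript", "http://xxx.com/yyy.js", UNIVERSAL_FEATURES)
```

Keeping the features in a registry of predicates means a new resource type can be supported by adding an entry rather than by developing a plug-in per website.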
- the RSS obtains a unique identifier of the target resource according to the unique identifier obtaining rule corresponding to the type of the target resource and the URL of the target resource.
- the step 404 may include: when the type of the target resource is a picture type, a webpage text type, an application download type, or an audio type, the RSS obtains the full path of the URL of the target resource as the unique identifier of the target resource; when the type of the target resource is a video type, if the URL of the target resource is a static link, or the URL of the target resource is a dynamic link and does not include a range parameter, the RSS obtains the full path of the URL of the target resource as the unique identifier of the target resource; if the URL of the target resource is a dynamic link and includes a range parameter, the RSS obtains the absolute path of the URL of the target resource as the unique identifier of the target resource, where the range parameter is used to indicate the amount of data requested by the access request.
- for example, if the type of the target resource is a webpage text type and the URL of the target resource is http://xxx.com/yyy.js, then http://xxx.com/yyy.js is obtained as the unique identifier of the target resource.
- the RSS sends a download notification to the CSS, where the download notification carries the unique identifier of the target resource.
- the RSS may send the unique identifier of the target resource to the CSS in the form of a download notification, for notifying the CSS to download the target resource.
- the RSS can send the download notification to the CSS via SAS and DSS.
- the CSS receives the download notification, and downloads and caches the target resource according to the unique identifier of the target resource.
- the CSS may download and cache the target resource from the source station (Internet server) of the target resource according to the unique identifier of the target resource carried in the download notification.
- the unique identifier of the target resource may also be transmitted to the DSS, which records the unique identifier of the target resource in its resource index; the resource index of the DSS is used to record the unique identifiers of all resources cached by the CSS.
- Steps 401 to 406 are the process in which, when the terminal accesses the target resource for the first time, the RSS acquires the unique identifier of the target resource and sends it to the CSS, and the CSS downloads and caches the target resource.
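The CSS side of steps 405 and 406 can be sketched with an in-memory model. Everything here is an assumption made for illustration: the class name `CacheSubsystem`, the injected `fetch_from_source` callable standing in for the download from the source station, and the use of a plain set as the resource index of the DSS.

```python
class CacheSubsystem:
    """Toy model of a CSS that downloads, caches, and registers target resources."""

    def __init__(self, fetch_from_source, resource_index):
        self._fetch = fetch_from_source        # hypothetical download function
        self._resource_index = resource_index  # stands in for the DSS resource index
        self._cache = {}

    def on_download_notification(self, unique_identifier):
        """Download and cache the resource once; return True if a download happened."""
        if unique_identifier in self._cache:
            return False
        self._cache[unique_identifier] = self._fetch(unique_identifier)
        # Record the identifier so that a later query can find the cached resource.
        self._resource_index.add(unique_identifier)
        return True

    def get(self, unique_identifier):
        """Return the cached resource, or None if it has not been cached."""
        return self._cache.get(unique_identifier)
```

In this sketch a repeated notification for the same unique identifier does not trigger a second download.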
- the terminal sends an access request to the target resource to the Internet server, where the access request carries the URL of the target resource.
- the RSS obtains an access request of the terminal to the target resource, and determines a type of the target resource according to the universal identification feature of the multiple resource types.
- the RSS obtains a unique identifier of the target resource according to the unique identifier obtaining rule corresponding to the type of the target resource and the URL of the target resource.
- Steps 407 to 409 are the same as steps 402 to 404, and are not described herein again.
- the RSS queries whether the target resource exists in the CSS according to the unique identifier of the target resource.
- the resource index of the DSS is used to record the unique identifier of all cached resources of the CSS.
- the step 410 may include: the RSS sends a query message to the scheduling subsystem DSS, where the query message carries the unique identifier of the target resource, and the DSS queries whether the unique identifier of the target resource is recorded in the resource index; if the RSS receives a specified response message returned by the DSS, it determines that the target resource exists in the CSS, where the specified response message is used to indicate that the unique identifier of the target resource is recorded in the resource index of the DSS.
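The query exchange of step 410 can be sketched with two small message types. The message classes, the `SchedulingSubsystem` class, and the boolean `recorded` field are hypothetical; the disclosure does not define the message formats.

```python
from dataclasses import dataclass


@dataclass
class QueryMessage:
    unique_identifier: str


@dataclass
class ResponseMessage:
    recorded: bool  # True plays the role of the "specified response message"


class SchedulingSubsystem:
    """Toy DSS that answers queries against its resource index."""

    def __init__(self, resource_index):
        self._resource_index = set(resource_index)

    def handle_query(self, query):
        return ResponseMessage(recorded=query.unique_identifier in self._resource_index)


def target_exists_in_css(dss, unique_identifier):
    """RSS side: send the query and interpret the response from the DSS."""
    return dss.handle_query(QueryMessage(unique_identifier)).recorded


dss = SchedulingSubsystem({"http://xxx.com/yyy.js"})
exists = target_exists_in_css(dss, "http://xxx.com/yyy.js")
```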
- the RSS may send a redirect message to the terminal to redirect the access request of the terminal to the CSS.
- the terminal receives the redirect message, and accesses the target resource according to the address of the CSS.
- the step 412 may include: the terminal sending an access request for the target resource to the CSS according to the address of the CSS carried in the redirect message; and the CSS returns the target resource to the terminal.
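One way the redirect message could be realized is as an HTTP 302 response whose `Location` header points at the CSS. This is an assumption: the disclosure only says that the redirect message carries the address of the CSS. The `/fetch?id=` path, the address `http://css.example/`, and the function name are all hypothetical.

```python
from urllib.parse import quote


def build_redirect_message(css_address, unique_identifier):
    """Build a raw HTTP 302 response that redirects the terminal to the CSS."""
    # Percent-encode the identifier so that it is safe inside a query string.
    location = css_address.rstrip("/") + "/fetch?id=" + quote(unique_identifier, safe="")
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


message = build_redirect_message("http://css.example/", "http://xxx.com/yyy.js")
```

On receiving such a message, the terminal would issue its next request to the address in `Location`, and the CSS would return the cached target resource.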
- Steps 407 to 412 are the process in which, when the terminal accesses the target resource again, the RSS obtains the access request of the terminal for the target resource and redirects the access request to the CSS, so that the terminal can access the target resource from the CSS.
- in the method provided by the embodiment of the present disclosure, the RSS determines the type of the target resource according to the universal identification features of the multiple resource types, and obtains the unique identifier of the target resource according to the unique identifier obtaining rule corresponding to the type of the target resource and the URL of the target resource. If it is found, according to the unique identifier of the target resource, that the target resource exists in the CSS, a redirect message is sent to the terminal, thereby redirecting the access request to the CSS so that the terminal can access the target resource from the CSS.
- the RSS can thus realize the identification of and access to target resources, which solves the problem that plug-ins need to be developed separately for each website, with a large development workload and high cost. In addition, since the universal identification features of the multiple resource types are obtained by statistical analysis of multiple resource samples, the recognition rate of resources by the RSS and the efficiency with which the terminal accesses resources can be greatly improved.
- FIG. 5 is a schematic structural diagram of a resource access apparatus according to an embodiment of the present disclosure.
- the apparatus includes a determining module 501, an obtaining module 502, a querying module 503, and a transmitting module 504.
- the determining module 501 is configured to determine, when an access request of a terminal for a target resource is obtained, a type of the target resource according to universal identification features of multiple resource types, where the access request carries a uniform resource locator (URL) of the target resource;
- the universal identification feature of each resource type is obtained by analyzing multiple resource samples;
- the obtaining module 502 is configured to obtain a unique identifier of the target resource according to the unique identifier obtaining rule corresponding to the type of the target resource and the URL of the target resource;
- the querying module 503 is configured to query whether the target resource exists in the cache subsystem CSS according to the unique identifier of the target resource.
- the transmitting module 504 is configured to: if the target resource exists in the CSS, send a redirect message to the terminal, where the redirect message carries the address of the CSS, so that the terminal accesses the target resource according to the address of the CSS.
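The cooperation of the four modules can be sketched as a small pipeline. The class name `ResourceAccessApparatus` and the representation of each module as an injected callable are assumptions made for illustration, not the structure of FIG. 5 itself.

```python
class ResourceAccessApparatus:
    """Chains the determining, obtaining, querying, and transmitting modules."""

    def __init__(self, determine, obtain, query, transmit):
        self.determine = determine  # module 501: (format_info, url) -> type or None
        self.obtain = obtain        # module 502: (type, url) -> unique identifier
        self.query = query          # module 503: unique identifier -> bool
        self.transmit = transmit    # module 504: unique identifier -> redirect message

    def handle(self, format_info, url):
        """Return a redirect message if the target resource is cached, else None."""
        resource_type = self.determine(format_info, url)
        if resource_type is None:
            return None  # the resource type could not be identified
        unique_identifier = self.obtain(resource_type, url)
        if not self.query(unique_identifier):
            return None  # not cached in the CSS, so there is nothing to redirect to
        return self.transmit(unique_identifier)
```

Injecting the modules as callables keeps the sketch independent of how each module is actually implemented.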
- the determining module 501 is configured to perform the foregoing step 403.
- the obtaining module 502 is configured to perform any one of the foregoing steps 401.
- the obtaining module 502 is configured to perform the above step 404.
- in the apparatus provided by the embodiment of the present disclosure, the RSS determines the type of the target resource according to the universal identification features of the multiple resource types, and obtains the unique identifier of the target resource according to the unique identifier obtaining rule corresponding to the type of the target resource and the URL of the target resource. If it is found, according to the unique identifier of the target resource, that the target resource exists in the CSS, a redirect message is sent to the terminal, thereby redirecting the access request to the CSS so that the terminal can access the target resource from the CSS.
- the RSS can thus realize the identification of and access to target resources, which solves the problem that plug-ins need to be developed separately for each website, with a large development workload and high cost. In addition, since the universal identification features of the multiple resource types are obtained by statistical analysis of multiple resource samples, the recognition rate of resources by the RSS and the efficiency with which the terminal accesses resources can be greatly improved.
- a non-transitory computer readable storage medium comprising instructions is also provided, such as a memory comprising instructions, where the instructions are executable by a processor in a resource access device to perform the resource access method in the above embodiments.
- the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
- a person skilled in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware, and the program may be stored in a computer readable storage medium.
- the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.
Description
The present application claims priority to Chinese Patent Application No. 201710056394.6, filed with the Chinese Patent Office on January 25, 2017 and entitled "Resource Access Method and Apparatus", the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of the Internet, and in particular, to a resource access method and apparatus.
With the development of the Internet, the increasingly abundant resources on the Internet bring convenience to users' lives. At the same time, heavy user access to these resources congests the Internet backbone network, which not only degrades the quality of users' access to resources but also puts considerable cost pressure on ISPs (Internet Service Providers) and hinders the development of the Internet. To reduce congestion on the backbone network, the resources accessed by users can be downloaded and cached locally, so that users can access the resources locally.
Currently, an Internet resource service system includes a Redirection Subsystem (RSS), a Statistical Analysis Subsystem (SAS), a Dispatching Subsystem (DSS), a Cache Subsystem (CSS), and a Management Subsystem (MSS). The RSS is used to obtain an access request sent by a user to an Internet server, and either to send the resource information carried in the access request to the CSS via the SAS and the DSS, so as to notify the CSS to download the resource from the source station of the resource and cache it, or to redirect the access request to the CSS so that the user can request the resource from the CSS. The CSS is used to download resources from the Internet and cache them. The MSS is used to manage the subsystems. Based on this Internet resource service system, the resource access process specifically includes:
For each website on the Internet, the resources of that website are analyzed to find and identify the cacheable ones; a plug-in is then developed for each website, and these plug-ins are loaded into the RSS. When a user sends an Internet server an access request for a resource of some website, the RSS obtains the access request and calls the plug-ins of the websites to identify the resource. If the plug-in of that website identifies the resource, the unique identifier of the resource is obtained by parsing the access request. The RSS can then send the unique identifier to the CSS via the SAS and the DSS, to notify the CSS to download the resource from its origin site according to the unique identifier and cache it. When the user again sends an access request for the resource to the Internet server, the RSS obtains and parses the access request and, after finding that the resource is already cached in the CSS, sends the user a redirect message carrying the address of the CSS. The user can then request the resource from the CSS at that address, and the CSS returns the resource, thereby implementing access to the website's resource.
In the course of implementing the present invention, the inventors found that the prior art has at least the following problems:
In the above technology, the RSS calls the website's plug-in to identify the resource, obtains the unique identifier of the resource by parsing the access request, and sends the unique identifier to the CSS. The CSS then downloads, caches, and redirects to the resource according to its unique identifier, thereby enabling the user to access the resource. To meet users' demand for access to the resources of the large number of websites on the Internet, a plug-in has to be developed separately for each website, which involves a large development workload and high cost.
Summary of the Invention
To solve the problems of large development workload and high cost caused by developing plug-ins separately in the prior art, embodiments of the present disclosure provide a resource access method and apparatus. The technical solutions are as follows:
In a first aspect, a resource access method is provided. The method includes: when an access request from a terminal for a target resource is obtained, determining the type of the target resource according to universal identification features of multiple resource types, where the access request carries the Uniform Resource Locator (URL) of the target resource and the universal identification feature of each resource type is obtained by analyzing multiple resource samples; obtaining the unique identifier of the target resource according to the URL of the target resource and the unique identifier acquisition rule corresponding to the type of the target resource; querying, according to the unique identifier of the target resource, whether the target resource exists in a cache subsystem (CSS); and, if the target resource exists in the CSS, sending a redirect message to the terminal, where the redirect message carries the address of the CSS and the terminal accesses the target resource according to the address of the CSS.
In the method provided by the embodiments of the present disclosure, when an access request from a terminal for a target resource is obtained, the RSS determines the type of the target resource according to the universal identification features of multiple resource types, and obtains the unique identifier of the target resource according to the URL of the target resource and the unique identifier acquisition rule corresponding to that type. If a query based on the unique identifier shows that the target resource exists in the CSS, the RSS sends a redirect message to the terminal, thereby redirecting the access request to the CSS so that the terminal can access the target resource from the CSS. By obtaining the universal identification features and the unique identifier acquisition rules of the multiple resource types in advance, the RSS can identify the target resource and provide access to it. This removes the need to develop a plug-in separately for each website, with its large development workload and high cost. Moreover, because the universal identification features of the multiple types are obtained by statistical analysis of multiple resource samples, the RSS's resource recognition rate and the terminal's resource access efficiency can be greatly improved.
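The four steps of the first aspect can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `handle_access_request`, `Redirect`, the two callback parameters, the in-memory `cached_ids` set, and the address `10.0.0.5` are all hypothetical names chosen for the example, and the type and identifier rules are passed in as trivial placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Redirect:
    """Hypothetical stand-in for the redirect message sent to the terminal."""
    css_address: str


def handle_access_request(
    url: str,
    determine_type: Callable[[str], Optional[str]],
    unique_id_for: Callable[[str, str], str],
    cached_ids: Set[str],
    css_address: str,
) -> Optional[Redirect]:
    """Return a redirect to the CSS if the target resource is cached there."""
    # Step 1: determine the resource type from the universal identification features.
    resource_type = determine_type(url)
    if resource_type is None:
        return None  # not recognised: no redirect is issued in this sketch
    # Step 2: derive the unique identifier using the rule for that type.
    resource_id = unique_id_for(resource_type, url)
    # Step 3: query whether the CSS already holds the resource.
    if resource_id in cached_ids:
        # Step 4: redirect the terminal to the CSS.
        return Redirect(css_address=css_address)
    return None


# Toy placeholders, for illustration only.
result = handle_access_request(
    "http://example.com/a.jpg",
    determine_type=lambda u: "picture" if u.endswith(".jpg") else None,
    unique_id_for=lambda t, u: u,
    cached_ids={"http://example.com/a.jpg"},
    css_address="10.0.0.5",
)
```

Passing the type rule and the identifier rule as callables keeps the flow independent of any one website, which is the point the paragraph above makes about avoiding per-site plug-ins.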
In a first possible implementation of the first aspect, determining the type of the target resource according to the universal identification features of the multiple resource types includes: obtaining format information of the target resource from the response information that the origin site of the target resource returns for the access request; determining a target universal identification feature according to the format information and the URL of the target resource, the target universal identification feature being the universal identification feature that matches the format information and the URL of the target resource; and determining the resource type corresponding to the target universal identification feature as the type of the target resource.
In the method provided by the embodiments of the present disclosure, the RSS obtains the format information of the target resource, determines the target universal identification feature that matches the format information and the URL of the target resource, and determines the resource type corresponding to that feature as the type of the target resource, so the resource type is determined with high accuracy.
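The matching step can be illustrated with a small sketch. It assumes that a universal identification feature can be modelled as a format-information string paired with a URL predicate, and that the format information is a string such as `image/png` taken from the origin site's response; both are modelling assumptions. The table name `UNIVERSAL_FEATURES` and the helper `any_url` are hypothetical, and only the picture entries are drawn from the statistics given later in the description.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A feature pairs a format-information string with a URL predicate (an assumption).
Feature = Tuple[str, Callable[[str], bool]]


def any_url(url: str) -> bool:
    """URL predicate that accepts every URL."""
    return True


# Hypothetical feature table; only the picture formats come from the description.
UNIVERSAL_FEATURES: Dict[str, List[Feature]] = {
    "picture": [
        (fmt, any_url)
        for fmt in ("image/jpeg", "image/png", "image/gif", "image/webp", "image/jpg")
    ],
}


def determine_type(format_info: str, url: str) -> Optional[str]:
    """Return the resource type whose universal feature matches, else None."""
    # Drop any parameters after ';' and normalise case before comparing.
    fmt = format_info.split(";", 1)[0].strip().lower()
    for resource_type, features in UNIVERSAL_FEATURES.items():
        for feature_fmt, url_matches in features:
            if fmt == feature_fmt and url_matches(url):
                return resource_type
    return None
```

For the other resource types, the URL predicate would carry the URL-related part of the feature; those predicates are left out here because the sketch has no source data for them.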
In a second possible implementation of the first aspect, obtaining the unique identifier of the target resource according to the unique identifier acquisition rule corresponding to the type of the target resource and the URL of the target resource includes: when the type of the target resource is the picture type, the web page text type, the application download type, or the audio type, taking the full path of the URL of the target resource as the unique identifier of the target resource; when the type of the target resource is the video type, if the URL of the target resource is a static link, or is a dynamic link that does not contain a range parameter, taking the full path of the URL as the unique identifier of the target resource, and if the URL is a dynamic link that contains a range parameter, taking the absolute path of the URL as the unique identifier of the target resource, where the range parameter indicates the amount of data requested by the access request.
In the method provided by the embodiments of the present disclosure, the unique identifier of the target resource is obtained according to the URL of the target resource and the unique identifier acquisition rule corresponding to the type of the target resource, so the unique identifier is obtained with high accuracy.
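The per-type rule can be sketched as below. Several points are assumptions made only for this illustration, since this passage does not define them: a URL is treated as a dynamic link when it has a query string, the "full path" is taken to be the complete URL as requested, the "absolute path" is taken to be the URL with its query string removed, and the range parameter names in `RANGE_PARAMS` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Hypothetical names for the range parameter; the text does not list any.
RANGE_PARAMS = {"range", "start", "end"}


def unique_id(resource_type: str, url: str) -> str:
    """Derive a unique identifier from the URL using the rule for the type."""
    parts = urlsplit(url)
    # Assumption: a non-empty query string marks a dynamic link.
    is_dynamic = bool(parts.query)
    query_names = parse_qs(parts.query, keep_blank_values=True)
    has_range = any(name in RANGE_PARAMS for name in query_names)
    if resource_type == "video" and is_dynamic and has_range:
        # Assumed "absolute path": the URL without its query string, so that
        # requests for different byte ranges map to the same cached video.
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Assumed "full path": the complete URL as requested.
    return url
```

Under these assumptions, `unique_id("video", "http://a.com/v.mp4?id=7&start=0&end=1024")` yields `http://a.com/v.mp4`, while the same URL for any other type is returned unchanged.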
In a third possible implementation of the first aspect, the process of obtaining the universal identification feature of the picture type includes: obtaining format information of multiple picture samples; for each picture sample, determining the format information of the picture sample as the identification feature of the picture sample; and determining the universal identification feature of the picture type according to the identification features of the multiple picture samples, the universal identification feature of the picture type being an identification feature whose proportion among the identification features of the multiple picture samples is greater than a first specified ratio.
In the method provided by the embodiments of the present disclosure, the identification features of multiple picture samples are determined according to their format information, and the universal identification feature of the picture type is then determined, so that the RSS can identify picture resources by means of that universal identification feature, which improves the recognition rate.
In a fourth possible implementation of the first aspect, the process of obtaining the universal identification feature of any one of the web page text type, the download type, the audio type, or the video type includes: for each resource type, obtaining format information of multiple target samples of the resource type and the URLs of the multiple target samples; determining the identification features of the multiple target samples of the resource type, the identification feature of each target sample describing the format information and the URL of that target sample; and determining the universal identification feature of the resource type according to the identification features of the multiple target samples, the universal identification feature of the resource type being an identification feature whose proportion among the identification features of the multiple target samples is greater than a second specified ratio.
In the method provided by the embodiments of the present disclosure, for any one of the web page text type, the download type, the audio type, or the video type, the identification features of multiple target samples of the resource type are determined according to their format information and URLs, and the universal identification feature of the resource type is then determined, so that the RSS can identify resources of that type by means of the universal identification feature, which improves the recognition rate.
In a second aspect, a resource access apparatus is provided. The apparatus includes multiple functional modules configured to perform the resource access method provided by the first aspect and any of its possible implementations.
In a third aspect, a resource access apparatus is provided. The resource access apparatus includes: a processor; and a memory for storing instructions executable by the processor. The executable instructions are used to perform: when an access request from a terminal for a target resource is obtained, determining the type of the target resource according to universal identification features of multiple resource types, where the access request carries the Uniform Resource Locator (URL) of the target resource and the universal identification feature of each resource type is obtained by analyzing multiple resource samples; obtaining the unique identifier of the target resource according to the URL of the target resource and the unique identifier acquisition rule corresponding to the type of the target resource; querying, according to the unique identifier of the target resource, whether the target resource exists in a cache subsystem (CSS); and, if the target resource exists in the CSS, sending a redirect message to the terminal, where the redirect message carries the address of the CSS and the terminal accesses the target resource according to the address of the CSS.
In one possible implementation, the executable instructions are used to perform: obtaining format information of the target resource from the response information that the origin site of the target resource returns for the access request; determining a target universal identification feature according to the format information and the URL of the target resource, the target universal identification feature being the universal identification feature that matches the format information and the URL of the target resource; and determining the resource type corresponding to the target universal identification feature as the type of the target resource.
In one possible implementation, the executable instructions are used to perform: when the type of the target resource is the picture type, the web page text type, the application download type, or the audio type, taking the full path of the URL of the target resource as the unique identifier of the target resource; when the type of the target resource is the video type, if the URL of the target resource is a static link, or is a dynamic link that does not contain a range parameter, taking the full path of the URL as the unique identifier of the target resource, and if the URL is a dynamic link that contains a range parameter, taking the absolute path of the URL as the unique identifier of the target resource, where the range parameter indicates the amount of data requested by the access request.
In one possible implementation, the executable instructions are used to perform: obtaining format information of multiple picture samples; for each picture sample, determining the format information of the picture sample as the identification feature of the picture sample; and determining the universal identification feature of the picture type according to the identification features of the multiple picture samples, the universal identification feature of the picture type being an identification feature whose proportion among the identification features of the multiple picture samples is greater than the first specified ratio.
In one possible implementation, the executable instructions are used to perform: for each resource type, obtaining format information of multiple target samples of the resource type and the URLs of the multiple target samples; determining the identification features of the multiple target samples of the resource type, the identification feature of each target sample describing the format information and the URL of that target sample; and determining the universal identification feature of the resource type according to the identification features of the multiple target samples, the universal identification feature of the resource type being an identification feature whose proportion among the identification features of the multiple target samples is greater than the second specified ratio.
FIG. 1 is a schematic structural diagram of a resource access system according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a terminal 101 according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a resource access apparatus 300 according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a resource access method according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a resource access apparatus according to an embodiment of the present disclosure.
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic structural diagram of a resource access system according to an embodiment of the present disclosure. Referring to FIG. 1, the system includes a terminal 101, an RSS 102, an SAS 103, a DSS 104, a CSS 105, and an MSS 106.
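The RSS 102 obtains the access request that the terminal 101 sends to an Internet server, and either sends the access request to the CSS 105 via the SAS 103 and the DSS 104 or redirects the access request to the CSS 105, so that the CSS 105 takes over resource serving from the origin site. The SAS 103 forwards the access request sent by the RSS 102 to the DSS 104. The DSS 104 sends the access request to the CSS 105 and is responsible for synchronizing the resource index. The CSS 105 downloads and caches resources from the Internet, so that the terminal 101 can preferentially access the required resources from the CSS 105. The MSS 106 manages the RSS 102, the SAS 103, the DSS 104, and the CSS 105.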
FIG. 2 is a schematic structural diagram of a terminal 101 according to an embodiment of the present disclosure. Referring to FIG. 2, the terminal 101 includes:
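The terminal 101 may include components such as an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 including one or more processing cores, and a power supply 190. Those skilled in the art will understand that the terminal structure shown in FIG. 2 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Specifically: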
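The RF circuit 110 may be used to receive and send signals while information is being sent and received or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 180 for processing; it also sends uplink data to the base station. Typically, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 110 may communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Message Service), and the like.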
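The memory 120 may be used to store software programs and modules; the processor 180 performs various functional applications and data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the terminal 101 (such as audio data or a phone book) and the like. In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 120 may also include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.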
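The input unit 130 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, the input unit 130 may include a touch-sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also called a touch display screen or touchpad, can collect touch operations by the user on or near it (for example, operations performed on or near the touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected apparatus according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller. The touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, and sends them to the processor 180; it can also receive and execute commands sent by the processor 180. The touch-sensitive surface 131 may be implemented using various technologies such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch-sensitive surface 131, the input unit 130 may include other input devices 132. Specifically, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or a power key), a trackball, a mouse, a joystick, and the like.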
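The display unit 140 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the terminal 101, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 140 may include a display panel 141; optionally, the display panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface 131 may cover the display panel 141. When the touch-sensitive surface 131 detects a touch operation on or near it, it passes the operation to the processor 180 to determine the type of touch event, and the processor 180 then provides the corresponding visual output on the display panel 141 according to the type of touch event. Although in FIG. 2 the touch-sensitive surface 131 and the display panel 141 implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface 131 and the display panel 141 may be integrated to implement the input and output functions.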
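The terminal 101 may also include at least one sensor 150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 141 and/or the backlight when the terminal 101 is moved to the ear. As one kind of motion sensor, a gravitational acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that the terminal 101 may also be equipped with, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described here.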
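The audio circuit 160, a speaker 161, and a microphone 162 can provide an audio interface between the user and the terminal 101. The audio circuit 160 can convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output. Conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data; the audio data is then output to the processor 180 for processing and sent via the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to provide communication between a peripheral headset and the terminal 101.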
WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal 101 can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although FIG. 2 shows the WiFi module 170, it is understood that the module is not an essential part of the terminal 101 and may be omitted as needed without changing the essence of the invention.
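The processor 180 is the control center of the terminal 101. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the terminal 101 and processes data by running or executing the software programs and/or modules stored in the memory 120 and calling the data stored in the memory 120, thereby monitoring the mobile phone as a whole. Optionally, the processor 180 may include one or more processing cores. Preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 180.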
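The terminal 101 also includes a power supply 190 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 180 through a power management system, so that functions such as charging, discharging, and power consumption management are handled by the power management system. The power supply 190 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.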
Although not shown, the terminal 101 may also include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the display unit of the terminal is a touch screen display, and the terminal further includes a memory and executable instructions, where the executable instructions are stored in the memory and configured to be executed by one or more processors.
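FIG. 3 is a schematic structural diagram of a resource access apparatus 300 according to an embodiment of the present disclosure. For example, the apparatus 300 may be provided as any one of an RSS, an SAS, a DSS, a CSS, or an MSS. Referring to FIG. 3, the apparatus 300 includes a processing component 322, which further includes one or more processors, and memory resources represented by a memory 332 for storing instructions executable by the processing component 322, such as application programs. An application program stored in the memory 332 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 322 is configured to execute the instructions to perform the method in the embodiment shown in FIG. 4 below.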
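The apparatus 300 may also include a power supply component 326 configured to perform power management for the apparatus 300, a wired or wireless network interface 350 configured to connect the apparatus 300 to a network, and an input/output (I/O) interface 358. The apparatus 300 may operate based on an operating system stored in the memory 332, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.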
FIG. 4 is a schematic flowchart of a resource access method according to an embodiment of the present disclosure. This embodiment is described using the example in which the resource access apparatus is an RSS. Referring to FIG. 4, the method includes:
401. The RSS obtains the universal identification features of multiple resource types and the unique identifier acquisition rules of the multiple resource types, where the universal identification feature of each resource type is obtained by analyzing multiple resource samples.
In the embodiments of the present disclosure, the multiple resource types include the picture type, the web page text type, the download type, the audio type, and the video type. The universal identification feature of each resource type can be obtained by analyzing multiple resource samples on the Internet. Once the universal identification features of the multiple resource types have been obtained, plug-ins for the multiple resource types are generated based on those features, and the RSS can obtain the universal identification features by loading these plug-ins. Alternatively, the RSS may build a resource type identification model based on the universal identification features of the multiple resource types in order to obtain them; the embodiments of the present disclosure do not limit this.
It should be noted that the embodiments of the present disclosure are described only using the example in which the RSS is the entity that obtains the universal identification features of the above resource types. Here, obtaining may mean obtaining the features from any device, or may mean that the RSS or a system administrator derives them through analysis.
The process of obtaining the universal identification features of the multiple resource types is described below using the processes in step 401a and step 401b, respectively:
401a. For the universal identification feature X1 of the picture type, the process of obtaining X1 may include steps 401a1 to 401a3:
401a1: Obtain format information of multiple picture samples.
For each picture sample, the format information of the picture sample indicates the resource type and the resource format of the picture sample. For example, the format information of a picture sample may be image/jpg, where image indicates that the resource type of the picture sample is a picture, and jpg indicates that the resource format of the picture sample is the jpg format.
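As a small illustration of how format information of the form type/format splits into its two parts, consider the following sketch. The function name `parse_format_info` is hypothetical, and lower-casing the input is a convenience of the sketch rather than something the description requires.

```python
from typing import Tuple


def parse_format_info(format_info: str) -> Tuple[str, str]:
    """Split format information such as 'image/jpg' into (type, format)."""
    resource_type, _, resource_format = format_info.strip().lower().partition("/")
    if not resource_type or not resource_format:
        raise ValueError(f"not of the form type/format: {format_info!r}")
    return resource_type, resource_format
```

With the example from the text, `parse_format_info("image/jpg")` gives the resource type `image` and the resource format `jpg`.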
The format information of the multiple picture samples can be obtained from the download records of the picture samples; the download record of each picture sample stores the format information of that picture sample. Statistics on the format information of the multiple picture samples give the following results:
Link distribution dimension:
Static links 91%, dynamic links 9%
Traffic for which the suffix and the content type correspond: 91%
Suffix distribution dimension:
The jpg, gif, png, and jpeg suffixes account for about 85%
Format information distribution dimension:
image/jpeg, image/png, image/gif, image/webp, and image/jpg account for 98%
Format information:
As the above statistics show, picture samples whose URLs are static links and dynamic links account for 91% and 9% of the picture samples, respectively. That is, almost all of the picture sample URLs are static links, which indicates that picture resources are generally cacheable static picture resources. Picture resources can therefore be identified according to their format information; in other words, the format information of a picture resource can be determined as the identification feature of that picture resource.
401a2: For each picture sample, determine the format information of the picture sample as the identification feature of the picture sample.
Because, as established in step 401a1, the format information of a picture resource can be determined as its identification feature, the format information of each picture sample can be determined as the identification feature of that picture sample.
For example, if the format information of picture sample A is image/jpg, then image/jpg can be determined as the identification feature of picture sample A. Step 401a2 yields the identification features of the multiple picture samples; see the statistical results in step 401a1.
401a3: Determine the universal identification feature of the picture type according to the identification features of the multiple picture samples, where the universal identification feature of the picture type is an identification feature whose proportion among the identification features of the multiple picture samples is greater than a first specified ratio.
The first specified ratio can be chosen by the developer; for example, it may be 0.7%. According to the statistical results in step 401a1, the identification features whose proportion among the identification features of the multiple picture samples is greater than the first specified ratio (0.7%) are image/jpeg, image/png, image/gif, image/webp, and image/jpg, with proportions of 64.0588%, 16.9975%, 15.1025%, 2.31268%, and 0.740895%, respectively. Therefore, the universal identification feature X1 of the picture type may include: format information of image/jpeg, image/png, image/gif, image/jpg, or image/webp.
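The selection in step 401a3 amounts to a proportion threshold, which the following sketch illustrates. The function names `feature_shares` and `universal_features` are hypothetical. The five percentages are the ones quoted above; the `image/bmp` entry is an invented value added only to show a feature that falls below the threshold.

```python
from collections import Counter
from typing import Dict, Iterable, List


def feature_shares(sample_features: Iterable[str]) -> Dict[str, float]:
    """Return each identification feature's share of the samples, in percent."""
    counts = Counter(sample_features)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {feature: 100.0 * n / total for feature, n in counts.items()}


def universal_features(shares: Dict[str, float], threshold_percent: float) -> List[str]:
    """Keep the features whose share is strictly greater than the threshold."""
    return sorted(f for f, share in shares.items() if share > threshold_percent)


# Shares quoted in the text, plus one hypothetical below-threshold entry.
picture_shares = {
    "image/jpeg": 64.0588,
    "image/png": 16.9975,
    "image/gif": 15.1025,
    "image/webp": 2.31268,
    "image/jpg": 0.740895,
    "image/bmp": 0.3,  # hypothetical value, not from the source
}
selected = universal_features(picture_shares, threshold_percent=0.7)
```

Here `selected` holds the five formats named in the text, and the hypothetical `image/bmp` entry is excluded. `feature_shares` shows how such percentages could be derived from a raw list of per-sample identification features.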
In the related art, recognition targets URLs that are static links and contain specific suffixes (jpg, gif, png, and jpeg), giving a recognition rate of 85%. In the embodiments of the present disclosure, recognition based on the universal identification feature X1 of the picture type (recognizing image/jpeg, image/png, image/gif, image/webp, and image/jpg) can raise the recognition rate to 98%.
401b. For the universal identification feature X2 of the web page text type, the universal identification feature X3 of the download type, the universal identification feature X4 of the audio type, and the universal identification feature X5 of the video type, the process of obtaining each of X2 to X5 may include steps 401b1 to 401b3:
401b1:对于每个资源类型,获取该资源类型的多个目标样本的格式信息和该多个目标样本的URL。401b1: For each resource type, obtain format information of a plurality of target samples of the resource type and a URL of the plurality of target samples.
其中,该多个目标样本的格式信息的获取与步骤401a1同理,每个目标样本的URL为该目标样本的访问请求对应的链接,由该目标样本的访问请求携带,因此,可以从该多个目标样本的访问请求中,获取该多个目标样本的URL。The obtaining the format information of the plurality of target samples is the same as the step 401a1, and the URL of each target sample is a link corresponding to the access request of the target sample, and is carried by the access request of the target sample, so In the access request of the target samples, the URLs of the plurality of target samples are obtained.
(1) Taking the web page text type as the resource type and web page text samples as the target samples, statistics are collected on the plurality of web page text samples, with the following results:
Link distribution dimension:
Static links 57%, dynamic links 42%
Traffic identification: 34.6217%
Suffix distribution dimension:
Static-link traffic with the suffixes htm, html, js, and css accounts for 20%
Format information distribution dimension:
text/html, text/plain, text/javascript, text/css, and text/xml account for 99% of total traffic
The above statistics show that, among the plurality of web page text samples, samples whose URLs are static links and dynamic links account for 57% and 42%, respectively. This indicates that web page text resources may include dynamic-link resources that cannot be cached directly. Therefore, to ensure accurate identification, web page text resources should not be identified from the format information alone but from the format information together with the URL; that is, the identification feature of a web page text resource can be determined by combining the format information and the URL.
(2) Taking the download type as the resource type and download samples as the target samples, statistics are collected on the plurality of download samples, with the following results:
Link distribution dimension:
Static links 51.1451%, dynamic links 48.8549%
Suffix distribution dimension:
Static-link traffic with specific download suffixes accounts for 27%
Format information distribution dimension:
application/octet-stream, application/vnd.android.package-archive, application/zip, and application/pdf account for 96% of total traffic
The above statistics show that, among the plurality of download samples, samples whose URLs are static links and dynamic links account for 51.1451% and 48.8549%, respectively. This indicates that download resources may include dynamic-link resources that cannot be cached directly. Therefore, to ensure accurate identification, download resources should not be identified from the format information alone but from the format information together with the URL; that is, the identification feature of a download resource can be determined by combining the format information and the URL.
(3) Taking the audio type as the resource type and audio samples as the target samples, statistics are collected on the plurality of audio samples, with the following results:
Link distribution dimension:
Static links: 87.2962%, dynamic links: 12.7038%
Traffic identification: 94.0494% (the MIME type matches the URL suffix, for both dynamic and static links)
Suffix distribution dimension:
Traffic with the suffixes mp3, ogg, and m4a accounts for about 87%
Format information distribution dimension:
audio/mpeg accounts for 97% of total traffic
The above statistics show that, among the plurality of audio samples, samples whose URLs are static links and dynamic links account for 87.2962% and 12.7038%, respectively. This indicates that audio resources may include dynamic-link resources that cannot be cached directly. Therefore, to ensure accurate identification, audio resources should not be identified from the format information alone but from the format information together with the URL; that is, the identification feature of an audio resource can be determined by combining the format information and the URL.
(4) Taking the video type as the resource type and video samples as the target samples, statistics are collected on the plurality of video samples, with the following results:
Link distribution dimension:
Static links: 39.5552%, dynamic links: 60.4448%
Traffic identification: 83.0772% (the MIME type matches the URL suffix, for both dynamic and static links)
Suffix distribution dimension:
Static-link traffic with the mainstream suffixes mp4, ts, 3gp, m4v, flv, and webm accounts for about 39%
Format information distribution dimension:
video/mp4, video/MP2T, video/mp2t, video/m2ts, video/x-flv, video/flv, video/3gpp, and video/webm account for 99% of traffic
The above statistics show that, among the plurality of video samples, samples whose URLs are static links and dynamic links account for 39.5552% and 60.4448%, respectively. This indicates that video resources may include dynamic-link resources that cannot be cached directly. Therefore, to ensure accurate identification, video resources should not be identified from the format information alone but from the format information together with the URL; that is, the identification feature of a video resource can be determined by combining the format information and the URL.
401b2: Determine the identification features of the plurality of target samples of the resource type, where the identification feature of each target sample describes the format information of the target sample and the URL of the target sample.
(1) As noted in step 401b1 (1), the identification feature of a web page text resource can be determined by combining the format information and the URL. Accordingly, for each web page text sample, the identification feature of the sample can be determined from its format information and URL.
For example, if the format information of web page text sample B is text/javascript and its URL is http://xxx.com/yyy.js, then "the format information is text/javascript and the URL suffix is js" can be determined as the identification feature of web page text sample B. Through step 401b2, the identification features of the plurality of web page text samples can be obtained. The statistics of these identification features are as follows:
X2a. The text type contained in the format information is the same as the URL suffix (for example, the format information is text/xml and the URL suffix is xml): 4.07%
X2b. The format information is text/javascript and the URL suffix is js (whether the URL is a dynamic link or a static link): 16.99%
X2c. The format information is text/html and the URL suffix is htm or html (whether the URL is a dynamic link or a static link): 12.56%
X2d. The format information is text/html and the URL is either a domain name followed by "/" (such as http://xxx.com/) or a domain name followed by an absolute path ending with "/" (such as http://xxx.com/yyy/): 2.38%
(2) As noted in step 401b1 (2), the identification feature of a download resource can be determined by combining the format information and the URL. Accordingly, for each download sample, the identification feature of the sample can be determined from its format information and URL.
For example, if the format information of download sample C is application/octet-stream and its URL is http://xxx.com/yyy, then "the format information is application/octet-stream and the URL is a static link" can be determined as the identification feature of download sample C. Through step 401b2, the identification features of the plurality of download samples can be obtained. The statistics of these identification features are as follows:
X3a. The format information is application/octet-stream and the URL is a static link: 51.15%
X3b. The format information is application/vnd.android.package-archive, application/zip, or application/pdf, and the URL suffix is apk, zip, or pdf (whether the URL is a dynamic link or a static link): 3.38%
(3) As noted in step 401b1 (3), the identification feature of an audio resource can be determined by combining the format information and the URL. Accordingly, for each audio sample, the identification feature of the sample can be determined from its format information and URL.
For example, if the format information of audio sample D is audio/ogg and its URL is http://xxx.com/yyy.ogg, then "the format information is audio/ogg and the URL suffix is ogg" can be determined as the identification feature of audio sample D. Through step 401b2, the identification features of the plurality of audio samples can be obtained. The statistics of these identification features are as follows:
X4a. The format information is audio/ogg and the URL suffix is ogg: 1.50%
X4b. The format information is audio/mpeg, application/octet-stream, or audio/mp3, and the URL suffix is mp3: 96.89%
(4) As noted in step 401b1 (4), the identification feature of a video resource can be determined by combining the format information and the URL. Accordingly, for each video sample, the identification feature of the sample can be determined from its format information and URL.
For example, if the format information of video sample E is video/mp4 and its URL is http://xxx.com/yyy.MP4, then "the format information is video/mp4 and the URL suffix is mp4" can be determined as the identification feature of video sample E. Through step 401b2, the identification features of the plurality of video samples can be obtained. The statistics of these identification features are as follows:
X5a. The format information is video/mp4 and the URL suffix is mp4: 71.30%
X5b. The format information is video/MP2T, video/mp2t, or video/m2ts, and the URL suffix is ts: 8.30%
X5c. The format information is video/x-flv or video/flv, and the URL suffix is flv: 1.20%
X5d. The format information is video/3gpp and the URL suffix is 3gp: 2.17%
X5e. The format information is video/webm and the URL suffix is webm: 0.03%
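As a rough illustration of step 401b2, the following Python sketch derives an identification feature from a sample's format information and URL. The function name `identification_feature` and the tuple layout are hypothetical. Lower-casing the suffix is an assumption suggested by sample E, whose URL ends in .MP4 while its feature names the suffix mp4. A link is treated as static when the URL contains no "?".

```python
import posixpath
from urllib.parse import urlsplit


def identification_feature(format_info, url):
    """Describe a sample by (format information, URL suffix, link kind)."""
    path = urlsplit(url).path
    # Suffix without the leading dot; lower-cased (assumption, see sample E).
    suffix = posixpath.splitext(path)[1].lstrip(".").lower()
    # Static link: no "?" in the URL; dynamic link otherwise.
    link_kind = "dynamic" if "?" in url else "static"
    return (format_info, suffix, link_kind)


sample_b = identification_feature("text/javascript", "http://xxx.com/yyy.js")
sample_c = identification_feature("application/octet-stream", "http://xxx.com/yyy")
sample_e = identification_feature("video/mp4", "http://xxx.com/yyy.MP4")
print(sample_b, sample_c, sample_e)
```

Counting how often each such tuple (or a coarser grouping of tuples) occurs across the samples of a resource type would give per-feature shares like those listed above.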
401b3: Determine the universal identification feature of the resource type according to the identification features of the plurality of target samples, where the universal identification feature of the resource type is an identification feature whose proportion among the identification features of the plurality of target samples is greater than a second specified ratio.
The second specified ratio may be selected by the developer, and its value may differ for different resource types.
(1) For the web page text type, the second specified ratio may be 2.3%. According to the statistical result in step 401b2 (1), the identification features whose proportion among the identification features of the plurality of web page text samples is greater than the second specified ratio are X2a to X2d:
X2a. The text type contained in the format information is the same as the URL suffix (for example, the format information is text/xml and the URL suffix is xml);
X2b. The format information is text/javascript and the URL suffix is js (whether the URL is a dynamic link or a static link);
X2c. The format information is text/html and the URL suffix is htm or html (whether the URL is a dynamic link or a static link);
X2d. The format information is text/html and the URL is either a domain name followed by "/" (such as http://xxx.com/) or a domain name followed by an absolute path ending with "/" (such as http://xxx.com/yyy/).
X2a to X2d account for 4.07%, 16.99%, 12.56%, and 2.38%, respectively. Therefore, the universal identification feature X2 of the web page text type may include X2a to X2d above.
In the related art, identification based on the URL being a static link with a specific suffix (htm, html, js, or css) achieves a recognition rate of 20%, whereas identification based on the universal identification feature X2 of the web page text type in the embodiment of the present disclosure (identifying X2a to X2d) can raise the recognition rate to 36%.
(2) For the download type, the second specified ratio may be 3.3%. According to the statistical result in step 401b2 (2), the identification features whose proportion among the identification features of the plurality of download samples is greater than the second specified ratio are X3a and X3b:
X3a. The format information is application/octet-stream and the URL is a static link;
X3b. The format information is application/vnd.android.package-archive, application/zip, or application/pdf, and the URL suffix is apk, zip, or pdf (whether the URL is a dynamic link or a static link).
X3a and X3b account for 51.15% and 3.38%, respectively. Therefore, the universal identification feature X3 of the download type may include X3a and X3b above.
In the related art, identification based on the URL being a static link with a specific suffix achieves a recognition rate of 27%, whereas identification based on the universal identification feature X3 of the download type in the embodiment of the present disclosure (identifying X3a and X3b) can raise the recognition rate to 54%.
(3) For the audio type, the second specified ratio may be 1.5%. According to the statistical result in step 401b2 (3), the identification features whose proportion among the identification features of the plurality of audio samples is greater than the second specified ratio are X4a and X4b:
X4a. The format information is audio/ogg and the URL suffix is ogg.
X4b. The format information is audio/mpeg, application/octet-stream, or audio/mp3, and the URL suffix is mp3.
X4a and X4b account for 1.50% and 96.89%, respectively. Therefore, the universal identification feature X4 of the audio type may include X4a and X4b above.
In the related art, identification based on the URL being a static link with a specific suffix achieves a recognition rate of 87%, whereas identification based on the universal identification feature X4 of the audio type in the embodiment of the present disclosure (identifying X4a and X4b) can raise the recognition rate to 98.39%.
(4) For the video type, the second specified ratio may be 0.03%. According to the statistical result in step 401b2 (4), the identification features whose proportion among the identification features of the plurality of video samples is greater than the second specified ratio are X5a to X5e:
X5a. The format information is video/mp4 and the URL suffix is mp4.
X5b. The format information is video/MP2T, video/mp2t, or video/m2ts, and the URL suffix is ts.
X5c. The format information is video/x-flv or video/flv, and the URL suffix is flv.
X5d. The format information is video/3gpp and the URL suffix is 3gp.
X5e. The format information is video/webm and the URL suffix is webm.
X5a to X5e account for 71.30%, 8.30%, 1.20%, 2.17%, and 0.03%, respectively. Therefore, the universal identification feature X5 of the video type may include X5a to X5e above.
In the related art, identification based on the URL being a static link with a specific suffix achieves a recognition rate of 39%, whereas identification based on the universal identification feature X5 of the video type in the embodiment of the present disclosure (identifying X5a to X5e) can raise the recognition rate to 83%.
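The improved recognition rates quoted for the four resource types appear consistent with adding up the shares of the individual features selected for each type. The short Python check below sums the shares listed in step 401b2; reading the quoted rates as these sums is an interpretation, not something the description states explicitly, and the dictionary keys are informal labels.

```python
# Shares (in percent) of the selected features, as listed in step 401b2.
feature_shares = {
    "web page text": [4.07, 16.99, 12.56, 2.38],    # X2a to X2d
    "download": [51.15, 3.38],                      # X3a, X3b
    "audio": [1.50, 96.89],                         # X4a, X4b
    "video": [71.30, 8.30, 1.20, 2.17, 0.03],       # X5a to X5e
}

# Combined share per resource type, rounded to two decimals.
combined = {rtype: round(sum(shares), 2) for rtype, shares in feature_shares.items()}
for rtype, total in combined.items():
    print(f"{rtype}: {total:.2f}%")
```

The sums come to 36.00%, 54.53%, 98.39%, and 83.00%, which line up with the quoted rates of 36%, 54%, 98.39%, and 83%.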
In the embodiment of the present disclosure, unique identifier acquisition rules for the plurality of resource types may also be acquired while the universal identification features of the plurality of resource types are being acquired. The unique identifier acquisition rules may be as follows. When the type of a resource is the picture type, the web page text type, the download type, or the audio type, the full path of the URL of the resource is obtained as the unique identifier of the resource. When the type of a resource is the video type: if the URL of the resource is a static link, or the URL is a dynamic link that does not contain a range parameter, the full path of the URL is obtained as the unique identifier of the resource; if the URL is a dynamic link that contains a range parameter, the absolute path of the URL is obtained as the unique identifier of the resource. The range parameter indicates the amount of data requested by the access request. For example, the range parameter may be range; a value of 1M for the range parameter indicates that the size of the video requested by the access request is 1M.
Here, a URL is a static link if it does not contain "?", such as http://xxx.com/yyy/zzz.jpg. A URL is a dynamic link if it contains "?", such as http://xxx.com/yyy/zzz.mp4?userid=aaa&key=bbb; the part after "?" is generally parameters or user-related information. The full path of a URL is the entire URL; the absolute path of a URL is the part of the URL before "?".
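The unique identifier acquisition rules can be sketched in Python as follows. The function names and the resource-type labels are hypothetical. The parameter name `range` follows the example in the text, and the URL containing `range=1M` in the usage lines is made up for illustration.

```python
from urllib.parse import parse_qsl


def absolute_path(url):
    """Part of the URL before "?" (the whole URL if there is no "?")."""
    return url.split("?", 1)[0]


def has_range_parameter(url, range_param="range"):
    """True if the URL is a dynamic link whose query contains range_param."""
    if "?" not in url:
        return False  # static link: no parameters at all
    query = url.split("?", 1)[1]
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return range_param in names


def unique_identifier(resource_type, url):
    """Apply the rule: only video URLs with a range parameter are truncated."""
    if resource_type == "video" and has_range_parameter(url):
        return absolute_path(url)
    return url  # full path, that is, the entire URL


print(unique_identifier("picture", "http://xxx.com/yyy/zzz.jpg"))
print(unique_identifier("video", "http://xxx.com/yyy/zzz.mp4?userid=aaa&key=bbb"))
print(unique_identifier("video", "http://xxx.com/yyy/zzz.mp4?userid=aaa&range=1M"))
```

Dropping the query only for ranged video requests means that different segments of the same video map to one identifier, while other dynamic links keep their parameters as part of the identifier.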
It should be noted that, to ensure accurate identification, step 401b determines the identification features of the target samples by combining the format information and the URL, and then determines the universal identification features of the web page text type, the download type, the audio type, and the video type. Alternatively, the identification features of the target samples may be determined from the format information alone, and the universal identification features of these resource types determined from them; in that case the determination process is the same as the process in step 401a for determining the universal identification feature of the picture type.
402. The terminal sends an access request for a target resource to the Internet server, where the access request carries the URL of the target resource.
The URL of the target resource is the link corresponding to the access request for the target resource. For example, the URL of the target resource may be http://xxx.com/yyy.js.
403. The RSS obtains the terminal's access request for the target resource, and determines the type of the target resource according to the universal identification features of the plurality of resource types.
In the embodiment of the present disclosure, when the terminal sends the access request for the target resource to the Internet server, the RSS may obtain the access request. For example, in a bypass networking mode the RSS may obtain the access request by monitoring; in an in-line networking mode the RSS may act as a proxy server and receive the access request directly. The embodiment of the present disclosure does not limit the specific manner in which the RSS obtains the access request. Determining the type of the target resource according to the universal identification features of the plurality of resource types may include steps 403a to 403c:
403a: Obtain the format information of the target resource from the response information with which the source station of the target resource responds to the access request for the target resource.
The source station of the target resource may be the Internet server. After the terminal sends the access request for the target resource to the Internet server, the Internet server responds to the access request, for example by sending response information to the terminal, and the response information contains the format information of the target resource.
When the Internet server sends the response information to the terminal, the RSS may obtain the response information. For example, in the bypass networking mode the RSS may obtain the response information by monitoring; in the in-line networking mode the RSS may act as a proxy server and receive the response information directly. The RSS thereby obtains the format information of the target resource contained in the response information. For example, the format information of the target resource may be text/javascript.
In a possible implementation, the format information of the target resource is located in the header of the response information to the access request and indicates the resource format of the target resource.
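As an illustration of step 403a, the sketch below pulls the format information out of a raw HTTP response header block. It assumes that the format information corresponds to the MIME type in the HTTP Content-Type header, which this passage does not state explicitly; the function name `format_info_from_response` is hypothetical, and the sample response is made up.

```python
def format_info_from_response(raw_header_block):
    """Return the MIME type from the Content-Type header, or None if absent.

    raw_header_block is the status line plus header lines of an HTTP/1.x
    response, separated by CRLF.
    """
    lines = raw_header_block.split("\r\n")
    for line in lines[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip()
    return None


response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/javascript; charset=utf-8\r\n"
    "Content-Length: 10\r\n"
    "\r\n"
)
print(format_info_from_response(response))
```

Header names are compared case-insensitively because HTTP field names are case-insensitive; the returned value is left as sent by the source station.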
403b: Determine a target universal identification feature according to the format information of the target resource and the URL of the target resource, where the target universal identification feature is the universal identification feature that matches the format information and the URL of the target resource.
After obtaining the format information of the target resource in step 403a, the RSS may match the format information and the URL of the target resource against the universal identification features X1 to X5 of the plurality of resource types obtained in step 401, and determine from them the universal identification feature of the target resource. For example, if the format information of the target resource is text/javascript and its URL is http://xxx.com/yyy.js, then during the sequential matching the RSS can determine that X2b in the universal identification feature X2 matches the format information and URL of the target resource; therefore, the RSS can determine the universal identification feature X2 as the target universal identification feature.
403c: Determine the resource type corresponding to the target universal identification feature as the type of the target resource.
The resource type corresponding to the target universal identification feature determined in step 403b (for example, the universal identification feature X2) is the web page text type; therefore, the type of the target resource can be determined to be the web page text type.
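Steps 403b and 403c can be sketched as a rule table that is tried in the order X1 to X5. The function and constant names are hypothetical, suffix comparison is lower-cased as an assumption, and X2d is implemented literally as "the URL ends with /". This is one possible reading of the matching process, not a definitive implementation.

```python
import posixpath
from urllib.parse import urlsplit

PICTURE_FORMATS = {"image/jpeg", "image/png", "image/gif", "image/webp", "image/jpg"}
DOWNLOAD_FORMATS = {"application/vnd.android.package-archive",
                    "application/zip", "application/pdf"}
MP3_FORMATS = {"audio/mpeg", "application/octet-stream", "audio/mp3"}


def url_suffix(url):
    """Lower-cased suffix of the URL path without the dot ('' if none)."""
    return posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()


def classify(fmt, url):
    """Return the resource type of the first matching feature, or None."""
    sfx = url_suffix(url)
    is_static = "?" not in url  # static link: the URL contains no "?"
    rules = [
        ("picture", fmt in PICTURE_FORMATS),                                # X1
        ("web page text", sfx != "" and fmt == "text/" + sfx),              # X2a
        ("web page text", fmt == "text/javascript" and sfx == "js"),        # X2b
        ("web page text", fmt == "text/html" and sfx in {"htm", "html"}),   # X2c
        ("web page text", fmt == "text/html" and url.endswith("/")),        # X2d
        ("download", fmt == "application/octet-stream" and is_static),      # X3a
        ("download", fmt in DOWNLOAD_FORMATS and sfx in {"apk", "zip", "pdf"}),  # X3b
        ("audio", fmt == "audio/ogg" and sfx == "ogg"),                     # X4a
        ("audio", fmt in MP3_FORMATS and sfx == "mp3"),                     # X4b
        ("video", fmt == "video/mp4" and sfx == "mp4"),                     # X5a
        ("video", fmt in {"video/MP2T", "video/mp2t", "video/m2ts"} and sfx == "ts"),  # X5b
        ("video", fmt in {"video/x-flv", "video/flv"} and sfx == "flv"),    # X5c
        ("video", fmt == "video/3gpp" and sfx == "3gp"),                    # X5d
        ("video", fmt == "video/webm" and sfx == "webm"),                   # X5e
    ]
    for resource_type, matched in rules:
        if matched:
            return resource_type
    return None


print(classify("text/javascript", "http://xxx.com/yyy.js"))
```

For the example in the text, the call reports the web page text type through X2b. Note that with this ordering a static link with format information application/octet-stream and suffix mp3 matches X3a before X4b; the description does not say how such overlaps are resolved, so the order used here is only an assumption.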
404. The RSS obtains the unique identifier of the target resource according to the URL of the target resource and the unique identifier acquisition rule corresponding to the type of the target resource.
Based on the unique identifier acquisition rules of the plurality of resource types acquired in step 401, step 404 may include the following. When the type of the target resource is the picture type, the web page text type, the download type, or the audio type, the RSS obtains the full path of the URL of the target resource as the unique identifier of the target resource. When the type of the target resource is the video type: if the URL of the target resource is a static link, or the URL is a dynamic link that does not contain a range parameter, the RSS obtains the full path of the URL as the unique identifier of the target resource; if the URL is a dynamic link that contains a range parameter, the RSS obtains the absolute path of the URL as the unique identifier of the target resource. The range parameter indicates the amount of data requested by the access request.
Continuing the example in step 403, the type of the target resource is the web page text type and the URL of the target resource is http://xxx.com/yyy.js, so http://xxx.com/yyy.js is obtained as the unique identifier of the target resource.
405. The RSS sends a download notification for the target resource to the CSS, where the download notification carries the unique identifier of the target resource.
In this embodiment of the present disclosure, after obtaining the unique identifier of the target resource, the RSS may send the unique identifier to the CSS in the form of a download notification, to notify the CSS to download the target resource.
It should be noted that the RSS may send the download notification to the CSS via the SAS and the DSS.
406. The CSS receives the download notification, and downloads and caches the target resource according to the unique identifier of the target resource.
After receiving the download notification sent by the RSS, the CSS may download the target resource from its origin site (the Internet server) according to the unique identifier carried in the download notification, and cache it.
In addition, after downloading and caching the target resource, the CSS may pass the unique identifier of the target resource to the DSS, and the DSS records the unique identifier in its resource index. The resource index of the DSS records the unique identifiers of all resources cached by the CSS.
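Steps 405 and 406 can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the origin download is injected as a `fetch` callable so that the sketch needs no network access, and the unique identifier is assumed to be directly usable as the download address.

```python
from typing import Callable, Dict, Set


class DSS:
    """Scheduling subsystem: keeps the index of everything the CSS has cached."""

    def __init__(self) -> None:
        self.resource_index: Set[str] = set()

    def record(self, uid: str) -> None:
        self.resource_index.add(uid)


class CSS:
    """Cache subsystem: downloads and caches resources on notification."""

    def __init__(self, dss: DSS, fetch: Callable[[str], bytes]) -> None:
        self.dss = dss
        self.fetch = fetch  # hypothetical hook standing in for the origin download
        self.cache: Dict[str, bytes] = {}

    def on_download_notification(self, uid: str) -> None:
        """Step 406: download, cache, then report the identifier to the DSS."""
        if uid in self.cache:
            return  # already cached; nothing to do
        self.cache[uid] = self.fetch(uid)
        self.dss.record(uid)
```

Recording the identifier only after the download succeeds keeps the resource index from advertising a resource that the CSS does not actually hold.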
Steps 401 to 406 describe the process in which, when the RSS obtains a terminal's access request for the target resource for the first time, the RSS obtains the unique identifier of the target resource and sends it to the CSS, and the CSS downloads and caches the target resource.
407. The terminal sends an access request for the target resource to the Internet server, where the access request carries the URL of the target resource.
408. The RSS obtains the terminal's access request for the target resource, and determines the type of the target resource according to the universal identification features of the multiple resource types.
409. The RSS obtains the unique identifier of the target resource according to the unique identifier acquisition rule corresponding to the type of the target resource and the URL of the target resource.
Steps 407 to 409 are performed in the same way as steps 402 to 404, and details are not repeated here.
410. The RSS queries, according to the unique identifier of the target resource, whether the target resource exists in the CSS.
As described in step 406, the resource index of the DSS records the unique identifiers of all resources cached by the CSS. Accordingly, step 410 may include: the RSS sends a query message carrying the unique identifier of the target resource to the scheduling subsystem DSS, and the DSS checks whether the unique identifier of the target resource is recorded in the resource index; if the RSS receives a specified response message returned by the DSS, the RSS determines that the target resource exists in the CSS. The specified response message indicates that the unique identifier of the target resource is recorded in the resource index of the DSS.
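The query exchange in step 410 might look like the sketch below. The message classes and function names are hypothetical, and returning `None` when the identifier is not indexed is an assumption, since the description only states what the RSS does when it receives the specified response message.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class QueryMessage:
    unique_id: str


@dataclass(frozen=True)
class SpecifiedResponse:
    """Signals that the identifier is recorded in the DSS resource index."""
    unique_id: str


def dss_handle_query(resource_index: Set[str], msg: QueryMessage) -> Optional[SpecifiedResponse]:
    # DSS side: look the identifier up in the resource index.
    if msg.unique_id in resource_index:
        return SpecifiedResponse(msg.unique_id)
    return None  # assumption: no specified response on a miss


def rss_resource_cached(resource_index: Set[str], unique_id: str) -> bool:
    # RSS side: the resource is treated as cached only if the specified response arrives.
    reply = dss_handle_query(resource_index, QueryMessage(unique_id))
    return isinstance(reply, SpecifiedResponse)
```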
411. If the target resource exists in the CSS, the RSS sends a redirect message to the terminal, where the redirect message carries the address of the CSS.
After determining in step 410 that the target resource exists in the CSS, the RSS may redirect the terminal's access request for the target resource to the CSS by sending a redirect message to the terminal.
412. The terminal receives the redirect message, and accesses the target resource according to the address of the CSS.
Step 412 may include: the terminal sends an access request for the target resource to the CSS according to the address of the CSS carried in the redirect message, and the CSS returns the target resource to the terminal.
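One possible shape for the redirect message of step 411 is an HTTP 302 response. The description does not name the protocol status code or say how the CSS learns which resource is requested, so both the 302 status and the hypothetical `/fetch?url=` path used here to carry the original URL are assumptions made for illustration.

```python
from urllib.parse import quote


def build_redirect(css_address: str, target_url: str) -> bytes:
    """Build a redirect message that points the terminal at the CSS."""
    # Assumption: the original URL travels as a percent-encoded query value
    # on a hypothetical /fetch endpoint of the CSS.
    location = f"http://{css_address}/fetch?url={quote(target_url, safe='')}"
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("ascii")
```

On receiving such a response, a standard HTTP client follows the `Location` header, which corresponds to the terminal sending its access request to the CSS in step 412.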
Steps 407 to 412 describe the process in which, when the RSS obtains a terminal's access request for the target resource again, the RSS redirects the access request to the CSS so that the terminal can access the target resource from the CSS.
In the method provided by this embodiment of the present disclosure, when a terminal's access request for a target resource is obtained, the RSS determines the type of the target resource according to the universal identification features of multiple resource types, and obtains the unique identifier of the target resource according to the unique identifier acquisition rule corresponding to that type and the URL of the target resource. If a query based on the unique identifier shows that the target resource exists in the CSS, the RSS sends a redirect message to the terminal, thereby redirecting the access request to the CSS so that the terminal can access the target resource from the CSS. By obtaining in advance the universal identification features of the multiple resource types and the unique identifier acquisition rules of those resource types, the RSS can identify and serve access to the target resource. This avoids the need to develop a separate plug-in for each website, which involves a large development effort and high cost. Moreover, because the universal identification features of the multiple types are obtained through statistical analysis of multiple resource samples, the rate at which the RSS identifies resources and the efficiency with which terminals access resources can be greatly improved.
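FIG. 5 is a schematic structural diagram of a resource access apparatus according to an embodiment of the present disclosure. Referring to FIG. 5, the apparatus includes a determining module 501, an obtaining module 502, a query module 503, and a sending module 504.
The determining module 501 is configured to: when a terminal's access request for a target resource is obtained, determine the type of the target resource according to the universal identification features of multiple resource types, where the access request carries the uniform resource locator (URL) of the target resource, and the universal identification feature of each resource type is obtained by analyzing multiple resource samples;
the obtaining module 502 is configured to obtain the unique identifier of the target resource according to the unique identifier acquisition rule corresponding to the type of the target resource and the URL of the target resource;
the query module 503 is configured to query, according to the unique identifier of the target resource, whether the target resource exists in the cache subsystem CSS;
the sending module 504 is configured to: if the target resource exists in the CSS, send a redirect message to the terminal, where the redirect message carries the address of the CSS, so that the terminal accesses the target resource according to the address of the CSS.
In another possible embodiment, the determining module 501 is configured to perform step 403 described above.
In another possible embodiment, the obtaining module 502 is configured to perform any one of the acquisition processes in step 401 described above.
In another possible embodiment, the obtaining module 502 is configured to perform step 404 described above.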
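The division into four modules can be pictured as a small composition of callables. This sketch is only one way to arrange them: the class name, the callable signatures, and the stored `css_address` are hypothetical, and each module is injected so that the wiring can be exercised independently of any particular identification or lookup logic.

```python
from typing import Callable


class ResourceAccessApparatus:
    """Wires together the four modules of FIG. 5 (names are illustrative)."""

    def __init__(
        self,
        determine: Callable[[str], str],     # determining module 501: URL -> type
        acquire: Callable[[str, str], str],  # obtaining module 502: (type, URL) -> unique id
        query: Callable[[str], bool],        # query module 503: unique id -> cached?
        send: Callable[[str], None],         # sending module 504: sends the CSS address
        css_address: str,
    ) -> None:
        self.determine = determine
        self.acquire = acquire
        self.query = query
        self.send = send
        self.css_address = css_address

    def handle(self, url: str) -> bool:
        """Process one access request; return True if a redirect was sent."""
        resource_type = self.determine(url)
        uid = self.acquire(resource_type, url)
        if self.query(uid):
            self.send(self.css_address)
            return True
        return False
```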
In this embodiment of the present disclosure, when a terminal's access request for a target resource is obtained, the RSS determines the type of the target resource according to the universal identification features of multiple resource types, and obtains the unique identifier of the target resource according to the unique identifier acquisition rule corresponding to that type and the URL of the target resource. If a query based on the unique identifier shows that the target resource exists in the CSS, the RSS sends a redirect message to the terminal, thereby redirecting the access request to the CSS so that the terminal can access the target resource from the CSS. By obtaining in advance the universal identification features of the multiple resource types and the unique identifier acquisition rules of those resource types, the RSS can identify and serve access to the target resource. This avoids the need to develop a separate plug-in for each website, which involves a large development effort and high cost. Moreover, because the universal identification features of the multiple types are obtained through statistical analysis of multiple resource samples, the rate at which the RSS identifies resources and the efficiency with which terminals access resources can be greatly improved.
It should be noted that, when the resource access apparatus provided in the foregoing embodiment performs resource access, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. In addition, the resource access apparatus provided in the foregoing embodiment and the resource access method embodiment are based on the same concept. For the specific implementation process, see the method embodiment; details are not repeated here.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is further provided, for example a memory including instructions, where the instructions can be executed by a processor in the resource access apparatus to perform the resource access method in the foregoing embodiments. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present disclosure and is not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.
Claims (10)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710056394.6 | 2017-01-25 | | |
| CN201710056394.6A CN108347460B (en) | 2017-01-25 | 2017-01-25 | Resource access method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018137528A1 (en) | 2018-08-02 |
Family
ID=62961861
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/073073 Ceased WO2018137528A1 (en) | 2017-01-25 | 2018-01-17 | Method and device for accessing resource |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN108347460B (en) |
| WO (1) | WO2018137528A1 (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109145220B (en) * | 2018-09-10 | 2022-03-29 | 北京知道创宇信息技术股份有限公司 | Data processing method and device and electronic equipment |
| CN109246229B (en) * | 2018-09-28 | 2021-08-27 | 网宿科技股份有限公司 | Method and device for distributing resource acquisition request |
| CN109168028B (en) * | 2018-11-06 | 2022-11-22 | 北京达佳互联信息技术有限公司 | Video generation method, device, server and storage medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102170479A (en) * | 2011-05-21 | 2011-08-31 | 成都市华为赛门铁克科技有限公司 | Updating method of Web buffer and updating device of Web buffer |
| CN102622454A (en) * | 2012-04-23 | 2012-08-01 | 杭州电子科技大学 | Video website-oriented Internet video search method based on text analysis |
| CN103384993A (en) * | 2012-12-14 | 2013-11-06 | 华为技术有限公司 | Redirection method, gateway and server for user equipment to access webpage |
| CN103841045A (en) * | 2012-11-22 | 2014-06-04 | 中国移动通信集团公司 | Internet cache processing method, content detection subsystem and Cache system |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107015996A (en) * | 2016-01-28 | 2017-08-04 | 阿里巴巴集团控股有限公司 | A kind of resource access method, apparatus and system |
- 2017-01-25: CN application CN201710056394.6A, patent CN108347460B, status Active
- 2018-01-17: WO application PCT/CN2018/073073, publication WO2018137528A1, status Ceased
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113784354A (en) * | 2021-09-17 | 2021-12-10 | 城云科技(中国)有限公司 | Request conversion method and device based on gateway |
| CN113784354B (en) * | 2021-09-17 | 2024-04-09 | 城云科技(中国)有限公司 | Request conversion method and device based on gateway |
Also Published As
| Publication number | Publication date |
|---|---|
| CN108347460B (en) | 2020-04-14 |
| CN108347460A (en) | 2018-07-31 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18744426; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18744426; Country of ref document: EP; Kind code of ref document: A1 |